I'm new to data analysis and need some help. I have a DataFrame loaded from a CSV file that contains data about sales performance. The columns are "SalesDate" (datetime64[ns]), "ProductCategory" (object), "SalesAmount" (int64), "CustomerAge" (int64), "CustomerGender" (object), "CustomerLocation", and "ProductRatings" (int64). All the variables except "SalesDate" and "SalesAmount" should be treated as categorical. My questions are:
- Do I first have to convert the variables to categorical, and then create dummy variables (get_dummies) or use one-hot encoding?
- How do I build a linear regression between "SalesAmount" and "ProductCategory" and make predictions with it?
Can you take a screenshot of the first five rows of the data and share that instead?
Based on the info you've provided:
I would look at the kind of analysis you're doing and convert the datetime into a format that suits it. For example, if you are analyzing data over a few years and the exact date of sale is not relevant, you can split the datetime column into two new columns: month and year.
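A minimal pandas sketch of the month/year split, assuming "SalesDate" is already datetime64 (for example by reading the file with pd.read_csv(..., parse_dates=["SalesDate"])). The toy rows below are made up for illustration, and the new column names "Year" and "Month" are hypothetical:

```python
import pandas as pd

# Hypothetical toy data; in practice df would come from pd.read_csv(...)
df = pd.DataFrame({
    "SalesDate": pd.to_datetime(["2021-01-15", "2021-06-03", "2022-01-20"]),
    "SalesAmount": [120, 80, 150],
})

# Split the datetime column into separate year and month columns
df["Year"] = df["SalesDate"].dt.year
df["Month"] = df["SalesDate"].dt.month

print(df[["SalesDate", "Year", "Month"]])
```

Whether month should then be treated as a number or as a category depends on the analysis (for example, seasonal effects are often easier to model as categories).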
It depends on what you are trying to predict. Choose one of these columns as your target variable and separate it from the rest of the dataset, which then serves as your features. If ProductCategory is your target variable, shouldn't this be a classification task rather than a regression task?
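If instead SalesAmount is the target and ProductCategory the predictor (an assumption about what the question intends), a minimal sketch with pandas and scikit-learn could look like the following. Note that pd.get_dummies encodes object columns directly, so converting them to category first is not strictly required. The toy data is made up for illustration:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data standing in for the real CSV
df = pd.DataFrame({
    "ProductCategory": ["Books", "Toys", "Books", "Games", "Toys", "Games"],
    "SalesAmount": [100, 200, 120, 300, 220, 280],
})

# Target variable, separated from the features
y = df["SalesAmount"]

# One-hot encode the categorical predictor; drop_first removes the
# redundant dummy column that would be collinear with the intercept
X = pd.get_dummies(df[["ProductCategory"]], drop_first=True, dtype=int)

model = LinearRegression()
model.fit(X, y)

# To predict for new rows, encode them the same way and align the
# columns with the training matrix (missing dummies are filled with 0)
new = pd.DataFrame({"ProductCategory": ["Toys", "Books"]})
X_new = pd.get_dummies(new[["ProductCategory"]], dtype=int)
X_new = X_new.reindex(columns=X.columns, fill_value=0)

preds = model.predict(X_new)
print(preds)
```

With a single categorical predictor, the predictions are simply the mean SalesAmount of each category, so this is mostly a baseline; other features can be added to X in the same way. For real projects, scikit-learn's OneHotEncoder inside a Pipeline handles the column alignment for you.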
Also, I am not sure why CustomerAge (int64) would be a categorical variable, unless you group it into age ranges. So you might want to study the data more closely first.
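If grouping ages into ranges makes sense for your analysis, pd.cut does this and returns a categorical column. A minimal sketch; the ages, bin edges, labels, and the "AgeGroup" column name are all hypothetical choices for illustration:

```python
import pandas as pd

# Hypothetical ages; the real values would come from the CSV
df = pd.DataFrame({"CustomerAge": [19, 25, 34, 47, 62, 70]})

# Hypothetical bin edges and labels; bins are right-inclusive by default,
# i.e. (17, 24], (24, 34], (34, 49], (49, 64], (64, 120]
bins = [17, 24, 34, 49, 64, 120]
labels = ["18-24", "25-34", "35-49", "50-64", "65+"]
df["AgeGroup"] = pd.cut(df["CustomerAge"], bins=bins, labels=labels)

print(df)
```

The resulting "AgeGroup" column can then be one-hot encoded like any other categorical feature.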