Regression Analysis w/ Dummy Variables in Tableau


The dataset I am working with is an automotive dataset covering different vehicle sales. My assignment is to do a regression analysis on the dataset using Tableau. I wanted to run one regression of gross margin % on vehicle use (personal) and another of gross margin % on car category (SUV). I then want to compare the r^2 values of the two regressions to determine which variable explains gross margin % better.

The vehicle use variable has 3 categories: personal use, business use, and mixed use. Personal use has been found to yield the most profit (I use gross margin % as a measure of profit), so I want to do a regression analysis on personal use. Car category also has 3 categories: SUV, pickup truck, and sedan. SUV was found to be the most profitable, which is why I chose SUV.

Because the variables vehicle use and car category are qualitative variables, I am 99% sure I have to create a dummy variable. So I created a NULL variable in Tableau where I put:

IF [Vehicle Use] = 'personal' THEN 1
ELSE 0
END

I then created one for the car category:

IF [Category] = 'SUV' THEN 1
ELSE 0
END

For Personal Use: in Tableau, I put SUM(Personal Use) in Columns and SUM(Gross Margin %) in Rows. I then went to Analysis and deselected Aggregate Measures to get the scatterplot visual. Then I added a trend line to get the r^2 value. I repeated these steps with the SUV variable.

I found that the r^2 value for personal use is 0.000079, while the r^2 value for SUV is 0.18.
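To make clear what the trend line's r^2 measures when the predictor is a single 0/1 dummy, here is a minimal Python sketch of the same least-squares fit outside Tableau. The data and the names `is_suv`, `margin`, and `r_squared_for_dummy` are hypothetical and are not taken from the actual dataset. With a 0/1 predictor, the fitted line passes through the two group means, so r^2 only reflects how far apart those two means are relative to the overall spread.

```python
from statistics import mean

def r_squared_for_dummy(dummy, y):
    """Fit y = intercept + slope * dummy by least squares; return (r2, intercept, slope)."""
    x_bar, y_bar = mean(dummy), mean(y)
    sxx = sum((x - x_bar) ** 2 for x in dummy)
    sxy = sum((x - x_bar) * (v - y_bar) for x, v in zip(dummy, y))
    ss_total = sum((v - y_bar) ** 2 for v in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares around the fitted line
    ss_resid = sum((v - (intercept + slope * x)) ** 2 for x, v in zip(dummy, y))
    return 1 - ss_resid / ss_total, intercept, slope

# Hypothetical rows: 1 = SUV, 0 = any other category
is_suv = [1, 1, 1, 0, 0, 0]
margin = [0.30, 0.34, 0.32, 0.20, 0.24, 0.22]

r2, intercept, slope = r_squared_for_dummy(is_suv, margin)
print(f"intercept={intercept:.3f} slope={slope:.3f} r2={r2:.3f}")
```

In this sketch the intercept is the mean margin of the non-SUV rows and the slope is the difference between the SUV mean and the non-SUV mean. An r^2 near zero therefore indicates that the two group means are almost identical compared with the variation within each group.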

I'm not sure if I did this correctly, and I have two concerns:

  1. To my knowledge, dummy variables are binary: either 1 or 0. Does this mean I can't use dummy variables for vehicle use and car category, since they have 3 options instead of 2? Or do I need to do something else to account for the 3 options?
  2. My graphs look very weird. When I search for regression analysis with dummy variables, I get images of graphs like this sample image.

Did I do something wrong in my regression analysis process? In the sample image, there is a dependent variable (score), an independent variable (exercise), and a dummy variable (attend), with each value of the dummy shown as a different colored line. My chart, however, only has a dependent variable (gross margin %) and an independent variable (SUV or personal use), which in this case is also my dummy variable. So did I do something wrong?
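On the first concern, one common approach for a categorical variable with 3 levels is to use 2 dummy columns and treat the remaining level as the reference category. The sketch below is a hedged illustration in Python, not a Tableau feature: the rows and the helper names `dummy_columns` and `r_squared_for_category` are hypothetical. It relies on the fact that a regression with an intercept and a full set of k-1 dummies predicts each row's group mean, so its r^2 can be computed directly from group means.

```python
from collections import defaultdict
from statistics import mean

def dummy_columns(values, reference):
    """Build one 0/1 column per level, skipping the reference level."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}

def r_squared_for_category(values, y):
    """r^2 of regressing y on all dummies of a category (fitted value = group mean)."""
    groups = defaultdict(list)
    for v, target in zip(values, y):
        groups[v].append(target)
    group_mean = {level: mean(rows) for level, rows in groups.items()}
    y_bar = mean(y)
    ss_total = sum((t - y_bar) ** 2 for t in y)
    ss_resid = sum((t - group_mean[v]) ** 2 for v, t in zip(values, y))
    return 1 - ss_resid / ss_total

# Hypothetical rows, not from the real dataset
vehicle_use = ["personal", "business", "mixed", "personal", "business", "mixed"]
margin = [0.30, 0.20, 0.25, 0.32, 0.22, 0.27]

dummies = dummy_columns(vehicle_use, reference="mixed")
r2 = r_squared_for_category(vehicle_use, margin)
print(sorted(dummies), round(r2, 4))
```

Comparing this whole-variable r^2 for vehicle use against the one for car category is one way to compare the two variables, instead of comparing a single level of each (personal vs. SUV).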

Alternatively, if you have any suggestions for what I could analyze with regression on this dataset, please let me know. The variables in this dataset include: category, city, company, country, order date, postal code, promotion type, region, state, transaction date, vehicle use, final sales price, gross margin, gross margin %, quantity, and unit sales price. Is there an easier regression analysis I can do with these variables that will produce a cleaner graph and an actual conclusion? (I can't really conclude anything from r^2 values of 0.000079 and 0.18.)
