Change date to work in correlation matrix


I am trying to create a correlation matrix to see which variables in my dataset are useful, since there are over 600 of them.

[Screenshot of the dataset]

I used df.corr() and received an error message saying that Python could not convert a string to a float. The culprit was the date column, which is formatted as YYYYmM, e.g. 2019m5 (year 2019, month 5, i.e. May). Do I just need to change the format? If so, how would I do that so the matrix works?



Answer by Aditya

Correlation can only be calculated on numerical data, so a string column such as the date makes df.corr() fail.

To run the calculation, first keep only the numeric columns:

df.select_dtypes(include='number')
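A minimal sketch of the full workflow, assuming a toy DataFrame whose column names (`date`, `sales`, `price`) are hypothetical stand-ins for the real data. The string `date` column is dropped by `select_dtypes` before `.corr()` runs, so the conversion error no longer occurs.

```python
import pandas as pd

# Toy frame standing in for the real dataset; column names are hypothetical.
df = pd.DataFrame({
    "date": ["2019m5", "2019m6", "2019m7", "2019m8"],
    "sales": [10.0, 12.5, 11.0, 15.0],
    "price": [3.0, 2.8, 2.9, 2.5],
})

# Keep only numeric columns; the string "date" column is excluded automatically.
numeric_df = df.select_dtypes(include="number")

# Pairwise Pearson correlation between the remaining columns.
corr_matrix = numeric_df.corr()
print(corr_matrix)
```

In pandas 1.5 and later, `df.corr(numeric_only=True)` performs the same filtering in a single call.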

I recommend starting your data visualization with scatter plots and a heatmap.
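If you'd rather keep the date in the matrix, as the question asks, one option is to parse the `YYYYmM` strings yourself. The sketch below assumes the column is called `date` and that every value matches the pattern exactly. It builds a real datetime (first day of each month) and a hypothetical numeric helper column, `month_index` (`year * 12 + month`), that `.corr()` can use.

```python
import pandas as pd

# Toy data; the column names are hypothetical.
df = pd.DataFrame({
    "date": ["2019m5", "2019m6", "2019m12", "2020m1"],
    "sales": [10.0, 12.5, 11.0, 15.0],
})

# Split "YYYYmM" into integer year and month columns with a regular expression.
parts = df["date"].str.extract(r"^(?P<year>\d{4})m(?P<month>\d{1,2})$").astype(int)

# Assemble a proper datetime from year/month/day columns (day fixed to 1).
df["date"] = pd.to_datetime(parts.assign(day=1))

# A monotonic month counter is numeric, so it can take part in the correlation.
df["month_index"] = parts["year"] * 12 + parts["month"]

# Drop the datetime column and correlate the purely numeric columns.
corr_matrix = df.drop(columns="date").corr()
print(corr_matrix)
```

Any increasing linear encoding of the month works here, because Pearson correlation is unaffected by shifting or positively rescaling a column.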