how to (reverse) find range of (other) values from pandas.corr()?


While I find it very interesting, I am having a hard time finding practical uses for pandas.DataFrame.corr().

            age   weight   height       IQ
age        1.00     0.25     0.42     0.33
weight     0.25     1.00     0.82     0.69
height     0.42     0.82     1.00     0.08
IQ         0.33     0.69     0.08     1.00

Given age, weight and height, how to find the (range) of IQ within this given correlation of 0.33?

Or alternatively, given age, weight, height and IQ, how to find the correlation numbers --- meaning, how to find the individual correlation values of each individual data (that aggregated up to this final corr() map)?

Also, in general, I would like to know (an actual use_case scenario of) how experts apply these correlation data in a meaningful way.

Thank you!

I tried searching.

And I do not want to build models to predict, I specifically want to know about pandas.DataFrame.corr()

DataJanitor On

Let's use another example:

With a correlation coefficient of 0.82, the table tells you that the variables 'height' and 'weight' are strongly associated, meaning that high values of 'height' often appear together with high values of 'weight'. Likewise, low values of one often appear together with low values of the other.

In contrast, the association between the variables 'IQ' and 'height' is almost negligible (0.08). This means that high values of 'IQ' do not consistently align with either high or low values of 'height'.

how to find the (range) of IQ within this given correlation of 0.33

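From these numbers, you cannot reverse-engineer the value range of the underlying data. The correlation coefficient does not change when a variable is shifted or rescaled by a positive factor, so many different value ranges produce exactly the same coefficient.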
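To illustrate why the range cannot be recovered, here is a minimal sketch with made-up toy numbers (the data and the scaling factors are assumptions, not taken from the question). It rescales the IQ column to a completely different range and compares the correlation before and after:

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
df = pd.DataFrame({
    "age": [23, 35, 41, 52, 60],
    "IQ": [98, 105, 101, 110, 107],
})

# Shift and rescale IQ so that it lives in a very different value range.
df_scaled = df.assign(IQ=df["IQ"] * 10 + 500)

r_original = df["age"].corr(df["IQ"])
r_scaled = df_scaled["age"].corr(df_scaled["IQ"])

# Both coefficients agree (up to floating-point error) ...
print(r_original, r_scaled)

# ... although the value ranges are completely different.
print(df["IQ"].min(), df["IQ"].max())
print(df_scaled["IQ"].min(), df_scaled["IQ"].max())
```

Since both columns yield the same coefficient, the coefficient alone carries no information about which range the data actually had.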

how to find the individual correlation values of each individual data (that aggregated up to this final corr() map)?

There is no such thing as an individual correlation value. Correlation considers the whole variable (i.e. a whole column of your DataFrame) and is therefore only defined at the aggregate level. There is no way to look at a single pair of values (let's say 1.80 m and 75 kg) and calculate a correlation from these two values alone.
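As a rough sketch of what "aggregate level" means, the Pearson coefficient can be recomputed by hand from whole columns and compared with the pandas result. The toy heights and weights below are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (height in m, weight in kg).
df = pd.DataFrame({
    "height": [1.62, 1.70, 1.75, 1.80, 1.91],
    "weight": [55.0, 68.0, 72.0, 75.0, 90.0],
})

# Pearson's r needs the column means, so every row contributes to it.
x = df["height"] - df["height"].mean()
y = df["weight"] - df["weight"].mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

r_pandas = df["height"].corr(df["weight"])
print(r_manual, r_pandas)  # the two values agree

# A single row has no spread around its own mean, so the result is NaN.
single_row = df.iloc[[3]]  # only the (1.80, 75.0) pair
r_single = single_row.corr().loc["height", "weight"]
print(r_single)
```

The deviations from the mean only make sense relative to the other rows, which is why a single observation yields no correlation value.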

I would like to know (an actual use_case scenario of) how experts apply these correlation data in a meaningful way

The correlation matrix gives an initial idea of how the variables relate to each other. This can play a crucial role when working with multiple linear regression, which assumes that the independent (predictor) variables are not strongly correlated with each other (no multicollinearity). To address this, correlated variables are removed iteratively. In this scenario, the correlation table (along with other tools such as the variance inflation factor, VIF) helps to check after each step whether the remaining independent variables are still correlated.
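A minimal sketch of such a check, assuming synthetic predictors and an arbitrary threshold of 0.7 (both are illustrative choices, not a recommendation). It flags strongly correlated predictor pairs from corr() and also derives the VIFs, using the fact that they equal the diagonal of the inverse correlation matrix:

```python
import numpy as np
import pandas as pd

# Synthetic predictors; 'weight' is deliberately built from 'height'.
rng = np.random.default_rng(0)
n = 500
height = rng.normal(170, 10, n)
weight = 0.9 * (height - 170) + rng.normal(70, 5, n)
age = rng.normal(40, 12, n)
X = pd.DataFrame({"age": age, "weight": weight, "height": height})

corr = X.corr()

# Look at each pair once: keep only the upper triangle (no diagonal).
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack()

threshold = 0.7  # arbitrary cut-off, chosen for this example
flagged = pairs[pairs.abs() > threshold]
print(flagged)

# VIF of each predictor = diagonal of the inverse correlation matrix.
vif = pd.Series(np.diag(np.linalg.inv(corr.to_numpy())), index=corr.columns)
print(vif)
```

In an iterative workflow, one would drop one variable of a flagged pair, recompute both outputs, and repeat until no pair exceeds the chosen threshold.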

While the correlation coefficient provides insights about the data and is thus a good starting point, please keep in mind that correlation (a.k.a. association, "A happens at the same time as B") does not necessarily imply causation ("A happens because of B"). Other factors may influence both variables, in which case relying on correlation alone can lead you to wrong conclusions.
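A small simulation can make this concrete. In the sketch below (synthetic data, hypothetical variable names), a hidden variable z drives both a and b. The two end up strongly correlated although neither influences the other, and the association largely disappears once z is accounted for:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000

# z is a hidden common cause; a and b each depend on z plus independent noise.
z = rng.normal(size=n)
a = z + rng.normal(scale=0.5, size=n)
b = z + rng.normal(scale=0.5, size=n)
df = pd.DataFrame({"a": a, "b": b, "z": z})

# a and b are strongly correlated, yet a has no effect on b (or vice versa).
r_ab = df["a"].corr(df["b"])

# Removing the shared driver leaves only the independent noise terms.
# Plain subtraction works here because a and b were built as z + noise.
r_ab_without_z = (df["a"] - df["z"]).corr(df["b"] - df["z"])

print(r_ab, r_ab_without_z)
```

With real data the common cause is usually unknown or unmeasured, which is why a high value in the corr() table should be treated as a hint to investigate, not as evidence of a causal link.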