I want to obtain a matrix of partial correlations (for all pairs of columns), removing the effect of all other columns.
I am using pingouin; however, the function df.pcorr().round(3) only works with Pearson correlation.
Here is the code:
#!pip install pingouin
import pandas as pd
import pingouin as pg
df = pg.read_dataset('partial_corr')
print (df.pcorr().round(3)) #LIKE THIS BUT USING SPEARMAN CORRELATION
OUT: #like this one except obtained with SPEARMAN
         x      y    cv1    cv2    cv3
x    1.000  0.493 -0.095  0.130 -0.385
y    0.493  1.000 -0.007  0.104 -0.002
cv1 -0.095 -0.007  1.000 -0.241 -0.470
cv2  0.130  0.104 -0.241  1.000 -0.118
cv3 -0.385 -0.002 -0.470 -0.118  1.000
Question: how do I compute a partial correlation matrix for a pandas DataFrame, removing the effect of all other columns, using Spearman correlation?
You can use the fact that the partial correlation between a pair of variables is simply the correlation between their residuals after each variable in the pair is regressed on the remaining variables (see here).
You will need to get all the pairs (itertools.combinations will help here), fit a linear regression (sklearn) of each variable in the pair on the remaining columns, compute the Spearman correlation of the residuals, then reshape the results into a matrix. Here is an example with the Iris dataset that comes with sklearn.
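The steps described above can be sketched roughly as follows. This is a minimal illustration, not the Iris example itself: it runs on synthetic data, uses numpy.linalg.lstsq for the linear fits in place of sklearn's LinearRegression, and computes Spearman as the Pearson correlation of the ranked residuals. The helper name spearman_pcorr and the demo columns are hypothetical.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def spearman_pcorr(df: pd.DataFrame) -> pd.DataFrame:
    """Spearman correlation of regression residuals for every pair of columns.

    Hypothetical helper: each column of a pair is regressed (ordinary least
    squares with an intercept) on all remaining columns, and the residuals
    are then rank-correlated.
    """
    cols = list(df.columns)
    out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for a, b in combinations(cols, 2):
        rest = [c for c in cols if c not in (a, b)]
        # Design matrix: intercept plus the remaining columns.
        X = np.column_stack([np.ones(len(df)), df[rest].to_numpy(dtype=float)])
        Y = df[[a, b]].to_numpy(dtype=float)
        # Fit both columns of the pair in a single least-squares call.
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = pd.DataFrame(Y - X @ beta, columns=[a, b])
        # Spearman correlation = Pearson correlation of the ranks.
        rho = resid[a].rank().corr(resid[b].rank())
        out.loc[a, b] = out.loc[b, a] = rho
    return out


# Synthetic demo (an assumption, not data from the post):
# x and y are related only through the shared driver z.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
demo = pd.DataFrame({
    "x": z + rng.normal(size=500),
    "y": z + rng.normal(size=500),
    "z": z,
})
pc = spearman_pcorr(demo)
print(pc.round(3))
```

Applied to the question's data, the call would be spearman_pcorr(pg.read_dataset('partial_corr')). Note that this follows the recipe above (linear fit on the raw values, Spearman on the residuals), which is not necessarily identical to ranking the data first and then computing a Pearson partial correlation.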