How to calculate pairwise Mutual Information for entire pandas dataset?


I have 50 variables in my dataframe: 46 are dependent variables and 4 are independent variables (precipitation, temperature, dew, snow). I want to calculate the mutual information of my dependent variables against my independent ones.

In the end I want a single dataframe holding the mutual information of every dependent variable against each independent variable.

Right now I am calculating it with the following, but it takes a long time because I have to change my y each time:

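import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

X = df[['Temperature', 'Precipitation', 'Dew', 'Snow']]  # features
y = df['N0037']  # target, as a 1-D Series (mutual_info_regression expects a 1-D y)

mi = mutual_info_regression(X, y)
mi /= np.max(mi)  # scale so the largest value is 1

mi = pd.Series(mi, index=X.columns)
mi = mi.sort_values(ascending=False)  # sort_values returns a new Series
mi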

There are 2 best solutions below

Answer by Always Right Never Left (best answer):

Using a list comprehension, with one call to mutual_info_regression per dependent variable:

import pandas as pd
from sklearn.feature_selection import mutual_info_regression as mi_reg

indep_vars = ['Temperature', 'Precipitation', 'Dew', 'Snow']  # independent variables
dep_vars = df.columns.difference(indep_vars).tolist()  # every other column is a dependent variable

# One row of MI estimates per dependent variable, one column per independent variable
df_mi = pd.DataFrame(
    [mi_reg(df[indep_vars], df[dep_var]) for dep_var in dep_vars],
    index=dep_vars, columns=indep_vars,
).apply(lambda x: x / x.max(), axis=1)  # scale each row so its largest value is 1
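To try the list-comprehension approach without the real data, here is a minimal sketch on synthetic data. The helper name pairwise_mi, the toy columns N0001 and N0002, and the fixed random_state are assumptions made for illustration; they do not come from the question. mutual_info_regression adds a small amount of random noise to continuous features, so fixing random_state keeps the table reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression


def pairwise_mi(df, indep_vars, dep_vars, random_state=0):
    """Return a (dep_vars x indep_vars) table of estimated mutual information.

    Hypothetical helper: wraps the list comprehension from the answer above.
    """
    X = df[indep_vars]
    rows = [
        mutual_info_regression(X, df[dep], random_state=random_state)
        for dep in dep_vars
    ]
    return pd.DataFrame(rows, index=dep_vars, columns=indep_vars)


# Toy data (assumed for illustration only).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Temperature": rng.normal(size=300),
    "Precipitation": rng.normal(size=300),
})
toy["N0001"] = toy["Temperature"] + 0.1 * rng.normal(size=300)  # driven by Temperature
toy["N0002"] = rng.normal(size=300)                             # unrelated noise

mi_table = pairwise_mi(toy, ["Temperature", "Precipitation"], ["N0001", "N0002"])
print(mi_table)
```

The unnormalized table is returned on purpose: dividing each row by its maximum, as in the answer, makes rows comparable in shape but hides which dependent variables carry little information overall.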

Answer by Miguel Trejo:

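Another way is to pass a custom callable to pandas.DataFrame.corr(). This scores every pair of columns, so the result is a square matrix; pandas sets the diagonal to 1 and makes the matrix symmetric regardless of what the callable returns.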

from sklearn.feature_selection import mutual_info_regression

def custom_mi_reg(a, b):
    # corr() passes two 1-D NumPy arrays
    a = a.reshape(-1, 1)  # X must be 2-D: (n_samples, 1)
    return mutual_info_regression(a, b)[0]  # y stays 1-D; returns a float


df_mi = df.corr(method=custom_mi_reg)
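Since corr() returns the full square matrix, the dependent-versus-independent block still has to be sliced out. A minimal sketch on synthetic data follows; the function name mi_pair, the toy column N0001, and random_state=0 are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression


def mi_pair(a, b):
    # DataFrame.corr passes two 1-D NumPy arrays; X must be 2-D, y stays 1-D.
    return mutual_info_regression(a.reshape(-1, 1), b, random_state=0)[0]


# Toy data (assumed for illustration only).
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "Temperature": rng.normal(size=300),
    "Dew": rng.normal(size=300),
})
toy["N0001"] = toy["Dew"] ** 2 + 0.1 * rng.normal(size=300)  # nonlinear in Dew

full = toy.corr(method=mi_pair)  # square matrix over all columns

indep_vars = ["Temperature", "Dew"]
dep_vars = toy.columns.difference(indep_vars).tolist()
block = full.loc[dep_vars, indep_vars]  # rows: dependent, columns: independent
print(block)
```

Note that this route also estimates mutual information between pairs the question does not ask about (dependent against dependent, independent against independent), so with 50 columns it does more work than the list comprehension.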