Calculating the mutual information between two random vectors returns the same value

Question

2.1k views · Asked by Shayan on 19 November 2022 at 11:00

I want to calculate the mutual information between two numpy vectors:

>>> from sklearn.metrics.cluster import mutual_info_score
>>> import numpy as np

>>> a, b = np.random.rand(10), np.random.rand(10)
>>> mutual_info_score(a, b)
1.6094379124341005

>>> a, b = np.random.rand(10), np.random.rand(10)
>>> mutual_info_score(a, b)
1.6094379124341005

As you can see, although I updated a and b, it returned the same value. Then I tried another example:

>>> a = np.array([167.52523295,  73.2904335 ,  98.61953303, 152.17297007,
...               211.01341451, 327.72296346, 356.60500081,  43.9371432 ,
...               119.09474284, 125.20180842])

>>> b = np.array([280.9287028 , 131.76304983, 176.0277832 , 188.56630096,
...               229.09811401, 228.47200012, 617.67000122,  52.7211511 ,
...               125.95361582, 148.55247447])

>>> mutual_info_score(a, b)
2.302585092994046

>>> a = np.array([ 6.71381009,  1.43607653,  3.78729242, -4.75706796, -3.81281173,
...                3.23440092, 10.84495625, -0.19646145,  4.09724507, -0.13858104])

>>> b = np.array([ 4.25330873,  3.02197642, -3.2833848 ,  0.41855662, -3.74693531,
...                0.7674982 , 11.36459148,  0.64636462,  0.51817262,  1.65318943])

>>> mutual_info_score(a, b)
2.302585092994046

Why? Look at the difference between those numbers. Why does it return the same value? More importantly, how do I calculate the MI between two vectors?

**Shayan** · Accepted Answer · 2022-12-07T08:59:04.770000

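Here, you're utilizing a method that is suitable for measuring the quality of clustering results! mutual_info_score treats every distinct value as a discrete cluster label, so with ten distinct floats in each vector every sample is its own cluster and the score reduces to ln(10) ≈ 2.3026, whatever the actual values are. With an estimator meant for continuous variables, you will obtain different numbers each time you run the cell.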
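
As a quick numerical check, here is a minimal sketch (assuming NumPy and scikit-learn are installed; the helper name label_mi is made up for illustration). It feeds vectors of n distinct random floats to mutual_info_score and compares the result with ln(n):

```python
import numpy as np
from sklearn.metrics import mutual_info_score


def label_mi(n, seed=0):
    """MI score of two vectors holding n distinct random floats each (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    a, b = rng.random(n), rng.random(n)
    # Each float is treated as its own class label, so the contingency
    # table has exactly one sample per row and per column.
    # Newer scikit-learn versions may warn about continuous labels here.
    return mutual_info_score(a, b)


for n in (5, 10):
    # The two printed numbers agree for every seed: the score ignores the values.
    print(n, label_mi(n), np.log(n))
```

Five distinct labels give ln(5) ≈ 1.6094 and ten give ln(10) ≈ 2.3026, which are the two values that appear in the question.
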
Now for the main point. To estimate the mutual information (MI) between two vectors (or even several vectors), you can use the mutual_info_regression function (as described here):

In [1]: import numpy as np
   ...: from sklearn.feature_selection import mutual_info_regression

In [2]: a, target = np.random.rand(10, 3) + 300, np.random.rand(10)

In [3]: mi = mutual_info_regression(a, target)

In [4]: mi
Out[4]: array([0.18373016, 0.19396825, 0.09634921])

In the above, I calculated the MI between each feature of a and the target. E.g., the MI between the first feature and the target is ~0.184. There are various ways to calculate MI between variables, e.g.:

- Estimate MI with histograms, e.g.:

import numpy as np
from sklearn.metrics import mutual_info_score

def MI(x, y, bins):
    # Bin the two vectors jointly; c_xy[i, j] counts the samples in cell (i, j)
    c_xy = np.histogram2d(x, y, bins)[0]
    # Compute the MI (in nats) directly from the contingency table
    mi = mutual_info_score(None, None, contingency=c_xy)
    return mi

  The challenge is finding a suitable value for the number of bins here. [1]

- Estimate MI from entropy estimates based on k-nearest-neighbor distances (mutual_info_regression is based on this approach).
- etc.
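
For the original question of two plain vectors, both approaches can be sketched as follows. This is only an illustration under assumptions: the helper names mi_knn and mi_hist, the synthetic data and the bin counts are made up here. Since mutual_info_regression expects a 2-D feature matrix, the first vector is reshaped into a single column.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.metrics import mutual_info_score


def mi_knn(x, y, n_neighbors=3, seed=0):
    """kNN-based MI estimate (in nats) between two 1-D continuous vectors."""
    X = np.asarray(x).reshape(-1, 1)  # a single feature column
    return mutual_info_regression(
        X, y, n_neighbors=n_neighbors, random_state=seed
    )[0]


def mi_hist(x, y, bins):
    """Histogram-based MI estimate (in nats); the result depends on bins."""
    c_xy = np.histogram2d(x, y, bins)[0]
    return mutual_info_score(None, None, contingency=c_xy)


rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y_dep = x + 0.1 * rng.normal(size=1000)  # strongly dependent on x
y_ind = rng.normal(size=1000)            # independent of x

print("kNN, dependent:  ", mi_knn(x, y_dep))
print("kNN, independent:", mi_knn(x, y_ind))
for bins in (5, 20, 50):
    # The histogram estimate shifts with the bin count
    print(f"hist, bins={bins}:", mi_hist(x, y_dep, bins), mi_hist(x, y_ind, bins))
```

Unlike the label-based score, both estimates should come out clearly higher for the dependent pair than for the independent one. The histogram estimate tends to grow with the number of bins even for independent data, which is the bin-selection difficulty mentioned above.
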

P.S. Reading this document is worthwhile.
