Mahalanobis distance computation in Python

Asked by Antonio Piemontese on 27 March 2024 at 16:33

When I compute the Mahalanobis distance on the following (poorly correlated) dataframe, I get strange distance values. Here is the Python code:

dataframe

data = {
    'Price': [100000, 800000, 650000, 700000,
              860000, 730000, 400000, 870000,
              780000, 400000],
    'Distance': [16000, 60000, 300000, 10000,
                 252000, 350000, 260000, 510000,
                 2000, 5000],
    'Emission': [300, 400, 1230, 300, 400, 104,
                 632, 221, 142, 267],
    'Performance': [60, 88, 90, 87, 83, 81, 72,
                    91, 90, 93],
    'Mileage': [76, 89, 89, 57, 79, 84, 78, 99,
                97, 99],
}

import libraries

import numpy as np
import pandas as pd
from scipy import stats

create dataset

df = pd.DataFrame(data,columns=['Price', 'Distance', 
                            'Emission','Performance', 
                            'Mileage'])

compute the correlation matrix

df.corr(numeric_only=True)

the Mahalanobis distance function

def calculateMahalanobis(y=None, data=None, cov=None): 

    y_mu = y - np.mean(data) 
    if not cov: 
        cov = np.cov(data.values.T) 
    inv_covmat = np.linalg.inv(cov) 
    left = np.dot(y_mu, inv_covmat) 
    mahal = np.dot(left, y_mu.T) 
    return mahal.diagonal()

create new column in dataframe that contains Mahalanobis distance for each row

df['MahalanobisDistance'] = calculateMahalanobis(y=df, data=df[['Price', 'Distance', 'Emission','Performance', 'Mileage']])

display the dataframe

print(df)

All the distances in the last column are equal and very large. Why? I carefully checked the function and it seems correct. By contrast, according to a reliable source, the first 10 values, for example, are expected to be different.
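One possible explanation, offered as a sketch rather than a confirmed diagnosis: with pandas 2.0 or later, `np.mean(data)` on a DataFrame is likely to reduce over both axes and return a single scalar instead of one mean per column, so `y_mu` is no longer centred column by column. In addition, `if not cov:` raises an error as soon as a NumPy array is passed as `cov`. The version below states the axis explicitly and tests `cov is None`. The function name `mahalanobis_squared` is illustrative and not part of the original code. Like the original function, it returns squared distances.

```python
import numpy as np
import pandas as pd

data = {
    'Price': [100000, 800000, 650000, 700000, 860000,
              730000, 400000, 870000, 780000, 400000],
    'Distance': [16000, 60000, 300000, 10000, 252000,
                 350000, 260000, 510000, 2000, 5000],
    'Emission': [300, 400, 1230, 300, 400, 104, 632, 221, 142, 267],
    'Performance': [60, 88, 90, 87, 83, 81, 72, 91, 90, 93],
    'Mileage': [76, 89, 89, 57, 79, 84, 78, 99, 97, 99],
}
df = pd.DataFrame(data)


def mahalanobis_squared(y, data, cov=None):
    """Squared Mahalanobis distance of each row of y from the mean of data.

    Hypothetical helper: assumes y contains every column of data.
    """
    cols = list(data.columns)
    # Per-column means, with the axis stated explicitly so the result does
    # not depend on how np.mean treats a DataFrame in the installed pandas.
    mu = data.mean(axis=0).to_numpy()
    diff = y[cols].to_numpy(dtype=float) - mu
    if cov is None:
        # Sample covariance; rowvar=False means columns are the variables.
        cov = np.cov(data.to_numpy(dtype=float), rowvar=False)
    inv_cov = np.linalg.inv(cov)
    # Row-wise quadratic form diff_i @ inv_cov @ diff_i, computed without
    # building the full n x n matrix and taking its diagonal.
    return np.einsum('ij,jk,ik->i', diff, inv_cov, diff)


d2 = mahalanobis_squared(df, df)
df['MahalanobisDistance'] = d2
print(df)
```

A quick sanity check: when the mean and the sample covariance are both estimated from the same n rows and p columns, the squared distances sum to (n - 1) * p, which is 9 * 5 = 45 for this dataframe. Taking `np.sqrt(d2)` gives the Mahalanobis distance itself.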
