Understanding Mahalanobis distance between ECG beats


I have a DataFrame in which each row is a 200-dimensional ECG beat. Many of the beats (rows) are noisy, so I want to calculate the Mahalanobis distance of each beat from the average beat (i.e., between each row and the mean row) and identify the beats (rows) that lie far from it. Here is my code:

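import sys

import numpy as np
from numpy.linalg import inv
from scipy.spatial import distance

# dfs: list of DataFrames defined elsewhere; keep the first 200 columns of the first one
df_cut = dfs[0].iloc[:,:200].fillna(0)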
df_selected = df_cut
mean_vec = np.mean(df_selected, axis=0)
df_diff = df_selected - mean_vec

cov_mat = np.cov(df_diff.values.T)
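# np.linalg.det of a 200x200 matrix can underflow to 0 (or overflow),
# so test the numerical rank instead of comparing the determinant with 0
if np.linalg.matrix_rank(cov_mat) < cov_mat.shape[0]:
    print('Matrix is singular')
else:
    print('Matrix is not singular')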
# std_devs = np.sqrt(np.diagonal(cov_mat))
# cor_mat = cov_mat / np.outer(std_devs, std_devs)
zero_vec = np.zeros(mean_vec.shape)
# Calculate the inverse of the covariance matrix
# If the covariance matrix is singular or not well-conditioned, add a small positive number to the diagonal
if np.linalg.cond(cov_mat) < 1/sys.float_info.epsilon:
    inv_cov_mat = inv(cov_mat)
else:
    inv_cov_mat = inv(cov_mat + np.eye(cov_mat.shape[0]) * 1e-8)

# Calculate Mahalanobis distance for each row
df_diff['Mahalanobis_Distance'] = df_diff.apply(lambda row: distance.mahalanobis(row, zero_vec, inv_cov_mat), axis=1)
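A vectorized cross-check of the per-row loop above may help rule out a coding error. This is a minimal sketch, assuming the beats are available as a 2-D NumPy array `X` with one beat per row; the helper name `mahalanobis_rows` is hypothetical. It evaluates D_i = sqrt((x_i − μ)^T Σ^{-1} (x_i − μ)) for all rows at once and compares the result with `scipy.spatial.distance.mahalanobis` on synthetic data. It uses `np.linalg.pinv` instead of the `inv` plus 1e-8 diagonal term from the question; that is a swapped-in choice, not the original method.

```python
import numpy as np
from scipy.spatial import distance


def mahalanobis_rows(X):
    """Mahalanobis distance of each row of X from the mean row.

    Hypothetical helper: X is a 2-D array with one beat per row.
    """
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)             # centre every beat on the mean beat
    cov = np.cov(X, rowvar=False)         # columns are the variables
    inv_cov = np.linalg.pinv(cov)         # pseudo-inverse, swapped in for inv() + 1e-8
    # Row-wise quadratic form diff_i^T inv_cov diff_i, for all rows at once
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.clip(d2, 0.0, None))  # clip tiny negative round-off


# Synthetic, well-conditioned data (not ECG) to compare against SciPy
rng = np.random.default_rng(42)
n, p = 1000, 20
X = rng.normal(size=(n, p))
dist = mahalanobis_rows(X)

inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
mu = X.mean(axis=0)
ref = np.array([distance.mahalanobis(row, mu, inv_cov) for row in X])
print(np.allclose(dist, ref))  # → True

# Exact identity for an invertible sample covariance:
# mean of squared distances = (n - 1) * p / n
print(np.isclose(np.mean(dist**2), (n - 1) * p / n))  # → True
```

The second check is useful on the real data too. When Σ is the `np.cov` estimate from the same rows and is invertible, the mean of the squared distances equals (n − 1)·d/n exactly, so with d = 200 it should come out just under 200. A noticeable deviation from that value for `df_diff['Mahalanobis_Distance']` suggests the covariance is singular or ill-conditioned, and the inversion step deserves a closer look.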

The Mahalanobis distances I am getting are plotted in the histogram below:

[Image: histogram of the computed Mahalanobis distances]

As I understand it, the Mahalanobis distance indicates how many standard deviations a point is away from the mean. However, many of my distances are around 10 or more, so I am not sure whether my code has an error. How should I interpret these results, given that they look wrong to me?
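The "number of standard deviations" reading is exact only in one dimension, where the distance reduces to |x − μ|/σ. For a d-dimensional Gaussian, the squared distance is a sum of d squared independent standardized components, so it follows a chi-squared distribution with d degrees of freedom, and the distance itself concentrates around sqrt(d), about 14.1 for d = 200. The sketch below assumes Gaussian data (real ECG beats will only roughly match this) and prints reference quantiles and a simulation of the effect; the helper names and the 0.99 cut-off are hypothetical choices, not a standard recipe.

```python
import numpy as np
from scipy.stats import chi2


def distance_quantiles(d, qs=(0.025, 0.5, 0.975)):
    """Quantiles of the Mahalanobis distance if D**2 ~ chi-squared(d).

    Assumption: Gaussian data with known mean/covariance; real ECG beats
    with an estimated covariance will only roughly follow this.
    """
    return np.sqrt(chi2.ppf(qs, df=d))


def flag_outliers(distances, d, q=0.99):
    """Mask of distances above the q-quantile of sqrt(chi2(d)) (hypothetical cut-off)."""
    threshold = np.sqrt(chi2.ppf(q, df=d))
    return np.asarray(distances) > threshold, threshold


d = 200  # dimensions per beat, as in the question
print(distance_quantiles(d))  # roughly [12.8, 14.1, 15.5]

# Simulated standard-normal points: zero mean and identity covariance,
# so the Mahalanobis distance reduces to the Euclidean norm
rng = np.random.default_rng(0)
sim = np.linalg.norm(rng.standard_normal((5000, d)), axis=1)
print(sim.mean())  # close to sqrt(200), about 14.1
```

Under these assumptions, values around 10 to 15 are what a typical, non-outlying beat would produce in 200 dimensions, so it is the threshold (here a chi-squared quantile) rather than "a few standard deviations" that separates unusual beats.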
