Wasserstein distance for multiple histograms


I'm trying to calculate the distance matrix between histograms. I can only find code for calculating the distance between two histograms, and my data has more than 10. My data is a CSV file in which each histogram is a column whose values add up to 100. The file has about 65,000 entries; I only ran the code on 20% of the data, but it still does not work.

I've tried distance_matrix from scipy.spatial, but it ignores the fact that the data are histograms and treats them as ordinary numerical data. I've also tried wasserstein_distance from scipy.stats, but it raised the error "object too deep for desired array":

from scipy.stats import wasserstein_distance
distance = wasserstein_distance(df3, df3)

I expected the result to look something like this:

            0           1           2           3           4           5           6
0    0.000000  259.730341  331.083554  320.302997  309.577373  249.868085
1  259.730341    0.000000  208.368304  190.441382  262.030304  186.033572
2  331.083554  208.368304    0.000000  112.255111  256.269253  227.510879
3  320.302997  190.441382  112.255111    0.000000  246.350482  205.346804
4  309.577373  262.030304  256.269253  246.350482    0.000000  239.642379

but I got an error instead:

ValueError: object too deep for desired array
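
For context, scipy.stats.wasserstein_distance accepts only 1-D arrays, so passing a whole DataFrame is the likely cause of this error. One possible way around it is to loop over pairs of columns and pass each histogram as weights over a shared set of bin positions. The sketch below is only an illustration under assumptions that are not stated in the question: each column is one histogram, the rows are bins, and the bins sit at positions 0, 1, 2, ... (the helper name pairwise_wasserstein, the bin_positions parameter, and the demo data are all hypothetical).

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import wasserstein_distance


def pairwise_wasserstein(hist_df, bin_positions=None):
    """Return a symmetric DataFrame of 1-D Wasserstein distances between columns.

    Assumption: each column of hist_df is one histogram and each row is a bin.
    """
    if bin_positions is None:
        # Assumption: bins are equally spaced at 0, 1, 2, ...
        bin_positions = np.arange(len(hist_df))
    cols = list(hist_df.columns)
    weights = hist_df.to_numpy(dtype=float)
    n = len(cols)
    dist = np.zeros((n, n))
    # Only the upper triangle is computed; the matrix is symmetric.
    for i, j in combinations(range(n), 2):
        d = wasserstein_distance(
            bin_positions, bin_positions,
            u_weights=weights[:, i], v_weights=weights[:, j],
        )
        dist[i, j] = dist[j, i] = d
    return pd.DataFrame(dist, index=cols, columns=cols)


# Made-up demo data: three histograms, each with all mass in a single bin.
demo = pd.DataFrame({
    "a": [100.0, 0.0, 0.0],
    "b": [0.0, 100.0, 0.0],
    "c": [0.0, 0.0, 100.0],
})
result = pairwise_wasserstein(demo)
print(result)
```

The loop makes one call per pair of histograms, so the cost grows quadratically with the number of columns. If the real bin positions are known (for example bin centres), they can be passed as bin_positions instead of the default.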