Reference for Wasserstein distance function in Python

We are trying to calculate the distance between two discrete 1-D distributions. Our purpose is to compute a distance that follows the intuition of optimal transport: our distributions are masses placed at "points", i.e. vectors in which the order of the elements matters. Given a matrix that describes the distances between any two points, we would like to find the minimal-cost transport that turns u into v.

The simplest example: let u and v be the distributions u = (0.5, 0.2, 0.3) and v = (0.5, 0.3, 0.2).

Assume that the distance matrix is [[1,1,1],[1,1,1],[1,1,1]], which means it costs 1 to move a unit of mass between any two points. Obviously, the optimal way to make u look like v is to transport 0.1 from the third point to the second point. The cost in that case is 1 * 0.1 = 0.1.
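A small numpy sketch makes the arithmetic of this plan explicit; note that the 0.1 only comes out if mass that stays in place is free, i.e. if the diagonal of the cost matrix is taken to be zero:

import numpy as np

u = np.array([0.5, 0.2, 0.3])
v = np.array([0.5, 0.3, 0.2])

# Transport plan: P[i, j] is the mass moved from point i of u to point j
# of v; the diagonal holds the mass that stays in place.
P = np.array([[0.5, 0.0, 0.0],
              [0.0, 0.2, 0.0],
              [0.0, 0.1, 0.2]])

assert np.allclose(P.sum(axis=1), u)  # rows marginalize to u
assert np.allclose(P.sum(axis=0), v)  # columns marginalize to v

D = np.ones((3, 3))
np.fill_diagonal(D, 0)  # staying in place is free
print((P * D).sum())    # 0.1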

Following this intuition we turned to the Wasserstein distance. We have tried both scipy.stats.wasserstein_distance and the POT package, in particular ot.emd2. However, neither computes what we want: in the example above, the first does not consider the order of elements in the vector, so the result is 0, and the second returns 1.
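Both results are easy to reproduce (assuming the POT package is imported as ot):

import numpy as np
import ot
from scipy import stats

u = [0.5, 0.2, 0.3]
v = [0.5, 0.3, 0.2]

# scipy treats u and v as two samples of values; both contain the same
# multiset {0.2, 0.3, 0.5}, so the distance between their empirical
# distributions is 0.
print(stats.wasserstein_distance(u, v))  # 0.0

# ot.emd2 applies the full cost matrix, including d(i, i) = 1, so every
# unit of mass pays 1 even when it stays put; the total cost is always 1.
print(ot.emd2(np.array(u), np.array(v), np.ones((3, 3))))  # 1.0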

We would really appreciate any explanation of what we might have missed regarding the operation of these two Python functions, or any other references or suggestions.

  • We are aware that, given a non-symmetric distance matrix, this notion of "distance" will not be symmetric.

There are 2 answers below.

BEST ANSWER

For the case where all weights are 1, the Wasserstein distance will yield the measurement you're looking for by doing something like the following.

from scipy import stats

u = [0.5, 0.2, 0.3]
v = [0.5, 0.3, 0.2]

# Create an array with cardinality 3 (your metric space has three positions),
# where the distance between each pair of adjacent elements is 1.
dists = [i for i in range(len(u))]

print(stats.wasserstein_distance(dists, dists, u, v))  # 0.1

This code treats what you are calling "distributions" as weights over a distribution supported on the values [0, 1, 2]. In simple graphical terms, your example distributions look like this in my treatment (a quick numeric check follows the picture).

  u         v

|         |
|         |
|   |     | |
| | |     | | |
| | |     | | |
-----     -----
0 1 2     0 1 2
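As a check, you can compute the same value by hand. This uses the standard fact that in one dimension the Wasserstein-1 distance equals the area between the two CDFs; it is not specific to scipy:

import numpy as np

u = np.array([0.5, 0.2, 0.3])
v = np.array([0.5, 0.3, 0.2])

# With adjacent support points a distance 1 apart, W1 is the sum of the
# absolute differences between the two cumulative distribution functions.
print(np.abs(np.cumsum(u) - np.cumsum(v)).sum())  # 0.1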
ANSWER

You misunderstand the Wasserstein distance. It is defined as a minimal average distance.

Here you have two distributions u and v on three values, say 1, 2, 3. The (i,j) entry of the cost matrix is the distance between i and j. Note that in your case this is not a true distance, because d(i,i) is not zero, but that is not a problem.

Now, what is this minimal average distance? We have to introduce a joining of u and v, that is, a two-dimensional distribution J whose first and second marginals are u and v respectively. In other words, you have a probability J(i,j) for each i, j in {1,2,3}. The average cost with respect to J is then sum_{i,j} J(i,j) * d(i,j). The Wasserstein distance is the minimum value of this average cost over all possible joinings J.
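This minimization is a small linear program, so you can compute it directly; here is a sketch using scipy.optimize.linprog (my choice of solver, purely for illustration):

import numpy as np
from scipy.optimize import linprog

u = np.array([0.5, 0.2, 0.3])
v = np.array([0.5, 0.3, 0.2])
d = np.ones((3, 3))  # the cost matrix from the question

n = len(u)
# Variables are the n*n entries of J, flattened row by row.
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1  # row i of J must sum to u[i]
    A_eq[n + i, i::n] = 1           # column i of J must sum to v[i]
b_eq = np.concatenate([u, v])

res = linprog(d.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print(res.fun)  # 1.0 for this all-ones cost matrix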

So here the Wasserstein distance is exactly 1: since d(i,j) = 1 for every i and j, every joining J has average cost sum_{i,j} J(i,j) * 1 = 1, and the minimum over all joinings is 1.
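That also explains the 1 returned by ot.emd2 in your experiment. If you zero the diagonal of the cost matrix, so that mass staying in place costs nothing, the same call produces the 0.1 you expected (a sketch assuming the POT package):

import numpy as np
import ot

u = np.array([0.5, 0.2, 0.3])
v = np.array([0.5, 0.3, 0.2])

# d(i, i) = 0; moving between distinct points still costs 1.
M = np.ones((3, 3)) - np.eye(3)
print(ot.emd2(u, v, M))  # 0.1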