Wasserstein distance between two distributions python

884 Views Asked by user5264628 At 28 June 2025 at 18:47

I have distributions of some data before and after an event. I want to measure the distance between these two distributions. Put differently: how much would I need to scale the pre-event distribution to bring it close to the post-event distribution? Wasserstein distance seems like a good fit for my problem, but I have some doubts:

Each distribution has days on the X axis and the number of data points recorded on that day on the Y axis. How do I pass these two columns as input to scipy.stats.wasserstein_distance?
The post-event distribution is longer-tailed than the pre-event distribution. What is the best distance metric for measuring the change in magnitude along the X axis as well as the increase on the Y axis?

>>> df.head()
   day  number
0    7       1
1    8       1
2   10       2
3   11       1
4   15       4
>>> df_after.head()
   day  number
0    6       1
1   19       1
2   20       1
3   21       1
4   22       2
>>> wasserstein_distance(df['number'], df_after['number'])  # looks at only one column of the DataFrame; how do I pass the whole distribution?
0.8674329501915711
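
One possible way to pass both columns, sketched here using only the five sample rows shown above: scipy.stats.wasserstein_distance accepts optional u_weights and v_weights arguments, so the day column can be given as the values and the number column as the weight of each value. This assumes that number is a count of observations on that day; the weights are normalized internally, so the two totals do not have to match. The name distance_in_days is only illustrative.

```python
import pandas as pd
from scipy.stats import wasserstein_distance

# Sample rows copied from the df.head() / df_after.head() output above.
df = pd.DataFrame({"day": [7, 8, 10, 11, 15], "number": [1, 1, 2, 1, 4]})
df_after = pd.DataFrame({"day": [6, 19, 20, 21, 22], "number": [1, 1, 1, 1, 2]})

# Values are the positions on the X axis (days); weights are the counts
# observed at each position. The result is expressed in units of days.
distance_in_days = wasserstein_distance(
    u_values=df["day"],
    v_values=df_after["day"],
    u_weights=df["number"],
    v_weights=df_after["number"],
)
print(distance_in_days)
```

Under this reading the result is the average number of days the pre-event mass has to be moved to match the post-event shape. Because the weights are normalized, it does not capture a change in the total number of data points.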

Here is a sample plot of the real dataset: blue is pre-event and orange is post-event. My end goal is to learn from such distributions and generalize a scaling factor, i.e. how much do I need to scale my pre-event distribution to get to the post-event distribution?
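
A minimal sketch of one way such a scaling factor could be estimated, assuming "scale" means multiplying the day axis of the pre-event distribution by a single factor: search for the factor that minimizes the weighted Wasserstein distance to the post-event distribution. The arrays hold only the sample rows from the question, and the names scaled_distance and best_scale, as well as the search bounds (0.1, 10.0), are illustrative choices rather than anything from the original data.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import wasserstein_distance

# Sample rows from the question: day values and the counts on each day.
pre_days = np.array([7, 8, 10, 11, 15], dtype=float)
pre_counts = np.array([1, 1, 2, 1, 4], dtype=float)
post_days = np.array([6, 19, 20, 21, 22], dtype=float)
post_counts = np.array([1, 1, 1, 1, 2], dtype=float)


def scaled_distance(scale):
    """Wasserstein distance after stretching the pre-event day axis by `scale`."""
    return wasserstein_distance(
        scale * pre_days, post_days, u_weights=pre_counts, v_weights=post_counts
    )


# One-dimensional bounded search over the scale factor (bounds are an assumption).
result = minimize_scalar(scaled_distance, bounds=(0.1, 10.0), method="bounded")
best_scale = result.x

print("unscaled distance:", scaled_distance(1.0))
print("best scale:", best_scale, "distance:", scaled_distance(best_scale))
```

This only models a stretch along the X axis. A change in the total number of data points would have to be tracked separately, for example as the ratio of the summed counts, since the distance itself works on normalized weights.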
