Given several histograms with the same number of buckets, I need to find their "center". The "center" is a histogram such that the sum of the Earth Mover's Distances (EMD) from it to all the given histograms is minimal.
For example, given 4 histograms A, B, C, D, the algorithm needs to output a new histogram X such that EMD(X, A) + EMD(X, B) + EMD(X, C) + EMD(X, D) is minimal.
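To make the objective concrete, here is a minimal sketch of how the sum could be evaluated. It assumes the histograms are one-dimensional, all have the same total mass, and that moving one unit of mass by one bucket costs 1; under those assumptions the EMD equals the L1 distance between the running sums of the two histograms. The names `emd_1d` and `total_emd` are hypothetical helpers, not part of any library.

```python
from itertools import accumulate


def emd_1d(p, q):
    """EMD between two 1D histograms with equal total mass.

    Assumes moving one unit of mass by one bucket costs 1, in which case
    the EMD is the sum of absolute differences of the running sums.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of buckets")
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def total_emd(x, histograms):
    """The objective to minimize: sum of EMDs from x to every histogram."""
    return sum(emd_1d(x, h) for h in histograms)


if __name__ == "__main__":
    a, b = [1, 0, 0], [0, 0, 1]
    print(emd_1d(a, b))                # one unit moved across two buckets
    print(total_emd([0, 1, 0], [a, b]))
```

With a different ground distance or with unequal total masses this shortcut does not apply, and a general EMD solver would be needed instead.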
A simple arithmetic mean does not find the "center"; here is an example.
I need to calculate the "center" of millions of histograms, so how can I find it efficiently? If no fast algorithm exists, is there a good approximation?
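One possible approach, offered as a sketch under strong assumptions rather than a general answer: if the histograms are one-dimensional, share the same total mass, and the ground distance between buckets i and j is |i - j|, then the EMD between two histograms is the L1 distance between their running sums. The objective then separates per bucket, and the per-bucket median of the running sums minimizes it. That median sequence is still non-decreasing, so it can be differenced back into a histogram. The function name `emd_center_1d` is hypothetical.

```python
from itertools import accumulate
from statistics import median


def emd_center_1d(histograms):
    """Histogram minimizing the summed 1D EMD to all inputs.

    Assumes equal total mass and ground distance |i - j| between buckets.
    """
    # Running sums (CDF-like sequences) of every histogram.
    cdfs = [list(accumulate(h)) for h in histograms]
    # Per-bucket median of the running sums minimizes the summed L1 distance.
    center_cdf = [median(column) for column in zip(*cdfs)]
    # Difference the running sums back into bucket masses.
    return [c - prev for prev, c in zip([0] + center_cdf[:-1], center_cdf)]


if __name__ == "__main__":
    print(emd_center_1d([[1, 0, 0], [0, 0, 1], [0, 1, 0]]))
```

Each bucket needs one median over all histograms, so the cost grows roughly linearly (up to the sort inside `median`) with the number of histograms. For multi-dimensional histograms or other ground distances this decomposition does not hold.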
=== edit ===
Added an example to clarify my problem.
If by "center" you are referring to the median, that would require sorting the data set; in this case the histograms are sorted already. It is understood that the histogram data will likely not be in list form; however, since no alternative is noted, the answer is structured that way.
raw data: list of values
histogram data: list of tuples "((min, max), quantity)"
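As a rough illustration of the two input forms just described, the sketch below finds the median from raw values and locates the bucket that contains the median from histogram tuples. This is one reading of the answer, not its original code; the sample data and the function name `median_bucket` are made up for illustration.

```python
from statistics import median


def median_bucket(histogram):
    """Return the (min, max) range of the bucket holding the (lower) median.

    `histogram` is a list of ((min, max), quantity) tuples, already sorted
    by bucket range.
    """
    total = sum(quantity for _, quantity in histogram)
    if total == 0:
        raise ValueError("histogram is empty")
    running = 0
    for (low, high), quantity in histogram:
        running += quantity
        # The first bucket whose cumulative count reaches half the total.
        if 2 * running >= total:
            return (low, high)


if __name__ == "__main__":
    raw = [1, 2, 2, 3, 7, 8, 9]                   # hypothetical raw data
    histogram = [((0, 5), 4), ((5, 10), 3)]       # the same data, bucketed
    print(median(raw))
    print(median_bucket(histogram))
```

Because the buckets are already in order, no sorting is needed for the histogram form; a single pass over the cumulative counts is enough.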
raw data available:
histogram data:
issue: