I have a distributed array of rankings, of total size N
irregularly distributed amongst NP
processors, from which I need to extract the K
largest elements. In the limit that K << N
, K
is smaller than any of the local buffer lengths, and K
is relatively small in general (such that it can e.g. fit in reasonable MPI buffers), the following algorithm seems to work well
- Perform a local top-K search to determine the largest
K
values in each local array segment - Perform a custom Allreduce which performs binary top-K reductions between buffers of size
K
coming from different processes.
This can be done in a semi-communication-optimal way given the communication patterns underlying MPI_Allreduce
.
I'm unclear how this can be done efficiently without the above assumptions about the size of K
relative to N
and the local buffer sizes. In particular, I'm trying to determine an optimal (or reasonably scaling) algorithm that is compatible with the following:
K
can be larger than some or all of the local buffer dimensionsK
can be so large is to be impractical to communicate entirely (e.g. trying to determine the top billion elements of a 10 billon element array)
The full array nor the top-K
elements need to be sorted on completion.
For arrays which reside on a single processing element, the following questions are related:
Collect top K elements from multiple sorted arrays
Average time complexity of finding top-k elements
Optimal algorithm for returning top k values from an array of length N
The key here is to find the top-
K
repeatedly, where each time you makeK
the largest value satisfying the original assumption.Assuming you'd want to get top-
K'
whereK'
is greater than the local buffer and MPI communication sizes. You could do the following to eventually find top-K'
:K
whereK
is the greatest value that fits inside local buffer and can be communicated with MPI.K
elements to the top arrayA
, and remove them from the local arrays.size(A) == K'
.The resulting array
A
should contain the top-K'
elements.