I have a distributed array of rankings, of total size N irregularly distributed amongst NP processors, from which I need to extract the K largest elements. In the limit that K << N, K is smaller than any of the local buffer lengths, and K is relatively small in general (such that it can e.g. fit in reasonable MPI buffers), the following algorithm seems to work well
- Perform a local top-K search to determine the largest
Kvalues in each local array segment - Perform a custom Allreduce which performs binary top-K reductions between buffers of size
Kcoming from different processes.
This can be done in a semi-communication-optimal way given the communication patterns underlying MPI_Allreduce.
I'm unclear how this can be done efficiently without the above assumptions about the size of K relative to N and the local buffer sizes. In particular, I'm trying to determine an optimal (or reasonably scaling) algorithm that is compatible with the following:
Kcan be larger than some or all of the local buffer dimensionsKcan be so large is to be impractical to communicate entirely (e.g. trying to determine the top billion elements of a 10 billon element array)
The full array nor the top-K elements need to be sorted on completion.
For arrays which reside on a single processing element, the following questions are related:
Collect top K elements from multiple sorted arrays
Average time complexity of finding top-k elements
Optimal algorithm for returning top k values from an array of length N
The key here is to find the top-
Krepeatedly, where each time you makeKthe largest value satisfying the original assumption.Assuming you'd want to get top-
K'whereK'is greater than the local buffer and MPI communication sizes. You could do the following to eventually find top-K':KwhereKis the greatest value that fits inside local buffer and can be communicated with MPI.Kelements to the top arrayA, and remove them from the local arrays.size(A) == K'.The resulting array
Ashould contain the top-K'elements.