Distributed Memory Top-K Algorithm for Large K

119 Views Asked by At

I have a distributed array of rankings, of total size N irregularly distributed amongst NP processors, from which I need to extract the K largest elements. In the limit that K << N, K is smaller than any of the local buffer lengths, and K is relatively small in general (such that it can e.g. fit in reasonable MPI buffers), the following algorithm seems to work well

  1. Perform a local top-K search to determine the largest K values in each local array segment
  2. Perform a custom Allreduce which performs binary top-K reductions between buffers of size K coming from different processes.

This can be done in a semi-communication-optimal way given the communication patterns underlying MPI_Allreduce.

I'm unclear how this can be done efficiently without the above assumptions about the size of K relative to N and the local buffer sizes. In particular, I'm trying to determine an optimal (or reasonably scaling) algorithm that is compatible with the following:

  • K can be larger than some or all of the local buffer dimensions
  • K can be so large is to be impractical to communicate entirely (e.g. trying to determine the top billion elements of a 10 billon element array)

The full array nor the top-K elements need to be sorted on completion.

For arrays which reside on a single processing element, the following questions are related:

Collect top K elements from multiple sorted arrays

Average time complexity of finding top-k elements

Optimal algorithm for returning top k values from an array of length N

1

There are 1 best solutions below

0
On

The key here is to find the top-K repeatedly, where each time you make K the largest value satisfying the original assumption.

Assuming you'd want to get top-K' where K' is greater than the local buffer and MPI communication sizes. You could do the following to eventually find top-K':

  1. Find top-K where K is the greatest value that fits inside local buffer and can be communicated with MPI.
  2. Append the K elements to the top array A, and remove them from the local arrays.
  3. Go back to #1 above until size(A) == K'.

The resulting array A should contain the top-K' elements.