Parallel code for next_permutation()


I am wondering if I can parallelize this code using OpenMP. Will OpenMP make the code run faster? Is there a better way to achieve this?

vector<int> t; // already initialized
do{
    // Something with current permutation

} while(next_permutation(t.begin(), t.end()));

I know I can easily parallelize a for loop, but here I have a do-while loop whose condition depends on the result of the previous iteration.

There are 2 answers below.

Answer 1

Use Finding n-th permutation without computing others to compute the k-th permutation directly, for k = i*n/count with i ranging from 0 to count, where n is the total number of permutations, i is the block index, and count is the number of threads.

This gives you count blocks (intervals) of consecutive permutations. Process each interval in a separate thread: jump to the block's first permutation, then call next_permutation repeatedly until the end of the block, as sketched below.
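
A minimal sketch of that scheme, assuming the elements of t are distinct and there are at most 20 of them (so that n! fits in a 64-bit integer); kth_permutation and for_each_permutation_parallel are illustrative names, not library functions:

    #include <algorithm>
    #include <cstdint>
    #include <vector>
    #include <omp.h>

    // Build the k-th (0-based, lexicographic) permutation of a sorted vector
    // using the factorial number system.
    std::vector<int> kth_permutation(std::vector<int> sorted, std::uint64_t k) {
      const std::size_t n = sorted.size();
      std::vector<std::uint64_t> factorial(n, 1);
      for (std::size_t i = 1; i < n; ++i) factorial[i] = factorial[i - 1] * i;

      std::vector<int> result;
      result.reserve(n);
      for (std::size_t i = 0; i < n; ++i) {
        const std::uint64_t f = factorial[n - 1 - i];
        const std::size_t idx = static_cast<std::size_t>(k / f);
        result.push_back(sorted[idx]);
        sorted.erase(sorted.begin() + static_cast<std::ptrdiff_t>(idx));
        k %= f;
      }
      return result;
    }

    void for_each_permutation_parallel(const std::vector<int>& t) {
      std::vector<int> sorted = t;
      std::sort(sorted.begin(), sorted.end());

      std::uint64_t total = 1;                      // n! permutations in all
      for (std::size_t i = 2; i <= sorted.size(); ++i) total *= i;

      const int count = omp_get_max_threads();      // one block per thread

      #pragma omp parallel for schedule(static)
      for (int i = 0; i < count; ++i) {
        const std::uint64_t begin = total / count * i;
        const std::uint64_t end = (i == count - 1) ? total : total / count * (i + 1);
        // Jump straight to the first permutation of this block...
        std::vector<int> perm = kth_permutation(sorted, begin);
        // ...then walk through the block with next_permutation, as in the serial code.
        for (std::uint64_t k = begin; k < end; ++k) {
          // do something with perm
          std::next_permutation(perm.begin(), perm.end());
        }
      }
    }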

Answer 2
  1. next_permutation produces permutations in lexicographical order, which means that the prefixes of the permutations it produces are also in lexicographical order. In other words, you could do a very crude parallelization by separately processing each possible initial element:

    // Assume that v is sorted (or sort it first)
    // This `for` loop should be parallelized
    for (std::size_t i = 0, n = v.size(); i < n; ++i) {
      // Make a copy of v with the element at index i rotated to the beginning
      auto vprime = v;
      std::rotate(vprime.begin(), vprime.begin() + i, vprime.begin() + i + 1);
      // The above guarantees that vprime[1:] is still sorted.
      // Since vprime[0] is fixed, we only need to permute vprime[1:]
      do {
        // do something with vprime
      } while (std::next_permutation(vprime.begin() + 1, vprime.end()));
    }
    

    The above assumes that the processing time for each permutation is roughly the same. If permutations starting with one initial element take longer to process than permutations starting with another, some threads will terminate before others, reducing the effectiveness of the parallelization. You could make the chunks smaller by using prefixes longer than one element. An OpenMP sketch of the loop follows.
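
    This is only a sketch of how OpenMP might be applied to that loop; it assumes the work done on each permutation is thread-safe, and schedule(dynamic) is used so that uneven per-prefix workloads are balanced across threads:

    #pragma omp parallel for schedule(dynamic)
    for (long long i = 0; i < static_cast<long long>(v.size()); ++i) {
      auto vprime = v;
      std::rotate(vprime.begin(), vprime.begin() + i, vprime.begin() + i + 1);
      do {
        // do something with vprime (thread-safe work only)
      } while (std::next_permutation(vprime.begin() + 1, vprime.end()));
    }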

  2. It appears that what you really want is to produce all permutations of sets of k elements drawn from a vector of n elements. There are n!/(n-k)! such permutations, which rapidly becomes a very large number. For example, if n is 15 and k is 10, there are 15!/5! = 10,897,286,400 permutations. Processing all of them will take a while, even if each one is processed quite quickly. So you are correct to seek a method of working in parallel.
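
    As a quick sanity check of that figure, n!/(n-k)! is just the falling factorial n*(n-1)*...*(n-k+1); a throwaway computation (illustrative only):

    #include <cstdint>
    #include <iostream>

    int main() {
      const int n = 15, k = 10;
      std::uint64_t count = 1;
      for (int i = 0; i < k; ++i) count *= static_cast<std::uint64_t>(n - i);
      std::cout << count << '\n';   // prints 10897286400
    }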

  3. To find all the permutations of the k-combinations, running next_permutation over each possible combination is a reasonable approach, provided you have a library function that produces the combinations. But be aware that many implementations of next_combination are optimized for ease of use rather than performance. Performing next_combination efficiently in a loop requires persistent state, which can radically reduce the cost of finding the next combination. The overall loop structure is sketched below.
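
    For illustration only, the nested loop might look like this. It assumes an N2639-style next_combination(first, middle, last) is available from a combinations library; the declaration below is hypothetical, not part of the standard library:

    #include <algorithm>

    // Hypothetical: supplied by a combinations library (e.g. an N2639-style
    // implementation); not part of the standard library.
    template<typename BidiIt>
    bool next_combination(BidiIt first, BidiIt middle, BidiIt last);

    // Visit every permutation of every combination [first, middle) drawn from
    // [first, last). Assumes [first, last) is sorted to start with.
    template<typename BidiIt, typename Callback>
    void for_each_k_permutation(BidiIt first, BidiIt middle, BidiIt last, Callback cb) {
      do {
        do {
          cb(first, middle);   // current k-permutation is [first, middle)
        } while (std::next_permutation(first, middle));
        // next_permutation has left [first, middle) sorted again, which is
        // typically the state a next_combination implementation expects.
      } while (next_combination(first, middle, last));
    }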

    Another approach is to use an implementation of next_partial_permutation, which directly produces the next permutation of k out of n elements. A simple solution is based on next_permutation, but this is also suboptimal because of the additional call to std::reverse. (It's worthwhile thinking about why this algorithm works. One hint: if you reverse the lexicographically first permutation of a sequence, you get the lexicographically last permutation.) (Code adapted from N2639)

    template<typename BidiIt>
    bool next_partial_permutation(BidiIt first, BidiIt middle, BidiIt last) {
      std::reverse(middle, last);
      return std::next_permutation(first, last);
    }
    

    Regardless of how you compute the partial permutations, you can parallelize the algorithm using the same approach as indicated above: Chunk the permutations by prefix (or initial element, if processing time doesn't vary) and do the chunks in parallel.