How to optimize this Radix-Sort algorithm in C++?

230 Views Asked by At

I am working with this assignment of optimizing a radix sort code in C++ and I need to reduce the execution time, my code is working and it looks like this:

void RadixSort::RadixSortNaive(std::vector<long> &Arr) {

long Max_Value = findMax(Arr);

    int Max_Radix = 1;
while (1) {
  if (Max_Radix >= Max_Value) break;
  Max_Radix = Max_Radix*radix_;
}

for (int i = 1; i < Max_Radix; i = i*radix_) {
  for (int j = 0; j < key_length_; j++) {
    int K;
    if (Arr[j] < i) K = 0;
    else K = (Arr[j] / i) % radix_;
    Q[K].push(Arr[j]);
  }

  int idx = 0;
  for (int j = 0; j < radix_; j++) {
    while (Q[j].empty() == 0) {
      Arr[idx] = Q[j].front();
      Q[j].pop();
      idx++;
    }
  }
}
class RadixSort{
public :

  void setConfig(int key_length, int radix) {
    key_length_ = key_length;
    radix_ = radix;
    for (int i = 0; i < radix_; i++) {
      Q.push_back(std::queue<long>());
    }
  }

  long findMax(std::vector<long> Arr) const {
    long Max = 0;
    for (int i = 0; i < key_length_; i++) {
      if (Max < Arr[i])
        Max = Arr[i];
    }
    return Max;
  }

  void RadixSortNaive(std::vector<long> &Arr);
  void RadixSortStudent(std::vector<long> &Arr);

private:
  int key_length_;
  int radix_;
  std::vector<std::queue<long>> Q;
};
}

However, I am sure that there is still room for improvement. I have been trying to implement parallelization with OMP library but nothings seems to work. Is there any way where I can improve the previous code? Maybe improving the loops or any other code optimization technique.

1

There are 1 best solutions below

0
Aki Suihkonen On

As suggested in the comments, first thing is to get the API right.

findMax can be replaced by std::max_element( ), which uses iterators, and doesn't make a copy of the input.

Other suspicious thing is Q[K].push(Arr[j]);. If memory so permits, at least reserve the maximum amount of elements in each queue -- otherwise the queues need to copy old data when resizing.

Then if possible, using raw pointers with no out of range check, you can push() and pop() with auto popped = *tail++ and *head++ = new_element; My observation is that while STL is correctly implemented and is fast to develop with, the support of dynamic memory allocation in insertions practically always degrades performance compared to known static allocations.

Third thing is to specialise the radix for powers of two, since now the division is strength reduced to shift, and the modulus is strength reduced to logical and (by some constants, which need to be calculated).

Especially when radix is a power of two, and possibly otherwise too, I guess it's not useful to calculate K==0 conditionally: if (Arr[j] < i) K = 0;.