I did not find this specific topic anywhere...
I am calling the nth_element() algorithm about 400,000 times per second on different data in a std::vector of 23 integers, more precisely "unsigned short" values.
I want to improve computation speed, and this particular call takes a significant portion of CPU time. I noticed, as with std::sort(), that the nth_element function is visible in the profiler even at the highest optimisation level and with NDEBUG defined (Linux, Clang compiler), so the comparison is inlined but the function call itself is not. To be more precise: it is not nth_element() that is visible but std::__introselect().
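For reference, the hot call is essentially the following (a minimal sketch, assuming the goal is the median of the 23 values; the function name is illustrative):

#include <algorithm>
#include <vector>

unsigned short median23(std::vector<unsigned short>& v) // v.size() == 23
{
    // Partially reorders v so that v[11] holds the 12th-smallest value.
    std::nth_element(v.begin(), v.begin() + 11, v.end()); // 11 = middle of 23
    return v[11];
}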
Since the size of the data is small, I experimented with a quadratic sorting function, PIKSORT, which is often quicker than calling std::sort when the data contains fewer than about 20 elements, probably because the function can be fully inlined.
template <class CONTAINER>
inline void piksort(CONTAINER& arr) // indeed this is "insertion sort"
{
    typename CONTAINER::value_type a;
    const int n = (int)arr.size();
    for (int j = 1; j < n; ++j) {          // grow the sorted prefix by one element
        a = arr[j];
        int i = j;
        while (i > 0 && a < arr[i - 1]) {  // shift larger elements one slot right
            arr[i] = arr[i - 1];
            i--;
        }
        arr[i] = a;                        // insert a at its final position
    }
}
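Used roughly like this (a sketch, assuming as above that the median of the 23 values is wanted):

std::vector<unsigned short> v(23);
// ... fill v with the current data ...
piksort(v);                    // full insertion sort of all 23 values
unsigned short median = v[11]; // middle element of the sorted range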
However, this was slower than using nth_element in this case.
Also, using a statistical method is not appropriate in my case.
Finally, since the values are in the range from 0 to about 20000, a histogram method does not look appropriate.
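To make that concrete, a counting select would look roughly like the sketch below (my illustration, not code I actually use): clearing and scanning up to 20001 bins per call dwarfs the work on only 23 values.

#include <algorithm>
#include <vector>

unsigned short histogram_median23(const std::vector<unsigned short>& v)
{
    static unsigned counts[20001];           // one bin per possible value
    std::fill(counts, counts + 20001, 0u);   // clearing 20001 bins per call
    for (unsigned short x : v) ++counts[x];  // only 23 increments
    unsigned seen = 0;
    for (unsigned b = 0; b <= 20000; ++b) {  // worst case scans every bin
        seen += counts[b];
        if (seen > 11)                       // rank 11 (0-based) = median of 23
            return (unsigned short)b;
    }
    return 0;                                // not reached for 23 inputs
}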
My question: does anyone know a simple solution to this? I am probably not the only one who has to call std::sort or nth_element very frequently.
I found this problem interesting, so I tried all the algorithms I could think of.
Here are the results:
I initially thought we could conclude that you were already using the fastest algorithm. Boy, was I wrong. However, if you can accept an approximate answer, there are probably faster ways, such as median of medians. If you are interested, the source is here.
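For illustration, an approximate median of medians over the 23 values could look like this (my own sketch, not the code used for the measurements; it returns the median of five group medians, which is only close to, not always equal to, the true median):

#include <algorithm>

unsigned short median5(unsigned short* p) // exact median of 5 values
{
    std::sort(p, p + 5); // tiny fixed-size sort
    return p[2];
}

unsigned short approx_median23(unsigned short* v) // v points at 23 values; reorders them
{
    unsigned short m[5];
    for (int g = 0; g < 4; ++g)
        m[g] = median5(v + 5 * g); // medians of four groups of 5
    std::sort(v + 20, v + 23);     // last group has only 3 values
    m[4] = v[21];                  // its median
    std::sort(m, m + 5);
    return m[2];                   // median of the 5 group medians
}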