To avoid the O(n^2) worst-case scenario for quickselect, I am aware of two options:
1. Randomly choose a pivot index.
2. Use median of medians (MoM) to select an approximate median and pivot around that.
When using MoM (2) with quickselect, we can guarantee a worst case of O(n). When using (1), we can't guarantee a worst case of O(n), but the probability of the algorithm degrading to O(n^2) should be extremely small. On the other hand, the overhead of (2) is much higher than that of (1), which adds little to no extra work per step.
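To make (2) concrete, here is a rough sketch of a median-of-medians-based select in Python. The name `mom_select` and the list-based partitioning are just for illustration, not from any particular library:

```python
def mom_select(arr, k):
    """Return the k-th smallest element (0-indexed) of arr.
    The pivot is chosen with the median-of-medians rule, giving O(n) worst case."""
    if len(arr) <= 5:
        return sorted(arr)[k]

    # Median of each group of 5, then recursively select the median of those medians.
    groups = [arr[i:i + 5] for i in range(0, len(arr), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    pivot = mom_select(medians, len(medians) // 2)

    # Partition around the pivot and recurse only into the side containing index k.
    lo = [x for x in arr if x < pivot]
    hi = [x for x in arr if x > pivot]
    num_eq = len(arr) - len(lo) - len(hi)

    if k < len(lo):
        return mom_select(lo, k)
    elif k < len(lo) + num_eq:
        return pivot
    else:
        return mom_select(hi, k - len(lo) - num_eq)
```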
So when should we use one over the other?
As you've noted, the median-of-medians approach is slower than quickselect, but has a better worst-case runtime. Assuming quickselect is truly using a random choice of pivot at each step, you can prove not only that the expected runtime is O(n), but also that the probability its runtime exceeds Θ(n log n) is very, very small (at most 1 / n^k for any choice of constant k). So in that sense, if you have the ability to select pivots at random, quickselect will likely be faster.
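For concreteness, here's a minimal sketch of randomized quickselect; the function name and the list-based partitioning are mine rather than any particular library's implementation:

```python
import random

def quickselect(arr, k):
    """Return the k-th smallest element (0-indexed) of arr.
    Random pivots give expected O(n) time, but a worst case of O(n^2)."""
    if len(arr) == 1:
        return arr[0]

    pivot = random.choice(arr)  # uniformly random pivot at every step

    lo = [x for x in arr if x < pivot]
    hi = [x for x in arr if x > pivot]
    num_eq = len(arr) - len(lo) - len(hi)

    if k < len(lo):
        return quickselect(lo, k)
    elif k < len(lo) + num_eq:
        return pivot
    else:
        return quickselect(hi, k - len(lo) - num_eq)
```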
However, not all implementations of quickselect use true randomness for the pivots, and some use deterministic pivot selection algorithms. This, unfortunately, can lead to pathological inputs that trigger the Θ(n^2) worst-case runtime, which is a problem if you have adversarially chosen inputs.
One nice compromise between the two is introselect. The basic idea behind introselect is to run quickselect with a fast pivot selection rule of your choice, while keeping track of how many times it has picked a pivot without throwing away at least 30% of the input array. If that number exceeds some threshold, it stops using the fast pivot rule and switches to the median-of-medians approach to select a good pivot, forcing a 30% size reduction.

This means that in the common case, when quickselect rapidly reduces the input size, introselect is basically identical to quickselect with a tiny bookkeeping overhead. However, in cases where quickselect would degrade to quadratic, introselect notices, stops, and switches to the worst-case-efficient median-of-medians approach, ensuring the worst-case runtime is O(n). This gives you, essentially, the best of both worlds: it's fast on average, and its worst case is never worse than O(n).
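Here is one way the policy described above might look in code. This is a rough sketch, not a reference implementation: the fast pivot rule here is a random pivot, and the threshold of three bad rounds and the 70% progress test are arbitrary illustrative choices.

```python
import random

def introselect(arr, k, max_bad_rounds=3):
    """Return the k-th smallest element (0-indexed) of arr.
    Uses cheap random pivots until too many rounds fail to shrink the
    problem by ~30%, then switches to median-of-medians pivots."""
    bad_rounds = 0
    while True:
        if len(arr) <= 5:
            return sorted(arr)[k]

        if bad_rounds < max_bad_rounds:
            pivot = random.choice(arr)  # cheap pivot for the common case
        else:
            # Progress has stalled: pay for a guaranteed-good pivot by taking
            # the median of the group-of-5 medians (recursing on ~n/5 items).
            groups = [arr[i:i + 5] for i in range(0, len(arr), 5)]
            medians = [sorted(g)[len(g) // 2] for g in groups]
            pivot = introselect(medians, len(medians) // 2)

        lo = [x for x in arr if x < pivot]
        hi = [x for x in arr if x > pivot]
        num_eq = len(arr) - len(lo) - len(hi)

        if k < len(lo):
            rest = lo
        elif k < len(lo) + num_eq:
            return pivot
        else:
            k -= len(lo) + num_eq
            rest = hi

        # Count rounds that failed to discard at least ~30% of the array.
        if len(rest) > 0.7 * len(arr):
            bad_rounds += 1
        arr = rest
```

In practice you'd partition in place rather than building new lists, but the control flow (count rounds with poor progress, then fall back to the guaranteed-good pivot) is the part that matters here.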