I have two alternative code implementations for calculating the largest power of 2 smaller than a given 64-bit unsigned integer (uint64_t):
Implementation 1:
#include <iostream>
#include <cstdint>
uint64_t floor_power_of_2(uint64_t x)
{
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
x |= (x >> 32);
return x - (x >> 1);
}
Implementation 2 (which I prefer, since is a lot faster):
#include <iostream>
#include <cstdint>
uint64_t floor_power_of_2(uint64_t x)
{
return 1ULL << (63 - __builtin_clzll(x));//63-count of leading zeros
}
My questions are:
- Portability of
__builtin_clzll(): How portable is Implementation 2 across different compilers, operating systems, and CPU architectures? Does it rely on specific CPU instructions that might not be available everywhere? - Detecting availability and fallback: Is it possible to detect at
runtime whether
__builtin_clzll()is available, so that the program can use Implementation 1 as a fallback if Implementation 2 is not supported? - Best practice for portability: What's the recommended approach to ensure portability while maintaining fast performance for this calculation?
I'm currently using the GCC compiler, but I've also tested on Compiler Explorer with Clang (which can compile it, and seems to do it better than gcc).
I'm interested in ensuring compatibility with a wide range of platforms.
I'd appreciate any insights or guidance on these questions.
__builtin_clzllis a GCC built-in. Compilers derived from GCC and compilers that try to be compatible with GCC (such as Clang) will support it, but other compilers will not or may have their own alternative built-in.There is no standard way to detect built-ins. They are completely up to the implementation (as everything containing double underscores). You would need to check which compiler you are compiling under. This task may be better suited to the build system, but you could use the
__GNUC__macro which is defined by both GCC and Clang.In C++20 a standardization of
__builtin_clzllis already present in the standard library asstd::countl_zeroand I would expect the standard library implementer to carefully implement it in the most efficient way for every platform and compiler they support.Also, the standard library implementation works correctly on all unsigned integer types automatically and they behave predictably with input
0which your second implementation doesn't, because according to GCC documentation it has undefined behavior for input0. Consider what the output should be for0.There is also
std::bit_floorwhich is the function that you want to implement. So you can also use that directly with defined behavior at0.