I am trying to implement a portable and efficient Hamming distance function for ORB features in C++17. Ideally, it should use SIMD automatically when compiled for both x86 and ARM.
Problem with std::bitset
std::bitset<N> provides a standard way to perform bitwise operations and count set bits, and it is preferable to the compiler-specific __builtin_popcount. However, it is a container type that owns its data. It is also not easy to construct one from the 256-bit descriptor that cv::ORB::detectAndCompute stores in a cv::Mat.
This thread asks how to convert a cv::Mat to std::bitset<256>. I don't think the memcpy in its answer is correct, because the memory layout of std::bitset is unspecified, and https://en.cppreference.com/w/cpp/utility/bitset does not document one. Moreover, the integer constructor of std::bitset can initialize at most as many bits as an unsigned long long holds (typically 64).
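The following sketch shows a conversion that avoids memcpy and relies only on documented std::bitset operations: the integer constructor, shift, and OR. It assumes a hypothetical bit order in which bit b of byte i maps to bitset position 8*i + b. The name to_bitset is also hypothetical. The data is still copied, which is exactly the cost a non-owning view would avoid.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Copy a 32-byte (256-bit) descriptor into std::bitset<256> using only
// documented bitset operations, so nothing depends on bitset's internal layout.
// Assumed (hypothetical) bit order: bit b of byte i -> position 8*i + b.
std::bitset<256> to_bitset(const std::uint8_t* bytes) {
    std::bitset<256> bits;
    for (std::size_t i = 0; i < 32; ++i) {
        // The unsigned long long constructor fills only the low 8 bits here;
        // shift them into place and merge.
        bits |= std::bitset<256>(bytes[i]) << (8 * i);
    }
    return bits;
}
```

With two converted descriptors, the Hamming distance is then `(to_bitset(a) ^ to_bitset(b)).count()`.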
Problem with my implementation
To avoid copying, my current implementation uses a span-like class:
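#include <cstdint>

// Non-owning view of one 256-bit ORB descriptor (one row of the cv::Mat).
struct ORB_view {
    // Number of 32-bit words: 256 bits / (4 bytes * 8 bits per byte) = 8.
    inline static constexpr int length = 256 / sizeof(std::uint32_t) / 8;
    const std::uint32_t* data;
};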
However, this type does not support bitwise operations or popcount directly, so my implementation ends up with explicit SIMD intrinsics. I would like to avoid them by using std::bitset<N>::count() instead.
What I expect
What I need is a non-owning reference type that provides bitwise operations and bit counting. The desired features are:
- no data copy,
- a fixed length, as with std::bitset,
- bitwise XOR and popcount,
- no explicit SIMD code for ARM or x86.
As far as I know, the standard library does not provide such a type.
Firstly, if performance is important, you should stay away from std::bitset for most purposes. Several of its member functions throw exceptions on invalid input. For example, test() and set() throw std::out_of_range for an out-of-range position, and those runtime checks are suboptimal for high-performance applications.
Secondly, if you were able to use C++20, you could use std::popcount. Prior to C++20, you can use __builtin_popcount on GCC and Clang. You have pointed out portability issues, but those can be overcome with conditional compilation (see the live example at Compiler Explorer).
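The following is a minimal sketch of such a conditional-compilation wrapper. It is not the bitmanip library's code, and the name popcount32 is hypothetical. It assumes that MSVC's __popcnt is used only on x86/x64, where it also requires a CPU with the POPCNT instruction. Any other compiler falls back to a portable bit-twiddling (SWAR) count.

```cpp
#include <cstdint>
#if __cplusplus >= 202002L
#include <bit>
#endif
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Portable 32-bit popcount chosen at compile time.
inline int popcount32(std::uint32_t x) noexcept {
#if __cplusplus >= 202002L
    return std::popcount(x);                    // C++20 standard function
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_popcount(x);               // GCC / Clang builtin
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    return static_cast<int>(__popcnt(x));       // MSVC intrinsic (needs POPCNT)
#else
    // SWAR fallback: sum bits in 2-, 4-, then 8-bit groups, then add the bytes.
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return static_cast<int>((x * 0x01010101u) >> 24);
#endif
}
```

MSVC reports an old __cplusplus value unless /Zc:__cplusplus is set, so under MSVC this sketch normally takes the intrinsic branch even in C++20 mode.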
With this wrapper (taken from the bitmanip library), you can call popcount safely with GCC, MSVC, and Clang. The Android NDK toolchain is based on LLVM, so it should work there as well.
Building on that, it is easy to write a Hamming distance wrapper over your ORB_view, like:
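The following sketch shows such a wrapper. It assumes the ORB_view from the question and uses std::bitset<32>::count() as a standard, builtin-free popcount on each XORed word, so the library chooses how to count bits. The name hamming_distance is hypothetical. A compiler-specific popcount helper could replace the bitset call without changing the loop.

```cpp
#include <bitset>
#include <cstdint>

// The question's non-owning view over one 256-bit ORB descriptor.
struct ORB_view {
    inline static constexpr int length = 256 / sizeof(std::uint32_t) / 8; // 8 words
    const std::uint32_t* data;
};

// Hamming distance: XOR corresponding words, then count the set bits.
// No compiler builtins or explicit SIMD appear in the source.
inline int hamming_distance(ORB_view a, ORB_view b) noexcept {
    int distance = 0;
    for (int i = 0; i < ORB_view::length; ++i) {
        distance += static_cast<int>(std::bitset<32>(a.data[i] ^ b.data[i]).count());
    }
    return distance;
}
```

Each ORB_view must point to at least ORB_view::length words. The function only reads through the pointers and never copies the descriptor.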
This could, of course, also be a member function of your ORB_view. There is no explicit SIMD code, although Clang auto-vectorizes it.
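The member-function variant could be generalized to any fixed length, which also covers the "fixed length, as with std::bitset" requirement. The following sketch assumes N is a multiple of 32 and that data points to N/32 words owned elsewhere. BitsView and distance_to are hypothetical names, and ORB_view is redefined here as an alias for the 256-bit case.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical non-owning, fixed-length view over N bits stored as 32-bit words.
template <std::size_t N>
struct BitsView {
    static_assert(N % 32 == 0, "N must be a multiple of 32");
    static constexpr std::size_t words = N / 32;
    const std::uint32_t* data;  // not owned; must point to `words` words

    // Number of bits that differ between *this and other (Hamming distance).
    std::size_t distance_to(BitsView other) const noexcept {
        std::size_t d = 0;
        for (std::size_t i = 0; i < words; ++i) {
            d += std::bitset<32>(data[i] ^ other.data[i]).count();
        }
        return d;
    }
};

using ORB_view = BitsView<256>;  // 256-bit ORB descriptor
```

Because the length is a template parameter, the loop bound is a compile-time constant, which gives the optimizer a fixed trip count to work with.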