I am using the BRIEF descriptor in OpenCV in Visual C++ 2010 to match points in two images.
The paper on the BRIEF descriptor says that matching can be sped up:
"The BRIEF descriptor uses hamming distance, which can be done extremely fast on modern CPUs that often provide a specific instruction to perform a XOR or bit count operation, as is the case in the latest SSE instruction set."
So with SSE4.2 enabled, matching should be faster. My question is simply: how do I enable this in Visual C++?
An alternative could be to use another compiler that supports SSE4, for instance Intel's ICC. Is this really necessary?
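For reference, the Hamming distance in the quoted passage is just the number of bits that differ between two binary descriptors: XOR them, then count the 1 bits. The following is a minimal portable sketch, not OpenCV's own implementation. It assumes each descriptor is stored as a vector of bytes, and the name hamming_bytes is made up for illustration. Because it uses std::bitset::count, it needs no special instruction set.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable baseline (hypothetical helper): Hamming distance between two
// binary descriptors stored as bytes = popcount(a XOR b), summed per byte.
int hamming_bytes(const std::vector<std::uint8_t>& a,
                  const std::vector<std::uint8_t>& b)
{
    assert(a.size() == b.size());
    int dist = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // XOR leaves a 1 exactly where the two descriptors disagree...
        std::uint8_t diff = static_cast<std::uint8_t>(a[i] ^ b[i]);
        // ...and counting those 1 bits gives this byte's contribution.
        dist += static_cast<int>(std::bitset<8>(diff).count());
    }
    return dist;
}
```

For two 32-byte descriptors, for example, the result lies between 0 and 256. The faster variants below differ only in how the bit count is done.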
The MSVC compiler has an /arch option for specifying the minimum architecture you want your program to target. Setting it to, for example, /arch:SSE2 tells the compiler to assume that the CPU supports the SSE2 instructions, and it will automatically use them whenever the optimizer determines it's appropriate.

However, MSVC has no /arch:SSE4 or /arch:SSE42 option. A peek into the standard library implementation suggests that /arch:AVX or /arch:AVX2 also implies SSE4.2. For example, the MSVC implementation of the C++20 library function std::popcount does a runtime check of the processor to see if it can use the SSE4.2 popcnt instruction. But if you target AVX, it skips the runtime check and just assumes the processor supports it.

I think gcc and clang do have specific options for enabling SSE4 and SSE4.2. Update: Peter Cordes confirms in the comments: "To enable popcnt specifically, -mpopcnt, or for SSE4.2 -msse4.2 which implies popcnt."
You can also use intrinsic functions for built-in instructions if you don't want to rely on the optimizer and the library implementation to find the optimal instructions.
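As a rough illustration of the intrinsic route, here is a sketch using _mm_popcnt_u32, the Intel intrinsic for the popcnt instruction. Running popcnt on a CPU that lacks it raises an illegal-instruction fault, so the sketch checks CPUID once at runtime (MSVC's __cpuid, or gcc/clang's __builtin_cpu_supports) and otherwise falls back to a portable loop. The helper names, the 32-bit word size and the fallback are choices made for this example, not part of any library. On gcc/clang, the target("popcnt") attribute enables the instruction for that one function without passing -mpopcnt for the whole file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#  include <intrin.h>      // __cpuid
#  include <nmmintrin.h>   // _mm_popcnt_u32
#  define HAVE_X86_POPCNT 1
#  define POPCNT_TARGET
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#  include <x86intrin.h>   // _mm_popcnt_u32
#  define HAVE_X86_POPCNT 1
// Enable popcnt for this function only, not the whole translation unit.
#  define POPCNT_TARGET __attribute__((target("popcnt")))
#else
#  define HAVE_X86_POPCNT 0
#endif

// Portable fallback: clear the lowest set bit until none remain.
static int popcount_portable(std::uint32_t v)
{
    int c = 0;
    while (v) { v &= v - 1; ++c; }
    return c;
}

static int hamming_portable(const std::uint8_t* a, const std::uint8_t* b, std::size_t n)
{
    int dist = 0;
    for (std::size_t i = 0; i < n; i += 4) {
        std::uint32_t wa, wb;
        std::memcpy(&wa, a + i, 4);
        std::memcpy(&wb, b + i, 4);
        dist += popcount_portable(wa ^ wb);
    }
    return dist;
}

#if HAVE_X86_POPCNT
POPCNT_TARGET
static int hamming_popcnt(const std::uint8_t* a, const std::uint8_t* b, std::size_t n)
{
    int dist = 0;
    for (std::size_t i = 0; i < n; i += 4) {
        std::uint32_t wa, wb;
        std::memcpy(&wa, a + i, 4);
        std::memcpy(&wb, b + i, 4);
        dist += _mm_popcnt_u32(wa ^ wb);   // compiles to the popcnt instruction
    }
    return dist;
}

static bool cpu_has_popcnt()
{
#  if defined(_MSC_VER)
    int info[4];
    __cpuid(info, 1);
    return (info[2] & (1 << 23)) != 0;    // CPUID leaf 1, ECX bit 23 = POPCNT
#  else
    return __builtin_cpu_supports("popcnt") != 0;
#  endif
}
#endif

// Hypothetical entry point: uses popcnt if the CPU has it, else the fallback.
// Assumes n is a multiple of 4.
int hamming_dispatch(const std::uint8_t* a, const std::uint8_t* b, std::size_t n)
{
    assert(n % 4 == 0);
#if HAVE_X86_POPCNT
    static const bool has_popcnt = cpu_has_popcnt();   // checked once
    if (has_popcnt)
        return hamming_popcnt(a, b, n);
#endif
    return hamming_portable(a, b, n);
}
```

If the whole program is already built with /arch:AVX (or -mpopcnt), the CPUID check adds nothing and the popcnt path could be called directly.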