Why is memcmp(a, b, size)
so much faster than:
for (i = 0; i < nelements; i++) {
    if (a[i] != b[i]) return 0;
}
return 1;
Is memcmp a CPU instruction or something? It must be pretty deep because I got a massive speedup using memcmp
over the loop.
memcmp is often implemented in assembly to take advantage of a number of architecture-specific features, which can make it much faster than a simple loop in C.

As a "builtin"
GCC supports memcmp (as well as a ton of other functions) as builtins. In some versions / configurations of GCC, a call to memcmp will be recognized as __builtin_memcmp. Instead of emitting a call to the memcmp library function, GCC will emit a handful of instructions to act as an optimized inline version of the function.

On x86, this leverages the cmpsb instruction, which compares a string of bytes at one memory location to another. This is coupled with the repe prefix, so the strings are compared until they are no longer equal, or a count is exhausted. (Exactly what memcmp does.)

Given the following code:
gcc version 3.4.4 on Cygwin generates the following assembly:

Reference: cmpsb instruction

As a library function
Highly-optimized versions of memcmp exist in many C standard libraries. These will usually take advantage of architecture-specific instructions to work with lots of data in parallel.

In Glibc, there are versions of memcmp for x86_64 that can take advantage of the following instruction set extensions:

SSE2 - sysdeps/x86_64/memcmp.S
SSE4 - sysdeps/x86_64/multiarch/memcmp-sse4.S
SSSE3 - sysdeps/x86_64/multiarch/memcmp-ssse3.S
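
To give a rough feel for what "working with lots of data in parallel" means, here is a minimal sketch in portable C that compares 8 bytes per iteration instead of one. It is not how glibc does it (the real versions are hand-written assembly using wide vector registers), and the name equal_wordwise is made up for this illustration. It only answers "equal or not", like the loop in the question, rather than returning the ordering that memcmp provides.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only:
   returns 1 if the first n bytes of a and b are equal, 0 otherwise. */
static int equal_wordwise(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

    /* Compare 8 bytes per iteration. Copying into locals with memcpy
       sidesteps alignment and aliasing problems; compilers typically
       turn each copy into a single load. */
    while (n >= sizeof(uint64_t)) {
        uint64_t wa, wb;
        memcpy(&wa, pa, sizeof wa);
        memcpy(&wb, pb, sizeof wb);
        if (wa != wb)
            return 0;
        pa += sizeof wa;
        pb += sizeof wb;
        n -= sizeof wa;
    }

    /* Finish whatever is left one byte at a time. */
    while (n--) {
        if (*pa++ != *pb++)
            return 0;
    }
    return 1;
}
```

A call such as equal_wordwise(a, b, size) should agree with memcmp(a, b, size) == 0. The SSE variants apply the same idea with 16-byte registers, which is part of why they outrun a byte-at-a-time loop.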
The cool part is that glibc will detect (at run-time) the newest instruction set your CPU has, and execute the version optimized for it. See this snippet from sysdeps/x86_64/multiarch/memcmp.S:

In the Linux kernel
Linux does not seem to have an optimized version of memcmp for x86_64, but it does for memcpy, in arch/x86/lib/memcpy_64.S. Note that it uses the alternatives infrastructure (arch/x86/kernel/alternative.c) not only to decide at runtime which version to use, but to actually patch itself so that this decision is made only once, at boot-up.
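
The "pick the best version at run-time, and only decide once" idea can be sketched in user-space C as well. The snippet below is only an illustration under stated assumptions: cmp_generic, cmp_sse41, select_cmp and my_memcmp are hypothetical names, and both "implementations" simply forward to memcmp. It uses GCC's __builtin_cpu_supports on x86 and a cached function pointer, which is a much simpler mechanism than what glibc or the kernel's alternatives patching actually use.

```c
#include <stddef.h>
#include <string.h>

typedef int (*cmp_fn)(const void *, const void *, size_t);

/* Hypothetical stand-ins for per-instruction-set implementations.
   Both just forward to memcmp here; a real library would supply
   hand-tuned code for each. */
static int cmp_generic(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n);
}

static int cmp_sse41(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n);
}

/* Ask the CPU what it supports and pick an implementation. */
static cmp_fn select_cmp(void)
{
    cmp_fn chosen = cmp_generic;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.1"))
        chosen = cmp_sse41;
#else
    (void)cmp_sse41; /* not selectable on this target */
#endif
    return chosen;
}

/* Cached choice: the CPU is queried on the first call only. */
static cmp_fn resolved_cmp;

static int my_memcmp(const void *a, const void *b, size_t n)
{
    if (!resolved_cmp)
        resolved_cmp = select_cmp();
    return resolved_cmp(a, b, n);
}
```

Every call still pays for one pointer check and an indirect call here. Patching the code at boot, as the kernel does, removes even that overhead.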