So I was browsing the Quake engine source code earlier today and stumbled upon some hand-written utility functions. One of them was 'Q_memcpy':
void Q_memcpy (void *dest, void *src, int count)
{
    int i;

    if (( ( (long)dest | (long)src | count) & 3) == 0 )
    {
        count>>=2;
        for (i=0 ; i<count ; i++)
            ((int *)dest)[i] = ((int *)src)[i];
    }
    else
        for (i=0 ; i<count ; i++)
            ((byte *)dest)[i] = ((byte *)src)[i];
}
I understand the whole premise of the function, but I don't quite understand the reason for the bitwise OR between the source and destination addresses. So my questions are as follows:
- Why does 'count' get used in the same bitwise arithmetic?
- Why are the last two bits of that result checked?
- What purpose does this whole check serve?
I'm sure it's something obvious but please excuse my ignorance because I haven't really delved into the more low level side of things when it comes to programming. I just find it interesting and want to learn more.
It first tests if all 3 arguments are divisible by 4. If - and only if - they all are, it proceeds with copying 4 bytes at a time.
I.e. written out without the bit trick, the test would be
(long)dest % 4 == 0 && (long)src % 4 == 0 && count % 4 == 0
I am not sure whether they tested their compiler and found that it generated bad code even for such a simple test, and therefore decided to write it in such a convoluted way. In any case, 'x | y | z' guarantees that bit n is set in the result if it is set in any of 'x', 'y' or 'z'. Therefore, if '(x | y | z) & 3' results in 0, none of the numbers had either of the 2 lowest bits set, and therefore all of them are divisible by 4.
Of course it would be rather silly to use now - the standard library 'memcpy' in recent library implementations is almost certainly better than this.
Therefore, on recent compilers you can optimize all calls to 'Q_memcpy' by switching them to 'memcpy'. GCC can generate things like 64-bit or SIMD moves with 'memcpy', depending on the size of the area to be copied.
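To convince yourself that the OR-and-mask test really is the same as three separate divisibility checks, here is a minimal sketch. The function names are hypothetical (they are not from the Quake source), and it uses 'uintptr_t' and 'unsigned' rather than the original's 'long' and 'int', which is my assumption for portability. Only the two lowest bits matter, so trying 0..15 for each value covers every combination.

```c
#include <assert.h>
#include <stdint.h>

/* Combined test in the style of Q_memcpy: OR the three values together,
   then look at the two lowest bits. The result is 1 only if those bits
   are clear in all three values. (Hypothetical helper, not from Quake.) */
static int all_multiples_of_4(uintptr_t dest, uintptr_t src, unsigned count)
{
    return ((dest | src | count) & 3) == 0;
}

/* The same condition written as three separate divisibility checks. */
static int all_multiples_of_4_plain(uintptr_t dest, uintptr_t src, unsigned count)
{
    return dest % 4 == 0 && src % 4 == 0 && count % 4 == 0;
}

/* Returns 1 if both versions agree for every combination of low bits. */
static int check_equivalence(void)
{
    for (uintptr_t d = 0; d < 16; d++)
        for (uintptr_t s = 0; s < 16; s++)
            for (unsigned c = 0; c < 16; c++)
                if (all_multiples_of_4(d, s, c) != all_multiples_of_4_plain(d, s, c))
                    return 0;
    return 1;
}
```

Calling 'check_equivalence()' from a 'main' of your own should return 1. The single OR-and-mask version needs only one comparison and one branch, which is probably why it was written that way.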