Don't fully understand custom-written 'memcpy' function in C


171 Views Asked by Simeon Laplev At 01 August 2025 at 11:11

So I was browsing the Quake engine source code earlier today and stumbled upon some custom-written utility functions. One of them was 'Q_memcpy':

void Q_memcpy (void *dest, void *src, int count)
{
    int             i;

    if (( ( (long)dest | (long)src | count) & 3) == 0 )
    {
        count>>=2;
        for (i=0 ; i<count ; i++)
            ((int *)dest)[i] = ((int *)src)[i];
    }
    else
        for (i=0 ; i<count ; i++)
            ((byte *)dest)[i] = ((byte *)src)[i];
}

I understand the overall premise of the function, but I don't quite understand the reason for the bitwise OR between the source and destination addresses. My questions are as follows:

Why does 'count' get used in the same bitwise arithmetic?
Why are the last two bits of that result checked?
What purpose does this whole check serve?

I'm sure it's something obvious, but please excuse my ignorance: I haven't really delved into the lower-level side of programming. I just find it interesting and want to learn more.

Eugene Sh. On 23 May 2018 at 18:34

The bitwise OR followed by the AND with 3 checks whether the source address, the destination address, and the count are all divisible by 4. If they are, the copy can work in 4-byte words (the code assumes int is 4 bytes). Otherwise the copy is performed byte by byte.
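The 4-byte assumption can be made explicit rather than hard-coded. The following is a minimal sketch, not taken from the Quake source: `copy_by_words_ok` is a hypothetical helper name, `uintptr_t` stands in for the original `(long)` cast, and the mask is derived from `sizeof(int)` on the assumption that it is a power of two and that converting a pointer to an integer yields its address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Quake source.
   Returns 1 when both pointers and the byte count are multiples of
   sizeof(int), 0 otherwise. */
static int copy_by_words_ok(const void *dest, const void *src, size_t count)
{
    /* Assumes sizeof(int) is a power of two, so (size - 1) is a mask of
       the low bits: 3 when int is 4 bytes, as in the original code. */
    const uintptr_t mask = sizeof(int) - 1;

    return (((uintptr_t)dest | (uintptr_t)src | (uintptr_t)count) & mask) == 0;
}
```

With a 4-byte int this performs the same test as the original `& 3`, but it does not depend on `long` being wide enough to hold a pointer.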

Weather Vane On 23 May 2018 at 18:35

It is finding out whether the source and destination pointers are int-aligned, and whether the count is an exact multiple of the int size in bytes.

If those three things are all true, the least significant 2 bits of all three values will be 0 (assuming int is 4 bytes and int-aligned addresses are multiples of 4). So the algorithm ORs the three values together and isolates the least significant 2 bits.

If those bits are 0, it copies int by int. Otherwise it copies char by char.

If the test fails, a more sophisticated algorithm would copy some of the leading and trailing bytes char by char and the intermediate bytes int by int.
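A rough sketch of that head/body/tail idea follows. It is not from the Quake source: `copy_head_body_tail` is a hypothetical name, the buffers are assumed not to overlap, and the word loop only runs when both pointers are misaligned by the same amount (otherwise aligning one leaves the other unaligned). The original casts to `int *`; here a fixed 4-byte `memcpy` is used for the word move purely to stay within the strict-aliasing rules, and compilers typically turn it into a single load and store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not part of the Quake source.
   Assumes dest and src do not overlap. */
static void copy_head_body_tail(void *dest, const void *src, size_t count)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Word copies are only possible if both pointers have the same
       offset from a 4-byte boundary. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 3) == 0) {
        /* Head: byte by byte until dest (and therefore src) is aligned. */
        while (count > 0 && ((uintptr_t)d & 3) != 0) {
            *d++ = *s++;
            count--;
        }
        /* Body: one 32-bit word per iteration. */
        while (count >= 4) {
            uint32_t word;
            memcpy(&word, s, sizeof word);  /* aliasing-safe word load  */
            memcpy(d, &word, sizeof word);  /* aliasing-safe word store */
            d += 4;
            s += 4;
            count -= 4;
        }
    }

    /* Tail, or the whole buffer when the alignments differ. */
    while (count > 0) {
        *d++ = *s++;
        count--;
    }
}
```

For example, copying 13 bytes from `src + 1` to `dst + 1` would take 3 head bytes, 2 words, and 2 tail bytes, where Q_memcpy would copy all 13 bytes one at a time.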

**Antti Haapala -- Слава Україні** · Accepted Answer

It first tests whether all 3 arguments are divisible by 4. If, and only if, they all are, it proceeds with copying 4 bytes at a time.

In other words, written out plainly, the fast path would be:

if ((long) src % 4 == 0 && (long) dest % 4 == 0 && count % 4 == 0)
{
    count = count / 4;
    for (i = 0; i < count; i++)
        ((int *)dest)[i] = ((int *)src)[i];
}

I am not sure whether they found that their compiler generated poor code for such a straightforward test and therefore decided to write it in this convoluted way. In any case, x | y | z guarantees that bit n is set in the result if it is set in any of x, y or z. Therefore, if (x | y | z) & 3 evaluates to 0, none of the numbers had either of the 2 lowest bits set, and so all of them are divisible by 4.
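To make the equivalence concrete: 8 | 12 | 18 is 30 (binary 11110), and 30 & 3 is 2, so the test fails because 18 is not a multiple of 4. The sketch below compares the two forms over a small range of non-negative values; the function names are made up for illustration and do not appear in the Quake source.

```c
/* One OR and one AND: the form used in Q_memcpy. */
static int all_divisible_by_4_masked(long x, long y, long z)
{
    return ((x | y | z) & 3) == 0;
}

/* Three separate remainder tests: the spelled-out form. */
static int all_divisible_by_4_plain(long x, long y, long z)
{
    return x % 4 == 0 && y % 4 == 0 && z % 4 == 0;
}

/* Returns 1 if both forms agree for every triple with each value
   in the range 0 .. limit-1, and 0 at the first disagreement. */
static int forms_agree(long limit)
{
    for (long x = 0; x < limit; x++)
        for (long y = 0; y < limit; y++)
            for (long z = 0; z < limit; z++)
                if (all_divisible_by_4_masked(x, y, z) !=
                    all_divisible_by_4_plain(x, y, z))
                    return 0;
    return 1;
}
```

Since only the two lowest bits matter, a limit of 4 already covers every combination of low bits; larger limits just repeat the same patterns.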

Of course, it would be rather silly to use this now: the standard library memcpy in recent library implementations is almost certainly better.

Therefore, with recent compilers you can optimize all calls to Q_memcpy by switching them to memcpy. GCC, for example, can generate 64-bit or SIMD moves for memcpy, depending on the size of the area to be copied.
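If changing every call site is impractical, one option is a thin wrapper that keeps the original signature. This is only a sketch under assumptions: it treats a zero or negative count as "copy nothing", which matches what the original loops do, and it assumes no caller passes overlapping buffers, since memcpy (unlike the original forward loop) has undefined behaviour in that case.

```c
#include <string.h>

/* Hypothetical drop-in replacement that keeps the Quake signature. */
void Q_memcpy(void *dest, void *src, int count)
{
    /* The original loops simply do not run for count <= 0. */
    if (count > 0)
        memcpy(dest, src, (size_t)count);
}
```

If any caller might pass overlapping regions, memmove would be the safer substitute in the body.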
