Should GCC/Clang optimize this redundant load via an array of restrict-qualified pointers?

202 Views Asked by At

I am looking into optimizations permitted for a compiler by adding the C99 restrict type qualifier to the type parameter of an array.

For example, consider the function f below:

int f(int* restrict a[2]) {
    *(a[0]) = 10;
    *(a[1]) = 11;
    return *(a[0]);
}

It takes as argument a two-element array a of restrict pointers to integers. The function stores 10 into the object a[0] points to and 11 into the object a[1] points to. Finally, it returns the value of the object a[0] points to.

Because the pointers a[0] and a[1] are restrict qualified, I would assume a compiler has enough information to perform the typical optimization of not loading from a[0] again, but instead directly returning 10 (because the write to a[1] could not have changed the object a[0] points to if the program has defined behavior).

So if the optimization would take place on "C-code level", the function would then become:

int f(int* restrict a[2]) {
    *(a[0]) = 10;
    *(a[1]) = 11;
    return 10;
}

However, both GCC and Clang (I tried several versions, among which the latest on Godbolt), do not perform this optimization (when compiling with -O3).

The assembly for X86 that GCC13.2 emits is

f:
        mov     rax, QWORD PTR [rdi]
        mov     rdx, QWORD PTR [rdi+8]
        mov     DWORD PTR [rax], 10
        mov     DWORD PTR [rdx], 11
        mov     eax, DWORD PTR [rax]
        ret

As one can see, a reload from register [rax] is performed instead of just moving 10 into eax.

A similar program where the pointers are passed as separate arguments instead of an array of pointers does exhibit the expected optimization for both compilers:

int g(int* restrict p, int* restrict q) {
    *p = 10;
    *q = 11;
    return *p;
}

Is compiled into:

g:
        mov     DWORD PTR [rdi], 10
        mov     eax, 10
        mov     DWORD PTR [rsi], 11
        ret

Do I misunderstand the semantics of a restrict qualified pointer as the type parameter of an array, or is the expected optimization simply too hard for aforementioned compilers?

1

There are 1 best solutions below

17
On

This optimization cannot be done and your second example is invalid (as forgot about one level of indirection).

  1. Even if you restrict both pointers when you access it as an array you will not break the contract with the compiler if access via the second element will not modify the the first one. So it has to be loaded again.
int f(int* restrict * restrict a) {
    *(a[0]) = 10;
    *(a[1]) = 11;
    return *(a[0]);
}

//this function is to illustrate the issue
void foo(void)
{
    int a;
    int *b[2] = {&a,&a};
    f(b);
}
  1. Your pointer example needs to use double pointers not the single ones.
int g(int* restrict * restrict p, int* restrict * restrict q) {
    **p = 10;
    **q = 11;
    return **p;
}

This will generate the optimized code as you can't access the object referenced by *p using *q without breaking the promise you give to the compiler.

        mov     rax, QWORD PTR [rdi]
        mov     DWORD PTR [rax], 10
        mov     rax, QWORD PTR [rsi]
        mov     DWORD PTR [rax], 11
        mov     eax, 10
        ret