I am looking into optimizations permitted for a compiler by adding the C99 restrict type qualifier to the type parameter of an array.
For example, consider the function f below:
int f(int* restrict a[2]) {
*(a[0]) = 10;
*(a[1]) = 11;
return *(a[0]);
}
It takes as argument a two-element array a of restrict pointers to integers. The function stores 10
into the object a[0]
points to and 11
into the object a[1]
points to. Finally, it returns the value of the object a[0]
points to.
Because the pointers a[0]
and a[1]
are restrict qualified, I would assume a compiler has enough information to perform the typical optimization of not loading from a[0]
again, but instead directly returning 10
(because the write to a[1]
could not have changed the object a[0]
points to if the program has defined behavior).
So if the optimization would take place on "C-code level", the function would then become:
int f(int* restrict a[2]) {
*(a[0]) = 10;
*(a[1]) = 11;
return 10;
}
However, both GCC and Clang (I tried several versions, among which the latest on Godbolt), do not perform this optimization (when compiling with -O3).
The assembly for X86 that GCC13.2 emits is
f:
mov rax, QWORD PTR [rdi]
mov rdx, QWORD PTR [rdi+8]
mov DWORD PTR [rax], 10
mov DWORD PTR [rdx], 11
mov eax, DWORD PTR [rax]
ret
As one can see, a reload from register [rax]
is performed instead of just moving 10
into eax
.
A similar program where the pointers are passed as separate arguments instead of an array of pointers does exhibit the expected optimization for both compilers:
int g(int* restrict p, int* restrict q) {
*p = 10;
*q = 11;
return *p;
}
Is compiled into:
g:
mov DWORD PTR [rdi], 10
mov eax, 10
mov DWORD PTR [rsi], 11
ret
Do I misunderstand the semantics of a restrict qualified pointer as the type parameter of an array, or is the expected optimization simply too hard for aforementioned compilers?
This optimization cannot be done and your second example is invalid (as forgot about one level of indirection).
This will generate the optimized code as you can't access the object referenced by
*p
using*q
without breaking the promise you give to the compiler.