The __restrict in the code below completely unrolls the loop and shortens the assembly by more than half. But what does it mean, and how should it be used correctly?
I did research before asking... I found this. But alas, I do not understand it.
// Compile with -O3 -march=native to see autovectorization
void maxArray(double* __restrict x, double* __restrict y) {
    for (int i = 0; i < 65536; i++) {
        if (y[i] > x[i]) x[i] = y[i];
    }
}
Imagine you declare some static double array[100000]; and then your main calls maxArray(array, array + 17);
Without the restrict annotation (the C99 keyword, or the __restrict GCC extension in C++), the compiler has to assume that the two array slices may overlap, as they do in that call, so it is not allowed to unroll and reorder the loop as aggressively.
With the restrict annotation, you as the programmer promise that this will never happen (so you won't do maxArray(array, array + 17); in such a main), and then the compiler can optimize more aggressively.
There is a similar difference (for C) between memcpy and memmove: memcpy requires that source and destination do not overlap, while memmove handles overlap correctly, so an optimizing compiler may generate different code for them.
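To make the overlap problem concrete, here is a minimal sketch. The names maxArrayPlain and runningMaxDemo are hypothetical, not from the question, and the explicit length parameter is an assumption added so the example can run on a small buffer. The variant below deliberately omits __restrict, so calling it on overlapping slices is well defined, and it shows that the result then depends on the iterations running strictly in order.

```cpp
#include <cstddef>

// Same loop as maxArray, but WITHOUT __restrict (hypothetical variant for
// illustration), so x and y are allowed to overlap.
void maxArrayPlain(double* x, const double* y, std::size_t n) {
    for (std::size_t i = 0; i < n; i++) {
        if (y[i] > x[i]) x[i] = y[i];
    }
}

// Calls the loop on two overlapping slices of one buffer: x = buf + 1, y = buf.
// Iteration i reads buf[i], which iteration i - 1 may just have written, so
// the maximum propagates forward and buf ends up holding a running maximum.
void runningMaxDemo(double* buf, std::size_t len) {
    if (len > 1) maxArrayPlain(buf + 1, buf, len - 1);
}
```

With buf = {3, 1, 2, 5, 4} the call leaves {3, 3, 3, 5, 5}. A version that loaded several elements at once from the original values would produce {3, 3, 2, 5, 5} instead, which is why the compiler must be careful here. With __restrict it may assume the overlap never occurs, and making the overlapping call anyway is undefined behavior.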
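Be aware of Rice's theorem, which implies that non-trivial semantic properties of programs (such as whether two pointers can ever alias) are undecidable in general, so a compiler cannot always work this out by itself. A theoretical framework for aggressive optimizations could be abstract interpretation.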
If you use GCC (you may look into the generated assembler code produced with g++ -Wall -O3 -S -fverbose-asm), you could, with your own GCC plugin and a lot of effort, improve the optimizations. You could also use GCC developer options to understand the various optimizations, and since GCC is free software, you can study and improve its source code. Budget months of effort for this.
Consider using, if so allowed, static analysis tools for C or C++ code like Frama-C or the Clang static analyzer.
Consider using, in addition to your debugger (e.g. GDB and its watchpoints), if so allowed, dynamic instrumentation techniques like Valgrind and the address sanitizer. They do slow down your executable a lot!
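Alongside such tools, a hand-written debug assertion can check the no-overlap promise at the call boundary. The sketch below is only an illustration: disjoint, maxArrayKernel, and maxArrayChecked are hypothetical names, and the explicit length parameter n is an assumption (the loop in the question hard-codes 65536).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when [a, a + n) and [b, b + n) share no element.
// Addresses are compared as integers because relational comparison of
// pointers into unrelated objects is unspecified in C++.
bool disjoint(const double* a, const double* b, std::size_t n) {
    const auto ua = reinterpret_cast<std::uintptr_t>(a);
    const auto ub = reinterpret_cast<std::uintptr_t>(b);
    const std::uintptr_t bytes = n * sizeof(double);
    return ua + bytes <= ub || ub + bytes <= ua;
}

// The optimized loop: __restrict promises that x and y never overlap.
static void maxArrayKernel(double* __restrict x, const double* __restrict y,
                           std::size_t n) {
    for (std::size_t i = 0; i < n; i++) {
        if (y[i] > x[i]) x[i] = y[i];
    }
}

// Non-restrict entry point: verifies the promise in debug builds, then
// forwards to the restrict-qualified kernel.
void maxArrayChecked(double* x, const double* y, std::size_t n) {
    assert(disjoint(x, y, n));
    maxArrayKernel(x, y, n);
}
```

The assertion is compiled out when NDEBUG is defined, so a release build pays nothing for the check, while a debug build aborts on a call such as maxArrayChecked(array, array + 17, 65536).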