How to tell the GCC/Clang optimizer to generate a specific sequence of operations

I have a loop that needs to execute sequences of operations in a specific order. What I am doing here is manually unrolling the loop a number of times:

loop
{
    delta = get_delta();
    sum1 += delta;
    sum2 += delta;
    sum3 += delta;

    delta = get_delta();
    sum1 += delta;
    sum2 += delta;
    sum3 += delta;
}

Looking at the generated assembly code, sometimes compilers optimize this loop into something like this:

loop
{
    delta1 = get_delta();
    delta2 = get_delta();
    delta_sum = delta1 + delta2;

    sum1 += delta_sum;
    sum2 += delta_sum;
    sum3 += delta_sum;
}

I don't want this to happen, as the behavior does not quite replicate loop unrolling. I've been looking into inline assembly hints via the __asm__ keyword, but can't quite get this to work reliably. I also don't want to use volatile variables, as this forces loads from and stores to memory instead of keeping values in registers. Changing the optimization flags at build time to -O0 or -O1 is not ideal either, as this is not guaranteed to produce consistent results across different compiler versions, etc.

Does anyone know of any tricks I can use with __asm__ or similar, so that the loop is unrolled in exactly the sequence in which it is written in the source code, without having to resort to writing assembly code?

There is 1 answer below

After experimenting with various __asm__ volatile () statements and different compiler optimization options, it became clear that loop unrolling, either automatically or manually, has too many side effects. Compilers perform all sorts of code optimizations and eliminations, which result in very different code constructs.
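For context, the kind of __asm__ volatile () statement meant here is typically an empty asm barrier. The following is only a minimal sketch, assuming GCC/Clang extended asm syntax; get_delta() is a hypothetical stand-in for the function in the question, and OPAQUE and run_unrolled are names made up for illustration. It keeps the two additions to each sum separate, but it does not pin down anything else the optimizer does to the loop, which is consistent with the inconsistent results described above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the question's get_delta(); the real one
   would read a timer or counter. */
static uint32_t fake_counter;
static inline uint32_t get_delta(void) { return ++fake_counter; }

/* Empty asm statement: it emits no instructions, but the "+r" constraint
   tells the compiler that x is held in a register and may have been
   changed, so the additions before and after it cannot be merged. */
#define OPAQUE(x) __asm__ volatile("" : "+r"(x))

void run_unrolled(uint32_t iterations, uint32_t sums[3])
{
    uint32_t sum1 = 0, sum2 = 0, sum3 = 0;

    for (uint32_t i = 0; i < iterations; ++i) {
        uint32_t delta = get_delta();
        sum1 += delta;
        sum2 += delta;
        sum3 += delta;
        /* Force the intermediate sums to be materialized here. */
        OPAQUE(sum1); OPAQUE(sum2); OPAQUE(sum3);

        delta = get_delta();
        sum1 += delta;
        sum2 += delta;
        sum3 += delta;
        OPAQUE(sum1); OPAQUE(sum2); OPAQUE(sum3);
    }

    sums[0] = sum1;
    sums[1] = sum2;
    sums[2] = sum3;
}
```

Whether the two groups really stay apart still has to be checked in the generated assembly for each compiler version.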

To achieve consistent results with timing loops, the following should be done:

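  1. Completely disable loop unrolling: place _Pragma ("GCC unroll 1") immediately before each timing loop (the pragma applies only to the loop that follows it) and build the affected translation units with -fno-unroll-loops.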

  2. Increase the number of instructions inside the loop body, so that loop overhead does not skew the measurements when the number of loop iterations is large.

  3. Don't use -O0 or similar; this is too crude and disables many other useful optimizations.
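The first point can be sketched roughly as follows. This is an illustrative sketch only, assuming a GCC version that understands the "GCC unroll" pragma; get_delta() is again a hypothetical stand-in, timing_loop is a made-up name, and the build line in the comment is just one plausible invocation.

```c
#include <stdint.h>

/* Hypothetical stand-in for get_delta(); the real one would read a
   timer or counter. */
static uint32_t fake_tick;
static inline uint32_t get_delta(void) { return ++fake_tick; }

/* Possible build line (an assumption, adjust as needed):
     gcc -O2 -fno-unroll-loops -c timing.c */
uint32_t timing_loop(uint32_t iterations)
{
    uint32_t sum1 = 0, sum2 = 0, sum3 = 0;

    /* Must sit directly before the loop; an unroll factor of 1 asks the
       compiler not to unroll this particular loop. */
    _Pragma("GCC unroll 1")
    for (uint32_t i = 0; i < iterations; ++i) {
        uint32_t delta = get_delta();
        sum1 += delta;
        sum2 += delta;
        sum3 += delta;
    }

    return sum1 + sum2 + sum3;
}
```

The loop body is written once rather than duplicated by hand, so the number of get_delta() calls per iteration is fixed by the source and not by an unrolling decision.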