Can speculative execution of modern CPUs cross loop iterations?


Consider the loop below (https://godbolt.org/z/z4Wz1aanK), which has no loop-carried dependence. Will a modern CPU speculatively execute the next iteration while the previous one is still in progress? If so, is loop unrolling still necessary here?

void bar(void)
{
    for (int i = 0; i < 1024; i++)
        out[i] = foo(src[i]);
}

The generated assembly:

bar():
       pushq   %rbx
       xorl    %ebx, %ebx
.L2:
       movl    src(%rbx), %edi
       addq    $4, %rbx
       call    foo(int)
       movl    %eax, out-4(%rbx)
       cmpq    $4096, %rbx
       jne     .L2
       popq    %rbx
       ret
src:
       .zero   400
out:
       .zero   400

Update 1: I am now sure that speculative execution can cross loop iterations. The question is how far ahead it can get, given the dependency chain introduced by the loop counter i.

Answer

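Yes, this loop will likely benefit from branch prediction and speculative execution: the loop branch (jne .L2) is taken on every iteration except the last, which is an easy pattern for a branch predictor.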
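
The only value carried from one iteration of bar to the next is the counter (rbx in the assembly), which costs a single addq per iteration, so it is unlikely to be what limits how far ahead the CPU runs. The practical limit is more likely the size of the core's out-of-order window (reorder buffer and scheduler capacity); since every iteration also executes all of foo's instructions, a longer foo means fewer iterations fit in that window. A minimal sketch contrasting this with a loop that does carry a dependence, using hypothetical names (`step`, `independent`, `chained`) that are not from the question:

```cpp
#include <cstddef>

// Hypothetical stand-in for foo(); unsigned so that wrap-around is well defined.
unsigned step(unsigned x) { return x * 3u + 1u; }

// Only the counter i carries over between iterations: iteration k reads
// src[k] and writes out[k], so work from several iterations can overlap.
void independent(const unsigned* src, unsigned* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = step(src[i]);
}

// acc is a loop-carried dependence: each call to step() needs the result
// of the previous one, so the calls form a single serial chain.
unsigned chained(const unsigned* src, std::size_t n) {
    unsigned acc = 0;
    for (std::size_t i = 0; i < n; ++i)
        acc = step(acc ^ src[i]);
    return acc;
}
```

Speculation still runs ahead in `chained` (its loop branch is just as predictable), but each step() has to wait for the previous acc, so the overlap available to `independent` is not there.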

Unrolling loops by hand is generally considered an obsolete optimization; see, for example, this Intel article: https://www.intel.com/content/www/us/en/developer/articles/technical/avoid-manual-loop-unrolling.html
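
For comparison, a minimal sketch of what manual unrolling of bar by a factor of 4 would look like. It assumes a hypothetical `foo_stub` in place of the question's `foo` (whose body is not shown) and a length `n` that need not be a multiple of 4, hence the remainder loop:

```cpp
#include <cstddef>

// Hypothetical stand-in for foo(); the real foo is not shown in the question.
int foo_stub(int x) { return x + 1; }

// Straightforward loop: unrolling decisions are left to the compiler.
void bar_rolled(const int* src, int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = foo_stub(src[i]);
}

// Manually unrolled by 4: one compare-and-branch per four elements,
// at the cost of larger code and a separate remainder loop.
void bar_unrolled4(const int* src, int* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i]     = foo_stub(src[i]);
        out[i + 1] = foo_stub(src[i + 1]);
        out[i + 2] = foo_stub(src[i + 2]);
        out[i + 3] = foo_stub(src[i + 3]);
    }
    for (; i < n; ++i)  // leftover 0 to 3 elements
        out[i] = foo_stub(src[i]);
}
```

In the question's loop every element still goes through a call to foo, so the overhead that unrolling removes (an addq, a cmpq and a well-predicted jne per element) is small compared with the call itself.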

Speculative execution does not change the observable results of your program, only its timing. It does not require compiler support either; the CPU does it on its own whenever it encounters a conditional jump. Whether your iterations are predicted correctly depends on what happens inside foo, and possibly even on the data in src: if foo contains many conditionals, or conditionals that follow hard-to-predict patterns, mispredictions will slow the loop down.
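
To illustrate the last point, consider a hypothetical foo (not the one from the question) that clamps negative inputs to zero. If `foo_branchy` is compiled to a conditional jump, that jump is easy to predict when src is sorted or mostly of one sign, and mispredicted about half the time when the signs are random and evenly mixed. `foo_branchless` computes the same value with a bit mask, so there is no data-dependent branch to mispredict. Compilers may perform this kind of if-conversion themselves, so check the generated code before rewriting anything by hand.

```cpp
// Hypothetical foo with a data-dependent branch: clamp negatives to 0.
int foo_branchy(int x) {
    if (x < 0)
        return 0;
    return x;
}

// Same result without a branch in the source: mask is all ones when
// x < 0 and all zeros otherwise, so x & ~mask keeps x only if x >= 0.
int foo_branchless(int x) {
    int mask = -static_cast<int>(x < 0);
    return x & ~mask;
}
```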

The compiler may still apply other optimizations if it considers them beneficial, such as loop unrolling or SIMD vectorization. To see what the compiler actually does with your code, try https://godbolt.org/
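
In the output above, every iteration keeps an opaque `call foo(int)`, which leaves the compiler little room to unroll profitably or vectorize. A minimal sketch, assuming GCC 8 or newer and a hypothetical `foo_visible` whose body the compiler can see: the call can be inlined, and `#pragma GCC unroll` requests an unroll factor without changing the source loop. Other compilers may ignore that pragma (possibly with a warning) or use a different spelling.

```cpp
#include <cstddef>

// Hypothetical foo whose body is visible, unlike the opaque foo(int)
// in the question, so the compiler can inline it into the loop.
static inline int foo_visible(int x) { return 3 * x + 1; }

constexpr std::size_t N = 1024;
int src[N];
int out[N];

void bar_visible() {
    // GCC 8+: request that this loop be unrolled 4 times.
#pragma GCC unroll 4
    for (std::size_t i = 0; i < N; ++i)
        out[i] = foo_visible(src[i]);
}
```

Compiling this at -O3 on godbolt and comparing it with the original output is a quick way to see whether the compiler unrolled or vectorized the loop.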