Consider the loop below (https://godbolt.org/z/z4Wz1aanK), which has no loop-carried dependence. Will a modern CPU speculatively execute the next iteration while the previous one is still in flight? If so, is loop unrolling still necessary here?
void bar(void)
{
    for (int i = 0; i < 1024; i++)
        out[i] = foo(src[i]);
}
The result of compilation:
bar():
        pushq   %rbx
        xorl    %ebx, %ebx
.L2:
        movl    src(%rbx), %edi
        addq    $4, %rbx
        call    foo(int)
        movl    %eax, out-4(%rbx)
        cmpq    $4096, %rbx
        jne     .L2
        popq    %rbx
        ret
src:
        .zero   400
out:
        .zero   400
Update 1: I am now sure that speculative execution can cross loop iterations. The remaining question is how far ahead it can run, considering the dependency chain introduced by the loop counter i.
Yes, this loop will likely benefit from branch prediction / speculative execution.
Loop unrolling by hand is generally considered to be an obsolete optimization, see for example here: https://www.intel.com/content/www/us/en/developer/articles/technical/avoid-manual-loop-unrolling.html
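For reference, this is roughly what manual unrolling of the loop from the question would look like. It is only a sketch: the body of foo is not shown in the question, so the one below is a hypothetical stand-in, and the array size of 1024, the separate output arrays and the helper outputs_match are assumptions made for illustration.

```c
#define N 1024

static int src[N];
static int out_plain[N];
static int out_unrolled[N];

/* Hypothetical stand-in for foo(); the question does not show its body. */
int foo(int x)
{
    return x * 3 + 1;
}

/* The loop as written in the question: one call per iteration. */
void bar_plain(void)
{
    for (int i = 0; i < N; i++)
        out_plain[i] = foo(src[i]);
}

/* Manually unrolled by 4. N is a multiple of 4, so no remainder loop is needed. */
void bar_unrolled(void)
{
    for (int i = 0; i < N; i += 4) {
        out_unrolled[i]     = foo(src[i]);
        out_unrolled[i + 1] = foo(src[i + 1]);
        out_unrolled[i + 2] = foo(src[i + 2]);
        out_unrolled[i + 3] = foo(src[i + 3]);
    }
}

/* Fill src, run both versions, and return 1 if the outputs are identical. */
int outputs_match(void)
{
    for (int i = 0; i < N; i++)
        src[i] = i - 512;

    bar_plain();
    bar_unrolled();

    for (int i = 0; i < N; i++)
        if (out_plain[i] != out_unrolled[i])
            return 0;
    return 1;
}
```

Both versions compute the same result; the unrolled one only executes fewer loop-control instructions per element. Whether that makes a measurable difference depends on the compiler and CPU, so it is worth comparing the generated code before keeping the harder-to-read version.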
Speculative execution does not change the observable behaviour of your program. It does not even require compiler support, since it is something the CPU itself does when it encounters conditional jumps. Whether your iterations will be correctly predicted depends on what happens inside of foo and possibly even on the data in src. If foo has too many conditionals, or if the conditionals follow hard-to-predict patterns, the speed will be lower.
Other optimizations may appear in the code, though, if the compiler thinks they are beneficial: there might be loop unrolling, there might be SIMD operations. To see what the compiler actually does with your code, you can try https://godbolt.org/
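To illustrate how the data in src can affect predictability, here is a minimal sketch. It assumes a hypothetical foo with a single data-dependent branch (the real foo is not shown in the question) and counts how often that branch changes direction between neighbouring elements, once for pseudo-random data and once for the same data sorted. The flip count is only a crude proxy: real branch predictors are far more sophisticated, and actual timings have to be measured on the target machine.

```c
#include <stdlib.h>

#define N 1024

static int src[N];

/* Hypothetical foo() with one data-dependent branch. */
int foo(int x)
{
    if (x >= 128)
        return x * 2;
    return x;
}

/* Comparison callback for qsort(): ascending order of int. */
int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Fill src with pseudo-random values in 0..255 from a small fixed-seed LCG. */
void fill_random(void)
{
    unsigned int state = 1u;
    for (int i = 0; i < N; i++) {
        state = state * 1103515245u + 12345u;
        src[i] = (int)((state >> 16) & 255u);
    }
}

/* Count how often the branch in foo() would change direction
   between two neighbouring elements of src. */
int count_direction_flips(void)
{
    int flips = 0;
    for (int i = 1; i < N; i++)
        if ((src[i] >= 128) != (src[i - 1] >= 128))
            flips++;
    return flips;
}

/* Flip count for the unsorted pseudo-random data. */
int flips_random(void)
{
    fill_random();
    return count_direction_flips();
}

/* Flip count for the same data after sorting. */
int flips_sorted(void)
{
    fill_random();
    qsort(src, N, sizeof src[0], cmp_int);
    return count_direction_flips();
}
```

With sorted input the branch changes direction at most once over the whole array, which is an easy pattern for a predictor. With the unsorted input it changes direction many times, which is the kind of hard-to-predict pattern described above.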