Static branch prediction on ARM with __builtin_expect does not seem to work?


I'm optimizing C code that runs on a Cortex-R4. I haven't seen any change in the generated assembly when I add "__builtin_expect" to a condition check. It seems the compiler still generates an unnecessary jump.

My C code:

bit++;      // likely code

if(__builtin_expect(bit >= 32),0) 
{ 
  bit -=32; // unlikely code
  xxxxxx;   // unlikely code
  xxxxxx;   // unlikely code
  xxxxxx;   // unlikely code
} 

bit = bit*2; // something (likely code)
return bit; 

---- Generated ASM code -------- (bit => r0)

            ADD   r2,r2,#1
            CMP   r0,#0x20
            BCC   NoDecrement
            SUB   r0,r0,#0x20
            XXXXXXXXX
            XXXXXXXXX
            XXXXXXXXX
NoDecrement LSL   r0,r0,#1
            BX    lr

---- My expected ASM Code --------

            ADD   r2,r2,#1
            CMP   r0,#0x20
            BHS   Decrement
JumpBack    LSL   r0,r0,#1
            BX    lr
Decrement   SUB   r0,r0,#0x20
            XXXXXXXXX
            XXXXXXXXX
            XXXXXXXXX
            B     JumpBack

Suppose this piece of C code runs in a loop: on almost every iteration it has to take the jump, because the if condition is true only once. Is there a compiler setting that generates the code I expect?


You wrote:

if(__builtin_expect(bit >= 32),0)
{
    ...
}

The code inside the curly braces will never be executed, because the closing parenthesis is in the wrong place. The condition has the form if(foo,0): the comma operator evaluates foo, discards the result, and yields 0, so this is equivalent to if(0) for any value of foo, no matter what builtin you're trying to use. If you turn on optimization with -O2, you'll see that the compiler removes the dead code completely, rather than just jumping around it. I think you probably meant to write:

if (__builtin_expect(bit >= 32, 0)) {
    bit -= 32;
}
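
The difference between the two spellings can be checked with a small sketch. It assumes GCC or clang; hint() is a hypothetical one-argument stand-in (not a real builtin), used only so the misplaced-parenthesis form compiles, and taken_misplaced/taken_fixed are illustrative names that do not come from the question.

```c
/* Hypothetical one-argument stand-in for the hint; returns its argument. */
static long hint(long x) { return x; }

/* Misplaced parenthesis: the condition is the comma expression
 * (hint(bit >= 32), 0), whose value is always 0. */
static int taken_misplaced(int bit)
{
    if (hint(bit >= 32), 0)
        return 1;   /* never reached */
    return 0;
}

/* Corrected: both arguments are passed to __builtin_expect, which
 * returns the value of its first argument. */
static int taken_fixed(int bit)
{
    if (__builtin_expect(bit >= 32, 0))
        return 1;   /* reached whenever bit >= 32, just hinted as rare */
    return 0;
}
```

With these definitions, taken_misplaced(40) returns 0 while taken_fixed(40) returns 1. The hint changes only how the code is laid out, never which branch is taken.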

With the parenthesis in the right place, I get exactly the forward branch I'd expect (with clang -O1 or higher) for this test function:

extern void something(void);
int foo(int bit)
{
    ++bit;
    if (__builtin_expect(bit >= 32, 0)) {
        bit -= 32;  // "Decrement"
        something();
    }
    bit = bit*2;
    something();
    return bit;
}

Here's the code from clang -arch armv7 -O2 -S:

_foo:
@ BB#0:
    push    {r4, r7, lr}
    adds    r4, r0, #1
    add     r7, sp, #4
    cmp     r4, #32
    bge     LBB0_2          @ a forward branch for the unlikely case
LBB0_1:
    lsls    r4, r4, #1
    blx     _something
    mov     r0, r4
    pop     {r4, r7, pc}
LBB0_2:                     @ "Decrement"
    sub.w   r4, r0, #31
    blx     _something
    b       LBB0_1
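
One way to make a misplaced parenthesis harder to write is to wrap the builtin in macros, in the style of the Linux kernel's likely()/unlikely(). The following is only a sketch: the macro names LIKELY/UNLIKELY, the fallback for other compilers, and the function advance_bit are assumptions for illustration, and whether a given ARM toolchain honors the hint still has to be checked in its assembly output.

```c
/* Wrap the hint so the two-argument call is written in one place only.
 * !!(x) normalizes any non-zero value to 1 before comparing with the
 * expected value. */
#if defined(__GNUC__) || defined(__clang__)
#  define LIKELY(x)   __builtin_expect(!!(x), 1)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
/* Assumed fallback: no hint, same truth value. */
#  define LIKELY(x)   (!!(x))
#  define UNLIKELY(x) (!!(x))
#endif

/* Same arithmetic as foo() above, without the external call. */
static int advance_bit(int bit)
{
    ++bit;
    if (UNLIKELY(bit >= 32)) {
        bit -= 32;          /* rare wrap-around path */
    }
    return bit * 2;         /* common path */
}
```

Because the macro takes a single argument, a condition such as UNLIKELY(bit >= 32) cannot silently turn into a comma expression the way the original line did.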