I'm writing some logging code in C for an ARM9 processor. This code will record some data if a dynamic module is present. The module will usually not be present in a production build, but the logging code will always be compiled in. The idea is that if a customer encounters a bug, we can load this module, and the logging code will dump debugging information.
The logging code must have minimal impact when the module is not present, so every cycle counts. In general, the logging code looks something like this:
__inline void log_some_stuff(Provider *pProvider, other args go here...)
{
    if (NULL == pProvider)
        return;
    ... logging code goes here ...
}
With optimization on, RVCT 4.0 generates code that looks like this:
ldr r4,[r0,#0x2C] ; pProvider,[r0,#44]
cmp r4,#0x0 ; pProvider,#0
beq 0x23BB4BE (usually taken)
... logging code goes here...
... regular code starts at 0x23BB4BE
This processor has no branch predictor, and my understanding is that there is a 2 cycle penalty whenever a branch is taken (no penalty if the branch is not taken).
I would like the common case, where NULL == pProvider, to be the fast case, where the branch is not taken. How can I make RVCT 4.0 generate code like this?
I've tried using __builtin_expect as follows:
if (__builtin_expect(NULL == pProvider, 1))
    return;
Unfortunately, this has no impact on the generated code. Am I using __builtin_expect incorrectly? Is there another method (hopefully without inline assembly)?
So if there's no branch predictor and you pay a two-cycle penalty whenever a branch is taken, why not restructure the source so that the common case is the one that falls through? (Actually, you'd think your example above would already result in the "correct" code, but we can try.) For example, invert the test so that the logging code becomes the body of the if.
That "could" compile to a branch that is only taken when pProvider is non-NULL, if you're lucky. But even if it does now, any change to the compiler may change the result, and I have no idea whether the compiler you're using would produce that assembly in the first place.
So you should probably write it in inline assembly anyhow. It's not much code, and gcc (as well as VC; I assume others do too) makes that quite easy. The easiest approach is to define a separate function containing your logging code and call that (I don't know the ARM ABI, so you'll have to write that part yourself).
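Before reaching for inline assembly, the "separate function" idea can be sketched in plain C. This is only a sketch under assumptions, not something verified against RVCT 4.0: the Provider layout, the int argument, the name log_some_stuff_slow and the LOG_NOINLINE macro are all made up for illustration, and the noinline attribute is spelled the gcc way. The inline wrapper contains nothing but the NULL test and a call, so the compiler has a better chance of keeping the no-module case on the fall-through path, though nothing here guarantees the branch layout.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real Provider type. */
typedef struct Provider {
    int records_logged;
} Provider;

/* Assumption: gcc-style attribute syntax. Check your compiler's manual
 * for its own way of preventing inlining. */
#if defined(__GNUC__)
#define LOG_NOINLINE __attribute__((noinline))
#else
#define LOG_NOINLINE
#endif

/* Slow path: the actual logging code lives out of line, so the hot path
 * only carries a compare and a call. */
LOG_NOINLINE static void log_some_stuff_slow(Provider *pProvider, int value)
{
    (void)value;                  /* placeholder for the real logging work */
    pProvider->records_logged++;
}

/* Hot path: test inverted so the logging call is the body of the if. */
static inline void log_some_stuff(Provider *pProvider, int value)
{
    if (pProvider != NULL)
        log_some_stuff_slow(pProvider, value);
}
```

Whatever you end up with, check the disassembly again after every compiler upgrade, since the layout can change.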