Does anybody know how to insert recognizable code sequences using the Sun Studio compiler, without horribly messing up optimization?
I'd like to see what the Sun Studio (12.1) compiler does with a bit of code in a number of instances, and I was looking for a way to mark the generated code with a recognizable sequence of no-op instructions so I could find my fragments. My first try used:
asm volatile ("nop ; nop ; nop ") ;
// ... <stuff I want to look at here> ...
asm volatile ("nop ; nop ; nop ; nop ; nop") ;
However, when I use this, the compiler generates unoptimized-looking code between the nop blocks. Example:
nop
nop
nop
ld [%sp + 0x8bf], %g2
srl %g2, 0x0, %g3
sllx %g3, 0x2, %g4
ld [%sp + 0x8c3], %g5
ldx [%sp + 0x8c7], %o2
st %g5, [%o2 + %g4]
ld [%sp + 0x8b7], %o3
ldx [%sp + 0x8c7], %o4
st %o3, [%o4 + 0x28]
nop
nop
nop
nop
nop
The code in question is just two stores. I don't really know SPARC assembly, but this looks like the compiler has completely given up on optimizing the code between the nop blocks. Why, for example, would it generate a new load, the ldx [%sp + 0x8c7], %o4, reloading the base address for the store when it already had it in %o2?
Glancing at the surrounding code, it may well be unoptimized everywhere in the vicinity of the asm volatile statements.
I tried the following instead, creating a .il file with this inline asm:
.inline DO_Nop3,0
nop
nop
nop
.end
.inline DO_Nop5,0
nop
nop
nop
nop
nop
.end
with the following in my source:
extern "C" void DO_Nop3() ;
extern "C" void DO_Nop5() ;
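For illustration, the call sites might look roughly like the sketch below. The function name store_two and its parameters are hypothetical, chosen only to resemble the two stores in the disassembly above. The empty stub definitions are also an assumption: they exist only so the sketch compiles and links on its own, and would be dropped when the .il file is passed to the compiler.

```cpp
// Declarations matching the .inline templates in the .il file.
extern "C" void DO_Nop3();
extern "C" void DO_Nop5();

// Hypothetical stand-in definitions so this sketch builds without the .il file.
// Remove these when the .il file is supplied on the compiler command line.
extern "C" void DO_Nop3() {}
extern "C" void DO_Nop5() {}

// Hypothetical fragment to inspect: two stores bracketed by the markers.
void store_two(int *a, unsigned i, int x, int y)
{
    DO_Nop3();   // start marker: intended to expand to three nops
    a[i] = x;    // store at a variable index
    a[10] = y;   // store at a fixed offset
    DO_Nop5();   // end marker: intended to expand to five nops
}
```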
Using this, I've got the opposite problem: the compiler is now too smart and eliminates my nop instructions completely. (I'm guessing it looks at the side effects of the instructions in the .inline blocks and later, rightly, decides they don't do anything, and tosses that bit of code.)
Any better ways?
The problem is that the compiler is free to reorder instructions; the asm volatile blocks stop it from doing so and potentially inhibit optimization.
Debugging symbols should give you a mapping from instruction addresses to source lines. I'm not aware of any good tools for conveniently reading DWARF2/stabs, though.