How to stop the VC++ compiler from reordering code?


I have code like this:

const uint64_t tsc = __rdtsc();
const __m128 res = computeSomethingExpensive();
const uint64_t elapsed = __rdtsc() - tsc;
printf( "%" PRIu64 " cycles", elapsed );

In release builds, this prints garbage like “38 cycles” because the VC++ compiler reordered my code, moving the second __rdtsc() ahead of the call it is supposed to time:

    const uint64_t tsc = __rdtsc();
00007FFF3D398D00  rdtsc  
00007FFF3D398D02  shl         rdx,20h  
00007FFF3D398D06  or          rax,rdx  
00007FFF3D398D09  mov         r9,rax  
    const uint64_t elapsed = __rdtsc() - tsc;
00007FFF3D398D0C  rdtsc  
00007FFF3D398D0E  shl         rdx,20h  
00007FFF3D398D12  or          rax,rdx  
00007FFF3D398D15  mov         rbx,rax  
00007FFF3D398D18  sub         rbx,r9  
    const __m128 res = …
00007FFF3D398D1B  lea         rdx,[rcx+98h]  
00007FFF3D398D22  mov         rcx,r10  
00007FFF3D398D25  call        computeSomethingExpensive (07FFF3D393E50h)  

What’s the best way to fix this?

P.S. I’m aware that rdtsc doesn’t count cycles; it measures time based on the CPU’s base frequency. I’m OK with that and still want to measure that number.

Update: godbolt link

Answered by Alex Guteniev

Adding a fake store

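// Never set to true, so the store below never executes at run time;
// it only makes `res` look needed before the second __rdtsc.
static bool save = false;
if (save)
{
   // _mm_store_ps requires a 16-byte aligned destination.
   alignas(16) static float res1[4];
   _mm_store_ps(res1, res);
}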

before the second __rdtsc seems to be enough to fool the compiler.

(I’m not adding a real store, to avoid contention if this function is called from multiple threads, though thread-local storage could avoid that as well.)
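
For completeness, here is a minimal sketch of how the fake store might sit inside the whole measurement, assuming an x86-64 target. The body of computeSomethingExpensive is not shown in the question, so the one below is only a hypothetical stand-in, and measureExpensive is a name made up for illustration. The trick relies on the optimizer not proving that save is always false, so it is not a guarantee; it is worth checking the disassembly of your own build.

```cpp
#include <cstdint>
#include <xmmintrin.h>   // __m128, _mm_set1_ps, _mm_mul_ps, _mm_store_ps
#if defined(_MSC_VER)
#include <intrin.h>      // __rdtsc on VC++
#else
#include <x86intrin.h>   // __rdtsc on GCC/Clang
#endif

// Hypothetical stand-in for the question's computeSomethingExpensive():
// a long chain of dependent multiplications.
static __m128 computeSomethingExpensive()
{
    __m128 acc = _mm_set1_ps( 1.0f );
    const __m128 mul = _mm_set1_ps( 1.0001f );
    for( int i = 0; i < 100000; i++ )
        acc = _mm_mul_ps( acc, mul );
    return acc;
}

// Hypothetical wrapper: returns the TSC delta and hands back the result.
uint64_t measureExpensive( __m128& result )
{
    const uint64_t tsc = __rdtsc();
    const __m128 res = computeSomethingExpensive();

    // Fake store from the answer: `save` is never set to true, so the
    // store never runs, but it is a use of `res` placed before the
    // second __rdtsc. Assumption: the optimizer does not fold it away.
    static bool save = false;
    if( save )
    {
        // _mm_store_ps requires a 16-byte aligned destination.
        alignas( 16 ) static float res1[ 4 ];
        _mm_store_ps( res1, res );
    }

    const uint64_t elapsed = __rdtsc() - tsc;
    result = res;
    return elapsed;
}
```

Call measureExpensive( res ) and print the returned value with the same printf( "%" PRIu64 " cycles", elapsed ) as in the question (PRIu64 comes from <cinttypes>).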