I've noticed that using __faststorefence in my timing function gives more consistent results than using _ReadWriteBarrier. Here is the basic structure of the function:
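#include <cstdint>
#include <intrin.h> // MSVC intrinsics: _mm_lfence, __faststorefence, __rdtsc

__forceinline static uint64_t ReadTime() {
    // Fence before reading the counter
    _mm_lfence();
    __faststorefence(); // Previously used _ReadWriteBarrier
    const uint64_t result = __rdtsc();
    // Fence again after reading the counter
    _mm_lfence();
    __faststorefence(); // Previously used _ReadWriteBarrier
    return result;
}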
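To make "more consistent" concrete, here is a minimal sketch of how the spread of back-to-back timer reads could be quantified. TimerSpread and MeasureTimerSpread are hypothetical names introduced only for illustration. The clock is passed in as a parameter, so either variant of ReadTime() can be plugged in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of the cost of two back-to-back timer reads, in ticks.
struct TimerSpread {
    std::uint64_t min;
    std::uint64_t median;
    std::uint64_t max;
};

// Reads the clock twice in a row, `samples` times, and summarizes the deltas.
// Clock is any callable returning an unsigned 64-bit tick count.
template <typename Clock>
TimerSpread MeasureTimerSpread(Clock readTime, std::size_t samples) {
    assert(samples > 0);

    std::vector<std::uint64_t> deltas;
    deltas.reserve(samples);
    for (std::size_t i = 0; i < samples; ++i) {
        const std::uint64_t start = readTime();
        const std::uint64_t stop = readTime();
        deltas.push_back(stop - start);
    }

    // Sort so that min, median and max can be read off directly.
    std::sort(deltas.begin(), deltas.end());
    return TimerSpread{deltas.front(), deltas[deltas.size() / 2], deltas.back()};
}
```

With the ReadTime() above, a call such as MeasureTimerSpread(ReadTime, 100000) would give the spread for one variant. A smaller gap between min and max suggests more consistent reads.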
Could someone explain why __faststorefence might give more consistent timing results than _ReadWriteBarrier? Thanks!