I have a critical code path where threads use an atomic increment on an integer to count the number of events that have happened globally. This is reasonably fast, but still requires that the cache line holding the integer bounces between cores. In a NUMA system, this creates a lot of MESI traffic.
The pseudocode of the hot path is that all threads do this:
const int CHECK_VALUE = 42;
int counterNew = ++counter;  // atomic increment; yields the value after the increment
if (counterNew == CHECK_VALUE) {
    // do final work
}
The counter is monotonically increasing and the value it must reach is known in advance.
At least one thread must conclude that the global counter has reached CHECK_VALUE after it has incremented counter. It is acceptable that more than one thread draws that conclusion (I can always synchronise them at that point - as that is no longer the hot path).
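For concreteness, here is a minimal C++ sketch of that baseline, assuming `std::atomic<int>` is an acceptable stand-in for the atomic increment. The names `on_event`, `run_events` and `final_work_runs` are hypothetical and only there for illustration; the real final work is replaced by a counter bump.

```cpp
#include <atomic>
#include <thread>
#include <vector>

constexpr int CHECK_VALUE = 42;        // target value, known in advance
std::atomic<int> counter{0};           // shared, monotonically increasing
std::atomic<int> final_work_runs{0};   // hypothetical: how many times the final work ran

// Hypothetical helper: every thread calls this once per event.
void on_event() {
    // fetch_add returns the value before the increment, so add 1 to get the new value.
    int counterNew = counter.fetch_add(1) + 1;
    if (counterNew == CHECK_VALUE) {
        final_work_runs.fetch_add(1);  // stand-in for the final work
    }
}

// Hypothetical driver: each of n_threads threads reports events_per_thread events.
int run_events(int n_threads, int events_per_thread) {
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) {
        threads.emplace_back([events_per_thread] {
            for (int i = 0; i < events_per_thread; ++i) on_event();
        });
    }
    for (auto& th : threads) th.join();
    return counter.load();
}
```

Because each `fetch_add` hands out a distinct value, exactly one thread observes `CHECK_VALUE` in this version, which is stricter than the "at least one" I actually need.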
Is it possible to do better than using atomic increment to track the value of counter if I know it is monotonic and the final value is known?
You can do it with an atomic CAS (compare-and-swap) operation. On x86, this is the CMPXCHG instruction, used with the LOCK prefix to make it atomic across cores. If needed, you can write a small assembly function that implements CAS on your platform, or ask me here about the Intel implementation. Your code should look like the following: