I have the following program:
int normalData[2];
std::atomic<int> counter {0};
// thread A:
// write new data
normalData[(counter + 1) % 2] = newData;
counter.fetch_add(1, std::memory_order_release);
// thread B:
auto before = counter.load(std::memory_order_acquire);
auto tmp = normalData[before % 2];
auto after = counter.load(?);
bool success = (before == after);
Each time thread A writes a new value, the counter is incremented. The std::memory_order_release ensures that the data has been written before the counter is incremented. On the other side, thread B wants to read the data, so it loads the counter with std::memory_order_acquire. This ensures that the read of the data cannot start before thread A has written it. So far everything works fine. But to check whether the read data is valid, the counter is loaded a second time. How can I prevent this second load from being executed before the data is actually read? Normally the second load would need std::memory_order_release semantics, which are forbidden for load operations. So how can I prevent the code from being reordered to this:
// thread B:
auto before = counter.load(std::memory_order_acquire);
auto after = counter.load(?);
auto tmp = normalData[before % 2];
bool success = (before == after);
Finally, I want to ensure that the read data was not altered by a simultaneous write operation.
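For context, here are the fragments above assembled into one compilable sketch (the writer/reader wrappers are hypothetical; the second load is given acquire for now, which does not by itself stop the data read from sinking below it):

```cpp
#include <atomic>

int normalData[2];
std::atomic<int> counter{0};

// Thread A: write the new value into the slot not currently published.
void writer(int newData) {
    normalData[(counter.load(std::memory_order_relaxed) + 1) % 2] = newData;
    counter.fetch_add(1, std::memory_order_release);
}

// Thread B: returns true if the counter did not change across the read.
bool reader(int& out) {
    int before = counter.load(std::memory_order_acquire);
    out = normalData[before % 2];
    // The ordering of this second load is exactly what the question asks about:
    // acquire only prevents LATER operations from moving before it, so the
    // data read above is not forced to happen before this load.
    int after = counter.load(std::memory_order_acquire);
    return before == after;
}
```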
Update
is this a solution?
// thread B:
auto before = counter.load(std::memory_order_acquire);
auto tmp = normalData[before % 2];
bool success = counter.compare_exchange_strong(before, before, std::memory_order_acq_rel);
Accessing the non-atomic int normalData[2] from two threads is in general a data race (and hence undefined behaviour), even if you accurately detect afterwards that the race happened and discard the value.

For std::atomic<int> normalData[2], even a relaxed load of normalData[before % 2] will happen after the first acquire load into before, because it is sequenced after it in the same thread. If it is also an acquire load, it will happen before the second load of the counter, which is the hard part. (A std::atomic_thread_fence(std::memory_order_acquire) between those two loads would also work, and in practice on most implementations it will work even for a non-atomic payload.)

As discussed in the comments, if the payload is small enough to be a lock-free atomic, it's not obvious why you're doing this in the first place (maybe there should be a long, speculative computation between loading the data and checking whether it's still valid). As also discussed, this particular UB is probably perfectly well defined on most specific platforms - just not by the standard.
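A minimal sketch of the atomic-payload variant with the acquire fence between the two loads, assuming a single writer as in the question (slot, seq, publish, try_read are hypothetical names):

```cpp
#include <atomic>

std::atomic<int> slot[2];   // payload is now atomic: no data race
std::atomic<int> seq{0};

// Single writer: store the payload, then publish with a release increment.
void publish(int newData) {
    int s = seq.load(std::memory_order_relaxed);
    slot[(s + 1) % 2].store(newData, std::memory_order_relaxed);
    seq.fetch_add(1, std::memory_order_release);
}

// Reader: the acquire fence prevents the payload load from being
// reordered after the second load of the counter.
bool try_read(int& out) {
    int before = seq.load(std::memory_order_acquire);
    out = slot[before % 2].load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    int after = seq.load(std::memory_order_relaxed);
    return before == after;
}
```

If try_read returns false, the caller discards out and retries; the value is only used once the counter is seen unchanged.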