Re-ordering between two atomic loads


I have the following program:

int normalData[2];
std::atomic<int> counter {0};

// thread A:
// write new data
normalData[(counter + 1) % 2] = newData;
counter.fetch_add(1, std::memory_order_release);

// thread B:
auto before = counter.load(std::memory_order_acquire);
auto tmp = normalData[before % 2];
auto after = counter.load(?);
bool success = (before == after);

Each time thread A writes a new value, the counter is incremented. The std::memory_order_release ensures that the data has been written before the counter is incremented. On the other side, thread B wants to read the data, so it loads the counter with std::memory_order_acquire. This ensures that the read of the data cannot be reordered before the load of the counter, i.e. thread B cannot read the data before it has seen thread A's increment. So far everything works fine. But to check whether the data it read is valid, thread B loads the counter a second time. How can I prevent this second load from being executed before the data is actually read? Ordinarily the second load would need release semantics to forbid that reordering, but std::memory_order_release is not allowed on load operations. So how can I prevent the code from being reordered to this:

// thread B:
auto before = counter.load(std::memory_order_acquire);
auto after = counter.load(?);
auto tmp = normalData[before % 2];
bool success = (before == after);

Finally, I want to ensure that the read data was not altered by a simultaneous write operation.

Update

is this a solution?

// thread B:
auto before = counter.load(std::memory_order_acquire);
auto tmp = normalData[before % 2];
bool success = counter.compare_exchange_strong(before, before, std::memory_order_acq_rel);

Answer

Accessing the non-atomic int normalData[2] from two threads is in general a data race (and hence undefined behaviour) even if you accurately detect the fact that the race happened afterwards and discard the value.

For std::atomic<int> normalData[2], even a relaxed load of normalData[before % 2] will happen after the first acquire load of before, because it is sequenced after it in the same thread. If it were also an acquire load, it would happen before the second load of the counter, which is the hard part. (A std::atomic_thread_fence(std::memory_order_acquire) between those two loads would also work, and in practice on most implementations it will work even for a non-atomic payload.)

As discussed in comments, if the payload is small enough to be lock-free atomic, it's not obvious why you're doing this in the first place (maybe there should be a long, speculative computation between loading the data and checking whether it's still valid).

As also discussed, this particular UB is probably perfectly well defined on most specific platforms - just not by the standard.