Why 'acquire/release' can not guarantee sequential consistency in c++11?

1.1k Views Asked by At
-Thread 1-                
y.store (20, memory_order_release); 
x.store (10, memory_order_release);

-Thread 2-
if (x.load(memory_order_acquire) == 10) {
  assert (y.load(memory_order_acquire) == 20);
  y.store (10, memory_order_release)
}

-Thread 3-
if (y.load(memory_order_acquire) == 10) {
  assert (x.load(memory_order_acquire) == 10);
}

GCC Atomic Wiki paragraph “Overall Summary” says, the above code assert(x.load(memory_order_acquire)) can fail. But I don't understand why ?

My understanding is:

  1. Thread3 can not LoadLoad reorder due to acquire barrier.
  2. Thread1 can not StoreStore reorder due to release barrier.
  3. When Thread2 read(x)->10, x must be flushed from storebuffer to cache in Thread1, so every thread know the value x has changed, such as invalidate cache line.
  4. Thread3 uses Acquire barrier, so it can see x(10).
2

There are 2 best solutions below

5
On

This is a bad example, though it does illustrate how mind-warping relaxed atomics can be, I suppose.

[intro.execution]p9:

Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.

[atomics.order]p2:

An atomic operation A that performs a release operation on an atomic object M synchronizes with an atomic operation B that performs an acquire operation on M and takes its value from any side effect in the release sequence headed by A.

As a result, the evaluations shown are chained together by sequenced-before and synchronized-with relationships:

Thread 1                   Thread 2              Thread 3

y.store(20)
   |
   | s.b.
   V           s.w.
x.store(10)  -------->  x.load() == 10
                               |
                               | s.b.
                               V      s.w.
                        y.store(10) --------> y.load() == 10
                                                  |
                                                  | s.b.
                                                  V
                                              x.load() == ?

And so each evaluation in the chain happens before the next (see [intro.races]p9-10).

[intro.races]p15,

If a value computation A of an atomic object M happens before a value computation B of M, and A takes its value from a side effect X on M, then the value computed by B shall either be the value stored by X or the value stored by a side effect Y on M, where Y follows X in the modification order of M.

Here, A is the load in Thread 2 that took the value 10, B is the load in Thread 3 (in the assert). Since A happens before B, and there are no other side effects on x, B must also read 10.


Herb Sutter has a much simpler example on his blog:

T1: x = 1;
T2: y = 1;
T3: if( x == 1 && y == 0 ) puts("x first");
T4: if( y == 1 && x == 0 ) puts("y first");

You absolutely need sequential consistency to guarantee that at most one line is printed.

0
On

Your example is special case of multithread program that only use atomic objects for synchronization, not for communicating information with significant entropy: the only values written act as milestones.

It means that any store:

  • is a release operation
  • communicates a value that indicate a precise point in the progress of one thread

The form must be exactly A.store (C, memory_order_release); where:

  • A is an atomic object
  • C is a constant

and the pair (A,C) characterizes uniquely the program line.

And conversely each load:

  • is an acquire
  • only checks for a particular value

The form must be exactly

if (A.load(memory_order_acquire) == C) { ... }

where there is no else clause.

Reading a particular value indicates progress and establish that all previous side effect (before a particular program point) have happened. That small class of programs never has funny multithread behavior because everything is a function of progress: milestones are passed, and the behavior must be purely sequential.