A full/general memory barrier is one where all the LOAD and STORE operations specified before the barrier will appear to happen before all the LOAD and STORE operations specified after the barrier with respect to the other components of the system.
According to cppreference, memory_order_seq_cst is equal to memory_order_acq_rel plus a single total modification order on all operations so tagged. But as far as I know, neither an acquire nor a release fence in C++11 enforces #StoreLoad (load after store) ordering. A release fence requires that no previous read/write can be reordered with any following write; an acquire fence requires that no following read/write can be reordered with any previous read. Please correct me if I am wrong ;)
For example:
atomic<int> x;
atomic<int> y;
y.store(1, memory_order_relaxed); //(1)
atomic_thread_fence(memory_order_seq_cst); //(2)
x.load(memory_order_relaxed); //(3)
Is an optimizing compiler allowed to reorder instruction (3) to before (1), so that it effectively looks like:
x.load(memory_order_relaxed); //(3)
y.store(1, memory_order_relaxed); //(1)
atomic_thread_fence(memory_order_seq_cst); //(2)
If this is a valid transformation, then it proves that atomic_thread_fence(memory_order_seq_cst) does not necessarily encompass the semantics of a full barrier.
atomic_thread_fence(memory_order_seq_cst) always generates a full barrier:
x86_64: MFENCE
PowerPC: hwsync
Itanium: mf
ARMv7 / ARMv8: dmb ish
MIPS64: sync
The main thing: an observing thread can simply observe the operations in a different order, regardless of which fences you use in the observed thread.
No, it isn't allowed. But in a multithreaded program this holds, as far as global visibility is concerned, only if:
the other threads also use memory_order_seq_cst for the atomic read/write operations on these values,
or the other threads also use atomic_thread_fence(memory_order_seq_cst); between load() and store(). This second approach doesn't guarantee sequential consistency in general, though, because sequential consistency is a stronger guarantee.
Working Draft, Standard for Programming Language C++ 2016-07-12: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4606.pdf
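The second condition, a seq_cst fence between the store and the load in the other thread too, is the classic store-buffering test. Below is a minimal sketch of it, assuming two threads that mirror the (1)(2)(3) sequence from the question; the helper name count_both_zero is hypothetical and not part of any library. With both fences in place, the standard's rules for seq_cst fences mean at least one thread must see the other's store, so the "both read 0" outcome should never be counted.

```cpp
#include <atomic>
#include <cassert>  // only needed for the sanity check mentioned below
#include <thread>

// Hypothetical helper: runs the store-buffering test `rounds` times and
// returns how often BOTH threads read 0, i.e. how often a StoreLoad
// reordering became visible.
int count_both_zero(int rounds) {
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            y.store(1, std::memory_order_relaxed);                // (1)
            std::atomic_thread_fence(std::memory_order_seq_cst);  // (2)
            r1 = x.load(std::memory_order_relaxed);               // (3)
        });
        std::thread t2([&] {
            x.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = y.load(std::memory_order_relaxed);
        });

        // join() synchronizes with the end of each thread, so reading
        // r1 and r2 afterwards is race-free.
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling it from main, for instance as assert(count_both_zero(1000) == 0), should pass. If either fence is removed or weakened to memory_order_acq_rel, a non-zero count becomes a permitted outcome, although a loop this coarse may not actually reproduce it.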
How it can be mapped to assembler:
Case-1: the sequence from the question, i.e. a relaxed store, then atomic_thread_fence(memory_order_seq_cst), then a relaxed load.
This code isn't always equivalent in meaning to Case-2, but it produces the same instructions between the STORE and the LOAD as when both the LOAD and the STORE use memory_order_seq_cst. That is Sequential Consistency, which prevents StoreLoad reordering.
Case-2: the same store and load, both using memory_order_seq_cst.
With some notes: the generated code may use similar operations in the form of other instructions:
the LOCK prefix flushes the store buffer exactly as MFENCE does, to prevent StoreLoad reordering
DMB ISH is a full barrier, which prevents StoreLoad reordering: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/CHDGACJD.html
Reordering of two instructions can be prevented by placing additional instructions between them. And as we see, the instructions generated between a STORE(seq_cst) and the following LOAD(seq_cst) are the same as those generated for FENCE(seq_cst) (atomic_thread_fence(memory_order_seq_cst)).
Mapping of C/C++11 memory_order_seq_cst to different CPU architectures for load(), store(), atomic_thread_fence():
Note that
atomic_thread_fence(memory_order_seq_cst) always generates a full barrier:
x86_64: STORE: MOV (into memory), MFENCE | LOAD: MOV (from memory) | fence: MFENCE
x86_64-alt: STORE: MOV (into memory) | LOAD: MFENCE, MOV (from memory) | fence: MFENCE
x86_64-alt3: STORE: (LOCK) XCHG | LOAD: MOV (from memory) | fence: MFENCE (full barrier)
x86_64-alt4: STORE: MOV (into memory) | LOAD: LOCK XADD(0) | fence: MFENCE (full barrier)
PowerPC: STORE: hwsync; st | LOAD: hwsync; ld; cmp; bc; isync | fence: hwsync
Itanium: STORE: st.rel; mf | LOAD: ld.acq | fence: mf
ARMv7: STORE: dmb ish; str; dmb ish | LOAD: ldr; dmb ish | fence: dmb ish
ARMv7-alt: STORE: dmb ish; str | LOAD: dmb ish; ldr; dmb ish | fence: dmb ish
ARMv8 (AArch32): STORE: STL | LOAD: LDA | fence: DMB ISH (full barrier)
ARMv8 (AArch64): STORE: STLR | LOAD: LDAR | fence: DMB ISH (full barrier)
MIPS64: STORE: sync; sw; sync | LOAD: sync; lw; sync | fence: sync
All of these mappings of C/C++11 semantics to different CPU architectures for load(), store(), and atomic_thread_fence() are described here: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
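To check the mapping on a particular compiler and target, the two cases can be written as small functions and compiled to assembly (for example with g++ -O2 -S). The following is only a sketch: the function names case1_fence and case2_seq_cst are hypothetical, and the instructions actually emitted depend on the compiler and architecture.

```cpp
#include <atomic>
#include <cassert>  // only needed for the sanity check mentioned below

// Case-1: relaxed store, seq_cst fence, relaxed load (as in the question).
int case1_fence(std::atomic<int>& x, std::atomic<int>& y) {
    y.store(1, std::memory_order_relaxed);                // STORE
    std::atomic_thread_fence(std::memory_order_seq_cst);  // FENCE
    return x.load(std::memory_order_relaxed);             // LOAD
}

// Case-2: seq_cst store followed by seq_cst load, no explicit fence.
int case2_seq_cst(std::atomic<int>& x, std::atomic<int>& y) {
    y.store(1, std::memory_order_seq_cst);     // STORE(seq_cst)
    return x.load(std::memory_order_seq_cst);  // LOAD(seq_cst)
}
```

Single-threaded, both functions behave identically: each sets y to 1 and returns the current value of x. The interesting part is the assembly between the store and the load, which can be compared against the table above.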
Because Sequential Consistency prevents StoreLoad reordering, and because Sequential Consistency (a store(memory_order_seq_cst) followed by a load(memory_order_seq_cst)) generates, between the two, the same instructions as atomic_thread_fence(memory_order_seq_cst), it follows that atomic_thread_fence(memory_order_seq_cst) prevents StoreLoad reordering.