I often read on the internet that LFENCE makes no sense on x86 processors, i.e. it does nothing, so instead of MFENCE we can painlessly use SFENCE, because MFENCE = SFENCE + LFENCE = SFENCE + NOP = SFENCE.
But if LFENCE does not make sense, then why do we have four approaches to implementing sequential consistency on x86/x86_64:
1. LOAD (without fence) and STORE + MFENCE
2. LOAD (without fence) and LOCK XCHG
3. MFENCE + LOAD and STORE (without fence)
4. LOCK XADD(0) and STORE (without fence)
Taken from here: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
See also Herb Sutter's presentation, at the bottom of page 34: https://skydrive.live.com/view.aspx?resid=4E86B0CF20EF15AD!24884&app=WordPdf&wdo=2&authkey=!AMtj_EflYn2507c
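In C++11 terms, the four mappings listed above are alternative ways to implement sequentially consistent atomic loads and stores. The following is a minimal sketch, not taken from the linked page: the names `flag`, `publish`, `observe` and `round_trip` are hypothetical, and the instruction sequences mentioned in the comments are what x86 compilers typically emit, not something the C++ standard guarantees.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> flag{0};  // hypothetical shared variable

// Sequentially consistent store. On x86, compilers typically emit either
// MOV + MFENCE (approach 1) or an XCHG, which is implicitly locked (approach 2).
void publish(int v) {
    flag.store(v, std::memory_order_seq_cst);
}

// Sequentially consistent load. Under approaches 1 and 2 this is a plain MOV,
// because the cost of the fence is paid on the store side.
int observe() {
    return flag.load(std::memory_order_seq_cst);
}

// Single-threaded sanity check: a thread always sees its own latest store.
int round_trip(int v) {
    publish(v);
    int seen = observe();
    assert(seen == v);
    return seen;
}
```

Approaches 3 and 4 would instead put the fence (or locked instruction) on the load side and leave the store as a plain MOV. Whichever side is chosen, every seq_cst load and store in the program has to follow the same convention.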
If LFENCE did nothing, then approach (3) would mean: SFENCE + LOAD and STORE (without fence), but there is no point in executing SFENCE before a LOAD. I.e. if LFENCE does nothing, approach (3) does not make sense.
Does the LFENCE instruction make any sense on x86/x86_64 processors?
ANSWER:
1. LFENCE is required in the cases described in the accepted answer below.
2. Approach (3) should be viewed not in isolation, but in combination with the preceding instructions. For example, approach (3):
MFENCE
MOV reg, [addr1]  // LOAD-1
MOV [addr2], reg  // STORE-1
MFENCE
MOV reg, [addr1]  // LOAD-2
MOV [addr2], reg  // STORE-2
We can rewrite the code of approach (3) as follows:
SFENCE
MOV reg, [addr1]  // LOAD-1
MOV [addr2], reg  // STORE-1
SFENCE
MOV reg, [addr1]  // LOAD-2
MOV [addr2], reg  // STORE-2
Here SFENCE makes sense: it prevents reordering of STORE-1 and LOAD-2, because the SFENCE after STORE-1 flushes the store buffer.
Bottom line (TL;DR): LFENCE alone indeed seems useless for memory ordering; however, that does not make SFENCE a substitute for MFENCE. The "arithmetic" logic in the question is not applicable.
Here is an excerpt from Intel's Software Developer's Manual, volume 3, section 8.2.2 (edition 325384-052US of September 2014), the same one that I used in another answer:
From here, it follows that:
- MFENCE is a full memory fence for all operations on all memory types, whether non-temporal or not.
- SFENCE only prevents reordering of writes (in other terminology, it is a StoreStore barrier), and is only useful together with non-temporal stores and other instructions listed as exceptions.
- LFENCE prevents reordering of reads with subsequent reads and writes (i.e. it combines LoadLoad and LoadStore barriers). However, the first two bullets of the excerpt say that LoadLoad and LoadStore barriers are always in place, no exceptions. Therefore LFENCE alone is useless for memory ordering.
To support the last claim, I looked at all places where LFENCE is mentioned in all 3 volumes of Intel's manual, and found none that says LFENCE is required for memory consistency. Even MOVNTDQA, the only non-temporal load instruction so far, mentions MFENCE but not LFENCE.
Update: see the answers to "Why is (or isn't?) SFENCE + LFENCE equivalent to MFENCE?" for correct answers to the guesswork below.
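To illustrate the point above that SFENCE is mainly useful together with non-temporal stores, here is a hedged sketch using compiler intrinsics. The names `payload`, `ready`, `produce`, `try_consume` and `publish_and_read` are hypothetical. `_mm_stream_si32` and `_mm_sfence` are the usual intrinsics for MOVNTI and SFENCE; the fallback branch for non-SSE2 targets is only there so that the sketch compiles everywhere. Non-temporal stores sit outside the C++ memory model, so treat this as an illustration of the hardware rule rather than as portable, standard-blessed code.

```cpp
#include <atomic>
#include <cassert>
#if defined(__SSE2__) || defined(_M_X64)
#include <immintrin.h>
#define HAS_X86_FENCES 1
#else
#define HAS_X86_FENCES 0
#endif

int payload = 0;                 // data written with a non-temporal store on x86
std::atomic<bool> ready{false};  // flag that publishes the payload

void produce(int value) {
#if HAS_X86_FENCES
    _mm_stream_si32(&payload, value);  // MOVNTI: weakly ordered (non-temporal) store
    _mm_sfence();                      // SFENCE: keep it ordered before later stores
#else
    payload = value;                   // assumption: ordinary store on other targets
#endif
    ready.store(true, std::memory_order_release);  // plain MOV on x86
}

bool try_consume(int& out) {
    if (!ready.load(std::memory_order_acquire)) return false;
    out = payload;
    return true;
}

// Single-threaded demonstration of the producer/consumer pair.
int publish_and_read(int value) {
    produce(value);
    int out = 0;
    bool ok = try_consume(out);
    assert(ok);
    return out;
}
```

For ordinary (non-streaming) stores the SFENCE would be redundant on x86, since such stores are already not reordered with each other.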
Whether MFENCE is equivalent to a "sum" of the other two fences is a tricky question. At first glance, among the three fence instructions only MFENCE provides a StoreLoad barrier, i.e. prevents reordering of reads with earlier writes. However, the correct answer requires knowing more than the above rules; namely, it is important that all fence instructions are ordered with respect to each other. This makes the SFENCE LFENCE sequence more powerful than a mere union of the individual effects: this sequence also prevents StoreLoad reordering (because loads cannot pass LFENCE, which cannot pass SFENCE, which cannot pass stores), and thus constitutes a full memory fence (but also see the note (*) below). Note, however, that order matters here, and the LFENCE SFENCE sequence does not have the same synergy effect.
However, while one can say that MFENCE ~ SFENCE LFENCE and LFENCE ~ NOP, that does not mean MFENCE ~ SFENCE. I deliberately use equivalence (~) and not equality (=) to stress that arithmetic rules do not apply here. The mutual effect of SFENCE followed by LFENCE makes the difference; even though loads are not reordered with each other, LFENCE is required to prevent reordering of loads with SFENCE.
(*) It still might be correct to say that MFENCE is stronger than the combination of the other two fences. In particular, a note on the CLFLUSH instruction in volume 2 of Intel's manual says that "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to be ordered by any other fencing or serializing instructions or by another CLFLUSH instruction."
(Update: clflush is now defined as strongly ordered (like a normal store, so you only need mfence if you want to block later loads), whereas clflushopt is weakly ordered but can be fenced by sfence.)
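To make the StoreLoad case discussed above concrete, here is a sketch of the classic store-buffer litmus test written with C++11 atomics. The function names `both_read_zero` and `count_forbidden` are hypothetical. With `memory_order_seq_cst` the outcome "both threads read 0" is forbidden by the language, which is why the compiler has to emit a full barrier (MFENCE or a locked instruction) somewhere between each store and the following load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One run of the store-buffer litmus test.
// Returns true if both threads read 0, i.e. each load was effectively
// performed before the other thread's store became visible.
bool both_read_zero() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);   // STORE
        r1 = y.load(std::memory_order_seq_cst);  // LOAD that must not pass the STORE
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how often the forbidden outcome was seen.
int count_forbidden(int trials) {
    assert(trials >= 0);
    int hits = 0;
    for (int i = 0; i < trials; ++i) {
        if (both_read_zero()) ++hits;
    }
    return hits;
}
```

With seq_cst ordering `count_forbidden` should always return 0. If the stores were weakened to `memory_order_release` and the loads to `memory_order_acquire`, both would typically compile to plain MOVs on x86, and the "both read 0" outcome would then be allowed, because x86 may reorder a load with an earlier store to a different location.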