Commonly, a cache line is 64B, but the write atomicity of non-volatile memory is 8B.
For example:
x[1]=100;
x[2]=100;
clflush(x);
x is cacheline aligned, and is initially set to 0.
The system crashes during clflush().
Is it possible that x[1]=0 and x[2]=100 after reboot?
Under the following assumptions:
The global observability order of stores may differ from the persist order on Intel x86 processors. This is referred to as relaxed persistency. The only case in which the order is guaranteed to be the same is for a sequence of stores of type WB to the same cache line (but a store reaching GO doesn't necessarily mean it has become durable). This is because CLFLUSH is atomic and WB stores cannot be reordered in global observability. See: On x86-64, is the “movnti” or "movntdq" instruction atomic when system crash?
If the two stores cross a cache line boundary or if the effective memory type of the stores is WC:
The x86-TSO memory model doesn't allow reordering stores, so it's impossible for another agent to observe x[2] == 100 and x[1] != 100 during normal operation (i.e., in the volatile state without a crash). However, if the system crashed and rebooted, it's possible for the persistent state to be x[2] == 100 and x[1] != 100. This is possible even if the system crashed after retiring clflush, because the retirement of clflush doesn't necessarily mean that the flushed cache line has reached the persistence domain.
If you want to eliminate that possibility, you can either move clflush so that it sits between the two stores. clflush on Intel processors is ordered with respect to all writes, meaning that the line is guaranteed to reach the persistence domain before any later stores become globally observable. See: Persistent Memory Programming Primary (PDF) and the Intel SDM V2. The second store could be to the same line or any other line.
If you want x[1]=100 to become persistent before x[2]=100 becomes globally observable, add sfence after clflush on Intel CSX or mfence on AMD processors (clflush is only ordered by mfence on AMD processors). clflush by itself is sufficient to control persist order.
Alternatively, use the sequence clflushopt+sfence (or clwb+sfence) between the two stores. In this case, if a crash happened and if x[2] == 100 in the persistent state, then it's guaranteed that x[1] == 100. clflushopt by itself doesn't impose any persist ordering.
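The ordering described above can be sketched in C roughly as follows. This is only an illustration under assumptions, not a definitive implementation: it assumes an x86-64 target, a 64-byte cache line, and a compiler that provides the SSE2 intrinsics _mm_clflush and _mm_sfence. The names line_t, same_cache_line and ordered_update are hypothetical helpers introduced for this example. Ordinary DRAM is used here, so the code shows the instruction sequence only; it cannot demonstrate durability.

```c
#include <emmintrin.h>  /* _mm_clflush (SSE2); also pulls in _mm_sfence */
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* Hypothetical type: eight 8-byte slots filling one aligned cache line. */
typedef struct {
    _Alignas(CACHE_LINE) uint64_t slot[8];
} line_t;

/* Hypothetical helper: nonzero if both addresses fall in the same line. */
static int same_cache_line(const void *a, const void *b)
{
    return ((uintptr_t)a / CACHE_LINE) == ((uintptr_t)b / CACHE_LINE);
}

/* Hypothetical helper: store slot[1], flush and fence, then store slot[2]. */
static void ordered_update(line_t *x, uint64_t first, uint64_t second)
{
    x->slot[1] = first;
    _mm_clflush(x);   /* flush placed between the two stores */
    _mm_sfence();     /* sfence as suggested for Intel; the answer says
                         AMD needs mfence (_mm_mfence) to order clflush */
    x->slot[2] = second;
    _mm_clflush(x);   /* flush again so the second store is written back */
    _mm_sfence();
}
```

For the clflushopt or clwb variant, the intrinsics _mm_clflushopt and _mm_clwb from immintrin.h could be swapped in for _mm_clflush, assuming the compiler is given the matching target flags (for example -mclflushopt or -mclwb on GCC and Clang) and the CPU supports them. In that variant the sfence is required, since clflushopt by itself doesn't impose any persist ordering.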