Commonly, a cache line is 64 B, but the atomicity of non-volatile memory is 8 B.
For example:
x[1]=100;
x[2]=100;
clflush(x);
x is cache-line aligned and is initially set to 0.
The system crashes during clflush().
Is it possible that x[1]=0 and x[2]=100 after reboot?
Under the following assumptions:
The global observability order of stores may differ from the persist order on Intel x86 processors. This is referred to as relaxed persistency. The only case in which the order is guaranteed to be the same is for a sequence of stores of type WB to the same cache line (but a store reaching GO doesn't necessarily mean it has become durable). This is because CLFLUSH is atomic and WB stores cannot be reordered in global observability. See: On x86-64, is the “movnti” or "movntdq" instruction atomic when system crash?.
If the two stores cross a cache line boundary or if the effective memory type of the stores is WC:
The x86-TSO memory model doesn't allow reordering stores, so it's impossible for another agent to observe x[2] == 100 and x[1] != 100 during normal operation (i.e., in the volatile state without a crash). However, if the system crashed and rebooted, it's possible for the persistent state to be x[2] == 100 and x[1] != 100. This is possible even if the system crashed after retiring clflush, because the retirement of clflush doesn't necessarily mean that the flushed cache line has reached the persistence domain.
If you want to eliminate that possibility, you can move clflush so that it sits between the two stores, that is, x[1]=100; clflush(x); x[2]=100;
clflush on Intel processors is ordered with respect to all writes, meaning that the line is guaranteed to reach the persistence domain before any later stores become globally observable. See: Persistent Memory Programming Primary (PDF) and the Intel SDM V2. The second store could be to the same line or any other line.
If you want x[1]=100 to become persistent before x[2]=100 becomes globally observable, add sfence after clflush on Intel CSX or mfence on AMD processors (clflush is only ordered by mfence on AMD processors). clflush by itself is sufficient to control persist order.
Alternatively, use the sequence clflushopt+sfence (or clwb+sfence) between the two stores in place of clflush. In this case, if a crash happened and if x[2] == 100 in the persistent state, then it's guaranteed that x[1] == 100. clflushopt by itself doesn't impose any persist ordering.