Threads A and B are executing concurrently. Which ARMv8-A memory barrier types (like DMB, DSB) are sufficient to fulfill the postcondition, and why?
Initially x1 = 0, x2 = 0
Thread A | Thread B
----------------------------------
x1 = 1 | x2 = 1
barrier | barrier
y1 = x2 | y2 = x1
Postcondition: (y1 == 1) || (y2 == 1)
I looked at the ARMv8-A Architecture Reference Manual memory model definition of DMB and DSB, but could not deduce an argument why the postcondition would hold even with the DSB memory barrier. I think the key definitions in the Architecture Reference Manual are:
The DMB instruction ensures that all affected memory accesses by the PE executing the DMB that appear in program order before the DMB and those which originate from a different PE [...] which have been Observed-by the PE before the DMB is executed, are Observed-by each PE [...] before any affected memory accesses that appear in program order after the DMB are Observed-by that PE.
and
A DSB executed by a PE [...] completes when all of the following apply:
All explicit memory accesses of the required access types appearing in program order before the DSB are complete for the set of observers in the required shareability domain.
[...]
and
In addition, no instruction that appears in program order after the DSB instruction can alter any state of the system or perform any part of its functionality until the DSB completes other than [...]
Unix smurf wrote a series on ARM memory barriers. The DSB is a superset of the DMB; in other words, the DSB is more restrictive. The DMB is sufficient to ensure that the writes to x1 and x2 are complete before y1 or y2 is updated when using normal memory. That is, it is a sufficient substitute for barrier in your example on most ARM systems, and DSB also works.
An OS can use different properties in the MMU tables, and this could affect your results. For instance, if graphics RAM or a network device's buffer is the backing store for x1 and x2, you may not need to issue a dmb, or the dmb may need different parameters, as these types of memory may be put in a different domain.
In fact, an OS can probably completely subvert these mechanisms. This won't be a factor for most use cases, and I just state it to be complete. It is also possible to have AMP (asymmetric multi-processor) systems where this won't work, for instance a system with an ARMv8 core and a Cortex-M.
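To try the example from the question on real hardware, a small store-buffering test can help. The following is only a minimal sketch in C++, under several assumptions: x1 and x2 live in normal cacheable memory shared by both threads, and std::atomic_thread_fence(std::memory_order_seq_cst) stands in for barrier (on AArch64, GCC and Clang typically lower it to a full dmb ish, but check the generated assembly for your toolchain). The helper names run_trial and count_violations are made up for illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the two-thread example once and reports whether
// the postcondition (y1 == 1) || (y2 == 1) held.
bool run_trial() {
    std::atomic<int> x1{0}, x2{0};
    int y1 = 0, y2 = 0;

    std::thread a([&] {
        x1.store(1, std::memory_order_relaxed);
        // Stand-in for "barrier": a full fence (assumed to become dmb ish).
        std::atomic_thread_fence(std::memory_order_seq_cst);
        y1 = x2.load(std::memory_order_relaxed);
    });
    std::thread b([&] {
        x2.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        y2 = x1.load(std::memory_order_relaxed);
    });

    // join() synchronizes with thread completion, so y1 and y2 are safe to read.
    a.join();
    b.join();
    return y1 == 1 || y2 == 1;
}

// Hypothetical helper: repeats the trial and counts runs where both loads
// returned 0, i.e. where the postcondition was violated.
int count_violations(int trials) {
    int violations = 0;
    for (int i = 0; i < trials; ++i) {
        if (!run_trial()) {
            ++violations;
        }
    }
    return violations;
}
```

With the sequentially consistent fences in place, the C++ memory model itself forbids both loads returning 0, so count_violations should return 0 for any number of trials. Note that a passing run is not a proof for weaker barriers: ordering a store before a later load needs a full barrier, so a store-only or load-only DMB variant is not expected to be enough, even if a particular test run never shows a violation.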
Reference: