I have some immutable data structures that I would like to manage using reference counts, sharing them across threads on an SMP system.
Here's what the release code looks like:
    void avocado_release(struct avocado *p)
    {
        if (atomic_dec(p->refcount) == 0) {
            free(p->pit);
            free(p->juicy_innards);
            free(p);
        }
    }
Does atomic_dec need a memory barrier in it? If so, what kind of memory barrier?
Additional notes: The application must run on PowerPC and x86, so any processor-specific information is welcome. I already know about the GCC atomic builtins. As for immutability, the refcount is the only field that changes over the lifetime of the object.
On x86, it will turn into a lock-prefixed assembly instruction, like LOCK XADD. Being a single instruction, it is non-interruptible. As an added "feature", the lock prefix results in a full memory barrier:
"A memory barrier is in fact implemented as a dummy LOCK OR or LOCK AND in both the .NET and the Java JIT on x86/x64, because mfence is slower on many CPUs even when it's guaranteed to be available, like in 64-bit mode." (Does lock xchg have the same behavior as mfence?)
So you have a full fence on x86 as an added bonus, whether you like it or not. :-)
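For reference, here is a minimal sketch of what atomic_dec could look like with C11 atomics. The question does not show its definition, so the signature (pointer in, new value out) is an assumption chosen to match the `== 0` test in the release code. Even with the weakest ordering requested, an x86 compiler still has to emit a lock-prefixed instruction for the read-modify-write, which is where the "free" fence comes from.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the question's atomic_dec:
 * atomically subtracts one and returns the NEW value. */
static int atomic_dec(atomic_int *counter)
{
    /* atomic_fetch_sub_explicit returns the value held BEFORE the
     * subtraction. memory_order_relaxed asks for atomicity only,
     * with no ordering of surrounding memory accesses. */
    int old = atomic_fetch_sub_explicit(counter, 1, memory_order_relaxed);

    assert(old > 0);   /* dropping below zero would be a refcount bug */
    return old - 1;
}
```

With a counter that starts at 2, the first call returns 1 and the second returns 0.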
On PPC, it is different. An LL/SC pair (lwarx and stwcx.) with a subtraction in between loads the memory operand into a register, subtracts one, then either writes it back if there was no other store to the target location, or retries the whole loop if there was. An LL/SC sequence can be interrupted, in which case the store fails and the loop retries. It also does not imply an automatic full fence.
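The shape of that retry loop can be modelled in portable C with a weak compare-and-swap, which C11 allows to fail spuriously, much as a store-conditional can. This is only an analogy to show the load / modify / conditional-store / retry structure; dec_with_retry is a made-up name, and a real compiler would normally generate the lwarx/stwcx. loop itself from a plain atomic subtract.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: decrement through an explicit retry loop
 * and return the new value. */
static int dec_with_retry(atomic_int *counter)
{
    /* "Load" step: read the current value. */
    int seen = atomic_load_explicit(counter, memory_order_relaxed);

    /* "Conditional store" step: succeeds only if the counter still
     * holds `seen`. On failure the current value is written back
     * into `seen`, so the next attempt uses fresh data. */
    while (!atomic_compare_exchange_weak_explicit(
               counter, &seen, seen - 1,
               memory_order_relaxed, memory_order_relaxed)) {
        /* lost a race or failed spuriously: just retry */
    }

    assert(seen > 0);  /* dropping below zero would be a refcount bug */
    return seen - 1;
}
```

Relaxed ordering is used throughout to mirror the point that the loop by itself gives atomicity but no fence.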
This does not however compromise the atomicity of the counter in any way.
It just means that in the x86 case, you happen to get a fence as well, "for free".
On PPC, one can insert a (partial or) full fence by emitting a (lw)sync instruction.
All in all, explicit memory barriers are not necessary for the atomic counter to work properly.
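If you do want ordering around the decrement in addition to atomicity, the fences can be written portably rather than as inline assembly. The following is a sketch under assumptions: the struct layout is guessed (the question does not show it), and whether your code needs these fences is a separate matter from the counter itself being atomic. To my understanding, compilers usually turn both fences into lwsync on PowerPC and into no instruction at all on x86.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Assumed layout: the question does not show the struct definition. */
struct avocado {
    atomic_int refcount;
    void *pit;
    void *juicy_innards;
};

void avocado_release(struct avocado *p)
{
    /* Release fence: this thread's earlier accesses to *p are
     * ordered before the decrement below. */
    atomic_thread_fence(memory_order_release);

    /* The decrement itself only needs to be atomic. */
    int old = atomic_fetch_sub_explicit(&p->refcount, 1,
                                        memory_order_relaxed);
    assert(old > 0);   /* releasing an already-dead object is a bug */

    if (old == 1) {
        /* Acquire fence: the frees are ordered after the final
         * decrement, pairing with other threads' release fences. */
        atomic_thread_fence(memory_order_acquire);
        free(p->pit);
        free(p->juicy_innards);
        free(p);
    }
}
```

Only the thread that sees the count go from 1 to 0 pays for the acquire fence; every other release takes just the release fence and the atomic subtract.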