Does Folly's hazard pointer implementation use an incorrect memory barrier?

Folly's hazard pointer implementation can be simplified as follows (assuming the asymmetric memory barrier on Linux is used):

Atomic<T*> source_ptr;

1. writer/updater (retire operation):
    old_ptr = source_ptr.load();
    source_ptr.exchange(....);
    light barrier (compiler barrier only);
    move old_ptr to the retirement list;

2. consumer/reader (try_protect operation):
    ptr = source_ptr.load();
    store ptr in this thread's hazard pointer;
    light barrier (compiler barrier only);
    if (source_ptr.load() == ptr) {
        success; start to use ptr;
    } else {
        fail; clear the hazard pointer and retry from the load of source_ptr;
    }

3. reclaim:
    read the whole retirement list;
    heavy barrier (membarrier() call);
    read all hazard pointers;
    reclaim (delete) every ptr that is in the retirement list but not in any hazard pointer.
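
To make the three steps concrete, here is a minimal C++ sketch of the same simplification. It is not Folly's code: `Node`, `kSlots`, `hazard_slots`, `retired`, `replace_and_retire`, `protect`, `clear` and `reclaim` are names made up for illustration, the retirement list assumes a single updater that also runs the reclaim step, and `heavy_barrier()` is only a portable stand-in that does not give the guarantee of a real membarrier() call.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cstddef>
#include <vector>

struct Node { int value; };

constexpr std::size_t kSlots = 4;                       // hypothetical fixed number of reader slots
std::atomic<Node*> source_ptr{nullptr};
std::array<std::atomic<Node*>, kSlots> hazard_slots{};  // one hazard pointer per reader
std::vector<Node*> retired;                             // assumption: single updater, no locking

// Light side: restrains only the compiler, emits no CPU fence.
inline void light_barrier() { std::atomic_signal_fence(std::memory_order_seq_cst); }

// Heavy side: stand-in only. An asymmetric scheme would call membarrier() here;
// a local fence is NOT an equivalent replacement for it.
inline void heavy_barrier() { std::atomic_thread_fence(std::memory_order_seq_cst); }

// Step 1: swap in a new pointer and retire the old one.
void replace_and_retire(Node* fresh) {
    Node* old_ptr = source_ptr.exchange(fresh, std::memory_order_acq_rel);
    light_barrier();
    if (old_ptr != nullptr) retired.push_back(old_ptr);
}

// Step 2: publish the pointer in a hazard slot, then re-check the source.
Node* protect(std::size_t slot) {
    for (;;) {
        Node* ptr = source_ptr.load(std::memory_order_relaxed);
        hazard_slots[slot].store(ptr, std::memory_order_release);
        light_barrier();
        if (source_ptr.load(std::memory_order_acquire) == ptr) return ptr;
        // The source changed in between: loop and try again.
    }
}

void clear(std::size_t slot) {
    hazard_slots[slot].store(nullptr, std::memory_order_release);
}

// Step 3: free every retired pointer that no hazard slot holds.
// Returns the number of nodes deleted; protected nodes stay retired.
std::size_t reclaim() {
    std::vector<Node*> candidates;
    candidates.swap(retired);                 // read the retirement list
    heavy_barrier();                          // then the heavy barrier
    std::vector<Node*> hazards;               // then read all hazard pointers
    for (auto& slot : hazard_slots) {
        if (Node* h = slot.load(std::memory_order_acquire)) hazards.push_back(h);
    }
    std::size_t freed = 0;
    for (Node* p : candidates) {
        if (std::find(hazards.begin(), hazards.end(), p) != hazards.end()) {
            retired.push_back(p);             // still protected: keep for a later pass
        } else {
            delete p;
            ++freed;
        }
    }
    return freed;
}
```

Run single-threaded, a node that is still held in a hazard slot survives `reclaim()` and is freed only after `clear()` releases it. The question below is about what happens when the three steps run on different CPUs.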

The membarrier() call in step 3 can synchronize with steps 1 and 2 above, but there is no synchronization between steps 1 and 2 themselves.
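
For reference, the light/heavy pair can be sketched on Linux roughly as below. This is an illustration under assumptions, not a copy of Folly's code: the helper names `membarrier_call`, `register_heavy_barrier`, `heavy_barrier` and `light_barrier` are hypothetical, and I am assuming the private expedited membarrier command, which requires a one-time registration per process and a kernel that supports it.

```cpp
#include <atomic>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

// Thin wrapper over the raw system call; returns -1 and sets errno on failure.
inline long membarrier_call(int cmd, unsigned int flags) {
    return syscall(__NR_membarrier, cmd, flags, 0);
}

// Must succeed once per process before the private expedited command can be used.
bool register_heavy_barrier() {
    return membarrier_call(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) == 0;
}

// Heavy side (reclaimer): asks the kernel to make every running thread of this
// process execute a full memory barrier before the call returns.
bool heavy_barrier() {
    return membarrier_call(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) == 0;
}

// Light side (updater and reader): only stops compiler reordering, no CPU fence.
inline void light_barrier() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}
```

Both functions return `false` when the kernel rejects the call (for example, when membarrier is unavailable), so a caller would need some fallback in that case. The point of the asymmetry is that readers and updaters pay nothing at run time, while the rare reclaim pays for the system call.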

So I am wondering if the following could happen:

source ptr: the pointer that a hazard pointer can protect
thread 1: the updater thread that tries to delete the source ptr
thread 2: the consumer thread that tries to read the source ptr
thread 3: the thread that does the memory reclaim (frees memory from the retirement list)

At start:
source ptr value = PTR_A

At time 0:
thread 1 (updater):
changes the source ptr value from PTR_A to PTR_B and puts PTR_A in the retirement list (retire operation).
Suppose these results are not yet visible to the consumer thread (thread 2) but are visible to the reclaim thread (thread 3),
because the updater only issues the light barrier.

At time 1:
thread 3 (reclaim thread):
reads the retirement list and finds PTR_A in it,
then issues the heavy barrier, i.e. the membarrier() call.
Suppose the call first sends an IPI to thread 2's CPU, and that CPU finishes handling it.

At time 2:
thread 2 (consumer thread):
reads the old value (PTR_A) from source ptr (the new value is not visible to it yet)
and calls try_protect to protect the pointer.
The hazard pointer it sets is not yet visible to other CPUs, again because there is only a compiler barrier. Thread 2 starts to use PTR_A.

At time 3:
the heavy barrier issued by thread 3 at time 1 now reaches thread 1's CPU and completes.
Thread 3 then starts to collect the hazard pointer list,
but the hazard pointer set by thread 2 at time 2 is still not visible to thread 3, so thread 3 does not see it.
Thread 3 now sees PTR_A in the retirement list but not in the hazard pointer list, so it goes ahead and deletes PTR_A while thread 2 is still using it.

Is this a bug, or am I missing something?
