I'm trying to build a single-producer/single-consumer, lock-free, thread-safe ring buffer.
Here's the code:
```cpp
#include <cstddef> // size_t
#include <iostream>

struct RingBuffer {
    static constexpr size_t S = 2;
    int data[S];
    size_t start = 0;
    size_t end = 0;

    size_t mask(size_t i) const {
        return i & (S - 1);
    }
    bool full() const {
        return end - start == S;
    }
    size_t size() const {
        return end - start;
    }
    void push(int t) {
        size_t i = mask(end);
        data[i] = t;
        end++;
    }
    int shift() {
        return data[mask(start++)];
    }
};

int main()
{
    RingBuffer ringBuffer;
    // emulate t1 that will write
    if (!ringBuffer.full()) {
        int dataToWrite = 10;
        ringBuffer.push(dataToWrite);
    }
    // emulate t2 that will read
    if (ringBuffer.size() > 0) {
        int dataRead = ringBuffer.shift();
        std::cout << dataRead << std::endl;
    }
}
```
Writes to the buffer will be performed by a single thread, t1. Reads from the buffer will be performed by a single thread, t2.

From what I'm learning, the only concurrency problem here could be in the `shift()` method:

```cpp
return data[mask(start++)];
```

because the order of operations must be:

1. do `mask()` with the current `start` value
2. return `data[]` at the index returned from point 1
3. then, increment `start`

But the code actually will do 1-3-2, not 1-2-3. The questions are:

- is it possible to do 1-2-3 with this kind of code?
- using `-O3` optimization (gcc), can the order of the operations be changed, making the whole thing undefined? (i.e. on `push()`, can `end++` be moved before `data[i] = t`?)
Generally, using a thread-safe ring buffer also requires accessing it in a thread-safe manner. In `main()`, the check for `full()` is separate from `push()`, which makes it impossible for even a perfect `ringBuffer` to ensure thread-safety. Because of the one-producer and one-consumer restriction, there is no problem with this approach. However, the consistency of the ring buffer itself is at the mercy of a well-written producer and consumer. For example, if the producer calls `push()` five times in a row without checking `full()` before each, values will be overwritten without ever being read, even with a single producer and consumer.

## Attacking the thread-safety problem
The thread-safety problem is twofold. First, the values that are stored in `data` in one thread are not necessarily visible in another, thanks to the memory model of modern languages and CPUs. This is also true for the state of the buffer that is described by `start` and `end`. In addition, access to the state is unprotected, leaving room for the compiler and CPU to reorder things even further, leading to race conditions!

Fortunately, C++ offers language features to amend these problems. IMHO, studying what `<atomic>` has to offer, especially `std::atomic_thread_fence`, `std::atomic_flag` and the atomic types, is worthwhile!
## Lock-free implementation
Not using a lock requires a trade-off where `push()` and `shift()` "fail a little more often than necessary", to ensure that values are not overwritten and that no garbage is read. In general this is complex and very hard to explain. Luckily, thanks to the one-producer and one-consumer restriction, together with `start` and `end` only ever increasing, the situation can be fixed more easily. The result can then be applied in `main()` with one producer thread and one consumer thread.

## Using a spin-lock?
Depending on your definition of "lock-free", it could be allowed to use a spinlock to guard the ring buffer state, using a busy loop to set a flag if and only if it was not set yet.
The advantage of using the spinlock is that the number of producer and consumer threads is unimportant: the implementation serves a more general use case. The drawback is busy waiting. While busy waiting is generally considered a bad pattern, the design and context of the consumer and producer can greatly affect whether that is a problem. Imagine a completely lock-free ring buffer and a consumer as follows:
The busy-waiting has moved from the ring buffer itself to the consumer code, which still halts progress until something is available. In this case, using a ring buffer that does use a spin lock is probably okay.
The spinlock is used to ensure that only one thread can change the state of the ring buffer, consisting of {`start`, `end`}. It uses a `std::atomic_flag` for that. To ensure that the data written to elements of `data` is visible to other threads, a `std::atomic_thread_fence` is used. Here's the code; it omits the `full()` and `size()` methods.

Splitting the check in `full()` and `size()` from the actual `push()` and `shift()` can work, but it still requires a lock in both `full()` and `push()`, as the lock also makes the variables `start` and `end` visible in both threads. Another solution is to expose the `SpinlockAndFence`; however, that puts a lot of responsibility at the call-site.