Why is this Swift Readers-Writers code causing deadlock?

267 Views Asked by At

I seem to have a classic solution for Readers-Writers problem in Swift using concurrent DispatchQueue with a barrier for writes. However, then I run my test code, it deadlocks.

Here's my wanna-be-thread-safe data structure:

class Container<T> {
    private var _value: T
    private let _queue = DispatchQueue(label: "containerQueue", attributes: .concurrent)
    
    init(_ value: T) {
        self._value = value
    }
    
    var value: T {
        get {
            _queue.sync {
                _value
            }
        }
        set {
            _queue.async(flags: .barrier) {
                self._value = newValue
            }
        }
    }
}

And here's my test code:

class ContainerTest {
    let testQueue = DispatchQueue(label: "testQueue", attributes: .concurrent)
    var container = Container(0)
    let group = DispatchGroup()
    
    func runTest() {
        for i in 0..<1000 {
            testQueue.async(group: group) {
                self.container.value = max(i, self.container.value)
            }
        }
        
        group.notify(queue: .main) {
            print("Finished")
        }
    }
}

The piece of code that's run repeatedly is just some random read and write operations. It's not attempting to produce anything sensible, it's just there to stress-test the data structure.

So when I run this, "Finished" is never printed. However, if I change _queue.async(flags: .barrier) to _queue.sync(flags: .barrier), then I see "Finished" printed.

I'm guessing when I'm using the async write version, I'm getting a deadlock, but why? It's the textbook Readers-Writers solution that's typically used. Perhaps it's my test code that is at fault, but again, why?

2

There are 2 best solutions below

8
Rob Napier On BEST ANSWER

You're almost certainly getting thread-pool exhaustion. This is a pattern Apple engineers repeatedly warn against using because of that risk (I'm aware it's also documented as a common pattern in many places). Each call to .async is spawning a thread, and if there is severe contention (as you're creating), eventually there will be no threads available, and everything will lock up.

In modern Swift, the correct tool for this is an actor. In pre-concurrency Swift, you'd likely want to use an OSAllocatedUnfairLock, though when you find yourself making an atomic property, thinking you mean thread-safe, you are often on the wrong road.

3
Alex On

Rob in his answer gave me some great clues into why my code was not completing: "thread pool exhaustion, not threads available, everything locks up". So I wanted to see exactly why does everything lock up when pool exhaustion happens.

Having learned a bit about thread pools, I would normally expect pool exhaustion to be perhaps bad for performance, but not necessarily a reason for a deadlock, because once some blocks finish executing, they would free up their threads, and allow other blocks to execute. So I didn't see immediately how can this result in a deadlock.

But alas, here's one way this could have resulted in a deadlock:

enter image description here

In the image above I'm trying to depict what the state of my queues was at the time when deadlock happened.

The main queue is "fine", because it dispatched everything asynchronously to the "testQueue".

Now, I just learned about the GCD thread pool – it's a pool of available threads, and GCD can add more threads to it when needed, but there's a limit, which probably depends on the platform and device, but for the sake of example let's say it's max 4 threads.

So in this case, I dispatched 1000 concurrent blocks, and the first 4 got their own thread to execute on, immediately depleting the GCD thread pool.

The remaining 996 blocks have to wait for threads to free up before they can run. This is all fine so far.

Next, the first block that received a thread starts executing. In its code it calls _queue.sync to read the value on the "containerQueue". To run a block on "containerQueue" it needs its own thread, but because our GCD thread pool is empty, it can't run and has to wait for threads to become available.

Now because the block in "containerQueue" was dispatched synchronously, it blocks the "testQueue" block from completing. Now since it can't complete, it can't release its thread back to the GCD thread pool.

So "testQueue" blocks never complete, never release their threads, and "containerQueue" blocks never start because they wait forever for threads to free up.

This is a deadlock.


Now if I pause my program execution and look at my threads, I see a whole bunch of "testQueue" threads, all waiting on a lock, after trying to access Container.value.getter, which seems to support my hypothesis above.

enter image description here


However, this doesn't explain why the deadlock goes away when I change the setter from async to sync. Maybe it's just "luck".