My question refers specifically to why it was designed that way, due to the unnecessary performance implication.
When thread T1 has this code:
cv.acquire()
cv.wait()
cv.release()
and thread T2 has this code:
cv.acquire()
cv.notify() # requires that lock be held
cv.release()
what happens is that T1 waits and releases the lock, then T2 acquires it and notifies cv, which wakes up T1. Now there is a race condition between T2's release and T1's reacquisition of the lock as it wakes up inside wait(). If T1 tries to reacquire first, it will be unnecessarily resuspended until T2's release() is completed.
Note: I'm intentionally not using the with statement, to better illustrate the race with explicit calls.
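For reference, here is a minimal runnable sketch of the two threads above. The `ready` flag and the `events` list are additions for illustration only (they are not part of the original snippet); the flag makes the outcome independent of which thread happens to run first. The sketch also shows that calling notify() without holding the lock raises RuntimeError.

```python
import threading

cv = threading.Condition()
ready = False   # hypothetical predicate, so T1 never misses the notification
events = []     # records what each thread did, in order


def t1():
    cv.acquire()
    try:
        while not ready:   # re-check the predicate after every wake-up
            cv.wait()      # releases the lock, sleeps, reacquires before returning
        events.append("T1 proceeds, holding the lock")
    finally:
        cv.release()


def t2():
    global ready
    cv.acquire()
    try:
        ready = True
        cv.notify()        # a waiting T1 is woken here but cannot proceed yet
        events.append("T2 notified, still holding the lock")
    finally:
        cv.release()       # only now can T1 reacquire and return from wait()


# notify() without holding the lock is rejected outright.
try:
    cv.notify()
    unlocked_notify_error = None
except RuntimeError as exc:
    unlocked_notify_error = exc

waiter = threading.Thread(target=t1)
notifier = threading.Thread(target=t2)
waiter.start()
notifier.start()
waiter.join()
notifier.join()

print(events)
print("notify() without the lock:", repr(unlocked_notify_error))
```

Whatever the scheduling, T1 can only proceed after T2 has released the lock, which is exactly the window in which the extra suspension described above can occur.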
This seems like a design flaw. Is there any rationale known for this, or am I missing something?
This is not a definitive answer, but it's supposed to cover the relevant details I've managed to gather about this problem.
First, Python's threading implementation is based on Java's. Java's Condition.signal() documentation reads:

Now, the question was why enforce this behavior in Python in particular. But first I want to cover the pros and cons of each approach.
As to why some think it's often a better idea to hold the lock, I found two main arguments:
From the moment a waiter acquire()s the lock (that is, before it releases the lock in wait()), it is guaranteed to be notified of signals. If the corresponding release() happened prior to signalling, this would allow the sequence (where P = Producer and C = Consumer) P: release(); C: acquire(); P: notify(); C: wait(), in which case the wait() corresponding to the acquire() of the same flow would miss the signal. There are cases where this doesn't matter (and could even be considered more accurate), but there are cases where it is undesirable. This is one argument.

When you notify() outside a lock, this may cause a scheduling priority inversion; that is, a low-priority thread might end up taking priority over a high-priority thread. Consider a work queue with one producer and two consumers (LC = low-priority consumer and HC = high-priority consumer), where LC is currently executing a work item and HC is blocked in wait().

The following sequence may occur:
Whereas if the notify() happened before release(), LC wouldn't have been able to acquire() before HC had been woken up. This is where the priority inversion occurred. This is the second argument.

The argument in favor of notifying outside of the lock is for high-performance threading, where a thread need not go back to sleep just to wake up again the very next time slice it gets, which, as already explained in my question, is what can happen here.
Python's threading Module

In Python, as I said, you must hold the lock while notifying. The irony is that the internal implementation does not allow the underlying OS to avoid priority inversion, because it enforces a FIFO order on the waiters. Of course, the fact that the order of waiters is deterministic could come in handy, but the question remains why enforce such a thing when it could be argued that it would be more precise to differentiate between the lock and the condition variable: in some flows that require optimized concurrency and minimal blocking, acquire() should not by itself register a preceding waiting state; only the wait() call itself should.

Arguably, Python programmers would not care about performance to this extent anyway, although that still doesn't answer the question of why, when implementing a standard library, one should not allow several standard behaviors to be possible.
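The FIFO behaviour mentioned above can be observed with a small experiment. This is only a sketch: it relies on how CPython's Condition happens to queue its waiters, which is an implementation detail rather than a documented guarantee, and the helper `wait_until` and the lists `registered` and `woken` are names invented for the demo.

```python
import threading
import time

cv = threading.Condition()
registered = []  # waiter ids, appended under the lock just before wait()
woken = []       # waiter ids, appended under the lock right after wait() returns


def waiter(i):
    with cv:
        registered.append(i)
        cv.wait()
        woken.append(i)


def wait_until(predicate):
    # Hypothetical helper: poll the predicate while holding the lock.
    while True:
        with cv:
            if predicate():
                return
        time.sleep(0.001)


threads = []
for i in range(3):
    t = threading.Thread(target=waiter, args=(i,))
    t.start()
    threads.append(t)
    # wait() queues the waiter before releasing the lock, so once we can
    # take the lock and see id i registered, waiter i is already queued.
    wait_until(lambda: len(registered) == i + 1)

for k in range(3):
    with cv:
        cv.notify()  # wakes at most one waiter
    wait_until(lambda: len(woken) == k + 1)

for t in threads:
    t.join()

print(woken)  # on CPython this is expected to be [0, 1, 2]
```

The waiters are released in the order in which they started waiting, regardless of any priority the OS scheduler might otherwise have applied.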
One thing which remains to be said is that the developers of the threading module might have specifically wanted a FIFO order for some reason, and found that this was somehow the best way of achieving it, and wanted to establish that as a Condition at the expense of the other (probably more prevalent) approaches. For this, they deserve the benefit of the doubt until they might account for it themselves.