Feature or enhancement
Currently, it's not safe to use critical sections within stop-the-world (STW) pauses because doing so risks deadlock. The risk isn't obvious and actual deadlocks happen infrequently, which makes our unit tests less effective at catching this sort of bug.
Deadlock
Normally, when a thread pauses for a stop-the-world event, it releases all the critical sections that it holds when the thread state is detached (_PyThreadState_Detach). The problem is that when a lock has been contended for a certain duration (TIME_TO_BE_FAIR_NS), ownership is handed off directly to a waiting thread rather than the lock being released. This handoff can happen after the waiting thread has detached and reached a safe point for a stop-the-world event. The waiting thread then owns the lock while it is paused, so the thread that requested the pause blocks forever if it tries to enter a critical section on the same lock.
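To make the handoff scenario concrete, here is a minimal single-threaded toy model in C. It is not CPython code: `toy_thread`, `toy_lock`, `toy_unlock`, `toy_would_deadlock`, and the `TOY_FAIR_NS` value are all hypothetical names and numbers made up for illustration, and the model only tracks who owns the lock rather than doing any real blocking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Arbitrary toy threshold standing in for TIME_TO_BE_FAIR_NS. */
enum { TOY_FAIR_NS = 1000 };

/* Hypothetical stand-in for a thread: only tracks whether it is
 * parked at a safe point for a stop-the-world pause. */
typedef struct {
    int id;
    bool paused_for_stw;
} toy_thread;

/* Hypothetical stand-in for a lock with at most one waiter. */
typedef struct {
    toy_thread *owner;   /* NULL when the lock is free */
    toy_thread *waiter;  /* NULL when nobody is waiting */
    long waited_ns;      /* how long the waiter has been waiting */
} toy_lock;

/* Unlock with a fairness rule: if the waiter has waited at least
 * TOY_FAIR_NS, ownership is handed to it directly and the lock is
 * never observed as free. Otherwise the lock is simply released. */
static void toy_unlock(toy_lock *l)
{
    if (l->waiter != NULL && l->waited_ns >= TOY_FAIR_NS) {
        l->owner = l->waiter;
        l->waiter = NULL;
        l->waited_ns = 0;
    }
    else {
        l->owner = NULL;
    }
}

/* True if thread t could never acquire the lock during the pause:
 * another thread owns it and that owner is paused for the STW. */
static bool toy_would_deadlock(const toy_lock *l, const toy_thread *t)
{
    return l->owner != NULL && l->owner != t && l->owner->paused_for_stw;
}
```

In this model, if thread A holds the lock, thread B has waited past the threshold and is already paused, and A then unlocks (as it would when detaching), B ends up as the owner. A third thread that requested the pause and now wants the lock has `toy_would_deadlock` return true. Below the threshold the lock is released normally and the requester can proceed.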
Proposal
The slow paths _PyCriticalSection_BeginSlow and _PyCriticalSection2_BeginSlow should check if the interpreter is in a stop-the-world pause. If so, they should return without acquiring the lock, as we already do in the "optimisation for locking the same object recursively".
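A rough sketch of the shape this check could take, written against toy types rather than the real internals. Everything here is an assumption for illustration: `toy_interp`, `toy_tstate`, `toy_critical_section`, the `stop_the_world_requested` flag, and the convention that a NULL mutex pointer marks a critical section with nothing to release. The actual field names and the way the slow paths detect a pause may well differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not CPython's real structs. */
typedef struct { bool locked; } toy_mutex;
typedef struct { bool stop_the_world_requested; } toy_interp;

typedef struct toy_critical_section {
    toy_mutex *mutex;                   /* NULL: nothing to release at end */
    struct toy_critical_section *prev;  /* previously active section */
} toy_critical_section;

typedef struct {
    toy_interp *interp;
    toy_critical_section *top;          /* innermost active section */
} toy_tstate;

/* Slow path for entering a critical section. */
static void toy_begin_slow(toy_tstate *ts, toy_critical_section *cs,
                           toy_mutex *m)
{
    if (ts->interp->stop_the_world_requested) {
        /* The world is stopped, so no other thread is running.
         * Skip the lock and mark the section as a no-op. */
        cs->mutex = NULL;
        cs->prev = NULL;
        return;
    }
    cs->mutex = m;
    cs->prev = ts->top;
    ts->top = cs;
    m->locked = true;  /* a real lock would block here if contended */
}

/* Leaving a critical section: a no-op if begin skipped the lock. */
static void toy_end(toy_tstate *ts, toy_critical_section *cs)
{
    if (cs->mutex == NULL) {
        return;
    }
    cs->mutex->locked = false;
    ts->top = cs->prev;
}
```

The point of the sketch is that the early return leaves the section in a state that the end path already knows how to ignore, so callers need no changes and a lock owned by a paused thread is never waited on.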
I think we only care about per-interpreter stop-the-world pauses here. We only use cross-interpreter STW pauses (_PyEval_StopTheWorldAll) in a few places, like before os.fork().