8.1.1 Deadlock

Deadlock occurs when each of a group of threads is holding at least one lock while at the same time waiting on a lock held by a member of the same group.

Without some sort of external intervention, deadlock is forever. No thread can acquire the lock it is waiting on until that lock is released by the thread holding it, but the thread holding it cannot release it until the holding thread acquires the lock that it is waiting on.

Figure: Deadlock Cycle
\resizebox{3in}{!}{\includegraphics{locking/DeadlockCycle}}

We can create a directed-graph representation of a deadlock scenario with nodes for threads and locks, as shown in Figure [*]. An arrow from a lock to a thread indicates that the thread holds the lock, for example, Thread B holds Locks 2 and 4. An arrow from a thread to a lock indicates that the thread is waiting on the lock, for example, Thread B is waiting on Lock 3.

A deadlock scenario will always contain at least one deadlock cycle. In Figure [*], this cycle is Thread B, Lock 3, Thread C, Lock 4, and back to Thread B.

Quick Quiz 8.1: But the definition of deadlock only said that each thread was holding at least one lock and waiting on another lock that was held by some thread. How do you know that there is a cycle? End Quick Quiz

Although there are some software environments such as database systems that can repair an existing deadlock, this approach requires either that one of the threads be killed or that a lock be forcibly stolen from one of the threads. This killing and forcible stealing can be appropriate for transactions, but is problematic for kernel and application-level use of locking.

Kernels and applications therefore work to avoid deadlocks. There are three major approaches, locking hierarchies, conditional locking, and single-lock-at-a-time designs.

Locking hierarchies order the locks and prohibit acquiring locks out of order. In Figure [*], we might order the locks numerically, so that a thread was forbidden from acquiring a given lock if it already held a lock with the same or a higher number. Thread B has violated this hierarchy because it is attempting to acquire Lock 3 while holding Lock 4, which permitted the deadlock to occur.

Again, to apply a locking hierarchy, order the locks and prohibit out-of-order lock acquisition. In large program, it is wise to use tools to enforce your locking hierarchy [Cor06].

Figure: Protocol Layering and Deadlock
\begin{figure}{ \scriptsize
\begin{verbatim}1 spin_lock(&lock2);
2 layer_2_p...
...unlock(&lock2);
7 spin_unlock(&nextlayer->lock1);\end{verbatim}
}\end{figure}

But suppose that there is no reasonable locking hierarchy. This can happen in real life, for example, in layered network protocol stacks where packets flow in both directions. In the networking case, it might be necessary to hold the locks from both layers when passing a packet from one layer to another. Given that packets travel both up and down the protocol stack, this is an excellent recipe for deadlock, as illustrated in Figure [*]. Here, a packet moving down the stack towards the wire must acquire the next layer's lock out of order. Given that packets moving up the stack away from the wire are acquiring the locks in order, the lock acquisition in line 4 of the figure can result in deadlock.

Figure: Avoiding Deadlock Via Conditional Locking
\begin{figure}{ \scriptsize
\begin{verbatim}1 retry:
2 spin_lock(&lock2);
3...
...nlock(&lock2);
17 spin_unlock(&nextlayer->lock1);\end{verbatim}
}\end{figure}

One way to avoid deadlocks in this case is to impose a locking hierarchy, but when it is necessary to acquire a lock out of order, acquire it conditionally, as shown in Figure [*]. Instead of unconditionally acquiring the layer-1 lock, line 5 conditionally acquires the lock using the spin_trylock() primitive. This primitive acquires the lock immediately if the lock is available (returning non-zero), and otherwise returns zero without acquiring the lock.

If spin_trylock() was successful, line 15 does the needed layer-1 processing. Otherwise, line 6 releases the lock, and lines 7 and 8 acquire them in the correct order. Unfortunately, there might be multiple networking devices on the system (e.g., Ethernet and WiFi), so that the layer_1() function must make a routing decision. This decision might change at any time, especially if the system is mobile.8.1Therefore, line 9 must recheck the decision, and if it has changed, must release the locks and start over.

Quick Quiz 8.2: Can the transformation from Figure [*] to Figure [*] be applied universally? End Quick Quiz

Quick Quiz 8.3: But the complexity in Figure [*] is well worthwhile given that it avoids deadlock, right? End Quick Quiz

In some cases, it is possible to avoid nesting locks, thus avoiding deadlock. However, there must be some mechanism to ensure that the needed data structures remain in existence during the time that neither lock is held. One such mechanism is discussed in Section [*] and several others are presented in Chapter [*].

Paul E. McKenney 2011-12-16