10.3.4.1 Lock-Based RCU

Figure: Lock-Based RCU Implementation
\begin{figure}{ \scriptsize
\begin{verbatim}1 static void rcu_read_lock(void)...
...rcu_gp_lock);
14 spin_unlock(&rcu_gp_lock);
15 }\end{verbatim}
}\end{figure}

Perhaps the simplest RCU implementation leverages locking, as shown in Figure [*] (rcu_lock.h and rcu_lock.c). In this implementation, rcu_read_lock() acquires a global spinlock, rcu_read_unlock() releases it, and synchronize_rcu() acquires it then immediately releases it.

Because synchronize_rcu() does not return until it has acquired (and released) the lock, it cannot return until all prior RCU read-side critical sections have completed, thus faithfully implementing RCU semantics. Of course, only one RCU reader may be in its read-side critical section at a time, which almost entirely defeats the purpose of RCU. In addition, the lock operations in rcu_read_lock() and rcu_read_unlock() are extremely heavyweight, with read-side overhead ranging from about 100 nanoseconds on a single Power5 CPU up to more than 17 microseconds on a 64-CPU system. Worse yet, these same lock operations permit rcu_read_lock() to participate in deadlock cycles. Furthermore, in absence of recursive locks, RCU read-side critical sections cannot be nested, and, finally, although concurrent RCU updates could in principle be satisfied by a common grace period, this implementation serializes grace periods, preventing grace-period sharing.

Quick Quiz 10.34: Why wouldn't any deadlock in the RCU implementation in Figure [*] also be a deadlock in any other RCU implementation? End Quick Quiz

Quick Quiz 10.35: Why not simply use reader-writer locks in the RCU implementation in Figure [*] in order to allow RCU readers to proceed in parallel? End Quick Quiz

It is hard to imagine this implementation being useful in a production setting, though it does have the virtue of being implementable in almost any user-level application. Furthermore, similar implementations having one lock per CPU or using reader-writer locks have been used in production in the 2.4 Linux kernel.

A modified version of this one-lock-per-CPU approach, but instead using one lock per thread, is described in the next section.

Paul E. McKenney 2011-12-16