Perhaps the simplest RCU implementation leverages locking, as shown in Figure (rcu_lock.h and rcu_lock.c).
In this implementation, rcu_read_lock() acquires a global
spinlock, rcu_read_unlock() releases it, and
synchronize_rcu() acquires it then immediately releases it.
Because synchronize_rcu() does not return until it has acquired (and released) the lock, it cannot return until all prior RCU read-side critical sections have completed, thus faithfully implementing RCU semantics.

Of course, this implementation allows only one RCU reader into its read-side critical section at a time, which almost entirely defeats the purpose of RCU. In addition, the lock operations in rcu_read_lock() and rcu_read_unlock() are extremely heavyweight, with read-side overhead ranging from about 100 nanoseconds on a single Power5 CPU up to more than 17 microseconds on a 64-CPU system. Worse yet, these same lock operations permit rcu_read_lock() to participate in deadlock cycles. Furthermore, in the absence of recursive locks, RCU read-side critical sections cannot be nested. Finally, although concurrent RCU updates could in principle be satisfied by a common grace period, this implementation serializes grace periods, preventing grace-period sharing.
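The scheme just described can be sketched in a few lines of user-level C. This is only a sketch: the book's actual rcu_lock.h and rcu_lock.c use a spinlock, whereas here a pthread mutex stands in as the global lock.

```c
#include <assert.h>
#include <pthread.h>

/* Single global lock shared by all readers and updaters. */
static pthread_mutex_t rcu_gp_lock = PTHREAD_MUTEX_INITIALIZER;

static void rcu_read_lock(void)
{
	pthread_mutex_lock(&rcu_gp_lock);
}

static void rcu_read_unlock(void)
{
	pthread_mutex_unlock(&rcu_gp_lock);
}

static void synchronize_rcu(void)
{
	/* Acquiring (and releasing) the lock cannot complete until all
	 * pre-existing read-side critical sections have released it, so
	 * returning from this function implies a full grace period. */
	pthread_mutex_lock(&rcu_gp_lock);
	pthread_mutex_unlock(&rcu_gp_lock);
}
```

Note that the empty acquire-then-release in synchronize_rcu() is the entire grace-period mechanism: it is a pure happens-after barrier with respect to prior readers, and it is also why grace periods are serialized rather than shared.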
Quick Quiz 10.34:
Why wouldn't any deadlock in the RCU implementation in Figure also be a deadlock in any other RCU implementation?
End Quick Quiz
Quick Quiz 10.35:
Why not simply use reader-writer locks in the RCU implementation in Figure in order to allow RCU readers to proceed in parallel?
End Quick Quiz
It is hard to imagine this implementation being useful in a production setting, though it does have the virtue of being implementable in almost any user-level application. Furthermore, similar implementations having one lock per CPU or using reader-writer locks have been used in production in the 2.4 Linux kernel.
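The one-lock-per-CPU variant mentioned above can also be sketched at user level, under stated assumptions: the my_cpu_id() helper is hypothetical (a real version might use sched_getcpu() or a per-thread index), and having rcu_read_lock() return the lock index is a simplification to keep the reader on the same lock even if it migrates mid-critical-section.

```c
#include <assert.h>
#include <pthread.h>

#define NR_CPUS 4

static pthread_mutex_t rcu_cpu_lock[NR_CPUS];

static void rcu_init(void)
{
	for (int i = 0; i < NR_CPUS; i++)
		pthread_mutex_init(&rcu_cpu_lock[i], NULL);
}

/* Hypothetical helper mapping the caller to a lock index. */
static int my_cpu_id(void)
{
	return 0;  /* placeholder; see lead-in */
}

static int rcu_read_lock(void)
{
	int cpu = my_cpu_id();

	pthread_mutex_lock(&rcu_cpu_lock[cpu]);
	return cpu;  /* caller passes this back to rcu_read_unlock() */
}

static void rcu_read_unlock(int cpu)
{
	pthread_mutex_unlock(&rcu_cpu_lock[cpu]);
}

static void synchronize_rcu(void)
{
	/* Acquiring and releasing every CPU's lock in turn guarantees
	 * that all pre-existing readers, whichever lock each holds,
	 * have completed. */
	for (int i = 0; i < NR_CPUS; i++) {
		pthread_mutex_lock(&rcu_cpu_lock[i]);
		pthread_mutex_unlock(&rcu_cpu_lock[i]);
	}
}
```

Readers on different CPUs now proceed in parallel, at the cost of synchronize_rcu() growing linearly with the number of CPUs.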
A modified version of this one-lock-per-CPU approach, but instead using one lock per thread, is described in the next section.
Paul E. McKenney 2011-12-16