10.3.4.2 Per-Thread Lock-Based RCU

Figure: Per-Thread Lock-Based RCU Implementation
\begin{figure}{ \scriptsize
\begin{verbatim}
 1 static void rcu_read_lock(void)
 2 {
 3   spin_lock(&__get_thread_var(rcu_gp_lock));
 4 }
 5
 6 static void rcu_read_unlock(void)
 7 {
 8   spin_unlock(&__get_thread_var(rcu_gp_lock));
 9 }
10
11 void synchronize_rcu(void)
12 {
13   int t;
14
15   for_each_running_thread(t) {
16     spin_lock(&per_thread(rcu_gp_lock, t));
17     spin_unlock(&per_thread(rcu_gp_lock, t));
18   }
19 }
\end{verbatim}
}\end{figure}

Figure [*] (rcu_lock_percpu.h and rcu_lock_percpu.c) shows an implementation based on one lock per thread. The rcu_read_lock() and rcu_read_unlock() functions acquire and release, respectively, the current thread's lock. The synchronize_rcu() function acquires and releases each thread's lock in turn. Therefore, all RCU read-side critical sections running when synchronize_rcu() starts must have completed before synchronize_rcu() can return.

This implementation does have the virtue of permitting concurrent RCU readers, and does avoid the deadlock condition that can arise with a single global lock. Furthermore, the read-side overhead, though high at roughly 140 nanoseconds, is independent of the number of CPUs. However, the update-side overhead ranges from about 600 nanoseconds on a single Power5 CPU up to more than 100 microseconds on 64 CPUs.

Quick Quiz 10.36: Wouldn't it be cleaner to acquire all the locks, and then release them all in the loop from lines 15-18 of Figure [*]? After all, with this change, there would be a point in time when there were no readers, simplifying things greatly. End Quick Quiz

Quick Quiz 10.37: Is the implementation shown in Figure [*] free from deadlocks? Why or why not? End Quick Quiz

Quick Quiz 10.38: Isn't one advantage of the RCU algorithm shown in Figure [*] that it uses only primitives that are widely available, for example, in POSIX pthreads? End Quick Quiz

This approach could be useful in some situations, given that a similar approach was used in the Linux 2.4 kernel [MM00].

The counter-based RCU implementation described next overcomes some of the shortcomings of the lock-based implementation.

Paul E. McKenney 2011-12-16