Figure
(rcu_lock_percpu.h and rcu_lock_percpu.c)
shows an implementation based on one lock per thread.
The rcu_read_lock() and rcu_read_unlock() functions
acquire and release, respectively, the current thread's lock.
The synchronize_rcu() function acquires and releases each thread's
lock in turn.
Therefore, all RCU read-side critical sections running
when synchronize_rcu() starts must have completed before
synchronize_rcu() can return.
This implementation does have the virtue of permitting concurrent RCU readers, and it avoids the deadlock condition that can arise with a single global lock. Furthermore, the read-side overhead, though high at roughly 140 nanoseconds, is independent of the number of CPUs. However, the update-side overhead ranges from about 600 nanoseconds on a single Power5 CPU up to more than 100 microseconds on 64 CPUs.
Quick Quiz 10.36:
Wouldn't it be cleaner to acquire all the locks, and then
release them all, in the loop from lines 15-18 of
Figure ?
After all, with this change, there would be a point in time
when there were no readers, simplifying things greatly.
End Quick Quiz
Quick Quiz 10.37:
Is the implementation shown in
Figure
free from deadlocks?
Why or why not?
End Quick Quiz
Quick Quiz 10.38:
Isn't one advantage of the RCU algorithm shown in
Figure
that it uses only primitives that are widely available,
for example, in POSIX pthreads?
End Quick Quiz
This approach could be useful in some situations, given that a similar approach was used in the Linux 2.4 kernel [MM00].
The counter-based RCU implementation described next overcomes some of the shortcomings of the lock-based implementation.
Paul E. McKenney 2011-12-16