Figure
(rcu.h and rcu.c)
show an RCU implementation based on a single global free-running counter
that takes on only even-numbered values, with data shown in
Figure
.
The resulting rcu_read_lock() implementation is extremely
straightforward.
Line 3 simply adds one to the global free-running rcu_gp_ctr
variable and stores the resulting odd-numbered value into the
rcu_reader_gp per-thread variable.
Line 4 executes a memory barrier to prevent the content of the
subsequent RCU read-side critical section from ``leaking out''.
The rcu_read_unlock() implementation is similar. Line 9 executes a memory barrier, again to prevent the prior RCU read-side critical section from ``leaking out''. Line 10 then copies the rcu_gp_ctr global variable to the rcu_reader_gp per-thread variable, leaving this per-thread variable with an even-numbered value so that a concurrent instance of synchronize_rcu() will know to ignore it.
Quick Quiz 10.48:
If any even value is sufficient to tell synchronize_rcu()
to ignore a given task, why doesn't line 10 of
Figure
simply assign zero to rcu_reader_gp?
End Quick Quiz
Thus, synchronize_rcu() could wait for all of the per-thread rcu_reader_gp variables to take on even-numbered values. However, it is possible to do much better than that because synchronize_rcu() need only wait on pre-existing RCU read-side critical sections. Line 17 executes a memory barrier to prevent prior manipulations of RCU-protected data structures from being reordered (by either the CPU or the compiler) to follow the increment on line 17. Line 18 acquires the rcu_gp_lock (and line 28 releases it) in order to prevent multiple synchronize_rcu() instances from running concurrently. Line 19 then increments the global rcu_gp_ctr variable by two, so that all pre-existing RCU read-side critical sections will have corresponding per-thread rcu_reader_gp variables with values less than that of rcu_gp_ctr, modulo the machine's word size. Recall also that threads with even-numbered values of rcu_reader_gp are not in an RCU read-side critical section, so that lines 21-27 scan the rcu_reader_gp values until they all are either even (line 22) or are greater than the global rcu_gp_ctr (lines 23-24). Line 25 blocks for a short period of time to wait for a pre-existing RCU read-side critical section, but this can be replaced with a spin-loop if grace-period latency is of the essence. Finally, the memory barrier at line 29 ensures that any subsequent destruction will not be reordered into the preceding loop.
Quick Quiz 10.49:
Why are the memory barriers on lines 17 and 29 of
Figure
needed?
Aren't the memory barriers inherent in the locking
primitives on lines 18 and 28 sufficient?
End Quick Quiz
This approach achieves much better read-side performance, incurring roughly 63 nanoseconds of overhead regardless of the number of Power5 CPUs. Updates incur more overhead, ranging from about 500 nanoseconds on a single Power5 CPU to more than 100 microseconds on 64 such CPUs.
Quick Quiz 10.50:
Couldn't the update-side optimization described in
Section
be applied to the implementation shown in
Figure
?
End Quick Quiz
This implementation suffers from some serious shortcomings in
addition to the high update-side overhead noted earlier.
First, it is no longer permissible to nest RCU read-side critical
sections, a topic that is taken up in the next section.
Second, if a reader is preempted at line 3 of
Figure after fetching from
rcu_gp_ctr but before storing to rcu_reader_gp,
and if the rcu_gp_ctr counter then runs through more than half
but less than all of its possible values, then synchronize_rcu()
will ignore the subsequent RCU read-side critical section.
Third and finally, this implementation requires that the enclosing software
environment be able to enumerate threads and maintain per-thread
variables.
Quick Quiz 10.51:
Is the possibility o readers being preempted in
line 3 of Figure
a real problem, in other words, is there a real sequence
of events that could lead to failure?
If not, why not?
If so, what is the sequence of events, and how can the
failure be addressed?
End Quick Quiz
Paul E. McKenney 2011-12-16