D.4.2.4.3 Memory-Barrier Considerations

Figure: Preemptible RCU with Read-Side Memory Barriers

Note that these two primitives, rcu_read_lock() and rcu_read_unlock(), contain no memory barriers, so nothing stops the CPU from executing the critical section before executing rcu_read_lock() or after executing rcu_read_unlock(). The purpose of the rcu_try_flip_waitmb_state is to account for this possible reordering, but only at the beginning or end of a grace period. To see why this approach is helpful, consider the figure "Preemptible RCU with Read-Side Memory Barriers", which shows the wastefulness of the conventional approach of placing a memory barrier at the beginning and end of each RCU read-side critical section [MSMB06].

Figure: Preemptible RCU with Grace-Period Memory Barriers

The "MB"s represent memory barriers, and only the emboldened barriers are needed: the first and last on a given CPU for each grace period. This preemptible RCU implementation therefore associates the memory barriers with the grace period, as shown in the figure "Preemptible RCU with Grace-Period Memory Barriers".
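
The idea of tying memory barriers to the grace period rather than to each reader can be sketched in user-space C11. This is only an illustration under stated assumptions, not the kernel's code: every name here (NR_CPUS, mb_state, nesting, mb_flag, and the sketch_ functions) is hypothetical, a "CPU" is just an array index, and atomic_thread_fence() stands in for a kernel memory barrier. The read-side primitives touch only a relaxed counter, while the grace-period side asks each CPU to execute a single barrier and waits until all have done so.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4  /* hypothetical CPU count for this sketch */

/* Hypothetical per-CPU barrier-request states. */
enum mb_state { MB_DONE, MB_NEEDED };

static _Atomic int nesting[NR_CPUS];  /* read-side nesting depth, per CPU */
static _Atomic int mb_flag[NR_CPUS];  /* barrier request, per CPU (starts MB_DONE) */

/* Read-side entry: a relaxed increment and no memory barrier. */
static void sketch_read_lock(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    atomic_fetch_add_explicit(&nesting[cpu], 1, memory_order_relaxed);
}

/* Read-side exit: a relaxed decrement and no memory barrier. */
static void sketch_read_unlock(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    atomic_fetch_sub_explicit(&nesting[cpu], 1, memory_order_relaxed);
}

/* Grace-period side: request one memory barrier from every CPU. */
static void sketch_request_mb(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        atomic_store_explicit(&mb_flag[cpu], MB_NEEDED, memory_order_release);
}

/*
 * Per-CPU side: must run in the context representing "cpu" so that the
 * fence orders that CPU's own accesses. Executes the barrier only when
 * the grace-period side has asked for one.
 */
static void sketch_check_mb(int cpu)
{
    if (atomic_load_explicit(&mb_flag[cpu], memory_order_acquire) == MB_NEEDED) {
        atomic_thread_fence(memory_order_seq_cst);
        atomic_store_explicit(&mb_flag[cpu], MB_DONE, memory_order_release);
    }
}

/* Grace-period side: true once every CPU has executed its barrier. */
static bool sketch_all_mb_done(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (atomic_load_explicit(&mb_flag[cpu], memory_order_acquire) != MB_DONE)
            return false;
    return true;
}
```

In this sketch the cost of a barrier is paid in sketch_check_mb(), at most once per request, no matter how many times sketch_read_lock() and sketch_read_unlock() ran in between.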

Given that the Linux kernel can execute millions of RCU read-side critical sections per grace period, this latter approach can yield substantial read-side savings, because it amortizes the cost of the memory barriers over all the read-side critical sections in a grace period.
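
The amortization can be made concrete with a little arithmetic. The sketch below is a hypothetical back-of-the-envelope model, not a measurement: it assumes two barriers per critical section for the conventional approach, and two barriers per CPU per grace period (the first and the last) for the grace-period approach. The function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Conventional approach: one barrier at the beginning and one at the end
 * of every read-side critical section in the grace period.
 */
static uint64_t barriers_per_section(uint64_t sections)
{
    return 2 * sections;
}

/*
 * Grace-period approach (assumed model): only the first and last barrier
 * on each CPU are needed for a given grace period.
 */
static uint64_t barriers_per_grace_period(uint64_t ncpus)
{
    assert(ncpus > 0);
    return 2 * ncpus;
}
```

Under these assumptions, one million critical sections spread over four CPUs in a single grace period would cost barriers_per_section(1000000) = 2,000,000 barriers with the conventional approach, against barriers_per_grace_period(4) = 8 when the barriers are associated with the grace period.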

Paul E. McKenney 2011-12-16