Note that these two primitives contain no memory barriers, so there is
nothing to stop the CPU from executing the critical section
before executing the rcu_read_lock() or after executing
the rcu_read_unlock().
The purpose of the rcu_try_flip_waitmb_state is to
account for this possible reordering, but only at the beginning or end of
a grace period.
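The idea can be sketched in user-space C11. This is a minimal illustration under stated assumptions, not the kernel's code: every name prefixed with sketch_ or SKETCH_ is hypothetical, relaxed C11 atomics stand in for the barrier-free per-CPU counter updates, and atomic_thread_fence() stands in for a full memory barrier. The read side imposes no ordering at all, while the grace-period side asks each CPU for a single barrier and waits until every CPU has acknowledged.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_NCPUS 4  /* hypothetical CPU count for this sketch */

/* Hypothetical per-CPU flag values; not the kernel's own definitions. */
enum { SKETCH_MB_DONE = 0, SKETCH_MB_NEEDED = 1 };

/* Static storage is zero-initialized, so every flag starts as SKETCH_MB_DONE. */
static atomic_int  sketch_mb_flag[SKETCH_NCPUS];
static atomic_long sketch_nesting[SKETCH_NCPUS];

/* Read side: relaxed updates only, so nothing orders the accesses made
 * inside the critical section against these counter updates. */
void sketch_read_lock(int cpu)
{
    atomic_fetch_add_explicit(&sketch_nesting[cpu], 1, memory_order_relaxed);
}

void sketch_read_unlock(int cpu)
{
    atomic_fetch_sub_explicit(&sketch_nesting[cpu], 1, memory_order_relaxed);
}

/* Grace-period side: request one full barrier from every CPU. */
void sketch_request_mb(void)
{
    for (int cpu = 0; cpu < SKETCH_NCPUS; cpu++)
        atomic_store(&sketch_mb_flag[cpu], SKETCH_MB_NEEDED);
}

/* Run by each CPU (for example from a periodic tick): execute the
 * barrier if one was requested, then acknowledge it. */
void sketch_check_mb(int cpu)
{
    if (atomic_load(&sketch_mb_flag[cpu]) == SKETCH_MB_NEEDED) {
        atomic_thread_fence(memory_order_seq_cst);
        atomic_store(&sketch_mb_flag[cpu], SKETCH_MB_DONE);
    }
}

/* The grace-period state may advance only once every CPU has acknowledged. */
bool sketch_all_mb_done(void)
{
    for (int cpu = 0; cpu < SKETCH_NCPUS; cpu++)
        if (atomic_load(&sketch_mb_flag[cpu]) != SKETCH_MB_DONE)
            return false;
    return true;
}
```

In this sketch the cost of a barrier is paid in sketch_check_mb(), once per request per CPU, no matter how many times sketch_read_lock() and sketch_read_unlock() have run in the meantime.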
To see why this approach is helpful, consider
Figure ,
which shows the wastefulness of the conventional approach of placing
a memory barrier at the beginning and end of each RCU read-side critical
section [MSMB06].
The "MB"s represent memory barriers, and only the emboldened
barriers are needed, namely the first and last on a given CPU
for each grace period.
This preemptible RCU implementation therefore associates the memory
barriers with the grace period, as shown in
Figure .
Given that the Linux kernel can execute literally millions of RCU read-side critical sections per grace period, this latter approach can yield substantial read-side savings, because it amortizes the cost of the memory barriers over all the read-side critical sections in a grace period.
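The amortization argument can be made concrete with a little arithmetic. The following C helpers are a hypothetical sketch: the function names, and the assumption that the grace-period scheme costs a small fixed number of barriers per CPU per grace period (mb_per_gp), are introduced here for illustration only and are not measurements.

```c
#include <assert.h>

/* Conventional scheme: one barrier at the beginning and one at the end
 * of every read-side critical section. */
unsigned long long sketch_mb_conventional(unsigned long long sections)
{
    return 2ULL * sections;
}

/* Grace-period scheme: mb_per_gp barriers per CPU per grace period
 * (an assumed small constant), spread over all critical sections that
 * ran on that CPU during the grace period. */
double sketch_mb_amortized(unsigned long long mb_per_gp,
                           unsigned long long sections)
{
    assert(sections > 0);
    return (double)mb_per_gp / (double)sections;
}
```

For example, with one million critical sections on a CPU during one grace period, sketch_mb_conventional(1000000) gives two million barriers, whereas with an assumed mb_per_gp of 2 the grace-period scheme charges each critical section only about two millionths of a barrier.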
Paul E. McKenney 2011-12-16