10.3.1.2 Wait For Pre-Existing RCU Readers to Complete

In its most basic form, RCU is a way of waiting for things to finish. Of course, there are a great many other ways of waiting for things to finish, including reference counts, reader-writer locks, events, and so on. The great advantage of RCU is that it can wait for each of (say) 20,000 different things without having to explicitly track each and every one of them, and without having to worry about the performance degradation, scalability limitations, complex deadlock scenarios, and memory-leak hazards that are inherent in schemes using explicit tracking.

In RCU's case, the things waited on are called ``RCU read-side critical sections''. An RCU read-side critical section starts with an rcu_read_lock() primitive, and ends with a corresponding rcu_read_unlock() primitive. RCU read-side critical sections can be nested, and may contain pretty much any code, as long as that code does not explicitly block or sleep (although a special form of RCU called SRCU [McK06] does permit general sleeping in SRCU read-side critical sections). If you abide by these conventions, you can use RCU to wait for any desired piece of code to complete.

RCU accomplishes this feat by indirectly determining when these other things have finished [McK07g,McK07a], as is described in detail in Appendix [*].

Figure: Readers and RCU Grace Period
\resizebox{3in}{!}{\includegraphics{defer/GracePeriodGood}}

In particular, as shown in Figure [*], RCU is a way of waiting for pre-existing RCU read-side critical sections to completely finish, including memory operations executed by those critical sections. However, note that RCU read-side critical sections that begin after the beginning of a given grace period can and will extend beyond the end of that grace period.

The following pseudocode shows the basic form of algorithms that use RCU to wait for readers:

  1. Make a change, for example, replace an element in a linked list.
  2. Wait for all pre-existing RCU read-side critical sections to completely finish (for example, by using the synchronize_rcu() primitive). The key observation here is that subsequent RCU read-side critical sections have no way to gain a reference to the newly removed element.
  3. Clean up, for example, free the element that was replaced above.

Figure: Canonical RCU Replacement Example
\begin{figure}{ \scriptsize
\begin{verbatim}1 struct foo {
2 struct list_hea...
...t, &q->list);
20 synchronize_rcu();
21 kfree(p);\end{verbatim}
}\end{figure}

The code fragment shown in Figure [*], adapted from those in Section [*], demonstrates this process, with field a being the search key.

Lines 19, 20, and 21 implement the three steps called out above. Lines 16-19 gives RCU (``read-copy update'') its name: while permitting concurrent reads, line 16 copies and lines 17-19 do an update.

The synchronize_rcu() primitive might seem a bit mysterious at first. After all, it must wait for all RCU read-side critical sections to complete, and, as we saw earlier, the rcu_read_lock() and rcu_read_unlock() primitives that delimit RCU read-side critical sections don't even generate any code in non-CONFIG_PREEMPT kernels!

There is a trick, and the trick is that RCU Classic read-side critical sections delimited by rcu_read_lock() and rcu_read_unlock() are not permitted to block or sleep. Therefore, when a given CPU executes a context switch, we are guaranteed that any prior RCU read-side critical sections will have completed. This means that as soon as each CPU has executed at least one context switch, all prior RCU read-side critical sections are guaranteed to have completed, meaning that synchronize_rcu() can safely return.

Thus, RCU Classic's synchronize_rcu() can conceptually be as simple as the following (see Appendix [*] for additional ``toy'' RCU implementations):



  1 for_each_online_cpu(cpu)
  2   run_on(cpu);


Here, run_on() switches the current thread to the specified CPU, which forces a context switch on that CPU. The for_each_online_cpu() loop therefore forces a context switch on each CPU, thereby guaranteeing that all prior RCU read-side critical sections have completed, as required. Although this simple approach works for kernels in which preemption is disabled across RCU read-side critical sections, in other words, for non-CONFIG_PREEMPT and CONFIG_PREEMPT kernels, it does not work for CONFIG_PREEMPT_RT realtime (-rt) kernels. Therefore, realtime RCU uses a different approach based loosely on reference counters [McK07a].

Of course, the actual implementation in the Linux kernel is much more complex, as it is required to handle interrupts, NMIs, CPU hotplug, and other hazards of production-capable kernels, but while also maintaining good performance and scalability. Realtime implementations of RCU must additionally help provide good realtime response, which rules out implementations (like the simple two-liner above) that rely on disabling preemption.

Although it is good to know that there is a simple conceptual implementation of synchronize_rcu(), other questions remain. For example, what exactly do RCU readers see when traversing a concurrently updated list? This question is addressed in the following section.

Paul E. McKenney 2011-12-16