10.3.3.1 RCU has a Family of Wait-to-Finish APIs


Table: RCU Wait-to-Finish APIs

| Attribute | RCU Classic | RCU BH | RCU Sched | Realtime RCU |
|---|---|---|---|---|
| Purpose | Original | Prevent DDoS attacks | Wait for preempt-disable regions, hardirqs, & NMIs | Realtime response |
| Availability | 2.5.43 | 2.6.9 | 2.6.12 | 2.6.26 |
| Read-side primitives | rcu_read_lock() !, rcu_read_unlock() ! | rcu_read_lock_bh(), rcu_read_unlock_bh() | preempt_disable(), preempt_enable() (and friends) | rcu_read_lock(), rcu_read_unlock() |
| Update-side primitives (synchronous) | synchronize_rcu(), synchronize_net() | | synchronize_sched() | synchronize_rcu(), synchronize_net() |
| Update-side primitives (asynchronous/callback) | call_rcu() ! | call_rcu_bh() | call_rcu_sched() | call_rcu() |
| Update-side primitives (wait for callbacks) | rcu_barrier() | rcu_barrier_bh() | rcu_barrier_sched() | rcu_barrier() |
| Type-safe memory | SLAB_DESTROY_BY_RCU | | | SLAB_DESTROY_BY_RCU |
| Read side constraints | No blocking | No irq enabling | No blocking | Only preemption and lock acquisition |
| Read side overhead | Preempt disable/enable (free on non-PREEMPT) | BH disable/enable | Preempt disable/enable (free on non-PREEMPT) | Simple instructions, irq disable/enable |
| Asynchronous update-side overhead | sub-microsecond | sub-microsecond | | sub-microsecond |
| Grace-period latency | 10s of milliseconds | 10s of milliseconds | 10s of milliseconds | 10s of milliseconds |
| Non-PREEMPT_RT implementation | RCU Classic | RCU BH | RCU Classic | Preemptible RCU |
| PREEMPT_RT implementation | Preemptible RCU | Realtime RCU | Forced Schedule on all CPUs | Realtime RCU |



Table: Sleepable RCU Wait-to-Finish APIs

| Attribute | SRCU | QRCU |
|---|---|---|
| Purpose | Sleeping readers | Sleeping readers and fast grace periods |
| Availability | 2.6.19 | |
| Read-side primitives | srcu_read_lock(), srcu_read_unlock() | qrcu_read_lock(), qrcu_read_unlock() |
| Update-side primitives (synchronous) | synchronize_srcu() | synchronize_qrcu() |
| Update-side primitives (asynchronous/callback) | N/A | N/A |
| Update-side primitives (wait for callbacks) | N/A | N/A |
| Type-safe memory | | |
| Read side constraints | No synchronize_srcu() | No synchronize_qrcu() |
| Read side overhead | Simple instructions, preempt disable/enable | Atomic increment and decrement of shared variable |
| Asynchronous update-side overhead | N/A | N/A |
| Grace-period latency | 10s of milliseconds | 10s of nanoseconds in absence of readers |
| Non-PREEMPT_RT implementation | SRCU | N/A |
| PREEMPT_RT implementation | SRCU | N/A |


The most straightforward answer to ``what is RCU'' is that RCU is an API used in the Linux kernel, as summarized by Tables [*] and [*], which show the wait-for-RCU-readers portions of the non-sleepable and sleepable APIs, respectively, and by Table [*], which shows the publish/subscribe portions of the API.

If you are new to RCU, you might consider focusing on just one of the columns in Table [*], each of which summarizes one member of the Linux kernel's RCU API family. For example, if you are primarily interested in understanding how RCU is used in the Linux kernel, ``RCU Classic'' would be the place to start, as it is used most frequently. On the other hand, if you want to understand RCU for its own sake, ``SRCU'' has the simplest API. You can always come back for the other columns later.

If you are already familiar with RCU, these tables can serve as a useful reference.

Quick Quiz 10.22: Why do some of the cells in Table [*] have exclamation marks (``!'')? End Quick Quiz

The ``RCU Classic'' column corresponds to the original RCU implementation, in which RCU read-side critical sections are delimited by rcu_read_lock() and rcu_read_unlock(), which may be nested. The corresponding synchronous update-side primitive, synchronize_rcu(), along with its synonym synchronize_net(), waits for any currently executing RCU read-side critical sections to complete. The length of this wait is known as a ``grace period''. The asynchronous update-side primitive, call_rcu(), invokes a specified function with a specified argument after a subsequent grace period. For example, call_rcu(p,f); will result in the ``RCU callback'' f(p) being invoked after a subsequent grace period. There are situations, such as when unloading a Linux-kernel module that uses call_rcu(), when it is necessary to wait for all outstanding RCU callbacks to complete [McK07e]. The rcu_barrier() primitive does this job. Note that the more recent hierarchical RCU [McK08a] implementation described in Sections [*] and [*] also adheres to ``RCU Classic'' semantics.
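The relationship between nested read-side critical sections, call_rcu(), and grace periods can be sketched with a deliberately tiny userspace model. This is not how the kernel implements RCU: it is single-threaded, it has no notion of CPUs, and every name beginning with toy_ is a hypothetical stand-in invented for this illustration. It only shows that a callback posted with a call_rcu()-style primitive is not invoked until every pre-existing reader, including nested ones, has finished.

```c
#include <assert.h>
#include <stddef.h>

/* Toy single-threaded model; all toy_* names are hypothetical. */
struct toy_rcu_head {
    struct toy_rcu_head *next;
    void (*func)(struct toy_rcu_head *head);
};

static int toy_nesting;                    /* depth of nested readers */
static struct toy_rcu_head *toy_cb_head;   /* pending callbacks, oldest first */
static struct toy_rcu_head **toy_cb_tail = &toy_cb_head;

static void toy_rcu_read_lock(void)   { toy_nesting++; }
static void toy_rcu_read_unlock(void) { toy_nesting--; }

/* Queue func(head) for invocation after a later grace period. */
static void toy_call_rcu(struct toy_rcu_head *head,
                         void (*func)(struct toy_rcu_head *head))
{
    head->func = func;
    head->next = NULL;
    *toy_cb_tail = head;
    toy_cb_tail = &head->next;
}

/*
 * Try to end a grace period. Returns -1 if a reader is still active,
 * otherwise invokes all queued callbacks and returns how many ran.
 */
static int toy_try_grace_period(void)
{
    struct toy_rcu_head *list;
    int n = 0;

    if (toy_nesting > 0)
        return -1;
    list = toy_cb_head;
    toy_cb_head = NULL;
    toy_cb_tail = &toy_cb_head;
    while (list != NULL) {
        struct toy_rcu_head *next = list->next; /* func may reuse list */
        list->func(list);
        list = next;
        n++;
    }
    return n;
}

struct toy_elem {
    struct toy_rcu_head rh; /* first member, so a cast recovers the element */
    int reclaimed;
};

static void toy_reclaim(struct toy_rcu_head *head)
{
    struct toy_elem *e = (struct toy_elem *)head;

    e->reclaimed = 1; /* a real callback would typically free the element */
}

/* Returns 0 if the model behaves as the text describes. */
int toy_demo(void)
{
    static struct toy_elem e;

    e.reclaimed = 0;
    toy_rcu_read_lock();
    toy_rcu_read_lock();                 /* nesting is permitted */
    toy_call_rcu(&e.rh, toy_reclaim);
    if (toy_try_grace_period() != -1)
        return 1;                        /* inner reader still active */
    toy_rcu_read_unlock();
    if (toy_try_grace_period() != -1)
        return 2;                        /* outer reader still active */
    toy_rcu_read_unlock();
    if (e.reclaimed)
        return 3;                        /* callback must not have run yet */
    if (toy_try_grace_period() != 1)
        return 4;                        /* exactly one callback runs now */
    return e.reclaimed ? 0 : 5;
}
```

Calling toy_demo() returns 0 when each step behaves as described. In this model, an rcu_barrier()-style operation would amount to waiting until the pending-callback list has drained.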

Finally, RCU may be used to provide type-safe memory [GC96], as described in Section [*]. In the context of RCU, type-safe memory guarantees that a given data element will not change type during any RCU read-side critical section that accesses it. To make use of RCU-based type-safe memory, pass SLAB_DESTROY_BY_RCU to kmem_cache_create(). It is important to note that SLAB_DESTROY_BY_RCU will in no way prevent kmem_cache_alloc() from immediately reallocating memory that was just now freed via kmem_cache_free()! In fact, the SLAB_DESTROY_BY_RCU-protected data structure just returned by rcu_dereference() might be freed and reallocated an arbitrarily large number of times, even when under the protection of rcu_read_lock(). Instead, SLAB_DESTROY_BY_RCU operates by preventing kmem_cache_free() from returning a completely freed-up slab of data structures to the system until after an RCU grace period elapses. In short, although the data element might be freed and reallocated arbitrarily often, at least its type will remain the same.
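One practical consequence is that a reader holding a pointer into type-safe memory cannot assume that the element is still the one it looked up, and so readers typically recheck the element's identity after acquiring a lock or reference on it. The following userspace sketch illustrates only that recheck. It assumes a toy fixed-size pool standing in for a slab cache, it is single-threaded, and all toy_* names are hypothetical; no kernel API is modeled beyond the idea that a slot is only ever reused for the same structure type.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLAB_SIZE 4

struct toy_obj {
    int in_use;   /* nonzero while allocated */
    int key;      /* identity that readers search for */
    int value;
};

/* Toy pool: a slot is only ever reused as a struct toy_obj. */
static struct toy_obj toy_slab[TOY_SLAB_SIZE];

static struct toy_obj *toy_alloc(int key, int value)
{
    for (size_t i = 0; i < TOY_SLAB_SIZE; i++) {
        if (!toy_slab[i].in_use) {
            toy_slab[i].in_use = 1;
            toy_slab[i].key = key;
            toy_slab[i].value = value;
            return &toy_slab[i];
        }
    }
    return NULL;
}

static void toy_free(struct toy_obj *p)
{
    p->in_use = 0; /* the slot may be handed out again immediately */
}

static struct toy_obj *toy_lookup(int key)
{
    for (size_t i = 0; i < TOY_SLAB_SIZE; i++)
        if (toy_slab[i].in_use && toy_slab[i].key == key)
            return &toy_slab[i];
    return NULL;
}

/* Identity recheck that a reader performs before trusting the element. */
static int toy_still_is(const struct toy_obj *p, int key)
{
    return p->in_use && p->key == key;
}

/* Returns 0 if the stale pointer is detected as described. */
int toy_type_safe_demo(void)
{
    struct toy_obj *a = toy_alloc(1, 100);
    struct toy_obj *reader_ptr = toy_lookup(1); /* reader finds key 1 */
    struct toy_obj *b;
    int rc = 0;

    if (a == NULL || reader_ptr != a)
        return 1;
    toy_free(a);               /* updater frees the element ... */
    b = toy_alloc(2, 200);     /* ... and the slot is reused at once */
    if (b != reader_ptr)
        rc = 2;                /* same memory, still a struct toy_obj */
    else if (toy_still_is(reader_ptr, 1))
        rc = 3;                /* the recheck must notice the new identity */
    else if (toy_lookup(1) != NULL)
        rc = 4;                /* a retried lookup finds nothing for key 1 */
    if (b != NULL)
        toy_free(b);
    return rc;
}
```

The reader's pointer remains safe to dereference because the memory is still a struct toy_obj, but only the key comparison reveals that it now describes a different element.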

Quick Quiz 10.23: How do you prevent a huge number of RCU read-side critical sections from indefinitely blocking a synchronize_rcu() invocation? End Quick Quiz

Quick Quiz 10.24: The synchronize_rcu() API waits for all pre-existing interrupt handlers to complete, right? End Quick Quiz

In the ``RCU BH'' column, rcu_read_lock_bh() and rcu_read_unlock_bh() delimit RCU read-side critical sections, and call_rcu_bh() invokes the specified function with the specified argument after a subsequent grace period. Note that RCU BH does not have a synchronous synchronize_rcu_bh() interface, though one could easily be added if required.

Quick Quiz 10.25: What happens if you mix and match? For example, suppose you use rcu_read_lock() and rcu_read_unlock() to delimit RCU read-side critical sections, but then use call_rcu_bh() to post an RCU callback? End Quick Quiz

Quick Quiz 10.26: Hardware interrupt handlers can be thought of as being under the protection of an implicit rcu_read_lock_bh(), right? End Quick Quiz

In the ``RCU Sched'' column, anything that disables preemption acts as an RCU read-side critical section, and synchronize_sched() waits for the corresponding RCU grace period. This RCU API family was added in the 2.6.12 kernel, which split the old synchronize_kernel() API into the current synchronize_rcu() (for RCU Classic) and synchronize_sched() (for RCU Sched). Note that RCU Sched did not originally have an asynchronous call_rcu_sched() interface, but one was added in 2.6.26. In accordance with the quasi-minimalist philosophy of the Linux community, APIs are added on an as-needed basis.

Quick Quiz 10.27: What happens if you mix and match RCU Classic and RCU Sched? End Quick Quiz

Quick Quiz 10.28: In general, you cannot rely on synchronize_sched() to wait for all pre-existing interrupt handlers, right? End Quick Quiz

The ``Realtime RCU'' column has the same API as does RCU Classic, the only difference being that RCU read-side critical sections may be preempted and may block while acquiring spinlocks. The design of Realtime RCU is described elsewhere [McK07a].

Quick Quiz 10.29: Why do both SRCU and QRCU lack asynchronous call_srcu() or call_qrcu() interfaces? End Quick Quiz

The ``SRCU'' column in Table [*] displays a specialized RCU API that permits general sleeping in RCU read-side critical sections (see Appendix [*] for more details). Of course, use of synchronize_srcu() in an SRCU read-side critical section can result in self-deadlock, and so should be avoided. SRCU differs from earlier RCU implementations in that the caller allocates an srcu_struct for each distinct SRCU usage. This approach prevents SRCU read-side critical sections from blocking unrelated synchronize_srcu() invocations. In addition, in this variant of RCU, srcu_read_lock() returns a value that must be passed into the corresponding srcu_read_unlock().
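The two distinguishing features just described, a per-usage structure and a value handed from the lock primitive to the unlock primitive, can be illustrated with a small userspace model. It is a sketch under strong assumptions: it is single-threaded, it omits the per-CPU counters, atomics, and memory barriers that a real implementation needs, and it splits the grace-period wait into a hypothetical start step and a polling step so that the example never blocks. All toy_* names are invented for this illustration.

```c
#include <assert.h>

/* One instance per distinct usage; this is what limits the scope. */
struct toy_srcu {
    unsigned completed;  /* low bit selects the counter new readers use */
    int ctr[2];          /* number of readers that entered under each index */
};

/* Enter a read-side critical section; the result must go to unlock. */
static int toy_srcu_read_lock(struct toy_srcu *sp)
{
    int idx = (int)(sp->completed & 1u);

    sp->ctr[idx]++;
    return idx;
}

/* Leave the critical section, decrementing the counter it incremented. */
static void toy_srcu_read_unlock(struct toy_srcu *sp, int idx)
{
    sp->ctr[idx]--;
}

/* Start a grace period: flip the index and report which one to wait on. */
static int toy_srcu_gp_start(struct toy_srcu *sp)
{
    int old = (int)(sp->completed & 1u);

    sp->completed++;
    return old;
}

/* The grace period is over once all readers of the old index have left. */
static int toy_srcu_gp_done(const struct toy_srcu *sp, int old)
{
    return sp->ctr[old] == 0;
}

/* Returns 0 if the model behaves as the text describes. */
int toy_srcu_demo(void)
{
    struct toy_srcu a = { 0, { 0, 0 } };
    struct toy_srcu b = { 0, { 0, 0 } };
    int idx, idx2, old_a, old_b;

    idx = toy_srcu_read_lock(&a);        /* long-running reader of a */

    old_b = toy_srcu_gp_start(&b);
    if (!toy_srcu_gp_done(&b, old_b))
        return 1;                        /* a's reader must not delay b */

    old_a = toy_srcu_gp_start(&a);
    if (toy_srcu_gp_done(&a, old_a))
        return 2;                        /* a's own reader does delay a */

    idx2 = toy_srcu_read_lock(&a);       /* reader arriving after the flip */
    if (idx2 == idx)
        return 3;                        /* it uses the other counter */
    toy_srcu_read_unlock(&a, idx2);
    if (toy_srcu_gp_done(&a, old_a))
        return 4;                        /* the old reader is still inside */

    toy_srcu_read_unlock(&a, idx);       /* must pass the index from lock */
    return toy_srcu_gp_done(&a, old_a) ? 0 : 5;
}
```

Because the index can flip between a reader's lock and unlock, the unlock step needs the value returned by the lock step in order to decrement the same counter, which is one way to see why the API threads that value through the caller.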

The ``QRCU'' column presents an RCU implementation with the same API structure as SRCU, but optimized for extremely low-latency grace periods in the absence of readers, as described elsewhere [McK07f]. As with SRCU, use of synchronize_qrcu() in a QRCU read-side critical section can result in self-deadlock, and so should be avoided. Although QRCU has not yet been accepted into the Linux kernel, it is worth mentioning given that it is the only kernel-level RCU implementation that can boast deep sub-microsecond grace-period latencies.

Quick Quiz 10.30: Under what conditions can synchronize_srcu() be safely used within an SRCU read-side critical section? End Quick Quiz

The Linux kernel currently has a surprising number of RCU APIs and implementations. There is some hope of reducing this number, evidenced by the fact that a given build of the Linux kernel currently has at most three implementations behind four APIs (given that RCU Classic and Realtime RCU share the same API). However, careful inspection and analysis will be required, just as would be required in order to eliminate one of the many locking APIs.

The various RCU APIs are distinguished by the forward-progress guarantees that their RCU read-side critical sections must provide, and also by their scope, as follows:

  1. RCU BH: read-side critical sections must guarantee forward progress against everything except for NMI and IRQ handlers. Softirq handlers are not among these exceptions. RCU BH is global in scope.
  2. RCU Sched: read-side critical sections must guarantee forward progress against everything except for NMI handlers, IRQ handlers, and softirq handlers. RCU Sched is global in scope.
  3. RCU (both classic and real-time): read-side critical sections must guarantee forward progress against everything except for NMI handlers, IRQ handlers, softirq handlers, and (in the real-time case) higher-priority real-time tasks. RCU is global in scope.
  4. SRCU and QRCU: read-side critical sections need not guarantee forward progress unless some other task is waiting for the corresponding grace period to complete, in which case these read-side critical sections should complete in no more than a few seconds (and preferably much more quickly). SRCU's and QRCU's scope is defined by the use of the corresponding srcu_struct or qrcu_struct, respectively.

In other words, SRCU and QRCU compensate for their extremely weak forward-progress guarantees by permitting the developer to restrict their scope.

Paul E. McKenney 2011-12-16