10.3.1.1 Publish-Subscribe Mechanism

Figure: Data Structure Publication (Unsafe)
\begin{figure}{ \scriptsize
\begin{verbatim}1 struct foo {
2 int a;
3 int b...
... p->a = 1;
12 p->b = 2;
13 p->c = 3;
14 gp = p;\end{verbatim}
}\end{figure}

One key attribute of RCU is the ability to safely scan data, even though that data is being modified concurrently. To provide this ability for concurrent insertion, RCU uses what can be thought of as a publish-subscribe mechanism. For example, consider an initially NULL global pointer gp that is to be modified to point to a newly allocated and initialized data structure. The code fragment shown in Figure [*] (with the addition of appropriate locking) might be used for this purpose.

Unfortunately, there is nothing forcing the compiler and CPU to execute the last four assignment statements in order. If the assignment to gp happens before the initialization of p fields, then concurrent readers could see the uninitialized values. Memory barriers are required to keep things ordered, but memory barriers are notoriously difficult to use. We therefore encapsulate them into a primitive rcu_assign_pointer() that has publication semantics. The last four lines would then be as follows:



  1 p->a = 1;
  2 p->b = 2;
  3 p->c = 3;
  4 rcu_assign_pointer(gp, p);


The rcu_assign_pointer() would publish the new structure, forcing both the compiler and the CPU to execute the assignment to gp after the assignments to the fields referenced by p

However, it is not sufficient to only enforce ordering at the updater, as the reader must enforce proper ordering as well. Consider for example the following code fragment:



  1 p = gp;
  2 if (p != NULL) {
  3   do_something_with(p->a, p->b, p->c);
  4 }


Although this code fragment might well seem immune to misordering, unfortunately, the DEC Alpha CPU [McK05a,McK05b] and value-speculation compiler optimizations can, believe it or not, cause the values of p->a, p->b, and p->c to be fetched before the value of p. This is perhaps easiest to see in the case of value-speculation compiler optimizations, where the compiler guesses the value of p fetches p->a, p->b, and p->c then fetches the actual value of p in order to check whether its guess was correct. This sort of optimization is quite aggressive, perhaps insanely so, but does actually occur in the context of profile-driven optimization.

Clearly, we need to prevent this sort of skullduggery on the part of both the compiler and the CPU. The rcu_dereference() primitive uses whatever memory-barrier instructions and compiler directives are required for this purpose:



  1 rcu_read_lock();
  2 p = rcu_dereference(gp);
  3 if (p != NULL) {
  4   do_something_with(p->a, p->b, p->c);
  5 }
  6 rcu_read_unlock();


The rcu_dereference() primitive can thus be thought of as subscribing to a given value of the specified pointer, guaranteeing that subsequent dereference operations will see any initialization that occurred before the corresponding publish (rcu_assign_pointer() operation. The rcu_read_lock() and rcu_read_unlock() calls are absolutely required: they define the extent of the RCU read-side critical section. Their purpose is explained in Section [*], however, they never spin or block, nor do they prevent the list_add_rcu() from executing concurrently. In fact, in non-CONFIG_PREEMPT kernels, they generate absolutely no code.

Figure: Linux Circular Linked List
\resizebox{3in}{!}{\includegraphics{defer/Linux_list}}

Figure: Linux Linked List Abbreviated
\resizebox{3in}{!}{\includegraphics{defer/Linux_list_abbr}}

Although rcu_assign_pointer() and rcu_dereference() can in theory be used to construct any conceivable RCU-protected data structure, in practice it is often better to use higher-level constructs. Therefore, the rcu_assign_pointer() and rcu_dereference() primitives have been embedded in special RCU variants of Linux's list-manipulation API. Linux has two variants of doubly linked list, the circular struct list_head and the linear struct hlist_head/struct hlist_node pair. The former is laid out as shown in Figure [*], where the green boxes represent the list header and the blue boxes represent the elements in the list. This notation is cumbersome, and will therefore be abbreviated as shown in Figure [*].

Figure: RCU Data Structure Publication
\begin{figure}{ \scriptsize
\begin{verbatim}1 struct foo {
2 struct list_hea...
...;
14 p->c = 3;
15 list_add_rcu(&p->list, &head);\end{verbatim}
}\end{figure}

Adapting the pointer-publish example for the linked list results in the code shown in Figure [*].

Line 15 must be protected by some synchronization mechanism (most commonly some sort of lock) to prevent multiple list_add() instances from executing concurrently. However, such synchronization does not prevent this list_add() instance from executing concurrently with RCU readers.

Subscribing to an RCU-protected list is straightforward:



  1 rcu_read_lock();
  2 list_for_each_entry_rcu(p, head, list) {
  3   do_something_with(p->a, p->b, p->c);
  4 }
  5 rcu_read_unlock();


The list_add_rcu() primitive publishes an entry into the specified list, guaranteeing that the corresponding list_for_each_entry_rcu() invocation will properly subscribe to this same entry.

Quick Quiz 10.7: What prevents the list_for_each_entry_rcu() from getting a segfault if it happens to execute at exactly the same time as the list_add_rcu()? End Quick Quiz

Figure: Linux Linear Linked List
\resizebox{3in}{!}{\includegraphics{defer/Linux_hlist}}

Linux's other doubly linked list, the hlist, is a linear list, which means that it needs only one pointer for the header rather than the two required for the circular list, as shown in Figure[*]. Thus, use of hlist can halve the memory consumption for the hash-bucket arrays of large hash tables. As before, this notation is cumbersome, so hlists will be abbreviated in the same way lists are, as shown in Figure [*].

Figure: RCU hlist Publication
\begin{figure}{ \scriptsize
\begin{verbatim}1 struct foo {
2 struct hlist_no...
...p->c = 3;
15 hlist_add_head_rcu(&p->list, &head);\end{verbatim}
}\end{figure}

Publishing a new element to an RCU-protected hlist is quite similar to doing so for the circular list, as shown in Figure [*].

As before, line 15 must be protected by some sort of synchronization mechanism, for example, a lock.

Subscribing to an RCU-protected hlist is also similar to the circular list:



  1 rcu_read_lock();
  2 hlist_for_each_entry_rcu(p, q, head, list) {
  3   do_something_with(p->a, p->b, p->c);
  4 }
  5 rcu_read_unlock();


Quick Quiz 10.8: Why do we need to pass two pointers into hlist_for_each_entry_rcu() when only one is needed for list_for_each_entry_rcu()? End Quick Quiz


Table: RCU Publish and Subscribe Primitives
Category Publish Retract Subscribe
Pointers rcu_assign_pointer() rcu_assign_pointer(..., NULL) rcu_dereference()
Lists list_add_rcu()
list_add_tail_rcu()
list_replace_rcu()
list_del_rcu() list_for_each_entry_rcu()
Hlists hlist_add_after_rcu()
hlist_add_before_rcu()
hlist_add_head_rcu()
hlist_replace_rcu()
hlist_del_rcu() hlist_for_each_entry_rcu()


The set of RCU publish and subscribe primitives are shown in Table [*], along with additional primitives to ``unpublish'', or retract.

Note that the list_replace_rcu(), list_del_rcu(), hlist_replace_rcu(), and hlist_del_rcu() APIs add a complication. When is it safe to free up the data element that was replaced or removed? In particular, how can we possibly know when all the readers have released their references to that data element?

These questions are addressed in the following section.

Paul E. McKenney 2011-12-16