E.6 Promela Example: QRCU

This final example demonstrates a real-world use of Promela on Oleg Nesterov's QRCU [Nes06a,Nes06b], but modified to speed up the synchronize_qrcu() fastpath.

But first, what is QRCU?

QRCU is a variant of SRCU [McK06] that trades somewhat higher read overhead (atomic increment and decrement on a global variable) for extremely low grace-period latencies. If there are no readers, the grace period will be detected in less than a microsecond, compared to the multi-millisecond grace-period latencies of most other RCU implementations.

  1. There is a qrcu_struct that defines a QRCU domain. Like SRCU (and unlike other variants of RCU) QRCU's action is not global, but instead focused on the specified qrcu_struct.
  2. There are qrcu_read_lock() and qrcu_read_unlock() primitives that delimit QRCU read-side critical sections. The corresponding qrcu_struct must be passed into these primitives, and the return value from rcu_read_lock() must be passed to rcu_read_unlock().

    For example:



    idx = qrcu_read_lock(&my_qrcu_struct);
    /* read-side critical section. */
    qrcu_read_unlock(&my_qrcu_struct, idx);
    


  3. There is a synchronize_qrcu() primitive that blocks until all pre-existing QRCU read-side critical sections complete, but, like SRCU's synchronize_srcu(), QRCU's synchronize_qrcu() need wait only for those read-side critical sections that are using the same qrcu_struct.

    For example, synchronize_qrcu(&your_qrcu_struct) would not need to wait on the earlier QRCU read-side critical section. In contrast, synchronize_qrcu(&my_qrcu_struct) would need to wait, since it shares the same qrcu_struct.

A Linux-kernel patch for QRCU has been produced [McK07b], but has not yet been included in the Linux kernel as of April 2008.

Figure: QRCU Global Variables
\begin{figure}{ % \scriptsize
\begin{verbatim}1  ...

Returning to the Promela code for QRCU, the global variables are as shown in Figure [*]. This example uses locking, hence including lock.h. Both the number of readers and writers can be varied using the two #define statements, giving us not one but two ways to create combinatorial explosion. The idx variable controls which of the two elements of the ctr array will be used by readers, and the readerprogress variable allows to assertion to determine when all the readers are finished (since a QRCU update cannot be permitted to complete until all pre-existing readers have completed their QRCU read-side critical sections). The readerprogress array elements have values as follows, indicating the state of the corresponding reader:

  1. 0: not yet started.
  2. 1: within QRCU read-side critical section.
  3. 2: finished with QRCU read-side critical section.

Finally, the mutex variable is used to serialize updaters' slowpaths.

Figure: QRCU Reader Process
\begin{figure}{ % \scriptsize
\begin{verbatim}1 proctype qrcu_reader(byte me)...
...rogress[me] = 2;
19 atomic { ctr[myidx]-- }
20 }\end{verbatim}
}\end{figure}

QRCU readers are modeled by the qrcu_reader() process shown in Figure [*]. A do-od loop spans lines 5-16, with a single guard of ``1'' on line 6 that makes it an infinite loop. Line 7 captures the current value of the global index, and lines 8-15 atomically increment it (and break from the infinite loop) if its value was non-zero (atomic_inc_not_zero()). Line 17 marks entry into the RCU read-side critical section, and line 18 marks exit from this critical section, both lines for the benefit of the assert() statement that we shall encounter later. Line 19 atomically decrements the same counter that we incremented, thereby exiting the RCU read-side critical section.

Figure: QRCU Unordered Summation
\begin{figure}{ % \scriptsize
\begin{verbatim}1  ...

The C-preprocessor macro shown in Figure [*] sums the pair of counters so as to emulate weak memory ordering. Lines 2-13 fetch one of the counters, and line 14 fetches the other of the pair and sums them. The atomic block consists of a single do-od statement. This do-od statement (spanning lines 3-12) is unusual in that it contains two unconditional branches with guards on lines 4 and 8, which causes Promela to non-deterministically choose one of the two (but again, the full state-space search causes Promela to eventually make all possible choices in each applicable situation). The first branch fetches the zero-th counter and sets i to 1 (so that line 14 will fetch the first counter), while the second branch does the opposite, fetching the first counter and setting i to 0 (so that line 14 will fetch the second counter).

Quick Quiz E.3: Is there a more straightforward way to code the do-od statement? End Quick Quiz

Figure: QRCU Updater Process
\begin{figure}{ \scriptsize
\begin{verbatim}1 proctype qrcu_updater(byte me)
...
...ert(sum == 0);
54 break
55 od
56 }
57 od
58 }\end{verbatim}
}\end{figure}

With the sum_unordered macro in place, we can now proceed to the update-side process shown in Figure. The update-side process repeats indefinitely, with the corresponding do-od loop ranging over lines 7-57. Each pass through the loop first snapshots the global readerprogress array into the local readerstart array on lines 12-21. This snapshot will be used for the assertion on line 53. Line 23 invokes sum_unordered, and then lines 24-27 re-invoke sum_unordered if the fastpath is potentially usable.

Lines 28-40 execute the slowpath code if need be, with lines 30 and 38 acquiring and releasing the update-side lock, lines 31-33 flipping the index, and lines 34-37 waiting for all pre-existing readers to complete.

Lines 44-56 then compare the current values in the readerprogress array to those collected in the readerstart array, forcing an assertion failure should any readers that started before this update still be in progress.

Quick Quiz E.4: Why are there atomic blocks at lines 12-21 and lines 44-56, when the operations within those atomic blocks have no atomic implementation on any current production microprocessor? End Quick Quiz

Quick Quiz E.5: Is the re-summing of the counters on lines 24-27 really necessary? End Quick Quiz

Figure: QRCU Initialization Process
\begin{figure}{ % \scriptsize
\begin{verbatim}1 init {
2 int i;
3
4 atomic...
...: i >= N_QRCU_UPDATERS -> break
21 od
22 }
23 }\end{verbatim}
}\end{figure}

All that remains is the initialization block shown in Figure [*]. This block simply initializes the counter pair on lines 5-6, spawns the reader processes on lines 7-14, and spawns the updater processes on lines 15-21. This is all done within an atomic block to reduce state space.



Subsections
Paul E. McKenney 2011-12-16