This final example demonstrates a real-world use of Promela on Oleg Nesterov's QRCU [Nes06a,Nes06b], but modified to speed up the synchronize_qrcu() fastpath.
But first, what is QRCU?
QRCU is a variant of SRCU [McK06] that trades somewhat higher read overhead (atomic increment and decrement on a global variable) for extremely low grace-period latencies. If there are no readers, the grace period will be detected in less than a microsecond, compared to the multi-millisecond grace-period latencies of most other RCU implementations.
For example:
idx = qrcu_read_lock(&my_qrcu_struct); /* read-side critical section. */ qrcu_read_unlock(&my_qrcu_struct, idx); |
For example, synchronize_qrcu(&your_qrcu_struct) would not need to wait on the earlier QRCU read-side critical section. In contrast, synchronize_qrcu(&my_qrcu_struct) would need to wait, since it shares the same qrcu_struct.
A Linux-kernel patch for QRCU has been produced [McK07b], but has not yet been included in the Linux kernel as of April 2008.
Returning to the Promela code for QRCU, the global variables are as shown in
Figure .
This example uses locking, hence including lock.h.
Both the number of readers and writers can be varied using the
two #define statements, giving us not one but two ways to create
combinatorial explosion.
The idx variable controls which of the two elements of the ctr
array will be used by readers, and the readerprogress variable
allows to assertion to determine when all the readers are finished
(since a QRCU update cannot be permitted to complete until all
pre-existing readers have completed their QRCU read-side critical
sections).
The readerprogress array elements have values as follows,
indicating the state of the corresponding reader:
Finally, the mutex variable is used to serialize updaters' slowpaths.
QRCU readers are modeled by the qrcu_reader() process shown in
Figure .
A do-od loop spans lines 5-16, with a single guard of ``1''
on line 6 that makes it an infinite loop.
Line 7 captures the current value of the global index, and lines 8-15
atomically increment it (and break from the infinite loop)
if its value was non-zero (atomic_inc_not_zero()).
Line 17 marks entry into the RCU read-side critical section, and
line 18 marks exit from this critical section, both lines for the benefit of
the assert() statement that we shall encounter later.
Line 19 atomically decrements the same counter that we incremented,
thereby exiting the RCU read-side critical section.
The C-preprocessor macro shown in
Figure
sums the pair of counters so as to emulate weak memory ordering.
Lines 2-13 fetch one of the counters, and line 14 fetches the other
of the pair and sums them.
The atomic block consists of a single do-od statement.
This do-od statement (spanning lines 3-12) is unusual in that
it contains two unconditional
branches with guards on lines 4 and 8, which causes Promela to
non-deterministically choose one of the two (but again, the full
state-space search causes Promela to eventually make all possible
choices in each applicable situation).
The first branch fetches the zero-th counter and sets i to 1 (so
that line 14 will fetch the first counter), while the second
branch does the opposite, fetching the first counter and setting i
to 0 (so that line 14 will fetch the second counter).
Quick Quiz E.3: Is there a more straightforward way to code the do-od statement? End Quick Quiz
With the sum_unordered macro in place, we can now proceed to the update-side process shown in Figure. The update-side process repeats indefinitely, with the corresponding do-od loop ranging over lines 7-57. Each pass through the loop first snapshots the global readerprogress array into the local readerstart array on lines 12-21. This snapshot will be used for the assertion on line 53. Line 23 invokes sum_unordered, and then lines 24-27 re-invoke sum_unordered if the fastpath is potentially usable.
Lines 28-40 execute the slowpath code if need be, with lines 30 and 38 acquiring and releasing the update-side lock, lines 31-33 flipping the index, and lines 34-37 waiting for all pre-existing readers to complete.
Lines 44-56 then compare the current values in the readerprogress array to those collected in the readerstart array, forcing an assertion failure should any readers that started before this update still be in progress.
Quick Quiz E.4: Why are there atomic blocks at lines 12-21 and lines 44-56, when the operations within those atomic blocks have no atomic implementation on any current production microprocessor? End Quick Quiz
Quick Quiz E.5: Is the re-summing of the counters on lines 24-27 really necessary? End Quick Quiz
All that remains is the initialization block shown in
Figure .
This block simply initializes the counter pair on lines 5-6,
spawns the reader processes on lines 7-14, and spawns the updater
processes on lines 15-21.
This is all done within an atomic block to reduce state space.