The read-side performance advantages of RCU over reader-writer locking
are shown in
Figure .
Quick Quiz 10.12: WTF? How the heck do you expect me to believe that RCU has a 100-femtosecond overhead when the clock period at 3 GHz is more than 300 picoseconds? End Quick Quiz
Note that reader-writer locking is orders of magnitude slower than RCU on a single CPU, and is almost two additional orders of magnitude slower on 16 CPUs. In contrast, RCU scales quite well. In both cases, the error bars span a single standard deviation in either direction.
A more moderate view may be obtained from a CONFIG_PREEMPT
kernel, though RCU still beats reader-writer locking by between one and
three orders of magnitude, as shown in
Figure .
Note the high variability of reader-writer locking at larger numbers of CPUs.
The error bars span a single standard deviation in either direction.
Of course, the low performance of reader-writer locking in
Figure
is exaggerated by the unrealistic zero-length critical sections.
The performance advantages of RCU become less significant as the overhead
of the critical section increases, as shown in
Figure
for a 16-CPU system,
in which the y-axis represents the sum of the overhead
of the read-side primitives and that of the critical section.
Quick Quiz 10.13: Why do both the variability and the overhead of rwlock decrease as the critical-section overhead increases? End Quick Quiz
However, this observation must be tempered by the fact that a number of system calls (and thus any RCU read-side critical sections that they contain) can complete within a few microseconds.
In addition, as is discussed in the next section, RCU read-side primitives are almost entirely deadlock-immune.
Paul E. McKenney 2011-12-16