10.3.2.1.1 Performance

Figure: Performance Advantage of RCU Over Reader-Writer Locking
\resizebox{3in}{!}{\includegraphics{defer/rwlockRCUperf}}

The read-side performance advantages of RCU over reader-writer locking are shown in the figure above.

Quick Quiz 10.12: WTF? How the heck do you expect me to believe that RCU has a 100-femtosecond overhead when the clock period at 3GHz is more than 300 picoseconds? End Quick Quiz

Note that reader-writer locking is orders of magnitude slower than RCU on a single CPU, and is almost two additional orders of magnitude slower on 16 CPUs. In contrast, RCU scales quite well. In both cases, the error bars span a single standard deviation in either direction.

Figure: Performance Advantage of Preemptible RCU Over Reader-Writer Locking
\resizebox{3in}{!}{\includegraphics{defer/rwlockRCUperfPREEMPT}}

A more moderate view may be obtained from a CONFIG_PREEMPT kernel, though RCU still beats reader-writer locking by between one and three orders of magnitude, as shown in the figure above. Note the high variability of reader-writer locking at larger numbers of CPUs. The error bars span a single standard deviation in either direction.

Figure: Comparison of RCU to Reader-Writer Locking as Function of Critical-Section Duration
\resizebox{3in}{!}{\includegraphics{defer/rwlockRCUperfwtPREEMPT}}

Of course, the low performance of reader-writer locking in the preemptible-RCU figure above is exaggerated by the unrealistic zero-length critical sections. The performance advantages of RCU become less significant as the overhead of the critical section increases, as shown in the critical-section-duration figure above for a 16-CPU system, in which the y-axis represents the sum of the overhead of the read-side primitives and that of the critical section.
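The arithmetic behind this narrowing gap can be sketched as follows. The model is an assumption made for illustration only: it treats each primitive's overhead as a constant that simply adds to the critical-section duration, which is a simplification of real behavior, and the function name total_cost_ratio() is hypothetical.

```c
/*
 * Hypothetical model: ratio of total read-side cost (primitive
 * overhead plus critical-section duration) for reader-writer
 * locking relative to RCU.  All arguments use the same time unit.
 * The caller must ensure rcu_overhead + cs_duration is positive.
 */
double total_cost_ratio(double rwlock_overhead, double rcu_overhead,
			double cs_duration)
{
	return (rwlock_overhead + cs_duration) /
	       (rcu_overhead + cs_duration);
}
```

With made-up placeholder overheads of 1000 units for reader-writer locking and 1 unit for RCU (not values read from the figure), the ratio is 1000 for a zero-length critical section but falls below 1.1 once the critical section lasts 10000 units, because the shared critical-section term dominates both sums.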

Quick Quiz 10.13: Why do both the variability and overhead of rwlock decrease as the critical-section overhead increases? End Quick Quiz

However, this observation must be tempered by the fact that a number of system calls (and thus any RCU read-side critical sections that they contain) can complete within a few microseconds.

In addition, as is discussed in the next section, RCU read-side primitives are almost entirely deadlock-immune.

Paul E. McKenney 2011-12-16