7.4 Parallel Fastpath

Fine-grained (and therefore usually higher-performance) designs are typically more complex than are coarser-grained designs. In many cases, most of the overhead is incurred by a small fraction of the code [Knu73]. So why not focus effort on that small fraction?

This is the idea behind the parallel-fastpath design pattern: aggressively parallelize the common-case code path without incurring the complexity that would be required to aggressively parallelize the entire algorithm. To do so, you must understand not only the specific algorithm you wish to parallelize, but also the workload to which the algorithm will be subjected. Great creativity and design effort are often required to construct a parallel fastpath.

Parallel fastpath combines different patterns (one for the fastpath, one elsewhere) and is therefore a template pattern. The following instances of parallel fastpath occur often enough to warrant their own patterns, as depicted in Figure [*]:

Figure: Parallel-Fastpath Design Patterns

  1. Reader/Writer Locking (described below in Section [*]).
  2. Read-Copy Update (RCU), which may be used as a high-performance replacement for reader/writer locking, is introduced in Section [*], and will not be discussed further in this chapter.
  3. Hierarchical Locking ([McK96]), which is touched upon in Section [*].
  4. Resource Allocator Caches ([McK96,MS93]). See Section [*] for more detail.
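
To make the fastpath/slowpath split concrete, the following is a minimal sketch of the last pattern in the list, a resource allocator cache. It is an illustration under stated assumptions, not the implementation discussed later in this chapter: the names (pool_init, fp_alloc, fp_free, fp_slowpath_trips), the block size, and the pool and cache sizes are all hypothetical. The common case (allocating from or freeing to a per-thread cache) takes no lock; only a cache miss or overflow takes the global lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_BLOCKS 64               /* total blocks (assumed size) */
#define CACHE_MAX   8                /* per-thread cache limit (assumed) */
#define BATCH       (CACHE_MAX / 2)  /* blocks moved per slowpath trip */

/* A free block stores its free-list link in its own storage. */
union block {
    union block *next;
    char payload[64];
};

/* Slowpath state: one global free list guarded by one lock. */
static union block storage[POOL_BLOCKS];
static union block *global_head;
static unsigned long slowpath_trips;
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Fastpath state: private to each thread, so no locking is needed. */
static _Thread_local union block *cache_head;
static _Thread_local int cache_count;

/* Call once, before any other thread uses the allocator. */
void pool_init(void)
{
    global_head = NULL;
    for (int i = 0; i < POOL_BLOCKS; i++) {
        storage[i].next = global_head;
        global_head = &storage[i];
    }
}

/* Slowpath: move up to BATCH blocks between the lists, under the lock. */
static void slowpath_move(int to_cache)
{
    union block **src = to_cache ? &global_head : &cache_head;
    union block **dst = to_cache ? &cache_head : &global_head;

    pthread_mutex_lock(&global_lock);
    slowpath_trips++;
    for (int i = 0; i < BATCH && *src != NULL; i++) {
        union block *b = *src;
        *src = b->next;
        b->next = *dst;
        *dst = b;
        cache_count += to_cache ? 1 : -1;
    }
    pthread_mutex_unlock(&global_lock);
}

/* Fastpath unless this thread's cache is empty. Returns NULL if exhausted. */
void *fp_alloc(void)
{
    if (cache_head == NULL)
        slowpath_move(1);            /* refill from the global pool */
    union block *b = cache_head;
    if (b == NULL)
        return NULL;
    cache_head = b->next;
    cache_count--;
    return b;
}

/* Fastpath unless this thread's cache is full. */
void fp_free(void *p)
{
    assert(p != NULL);
    if (cache_count >= CACHE_MAX)
        slowpath_move(0);            /* return a batch to the global pool */
    union block *b = p;
    b->next = cache_head;
    cache_head = b;
    cache_count++;
}

/* Number of times any thread has taken the global lock to move blocks. */
unsigned long fp_slowpath_trips(void)
{
    pthread_mutex_lock(&global_lock);
    unsigned long n = slowpath_trips;
    pthread_mutex_unlock(&global_lock);
    return n;
}
```

Moving blocks in batches of half the cache limit is one way to keep a thread that alternates between allocating and freeing from bouncing on the global lock. In this sketch, a thread that repeatedly allocates and frees a single block takes the slowpath only once, on its first allocation.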



Paul E. McKenney 2011-12-16