14.2.1 Memory Ordering and Memory Barriers

But why are memory barriers needed in the first place? Can't CPUs keep track of ordering on their own? Isn't that why we have computers in the first place, to keep track of things?

Many people do indeed expect their computers to keep track of things, but many also insist that they keep track of things quickly. One difficulty that modern computer-system vendors face is that main memory cannot keep up with the CPU: modern CPUs can execute hundreds of instructions in the time required to fetch a single variable from memory. CPUs therefore sport increasingly large caches, as shown in Figure [*]. Variables that are heavily used by a given CPU will tend to remain in that CPU's cache, allowing high-speed access to the corresponding data.

Figure: Modern Computer System Cache Structure

Unfortunately, when a CPU accesses data that is not yet in its cache, the result is an expensive ``cache miss'', requiring the data to be fetched from main memory. Doubly unfortunately, running typical code results in a significant number of cache misses. To limit the resulting performance degradation, CPUs have been designed to execute other instructions and memory references while waiting for a cache miss to fetch data from memory. This clearly causes instructions and memory references to execute out of order, which could cause serious confusion, as illustrated in Figure [*]. Compilers and synchronization primitives (such as locking and RCU) are responsible for maintaining the illusion of ordering through the use of ``memory barriers'' (for example, smp_mb() in the Linux kernel). These memory barriers can be explicit instructions, as they are on ARM, POWER, Itanium, and Alpha, or they can be implied by other instructions, as they often are on x86.
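As a rough user-space illustration, the following C11 sketch passes a value from one thread to another. The writer stores an ordinary variable and then sets a flag; the reader waits for the flag and then reads the variable. The atomic_thread_fence(memory_order_seq_cst) calls stand in for smp_mb() by analogy only; this is an assumption for illustration, not how the kernel implements smp_mb(). Without the fence pair, neither the compiler nor the CPU would be obliged to keep the accesses on each side in order, so on weakly ordered hardware the reader could see the flag set yet read a stale payload (and the program would contain a data race under the C11 memory model). The names payload, ready, writer, reader, and run_message_passing are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state for the illustration. */
static int payload;       /* ordinary, non-atomic data */
static atomic_int ready;  /* 0 until payload has been published */

/* Writer: store the data, full fence, then publish the flag.
 * The fence plays roughly the role of smp_mb() (assumed analogy). */
static void *writer(void *arg)
{
    (void)arg;
    payload = 42;
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

/* Reader: wait for the flag, full fence, then read the data.
 * The fence pair orders the payload store before the payload load. */
static void *reader(void *arg)
{
    int *out = arg;
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ;  /* spin until the writer publishes */
    atomic_thread_fence(memory_order_seq_cst);
    int v = payload;
    assert(v == 42);  /* guaranteed by the fence pair, not by timing */
    *out = v;
    return NULL;
}

/* Run one hand-off and return the value the reader observed. */
int run_message_passing(void)
{
    pthread_t w, r;
    int seen = -1;

    payload = 0;
    atomic_store(&ready, 0);
    pthread_create(&r, NULL, reader, &seen);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

Each call to run_message_passing() should return 42, regardless of which thread happens to start first.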

Figure: CPUs Can Do Things Out of Order

Since the standard synchronization primitives preserve the illusion of ordering, your path of least resistance is to stop reading this section and simply use these primitives.
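For comparison, here is a sketch of a similar hand-off written only with POSIX threads primitives, assuming a pthreads environment. No explicit barriers appear: POSIX specifies that pthread_mutex_lock(), pthread_mutex_unlock(), pthread_cond_wait(), and pthread_cond_signal() synchronize memory with respect to other threads, so plain int variables protected by the mutex suffice. The mp_* and locked_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state, protected entirely by mp_lock. */
static pthread_mutex_t mp_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t mp_cond = PTHREAD_COND_INITIALIZER;
static int mp_payload;
static int mp_ready;

/* Writer: update both variables inside the critical section. */
static void *locked_writer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mp_lock);
    mp_payload = 42;
    mp_ready = 1;
    pthread_cond_signal(&mp_cond);
    pthread_mutex_unlock(&mp_lock);
    return NULL;
}

/* Reader: wait under the lock; the primitives supply all ordering. */
static void *locked_reader(void *arg)
{
    int *out = arg;
    pthread_mutex_lock(&mp_lock);
    while (!mp_ready)
        pthread_cond_wait(&mp_cond, &mp_lock);
    *out = mp_payload;
    assert(*out == 42);
    pthread_mutex_unlock(&mp_lock);
    return NULL;
}

/* Run one hand-off and return the value the reader observed. */
int run_locked_message_passing(void)
{
    pthread_t w, r;
    int seen = -1;

    mp_payload = 0;
    mp_ready = 0;
    pthread_create(&r, NULL, locked_reader, &seen);
    pthread_create(&w, NULL, locked_writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

The while loop around pthread_cond_wait() handles both spurious wakeups and the case where the writer runs before the reader starts waiting.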

However, if you need to implement the synchronization primitives themselves, or if you are simply interested in understanding how memory ordering and memory barriers work, read on!
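As a small taste of what implementing a primitive involves, the sketch below builds a toy test-and-set spinlock on C11 atomic_flag. The acquire ordering on lock and release ordering on unlock are the ordering constraints the primitive itself must supply so that its callers need not; weakening both to memory_order_relaxed would allow critical-section accesses to be reordered outside the lock. The toy_spinlock type and its functions are hypothetical names for illustration, not the Linux kernel's spinlock implementation, and a production lock would also need concerns such as backoff and fairness.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical toy spinlock, for illustration only. */
struct toy_spinlock {
    atomic_flag held;
};

#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void toy_spin_lock(struct toy_spinlock *l)
{
    /* Acquire: critical-section accesses cannot move before this. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;  /* busy-wait until the flag was previously clear */
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
    /* Release: critical-section accesses cannot move after this. */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static struct toy_spinlock counter_lock = TOY_SPINLOCK_INIT;
static long counter;  /* plain long, protected only by counter_lock */

#define NTHREADS 4
#define ITERS 100000L

static void *incrementer(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERS; i++) {
        toy_spin_lock(&counter_lock);
        counter++;
        toy_spin_unlock(&counter_lock);
    }
    return NULL;
}

/* Run NTHREADS incrementers and return the final counter value. */
long run_spinlock_counter(void)
{
    pthread_t t[NTHREADS];

    counter = 0;
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, incrementer, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

With four threads each performing 100000 increments, run_spinlock_counter() should return 400000.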

The next sections present counter-intuitive scenarios that you might encounter when using explicit memory barriers.

Paul E. McKenney 2011-12-16