But why are memory barriers needed in the first place? Can't CPUs keep track of ordering on their own? Isn't that why we have computers in the first place, to keep track of things?
Many people do indeed expect their computers to keep track of things,
but many also insist that they keep track of things quickly.
One difficulty that modern computer-system vendors face is that
main memory cannot keep up with the CPU: modern CPUs can execute
hundreds of instructions in the time required to fetch a single variable
from memory.
CPUs therefore sport increasingly large caches, as shown in
Figure .
Variables that are heavily used by a given CPU will tend to remain
in that CPU's cache, allowing high-speed access to the corresponding
data.
Unfortunately, when a CPU accesses data that is not yet in its cache,
the result is an expensive ``cache miss'', requiring the data to
be fetched from main memory.
Doubly unfortunately, running typical code results in a significant
number of cache misses.
To limit the resulting performance degradation, CPUs have been designed to
execute other instructions and memory references while waiting for
a cache miss to fetch data from memory.
This clearly causes instructions and memory references to execute out
of order, which could cause serious confusion, as illustrated in
Figure .
Compilers and synchronization primitives (such as locking and RCU)
are responsible for maintaining the illusion of ordering through use of
``memory barriers'' (for example, smp_mb() in the Linux kernel).
These memory barriers can be explicit instructions, as they are on
ARM, POWER, Itanium, and Alpha, or they can be implied by other instructions,
as they often are on x86.
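
To make the role of a memory barrier concrete, here is a minimal user-space sketch of one thread handing data to another. It is an illustration under stated assumptions, not kernel code: it uses the C11 atomic_thread_fence() from <stdatomic.h> together with POSIX threads in place of the Linux kernel's smp_mb(), and the names payload, ready, producer(), consumer(), and run_message_passing() are hypothetical, chosen only for this example. The release and acquire fences used here are lighter-weight than a full barrier, but they suffice for this particular pattern.

```c
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* ordinary data handed from producer to consumer */
static atomic_int ready;   /* flag announcing that payload has been written */

static void *producer(void *arg)
{
	(void)arg;
	payload = 42;                                  /* 1: write the data */
	atomic_thread_fence(memory_order_release);     /* keep 1 ordered before 2 */
	atomic_store_explicit(&ready, 1, memory_order_relaxed); /* 2: set the flag */
	return NULL;
}

static void *consumer(void *arg)
{
	int *out = arg;

	while (!atomic_load_explicit(&ready, memory_order_relaxed))
		continue;                              /* 3: spin until the flag is set */
	atomic_thread_fence(memory_order_acquire);     /* keep 3 ordered before 4 */
	*out = payload;                                /* 4: read the data */
	return NULL;
}

/* Runs one producer and one consumer; returns the value the consumer saw. */
int run_message_passing(void)
{
	pthread_t p, c;
	int seen = -1;

	payload = 0;
	atomic_store(&ready, 0);
	pthread_create(&c, NULL, consumer, &seen);
	pthread_create(&p, NULL, producer, NULL);
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return seen;
}
```

Without the two fences, the compiler or the CPU would be free to reorder the store to payload after the store to ready, or the load of payload before the load of ready, so that the consumer could observe the flag set yet still read stale data. With the fences in place, a consumer that sees ready set is guaranteed to see the value the producer wrote to payload.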
Since the standard synchronization primitives preserve the illusion of ordering, your path of least resistance is to stop reading this section and simply use these primitives.
However, if you need to implement the synchronization primitives themselves, or if you are simply interested in understanding how memory ordering and memory barriers work, read on!
The next sections present counter-intuitive scenarios that you might encounter when using explicit memory barriers.
Paul E. McKenney 2011-12-16