It may seem strange to say much of anything about a CPU whose end of life has been announced, but Alpha is interesting because it has the weakest memory-ordering model and therefore reorders memory operations the most aggressively. Because the Linux-kernel memory-ordering primitives must work on all CPUs, including Alpha, it is Alpha that has defined them. Understanding Alpha is therefore surprisingly important to the Linux kernel hacker.
The difference between Alpha and the other CPUs is illustrated by the
code shown in
Figure .
The smp_wmb() on line 9 of this figure
guarantees that the element initialization
in lines 6-8 is executed before the element is added to the
list on line 10, so that the lock-free search will work correctly.
That is, it makes this guarantee on all CPUs except Alpha.
Alpha has extremely weak memory ordering
such that the code on line 20 of
Figure could see the old
garbage values that were present before the initialization on lines 6-8.
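As a rough user-space analog of the pattern just described, the following minimal sketch uses C11 atomics rather than the kernel primitives. It assumes a single writer, a circular list with a dummy header, and a release fence standing in for smp_wmb(); the names struct el, head, insert(), and search() are illustrative, and the line numbers in the text refer to the figure, not to this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative list element; the layout is an assumption. */
struct el {
	struct el *_Atomic next;
	long key;
	long data;
};

/* Circular list with a dummy header: empty when head.next == &head. */
static struct el head = { .next = &head };

/* Writer side; assumes the caller serializes concurrent writers. */
void insert(long key, long data)
{
	struct el *p = malloc(sizeof(*p));

	assert(p != NULL);
	/* Initialize the element before it becomes reachable. */
	atomic_init(&p->next,
		    atomic_load_explicit(&head.next, memory_order_relaxed));
	p->key = key;
	p->data = data;
	/* Stand-in for smp_wmb(): order the stores above before the publish. */
	atomic_thread_fence(memory_order_release);
	/* Publish the element by linking it in after the header. */
	atomic_store_explicit(&head.next, p, memory_order_relaxed);
}

/* Lock-free reader with NO read-side ordering. */
struct el *search(long key)
{
	struct el *p = atomic_load_explicit(&head.next, memory_order_relaxed);

	while (p != &head) {
		/* On Alpha-like hardware, this dereference could observe
		 * the values that preceded the element's initialization. */
		if (p->key == key)
			return p;
		p = atomic_load_explicit(&p->next, memory_order_relaxed);
	}
	return NULL;
}
```

Run single-threaded, the sketch behaves as an ordinary list. The hazard arises only when search() runs concurrently with insert() on hardware that does not respect data dependencies, and in C11 terms the relaxed loads leave the reader's accesses to key and data formally unordered against the writer's initialization.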
Figure
shows how this can happen on
an aggressively parallel machine with partitioned caches, so that
alternating cache lines are processed by the different partitions
of the caches.
Assume that the list header head will be processed by cache bank 0,
and that the new element will be processed by cache bank 1.
On Alpha, the smp_wmb() will guarantee that the cache invalidates performed
by lines 6-8 of
Figure
will reach
the interconnect before the invalidate from line 10 does, but
makes absolutely no guarantee about the order in which the new values will
reach the reading CPU's core.
For example, it is possible that the reading CPU's cache bank 1 is very
busy, but cache bank 0 is idle.
This could result in the cache invalidates for the new element being
delayed, so that the reading CPU gets the new value for the pointer,
but sees the old cached values for the new element.
See the Web site called out earlier for more information,
or, again, if you think that I am just making all this up.
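The scenario above can be replayed with a deliberately tiny model. The sketch below is a toy simulation, not a description of real Alpha hardware: it assumes one cache line per bank, a one-entry queue of incoming updates per bank, and a read barrier modeled as draining every bank's queue. All names and values in it are hypothetical.

```c
/* Toy model of a reading CPU with two cache banks. */
enum { HEAD_LINE = 0, ELEM_LINE = 1, NBANKS = 2 };
enum { OLD_PTR = 0, NEW_PTR = 1, GARBAGE = -1, INIT_KEY = 42 };

struct bank {
	long cached;      /* value the reading CPU currently sees */
	long pending;     /* incoming update not yet applied */
	int has_pending;
};

struct reader_cpu {
	struct bank bank[NBANKS];
};

/* An update arrives from the interconnect and is queued at its bank. */
static void deliver(struct reader_cpu *r, int line, long value)
{
	struct bank *b = &r->bank[line % NBANKS];

	b->pending = value;
	b->has_pending = 1;
}

/* A bank gets around to applying its queued update. */
static void drain(struct reader_cpu *r, int bankno)
{
	struct bank *b = &r->bank[bankno];

	if (b->has_pending) {
		b->cached = b->pending;
		b->has_pending = 0;
	}
}

static long read_line(const struct reader_cpu *r, int line)
{
	return r->bank[line % NBANKS].cached;
}

/* Returns 1 if the reader sees the new pointer but the stale element. */
int reader_sees_stale_element(int use_read_barrier)
{
	struct reader_cpu r = {
		.bank = {
			[HEAD_LINE] = { .cached = OLD_PTR },
			[ELEM_LINE] = { .cached = GARBAGE },
		},
	};
	long ptr, key;
	int i;

	/* Writer: thanks to the write barrier, the element update reaches
	 * the interconnect before the pointer update does. */
	deliver(&r, ELEM_LINE, INIT_KEY);
	deliver(&r, HEAD_LINE, NEW_PTR);

	/* Reader: bank 0 is idle and applies its update promptly,
	 * while bank 1 is busy and leaves its update queued. */
	drain(&r, HEAD_LINE);
	ptr = read_line(&r, HEAD_LINE);

	/* Modeling assumption: a read barrier drains every bank. */
	if (use_read_barrier)
		for (i = 0; i < NBANKS; i++)
			drain(&r, i);

	key = read_line(&r, ELEM_LINE);
	return ptr == NEW_PTR && key == GARBAGE;
}
```

Calling reader_sees_stale_element(0) returns 1 in this model, because the writer's ordering is preserved on the interconnect yet lost inside the reader. Calling it with a nonzero argument returns 0, because the modeled barrier forces the busy bank to catch up before the dereference.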
One could place an smp_rmb() primitive
between the pointer fetch and dereference.
However, this imposes unneeded overhead on systems (such as i386,
IA64, PPC, and SPARC) that respect data dependencies on the read side.
An smp_read_barrier_depends() primitive has been added to the
Linux 2.6 kernel to eliminate this overhead on these systems.
This primitive may be used as shown on line 19 of
Figure .
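In user-space C11, the closest analog to a pointer fetch followed by smp_read_barrier_depends() is probably a memory_order_consume load, which compilers commonly implement as the stronger memory_order_acquire. The following sketch makes that substitution and uses a release store in place of smp_wmb() plus the pointer store. It assumes a single writer, and the names struct elem, list_head, publish(), and lookup() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative list element; the layout is an assumption. */
struct elem {
	struct elem *_Atomic next;
	long key;
	long data;
};

/* Circular list with a dummy header. */
static struct elem list_head = { .next = &list_head };

/* Writer side; assumes the caller serializes concurrent writers. */
void publish(long key, long data)
{
	struct elem *p = malloc(sizeof(*p));

	assert(p != NULL);
	atomic_init(&p->next,
		    atomic_load_explicit(&list_head.next, memory_order_relaxed));
	p->key = key;
	p->data = data;
	/* Release store: the initialization above is ordered before the
	 * element becomes reachable from the header. */
	atomic_store_explicit(&list_head.next, p, memory_order_release);
}

/* Reader side with dependency ordering on every pointer fetch. */
struct elem *lookup(long key)
{
	/* Consume load: analog of the pointer fetch followed by
	 * smp_read_barrier_depends(). */
	struct elem *p = atomic_load_explicit(&list_head.next,
					      memory_order_consume);

	while (p != &list_head) {
		/* This dereference is now ordered after the fetch of p. */
		if (p->key == key)
			return p;
		p = atomic_load_explicit(&p->next, memory_order_consume);
	}
	return NULL;
}
```

On CPUs that respect data dependencies, a consume load implemented as intended would cost nothing beyond the load itself, which mirrors smp_read_barrier_depends() being a no-op there.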
It is also possible to implement a software barrier that could be used in place of smp_wmb(), which would force all reading CPUs to see the writing CPU's writes in order. However, this approach was deemed by the Linux community to impose excessive overhead on extremely weakly ordered CPUs such as Alpha. This software barrier could be implemented by sending inter-processor interrupts (IPIs) to all other CPUs. Upon receipt of such an IPI, a CPU would execute a memory-barrier instruction, implementing a memory-barrier shootdown. Additional logic is required to avoid deadlocks. Of course, CPUs that respect data dependencies would define such a barrier to simply be smp_wmb(). Perhaps this decision should be revisited in the future as Alpha fades off into the sunset.
The Linux memory-barrier primitives took their names from the Alpha instructions, so smp_mb() is mb, smp_rmb() is rmb, and smp_wmb() is wmb. Alpha is the only CPU where smp_read_barrier_depends() is an smp_mb() rather than a no-op.
Quick Quiz C.13: Why is Alpha's smp_read_barrier_depends() an smp_mb() rather than smp_rmb()? End Quick Quiz
For more detail on Alpha, see the reference manual [SW95].
Paul E. McKenney 2011-12-16