The POWER and Power PC
CPU families have a wide variety of memory-barrier
instructions [IBM94,LSH02]:
Unfortunately, none of these instructions line up exactly with Linux's wmb() primitive, which requires all stores to be ordered, but does not require the other high-overhead actions of the sync instruction. But there is no choice: ppc64 versions of wmb() and mb() are defined to be the heavyweight sync instruction. However, Linux's smp_wmb() instruction is never used for MMIO (since a driver must carefully order MMIOs in UP as well as SMP kernels, after all), so it is defined to be the lighter weight eieio instruction. This instruction may well be unique in having a five-vowel mnemonic. The smp_mb() instruction is also defined to be the sync instruction, but both smp_rmb() and rmb() are defined to be the lighter-weight lwsync instruction.
Power features ``cumulativity'', which can be used to obtain transitivity. When used properly, any code seeing the results of an earlier code fragment will also see the accesses that this earlier code fragment itself saw. Much more detail is available from McKenney and Silvera [MS09].
Power respects control dependencies in much the same way that ARM does, with the exception that the Power isync instruction is substituted for the ARM ISB instruction.
Many members of the POWER architecture have incoherent instruction caches, so that a store to memory will not necessarily be reflected in the instruction cache. Thankfully, few people write self-modifying code these days, but JITs and compilers do it all the time. Furthermore, recompiling a recently run program looks just like self-modifying code from the CPU's viewpoint. The icbi instruction (instruction cache block invalidate) invalidates a specified cache line from the instruction cache, and may be used in these situations.
Paul E. McKenney 2011-12-16