The ARM family of CPUs is extremely popular in embedded applications,
particularly power-constrained ones such as cellphones.
There have nevertheless been multiprocessor implementations of ARM
for more than five years.
Its memory model is similar to that of Power
(see the section on Power), but ARM uses a
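different set of memory-barrier instructions [ARM10], namely DMB (data memory barrier), DSB (data synchronization barrier), and ISB (instruction synchronization barrier).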
None of these instructions exactly match the semantics of Linux's rmb() primitive, which must therefore be implemented as a full DMB. The DMB and DSB instructions have a recursive definition of accesses ordered before and after the barrier, which has an effect similar to that of POWER's cumulativity.
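As a rough illustration of a read barrier that falls back to a full DMB, the following sketch passes a value from a writer to a reader. It is not the Linux kernel's implementation: the names dmb_full(), rmb_sketch(), publish(), try_consume(), and self_check() are hypothetical, the inline assembly assumes GCC or Clang targeting ARMv7 or later, and on non-ARM targets a C11 sequentially consistent fence stands in for the DMB so that the example still builds.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical full-barrier helper: DMB on ARM, a C11 fence elsewhere. */
#if defined(__arm__) || defined(__aarch64__)
#define dmb_full() __asm__ __volatile__("dmb sy" ::: "memory")
#else
#define dmb_full() atomic_thread_fence(memory_order_seq_cst)
#endif

/* Assumption: no cheaper read-only barrier is used, so fall back to a full DMB. */
#define rmb_sketch() dmb_full()

static atomic_int payload;  /* data being handed over */
static atomic_int ready;    /* flag announcing that payload is valid */

/* Writer: store the payload, then set the flag. */
static void publish(int value)
{
    atomic_store_explicit(&payload, value, memory_order_relaxed);
    dmb_full();  /* order the payload store before the flag store */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader: returns 1 and fills *out if the payload was published, else 0. */
static int try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    rmb_sketch();  /* order the flag load before the payload load */
    *out = atomic_load_explicit(&payload, memory_order_relaxed);
    return 1;
}

/* Single-threaded sanity check of the hand-off logic only. */
static void self_check(void)
{
    int v = 0;
    assert(try_consume(&v) == 0);  /* nothing published yet */
    publish(42);
    assert(try_consume(&v) == 1);
    assert(v == 42);
}
```

In a real test, publish() and try_consume() would run on different CPUs. The single-threaded self_check() only exercises the logic and cannot demonstrate the ordering itself.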
ARM also implements control dependencies, so that if a conditional branch depends on a load, then any store executed after that conditional branch will be ordered after the load. However, loads following the conditional branch will not be guaranteed to be ordered unless there is an ISB instruction between the branch and the load. Consider the following example:
  1 r1 = x;
  2 if (r1 == 0)
  3   nop();
  4 y = 1;
  5 r2 = z;
  6 ISB();
  7 r3 = z;
In this example, load-store control dependency ordering causes the load from x on line 1 to be ordered before the store to y on line 4. However, ARM does not respect load-load control dependencies, so that the load on line 1 might well happen after the load on line 5. On the other hand, the combination of the conditional branch on line 2 and the ISB instruction on line 6 ensures that the load on line 7 happens after the load on line 1. Note that inserting an additional ISB instruction somewhere between lines 3 and 4 would enforce ordering between lines 1 and 5.
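The listing above might be rendered in C along the following lines. This is only a sketch under stated assumptions: isb(), nop(), control_dependency_demo(), and single_threaded_check() are hypothetical names, the inline assembly assumes GCC or Clang targeting ARMv7 or later, and on non-ARM targets isb() degrades to a compiler-only barrier so that the code still builds. A C compiler is also free to transform branches, so code that relies on control dependencies would need to inspect the generated assembly.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helpers: ISB on ARM, compiler-only barrier elsewhere. */
#if defined(__arm__) || defined(__aarch64__)
#define isb() __asm__ __volatile__("isb" ::: "memory")
#else
#define isb() __asm__ __volatile__("" ::: "memory")  /* placeholder, no CPU ordering */
#endif
#define nop() __asm__ __volatile__("" ::: "memory")  /* keeps the branch body non-empty */

static atomic_int x, y, z;  /* shared variables, initially zero */

struct loads { int r1, r2, r3; };

static struct loads control_dependency_demo(void)
{
    struct loads r;

    r.r1 = atomic_load_explicit(&x, memory_order_relaxed);  /* line 1 */
    if (r.r1 == 0)                                          /* line 2 */
        nop();                                              /* line 3 */
    atomic_store_explicit(&y, 1, memory_order_relaxed);     /* line 4 */
    r.r2 = atomic_load_explicit(&z, memory_order_relaxed);  /* line 5: not ordered after line 1 */
    isb();                                                  /* line 6 */
    r.r3 = atomic_load_explicit(&z, memory_order_relaxed);  /* line 7: ordered after line 1 */
    return r;
}

/* Single-threaded sanity check: with x and z still zero, every load sees 0. */
static void single_threaded_check(void)
{
    struct loads r = control_dependency_demo();
    assert(r.r1 == 0 && r.r2 == 0 && r.r3 == 0);
    assert(atomic_load(&y) == 1);
}
```

Observing the reorderings discussed above would require concurrent writers to x and z running on other CPUs. The single-threaded check only confirms that the sketch builds and follows the listing's control flow.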
Paul E. McKenney 2011-12-16