14.2.7 Abstract Memory Access Model

Consider the abstract model of the system shown in the figure below.

Figure: Abstract Memory Access Model
\includegraphics{advsync/AbstractMemoryAccessModel}

Each CPU executes a program that generates memory access operations. In the abstract CPU, memory operation ordering is very relaxed, and a CPU may actually perform the memory operations in any order it likes, provided program causality appears to be maintained. Similarly, the compiler may also arrange the instructions it emits in any order it likes, provided it doesn't affect the apparent operation of the program.

In the diagram above, the effects of the memory operations performed by a CPU are perceived by the rest of the system as the operations cross the interface between the CPU and the rest of the system (the dotted lines).

For example, consider the following sequence of events given the initial values {A = 1, B = 2}:



\begin{tabular}{l|l}
CPU 1 & CPU 2 \\
\hline
A = 3; & x = A; \\
B = 4; & y = B; \\
\end{tabular}


The set of accesses as seen by the memory system in the middle can be arranged in 24 different combinations, with loads denoted by ``ld'' and stores denoted by ``st'':

 
st A=3, st B=4, x=ld A$\rightarrow$3, y=ld B$\rightarrow$4
st A=3, st B=4, y=ld B$\rightarrow$4, x=ld A$\rightarrow$3
st A=3, x=ld A$\rightarrow$3, st B=4, y=ld B$\rightarrow$4
st A=3, x=ld A$\rightarrow$3, y=ld B$\rightarrow$2, st B=4
st A=3, y=ld B$\rightarrow$2, st B=4, x=ld A$\rightarrow$3
st A=3, y=ld B$\rightarrow$2, x=ld A$\rightarrow$3, st B=4
st B=4, st A=3, x=ld A$\rightarrow$3, y=ld B$\rightarrow$4
st B=4, ...
...


and can thus result in four different combinations of values:



x == 1, y == 2
x == 1, y == 4
x == 3, y == 2
x == 3, y == 4
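
The count of 24 orderings and the four resulting value pairs can be checked by brute force. The following is only a minimal sketch of the abstract model, not code from any real system: it assumes that every permutation of the four accesses is a legal order at the memory system, and the tuple encoding and function names are hypothetical, chosen for illustration.

```python
from itertools import permutations

# Each access is (kind, variable, payload). For a store the payload is the
# value written; for a load it is the name of the destination variable.
ACCESSES = [
    ("st", "A", 3),    # CPU 1: A = 3
    ("st", "B", 4),    # CPU 1: B = 4
    ("ld", "A", "x"),  # CPU 2: x = A
    ("ld", "B", "y"),  # CPU 2: y = B
]


def run_order(order):
    """Apply the accesses in the given order and return the pair (x, y)."""
    memory = {"A": 1, "B": 2}  # initial values from the text
    loaded = {}
    for kind, var, payload in order:
        if kind == "st":
            memory[var] = payload
        else:
            loaded[payload] = memory[var]
    return loaded["x"], loaded["y"]


def enumerate_outcomes():
    """Return the number of orderings and the sorted distinct (x, y) pairs."""
    orders = list(permutations(ACCESSES))
    outcomes = sorted({run_order(order) for order in orders})
    return len(orders), outcomes


if __name__ == "__main__":
    count, outcomes = enumerate_outcomes()
    print(count, "orderings")
    for x, y in outcomes:
        print("x ==", x, "and y ==", y)
```

Because the load of A only races with the store to A, and the load of B only with the store to B, the two results vary independently, which is why all four pairs appear.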


Furthermore, the stores committed by a CPU to the memory system may not be perceived by the loads made by another CPU in the same order as the stores were committed.

As a further example, consider this sequence of events given the initial values {A = 1, B = 2, C = 3, P = &A, Q = &C}:



\begin{tabular}{l|l}
CPU 1 & CPU 2 \\
\hline
B = 4; & Q = P; \\
P = \&B; & D = *Q; \\
\end{tabular}


There is an obvious data dependency here, as the value loaded into D depends on the address retrieved from P by CPU 2. At the end of the sequence, any of the following results are possible:



(Q == &A) and (D == 1)
(Q == &B) and (D == 2)
(Q == &B) and (D == 4)


Note that CPU 2 will never try to load C into D, because the CPU will load P into Q before issuing the load of *Q.
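
The three possible results can likewise be reproduced with a small enumeration. This is a hedged sketch under stated assumptions: CPU 1's two stores may reach the memory system in either order, while CPU 2's load of P always precedes its dependent load of *Q. Pointers are modeled as variable names, and the operation labels and function names are hypothetical.

```python
from itertools import permutations

# Labels for the four accesses; "ld *Q" is the dependent load into D.
OPS = ["st B", "st P", "ld P", "ld *Q"]


def run_dependent(order):
    """Apply the accesses in the given order and return (Q, D)."""
    # Pointers are modeled as the name of the variable they point to.
    memory = {"A": 1, "B": 2, "C": 3, "P": "A", "Q": "C"}
    d = None
    for op in order:
        if op == "st B":
            memory["B"] = 4            # CPU 1: B = 4
        elif op == "st P":
            memory["P"] = "B"          # CPU 1: P = &B
        elif op == "ld P":
            memory["Q"] = memory["P"]  # CPU 2: Q = P
        else:
            d = memory[memory["Q"]]    # CPU 2: D = *Q
    return memory["Q"], d


def dependent_outcomes():
    """Return the sorted distinct (Q, D) pairs over all permitted orderings."""
    results = set()
    for order in permutations(OPS):
        # Keep only orderings that respect CPU 2's data dependency.
        if order.index("ld P") < order.index("ld *Q"):
            results.add(run_dependent(order))
    return sorted(results)


if __name__ == "__main__":
    for q, d in dependent_outcomes():
        print("Q == &" + q, "and D ==", d)
```

The pair (Q == \&B, D == 2) arises only when the store to P is perceived before the store to B, which illustrates the earlier point that another CPU may not perceive stores in the order in which they were committed.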

Paul E. McKenney 2011-12-16