14.2.10.4 Data Dependency Barriers

The usage requirements of data dependency barriers are a little subtle, and it's not always obvious that they're needed. To illustrate, consider the following sequence of events, with initial values {A = 1, B = 2, C = 3, P = &A, Q = &C}:



CPU 1 CPU 2
B = 4;
<write barrier>
P = &B;
Q = P;
D = *Q;


There's a clear data dependency here, and it would seem intuitively obvious that by the end of the sequence, Q must be either &A or &B, and that:



(Q == &A) implies (D == 1)
(Q == &B) implies (D == 4)


Counter-intuitive though it might be, it is quite possible that CPU 2's perception of P might be updated before its perception of B, thus leading to the following situation:



(Q == &B) and (D == 2) ????


Whilst this may seem like a failure of coherency or causality maintenance, it isn't, and this behaviour can be observed on certain real CPUs (such as the DEC Alpha).

To deal with this, a data dependency barrier must be inserted between the address load and the data load (again with initial values of {A = 1, B = 2, C = 3, P = &A, Q = &C}):



CPU 1 CPU 2
B = 4;
<write barrier>
P = &B;
Q = P;
<data dependency barrier>
D = *Q;


This enforces the occurrence of one of the two implications, and prevents the third possibility from arising.

Note that this extremely counterintuitive situation arises most easily on machines with split caches, so that, for example, one cache bank processes even-numbered cache lines and the other bank processes odd-numbered cache lines. The pointer P might be stored in an odd-numbered cache line, and the variable B might be stored in an even-numbered cache line. Then, if the even-numbered bank of the reading CPU's cache is extremely busy while the odd-numbered bank is idle, one can see the new value of the pointer P (which is &B), but the old value of the variable B (which is 1).

Another example of where data dependency barriers might by required is where a number is read from memory and then used to calculate the index for an array access with initial values {M[0] = 1, M[1] = 2, M[3] = 3, P = 0, Q = 3}:



CPU 1 CPU 2
M[1] = 4;
<write barrier>
P = 1;
Q = P;
<data dependency barrier>
D = M[Q];


The data dependency barrier is very important to the Linux kernel's RCU system, for example, see rcu_dereference() in include/linux/rcupdate.h. This permits the current target of an RCU'd pointer to be replaced with a new modified target, without the replacement target appearing to be incompletely initialised.

See also the subsection on @@@"Cache Coherency" for a more thorough example.

Paul E. McKenney 2011-12-16