To see the first complication, a violation of self-consistency, consider the following code with variables ``a'' and ``b'' both initially zero, and with the cache line containing variable ``a'' initially owned by CPU 1 and that containing ``b'' initially owned by CPU 0:
1 a = 1; 2 b = a + 1; 3 assert(b == 2); |
One would not expect the assertion to fail.
However, if one were foolish enough to use the very simple architecture
shown in
Figure ,
one would be surprised.
Such a system could potentially see the following sequence of events:
The problem is that we have two copies of ``a'', one in the cache and the other in the store buffer.
This example breaks a very important guarantee, namely that each CPU
will always see its own operations as if they happened in program order.
Breaking this guarantee is violently counter-intuitive to software types,
so much so
that the hardware guys took pity and implemented ``store forwarding'',
where each CPU refers to (or ``snoops'') its store buffer as well
as its cache when performing loads, as shown in
Figure .
In other words, a given CPU's stores are directly forwarded to its
subsequent loads, without having to pass through the cache.
With store forwarding in place, item
in the above sequence would have found the correct value of 1 for ``a'' in
the store buffer, so that the final value of ``b'' would have been 2,
as one would hope.
Paul E. McKenney 2011-12-16