Many CPUs speculate with loads: that is, they see that they will need to load an item from memory, find a time when the bus is not being used for any other loads, and do the load in advance -- even though they haven't actually reached that point in the instruction execution flow yet. Later on, this potentially permits the actual load instruction to complete immediately because the CPU already has the value on hand.
It may turn out that the CPU didn't actually need the value (perhaps because a branch circumvented the load), in which case it can discard the value or just cache it for later use. For example, consider the following:
	CPU 1			CPU 2
	=======================	=======================
				LOAD B
				DIVIDE		} Divide instructions generally
				DIVIDE		} take a long time to perform
				LOAD A
On some CPUs, divide instructions can take a long time to complete,
which means that CPU 2's bus might go idle during that time.
CPU 2 might therefore speculatively load A before the divides
complete.
In the (hopefully) unlikely event of an exception from one of the divides,
this speculative load will have been wasted, but in the (again, hopefully)
common case, overlapping the load with the divides will permit the load
to complete more quickly, as illustrated by
Figure .
Placing a read barrier or a data dependency barrier just before the second load:
	CPU 1			CPU 2
	=======================	=======================
				LOAD B
				DIVIDE
				DIVIDE
				<read barrier>
				LOAD A
will force any value speculatively obtained to be reconsidered to an extent
dependent on the type of barrier used. If there was no change made to the
speculated memory location, then the speculated value will just be used,
as shown in
Figure .
On the other hand, if there was an update or invalidation to A
from some other CPU, then the speculation will be cancelled and the
value of A will be reloaded,
as shown in Figure .
Paul E. McKenney 2011-12-16