It is natural to think of a variable as taking on a well-defined sequence of values in a well-defined, global order. Unfortunately, it is time to say ``goodbye'' to this sort of comforting fiction.
To see this, consider the program fragment shown in
Figure .
This code fragment is executed in parallel by several CPUs.
Line 1 sets a shared variable to the current CPU's ID, line 2
initializes several variables from a gettb() function that
delivers the value of a fine-grained hardware ``timebase'' counter that is
synchronized among all CPUs (unfortunately, not available on all CPU
architectures!), and the loop on lines 3-8 records the length of
time that the variable retains the value that this CPU assigned to it.
Of course, one of the CPUs will ``win'', and would thus never exit
the loop if not for the check on lines 7-8.
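For readers without the figure at hand, the logic just described can be sketched roughly as follows. This is an illustrative reconstruction rather than the figure's listing, so the line numbers cited in the text do not apply to it. The names mycpu, oldtb, TIMEOUT_TICKS, struct sample, and measure_hold_time() are hypothetical, and gettb() is emulated here with the POSIX monotonic clock, which is far coarser than a hardware timebase counter and is used only so that the sketch compiles on ordinary systems.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical bound (in gettb() units) after which the "winner" gives up. */
#define TIMEOUT_TICKS 1000

/* Stand-in for the shared state.variable described in the text. */
static struct { _Atomic int variable; } state;

/* Per-CPU result: how long this CPU saw its own value. */
struct sample {
	uint64_t firsttb;	/* timestamp taken shortly after the assignment */
	uint64_t lasttb;	/* timestamp before the last successful sampling */
};

/* Assumption: emulate the timebase counter with a nanosecond clock. */
static uint64_t gettb(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Intended to be run concurrently, once per CPU, each with a distinct mycpu. */
static void measure_hold_time(int mycpu, struct sample *out)
{
	uint64_t firsttb, oldtb, lasttb;

	assert(out != NULL);

	/* Assign this CPU's ID to the shared variable. */
	atomic_store_explicit(&state.variable, mycpu, memory_order_relaxed);
	lasttb = oldtb = firsttb = gettb();

	/* Spin for as long as this CPU still sees its own value. */
	while (atomic_load_explicit(&state.variable,
				    memory_order_relaxed) == mycpu) {
		lasttb = oldtb;
		oldtb = gettb();
		/* The CPU whose store lands last would otherwise spin forever. */
		if (lasttb - firsttb > TIMEOUT_TICKS)
			break;
	}

	out->firsttb = firsttb;
	out->lasttb = lasttb;
}
```

Relaxed atomic accesses are used so that the sketch is well-defined C11 without adding any ordering of its own. Reproducing the experiment would additionally require pinning one thread to each CPU and a counter that really is synchronized across CPUs, neither of which this sketch attempts.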
Quick Quiz 14.3:
What assumption is the code fragment
in Figure
making that might not be valid on real hardware?
End Quick Quiz
Upon exit from the loop, firsttb will hold a timestamp
taken shortly after the assignment and lasttb will hold
a timestamp taken before the last sampling of the shared variable
that still retained the assigned value, or a value equal to firsttb
if the shared variable had changed before entry into the loop.
This allows us to plot each CPU's view of the value of state.variable
over a 532-nanosecond time period, as shown in
Figure .
This data was collected on a 1.5GHz POWER5 system with 8 cores, each containing
a pair of hardware threads.
CPUs 1, 2, 3, and 4 recorded the values, while CPU 0 controlled the test.
The timebase counter period was about 5.32ns, sufficiently fine-grained
to allow observations of intermediate cache states.
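As a rough check on these figures, taking the stated 532ns window, 5.32ns timebase period, and 1.5GHz clock rate at face value:

```latex
\[
\frac{532\,\mathrm{ns}}{5.32\,\mathrm{ns/tick}} = 100\ \mathrm{ticks},
\qquad
5.32\,\mathrm{ns} \times 1.5\,\mathrm{GHz} \approx 8\ \mathrm{clock\ cycles\ per\ tick}.
\]
```

In other words, the plotted interval spans about 100 timebase ticks, each roughly eight CPU clock cycles long.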
Each horizontal bar represents the observations of a given CPU over time, with the black regions to the left indicating the time before the corresponding CPU's first measurement. During the first 5ns, only CPU 3 has an opinion about the value of the variable. During the next 10ns, CPUs 2 and 3 disagree on the value of the variable, but thereafter agree that the value is ``2'', which is in fact the final agreed-upon value. However, CPU 1 believes that the value is ``1'' for almost 300ns, and CPU 4 believes that the value is ``4'' for almost 500ns.
Quick Quiz 14.4: How could CPUs possibly have different views of the value of a single variable at the same time? End Quick Quiz
Quick Quiz 14.5: Why do CPUs 2 and 3 come to agreement so quickly, when it takes so long for CPUs 1 and 4 to come to the party? End Quick Quiz
We have entered a regime where we must bid a fond farewell to comfortable intuitions about values of variables and the passage of time. This is the regime where memory barriers are needed.
Paul E. McKenney 2011-12-16