Memory ordering and memory barriers can be extremely counter-intuitive.
For example, consider the functions shown in
Figure
executing in parallel
where variables A, B, and C are initially zero:
Intuitively, thread0() assigns to B after it assigns to A, thread1() waits until thread0() has assigned to B before assigning to C, and thread2() waits until thread1() has assigned to C before referencing A. Therefore, again intuitively, the assertion on line 21 cannot possibly fire.
This line of reasoning, intuitively obvious though it may be, is completely and utterly incorrect. Please note that this is not a theoretical assertion: actually running this code on real-world weakly-ordered hardware (a 1.5GHz 16-CPU POWER 5 system) resulted in the assertion firing 16 times out of 10 million runs. Clearly, anyone who produces code with explicit memory barriers should do some extreme testing - although a proof of correctness might be helpful, the strongly counter-intuitive nature of the behavior of memory barriers should in turn strongly limit one's trust in such proofs. The requirement for extreme testing should not be taken lightly, given that a number of dirty hardware-dependent tricks were used to greatly increase the probability of failure in this run.
Quick Quiz 14.1:
How on earth could the assertion on line 21 of the code in
Figure on
page
possibly fail?
End Quick Quiz
Quick Quiz 14.2: Great... So how do I fix it? End Quick Quiz
So what should you do? Your best strategy, if possible, is to use existing primitives that incorporate any needed memory barriers, so that you can simply ignore the rest of this chapter.
Of course, if you are implementing synchronization primitives, you don't have this luxury. The following discussion of memory ordering and memory barriers is for you.
Paul E. McKenney 2011-12-16