Quick Quiz .2:
Great... So how do I fix it?
Answer:
The easiest fix is to replace the barrier() on
line 12 with an smp_mb().
Quick Quiz .3:
What assumption is the code fragment
in Figure
making that might not be valid on real hardware?
Answer:
The code assumes that as soon as a given CPU stops
seeing its own value, it will immediately see the
final agreed-upon value.
On real hardware, some of the CPUs might well see several
intermediate results before converging on the final value.
Quick Quiz .4:
How could CPUs possibly have different views of the
value of a single variable at the same time?
Answer:
Many CPUs have write buffers that record the values of
recent writes, which are applied once the corresponding
cache line makes its way to the CPU.
Therefore, it is quite possible for each CPU to see a
different value for a given variable at a single point
in time -- and for main memory to hold yet another value.
One of the reasons that memory barriers were invented was
to allow software to deal gracefully with situations like
this one.
Quick Quiz .5:
Why do CPUs 2 and 3 come to agreement so quickly, when it
takes so long for CPUs 1 and 4 to come to the party?
Answer:
CPUs 2 and 3 are a pair of hardware threads on the same
core, sharing the same cache hierarchy, and therefore have
very low communications latencies.
This is a NUMA, or, more accurately, a NUCA effect.
This leads to the question of why CPUs 2 and 3 ever disagree at all. One possible reason is that they each might have a small amount of private cache in addition to a larger shared cache. Another possible reason is instruction reordering, given the short 10-nanosecond duration of the disagreement and the total lack of memory barriers in the code fragment.
Quick Quiz .6:
But if the memory barriers do not unconditionally force
ordering, how the heck can a device driver reliably execute
sequences of loads and stores to MMIO registers?
Answer:
MMIO registers are special cases: because they appear
in uncached regions of physical memory.
Memory barriers do unconditionally force ordering
of loads and stores to uncached memory.
See Section @@@ for more information on memory barriers
and MMIO regions.
Quick Quiz .7:
How could the assertion b==2 on
page
possibly fail?
Answer:
If the CPU is not required to see all of its loads and
stores in order, then the b=1+a might well see an
old version of the variable ``a''.
This is why it is so very important that each CPU or thread see all of its own loads and stores in program order.
Quick Quiz .8:
How could the code on
page
possibly leak memory?
Answer:
Only the first execution of the critical section should
see p==NULL.
However, if there is no global ordering of critical sections for
mylock, then how can you say that a particular one was
first?
If several different executions of that critical section thought
that they were first, they would all see p==NULL, and
they would all allocate memory.
All but one of those allocations would be leaked.
This is why it is so very important that all the critical sections for a given exclusive lock appear to execute in some well-defined order.
Quick Quiz .9:
How could the code on
page
possibly count backwards?
Answer:
Suppose that the counter started out with the value zero,
and that three executions of the critical section had therefore
brought its value to three.
If the fourth execution of the critical section is not constrained
to see the most recent store to this variable, it might well see
the original value of zero, and therefore set the counter to
one, which would be going backwards.
This is why it is so very important that loads from a given variable in a given critical section see the last store from the last prior critical section to store to that variable.
Quick Quiz .10:
What effect does the following sequence have on the
order of stores to variables ``a'' and ``b''?
a = 1;
b = 1;
<write barrier>
Answer:
Absolutely none. This barrier would ensure that the
assignments to ``a'' and ``b'' happened before any subsequent
assignments, but it does nothing to enforce any order of
assignments to ``a'' and ``b'' themselves.
Quick Quiz .11:
What sequence of LOCK-UNLOCK operations would
act as a full memory barrier?
Answer:
A series of two back-to-back LOCK-UNLOCK operations, or, somewhat
less conventionally, an UNLOCK operations followed by a LOCK
operation.
Quick Quiz .12:
What (if any) CPUs have memory-barrier instructions
from which these semi-permeable locking primitives might
be constructed?
Answer:
Itanium is one example.
The identification of any others is left as an
exercise for the reader.
Quick Quiz .13:
Given that operations grouped in curly braces are executed
concurrently, which of the rows of
Table
are legitimate reorderings of the assignments to variables
``A'' through ``F'' and the LOCK/UNLOCK operations?
(The order in the code is A, B, LOCK, C, D, UNLOCK, E, F.)
Why or why not?
Answer:
Quick Quiz .14:
What are the constraints for
Table ?
Answer:
They are as follows:
Paul E. McKenney 2011-12-16