F.9 Chapter [*]

Quick Quiz [*].1: 
How on earth could the assertion on line 21 of the code in Figure [*] on page [*] possibly fail?
 
Answer:
The key point is that the intuitive analysis missed is that there is nothing preventing the assignment to C from overtaking the assignment to A as both race to reach thread2(). This is explained in the remainder of this section.

Quick Quiz [*].2: 
Great... So how do I fix it?
 
Answer:
The easiest fix is to replace the barrier() on line 12 with an smp_mb().

Quick Quiz [*].3: 
What assumption is the code fragment in Figure [*] making that might not be valid on real hardware?
 
Answer:
The code assumes that as soon as a given CPU stops seeing its own value, it will immediately see the final agreed-upon value. On real hardware, some of the CPUs might well see several intermediate results before converging on the final value.

Quick Quiz [*].4: 
How could CPUs possibly have different views of the value of a single variable at the same time?
 
Answer:
Many CPUs have write buffers that record the values of recent writes, which are applied once the corresponding cache line makes its way to the CPU. Therefore, it is quite possible for each CPU to see a different value for a given variable at a single point in time -- and for main memory to hold yet another value. One of the reasons that memory barriers were invented was to allow software to deal gracefully with situations like this one.

Quick Quiz [*].5: 
Why do CPUs 2 and 3 come to agreement so quickly, when it takes so long for CPUs 1 and 4 to come to the party?
 
Answer:
CPUs 2 and 3 are a pair of hardware threads on the same core, sharing the same cache hierarchy, and therefore have very low communications latencies. This is a NUMA, or, more accurately, a NUCA effect.

This leads to the question of why CPUs 2 and 3 ever disagree at all. One possible reason is that they each might have a small amount of private cache in addition to a larger shared cache. Another possible reason is instruction reordering, given the short 10-nanosecond duration of the disagreement and the total lack of memory barriers in the code fragment.

Quick Quiz [*].6: 
But if the memory barriers do not unconditionally force ordering, how the heck can a device driver reliably execute sequences of loads and stores to MMIO registers?
 
Answer:
MMIO registers are special cases: because they appear in uncached regions of physical memory. Memory barriers do unconditionally force ordering of loads and stores to uncached memory. See Section @@@ for more information on memory barriers and MMIO regions.

Quick Quiz [*].7: 
How could the assertion b==2 on page [*] possibly fail?
 
Answer:
If the CPU is not required to see all of its loads and stores in order, then the b=1+a might well see an old version of the variable ``a''.

This is why it is so very important that each CPU or thread see all of its own loads and stores in program order.

Quick Quiz [*].8: 
How could the code on page [*] possibly leak memory?
 
Answer:
Only the first execution of the critical section should see p==NULL. However, if there is no global ordering of critical sections for mylock, then how can you say that a particular one was first? If several different executions of that critical section thought that they were first, they would all see p==NULL, and they would all allocate memory. All but one of those allocations would be leaked.

This is why it is so very important that all the critical sections for a given exclusive lock appear to execute in some well-defined order.

Quick Quiz [*].9: 
How could the code on page [*] possibly count backwards?
 
Answer:
Suppose that the counter started out with the value zero, and that three executions of the critical section had therefore brought its value to three. If the fourth execution of the critical section is not constrained to see the most recent store to this variable, it might well see the original value of zero, and therefore set the counter to one, which would be going backwards.

This is why it is so very important that loads from a given variable in a given critical section see the last store from the last prior critical section to store to that variable.

Quick Quiz [*].10: 
What effect does the following sequence have on the order of stores to variables ``a'' and ``b''?
    a = 1;
    b = 1;
    <write barrier>
 
Answer:
Absolutely none. This barrier would ensure that the assignments to ``a'' and ``b'' happened before any subsequent assignments, but it does nothing to enforce any order of assignments to ``a'' and ``b'' themselves.

Quick Quiz [*].11: 
What sequence of LOCK-UNLOCK operations would act as a full memory barrier?
 
Answer:
A series of two back-to-back LOCK-UNLOCK operations, or, somewhat less conventionally, an UNLOCK operations followed by a LOCK operation.

Quick Quiz [*].12: 
What (if any) CPUs have memory-barrier instructions from which these semi-permeable locking primitives might be constructed?
 
Answer:
Itanium is one example. The identification of any others is left as an exercise for the reader.

Quick Quiz [*].13: 
Given that operations grouped in curly braces are executed concurrently, which of the rows of Table [*] are legitimate reorderings of the assignments to variables ``A'' through ``F'' and the LOCK/UNLOCK operations? (The order in the code is A, B, LOCK, C, D, UNLOCK, E, F.) Why or why not?
 
Answer:

  1. Legitimate, executed in order.
  2. Legitimate, the lock acquisition was executed concurrently with the last assignment preceding the critical section.
  3. Illegitimate, the assignment to ``F'' must follow the LOCK operation.
  4. Illegitimate, the LOCK must complete before any operation in the critical section. However, the UNLOCK may legitimately be executed concurrently with subsequent operations.
  5. Legitimate, the assignment to ``A'' precedes the UNLOCK, as required, and all other operations are in order.
  6. Illegitimate, the assignment to ``C'' must follow the LOCK.
  7. Illegitimate, the assignment to ``D'' must precede the UNLOCK.
  8. Legitimate, all assignments are ordered with respect to the LOCK and UNLOCK operations.
  9. Illegitimate, the assignment to ``A'' must precede the UNLOCK.

Quick Quiz [*].14: 
What are the constraints for Table [*]?
 
Answer:
They are as follows:

  1. LOCK M must precede B, C, and D.
  2. UNLOCK M must follow A, B, and C.
  3. LOCK Q must precede F, G, and H.
  4. UNLOCK Q must follow E, F, and G.

Paul E. McKenney 2011-12-16