A.1 What Does ``After'' Mean?

``After'' is an intuitive, but surprisingly difficult concept. An important non-intuitive issue is that code can be delayed at any point for any amount of time. Consider a producing and a consuming thread that communicate using a global struct with a timestamp ``t'' and integer fields ``a'', ``b'', and ``c''. The producer loops recording the current time (in seconds since 1970 in decimal), then updating the values of ``a'', ``b'', and ``c'', as shown in Figure [*]. The consumer code loops, also recording the current time, but also copying the producer's timestamp along with the fields ``a'', ``b'', and ``c'', as shown in Figure [*]. At the end of the run, the consumer outputs a list of anomalous recordings, e.g., where time has appeared to go backwards.

Figure: ``After'' Producer Function
\begin{figure}{ \scriptsize
\begin{verbatim}1 /* WARNING: BUGGY CODE. */
2 v...
...);
17 producer_done = 1;
18 return (NULL);
19 }\end{verbatim}
}\end{figure}

Figure: ``After'' Consumer Function
\begin{figure}{ \scriptsize
\begin{verbatim}1 /* WARNING: BUGGY CODE. */
2 v...
...j].c - ssc[j - 1].c);
51 consumer_done = 1;
52 }\end{verbatim}
}\end{figure}

Quick Quiz A.1: What SMP coding errors can you see in these examples? See time.c for full code. End Quick Quiz

One might intuitively expect that the difference between the producer and consumer timestamps would be quite small, as it should not take much time for the producer to record the timestamps or the values. An excerpt of some sample output on a dual-core 1GHz x86 is shown in Table [*]. Here, the ``seq'' column is the number of times through the loop, the ``time'' column is the time of the anomaly in seconds, the ``delta'' column is the number of seconds the consumer's timestamp follows that of the producer (where a negative value indicates that the consumer has collected its timestamp before the producer did), and the columns labelled ``a'', ``b'', and ``c'' show the amount that these variables increased since the prior snapshot collected by the consumer.


Table: ``After'' Program Sample Output
seq time (seconds) delta  a b c
17563: 1152396.251585 (-16.928) 27 27 27
18004: 1152396.252581 (-12.875) 24 24 24
18163: 1152396.252955 (-19.073) 18 18 18
18765: 1152396.254449 (-148.773) 216 216 216
19863: 1152396.256960 (-6.914) 18 18 18
21644: 1152396.260959 (-5.960) 18 18 18
23408: 1152396.264957 (-20.027) 15 15 15


Why is time going backwards? The number in parentheses is the difference in microseconds, with a large number exceeding 10 microseconds, and one exceeding even 100 microseconds! Please note that this CPU can potentially execute about more than 100,000 instructions in that time.

One possible reason is given by the following sequence of events:

  1. Consumer obtains timestamp (Figure [*], line 13).
  2. Consumer is preempted.
  3. An arbitrary amount of time passes.
  4. Producer obtains timestamp (Figure [*], line 10).
  5. Consumer starts running again, and picks up the producer's timestamp (Figure [*], line 14).

In this scenario, the producer's timestamp might be an arbitrary amount of time after the consumer's timestamp.

How do you avoid agonizing over the meaning of ``after'' in your SMP code?

Simply use SMP primitives as designed.

In this example, the easiest fix is to use locking, for example, acquire a lock in the producer before line 10 in Figure [*] and in the consumer before line 13 in Figure [*]. This lock must also be released after line 13 in Figure [*] and after line 17 in Figure [*]. These locks cause the code segments in line 10-13 of Figure [*] and in line 13-17 of Figure [*] to exclude each other, in other words, to run atomically with respect to each other. This is represented in Figure [*]: the locking prevents any of the boxes of code from overlapping in time, so that the consumer's timestamp must be collected after the prior producer's timestamp. The segments of code in each box in this figure are termed ``critical sections''; only one such critical section may be executing at a given time.

Figure: Effect of Locking on Snapshot Collection
\includegraphics{appendix/questions/after}

This addition of locking results in output as shown in Figure [*]. Here there are no instances of time going backwards, instead, there are only cases with more than 1,000 counts different between consecutive reads by the consumer.


Table: Locked ``After'' Program Sample Output
seq time (seconds) delta  a b c
58597: 1156521.556296 (3.815) 1485 1485 1485
403927: 1156523.446636 (2.146) 2583 2583 2583


Quick Quiz A.2: How could there be such a large gap between successive consumer reads? See timelocked.c for full code. End Quick Quiz

In summary, if you acquire an exclusive lock, you know that anything you do while holding that lock will appear to happen after anything done by any prior holder of that lock. No need to worry about which CPU did or did not execute a memory barrier, no need to worry about the CPU or compiler reordering operations - life is simple. Of course, the fact that this locking prevents these two pieces of code from running concurrently might limit the program's ability to gain increased performance on multiprocessors, possibly resulting in a ``safe but slow'' situation. Chapter [*] describes ways of gaining performance and scalability in many situations.

However, in most cases, if you find yourself worrying about what happens before or after a given piece of code, you should take this as a hint to make better use of the standard primitives. Let these primitives do the worrying for you.

Paul E. McKenney 2011-12-16