6.4.1 Atomic Limit Counter Implementation

Unfortunately, when causing a given thread to give up its count, it is necessary to atomically manipulate both that thread's counter and countermax variables. The usual way to do this is to combine these two variables into a single variable, for example, given a 32-bit variable, using the high-order 16 bits to represent counter and the low-order 16 bits to represent countermax.

Figure: Atomic Limit Counter Variables and Access Functions
\begin{figure}{ \scriptsize
\begin{verbatim}1 atomic_t __thread counterandmax...
...< CM_BITS) \vert cm;
32 return ((int)cami);
33 }\end{verbatim}
}\end{figure}

The variables and access functions for a simple atomic limit counter are shown in Figure [*] (count_lim_atomic.c). The counter and countermax variables in earlier algorithms are combined into the single variable counterandmax shown on line 1, with counter in the upper half and countermax in the lower half. This variable is of type atomic_t, which has an underlying representation of int.

Lines 2-6 show the definitions for globalcountmax, globalcount, globalreserve, counterp, and gblcnt_mutex, all of which take on roles similar to their counterparts in Figure [*]. Line 7 defines CM_BITS, which gives the number of bits in each half of counterandmax, and line 8 defines MAX_COUNTERMAX, which gives the maximum value that may be held in either half of counterandmax.

Quick Quiz 6.29: In what way does line 7 of Figure [*] violate the C standard? End Quick Quiz

Lines 10-15 show the split_counterandmax_int() function, which, when given the underlying int from the atomic_t counterandmax variable. Line 13 isolates the most-significant half of this int, placing the result as specified by argument c, and line 14 isolates the least-significant half of this int, placing the result as specified by argument cm.

Lines 17-25 show the split_counterandmax() function, which picks up the underlying int from the specified variable on line 21, stores it as specified by the old argument on line 23, and then invokes split_counterandmax_int() to split it on line 24.

Quick Quiz 6.30: Given that there is only one counterandmax variable, why bother passing in a pointer to it on line 18 of Figure [*]? End Quick Quiz

Lines 27-33 show the merge_counterandmax() function, which can be thought of as the inverse of split_counterandmax(). Line 31 merges the counter and countermax values passed in c and cm, respectively, and returns the result.

Quick Quiz 6.31: Why does merge_counterandmax() in Figure [*] return an int rather than storing directly into an atomic_t? End Quick Quiz

Figure: Atomic Limit Counter Add and Subtract
\begin{figure}{ \scriptsize
\begin{verbatim}1 int add_count(unsigned long del...
...61 spin_unlock(&gblcnt_mutex);
62 return 1;
63 }\end{verbatim}
}\end{figure}

Figure [*] shows the add_count(), sub_count(), and read_count() functions.

Lines 1-32 show add_count(), whose fastpath spans lines 8-15, with the remainder of the function being the slowpath. Lines 8-14 of the fastpath form a compare-and-swap (CAS) loop, with the atomic_cmpxchg() primitives on lines 13-14 performing the actual CAS. Line 9 splits the current thread's counterandmax variable into its counter (in c) and countermax (in cm) components, while placing the underlying int into old. Line 10 checks whether the amount delta can be accommodated locally (taking care to avoid integer overflow), and if not, line 11 transfers to the slowpath. Otherwise, line 11 combines an updated counter value with the original countermax value into new. The atomic_cmpxchg() primitive on lines 13-14 then atomically compares this thread's counterandmax variable to old, updating its value to new if the comparison succeeds. If the comparison succeeds, line 15 returns success, otherwise, execution continues in the loop at line 9.

Quick Quiz 6.32: Yecch! Why the ugly goto on line 11 of Figure [*]? Haven't you heard of the break statement??? End Quick Quiz

Quick Quiz 6.33: Why would the atomic_cmpxchg() primitive at lines 13-14 of Figure [*] ever fail? After all, we picked up its old value on line 9 and have not changed it! End Quick Quiz

Lines 16-32 of Figure [*] show add_count()'s slowpath, which is protected by gblcnt_mutex, which is acquired on line 17 and released on lines 24 and 30. Line 18 invokes globalize_count(), which moves this thread's state to the global counters. Lines 19-20 check whether the delta value can be accommodated by the current global state, and, if not, line 21 invokes flush_local_count() to flush all threads' local state to the global counters, and then lines 22-23 recheck whether delta can be accommodated. If, after all that, the addition of delta still cannot be accommodated, then line 24 releases gblcnt_mutex (as noted earlier), and then line 25 returns failure.

Otherwise, line 28 adds delta to the global counter, line 29 spreads counts to the local state if appropriate, line 30 releases gblcnt_mutex (again, as noted earlier), and finally, line 31 returns success.

Lines 34-63 of Figure [*] show sub_count(), which is structured similarly to add_count(), having a fastpath on lines 41-48 and a slowpath on lines 49-62. A line-by-line analysis of this function is left as an exercise to the reader.

Figure: Atomic Limit Counter Read
\begin{figure}{ \scriptsize
\begin{verbatim}1 unsigned long read_count(void)
...
... spin_unlock(&gblcnt_mutex);
17 return sum;
18 }\end{verbatim}
}\end{figure}

Figure [*] shows read_count(). Line 9 acquires gblcnt_mutex and line 16 releases it. Line 10 initializes local variable sum to the value of globalcount, and the loop spanning lines 11-15 adds the per-thread counters to this sum, isolating each per-thread counter using split_counterandmax on line 13. Finally, line 17 returns the sum.

Figure: Atomic Limit Counter Utility Functions
\begin{figure}{ \scriptsize
\begin{verbatim}1 static void globalize_count(voi...
...idx] = NULL;
72 spin_unlock(&gblcnt_mutex);
73 }\end{verbatim}
}\end{figure}

Figure [*] shows the utility functions globalize_count(), flush_local_count(), balance_count(), count_register_thread(), and count_unregister_thread(). The code for globalize_count() is shown on lines 1-12, and it is similar to that of previous algorithms, with the addition of line 7, which is now required to split out counter and countermax from counterandmax.

The code for flush_local_count(), which moves all threads' local counter state to the global counter, is shown on lines 14-32. Line 22 checks to see if the value of globalreserve permits any per-thread counts, and, if not, line 23 returns. Otherwise, line 24 initializes local variable zero to a combined zeroed counter and countermax. The loop spanning lines 25-31 sequences through each thread. Line 26 checks to see if the current thread has counter state, and, if so, lines 27-30 move that state to the global counters. Line 27 atomically fetches the current thread's state while replacing it with zero. Line 28 splits this state into its counter (in local variable c) and countermax (in local variable cm) components. Line 29 adds this thread's counter to globalcount, while line 30 subtracts this thread's countermax from globalreserve.

Quick Quiz 6.34: What stops a thread from simply refilling its counterandmax variable immediately after flush_local_count() on line 14 of Figure [*] empties it? End Quick Quiz

Quick Quiz 6.35: What prevents concurrent execution of the fastpath of either atomic_add() or atomic_sub() from interfering with the counterandmax variable while flush_local_count() is accessing it on line 27 of Figure [*] empties it? End Quick Quiz

Lines 34-54 show the code for balance_count(), which refills the calling thread's local counterandmax variable. This function is quite similar to that of the preceding algorithms, with changes required to handle the merged counterandmax variable. Detailed analysis of the code is left as an exercise for the reader, as it is with the count_register_thread() function starting on line 56 and the count_unregister_thread() function starting on line 65.

Quick Quiz 6.36: Given that the atomic_set() primitive does a simple store to the specified atomic_t, how can line 53 of balance_count() in Figure [*] work correctly in face of concurrent flush_local_count() updates to this variable? End Quick Quiz

Paul E. McKenney 2011-12-16