Figure
shows both the per-thread and global variables used by this
implementation.
The per-thread counter and countermax variables are the
corresponding thread's local counter and the upper bound on that
counter, respectively.
The globalcountmax variable on line 3 contains the upper
bound for the aggregate counter, and the globalcount variable
on line 4 is the global counter.
The sum of globalcount and each thread's counter gives
the aggregate value of the overall counter.
The globalreserve variable on line 5 is the sum of all of the
per-thread countermax variables.
The relationship among these variables is shown by
Figure
:
Each element of the counterp[] array references the corresponding thread's counter variable, and, finally, the gblcnt_mutex spinlock guards all of the global variables, in other words, no thread is permitted to access or modify any of the global variables unless it has acquired gblcnt_mutex.
Figure
shows the add_count(), sub_count(), and read_count()
functions (count_lim.c).
Lines 1-18 show add_count(), which adds the specified value delta to the counter. Line 3 checks to see if there is room for delta on this thread's counter, and, if so, line 4 adds it and line 6 returns success. This is the add_counter() fastpath, and it does no atomic operations, references only per-thread variables, and should not incur any cache misses.
Quick Quiz 6.25:
What is with the strange form of the condition on line 3 of
Figure ?
Why not the following more intuitive form of the fastpath?
3 if (counter + delta <= countermax){ 4 counter += delta; 5 return 1; 6 } |
If the test on line 3 fails, we must access global variables, and thus
must acquire gblcnt_mutex on line 7, which we release on line 11
in the failure case or on line 16 in the success case.
Line 8 invokes globalize_count(), shown in
Figure ,
which clears the thread-local variables, adjusting the global variables
as needed, thus simplifying global processing.
(But don't take my word for it, try coding it yourself!)
Lines 9 and 10 check to see if addition of delta can be accommodated,
with the meaning of the expression preceding the less-than sign shown in
Figure
as the difference in height of the two red bars.
If the addition of delta cannot be accommodated, then
line 11 (as noted earlier) releases gblcnt_mutex and line 12
returns indicating failure.
Otherwise, line 14 subtracts delta from globalcount,
line 15 invokes balance_count() (shown in
Figure )
in order to update both the global and the per-thread variables
(hopefully setting this thread's countermax to re-enable the
fastpath),
if appropriate, to re-enable fastpath processing, line 16 release
gblcnt_mutex (again, as noted earlier), and, finally,
line 17 returns indicating success.
Quick Quiz 6.26:
Why do globalize_count() to zero the per-thread variables,
only to later call balance_count() to refill them in
Figure ?
Why not just leave the per-thread variables non-zero?
End Quick Quiz
Lines 20-36 show sub_count(), which subtracts the specified delta from the counter. Line 22 checks to see if the per-thread counter can accommodate this subtraction, and, if so, line 23 does the subtraction and line 24 returns success. These lines form sub_count()'s fastpath, and, as with add_count(), this fastpath executes no costly operations.
If the fastpath cannot accommodate subtraction of delta,
execution proceeds to the slowpath on lines 26-35.
Because the slowpath must access global state, line 26
acquires gblcnt_mutex, which is release either by line 29
(in case of failure) or by line 34 (in case of success).
Line 27 invokes globalize_count(), shown in
Figure ,
which again clears the thread-local variables, adjusting the global variables
as needed.
Line 28 checks to see if the counter can accommodate subtracting
delta, and, if not, line 29 releases gblcnt_mutex
(as noted earlier) and line 30 returns failure.
Quick Quiz 6.27:
Given that globalreserve counted against us in add_count(),
why doesn't it count for us in sub_count() in
Figure ?
End Quick Quiz
If, on the other hand, line 28 finds that the counter can
accommodate subtracting delta, then line 32 does the subtraction,
line 33 invokes balance_count() (shown in
Figure )
in order to update both global and per-thread variables
(hopefully re-enabling the fastpath),
line 34 releases gblcnt_mutex, and line 35 returns success.
Quick Quiz 6.28:
Why have both add_count() and sub_count() in
Figure ?
Why not simply pass a negative number to add_count()?
End Quick Quiz
Lines 38-50 show read_count(), which returns the aggregate value of the counter. It acquires gblcnt_mutex on line 43 and releases it on line 48, excluding global operations from add_count() and sub_count(), and, as we will see, also excluding thread creation and exit. Line 44 initializes local variable sum to the value of globalcount, and then the loop spanning lines 45-47 sums the per-thread counter variables. Line 49 then returns the sum.
Figure
shows a number of utility functions that support the add_count()
sub_count(), and read_count() primitives shown in
Figure
.
Lines 1-7 show globalize_count(), which zeros the current thread's
per-thread counters, adjusting the global variables appropriately.
It is important to note that this function does not change the aggregate
value of the counter, but instead changes how the counter's current value
is represented.
Line 3 adds the thread's counter variable to globalcount,
and line 4 zeroes counter.
Similarly, line 5 subtracts the per-thread countermax from
globalreserve, and line 6 zeroes countermax.
It is helpful to refer to
Figure
when reading both this function and balance_count(), which is next.
Lines 9-19 show balance_count(), which can is, roughly speaking the inverse of globalize_count(). This function sets the current thread's counter and countermax variables (with corresponding adjustments to globalcount and globalreserve) in an attempt to promote use of add_count()'s and sub_count()'s fastpaths. As with globalize_count(), balance_count() does not change the aggregate value of the counter. Lines 11-13 compute this thread's share of that portion of globalcountmax that is not already covered by either globalcount or globalreserve, and assign the computed quantity to this thread's countermax. Line 14 makes the corresponding adjustment to globalreserve. Line 15 sets this thread's counter to the middle of the range from zero to countermax. Line 16 checks to see whether globalcount can in fact accommodate this value of counter, and, if not, line 17 decreases counter accordingly. Finally, in either case, line 18 makes the corresponding adjustment to globalcount.
Lines 21-28 show count_register_thread(), which sets up state for newly created threads. This function simply installs a pointer to the newly created thread's counter variable into the corresponding entry of the counterp[] array under the protection of gblcnt_mutex.
Finally, lines 30-38 show count_unregister_thread(), which tears down state for a soon-to-be-exiting thread. Line 34 acquires gblcnt_mutex and line 37 releases it. Line 35 invokes globalize_count() to clear out this thread's counter state, and line 36 clears this thread's entry in the counterp[] array.
Paul E. McKenney 2011-12-16