Figure
(count_lim_sig.c)
shows the data structures used by the signal-theft based counter
implementation.
Lines 1-7 define the states and values for the per-thread theft state machine
described in the preceding section.
Lines 8-17 are similar to earlier implementations, with the addition of
lines 14 and 15 to allow remote access to a thread's countermax
and theft variables, respectively.
Figure
shows the functions responsible for migrating counts between per-thread
variables and the global variables.
Lines 1-7 shows global_count(), which is identical to earlier
implementations.
Lines 9-19 shows flush_local_count_sig(), which is the signal
handler used in the theft process.
Lines 11 and 12 check to see if the theft state is REQ, and, if not
returns without change.
Line 13 executes a memory barrier to ensure that the sampling of the
theft variable happens before any change to that variable.
Line 14 sets the theft state to ACK, and, if line 15 sees that
this thread's fastpaths are not running, line 16 sets the theft
state to READY.
Quick Quiz 6.40:
In Figure
function flush_local_count_sig(), why are there
ACCESS_ONCE() wrappers around the uses of the
theft per-thread variable?
End Quick Quiz
Lines 21-49 shows flush_local_count(), which is called from the slowpath to flush all threads' local counts. The loop spanning lines 26-34 advances the theft state for each thread that has local count, and also sends that thread a signal. Line 27 skips any non-existent threads. Otherwise, line 28 checks to see if the current thread holds any local count, and, if not, line 29 sets the thread's theft state to READY and line 28 skips to the next thread. Otherwise, line 32 sets the thread's theft state to REQ and line 29 sends the thread a signal.
Quick Quiz 6.41:
In Figure ,
why is it safe for line 28 to directly access the other thread's
countermax variable?
End Quick Quiz
Quick Quiz 6.42:
In Figure ,
why doesn't line 33 check for the current thread sending itself
a signal?
End Quick Quiz
Quick Quiz 6.43:
The code in
Figure ,
works with gcc and POSIX.
What would be required to make it also conform to the ISO C standard?
End Quick Quiz
The loop spanning lines 35-48 waits until each thread reaches READY state, then steals that thread's count. Lines 36-37 skip any non-existent threads, and the loop spanning lines 38-42 wait until the current thread's theft state becomes READY. Line 39 blocks for a millisecond to avoid priority-inversion problems, and if line 40 determines that the thread's signal has not yet arrived, line 41 resends the signal. Execution reaches line 43 when the thread's theft state becomes READY, so lines 43-46 do the thieving. Line 47 then sets the thread's theft state back to IDLE.
Quick Quiz 6.44:
In Figure , why does line 41 resend the signal?
End Quick Quiz
Lines 51-63 show balance_count(), which is similar to that of earlier examples.
Lines 1-36 of
Figure
shows the add_count() function.
The fastpath spans lines 5-20, and the slowpath lines 21-35.
Line 5 sets the per-thread counting variable to 1 so that
any subsequent signal handlers interrupting this thread will
set the theft state to ACK rather than READY, allowing this
fastpath to complete properly.
Line 6 prevents the compiler from reordering any of the fastpath body
to precede the setting of counting.
Lines 7 and 8 check to see if the per-thread data can accommodate
the add_count() and if there is no ongoing theft in progress,
and if so line 9 does the fastpath addition and line 10 notes that
the fastpath was taken.
In either case, line 12 prevents the compiler from reordering the fastpath body to follow line 13, which permits any subsequent signal handlers to undertake theft. Line 14 again disables compiler reordering, and then line 15 checks to see if the signal handler deferred the theft state-change to READY, and, if so, line 16 executes a memory barrier to ensure that any CPU that sees line 17 setting state to READY also sees the effects of line 9. If the fastpath addition at line 9 was executed, then line 20 returns success.
Otherwise, we fall through to the slowpath starting at line 21.
The structure of the slowpath is similar to those of earlier examples,
so its analysis is left as an exercise to the reader.
Similarly, the structure of sub_count() on lines 38-71 is the same
as that of add_count(), so the analysis of sub_count() is also
left as an exercise for the reader, as is the analysis of
read_count() in
Figure .
Lines 1-12 of
Figure
show count_init(), which set up flush_local_count_sig()
as the signal handler for SIGUSR1,
enabling the pthread_kill() calls in flush_local_count()
to invoke flush_local_count_sig().
The code for thread registry and unregistry is similar to that of
earlier examples, so its analysis is left as an exercise for the
reader.
Paul E. McKenney 2011-12-16