Quick Quiz .2:
Just what is the accuracy of read_count(), anyway?
Answer:
Refer to
Figure on
Page
.
Clearly, if there are no concurrent invocations of inc_count(),
read_count() will return an exact result.
However, if there are concurrent invocations of
inc_count(), then the sum is in fact changing as
read_count() performs its summation.
That said, because thread creation and exit are excluded by
final_mutex, the pointers in counterp remain constant.
Let's imagine a mythical machine that is able to take an instantaneous snapshot of its memory. Suppose that this machine takes such a snapshot at the beginning of read_count()'s execution, and another snapshot at the end of read_count()'s execution. Then read_count() will access each thread's counter at some time between these two snapshots, and will therefore obtain a result that is bounded by those of the two snapshots, inclusive. The overall sum will therefore be bounded by the pair of sums that would have been obtained from each of the two snapshots (again, inclusive).
The expected error is therefore half of the difference between the pair of sums that would have been obtained from each of the two snapshots, that is to say, half of the execution time of read_count() multiplied by the number of expected calls to inc_count() per unit time.
Or, for those who prefer equations:
![]() |
(F.1) |
Quick Quiz .3:
Hey!!!
Line 45 of
Figure
modifies a value in a pre-existing countarray structure!
Didn't you say that this structure, once made available to
read_count(), remained constant???
Answer:
Indeed I did say that.
And it would be possible to make count_register_thread()
allocate a new structure, much as count_unregister_thread()
currently does.
But this is unnecessary. Recall the derivation of the error bounds of read_count() that was based on the snapshots of memory. Because new threads start with initial counter values of zero, the derivation holds even if we add a new thread partway through read_count()'s execution. So, interestingly enough, when adding a new thread, this implementation gets the effect of allocating a new structure, but without actually having to do the allocation.
Quick Quiz .4:
Wow!
Figure
contains 69 lines of code, compared to only 42 in
Figure
.
Is this extra complexity really worth it?
Answer:
This of course needs to be decided on a case-by-case basis.
If you need an implementation of read_count() that
scales linearly, then the lock-based implementation shown in
Figure
simply will not work for you.
On the other hand, if calls to count_read() are sufficiently
rare, then the lock-based version is simpler and might thus be
better, although much of the size difference is due
to the structure definition, memory allocation, and NULL
return checking.
Of course, a better question is "why doesn't the language
implement cross-thread access to __thread variables?"
After all, such an implementation would make both the locking
and the use of RCU unnecessary.
This would in turn enable an implementation that
was even simpler than the one shown in
Figure , but
with all the scalability and performance benefits of the
implementation shown in
Figure
!
Paul E. McKenney 2011-12-16