In some cases where increments and decrements are common, but checks
for zero are rare, it makes sense to maintain per-CPU or per-task
counters, as was discussed in Chapter .
See Appendix
for an example of this technique applied to RCU.
See the paper on sleepable read-copy update
(SRCU) for an example of this technique applied to
RCU [McK06].
This approach eliminates the need for atomic instructions or memory
barriers on the increment and decrement primitives, but still requires
that code-motion compiler optimizations be disabled.
In addition, the primitives such as synchronize_srcu()
that check for the aggregate reference
count reaching zero can be quite slow.
This underscores the fact that these techniques are designed
for situations where the references are frequently acquired and
released, but where it is rarely necessary to check for a zero
reference count.