Rough performance results are shown in
Figure 7.9,
running on a dual-core Intel x86 running at 1GHz (4300 bogomips per CPU)
with at most six blocks allowed in each CPU's cache.
In this micro-benchmark,
each thread repeatedly allocates a group of blocks and then frees it, with
the size of the group being the ``allocation run length'' displayed on
the x-axis.
The y-axis shows the number of successful allocation/free pairs per
microsecond -- failed allocations are not counted.
The ``X''s are from a two-thread run, while the ``+''s are from a
single-threaded run.
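In outline, each thread of this micro-benchmark might run a loop like the following sketch. The memblock_alloc()/memblock_free() names are assumed from the toy allocator discussed earlier; here malloc()/free() stand in for them so that the sketch is self-contained:

```c
#include <stdlib.h>

/* Stand-ins for the toy allocator's interface (names assumed);
 * malloc()/free() keep this sketch self-contained. */
static void *memblock_alloc(void)
{
	return malloc(64);
}

static void memblock_free(void *p)
{
	free(p);
}

#define MAX_RUN 25	/* Largest run length exercised (assumed). */

/* One pass of the micro-benchmark: allocate "runlength" blocks,
 * then free them all.  Returns the number of successful
 * allocation/free pairs; failed allocations are not counted. */
long bench_pass(int runlength)
{
	void *blocks[MAX_RUN];
	int i;
	int n = 0;

	for (i = 0; i < runlength; i++) {
		blocks[n] = memblock_alloc();
		if (blocks[n] != NULL)
			n++;	/* Count only successful allocations. */
	}
	for (i = 0; i < n; i++)
		memblock_free(blocks[i]);
	return n;
}
```

The benchmark proper repeats such passes for a fixed interval and divides the total count by the elapsed time to obtain the y-axis values.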
Note that run lengths up to six scale linearly and give excellent performance, while run lengths greater than six show poor performance and almost always also show negative scaling.

It is therefore quite important to size TARGET_POOL_SIZE sufficiently large, which fortunately is usually quite easy to do in practice [MSK01], especially given today's large memories. For example, in most systems it is quite reasonable to set TARGET_POOL_SIZE to 100, in which case allocations and frees are guaranteed to be confined to per-thread pools at least 99% of the time.
As can be seen from the figure, the situations where the common-case data-ownership applies (run lengths up to six) provide greatly improved performance compared to the cases where locks must be acquired. Avoiding locking in the common case will be a recurring theme through this book.
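The lock-avoiding structure underlying these results can be sketched in C. This is a simplified illustration rather than the allocator's actual listing: the memblock_alloc()/memblock_free() names and the exact refill/spill policy are assumptions. It does, however, show why run lengths up to 2 * TARGET_POOL_SIZE = 6 can stay entirely within thread-owned data, while longer runs repeatedly fall through to the lock-protected global pool:

```c
#include <pthread.h>
#include <stdlib.h>

#define TARGET_POOL_SIZE 3	/* Per-thread pool target, as in the text. */
#define GLOBAL_POOL_SIZE 40	/* Global pool size, as in Quick Quiz 7.16. */

struct memblock {
	struct memblock *next;
};

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static struct memblock *global_pool;		/* Lock-protected shared pool. */

static __thread struct memblock *my_pool;	/* Owned by this thread. */
static __thread int my_count;

/* Fill the global pool; call once at startup. */
void memblock_init(void)
{
	int i;

	for (i = 0; i < GLOBAL_POOL_SIZE; i++) {
		struct memblock *p = malloc(sizeof(*p));

		p->next = global_pool;
		global_pool = p;
	}
}

struct memblock *memblock_alloc(void)
{
	struct memblock *p;

	if (my_pool == NULL) {
		/* Slow path: refill the per-thread pool under the lock. */
		pthread_mutex_lock(&global_lock);
		while (my_count < TARGET_POOL_SIZE && global_pool != NULL) {
			p = global_pool;
			global_pool = p->next;
			p->next = my_pool;
			my_pool = p;
			my_count++;
		}
		pthread_mutex_unlock(&global_lock);
		if (my_pool == NULL)
			return NULL;	/* Allocation failure. */
	}
	/* Fast path: thread-owned data, no lock required. */
	p = my_pool;
	my_pool = p->next;
	my_count--;
	return p;
}

void memblock_free(struct memblock *p)
{
	/* Fast path: push onto the per-thread pool, no lock required. */
	p->next = my_pool;
	my_pool = p;
	if (++my_count > 2 * TARGET_POOL_SIZE) {
		/* Slow path: spill the excess back under the lock. */
		pthread_mutex_lock(&global_lock);
		while (my_count > TARGET_POOL_SIZE) {
			struct memblock *q = my_pool;

			my_pool = q->next;
			q->next = global_pool;
			global_pool = q;
			my_count--;
		}
		pthread_mutex_unlock(&global_lock);
	}
}
```

In steady state, a per-thread pool oscillating between zero and 2 * TARGET_POOL_SIZE blocks never touches global_lock, which is exactly the lock-free common case whose performance the figure demonstrates.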
Quick Quiz 7.15:
In Figure 7.9,
there is a pattern of performance rising with increasing run
length in groups of three samples, for example, for run lengths
10, 11, and 12.
Why?
End Quick Quiz
Quick Quiz 7.16: Allocation failures were observed in the two-thread tests at run lengths of 19 and greater. Given the global-pool size of 40 and the per-CPU target pool size of three, what is the smallest allocation run length at which failures can occur? End Quick Quiz
Paul E. McKenney 2011-12-16