The compound implementation is somewhat more complex than the
hashed variant presented in
Section ,
but is still reasonably simple.
Of course, a more intelligent rebalancing scheme could be arbitrarily
complex, but the simple scheme shown here will has been shown to
perform well compared to software
alternatives [DCW+11]
and even compared to algorithms using hardware
assist [DLM+10].
Nevertheless, the best we can hope for from such a scheme
is 2x scalability, as at most two threads can be holding the
dequeue's locks concurrently.
The key point is that there can be significant overhead enqueuing to or dequeuing from a shared queue.