17.1.13 RCU
Because read-copy update (RCU) finds its main use in the Linux kernel,
one might be forgiven for assuming that there had been no academic work
on combining RCU and TM.
However, the TxLinux group from the University of Texas at Austin had
no choice [RHP+07].
The fact that they applied TM to the Linux 2.6 kernel, which uses RCU,
forced them to integrate TM and RCU, with TM taking the place of locking
for RCU updates.
Unfortunately, although the paper does state that the RCU implementation's
locks (e.g., rcu_ctrlblk.lock) were converted to transactions,
it is silent about what happened to locks used in RCU-based updates
(e.g., dcache_lock).
It is important to note that RCU permits readers and updaters to run
concurrently, further permitting RCU readers to access data that is in
the act of being updated.
Of course, this property of RCU, whatever its performance, scalability,
and real-time-response benefits might be, flies in the face of the
underlying atomicity properties of TM.
So how should TM-based updates interact with concurrent RCU readers?
Some possibilities are as follows:
- RCU readers abort concurrent conflicting TM updates.
This is in fact the approach taken by the TxLinux project.
This approach does preserve RCU semantics, and also preserves
RCU's read-side performance, scalability, and real-time-response
properties, but it does have the unfortunate side-effect of
unnecessarily aborting conflicting updates.
In the worst case, a long sequence of RCU readers could
potentially starve all updaters, which could in theory result
in system hangs.
In addition, not all TM implementations offer the strong atomicity
required to implement this approach.
- RCU readers that run concurrently with conflicting TM updates
get old (pre-transaction) values from any conflicting RCU loads.
This preserves RCU semantics and performance, and also prevents
RCU-update starvation.
However, not all TM implementations can provide timely access
to old values of variables that have been tentatively updated
by an in-flight transaction.
In particular, log-based TM implementations that maintain
old values in the log (thus making for excellent TM commit
performance) are not likely to be happy with this approach.
Perhaps the rcu_dereference() primitive can be leveraged
to permit RCU to access the old values within a greater range
of TM implementations, though performance might still be an issue.
- If an RCU reader executes an access that conflicts with an
in-flight transaction, then that RCU access is delayed until
the conflicting transaction either commits or aborts.
This approach preserves RCU semantics, but not RCU's performance
or real-time response, particularly in presence of long-running
transactions.
In addition, not all TM implementations are capable of delaying
conflicting accesses.
That said, this approach seems eminently reasonable for hardware
TM implementations that support only small transactions.
- RCU readers are converted to transactions.
This approach pretty much guarantees that RCU is compatible with
any TM implementation, but it also imposes TM's rollbacks on RCU
read-side critical sections, destroying RCU's real-time response
guarantees, and also degrading RCU's read-side performance.
Furthermore, this approach is infeasible in cases where any of
the RCU read-side critical sections contains operations that
the TM implementation in question is incapable of handling.
- Many update-side uses of RCU modify a single pointer to publish
a new data structure.
In some these cases, RCU can safely be permitted to see a
transactional pointer update that is subsequently rolled back,
as long as the transaction respects memory ordering and as long
as the roll-back process uses call_rcu() to free up the
corresponding structure.
Unfortunately, not all TM implementations respect memory barriers
within a transaction.
Apparently, the thought is that because transactions are supposed
to be atomic, the ordering of the accesses within the transaction
is not supposed to matter.
- Prohibit use of TM in RCU updates.
This is guaranteed to work, but seems a bit restrictive.
It seems likely that additional approaches will be uncovered, especially
given the advent of user-level RCU implementations.17.4
Paul E. McKenney
2011-12-16