D.3.1.3 Per-CPU Data
The rcu_data structure contains RCU's per-CPU state.
It contains control variables governing grace periods and
quiescent states (completed, gpnum, passed_quiesc_completed,
passed_quiesc, qs_pending, beenonline, mynode,
and grpmask).
The rcu_data structure also contains control variables pertaining
to RCU callbacks
(nxtlist, nxttail, qlen, and blimit).
Kernels with dynticks enabled will have additional dynticks-related control
variables in the rcu_data structure
(dynticks, dynticks_snap, and dynticks_nmi_snap).
The rcu_data structure also contains event counters used by tracing
(offline_fqs and resched_ipi, plus dynticks_fqs in kernels with dynticks enabled).
Finally, a pair of fields counts calls to rcu_pending() in order
to determine when to force quiescent states (n_rcu_pending and
n_rcu_pending_force_qs), and a cpu field indicates the
CPU to which a given rcu_data structure corresponds.
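Gathering these fields into a single declaration may help keep track of them. The following C sketch is illustrative only: the field types, the grouping, the incomplete stand-in declarations of struct rcu_head, struct rcu_node, and struct rcu_dynticks, and the use of CONFIG_NO_HZ to guard the dynticks-related fields are assumptions, not a copy of the kernel's actual declaration.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins so that the sketch compiles on its own (contents omitted). */
struct rcu_head;
struct rcu_node;
struct rcu_dynticks;

/* Indexes into ->nxttail[], as listed in the text. */
enum { RCU_DONE_TAIL, RCU_WAIT_TAIL, RCU_NEXT_READY_TAIL, RCU_NEXT_TAIL,
       RCU_NEXT_SIZE };

struct rcu_data {
	/* Grace-period and quiescent-state control. */
	long completed;
	long gpnum;
	long passed_quiesc_completed;
	bool passed_quiesc;
	bool qs_pending;
	bool beenonline;
	struct rcu_node *mynode;	/* Leaf rcu_node handling this CPU. */
	unsigned long grpmask;		/* This CPU's bit in mynode->qsmask. */

	/* RCU callbacks. */
	struct rcu_head *nxtlist;
	struct rcu_head **nxttail[RCU_NEXT_SIZE];
	long qlen;
	long blimit;

#ifdef CONFIG_NO_HZ
	/* Dynticks interface (assumed guard: dynticks-enabled kernels only). */
	struct rcu_dynticks *dynticks;
	int dynticks_snap;
	int dynticks_nmi_snap;
	unsigned long dynticks_fqs;	/* Tracing counter. */
#endif
	/* Tracing counters. */
	unsigned long offline_fqs;
	unsigned long resched_ipi;

	/* rcu_pending() accounting used to decide when to force QSes. */
	long n_rcu_pending;
	long n_rcu_pending_force_qs;

	int cpu;	/* CPU to which this structure corresponds. */
};
```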
Each of these fields is described below.
- completed:
This field contains the number of the most recent grace period
that this CPU is aware of having completed.
- gpnum:
This field contains the number of the most recent grace period
that this CPU is aware of having started.
- passed_quiesc_completed:
This field contains the number of the grace period that had most
recently completed when this
CPU last passed through a quiescent state.
The "most recently completed" will be from the viewpoint of
the CPU passing through the quiescent state: if the CPU is
not yet aware that grace period (say) 42 has completed, it
will still record the old value of 41.
This is OK, because the only way that the grace period can
complete is if this CPU has already passed through a
quiescent state.
This field is initialized to a (possibly mythical) past
grace period number to avoid race conditions when booting
and when onlining a CPU.
- passed_quiesc:
This field indicates whether this CPU has passed
through a quiescent state since the completion of the grace period
whose number is stored in passed_quiesc_completed.
This field is cleared each time the corresponding CPU
becomes aware of the start of a new grace period.
- qs_pending:
This field indicates that this CPU is aware that the core
RCU mechanism is waiting for it to pass through a quiescent state.
This field is set to one when this CPU detects a new grace
period or when this CPU is coming online.
Quick Quiz D.21:
But why bother setting qs_pending to one when a CPU
is coming online, given that being offline is an extended
quiescent state that should cover any ongoing grace period?
End Quick Quiz
Quick Quiz D.22:
Why record the last completed grace period number in
passed_quiesc_completed?
Doesn't that cause this RCU implementation to be vulnerable
to quiescent states seen while no grace period was in progress
being incorrectly applied to the next grace period that starts?
End Quick Quiz
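The following single-CPU sketch shows one way that completed, gpnum, passed_quiesc_completed, passed_quiesc, and qs_pending might interact. It assumes a hypothetical struct gp_state holding the global grace-period numbers, omits locking, the rcu_node hierarchy, and the CPU-online path, and uses illustrative helper names (note_gp_changes(), note_qs(), and can_report_qs()) that are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical global view of grace periods (no locking in this sketch). */
struct gp_state {
	long gpnum;	/* Most recently started grace period. */
	long completed;	/* Most recently completed grace period. */
};

/* Only the rcu_data fields this sketch needs. */
struct rcu_data_qs {
	long completed;
	long gpnum;
	long passed_quiesc_completed;
	bool passed_quiesc;
	bool qs_pending;
};

/* This CPU compares its view against the global state. */
static void note_gp_changes(struct rcu_data_qs *rdp, const struct gp_state *gs)
{
	if (rdp->completed != gs->completed)
		rdp->completed = gs->completed;	/* A grace period ended. */
	if (rdp->gpnum != gs->gpnum) {
		rdp->gpnum = gs->gpnum;		/* A new grace period started. */
		rdp->passed_quiesc = false;	/* Forget any earlier QS. */
		rdp->qs_pending = true;		/* Core RCU now waits on us. */
	}
}

/* This CPU passes through a quiescent state (e.g., a context switch). */
static void note_qs(struct rcu_data_qs *rdp)
{
	rdp->passed_quiesc = true;
	rdp->passed_quiesc_completed = rdp->completed;	/* This CPU's view. */
}

/*
 * May the recorded QS be reported against the current grace period?
 * It must have been recorded after this CPU noticed the grace period
 * start (passed_quiesc is cleared then), and the grace-period number
 * recorded with it must still match the global completed number.
 */
static bool can_report_qs(const struct rcu_data_qs *rdp,
			  const struct gp_state *gs)
{
	return rdp->qs_pending && rdp->passed_quiesc &&
	       rdp->passed_quiesc_completed == gs->completed;
}
```

In this model, a quiescent state recorded while no grace period was in progress is discarded when the CPU notices the next grace period, and one recorded before a grace period completed elsewhere is rejected by the comparison against completed.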
- beenonline:
This field, initially zero, is set to one whenever the corresponding
CPU comes online.
This is used to avoid producing useless tracing output for CPUs
that never have been online, which is useful in kernels where
NR_CPUS greatly exceeds the actual number of CPUs.
Quick Quiz D.23:
What is the point of running a system with NR_CPUS
way bigger than the actual number of CPUs?
End Quick Quiz
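The tracing use of beenonline amounts to skipping never-online CPUs when walking all NR_CPUS slots. In the following sketch, the NR_CPUS value, struct rcu_data_trace, rcu_data_array, and print_rcu_data() are hypothetical, and only the beenonline and qlen fields are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 64	/* Hypothetical compile-time maximum number of CPUs. */

struct rcu_data_trace {
	bool beenonline;
	long qlen;
};

static struct rcu_data_trace rcu_data_array[NR_CPUS];

/* Print one line per CPU that has ever been online; return lines printed. */
static int print_rcu_data(FILE *out)
{
	int printed = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		const struct rcu_data_trace *rdp = &rcu_data_array[cpu];

		if (!rdp->beenonline)
			continue;	/* Never online: output would be useless. */
		fprintf(out, "%3d: qlen=%ld\n", cpu, rdp->qlen);
		printed++;
	}
	return printed;
}
```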
- mynode:
This field is a pointer to the leaf rcu_node structure that
handles the corresponding CPU.
- grpmask:
This field is a bitmask with a single bit set, namely the bit
in mynode->qsmask that represents the corresponding CPU.
- nxtlist:
This field is a pointer to the oldest RCU callback (rcu_head
structure) residing on this CPU, or NULL if this CPU currently
has no such callbacks.
Additional callbacks may be chained via their next pointers.
- nxttail:
This field is an array of double-indirect tail pointers
into the nxtlist callback list.
If nxtlist is empty, then all of the nxttail pointers
directly reference the nxtlist field.
Each element of the nxttail array has meaning as follows:
- RCU_DONE_TAIL=0:
This element references the ->next field of
the last callback that has passed through its grace
period and is ready to invoke, or references the nxtlist
field if there is no such callback.
- RCU_WAIT_TAIL=1:
This element references the ->next field of the
last callback that is waiting for the current grace
period to end, or is equal to the RCU_DONE_TAIL
element if there is no such callback.
- RCU_NEXT_READY_TAIL=2:
This element references the ->next field of the
last callback that is ready to wait for the next
grace period, or is equal to the RCU_WAIT_TAIL
element if there is no such callback.
- RCU_NEXT_TAIL=3:
This element references the ->next field of the
last callback in the list, or references the nxtlist
field if the list is empty.
Quick Quiz D.24:
Why not simply have multiple lists rather than this funny
multi-tailed list?
End Quick Quiz
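The following sketch shows how the four nxttail[] pointers partition the single nxtlist list. It uses a trimmed-down, hypothetical struct cb_list holding only nxtlist, nxttail, and qlen, plus hypothetical helpers cb_init(), cb_enqueue(), cb_gp_ended(), and cb_invoke_done(); the last of these takes the blimit limit described below as a parameter. cb_gp_ended() is a simplified model of grace-period-end processing, not the kernel's code, whose bookkeeping is more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified callback structure (assumed layout). */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* Segment indexes into ->nxttail[], as described in the text. */
enum { RCU_DONE_TAIL, RCU_WAIT_TAIL, RCU_NEXT_READY_TAIL, RCU_NEXT_TAIL,
       RCU_NEXT_SIZE };

/* Hypothetical subset of rcu_data: just the callback list. */
struct cb_list {
	struct rcu_head *nxtlist;
	struct rcu_head **nxttail[RCU_NEXT_SIZE];
	long qlen;
};

/* Empty list: every tail pointer references ->nxtlist itself. */
static void cb_init(struct cb_list *l)
{
	l->nxtlist = NULL;
	for (int i = 0; i < RCU_NEXT_SIZE; i++)
		l->nxttail[i] = &l->nxtlist;
	l->qlen = 0;
}

/* New callbacks always go at the very end, in the RCU_NEXT_TAIL segment. */
static void cb_enqueue(struct cb_list *l, struct rcu_head *head)
{
	assert(head != NULL);
	head->next = NULL;
	*l->nxttail[RCU_NEXT_TAIL] = head;
	l->nxttail[RCU_NEXT_TAIL] = &head->next;
	l->qlen++;
}

/*
 * A grace period ended: shift each segment one step toward
 * RCU_DONE_TAIL with three pointer assignments, touching no callback.
 */
static void cb_gp_ended(struct cb_list *l)
{
	l->nxttail[RCU_DONE_TAIL] = l->nxttail[RCU_WAIT_TAIL];
	l->nxttail[RCU_WAIT_TAIL] = l->nxttail[RCU_NEXT_READY_TAIL];
	l->nxttail[RCU_NEXT_READY_TAIL] = l->nxttail[RCU_NEXT_TAIL];
}

/* Invoke at most blimit ready callbacks; return the number invoked. */
static long cb_invoke_done(struct cb_list *l, long blimit)
{
	long n = 0;

	while (n < blimit && l->nxttail[RCU_DONE_TAIL] != &l->nxtlist) {
		struct rcu_head *head = l->nxtlist;

		l->nxtlist = head->next;
		/* Tails that pointed at head->next now point at ->nxtlist. */
		for (int i = 0; i < RCU_NEXT_SIZE; i++)
			if (l->nxttail[i] == &head->next)
				l->nxttail[i] = &l->nxtlist;
		l->qlen--;
		head->func(head);	/* head->next was read before this call. */
		n++;
	}
	return n;
}

/* Example callback that just counts invocations. */
static int cb_count;
static void count_invocation(struct rcu_head *head)
{
	(void)head;
	cb_count++;
}
```

In this model, moving callbacks between segments costs a constant number of pointer assignments regardless of how many callbacks each segment holds, and a callback enqueued by cb_enqueue() becomes eligible for cb_invoke_done() only after three cb_gp_ended() calls.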
- qlen:
This field contains the number of callbacks queued on
nxtlist.
- blimit:
This field contains the maximum number of callbacks that may
be invoked at a time.
This limitation improves system responsiveness under heavy load.
- dynticks:
This field references the rcu_dynticks structure for
the corresponding CPU, which is described in a separate section.
- dynticks_snap:
This field contains a past value of dynticks->dynticks,
which is used to detect when the corresponding CPU has passed
through a dynticks-idle state, even if that CPU happens to be
in an irq handler each time that force_quiescent_state() checks it.
- dynticks_nmi_snap:
This field contains a past value of dynticks->dynticks_nmi,
which is used to detect when the corresponding CPU has passed
through a dynticks-idle state, even if that CPU happens to be
in an NMI handler each time that force_quiescent_state() checks it.
- dynticks_fqs:
This field counts the number of times that some other CPU noted
a quiescent state on behalf of
the CPU corresponding to this rcu_data structure due to
its being in dynticks-idle mode.
- offline_fqs:
This field counts the number of times that some other CPU noted
a quiescent state on behalf of
the CPU corresponding to this rcu_data structure due to
its being offline.
Quick Quiz D.25:
So some poor CPU has to note quiescent states on behalf of
each and every offline CPU?
Yecch!
Won't that result in excessive overheads in the not-uncommon
case of a system with a small number of CPUs but a large value
for NR_CPUS?
End Quick Quiz
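The following sketch shows how mynode and grpmask might be used when some other CPU notes a quiescent state on behalf of this one, incrementing offline_fqs or dynticks_fqs as it does so. Everything here is an assumption for illustration: struct rcu_node_leaf and its grplo field, the online flag standing in for CPU-hotplug state, the helper names rdp_init() and fqs_check_cpu(), and the convention that the dynticks counter is even while the CPU is in dynticks-idle mode and odd otherwise. The dynticks_nmi_snap check is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical leaf rcu_node: one qsmask bit per CPU, starting at grplo. */
struct rcu_node_leaf {
	unsigned long qsmask;	/* Set bit: that CPU still owes a QS. */
	int grplo;		/* Lowest-numbered CPU handled by this leaf. */
};

/* Hypothetical subset of rcu_data used by the forcing path. */
struct rcu_data_fqs {
	int cpu;
	struct rcu_node_leaf *mynode;
	unsigned long grpmask;
	bool online;		/* Stand-in for the CPU-hotplug state. */
	int dynticks_snap;	/* Earlier snapshot of the dynticks counter. */
	unsigned long dynticks_fqs;
	unsigned long offline_fqs;
};

static void rdp_init(struct rcu_data_fqs *rdp, int cpu,
		     struct rcu_node_leaf *rnp)
{
	assert(cpu >= rnp->grplo);
	rdp->cpu = cpu;
	rdp->mynode = rnp;
	rdp->grpmask = 1UL << (cpu - rnp->grplo);	/* This CPU's qsmask bit. */
}

/*
 * Run by some other CPU while forcing quiescent states.  dynticks is the
 * target CPU's current counter (assumed even while dynticks-idle, odd
 * otherwise).  Returns true if a QS was noted on the target's behalf.
 */
static bool fqs_check_cpu(struct rcu_data_fqs *rdp, int dynticks)
{
	if (!rdp->online)
		rdp->offline_fqs++;	/* Offline: extended quiescent state. */
	else if ((dynticks & 1) == 0 || dynticks != rdp->dynticks_snap)
		rdp->dynticks_fqs++;	/* Idle now, or idle since the snapshot. */
	else
		return false;		/* Still running: needs another nudge. */
	rdp->mynode->qsmask &= ~rdp->grpmask;	/* Clear only this CPU's bit. */
	return true;
}
```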
- resched_ipi:
This field counts the number of times that a reschedule IPI
has been sent to the corresponding CPU.
Such IPIs are sent to CPUs that fail to report passing through
a quiescent state in a timely manner, but are neither offline
nor in dynticks-idle state.
- n_rcu_pending:
This field counts the number of calls to rcu_pending(),
which is called once per jiffy on non-dynticks-idle CPUs.
- n_rcu_pending_force_qs:
This field holds a threshold value for n_rcu_pending.
If n_rcu_pending reaches this threshold, the current grace
period has lasted too long, so
force_quiescent_state() is invoked to expedite it.
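The threshold logic behind n_rcu_pending and n_rcu_pending_force_qs can be sketched as follows. The FORCE_QS_DELAY value, the arm_force_qs() and rcu_pending_tick() helpers, and the choice to arm the threshold when a new grace period is noticed are assumptions for illustration; the sketch only reports when force_quiescent_state() would be invoked rather than invoking it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical number of rcu_pending() calls (jiffies) before forcing. */
#define FORCE_QS_DELAY 3

/* Hypothetical subset of rcu_data. */
struct rcu_data_pending {
	long n_rcu_pending;
	long n_rcu_pending_force_qs;
};

/* Assumed to be called when this CPU notices a new grace period. */
static void arm_force_qs(struct rcu_data_pending *rdp)
{
	rdp->n_rcu_pending_force_qs = rdp->n_rcu_pending + FORCE_QS_DELAY;
}

/*
 * Called once per jiffy on a non-dynticks-idle CPU.  Returns true if the
 * current grace period has lasted long enough that
 * force_quiescent_state() should be invoked.
 */
static bool rcu_pending_tick(struct rcu_data_pending *rdp, bool gp_in_progress)
{
	rdp->n_rcu_pending++;	/* Count every call, as the text describes. */
	return gp_in_progress &&
	       rdp->n_rcu_pending >= rdp->n_rcu_pending_force_qs;
}
```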
Paul E. McKenney
2011-12-16