D.3.6.4 Reporting Quiescent States

This hierarchical RCU implementation implements a layered approach to reporting quiescent states, using the following functions:

  1. rcu_qsctr_inc() and rcu_bh_qsctr_inc() are invoked when a given CPU passes through a quiescent state for ``rcu'' and ``rcu_bh'', respectively. Note that the dynticks-idle and CPU-offline quiescent states are handled specially, due to the fact that such a CPU is not executing, and thus is unable to report itself as being in a quiescent state.
  2. rcu_check_quiescent_state() checks to see if the current CPU has passed through a quiescent state, invoking cpu_quiet() if so.
  3. cpu_quiet() reports the specified CPU as having passed through a quiescent state by invoking cpu_quiet_msk(). The specified CPU must either be the current CPU or an offline CPU.
  4. cpu_quiet_msk() reports the specified vector of CPUs as having passed through a quiescent state. The CPUs in the vector need not be the current CPU, nor must they be offline.

Each of these functions is described below.

Figure: Code for Recording Quiescent States
\begin{figure}{ \scriptsize
\begin{verbatim}1 void rcu_qsctr_inc(int cpu)
2 ...
...p->passed_quiesc_completed = rdp->completed;
13 }\end{verbatim}
}\end{figure}

Figure [*] shows the code for rcu_qsctr_inc() and rcu_bh_qsctr_inc(), which note the current CPU's passage through a quiescent state.

Line 3 of rcu_qsctr_inc() obtains a pointer to the specified CPU's rcu_data structure (which corresponds to ``rcu'' as opposed to ``rcu_bh''). Line 4 sets the ->passed_quiesc field, recording the quiescent state. Line 5 sets the ->passed_quiesc_completed field to the number of the last completed grace period that this CPU knows of (which is stored in the ->completed field of the rcu_data structure).

The rcu_bh_qsctr_inc() function operates in the same manner, the only difference being that line 10 obtains the rcu_data pointer from the rcu_bh_data per-CPU variable rather than the rcu_data per-CPU variable.

Figure: Code for rcu_check_quiescent_state()
\begin{figure}{ \scriptsize
\begin{verbatim}1 static void
2 rcu_check_quiesc...
...rsp, rdp,
12 rdp->passed_quiesc_completed);
13 }\end{verbatim}
}\end{figure}

Figure [*] shows the code for rcu_check_quiescent_state(), which is invoked from rcu_process_callbacks() (described in Section [*]) in order to determine when other CPUs have started a new grace period and to inform RCU of recent quiescent states for this CPU.

Line 5 invokes check_for_new_grace_period() to check for a new grace period having been started by some other CPU, and also updating this CPU's local state to account for that new grace period. If a new grace period has just started, line 6 returns. Line 7 checks to see if RCU is still expecting a quiescent state from the current CPU, and line 8 returns if not. Line 9 checks to see if this CPU has passed through a quiescent state since the start of the current grace period (in other words, if rcu_qsctr_inc() or rcu_bh_qsctr_inc() have been invoked for ``rcu'' and ``rcu_bh'', respectively), and line 10 returns if not.

Therefore, execution reaches line 11 only if a previously noted grace period is still in effect, if this CPU needs to pass through a quiescent state in order to allow this grace period to end, and if this CPU has passed through such a quiescent state. In this case, lines 11-12 invoke cpu_quiet() in order to report this quiescent state to RCU.

Quick Quiz D.46: What prevents lines 11-12 of Figure [*] from reporting a quiescent state from a prior grace period against the current grace period? End Quick Quiz

Figure: Code for cpu_quiet()
\begin{figure}{ \scriptsize
\begin{verbatim}1 static void
2 cpu_quiet(int cp...
... cpu_quiet_msk(mask, rsp, rnp, flags);
25 }
26 }\end{verbatim}
}\end{figure}

Figure [*] shows cpu_quiet, which is used to report a quiescent state for the specified CPU. As noted earlier, this must either be the currently running CPU or a CPU that is guaranteed to remain offline throughout.

Line 9 picks up a pointer to the leaf rcu_node structure responsible for this CPU. Line 10 acquires this leaf rcu_node structure's lock and disables interrupts. Line 11 checks to make sure that the specified grace period is still in effect, and, if not, line 11 clears the indication that this CPU passed through a quiescent state (since it belongs to a defunct grace period), line 13 releases the lock and re-enables interrupts, and line 14 returns to the caller.

Otherwise, line 16 forms a mask with the specified CPU's bit set. Line 17 checks to see if this bit is still set in the leaf rcu_node structure, and, if not, line 18 releases the lock and re-enables interrupts.

On the other hand, if the CPU's bit is still set, line 20 clears ->qs_pending, reflecting that this CPU has passed through its quiescent state for this grace period. Line 21 then overwrites local variable rdp with a pointer to the running CPU's rcu_data structure, and lines 22-23 updates the running CPU's RCU callbacks so that all those not yet associated with a specific grace period be serviced by the next grace period. Finally, line 24 clears bits up the rcu_node hierarchy, ending the current grace period if appropriate and perhaps even starting a new one. Note that cpu_quiet() releases the lock and re-enables interrupts.

Quick Quiz D.47: How do lines 22-23 of Figure [*] know that it is safe to promote the running CPU's RCU callbacks? End Quick Quiz

Figure: Code for cpu_quiet_msk()
\begin{figure}{ \scriptsize
\begin{verbatim}1 static void
2 cpu_quiet_msk(un...
...cessor_id()]);
26 rcu_start_gp(rsp, flags);
27 }\end{verbatim}
}\end{figure}

Figure [*] shows cpu_quiet_msk(), which updates the rcu_node hierarchy to reflect the passage of the CPUs indicated by argument mask through their respective quiescent states. Note that argument rnp is the leaf rcu_node structure corresponding to the specified CPUs.

Quick Quiz D.48: Given that argument mask on line 2 of Figure [*] is an unsigned long, how can it possibly deal with systems with more than 64 CPUs? End Quick Quiz

Line 4 is annotation for the sparse utility, indicating that cpu_quiet_msk() releases the leaf rcu_node structure's lock.

Figure: Scanning rcu_node Structures When Applying Quiescent States
\resizebox{6in}{!}{\includegraphics{appendix/rcuimpl/RCUTreeQSScan}}

Each pass through the loop spanning lines 6-23 does the required processing for one level of the rcu_node hierarchy, traversing the data structures as shown by the blue arrow in Figure [*].

Line 7 checks to see if all of the bits in mask have already been cleared in the current rcu_node structure's ->qsmask field, and, if so, line 8 releases the lock and re-enables interrupts, and line 9 returns to the caller. If not, line 11 clears the bits specified by mask from the current rcu_node structure's qsmask field. Line 12 then checks to see if there are more bits remaining in ->qsmask, and, if so, line 13 releases the lock and re-enables interrupts, and line 14 returns to the caller.

Otherwise, it is necessary to advance up to the next level of the rcu_node hierarchy. In preparation for this next level, line 16 places a mask with the single bit set corresponding to the current rcu_node structure within its parent. Line 17 checks to see if there in fact is a parent for the current rcu_node structure, and, if not, line 18 breaks from the loop. On the other hand, if there is a parent rcu_node structure, line 20 releases the current rcu_node structure's lock, line 21 advances the rnp local variable to the parent, and line 22 acquires the parent's lock. Execution then continues at the beginning of the loop on line 7.

If line 18 breaks from the loop, we know that the current grace period has ended, as the only way that all bits can be cleared in the root rcu_node structure is if all CPUs have passed through quiescent states. In this case, line 24 updates the rcu_state structure's ->completed field to match the number of the newly ended grace period, indicating that the grace period has in fact ended. Line 24 then invokes rcu_process_gp_end() to advance the running CPU's RCU callbacks, and, finally, line 26 invokes rcu_start_gp() in order to start a new grace period should any remaining callbacks on the currently running CPU require one.

Figure: Code for rcu_do_batch()
\begin{figure}{ \scriptsize
\begin{verbatim}1 static void rcu_do_batch(struct...
...invoke(rdp))
43 raise_softirq(RCU_SOFTIRQ);
44 }\end{verbatim}
}\end{figure}

Figure [*] shows rcu_do_batch(), which invokes RCU callbacks whose grace periods have ended. Only callbacks on the running CPU will be invoked--other CPUs must invoke their own callbacks.

Quick Quiz D.49: How do RCU callbacks on dynticks-idle or offline CPUs get invoked? End Quick Quiz

Line 7 invokes cpu_has_callbacks_ready_to_invoke() to see if this CPU has any RCU callbacks whose grace period has completed, and, if not, line 8 returns. Lines 9 and 18 disable and re-enable interrupts, respectively. Lines 11-13 remove the ready-to-invoke callbacks from ->nxtlist, and lines 14-17 make any needed adjustments to the tail pointers.

Quick Quiz D.50: Why would lines 14-17 in Figure [*] need to adjust the tail pointers? End Quick Quiz

Line 19 initializes local variable count to zero in preparation for counting the number of callbacks that will actually be invoked. Each pass through the loop spanning lines 20-27 invokes and counts a callback, with lines 25-26 exiting the loop if too many callbacks are to be invoked at a time (thus preserving responsiveness). The remainder of the function then requeues any callbacks that could not be invoked due to this limit.

Lines 28 and 41 disable and re-enable interrupts, respectively. Line 29 updates the ->qlen field, which maintains a count of the total number of RCU callbacks for this CPU. Line 30 checks to see if there were any ready-to-invoke callbacks that could not be invoked at the moment due to the limit on the number that may be invoked at a given time. If such callbacks remain, lines 30-38 requeue them, again adjusting the tail pointers as needed. Lines 39-40 restore the batch limit if it was increased due to excessive callback backlog, and lines 42-43 cause additional RCU processing to be scheduled if there are any ready-to-invoke callbacks remaining.

Paul E. McKenney 2011-12-16