The accompanying figure shows the code for force_quiescent_state() for CONFIG_SMP, which is invoked when RCU feels the need to expedite the current grace period by forcing CPUs through quiescent states.
RCU feels this need when either of the two conditions described below for lines 17-21 holds.
Lines 10-12 check whether a grace period is in progress, silently exiting if not. Lines 13-16 attempt to acquire ->fqslock, which prevents concurrent attempts to expedite a grace period. If this lock is already held, the ->n_force_qs_lh counter is incremented; this counter is visible via the fqlh= field in the rcuhier debugfs file when the CONFIG_RCU_TRACE kernel parameter is enabled. Lines 17-21 check whether it is really necessary to expedite the current grace period, in other words, whether (1) the current CPU has 10,000 RCU callbacks waiting, or (2) at least three jiffies have passed since either the beginning of the current grace period or the last attempt to expedite it, measured either by the jiffies counter or by the number of calls to rcu_pending(). Line 22 then counts the number of attempts to expedite grace periods.
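The gating just described can be sketched in user-space C. This is only an illustration, not the kernel's code: the fqs_gate structure, the fqs_gate_enter() and fqs_gate_exit() helpers, and the use of a C11 atomic_flag in place of the kernel's ->fqslock are all hypothetical, and the sketch assumes that the callback-count condition is what causes a caller to pass relaxed as false.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the state described in the text. */
struct fqs_gate {
	atomic_flag fqslock;            /* set while one caller is forcing */
	unsigned long n_force_qs_lh;    /* attempts that found the lock held */
	unsigned long n_force_qs;       /* attempts that went on to force */
	unsigned long jiffies_force_qs; /* earliest time for a relaxed attempt */
};

/*
 * Returns true, with fqslock held, if the caller should go on to force
 * quiescent states.  "now" plays the role of the jiffies counter.
 */
static bool fqs_gate_enter(struct fqs_gate *g, bool relaxed, unsigned long now)
{
	assert(g != NULL);
	if (atomic_flag_test_and_set(&g->fqslock)) {
		g->n_force_qs_lh++;     /* unsynchronized, so may lose counts */
		return false;           /* someone else is already forcing */
	}
	/* Wrap-safe test: too soon while (deadline - now) is non-negative. */
	if (relaxed && (long)(g->jiffies_force_qs - now) >= 0) {
		atomic_flag_clear(&g->fqslock);
		return false;
	}
	g->n_force_qs++;
	return true;
}

/* Releases the lock taken by a successful fqs_gate_enter(). */
static void fqs_gate_exit(struct fqs_gate *g)
{
	assert(g != NULL);
	atomic_flag_clear(&g->fqslock);
}
```

Casting the unsigned difference to a signed type keeps the comparison meaningful across counter wraparound, provided the two times are less than half the counter's range apart.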
Lines 23-36 are executed with the root rcu_node structure's lock held in order to prevent confusion should the current grace period happen to end just as we try to expedite it. Lines 24 and 25 snapshot the ->completed and ->signaled fields, lines 26-30 set the soonest time that a subsequent non-relaxed force_quiescent_state() will be allowed to actually do any expediting, and lines 31-35 check to see if the grace period ended while we were acquiring the rcu_node structure's lock, releasing this lock and returning if so.
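The snapshot-and-recheck step might look roughly like the following. The gp_sketch and gp_snapshot structures and the gp_snapshot_take() helper are hypothetical names, locking is left to the caller, and the sketch assumes that a grace period is in progress exactly when the ->completed value differs from a grace-period number held in a field here called gpnum.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed interval, taken from the three-jiffy figure in the text. */
#define SKETCH_JIFFIES_TILL_FORCE_QS 3UL

/* Hypothetical stand-in for the grace-period state. */
struct gp_sketch {
	long gpnum;                     /* assumed: current grace-period number */
	long completed;                 /* last completed grace-period number */
	int signaled;                   /* state-machine state */
	unsigned long jiffies_force_qs; /* earliest next relaxed attempt */
};

struct gp_snapshot {
	long lastcomp;
	int signaled;
};

/*
 * Caller is assumed to hold the root lock.  Records the snapshot, pushes
 * the next-attempt time forward, and returns false if the grace period
 * had already ended.
 */
static bool gp_snapshot_take(struct gp_sketch *gp, unsigned long now,
			     struct gp_snapshot *snap)
{
	assert(gp && snap);
	snap->lastcomp = gp->completed;
	snap->signaled = gp->signaled;
	gp->jiffies_force_qs = now + SKETCH_JIFFIES_TILL_FORCE_QS;
	return snap->lastcomp != gp->gpnum;
}
```

Note that the next-attempt time is updated even when the grace period turns out to have ended, matching the order of operations given in the text.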
Lines 37-59 drive the force_quiescent_state() state machine. If the grace period is still in the midst of initialization, lines 41 and 42 simply return, allowing force_quiescent_state() to be called again at a later time, presumably after initialization has completed. If dynticks is enabled (via the CONFIG_NO_HZ kernel parameter), the first post-initialization call to force_quiescent_state() in a given grace period will execute lines 40-52, and the second and subsequent calls will execute lines 53-59. On the other hand, if dynticks is not enabled, then all post-initialization calls to force_quiescent_state() will execute lines 53-59.
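The sequencing of calls can be modeled as a three-state machine. The sketch below is a simplified illustration: the fqs_phase and fqs_work enumerations and both helper functions are hypothetical, only the name RCU_FORCE_QS appears in the text, and the checks for a concurrently ending grace period are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical phase names; FQS_FORCE_QS corresponds to RCU_FORCE_QS. */
enum fqs_phase { FQS_GP_INIT, FQS_SAVE_DYNTICK, FQS_FORCE_QS };

/* What a given call ends up doing. */
enum fqs_work { FQS_WORK_NONE, FQS_WORK_SNAPSHOT, FQS_WORK_FORCE };

/* Phase entered once grace-period initialization has completed. */
static enum fqs_phase fqs_phase_after_init(bool dynticks)
{
	return dynticks ? FQS_SAVE_DYNTICK : FQS_FORCE_QS;
}

/* One call to the state machine; advances *phase where appropriate. */
static enum fqs_work fqs_phase_step(enum fqs_phase *phase, bool dynticks)
{
	assert(phase != NULL);
	switch (*phase) {
	case FQS_GP_INIT:
		return FQS_WORK_NONE;       /* still initializing, try later */
	case FQS_SAVE_DYNTICK:
		if (!dynticks)
			return FQS_WORK_NONE;   /* unreachable without dynticks */
		*phase = FQS_FORCE_QS;      /* later calls do the forcing */
		return FQS_WORK_SNAPSHOT;
	case FQS_FORCE_QS:
		return FQS_WORK_FORCE;      /* state kept for repeated forcing */
	}
	return FQS_WORK_NONE;
}
```

With dynticks enabled, successive post-initialization calls yield one snapshot pass followed by forcing passes; without dynticks, every post-initialization call is a forcing pass.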
The purpose of lines 40-52 is to record the current dyntick-idle state of all CPUs that have not yet passed through a quiescent state, and to record a quiescent state for any that are currently in dyntick-idle state (but not currently in an irq or NMI handler). Lines 41-42 serve to inform gcc that this branch of the switch statement is dead code for non-CONFIG_NO_HZ kernels. Lines 43-45 invoke rcu_process_dyntick() in order to invoke dyntick_save_progress_counter() for each CPU that has not yet passed through a quiescent state for the current grace period, exiting force_quiescent_state() if the grace period ends in the meantime (possibly due to having found that all the CPUs that had not yet passed through a quiescent state were sleeping in dyntick-idle mode). Lines 46 and 51 acquire and release the root rcu_node structure's lock, again to avoid possible confusion with a concurrent end of the current grace period. Line 47 checks to see if the current grace period is still in force, and, if so, line 48 advances the state machine to the RCU_FORCE_QS state and line 49 saves the current grace-period number for the benefit of the next invocation of force_quiescent_state(). The grace-period number is saved in order to correctly handle race conditions in which the current grace period ends concurrently with the next invocation of force_quiescent_state().
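A rough model of the snapshot pass follows. Everything in it is an assumption made for illustration: the cpu_sketch structure, the snapshot_dynticks() helper, and in particular the convention that a per-CPU counter is even while the CPU is in dyntick-idle mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state. */
struct cpu_sketch {
	unsigned int dynticks;      /* assumed: even while dyntick-idle */
	bool in_irq_or_nmi;         /* currently in an irq or NMI handler */
	unsigned int dynticks_snap; /* counter value saved by this pass */
	bool qs_pending;            /* has not yet passed a quiescent state */
};

/*
 * Saves the counter of every CPU that still owes a quiescent state and
 * credits a quiescent state to those that are idle and not in a handler.
 * Returns the number of CPUs still pending; zero would mean that the
 * grace period can end.
 */
static size_t snapshot_dynticks(struct cpu_sketch *cpus, size_t n)
{
	size_t still_pending = 0;

	assert(cpus != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct cpu_sketch *c = &cpus[i];

		if (!c->qs_pending)
			continue;           /* already done for this grace period */
		c->dynticks_snap = c->dynticks;
		if ((c->dynticks & 1u) == 0 && !c->in_irq_or_nmi)
			c->qs_pending = false;
		else
			still_pending++;
	}
	return still_pending;
}
```

A return value of zero corresponds to the case mentioned above in which every remaining CPU turned out to be sleeping in dyntick-idle mode.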
As noted earlier, lines 53-58 handle the second and subsequent invocations of force_quiescent_state() in CONFIG_NO_HZ kernels, and all invocations in non-CONFIG_NO_HZ kernels. Lines 54 and 58 invoke rcu_process_dyntick(), which cycles through the CPUs that have still not passed through a quiescent state, invoking rcu_implicit_dynticks_qs() on them, which in turn checks to see if any of these CPUs have passed through dyntick-idle state (if CONFIG_NO_HZ is enabled), checks to see if we are waiting on any offline CPUs, and finally sends a reschedule IPI to any remaining CPUs not in the first two groups.
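The three-way decision made for each lagging CPU can be sketched as below. This is a deliberately simplified model, not the kernel's rcu_implicit_dynticks_qs(): the lagging_cpu structure, the qs_verdict enumeration, and classify_lagging_cpu() are hypothetical, irq and NMI handling is ignored, and the counter is assumed to be incremented on every transition into or out of dyntick-idle mode, so that an even value means idle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum qs_verdict {
	QS_VIA_DYNTICK_IDLE,   /* was idle at some point since the snapshot */
	QS_VIA_OFFLINE,        /* CPU is offline, nothing to wait for */
	QS_NEEDS_RESCHED_IPI   /* still running: send a reschedule IPI */
};

/* Hypothetical per-CPU state for a CPU that still owes a quiescent state. */
struct lagging_cpu {
	unsigned int dynticks;      /* assumed: even while dyntick-idle */
	unsigned int dynticks_snap; /* value saved by the earlier pass */
	bool online;
};

static enum qs_verdict classify_lagging_cpu(const struct lagging_cpu *c,
					    bool dynticks_enabled)
{
	assert(c != NULL);
	/* Idle now, or the counter moved, so the CPU entered idle since. */
	if (dynticks_enabled &&
	    (c->dynticks != c->dynticks_snap || (c->dynticks & 1u) == 0))
		return QS_VIA_DYNTICK_IDLE;
	if (!c->online)
		return QS_VIA_OFFLINE;
	return QS_NEEDS_RESCHED_IPI;
}
```

The checks are ordered as in the text, so the reschedule IPI is reserved for CPUs that are online and have shown no sign of idleness.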
Paul E. McKenney 2011-12-16