Commit 0edd1b17 authored by Paul E. McKenney

nohz_full: Add full-system-idle state machine

This commit adds the state machine that takes the per-CPU idle data
as input and produces a full-system-idle indication as output.  This
state machine is driven out of RCU's quiescent-state-forcing
mechanism, which invokes rcu_sysidle_check_cpu() to collect per-CPU
idle state and then rcu_sysidle_report() to drive the state machine.

The full-system-idle state is sampled using rcu_sys_is_idle(), which
also drives the state machine if RCU is idle (and does so by forcing
RCU to become non-idle).  This function returns true if all but the
timekeeping CPU (tick_do_timer_cpu) are idle and have been idle long
enough to avoid memory contention on the full_sysidle_state state
variable.  The rcu_sysidle_force_exit() function may be called
externally to reset the state machine to the non-idle state.

For large systems the state machine is driven out of RCU's
force-quiescent-state logic, which provides good scalability at the price
of millisecond-scale latencies on the transition to full-system-idle
state.  This is not so good for battery-powered systems, which are usually
small enough that they don't need to care about scalability, but which
do care deeply about energy efficiency.  Small systems therefore drive
the state machine directly out of the idle-entry code.  The number of
CPUs in a "small" system is defined by a new NO_HZ_FULL_SYSIDLE_SMALL
Kconfig parameter, which defaults to 8.  Note that this is a build-time
definition.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
[ paulmck: Use true and false for boolean constants per Lai Jiangshan. ]
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
[ paulmck: Simplify logic and provide better comments for memory barriers,
  based on review comments and questions by Lai Jiangshan. ]
parent 217af2a2
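The state machine itself lives in a part of the diff that is collapsed on this page, so the following is only a minimal user-space sketch of the behavior the commit message describes, not the kernel's code. The state names, the `SYSIDLE_DELAY` threshold, and all function names are assumptions made for illustration. The sketch is single-threaded and omits the atomic operations and memory barriers the real code needs.

```c
#include <stdbool.h>

/* Illustrative states; the names are assumptions, not the commit's own. */
enum sysidle_state {
	SYSIDLE_NOT,	/* Some non-timekeeping CPU is busy. */
	SYSIDLE_SHORT,	/* All idle, but only briefly. */
	SYSIDLE_LONG,	/* All idle for a longer interval. */
	SYSIDLE_FULL,	/* Full-system idle may be reported. */
};

struct sysidle_machine {
	enum sysidle_state state;
};

/* Hypothetical delay (in ticks) an idle period must reach before advancing. */
#define SYSIDLE_DELAY 10UL

/* Feed one scan result (the isidle/maxj pair) into the state machine. */
static void sysidle_report(struct sysidle_machine *m, bool isidle,
			   unsigned long maxj, unsigned long now)
{
	if (!isidle) {
		m->state = SYSIDLE_NOT;	/* Any busy CPU resets the machine. */
		return;
	}
	if (now - maxj < SYSIDLE_DELAY)
		return;			/* Idle, but not for long enough yet. */
	if (m->state != SYSIDLE_FULL)
		m->state++;		/* Advance one step per qualifying report. */
}

/* Sample the machine: true only once the final state has been reached. */
static bool sys_is_idle(const struct sysidle_machine *m)
{
	return m->state == SYSIDLE_FULL;
}

/* External reset back to the non-idle state. */
static void sysidle_force_exit(struct sysidle_machine *m)
{
	m->state = SYSIDLE_NOT;
}
```

Advancing one state per report means that several consecutive all-idle scans are needed before full-system idle is declared, which is one way to keep a frequently changing system from hammering a single shared state variable.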
@@ -1011,4 +1011,22 @@ static inline bool rcu_is_nocb_cpu(int cpu) { return false; }
 #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
+/* Only for use by adaptive-ticks code. */
+#ifdef CONFIG_NO_HZ_FULL_SYSIDLE
+extern bool rcu_sys_is_idle(void);
+extern void rcu_sysidle_force_exit(void);
+#else /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
+static inline bool rcu_sys_is_idle(void)
+{
+	return false;
+}
+static inline void rcu_sysidle_force_exit(void)
+{
+}
+#endif /* #else #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
 #endif /* __LINUX_RCUPDATE_H */
@@ -734,6 +734,7 @@ static int dyntick_save_progress_counter(struct rcu_data *rdp,
 					 bool *isidle, unsigned long *maxj)
 {
 	rdp->dynticks_snap = atomic_add_return(0, &rdp->dynticks->dynticks);
+	rcu_sysidle_check_cpu(rdp, isidle, maxj);
 	return (rdp->dynticks_snap & 0x1) == 0;
 }
@@ -1373,11 +1374,17 @@ int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
 	rsp->n_force_qs++;
 	if (fqs_state == RCU_SAVE_DYNTICK) {
 		/* Collect dyntick-idle snapshots. */
+		if (is_sysidle_rcu_state(rsp)) {
+			isidle = 1;
+			maxj = jiffies - ULONG_MAX / 4;
+		}
 		force_qs_rnp(rsp, dyntick_save_progress_counter,
 			     &isidle, &maxj);
+		rcu_sysidle_report_gp(rsp, isidle, maxj);
 		fqs_state = RCU_FORCE_QS;
 	} else {
 		/* Handle dyntick-idle and offline CPUs. */
+		isidle = 0;
 		force_qs_rnp(rsp, rcu_implicit_dynticks_qs, &isidle, &maxj);
 	}
 	/* Clear flag to prevent immediate re-entry. */
@@ -2103,9 +2110,12 @@ static void force_qs_rnp(struct rcu_state *rsp,
 		cpu = rnp->grplo;
 		bit = 1;
 		for (; cpu <= rnp->grphi; cpu++, bit <<= 1) {
-			if ((rnp->qsmask & bit) != 0 &&
-			    f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
-				mask |= bit;
+			if ((rnp->qsmask & bit) != 0) {
+				if ((rnp->qsmaskinit & bit) != 0)
+					*isidle = 0;
+				if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
+					mask |= bit;
+			}
 		}
 		if (mask != 0) {
......
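The hunks above seed `isidle` and `maxj` before the scan and then let each per-CPU callback refine them. As a rough user-space illustration of that accumulation, the sketch below folds a set of CPUs into one verdict. The `struct cpu_idle_info` record, `counter_before()`, and `scan_cpus()` are hypothetical names introduced here; the body of `rcu_sysidle_check_cpu()` is not visible on this page, so this shows the general technique rather than the commit's implementation.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU record, standing in for the kernel's per-CPU data. */
struct cpu_idle_info {
	bool idle;			/* Is this CPU currently idle? */
	unsigned long idle_since;	/* Tick count when it last went idle. */
};

/* Wraparound-safe "a is earlier than b" for free-running counters. */
static bool counter_before(unsigned long a, unsigned long b)
{
	return ULONG_MAX / 2 < a - b;
}

/*
 * Fold every CPU except the timekeeper into one verdict.  Returns true if
 * all of them are idle, and stores the most recent idle-entry time in *maxj.
 */
static bool scan_cpus(const struct cpu_idle_info *cpus, size_t ncpus,
		      size_t timekeeper, unsigned long now,
		      unsigned long *maxj)
{
	bool isidle = true;
	size_t cpu;

	*maxj = now - ULONG_MAX / 4;	/* Far in the past, as in rcu_gp_fqs(). */
	for (cpu = 0; cpu < ncpus; cpu++) {
		if (cpu == timekeeper)
			continue;	/* The timekeeping CPU may stay busy. */
		if (!cpus[cpu].idle) {
			isidle = false;
			continue;
		}
		if (counter_before(*maxj, cpus[cpu].idle_since))
			*maxj = cpus[cpu].idle_since;
	}
	return isidle;
}
```

Starting `maxj` a quarter of the counter range in the past means every plausible idle-entry timestamp compares as later under the wraparound-safe test, so the first idle CPU examined always replaces the seed value.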
@@ -555,6 +555,11 @@ static void rcu_kick_nohz_cpu(int cpu);
 static bool init_nocb_callback_list(struct rcu_data *rdp);
 static void rcu_sysidle_enter(struct rcu_dynticks *rdtp, int irq);
 static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq);
+static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
+				  unsigned long *maxj);
+static bool is_sysidle_rcu_state(struct rcu_state *rsp);
+static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
+				  unsigned long maxj);
 static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
 #endif /* #ifndef RCU_TREE_NONCORE */
......
This diff is collapsed.
@@ -157,6 +157,33 @@ config NO_HZ_FULL_SYSIDLE
 	  Say N if you are unsure.
+config NO_HZ_FULL_SYSIDLE_SMALL
+	int "Number of CPUs above which large-system approach is used"
+	depends on NO_HZ_FULL_SYSIDLE
+	range 1 NR_CPUS
+	default 8
+	help
+	  The full-system idle detection mechanism takes a lazy approach
+	  on large systems, as is required to attain decent scalability.
+	  However, on smaller systems, scalability is not anywhere near as
+	  large a concern as is energy efficiency. The sysidle subsystem
+	  therefore uses a fast but non-scalable algorithm for small
+	  systems and a lazier but scalable algorithm for large systems.
+	  This Kconfig parameter defines the number of CPUs in the largest
+	  system that will be considered to be "small".
+
+	  The default value will be fine in most cases. Battery-powered
+	  systems that (1) enable NO_HZ_FULL_SYSIDLE, (2) have larger
+	  numbers of CPUs, and (3) are suffering from battery-lifetime
+	  problems due to long sysidle latencies might wish to experiment
+	  with larger values for this Kconfig parameter. On the other
+	  hand, they might be even better served by disabling NO_HZ_FULL
+	  entirely, given that NO_HZ_FULL is intended for HPC and
+	  real-time workloads that at present do not tend to be run on
+	  battery-powered systems.
+
+	  Take the default if you are unsure.
 config NO_HZ
 	bool "Old Idle dynticks config"
 	depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
......
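The cutoff that NO_HZ_FULL_SYSIDLE_SMALL controls can be pictured with the short sketch below. The `SYSIDLE_SMALL` macro stands in for the Kconfig symbol, and the enum and `sysidle_pick_driver()` helper are hypothetical names used only for illustration. The `<=` comparison follows the help text, which describes the parameter as the number of CPUs in the largest system still considered "small".

```c
/* Stand-in for CONFIG_NO_HZ_FULL_SYSIDLE_SMALL; 8 is the Kconfig default. */
#ifndef SYSIDLE_SMALL
#define SYSIDLE_SMALL 8
#endif

enum sysidle_driver {
	DRIVE_FROM_IDLE_ENTRY,	/* Small system: low latency, not scalable. */
	DRIVE_FROM_FQS,		/* Large system: scalable, millisecond latency. */
};

/* Hypothetical helper: choose the driving mechanism from the CPU count. */
static enum sysidle_driver sysidle_pick_driver(unsigned int ncpus)
{
	return ncpus <= SYSIDLE_SMALL ? DRIVE_FROM_IDLE_ENTRY : DRIVE_FROM_FQS;
}
```

Because the threshold is a build-time constant, changing it means rebuilding the kernel; a sketch like this one would be compiled with a different `SYSIDLE_SMALL` value to model that.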