Commit ed562d28 authored by Rafael J. Wysocki

Merge branches 'pm-cpufreq' and 'pm-cpuidle'

* pm-cpufreq:
  cpufreq: Make cpufreq_online() call driver->offline() on errors
  cpufreq: loongson2: Remove unused linux/sched.h headers
  cpufreq: sh: Remove unused linux/sched.h headers
  cpufreq: stats: Clean up local variable in cpufreq_stats_create_table()
  cpufreq: intel_pstate: hybrid: Fix build with CONFIG_ACPI unset
  cpufreq: sc520_freq: add 'fallthrough' to one case
  cpufreq: intel_pstate: Add Cometlake support in no-HWP mode
  cpufreq: intel_pstate: Add Icelake servers support in no-HWP mode
  cpufreq: intel_pstate: hybrid: CPU-specific scaling factor
  cpufreq: intel_pstate: hybrid: Avoid exposing two global attributes

* pm-cpuidle:
  cpuidle: teo: remove unneeded semicolon in teo_select()
  cpuidle: teo: Use kerneldoc documentation in admin-guide
  cpuidle: teo: Rework most recent idle duration values treatment
  cpuidle: teo: Change the main idle state selection logic
  cpuidle: teo: Cosmetic modification of teo_select()
  cpuidle: teo: Cosmetic modifications of teo_update()
  intel_idle: Adjust the SKX C6 parameters if PC6 is disabled
@@ -347,81 +347,8 @@ for tickless systems. It follows the same basic strategy as the ``menu`` `one
 <menu-gov_>`_: it always tries to find the deepest idle state suitable for the
 given conditions. However, it applies a different approach to that problem.

-First, it does not use sleep length correction factors, but instead it attempts
-to correlate the observed idle duration values with the available idle states
-and use that information to pick up the idle state that is most likely to
-"match" the upcoming CPU idle interval. Second, it does not take the tasks
-that were running on the given CPU in the past and are waiting on some I/O
-operations to complete now at all (there is no guarantee that they will run on
-the same CPU when they become runnable again) and the pattern detection code in
-it avoids taking timer wakeups into account. It also only uses idle duration
-values less than the current time till the closest timer (with the scheduler
-tick excluded) for that purpose.
-
-Like in the ``menu`` governor `case <menu-gov_>`_, the first step is to obtain
-the *sleep length*, which is the time until the closest timer event with the
-assumption that the scheduler tick will be stopped (that also is the upper bound
-on the time until the next CPU wakeup). That value is then used to preselect an
-idle state on the basis of three metrics maintained for each idle state provided
-by the ``CPUIdle`` driver: ``hits``, ``misses`` and ``early_hits``.
-
-The ``hits`` and ``misses`` metrics measure the likelihood that a given idle
-state will "match" the observed (post-wakeup) idle duration if it "matches" the
-sleep length. They both are subject to decay (after a CPU wakeup) every time
-the target residency of the idle state corresponding to them is less than or
-equal to the sleep length and the target residency of the next idle state is
-greater than the sleep length (that is, when the idle state corresponding to
-them "matches" the sleep length). The ``hits`` metric is increased if the
-former condition is satisfied and the target residency of the given idle state
-is less than or equal to the observed idle duration and the target residency of
-the next idle state is greater than the observed idle duration at the same time
-(that is, it is increased when the given idle state "matches" both the sleep
-length and the observed idle duration). In turn, the ``misses`` metric is
-increased when the given idle state "matches" the sleep length only and the
-observed idle duration is too short for its target residency.
-
-The ``early_hits`` metric measures the likelihood that a given idle state will
-"match" the observed (post-wakeup) idle duration if it does not "match" the
-sleep length. It is subject to decay on every CPU wakeup and it is increased
-when the idle state corresponding to it "matches" the observed (post-wakeup)
-idle duration and the target residency of the next idle state is less than or
-equal to the sleep length (i.e. the idle state "matching" the sleep length is
-deeper than the given one).
-
-The governor walks the list of idle states provided by the ``CPUIdle`` driver
-and finds the last (deepest) one with the target residency less than or equal
-to the sleep length. Then, the ``hits`` and ``misses`` metrics of that idle
-state are compared with each other and it is preselected if the ``hits`` one is
-greater (which means that that idle state is likely to "match" the observed idle
-duration after CPU wakeup). If the ``misses`` one is greater, the governor
-preselects the shallower idle state with the maximum ``early_hits`` metric
-(or if there are multiple shallower idle states with equal ``early_hits``
-metric which also is the maximum, the shallowest of them will be preselected).
-
-[If there is a wakeup latency constraint coming from the `PM QoS framework
-<cpu-pm-qos_>`_ which is hit before reaching the deepest idle state with the
-target residency within the sleep length, the deepest idle state with the exit
-latency within the constraint is preselected without consulting the ``hits``,
-``misses`` and ``early_hits`` metrics.]
-
-Next, the governor takes several idle duration values observed most recently
-into consideration and if at least a half of them are greater than or equal to
-the target residency of the preselected idle state, that idle state becomes the
-final candidate to ask for. Otherwise, the average of the most recent idle
-duration values below the target residency of the preselected idle state is
-computed and the governor walks the idle states shallower than the preselected
-one and finds the deepest of them with the target residency within that average.
-That idle state is then taken as the final candidate to ask for.
-
-Still, at this point the governor may need to refine the idle state selection if
-it has not decided to `stop the scheduler tick <idle-cpus-and-tick_>`_. That
-generally happens if the target residency of the idle state selected so far is
-less than the tick period and the tick has not been stopped already (in a
-previous iteration of the idle loop). Then, like in the ``menu`` governor
-`case <menu-gov_>`_, the sleep length used in the previous computations may not
-reflect the real time until the closest timer event and if it really is greater
-than that time, a shallower state with a suitable target residency may need to
-be selected.
+.. kernel-doc:: drivers/cpuidle/governors/teo.c
+   :doc: teo-description

 .. _idle-states-representation:

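The description being removed above boils down to a two-step preselection over per-state counters. The following stand-alone C sketch illustrates that step only; the struct layout, function names and microsecond units are assumptions made for the example and do not come from drivers/cpuidle/governors/teo.c.

/*
 * Illustrative sketch of the preselection logic described above, not the
 * kernel implementation: pick the deepest state whose target residency
 * fits the sleep length, then fall back on "early_hits" if that state
 * has more misses than hits.
 */
struct example_idle_state {
        unsigned int target_residency_us;       /* ordered shallow -> deep */
        unsigned int hits;
        unsigned int misses;
        unsigned int early_hits;
};

static int example_preselect(const struct example_idle_state *states,
                             int count, unsigned int sleep_length_us)
{
        int i, idx = 0;

        /* Deepest state with target residency within the sleep length. */
        for (i = 0; i < count; i++) {
                if (states[i].target_residency_us <= sleep_length_us)
                        idx = i;
        }

        /*
         * If that state tends to overshoot the observed idle duration
         * (misses > hits), prefer the shallower state with the most
         * early hits; ties go to the shallowest such state.
         */
        if (states[idx].misses > states[idx].hits) {
                unsigned int max_early = 0;
                int alt = idx;

                for (i = 0; i < idx; i++) {
                        if (states[i].early_hits > max_early) {
                                max_early = states[i].early_hits;
                                alt = i;
                        }
                }
                idx = alt;
        }
        return idx;
}

The real governor additionally honors PM QoS latency limits and refines the choice with recently observed idle durations, which this sketch deliberately omits.
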
@@ -365,6 +365,9 @@ argument is passed to the kernel in the command line.
 inclusive) including both turbo and non-turbo P-states (see
 `Turbo P-states Support`_).

+This attribute is present only if the value exposed by it is the same
+for all of the CPUs in the system.
+
 The value of this attribute is not affected by the ``no_turbo``
 setting described `below <no_turbo_attr_>`_.

@@ -374,6 +377,9 @@ argument is passed to the kernel in the command line.
 Ratio of the `turbo range <turbo_>`_ size to the size of the entire
 range of supported P-states, in percent.

+This attribute is present only if the value exposed by it is the same
+for all of the CPUs in the system.
+
 This attribute is read-only.

 .. _no_turbo_attr:

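Because the two attributes above may now be absent, user space has to tolerate a missing file. A minimal C sketch, assuming the usual /sys/devices/system/cpu/intel_pstate/turbo_pct path and treating a failed open as "attribute not exposed":

#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/sys/devices/system/cpu/intel_pstate/turbo_pct", "r");
        int pct;

        if (!f) {
                /* Not exposed, e.g. when the value differs between CPUs. */
                perror("turbo_pct");
                return 1;
        }
        if (fscanf(f, "%d", &pct) == 1)
                printf("turbo range: %d%% of the P-state range\n", pct);
        fclose(f);
        return 0;
}
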
@@ -1367,9 +1367,14 @@ static int cpufreq_online(unsigned int cpu)
                         goto out_free_policy;
                 }

+                /*
+                 * The initialization has succeeded and the policy is online.
+                 * If there is a problem with its frequency table, take it
+                 * offline and drop it.
+                 */
                 ret = cpufreq_table_validate_and_sort(policy);
                 if (ret)
-                        goto out_exit_policy;
+                        goto out_offline_policy;

                 /* related_cpus should at least include policy->cpus. */
                 cpumask_copy(policy->related_cpus, policy->cpus);

@@ -1515,6 +1520,10 @@ static int cpufreq_online(unsigned int cpu)
         up_write(&policy->rwsem);

+out_offline_policy:
+        if (cpufreq_driver->offline)
+                cpufreq_driver->offline(policy);
+
 out_exit_policy:
         if (cpufreq_driver->exit)
                 cpufreq_driver->exit(policy);

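The new out_offline_policy label slots into cpufreq_online()'s existing chain of cleanup labels. As a stand-alone illustration of that goto-unwind pattern, here is a hedged sketch; the step_*() and undo_*() helpers are hypothetical stand-ins for the driver callbacks, and which failure jumps to which label is invented for the example.

/* Hypothetical stand-ins for the driver callbacks used above. */
static int step_init(void)           { return 0; }
static int step_validate_table(void) { return 0; }
static int step_register_sysfs(void) { return 0; }
static void undo_offline(void)       { }
static void undo_exit(void)          { }

static int example_online(void)
{
        int ret;

        ret = step_init();              /* like cpufreq_driver->init()    */
        if (ret)
                return ret;             /* nothing to unwind yet          */

        ret = step_validate_table();
        if (ret)
                goto out_offline;       /* undo init: offline, then exit  */

        ret = step_register_sysfs();
        if (ret)
                goto out_exit;          /* hypothetical: exit only        */

        return 0;

out_offline:
        undo_offline();                 /* like cpufreq_driver->offline() */
        /* falls through into the older cleanup label */
out_exit:
        undo_exit();                    /* like cpufreq_driver->exit()    */
        return ret;
}

The point mirrored from the hunk above is that the new label falls through into the old one, so a failed frequency-table validation now gets the offline callback before the exit callback.
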
@@ -211,7 +211,7 @@ void cpufreq_stats_free_table(struct cpufreq_policy *policy)

 void cpufreq_stats_create_table(struct cpufreq_policy *policy)
 {
-        unsigned int i = 0, count = 0, ret = -ENOMEM;
+        unsigned int i = 0, count;
         struct cpufreq_stats *stats;
         unsigned int alloc_size;
         struct cpufreq_frequency_table *pos;

@@ -253,8 +253,7 @@ void cpufreq_stats_create_table(struct cpufreq_policy *policy)
         stats->last_index = freq_table_get_index(stats, policy->cur);
         policy->stats = stats;

-        ret = sysfs_create_group(&policy->kobj, &stats_attr_group);
-        if (!ret)
+        if (!sysfs_create_group(&policy->kobj, &stats_attr_group))
                 return;

         /* We failed, release resources */

...@@ -16,7 +16,6 @@ ...@@ -16,7 +16,6 @@
#include <linux/cpufreq.h> #include <linux/cpufreq.h>
#include <linux/module.h> #include <linux/module.h>
#include <linux/err.h> #include <linux/err.h>
#include <linux/sched.h> /* set_cpus_allowed() */
#include <linux/delay.h> #include <linux/delay.h>
#include <linux/platform_device.h> #include <linux/platform_device.h>
......
@@ -42,6 +42,7 @@ static unsigned int sc520_freq_get_cpu_frequency(unsigned int cpu)
         default:
                 pr_err("error: cpuctl register has unexpected value %02x\n",
                        clockspeed_reg);
+                fallthrough;
         case 0x01:
                 return 100000;
         case 0x02:

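The fallthrough; statement added above is the kernel's portable spelling of the compiler's fall-through annotation, which keeps -Wimplicit-fallthrough quiet when a case is meant to run into the next one. A user-space sketch of the same idea (the macro below mimics the kernel's definition on gcc/clang; outside the kernel, C23 [[fallthrough]] also works):

#include <stdio.h>

/* Roughly what the kernel's fallthrough macro expands to on gcc/clang. */
#if defined(__GNUC__) || defined(__clang__)
#define fallthrough __attribute__((__fallthrough__))
#else
#define fallthrough do {} while (0)
#endif

static int decode_speed_khz(unsigned char reg)
{
        switch (reg) {
        default:
                fprintf(stderr, "unexpected value %02x, assuming 100 MHz\n", reg);
                fallthrough;    /* deliberate: reuse the 0x01 branch */
        case 0x01:
                return 100000;
        case 0x02:
                return 133000;
        }
}

int main(void)
{
        /* Build with -Wimplicit-fallthrough: the annotation silences it. */
        printf("%d kHz\n", decode_speed_khz(0x7f));
        return 0;
}
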
...@@ -23,7 +23,6 @@ ...@@ -23,7 +23,6 @@
#include <linux/cpumask.h> #include <linux/cpumask.h>
#include <linux/cpu.h> #include <linux/cpu.h>
#include <linux/smp.h> #include <linux/smp.h>
#include <linux/sched.h> /* set_cpus_allowed() */
#include <linux/clk.h> #include <linux/clk.h>
#include <linux/percpu.h> #include <linux/percpu.h>
#include <linux/sh_clk.h> #include <linux/sh_clk.h>
......
@@ -1484,6 +1484,36 @@ static void __init sklh_idle_state_table_update(void)
         skl_cstates[6].flags |= CPUIDLE_FLAG_UNUSABLE;  /* C9-SKL */
 }

+/**
+ * skx_idle_state_table_update - Adjust the Sky Lake/Cascade Lake
+ * idle states table.
+ */
+static void __init skx_idle_state_table_update(void)
+{
+        unsigned long long msr;
+
+        rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr);
+
+        /*
+         * 000b: C0/C1 (no package C-state support)
+         * 001b: C2
+         * 010b: C6 (non-retention)
+         * 011b: C6 (retention)
+         * 111b: No Package C state limits.
+         */
+        if ((msr & 0x7) < 2) {
+                /*
+                 * Uses the CC6 + PC0 latency and 3 times of
+                 * latency for target_residency if the PC6
+                 * is disabled in BIOS. This is consistent
+                 * with how intel_idle driver uses _CST
+                 * to set the target_residency.
+                 */
+                skx_cstates[2].exit_latency = 92;
+                skx_cstates[2].target_residency = 276;
+        }
+}
+
 static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
 {
         unsigned int mwait_cstate = MWAIT_HINT2CSTATE(mwait_hint) + 1;

@@ -1515,6 +1545,9 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
         case INTEL_FAM6_SKYLAKE:
                 sklh_idle_state_table_update();
                 break;
+        case INTEL_FAM6_SKYLAKE_X:
+                skx_idle_state_table_update();
+                break;
         }

         for (cstate = 0; cstate < CPUIDLE_STATE_MAX; ++cstate) {

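The decoding done in skx_idle_state_table_update() can be cross-checked from user space. Below is a hedged sketch that reads MSR_PKG_CST_CONFIG_CONTROL (address 0xe2) through the msr driver and reports the package C-state limit field; it assumes root privileges, a loaded msr module, and that sampling CPU 0 is representative.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
        const off_t MSR_PKG_CST_CONFIG_CONTROL = 0xe2;
        uint64_t msr;
        int fd = open("/dev/cpu/0/msr", O_RDONLY);

        if (fd < 0) {
                perror("open /dev/cpu/0/msr (needs root and the msr module)");
                return 1;
        }
        if (pread(fd, &msr, sizeof(msr), MSR_PKG_CST_CONFIG_CONTROL) != sizeof(msr)) {
                perror("pread");
                close(fd);
                return 1;
        }
        close(fd);

        /* Limit values below 2 mean PC6 is not allowed, the case patched above. */
        printf("package C-state limit field: %llu (%s)\n",
               (unsigned long long)(msr & 0x7),
               (msr & 0x7) < 2 ? "PC6 disabled" : "PC6 possible");
        return 0;
}
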