Commit ab271bc9 authored by Rafael J. Wysocki

Merge branch 'intel_pstate'

* intel_pstate:
  cpufreq: intel_pstate: Shorten a couple of long names
  cpufreq: intel_pstate: Simplify intel_pstate_adjust_pstate()
  cpufreq: intel_pstate: Improve IO performance with per-core P-states
  cpufreq: intel_pstate: Drop INTEL_PSTATE_HWP_SAMPLING_INTERVAL
  cpufreq: intel_pstate: Drop ->update_util from pstate_funcs
  cpufreq: intel_pstate: Do not use PID-based P-state selection
parents 08a10002 57ccaf33
@@ -167,35 +167,17 @@ is set.
 ``powersave``
 .............

-Without HWP, this P-state selection algorithm generally depends on the
-processor model and/or the system profile setting in the ACPI tables and there
-are two variants of it.
-
-One of them is used with processors from the Atom line and (regardless of the
-processor model) on platforms with the system profile in the ACPI tables set to
-"mobile" (laptops mostly), "tablet", "appliance PC", "desktop", or
-"workstation". It is also used with processors supporting the HWP feature if
-that feature has not been enabled (that is, with the ``intel_pstate=no_hwp``
-argument in the kernel command line). It is similar to the algorithm
+Without HWP, this P-state selection algorithm is similar to the algorithm
 implemented by the generic ``schedutil`` scaling governor except that the
 utilization metric used by it is based on numbers coming from feedback
 registers of the CPU. It generally selects P-states proportional to the
-current CPU utilization, so it is referred to as the "proportional" algorithm.
+current CPU utilization.

-The second variant of the ``powersave`` P-state selection algorithm, used in all
-of the other cases (generally, on processors from the Core line, so it is
-referred to as the "Core" algorithm), is based on the values read from the APERF
-and MPERF feedback registers and the previously requested target P-state.
-It does not really take CPU utilization into account explicitly, but as a rule
-it causes the CPU P-state to ramp up very quickly in response to increased
-utilization which is generally desirable in server environments.
-
-Regardless of the variant, this algorithm is run by the driver's utilization
-update callback for the given CPU when it is invoked by the CPU scheduler, but
-not more often than every 10 ms (that can be tweaked via ``debugfs`` in `this
-particular case <Tuning Interface in debugfs_>`_). Like in the ``performance``
-case, the hardware configuration is not touched if the new P-state turns out to
-be the same as the current one.
+This algorithm is run by the driver's utilization update callback for the
+given CPU when it is invoked by the CPU scheduler, but not more often than
+every 10 ms. Like in the ``performance`` case, the hardware configuration
+is not touched if the new P-state turns out to be the same as the current
+one.

 This is the default P-state selection algorithm if the
 :c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
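The proportional selection kept by this hunk can be illustrated with a minimal C sketch. It assumes the utilization metric is the fraction of the last sample the CPU spent in C0, estimated as the MPERF delta divided by the TSC delta, and that the target scales the maximum P-state by that fraction with 25% headroom (the factor the generic `schedutil` governor applies; whether the driver uses the same constant is an assumption here). `struct cpu_pstate_sketch`, `sketch_target_pstate()` and `sketch_update_util()` are hypothetical names, and the "hardware write" is modeled as a plain assignment.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SAMPLE_INTERVAL_NS (10ULL * 1000 * 1000) /* at most every 10 ms */

/* Hypothetical per-CPU state; not the driver's real data structure. */
struct cpu_pstate_sketch {
	uint64_t prev_mperf;     /* MPERF value at the previous sample */
	uint64_t prev_tsc;       /* TSC value at the previous sample */
	uint64_t last_update_ns; /* time of the previous accepted sample */
	int cur_pstate;          /* last P-state "written" to the hardware */
	int min_pstate;
	int max_pstate;
};

/* Target proportional to the busy fraction d_mperf / d_tsc, with 25% headroom. */
static int sketch_target_pstate(const struct cpu_pstate_sketch *cpu,
				uint64_t d_mperf, uint64_t d_tsc)
{
	if (d_tsc == 0)
		return cpu->cur_pstate; /* no time elapsed: keep the current value */
	if (d_mperf > d_tsc)
		d_mperf = d_tsc;        /* busy fraction cannot exceed 100% */

	uint64_t scaled_max = (uint64_t)cpu->max_pstate + (uint64_t)cpu->max_pstate / 4;
	int target = (int)(scaled_max * d_mperf / d_tsc);

	if (target < cpu->min_pstate)
		target = cpu->min_pstate;
	if (target > cpu->max_pstate)
		target = cpu->max_pstate;
	return target;
}

/*
 * Scheduler-driven update callback (hypothetical). Returns true only when a
 * new P-state is written; the write itself is modeled as an assignment.
 */
static bool sketch_update_util(struct cpu_pstate_sketch *cpu, uint64_t now_ns,
			       uint64_t mperf, uint64_t tsc)
{
	if (now_ns - cpu->last_update_ns < SKETCH_SAMPLE_INTERVAL_NS)
		return false; /* rate limit: not more often than every 10 ms */

	uint64_t d_mperf = mperf - cpu->prev_mperf;
	uint64_t d_tsc = tsc - cpu->prev_tsc;

	cpu->prev_mperf = mperf;
	cpu->prev_tsc = tsc;
	cpu->last_update_ns = now_ns;

	int target = sketch_target_pstate(cpu, d_mperf, d_tsc);
	if (target == cpu->cur_pstate)
		return false; /* same P-state: leave the hardware alone */

	cpu->cur_pstate = target; /* stand-in for the MSR write */
	return true;
}
```

With `max_pstate = 32`, a 50% busy sample maps to P-state 20. A second call less than 10 ms later is ignored, and a later sample that maps to the same P-state returns `false` without a write, mirroring the two behaviors the new text describes.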
@@ -720,34 +702,7 @@ P-state is called, the ``ftrace`` filter can be set to to
 gnome-shell-3409 [001] ..s. 2537.650850: intel_pstate_set_pstate <-intel_pstate_timer_func
 <idle>-0 [000] ..s. 2537.654843: intel_pstate_set_pstate <-intel_pstate_timer_func

-Tuning Interface in ``debugfs``
--------------------------------
-
-The ``powersave`` algorithm provided by ``intel_pstate`` for `the Core line of
-processors in the active mode <powersave_>`_ is based on a `PID controller`_
-whose parameters were chosen to address a number of different use cases at the
-same time. However, it still is possible to fine-tune it to a specific workload
-and the ``debugfs`` interface under ``/sys/kernel/debug/pstate_snb/`` is
-provided for this purpose. [Note that the ``pstate_snb`` directory will be
-present only if the specific P-state selection algorithm matching the interface
-in it actually is in use.]
-
-The following files present in that directory can be used to modify the PID
-controller parameters at run time:
-
-| ``deadband``
-| ``d_gain_pct``
-| ``i_gain_pct``
-| ``p_gain_pct``
-| ``sample_rate_ms``
-| ``setpoint``
-
-Note, however, that achieving desirable results this way generally requires
-expert-level understanding of the power vs performance tradeoff, so extra care
-is recommended when attempting to do that.
-
 .. _LCEU2015: http://events.linuxfoundation.org/sites/events/files/slides/LinuxConEurope_2015.pdf
 .. _SDM: http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-system-programming-manual-325384.html
 .. _ACPI specification: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf
-.. _PID controller: https://en.wikipedia.org/wiki/PID_controller
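The section removed by this hunk exposed PID controller knobs (`deadband`, `d_gain_pct`, `i_gain_pct`, `p_gain_pct`, `sample_rate_ms`, `setpoint`) under `/sys/kernel/debug/pstate_snb/`. To make those knobs concrete, here is a minimal textbook discrete PID step in C that reuses the same parameter names. It is an illustration under stated assumptions, not the code this merge deleted. The removed driver code may have differed, for example in its arithmetic. The error sign, the early return inside the deadband, and the integer percent gains are choices made for this sketch. `struct pid_sketch`, `pid_sketch_step()` and `pid_sketch_next_pstate()` are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical mirror of the removed debugfs knobs (gains in percent). */
struct pid_sketch {
	int setpoint;   /* target busy percentage */
	int deadband;   /* errors with |error| <= deadband are ignored */
	int p_gain_pct; /* proportional gain, percent */
	int i_gain_pct; /* integral gain, percent */
	int d_gain_pct; /* derivative gain, percent */
	int integral;   /* running sum of errors */
	int last_err;   /* error seen at the previous step */
};

/*
 * One controller step, meant to run once per sample_rate_ms. Positive output
 * means "raise the P-state" (sign convention chosen for this sketch).
 */
static int pid_sketch_step(struct pid_sketch *pid, int busy_pct)
{
	int err = busy_pct - pid->setpoint;

	if (abs(err) <= pid->deadband)
		return 0; /* inside the deadband: no correction */

	pid->integral += err;
	int deriv = err - pid->last_err;
	pid->last_err = err;

	return (pid->p_gain_pct * err +
		pid->i_gain_pct * pid->integral +
		pid->d_gain_pct * deriv) / 100;
}

/* Apply the correction to the current P-state and clamp it to [min, max]. */
static int pid_sketch_next_pstate(struct pid_sketch *pid, int cur_pstate,
				  int busy_pct, int min_pstate, int max_pstate)
{
	int next = cur_pstate + pid_sketch_step(pid, busy_pct);

	if (next < min_pstate)
		return min_pstate;
	if (next > max_pstate)
		return max_pstate;
	return next;
}
```

With only `p_gain_pct` set, the correction is proportional to how far the busy percentage is from `setpoint`. A nonzero `deadband` suppresses corrections for small errors, and `sample_rate_ms` would set how often `pid_sketch_step()` is called.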