Commit a9a939cb authored by Rafael J. Wysocki

Merge branches 'powercap' and 'pm-misc'

* powercap:
  powercap: intel_rapl: Use topology interface in rapl_init_domains()
  powercap: intel_rapl: Use topology interface in rapl_add_package()
  powercap/intel_rapl: add support for AlderLake Mobile
  powercap/drivers/dtpm: Fix size of object being allocated
  powercap/drivers/dtpm: Fix an IS_ERR() vs NULL check
  powercap/drivers/dtpm: Fix some missing unlock bugs
  powercap/drivers/dtpm: Fix a double shift bug
  powercap/drivers/dtpm: Fix __udivdi3 and __aeabi_uldivmod unresolved symbols
  powercap/drivers/dtpm: Add CPU energy model based support
  powercap/drivers/dtpm: Add API for dynamic thermal power management
  Documentation/powercap/dtpm: Add documentation for dtpm
  units: Add Watt units

* pm-misc:
  PM: Kconfig: remove unneeded "default n" options
  PM: EM: update Kconfig description and drop "default n" option
......@@ -30,6 +30,7 @@ Power Management
userland-swsusp
powercap/powercap
+powercap/dtpm
regulator/consumer
regulator/design
......
.. SPDX-License-Identifier: GPL-2.0
==========================================
Dynamic Thermal Power Management framework
==========================================
In the embedded world, the complexity of the SoC leads to an
increasing number of hotspots which need to be monitored and mitigated
as a whole in order to prevent the temperature from going above the
normative and legally stated 'skin temperature'.
Another aspect is sustaining the performance for a given power budget,
for example in virtual reality, where the user can feel dizzy if the
performance is capped while a big CPU is processing something else. Yet
another is reducing the battery charging when the dissipated power is
too high compared with the power consumed by other devices.
User space is the most suitable place to dynamically act on the
different devices by limiting their power given an application
profile: it has the knowledge of the platform.
Dynamic Thermal Power Management (DTPM) is a technique acting on
the device power by limiting and/or balancing a power budget among
different devices.
The DTPM framework provides a unified interface to act on the
device power.
Overview
========
The DTPM framework relies on the powercap framework to create the
powercap entries in the sysfs directory and implement the backend
driver to do the connection with the power manageable device.
The DTPM is a tree representation describing the power constraints
shared between devices, not their physical positions.
The nodes of the tree are a virtual description aggregating the power
characteristics of the children nodes and their power limitations.
The leaves of the tree are the real power manageable devices.
For instance::
SoC
|
`-- pkg
|
|-- pd0 (cpu0-3)
|
`-- pd1 (cpu4-5)
The pkg power will be the sum of pd0 and pd1 power numbers::
SoC (400mW - 3100mW)
|
`-- pkg (400mW - 3100mW)
|
|-- pd0 (100mW - 700mW)
|
`-- pd1 (300mW - 2400mW)
When the nodes are inserted in the tree, their power characteristics are propagated to the parents::
SoC (600mW - 5900mW)
|
|-- pkg (400mW - 3100mW)
| |
| |-- pd0 (100mW - 700mW)
| |
| `-- pd1 (300mW - 2400mW)
|
`-- pd2 (200mW - 2800mW)
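The propagation can be pictured as a sum over the children. The following is a minimal user space sketch, not the framework's code: ``struct node`` and ``aggregate()`` are hypothetical names, and the tree is recomputed from the root rather than updated on insertion.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified node: only the fields needed for aggregation. */
struct node {
	const char *name;
	uint64_t power_min_mw;		/* minimum power, in mW */
	uint64_t power_max_mw;		/* maximum power, in mW */
	struct node *children[4];
	size_t nr_children;
};

/*
 * Recompute the power range of an intermediate node as the sum of the
 * ranges of its children. Leaves (no children) keep the range they
 * were given.
 */
static void aggregate(struct node *n)
{
	uint64_t min = 0, max = 0;
	size_t i;

	if (n->nr_children == 0)
		return;

	for (i = 0; i < n->nr_children; i++) {
		aggregate(n->children[i]);
		min += n->children[i]->power_min_mw;
		max += n->children[i]->power_max_mw;
	}

	n->power_min_mw = min;
	n->power_max_mw = max;
}
```

Fed with the three leaves of the tree above, the sketch gives 400mW - 3100mW for pkg and 600mW - 5900mW for SoC.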
Each node has a weight on a 2^10 basis reflecting its share of the power consumption among its siblings::
SoC (w=1024)
|
|-- pkg (w=538)
| |
| |-- pd0 (w=231)
| |
| `-- pd1 (w=794)
|
`-- pd2 (w=486)
Note that the sum of the weights at the same level is equal to 1024.
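How the weights are derived is not spelled out here. One rule that reproduces the pkg, pd0 and pd2 figures above is to scale the maximum power of each child against the maximum power of its parent on the 2^10 basis and round to the nearest integer. The sketch below assumes that rule; ``weight_of()`` and ``WEIGHT_BASE`` are illustrative names, not part of the DTPM API.

```c
#include <stdint.h>

#define WEIGHT_BASE 1024ULL	/* the 2^10 basis */

/*
 * Assumed rule: weight = round(child_max * 1024 / parent_max).
 * Returns 0 when the parent has no power range yet.
 */
static unsigned int weight_of(uint64_t child_max, uint64_t parent_max)
{
	if (parent_max == 0)
		return 0;

	return (unsigned int)((child_max * WEIGHT_BASE + parent_max / 2) /
			      parent_max);
}
```

With the maximum powers used earlier (3100mW and 2800mW under a 5900mW root, 700mW and 2400mW under the 3100mW pkg), this gives 538, 486 and 231. For pd1 it gives 793 rather than 794, which is the value that makes pd0 and pd1 sum to exactly 1024.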
When a power limitation is applied to a node, it is distributed among the children according to their weights. For example, if we set a power limitation of 3200mW at the 'SoC' root node, the resulting tree will be::
SoC (w=1024) <--- power_limit = 3200mW
|
|-- pkg (w=538) --> power_limit = 1681mW
| |
| |-- pd0 (w=231) --> power_limit = 378mW
| |
| `-- pd1 (w=794) --> power_limit = 1303mW
|
`-- pd2 (w=486) --> power_limit = 1519mW
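The distribution step amounts to a proportional split. A minimal sketch, assuming each child receives ``parent_limit * weight / 1024`` rounded to the nearest integer (``child_limit()`` is a hypothetical helper, and the rounding mode is an assumption):

```c
#include <stdint.h>

/*
 * Share of the parent's power limit given to a child, for a weight
 * expressed on a 2^10 basis. Rounds to the nearest integer.
 */
static uint64_t child_limit(uint64_t parent_limit, unsigned int weight)
{
	return (parent_limit * weight + 512) / 1024;
}
```

With the weights from the tree, this yields 1681mW for pkg, 1519mW for pd2 and 1303mW for pd1. For pd0 it gives 379mW, one more than the figure shown in the tree, so the exact rounding used there may differ.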
Flat description
----------------
A root node is created and it is the parent of all the nodes. This
description is the simplest one and is meant to give user space a
flat representation of all the devices supporting the power
limitation, without any power limitation distribution.
Hierarchical description
------------------------
The different devices supporting the power limitation are represented
hierarchically. There is one root node; all intermediate nodes group
child nodes, which can themselves be intermediate nodes or real
devices.
The intermediate nodes aggregate the power information and allow
setting the power limit given the weight of the nodes.
User space API
==============
As stated in the overview, the DTPM framework is built on top of the
powercap framework. Thus the sysfs interface is the same; please refer
to the powercap documentation for further details.
* power_uw: Instantaneous power consumption. If the node is an
intermediate node, then the power consumption will be the sum of all
its children's power consumption.
* max_power_range_uw: The power range resulting from the maximum power
minus the minimum power.
* name: The name of the node. This is implementation dependent. Even
if it is not recommended for the user space, several nodes can have
the same name.
* constraint_X_name: The name of the constraint.
* constraint_X_max_power_uw: The maximum power limit to be applicable
to the node.
* constraint_X_power_limit_uw: The power limit to be applied to the
node. If it is set to the value contained in
constraint_X_max_power_uw, the constraint will be removed.
* constraint_X_time_window_us: The meaning of this file will depend
on the constraint number.
Constraints
-----------
* Constraint 0: The power limitation is immediately applied, without
limitation in time.
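Because these attributes are plain text files holding one decimal value, user space can handle them with ordinary file I/O. A minimal sketch, assuming the caller already knows the path of the attribute inside the powercap zone directory (the helpers ``read_attr_u64()`` and ``write_attr_u64()`` are illustrative, not an existing library):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read one unsigned decimal value from an attribute file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_attr_u64(const char *path, uint64_t *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;

	ok = fscanf(f, "%" SCNu64, value) == 1;
	fclose(f);

	return ok ? 0 : -1;
}

/*
 * Write one unsigned decimal value to an attribute file.
 * Returns 0 on success, -1 on any I/O error.
 */
static int write_attr_u64(const char *path, uint64_t value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%" PRIu64 "\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Reading ``power_uw`` or writing a new value to ``constraint_0_power_limit_uw`` then reduces to one call with the corresponding path. Writing usually requires sufficient privileges.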
Kernel API
==========
Overview
--------
The DTPM framework has no power limiting backend support. It is
generic and provides a set of APIs that lets the different drivers
implement the backend part for the power limitation and create the
power constraints tree.
It is up to the platform to provide the initialization function to
allocate and link the different nodes of the tree.
A special macro has the role of declaring a node and the corresponding
initialization function via a description structure. This structure
contains an optional parent field that allows hooking different devices
to an already existing tree at boot time.
For instance::
struct dtpm_descr my_descr = {
.name = "my_name",
.init = my_init_func,
};
DTPM_DECLARE(my_descr);
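The declaration macro stores a pointer to the description in a dedicated section, so that the table can be walked at initialization time. The user space sketch below illustrates the same idea with a different mechanism: instead of a linker script, it relies on the ``__start_<section>`` and ``__stop_<section>`` symbols that GNU ld and LLD generate for sections whose name is a valid C identifier. All names (``struct descr``, ``DESCR_DECLARE``, ``descr_table``, ``run_all_inits()``) are hypothetical, and the attributes are GCC/Clang extensions.

```c
#include <stddef.h>

/* Hypothetical descriptor, reduced to the name and init fields. */
struct descr {
	const char *name;
	int (*init)(struct descr *);
};

/* Place a pointer to the descriptor in a dedicated section. */
#define DESCR_DECLARE(d)					\
	static struct descr *descr_entry_##d			\
	__attribute__((used, section("descr_table"))) = &d

/* Section bounds, provided by the linker (assumption: GNU ld or LLD). */
extern struct descr *__start_descr_table[];
extern struct descr *__stop_descr_table[];

static int init_calls;

static int my_init_func(struct descr *d)
{
	(void)d;
	init_calls++;
	return 0;
}

static struct descr my_descr = {
	.name = "my_name",
	.init = my_init_func,
};
DESCR_DECLARE(my_descr);

static struct descr other_descr = {
	.name = "other_name",
	.init = my_init_func,
};
DESCR_DECLARE(other_descr);

/* Walk the table, call every init function, return the entry count. */
static size_t run_all_inits(void)
{
	struct descr **p;
	size_t n = 0;

	for (p = __start_descr_table; p < __stop_descr_table; p++, n++)
		(*p)->init(*p);

	return n;
}
```

The benefit of this layout is that a new description only needs its own declaration; no central list has to be edited to make it visible to the code that walks the table.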
The nodes of the DTPM tree are described with the dtpm structure.
Adding a new power limitable device is done in three steps:
* Allocate the dtpm node
* Set the power number of the dtpm node
* Register the dtpm node
The registration of the dtpm node is done with the powercap
ops. Basically, it must implement the callbacks to get and set the
power and the limit.
Alternatively, if the node to be inserted is an intermediate one, then
a simple function to insert it as a future parent is available.
If a device has its power characteristics changing, then the tree must
be updated with the new power numbers and weights.
Nomenclature
------------
* dtpm_alloc() : Allocate and initialize a dtpm structure
* dtpm_register() : Add the dtpm node to the tree
* dtpm_unregister() : Remove the dtpm node from the tree
* dtpm_update_power() : Update the power characteristics of the dtpm node
......@@ -43,4 +43,17 @@ config IDLE_INJECT
CPUs for power capping. Idle period can be injected
synchronously on a set of specified CPUs or alternatively
on a per CPU basis.
+config DTPM
+bool "Power capping for Dynamic Thermal Power Management"
+help
+This enables support for the power capping for the dynamic
+thermal power management userspace engine.
+config DTPM_CPU
+bool "Add CPU power capping based on the energy model"
+depends on DTPM && ENERGY_MODEL
+help
+This enables support for CPU power limitation based on
+energy model.
endif
# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_DTPM) += dtpm.o
+obj-$(CONFIG_DTPM_CPU) += dtpm_cpu.o
obj-$(CONFIG_POWERCAP) += powercap_sys.o
obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o
obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o
......
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright 2020 Linaro Limited
*
* Author: Daniel Lezcano <daniel.lezcano@linaro.org>
*
* The DTPM CPU is based on the energy model. It hooks the CPU in the
* DTPM tree which in turn updates the power numbers by propagating the
* power numbers from the CPU energy model information to the parents.
*
* The association between the power and the performance state allows
* setting the power of the CPU at the OPP granularity.
*
* The CPU hotplug is supported and the power numbers will be updated
* if a CPU is hot plugged / unplugged.
*/
#include <linux/cpumask.h>
#include <linux/cpufreq.h>
#include <linux/cpuhotplug.h>
#include <linux/dtpm.h>
#include <linux/energy_model.h>
#include <linux/pm_qos.h>
#include <linux/slab.h>
#include <linux/units.h>
static struct dtpm *__parent;
static DEFINE_PER_CPU(struct dtpm *, dtpm_per_cpu);
struct dtpm_cpu {
struct freq_qos_request qos_req;
int cpu;
};
/*
* When a new CPU is inserted at hotplug or boot time, add the power
* contribution and update the dtpm tree.
*/
static int power_add(struct dtpm *dtpm, struct em_perf_domain *em)
{
u64 power_min, power_max;
power_min = em->table[0].power;
power_min *= MICROWATT_PER_MILLIWATT;
power_min += dtpm->power_min;
power_max = em->table[em->nr_perf_states - 1].power;
power_max *= MICROWATT_PER_MILLIWATT;
power_max += dtpm->power_max;
return dtpm_update_power(dtpm, power_min, power_max);
}
/*
* When a CPU is unplugged, remove its power contribution from the
* dtpm tree.
*/
static int power_sub(struct dtpm *dtpm, struct em_perf_domain *em)
{
u64 power_min, power_max;
power_min = em->table[0].power;
power_min *= MICROWATT_PER_MILLIWATT;
power_min = dtpm->power_min - power_min;
power_max = em->table[em->nr_perf_states - 1].power;
power_max *= MICROWATT_PER_MILLIWATT;
power_max = dtpm->power_max - power_max;
return dtpm_update_power(dtpm, power_min, power_max);
}
static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
{
struct dtpm_cpu *dtpm_cpu = dtpm->private;
struct em_perf_domain *pd;
struct cpumask cpus;
unsigned long freq;
u64 power;
int i, nr_cpus;
pd = em_cpu_get(dtpm_cpu->cpu);
cpumask_and(&cpus, cpu_online_mask, to_cpumask(pd->cpus));
nr_cpus = cpumask_weight(&cpus);
for (i = 0; i < pd->nr_perf_states; i++) {
power = pd->table[i].power * MICROWATT_PER_MILLIWATT * nr_cpus;
if (power > power_limit)
break;
}
freq = pd->table[i - 1].frequency;
freq_qos_update_request(&dtpm_cpu->qos_req, freq);
power_limit = pd->table[i - 1].power *
MICROWATT_PER_MILLIWATT * nr_cpus;
return power_limit;
}
static u64 get_pd_power_uw(struct dtpm *dtpm)
{
struct dtpm_cpu *dtpm_cpu = dtpm->private;
struct em_perf_domain *pd;
struct cpumask cpus;
unsigned long freq;
int i, nr_cpus;
pd = em_cpu_get(dtpm_cpu->cpu);
freq = cpufreq_quick_get(dtpm_cpu->cpu);
cpumask_and(&cpus, cpu_online_mask, to_cpumask(pd->cpus));
nr_cpus = cpumask_weight(&cpus);
for (i = 0; i < pd->nr_perf_states; i++) {
if (pd->table[i].frequency < freq)
continue;
return pd->table[i].power *
MICROWATT_PER_MILLIWATT * nr_cpus;
}
return 0;
}
static void pd_release(struct dtpm *dtpm)
{
struct dtpm_cpu *dtpm_cpu = dtpm->private;
if (freq_qos_request_active(&dtpm_cpu->qos_req))
freq_qos_remove_request(&dtpm_cpu->qos_req);
kfree(dtpm_cpu);
}
static struct dtpm_ops dtpm_ops = {
.set_power_uw = set_pd_power_limit,
.get_power_uw = get_pd_power_uw,
.release = pd_release,
};
static int cpuhp_dtpm_cpu_offline(unsigned int cpu)
{
struct cpufreq_policy *policy;
struct em_perf_domain *pd;
struct dtpm *dtpm;
policy = cpufreq_cpu_get(cpu);
if (!policy)
return 0;
pd = em_cpu_get(cpu);
if (!pd)
return -EINVAL;
dtpm = per_cpu(dtpm_per_cpu, cpu);
power_sub(dtpm, pd);
if (cpumask_weight(policy->cpus) != 1)
return 0;
for_each_cpu(cpu, policy->related_cpus)
per_cpu(dtpm_per_cpu, cpu) = NULL;
dtpm_unregister(dtpm);
return 0;
}
static int cpuhp_dtpm_cpu_online(unsigned int cpu)
{
struct dtpm *dtpm;
struct dtpm_cpu *dtpm_cpu;
struct cpufreq_policy *policy;
struct em_perf_domain *pd;
char name[CPUFREQ_NAME_LEN];
int ret = -ENOMEM;
policy = cpufreq_cpu_get(cpu);
if (!policy)
return 0;
pd = em_cpu_get(cpu);
if (!pd)
return -EINVAL;
dtpm = per_cpu(dtpm_per_cpu, cpu);
if (dtpm)
return power_add(dtpm, pd);
dtpm = dtpm_alloc(&dtpm_ops);
if (!dtpm)
return -EINVAL;
dtpm_cpu = kzalloc(sizeof(*dtpm_cpu), GFP_KERNEL);
if (!dtpm_cpu)
goto out_kfree_dtpm;
dtpm->private = dtpm_cpu;
dtpm_cpu->cpu = cpu;
for_each_cpu(cpu, policy->related_cpus)
per_cpu(dtpm_per_cpu, cpu) = dtpm;
sprintf(name, "cpu%d", dtpm_cpu->cpu);
ret = dtpm_register(name, dtpm, __parent);
if (ret)
goto out_kfree_dtpm_cpu;
ret = power_add(dtpm, pd);
if (ret)
goto out_dtpm_unregister;
ret = freq_qos_add_request(&policy->constraints,
&dtpm_cpu->qos_req, FREQ_QOS_MAX,
pd->table[pd->nr_perf_states - 1].frequency);
if (ret)
goto out_power_sub;
return 0;
out_power_sub:
power_sub(dtpm, pd);
out_dtpm_unregister:
dtpm_unregister(dtpm);
dtpm_cpu = NULL;
dtpm = NULL;
out_kfree_dtpm_cpu:
for_each_cpu(cpu, policy->related_cpus)
per_cpu(dtpm_per_cpu, cpu) = NULL;
kfree(dtpm_cpu);
out_kfree_dtpm:
kfree(dtpm);
return ret;
}
int dtpm_register_cpu(struct dtpm *parent)
{
__parent = parent;
return cpuhp_setup_state(CPUHP_AP_DTPM_CPU_ONLINE,
"dtpm_cpu:online",
cpuhp_dtpm_cpu_online,
cpuhp_dtpm_cpu_offline);
}
......@@ -547,7 +547,7 @@ static void rapl_init_domains(struct rapl_package *rp)
if (i == RAPL_DOMAIN_PLATFORM && rp->id > 0) {
snprintf(rd->name, RAPL_DOMAIN_NAME_LENGTH, "psys-%d",
-cpu_data(rp->lead_cpu).phys_proc_id);
+topology_physical_package_id(rp->lead_cpu));
} else
snprintf(rd->name, RAPL_DOMAIN_NAME_LENGTH, "%s",
rapl_domain_names[i]);
......@@ -1049,6 +1049,7 @@ static const struct x86_cpu_id rapl_ids[] __initconst = {
X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE, &rapl_defaults_core),
X86_MATCH_INTEL_FAM6_MODEL(ROCKETLAKE, &rapl_defaults_core),
X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, &rapl_defaults_core),
+X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, &rapl_defaults_core),
X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &rapl_defaults_spr_server),
X86_MATCH_INTEL_FAM6_MODEL(LAKEFIELD, &rapl_defaults_core),
......@@ -1309,7 +1310,6 @@ struct rapl_package *rapl_add_package(int cpu, struct rapl_if_priv *priv)
{
int id = topology_logical_die_id(cpu);
struct rapl_package *rp;
-struct cpuinfo_x86 *c = &cpu_data(cpu);
int ret;
if (!rapl_defaults)
......@@ -1326,10 +1326,11 @@ struct rapl_package *rapl_add_package(int cpu, struct rapl_if_priv *priv)
if (topology_max_die_per_package() > 1)
snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH,
-"package-%d-die-%d", c->phys_proc_id, c->cpu_die_id);
+"package-%d-die-%d",
+topology_physical_package_id(cpu), topology_die_id(cpu));
else
snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d",
-c->phys_proc_id);
+topology_physical_package_id(cpu));
/* check if the package contains valid domains */
if (rapl_detect_domains(rp, cpu) || rapl_defaults->check_unit(rp, cpu)) {
......
......@@ -316,6 +316,16 @@
#define THERMAL_TABLE(name)
#endif
+#ifdef CONFIG_DTPM
+#define DTPM_TABLE() \
+. = ALIGN(8); \
+__dtpm_table = .; \
+KEEP(*(__dtpm_table)) \
+__dtpm_table_end = .;
+#else
+#define DTPM_TABLE()
+#endif
#define KERNEL_DTB() \
STRUCT_ALIGN(); \
__dtb_start = .; \
......@@ -733,6 +743,7 @@
ACPI_PROBE_TABLE(irqchip) \
ACPI_PROBE_TABLE(timer) \
THERMAL_TABLE(governor) \
+DTPM_TABLE() \
EARLYCON_TABLE() \
LSM_TABLE() \
EARLY_LSM_TABLE() \
......
......@@ -193,6 +193,7 @@ enum cpuhp_state {
CPUHP_AP_ONLINE_DYN_END = CPUHP_AP_ONLINE_DYN + 30,
CPUHP_AP_X86_HPET_ONLINE,
CPUHP_AP_X86_KVM_CLK_ONLINE,
+CPUHP_AP_DTPM_CPU_ONLINE,
CPUHP_AP_ACTIVE,
CPUHP_ONLINE,
};
......
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright (C) 2020 Linaro Ltd
*
* Author: Daniel Lezcano <daniel.lezcano@linaro.org>
*/
#ifndef ___DTPM_H__
#define ___DTPM_H__
#include <linux/powercap.h>
#define MAX_DTPM_DESCR 8
#define MAX_DTPM_CONSTRAINTS 1
struct dtpm {
struct powercap_zone zone;
struct dtpm *parent;
struct list_head sibling;
struct list_head children;
struct dtpm_ops *ops;
unsigned long flags;
u64 power_limit;
u64 power_max;
u64 power_min;
int weight;
void *private;
};
struct dtpm_ops {
u64 (*set_power_uw)(struct dtpm *, u64);
u64 (*get_power_uw)(struct dtpm *);
void (*release)(struct dtpm *);
};
struct dtpm_descr;
typedef int (*dtpm_init_t)(struct dtpm_descr *);
struct dtpm_descr {
struct dtpm *parent;
const char *name;
dtpm_init_t init;
};
/* Init section dtpm table */
extern struct dtpm_descr *__dtpm_table[];
extern struct dtpm_descr *__dtpm_table_end[];
#define DTPM_TABLE_ENTRY(name) \
static typeof(name) *__dtpm_table_entry_##name \
__used __section("__dtpm_table") = &name
#define DTPM_DECLARE(name) DTPM_TABLE_ENTRY(name)
#define for_each_dtpm_table(__dtpm) \
for (__dtpm = __dtpm_table; \
__dtpm < __dtpm_table_end; \
__dtpm++)
static inline struct dtpm *to_dtpm(struct powercap_zone *zone)
{
return container_of(zone, struct dtpm, zone);
}
int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max);
int dtpm_release_zone(struct powercap_zone *pcz);
struct dtpm *dtpm_alloc(struct dtpm_ops *ops);
void dtpm_unregister(struct dtpm *dtpm);
int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent);
int dtpm_register_cpu(struct dtpm *parent);
#endif
......@@ -4,6 +4,10 @@
#include <linux/math.h>
+#define MILLIWATT_PER_WATT 1000L
+#define MICROWATT_PER_MILLIWATT 1000L
+#define MICROWATT_PER_WATT 1000000L
#define ABSOLUTE_ZERO_MILLICELSIUS -273150
static inline long milli_kelvin_to_millicelsius(long t)
......
......@@ -139,7 +139,6 @@ config PM_SLEEP_SMP_NONZERO_CPU
config PM_AUTOSLEEP
bool "Opportunistic sleep"
depends on PM_SLEEP
-default n
help
Allow the kernel to trigger a system transition into a global sleep
state automatically whenever there are no active wakeup sources.
......@@ -147,7 +146,6 @@ config PM_AUTOSLEEP
config PM_WAKELOCKS
bool "User space wakeup sources interface"
depends on PM_SLEEP
-default n
help
Allow user space to create, activate and deactivate wakeup source
objects with the help of a sysfs-based interface.
......@@ -293,7 +291,6 @@ config PM_GENERIC_DOMAINS
config WQ_POWER_EFFICIENT_DEFAULT
bool "Enable workqueue power-efficient mode by default"
depends on PM
-default n
help
Per-cpu workqueues are generally preferred because they show
better performance thanks to cache locality; unfortunately,
......@@ -322,15 +319,14 @@ config CPU_PM
bool
config ENERGY_MODEL
-bool "Energy Model for CPUs"
+bool "Energy Model for devices with DVFS (CPUs, GPUs, etc)"
depends on SMP
depends on CPU_FREQ
-default n
help
Several subsystems (thermal and/or the task scheduler for example)
-can leverage information about the energy consumed by CPUs to make
-smarter decisions. This config option enables the framework from
-which subsystems can access the energy models.
+can leverage information about the energy consumed by devices to
+make smarter decisions. This config option enables the framework
+from which subsystems can access the energy models.
The exact usage of the energy model is subsystem-dependent.
......