Commit bbfa306a authored by Naga Chumbalkar, committed by Jesse Barnes

PCI: Changing ASPM policy, via /sys, to POWERSAVE could cause NMIs

v3 -> v2: Modified the text that describes the problem
v2 -> v1: Returned -EPERM
v1      : http://marc.info/?l=linux-pci&m=130013194803727&w=2

For servers whose hardware cannot handle ASPM, the BIOS ought to set the
FADT bit described in Sec 5.2.9.3 (IA-PC Boot Arch. Flags), Table 5-11, of
the ACPI 4.0a Specification:
  PCIe ASPM Controls: If set, indicates to OSPM that it must not enable
  OSPM ASPM control on this platform.
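
As a rough illustration of the firmware check described above, the sketch below tests the PCIe ASPM Controls bit in the 16-bit IA-PC Boot Architecture Flags field. It assumes the bit sits at position 4 of that field (the position Linux names ACPI_FADT_NO_ASPM). The macro FADT_NO_ASPM and the helper fadt_forbids_aspm() are hypothetical names used only for this example and are not part of this patch.

```c
#include <stdint.h>

/* Assumed position of "PCIe ASPM Controls" in the IA-PC Boot Arch. Flags. */
#define FADT_NO_ASPM (1u << 4)

/*
 * Hypothetical helper: returns 1 when the firmware tells OSPM that it
 * must not enable ASPM control on this platform, 0 otherwise.
 */
static int fadt_forbids_aspm(uint16_t boot_arch_flags)
{
	return (boot_arch_flags & FADT_NO_ASPM) != 0;
}
```

A BIOS that leaves this bit clear, as on the affected servers, makes the helper return 0 even though the hardware cannot handle ASPM, which is why a second line of defence in the kernel is useful.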

However, there are shipping servers whose BIOS does not set this bit (an
example is the HP ProLiant DL385 G6; a maintenance BIOS will fix that).
For such servers, even if pcie_no_aspm() is called based on _OSC support
in the BIOS, it may be too late because the ASPM code may have already
allocated and filled its "link_list".

So if a user sets the ASPM "policy" to "powersave" via /sys then
pcie_aspm_set_policy() will run through the "link_list" and re-configure
ASPM policy on devices that advertise ASPM L0s/L1 capability:
# echo powersave > /sys/module/pcie_aspm/parameters/policy
# cat /sys/module/pcie_aspm/parameters/policy
default performance [powersave]

That can cause NMIs since the hardware doesn't play well with ASPM:
[ 1651.906015] NMI: PCI system error (SERR) for reason b1 on CPU 0.
[ 1651.906015] Dazed and confused, but trying to continue

Ideally, the BIOS should have set that FADT bit in the first place but we
could be more robust - especially given the fact that Windows doesn't
cause NMIs in the above scenario.

Add a sanity check so that a user cannot modify the ASPM policy when
aspm_disabled is set; such writes now fail with -EPERM.

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
parent 1a680b7c
@@ -770,6 +770,8 @@ static int pcie_aspm_set_policy(const char *val, struct kernel_param *kp)
 	int i;
 	struct pcie_link_state *link;
+	if (aspm_disabled)
+		return -EPERM;
 	for (i = 0; i < ARRAY_SIZE(policy_str); i++)
 		if (!strncmp(val, policy_str[i], strlen(policy_str[i])))
 			break;
@@ -824,6 +826,8 @@ static ssize_t link_state_store(struct device *dev,
 	struct pcie_link_state *link, *root = pdev->link_state->root;
 	u32 val = buf[0] - '0', state = 0;
+	if (aspm_disabled)
+		return -EPERM;
 	if (n < 1 || val > 3)
 		return -EINVAL;
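
Both hunks add the same early return. The following user-space sketch models the first one so the effect can be tried outside the kernel. It is a simplified stand-in, not the driver code: model_set_policy() is a hypothetical name, the link_list walk is omitted, and the -EINVAL branch for an unrecognised string is an assumption about code the diff does not show. The order of policy_str follows the sysfs output quoted in the commit message.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* User-space stand-ins for the driver's state; names mirror the patch. */
static const char *policy_str[] = { "default", "performance", "powersave" };
static int aspm_disabled;	/* nonzero once ASPM has been ruled out */
static int aspm_policy;		/* index into policy_str */

/* Simplified model of pcie_aspm_set_policy(); no devices are touched. */
static int model_set_policy(const char *val)
{
	size_t i;

	/* The check this patch adds: refuse any change once disabled. */
	if (aspm_disabled)
		return -EPERM;

	/* Prefix match, so a trailing newline from "echo" is tolerated. */
	for (i = 0; i < ARRAY_SIZE(policy_str); i++)
		if (!strncmp(val, policy_str[i], strlen(policy_str[i])))
			break;
	if (i >= ARRAY_SIZE(policy_str))
		return -EINVAL;	/* assumed handling of unknown input */

	aspm_policy = (int)i;
	return 0;
}
```

With aspm_disabled set, a call such as model_set_policy("powersave\n") returns -EPERM and leaves aspm_policy unchanged. On a patched kernel with ASPM disabled, the corresponding write to /sys is expected to fail with "Operation not permitted".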