    cpufreq: ACPI: Extend frequency tables to cover boost frequencies · 3c55e94c
    Rafael J. Wysocki authored
    A severe performance regression on AMD EPYC processors when using
    the schedutil scaling governor was discovered by Phoronix.com and
    attributed to the following commits:
    
      41ea6672 ("x86, sched: Calculate frequency invariance for AMD
      systems")
    
      976df7e5 ("x86, sched: Use midpoint of max_boost and max_P for
      frequency invariance on AMD EPYC")
    
    The source of the problem is that the maximum performance level taken
    for computing the arch_max_freq_ratio value used in the x86 scale-
    invariance code is higher than the one corresponding to the
    cpuinfo.max_freq value coming from the acpi_cpufreq driver.
    
    This effectively causes the scale-invariant utilization to fall below
    100% even if the CPU runs at cpuinfo.max_freq or slightly faster, so
    the schedutil governor then selects a frequency below
    cpuinfo.max_freq.  That frequency corresponds to a frequency table
    entry below the maximum performance level necessary to get to the
    "boost" range of CPU frequencies.
    
    However, if the cpuinfo.max_freq value coming from acpi_cpufreq were
    higher, the schedutil governor would select higher frequencies which
    in turn would allow acpi_cpufreq to set more adequate performance
    levels and to get to the "boost" range of CPU frequencies more often.
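
    The arithmetic behind the two paragraphs above can be sketched as
    follows.  This is an illustration only, not kernel code: the helper
    names are hypothetical, the CPU is assumed to be fully busy, and the
    governor is assumed to use the usual 1.25 * max_freq * util / max
    mapping from utilization to frequency.

```c
#include <stdint.h>

#define CAPACITY_SCALE 1024ULL	/* fixed-point value standing for 100% */

/*
 * Utilization of a fully busy CPU running at cur_khz, scaled against
 * the maximum frequency assumed by the scale-invariance code
 * (hypothetical helper, capped at 100%).
 */
static uint64_t invariant_util(uint64_t cur_khz, uint64_t inv_max_khz)
{
	uint64_t util = CAPACITY_SCALE * cur_khz / inv_max_khz;

	return util > CAPACITY_SCALE ? CAPACITY_SCALE : util;
}

/*
 * Frequency requested for that utilization, assuming the
 * 1.25 * cpuinfo.max_freq * util / 100% mapping (hypothetical helper,
 * no clamping to policy limits).
 */
static uint64_t requested_khz(uint64_t cpuinfo_max_khz, uint64_t util)
{
	return (cpuinfo_max_khz + (cpuinfo_max_khz >> 2)) * util / CAPACITY_SCALE;
}
```

    With made-up numbers, a nominal frequency of 2000000 kHz and a
    scale-invariance maximum of 3000000 kHz, a busy CPU running at the
    nominal frequency gets a utilization of 682 out of 1024.  If
    cpuinfo.max_freq is 2000000 kHz, the request is roughly 1665 MHz,
    below cpuinfo.max_freq.  If cpuinfo.max_freq is 3000000 kHz, the
    request is roughly 2498 MHz, above the nominal frequency.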
    
    This issue affects any systems where acpi_cpufreq is used and the
    "boost" (or "turbo") frequencies are enabled, not just AMD EPYC.
    Moreover, commit db865272 ("cpufreq: Avoid configuring old
    governors as default with intel_pstate") from the 5.10 development
    cycle made it extremely easy to default to schedutil even if the
    preferred driver is acpi_cpufreq as long as intel_pstate is built
    too, because the mere presence of the latter effectively removes the
    ondemand governor from the defaults.  Distro kernels are likely to
    include both intel_pstate and acpi_cpufreq on x86, so their users
    who cannot use intel_pstate or choose to use acpi_cpufreq may
    easily be affected by this issue.
    
    To address this issue, extend the frequency table constructed by
    acpi_cpufreq for each CPU to cover the entire range of available
    frequencies (including the "boost" ones) if CPPC is available and
    indicates that "boost" (or "turbo") frequencies are enabled.  That
    causes cpuinfo.max_freq to become the maximum "boost" frequency of
    the given CPU (instead of the maximum frequency returned by the ACPI
    _PSS object that corresponds to the "nominal" performance level).
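
    A minimal user-space sketch of the idea, under stated assumptions:
    the function names, the fixed-point shift and the table layout below
    are hypothetical and do not reproduce the acpi_cpufreq code.  It
    assumes that the "highest" and "nominal" performance levels reported
    by CPPC are available as plain integers and that the _PSS
    frequencies are sorted from highest to lowest.

```c
#include <stddef.h>
#include <stdint.h>

#define RATIO_SHIFT 10	/* fixed-point precision of the boost ratio */

/*
 * Ratio of the highest to the nominal performance level in fixed
 * point, or 0 if no "boost" range is indicated.
 */
static uint64_t boost_ratio(uint64_t highest_perf, uint64_t nominal_perf)
{
	if (!highest_perf || !nominal_perf || highest_perf <= nominal_perf)
		return 0;

	return (highest_perf << RATIO_SHIFT) / nominal_perf;
}

/*
 * Copy the n _PSS frequencies into out[], preceded by one extra entry
 * for the maximum "boost" frequency when ratio is nonzero.  out[] must
 * have room for n + 1 entries.  Returns the number of entries written.
 */
static size_t build_table(const unsigned int *pss_khz, size_t n,
			  uint64_t ratio, unsigned int *out)
{
	size_t i, k = 0;

	if (n && ratio)
		out[k++] = (unsigned int)(((uint64_t)pss_khz[0] * ratio) >> RATIO_SHIFT);

	for (i = 0; i < n; i++)
		out[k++] = pss_khz[i];

	return k;
}
```

    For example, with a highest performance level of 150, a nominal one
    of 100 and a top _PSS frequency of 2000000 kHz, the first table
    entry becomes 3000000 kHz, which is what cpuinfo.max_freq would then
    reflect in this sketch.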
    
    Fixes: 41ea6672 ("x86, sched: Calculate frequency invariance for AMD systems")
    Fixes: 976df7e5 ("x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC")
    Fixes: db865272 ("cpufreq: Avoid configuring old governors as default with intel_pstate")
    Link: https://www.phoronix.com/scan.php?page=article&item=linux511-amd-schedutil&num=1
    Link: https://lore.kernel.org/linux-pm/20210203135321.12253-2-ggherdovich@suse.cz/
    Reported-by: Michael Larabel <Michael@phoronix.com>
    Diagnosed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
    Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
    Tested-by: Michael Larabel <Michael@phoronix.com>
acpi-cpufreq.c 27 KB