    cpuidle: teo: Introduce util-awareness · 9ce0f7c4
    Kajetan Puchalski authored
    Modern interactive systems, such as recent Android phones, tend to have
    power-efficient shallow idle states. Selecting a deeper idle state on
    such a device while a latency-sensitive workload is running can hurt
    performance because of the increased wakeup latency. Additionally, if
    the CPU wakes up from a deeper state before reaching that state's
    target residency, as is often the case, energy is wasted as well.
    
    At the moment, none of the available idle governors take any scheduling
    information into account. They also tend to overestimate the idle
    duration quite often, which causes them to select excessively deep idle
    states, leading to increased wakeup latency and lower performance with
    no power saving. With 'menu' during web browsing on Android, for
    instance, such 'too deep' wakeups account for over 24% of all wakeups.
    
    At the same time, on some platforms idle state 0 can be power efficient
    enough to be preferable to idle state 1. The power usage of the two
    states can be so close that a sufficient number of too deep state 1
    sleeps completely offsets the state 1 power saving, to the point where
    it would have been more power efficient to just use state 0 instead.
    This applies, of course, to systems where state 0 is not a polling
    state, such as Arm-based devices.
    
    Sleeps that used state 0 when they could have used state 1 ('too
    shallow') merely save less power than they otherwise could have. Too
    deep sleeps, on the other hand, harm performance and nullify the
    potential power saving from using state 1 in the first place. Taking
    this into account, on those kinds of platforms it is preferable for an
    idle governor to err towards too shallow sleeps rather than too deep
    ones.
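The 'too deep' and 'too shallow' categories used above can be made concrete with a small sketch. This is not code from the patch: `struct idle_state`, `enum sleep_verdict` and `classify_sleep()` are hypothetical names, and the rule for 'too shallow' (the sleep lasted at least as long as the target residency of the next deeper state) is an assumption drawn from the description above.

```c
#include <stdint.h>

/* Hypothetical, simplified idle-state descriptor: only the field needed here. */
struct idle_state {
    uint64_t target_residency_ns; /* shortest sleep for which the state pays off */
};

enum sleep_verdict { SLEEP_OK, SLEEP_TOO_DEEP, SLEEP_TOO_SHALLOW };

/*
 * Classify one completed sleep after the fact.
 * States are ordered shallowest first; assumes 0 <= chosen < nr_states.
 */
static enum sleep_verdict classify_sleep(const struct idle_state *states,
                                         int nr_states, int chosen,
                                         uint64_t slept_ns)
{
    /* Woke up before the chosen state's target residency: too deep. */
    if (slept_ns < states[chosen].target_residency_ns)
        return SLEEP_TOO_DEEP;

    /* Slept long enough for the next deeper state (assumed rule): too shallow. */
    if (chosen + 1 < nr_states &&
        slept_ns >= states[chosen + 1].target_residency_ns)
        return SLEEP_TOO_SHALLOW;

    return SLEEP_OK;
}
```

With two states whose target residencies are 0 and 1 ms, a 0.5 ms sleep in state 1 is classified as too deep, while a 2 ms sleep in state 0 is classified as too shallow.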
    
    This patch specifically tunes TEO to prefer shallower idle states in
    order to reduce wakeup latency and achieve better performance.
    
    To this end, before selecting the next idle state it uses the util_avg
    signal of a CPU's runqueue to determine to what extent the CPU is being
    utilized. This util value is then compared to a threshold defined as a
    fraction of the CPU's capacity (capacity >> 6, i.e. ~1.5%, in the
    current implementation). If the util is above the threshold, the
    index of the idle state selected by TEO metrics will be reduced by 1,
    thus selecting a shallower state. If the util is below the threshold,
    the governor defaults to the TEO metrics mechanism to try to select the
    deepest available idle state based on the closest timer event and its
    own correctness.
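The threshold check described above can be sketched in a few lines of standalone C. This is an illustration rather than the code in teo.c: the function names are hypothetical, the clamp that keeps the index from dropping below 0 is an assumption, and treating util equal to the threshold as "not utilized" is also an assumption, since the text only covers the above and below cases.

```c
#include <stdbool.h>

/* capacity >> 6 is 1/64 of the capacity, i.e. 1.5625%. */
#define UTIL_THRESHOLD_SHIFT 6

/* Hypothetical helper: threshold as a fraction of the CPU's capacity. */
static unsigned long util_threshold(unsigned long capacity)
{
    return capacity >> UTIL_THRESHOLD_SHIFT;
}

/* Hypothetical helper: true when util is strictly above the threshold. */
static bool cpu_is_utilized(unsigned long util, unsigned long capacity)
{
    return util > util_threshold(capacity);
}

/*
 * Given the index picked by the regular TEO metrics, return the index to
 * use: one state shallower when the CPU is utilized, unchanged otherwise.
 * Assumption: index 0 is the shallowest state and is never reduced further.
 */
static int util_aware_select(int teo_idx, unsigned long util,
                             unsigned long capacity)
{
    if (cpu_is_utilized(util, capacity) && teo_idx > 0)
        return teo_idx - 1;

    return teo_idx;
}
```

For a CPU with capacity 1024 the threshold comes out as 16, so a util of 17 turns a TEO choice of state 2 into state 1, while a util of 16 leaves it at state 2.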
    
    The main goal of this is to reduce latency and increase performance for
    some workloads. For some workloads it increases power usage
    (Geekbench 5), while for others it also decreases power usage compared
    to TEO (PCMark Web, Jankbench, Speedometer).

    It can provide drastically lower latency and better performance in
    certain types of latency-sensitive workloads.
    
    Example test results:
    
    1. GB5 (better score and latency, more power usage)
    
    | metric                                | menu           | teo               | teo-util-aware    |
    | ------------------------------------- | -------------- | ----------------- | ----------------- |
    | gmean score                           | 2826.5 (0.0%)  | 2764.8 (-2.18%)   | 2865 (1.36%)      |
    | gmean power usage [mW]                | 2551.4 (0.0%)  | 2606.8 (2.17%)    | 2722.3 (6.7%)     |
    | gmean too deep %                      | 14.99%         | 9.65%             | 4.02%             |
    | gmean too shallow %                   | 2.5%           | 5.96%             | 14.59%            |
    | gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) |
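The percentages in parentheses appear to be changes relative to the 'menu' column, which is shown as 0.0%. A minimal sketch of that arithmetic, assuming this reading is right (`relative_change_pct()` is a hypothetical helper, not part of the patch):

```c
/* Percentage change of value relative to baseline, e.g. teo vs. menu. */
static double relative_change_pct(double baseline, double value)
{
    return (value - baseline) / baseline * 100.0;
}
```

For the gmean score row, (2764.8 - 2826.5) / 2826.5 * 100 is about -2.18, which matches the figure reported for teo.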
    
    2. Jankbench (better score and latency, less power usage)
    
    | metric                                | menu           | teo               | teo-util-aware    |
    | ------------------------------------- | -------------- | ----------------- | ----------------- |
    | gmean frame duration                  | 13.9 (0.0%)    | 14.7 (6.0%)       | 12.6 (-9.0%)      |
    | gmean jank percentage                 | 1.5 (0.0%)     | 2.1 (36.99%)      | 1.3 (-17.37%)     |
    | gmean power usage [mW]                | 144.6 (0.0%)   | 136.9 (-5.27%)    | 121.3 (-16.08%)   |
    | gmean too deep %                      | 26.00%         | 11.00%            | 2.54%             |
    | gmean too shallow %                   | 4.74%          | 11.89%            | 21.93%            |
    | gmean wakeup latency (RenderThread)   | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%)  |
    | gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%)  | 87.65μs (-29.33%) |

    Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
    [ rjw: Comment edits and white space adjustments ]
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
teo.c 20.5 KB