    cpufreq: schedutil: use now as reference when aggregating shared policy requests · d86ab9cf
    Authored by Juri Lelli
    Currently, sugov_next_freq_shared() uses last_freq_update_time as a
    reference to decide when to start considering CPU contributions as
    stale.
    
    However, last_freq_update_time is set by whichever CPU last issued a
    frequency transition, so it is not a reliable staleness reference: the
    detection of stale utilization values fails whenever the CPU holding
    such values was the last one to update the policy. Note that the
    SCHED_CPUFREQ_RT flag itself is not the problem here; the problem is
    only deciding after how much time that flag has to be considered
    stale. For example, suppose a policy with 2 CPUs:
    
                   CPU0                |               CPU1
                                       |
                                       |     RT task scheduled
                                       |     SCHED_CPUFREQ_RT is set
                                       |     CPU1->last_update = now
                                       |     freq transition to max
                                       |     last_freq_update_time = now
                                       |
    
                            more than TICK_NSEC nsecs
    
                                       |
         a small CFS task wakes up     |
         CPU0->last_update = now1      |
         delta_ns(CPU0) < TICK_NSEC*   |
         CPU0's util is considered     |
         delta_ns(CPU1) =              |
          last_freq_update_time -      |
          CPU1->last_update = 0        |
          < TICK_NSEC                  |
         CPU1 is still considered      |
         CPU1->SCHED_CPUFREQ_RT is set |
         we stay at max (until CPU1    |
         exits from idle)              |
    
    * delta_ns(CPU0) is actually negative, since now1 > last_freq_update_time
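
    The timeline above can be replayed with plain integers. This is a
    hypothetical user-space model, not the kernel code: struct cpu_state,
    old_is_stale() and old_goes_to_max() are invented for illustration,
    TICK_NSEC is assumed to be 4 ms (HZ=250), and the pre-fix check is
    assumed to have the form "stale if last_freq_update_time -
    last_update > TICK_NSEC" with a signed 64-bit delta.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: HZ=250, so one tick lasts 4 ms. */
#define TICK_NSEC 4000000LL

/* Hypothetical per-CPU state; not the kernel's struct sugov_cpu. */
struct cpu_state {
	int64_t last_update; /* ns timestamp of this CPU's last util update */
	bool rt_flag;        /* models SCHED_CPUFREQ_RT being set */
};

/* Pre-fix rule: a CPU's age is measured against the policy-wide
 * last_freq_update_time, using a signed 64-bit difference. */
static bool old_is_stale(int64_t last_freq_update_time,
			 const struct cpu_state *cpu)
{
	int64_t delta_ns = last_freq_update_time - cpu->last_update;

	return delta_ns > TICK_NSEC;
}

/* True if some CPU that the old rule still considers fresh has the RT
 * flag set, i.e. the shared policy would be driven to max frequency. */
static bool old_goes_to_max(int64_t last_freq_update_time,
			    const struct cpu_state *cpus, int nr_cpus)
{
	assert(nr_cpus > 0);
	for (int i = 0; i < nr_cpus; i++) {
		if (old_is_stale(last_freq_update_time, &cpus[i]))
			continue;
		if (cpus[i].rt_flag)
			return true;
	}
	return false;
}
```

    With t0 = 1 ms used as both CPU1->last_update and
    last_freq_update_time, and CPU0 updating at now1 = t0 + 10 ms, CPU0's
    delta is -10 ms and CPU1's is 0. Neither exceeds TICK_NSEC, so the old
    rule keeps CPU1's stale RT request and the policy stays at max.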
    
    While last_freq_update_time is a sensible reference for rate limiting,
    it does not seem to be a useful reference for detecting stale CPU states.
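
    For contrast, rate limiting asks "how long ago did the policy last
    change frequency?", which is exactly what last_freq_update_time
    records. The sketch below is a hypothetical model of such a check:
    struct policy_state, should_update_freq() and the freq_update_delay_ns
    field are illustrative names, and the governor's actual rate-limit
    logic may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-policy rate-limiter state. */
struct policy_state {
	int64_t last_freq_update_time; /* when the policy last changed freq (ns) */
	int64_t freq_update_delay_ns;  /* minimum spacing between changes (ns) */
};

/* Rate limiting is a policy-wide question, so the policy-wide timestamp
 * is the right reference: allow a new frequency change only once at
 * least freq_update_delay_ns has passed since the previous one. */
static bool should_update_freq(const struct policy_state *p, int64_t now)
{
	assert(p->freq_update_delay_ns >= 0);
	return now - p->last_freq_update_time >= p->freq_update_delay_ns;
}
```

    Staleness is a per-CPU question instead (how long ago did this CPU
    last report?), which is why the policy-wide timestamp is the wrong
    yardstick for it.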
    
    Fix the problem by always considering now (time) as the reference for
    deciding when CPUs have stale contributions.
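
    A minimal sketch of shared-policy aggregation with now as the
    reference, in a hypothetical user-space model (all names invented;
    TICK_NSEC assumed to be 4 ms). It assumes the loop skips CPUs whose
    last update is more than a tick older than now, returns the maximum
    frequency if a fresh CPU has the RT flag, and otherwise keeps the CPU
    with the highest util/max ratio. The final scaling is a plain
    max_freq * util / max; the real governor's headroom and
    frequency-table handling are deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TICK_NSEC 4000000LL /* assumption: HZ=250 */

/* Hypothetical per-CPU state; not the kernel's struct sugov_cpu. */
struct cpu_state {
	int64_t last_update; /* ns timestamp of this CPU's last util update */
	bool rt_flag;        /* models SCHED_CPUFREQ_RT */
	uint64_t util;       /* current utilization */
	uint64_t max;        /* capacity the utilization is measured against */
};

/* Post-fix rule: a CPU is stale if its last update is more than one tick
 * older than now, no matter which CPU last changed the policy frequency.
 * Returns 0 if every CPU is stale (this model has no frequency floor). */
static uint64_t next_freq_shared(const struct cpu_state *cpus, int nr_cpus,
				 int64_t now, uint64_t max_freq)
{
	uint64_t util = 0, max = 1;

	assert(nr_cpus > 0);
	for (int i = 0; i < nr_cpus; i++) {
		int64_t delta_ns = now - cpus[i].last_update;

		if (delta_ns > TICK_NSEC)
			continue;        /* stale: probably idle, skip it */
		if (cpus[i].rt_flag)
			return max_freq; /* fresh RT request wins */
		/* keep the CPU with the highest util/max ratio */
		if (cpus[i].util * max > cpus[i].max * util) {
			util = cpus[i].util;
			max = cpus[i].max;
		}
	}
	return max_freq * util / max;
}
```

    Replaying the example at now1 = 11 ms, CPU1's RT contribution is
    10 ms old and is dropped, so with CPU0 at util 100 out of 1024 and
    max_freq = 2048000 the result is 200000 rather than max_freq. If CPU1
    had updated within the last tick, its RT flag would still select
    max_freq.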

    Signed-off-by: Juri Lelli <juri.lelli@arm.com>
    Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
    Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>