1. 18 May, 2016 1 commit
    • cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter() · e7387da5
      Daniel Lezcano authored
      Commit 0b89e9aa (cpuidle: delay enabling interrupts until all
      coupled CPUs leave idle) rightfully fixed a regression by letting
      the coupled idle state framework handle local interrupt enabling
      when the CPU is exiting an idle state.
      
      The current code checks whether the idle state is coupled and, if
      so, lets the coupled code enable interrupts, so the ready-count
      can be decremented before any interrupt is handled. This mechanism
      prevents the other CPUs from waiting on a CPU that is still
      handling interrupts.
      
      But the check is done against the state index returned by the
      back-end driver's ->enter callback, which can differ from the
      initial index passed as parameter to the cpuidle_enter_state()
      function:
      
       entered_state = target_state->enter(dev, drv, index);
      
       [ ... ]
      
       if (!cpuidle_state_is_coupled(drv, entered_state))
      	local_irq_enable();
      
       [ ... ]
      
      If 'index' refers to a coupled idle state but 'entered_state' is
      *not* coupled, then the interrupts are enabled again too early.
      All CPUs blocked on the sync barrier may busy-loop longer while
      this CPU handles its pending interrupts before decrementing the
      ready-count. That consumes more energy than the idle state saves.
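      
      The fix is to check the initial 'index', which is known before the
      back-end driver runs. A sketch of the corrected exit path in
      cpuidle_enter_state():
      
       entered_state = target_state->enter(dev, drv, index);
      
       [ ... ]
      
       /*
        * Check the state the governor asked for, not the one reported
        * back by the driver, so a coupled entry keeps interrupts
        * disabled until the ready-count has been decremented.
        */
       if (!cpuidle_state_is_coupled(drv, index))
      	local_irq_enable();
      
       [ ... ]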
      
      Fixes: 0b89e9aa (cpuidle: delay enabling interrupts until all coupled CPUs leave idle)
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
      [ rjw: Subject & changelog ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      e7387da5
  2. 06 May, 2016 1 commit
  3. 28 Apr, 2016 1 commit
  4. 26 Apr, 2016 3 commits
    • cpuidle: Replace ktime_get() with local_clock() · e93e59ce
      Daniel Lezcano authored
      ktime_get() can have a non-negligible overhead; use local_clock()
      instead.
      
      In order to measure the difference between ktime_get() and
      local_clock(), a quick hack has been added to trigger, via
      debugfs, 10000 calls to ktime_get() and local_clock() and measure
      the elapsed time. Then the average, the minimum and the maximum
      are computed for each call.
      
      From userspace, the test above was run 100 times every 2 seconds,
      so ktime_get() and local_clock() have each been called 1000000
      times in total.
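      
      For illustration, a minimal sketch of such a measurement loop; the
      helper name and the pr_info() reporting are assumptions standing
      in for the actual debugfs hack:
      
       #include <linux/kernel.h>
       #include <linux/ktime.h>
       #include <linux/sched.h>	/* local_clock() */
      
       static void bench_ktime_get(void)
       {
       	u64 t0, t1, d, sum = 0, lo = U64_MAX, hi = 0;
       	int i;
      
       	for (i = 0; i < 10000; i++) {
       		t0 = local_clock();	/* ns timestamp */
       		ktime_get();		/* call under test */
       		t1 = local_clock();
      
       		d = t1 - t0;		/* elapsed nanoseconds */
       		sum += d;
       		if (d < lo)
       			lo = d;
       		if (d > hi)
       			hi = d;
       	}
      
       	pr_info("ktime_get(): avg %llu ns, min %llu ns, max %llu ns\n",
       		sum / 10000, lo, hi);
       }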
      
      The results are:
      
      ktime_get():
      ============
       * average: 101 ns (stddev: 27.4)
       * maximum: 38313 ns
       * minimum: 65 ns
      
      local_clock():
      ==============
       * average: 60 ns (stddev: 9.8)
       * maximum: 13487 ns
       * minimum: 46 ns
      
      local_clock() is faster and more stable.
      
      Even if it is a drop in the ocean, replacing ktime_get() with
      local_clock() saves about 80ns per idle cycle (entry + exit). And
      in some circumstances, especially when several CPUs are racing for
      the clock access, we save tens of microseconds.
      
      The idle duration resulting from the timestamp diff is converted
      from nanoseconds to microseconds. This could be done with an
      integer division by 1000, which is an expensive operation, or with
      a 10-bit shift (a division by 1024), which is fast but imprecise
      (see the sketch after this discussion).
      
      The following table gives some results at the limits.
      
       ------------------------------------------
      |   nsec   |   div(1000)   |   div(1024)   |
       ------------------------------------------
      |   1e3    |        1 usec |      976 nsec |
       ------------------------------------------
      |   1e6    |     1000 usec |      976 usec |
       ------------------------------------------
      |   1e9    |  1000000 usec |   976562 usec |
       ------------------------------------------
      
      Since 1000/1024 = 0.9766, there is a constant linear deviation of
      2.34%. This loss of precision is acceptable in the context of the
      resulting diff, which is only used for statistics. Those
      statistics are processed to estimate the duration of the next
      idle period, which in turn drives the idle state selection. The
      selection criterion compares that estimate against coarse
      intervals, namely the idle states' target residencies.
      
      The 2^10 division is therefore good enough: the extra error
      relative to a division by 1000 is lost in the much larger
      approximations made when computing the next idle duration.
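      
      As a sketch, the measurement path with the shift-based conversion,
      using the variable names of cpuidle_enter_state():
      
       time_start = local_clock();
      
       entered_state = target_state->enter(dev, drv, index);
      
       time_end = local_clock();
      
       /*
        * local_clock() returns time in nanoseconds; shift by 10
        * (divide by 1024) as a cheap approximation of a division
        * by 1000, accepting the 2.34% deviation discussed above.
        */
       diff = (time_end - time_start) >> 10;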
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      [ rjw: Subject ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      e93e59ce
    • Rafael J. Wysocki · f0fb0dd0
    • Merge branch 'cpuidle/4.7' of http://git.linaro.org/people/daniel.lezcano/linux into tmp · b5ebbcdb
      Rafael J. Wysocki authored
      Pull ARM cpuidle changes for v4.7 from Daniel Lezcano.
      
      * 'cpuidle/4.7' of http://git.linaro.org/people/daniel.lezcano/linux:
        drivers: firmware: psci: use const and __initconst for psci_cpuidle_ops
        soc: qcom: spm: Use const and __initconst for qcom_cpuidle_ops
        ARM: cpuidle: constify return value of arm_cpuidle_get_ops()
        ARM: cpuidle: add const qualifier to cpuidle_ops member in structures
      b5ebbcdb
  5. 24 Apr, 2016 2 commits
  6. 23 Apr, 2016 10 commits
  7. 22 Apr, 2016 22 commits