    Revert "cpuidle: Quickly notice prediction failure for repeat mode" · 14851912
    Rafael J. Wysocki authored
    Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for
    repeat mode), because it has been identified as the source of a
    significant performance regression in v3.8 and later as explained by
    Jeremy Eder:
    
      We believe we've identified a particular commit to the cpuidle code
      that seems to be impacting the performance of a variety of workloads.
      The simplest way to reproduce is using netperf TCP_RR test, so
      we're using that, on a pair of Sandy Bridge based servers.  We also
      have data from a large database setup where performance is also
      measurably/positively impacted, though that test data isn't easily
      share-able.
    
      Included below are test results from 3 test kernels:
    
      kernel       reverts
      -----------------------------------------------------------
      1) vanilla   upstream (no reverts)
    
      2) perfteam2 reverts e11538d1
    
      3) test      reverts 69a37bea
                           e11538d1
    
      In summary, netperf TCP_RR numbers improve by approximately 4%
      after reverting 69a37bea.  When 69a37bea is included, C0 residency
      never seems to get above 40%.  Taking that patch out gets C0 near
      100% quite often, and performance increases.
    
      The below data are histograms representing the %c0 residency @
      1-second sample rates (using turbostat), while under netperf test.
    
      - If you look at the first 4 histograms, you can see %c0 residency
        almost entirely in the 30,40% bin.
      - The last pair, which reverts 69a37bea, shows %c0 in the
        80,90,100% bins.
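
A minimal sketch of how histograms in this layout could be produced from per-second %c0 samples. The function names and the sample values are hypothetical; the report does not say which tool rendered the histograms, only that the samples came from turbostat. The sketch assumes ten fixed bins of width 10, with a 100% sample counted in the last bin.

```python
def bin_residency(samples, width=10.0, nbins=10):
    """Count %c0 samples per bin; values outside 0..100 are clamped to the edge bins."""
    counts = [0] * nbins
    for pct in samples:
        idx = int(pct // width)
        idx = max(0, min(idx, nbins - 1))  # 100.0 lands in the 90-100 bin
        counts[idx] += 1
    return counts


def format_histogram(counts, width=10.0):
    """Render one text line per bin: range, count, and one '*' per sample."""
    lines = []
    for i, n in enumerate(counts):
        lo, hi = i * width, (i + 1) * width
        lines.append(f"{lo:10.4f} - {hi:10.4f} [{n:6d}]: {'*' * n}".rstrip())
    return lines


if __name__ == "__main__":
    # Hypothetical samples, chosen only to exercise the binning.
    samples = [5.0] + [35.0] * 59 + [45.0]
    for line in format_histogram(bin_residency(samples)):
        print(line)
```

Feeding it one value per turbostat sampling interval gives one star per second spent in each residency range.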
    
      Below each kernel name are netperf TCP_RR trans/s numbers for the
      particular kernel that can be disclosed publicly, comparing the 3
      test kernels.  We ran a 4th test with the vanilla kernel where
      we've also set /dev/cpu_dma_latency=0 to show overall impact
      boosting single-threaded TCP_RR performance over 11% above
      baseline.
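
A minimal sketch of one way to hold a "/dev/cpu_dma_latency=0" request for the duration of a test run. The kernel's PM QoS interface takes a 32-bit latency value in microseconds written to the device, and the request stays in effect only while the file descriptor remains open. The function name and its `path` parameter are hypothetical, and the report does not say how the value was set in the original test.

```python
import os
import struct


def request_max_latency(usec=0, path="/dev/cpu_dma_latency"):
    """Write a 32-bit latency bound (microseconds) and return the open fd.

    The caller must keep the fd open for as long as the bound should
    apply; closing it drops the request. Usually requires root.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, struct.pack("=i", usec))  # native-endian signed 32-bit
    except OSError:
        os.close(fd)
        raise
    return fd
```

A benchmark wrapper would call `fd = request_max_latency(0)` before starting netperf and `os.close(fd)` once the run finishes.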
    
      3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):
      TCP_RR trans/s 54323.78
    
      -----------------------------------------------------------
      3.10-rc2 vanilla RX (no reverts)
      TCP_RR trans/s 48192.47
    
      Receiver %c0
          0.0000 -    10.0000 [     1]: *
         10.0000 -    20.0000 [     0]:
         20.0000 -    30.0000 [     0]:
         30.0000 -    40.0000 [    59]: ***********************************************************
         40.0000 -    50.0000 [     1]: *
         50.0000 -    60.0000 [     0]:
         60.0000 -    70.0000 [     0]:
         70.0000 -    80.0000 [     0]:
         80.0000 -    90.0000 [     0]:
         90.0000 -   100.0000 [     0]:
    
      Sender %c0
          0.0000 -    10.0000 [     1]: *
         10.0000 -    20.0000 [     0]:
         20.0000 -    30.0000 [     0]:
         30.0000 -    40.0000 [    11]: ***********
         40.0000 -    50.0000 [    49]: *************************************************
         50.0000 -    60.0000 [     0]:
         60.0000 -    70.0000 [     0]:
         70.0000 -    80.0000 [     0]:
         80.0000 -    90.0000 [     0]:
         90.0000 -   100.0000 [     0]:
    
      -----------------------------------------------------------
      3.10-rc2 perfteam2 RX (reverts commit e11538d1)
      TCP_RR trans/s 49698.69
    
      Receiver %c0
          0.0000 -    10.0000 [     1]: *
         10.0000 -    20.0000 [     1]: *
         20.0000 -    30.0000 [     0]:
         30.0000 -    40.0000 [    59]: ***********************************************************
         40.0000 -    50.0000 [     0]:
         50.0000 -    60.0000 [     0]:
         60.0000 -    70.0000 [     0]:
         70.0000 -    80.0000 [     0]:
         80.0000 -    90.0000 [     0]:
         90.0000 -   100.0000 [     0]:
    
      Sender %c0
          0.0000 -    10.0000 [     1]: *
         10.0000 -    20.0000 [     0]:
         20.0000 -    30.0000 [     0]:
         30.0000 -    40.0000 [     2]: **
         40.0000 -    50.0000 [    58]: **********************************************************
         50.0000 -    60.0000 [     0]:
         60.0000 -    70.0000 [     0]:
         70.0000 -    80.0000 [     0]:
         80.0000 -    90.0000 [     0]:
         90.0000 -   100.0000 [     0]:
    
      -----------------------------------------------------------
      3.10-rc2 test RX (reverts 69a37bea and e11538d1)
      TCP_RR trans/s 47766.95
    
      Receiver %c0
          0.0000 -    10.0000 [     1]: *
         10.0000 -    20.0000 [     1]: *
         20.0000 -    30.0000 [     0]:
         30.0000 -    40.0000 [    27]: ***************************
         40.0000 -    50.0000 [     2]: **
         50.0000 -    60.0000 [     0]:
         60.0000 -    70.0000 [     2]: **
         70.0000 -    80.0000 [     0]:
         80.0000 -    90.0000 [     0]:
         90.0000 -   100.0000 [    28]: ****************************
    
      Sender %c0
          0.0000 -    10.0000 [     1]: *
         10.0000 -    20.0000 [     0]:
         20.0000 -    30.0000 [     0]:
         30.0000 -    40.0000 [    11]: ***********
         40.0000 -    50.0000 [     0]:
         50.0000 -    60.0000 [     1]: *
         60.0000 -    70.0000 [     0]:
         70.0000 -    80.0000 [     3]: ***
         80.0000 -    90.0000 [     7]: *******
         90.0000 -   100.0000 [    38]: **************************************
    
      These results demonstrate gaining back the tendency of the CPU to
      stay in more responsive, performant C-states (and thus yield
      measurably better performance), by reverting commit 69a37bea.

    Requested-by: Jeremy Eder <jeder@redhat.com>
    Tested-by: Len Brown <len.brown@intel.com>
    Cc: 3.8+ <stable@vger.kernel.org>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>