    sched: Increase SCHED_LOAD_SCALE resolution · c8b28116
    Nikhil Rao authored
    Introduce SCHED_LOAD_RESOLUTION, which is added to
    SCHED_LOAD_SHIFT and increases the resolution of
    SCHED_LOAD_SCALE. This patch sets the value of
    SCHED_LOAD_RESOLUTION to 10, scaling up the weights of all
    sched entities by a factor of 1024. With this extra resolution,
    we can handle deeper cgroup hierarchies and the scheduler can do
    better shares distribution and load balancing on larger systems
    (especially for low-weight task groups).
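    
    Roughly, the layering looks like the sketch below. This is
    illustrative only, not the verbatim diff; the scale_load() /
    scale_load_down() helper names are assumptions:
    
        /* extra bits of resolution layered on top of the old scale */
        #define SCHED_LOAD_RESOLUTION  10
        #define SCHED_LOAD_SHIFT       (10 + SCHED_LOAD_RESOLUTION)
        #define SCHED_LOAD_SCALE       (1L << SCHED_LOAD_SHIFT)
    
        /* user-visible weights (e.g. prio_to_weight[]) are scaled up
         * when stored internally, and scaled back down whenever a
         * value in the old units is needed */
        #define scale_load(w)          ((w) << SCHED_LOAD_RESOLUTION)
        #define scale_load_down(w)     ((w) >> SCHED_LOAD_RESOLUTION)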
    
    This does not change the existing user interface; the scaled
    weights are only used internally. We do not modify the
    prio_to_weight values or their inverses, but use the original
    (unscaled) weights when calculating the inverse that is used to
    scale the execution time delta in calc_delta_mine(). This
    ensures we do not lose accuracy when accounting time to the
    sched entities. Thanks to Nikunj Dadhania for fixing a bug in
    c_d_m() that broke fairness.
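    
    As a rough userspace illustration of that last point (a
    simplified sketch, not the kernel code: calc_delta() stands in
    for calc_delta_mine(), and the constants/helpers are
    assumptions), note how the inverse is derived from the unscaled
    weight:
    
        #include <stdint.h>
        #include <stdio.h>
    
        #define SCHED_LOAD_RESOLUTION  10
        #define scale_load(w)          ((unsigned long)(w) << SCHED_LOAD_RESOLUTION)
        #define scale_load_down(w)     ((unsigned long)(w) >> SCHED_LOAD_RESOLUTION)
        #define WMULT_CONST            (~0U)
        #define WMULT_SHIFT            32
    
        /* delta_exec * weight / lw_weight, implemented as a multiply by a
         * precomputed inverse; the inverse comes from the *unscaled*
         * weight, so the extra resolution cancels out and the result
         * keeps its accuracy */
        static unsigned long calc_delta(unsigned long delta_exec,
                                        unsigned long weight,     /* scaled */
                                        unsigned long lw_weight)  /* scaled */
        {
                uint64_t tmp = (uint64_t)delta_exec * scale_load_down(weight);
                unsigned long w = scale_load_down(lw_weight);
                uint32_t inv = w ? WMULT_CONST / w : WMULT_CONST;
    
                return (unsigned long)((tmp * inv) >> WMULT_SHIFT);
        }
    
        int main(void)
        {
                /* nice-0 weight (1024) on a queue of total weight 3072:
                 * expect roughly 6000000 * 1024 / 3072 = 2000000 */
                printf("%lu\n", calc_delta(6000000, scale_load(1024),
                                           scale_load(3072)));
                return 0;
        }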
    
    Below is some analysis of the performance costs/improvements of
    this patch.
    
    1. Micro-arch performance costs:
    
    The experiment was to run Ingo's pipe_test_100k 200 times with
    the task pinned to one CPU, measuring instructions, cycles and
    stalled cycles for the runs. See:
    
       http://thread.gmane.org/gmane.linux.kernel/1129232/focus=1129389
    
    for more info.
    
    -tip (baseline):
    
     Performance counter stats for '/root/load-scale/pipe-test-100k' (200 runs):
    
           964,991,769 instructions             #    0.82  insns per cycle
                                                #    0.33  stalled cycles per insn
                                                #    ( +-  0.05% )
         1,171,186,635 cycles                   #    0.000 GHz                      ( +-  0.08% )
           306,373,664 stalled-cycles-backend   #   26.16% backend  cycles idle     ( +-  0.28% )
           314,933,621 stalled-cycles-frontend  #   26.89% frontend cycles idle     ( +-  0.34% )
    
            1.122405684  seconds time elapsed  ( +-  0.05% )
    
    -tip+patches:
    
     Performance counter stats for './load-scale/pipe-test-100k' (200 runs):
    
           963,624,821 instructions             #    0.82  insns per cycle
                                                #    0.33  stalled cycles per insn
                                                #    ( +-  0.04% )
         1,175,215,649 cycles                   #    0.000 GHz                      ( +-  0.08% )
           315,321,126 stalled-cycles-backend   #   26.83% backend  cycles idle     ( +-  0.28% )
           316,835,873 stalled-cycles-frontend  #   26.96% frontend cycles idle     ( +-  0.29% )
    
            1.122238659  seconds time elapsed  ( +-  0.06% )
    
    With this patch, instructions decrease by ~0.10% and cycles
    increase by 0.27%. This doesn't look statistically significant.
    The number of stalled cycles in the backend increased from
    26.16% to 26.83%. This can be attributed to the shifts we do in
    c_d_m() and other places. The fraction of stalled cycles in the
    frontend remains about the same, at 26.96% compared to 26.89% in -tip.
    
    2. Balancing low-weight task groups
    
    Test setup: run 50 tasks with random sleep/busy times (biased
    around 100ms) in a low-weight container (with cpu.shares = 2).
    Measure %idle as reported by mpstat over a 10s window.
    
    -tip (baseline):
    
    06:47:48 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle    intr/s
    06:47:49 PM  all   94.32    0.00    0.06    0.00    0.00    0.00    0.00    0.00    5.62  15888.00
    06:47:50 PM  all   94.57    0.00    0.62    0.00    0.00    0.00    0.00    0.00    4.81  16180.00
    06:47:51 PM  all   94.69    0.00    0.06    0.00    0.00    0.00    0.00    0.00    5.25  15966.00
    06:47:52 PM  all   95.81    0.00    0.00    0.00    0.00    0.00    0.00    0.00    4.19  16053.00
    06:47:53 PM  all   94.88    0.06    0.00    0.00    0.00    0.00    0.00    0.00    5.06  15984.00
    06:47:54 PM  all   93.31    0.00    0.00    0.00    0.00    0.00    0.00    0.00    6.69  15806.00
    06:47:55 PM  all   94.19    0.00    0.06    0.00    0.00    0.00    0.00    0.00    5.75  15896.00
    06:47:56 PM  all   92.87    0.00    0.00    0.00    0.00    0.00    0.00    0.00    7.13  15716.00
    06:47:57 PM  all   94.88    0.00    0.00    0.00    0.00    0.00    0.00    0.00    5.12  15982.00
    06:47:58 PM  all   95.44    0.00    0.00    0.00    0.00    0.00    0.00    0.00    4.56  16075.00
    Average:     all   94.49    0.01    0.08    0.00    0.00    0.00    0.00    0.00    5.42  15954.60
    
    -tip+patches:
    
    06:47:03 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle    intr/s
    06:47:04 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  16630.00
    06:47:05 PM  all   99.69    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.31  16580.20
    06:47:06 PM  all   99.69    0.00    0.06    0.00    0.00    0.00    0.00    0.00    0.25  16596.00
    06:47:07 PM  all   99.20    0.00    0.74    0.00    0.00    0.06    0.00    0.00    0.00  17838.61
    06:47:08 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  16540.00
    06:47:09 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  16575.00
    06:47:10 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  16614.00
    06:47:11 PM  all   99.94    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.06  16588.00
    06:47:12 PM  all   99.94    0.00    0.06    0.00    0.00    0.00    0.00    0.00    0.00  16593.00
    06:47:13 PM  all   99.94    0.00    0.06    0.00    0.00    0.00    0.00    0.00    0.00  16551.00
    Average:     all   99.84    0.00    0.09    0.00    0.00    0.01    0.00    0.00    0.06  16711.58
    
    We see an improvement in idle% on the system (drops from 5.42% on -tip to 0.06%
    with the patches).
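    
    For intuition on where the improvement comes from, here is a
    back-of-the-envelope illustration (assumed numbers, not measured
    data): with cpu.shares = 2 spread over 16 CPUs, the per-cpu
    share truncates to zero at the old resolution, leaving the
    balancer little to work with, while the weight scaled up by 1024
    still divides into a usable per-cpu value:
    
        #include <stdio.h>
    
        int main(void)
        {
                unsigned long shares = 2, ncpus = 16;
    
                /* old resolution: 2 / 16 truncates to 0 per cpu */
                printf("old: %lu per cpu\n", shares / ncpus);
                /* extra resolution: (2 << 10) / 16 = 128 per cpu */
                printf("new: %lu per cpu\n", (shares << 10) / ncpus);
                return 0;
        }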
    
    Signed-off-by: Nikhil Rao <ncrao@google.com>
    Acked-by: Peter Zijlstra <peterz@infradead.org>
    Cc: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
    Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
    Cc: Stephan Barwolf <stephan.baerwolf@tu-ilmenau.de>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Link: http://lkml.kernel.org/r/1305754668-18792-1-git-send-email-ncrao@google.com
    Signed-off-by: Ingo Molnar <mingo@elte.hu>