    sched: Next buddy hint on sleep and preempt path · 2f36825b
    Venkatesh Pallipadi authored
    When a task in a taskgroup sleeps, pick_next_task starts all the way back at
    the root and picks the task/taskgroup with the min vruntime across all
    runnable tasks.
    
    But when there are many frequently sleeping tasks across different taskgroups,
    it makes better sense to stay with the same taskgroup for its slice period (or
    until all tasks in the taskgroup sleep) instead of switching across taskgroups
    on each sleep after a short runtime.
    
    This helps specifically where each taskgroup corresponds to a process with
    multiple threads. The change reduces the number of CR3 switches in this case.
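
    The idea can be illustrated with a small userspace model. This is only a
    sketch, not the kernel code: struct toy_group, struct toy_rq, toy_task_sleeps,
    toy_pick_next and the gran parameter are hypothetical names invented for
    illustration. It assumes the hint is honoured only while the hinted group's
    vruntime stays within gran of the minimum.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a schedulable taskgroup (hypothetical, not a kernel type). */
struct toy_group {
    long long vruntime;   /* virtual runtime consumed so far */
    int nr_runnable;      /* runnable tasks left in the group */
};

/* Toy runqueue: a flat array of groups plus a "next buddy" hint. */
struct toy_rq {
    struct toy_group *groups;
    size_t nr_groups;
    struct toy_group *next;   /* buddy hint, NULL when unset */
};

/* A task in group g goes to sleep: hint g if it still has runnable tasks. */
static void toy_task_sleeps(struct toy_rq *rq, struct toy_group *g)
{
    assert(g->nr_runnable > 0);
    g->nr_runnable--;
    if (g->nr_runnable > 0)
        rq->next = g;
    else if (rq->next == g)
        rq->next = NULL;
}

/*
 * Pick the runnable group with the smallest vruntime, unless the hinted
 * group is runnable and at most gran behind it (assumed fairness bound).
 */
static struct toy_group *toy_pick_next(struct toy_rq *rq, long long gran)
{
    struct toy_group *left = NULL;

    for (size_t i = 0; i < rq->nr_groups; i++) {
        struct toy_group *g = &rq->groups[i];

        if (g->nr_runnable > 0 && (!left || g->vruntime < left->vruntime))
            left = g;
    }

    if (left && rq->next && rq->next->nr_runnable > 0 &&
        rq->next->vruntime - left->vruntime <= gran) {
        left = rq->next;
        rq->next = NULL;   /* the hint is consumed once it is used */
    }
    return left;
}
```

    In this model, a group that is slightly behind the minimum keeps the CPU
    after one of its threads sleeps, while a group that has fallen far behind
    or has no runnable threads left does not.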
    
    Example:
    
    Two taskgroups with 2 threads each which are running for 2ms and
    sleeping for 1ms. Looking at sched:sched_switch shows:
    
    BEFORE: taskgroup_1 threads [5004, 5005], taskgroup_2 threads [5016, 5017]
          cpu-soaker-5004  [003]  3683.391089
          cpu-soaker-5016  [003]  3683.393106
          cpu-soaker-5005  [003]  3683.395119
          cpu-soaker-5017  [003]  3683.397130
          cpu-soaker-5004  [003]  3683.399143
          cpu-soaker-5016  [003]  3683.401155
          cpu-soaker-5005  [003]  3683.403168
          cpu-soaker-5017  [003]  3683.405170
    
    AFTER: taskgroup_1 threads [21890, 21891], taskgroup_2 threads [21934, 21935]
          cpu-soaker-21890 [003]   865.895494
          cpu-soaker-21935 [003]   865.897506
          cpu-soaker-21934 [003]   865.899520
          cpu-soaker-21935 [003]   865.901532
          cpu-soaker-21934 [003]   865.903543
          cpu-soaker-21935 [003]   865.905546
          cpu-soaker-21891 [003]   865.907548
          cpu-soaker-21890 [003]   865.909560
          cpu-soaker-21891 [003]   865.911571
          cpu-soaker-21890 [003]   865.913582
          cpu-soaker-21891 [003]   865.915594
          cpu-soaker-21934 [003]   865.917606
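
    To make the difference between the two traces concrete, the following sketch
    counts how often consecutive entries belong to different taskgroups. The PID
    sequences and group membership are copied from the traces above; the helper
    names taskgroup_of and count_group_switches are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* PIDs in the order they appear in the sched:sched_switch traces above. */
static const int before_trace[] = {
    5004, 5016, 5005, 5017, 5004, 5016, 5005, 5017
};
static const int after_trace[] = {
    21890, 21935, 21934, 21935, 21934, 21935,
    21891, 21890, 21891, 21890, 21891, 21934
};

/* Map a PID to its taskgroup (1 or 2), per the thread lists in the traces. */
static int taskgroup_of(int pid)
{
    switch (pid) {
    case 5004: case 5005: case 21890: case 21891:
        return 1;
    case 5016: case 5017: case 21934: case 21935:
        return 2;
    default:
        assert(0 && "unknown pid");
        return 0;
    }
}

/* Count consecutive trace entries whose tasks are in different taskgroups. */
static size_t count_group_switches(const int *pids, size_t n)
{
    size_t switches = 0;

    for (size_t i = 1; i < n; i++)
        if (taskgroup_of(pids[i]) != taskgroup_of(pids[i - 1]))
            switches++;
    return switches;
}
```

    Applied to the BEFORE trace this gives 7 cross-taskgroup switches out of 7
    transitions; applied to the AFTER trace it gives 3 out of 11. If each
    taskgroup is a separate process, each of those would be an address-space
    switch.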
    
    A similar problem exists when there are multiple taskgroups and, say, a task A
    preempts the currently running task B of taskgroup_1. On schedule, pick_next_task
    can pick an unrelated task in taskgroup_2. Here it would be better to give some
    preference to task B in pick_next_task.
    
    A simple (maybe extreme) benchmark I tried was tbench with 2 tbench
    client processes with 2 threads each running on a single CPU. Average throughput
    across five 50-second runs was:
    
     BEFORE: 105.84 MB/sec
     AFTER:  112.42 MB/sec
    
    Signed-off-by: Venkatesh Pallipadi <venki@google.com>
    Acked-by: Rik van Riel <riel@redhat.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Link: http://lkml.kernel.org/r/1302802253-25760-1-git-send-email-venki@google.com
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
sched_fair.c 110 KB