    cpusets, hotplug, scheduler: fix scheduler domain breakage · 3e84050c
    Dmitry Adamushko authored
    Commit f18f982a ("sched: CPU hotplug events must not destroy scheduler
    domains created by the cpusets") introduced a hotplug-related problem as
    described below:
    
    Upon CPU_DOWN_PREPARE,
    
      update_sched_domains() -> detach_destroy_domains(&cpu_online_map)
    
    does the following:
    
    /*
     * Force a reinitialization of the sched domains hierarchy. The domains
     * and groups cannot be updated in place without racing with the balancing
     * code, so we temporarily attach all running cpus to the NULL domain
     * which will prevent rebalancing while the sched domains are recalculated.
     */
    
    The sched-domains should be rebuilt once the CPU_DOWN op. has
    completed, i.e. either upon CPU_DEAD{_FROZEN} (on success) or upon
    CPU_DOWN_FAILED{_FROZEN} (on failure -- restoring things to their
    initial state). update_sched_domains() does exactly that, but only
    for the !CPUSETS case.
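
    The intended event handling described above can be sketched as a small
    user-space model. This is an illustration only, not the kernel code:
    the enum names mirror the notifier actions named in this message, but
    their values, the 'domain_action' type and domain_action_for() are
    assumptions made up for the sketch.

```c
/*
 * Illustrative stand-ins for the hotplug notifier actions named in the
 * commit message. The numeric values are arbitrary, not the kernel's.
 */
enum hp_event {
    EV_CPU_DOWN_PREPARE,
    EV_CPU_DEAD,
    EV_CPU_DEAD_FROZEN,
    EV_CPU_DOWN_FAILED,
    EV_CPU_DOWN_FAILED_FROZEN,
    EV_OTHER
};

/* Hypothetical summary of what should happen to the sched-domains. */
enum domain_action {
    DOMAINS_KEEP,     /* leave the current hierarchy alone */
    DOMAINS_DETACH,   /* attach all cpus to the NULL domain */
    DOMAINS_REBUILD   /* recalculate the hierarchy */
};

static enum domain_action domain_action_for(enum hp_event ev)
{
    switch (ev) {
    case EV_CPU_DOWN_PREPARE:
        /* The down op. is starting: stop rebalancing. */
        return DOMAINS_DETACH;
    case EV_CPU_DEAD:
    case EV_CPU_DEAD_FROZEN:
    case EV_CPU_DOWN_FAILED:
    case EV_CPU_DOWN_FAILED_FROZEN:
        /* The down op. has finished, successfully or not: rebuild. */
        return DOMAINS_REBUILD;
    default:
        return DOMAINS_KEEP;
    }
}
```

    The point of the model is that a rebuild is tied to the events that end
    the CPU_DOWN op., never to the one that starts it.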
    
    With f18f982a, sched-domains' reinitialization is delegated to
    CPUSETS code:
    
      cpuset_handle_cpuhp() -> common_cpu_mem_hotplug_unplug() ->
      rebuild_sched_domains()
    
    When this callback is invoked for CPU_DOWN_PREPARE, and it runs after
    update_sched_domains(), it undoes all the work done by
    update_sched_domains() -- i.e. a soon-to-be-offline cpu is included in
    the sched-domains again, which makes it visible to the load-balancer
    while the CPU_DOWN op. is in progress.
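
    The ordering problem can be reproduced in a tiny user-space model. It
    is a sketch under stated assumptions, not kernel code: 'struct model',
    the bitmask representation and all function names below are
    hypothetical, and the 'react_to_down_prepare' switch only contrasts a
    callback that rebuilds on the preparatory event with one that ignores
    it.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4u

/* Hypothetical model state: one bit per cpu. */
struct model {
    unsigned online;   /* bit n set: cpu n is still online */
    unsigned domains;  /* bit n set: cpu n is visible to the balancer */
};

/* Models update_sched_domains() at down-prepare time: detach everything. */
static void sched_cb_down_prepare(struct model *m)
{
    m->domains = 0;
}

/*
 * Models the cpuset callback. When it reacts to the preparatory event it
 * rebuilds from the online mask, which still contains the outgoing cpu.
 */
static void cpuset_cb_down_prepare(struct model *m, bool react_to_down_prepare)
{
    if (react_to_down_prepare)
        m->domains = m->online;
}

/* Runs both callbacks in the problematic order and reports visibility. */
static bool cpu_visible_after_down_prepare(unsigned cpu, bool cpuset_reacts)
{
    struct model m = { .online = 0xFu, .domains = 0xFu };

    assert(cpu < NR_CPUS_MODEL);
    sched_cb_down_prepare(&m);
    cpuset_cb_down_prepare(&m, cpuset_reacts);
    return ((m.domains >> cpu) & 1u) != 0;
}
```

    In this model cpu_visible_after_down_prepare(3, true) is true: the
    outgoing cpu is back in the domains although the scheduler callback had
    just detached it. With the second argument false it stays detached.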
    
    __migrate_live_tasks() moves the tasks off a 'dead' cpu (it's already
    "offline" when this function is called).
    
    If try_to_wake_up() is then called for one of these tasks from another
    CPU, the load-balancer (wake_idle()) can pick the "dead" CPU and place
    the task on it. A check such as BUG_ON(rq->nr_running) detects this a
    bit later -> oops.
    
    Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
    Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
    Cc: Paul Menage <menage@google.com>
    Cc: Max Krasnyansky <maxk@qualcomm.com>
    Cc: Paul Jackson <pj@sgi.com>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: miaox@cn.fujitsu.com
    Cc: rostedt@goodmis.org
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
cpuset.c 69.1 KB