• Prasad Sodagudi's avatar
    stop_machine: Atomically queue and wake stopper threads · cfd35514
    Prasad Sodagudi authored
    When cpu_stop_queue_work() releases the lock for the stopper
    thread that was queued into its wake queue, preemption is
    enabled, which leads to the following deadlock:
    
    CPU0                              CPU1
    sched_setaffinity(0, ...)
    __set_cpus_allowed_ptr()
    stop_one_cpu(0, ...)              stop_two_cpus(0, 1, ...)
    cpu_stop_queue_work(0, ...)       cpu_stop_queue_two_works(0, ..., 1, ...)
    
    -grabs lock for migration/0-
                                      -spins with preemption disabled,
                                       waiting for migration/0's lock to be
                                       released-
    
    -adds work items for migration/0
    and queues migration/0 to its
    wake_q-
    
    -releases lock for migration/0
     and preemption is enabled-
    
    -current thread is preempted,
    and __set_cpus_allowed_ptr
    has changed the thread's
    cpu allowed mask to CPU1 only-
    
                                      -acquires migration/0 and migration/1's
                                       locks-
    
                                      -adds work for migration/0 but does not
                                       add migration/0 to wake_q, since it is
                                       already in a wake_q-
    
                                      -adds work for migration/1 and adds
                                       migration/1 to its wake_q-
    
                                      -releases migration/0 and migration/1's
                                       locks, wakes migration/1, and enables
                                       preemption-
    
                                      -since migration/1 is requested to run,
                                       migration/1 begins to run and waits on
                                       migration/0, but migration/0 will never
                                       be able to run, since the thread that
                                       can wake it is affine to CPU1-
    
    Disable preemption in cpu_stop_queue_work() before queueing works for
    stopper threads, and queueing the stopper thread in the wake queue, to
    ensure that the operation of queueing the works and waking the stopper
    threads is atomic.
    
    Fixes: 0b26351b ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock")
    Signed-off-by: default avatarPrasad Sodagudi <psodagud@codeaurora.org>
    Signed-off-by: default avatarIsaac J. Manjarres <isaacm@codeaurora.org>
    Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Cc: peterz@infradead.org
    Cc: matt@codeblueprint.co.uk
    Cc: bigeasy@linutronix.de
    Cc: gregkh@linuxfoundation.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/1533329766-4856-1-git-send-email-isaacm@codeaurora.orgCo-Developed-by: default avatarIsaac J. Manjarres <isaacm@codeaurora.org>
    cfd35514
stop_machine.c 17.8 KB