    stop_machine: Mark per cpu stopper enabled early · 46c498c2
    Thomas Gleixner authored
    commit 14e568e7 (stop_machine: Use smpboot threads) introduced the
    following regression:
    
    Before this commit the stopper enabled bit was set in the online
    notifier.
    
    CPU0				CPU1
    cpu_up
    				cpu online
    hotplug_notifier(ONLINE)
      stopper(CPU1)->enabled = true;
    ...
    stop_machine()
    
    The conversion to smpboot threads moved the enablement to the wakeup
    path of the parked thread. The majority of users seem to have the
    following working order:
    
    CPU0				CPU1
    cpu_up
    				cpu online
    unpark_threads()
      wakeup(stopper[CPU1])
    ....
    				stopper thread runs
    				  stopper(CPU1)->enabled = true;
    stop_machine()
    
    But Konrad and Sander have observed:
    
    CPU0				CPU1
    cpu_up
    				cpu online
    unpark_threads()
      wakeup(stopper[CPU1])
    ....
    stop_machine()
    				stopper thread runs
    				  stopper(CPU1)->enabled = true;
    
    Now the stop machinery kicks CPU0 into the stop loop, where it gets
    stuck forever: the queue code saw stopper(CPU1)->enabled == false and
    discarded the CPU1 stopper work, so CPU0 waits for CPU1 to enter
    stop_machine, which never happens.
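    
    The failure mode can be illustrated with a small userspace model. This
    is only a sketch: stopper_model, queue_stop_work() and missing_cpus()
    are hypothetical names made up for the illustration and are not the
    kernel's data structures or functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; all names here are hypothetical. */
struct stopper_model {
	bool enabled;	/* stopper may accept work */
	int  queued;	/* work items accepted so far */
};

/* Returns true if the work was queued, false if it was dropped. */
static bool queue_stop_work(struct stopper_model *s)
{
	if (!s->enabled)
		return false;	/* discarded: nobody will ever run it */
	s->queued++;
	return true;
}

/*
 * Models the rendezvous: every CPU has to run its work item.
 * Returns the number of CPUs that will never arrive. A non-zero
 * result means the CPUs that did arrive wait forever.
 */
static size_t missing_cpus(struct stopper_model *stoppers, size_t ncpus)
{
	size_t missing = 0;

	for (size_t cpu = 0; cpu < ncpus; cpu++)
		if (!queue_stop_work(&stoppers[cpu]))
			missing++;
	return missing;
}
```

    With CPU0 enabled and CPU1 not yet enabled, missing_cpus() reports one
    CPU that never joins, which corresponds to CPU0 spinning in the stop
    loop in the third diagram.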
    
    Add a pre_unpark function to the smpboot thread descriptor and call it
    before waking the thread.
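    
    The ordering this establishes can be sketched in userspace as follows.
    The descriptor layout, MODEL_NR_CPUS and the model_* helpers are
    assumptions for the illustration; only the idea of an optional
    pre_unpark callback invoked before the wakeup comes from the text
    above.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4		/* hypothetical CPU count for the model */

/* Hypothetical stand-in for a per-cpu thread descriptor. */
struct thread_desc_model {
	void (*pre_unpark)(unsigned int cpu);	/* optional, runs before wake */
	void (*wake)(unsigned int cpu);		/* wakes the parked thread */
};

static bool stopper_enabled[MODEL_NR_CPUS];
static bool enabled_at_wake[MODEL_NR_CPUS];

/* Runs on the CPU doing the unpark, so it cannot lose the race. */
static void model_stopper_pre_unpark(unsigned int cpu)
{
	if (cpu < MODEL_NR_CPUS)
		stopper_enabled[cpu] = true;
}

/* Records whether the stopper was already enabled at wakeup time. */
static void model_wake(unsigned int cpu)
{
	if (cpu < MODEL_NR_CPUS)
		enabled_at_wake[cpu] = stopper_enabled[cpu];
}

/* Call the optional pre_unpark hook first, then wake the thread. */
static void model_unpark_thread(const struct thread_desc_model *d,
				unsigned int cpu)
{
	if (d->pre_unpark)
		d->pre_unpark(cpu);
	if (d->wake)
		d->wake(cpu);
}
```

    Because the flag is set by the caller before the wakeup, it no longer
    matters when the stopper thread itself gets to run on the new CPU.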
    
    This fixes the problem at hand, but the stop_machine code should be
    more robust. The stopper->enabled flag smells fishy at best.
    
    Thanks to Konrad for going through a loop of debug patches and
    providing the information to decode this issue.
    
    Reported-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Reported-and-tested-by: Sander Eikelenboom <linux@eikelenboom.it>
    Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
    Cc: Rusty Russell <rusty@rustcorp.com.au>
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302261843240.22263@ionos
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
stop_machine.c 14.6 KB