    powerpc/pseries/mobility: use stop_machine for join/suspend · 9327dc0a
    Nathan Lynch authored
    The partition suspend sequence as specified in the platform
    architecture requires that all active processor threads call
    H_JOIN, which:
    
    - suspends the calling thread until it is the target of
      an H_PROD; or
    - immediately returns H_CONTINUE, if the calling thread is the last to
      call H_JOIN. This thread is expected to call ibm,suspend-me to
      completely suspend the partition.
    
    Upon returning from ibm,suspend-me the calling thread must wake all
    others using H_PROD.
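
    The join/suspend/prod sequence described above can be modelled in
    user space. The following is only a sketch under stated assumptions:
    pthreads stand in for processor threads, a mutex and condition
    variable stand in for the hypervisor, and every sim_* name is
    hypothetical rather than a kernel or hypervisor interface.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define SIM_MAX_THREADS 64

/* Hypothetical user-space model; none of these names are real kernel APIs. */
enum sim_rc { SIM_SUCCESS, SIM_CONTINUE };

struct sim_partition {
    pthread_mutex_t lock;
    pthread_cond_t prodded;
    int active;                 /* threads that are not currently joined */
    bool prod[SIM_MAX_THREADS]; /* pending wakeup flag per thread */
    int suspend_calls;          /* threads that ran the suspend step */
    int woken;                  /* joined threads released by a prod */
};

struct sim_arg { struct sim_partition *p; int self; int nthreads; };

/* Model of H_JOIN: the last caller continues, all others sleep until prodded. */
static enum sim_rc sim_h_join(struct sim_partition *p, int self)
{
    enum sim_rc rc = SIM_SUCCESS;

    pthread_mutex_lock(&p->lock);
    if (p->active == 1) {
        rc = SIM_CONTINUE;      /* last active thread: return immediately */
    } else {
        p->active--;
        while (!p->prod[self])
            pthread_cond_wait(&p->prodded, &p->lock);
        p->prod[self] = false;
        p->active++;
        p->woken++;
    }
    pthread_mutex_unlock(&p->lock);
    return rc;
}

/* Model of H_PROD: mark the target thread as prodded and wake sleepers. */
static void sim_h_prod(struct sim_partition *p, int target)
{
    pthread_mutex_lock(&p->lock);
    p->prod[target] = true;
    pthread_cond_broadcast(&p->prodded);
    pthread_mutex_unlock(&p->lock);
}

static void *sim_thread(void *data)
{
    struct sim_arg *a = data;

    if (sim_h_join(a->p, a->self) == SIM_CONTINUE) {
        /* Stand-in for the ibm,suspend-me call made by the last joiner. */
        pthread_mutex_lock(&a->p->lock);
        a->p->suspend_calls++;
        pthread_mutex_unlock(&a->p->lock);

        /* On "resume", wake every other thread. */
        for (int i = 0; i < a->nthreads; i++)
            if (i != a->self)
                sim_h_prod(a->p, i);
    }
    return NULL;
}

/* Returns true if exactly one thread suspended and all others were woken. */
bool sim_run(int nthreads)
{
    assert(nthreads > 0 && nthreads <= SIM_MAX_THREADS);

    struct sim_partition p = { .active = nthreads };
    pthread_t tids[SIM_MAX_THREADS];
    struct sim_arg args[SIM_MAX_THREADS];

    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.prodded, NULL);

    for (int i = 0; i < nthreads; i++) {
        args[i] = (struct sim_arg){ .p = &p, .self = i, .nthreads = nthreads };
        int rc = pthread_create(&tids[i], NULL, sim_thread, &args[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);

    bool ok = p.suspend_calls == 1 && p.woken == nthreads - 1;

    pthread_cond_destroy(&p.prodded);
    pthread_mutex_destroy(&p.lock);
    return ok;
}
```

    Calling sim_run(4) starts four threads; whichever joins last takes
    the suspend path and prods the other three. The model omits error
    returns and the real suspend, so it only illustrates the ordering.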
    
    rtas_ibm_suspend_me_unsafe() uses on_each_cpu() to implement this
    protocol, but because of its synchronizing nature this is susceptible
    to deadlock versus users of stop_machine() or other callers of
    on_each_cpu().
    
    Not only is stop_machine() intended for use cases like this, it
    handles error propagation and allows us to keep the data shared
    between CPUs minimal: a single atomic counter which ensures exactly
    one CPU will wake the others from their joined states.
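
    As an illustration of the single-counter idea, the sketch below
    uses a C11 atomic in place of the kernel's atomic_t. The names
    sim_elect_waker() and sim_count_wakers() are hypothetical, and the
    simulated CPUs are walked sequentially for simplicity; the
    fetch-and-add itself would give the same answer if the callers
    raced.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true only for the first caller to bump
 * the shared counter, so exactly one caller is elected to wake the rest.
 */
bool sim_elect_waker(atomic_int *counter)
{
    return atomic_fetch_add(counter, 1) == 0;
}

/* Walks nr_cpus simulated CPUs through the election and counts winners. */
int sim_count_wakers(int nr_cpus)
{
    assert(nr_cpus >= 0);

    atomic_int counter = 0;
    int wakers = 0;

    for (int cpu = 0; cpu < nr_cpus; cpu++)
        if (sim_elect_waker(&counter))
            wakers++;

    return wakers;
}
```

    The counter is the only state shared between callers: whoever
    observes the old value 0 does the waking, regardless of whether it
    arrived via the suspend path or was itself woken early.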
    
    Switch the migration code to use stop_machine() and a less complex
    local implementation of the H_JOIN/ibm,suspend-me logic, which
    carries additional benefits:
    
    - more informative error reporting, appropriately ratelimited
    - resets the lockup detector / watchdog on resume to prevent lockup
      warnings when the OS has been suspended for a time exceeding the
      threshold.
    
    Fixes: 91dc182c ("[PATCH] powerpc: special-case ibm,suspend-me RTAS call")
    Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20201207215200.1785968-13-nathanl@linux.ibm.com
mobility.c 11.8 KB