    powerpc/watchdog: Avoid holding wd_smp_lock over printk and smp_send_nmi_ipi · 76521c4b
    Nicholas Piggin authored
    
    
    There is a deadlock with the console_owner lock and the wd_smp_lock:
    
    CPU x takes the console_owner lock
    CPU y takes a watchdog timer interrupt and takes __wd_smp_lock
    CPU x takes a soft-NMI interrupt, detects deadlock, spins on __wd_smp_lock
    CPU y detects deadlock, tries to print something and spins on console_owner
    -> deadlock
    
    Change the watchdog locking scheme so wd_smp_lock protects the watchdog
    internal data, but "reporting" (printing, issuing NMI IPIs, taking any
    action outside of the watchdog) uses a non-waiting exclusion. If a CPU detects
    a problem but cannot take the reporting lock, it just returns because
    something else is already reporting. It will try again at some point.
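
The non-waiting exclusion described above can be sketched in userspace C11.
This is only an illustration of the idea, not the code in watchdog.c: the
names try_report(), end_report() and report_lockup() are hypothetical, and
an atomic_flag stands in for whatever primitive the kernel patch uses.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the reporting exclusion (not kernel code). */
static atomic_flag reporting = ATOMIC_FLAG_INIT;

/*
 * Try to become the reporter. Never spins or sleeps: returns true if the
 * caller now owns reporting, false if someone else is already reporting.
 */
static bool try_report(void)
{
	return !atomic_flag_test_and_set_explicit(&reporting,
						  memory_order_acquire);
}

/* Give up the reporter role so a later detection can report. */
static void end_report(void)
{
	atomic_flag_clear_explicit(&reporting, memory_order_release);
}

/*
 * Report a detected lockup, or back off if a report is in progress.
 * Returns true if this caller printed the report, false if it backed off.
 * The slow work (printing) happens without holding any data lock.
 */
static bool report_lockup(int cpu)
{
	if (!try_report())
		return false;	/* someone else is reporting; retry later */

	fprintf(stderr, "Watchdog: CPU %d detected a hard lockup\n", cpu);
	end_report();
	return true;
}
```

Because a failed try_report() returns immediately, a CPU that interrupts a
reporter can never end up spinning on a lock the reporter needs, which is
the property that breaks the deadlock cycle listed above.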
    
    The usefulness of a hard lockup watchdog report is typically not limited
    by a failure to spew a large enough amount of data in as short a time as
    possible, but by messages getting garbled.
    
    Laurent debugged this and found the deadlock, and this patch is based on
    his general approach of avoiding expensive operations while holding the
    lock, with the addition of the reporting exclusion.

    Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
    [np: rework to add reporting exclusion, update changelog]
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20211110025056.2084347-4-npiggin@gmail.com