    watchdog/softlockup: Report the most frequent interrupts · e9a9292e
    Bitao Hu authored
    When the watchdog determines, based on CPU utilization, that the current
    soft lockup is due to an interrupt storm, reporting the most frequent
    interrupts is often enough to guide further troubleshooting.
    
    Below is an example of an interrupt storm. The call trace does not provide
    useful information, but comparing the per-interrupt counts accumulated
    during the lockup period makes it possible to identify the culprit.
    
    [  638.870231] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [swapper/9:0]
    [  638.870825] CPU#9 Utilization every 4s during lockup:
    [  638.871194]  #1:   0% system,          0% softirq,   100% hardirq,     0% idle
    [  638.871652]  #2:   0% system,          0% softirq,   100% hardirq,     0% idle
    [  638.872107]  #3:   0% system,          0% softirq,   100% hardirq,     0% idle
    [  638.872563]  #4:   0% system,          0% softirq,   100% hardirq,     0% idle
    [  638.873018]  #5:   0% system,          0% softirq,   100% hardirq,     0% idle
    [  638.873494] CPU#9 Detect HardIRQ Time exceeds 50%. Most frequent HardIRQs:
    [  638.873994]  #1: 330945      irq#7
    [  638.874236]  #2: 31          irq#82
    [  638.874493]  #3: 10          irq#10
    [  638.874744]  #4: 2           irq#89
    [  638.874992]  #5: 1           irq#102
    ...
    [  638.875313] Call trace:
    [  638.875315]  __do_softirq+0xa8/0x364
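
The count comparison described above can be sketched in plain C. This is only an illustration, not the code this commit adds to watchdog.c: the names `irq_delta` and `top_irqs`, and the idea of passing two flat snapshot arrays indexed by IRQ number, are assumptions made for the example. It takes a snapshot of per-IRQ counters from the start of the sampling window and one from the end, then keeps the few IRQs whose counters grew the most.

```c
#include <assert.h>

/* Hypothetical record: one IRQ number and how often it fired in the window. */
struct irq_delta {
	int irq;
	unsigned int count;
};

/*
 * Compare two snapshots of per-IRQ counters (indexed by IRQ number) and
 * store the top_n IRQs with the largest increase in top[], sorted from
 * most to least frequent. Returns the number of entries filled in.
 */
int top_irqs(const unsigned int *before, const unsigned int *after,
	     int nr_irqs, struct irq_delta *top, int top_n)
{
	int filled = 0;

	assert(top_n > 0);

	for (int irq = 0; irq < nr_irqs; irq++) {
		/* Unsigned subtraction stays correct across counter wraparound. */
		unsigned int delta = after[irq] - before[irq];
		int pos = filled;

		if (delta == 0)
			continue;	/* IRQ did not fire during the window */

		/* Find the slot that keeps top[] sorted in descending order. */
		while (pos > 0 && top[pos - 1].count < delta)
			pos--;
		if (pos >= top_n)
			continue;	/* smaller than everything already kept */

		/* Shift smaller entries down, dropping the last one if full. */
		int last = filled < top_n ? filled : top_n - 1;
		for (int j = last; j > pos; j--)
			top[j] = top[j - 1];

		top[pos].irq = irq;
		top[pos].count = delta;
		if (filled < top_n)
			filled++;
	}
	return filled;
}
```

A caller would take the first snapshot when sampling starts, take the second when the soft lockup is reported, and print the returned entries in order, which yields a list shaped like the "Most frequent HardIRQs" lines in the log above.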
    Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Liu Song <liusong@linux.alibaba.com>
    Reviewed-by: Douglas Anderson <dianders@chromium.org>
    Link: https://lore.kernel.org/r/20240411074134.30922-6-yaoma@linux.alibaba.com
watchdog.c 33.7 KB