    Fix lockup related to stop_machine being stuck in __do_softirq. · 34376a50
    Ben Greear authored
    The stop machine logic can lock up if all but one of the migration
    threads make it through the disable-irq step and the one remaining
    thread gets stuck in __do_softirq.  The reason __do_softirq can hang is
    that it has a bail-out based on jiffies timeout, but in the lockup case,
    jiffies itself is not incremented.
    
    To work around this, re-add the max_restart counter in __do_softirq and
    stop processing softirqs after 10 restarts.
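
    The interaction between the two bail-outs can be illustrated with a small
    userspace simulation. This is only a sketch, not the kernel code: struct
    sim_cpu, sim_do_softirq() and their fields are hypothetical names invented
    for the illustration, and the plain "<" comparison ignores the jiffies
    wraparound that the kernel handles. Only the limit of 10 restarts comes
    from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Restart limit taken from the commit description ("10 restarts"). */
#define MAX_SOFTIRQ_RESTART 10

/* Hypothetical model of one CPU; none of these fields exist in the kernel. */
struct sim_cpu {
    unsigned long jiffies;   /* simulated tick counter */
    unsigned long deadline;  /* tick value at which the time bail-out fires */
    bool ticks_advance;      /* false models the lockup: jiffies is frozen */
    int pending_rounds;      /* passes that re-raise work; -1 means forever */
};

/*
 * Simulate the restart loop. Returns the number of passes made and sets
 * *deferred when the loop gave up while work was still pending (the point
 * where real code would hand the remaining work to another context).
 */
static int sim_do_softirq(struct sim_cpu *cpu, int max_restart, bool *deferred)
{
    int passes = 0;

    assert(max_restart > 0);
    *deferred = false;

    for (;;) {
        passes++;                    /* one pass over the pending handlers */
        if (cpu->ticks_advance)
            cpu->jiffies++;          /* time only moves if ticks are delivered */

        if (cpu->pending_rounds == 0)
            return passes;           /* nothing was re-raised: normal exit */
        if (cpu->pending_rounds > 0)
            cpu->pending_rounds--;

        /* Restart only while both the time budget and the counter allow it. */
        if (cpu->jiffies < cpu->deadline && --max_restart)
            continue;

        *deferred = true;            /* bail out with work still pending */
        return passes;
    }
}
```

    With ticks_advance set to false and pending_rounds set to -1, the time
    check alone would never end the loop; the counter ends it after 10
    passes. When ticks do advance, the time-based check still fires first.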
    
    Thanks to Tejun Heo and Rusty Russell and others for helping me track
    this down.
    
    The regression was introduced in 3.9 by commit c10d7367 ("softirq:
    reduce latencies").
    
    It may be worth looking into ath9k to see if it has issues with its irq
    handler at a later date.
    
    The hang stack traces look something like this:
    
        ------------[ cut here ]------------
        WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7()
        Watchdog detected hard LOCKUP on cpu 2
        Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
        Pid: 23, comm: migration/2 Tainted: G         C   3.9.4+ #11
        Call Trace:
         <NMI>   warn_slowpath_common+0x85/0x9f
          warn_slowpath_fmt+0x46/0x48
          watchdog_overflow_callback+0x9c/0xa7
          __perf_event_overflow+0x137/0x1cb
          perf_event_overflow+0x14/0x16
          intel_pmu_handle_irq+0x2dc/0x359
          perf_event_nmi_handler+0x19/0x1b
          nmi_handle+0x7f/0xc2
          do_nmi+0xbc/0x304
          end_repeat_nmi+0x1e/0x2e
         <<EOE>>
          cpu_stopper_thread+0xae/0x162
          smpboot_thread_fn+0x258/0x260
          kthread+0xc7/0xcf
          ret_from_fork+0x7c/0xb0
        ---[ end trace 4947dfa9b0a4cec3 ]---
        BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17]
        Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
        irq event stamp: 835637905
        hardirqs last  enabled at (835637904): __do_softirq+0x9f/0x257
        hardirqs last disabled at (835637905): apic_timer_interrupt+0x6d/0x80
        softirqs last  enabled at (5654720): __do_softirq+0x1ff/0x257
        softirqs last disabled at (5654725): irq_exit+0x5f/0xbb
        CPU 1
        Pid: 17, comm: migration/1 Tainted: G        WC   3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
        RIP: tasklet_hi_action+0xf0/0xf0
        Process migration/1
        Call Trace:
         <IRQ>
          __do_softirq+0x117/0x257
          irq_exit+0x5f/0xbb
          smp_apic_timer_interrupt+0x8a/0x98
          apic_timer_interrupt+0x72/0x80
         <EOI>
          printk+0x4d/0x4f
          stop_machine_cpu_stop+0x22c/0x274
          cpu_stopper_thread+0xae/0x162
          smpboot_thread_fn+0x258/0x260
          kthread+0xc7/0xcf
          ret_from_fork+0x7c/0xb0
    Signed-off-by: Ben Greear <greearb@candelatech.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Acked-by: Pekka Riikonen <priikone@iki.fi>
    Cc: Eric Dumazet <eric.dumazet@gmail.com>
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
softirq.c 21.2 KB