    powerpc/64s: Fix lost pending interrupt due to race causing lost update to irq_happened · ff6781fd
    Nicholas Piggin authored
    force_external_irq_replay() can be called in the do_IRQ path with
    interrupts hard enabled and soft disabled if may_hard_irq_enable() set
    MSR[EE]=1. It updates local_paca->irq_happened with a load, modify,
    store sequence. If a maskable interrupt hits during this sequence, it
    will go to the masked handler to be marked pending in irq_happened.
    This update will be lost when the interrupt returns and the store
    instruction executes. This can result in unpredictable latencies,
    timeouts, lockups, etc.
    
    Fix this by ensuring hard interrupts are disabled before modifying
    irq_happened.
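    
    The lost update and the effect of the fix can be sketched in a small
    user-space model. This is an illustration only, not the kernel code:
    struct sim_cpu, the sim_* functions and the SIM_IRQ_* flag values are
    hypothetical names invented for the sketch, and the hardware is
    simplified to "an interrupt that arrives while MSR[EE]=0 stays latched
    until it can be delivered".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch; not the kernel's definitions. */
#define SIM_IRQ_EE    0x01  /* external interrupt marked for replay */
#define SIM_IRQ_DBELL 0x02  /* doorbell interrupt marked pending */

struct sim_cpu {
	uint8_t irq_happened;  /* stands in for local_paca->irq_happened */
	uint8_t hw_latched;    /* interrupts held back while MSR[EE]=0 */
	bool    msr_ee;        /* stands in for MSR[EE] (hard enable) */
};

/* A maskable interrupt arrives while the CPU is soft disabled. */
static void sim_interrupt(struct sim_cpu *cpu, uint8_t flag)
{
	assert(flag != 0);
	if (cpu->msr_ee)
		cpu->irq_happened |= flag;  /* masked handler marks it pending */
	else
		cpu->hw_latched |= flag;    /* not delivered yet, so not lost */
}

/* Models the irq_happened update with an interrupt arriving mid-sequence. */
static void sim_force_replay(struct sim_cpu *cpu, bool hard_disable_first)
{
	if (hard_disable_first)
		cpu->msr_ee = false;         /* the fix: hard disable first */

	uint8_t tmp = cpu->irq_happened;     /* load */
	sim_interrupt(cpu, SIM_IRQ_DBELL);   /* lands between load and store */
	tmp |= SIM_IRQ_EE;                   /* modify */
	cpu->irq_happened = tmp;             /* store overwrites any update */
}

/* Returns true if the doorbell is still recorded somewhere afterwards. */
static bool sim_doorbell_survives(bool hard_disable_first)
{
	struct sim_cpu cpu = { .irq_happened = 0, .hw_latched = 0, .msr_ee = true };

	sim_force_replay(&cpu, hard_disable_first);
	return ((cpu.irq_happened | cpu.hw_latched) & SIM_IRQ_DBELL) != 0;
}
```

    In this model, sim_doorbell_survives(false) reproduces the race: the
    handler's update to irq_happened is overwritten by the stale store.
    With sim_doorbell_survives(true) the interrupt cannot be delivered
    during the sequence, so it remains pending for later delivery.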
    
    The race could cause any maskable asynchronous interrupt to get lost,
    but it was noticed on a POWER9 SMP system running an RDMA NVMe target
    over 100GbE, which generates very high external interrupt and IPI
    rates. The hang was bisected to the change enabling doorbell
    interrupts for IPIs. Doorbells provided an interrupt type that could
    run at high rates in the do_IRQ path, stressing the race.
    
    Fixes: 1d607bb3 ("powerpc/irq: Add mechanism to force a replay of interrupts")
    Cc: stable@vger.kernel.org # v4.8+
    Reported-by: Carol L. Soto <clsoto@us.ibm.com>
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
irq.c 20.7 KB