    genirq: Allow fasteoi handler to resend interrupts on concurrent handling
    There is a class of interrupt controllers out there that, once they
    have signalled a given interrupt number, will still signal incoming
    instances of the *same* interrupt despite the original interrupt
    not having been EOIed yet.
    
    As long as the new interrupt reaches the *same* CPU, nothing bad
    happens, as that CPU still has its interrupts globally disabled,
    and we will only take the new interrupt once the original one has
    been EOIed.
    
    However, things become more "interesting" if an affinity change comes
    in while the interrupt is being handled. More specifically, while
    the per-irq lock is being dropped. This results in the affinity change
    taking place immediately. At this point, there is nothing that prevents
    the interrupt from firing on the new target CPU. We end up with the
    interrupt running concurrently on two CPUs, which isn't a good thing.
    
    And that's where things become worse: the new CPU notices that the
    interrupt handling is in progress (irq_may_run() returns false), and
    *drops the interrupt on the floor*.
    
    The whole race looks like this:
    
               CPU 0             |          CPU 1
    -----------------------------|-----------------------------
    interrupt start              |
      handle_fasteoi_irq         | set_affinity(CPU 1)
        handler                  |
        ...                      | interrupt start
        ...                      |   handle_fasteoi_irq -> early out
      handle_fasteoi_irq return  | interrupt end
    interrupt end                |
    
    If the interrupt was an edge, too bad. The interrupt is lost, and
    the system will eventually die one way or another. Not great.
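    
    For reference, the path that drops the interrupt is the early-out in
    handle_fasteoi_irq(). A condensed sketch of the current code (not the
    full handler):
    
      void handle_fasteoi_irq(struct irq_desc *desc)
      {
              struct irq_chip *chip = desc->irq_data.chip;
    
              raw_spin_lock(&desc->lock);
    
              if (!irq_may_run(desc))
                      goto out;
    
              /* ... normal handling: handle_irq_event() etc. ... */
    
              cond_unmask_eoi_irq(desc, chip);
              raw_spin_unlock(&desc->lock);
              return;
      out:
              /* The racing interrupt is EOIed and then simply forgotten. */
              if (!(chip->flags & IRQCHIP_EOI_IF_HANDLED))
                      chip->irq_eoi(&desc->irq_data);
              raw_spin_unlock(&desc->lock);
      }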
    
    A way to avoid this situation is to detect this problem at the point
    we handle the interrupt on the new target. Instead of dropping the
    interrupt, use the resend mechanism to force it to be replayed.
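    
    Roughly, that means: on the early-out path, remember that the
    interrupt fired instead of forgetting it, and replay it via
    check_irq_resend() once the original handling run completes. A sketch
    of the shape this takes in handle_fasteoi_irq(); the
    irqd_needs_resend_when_in_progress() helper is the accessor one would
    expect for the new flag introduced below:
    
      if (!irq_may_run(desc)) {
              /*
               * Racing with an in-progress run on another CPU: don't
               * drop the interrupt, flag it for a resend instead
               * (only for irqchips that opted in, see below).
               */
              if (irqd_needs_resend_when_in_progress(&desc->irq_data))
                      desc->istate |= IRQS_PENDING;
              goto out;
      }
    
      /* ... normal handling ... */
    
      /* Before dropping the lock: replay anything that raced with us. */
      if (unlikely(desc->istate & IRQS_PENDING))
              check_irq_resend(desc, false);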
    
    Also, in order to limit the impact of this workaround to the pathetic
    architectures that require it, gate it behind a new irq flag aptly
    named IRQD_RESEND_WHEN_IN_PROGRESS.
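    
    Opting in is then a one-liner in the affected irqchip driver, along
    these lines (my_chip_domain_map() is a made-up example;
    irqd_set_resend_when_in_progress() is assumed to be the setter that
    pairs with the flag):
    
      static int my_chip_domain_map(struct irq_domain *d, unsigned int virq,
                                    irq_hw_number_t hw)
      {
              /* ... usual hwirq/chip/handler setup ... */
    
              /* This controller re-signals not-yet-EOIed interrupts. */
              irqd_set_resend_when_in_progress(irq_get_irq_data(virq));
    
              return 0;
      }
    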
    Suggested-by: Marc Zyngier <maz@kernel.org>
    Signed-off-by: James Gowans <jgowans@amazon.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Marc Zyngier <maz@kernel.org>
    Cc: KarimAllah Raslan <karahmed@amazon.com>
    Cc: Yipeng Zou <zouyipeng@huawei.com>
    Cc: Zhang Jianhua <chris.zjh@huawei.com>
    [maz: reworded commit message]
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20230608120021.3273400-3-jgowans@amazon.com