• Michael J. Ruhl's avatar
    IB/hfi1: Re-order IRQ cleanup to address driver cleanup race · 82a97926
    Michael J. Ruhl authored
    The pci_request_irq() interfaces always adds the IRQF_SHARED bit to
    all IRQ requests.
    
    When the kernel is built with CONFIG_DEBUG_SHIRQ config flag, if the
    IRQF_SHARED bit is set, a call to the IRQ handler is made from the
    __free_irq() function. This is testing a race condition between the
    IRQ cleanup and an IRQ racing the cleanup.  The HFI driver should be
    able to handle this race, but does not.
    
    This race can cause traces that start with this footprint:
    
    BUG: unable to handle kernel NULL pointer dereference at   (null)
    Call Trace:
     <hfi1 irq handler>
     ...
     __free_irq+0x1b3/0x2d0
     free_irq+0x35/0x70
     pci_free_irq+0x1c/0x30
     clean_up_interrupts+0x53/0xf0 [hfi1]
     hfi1_start_cleanup+0x122/0x190 [hfi1]
     postinit_cleanup+0x1d/0x280 [hfi1]
     remove_one+0x233/0x250 [hfi1]
     pci_device_remove+0x39/0xc0
    
    Export IRQ cleanup function so it can be called from other modules.
    
    Using the exported cleanup function:
    
      Re-order the driver cleanup code to clean up IRQ resources before
      other resources, eliminating the race.
    
      Re-order error path for init so that the race does not occur.
    
    Reduce severity on spurious error message for SDMA IRQs to info.
    Reviewed-by: default avatarAlex Estrin <alex.estrin@intel.com>
    Reviewed-by: default avatarPatel Jay P <jay.p.patel@intel.com>
    Reviewed-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
    Signed-off-by: default avatarMichael J. Ruhl <michael.j.ruhl@intel.com>
    Signed-off-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
    82a97926
chip.c 449 KB