• Edwin Peer's avatar
    bnxt_en: reliably allocate IRQ table on reset to avoid crash · 20d7d1c5
    Edwin Peer authored
    The following trace excerpt corresponds with a NULL pointer dereference
    of 'bp->irq_tbl' in bnxt_setup_inta() on an Aarch64 system after many
    device resets:
    
        Unable to handle kernel NULL pointer dereference at ... 000000d
        ...
        pc : string+0x3c/0x80
        lr : vsnprintf+0x294/0x7e0
        sp : ffff00000f61ba70 pstate : 20000145
        x29: ffff00000f61ba70 x28: 000000000000000d
        x27: ffff0000009c8b5a x26: ffff00000f61bb80
        x25: ffff0000009c8b5a x24: 0000000000000012
        x23: 00000000ffffffe0 x22: ffff000008990428
        x21: ffff00000f61bb80 x20: 000000000000000d
        x19: 000000000000001f x18: 0000000000000000
        x17: 0000000000000000 x16: ffff800b6d0fb400
        x15: 0000000000000000 x14: ffff800b7fe31ae8
        x13: 00001ed16472c920 x12: ffff000008c6b1c9
        x11: ffff000008cf0580 x10: ffff00000f61bb80
        x9 : 00000000ffffffd8 x8 : 000000000000000c
        x7 : ffff800b684b8000 x6 : 0000000000000000
        x5 : 0000000000000065 x4 : 0000000000000001
        x3 : ffff0a00ffffff04 x2 : 000000000000001f
        x1 : 0000000000000000 x0 : 000000000000000d
        Call trace:
        string+0x3c/0x80
        vsnprintf+0x294/0x7e0
        snprintf+0x44/0x50
        __bnxt_open_nic+0x34c/0x928 [bnxt_en]
        bnxt_open+0xe8/0x238 [bnxt_en]
        __dev_open+0xbc/0x130
        __dev_change_flags+0x12c/0x168
        dev_change_flags+0x20/0x60
        ...
    
    Ordinarily, a call to bnxt_setup_inta() (not in trace due to inlining)
    would not be expected on a system supporting MSIX at all. However, if
    bnxt_init_int_mode() does not end up being called after the call to
    bnxt_clear_int_mode() in bnxt_fw_reset_close(), then the driver will
    think that only INTA is supported and bp->irq_tbl will be NULL,
    causing the above crash.
    
    In the error recovery scenario, we call bnxt_clear_int_mode() in
    bnxt_fw_reset_close() early in the sequence. Ordinarily, we will
    call bnxt_init_int_mode() in bnxt_hwrm_if_change() after we
    reestablish communication with the firmware after reset.  However,
    if the sequence has to abort before we call bnxt_init_int_mode() and
    if the user later attempts to re-open the device, then it will cause
    the crash above.
    
    We fix it in 2 ways:
    
    1. Check for bp->irq_tbl in bnxt_setup_int_mode(). If it is NULL, call
    bnxt_init_init_mode().
    
    2. If we need to abort in bnxt_hwrm_if_change() and cannot complete
    the error recovery sequence, set the BNXT_STATE_ABORT_ERR flag.  This
    will cause more drastic recovery at the next attempt to re-open the
    device, including a call to bnxt_init_int_mode().
    
    Fixes: 3bc7d4a3 ("bnxt_en: Add BNXT_STATE_IN_FW_RESET state.")
    Reviewed-by: default avatarScott Branden <scott.branden@broadcom.com>
    Signed-off-by: default avatarEdwin Peer <edwin.peer@broadcom.com>
    Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
    Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
    20d7d1c5
bnxt.c 349 KB