• Yishai Hadas's avatar
    IB/mlx4: Reset flow support for IB kernel ULPs · 35f05dab
    Yishai Hadas authored
    The driver exposes interfaces that directly relate to HW state. Upon fatal
    error, consumers of these interfaces (ULPs) that rely on completion of
    all their posted work-request could hang, thereby introducing dependencies
    in shutdown order.  To prevent this from happening, we manage the
    relevant resources (CQs, QPs) that are used by the device. Upon a fatal error,
    we now generate simulated completions for outstanding WQEs that were not
    completed at the time the HW was reset.
    
    It includes invoking the completion event handler for all involved CQs so that
    the ULPs will poll those CQs. When polled we return simulated CQEs with
    IB_WC_WR_FLUSH_ERR return code enabling ULPs to clean up their resources and
    not wait forever for completions upon receiving remove_one.
    
    The above change requires an extra check in the data path to make sure that when
    device is in error state, the simulated CQEs will be returned and no further
    WQEs will be posted.
    Signed-off-by: default avatarYishai Hadas <yishaih@mellanox.com>
    Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    35f05dab
main.c 75.7 KB