• Maor Gottlieb's avatar
    IB/mlx5: Reset flow support for IB kernel ULPs · 89ea94a7
    Maor Gottlieb authored
    The driver exposes interfaces that directly relate to HW state.
    Upon fatal error, consumers of these interfaces (ULPs) that rely
    on completion of all their posted work-request could hang, thereby
    introducing dependencies in shutdown order. To prevent this from
    happening, we manage the relevant resources (CQs, QPs) that are used
    by the device. Upon a fatal error, we now generate simulated
    completions for outstanding WQEs that were not completed at the
    time the HW was reset.
    
    It includes invoking the completion event handler for all involved
    CQs so that the ULPs will poll those CQs. When polled we return
    simulated CQEs with IB_WC_WR_FLUSH_ERR return code enabling ULPs
    to clean up their  resources and not wait forever for completions
    upon receiving remove_one.
    
    The above change requires an extra check in the data path to make
    sure that when device is in error state, the simulated CQEs will
    be returned and no further WQEs will be posted.
    Signed-off-by: default avatarMaor Gottlieb <maorg@mellanox.com>
    Signed-off-by: default avatarLeon Romanovsky <leon@kernel.org>
    Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
    89ea94a7
main.c 73.7 KB