    nvme: fix reconnection fail due to reserved tag allocation · de105068
    Chunguang Xu authored
    We found an issue in a production environment using NVMe over RDMA:
    admin_q reconnection failed forever even though the remote target and
    the network were fine. After digging into it, we found it may be caused
    by an ABBA deadlock in tag allocation. In my case, the tag was held by
    a keep-alive request waiting inside admin_q. Because admin_q is
    quiesced while the controller is being reset, the request is marked
    idle and will not be processed until the reset succeeds. As fabric_q
    shares its tagset with admin_q, reconnecting to the remote target
    needs a tag for the connect command, but the only reserved tag was
    held by the keep-alive command waiting inside admin_q. As a result,
    admin_q reconnection failed forever. To fix this issue, I think we
    should keep two reserved tags for the admin queue.
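
The tag starvation described above can be illustrated with a toy model. This is only a sketch and not kernel code: the `tag_pool` structure and the functions `pool_init`, `pool_get`, and `reconnect_can_proceed` are hypothetical names invented for this illustration, and the real allocator's blocking wait is reduced to a non-blocking check that returns -1 when no reserved tag is free.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: this is NOT the kernel's blk-mq tag allocator. */
#define MAX_RESERVED 4

struct tag_pool {
	int nr_reserved;            /* number of reserved tags in the pool */
	bool in_use[MAX_RESERVED];  /* in_use[i] is true while tag i is held */
};

static void pool_init(struct tag_pool *p, int nr_reserved)
{
	assert(nr_reserved >= 0 && nr_reserved <= MAX_RESERVED);
	p->nr_reserved = nr_reserved;
	for (int i = 0; i < MAX_RESERVED; i++)
		p->in_use[i] = false;
}

/* Non-blocking allocation: returns a tag index, or -1 if none is free. */
static int pool_get(struct tag_pool *p)
{
	for (int i = 0; i < p->nr_reserved; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/*
 * Models the scenario from the commit message: a keep-alive request takes
 * a reserved tag and then sits in the quiesced admin queue, so it never
 * gives the tag back before the reconnect finishes. The connect command
 * then asks for a reserved tag from the same pool.
 *
 * Returns true if the connect command can obtain a tag.
 */
static bool reconnect_can_proceed(int nr_reserved)
{
	struct tag_pool pool;
	int connect_tag;

	pool_init(&pool, nr_reserved);

	/* Keep-alive grabs a tag and is stuck; the tag is never released. */
	(void)pool_get(&pool);

	/* Connect needs its own reserved tag to make progress. */
	connect_tag = pool_get(&pool);
	return connect_tag >= 0;
}
```

In this model, `reconnect_can_proceed(1)` returns false because the single reserved tag is pinned by the stuck keep-alive, while `reconnect_can_proceed(2)` returns true, which matches the reasoning behind keeping two reserved tags for the admin queue.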
    
    Fixes: ed01fee2 ("nvme-fabrics: only reserve a single tag")
    Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
fabrics.h 8.26 KB