    RPCRDMA: Fix FRMR registration/invalidate handling. · 5c635e09
    Tom Tucker authored
    When the rpc_memreg_strategy is 5, FRMR are used to map RPC data.
    This mode uses an FRMR to map the RPC data, then invalidates
    (i.e. unregisters) the data in xprt_rdma_free. These FRMR are used
    across connections on the same mount, i.e. if the connection goes
    away on an idle timeout and reconnects later, the FRMR are not
    destroyed and recreated.
    
    This creates a problem for transport errors because the WR that
    invalidates an FRMR may be flushed (i.e. fail), leaving the
    FRMR valid. When the FRMR is later used to map an RPC it will fail,
    tearing down the transport and starting over. Over time, more and
    more of the FRMR pool ends up in the wrong state, resulting in
    seemingly random disconnects.
    
    This fix keeps track of the FRMR state explicitly by setting its
    state based on the successful completion of a reg/inv WR. If the FRMR
    is ever used and found to be in the wrong state, an invalidate WR
    is prepended, re-syncing the FRMR state and avoiding the connection loss.
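
The state tracking described above can be sketched roughly as follows. This is a minimal user-space model, not the kernel code from this commit: every name in it (`frmr_state`, `frmr_wr_done`, `frmr_build_map_chain`, and so on) is hypothetical and chosen for illustration. It assumes the state changes only when a reg/inv WR completes successfully, and that an FRMR still marked valid gets an invalidate WR placed ahead of the registration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names here are hypothetical; this models the idea only. */
enum frmr_state { FRMR_IS_INVALID, FRMR_IS_VALID };
enum wr_op { WR_INVALIDATE, WR_REGISTER };

struct frmr {
    enum frmr_state state; /* changes only on a successful completion */
};

/*
 * Completion handler for a reg/inv WR. A flushed (failed) WR leaves
 * the tracked state untouched, so it keeps matching the hardware.
 */
void frmr_wr_done(struct frmr *mr, enum wr_op op, bool success)
{
    assert(mr != NULL);
    if (!success)
        return;
    mr->state = (op == WR_REGISTER) ? FRMR_IS_VALID : FRMR_IS_INVALID;
}

/*
 * Build the WR chain used to map RPC data with this FRMR.
 * If the FRMR is still valid (an earlier invalidate was flushed),
 * an invalidate WR is prepended to re-sync it before registering.
 * Returns the number of WRs written to chain (1 or 2).
 */
size_t frmr_build_map_chain(const struct frmr *mr, enum wr_op chain[2])
{
    size_t n = 0;

    assert(mr != NULL);
    if (mr->state == FRMR_IS_VALID)
        chain[n++] = WR_INVALIDATE;
    chain[n++] = WR_REGISTER;
    return n;
}
```

In this model, an FRMR whose invalidate WR was flushed stays marked valid, so the next mapping attempt posts two WRs (invalidate, then register) instead of failing and tearing down the transport.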
    Signed-off-by: Tom Tucker <tom@ogc.us>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
xprt_rdma.h 12.4 KB