    IB/core: Issue DREQ when receiving REQ/REP for stale QP · 9315bc9a
    Hans Westgaard Ry authored
    From "InfiniBand Architecture Specification Volume 1":
    
      A QP is said to have a stale connection when only one side has
      connection information. A stale connection may result if the remote CM
      had dropped the connection and sent a DREQ but the DREQ was never
      received by the local CM. Alternatively the remote CM may have lost
      all record of past connections because its node crashed and rebooted,
      while the local CM did not become aware of the remote node's reboot
      and therefore did not clean up stale connections.
    
    and:
    
       A local CM may receive a REQ/REP for a stale connection. It shall
       abort the connection issuing REJ to the REQ/REP. It shall then issue
       DREQ with "DREQ:remote QPN" set to the remote QPN from the REQ/REP.
    
    This patch solves a problem with QPN reuse. The current codebase (that
    is, IPoIB) relies on a REAP mechanism to clean up the structures in
    the CM. The problem is the time constants governing this mechanism:
    they can be as long as 768 seconds, and the interface may appear
    unresponsive for that period. Issuing a DREQ (and receiving a DREP)
    does the necessary cleanup, and the interface comes up.
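
    The sequence described above (REJ to the incoming REQ/REP, then DREQ
    carrying the remote QPN, then cleanup once the DREP arrives) can be
    illustrated with a small user-space model. This is only a sketch and
    not the code in cm.c: every name in it (toy_cm, toy_cm_recv_req_or_rep,
    the CM_MSG_* constants and so on) is hypothetical, and it assumes a
    connection counts as stale when the local table still holds an
    established entry for the QPN named in the incoming REQ/REP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message and state names; they do not mirror the kernel's. */
enum cm_msg { CM_MSG_REJ = 1, CM_MSG_DREQ };
enum conn_state { CONN_FREE = 0, CONN_ESTABLISHED, CONN_DREQ_SENT };

#define MAX_CONNS 8
#define MAX_SENT  16

struct conn {
    enum conn_state state;
    uint32_t remote_qpn;
};

struct sent_msg {
    enum cm_msg type;
    uint32_t remote_qpn;
};

/* Toy CM: a connection table plus a log of the messages it "sent". */
struct toy_cm {
    struct conn conns[MAX_CONNS];
    struct sent_msg sent[MAX_SENT];
    size_t nsent;
};

/* Record an outgoing message; silently dropped if the log is full. */
static void toy_cm_send(struct toy_cm *cm, enum cm_msg type, uint32_t qpn)
{
    if (cm->nsent < MAX_SENT) {
        cm->sent[cm->nsent].type = type;
        cm->sent[cm->nsent].remote_qpn = qpn;
        cm->nsent++;
    }
}

/* Find the entry for a remote QPN in a given state, or NULL. */
static struct conn *toy_cm_find(struct toy_cm *cm, uint32_t qpn,
                                enum conn_state state)
{
    assert(state != CONN_FREE);
    for (size_t i = 0; i < MAX_CONNS; i++)
        if (cm->conns[i].state == state && cm->conns[i].remote_qpn == qpn)
            return &cm->conns[i];
    return NULL;
}

/* Register an established connection; returns false if the table is full. */
static bool toy_cm_add_established(struct toy_cm *cm, uint32_t qpn)
{
    for (size_t i = 0; i < MAX_CONNS; i++) {
        if (cm->conns[i].state == CONN_FREE) {
            cm->conns[i].state = CONN_ESTABLISHED;
            cm->conns[i].remote_qpn = qpn;
            return true;
        }
    }
    return false;
}

/*
 * Handle a REQ or REP naming remote_qpn. If an established entry for that
 * QPN still exists, treat it as stale: send REJ, then DREQ with the remote
 * QPN. Returns true when the stale path was taken.
 */
static bool toy_cm_recv_req_or_rep(struct toy_cm *cm, uint32_t remote_qpn)
{
    struct conn *c = toy_cm_find(cm, remote_qpn, CONN_ESTABLISHED);

    if (!c)
        return false;
    toy_cm_send(cm, CM_MSG_REJ, remote_qpn);
    toy_cm_send(cm, CM_MSG_DREQ, remote_qpn);
    c->state = CONN_DREQ_SENT;
    return true;
}

/* Handle a DREP: release the entry that was waiting for it. */
static bool toy_cm_recv_drep(struct toy_cm *cm, uint32_t remote_qpn)
{
    struct conn *c = toy_cm_find(cm, remote_qpn, CONN_DREQ_SENT);

    if (!c)
        return false;
    c->state = CONN_FREE;
    return true;
}
```

    In this model a zero-initialised struct toy_cm starts with an empty
    table. Once toy_cm_recv_drep() has released the entry, a later REQ for
    the same QPN is no longer treated as stale, which corresponds to the
    interface coming up without waiting for a timeout.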

    Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
    Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
cm.c 112 KB