1. 13 Feb, 2019 2 commits
  2. 11 Feb, 2019 14 commits
  3. 09 Feb, 2019 5 commits
    • Merge branch 'wip/dl-for-next' into for-next · 82771f20
      Doug Ledford authored
      Due to concurrent work by myself and Jason, a normal fast forward merge
      was not possible.  This brings in a number of hfi1 changes, mainly the
      hfi1 TID RDMA support (roughly 10,000 LOC change), which was reviewed
      and integrated over a period of days.
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • Merge branch 'hfi1-tid' into wip/dl-for-next · 416fbc1b
      Doug Ledford authored
      Omni-Path TID RDMA Feature
      
      Intel Omni-Path (OPA) TID RDMA support is a feature that accelerates
      data movement between two OPA nodes through the IB Verbs interface. It
      improves RDMA READ/WRITE performance by delivering the data payload to a
      user buffer directly without any software copying.
      
      Architecture
      =============
      The TID RDMA protocol is implemented on the hfi1 driver level and is
      therefore transparent to the ULPs. It is designed to facilitate the data
      transactions for two specific RDMA requests:
        - RDMA READ;
        - RDMA WRITE.
      Previously, when a verbs data packet was received at the destination
      (requester side for RDMA READ and responder side for RDMA WRITE),
      software copied the data payload to the user buffer, which significantly
      degraded performance for large requests.
      
      Internally, hfi1 converts qualified RDMA READ/WRITE requests into TID
      RDMA READ/WRITE requests when the requests are posted to the hfi1
      driver. Non-qualified RDMA requests are handled by the normal RDMA
      protocol.
      
      For TID RDMA requests, hardware resources (hardware flow and TID entries)
      are allocated on the destination side (the requester side for TID RDMA
      READ and the responder side for TID RDMA WRITE). The information for
      these resources is conveyed to the data source side (the responder side
      for TID RDMA READ and the requester side for TID RDMA WRITE) and embedded
      in data packets. When data packets are received by the destination,
      hardware delivers the data payload to the destination buffer without
      involving software, which improves performance.
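
      The role mapping described above can be summarized in a short C
      sketch. The enum and function names are hypothetical and are not taken
      from the hfi1 driver; the sketch only encodes which side acts as the
      destination (and allocates the hardware flow and TID entries) and which
      side acts as the data source for each request type.

```c
/* Hypothetical types for illustration; not the hfi1 driver's own. */
enum sketch_op { SKETCH_TID_RDMA_READ, SKETCH_TID_RDMA_WRITE };
enum sketch_side { SKETCH_REQUESTER, SKETCH_RESPONDER };

/* Destination side: receives the payload and allocates TID resources. */
static enum sketch_side sketch_destination(enum sketch_op op)
{
    return op == SKETCH_TID_RDMA_READ ? SKETCH_REQUESTER : SKETCH_RESPONDER;
}

/* Data source side: embeds the conveyed resource info in data packets. */
static enum sketch_side sketch_source(enum sketch_op op)
{
    return sketch_destination(op) == SKETCH_REQUESTER ? SKETCH_RESPONDER
                                                      : SKETCH_REQUESTER;
}
```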
      
      Details
      =======
      RDMA READ/WRITE requests are qualified by the following:
        - Total data length >= 256k;
        - Total data length is a multiple of 4K pages.
      
      Additional qualifications are enforced for the destination buffers:
        For RDMA READ:
          - Each destination sge buffer is 4K aligned;
          - Each destination sge buffer is a multiple of 4K pages.
        For RDMA WRITE:
          - The destination address is 4K aligned.
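
      As a rough illustration, the qualification rules listed above could be
      checked as in the following C sketch. The struct, macros, and function
      names are hypothetical and do not come from the hfi1 driver, and the
      RDMA WRITE rule is read here as an alignment check on the destination
      address, which is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TID_MIN_LEN (256u * 1024u) /* minimum total length: 256K */
#define SKETCH_PAGE_SIZE   4096u          /* 4K page */

/* Hypothetical scatter/gather element; not the driver's real type. */
struct sketch_sge {
    uint64_t addr;   /* buffer start address */
    uint32_t length; /* buffer length in bytes */
};

static bool sketch_is_4k_multiple(uint64_t v)
{
    return (v % SKETCH_PAGE_SIZE) == 0;
}

/* Common rules: total length >= 256K and a multiple of 4K pages. */
static bool sketch_len_qualifies(uint64_t total_len)
{
    return total_len >= SKETCH_TID_MIN_LEN && sketch_is_4k_multiple(total_len);
}

/* RDMA READ: every destination sge is 4K aligned and a 4K multiple. */
static bool sketch_read_qualifies(const struct sketch_sge *sge, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (!sketch_is_4k_multiple(sge[i].addr) ||
            !sketch_is_4k_multiple(sge[i].length))
            return false;
        total += sge[i].length;
    }
    return sketch_len_qualifies(total);
}

/* RDMA WRITE: destination address (assumed meaning) is 4K aligned. */
static bool sketch_write_qualifies(uint64_t dest_addr, uint64_t total_len)
{
    return sketch_is_4k_multiple(dest_addr) && sketch_len_qualifies(total_len);
}
```

      Requests that fail any of these checks would fall back to the normal
      RDMA protocol, as stated in the Architecture section.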
      
      In addition, in an OPA fabric, some nodes may support TID RDMA while
      others may not. As such, it is important for two transaction nodes to
      exchange the information about the features they support. This discovery
      mechanism is called OPA Feature Negotiation (OPFN) and is described in
      detail in the patch series. Through OPFN, two nodes can determine whether
      they both support TID RDMA and subsequently convert RDMA requests into
      TID RDMA requests.
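
      The outcome of the negotiation can be pictured with the C sketch below.
      It assumes each node advertises a capability bitmask, which is a
      simplification for illustration only: the actual OPFN exchange format is
      defined in the patch series and is not shown here, and the bit
      assignment and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bit; the real OPFN encoding is not shown here. */
#define SKETCH_FEAT_TID_RDMA (1u << 0)

/* A feature is usable only when both nodes advertise it. */
static uint32_t sketch_negotiate(uint32_t local_caps, uint32_t remote_caps)
{
    return local_caps & remote_caps;
}

/* Convert RDMA requests to TID RDMA only if both sides support it. */
static bool sketch_use_tid_rdma(uint32_t local_caps, uint32_t remote_caps)
{
    return (sketch_negotiate(local_caps, remote_caps) &
            SKETCH_FEAT_TID_RDMA) != 0;
}
```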
      
      * hfi1-tid: (46 commits)
        IB/hfi1: Prioritize the sending of ACK packets
        IB/hfi1: Add static trace for TID RDMA WRITE protocol
        IB/hfi1: Enable TID RDMA WRITE protocol
        IB/hfi1: Add interlock between TID RDMA WRITE and other requests
        IB/hfi1: Add TID RDMA WRITE functionality into RDMA verbs
        IB/hfi1: Add the dual leg code
        IB/hfi1: Add the TID second leg ACK packet builder
        IB/hfi1: Add the TID second leg send packet builder
        IB/hfi1: Resend the TID RDMA WRITE DATA packets
        IB/hfi1: Add a function to receive TID RDMA RESYNC packet
        IB/hfi1: Add a function to build TID RDMA RESYNC packet
        IB/hfi1: Add TID RDMA retry timer
        IB/hfi1: Add a function to receive TID RDMA ACK packet
        IB/hfi1: Add a function to build TID RDMA ACK packet
        IB/hfi1: Add a function to receive TID RDMA WRITE DATA packet
        IB/hfi1: Add a function to build TID RDMA WRITE DATA packet
        IB/hfi1: Add a function to receive TID RDMA WRITE response
        IB/hfi1: Add TID resource timer
        IB/hfi1: Add a function to build TID RDMA WRITE response
        IB/hfi1: Add functions to receive TID RDMA WRITE request
        ...
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • iw_cxgb4: fix srqidx leak during connection abort · f368ff18
      Raju Rangoju authored
      When an application aborts the connection by moving QP from RTS to ERROR,
      then iw_cxgb4's modify_rc_qp() RTS->ERROR logic sets the
      *srqidxp to 0 via t4_set_wq_in_error(&qhp->wq, 0), and aborts the
      connection by calling c4iw_ep_disconnect().
      
      c4iw_ep_disconnect() does the following:
       1. sends up a close_complete_upcall(ep, -ECONNRESET) to libcxgb4.
       2. sends abort request CPL to hw.
      
      However, since close_complete_upcall() is sent before the ABORT_REQ is
      sent to hw, libcxgb4 fails to release the srqidx if the connection holds
      one, because the srqidx is passed up to libcxgb4 only after the
      corresponding ABORT_RPL is processed by the kernel in abort_rpl().
      
      This patch handles the corner case by moving the call to
      close_complete_upcall() from c4iw_ep_disconnect() to abort_rpl(), so that
      libcxgb4 is notified about the -ECONNRESET only after abort_rpl() and
      can relinquish the srqidx properly.
      Signed-off-by: Raju Rangoju <rajur@chelsio.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • iw_cxgb4: complete the cached SRQ buffers · 11a27e21
      Raju Rangoju authored
      If TP fetches an SRQ buffer but ends up not using it before the connection
      is aborted, then it passes the index of that SRQ buffer to the host in
      the ABORT_REQ_RSS or ABORT_RPL CPL message.
      
      But, if the srqidx field is zero in the received ABORT_RPL or
      ABORT_REQ_RSS CPL, then we need to read the tcb.rq_start field to see if
      it really did have an RQE cached. This works around a case where HW does
      not include the srqidx in the ABORT_RPL/ABORT_REQ_RSS CPL.
      
      The final value of rq_start is the one present in the TCB with the
      TF_RX_PDU_OUT bit cleared. So, we need to read the TCB and examine
      TF_RX_PDU_OUT (bit 49 of t_flags) in order to determine if there is an rx
      PDU feedback event pending.
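
      The bit test described above amounts to the following C sketch. It
      assumes t_flags has already been read from the TCB into a 64-bit value;
      the macro and function names are hypothetical and are not the ones
      added by the driver patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position stated in the commit message: bit 49 of t_flags. */
#define SKETCH_TF_RX_PDU_OUT_BIT 49

/* True when an rx PDU feedback event is still pending. */
static bool sketch_rx_pdu_out(uint64_t t_flags)
{
    return ((t_flags >> SKETCH_TF_RX_PDU_OUT_BIT) & 1u) != 0;
}

/* rq_start holds its final value only once TF_RX_PDU_OUT is cleared. */
static bool sketch_rq_start_is_final(uint64_t t_flags)
{
    return !sketch_rx_pdu_out(t_flags);
}
```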
      Signed-off-by: Raju Rangoju <rajur@chelsio.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • cxgb4: add tcb flags and tcb rpl struct · e381a1cb
      Raju Rangoju authored
      This patch adds the tcb flags and structures needed for querying tcb
      information.
      Signed-off-by: Raju Rangoju <rajur@chelsio.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  4. 08 Feb, 2019 19 commits