  1. 27 Sep, 2017 2 commits
  2. 20 Apr, 2017 2 commits
  3. 10 Jan, 2017 1 commit
    • iw_cxgb4: refactor sq/rq drain logic · 4fe7c296
      Steve Wise authored
      With the addition of the IB/Core drain API, iw_cxgb4 supported drain
      by watching the CQs when the QP was out of RTS and signalling "drain
      complete" when the last CQE was polled.  This, however, doesn't fully
      support the drain semantics: the drain logic is supposed to signal
      "drain complete" only when the application has _processed_ the last
      CQE, not merely removed it from the CQ.  Thus a small timing hole
      exists that can cause touch-after-free bugs in applications using the
      drain API (nvmf and iSER, for example), so iw_cxgb4 needs a better
      solution.
      
      The iWARP Verbs spec mandates that "_at some point_ after the QP is
      moved to ERROR", the iWARP driver MUST synchronously fail post_send
      and post_recv calls.  Until now, iw_cxgb4 did not allow any posts once
      the QP was in ERROR.  This was in part because the HW queues for a QP
      in the ERROR state are disabled at that point, so there wasn't much
      else to do but fail the post operation synchronously.  This
      restriction is what drove the first drain implementation in iw_cxgb4,
      which has the flaw mentioned above.
      
      This patch changes iw_cxgb4 to allow post_send and post_recv WRs after
      the QP is moved to the ERROR state for kernel mode users: user mode
      users still see the Verbs-mandated synchronous failure, but kernel
      users may post flush WRs.  Since the HW queues are disabled, we just
      synthesize a CQE for each such post, queue it to the SW CQ, and then
      call the CQ event handler.  This enables proper drain operations for
      the various storage applications.
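      The synthesized-flush idea above can be sketched in user-space C.  All
      names here (sw_cq, sw_cqe, post_send) are hypothetical illustrations,
      not the actual iw_cxgb4 structures:

      ```c
      /* Minimal sketch of the flush-CQE approach: once the QP is in ERROR
         the HW queues are disabled, so instead of failing the post we
         synthesize a FLUSHED CQE, queue it on the SW CQ, and invoke the CQ
         event handler.  All types here are hypothetical. */
      #include <assert.h>
      #include <stddef.h>

      enum qp_state { QP_RTS, QP_ERROR };
      enum cqe_status { CQE_OK, CQE_FLUSHED };

      struct sw_cqe { unsigned long wr_id; enum cqe_status status; };

      #define CQ_DEPTH 16
      struct sw_cq {
          struct sw_cqe queue[CQ_DEPTH];
          unsigned head, tail;
          void (*event_handler)(struct sw_cq *cq);  /* ULP handler, may be NULL */
      };

      struct qp { enum qp_state state; struct sw_cq *scq; };

      static int post_send(struct qp *qp, unsigned long wr_id)
      {
          if (qp->state == QP_ERROR) {
              struct sw_cq *cq = qp->scq;

              /* Synthesize a flush completion and queue it to the SW CQ. */
              cq->queue[cq->tail % CQ_DEPTH] =
                  (struct sw_cqe){ .wr_id = wr_id, .status = CQE_FLUSHED };
              cq->tail++;
              if (cq->event_handler)
                  cq->event_handler(cq);  /* let the app process it */
              return 0;
          }
          /* normal HW post path elided */
          return 0;
      }
      ```

      Because the application reaps the synthesized CQE through its normal
      completion path, "drain complete" is only signalled once the last CQE
      has genuinely been processed.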
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  4. 07 Oct, 2016 1 commit
    • iw_cxgb4: add fast-path for small REG_MR operations · 49b53a93
      Steve Wise authored
      When processing a REG_MR work request, if the firmware supports the
      FW_RI_NSMR_TPTE_WR work request, the page list for the registration is
      <= 2 pages, and the current state of the MR is INVALID, then use
      FW_RI_NSMR_TPTE_WR to pass down a fully populated TPTE for the FW to
      write.  This avoids the FW having to do an async read of the TPTE,
      which blocks the SQ until the read completes.
      
      To know if the current MR state is INVALID or not, iw_cxgb4 must track the
      state of each fastreg MR.  The c4iw_mr struct state is updated as REG_MR
      and LOCAL_INV WRs are posted and completed, when a reg_mr is destroyed,
      and when RECV completions are processed that include a local invalidation.
      
      This optimization increases small IO IOPS for both iSER and NVMF.
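      The fast-path decision described above can be sketched as a simple
      predicate.  The names (mr_state, use_tpte_fastpath) are illustrative,
      not actual driver symbols:

      ```c
      /* Sketch: take the "write the TPTE directly" fast path only when the
         firmware supports the dedicated WR, the page list fits in 2 pages,
         and the MR is currently INVALID.  Names are hypothetical. */
      enum mr_state { MR_INVALID, MR_VALID };

      struct mr {
          enum mr_state state;   /* updated as REG_MR/LOCAL_INV WRs complete */
          int npages;            /* pages in this registration's page list */
      };

      static int use_tpte_fastpath(const struct mr *mr, int fw_supports_tpte_wr)
      {
          return fw_supports_tpte_wr &&
                 mr->npages <= 2 &&
                 mr->state == MR_INVALID;
      }
      ```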
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  5. 23 Aug, 2016 1 commit
  6. 24 Dec, 2015 1 commit
  7. 22 Oct, 2015 1 commit
  8. 11 Jun, 2015 1 commit
  9. 05 May, 2015 1 commit
  10. 16 Jan, 2015 2 commits
  11. 05 Jan, 2015 1 commit
  12. 01 Aug, 2014 1 commit
    • RDMA/cxgb4: Only call CQ completion handler if it is armed · 678ea9b5
      Steve Wise authored
      The function __flush_qp() always calls the ULP's CQ completion handler
      functions, even if the CQ was not armed.  This can crash the system if
      the function pointer is NULL.  The iSER ULP behaves this way: it has
      no completion handler and never arms the CQ for notification.  So now
      we track whether the CQ is armed at flush time and only call the
      completion handlers if their CQs were armed.
      
      Also, if the RCQ and SCQ are the same CQ, the completion handler was
      being called twice.  It should only be called once, after all SQ and
      RQ WRs are flushed from the QP.  So rearrange the logic to fix this.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  13. 22 Jul, 2014 2 commits
  14. 15 Jul, 2014 3 commits
    • cxgb4/iw_cxgb4: work request logging feature · 7730b4c7
      Hariprasad Shenai authored
      This commit enhances the iwarp driver to optionally keep a log of RDMA
      work request timing data for kernel mode QPs.  If the iw_cxgb4 module
      option c4iw_wr_log is set to non-zero, each work request is tracked
      and timing data maintained in a rolling log that is 4096 entries deep
      by default.  Module option c4iw_wr_log_size_order allows specifying a
      log2 size to use instead of the default order of 12 (4096 entries).
      Both module options are read-only and must be passed in at module load
      time, e.g.:
      
      modprobe iw_cxgb4 c4iw_wr_log=1 c4iw_wr_log_size_order=10
      
      The timing data is viewable via the iw_cxgb4 debugfs file "wr_log".
      Writing anything to this file will clear all the timing data.
      Data tracked includes:
      
      - The host time when the work request was posted, just before ringing
      the doorbell, and the host time when the completion was polled by the
      application (which is also the time the log entry is created).  The
      delta of these two times is the time taken to process the work
      request.
      
      - The qid of the EQ used to post the work request.
      
      - The work request opcode.
      
      - The CQE wr_id field.  For SQ completions this is the swsqe index;
      for RECV completions it is the MSN of the ingress SEND.  This value
      can be used to match log entries from this log with firmware flowc
      event entries.
      
      - The SGE timestamp value just before ringing the doorbell when
      posting, the SGE timestamp value just after polling the completion,
      and the CQE.timestamp field from the completion itself.  With these
      three timestamps we can track the latency from post to poll, and the
      amount of time the completion resided in the CQ before being reaped by
      the application.  With debug firmware, the SGE timestamp is also
      logged by firmware in its flowc history, so we can compute the latency
      from posting the work request until the firmware sees it.
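      The rolling log described above amounts to a power-of-two ring indexed
      by a monotonically increasing counter.  A minimal sketch, with
      illustrative field names rather than the driver's actual layout:

      ```c
      /* Sketch of the rolling WR log: a power-of-two ring indexed by an
         ever-increasing counter, so the oldest entries are overwritten once
         the log wraps.  Field names are hypothetical. */
      #include <assert.h>
      #include <stdint.h>

      #define WR_LOG_ORDER 12                    /* default: 2^12 = 4096 entries */
      #define WR_LOG_SIZE  (1u << WR_LOG_ORDER)

      struct wr_log_entry {
          uint64_t post_host_time;   /* just before ringing the doorbell */
          uint64_t poll_host_time;   /* when the completion was polled */
          uint32_t qid;              /* EQ used to post the WR */
          uint8_t  opcode;           /* work request opcode */
          uint64_t wr_id;            /* swsqe index or ingress SEND MSN */
      };

      struct wr_log {
          struct wr_log_entry entries[WR_LOG_SIZE];
          uint32_t idx;              /* monotonically increasing */
      };

      static void wr_log_add(struct wr_log *log, struct wr_log_entry e)
      {
          /* Mask instead of modulo: WR_LOG_SIZE is a power of two. */
          log->entries[log->idx & (WR_LOG_SIZE - 1)] = e;
          log->idx++;
      }
      ```

      Keeping the index monotonic (and masking on use) also makes it cheap
      to tell how many entries have been overwritten since the log was last
      cleared.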
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4/iw_cxgb4: display TPTE on errors · 031cf476
      Hariprasad Shenai authored
      With ingress WRITE or READ RESPONSE errors, HW provides the offending
      stag from the packet.  This patch adds logic to log the parsed TPTE
      in this case. cxgb4 now exports a function to read a TPTE entry
      from adapter memory.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • iw_cxgb4: Detect Ing. Padding Boundary at run-time · 04e10e21
      Hariprasad Shenai authored
      Updates iw_cxgb4 to determine the Ingress Padding Boundary from
      cxgb4_lld_info, and take subsequent actions.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 11 Jun, 2014 1 commit
    • iw_cxgb4: Allocate and use IQs specifically for indirect interrupts · cf38be6d
      Hariprasad Shenai authored
      Currently indirect interrupts for RDMA CQs funnel through the LLD's
      RDMA RXQs, which also handle direct interrupts for offload CPLs during
      RDMA connection setup/teardown.  The intended T4 usage model, however,
      is to have indirect interrupts flow through dedicated IQs, i.e. not to
      mix indirect interrupts with CPL messages in an IQ.  This patch adds
      the concept of RDMA concentrator IQs, or CIQs, set up and maintained
      by the LLD and exported to iw_cxgb4 for use when creating CQs.  RDMA
      CPLs will flow through the LLD's RDMA RXQs, and CQ interrupts flow
      through the CIQs.
      
      Design:
      
      cxgb4 creates and exports an array of CIQs for the RDMA ULD.  These
      IQs are sized according to the max number of CQs available at adapter
      init.  In addition, these IQs don't need FL buffers, since they only
      service indirect interrupts.  One CIQ is set up per RX channel,
      similar to the RDMA RXQs.
      
      iw_cxgb4 will utilize these CIQs based on the vector value passed into
      create_cq().  The num_comp_vectors advertised by iw_cxgb4 will be the
      number of CIQs configured, and thus the vector value will be the index
      into the array of CIQs.
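      The vector-to-CIQ mapping is simply an array index, bounded by the
      advertised number of comp vectors.  A sketch with illustrative names:

      ```c
      /* Sketch: the comp vector passed to create_cq() indexes the CIQ
         array exported by the LLD.  Names are hypothetical, not the actual
         cxgb4/iw_cxgb4 symbols. */
      #include <assert.h>
      #include <stddef.h>

      struct ciq { int iq_id; };

      struct lld_info {
          struct ciq *ciqs;     /* one per RX channel, no FL buffers needed */
          int nciq;             /* advertised as num_comp_vectors */
      };

      static struct ciq *ciq_for_vector(const struct lld_info *lld, int vector)
      {
          if (vector < 0 || vector >= lld->nciq)
              return NULL;      /* invalid comp vector */
          return &lld->ciqs[vector];
      }
      ```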
      
      Based on original work by Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 11 Apr, 2014 3 commits
  17. 15 Mar, 2014 1 commit
    • cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes · 05eb2389
      Steve Wise authored
      The current logic suffers from a slow response time when disabling
      user DB usage, and also fails to avoid DB FIFO drops under heavy load.
      This commit fixes these deficiencies and makes the avoidance logic
      more effective.  This is done by more efficiently notifying the ULDs
      of potential DB problems, and by implementing a smoother flow control
      algorithm in iw_cxgb4, which is the ULD that puts the most load on the
      DB FIFO.
      
      Design:
      
      cxgb4:
      
      Direct ULD callback from the DB FULL/DROP interrupt handler.  This allows
      the ULD to stop doing user DB writes as quickly as possible.
      
      While user DB usage is disabled, the LLD will accumulate DB write events
      for its queues.  Then once DB usage is reenabled, a single DB write is
      done for each queue with its accumulated write count.  This reduces the
      load put on the DB fifo when reenabling.
      
      iw_cxgb4:
      
      Instead of marking each QP to indicate DB writes are disabled, we
      create a device-global status page that each user process maps.  This
      allows iw_cxgb4 to set just this single bit to disable all DB writes
      for all user QPs, instead of traversing the idr of all the active QPs.
      If libcxgb4 doesn't support this, we fall back to the old approach of
      marking each QP.  Thus the new driver works with an older libcxgb4.
      
      When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB
      writes via the status page and transition the DB state to STOPPED.  As
      user processes see that DB writes are disabled, they call into
      iw_cxgb4 to submit their DB write events.  Since the DB state is
      STOPPED, the QP trying to write is enqueued on a new DB "flow control"
      list.  As subsequent DB writes are submitted for this flow-controlled
      QP, the number of writes is accumulated for each QP on the flow
      control list.  So all the user QPs that are actively ringing the DB
      get put on this list, and the number of writes they request is
      accumulated.
      
      When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which happens in a
      workq context, we change the DB state to FLOW_CONTROL and begin
      resuming all the QPs on the flow control list.  This logic runs until
      the flow control list is empty or we exit FLOW_CONTROL mode (due to a
      DB DROP upcall, for example).  QPs are removed from this list and
      their accumulated DB write counts written to the DB FIFO.  Sets of
      QPs, called chunks in the code, are removed at one time; the chunk
      size is 64, so 64 QPs are resumed at a time, and before the next chunk
      is resumed the logic waits (blocks) for the DB FIFO to drain.  This
      prevents resuming too quickly and overflowing the FIFO.  Once the flow
      control list is empty, the DB state transitions back to NORMAL and
      user QPs are again allowed to write directly to the user DB register.
      
      The algorithm is designed such that if the DB write load is high enough,
      then all the DB writes get submitted by the kernel using this flow
      controlled approach to avoid DB drops.  As the load lightens though, we
      resume to normal DB writes directly by user applications.
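      The state machine above can be sketched in user-space C.  The names
      are illustrative, and the "wait for the DB FIFO to drain" step is
      stubbed out:

      ```c
      /* Simplified sketch of the DB flow-control design: a status-page bit
         stops all user DB writes at once, flow-controlled QPs accumulate
         write counts on a list, and resume happens in chunks of 64 with a
         FIFO drain between chunks.  All names are hypothetical. */
      #include <assert.h>
      #include <stddef.h>

      enum db_state { NORMAL, STOPPED, FLOW_CONTROL };

      #define DB_FC_CHUNK 64

      struct fc_qp {
          unsigned pending_writes;      /* accumulated while flow controlled */
          struct fc_qp *next;
      };

      struct dev {
          enum db_state state;
          int status_page_db_off;       /* single bit user processes poll */
          struct fc_qp *fc_list;        /* QPs waiting to be resumed */
      };

      static void db_full_upcall(struct dev *d)
      {
          d->status_page_db_off = 1;    /* stop all user DB writes at once */
          d->state = STOPPED;
      }

      /* A user QP that saw DB writes disabled submits its write event here:
         first event enqueues the QP, later events just add to its count. */
      static void db_submit(struct dev *d, struct fc_qp *qp, unsigned count)
      {
          if (qp->pending_writes == 0) {
              qp->next = d->fc_list;
              d->fc_list = qp;
          }
          qp->pending_writes += count;
      }

      static void wait_for_fifo_drain(struct dev *d) { (void)d; /* stub */ }

      /* DB EMPTY upcall: resume QPs in chunks, writing each QP's
         accumulated count to the DB FIFO and draining between chunks. */
      static void db_empty_upcall(struct dev *d)
      {
          d->state = FLOW_CONTROL;
          while (d->fc_list && d->state == FLOW_CONTROL) {
              for (int i = 0; i < DB_FC_CHUNK && d->fc_list; i++) {
                  struct fc_qp *qp = d->fc_list;
                  d->fc_list = qp->next;
                  /* ring the DB once with qp->pending_writes, then reset */
                  qp->pending_writes = 0;
              }
              wait_for_fifo_drain(d);
          }
          if (!d->fc_list) {
              d->state = NORMAL;
              d->status_page_db_off = 0;  /* users write DBs directly again */
          }
      }
      ```

      Resuming in bounded chunks with a drain in between is what keeps the
      kernel from simply re-creating the overflow it is trying to avoid.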
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 13 Aug, 2013 3 commits
  19. 14 Mar, 2013 2 commits
  20. 18 May, 2012 1 commit
  21. 27 Apr, 2011 1 commit
  22. 14 Mar, 2011 1 commit
  23. 28 Sep, 2010 2 commits
  24. 08 Aug, 2010 1 commit
  25. 21 Jul, 2010 1 commit
    • RDMA/cxgb4: Support variable sized work requests · d37ac31d
      Steve Wise authored
      T4 EQ entries are in multiples of 64 bytes.  Currently the RDMA SQ and
      RQ use fixed-size entries composed of 4 EQ entries for the SQ and 2 EQ
      entries for the RQ.  For optimal latency with small IO, we need to
      change this so the HW only needs to DMA the EQ entries actually used
      by a given work request.
      
      Implementation:
      
      - add wq_pidx counter to track where we are in the EQ.  cidx/pidx are
        used for the sw sq/rq tracking and flow control.
      
      - the variable part of a work request is the SGL.  Add new functions
        to build the SGL and/or immediate data directly in the EQ memory,
        wrapping when needed.
      
      - adjust the min burst size for the EQ contexts to 64B.
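      The accounting behind variable-size entries can be sketched as two
      small helpers.  The names are illustrative, not the driver's symbols:

      ```c
      /* Sketch: a work request consumes only as many 64-byte EQ slots as
         its header plus SGL actually need, and the producer index wraps
         around the EQ.  Names are hypothetical. */
      #include <assert.h>

      #define EQ_ENTRY_SIZE 64

      /* Number of 64B EQ slots needed for a WR of `len` bytes (header+SGL). */
      static unsigned wr_eq_slots(unsigned len)
      {
          return (len + EQ_ENTRY_SIZE - 1) / EQ_ENTRY_SIZE;
      }

      /* Advance the wq producer index by the slots used, wrapping at the
         EQ size; cidx/pidx still drive the SW SQ/RQ flow control. */
      static unsigned wq_pidx_advance(unsigned pidx, unsigned slots,
                                     unsigned eq_size)
      {
          return (pidx + slots) % eq_size;
      }
      ```

      With fixed-size entries every SQ WR cost 4 slots of DMA; with this
      scheme a small WR that fits in one 64B slot costs exactly one.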
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  26. 06 Jul, 2010 1 commit
  27. 25 May, 2010 2 commits