Commits · fd47c2f99f04249d1ba82c422d1818dcbe193908 · Kirill Smelkov / linux

19 Feb, 2019 1 commit

RDMA/restrack: Convert internal DB from hash to XArray · fd47c2f9

Leon Romanovsky authored Feb 18, 2019

The additions of .doit callbacks posses new access pattern to the resource
entries by some user visible index. Back then, the legacy DB was
implemented as hash because per-index access wasn't needed and XArray
wasn't accepted yet.

Acceptance of XArray together with per-index access requires the refresh
of DB implementation.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

fd47c2f9

16 Feb, 2019 5 commits

RDMA/core: Move device addition deletion to device.c · 5f8f5499

Parav Pandit authored Feb 13, 2019

Move core device addition and removal from sysfs.c to device.c as device.c
is more appropriate place for device management.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

5f8f5499

RDMA/core: Introduce and use ib_setup_port_attrs() · 5767198a

Parav Pandit authored Feb 13, 2019

Refactor code for device and port sysfs attributes for reuse.

While at it, rename counter part free function to ib_free_port_attrs.

Also attribute setup sequence is:
(a) port specific init.
(b) device stats alloc/init.

So for cleanup, follow reverse sequence:
(a) device stats dealloc
(b) port specific cleanup
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

5767198a

RDMA/core: Use simpler device_del() instead of device_unregister() · e155755e

Parav Pandit authored Feb 13, 2019

Instead of holding extra reference using get_device() that
device_unregister() releases, simplify it as below.

device_add() balances with device_del().  device_initialize() balances
with put_device(), always via ib_dealloc_device().
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

e155755e

RDMA/cxgb4: Remove kref accounting for sync operation · cfe876d8

Leon Romanovsky authored Feb 12, 2019

Ucontext allocation and release aren't async events and don't need kref
accounting. The common layer of RDMA subsystem ensures that dealloc
ucontext will be called after all other objects are released.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

cfe876d8

RDMA/nes: Remove useless usecnt variable and redundant memset · be56b07b

Leon Romanovsky authored Feb 12, 2019

The internal design of RDMA/core ensures that there dealloc ucontext will
be called only if alloc_ucontext succeeded, hence there is no need to
manage internal variable to mark validity of ucontext.

As part of this change, remove redundant memeset too.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

be56b07b

15 Feb, 2019 5 commits

IB/hfi1: Fix a build warning for TID RDMA READ · e50838c2

Kaike Wan authored Feb 15, 2019

The following build warning was produced for the TID RDMA READ
patch ("IB/hfi1: Enable TID RDMA READ protocol"):

drivers/infiniband/hw/hfi1/qp.c: In function 'hfi1_setup_wqe':
drivers/infiniband/hw/hfi1/qp.c:328:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
   hfi1_setup_tid_rdma_wqe(qp, wqe);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/hfi1/qp.c:329:2: note: here
  case IB_QPT_UC:
  ^~~~

This patch will fix the issue by adding the "fall through" comment.

Fixes: f1ab4efa ("IB/hfi1: Enable TID RDMA READ protocol")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

e50838c2

RDMA/uverbs: Fix an error flow in ib_uverbs_poll_cq · 9a778678

Jason Gunthorpe authored Feb 14, 2019

The new output_written block was wrongly placed before the ret=0, causing
the error code to be lost. uverbs_output_written is not expected to fail,
and even if it does fail it has no significant impact on the userspace
flow.
Reported-by: Bart Van Assche <bvanassche@acm.org>
Fixes: d6f4a21f ("RDMA/uverbs: Mark ioctl responses with UVERBS_ATTR_F_VALID_OUTPUT")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>

9a778678

IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIs · 89944450

Shamir Rabinovitch authored Feb 07, 2019

Now when we have the udata passed to all the ib_xxx object creation APIs
and the additional macro 'rdma_udata_to_drv_context' to get the
ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally
start to remove the dependency of the drivers in the
ib_xxx->uobject->context.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

89944450

IB/verbs: Add helper function rdma_udata_to_drv_context · 730623f4

Shamir Rabinovitch authored Feb 07, 2019

Helper function to get driver's context out of ib_udata wrapped in
uverbs_attr_bundle for user objects or NULL for kernel objects.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

730623f4

IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flows · 3d9dfd06

Shamir Rabinovitch authored Feb 07, 2019

Add ib_ucontext to the uverbs_attr_bundle sent down the iocl and cmd flows
as soon as the flow has ib_uobject.

In addition, remove rdma_get_ucontext helper function that is only used by
ib_umem_get.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

3d9dfd06

14 Feb, 2019 7 commits

IB/ipoib: Use __func__ instead of function's name · 0dd9ce18

Erez Alfasi authored Feb 13, 2019

Changed debug statements to use %s and __func__ instead
of hard-coded function's name.
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

0dd9ce18

RDMA/iwpm: Remove set but not used variable 'msg_seq' · 36f0a1cc

YueHaibing authored Feb 14, 2019

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/infiniband/core/iwpm_util.c: In function 'iwpm_send_hello':
drivers/infiniband/core/iwpm_util.c:811:6: warning:
 variable 'msg_seq' set but not used [-Wunused-but-set-variable]

It never used since introduction in commit b0bad9ad ("RDMA/IWPM:
Support no port mapping requirements")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

36f0a1cc

RDMA/hns: Configure capacity of hns device · dad1f980

Lijun Ou authored Feb 03, 2019

This patch adds new device capability for IB_DEVICE_MEM_MGT_EXTENSIONS to
indicate device support for the following features:

1. Fast register memory region.
2. send with remote invalidate by frmr
3. local invalidate memory regsion

As well as adds the max depth of frmr page list len.
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

dad1f980

RDMA/hns: Delete useful prints for aeq subtype event · e95c716c

Yixian Liu authored Feb 03, 2019

Current all messages printed for aeq subtype event are wrong.  Thus,
delete them and only the value of subtype event is printed.
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

e95c716c

RDMA/hns: Set allocated memory to zero for wrid · f7f27a5f

Yixian Liu authored Feb 03, 2019

The memory allocated for wrid should be initialized to zero.
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

f7f27a5f

RDMA/hns: Fix the state of rereg mr · ab22bf05

Yixian Liu authored Feb 03, 2019

The state of mr after reregister operation should be set to valid
state. Otherwise, it will keep the same as the state before reregistered.
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

ab22bf05

RDMA/hns: Limit minimum ROCE CQ depth to 64 · 704e0e61

chenglang authored Feb 03, 2019

This patch modifies the minimum CQ depth specification of hip08 and is
consistent with the processing of hip06.
Signed-off-by: chenglang <chenglang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

704e0e61

13 Feb, 2019 4 commits

RDMA/nes: Use for_each_sg_dma_page iterator for umem SGL · 52a572e9

Shiraz, Saleem authored Feb 12, 2019

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

Additionally, purge umem->page_shift usage in the driver as its only
relevant for ODP MRs. Use system page size and shift instead.
Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

52a572e9

RDMA/rdmavt: Adapt to handle non-uniform sizes on umem SGEs · 36d57708

Shiraz, Saleem authored Feb 12, 2019

rdmavt expects a uniform size on all umem SGEs which is currently at
PAGE_SIZE.

Adapt to a umem API change which could return non-uniform sized SGEs due
to combining contiguous PAGE_SIZE regions into an SGE. Use
for_each_sg_page variant to unfold the larger SGEs into a list of
PAGE_SIZE elements.

Additionally, purge umem->page_shift usage in the driver as its only
relevant for ODP MRs. Use system page size and shift instead.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

36d57708

RDMA: Fix allocation failure on pointer pd · e8ac9389

Colin Ian King authored Feb 12, 2019

The null check on an allocation failure on pd is currently checking
if pd is non-null rather than null. Fix this by adding the missing !
operator.

Fixes: 21a428a0 ("RDMA: Handle PD allocations by IB/core")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

e8ac9389

Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma into for-next · d892273b

Doug Ledford authored Feb 13, 2019

I had merged the hfi1-tid code into my local copy of for-next, but was
waiting on 0day testing before pushing it (I pushed it to my wip
branch).  Having waited several days for 0day testing to show up, I'm
finally just going to push it out.  In the meantime, though, Jason
pushed other stuff to for-next, so I needed to merge up the branches
before pushing.
Signed-off-by: Doug Ledford <dledford@redhat.com>

d892273b

11 Feb, 2019 14 commits

RDMA/bnxt_re: fix or'ing of data into an uninitialized struct member · a8714595

Colin Ian King authored Feb 11, 2019

The struct member comp_mask has not been initialized however a bit
pattern is being bitwise or'd into the member and hence other bit
fields in comp_mask may contain any garbage from the stack. Fix this
by making the bitwise or into an assignment.

Fixes: 95b86d1c ("RDMA/bnxt_re: Update kernel user abi to pass chip context")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

a8714595

RDMA/mlx5: Fix memory leak in case we fail to add an IB device · fc9e4477

Mark Bloch authored Feb 11, 2019

Make sure the IB device is freed on failure.

Fixes: b5ca15ad ("IB/mlx5: Add proper representors support")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

fc9e4477

IB/mlx5: Fix bad flow upon DEVX mkey creation · 0da4d48d

Yishai Hadas authored Feb 11, 2019

Fix bad flow upon DEVX mkey creation to prevent deleting the indirect mkey
from the radix tree in case there was a previous failure to insert it.

Fixes: 534fd7aa ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

0da4d48d

RDMA/rxe: Use for_each_sg_page iterator on umem SGL · 8317d6cd

Shiraz, Saleem authored Feb 11, 2019

The driver walks the umem SGL assuming a 1:1 mapping between SGE and
system page. Update to use the for_each_sg_page iterator to get individual
pages contained in the SGEs. This is a pre-requisite before adding page
combining into SGEs while building the scatter table in IB core.

8317d6cd

RDMA/ocrdma: Use for_each_sg_dma_page iterator on umem SGL · be8c456a

Shiraz, Saleem authored Feb 11, 2019

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

be8c456a

RDMA/qedr: Use for_each_sg_dma_page iterator on umem SGL · 95ad233f

Shiraz, Saleem authored Feb 11, 2019

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

Additionally, purge umem->page_shift usage in the driver as its only
relevant for ODP MRs. Use system page size and shift instead.
Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

95ad233f

RDMA/vmw_pvrdma: Use for_each_sg_dma_page iterator on umem SGL · f3e6d311

Shiraz, Saleem authored Feb 11, 2019

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

f3e6d311

RDMA/cxgb3: Use for_each_sg_dma_page iterator on umem SGL · b44e47eb

Shiraz, Saleem authored Feb 11, 2019

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

Additionally, purge umem->page_shift usage in the driver as its only
relevant for ODP MRs. Use system page size and shift instead.
Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

b44e47eb

RDMA/cxgb4: Use for_each_sg_dma_page iterator on umem SGL · 48b586ac

Shiraz, Saleem authored Feb 11, 2019

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

Additionally, purge umem->page_shift usage in the driver as its only
relevant for ODP MRs. Use system page size and shift instead.
Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

48b586ac

RDMA/hns: Use for_each_sg_dma_page iterator on umem SGL · 3856ec55

Shiraz, Saleem authored Feb 11, 2019

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

3856ec55

RDMA/i40iw: Use for_each_sg_dma_page iterator on umem SGL · 43fae912

Shiraz, Saleem authored Feb 11, 2019

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

43fae912

RDMA/mthca: Use for_each_sg_dma_page iterator on umem SGL · 8d249af3

Shiraz, Saleem authored Feb 11, 2019

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

8d249af3

RDMA/bnxt_re: Use for_each_sg_dma_page iterator on umem SGL · 161ebe24

Shiraz, Saleem authored Feb 11, 2019

Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped
SGL and get the page DMA address. This avoids the extra loop to iterate
pages in the SGE when for_each_sg iterator is used.

161ebe24

lib/scatterlist: Provide a DMA page iterator · d901b276

Jason Gunthorpe authored Jan 04, 2019

Commit 2db76d7c ("lib/scatterlist: sg_page_iter: support sg lists w/o
backing pages") introduced the sg_page_iter_dma_address() function without
providing a way to use it in the general case. If the sg_dma_len() is not
equal to the sg length callers cannot safely use the
for_each_sg_page/sg_page_iter_dma_address combination.

Resolve this API mistake by providing a DMA specific iterator,
for_each_sg_dma_page(), that uses the right length so
sg_page_iter_dma_address() works as expected with all sglists.

A new iterator type is introduced to provide compile-time safety against
wrongly mixing accessors and iterators.

Acked-by: Christoph Hellwig <hch@lst.de> (for scatterlist)
Acked-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> (ipu3-cio2)
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

d901b276

09 Feb, 2019 4 commits

Merge branch 'wip/dl-for-next' into for-next · 82771f20

Doug Ledford authored Feb 09, 2019

Due to concurrent work by myself and Jason, a normal fast forward merge
was not possible.  This brings in a number of hfi1 changes, mainly the
hfi1 TID RDMA support (roughly 10,000 LOC change), which was reviewed
and integrated over a period of days.
Signed-off-by: Doug Ledford <dledford@redhat.com>

82771f20

Merge branch 'hfi1-tid' into wip/dl-for-next · 416fbc1b

Doug Ledford authored Feb 09, 2019

Omni-Path TID RDMA Feature

Intel Omni-Path (OPA) TID RDMA support is a feature that accelerates
data movement between two OPA nodes through the IB Verbs interface. It
improves RDMA READ/WRITE performance by delivering the data payload to a
user buffer directly without any software copying.

Architecture
=============
The TID RDMA protocol is implemented on the hfi1 driver level and is
therefore transparent to the ULPs. It is designed to facilitate the data
transactions for two specific RDMA requests:
  - RDMA READ;
  - RDMA WRITE.
Previously, when a verbs data packet is received at the destination
(requester side for RDMA READ and responder side for RDMA WRITE), the
data payload is copied to the user buffer by software, which slows down
the performance significantly for large requests.

Internally, hfi1 converts qualified RDMA READ/WRITE requests into TID
RDMA READ/WRITE requests when the requests are post sent to the hfi1
driver. Non-qualified RDMA requests are handled by normal RDMA protocol.

For TID RDMA requests, hardware resources (hardware flow and TID entries)
are allocated on the destination side (the requester side for TID RDMA
READ and the responder side for TID RDMA WRITE). The information for
these resources is conveyed to the data source side (the responder side
for TID RDMA READ and the requester side for TID RDMA WRITE) and embedded
in data packets. When data packets are received by the destination,
hardware will deliver the data payload to the destination buffer without
involving software and therefore improve the performance.

Details
=======
RDMA READ/WRITE requests are qualified by the following:
  - Total data length >= 256k;
  - Totoal data length is a multiple of 4K pages.

Additional qualifications are enforced for the destination buffers:
  For RDMA RAED:
    - Each destination sge buffer is 4K aligned;
    - Each destination sge buffer is a multiple of 4K pages.
  For RDMA WRITE:
    - The destination number is 4K aligned.

In addition, in an OPA fabric, some nodes may support TID RDMA while
others may not. As such, it is important for two transaction nodes to
exchange the information about the features they support. This discovery
mechanism is called OPA Feature Negotion (OPFN) and is described in
details in the patch series. Through OPFN, two nodes can find whether
they both support TID RDMA and subsequently convert RDMA requests into
TID RDMA requests.

* hfi1-tid: (46 commits)
  IB/hfi1: Prioritize the sending of ACK packets
  IB/hfi1: Add static trace for TID RDMA WRITE protocol
  IB/hfi1: Enable TID RDMA WRITE protocol
  IB/hfi1: Add interlock between TID RDMA WRITE and other requests
  IB/hfi1: Add TID RDMA WRITE functionality into RDMA verbs
  IB/hfi1: Add the dual leg code
  IB/hfi1: Add the TID second leg ACK packet builder
  IB/hfi1: Add the TID second leg send packet builder
  IB/hfi1: Resend the TID RDMA WRITE DATA packets
  IB/hfi1: Add a function to receive TID RDMA RESYNC packet
  IB/hfi1: Add a function to build TID RDMA RESYNC packet
  IB/hfi1: Add TID RDMA retry timer
  IB/hfi1: Add a function to receive TID RDMA ACK packet
  IB/hfi1: Add a function to build TID RDMA ACK packet
  IB/hfi1: Add a function to receive TID RDMA WRITE DATA packet
  IB/hfi1: Add a function to build TID RDMA WRITE DATA packet
  IB/hfi1: Add a function to receive TID RDMA WRITE response
  IB/hfi1: Add TID resource timer
  IB/hfi1: Add a function to build TID RDMA WRITE response
  IB/hfi1: Add functions to receive TID RDMA WRITE request
  ...
Signed-off-by: Doug Ledford <dledford@redhat.com>

416fbc1b

iw_cxgb4: fix srqidx leak during connection abort · f368ff18

Raju Rangoju authored Feb 06, 2019

When an application aborts the connection by moving QP from RTS to ERROR,
then iw_cxgb4's modify_rc_qp() RTS->ERROR logic sets the
*srqidxp to 0 via t4_set_wq_in_error(&qhp->wq, 0), and aborts the
connection by calling c4iw_ep_disconnect().

c4iw_ep_disconnect() does the following:
 1. sends up a close_complete_upcall(ep, -ECONNRESET) to libcxgb4.
 2. sends abort request CPL to hw.

But, since the close_complete_upcall() is sent before sending the
ABORT_REQ to hw, libcxgb4 would fail to release the srqidx if the
connection holds one. Because, the srqidx is passed up to libcxgb4 only
after corresponding ABORT_RPL is processed by kernel in abort_rpl().

This patch handle the corner-case by moving the call to
close_complete_upcall() from c4iw_ep_disconnect() to abort_rpl().  So that
libcxgb4 is notified about the -ECONNRESET only after abort_rpl(), and
libcxgb4 can relinquish the srqidx properly.
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

f368ff18

iw_cxgb4: complete the cached SRQ buffers · 11a27e21

Raju Rangoju authored Feb 06, 2019

If TP fetches an SRQ buffer but ends up not using it before the connection
is aborted, then it passes the index of that SRQ buffer to the host in
ABORT_REQ_RSS or ABORT_RPL CPL message.

But, if the srqidx field is zero in the received ABORT_RPL or
ABORT_REQ_RSS CPL, then we need to read the tcb.rq_start field to see if
it really did have an RQE cached. This works around a case where HW does
not include the srqidx in the ABORT_RPL/ABORT_REQ_RSS CPL.

The final value of rq_start is the one present in TCB with the
TF_RX_PDU_OUT bit cleared. So, we need to read the TCB, examine the
TF_RX_PDU_OUT (bit 49 of t_flags) in order to determine if there's a rx
PDU feedback event pending.
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

11a27e21