Commits · 8d7c7c0eeb74281c846ef9231ce20536c79a99b4 · Kirill Smelkov / linux

16 Apr, 2023 2 commits

RDMA: Add ib_virt_dma_to_page() · 8d7c7c0e

Jason Gunthorpe authored Apr 14, 2023

Make it clearer what is going on by adding a function to go back from the
"virtual" dma_addr to a kva and another to a struct page. This is used in the
ib_uses_virt_dma() style drivers (siw, rxe, hfi, qib).

Call them instead of using naked casts and virt_to_page() when working with
dma_addr values encoded by the various ib_map functions.
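
As a rough user-space illustration of the idea (not the kernel implementation), a software-only driver can store a CPU pointer in a dma_addr-sized integer and recover it through named helpers instead of scattered casts. Every name below (`sketch_dma_addr_t`, `sketch_virt_to_dma()`, `sketch_virt_dma_to_ptr()`, `sketch_virt_dma_to_page_index()`) is hypothetical, and the struct page lookup is replaced by plain arithmetic on an assumed 4096-byte page.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's dma_addr_t. */
typedef uint64_t sketch_dma_addr_t;

#define SKETCH_PAGE_SHIFT 12	/* assumed 4096-byte pages */

/* Encode a CPU pointer as a "virtual" DMA address. */
static inline sketch_dma_addr_t sketch_virt_to_dma(void *ptr)
{
	return (sketch_dma_addr_t)(uintptr_t)ptr;
}

/* Decode the "virtual" DMA address back to a CPU pointer. */
static inline void *sketch_virt_dma_to_ptr(sketch_dma_addr_t dma_addr)
{
	return (void *)(uintptr_t)dma_addr;
}

/* Page index of the decoded address; stands in for the struct page lookup. */
static inline uint64_t sketch_virt_dma_to_page_index(sketch_dma_addr_t dma_addr)
{
	uintptr_t addr = (uintptr_t)sketch_virt_dma_to_ptr(dma_addr);

	return (uint64_t)addr >> SKETCH_PAGE_SHIFT;
}
```

Keeping the casts in one place means a caller never has to know how the address was encoded.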

This also fixes the virt_to_page() casting problem Linus Walleij has been
chasing.

Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/0-v2-05ea785520ed+10-ib_virt_page_jgg@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

8d7c7c0e

RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task" · b2b1ddc4

Zhu Yanjun authored Apr 13, 2023

In the function rxe_create_qp(), rxe_qp_from_init() is called to
initialize the qp, but internally things like rxe_init_task are not set up
until rxe_qp_init_req().

If an error occurs before this point, the unwind calls rxe_cleanup() and
eventually reaches rxe_qp_do_cleanup()/rxe_cleanup_task(), which oopses
when it tries to access the uninitialized spinlock.

With this fix, rxe_cleanup_task is not called if rxe_init_task was not
executed.
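
A minimal sketch of this kind of guard, assuming the "was it initialized" marker is a function pointer that stays NULL until init runs. The `sketch_*` names and fields are hypothetical and are not taken from the rxe driver.

```c
#include <stddef.h>

struct sketch_task {
	int (*func)(void *arg);	/* NULL until sketch_init_task() runs */
	int lock_ready;		/* stands in for an initialized spinlock */
	int cleaned;		/* set by cleanup, for illustration only */
};

void sketch_init_task(struct sketch_task *t, int (*func)(void *))
{
	t->func = func;
	t->lock_ready = 1;
}

/* Would take the lock, so it must only run on an initialized task. */
void sketch_cleanup_task(struct sketch_task *t)
{
	t->cleaned = 1;
}

/* Unwind path: skip cleanup for tasks that were never initialized. */
void sketch_qp_cleanup(struct sketch_task *t)
{
	if (t->func)
		sketch_cleanup_task(t);
}

/* Trivial task body used only to mark a task as initialized. */
int sketch_noop(void *arg)
{
	(void)arg;
	return 0;
}
```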

Reported-by: syzbot+cfcc1a3c85be15a40cba@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=fd85757b74b3eb59f904138486f755f71e090df8
Fixes: 8700e3e7 ("Soft RoCE driver")
Fixes: 2d4b21e0 ("IB/rxe: Prevent from completer to operate on non valid QP")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://lore.kernel.org/r/20230413101115.1366068-1-yanjun.zhu@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

b2b1ddc4

13 Apr, 2023 1 commit

RDMA/irdma: Slightly optimize irdma_form_ah_cm_frame() · a2e20b29

Christophe JAILLET authored Feb 02, 2023

There is no need to zero 'pktsize' bytes of 'buf', only the header needs
to be cleared, to be safe.
All the other bytes are already written with some memcpy() at the end of
the function.

Doing so also gives the compiler the opportunity to avoid the memset()
call: it can be inlined now that the length is known at compile time.
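
The pattern can be sketched in user space as follows. The header length, the example header field and the `sketch_form_frame()` name are assumptions made for illustration, not the irdma code.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_HDR_LEN 16	/* hypothetical fixed header size */

/*
 * Build a frame of SKETCH_HDR_LEN + payload_len bytes in buf.
 * Only the header is zeroed; the payload bytes are overwritten by
 * memcpy() anyway, so clearing them first would be wasted work.
 * Returns the frame length, or 0 if buf is too small.
 */
size_t sketch_form_frame(uint8_t *buf, size_t buf_len,
			 const uint8_t *payload, size_t payload_len)
{
	if (buf_len < SKETCH_HDR_LEN || payload_len > buf_len - SKETCH_HDR_LEN)
		return 0;

	/* Constant length: the compiler may inline this memset(). */
	memset(buf, 0, SKETCH_HDR_LEN);
	buf[0] = 0x45;	/* example header field */

	memcpy(buf + SKETCH_HDR_LEN, payload, payload_len);
	return SKETCH_HDR_LEN + payload_len;
}
```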

Link: https://lore.kernel.org/r/098e3c397be0436f1867899245ecfe656c472110.1675369386.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

a2e20b29

12 Apr, 2023 1 commit

RDMA/rxe: Fix incorrect TASKLET_STATE_SCHED check in rxe_task.c · 67a00d29

Bob Pearson authored Mar 29, 2023

In a previous patch TASKLET_STATE_SCHED was used as a mask but it is a bit
position instead. Add the missing shift.
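
To see why the shift matters, here is a small sketch that assumes the "scheduled" flag lives at bit position 0, as in the kernel's tasklet state enum. The `SKETCH_*` names are hypothetical.

```c
#include <stdbool.h>

/* Bit positions, not masks. */
enum {
	SKETCH_STATE_SCHED = 0,	/* bit 0 */
	SKETCH_STATE_RUN = 1,	/* bit 1 */
};

/* Buggy: uses the bit position as a mask, so "state & 0" is always false. */
bool sketch_is_sched_buggy(unsigned long state)
{
	return state & SKETCH_STATE_SCHED;
}

/* Fixed: shift 1 by the bit position to build the mask. */
bool sketch_is_sched_fixed(unsigned long state)
{
	return state & (1UL << SKETCH_STATE_SCHED);
}
```

With the position used directly as a mask, the check can never report a scheduled tasklet, whatever the state word holds.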

Link: https://lore.kernel.org/r/20230329193308.7489-1-rpearsonhpe@gmail.com
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/linux-rdma/8a054b78-6d50-4bc6-8d8a-83f85fbdb82f@kili.mountain/
Fixes: d9467163 ("RDMA/rxe: Rewrite rxe_task.c")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

67a00d29

09 Apr, 2023 7 commits

IB/hfi1: Place struct mmu_rb_handler on cache line start · 866694af

Patrick Kelsey authored Apr 07, 2023

Place struct mmu_rb_handler on cache line start like so:

	struct mmu_rb_handler *h;
	void *free_ptr;
	int ret;

	free_ptr = kzalloc(sizeof(*h) + cache_line_size() - 1, GFP_KERNEL);
	if (!free_ptr)
		return -ENOMEM;

	h = PTR_ALIGN(free_ptr, cache_line_size());
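
A user-space analogue of the same over-allocate-and-align trick, offered as a sketch only: the 64-byte line size and the `sketch_*` names are assumptions, and `calloc()`/`free()` stand in for the kernel allocator. The original pointer has to be kept so the memory can be released later.

```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_CACHE_LINE 64	/* assumed cache line size */

struct sketch_handler {
	void *free_ptr;	/* original allocation, needed for free() */
	int payload;
};

/* Over-allocate by (line size - 1) bytes, then round the pointer up. */
struct sketch_handler *sketch_handler_alloc(void)
{
	struct sketch_handler *h;
	void *free_ptr;
	uintptr_t addr;

	free_ptr = calloc(1, sizeof(*h) + SKETCH_CACHE_LINE - 1);
	if (!free_ptr)
		return NULL;

	addr = ((uintptr_t)free_ptr + SKETCH_CACHE_LINE - 1) &
	       ~(uintptr_t)(SKETCH_CACHE_LINE - 1);
	h = (struct sketch_handler *)addr;
	h->free_ptr = free_ptr;
	return h;
}

/* Free through the saved pointer, never through the aligned one. */
void sketch_handler_free(struct sketch_handler *h)
{
	if (h)
		free(h->free_ptr);
}
```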

Additionally, move struct mmu_rb_handler fields "root" and "ops_args" to
start after the next cacheline using the "____cacheline_aligned_in_smp"
annotation.

Allocating an additional cache_line_size() - 1 bytes to place
struct mmu_rb_handler on a cache line start does increase memory
consumption.

However, few struct mmu_rb_handler are created when hfi1 is in use.
As mmu_rb_handler->root and mmu_rb_handler->ops_args are accessed
frequently, the advantage of having them both within a cache line is
expected to outweigh the disadvantage of the additional memory
consumption per struct mmu_rb_handler.
Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088636963.3027109.16959757980497822530.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>

866694af

IB/hfi1: Fix bugs with non-PAGE_SIZE-end multi-iovec user SDMA requests · 00cbce5c

Patrick Kelsey authored Apr 07, 2023

hfi1 user SDMA request processing has two bugs that can cause data
corruption for user SDMA requests that have multiple payload iovecs
where an iovec other than the tail iovec does not run up to the page
boundary for the buffer pointed to by that iovec.

Here are the specific bugs:
1. user_sdma_txadd() does not use struct user_sdma_iovec->iov.iov_len.
   Rather, user_sdma_txadd() will add up to PAGE_SIZE bytes from iovec
   to the packet, even if some of those bytes are past
   iovec->iov.iov_len and are thus not intended to be in the packet.
2. user_sdma_txadd() and user_sdma_send_pkts() fail to advance to the
   next iovec in user_sdma_request->iovs when the current iovec
   is not PAGE_SIZE and does not contain enough data to complete the
   packet. The transmitted packet will contain the wrong data from the
   iovec pages.
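
The intended behavior for both points can be sketched as a plain gather loop: never read past an iovec's length, and move to the next iovec when the current one cannot complete the packet. The structure and function names are hypothetical and the sketch ignores pinning and DMA entirely.

```c
#include <stddef.h>
#include <string.h>

struct sketch_iovec {
	const unsigned char *base;
	size_t len;	/* valid bytes; may end short of a page boundary */
};

/*
 * Copy up to pkt_len bytes into pkt from iovs[*idx] onward, starting at
 * *offset within the current iovec. Never reads past an iovec's len and
 * advances to the next iovec when the current one runs out.
 * Returns the number of bytes copied.
 */
size_t sketch_fill_packet(unsigned char *pkt, size_t pkt_len,
			  const struct sketch_iovec *iovs, size_t niovs,
			  size_t *idx, size_t *offset)
{
	size_t copied = 0;

	while (copied < pkt_len && *idx < niovs) {
		const struct sketch_iovec *iov = &iovs[*idx];
		size_t avail = iov->len - *offset;
		size_t want = pkt_len - copied;
		size_t n = avail < want ? avail : want;

		memcpy(pkt + copied, iov->base + *offset, n);
		copied += n;
		*offset += n;

		if (*offset == iov->len) {	/* iovec exhausted: move on */
			(*idx)++;
			*offset = 0;
		}
	}
	return copied;
}
```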

This has not been an issue with SDMA packets from hfi1 Verbs or PSM2
because they only produce iovecs that end short of PAGE_SIZE as the tail
iovec of an SDMA request.

Fixing these bugs exposes other bugs with the SDMA pin cache
(struct mmu_rb_handler) that get in the way of supporting user SDMA requests
with multiple payload iovecs whose buffers do not end at PAGE_SIZE. So
this commit fixes those issues as well.

Here are the mmu_rb_handler bugs that non-PAGE_SIZE-end multi-iovec
payload user SDMA requests can hit:
1. Overlapping memory ranges in mmu_rb_handler will result in duplicate
   pinnings.
2. When extending an existing mmu_rb_handler entry (struct mmu_rb_node),
   the mmu_rb code (1) removes the existing entry under a lock, (2)
   releases that lock, pins the new pages, (3) then reacquires the lock
   to insert the extended mmu_rb_node.

   If someone else comes in and inserts an overlapping entry between (2)
   and (3), insert in (3) will fail.

   The failure path code in this case unpins _all_ pages in either the
   original mmu_rb_node or the new mmu_rb_node that was inserted between
   (2) and (3).
3. In hfi1_mmu_rb_remove_unless_exact(), mmu_rb_node->refcount is
   incremented outside of mmu_rb_handler->lock. As a result, mmu_rb_node
   could be evicted by another thread that gets mmu_rb_handler->lock and
   checks mmu_rb_node->refcount before mmu_rb_node->refcount is
   incremented.
4. Related to #2 above, SDMA request submission failure path does not
   check mmu_rb_node->refcount before freeing mmu_rb_node object.

   If there are other SDMA requests in progress whose iovecs have
   pointers to the now-freed mmu_rb_node(s), those pointers to the
   now-freed mmu_rb nodes will be dereferenced when those SDMA requests
   complete.

Fixes: 7be85676 ("IB/hfi1: Don't remove RB entry when not needed.")
Fixes: 77241056 ("IB/hfi1: add driver files")
Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088636445.3027109.10054635277810177889.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>

00cbce5c

IB/hfi1: Fix SDMA mmu_rb_node not being evicted in LRU order · 9fe8fec5

Patrick Kelsey authored Apr 07, 2023

hfi1_mmu_rb_remove_unless_exact() did not move mmu_rb_node objects in
mmu_rb_handler->lru_list after getting a cache hit on an mmu_rb_node.

As a result, hfi1_mmu_rb_evict() was not guaranteed to evict truly
least-recently used nodes.

This could be a performance issue for an application when that
application:
- Uses some long-lived buffers frequently.
- Uses a large number of buffers once.
- Hits the mmu_rb_handler cache size or pinned-page limits, forcing
  mmu_rb_handler cache entries to be evicted.

In this case, the one-time use buffers cause the long-lived buffer
entries to eventually filter to the end of the LRU list where
hfi1_mmu_rb_evict() will consider evicting a frequently-used long-lived
entry instead of evicting one of the one-time use entries.

Fix this by inserting each new mmu_rb_node at the tail of
mmu_rb_handler->lru_list and moving an mmu_rb_node to the tail of
mmu_rb_handler->lru_list when it is a hit in
hfi1_mmu_rb_remove_unless_exact(). Change hfi1_mmu_rb_evict() to evict
from the head of mmu_rb_handler->lru_list instead of the tail.
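
The ordering rule can be sketched with a small circular doubly linked list: new entries and cache hits go to the tail, and eviction takes from the head. This is a generic LRU sketch with hypothetical `sketch_*` names, without the locking or pin accounting a driver would need.

```c
#include <stddef.h>

struct sketch_node {
	struct sketch_node *prev, *next;
	int id;
};

/* Circular list with a sentinel head: head.next is least recently used. */
struct sketch_lru {
	struct sketch_node head;
};

void sketch_lru_init(struct sketch_lru *lru)
{
	lru->head.prev = lru->head.next = &lru->head;
}

static void sketch_unlink(struct sketch_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* New entries land at the tail (most recently used). */
void sketch_lru_add_tail(struct sketch_lru *lru, struct sketch_node *n)
{
	n->prev = lru->head.prev;
	n->next = &lru->head;
	lru->head.prev->next = n;
	lru->head.prev = n;
}

/* A cache hit moves the node back to the tail. */
void sketch_lru_touch(struct sketch_lru *lru, struct sketch_node *n)
{
	sketch_unlink(n);
	sketch_lru_add_tail(lru, n);
}

/* Evict from the head; returns NULL when the list is empty. */
struct sketch_node *sketch_lru_evict(struct sketch_lru *lru)
{
	struct sketch_node *n = lru->head.next;

	if (n == &lru->head)
		return NULL;
	sketch_unlink(n);
	return n;
}
```

In this sketch a frequently touched node keeps returning to the tail, so one-time entries reach the head first.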

Fixes: 0636e9ab ("IB/hfi1: Add cache evict LRU list")
Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088635931.3027109.10423156330761536044.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>

9fe8fec5

IB/hfi1: Suppress useless compiler warnings · cf0455f1

Ehab Ababneh authored Apr 07, 2023

These warnings can cause build failure:

In file included from ./include/trace/define_trace.h:102,
                 from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
                 from drivers/infiniband/hw/hfi1/trace.h:15,
                 from drivers/infiniband/hw/hfi1/trace.c:6:
drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘trace_event_get_offsets_hfi1_trace_template’:
./include/trace/trace_events.h:261:9: warning: function ‘trace_event_get_offsets_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
  struct trace_event_raw_##call __maybe_unused *entry;  \
         ^~~~~~~~~~~~~~~~
drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
 DECLARE_EVENT_CLASS(hfi1_trace_template,
 ^~~~~~~~~~~~~~~~~~~
In file included from ./include/trace/define_trace.h:102,
                 from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
                 from drivers/infiniband/hw/hfi1/trace.h:15,
                 from drivers/infiniband/hw/hfi1/trace.c:6:
drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘trace_event_raw_event_hfi1_trace_template’:
./include/trace/trace_events.h:386:9: warning: function ‘trace_event_raw_event_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
  struct trace_event_raw_##call *entry;    \
         ^~~~~~~~~~~~~~~~
drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
 DECLARE_EVENT_CLASS(hfi1_trace_template,
 ^~~~~~~~~~~~~~~~~~~
In file included from ./include/trace/define_trace.h:103,
                 from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
                 from drivers/infiniband/hw/hfi1/trace.h:15,
                 from drivers/infiniband/hw/hfi1/trace.c:6:
drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘perf_trace_hfi1_trace_template’:
./include/trace/perf.h:70:9: warning: function ‘perf_trace_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
  struct hlist_head *head;     \
         ^~~~~~~~~~
drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
 DECLARE_EVENT_CLASS(hfi1_trace_template,
 ^~~~~~~~~~~~~~~~~~~

Solution adapted here is similar to the one in fbbc95a4

Signed-off-by: Ehab Ababneh <ehab.ababneh@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088635415.3027109.5711716700328939402.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>

cf0455f1

IB/hfi1: Remove trace newlines · d2590edc

Dean Luick authored Apr 07, 2023

The hfi1_cdbg trace mechanism appends a newline. Remove trailing
newlines from all format strings.
Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088634897.3027109.10401662436950683555.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>

d2590edc

RDMA/srpt: Add a check for valid 'mad_agent' pointer · eca5cd94

Saravanan Vajravel authored Apr 05, 2023

When unregistering the MAD agent, the srpt module checks that the
'mad_agent' pointer is non-NULL before invoking ib_unregister_mad_agent().
This check can pass even if 'mad_agent' holds an error value.
'mad_agent' can hold an error value for a short window when
srpt_add_one() and srpt_remove_one() are executed simultaneously.

In the srpt module, add a valid pointer check for 'sport->mad_agent'
before unregistering the MAD agent.
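
One way to express such a "valid pointer" check is sketched below in user space. The helpers are re-implementations modeled on the kernel's error-pointer convention (the top 4095 addresses encode an errno); the `sketch_*` names and the port structure are hypothetical, and this is not necessarily the exact check the patch uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True for NULL and for pointers in the error range. */
static inline bool sketch_is_err_or_null(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sketch_port {
	void *mad_agent;	/* NULL, an error pointer, or a valid agent */
	int unregistered;
};

void sketch_unregister_mad_agent(struct sketch_port *port)
{
	/* A plain "if (port->mad_agent)" would let an error pointer through. */
	if (sketch_is_err_or_null(port->mad_agent))
		return;

	port->unregistered = 1;	/* stands in for ib_unregister_mad_agent() */
	port->mad_agent = NULL;
}
```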

This issue can be hit when the RoCE driver unregisters the ib_device.

Stack Trace:
------------
BUG: kernel NULL pointer dereference, address: 000000000000004d
PGD 145003067 P4D 145003067 PUD 2324fe067 PMD 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 10 PID: 4459 Comm: kworker/u80:0 Kdump: loaded Tainted: P
Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.5.4 01/13/2020
Workqueue: bnxt_re bnxt_re_task [bnxt_re]
RIP: 0010:_raw_spin_lock_irqsave+0x19/0x40
Call Trace:
  ib_unregister_mad_agent+0x46/0x2f0 [ib_core]
  IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
  ? __schedule+0x20b/0x560
  srpt_unregister_mad_agent+0x93/0xd0 [ib_srpt]
  srpt_remove_one+0x20/0x150 [ib_srpt]
  remove_client_context+0x88/0xd0 [ib_core]
  bond0: (slave p2p1): link status definitely up, 100000 Mbps full duplex
  disable_device+0x8a/0x160 [ib_core]
  bond0: active interface up!
  ? kernfs_name_hash+0x12/0x80
 (NULL device *): Bonding Info Received: rdev: 000000006c0b8247
  __ib_unregister_device+0x42/0xb0 [ib_core]
 (NULL device *):         Master: mode: 4 num_slaves:2
  ib_unregister_device+0x22/0x30 [ib_core]
 (NULL device *):         Slave: id: 105069936 name:p2p1 link:0 state:0
  bnxt_re_stopqps_and_ib_uninit+0x83/0x90 [bnxt_re]
  bnxt_re_alloc_lag+0x12e/0x4e0 [bnxt_re]

Fixes: a42d985b ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Link: https://lore.kernel.org/r/20230406042549.507328-1-saravanan.vajravel@broadcom.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

eca5cd94

RDMA/cm: Trace icm_send_rej event before the cm state is reset · bd9de1ba

Mark Zhang authored Mar 30, 2023

Trace icm_send_rej event before the cm state is reset to idle, so that
correct cm state will be logged. For example when an incoming request is
rejected, the old trace log was:
icm_send_rej: local_id=961102742 remote_id=3829151631 state=IDLE reason=REJ_CONSUMER_DEFINED
With this patch:
icm_send_rej: local_id=312971016 remote_id=3778819983 state=MRA_REQ_SENT reason=REJ_CONSUMER_DEFINED

Fixes: 8dc105be ("RDMA/cm: Add tracepoints to track MAD send operations")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/20230330072351.481200-1-markzhang@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

bd9de1ba

04 Apr, 2023 7 commits

RDMA/bnxt_re: Enable congestion control by default · f13bcef0

Selvin Xavier authored Mar 30, 2023

Enable congestion control by default. Issue the FW command to
enable CC during driver load and to disable it during
unload.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-8-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

f13bcef0

RDMA/bnxt_re: Use tlv apis while processing the slow path commands · c682c6ed

Selvin Xavier authored Mar 30, 2023

Use the new TLV APIs for existing slow path commands. The TLV
APIs will be used to populate extended headers for some of the
Firmware commands, which will be introduced in the patches that
follow.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-7-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

c682c6ed

RDMA/bnxt_re: RoCE slow path TLV support · 0722f1f7

Selvin Xavier authored Mar 30, 2023

Add a header file to support TLV-encapsulated commands. These
functions will be used by the driver in the follow-up patches.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-6-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

0722f1f7

RDMA/bnxt_re: Reduce number of arguments to control path command APIs · ff015bcd

Selvin Xavier authored Mar 30, 2023

Reduce the number of arguments to bnxt_qplib_rcfw_send_message
by enclosing them all in a command message structure.
Use the same struct when passing the command information to
send_message.
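
The refactoring pattern can be sketched like this. The structure layout, field names and functions below are hypothetical and only illustrate bundling call arguments into one message object; they are not the driver's definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bundle of everything a send-message call needs. */
struct sketch_cmdqmsg {
	void *req;
	void *resp;
	void *sb;		/* optional side buffer */
	uint32_t req_sz;
	uint32_t res_sz;
	bool block;
};

void sketch_fill_cmdqmsg(struct sketch_cmdqmsg *msg, void *req, void *resp,
			 void *sb, uint32_t req_sz, uint32_t res_sz, bool block)
{
	msg->req = req;
	msg->resp = resp;
	msg->sb = sb;
	msg->req_sz = req_sz;
	msg->res_sz = res_sz;
	msg->block = block;
}

/* One parameter instead of six; returns 0 on success, -1 on a bad message. */
int sketch_send_message(const struct sketch_cmdqmsg *msg)
{
	if (!msg->req || !msg->resp || msg->req_sz == 0)
		return -1;
	return 0;	/* a real driver would post the command here */
}
```
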
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-5-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

ff015bcd

RDMA/bnxt_re: Convert RCFW_CMD_PREP macro to static inline function · e576adf5

Selvin Xavier authored Mar 30, 2023

Convert the RCFW_CMD_PREP macro to a static inline function.
Also remove the cmd_flags argument, since none of the functions
use it.
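
For illustration, here is the general shape of such a conversion on a made-up command header. The struct, the macro and the function are hypothetical stand-ins, shown side by side to make the gain in type checking visible.

```c
#include <stdint.h>
#include <string.h>

struct sketch_cmd_hdr {
	uint8_t opcode;
	uint8_t cmd_size;
	uint16_t flags;
};

/* Macro style: arguments are not type checked. */
#define SKETCH_CMD_PREP(hdr, op, size)		\
	do {					\
		memset(&(hdr), 0, sizeof(hdr));	\
		(hdr).opcode = (op);		\
		(hdr).cmd_size = (size);	\
	} while (0)

/* Inline-function style: typed parameters, same behavior. */
static inline void sketch_cmd_prep(struct sketch_cmd_hdr *hdr,
				   uint8_t opcode, uint8_t cmd_size)
{
	memset(hdr, 0, sizeof(*hdr));
	hdr->opcode = opcode;
	hdr->cmd_size = cmd_size;
}
```
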
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-4-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

e576adf5

RDMA/bnxt_re: Remove HW queue mapping from RoCE Driver · b400acee

Selvin Xavier authored Mar 30, 2023

The bnxt_en driver does the queue mapping for RoCE traffic. Remove the
queue mapping from the RoCE driver.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

b400acee

RDMA/bnxt_re: Update HW interface headers · a9a457f3

Selvin Xavier authored Mar 30, 2023

Update the HW structures to the latest version.
This is copied from the code maintained internally. No functional
changes in this patch. The code is reorganized to match the file maintained
in the internal tree. Also, new HW interface structures are added, which
will be used by the driver in the future.

CC: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

a9a457f3

03 Apr, 2023 7 commits

RDMA/siw: Remove namespace check from siw_netdev_event() · 266e9b34

Tetsuo Handa authored Apr 02, 2023

syzbot is reporting that siw_netdev_event(NETDEV_UNREGISTER) cannot destroy
siw_device created after unshare(CLONE_NEWNET) due to net namespace check.
It seems that this check was there by error and should be removed.
Reported-by: syzbot <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Suggested-by: Leon Romanovsky <leon@kernel.org>
Fixes: bdcf26bf ("rdma/siw: network and RDMA core interface")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/a44e9ac5-44e2-d575-9e30-02483cc7ffd1@I-love.SAKURA.ne.jp
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

266e9b34

RDMA/cma: Remove NULL check before dev_{put, hold} · 08ebf57f

Yang Li authored Mar 31, 2023

dev_{put, hold} call netdev_{put, hold}, which already check for NULL,
so there is no need to check before calling dev_{put, hold}.
Remove the checks to silence the warnings:

./drivers/infiniband/core/cma.c:713:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed.
./drivers/infiniband/core/cma.c:2433:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4668
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230331010633.63261-1-yang.lee@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

08ebf57f

IB/qib: Remove unused cnt variable · e7706c4b

Tom Rix authored Mar 30, 2023

clang with W=1 reports
drivers/infiniband/hw/qib/qib_file_ops.c:487:20: error: variable
  'cnt' set but not used [-Werror,-Wunused-but-set-variable]
        u32 tid, ctxttid, cnt, limit, tidcnt;
                          ^
drivers/infiniband/hw/qib/qib_file_ops.c:1771:9: error: variable
  'cnt' set but not used [-Werror,-Wunused-but-set-variable]
        int i, cnt = 0, maxtid = ctxt_tidbase + dd->rcvtidcnt;
               ^
This variable is not used so remove it.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230330235800.1845815-1-trix@redhat.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

e7706c4b

RDMA/mlx5: Remove unused num_alloc_xa_entries variable · 081c27b3

Tom Rix authored Mar 30, 2023

clang with W=1 reports
drivers/infiniband/hw/mlx5/devx.c:1996:6: error: variable
  'num_alloc_xa_entries' set but not used [-Werror,-Wunused-but-set-variable]
        int num_alloc_xa_entries = 0;
            ^
This variable is not used so remove it.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230330153607.1838750-1-trix@redhat.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

081c27b3

IB/iser: remove redundant new line · 070fc1c0

Max Gurtovoy authored Mar 30, 2023

This commit doesn't change any logic.
Reviewed-by: Sergey Gorenko <sergeygo@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20230330131333.37900-3-mgurtovoy@nvidia.com
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

070fc1c0

IB/iser: centralize setting desc type and done callback · 92363895

Max Gurtovoy authored Mar 30, 2023

Move this common logic into iser_create_send_desc instead of duplicating
the code.
Reviewed-by: Sergey Gorenko <sergeygo@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20230330131333.37900-2-mgurtovoy@nvidia.com
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

92363895

IB/iser: remove unused macros · b7727e23

Max Gurtovoy authored Mar 30, 2023

The removed macros are old leftovers.
Reviewed-by: Sergey Gorenko <sergeygo@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20230330131333.37900-1-mgurtovoy@nvidia.com
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

b7727e23

30 Mar, 2023 1 commit

RDMA/rxe: Clean kzalloc failure paths · b6ba6855

Leon Romanovsky authored Mar 29, 2023

There is no need to print any debug messages after failure to
allocate memory, because kernel will print OOM dumps anyway.

Together with removal of these messages, remove useless goto jumps.

Fixes: 5bf944f2 ("RDMA/rxe: Add error messages")
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/all/ea43486f-43dd-4054-b1d5-3a0d202be621@kili.mountain
Link: https://lore.kernel.org/r/d3cedf723b84e73e8062a67b7489d33802bafba2.1680113597.git.leon@kernel.org
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

b6ba6855

29 Mar, 2023 8 commits

RDMA/rxe: Remove tasklet call from rxe_cq.c · 78b26a33

Bob Pearson authored Mar 27, 2023

Remove the tasklet call in rxe_cq.c and also the is_dying in the
cq struct. There is no reason for the rxe driver to defer the call
to the cq completion handler by scheduling a tasklet. rxe_cq_post()
is not called in a hard irq context.

The rxe driver currently is incorrect because the tasklet call is
made without protecting the cq pointer with a reference from having
the underlying memory freed before the deferred routine is called.
Executing the comp_handler inline fixes this problem.

Fixes: 8700e3e7 ("Soft RoCE driver")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Link: https://lore.kernel.org/r/20230327215643.10410-1-rpearsonhpe@gmail.com
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

78b26a33

RDMA/ocrdma: remove unused discard_cnt variable · cba968e3

Tom Rix authored Mar 26, 2023

clang with W=1 reports
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:1592:6: error: variable
  'discard_cnt' set but not used [-Werror,-Wunused-but-set-variable]
        int discard_cnt = 0;
            ^
This variable is not used so remove it.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230326120959.1351948-1-trix@redhat.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

cba968e3

RDMA/bnxt_re: remove unused num_srqne_processed and num_cqne_processed variables · 1b69f1e3

Tom Rix authored Mar 25, 2023

clang with W=1 reports
drivers/infiniband/hw/bnxt_re/qplib_fp.c:303:6: error: variable
  'num_srqne_processed' set but not used [-Werror,-Wunused-but-set-variable]
        int num_srqne_processed = 0;
            ^
drivers/infiniband/hw/bnxt_re/qplib_fp.c:304:6: error: variable
  'num_cqne_processed' set but not used [-Werror,-Wunused-but-set-variable]
        int num_cqne_processed = 0;
            ^
These variables are not used so remove them.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230325140559.1336056-1-trix@redhat.com
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

1b69f1e3

RDMA/usnic: Remove redundant pci_clear_master · fc36ce35

Cai Huoqing authored Mar 23, 2023

Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
	u16 pci_command;

	pci_read_config_word(dev, PCI_COMMAND, &pci_command);
	if (pci_command & PCI_COMMAND_MASTER) {
		pci_command &= ~PCI_COMMAND_MASTER;
		pci_write_config_word(dev, PCI_COMMAND, pci_command);
	}

	pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.
Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Link: https://lore.kernel.org/r/20230323115742.13836-1-cai.huoqing@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>

fc36ce35

RDMA/mlx5: Expand switchdev Q-counters to expose representor statistics · d22467a7

Patrisious Haddad authored Mar 23, 2023

Previously, only per-device counters were supported for switchdev.

Now counters are allocated for switchdev per port, including the ports
that belong to VF representors, in order to expose them to users through
the rdma tool. This allows the host to track the VFs' statistics through
their representors' counters.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://lore.kernel.org/r/ea31e1103c125cd27931ba213f307cde30d2eaed.1679566038.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>

d22467a7

Merge branch 'mlx5-next' into wip/leon-for-next · bbe37139
Leon Romanovsky authored Mar 29, 2023
```
* mlx5-next:
  net/mlx5: Introduce other vport query for Q-counters
```
bbe37139

net/mlx5: Introduce other vport query for Q-counters · 77f7eb9f

Patrisious Haddad authored Mar 23, 2023

These new fields in the QUERY_Q_COUNTER command allow us to access
another vport's counters during the query command, which is especially
useful for querying representor vports.

In addition, add the required caps to check whether this capability
is actually supported.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/75c73a4a0e60f18c37b35a4a11ca2e2415e4a6f3.1679566038.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>

77f7eb9f

RDMA/bnxt_re: Add resize_cq support · d54bd5ab

Selvin Xavier authored Mar 15, 2023

Add resize_cq verb support for user space CQs. Resize operations for
kernel CQs are not supported for now.

The driver should free the current CQ only after the user library has
polled all the completions and switched to the new CQ. So after resize_cq
returns from the driver, the user library polls for existing completions
and stores them as temporary data. Once the library has reaped all
completions in the current CQ, it invokes ibv_cmd_poll_cq to inform the
driver about the resize_cq completion. Add a check for user CQs in the
driver's poll_cq and complete the resize operation for user CQs.
Update uverbs_cmd_mask with poll_cq to support this.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1678868215-23626-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

d54bd5ab

24 Mar, 2023 6 commits

RDMA/erdma: Use fixed hardware page size · d649c638

Cheng Xu authored Mar 07, 2023

Hardware's page size is 4096, but the kernel's page size may vary. Driver
should use hardware's page size when communicating with hardware.
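
As a small worked example, the number of hardware pages a buffer spans can be computed from a fixed 4096-byte shift rather than from the kernel's page size. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_HW_PAGE_SHIFT 12	/* hardware page: 4096 bytes */
#define SKETCH_HW_PAGE_SIZE  (1ULL << SKETCH_HW_PAGE_SHIFT)

/* Number of hardware pages spanned by [addr, addr + len), for len > 0. */
uint64_t sketch_hw_page_count(uint64_t addr, uint64_t len)
{
	uint64_t first = addr >> SKETCH_HW_PAGE_SHIFT;
	uint64_t last = (addr + len - 1) >> SKETCH_HW_PAGE_SHIFT;

	return last - first + 1;
}
```

On a kernel built with 64 KiB pages, a single kernel page therefore corresponds to 16 hardware pages.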

Fixes: 15505577 ("RDMA/erdma: Add verbs implementation")
Link: https://lore.kernel.org/r/20230307102924.70577-2-chengyou@linux.alibaba.com
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

d649c638

RDMA/rxe: Rewrite rxe_task.c · d9467163

Bob Pearson authored Mar 04, 2023

This patch is a major rewrite of the tasklet routines in rxe_task.c. The
main motivation for this is the realization that the code fails to protect
the qp pointer with correct reference counting. When a tasklet is
scheduled from a verbs API, the calling thread has a valid reference to the
qp and schedules the tasklet to run at a later time carrying a pointer to
the qp. Once the calling code returns, however, the qp can be destroyed at
any time. To correct this, a reference to the qp must be taken
when the task is scheduled and held until it finishes running. This is
complicated by the tasklet library not always running a task that is
scheduled, depending on whether someone else has already scheduled it.

This patch moves the logic for deciding whether to run or schedule a task
outside of do_task() and guarantees that there is only one copy of the
task scheduled or running at a time.

Secondly the separate flags controlling teardown and draining of the task
are included in the task state machine and all references to the state are
protected by spinlocks to avoid consistency and memory barrier issues.
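
A heavily simplified, single-threaded sketch of the two guarantees described above: at most one copy of a task is scheduled or running, and a qp reference is held from scheduling until the task finishes. The two-state machine and all names are hypothetical, and the spinlock that would protect the state in real code is only mentioned in a comment.

```c
#include <stdbool.h>

struct sketch_qp {
	int refcnt;
};

enum sketch_task_state {
	SKETCH_TASK_IDLE,
	SKETCH_TASK_BUSY,	/* scheduled or running */
};

struct sketch_task {
	struct sketch_qp *qp;
	enum sketch_task_state state;
	int runs;
};

/*
 * Schedule the task unless a copy is already scheduled or running.
 * On success a qp reference is taken on behalf of the deferred work.
 * (A real implementation would do this under a spinlock.)
 */
bool sketch_sched_task(struct sketch_task *task)
{
	if (task->state != SKETCH_TASK_IDLE)
		return false;

	task->state = SKETCH_TASK_BUSY;
	task->qp->refcnt++;
	return true;
}

/* Run the deferred work, then drop the reference taken at schedule time. */
void sketch_do_task(struct sketch_task *task)
{
	task->runs++;
	task->state = SKETCH_TASK_IDLE;
	task->qp->refcnt--;
}
```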

Link: https://lore.kernel.org/r/20230304174533.11296-9-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

d9467163

RDMA/rxe: Make tasks schedule each other · f455a1bc

Bob Pearson authored Mar 04, 2023

Replace rxe_run_task() by rxe_sched_task() when tasks call each other.
These are not performance critical and mainly involve error paths but they
run the risk of causing deadlocks.

Link: https://lore.kernel.org/r/20230304174533.11296-8-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

f455a1bc

RDMA/rxe: Remove __rxe_do_task() · 960ebe97

Bob Pearson authored Mar 04, 2023

The subroutine __rxe_do_task is not thread safe and it has no way to
guarantee that the tasks, which are designed with the assumption that they
are non-reentrant, are not reentered. All of its uses are non-performance
critical.

This patch replaces calls to __rxe_do_task with calls to
rxe_sched_task. It also removes irrelevant or unneeded if tests.

Instead of calling the task machinery, a single call to the tasklet
function (rxe_requester, etc.) is sufficient to drain the queues if task
execution has been disabled or stopped.

Together these changes allow the removal of __rxe_do_task.

Link: https://lore.kernel.org/r/20230304174533.11296-7-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

960ebe97

RDMA/rxe: Remove qp reference counting in tasks · a246aa2e

Bob Pearson authored Mar 04, 2023

Currently each of the three tasklets (requester, completer and responder) in
the rxe driver takes and releases a reference to the qp argument at the
beginning and end of its subroutine. The caller passing in the qp
argument should be responsible for holding a reference to qp so these are
not required. Further doing so breaks the qp cleanup code in
rxe_qp_do_cleanup which calls these routines after all the references have
been dropped so they cannot drain the packet and work request queues as
intended.

In fact if these routines are deferred by calling tasklet_schedule there
is no guarantee that the calling code does have a qp reference. That is a
bug in rxe_task.c which will be fixed later in this series.

Link: https://lore.kernel.org/r/20230304174533.11296-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

a246aa2e

RDMA/rxe: Cleanup error state handling in rxe_comp.c · fbdeb828

Bob Pearson authored Mar 04, 2023

Clean up the handling of the qp in the error state, in the reset state and
during rxe_qp_do_cleanup. Make it the same as in rxe_resp.c.

Link: https://lore.kernel.org/r/20230304174533.11296-5-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

fbdeb828