1. 09 Apr, 2023 5 commits
    • IB/hfi1: Fix SDMA mmu_rb_node not being evicted in LRU order · 9fe8fec5
      Patrick Kelsey authored
      hfi1_mmu_rb_remove_unless_exact() did not move mmu_rb_node objects in
      mmu_rb_handler->lru_list after getting a cache hit on an mmu_rb_node.
      
      As a result, hfi1_mmu_rb_evict() was not guaranteed to evict truly
      least-recently used nodes.
      
      This could be a performance issue for an application when that
      application:
      - Uses some long-lived buffers frequently.
      - Uses a large number of buffers once.
      - Hits the mmu_rb_handler cache size or pinned-page limits, forcing
        mmu_rb_handler cache entries to be evicted.
      
      In this case, the one-time use buffers cause the long-lived buffer
      entries to eventually filter to the end of the LRU list where
      hfi1_mmu_rb_evict() will consider evicting a frequently-used long-lived
      entry instead of evicting one of the one-time use entries.
      
      Fix this by inserting each new mmu_rb_node at the tail of
      mmu_rb_handler->lru_list and moving an mmu_rb_node to the tail of
      mmu_rb_handler->lru_list when that mmu_rb_node is a hit in
      hfi1_mmu_rb_remove_unless_exact(). Change hfi1_mmu_rb_evict() to evict
      from the head of mmu_rb_handler->lru_list instead of the tail.
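
      The ordering rule described above can be sketched in userspace C.
      This is only an illustration under assumptions: lru_node, lru_list,
      lru_insert(), lru_touch() and lru_evict() are hypothetical names and
      not the hfi1 code, which uses the kernel's own list helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mmu_rb_node linked into an LRU list. */
struct lru_node {
	struct lru_node *prev, *next;
	int id;
};

/* Sentinel-based circular list: head.next is the least recently used. */
struct lru_list {
	struct lru_node head;
};

static void lru_init(struct lru_list *l)
{
	l->head.prev = l->head.next = &l->head;
}

static void lru_unlink(struct lru_node *n)
{
	assert(n->prev && n->next); /* node must currently be on a list */
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

/* New entries go to the tail (the most recently used end). */
static void lru_insert(struct lru_list *l, struct lru_node *n)
{
	n->prev = l->head.prev;
	n->next = &l->head;
	l->head.prev->next = n;
	l->head.prev = n;
}

/* A cache hit moves the entry back to the tail. */
static void lru_touch(struct lru_list *l, struct lru_node *n)
{
	lru_unlink(n);
	lru_insert(l, n);
}

/* Eviction takes from the head; returns NULL when the list is empty. */
static struct lru_node *lru_evict(struct lru_list *l)
{
	struct lru_node *n = l->head.next;

	if (n == &l->head)
		return NULL;
	lru_unlink(n);
	return n;
}
```

      With this ordering, a long-lived entry that is touched after a burst
      of one-time entries is evicted last rather than first.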
      
      Fixes: 0636e9ab ("IB/hfi1: Add cache evict LRU list")
      Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
      Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
      Link: https://lore.kernel.org/r/168088635931.3027109.10423156330761536044.stgit@252.162.96.66.static.eigbox.net
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
    • IB/hfi1: Suppress useless compiler warnings · cf0455f1
      Ehab Ababneh authored
      These warnings can cause build failure:
      
      In file included from ./include/trace/define_trace.h:102,
                       from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
                       from drivers/infiniband/hw/hfi1/trace.h:15,
                       from drivers/infiniband/hw/hfi1/trace.c:6:
      drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘trace_event_get_offsets_hfi1_trace_template’:
      ./include/trace/trace_events.h:261:9: warning: function ‘trace_event_get_offsets_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
        struct trace_event_raw_##call __maybe_unused *entry;  \
               ^~~~~~~~~~~~~~~~
      drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
       DECLARE_EVENT_CLASS(hfi1_trace_template,
       ^~~~~~~~~~~~~~~~~~~
      In file included from ./include/trace/define_trace.h:102,
                       from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
                       from drivers/infiniband/hw/hfi1/trace.h:15,
                       from drivers/infiniband/hw/hfi1/trace.c:6:
      drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘trace_event_raw_event_hfi1_trace_template’:
      ./include/trace/trace_events.h:386:9: warning: function ‘trace_event_raw_event_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
        struct trace_event_raw_##call *entry;    \
               ^~~~~~~~~~~~~~~~
      drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
       DECLARE_EVENT_CLASS(hfi1_trace_template,
       ^~~~~~~~~~~~~~~~~~~
      In file included from ./include/trace/define_trace.h:103,
                       from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
                       from drivers/infiniband/hw/hfi1/trace.h:15,
                       from drivers/infiniband/hw/hfi1/trace.c:6:
      drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘perf_trace_hfi1_trace_template’:
      ./include/trace/perf.h:70:9: warning: function ‘perf_trace_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
        struct hlist_head *head;     \
               ^~~~~~~~~~
      drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
       DECLARE_EVENT_CLASS(hfi1_trace_template,
       ^~~~~~~~~~~~~~~~~~~
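
      For context, one general way to silence this class of warning for a
      single definition is a scoped GCC diagnostic pragma. The sketch below
      assumes a hypothetical helper, format_message(), and is not taken
      from the hfi1 sources; whether the commit uses this exact form is not
      stated here.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#pragma GCC diagnostic push
#ifndef __clang__
/* Only GCC knows this warning; skip it for clang to avoid a new warning. */
#pragma GCC diagnostic ignored "-Wsuggest-attribute=format"
#endif
/* Hypothetical helper: forwards a format string to vsnprintf(). */
static int format_message(char *buf, size_t len, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(buf && fmt);
	va_start(args, fmt);
	n = vsnprintf(buf, len, fmt, args);
	va_end(args);
	return n;
}
#pragma GCC diagnostic pop
```

      The push/pop pair limits the suppression to the enclosed definition,
      so the warning stays active for the rest of the translation unit.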
      
      Solution adapted here is similar to the one in fbbc95a4
      Signed-off-by: Ehab Ababneh <ehab.ababneh@cornelisnetworks.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
      Link: https://lore.kernel.org/r/168088635415.3027109.5711716700328939402.stgit@252.162.96.66.static.eigbox.net
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
    • IB/hfi1: Remove trace newlines · d2590edc
      Dean Luick authored
      The hfi1_cdbg trace mechanism appends a newline.  Remove trailing
      newlines from all format strings.
      Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
      Link: https://lore.kernel.org/r/168088634897.3027109.10401662436950683555.stgit@252.162.96.66.static.eigbox.net
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
    • RDMA/srpt: Add a check for valid 'mad_agent' pointer · eca5cd94
      Saravanan Vajravel authored
      When unregistering the MAD agent, the srpt module checks that the
      'mad_agent' pointer is non-NULL before invoking ib_unregister_mad_agent().
      This check also passes if 'mad_agent' holds an error value.
      'mad_agent' can hold an error value for a short window when
      srpt_add_one() and srpt_remove_one() are executed simultaneously.
      
      In the srpt module, add a valid-pointer check for 'sport->mad_agent'
      before unregistering the MAD agent.
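
      The difference between a non-NULL check and a valid-pointer check
      can be sketched in userspace C. The helpers below imitate the
      kernel's convention of encoding a negative errno in a pointer;
      err_ptr(), is_err(), agent_is_valid(), fake_unregister() and
      remove_one() are hypothetical names for illustration, not the srpt
      code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095 /* same bound the kernel uses for encoded errors */

/* Userspace imitation of the kernel's ERR_PTR(): encode -errno in a pointer. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Userspace imitation of IS_ERR(): true for the top MAX_ERRNO addresses. */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical helper: usable only if neither NULL nor an encoded error. */
static int agent_is_valid(const void *agent)
{
	return agent != NULL && !is_err(agent);
}

static int unregister_calls; /* counts calls to the fake unregister below */

static void fake_unregister(void *agent)
{
	assert(agent_is_valid(agent)); /* a real unregister would dereference it */
	unregister_calls++;
}

/* Hypothetical teardown path: skip unregistering when the pointer is bad. */
static void remove_one(void *agent)
{
	if (agent_is_valid(agent))
		fake_unregister(agent);
}
```

      An encoded error such as err_ptr(-22) is not NULL, so a plain
      non-NULL test would let it through to the unregister call.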
      
      This issue can occur when the RoCE driver unregisters the ib_device.
      
      Stack Trace:
      ------------
      BUG: kernel NULL pointer dereference, address: 000000000000004d
      PGD 145003067 P4D 145003067 PUD 2324fe067 PMD 0
      Oops: 0002 [#1] PREEMPT SMP NOPTI
      CPU: 10 PID: 4459 Comm: kworker/u80:0 Kdump: loaded Tainted: P
      Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.5.4 01/13/2020
      Workqueue: bnxt_re bnxt_re_task [bnxt_re]
      RIP: 0010:_raw_spin_lock_irqsave+0x19/0x40
      Call Trace:
        ib_unregister_mad_agent+0x46/0x2f0 [ib_core]
        IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
        ? __schedule+0x20b/0x560
        srpt_unregister_mad_agent+0x93/0xd0 [ib_srpt]
        srpt_remove_one+0x20/0x150 [ib_srpt]
        remove_client_context+0x88/0xd0 [ib_core]
        bond0: (slave p2p1): link status definitely up, 100000 Mbps full duplex
        disable_device+0x8a/0x160 [ib_core]
        bond0: active interface up!
        ? kernfs_name_hash+0x12/0x80
       (NULL device *): Bonding Info Received: rdev: 000000006c0b8247
        __ib_unregister_device+0x42/0xb0 [ib_core]
       (NULL device *):         Master: mode: 4 num_slaves:2
        ib_unregister_device+0x22/0x30 [ib_core]
       (NULL device *):         Slave: id: 105069936 name:p2p1 link:0 state:0
        bnxt_re_stopqps_and_ib_uninit+0x83/0x90 [bnxt_re]
        bnxt_re_alloc_lag+0x12e/0x4e0 [bnxt_re]
      
      Fixes: a42d985b ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
      Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
      Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
      Link: https://lore.kernel.org/r/20230406042549.507328-1-saravanan.vajravel@broadcom.com
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
    • RDMA/cm: Trace icm_send_rej event before the cm state is reset · bd9de1ba
      Mark Zhang authored
      Trace the icm_send_rej event before the cm state is reset to idle, so
      that the correct cm state is logged. For example, when an incoming
      request is rejected, the old trace log was:
          icm_send_rej: local_id=961102742 remote_id=3829151631 state=IDLE reason=REJ_CONSUMER_DEFINED
      With this patch:
          icm_send_rej: local_id=312971016 remote_id=3778819983 state=MRA_REQ_SENT reason=REJ_CONSUMER_DEFINED
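
      The ordering issue can be illustrated with a small userspace sketch.
      The types and functions below (cm_id, trace_send_rej() and the two
      send_rej_* variants) are simplified, hypothetical stand-ins and not
      the RDMA CM code.

```c
#include <assert.h>
#include <stddef.h>

enum cm_state { CM_IDLE, CM_MRA_REQ_SENT }; /* simplified, hypothetical */

struct cm_id {
	enum cm_state state;
	enum cm_state traced_state; /* what the trace event recorded */
};

/* Stand-in for a tracepoint: records the state it sees when called. */
static void trace_send_rej(struct cm_id *id)
{
	id->traced_state = id->state;
}

/* Old ordering: the state is reset first, so the trace always sees IDLE. */
static void send_rej_trace_after_reset(struct cm_id *id)
{
	assert(id != NULL);
	id->state = CM_IDLE;
	trace_send_rej(id);
}

/* New ordering: trace first, then reset. */
static void send_rej_trace_before_reset(struct cm_id *id)
{
	assert(id != NULL);
	trace_send_rej(id);
	id->state = CM_IDLE;
}
```

      Both variants leave the connection in the idle state; only the value
      captured by the trace differs.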
      
      Fixes: 8dc105be ("RDMA/cm: Add tracepoints to track MAD send operations")
      Signed-off-by: Mark Zhang <markzhang@nvidia.com>
      Link: https://lore.kernel.org/r/20230330072351.481200-1-markzhang@nvidia.com
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
  2. 04 Apr, 2023 7 commits
  3. 03 Apr, 2023 7 commits
  4. 30 Mar, 2023 1 commit
  5. 29 Mar, 2023 8 commits
  6. 24 Mar, 2023 12 commits