1. 07 Jul, 2020 1 commit
    • RDMA/rxe: Skip dgid check in loopback mode · 5c99274b
      Zhu Yanjun authored
      In the loopback tests, the following call trace occurs.
      
       Call Trace:
        __rxe_do_task+0x1a/0x30 [rdma_rxe]
        rxe_qp_destroy+0x61/0xa0 [rdma_rxe]
        rxe_destroy_qp+0x20/0x60 [rdma_rxe]
        ib_destroy_qp_user+0xcc/0x220 [ib_core]
        uverbs_free_qp+0x3c/0xc0 [ib_uverbs]
        destroy_hw_idr_uobject+0x24/0x70 [ib_uverbs]
        uverbs_destroy_uobject+0x43/0x1b0 [ib_uverbs]
        uobj_destroy+0x41/0x70 [ib_uverbs]
        __uobj_get_destroy+0x39/0x70 [ib_uverbs]
        ib_uverbs_destroy_qp+0x88/0xc0 [ib_uverbs]
        ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xb9/0xf0 [ib_uverbs]
        ib_uverbs_cmd_verbs+0xb16/0xc30 [ib_uverbs]
      
      The root cause is that no actual RDMA connection is created in the
      loopback tests, so rxe_match_dgid fails randomly.
      
      To fix this call trace, which appears in the loopback tests, skip the
      dgid check.
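
The idea of skipping the check for loopback traffic can be illustrated with a minimal sketch. The flag `PKT_LOOPBACK`, the `struct pkt` layout, and the function `check_dgid()` are hypothetical names chosen for illustration; they are not taken from the driver, and the real patch may look different.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag and packet layout, for illustration only. */
#define PKT_LOOPBACK 0x1u

struct pkt {
    uint32_t mask;      /* per-packet flags */
    uint8_t  dgid[16];  /* destination GID carried by the packet */
};

/* Returns 0 if the packet may be accepted, -1 if the dgid does not match. */
static int check_dgid(const struct pkt *pkt, const uint8_t local_gid[16])
{
    assert(pkt != NULL && local_gid != NULL);

    /* Loopback packets never crossed a real connection: skip the check. */
    if (pkt->mask & PKT_LOOPBACK)
        return 0;

    return memcmp(pkt->dgid, local_gid, 16) == 0 ? 0 : -1;
}
```

In this sketch a non-loopback packet still has to carry a matching dgid, so the check is relaxed only for the loopback path.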
      
      Fixes: 8700e3e7 ("Soft RoCE driver")
      Link: https://lore.kernel.org/r/20200630123605.446959-1-leon@kernel.org
      Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
  2. 06 Jul, 2020 18 commits
  3. 03 Jul, 2020 2 commits
  4. 02 Jul, 2020 2 commits
    • RDMA/core: Fix bogus WARN_ON during ib_unregister_device_queued() · 0cb42c02
      Jason Gunthorpe authored
      ib_unregister_device_queued() can only be used by drivers using the new
      dealloc_device callback flow, and it has a safety WARN_ON to ensure
      drivers are using it properly.
      
      However, if unregister and register are raced there is a special
      destruction path that maintains the uniform error handling semantic of
      'caller does ib_dealloc_device() on failure'. This requires disabling the
      dealloc_device callback which triggers the WARN_ON.
      
      Instead of using NULL to disable the callback, use a special function
      pointer so the WARN_ON does not trigger.
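
The sentinel-pointer technique described above can be sketched in plain C. All names here (`struct device`, `prevent_dealloc`, `disable_dealloc`, `unregister_queued_would_warn`, `run_dealloc`) are hypothetical stand-ins, not the kernel's identifiers. The sketch only shows why a distinct non-NULL pointer keeps a "callback must be set" check satisfied while still disabling the callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct device;
typedef void (*dealloc_fn)(struct device *);

struct device {
    dealloc_fn dealloc;  /* NULL: driver never opted in to the callback flow */
    int freed;           /* counts how often the real callback ran */
};

/* Sentinel meaning "callback flow in use, but callback disabled". */
static void prevent_dealloc(struct device *dev)
{
    (void)dev;  /* intentionally does nothing */
}

/* Example of a real driver callback. */
static void driver_dealloc(struct device *dev)
{
    dev->freed++;
}

/* Stand-in for the safety check: true when the warning would fire. */
static bool unregister_queued_would_warn(const struct device *dev)
{
    return dev->dealloc == NULL;
}

/* Disable the callback without making the device look like a legacy one. */
static void disable_dealloc(struct device *dev)
{
    dev->dealloc = prevent_dealloc;
}

/* Invoke the callback only if it is set and not the sentinel. */
static void run_dealloc(struct device *dev)
{
    assert(dev != NULL);
    if (dev->dealloc != NULL && dev->dealloc != prevent_dealloc)
        dev->dealloc(dev);
}
```

With NULL as the "disabled" marker, a disabled device would be indistinguishable from a driver that never set the callback, which is what made the check fire.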
      
      Fixes: d0899892 ("RDMA/device: Provide APIs from the core code to help unregistration")
      Link: https://lore.kernel.org/r/0-v1-a36d512e0a99+762-syz_dealloc_driver_jgg@nvidia.com
      Reported-by: syzbot+4088ed905e4ae2b0e13b@syzkaller.appspotmail.com
      Suggested-by: Hillf Danton <hdanton@sina.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • RDMA/ipoib: Fix ABBA deadlock with ipoib_reap_ah() · 65936bf2
      Jason Gunthorpe authored
      ipoib_mcast_carrier_on_task() insanely open codes a rtnl_lock() such that
      the only time flush_workqueue() can be called is if it also clears
      IPOIB_FLAG_OPER_UP.
      
      Thus the flush inside ipoib_flush_ah() will deadlock if it gets unlucky
      enough, and lockdep doesn't help us to find it early:
      
                CPU0               CPU1          CPU2
         __ipoib_ib_dev_flush()
            down_read(vlan_rwsem)
      
                               ipoib_vlan_add()
                                 rtnl_trylock()
                                 down_write(vlan_rwsem)
      
                                            ipoib_mcast_carrier_on_task()
                                               while (!rtnl_trylock())
                                                    msleep(20);
      
            ipoib_flush_ah()
              flush_workqueue(priv->wq)
      
      Clean up the ah_reaper related functions and lifecycle to make sense:
      
       - Start/Stop of the reaper should only be done in open/stop NDOs, not in
         any other places
      
       - Cancel and flush of the reaper should only happen in the stop NDO.
         Cancel is only functional when combined with IPOIB_STOP_REAPER.
      
       - Non-stop places that were flushing the AHs just need to flush out dead
         AHs synchronously and ignore the background task completely. It is
         fully locked and harmless to leave running.
      
      This ultimately fixes the ABBA deadlock by removing the unnecessary
      flush_workqueue() from the problematic place under the vlan_rwsem.
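
A rough sketch of what "flush out dead AHs synchronously" could mean, as opposed to waiting on a workqueue: the caller walks the list of dead entries itself and frees the ones that are no longer in use. The types `struct ah` and `struct port`, the `busy` field, and both functions are hypothetical and simplified. Locking is omitted, and this is not the driver's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for an address handle and its owner. */
struct ah {
    int busy;          /* non-zero while a pending send still uses it */
    struct ah *next;
};

struct port {
    struct ah *dead_ahs;  /* singly linked list of AHs awaiting release */
};

/* Put a new entry on the dead list (helper for the example). */
static struct ah *add_dead_ah(struct port *p, int busy)
{
    struct ah *ah = malloc(sizeof(*ah));
    assert(ah != NULL);
    ah->busy = busy;
    ah->next = p->dead_ahs;
    p->dead_ahs = ah;
    return ah;
}

/*
 * Free every dead AH that is no longer busy, directly in the caller's
 * context. No work item is queued and nothing is waited on, so the
 * caller never blocks on another task. Returns the number freed.
 */
static int reap_dead_ahs(struct port *p)
{
    int freed = 0;
    struct ah **link = &p->dead_ahs;

    while (*link != NULL) {
        struct ah *ah = *link;

        if (!ah->busy) {
            *link = ah->next;  /* unlink, then release */
            free(ah);
            freed++;
        } else {
            link = &ah->next;  /* still in use: leave for a later pass */
        }
    }
    return freed;
}
```

Because the reap runs inline, a caller holding a lock does not depend on any other queued work making progress, and a periodic background reaper can keep running alongside it.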
      
      Fixes: efc82eee ("IB/ipoib: No longer use flush as a parameter")
      Link: https://lore.kernel.org/r/20200625174219.290842-1-kamalheib1@gmail.com
      Reported-by: Kamal Heib <kheib@redhat.com>
      Tested-by: Kamal Heib <kheib@redhat.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
  5. 30 Jun, 2020 1 commit
  6. 27 Jun, 2020 3 commits
  7. 24 Jun, 2020 10 commits
  8. 23 Jun, 2020 3 commits