1. 04 Apr, 2018 8 commits
    • RDMA: Use ib_gid_attr during GID modification · 414448d2
      Parav Pandit authored
      Now that ib_gid_attr contains device, port and index, simplify the
      provider APIs add_gid() and del_gid() to use device, port and index
      fields from the ib_gid_attr attributes structure.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/providers: Avoid null netdev check for RoCE · 3e44e0ee
      Parav Pandit authored
      Now that the IB core GID cache ensures that all RoCE entries have an
      associated netdev, remove the null checks from the provider drivers for
      clarity.
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/providers: Avoid zero GID check for RoCE · 14169e33
      Parav Pandit authored
      Now that the IB core GID cache ensures that a zero GID doesn't exist in
      the GID table, remove the zero GID checks from the provider drivers for
      clarity.
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/core: Refactor GID modify code for RoCE · 598ff6ba
      Parav Pandit authored
      Refactor the code into separate functions for RoCE that can perform more
      complex reference counting operations while still
      maintaining code readability. This includes:
      (a) Simplification to not perform netdevice checks and modifications
      for IB link layer.
      (b) Do not add RoCE GID entry which has NULL netdevice; instead return
      an error.
      (c) If GID addition fails at provider level add_gid(), do not add the
      entry in the cache and keep the entry marked as INVALID.
      (d) Simplify and reuse the ib_cache_gid_add()/del() routines so that they
      can be used even for modifying default GIDs. This avoids some code
      duplication in modifying default GIDs.
      (e) find_gid() routine refers to the data entry flags to qualify a GID
      as valid or invalid GID rather than depending on attributes and zeroness
      of the GID content.
      (f) gid_table_reserve_default() sets the GID default attribute at the
      beginning, while setting up the GID table. There is no need to use the
      default_gid flag in low level functions such as write_gid(), add_gid(),
      del_gid(), as they never need to update the DEFAULT property of the GID
      entry during a GID table update.
      
      As a result of this refactor, the reserved GID 0:0:0:0:0:0:0:0 is no longer
      searchable, as described below.

      A unicast GID entry of 0:0:0:0:0:0:0:0 is the Reserved GID as per the IB
      spec version 1.3 section 4.1.1, point (6), quoted below.
      
      "The unicast GID address 0:0:0:0:0:0:0:0 is reserved - referred to as
      the Reserved GID. It shall never be assigned to any endport. It shall
      not be used as a destination address or in a global routing header
      (GRH)."
      
      GID table cache now only stores valid GID entries. Before this patch,
      Reserved GID 0:0:0:0:0:0:0:0 was searchable in the GID table using
      ib_find_cached_gid_by_port() and other similar find routines.
      
      The zero GID is no longer searchable, as it shall not be present in a GRH or
      path record entry, as described in IB spec version 1.3 section 4.1.1,
      point (6), section 12.7.10 and section 12.7.20.

      ib_cache_update() is simplified to check the link layer once, use a unified
      locking scheme for all link layers, and drop the temporary GID table
      allocation/free logic.
      
      Additionally,
      (a) Expand ib_gid_attr to store port and index so that GID query
      routines can get port and index information from the attribute structure.
      (b) Expand ib_gid_attr to store device as well so that in future code when
      GID reference counting is done, device is used to reach back to the GID
      table entry.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
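Points (b), (c) and (e) of 598ff6ba can be illustrated with a small user-space sketch: lookups consult a per-entry validity flag rather than the GID bytes, a failed provider add leaves the entry INVALID, and the zero GID is never stored, so it is never found. All names here (`gid_table`, `table_add`, `table_find`, the flag values) are hypothetical and do not reproduce the kernel's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define GID_LEN  16
#define TABLE_SZ 4

/* Hypothetical flag values; the real cache has its own definitions. */
enum { ENTRY_INVALID = 1u << 0, ENTRY_DEFAULT = 1u << 1 };

struct gid_entry {
    uint8_t  gid[GID_LEN];
    unsigned flags;
};

struct gid_table {
    struct gid_entry e[TABLE_SZ];
};

/* Every slot starts out INVALID, so an empty table matches nothing. */
static void table_init(struct gid_table *t)
{
    memset(t, 0, sizeof(*t));
    for (int i = 0; i < TABLE_SZ; i++)
        t->e[i].flags = ENTRY_INVALID;
}

static bool gid_is_zero(const uint8_t *gid)
{
    static const uint8_t zero[GID_LEN];
    return memcmp(gid, zero, GID_LEN) == 0;
}

/* provider_fails simulates an error from a provider-level add_gid(). */
static int table_add(struct gid_table *t, int idx, const uint8_t *gid,
                     bool provider_fails)
{
    if (idx < 0 || idx >= TABLE_SZ || gid_is_zero(gid))
        return -1;              /* the reserved zero GID is never stored */
    if (provider_fails)
        return -1;              /* entry stays marked INVALID */
    memcpy(t->e[idx].gid, gid, GID_LEN);
    t->e[idx].flags &= ~ENTRY_INVALID;
    return 0;
}

/* Validity comes from the flags, not from the GID content. */
static int table_find(const struct gid_table *t, const uint8_t *gid)
{
    for (int i = 0; i < TABLE_SZ; i++) {
        if (t->e[i].flags & ENTRY_INVALID)
            continue;
        if (memcmp(t->e[i].gid, gid, GID_LEN) == 0)
            return i;
    }
    return -1;
}
```

In this sketch, searching a freshly initialised table for the all-zero GID returns -1 even though every slot's bytes are zero, because the flag check runs first.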
    • IB/core: Simplify ib_query_gid to always refer to cache · f35faa4b
      Parav Pandit authored
      Currently the following inconsistencies exist.
      1. ib_query_gid() returns the GID from the software cache for a RoCE port
      and returns the GID from the HCA for an IB port.
      This is incorrect because the software GID cache is maintained regardless
      of HCA port type.
      
      2. For the IB link layer, the GID is queried from the HCA via ib_query_gid()
      and also updated in the software cache. The two might not be in sync.

      ULPs such as the SRP initiator, SRP target and IPoIB driver have historically
      used the ib_query_gid() API to query the GID. However, CM used the cached
      version during CM processing. When the software cache was introduced, this
      inconsistency remained.

      In order to simplify, improve readability and avoid the link layer
      specific inconsistencies above, this patch brings the following changes.
      
      1. ib_query_gid() always refers to the cache layer regardless of link
      layer.
      
      2. The cache module, which reads the GID entry from the HCA and builds the
      cache, directly invokes the HCA provider verb's query_gid() callback function.

      3. ib_query_port() is called at an early stage, while reading the port
      immutable property, when the GID cache is not yet built. Therefore it needs
      to read the default GID from the HCA for the IB link layer to publish the
      subnet prefix.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/providers: Simplify query_gid callback of RoCE providers · 0e1f9b92
      Parav Pandit authored
      ib_query_gid() fetches the GID from the software cache maintained in
      ib_core for RoCE ports.
      
      Therefore, simplify the provider drivers for RoCE to treat query_gid()
      callback as never called for RoCE, and only require non-RoCE devices to
      implement it.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/core: Update query_gid documentation for HCA drivers · 72e1ff0f
      Parav Pandit authored
      query_gid() should return the right GID value for iWarp and IB link layers.
      It is a no-op for the RoCE link layer. Update the documentation to reflect
      this.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA device · 8435168d
      Roland Dreier authored
      Check to make sure that ctx->cm_id->device is set before we use it.
      Otherwise userspace can trigger a NULL dereference by doing
      RDMA_USER_CM_CMD_SET_OPTION on an ID that is not bound to a device.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: <syzbot+a67bc93e14682d92fc2f@syzkaller.appspotmail.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
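The guard described in 8435168d can be illustrated with a minimal user-space sketch. The struct and function names (`rdma_dev`, `cm_id`, `ucm_context`, `set_ib_path`) are hypothetical stand-ins, and returning `-EINVAL` is an assumption here, not a statement of what the patch returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct rdma_dev    { int node_type; };
struct cm_id       { struct rdma_dev *device; };   /* NULL until bound */
struct ucm_context { struct cm_id *cm_id; };

/* Reject the option when the ID is not yet bound to a device. */
static int set_ib_path(const struct ucm_context *ctx)
{
    if (!ctx->cm_id->device)
        return -EINVAL;

    /* Safe to dereference from here on. */
    (void)ctx->cm_id->device->node_type;
    return 0;
}
```

Without the early return, the dereference of `device` on an unbound ID is the NULL dereference the commit message describes.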
  2. 03 Apr, 2018 5 commits
  3. 29 Mar, 2018 8 commits
  4. 28 Mar, 2018 1 commit
  5. 27 Mar, 2018 16 commits
    • IB/core: Refer to RoCE port property to decide building cache · 190fb9c4
      Parav Pandit authored
      IB core maintains the GID cache entries for the GID table.
      This cache table has to be maintained regardless of the HCA's
      support of a GID table.
      For IB and iWarp ports, the cache is created by querying the HCA.
      For RoCE, the cache is created based on netdev events.

      Therefore just refer to the RoCE port property of the {device, port} to
      decide whether to build the cache by querying the HCA or from netdev events.
      There is no need to check whether the HCA supports a GID table or not.
      
      ib_cache_update() referred to RoCE attribute before validating
      port. Though in all current callers port is valid, it is incorrect
      to query RoCE port property before validating the port. Therefore,
      rdma_protocol_roce() check is done after rdma_is_port_valid() verifies
      that port is valid.
      
      Fixes: 115b68aa ("IB/ocrdma: Removed GID add/del null routines")
      Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
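The ordering that 190fb9c4 asks for (validate the port first, then consult the RoCE port property to pick the cache source) might look like the following user-space sketch. The types and the `cache_source_for()` helper are hypothetical; only the order of the two checks is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

struct port_info { bool is_roce; };

struct device {
    unsigned first_port, last_port;     /* inclusive port range */
    const struct port_info *ports;      /* indexed from first_port */
};

enum cache_source { SRC_INVALID_PORT, SRC_NETDEV_EVENTS, SRC_QUERY_HCA };

static bool port_is_valid(const struct device *d, unsigned port)
{
    return port >= d->first_port && port <= d->last_port;
}

static enum cache_source cache_source_for(const struct device *d,
                                          unsigned port)
{
    /* Validate first: the per-port property is only meaningful,
     * and only safe to index, for a valid port. */
    if (!port_is_valid(d, port))
        return SRC_INVALID_PORT;

    /* RoCE: cache built from netdev events. IB/iWarp: query the HCA. */
    return d->ports[port - d->first_port].is_roce ? SRC_NETDEV_EVENTS
                                                  : SRC_QUERY_HCA;
}
```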
    • IB/core: Search GID only for IB link layer · 22d24f75
      Parav Pandit authored
      Even though the API is only used by the IPoIB driver, it's incorrect to refer
      to the RoCE GID table property to search for a GID.

      Search for the GID only on the IB link layer.
      
      Fixes: dbb12562 ("IB/{core, ipoib}: Simplify ib_find_gid to search only for IB link layer")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/core: Refer to RoCE port property instead of GID table property · 4ab7cb4b
      Parav Pandit authored
      ib_find_gid_by_filter() searches for a GID with a filter only for the RoCE
      link layer, regardless of the HCA's support for a GID table.
      Therefore, the right way to look up is to compare the RoCE port property and
      not the GID table property.
      
      Fixes: 99b27e3b ("IB/cache: Add ib_find_gid_by_filter cache API")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/core: Generate GID change event regardless of RoCE GID table property · 3401857e
      Parav Pandit authored
      For the following reasons, the GID table event is generated regardless of
      the GID table property.
      
      1. GID table cache is maintained at ib core layer regardless of link layer.
      2. GID change event has no relation with IB link layer.
      3. GID change event also doesn't depend on whether HCA supports GID table
      or not.
      
      Fixes: f3906bd3 ("IB/core: Refactor GID cache's ib_dispatch_event")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/cm: Block processing alternate path handling RoCE Rx cm messages · 97c45c2c
      Parav Pandit authored
      For the reasons below, it is better not to support alternate path receive
      messages for RoCE in the near term.
      
      1. Alternate path for RoCE is not supported at rdmacm layer.
      2. It is not supported in uverbs/core layer for RoCE.
      3. Alternate path for an IPv6 link local address cannot resolve the route
      deterministically without a valid incoming interface id, whose use case
      makes sense only with dual port mode.
      4. init_av_from_path while processing LAP messages for IB and RoCE can
      lead to adding a duplicate AV entry into the port list, which leads to list
      corruption.
      5. rdma-core, a well known userspace implementation, has removed
      support for libucm, which uses the ucm.ko module, the only module that
      can trigger alternate path related messages.
      6. ucm kernel module is requested to be removed from the IB core in
      patch [1].
      
      [1] https://patchwork.kernel.org/patch/10268503/

      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/core: Protect against concurrent access to hardware stats · e945130b
      Mark Bloch authored
      Currently access to the hardware stats buffer isn't protected. This can
      result in multiple writes and reads at the same time to the same
      memory location, which can lead to providing an incorrect value to
      the user. Add a mutex to protect against it.
      
      Fixes: b40f4757 ("IB/core: Make device counter infrastructure dynamic")
      Signed-off-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
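The idea in e945130b, serialising every read and write of a shared stats buffer behind one mutex, can be sketched in user space with pthreads. The kernel patch uses a kernel mutex; `hw_stats`, `stats_refresh()` and `stats_read()` below are hypothetical names for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define NUM_COUNTERS 4

struct hw_stats {
    pthread_mutex_t lock;               /* guards value[] */
    uint64_t value[NUM_COUNTERS];
};

static void stats_init(struct hw_stats *s)
{
    pthread_mutex_init(&s->lock, NULL);
    memset(s->value, 0, sizeof(s->value));
}

/* Writer: replace the whole buffer under the lock. */
static void stats_refresh(struct hw_stats *s, const uint64_t *fresh)
{
    pthread_mutex_lock(&s->lock);
    memcpy(s->value, fresh, sizeof(s->value));
    pthread_mutex_unlock(&s->lock);
}

/* Reader: take the same lock so a refresh cannot interleave. */
static uint64_t stats_read(struct hw_stats *s, int idx)
{
    uint64_t v;

    pthread_mutex_lock(&s->lock);
    v = s->value[idx];
    pthread_mutex_unlock(&s->lock);
    return v;
}
```

Because both paths take the same lock, a reader sees the buffer either entirely before or entirely after a refresh.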
    • IB/mlx5: Respect new UMR capabilities · c8d75a98
      Majd Dibbiny authored
      In some firmware configurations, UMR usage from Virtual Functions is restricted.
      This information is published to the driver using new capability bits.

      Avoid using UMRs in these cases and use the firmware slow-path flow to create
      mkeys and populate them with Virtual to Physical address translation.

      Older drivers that do not have this patch will end up using memory keys that
      aren't populated with the Virtual to Physical address translation that is done
      as part of the UMR work.
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Majd Dibbiny <majd@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Tested-by: Laurence Oberman <loberman@redhat.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/mlx5: Enable ECN capable bits for UD RoCE v2 QPs · ea8af0d2
      Majd Dibbiny authored
      When working with RC QPs, the FW sets the ECN capable bits for all
      the RoCE v2 packets. On the other hand, for UD QPs, the driver needs
      to set the ECN capable bits in the Address Handle, since the HW
      generates each packet according to the Address Handle and not
      the QP context.
      
      If ECN is not enabled in NIC or switch, these bits are ignored.
      
      Fixes: 2811ba51 ("IB/mlx5: Add RoCE fields to Address Vector")
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Majd Dibbiny <majd@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
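As a rough illustration of "setting the ECN capable bits" from ea8af0d2: in an IP traffic class byte the two low-order bits form the ECN field, and the code point ECT(0) is binary 10. The helper below marks a traffic class as ECN capable only when the field is still Not-ECT. The helper name is hypothetical, and which code point the mlx5 driver actually writes is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define ECN_MASK 0x3u   /* low two bits of the traffic class byte */
#define ECN_ECT0 0x2u   /* ECN-Capable Transport, code point ECT(0) */

/* Mark a traffic class as ECN capable, leaving the DSCP bits and any
 * existing ECN marking untouched. */
static uint8_t tclass_mark_ect0(uint8_t tclass)
{
    if ((tclass & ECN_MASK) == 0)
        tclass |= ECN_ECT0;
    return tclass;
}
```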
    • IB/uverbs: UAPI pointers should use __aligned_u64 type · be23fb9a
      Matan Barak authored
      The ioctl() UAPIs are meant to be used by both user-space
      and kernel ioctl() handlers.
      
      Mostly, these UAPI structs tend to consist of simple types, but
      sometimes user-space pointers may be passed between user-space and
      kernel. We would like to avoid dereferencing a user-space pointer in
      the kernel, so we always define RDMA_UAPI_PTR as a __aligned_u64
      type.
      
      Fixes: 1f7ff9d5 ('IB/uverbs: Move to new headers and make naming consistent')
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • Merge branch '32compat' · 819b6028
      Jason Gunthorpe authored
      The design of the uAPI had intended all structs to share the same layout on 32
      and 64 bit compiles. Unfortunately over the years some errors have crept in.
      
      This series fixes all the incompatibilities. It goes along with a userspace
      rdma-core series that causes the providers to use these structs directly and
      then does various self-checks on the command formation.
      
      Those checks were combined with output from pahole on 32 and 64 bit compiles
      to confirm that the structure layouts are the same.
      
      This series does not make implicit padding explicit, as long as the implicit
      padding is the same on 32 and 64 bit compiles.
      
      Finally, the issue is put to rest by using __aligned_u64 in the uapi headers.
      If new code copies that type, and is checked in userspace, it is unlikely we
      will see problems in future.

      There are two patches that break the ABI for a 32 bit kernel, one for rxe and
      one for mlx4. Both patches have notes, but the overall feeling from Doug and me
      is that providing compat is just too difficult and not necessary since there
      is no real user of a 32 bit userspace and 32 bit kernel for various good
      reasons.
      
      The 32 bit userspace / 64 bit kernel case however does seem to have some real
      users and does need to work as designed.
      
      * 32compat:
        RDMA: Change all uapi headers to use __aligned_u64 instead of __u64
        RDMA/rxe: Fix uABI structure layouts for 32/64 compat
        RDMA/mlx4: Fix uABI structure layouts for 32/64 compat
        RDMA/qedr: Fix uABI structure layouts for 32/64 compat
        RDMA/ucma: Fix uABI structure layouts for 32/64 compat
        RDMA: Remove minor pahole differences between 32/64
    • RDMA: Change all uapi headers to use __aligned_u64 instead of __u64 · 26b99066
      Jason Gunthorpe authored
      The new auditing standard for the subsystem will be to only use
      __aligned_u64 in uapi headers to try and prevent 32/64 compat bugs
      from existing in the future.
      
      Changing all existing usage will help ensure new developers copy the
      right idea.
      
      The before/after of this patch was tested using pahole on 32 and 64
      bit compiles to confirm it has no change in the structure layout, so
      this patch is a NOP.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
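Why an 8-byte-aligned 64-bit type matters for 32/64 compat (the subject of 26b99066) can be shown with a small layout check. The typedef `aligned_u64` below mimics the idea of the kernel's `__aligned_u64` using the GCC/Clang `aligned` attribute; the struct names are hypothetical. On targets where a plain 64-bit integer is only 4-byte aligned (for example 32-bit x86), `resp_plain` would place `addr` at offset 4, while `resp_aligned` keeps the same layout everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit integer forced to 8-byte alignment on every target. */
typedef uint64_t aligned_u64 __attribute__((aligned(8)));

/* Layout depends on the target's natural alignment of uint64_t. */
struct resp_plain {
    uint32_t id;
    uint64_t addr;
};

/* Layout is identical on 32 and 64 bit compiles. */
struct resp_aligned {
    uint32_t    id;
    aligned_u64 addr;
};

_Static_assert(offsetof(struct resp_aligned, addr) == 8,
               "addr must sit at offset 8 on every target");
_Static_assert(sizeof(struct resp_aligned) == 16,
               "struct size must not depend on the target");
```

Compile-time assertions like these play a role similar to the pahole comparison the commit message mentions, although the series itself relied on pahole output rather than on this exact check.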
    • RDMA/rxe: Fix uABI structure layouts for 32/64 compat · f2e9bfac
      Jason Gunthorpe authored
      With 32 bit compilation several of the fields become misaligned here.
      Fixing this is an ABI break for 32 bit rxe and it is in well used
      portions of the rxe ABI.
      
      To handle this we bump the ABI version, as expected. However the user
      space driver doesn't handle it properly today, so all existing user
      space continues to work.
      
      Updated userspace will start to require the necessary kernel version.
      
      We don't expect there to be any 32 bit users of rxe. The most likely cases,
      such as 32 bit ARM, already generally don't work because rxe does not handle
      the CPU cache properly on the pages it shares with userspace.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/mlx4: Fix uABI structure layouts for 32/64 compat · 366380a0
      Jason Gunthorpe authored
      rss_caps in struct mlx4_uverbs_ex_query_device_resp is misaligned on
      32 bit compared to 64 bit; add explicit padding.
      
      The rss caps were introduced recently and are very rarely used in user
      space, mainly for DPDK.
      
      We don't expect there to be a real 32 bit user, so this change is done
      without compat considerations.
      
      Fixes: 09d208b2 ("IB/mlx4: Add report for RSS capabilities by vendor channel")
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/qedr: Fix uABI structure layouts for 32/64 compat · 71e80a47
      Jason Gunthorpe authored
      struct qedr_alloc_ucontext_resp is a different length in 32 and 64
      bit compiles due to implicit compiler padding.
      
      The structs alloc_pd_uresp, create_cq_uresp and create_qp_uresp are
      not padded by the compiler, but in user space the compiler pads them
      due to the way the core and driver structs are concatenated. Make
      this padding explicit and consistent for future sanity.
      
      The kernel driver can already handle the user buffer being smaller
      than required and copies correctly, so no compat or ABI break happens
      from introducing the explicit padding.
      Acked-by: Michal Kalderon <michal.kalderon@cavium.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/ucma: Fix uABI structure layouts for 32/64 compat · 611cb92b
      Jason Gunthorpe authored
      The rdma_ucm_event_resp is a different length on 32 and 64 bit compiles.
      
      The kernel requires it to be the expected length or longer so 32 bit
      builds running on a 64 bit kernel will not work.
      
      Retain full compat by having all kernels accept a struct with or without
      the trailing reserved field.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
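The compat approach in 611cb92b, accepting a response buffer with or without the trailing reserved field, can be sketched as a length check followed by a bounded copy. The struct below is a simplified, hypothetical stand-in for `rdma_ucm_event_resp`, and `copy_event()` with its `-ENOSPC` return is an assumption for illustration, not the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in: 20 bytes of payload plus 4 bytes of padding. */
struct event_resp {
    uint64_t uid;
    uint32_t id;
    uint32_t event;
    uint32_t status;
    uint32_t reserved;      /* trailing field old 32 bit callers omit */
};

/* Copy the response into a caller buffer of out_len bytes.
 * Returns the number of bytes copied, or -ENOSPC if the buffer cannot
 * hold even the fields before 'reserved'. */
static long copy_event(void *out, size_t out_len,
                       const struct event_resp *resp)
{
    size_t min_len = offsetof(struct event_resp, reserved);
    size_t n;

    if (out_len < min_len)
        return -ENOSPC;

    /* Never write past the caller's buffer or past the struct. */
    n = out_len < sizeof(*resp) ? out_len : sizeof(*resp);
    memcpy(out, resp, n);
    return (long)n;
}
```

A caller passing the shorter 20-byte layout and one passing the full 24-byte layout both succeed; only a buffer shorter than the non-reserved fields is rejected.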
    • RDMA: Remove minor pahole differences between 32/64 · 38b48808
      Jason Gunthorpe authored
      To help automatic detection we want pahole to report the same struct
      layouts for 32 and 64 bit compiles. These cases are all implicit
      padding added at the end of embedded structs as part of a union.
      
      The added reserved fields have no impact on the ABI.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  6. 23 Mar, 2018 2 commits