1. 17 Nov, 2013 8 commits
    • Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'srp' into for-next · b4fdf52b
      Roland Dreier authored
    • IB/core: Re-enable create_flow/destroy_flow uverbs · 69ad5da4
      Matan Barak authored
      This commit reverts commit 7afbddfa ("IB/core: Temporarily disable
      create_flow/destroy_flow uverbs").  Since the uverbs extensions
      functionality was experimental for v3.12, this patch re-enables the
      support for them and flow-steering for v3.13.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/core: extended command: an improved infrastructure for uverbs commands · f21519b2
      Yann Droneaud authored
      Commit 400dbc96 ("IB/core: Infrastructure for extensible uverbs
      commands") added an infrastructure for extensible uverbs commands
      while later commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow
      through uverbs") exported ib_create_flow()/ib_destroy_flow() functions
      using this new infrastructure.
      
      According to commit 400dbc96, the purpose of this infrastructure is
      to support passing around provider-specific (i.e. hardware-specific)
      buffers when userspace issues commands to the kernel, so that the
      uverbs (i.e. core) buffers can be extended independently of the
      provider buffers.
      
      But the new kernel command function prototypes were not modified to
      take advantage of this extension. This issue was exposed by Roland
      Dreier in a previous review[1].
      
      So the following patch is an attempt at a revised extensible command
      infrastructure.
      
      This improved extensible command infrastructure distinguishes the
      core (i.e. legacy) command/response buffers from the provider
      (i.e. hardware) command/response buffers: each function implementing
      an extended command is given one struct ib_udata holding the core
      (i.e. uverbs) input and output buffers, and another struct ib_udata
      holding the hw (i.e. provider) input and output buffers.

      Having those buffers identified separately makes it easier to grow
      one buffer to support an extension without adding code to guess the
      exact size of each part of the command/response. This should make
      the extended functions more reliable.
      
      Additionally, instead of relying on the command identifier being
      greater than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure
      relies on unused bits in the command field: of the 32 bits provided
      by the command field, only 6 bits are really needed to encode the
      identifiers of the commands currently supported by the kernel. (Even
      using only 6 bits leaves room for about 23 new commands.)
      
      So this patch makes use of some high order bits in command field to
      store flags, leaving enough room for more command identifiers than one
      will ever need (eg. 256).
      
      The new flags are used to specify if the command should be processed
      as an extended one or a legacy one. While designing the new command
      format, care was taken to make usage of flags itself extensible.
      
      Using the high order bits of the command field ensures that a newer
      libibverbs on an older kernel will properly fail when trying to call
      extended commands. Conversely, an older libibverbs on a newer kernel
      will never be able to issue calls to extended commands.
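
The flag scheme described above can be sketched in a few lines of C. This is only an illustration, not the patch itself: the macro and function names are hypothetical, the bit positions are read off the header diagram in this message (flags in the top byte, two zero bytes, command identifier in the low byte), and the value chosen for the "extended" flag is an assumption. The last helper models an older kernel that treats the whole 32-bit field as a table index, which is why a command carrying flags is rejected there.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; layout assumed from the diagram:
 * | flags (8 bits) | 00 | 00 | command (8 bits) | */
#define CMD_FLAGS_MASK    0xff000000u
#define CMD_FLAGS_SHIFT   24
#define CMD_COMMAND_MASK  0x000000ffu
#define CMD_FLAG_EXTENDED 0x80u /* assumed value of the "extended" flag */

/* Build a 32-bit command word from an identifier and a set of flags. */
static inline uint32_t cmd_encode(uint32_t id, uint32_t flags)
{
	return ((flags << CMD_FLAGS_SHIFT) & CMD_FLAGS_MASK) |
	       (id & CMD_COMMAND_MASK);
}

static inline uint32_t cmd_id(uint32_t command)
{
	return command & CMD_COMMAND_MASK;
}

static inline uint32_t cmd_flags(uint32_t command)
{
	return (command & CMD_FLAGS_MASK) >> CMD_FLAGS_SHIFT;
}

static inline bool cmd_is_extended(uint32_t command)
{
	return (cmd_flags(command) & CMD_FLAG_EXTENDED) != 0;
}

/* The two middle bytes must be zero in a well-formed command word. */
static inline bool cmd_is_well_formed(uint32_t command)
{
	return (command & ~(CMD_FLAGS_MASK | CMD_COMMAND_MASK)) == 0;
}

/* Model of an older kernel: the whole field indexes a command table,
 * so any word with a flag bit set is out of range and rejected. */
static inline bool old_kernel_accepts(uint32_t command, uint32_t nr_cmds)
{
	return command < nr_cmds;
}
```

With this layout, 8 bits remain for the identifier (256 commands) and 8 bits for flags, matching the figures quoted in the message.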
      
      The extended command header includes the optional response pointer so
      that the output buffer length and the output buffer pointer are located
      together in the command, allowing proper parameter checking. This
      should make implementing functions easier and safer.

      Additionally, the extended header ensures 64-bit alignment while making
      all sizes multiples of 8 bytes, which extends the maximum buffer size:
      
                                   legacy      extended
      
         Maximum command buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
        Maximum response buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
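
The figures in the table follow from the width of the word counters. A small worked sketch, assuming each counter is a 16-bit field and that legacy counters count 4-byte words while extended counters count 8-byte words (the 8-byte unit is stated above; the 4-byte legacy unit is an assumption consistent with the 256 KBytes figure). The helper name is hypothetical.

```c
#include <stdint.h>

/* Largest byte count a 16-bit word counter can express. */
static inline uint32_t max_buffer_bytes(uint32_t bytes_per_word)
{
	return (uint32_t)UINT16_MAX * bytes_per_word;
}

/* Extended commands carry two counters per direction (uverbs + provider),
 * so the total per direction is twice the single-counter maximum. */
static inline uint32_t max_extended_total_bytes(void)
{
	return 2 * max_buffer_bytes(8);
}
```

The results are 262140 bytes (just under 256 KBytes) for a legacy counter, 524280 bytes (just under 512 KBytes) for one extended counter, and 1048560 bytes (just under 1024 KBytes) for the two extended counters combined.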
      
      For the purpose of doing proper buffer size accounting, the header
      sizes are no longer taken into account in "in_words".
      
      One of the oddities of the current extensible infrastructure, reading
      the "legacy" command header twice, is fixed by removing the "legacy"
      command header from the extended command header. They are processed as
      two different parts of the command: memory is read once and
      information is not duplicated. This makes it clear that this is an
      extended command scheme and not a different command scheme.
      
      The proposed scheme will format input (command) and output (response)
      buffers this way:
      
      - command:
      
        legacy header +
        extended header +
        command data (core + hw):
      
          +----------------------------------------+
          | flags     |   00      00    |  command |
          |        in_words    |   out_words       |
          +----------------------------------------+
          |                 response               |
          |                 response               |
          | provider_in_words | provider_out_words |
          |                 padding                |
          +----------------------------------------+
          |                                        |
          .              <uverbs input>            .
          .              (in_words * 8)            .
          |                                        |
          +----------------------------------------+
          |                                        |
          .             <provider input>           .
          .          (provider_in_words * 8)       .
          |                                        |
          +----------------------------------------+
      
      - response, if present:
      
          +----------------------------------------+
          |                                        |
          .          <uverbs output space>         .
          .             (out_words * 8)            .
          |                                        |
          +----------------------------------------+
          |                                        |
          .         <provider output space>        .
          .         (provider_out_words * 8)       .
          |                                        |
          +----------------------------------------+
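
The layout drawn above can be written down as two C structures plus the length arithmetic it implies. This is a sketch under stated assumptions: the struct and function names are illustrative stand-ins rather than the kernel's definitions, the field widths are inferred from the diagram, and the consistency rule (a response pointer is present exactly when some output space is requested) is one plausible reading of "proper parameter checking".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the legacy header (first box of the diagram). */
struct legacy_hdr {
	uint32_t command;   /* flags | 00 00 | command */
	uint16_t in_words;  /* uverbs input, in 8-byte words */
	uint16_t out_words; /* uverbs output space, in 8-byte words */
};

/* Illustrative stand-in for the extended header (second box). */
struct extended_hdr {
	uint64_t response;           /* user pointer to the response, 0 if none */
	uint16_t provider_in_words;  /* provider input, in 8-byte words */
	uint16_t provider_out_words; /* provider output space, in 8-byte words */
	uint32_t reserved;           /* padding, keeps the header a multiple of 8 */
};

/* Headers are not counted in in_words, so they are added separately. */
static size_t expected_command_len(const struct legacy_hdr *h,
				   const struct extended_hdr *x)
{
	return sizeof(*h) + sizeof(*x) +
	       ((size_t)h->in_words + x->provider_in_words) * 8;
}

static size_t expected_response_len(const struct legacy_hdr *h,
				    const struct extended_hdr *x)
{
	return ((size_t)h->out_words + x->provider_out_words) * 8;
}

/* Assumed check: written length must match the header fields, and a
 * response pointer must be present exactly when output space is requested. */
static bool command_is_consistent(const struct legacy_hdr *h,
				  const struct extended_hdr *x,
				  size_t written)
{
	if (written != expected_command_len(h, x))
		return false;
	if (x->response == 0)
		return expected_response_len(h, x) == 0;
	return expected_response_len(h, x) != 0;
}
```

Because the pointer and both output lengths sit next to each other in the header, a check like this can run before any command-specific code touches the buffers.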
      
      The overall design is to ensure that the extensible infrastructure is
      itself extensible while being more reliable, with more input and bounds
      checking.
      
      Note:
      
      The unused field in the extended header would be a perfect candidate to
      hold the command "comp_mask" (i.e. the bit field used to handle
      compatibility).  This was suggested by Roland Dreier in a previous
      review[2].  But a "comp_mask" field is likely to be present in the uverbs
      input and/or provider input, and likewise in the response, as noted by
      Matan Barak[3], so it doesn't make sense to put "comp_mask" in the
      header.
      
      [1]:
      http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com
      
      [2]:
      http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com
      
      [3]:
      http://marc.info/?i=525C1149.6000701@mellanox.com

      Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
      Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com

      [ Convert "ret ? ret : 0" to the equivalent "ret".  - Roland ]
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/core: Remove ib_uverbs_flow_spec structure from userspace · 2490f20b
      Yann Droneaud authored
      The structure that can hold any type of flow_spec is of no use to
      userspace.  It would be wrong for userspace to do:
      
        struct ib_uverbs_flow_spec flow_spec;
      
        flow_spec.type = IB_FLOW_SPEC_TCP;
        flow_spec.size = sizeof(flow_spec);
      
      Instead, userspace should use the dedicated flow_spec structure for
        - Ethernet : struct ib_uverbs_flow_spec_eth,
        - IPv4     : struct ib_uverbs_flow_spec_ipv4,
        - TCP/UDP  : struct ib_uverbs_flow_spec_tcp_udp.
      
      In other words, struct ib_uverbs_flow_spec is a "virtual" data
      structure that can only be used by the kernel as an alias to the others.
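
For contrast with the wrong usage shown earlier in this message, here is a sketch of filling in a dedicated TCP/UDP spec. All `demo_` names are hypothetical stand-ins for struct ib_uverbs_flow_spec_tcp_udp and its filter; the field layout and the numeric value of the TCP spec type are assumptions made for illustration, not copied from the kernel headers.

```c
#include <stdint.h>
#include <string.h>

enum { DEMO_FLOW_SPEC_TCP = 0x40 }; /* assumed value, for illustration only */

/* Hypothetical filter: ports are expected in network byte order. */
struct demo_flow_tcp_udp_filter {
	uint16_t dst_port;
	uint16_t src_port;
};

/* Hypothetical stand-in for the dedicated TCP/UDP flow spec. */
struct demo_flow_spec_tcp_udp {
	uint32_t type;
	uint16_t size;
	uint16_t reserved;
	struct demo_flow_tcp_udp_filter val;  /* values to match */
	struct demo_flow_tcp_udp_filter mask; /* which bits of val matter */
};

/* Match one destination port exactly; everything else is wildcarded. */
static void demo_init_tcp_spec(struct demo_flow_spec_tcp_udp *spec,
			       uint16_t dst_port_be)
{
	memset(spec, 0, sizeof(*spec));
	spec->type = DEMO_FLOW_SPEC_TCP;
	spec->size = sizeof(*spec); /* size of the dedicated struct, not a union */
	spec->val.dst_port = dst_port_be;
	spec->mask.dst_port = 0xffff;
}
```

The point is that `size` describes the concrete spec being passed, which is only meaningful when the dedicated structure is used.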
      Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
      Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/core: Use a common header for uverbs flow_specs · 58913efb
      Yann Droneaud authored
      A common header allows better checking of flow spec sizes, while
      ensuring strict alignment to 64 bits.
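
To illustrate why a common header helps, here is a minimal sketch of a validator that walks a packed buffer of flow specs using nothing but the header. The `demo_` names, the header fields, and the exact rules (size at least one header, multiple of 8 bytes, reserved field zero, specs exactly filling the buffer) are assumptions for illustration and may differ from the checks the kernel performs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical common header shared by every flow spec: 8 bytes. */
struct demo_flow_spec_hdr {
	uint32_t type;
	uint16_t size;     /* size of the whole spec, header included */
	uint16_t reserved; /* must be zero */
};

/* Return true if buf holds exactly num_specs well-sized specs. */
static bool demo_flow_specs_valid(const unsigned char *buf, size_t len,
				  unsigned int num_specs)
{
	size_t off = 0;

	for (unsigned int i = 0; i < num_specs; i++) {
		struct demo_flow_spec_hdr hdr;

		if (len - off < sizeof(hdr))
			return false;
		/* memcpy avoids unaligned access on the raw byte buffer. */
		memcpy(&hdr, buf + off, sizeof(hdr));

		if (hdr.size < sizeof(hdr) ||  /* too small to hold a header */
		    hdr.size % 8 != 0 ||       /* breaks 64-bit alignment */
		    hdr.size > len - off ||    /* runs past the buffer */
		    hdr.reserved != 0)
			return false;
		off += hdr.size;
	}
	return off == len;
}
```

Since every spec starts with the same fields, the walk needs no knowledge of the individual spec types.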
      Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
      Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/core: Make uverbs flow structure use names like verbs ones · b68c9560
      Yann Droneaud authored
      This patch adds a "flow" prefix to most of the data structures added as part
      of commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow through
      uverbs") to keep those names in sync with the data structures added in
      commit 319a441d ("IB/core: Add receive flow steering support").
      
      It's just a matter of translating 'ib_flow' to 'ib_uverbs_flow'.
      Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
      Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/core: Rename 'flow' structs to match other uverbs structs · d82693da
      Yann Droneaud authored
      Commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow through
      uverbs") added public data structures to support receive flow
      steering.  The new structs do not follow the 'uverbs' pattern:
      they lack the common prefix 'ib_uverbs'.

      This patch replaces the ib_kern prefix with ib_uverbs.
      Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
      Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/core: clarify overflow/underflow checks on ib_create/destroy_flow · f8848274
      Matan Barak authored
      This patch makes the following changes:

      1. Remove unneeded checks.

      2. Remove the fixed size from flow_attr.size, thus simplifying the checks.

      3. Remove a 32-bit hole on 64-bit systems with strict alignment in
         struct ib_kern_flow_attr by adding a reserved field.
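
The third item can be illustrated generically. The two structures below are hypothetical and are not the layout of the flow attribute struct; they only show how a 32-bit member followed by a 64-bit member leaves an implicit hole on ABIs that align 64-bit integers to 8 bytes, and how an explicit reserved field gives the same layout on every ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* On ABIs that align uint64_t to 8 bytes, the compiler inserts 4 bytes
 * of padding after 'flags'; on ABIs that align it to 4, it does not.
 * The layout therefore differs between the two. */
struct demo_with_hole {
	uint32_t flags;
	uint64_t data;
};

/* The explicit reserved field fills the hole, so the layout no longer
 * depends on the alignment rules of the ABI. */
struct demo_no_hole {
	uint32_t flags;
	uint32_t reserved;
	uint64_t data;
};

_Static_assert(offsetof(struct demo_no_hole, data) == 8,
	       "data starts right after the reserved field");
_Static_assert(sizeof(struct demo_no_hole) == 16,
	       "no implicit padding anywhere in the struct");
```

A fixed layout matters for structures shared between a 32-bit userspace and a 64-bit kernel, since both sides must agree on every offset.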
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  2. 16 Nov, 2013 2 commits
  3. 15 Nov, 2013 6 commits
  4. 11 Nov, 2013 2 commits
    • RDMA/cma: Remove unused argument and minor dead code · 352b9056
      Michal Nazarewicz authored
      The dev variable is never assigned after being initialised.
      Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • RDMA/ucma: Discard events for IDs not yet claimed by user space · c6b21824
      Sean Hefty authored
      Problem reported by Avneesh Pant <avneesh.pant@oracle.com>:
      
          It looks like we are triggering a bug in RDMA CM/UCM interaction.
          The bug specifically hits when we have an incoming connection
          request and the connecting process dies BEFORE the passive end of
          the connection can process the request i.e. it does not call
          rdma_get_cm_event() to retrieve the initial connection event.  We
          were able to triage this further and have some additional
          information now.
      
          In the example below when P1 dies after issuing a connect request
          as the CM id is being destroyed all outstanding connects (to P2)
          are sent a reject message. We see this reject message being
          received on the passive end and the appropriate CM ID created for
          the initial connection message being retrieved in cm_match_req().
          The problem is in the ucma_event_handler() code when this reject
          message is delivered to it and the initial connect message itself
          HAS NOT been delivered to the client. In fact the client has not
          even called rdma_get_cm_event() at this stage so we haven't
          allocated a new ctx in ucma_get_event() and updated the new
          connection CM_ID to point to the new UCMA context.
      
          This results in the reject message not being dropped in
          ucma_event_handler() for the new connection request as the
          (if (!ctx->uid)) block is skipped since the ctx it refers to is
          the listen CM id context which does have a valid UID associated
          with it (I believe the new CMID for the connection initially
          uses the listen CMID -> context when it is created in
          cma_new_conn_id). Thus the assumption that new events for a
          connection can get dropped in ucma_event_handler() is incorrect
          IF the initial connect request has not been retrieved in the
          first case. We end up getting a CM Reject event on the listen CM
          ID and our upper layer code asserts (in fact this event does not
          even have the listen_id set as that only gets set up in librdmacm
          for connect requests).
      
      The solution is to verify that the cm_id being reported in the event
      is the same as the cm_id referenced by the ucma context.  A mismatch
      indicates that the ucma context corresponds to the listen.  This fix
      was validated by using a modified version of librdmacm that was able
      to verify the problem and see that the reject message was indeed
      dropped after this patch was applied.
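
The check described in this paragraph can be modelled in isolation. The sketch below uses hypothetical `demo_` types rather than the kernel's structures, and it assumes that the initial connect request is handled on a separate path (so it is never discarded by this test); only the comparison between the event's cm_id and the context's cm_id is taken from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CM identifier and the ucma context. */
struct demo_cm_id {
	int unused;
};

struct demo_ucma_context {
	uint64_t uid;             /* 0 until user space has claimed the ID */
	struct demo_cm_id *cm_id; /* the ID this context was created for */
};

/* Decide whether an event should be dropped instead of queued.
 * A new connection's ID initially points at the listener's context,
 * so a mismatch means user space has not claimed that ID yet. */
static bool demo_should_discard_event(const struct demo_ucma_context *ctx,
				      const struct demo_cm_id *event_cm_id,
				      bool is_connect_request)
{
	if (is_connect_request)
		return false; /* assumed: delivered through its own path */
	return ctx->uid == 0 || ctx->cm_id != event_cm_id;
}
```

In the scenario from the report, the reject arrives on the new connection's ID while the context still belongs to the listener, so the pointers differ and the event is discarded.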
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  5. 09 Nov, 2013 2 commits
  6. 08 Nov, 2013 20 commits