1. 14 Mar, 2018 6 commits
    • IB/mlx5: Expose more priorities for bypass namespace · 72f7cc09
      Mark Bloch authored
      BYPASS namespace is used by the RDMA side to insert flow rules into
      the vport RX flow tables. Currently only 8 priorities are exposed;
      increase this to 16 to allow more flexibility. This change also
      makes the BYPASS namespace use 32 levels of flow tables (as opposed to
      16 today): 16 levels for regular rules and 16 for don't-trap rules.
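      The level arithmetic above can be checked with a few lines of C. This
      is a sketch, not mlx5 code: the constant and helper names are
      hypothetical, and it assumes each BYPASS priority owns exactly one
      level for regular rules and one for don't-trap rules, which is what
      the 8-to-16 and 16-to-32 numbers imply.

```c
/* Hypothetical model: one regular level plus one don't-trap level per
 * BYPASS priority (inferred from the commit text, not mlx5 sources). */
enum { LEVELS_PER_BYPASS_PRIO = 2 };

/* Number of flow-table levels needed for a given number of priorities. */
static unsigned int bypass_levels(unsigned int prios)
{
	return prios * LEVELS_PER_BYPASS_PRIO;
}
```

      Under that assumption, bypass_levels(8) gives the old 16 levels and
      bypass_levels(16) gives the new 32.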
      Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/srp: Fix IPv6 address parsing · c62adb7d
      Bart Van Assche authored
      Split IPv6 addresses at the colon that separates the IPv6 address
      and the port number instead of at a colon in the middle of the IPv6
      address. Check whether the IPv6 address is surrounded by square
      brackets.
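      The splitting rule can be illustrated with a small user-space C
      sketch. It is not the ib_srp.c implementation; the function name and
      the "[addr]:port" input format are assumptions for illustration. The
      separator is taken to be the colon right after the closing bracket,
      so colons inside the IPv6 address are never mistaken for it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "[addr]:port" into its address and port.
 * Returns false if the brackets or the port are missing, or if the
 * address does not fit in addr_len bytes (including the NUL).
 */
static bool split_ipv6_addr_port(const char *s, char *addr, size_t addr_len,
				 const char **port)
{
	const char *close;
	size_t n;

	if (s[0] != '[')
		return false;		/* address must start with '[' */
	close = strchr(s, ']');
	if (!close || close[1] != ':' || close[2] == '\0')
		return false;		/* need "]:" followed by a port */
	n = (size_t)(close - (s + 1));	/* length between the brackets */
	if (n == 0 || n >= addr_len)
		return false;
	memcpy(addr, s + 1, n);
	addr[n] = '\0';
	*port = close + 2;		/* port starts after "]:" */
	return true;
}
```

      With this sketch, "[fe80::1]:4420" yields the address "fe80::1" and
      the port "4420", while "fe80::1:4420" is rejected because it has no
      brackets.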
      
      Fixes: 19f31343 ("IB/srp: Add RDMA/CM support")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • RDMA/verbs: Simplify modify QP check · 19b1f540
      Leon Romanovsky authored
      All callers of ib_modify_qp_is_ok() provide an enum ib_qp_state,
      which makes the out-of-range checks redundant. Remove them and
      update the function signature to return a boolean result.
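      A simplified sketch of why the range checks become redundant: when
      the current and next states are typed as an enum, the validity check
      reduces to a table lookup that returns bool. The enum below is a
      hypothetical five-state subset of enum ib_qp_state and the table only
      loosely follows the IB QP state machine; the real ib_modify_qp_is_ok()
      also validates the attribute mask.

```c
#include <stdbool.h>

/* Hypothetical subset of the QP states (not the real enum ib_qp_state). */
enum qp_state { QPS_RESET, QPS_INIT, QPS_RTR, QPS_RTS, QPS_ERR, QPS_NUM };

/* allowed[cur][next]: simplified, illustrative transition table. */
static const bool allowed[QPS_NUM][QPS_NUM] = {
	[QPS_RESET] = { [QPS_RESET] = true, [QPS_INIT] = true },
	[QPS_INIT]  = { [QPS_RESET] = true, [QPS_INIT] = true,
			[QPS_RTR] = true, [QPS_ERR] = true },
	[QPS_RTR]   = { [QPS_RESET] = true, [QPS_RTS] = true, [QPS_ERR] = true },
	[QPS_RTS]   = { [QPS_RESET] = true, [QPS_RTS] = true, [QPS_ERR] = true },
	[QPS_ERR]   = { [QPS_RESET] = true, [QPS_ERR] = true },
};

/* Callers pass enum values, so no out-of-range check is done here. */
static bool modify_qp_is_ok(enum qp_state cur, enum qp_state next)
{
	return allowed[cur][next];
}
```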
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • RDMA/pvrdma: Properly annotate QP states · fbf1795c
      Leon Romanovsky authored
      QP states provided by the core layer are converted to enum ib_qp_state,
      so it is better to use an internal variable of that type instead of int.
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • RDMA/uverbs: Ensure validity of current QP state value · 88de869b
      Leon Romanovsky authored
      The QP state is an internal enum that is checked at the driver
      level by calling ib_modify_qp_is_ok(). Move this check closer
      to the user and leave kernel users to be checked by the compiler.
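      The boundary check can be sketched as follows: a value copied from
      user space is a plain integer, so it is range-checked once before
      being converted to the enum. The enum and the helper name are
      hypothetical stand-ins, not the uverbs code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for enum ib_qp_state. */
enum qp_state { QPS_RESET, QPS_INIT, QPS_RTR, QPS_RTS, QPS_ERR, QPS_NUM };

/*
 * Validate a raw value received from user space. Returns false (the
 * caller would fail the request) if it does not name a known state.
 * Kernel callers pass an enum value directly and skip this step.
 */
static bool qp_state_from_user(uint32_t user_val, enum qp_state *out)
{
	if (user_val >= QPS_NUM)
		return false;
	*out = (enum qp_state)user_val;
	return true;
}
```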
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • RDMA/mlx5: Fix NULL dereference while accessing XRC_TGT QPs · 75a45982
      Leon Romanovsky authored
      mlx5 modify_qp() relies on the FW to throw an error if a wrong
      state is supplied. The missing check in FW causes the following crash
      when using XRC_TGT QPs.
      
      [   14.769632] BUG: unable to handle kernel NULL pointer dereference at (null)
      [   14.771085] IP: mlx5_ib_modify_qp+0xf60/0x13f0
      [   14.771894] PGD 800000001472e067 P4D 800000001472e067 PUD 14529067 PMD 0
      [   14.773126] Oops: 0002 [#1] SMP PTI
      [   14.773763] CPU: 0 PID: 365 Comm: ubsan Not tainted 4.16.0-rc1-00038-g8151138c0793 #119
      [   14.775192] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [   14.777522] RIP: 0010:mlx5_ib_modify_qp+0xf60/0x13f0
      [   14.778417] RSP: 0018:ffffbf48001c7bd8 EFLAGS: 00010246
      [   14.779346] RAX: 0000000000000000 RBX: ffff9a8f9447d400 RCX: 0000000000000000
      [   14.780643] RDX: 0000000000000000 RSI: 000000000000000a RDI: 0000000000000000
      [   14.781930] RBP: 0000000000000000 R08: 00000000000217b0 R09: ffffffffbc9c1504
      [   14.783214] R10: fffff4a180519480 R11: ffff9a8f94523600 R12: ffff9a8f9493e240
      [   14.784507] R13: ffff9a8f9447d738 R14: 000000000000050a R15: 0000000000000000
      [   14.785800] FS:  00007f545b466700(0000) GS:ffff9a8f9fc00000(0000) knlGS:0000000000000000
      [   14.787073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   14.787792] CR2: 0000000000000000 CR3: 00000000144be000 CR4: 00000000000006b0
      [   14.788689] Call Trace:
      [   14.789007]  _ib_modify_qp+0x71/0x120
      [   14.789475]  modify_qp.isra.20+0x207/0x2f0
      [   14.790010]  ib_uverbs_modify_qp+0x90/0xe0
      [   14.790532]  ib_uverbs_write+0x1d2/0x3c0
      [   14.791049]  ? __handle_mm_fault+0x93c/0xe40
      [   14.791644]  __vfs_write+0x36/0x180
      [   14.792096]  ? handle_mm_fault+0xc1/0x210
      [   14.792601]  vfs_write+0xad/0x1e0
      [   14.793018]  SyS_write+0x52/0xc0
      [   14.793422]  do_syscall_64+0x75/0x180
      [   14.793888]  entry_SYSCALL_64_after_hwframe+0x21/0x86
      [   14.794527] RIP: 0033:0x7f545ad76099
      [   14.794975] RSP: 002b:00007ffd78787468 EFLAGS: 00000287 ORIG_RAX: 0000000000000001
      [   14.795958] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f545ad76099
      [   14.797075] RDX: 0000000000000078 RSI: 0000000020009000 RDI: 0000000000000003
      [   14.798140] RBP: 00007ffd78787470 R08: 00007ffd78787480 R09: 00007ffd78787480
      [   14.799207] R10: 00007ffd78787480 R11: 0000000000000287 R12: 00005599ada98760
      [   14.800277] R13: 00007ffd78787560 R14: 0000000000000000 R15: 0000000000000000
      [   14.801341] Code: 4c 8b 1c 24 48 8b 83 70 02 00 00 48 c7 83 cc 02 00
      00 00 00 00 00 48 c7 83 24 03 00 00 00 00 00 00 c7 83 2c 03 00 00 00 00
      00 00 <c7> 00 00 00 00 00 48 8b 83 70 02 00 00 c7 40 04 00 00 00 00 4c
      [   14.804012] RIP: mlx5_ib_modify_qp+0xf60/0x13f0 RSP: ffffbf48001c7bd8
      [   14.804838] CR2: 0000000000000000
      [   14.805288] ---[ end trace 3f1da0df5c8b7c37 ]---
      
      Cc: syzkaller <syzkaller@googlegroups.com>
      Reported-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  2. 13 Mar, 2018 8 commits
  3. 08 Mar, 2018 10 commits
  4. 07 Mar, 2018 16 commits
    • net/mlx5: Fix wrongly assigned CQ reference counter · 31135eb3
      Leon Romanovsky authored
      A kernel compiled with CONFIG_REFCOUNT_FULL produces the following
      error. The reason is that the initial value of a refcount_t is supposed
      to be greater than 0; change it accordingly.
      
      [    3.106634] ------------[ cut here ]------------
      [    3.107756] refcount_t: increment on 0; use-after-free.
      [    3.109130] WARNING: CPU: 0 PID: 1 at lib/refcount.c:153 refcount_inc+0x27/0x30
      [    3.110085] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1-00028-gf683e04bdccc #137
      [    3.110085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [    3.110085] RIP: 0010:refcount_inc+0x27/0x30
      [    3.110085] RSP: 0000:ffffaa620000fba0 EFLAGS: 00010286
      [    3.110085] RAX: 0000000000000000 RBX: ffff9a6d1a1821c8 RCX: ffffffff98a50f48
      [    3.110085] RDX: 0000000000000001 RSI: 0000000000000086 RDI: 0000000000000246
      [    3.110085] RBP: ffff9a6d1ac800a0 R08: 0000000000000289 R09: 000000000000000a
      [    3.110085] R10: fffff03bc0682840 R11: ffffffff9949856d R12: ffff9a6d1b4a4000
      [    3.110085] R13: 0000000000000000 R14: ffff9a6d1a0a6c00 R15: ffffaa620000fc5c
      [    3.110085] FS:  0000000000000000(0000) GS:ffff9a6d1fc00000(0000) knlGS:0000000000000000
      [    3.110085] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    3.110085] CR2: 0000000000000000 CR3: 000000000ba0a000 CR4: 00000000000006b0
      [    3.110085] Call Trace:
      [    3.110085]  mlx5_core_create_cq+0xde/0x250
      [    3.110085]  ? __kmalloc+0x1ce/0x1e0
      [    3.110085]  mlx5e_create_cq+0x15c/0x1e0
      [    3.110085]  mlx5e_open_drop_rq+0xea/0x190
      [    3.110085]  mlx5e_attach_netdev+0x53/0x140
      [    3.110085]  mlx5e_attach+0x3d/0x60
      [    3.110085]  mlx5e_add+0x11d/0x2f0
      [    3.110085]  mlx5_add_device+0x77/0x170
      [    3.110085]  mlx5_register_interface+0x74/0xc0
      [    3.110085]  ? set_debug_rodata+0x11/0x11
      [    3.110085]  init+0x67/0x72
      [    3.110085]  ? mlx4_en_init_ptys2ethtool_map+0x346/0x346
      [    3.110085]  do_one_initcall+0x98/0x147
      [    3.110085]  ? set_debug_rodata+0x11/0x11
      [    3.110085]  kernel_init_freeable+0x164/0x1e0
      [    3.110085]  ? rest_init+0xb0/0xb0
      [    3.110085]  kernel_init+0xa/0x100
      [    3.110085]  ret_from_fork+0x35/0x40
      [    3.110085] Code: 00 00 00 00 e8 ab ff ff ff 84 c0 74 02 f3 c3 80 3d 3b c3 64 01 00 75 f5 48 c7 c7 68 0b 81 98 c6 05 2b c3 64 01 01 e8 79 d7 a3 ff <0f> ff c3 66 0f 1f 44 00 00 8b 06 83 f8 ff 74 39 31 c9 39 f8 89
      [    3.110085] ---[ end trace a0068e1c68438a74 ]---
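      The rule behind this warning can be shown with a toy, single-threaded
      model of refcount_t. It is an illustrative assumption, not the kernel
      implementation (which is atomic and saturates instead of wrapping):
      a count of 0 means the object is already on its way to being freed,
      so refcount_inc() on 0 is reported as a use-after-free, and an object
      whose count starts at 0 triggers the warning on its first get.

```c
#include <stdbool.h>

/* Toy model of refcount_t; all names are hypothetical. */
struct toy_refcount {
	unsigned int refs;
};

static void toy_refcount_set(struct toy_refcount *r, unsigned int n)
{
	r->refs = n;
}

/* Like refcount_inc(): incrementing from 0 is treated as a
 * use-after-free. Returns false where the kernel would WARN. */
static bool toy_refcount_inc(struct toy_refcount *r)
{
	if (r->refs == 0)
		return false;
	r->refs++;
	return true;
}

/* Returns true when the last reference is dropped (object may be freed).
 * Decrementing an already-zero count is refused as an underflow. */
static bool toy_refcount_dec_and_test(struct toy_refcount *r)
{
	if (r->refs == 0)
		return false;
	return --r->refs == 0;
}
```

      Starting the count at 1 (the creator's own reference) lets later gets
      succeed, and the final put brings it back to 0.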
      
      Fixes: f105b45b ("net/mlx5: CQ hold/put API")
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: IPSec, Add support for ESN · cb010083
      Aviad Yehezkel authored
      Currently ESN is not supported with IPSec device offload.
      
      This patch adds ESN support to IPsec device offload. It implements a
      new xfrm device operation to synchronize the offloading device's ESN
      with the SN received by xfrm, and a new QP command to update the SA
      state at the following points:
      
                 ESN 1                    ESN 2                  ESN 3
      |-----------*-----------|-----------*-----------|-----------*
      ^           ^           ^           ^           ^           ^
      
      ^ - marks where the QP command is invoked to update the SA ESN state
          machine.
      | - marks the start of the ESN scope (0 to 2^32-1). At this point, set
          the SA ESN overlap bit to zero and increment the ESN.
      * - marks the middle of the ESN scope (2^31). At this point, set the
          SA ESN overlap bit to one.
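      The state machine in the diagram can be modelled in a few lines of C.
      This is a software sketch under stated assumptions, not the driver or
      firmware code: the struct and function names are hypothetical, and it
      assumes sequence numbers are observed roughly in order, so that a low
      SN seen after the midpoint really means the 32-bit space wrapped.

```c
#include <stdbool.h>
#include <stdint.h>

#define ESN_SCOPE_MIDDLE 0x80000000u	/* 2^31 */

/* Hypothetical software model of the SA ESN state. */
struct esn_sa_state {
	uint32_t esn;		/* upper 32 bits of the 64-bit sequence number */
	bool overlap;		/* set once the low 32 bits reach 2^31 */
};

/*
 * Feed the latest received low 32-bit SN. Returns true when the state
 * changes, i.e. where the driver would issue the QP command ('^').
 */
static bool esn_sa_update(struct esn_sa_state *s, uint32_t sn)
{
	if (!s->overlap && sn >= ESN_SCOPE_MIDDLE) {
		s->overlap = true;	/* '*': middle of the scope reached */
		return true;
	}
	if (s->overlap && sn < ESN_SCOPE_MIDDLE) {
		s->overlap = false;	/* '|': wrapped into the next scope */
		s->esn++;
		return true;
	}
	return false;			/* no update needed */
}
```

      Feeding 100, 0x80000000, 0x90000000 and then 5 triggers updates only
      on the second value (overlap set) and the fourth (overlap cleared,
      ESN incremented to 1).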
      Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
      Signed-off-by: Yossef Efraim <yossefe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Added common function for to_ipsec_sa_entry · 75ef3f55
      Aviad Yehezkel authored
      Add a new function for getting the driver-internal SA entry from an
      xfrm state, so that all checks are done in one function.
      Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Add flow-steering commands for FPGA IPSec implementation · 05564d0a
      Aviad Yehezkel authored
      In order to add a context to the FPGA, we need to get both the software
      transform context (which includes the keys, etc) and the
      source/destination IPs (which are included in the steering
      rule). Therefore, we register a new set of firmware-like commands for
      the FPGA. Each time a rule is added, the steering core infrastructure
      calls the FPGA command layer. If the rule is intended for the FPGA,
      it combines the IP information with the software transformation
      context and creates the respective hardware transform.
      Afterwards, it calls the standard steering command layer.
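      The layering described above (FPGA command layer first, standard
      steering layer afterwards) can be sketched with an ops table of
      function pointers. All types and names here are hypothetical and far
      smaller than the real mlx5 flow-steering command interface; only the
      delegation pattern is the point.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified types (not the mlx5 definitions). */
struct hw_xfrm {
	unsigned int src_ip, dst_ip;	/* from the steering rule's match */
	int sw_ctx;			/* software transform: keys, etc. */
	bool valid;
};

struct rule {
	unsigned int src_ip, dst_ip;
	bool for_fpga;			/* rule carries an IPsec action */
	int sw_xfrm_ctx;
	struct hw_xfrm hw;		/* filled in by the FPGA layer */
};

struct steering_cmds {
	int (*create_fte)(struct rule *r);
};

static int std_calls;			/* counts calls into the standard layer */

/* Standard layer: would program the NIC steering tables. */
static int std_create_fte(struct rule *r)
{
	(void)r;
	std_calls++;
	return 0;
}

static const struct steering_cmds std_cmds = { .create_fte = std_create_fte };

/* FPGA layer: build the hardware transform if needed, then delegate. */
static int fpga_create_fte(struct rule *r)
{
	if (r->for_fpga) {
		r->hw.src_ip = r->src_ip;
		r->hw.dst_ip = r->dst_ip;
		r->hw.sw_ctx = r->sw_xfrm_ctx;
		r->hw.valid = true;
	}
	return std_cmds.create_fte(r);
}

static const struct steering_cmds fpga_cmds = { .create_fte = fpga_create_fte };
```

      Because both layers share one ops type, the steering core can call
      fpga_cmds without knowing whether a rule needs FPGA work.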
      Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Refactor accel IPSec code · d6c4f029
      Aviad Yehezkel authored
      The current code has one layer that executes FPGA commands, and
      the Ethernet part uses this code directly. Since downstream patches
      introduce support for IPSec in mlx5_ib, we need to provide some
      abstractions. This patch refactors the accel code into one layer
      that creates a software IPSec transformation and another one which
      creates the actual hardware context.
      The internal command implementation is now hidden in the FPGA
      core layer. The code also adds the ability to share FPGA hardware
      contexts. If two contexts are the same, only a reference count
      is taken.
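      Context sharing can be sketched as a lookup that takes a reference on
      an identical existing context instead of creating a new one. The key
      fields, the fixed-size table and the helper names are assumptions for
      illustration, not the mlx5 FPGA code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical key describing a hardware SA context. */
struct hw_sa_key {
	unsigned int spi, src_ip, dst_ip;
};

struct hw_sa_ctx {
	struct hw_sa_key key;
	unsigned int refs;		/* 0 means the slot is free */
};

#define MAX_HW_CTX 8
static struct hw_sa_ctx hw_sa_table[MAX_HW_CTX];

static bool key_eq(const struct hw_sa_key *a, const struct hw_sa_key *b)
{
	return a->spi == b->spi && a->src_ip == b->src_ip &&
	       a->dst_ip == b->dst_ip;
}

/* Return an identical existing context with one more reference, or
 * create a new one; NULL if the table is full. */
static struct hw_sa_ctx *hw_sa_get(const struct hw_sa_key *key)
{
	struct hw_sa_ctx *free_slot = NULL;

	for (size_t i = 0; i < MAX_HW_CTX; i++) {
		if (hw_sa_table[i].refs && key_eq(&hw_sa_table[i].key, key)) {
			hw_sa_table[i].refs++;	/* share: only take a reference */
			return &hw_sa_table[i];
		}
		if (!hw_sa_table[i].refs && !free_slot)
			free_slot = &hw_sa_table[i];
	}
	if (free_slot) {
		free_slot->key = *key;
		free_slot->refs = 1;	/* would create the HW context here */
	}
	return free_slot;
}

/* Drop one reference; at zero the HW context would be destroyed and the
 * slot becomes free again. */
static void hw_sa_put(struct hw_sa_ctx *ctx)
{
	if (ctx->refs)
		ctx->refs--;
}
```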
      Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Added required metadata capability for ipsec · af9fe19d
      Aviad Yehezkel authored
      Currently our device requires additional metadata in packets
      to perform IPsec crypto offload.
      Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Export ipsec capabilities · 1d2005e2
      Aviad Yehezkel authored
      These capabilities will be needed for IPsec verbs.
      Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: IPSec, Add command V2 support · 65802f48
      Aviad Yehezkel authored
      This patch adds V2 command support.
      New FPGA devices support extended features (UDP encapsulation, ESN,
      etc.). These features require a new hardware SADB format, so there is
      a new version of the commands to manipulate it.
      Signed-off-by: Yossef Efraim <yossefe@mellanox.com>
      Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: IPSec, Add support for ESP trailer removal by hardware · 788a8210
      Yossi Kuperman authored
      Current hardware decrypts and authenticates incoming ESP packets.
      Subsequently, the software extracts the nexthdr field, truncates the
      trailer and adjusts csum accordingly.
      
      With this patch and a capable device, the trailer is removed
      by the hardware and the nexthdr field is conveyed via PET. This way
      we avoid both accessing the trailer (a cache miss) and computing
      its relative checksum, which significantly improves
      performance.

      Experiments (netperf) show that trailer removal improves performance
      by 2Gbps in both forwarding and host-to-host configurations.
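      For reference, the software work that the hardware now takes over
      looks roughly like this. The sketch follows the ESP trailer layout
      from RFC 4303 (payload, padding, Pad Length, Next Header, ICV); the
      function name and parameters are hypothetical, and the checksum
      adjustment is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: 'buf' holds the decrypted data after the ESP
 * header and IV, 'len' its length, 'icv_len' the ICV length. On success,
 * stores the Next Header value and the length of the inner packet.
 */
static bool esp_parse_trailer(const uint8_t *buf, size_t len, size_t icv_len,
			      uint8_t *nexthdr, size_t *inner_len)
{
	size_t trailer_end, pad_len;

	if (len < icv_len + 2)
		return false;			/* no room for the trailer */
	trailer_end = len - icv_len;		/* offset of the ICV */
	pad_len = buf[trailer_end - 2];		/* Pad Length byte */
	*nexthdr = buf[trailer_end - 1];	/* Next Header byte */
	if (pad_len + 2 > trailer_end)
		return false;			/* padding longer than payload */
	*inner_len = trailer_end - 2 - pad_len;
	return true;
}
```

      Reading these last bytes is what forces the trailer access (and the
      cache miss) mentioned above when the hardware does not strip it.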
      Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
      Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: IPSec, Generalize sandbox QP commands · 581fddde
      Yossi Kuperman authored
      The current code assumes only SA QP commands.
      Refactor in order to pave the way for new QP commands:
      1. Generic cmd response format.
      2. SA cmd checks are in dedicated functions.
      3. Aligned debug prints.
      Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
      Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Use MLX5_IPSEC_DEV macro for ipsec caps · d83a69c2
      Saeed Mahameed authored
      Fix a build break caused by mlx5_accel_ipsec_device_caps not being
      defined when MLX5_ACCEL is not selected. Use MLX5_IPSEC_DEV instead,
      which handles this case.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Reported-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx4: Move mlx4_uverbs_ex_query_device_resp to include/uapi/ · d50a8a96
      Yishai Hadas authored
      This struct is involved in the user API for mlx4 and should not be hidden
      inside a driver header file.
      
      Fixes: 09d208b2 ("IB/mlx4: Add report for RSS capabilities by vendor channel")
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • Merge tag 'mlx5-updates-2018-02-28-1' of... · 1abb791f
      Doug Ledford authored
      Merge tag 'mlx5-updates-2018-02-28-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into k.o/wip/dl-for-next
      
      mlx5-updates-2018-02-28-1 (IPSec-1)
      
      This series consists of some fixes and refactors for the mlx5 drivers,
      especially around the FPGA and flow steering. Most of them are trivial
      fixes and are the foundation of allowing IPSec acceleration from user-space.
      
      We use flow steering abstraction in order to accelerate IPSec packets.
      When a user creates a steering rule, [s]he states that we'll carry out an
      encrypt/decrypt flow action (using a specific configuration) for every
      packet which conforms to a certain match. Since currently offloading these
      packets is done via FPGA, we'll add another set of flow steering ops.
      These ops will execute the required FPGA commands and then call the
      standard steering ops.
      
      In order to achieve this, the commands need to receive all the
      required information. Therefore, we pass the fte object and embed the
      flow_action struct inside the fte. In addition, we add the shim layer
      that will later be used for alternating between the standard and the
      FPGA steering commands.
      
      Some fixes, like "net/mlx5e: Wait for FPGA command responses with a timeout",
      are very relevant for user-space applications, as these applications could
      be killed, but we still want to wait for the FPGA and update the kernel's
      database.
      
      Regards,
      Aviad and Matan
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/rxe: change the function rxe_init_device_param type · befd8d98
      Zhu Yanjun authored
      The function rxe_init_device_param always returns 0, so its return
      type is changed to void.
      
      CC: Srinivas Eeda <srinivas.eeda@oracle.com>
      CC: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/rxe: remove unnecessary rxe in rxe_send · 31f1bd14
      Zhu Yanjun authored
      In the function rxe_send, the variable rxe is not used, so remove
      it.
      
      CC: Srinivas Eeda <srinivas.eeda@oracle.com>
      CC: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/rxe: remove unnecessary skb_clone · 86af6176
      Zhu Yanjun authored
      In the send_atomic_ack function, it is not necessary to make an
      skb_clone. To gain better performance (higher throughput and
      lower latency), this skb_clone is removed.

      The following tests were run.
      
       server                       client
      ---------                    ---------
      |1.1.1.1|<----rxe-channel--->|1.1.1.2|
      ---------                    ---------
      
      On server: rping -s -a 1.1.1.1 -v -C 1000 -S 512
      On client: rping -c -a 1.1.1.1 -v -C 1000 -S 512
      
      The kernel config CONFIG_DEBUG_KMEMLEAK is enabled on both server
      and client.

      The test ran for several hours with no memory leak, and the whole
      system worked well.
      
      Based on the above network, the following tests were run.
      
      Server: ibv_rc_pingpong -d rxe0 -g 1
      Client: ibv_rc_pingpong -d rxe0 -g 1 1.1.1.1
      
      Test results on the server (10 tests were run):
      Before:
      Throughput is 137.07 Mbit/sec
      Latency is 517.76 usec/iter
      
      After:
      Throughput is 148.85 Mbit/sec
      Latency is 476.64 usec/iter
      
      Throughput increases by about 8.6% and latency decreases by about 7.9%.
      
      CC: Srinivas Eeda <srinivas.eeda@oracle.com>
      CC: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>