1. 12 Jun, 2019 14 commits
  2. 11 Jun, 2019 5 commits
    • David S. Miller's avatar
      Merge branch 'vxlan-geneve-linear' · 93c65f83
      David S. Miller authored
      Stefano Brivio says:
      
      ====================
      Don't assume linear buffers in error handlers for VXLAN and GENEVE
      
      Guillaume noticed the same issue fixed by commit 26fc181e ("fou, fou6:
      do not assume linear skbs") for fou and fou6 is also present in VXLAN and
      GENEVE error handlers: we can't assume linear buffers there, we need to
      use pskb_may_pull() instead.
      ====================
      Acked-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      93c65f83
    • Stefano Brivio's avatar
      geneve: Don't assume linear buffers in error handler · eccc73a6
      Stefano Brivio authored
      In commit a0796644 ("geneve: ICMP error lookup handler") I wrongly
      assumed buffers from icmp_socket_deliver() would be linear. This is not
      the case: icmp_socket_deliver() only guarantees we have 8 bytes of linear
      data.
      
      Eric fixed this same issue for fou and fou6 in commits 26fc181e
      ("fou, fou6: do not assume linear skbs") and 5355ed63 ("fou, fou6:
      avoid uninit-value in gue_err() and gue6_err()").
      
      Use pskb_may_pull() instead of checking skb->len, and take into account
      the fact we later access the GENEVE header with udp_hdr(), so we also
      need to sum skb_transport_header() here.
      Reported-by: default avatarGuillaume Nault <gnault@redhat.com>
      Fixes: a0796644 ("geneve: ICMP error lookup handler")
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eccc73a6
    • Stefano Brivio's avatar
      vxlan: Don't assume linear buffers in error handler · 8399a693
      Stefano Brivio authored
      In commit c3a43b9f ("vxlan: ICMP error lookup handler") I wrongly
      assumed buffers from icmp_socket_deliver() would be linear. This is not
      the case: icmp_socket_deliver() only guarantees we have 8 bytes of linear
      data.
      
      Eric fixed this same issue for fou and fou6 in commits 26fc181e
      ("fou, fou6: do not assume linear skbs") and 5355ed63 ("fou, fou6:
      avoid uninit-value in gue_err() and gue6_err()").
      
      Use pskb_may_pull() instead of checking skb->len, and take into account
      the fact we later access the VXLAN header with udp_hdr(), so we also
      need to sum skb_transport_header() here.
      Reported-by: default avatarGuillaume Nault <gnault@redhat.com>
      Fixes: c3a43b9f ("vxlan: ICMP error lookup handler")
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8399a693
    • Taehee Yoo's avatar
      net: openvswitch: do not free vport if register_netdevice() is failed. · 309b6697
      Taehee Yoo authored
      In order to create an internal vport, internal_dev_create() is used and
      that calls register_netdevice() internally.
      If register_netdevice() fails, it calls dev->priv_destructor() to free
      private data of netdev. actually, a private data of this is a vport.
      
      Hence internal_dev_create() should not free and use a vport after failure
      of register_netdevice().
      
      Test command
          ovs-dpctl add-dp bonding_masters
      
      Splat looks like:
      [ 1035.667767] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [ 1035.675958] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [ 1035.676916] CPU: 1 PID: 1028 Comm: ovs-vswitchd Tainted: G    B             5.2.0-rc3+ #240
      [ 1035.676916] RIP: 0010:internal_dev_create+0x2e5/0x4e0 [openvswitch]
      [ 1035.676916] Code: 48 c1 ea 03 80 3c 02 00 0f 85 9f 01 00 00 4c 8b 23 48 b8 00 00 00 00 00 fc ff df 49 8d bc 24 60 05 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 86 01 00 00 49 8b bc 24 60 05 00 00 e8 e4 68 f4
      [ 1035.713720] RSP: 0018:ffff88810dcb7578 EFLAGS: 00010206
      [ 1035.713720] RAX: dffffc0000000000 RBX: ffff88810d13fe08 RCX: ffffffff84297704
      [ 1035.713720] RDX: 00000000000000ac RSI: 0000000000000000 RDI: 0000000000000560
      [ 1035.713720] RBP: 00000000ffffffef R08: fffffbfff0d3b881 R09: fffffbfff0d3b881
      [ 1035.713720] R10: 0000000000000001 R11: fffffbfff0d3b880 R12: 0000000000000000
      [ 1035.768776] R13: 0000607ee460b900 R14: ffff88810dcb7690 R15: ffff88810dcb7698
      [ 1035.777709] FS:  00007f02095fc980(0000) GS:ffff88811b400000(0000) knlGS:0000000000000000
      [ 1035.777709] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1035.777709] CR2: 00007ffdf01d2f28 CR3: 0000000108258000 CR4: 00000000001006e0
      [ 1035.777709] Call Trace:
      [ 1035.777709]  ovs_vport_add+0x267/0x4f0 [openvswitch]
      [ 1035.777709]  new_vport+0x15/0x1e0 [openvswitch]
      [ 1035.777709]  ovs_vport_cmd_new+0x567/0xd10 [openvswitch]
      [ 1035.777709]  ? ovs_dp_cmd_dump+0x490/0x490 [openvswitch]
      [ 1035.777709]  ? __kmalloc+0x131/0x2e0
      [ 1035.777709]  ? genl_family_rcv_msg+0xa54/0x1030
      [ 1035.777709]  genl_family_rcv_msg+0x63a/0x1030
      [ 1035.777709]  ? genl_unregister_family+0x630/0x630
      [ 1035.841681]  ? debug_show_all_locks+0x2d0/0x2d0
      [ ... ]
      
      Fixes: cf124db5 ("net: Fix inconsistent teardown and release of private netdev state.")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Reviewed-by: default avatarGreg Rose <gvrose8192@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      309b6697
    • Willem de Bruijn's avatar
      net: correct udp zerocopy refcnt also when zerocopy only on append · 522924b5
      Willem de Bruijn authored
      The below patch fixes an incorrect zerocopy refcnt increment when
      appending with MSG_MORE to an existing zerocopy udp skb.
      
        send(.., MSG_ZEROCOPY | MSG_MORE);	// refcnt 1
        send(.., MSG_ZEROCOPY | MSG_MORE);	// refcnt still 1 (bar frags)
      
      But it missed that zerocopy need not be passed at the first send. The
      right test whether the uarg is newly allocated and thus has extra
      refcnt 1 is not !skb, but !skb_zcopy.
      
        send(.., MSG_MORE);			// <no uarg>
        send(.., MSG_ZEROCOPY);		// refcnt 1
      
      Fixes: 100f6d8e ("net: correct zerocopy refcnt with udp MSG_MORE")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      522924b5
  3. 10 Jun, 2019 9 commits
    • John Hurley's avatar
      nfp: ensure skb network header is set for packet redirect · dce5cccc
      John Hurley authored
      Packets received at the NFP driver may be redirected to egress of another
      netdev (e.g. in the case of OvS internal ports). On the egress path, some
      processes, like TC egress hooks, may expect the network header offset
      field in the skb to be correctly set. If this is not the case there is
      potential for abnormal behaviour and even the triggering of BUG() calls.
      
      Set the skb network header field before the mac header pull when doing a
      packet redirect.
      
      Fixes: 27f54b58 ("nfp: allow fallback packets from non-reprs")
      Signed-off-by: default avatarJohn Hurley <john.hurley@netronome.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dce5cccc
    • Yuchung Cheng's avatar
      tcp: fix undo spurious SYNACK in passive Fast Open · fcc2202a
      Yuchung Cheng authored
      Commit 794200d6 ("tcp: undo cwnd on Fast Open spurious SYNACK
      retransmit") may cause tcp_fastretrans_alert() to warn about pending
      retransmission in Open state. This is triggered when the Fast Open
      server both sends data and has spurious SYNACK retransmission during
      the handshake, and the data packets were lost or reordered.
      
      The root cause is a bit complicated:
      
      (1) Upon receiving SYN-data: a full socket is created with
          snd_una = ISN + 1 by tcp_create_openreq_child()
      
      (2) On SYNACK timeout the server/sender enters CA_Loss state.
      
      (3) Upon receiving the final ACK to complete the handshake, sender
          does not mark FLAG_SND_UNA_ADVANCED since (1)
      
          Sender then calls tcp_process_loss since state is CA_loss by (2)
      
      (4) tcp_process_loss() does not invoke undo operations but instead
          mark REXMIT_LOST to force retransmission
      
      (5) tcp_rcv_synrecv_state_fastopen() calls tcp_try_undo_loss(). It
          changes state to CA_Open but has positive tp->retrans_out
      
      (6) Next ACK triggers the WARN_ON in tcp_fastretrans_alert()
      
      The step that goes wrong is (4) where the undo operation should
      have been invoked because the ACK successfully acknowledged the
      SYN sequence. This fixes that by specifically checking undo
      when the SYN-ACK sequence is acknowledged. Then after
      tcp_process_loss() the state would be further adjusted based
      in tcp_fastretrans_alert() to avoid triggering the warning in (6).
      
      Fixes: 794200d6 ("tcp: undo cwnd on Fast Open spurious SYNACK retransmit")
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fcc2202a
    • Matteo Croce's avatar
      mpls: fix af_mpls dependencies · c1a9d659
      Matteo Croce authored
      MPLS routing code relies on sysctl to work, so let it select PROC_SYSCTL.
      Reported-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Suggested-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarMatteo Croce <mcroce@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c1a9d659
    • David S. Miller's avatar
      Merge branch 'ibmvnic-Fixes-for-device-reset-handling' · 7f0b44a4
      David S. Miller authored
      Thomas Falcon says:
      
      ====================
      ibmvnic: Fixes for device reset handling
      
      This series contains three unrelated fixes to issues seen during
      device resets. The first patch fixes an error when the driver requests
      to deactivate the link of an uninitialized device, resulting in a
      failure to reset. Next, a patch to fix multicast transmission
      failures seen after a driver reset. The final patch fixes mishandling
      of memory allocation failures during device initialization, which
      caused a kernel oops.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7f0b44a4
    • Thomas Falcon's avatar
      ibmvnic: Fix unchecked return codes of memory allocations · 7c940b1a
      Thomas Falcon authored
      The return values for these memory allocations are unchecked,
      which may cause an oops if the driver does not handle them after
      a failure. Fix by checking the function's return code.
      Signed-off-by: default avatarThomas Falcon <tlfalcon@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c940b1a
    • Thomas Falcon's avatar
      ibmvnic: Refresh device multicast list after reset · be32a243
      Thomas Falcon authored
      It was observed that multicast packets were no longer received after
      a device reset.  The fix is to resend the current multicast list to
      the backing device after recovery.
      Signed-off-by: default avatarThomas Falcon <tlfalcon@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      be32a243
    • Thomas Falcon's avatar
      ibmvnic: Do not close unopened driver during reset · 1f94608b
      Thomas Falcon authored
      Check driver state before halting it during a reset. If the driver is
      not running, do nothing. Otherwise, a request to deactivate a down link
      can cause an error and the reset will fail.
      Signed-off-by: default avatarThomas Falcon <tlfalcon@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1f94608b
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2019-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 4172eadb
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2019-06-07
      
      This series introduces some fixes to mlx5 driver.
      
      Please pull and let me know if there is any problem.
      
      For -stable v4.17
        ('net/mlx5: Avoid reloading already removed devices')
      
      For -stable v5.0
        ('net/mlx5e: Avoid detaching non-existing netdev under switchdev mode')
      
      For -stable v5.1
        ('net/mlx5e: Fix source port matching in fdb peer flow rule')
        ('net/mlx5e: Support tagged tunnel over bond')
        ('net/mlx5e: Add ndo_set_feature for uplink representor')
        ('net/mlx5: Update pci error handler entries and command translation')
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4172eadb
    • David S. Miller's avatar
      Merge tag 'linux-can-fixes-for-5.2-20190607' of... · 62f42a11
      David S. Miller authored
      Merge tag 'linux-can-fixes-for-5.2-20190607' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2019-06-07
      
      this is a pull reqeust of 9 patches for net/master.
      
      The first patch is by Alexander Dahl and removes a duplicate menu entry from
      the Kconfig. The next patch by Joakim Zhang fixes the timeout in the flexcan
      driver when setting small bit rates. Anssi Hannula's patch for the xilinx_can
      driver fixes the bittiming_const for CAN FD core. The two patches by Sean
      Nyekjaer bring mcp25625 to the existing mcp251x driver. The patch by Eugen
      Hristev implements an errata for the m_can driver. YueHaibing's patch fixes the
      error handling ing can_init(). The patch by Fabio Estevam for the flexcan
      driver removes an unneeded registration message during flexcan_probe(). And the
      last patch is by Willem de Bruijn and adds the missing purging the  socket
      error queue on sock destruct.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      62f42a11
  4. 09 Jun, 2019 4 commits
  5. 07 Jun, 2019 8 commits
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 38e406f6
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2019-06-07
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix several bugs in riscv64 JIT code emission which forgot to clear high
         32-bits for alu32 ops, from Björn and Luke with selftests covering all
         relevant BPF alu ops from Björn and Jiong.
      
      2) Two fixes for UDP BPF reuseport that avoid calling the program in case of
         __udp6_lib_err and UDP GRO which broke reuseport_select_sock() assumption
         that skb->data is pointing to transport header, from Martin.
      
      3) Two fixes for BPF sockmap: a use-after-free from sleep in psock's backlog
         workqueue, and a missing restore of sk_write_space when psock gets dropped,
         from Jakub and John.
      
      4) Fix unconnected UDP sendmsg hook API which is insufficient as-is since it
         breaks standard applications like DNS if reverse NAT is not performed upon
         receive, from Daniel.
      
      5) Fix an out-of-bounds read in __bpf_skc_lookup which in case of AF_INET6
         fails to verify that the length of the tuple is long enough, from Lorenz.
      
      6) Fix libbpf's libbpf__probe_raw_btf to return an fd instead of 0/1 (for
         {un,}successful probe) as that is expected to be propagated as an fd to
         load_sk_storage_btf() and thus closing the wrong descriptor otherwise,
         from Michal.
      
      7) Fix bpftool's JSON output for the case when a lookup fails, from Krzesimir.
      
      8) Minor misc fixes in docs, samples and selftests, from various others.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      38e406f6
    • Eli Britstein's avatar
      net/mlx5e: Support tagged tunnel over bond · 45e7d4c0
      Eli Britstein authored
      Stacked devices like bond interface may have a VLAN device on top of
      them. Detect lag state correctly under this condition, and return the
      correct routed net device, according to it the encap header is built.
      
      Fixes: e32ee6c7 ("net/mlx5e: Support tunnel encap over tagged Ethernet")
      Signed-off-by: default avatarEli Britstein <elibr@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      45e7d4c0
    • Alaa Hleihel's avatar
      net/mlx5e: Avoid detaching non-existing netdev under switchdev mode · 47c9d2c9
      Alaa Hleihel authored
      After introducing dedicated uplink representor, the netdev instance
      set over the esw manager vport (PF) became no longer in use, so it was
      removed in the cited commit once we're on switchdev mode.
      However, the mlx5e_detach function was not updated accordingly, and it
      still tries to detach a non-existing netdev, causing a kernel crash.
      
      This patch fixes this issue.
      
      Fixes: aec002f6 ("net/mlx5e: Uninstantiate esw manager vport netdev on switchdev mode")
      Signed-off-by: default avatarAlaa Hleihel <alaa@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      47c9d2c9
    • Raed Salem's avatar
      net/mlx5e: Fix source port matching in fdb peer flow rule · b83c0730
      Raed Salem authored
      The cited commit changed the initialization placement of the eswitch
      attributes so it is done prior to parse tc actions function call,
      including among others the in_rep and in_mdev fields which are mistakenly
      reassigned inside the parse actions function.
      
      This breaks the source port matching criteria of the peer redirect rule.
      
      Fix by removing the now redundant reassignment of the already initialized
      fields.
      
      Fixes: 988ab9c7 ("net/mlx5e: Introduce mlx5e_flow_esw_attr_init() helper")
      Signed-off-by: default avatarRaed Salem <raeds@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      b83c0730
    • Shay Agroskin's avatar
      net/mlx5e: Replace reciprocal_scale in TX select queue function · 57c70d87
      Shay Agroskin authored
      The TX queue index returned by the fallback function ranges
      between [0,NUM CHANNELS - 1] if QoS isn't set and
      [0, (NUM CHANNELS)*(NUM TCs) -1] otherwise.
      
      Our HW uses different TC mapping than the fallback function
      (which is denoted as 'up', user priority) so we only need to extract
      a channel number out of the returned value.
      
      Since (NUM CHANNELS)*(NUM TCs) is a relatively small number, using
      reciprocal scale almost always returns zero.
      We instead access the 'txq2sq' table to extract the sq (and with it the
      channel number) associated with the tx queue, thus getting
      a more evenly distributed channel number.
      
      Perf:
      
      Rx/Tx side with Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz and ConnectX-5.
      Used 'iperf' UDP traffic, 10 threads, and priority 5.
      
      Before:	0.566Mpps
      After:	 2.37Mpps
      
      As expected, releasing the existing bottleneck of steering all traffic
      to TX queue zero significantly improves transmission rates.
      
      Fixes: 7ccdd084 ("net/mlx5e: Fix select queue callback")
      Signed-off-by: default avatarShay Agroskin <shayag@mellanox.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      57c70d87
    • Chris Mi's avatar
      net/mlx5e: Add ndo_set_feature for uplink representor · d3cbd425
      Chris Mi authored
      After we have a dedicated uplink representor, the new netdev ops
      doesn't support ndo_set_feature. Because of that, we can't change
      some features, eg. rxvlan. Now add it back.
      
      In this patch, I also do a cleanup for the features flag handling,
      eg. remove duplicate NETIF_F_HW_TC flag setting.
      
      Fixes: aec002f6 ("net/mlx5e: Uninstantiate esw manager vport netdev on switchdev mode")
      Signed-off-by: default avatarChris Mi <chrism@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Reviewed-by: default avatarVlad Buslov <vladbu@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      d3cbd425
    • Alaa Hleihel's avatar
      net/mlx5: Avoid reloading already removed devices · dd80857b
      Alaa Hleihel authored
      Prior to reloading a device we must first verify that it was not already
      removed. Otherwise, the attempt to remove the device will do nothing, and
      in that case we will end up proceeding with adding an new device that no
      one was expecting to remove, leaving behind used resources such as EQs that
      causes a failure to destroy comp EQs and syndrome (0x30f433).
      
      Fix that by making sure that we try to remove and add a device (based on a
      protocol) only if the device is already added.
      
      Fixes: c5447c70 ("net/mlx5: E-Switch, Reload IB interface when switching devlink modes")
      Signed-off-by: default avatarAlaa Hleihel <alaa@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      dd80857b
    • Edward Srouji's avatar
      net/mlx5: Update pci error handler entries and command translation · 6a6fabbf
      Edward Srouji authored
      Add missing entries for create/destroy UCTX and UMEM commands.
      This could get us wrong "unknown FW command" error in flows
      where we unbind the device or reset the driver.
      
      Also the translation of these commands from opcodes to string
      was missing.
      
      Fixes: 6e3722ba ("IB/mlx5: Use the correct commands for UMEM and UCTX allocation")
      Signed-off-by: default avatarEdward Srouji <edwards@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      6a6fabbf