1. 11 Mar, 2021 1 commit
  2. 10 Mar, 2021 39 commits
    • Florian Fainelli's avatar
      net: dsa: b53: VLAN filtering is global to all users · d45c36ba
      Florian Fainelli authored
      The bcm_sf2 driver uses the b53 driver as a library but does not make
      usre of the b53_setup() function, this made it fail to inherit the
      vlan_filtering_is_global attribute. Fix this by moving the assignment to
      b53_switch_alloc() which is used by bcm_sf2.
      
      Fixes: 7228b23e ("net: dsa: b53: Let DSA handle mismatched VLAN filtering settings")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d45c36ba
    • Eric Dumazet's avatar
      net: sched: validate stab values · e323d865
      Eric Dumazet authored
      iproute2 package is well behaved, but malicious user space can
      provide illegal shift values and trigger UBSAN reports.
      
      Add stab parameter to red_check_params() to validate user input.
      
      syzbot reported:
      
      UBSAN: shift-out-of-bounds in ./include/net/red.h:312:18
      shift exponent 111 is too large for 64-bit type 'long unsigned int'
      CPU: 1 PID: 14662 Comm: syz-executor.3 Not tainted 5.12.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x141/0x1d7 lib/dump_stack.c:120
       ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
       __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327
       red_calc_qavg_from_idle_time include/net/red.h:312 [inline]
       red_calc_qavg include/net/red.h:353 [inline]
       choke_enqueue.cold+0x18/0x3dd net/sched/sch_choke.c:221
       __dev_xmit_skb net/core/dev.c:3837 [inline]
       __dev_queue_xmit+0x1943/0x2e00 net/core/dev.c:4150
       neigh_hh_output include/net/neighbour.h:499 [inline]
       neigh_output include/net/neighbour.h:508 [inline]
       ip6_finish_output2+0x911/0x1700 net/ipv6/ip6_output.c:117
       __ip6_finish_output net/ipv6/ip6_output.c:182 [inline]
       __ip6_finish_output+0x4c1/0xe10 net/ipv6/ip6_output.c:161
       ip6_finish_output+0x35/0x200 net/ipv6/ip6_output.c:192
       NF_HOOK_COND include/linux/netfilter.h:290 [inline]
       ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:215
       dst_output include/net/dst.h:448 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       NF_HOOK include/linux/netfilter.h:295 [inline]
       ip6_xmit+0x127e/0x1eb0 net/ipv6/ip6_output.c:320
       inet6_csk_xmit+0x358/0x630 net/ipv6/inet6_connection_sock.c:135
       dccp_transmit_skb+0x973/0x12c0 net/dccp/output.c:138
       dccp_send_reset+0x21b/0x2b0 net/dccp/output.c:535
       dccp_finish_passive_close net/dccp/proto.c:123 [inline]
       dccp_finish_passive_close+0xed/0x140 net/dccp/proto.c:118
       dccp_terminate_connection net/dccp/proto.c:958 [inline]
       dccp_close+0xb3c/0xe60 net/dccp/proto.c:1028
       inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
       inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:478
       __sock_release+0xcd/0x280 net/socket.c:599
       sock_close+0x18/0x20 net/socket.c:1258
       __fput+0x288/0x920 fs/file_table.c:280
       task_work_run+0xdd/0x1a0 kernel/task_work.c:140
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
      
      Fixes: 8afa10cb ("net_sched: red: Avoid illegal values")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e323d865
    • David S. Miller's avatar
    • Rafał Miłecki's avatar
      net: dsa: bcm_sf2: use 2 Gbps IMP port link on BCM4908 · 8373a0fe
      Rafał Miłecki authored
      BCM4908 uses 2 Gbps link between switch and the Ethernet interface.
      Without this BCM4908 devices were able to achieve only 2 x ~895 Mb/s.
      This allows handling e.g. NAT traffic with 940 Mb/s.
      Signed-off-by: default avatarRafał Miłecki <rafal@milecki.pl>
      Acked-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8373a0fe
    • Pavel Andrianov's avatar
      net: pxa168_eth: Fix a potential data race in pxa168_eth_remove · 0571a753
      Pavel Andrianov authored
      pxa168_eth_remove() firstly calls unregister_netdev(),
      then cancels a timeout work. unregister_netdev() shuts down a device
      interface and removes it from the kernel tables. If the timeout occurs
      in parallel, the timeout work (pxa168_eth_tx_timeout_task) performs stop
      and open of the device. It may lead to an inconsistent state and memory
      leaks.
      
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: default avatarPavel Andrianov <andrianov@ispras.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0571a753
    • Eric Dumazet's avatar
      macvlan: macvlan_count_rx() needs to be aware of preemption · dd4fa1da
      Eric Dumazet authored
      macvlan_count_rx() can be called from process context, it is thus
      necessary to disable preemption before calling u64_stats_update_begin()
      
      syzbot was able to spot this on 32bit arch:
      
      WARNING: CPU: 1 PID: 4632 at include/linux/seqlock.h:271 __seqprop_assert include/linux/seqlock.h:271 [inline]
      WARNING: CPU: 1 PID: 4632 at include/linux/seqlock.h:271 __seqprop_assert.constprop.0+0xf0/0x11c include/linux/seqlock.h:269
      Modules linked in:
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 1 PID: 4632 Comm: kworker/1:3 Not tainted 5.12.0-rc2-syzkaller #0
      Hardware name: ARM-Versatile Express
      Workqueue: events macvlan_process_broadcast
      Backtrace:
      [<82740468>] (dump_backtrace) from [<827406dc>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
       r7:00000080 r6:60000093 r5:00000000 r4:8422a3c4
      [<827406c4>] (show_stack) from [<82751b58>] (__dump_stack lib/dump_stack.c:79 [inline])
      [<827406c4>] (show_stack) from [<82751b58>] (dump_stack+0xb8/0xe8 lib/dump_stack.c:120)
      [<82751aa0>] (dump_stack) from [<82741270>] (panic+0x130/0x378 kernel/panic.c:231)
       r7:830209b4 r6:84069ea4 r5:00000000 r4:844350d0
      [<82741140>] (panic) from [<80244924>] (__warn+0xb0/0x164 kernel/panic.c:605)
       r3:8404ec8c r2:00000000 r1:00000000 r0:830209b4
       r7:0000010f
      [<80244874>] (__warn) from [<82741520>] (warn_slowpath_fmt+0x68/0xd4 kernel/panic.c:628)
       r7:81363f70 r6:0000010f r5:83018e50 r4:00000000
      [<827414bc>] (warn_slowpath_fmt) from [<81363f70>] (__seqprop_assert include/linux/seqlock.h:271 [inline])
      [<827414bc>] (warn_slowpath_fmt) from [<81363f70>] (__seqprop_assert.constprop.0+0xf0/0x11c include/linux/seqlock.h:269)
       r8:5a109000 r7:0000000f r6:a568dac0 r5:89802300 r4:00000001
      [<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (u64_stats_update_begin include/linux/u64_stats_sync.h:128 [inline])
      [<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (macvlan_count_rx include/linux/if_macvlan.h:47 [inline])
      [<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (macvlan_broadcast+0x154/0x26c drivers/net/macvlan.c:291)
       r5:89802300 r4:8a927740
      [<8136499c>] (macvlan_broadcast) from [<81365020>] (macvlan_process_broadcast+0x258/0x2d0 drivers/net/macvlan.c:317)
       r10:81364f78 r9:8a86d000 r8:8a9c7e7c r7:8413aa5c r6:00000000 r5:00000000
       r4:89802840
      [<81364dc8>] (macvlan_process_broadcast) from [<802696a4>] (process_one_work+0x2d4/0x998 kernel/workqueue.c:2275)
       r10:00000008 r9:8404ec98 r8:84367a02 r7:ddfe6400 r6:ddfe2d40 r5:898dac80
       r4:8a86d43c
      [<802693d0>] (process_one_work) from [<80269dcc>] (worker_thread+0x64/0x54c kernel/workqueue.c:2421)
       r10:00000008 r9:8a9c6000 r8:84006d00 r7:ddfe2d78 r6:898dac94 r5:ddfe2d40
       r4:898dac80
      [<80269d68>] (worker_thread) from [<80271f40>] (kthread+0x184/0x1a4 kernel/kthread.c:292)
       r10:85247e64 r9:898dac80 r8:80269d68 r7:00000000 r6:8a9c6000 r5:89a2ee40
       r4:8a97bd00
      [<80271dbc>] (kthread) from [<80200114>] (ret_from_fork+0x14/0x20 arch/arm/kernel/entry-common.S:158)
      Exception stack(0x8a9c7fb0 to 0x8a9c7ff8)
      
      Fixes: 412ca155 ("macvlan: Move broadcasts into a work queue")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dd4fa1da
    • Ido Schimmel's avatar
      drop_monitor: Perform cleanup upon probe registration failure · 9398e9c0
      Ido Schimmel authored
      In the rare case that drop_monitor fails to register its probe on the
      'napi_poll' tracepoint, it will not deactivate its hysteresis timer as
      part of the error path. If the hysteresis timer was armed by the shortly
      lived 'kfree_skb' probe and user space retries to initiate tracing, a
      warning will be emitted for trying to initialize an active object [1].
      
      Fix this by properly undoing all the operations that were done prior to
      probe registration, in both software and hardware code paths.
      
      Note that syzkaller managed to fail probe registration by injecting a
      slab allocation failure [2].
      
      [1]
      ODEBUG: init active (active state 0) object type: timer_list hint: sched_send_work+0x0/0x60 include/linux/list.h:135
      WARNING: CPU: 1 PID: 8649 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505
      Modules linked in:
      CPU: 1 PID: 8649 Comm: syz-executor.0 Not tainted 5.11.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
      [...]
      Call Trace:
       __debug_object_init+0x524/0xd10 lib/debugobjects.c:588
       debug_timer_init kernel/time/timer.c:722 [inline]
       debug_init kernel/time/timer.c:770 [inline]
       init_timer_key+0x2d/0x340 kernel/time/timer.c:814
       net_dm_trace_on_set net/core/drop_monitor.c:1111 [inline]
       set_all_monitor_traces net/core/drop_monitor.c:1188 [inline]
       net_dm_monitor_start net/core/drop_monitor.c:1295 [inline]
       net_dm_cmd_trace+0x720/0x1220 net/core/drop_monitor.c:1339
       genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:739
       genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
       genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:800
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502
       genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
       netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
       netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2348
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2402
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2435
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [2]
       FAULT_INJECTION: forcing a failure.
       name failslab, interval 1, probability 0, space 0, times 1
       CPU: 1 PID: 8645 Comm: syz-executor.0 Not tainted 5.11.0-syzkaller #0
       Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       Call Trace:
        dump_stack+0xfa/0x151
        should_fail.cold+0x5/0xa
        should_failslab+0x5/0x10
        __kmalloc+0x72/0x3f0
        tracepoint_add_func+0x378/0x990
        tracepoint_probe_register+0x9c/0xe0
        net_dm_cmd_trace+0x7fc/0x1220
        genl_family_rcv_msg_doit+0x228/0x320
        genl_rcv_msg+0x328/0x580
        netlink_rcv_skb+0x153/0x420
        genl_rcv+0x24/0x40
        netlink_unicast+0x533/0x7d0
        netlink_sendmsg+0x856/0xd90
        sock_sendmsg+0xcf/0x120
        ____sys_sendmsg+0x6e8/0x810
        ___sys_sendmsg+0xf3/0x170
        __sys_sendmsg+0xe5/0x1b0
        do_syscall_64+0x2d/0x70
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: 70c69274 ("drop_monitor: Initialize timer and work item upon tracing enable")
      Fixes: 8ee2267a ("drop_monitor: Convert to using devlink tracepoint")
      Reported-by: syzbot+779559d6503f3a56213d@syzkaller.appspotmail.com
      Signed-off-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: default avatarJiri Pirko <jiri@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9398e9c0
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 547fd083
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2021-03-10
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 8 non-merge commits during the last 5 day(s) which contain
      a total of 11 files changed, 136 insertions(+), 17 deletions(-).
      
      The main changes are:
      
      1) Reject bogus use of vmlinux BTF as map/prog creation BTF, from Alexei Starovoitov.
      
      2) Fix allocation failure splat in x86 JIT for large progs. Also fix overwriting
         percpu cgroup storage from tracing programs when nested, from Yonghong Song.
      
      3) Fix rx queue retrieval in XDP for multi-queue veth, from Maciej Fijalkowski.
      
      4) Fix bpf_check_mtu() helper API before freeze to have mtu_len as custom skb/xdp
         L3 input length, from Jesper Dangaard Brouer.
      
      5) Fix inode_storage's lookup_elem return value upon having bad fd, from Tal Lossos.
      
      6) Fix bpftool and libbpf cross-build on MacOS, from Georgi Valkov.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      547fd083
    • Wei Wang's avatar
      ipv6: fix suspecious RCU usage warning · 28259bac
      Wei Wang authored
      Syzbot reported the suspecious RCU usage in nexthop_fib6_nh() when
      called from ipv6_route_seq_show(). The reason is ipv6_route_seq_start()
      calls rcu_read_lock_bh(), while nexthop_fib6_nh() calls
      rcu_dereference_rtnl().
      The fix proposed is to add a variant of nexthop_fib6_nh() to use
      rcu_dereference_bh_rtnl() for ipv6_route_seq_show().
      
      The reported trace is as follows:
      ./include/net/nexthop.h:416 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by syz-executor.0/17895:
           at: seq_read+0x71/0x12a0 fs/seq_file.c:169
           at: seq_file_net include/linux/seq_file_net.h:19 [inline]
           at: ipv6_route_seq_start+0xaf/0x300 net/ipv6/ip6_fib.c:2616
      
      stack backtrace:
      CPU: 1 PID: 17895 Comm: syz-executor.0 Not tainted 4.15.0-syzkaller #0
      Call Trace:
       [<ffffffff849edf9e>] __dump_stack lib/dump_stack.c:17 [inline]
       [<ffffffff849edf9e>] dump_stack+0xd8/0x147 lib/dump_stack.c:53
       [<ffffffff8480b7fa>] lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5745
       [<ffffffff8459ada6>] nexthop_fib6_nh include/net/nexthop.h:416 [inline]
       [<ffffffff8459ada6>] ipv6_route_native_seq_show net/ipv6/ip6_fib.c:2488 [inline]
       [<ffffffff8459ada6>] ipv6_route_seq_show+0x436/0x7a0 net/ipv6/ip6_fib.c:2673
       [<ffffffff81c556df>] seq_read+0xccf/0x12a0 fs/seq_file.c:276
       [<ffffffff81dbc62c>] proc_reg_read+0x10c/0x1d0 fs/proc/inode.c:231
       [<ffffffff81bc28ae>] do_loop_readv_writev fs/read_write.c:714 [inline]
       [<ffffffff81bc28ae>] do_loop_readv_writev fs/read_write.c:701 [inline]
       [<ffffffff81bc28ae>] do_iter_read+0x49e/0x660 fs/read_write.c:935
       [<ffffffff81bc81ab>] vfs_readv+0xfb/0x170 fs/read_write.c:997
       [<ffffffff81c88847>] kernel_readv fs/splice.c:361 [inline]
       [<ffffffff81c88847>] default_file_splice_read+0x487/0x9c0 fs/splice.c:416
       [<ffffffff81c86189>] do_splice_to+0x129/0x190 fs/splice.c:879
       [<ffffffff81c86f66>] splice_direct_to_actor+0x256/0x890 fs/splice.c:951
       [<ffffffff81c8777d>] do_splice_direct+0x1dd/0x2b0 fs/splice.c:1060
       [<ffffffff81bc4747>] do_sendfile+0x597/0xce0 fs/read_write.c:1459
       [<ffffffff81bca205>] SYSC_sendfile64 fs/read_write.c:1520 [inline]
       [<ffffffff81bca205>] SyS_sendfile64+0x155/0x170 fs/read_write.c:1506
       [<ffffffff81015fcf>] do_syscall_64+0x1ff/0x310 arch/x86/entry/common.c:305
       [<ffffffff84a00076>] entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      Fixes: f88d8ea6 ("ipv6: Plumb support for nexthop object in a fib6_info")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Ido Schimmel <idosch@idosch.org>
      Cc: Petr Machata <petrm@nvidia.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28259bac
    • David S. Miller's avatar
      Merge branch 'ip6ip6-crash' · c89489b4
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      Fix ip6ip6 crash for collect_md skbs
      
      Fix a NULL pointer deref panic I ran into for regular ip6ip6 tunnel devices
      when collect_md populated skbs were redirected to them for xmit. See patches
      for further details, thanks!
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c89489b4
    • Daniel Borkmann's avatar
      net, bpf: Fix ip6ip6 crash with collect_md populated skbs · a188bb56
      Daniel Borkmann authored
      I ran into a crash where setting up a ip6ip6 tunnel device which was /not/
      set to collect_md mode was receiving collect_md populated skbs for xmit.
      
      The BPF prog was populating the skb via bpf_skb_set_tunnel_key() which is
      assigning special metadata dst entry and then redirecting the skb to the
      device, taking ip6_tnl_start_xmit() -> ipxip6_tnl_xmit() -> ip6_tnl_xmit()
      and in the latter it performs a neigh lookup based on skb_dst(skb) where
      we trigger a NULL pointer dereference on dst->ops->neigh_lookup() since
      the md_dst_ops do not populate neigh_lookup callback with a fake handler.
      
      Transform the md_dst_ops into generic dst_blackhole_ops that can also be
      reused elsewhere when needed, and use them for the metadata dst entries as
      callback ops.
      
      Also, remove the dst_md_discard{,_out}() ops and rely on dst_discard{,_out}()
      from dst_init() which free the skb the same way modulo the splat. Given we
      will be able to recover just fine from there, avoid any potential splats
      iff this gets ever triggered in future (or worse, panic on warns when set).
      
      Fixes: f38a9eb1 ("dst: Metadata destinations")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a188bb56
    • Daniel Borkmann's avatar
      net: Consolidate common blackhole dst ops · c4c877b2
      Daniel Borkmann authored
      Move generic blackhole dst ops to the core and use them from both
      ipv4_dst_blackhole_ops and ip6_dst_blackhole_ops where possible. No
      functional change otherwise. We need these also in other locations
      and having to define them over and over again is not great.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c4c877b2
    • Yevgeny Kliteynik's avatar
      net/mlx5: DR, Fix potential shift wrapping of 32-bit value in STEv1 getter · 84076c4c
      Yevgeny Kliteynik authored
      Fix 32-bit variable shift wrapping in dr_ste_v1_get_miss_addr.
      
      Fixes: a6098129 ("net/mlx5: DR, Add STEv1 setters and getters")
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarYevgeny Kliteynik <kliteyn@nvidia.com>
      Reviewed-by: default avatarAlex Vesker <valex@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      84076c4c
    • Shay Drory's avatar
      net/mlx5: SF: Fix error flow of SFs allocation flow · dc694f11
      Shay Drory authored
      When SF id is unavailable, code jumps to wrong label that accesses
      sw id array outside of its range.
      Hence, when SF id is not allocated, avoid accessing such array.
      
      Fixes: 8f010541 ("net/mlx5: SF, Add port add delete functionality")
      Signed-off-by: default avatarShay Drory <shayd@nvidia.com>
      Reviewed-by: default avatarParav Pandit <parav@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      dc694f11
    • Shay Drory's avatar
      net/mlx5: SF: Fix memory leak of work item · 6fa37d66
      Shay Drory authored
      Cited patch in the fixes tag missed to free the allocated work.
      Fix it by freeing the work after work execution.
      
      Fixes: f3196bb0 ("net/mlx5: Introduce vhca state event notifier")
      Signed-off-by: default avatarShay Drory <shayd@nvidia.com>
      Reviewed-by: default avatarParav Pandit <parav@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      6fa37d66
    • Parav Pandit's avatar
      net/mlx5: SF, Correct vhca context size · 6a371754
      Parav Pandit authored
      Fix vhca context size as defined by device interface specification.
      
      Fixes: f3196bb0 ("net/mlx5: Introduce vhca state event notifier")
      Signed-off-by: default avatarParav Pandit <parav@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      6a371754
    • Parav Pandit's avatar
      net/mlx5e: E-switch, Fix rate calculation division · 8b90d897
      Parav Pandit authored
      do_div() returns reminder, while cited patch wanted to use
      quotient.
      Fix it by using quotient.
      
      Fixes: 0e22bfb7 ("net/mlx5e: E-switch, Fix rate calculation for overflow")
      Signed-off-by: default avatarParav Pandit <parav@nvidia.com>
      Signed-off-by: default avatarMaor Dickman <maord@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      8b90d897
    • Maor Gottlieb's avatar
      RDMA/mlx5: Fix timestamp default mode · 8256c69b
      Maor Gottlieb authored
      1. Don't set the ts_format bit to default when it reserved - device is
         running in the old mode (free running).
      2. XRC doesn't have a CQ therefore the ts format in the QP
         context should be default / free running.
      3. Set ts_format to WQ.
      
      Fixes: 2fe8d4b8 ("RDMA/mlx5: Fail QP creation if the device can not support the CQE TS")
      Signed-off-by: default avatarMaor Gottlieb <maorg@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      8256c69b
    • Maor Gottlieb's avatar
      net/mlx5: Set QP timestamp mode to default · 4806f1e2
      Maor Gottlieb authored
      QPs which don't care from timestamp mode, should set the ts_format
      to default, otherwise the QP creation could be failed if the timestamp
      mode is not supported.
      
      Fixes: 2fe8d4b8 ("RDMA/mlx5: Fail QP creation if the device can not support the CQE TS")
      Signed-off-by: default avatarMaor Gottlieb <maorg@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      4806f1e2
    • Roi Dayan's avatar
      net/mlx5e: Fix error flow in change profile · 469549e4
      Roi Dayan authored
      Move priv memset from init to cleanup to avoid double priv cleanup
      that can happen on profile change if also roolback fails.
      Add missing cleanup flow in mlx5e_netdev_attach_profile().
      
      Fixes: c4d7eb57 ("net/mxl5e: Add change profile method")
      Signed-off-by: default avatarRoi Dayan <roid@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      469549e4
    • Maor Dickman's avatar
      net/mlx5: Disable VF tunnel TX offload if ignore_flow_level isn't supported · f574531a
      Maor Dickman authored
      VF tunnel TX traffic offload is adding flow which forward to flow
      tables with lower level, which isn't support on all FW versions
      and may cause firmware to fail with syndrome.
      
      Fixed by enabling VF tunnel TX offload only if flow table capability
      ignore_flow_level is enabled.
      
      Fixes: 10742efc ("net/mlx5e: VF tunnel TX traffic offloading")
      Signed-off-by: default avatarMaor Dickman <maord@nvidia.com>
      Reviewed-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      f574531a
    • Roi Dayan's avatar
      net/mlx5e: Check correct ip_version in decapsulation route resolution · 1e74152e
      Roi Dayan authored
      flow_attr->ip_version has the matching that should be done inner/outer.
      When working with chains, decapsulation is done on chain0 and next chain
      match on outer header which is the original inner which could be ipv4.
      So in tunnel route resolution we cannot use that to know which ip version
      we are at so save tun_ip_version when parsing the tunnel match and use
      that.
      
      Fixes: a508728a ("net/mlx5e: VF tunnel RX traffic offloading")
      Signed-off-by: default avatarRoi Dayan <roid@nvidia.com>
      Reviewed-by: default avatarDmytro Linkin <dlinkin@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      1e74152e
    • Aya Levin's avatar
      net/mlx5: Fix turn-off PPS command · 55affa97
      Aya Levin authored
      Fix a bug of uninitialized pin index when trying to turn off PPS out.
      
      Fixes: de19cd6c ("net/mlx5: Move some PPS logic into helper functions")
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Reviewed-by: default avatarEran Ben Elisha <eranbe@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      55affa97
    • Maor Dickman's avatar
      net/mlx5e: Don't match on Geneve options in case option masks are all zero · 385d40b0
      Maor Dickman authored
      The cited change added offload support for Geneve options without verifying
      the validity of the options masks, this caused offload of rules with match
      on Geneve options with class,type and data masks which are zero to fail.
      
      Fix by ignoring the match on Geneve options in case option masks are
      all zero.
      
      Fixes: 9272e3df ("net/mlx5e: Geneve, Add support for encap/decap flows offload")
      Signed-off-by: default avatarMaor Dickman <maord@nvidia.com>
      Reviewed-by: default avatarRoi Dayan <roid@nvidia.com>
      Reviewed-by: default avatarOz Shlomo <ozsh@nvidia.com>
      Reviewed-by: default avatarYevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      385d40b0
    • Maxim Mikityanskiy's avatar
      net/mlx5e: Revert parameters on errors when changing PTP state without reset · 74640f09
      Maxim Mikityanskiy authored
      Port timestamping for PTP can be enabled/disabled while the channels are
      closed. In that case mlx5e_safe_switch_channels is skipped, and the
      preactivate hook is called directly. However, if that hook returns an
      error, the channel parameters must be reverted back to their old values.
      This commit adds missing handling on this case.
      
      Fixes: 145e5637 ("net/mlx5e: Add TX PTP port object support")
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      74640f09
    • Maxim Mikityanskiy's avatar
      net/mlx5e: When changing XDP program without reset, take refs for XSK RQs · e5eb0134
      Maxim Mikityanskiy authored
      Each RQ (including XSK RQs) takes a reference to the XDP program. When
      an XDP program is attached or detached, the channels and queues are
      recreated, however, there is a special flow for changing an active XDP
      program to another one. In that flow, channels and queues stay alive,
      but the refcounts of the old and new XDP programs are adjusted. This
      flow didn't increment refcount by the number of active XSK RQs, and this
      commit fixes it.
      
      Fixes: db05815b ("net/mlx5e: Add XSK zero-copy support")
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      e5eb0134
    • Aya Levin's avatar
      net/mlx5e: Set PTP channel pointer explicitly to NULL · 1c2cdf0b
      Aya Levin authored
      When closing the PTP channel, set its pointer explicitly to NULL. PTP
      channel is opened on demand, the code verify the pointer validity before
      access. Nullify it when closing the PTP channel to avoid unexpected
      behavior.
      
      Fixes: 145e5637 ("net/mlx5e: Add TX PTP port object support")
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      1c2cdf0b
    • Aya Levin's avatar
      net/mlx5e: Accumulate port PTP TX stats with other channels stats · 354521ee
      Aya Levin authored
      In addition to .get_ethtool_stats, add port PTP TX stats to
      .ndo_get_stats64.
      
      Fixes: 145e5637 ("net/mlx5e: Add TX PTP port object support")
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      354521ee
    • Tariq Toukan's avatar
      net/mlx5e: RX, Mind the MPWQE gaps when calculating offsets · d5dd03b2
      Tariq Toukan authored
      Since cited patch, MLX5E_REQUIRED_WQE_MTTS is not a power of two.
      Hence, usage of MLX5E_LOG_ALIGNED_MPWQE_PPW should be replaced,
      as it lost some accuracy. Use the designated macro to calculate
      the number of required MTTs.
      
      This makes sure the solution in cited patch works properly.
      
      While here, un-inline mlx5e_get_mpwqe_offset(), and remove the
      unused RQ parameter.
      
      Fixes: c3c94023 ("net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU")
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      d5dd03b2
    • Tariq Toukan's avatar
      net/mlx5e: Enforce minimum value check for ICOSQ size · 5115daa6
      Tariq Toukan authored
      The ICOSQ size should not go below MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE.
      Enforce this where it's missing.
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Reviewed-by: default avatarMaxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      5115daa6
    • Linus Torvalds's avatar
      Merge git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net · 05a59d79
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix transmissions in dynamic SMPS mode in ath9k, from Felix Fietkau.
      
       2) TX skb error handling fix in mt76 driver, also from Felix.
      
       3) Fix BPF_FETCH atomic in x86 JIT, from Brendan Jackman.
      
       4) Avoid double free of percpu pointers when freeing a cloned bpf prog.
          From Cong Wang.
      
       5) Use correct printf format for dma_addr_t in ath11k, from Geert
          Uytterhoeven.
      
       6) Fix resolve_btfids build with older toolchains, from Kun-Chuan
          Hsieh.
      
       7) Don't report truncated frames to mac80211 in mt76 driver, from
          Lorenzop Bianconi.
      
       8) Fix watcdog timeout on suspend/resume of stmmac, from Joakim Zhang.
      
       9) mscc ocelot needs NET_DEVLINK selct in Kconfig, from Arnd Bergmann.
      
      10) Fix sign comparison bug in TCP_ZEROCOPY_RECEIVE getsockopt(), from
          Arjun Roy.
      
      11) Ignore routes with deleted nexthop object in mlxsw, from Ido
          Schimmel.
      
      12) Need to undo tcp early demux lookup sometimes in nf_nat, from
          Florian Westphal.
      
      13) Fix gro aggregation for udp encaps with zero csum, from Daniel
          Borkmann.
      
      14) Make sure to always use imp*_ndo_send when necessaey, from Jason A.
          Donenfeld.
      
      15) Fix TRSCER masks in sh_eth driver from Sergey Shtylyov.
      
      16) prevent overly huge skb allocationsd in qrtr, from Pavel Skripkin.
      
      17) Prevent rx ring copnsumer index loss of sync in enetc, from Vladimir
          Oltean.
      
      18) Make sure textsearch copntrol block is large enough, from Wilem de
          Bruijn.
      
      19) Revert MAC changes to r8152 leading to instability, from Hates Wang.
      
      20) Advance iov in 9p even for empty reads, from Jissheng Zhang.
      
      21) Double hook unregister in nftables, from PabloNeira Ayuso.
      
      22) Fix memleak in ixgbe, fropm Dinghao Liu.
      
      23) Avoid dups in pkt scheduler class dumps, from Maximilian Heyne.
      
      24) Various mptcp fixes from Florian Westphal, Paolo Abeni, and Geliang
          Tang.
      
      25) Fix DOI refcount bugs in cipso, from Paul Moore.
      
      26) One too many irqsave in ibmvnic, from Junlin Yang.
      
      27) Fix infinite loop with MPLS gso segmenting via virtio_net, from
          Balazs Nemeth.
      
      * git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net: (164 commits)
        s390/qeth: fix notification for pending buffers during teardown
        s390/qeth: schedule TX NAPI on QAOB completion
        s390/qeth: improve completion of pending TX buffers
        s390/qeth: fix memory leak after failed TX Buffer allocation
        net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
        net: check if protocol extracted by virtio_net_hdr_set_proto is correct
        net: dsa: xrs700x: check if partner is same as port in hsr join
        net: lapbether: Remove netif_start_queue / netif_stop_queue
        atm: idt77252: fix null-ptr-dereference
        atm: uPD98402: fix incorrect allocation
        atm: fix a typo in the struct description
        net: qrtr: fix error return code of qrtr_sendmsg()
        mptcp: fix length of ADD_ADDR with port sub-option
        net: bonding: fix error return code of bond_neigh_init()
        net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
        net: enetc: set MAC RX FIFO to recommended value
        net: davicom: Use platform_get_irq_optional()
        net: davicom: Fix regulator not turned off on driver removal
        net: davicom: Fix regulator not turned off on failed probe
        net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports
        ...
      05a59d79
    • Linus Torvalds's avatar
      Merge git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc · 6a30bedf
      Linus Torvalds authored
      Pull sparc fixes from David Miller:
       "Fix opcode filtering for exceptions, and clean up defconfig"
      
      * git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc:
        sparc: sparc64_defconfig: remove duplicate CONFIGs
        sparc64: Fix opcode filtering in handling of no fault loads
      6a30bedf
    • Corentin Labbe's avatar
      sparc: sparc64_defconfig: remove duplicate CONFIGs · 69264b4a
      Corentin Labbe authored
      After my patch there is CONFIG_ATA defined twice.
      Remove the duplicate one.
      Same problem for CONFIG_HAPPYMEAL, except I added as builtin for boot
      test with NFS.
      Reported-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Fixes: a57cdeb3 ("sparc: sparc64_defconfig: add necessary configs for qemu")
      Signed-off-by: default avatarCorentin Labbe <clabbe@baylibre.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      69264b4a
    • Rob Gardner's avatar
      sparc64: Fix opcode filtering in handling of no fault loads · e5e8b80d
      Rob Gardner authored
      is_no_fault_exception() has two bugs which were discovered via random
      opcode testing with stress-ng. Both are caused by improper filtering
      of opcodes.
      
      The first bug can be triggered by a floating point store with a no-fault
      ASI, for instance "sta %f0, [%g0] #ASI_PNF", opcode C1A01040.
      
      The code first tests op3[5] (0x1000000), which denotes a floating
      point instruction, and then tests op3[2] (0x200000), which denotes a
      store instruction. But these bits are not mutually exclusive, and the
      above mentioned opcode has both bits set. The intent is to filter out
      stores, so the test for stores must be done first in order to have
      any effect.
      
      The second bug can be triggered by a floating point load with one of
      the invalid ASI values 0x8e or 0x8f, which pass this check in
      is_no_fault_exception():
           if ((asi & 0xf2) == ASI_PNF)
      
      An example instruction is "ldqa [%l7 + %o7] #ASI 0x8f, %f38",
      opcode CF95D1EF. Asi values greater than 0x8b (ASI_SNFL) are fatal
      in handle_ldf_stq(), and is_no_fault_exception() must not allow these
      invalid asi values to make it that far.
      
      In both of these cases, handle_ldf_stq() reacts by calling
      sun4v_data_access_exception() or spitfire_data_access_exception(),
      which call is_no_fault_exception() and results in an infinite
      recursion.
      Signed-off-by: default avatarRob Gardner <rob.gardner@oracle.com>
      Tested-by: default avatarAnatoly Pugachev <matorola@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e5e8b80d
    • David S. Miller's avatar
      Merge branch 's390-qeth-fixes' · 85154557
      David S. Miller authored
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2021-03-09
      
      please apply the following patch series to netdev's net tree.
      
      This brings one fix for a memleak in an error path of the setup code.
      Also several fixes for dealing with pending TX buffers - two for old
      bugs in their completion handling, and one recent regression in a
      teardown path.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      85154557
    • Julian Wiedmann's avatar
      s390/qeth: fix notification for pending buffers during teardown · 7eefda7f
      Julian Wiedmann authored
      The cited commit reworked the state machine for pending TX buffers.
      In qeth_iqd_tx_complete() it turned PENDING into a transient state, and
      uses NEED_QAOB for buffers that get parked while waiting for their QAOB
      completion.
      
      But it missed to adjust the check in qeth_tx_complete_buf(). So if
      qeth_tx_complete_pending_bufs() is called during teardown to drain
      the parked TX buffers, we no longer raise a notification for af_iucv.
      
      Instead of updating the checked state, just move this code into
      qeth_tx_complete_pending_bufs() itself. This also gets rid of the
      special-case in the common TX completion path.
      
      Fixes: 8908f36d ("s390/qeth: fix af_iucv notification race")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7eefda7f
    • Julian Wiedmann's avatar
      s390/qeth: schedule TX NAPI on QAOB completion · 3e83d467
      Julian Wiedmann authored
      When a QAOB notifies us that a pending TX buffer has been delivered, the
      actual TX completion processing by qeth_tx_complete_pending_bufs()
      is done within the context of a TX NAPI instance. We shouldn't rely on
      this instance being scheduled by some other TX event, but just do it
      ourselves.
      
      qeth_qdio_handle_aob() is called from qeth_poll(), ie. our main NAPI
      instance. To avoid touching the TX queue's NAPI instance
      before/after it is (un-)registered, reorder the code in qeth_open()
      and qeth_stop() accordingly.
      
      Fixes: 0da9581d ("qeth: exploit asynchronous delivery of storage blocks")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3e83d467
    • Julian Wiedmann's avatar
      s390/qeth: improve completion of pending TX buffers · c20383ad
      Julian Wiedmann authored
      The current design attaches a pending TX buffer to a custom
      single-linked list, which is anchored at the buffer's slot on the
      TX ring. The buffer is then checked for final completion whenever
      this slot is processed during a subsequent TX NAPI poll cycle.
      
      But if there's insufficient traffic on the ring, we might never make
      enough progress to get back to this ring slot and discover the pending
      buffer's final TX completion. In particular if this missing TX
      completion blocks the application from sending further traffic.
      
      So convert the custom single-linked list code to a per-queue list_head,
      and scan this list on every TX NAPI cycle.
      
      Fixes: 0da9581d ("qeth: exploit asynchronous delivery of storage blocks")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c20383ad
    • Julian Wiedmann's avatar
      s390/qeth: fix memory leak after failed TX Buffer allocation · e7a36d27
      Julian Wiedmann authored
      When qeth_alloc_qdio_queues() fails to allocate one of the buffers that
      back an Output Queue, the 'out_freeoutqbufs' path will free all
      previously allocated buffers for this queue. But it misses to free the
      half-finished queue struct itself.
      
      Move the buffer allocation into qeth_alloc_output_queue(), and deal with
      such errors internally.
      
      Fixes: 0da9581d ("qeth: exploit asynchronous delivery of storage blocks")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: default avatarAlexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e7a36d27