1. 13 Aug, 2021 1 commit
    • bpf: Clear zext_dst of dead insns · 45c709f8
      Ilya Leoshkevich authored
      "access skb fields ok" verifier test fails on s390 with the "verifier
      bug. zext_dst is set, but no reg is defined" message. The first insns
      of the test prog are ...
      
         0:	61 01 00 00 00 00 00 00 	ldxw %r0,[%r1+0]
         8:	35 00 00 01 00 00 00 00 	jge %r0,0,1
        10:	61 01 00 08 00 00 00 00 	ldxw %r0,[%r1+8]
      
      ... and the 3rd one is dead (this does not look intentional to me, but
      this is a separate topic).
      
      sanitize_dead_code() converts dead insns into "ja -1", but keeps
      zext_dst. When opt_subreg_zext_lo32_rnd_hi32() tries to parse such
      an insn, it sees this discrepancy and bails. This problem can be seen
      only with JITs whose bpf_jit_needs_zext() returns true.
      
      Fix by clearing dead insns' zext_dst.
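
      A minimal user-space sketch of the idea, assuming simplified stand-ins
      for the verifier state: insn_model, insn_aux_model and
      sanitize_dead_code_model() are hypothetical names, not the kernel's
      definitions. Unreached insns become "ja -1" and lose their zext_dst flag
      in the same pass, so a later zero-extension pass never sees the flag on
      an insn that defines no register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the verifier's per-insn state. */
struct insn_model {
	unsigned char code;	/* opcode */
	short off;		/* jump offset */
};

struct insn_aux_model {
	bool seen;		/* reached during verification */
	bool zext_dst;		/* destination needs zero extension */
};

#define MODEL_JA 0x05		/* BPF_JMP | BPF_JA */

/* Turn every unreached insn into "ja -1" and drop its stale zext_dst flag. */
void sanitize_dead_code_model(struct insn_model *insns,
			      struct insn_aux_model *aux, size_t cnt)
{
	assert(insns != NULL && aux != NULL);
	for (size_t i = 0; i < cnt; i++) {
		if (aux[i].seen)
			continue;
		insns[i].code = MODEL_JA;
		insns[i].off = -1;
		aux[i].zext_dst = false;	/* the step the fix adds */
	}
}
```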
      
      The commits that contributed to this problem are:
      
      1. 5aa5bd14 ("bpf: add initial suite for selftests"), which
         introduced the test with the dead code.
      2. 5327ed3d ("bpf: verifier: mark verified-insn with
         sub-register zext flag"), which introduced the zext_dst flag.
      3. 83a28819 ("bpf: Account for BPF_FETCH in
         insn_has_def32()"), which introduced the sanity check.
      4. 9183671a ("bpf: Fix leakage under speculation on
         mispredicted branches"), which bisect points to.
      
      It's best to fix this on stable branches that contain the second one,
      since that's the point where the inconsistency was introduced.
      
      Fixes: 5327ed3d ("bpf: verifier: mark verified-insn with sub-register zext flag")
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210812151811.184086-2-iii@linux.ibm.com
  2. 11 Aug, 2021 1 commit
    • bpf: Add rcu_read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers · 2d3a1e36
      Yonghong Song authored
      Currently, if the bpf_get_current_cgroup_id() or
      bpf_get_current_ancestor_cgroup_id() helper is
      called from sleepable programs, e.g., sleepable
      fentry/fmod_ret/fexit/lsm programs, an RCU warning
      may appear. For example, if I add the following
      hack to the test_progs/test_lsm sleepable fentry program
      test_sys_setdomainname:
      
        --- a/tools/testing/selftests/bpf/progs/lsm.c
        +++ b/tools/testing/selftests/bpf/progs/lsm.c
        @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs)
                int buf = 0;
                long ret;
      
        +       __u64 cg_id = bpf_get_current_cgroup_id();
        +       if (cg_id == 1000)
        +               copy_test++;
        +
                ret = bpf_copy_from_user(&buf, sizeof(buf), ptr);
                if (len == -2 && ret == 0 && buf == 1234)
                        copy_test++;
      
      I will hit the following rcu warning:
      
        include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
        other info that might help us debug this:
          rcu_scheduler_active = 2, debug_locks = 1
          1 lock held by test_progs/260:
            #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
          stack backtrace:
          CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
          Call Trace:
            dump_stack_lvl+0x56/0x7b
            bpf_get_current_cgroup_id+0x9c/0xb1
            bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
            bpf_trampoline_6442469132_0+0x2d/0x1000
            __x64_sys_setdomainname+0x5/0x110
            do_syscall_64+0x3a/0x80
            entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      I can get a similar warning using the bpf_get_current_ancestor_cgroup_id() helper.
      syzbot reported a similar issue in [1] for a syscall program. The
      bpf_get_current_cgroup_id() and bpf_get_current_ancestor_cgroup_id()
      helpers have the following call chain:
         task_dfl_cgroup
           task_css_set
             task_css_set_check
      and we have
         #define task_css_set_check(task, __c)                                   \
                 rcu_dereference_check((task)->cgroups,                          \
                         lockdep_is_held(&cgroup_mutex) ||                       \
                         lockdep_is_held(&css_set_lock) ||                       \
                         ((task)->flags & PF_EXITING) || (__c))
      Since neither cgroup_mutex nor css_set_lock is held, the task
      is not exiting, and rcu_read_lock() is not held, a warning
      will be issued. Note that a sleepable bpf program is protected by
      rcu_read_lock_trace().
      
      The above sleepable bpf programs are already protected
      by migrate_disable(). Adding rcu_read_lock() in these
      two helpers will silence the above warning.
      I marked the patch as fixing 95b861a7
      ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing"),
      which added bpf_get_current_ancestor_cgroup_id() to tracing programs
      in 5.14. I think backporting to 5.14 is probably good enough, as sleepable
      programs are not widely used.
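
      The effect of the change can be illustrated with a toy user-space model.
      Everything below is an assumption made for illustration: the *_model
      names are hypothetical, the lock is a plain counter, and the check only
      models the rcu_read_lock() and PF_EXITING conditions (the
      cgroup_mutex and css_set_lock conditions are left out).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct task_model {
	unsigned long long cgroup_id;
	bool exiting;			/* stands in for PF_EXITING */
};

int rcu_depth_model;			/* nesting depth of the modeled read lock */
int rcu_warnings_model;			/* count of "suspicious usage" reports */

void rcu_read_lock_model(void)
{
	rcu_depth_model++;
}

void rcu_read_unlock_model(void)
{
	assert(rcu_depth_model > 0);
	rcu_depth_model--;
}

/* Models rcu_dereference_check(): complain unless a condition holds. */
unsigned long long task_cgroup_id_checked(const struct task_model *t)
{
	if (rcu_depth_model == 0 && !t->exiting)
		rcu_warnings_model++;
	return t->cgroup_id;
}

/* Helper before the fix: dereferences without the read lock. */
unsigned long long get_cgroup_id_unlocked(const struct task_model *t)
{
	return task_cgroup_id_checked(t);
}

/* Helper after the fix: brackets the dereference with the read lock. */
unsigned long long get_cgroup_id_locked(const struct task_model *t)
{
	unsigned long long id;

	rcu_read_lock_model();
	id = task_cgroup_id_checked(t);
	rcu_read_unlock_model();
	return id;
}
```

      In this model only the unlocked variant bumps the warning counter; both
      return the same ID.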
      
      This patch should fix [1] as well since syscall program is a sleepable
      program protected with migrate_disable().
      
       [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/
      
      Fixes: 95b861a7 ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
      Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210810230537.2864668-1-yhs@fb.com
  3. 10 Aug, 2021 38 commits
    • net: bridge: fix memleak in br_add_if() · 519133de
      Yang Yingliang authored
      I got a memleak report:
      
      BUG: memory leak
      unreferenced object 0x607ee521a658 (size 240):
      comm "syz-executor.0", pid 955, jiffies 4294780569 (age 16.449s)
      hex dump (first 32 bytes, cpu 1):
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      backtrace:
      [<00000000d830ea5a>] br_multicast_add_port+0x1c2/0x300 net/bridge/br_multicast.c:1693
      [<00000000274d9a71>] new_nbp net/bridge/br_if.c:435 [inline]
      [<00000000274d9a71>] br_add_if+0x670/0x1740 net/bridge/br_if.c:611
      [<0000000012ce888e>] do_set_master net/core/rtnetlink.c:2513 [inline]
      [<0000000012ce888e>] do_set_master+0x1aa/0x210 net/core/rtnetlink.c:2487
      [<0000000099d1cafc>] __rtnl_newlink+0x1095/0x13e0 net/core/rtnetlink.c:3457
      [<00000000a01facc0>] rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3488
      [<00000000acc9186c>] rtnetlink_rcv_msg+0x369/0xa10 net/core/rtnetlink.c:5550
      [<00000000d4aabb9c>] netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2504
      [<00000000bc2e12a3>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
      [<00000000bc2e12a3>] netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1340
      [<00000000e4dc2d0e>] netlink_sendmsg+0x789/0xc70 net/netlink/af_netlink.c:1929
      [<000000000d22c8b3>] sock_sendmsg_nosec net/socket.c:654 [inline]
      [<000000000d22c8b3>] sock_sendmsg+0x139/0x170 net/socket.c:674
      [<00000000e281417a>] ____sys_sendmsg+0x658/0x7d0 net/socket.c:2350
      [<00000000237aa2ab>] ___sys_sendmsg+0xf8/0x170 net/socket.c:2404
      [<000000004f2dc381>] __sys_sendmsg+0xd3/0x190 net/socket.c:2433
      [<0000000005feca6c>] do_syscall_64+0x37/0x90 arch/x86/entry/common.c:47
      [<000000007304477d>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      On the error path of br_add_if(), p->mcast_stats allocated in
      new_nbp() needs to be freed, or it will be leaked.
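
      The pattern can be sketched in user space as follows. This is only an
      illustration under stated assumptions: port_model, new_port_model() and
      add_port_model() are hypothetical names, and malloc-style allocation
      stands in for whatever the bridge code really uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bridge port; not the kernel structure. */
struct port_model {
	int *mcast_stats;
};

int live_allocs_model;	/* tracks outstanding allocations for the demo */

struct port_model *new_port_model(void)
{
	struct port_model *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->mcast_stats = calloc(4, sizeof(*p->mcast_stats));
	if (!p->mcast_stats) {
		free(p);
		return NULL;
	}
	live_allocs_model += 2;
	return p;
}

/* Returns 0 on success; on failure, frees everything new_port_model() made. */
int add_port_model(struct port_model **out, int later_step_fails)
{
	struct port_model *p;

	assert(out != NULL);
	p = new_port_model();
	if (!p)
		return -1;
	if (later_step_fails)
		goto err;
	*out = p;
	return 0;
err:
	free(p->mcast_stats);	/* the release a leaky error path would skip */
	free(p);
	live_allocs_model -= 2;
	return -1;
}
```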
      
      Fixes: 1080ab95 ("net: bridge: add support for IGMP/MLD stats and export them via netlink")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Link: https://lore.kernel.org/r/20210809132023.978546-1-yangyingliang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge · c35b57ce
      Vladimir Oltean authored
      
      The blamed commit added a new field to struct switchdev_notifier_fdb_info,
      but did not make sure that all call paths set it to something valid.
      For example, a switchdev driver may emit a SWITCHDEV_FDB_ADD_TO_BRIDGE
      notifier, and since the 'is_local' flag is not set, it contains junk
      from the stack, so the bridge might interpret those notifications as
      being for local FDB entries when that was not intended.
      
      To avoid that now and in the future, zero-initialize all
      switchdev_notifier_fdb_info structures created by drivers, so that
      newly added fields do not require touching drivers again.
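
      As a rough illustration, assuming a hypothetical mirror structure
      (fdb_info_model and its field names are not the kernel's definition):
      zeroing the whole structure before filling it means a field added later,
      here is_local, defaults to false instead of holding stack junk.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical mirror of a notifier payload; field names are assumptions. */
struct fdb_info_model {
	const unsigned char *addr;
	unsigned short vid;
	bool offloaded;
	bool is_local;		/* stands in for a field added later */
};

/* Zero the whole structure first, then set only the fields the driver knows. */
void fill_fdb_info_model(struct fdb_info_model *info,
			 const unsigned char *addr, unsigned short vid)
{
	assert(info != NULL);
	memset(info, 0, sizeof(*info));
	info->addr = addr;
	info->vid = vid;
	info->offloaded = true;
}
```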
      
      Fixes: 2c4eca3e ("net: bridge: switchdev: include local flag in FDB notifications")
      Reported-by: Ido Schimmel <idosch@idosch.org>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Tested-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
      Link: https://lore.kernel.org/r/20210810115024.1629983-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: bridge: fix flags interpretation for extern learn fdb entries · 45a68787
      Nikolay Aleksandrov authored
      Ignore fdb flags when adding port extern learn entries and always set
      BR_FDB_LOCAL flag when adding bridge extern learn entries. This is
      closest to the behaviour we had before and avoids breaking any use cases
      which were allowed.
      
      This patch fixes iproute2 calls which assume NUD_PERMANENT and were
      allowed before, example:
      $ bridge fdb add 00:11:22:33:44:55 dev swp1 extern_learn
      
      Extern learn entries are allowed to roam, but do not expire, so static
      or dynamic flags make no sense for them.
      
      Also add a comment for future reference.
      
      Fixes: eb100e0e ("net: bridge: allow to add externally learned entries from user-space")
      Fixes: 0541a629 ("net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry")
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Tested-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20210810110010.43859-1-razor@blackwall.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 2e273b09
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      bpf 2021-08-10
      
      We've added 5 non-merge commits during the last 2 day(s) which contain
      a total of 7 files changed, 27 insertions(+), 15 deletions(-).
      
      1) Fix missing bpf_read_lock_trace() context for BPF loader progs, from Yonghong Song.
      
      2) Fix corner case where BPF prog retrieves wrong local storage, also from Yonghong Song.
      
      3) Restrict availability of BPF write_user helper behind lockdown, from Daniel Borkmann.
      
      4) Fix multiple kernel-doc warnings in BPF core, from Randy Dunlap.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf, core: Fix kernel-doc notation
        bpf: Fix potentially incorrect results with bpf_get_local_storage()
        bpf: Add missing bpf_read_[un]lock_trace() for syscall program
        bpf: Add lockdown check for probe_write_user helper
        bpf: Add _kernel suffix to internal lockdown_bpf_read
      ====================
      
      Link: https://lore.kernel.org/r/20210810144025.22814-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'fdb-backpressure-fixes' · 09c7fd52
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Fix broken backpressure during FDB dump in DSA drivers
      
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
      drivers, then drivers must check for the -EMSGSIZE error code returned
      by it. Otherwise, when a netlink skb becomes full, DSA will no longer
      save newly dumped FDB entries to it, but the driver will continue
      dumping. So FDB entries will be missing from the dump.
      
      DSA is one of the few switchdev drivers that have an .ndo_fdb_dump
      implementation, because of the assumption that the hardware and software
      FDBs cannot be efficiently kept in sync via SWITCHDEV_FDB_ADD_TO_BRIDGE.
      Other drivers with a home-cooked .ndo_fdb_dump implementation are
      ocelot and dpaa2-switch. These appear to do the correct thing, as do the
      other DSA drivers, so nothing else appears to need fixing.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: fix broken backpressure in .port_fdb_dump · 21b52fed
      Vladimir Oltean authored
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
      drivers, then drivers must check for the -EMSGSIZE error code returned
      by it. Otherwise, when a netlink skb becomes full, DSA will no longer
      save newly dumped FDB entries to it, but the driver will continue
      dumping. So FDB entries will be missing from the dump.
      
      Fix the broken backpressure by propagating the "cb" return code and
      allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
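
      A minimal sketch of what propagating the "cb" return code means, under
      assumed names: fdb_dump_cb_model, port_fdb_dump_model(), skb_model and
      store_entry_model() are hypothetical, and a two-slot array stands in for
      a netlink skb. The dump loop stops at the first callback error and
      returns it, rather than continuing to feed entries to a full buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type; returns 0 or a negative errno such as -EMSGSIZE. */
typedef int (*fdb_dump_cb_model)(int entry, void *data);

/* Stop at the first callback error and hand it back to the caller. */
int port_fdb_dump_model(const int *entries, size_t cnt,
			fdb_dump_cb_model cb, void *data)
{
	assert(cb != NULL);
	for (size_t i = 0; i < cnt; i++) {
		int rc = cb(entries[i], data);

		if (rc)
			return rc;
	}
	return 0;
}

/* Hypothetical fixed-size buffer that stands in for one netlink skb. */
struct skb_model {
	int slots[2];
	size_t used;
};

int store_entry_model(int entry, void *data)
{
	struct skb_model *skb = data;

	if (skb->used == sizeof(skb->slots) / sizeof(skb->slots[0]))
		return -EMSGSIZE;
	skb->slots[skb->used++] = entry;
	return 0;
}
```

      With three entries and a two-slot buffer, the model returns -EMSGSIZE
      after storing two entries, which is the signal the caller needs in order
      to resume with a fresh buffer.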
      
      Fixes: 291d1e72 ("net: dsa: sja1105: Add support for FDB and MDB management")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: lantiq: fix broken backpressure in .port_fdb_dump · 871a73a1
      Vladimir Oltean authored
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
      drivers, then drivers must check for the -EMSGSIZE error code returned
      by it. Otherwise, when a netlink skb becomes full, DSA will no longer
      save newly dumped FDB entries to it, but the driver will continue
      dumping. So FDB entries will be missing from the dump.
      
      Fix the broken backpressure by propagating the "cb" return code and
      allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
      
      Fixes: 58c59ef9 ("net: dsa: lantiq: Add Forwarding Database access")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: lan9303: fix broken backpressure in .port_fdb_dump · ada2fee1
      Vladimir Oltean authored
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
      drivers, then drivers must check for the -EMSGSIZE error code returned
      by it. Otherwise, when a netlink skb becomes full, DSA will no longer
      save newly dumped FDB entries to it, but the driver will continue
      dumping. So FDB entries will be missing from the dump.
      
      Fix the broken backpressure by propagating the "cb" return code and
      allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
      
      Fixes: ab335349 ("net: dsa: lan9303: Add port_fast_age and port_fdb_dump methods")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump · cd391280
      Vladimir Oltean authored
      rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
      multiple netlink skbs if the buffer provided by user space is too small
      (one buffer will typically handle a few hundred FDB entries).
      
      When the current buffer becomes full, nlmsg_put() in
      dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
      of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
      point, and then the dump resumes on the same port with a new skb, and
      FDB entries up to the saved index are simply skipped.
      
      Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
      drivers, then drivers must check for the -EMSGSIZE error code returned
      by it. Otherwise, when a netlink skb becomes full, DSA will no longer
      save newly dumped FDB entries to it, but the driver will continue
      dumping. So FDB entries will be missing from the dump.
      
      Fix the broken backpressure by propagating the "cb" return code and
      allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
      
      Fixes: e4b27ebc ("net: dsa: Add DSA driver for Hirschmann Hellcreek switches")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf, core: Fix kernel-doc notation · 019d0454
      Randy Dunlap authored
      Fix kernel-doc warnings in kernel/bpf/core.c (found by scripts/kernel-doc
      and W=1 builds). That is, correct a function name in a comment and add
      return descriptions for 2 functions.
      
      Fixes these kernel-doc warnings:
      
        kernel/bpf/core.c:1372: warning: expecting prototype for __bpf_prog_run(). Prototype was for ___bpf_prog_run() instead
        kernel/bpf/core.c:1372: warning: No description found for return value of '___bpf_prog_run'
        kernel/bpf/core.c:1883: warning: No description found for return value of 'bpf_prog_select_runtime'
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210809215229.7556-1-rdunlap@infradead.org
    • net: igmp: fix data-race in igmp_ifc_timer_expire() · 4a2b285e
      Eric Dumazet authored
      Fix the data-race reported by syzbot [1].
      The issue here is that igmp_ifc_timer_expire() can update in_dev->mr_ifc_count
      while another change has just occurred from another context.

      in_dev->mr_ifc_count is only 8 bits wide, so the race had few
      consequences.
      
      [1]
      BUG: KCSAN: data-race in igmp_ifc_event / igmp_ifc_timer_expire
      
      write to 0xffff8881051e3062 of 1 bytes by task 12547 on cpu 0:
       igmp_ifc_event+0x1d5/0x290 net/ipv4/igmp.c:821
       igmp_group_added+0x462/0x490 net/ipv4/igmp.c:1356
       ____ip_mc_inc_group+0x3ff/0x500 net/ipv4/igmp.c:1461
       __ip_mc_join_group+0x24d/0x2c0 net/ipv4/igmp.c:2199
       ip_mc_join_group_ssm+0x20/0x30 net/ipv4/igmp.c:2218
       do_ip_setsockopt net/ipv4/ip_sockglue.c:1285 [inline]
       ip_setsockopt+0x1827/0x2a80 net/ipv4/ip_sockglue.c:1423
       tcp_setsockopt+0x8c/0xa0 net/ipv4/tcp.c:3657
       sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3362
       __sys_setsockopt+0x18f/0x200 net/socket.c:2159
       __do_sys_setsockopt net/socket.c:2170 [inline]
       __se_sys_setsockopt net/socket.c:2167 [inline]
       __x64_sys_setsockopt+0x62/0x70 net/socket.c:2167
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff8881051e3062 of 1 bytes by interrupt on cpu 1:
       igmp_ifc_timer_expire+0x706/0xa30 net/ipv4/igmp.c:808
       call_timer_fn+0x2e/0x1d0 kernel/time/timer.c:1419
       expire_timers+0x135/0x250 kernel/time/timer.c:1464
       __run_timers+0x358/0x420 kernel/time/timer.c:1732
       run_timer_softirq+0x19/0x30 kernel/time/timer.c:1745
       __do_softirq+0x12c/0x26e kernel/softirq.c:558
       invoke_softirq kernel/softirq.c:432 [inline]
       __irq_exit_rcu+0x9a/0xb0 kernel/softirq.c:636
       sysvec_apic_timer_interrupt+0x69/0x80 arch/x86/kernel/apic/apic.c:1100
       asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
       console_unlock+0x8e8/0xb30 kernel/printk/printk.c:2646
       vprintk_emit+0x125/0x3d0 kernel/printk/printk.c:2174
       vprintk_default+0x22/0x30 kernel/printk/printk.c:2185
       vprintk+0x15a/0x170 kernel/printk/printk_safe.c:392
       printk+0x62/0x87 kernel/printk/printk.c:2216
       selinux_netlink_send+0x399/0x400 security/selinux/hooks.c:6041
       security_netlink_send+0x42/0x90 security/security.c:2070
       netlink_sendmsg+0x59e/0x7c0 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:703 [inline]
       sock_sendmsg net/socket.c:723 [inline]
       ____sys_sendmsg+0x360/0x4d0 net/socket.c:2392
       ___sys_sendmsg net/socket.c:2446 [inline]
       __sys_sendmsg+0x1ed/0x270 net/socket.c:2475
       __do_sys_sendmsg net/socket.c:2484 [inline]
       __se_sys_sendmsg net/socket.c:2482 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2482
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x01 -> 0x02
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 12539 Comm: syz-executor.1 Not tainted 5.14.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ks8795-vlan-fixes' · 37c86c4a
      David S. Miller authored
      Ben Hutchings says:
      
      ====================
      ksz8795 VLAN fixes
      
      This series fixes a number of bugs in the ksz8795 driver that affect
      VLAN filtering, tag insertion, and tag removal.
      
      I've tested these on the KSZ8795CLXD evaluation board, and checked the
      register usage against the datasheets for the other supported chips.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: ksz8795: Don't use phy_port_cnt in VLAN table lookup · 411d466d
      Ben Hutchings authored
      The magic number 4 in VLAN table lookup was the number of entries we
      can read and write at once.  Using phy_port_cnt here doesn't make
      sense and presumably broke VLAN filtering for 3-port switches.  Change
      it back to 4.
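
      To make the role of the constant concrete, here is a sketch that assumes
      the lookup splits a VID into a table address and a slot within one
      access. That mapping is an assumption for illustration; the message only
      states that 4 is the number of entries read or written at once. The
      names vlan_slot_model and vlan_slot_for_vid_model() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VLAN_ENTRIES_PER_ACCESS 4u	/* entries moved by one table read/write */

/* Which table access holds this VID, and which slot within that access. */
struct vlan_slot_model {
	uint16_t addr;
	uint8_t index;
};

struct vlan_slot_model vlan_slot_for_vid_model(uint16_t vid)
{
	struct vlan_slot_model slot;

	assert(vid < 4096);
	slot.addr = (uint16_t)(vid / VLAN_ENTRIES_PER_ACCESS);
	slot.index = (uint8_t)(vid % VLAN_ENTRIES_PER_ACCESS);
	return slot;
}
```

      Under this assumption the divisor is a property of the table layout, so
      tying it to the number of ports would compute the wrong address on a
      switch whose port count differs from 4.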
      
      Fixes: 4ce2a984 ("net: dsa: microchip: ksz8795: use phy_port_cnt ...")
      Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: ksz8795: Fix VLAN filtering · 16484413
      Ben Hutchings authored
      Currently ksz8_port_vlan_filtering() sets or clears the VLAN Enable
      hardware flag.  That controls discarding of packets with a VID that
      has not been enabled for any port on the switch.
      
      Since it is a global flag, set the dsa_switch::vlan_filtering_is_global
      flag so that the DSA core understands this can't be controlled per
      port.
      
      When VLAN filtering is enabled, the switch should also discard packets
      with a VID that's not enabled on the ingress port.  Set or clear each
      external port's VLAN Ingress Filter flag in ksz8_port_vlan_filtering()
      to make that happen.
      
      Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
      Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: ksz8795: Use software untagging on CPU port · 9130c2d3
      Ben Hutchings authored
      On the CPU port, we can support both tagged and untagged VLANs at the
      same time by doing any necessary untagging in software rather than
      hardware.  To enable that, keep the CPU port's Remove Tag flag cleared
      and set the dsa_switch::untag_bridge_pvid flag.
      
      Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
      Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: ksz8795: Fix VLAN untagged flag change on deletion · af01754f
      Ben Hutchings authored
      When a VLAN is deleted from a port, the flags in struct
      switchdev_obj_port_vlan are always 0.  ksz8_port_vlan_del() copies the
      BRIDGE_VLAN_INFO_UNTAGGED flag to the port's Tag Removal flag, and
      therefore always clears it.
      
      In case there are multiple VLANs configured as untagged on this port -
      which seems useless, but is allowed - deleting one of them changes the
      remaining VLANs to be tagged.
      
      It's only ever necessary to change this flag when a VLAN is added to
      the port, so leave it unchanged in ksz8_port_vlan_del().
      
      Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
      Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: ksz8795: Reject unsupported VLAN configuration · 8f4f58f8
      Ben Hutchings authored
      The switches supported by ksz8795 only have a per-port flag for Tag
      Removal.  This means it is not possible to support both tagged and
      untagged VLANs on the same port.  Reject attempts to add a VLAN that
      requires the flag to be changed, unless there are no VLANs currently
      configured.
      
      VID 0 is excluded from this check since it is untagged regardless of
      the state of the flag.
      
      On the CPU port we could support tagged and untagged VLANs at the same
      time.  This will be enabled by a later patch.
      
      Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
      Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: ksz8795: Fix PVID tag insertion · ef3b02a1
      Ben Hutchings authored
      ksz8795 has never actually enabled PVID tag insertion, and it also
      programmed the PVID incorrectly.  To fix this:
      
      * Allow tag insertion to be controlled per ingress port.  On most
        chips, set bit 2 in Global Control 19.  On KSZ88x3 this control
        flag doesn't exist.
      
      * When adding a PVID:
        - Set the appropriate register bits to enable tag insertion on
          egress at every other port if this was the packet's ingress port.
        - Mask *out* the VID from the default tag, before or-ing in the new
          PVID.
      
      * When removing a PVID:
        - Clear the same control bits to disable tag insertion.
        - Don't update the default tag.  This wasn't doing anything useful.
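
      The "mask out, then OR in" step for the PVID can be sketched as below.
      This assumes the default tag is laid out like an 802.1Q tag control
      field with the VID in the low 12 bits; VID_MASK_MODEL and
      set_pvid_model() are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stdint.h>

#define VID_MASK_MODEL 0x0fffu	/* low 12 bits of an 802.1Q tag hold the VID */

/* Replace the VID bits of a default tag while keeping the remaining bits. */
uint16_t set_pvid_model(uint16_t default_tag, uint16_t pvid)
{
	assert(pvid <= VID_MASK_MODEL);
	return (uint16_t)((default_tag & ~VID_MASK_MODEL) | pvid);
}
```

      Clearing the old VID bits first matters because OR-ing a new PVID into a
      tag that still carries the previous VID would merge the two values.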
      
      Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
      Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: Fix ksz_read64() · c34f674c
      Ben Hutchings authored
      ksz_read64() currently does some dubious byte-swapping on the two
      halves of a 64-bit register, and then only returns the high bits.
      Replace this with a straightforward expression.
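
      A straightforward expression of this kind might look like the sketch
      below, which assumes the two 32-bit halves are already in host byte
      order. combine_halves_model() is a hypothetical name, not the driver
      function.

```c
#include <stdint.h>

/* Join the two 32-bit halves of a register into one 64-bit value. */
uint64_t combine_halves_model(uint32_t high, uint32_t low)
{
	return ((uint64_t)high << 32) | low;
}
```

      The cast before the shift is the important detail: shifting a 32-bit
      value by 32 without widening it first is undefined in C.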
      
      Fixes: e66f840c ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver")
      Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'linux-can-fixes-for-5.14-20210810' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can · 31782a01
      David S. Miller authored
      
      linux-can-fixes-for-5.14-20210810
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2021-08-10
      
      this is a pull request of 2 patches for net/master.
      
      Baruch Siach's patch fixes a typo for the Microchip CAN BUS Analyzer
      Tool entry in the MAINTAINERS file.
      
      Hussein Alasadi fixes the setting of the M_CAN_DBTP register in the
      m_can driver. The regression hit mainline in v5.14-rc1, so no backport
      to stable is needed.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      31782a01
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2021-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 6a279f61
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2021-08-09
      
      This series introduces fixes to mlx5 driver.
      Please pull and let me know if there is any problem.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6a279f61
    • David S. Miller's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · ea377dca
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-08-09
      
      This series contains updates to ice and iavf drivers.
      
      Ani prevents the ice driver from accidentally being probed to a virtual
      function and stops processing of VF messages when VFs are being torn
      down.
      
      Brett prevents the ice driver from deleting its own MAC address.
      
      Fahad ensures the RSS LUT and key are always set following reset for
      iavf.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ea377dca
    • Yonghong Song's avatar
      bpf: Fix potentially incorrect results with bpf_get_local_storage() · a2baf4e8
      Yonghong Song authored
      Commit b910eaaa ("bpf: Fix NULL pointer dereference in bpf_get_local_storage()
      helper") fixed a bug in the bpf_get_local_storage() helper so that different
      tasks won't interfere with each other's percpu local storage.
      
      The percpu data contains 8 slots so it can hold up to 8 contexts (same or
      different tasks), for 8 different program runs, at the same time. This in
      general is sufficient. But our internal testing showed the following warning
      multiple times:
      
        [...]
        warning: WARNING: CPU: 13 PID: 41661 at include/linux/bpf-cgroup.h:193
           __cgroup_bpf_run_filter_sock_ops+0x13e/0x180
        RIP: 0010:__cgroup_bpf_run_filter_sock_ops+0x13e/0x180
        <IRQ>
         tcp_call_bpf.constprop.99+0x93/0xc0
         tcp_conn_request+0x41e/0xa50
         ? tcp_rcv_state_process+0x203/0xe00
         tcp_rcv_state_process+0x203/0xe00
         ? sk_filter_trim_cap+0xbc/0x210
         ? tcp_v6_inbound_md5_hash.constprop.41+0x44/0x160
         tcp_v6_do_rcv+0x181/0x3e0
         tcp_v6_rcv+0xc65/0xcb0
         ip6_protocol_deliver_rcu+0xbd/0x450
         ip6_input_finish+0x11/0x20
         ip6_input+0xb5/0xc0
         ip6_sublist_rcv_finish+0x37/0x50
         ip6_sublist_rcv+0x1dc/0x270
         ipv6_list_rcv+0x113/0x140
         __netif_receive_skb_list_core+0x1a0/0x210
         netif_receive_skb_list_internal+0x186/0x2a0
         gro_normal_list.part.170+0x19/0x40
         napi_complete_done+0x65/0x150
         mlx5e_napi_poll+0x1ae/0x680
         __napi_poll+0x25/0x120
         net_rx_action+0x11e/0x280
         __do_softirq+0xbb/0x271
         irq_exit_rcu+0x97/0xa0
         common_interrupt+0x7f/0xa0
         </IRQ>
         asm_common_interrupt+0x1e/0x40
        RIP: 0010:bpf_prog_1835a9241238291a_tw_egress+0x5/0xbac
         ? __cgroup_bpf_run_filter_skb+0x378/0x4e0
         ? do_softirq+0x34/0x70
         ? ip6_finish_output2+0x266/0x590
         ? ip6_finish_output+0x66/0xa0
         ? ip6_output+0x6c/0x130
         ? ip6_xmit+0x279/0x550
         ? ip6_dst_check+0x61/0xd0
        [...]
      
      Using drgn [0] to dump the percpu buffer contents showed that on this CPU
      slot 0 is still available, but slots 1-7 are occupied and those tasks in
      slots 1-7 mostly don't exist any more. So we might have issues in
      bpf_cgroup_storage_unset().
      
      Further debugging confirmed that there is a bug in bpf_cgroup_storage_unset().
      Currently, it looks for the slot holding "current" by searching from the
      start. So the following sequence is possible:
      
        1. A task is running and claims slot 0.
        2. The running BPF program is done; it has checked that slot 0 holds
           the "task" and is about to reset it to NULL (but has not yet).
        3. An interrupt happens, another BPF program runs and claims slot 1
           with the *same* task.
        4. The unset() in interrupt context releases slot 0 since it matches "task".
        5. The interrupt is done, and the task in process context resets slot 0.
      
      At the end, slot 1 is never reset, and the same process can go on to occupy
      slots 2-7. Finally, when steps 1-5 are repeated once more, the BPF program
      in step 3 cannot claim an empty slot and a warning is issued.
      
      To fix the issue, for unset() function, we should traverse from the last slot
      to the first. This way, the above issue can be avoided.
      
      The same reverse traversal should also be done in bpf_get_local_storage() helper
      itself. Otherwise, incorrect local storage may be returned to BPF program.
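
      The reverse traversal can be illustrated with a simplified userspace
      model. This is only a sketch: struct slot and the slot_*() functions are
      hypothetical names, the array stands in for the percpu data, and there
      is no real interrupt or concurrency handling here.

```c
#include <stddef.h>

#define NUM_SLOTS 8 /* the percpu data described above has 8 slots */

struct slot {
    const void *task; /* NULL means the slot is free */
    void *storage;
};

static struct slot slots[NUM_SLOTS];

/* Claim the first free slot; return its index, or -1 if all are in use. */
static int slot_claim(const void *task, void *storage)
{
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (slots[i].task == NULL) {
            slots[i].task = task;
            slots[i].storage = storage;
            return i;
        }
    }
    return -1;
}

/* Scan from the last slot to the first and return the storage of the
 * highest-indexed slot owned by this task, or NULL if there is none. */
static void *slot_lookup(const void *task)
{
    for (int i = NUM_SLOTS - 1; i >= 0; i--) {
        if (slots[i].task == task)
            return slots[i].storage;
    }
    return NULL;
}

/* Scan from the last slot to the first and release the highest-indexed
 * slot owned by this task; return its index, or -1 if there is none. */
static int slot_unset(const void *task)
{
    for (int i = NUM_SLOTS - 1; i >= 0; i--) {
        if (slots[i].task == task) {
            slots[i].task = NULL;
            slots[i].storage = NULL;
            return i;
        }
    }
    return -1;
}
```

      In this model, a nested run with the same task claims a higher slot than
      the run it interrupted, so scanning from the end makes the nested
      lookup and unset operate on the nested run's own slot rather than on
      slot 0.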
      
        [0] https://github.com/osandov/drgn
      
      Fixes: b910eaaa ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210810010413.1976277-1-yhs@fb.com
      a2baf4e8
    • Yonghong Song's avatar
      bpf: Add missing bpf_read_[un]lock_trace() for syscall program · 87b7b533
      Yonghong Song authored
      Commit 79a7f8bd ("bpf: Introduce bpf_sys_bpf() helper and program type.")
      added support for syscall program, which is a sleepable program.
      
      But the program run missed bpf_read_lock_trace()/bpf_read_unlock_trace(),
      which is needed to ensure proper rcu callback invocations. This patch adds
      bpf_read_[un]lock_trace() properly.
      
      Fixes: 79a7f8bd ("bpf: Introduce bpf_sys_bpf() helper and program type.")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210809235151.1663680-1-yhs@fb.com
      87b7b533
    • Daniel Borkmann's avatar
      bpf: Add lockdown check for probe_write_user helper · 51e1bb9e
      Daniel Borkmann authored
      Back then, commit 96ae5227 ("bpf: Add bpf_probe_write_user BPF helper
      to be called in tracers") added the bpf_probe_write_user() helper in order
      to allow overriding user space memory. Its original goal was to have a
      facility to "debug, divert, and manipulate execution of semi-cooperative
      processes" under CAP_SYS_ADMIN. Write to kernel was explicitly disallowed
      since it would otherwise tamper with its integrity.
      
      One use case was shown in cf9b1199 ("samples/bpf: Add test/example of
      using bpf_probe_write_user bpf helper") where the program DNATs traffic
      at the time of connect(2) syscall, meaning, it rewrites the arguments to
      a syscall while they're still in userspace, and before the syscall has a
      chance to copy the argument into kernel space. These days we have better
      mechanisms in BPF for achieving the same (e.g. for load-balancers), but
      without having to write to userspace memory.
      
      Of course the bpf_probe_write_user() helper can also be used to abuse
      many other things, for good or bad purposes. Outside of BPF, there is
      a similar mechanism in ptrace(2), such as PTRACE_PEEK{TEXT,DATA} and
      PTRACE_POKE{TEXT,DATA}, but it would likely require some more effort.
      Commit 96ae5227 explicitly dedicated the helper to experimentation
      purposes only. Thus, move the helper's availability behind a newly added
      LOCKDOWN_BPF_WRITE_USER lockdown knob so that the helper is disabled under
      the "integrity" mode. More fine-grained control can be implemented also
      from LSM side with this change.
      
      Fixes: 96ae5227 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      51e1bb9e
    • Hussein Alasadi's avatar
      can: m_can: m_can_set_bittiming(): fix setting M_CAN_DBTP register · aae32b78
      Hussein Alasadi authored
      This patch fixes the setting of the M_CAN_DBTP register contents:
      - use DBTP_ (the data bitrate macros) instead of NBTP_, which are used
        for the nominal bitrate
      - do not overwrite a possibly-existing DBTP_TDC flag: OR the new bits
        into reg_btp instead of overwriting it
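
      The difference between or-ing and overwriting can be sketched as below.
      The macro names follow the DBTP_ prefix mentioned above, but the bit
      positions and the build_dbtp() helper are assumptions for illustration;
      check them against the driver and the M_CAN documentation.

```c
#include <stdint.h>

/* Assumed field layout, for illustration only. */
#define DBTP_TDC       (1u << 23)
#define DBTP_DBRP(x)   (((uint32_t)(x) & 0x1fu) << 16)
#define DBTP_DTSEG1(x) (((uint32_t)(x) & 0x1fu) << 8)
#define DBTP_DTSEG2(x) (((uint32_t)(x) & 0x0fu) << 4)
#define DBTP_DSJW(x)   ((uint32_t)(x) & 0x0fu)

/* Hypothetical helper: add the data bit timing fields to reg_btp while
 * keeping any flag (such as DBTP_TDC) that is already set in it. */
static uint32_t build_dbtp(uint32_t reg_btp, unsigned int brp,
                           unsigned int tseg1, unsigned int tseg2,
                           unsigned int sjw)
{
    /* "|=" keeps existing bits; a plain "=" here would drop DBTP_TDC. */
    reg_btp |= DBTP_DBRP(brp) | DBTP_DTSEG1(tseg1) |
               DBTP_DTSEG2(tseg2) | DBTP_DSJW(sjw);
    return reg_btp;
}
```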
      
      Link: https://lore.kernel.org/r/FRYP281MB06140984ABD9994C0AAF7433D1F69@FRYP281MB0614.DEUP281.PROD.OUTLOOK.COM
      Fixes: 20779943 ("can: m_can: use bits.h macros for all regmasks")
      Cc: Torin Cooper-Bennun <torin@maxiluxsystems.com>
      Cc: Chandrasekar Ramakrishnan <rcsekar@samsung.com>
      Signed-off-by: Hussein Alasadi <alasadi@arecs.eu>
      [mkl: update patch description, update indention]
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      aae32b78
    • Baruch Siach's avatar
      MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo · 7b637cd5
      Baruch Siach authored
      This patch fixes the abbreviated name of the Microchip CAN BUS
      Analyzer Tool.
      
      Fixes: 8a7b46fa ("MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver")
      Link: https://lore.kernel.org/r/cc4831cb1c8759c15fb32c21fd326e831183733d.1627876781.git.baruch@tkos.co.il
      Signed-off-by: Baruch Siach <baruch@tkos.co.il>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      7b637cd5
    • Aya Levin's avatar
      net/mlx5: Fix return value from tracer initialization · bd37c288
      Aya Levin authored
      Check return value of mlx5_fw_tracer_start(), set error path and fix
      return value of mlx5_fw_tracer_init() accordingly.
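
      The general shape of such a fix, checking a start step and unwinding on
      failure, can be sketched as below. All names here are hypothetical
      stand-ins and none of this is the mlx5 code; the fail_start parameter
      exists only to make the failure path reachable in the sketch.

```c
#include <errno.h>

static int cleanup_calls; /* counts how often the unwind path ran */

/* Hypothetical stand-ins for resource setup, teardown and start. */
static int step_create(void) { return 0; }
static void step_destroy(void) { cleanup_calls++; }
static int step_start(int fail) { return fail ? -EIO : 0; }

/* Hypothetical init: propagate the start error instead of ignoring it. */
static int tracer_init_sketch(int fail_start)
{
    int err;

    err = step_create();
    if (err)
        return err;

    err = step_start(fail_start);
    if (err)
        goto err_destroy; /* undo what was set up before failing */

    return 0;

err_destroy:
    step_destroy();
    return err;
}
```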
      
      Fixes: c71ad41c ("net/mlx5: FW tracer, events handling")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      bd37c288
    • Shay Drory's avatar
      net/mlx5: Synchronize correct IRQ when destroying CQ · 563476ae
      Shay Drory authored
      The CQ destroy is performed based on the IRQ number that is stored in
      cq->irqn. That number wasn't set explicitly during CQ creation and as
      expected some of the API users of mlx5_core_create_cq() forgot to update
      it.
      
      This caused the synchronization call to be made on the wrong IRQ, number
      0, instead of the real one.
      
      As a fix, set the IRQ number directly in the mlx5_core_create_cq() and
      update all users accordingly.
      
      Fixes: 1a86b377 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
      Fixes: ef1659ad ("IB/mlx5: Add DEVX support for CQ events")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      563476ae
    • Chris Mi's avatar
      net/mlx5e: TC, Fix error handling memory leak · 88bbd7b2
      Chris Mi authored
      Free the offload sample action on error.
      
      Fixes: f94d6389 ("net/mlx5e: TC, Add support to offload sample action")
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      88bbd7b2
    • Shay Drory's avatar
      net/mlx5: Destroy pool->mutex · ba317e83
      Shay Drory authored
      Destroy pool->mutex when we destroy the pool.
      
      Fixes: c36326d3 ("net/mlx5: Round-Robin EQs over IRQs")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      ba317e83
    • Shay Drory's avatar
      net/mlx5: Set all field of mlx5_irq before inserting it to the xarray · 5957cc55
      Shay Drory authored
      Currently irq->pool is set after the irq is inserted into the xarray.
      Set irq->pool before the irq is inserted into the xarray.
      
      Fixes: 71e084e2 ("net/mlx5: Allocating a pool of MSI-X vectors for SFs")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      5957cc55
    • Shay Drory's avatar
      net/mlx5: Fix order of functions in mlx5_irq_detach_nb() · 3c8946e0
      Shay Drory authored
      Change the order of functions in mlx5_irq_detach_nb() so that it
      mirrors mlx5_irq_attach_nb().
      
      Fixes: 71e084e2 ("net/mlx5: Allocating a pool of MSI-X vectors for SFs")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      3c8946e0
    • Aya Levin's avatar
      net/mlx5: Block switchdev mode while devlink traps are active · c85a6b8f
      Aya Levin authored
      Since switchdev mode can't support devlink traps, verify there are
      no active devlink traps before moving eswitch to switchdev mode. If
      there are active traps, prevent the switchdev mode configuration.
      
      Fixes: eb3862a0 ("net/mlx5e: Enable traps according to link state")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      c85a6b8f
    • Maxim Mikityanskiy's avatar
      net/mlx5e: Destroy page pool after XDP SQ to fix use-after-free · 8ba3e4c8
      Maxim Mikityanskiy authored
      mlx5e_close_xdpsq does the cleanup: it calls mlx5e_free_xdpsq_descs to
      free the outstanding descriptors, which relies on
      mlx5e_page_release_dynamic and page_pool_release_page. However,
      page_pool_destroy is already called by this point, because
      mlx5e_close_rq runs before mlx5e_close_xdpsq.
      
      This commit fixes the use-after-free by swapping mlx5e_close_xdpsq and
      mlx5e_close_rq.
      
      The commit cited below started calling page_pool_destroy directly from
      the driver. Previously, the page pool was destroyed under a call_rcu
      from xdp_rxq_info_unreg_mem_model, which would defer the deallocation
      until after the XDPSQ is cleaned up.
      
      Fixes: 1da4bbef ("net: core: page_pool: add user refcnt and reintroduce page_pool_destroy")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      8ba3e4c8
    • Vlad Buslov's avatar
      net/mlx5: Bridge, fix ageing time · 6d8680da
      Vlad Buslov authored
      Ageing time is not converted from clock_t to jiffies, which results in
      an incorrect ageing timeout calculation in the workqueue update task.
      Fix it by applying clock_t_to_jiffies() to the provided value.
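
      The effect of the missing conversion can be seen in a simplified
      userspace sketch. The HZ and USER_HZ values below are assumptions
      (both depend on the kernel configuration and architecture), and
      clock_t_to_jiffies_sketch() is a hypothetical stand-in that only covers
      the case where HZ is a multiple of USER_HZ.

```c
#define USER_HZ 100  /* assumed clock_t ticks per second */
#define HZ      1000 /* assumed kernel ticks (jiffies) per second */

_Static_assert(HZ % USER_HZ == 0, "sketch only handles HZ multiple of USER_HZ");

/* Hypothetical stand-in: scale a clock_t value to jiffies. */
static unsigned long clock_t_to_jiffies_sketch(unsigned long x)
{
    return x * (HZ / USER_HZ);
}
```

      Under these assumptions an ageing time of 300 seconds arrives as 30000
      clock_t ticks and corresponds to 300000 jiffies, so using the raw value
      as jiffies would make the timeout ten times too short.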
      
      Fixes: c636a0f0 ("net/mlx5: Bridge, dynamic entry ageing")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      6d8680da
    • Roi Dayan's avatar
      net/mlx5e: Avoid creating tunnel headers for local route · c623c95a
      Roi Dayan authored
      It could be that local and remote are on the same machine, in which
      case the route lookup returns a local route, which results in creating
      an encap id with a src/dst MAC address of 0.
      
      Fixes: a54e20b4 ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      c623c95a
    • Alex Vesker's avatar
      net/mlx5: DR, Add fail on error check on decap · d3875924
      Alex Vesker authored
      While processing an encapsulated packet on RX, one of the fields that is
      checked is the inner packet length. If the length as specified in the header
      doesn't match the actual inner packet length, the packet is invalid
      and should be dropped. However, such a packet caused the NIC to hang.
      
      This patch turns on a 'fail_on_error' HW bit which allows HW to drop
      such an invalid packet while processing an RX packet and trying to decap it.
      
      Fixes: ad17dc8c ("net/mlx5: DR, Move STEv0 action apply logic")
      Signed-off-by: Alex Vesker <valex@nvidia.com>
      Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      d3875924