1. 27 Sep, 2024 2 commits
    • netfilter: nf_tables: prevent nf_skb_duplicated corruption · 92ceba94
      Eric Dumazet authored
      syzbot found that nf_dup_ipv4() or nf_dup_ipv6() could write
      per-cpu variable nf_skb_duplicated in an unsafe way [1].
      
      Disabling preemption as hinted by the splat is not enough,
      we have to disable soft interrupts as well.
      
      [1]
      BUG: using __this_cpu_write() in preemptible [00000000] code: syz.4.282/6316
       caller is nf_dup_ipv4+0x651/0x8f0 net/ipv4/netfilter/nf_dup_ipv4.c:87
      CPU: 0 UID: 0 PID: 6316 Comm: syz.4.282 Not tainted 6.11.0-rc7-syzkaller-00104-g7052622f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:93 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
        check_preemption_disabled+0x10e/0x120 lib/smp_processor_id.c:49
        nf_dup_ipv4+0x651/0x8f0 net/ipv4/netfilter/nf_dup_ipv4.c:87
        nft_dup_ipv4_eval+0x1db/0x300 net/ipv4/netfilter/nft_dup_ipv4.c:30
        expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
        nft_do_chain+0x4ad/0x1da0 net/netfilter/nf_tables_core.c:288
        nft_do_chain_ipv4+0x202/0x320 net/netfilter/nft_chain_filter.c:23
        nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
        nf_hook_slow+0xc3/0x220 net/netfilter/core.c:626
        nf_hook+0x2c4/0x450 include/linux/netfilter.h:269
        NF_HOOK_COND include/linux/netfilter.h:302 [inline]
        ip_output+0x185/0x230 net/ipv4/ip_output.c:433
        ip_local_out net/ipv4/ip_output.c:129 [inline]
        ip_send_skb+0x74/0x100 net/ipv4/ip_output.c:1495
        udp_send_skb+0xacf/0x1650 net/ipv4/udp.c:981
        udp_sendmsg+0x1c21/0x2a60 net/ipv4/udp.c:1269
        sock_sendmsg_nosec net/socket.c:730 [inline]
        __sock_sendmsg+0x1a6/0x270 net/socket.c:745
        ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
        ___sys_sendmsg net/socket.c:2651 [inline]
        __sys_sendmmsg+0x3b2/0x740 net/socket.c:2737
        __do_sys_sendmmsg net/socket.c:2766 [inline]
        __se_sys_sendmmsg net/socket.c:2763 [inline]
        __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7f4ce4f7def9
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f4ce5d4a038 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00007f4ce5135f80 RCX: 00007f4ce4f7def9
      RDX: 0000000000000001 RSI: 0000000020005d40 RDI: 0000000000000006
      RBP: 00007f4ce4ff0b76 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000000 R14: 00007f4ce5135f80 R15: 00007ffd4cbc6d68
       </TASK>
      
      Fixes: d877f071 ("netfilter: nf_tables: add nft_dup expression")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • selftests: netfilter: Fix nft_audit.sh for newer nft binaries · 8a890156
      Phil Sutter authored
      As a side-effect of nftables' commit dbff26bfba833 ("cache: consolidate
      reset command"), audit logs changed when more objects were reset than
      fit into a single netlink message.
      
      Since the objects' distribution in netlink messages is not relevant,
      implement a summarizing function which combines repeated audit logs into
      a single one with summed up 'entries=' value.
      
      Fixes: 203bb9d3 ("selftests: netfilter: Extend nft_audit.sh")
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
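The summarizing step described in this commit message can be sketched in Python. This is a minimal sketch, not the code in nft_audit.sh (which is a shell script): the log line layout and the names `summarize` and `ENTRIES_RE` are assumptions made for illustration, and the sketch only merges consecutive lines that are identical apart from their 'entries=' value.

```python
import re

# Hypothetical helper: matches the numeric value of an 'entries=' field.
ENTRIES_RE = re.compile(r"\bentries=(\d+)")


def summarize(lines):
    """Merge consecutive log lines that differ only in their 'entries=' value."""
    merged = []  # each item: [prefix, total_or_None, suffix]
    for line in lines:
        m = ENTRIES_RE.search(line)
        if m is None:
            # No 'entries=' field: keep the line untouched.
            merged.append([line, None, ""])
            continue
        prefix, suffix = line[:m.start(1)], line[m.end(1):]
        count = int(m.group(1))
        last = merged[-1] if merged else None
        if last and last[1] is not None and last[0] == prefix and last[2] == suffix:
            last[1] += count  # same record repeated: sum the counts
        else:
            merged.append([prefix, count, suffix])
    return [p if t is None else f"{p}{t}{s}" for p, t, s in merged]
```

With this, two consecutive records carrying entries=3 and entries=5 for the same table collapse into one record with entries=8, so a comparison no longer depends on how objects were split across netlink messages.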
  2. 26 Sep, 2024 20 commits
    • netfilter: uapi: NFTA_FLOWTABLE_HOOK is NLA_NESTED · 76f1ed08
      Phil Sutter authored
      Fix the comment which incorrectly defines it as NLA_U32.
      
      Fixes: 3b49e2e9 ("netfilter: nf_tables: add flow table netlink frontend")
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Merge tag 'nf-24-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · aef3a58b
      Paolo Abeni authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      v2: with kdoc fixes per Paolo Abeni.
      
      The following patchset contains Netfilter fixes for net:
      
      Patch #1 and #2 handle an esoteric scenario: Given two tasks sending UDP
      packets to one another, two packets of the same flow in each direction
      handled by different CPUs that result in two conntrack objects in NEW
      state, where reply packet loses race. Then, patch #3 adds a testcase for
      this scenario. Series from Florian Westphal.
      
      1) NAT engine can falsely detect a port collision if it happens to pick
         up a reply packet as NEW rather than ESTABLISHED. Add extra code to
         detect this and suppress port reallocation in this case.
      
      2) To complete the clash resolution in the reply direction, extend conntrack
         logic to detect clashing conntrack in the reply direction to existing entry.
      
      3) Adds a test case.
      
      Then, an assorted list of fixes follow:
      
      4) Add a selftest for tproxy, from Antonio Ojea.
      
      5) Guard ctnetlink_*_size() functions under
         #if defined(CONFIG_NETFILTER_NETLINK_GLUE_CT) || defined(CONFIG_NF_CONNTRACK_EVENTS)
         From Andy Shevchenko.
      
      6) Use -m socket --transparent in iptables tproxy documentation.
         From XIE Zhibang.
      
      7) Call kfree_rcu() when releasing flowtable hooks to address race with
         netlink dump path, from Phil Sutter.
      
      8) Fix compilation warning in nf_reject with CONFIG_BRIDGE_NETFILTER=n.
         From Simon Horman.
      
      9) Guard ctnetlink_label_size() under CONFIG_NF_CONNTRACK_EVENTS which
         is its only user, to address a compilation warning. From Simon Horman.
      
      10) Use rcu-protected list iteration over basechain hooks from netlink
          dump path.
      
      11) Fix memcg accounting for nf_tables; the use of GFP_KERNEL_ACCOUNT
          is not complete.
      
      12) Remove old nfqueue conntrack clash resolution. Instead of trying to
          use the same destination address consistently, which requires double
          DNAT, use the existing clash resolution, which allows clashing packets
          to go through with a different destination. Antonio Ojea originally
          reported an issue from the postrouting chain; I proposed a fix:
          https://lore.kernel.org/netfilter-devel/ZuwSwAqKgCB2a51-@calendula/T/
          which he reported did not work for him.
      
      13) Adds a selftest for patch 12.
      
      14) Fixes ipvs.sh selftest.
      
      netfilter pull request 24-09-26
      
      * tag 'nf-24-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        selftests: netfilter: Avoid hanging ipvs.sh
        kselftest: add test for nfqueue induced conntrack race
        netfilter: nfnetlink_queue: remove old clash resolution logic
        netfilter: nf_tables: missing objects with no memcg accounting
        netfilter: nf_tables: use rcu chain hook list iterator from netlink dump path
        netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS
        netfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n
        netfilter: nf_tables: Keep deleted flowtable hooks until after RCU
        docs: tproxy: ignore non-transparent sockets in iptables
        netfilter: ctnetlink: Guard possible unused functions
        selftests: netfilter: nft_tproxy.sh: add tcp tests
        selftests: netfilter: add reverse-clash resolution test case
        netfilter: conntrack: add clash resolution for reverse collisions
        netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash
      ====================
      
      Link: https://patch.msgid.link/20240926110717.102194-1-pablo@netfilter.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftests: netfilter: Avoid hanging ipvs.sh · fc786304
      Phil Sutter authored
      If the client can't reach the server, the latter remains listening
      forever. Kill it after 5s of waiting.
      
      Fixes: 867d2190 ("selftests: netfilter: add ipvs test script")
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • kselftest: add test for nfqueue induced conntrack race · e306e373
      Florian Westphal authored
      The netfilter race happens when two packets with the same tuple are DNATed
      and enqueued with nfqueue in the postrouting hook.
      
      Once one of the packets is reinjected it may be DNATed again to a different
      destination, but the conntrack entry remains the same and the return packet
      is dropped.
      
      Based on earlier patch from Antonio Ojea.
      
      Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1766
      Co-developed-by: Antonio Ojea <aojea@google.com>
      Signed-off-by: Antonio Ojea <aojea@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nfnetlink_queue: remove old clash resolution logic · 8af79d3e
      Florian Westphal authored
      For historical reasons there are two clash resolution spots in
      netfilter, one in nfnetlink_queue and one in conntrack core.
      
      nfnetlink_queue one was added first: If a colliding entry is found,
      NAT transformation is reversed by calling nat engine again with altered
      tuple.
      
      See commit 368982cd ("netfilter: nfnetlink_queue: resolve clash for
      unconfirmed conntracks") for details.
      
      One problem is that nf_reroute() won't take an action if the queueing
      doesn't occur in the OUTPUT hook, i.e. when queueing in forward or
      postrouting, packet will be sent via the wrong path.
      
      Another problem is that the scenario addressed (2nd UDP packet sent with
      identical addresses while first packet is still being processed) can also
      occur without any nfqueue involvement due to threaded resolvers doing
      A and AAAA requests back-to-back.
      
      This led us to add clash resolution logic to the conntrack core, see
      commit 6a757c07 ("netfilter: conntrack: allow insertion of clashing
      entries").  Instead of fixing the nfqueue based logic, let's remove it
      and let conntrack core handle this instead.
      
      Retain the ->update hook for sake of nfqueue based conntrack helpers.
      We could axe this hook completely but we'd have to split confirm and
      helper logic again, see commit ee04805f ("netfilter: conntrack: make
      conntrack userspace helpers work again").
      
      This SHOULD NOT be backported to kernels earlier than v5.6; they lack
      adequate clash resolution handling.
      
      Patch was originally written by Pablo Neira Ayuso.
      Reported-by: Antonio Ojea <aojea@google.com>
      Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1766
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Tested-by: Antonio Ojea <aojea@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: missing objects with no memcg accounting · 69e687ce
      Pablo Neira Ayuso authored
      Several ruleset objects are still not using GFP_KERNEL_ACCOUNT for
      memory accounting, update them. This includes:
      
      - catchall elements
      - compat match large info area
      - log prefix
      - meta secctx
      - numgen counters
      - pipapo set backend datastructure
      - tunnel private objects
      
      Fixes: 33758c89 ("memcg: enable accounting for nft objects")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: use rcu chain hook list iterator from netlink dump path · 4ffcf5ca
      Pablo Neira Ayuso authored
      Lockless iteration over hook list is possible from netlink dump path,
      use rcu variant to iterate over the hook list as is done with flowtable
      hooks.
      
      Fixes: b9703ed4 ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
      Reported-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS · e1f1ee0e
      Simon Horman authored
      Only provide ctnetlink_label_size when it is used,
      which is when CONFIG_NF_CONNTRACK_EVENTS is configured.
      
      Flagged by clang-18 W=1 builds as:
      
      .../nf_conntrack_netlink.c:385:19: warning: unused function 'ctnetlink_label_size' [-Wunused-function]
        385 | static inline int ctnetlink_label_size(const struct nf_conn *ct)
            |                   ^~~~~~~~~~~~~~~~~~~~
      
      The condition on CONFIG_NF_CONNTRACK_LABELS being removed by
      this patch guards compilation of non-trivial implementations
      of ctnetlink_dump_labels() and ctnetlink_label_size().
      
      However, this is not necessary as each of these functions
      will always return 0 if CONFIG_NF_CONNTRACK_LABELS is not defined
      as each function starts with the equivalent of:
      
      	struct nf_conn_labels *labels = nf_ct_labels_find(ct);
      
      	if (!labels)
      		return 0;
      
      And nf_ct_labels_find always returns NULL if CONFIG_NF_CONNTRACK_LABELS
      is not enabled.  So I believe that the compiler optimises the code away
      in such cases anyway.
      
      Found by inspection.
      Compile tested only.
      
      Originally split into two patches, Pablo Neira Ayuso collapsed them and
      added Fixes: tag.
      
      Fixes: 0ceabd83 ("netfilter: ctnetlink: deliver labels to userspace")
      Link: https://lore.kernel.org/netfilter-devel/20240909151712.GZ2097826@kernel.org/
      Signed-off-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n · fc56878c
      Simon Horman authored
      If CONFIG_BRIDGE_NETFILTER is not enabled, which is the case for x86_64
      defconfig, then building nf_reject_ipv4.c and nf_reject_ipv6.c with W=1
      using gcc-14 results in the following warnings, which are treated as
      errors:
      
      net/ipv4/netfilter/nf_reject_ipv4.c: In function 'nf_send_reset':
      net/ipv4/netfilter/nf_reject_ipv4.c:243:23: error: variable 'niph' set but not used [-Werror=unused-but-set-variable]
        243 |         struct iphdr *niph;
            |                       ^~~~
      cc1: all warnings being treated as errors
      net/ipv6/netfilter/nf_reject_ipv6.c: In function 'nf_send_reset6':
      net/ipv6/netfilter/nf_reject_ipv6.c:286:25: error: variable 'ip6h' set but not used [-Werror=unused-but-set-variable]
        286 |         struct ipv6hdr *ip6h;
            |                         ^~~~
      cc1: all warnings being treated as errors
      
      Address this by reducing the scope of these local variables to where
      they are used, which is code only compiled when CONFIG_BRIDGE_NETFILTER
      is enabled.
      
      Compile tested and run through netfilter selftests.
      Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Closes: https://lore.kernel.org/netfilter-devel/20240906145513.567781-1-andriy.shevchenko@linux.intel.com/
      Signed-off-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: Keep deleted flowtable hooks until after RCU · 642c89c4
      Phil Sutter authored
      Documentation of list_del_rcu() warns callers to not immediately free
      the deleted list item. While it seems not necessary to use the
      RCU-variant of list_del() here in the first place, doing so seems to
      require calling kfree_rcu() on the deleted item as well.
      
      Fixes: 3f0465a9 ("netfilter: nf_tables: dynamically allocate hooks per net_device in flowtables")
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • docs: tproxy: ignore non-transparent sockets in iptables · aa758763
      谢致邦 (XIE Zhibang) authored
      The iptables example was added in commit d2f26037 (netfilter: Add
      documentation for tproxy, 2008-10-08), but xt_socket 'transparent'
      option was added in commit a31e1ffd (netfilter: xt_socket: added new
      revision of the 'socket' match supporting flags, 2009-06-09).
      
      Now add the 'transparent' option to the iptables example to ignore
      non-transparent sockets, which is also consistent with the nft example.
      Signed-off-by: 谢致邦 (XIE Zhibang) <Yeking@Red54.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ctnetlink: Guard possible unused functions · 2cadd3b1
      Andy Shevchenko authored
      Some of the functions may be unused (CONFIG_NETFILTER_NETLINK_GLUE_CT=n
      and CONFIG_NF_CONNTRACK_EVENTS=n), it prevents kernel builds with clang,
      `make W=1` and CONFIG_WERROR=y:
      
      net/netfilter/nf_conntrack_netlink.c:657:22: error: unused function 'ctnetlink_acct_size' [-Werror,-Wunused-function]
        657 | static inline size_t ctnetlink_acct_size(const struct nf_conn *ct)
            |                      ^~~~~~~~~~~~~~~~~~~
      net/netfilter/nf_conntrack_netlink.c:667:19: error: unused function 'ctnetlink_secctx_size' [-Werror,-Wunused-function]
        667 | static inline int ctnetlink_secctx_size(const struct nf_conn *ct)
            |                   ^~~~~~~~~~~~~~~~~~~~~
      net/netfilter/nf_conntrack_netlink.c:683:22: error: unused function 'ctnetlink_timestamp_size' [-Werror,-Wunused-function]
        683 | static inline size_t ctnetlink_timestamp_size(const struct nf_conn *ct)
            |                      ^~~~~~~~~~~~~~~~~~~~~~~~
      
      Fix this by guarding possible unused functions with ifdeffery.
      
      See also commit 6863f564 ("kbuild: allow Clang to find unused static
      inline functions for W=1 build").
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • selftests: netfilter: nft_tproxy.sh: add tcp tests · 7e37e0ea
      Antonio Ojea authored
      The TPROXY functionality is widely used, however, there are only mptcp
      selftests covering this feature.
      
      The selftests represent the most common scenarios and can also be used
      as self-documentation of the feature.
      
      UDP and TCP testcases are split into different files because of the
      different nature of the protocols, especially because the connectionless
      nature of UDP makes it challenging to test reliably. UDP only covers
      the scenarios involving the prerouting hook.
      
      The UDP tests are significantly slower than the TCP ones, hence they
      use a larger timeout; it takes 20 seconds to run the full UDP suite
      on a 48 vCPU Intel(R) Xeon(R) CPU @2.60GHz.
      Signed-off-by: Antonio Ojea <aojea@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • selftests: netfilter: add reverse-clash resolution test case · a57856c0
      Florian Westphal authored
      Add a test program that sends UDP packets in both directions
      and checks that packets arrive without source port modification.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: add clash resolution for reverse collisions · a4e6a103
      Florian Westphal authored
      Given existing entry:
      ORIGIN: a:b -> c:d
      REPLY:  c:d -> a:b
      
      And colliding entry:
      ORIGIN: c:d -> a:b
      REPLY:  a:b -> c:d
      
      The colliding ct (and the associated skb) get dropped on insert.
      Permit this by checking if the colliding entry matches the reply
      direction.
      
      Happens when both ends send packets at same time, both requests are picked
      up as NEW, rather than NEW for the 'first' and 'ESTABLISHED' for the
      second packet.
      
      This is an esoteric condition, as ruleset must permit NEW connections
      in either direction and both peers must already have a bidirectional
      traffic flow at the time conntrack gets enabled.
      
      Allow the 'reverse' skb to pass and assign the existing (clashing)
      entry.
      
      While at it, also drop the extra 'dying' check, this is already
      tested earlier by the calling function.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
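The reverse-collision check described in this commit message can be modelled in a few lines of Python. This is a toy model under stated assumptions, not the kernel implementation: `Endpoint`, `FlowTuple`, `Conn` and `Table` are hypothetical names, NAT is ignored (so the reply tuple is always the inverted origin tuple), and same-direction clashes are simply dropped here because their resolution is outside the scope of this commit.

```python
from typing import Dict, NamedTuple, Optional


class Endpoint(NamedTuple):
    addr: str
    port: int


class FlowTuple(NamedTuple):
    src: Endpoint
    dst: Endpoint

    def inverted(self) -> "FlowTuple":
        return FlowTuple(self.dst, self.src)


class Conn:
    """Toy conntrack entry: origin tuple plus its inverted reply tuple."""

    def __init__(self, origin: FlowTuple) -> None:
        self.origin = origin
        self.reply = origin.inverted()


class Table:
    def __init__(self) -> None:
        self.by_tuple: Dict[FlowTuple, Conn] = {}

    def insert(self, ct: Conn) -> Optional[Conn]:
        """Return the entry the packet should use, or None if it is dropped."""
        clash = self.by_tuple.get(ct.origin)
        if clash is None:
            # No collision: register the entry under both directions.
            self.by_tuple[ct.origin] = ct
            self.by_tuple[ct.reply] = ct
            return ct
        if clash.reply == ct.origin and clash.origin == ct.reply:
            # Reverse collision: let the packet pass with the existing entry.
            return clash
        return None  # other clashes are not modelled in this sketch
```

For an existing entry a:b -> c:d, inserting a new entry with origin c:d -> a:b returns the existing entry instead of dropping the packet, which mirrors the "assign the existing (clashing) entry" behaviour described in the message.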
    • netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash · d8f84a9b
      Florian Westphal authored
      A conntrack entry can be inserted to the connection tracking table if there
      is no existing entry with an identical tuple in either direction.
      
      Example:
      INITIATOR -> NAT/PAT -> RESPONDER
      
      Initiator passes through NAT/PAT ("us") and SNAT is done (saddr rewrite).
      Then, later, NAT/PAT machine itself also wants to connect to RESPONDER.
      
      This will not work if the SNAT done earlier has same IP:PORT source pair.
      
      Conntrack table has:
      ORIGINAL: $IP_INITIATOR:$SPORT -> $IP_RESPONDER:$DPORT
      REPLY:    $IP_RESPONDER:$DPORT -> $IP_NAT:$SPORT
      
      and new locally originating connection wants:
      ORIGINAL: $IP_NAT:$SPORT -> $IP_RESPONDER:$DPORT
      REPLY:    $IP_RESPONDER:$DPORT -> $IP_NAT:$SPORT
      
      This is handled by the NAT engine which will do a source port reallocation
      for the locally originating connection that is colliding with an existing
      tuple by attempting a source port rewrite.
      
      This is done even if this new connection attempt did not go through a
      masquerade/snat rule.
      
      There is a rare race condition with connection-less protocols like UDP,
      where we do the port reallocation even though it's not needed.
      
      This happens when new packets from the same, pre-existing flow are received
      in both directions at the exact same time on different CPUs after the
      conntrack table was flushed (or conntrack becomes active for first time).
      
      With strict ordering/single cpu, the first packet creates new ct entry and
      second packet is resolved as established reply packet.
      
      With parallel processing, both packets are picked up as new and both get
      their own ct entry.
      
      In this case, the 'reply' packet (picked up as ORIGINAL) can be mangled by
      NAT engine because a port collision is detected.
      
      This change isn't enough to prevent a packet drop later during
      nf_conntrack_confirm(), the existing clash resolution strategy will not
      detect such reverse clash case.  This is resolved by a followup patch.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
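The baseline behaviour this commit starts from (a new connection whose reply tuple collides with an existing entry gets a different source port) can be sketched as follows. This is an illustrative model only: `FlowTuple`, `reallocate_source_port`, the addresses and the port range are hypothetical, and the real NAT engine's port selection is more involved. The sketch does not model the fix itself, which suppresses this reallocation when the colliding entry is the reverse direction of the same flow.

```python
from typing import Iterable, NamedTuple, Optional, Set


class FlowTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def inverted(self) -> "FlowTuple":
        return FlowTuple(self.dst_ip, self.dst_port, self.src_ip, self.src_port)


def reallocate_source_port(
    in_use: Set[FlowTuple], wanted: FlowTuple, port_range: Iterable[int]
) -> Optional[FlowTuple]:
    """Return `wanted`, possibly with another source port, such that neither
    it nor its reply tuple is already in the table; None if no port is free."""
    candidates = [wanted.src_port] + [p for p in port_range if p != wanted.src_port]
    for port in candidates:
        cand = wanted._replace(src_port=port)
        if cand not in in_use and cand.inverted() not in in_use:
            return cand
    return None


# Existing SNATed flow: initiator 10.0.0.2:4000, rewritten to 203.0.113.1:4000.
in_use = {
    FlowTuple("10.0.0.2", 4000, "192.0.2.1", 53),     # ORIGINAL
    FlowTuple("192.0.2.1", 53, "203.0.113.1", 4000),  # REPLY
}
# The NAT box itself now wants the same source IP:PORT pair.
local = FlowTuple("203.0.113.1", 4000, "192.0.2.1", 53)
picked = reallocate_source_port(in_use, local, range(4001, 4004))
```

Here the reply tuple of `local` is already present in the table, so the sketch falls back to the next free source port from the supplied range.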
    • selftests/net: packetdrill: increase timing tolerance in debug mode · 72ef0755
      Willem de Bruijn authored
      Some packetdrill tests are flaky in debug mode. As discussed, increase
      tolerance.
      
      We have been doing this for debug builds outside ksft too.
      
      Previous setting was 10000. A manual 50 runs in virtme-ng showed two
      failures that needed 12000. To be on the safe side, increase to 14000.
      
      Link: https://lore.kernel.org/netdev/Zuhhe4-MQHd3EkfN@mini-arch/
      Fixes: 1e42f73f ("selftests/net: packetdrill: import tcp/zerocopy")
      Reported-by: Stanislav Fomichev <sdf@fomichev.me>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Acked-by: Stanislav Fomichev <sdf@fomichev.me>
      Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20240919124412.3014326-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • usbnet: fix cyclical race on disconnect with work queue · 04e90683
      Oliver Neukum authored
      The work can submit URBs and the URBs can schedule the work.
      This cycle needs to be broken when a device is to be stopped.
      Use a flag to do so.
      This is a design issue as old as the driver.
      Signed-off-by: Oliver Neukum <oneukum@suse.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      CC: stable@vger.kernel.org
      Link: https://patch.msgid.link/20240919123525.688065-1-oneukum@suse.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
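The cycle and the flag that breaks it can be illustrated with a small single-threaded Python model. This is a sketch under assumptions, not the driver code: `Device`, `pending` and the method names are hypothetical, and real concurrency (interrupt context, work queues) is reduced to a simple FIFO of callbacks.

```python
from collections import deque


class Device:
    def __init__(self) -> None:
        self.stopping = False   # the flag that breaks the cycle
        self.pending = deque()  # callbacks waiting to run

    def schedule_work(self) -> bool:
        if self.stopping:
            return False        # refuse new work while stopping
        self.pending.append(self.work)
        return True

    def submit_urb(self) -> bool:
        if self.stopping:
            return False        # refuse new URBs while stopping
        self.pending.append(self.urb_complete)
        return True

    def work(self) -> None:
        self.submit_urb()       # the work submits URBs ...

    def urb_complete(self) -> None:
        self.schedule_work()    # ... and URB completion schedules the work

    def stop(self) -> None:
        self.stopping = True

    def run(self, max_steps: int) -> int:
        """Run up to max_steps pending callbacks; return how many ran."""
        steps = 0
        while self.pending and steps < max_steps:
            self.pending.popleft()()
            steps += 1
        return steps
```

Without `stop()`, every callback queues its counterpart and `run()` never drains the queue; once the flag is set, the one remaining callback runs, nothing is requeued, and the queue empties.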
    • net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is enabled · b514c47e
      Furong Xu authored
      Commit 5fabb012 ("net: stmmac: Add initial XDP support") sets
      PP_FLAG_DMA_SYNC_DEV flag for page_pool unconditionally,
      page_pool_recycle_direct() will call page_pool_dma_sync_for_device()
      on every page even if the page is not going to be reused by XDP program.
      
      When XDP is not enabled, the page which holds the received buffer
      will be recycled once the buffer is copied into new SKB by
      skb_copy_to_linear_data(), then the MAC core will never reuse this
      page any longer. Always setting PP_FLAG_DMA_SYNC_DEV wastes CPU cycles
      on unnecessary calling of page_pool_dma_sync_for_device().
      
      After this patch, up to 9% noticeable performance improvement was observed
      on certain platforms.
      
      Fixes: 5fabb012 ("net: stmmac: Add initial XDP support")
      Signed-off-by: Furong Xu <0x1207@gmail.com>
      Link: https://patch.msgid.link/20240919121028.1348023-1-0x1207@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
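The conditional flag selection described in this commit message amounts to the following, shown here as a Python sketch rather than the driver's C code. The function name `page_pool_flags`, the always-set `PP_FLAG_DMA_MAP` flag and the bit positions are assumptions made for illustration; only PP_FLAG_DMA_SYNC_DEV is named in the commit message.

```python
PP_FLAG_DMA_MAP = 1 << 0       # assumed bit position, for illustration only
PP_FLAG_DMA_SYNC_DEV = 1 << 1  # assumed bit position, for illustration only


def page_pool_flags(xdp_enabled: bool) -> int:
    """Request device-side DMA sync only when an XDP program may reuse pages."""
    flags = PP_FLAG_DMA_MAP
    if xdp_enabled:
        flags |= PP_FLAG_DMA_SYNC_DEV
    return flags
```

With XDP disabled the sync flag stays clear, so recycling a page does not trigger the extra sync-for-device call the message identifies as wasted work.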
    • virtio_net: Fix mismatched buf address when unmapping for small packets · c11a49d5
      Wenbo Li authored
      Currently, the virtio-net driver will perform a pre-dma-mapping for
      small or mergeable RX buffer. But for small packets, a mismatched address
      without VIRTNET_RX_PAD and xdp_headroom is used for unmapping.
      
      That will result in unsynchronized buffers when SWIOTLB is enabled, for
      example, when running as a TDX guest.
      
      This patch unifies the address passed to the virtio core as the address of
      the virtnet header and fixes the mismatched buffer address.
      
      Changes from v2: unify the buf that is passed to the virtio core in small
      and merge mode.
      Changes from v1: Use ctx to get xdp_headroom.
      
      Fixes: 295525e2 ("virtio_net: merge dma operations when filling mergeable buffers")
      Signed-off-by: Wenbo Li <liwenbo.martin@bytedance.com>
      Signed-off-by: Jiahui Cen <cenjiahui@bytedance.com>
      Signed-off-by: Ying Fang <fangying.tommy@bytedance.com>
      Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Link: https://patch.msgid.link/20240919081351.51772-1-liwenbo.martin@bytedance.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  3. 24 Sep, 2024 10 commits
  4. 23 Sep, 2024 1 commit
    • tcp: check skb is non-NULL in tcp_rto_delta_us() · c8770db2
      Josh Hunt authored
      We have some machines running ceph on stock Ubuntu 20.04.6 (its
      5.4.0-174-generic kernel) that recently hit a NULL pointer dereference in
      tcp_rearm_rto(). We initially hit it from the TLP path, but later also
      saw it hit from the RACK case. Here are examples of the oops
      messages we saw in each of those cases:
      
      Jul 26 15:05:02 rx [11061395.780353] BUG: kernel NULL pointer dereference, address: 0000000000000020
      Jul 26 15:05:02 rx [11061395.787572] #PF: supervisor read access in kernel mode
      Jul 26 15:05:02 rx [11061395.792971] #PF: error_code(0x0000) - not-present page
      Jul 26 15:05:02 rx [11061395.798362] PGD 0 P4D 0
      Jul 26 15:05:02 rx [11061395.801164] Oops: 0000 [#1] SMP NOPTI
      Jul 26 15:05:02 rx [11061395.805091] CPU: 0 PID: 9180 Comm: msgr-worker-1 Tainted: G W 5.4.0-174-generic #193-Ubuntu
      Jul 26 15:05:02 rx [11061395.814996] Hardware name: Supermicro SMC 2x26 os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023
      Jul 26 15:05:02 rx [11061395.825952] RIP: 0010:tcp_rearm_rto+0xe4/0x160
      Jul 26 15:05:02 rx [11061395.830656] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3
      Jul 26 15:05:02 rx [11061395.849665] RSP: 0018:ffffb75d40003e08 EFLAGS: 00010246
      Jul 26 15:05:02 rx [11061395.855149] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000
      Jul 26 15:05:02 rx [11061395.862542] RDX: 0000000062177c30 RSI: 000000000000231c RDI: ffff9874ad283a60
      Jul 26 15:05:02 rx [11061395.869933] RBP: ffffb75d40003e20 R08: 0000000000000000 R09: ffff987605e20aa8
      Jul 26 15:05:02 rx [11061395.877318] R10: ffffb75d40003f00 R11: ffffb75d4460f740 R12: ffff9874ad283900
      Jul 26 15:05:02 rx [11061395.884710] R13: ffff9874ad283a60 R14: ffff9874ad283980 R15: ffff9874ad283d30
      Jul 26 15:05:02 rx [11061395.892095] FS: 00007f1ef4a2e700(0000) GS:ffff987605e00000(0000) knlGS:0000000000000000
      Jul 26 15:05:02 rx [11061395.900438] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Jul 26 15:05:02 rx [11061395.906435] CR2: 0000000000000020 CR3: 0000003e450ba003 CR4: 0000000000760ef0
      Jul 26 15:05:02 rx [11061395.913822] PKRU: 55555554
      Jul 26 15:05:02 rx [11061395.916786] Call Trace:
      Jul 26 15:05:02 rx [11061395.919488]
      Jul 26 15:05:02 rx [11061395.921765] ? show_regs.cold+0x1a/0x1f
      Jul 26 15:05:02 rx [11061395.925859] ? __die+0x90/0xd9
      Jul 26 15:05:02 rx [11061395.929169] ? no_context+0x196/0x380
      Jul 26 15:05:02 rx [11061395.933088] ? ip6_protocol_deliver_rcu+0x4e0/0x4e0
      Jul 26 15:05:02 rx [11061395.938216] ? ip6_sublist_rcv_finish+0x3d/0x50
      Jul 26 15:05:02 rx [11061395.943000] ? __bad_area_nosemaphore+0x50/0x1a0
      Jul 26 15:05:02 rx [11061395.947873] ? bad_area_nosemaphore+0x16/0x20
      Jul 26 15:05:02 rx [11061395.952486] ? do_user_addr_fault+0x267/0x450
      Jul 26 15:05:02 rx [11061395.957104] ? ipv6_list_rcv+0x112/0x140
      Jul 26 15:05:02 rx [11061395.961279] ? __do_page_fault+0x58/0x90
      Jul 26 15:05:02 rx [11061395.965458] ? do_page_fault+0x2c/0xe0
      Jul 26 15:05:02 rx [11061395.969465] ? page_fault+0x34/0x40
      Jul 26 15:05:02 rx [11061395.973217] ? tcp_rearm_rto+0xe4/0x160
      Jul 26 15:05:02 rx [11061395.977313] ? tcp_rearm_rto+0xe4/0x160
      Jul 26 15:05:02 rx [11061395.981408] tcp_send_loss_probe+0x10b/0x220
      Jul 26 15:05:02 rx [11061395.985937] tcp_write_timer_handler+0x1b4/0x240
      Jul 26 15:05:02 rx [11061395.990809] tcp_write_timer+0x9e/0xe0
      Jul 26 15:05:02 rx [11061395.994814] ? tcp_write_timer_handler+0x240/0x240
      Jul 26 15:05:02 rx [11061395.999866] call_timer_fn+0x32/0x130
      Jul 26 15:05:02 rx [11061396.003782] __run_timers.part.0+0x180/0x280
      Jul 26 15:05:02 rx [11061396.008309] ? recalibrate_cpu_khz+0x10/0x10
      Jul 26 15:05:02 rx [11061396.012841] ? native_x2apic_icr_write+0x30/0x30
      Jul 26 15:05:02 rx [11061396.017718] ? lapic_next_event+0x21/0x30
      Jul 26 15:05:02 rx [11061396.021984] ? clockevents_program_event+0x8f/0xe0
      Jul 26 15:05:02 rx [11061396.027035] run_timer_softirq+0x2a/0x50
      Jul 26 15:05:02 rx [11061396.031212] __do_softirq+0xd1/0x2c1
      Jul 26 15:05:02 rx [11061396.035044] do_softirq_own_stack+0x2a/0x40
      Jul 26 15:05:02 rx [11061396.039480]
      Jul 26 15:05:02 rx [11061396.041840] do_softirq.part.0+0x46/0x50
      Jul 26 15:05:02 rx [11061396.046022] __local_bh_enable_ip+0x50/0x60
      Jul 26 15:05:02 rx [11061396.050460] _raw_spin_unlock_bh+0x1e/0x20
      Jul 26 15:05:02 rx [11061396.054817] nf_conntrack_tcp_packet+0x29e/0xbe0 [nf_conntrack]
      Jul 26 15:05:02 rx [11061396.060994] ? get_l4proto+0xe7/0x190 [nf_conntrack]
      Jul 26 15:05:02 rx [11061396.066220] nf_conntrack_in+0xe9/0x670 [nf_conntrack]
      Jul 26 15:05:02 rx [11061396.071618] ipv6_conntrack_local+0x14/0x20 [nf_conntrack]
      Jul 26 15:05:02 rx [11061396.077356] nf_hook_slow+0x45/0xb0
      Jul 26 15:05:02 rx [11061396.081098] ip6_xmit+0x3f0/0x5d0
      Jul 26 15:05:02 rx [11061396.084670] ? ipv6_anycast_cleanup+0x50/0x50
      Jul 26 15:05:02 rx [11061396.089282] ? __sk_dst_check+0x38/0x70
      Jul 26 15:05:02 rx [11061396.093381] ? inet6_csk_route_socket+0x13b/0x200
      Jul 26 15:05:02 rx [11061396.098346] inet6_csk_xmit+0xa7/0xf0
      Jul 26 15:05:02 rx [11061396.102263] __tcp_transmit_skb+0x550/0xb30
      Jul 26 15:05:02 rx [11061396.106701] tcp_write_xmit+0x3c6/0xc20
      Jul 26 15:05:02 rx [11061396.110792] ? __alloc_skb+0x98/0x1d0
      Jul 26 15:05:02 rx [11061396.114708] __tcp_push_pending_frames+0x37/0x100
      Jul 26 15:05:02 rx [11061396.119667] tcp_push+0xfd/0x100
      Jul 26 15:05:02 rx [11061396.123150] tcp_sendmsg_locked+0xc70/0xdd0
      Jul 26 15:05:02 rx [11061396.127588] tcp_sendmsg+0x2d/0x50
      Jul 26 15:05:02 rx [11061396.131245] inet6_sendmsg+0x43/0x70
      Jul 26 15:05:02 rx [11061396.135075] __sock_sendmsg+0x48/0x70
      Jul 26 15:05:02 rx [11061396.138994] ____sys_sendmsg+0x212/0x280
      Jul 26 15:05:02 rx [11061396.143172] ___sys_sendmsg+0x88/0xd0
      Jul 26 15:05:02 rx [11061396.147098] ? __seccomp_filter+0x7e/0x6b0
      Jul 26 15:05:02 rx [11061396.151446] ? __switch_to+0x39c/0x460
      Jul 26 15:05:02 rx [11061396.155453] ? __switch_to_asm+0x42/0x80
      Jul 26 15:05:02 rx [11061396.159636] ? __switch_to_asm+0x5a/0x80
      Jul 26 15:05:02 rx [11061396.163816] __sys_sendmsg+0x5c/0xa0
      Jul 26 15:05:02 rx [11061396.167647] __x64_sys_sendmsg+0x1f/0x30
      Jul 26 15:05:02 rx [11061396.171832] do_syscall_64+0x57/0x190
      Jul 26 15:05:02 rx [11061396.175748] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
      Jul 26 15:05:02 rx [11061396.181055] RIP: 0033:0x7f1ef692618d
      Jul 26 15:05:02 rx [11061396.184893] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ca ee ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 48 89 44 24 08 e8 fe ee ff ff 48
      Jul 26 15:05:02 rx [11061396.203889] RSP: 002b:00007f1ef4a26aa0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
      Jul 26 15:05:02 rx [11061396.211708] RAX: ffffffffffffffda RBX: 000000000000084b RCX: 00007f1ef692618d
      Jul 26 15:05:02 rx [11061396.219091] RDX: 0000000000004000 RSI: 00007f1ef4a26b10 RDI: 0000000000000275
      Jul 26 15:05:02 rx [11061396.226475] RBP: 0000000000004000 R08: 0000000000000000 R09: 0000000000000020
      Jul 26 15:05:02 rx [11061396.233859] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000084b
      Jul 26 15:05:02 rx [11061396.241243] R13: 00007f1ef4a26b10 R14: 0000000000000275 R15: 000055592030f1e8
      Jul 26 15:05:02 rx [11061396.248628] Modules linked in: vrf bridge stp llc vxlan ip6_udp_tunnel udp_tunnel nls_iso8859_1 amd64_edac_mod edac_mce_amd kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper wmi_bmof ipmi_ssif input_leds joydev rndis_host cdc_ether usbnet mii ast drm_vram_helper ttm drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt ccp mac_hid ipmi_si ipmi_devintf ipmi_msghandler nft_ct sch_fq_codel nf_tables_set nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ramoops reed_solomon efi_pstore drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear mlx5_ib ib_uverbs ib_core raid1 mlx5_core hid_generic pci_hyperv_intf crc32_pclmul tls usbhid ahci mlxfw bnxt_en libahci hid nvme i2c_piix4 nvme_core wmi
      Jul 26 15:05:02 rx [11061396.324334] CR2: 0000000000000020
      Jul 26 15:05:02 rx [11061396.327944] ---[ end trace 68a2b679d1cfb4f1 ]---
      Jul 26 15:05:02 rx [11061396.433435] RIP: 0010:tcp_rearm_rto+0xe4/0x160
      Jul 26 15:05:02 rx [11061396.438137] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3
      Jul 26 15:05:02 rx [11061396.457144] RSP: 0018:ffffb75d40003e08 EFLAGS: 00010246
      Jul 26 15:05:02 rx [11061396.462629] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000
      Jul 26 15:05:02 rx [11061396.470012] RDX: 0000000062177c30 RSI: 000000000000231c RDI: ffff9874ad283a60
      Jul 26 15:05:02 rx [11061396.477396] RBP: ffffb75d40003e20 R08: 0000000000000000 R09: ffff987605e20aa8
      Jul 26 15:05:02 rx [11061396.484779] R10: ffffb75d40003f00 R11: ffffb75d4460f740 R12: ffff9874ad283900
      Jul 26 15:05:02 rx [11061396.492164] R13: ffff9874ad283a60 R14: ffff9874ad283980 R15: ffff9874ad283d30
      Jul 26 15:05:02 rx [11061396.499547] FS: 00007f1ef4a2e700(0000) GS:ffff987605e00000(0000) knlGS:0000000000000000
      Jul 26 15:05:02 rx [11061396.507886] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Jul 26 15:05:02 rx [11061396.513884] CR2: 0000000000000020 CR3: 0000003e450ba003 CR4: 0000000000760ef0
      Jul 26 15:05:02 rx [11061396.521267] PKRU: 55555554
      Jul 26 15:05:02 rx [11061396.524230] Kernel panic - not syncing: Fatal exception in interrupt
      Jul 26 15:05:02 rx [11061396.530885] Kernel Offset: 0x1b200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      Jul 26 15:05:03 rx [11061396.660181] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      After we hit this we disabled TLP by setting tcp_early_retrans to 0 and then hit the crash in the RACK case:
      
      Aug 7 07:26:16 rx [1006006.265582] BUG: kernel NULL pointer dereference, address: 0000000000000020
      Aug 7 07:26:16 rx [1006006.272719] #PF: supervisor read access in kernel mode
      Aug 7 07:26:16 rx [1006006.278030] #PF: error_code(0x0000) - not-present page
      Aug 7 07:26:16 rx [1006006.283343] PGD 0 P4D 0
      Aug 7 07:26:16 rx [1006006.286057] Oops: 0000 [#1] SMP NOPTI
      Aug 7 07:26:16 rx [1006006.289896] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.4.0-174-generic #193-Ubuntu
      Aug 7 07:26:16 rx [1006006.299107] Hardware name: Supermicro SMC 2x26 os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023
      Aug 7 07:26:16 rx [1006006.309970] RIP: 0010:tcp_rearm_rto+0xe4/0x160
      Aug 7 07:26:16 rx [1006006.314584] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3
      Aug 7 07:26:16 rx [1006006.333499] RSP: 0018:ffffb42600a50960 EFLAGS: 00010246
      Aug 7 07:26:16 rx [1006006.338895] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000
      Aug 7 07:26:16 rx [1006006.346193] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff92d687ed8160
      Aug 7 07:26:16 rx [1006006.353489] RBP: ffffb42600a50978 R08: 0000000000000000 R09: 00000000cd896dcc
      Aug 7 07:26:16 rx [1006006.360786] R10: ffff92dc3404f400 R11: 0000000000000001 R12: ffff92d687ed8000
      Aug 7 07:26:16 rx [1006006.368084] R13: ffff92d687ed8160 R14: 00000000cd896dcc R15: 00000000cd8fca81
      Aug 7 07:26:16 rx [1006006.375381] FS: 0000000000000000(0000) GS:ffff93158ad40000(0000) knlGS:0000000000000000
      Aug 7 07:26:16 rx [1006006.383632] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Aug 7 07:26:16 rx [1006006.389544] CR2: 0000000000000020 CR3: 0000003e775ce006 CR4: 0000000000760ee0
      Aug 7 07:26:16 rx [1006006.396839] PKRU: 55555554
      Aug 7 07:26:16 rx [1006006.399717] Call Trace:
      Aug 7 07:26:16 rx [1006006.402335]
      Aug 7 07:26:16 rx [1006006.404525] ? show_regs.cold+0x1a/0x1f
      Aug 7 07:26:16 rx [1006006.408532] ? __die+0x90/0xd9
      Aug 7 07:26:16 rx [1006006.411760] ? no_context+0x196/0x380
      Aug 7 07:26:16 rx [1006006.415599] ? __bad_area_nosemaphore+0x50/0x1a0
      Aug 7 07:26:16 rx [1006006.420392] ? _raw_spin_lock+0x1e/0x30
      Aug 7 07:26:16 rx [1006006.424401] ? bad_area_nosemaphore+0x16/0x20
      Aug 7 07:26:16 rx [1006006.428927] ? do_user_addr_fault+0x267/0x450
      Aug 7 07:26:16 rx [1006006.433450] ? __do_page_fault+0x58/0x90
      Aug 7 07:26:16 rx [1006006.437542] ? do_page_fault+0x2c/0xe0
      Aug 7 07:26:16 rx [1006006.441470] ? page_fault+0x34/0x40
      Aug 7 07:26:16 rx [1006006.445134] ? tcp_rearm_rto+0xe4/0x160
      Aug 7 07:26:16 rx [1006006.449145] tcp_ack+0xa32/0xb30
      Aug 7 07:26:16 rx [1006006.452542] tcp_rcv_established+0x13c/0x670
      Aug 7 07:26:16 rx [1006006.456981] ? sk_filter_trim_cap+0x48/0x220
      Aug 7 07:26:16 rx [1006006.461419] tcp_v6_do_rcv+0xdb/0x450
      Aug 7 07:26:16 rx [1006006.465257] tcp_v6_rcv+0xc2b/0xd10
      Aug 7 07:26:16 rx [1006006.468918] ip6_protocol_deliver_rcu+0xd3/0x4e0
      Aug 7 07:26:16 rx [1006006.473706] ip6_input_finish+0x15/0x20
      Aug 7 07:26:16 rx [1006006.477710] ip6_input+0xa2/0xb0
      Aug 7 07:26:16 rx [1006006.481109] ? ip6_protocol_deliver_rcu+0x4e0/0x4e0
      Aug 7 07:26:16 rx [1006006.486151] ip6_sublist_rcv_finish+0x3d/0x50
      Aug 7 07:26:16 rx [1006006.490679] ip6_sublist_rcv+0x1aa/0x250
      Aug 7 07:26:16 rx [1006006.494779] ? ip6_rcv_finish_core.isra.0+0xa0/0xa0
      Aug 7 07:26:16 rx [1006006.499828] ipv6_list_rcv+0x112/0x140
      Aug 7 07:26:16 rx [1006006.503748] __netif_receive_skb_list_core+0x1a4/0x250
      Aug 7 07:26:16 rx [1006006.509057] netif_receive_skb_list_internal+0x1a1/0x2b0
      Aug 7 07:26:16 rx [1006006.514538] gro_normal_list.part.0+0x1e/0x40
      Aug 7 07:26:16 rx [1006006.519068] napi_complete_done+0x91/0x130
      Aug 7 07:26:16 rx [1006006.523352] mlx5e_napi_poll+0x18e/0x610 [mlx5_core]
      Aug 7 07:26:16 rx [1006006.528481] net_rx_action+0x142/0x390
      Aug 7 07:26:16 rx [1006006.532398] __do_softirq+0xd1/0x2c1
      Aug 7 07:26:16 rx [1006006.536142] irq_exit+0xae/0xb0
      Aug 7 07:26:16 rx [1006006.539452] do_IRQ+0x5a/0xf0
      Aug 7 07:26:16 rx [1006006.542590] common_interrupt+0xf/0xf
      Aug 7 07:26:16 rx [1006006.546421]
      Aug 7 07:26:16 rx [1006006.548695] RIP: 0010:native_safe_halt+0xe/0x10
      Aug 7 07:26:16 rx [1006006.553399] Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 36 2c 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 26 2c 50 00 fb f4 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 dd 5e 61 ff 65
      Aug 7 07:26:16 rx [1006006.572309] RSP: 0018:ffffb42600177e70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffc2
      Aug 7 07:26:16 rx [1006006.580040] RAX: ffffffff8ed08b20 RBX: 0000000000000005 RCX: 0000000000000001
      Aug 7 07:26:16 rx [1006006.587337] RDX: 00000000f48eeca2 RSI: 0000000000000082 RDI: 0000000000000082
      Aug 7 07:26:16 rx [1006006.594635] RBP: ffffb42600177e90 R08: 0000000000000000 R09: 000000000000020f
      Aug 7 07:26:16 rx [1006006.601931] R10: 0000000000100000 R11: 0000000000000000 R12: 0000000000000005
      Aug 7 07:26:16 rx [1006006.609229] R13: ffff93157deb5f00 R14: 0000000000000000 R15: 0000000000000000
      Aug 7 07:26:16 rx [1006006.616530] ? __cpuidle_text_start+0x8/0x8
      Aug 7 07:26:16 rx [1006006.620886] ? default_idle+0x20/0x140
      Aug 7 07:26:16 rx [1006006.624804] arch_cpu_idle+0x15/0x20
      Aug 7 07:26:16 rx [1006006.628545] default_idle_call+0x23/0x30
      Aug 7 07:26:16 rx [1006006.632640] do_idle+0x1fb/0x270
      Aug 7 07:26:16 rx [1006006.636035] cpu_startup_entry+0x20/0x30
      Aug 7 07:26:16 rx [1006006.640126] start_secondary+0x178/0x1d0
      Aug 7 07:26:16 rx [1006006.644218] secondary_startup_64+0xa4/0xb0
      Aug 7 07:26:17 rx [1006006.648568] Modules linked in: vrf bridge stp llc vxlan ip6_udp_tunnel udp_tunnel nls_iso8859_1 nft_ct amd64_edac_mod edac_mce_amd kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper wmi_bmof ipmi_ssif input_leds joydev rndis_host cdc_ether usbnet ast mii drm_vram_helper ttm drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt ccp mac_hid ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel nf_tables_set nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ramoops reed_solomon efi_pstore drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear mlx5_ib ib_uverbs ib_core raid1 hid_generic mlx5_core pci_hyperv_intf crc32_pclmul usbhid ahci tls mlxfw bnxt_en hid libahci nvme i2c_piix4 nvme_core wmi [last unloaded: cpuid]
      Aug 7 07:26:17 rx [1006006.726180] CR2: 0000000000000020
      Aug 7 07:26:17 rx [1006006.729718] ---[ end trace e0e2e37e4e612984 ]---
      
      Prior to seeing the first crash and on other machines we also see the warning in
      tcp_send_loss_probe() where packets_out is non-zero, but both transmit and retrans
      queues are empty so we know the box is seeing some accounting issue in this area:
      
      Jul 26 09:15:27 kernel: ------------[ cut here ]------------
      Jul 26 09:15:27 kernel: invalid inflight: 2 state 1 cwnd 68 mss 8988
      Jul 26 09:15:27 kernel: WARNING: CPU: 16 PID: 0 at net/ipv4/tcp_output.c:2605 tcp_send_loss_probe+0x214/0x220
      Jul 26 09:15:27 kernel: Modules linked in: vrf bridge stp llc vxlan ip6_udp_tunnel udp_tunnel nls_iso8859_1 nft_ct amd64_edac_mod edac_mce_amd kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper wmi_bmof ipmi_ssif joydev input_leds rndis_host cdc_ether usbnet mii ast drm_vram_helper ttm drm_kms_he>
      Jul 26 09:15:27 kernel: CPU: 16 PID: 0 Comm: swapper/16 Not tainted 5.4.0-174-generic #193-Ubuntu
      Jul 26 09:15:27 kernel: Hardware name: Supermicro SMC 2x26 os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023
      Jul 26 09:15:27 kernel: RIP: 0010:tcp_send_loss_probe+0x214/0x220
      Jul 26 09:15:27 kernel: Code: 08 26 01 00 75 e2 41 0f b6 54 24 12 41 8b 8c 24 c0 06 00 00 45 89 f0 48 c7 c7 e0 b4 20 a7 c6 05 8d 08 26 01 01 e8 4a c0 0f 00 <0f> 0b eb ba 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
      Jul 26 09:15:27 kernel: RSP: 0018:ffffb7838088ce00 EFLAGS: 00010286
      Jul 26 09:15:27 kernel: RAX: 0000000000000000 RBX: ffff9b84b5630430 RCX: 0000000000000006
      Jul 26 09:15:27 kernel: RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff9b8e4621c8c0
      Jul 26 09:15:27 kernel: RBP: ffffb7838088ce18 R08: 0000000000000927 R09: 0000000000000004
      Jul 26 09:15:27 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b84b5630000
      Jul 26 09:15:27 kernel: R13: 0000000000000000 R14: 000000000000231c R15: ffff9b84b5630430
      Jul 26 09:15:27 kernel: FS: 0000000000000000(0000) GS:ffff9b8e46200000(0000) knlGS:0000000000000000
      Jul 26 09:15:27 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Jul 26 09:15:27 kernel: CR2: 000056238cec2380 CR3: 0000003e49ede005 CR4: 0000000000760ee0
      Jul 26 09:15:27 kernel: PKRU: 55555554
      Jul 26 09:15:27 kernel: Call Trace:
      Jul 26 09:15:27 kernel: <IRQ>
      Jul 26 09:15:27 kernel: ? show_regs.cold+0x1a/0x1f
      Jul 26 09:15:27 kernel: ? __warn+0x98/0xe0
      Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220
      Jul 26 09:15:27 kernel: ? report_bug+0xd1/0x100
      Jul 26 09:15:27 kernel: ? do_error_trap+0x9b/0xc0
      Jul 26 09:15:27 kernel: ? do_invalid_op+0x3c/0x50
      Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220
      Jul 26 09:15:27 kernel: ? invalid_op+0x1e/0x30
      Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220
      Jul 26 09:15:27 kernel: tcp_write_timer_handler+0x1b4/0x240
      Jul 26 09:15:27 kernel: tcp_write_timer+0x9e/0xe0
      Jul 26 09:15:27 kernel: ? tcp_write_timer_handler+0x240/0x240
      Jul 26 09:15:27 kernel: call_timer_fn+0x32/0x130
      Jul 26 09:15:27 kernel: __run_timers.part.0+0x180/0x280
      Jul 26 09:15:27 kernel: ? timerqueue_add+0x9b/0xb0
      Jul 26 09:15:27 kernel: ? enqueue_hrtimer+0x3d/0x90
      Jul 26 09:15:27 kernel: ? do_error_trap+0x9b/0xc0
      Jul 26 09:15:27 kernel: ? do_invalid_op+0x3c/0x50
      Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220
      Jul 26 09:15:27 kernel: ? invalid_op+0x1e/0x30
      Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220
      Jul 26 09:15:27 kernel: tcp_write_timer_handler+0x1b4/0x240
      Jul 26 09:15:27 kernel: tcp_write_timer+0x9e/0xe0
      Jul 26 09:15:27 kernel: ? tcp_write_timer_handler+0x240/0x240
      Jul 26 09:15:27 kernel: call_timer_fn+0x32/0x130
      Jul 26 09:15:27 kernel: __run_timers.part.0+0x180/0x280
      Jul 26 09:15:27 kernel: ? timerqueue_add+0x9b/0xb0
      Jul 26 09:15:27 kernel: ? enqueue_hrtimer+0x3d/0x90
      Jul 26 09:15:27 kernel: ? recalibrate_cpu_khz+0x10/0x10
      Jul 26 09:15:27 kernel: ? ktime_get+0x3e/0xa0
      Jul 26 09:15:27 kernel: ? native_x2apic_icr_write+0x30/0x30
      Jul 26 09:15:27 kernel: run_timer_softirq+0x2a/0x50
      Jul 26 09:15:27 kernel: __do_softirq+0xd1/0x2c1
      Jul 26 09:15:27 kernel: irq_exit+0xae/0xb0
      Jul 26 09:15:27 kernel: smp_apic_timer_interrupt+0x7b/0x140
      Jul 26 09:15:27 kernel: apic_timer_interrupt+0xf/0x20
      Jul 26 09:15:27 kernel: </IRQ>
      Jul 26 09:15:27 kernel: RIP: 0010:native_safe_halt+0xe/0x10
      Jul 26 09:15:27 kernel: Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 36 2c 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 26 2c 50 00 fb f4 <c3> 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 dd 5e 61 ff 65
      Jul 26 09:15:27 kernel: RSP: 0018:ffffb783801cfe70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
      Jul 26 09:15:27 kernel: RAX: ffffffffa6908b20 RBX: 0000000000000010 RCX: 0000000000000001
      Jul 26 09:15:27 kernel: RDX: 000000006fc0c97e RSI: 0000000000000082 RDI: 0000000000000082
      Jul 26 09:15:27 kernel: RBP: ffffb783801cfe90 R08: 0000000000000000 R09: 0000000000000225
      Jul 26 09:15:27 kernel: R10: 0000000000100000 R11: 0000000000000000 R12: 0000000000000010
      Jul 26 09:15:27 kernel: R13: ffff9b8e390b0000 R14: 0000000000000000 R15: 0000000000000000
      Jul 26 09:15:27 kernel: ? __cpuidle_text_start+0x8/0x8
      Jul 26 09:15:27 kernel: ? default_idle+0x20/0x140
      Jul 26 09:15:27 kernel: arch_cpu_idle+0x15/0x20
      Jul 26 09:15:27 kernel: default_idle_call+0x23/0x30
      Jul 26 09:15:27 kernel: do_idle+0x1fb/0x270
      Jul 26 09:15:27 kernel: cpu_startup_entry+0x20/0x30
      Jul 26 09:15:27 kernel: start_secondary+0x178/0x1d0
      Jul 26 09:15:27 kernel: secondary_startup_64+0xa4/0xb0
      Jul 26 09:15:27 kernel: ---[ end trace e7ac822987e33be1 ]---
      
      The NULL ptr deref is coming from tcp_rto_delta_us() attempting to pull an skb
      off the head of the retransmit queue and then dereferencing that skb to get the
      skb_mstamp_ns value via tcp_skb_timestamp_us(skb).
      
      The crash is the same one that was reported a number of years ago here:
      https://lore.kernel.org/netdev/86c0f836-9a7c-438b-d81a-839be45f1f58@gmail.com/T/#t
      
      and the kernel we're running has the fix which was added to resolve this issue.
      
      Unfortunately we've been unsuccessful so far in reproducing this problem in the
      lab, and at the moment we do not have the luxury of pushing out a new kernel to
      test whether newer kernels resolve this issue. I realize this is a report
      against both an Ubuntu kernel and an older 5.4 kernel. I have reported this
      issue to Ubuntu here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2077657
      However, since this issue has possibly cropped up again, I feel it makes
      sense to build in some protection in this path (even on the latest kernel
      versions), since the code in question just blindly assumes there's a valid skb
      without testing whether it's NULL before it looks at the timestamp.
      
      Given we have seen crashes in this path before and now this case it seems like
      we should protect ourselves for when packets_out accounting is incorrect.
      While we should fix that root cause we should also just make sure the skb
      is not NULL before dereferencing it. Also add a warn once here to capture
      some information if/when the problem case is hit again.
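
The guard described above can be sketched in plain userspace C. This is a minimal model, not the kernel patch: the `model_*` structures and function are hypothetical stand-ins for the socket, the skb and tcp_rto_delta_us(), and returning a full RTO when the queue is empty is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct model_skb {
    uint64_t tx_timestamp_us;         /* when the segment was last sent */
};

struct model_sock {
    const struct model_skb *rtx_head; /* head of the retransmit queue, may be NULL */
    uint64_t now_us;                  /* current clock sample for the connection */
    uint32_t rto_us;                  /* retransmission timeout */
    uint32_t packets_out;             /* accounting that may disagree with the queue */
};

/*
 * Microseconds left until the RTO measured from the head segment expires.
 * If the queue is unexpectedly empty, warn once and fall back to a full RTO
 * (an assumed fallback) instead of dereferencing a NULL pointer.
 */
int64_t model_rto_delta_us(const struct model_sock *sk)
{
    static int warned;
    const struct model_skb *skb;

    assert(sk != NULL);
    skb = sk->rtx_head;
    if (skb != NULL)
        return (int64_t)(skb->tx_timestamp_us + sk->rto_us) - (int64_t)sk->now_us;

    if (!warned) {
        warned = 1;
        fprintf(stderr, "rtx queue empty: packets_out %u\n", sk->packets_out);
    }
    return (int64_t)sk->rto_us;
}
```

The warning is emitted only on the first hit so that a persistent accounting error does not flood the log, while the caller still gets a usable timeout.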
      
      Fixes: e1a10ef7 ("tcp: introduce tcp_rto_delta_us() helper for xmit timer fix")
      Signed-off-by: Josh Hunt <johunt@akamai.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c8770db2
  5. 22 Sep, 2024 2 commits
  6. 19 Sep, 2024 5 commits
    • net: seeq: Fix use after free vulnerability in ether3 Driver Due to Race Condition · b5109b60
      Kaixin Wang authored
      In the ether3_probe function, a timer is initialized with a callback
      function ether3_ledoff, bound to &priv(dev)->timer. Once the timer is
      started, there is a risk of a race condition if the module or device
      is removed, triggering the ether3_remove function to perform cleanup.
      The sequence of operations that may lead to a UAF bug is as follows:
      
      CPU0                  CPU1

                            |  ether3_ledoff
      ether3_remove         |
        free_netdev(dev);   |
        put_devic           |
        kfree(dev);         |
                            |  ether3_outw(priv(dev)->regs.config2 |= CFG2_CTRLO, REG_CONFIG2);
                            |  // use dev
      
      Fix it by ensuring that the timer is canceled before proceeding with
      the cleanup in ether3_remove.
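
The ordering the fix relies on can be illustrated with a single-threaded userspace model. Everything here is hypothetical (`model_timer`, `model_dev`, `model_probe`, `model_remove`); it does not reproduce the driver or the kernel timer API, where a synchronous cancel such as del_timer_sync() would also wait for a callback already running on another CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical one-shot timer: a pending flag plus a callback and argument. */
struct model_timer {
    bool pending;
    void (*fn)(void *arg);
    void *arg;
};

struct model_dev {
    struct model_timer led_timer;
    int led_off_calls;
};

/* Timer callback: touches the device, so it must never run after free(). */
void model_ledoff(void *arg)
{
    struct model_dev *dev = arg;

    dev->led_off_calls++;
}

void model_timer_arm(struct model_timer *t, void (*fn)(void *), void *arg)
{
    t->fn = fn;
    t->arg = arg;
    t->pending = true;
}

/* Returns true if a pending timer was cancelled. */
bool model_timer_cancel(struct model_timer *t)
{
    bool was_pending = t->pending;

    t->pending = false;
    return was_pending;
}

/* Runs the callback only if the timer is still pending. */
bool model_timer_fire(struct model_timer *t)
{
    if (!t->pending)
        return false;
    t->pending = false;
    t->fn(t->arg);
    return true;
}

struct model_dev *model_probe(void)
{
    struct model_dev *dev = calloc(1, sizeof(*dev));

    if (dev != NULL)
        model_timer_arm(&dev->led_timer, model_ledoff, dev);
    return dev;
}

/* Cancel first, free second: after the cancel nothing can reach dev. */
void model_remove(struct model_dev *dev)
{
    assert(dev != NULL);
    model_timer_cancel(&dev->led_timer);
    free(dev);
}
```

Swapping the two statements in `model_remove` would recreate the window shown in the diagram, where the callback can still fire against freed memory.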
      
      Fixes: 6fd9c53f ("net: seeq: Convert timers to use timer_setup()")
      Signed-off-by: Kaixin Wang <kxwang23@m.fudan.edu.cn>
      Link: https://patch.msgid.link/20240915144045.451-1-kxwang23@m.fudan.edu.cn
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • netfilter: nf_reject_ipv6: fix nf_reject_ip6_tcphdr_put() · 9c778fe4
      Eric Dumazet authored
      syzbot reported that nf_reject_ip6_tcphdr_put() was possibly sending
      garbage on the four reserved TCP bits (th->res1).
      
      Use skb_put_zero() to clear the whole TCP header,
      as done in nf_reject_ip_tcphdr_put().
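
A small userspace model shows why zeroing the whole header matters. The buffer type and the `model_put()` / `model_put_zero()` helpers are hypothetical analogues of skb_put() and skb_put_zero(), and the header is handled as raw bytes rather than as the kernel's struct tcphdr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet buffer; its contents start out unspecified. */
struct model_buf {
    uint8_t data[64];
    size_t len;
};

enum {
    MODEL_TCP_HDR_LEN = 20,  /* header without options */
    MODEL_TCP_OFF_BYTE = 12, /* high nibble: data offset, low nibble: reserved */
    MODEL_TCP_FLAGS_BYTE = 13,
    MODEL_TCP_FLAG_RST = 0x04
};

/* Append len bytes and return the new region, leaving it uninitialised. */
uint8_t *model_put(struct model_buf *b, size_t len)
{
    uint8_t *p;

    assert(b != NULL && b->len + len <= sizeof(b->data));
    p = b->data + b->len;
    b->len += len;
    return p;
}

/* Same as model_put(), but the new region is cleared first. */
uint8_t *model_put_zero(struct model_buf *b, size_t len)
{
    uint8_t *p = model_put(b, len);

    memset(p, 0, len);
    return p;
}

/*
 * Build a bare RST header. Only the fields the code cares about are set;
 * the reserved bits are correct because the region was zeroed up front,
 * not because any statement here assigns them.
 */
uint8_t *model_build_rst(struct model_buf *b)
{
    uint8_t *th = model_put_zero(b, MODEL_TCP_HDR_LEN);

    th[MODEL_TCP_OFF_BYTE] |= (uint8_t)((MODEL_TCP_HDR_LEN / 4) << 4);
    th[MODEL_TCP_FLAGS_BYTE] = MODEL_TCP_FLAG_RST;
    return th;
}
```

With `model_put()` in place of `model_put_zero()`, the low nibble of byte 12 would keep whatever the allocator left behind, which is the kind of leak the KMSAN report below points at.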
      
      BUG: KMSAN: uninit-value in nf_reject_ip6_tcphdr_put+0x688/0x6c0 net/ipv6/netfilter/nf_reject_ipv6.c:255
        nf_reject_ip6_tcphdr_put+0x688/0x6c0 net/ipv6/netfilter/nf_reject_ipv6.c:255
        nf_send_reset6+0xd84/0x15b0 net/ipv6/netfilter/nf_reject_ipv6.c:344
        nft_reject_inet_eval+0x3c1/0x880 net/netfilter/nft_reject_inet.c:48
        expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
        nft_do_chain+0x438/0x22a0 net/netfilter/nf_tables_core.c:288
        nft_do_chain_inet+0x41a/0x4f0 net/netfilter/nft_chain_filter.c:161
        nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
        nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626
        nf_hook include/linux/netfilter.h:269 [inline]
        NF_HOOK include/linux/netfilter.h:312 [inline]
        ipv6_rcv+0x29b/0x390 net/ipv6/ip6_input.c:310
        __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
        __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5775
        process_backlog+0x4ad/0xa50 net/core/dev.c:6108
        __napi_poll+0xe7/0x980 net/core/dev.c:6772
        napi_poll net/core/dev.c:6841 [inline]
        net_rx_action+0xa5a/0x19b0 net/core/dev.c:6963
        handle_softirqs+0x1ce/0x800 kernel/softirq.c:554
        __do_softirq+0x14/0x1a kernel/softirq.c:588
        do_softirq+0x9a/0x100 kernel/softirq.c:455
        __local_bh_enable_ip+0x9f/0xb0 kernel/softirq.c:382
        local_bh_enable include/linux/bottom_half.h:33 [inline]
        rcu_read_unlock_bh include/linux/rcupdate.h:908 [inline]
        __dev_queue_xmit+0x2692/0x5610 net/core/dev.c:4450
        dev_queue_xmit include/linux/netdevice.h:3105 [inline]
        neigh_resolve_output+0x9ca/0xae0 net/core/neighbour.c:1565
        neigh_output include/net/neighbour.h:542 [inline]
        ip6_finish_output2+0x2347/0x2ba0 net/ipv6/ip6_output.c:141
        __ip6_finish_output net/ipv6/ip6_output.c:215 [inline]
        ip6_finish_output+0xbb8/0x14b0 net/ipv6/ip6_output.c:226
        NF_HOOK_COND include/linux/netfilter.h:303 [inline]
        ip6_output+0x356/0x620 net/ipv6/ip6_output.c:247
        dst_output include/net/dst.h:450 [inline]
        NF_HOOK include/linux/netfilter.h:314 [inline]
        ip6_xmit+0x1ba6/0x25d0 net/ipv6/ip6_output.c:366
        inet6_csk_xmit+0x442/0x530 net/ipv6/inet6_connection_sock.c:135
        __tcp_transmit_skb+0x3b07/0x4880 net/ipv4/tcp_output.c:1466
        tcp_transmit_skb net/ipv4/tcp_output.c:1484 [inline]
        tcp_connect+0x35b6/0x7130 net/ipv4/tcp_output.c:4143
        tcp_v6_connect+0x1bcc/0x1e40 net/ipv6/tcp_ipv6.c:333
        __inet_stream_connect+0x2ef/0x1730 net/ipv4/af_inet.c:679
        inet_stream_connect+0x6a/0xd0 net/ipv4/af_inet.c:750
        __sys_connect_file net/socket.c:2061 [inline]
        __sys_connect+0x606/0x690 net/socket.c:2078
        __do_sys_connect net/socket.c:2088 [inline]
        __se_sys_connect net/socket.c:2085 [inline]
        __x64_sys_connect+0x91/0xe0 net/socket.c:2085
        x64_sys_call+0x27a5/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:43
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Uninit was stored to memory at:
        nf_reject_ip6_tcphdr_put+0x60c/0x6c0 net/ipv6/netfilter/nf_reject_ipv6.c:249
        nf_send_reset6+0xd84/0x15b0 net/ipv6/netfilter/nf_reject_ipv6.c:344
        nft_reject_inet_eval+0x3c1/0x880 net/netfilter/nft_reject_inet.c:48
        expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
        nft_do_chain+0x438/0x22a0 net/netfilter/nf_tables_core.c:288
        nft_do_chain_inet+0x41a/0x4f0 net/netfilter/nft_chain_filter.c:161
        nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
        nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626
        nf_hook include/linux/netfilter.h:269 [inline]
        NF_HOOK include/linux/netfilter.h:312 [inline]
        ipv6_rcv+0x29b/0x390 net/ipv6/ip6_input.c:310
        __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
        __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5775
        process_backlog+0x4ad/0xa50 net/core/dev.c:6108
        __napi_poll+0xe7/0x980 net/core/dev.c:6772
        napi_poll net/core/dev.c:6841 [inline]
        net_rx_action+0xa5a/0x19b0 net/core/dev.c:6963
        handle_softirqs+0x1ce/0x800 kernel/softirq.c:554
        __do_softirq+0x14/0x1a kernel/softirq.c:588
      
      Uninit was stored to memory at:
        nf_reject_ip6_tcphdr_put+0x2ca/0x6c0 net/ipv6/netfilter/nf_reject_ipv6.c:231
        nf_send_reset6+0xd84/0x15b0 net/ipv6/netfilter/nf_reject_ipv6.c:344
        nft_reject_inet_eval+0x3c1/0x880 net/netfilter/nft_reject_inet.c:48
        expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
        nft_do_chain+0x438/0x22a0 net/netfilter/nf_tables_core.c:288
        nft_do_chain_inet+0x41a/0x4f0 net/netfilter/nft_chain_filter.c:161
        nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
        nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626
        nf_hook include/linux/netfilter.h:269 [inline]
        NF_HOOK include/linux/netfilter.h:312 [inline]
        ipv6_rcv+0x29b/0x390 net/ipv6/ip6_input.c:310
        __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
        __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5775
        process_backlog+0x4ad/0xa50 net/core/dev.c:6108
        __napi_poll+0xe7/0x980 net/core/dev.c:6772
        napi_poll net/core/dev.c:6841 [inline]
        net_rx_action+0xa5a/0x19b0 net/core/dev.c:6963
        handle_softirqs+0x1ce/0x800 kernel/softirq.c:554
        __do_softirq+0x14/0x1a kernel/softirq.c:588
      
      Uninit was created at:
        slab_post_alloc_hook mm/slub.c:3998 [inline]
        slab_alloc_node mm/slub.c:4041 [inline]
        kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4084
        kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:583
        __alloc_skb+0x363/0x7b0 net/core/skbuff.c:674
        alloc_skb include/linux/skbuff.h:1320 [inline]
        nf_send_reset6+0x98d/0x15b0 net/ipv6/netfilter/nf_reject_ipv6.c:327
        nft_reject_inet_eval+0x3c1/0x880 net/netfilter/nft_reject_inet.c:48
        expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
        nft_do_chain+0x438/0x22a0 net/netfilter/nf_tables_core.c:288
        nft_do_chain_inet+0x41a/0x4f0 net/netfilter/nft_chain_filter.c:161
        nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
        nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626
        nf_hook include/linux/netfilter.h:269 [inline]
        NF_HOOK include/linux/netfilter.h:312 [inline]
        ipv6_rcv+0x29b/0x390 net/ipv6/ip6_input.c:310
        __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
        __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5775
        process_backlog+0x4ad/0xa50 net/core/dev.c:6108
        __napi_poll+0xe7/0x980 net/core/dev.c:6772
        napi_poll net/core/dev.c:6841 [inline]
        net_rx_action+0xa5a/0x19b0 net/core/dev.c:6963
        handle_softirqs+0x1ce/0x800 kernel/softirq.c:554
        __do_softirq+0x14/0x1a kernel/softirq.c:588
      
      Fixes: c8d7b98b ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Link: https://patch.msgid.link/20240913170615.3670897-1-edumazet@google.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      9c778fe4
    • net: xilinx: axienet: Fix packet counting · 5a6caa2c
      Sean Anderson authored
      axienet_free_tx_chain returns the number of DMA descriptors it's
      handled. However, axienet_tx_poll treats the return as the number of
      packets. When scatter-gather SKBs are enabled, a single packet may use
      multiple DMA descriptors, which causes incorrect packet counts. Fix this
      by explicitly keeping track of the number of packets processed as
      separate from the DMA descriptors.
      
      Budget does not affect the number of Tx completions we can process for
      NAPI, so we use the ring size as the limit instead of budget. As we no
      longer return the number of descriptors processed to axienet_tx_poll, we
      now update tx_bd_ci in axienet_free_tx_chain.
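
The distinction between descriptors and packets can be illustrated with a minimal userspace sketch. It is not the driver code: `struct tx_desc`, `struct tx_ring`, `reclaim_tx()` and the `last_of_packet` flag are hypothetical names chosen for illustration, and the real descriptor layout differs. The sketch only assumes what the message states: reclaim is bounded by the ring size, packets are counted separately from descriptors, and the consumer index is advanced inside the reclaim function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified TX descriptor; not the real axienet layout. */
struct tx_desc {
	bool completed;      /* hardware has finished with this descriptor */
	bool last_of_packet; /* this descriptor ends a (scatter-gather) packet */
};

/* Hypothetical ring state; "ci" plays the role of a consumer index. */
struct tx_ring {
	struct tx_desc *desc;
	size_t size; /* number of descriptors in the ring */
	size_t ci;   /* next descriptor to reclaim */
};

/*
 * Reclaim completed descriptors, bounded by the ring size rather than by
 * a NAPI budget. Returns the number of PACKETS completed; the number of
 * descriptors reclaimed is reported separately through descs_out.
 * The consumer index is advanced here, not by the caller.
 */
size_t reclaim_tx(struct tx_ring *ring, size_t *descs_out)
{
	size_t packets = 0;
	size_t descs = 0;

	assert(ring->size > 0);

	while (descs < ring->size) {
		struct tx_desc *d = &ring->desc[ring->ci];

		if (!d->completed)
			break;

		/* Only the final descriptor of a packet bumps the packet count. */
		if (d->last_of_packet)
			packets++;

		d->completed = false;
		d->last_of_packet = false;
		ring->ci = (ring->ci + 1) % ring->size;
		descs++;
	}

	if (descs_out)
		*descs_out = descs;
	return packets;
}
```

With one packet spread over three descriptors followed by a single-descriptor packet, the sketch reports two packets and four descriptors. Treating the descriptor count as the packet count would have overstated the packet total by two.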
      
      Fixes: 8a3b7a25 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
      Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
      Link: https://patch.msgid.link/20240913145156.2283067-1-sean.anderson@linux.dev
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      5a6caa2c
    • net: xilinx: axienet: Schedule NAPI in two steps · ba0da2dc
      Sean Anderson authored
      As advised by Documentation/networking/napi.rst, masking IRQs after
      calling napi_schedule can be racy. Avoid this by only masking/scheduling
      if napi_schedule_prep returns true.
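
The two-step pattern can be modelled in userspace as a rough sketch. Everything prefixed `fake_` is a hypothetical stand-in, not a kernel API: `fake_schedule_prep()` imitates the idea that only the caller who flips the "scheduled" state from clear to set gets `true`, and the counters stand in for masking device interrupts and queueing the poll.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a NAPI context; not the kernel's struct napi_struct. */
struct fake_napi {
	atomic_bool scheduled;
	int irq_mask_calls;   /* how often device interrupts were masked */
	int irq_unmask_calls; /* how often they were re-enabled */
	int schedule_calls;   /* how often the poll was queued */
};

void fake_napi_init(struct fake_napi *n)
{
	atomic_init(&n->scheduled, false);
	n->irq_mask_calls = 0;
	n->irq_unmask_calls = 0;
	n->schedule_calls = 0;
}

/* Step 1: returns true only for the caller that claims the poll. */
static bool fake_schedule_prep(struct fake_napi *n)
{
	return !atomic_exchange(&n->scheduled, true);
}

/* Interrupt handler: mask and queue only if we won the claim. */
void fake_irq_handler(struct fake_napi *n)
{
	if (fake_schedule_prep(n)) {
		n->irq_mask_calls++; /* mask first ... */
		n->schedule_calls++; /* ... then step 2: queue the poll */
	}
}

/* End of polling: release the claim and unmask interrupts again. */
void fake_poll_complete(struct fake_napi *n)
{
	assert(atomic_load(&n->scheduled));
	atomic_store(&n->scheduled, false);
	n->irq_unmask_calls++;
}
```

In this model a second interrupt that arrives while the poll is still pending neither masks nor schedules again, so every mask has exactly one matching unmask.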
      
      Fixes: 9e2bc267 ("net: axienet: Use NAPI for TX completion path")
      Fixes: cc37610c ("net: axienet: implement NAPI and GRO receive")
      Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20240913145711.2284295-1-sean.anderson@linux.dev
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      ba0da2dc
    • net: phy: aquantia: fix -ETIMEDOUT PHY probe failure when firmware not present · 194ef9d0
      Vladimir Oltean authored
      The author of the blamed commit apparently did not notice something
      about aqr_wait_reset_complete(): it polls the exact same register -
      MDIO_MMD_VEND1:VEND1_GLOBAL_FW_ID - as aqr_firmware_load().
      
      Thus, the entire logic after the introduction of aqr_wait_reset_complete() is
      now completely side-stepped, because if aqr_wait_reset_complete()
      succeeds, MDIO_MMD_VEND1:VEND1_GLOBAL_FW_ID could have only been a
      non-zero value. The handling of the case where the register reads as 0
      is dead code, due to the previous -ETIMEDOUT having stopped execution
      and returning a fatal error to the caller. We never attempt to load
      new firmware if no firmware is present.
      
      Based on static code analysis, I guess we should simply introduce a
      switch/case statement based on the return code from aqr_wait_reset_complete(),
      to determine whether to load firmware or not. I am not intending to
      change the procedure through which the driver determines whether to load
      firmware or not, as I am unaware of alternative possibilities.
      
      At the same time, Russell King suggests that if aqr_wait_reset_complete()
      is expected to return -ETIMEDOUT as part of normal operation and not
      just catastrophic failure, the use of phy_read_mmd_poll_timeout() is
      improper, since that has an embedded print inside. Just open-code a
      call to read_poll_timeout() to avoid printing -ETIMEDOUT, but continue
      printing actual read errors from the MDIO bus.
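
The decision described above can be sketched as a small standalone function. This is an illustration rather than the patch itself: `enum fw_action` and `decide_fw_action()` are hypothetical names, and the mapping assumes that a return of 0 means the firmware ID register became non-zero, that -ETIMEDOUT means it stayed at 0, and that any other negative value is a genuine MDIO read error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome of the probe-time decision. */
enum fw_action {
	FW_SKIP_LOAD, /* firmware already running, nothing to load */
	FW_TRY_LOAD,  /* no firmware detected, attempt to load it */
	FW_FAIL       /* bus error, propagate to the caller */
};

/*
 * Map the return code of a "wait for reset complete" poll to an action.
 * wait_ret follows the kernel convention: 0 on success, negative errno
 * on failure.
 */
enum fw_action decide_fw_action(int wait_ret)
{
	assert(wait_ret <= 0);

	switch (wait_ret) {
	case 0:
		/* Register read back non-zero: firmware is present. */
		return FW_SKIP_LOAD;
	case -ETIMEDOUT:
		/* Register stayed 0: expected when no firmware is present. */
		return FW_TRY_LOAD;
	default:
		/* Any other error is a real failure of the read itself. */
		return FW_FAIL;
	}
}
```

The point of the sketch is that a timeout is treated as an ordinary outcome that selects the firmware-load path, while only other errors abort the probe.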
      
      Fixes: ad649a1f ("net: phy: aquantia: wait for FW reset before checking the vendor ID")
      Reported-by: Clark Wang <xiaoning.wang@nxp.com>
      Reported-by: Jon Hunter <jonathanh@nvidia.com>
      Closes: https://lore.kernel.org/netdev/8ac00a45-ac61-41b4-9f74-d18157b8b6bf@nvidia.com/
      Reported-by: Hans-Frieder Vogt <hfdevel@gmx.net>
      Closes: https://lore.kernel.org/netdev/c7c1a3ae-be97-4929-8d89-04c8aa870209@gmx.net/
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
      Tested-by: Hans-Frieder Vogt <hfdevel@gmx.net>
      Link: https://patch.msgid.link/20240913121230.2620122-1-vladimir.oltean@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      194ef9d0