1. 02 Jun, 2022 1 commit
  2. 01 Jun, 2022 5 commits
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: use kfree_rcu(ptr, rcu) to release hooks in clean_net path · ab5e5c06
      Pablo Neira Ayuso authored
      Use kfree_rcu(ptr, rcu) variant instead as described by ae089831
      ("netfilter: nf_tables: prefer kfree_rcu(ptr, rcu) variant").
      
      Fixes: f9a43007 ("netfilter: nf_tables: double hook unregistration in netns path")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      ab5e5c06
    • Florian Westphal's avatar
      netfilter: nat: really support inet nat without l3 address · 282e5f8f
      Florian Westphal authored
      When no l3 address is given, priv->family is set to NFPROTO_INET and
      the evaluation function isn't called.
      
      Call it too so l4-only rewrite can work.
      Also add a test case for this.
      
      Fixes: a33f387e ("netfilter: nft_nat: allow to specify layer 4 protocol NAT only")
      Reported-by: default avatarYi Chen <yiche@redhat.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      282e5f8f
    • Eric Dumazet's avatar
      tcp: tcp_rtx_synack() can be called from process context · 0a375c82
      Eric Dumazet authored
      Laurent reported the enclosed report [1]
      
      This bug triggers with following coditions:
      
      0) Kernel built with CONFIG_DEBUG_PREEMPT=y
      
      1) A new passive FastOpen TCP socket is created.
         This FO socket waits for an ACK coming from client to be a complete
         ESTABLISHED one.
      2) A socket operation on this socket goes through lock_sock()
         release_sock() dance.
      3) While the socket is owned by the user in step 2),
         a retransmit of the SYN is received and stored in socket backlog.
      4) At release_sock() time, the socket backlog is processed while
         in process context.
      5) A SYNACK packet is cooked in response of the SYN retransmit.
      6) -> tcp_rtx_synack() is called in process context.
      
      Before blamed commit, tcp_rtx_synack() was always called from BH handler,
      from a timer handler.
      
      Fix this by using TCP_INC_STATS() & NET_INC_STATS()
      which do not assume caller is in non preemptible context.
      
      [1]
      BUG: using __this_cpu_add() in preemptible [00000000] code: epollpep/2180
      caller is tcp_rtx_synack.part.0+0x36/0xc0
      CPU: 10 PID: 2180 Comm: epollpep Tainted: G           OE     5.16.0-0.bpo.4-amd64 #1  Debian 5.16.12-1~bpo11+1
      Hardware name: Supermicro SYS-5039MC-H8TRF/X11SCD-F, BIOS 1.7 11/23/2021
      Call Trace:
       <TASK>
       dump_stack_lvl+0x48/0x5e
       check_preemption_disabled+0xde/0xe0
       tcp_rtx_synack.part.0+0x36/0xc0
       tcp_rtx_synack+0x8d/0xa0
       ? kmem_cache_alloc+0x2e0/0x3e0
       ? apparmor_file_alloc_security+0x3b/0x1f0
       inet_rtx_syn_ack+0x16/0x30
       tcp_check_req+0x367/0x610
       tcp_rcv_state_process+0x91/0xf60
       ? get_nohz_timer_target+0x18/0x1a0
       ? lock_timer_base+0x61/0x80
       ? preempt_count_add+0x68/0xa0
       tcp_v4_do_rcv+0xbd/0x270
       __release_sock+0x6d/0xb0
       release_sock+0x2b/0x90
       sock_setsockopt+0x138/0x1140
       ? __sys_getsockname+0x7e/0xc0
       ? aa_sk_perm+0x3e/0x1a0
       __sys_setsockopt+0x198/0x1e0
       __x64_sys_setsockopt+0x21/0x30
       do_syscall_64+0x38/0xc0
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarLaurent Fasnacht <laurent.fasnacht@proton.ch>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Link: https://lore.kernel.org/r/20220530213713.601888-1-eric.dumazet@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      0a375c82
    • Jakub Kicinski's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · b3c0a9ef
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      1) Missing proper sanitization for nft_set_desc_concat_parse().
      
      2) Missing mutex in nf_tables pre_exit path.
      
      3) Possible double hook unregistration from clean_net path.
      
      4) Missing FLOWI_FLAG_ANYSRC flag in flowtable route lookup.
         Fix incorrect source and destination address in case of NAT.
         Patch from wenxu.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: flowtable: fix nft_flow_route source address for nat case
        netfilter: flowtable: fix missing FLOWI_FLAG_ANYSRC flag
        netfilter: nf_tables: double hook unregistration in netns path
        netfilter: nf_tables: hold mutex on netns pre_exit path
        netfilter: nf_tables: sanitize nft_set_desc_concat_parse()
      ====================
      
      Link: https://lore.kernel.org/r/20220531215839.84765-1-pablo@netfilter.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      b3c0a9ef
    • Guoju Fang's avatar
      net: sched: add barrier to fix packet stuck problem for lockless qdisc · 2e8728c9
      Guoju Fang authored
      In qdisc_run_end(), the spin_unlock() only has store-release semantic,
      which guarantees all earlier memory access are visible before it. But
      the subsequent test_bit() has no barrier semantics so may be reordered
      ahead of the spin_unlock(). The store-load reordering may cause a packet
      stuck problem.
      
      The concurrent operations can be described as below,
               CPU 0                      |          CPU 1
         qdisc_run_end()                  |     qdisc_run_begin()
                .                         |           .
       ----> /* may be reorderd here */   |           .
      |         .                         |           .
      |     spin_unlock()                 |         set_bit()
      |         .                         |         smp_mb__after_atomic()
       ---- test_bit()                    |         spin_trylock()
                .                         |          .
      
      Consider the following sequence of events:
          CPU 0 reorder test_bit() ahead and see MISSED = 0
          CPU 1 calls set_bit()
          CPU 1 calls spin_trylock() and return fail
          CPU 0 executes spin_unlock()
      
      At the end of the sequence, CPU 0 calls spin_unlock() and does nothing
      because it see MISSED = 0. The skb on CPU 1 has beed enqueued but no one
      take it, until the next cpu pushing to the qdisc (if ever ...) will
      notice and dequeue it.
      
      This patch fix this by adding one explicit barrier. As spin_unlock() and
      test_bit() ordering is a store-load ordering, a full memory barrier
      smp_mb() is needed here.
      
      Fixes: a90c57f2 ("net: sched: fix packet stuck problem for lockless qdisc")
      Signed-off-by: default avatarGuoju Fang <gjfang@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20220528101628.120193-1-gjfang@linux.alibaba.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      2e8728c9
  3. 31 May, 2022 10 commits
  4. 29 May, 2022 3 commits
    • David S. Miller's avatar
      Merge branch 'sfc-fixes' · 90343f57
      David S. Miller authored
      Íñigo Huguet says:
      
      ====================
      sfc: fix some efx_separate_tx_channels errors
      
      Trying to load sfc driver with modparam efx_separate_tx_channels=1
      resulted in errors during initialization and not being able to use the
      NIC. This patches fix a few bugs and make it work again.
      
      v2:
      * added Martin's patch instead of a previous mine. Mine one solved some
      of the initialization errors, but Martin's solves them also in all
      possible cases.
      * removed whitespaces cleanup, as requested by Jakub
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      90343f57
    • Íñigo Huguet's avatar
      sfc: fix wrong tx channel offset with efx_separate_tx_channels · c308dfd1
      Íñigo Huguet authored
      tx_channel_offset is calculated in efx_allocate_msix_channels, but it is
      also calculated again in efx_set_channels because it was originally done
      there, and when efx_allocate_msix_channels was introduced it was
      forgotten to be removed from efx_set_channels.
      
      Moreover, the old calculation is wrong when using
      efx_separate_tx_channels because now we can have XDP channels after the
      TX channels, so n_channels - n_tx_channels doesn't point to the first TX
      channel.
      
      Remove the old calculation from efx_set_channels, and add the
      initialization of this variable if MSI or legacy interrupts are used,
      next to the initialization of the rest of the related variables, where
      it was missing.
      
      Fixes: 3990a8ff ("sfc: allocate channels for XDP tx queues")
      Reported-by: default avatarTianhao Zhao <tizhao@redhat.com>
      Signed-off-by: default avatarÍñigo Huguet <ihuguet@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c308dfd1
    • Martin Habets's avatar
      sfc: fix considering that all channels have TX queues · 2e102b53
      Martin Habets authored
      Normally, all channels have RX and TX queues, but this is not true if
      modparam efx_separate_tx_channels=1 is used. In that cases, some
      channels only have RX queues and others only TX queues (or more
      preciselly, they have them allocated, but not initialized).
      
      Fix efx_channel_has_tx_queues to return the correct value for this case
      too.
      
      Messages shown at probe time before the fix:
       sfc 0000:03:00.0 ens6f0np0: MC command 0x82 inlen 544 failed rc=-22 (raw=0) arg=0
       ------------[ cut here ]------------
       netdevice: ens6f0np0: failed to initialise TXQ -1
       WARNING: CPU: 1 PID: 626 at drivers/net/ethernet/sfc/ef10.c:2393 efx_ef10_tx_init+0x201/0x300 [sfc]
       [...] stripped
       RIP: 0010:efx_ef10_tx_init+0x201/0x300 [sfc]
       [...] stripped
       Call Trace:
        efx_init_tx_queue+0xaa/0xf0 [sfc]
        efx_start_channels+0x49/0x120 [sfc]
        efx_start_all+0x1f8/0x430 [sfc]
        efx_net_open+0x5a/0xe0 [sfc]
        __dev_open+0xd0/0x190
        __dev_change_flags+0x1b3/0x220
        dev_change_flags+0x21/0x60
       [...] stripped
      
      Messages shown at remove time before the fix:
       sfc 0000:03:00.0 ens6f0np0: failed to flush 10 queues
       sfc 0000:03:00.0 ens6f0np0: failed to flush queues
      
      Fixes: 8700aff0 ("sfc: fix channel allocation with brute force")
      Reported-by: default avatarTianhao Zhao <tizhao@redhat.com>
      Signed-off-by: default avatarMartin Habets <habetsm.xilinx@gmail.com>
      Tested-by: default avatarÍñigo Huguet <ihuguet@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2e102b53
  5. 28 May, 2022 13 commits
  6. 27 May, 2022 8 commits