1. 01 Apr, 2022 5 commits
    • Vladimir Oltean's avatar
      Revert "net: dsa: stop updating master MTU from master.c" · 066dfc42
      Vladimir Oltean authored
      This reverts commit a1ff94c2.
      
      Switch drivers that don't implement ->port_change_mtu() will cause the
      DSA master to remain with an MTU of 1500, since we've deleted the other
      code path. In turn, this causes a regression for those systems, where
      MTU-sized traffic can no longer be terminated.
      
      Revert the change taking into account the fact that rtnl_lock() is now
      taken top-level from the callers of dsa_master_setup() and
      dsa_master_teardown(). Also add a comment in order for it to be
      absolutely clear why it is still needed.
      
      Fixes: a1ff94c2 ("net: dsa: stop updating master MTU from master.c")
      Reported-by: default avatarLuiz Angelo Daros de Luca <luizluca@gmail.com>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarLuiz Angelo Daros de Luca <luizluca@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      066dfc42
    • Jean-Philippe Brucker's avatar
      skbuff: fix coalescing for page_pool fragment recycling · 1effe8ca
      Jean-Philippe Brucker authored
      Fix a use-after-free when using page_pool with page fragments. We
      encountered this problem during normal RX in the hns3 driver:
      
      (1) Initially we have three descriptors in the RX queue. The first one
          allocates PAGE1 through page_pool, and the other two allocate one
          half of PAGE2 each. Page references look like this:
      
                      RX_BD1 _______ PAGE1
                      RX_BD2 _______ PAGE2
                      RX_BD3 _________/
      
      (2) Handle RX on the first descriptor. Allocate SKB1, eventually added
          to the receive queue by tcp_queue_rcv().
      
      (3) Handle RX on the second descriptor. Allocate SKB2 and pass it to
          netif_receive_skb():
      
          netif_receive_skb(SKB2)
            ip_rcv(SKB2)
              SKB3 = skb_clone(SKB2)
      
          SKB2 and SKB3 share a reference to PAGE2 through
          skb_shinfo()->dataref. The other ref to PAGE2 is still held by
          RX_BD3:
      
                            SKB2 ---+- PAGE2
                            SKB3 __/   /
                      RX_BD3 _________/
      
       (3b) Now while handling TCP, coalesce SKB3 with SKB1:
      
            tcp_v4_rcv(SKB3)
              tcp_try_coalesce(to=SKB1, from=SKB3)    // succeeds
              kfree_skb_partial(SKB3)
                skb_release_data(SKB3)                // drops one dataref
      
                            SKB1 _____ PAGE1
                                 \____
                            SKB2 _____ PAGE2
                                       /
                      RX_BD3 _________/
      
          In skb_try_coalesce(), __skb_frag_ref() takes a page reference to
          PAGE2, where it should instead have increased the page_pool frag
          reference, pp_frag_count. Without coalescing, when releasing both
          SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now
          when releasing SKB1 and SKB2, two references to PAGE2 will be
          dropped, resulting in underflow.
      
       (3c) Drop SKB2:
      
            af_packet_rcv(SKB2)
              consume_skb(SKB2)
                skb_release_data(SKB2)                // drops second dataref
                  page_pool_return_skb_page(PAGE2)    // drops one pp_frag_count
      
                            SKB1 _____ PAGE1
                                 \____
                                       PAGE2
                                       /
                      RX_BD3 _________/
      
      (4) Userspace calls recvmsg()
          Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we
          release the SKB3 page as well:
      
          tcp_eat_recv_skb(SKB1)
            skb_release_data(SKB1)
              page_pool_return_skb_page(PAGE1)
              page_pool_return_skb_page(PAGE2)        // drops second pp_frag_count
      
      (5) PAGE2 is freed, but the third RX descriptor was still using it!
          In our case this causes IOMMU faults, but it would silently corrupt
          memory if the IOMMU was disabled.
      
      Change the logic that checks whether pp_recycle SKBs can be coalesced.
      We still reject differing pp_recycle between 'from' and 'to' SKBs, but
      in order to avoid the situation described above, we also reject
      coalescing when both 'from' and 'to' are pp_recycled and 'from' is
      cloned.
      
      The new logic allows coalescing a cloned pp_recycle SKB into a page
      refcounted one, because in this case the release (4) will drop the right
      reference, the one taken by skb_try_coalesce().
      
      Fixes: 53e0961d ("page_pool: add frag page recycling support in page pool")
      Suggested-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: default avatarJean-Philippe Brucker <jean-philippe@linaro.org>
      Reviewed-by: default avatarYunsheng Lin <linyunsheng@huawei.com>
      Reviewed-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Acked-by: default avatarIlias Apalodimas <ilias.apalodimas@linaro.org>
      Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1effe8ca
    • Eyal Birger's avatar
      vrf: fix packet sniffing for traffic originating from ip tunnels · 012d69fb
      Eyal Birger authored
      in commit 04893908
      ("vrf: add mac header for tunneled packets when sniffer is attached")
      an Ethernet header was cooked for traffic originating from tunnel devices.
      
      However, the header is added based on whether the mac_header is unset
      and ignores cases where the device doesn't expose a mac header to upper
      layers, such as in ip tunnels like ipip and gre.
      
      Traffic originating from such devices still appears garbled when capturing
      on the vrf device.
      
      Fix by observing whether the original device exposes a header to upper
      layers, similar to the logic done in af_packet.
      
      In addition, skb->mac_len needs to be adjusted after adding the Ethernet
      header for the skb_push/pull() surrounding dev_queue_xmit_nit() to work
      on these packets.
      
      Fixes: 04893908 ("vrf: add mac header for tunneled packets when sniffer is attached")
      Signed-off-by: default avatarEyal Birger <eyal.birger@gmail.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      012d69fb
    • Ziyang Xuan's avatar
      net/tls: fix slab-out-of-bounds bug in decrypt_internal · 9381fe8c
      Ziyang Xuan authored
      The memory size of tls_ctx->rx.iv for AES128-CCM is 12 setting in
      tls_set_sw_offload(). The return value of crypto_aead_ivsize()
      for "ccm(aes)" is 16. So memcpy() require 16 bytes from 12 bytes
      memory space will trigger slab-out-of-bounds bug as following:
      
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in decrypt_internal+0x385/0xc40 [tls]
      Read of size 16 at addr ffff888114e84e60 by task tls/10911
      
      Call Trace:
       <TASK>
       dump_stack_lvl+0x34/0x44
       print_report.cold+0x5e/0x5db
       ? decrypt_internal+0x385/0xc40 [tls]
       kasan_report+0xab/0x120
       ? decrypt_internal+0x385/0xc40 [tls]
       kasan_check_range+0xf9/0x1e0
       memcpy+0x20/0x60
       decrypt_internal+0x385/0xc40 [tls]
       ? tls_get_rec+0x2e0/0x2e0 [tls]
       ? process_rx_list+0x1a5/0x420 [tls]
       ? tls_setup_from_iter.constprop.0+0x2e0/0x2e0 [tls]
       decrypt_skb_update+0x9d/0x400 [tls]
       tls_sw_recvmsg+0x3c8/0xb50 [tls]
      
      Allocated by task 10911:
       kasan_save_stack+0x1e/0x40
       __kasan_kmalloc+0x81/0xa0
       tls_set_sw_offload+0x2eb/0xa20 [tls]
       tls_setsockopt+0x68c/0x700 [tls]
       __sys_setsockopt+0xfe/0x1b0
      
      Replace the crypto_aead_ivsize() with prot->iv_size + prot->salt_size
      when memcpy() iv value in TLS_1_3_VERSION scenario.
      
      Fixes: f295b3ae ("net/tls: Add support of AES128-CCM based ciphers")
      Signed-off-by: default avatarZiyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9381fe8c
    • Taehee Yoo's avatar
      net: sfc: add missing xdp queue reinitialization · 059a47f1
      Taehee Yoo authored
      After rx/tx ring buffer size is changed, kernel panic occurs when
      it acts XDP_TX or XDP_REDIRECT.
      
      When tx/rx ring buffer size is changed(ethtool -G), sfc driver
      reallocates and reinitializes rx and tx queues and their buffer
      (tx_queue->buffer).
      But it misses reinitializing xdp queues(efx->xdp_tx_queues).
      So, while it is acting XDP_TX or XDP_REDIRECT, it uses the uninitialized
      tx_queue->buffer.
      
      A new function efx_set_xdp_channels() is separated from efx_set_channels()
      to handle only xdp queues.
      
      Splat looks like:
         BUG: kernel NULL pointer dereference, address: 000000000000002a
         #PF: supervisor write access in kernel mode
         #PF: error_code(0x0002) - not-present page
         PGD 0 P4D 0
         Oops: 0002 [#4] PREEMPT SMP NOPTI
         RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc]
         CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D           5.17.0+ #55 e8beeee8289528f11357029357cf
         Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80
         RSP: 0018:ffff92f121e45c60 EFLAGS: 00010297
         RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc]
         RAX: 0000000000000040 RBX: ffff92ea506895c0 RCX: ffffffffc0330870
         RDX: 0000000000000001 RSI: 00000001139b10ce RDI: ffff92ea506895c0
         RBP: ffffffffc0358a80 R08: 00000001139b110d R09: 0000000000000000
         R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040
         R13: 0000000000000018 R14: 00000001139b10ce R15: ffff92ea506895c0
         FS:  0000000000000000(0000) GS:ffff92f121ec0000(0000) knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80
         CR2: 000000000000002a CR3: 00000003e6810004 CR4: 00000000007706e0
         RSP: 0018:ffff92f121e85c60 EFLAGS: 00010297
         PKRU: 55555554
         RAX: 0000000000000040 RBX: ffff92ea50689700 RCX: ffffffffc0330870
         RDX: 0000000000000001 RSI: 00000001145a90ce RDI: ffff92ea50689700
         RBP: ffffffffc0358a80 R08: 00000001145a910d R09: 0000000000000000
         R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040
         R13: 0000000000000018 R14: 00000001145a90ce R15: ffff92ea50689700
         FS:  0000000000000000(0000) GS:ffff92f121e80000(0000) knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         CR2: 000000000000002a CR3: 00000003e6810005 CR4: 00000000007706e0
         PKRU: 55555554
         Call Trace:
          <IRQ>
          efx_xdp_tx_buffers+0x12b/0x3d0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5]
          __efx_rx_packet+0x5c3/0x930 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5]
          efx_rx_packet+0x28c/0x2e0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5]
          efx_ef10_ev_process+0x5f8/0xf40 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5]
          ? enqueue_task_fair+0x95/0x550
          efx_poll+0xc4/0x360 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5]
      
      Fixes: 3990a8ff ("sfc: allocate channels for XDP tx queues")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      059a47f1
  2. 31 Mar, 2022 35 commits