1. 02 Aug, 2023 1 commit
    • wifi: brcmfmac: Fix field-spanning write in brcmf_scan_params_v2_to_v1() · 16e455a4
      Hans de Goede authored
      Using brcmfmac with 6.5-rc3 on a brcmfmac43241b4-sdio triggers
      a backtrace caused by the following field-spanning warning:
      
      memcpy: detected field-spanning write (size 120) of single field
        "&params_le->channel_list[0]" at
        drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c:1072 (size 2)
      
      The driver still works after this warning. The warning was introduced by the
      new field-spanning write checks which were enabled recently.
      
      Fix this by replacing the channel_list[1] declaration at the end of
      the struct with a flexible array declaration.
      
      Most users of struct brcmf_scan_params_le calculate the allocation size
      as the size of the non-flex-array part of the struct plus the needed
      extra space, so they do not care about sizeof(struct brcmf_scan_params_le).
      
      brcmf_notify_escan_complete() however uses the struct on the stack,
      expecting there to be room for at least 1 entry in the channel-list
      to store the special -1 abort channel-id.
      
      To keep this working, use an anonymous union containing a padding
      member and the actual channel_list flexible array.
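      A minimal userspace sketch of the two sizing patterns described above,
      assuming a hypothetical, simplified struct scan_params in place of
      struct brcmf_scan_params_le (field names and widths are illustrative,
      not the driver's). Heap users size the allocation from offsetof() plus
      the entries they need; the on-stack user gets room for one entry from a
      wrapping union. This standard-C wrapper is a variant of the idea, not
      the patch's layout, which puts the union inside the struct itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct brcmf_scan_params_le. */
struct scan_params {
	uint32_t nprobes;
	uint32_t channel_num;
	uint16_t channel_list[];	/* flexible array: no storage of its own */
};

/* Heap users: fixed part of the struct plus room for n channel entries. */
static struct scan_params *scan_params_alloc(size_t n_channels)
{
	size_t size = offsetof(struct scan_params, channel_list) +
		      n_channels * sizeof(uint16_t);
	struct scan_params *p = calloc(1, size);

	if (p)
		p->channel_num = (uint32_t)n_channels;
	return p;
}

/*
 * On-stack user: sizeof(struct scan_params) no longer includes any
 * channel_list entry, so reserve space for exactly one.
 */
union scan_params_one {
	struct scan_params params;
	unsigned char storage[sizeof(struct scan_params) + sizeof(uint16_t)];
};

/* Fill a stack instance with the special -1 "abort" channel id. */
static void scan_params_set_abort(union scan_params_one *u)
{
	u->params.nprobes = 0;
	u->params.channel_num = 1;
	u->params.channel_list[0] = (uint16_t)-1;	/* 0xFFFF */
}
```

      The driver additionally converts values with cpu_to_le16(); the sketch
      omits endianness handling.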
      
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Franky Lin <franky.lin@broadcom.com>
      Signed-off-by: Kalle Valo <kvalo@kernel.org>
      Link: https://lore.kernel.org/r/20230729140500.27892-1-hdegoede@redhat.com
  2. 01 Aug, 2023 1 commit
    • wifi: ray_cs: Replace 1-element array with flexible array · 1d7dd5aa
      Kees Cook authored
      The trailing array member of struct tx_buf was defined as a 1-element
      array, but used as a flexible array. This resulted in build warnings:
      
          In function 'fortify_memset_chk',
              inlined from 'memset_io' at /kisskb/src/arch/mips/include/asm/io.h:486:2,
              inlined from 'build_auth_frame' at /kisskb/src/drivers/net/wireless/legacy/ray_cs.c:2697:2:
          /kisskb/src/include/linux/fortify-string.h:493:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
            493 |                         __write_overflow_field(p_size_field, size);
                |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Replace it with an actual flexible array. Binary difference comparison
      shows a single change in output:
      
      │  drivers/net/wireless/legacy/ray_cs.c:883
      │       lea    0x1c(%rbp),%r13d
      │ -     cmp    $0x7c3,%r13d
      │ +     cmp    $0x7c4,%r13d
      
      This is from:
      
              if (len + TX_HEADER_LENGTH > TX_BUF_SIZE) {
      
      specifically:
      
       #define TX_BUF_SIZE (2048 - sizeof(struct tx_msg))
      
      This appears to have been originally buggy, so the change is correct.
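      The one-byte shift in the comparison follows from sizeof(): a trailing
      1-element array counts toward the struct size, while a flexible array
      does not. A sketch with hypothetical structs (msg_old, msg_new and the
      BUF_SIZE_* macros are illustrative, not the driver's definitions); only
      uint8_t members are used so no padding is expected on common ABIs,
      which mimics the effect of a packed struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: uint8_t-only members, so no padding is expected. */
struct msg_old {
	uint8_t hdr[16];
	uint8_t var[1];		/* 1-element "fake" flexible array */
};

struct msg_new {
	uint8_t hdr[16];
	uint8_t var[];		/* real flexible array: adds nothing to sizeof */
};

/* Same shape as TX_BUF_SIZE: space left in 2048 bytes after the header. */
#define BUF_SIZE_OLD (2048 - sizeof(struct msg_old))
#define BUF_SIZE_NEW (2048 - sizeof(struct msg_new))

/* Bounds checks in the style of "len + TX_HEADER_LENGTH > TX_BUF_SIZE". */
static int fits_old(size_t len) { return len <= BUF_SIZE_OLD; }
static int fits_new(size_t len) { return len <= BUF_SIZE_NEW; }
```

      With the 1-element array, the largest payload is rejected by one byte,
      which matches the direction of the cmp constant change shown above.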
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Closes: https://lore.kernel.org/all/88f83d73-781d-bdc-126-aa629cb368c@linux-m68k.org
      Cc: Kalle Valo <kvalo@kernel.org>
      Cc: linux-wireless@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Kalle Valo <kvalo@kernel.org>
      Link: https://lore.kernel.org/r/20230728231245.never.309-kees@kernel.org
  3. 26 Jul, 2023 13 commits
  4. 24 Jul, 2023 2 commits
  5. 20 Jul, 2023 10 commits
  6. 19 Jul, 2023 13 commits
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · e80698b7
      Jakub Kicinski authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2023-07-19
      
      We've added 4 non-merge commits during the last 1 day(s) which contain
      a total of 3 files changed, 55 insertions(+), 10 deletions(-).
      
      The main changes are:
      
      1) Fix stack depth check in presence of async callbacks,
         from Kumar Kartikeya Dwivedi.
      
      2) Fix BTI type used for freplace attached functions,
         from Alexander Duyck.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf, arm64: Fix BTI type used for freplace attached functions
        selftests/bpf: Add more tests for check_max_stack_depth bug
        bpf: Repeat check_max_stack_depth for async callbacks
        bpf: Fix subprog idx logic in check_max_stack_depth
      ====================
      
      Link: https://lore.kernel.org/r/20230719174502.74023-1-alexei.starovoitov@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv4: ip_gre: fix return value check in erspan_xmit() · aa7cb378
      Yuanjun Gong authored
      Go to free_skb if pskb_trim() returns an unexpected result
      in erspan_xmit().

      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: ip_gre: fix return value check in erspan_fb_xmit() · 02d84f3e
      Yuanjun Gong authored
      Go to err_free_skb if pskb_trim() returns an unexpected result
      in erspan_fb_xmit().

      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers:net: fix return value check in ocelot_fdma_receive_skb · bce56033
      Yuanjun Gong authored
      ocelot_fdma_receive_skb() should return false if pskb_trim()
      returns an unexpected value.

      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers: net: fix return value check in emac_tso_csum() · 78a93c31
      Yuanjun Gong authored
      In emac_tso_csum(), return an error code if pskb_trim()
      returns an unexpected value.

      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net:ipv6: check return value of pskb_trim() · 4258faa1
      Yuanjun Gong authored
      Go to tx_err if pskb_trim() returns an unexpected result
      in ip6erspan_tunnel_xmit().
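      The pskb_trim() fixes listed here all follow one pattern. A minimal
      userspace sketch with hypothetical names: struct buf stands in for an
      sk_buff, buf_trim() mimics pskb_trim() by returning 0 on success or a
      negative errno such as -ENOMEM on failure, and xmit() jumps to its
      cleanup label instead of ignoring the result.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for an sk_buff: a heap buffer and its length. */
struct buf {
	unsigned char *data;
	size_t len;
	int fail_trim;		/* test hook: force the trim to fail */
};

/* Like pskb_trim(): 0 on success, negative errno on failure. */
static int buf_trim(struct buf *b, size_t len)
{
	if (len >= b->len)
		return 0;	/* nothing to trim */
	if (b->fail_trim)
		return -ENOMEM;	/* e.g. reallocation failed */
	b->len = len;
	return 0;
}

static void buf_free(struct buf *b)
{
	free(b->data);
	free(b);
}

/* Hypothetical xmit path: returns 0 if sent; on trim failure frees b. */
static int xmit(struct buf *b, size_t max_len)
{
	int err;

	if (b->len > max_len) {
		err = buf_trim(b, max_len);
		if (err)
			goto free_buf;	/* the bug was ignoring this result */
	}
	/* ... hand the (possibly trimmed) buffer to the device here ... */
	return 0;

free_buf:
	buf_free(b);
	return err;
}
```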
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv4: Use kfree_sensitive instead of kfree · daa75144
      Wang Ming authored
      key might contain the private part of the key, so it is better
      to use kfree_sensitive() to free it.
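      kfree_sensitive() clears the allocation before freeing it, so key
      material does not linger in freed memory. A userspace analogue,
      assuming the caller passes the buffer length (the kernel obtains it
      with ksize()); the volatile pointer keeps the compiler from dropping
      the clearing as a dead store, similar in spirit to memzero_explicit().
      secure_zero() and free_sensitive() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Clear len bytes through a volatile pointer so the stores are kept. */
static void secure_zero(void *p, size_t len)
{
	volatile unsigned char *vp = p;

	while (len--)
		*vp++ = 0;
}

/* Hypothetical analogue of kfree_sensitive(): wipe first, then free. */
static void free_sensitive(void *p, size_t len)
{
	if (!p)
		return;
	secure_zero(p, len);
	free(p);
}
```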
      
      Fixes: 38320c70 ("[IPSEC]: Use crypto_aead and authenc in ESP")
      Signed-off-by: Wang Ming <machel@vivo.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 7f5acea7
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-07-17 (iavf)
      
      This series contains updates to iavf driver only.
      
      Ding Hui fixes a use-after-free issue by calling netif_napi_del() for all
      allocated q_vectors. He also resolves an out-of-bounds issue by not
      updating to the new values when a timeout is encountered.

      Marcin and Ahmed change the way resets are handled: callbacks that
      operate under the RTNL lock now wait for the reset to finish, while the
      rtnl_lock-sensitive functions in the reset flow schedule the netdev
      update for later, removing the circular dependency on the critical lock.
      
      * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        iavf: fix reset task race with iavf_remove()
        iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies
        Revert "iavf: Do not restart Tx queues after reset task failure"
        Revert "iavf: Detach device during reset task"
        iavf: Wait for reset in callbacks which trigger it
        iavf: use internal state to free traffic IRQs
        iavf: Fix out-of-bounds when setting channels on remove
        iavf: Fix use-after-free in free_netdev
      ====================
      
      Link: https://lore.kernel.org/r/20230717175205.3217774-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'tcp-annotate-data-races-in-tcp_rsk-req' · e9b2bd96
      Jakub Kicinski authored
      Eric Dumazet says:
      
      ====================
      tcp: annotate data-races in tcp_rsk(req)
      
      Small series addressing two syzbot reports around tcp_rsk(req)
      ====================
      
      Link: https://lore.kernel.org/r/20230717144445.653164-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: annotate data-races around tcp_rsk(req)->ts_recent · eba20811
      Eric Dumazet authored
      TCP request sockets are lockless, so tcp_rsk(req)->ts_recent
      can change while another CPU is reading it, as syzbot noticed.
      
      This is harmless, but we should annotate the known races.
      
      Note that tcp_check_req() changes req->ts_recent a bit early;
      we might change this in the future.
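      In the kernel these annotations are READ_ONCE()/WRITE_ONCE(), which
      force a single untorn access and tell KCSAN the race is intentional.
      The closest portable C11 analogue is a relaxed atomic access; the
      sketch below uses a hypothetical struct request_sock_lite with that
      substitution, so it illustrates the idea rather than the kernel macros.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, trimmed-down request socket with one racy field. */
struct request_sock_lite {
	_Atomic uint32_t ts_recent;	/* written and read without a lock */
};

/* Writer side (cf. WRITE_ONCE): a single untorn store. */
static void req_set_ts_recent(struct request_sock_lite *req, uint32_t val)
{
	atomic_store_explicit(&req->ts_recent, val, memory_order_relaxed);
}

/*
 * Reader side (cf. READ_ONCE): a single untorn load. Callers should use
 * the returned copy rather than re-reading the field.
 */
static uint32_t req_get_ts_recent(struct request_sock_lite *req)
{
	return atomic_load_explicit(&req->ts_recent, memory_order_relaxed);
}
```

      Relaxed ordering is chosen because the annotation only needs to rule
      out torn or repeated accesses, not to order other memory operations.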
      
      BUG: KCSAN: data-race in tcp_check_req / tcp_check_req
      
      write to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 1:
      tcp_check_req+0x694/0xc70 net/ipv4/tcp_minisocks.c:762
      tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071
      ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205
      ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254
      dst_input include/net/dst.h:468 [inline]
      ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569
      __netif_receive_skb_one_core net/core/dev.c:5493 [inline]
      __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607
      process_backlog+0x21f/0x380 net/core/dev.c:5935
      __napi_poll+0x60/0x3b0 net/core/dev.c:6498
      napi_poll net/core/dev.c:6565 [inline]
      net_rx_action+0x32b/0x750 net/core/dev.c:6698
      __do_softirq+0xc1/0x265 kernel/softirq.c:571
      do_softirq+0x7e/0xb0 kernel/softirq.c:472
      __local_bh_enable_ip+0x64/0x70 kernel/softirq.c:396
      local_bh_enable+0x1f/0x20 include/linux/bottom_half.h:33
      rcu_read_unlock_bh include/linux/rcupdate.h:843 [inline]
      __dev_queue_xmit+0xabb/0x1d10 net/core/dev.c:4271
      dev_queue_xmit include/linux/netdevice.h:3088 [inline]
      neigh_hh_output include/net/neighbour.h:528 [inline]
      neigh_output include/net/neighbour.h:542 [inline]
      ip_finish_output2+0x700/0x840 net/ipv4/ip_output.c:229
      ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:317
      NF_HOOK_COND include/linux/netfilter.h:292 [inline]
      ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:431
      dst_output include/net/dst.h:458 [inline]
      ip_local_out net/ipv4/ip_output.c:126 [inline]
      __ip_queue_xmit+0xa4d/0xa70 net/ipv4/ip_output.c:533
      ip_queue_xmit+0x38/0x40 net/ipv4/ip_output.c:547
      __tcp_transmit_skb+0x1194/0x16e0 net/ipv4/tcp_output.c:1399
      tcp_transmit_skb net/ipv4/tcp_output.c:1417 [inline]
      tcp_write_xmit+0x13ff/0x2fd0 net/ipv4/tcp_output.c:2693
      __tcp_push_pending_frames+0x6a/0x1a0 net/ipv4/tcp_output.c:2877
      tcp_push_pending_frames include/net/tcp.h:1952 [inline]
      __tcp_sock_set_cork net/ipv4/tcp.c:3336 [inline]
      tcp_sock_set_cork+0xe8/0x100 net/ipv4/tcp.c:3343
      rds_tcp_xmit_path_complete+0x3b/0x40 net/rds/tcp_send.c:52
      rds_send_xmit+0xf8d/0x1420 net/rds/send.c:422
      rds_send_worker+0x42/0x1d0 net/rds/threads.c:200
      process_one_work+0x3e6/0x750 kernel/workqueue.c:2408
      worker_thread+0x5f2/0xa10 kernel/workqueue.c:2555
      kthread+0x1d7/0x210 kernel/kthread.c:379
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
      
      read to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 0:
      tcp_check_req+0x32a/0xc70 net/ipv4/tcp_minisocks.c:622
      tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071
      ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205
      ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254
      dst_input include/net/dst.h:468 [inline]
      ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569
      __netif_receive_skb_one_core net/core/dev.c:5493 [inline]
      __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607
      process_backlog+0x21f/0x380 net/core/dev.c:5935
      __napi_poll+0x60/0x3b0 net/core/dev.c:6498
      napi_poll net/core/dev.c:6565 [inline]
      net_rx_action+0x32b/0x750 net/core/dev.c:6698
      __do_softirq+0xc1/0x265 kernel/softirq.c:571
      run_ksoftirqd+0x17/0x20 kernel/softirq.c:939
      smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164
      kthread+0x1d7/0x210 kernel/kthread.c:379
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
      
      value changed: 0x1cd237f1 -> 0x1cd237f2
      
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230717144445.653164-3-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: annotate data-races around tcp_rsk(req)->txhash · 5e526552
      Eric Dumazet authored
      TCP request sockets are lockless, so some of their fields
      can change while another CPU is reading them, as syzbot noticed.
      
      This is usually harmless, but we should annotate the known
      races.
      
      This patch takes care of tcp_rsk(req)->txhash;
      a separate one is needed for tcp_rsk(req)->ts_recent.
      
      BUG: KCSAN: data-race in tcp_make_synack / tcp_rtx_synack
      
      write to 0xffff8881362304bc of 4 bytes by task 32083 on cpu 1:
      tcp_rtx_synack+0x9d/0x2a0 net/ipv4/tcp_output.c:4213
      inet_rtx_syn_ack+0x38/0x80 net/ipv4/inet_connection_sock.c:880
      tcp_check_req+0x379/0xc70 net/ipv4/tcp_minisocks.c:665
      tcp_v6_rcv+0x125b/0x1b20 net/ipv6/tcp_ipv6.c:1673
      ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437
      ip6_input_finish net/ipv6/ip6_input.c:482 [inline]
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491
      dst_input include/net/dst.h:468 [inline]
      ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309
      __netif_receive_skb_one_core net/core/dev.c:5452 [inline]
      __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566
      netif_receive_skb_internal net/core/dev.c:5652 [inline]
      netif_receive_skb+0x4a/0x310 net/core/dev.c:5711
      tun_rx_batched+0x3bf/0x400
      tun_get_user+0x1d24/0x22b0 drivers/net/tun.c:1997
      tun_chr_write_iter+0x18e/0x240 drivers/net/tun.c:2043
      call_write_iter include/linux/fs.h:1871 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x4ab/0x7d0 fs/read_write.c:584
      ksys_write+0xeb/0x1a0 fs/read_write.c:637
      __do_sys_write fs/read_write.c:649 [inline]
      __se_sys_write fs/read_write.c:646 [inline]
      __x64_sys_write+0x42/0x50 fs/read_write.c:646
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffff8881362304bc of 4 bytes by task 32078 on cpu 0:
      tcp_make_synack+0x367/0xb40 net/ipv4/tcp_output.c:3663
      tcp_v6_send_synack+0x72/0x420 net/ipv6/tcp_ipv6.c:544
      tcp_conn_request+0x11a8/0x1560 net/ipv4/tcp_input.c:7059
      tcp_v6_conn_request+0x13f/0x180 net/ipv6/tcp_ipv6.c:1175
      tcp_rcv_state_process+0x156/0x1de0 net/ipv4/tcp_input.c:6494
      tcp_v6_do_rcv+0x98a/0xb70 net/ipv6/tcp_ipv6.c:1509
      tcp_v6_rcv+0x17b8/0x1b20 net/ipv6/tcp_ipv6.c:1735
      ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437
      ip6_input_finish net/ipv6/ip6_input.c:482 [inline]
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491
      dst_input include/net/dst.h:468 [inline]
      ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309
      __netif_receive_skb_one_core net/core/dev.c:5452 [inline]
      __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566
      netif_receive_skb_internal net/core/dev.c:5652 [inline]
      netif_receive_skb+0x4a/0x310 net/core/dev.c:5711
      tun_rx_batched+0x3bf/0x400
      tun_get_user+0x1d24/0x22b0 drivers/net/tun.c:1997
      tun_chr_write_iter+0x18e/0x240 drivers/net/tun.c:2043
      call_write_iter include/linux/fs.h:1871 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x4ab/0x7d0 fs/read_write.c:584
      ksys_write+0xeb/0x1a0 fs/read_write.c:637
      __do_sys_write fs/read_write.c:649 [inline]
      __se_sys_write fs/read_write.c:646 [inline]
      __x64_sys_write+0x42/0x50 fs/read_write.c:646
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x91d25731 -> 0xe79325cd
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 32078 Comm: syz-executor.4 Not tainted 6.5.0-rc1-syzkaller-00033-geb26cbb1 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
      
      Fixes: 58d607d3 ("tcp: provide skb->hash to synack packets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230717144445.653164-2-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • octeontx2-pf: mcs: Generate hash key using ecb(aes) · e7002b3b
      Subbaraya Sundeep authored
      Hardware-generated encryption and ICV tags were found to
      be wrong when tested with the IEEE MACSEC test vectors.
      This is because, per the HRM, the hash key (derived by
      AES-ECB block encryption of an all-zeros block with the SAK)
      has to be programmed by software into the
      MCSX_RS_MCS_CPM_TX_SLAVE_SA_PLCY_MEM_4X register.
      Fix this by generating the hash key in software and
      configuring it in hardware.
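      The derivation written out: the hash key is the GCM hash subkey H,
      obtained by encrypting one all-zero 128-bit block under the SAK.
      AES-ECB applied to a single block is just the raw AES block cipher,
      which is how GCM (and hence GCM-AES MACsec) defines H. The register
      programming itself is hardware-specific and not shown.

```latex
H = \mathrm{AES}_{\mathrm{SAK}}\!\left(0^{128}\right)
```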
      
      Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
      Link: https://lore.kernel.org/r/1689574603-28093-1-git-send-email-sbhatta@marvell.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • igc: Prevent garbled TX queue with XDP ZEROCOPY · 78adb4bc
      Florian Kauer authored
      In normal operation, each populated queue item has
      next_to_watch pointing to the last TX desc of the packet,
      while each cleaned item has it set to 0. In particular,
      next_to_use that points to the next (necessarily clean)
      item to use has next_to_watch set to 0.
      
      When the TX queue is used both by an application using
      AF_XDP with ZEROCOPY as well as a second non-XDP application
      generating high traffic, the queue pointers can get in
      an invalid state where next_to_use points to an item
      where next_to_watch is NOT set to 0.
      
      However, the implementation assumes in several places
      that this never happens, so when it does, things go
      wrong. First, within the loop in igc_clean_tx_irq(),
      next_to_clean can overtake next_to_use. This eventually
      prevents any further transmission via this queue, which
      never gets unblocked or signaled. Second, if the queue
      is in this garbled state, the inner loop of
      igc_clean_tx_ring() never terminates, completely
      hogging a CPU core.
      
      The reason is that igc_xdp_xmit_zc() reads next_to_use
      before acquiring the lock and writes it back
      (potentially unmodified) later. If it was modified
      before the lock was taken, the outdated next_to_use is
      written back, pointing to an item that was already used
      elsewhere (and whose next_to_watch was thus already set).
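      The stale write-back can be reproduced deterministically without
      threads. In this hypothetical model (struct ring, xmit_buggy(),
      xmit_fixed() and other_producer() are illustrative names, not igc
      code), another producer advances the ring between the index read and
      the simulated lock. The buggy path, sending zero frames, writes its
      stale snapshot back unmodified; the fixed path reads next_to_use only
      inside the critical section. No real lock is used.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8

/* Hypothetical TX ring: only the producer index matters here. */
struct ring {
	uint16_t next_to_use;
	uint16_t next_to_watch[RING_SIZE];	/* nonzero = slot in use */
};

/* Populate slot i; returns the index after it. */
static uint16_t claim_slot(struct ring *r, uint16_t i)
{
	r->next_to_watch[i] = 1;
	return (uint16_t)((i + 1) % RING_SIZE);
}

/* Invariant from the commit: next_to_use must point at a clean slot. */
static int ntu_is_clean(const struct ring *r)
{
	return r->next_to_watch[r->next_to_use] == 0;
}

/* Simulated interleaving: runs between the index read and the lock. */
typedef void (*interleave_fn)(struct ring *r);

static void other_producer(struct ring *r)
{
	r->next_to_use = claim_slot(r, r->next_to_use);
}

/* Buggy: next_to_use is read BEFORE the (simulated) lock. */
static void xmit_buggy(struct ring *r, int nframes, interleave_fn race)
{
	uint16_t ntu = r->next_to_use;	/* may go stale */

	race(r);
	/* lock(); */
	while (nframes-- > 0)
		ntu = claim_slot(r, ntu);
	r->next_to_use = ntu;		/* stale value written back */
	/* unlock(); */
}

/* Fixed: next_to_use is read only while holding the lock. */
static void xmit_fixed(struct ring *r, int nframes, interleave_fn race)
{
	uint16_t ntu;

	race(r);
	/* lock(); */
	ntu = r->next_to_use;
	while (nframes-- > 0)
		ntu = claim_slot(r, ntu);
	r->next_to_use = ntu;
	/* unlock(); */
}
```

      With nframes == 0, xmit_buggy() leaves next_to_use pointing at the slot
      other_producer() just populated, which is the garbled state described
      above.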
      
      Fixes: 9acf59a7 ("igc: Enable TX via AF_XDP zero-copy")
      Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20230717175444.3217831-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>