1. 05 Jun, 2024 2 commits
  2. 04 Jun, 2024 10 commits
    • net: tls: fix marking packets as decrypted · a535d594
      Jakub Kicinski authored
      For TLS offload we mark packets with skb->decrypted to make sure
      they don't escape the host without getting encrypted first.
      The crypto state lives in the socket, so it may get detached
      by a call to skb_orphan(). As a safety check, the egress path
      drops all packets with skb->decrypted and no "crypto-safe" socket.
      
      The skb marking was added to sendpage only (and not sendmsg),
      because tls_device injected data into the TCP stack using sendpage.
      This special case was missed when sendpage got folded into sendmsg.
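      A minimal userspace sketch of the egress safety check described above.
      It assumes hypothetical types sock_model and skb_model standing in for
      struct sock and struct sk_buff; it is not the kernel implementation, only
      the rule it enforces: a packet marked decrypted may leave only while it
      is still attached to a crypto-safe socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model; every name here is hypothetical, not kernel API. */
struct sock_model {
    bool crypto_safe;        /* socket holds the TLS offload crypto state */
};

struct skb_model {
    bool decrypted;          /* plaintext that the device still must encrypt */
    struct sock_model *sk;   /* owning socket, NULL once orphaned */
};

/* Detach the packet from its socket, as skb_orphan() does. */
void model_orphan(struct skb_model *skb)
{
    skb->sk = NULL;
}

/* Egress check: a packet marked decrypted may leave only while it still
 * belongs to a crypto-safe socket; otherwise it is dropped. */
bool model_egress_allowed(const struct skb_model *skb)
{
    if (!skb->decrypted)
        return true;
    return skb->sk != NULL && skb->sk->crypto_safe;
}
```

      In this model an unmarked packet passes the check regardless of its
      socket, which illustrates why losing the marking on the sendmsg path
      undermines the safety net.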
      
      Fixes: c5c37af6 ("tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240530232607.82686-1-kuba@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge tag 'wireless-2024-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · d6301802
      Jakub Kicinski authored
      Kalle Valo says:
      
      ====================
      wireless fixes for v6.10-rc3
      
      The first fixes for v6.10. And we have a big one, I suspect the
      biggest wireless pull request we ever had. There are fixes all over,
      both in stack and drivers. Likely the most important here are mt76 not
      working on mt7615 devices, ath11k not being able to connect to 6 GHz
      networks and rtlwifi suffering from packet loss. But of course there's
      much more.
      
      * tag 'wireless-2024-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (37 commits)
        wifi: rtlwifi: Ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS
        wifi: mt76: mt7615: add missing chanctx ops
        wifi: wilc1000: document SRCU usage instead of SRCU
        Revert "wifi: wilc1000: set atomic flag on kmemdup in srcu critical section"
        Revert "wifi: wilc1000: convert list management to RCU"
        wifi: mac80211: fix UBSAN noise in ieee80211_prep_hw_scan()
        wifi: mac80211: correctly parse Spatial Reuse Parameter Set element
        wifi: mac80211: fix Spatial Reuse element size check
        wifi: iwlwifi: mvm: don't read past the mfuart notifcation
        wifi: iwlwifi: mvm: Fix scan abort handling with HW rfkill
        wifi: iwlwifi: mvm: check n_ssids before accessing the ssids
        wifi: iwlwifi: mvm: properly set 6 GHz channel direct probe option
        wifi: iwlwifi: mvm: handle BA session teardown in RF-kill
        wifi: iwlwifi: mvm: Handle BIGTK cipher in kek_kck cmd
        wifi: iwlwifi: mvm: remove stale STA link data during restart
        wifi: iwlwifi: dbg_ini: move iwl_dbg_tlv_free outside of debugfs ifdef
        wifi: iwlwifi: mvm: set properly mac header
        wifi: iwlwifi: mvm: revert gen2 TX A-MPDU size to 64
        wifi: iwlwifi: mvm: d3: fix WoWLAN command version lookup
        wifi: iwlwifi: mvm: fix a crash on 7265
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20240603115129.9494CC2BD10@smtp.kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • lib/test_rhashtable: add missing MODULE_DESCRIPTION() macro · c6cab01d
      Jeff Johnson authored
      make allmodconfig && make W=1 C=1 reports:
      WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_rhashtable.o
      
      Add the missing invocation of the MODULE_DESCRIPTION() macro.
      Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
      Link: https://lore.kernel.org/r/20240531-md-lib-test_rhashtable-v1-1-cd6d4138f1b6@quicinc.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'dst_cache-fix-possible-races' · d730a42c
      Jakub Kicinski authored
      Eric Dumazet says:
      
      ====================
      dst_cache: fix possible races
      
      This series is inspired by various undisclosed syzbot
      reports hinting at corruptions in dst_cache structures.
      
      It seems at least four users of dst_cache are racy against
      BH reentrancy.
      
      The last patch adds a DEBUG_NET check to catch future misuses.
      ====================
      
      Link: https://lore.kernel.org/r/20240531132636.2637995-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dst_cache: add two DEBUG_NET warnings · 2fe6fb36
      Eric Dumazet authored
      After fixing four different bugs involving dst_cache
      users, it might be worth adding a check about BH being
      blocked by dst_cache callers.
      
      DEBUG_NET_WARN_ON_ONCE(!in_softirq());
      
      It is not fatal: if we missed a valid case where no
      BH deadlock is to be feared, we might change this.
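      To illustrate the warn-once pattern behind DEBUG_NET_WARN_ON_ONCE(),
      here is a hypothetical userspace sketch: model_in_softirq stands in for
      in_softirq(), and MODEL_WARN_ON_ONCE is an illustrative macro, not the
      kernel's definition. The condition is evaluated on every call, but each
      call site reports only the first violation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for in_softirq(), which is nonzero while a softirq
 * runs or while BH is disabled. */
static int model_in_softirq;
static int model_warnings;     /* how many warnings were actually printed */

/* Warn-once pattern: the condition is checked on every call, but each call
 * site reports at most once. */
#define MODEL_WARN_ON_ONCE(cond)                                \
    do {                                                        \
        static bool warned_;                                    \
        if ((cond) && !warned_) {                               \
            warned_ = true;                                     \
            model_warnings++;                                   \
            fprintf(stderr, "WARNING: %s\n", #cond);            \
        }                                                       \
    } while (0)

/* Hypothetical dst_cache-style helper guarded by the new check. */
void model_dst_cache_get(void)
{
    MODEL_WARN_ON_ONCE(!model_in_softirq);
    /* ...the cache lookup would follow here... */
}
```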
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/20240531132636.2637995-6-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ila: block BH in ila_output() · cf28ff8e
      Eric Dumazet authored
      As explained in commit 13788174 ("tipc: block BH
      before using dst_cache"), net/core/dst_cache.c
      helpers need to be called with BH disabled.
      
      ila_output() is called from lwtunnel_output()
      possibly from process context, and under rcu_read_lock().
      
      We might be interrupted by a softirq, re-enter ila_output()
      and corrupt dst_cache data structures.
      
      Fix the race by using local_bh_disable().
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/20240531132636.2637995-5-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6: sr: block BH in seg6_output_core() and seg6_input_core() · c0b98ac1
      Eric Dumazet authored
      As explained in commit 13788174 ("tipc: block BH
      before using dst_cache"), net/core/dst_cache.c
      helpers need to be called with BH disabled.
      
      Disabling preemption in seg6_output_core() is not good enough,
      because seg6_output_core() is called from process context and
      lwtunnel_output() only uses rcu_read_lock().
      
      We might be interrupted by a softirq, re-enter seg6_output_core()
      and corrupt dst_cache data structures.
      
      Fix the race by using local_bh_disable() instead of
      preempt_disable().
      
      Apply a similar change in seg6_input_core().
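      The race these patches close can be modeled in userspace. The sketch is
      hypothetical (none of its names are kernel APIs): a two-field cache
      entry is updated from process context while a simulated softirq on the
      same CPU re-enters and updates the same entry. With BH blocked, the
      softirq is deferred until the update completes, which is the effect
      local_bh_disable() provides.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model, not kernel code: a cache entry whose two fields must
 * stay consistent. All names are hypothetical. */
struct cache_model {
    int dst;      /* cached route id */
    int cookie;   /* must always equal dst */
};

static int model_bh_disabled;        /* nesting depth, like local_bh_disable() */
static bool model_softirq_pending;   /* softirq raised while BH was blocked */

/* The softirq path re-enters and updates the same cache entry. */
static void model_softirq_update(struct cache_model *c)
{
    c->dst = 100;
    c->cookie = 100;
}

/* A softirq arrives: it runs at once unless BH is disabled, in which case
 * it is deferred until BH is re-enabled. */
static void model_softirq_arrives(struct cache_model *c)
{
    if (model_bh_disabled)
        model_softirq_pending = true;
    else
        model_softirq_update(c);
}

static void model_bh_disable(void)
{
    model_bh_disabled++;
}

static void model_bh_enable(struct cache_model *c)
{
    if (--model_bh_disabled == 0 && model_softirq_pending) {
        model_softirq_pending = false;
        model_softirq_update(c);   /* the deferred softirq runs now */
    }
}

/* Process-context update with a softirq arriving between the two stores. */
void model_process_update(struct cache_model *c, int v, bool block_bh)
{
    if (block_bh)
        model_bh_disable();
    c->dst = v;
    model_softirq_arrives(c);      /* interruption point */
    c->cookie = v;
    if (block_bh)
        model_bh_enable(c);
}
```

      Disabling preemption alone would not help in this model, since the
      softirq does not preempt the task; it interrupts the code running on
      the same CPU.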
      
      Fixes: fa79581e ("ipv6: sr: fix several BUGs when preemption is enabled")
      Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Lebrun <dlebrun@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/20240531132636.2637995-4-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipv6: rpl_iptunnel: block BH in rpl_output() and rpl_input() · db0090c6
      Eric Dumazet authored
      As explained in commit 13788174 ("tipc: block BH
      before using dst_cache"), net/core/dst_cache.c
      helpers need to be called with BH disabled.
      
      Disabling preemption in rpl_output() is not good enough,
      because rpl_output() is called from process context and
      lwtunnel_output() only uses rcu_read_lock().
      
      We might be interrupted by a softirq, re-enter rpl_output()
      and corrupt dst_cache data structures.
      
      Fix the race by using local_bh_disable() instead of
      preempt_disable().
      
      Apply a similar change in rpl_input().
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Aring <aahringo@redhat.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/20240531132636.2637995-3-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6: ioam: block BH from ioam6_output() · 2fe40483
      Eric Dumazet authored
      As explained in commit 13788174 ("tipc: block BH
      before using dst_cache"), net/core/dst_cache.c
      helpers need to be called with BH disabled.
      
      Disabling preemption in ioam6_output() is not good enough,
      because ioam6_output() is called from process context and
      lwtunnel_output() only uses rcu_read_lock().
      
      We might be interrupted by a softirq, re-enter ioam6_output()
      and corrupt dst_cache data structures.
      
      Fix the race by using local_bh_disable() instead of
      preempt_disable().
      
      Fixes: 8cb3bf8b ("ipv6: ioam: Add support for the ip6ip6 encapsulation")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Justin Iurman <justin.iurman@uliege.be>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/20240531132636.2637995-2-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • vmxnet3: disable rx data ring on dma allocation failure · ffbe335b
      Matthias Stocker authored
      When vmxnet3_rq_create() fails to allocate memory for rq->data_ring.base,
      the subsequent call to vmxnet3_rq_destroy_all_rxdataring does not reset
      rq->data_ring.desc_size for the data ring that failed, which presumably
      causes the hypervisor to reference it on packet reception.
      
      To fix this bug, rq->data_ring.desc_size needs to be set to 0 to tell
      the hypervisor to disable this feature.
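      A minimal sketch of the fix under stated assumptions: rx_data_ring_model
      is a hypothetical stand-in for the driver's data ring state, alloc_fn
      replaces the DMA allocator so a failure can be simulated, and
      model_hypervisor_uses_ring() models the hypervisor treating a nonzero
      desc_size as "feature enabled".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model of the fix; the names are hypothetical. */
struct rx_data_ring_model {
    void *base;           /* buffer backing the rx data ring */
    unsigned desc_size;   /* size advertised to the device; 0 = feature off */
};

/* Allocator that always fails, to simulate a DMA allocation failure. */
void *model_failing_alloc(size_t bytes)
{
    (void)bytes;
    return NULL;
}

/* alloc_fn stands in for the DMA allocator. */
bool model_data_ring_create(struct rx_data_ring_model *ring,
                            unsigned desc_size, size_t bytes,
                            void *(*alloc_fn)(size_t))
{
    ring->desc_size = desc_size;
    ring->base = alloc_fn(bytes);
    if (ring->base == NULL) {
        /* The fix: disable the feature rather than leave a stale size
         * that the hypervisor might still act on. */
        ring->desc_size = 0;
        return false;
    }
    return true;
}

/* Simulated hypervisor side: a nonzero size means "use the data ring". */
bool model_hypervisor_uses_ring(const struct rx_data_ring_model *ring)
{
    return ring->desc_size != 0;
}
```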
      
      [   95.436876] kernel BUG at net/core/skbuff.c:207!
      [   95.439074] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      [   95.440411] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.9.3-dirty #1
      [   95.441558] Hardware name: VMware, Inc. VMware Virtual
      Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
      [   95.443481] RIP: 0010:skb_panic+0x4d/0x4f
      [   95.444404] Code: 4f 70 50 8b 87 c0 00 00 00 50 8b 87 bc 00 00 00 50
      ff b7 d0 00 00 00 4c 8b 8f c8 00 00 00 48 c7 c7 68 e8 be 9f e8 63 58 f9
      ff <0f> 0b 48 8b 14 24 48 c7 c1 d0 73 65 9f e8 a1 ff ff ff 48 8b 14 24
      [   95.447684] RSP: 0018:ffffa13340274dd0 EFLAGS: 00010246
      [   95.448762] RAX: 0000000000000089 RBX: ffff8fbbc72b02d0 RCX: 000000000000083f
      [   95.450148] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f
      [   95.451520] RBP: 000000000000002d R08: 0000000000000000 R09: ffffa13340274c60
      [   95.452886] R10: ffffffffa04ed468 R11: 0000000000000002 R12: 0000000000000000
      [   95.454293] R13: ffff8fbbdab3c2d0 R14: ffff8fbbdbd829e0 R15: ffff8fbbdbd809e0
      [   95.455682] FS:  0000000000000000(0000) GS:ffff8fbeefd80000(0000) knlGS:0000000000000000
      [   95.457178] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   95.458340] CR2: 00007fd0d1f650c8 CR3: 0000000115f28000 CR4: 00000000000406f0
      [   95.459791] Call Trace:
      [   95.460515]  <IRQ>
      [   95.461180]  ? __die_body.cold+0x19/0x27
      [   95.462150]  ? die+0x2e/0x50
      [   95.462976]  ? do_trap+0xca/0x110
      [   95.463973]  ? do_error_trap+0x6a/0x90
      [   95.464966]  ? skb_panic+0x4d/0x4f
      [   95.465901]  ? exc_invalid_op+0x50/0x70
      [   95.466849]  ? skb_panic+0x4d/0x4f
      [   95.467718]  ? asm_exc_invalid_op+0x1a/0x20
      [   95.468758]  ? skb_panic+0x4d/0x4f
      [   95.469655]  skb_put.cold+0x10/0x10
      [   95.470573]  vmxnet3_rq_rx_complete+0x862/0x11e0 [vmxnet3]
      [   95.471853]  vmxnet3_poll_rx_only+0x36/0xb0 [vmxnet3]
      [   95.473185]  __napi_poll+0x2b/0x160
      [   95.474145]  net_rx_action+0x2c6/0x3b0
      [   95.475115]  handle_softirqs+0xe7/0x2a0
      [   95.476122]  __irq_exit_rcu+0x97/0xb0
      [   95.477109]  common_interrupt+0x85/0xa0
      [   95.478102]  </IRQ>
      [   95.478846]  <TASK>
      [   95.479603]  asm_common_interrupt+0x26/0x40
      [   95.480657] RIP: 0010:pv_native_safe_halt+0xf/0x20
      [   95.481801] Code: 22 d7 e9 54 87 01 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 93 ba 3b 00 fb f4 <e9> 2c 87 01 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
      [   95.485563] RSP: 0018:ffffa133400ffe58 EFLAGS: 00000246
      [   95.486882] RAX: 0000000000004000 RBX: ffff8fbbc1d14064 RCX: 0000000000000000
      [   95.488477] RDX: ffff8fbeefd80000 RSI: ffff8fbbc1d14000 RDI: 0000000000000001
      [   95.490067] RBP: ffff8fbbc1d14064 R08: ffffffffa0652260 R09: 00000000000010d3
      [   95.491683] R10: 0000000000000018 R11: ffff8fbeefdb4764 R12: ffffffffa0652260
      [   95.493389] R13: ffffffffa06522e0 R14: 0000000000000001 R15: 0000000000000000
      [   95.495035]  acpi_safe_halt+0x14/0x20
      [   95.496127]  acpi_idle_do_entry+0x2f/0x50
      [   95.497221]  acpi_idle_enter+0x7f/0xd0
      [   95.498272]  cpuidle_enter_state+0x81/0x420
      [   95.499375]  cpuidle_enter+0x2d/0x40
      [   95.500400]  do_idle+0x1e5/0x240
      [   95.501385]  cpu_startup_entry+0x29/0x30
      [   95.502422]  start_secondary+0x11c/0x140
      [   95.503454]  common_startup_64+0x13e/0x141
      [   95.504466]  </TASK>
      [   95.505197] Modules linked in: nft_fib_inet nft_fib_ipv4
      nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
      nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
      nf_defrag_ipv4 rfkill ip_set nf_tables vsock_loopback
      vmw_vsock_virtio_transport_common qrtr vmw_vsock_vmci_transport vsock
      sunrpc binfmt_misc pktcdvd vmw_balloon pcspkr vmw_vmci i2c_piix4 joydev
      loop dm_multipath nfnetlink zram crct10dif_pclmul crc32_pclmul vmwgfx
      crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel
      sha512_ssse3 sha256_ssse3 vmxnet3 sha1_ssse3 drm_ttm_helper vmw_pvscsi
      ttm ata_generic pata_acpi serio_raw scsi_dh_rdac scsi_dh_emc
      scsi_dh_alua ip6_tables ip_tables fuse
      [   95.516536] ---[ end trace 0000000000000000 ]---
      
      Fixes: 6f483338 ("net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete()")
      Signed-off-by: Matthias Stocker <mstocker@barracuda.com>
      Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Reviewed-by: Ronak Doshi <ronak.doshi@broadcom.com>
      Link: https://lore.kernel.org/r/20240531103711.101961-1-mstocker@barracuda.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 03 Jun, 2024 1 commit
  4. 01 Jun, 2024 17 commits
  5. 30 May, 2024 10 commits
    • Merge tag 'net-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · d8ec1985
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from bpf and netfilter.
      
        Current release - regressions:
      
         - gro: initialize network_offset in network layer
      
         - tcp: reduce accepted window in NEW_SYN_RECV state
      
        Current release - new code bugs:
      
         - eth: mlx5e: do not use ptp structure for tx ts stats when not
           initialized
      
         - eth: ice: check for unregistering correct number of devlink params
      
        Previous releases - regressions:
      
         - bpf: Allow delete from sockmap/sockhash only if update is allowed
      
         - sched: taprio: extend minimum interval restriction to entire cycle
           too
      
         - netfilter: ipset: add list flush to cancel_gc
      
         - ipv4: fix address dump when IPv4 is disabled on an interface
      
         - sock_map: avoid race between sock_map_close and sk_psock_put
      
         - eth: mlx5: use mlx5_ipsec_rx_status_destroy to correctly delete
           status rules
      
        Previous releases - always broken:
      
         - core: fix __dst_negative_advice() race
      
         - bpf:
             - fix multi-uprobe PID filtering logic
             - fix pkt_type override upon netkit pass verdict
      
         - netfilter: tproxy: bail out if IP has been disabled on the device
      
         - af_unix: annotate data-race around unix_sk(sk)->addr
      
         - eth: mlx5e: fix UDP GSO for encapsulated packets
      
         - eth: idpf: don't enable NAPI and interrupts prior to allocating Rx
           buffers
      
         - eth: i40e: fully suspend and resume IO operations in EEH case
      
         - eth: octeontx2-pf: free send queue buffers incase of leaf to inner
      
         - eth: ipvlan: dont Use skb->sk in ipvlan_process_v{4,6}_outbound"
      
      * tag 'net-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
        netdev: add qstat for csum complete
        ipvlan: Dont Use skb->sk in ipvlan_process_v{4,6}_outbound
        net: ena: Fix redundant device NUMA node override
        ice: check for unregistering correct number of devlink params
        ice: fix 200G PHY types to link speed mapping
        i40e: Fully suspend and resume IO operations in EEH case
        i40e: factoring out i40e_suspend/i40e_resume
        e1000e: move force SMBUS near the end of enable_ulp function
        net: dsa: microchip: fix RGMII error in KSZ DSA driver
        ipv4: correctly iterate over the target netns in inet_dump_ifaddr()
        net: fix __dst_negative_advice() race
        nfc/nci: Add the inconsistency check between the input data length and count
        MAINTAINERS: dwmac: starfive: update Maintainer
        net/sched: taprio: extend minimum interval restriction to entire cycle too
        net/sched: taprio: make q->picos_per_byte available to fill_sched_entry()
        netfilter: nft_fib: allow from forward/input without iif selector
        netfilter: tproxy: bail out if IP has been disabled on the device
        netfilter: nft_payload: skbuff vlan metadata mangle support
        net: ti: icssg-prueth: Fix start counter for ft1 filter
        sock_map: avoid race between sock_map_close and sk_psock_put
        ...
    • netdev: add qstat for csum complete · 13c7c941
      Jakub Kicinski authored
      Recent commit 0cfe71f4 ("netdev: add queue stats") added
      a lot of useful stats, but only those immediately needed by virtio.
      Presumably virtio does not support CHECKSUM_COMPLETE,
      so a statistic for that form of checksumming wasn't included.
      Other drivers will definitely need it; in fact, we expect it
      to be needed in net-next soon (mlx5). So let's add the definition
      of the counter for CHECKSUM_COMPLETE to uAPI in net already,
      so that the counters are in a more natural order (none of the
      subsequent counters has been present in any released kernel yet).
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Joe Damato <jdamato@fastly.com>
      Fixes: 0cfe71f4 ("netdev: add queue stats")
      Link: https://lore.kernel.org/r/20240529163547.3693194-1-kuba@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ipvlan: Dont Use skb->sk in ipvlan_process_v{4,6}_outbound · b3dc6e80
      Yue Haibing authored
      A raw packet from a PF_PACKET socket on top of an IPv6-backed ipvlan
      device will hit WARN_ON_ONCE() in sk_mc_loop() through the
      sch_direct_xmit() path.
      
      WARNING: CPU: 2 PID: 0 at net/core/sock.c:775 sk_mc_loop+0x2d/0x70
      Modules linked in: sch_netem ipvlan rfkill cirrus drm_shmem_helper sg drm_kms_helper
      CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Not tainted 6.9.0+ #279
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
      RIP: 0010:sk_mc_loop+0x2d/0x70
      Code: fa 0f 1f 44 00 00 65 0f b7 15 f7 96 a3 4f 31 c0 66 85 d2 75 26 48 85 ff 74 1c
      RSP: 0018:ffffa9584015cd78 EFLAGS: 00010212
      RAX: 0000000000000011 RBX: ffff91e585793e00 RCX: 0000000002c6a001
      RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff91e589c0f000
      RBP: ffff91e5855bd100 R08: 0000000000000000 R09: 3d00545216f43d00
      R10: ffff91e584fdcc50 R11: 00000060dd8616f4 R12: ffff91e58132d000
      R13: ffff91e584fdcc68 R14: ffff91e5869ce800 R15: ffff91e589c0f000
      FS:  0000000000000000(0000) GS:ffff91e898100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f788f7c44c0 CR3: 0000000008e1a000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <IRQ>
       ? __warn (kernel/panic.c:693)
       ? sk_mc_loop (net/core/sock.c:760)
       ? report_bug (lib/bug.c:201 lib/bug.c:219)
       ? handle_bug (arch/x86/kernel/traps.c:239)
       ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
       ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
       ? sk_mc_loop (net/core/sock.c:760)
       ip6_finish_output2 (net/ipv6/ip6_output.c:83 (discriminator 1))
       ? nf_hook_slow (net/netfilter/core.c:626)
       ip6_finish_output (net/ipv6/ip6_output.c:222)
       ? __pfx_ip6_finish_output (net/ipv6/ip6_output.c:215)
       ipvlan_xmit_mode_l3 (drivers/net/ipvlan/ipvlan_core.c:602) ipvlan
       ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:226) ipvlan
       dev_hard_start_xmit (net/core/dev.c:3594)
       sch_direct_xmit (net/sched/sch_generic.c:343)
       __qdisc_run (net/sched/sch_generic.c:416)
       net_tx_action (net/core/dev.c:5286)
       handle_softirqs (kernel/softirq.c:555)
       __irq_exit_rcu (kernel/softirq.c:589)
       sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1043)
      
      The warning is triggered as follows:
      packet_sendmsg
         packet_snd //skb->sk is packet sk
            __dev_queue_xmit
               __dev_xmit_skb //q->enqueue is not NULL
                   __qdisc_run
                     sch_direct_xmit
                       dev_hard_start_xmit
                         ipvlan_start_xmit
                            ipvlan_xmit_mode_l3 //l3 mode
                              ipvlan_process_outbound //vepa flag
                                ipvlan_process_v6_outbound
                                  ip6_local_out
                                      __ip6_finish_output
                                        ip6_finish_output2 //multicast packet
                                          sk_mc_loop //sk->sk_family is AF_PACKET
      
      Fix this by calling ip{6}_local_out() with a NULL sk in ipvlan, as other tunnels do.
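      The check that fires can be approximated with a simplified userspace
      model, assuming sk_mc_loop() returns early for a NULL socket, consults
      the per-socket loopback setting for IPv4/IPv6 sockets, and warns for any
      other family. The enum values and struct are hypothetical, and a counter
      replaces WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical address families for the model. */
enum model_family { MODEL_AF_INET, MODEL_AF_INET6, MODEL_AF_PACKET };

struct model_sock {
    enum model_family family;
    bool mc_loop;          /* per-socket multicast loopback setting */
};

static int model_unexpected_family;   /* counts WARN_ON_ONCE(1) hits */

/* Simplified model: should a multicast packet be looped back locally? */
bool model_sk_mc_loop(const struct model_sock *sk)
{
    if (sk == NULL)
        return true;                  /* no socket: loop back, no warning */
    switch (sk->family) {
    case MODEL_AF_INET:
    case MODEL_AF_INET6:
        return sk->mc_loop;
    default:
        model_unexpected_family++;    /* e.g. an AF_PACKET socket */
        return true;
    }
}
```

      Passing NULL, as the fix does for ipvlan, keeps the AF_PACKET socket
      from reaching the family switch at all.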
      
      Fixes: 2ad7bf36 ("ipvlan: Initial check-in of the IPVLAN driver.")
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240529095633.613103-1-yuehaibing@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge tag 'nf-24-05-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · e889eb17
      Paolo Abeni authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      Patch #1 syzbot reports that nf_reinject() could be called without
               rcu_read_lock() when flushing pending packets at nfnetlink
               queue removal, from Eric Dumazet.
      
      Patch #2 flushes the ipset list:set when canceling garbage collection,
               removing its references to other lists, to fix a race, from
               Jozsef Kadlecsik.
      
      Patch #3 restores q-in-q matching with nft_payload by reverting
               f6ae9f12 ("netfilter: nft_payload: add C-VLAN support").
      
      Patch #4 fixes VLAN mangling when VLAN offload is present in the
               skbuff; without this patch, nft_payload corrupts packets
               in this case.
      
      Patch #5 fixes a possible NULL dereference in tproxy when no IP address
               is found on the netdevice, reported by syzbot; patch from
               Florian Westphal.
      
      Patch #6 removes a superfluous restriction which prevents loose fib
               lookups from input and forward hooks, from Eric Garver.
      
      My assessment is that patches #1, #2 and #5 address possible kernel
      crashes; everything else in this batch fixes broken features.
      
      netfilter pull request 24-05-29
      
      * tag 'nf-24-05-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nft_fib: allow from forward/input without iif selector
        netfilter: tproxy: bail out if IP has been disabled on the device
        netfilter: nft_payload: skbuff vlan metadata mangle support
        netfilter: nft_payload: restore vlan q-in-q match support
        netfilter: ipset: Add list flush to cancel_gc
        netfilter: nfnetlink_queue: acquire rcu_read_lock() in instance_destroy_rcu()
      ====================
      
      Link: https://lore.kernel.org/r/20240528225519.1155786-1-pablo@netfilter.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ena: Fix redundant device NUMA node override · 2dc8b1e7
      Shay Agroskin authored
      The driver overrides the NUMA node id of the device regardless of
      whether it knows its correct value (often setting it to -1 even though
      the node id is advertised in 'struct device'). This can lead to
      suboptimal configurations.
      
      This patch fixes this behavior and makes the shared memory allocation
      functions use the NUMA node id advertised by the underlying device.
      
      Fixes: 1738cd3e ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Link: https://lore.kernel.org/r/20240528170912.1204417-1-shayagr@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'intel-wired-lan-driver-updates-2024-05-28-e1000e-i40e-ice' · 602d9591
      Jakub Kicinski authored
      Jacob Keller says:
      
      ====================
      Intel Wired LAN Driver Updates 2024-05-28 (e1000e, i40e, ice) [part]
      
      This series includes a variety of fixes that have been accumulating on the
      Intel Wired LAN dev-queue.
      
      Hui Wang provides a fix for suspend/resume on e1000e caused by a failure
      to correctly set up the SMBUS in enable_ulp().
      
      Thinh Tran provides a fix for EEH I/O suspend/resume on i40e to
      ensure that I/O operations can continue after a resume. To avoid duplicate
      code, the common logic is factored out of i40e_suspend and i40e_resume.
      
      Paul Greenwalt provides a fix to correctly map the 200G PHY types to link
      speeds in the ice driver.
      
      Dave Ertman provides a fix correcting devlink parameter unregistration in
      the event that the driver loads in safe mode and some of the parameters
      were not registered.
      ====================
      
      Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-0-dc8593d2bbc6@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ice: check for unregistering correct number of devlink params · a51c9b1c
      Dave Ertman authored
      On module load, the ice driver checks for the lack of a specific PF
      capability to determine if it should reduce the number of devlink params
      to register.  One situation when this test returns true is when the
      driver loads in safe mode.  The same check is not present on the unload
      path when devlink params are unregistered.  This results in the driver
      triggering a WARN_ON in the kernel devlink code.
      
      The current check and code path rely on reducing the number of elements
      reported in the list of params. This is fragile and hard to maintain.
      
      Change the parameters to be held in two lists, one always registered and
      one dependent on the check.
      
      Add a symmetrical check in the unload path so that the correct parameters
      are unregistered as well.
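      A hypothetical sketch of the two-list approach: one table is always
      registered, the other only when the capability check passes, and the
      unload path repeats the same check. The parameter names, the counter and
      the register/unregister helpers are illustrative, not the ice or devlink
      API; a negative count stands in for the devlink warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model; param names and helpers are hypothetical. */
struct model_param {
    const char *name;
};

#define MODEL_ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Registered unconditionally. */
static const struct model_param model_always_params[] = {
    { "always_a" },
    { "always_b" },
};

/* Registered only when the PF capability check passes. */
static const struct model_param model_optional_params[] = {
    { "optional_c" },
};

static int model_registered;   /* params currently registered */

static void model_register(const struct model_param *p, size_t n)
{
    (void)p;
    model_registered += (int)n;
}

static void model_unregister(const struct model_param *p, size_t n)
{
    (void)p;
    model_registered -= (int)n;   /* going negative models the WARN_ON */
}

void model_load(bool has_cap)
{
    model_register(model_always_params, MODEL_ARRAY_LEN(model_always_params));
    if (has_cap)
        model_register(model_optional_params,
                       MODEL_ARRAY_LEN(model_optional_params));
}

/* The fix: unload applies the same check as load, in reverse order. */
void model_unload(bool has_cap)
{
    if (has_cap)
        model_unregister(model_optional_params,
                         MODEL_ARRAY_LEN(model_optional_params));
    model_unregister(model_always_params, MODEL_ARRAY_LEN(model_always_params));
}
```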
      
      Fixes: 109eb291 ("ice: Add tx_scheduling_layers devlink param")
      CC: Lukasz Czapnik <lukasz.czapnik@intel.com>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-8-dc8593d2bbc6@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ice: fix 200G PHY types to link speed mapping · 2a6d8f2d
      Paul Greenwalt authored
      Commit 24407a01 ("ice: Add 200G speed/phy type use") added support
      for 200G PHY speeds, but did not include the mapping of 200G PHY types
      to link speed. As a result, the driver returns an UNKNOWN link speed
      when setting 200G ethtool advertised link modes.
      
      To fix this, add the mapping of 200G PHY types to link speed in
      ice_get_link_speed_based_on_phy_type().
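      A minimal sketch of the kind of mapping being fixed, assuming a switch
      from PHY type to link speed with an UNKNOWN default. The enum names and
      values are hypothetical, not the ice driver's own definitions.

```c
#include <assert.h>

/* Hypothetical PHY types and link speeds, for illustration only. */
enum model_phy_type { MODEL_PHY_100G_CR4, MODEL_PHY_200G_CR4, MODEL_PHY_200G_SR4 };
enum model_speed {
    MODEL_SPEED_UNKNOWN = 0,
    MODEL_SPEED_100G = 100000,   /* Mb/s */
    MODEL_SPEED_200G = 200000,   /* Mb/s */
};

enum model_speed model_speed_for_phy(enum model_phy_type t)
{
    switch (t) {
    case MODEL_PHY_100G_CR4:
        return MODEL_SPEED_100G;
    /* The fix: without these cases, 200G types fall through to UNKNOWN. */
    case MODEL_PHY_200G_CR4:
    case MODEL_PHY_200G_SR4:
        return MODEL_SPEED_200G;
    default:
        return MODEL_SPEED_UNKNOWN;
    }
}
```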
      
      Fixes: 24407a01 ("ice: Add 200G speed/phy type use")
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-5-dc8593d2bbc6@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • i40e: Fully suspend and resume IO operations in EEH case · c80b6538
      Thinh Tran authored
      When EEH events occur, the i40e callback functions, which are managed
      by the EEH driver, will completely suspend and resume all I/O
      operations.
      
      - In the PCI error detected callback, replaced i40e_prep_for_reset()
        with i40e_io_suspend(). The change is to fully suspend all I/O
        operations.
      - In the PCI error slot reset callback, replaced pci_enable_device_mem()
        with pci_enable_device(). This change enables both the I/O and memory
        resources of the device.
      - In the PCI error resume callback, replaced i40e_handle_reset_warning()
        with i40e_io_resume(). This change allows the system to resume I/O
        operations.
      
      Fixes: a5f3d2c1 ("powerpc/pseries/pci: Add MSI domains")
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Robert Thomas <rob.thomas@ibm.com>
      Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-3-dc8593d2bbc6@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • i40e: factoring out i40e_suspend/i40e_resume · 218ed820
      Thinh Tran authored
      Two new functions, i40e_io_suspend() and i40e_io_resume(), have been
      introduced.  These functions were factored out from the existing
      i40e_suspend() and i40e_resume() respectively.  This factoring was
      done due to concerns about the logic of the I40E_SUSPENDED state, which
      caused the device to be unable to recover.  The functions are now used
      in the EEH handling for device suspend/resume callbacks.
      
      The function i40e_enable_mc_magic_wake() has been moved ahead of
      i40e_io_suspend() to ensure it is declared before being used.
      Tested-by: Robert Thomas <rob.thomas@ibm.com>
      Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-2-dc8593d2bbc6@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>