- 12 Sep, 2024 3 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfPaolo Abeni authored
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains two fixes from Florian Westphal: Patch #1 fixes a sk refcount leak in nft_socket on mismatch. Patch #2 fixes cgroupsv2 matching from containers due to incorrect level in subtree. netfilter pull request 24-09-12 * tag 'nf-24-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_socket: make cgroupsv2 matching work with namespaces netfilter: nft_socket: fix sk refcount leaks ==================== Link: https://patch.msgid.link/20240911222520.3606-1-pablo@netfilter.orgSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Willem de Bruijn authored
The referenced commit drops bad input, but has false positives. Tighten the check to avoid these. The check detects illegal checksum offload requests, which produce csum_start/csum_off beyond end of packet after segmentation. But it is based on two incorrect assumptions: 1. virtio_net_hdr_to_skb with VIRTIO_NET_HDR_GSO_TCP[46] implies GSO. True in callers that inject into the tx path, such as tap. But false in callers that inject into rx, like virtio-net. Here, the flags indicate GRO, and CHECKSUM_UNNECESSARY or CHECKSUM_NONE without VIRTIO_NET_HDR_F_NEEDS_CSUM is normal. 2. TSO requires checksum offload, i.e., ip_summed == CHECKSUM_PARTIAL. False, as tcp[46]_gso_segment will fix up csum_start and offset for all other ip_summed by calling __tcp_v4_send_check. Because of 2, we can limit the scope of the fix to virtio_net_hdr that do try to set these fields, with a bogus value. Link: https://lore.kernel.org/netdev/20240909094527.GA3048202@port70.net/ Fixes: 89add400 ("net: drop bad gso csum_start and offset in virtio_net_hdr") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20240910213553.839926-1-willemdebruijn.kernel@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Asbjørn Sloth Tønnesen authored
The MPTCP port attribute is in host endianness, but was documented as big-endian in the ynl specification. Below are two examples from net/mptcp/pm_netlink.c showing that the attribute is converted to/from host endianness for use with netlink. Import from netlink: addr->port = htons(nla_get_u16(tb[MPTCP_PM_ADDR_ATTR_PORT])) Export to netlink: nla_put_u16(skb, MPTCP_PM_ADDR_ATTR_PORT, ntohs(addr->port)) Where addr->port is defined as __be16. No functional change intended. Fixes: bc8aeb20 ("Documentation: netlink: add a YAML spec for mptcp") Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240911091003.1112179-1-ast@fiberby.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 11 Sep, 2024 15 commits
-
-
Sean Anderson authored
When sending packets under 60 bytes, up to three bytes of the buffer following the data may be leaked. Avoid this by extending all packets to ETH_ZLEN, ensuring nothing is leaked in the padding. This bug can be reproduced by running $ ping -s 11 destination Fixes: 9ad1a374 ("dpaa_eth: add support for DPAA Ethernet") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240910143144.1439910-1-sean.anderson@linux.devSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Edward Adam Davis authored
There are two paths to access mptcp_pm_del_add_timer, result in a race condition: CPU1 CPU2 ==== ==== net_rx_action napi_poll netlink_sendmsg __napi_poll netlink_unicast process_backlog netlink_unicast_kernel __netif_receive_skb genl_rcv __netif_receive_skb_one_core netlink_rcv_skb NF_HOOK genl_rcv_msg ip_local_deliver_finish genl_family_rcv_msg ip_protocol_deliver_rcu genl_family_rcv_msg_doit tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit tcp_v4_do_rcv mptcp_nl_remove_addrs_list tcp_rcv_established mptcp_pm_remove_addrs_and_subflows tcp_data_queue remove_anno_list_by_saddr mptcp_incoming_options mptcp_pm_del_add_timer mptcp_pm_del_add_timer kfree(entry) In remove_anno_list_by_saddr(running on CPU2), after leaving the critical zone protected by "pm.lock", the entry will be released, which leads to the occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1). Keeping a reference to add_timer inside the lock, and calling sk_stop_timer_sync() with this reference, instead of "entry->add_timer". Move list_del(&entry->list) to mptcp_pm_del_add_timer and inside the pm lock, do not directly access any members of the entry outside the pm lock, which can avoid similar "entry->x" uaf. Fixes: 00cfd77b ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4dSigned-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Edward Adam Davis <eadavis@qq.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/tencent_7142963A37944B4A74EF76CD66EA3C253609@qq.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jiawen Wu authored
The number of transmit and receive descriptors must be a multiple of 128 due to the hardware limitation. If it is set to a multiple of 8 instead of a multiple 128, the queues will easily be hung. Cc: stable@vger.kernel.org Fixes: 883b5984 ("net: wangxun: add ethtool_ops for ring parameters") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240910095629.570674-1-jiawenwu@trustnetic.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Xiaoliang Yang authored
The TAS module could not be configured when it's running in pending status. We need disable the module and configure it again. However, the pending status is not cleared after the module disabled. TC taprio set will always return busy even it's disabled. For example, a user uses tc-taprio to configure Qbv and a future basetime. The TAS module will run in a pending status. There is no way to reconfigure Qbv, it always returns busy. Actually the TAS module can be reconfigured when it's disabled. So it doesn't need to check the pending status if the TAS module is disabled. After the patch, user can delete the tc taprio configuration to disable Qbv and reconfigure it again. Fixes: de143c0e ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload") Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Link: https://patch.msgid.link/20240906093550.29985-1-xiaoliang.yang_1@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jeongjun Park authored
In the function hsr_proxy_annouance() added in the previous commit 5f703ce5 ("net: hsr: Send supervisory frames to HSR network with ProxyNodeTable data"), the return value of the hsr_port_get_hsr() function is not checked to be a NULL pointer, which causes a NULL pointer dereference. To solve this, we need to add code to check whether the return value of hsr_port_get_hsr() is NULL. Reported-by: syzbot+02a42d9b1bd395cbcab4@syzkaller.appspotmail.com Fixes: 5f703ce5 ("net: hsr: Send supervisory frames to HSR network with ProxyNodeTable data") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Lukasz Majewski <lukma@denx.de> Link: https://patch.msgid.link/20240907190341.162289-1-aha310510@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Matthieu Baerts says: ==================== selftests: mptcp: misc. small fixes Here are some various fixes for the MPTCP selftests. Patch 1 fixes a recently modified test to continue to work as expected on older kernels. This is a fix for a recent fix that can be backported up to v5.15. Patch 2 and 3 include dependences when exporting or installing the tests. Two fixes for v6.11-rc1. ==================== Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-0-8f124aa9156d@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Matthieu Baerts (NGI0) authored
Similar to the previous commit, the net_helper.sh file from the parent directory is used by the MPTCP selftests and it needs to be present when running the tests. This file then needs to be listed in the Makefile to be included when exporting or installing the tests, e.g. with: make -C tools/testing/selftests \ TARGETS=net/mptcp \ install INSTALL_PATH=$KSFT_INSTALL_PATH cd $KSFT_INSTALL_PATH ./run_kselftest.sh -c net/mptcp Fixes: 1af3bc91 ("selftests: mptcp: lib: use wait_local_port_listen helper") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-3-8f124aa9156d@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Matthieu Baerts (NGI0) authored
The lib.sh file from the parent directory is used by the MPTCP selftests and it needs to be present when running the tests. This file then needs to be listed in the Makefile to be included when exporting or installing the tests, e.g. with: make -C tools/testing/selftests \ TARGETS=net/mptcp \ install INSTALL_PATH=$KSFT_INSTALL_PATH cd $KSFT_INSTALL_PATH ./run_kselftest.sh -c net/mptcp Fixes: f265d311 ("selftests: mptcp: lib: use setup/cleanup_ns helpers") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-2-8f124aa9156d@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Matthieu Baerts (NGI0) authored
A new endpoint using the IP of the initial subflow has been recently added to increase the code coverage. But it breaks the test when using old kernels not having commit 86e39e04 ("mptcp: keep track of local endpoint still available for each msk"), e.g. on v5.15. Similar to commit d4c81bbb ("selftests: mptcp: join: support local endpoint being tracked or not"), it is possible to add the new endpoint conditionally, by checking if "mptcp_pm_subflow_check_next" is present in kallsyms: this is not directly linked to the commit introducing this symbol but for the parent one which is linked anyway. So we can know in advance what will be the expected behaviour, and add the new endpoint only when it makes sense to do so. Fixes: 4878f9f8 ("selftests: mptcp: join: validate fullmesh endp on 1st sf") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-1-8f124aa9156d@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Florian Westphal authored
When running in container environmment, /sys/fs/cgroup/ might not be the real root node of the sk-attached cgroup. Example: In container: % stat /sys//fs/cgroup/ Device: 0,21 Inode: 2214 .. % stat /sys/fs/cgroup/foo Device: 0,21 Inode: 2264 .. The expectation would be for: nft add rule .. socket cgroupv2 level 1 "foo" counter to match traffic from a process that got added to "foo" via "echo $pid > /sys/fs/cgroup/foo/cgroup.procs". However, 'level 3' is needed to make this work. Seen from initial namespace, the complete hierarchy is: % stat /sys/fs/cgroup/system.slice/docker-.../foo Device: 0,21 Inode: 2264 .. i.e. hierarchy is 0 1 2 3 / -> system.slice -> docker-1... -> foo ... but the container doesn't know that its "/" is the "docker-1.." cgroup. Current code will retrieve the 'system.slice' cgroup node and store its kn->id in the destination register, so compare with 2264 ("foo" cgroup id) will not match. Fetch "/" cgroup from ->init() and add its level to the level we try to extract. cgroup root-level is 0 for the init-namespace or the level of the ancestor that is exposed as the cgroup root inside the container. In the above case, cgrp->level of "/" resolved in the container is 2 (docker-1...scope/) and request for 'level 1' will get adjusted to fetch the actual level (3). v2: use CONFIG_SOCK_CGROUP_DATA, eval function depends on it. (kernel test robot) Cc: cgroups@vger.kernel.org Fixes: e0bb96db ("netfilter: nft_socket: add support for cgroupsv2") Reported-by: Nadia Pinaeva <n.m.pinaeva@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
We must put 'sk' reference before returning. Fixes: 039b1f4f ("netfilter: nft_socket: fix erroneous socket assignment") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queueJakub Kicinski authored
Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-09-09 (ice, igb) This series contains updates to ice and igb drivers. Martyna moves LLDP rule removal to the proper uninitialization function for ice. Jake corrects accounting logic for FWD_TO_VSI_LIST switch filters on ice. Przemek removes incorrect, explicit calls to pci_disable_device() for ice. Michal Schmidt stops incorrect use of VSI list for VLAN use on ice. Sriram Yagnaraman adjusts igb_xdp_ring_update_tail() to be called under Tx lock on igb. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igb: Always call igb_xdp_ring_update_tail() under Tx lock ice: fix VSI lists confusion when adding VLANs ice: stop calling pci_disable_device() as we use pcim ice: fix accounting for filters shared by multiple VSIs ice: Fix lldp packets dropping after changing the number of channels ==================== Link: https://patch.msgid.link/20240909203842.3109822-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxJakub Kicinski authored
Saeed Mahameed says: ==================== mlx5 fixes 2024-09-09 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2024-09-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Fix bridge mode operations when there are no VFs net/mlx5: Verify support for scheduling element and TSAR type net/mlx5: Add missing masks and QoS bit masks for scheduling elements net/mlx5: Explicitly set scheduling element and TSAR type net/mlx5e: Add missing link mode to ptys2ext_ethtool_map net/mlx5e: Add missing link modes to ptys2ethtool_map net/mlx5: Update the list of the PCI supported devices ==================== Link: https://patch.msgid.link/20240909194505.69715-1-saeed@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Kory Maincent authored
Add net/ethtool/pse-pd.c to PSE NETWORK DRIVER to receive emails concerning modifications to the ethtool part. Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240909114336.362174-1-kory.maincent@bootlin.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Wei Fang authored
As Rob pointed in another mail thread [1], the binding of tja11xx PHY is completely broken, the schema cannot catch the error in the DTS. A compatiable string must be needed if we want to add a custom propety. So extract known PHY IDs from the tja11xx PHY drivers and convert them into supported compatible string list to fix the broken binding issue. Fixes: 52b2fe45 ("dt-bindings: net: tja11xx: add nxp,refclk_in property") Link: https://lore.kernel.org/31058f49-bac5-49a9-a422-c43b121bf049@kernel.org # [1] Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240909012152.431647-1-wei.fang@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 10 Sep, 2024 9 commits
-
-
Sean Anderson authored
Padding is not included in UDP and TCP checksums. Therefore, reduce the length of the checksummed data to include only the data in the IP payload. This fixes spurious reported checksum failures like rx: pkt: sport=33000 len=26 csum=0xc850 verify=0xf9fe pkt: bad csum Technically it is possible for there to be trailing bytes after the UDP data but before the Ethernet padding (e.g. if sizeof(ip) + sizeof(udp) + udp.len < ip.len). However, we don't generate such packets. Fixes: 91a7de85 ("selftests/net: add csum offload test") Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240906210743.627413-1-sean.anderson@linux.devSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tomas Paukrt authored
The probe() function is only used for DP83822 and DP83826 PHY, leaving the private data pointer uninitialized for the DP83825 models which causes a NULL pointer dereference in the recently introduced/changed functions dp8382x_config_init() and dp83822_set_wol(). Add the dp8382x_probe() function, so all PHY models will have a valid private data pointer to fix this issue and also prevent similar issues in the future. Fixes: 9ef9ecfa ("net: phy: dp8382x: keep WOL settings across suspends") Signed-off-by: Tomas Paukrt <tomaspaukrt@email.cz> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/66w.ZbGt.65Ljx42yHo5.1csjxu@seznam.czSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Xuan Zhuo says: ==================== Revert "virtio_net: rx enable premapped mode by default" Regression: http://lore.kernel.org/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com ==================== Link: https://patch.msgid.link/20240906123137.108741-1-xuanzhuo@linux.alibaba.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Xuan Zhuo authored
Now, the premapped mode encounters some problem. http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com So we disable the premapped mode by default. We can re-enable it in the future. Fixes: f9dac92b ("virtio_ring: enable premapped mode whatever use_dma_api") Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.comSigned-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Takero Funaki <flintglass@gmail.com> Link: https://patch.msgid.link/20240906123137.108741-4-xuanzhuo@linux.alibaba.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Xuan Zhuo authored
This reverts commit a377ae54. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Takero Funaki <flintglass@gmail.com> Link: https://patch.msgid.link/20240906123137.108741-3-xuanzhuo@linux.alibaba.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Xuan Zhuo authored
This reverts commit defd28aa. Recover the code to disable premapped mode. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Takero Funaki <flintglass@gmail.com> Link: https://patch.msgid.link/20240906123137.108741-2-xuanzhuo@linux.alibaba.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jacky Chou authored
Currently, the driver only enables RX interrupt to handle RX packets and TX resources. Sometimes there is not RX traffic, so the TX resource needs to wait for RX interrupt to free. This situation will toggle the TX timeout watchdog when the MAC TX ring has no more resources to transmit packets. Therefore, enable TX interrupt to release TX resources at any time. When I am verifying iperf3 over UDP, the network hangs. Like the log below. root# iperf3 -c 192.168.100.100 -i1 -t10 -u -b0 Connecting to host 192.168.100.100, port 5201 [ 4] local 192.168.100.101 port 35773 connected to 192.168.100.100 port 5201 [ ID] Interval Transfer Bandwidth Total Datagrams [ 4] 0.00-20.42 sec 160 KBytes 64.2 Kbits/sec 20 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 [ 4] 20.42-20.42 sec 0.00 Bytes 0.00 bits/sec 0 - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams [ 4] 0.00-20.42 sec 160 KBytes 64.2 Kbits/sec 0.000 ms 0/20 (0%) [ 4] Sent 20 datagrams iperf3: error - the server has terminated The network topology is FTGMAC connects directly to a PC. UDP does not need to wait for ACK, unlike TCP. Therefore, FTGMAC needs to enable TX interrupt to release TX resources instead of waiting for the RX interrupt. Fixes: 10cbd640 ("ftgmac100: Rework NAPI & interrupts handling") Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20240906062831.2243399-1-jacky_chou@aspeedtech.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Naveen Mamindlapalli authored
The current implementation of SMQ flush sequence waits for the packets in the TM pipeline to be transmitted out of the link. This sequence doesn't succeed in HW when there is any issue with link such as lack of link credits, link down or any other traffic that is fully occupying the link bandwidth (QoS). This patch modifies the SMQ flush sequence to drop the packets after TL1 level (SQM) instead of polling for the packets to be sent out of RPM/CGX link. Fixes: 5d9b976d ("octeontx2-af: Support fixed transmit scheduler topology") Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Reviewed-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Link: https://patch.msgid.link/20240906045838.1620308-1-naveenm@marvell.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Muhammad Usama Anjum authored
The grc must be initialize first. There can be a condition where if fou is NULL, goto out will be executed and grc would be used uninitialized. Fixes: 7e419693 ("fou: Fix null-ptr-deref in GRO.") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240906102839.202798-1-usama.anjum@collabora.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 09 Sep, 2024 13 commits
-
-
Benjamin Poirier authored
Currently, trying to set the bridge mode attribute when numvfs=0 leads to a crash: bridge link set dev eth2 hwmode vepa [ 168.967392] BUG: kernel NULL pointer dereference, address: 0000000000000030 [...] [ 168.969989] RIP: 0010:mlx5_add_flow_rules+0x1f/0x300 [mlx5_core] [...] [ 168.976037] Call Trace: [ 168.976188] <TASK> [ 168.978620] _mlx5_eswitch_set_vepa_locked+0x113/0x230 [mlx5_core] [ 168.979074] mlx5_eswitch_set_vepa+0x7f/0xa0 [mlx5_core] [ 168.979471] rtnl_bridge_setlink+0xe9/0x1f0 [ 168.979714] rtnetlink_rcv_msg+0x159/0x400 [ 168.980451] netlink_rcv_skb+0x54/0x100 [ 168.980675] netlink_unicast+0x241/0x360 [ 168.980918] netlink_sendmsg+0x1f6/0x430 [ 168.981162] ____sys_sendmsg+0x3bb/0x3f0 [ 168.982155] ___sys_sendmsg+0x88/0xd0 [ 168.985036] __sys_sendmsg+0x59/0xa0 [ 168.985477] do_syscall_64+0x79/0x150 [ 168.987273] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 168.987773] RIP: 0033:0x7f8f7950f917 (esw->fdb_table.legacy.vepa_fdb is null) The bridge mode is only relevant when there are multiple functions per port. Therefore, prevent setting and getting this setting when there are no VFs. Note that after this change, there are no settings to change on the PF interface using `bridge link` when there are no VFs, so the interface no longer appears in the `bridge link` output. Fixes: 4b89251d ("net/mlx5: Support ndo bridge_setlink and getlink") Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Carolina Jubran authored
Before creating a scheduling element in a NIC or E-Switch scheduler, ensure that the requested element type is supported. If the element is of type Transmit Scheduling Arbiter (TSAR), also verify that the specific TSAR type is supported. Fixes: 214baf22 ("net/mlx5e: Support HTB offload") Fixes: 85c5f7c9 ("net/mlx5: E-switch, Create QoS on demand") Fixes: 0fe132ea ("net/mlx5: E-switch, Allow to add vports to rate groups") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Carolina Jubran authored
Add the missing masks for supported element types and Transmit Scheduling Arbiter (TSAR) types in scheduling elements. Also, add the corresponding bit masks for these types in the QoS capabilities of a NIC scheduler. Fixes: 214baf22 ("net/mlx5e: Support HTB offload") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Carolina Jubran authored
Ensure the scheduling element type and TSAR type are explicitly initialized in the QoS rate group creation. This prevents potential issues due to default values. Fixes: 1ae258f8 ("net/mlx5: E-switch, Introduce rate limiting groups API") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Shahar Shitrit authored
Add MLX5E_400GAUI_8_400GBASE_CR8 to the extended modes in ptys2ext_ethtool_table, since it was missing. Fixes: 6a897372 ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes") Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Shahar Shitrit authored
Add MLX5E_1000BASE_T and MLX5E_100BASE_TX to the legacy modes in ptys2legacy_ethtool_table, since they were missing. Fixes: 665bc539 ("net/mlx5e: Use new ethtool get/set link ksettings API") Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Maher Sanalla authored
Add the upcoming ConnectX-9 device ID to the table of supported PCI device IDs. Fixes: f908a35b ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Sriram Yagnaraman authored
Always call igb_xdp_ring_update_tail() under __netif_tx_lock, add a comment and lockdep assert to indicate that. This is needed to share the same TX ring between XDP, XSK and slow paths. Furthermore, the current XDP implementation is racy on tail updates. Fixes: 9cbc948b ("igb: add XDP support") Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> [Kurt: Add lockdep assert and fixes tag] Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Michal Schmidt authored
The description of function ice_find_vsi_list_entry says: Search VSI list map with VSI count 1 However, since the blamed commit (see Fixes below), the function no longer checks vsi_count. This causes a problem in ice_add_vlan_internal, where the decision to share VSI lists between filter rules relies on the vsi_count of the found existing VSI list being 1. The reproducing steps: 1. Have a PF and two VFs. There will be a filter rule for VLAN 0, referring to a VSI list containing VSIs: 0 (PF), 2 (VF#0), 3 (VF#1). 2. Add VLAN 1234 to VF#0. ice will make the wrong decision to share the VSI list with the new rule. The wrong behavior may not be immediately apparent, but it can be observed with debug prints. 3. Add VLAN 1234 to VF#1. ice will unshare the VSI list for the VLAN 1234 rule. Due to the earlier bad decision, the newly created VSI list will contain VSIs 0 (PF) and 3 (VF#1), instead of expected 2 (VF#0) and 3 (VF#1). 4. Try pinging a network peer over the VLAN interface on VF#0. This fails. Reproducer script at: https://gitlab.com/mschmidt2/repro/-/blob/master/RHEL-46814/test-vlan-vsi-list-confusion.sh Commented debug trace: https://gitlab.com/mschmidt2/repro/-/blob/master/RHEL-46814/ice-vlan-vsi-lists-debug.txt Patch adding the debug prints: https://gitlab.com/mschmidt2/linux/-/commit/f8a8814623944a45091a77c6094c40bfe726bfdb (Unsafe, by the way. Lacks rule_lock when dumping in ice_remove_vlan.) Michal Swiatkowski added to the explanation that the bug is caused by reusing a VSI list created for VLAN 0. All created VFs' VSIs are added to VLAN 0 filter. When a non-zero VLAN is created on a VF which is already in VLAN 0 (normal case), the VSI list from VLAN 0 is reused. It leads to a problem because all VFs (VSIs to be specific) that are subscribed to VLAN 0 will now receive a new VLAN tag traffic. This is one bug, another is the bug described above. Removing filters from one VF will remove VLAN filter from the previous VF. It happens a VF is reset. Example: - creation of 3 VFs - we have VSI list (used for VLAN 0) [0 (pf), 2 (vf1), 3 (vf2), 4 (vf3)] - we are adding VLAN 100 on VF1, we are reusing the previous list because 2 is there - VLAN traffic works fine, but VLAN 100 tagged traffic can be received on all VSIs from the list (for example broadcast or unicast) - trust is turning on VF2, VF2 is resetting, all filters from VF2 are removed; the VLAN 100 filter is also removed because 3 is on the list - VLAN traffic to VF1 isn't working anymore, there is a need to recreate VLAN interface to readd VLAN filter One thing I'm not certain about is the implications for the LAG feature, which is another caller of ice_find_vsi_list_entry. I don't have a LAG-capable card at hand to test. Fixes: 23ccae5c ("ice: changes to the interface with the HW and FW for SRIOV_VF+LAG") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Dave Ertman <David.m.ertman@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Przemek Kitszel authored
Our driver uses devres to manage resources, in particular we call pcim_enable_device(), what also means we express the intent to get automatic pci_disable_device() call at driver removal. Manual calls to pci_disable_device() misuse the API. Recent commit (see "Fixes" tag) has changed the removal action from conditional (silent ignore of double call to pci_disable_device()) to unconditional, but able to catch unwanted redundant calls; see cited "Fixes" commit for details. Since that, unloading the driver yields following warn+splat: [70633.628490] ice 0000:af:00.7: disabling already-disabled device [70633.628512] WARNING: CPU: 52 PID: 33890 at drivers/pci/pci.c:2250 pci_disable_device+0xf4/0x100 ... [70633.628744] ? pci_disable_device+0xf4/0x100 [70633.628752] release_nodes+0x4a/0x70 [70633.628759] devres_release_all+0x8b/0xc0 [70633.628768] device_unbind_cleanup+0xe/0x70 [70633.628774] device_release_driver_internal+0x208/0x250 [70633.628781] driver_detach+0x47/0x90 [70633.628786] bus_remove_driver+0x80/0x100 [70633.628791] pci_unregister_driver+0x2a/0xb0 [70633.628799] ice_module_exit+0x11/0x3a [ice] Note that this is the only Intel ethernet driver that needs such fix. Fixes: f748a07a ("PCI: Remove legacy pcim_release()") Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Jacob Keller authored
When adding a switch filter (such as a MAC or VLAN filter), it is expected that the driver will detect the case where the filter already exists, and return -EEXIST. This is used by calling code such as ice_vc_add_mac_addr, and ice_vsi_add_vlan to avoid incrementing the accounting fields such as vsi->num_vlan or vf->num_mac. This logic works correctly for the case where only a single VSI has added a given switch filter. When a second VSI adds the same switch filter, the driver converts the existing filter from an ICE_FWD_TO_VSI filter into an ICE_FWD_TO_VSI_LIST filter. This saves switch resources, by ensuring that multiple VSIs can re-use the same filter. The ice_add_update_vsi_list() function is responsible for doing this conversion. When first converting a filter from the FWD_TO_VSI into FWD_TO_VSI_LIST, it checks if the VSI being added is the same as the existing rule's VSI. In such a case it returns -EEXIST. However, when the switch rule has already been converted to a FWD_TO_VSI_LIST, the logic is different. Adding a new VSI in this case just requires extending the VSI list entry. The logic for checking if the rule already exists in this case returns 0 instead of -EEXIST. This breaks the accounting logic mentioned above, so the counters for how many MAC and VLAN filters exist for a given VF or VSI no longer accurately reflect the actual count. This breaks other code which relies on these counts. In typical usage this primarily affects such filters generally shared by multiple VSIs such as VLAN 0, or broadcast and multicast MAC addresses. Fix this by correctly reporting -EEXIST in the case of adding the same VSI to a switch rule already converted to ICE_FWD_TO_VSI_LIST. Fixes: 9daf8208 ("ice: Add support for switch filter programming") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Martyna Szapar-Mudlaw authored
After vsi setup refactor commit 6624e780 ("ice: split ice_vsi_setup into smaller functions") ice_cfg_sw_lldp function which removes rx rule directing LLDP packets to vsi is moved from ice_vsi_release to ice_vsi_decfg function. ice_vsi_decfg is used in more cases than just in vsi_release resulting in unnecessary removal of rx lldp packets handling switch rule. This leads to lldp packets being dropped after a change number of channels via ethtool. This patch moves ice_cfg_sw_lldp function that removes rx lldp sw rule back to ice_vsi_release function. Fixes: 6624e780 ("ice: split ice_vsi_setup into smaller functions") Reported-by: Matěj Grégr <mgregr@netx.as> Closes: https://lore.kernel.org/intel-wired-lan/1be45a76-90af-4813-824f-8398b69745a9@netx.as/T/#uReviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Eric Dumazet authored
syzbot found a new splat [1]. Instead of adding yet another spin_lock_bh(&hsr->seqnr_lock) / spin_unlock_bh(&hsr->seqnr_lock) pair, remove seqnr_lock and use atomic_t for hsr->sequence_nr and hsr->sup_sequence_nr. This also avoid a race in hsr_fill_info(). Also remove interlink_sequence_nr which is unused. [1] WARNING: CPU: 1 PID: 9723 at net/hsr/hsr_forward.c:602 handle_std_frame+0x247/0x2c0 net/hsr/hsr_forward.c:602 Modules linked in: CPU: 1 UID: 0 PID: 9723 Comm: syz.0.1657 Not tainted 6.11.0-rc6-syzkaller-00026-g88fac175 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:handle_std_frame+0x247/0x2c0 net/hsr/hsr_forward.c:602 Code: 49 8d bd b0 01 00 00 be ff ff ff ff e8 e2 58 25 00 31 ff 89 c5 89 c6 e8 47 53 a8 f6 85 ed 0f 85 5a ff ff ff e8 fa 50 a8 f6 90 <0f> 0b 90 e9 4c ff ff ff e8 cc e7 06 f7 e9 8f fe ff ff e8 52 e8 06 RSP: 0018:ffffc90000598598 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffc90000598670 RCX: ffffffff8ae2c919 RDX: ffff888024e94880 RSI: ffffffff8ae2c926 RDI: 0000000000000005 RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003 R13: ffff8880627a8cc0 R14: 0000000000000000 R15: ffff888012b03c3a FS: 0000000000000000(0000) GS:ffff88802b700000(0063) knlGS:00000000f5696b40 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 0000000020010000 CR3: 00000000768b4000 CR4: 0000000000350ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> hsr_fill_frame_info+0x2c8/0x360 net/hsr/hsr_forward.c:630 fill_frame_info net/hsr/hsr_forward.c:700 [inline] hsr_forward_skb+0x7df/0x25c0 net/hsr/hsr_forward.c:715 hsr_handle_frame+0x603/0x850 net/hsr/hsr_slave.c:70 __netif_receive_skb_core.constprop.0+0xa3d/0x4330 net/core/dev.c:5555 __netif_receive_skb_list_core+0x357/0x950 net/core/dev.c:5737 __netif_receive_skb_list net/core/dev.c:5804 [inline] netif_receive_skb_list_internal+0x753/0xda0 net/core/dev.c:5896 gro_normal_list include/net/gro.h:515 [inline] gro_normal_list include/net/gro.h:511 [inline] napi_complete_done+0x23f/0x9a0 net/core/dev.c:6247 gro_cell_poll+0x162/0x210 net/core/gro_cells.c:66 __napi_poll.constprop.0+0xb7/0x550 net/core/dev.c:6772 napi_poll net/core/dev.c:6841 [inline] net_rx_action+0xa92/0x1010 net/core/dev.c:6963 handle_softirqs+0x216/0x8f0 kernel/softirq.c:554 do_softirq kernel/softirq.c:455 [inline] do_softirq+0xb2/0xf0 kernel/softirq.c:442 </IRQ> <TASK> Fixes: 06afd2c3 ("hsr: Synchronize sending frames to have always incremented outgoing seq nr.") Fixes: f421436a ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-