- 26 Jul, 2024 11 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds authored
Pull arm64 fixes from Will Deacon: "The usual summary below, but the main fix is for the fast GUP lockless page-table walk when we have a combination of compile-time and run-time folding of the p4d and the pud respectively. - Remove some redundant Kconfig conditionals - Fix string output in ptrace selftest - Fix fast GUP crashes in some page-table configurations - Remove obsolete linker option when building the vDSO - Fix some sysreg field definitions for the GIC" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mm: Fix lockless walks with static and dynamic page-table folding arm64/sysreg: Correct the values for GICv4.1 arm64/vdso: Remove --hash-style=sysv kselftest: missing arg in ptrace.c arm64/Kconfig: Remove redundant 'if HAVE_FUNCTION_GRAPH_TRACER' arm64: remove redundant 'if HAVE_ARCH_KASAN' in Kconfig
-
https://github.com/ceph/ceph-clientLinus Torvalds authored
Pull ceph updates from Ilya Dryomov: "A small patchset to address bogus I/O errors and ultimately an assertion failure in the face of watch errors with -o exclusive mappings in RBD marked for stable and some assorted CephFS fixes" * tag 'ceph-for-6.11-rc1' of https://github.com/ceph/ceph-client: rbd: don't assume rbd_is_lock_owner() for exclusive mappings rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive mappings rbd: rename RBD_LOCK_STATE_RELEASING and releasing_wait ceph: fix incorrect kmalloc size of pagevec mempool ceph: periodically flush the cap releases ceph: convert comma to semicolon in __ceph_dentry_dir_lease_touch() ceph: use cap_wait_list only if debugfs is enabled
-
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofsLinus Torvalds authored
Pull more erofs updates from Gao Xiang: - Support STATX_DIOALIGN and FS_IOC_GETFSSYSFSPATH - Fix a race of LZ4 decompression due to recent refactoring - Another multi-page folio adaption in erofs_bread() * tag 'erofs-for-6.11-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: convert comma to semicolon erofs: support multi-page folios for erofs_bread() erofs: add support for FS_IOC_GETFSSYSFSPATH erofs: fix race in z_erofs_get_gbuf() erofs: support STATX_DIOALIGN
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull struct file leak fixes from Al Viro: "a couple of leaks on failure exits missing fdput()" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: lirc: rc_dev_get_from_fd(): fix file leak powerpc: fix a file leak in kvm_vcpu_ioctl_enable_cap()
-
Linus Torvalds authored
On arm64 we build compressed images, but "make install" by default will install the old non-compressed one. To actually get the compressed image install, you need to use "make zinstall", which is not the usual way to install a kernel. Which may not sound like much of an issue, but when you deal with multiple architectures (and years of your fingers knowing the regular "make install" incantation), this inconsistency is pretty annoying. But as Will Deacon says: "Sadly, bootloaders being as top quality as you might expect, I don't think we're in a position to rely on decompressor support across the board. Our Image.gz is literally just that -- we don't have a built-in decompressor (nor do I think we want to rush into that again after the fun we had on arm32) and the recent EFI zboot support solves that problem for platforms using EFI. Changing the default 'install' target terrifies me. There are bound to be folks with embedded boards who've scripted this and we could really ruin their day if we quietly give them a compressed kernel that their bootloader doesn't know how to handle :/" So make this conditional on a new "COMPRESSED_INSTALL" option. Cc: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
https://github.com:/norov/linuxLinus Torvalds authored
Pull bitmap updates from Yury Norov: "Random fixes" * tag 'bitmap-6.11-rc1' of https://github.com:/norov/linux: riscv: Remove unnecessary int cast in variable_fls() radix tree test suite: put definition of bitmap_clear() into lib/bitmap.c bitops: Add a comment explaining the double underscore macros lib: bitmap: add missing MODULE_DESCRIPTION() macros cpumask: introduce assign_cpu() macro
-
Chen Ni authored
Replace a comma between expression statements by a semicolon. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Link: https://lore.kernel.org/r/20240724020721.2389738-1-nichen@iscas.ac.cnReviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
-
Gao Xiang authored
If the requested page is part of the previous multi-page folio, there is no need to call read_mapping_folio() again. Also, get rid of the remaining one of page->index [1] in our codebase. [1] https://lore.kernel.org/r/Zp8fgUSIBGQ1TN0D@casper.infradead.org Cc: Matthew Wilcox <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240723073024.875290-1-hsiangkao@linux.alibaba.com
-
Huang Xiaojia authored
FS_IOC_GETFSSYSFSPATH ioctl exposes /sys/fs path of a given filesystem, potentially standarizing sysfs reporting. This patch add support for FS_IOC_GETFSSYSFSPATH for erofs, "erofs/<dev>" will be outputted for bdev cases, "erofs/[domain_id,]<fs_id>" will be outputted for fscache cases. Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com> Link: https://lore.kernel.org/r/20240720082335.441563-1-huangxiaojia2@huawei.comReviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
-
Gao Xiang authored
In z_erofs_get_gbuf(), the current task may be migrated to another CPU between `z_erofs_gbuf_id()` and `spin_lock(&gbuf->lock)`. Therefore, z_erofs_put_gbuf() will trigger the following issue which was found by stress test: <2>[772156.434168] kernel BUG at fs/erofs/zutil.c:58! .. <4>[772156.435007] <4>[772156.439237] CPU: 0 PID: 3078 Comm: stress Kdump: loaded Tainted: G E 6.10.0-rc7+ #2 <4>[772156.439239] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 1.0.0 01/01/2017 <4>[772156.439241] pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) <4>[772156.439243] pc : z_erofs_put_gbuf+0x64/0x70 [erofs] <4>[772156.439252] lr : z_erofs_lz4_decompress+0x600/0x6a0 [erofs] .. <6>[772156.445958] stress (3127): drop_caches: 1 <4>[772156.446120] Call trace: <4>[772156.446121] z_erofs_put_gbuf+0x64/0x70 [erofs] <4>[772156.446761] z_erofs_lz4_decompress+0x600/0x6a0 [erofs] <4>[772156.446897] z_erofs_decompress_queue+0x740/0xa10 [erofs] <4>[772156.447036] z_erofs_runqueue+0x428/0x8c0 [erofs] <4>[772156.447160] z_erofs_readahead+0x224/0x390 [erofs] .. Fixes: f36f3010 ("erofs: rename per-CPU buffers to global buffer pool and make it configurable") Cc: <stable@vger.kernel.org> # 6.10+ Reviewed-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240722035110.3456740-1-hsiangkao@linux.alibaba.com
-
Hongbo Li authored
Add support for STATX_DIOALIGN to EROFS, so that direct I/O alignment restrictions are exposed to userspace in a generic way. [Before] ``` ./statx_test /mnt/erofs/testfile statx(/mnt/erofs/testfile) = 0 dio mem align:0 dio offset align:0 ``` [After] ``` ./statx_test /mnt/erofs/testfile statx(/mnt/erofs/testfile) = 0 dio mem align:512 dio offset align:512 ``` Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240718083243.2485437-1-hsiangkao@linux.alibaba.com
-
- 25 Jul, 2024 29 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from Jakub Kicinski: "Including fixes from bpf and netfilter. A lot of networking people were at a conference last week, busy catching COVID, so relatively short PR. Current release - regressions: - tcp: process the 3rd ACK with sk_socket for TFO and MPTCP Current release - new code bugs: - l2tp: protect session IDR and tunnel session list with one lock, make sure the state is coherent to avoid a warning - eth: bnxt_en: update xdp_rxq_info in queue restart logic - eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field Previous releases - regressions: - xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len, the field reuses previously un-validated pad Previous releases - always broken: - tap/tun: drop short frames to prevent crashes later in the stack - eth: ice: add a per-VF limit on number of FDIR filters - af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash" * tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits) tun: add missing verification for short frame tap: add missing verification for short frame mISDN: Fix a use after free in hfcmulti_tx() gve: Fix an edge case for TSO skb validity check bnxt_en: update xdp_rxq_info in queue restart logic tcp: process the 3rd ACK with sk_socket for TFO/MPTCP selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len bpf: Fix a segment issue when downgrading gso_size net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling MAINTAINERS: make Breno the netconsole maintainer MAINTAINERS: Update bonding entry net: nexthop: Initialize all fields in dumped nexthops net: stmmac: Correct byte order of perfect_match selftests: forwarding: skip if kernel not support setting bridge fdb learning limit tipc: Return non-zero value from tipc_udp_addr2str() on error netfilter: nft_set_pipapo_avx2: disable softinterrupts ice: Fix recipe read procedure ice: Add a per-VF limit on number of FDIR filters net: bonding: correctly annotate RCU in bond_should_notify_peers() ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linuxLinus Torvalds authored
Pull printk updates from Petr Mladek: - trivial printk changes The bigger "real" printk work is still being discussed. * tag 'printk-for-6.11-trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: vsprintf: add missing MODULE_DESCRIPTION() macro printk: Rename console_replay_all() and update context
-
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctlLinus Torvalds authored
Pull sysctl constification from Joel Granados: "Treewide constification of the ctl_table argument of proc_handlers using a coccinelle script and some manual code formatting fixups. This is a prerequisite to moving the static ctl_table structs into read-only data section which will ensure that proc_handler function pointers cannot be modified" * tag 'constfy-sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: sysctl: treewide: constify the ctl_table argument of proc_handlers
-
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efiLinus Torvalds authored
Pull EFI fixes from Ard Biesheuvel: - Wipe screen_info after allocating it from the heap - used by arm32 and EFI zboot, other EFI architectures allocate it statically - Revert to allocating boot_params from the heap on x86 when entering via the native PE entrypoint, to work around a regression on older Dell hardware * tag 'efi-fixes-for-v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: x86/efistub: Revert to heap allocated boot_params for PE entrypoint efi/libstub: Zero initialize heap allocated struct screen_info
-
git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linuxLinus Torvalds authored
Pull kgdb updates from Daniel Thompson: "Three small changes this cycle: - Clean up an architecture abstraction that is no longer needed because all the architectures have converged. - Actually use the prompt argument to kdb_position_cursor() instead of ignoring it (functionally this fix is a nop but that was due to luck rather than good judgement) - Fix a -Wformat-security warning" * tag 'kgdb-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kdb: Get rid of redundant kdb_curr_task() kdb: Use the passed prompt in kdb_position_cursor() kdb: address -Wformat-security warnings
-
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linuxLinus Torvalds authored
Pull MIPS updates from Thomas Bogendoerfer: - Use improved timer sync for Loongson64 - Fix address of GCR_ACCESS register - Add missing MODULE_DESCRIPTION * tag 'mips_6.11_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: mips: sibyte: add missing MODULE_DESCRIPTION() macro MIPS: SMP-CPS: Fix address for GCR_ACCESS register for CM3 and later MIPS: Loongson64: Switch to SYNC_R4K
-
Linus Torvalds authored
Merge tag 'parisc-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: "The gettimeofday() and clock_gettime() syscalls are now available as vDSO functions, and Dave added a patch which allows to use NVMe cards in the PCI slots as fast and easy alternative to SCSI discs. Summary: - add gettimeofday() and clock_gettime() vDSO functions - enable PCI_MSI_ARCH_FALLBACKS to allow PCI to PCIe bridge adaptor with PCIe NVME card to function in parisc machines - allow users to reduce kernel unaligned runtime warnings - minor code cleanups" * tag 'parisc-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Add support for CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN parisc: Use max() to calculate parisc_tlb_flush_threshold parisc: Fix warning at drivers/pci/msi/msi.h:121 parisc: Add 64-bit gettimeofday() and clock_gettime() vDSO functions parisc: Add 32-bit gettimeofday() and clock_gettime() vDSO functions parisc: Clean up unistd.h file
-
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linuxLinus Torvalds authored
Pull UML updates from Richard Weinberger: - Support for preemption - i386 Rust support - Huge cleanup by Benjamin Berg - UBSAN support - Removal of dead code * tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (41 commits) um: vector: always reset vp->opened um: vector: remove vp->lock um: register power-off handler um: line: always fill *error_out in setup_one_line() um: remove pcap driver from documentation um: Enable preemption in UML um: refactor TLB update handling um: simplify and consolidate TLB updates um: remove force_flush_all from fork_handler um: Do not flush MM in flush_thread um: Delay flushing syscalls until the thread is restarted um: remove copy_context_skas0 um: remove LDT support um: compress memory related stub syscalls while adding them um: Rework syscall handling um: Add generic stub_syscall6 function um: Create signal stack memory assignment in stub_data um: Remove stub-data.h include from common-offsets.h um: time-travel: fix signal blocking race/hang um: time-travel: remove time_exit() ...
-
Linus Torvalds authored
Merge tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core changes for 6.11-rc1. Lots of stuff in here, with not a huge diffstat, but apis are evolving which required lots of files to be touched. Highlights of the changes in here are: - platform remove callback api final fixups (Uwe took many releases to get here, finally!) - Rust bindings for basic firmware apis and initial driver-core interactions. It's not all that useful for a "write a whole driver in rust" type of thing, but the firmware bindings do help out the phy rust drivers, and the driver core bindings give a solid base on which others can start their work. There is still a long way to go here before we have a multitude of rust drivers being added, but it's a great first step. - driver core const api changes. This reached across all bus types, and there are some fix-ups for some not-common bus types that linux-next and 0-day testing shook out. This work is being done to help make the rust bindings more safe, as well as the C code, moving toward the end-goal of allowing us to put driver structures into read-only memory. We aren't there yet, but are getting closer. - minor devres cleanups and fixes found by code inspection - arch_topology minor changes - other minor driver core cleanups All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits) ARM: sa1100: make match function take a const pointer sysfs/cpu: Make crash_hotplug attribute world-readable dio: Have dio_bus_match() callback take a const * zorro: make match function take a const pointer driver core: module: make module_[add|remove]_driver take a const * driver core: make driver_find_device() take a const * driver core: make driver_[create|remove]_file take a const * firmware_loader: fix soundness issue in `request_internal` firmware_loader: annotate doctests as `no_run` devres: Correct code style for functions that return a pointer type devres: Initialize an uninitialized struct member devres: Fix memory leakage caused by driver API devm_free_percpu() devres: Fix devm_krealloc() wasting memory driver core: platform: Switch to use kmemdup_array() driver core: have match() callback in struct bus_type take a const * MAINTAINERS: add Rust device abstractions to DRIVER CORE device: rust: improve safety comments MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER firmware: rust: improve safety comments ...
-
git://www.linux-watchdog.org/linux-watchdogLinus Torvalds authored
Pull watchdog updates from Wim Van Sebroeck: - make watchdog_class const - rework of the rzg2l_wdt driver - other small fixes and improvements * tag 'linux-watchdog-6.11-rc1' of git://www.linux-watchdog.org/linux-watchdog: dt-bindings: watchdog: dlg,da9062-watchdog: Drop blank space watchdog: rzn1: Convert comma to semicolon watchdog: lenovo_se10_wdt: Convert comma to semicolon dt-bindings: watchdog: renesas,wdt: Document RZ/G3S support watchdog: rzg2l_wdt: Add suspend/resume support watchdog: rzg2l_wdt: Rely on the reset driver for doing proper reset watchdog: rzg2l_wdt: Remove comparison with zero watchdog: rzg2l_wdt: Remove reset de-assert from probe watchdog: rzg2l_wdt: Check return status of pm_runtime_put() watchdog: rzg2l_wdt: Use pm_runtime_resume_and_get() watchdog: rzg2l_wdt: Make the driver depend on PM watchdog: rzg2l_wdt: Restrict the driver to ARCH_RZG2L and ARCH_R9A09G011 watchdog: imx7ulp_wdt: keep already running watchdog enabled watchdog: starfive: Add missing clk_disable_unprepare() watchdog: Make watchdog_class const
-
git://git.infradead.org/users/hch/dma-mappingLinus Torvalds authored
Pull dma-mapping fix from Christoph Hellwig: - fix the order of actions in dmam_free_coherent (Lance Richardson) * tag 'dma-mapping-6.11-2024-07-24' of git://git.infradead.org/users/hch/dma-mapping: dma: fix call order in dmam_free_coherent
-
Jakub Kicinski authored
Dongli Zhang says: ==================== tap/tun: harden by dropping short frame This is to harden all of tap/tun to avoid any short frame smaller than the Ethernet header (ETH_HLEN). While the xen-netback already rejects short frame smaller than ETH_HLEN ... 914 static void xenvif_tx_build_gops(struct xenvif_queue *queue, 915 int budget, 916 unsigned *copy_ops, 917 unsigned *map_ops) 918 { ... ... 1007 if (unlikely(txreq.size < ETH_HLEN)) { 1008 netdev_dbg(queue->vif->dev, 1009 "Bad packet size: %d\n", txreq.size); 1010 xenvif_tx_err(queue, &txreq, extra_count, idx); 1011 break; 1012 } ... the short frame may not be dropped by vhost-net/tap/tun. This fixes CVE-2024-41090 and CVE-2024-41091. ==================== Link: https://patch.msgid.link/20240724170452.16837-1-dongli.zhang@oracle.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dongli Zhang authored
The cited commit missed to check against the validity of the frame length in the tun_xdp_one() path, which could cause a corrupted skb to be sent downstack. Even before the skb is transmitted, the tun_xdp_one-->eth_type_trans() may access the Ethernet header although it can be less than ETH_HLEN. Once transmitted, this could either cause out-of-bound access beyond the actual length, or confuse the underlayer with incorrect or inconsistent header length in the skb metadata. In the alternative path, tun_get_user() already prohibits short frame which has the length less than Ethernet header size from being transmitted for IFF_TAP. This is to drop any frame shorter than the Ethernet header size just like how tun_get_user() does. CVE: CVE-2024-41091 Inspired-by: https://lore.kernel.org/netdev/1717026141-25716-1-git-send-email-si-wei.liu@oracle.com/ Fixes: 043d222f ("tuntap: accept an array of XDP buffs through sendmsg()") Cc: stable@vger.kernel.org Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20240724170452.16837-3-dongli.zhang@oracle.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Si-Wei Liu authored
The cited commit missed to check against the validity of the frame length in the tap_get_user_xdp() path, which could cause a corrupted skb to be sent downstack. Even before the skb is transmitted, the tap_get_user_xdp()-->skb_set_network_header() may assume the size is more than ETH_HLEN. Once transmitted, this could either cause out-of-bound access beyond the actual length, or confuse the underlayer with incorrect or inconsistent header length in the skb metadata. In the alternative path, tap_get_user() already prohibits short frame which has the length less than Ethernet header size from being transmitted. This is to drop any frame shorter than the Ethernet header size just like how tap_get_user() does. CVE: CVE-2024-41090 Link: https://lore.kernel.org/netdev/1717026141-25716-1-git-send-email-si-wei.liu@oracle.com/ Fixes: 0efac277 ("tap: accept an array of XDP buffs through sendmsg()") Cc: stable@vger.kernel.org Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20240724170452.16837-2-dongli.zhang@oracle.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dan Carpenter authored
Don't dereference *sp after calling dev_kfree_skb(*sp). Fixes: af69fb3a ("Add mISDN HFC multiport driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/8be65f5a-c2dd-4ba0-8a10-bfe5980b8cfb@stanley.mountainSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Bailey Forrest authored
The NIC requires each TSO segment to not span more than 10 descriptors. NIC further requires each descriptor to not exceed 16KB - 1 (GVE_TX_MAX_BUF_SIZE_DQO). The descriptors for an skb are generated by gve_tx_add_skb_no_copy_dqo() for DQO RDA queue format. gve_tx_add_skb_no_copy_dqo() loops through each skb frag and generates a descriptor for the entire frag if the frag size is not greater than GVE_TX_MAX_BUF_SIZE_DQO. If the frag size is greater than GVE_TX_MAX_BUF_SIZE_DQO, it is split into descriptor(s) of size GVE_TX_MAX_BUF_SIZE_DQO and a descriptor is generated for the remainder (frag size % GVE_TX_MAX_BUF_SIZE_DQO). gve_can_send_tso() checks if the descriptors thus generated for an skb would meet the requirement that each TSO-segment not span more than 10 descriptors. However, the current code misses an edge case when a TSO segment spans multiple descriptors within a large frag. This change fixes the edge case. gve_can_send_tso() relies on the assumption that max gso size (9728) is less than GVE_TX_MAX_BUF_SIZE_DQO and therefore within an skb fragment a TSO segment can never span more than 2 descriptors. Fixes: a57e5de4 ("gve: DQO: Add TX path") Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Jeroen de Borst <jeroendb@google.com> Cc: stable@vger.kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240724143431.3343722-1-pkaligineedi@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Taehee Yoo authored
When the netdev_rx_queue_restart() restarts queues, the bnxt_en driver updates(creates and deletes) a page_pool. But it doesn't update xdp_rxq_info, so the xdp_rxq_info is still connected to an old page_pool. So, bnxt_rx_ring_info->page_pool indicates a new page_pool, but bnxt_rx_ring_info->xdp_rxq is still connected to an old page_pool. An old page_pool is no longer used so it is supposed to be deleted by page_pool_destroy() but it isn't. Because the xdp_rxq_info is holding the reference count for it and the xdp_rxq_info is not updated, an old page_pool will not be deleted in the queue restart logic. Before restarting 1 queue: ./tools/net/ynl/samples/page-pool enp10s0f1np1[6] page pools: 4 (zombies: 0) refs: 8192 bytes: 33554432 (refs: 0 bytes: 0) recycling: 0.0% (alloc: 128:8048 recycle: 0:0) After restarting 1 queue: ./tools/net/ynl/samples/page-pool enp10s0f1np1[6] page pools: 5 (zombies: 0) refs: 10240 bytes: 41943040 (refs: 0 bytes: 0) recycling: 20.0% (alloc: 160:10080 recycle: 1920:128) Before restarting queues, an interface has 4 page_pools. After restarting one queue, an interface has 5 page_pools, but it should be 4, not 5. The reason is that queue restarting logic creates a new page_pool and an old page_pool is not deleted due to the absence of an update of xdp_rxq_info logic. Fixes: 2d694c27 ("bnxt_en: implement netdev_queue_mgmt_ops") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: David Wei <dw@davidwei.uk> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20240721053554.1233549-1-ap420073@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski authored
Daniel Borkmann says: ==================== pull-request: bpf 2024-07-25 We've added 14 non-merge commits during the last 8 day(s) which contain a total of 19 files changed, 177 insertions(+), 70 deletions(-). The main changes are: 1) Fix af_unix to disable MSG_OOB handling for sockets in BPF sockmap and BPF sockhash. Also add test coverage for this case, from Michal Luczaj. 2) Fix a segmentation issue when downgrading gso_size in the BPF helper bpf_skb_adjust_room(), from Fred Li. 3) Fix a compiler warning in resolve_btfids due to a missing type cast, from Liwei Song. 4) Fix stack allocation for arm64 to align the stack pointer at a 16 byte boundary in the fexit_sleep BPF selftest, from Puranjay Mohan. 5) Fix a xsk regression to require a flag when actuating tx_metadata_len, from Stanislav Fomichev. 6) Fix function prototype BTF dumping in libbpf for prototypes that have no input arguments, from Andrii Nakryiko. 7) Fix stacktrace symbol resolution in perf script for BPF programs containing subprograms, from Hou Tao. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len bpf: Fix a segment issue when downgrading gso_size tools/resolve_btfids: Fix comparison of distinct pointer types warning in resolve_btfids bpf, events: Use prog to emit ksymbol event for main program selftests/bpf: Test sockmap redirect for AF_UNIX MSG_OOB selftests/bpf: Parametrize AF_UNIX redir functions to accept send() flags selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected() af_unix: Disable MSG_OOB handling for sockets in sockmap/sockhash bpftool: Fix typo in usage help libbpf: Fix no-args func prototype BTF dumping syntax MAINTAINERS: Update powerpc BPF JIT maintainers MAINTAINERS: Update email address of Naveen selftests/bpf: fexit_sleep: Fix stack allocation for arm64 ==================== Link: https://patch.msgid.link/20240725114312.32197-1-daniel@iogearbox.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Will Deacon authored
Lina reports random oopsen originating from the fast GUP code when 16K pages are used with 4-level page-tables, the fourth level being folded at runtime due to lack of LPA2. In this configuration, the generic implementation of p4d_offset_lockless() will return a 'p4d_t *' corresponding to the 'pgd_t' allocated on the stack of the caller, gup_fast_pgd_range(). This is normally fine, but when the fourth level of page-table is folded at runtime, pud_offset_lockless() will offset from the address of the 'p4d_t' to calculate the address of the PUD in the same page-table page. This results in a stray stack read when the 'p4d_t' has been allocated on the stack and can send the walker into the weeds. Fix the problem by providing our own definition of p4d_offset_lockless() when CONFIG_PGTABLE_LEVELS <= 4 which returns the real page-table pointer rather than the address of the local stack variable. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/50360968-13fb-4e6f-8f52-1725b3177215@asahilina.net Fixes: 0dd4f60a ("arm64: mm: Add support for folding PUDs at runtime") Reported-by: Asahi Lina <lina@asahilina.net> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240725090345.28461-1-will@kernel.orgSigned-off-by: Will Deacon <will@kernel.org>
-
Matthieu Baerts (NGI0) authored
The 'Fixes' commit recently changed the behaviour of TCP by skipping the processing of the 3rd ACK when a sk->sk_socket is set. The goal was to skip tcp_ack_snd_check() in tcp_rcv_state_process() not to send an unnecessary ACK in case of simultaneous connect(). Unfortunately, that had an impact on TFO and MPTCP. I started to look at the impact on MPTCP, because the MPTCP CI found some issues with the MPTCP Packetdrill tests [1]. Then Paolo Abeni suggested me to look at the impact on TFO with "plain" TCP. For MPTCP, when receiving the 3rd ACK of a request adding a new path (MP_JOIN), sk->sk_socket will be set, and point to the MPTCP sock that has been created when the MPTCP connection got established before with the first path. The newly added 'goto' will then skip the processing of the segment text (step 7) and not go through tcp_data_queue() where the MPTCP options are validated, and some actions are triggered, e.g. sending the MPJ 4th ACK [2] as demonstrated by the new errors when running a packetdrill test [3] establishing a second subflow. This doesn't fully break MPTCP, mainly the 4th MPJ ACK that will be delayed. Still, we don't want to have this behaviour as it delays the switch to the fully established mode, and invalid MPTCP options in this 3rd ACK will not be caught any more. This modification also affects the MPTCP + TFO feature as well, and being the reason why the selftests started to be unstable the last few days [4]. For TFO, the existing 'basic-cookie-not-reqd' test [5] was no longer passing: if the 3rd ACK contains data, and the connection is accept()ed before receiving them, these data would no longer be processed, and thus not ACKed. One last thing about MPTCP, in case of simultaneous connect(), a fallback to TCP will be done, which seems fine: `../common/defaults.sh` 0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_MPTCP) = 3 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0 > S 0:0(0) <mss 1460, sackOK, TS val 100 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> +0 < S 0:0(0) win 1000 <mss 1460, sackOK, TS val 407 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> +0 > S. 0:0(0) ack 1 <mss 1460, sackOK, TS val 330 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> +0 < S. 0:0(0) ack 1 win 65535 <mss 1460, sackOK, TS val 700 ecr 100, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey=2]> +0 > . 1:1(0) ack 1 <nop, nop, TS val 845707014 ecr 700, nop, nop, sack 0:1> Simultaneous SYN-data crossing is also not supported by TFO, see [6]. Kuniyuki Iwashima suggested to restrict the processing to SYN+ACK only: that's a more generic solution than the one initially proposed, and also enough to fix the issues described above. Later on, Eric Dumazet mentioned that an ACK should still be sent in reaction to the second SYN+ACK that is received: not sending a DUPACK here seems wrong and could hurt: 0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0 > S 0:0(0) <mss 1460, sackOK, TS val 1000 ecr 0,nop,wscale 8> +0 < S 0:0(0) win 1000 <mss 1000, sackOK, nop, nop> +0 > S. 0:0(0) ack 1 <mss 1460, sackOK, TS val 3308134035 ecr 0,nop,wscale 8> +0 < S. 0:0(0) ack 1 win 1000 <mss 1000, sackOK, nop, nop> +0 > . 1:1(0) ack 1 <nop, nop, sack 0:1> // <== Here So in this version, the 'goto consume' is dropped, to always send an ACK when switching from TCP_SYN_RECV to TCP_ESTABLISHED. This ACK will be seen as a DUPACK -- with DSACK if SACK has been negotiated -- in case of simultaneous SYN crossing: that's what is expected here. Link: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/9936227696 [1] Link: https://datatracker.ietf.org/doc/html/rfc8684#fig_tokens [2] Link: https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/syscalls/accept.pkt#L28 [3] Link: https://netdev.bots.linux.dev/contest.html?executor=vmksft-mptcp-dbg&test=mptcp-connect-sh [4] Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/server/basic-cookie-not-reqd.pkt#L21 [5] Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/client/simultaneous-fast-open.pkt [6] Fixes: 23e89e8e ("tcp: Don't drop SYN+ACK for simultaneous connect().") Suggested-by: Paolo Abeni <pabeni@redhat.com> Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240724-upstream-net-next-20240716-tcp-3rd-ack-consume-sk_socket-v3-1-d48339764ce9@kernel.orgSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Ilya Dryomov authored
Expanding on the previous commit, assuming that rbd_is_lock_owner() always returns true (i.e. that we are either in RBD_LOCK_STATE_LOCKED or RBD_LOCK_STATE_QUIESCING) if the mapping is exclusive is wrong too. In case ceph_cls_set_cookie() fails, the lock would be temporarily released even if the mapping is exclusive, meaning that we can end up even in RBD_LOCK_STATE_UNLOCKED. IOW, exclusive mappings are really "just" about disabling automatic lock transitions (as documented in the man page), not about grabbing the lock and holding on to it whatever it takes. Cc: stable@vger.kernel.org Fixes: 637cd060 ("rbd: new exclusive lock wait/wake code") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
-
Ilya Dryomov authored
Every time a watch is reestablished after getting lost, we need to update the cookie which involves quiescing exclusive lock. For this, we transition from RBD_LOCK_STATE_LOCKED to RBD_LOCK_STATE_QUIESCING roughly for the duration of rbd_reacquire_lock() call. If the mapping is exclusive and I/O happens to arrive in this time window, it's failed with EROFS (later translated to EIO) based on the wrong assumption in rbd_img_exclusive_lock() -- "lock got released?" check there stopped making sense with commit a2b1da09 ("rbd: lock should be quiesced on reacquire"). To make it worse, any such I/O is added to the acquiring list before EROFS is returned and this sets up for violating rbd_lock_del_request() precondition that the request is either on the running list or not on any list at all -- see commit ded080c8 ("rbd: don't move requests to the running list on errors"). rbd_lock_del_request() ends up processing these requests as if they were on the running list which screws up quiescing_wait completion counter and ultimately leads to rbd_assert(!completion_done(&rbd_dev->quiescing_wait)); being triggered on the next watch error. Cc: stable@vger.kernel.org # 06ef84c4e9c4: rbd: rename RBD_LOCK_STATE_RELEASING and releasing_wait Cc: stable@vger.kernel.org Fixes: 637cd060 ("rbd: new exclusive lock wait/wake code") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
-
Ilya Dryomov authored
... to RBD_LOCK_STATE_QUIESCING and quiescing_wait to recognize that this state and the associated completion are backing rbd_quiesce_lock(), which isn't specific to releasing the lock. While exclusive lock does get quiesced before it's released, it also gets quiesced before an attempt to update the cookie is made and there the lock is not released as long as ceph_cls_set_cookie() succeeds. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
-
Stanislav Fomichev authored
This flag is now required to use tx_metadata_len. Fixes: 40808a23 ("selftests/bpf: Add TX side to xdp_metadata") Reported-by: Julian Schindel <mail@arctic-alpaca.de> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20240713015253.121248-3-sdf@fomichev.me
-
Stanislav Fomichev authored
Julian reports that commit 341ac980 ("xsk: Support tx_metadata_len") can break existing use cases which don't zero-initialize xdp_umem_reg padding. Introduce new XDP_UMEM_TX_METADATA_LEN to make sure we interpret the padding as tx_metadata_len only when being explicitly asked. Fixes: 341ac980 ("xsk: Support tx_metadata_len") Reported-by: Julian Schindel <mail@arctic-alpaca.de> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20240713015253.121248-2-sdf@fomichev.me
-
Fred Li authored
Linearize the skb when downgrading gso_size because it may trigger a BUG_ON() later when the skb is segmented as described in [1,2]. Fixes: 2be7e212 ("bpf: add bpf_skb_adjust_room helper") Signed-off-by: Fred Li <dracodingfly@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/all/20240626065555.35460-2-dracodingfly@gmail.com [1] Link: https://lore.kernel.org/all/668d5cf1ec330_1c18c32947@willemb.c.googlers.com.notmuch [2] Link: https://lore.kernel.org/bpf/20240719024653.77006-1-dracodingfly@gmail.com
-
Breno Leitao authored
Move the freeing of the dummy net_device from mtk_free_dev() to mtk_remove(). Previously, if alloc_netdev_dummy() failed in mtk_probe(), eth->dummy_dev would be NULL. The error path would then call mtk_free_dev(), which in turn called free_netdev() assuming dummy_dev was allocated (but it was not), potentially causing a NULL pointer dereference. By moving free_netdev() to mtk_remove(), we ensure it's only called when mtk_probe() has succeeded and dummy_dev is fully allocated. This addresses a potential NULL pointer dereference detected by Smatch[1]. Fixes: b209bd6d ("net: mediatek: mtk_eth_sock: allocate dummy net_device dynamically") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/4160f4e0-cbef-4a22-8b5d-42c4d399e1f7@stanley.mountain/ [1] Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240724080524.2734499-1-leitao@debian.orgSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfPaolo Abeni authored
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains a Netfilter fix for net: Patch #1 if FPU is busy, then pipapo set backend falls back to standard set element lookup. Moreover, disable bh while at this. From Florian Westphal. netfilter pull request 24-07-24 * tag 'nf-24-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_set_pipapo_avx2: disable softinterrupts ==================== Link: https://patch.msgid.link/20240724081305.3152-1-pablo@netfilter.orgSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queuePaolo Abeni authored
Tony Nguyen says: ==================== This series contains updates to ice driver only. Ahmed enforces the iavf per VF filter limit on ice (PF) driver to prevent possible resource exhaustion. Wojciech corrects assignment of l2 flags read from firmware. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Fix recipe read procedure ice: Add a per-VF limit on number of FDIR filters ==================== Link: https://patch.msgid.link/20240723233242.3146628-1-anthony.l.nguyen@intel.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-