1. 17 Sep, 2024 23 commits
    • Merge tag 'x86-timers-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · fc1dc0d5
      Linus Torvalds authored
      Pull x86 timer updates from Thomas Gleixner:
      
       - Use the topology information about the number of packages for making
         the decision about TSC trust instead of using the number of online
         nodes, which does not reflect the real topology.
      
       - Stop the PIT timer 0 when it's not in use so as to stop pointless
         emulation in the VMM.
      
       - Fix the PIT timer stop sequence for timer 0 so it truly stops both
         real hardware and buggy VMM emulations.
      
      * tag 'x86-timers-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/tsc: Check for sockets instead of CPUs to make code match comment
        clockevents/drivers/i8253: Fix stop sequence for timer 0
        x86/i8253: Disable PIT timer 0 when not in use
        x86/tsc: Use topology_max_packages() to get package number
      fc1dc0d5
    • Merge tag 'x86-misc-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b5075354
      Linus Torvalds authored
      Pull misc x86 updates from Thomas Gleixner:
      
       - Rework kcpuid to handle the autogenerated CSV file correctly and
         update the CSV file to cover the whole zoo of CPUID.
      
       - Avoid memcpy() for ia32 syscall_get_arguments() and use direct
         assignments, as fortified memcpy() is unhappy about writing/reading
         beyond the end of the addressed destination/source struct member
      
       - A few new PCI IDs for AMD
      
       - Update MAINTAINERS to cover x86 specific selftests
      
      * tag 'x86-misc-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        MAINTAINERS: Add selftests/x86 entry
        x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h-70h
        x86/syscall: Avoid memcpy() for ia32 syscall_get_arguments()
        MAINTAINERS: Add x86 cpuid database entry
        tools/x86/kcpuid: Introduce a complete cpuid bitfields CSV file
        tools/x86/kcpuid: Parse subleaf ranges if provided
        tools/x86/kcpuid: Recognize all leaves with subleaves
        tools/x86/kcpuid: Strip bitfield names leading/trailing whitespace
        tools/x86/kcpuid: Protect against faulty "max subleaf" values
        tools/x86/kcpuid: Set max possible subleaves count to 64
        tools/x86/kcpuid: Properly align long-description columns
        tools/x86/kcpuid: Remove unused variable
        x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h
      b5075354
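The syscall_get_arguments() change in this merge can be illustrated with a minimal userspace sketch. The names `fake_regs` and `get_args_direct` are hypothetical and are not the kernel's definitions. The sketch only shows the idea from the pull message: one assignment per register never touches more than the named struct member, whereas a single memcpy() starting at one member and spanning several would read past the end of that member.

```c
/* Hypothetical stand-in for a saved register frame; not the kernel's struct pt_regs. */
struct fake_regs {
    unsigned long bx, cx, dx, si, di, bp;
};

/*
 * Copy six syscall arguments with one assignment per register.
 * A single memcpy(args, &regs->bx, 6 * sizeof(args[0])) would read beyond
 * the end of the 'bx' member, which is what a fortified memcpy() objects to.
 */
static void get_args_direct(const struct fake_regs *regs, unsigned long args[6])
{
    args[0] = regs->bx;
    args[1] = regs->cx;
    args[2] = regs->dx;
    args[3] = regs->si;
    args[4] = regs->di;
    args[5] = regs->bp;
}
```

The direct assignments also keep working if the layout of the register struct changes, since nothing depends on the members being adjacent.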
    • Merge tag 'x86-platform-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a3233da6
      Linus Torvalds authored
      Pull x86 platform update from Thomas Gleixner:
       "Remove a stale declaration from the UV platform code"
      
      * tag 'x86-platform-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/platform/uv: Remove unused declaration uv_irq_2_mmr_info()
      a3233da6
    • Merge tag 'x86-mm-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 70f43ea3
      Linus Torvalds authored
      Pull x86 memory management updates from Thomas Gleixner:
      
       - Make LAM enablement safe vs. kernel threads using a process mm
         temporarily as switching back to the process would not update CR3 and
         therefore not enable LAM causing faults in user space when using
         tagged pointers. Cure it by synchronizing LAM enablement via IPIs to
         all CPUs which use the related mm.
      
       - Cure a harmless LAM inconsistency between CR3 and the state during
         context switch. It's both confusing and prone to lead to real bugs
      
       - Fix alternate stack handling for threads which run with a non-zero
         protection key. The non-zero key prevents the kernel from accessing
         the alternate stack. Cure it by temporarily enabling all protection
         keys for the alternate stack setup/restore operations.
      
       - Provide an EFI config table identity mapping for the kexec kernel to
         prevent kexec failures caused by the new kernel being unable to
         access the config table array
      
       - Use GB pages only when a full GB is mapped in the identity map as
         otherwise the CPU can speculate into reserved areas after the end of
         memory which causes malfunction on UV systems.
      
       - Remove the noisy and pointless SRAT table dump during boot
      
       - Use is_ioremap_addr() for iounmap() address range checks instead of
         high_memory. is_ioremap_addr() is more precise.
      
      * tag 'x86-mm-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/ioremap: Improve iounmap() address range checks
        x86/mm: Remove duplicate check from build_cr3()
        x86/mm: Remove unused NX related declarations
        x86/mm: Remove unused CR3_HW_ASID_BITS
        x86/mm: Don't print out SRAT table information
        x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
        x86/kexec: Add EFI config table identity mapping for kexec kernel
        selftests/mm: Add new testcases for pkeys
        x86/pkeys: Restore altstack access in sigreturn()
        x86/pkeys: Update PKRU to enable all pkeys before XSAVE
        x86/pkeys: Add helper functions to update PKRU on the sigframe
        x86/pkeys: Add PKRU as a parameter in signal handling functions
        x86/mm: Cleanup prctl_enable_tagged_addr() nr_bits error checking
        x86/mm: Fix LAM inconsistency during context switch
        x86/mm: Use IPIs to synchronize LAM enablement
      70f43ea3
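The identity-map rule from this merge ("use GB pages only when a full GB is mapped") can be sketched as a small predicate. This is a simplified illustration, not the kernel's ident_map code: `can_use_gbpage` and `GB_SIZE` are hypothetical names, and the sketch assumes the requested range is given as a half-open interval [start, end) and ignores overflow at the top of the address space.

```c
#include <stdbool.h>
#include <stdint.h>

#define GB_SIZE (UINT64_C(1) << 30)

/*
 * Hypothetical helper: return true only if the requested range [start, end)
 * covers the whole 1 GiB-aligned region that contains addr. If it does not,
 * a 1 GiB mapping would also map memory outside the requested range, which
 * the CPU could then speculate into.
 */
static bool can_use_gbpage(uint64_t addr, uint64_t start, uint64_t end)
{
    uint64_t gb_start = addr & ~(GB_SIZE - 1);
    uint64_t gb_end = gb_start + GB_SIZE;

    return start <= gb_start && gb_end <= end;
}
```

When the predicate is false, a mapper following this rule would fall back to smaller pages for that region.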
    • Merge tag 'x86-fred-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b1360211
      Linus Torvalds authored
      Pull x86 FRED updates from Thomas Gleixner:
      
       - Enable FRED right after init_mem_mapping() because at that point the
         early IDT fault handler is replaced by the real fault handler. The
         real fault handler retrieves the faulting address from the stack
         frame and not from CR2 when the FRED feature is set. But that
         obviously only works when FRED is enabled in the CPU as well.
      
       - Set SS to __KERNEL_DS when enabling FRED to prevent a corner case
         where ERETS can observe an SS mismatch and raise a #GP.
      
      * tag 'x86-fred-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/entry: Set FRED RSP0 on return to userspace instead of context switch
        x86/msr: Switch between WRMSRNS and WRMSR with the alternatives mechanism
        x86/entry: Test ti_work for zero before processing individual bits
        x86/fred: Set SS to __KERNEL_DS when enabling FRED
        x86/fred: Enable FRED right after init_mem_mapping()
        x86/fred: Move FRED RSP initialization into a separate function
        x86/fred: Parse cmdline param "fred=" in cpu_parse_early_param()
      b1360211
    • Merge tag 'x86-fpu-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c3056a7d
      Linus Torvalds authored
      Pull x86 fpu updates from Thomas Gleixner:
       "Provide FPU buffer layout in core dumps:
      
        Debuggers have to guess the FPU buffer layout in core dumps, which is
        error prone. This is because AMD and Intel layouts differ.
      
        To avoid buggy heuristics add an ELF section which describes the buffer
        layout which can be retrieved by tools"
      
      * tag 'x86-fpu-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/elf: Add a new FPU buffer layout info to x86 core files
      c3056a7d
    • Merge tag 'x86-core-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · dea435d3
      Linus Torvalds authored
      Pull x86 core update from Thomas Gleixner:
       "Enable UBSAN traps for x86, which provides better reporting through
        metadata encoded into UD1"
      
      * tag 'x86-core-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/traps: Enable UBSAN traps on x86
      dea435d3
    • Merge tag 'x86-apic-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 61d1ea91
      Linus Torvalds authored
      Pull x86 APIC updates from Thomas Gleixner:
      
       - Handle an allocation failure in the IO/APIC code gracefully instead
         of crashing the machine.
      
       - Remove support for APIC logical destination mode on 64-bit
      
         Logical destination mode of the local APIC is used for systems with
         up to 8 CPUs. It has an advantage over physical destination mode as
         it allows targeting multiple CPUs at once with IPIs. That advantage
         was definitely worth it when systems with up to 8 CPUs were state of
         the art for servers and workstations, but that's history.
      
         In the recent past there were quite a few reports of new laptops
         failing to boot with logical destination mode, but they work fine
         with physical destination mode. That's not a surprise because physical
         destination mode is guaranteed to work as it's the only way to get a
         CPU up and running via the INIT/INIT/STARTUP sequence. Some of the
         affected systems were cured by BIOS updates, but not all OEMs provide
         them.
      
         As the number of CPUs keeps increasing, logical destination mode
         becomes less used and the benefit for small systems, like laptops, is
         not really worth the trouble. So just remove logical destination mode
         support for 64bit and be done with it.
      
       - Code and comment cleanups in the APIC area.
      
      * tag 'x86-apic-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/irq: Fix comment on IRQ vector layout
        x86/apic: Remove unused extern declarations
        x86/apic: Remove logical destination mode for 64-bit
        x86/apic: Remove unused inline function apic_set_eoi_cb()
        x86/ioapic: Cleanup remaining coding style issues
        x86/ioapic: Cleanup line breaks
        x86/ioapic: Cleanup bracket usage
        x86/ioapic: Cleanup comments
        x86/ioapic: Move replace_pin_at_irq_node() to the call site
        iommu/vt-d: Cleanup apic_printk()
        x86/mpparse: Cleanup apic_printk()s
        x86/ioapic: Cleanup guarded debug printk()s
        x86/ioapic: Cleanup apic_printk()s
        x86/apic: Cleanup apic_printk()s
        x86/apic: Provide apic_printk() helpers
        x86/ioapic: Use guard() for locking where applicable
        x86/ioapic: Cleanup structs
        x86/ioapic: Mark mp_alloc_timer_irq() __init
        x86/ioapic: Handle allocation failures gracefully
      61d1ea91
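The 8-CPU limit and the multi-target advantage of logical destination mode described in this merge both follow from how flat logical addressing works: each CPU owns one bit of an 8-bit logical ID, so a destination is a bitmask. The sketch below illustrates that arithmetic only. The names `flat_logical_id`, `flat_logical_dest` and `FLAT_LOGICAL_MAX_CPUS` are hypothetical and are not kernel APIs.

```c
#include <stdint.h>

#define FLAT_LOGICAL_MAX_CPUS 8

/* One bit per CPU; returns 0 for CPUs that do not fit into the 8-bit mask. */
static uint8_t flat_logical_id(unsigned int cpu)
{
    if (cpu >= FLAT_LOGICAL_MAX_CPUS)
        return 0;
    return (uint8_t)(1u << cpu);
}

/* OR the per-CPU IDs together to address several CPUs with one destination. */
static uint8_t flat_logical_dest(const unsigned int *cpus, unsigned int n)
{
    uint8_t dest = 0;
    unsigned int i;

    for (i = 0; i < n; i++)
        dest |= flat_logical_id(cpus[i]);
    return dest;
}
```

A ninth CPU has no bit available in the mask, which is why this mode only ever applied to small systems.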
    • Merge tag 'x86-cleanups-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0279aa78
      Linus Torvalds authored
      Pull x86 cleanups from Thomas Gleixner:
       "A set of cleanups across x86:
      
         - Use memremap() for the EISA probe instead of ioremap(). EISA is
           strictly memory and not MMIO
      
         - Cleanups and enhancements all over the place"
      
      * tag 'x86-cleanups-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/EISA: Dereference memory directly instead of using readl()
        x86/extable: Remove unused declaration fixup_bug()
        x86/boot/64: Strip percpu address space when setting up GDT descriptors
        x86/cpu: Clarify the error message when BIOS does not support SGX
        x86/kexec: Add comments around swap_pages() assembly to improve readability
        x86/kexec: Fix a comment of swap_pages() assembly
        x86/sgx: Fix a W=1 build warning in function comment
        x86/EISA: Use memremap() to probe for the EISA BIOS signature
        x86/mtrr: Remove obsolete declaration for mtrr_bp_restore()
        x86/cpu_entry_area: Annotate percpu_setup_exception_stacks() as __init
      0279aa78
    • Merge tag 'x86-build-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5ba202a7
      Linus Torvalds authored
      Pull x86 build updates from Thomas Gleixner:
       "Updates for KCOV instrumentation on x86:
      
         - Prevent spurious KCOV coverage in common_interrupt()
      
         - Fixup the KCOV Makefile directive which got stale due to a source
           file rename
      
         - Exclude stack unwinding from KCOV as it creates large amounts of
           uninteresting coverage
      
         - Provide a self test to validate that KCOV coverage of the interrupt
           handling code does not start before the preempt count has been
           updated"
      
      * tag 'x86-build-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Ignore stack unwinding in KCOV
        module: Fix KCOV-ignored file name
        kcov: Add interrupt handling self test
        x86/entry: Remove unwanted instrumentation in common_interrupt()
      5ba202a7
    • Merge tag 'soc-arm-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · a940d9a4
      Linus Torvalds authored
      Pull SoC ARM platform updates from Arnd Bergmann:
       "Most of these updates are for removing dead code on the Samsung S3C,
        NXP i.MX, TI OMAP and TI DaVinci platforms, though this appears to be
        a coincidence.
      
        There are also cleanups for the Marvell Orion family and the Arm
        integrator series and a Kconfig change for Broadcom"
      
      * tag 'soc-arm-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
        ARM: dove: Drop a write-only variable
        ARM: orion5x: Switch to new sys-off handler API
        ARM: mvebu: Warn about memory chunks too small for DDR training
        ARM: imx: Annotate imx7d_enet_init() as __init
        ARM: OMAP1: Remove unused declarations in arch/arm/mach-omap1/pm.h
        ARM: s3c: remove unused s3c2410_cpu_suspend() declaration
        ARM: s3c: remove unused declarations for s3c6400
        ARM: s3c: Remove unused s3c_init_uart_irqs() declaration
        ARM: davinci: remove unused cpuidle code
        ARM: davinci: remove unused davinci_init_ide() declaration
        ARM: davinci: remove unused davinci_cfg_reg_list() declaration
        ARM: mach-imx: imx6sx: Remove Ethernet refclock setting
        MAINTAINERS: Add entry for Samsung Exynos850 SoC
        ARM: bcm: Select ARM_GIC_V3 for ARCH_BRCMSTB
        ARM: omap2: Switch to use kmemdup_array()
        ARM: omap1: Remove unused struct 'dma_link_info'
        ARM: s3c: Drop explicit initialization of struct i2c_device_id::driver_data to 0
      a940d9a4
    • Merge tag 'soc-defconfig-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 38ea77ab
      Linus Torvalds authored
      Pull SoC defconfig updates from Arnd Bergmann:
       "The updates to the defconfig files are fairly small, enabling drivers
        for eight of the arm and riscv based platforms"
      
      * tag 'soc-defconfig-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
        arm64: defconfig: enable mt8365 sound
        riscv: defconfig: Enable pinctrl support for CV18XX Series SoC
        arm64: defconfig: Enable ADP5585 GPIO and PWM drivers
        arm64: defconfig: Enable Tegra194 PCIe Endpoint
        arm64: defconfig: Enable E5010 JPEG Encoder
        riscv: defconfig: sophgo: enable clks for sg2042
        arm64: defconfig: build CONFIG_REGULATOR_QCOM_REFGEN as module
        ARM: configs: at91: enable config flags for sam9x7 SoC family
        arm64: defconfig: Enable R-Car Ethernet-TSN support
        ARM: shmobile: defconfig: Enable slab hardening and kmalloc buckets
        arm64: defconfig: Enable AK4619 codec support
      38ea77ab
    • Merge tag 'soc-drivers-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · b8979c6b
      Linus Torvalds authored
      Pull SoC driver updates from Arnd Bergmann:
       "The driver updates seem larger this time around, with changes in many
        of the SoC specific drivers, both the custom drivers/soc ones and the
        closely related subsystems (memory, bus, firmware, reset, ...).
      
        The at91 platform gains support for sam9x7 chips in the soc and power
        management code. This is the latest variant of one of the oldest still
        supported SoC families, using the ARM9 (ARMv5) core.
      
        As usual, the qualcomm snapdragon platform gets a ton of updates in
        many of their drivers to add more features and additional SoC support.
        Most of these are somewhat firmware related as the platform has a
        number of firmware based interfaces to the kernel. A notable addition
        here is the inclusion of trace events to two of these drivers.
      
        Herve Codina and Christophe Leroy are now sending updates for
        drivers/soc/fsl/ code through the SoC tree, this contains both PowerPC
        and Arm specific platforms and has previously been problematic to
        maintain. The first update here contains support for newer PowerPC
        variants and some cleanups.
      
        The turris mox firmware driver has a number of updates, mostly
        cleanups.
      
        The Arm SCMI firmware driver gets a major rework to modularize the
        existing code into separately loadable drivers for the various
        transports, the addition of custom NXP i.MX9 interfaces and a number
        of smaller updates.
      
        The Arm FF-A firmware driver gets a feature update to support the v1.2
        version of the specification.
      
        The reset controller drivers have some smaller cleanups and a newly
        added driver for the Intel/Mobileye EyeQ5/EyeQ6 MIPS SoCs.
      
        The memory controller drivers get some cleanups and refactoring for
        Tegra, TI, Freescale/NXP and a couple more platforms.
      
        Finally there are lots of minor updates to firmware (raspberry pi,
        tegra, imx), bus (sunxi, omap, tegra) and soc (rockchips, tegra,
        amlogic, mediatek) drivers and their DT bindings"
      
      * tag 'soc-drivers-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (212 commits)
        firmware: imx: remove duplicate scmi_imx_misc_ctrl_get()
        platform: cznic: turris-omnia-mcu: Fix error check in omnia_mcu_register_trng()
        bus: sunxi-rsb: Simplify code with dev_err_probe()
        soc: fsl: qe: ucc: Export ucc_mux_set_grant_tsa_bkpt
        soc: fsl: cpm1: qmc: Fix dependency on fsl_soc.h
        dt-bindings: arm: rockchip: Add rk3576 compatible string to pmu.yaml
        soc: fsl: qbman: Remove redundant warnings
        soc: fsl: qbman: Use iommu_paging_domain_alloc()
        MAINTAINERS: Add QE files related to the Freescale QMC controller
        soc: fsl: cpm1: qmc: Handle QUICC Engine (QE) soft-qmc firmware
        soc: fsl: cpm1: qmc: Add support for QUICC Engine (QE) implementation
        soc: fsl: qe: Add missing PUSHSCHED command
        soc: fsl: qe: Add resource-managed muram allocators
        soc: fsl: cpm1: qmc: Introduce qmc_version
        soc: fsl: cpm1: qmc: Rename SCC_GSMRL_MODE_QMC
        soc: fsl: cpm1: qmc: Handle RPACK initialization
        soc: fsl: cpm1: qmc: Rename qmc_chan_command()
        soc: fsl: cpm1: qmc: Introduce qmc_{init,exit}_xcc() and their CPM1 version
        soc: fsl: cpm1: qmc: Introduce qmc_init_resource() and its CPM1 version
        soc: fsl: cpm1: qmc: Re-order probe() operations
        ...
      b8979c6b
    • Merge tag 'soc-dt-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 7b17f5eb
      Linus Torvalds authored
      Pull SoC devicetree updates from Arnd Bergmann:
       "New SoC support for Broadcom bcm2712 (Raspberry Pi 5) and Renesas
        R9A09G057 (RZ/V2H(P)) and Qualcomm Snapdragon 414 (MSM8929), all three
        of these are variants of already supported chips, in particular the
        last one is almost identical to MSM8939.
      
        Lots of updates to Mediatek, ASpeed, Rockchips, Amlogic, Qualcomm,
        STM32, NXP i.MX, Sophgo, TI K3, Renesas, Microchip at91, NVIDIA Tegra,
        and T-HEAD.
      
        The added Qualcomm platform support once again dominates the changes,
        with seven phones and three laptops getting added in addition to many
        new features on existing machines. The Snapdragon X1E support
        specifically keeps improving.
      
        The other new machines are:
      
         - eight new machines using various 64-bit Rockchips SoCs, both on the
           consumer/gaming side and developer boards
      
         - three industrial boards with 64-bit i.MX, which is a very low
           number for them.
      
         - four more servers using a 32-bit ASpeed BMC
      
         - three boards using STM32MP1 SoCs
      
         - one new machine each using allwinner, amlogic, broadcom and renesas
           chips"
      
      * tag 'soc-dt-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (672 commits)
        arm64: dts: allwinner: h5: NanoPi NEO Plus2: Use regulators for pio
        arm64: dts: mediatek: add audio support for mt8365-evk
        arm64: dts: mediatek: add afe support for mt8365 SoC
        arm64: dts: mediatek: mt8186-corsola: Disable DPI display interface
        arm64: dts: mediatek: mt8186: Add svs node
        arm64: dts: mediatek: mt8186: Add power domain for DPI
        arm64: dts: mediatek: mt8195: Correct clock order for dp_intf*
        arm64: dts: mt8183: add dpi node to mt8183
        arm64: dts: allwinner: h5: NanoPi Neo Plus2: Fix regulators
        arm64: dts: rockchip: add CAN0 and CAN1 interfaces to mecsbc board
        arm64: dts: rockchip: add CAN-FD controller nodes to rk3568
        arm64: dts: nuvoton: ma35d1: Add uart pinctrl settings
        arm64: dts: nuvoton: ma35d1: Add pinctrl and gpio nodes
        arm64: dts: nuvoton: Add syscon to the system-management node
        ARM: dts: Fix undocumented LM75 compatible nodes
        arm64: dts: toshiba: Fix pl011 and pl022 clocks
        ARM: dts: stm32: Use SAI to generate bit and frame clock on STM32MP15xx DHCOM PDK2
        ARM: dts: stm32: Switch bitclock/frame-master to flag on STM32MP15xx DHCOM PDK2
        ARM: dts: stm32: Sort properties in audio endpoints on STM32MP15xx DHCOM PDK2
        ARM: dts: stm32: Add MECIO1 and MECT1S board variants
        ...
      7b17f5eb
    • Merge tag 'spi-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 303ba85c
      Linus Torvalds authored
      Pull spi updates from Mark Brown:
       "This is quite a quiet release for SPI. The one new core feature here
        is support for configuring the state of the MOSI pin when the bus is
        idle, there are some devices which are very fragile in this regard
        even when the chip select signal is not asserted. Otherwise we have
        some new driver support, a bunch of small fixes and some general
        cleanup work.
      
         - Support for configuring the state of the MOSI pin when the bus
           is idle
      
         - Add the Elgin JG0309-01 in spidev
      
         - Support for Marvell xSPI, Mediatek MTK7981, Microchip PIC64GX, NXP
           i.MX8ULP, and Rockchip RK3576 controllers
      
        I also accidentally pulled in an IIO DT bindings update due to a typo
        when applying the MOSI idle state patches"
      
      * tag 'spi-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (65 commits)
        spi: geni-qcom: Use devm functions to simplify code
        spi: remove spi_controller_is_slave() and spi_slave_abort()
        platform/olpc: olpc-xo175-ec: switch to use spi_target_abort().
        spi: slave-mt27xx: switch to use target_abort
        spi: spidev: switch to use spi_target_abort()
        spi: slave-system-control: switch to use spi_target_abort()
        spi: slave-time: switch to use spi_target_abort()
        spi: switch to use spi_controller_is_target()
        spi: fspi: add support for imx8ulp
        spi: fspi: involve lut_num for struct nxp_fspi_devtype_data
        dt-bindings: spi: nxp-fspi: add imx8ulp support
        spi: spidev_fdx: Fix the wrong format specifier
        spi: mxs: Switch to RUNTIME/SYSTEM_SLEEP_PM_OPS()
        spi: dt-bindings: Add rockchip,rk3576-spi compatible
        spi: Revert "spi: Insert the missing pci_dev_put()before return"
        spi: zynq-qspi: Replace kzalloc with kmalloc for buffer allocation
        spi: ppc4xx: Sort headers
        spi: ppc4xx: Revert "handle irq_of_parse_and_map() errors"
        spi: zynqmp-gqspi: Simplify with dev_err_probe()
        spi: zynqmp-gqspi: Use devm_spi_alloc_host()
        ...
      303ba85c
    • Merge tag 'regulator-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator · 6df92808
      Linus Torvalds authored
      Pull regulator updates from Mark Brown:
       "This release is almost all cleanup work of various kinds, while the
        diffstat for the core is quite large this is almost all cleanups and
        documentation improvements with some small fixes rather than any new
        feature work. We do have support for a couple of new devices but these
        are small additions to existing drivers rather than new drivers.
      
         - Removal of the SM5703 driver which does not have its dependencies
           available.
      
         - Support for Allwinner AXP717, and Qualcomm WCN6855.
      
        The Allwinner support shares some commits with the MFD tree"
      
      * tag 'regulator-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (66 commits)
        regulator: sm5703: Remove because it is unused and fails to build
        regulator: Split up _regulator_get()
        regulator: update some comments ([gs]et_voltage_vsel vs [gs]et_voltage_sel)
        regulator: max8973: Use irq_get_trigger_type() helper
        regulator: core: fix the broken behavior of regulator_dev_lookup()
        regulator: max77650: Use container_of and constify static data
        regulator: hi6421v530: Use container_of and constify static data
        regulator: hi6421v530: Drop unused 'eco_microamp'
        regulator: qcom-refgen: Constify static data
        regulator: pfuze100: Constify static data
        regulator: pcap: Constify static data
        regulator: mtk-dvfsrc: Constify static data
        regulator: max77826: Constify static data
        regulator: max77826: Drop unused 'rdesc' in 'struct max77826_regulator_info'
        regulator: tps65023: Constify static data
        regulator: hi6421v600: Constify static data
        regulator: hi6421: Constify static data
        regulator: da9121: Constify static data
        regulator: da9063: Constify static data
        regulator: da9055: Constify static data
        ...
      6df92808
    • Merge tag 'regmap-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · 9179b73a
      Linus Torvalds authored
      Pull regmap updates from Mark Brown:
       "The main update here is Matti's work allowing regmap irqdomains to be
        given custom names (allowing multiple interrupt controllers associated
        with a single struct device), this pulls in some commits from Thomas'
        tree which it depends on.
      
        Otherwise there's a bit of work on improving handling of regmaps
        protected with spinlocks when used with complex cache types, fixing
        some valid but harmless lockdep reports seen with some new driver
        work"
      
      * tag 'regmap-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: kunit: Add coverage of spinlocked regmaps
        regcache: use map->alloc_flags also for allocating cache
        regmap: Use locking during kunit tests
        regmap: Hold the regmap lock when allocating and freeing the cache
        regmap: Allow setting IRQ domain name suffix
      9179b73a
    • Merge tag 'printk-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux · c903327d
      Linus Torvalds authored
      Pull printk updates from Petr Mladek:
       "This is the "last" part of the support for the new nbcon consoles.
        Where "nbcon" stands for "No Big console lock CONsoles" aka not under
        the console_lock.
      
        New callbacks are added to struct console:
      
         - write_thread() for flushing nbcon consoles in task context.
      
         - write_atomic() for flushing nbcon consoles in atomic context,
           including NMI.
      
         - con->device_lock() and device_unlock() for taking the driver
           specific lock, for example, port->lock.
      
        New printk-specific kthreads are created:
      
         - per-console kthreads which are responsible for flushing normal
           priority messages on nbcon consoles.
      
         - a thread which is responsible for flushing normal priority messages
           on all consoles when CONFIG_RT is enabled.
      
        The new callbacks are called under a special per-console lock which
        has already been added back in v6.7. It distinguishes three
        severities: normal, emergency, and panic. A context with a higher
        priority could take over the ownership when it is safe even in the
        middle of handling a record. The panic context could do it even when
        it is not safe. But it is allowed only for the final desperate flush
        before entering the infinite loop.
      
        The new lock helps to flush the messages directly in emergency and
        panic contexts. But it is not enough in all situations:
      
         - console_lock() is still needed for synchronization against boot
           consoles.
      
         - con->device_lock() is needed for synchronization against other
           operations on the same HW, e.g. serial port speed setting,
           non-printk related read/write.
      
        The dependency on con->device_lock() is mutual. Any code taking the
        driver specific lock has to acquire the related nbcon console context
        as well. For example, see the new uart_port_lock() API. It provides
        the necessary synchronization against emergency and panic contexts
        where the messages are flushed only under the new per-console lock.
      
        Maybe surprisingly, a quite tricky part is the decision of how to
        flush the consoles in various situations. It has to take into account:
      
         - message priority:    normal, emergency, panic
      
         - scheduling context:  task, atomic, deferred_legacy
      
         - registered consoles: boot, legacy, nbcon
      
         - threads are running: early boot, suspend, shutdown, panic
      
         - caller:              printk(), pr_flush(), printk_flush_in_panic(),
                                console_unlock(), console_start(), ...
      
        The primary decision is made in printk_get_console_flush_type(). It
        creates a hint about what the caller should do:
      
         - flush nbcon consoles directly or via the kthread
      
         - call the legacy loop (console_unlock()) directly or via irq_work
      
        The existing behavior is preserved for the legacy consoles. The only
        exception is that they are no longer flushed directly from printk()
        in panic() before CPUs are stopped. But this blocking happens only
        when at least one nbcon console is registered. The motivation is to
        increase the chance of producing the crash dump. The legacy consoles
        might create a deadlock, in contrast to nbcon consoles. The nbcon
        console should allow seeing the messages even when the crash dump
        fails.
      
        There are three possible ways in which nbcon consoles are flushed:
      
         - The per-nbcon-console kthread is responsible for flushing messages
           added with the normal priority. This is the default mode.
      
         - The legacy loop, aka console_unlock(), is used when there is still
           a boot console registered. There is no easy way to match an
           early console driver with a nbcon console driver. And the
           console_lock() provides the only reliable serialization at the
           moment.
      
           The legacy loop uses either con->write_atomic() or
           con->write_thread() callbacks depending on whether it is allowed to
           schedule. The atomic variant has to be used from printk().
      
         - In other situations, the messages are flushed directly using
           write_atomic() which can be called in any context, including NMI.
           It is primarily needed during early boot or shutdown, in emergency
           situations, and panic.
      
        The emergency priority is used by code called within
        nbcon_cpu_emergency_enter()/exit(). At the moment, it is used in four
        situations: WARN(), Oops, lockdep, and RCU stall reports.
      
        Finally, there is no nbcon console at the moment. It means that the
        changes should _not_ modify the existing behavior. The only exception
        is CONFIG_RT which would force offloading the legacy loop, for normal
        priority context, into the dedicated kthread"
      
      * tag 'printk-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (54 commits)
        printk: Avoid false positive lockdep report for legacy printing
        printk: nbcon: Assign nice -20 for printing threads
        printk: Implement legacy printer kthread for PREEMPT_RT
        tty: sysfs: Add nbcon support for 'active'
        proc: Add nbcon support for /proc/consoles
        proc: consoles: Add notation to c_start/c_stop
        printk: nbcon: Show replay message on takeover
        printk: Provide helper for message prepending
        printk: nbcon: Rely on kthreads for normal operation
        printk: nbcon: Use thread callback if in task context for legacy
        printk: nbcon: Relocate nbcon_atomic_emit_one()
        printk: nbcon: Introduce printer kthreads
        printk: nbcon: Init @nbcon_seq to highest possible
        printk: nbcon: Add context to usable() and emit()
        printk: Flush console on unregister_console()
        printk: Fail pr_flush() if before SYSTEM_SCHEDULING
        printk: nbcon: Add function for printers to reacquire ownership
        printk: nbcon: Use raw_cpu_ptr() instead of open coding
        printk: Use the BITS_PER_LONG macro
        lockdep: Mark emergency sections in lockdep splats
        ...
      c903327d
    • Linus Torvalds's avatar
      Merge tag 'core-debugobjects-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · daa394f0
      Linus Torvalds authored
      Pull debugobjects updates from Thomas Gleixner:
      
       - Use the threshold to check for the pool refill condition and not the
         run time recorded all time low fill value, which is lower than the
         threshold and therefore causes refills to be delayed.
      
       - KCSAN annotation updates and simplification of the fill_pool() code.
      
      * tag 'core-debugobjects-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        debugobjects: Remove redundant checks in fill_pool()
        debugobjects: Fix conditions in fill_pool()
        debugobjects: Fix the compilation attributes of some global variables
      daa394f0
    • Linus Torvalds's avatar
      Merge tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9ea925c8
      Linus Torvalds authored
      Pull timer updates from Thomas Gleixner:
       "Core:
      
         - Overhaul of posix-timers in preparation for removing the workaround
           for periodic timers which have signal delivery ignored.
      
         - Remove the historical extra jiffy in msleep()
      
           msleep() adds an extra jiffy to the timeout value to ensure
           minimal sleep time. The timer wheel ensures minimal sleep time
           since the large rewrite to a non-cascading wheel, but the extra
           jiffy in msleep() remained unnoticed. Remove it.
      
         - Make the timer slack handling correct for realtime tasks.
      
           The procfs interface is inconsistent and neither reflects
           reality nor conforms to the man page. Show the correct 0 slack for
           real time tasks and enforce it at the core level instead of having
           inconsistent individual checks in various timer setup functions.
      
         - The usual set of updates and enhancements all over the place.
      
        Drivers:
      
         - Allow the ACPI PM timer to be turned off during suspend
      
         - No new drivers
      
         - The usual updates and enhancements in various drivers"
      
      * tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
        ntp: Make sure RTC is synchronized when time goes backwards
        treewide: Fix wrong singular form of jiffies in comments
        cpu: Use already existing usleep_range()
        timers: Rename next_expiry_recalc() to be unique
        platform/x86:intel/pmc: Fix comment for the pmc_core_acpi_pm_timer_suspend_resume function
        clocksource/drivers/jcore: Use request_percpu_irq()
        clocksource/drivers/cadence-ttc: Add missing clk_disable_unprepare in ttc_setup_clockevent
        clocksource/drivers/asm9260: Add missing clk_disable_unprepare in asm9260_timer_init
        clocksource/drivers/qcom: Add missing iounmap() on errors in msm_dt_timer_init()
        clocksource/drivers/ingenic: Use devm_clk_get_enabled() helpers
        platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended
        clocksource: acpi_pm: Add external callback for suspend/resume
        clocksource/drivers/arm_arch_timer: Using for_each_available_child_of_node_scoped()
        dt-bindings: timer: rockchip: Add rk3576 compatible
        timers: Annotate possible non critical data race of next_expiry
        timers: Remove historical extra jiffie for timeout in msleep()
        hrtimer: Use and report correct timerslack values for realtime tasks
        hrtimer: Annotate hrtimer_cpu_base_.*_expiry() for sparse.
        timers: Add sparse annotation for timer_sync_wait_running().
        signal: Replace BUG_ON()s
        ...
      9ea925c8
    • Linus Torvalds's avatar
      Merge tag 'irq-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · cb69d865
      Linus Torvalds authored
      Pull irq updates from Thomas Gleixner:
       "Core:
      
         - Remove a global lock in the affinity setting code
      
           The lock protects a cpumask for intermediate results and the lock
           causes a bottleneck on simultaneous start of multiple virtual
           machines. Replace the lock and the static cpumask with a per CPU
           cpumask which is nicely serialized by the raw spinlock held when
           executing this code.
      
         - Provide support for giving a suffix to interrupt domain names.
      
           That's required to support devices with subfunctions so that the
           domain names are distinct even if they originate from the same
           device node.
      
         - The usual set of cleanups and enhancements all over the place
      
        Drivers:
      
         - Support for LoongArch AVEC interrupt chip
      
         - Refurbishment of the Armada driver so it can be extended for new
           variants.
      
         - The usual set of cleanups and enhancements all over the place"
      
      * tag 'irq-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
        genirq: Use cpumask_intersects()
        genirq/cpuhotplug: Use cpumask_intersects()
        irqchip/apple-aic: Only access system registers on SoCs which provide them
        irqchip/apple-aic: Add a new "Global fast IPIs only" feature level
        irqchip/apple-aic: Skip unnecessary enabling of use_fast_ipi
        dt-bindings: apple,aic: Document A7-A11 compatibles
        irqdomain: Use IS_ERR_OR_NULL() in irq_domain_trim_hierarchy()
        genirq/msi: Use kmemdup_array() instead of kmemdup()
        genirq/proc: Change the return value for set affinity permission error
        genirq/proc: Use irq_move_pending() in show_irq_affinity()
        genirq/proc: Correctly set file permissions for affinity control files
        genirq: Get rid of global lock in irq_do_set_affinity()
        genirq: Fix typo in struct comment
        irqchip/loongarch-avec: Add AVEC irqchip support
        irqchip/loongson-pch-msi: Prepare get_pch_msi_handle() for AVECINTC
        irqchip/loongson-eiointc: Rename CPUHP_AP_IRQ_LOONGARCH_STARTING
        LoongArch: Architectural preparation for AVEC irqchip
        LoongArch: Move irqchip function prototypes to irq-loongson.h
        irqchip/loongson-pch-msi: Switch to MSI parent domains
        softirq: Remove unused 'action' parameter from action callback
        ...
      cb69d865
    • Linus Torvalds's avatar
      Merge tag 'timers-clocksource-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a64405b7
      Linus Torvalds authored
      Pull clocksource watchdog updates from Thomas Gleixner:
      
       - Make the uncertainty margin handling more robust to prevent false
         positives
      
       - Clarify comments
      
      * tag 'timers-clocksource-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource: Set cs_watchdog_read() checks based on .uncertainty_margin
        clocksource: Fix comments on WATCHDOG_THRESHOLD & WATCHDOG_MAX_SKEW
        clocksource: Improve comments for watchdog skew bounds
      a64405b7
    • Linus Torvalds's avatar
      Merge tag 'smp-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 97e17c08
      Linus Torvalds authored
      Pull CPU hotplug updates from Thomas Gleixner:
      
       - Prepare the core for supporting parallel hotplug on loongarch
      
       - A small set of cleanups and enhancements
      
      * tag 'smp-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        smp: Mark smp_prepare_boot_cpu() __init
        cpu: Fix W=1 build kernel-doc warning
        cpu/hotplug: Provide weak fallback for arch_cpuhp_init_parallel_bringup()
        cpu/hotplug: Make HOTPLUG_PARALLEL independent of HOTPLUG_SMT
      97e17c08
  2. 16 Sep, 2024 17 commits
    • Linus Torvalds's avatar
      Merge tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm · a430d95c
      Linus Torvalds authored
      Pull lsm updates from Paul Moore:
      
       - Move the LSM framework to static calls
      
         This transitions the vast majority of the LSM callbacks into static
         calls. Those callbacks which haven't been converted were left as-is
         due to the general ugliness of the changes required to support the
         static call conversion; we can revisit those callbacks at a future
         date.
      
       - Add the Integrity Policy Enforcement (IPE) LSM
      
         This adds a new LSM, Integrity Policy Enforcement (IPE). There is
         plenty of documentation about IPE in these patches, so I'll refrain
         from going into too much detail here, but the basic motivation behind
         IPE is to provide a mechanism such that administrators can restrict
         execution to only those binaries which come from integrity protected
         storage, e.g. a dm-verity protected filesystem. You will notice that
         IPE requires additional LSM hooks in the initramfs, dm-verity, and
         fs-verity code, with the associated patches carrying ACK/review tags
         from the associated maintainers. We couldn't find an obvious
         maintainer for the initramfs code, but the IPE patchset has been
         widely posted over several years.
      
         Both Deven Bowers and Fan Wu have contributed to IPE's development
         over the past several years, with Fan Wu agreeing to serve as the IPE
         maintainer moving forward. Once IPE is accepted into your tree, I'll
         start working with Fan to ensure he has the necessary accounts, keys,
         etc. so that he can start submitting IPE pull requests to you
         directly during the next merge window.
      
       - Move the lifecycle management of the LSM blobs to the LSM framework
      
         Management of the LSM blobs (the LSM state buffers attached to
         various kernel structs, typically via a void pointer named "security"
         or similar) has been mixed: some blobs were allocated/managed by
         individual LSMs, others were managed by the LSM framework itself.
      
         Starting with this pull we move management of all the LSM blobs,
         minus the XFRM blob, into the framework itself, improving consistency
         across LSMs, and reducing the amount of duplicated code across LSMs.
         Due to some additional work required to migrate the XFRM blob, it has
         been left as a todo item for a later date; from a practical
         standpoint this omission should have little impact as only SELinux
         provides an XFRM LSM implementation.
      
       - Fix problems with the LSM's handling of F_SETOWN
      
         The LSM hook for the fcntl(F_SETOWN) operation had a couple of
         problems: it was racy with itself, and it was disconnected from the
         associated DAC related logic in such a way that the LSM state could
         be updated in cases where the DAC state would not. We fix both of
         these problems by moving the security_file_set_fowner() hook into the
         same section of code where the DAC attributes are updated. Not only
         does this resolve the DAC/LSM synchronization issue, but as that code
         block is protected by a lock, it also resolves the race condition.
      
       - Fix potential problems with the security_inode_free() LSM hook
      
         Due to use of RCU to protect inodes and the placement of the LSM hook
         associated with freeing the inode, there is a bit of a challenge when
         it comes to managing any LSM state associated with an inode. The VFS
         folks are not open to relocating the LSM hook so we have to get
         creative when it comes to releasing an inode's LSM state.
         Traditionally we have used a single LSM callback within the hook that
         is triggered when the inode is "marked for death", but not actually
         released due to RCU.
      
         Unfortunately, this causes problems for LSMs which want to take an
         action when the inode's associated LSM state is actually released; so
         we add an additional LSM callback, inode_free_security_rcu(), that is
         called when the inode's LSM state is released in the RCU free
         callback.
      
       - Refactor two LSM hooks to better fit the LSM return value patterns
      
         The vast majority of the LSM hooks follow the "return 0 on success,
         negative values on failure" pattern, however, there are a small
         handful that have unique return value behaviors which have caused
         confusion in the past and make it difficult for the BPF verifier to
         properly vet BPF LSM programs. This includes patches to convert two
         of these "special" LSM hooks to the common 0/-ERRNO pattern.
      
       - Various cleanups and improvements
      
         A handful of patches to remove redundant code, better leverage the
         IS_ERR_OR_NULL() helper, add missing "static" markings, and do some
         minor style fixups.
      
      * tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (40 commits)
        security: Update file_set_fowner documentation
        fs: Fix file_set_fowner LSM hook inconsistencies
        lsm: Use IS_ERR_OR_NULL() helper function
        lsm: remove LSM_COUNT and LSM_CONFIG_COUNT
        ipe: Remove duplicated include in ipe.c
        lsm: replace indirect LSM hook calls with static calls
        lsm: count the LSMs enabled at compile time
        kernel: Add helper macros for loop unrolling
        init/main.c: Initialize early LSMs after arch code, static keys and calls.
        MAINTAINERS: add IPE entry with Fan Wu as maintainer
        documentation: add IPE documentation
        ipe: kunit test for parser
        scripts: add boot policy generation program
        ipe: enable support for fs-verity as a trust provider
        fsverity: expose verified fsverity built-in signatures to LSMs
        lsm: add security_inode_setintegrity() hook
        ipe: add support for dm-verity as a trust provider
        dm-verity: expose root hash digest and signature data to LSMs
        block,lsm: add LSM blob and new LSM hooks for block devices
        ipe: add permissive toggle
        ...
      a430d95c
    • Linus Torvalds's avatar
      Merge tag 'selinux-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · ad060dbb
      Linus Torvalds authored
      Pull selinux updates from Paul Moore:
      
       - Ensure that both IPv4 and IPv6 connections are properly initialized
      
         While we always properly initialized IPv4 connections early in their
         life, we missed the necessary IPv6 change when we were adding IPv6
         support.
      
       - Annotate the SELinux inode revalidation function to quiet KCSAN
      
         KCSAN correctly identifies a race in __inode_security_revalidate()
         when we check to see if an inode's SELinux state has been properly
         initialized. While KCSAN is correct, it is an intentional choice made
         for performance reasons; if necessary, we check the state a second
         time, this time with a lock held, before initializing the inode's
         state.
      
       - Code cleanups, simplification, etc.
      
         A handful of individual patches to simplify some SELinux kernel
         logic, improve return code granularity via ERR_PTR(), follow the
         guidance on using KMEM_CACHE(), and correct some minor style
         problems.
      
      * tag 'selinux-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux: fix style problems in security/selinux/include/audit.h
        selinux: simplify avc_xperms_audit_required()
        selinux: mark both IPv4 and IPv6 accepted connection sockets as labeled
        selinux: replace kmem_cache_create() with KMEM_CACHE()
        selinux: annotate false positive data race to avoid KCSAN warnings
        selinux: refactor code to return ERR_PTR in selinux_netlbl_sock_genattr
        selinux: Streamline type determination in security_compute_sid
      ad060dbb
    • Linus Torvalds's avatar
      Merge tag 'audit-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit · dc644fba
      Linus Torvalds authored
      Pull audit updates from Paul Moore:
      
       - Fix some remaining problems with PID/TGID reporting
      
         When most users think about PIDs, what they are really thinking about
         is the TGID. This commit shifts the audit PID logging and filtering
         to use the TGID value which should provide a more meaningful audit
         stream and filtering experience for users.
      
       - Migrate to the str_enabled_disabled() helper
      
         Evidently we have helper functions that help ensure if we mistype
         "enabled" or "disabled" it is now caught at compile time. I guess
         we're fancy now.
      
      * tag 'audit-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
        audit: Make use of str_enabled_disabled() helper
        audit: use task_tgid_nr() instead of task_pid_nr()
      dc644fba
    • David Howells's avatar
      cifs: Remove redundant setting of NETFS_SREQ_HIT_EOF · 43a64bd0
      David Howells authored
      Fix an upstream merge resolution issue[1].  The NETFS_SREQ_HIT_EOF flag,
      and code to set it, got added via two different paths.  The original path
      saw it added in the netfslib read improvements[2], but it was also added,
      and slightly differently, in a fix that was committed before v6.11:
      
              1da29f2c
              netfs, cifs: Fix handling of short DIO read
      
      However, the code added to smb2_readv_callback() to set the flag didn't
      get removed when the netfs read improvements series was rebased to take
      account of the cifs fixes.  The proposed merge resolution[2] deleted it
      rather than rebase the patches.
      
      Fix this by removing the redundant lines.  Code to set the bit that derives
      from the fix patch is still there, a few lines above in the source.
      
      Fixes: 35219bc5 ("Merge tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs")
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Steve French <stfrench@microsoft.com>
      cc: Paulo Alcantara <pc@manguebit.com>
      cc: Christian Brauner <brauner@kernel.org>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: linux-cifs@vger.kernel.org
      cc: netfs@lists.linux.dev
      cc: linux-fsdevel@vger.kernel.org
      Link: https://lore.kernel.org/r/CAHk-=wjr8fxk20-wx=63mZruW1LTvBvAKya1GQ1EhyzXb-okMA@mail.gmail.com/ [1]
      Link: https://lore.kernel.org/linux-fsdevel/20240913-vfs-netfs-39ef6f974061@brauner/ [2]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      43a64bd0
    • David Howells's avatar
      cifs: Fix cifs readv callback merge resolution issue · dc1a456d
      David Howells authored
      Fix an upstream merge resolution issue[1].  Prior to the netfs read
      helpers, the SMB1 asynchronous read callback, cifs_readv_worker()
      performed the cleanup for the operation in the network message processing
      loop, potentially slowing down the processing of incoming SMB messages.
      
      With commit a68c7486 ("cifs: Fix SMB1 readv/writev callback in the same
      way as SMB2/3"), this was moved to a worker thread (as is done in the
      SMB2/3 transport variant).  However, the "was_async" argument to
      netfs_subreq_terminated (which was originally incorrectly "false") got
      flipped to "true", which was then incorrect because, being in a kernel
      thread, it's not in an async context.
      
      This got corrected in the sample merge[2], but Linus, not unreasonably,
      switched it back to its previous value.
      
      Note that this value tells netfslib whether or not it can run sleepable
      stuff or stuff that takes a long time, such as retries and cleanups, in the
      calling thread, or whether it should offload to a worker thread.
      
      Fix this so that it is "false".  The callback to netfslib in both SMB1 and
      SMB2/3 now gets offloaded from the network message thread to a separate
      worker thread and thus it's fine to do the slow work in this thread.
      
      Fixes: 35219bc5 ("Merge tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs")
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Steve French <stfrench@microsoft.com>
      cc: Paulo Alcantara <pc@manguebit.com>
      cc: Christian Brauner <brauner@kernel.org>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: linux-cifs@vger.kernel.org
      cc: netfs@lists.linux.dev
      cc: linux-fsdevel@vger.kernel.org
      Link: https://lore.kernel.org/r/CAHk-=wjr8fxk20-wx=63mZruW1LTvBvAKya1GQ1EhyzXb-okMA@mail.gmail.com/ [1]
      Link: https://lore.kernel.org/linux-fsdevel/20240913-vfs-netfs-39ef6f974061@brauner/ [2]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dc1a456d
    • Linus Torvalds's avatar
      Merge tag 'for-6.12/io_uring-discard-20240913' of git://git.kernel.dk/linux · adfc3ded
      Linus Torvalds authored
      Pull io_uring async discard support from Jens Axboe:
       "Sitting on top of both the 6.12 block and io_uring core branches,
        here's support for async discard through io_uring.
      
        This allows applications to issue async discards, rather than rely on
        the blocking sync ioctl discards we already have. The sync support is
        difficult to use outside of idle/cleanup periods.
      
        On a real (but slow) device, testing shows the following results when
        compared to sync discard:
      
      	qd64 sync discard: 21K IOPS, lat avg 3 msec (max 21 msec)
      	qd64 async discard: 76K IOPS, lat avg 845 usec (max 2.2 msec)
      
      	qd64 sync discard: 14K IOPS, lat avg 5 msec (max 25 msec)
      	qd64 async discard: 56K IOPS, lat avg 1153 usec (max 3.6 msec)
      
        and synthetic null_blk testing with the same queue depth and block
        size settings as above shows:
      
      	Type    Trim size       IOPS    Lat avg (usec)  Lat Max (usec)
      	==============================================================
      	sync    4k               144K       444            20314
      	async   4k              1353K        47              595
      	sync    1M                56K      1136            21031
      	async   1M                94K       680              760"
      
      * tag 'for-6.12/io_uring-discard-20240913' of git://git.kernel.dk/linux:
        block: implement async io_uring discard cmd
        block: introduce blk_validate_byte_range()
        filemap: introduce filemap_invalidate_pages
        io_uring/cmd: give inline space in request to cmds
        io_uring/cmd: expose iowq to cmds
      adfc3ded
    • Linus Torvalds's avatar
      Merge tag 'for-6.12/block-20240913' of git://git.kernel.dk/linux · 26bb0d3f
      Linus Torvalds authored
      Pull block updates from Jens Axboe:
      
       - MD changes via Song:
            - md-bitmap refactoring (Yu Kuai)
            - raid5 performance optimization (Artur Paszkiewicz)
            - Other small fixes (Yu Kuai, Chen Ni)
            - Add a sysfs entry 'new_level' (Xiao Ni)
            - Improve information reported in /proc/mdstat (Mateusz Kusiak)
      
       - NVMe changes via Keith:
            - Asynchronous namespace scanning (Stuart)
            - TCP TLS updates (Hannes)
            - RDMA queue controller validation (Niklas)
            - Align field names to the spec (Anuj)
            - Metadata support validation (Puranjay)
            - A syntax cleanup (Shen)
            - Fix a Kconfig linking error (Arnd)
            - New queue-depth quirk (Keith)
      
       - Add missing unplug trace event (Keith)
      
       - blk-iocost fixes (Colin, Konstantin)
      
       - t10-pi modular removal and fixes (Alexey)
      
       - Fix for potential BLKSECDISCARD overflow (Alexey)
      
       - bio splitting cleanups and fixes (Christoph)
      
       - Deal with folios rather than pages, speeding up how the
         block layer handles bigger IOs (Kundan)
      
       - Use spinlocks rather than bit spinlocks in zram (Sebastian, Mike)
      
       - Reduce zoned device overhead in ublk (Ming)
      
       - Add and use sendpages_ok() for drbd and nvme-tcp (Ofir)
      
       - Fix regression in partition error pointer checking (Riyan)
      
       - Add support for write zeroes and rotational status in nbd (Wouter)
      
       - Add Yu Kuai as new BFQ maintainer. The scheduler has been
         unmaintained for quite a while.
      
       - Various sets of fixes for BFQ (Yu Kuai)
      
       - Misc fixes and cleanups (Alvaro, Christophe, Li, Md Haris, Mikhail,
         Yang)
      
      * tag 'for-6.12/block-20240913' of git://git.kernel.dk/linux: (120 commits)
        nvme-pci: qdepth 1 quirk
        block: fix potential invalid pointer dereference in blk_add_partition
        blk_iocost: make read-only static array vrate_adj_pct const
        block: unpin user pages belonging to a folio at once
        mm: release number of pages of a folio
        block: introduce folio awareness and add a bigger size from folio
        block: Added folio-ized version of bio_add_hw_page()
        block, bfq: factor out a helper to split bfqq in bfq_init_rq()
        block, bfq: remove local variable 'bfqq_already_existing' in bfq_init_rq()
        block, bfq: remove local variable 'split' in bfq_init_rq()
        block, bfq: remove bfq_log_bfqg()
        block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator()
        block, bfq: fix procress reference leakage for bfqq in merge chain
        block, bfq: fix uaf for accessing waker_bfqq after splitting
        blk-throttle: support prioritized processing of metadata
        blk-throttle: remove last_low_overflow_time
        drbd: Add NULL check for net_conf to prevent dereference in state validation
        nvme-tcp: fix link failure for TCP auth
        blk-mq: add missing unplug trace event
        mtip32xx: Remove redundant null pointer checks in mtip_hw_debugfs_init()
        ...
      26bb0d3f
    • Linus Torvalds's avatar
      Merge tag 'for-6.12/io_uring-20240913' of git://git.kernel.dk/linux · 3a4d319a
      Linus Torvalds authored
      Pull io_uring updates from Jens Axboe:
      
       - NAPI fixes and cleanups (Pavel, Olivier)
      
       - Add support for absolute timeouts (Pavel)
      
       - Fixes for io-wq/sqpoll affinities (Felix)
      
       - Efficiency improvements for dealing with huge pages (Chenliang)
      
       - Support for a minwait mode, where the application essentially has two
         timeouts - one smaller one that defines the batch timeout, and the
         overall large one similar to what we had before. This enables
         efficient use of batching based on count + timeout, while still
         working well with periods of less intensive workloads
      
       - Use ITER_UBUF for single segment sends
      
       - Add support for incremental buffer consumption. Right now each
         operation will always consume a full buffer. With incremental
         consumption, a recv/read operation only consumes the part of the
         buffer that it needs to satisfy the operation
      
       - Add support for GCOV for io_uring, to help retain a high ratio of
         test coverage to code
      
       - Fix regression with ocfs2, where an odd -EOPNOTSUPP wasn't correctly
         converted to a blocking retry
      
       - Add support for cloning registered buffers from one ring to another
      
       - Misc cleanups (Anuj, me)
      
      * tag 'for-6.12/io_uring-20240913' of git://git.kernel.dk/linux: (35 commits)
        io_uring: add IORING_REGISTER_COPY_BUFFERS method
        io_uring/register: provide helper to get io_ring_ctx from 'fd'
        io_uring/rsrc: add reference count to struct io_mapped_ubuf
        io_uring/rsrc: clear 'slot' entry upfront
        io_uring/io-wq: inherit cpuset of cgroup in io worker
        io_uring/io-wq: do not allow pinning outside of cpuset
        io_uring/rw: drop -EOPNOTSUPP check in __io_complete_rw_common()
        io_uring/rw: treat -EOPNOTSUPP for IOCB_NOWAIT like -EAGAIN
        io_uring/sqpoll: do not allow pinning outside of cpuset
        io_uring/eventfd: move refs to refcount_t
        io_uring: remove unused rsrc_put_fn
        io_uring: add new line after variable declaration
        io_uring: add GCOV_PROFILE_URING Kconfig option
        io_uring/kbuf: add support for incremental buffer consumption
        io_uring/kbuf: pass in 'len' argument for buffer commit
        Revert "io_uring: Require zeroed sqe->len on provided-buffers send"
        io_uring/kbuf: move io_ring_head_to_buf() to kbuf.h
        io_uring/kbuf: add io_kbuf_commit() helper
        io_uring/kbuf: shrink nr_iovs/mode in struct buf_sel_arg
        io_uring: wire up min batch wake timeout
        ...
      3a4d319a
    • Merge tag 'erofs-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 69a3a0a4
      Linus Torvalds authored
      Pull erofs updates from Gao Xiang:
       "In this cycle, we add file-backed mount support, which has been a
        strong requirement for years. It is especially useful when there are
        thousands of images running on the same host for containers and other
        sandbox use cases, unlike OS image use cases.
      
        Without file-backed mounts, it's hard for container runtimes to manage
        and isolate so many unnecessary virtual block devices safely and
        efficiently, therefore file-backed mounts are highly preferred. For
        EROFS users, ComposeFS [1], containerd, and Android APEXes [2] will
        directly benefit from it, and I've seen no risk in implementing it as
        a completely immutable filesystem.
      
        The previous experimental feature "EROFS over fscache" is now marked
        as deprecated because:
      
         - Fscache is no longer an independent subsystem and has been merged
           into netfs, which was somewhat unexpected when it was proposed.
      
         - New HSM "fanotify pre-content hooks" [3] will be landed upstream.
           These hooks will replace "EROFS over fscache" in a simpler way, as
           EROFS won't be bothered with kernel caching anymore. Userspace
           programs can also manage their own caching hierarchy more flexibly.
      
        Once the HSM "fanotify pre-content hooks" are landed, I will remove the
        fscache backend entirely as an internal dependency cleanup. More
        background is given in the original patchset [4].
      
        In addition to that, there are bugfixes and cleanups as usual.
      
        Summary:
      
         - Support file-backed mounts for containers and sandboxes
      
         - Mark the experimental fscache backend as deprecated
      
         - Handle overlapped pclusters caused by crafted images properly
      
         - Fix a failure path which could cause infinite loops in
           z_erofs_init_decompressor()
      
         - Get rid of unnecessary NOFAILs
      
         - Harmless on-disk hardening & minor cleanups"
      
      * tag 'erofs-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        erofs: reject inodes with negative i_size
        erofs: restrict pcluster size limitations
        erofs: allocate more short-lived pages from reserved pool first
        erofs: sunset unneeded NOFAILs
        erofs: simplify erofs_map_blocks_flatmode()
        erofs: refactor read_inode calling convention
        erofs: use kmemdup_nul in erofs_fill_symlink
        erofs: mark experimental fscache backend deprecated
        erofs: support compressed inodes for fileio
        erofs: support unencoded inodes for fileio
        erofs: add file-backed mount support
        erofs: handle overlapped pclusters out of crafted images properly
        erofs: fix error handling in z_erofs_init_decompressor
        erofs: clean up erofs_register_sysfs()
        erofs: fix incorrect symlink detection in fast symlink
      69a3a0a4
    • Merge tag 'for-6.12-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 7a40974f
      Linus Torvalds authored
      Pull btrfs updates from David Sterba:
       "This brings mostly refactoring, cleanups, minor performance
        optimizations and usual fixes. The folio API conversions are most
        noticeable.
      
        There's one less visible change that could have a high impact. The
        extent lock scope for read is reduced, not held for the entire
        operation. In the buffered read case it's left to page or inode lock;
        some direct io read synchronization is still needed.
      
        This used to prevent deadlocks induced by page faults during direct
        io, so there was a 4K limitation on the requests, e.g. for io_uring.
        In the future this will allow smoother integration with iomap where
        the extent read lock was a major obstacle.
      
        User visible changes:
      
         - the FSTRIM ioctl updates the processed range even after an error or
           interruption
      
         - cleaner thread is woken up in SYNC ioctl instead of waking the
           transaction thread that can take some delay before waking up the
           cleaner, this can speed up cleaning of deleted subvolumes
      
         - print an error message when opening a device fails, e.g. when it's
           unexpectedly read-only
      
        Core changes:
      
         - improved extent map handling in various ways (locking, iteration, ...)
      
         - new assertions and locking annotations
      
         - raid-stripe-tree locking fixes
      
         - use xarray for tracking dirty qgroup extents, switched from rb-tree
      
         - turn the subpage test to compile-time condition if possible (e.g.
           on x86_64 with 4K pages), this allows skipping a lot of ifs and
           removing dead code
      
         - more preparatory work for compression in subpage mode
      
        Cleanups and refactoring:
      
         - folio API conversions, many simple cases where page is passed so
           switch it to folios
      
         - more subpage code refactoring, update page state bitmap processing
      
         - introduce auto free for btrfs_path structure, use for the simple
           cases"
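
The "auto free" item above relies on scope-based cleanup, which the kernel builds on the compiler's `cleanup` variable attribute (GCC and Clang). The sketch below shows the general pattern in userspace C. It is not the btrfs code: `struct demo_path`, `DEMO_PATH_AUTO_FREE` and the other `demo_*` names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a path structure; not the btrfs definition. */
struct demo_path {
	int slots[8];
};

int demo_paths_freed;	/* counts frees, for demonstration only */

struct demo_path *demo_alloc_path(void)
{
	return calloc(1, sizeof(struct demo_path));
}

void demo_free_path(struct demo_path *p)
{
	if (!p)
		return;
	free(p);
	demo_paths_freed++;
}

/* The cleanup handler receives a pointer to the annotated variable. */
void demo_free_pathp(struct demo_path **pp)
{
	assert(pp != NULL);
	demo_free_path(*pp);
}

/* Declare a path pointer that is freed when it goes out of scope. */
#define DEMO_PATH_AUTO_FREE(name) \
	struct demo_path *name __attribute__((cleanup(demo_free_pathp))) = NULL

int demo_lookup(int key)
{
	DEMO_PATH_AUTO_FREE(path);

	path = demo_alloc_path();
	if (!path)
		return -1;
	if (key < 0)
		return -2;	/* early return: path is still freed */
	path->slots[0] = key;
	return path->slots[0];	/* value is read before the cleanup runs */
}
```

Every return path in `demo_lookup()` frees the allocation without an explicit call, which is why the pattern suits "the simple cases" where a function only allocates, uses and releases one path.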
      
      * tag 'for-6.12-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (110 commits)
        btrfs: only unlock the to-be-submitted ranges inside a folio
        btrfs: merge btrfs_folio_unlock_writer() into btrfs_folio_end_writer_lock()
        btrfs: BTRFS_PATH_AUTO_FREE in orphan.c
        btrfs: use btrfs_path auto free in zoned.c
        btrfs: DEFINE_FREE for struct btrfs_path
        btrfs: remove btrfs_folio_end_all_writers()
        btrfs: constify more pointer parameters
        btrfs: rework BTRFS_I as macro to preserve parameter const
        btrfs: add and use helper to verify the calling task has locked the inode
        btrfs: always update fstrim_range on failure in FITRIM ioctl
        btrfs: convert copy_inline_to_page() to use folio
        btrfs: convert btrfs_decompress() to take a folio
        btrfs: convert zstd_decompress() to take a folio
        btrfs: convert lzo_decompress() to take a folio
        btrfs: convert zlib_decompress() to take a folio
        btrfs: convert try_release_extent_mapping() to take a folio
        btrfs: convert try_release_extent_state() to take a folio
        btrfs: convert submit_eb_page() to take a folio
        btrfs: convert submit_eb_subpage() to take a folio
        btrfs: convert read_key_bytes() to take a folio
        ...
      7a40974f
    • Merge tag 'affs-for-6.12-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · effdcd52
      Linus Torvalds authored
      Pull affs updates from David Sterba:
       "Cleanups removing unused code and updating the definition of a
        flexible struct array"
      
      * tag 'affs-for-6.12-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        affs: Replace one-element array with flexible-array member
        affs: Remove unused macros GET_END_PTR, AFFS_GET_HASHENTRY
      effdcd52
    • Merge tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 35219bc5
      Linus Torvalds authored
      Pull netfs updates from Christian Brauner:
       "This contains the work to improve read/write performance for the new
        netfs library.
      
        The main performance enhancing changes are:
      
         - Define a structure, struct folio_queue, and a new iterator type,
           ITER_FOLIOQ, to hold a buffer as a replacement for ITER_XARRAY. See
           that patch for questions about naming and form.
      
           ITER_FOLIOQ is provided as a replacement for ITER_XARRAY. The
           problem with an xarray is that accessing it requires the use of a
           lock (typically the RCU read lock) - and this means that we can't
           supply iterate_and_advance() with a step function that might sleep
           (crypto for example) without having to drop the lock between pages.
           ITER_FOLIOQ is the iterator for a chain of folio_queue structs,
           where each folio_queue holds a small list of folios. A folio_queue
           struct is a simpler structure than xarray and is not subject to
           concurrent manipulation by the VM. folio_queue is used rather than
           a bvec[] as it can form lists of indefinite size, adding to one end
           and removing from the other on the fly.
      
         - Provide a copy_folio_from_iter() wrapper.
      
         - Make cifs RDMA support ITER_FOLIOQ.
      
         - Use folio queues in the write-side helpers instead of xarrays.
      
         - Add a function to reset the iterator in a subrequest.
      
         - Simplify the write-side helpers to use sheaves to skip gaps rather
           than trying to work out where gaps are.
      
         - In afs, make the read subrequests asynchronous, putting them into
           work items to allow the next patch to do progressive
           unlocking/reading.
      
         - Overhaul the read-side helpers to improve performance.
      
         - Fix the caching of a partial block at the end of a file.
      
         - Allow a store to be cancelled.
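
The folio_queue description in the first item above (a chain of structs, each holding a small list of folios, appended at one end and consumed from the other) can be modelled in userspace as a segmented queue. This is a simplified sketch, not the kernel's `struct folio_queue`: the `demo_*` names, the use of `int` items in place of folio pointers, and the slot count of 4 are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_SLOTS 4	/* assumed small, fixed number of entries per segment */

struct demo_segment {
	struct demo_segment *next;
	int items[DEMO_SLOTS];	/* stand-ins for folio pointers */
	unsigned int head;	/* next slot to consume */
	unsigned int tail;	/* next free slot */
};

struct demo_queue {
	struct demo_segment *first;	/* consumer end */
	struct demo_segment *last;	/* producer end */
};

/* Append at the producer end, chaining a new segment when the last is full. */
int demo_append(struct demo_queue *q, int item)
{
	assert(q != NULL);
	if (!q->last || q->last->tail == DEMO_SLOTS) {
		struct demo_segment *s = calloc(1, sizeof(*s));

		if (!s)
			return -1;
		if (q->last)
			q->last->next = s;
		else
			q->first = s;
		q->last = s;
	}
	q->last->items[q->last->tail++] = item;
	return 0;
}

/* Consume from the other end; returns 1 and stores the item, or 0 if empty. */
int demo_consume(struct demo_queue *q, int *item)
{
	struct demo_segment *s;

	assert(q != NULL && item != NULL);
	s = q->first;
	if (!s || s->head == s->tail)
		return 0;
	*item = s->items[s->head++];
	if (s->head == DEMO_SLOTS) {	/* fully drained: drop the segment */
		q->first = s->next;
		if (!q->first)
			q->last = NULL;
		free(s);
	}
	return 1;
}
```

The queue can grow to any length without reallocating a single large array, which mirrors the stated reason for preferring a chain of small segments over a `bvec[]`.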
      
        Then some changes for cifs to make it use folio queues instead of
        xarrays for crypto bufferage:
      
         - Use raw iteration functions rather than manually coding iteration
           when hashing data.
      
         - Switch to using folio_queue for crypto buffers.
      
         - Remove the xarray bits.
      
        Make some adjustments to the /proc/fs/netfs/stats file such that:
      
         - All the netfs stats lines begin 'Netfs:'; change this prefix to
           something a bit more useful.
      
         - Add a couple of stats counters to track the numbers of skips and
           waits on the per-inode writeback serialisation lock to make it
           easier to check for this as a source of performance loss.
      
        Miscellaneous work:
      
         - Ensure that the sb_writers lock is taken around
           vfs_{set,remove}xattr() in the cachefiles code.
      
         - Reduce the number of conditional branches in netfs_perform_write().
      
         - Move the CIFS_INO_MODIFIED_ATTR flag to the netfs_inode struct and
           remove cifs_post_modify().
      
         - Move the max_len/max_nr_segs members from netfs_io_subrequest to
           netfs_io_request as they're only needed for one subreq at a time.
      
         - Add an 'unknown' source value for tracing purposes.
      
         - Remove NETFS_COPY_TO_CACHE as it's no longer used.
      
         - Set the request work function up front at allocation time.
      
         - Use bh-disabling spinlocks for rreq->lock as cachefiles completion
           may be run from block-filesystem DIO completion in softirq context.
      
         - Remove fs/netfs/io.c"
      
      * tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
        docs: filesystems: corrected grammar of netfs page
        cifs: Don't support ITER_XARRAY
        cifs: Switch crypto buffer to use a folio_queue rather than an xarray
        cifs: Use iterate_and_advance*() routines directly for hashing
        netfs: Cancel dirty folios that have no storage destination
        cachefiles, netfs: Fix write to partial block at EOF
        netfs: Remove fs/netfs/io.c
        netfs: Speed up buffered reading
        afs: Make read subreqs async
        netfs: Simplify the writeback code
        netfs: Provide an iterator-reset function
        netfs: Use new folio_queue data type and iterator instead of xarray iter
        cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs
        iov_iter: Provide copy_folio_from_iter()
        mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios
        netfs: Use bh-disabling spinlocks for rreq->lock
        netfs: Set the request work function upon allocation
        netfs: Remove NETFS_COPY_TO_CACHE
        netfs: Reserve netfs_sreq_source 0 as unset/unknown
        netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream
        ...
      35219bc5
    • Merge tag 'vfs-6.12.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 9020d0d8
      Linus Torvalds authored
      Pull vfs mount updates from Christian Brauner:
       "Recently, we added the ability to list mounts in other mount
        namespaces and the ability to retrieve namespace file descriptors
        without having to go through procfs by deriving them from pidfds.
      
        This extends nsfs in two ways:
      
         (1) Add the ability to retrieve information about a mount namespace
             via NS_MNT_GET_INFO.
      
             This will return the mount namespace id and the number of mounts
             currently in the mount namespace. The number of mounts can be
             used to size the buffer that needs to be used for listmount() and
             is in general useful without having to actually iterate through
             all the mounts.
      
             The structure is extensible.
      
         (2) Add the ability to iterate through all mount namespaces over
             which the caller holds privilege returning the file descriptor
             for the next or previous mount namespace.
      
             To retrieve a mount namespace the caller must be privileged wrt
             its owning user namespace. This means that PID 1 on the host
             can list all mounts in all mount namespaces or that a container
             can list all mounts of its nested containers.
      
             Optionally pass a structure for NS_MNT_GET_INFO with
             NS_MNT_GET_{PREV,NEXT} to retrieve information about the mount
             namespace in one go.
      
        (1) and (2) can be implemented for other namespace types easily.
      
        Together with recent api additions this means one can iterate through
        all mounts in all mount namespaces without ever touching procfs.
      
        The commit message in 49224a34 ('Merge patch series "nsfs: iterate
        through mount namespaces"') contains example code showing how to do this"
      
      * tag 'vfs-6.12.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        nsfs: iterate through mount namespaces
        file: add fput() cleanup helper
        fs: add put_mnt_ns() cleanup helper
        fs: allow mount namespace fd
      9020d0d8
    • Merge tag 'vfs-6.12.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · e8fc317d
      Linus Torvalds authored
      Pull procfs updates from Christian Brauner:
       "This contains the following changes for procfs:
      
         - Add config options and parameters to block forcing memory writes.
      
           This adds a Kconfig option and boot param to allow removing the
           FOLL_FORCE flag from /proc/<pid>/mem write calls as this can be
           used in various attacks.
      
           The traditional forcing behavior is kept as default because it can
           break GDB and some other use cases.
      
           This is the simpler version that you had requested.
      
         - Restrict overmounting of ephemeral entities.
      
           It is currently possible to mount on top of various ephemeral
           entities in procfs. This specifically includes magic links. To
           recap, magic links are links of the form /proc/<pid>/fd/<nr>. They
           serve as references to a target file and during path lookup they
           cause a jump to the target path. Such magic links disappear if the
           corresponding file descriptor is closed.
      
           Currently it is possible to overmount such magic links. This is
           mostly interesting for an attacker that wants to somehow trick a
           process into e.g., reopening something that it didn't intend to
           reopen or to hide a malicious file descriptor.
      
           But also it risks leaking mounts for long-running processes. When
           overmounting a magic link like above, the mount will not be
           detached when the file descriptor is closed. Only the target
           mountpoint will disappear. Which has the consequence of making it
           impossible to unmount that mount afterwards. So the mount will
           stick around until the process exits and the /proc/<pid>/ directory
           is cleaned up during proc_flush_pid() when the dentries are pruned
           and invalidated.
      
           That in turn means it's possible for a program to accidentally leak
           mounts and it's also possible to make a task leak mounts without
           its knowledge if the attacker just keeps overmounting things under
           /proc/<pid>/fd/<nr>.
      
           Disallow overmounting of such ephemeral entities.
      
         - Clean up the readdir method naming in some procfs file operations.
      
         - Replace kmalloc() and strcpy() with a simple kmemdup() call"
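
The magic-link behaviour recapped above (a /proc/<pid>/fd/<nr> entry refers to the target file and disappears when the descriptor is closed) can be observed from userspace with readlink(). The following is a minimal sketch for Linux with procfs mounted; `demo_fd_target()` and `demo_link_lifetime()` are hypothetical helper names, and /dev/null is just a convenient file to open.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Resolve the magic link /proc/self/fd/<fd> into 'buf'.
 * Returns the target length, or -1 on error (for example when the
 * descriptor is closed or procfs is not mounted).
 */
ssize_t demo_fd_target(int fd, char *buf, size_t size)
{
	char link[64];
	ssize_t n;

	assert(buf != NULL && size > 0);
	snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
	n = readlink(link, buf, size - 1);	/* readlink() does not NUL-terminate */
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}

/*
 * Returns 1 if the link resolved while the descriptor was open and was
 * gone after close(), 0 if it was still there, -1 if it never resolved.
 */
int demo_link_lifetime(void)
{
	char buf[256];
	int fd = open("/dev/null", O_RDONLY);
	int seen_open, seen_closed;

	if (fd < 0)
		return -1;
	seen_open = demo_fd_target(fd, buf, sizeof(buf)) > 0;
	close(fd);
	seen_closed = demo_fd_target(fd, buf, sizeof(buf)) > 0;
	if (!seen_open)
		return -1;	/* procfs is probably unavailable */
	return !seen_closed;
}
```

Because the link itself vanishes on close(), anything mounted on top of it would lose its mountpoint while the mount stays attached, which is the leak the pull message describes.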
      
      * tag 'vfs-6.12.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        proc: fold kmalloc() + strcpy() into kmemdup()
        proc: block mounting on top of /proc/<pid>/fdinfo/*
        proc: block mounting on top of /proc/<pid>/fd/*
        proc: block mounting on top of /proc/<pid>/map_files/*
        proc: add proc_splice_unmountable()
        proc: proc_readfdinfo() -> proc_fdinfo_iterate()
        proc: proc_readfd() -> proc_fd_iterate()
        proc: add config & param to block forcing mem writes
      e8fc317d
    • Merge tag 'vfs-6.12.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · ee25861f
      Linus Torvalds authored
      Pull vfs fallocate updates from Christian Brauner:
       "This contains work to try to clean up some of the fallocate mode
        handling. Currently, it confusingly mixes operation modes and an
        optional flag.
      
        The work here tries to better define operation modes and optional
        flags allowing the core and filesystem code to use switch statements
        to switch on the operation mode"
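
The idea of separating operation modes from an optional flag, so that code can switch on the operation, might look like the sketch below. The `DEMO_*` names are hypothetical; the bit values are chosen to mirror the userspace fallocate constants, and the exact split between operation bits and the keep-size flag is an assumption of this example rather than a copy of the kernel's definitions.

```c
#include <assert.h>

/* Illustrative constants (hypothetical DEMO_ names). */
#define DEMO_FL_ALLOCATE_RANGE	0x00	/* default operation: no mode bit set */
#define DEMO_FL_KEEP_SIZE	0x01	/* optional flag, not an operation */
#define DEMO_FL_PUNCH_HOLE	0x02
#define DEMO_FL_COLLAPSE_RANGE	0x08
#define DEMO_FL_ZERO_RANGE	0x10
#define DEMO_FL_INSERT_RANGE	0x20
#define DEMO_FL_UNSHARE_RANGE	0x40

/* All bits that select an operation, as opposed to modifying one. */
#define DEMO_FL_MODE_MASK	(DEMO_FL_ALLOCATE_RANGE | DEMO_FL_PUNCH_HOLE | \
				 DEMO_FL_COLLAPSE_RANGE | DEMO_FL_ZERO_RANGE | \
				 DEMO_FL_INSERT_RANGE | DEMO_FL_UNSHARE_RANGE)

static_assert((DEMO_FL_KEEP_SIZE & DEMO_FL_MODE_MASK) == 0,
	      "the optional flag must not overlap the operation bits");

enum demo_op {
	DEMO_OP_INVALID,
	DEMO_OP_ALLOCATE,
	DEMO_OP_PUNCH_HOLE,
	DEMO_OP_COLLAPSE,
	DEMO_OP_ZERO,
	DEMO_OP_INSERT,
	DEMO_OP_UNSHARE,
};

/* Mask off the flag, then switch on the single remaining operation. */
enum demo_op demo_fallocate_op(int mode)
{
	switch (mode & DEMO_FL_MODE_MASK) {
	case DEMO_FL_ALLOCATE_RANGE:	return DEMO_OP_ALLOCATE;
	case DEMO_FL_PUNCH_HOLE:	return DEMO_OP_PUNCH_HOLE;
	case DEMO_FL_COLLAPSE_RANGE:	return DEMO_OP_COLLAPSE;
	case DEMO_FL_ZERO_RANGE:	return DEMO_OP_ZERO;
	case DEMO_FL_INSERT_RANGE:	return DEMO_OP_INSERT;
	case DEMO_FL_UNSHARE_RANGE:	return DEMO_OP_UNSHARE;
	default:			return DEMO_OP_INVALID;	/* several operations at once */
	}
}

/* The flag is tested separately from the operation. */
int demo_keeps_size(int mode)
{
	return (mode & DEMO_FL_KEEP_SIZE) != 0;
}
```

Combining two operation bits falls through to the default case, so invalid combinations are rejected in one place instead of through scattered bit tests.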
      
      * tag 'vfs-6.12.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        xfs: refactor xfs_file_fallocate
        xfs: move the xfs_is_always_cow_inode check into xfs_alloc_file_space
        xfs: call xfs_flush_unmap_range from xfs_free_file_space
        fs: sort out the fallocate mode vs flag mess
        ext4: remove tracing for FALLOC_FL_NO_HIDE_STALE
        block: remove checks for FALLOC_FL_NO_HIDE_STALE
      ee25861f
    • Merge tag 'vfs-6.12.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 3352633c
      Linus Torvalds authored
      Pull vfs file updates from Christian Brauner:
       "This is the work to clean up and shrink struct file significantly.
      
        Right now, (focusing on x86) struct file is 232 bytes. After this
        series struct file will be 184 bytes aka 3 cachelines and a spare 8
        bytes for future extensions at the end of the struct.
      
        With struct file being as ubiquitous as it is this should make a
        difference for file heavy workloads and allow further optimizations in
        the future.
      
         - struct fown_struct was embedded into struct file letting it take up
           32 bytes in total when really it shouldn't even be embedded in
           struct file in the first place. Instead, actual users of struct
           fown_struct now allocate the struct on demand. This frees up 24
           bytes.
      
         - Move struct file_ra_state into the union containing the cleanup hooks
           and move f_iocb_flags out of the union. This closes a 4 byte hole
           we created earlier and brings struct file to 192 bytes. Which means
           struct file is 3 cachelines and we managed to shrink it by 40
           bytes.
      
         - Reorder struct file so that nothing crosses a cacheline.
      
           I suspect that in the future we will end up reordering some members
           to mitigate false sharing issues or just because someone does
           actually provide really good perf data.
      
         - Shrinking struct file to 192 bytes is only part of the work.
      
           Files use a slab that is SLAB_TYPESAFE_BY_RCU and when a kmem cache
           is created with SLAB_TYPESAFE_BY_RCU the free pointer must be
           located outside of the object because the cache doesn't know what
           part of the memory can safely be overwritten as it may be needed to
           prevent object recycling.
      
           That has the consequence that SLAB_TYPESAFE_BY_RCU may end up
           adding a new cacheline.
      
           So this also contains work to add a new kmem_cache_create_rcu()
           function that allows the caller to specify an offset where the
           freelist pointer is supposed to be placed. Thus avoiding the
           implicit addition of a fourth cacheline.
      
         - And finally this removes the f_version member in struct file.
      
           The f_version member isn't particularly well-defined. It is mainly
           used as a cookie to detect concurrent seeks when iterating
           directories. But it is also abused by some subsystems for
           completely unrelated things.
      
           It is mostly a directory and filesystem specific thing that doesn't
           really need to live in struct file and with its wonky semantics it
           really lacks a specific function.
      
           For pipes, f_version is (ab)used to defer poll notifications until
           a write has happened. And struct pipe_inode_info is used by
           multiple struct files in their ->private_data so there's no chance
           of pushing that down into file->private_data without introducing
           another pointer indirection.
      
           But pipes don't rely on f_pos_lock so this adds a union into struct
           file encompassing f_pos_lock and a pipe specific f_pipe member that
           pipes can use. This union of course can be extended to other file
           types and is similar to what we do in struct inode already"
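
The size figures quoted above can be checked with a little arithmetic. The sketch below assumes 64-byte cachelines and objects that start on a cacheline boundary; `demo_cachelines()` and `demo_slab_footprint()` are hypothetical helpers that model the accounting described in the message, not the slab allocator itself.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CACHELINE 64u	/* assumed cacheline size in bytes */

/* Cachelines spanned by an object that starts on a cacheline boundary. */
unsigned int demo_cachelines(size_t size)
{
	assert(size > 0);
	return (unsigned int)((size + DEMO_CACHELINE - 1) / DEMO_CACHELINE);
}

/*
 * Per-object footprint in a cache whose free pointer is either placed
 * inside the object (at a caller-chosen offset) or appended after it.
 */
size_t demo_slab_footprint(size_t obj_size, int freeptr_inside)
{
	return freeptr_inside ? obj_size : obj_size + sizeof(void *);
}
```

Under these assumptions, 232 bytes span 4 cachelines, while 184 bytes plus the spare 8 give 192 bytes, exactly 3. Appending a free pointer after a 192-byte object pushes the footprint into a fourth cacheline, whereas placing it inside the object keeps it at 3.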
      
      * tag 'vfs-6.12.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (26 commits)
        fs: remove f_version
        pipe: use f_pipe
        fs: add f_pipe
        ubifs: store cookie in private data
        ufs: store cookie in private data
        udf: store cookie in private data
        proc: store cookie in private data
        ocfs2: store cookie in private data
        input: remove f_version abuse
        ext4: store cookie in private data
        ext2: store cookie in private data
        affs: store cookie in private data
        fs: add generic_llseek_cookie()
        fs: use must_set_pos()
        fs: add must_set_pos()
        fs: add vfs_setpos_cookie()
        s390: remove unused f_version
        ceph: remove unused f_version
        adi: remove unused f_version
        mm: Removed @freeptr_offset to prevent doc warning
        ...
      3352633c
    • Merge tag 'vfs-6.12.folio' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs · 2775df6e
      Linus Torvalds authored
      Pull vfs folio updates from Christian Brauner:
       "This contains work to port write_begin and write_end to rely on folios
        for various filesystems.
      
        This converts ocfs2, vboxfs, orangefs, jffs2, hostfs, fuse, f2fs,
        ecryptfs, ntfs3, nilfs2, reiserfs, minixfs, qnx6, sysv, ufs, and
        squashfs.
      
        After this series lands a bunch of the filesystems in this list do not
        mention struct page anymore"
      
      * tag 'vfs-6.12.folio' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (61 commits)
        Squashfs: Ensure all readahead pages have been used
        Squashfs: Rewrite and update squashfs_readahead_fragment() to not use page->index
        Squashfs: Update squashfs_readpage_block() to not use page->index
        Squashfs: Update squashfs_readahead() to not use page->index
        Squashfs: Update page_actor to not use page->index
        jffs2: Use a folio in jffs2_garbage_collect_dnode()
        jffs2: Convert jffs2_do_readpage_nolock to take a folio
        buffer: Convert __block_write_begin() to take a folio
        ocfs2: Convert ocfs2_write_zero_page to use a folio
        fs: Convert aops->write_begin to take a folio
        fs: Convert aops->write_end to take a folio
        vboxsf: Use a folio in vboxsf_write_end()
        orangefs: Convert orangefs_write_begin() to use a folio
        orangefs: Convert orangefs_write_end() to use a folio
        jffs2: Convert jffs2_write_begin() to use a folio
        jffs2: Convert jffs2_write_end() to use a folio
        hostfs: Convert hostfs_write_end() to use a folio
        fuse: Convert fuse_write_begin() to use a folio
        fuse: Convert fuse_write_end() to use a folio
        f2fs: Convert f2fs_write_begin() to use a folio
        ...
      2775df6e