Commits · fc1dc0d50780a9b215322bcc315f07ad8e4c6c13 · Kirill Smelkov / linux

17 Sep, 2024 23 commits

Merge tag 'x86-timers-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · fc1dc0d5

Linus Torvalds authored Sep 17, 2024

Pull x86 timer updates from Thomas Gleixner:

 - Use the topology information of number of packages for making the
   decision about TSC trust instead of using the number of online nodes
   which is not reflecting the real topology.

 - Stop the PIT timer 0 when its not in use as to stop pointless
   emulation in the VMM.

 - Fix the PIT timer stop sequence for timer 0 so it truly stops both
   real hardware and buggy VMM emulations.

* tag 'x86-timers-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsc: Check for sockets instead of CPUs to make code match comment
  clockevents/drivers/i8253: Fix stop sequence for timer 0
  x86/i8253: Disable PIT timer 0 when not in use
  x86/tsc: Use topology_max_packages() to get package number

fc1dc0d5

Merge tag 'x86-misc-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b5075354

Linus Torvalds authored Sep 17, 2024

Pull misc x86 updates from Thomas Gleixner:

 - Rework kcpuid to handle the the autogenerated CSV file correctly and
   update the CSV file to cover the whole zoo of CPUID.

 - Avoid memcpy() for ia32 syscall_get_arguments() and use direct
   assignments as fortified memcpy() is unhappy about writing/reading
   beyond the end of the addresses destination/source struct member

 - A few new PCI IDs for AMD

 - Update MAINTAINERS to cover x86 specific selftests

* tag 'x86-misc-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Add selftests/x86 entry
  x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h-70h
  x86/syscall: Avoid memcpy() for ia32 syscall_get_arguments()
  MAINTAINERS: Add x86 cpuid database entry
  tools/x86/kcpuid: Introduce a complete cpuid bitfields CSV file
  tools/x86/kcpuid: Parse subleaf ranges if provided
  tools/x86/kcpuid: Recognize all leaves with subleaves
  tools/x86/kcpuid: Strip bitfield names leading/trailing whitespace
  tools/x86/kcpuid: Protect against faulty "max subleaf" values
  tools/x86/kcpuid: Set max possible subleaves count to 64
  tools/x86/kcpuid: Properly align long-description columns
  tools/x86/kcpuid: Remove unused variable
  x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h

b5075354

Merge tag 'x86-platform-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a3233da6

Linus Torvalds authored Sep 17, 2024

Pull x86 platform update from Thomas Gleixner:
 "Remove a stale declaration from the UV platform code"

* tag 'x86-platform-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/uv: Remove unused declaration uv_irq_2_mmr_info()

a3233da6

Merge tag 'x86-mm-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 70f43ea3

Linus Torvalds authored Sep 17, 2024

Pull x86 memory management updates from Thomas Gleixner:

 - Make LAM enablement safe vs. kernel threads using a process mm
   temporarily as switching back to the process would not update CR3 and
   therefore not enable LAM causing faults in user space when using
   tagged pointers. Cure it by synchronizing LAM enablement via IPIs to
   all CPUs which use the related mm.

 - Cure a LAM harmless inconsistency between CR3 and the state during
   context switch. It's both confusing and prone to lead to real bugs

 - Handle alt stack handling for threads which run with a non-zero
   protection key. The non-zero key prevents the kernel to access the
   alternate stack. Cure it by temporarily enabling all protection keys
   for the alternate stack setup/restore operations.

 - Provide a EFI config table identity mapping for kexec kernel to
   prevent kexec fails because the new kernel cannot access the config
   table array

 - Use GB pages only when a full GB is mapped in the identity map as
   otherwise the CPU can speculate into reserved areas after the end of
   memory which causes malfunction on UV systems.

 - Remove the noisy and pointless SRAT table dump during boot

 - Use is_ioremap_addr() for iounmap() address range checks instead of
   high_memory. is_ioremap_addr() is more precise.

* tag 'x86-mm-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioremap: Improve iounmap() address range checks
  x86/mm: Remove duplicate check from build_cr3()
  x86/mm: Remove unused NX related declarations
  x86/mm: Remove unused CR3_HW_ASID_BITS
  x86/mm: Don't print out SRAT table information
  x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
  x86/kexec: Add EFI config table identity mapping for kexec kernel
  selftests/mm: Add new testcases for pkeys
  x86/pkeys: Restore altstack access in sigreturn()
  x86/pkeys: Update PKRU to enable all pkeys before XSAVE
  x86/pkeys: Add helper functions to update PKRU on the sigframe
  x86/pkeys: Add PKRU as a parameter in signal handling functions
  x86/mm: Cleanup prctl_enable_tagged_addr() nr_bits error checking
  x86/mm: Fix LAM inconsistency during context switch
  x86/mm: Use IPIs to synchronize LAM enablement

70f43ea3

Merge tag 'x86-fred-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b1360211

Linus Torvalds authored Sep 17, 2024

Pull x86 FRED updates from Thomas Gleixner:

 - Enable FRED right after init_mem_mapping() because at that point the
   early IDT fault handler is replaced by the real fault handler. The
   real fault handler retrieves the faulting address from the stack
   frame and not from CR2 when the FRED feature is set. But that
   obviously only works when FRED is enabled in the CPU as well.

 - Set SS to __KERNEL_DS when enabling FRED to prevent a corner case
   where ERETS can observe a SS mismatch and raises a #GP.

* tag 'x86-fred-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry: Set FRED RSP0 on return to userspace instead of context switch
  x86/msr: Switch between WRMSRNS and WRMSR with the alternatives mechanism
  x86/entry: Test ti_work for zero before processing individual bits
  x86/fred: Set SS to __KERNEL_DS when enabling FRED
  x86/fred: Enable FRED right after init_mem_mapping()
  x86/fred: Move FRED RSP initialization into a separate function
  x86/fred: Parse cmdline param "fred=" in cpu_parse_early_param()

b1360211

Merge tag 'x86-fpu-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c3056a7d

Linus Torvalds authored Sep 17, 2024

Pull x86 fpu updates from Thomas Gleixner:
 "Provide FPU buffer layout in core dumps:

  Debuggers have guess the FPU buffer layout in core dumps, which is
  error prone. This is because AMD and Intel layouts differ.

  To avoid buggy heuristics add a ELF section which describes the buffer
  layout which can be retrieved by tools"

* tag 'x86-fpu-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/elf: Add a new FPU buffer layout info to x86 core files

c3056a7d

Merge tag 'x86-core-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · dea435d3

Linus Torvalds authored Sep 17, 2024

Pull x86 core update from Thomas Gleixner:
 "Enable UBSAN traps for x86, which provides better reporting through
  metadata encodeded into UD1"

* tag 'x86-core-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/traps: Enable UBSAN traps on x86

dea435d3

Merge tag 'x86-apic-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 61d1ea91

Linus Torvalds authored Sep 17, 2024

Pull x86 APIC updates from Thomas Gleixner:

 - Handle an allocation failure in the IO/APIC code gracefully instead
   of crashing the machine.

 - Remove support for APIC local destination mode on 64bit

   Logical destination mode of the local APIC is used for systems with
   up to 8 CPUs. It has an advantage over physical destination mode as
   it allows to target multiple CPUs at once with IPIs. That advantage
   was definitely worth it when systems with up to 8 CPUs were state of
   the art for servers and workstations, but that's history.

   In the recent past there were quite some reports of new laptops
   failing to boot with logical destination mode, but they work fine
   with physical destination mode. That's not a suprise because physical
   destination mode is guaranteed to work as it's the only way to get a
   CPU up and running via the INIT/INIT/STARTUP sequence. Some of the
   affected systems were cured by BIOS updates, but not all OEMs provide
   them.

   As the number of CPUs keep increasing, logical destination mode
   becomes less used and the benefit for small systems, like laptops, is
   not really worth the trouble. So just remove logical destination mode
   support for 64bit and be done with it.

 - Code and comment cleanups in the APIC area.

* tag 'x86-apic-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Fix comment on IRQ vector layout
  x86/apic: Remove unused extern declarations
  x86/apic: Remove logical destination mode for 64-bit
  x86/apic: Remove unused inline function apic_set_eoi_cb()
  x86/ioapic: Cleanup remaining coding style issues
  x86/ioapic: Cleanup line breaks
  x86/ioapic: Cleanup bracket usage
  x86/ioapic: Cleanup comments
  x86/ioapic: Move replace_pin_at_irq_node() to the call site
  iommu/vt-d: Cleanup apic_printk()
  x86/mpparse: Cleanup apic_printk()s
  x86/ioapic: Cleanup guarded debug printk()s
  x86/ioapic: Cleanup apic_printk()s
  x86/apic: Cleanup apic_printk()s
  x86/apic: Provide apic_printk() helpers
  x86/ioapic: Use guard() for locking where applicable
  x86/ioapic: Cleanup structs
  x86/ioapic: Mark mp_alloc_timer_irq() __init
  x86/ioapic: Handle allocation failures gracefully

61d1ea91

Merge tag 'x86-cleanups-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0279aa78

Linus Torvalds authored Sep 17, 2024

Pull x86 cleanups from Thomas Gleixner:
 "A set of cleanups across x86:

   - Use memremap() for the EISA probe instead of ioremap(). EISA is
     strictly memory and not MMIO

   - Cleanups and enhancement all over the place"

* tag 'x86-cleanups-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/EISA: Dereference memory directly instead of using readl()
  x86/extable: Remove unused declaration fixup_bug()
  x86/boot/64: Strip percpu address space when setting up GDT descriptors
  x86/cpu: Clarify the error message when BIOS does not support SGX
  x86/kexec: Add comments around swap_pages() assembly to improve readability
  x86/kexec: Fix a comment of swap_pages() assembly
  x86/sgx: Fix a W=1 build warning in function comment
  x86/EISA: Use memremap() to probe for the EISA BIOS signature
  x86/mtrr: Remove obsolete declaration for mtrr_bp_restore()
  x86/cpu_entry_area: Annotate percpu_setup_exception_stacks() as __init

0279aa78

Merge tag 'x86-build-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5ba202a7

Linus Torvalds authored Sep 17, 2024

Pull x86 build updates from Thomas Gleixner:
 "Updates for KCOV instrumentation on x86:

   - Prevent spurious KCOV coverage in common_interrupt()

   - Fixup the KCOV Makefile directive which got stale due to a source
     file rename

   - Exclude stack unwinding from KCOV as it creates large amounts of
     uninteresting coverage

   - Provide a self test to validate that KCOV coverage of the interrupt
     handling code starts not before preempt count got updated"

* tag 'x86-build-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Ignore stack unwinding in KCOV
  module: Fix KCOV-ignored file name
  kcov: Add interrupt handling self test
  x86/entry: Remove unwanted instrumentation in common_interrupt()

5ba202a7

Merge tag 'soc-arm-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · a940d9a4

Linus Torvalds authored Sep 17, 2024

Pull SoC ARM platform updates from Arnd Bergmann:
 "Most of these updates are for removing dead code on the Samsung S3C,
  NXP i.MX, TI OMAP and TI DaVinci platforms, though this appears to be
  a coincidence.

  There are also cleanups for the Marvell Orion family and the Arm
  integrator series and a Kconfig change for Broadcom"

* tag 'soc-arm-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  ARM: dove: Drop a write-only variable
  ARM: orion5x: Switch to new sys-off handler API
  ARM: mvebu: Warn about memory chunks too small for DDR training
  ARM: imx: Annotate imx7d_enet_init() as __init
  ARM: OMAP1: Remove unused declarations in arch/arm/mach-omap1/pm.h
  ARM: s3c: remove unused s3c2410_cpu_suspend() declaration
  ARM: s3c: remove unused declarations for s3c6400
  ARM: s3c: Remove unused s3c_init_uart_irqs() declaration
  ARM: davinci: remove unused cpuidle code
  ARM: davinci: remove unused davinci_init_ide() declaration
  ARM: davinci: remove unused davinci_cfg_reg_list() declaration
  ARM: mach-imx: imx6sx: Remove Ethernet refclock setting
  MAINTAINERS: Add entry for Samsung Exynos850 SoC
  ARM: bcm: Select ARM_GIC_V3 for ARCH_BRCMSTB
  ARM: omap2: Switch to use kmemdup_array()
  ARM: omap1: Remove unused struct 'dma_link_info'
  ARM: s3c: Drop explicit initialization of struct i2c_device_id::driver_data to 0

a940d9a4

Merge tag 'soc-defconfig-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 38ea77ab

Linus Torvalds authored Sep 17, 2024

Pull SoC defconfig updates from Arnd Bergmann:
 "The updates to the defconfig files are fairly small, enabling drivers
  for eight of the arm and riscv based platforms"

* tag 'soc-defconfig-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  arm64: defconfig: enable mt8365 sound
  riscv: defconfig: Enable pinctrl support for CV18XX Series SoC
  arm64: defconfig: Enable ADP5585 GPIO and PWM drivers
  arm64: defconfig: Enable Tegra194 PCIe Endpoint
  arm64: defconfig: Enable E5010 JPEG Encoder
  riscv: defconfig: sophgo: enable clks for sg2042
  arm64: defconfig: build CONFIG_REGULATOR_QCOM_REFGEN as module
  ARM: configs: at91: enable config flags for sam9x7 SoC family
  arm64: defconfig: Enable R-Car Ethernet-TSN support
  ARM: shmobile: defconfig: Enable slab hardening and kmalloc buckets
  arm64: defconfig: Enable AK4619 codec support

38ea77ab

Merge tag 'soc-drivers-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · b8979c6b

Linus Torvalds authored Sep 17, 2024

Pull SoC driver updates from Arnd Bergmann:
 "The driver updates seem larger this time around, with changes is many
  of the SoC specific drivers, both the custom drivers/soc ones and the
  closely related subsystems (memory, bus, firmware, reset, ...).

  The at91 platform gains support for sam9x7 chips in the soc and power
  management code. This is the latest variant of one of the oldest still
  supported SoC families, using the ARM9 (ARMv5) core.

  As usual, the qualcomm snapdragon platform gets a ton of updates in
  many of their drivers to add more features and additional SoC support.
  Most of these are somewhat firmware related as the platform has a
  number of firmware based interfaces to the kernel. A notable addition
  here is the inclusion of trace events to two of these drivers.

  Herve Codina and Christophe Leroy are now sending updates for
  drivers/soc/fsl/ code through the SoC tree, this contains both PowerPC
  and Arm specific platforms and has previously been problematic to
  maintain. The first update here contains support for newer PowerPC
  variants and some cleanups.

  The turris mox firmware driver has a number of updates, mostly
  cleanups.

  The Arm SCMI firmware driver gets a major rework to modularize the
  existing code into separately loadable drivers for the various
  transports, the addition of custom NXP i.MX9 interfaces and a number
  of smaller updates.

  The Arm FF-A firmware driver gets a feature update to support the v1.2
  version of the specification.

  The reset controller drivers have some smaller cleanups and a newly
  added driver for the Intel/Mobileye EyeQ5/EyeQ6 MIPS SoCs.

  The memory controller drivers get some cleanups and refactoring for
  Tegra, TI, Freescale/NXP and a couple more platforms.

  Finally there are lots of minor updates to firmware (raspberry pi,
  tegra, imx), bus (sunxi, omap, tegra) and soc (rockchips, tegra,
  amlogic, mediatek) drivers and their DT bindings"

* tag 'soc-drivers-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (212 commits)
  firmware: imx: remove duplicate scmi_imx_misc_ctrl_get()
  platform: cznic: turris-omnia-mcu: Fix error check in omnia_mcu_register_trng()
  bus: sunxi-rsb: Simplify code with dev_err_probe()
  soc: fsl: qe: ucc: Export ucc_mux_set_grant_tsa_bkpt
  soc: fsl: cpm1: qmc: Fix dependency on fsl_soc.h
  dt-bindings: arm: rockchip: Add rk3576 compatible string to pmu.yaml
  soc: fsl: qbman: Remove redundant warnings
  soc: fsl: qbman: Use iommu_paging_domain_alloc()
  MAINTAINERS: Add QE files related to the Freescale QMC controller
  soc: fsl: cpm1: qmc: Handle QUICC Engine (QE) soft-qmc firmware
  soc: fsl: cpm1: qmc: Add support for QUICC Engine (QE) implementation
  soc: fsl: qe: Add missing PUSHSCHED command
  soc: fsl: qe: Add resource-managed muram allocators
  soc: fsl: cpm1: qmc: Introduce qmc_version
  soc: fsl: cpm1: qmc: Rename SCC_GSMRL_MODE_QMC
  soc: fsl: cpm1: qmc: Handle RPACK initialization
  soc: fsl: cpm1: qmc: Rename qmc_chan_command()
  soc: fsl: cpm1: qmc: Introduce qmc_{init,exit}_xcc() and their CPM1 version
  soc: fsl: cpm1: qmc: Introduce qmc_init_resource() and its CPM1 version
  soc: fsl: cpm1: qmc: Re-order probe() operations
  ...

b8979c6b

Merge tag 'soc-dt-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 7b17f5eb

Linus Torvalds authored Sep 17, 2024

Pull SoC devicetree updates from Arnd Bergmann:
 "New SoC support for Broadcom bcm2712 (Raspberry Pi 5) and Renesas
  R9A09G057 (RZ/V2H(P)) and Qualcomm Snapdragon 414 (MSM8929), all three
  of these are variants of already supported chips, in particular the
  last one is almost identical to MSM8939.

  Lots of updates to Mediatek, ASpeed, Rockchips, Amlogic, Qualcomm,
  STM32, NXP i.MX, Sophgo, TI K3, Renesas, Microchip at91, NVIDIA Tegra,
  and T-HEAD.

  The added Qualcomm platform support once again dominates the changes,
  with seven phones and three laptops getting added in addition to many
  new features on existing machines. The Snapdragon X1E support
  specifically keeps improving.

  The other new machines are:

   - eight new machines using various 64-bit Rockchips SoCs, both on the
     consumer/gaming side and developer boards

   - three industrial boards with 64-bit i.MX, which is a very low
     number for them.

   - four more servers using a 32-bit Speed BMC

   - three boards using STM32MP1 SoCs

   - one new machine each using allwinner, amlogic, broadcom and renesas
     chips"

* tag 'soc-dt-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (672 commits)
  arm64: dts: allwinner: h5: NanoPi NEO Plus2: Use regulators for pio
  arm64: dts: mediatek: add audio support for mt8365-evk
  arm64: dts: mediatek: add afe support for mt8365 SoC
  arm64: dts: mediatek: mt8186-corsola: Disable DPI display interface
  arm64: dts: mediatek: mt8186: Add svs node
  arm64: dts: mediatek: mt8186: Add power domain for DPI
  arm64: dts: mediatek: mt8195: Correct clock order for dp_intf*
  arm64: dts: mt8183: add dpi node to mt8183
  arm64: dts: allwinner: h5: NanoPi Neo Plus2: Fix regulators
  arm64: dts: rockchip: add CAN0 and CAN1 interfaces to mecsbc board
  arm64: dts: rockchip: add CAN-FD controller nodes to rk3568
  arm64: dts: nuvoton: ma35d1: Add uart pinctrl settings
  arm64: dts: nuvoton: ma35d1: Add pinctrl and gpio nodes
  arm64: dts: nuvoton: Add syscon to the system-management node
  ARM: dts: Fix undocumented LM75 compatible nodes
  arm64: dts: toshiba: Fix pl011 and pl022 clocks
  ARM: dts: stm32: Use SAI to generate bit and frame clock on STM32MP15xx DHCOM PDK2
  ARM: dts: stm32: Switch bitclock/frame-master to flag on STM32MP15xx DHCOM PDK2
  ARM: dts: stm32: Sort properties in audio endpoints on STM32MP15xx DHCOM PDK2
  ARM: dts: stm32: Add MECIO1 and MECT1S board variants
  ...

7b17f5eb

Merge tag 'spi-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 303ba85c

Linus Torvalds authored Sep 17, 2024

Pull spi updates from Mark Brown:
 "This is quite a quiet release for SPI. The one new core feature here
  is support for configuring the state of the MOSI pin when the bus is
  idle, there are some devices which are very fragile in this regard
  even when the chip select signal is not asserted. Otherwise we have
  some new driver support, a bunch of small fixes and some general
  cleanup work.

   - Support for configuring the state of the MOSI pin when the the bus
     is idle

   - Add the Elgin JG0309-01 in spidev

   - Support for Marvell xSPI, Mediatek MTK7981, Microchip PIC64GX, NXP
     i.MX8ULP, and Rockchip RK3576 controllers

  I also accidentally pulled in an IIO DT bindings update due to a typo
  when applying the MOSI idle state patches"

* tag 'spi-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (65 commits)
  spi: geni-qcom: Use devm functions to simplify code
  spi: remove spi_controller_is_slave() and spi_slave_abort()
  platform/olpc: olpc-xo175-ec: switch to use spi_target_abort().
  spi: slave-mt27xx: switch to use target_abort
  spi: spidev: switch to use spi_target_abort()
  spi: slave-system-control: switch to use spi_target_abort()
  spi: slave-time: switch to use spi_target_abort()
  spi: switch to use spi_controller_is_target()
  spi: fspi: add support for imx8ulp
  spi: fspi: involve lut_num for struct nxp_fspi_devtype_data
  dt-bindings: spi: nxp-fspi: add imx8ulp support
  spi: spidev_fdx: Fix the wrong format specifier
  spi: mxs: Switch to RUNTIME/SYSTEM_SLEEP_PM_OPS()
  spi: dt-bindings: Add rockchip,rk3576-spi compatible
  spi: Revert "spi: Insert the missing pci_dev_put()before return"
  spi: zynq-qspi: Replace kzalloc with kmalloc for buffer allocation
  spi: ppc4xx: Sort headers
  spi: ppc4xx: Revert "handle irq_of_parse_and_map() errors"
  spi: zynqmp-gqspi: Simplify with dev_err_probe()
  spi: zynqmp-gqspi: Use devm_spi_alloc_host()
  ...

303ba85c

Merge tag 'regulator-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator · 6df92808

Linus Torvalds authored Sep 17, 2024

Pull regulator updates from Mark Brown:
 "This release is almost all cleanup work of various kinds, while the
  diffstat for the core is quite large this is almost all cleanups and
  documentation improvments with some small fixes rather than any new
  feature work. We do have support for a couple of new devices but these
  are small additions to existing drivers rather than new drivers.

   - Removal of the SM5703 driver which does not have it's dependencies
     available.

   - Support for Allwinner AXP717, and Qualcomm WCN6855.

  The Allwinner support shares some commits with the MFD tree"

* tag 'regulator-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (66 commits)
  regulator: sm5703: Remove because it is unused and fails to build
  regulator: Split up _regulator_get()
  regulator: update some comments ([gs]et_voltage_vsel vs [gs]et_voltage_sel)
  regulator: max8973: Use irq_get_trigger_type() helper
  regulator: core: fix the broken behavior of regulator_dev_lookup()
  regulator: max77650: Use container_of and constify static data
  regulator: hi6421v530: Use container_of and constify static data
  regulator: hi6421v530: Drop unused 'eco_microamp'
  regulator: qcom-refgen: Constify static data
  regulator: pfuze100: Constify static data
  regulator: pcap: Constify static data
  regulator: mtk-dvfsrc: Constify static data
  regulator: max77826: Constify static data
  regulator: max77826: Drop unused 'rdesc' in 'struct max77826_regulator_info'
  regulator: tps65023: Constify static data
  regulator: hi6421v600: Constify static data
  regulator: hi6421: Constify static data
  regulator: da9121: Constify static data
  regulator: da9063: Constify static data
  regulator: da9055: Constify static data
  ...

6df92808

Merge tag 'regmap-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · 9179b73a

Linus Torvalds authored Sep 17, 2024

Pull regmap updates from Mark Brown:
 "The main update here is Matti's work allowing regmap irqdomains to be
  given custom names (allowing multiple interrupt controllers associatd
  with a single struct device), this pulls in some commits from Thomas'
  tree which it depends on.

  Otherwise there's a bit of work on improving handling of regmaps
  protected with spinlocks when used with complex cache types, fixing
  some valid but harmless lockdep reports seen with some new driver
  work"

* tag 'regmap-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: kunit: Add coverage of spinlocked regmaps
  regcache: use map->alloc_flags also for allocating cache
  regmap: Use locking during kunit tests
  regmap: Hold the regmap lock when allocating and freeing the cache
  regmap: Allow setting IRQ domain name suffix

9179b73a

Merge tag 'printk-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux · c903327d

Linus Torvalds authored Sep 17, 2024

Pull printk updates from Petr Mladek:
"This is the "last" part of the support for the new nbcon consoles.
Where "nbcon" stays for "No Big console lock CONsoles" aka not under
the console_lock.

New callbacks are added to struct console:

- write_thread() for flushing nbcon consoles in task context.

- write_atomic() for flushing nbcon consoles in atomic context,
including NMI.

- con->device_lock() and device_unlock() for taking the driver
specific lock, for example, port->lock.

New printk-specific kthreads are created:

- per-console kthreads which get responsible for flushing normal
priority messages on nbcon consoles.

- thread which gets responsible for flushing normal priority messages
on all consoles when CONFIG_RT enabled.

The new callbacks are called under a special per-console lock which
has already been added back in v6.7. It allows to distinguish three
severities: normal, emergency, and panic. A context with a higher
priority could take over the ownership when it is safe even in the
middle of handling a record. The panic context could do it even when
it is not safe. But it is allowed only for the final desperate flush
before entering the infinite loop.

The new lock helps to flush the messages directly in emergency and
panic contexts. But it is not enough in all situations:

- console_lock() is still need for synchronization against boot
consoles.

- con->device_lock() is need for synchronization against other
operations on the same HW, e.g. serial port speed setting,
non-printk related read/write.

The dependency on con->device_lock() is mutual. Any code taking the
driver specific lock has to acquire the related nbcon console context
as well. For example, see the new uart_port_lock() API. It provides
the necessary synchronization against emergency and panic contexts
where the messages are flushed only under the new per-console lock.

Maybe surprisingly, a quite tricky part is the decision how to flush
the consoles in various situations. It has to take into account:

- message priority: normal, emergency, panic

- scheduling context: task, atomic, deferred_legacy

- registered consoles: boot, legacy, nbcon

- threads are running: early boot, suspend, shutdown, panic

- caller: printk(), pr_flush(), printk_flush_in_panic(),
console_unlock(), console_start(), ...

The primary decision is made in printk_get_console_flush_type(). It
creates a hint what the caller should do:

- flush nbcon consoles directly or via the kthread

- call the legacy loop (console_unlock()) directly or via irq_work

The existing behavior is preserved for the legacy consoles. The only
exception is that they are not longer flushed directly from printk()
in panic() before CPUs are stopped. But this blocking happens only
when at least one nbcon console is registered. The motivation is to
increase a chance to produce the crash dump. They legacy consoles
might create a deadlock in compare with nbcon consoles. The nbcon
console should allow to see the messages even when the crash dump
fails.

There are three possible ways how nbcon consoles are flushed:

- The per-nbcon-console kthread is responsible for flushing messages
added with the normal priority. This is the default mode.

- The legacy loop, aka console_unlock(), is used when there is still
a boot console registered. There is no easy way how to match an
early console driver with a nbcon console driver. And the
console_lock() provides the only reliable serialization at the
moment.

The legacy loop uses either con->write_atomic() or
con->write_thread() callbacks depending on whether it is allowed to
schedule. The atomic variant has to be used from printk().

- In other situations, the messages are flushed directly using
write_atomic() which can be called in any context, including NMI.
It is primary needed during early boot or shutdown, in emergency
situations, and panic.

The emergency priority is used by a code called within
nbcon_cpu_emergency_enter()/exit(). At the moment, it is used in four
situations: WARN(), Oops, lockdep, and RCU stall reports.

Finally, there is no nbcon console at the moment. It means that the
changes should _not_ modify the existing behavior. The only exception
is CONFIG_RT which would force offloading the legacy loop, for normal
priority context, into the dedicated kthread"

* tag 'printk-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (54 commits)
printk: Avoid false positive lockdep report for legacy printing
printk: nbcon: Assign nice -20 for printing threads
printk: Implement legacy printer kthread for PREEMPT_RT
tty: sysfs: Add nbcon support for 'active'
proc: Add nbcon support for /proc/consoles
proc: consoles: Add notation to c_start/c_stop
printk: nbcon: Show replay message on takeover
printk: Provide helper for message prepending
printk: nbcon: Rely on kthreads for normal operation
printk: nbcon: Use thread callback if in task context for legacy
printk: nbcon: Relocate nbcon_atomic_emit_one()
printk: nbcon: Introduce printer kthreads
printk: nbcon: Init @nbcon_seq to highest possible
printk: nbcon: Add context to usable() and emit()
printk: Flush console on unregister_console()
printk: Fail pr_flush() if before SYSTEM_SCHEDULING
printk: nbcon: Add function for printers to reacquire ownership
printk: nbcon: Use raw_cpu_ptr() instead of open coding
printk: Use the BITS_PER_LONG macro
lockdep: Mark emergency sections in lockdep splats
...

c903327d

Merge tag 'core-debugobjects-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · daa394f0

Linus Torvalds authored Sep 17, 2024

Pull debugobjects updates from Thomas Gleixner:

 - Use the threshold to check for the pool refill condition and not the
   run time recorded all time low fill value, which is lower than the
   threshold and therefore causes refills to be delayed.

 - KCSAN annotation updates and simplification of the fill_pool() code.

* tag 'core-debugobjects-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  debugobjects: Remove redundant checks in fill_pool()
  debugobjects: Fix conditions in fill_pool()
  debugobjects: Fix the compilation attributes of some global variables

daa394f0

Merge tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9ea925c8

Linus Torvalds authored Sep 17, 2024

Pull timer updates from Thomas Gleixner:
 "Core:

   - Overhaul of posix-timers in preparation of removing the workaround
     for periodic timers which have signal delivery ignored.

   - Remove the historical extra jiffie in msleep()

     msleep() adds an extra jiffie to the timeout value to ensure
     minimal sleep time. The timer wheel ensures minimal sleep time
     since the large rewrite to a non-cascading wheel, but the extra
     jiffie in msleep() remained unnoticed. Remove it.

   - Make the timer slack handling correct for realtime tasks.

     The procfs interface is inconsistent and does neither reflect
     reality nor conforms to the man page. Show the correct 0 slack for
     real time tasks and enforce it at the core level instead of having
     inconsistent individual checks in various timer setup functions.

   - The usual set of updates and enhancements all over the place.

  Drivers:

   - Allow the ACPI PM timer to be turned off during suspend

   - No new drivers

   - The usual updates and enhancements in various drivers"

* tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
  ntp: Make sure RTC is synchronized when time goes backwards
  treewide: Fix wrong singular form of jiffies in comments
  cpu: Use already existing usleep_range()
  timers: Rename next_expiry_recalc() to be unique
  platform/x86:intel/pmc: Fix comment for the pmc_core_acpi_pm_timer_suspend_resume function
  clocksource/drivers/jcore: Use request_percpu_irq()
  clocksource/drivers/cadence-ttc: Add missing clk_disable_unprepare in ttc_setup_clockevent
  clocksource/drivers/asm9260: Add missing clk_disable_unprepare in asm9260_timer_init
  clocksource/drivers/qcom: Add missing iounmap() on errors in msm_dt_timer_init()
  clocksource/drivers/ingenic: Use devm_clk_get_enabled() helpers
  platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended
  clocksource: acpi_pm: Add external callback for suspend/resume
  clocksource/drivers/arm_arch_timer: Using for_each_available_child_of_node_scoped()
  dt-bindings: timer: rockchip: Add rk3576 compatible
  timers: Annotate possible non critical data race of next_expiry
  timers: Remove historical extra jiffie for timeout in msleep()
  hrtimer: Use and report correct timerslack values for realtime tasks
  hrtimer: Annotate hrtimer_cpu_base_.*_expiry() for sparse.
  timers: Add sparse annotation for timer_sync_wait_running().
  signal: Replace BUG_ON()s
  ...

9ea925c8

Merge tag 'irq-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · cb69d865

Linus Torvalds authored Sep 17, 2024

Pull irq updates from Thomas Gleixner:
 "Core:

   - Remove a global lock in the affinity setting code

     The lock protects a cpumask for intermediate results and the lock
     causes a bottleneck on simultaneous start of multiple virtual
     machines. Replace the lock and the static cpumask with a per CPU
     cpumask which is nicely serialized by raw spinlock held when
     executing this code.

   - Provide support for giving a suffix to interrupt domain names.

     That's required to support devices with subfunctions so that the
     domain names are distinct even if they originate from the same
     device node.

   - The usual set of cleanups and enhancements all over the place

  Drivers:

   - Support for longarch AVEC interrupt chip

   - Refurbishment of the Armada driver so it can be extended for new
     variants.

   - The usual set of cleanups and enhancements all over the place"

* tag 'irq-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
  genirq: Use cpumask_intersects()
  genirq/cpuhotplug: Use cpumask_intersects()
  irqchip/apple-aic: Only access system registers on SoCs which provide them
  irqchip/apple-aic: Add a new "Global fast IPIs only" feature level
  irqchip/apple-aic: Skip unnecessary enabling of use_fast_ipi
  dt-bindings: apple,aic: Document A7-A11 compatibles
  irqdomain: Use IS_ERR_OR_NULL() in irq_domain_trim_hierarchy()
  genirq/msi: Use kmemdup_array() instead of kmemdup()
  genirq/proc: Change the return value for set affinity permission error
  genirq/proc: Use irq_move_pending() in show_irq_affinity()
  genirq/proc: Correctly set file permissions for affinity control files
  genirq: Get rid of global lock in irq_do_set_affinity()
  genirq: Fix typo in struct comment
  irqchip/loongarch-avec: Add AVEC irqchip support
  irqchip/loongson-pch-msi: Prepare get_pch_msi_handle() for AVECINTC
  irqchip/loongson-eiointc: Rename CPUHP_AP_IRQ_LOONGARCH_STARTING
  LoongArch: Architectural preparation for AVEC irqchip
  LoongArch: Move irqchip function prototypes to irq-loongson.h
  irqchip/loongson-pch-msi: Switch to MSI parent domains
  softirq: Remove unused 'action' parameter from action callback
  ...

cb69d865

Merge tag 'timers-clocksource-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a64405b7

Linus Torvalds authored Sep 17, 2024

Pull clocksource watchdog updates from Thomas Gleixner:

 - Make the uncertainty margin handling more robust to prevent false
   positives

 - Clarify comments

* tag 'timers-clocksource-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: Set cs_watchdog_read() checks based on .uncertainty_margin
  clocksource: Fix comments on WATCHDOG_THRESHOLD & WATCHDOG_MAX_SKEW
  clocksource: Improve comments for watchdog skew bounds

a64405b7

Merge tag 'smp-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 97e17c08

Linus Torvalds authored Sep 17, 2024

Pull CPU hotplug updates from Thomas Gleixner:

 - Prepare the core for supporting parallel hotplug on loongarch

 - A small set of cleanups and enhancements

* tag 'smp-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  smp: Mark smp_prepare_boot_cpu() __init
  cpu: Fix W=1 build kernel-doc warning
  cpu/hotplug: Provide weak fallback for arch_cpuhp_init_parallel_bringup()
  cpu/hotplug: Make HOTPLUG_PARALLEL independent of HOTPLUG_SMT

97e17c08

16 Sep, 2024 17 commits

Merge tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm · a430d95c

Linus Torvalds authored Sep 16, 2024

Pull lsm updates from Paul Moore:

 - Move the LSM framework to static calls

   This transitions the vast majority of the LSM callbacks into static
   calls. Those callbacks which haven't been converted were left as-is
   due to the general ugliness of the changes required to support the
   static call conversion; we can revisit those callbacks at a future
   date.

 - Add the Integrity Policy Enforcement (IPE) LSM

   This adds a new LSM, Integrity Policy Enforcement (IPE). There is
   plenty of documentation about IPE in this patches, so I'll refrain
   from going into too much detail here, but the basic motivation behind
   IPE is to provide a mechanism such that administrators can restrict
   execution to only those binaries which come from integrity protected
   storage, e.g. a dm-verity protected filesystem. You will notice that
   IPE requires additional LSM hooks in the initramfs, dm-verity, and
   fs-verity code, with the associated patches carrying ACK/review tags
   from the associated maintainers. We couldn't find an obvious
   maintainer for the initramfs code, but the IPE patchset has been
   widely posted over several years.

   Both Deven Bowers and Fan Wu have contributed to IPE's development
   over the past several years, with Fan Wu agreeing to serve as the IPE
   maintainer moving forward. Once IPE is accepted into your tree, I'll
   start working with Fan to ensure he has the necessary accounts, keys,
   etc. so that he can start submitting IPE pull requests to you
   directly during the next merge window.

 - Move the lifecycle management of the LSM blobs to the LSM framework

   Management of the LSM blobs (the LSM state buffers attached to
   various kernel structs, typically via a void pointer named "security"
   or similar) has been mixed, some blobs were allocated/managed by
   individual LSMs, others were managed by the LSM framework itself.

   Starting with this pull we move management of all the LSM blobs,
   minus the XFRM blob, into the framework itself, improving consistency
   across LSMs, and reducing the amount of duplicated code across LSMs.
   Due to some additional work required to migrate the XFRM blob, it has
   been left as a todo item for a later date; from a practical
   standpoint this omission should have little impact as only SELinux
   provides a XFRM LSM implementation.

 - Fix problems with the LSM's handling of F_SETOWN

   The LSM hook for the fcntl(F_SETOWN) operation had a couple of
   problems: it was racy with itself, and it was disconnected from the
   associated DAC related logic in such a way that the LSM state could
   be updated in cases where the DAC state would not. We fix both of
   these problems by moving the security_file_set_fowner() hook into the
   same section of code where the DAC attributes are updated. Not only
   does this resolve the DAC/LSM synchronization issue, but as that code
   block is protected by a lock, it also resolve the race condition.

 - Fix potential problems with the security_inode_free() LSM hook

   Due to use of RCU to protect inodes and the placement of the LSM hook
   associated with freeing the inode, there is a bit of a challenge when
   it comes to managing any LSM state associated with an inode. The VFS
   folks are not open to relocating the LSM hook so we have to get
   creative when it comes to releasing an inode's LSM state.
   Traditionally we have used a single LSM callback within the hook that
   is triggered when the inode is "marked for death", but not actually
   released due to RCU.

   Unfortunately, this causes problems for LSMs which want to take an
   action when the inode's associated LSM state is actually released; so
   we add an additional LSM callback, inode_free_security_rcu(), that is
   called when the inode's LSM state is released in the RCU free
   callback.

 - Refactor two LSM hooks to better fit the LSM return value patterns

   The vast majority of the LSM hooks follow the "return 0 on success,
   negative values on failure" pattern, however, there are a small
   handful that have unique return value behaviors which has caused
   confusion in the past and makes it difficult for the BPF verifier to
   properly vet BPF LSM programs. This includes patches to
   convert two of these"special" LSM hooks to the common 0/-ERRNO pattern.

 - Various cleanups and improvements

   A handful of patches to remove redundant code, better leverage the
   IS_ERR_OR_NULL() helper, add missing "static" markings, and do some
   minor style fixups.

* tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (40 commits)
  security: Update file_set_fowner documentation
  fs: Fix file_set_fowner LSM hook inconsistencies
  lsm: Use IS_ERR_OR_NULL() helper function
  lsm: remove LSM_COUNT and LSM_CONFIG_COUNT
  ipe: Remove duplicated include in ipe.c
  lsm: replace indirect LSM hook calls with static calls
  lsm: count the LSMs enabled at compile time
  kernel: Add helper macros for loop unrolling
  init/main.c: Initialize early LSMs after arch code, static keys and calls.
  MAINTAINERS: add IPE entry with Fan Wu as maintainer
  documentation: add IPE documentation
  ipe: kunit test for parser
  scripts: add boot policy generation program
  ipe: enable support for fs-verity as a trust provider
  fsverity: expose verified fsverity built-in signatures to LSMs
  lsm: add security_inode_setintegrity() hook
  ipe: add support for dm-verity as a trust provider
  dm-verity: expose root hash digest and signature data to LSMs
  block,lsm: add LSM blob and new LSM hooks for block devices
  ipe: add permissive toggle
  ...

a430d95c

Merge tag 'selinux-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · ad060dbb

Linus Torvalds authored Sep 16, 2024

Pull selinux updates from Paul Moore:

 - Ensure that both IPv4 and IPv6 connections are properly initialized

   While we always properly initialized IPv4 connections early in their
   life, we missed the necessary IPv6 change when we were adding IPv6
   support.

 - Annotate the SELinux inode revalidation function to quiet KCSAN

   KCSAN correctly identifies a race in __inode_security_revalidate()
   when we check to see if an inode's SELinux has been properly
   initialized. While KCSAN is correct, it is an intentional choice made
   for performance reasons; if necessary, we check the state a second
   time, this time with a lock held, before initializing the inode's
   state.

 - Code cleanups, simplification, etc.

   A handful of individual patches to simplify some SELinux kernel
   logic, improve return code granularity via ERR_PTR(), follow the
   guidance on using KMEM_CACHE(), and correct some minor style
   problems.

* tag 'selinux-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: fix style problems in security/selinux/include/audit.h
  selinux: simplify avc_xperms_audit_required()
  selinux: mark both IPv4 and IPv6 accepted connection sockets as labeled
  selinux: replace kmem_cache_create() with KMEM_CACHE()
  selinux: annotate false positive data race to avoid KCSAN warnings
  selinux: refactor code to return ERR_PTR in selinux_netlbl_sock_genattr
  selinux: Streamline type determination in security_compute_sid

ad060dbb

Merge tag 'audit-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit · dc644fba

Linus Torvalds authored Sep 16, 2024

Pull audit updates from Paul Moore:

 - Fix some remaining problems with PID/TGID reporting

   When most users think about PIDs, what they are really thinking about
   is the TGID. This commit shifts the audit PID logging and filtering
   to use the TGID value which should provide a more meaningful audit
   stream and filtering experience for users.

 - Migrate to the str_enabled_disabled() helper

   Evidently we have helper functions that help ensure if we mistype
   "enabled" or "disabled" it is now caught at compile time. I guess
   we're fancy now.

* tag 'audit-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: Make use of str_enabled_disabled() helper
  audit: use task_tgid_nr() instead of task_pid_nr()

dc644fba

cifs: Remove redundant setting of NETFS_SREQ_HIT_EOF · 43a64bd0

David Howells authored Sep 16, 2024

Fix an upstream merge resolution issue[1].  The NETFS_SREQ_HIT_EOF flag,
and code to set it, got added via two different paths.  The original path
saw it added in the netfslib read improvements[2], but it was also added,
and slightly differently, in a fix that was committed before v6.11:

        1da29f2c
        netfs, cifs: Fix handling of short DIO read

However, the code added to smb2_readv_callback() to set the flag in didn't
get removed when the netfs read improvements series was rebased to take
account of the cifs fixes.  The proposed merge resolution[2] deleted it
rather than rebase the patches.

Fix this by removing the redundant lines.  Code to set the bit that derives
from the fix patch is still there, a few lines above in the source.

Fixes: 35219bc5 ("Merge tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <stfrench@microsoft.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/CAHk-=wjr8fxk20-wx=63mZruW1LTvBvAKya1GQ1EhyzXb-okMA@mail.gmail.com/ [1]
Link: https://lore.kernel.org/linux-fsdevel/20240913-vfs-netfs-39ef6f974061@brauner/ [2]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

43a64bd0

cifs: Fix cifs readv callback merge resolution issue · dc1a456d

David Howells authored Sep 16, 2024

Fix an upstream merge resolution issue[1]. Prior to the netfs read
healpers, the SMB1 asynchronous read callback, cifs_readv_worker()
performed the cleanup for the operation in the network message processing
loop, potentially slowing down the processing of incoming SMB messages.

With commit a68c7486 ("cifs: Fix SMB1 readv/writev callback in the same
way as SMB2/3"), this was moved to a worker thread (as is done in the
SMB2/3 transport variant). However, the "was_async" argument to
netfs_subreq_terminated (which was originally incorrectly "false" got
flipped to "true" - which was then incorrect because, being in a kernel
thread, it's not in an async context).

This got corrected in the sample merge[2], but Linus, not unreasonably,
switched it back to its previous value.

Note that this value tells netfslib whether or not it can run sleepable
stuff or stuff that takes a long time, such as retries and cleanups, in the
calling thread, or whether it should offload to a worker thread.

Fix this so that it is "false". The callback to netfslib in both SMB1 and
SMB2/3 now gets offloaded from the network message thread to a separate
worker thread and thus it's fine to do the slow work in this thread.

Fixes: 35219bc5 ("Merge tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <stfrench@microsoft.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/CAHk-=wjr8fxk20-wx=63mZruW1LTvBvAKya1GQ1EhyzXb-okMA@mail.gmail.com/ [1]
Link: https://lore.kernel.org/linux-fsdevel/20240913-vfs-netfs-39ef6f974061@brauner/ [2]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dc1a456d

Merge tag 'for-6.12/io_uring-discard-20240913' of git://git.kernel.dk/linux · adfc3ded

Linus Torvalds authored Sep 16, 2024

Pull io_uring async discard support from Jens Axboe:
 "Sitting on top of both the 6.12 block and io_uring core branches,
  here's support for async discard through io_uring.

  This allows applications to issue async discards, rather than rely on
  the blocking sync ioctl discards we already have. The sync support is
  difficult to use outside of idle/cleanup periods.

  On a real (but slow) device, testing shows the following results when
  compared to sync discard:

	qd64 sync discard: 21K IOPS, lat avg 3 msec (max 21 msec)
	qd64 async discard: 76K IOPS, lat avg 845 usec (max 2.2 msec)

	qd64 sync discard: 14K IOPS, lat avg 5 msec (max 25 msec)
	qd64 async discard: 56K IOPS, lat avg 1153 usec (max 3.6 msec)

  and synthetic null_blk testing with the same queue depth and block
  size settings as above shows:

	Type    Trim size       IOPS    Lat avg (usec)  Lat Max (usec)
	==============================================================
	sync    4k               144K       444            20314
	async   4k              1353K        47              595
	sync    1M                56K      1136            21031
	async   1M                94K       680              760"

* tag 'for-6.12/io_uring-discard-20240913' of git://git.kernel.dk/linux:
  block: implement async io_uring discard cmd
  block: introduce blk_validate_byte_range()
  filemap: introduce filemap_invalidate_pages
  io_uring/cmd: give inline space in request to cmds
  io_uring/cmd: expose iowq to cmds

adfc3ded

Merge tag 'for-6.12/block-20240913' of git://git.kernel.dk/linux · 26bb0d3f

Linus Torvalds authored Sep 16, 2024

Pull block updates from Jens Axboe:

 - MD changes via Song:
      - md-bitmap refactoring (Yu Kuai)
      - raid5 performance optimization (Artur Paszkiewicz)
      - Other small fixes (Yu Kuai, Chen Ni)
      - Add a sysfs entry 'new_level' (Xiao Ni)
      - Improve information reported in /proc/mdstat (Mateusz Kusiak)

 - NVMe changes via Keith:
      - Asynchronous namespace scanning (Stuart)
      - TCP TLS updates (Hannes)
      - RDMA queue controller validation (Niklas)
      - Align field names to the spec (Anuj)
      - Metadata support validation (Puranjay)
      - A syntax cleanup (Shen)
      - Fix a Kconfig linking error (Arnd)
      - New queue-depth quirk (Keith)

 - Add missing unplug trace event (Keith)

 - blk-iocost fixes (Colin, Konstantin)

 - t10-pi modular removal and fixes (Alexey)

 - Fix for potential BLKSECDISCARD overflow (Alexey)

 - bio splitting cleanups and fixes (Christoph)

 - Deal with folios rather than rather than pages, speeding up how the
   block layer handles bigger IOs (Kundan)

 - Use spinlocks rather than bit spinlocks in zram (Sebastian, Mike)

 - Reduce zoned device overhead in ublk (Ming)

 - Add and use sendpages_ok() for drbd and nvme-tcp (Ofir)

 - Fix regression in partition error pointer checking (Riyan)

 - Add support for write zeroes and rotational status in nbd (Wouter)

 - Add Yu Kuai as new BFQ maintainer. The scheduler has been
   unmaintained for quite a while.

 - Various sets of fixes for BFQ (Yu Kuai)

 - Misc fixes and cleanups (Alvaro, Christophe, Li, Md Haris, Mikhail,
   Yang)

* tag 'for-6.12/block-20240913' of git://git.kernel.dk/linux: (120 commits)
  nvme-pci: qdepth 1 quirk
  block: fix potential invalid pointer dereference in blk_add_partition
  blk_iocost: make read-only static array vrate_adj_pct const
  block: unpin user pages belonging to a folio at once
  mm: release number of pages of a folio
  block: introduce folio awareness and add a bigger size from folio
  block: Added folio-ized version of bio_add_hw_page()
  block, bfq: factor out a helper to split bfqq in bfq_init_rq()
  block, bfq: remove local variable 'bfqq_already_existing' in bfq_init_rq()
  block, bfq: remove local variable 'split' in bfq_init_rq()
  block, bfq: remove bfq_log_bfqg()
  block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator()
  block, bfq: fix procress reference leakage for bfqq in merge chain
  block, bfq: fix uaf for accessing waker_bfqq after splitting
  blk-throttle: support prioritized processing of metadata
  blk-throttle: remove last_low_overflow_time
  drbd: Add NULL check for net_conf to prevent dereference in state validation
  nvme-tcp: fix link failure for TCP auth
  blk-mq: add missing unplug trace event
  mtip32xx: Remove redundant null pointer checks in mtip_hw_debugfs_init()
  ...

26bb0d3f

Merge tag 'for-6.12/io_uring-20240913' of git://git.kernel.dk/linux · 3a4d319a

Linus Torvalds authored Sep 16, 2024

Pull io_uring updates from Jens Axboe:

 - NAPI fixes and cleanups (Pavel, Olivier)

 - Add support for absolute timeouts (Pavel)

 - Fixes for io-wq/sqpoll affinities (Felix)

 - Efficiency improvements for dealing with huge pages (Chenliang)

 - Support for a minwait mode, where the application essentially has two
   timouts - one smaller one that defines the batch timeout, and the
   overall large one similar to what we had before. This enables
   efficient use of batching based on count + timeout, while still
   working well with periods of less intensive workloads

 - Use ITER_UBUF for single segment sends

 - Add support for incremental buffer consumption. Right now each
   operation will always consume a full buffer. With incremental
   consumption, a recv/read operation only consumes the part of the
   buffer that it needs to satisfy the operation

 - Add support for GCOV for io_uring, to help retain a high coverage of
   test to code ratio

 - Fix regression with ocfs2, where an odd -EOPNOTSUPP wasn't correctly
   converted to a blocking retry

 - Add support for cloning registered buffers from one ring to another

 - Misc cleanups (Anuj, me)

* tag 'for-6.12/io_uring-20240913' of git://git.kernel.dk/linux: (35 commits)
  io_uring: add IORING_REGISTER_COPY_BUFFERS method
  io_uring/register: provide helper to get io_ring_ctx from 'fd'
  io_uring/rsrc: add reference count to struct io_mapped_ubuf
  io_uring/rsrc: clear 'slot' entry upfront
  io_uring/io-wq: inherit cpuset of cgroup in io worker
  io_uring/io-wq: do not allow pinning outside of cpuset
  io_uring/rw: drop -EOPNOTSUPP check in __io_complete_rw_common()
  io_uring/rw: treat -EOPNOTSUPP for IOCB_NOWAIT like -EAGAIN
  io_uring/sqpoll: do not allow pinning outside of cpuset
  io_uring/eventfd: move refs to refcount_t
  io_uring: remove unused rsrc_put_fn
  io_uring: add new line after variable declaration
  io_uring: add GCOV_PROFILE_URING Kconfig option
  io_uring/kbuf: add support for incremental buffer consumption
  io_uring/kbuf: pass in 'len' argument for buffer commit
  Revert "io_uring: Require zeroed sqe->len on provided-buffers send"
  io_uring/kbuf: move io_ring_head_to_buf() to kbuf.h
  io_uring/kbuf: add io_kbuf_commit() helper
  io_uring/kbuf: shrink nr_iovs/mode in struct buf_sel_arg
  io_uring: wire up min batch wake timeout
  ...

3a4d319a

Merge tag 'erofs-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 69a3a0a4

Linus Torvalds authored Sep 16, 2024

Pull erofs updates from Gao Xiang:
 "In this cycle, we add file-backed mount support, which has has been a
  strong requirement for years. It is especially useful when there are
  thousands of images running on the same host for containers and other
  sandbox use cases, unlike OS image use cases.

  Without file-backed mounts, it's hard for container runtimes to manage
  and isolate so many unnecessary virtual block devices safely and
  efficiently, therefore file-backed mounts are highly preferred. For
  EROFS users, ComposeFS [1], containerd, and Android APEXes [2] will
  directly benefit from it, and I've seen no risk in implementing it as
  a completely immutable filesystem.

  The previous experimental feature "EROFS over fscache" is now marked
  as deprecated because:

   - Fscache is no longer an independent subsystem and has been merged
     into netfs, which was somewhat unexpected when it was proposed.

   - New HSM "fanotify pre-content hooks" [3] will be landed upstream.
     These hooks will replace "EROFS over fscache" in a simpler way, as
     EROFS won't be bother with kernel caching anymore. Userspace
     programs can also manage their own caching hierarchy more flexibly.

  Once the HSM "fanotify pre-content hooks" is landed, I will remove the
  fscache backend entirely as an internal dependency cleanup. More
  backgrounds are listed in the original patchset [4].

  In addition to that, there are bugfixes and cleanups as usual.

  Summary:

   - Support file-backed mounts for containers and sandboxes

   - Mark the experimental fscache backend as deprecated

   - Handle overlapped pclusters caused by crafted images properly

   - Fix a failure path which could cause infinite loops in
     z_erofs_init_decompressor()

   - Get rid of unnecessary NOFAILs

   - Harmless on-disk hardening & minor cleanups"

* tag 'erofs-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: reject inodes with negative i_size
  erofs: restrict pcluster size limitations
  erofs: allocate more short-lived pages from reserved pool first
  erofs: sunset unneeded NOFAILs
  erofs: simplify erofs_map_blocks_flatmode()
  erofs: refactor read_inode calling convention
  erofs: use kmemdup_nul in erofs_fill_symlink
  erofs: mark experimental fscache backend deprecated
  erofs: support compressed inodes for fileio
  erofs: support unencoded inodes for fileio
  erofs: add file-backed mount support
  erofs: handle overlapped pclusters out of crafted images properly
  erofs: fix error handling in z_erofs_init_decompressor
  erofs: clean up erofs_register_sysfs()
  erofs: fix incorrect symlink detection in fast symlink

69a3a0a4

Merge tag 'for-6.12-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 7a40974f

Linus Torvalds authored Sep 16, 2024

Pull btrfs updates from David Sterba:
 "This brings mostly refactoring, cleanups, minor performance
  optimizations and usual fixes. The folio API conversions are most
  noticeable.

  There's one less visible change that could have a high impact. The
  extent lock scope for read is reduced, not held for the entire
  operation. In the buffered read case it's left to page or inode lock,
  some direct io read synchronization is still needed.

  This used to prevent deadlocks induced by page faults during direct
  io, so there was a 4K limitation on the requests, e.g. for io_uring.
  In the future this will allow smoother integration with iomap where
  the extent read lock was a major obstacle.

  User visible changes:

   - the FSTRIM ioctl updates the processed range even after an error or
     interruption

   - cleaner thread is woken up in SYNC ioctl instead of waking the
     transaction thread that can take some delay before waking up the
     cleaner, this can speed up cleaning of deleted subvolumes

   - print an error message when opening a device fail, e.g. when it's
     unexpectedly read-only

  Core changes:

   - improved extent map handling in various ways (locking, iteration, ...)

   - new assertions and locking annotations

   - raid-stripe-tree locking fixes

   - use xarray for tracking dirty qgroup extents, switched from rb-tree

   - turn the subpage test to compile-time condition if possible (e.g.
     on x86_64 with 4K pages), this allows to skip a lot of ifs and
     remove dead code

   - more preparatory work for compression in subpage mode

  Cleanups and refactoring

   - folio API conversions, many simple cases where page is passed so
     switch it to folios

   - more subpage code refactoring, update page state bitmap processing

   - introduce auto free for btrfs_path structure, use for the simple
     cases"

* tag 'for-6.12-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (110 commits)
  btrfs: only unlock the to-be-submitted ranges inside a folio
  btrfs: merge btrfs_folio_unlock_writer() into btrfs_folio_end_writer_lock()
  btrfs: BTRFS_PATH_AUTO_FREE in orphan.c
  btrfs: use btrfs_path auto free in zoned.c
  btrfs: DEFINE_FREE for struct btrfs_path
  btrfs: remove btrfs_folio_end_all_writers()
  btrfs: constify more pointer parameters
  btrfs: rework BTRFS_I as macro to preserve parameter const
  btrfs: add and use helper to verify the calling task has locked the inode
  btrfs: always update fstrim_range on failure in FITRIM ioctl
  btrfs: convert copy_inline_to_page() to use folio
  btrfs: convert btrfs_decompress() to take a folio
  btrfs: convert zstd_decompress() to take a folio
  btrfs: convert lzo_decompress() to take a folio
  btrfs: convert zlib_decompress() to take a folio
  btrfs: convert try_release_extent_mapping() to take a folio
  btrfs: convert try_release_extent_state() to take a folio
  btrfs: convert submit_eb_page() to take a folio
  btrfs: convert submit_eb_subpage() to take a folio
  btrfs: convert read_key_bytes() to take a folio
  ...

7a40974f

Merge tag 'affs-for-6.12-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · effdcd52

Linus Torvalds authored Sep 16, 2024

Pull affs updates from David Sterba:
 "Cleanups removing unused code and updating the definition of a
  flexible struct array"

* tag 'affs-for-6.12-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  affs: Replace one-element array with flexible-array member
  affs: Remove unused macros GET_END_PTR, AFFS_GET_HASHENTRY

effdcd52

Merge tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 35219bc5

Linus Torvalds authored Sep 16, 2024

Pull netfs updates from Christian Brauner:
 "This contains the work to improve read/write performance for the new
  netfs library.

  The main performance enhancing changes are:

   - Define a structure, struct folio_queue, and a new iterator type,
     ITER_FOLIOQ, to hold a buffer as a replacement for ITER_XARRAY. See
     that patch for questions about naming and form.

     ITER_FOLIOQ is provided as a replacement for ITER_XARRAY. The
     problem with an xarray is that accessing it requires the use of a
     lock (typically the RCU read lock) - and this means that we can't
     supply iterate_and_advance() with a step function that might sleep
     (crypto for example) without having to drop the lock between pages.
     ITER_FOLIOQ is the iterator for a chain of folio_queue structs,
     where each folio_queue holds a small list of folios. A folio_queue
     struct is a simpler structure than xarray and is not subject to
     concurrent manipulation by the VM. folio_queue is used rather than
     a bvec[] as it can form lists of indefinite size, adding to one end
     and removing from the other on the fly.

   - Provide a copy_folio_from_iter() wrapper.

   - Make cifs RDMA support ITER_FOLIOQ.

   - Use folio queues in the write-side helpers instead of xarrays.

   - Add a function to reset the iterator in a subrequest.

   - Simplify the write-side helpers to use sheaves to skip gaps rather
     than trying to work out where gaps are.

   - In afs, make the read subrequests asynchronous, putting them into
     work items to allow the next patch to do progressive
     unlocking/reading.

   - Overhaul the read-side helpers to improve performance.

   - Fix the caching of a partial block at the end of a file.

   - Allow a store to be cancelled.

  Then some changes for cifs to make it use folio queues instead of
  xarrays for crypto bufferage:

   - Use raw iteration functions rather than manually coding iteration
     when hashing data.

   - Switch to using folio_queue for crypto buffers.

   - Remove the xarray bits.

  Make some adjustments to the /proc/fs/netfs/stats file such that:

   - All the netfs stats lines begin 'Netfs:' but change this to
     something a bit more useful.

   - Add a couple of stats counters to track the numbers of skips and
     waits on the per-inode writeback serialisation lock to make it
     easier to check for this as a source of performance loss.

  Miscellaneous work:

   - Ensure that the sb_writers lock is taken around
     vfs_{set,remove}xattr() in the cachefiles code.

   - Reduce the number of conditional branches in netfs_perform_write().

   - Move the CIFS_INO_MODIFIED_ATTR flag to the netfs_inode struct and
     remove cifs_post_modify().

   - Move the max_len/max_nr_segs members from netfs_io_subrequest to
     netfs_io_request as they're only needed for one subreq at a time.

   - Add an 'unknown' source value for tracing purposes.

   - Remove NETFS_COPY_TO_CACHE as it's no longer used.

   - Set the request work function up front at allocation time.

   - Use bh-disabling spinlocks for rreq->lock as cachefiles completion
     may be run from block-filesystem DIO completion in softirq context.

   - Remove fs/netfs/io.c"

* tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
  docs: filesystems: corrected grammar of netfs page
  cifs: Don't support ITER_XARRAY
  cifs: Switch crypto buffer to use a folio_queue rather than an xarray
  cifs: Use iterate_and_advance*() routines directly for hashing
  netfs: Cancel dirty folios that have no storage destination
  cachefiles, netfs: Fix write to partial block at EOF
  netfs: Remove fs/netfs/io.c
  netfs: Speed up buffered reading
  afs: Make read subreqs async
  netfs: Simplify the writeback code
  netfs: Provide an iterator-reset function
  netfs: Use new folio_queue data type and iterator instead of xarray iter
  cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs
  iov_iter: Provide copy_folio_from_iter()
  mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios
  netfs: Use bh-disabling spinlocks for rreq->lock
  netfs: Set the request work function upon allocation
  netfs: Remove NETFS_COPY_TO_CACHE
  netfs: Reserve netfs_sreq_source 0 as unset/unknown
  netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream
  ...

35219bc5

Merge tag 'vfs-6.12.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 9020d0d8

Linus Torvalds authored Sep 16, 2024

Pull vfs mount updates from Christian Brauner:
 "Recently, we added the ability to list mounts in other mount
  namespaces and the ability to retrieve namespace file descriptors
  without having to go through procfs by deriving them from pidfds.

  This extends nsfs in two ways:

   (1) Add the ability to retrieve information about a mount namespace
       via NS_MNT_GET_INFO.

       This will return the mount namespace id and the number of mounts
       currently in the mount namespace. The number of mounts can be
       used to size the buffer that needs to be used for listmount() and
       is in general useful without having to actually iterate through
       all the mounts.

      The structure is extensible.

   (2) Add the ability to iterate through all mount namespaces over
       which the caller holds privilege returning the file descriptor
       for the next or previous mount namespace.

       To retrieve a mount namespace the caller must be privileged wrt
       to it's owning user namespace. This means that PID 1 on the host
       can list all mounts in all mount namespaces or that a container
       can list all mounts of its nested containers.

       Optionally pass a structure for NS_MNT_GET_INFO with
       NS_MNT_GET_{PREV,NEXT} to retrieve information about the mount
       namespace in one go.

  (1) and (2) can be implemented for other namespace types easily.

  Together with recent api additions this means one can iterate through
  all mounts in all mount namespaces without ever touching procfs.

  The commit message in 49224a34 ('Merge patch series "nsfs: iterate
  through mount namespaces"') contains example code how to do this"

* tag 'vfs-6.12.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  nsfs: iterate through mount namespaces
  file: add fput() cleanup helper
  fs: add put_mnt_ns() cleanup helper
  fs: allow mount namespace fd

9020d0d8

Merge tag 'vfs-6.12.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · e8fc317d

Linus Torvalds authored Sep 16, 2024

Pull procfs updates from Christian Brauner:
 "This contains the following changes for procfs:

   - Add config options and parameters to block forcing memory writes.

     This adds a Kconfig option and boot param to allow removing the
     FOLL_FORCE flag from /proc/<pid>/mem write calls as this can be
     used in various attacks.

     The traditional forcing behavior is kept as default because it can
     break GDB and some other use cases.

     This is the simpler version that you had requested.

   - Restrict overmounting of ephemeral entities.

     It is currently possible to mount on top of various ephemeral
     entities in procfs. This specifically includes magic links. To
     recap, magic links are links of the form /proc/<pid>/fd/<nr>. They
     serve as references to a target file and during path lookup they
     cause a jump to the target path. Such magic links disappear if the
     corresponding file descriptor is closed.

     Currently it is possible to overmount such magic links. This is
     mostly interesting for an attacker that wants to somehow trick a
     process into e.g., reopening something that it didn't intend to
     reopen or to hide a malicious file descriptor.

     But also it risks leaking mounts for long-running processes. When
     overmounting a magic link like above, the mount will not be
     detached when the file descriptor is closed. Only the target
     mountpoint will disappear. Which has the consequence of making it
     impossible to unmount that mount afterwards. So the mount will
     stick around until the process exits and the /proc/<pid>/ directory
     is cleaned up during proc_flush_pid() when the dentries are pruned
     and invalidated.

     That in turn means it's possible for a program to accidentally leak
     mounts and it's also possible to make a task leak mounts without
     it's knowledge if the attacker just keeps overmounting things under
     /proc/<pid>/fd/<nr>.

     Disallow overmounting of such ephemeral entities.

   - Cleanup the readdir method naming in some procfs file operations.

   - Replace kmalloc() and strcpy() with a simple kmemdup() call"

* tag 'vfs-6.12.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  proc: fold kmalloc() + strcpy() into kmemdup()
  proc: block mounting on top of /proc/<pid>/fdinfo/*
  proc: block mounting on top of /proc/<pid>/fd/*
  proc: block mounting on top of /proc/<pid>/map_files/*
  proc: add proc_splice_unmountable()
  proc: proc_readfdinfo() -> proc_fdinfo_iterate()
  proc: proc_readfd() -> proc_fd_iterate()
  proc: add config & param to block forcing mem writes

e8fc317d

Merge tag 'vfs-6.12.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · ee25861f

Linus Torvalds authored Sep 16, 2024

Pull vfs fallocate updates from Christian Brauner:
 "This contains work to try and cleanup some the fallocate mode
  handling. Currently, it confusingly mixes operation modes and an
  optional flag.

  The work here tries to better define operation modes and optional
  flags allowing the core and filesystem code to use switch statements
  to switch on the operation mode"

* tag 'vfs-6.12.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  xfs: refactor xfs_file_fallocate
  xfs: move the xfs_is_always_cow_inode check into xfs_alloc_file_space
  xfs: call xfs_flush_unmap_range from xfs_free_file_space
  fs: sort out the fallocate mode vs flag mess
  ext4: remove tracing for FALLOC_FL_NO_HIDE_STALE
  block: remove checks for FALLOC_FL_NO_HIDE_STALE

ee25861f

Merge tag 'vfs-6.12.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 3352633c

Linus Torvalds authored Sep 16, 2024

Pull vfs file updates from Christian Brauner:
 "This is the work to cleanup and shrink struct file significantly.

  Right now, (focusing on x86) struct file is 232 bytes. After this
  series struct file will be 184 bytes aka 3 cacheline and a spare 8
  bytes for future extensions at the end of the struct.

  With struct file being as ubiquitous as it is this should make a
  difference for file heavy workloads and allow further optimizations in
  the future.

   - struct fown_struct was embedded into struct file letting it take up
     32 bytes in total when really it shouldn't even be embedded in
     struct file in the first place. Instead, actual users of struct
     fown_struct now allocate the struct on demand. This frees up 24
     bytes.

   - Move struct file_ra_state into the union containg the cleanup hooks
     and move f_iocb_flags out of the union. This closes a 4 byte hole
     we created earlier and brings struct file to 192 bytes. Which means
     struct file is 3 cachelines and we managed to shrink it by 40
     bytes.

   - Reorder struct file so that nothing crosses a cacheline.

     I suspect that in the future we will end up reordering some members
     to mitigate false sharing issues or just because someone does
     actually provide really good perf data.

   - Shrinking struct file to 192 bytes is only part of the work.

     Files use a slab that is SLAB_TYPESAFE_BY_RCU and when a kmem cache
     is created with SLAB_TYPESAFE_BY_RCU the free pointer must be
     located outside of the object because the cache doesn't know what
     part of the memory can safely be overwritten as it may be needed to
     prevent object recycling.

     That has the consequence that SLAB_TYPESAFE_BY_RCU may end up
     adding a new cacheline.

     So this also contains work to add a new kmem_cache_create_rcu()
     function that allows the caller to specify an offset where the
     freelist pointer is supposed to be placed. Thus avoiding the
     implicit addition of a fourth cacheline.

   - And finally this removes the f_version member in struct file.

     The f_version member isn't particularly well-defined. It is mainly
     used as a cookie to detect concurrent seeks when iterating
     directories. But it is also abused by some subsystems for
     completely unrelated things.

     It is mostly a directory and filesystem specific thing that doesn't
     really need to live in struct file and with its wonky semantics it
     really lacks a specific function.

     For pipes, f_version is (ab)used to defer poll notifications until
     a write has happened. And struct pipe_inode_info is used by
     multiple struct files in their ->private_data so there's no chance
     of pushing that down into file->private_data without introducing
     another pointer indirection.

     But pipes don't rely on f_pos_lock so this adds a union into struct
     file encompassing f_pos_lock and a pipe specific f_pipe member that
     pipes can use. This union of course can be extended to other file
     types and is similar to what we do in struct inode already"

* tag 'vfs-6.12.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (26 commits)
  fs: remove f_version
  pipe: use f_pipe
  fs: add f_pipe
  ubifs: store cookie in private data
  ufs: store cookie in private data
  udf: store cookie in private data
  proc: store cookie in private data
  ocfs2: store cookie in private data
  input: remove f_version abuse
  ext4: store cookie in private data
  ext2: store cookie in private data
  affs: store cookie in private data
  fs: add generic_llseek_cookie()
  fs: use must_set_pos()
  fs: add must_set_pos()
  fs: add vfs_setpos_cookie()
  s390: remove unused f_version
  ceph: remove unused f_version
  adi: remove unused f_version
  mm: Removed @freeptr_offset to prevent doc warning
  ...

3352633c

Merge tag 'vfs-6.12.folio' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs · 2775df6e

Linus Torvalds authored Sep 16, 2024

Pull vfs folio updates from Christian Brauner:
 "This contains work to port write_begin and write_end to rely on folios
  for various filesystems.

  This converts ocfs2, vboxfs, orangefs, jffs2, hostfs, fuse, f2fs,
  ecryptfs, ntfs3, nilfs2, reiserfs, minixfs, qnx6, sysv, ufs, and
  squashfs.

  After this series lands a bunch of the filesystems in this list do not
  mention struct page anymore"

* tag 'vfs-6.12.folio' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (61 commits)
  Squashfs: Ensure all readahead pages have been used
  Squashfs: Rewrite and update squashfs_readahead_fragment() to not use page->index
  Squashfs: Update squashfs_readpage_block() to not use page->index
  Squashfs: Update squashfs_readahead() to not use page->index
  Squashfs: Update page_actor to not use page->index
  jffs2: Use a folio in jffs2_garbage_collect_dnode()
  jffs2: Convert jffs2_do_readpage_nolock to take a folio
  buffer: Convert __block_write_begin() to take a folio
  ocfs2: Convert ocfs2_write_zero_page to use a folio
  fs: Convert aops->write_begin to take a folio
  fs: Convert aops->write_end to take a folio
  vboxsf: Use a folio in vboxsf_write_end()
  orangefs: Convert orangefs_write_begin() to use a folio
  orangefs: Convert orangefs_write_end() to use a folio
  jffs2: Convert jffs2_write_begin() to use a folio
  jffs2: Convert jffs2_write_end() to use a folio
  hostfs: Convert hostfs_write_end() to use a folio
  fuse: Convert fuse_write_begin() to use a folio
  fuse: Convert fuse_write_end() to use a folio
  f2fs: Convert f2fs_write_begin() to use a folio
  ...

2775df6e