1. 09 Oct, 2023 1 commit
  2. 20 Jul, 2023 7 commits
  3. 25 Apr, 2023 25 commits
    • Merge tag 'printk-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux · 7ec85f3e
      Linus Torvalds authored
      Pull printk updates from Petr Mladek:
      
       - Code cleanup and dead code removal
      
      * tag 'printk-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
        printk: Remove obsoleted check for non-existent "user" object
        lib/vsprintf: Use isodigit() for the octal number check
        Remove orphaned CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT
    • Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · df45da57
      Linus Torvalds authored
      Pull arm64 updates from Will Deacon:
       "ACPI:
      
         - Improve error reporting when failing to manage SDEI on AGDI device
           removal
      
        Assembly routines:
      
         - Improve register constraints so that the compiler can make use of
           the zero register instead of moving an immediate #0 into a GPR
      
         - Allow the compiler to allocate the registers used for CAS
           instructions
      
        CPU features and system registers:
      
         - Cleanups to the way in which CPU features are identified from the
           ID register fields
      
         - Extend system register definition generation to handle Enum types
           when defining shared register fields
      
         - Generate definitions for new _EL2 registers and add new fields for
           ID_AA64PFR1_EL1
      
         - Allow SVE to be disabled separately from SME on the kernel
           command-line
      
        Tracing:
      
         - Support for "direct calls" in ftrace, which enables BPF tracing for
           arm64
      
        Kdump:
      
         - Don't bother unmapping the crashkernel from the linear mapping,
           which then allows us to use huge (block) mappings and reduce TLB
           pressure when a crashkernel is loaded.
      
        Memory management:
      
         - Try again to remove data cache invalidation from the coherent DMA
           allocation path
      
         - Simplify the fixmap code by mapping at page granularity
      
         - Allow the kfence pool to be allocated early, preventing the rest of
           the linear mapping from being forced to page granularity
      
        Perf and PMU:
      
         - Move CPU PMU code out to drivers/perf/ where it can be reused by
           the 32-bit ARM architecture when running on ARMv8 CPUs
      
         - Fix race between CPU PMU probing and pKVM host de-privilege
      
         - Add support for Apple M2 CPU PMU
      
         - Adjust the generic PERF_COUNT_HW_BRANCH_INSTRUCTIONS event
           dynamically, depending on what the CPU actually supports
      
         - Minor fixes and cleanups to system PMU drivers
      
        Stack tracing:
      
         - Use the XPACLRI instruction to strip PAC from pointers, rather than
           rolling our own function in C
      
         - Remove redundant PAC removal for toolchains that handle this in
           their builtins
      
         - Make backtracing more resilient in the face of instrumentation
      
        Miscellaneous:
      
         - Fix single-step with KGDB
      
         - Remove harmless warning when 'nokaslr' is passed on the kernel
           command-line
      
         - Minor fixes and cleanups across the board"
      
      * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (72 commits)
        KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege
        arm64: kexec: include reboot.h
        arm64: delete dead code in this_cpu_set_vectors()
        arm64/cpufeature: Use helper macro to specify ID register for capabilites
        drivers/perf: hisi: add NULL check for name
        drivers/perf: hisi: Remove redundant initialized of pmu->name
        arm64/cpufeature: Consistently use symbolic constants for min_field_value
        arm64/cpufeature: Pull out helper for CPUID register definitions
        arm64/sysreg: Convert HFGITR_EL2 to automatic generation
        ACPI: AGDI: Improve error reporting for problems during .remove()
        arm64: kernel: Fix kernel warning when nokaslr is passed to commandline
        perf/arm-cmn: Fix port detection for CMN-700
        arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step
        arm64: move PAC masks to <asm/pointer_auth.h>
        arm64: use XPACLRI to strip PAC
        arm64: avoid redundant PAC stripping in __builtin_return_address()
        arm64/sme: Fix some comments of ARM SME
        arm64/signal: Alloc tpidr2 sigframe after checking system_supports_tpidr2()
        arm64/signal: Use system_supports_tpidr2() to check TPIDR2
        arm64/idreg: Don't disable SME when disabling SVE
        ...
    • Merge tag 'asm-generic-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic · 53b5e72b
      Linus Torvalds authored
      Pull asm-generic updates from Arnd Bergmann:
       "These are various cleanups, fixing a number of uapi header files to no
        longer reference CONFIG_* symbols, and one patch that introduces the
        new CONFIG_HAS_IOPORT symbol for architectures that provide working
        inb()/outb() macros, as a preparation for adding driver dependencies
        on those in the following release"
      
      * tag 'asm-generic-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
        Kconfig: introduce HAS_IOPORT option and select it as necessary
        scripts: Update the CONFIG_* ignore list in headers_install.sh
        pktcdvd: Remove CONFIG_CDROM_PKTCDVD_WCACHE from uapi header
        Move bp_type_idx to include/linux/hw_breakpoint.h
        Move ep_take_care_of_epollwakeup() to fs/eventpoll.c
        Move COMPAT_ATM_ADDPARTY to net/atm/svc.c
    • Merge tag 'soc-dt-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · d53c3eaa
      Linus Torvalds authored
      Pull ARM SoC devicetree updates from Arnd Bergmann:
       "The devicetree changes overall are again dominated by the Qualcomm
        Snapdragon platform that weighs in at over 300 changesets, but there
        are many updates across other platforms as well, notably Mediatek,
        NXP, Rockchip, Renesas, TI, Samsung and ST Microelectronics. These
        all add new features for existing machines, as well as new machines
        and SoCs.
      
        The newly added SoCs are:
      
         - Allwinner T113-s, a Cortex-A7 based variant of the RISC-V based D1
           chip.
      
         - StarFive JH7110, a RISC-V SoC based on the SiFive U74 core like its
           JH7100 predecessor, but with additional CPU cores and a GPU.
      
         - Apple M2 as used in current MacBook Air/Pro and Mac Mini gets
           added, with comparable support as its M1 predecessor.
      
         - Unisoc UMS512 (Tiger T610) is a midrange smartphone SoC
      
         - Qualcomm IPQ5332 and IPQ9574 are Wi-Fi 7 networking SoCs, based on
           the Cortex-A53 and Cortex-A73 cores, respectively.
      
         - Qualcomm sa8775p is an automotive SoC derived from the Snapdragon
           family.
      
        Including the initial board support for the added SoC platforms, there
        are 52 new machines. The largest group is 19 industrial embedded
        boards based on the NXP i.MX6 (32-bit) and i.MX8 (64-bit)
        families.
      
        Others include:
      
         - Two boards based on the Allwinner f1c200s ultra-low-cost chip
      
         - Three 'Banana Pi' variants based on the Amlogic g12b (A311D, S922X)
           SoC.
      
         - The Gl.Inet mv1000 router based on Marvell Armada 3720
      
         - A Wifi/LTE Dongle based on Qualcomm msm8916
      
         - Two robotics boards based on Qualcomm QRB chips
      
         - Three Snapdragon based phones made by Xiaomi
      
         - Five development boards based on various Rockchip SoCs, including
           the rk3588s-khadas-edge2 and a few NanoPi models
      
         - The AM625 Beagleplay industrial SBC
      
        Another 14 machines get removed: both boards for the obsolete 'oxnas'
        platform, three boards for the Renesas r8a77950 SoC that were only for
        pre-production chips, and various chromebook models based on the
        Qualcomm Sc7180 'trogdor' design that were never part of products"
      
      * tag 'soc-dt-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (836 commits)
        arm64: dts: rockchip: Add support for volume keys to rk3399-pinephone-pro
        arm64: dts: rockchip: Add vdd_cpu_big regulators to rk3588-rock-5b
        arm64: dts: rockchip: Use generic name for es8316 on Pinebook Pro and Rock 5B
        arm64: dts: rockchip: Drop RTC clock-frequency on rk3588-rock-5b
        arm64: dts: apple: t8112: Add PWM controller
        arm64: dts: apple: t600x: Add PWM controller
        arm64: dts: apple: t8103: Add PWM controller
        arm64: dts: rockchip: Add pinctrl gpio-ranges for rk356x
        ARM: dts: nomadik: Replace deprecated spi-gpio properties
        ARM: dts: aspeed-g6: Add UDMA node
        ARM: dts: aspeed: greatlakes: add mctp device
        ARM: dts: aspeed: greatlakes: Add gpio names
        ARM: dts: aspeed: p10bmc: Change power supply info
        arm64: dts: mediatek: mt6795-xperia-m5: Add Bosch BMM050 Magnetometer
        arm64: dts: mediatek: mt6795-xperia-m5: Add Bosch BMA255 Accelerometer
        arm64: dts: mediatek: mt6795: Add tertiary PWM node
        arm64: dts: rockchip: add panel to Anbernic RG353 series
        dt-bindings: arm: Add Data Modul i.MX8M Plus eDM SBC
        dt-bindings: arm: fsl: Add chargebyte Tarragon
        dt-bindings: vendor-prefixes: add chargebyte
        ...
    • Merge tag 'soc-defconfig-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 672d2dae
      Linus Torvalds authored
      Pull ARM SoC defconfig updates from Arnd Bergmann:
       "Most of the changes just enable additional device drivers that were
        added or that are often used on major platforms.
      
        The virtconfig added last time now disables additional drivers to
        shrink kernels for virtual machines.
      
        The obsolete oxnas_v6_defconfig file is removed in turn"
      
      * tag 'soc-defconfig-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (33 commits)
        ARM: config: Update Vexpress defconfig
        arm64: defconfig: enable building the nvmem-reboot-mode module
        arm64: defconfig: Enable TI ADC driver
        arm64: defconfig: Enable TI TSCADC driver
        arm64: defconfig: Enable security accelerator driver for TI K3 SoCs
        arm64: defconfig: Enable crypto test module
        ARM: multi_v7_defconfig: Add OPTEE support
        ARM: configs: Update U8500 defconfig
        ARM: imx_v4_v5_defconfig: Build CONFIG_IMX_SDMA as module
        arm64: defconfig: Enable IPQ9574 SoC base configs
        ARM: imx_v6_v7_defconfig: Enable Tarragon peripheral drivers
        arm64: defconfig: Enable ARM CoreSight PMU driver
        arm64: defconfig: remove duplicate TYPEC_UCSI & QCOM_PMIC_GLINK
        ARM: configs: remove oxnas_v6_defconfig
        arm64: defconfig: Enable audio drivers for AM62-SK
        arm64: defconfig: Enable drivers for BeaglePlay
        ARM: imx_v6_v7_defconfig: Select CONFIG_DRM_I2C_NXP_TDA998X
        arm64: defconfig: Enable Virtio RNG driver as built in
        arm64: defconfig: Enable CAN PHY transceiver driver
        arm64: defconfig: add PMIC GLINK modules
        ...
    • Merge tag 'soc-drivers-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · a9070477
      Linus Torvalds authored
      Pull ARM SoC driver updates from Arnd Bergmann:
       "The most notable updates this time are for Qualcomm Snapdragon
        platforms. The Inline-Crypto-Engine gets a new DT binding and driver,
        and a number of drivers now support additional Snapdragon variants, in
        particular the rsc, scm, geni, bwm, glink and socinfo, while the llcc
        (edac) and rpm drivers get notable functionality updates.
      
        Updates on other platforms include:
      
         - Various updates to the Mediatek mutex and mmsys drivers, including
           support for the Helio X10 SoC
      
         - Support for unidirectional mailbox channels in Arm SCMI firmware
      
         - Support for per cpu asynchronous notification in OP-TEE firmware
      
         - Minor updates for memory controller drivers.
      
         - Minor updates for Renesas, TI, Amlogic, Apple, Broadcom, Tegra,
           Allwinner, Versatile Express, Canaan, Microchip, Mediatek and i.MX
           SoC drivers, mainly updating the use of MODULE_LICENSE() macros and
           obsolete DT driver interfaces"
      
      * tag 'soc-drivers-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (165 commits)
        soc: ti: smartreflex: Simplify getting the opam_sr pointer
        bus: vexpress-config: Add explicit of_platform.h include
        soc: mediatek: Kconfig: Add MTK_CMDQ dependency to MTK_MMSYS
        memory: mtk-smi: mt8365: Add SMI Support
        dt-bindings: memory-controllers: mediatek,smi-larb: add mt8365
        dt-bindings: memory-controllers: mediatek,smi-common: add mt8365
        memory: tegra: read values from correct device
        dt-bindings: crypto: Add Qualcomm Inline Crypto Engine
        soc: qcom: Make the Qualcomm UFS/SDCC ICE a dedicated driver
        dt-bindings: firmware: document Qualcomm QCM2290 SCM
        soc: qcom: rpmh-rsc: Support RSC v3 minor versions
        soc: qcom: smd-rpm: Use GFP_ATOMIC in write path
        soc/tegra: fuse: Remove nvmem root only access
        soc/tegra: cbb: tegra194: Use of_address_count() helper
        soc/tegra: cbb: Remove MODULE_LICENSE in non-modules
        ARM: tegra: Remove MODULE_LICENSE in non-modules
        soc/tegra: flowctrl: Use devm_platform_get_and_ioremap_resource()
        soc: tegra: cbb: Drop empty platform remove function
        firmware: arm_scmi: Add support for unidirectional mailbox channels
        dt-bindings: firmware: arm,scmi: Support mailboxes unidirectional channels
        ...
    • Merge tag 'soc-arm-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 18032df5
      Linus Torvalds authored
      Pull ARM SoC updates from Arnd Bergmann:
       "The Oxford Semiconductor OX810/OX820 'Oxnas' platform gets retired
        after the ARM11MPCore processor keeps causing problems in certain
        corner cases. OX820 was the only remaining SoC with this core after
        CNS3xxx got retired, and its driver support was never completely
        merged upstream. The Arm 'Realview' reference platform still supports
        ARM11MPCore in principle, but this was never a product, and the CPU
        support will get cleaned up later on.
      
        Another series updates the mv78xx0 platform, which has been similarly
        neglected for a while, but should work properly again now.
      
        The other changes are minor cleanups across platforms, mostly
        converting code to more modern interfaces for DT nodes and removing
        some more code as a follow-up to the large-scale platform removal in
        linux-6.3"
      
      * tag 'soc-arm-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (28 commits)
        ARM: mv78xx0: fix entries for gpios, buttons and usb ports
        ARM: mv78xx0: add code to enable XOR and CRYPTO engines on mv78xx0
        ARM: mv78xx0: set the correct driver for the i2c RTC
        ARM: mv78xx0: adjust init logic for ts-wxl to reflect single core dev
        soc: fsl: Use of_property_present() for testing DT property presence
        ARM: pxa: Use of_property_read_bool() for boolean properties
        firmware: turris-mox-rwtm: make kobj_type structure constant
        ARM: oxnas: remove OXNAS support
        ARM: sh-mobile: Use of_cpu_node_to_id() to read CPU node 'reg'
        ARM: OMAP2+: hwmod: Use kzalloc for allocating only one element
        ARM: OMAP2+: Remove the unneeded result variable
        ARM: OMAP2+: fix repeated words in comments
        ARM: OMAP2+: remove obsolete config OMAP3_SDRC_AC_TIMING
        ARM: OMAP2+: Use of_address_to_resource()
        ARM: OMAP2+: Use of_property_read_bool() for boolean properties
        ARM: omap1: remove redundant variables err
        ARM: omap1: Kconfig: Fix indentation
        ARM: bcm: Use of_address_to_resource()
        ARM: mstar: remove unused config MACH_MERCURY
        ARM: spear: remove obsolete config MACH_SPEAR600
        ...
    • Merge tag 'x86-apic-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · de10553f
      Linus Torvalds authored
      Pull x86 APIC updates from Thomas Gleixner:
      
       - Fix the incorrect handling of atomic offset updates in
         reserve_eilvt_offset()
      
         The return value of atomic_cmpxchg() is compared against the new
         value instead of the old value, which makes the loop go round twice
         on success.
      
         Convert it to atomic_try_cmpxchg() which does the right thing.
      
       - Handle IO/APIC-less systems correctly
      
         When IO/APIC is not advertised by ACPI then the computation of the
         lower bound for dynamically allocated interrupts like MSI goes wrong.
      
         This lower bound is used to exclude the IO/APIC legacy GSI space as
         that must stay reserved for the legacy interrupts.
      
         If the system (e.g. a VM) does not advertise an IO/APIC, the lower
         bound stays at 0.

         0 is an invalid interrupt number except for the legacy timer
         interrupt on x86. The return value is unchecked in the core code, so
         it ends up allocating interrupt number 0, which the caller, e.g. the
         MSI allocation code, subsequently considers invalid.
      
         A similar problem was already cured for device tree based systems
         years ago, but that missed - or did not envision - the zero IO/APIC
         case.
      
         Consolidate the zero check and return the provided "from" argument to
         the core code call site, which is guaranteed to be greater than 0.
      
       - Simplify the X2APIC cluster CPU mask logic for CPU hotplug
      
         Per-cluster CPU masks are required for X2APIC in cluster mode to
         determine the correct cluster for a target CPU when calculating the
         destination for IPIs.

         These masks are established when CPUs are brought up. The first CPU
         in a cluster must allocate a new cluster CPU mask. As this happens
         during the early startup of a CPU, where memory allocations cannot be
         done, the mask has to be allocated by the control CPU.
      
         The current implementation allocates a cluster mask just in case,
         and if the CPU being brought up is the first in its cluster, it
         takes over this allocation from a global pointer.
      
         This works nicely in the fully serialized CPU bringup scenario which
         is used today, but would fail completely for parallel bringup of
         CPUs.
      
         The cluster association of a CPU can be computed from the APIC ID
         which is enumerated by ACPI/MADT.
      
         So the cluster CPU masks can be preallocated and associated upfront
         and the upcoming CPUs just need to set their corresponding bit.
      
         Aside from preparing for parallel bringup, this is a valuable
         simplification on its own.
      
       - Remove global variables which control the early startup of secondary
         CPUs on 64-bit
      
         The only information which is needed by a starting CPU is the Linux
         CPU number. The CPU number allows it to retrieve the rest of the
         required data from already existing per CPU storage.
      
         So instead of initial_stack, early_gdt_descr and initial_gs,
         provide a new variable smpboot_control which contains the Linux CPU
         number for now. The starting CPU can retrieve and compute all
         required information for startup from there.
      
         Aside from being a cleanup, this is also preparing for parallel CPU
         bringup, where starting CPUs will look up their Linux CPU number via
         the APIC ID, when smpboot_control has the corresponding control bit
         set.
      
       - Make cc_vendor globally accessible
      
         Subsequent parallel bringup changes require access to cc_vendor
         because confidential computing platforms need special treatment in
         the early startup phase with respect to CPUID and APIC ID readouts.
      
         The change makes cc_vendor global and provides stub accessors for
         when CONFIG_ARCH_HAS_CC_PLATFORM is not set.
      
         This was merged from the x86/cc branch in anticipation of further
         parallel bringup commits which require access to cc_vendor. Due to
         the late discovery of fundamental issues with those patches, these
         commits never happened.
      
         The merge commit is unfortunately in the middle of the APIC commits
         so unraveling it would have required a rebase or revert. As the
         parallel bringup seems to be well on its way for 6.5 this would be
         just pointless churn. As the commit does not contain any functional
         change it's not a risk to keep it.
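
The reserve_eilvt_offset() bug described in the first item can be reproduced in userspace with C11 <stdatomic.h>. In this sketch, cmpxchg_old() is a hypothetical stand-in that mimics the kernel's atomic_cmpxchg() by returning the previous value, while atomic_compare_exchange_strong() plays the role of atomic_try_cmpxchg(): both return a success flag and refresh the expected value on failure. The function names are illustrative only, and the real function's check on whether an already assigned vector may be changed is omitted.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's atomic_cmpxchg(): returns the
 * value *v held before the call, whether or not the swap happened. */
static unsigned int cmpxchg_old(atomic_uint *v, unsigned int old,
				unsigned int new_val)
{
	unsigned int expected = old;

	/* On failure, expected is overwritten with the current value. */
	atomic_compare_exchange_strong(v, &expected, new_val);
	return expected;
}

/* Buggy pattern: the returned *old* value is compared against the *new*
 * value, so a successful swap is not recognised and the loop runs again. */
static int reserve_offset_buggy(atomic_uint *slot, unsigned int new_val)
{
	unsigned int rsvd;
	int rounds = 0;

	assert(slot);
	rsvd = atomic_load(slot);
	do {
		rounds++;
		rsvd = cmpxchg_old(slot, rsvd, new_val);
	} while (rsvd != new_val);
	return rounds;
}

/* try_cmpxchg style: the loop ends as soon as the swap succeeds; on
 * failure rsvd is refreshed with the current value and the loop retries. */
static int reserve_offset_fixed(atomic_uint *slot, unsigned int new_val)
{
	unsigned int rsvd;
	int rounds = 0;

	assert(slot);
	rsvd = atomic_load(slot);
	do {
		rounds++;
	} while (!atomic_compare_exchange_strong(slot, &rsvd, new_val));
	return rounds;
}
```

Starting from an empty slot (0), reserve_offset_buggy() needs two rounds to store a value and reserve_offset_fixed() needs one; both leave the new value in the slot.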
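
The fixed lower bound for dynamic interrupt allocation in the IO/APIC-less item reduces to a small pure function. In this sketch the IO/APIC state that the kernel consults is collapsed into one hypothetical parameter, gsi_end, the first interrupt number above the legacy GSI space, which stays 0 when no IO/APIC is advertised. The function name is likewise illustrative and not the kernel's arch_dynirq_lower_bound() itself.

```c
#include <assert.h>

/* Hypothetical model of the fixed lower-bound computation.
 * gsi_end: first number above the IO/APIC legacy GSI space, or 0 when
 *          no IO/APIC was advertised (e.g. in a VM).
 * from:    lower bound requested by the core code, always > 0. */
static unsigned int dynirq_lower_bound(unsigned int gsi_end,
				       unsigned int from)
{
	assert(from > 0);
	/* 0 is not a valid dynamic interrupt number: fall back to 'from'
	 * instead of handing 0 back to the allocator. */
	return gsi_end ? gsi_end : from;
}
```

With gsi_end = 24 the legacy range is skipped and 24 is returned; with gsi_end = 0, as on an IO/APIC-less system, the caller's from value (for example 1) comes back instead of the invalid 0.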
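
The preallocated X2APIC cluster-mask scheme can be sketched as follows. Assumptions: at most 64 CPUs and 64 clusters, so a uint64_t stands in for the kernel's cpumask and static arrays stand in for the masks the control CPU would allocate; the cluster number is derived as APIC ID >> 4, following the x2APIC logical-ID layout. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_CPUS     64
#define MAX_CLUSTERS 64

/* One CPU mask per x2APIC cluster: bit n set => Linux CPU n is a member. */
static _Atomic uint64_t cluster_masks[MAX_CLUSTERS];
/* Per-CPU pointer to the mask of the cluster the CPU belongs to. */
static _Atomic uint64_t *cpu_cluster_mask[MAX_CPUS];

/* In x2APIC cluster mode the APIC ID bits above bit 3 select the cluster
 * and the low four bits select the CPU within it. */
static uint32_t apicid_to_cluster(uint32_t apicid)
{
	return apicid >> 4;
}

/* Control CPU, before any secondary starts: associate every possible CPU
 * with its cluster mask using only the APIC IDs enumerated by ACPI/MADT. */
static void prepare_cluster_masks(const uint32_t *apicids, int ncpus)
{
	assert(ncpus <= MAX_CPUS);
	for (int cpu = 0; cpu < ncpus; cpu++) {
		uint32_t cluster = apicid_to_cluster(apicids[cpu]);

		assert(cluster < MAX_CLUSTERS);
		cpu_cluster_mask[cpu] = &cluster_masks[cluster];
	}
}

/* Starting CPU: no allocation, only set its own bit. The atomic OR keeps
 * this correct even if CPUs of one cluster come up in parallel. */
static void cluster_cpu_starting(int cpu)
{
	atomic_fetch_or(cpu_cluster_mask[cpu], UINT64_C(1) << cpu);
}
```

For APIC IDs {0, 1, 16, 17}, CPUs 0 and 1 share one mask and CPUs 2 and 3 share another; after all four have started, the masks hold 0x3 and 0xC. Because the association is fixed before bringup, a starting CPU never has to take over an allocation from a shared global pointer.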
      
      * tag 'x86-apic-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/ioapic: Don't return 0 from arch_dynirq_lower_bound()
        x86/apic: Fix atomic update of offset in reserve_eilvt_offset()
        x86/coco: Export cc_vendor
        x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()
        x86/smpboot: Remove initial_gs
        x86/smpboot: Remove early_gdt_descr on 64-bit
        x86/smpboot: Remove initial_stack on 64-bit
        x86/apic/x2apic: Allow CPU cluster_mask to be populated in parallel
    • Merge tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e7989789
      Linus Torvalds authored
      Pull timers and timekeeping updates from Thomas Gleixner:
      
       - Improve the VDSO build time checks to cover all dynamic relocations
      
         VDSO does not allow dynamic relocations, but the build time check is
         incomplete and fragile.
      
         It's based on architectures specifying the relocation types to search
         for and does not handle R_*_NONE relocation entries correctly.
         R_*_NONE relocations are injected by some GNU ld variants if they
         fail to determine the exact .rel[a]/dyn_size to cover trailing zeros.
         R_*_NONE relocations must be ignored by dynamic loaders, so they
         should be ignored in the build time check too.
      
         Remove the architecture specific relocation types to check for and
         strictly validate that no relocations other than R_*_NONE end up in
         the VDSO .so file.
      
       - Prefer signal delivery to the current thread for
         CLOCK_PROCESS_CPUTIME_ID based posix-timers
      
         Such timers prefer to deliver the signal to the main thread of a
         process even if the context in which the timer expires is the current
         task. This has the downside that it might wake up an idle thread.
      
         As there is no requirement or guarantee that the signal has to be
         delivered to the main thread, avoid this by preferring the current
         task if it is part of the thread group which shares sighand.
      
         This not only avoids waking idle threads, it also distributes signal
         delivery better when multiple timers fire close to each other in the
         context of different threads.
      
       - Align the tick period properly (again)
      
         For a long time the tick was starting at CLOCK_MONOTONIC zero, which
         allowed user space applications to either align with the tick or to
         place a periodic computation so that it does not interfere with the
         tick. The alignment of the tick period was more by chance than by
         intention as the tick is set up before a high resolution clocksource
         is installed, i.e. timekeeping is still tick based and the tick
         period advances from there.
      
         The early enablement of sched_clock() broke this alignment, as the
         time accumulated by sched_clock() is taken into account when
         timekeeping is initialized. So the base value now(CLOCK_MONOTONIC) is
         no longer a multiple of tick periods, which breaks applications
         which relied on that behaviour.
      
         Cure this by aligning the tick starting point to the next multiple of
         tick periods, i.e. 1000ms/CONFIG_HZ.
      
       - A set of NOHZ fixes and enhancements:
      
           * Cure the concurrent writer race for idle and IO sleeptime
             statistics
      
             The statistic values which are exposed via /proc/stat are updated
             from the CPU local idle exit and remotely by cpufreq, but that
             happens without any form of serialization. As a consequence
             sleeptimes can be accounted twice or worse.
      
             Prevent this by restricting the accumulation writeback to the CPU
             local idle exit and let the remote access compute the accumulated
             value.
      
           * Protect idle/iowait sleep time with a sequence count
      
             Reading idle/iowait sleep time, e.g. from /proc/stat, can race
             with idle exit updates. As a consequence the readout may return
             random values, which can even go backwards.
      
             Protect this by a sequence count, which fixes the idle time
             statistics issue, but cannot fix the iowait time problem because
             iowait time accounting races with remote wake ups decrementing
             the remote runqueue's nr_iowait counter. The latter is impossible
             to fix, so the only way to deal with that is to document it
             properly and to remove the assertion in the selftest which
             triggers occasionally due to that.
      
           * Restructure struct tick_sched for better cache layout
      
           * Some small cleanups and a better cache layout for struct
             tick_sched
      
       - Implement the missing timer_wait_running() callback for POSIX CPU
         timers
      
         For unknown reasons, the introduction of the timer_wait_running()
         callback failed to fix up posix CPU timers, which went unnoticed for
         almost four years.
      
         While initially only targeted to prevent livelocks between a timer
         deletion and the timer expiry function on PREEMPT_RT enabled kernels,
         it turned out that fixing this for mainline is not as trivial as just
         implementing a stub similar to the hrtimer/timer callbacks.
      
         The reason is that for CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled
         systems there is a livelock issue independent of RT.
      
         CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y moves the expiry of POSIX CPU
         timers out from hard interrupt context to task work, which is handled
         before returning to user space or to a VM. The expiry mechanism moves
         the expired timers to a stack local list head with sighand lock held.
         Once sighand is dropped the task can be preempted and a task which
         wants to delete a timer will spin-wait until the expiry task is
         scheduled back in. In the worst case this will end up in a livelock
         when the preempting task and the expiry task are pinned on the same
         CPU.
      
         The timer wheel has a timer_wait_running() mechanism for RT, which
         uses a per CPU timer-base expiry lock which is held by the expiry
         code and the task waiting for the timer function to complete blocks
         on that lock.
      
         This does not work in the same way for posix CPU timers as there is
         no timer base and expiry for process wide timers can run on any task
         belonging to that process, but the concept of waiting on an expiry
         lock can be used too in a slightly different way.
      
         Add a per task mutex to struct posix_cputimers_work, let the expiry
         task hold it across the expiry function and let the deleting task
         which waits for the expiry to complete block on the mutex.
      
         In the non-contended case this results in an extra
         mutex_lock()/unlock() pair on both sides.
      
         This avoids spin-waiting on a task which is scheduled out, prevents
         the livelock and cures the problem for RT and !RT systems.
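
The rule behind the reworked VDSO check (no dynamic relocations except R_*_NONE, regardless of architecture) can be expressed as a small C predicate over relocation type names, such as those printed by `readelf -r`. This is only a sketch of the rule, not the kbuild implementation, which runs as a build-time command; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if 'type' (e.g. "R_X86_64_NONE") names an R_*_NONE relocation. */
static bool is_none_reloc(const char *type)
{
	static const char suffix[] = "_NONE";
	size_t len = strlen(type), slen = sizeof(suffix) - 1;

	return strncmp(type, "R_", 2) == 0 && len >= slen &&
	       strcmp(type + len - slen, suffix) == 0;
}

/* Architecture-independent rule: reject any dynamic relocation other
 * than R_*_NONE entries, which dynamic loaders must ignore anyway. */
static bool vdso_relocs_ok(const char *const *types, size_t n)
{
	assert(types != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!is_none_reloc(types[i]))
			return false;
	return true;
}
```

A VDSO whose only relocations are R_X86_64_NONE padding entries passes, while a single R_X86_64_RELATIVE entry makes the check fail, without any per-architecture list of forbidden types.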
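
The tick alignment fix amounts to rounding the starting CLOCK_MONOTONIC value up to the next multiple of the tick period. A minimal sketch, assuming the time is given in nanoseconds and the period is NSEC_PER_SEC / hz; the helper name is hypothetical and the kernel's own tick setup code is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Round a CLOCK_MONOTONIC timestamp (ns) up to the next multiple of the
 * tick period; values already on a tick boundary are returned unchanged. */
static uint64_t align_to_tick(uint64_t now_ns, unsigned int hz)
{
	assert(hz > 0);
	uint64_t period = NSEC_PER_SEC / hz;
	uint64_t rem = now_ns % period;

	return rem ? now_ns + (period - rem) : now_ns;
}
```

At HZ=250 the period is 4 ms, so a start time of 5.3 ms (5,300,000 ns) is moved to 8 ms, while 8 ms itself stays where it is.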
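
The two NOHZ idle-time fixes (write back only from the CPU-local idle exit, let remote readers compute the accumulated value, and protect both with a sequence count) can be illustrated with a userspace seqcount built on C11 atomics. This is a sketch under assumptions: struct idle_acct and all function names are hypothetical, a single writer per instance is assumed (as with CPU-local idle exit), timestamps are plain nanosecond values supplied by the caller, and iowait, which the text says cannot be fixed this way, is left out. The fence placement follows the common C11 seqlock pattern rather than the kernel's seqcount_t API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU idle accounting. Every field is atomic (accessed
 * with relaxed ordering) so that concurrent readers are well defined. */
struct idle_acct {
	atomic_uint seq;                 /* odd while an update is in progress */
	atomic_bool idle_active;
	_Atomic uint64_t idle_entrytime; /* ns, valid while idle_active */
	_Atomic uint64_t idle_sleeptime; /* ns folded in at idle exit */
};

/* Writer side: only the local CPU ever calls these. */
static void acct_write_begin(struct idle_acct *a)
{
	unsigned int s = atomic_load_explicit(&a->seq, memory_order_relaxed);

	atomic_store_explicit(&a->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

static void acct_write_end(struct idle_acct *a)
{
	unsigned int s = atomic_load_explicit(&a->seq, memory_order_relaxed);

	atomic_store_explicit(&a->seq, s + 1, memory_order_release);
}

static void idle_enter(struct idle_acct *a, uint64_t now)
{
	acct_write_begin(a);
	atomic_store_explicit(&a->idle_entrytime, now, memory_order_relaxed);
	atomic_store_explicit(&a->idle_active, true, memory_order_relaxed);
	acct_write_end(a);
}

/* Idle exit is the only place where the idle period is accumulated. */
static void idle_exit(struct idle_acct *a, uint64_t now)
{
	acct_write_begin(a);
	uint64_t start = atomic_load_explicit(&a->idle_entrytime, memory_order_relaxed);
	uint64_t sum = atomic_load_explicit(&a->idle_sleeptime, memory_order_relaxed);

	assert(now >= start);
	atomic_store_explicit(&a->idle_sleeptime, sum + (now - start), memory_order_relaxed);
	atomic_store_explicit(&a->idle_active, false, memory_order_relaxed);
	acct_write_end(a);
}

/* Remote reader (e.g. for /proc/stat): never writes back, computes the
 * accumulated value and retries if it raced with an update. */
static uint64_t read_idle_time(struct idle_acct *a, uint64_t now)
{
	unsigned int s1, s2;
	uint64_t total;

	do {
		s1 = atomic_load_explicit(&a->seq, memory_order_acquire);
		total = atomic_load_explicit(&a->idle_sleeptime, memory_order_relaxed);
		if (atomic_load_explicit(&a->idle_active, memory_order_relaxed))
			total += now - atomic_load_explicit(&a->idle_entrytime,
							    memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&a->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);
	return total;
}
```

With idle entry at 100 and a remote read at 150, the reader reports 50 without touching the stored sum; after idle exit at 200 the stored sum is 100. Because readers only compute, there is a single writer per CPU and no double accounting, and a reader that overlaps an update simply retries.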
      
      * tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        posix-cpu-timers: Implement the missing timer_wait_running callback
        selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity
        selftests/proc: Remove idle time monotonicity assertions
        MAINTAINERS: Remove stale email address
        timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick()
        timers/nohz: Add a comment about broken iowait counter update race
        timers/nohz: Protect idle/iowait sleep time under seqcount
        timers/nohz: Only ever update sleeptime from idle exit
        timers/nohz: Restructure and reshuffle struct tick_sched
        tick/common: Align tick period with the HZ tick.
        selftests/timers/posix_timers: Test delivery of signals across threads
        posix-timers: Prefer delivery of signals to the current thread
        vdso: Improve cmd_vdso_check to check all dynamic relocations
    • Merge tag 'irq-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3f614ab5
      Linus Torvalds authored
      Pull interrupt updates from Thomas Gleixner:
       "Core:
      
         - Add tracepoints for tasklet callbacks, which make it possible to
           analyze individual tasklet functions instead of guessing from the
           overall duration of tasklet processing
      
         - Ensure that secondary interrupt threads have their affinity
           adjusted correctly
      
        Drivers:
      
         - A large rework of the RISC-V IPI management to prepare for a new
           RISC-V interrupt architecture
      
         - Small fixes and enhancements all over the place
      
         - Removal of support for various obsolete hardware platforms and the
           related code"
      
      * tag 'irq-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
        irqchip/st: Remove stih415/stih416 and stid127 platforms support
        irqchip/gic-v3: Add Rockchip 3588001 erratum workaround
        genirq: Update affinity of secondary threads
        softirq: Add trace points for tasklet entry/exit
        irqchip/loongson-pch-pic: Fix pch_pic_acpi_init calling
        irqchip/loongson-pch-pic: Fix registration of syscore_ops
        irqchip/loongson-eiointc: Fix registration of syscore_ops
        irqchip/loongson-eiointc: Fix incorrect use of acpi_get_vec_parent
        irqchip/loongson-eiointc: Fix returned value on parsing MADT
        irqchip/riscv-intc: Add empty irq_eoi() for chained irq handlers
        RISC-V: Use IPIs for remote icache flush when possible
        RISC-V: Use IPIs for remote TLB flush when possible
        RISC-V: Allow marking IPIs as suitable for remote FENCEs
        RISC-V: Treat IPIs as normal Linux IRQs
        irqchip/riscv-intc: Allow drivers to directly discover INTC hwnode
        RISC-V: Clear SIP bit only when using SBI IPI operations
        irqchip/irq-sifive-plic: Add syscore callbacks for hibernation
        irqchip: Use of_property_read_bool() for boolean properties
        irqchip/bcm-6345-l1: Request memory region
        irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4
        ...
    • Merge tag 'core-entry-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 15bbeec0
      Linus Torvalds authored
      Pull core entry/ptrace update from Thomas Gleixner:
       "Provide a ptrace set/get interface for syscall user dispatch. The main
        purpose is to enable checkpoint/restore (CRIU) to correctly handle
        processes which utilize syscall user dispatch"
      
      * tag 'core-entry-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        selftest, ptrace: Add selftest for syscall user dispatch config api
        ptrace: Provide set/get interface for syscall user dispatch
        syscall_user_dispatch: Untag selector address before access_ok()
        syscall_user_dispatch: Split up set_syscall_user_dispatch()
    • Merge tag 'core-debugobjects-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 29e95a4b
      Linus Torvalds authored
      Pull core debugobjects update from Thomas Gleixner:
       "A single update to debugobjects:
      
        Prevent a race vs statically initialized objects. Such objects are
        usually not initialized via an init() function. They are special cased
        and detected on first use under the assumption that they are already
        correctly initialized via the static initializer.
      
        This works correctly unless there are two concurrent debug object
        operations on such an object.
      
        The first one detects that the object is not yet tracked and tries to
        establish a tracking object after dropping the debug objects hash
        bucket lock. The concurrent operation does the same. The one which
        wins the race ends up modifying the state of the object which makes
        the other one fail resulting in a bogus debug objects warning.
      
        Prevent this by making the detection of a static object and the
        allocation of a tracking object atomic under the hash bucket lock. So
        the first one to acquire the hash bucket lock will succeed and the
        second one will observe the correct tracking state.
      
        This race existed forever but was only exposed when the timer wheel
        code added a debug_object_assert_init() call outside of the timer base
        locked region. This replaced the previous warning about
        timer::function being NULL which had to be removed when the
        timer_shutdown() mechanics were added"
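The fix described above amounts to a lookup-or-create step performed under a single acquisition of the hash bucket lock. The C sketch below models that idea in userspace with a pthread mutex. It is a simplified illustration, not the kernel's debugobjects code: the fixed-size bucket, the struct layout and the names model_activate() and tracked_count() are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of one debugobjects hash bucket. */
enum obj_state { OBJ_NONE = 0, OBJ_INIT, OBJ_ACTIVE };

struct tracked_obj {
	const void *addr;      /* address of the object being tracked */
	enum obj_state state;
	bool in_use;
};

#define BUCKET_SLOTS 8

struct hash_bucket {
	pthread_mutex_t lock;
	struct tracked_obj slots[BUCKET_SLOTS];
};

void bucket_init(struct hash_bucket *b)
{
	memset(b->slots, 0, sizeof(b->slots));
	pthread_mutex_init(&b->lock, NULL);
}

/* Caller must hold b->lock. */
static struct tracked_obj *find_or_alloc_static(struct hash_bucket *b,
						const void *addr)
{
	struct tracked_obj *free_slot = NULL;

	for (size_t i = 0; i < BUCKET_SLOTS; i++) {
		struct tracked_obj *o = &b->slots[i];

		if (o->in_use && o->addr == addr)
			return o;              /* already tracked */
		if (!o->in_use && !free_slot)
			free_slot = o;
	}
	if (free_slot) {
		/* First use of a statically initialized object: start
		 * tracking it as already initialized. */
		free_slot->addr = addr;
		free_slot->state = OBJ_INIT;
		free_slot->in_use = true;
	}
	return free_slot;              /* NULL only if the bucket is full */
}

/*
 * Detection and allocation happen under one lock acquisition, so a
 * concurrent caller either creates the entry itself or observes the
 * entry the other caller created, never a stale "untracked" view.
 */
bool model_activate(struct hash_bucket *b, const void *addr)
{
	bool ok = false;

	pthread_mutex_lock(&b->lock);
	struct tracked_obj *o = find_or_alloc_static(b, addr);
	if (o && o->state == OBJ_INIT) {
		o->state = OBJ_ACTIVE;
		ok = true;
	}
	pthread_mutex_unlock(&b->lock);
	return ok;
}

size_t tracked_count(struct hash_bucket *b)
{
	size_t n = 0;

	pthread_mutex_lock(&b->lock);
	for (size_t i = 0; i < BUCKET_SLOTS; i++)
		n += b->slots[i].in_use;
	pthread_mutex_unlock(&b->lock);
	return n;
}
```

Activating the same object twice creates exactly one tracking entry; the second activation fails because the entry is already active.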
      
      * tag 'core-debugobjects-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        debugobject: Prevent init race with static objects
    • Merge tag 'x86_sev_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · bc1bb2a4
      Linus Torvalds authored
      Pull x86 SEV updates from Borislav Petkov:
      
       - Add the necessary glue so that the kernel can run as a confidential
         SEV-SNP vTOM guest on Hyper-V. A vTOM guest basically splits the
         address space in two parts: encrypted and unencrypted. The use case
         is running unmodified guests on the Hyper-V confidential computing
         hypervisor
      
       - Double-buffer messages between the guest and the hardware PSP device
         so that no partial buffers are copied back'n'forth, which would
         otherwise make message integrity and leak attacks possible
      
       - Explicitly name the return value the sev-guest driver returns when
         the hw PSP device hasn't been called
      
       - Cleanups
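The double-buffering idea can be sketched in C as follows: the guest only ever validates and uses a private copy of a message, and only ever writes a fully built message into shared memory. The message layout, MSG_PAYLOAD_MAX, receive_msg() and send_msg() are hypothetical; this illustrates the general technique, not the sev-guest driver's code or the real SNP message format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message layout, not the real SNP guest message format. */
#define MSG_PAYLOAD_MAX 64

struct guest_msg {
	uint32_t len;                      /* payload length in bytes */
	uint8_t payload[MSG_PAYLOAD_MAX];
};

/*
 * 'shared' models memory the host can rewrite at any moment. Copy it
 * into private memory first, then validate and use only the private
 * copy, so the length cannot change between the check and the use.
 */
bool receive_msg(const volatile struct guest_msg *shared,
		 struct guest_msg *priv)
{
	const volatile uint8_t *src = (const volatile uint8_t *)shared;
	uint8_t *dst = (uint8_t *)priv;

	for (size_t i = 0; i < sizeof(*priv); i++)
		dst[i] = src[i];

	return priv->len <= MSG_PAYLOAD_MAX;  /* checked on the private copy */
}

/*
 * Outgoing direction: build the complete message privately, then copy
 * it out, so the shared page only ever receives a fully built request.
 */
void send_msg(volatile struct guest_msg *shared, const struct guest_msg *priv)
{
	volatile uint8_t *dst = (volatile uint8_t *)shared;
	const uint8_t *src = (const uint8_t *)priv;

	for (size_t i = 0; i < sizeof(*priv); i++)
		dst[i] = src[i];
}
```

The byte-wise loop over volatile memory is used instead of memcpy() because memcpy() cannot take a volatile source without casting the qualifier away.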
      
      * tag 'x86_sev_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/hyperv: Change vTOM handling to use standard coco mechanisms
        init: Call mem_encrypt_init() after Hyper-V hypercall init is done
        x86/mm: Handle decryption/re-encryption of bss_decrypted consistently
        Drivers: hv: Explicitly request decrypted in vmap_pfn() calls
        x86/hyperv: Reorder code to facilitate future work
        x86/ioremap: Add hypervisor callback for private MMIO mapping in coco VM
        x86/sev: Change snp_guest_issue_request()'s fw_err argument
        virt/coco/sev-guest: Double-buffer messages
        crypto: ccp: Get rid of __sev_platform_init_locked()'s local function pointer
        crypto: ccp - Name -1 return value as SEV_RET_NO_FW_CALL
    • Merge tag 'x86_paravirt_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c42b59bf
      Linus Torvalds authored
      Pull x86 paravirt updates from Borislav Petkov:
      
       - Convert a couple of paravirt callbacks to asm to prevent
         '-fzero-call-used-regs' builds from zeroing live registers, because
         paravirt hides the CALLs from the compiler, so the latter doesn't
         know there's a CALL in the first place
      
       - Merge two paravirt callbacks into one, as their functionality is
         identical
      
      * tag 'x86_paravirt_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/paravirt: Convert simple paravirt functions to asm
        x86/paravirt: Merge activate_mm() and dup_mmap() callbacks
    • Merge tag 'x86_misc_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4a4a28fc
      Linus Torvalds authored
      Pull misc x86 updates from Borislav Petkov:
      
       - Add an x86 hw vulnerabilities section to MAINTAINERS so that the folks
         involved in it can get CCed on patches
      
       - Add some more CPUID leaves to the kcpuid tool and extend its
         functionality to be more useful when grepping for CPUID bits
      
      * tag 'x86_misc_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        MAINTAINERS: Add x86 hardware vulnerabilities section
        tools/x86/kcpuid: Dump the CPUID function in detailed view
        tools/x86/kcpuid: Update AMD leaf Fn80000001
        tools/x86/kcpuid: Fix avx512bw and avx512lvl fields in Fn00000007
    • Merge tag 'x86_cpu_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e3420f98
      Linus Torvalds authored
      Pull x86 cpu model updates from Borislav Petkov:
      
       - Add Emerald Rapids to the list of Intel models supporting PPIN
      
       - Finally use a CPUID bit for split lock detection instead of
         enumerating every model
      
       - Make sure automatic IBRS is set on AMD, even though the AP bringup
         code does that now by replicating the MSR which contains the switch
      
      * tag 'x86_cpu_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu: Add Xeon Emerald Rapids to list of CPUs that support PPIN
        x86/split_lock: Enumerate architectural split lock disable bit
        x86/CPU/AMD: Make sure EFER[AIBRSE] is set
    • Merge tag 'x86_acpi_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1699dbeb
      Linus Torvalds authored
      Pull x86 ACPI update from Borislav Petkov:
      
       - Improve code generation in ACPI's global lock's acquisition function
      
      * tag 'x86_acpi_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/ACPI/boot: Improve __acpi_acquire_global_lock
    • Merge tag 'ras_core_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d3464152
      Linus Torvalds authored
      Pull RAS updates from Borislav Petkov:
      
       - Just cleanups and fixes this time around: make threshold_ktype const,
         apply an objtool fix and use the proper size for a bitmap
      
      * tag 'ras_core_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/MCE/AMD: Use an u64 for bank_map
        x86/mce: Always inline old MCA stubs
        x86/MCE/AMD: Make kobj_type structure constant
    • Merge tag 'edac_updates_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · e94ee641
      Linus Torvalds authored
      Pull EDAC updates from Borislav Petkov:
      
       - skx_edac: Fix overflow when decoding 32G DIMM ranks
      
       - i10nm_edac: Add Sierra Forest support
      
       - amd64_edac: Split driver code between legacy and SMCA systems. The
         final goal is adding support for more hw, like GPUs
      
       - The usual minor cleanups and fixes
      
      * tag 'edac_updates_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (25 commits)
        EDAC/i10nm: Add Intel Sierra Forest server support
        EDAC/amd64: Fix indentation in umc_determine_edac_cap()
        EDAC/altera: Remove MODULE_LICENSE in non-module
        EDAC: Sanitize MODULE_AUTHOR strings
        EDAC/amd81[13]1: Remove trailing newline from MODULE_AUTHOR
        EDAC/amd64: Add get_err_info() to pvt->ops
        EDAC/amd64: Split dump_misc_regs() into dct/umc functions
        EDAC/amd64: Split init_csrows() into dct/umc functions
        EDAC/amd64: Split determine_edac_cap() into dct/umc functions
        EDAC/amd64: Rename f17h_determine_edac_ctl_cap()
        EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions
        EDAC/amd64: Split ecc_enabled() into dct/umc functions
        EDAC/amd64: Split read_mc_regs() into dct/umc functions
        EDAC/amd64: Split determine_memory_type() into dct/umc functions
        EDAC/amd64: Split read_base_mask() into dct/umc functions
        EDAC/amd64: Split prep_chip_selects() into dct/umc functions
        EDAC/amd64: Rework hw_info_{get,put}
        EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
        EDAC/amd64: Do not discover ECC symbol size for Family 17h and later
        EDAC/amd64: Drop dbam_to_cs() for Family 17h and later
        ...
    • Merge tag 'm68k-for-v6.4-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k · f7301270
      Linus Torvalds authored
      Pull m68k updates from Geert Uytterhoeven:
      
       - defconfig updates
      
       - miscellaneous fixes and improvements
      
      * tag 'm68k-for-v6.4-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
        m68k: kexec: Include <linux/reboot.h>
        m68k: defconfig: Update defconfigs for v6.3-rc1
        m68k: Remove obsolete config NO_KERNEL_MSG
        nubus: Drop noop match function
    • Merge tag 'pull-nios2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 173ea743
      Linus Torvalds authored
      Pull trivial nios2 cleanup from Al Viro.
      
      * tag 'pull-nios2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        nios2: _TIF_ALLWORK_MASK is unused
    • Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 181b69dd
      Linus Torvalds authored
      Pull misc vfs pile from Al Viro.
      
      Random minor cleanups.
      
      * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs: Fix description of vfs_tmpfile()
        sysv: switch to put_and_unmap_page()
        fs/sysv: Don't round down address for kunmap_flush_on_unmap()
    • Merge tag 'pull-old-dio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 11b32219
      Linus Torvalds authored
      Pull legacy dio cleanup from Al Viro.
      
      * tag 'pull-old-dio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        __blockdev_direct_IO(): get rid of submit_io callback
    • Merge tag 'pull-write-one-page' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 0e497ad5
      Linus Torvalds authored
      Pull vfs write_one_page removal from Al Viro:
       "write_one_page series"
      
      * tag 'pull-write-one-page' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        mm,jfs: move write_one_page/folio_write_one to jfs
        ocfs2: don't use write_one_page in ocfs2_duplicate_clusters_by_page
        ufs: don't flush page immediately for DIRSYNC directories
    • Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · ef36b9af
      Linus Torvalds authored
      Pull vfs fget updates from Al Viro:
       "fget() to fdget() conversions"
      
      * tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fuse_dev_ioctl(): switch to fdget()
        cgroup_get_from_fd(): switch to fdget_raw()
        bpf: switch to fdget_raw()
        build_mount_idmapped(): switch to fdget()
        kill the last remaining user of proc_ns_fget()
        SVM-SEV: convert the rest of fget() uses to fdget() in there
        convert sgx_set_attribute() to fdget()/fdput()
        convert setns(2) to fdget()/fdput()
  4. 24 Apr, 2023 7 commits
    • Merge tag 'erofs-for-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 61d325dc
      Linus Torvalds authored
      Pull erofs updates from Gao Xiang:
       "In this cycle, sub-page block support for uncompressed files is
        available. It's mainly used to enable original signing ('golden')
        4k-block images on arm64 with 16/64k pages. In addition, end users
        could also use this feature to build a manifest to directly refer to
        golden tar data.
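With sub-page blocks, one page-cache page holds several filesystem blocks, so locating a block means splitting its byte position into a page index and an offset within that page. The generic arithmetic is sketched below in C, assuming power-of-two sizes given as log2 values. It is an illustration only, not erofs's implementation; locate_block() and blocks_per_page() are hypothetical names.

```c
#include <stdint.h>

/* Where an on-disk block lives in the page cache. */
struct block_loc {
	uint64_t page_index;  /* index of the page that holds the block */
	uint32_t offset;      /* byte offset of the block within that page */
};

/*
 * blkszbits and pageshift are log2 of the block and page sizes, e.g.
 * 12 for 4 KiB blocks and 16 for 64 KiB pages. Assumes
 * blkszbits <= pageshift, i.e. blocks are no larger than pages.
 */
struct block_loc locate_block(uint64_t blkaddr, unsigned int blkszbits,
			      unsigned int pageshift)
{
	uint64_t pos = blkaddr << blkszbits;   /* byte position of the block */
	struct block_loc loc = {
		.page_index = pos >> pageshift,
		.offset = (uint32_t)(pos & ((UINT64_C(1) << pageshift) - 1)),
	};
	return loc;
}

/* Number of blocks that share one page. */
unsigned int blocks_per_page(unsigned int blkszbits, unsigned int pageshift)
{
	return 1u << (pageshift - blkszbits);
}
```

For example, with 4 KiB blocks on 64 KiB pages, 16 blocks share a page, and block 17 sits 4096 bytes into page 1.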
      
        Besides, long xattr name prefix support is also introduced in this
        cycle to avoid too many xattrs with the same prefix (e.g. overlayfs
        xattrs). It's useful for erofs + overlayfs combination (like Composefs
        model): the image size is reduced by ~14% and runtime performance is
        also slightly improved.
      
        Others are random fixes and cleanups as usual.
      
        Summary:
      
         - Add sub-page block size support for uncompressed files
      
         - Support flattened block device for multi-blob images to be attached
           into virtual machines (including cloud servers) and bare metals
      
         - Support long xattr name prefixes to optimize images in which many
           files share common xattr name prefixes (e.g. overlayfs xattrs)
      
         - Various minor cleanups & fixes"
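The idea behind long xattr name prefixes can be sketched as storing each common prefix once per image and having every xattr refer to it by index, keeping only the differing suffix inline. The C below is a minimal model under that assumption; the prefix table contents, struct xattr_entry, xattr_full_name() and bytes_saved() are hypothetical and do not reflect the actual erofs on-disk format.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical image-wide table: each long prefix is stored only once. */
static const char *const long_prefixes[] = {
	"trusted.overlay.",
	"user.overlay.",
};

#define NR_LONG_PREFIXES (sizeof(long_prefixes) / sizeof(long_prefixes[0]))

/* Hypothetical per-xattr record: a table index plus the unique suffix. */
struct xattr_entry {
	unsigned int prefix_index;
	const char *suffix;
};

/* Rebuild the full name into buf; returns its length, or -1 on error. */
int xattr_full_name(const struct xattr_entry *e, char *buf, size_t buflen)
{
	if (e->prefix_index >= NR_LONG_PREFIXES)
		return -1;                     /* unknown prefix index */

	int n = snprintf(buf, buflen, "%s%s",
			 long_prefixes[e->prefix_index], e->suffix);
	if (n < 0 || (size_t)n >= buflen)
		return -1;                     /* buffer too small */
	return n;
}

/* Bytes saved for this xattr by not storing the prefix inline. */
size_t bytes_saved(const struct xattr_entry *e)
{
	if (e->prefix_index >= NR_LONG_PREFIXES)
		return 0;
	return strlen(long_prefixes[e->prefix_index]);
}
```

Each file carrying an overlayfs xattr then stores only a small index and the suffix instead of repeating the full prefix, which is where the image-size reduction comes from.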
      
      * tag 'erofs-for-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        erofs: cleanup i_format-related stuffs
        erofs: sunset erofs_dbg()
        erofs: fix potential overflow calculating xattr_isize
        erofs: get rid of z_erofs_fill_inode()
        erofs: enable long extended attribute name prefixes
        erofs: handle long xattr name prefixes properly
        erofs: add helpers to load long xattr name prefixes
        erofs: introduce on-disk format for long xattr name prefixes
        erofs: move packed inode out of the compression part
        erofs: keep meta inode into erofs_buf
        erofs: initialize packed inode after root inode is assigned
        erofs: stop parsing non-compact HEAD index if clusterofs is invalid
        erofs: don't warn ztailpacking feature anymore
        erofs: simplify erofs_xattr_generic_get()
        erofs: rename init_inode_xattrs with erofs_ prefix
        erofs: move several xattr helpers into xattr.c
        erofs: tidy up EROFS on-disk naming
        erofs: support flattened block device for multi-blob images
        erofs: set block size to the on-disk block size
        erofs: avoid hardcoded blocksize for subpage block support
    • Merge tag 'v6.4/vfs.open' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 97adb49f
      Linus Torvalds authored
      Pull vfs open fixlet from Christian Brauner:
       "EINVAL ist keinmal: This contains the changes to make O_DIRECTORY when
        specified together with O_CREAT an invalid request.
      
        The wider background is that a regression report about the behavior
        of O_DIRECTORY | O_CREAT was sent to fsdevel. That behavior had been
        changed multiple years and LTS releases earlier, during v5.7
        development.
      
        This has also been covered in
      
              https://lwn.net/Articles/926782/
      
        which provides an excellent summary of the discussion"
      
      * tag 'v6.4/vfs.open' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        open: return EINVAL for O_DIRECTORY | O_CREAT
    • Merge tag 'v6.4/vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · e2eff52c
      Linus Torvalds authored
      Pull misc vfs updates from Christian Brauner:
       "This contains a pile of various smaller fixes. Most of them aren't
        very interesting so this just highlights things worth mentioning:
      
         - Various filesystems contained the same little helper to convert
           from the mode of a dentry to the DT_* type of that dentry.
      
           They have now all been switched to rely on the generic
           fs_umode_to_dtype() helper. All custom helpers are removed (Jeff)
      
         - Fsnotify now reports ACCESS and MODIFY events for splice
           (Chung-Chiang Cheng)
      
         - After converting timerfd a long time ago to rely on
           wait_event_interruptible_*() apis, convert eventfd as well. This
           removes the complex open-coded wait code (Wen Yang)
      
         - Simplify sysctl registration for devpts, avoiding the declaration
           of two tables. Instead, just use a prefixed path with
           register_sysctl() (Luis)
      
         - The setattr_should_drop_sgid() helper is now exported so NFS can
           use it. By switching NFS to this helper an NFS setgid inheritance
           bug is fixed (me)"
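Converting a mode to a DT_* type needs no per-filesystem switch statement, because each DT_* value equals the S_IFMT file-type bits shifted right by 12 (glibc's IFTODT() macro in <dirent.h> uses the same shift). A minimal C sketch relying on that encoding follows; mode_to_dtype() is a hypothetical name, and the kernel's fs_umode_to_dtype() may be implemented differently, for example via a lookup table.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* exposes DT_* in <dirent.h> on glibc */
#endif
#include <dirent.h>
#include <sys/stat.h>

/*
 * The file-type bits of st_mode (S_IFMT), shifted right by 12, give the
 * matching DT_* value, e.g. S_IFDIR (0040000) >> 12 == DT_DIR (4).
 */
unsigned char mode_to_dtype(mode_t mode)
{
	return (unsigned char)((mode & S_IFMT) >> 12);
}
```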
      
      * tag 'v6.4/vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        fs: hfsplus: remove WARN_ON() from hfsplus_cat_{read,write}_inode()
        pnode: pass mountpoint directly
        eventfd: use wait_event_interruptible_locked_irq() helper
        splice: report related fsnotify events
        fs: consolidate duplicate dt_type helpers
        nfs: use vfs setgid helper
        Update relatime comments to include equality
        fs/buffer: Remove redundant assignment to err
        fs_context: drop the unused lsm_flags member
        fs/namespace: fnic: Switch to use %ptTd
        Documentation: update idmappings.rst
        devpts: simplify two-level sysctl registration for pty_kern_table
        eventpoll: align comment with nested epoll limitation
    • Merge tag 'v6.4/vfs.acl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 7bcff5a3
      Linus Torvalds authored
      Pull acl updates from Christian Brauner:
       "After finishing the introduction of the new POSIX ACL API last cycle,
        the generic POSIX ACL xattr handlers are still around in the
        filesystems' xattr handlers for two reasons:
      
         (1) Because a few filesystems rely on the ->list() method of the
             generic POSIX ACL xattr handlers in their ->listxattr() inode
             operation.
      
         (2) POSIX ACLs are only available if IOP_XATTR is raised. The
             IOP_XATTR flag is raised in inode_init_always() based on whether
             the sb->s_xattr pointer is non-NULL. IOW, the registered xattr
             handlers of the filesystem are used to raise IOP_XATTR. Removing
             the generic POSIX ACL xattr handlers from all filesystems would
             risk regressing filesystems that only implement POSIX ACL support
             and no other xattrs (nfs3 comes to mind).
      
        This contains the work to decouple POSIX ACLs from the IOP_XATTR flag
        as they don't depend on xattr handlers anymore. So it's now possible
        to remove the generic POSIX ACL xattr handlers from the sb->s_xattr
        list of all filesystems. This is a crucial step as the generic POSIX
        ACL xattr handlers aren't used for POSIX ACLs anymore and POSIX ACLs
        don't depend on the xattr infrastructure anymore.
      
        Addressing problem (1) will require more long-term work. It would be
        best to get rid of the ->list() method of xattr handlers completely at
        some point.
      
        For erofs, ext{2,4}, f2fs, jffs2, ocfs2, and reiserfs the nop POSIX
        ACL xattr handler is kept around so they can continue to use
        array-based xattr handler indexing.
      
        This update does simplify the ->listxattr() implementation of all
        these filesystems however"
      
      * tag 'v6.4/vfs.acl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        acl: don't depend on IOP_XATTR
        ovl: check for ->listxattr() support
        reiserfs: rework priv inode handling
        fs: rename generic posix acl handlers
        reiserfs: rework ->listxattr() implementation
        fs: simplify ->listxattr() implementation
        fs: drop unused posix acl handlers
        xattr: remove unused argument
        xattr: add listxattr helper
        xattr: simplify listxattr helpers
    • Merge tag 'v6.4/pidfd.file' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · ec40758b
      Linus Torvalds authored
      Pull pidfd updates from Christian Brauner:
       "This adds a new pidfd_prepare() helper which allows the caller to
        reserve a pidfd number and allocates a new pidfd file that stashes the
        provided struct pid.
      
        Installing a file descriptor into a task's file descriptor table only
        to close it again via close_fd() when an error occurs should be
        avoided: the fd has already been visible to userspace and might be in
        use. Instead, a file descriptor should be reserved but not yet
        installed into the caller's file descriptor table.
      
        If another failure path is hit then the reserved file descriptor and
        file can just be put without any userspace visible side-effects. And
        if all failure paths are cleared the file descriptor and file can be
        installed into the task's file descriptor table.
      
        This helper is now used in all places that open coded this
        functionality before. For example, this is currently done during
        copy_process() and fanotify used pidfd_create(), which returns a pidfd
        that has already been made visible in the caller's file descriptor
        table, but then closed it using close_fd().
      
        In one of the next merge windows there is also new functionality
        coming to unix domain sockets that will have to rely on
        pidfd_prepare()"
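The reserve-then-install pattern can be modelled in a few lines of userspace C. The fd table below is a hypothetical toy, not kernel code; in the kernel the three steps roughly correspond to get_unused_fd_flags(), fd_install() and put_unused_fd(). The key property is that a reserved slot stays invisible to lookups until it is installed, so an error path can release it without any userspace-visible effect.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a task's fd table; not kernel code. */
#define MAX_FDS 8

enum slot_state { SLOT_FREE = 0, SLOT_RESERVED, SLOT_INSTALLED };

struct fd_table {
	enum slot_state slots[MAX_FDS];
	const void *files[MAX_FDS];
};

/* Reserve the lowest free number; lookups cannot see it yet. */
int reserve_fd(struct fd_table *t)
{
	for (int fd = 0; fd < MAX_FDS; fd++) {
		if (t->slots[fd] == SLOT_FREE) {
			t->slots[fd] = SLOT_RESERVED;
			return fd;
		}
	}
	return -1;
}

/* Undo a reservation on an error path. */
void release_fd(struct fd_table *t, int fd)
{
	t->slots[fd] = SLOT_FREE;
}

/* Publish the file: only now does the fd become usable. */
void install_fd(struct fd_table *t, int fd, const void *file)
{
	t->files[fd] = file;
	t->slots[fd] = SLOT_INSTALLED;
}

/* What a userspace lookup of this fd would observe. */
const void *lookup_fd(const struct fd_table *t, int fd)
{
	if (fd < 0 || fd >= MAX_FDS || t->slots[fd] != SLOT_INSTALLED)
		return NULL;
	return t->files[fd];
}

/* Reserve, run a fallible setup step, then install or roll back. */
int open_pidfd_model(struct fd_table *t, const void *file, bool setup_fails)
{
	int fd = reserve_fd(t);

	if (fd < 0)
		return -1;
	if (setup_fails) {
		release_fd(t, fd);     /* no userspace-visible side effect */
		return -1;
	}
	install_fd(t, fd, file);
	return fd;
}
```

A failed attempt leaves the table exactly as it was, and the next successful attempt reuses the same number.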
      
      * tag 'v6.4/pidfd.file' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        fanotify: use pidfd_prepare()
        fork: use pidfd_prepare()
        pid: add pidfd_prepare()
    • Merge tag 'v6.4/kernel.user_worker' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · 3323ddce
      Linus Torvalds authored
      Pull user work thread updates from Christian Brauner:
       "This contains the work generalizing the ability to create a kernel
        worker from a userspace process.
      
        Such user workers will run with the same credentials as the userspace
        process they were created from providing stronger security and
        accounting guarantees than the traditional override_creds() approach
        ever could've hoped for.
      
        The original work was heavily based on and optimized for the needs of
        io_uring, which was the first user. However, as it quickly turned out,
        the ability to create user workers inheriting properties from a
        userspace process is generally useful.
      
        The vhost subsystem currently creates workers using the kthread api.
        The consequences of using the kthread api are that RLIMITs don't work
        correctly as they are inherited from kthreadd. This leads to bugs
        where more workers are created than would be allowed by the RLIMITs of
        the userspace process on whose behalf the workers are created.
      
        Problems like this disappear with user workers created from the
        userspace processes for which they perform the work. In addition,
        providing this api allows vhost to remove additional complexity. For
        example, cgroup and mm sharing will just work out of the box with user
        workers based on the relevant userspace process instead of manually
        ensuring the correct cgroup and mm contexts are used.
      
        So the vhost subsystem should simply be made to use the same mechanism
        as io_uring. To this end the original mechanism used for
        create_io_thread() is generalized into user workers:
      
         - Introduce PF_USER_WORKER as a generic indicator that a given task
           is a user worker, i.e., a kernel task that was created from a
           userspace process. Now a PF_IO_WORKER thread is just a specialized
           version of PF_USER_WORKER. So io_uring io workers raise both flags.
      
         - Make copy_process() available to core kernel code
      
         - Extend struct kernel_clone_args with the following bitfields
           allowing to indicate to copy_process():
             - to create a user worker (raise PF_USER_WORKER)
             - to not inherit any files from the userspace process
             - to ignore signals
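The relationship between these clone-argument bits and the resulting task flags can be sketched as below. This is a hypothetical model: struct clone_args_model, the MODEL_PF_* values, task_flags_for() and inherits_files() are stand-ins, not the kernel's struct kernel_clone_args or its PF_* definitions. It only encodes the rules stated above, namely that a user worker gets PF_USER_WORKER and that io_uring io workers raise both flags.

```c
#include <stdbool.h>

/* Stand-in flag values; the kernel's PF_* constants differ. */
#define MODEL_PF_USER_WORKER (1u << 0)
#define MODEL_PF_IO_WORKER   (1u << 1)

/* Hypothetical subset of clone arguments, expressed as bitfields. */
struct clone_args_model {
	unsigned int io_thread : 1;      /* io_uring io worker */
	unsigned int user_worker : 1;    /* kernel task created from a user process */
	unsigned int no_files : 1;       /* don't inherit the process's files */
	unsigned int ignore_signals : 1; /* ignore signals */
};

unsigned int task_flags_for(const struct clone_args_model *a)
{
	unsigned int flags = 0;

	if (a->user_worker)
		flags |= MODEL_PF_USER_WORKER;
	/* An io worker is a specialized user worker: it carries both flags. */
	if (a->io_thread)
		flags |= MODEL_PF_IO_WORKER | MODEL_PF_USER_WORKER;
	return flags;
}

bool inherits_files(const struct clone_args_model *a)
{
	return !a->no_files;
}
```

Keeping the options as independent bits lets each caller (io_uring, vhost) opt into exactly the behaviors it needs without a dedicated creation path per subsystem.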
      
        After all generic changes are in place the vhost subsystem implements
        a new dedicated vhost api based on user workers. Finally, vhost is
        switched to rely on the new api moving it off of kthreads.
      
        Thanks to Mike for sticking it out and making it through this rather
        arduous journey"
      
      * tag 'v6.4/kernel.user_worker' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        vhost: use vhost_tasks for worker threads
        vhost: move worker thread fields to new struct
        vhost_task: Allow vhost layer to use copy_process
        fork: allow kernel code to call copy_process
        fork: Add kernel_clone_args flag to ignore signals
        fork: add kernel_clone_args flag to not dup/clone files
        fork/vm: Move common PF_IO_WORKER behavior to new flag
        kernel: Make io_thread and kthread bit fields
        kthread: Pass in the thread's name during creation
        kernel: Allow a kernel thread's name to be set in copy_process
        csky: Remove kernel_thread declaration
    • Merge tag 'v6.4/kernel.clone3.tests' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · a632b76b
      Linus Torvalds authored
      Pull clone3 selftest fix from Christian Brauner:
       "This is a single fix to the clone3() selftests.
      
        It fell through the selftest tree cracks a few times so I'll provide
        it here. It has low urgency but we should still correctly report the
        number of tests"
      
      * tag 'v6.4/kernel.clone3.tests' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        selftests/clone3: fix number of tests in ksft_set_plan