1. 28 Jul, 2024 4 commits
    • Merge tag 'v6.11-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux · e172f1e9
      Linus Torvalds authored
      Pull turbostat updates from Len Brown:
      
       - Enable turbostat extensions to add both perf and PMT (Intel
         Platform Monitoring Technology) counters via the cmdline
      
       - Demonstrate PMT access with built-in support for Meteor Lake's
         Die C6 counter
      
      * tag 'v6.11-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
        tools/power turbostat: version 2024.07.26
        tools/power turbostat: Include umask=%x in perf counter's config
        tools/power turbostat: Document PMT in turbostat.8
        tools/power turbostat: Add MTL's PMT DC6 builtin counter
        tools/power turbostat: Add early support for PMT counters
        tools/power turbostat: Add selftests for added perf counters
        tools/power turbostat: Add selftests for SMI, APERF and MPERF counters
        tools/power turbostat: Move verbose counter messages to level 2
        tools/power turbostat: Move debug prints from stdout to stderr
        tools/power turbostat: Fix typo in turbostat.8
        tools/power turbostat: Add perf added counter example to turbostat.8
        tools/power turbostat: Fix formatting in turbostat.8
        tools/power turbostat: Extend --add option with perf counters
        tools/power turbostat: Group SMI counter with APERF and MPERF
        tools/power turbostat: Add ZERO_ARRAY for zero initializing builtin array
        tools/power turbostat: Replace enum rapl_source and cstate_source with counter_source
        tools/power turbostat: Remove anonymous union from rapl_counter_info_t
        tools/power/turbostat: Switch to new Intel CPU model defines
    • Merge tag 'cxl-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl · e62f81bb
      Linus Torvalds authored
      Pull CXL updates from Dave Jiang:
       "Core:
      
         - A CXL maturity map has been added to the documentation to detail
           the current state of CXL enabling.
      
           It reports the current status of various CXL features to inform
           current and future contributors of where things stand and which
           areas need contribution.
      
         - A notifier handler has been added in order for a newly created CXL
           memory region to trigger the abstract distance metrics calculation.
      
           This should bring CXL memory to parity with hotplugged DRAM for
           NUMA abstract distance calculation. The abstract distance
           reflects relative performance and is used for memory tiering.
      
         - An addition for XOR math has been added to address the CXL DPA to
           SPA translation.
      
           CXL address translation did not support address interleave math
           with XOR prior to this change.
      
        Fixes:
      
         - Fix to address race condition in the CXL memory hotplug notifier
      
         - Add missing MODULE_DESCRIPTION() for CXL modules
      
         - Fix incorrect vendor debug UUID define
      
        Misc:
      
         - A warning has been added to inform users of an unsupported
           configuration when mixing CXL VH and RCH/RCD hierarchies
      
         - The ENXIO error code has been replaced with EBUSY for inject poison
           limit reached via debugfs and cxl-test support
      
         - The PCI config read in cxl_dvsec_rr_decode() has been moved to
           avoid unnecessary PCI config reads
      
         - A refactor to a common struct for DRAM and general media CXL
           events"
      
      * tag 'cxl-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
        cxl/core/pci: Move reading of control register to immediately before usage
        cxl: Remove defunct code calculating host bridge target positions
        cxl/region: Verify target positions using the ordered target list
        cxl: Restore XOR'd position bits during address translation
        cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa()
        cxl/test: Replace ENXIO with EBUSY for inject poison limit reached
        cxl/memdev: Replace ENXIO with EBUSY for inject poison limit reached
        cxl/acpi: Warn on mixed CXL VH and RCH/RCD Hierarchy
        cxl/core: Fix incorrect vendor debug UUID define
        Documentation: CXL Maturity Map
        cxl/region: Simplify cxl_region_nid()
        cxl/region: Support to calculate memory tier abstract distance
        cxl/region: Fix a race condition in memory hotplug notifier
        cxl: add missing MODULE_DESCRIPTION() macros
        cxl/events: Use a common struct for DRAM and General Media events
    • Merge tag 'unicode-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode · 7b5d4818
      Linus Torvalds authored
      Pull unicode update from Gabriel Krisman Bertazi:
       "Two small fixes to silence the compiler and static analysis tools
        from Ben Dooks and Jeff Johnson"
      
      * tag 'unicode-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
        unicode: add MODULE_DESCRIPTION() macros
        unicode: make utf8 test count static
    • Merge tag '6.11-rc-smb-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6 · 5437f30d
      Linus Torvalds authored
      Pull more smb client updates from Steve French:
      
       - fix for potential null pointer use in init cifs
      
       - additional dynamic trace points to improve debugging of some common
         scenarios
      
       - two SMB1 fixes (one addressing reconnect with POSIX extensions, one a
         mount parsing error)
      
      * tag '6.11-rc-smb-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: add dynamic trace point for session setup key expired failures
        smb3: add four dynamic tracepoints for copy_file_range and reflink
        smb3: add dynamic tracepoint for reflink errors
        cifs: mount with "unix" mount option for SMB1 incorrectly handled
        cifs: fix reconnect with SMB1 UNIX Extensions
        cifs: fix potential null pointer use in destroy_workqueue in init_cifs error path
  2. 27 Jul, 2024 24 commits
    • Merge tag 'block-6.11-20240726' of git://git.kernel.dk/linux · 6342649c
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - NVMe pull request via Keith:
           - Fix cleanup of requests without payloads (Leon)
           - Use new protection information format (Francis)
           - Improved debug message for lost pci link (Bart)
           - Another apst quirk (Wang)
           - Use appropriate sysfs api for printing chars (Markus)
      
       - ublk async device deletion fix (Ming)
      
       - drbd kerneldoc fixups (Simon)
      
       - Fix deadlock between sd removal and release (Yang)
      
      * tag 'block-6.11-20240726' of git://git.kernel.dk/linux:
        nvme-pci: add missing condition check for existence of mapped data
        ublk: fix UBLK_CMD_DEL_DEV_ASYNC handling
        block: fix deadlock between sd_remove & sd_release
        drbd: Add peer_device to Kernel doc
        nvme-core: choose PIF from QPIF if QPIFS supports and PIF is QTYPE
        nvme-pci: Fix the instructions for disabling power management
        nvme: remove redundant bdev local variable
        nvme-fabrics: Use seq_putc() in __nvmf_concat_opt_tokens()
        nvme/pci: Add APST quirk for Lenovo N60z laptop
    • Merge tag 'io_uring-6.11-20240726' of git://git.kernel.dk/linux · 8c930747
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - Fix a syzbot issue for the msg ring cache added in this release. No
         ill effects from this one, but it did make KMSAN unhappy (me)
      
       - Sanitize the NAPI timeout handling, by unifying the value handling
         into all ktime_t rather than converting back and forth (Pavel)
      
       - Fail NAPI registration for IOPOLL rings, it's not supported (Pavel)
      
       - Fix a theoretical issue with ring polling and cancelations (Pavel)
      
       - Various little cleanups and fixes (Pavel)
      
      * tag 'io_uring-6.11-20240726' of git://git.kernel.dk/linux:
        io_uring/napi: pass ktime to io_napi_adjust_timeout
        io_uring/napi: use ktime in busy polling
        io_uring/msg_ring: fix uninitialized use of target_req->flags
        io_uring: align iowq and task request error handling
        io_uring: kill REQ_F_CANCEL_SEQ
        io_uring: simplify io_uring_cmd return
        io_uring: fix io_match_task must_hold
        io_uring: don't allow netpolling with SETUP_IOPOLL
        io_uring: tighten task exit cancellations
    • Merge tag 'vfs-6.11-rc1.fixes.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · bc4eee85
      Linus Torvalds authored
      Pull vfs fixes from Christian Brauner:
       "This contains two fixes for this merge window:
      
        VFS:
      
         - I noticed that it is possible for a privileged user to mount most
           filesystems with a non-initial user namespace in sb->s_user_ns.
      
           When fsopen() is called in a non-init namespace the caller's
           namespace is recorded in fs_context->user_ns. If the returned file
           descriptor is then passed to a process privileged in init_user_ns,
           that process can call fsconfig(fd_fs, FSCONFIG_CMD_CREATE*),
           creating a new superblock with sb->s_user_ns set to the namespace
           of the process which called fsopen().
      
           This is problematic as only filesystems that raise FS_USERNS_MOUNT
           are known to be able to support a non-initial s_user_ns. Others may
           suffer security issues, on-disk corruption or outright crash the
           kernel. Prevent that by restricting such delegation to filesystems
           that allow FS_USERNS_MOUNT.
      
           Note that this delegation requires a privileged process to
           actually create the superblock, so either the privileged process
           is cooperating or someone must have tricked a privileged process
           into operating on a fscontext file descriptor whose origin it
           doesn't know (a stupid idea).
      
           The bug dates back to about 5 years afaict.
      
        Misc:
      
         - Fix hostfs parsing when the mount request comes in via the legacy
           mount api.
      
           In the legacy mount API, hostfs allows the host directory mount
           to be specified without any key.
      
           Restore that behavior"
      
      * tag 'vfs-6.11-rc1.fixes.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        hostfs: fix the host directory parse when mounting.
        fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNT
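
The VFS restriction described above can be sketched in userspace C. This is only an illustration of the rule "a non-initial user namespace is accepted only if the filesystem raises FS_USERNS_MOUNT"; the struct layouts, the flag value, and the helper name `may_create_superblock()` are hypothetical stand-ins and are not taken from the kernel sources.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types named above. */
struct user_namespace { int id; };
static struct user_namespace init_user_ns = { 0 };

#define FS_USERNS_MOUNT 0x8 /* illustrative flag value */

struct file_system_type {
	const char *name;
	int fs_flags;
};

struct fs_context {
	const struct file_system_type *fs_type;
	const struct user_namespace *user_ns; /* recorded at fsopen() time */
};

/*
 * Returns 0 if a superblock may be created for this context, or -EPERM
 * if the context carries a non-initial user namespace but the filesystem
 * never opted in via FS_USERNS_MOUNT.
 */
int may_create_superblock(const struct fs_context *fc)
{
	if (fc->user_ns != &init_user_ns &&
	    !(fc->fs_type->fs_flags & FS_USERNS_MOUNT))
		return -EPERM;
	return 0;
}
```

In this model the check depends only on the namespace recorded in the context, not on who later asks for the superblock to be created, which is the delegation case the message describes.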
    • Merge tag 'rust-6.11' of https://github.com/Rust-for-Linux/linux · 910bfc26
      Linus Torvalds authored
      Pull Rust updates from Miguel Ojeda:
       "The highlight is the establishment of a minimum version for the Rust
        toolchain, including 'rustc' (and bundled tools) and 'bindgen'.
      
        The initial minimum will be the pinned version we currently have, i.e.
        we are just widening the allowed versions. That covers three stable
        Rust releases: 1.78.0, 1.79.0, 1.80.0 (getting released tomorrow),
        plus beta, plus nightly.
      
        This should already be enough for kernel developers in distributions
        that provide recent Rust compiler versions routinely, such as Arch
        Linux, Debian Unstable (outside the freeze period), Fedora Linux,
        Gentoo Linux (especially the testing channel), Nix (unstable) and
        openSUSE Slowroll and Tumbleweed.
      
        In addition, the kernel is now being built-tested by Rust's pre-merge
        CI. That is, every change that is attempting to land into the Rust
        compiler is tested against the kernel, and it is merged only if it
        passes. Similarly, the bindgen tool has agreed to build the kernel in
        their CI too.
      
        Thus, with the pre-merge CI in place, both projects hope to avoid
        unintentional changes to Rust that break the kernel. This means that,
        apart from intentional changes on their side (that we will need to
        work around conditionally on our side), the upcoming Rust compiler
        versions should generally work.
      
        In addition, the Rust project has proposed getting the kernel into
        stable Rust (at least solving the main blockers) as one of its three
        flagship goals for 2024H2 [1].
      
        I would like to thank Niko, Sid, Emilio et al. for their help
        promoting the collaboration between Rust and the kernel.
      
        Toolchain and infrastructure:
      
         - Support several Rust toolchain versions.
      
         - Support several bindgen versions.
      
         - Remove 'cargo' requirement and simplify 'rusttest', thanks to
           'alloc' having been dropped last cycle.
      
         - Provide proper error reporting for the 'rust-analyzer' target.
      
        'kernel' crate:
      
         - Add 'uaccess' module with a safe userspace pointers abstraction.
      
         - Add 'page' module with a 'struct page' abstraction.
      
         - Support more complex generics in workqueue's 'impl_has_work!'
           macro.
      
        'macros' crate:
      
         - Add 'firmware' field support to the 'module!' macro.
      
         - Improve 'module!' macro documentation.
      
        Documentation:
      
         - Provide instructions on what packages should be installed to build
           the kernel in some popular Linux distributions.
      
         - Introduce the new kernel.org LLVM+Rust toolchains.
      
         - Explain '#[no_std]'.
      
        And a few other small bits"
      
      Link: https://rust-lang.github.io/rust-project-goals/2024h2/index.html#flagship-goals [1]
      
      * tag 'rust-6.11' of https://github.com/Rust-for-Linux/linux: (26 commits)
        docs: rust: quick-start: add section on Linux distributions
        rust: warn about `bindgen` versions 0.66.0 and 0.66.1
        rust: start supporting several `bindgen` versions
        rust: work around `bindgen` 0.69.0 issue
        rust: avoid assuming a particular `bindgen` build
        rust: start supporting several compiler versions
        rust: simplify Clippy warning flags set
        rust: relax most deny-level lints to warnings
        rust: allow `dead_code` for never constructed bindings
        rust: init: simplify from `map_err` to `inspect_err`
        rust: macros: indent list item in `paste!`'s docs
        rust: add abstraction for `struct page`
        rust: uaccess: add typed accessors for userspace pointers
        uaccess: always export _copy_[from|to]_user with CONFIG_RUST
        rust: uaccess: add userspace pointers
        kbuild: rust-analyzer: improve comment documentation
        kbuild: rust-analyzer: better error handling
        docs: rust: no_std is used
        rust: alloc: add __GFP_HIGHMEM flag
        rust: alloc: fix typo in docs for GFP_NOWAIT
        ...
    • Merge tag 'apparmor-pr-2024-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor · ff305644
      Linus Torvalds authored
      Pull apparmor updates from John Johansen:
       "Cleanups
         - optimization: try to avoid refing the label in apparmor_file_open
         - remove useless static inline function is_deleted
         - use kvfree_sensitive to free data->data
         - fix typo in kernel doc
      
        Bug fixes:
         - unpack transition table if dfa is not present
         - test: add MODULE_DESCRIPTION()
         - take nosymfollow flag into account
         - fix possible NULL pointer dereference
         - fix null pointer deref when receiving skb during sock creation"
      
      * tag 'apparmor-pr-2024-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
        apparmor: unpack transition table if dfa is not present
        apparmor: try to avoid refing the label in apparmor_file_open
        apparmor: test: add MODULE_DESCRIPTION()
        apparmor: take nosymfollow flag into account
        apparmor: fix possible NULL pointer dereference
        apparmor: fix typo in kernel doc
        apparmor: remove useless static inline function is_deleted
        apparmor: use kvfree_sensitive to free data->data
        apparmor: Fix null pointer deref when receiving skb during sock creation
    • Merge tag 'landlock-6.11-rc1-houdini-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux · 86b405ad
      Linus Torvalds authored
      Pull landlock fix from Mickaël Salaün:
       "Jann Horn reported a sandbox bypass for Landlock. This includes the
        fix and new tests. This should be backported"
      
      * tag 'landlock-6.11-rc1-houdini-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
        selftests/landlock: Add cred_transfer test
        landlock: Don't lose track of restrictions on cred_transfer
    • Merge tag 'gpio-fixes-for-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 8e333791
      Linus Torvalds authored
      Pull gpio fix from Bartosz Golaszewski:
      
       - don't use sprintf() with non-constant format string
      
      * tag 'gpio-fixes-for-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio: virtuser: avoid non-constant format string
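
As a general illustration of why a non-constant format string is avoided (this is not the gpio-virtuser code itself), the sketch below passes an externally supplied name as an argument to a constant "%s" format. The helper name `format_name()` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy an externally supplied name into buf. The name is passed as an
 * argument to a constant "%s" format, so any '%' characters it contains
 * are copied literally instead of being interpreted as conversions.
 * Returns the length the full name would need, as snprintf() does.
 */
int format_name(char *buf, size_t size, const char *name)
{
	return snprintf(buf, size, "%s", name);
}
```

Had the name been used as the format itself, a string such as "gpio-%s-%n" would make the formatting function read or write through arguments that were never passed.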
    • Merge tag 'devicetree-fixes-for-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · bf80f139
      Linus Torvalds authored
      Pull more devicetree updates from Rob Herring:
       "Most of this is a treewide change to of_property_for_each_u32() which
        was small enough to do in one go before rc1 and avoids the need to
        create of_property_for_each_u32_some_new_name().
      
         - Treewide conversion of of_property_for_each_u32() to drop internal
           arguments making struct property opaque
      
         - Add binding for Amlogic A4 SoC watchdog
      
         - Fix constraints for AD7192 'single-channel' property"
      
      * tag 'devicetree-fixes-for-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        dt-bindings: iio: adc: ad7192: Fix 'single-channel' constraints
        of: remove internal arguments from of_property_for_each_u32()
        dt-bindings: watchdog: add support for Amlogic A4 SoCs
    • Merge tag 'iommu-fixes-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux · b465ed28
      Linus Torvalds authored
      Pull iommu fixes from Will Deacon:
       "We're still resolving a regression with the handling of unexpected
        page faults on SMMUv3, but we're not quite there with a fix yet.
      
         - Fix NULL dereference when freeing domain in Unisoc SPRD driver
      
         - Separate assignment statements with semicolons in AMD page-table
           code
      
         - Fix Tegra erratum workaround when the CPU is using 16KiB pages"
      
      * tag 'iommu-fixes-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
        iommu: arm-smmu: Fix Tegra workaround for PAGE_SIZE mappings
        iommu/amd: Convert comma to semicolon
        iommu: sprd: Avoid NULL deref in sprd_iommu_hw_en
    • Merge tag 'firewire-fixes-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 · 04216211
      Linus Torvalds authored
      Pull firewire fixes from Takashi Sakamoto:
       "Recent compiler releases can check the length of flexible arrays at
        runtime when the arrays carry proper annotations. In the v6.10
        kernel, a patch was merged into the firewire subsystem to use this,
        but the annotation was inadequate.
      
        There is also a related change for the flexible array in the sound
        subsystem, but it causes a regression where the data in the payload
        of isochronous packets is incorrect for some devices. These bugs are
        now fixed"
      
      * tag 'firewire-fixes-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
        ALSA: firewire-lib: fix wrong value as length of header for CIP_NO_HEADER case
        Revert "firewire: Annotate struct fw_iso_packet with __counted_by()"
    • Merge tag 'spi-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · ab11658f
      Linus Torvalds authored
      Pull spi fixes from Mark Brown:
       "The bulk of this is a series of fixes for the microchip-core driver,
        mostly originating from one of their customers. I also applied an
        additional patch adding support for controlling the word size, which
        came along with it, since it's still the merge window and it clearly
        had a bunch of fairly thorough testing.
      
        We also have a fix for the compatible used to bind spidev to the
        BH2228FV"
      
      * tag 'spi-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: spidev: add correct compatible for Rohm BH2228FV
        dt-bindings: trivial-devices: fix Rohm BH2228FV compatible string
        spi: microchip-core: add support for word sizes of 1 to 32 bits
        spi: microchip-core: ensure TX and RX FIFOs are empty at start of a transfer
        spi: microchip-core: fix init function not setting the master and motorola modes
        spi: microchip-core: only disable SPI controller when register value change requires it
        spi: microchip-core: defer asserting chip select until just before write to TX FIFO
        spi: microchip-core: fix the issues in the isr
    • Merge tag 'regulator-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator · 560e8050
      Linus Torvalds authored
      Pull regulator fixes from Mark Brown:
       "These two commits clean up the excessively loose dependencies for the
        RZG2L USB VBCTRL regulator driver, ensuring it shouldn't prompt for
        people who can't use it"
      
      * tag 'regulator-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: Further restrict RZG2L USB VBCTRL regulator dependencies
        regulator: renesas-usb-vbus-regulator: Update the default
    • Merge tag 'regmap-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · 8f3f7598
      Linus Torvalds authored
      Pull regmap fix from Mark Brown:
       "Arnd sent a workaround for a false positive warning which was showing
        up with GCC 14.1"
      
      * tag 'regmap-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: maple: work around gcc-14.1 false-positive warning
    • Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · de5f4fbe
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "A few clk driver fixes for the merge window to fix the build and boot
        on some SoCs.
      
         - Initialize struct clk_init_data in the TI da8xx-cfgchip driver so
           that stack contents aren't used for things like clk flags leading
           to unexpected behavior
      
         - Don't leak stack contents in a debug print in the new Sophgo clk
           driver
      
         - Disable the new T-Head clk driver on 32-bit targets to fix the
           build due to a division
      
         - Fix Samsung Exynos4 fin_pll wreckage from the clkdev rework done
           last cycle by using a struct clk_hw directly instead of a struct
           clk consumer"
      
      * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: samsung: fix getting Exynos4 fin_pll rate from external clocks
        clk: T-Head: Disable on 32-bit Targets
        clk: sophgo: clk-sg2042-pll: Fix uninitialized variable in debug output
        clk: davinci: da8xx-cfgchip: Initialize clk_init_data before use
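
The da8xx-cfgchip item above concerns a stack-allocated init structure whose unset members held leftover stack contents. A minimal sketch of the usual remedy follows; `struct clk_init_data_sketch` and `make_init_data()` are hypothetical, trimmed-down stand-ins and not the kernel's definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the kernel's struct clk_init_data. */
struct clk_init_data_sketch {
	const char *name;
	unsigned long flags;
	unsigned char num_parents;
};

struct clk_init_data_sketch make_init_data(const char *name)
{
	/*
	 * Zero every member first. Without the initializer, flags and
	 * num_parents would hold whatever happened to be on the stack.
	 */
	struct clk_init_data_sketch init = { 0 };

	init.name = name;
	return init;
}
```

With the initializer in place, members the caller never assigns read as zero rather than as arbitrary values.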
    • Merge tag 'i3c/for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux · c85e1497
      Linus Torvalds authored
      Pull i3c updates from Alexandre Belloni:
       "This cycle, there are new features for the Designware controller and
        fixes for the other IPs:
      
         - dw: optional apb clock and power management support, IBI handling
           fixes
      
         - mipi-i3c-hci: IBI handling fixes
      
         - svc: a few fixes"
      
      * tag 'i3c/for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
        dt-bindings: i3c: add header for generic I3C flags
        i3c: master: svc: Fix error code in svc_i3c_master_do_daa_locked()
        i3c: master: Enhance i3c_bus_type visibility for device searching & event monitoring
        i3c: dw: Add power management support
        i3c: dw: Add some functions for reusability
        i3c: dw: Save timing registers and other values
        i3c: master: svc: Improve DAA STOP handle code logic
        i3c: dw: Add optional apb clock
        i3c: dw: Use new *_enabled clk API
        dt-bindings: i3c: dw: Add apb clock binding
        i3c: master: svc: Convert comma to semicolon
        i3c: mipi-i3c-hci: Round IBI data chunk size to HW supported value
        i3c: mipi-i3c-hci: Error out instead on BUG_ON() in IBI DMA setup
        i3c: mipi-i3c-hci: Set IBI Status and Data Ring base addresses
        i3c: mipi-i3c-hci: Switch to lower_32_bits()/upper_32_bits() helpers
        i3c: dw: Remove ibi_capable property
        i3c: dw: Fix IBI intr programming
        i3c: dw: Fix clearing queue thld
        i3c: mipi-i3c-hci: Fix number of DAT/DCT entries for HCI versions < 1.1
        i3c: master: svc: resend target address when get NACK
    • Merge tag 'thermal-6.11-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 1fcaa5db
      Linus Torvalds authored
      Pull thermal control fix from Rafael Wysocki:
       "Prevent the thermal core from flooding the kernel log with useless
        messages if thermal zone temperature can never be determined (or its
        sensor has failed permanently) and make it finally give up and disable
        defective thermal zones (Rafael Wysocki)"
      
      * tag 'thermal-6.11-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal: core: Back off when polling thermal zones on errors
        thermal: trip: Split thermal_zone_device_set_mode()
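
The "back off, then give up" behavior described above can be sketched as follows. The initial delay, the cap, the doubling policy, and all names here are assumptions made for illustration; the pull message does not state the values or the growth rule the thermal core uses.

```c
#include <stdbool.h>

/* Illustrative values only; the source does not state the actual delays. */
#define RECHECK_DELAY_MS     250UL
#define MAX_RECHECK_DELAY_MS 120000UL

/* Hypothetical model of one thermal zone's polling state. */
struct zone_sketch {
	unsigned long recheck_delay_ms;
	bool enabled;
};

void zone_init(struct zone_sketch *z)
{
	z->recheck_delay_ms = RECHECK_DELAY_MS;
	z->enabled = true;
}

/*
 * Called after a failed temperature read. Returns the delay before the
 * next attempt, or 0 once the zone has been disabled.
 */
unsigned long zone_read_failed(struct zone_sketch *z)
{
	unsigned long delay;

	if (!z->enabled)
		return 0;

	/* The delay has grown past the cap: give up on this zone. */
	if (z->recheck_delay_ms > MAX_RECHECK_DELAY_MS) {
		z->enabled = false;
		return 0;
	}

	delay = z->recheck_delay_ms;
	z->recheck_delay_ms *= 2; /* assumed policy: double on each failure */
	return delay;
}

/* A successful read resets the delay to its initial value. */
void zone_read_succeeded(struct zone_sketch *z)
{
	z->recheck_delay_ms = RECHECK_DELAY_MS;
}
```

Because each failure lengthens the wait, a permanently broken sensor produces a bounded number of retries and log messages before the zone is disabled.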
    • Merge tag 'mm-hotfixes-stable-2024-07-26-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm · 7b0acd91
      Linus Torvalds authored
      Pull misc hotfixes from Andrew Morton:
       "11 hotfixes, 7 of which are cc:stable.  7 are MM, 4 are other"
      
      * tag 'mm-hotfixes-stable-2024-07-26-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        nilfs2: handle inconsistent state in nilfs_btnode_create_block()
        selftests/mm: skip test for non-LPA2 and non-LVA systems
        mm/page_alloc: fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()
        mm: memcg: add cacheline padding after lruvec in mem_cgroup_per_node
        alloc_tag: outline and export free_reserved_page()
        decompress_bunzip2: fix rare decompression failure
        mm/huge_memory: avoid PMD-size page cache if needed
        mm: huge_memory: use !CONFIG_64BIT to relax huge page alignment on 32 bit machines
        mm: fix old/young bit handling in the faulting path
        dt-bindings: arm: update James Clark's email address
        MAINTAINERS: mailmap: update James Clark's email address
    • Merge tag 'timers-urgent-2024-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5256184b
      Linus Torvalds authored
      Pull timer migration updates from Thomas Gleixner:
       "Fixes and minor updates for the timer migration code:
      
         - Stop testing the group->parent pointer as it is not guaranteed to
           be stable over a chain of operations by design.
      
           This includes a warning which would be nice to have but it produces
           false positives due to the racy nature of the check.
      
         - Plug a race between CPUs going in and out of idle and a CPU hotplug
           operation. The latter can create and connect a new hierarchy level
           which is missed in the concurrent updates of CPUs which go into
           idle. As a result the events of such a CPU might not be processed
           and timers go stale.
      
           Cure it by splitting the hotplug operation into a prepare and
           online callback. The prepare callback is guaranteed to run on an
           online and therefore active CPU. This CPU updates the hierarchy and
           being online ensures that there is always at least one migrator
           active which handles the modified hierarchy correctly when going
           idle. The online callback which runs on the incoming CPU then just
           marks the CPU active and brings it into operation.
      
         - Improve tracing and polish the code further so it is more obvious
           what's going on"
      
      * tag 'timers-urgent-2024-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        timers/migration: Fix grammar in comment
        timers/migration: Spare write when nothing changed
        timers/migration: Rename childmask by groupmask to make naming more obvious
        timers/migration: Read childmask and parent pointer in a single place
        timers/migration: Use a single struct for hierarchy walk data
        timers/migration: Improve tracing
        timers/migration: Move hierarchy setup into cpuhotplug prepare callback
        timers/migration: Do not rely always on group->parent
    • Merge tag 'riscv-for-linus-6.11-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · c9f33436
      Linus Torvalds authored
      Pull more RISC-V updates from Palmer Dabbelt:
      
       - Support for NUMA (via SRAT and SLIT), console output (via SPCR), and
         cache info (via PPTT) on ACPI-based systems.
      
       - The trap entry/exit code no longer breaks the return address stack
         predictor on many systems, which results in an improvement to trap
         latency.
      
       - Support for HAVE_ARCH_STACKLEAK.
      
       - The sv39 linear map has been extended to support 128GiB mappings.
      
       - The frequency of the mtime CSR is now visible via hwprobe.
      
      * tag 'riscv-for-linus-6.11-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (21 commits)
        RISC-V: Provide the frequency of time CSR via hwprobe
        riscv: Extend sv39 linear mapping max size to 128G
        riscv: enable HAVE_ARCH_STACKLEAK
        riscv: signal: Remove unlikely() from WARN_ON() condition
        riscv: Improve exception and system call latency
        RISC-V: Select ACPI PPTT drivers
        riscv: cacheinfo: initialize cacheinfo's level and type from ACPI PPTT
        riscv: cacheinfo: remove the useless input parameter (node) of ci_leaf_init()
        RISC-V: ACPI: Enable SPCR table for console output on RISC-V
        riscv: boot: remove duplicated targets line
        trace: riscv: Remove deprecated kprobe on ftrace support
        riscv: cpufeature: Extract common elements from extension checking
        riscv: Introduce vendor variants of extension helpers
        riscv: Add vendor extensions to /proc/cpuinfo
        riscv: Extend cpufeature.c to detect vendor extensions
        RISC-V: run savedefconfig for defconfig
        RISC-V: hwprobe: sort EXT_KEY()s in hwprobe_isa_ext0() alphabetically
        ACPI: NUMA: replace pr_info with pr_debug in arch_acpi_numa_init
        ACPI: NUMA: change the ACPI_NUMA to a hidden option
        ACPI: NUMA: Add handler for SRAT RINTC affinity structure
        ...
      c9f33436
    • Linus Torvalds's avatar
      Merge tag 'for-linus-6.11-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · c17f1224
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
       "Two fixes for issues introduced in this merge window:
      
         - fix enhanced debugging in the Xen multicall handling
      
         - two patches fixing a boot failure when running as dom0 in PVH mode"
      
      * tag 'for-linus-6.11-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        x86/xen: fix memblock_reserve() usage on PVH
        x86/xen: move xen_reserve_extra_memory()
        xen: fix multicall debug data referencing
      c17f1224
    • Hongbo Li's avatar
      hostfs: fix the host directory parse when mounting. · ef9ca17c
      Hongbo Li authored
      hostfs does not keep the host directory when mounting. When the host
      directory is none (default), fc->source is used as the host root
      directory, which is wrong. Use `parse_monolithic` to handle the old
      mount path when parsing the root directory. For the new mount path,
      `parse_param` is used to parse the host directory.
      Reported-and-tested-by: Maciej Żenczykowski <maze@google.com>
      Fixes: cd140ce9 ("hostfs: convert hostfs to use the new mount API")
      Link: https://lore.kernel.org/all/CANP3RGceNzwdb7w=vPf5=7BCid5HVQDmz1K5kC9JG42+HVAh_g@mail.gmail.com/
      Cc: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
      Link: https://lore.kernel.org/r/20240725065130.1821964-1-lihongbo22@huawei.com
      [brauner: minor fixes]
      Signed-off-by: Christian Brauner <brauner@kernel.org>
      ef9ca17c
    • Seth Forshee (DigitalOcean)'s avatar
      fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNT · e1c5ae59
      Seth Forshee (DigitalOcean) authored
      Christian noticed that it is possible for a privileged user to mount
      most filesystems with a non-initial user namespace in sb->s_user_ns.
      When fsopen() is called in a non-init namespace the caller's namespace
      is recorded in fs_context->user_ns. If the returned file descriptor is
      then passed to a process privileged in init_user_ns, that process can
      call fsconfig(fd_fs, FSCONFIG_CMD_CREATE), creating a new superblock
      with sb->s_user_ns set to the namespace of the process which called
      fsopen().
      
      This is problematic. We cannot assume that any filesystem which does not
      set FS_USERNS_MOUNT has been written with a non-initial s_user_ns in
      mind, increasing the risk for bugs and security issues.
      
      Prevent this by returning EPERM from sget_fc() when FS_USERNS_MOUNT is
      not set for the filesystem and a non-initial user namespace will be
      used. sget() does not need to be updated as it always uses the user
      namespace of the current context, or the initial user namespace if
      SB_SUBMOUNT is set.
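The check described above can be sketched in isolation. This is a minimal illustration, not the kernel code: `struct user_ns`, `struct fs_type`, `init_ns`, `FS_USERNS_MOUNT_FLAG` and `check_sb_user_ns()` are hypothetical stand-ins, and the flag value is arbitrary.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct user_ns { int id; };
static struct user_ns init_ns = { 0 };	/* plays the role of the initial namespace */

#define FS_USERNS_MOUNT_FLAG 0x8	/* illustrative value only */

struct fs_type { unsigned int fs_flags; };

/*
 * Returns 0 if a superblock may be created for this filesystem type in
 * the given namespace, -EPERM otherwise: a filesystem that does not opt
 * in via the flag is only allowed in the initial namespace.
 */
static int check_sb_user_ns(const struct fs_type *type, const struct user_ns *ns)
{
	assert(type && ns);
	if (!(type->fs_flags & FS_USERNS_MOUNT_FLAG) && ns != &init_ns)
		return -EPERM;
	return 0;
}
```

A filesystem type with the flag set passes for any namespace; one without it passes only for the initial namespace.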
      
      Fixes: cb50b348 ("convenience helpers: vfs_get_super() and sget_fc()")
      Reported-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
      Link: https://lore.kernel.org/r/20240724-s_user_ns-fix-v1-1-895d07c94701@kernel.org
      Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
      e1c5ae59
    • Takashi Sakamoto's avatar
      ALSA: firewire-lib: fix wrong value as length of header for CIP_NO_HEADER case · c1839501
      Takashi Sakamoto authored
      In commit 1d717123 ("ALSA: firewire-lib: Avoid
      -Wflex-array-member-not-at-end warning"), the DEFINE_FLEX() macro was
      used to handle the variable-length array for the header field in struct
      fw_iso_packet. Using the macro has a side effect: the designated
      initializer assigns the array count to the given field. Therefore
      CIP_HEADER_QUADLETS (=2) is assigned to struct fw_iso_packet.header,
      while the original designated initializer assigned zero to all fields.
      
      With the CIP_NO_HEADER flag, the change causes an invalid header length
      in isochronous packets for the 1394 OHCI IT context. This bug affects all
      devices supported by the ALSA fireface driver: RME Fireface 400, 800,
      UCX, UFX, and 802.
      
      This commit fixes the bug by replacing it with the alternative version of
      the macro, which has no initializer.
      
      Cc: stable@vger.kernel.org
      Fixes: 1d717123 ("ALSA: firewire-lib: Avoid -Wflex-array-member-not-at-end warning")
      Reported-by: Edmund Raile <edmund.raile@proton.me>
      Closes: https://lore.kernel.org/r/rrufondjeynlkx2lniot26ablsltnynfaq2gnqvbiso7ds32il@qk4r6xps7jh2/
      Reviewed-by: Takashi Iwai <tiwai@suse.de>
      Link: https://lore.kernel.org/r/20240725155640.128442-1-o-takashi@sakamocchi.jp
      Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
      c1839501
    • Takashi Sakamoto's avatar
      Revert "firewire: Annotate struct fw_iso_packet with __counted_by()" · 00e3913b
      Takashi Sakamoto authored
      This reverts commit d3155742.
      
      The header_length field is in byte units, so it cannot express the number
      of elements in the header field. It seems that the argument of the
      counted_by attribute cannot be an arithmetic expression, so this commit
      simply reverts the commit in question.
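The mismatch between a byte length and an element count can be shown with a small sketch. The layout below is a simplified, hypothetical one (the real struct fw_iso_packet has more fields), and `header_quadlets()` is an illustrative helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical packet layout for illustration only. */
struct iso_packet_sketch {
	uint8_t header_length;	/* length of header[] in bytes */
	uint32_t header[];	/* flexible array of 32-bit quadlets */
};

/* Number of header[] elements implied by a length given in bytes. */
static size_t header_quadlets(uint8_t header_length_bytes)
{
	assert(header_length_bytes % sizeof(uint32_t) == 0);
	return header_length_bytes / sizeof(uint32_t);
}
```

With a header of two quadlets, the byte length is 8 while the element count is 2, so an annotation that reads the byte length as an element count would overstate the array bounds by a factor of four.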
      Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Link: https://lore.kernel.org/r/20240725161648.130404-1-o-takashi@sakamocchi.jp
      Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
      00e3913b
  3. 26 Jul, 2024 12 commits
    • Linus Torvalds's avatar
      minmax: avoid overly complicated constant expressions in VM code · 3a7e02c0
      Linus Torvalds authored
      The minmax infrastructure is overkill for simple constants, and can
      cause huge expansions because those simple constants are then used by
      other things.
      
      For example, 'pageblock_order' is a core VM constant, but because it was
      implemented using 'min_t()' and all the type-checking that involves, it
      actually expanded to something like 2.5kB of preprocessor noise.
      
      And when that simple constant was then used inside other expansions:
      
        #define pageblock_nr_pages      (1UL << pageblock_order)
        #define pageblock_start_pfn(pfn)  ALIGN_DOWN((pfn), pageblock_nr_pages)
      
      and we then use that inside a 'max()' macro:
      
      	case ISOLATE_SUCCESS:
      		update_cached = false;
      		last_migrated_pfn = max(cc->zone->zone_start_pfn,
      			pageblock_start_pfn(cc->migrate_pfn - 1));
      
      the end result was that one statement expanded to 253kB in size.
      
      There are probably other cases of this, but this one case certainly
      stood out.
      
      I've added 'MIN_T()' and 'MAX_T()' macros for this kind of "core simple
      constant with specific type" use.  These macros skip the type checking,
      and as such need to be very sparingly used only for obvious cases that
      have active issues like this.
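As a rough illustration of the idea, a cast-then-compare macro could look like the sketch below. The `SKETCH_*` and `example_*` names and the constants 9 and 10 are assumptions made up for this example; the definitions in the kernel tree may differ.

```c
#include <assert.h>

/*
 * Sketch only: cast both arguments to one type, then compare.
 * There is no type checking, and each argument is evaluated more than
 * once, so this is only suitable for simple constants.
 */
#define SKETCH_MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))
#define SKETCH_MAX_T(type, a, b) ((type)(a) > (type)(b) ? (type)(a) : (type)(b))

/* Hypothetical constants standing in for configuration values. */
#define EXAMPLE_HPAGE_ORDER 9
#define EXAMPLE_MAX_ORDER 10

#define example_block_order \
	SKETCH_MIN_T(unsigned int, EXAMPLE_HPAGE_ORDER, EXAMPLE_MAX_ORDER)
#define example_block_pages (1UL << example_block_order)

/* The whole expression stays a compile-time constant. */
static_assert(example_block_pages == 512UL, "min(9, 10) is 9, and 1 << 9 is 512");

/* Round a page frame number down to the start of its block. */
static unsigned long example_block_start(unsigned long pfn)
{
	return pfn & ~(example_block_pages - 1);
}
```

Because the macro expands to a single conditional expression, using it inside further macros keeps the preprocessed output small.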
      Reported-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
      Link: https://lore.kernel.org/all/36aa2cad-1db1-4abf-8dd2-fb20484aabc3@lucifer.local/
      Cc: David Laight <David.Laight@aculab.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3a7e02c0
    • Linus Torvalds's avatar
      minmax: avoid overly complex min()/max() macro arguments in xen · e8432ac8
      Linus Torvalds authored
      We have some very fancy min/max macros that have tons of sanity checking
      to warn about mixed signedness etc.
      
      This is all things that a sane compiler should warn about, but there are
      no sane compiler interfaces for this, and '-Wsign-compare' is broken [1]
      and not useful.
      
      So then we compensate (some would say over-compensate) by doing the
      checks manually with some truly horrid macro games.
      
      And no, we can't just use __builtin_types_compatible_p(), because the
      whole question of "does it make sense to compare these two values" is a
      lot more complicated than that.
      
      For example, it makes a ton of sense to compare unsigned values with
      simple constants like "5", even if that is indeed a signed type.  So we
      have these very strange macros to try to make sensible type checking
      decisions on the arguments to 'min()' and 'max()'.
      
      But that can cause enormous code expansion if the min()/max() macros are
      used with complicated expressions, and particularly if you nest these
      things so that you get the first big expansion then expanded again.
      
      The xen setup.c file ended up ballooning to over 50MB of preprocessed
      noise that takes 15s to compile (obviously depending on the build host),
      largely due to one single line.
      
      So let's split that one single line to just be simpler.  I think it ends
      up being more legible to humans too at the same time.  Now that single
      file compiles in under a second.
      Reported-and-reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
      Link: https://lore.kernel.org/all/c83c17bb-be75-4c67-979d-54eee38774c6@lucifer.local/
      Link: https://staticthinking.wordpress.com/2023/07/25/wsign-compare-is-garbage/ [1]
      Cc: David Laight <David.Laight@aculab.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e8432ac8
    • Ryusuke Konishi's avatar
      nilfs2: handle inconsistent state in nilfs_btnode_create_block() · 4811f7af
      Ryusuke Konishi authored
      Syzbot reported that a buffer state inconsistency was detected in
      nilfs_btnode_create_block(), triggering a kernel bug.
      
      It is not appropriate to treat this inconsistency as a bug; it can occur
      if the argument block address (the buffer index of the newly created
      block) is a virtual block number and has been reallocated due to
      corruption of the bitmap used to manage its allocation state.
      
      So, modify nilfs_btnode_create_block() and its callers to treat it as a
      possible filesystem error, rather than triggering a kernel bug.
      
      Link: https://lkml.kernel.org/r/20240725052007.4562-1-konishi.ryusuke@gmail.com
      Fixes: a60be987 ("nilfs2: B-tree node cache")
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: syzbot+89cc4f2324ed37988b60@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=89cc4f2324ed37988b60
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4811f7af
    • Dev Jain's avatar
      selftests/mm: skip test for non-LPA2 and non-LVA systems · f556acc2
      Dev Jain authored
      After my improvement of the test in e4a4ba41 ("selftests/mm:
      va_high_addr_switch: dynamically initialize testcases to enable LPA2
      testing"), the test began to fail with 4k and 16k pages on non-LPA2
      systems.  To reduce noise in the CI systems, skip the test when the
      higher address space is not implemented.
      
      Link: https://lkml.kernel.org/r/20240718052504.356517-1-dev.jain@arm.com
      Fixes: e4a4ba41 ("selftests/mm: va_high_addr_switch: dynamically initialize testcases to enable LPA2 testing")
      Signed-off-by: Dev Jain <dev.jain@arm.com>
      Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Mark Brown <broonie@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      f556acc2
    • Li Zhijian's avatar
      mm/page_alloc: fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist() · 66eca102
      Li Zhijian authored
      It's expected that no page should be left in pcp_list after calling
      zone_pcp_disable() in offline_pages().  Previously, it was observed that
      offline_pages() gets stuck [1] due to some pages remaining in pcp_list.
      
      Cause:
      There is a race condition between drain_pages_zone() and __rmqueue_pcplist()
      involving the pcp->count variable. See the scenario below:
      
               CPU0                              CPU1
          ----------------                    ---------------
                                            spin_lock(&pcp->lock);
                                            __rmqueue_pcplist() {
      zone_pcp_disable() {
                                              /* list is empty */
                                              if (list_empty(list)) {
                                                /* add pages to pcp_list */
                                                alloced = rmqueue_bulk()
        mutex_lock(&pcp_batch_high_lock)
        ...
        __drain_all_pages() {
          drain_pages_zone() {
            /* read pcp->count, it's 0 here */
            count = READ_ONCE(pcp->count)
            /* 0 means nothing to drain */
                                                /* update pcp->count */
                                                pcp->count += alloced << order;
            ...
                                            ...
                                            spin_unlock(&pcp->lock);
      
      In this case, some pages remain in pcp_list even after zone_pcp_disable()
      has been called. Because these pages are neither movable nor isolated,
      offline_pages() gets stuck as a result.
      
      Solution:
      Expand the scope of the pcp->lock to also protect pcp->count in
      drain_pages_zone(), to ensure no pages are left in the pcp list after
      zone_pcp_disable()
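The locking idea can be sketched in user space with a mutex in place of the kernel spinlock. `struct pcp_sketch`, `drain_sketch()` and `refill_sketch()` are hypothetical names for illustration and do not mirror the kernel functions.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a per-CPU page list; not the kernel structure. */
struct pcp_sketch {
	pthread_mutex_t lock;
	int count;	/* pages currently on the list */
};

/*
 * Drain everything. The count is read only while the lock is held, so a
 * refill that is still in progress cannot be missed. Returns the number
 * of pages drained.
 */
static int drain_sketch(struct pcp_sketch *pcp)
{
	int drained;

	pthread_mutex_lock(&pcp->lock);
	drained = pcp->count;
	pcp->count = 0;
	pthread_mutex_unlock(&pcp->lock);
	assert(drained >= 0);
	return drained;
}

/* Refill path: adds pages and updates the count under the same lock. */
static void refill_sketch(struct pcp_sketch *pcp, int pages)
{
	pthread_mutex_lock(&pcp->lock);
	pcp->count += pages;
	pthread_mutex_unlock(&pcp->lock);
}
```

Had the drain path read `count` before taking the lock, it could see 0 while a concurrent refill was about to add pages, which is the stale-read pattern shown in the scenario above.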
      
      [1] https://lore.kernel.org/linux-mm/6a07125f-e720-404c-b2f9-e55f3f166e85@fujitsu.com/
      
      Link: https://lkml.kernel.org/r/20240723064428.1179519-1-lizhijian@fujitsu.com
      Fixes: 4b23a68f ("mm/page_alloc: protect PCP lists with a spinlock")
      Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
      Reported-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      66eca102
    • Roman Gushchin's avatar
      mm: memcg: add cacheline padding after lruvec in mem_cgroup_per_node · f59adcf5
      Roman Gushchin authored
      Oliver Sang reported a performance regression caused by commit
      98c9daf5 ("mm: memcg: guard memcg1-specific members of struct
      mem_cgroup_per_node"), which puts some fields of the mem_cgroup_per_node
      structure under the CONFIG_MEMCG_V1 config option.  Apparently it causes
      false cache line sharing between the lruvec and lru_zone_size members of
      the structure.  Fix it by adding explicit padding after the lruvec member.
      
      Even though the padding is not required with CONFIG_MEMCG_V1 set, it seems
      like the introduced memory overhead is not significant enough to warrant
      another divergence in the mem_cgroup_per_node layout, so the padding is
      added unconditionally.
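The effect of separating two members onto different cache lines can be sketched as follows. This uses C11 `alignas` rather than the explicit padding member the commit describes, assumes a 64-byte cache line, and uses a made-up layout (`struct per_node_sketch`) that is not the real mem_cgroup_per_node.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define SKETCH_CACHELINE 64	/* assumed cache line size */

/* Hypothetical layout for illustration only. */
struct per_node_sketch {
	long lruvec[10];	/* stands in for the frequently written member */
	/* Force the next member to start on its own cache line. */
	alignas(SKETCH_CACHELINE) unsigned long lru_zone_size[4];
};

/*
 * Returns nonzero if a member ending at byte offset end_a (exclusive)
 * and a member starting at byte offset start_b touch the same cache line.
 */
static int shares_cacheline(size_t end_a, size_t start_b)
{
	assert(end_a > 0);
	return (end_a - 1) / SKETCH_CACHELINE == start_b / SKETCH_CACHELINE;
}
```

The cost is some unused bytes between the two members, which is the memory overhead the commit message considers acceptable.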
      
      Link: https://lkml.kernel.org/r/20240723171244.747521-1-roman.gushchin@linux.dev
      Fixes: 98c9daf5 ("mm: memcg: guard memcg1-specific members of struct mem_cgroup_per_node")
      Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Closes: https://lore.kernel.org/oe-lkp/202407121335.31a10cb6-oliver.sang@intel.com
      Tested-by: Oliver Sang <oliver.sang@intel.com>
      Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      f59adcf5
    • Suren Baghdasaryan's avatar
      alloc_tag: outline and export free_reserved_page() · b3bebe44
      Suren Baghdasaryan authored
      Outline and export free_reserved_page() because modules use it and it in
      turn uses page_ext_{get|put} which should not be exported.  The same
      result could be obtained by outlining {get|put}_page_tag_ref() but that
      would have higher performance impact as these functions are used in more
      performance critical paths.
      
      Link: https://lkml.kernel.org/r/20240717212844.2749975-1-surenb@google.com
      Fixes: dcfe378c ("lib: introduce support for page allocation tagging")
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202407080044.DWMC9N9I-lkp@intel.com/
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Suggested-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kent Overstreet <kent.overstreet@linux.dev>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Sourav Panda <souravpanda@google.com>
      Cc: <stable@vger.kernel.org>	[6.10]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      b3bebe44
    • Ross Lagerwall's avatar
      decompress_bunzip2: fix rare decompression failure · bf6acd5d
      Ross Lagerwall authored
      The decompression code parses a huffman tree and counts the number of
      symbols for a given bit length.  In rare cases, there may be >= 256
      symbols with a given bit length, causing the unsigned char to overflow. 
      This causes a decompression failure later when the code tries and fails to
      find the bit length for a given symbol.
      
      Since the maximum number of symbols is 258, use unsigned short instead.
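The overflow is easy to reproduce in isolation. The sketch below assumes an 8-bit `unsigned char`; the function names and `SKETCH_MAX_SYMBOLS` are illustrative and do not come from the decompressor source.

```c
#include <assert.h>

#define SKETCH_MAX_SYMBOLS 258	/* maximum number of symbols, per the text above */

/* Count symbols sharing one bit length using an 8-bit counter. */
static unsigned int count_with_uchar(unsigned int nsymbols)
{
	unsigned char count = 0;

	for (unsigned int i = 0; i < nsymbols; i++)
		count++;	/* wraps back to 0 after reaching 255 */
	return count;
}

/* Same count with a 16-bit counter, which can hold up to 258. */
static unsigned int count_with_ushort(unsigned int nsymbols)
{
	unsigned short count = 0;

	assert(nsymbols <= SKETCH_MAX_SYMBOLS);
	for (unsigned int i = 0; i < nsymbols; i++)
		count++;
	return count;
}
```

With 256 symbols of the same bit length, the 8-bit counter reports 0, so a later lookup for that bit length finds nothing.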
      
      Link: https://lkml.kernel.org/r/20240717162016.1514077-1-ross.lagerwall@citrix.com
      Fixes: bc22c17e ("bzip2/lzma: library support for gzip, bzip2 and lzma decompression")
      Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Cc: Alain Knaff <alain@knaff.lu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      bf6acd5d
    • Gavin Shan's avatar
      mm/huge_memory: avoid PMD-size page cache if needed · d659b715
      Gavin Shan authored
      xarray can't support arbitrary page cache sizes.  The largest supported
      page cache size is defined as MAX_PAGECACHE_ORDER by commit 099d9064
      ("mm/filemap: make MAX_PAGECACHE_ORDER acceptable to xarray").  However,
      it's possible to have a 512MB page cache in the huge memory collapsing
      path on an ARM64 system whose base page size is 64KB.  A 512MB page cache
      exceeds the limit, and a warning is raised when the xarray entry is
      split, as shown in the following example.
      
      [root@dhcp-10-26-1-207 ~]# cat /proc/1/smaps | grep KernelPageSize
      KernelPageSize:       64 kB
      [root@dhcp-10-26-1-207 ~]# cat /tmp/test.c
         :
      int main(int argc, char **argv)
      {
      	const char *filename = TEST_XFS_FILENAME;
      	int fd = 0;
      	void *buf = (void *)-1, *p;
      	int pgsize = getpagesize();
      	int ret = 0;
      
      	if (pgsize != 0x10000) {
      		fprintf(stdout, "System with 64KB base page size is required!\n");
      		return -EPERM;
      	}
      
      	system("echo 0 > /sys/devices/virtual/bdi/253:0/read_ahead_kb");
      	system("echo 1 > /proc/sys/vm/drop_caches");
      
      	/* Open the xfs file */
      	fd = open(filename, O_RDONLY);
      	assert(fd > 0);
      
      	/* Create VMA */
      	buf = mmap(NULL, TEST_MEM_SIZE, PROT_READ, MAP_SHARED, fd, 0);
      	assert(buf != (void *)-1);
      	fprintf(stdout, "mapped buffer at 0x%p\n", buf);
      
      	/* Populate VMA */
      	ret = madvise(buf, TEST_MEM_SIZE, MADV_NOHUGEPAGE);
      	assert(ret == 0);
      	ret = madvise(buf, TEST_MEM_SIZE, MADV_POPULATE_READ);
      	assert(ret == 0);
      
      	/* Collapse VMA */
      	ret = madvise(buf, TEST_MEM_SIZE, MADV_HUGEPAGE);
      	assert(ret == 0);
      	ret = madvise(buf, TEST_MEM_SIZE, MADV_COLLAPSE);
      	if (ret) {
      		fprintf(stdout, "Error %d to madvise(MADV_COLLAPSE)\n", errno);
      		goto out;
      	}
      
      	/* Split xarray entry. Write permission is needed */
      	munmap(buf, TEST_MEM_SIZE);
      	buf = (void *)-1;
      	close(fd);
      	fd = open(filename, O_RDWR);
      	assert(fd > 0);
      	fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
       		  TEST_MEM_SIZE - pgsize, pgsize);
      out:
      	if (buf != (void *)-1)
      		munmap(buf, TEST_MEM_SIZE);
      	if (fd > 0)
      		close(fd);
      
      	return ret;
      }
      
      [root@dhcp-10-26-1-207 ~]# gcc /tmp/test.c -o /tmp/test
      [root@dhcp-10-26-1-207 ~]# /tmp/test
       ------------[ cut here ]------------
       WARNING: CPU: 25 PID: 7560 at lib/xarray.c:1025 xas_split_alloc+0xf8/0x128
       Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib    \
       nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct      \
       nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4      \
       ip_set rfkill nf_tables nfnetlink vfat fat virtio_balloon drm fuse   \
       xfs libcrc32c crct10dif_ce ghash_ce sha2_ce sha256_arm64 virtio_net  \
       sha1_ce net_failover virtio_blk virtio_console failover dimlib virtio_mmio
       CPU: 25 PID: 7560 Comm: test Kdump: loaded Not tainted 6.10.0-rc7-gavin+ #9
       Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-1.el9 05/24/2024
       pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
       pc : xas_split_alloc+0xf8/0x128
       lr : split_huge_page_to_list_to_order+0x1c4/0x780
       sp : ffff8000ac32f660
       x29: ffff8000ac32f660 x28: ffff0000e0969eb0 x27: ffff8000ac32f6c0
       x26: 0000000000000c40 x25: ffff0000e0969eb0 x24: 000000000000000d
       x23: ffff8000ac32f6c0 x22: ffffffdfc0700000 x21: 0000000000000000
       x20: 0000000000000000 x19: ffffffdfc0700000 x18: 0000000000000000
       x17: 0000000000000000 x16: ffffd5f3708ffc70 x15: 0000000000000000
       x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
       x11: ffffffffffffffc0 x10: 0000000000000040 x9 : ffffd5f3708e692c
       x8 : 0000000000000003 x7 : 0000000000000000 x6 : ffff0000e0969eb8
       x5 : ffffd5f37289e378 x4 : 0000000000000000 x3 : 0000000000000c40
       x2 : 000000000000000d x1 : 000000000000000c x0 : 0000000000000000
       Call trace:
        xas_split_alloc+0xf8/0x128
        split_huge_page_to_list_to_order+0x1c4/0x780
        truncate_inode_partial_folio+0xdc/0x160
        truncate_inode_pages_range+0x1b4/0x4a8
        truncate_pagecache_range+0x84/0xa0
        xfs_flush_unmap_range+0x70/0x90 [xfs]
        xfs_file_fallocate+0xfc/0x4d8 [xfs]
        vfs_fallocate+0x124/0x2f0
        ksys_fallocate+0x4c/0xa0
        __arm64_sys_fallocate+0x24/0x38
        invoke_syscall.constprop.0+0x7c/0xd8
        do_el0_svc+0xb4/0xd0
        el0_svc+0x44/0x1d8
        el0t_64_sync_handler+0x134/0x150
        el0t_64_sync+0x17c/0x180
      
      Fix it by correcting the supported page cache orders, with different sets
      for DAX and other files.  With this corrected, a 512MB page cache is
      disallowed for all non-DAX files on ARM64 systems where the base page
      size is 64KB.  After this patch is applied, the test program fails with
      error -EINVAL returned from __thp_vma_allowable_orders() and the
      madvise() system call to collapse the page caches.
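The 512MB figure follows from the page table geometry. The sketch below assumes 8-byte page table entries, so one table page holds page_size / 8 entries. The helper names are hypothetical, and the order limit is passed in as a parameter instead of being taken from the kernel.

```c
#include <assert.h>

/* Order of a PMD-sized mapping, assuming 8-byte page table entries. */
static unsigned int pmd_order(unsigned long page_size)
{
	unsigned long entries = page_size / 8;
	unsigned int order = 0;

	/* entries must be a nonzero power of two */
	assert(entries > 0 && (entries & (entries - 1)) == 0);
	while ((1UL << order) < entries)
		order++;
	return order;
}

/* Size in bytes covered by one PMD entry. */
static unsigned long long pmd_size_bytes(unsigned long page_size)
{
	return (unsigned long long)page_size << pmd_order(page_size);
}

/* A PMD-sized page cache folio fits only if its order is within the limit. */
static int pmd_folio_allowed(unsigned long page_size, unsigned int max_order)
{
	return pmd_order(page_size) <= max_order;
}
```

For a 64KB base page this gives order 13 and 64KB << 13 = 512MB, while a 4KB base page gives order 9 and 2MB. Any limit below 13 therefore rules out PMD-sized page cache on 64KB systems.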
      
      Link: https://lkml.kernel.org/r/20240715000423.316491-1-gshan@redhat.com
      Fixes: 6b24ca4a ("mm: Use multi-index entries in the page cache")
      Signed-off-by: Gavin Shan <gshan@redhat.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
      Acked-by: Zi Yan <ziy@nvidia.com>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: <stable@vger.kernel.org>	[5.17+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      d659b715
    • Yang Shi's avatar
      mm: huge_memory: use !CONFIG_64BIT to relax huge page alignment on 32 bit machines · d9592025
      Yang Shi authored
      Yves-Alexis Perez reported that commit 4ef9ad19 ("mm: huge_memory: don't
      force huge page alignment on 32 bit") didn't work for x86_32 [1].  This is
      because x86_32 uses CONFIG_X86_32 instead of CONFIG_32BIT.
      
      !CONFIG_64BIT should cover all 32 bit machines.
      
      [1] https://lore.kernel.org/linux-mm/CAHbLzkr1LwH3pcTgM+aGQ31ip2bKqiqEQ8=FQB+t2c3dhNKNHA@mail.gmail.com/
      
      Link: https://lkml.kernel.org/r/20240712155855.1130330-1-yang@os.amperecomputing.com
      Fixes: 4ef9ad19 ("mm: huge_memory: don't force huge page alignment on 32 bit")
      Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
      Reported-by: Yves-Alexis Perez <corsac@debian.org>
      Tested-by: Yves-Alexis Perez <corsac@debian.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Jiri Slaby <jirislaby@kernel.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Salvatore Bonaccorso <carnil@debian.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: <stable@vger.kernel.org>	[6.8+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      d9592025
    • Ram Tummala's avatar
      mm: fix old/young bit handling in the faulting path · 4cd7ba16
      Ram Tummala authored
      Commit 3bd786f7 ("mm: convert do_set_pte() to set_pte_range()")
      replaced do_set_pte() with set_pte_range() and that introduced a
      regression in the following faulting path of non-anonymous vmas which
      caused the PTE for the faulting address to be marked as old instead of
      young.
      
      handle_pte_fault()
        do_pte_missing()
          do_fault()
            do_read_fault() || do_cow_fault() || do_shared_fault()
              finish_fault()
                set_pte_range()
      
      The polarity of prefault calculation is incorrect.  This leads to prefault
      being incorrectly set for the faulting address.  The following check will
      incorrectly mark the PTE old rather than young.  On some architectures
      this will cause a double fault to mark it young when the access is
      retried.
      
          if (prefault && arch_wants_old_prefaulted_pte())
              entry = pte_mkold(entry);
      
      On a subsequent fault on the same address, the faulting path will see a
      non NULL vmf->pte and instead of reaching the do_pte_missing() path, PTE
      will then be correctly marked young in handle_pte_fault() itself.
      
      Due to this bug, performance degradation in the fault handling path will
      be observed due to unnecessary double faulting.
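The polarity issue can be illustrated with a simplified model. It assumes that a mapped range counts as a prefault only when the faulting index lies outside it. The function names and parameters are hypothetical and do not reproduce the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * A range [start, start + nr) is a prefault only if it does not contain
 * the index that actually faulted.
 */
static bool range_is_prefault(unsigned long fault_idx, unsigned long start,
			      unsigned long nr)
{
	assert(nr > 0);
	return !(fault_idx >= start && fault_idx < start + nr);
}

/* Returns true if the PTEs for the range should be installed as old. */
static bool install_old(unsigned long fault_idx, unsigned long start,
			unsigned long nr, bool arch_wants_old)
{
	return range_is_prefault(fault_idx, start, nr) && arch_wants_old;
}
```

Dropping the negation in `range_is_prefault()` would flag exactly the faulting range as a prefault, so its PTE would be installed as old and the retried access would fault a second time on architectures that want old prefaulted PTEs.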
      
      Link: https://lkml.kernel.org/r/20240710014539.746200-1-rtummala@nvidia.com
      Fixes: 3bd786f7 ("mm: convert do_set_pte() to set_pte_range()")
      Signed-off-by: Ram Tummala <rtummala@nvidia.com>
      Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4cd7ba16
    • James Clark's avatar
      dt-bindings: arm: update James Clark's email address · 34e526f6
      James Clark authored
      My new address is james.clark@linaro.org
      
      Link: https://lkml.kernel.org/r/20240709102512.31212-3-james.clark@linaro.org
      Signed-off-by: James Clark <james.clark@linaro.org>
      Cc: Bjorn Andersson <quic_bjorande@quicinc.com>
      Cc: Conor Dooley <conor+dt@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Geliang Tang <geliang@kernel.org>
      Cc: Hao Zhang <quic_hazha@quicinc.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Kees Cook <kees@kernel.org>
      Cc: Krzysztof Kozlowski <krzk+dt@kernel.org>
      Cc: Mao Jinlong <quic_jinlmao@quicinc.com>
      Cc: Matthieu Baerts <matttbe@kernel.org>
      Cc: Matt Ranostay <matt@ranostay.sg>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Oleksij Rempel <o.rempel@pengutronix.de>
      Cc: Rob Herring (Arm) <robh@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      34e526f6