1. 15 Jul, 2024 26 commits
    • Merge tag 'nolibc.2024.07.15a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · f97b956b
      Linus Torvalds authored
      Pull nolibc updates from Paul McKenney:
      
       - Fix selftest printf format mismatch in expect_str_buf_eq()
      
       - Stop using brk() and sbrk() when testing against musl, whose
         implementations of these two functions simply fail with ENOMEM
      
       - Make tests use -Werror to force failure on compiler warnings
      
       - Add limits for the {u,}intmax_t, ulong and {u,}llong types
      
       - Implement strtol() and friends
      
       - Add facility to skip nolibc-specific tests when running against
         non-nolibc libraries
      
       - Implement strerror()
      
       - Also use strerror() on nolibc when running kselftests
      
      * tag 'nolibc.2024.07.15a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        selftests: kselftest: also use strerror() on nolibc
        tools/nolibc: implement strerror()
        selftests/nolibc: introduce condition to run tests only on nolibc
        tools/nolibc: implement strtol() and friends
        tools/nolibc: add limits for {u,}intmax_t, ulong and {u,}llong
        selftests/nolibc: run-tests.sh: use -Werror by default
        selftests/nolibc: disable brk()/sbrk() tests on musl
        selftests/nolibc: fix printf format mismatch in expect_str_buf_eq()
    • Merge tag 'kcsan.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · e4b2b0b1
      Linus Torvalds authored
      Pull KCSAN updates from Paul McKenney:
      
       - improve the documentation of the new __data_racy type qualifier
         in the data_race() macro's kernel-doc header and in the LKMM's
         access-marking documentation
      
       - add missing MODULE_DESCRIPTION() macro
      
      * tag 'kcsan.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        kcsan: Add missing MODULE_DESCRIPTION() macro
        kcsan: Add example to data_race() kerneldoc header
    • Merge tag 'torture.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · b176e21d
      Linus Torvalds authored
      Pull torture-test updates from Paul McKenney:
       "This adds MODULE_DESCRIPTION() to torture.c, locktorture.c, and
        scftorture.c, and also adds 'static' to a global variable that is used
        only in scftorture.c"
      
      * tag 'torture.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        scftorture: Make torture_type static
        scftorture: Add MODULE_DESCRIPTION()
        locktorture: Add MODULE_DESCRIPTION()
        torture: Add MODULE_DESCRIPTION()
    • Merge tag 'rcu.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · 9855e873
      Linus Torvalds authored
      Pull RCU updates from Paul McKenney:
      
       - Update Tasks RCU and Tasks Rude RCU description in Requirements.rst
         and clarify rcu_assign_pointer() and rcu_dereference() ordering
         properties
      
       - Add lockdep assertions for RCU readers, limit inline wakeups for
         callback-bypass synchronize_rcu(), add an
         rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter, add
         Uladzislau Rezki as RCU maintainer, and fix a subtle
         callback-migration memory-ordering issue
      
       - Remove a number of redundant memory barriers
      
       - Remove unnecessary bypass-list lock-contention mitigation, use
         the parking API instead of an open-coded ad-hoc equivalent, and
         update obsolete comments
      
       - Revert avoidance of a deadlock that can no longer occur and properly
         synchronize Tasks Trace RCU checking of runqueues
      
       - Add tests for handling of double-call_rcu() bug, add missing
         MODULE_DESCRIPTION, and add a script that histograms the number of
         calls to RCU updaters
      
       - Fill out SRCU polled-grace-period API
      
      * tag 'rcu.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (29 commits)
        rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation
        rcu: Eliminate lockless accesses to rcu_sync->gp_count
        MAINTAINERS: Add Uladzislau Rezki as RCU maintainer
        rcu: Add rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter
        rcu/exp: Remove redundant full memory barrier at the end of GP
        rcu: Remove full memory barrier on RCU stall printout
        rcu: Remove full memory barrier on boot time eqs sanity check
        rcu/exp: Remove superfluous full memory barrier upon first EQS snapshot
        rcu: Remove superfluous full memory barrier upon first EQS snapshot
        rcu: Remove full ordering on second EQS snapshot
        srcu: Fill out polled grace-period APIs
        srcu: Update cleanup_srcu_struct() comment
        srcu: Add NUM_ACTIVE_SRCU_POLL_OLDSTATE
        srcu: Disable interrupts directly in srcu_gp_end()
        rcu: Disable interrupts directly in rcu_gp_init()
        rcu/tree: Reduce wake up for synchronize_rcu() common case
        rcu/tasks: Fix stale task snaphot for Tasks Trace
        tools/rcu: Add rcu-updaters.sh script
        rcutorture: Add missing MODULE_DESCRIPTION() macros
        rcutorture: Fix rcu_torture_fwd_cb_cr() data race
        ...
    • Merge tag 'lkmm.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · 253e1e98
      Linus Torvalds authored
      Pull memory model updates from Paul McKenney:
       "lkmm: Fix corner-case locking bug and improve documentation
      
        A simple but odd single-process litmus test acquires and immediately
        releases a lock, then calls spin_is_locked(). LKMM acts as if it were a
        deadlock due to an assumption that spin_is_locked() will follow a
        spin_lock() or some other process's spin_unlock(). This litmus test
        manages to violate this assumption because the spin_is_locked()
        follows the same process's spin_unlock().
      
        This series fixes this bug, reorganizes and optimizes the lock.cat
        model, and updates documentation"
      
      * tag 'lkmm.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        tools/memory-model: Code reorganization in lock.cat
        tools/memory-model: Fix bug in lock.cat
        tools/memory-model: Add access-marking.txt to README
        tools/memory-model: Add KCSAN LF mentorship session citation
    • Merge tag 'cmpxchg.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · c4b729b0
      Linus Torvalds authored
      Pull arm byte cmpxchg from Paul McKenney:
       "ARM: Provide one-byte cmpxchg emulation
      
        This provides emulated one-byte cmpxchg() support for ARM using the
        cmpxchg_emu_u8() function that uses a four-byte cmpxchg() to emulate
        the one-byte variant.
      
        Similar patches for emulation of one-byte cmpxchg() for arc, sh, and
        xtensa have not yet received maintainer acks, so they are slated for
        the v6.12 merge window"
      
      * tag 'cmpxchg.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        ARM: Emulate one-byte cmpxchg
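The emulation described in the cmpxchg pull, building a one-byte cmpxchg() out of a four-byte one, can be sketched in portable C11. This is a minimal userspace model under stated assumptions, not the kernel's cmpxchg_emu_u8(): the function name is hypothetical, C11 <stdatomic.h> stands in for the architecture's cmpxchg(), and memory is modelled as an array of _Atomic uint32_t words whose byte 0 is the least-significant byte.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch (not the kernel's cmpxchg_emu_u8()): emulate a
 * one-byte compare-and-swap on byte number `idx` of an array of 32-bit
 * atomic words, using only a 32-bit compare-and-swap. Byte 0 is taken
 * to be the least-significant byte of words[0]. Like cmpxchg(), it
 * returns the value the byte held before the operation; the swap took
 * place if and only if that value equals `old_val`.
 */
static uint8_t cmpxchg_u8_sketch(_Atomic uint32_t *words, size_t idx,
                                 uint8_t old_val, uint8_t new_val)
{
    _Atomic uint32_t *word = &words[idx / 4];  /* aligned containing word */
    unsigned int shift = (unsigned int)(idx % 4) * 8;
    uint32_t mask = (uint32_t)0xff << shift;
    uint32_t cur = atomic_load(word);

    for (;;) {
        uint8_t cur_byte = (uint8_t)((cur & mask) >> shift);

        /* Target byte does not match: fail without storing anything. */
        if (cur_byte != old_val)
            return cur_byte;

        /* Splice the new byte in, keeping the three neighbouring bytes. */
        uint32_t desired = (cur & ~mask) | ((uint32_t)new_val << shift);

        /*
         * The 4-byte CAS fails if any byte of the word changed since
         * `cur` was read (or spuriously, for the _weak variant). On
         * failure `cur` is refreshed and the target byte is rechecked.
         */
        if (atomic_compare_exchange_weak(word, &cur, desired))
            return old_val;
    }
}
```

The retry loop is the key design point: a failed four-byte CAS may only mean that a neighbouring byte was written concurrently, which must not be reported as a failure of the one-byte operation, so the loop re-reads the word and gives up only when the target byte itself no longer matches.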
    • Merge tag 'timers-core-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4fd94356
      Linus Torvalds authored
      Pull timer updates from Thomas Gleixner:
       "Updates for timers, timekeeping and related functionality:
      
        Core:
      
         - Make the takeover of a hrtimer based broadcast timer reliable
           during CPU hot-unplug. The current implementation suffers from a
           race which can lead to broadcast timer starvation in the worst
           case.
      
         - VDSO related cleanups and simplifications
      
         - Small cleanups and enhancements all over the place
      
        PTP:
      
         - Replace the architecture-specific function that converts a base
           clock to a clocksource (e.g. ART to TSC) with generic
           functionality, to avoid exposing such internals to drivers, and
           convert all existing drivers over. This also makes it possible to
           provide the reverse conversion in the core code, based on the same
           parameter set.
      
         - Provide a function to convert CLOCK_REALTIME to the base clock to
           support the upcoming PPS output driver on Intel platforms.
      
        Drivers:
      
         - A set of Device Tree bindings for new hardware
      
         - Cleanups and enhancements all over the place"
      
      * tag 'timers-core-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
        clocksource/drivers/realtek: Add timer driver for rtl-otto platforms
        dt-bindings: timer: Add schema for realtek,otto-timer
        dt-bindings: timer: Add SOPHGO SG2002 clint
        dt-bindings: timer: renesas,tmu: Add R-Car Gen2 support
        dt-bindings: timer: renesas,tmu: Add RZ/G1 support
        dt-bindings: timer: renesas,tmu: Add R-Mobile APE6 support
        clocksource/drivers/mips-gic-timer: Correct sched_clock width
        clocksource/drivers/mips-gic-timer: Refine rating computation
        clocksource/drivers/sh_cmt: Address race condition for clock events
        clocksource/driver/arm_global_timer: Remove unnecessary ‘0’ values from err
        clocksource/drivers/arm_arch_timer: Remove unnecessary ‘0’ values from irq
        tick/broadcast: Make takeover of broadcast hrtimer reliable
        tick/sched: Combine WARN_ON_ONCE and print_once
        x86/vdso: Remove unused include
        x86/vgtod: Remove unused typedef gtod_long_t
        x86/vdso: Fix function reference in comment
        vdso: Add comment about reason for vdso struct ordering
        vdso/gettimeofday: Clarify comment about open coded function
        timekeeping: Add missing kernel-doc function comments
        tick: Remove unnused tick_nohz_get_idle_calls()
        ...
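The PTP item describes converting between a base clock (such as ART) and a clocksource (such as TSC) in both directions from one shared parameter set. A minimal sketch, assuming the relation is linear (a rational scale factor plus an offset); the struct and function names are hypothetical and are not the kernel's API.

```c
#include <stdint.h>

/*
 * Hypothetical parameter set for a linear base-clock <-> clocksource
 * relation:  cs = base * numerator / denominator + offset.
 * The names are illustrative, not the kernel's structures.
 */
struct base_conv {
    uint32_t numerator;
    uint32_t denominator;
    uint64_t offset;
};

/*
 * Compute value * mul / div (truncating) without overflowing the
 * intermediate product: split value into quotient and remainder of div.
 * Since remainder < div < 2^32 and mul < 2^32, remainder * mul fits in
 * 64 bits; the result is exact whenever it fits in 64 bits itself.
 */
static uint64_t mul_div_u64(uint64_t value, uint32_t mul, uint32_t div)
{
    uint64_t q = value / div;
    uint64_t r = value % div;

    return q * mul + (r * mul) / div;
}

/* Forward direction, e.g. ART -> TSC. */
static uint64_t base_to_cs(const struct base_conv *c, uint64_t base)
{
    return mul_div_u64(base, c->numerator, c->denominator) + c->offset;
}

/* Reverse direction from the same parameters; assumes cs >= offset. */
static uint64_t cs_to_base(const struct base_conv *c, uint64_t cs)
{
    return mul_div_u64(cs - c->offset, c->denominator, c->numerator);
}
```

Both directions truncate, so a round trip can be off by one unit when the scale factor does not divide evenly; keeping a single parameter set is what lets the reverse conversion live in core code rather than in each driver.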
    • Merge tag 'smp-core-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0eff0491
      Linus Torvalds authored
      Pull CPU hotplug updates from Thomas Gleixner:
       "A small set of SMP/CPU hotplug updates:
      
         - Reverse the order of iteration when freezing secondary CPUs for
           hibernation.
      
           This spares drivers such as the Intel uncore performance counter
           driver from having to transfer the assignment of handling the
           per-package uncore events for every CPU in a package, which gives
           a considerable speedup on larger systems.
      
         - Add a missing destroy_work_on_stack() invocation in
           smp_call_on_cpu() to prevent debugobjects from emitting a
           false-positive warning when the stack is freed.
      
         - Small cleanups in comments and a str_plural() conversion"
      
      * tag 'smp-core-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        smp: Add missing destroy_work_on_stack() call in smp_call_on_cpu()
        cpu/hotplug: Reverse order of iteration in freeze_secondary_cpus()
        smp: Use str_plural() to fix Coccinelle warnings
        cpu/hotplug: Fix typo in comment
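Why the freeze order matters can be illustrated with a deliberately simplified model of one package. It assumes that a per-package duty (such as uncore event handling) is owned by the lowest-numbered online CPU in the package and is handed to the lowest-numbered remaining CPU whenever its owner goes offline. The function is hypothetical, and the model ignores the boot CPU, which freeze_secondary_cpus() leaves online.

```c
#include <stdbool.h>

#define MAX_CPUS 64

/*
 * Hypothetical model: `ncpus` CPUs of one package, numbered 0..ncpus-1,
 * all online. The duty is owned by the lowest-numbered online CPU.
 * Returns how many times the duty is handed off while every CPU is taken
 * offline in ascending (reverse == false) or descending order, or -1 if
 * ncpus is out of range.
 */
static int count_owner_transfers(int ncpus, bool reverse)
{
    bool online[MAX_CPUS];
    int owner = 0;
    int transfers = 0;

    if (ncpus < 1 || ncpus > MAX_CPUS)
        return -1;
    for (int c = 0; c < ncpus; c++)
        online[c] = true;

    for (int i = 0; i < ncpus; i++) {
        int cpu = reverse ? ncpus - 1 - i : i;

        online[cpu] = false;
        if (cpu != owner)
            continue;   /* a non-owner goes away without a handoff */

        /* Hand the duty to the lowest-numbered CPU still online, if any. */
        owner = -1;
        for (int c = 0; c < ncpus; c++) {
            if (online[c]) {
                owner = c;
                transfers++;
                break;
            }
        }
    }
    return transfers;
}
```

In this model an 8-CPU package sees 7 handoffs in ascending order (one per offlined owner) and none in descending order, because the owner is the last CPU to go.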
    • Merge tag 'core-debugobjects-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0e4b77d4
      Linus Torvalds authored
      Pull debugobjects update from Thomas Gleixner:
       "A single update for debugobjects to annotate all intentionally racy
        global debug variables so that KCSAN ignores them"
      
      * tag 'core-debugobjects-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        debugobjects: Annotate racy debug variables
    • Merge tag 'for-6.11/block-20240710' of git://git.kernel.dk/linux · 3e781988
      Linus Torvalds authored
      Pull block updates from Jens Axboe:
      
       - NVMe updates via Keith:
           - Device initialization memory leak fixes (Keith)
           - More constants defined (Weiwen)
           - Target debugfs support (Hannes)
           - PCIe subsystem reset enhancements (Keith)
           - Queue-depth multipath policy (Red Hat and Pure Storage)
           - Implement get_unique_id (Christoph)
           - Authentication error fixes (Gaosheng)
      
       - MD updates via Song:
           - sync_action fix and refactoring (Yu Kuai)
           - Various small fixes (Christoph Hellwig, Li Nan, Ofir Gal, Yu
             Kuai, Benjamin Marzinski, Christophe JAILLET, and Yang Li)
      
       - Fix loop detach/open race (Gulam)
      
       - Fix lower control limit for blk-throttle (Yu)
      
       - Add module descriptions to various drivers (Jeff)
      
       - Add support for atomic writes for block devices, and statx reporting
         for same. Includes SCSI and NVMe (John, Prasad, Alan)
      
       - Add IO priority information to block trace points (Dongliang)
      
       - Various zone improvements and tweaks (Damien)
      
       - mq-deadline tag reservation improvements (Bart)
      
       - Ignore direct reclaim swap writes in writeback throttling (Baokun)
      
       - Block integrity improvements and fixes (Anuj)
      
       - Add basic support for Rust-based block drivers. For now this
         includes only a dummy null_blk variant (Andreas)
      
       - Series converting driver settings to queue limits, and cleanups and
         fixes related to that (Christoph)
      
       - Clean up code that pokes too deeply into the bvec internals, in
         preparation for DMA mapping API changes (Christoph)
      
       - Various minor tweaks and fixes (Jiapeng, John, Kanchan, Mikulas,
         Ming, Zhu, Damien, Christophe, Chaitanya)
      
      * tag 'for-6.11/block-20240710' of git://git.kernel.dk/linux: (206 commits)
        floppy: add missing MODULE_DESCRIPTION() macro
        loop: add missing MODULE_DESCRIPTION() macro
        ublk_drv: add missing MODULE_DESCRIPTION() macro
        xen/blkback: add missing MODULE_DESCRIPTION() macro
        block/rnbd: Constify struct kobj_type
        block: take offset into account in blk_bvec_map_sg again
        block: fix get_max_segment_size() warning
        loop: Don't bother validating blocksize
        virtio_blk: Don't bother validating blocksize
        null_blk: Don't bother validating blocksize
        block: Validate logical block size in blk_validate_limits()
        virtio_blk: Fix default logical block size fallback
        nvmet-auth: fix nvmet_auth hash error handling
        nvme: implement ->get_unique_id
        block: pass a phys_addr_t to get_max_segment_size
        block: add a bvec_phys helper
        blk-lib: check for kill signal in ioctl BLKZEROOUT
        block: limit the Write Zeroes to manually writing zeroes fallback
        block: refacto blkdev_issue_zeroout
        block: move read-only and supported checks into (__)blkdev_issue_zeroout
        ...
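The queue-depth multipath policy listed in the block pull sends I/O down the path with the fewest outstanding requests. A minimal sketch of that selection rule, assuming a hypothetical array of per-path in-flight counters and a per-path "usable" flag that stands in for the path-state checks a real NVMe multipath implementation performs:

```c
#include <stddef.h>

/*
 * Hypothetical queue-depth path selector: return the index of the usable
 * path with the fewest in-flight requests, or -1 if no path is usable.
 * Ties go to the lowest index.
 */
static int select_path_queue_depth(const unsigned int *inflight,
                                   const int *usable, size_t npaths)
{
    int best = -1;

    for (size_t i = 0; i < npaths; i++) {
        if (!usable[i])
            continue;
        if (best < 0 || inflight[i] < inflight[best])
            best = (int)i;
    }
    return best;
}
```

Unlike a round-robin selector, this rule adapts to a slower path automatically: I/O piles up on it, its in-flight count rises, and new requests are steered elsewhere.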
    • Merge tag 'for-6.11/io_uring-20240714' of git://git.kernel.dk/linux · 3a56e241
      Linus Torvalds authored
      Pull io_uring updates from Jens Axboe:
       "Here are the io_uring updates queued up for 6.11.
      
        Nothing major this time around, various minor improvements and
        cleanups/fixes. This contains:
      
         - Add bind/listen opcodes. Main motivation is to support direct
           descriptors, to avoid needing a regular fd just for doing these two
           operations (Gabriel)
      
         - Probe fixes (Gabriel)
      
         - Treat io-wq work flags as atomics. Not fixing a real issue, but may
           as well and it silences a KCSAN warning (me)
      
         - Cleanup of rsrc __set_current_state() usage (me)
      
         - Add 64-bit for {m,f}advise operations (me)
      
         - Improve performance of data ring messages (me)
      
         - Fix for ring message overflow posting (Pavel)
      
         - Fix for freezer interaction with TWA_NOTIFY_SIGNAL. Not strictly an
           io_uring thing, but since TWA_NOTIFY_SIGNAL was originally added
           for faster task_work signaling for io_uring, bundling it with this
           pull (Pavel)
      
         - Add Pavel as a co-maintainer
      
         - Various cleanups (me, Thorsten)"
      
      * tag 'for-6.11/io_uring-20240714' of git://git.kernel.dk/linux: (28 commits)
        io_uring/net: check socket is valid in io_bind()/io_listen()
        kernel: rerun task_work while freezing in get_signal()
        io_uring/io-wq: limit retrying worker initialisation
        io_uring/napi: Remove unnecessary s64 cast
        io_uring/net: cleanup io_recv_finish() bundle handling
        io_uring/msg_ring: fix overflow posting
        MAINTAINERS: change Pavel Begunkov from io_uring reviewer to maintainer
        io_uring/msg_ring: use kmem_cache_free() to free request
        io_uring/msg_ring: check for dead submitter task
        io_uring/msg_ring: add an alloc cache for io_kiocb entries
        io_uring/msg_ring: improve handling of target CQE posting
        io_uring: add io_add_aux_cqe() helper
        io_uring: add remote task_work execution helper
        io_uring/msg_ring: tighten requirement for remote posting
        io_uring: Allocate only necessary memory in io_probe
        io_uring: Fix probe of disabled operations
        io_uring: Introduce IORING_OP_LISTEN
        io_uring: Introduce IORING_OP_BIND
        net: Split a __sys_listen helper for io_uring
        net: Split a __sys_bind helper for io_uring
        ...
    • Merge tag 'vfs-6.11.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 4f5e249e
      Linus Torvalds authored
      Pull iomap updates from Christian Brauner:
       "This contains some minor work for the iomap subsystem:
      
         - Add documentation on the design of iomap and how to port to it
      
         - Optimize iomap_read_folio()
      
         - Bring back the change that stops iomap_write_end() from
           increasing i_size.
      
           This is accompanied by a change to xfs to reserve blocks for
           truncating large realtime inodes to avoid exposing stale data when
           iomap_write_end() stops increasing i_size"
      
      * tag 'vfs-6.11.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        iomap: don't increase i_size in iomap_write_end()
        xfs: reserve blocks for truncating large realtime inode
        Documentation: the design of iomap and how to port
        iomap: Optimize iomap_read_folio
    • Merge tag 'vfs-6.11.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 98f3a9a4
      Linus Torvalds authored
      Pull pidfs updates from Christian Brauner:
       "This contains work to make it possible to derive namespace file
        descriptors from pidfd file descriptors.
      
        Right now it is already possible to use a pidfd with setns() to
        atomically change multiple namespaces at the same time. In other
        words, it is possible to switch to the namespace context of a process
        using a pidfd. There is no need to first open namespace file
        descriptors via procfs.
      
        The work included here extends these abilities by allowing
        namespace file descriptors to be opened from a pidfd. This means it is
        now possible to interact with namespaces without ever touching procfs.
      
        To this end a new set of ioctls() on pidfds is introduced covering all
        supported namespace types"
      
      * tag 'vfs-6.11.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        pidfs: allow retrieval of namespace file descriptors
        nsfs: add open_namespace()
        nsproxy: add helper to go from arbitrary namespace to ns_common
        nsproxy: add a cleanup helper for nsproxy
        file: add take_fd() cleanup helper
    • Merge tag 'vfs-6.11.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 1b074abe
      Linus Torvalds authored
      Pull namespace-fs updates from Christian Brauner:
       "This adds ioctls that allow translating PIDs between PID namespaces.
      
        The motivating use-case comes from LXCFS which is a tiny fuse
        filesystem used to virtualize various aspects of procfs. LXCFS is run
        on the host. The files and directories it creates can be bind-mounted
        by e.g. a container at startup and mounted over the various procfs
        files the container wishes to have virtualized.
      
        When e.g. a read request for uptime is received, LXCFS will receive
        the pid of the reader. In order to virtualize the corresponding read,
        LXCFS needs to know the pid of the init process of the reader's pid
        namespace.
      
        In order to do this, LXCFS first needs to fork() two helper processes.
        The first helper process calls setns() on the reader's pid namespace.
        The second helper process is needed to create a process that is a
        proper member of the pid namespace.
      
        The second helper process then creates a ucred message with ucred.pid
        set to 1 and sends it back to LXCFS. The kernel will translate the
        ucred.pid field to the corresponding pid number in LXCFS's pid
        namespace. This way LXCFS can learn the init pid number of the
        reader's pid namespace and can go on to virtualize.
      
        Since these two fork() calls are costly, LXCFS maintains an init pid
        cache that caches a given pid for a fixed amount of time. The cache is
        pruned during new read requests. However, even with the cache, the
        cost of the two fork() calls is significant when a very large number
        of containers are running.
      
        So this adds a simple set of ioctls that lets a caller translate PIDs
        from and into a given PID namespace. This significantly improves
        performance with a very simple change.
      
        To protect against races, pidfds can be used to check whether the
        process is still valid"
      
      * tag 'vfs-6.11.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        nsfs: add pid translation ioctls
    • Merge tag 'vfs-6.11.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · f608caba
      Linus Torvalds authored
      Pull vfs mount query updates from Christian Brauner:
       "This contains work to extend the abilities of listmount() and
        statmount() and various fixes and cleanups.
      
        Features:
      
         - Allow iterating through mounts via listmount() from newest to
           oldest. This makes it possible for mount(8) to keep iterating the
           mount table in reverse order so it gets newest mounts first.
      
         - Relax permissions on listmount() and statmount().
      
           It's not necessary to have capabilities in the initial namespace:
           it is sufficient to have capabilities in the owning namespace of
           the mount namespace we're located in to list unreachable mounts in
           that namespace.
      
         - Extend both listmount() and statmount() to list and stat mounts in
           foreign mount namespaces.
      
           Currently the only way to iterate over mount entries in mount
           namespaces that aren't in the caller's mount namespace is by
           crawling through /proc in order to find /proc/<pid>/mountinfo for
           the relevant mount namespace.
      
           This is both very clumsy and hugely inefficient. So extend struct
           mnt_id_req with a new member that allows specifying the id of the
           mount namespace we want to look at.
      
           Luckily internally we already have most of the infrastructure for
           this so we just need to expose it to userspace. Give userspace a
           way to retrieve the id of a mount namespace via statmount() and
           through a new nsfs ioctl() on mount namespace file descriptor.
      
           This comes with appropriate selftests.
      
         - Expose mount options through statmount().
      
           Currently, if userspace wants to get the mount options for a
           mount, it still has to open and parse /proc/<pid>/mountinfo, even
           when using statmount(). Simply expose the information through
           statmount() directly.
      
           Afterwards it's possible to only rely on statmount() and
           listmount() to retrieve all and more information than
           /proc/<pid>/mountinfo provides.
      
           This comes with appropriate selftests.
      
        Fixes:
      
         - Avoid copying to userspace under the namespace semaphore in
           listmount.
      
        Cleanups:
      
         - Simplify the error handling in listmount by relying on our newly
           added cleanup infrastructure.
      
         - Refuse invalid mount ids early for both listmount and statmount"
      
      * tag 'vfs-6.11.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        fs: reject invalid last mount id early
        fs: refuse mnt id requests with invalid ids early
        fs: find rootfs mount of the mount namespace
        fs: only copy to userspace on success in listmount()
        sefltests: extend the statmount test for mount options
        fs: use guard for namespace_sem in statmount()
        fs: export mount options via statmount()
        fs: rename show_mnt_opts -> show_vfsmnt_opts
        selftests: add a test for the foreign mnt ns extensions
        fs: add an ioctl to get the mnt ns id from nsfs
        fs: Allow statmount() in foreign mount namespace
        fs: Allow listmount() in foreign mount namespace
        fs: export the mount ns id via statmount
        fs: keep an index of current mount namespaces
        fs: relax permissions for statmount()
        listmount: allow listing in reverse order
        fs: relax permissions for listmount()
        fs: simplify error handling
        fs: don't copy to userspace under namespace semaphore
        path: add cleanup helper
    • Merge tag 'vfs-6.11.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 2aae1d67
      Linus Torvalds authored
      Pull vfs inode / dentry updates from Christian Brauner:
       "This contains smaller performance improvements to inodes and dentries:
      
        inode:
      
         - Add rcu based inode lookup variants.
      
           They avoid one inode hash lock acquisition in the common case,
           thereby significantly reducing contention. We already support
           RCU-based operations but didn't take advantage of them during
           inode insertion.
      
           Callers of iget_locked() get the improvement without any code
           changes. Callers that need a custom callback can switch to
           iget5_locked_rcu(), as btrfs did, for example.
      
           With 20 threads each walking a dedicated 1000 dirs * 1000 files
           directory tree to stat(2) on a 32 core + 24GB ram vm:
      
              before: 3.54s user 892.30s system 1966% cpu 45.549 total
              after:  3.28s user 738.66s system 1955% cpu 37.932 total (-16.7%)
      
           Long-term we should pick up the effort to introduce more
           fine-grained locking and possibly improve on the currently used
           hash implementation.
      
         - Start zeroing i_state in inode_init_always() instead of doing it in
           individual filesystems.
      
           This allows us to remove an unneeded lock acquire in new_inode()
           and not burden individual filesystems with this.
      
        dcache:
      
         - Move d_lockref out of the area used by RCU lookup to avoid
           cacheline ping-pong caused by the embedded name sharing a
           cacheline with d_lockref.
      
         - Fix dentry size on 32-bit with CONFIG_SMP=y so that it actually
           ends up at 128 bytes in total"
      
      * tag 'vfs-6.11.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        fs: fix dentry size
        vfs: move d_lockref out of the area used by RCU lookup
        bcachefs: remove now spurious i_state initialization
        xfs: remove now spurious i_state initialization in xfs_inode_alloc
        vfs: partially sanitize i_state zeroing on inode creation
        xfs: preserve i_state around inode_init_always in xfs_reinit_inode
        btrfs: use iget5_locked_rcu
        vfs: add rcu-based find_inode variants for iget ops
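The -16.7% quoted in the inode pull is the relative drop in total wall-clock time for the stat(2) benchmark; applying the same arithmetic to the system-time column gives a slightly larger reduction:

```latex
\frac{45.549\,\text{s} - 37.932\,\text{s}}{45.549\,\text{s}}
  = \frac{7.617}{45.549} \approx 0.167 \quad (16.7\%\ \text{less wall-clock time})

\frac{892.30\,\text{s} - 738.66\,\text{s}}{892.30\,\text{s}}
  = \frac{153.64}{892.30} \approx 0.172 \quad (17.2\%\ \text{less system time})
```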
    • Merge tag 'vfs-6.11.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · b8fc1bd7
      Linus Torvalds authored
      Pull vfs mount API updates from Christian Brauner:
      
       - Add a generic helper to parse uid and gid mount options.
      
         Currently we open-code the same logic in various filesystems which is
         error prone, especially since the verification of uid and gid mount
         options is a sensitive operation in the face of idmappings.
      
         Add a generic helper and convert all filesystems over to it. Make
         sure that filesystems that are mountable in unprivileged containers
         verify that the specified uid and gid can be represented in the
         owning namespace of the filesystem.
      
       - Convert hostfs to the new mount api.
      
      * tag 'vfs-6.11.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        fuse: Convert to new uid/gid option parsing helpers
        fuse: verify {g,u}id mount options correctly
        fat: Convert to new uid/gid option parsing helpers
        fat: Convert to new mount api
        fat: move debug into fat_mount_options
        vboxsf: Convert to new uid/gid option parsing helpers
        tracefs: Convert to new uid/gid option parsing helpers
        smb: client: Convert to new uid/gid option parsing helpers
        tmpfs: Convert to new uid/gid option parsing helpers
        ntfs3: Convert to new uid/gid option parsing helpers
        isofs: Convert to new uid/gid option parsing helpers
        hugetlbfs: Convert to new uid/gid option parsing helpers
        ext4: Convert to new uid/gid option parsing helpers
        exfat: Convert to new uid/gid option parsing helpers
        efivarfs: Convert to new uid/gid option parsing helpers
        debugfs: Convert to new uid/gid option parsing helpers
        autofs: Convert to new uid/gid option parsing helpers
        fs_parse: add uid & gid option option parsing helpers
        hostfs: Add const qualifier to host_root in hostfs_fill_super()
        hostfs: convert hostfs to use the new mount API
    • Merge tag 'vfs-6.11.casefold' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 4a051e4c
      Linus Torvalds authored
      Pull vfs casefolding updates from Christian Brauner:
       "This contains some work to simplify the handling of casefolded names:
      
         - Simplify the handling of casefolded names in f2fs and ext4 by
           keeping the names as a qstr to avoid unnecessary conversions
      
         - Introduce a new generic_ci_match() libfs case-insensitive lookup
           helper and use it in both f2fs and ext4, allowing the removal of the
           filesystem-specific implementations
      
         - Remove a bunch of ifdefs by making the unicode build checks part of
           the code flow"
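      The shape of such a shared comparison helper can be sketched in userspace. The code below is a hypothetical, ASCII-only stand-in: the real generic_ci_match() casefolds UTF-8 names using the filesystem's unicode tables, which this sketch does not attempt. struct name_len and ci_match_ascii() are illustrative names; the struct mirrors the qstr idea of keeping a name together with its length, so no NUL-terminated copy is needed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct qstr: a name plus its
 * length, not necessarily NUL-terminated. */
struct name_len {
    const char *name;
    size_t len;
};

/* ASCII-only case folding; the kernel uses UTF-8 casefolding instead. */
static unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Return true if the two names are equal ignoring ASCII case. */
static bool ci_match_ascii(const struct name_len *a, const struct name_len *b)
{
    if (a->len != b->len)
        return false;
    for (size_t i = 0; i < a->len; i++) {
        if (fold_ascii((unsigned char)a->name[i]) !=
            fold_ascii((unsigned char)b->name[i]))
            return false;
    }
    return true;
}
```

      Comparing lengths before folding is only valid because ASCII folding never changes a name's length; with UTF-8 casefolding that shortcut does not hold in general.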
      
      * tag 'vfs-6.11.casefold' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        f2fs: Move CONFIG_UNICODE defguards into the code flow
        ext4: Move CONFIG_UNICODE defguards into the code flow
        f2fs: Reuse generic_ci_match for ci comparisons
        ext4: Reuse generic_ci_match for ci comparisons
        libfs: Introduce case-insensitive string comparison helper
        f2fs: Simplify the handling of cached casefolded names
        ext4: Simplify the handling of cached casefolded names
    • Merge tag 'vfs-6.11.module.description' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 7d156879
      Linus Torvalds authored
      Pull vfs module description updates from Christian Brauner:
       "This contains patches to add module descriptions to all modules under
        fs/ currently lacking them"
      
      * tag 'vfs-6.11.module.description' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        openpromfs: add missing MODULE_DESCRIPTION() macro
        fs: nls: add missing MODULE_DESCRIPTION() macros
        fs: autofs: add MODULE_DESCRIPTION()
        fs: fat: add missing MODULE_DESCRIPTION() macros
        fs: binfmt: add missing MODULE_DESCRIPTION() macros
        fs: cramfs: add MODULE_DESCRIPTION()
        fs: hfs: add MODULE_DESCRIPTION()
        fs: hpfs: add MODULE_DESCRIPTION()
        qnx4: add MODULE_DESCRIPTION()
        qnx6: add MODULE_DESCRIPTION()
        fs: sysv: add MODULE_DESCRIPTION()
        fs: efs: add MODULE_DESCRIPTION()
        fs: minix: add MODULE_DESCRIPTION()
    • Merge tag 'vfs-6.11.pg_error' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · aff31330
      Linus Torvalds authored
      Pull PG_error removal updates from Christian Brauner:
       "This contains work to remove almost all remaining users of PG_error
        from filesystems and filesystem helper libraries. An additional patch
        will be coming in via the jfs tree which tests the PG_error bit.
      
        Afterwards nothing will be testing it anymore and it's safe to remove
        all places which set or clear the PG_error bit.
      
        The goal is to fully remove PG_error by the next merge window"
      
      * tag 'vfs-6.11.pg_error' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        buffer: Remove calls to set and clear the folio error flag
        iomap: Remove calls to set and clear folio error flag
        vboxsf: Convert vboxsf_read_folio() to use a folio
        ufs: Remove call to set the folio error flag
        romfs: Convert romfs_read_folio() to use a folio
        reiserfs: Remove call to folio_set_error()
        orangefs: Remove calls to set/clear the error flag
        nfs: Remove calls to folio_set_error
        jffs2: Remove calls to set/clear the folio error flag
        hostfs: Convert hostfs_read_folio() to use a folio
        isofs: Convert rock_ridge_symlink_read_folio to use a folio
        hpfs: Convert hpfs_symlink_read_folio to use a folio
        efs: Convert efs_symlink_read_folio to use a folio
        cramfs: Convert cramfs_read_folio to use a folio
        coda: Convert coda_symlink_filler() to use folio_end_read()
        befs: Convert befs_symlink_read_folio() to use folio_end_read()
    • Merge tag 'vfs-6.11.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · b051320d
      Linus Torvalds authored
      Pull misc vfs updates from Christian Brauner:
       "Features:
      
         - Support passing NULL along with AT_EMPTY_PATH for statx().
      
           NULL paths with any flag value other than AT_EMPTY_PATH go the
           usual route and end up with -EFAULT to retain compatibility (Rust
           is abusing calls of the sort to detect availability of statx)
      
           This avoids path lookup code, lockref management, memory allocation
           and, in the NULL path case, userspace memory access (which can be
           quite expensive with SMAP on x86_64)
      
         - Don't block i_writecount during exec. Remove the
           deny_write_access() mechanism for executables
      
         - Relax open_by_handle_at() permissions in specific cases where we
           can prove that the caller had sufficient privileges to open a file
      
         - Switch timespec64 fields in struct inode to discrete integers,
           freeing up 4 bytes
      
        Fixes:
      
         - Fix false positive circular locking warning in hfsplus
      
         - Initialize hfs_inode_info after hfs_alloc_inode() in hfs
      
         - Avoid accidental overflows in vfs_fallocate()
      
         - Don't interrupt fallocate with EINTR in tmpfs to avoid constantly
           restarting shmem_fallocate()
      
         - Add missing quote in comment in fs/readdir
      
        Cleanups:
      
         - Don't assign and test in an if statement in mqueue. Move the
           assignment out of the if statement
      
         - Reflow the logic in may_create_in_sticky()
      
         - Remove the usage of the deprecated ida_simple_xx() API from procfs
      
         - Reject FSCONFIG_CMD_CREATE_EXCL requests that depend on the new
           mount api early
      
         - Rename variables in copy_tree() to make it easier to understand
      
         - Replace WARN(down_read_trylock, ...) abuse with proper asserts in
           various places in the VFS
      
         - Get rid of user_path_at_empty() and drop the empty argument from
           getname_flags()
      
         - Check for both a copy error and an empty path in a single branch
           in getname_flags()
      
         - Avoid redundant smp_mb() for THP handling in do_dentry_open()
      
         - Rename parent_ino to d_parent_ino and make it use RCU
      
         - Remove unused header include in fs/readdir
      
         - Export in_group_or_capable() helper and switch f2fs and fuse over to
           it instead of open-coding the logic in both places"
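      Kernels without the statx() NULL-path change fail a NULL path with EFAULT, so a userspace caller that wants the cheaper form while still running on older kernels could try it first and fall back to the long-standing "" plus AT_EMPTY_PATH form. This is a minimal sketch, assuming Linux, glibc 2.28 or later (for struct statx and STATX_BASIC_STATS) and kernel headers that define SYS_statx; statx_fd() is a hypothetical helper name. It issues the raw syscall because glibc's statx() wrapper may declare the path argument non-null.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>       /* AT_EMPTY_PATH, open() */
#include <stddef.h>      /* NULL */
#include <sys/stat.h>    /* struct statx, STATX_BASIC_STATS (glibc >= 2.28) */
#include <sys/syscall.h> /* SYS_statx */
#include <unistd.h>      /* syscall(), close() */

/* Hypothetical helper: stat an already-open fd via statx(). */
static int statx_fd(int fd, struct statx *stx)
{
    /* Newer kernels: NULL path with exactly AT_EMPTY_PATH skips path lookup. */
    long rc = syscall(SYS_statx, fd, (const char *)NULL, AT_EMPTY_PATH,
                      STATX_BASIC_STATS, stx);
    if (rc == 0)
        return 0;
    if (errno != EFAULT)
        return -1;

    /* Older kernels reject the NULL path with EFAULT: use "" instead. */
    return (int)syscall(SYS_statx, fd, "", AT_EMPTY_PATH,
                        STATX_BASIC_STATS, stx);
}
```

      Any open file descriptor works here; the usage below stats an open directory.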
      
      * tag 'vfs-6.11.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
        ipc: mqueue: remove assignment from IS_ERR argument
        vfs: rename parent_ino to d_parent_ino and make it use RCU
        vfs: support statx(..., NULL, AT_EMPTY_PATH, ...)
        stat: use vfs_empty_path() helper
        fs: new helper vfs_empty_path()
        fs: reflow may_create_in_sticky()
        vfs: remove redundant smp_mb for thp handling in do_dentry_open
        fuse: Use in_group_or_capable() helper
        f2fs: Use in_group_or_capable() helper
        fs: Export in_group_or_capable()
        vfs: reorder checks in may_create_in_sticky
        hfs: fix to initialize fields of hfs_inode_info after hfs_alloc_inode()
        proc: Remove usage of the deprecated ida_simple_xx() API
        hfsplus: fix to avoid false alarm of circular locking
        Improve readability of copy_tree
        vfs: shave a branch in getname_flags
        vfs: retire user_path_at_empty and drop empty arg from getname_flags
        vfs: stop using user_path_at_empty in do_readlinkat
        tmpfs: don't interrupt fallocate with EINTR
        fs: don't block i_writecount during exec
        ...
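      The vfs_fallocate() overflow fix listed above concerns the familiar offset + len range check. A minimal userspace sketch, assuming GCC or Clang for __builtin_add_overflow(); range_ok() and its maxbytes parameter are hypothetical stand-ins for a filesystem's maximum file size, not the kernel's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: does [offset, offset + len) end at or below maxbytes?
 * Callers are assumed to have already rejected offset < 0 and len <= 0. */
static bool range_ok(int64_t offset, int64_t len, int64_t maxbytes)
{
    int64_t end;

    /* A naive "offset + len > maxbytes" is undefined behaviour in ISO C
     * when the sum overflows; the builtin reports the overflow instead. */
    if (__builtin_add_overflow(offset, len, &end))
        return false;
    return end <= maxbytes;
}
```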
    • Merge tag 'drm-fixes-2024-07-12' of https://gitlab.freedesktop.org/drm/kernel · 2ffd45da
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Oh I screwed up last week's fixes pull, and forgot to send..
      
        Back to work, thanks to Sima for last week, not too many fixes as
        expected getting close to release [ sic - Linus ], amdgpu and xe have
        a couple each, and then some other misc ones.
      
        amdgpu:
         - PSR-SU fix
         - Reserved VMID fix
      
        xe:
         - Use write-back caching mode for system memory on DGFX
         - Do not leak object when finalizing hdcp gsc
      
        bridge:
         - adv7511 EDID irq fix
      
        gma500:
         - NULL mode fixes.
      
        meson:
         - fix resource leak"
      
      * tag 'drm-fixes-2024-07-12' of https://gitlab.freedesktop.org/drm/kernel:
        Revert "drm/amd/display: Reset freesync config before update new state"
        drm/xe/display/xe_hdcp_gsc: Free arbiter on driver removal
        drm/xe: Use write-back caching mode for system memory on DGFX
        drm/amdgpu: reject gang submit on reserved VMIDs
        drm/gma500: fix null pointer dereference in cdv_intel_lvds_get_modes
        drm/gma500: fix null pointer dereference in psb_intel_lvds_get_modes
        drm/meson: fix canvas release in bind function
        drm/bridge: adv7511: Fix Intermittent EDID failures
    • Merge branch 'link_path_walk' · 5e049755
      Linus Torvalds authored
      This is the last - for now - of the "look, we generated some
      questionable code for basic pathname lookup operations" set of
      branches.
      
      This is mainly just re-organizing the name hashing code in
      link_path_walk(), mostly by improving the calling conventions to
      the inlined helper functions and moving some of the code around
      to allow for more straightforward code generation.
      
      The profiles - and the generated code - look much more palatable
      to me now.
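      The idea behind doing '.' and '..' detection while hashing can be illustrated with a byte-at-a-time userspace sketch. The kernel's hash_name() works a word at a time and uses its own hash function; the multiplicative mix below and the names component_info and hash_component() are hypothetical. The point is only that the loop that finds the component length already walks the name, so classifying "." and ".." right there costs a couple of compares instead of a separate pass.

```c
#include <stddef.h>
#include <stdint.h>

enum dot_type { NOT_DOT, DOT, DOTDOT };

struct component_info {
    uint32_t hash;
    size_t len;
    enum dot_type dots;
};

/* Hash one path component (up to '/' or NUL) and classify it. */
static struct component_info hash_component(const char *name)
{
    struct component_info ci = { 0, 0, NOT_DOT };
    uint32_t h = 0;
    size_t n = 0;

    while (name[n] != '\0' && name[n] != '/') {
        h = (h + (unsigned char)name[n]) * 0x9e3779b1u; /* hypothetical mix */
        n++;
    }
    ci.hash = h;
    ci.len = n;

    /* The length is already known, so "." and ".." need only two compares. */
    if (name[0] == '.') {
        if (n == 1)
            ci.dots = DOT;
        else if (n == 2 && name[1] == '.')
            ci.dots = DOTDOT;
    }
    return ci;
}
```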
      
      * link_path_walk:
        vfs: link_path_walk: move more of the name hashing into hash_name()
        vfs: link_path_walk: improve may_lookup() code generation
        vfs: link_path_walk: do '.' and '..' detection while hashing
        vfs: link_path_walk: clarify and improve name hashing interface
        vfs: link_path_walk: simplify name hash flow
    • Merge branch 'arm64-uaccess' (early part) · 1654c37d
      Linus Torvalds authored
      Merge arm64 support for proper 'unsafe' user accessor functionality,
      with 'asm goto' for handling exceptions.
      
      The arm64 user access code used the slow generic fallback for the
      'unsafe' user accessors, which generates horrendous code for things like
      strncpy_from_user(), because it causes us to generate code for SW PAN
      and for range checking for every individual word.
      
      Teach arm64 about 'user_access_begin()' and the so-called 'unsafe' user
      access functions that take an error label and use 'asm goto' to make all
      the exception handling be entirely out of line.
      
      [ These user access functions are called 'unsafe' not because the
        concept is unsafe, but because the low-level accessor functions
        absolutely have to be protected by the 'user_access_begin()' code,
        because that's what does the range checking.
      
        So the accessor functions have that scary name to make sure people
        don't think they are usable on their own, and cannot be misused the
        way our old "double underscore" versions of __get_user() and friends
        were ]
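      In kernel code the calling convention described above is: user_access_begin(ptr, len) does the range check once, each unsafe_get_user(x, ptr, label) jumps to label on a fault, and user_access_end() runs on both the success and error paths. That code only builds inside the kernel, so the sketch below is a userspace mock: the mock_* functions and macro are hypothetical stand-ins that treat "user addresses" as offsets into a fake buffer and simulate a fault with a per-byte flag. It shows the control flow only, not how arm64 implements it with asm goto.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mock "user memory" plus a per-byte flag standing in for an unmapped page. */
#define USER_SIZE 64
static unsigned char user_mem[USER_SIZE];
static bool user_fault[USER_SIZE];

/* Hypothetical mock of user_access_begin(): the single up-front range check. */
static bool mock_user_access_begin(size_t uaddr, size_t len)
{
    return uaddr <= USER_SIZE && len <= USER_SIZE - uaddr;
}

/* Hypothetical mock of user_access_end(): would undo begin()'s setup. */
static void mock_user_access_end(void) { }

/* Hypothetical mock of unsafe_get_user() for a u32: no range check here
 * (begin() did it); on a simulated fault, jump to the caller's label. */
#define mock_unsafe_get_u32(dst, uaddr, label)                \
    do {                                                      \
        for (size_t i_ = 0; i_ < sizeof(uint32_t); i_++)      \
            if (user_fault[(uaddr) + i_])                     \
                goto label;                                   \
        memcpy(&(dst), &user_mem[(uaddr)], sizeof(uint32_t)); \
    } while (0)

/* Read two consecutive u32 values and return their sum, or -EFAULT. */
static long read_pair_sum(size_t uaddr, uint32_t *sum)
{
    uint32_t a, b;

    if (!mock_user_access_begin(uaddr, 2 * sizeof(uint32_t)))
        return -EFAULT;
    mock_unsafe_get_u32(a, uaddr, efault);
    mock_unsafe_get_u32(b, uaddr + sizeof(uint32_t), efault);
    mock_user_access_end();
    *sum = a + b;
    return 0;

efault: /* all fault handling lives out of line, after the fast path */
    mock_user_access_end();
    return -EFAULT;
}
```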
      
      The branch is merged as "(early part)" because the full branch also
      improved on the "access_ok()" function, but the exact semantics of TBI
      (top byte ignore) have to be discussed before doing that part.  So this
      just does the low-level accessor update to use "asm goto".
      
      * 'arm64-uaccess' (early part):
        arm64: start using 'asm goto' for put_user()
        arm64: start using 'asm goto' for get_user() when available
    • Merge branch 'word-at-a-time' · 6a31ffdf
      Linus Torvalds authored
      Merge minor word-at-a-time instruction choice improvements for x86 and
      arm64.
      
      This is the second of four branches that came out of me looking at the
      code generation for path lookup on arm64.
      
      The word-at-a-time infrastructure is used to do string operations in
      chunks of one word both when copying the pathname from user space (in
      strncpy_from_user()), and when parsing and hashing the individual path
      components (in link_path_walk()).
      
      In particular, the "find the first zero byte" code uses various bit tricks
      to figure out the end of the string or path component, and get the length
      without having to do things one byte at a time.  Both x86-64 and arm64
      had less than optimal code choices for that.
      
      The commit message for the arm64 change in particular tries to explain
      the exact code flow for the zero byte finding for people who care.  It's
      made a bit more complicated by the fact that we support big-endian
      hardware too, and so we have some extra abstraction layers to allow
      different models for finding the zero byte, quite apart from the issue
      of picking specialized instructions.
      
      * word-at-a-time:
        arm64: word-at-a-time: improve byte count calculations for LE
        x86-64: word-at-a-time: improve byte count calculations
    • Merge branch 'runtime-constants' · a5819099
      Linus Torvalds authored
      Merge runtime constants infrastructure with implementations for x86 and
      arm64.
      
      This is one of four branches that came out of me looking at profiles of
      my kernel build filesystem load on my 128-core Altra arm64 system, where
      pathname walking and the user copies (particularly strncpy_from_user()
      for fetching the pathname from user space) are very hot.
      
      This is a very specialized "instruction alternatives" model where the
      dentry hash pointer and hash count will be constants for the lifetime of
      the kernel, but the allocations are not static; they are done early
      during kernel boot.  In order to avoid the pointer load and dynamic shift, we
      just rewrite the constants in the instructions in place.
      
      We can't use the "generic" alternative instructions infrastructure,
      because different architectures do it very differently, and it's
      actually simpler to just have very specific helpers, with a fallback to
      the generic ("old") model of just using variables for architectures that
      do not implement the runtime constant patching infrastructure.
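      The generic fallback mentioned above can be sketched in userspace: the "constants" are ordinary variables set once during startup, and every use loads them. This is a hypothetical model of the dentry-hash use case; the two macros are named after the generic runtime-constant helpers, but treat the definitions, the bucket type and hash_table_init() as illustrative only. On architectures with real support, the point is that these loads disappear because the values are patched into the instructions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Generic ("old") model: a runtime constant is just a variable. */
#define runtime_const_ptr(sym)                 (sym)
#define runtime_const_shift_right_32(val, sym) ((uint32_t)(val) >> (sym))

struct bucket { void *first; };

/* Set once at "boot", then never changed (and, like boot allocations,
 * never freed in this sketch). */
static struct bucket *hash_table;
static unsigned int hash_shift;

/* Allocate 2^order buckets; order must be in 1..31 so the shift is valid. */
static int hash_table_init(unsigned int order)
{
    if (order == 0 || order > 31)
        return -1;
    hash_table = calloc((size_t)1 << order, sizeof(*hash_table));
    if (!hash_table)
        return -1;
    hash_shift = 32 - order;
    return 0;
}

/* Bucket index = top 'order' bits of the 32-bit hash. With patching, neither
 * hash_table nor hash_shift would be loaded from memory here. */
static struct bucket *hash_bucket(uint32_t hash)
{
    return runtime_const_ptr(hash_table) +
           runtime_const_shift_right_32(hash, hash_shift);
}
```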
      
      Link: https://lore.kernel.org/all/CAHk-=widPe38fUNjUOmX11ByDckaeEo9tN4Eiyke9u1SAtu9sA@mail.gmail.com/
      
      * runtime-constants:
        arm64: add 'runtime constant' support
        runtime constants: add x86 architecture support
        runtime constants: add default dummy infrastructure
        vfs: dcache: move hashlen_hash() from callers into d_hash()
  2. 14 Jul, 2024 7 commits
  3. 13 Jul, 2024 7 commits