1. 13 Dec, 2022 27 commits
    • Merge tag 'fs.xattr.simple.noaudit.v6.2' of... · 07d7a4d6
      Linus Torvalds authored
      Merge tag 'fs.xattr.simple.noaudit.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
      
      Pull xattr audit fix from Seth Forshee:
       "This is a single patch to remove auditing of the capability check in
        simple_xattr_list().
      
        This check is done to check whether trusted xattrs should be included
        by listxattr(2). SELinux will normally log a denial when capable() is
        called and the task's SELinux context doesn't have the corresponding
        capability permission allowed, which can end up spamming the log.
      
        Since a failed check here cannot be used to infer malicious intent,
        auditing is of no real value, and it makes sense to stop auditing the
        capability check"
      
      * tag 'fs.xattr.simple.noaudit.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
        fs: don't audit the capability check in simple_xattr_list()
      07d7a4d6
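      The filtering that the pull message above describes can be modeled in
      userspace. This is only a sketch, not the kernel code: the helper names
      and the may_see_trusted flag are hypothetical, and the flag stands in
      for the result of a capability check that is performed quietly, so a
      failed check skips the entry without producing a log record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does this name live in the "trusted." namespace? */
static bool is_trusted_xattr(const char *name)
{
    static const char prefix[] = "trusted.";

    return strncmp(name, prefix, sizeof(prefix) - 1) == 0;
}

/*
 * Count the names a listxattr-like call would report.
 * may_see_trusted models the outcome of the (unaudited) capability check.
 */
static size_t count_listed_xattrs(const char *const *names, size_t n,
                                  bool may_see_trusted)
{
    size_t listed = 0;

    assert(names != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!may_see_trusted && is_trusted_xattr(names[i]))
            continue; /* skipped silently; nothing is logged in this model */
        listed++;
    }
    return listed;
}
```

      An unprivileged caller simply sees a shorter list, which is why a failed
      check here says nothing about intent.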
    • Merge tag 'fs.idmapped.squashfs.v6.2' of... · 6e8948a0
      Linus Torvalds authored
      Merge tag 'fs.idmapped.squashfs.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
      
      Pull squashfs update from Seth Forshee:
       "This is a simple patch to enable idmapped mounts for squashfs.
      
        All functionality squashfs needs to support idmapped mounts is already
        implemented in generic VFS code, so all that is needed is to set
        FS_ALLOW_IDMAP in fs_flags"
      
      * tag 'fs.idmapped.squashfs.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
        squashfs: enable idmapped mounts
      6e8948a0
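      As a rough illustration of what "set FS_ALLOW_IDMAP in fs_flags" means,
      the sketch below uses a stripped-down stand-in for the kernel's struct
      file_system_type. The flag values are illustrative assumptions (the
      authoritative definitions live in the kernel headers), and the
      fs_allows_idmap() helper is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; treat them as assumptions, not kernel ABI. */
#define FS_REQUIRES_DEV 1
#define FS_ALLOW_IDMAP  32

/* Simplified stand-in for the kernel's struct file_system_type. */
struct file_system_type {
    const char *name;
    int fs_flags;
};

/* A filesystem opts in to idmapped mounts purely by setting the flag. */
static const struct file_system_type squashfs_like = {
    .name = "squashfs",
    .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
};

/* A filesystem that has not opted in. */
static const struct file_system_type legacy_like = {
    .name = "legacyfs",
    .fs_flags = FS_REQUIRES_DEV,
};

/* Hypothetical check a mount path could perform before idmapping a mount. */
static bool fs_allows_idmap(const struct file_system_type *type)
{
    return (type->fs_flags & FS_ALLOW_IDMAP) != 0;
}
```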
    • Merge tag 'fuse-update-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · 043930b1
      Linus Torvalds authored
      Pull fuse update from Miklos Szeredi:
      
       - Allow some write requests to proceed in parallel
      
       - Fix a performance problem with allow_sys_admin_access
      
       - Add a special kind of invalidation that doesn't immediately purge
         submounts
      
       - On revalidation treat the target of rename(RENAME_NOREPLACE) the same
         as open(O_EXCL)
      
       - Use type safe helpers for some mnt_userns transformations
      
      * tag 'fuse-update-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
        fuse: Rearrange fuse_allow_current_process checks
        fuse: allow non-extending parallel direct writes on the same file
        fuse: remove the unneeded result variable
        fuse: port to vfs{g,u}id_t and associated helpers
        fuse: Remove user_ns check for FUSE_DEV_IOC_CLONE
        fuse: always revalidate rename target dentry
        fuse: add "expire only" mode to FUSE_NOTIFY_INVAL_ENTRY
        fs/fuse: Replace kmap() with kmap_local_page()
      043930b1
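      The "non-extending parallel direct writes" item can be pictured as a
      small locking decision: a write that stays within the current file size
      may run alongside others, while an appending or extending write needs
      exclusive access. The sketch below is a userspace model under that
      assumption; the function names are hypothetical and this is not the
      fuse implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a write of len bytes at pos would grow a file of file_size bytes. */
static bool write_extends_file(uint64_t pos, uint64_t len, uint64_t file_size)
{
    assert(len <= UINT64_MAX - pos); /* this model does not handle overflow */
    return pos + len > file_size;
}

/*
 * Hypothetical policy: appends and extending writes serialize,
 * everything else may proceed in parallel.
 */
static bool needs_exclusive_lock(uint64_t pos, uint64_t len,
                                 uint64_t file_size, bool append)
{
    return append || write_extends_file(pos, len, file_size);
}
```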
    • Merge tag 'ovl-update-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs · 6df7cc22
      Linus Torvalds authored
      Pull overlayfs update from Miklos Szeredi:
      
       - Fix a couple of bugs found by syzbot
      
       - Don't ignore some open flags set by fcntl(F_SETFL)
      
       - Fix failure to create a hard link in certain cases
      
       - Use type safe helpers for some mnt_userns transformations
      
       - Improve performance of mount
      
       - Misc cleanups
      
      * tag 'ovl-update-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
        ovl: Kconfig: Fix spelling mistake "undelying" -> "underlying"
        ovl: use inode instead of dentry where possible
        ovl: Add comment on upperredirect reassignment
        ovl: use plain list filler in indexdir and workdir cleanup
        ovl: do not reconnect upper index records in ovl_indexdir_cleanup()
        ovl: fix comment typos
        ovl: port to vfs{g,u}id_t and associated helpers
        ovl: Use ovl mounter's fsuid and fsgid in ovl_link()
        ovl: Use "buf" flexible array for memcpy() destination
        ovl: update ->f_iocb_flags when ovl_change_flags() modifies ->f_flags
        ovl: fix use inode directly in rcu-walk mode
      6df7cc22
    • Merge tag 'erofs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 4a6bff11
      Linus Torvalds authored
      Pull erofs updates from Gao Xiang:
       "In this cycle, large folios are now enabled in the iomap/fscache mode
        for uncompressed files first. In order to do that, we've also cleaned
        up better interfaces between erofs and fscache, which are acked by
        fscache/netfs folks and included in this pull request.
      
        Other than that, there are random fixes around erofs over fscache and
        crafted images by syzbot, minor cleanups and documentation updates.
      
        Summary:
      
         - Enable large folios for iomap/fscache mode
      
         - Avoid sysfs warning due to mounting twice with the same fsid and
           domain_id in fscache mode
      
         - Refine fscache interface among erofs, fscache, and cachefiles
      
         - Use kmap_local_page() only for metabuf
      
         - Fixes around crafted images found by syzbot
      
         - Minor cleanups and documentation updates"
      
      * tag 'erofs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        erofs: validate the extent length for uncompressed pclusters
        erofs: fix missing unmap if z_erofs_get_extent_compressedlen() fails
        erofs: Fix pcluster memleak when its block address is zero
        erofs: use kmap_local_page() only for erofs_bread()
        erofs: enable large folios for fscache mode
        erofs: support large folios for fscache mode
        erofs: switch to prepare_ondemand_read() in fscache mode
        fscache,cachefiles: add prepare_ondemand_read() callback
        erofs: clean up cached I/O strategies
        erofs: update documentation
        erofs: check the uniqueness of fsid in shared domain in advance
        erofs: enable large folios for iomap mode
      4a6bff11
    • Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt · ad0d9da1
      Linus Torvalds authored
      Pull fsverity updates from Eric Biggers:
       "The main change this cycle is to stop using the PG_error flag to track
        verity failures, and instead just track failures at the bio level.
        This follows a similar fscrypt change that went into 6.1, and it is a
        step towards freeing up PG_error for other uses.
      
        There's also one other small cleanup"
      
      * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
        fsverity: simplify fsverity_get_digest()
        fsverity: stop using PG_error to track error status
      ad0d9da1
    • Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt · 8129bac6
      Linus Torvalds authored
      Pull fscrypt updates from Eric Biggers:
       "This release adds SM4 encryption support, contributed by Tianjia
        Zhang. SM4 is a Chinese block cipher that is an alternative to AES.
      
        I recommend against using SM4, but (according to Tianjia) some people
        are being required to use it. Since SM4 has been turning up in many
        other places (crypto API, wireless, TLS, OpenSSL, ARMv8 CPUs, etc.),
        it hasn't been very controversial, and some people have to use it, I
        don't think it would be fair for me to reject this optional feature.
      
        Besides the above, there are a couple cleanups"
      
      * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
        fscrypt: add additional documentation for SM4 support
        fscrypt: remove unused Speck definitions
        fscrypt: Add SM4 XTS/CTS symmetric algorithm support
        blk-crypto: Add support for SM4-XTS blk crypto mode
        fscrypt: add comment for fscrypt_valid_enc_modes_v1()
        fscrypt: pass super_block to fscrypt_put_master_key_activeref()
      8129bac6
    • Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · deb9acc1
      Linus Torvalds authored
      Pull ext4 updates from Ted Ts'o:
       "A large number of cleanups and bug fixes, with many of the bug fixes
        found by Syzbot and fuzzing. (Many of the bug fixes involve less-used
        ext4 features such as fast_commit, inline_data and bigalloc)
      
        In addition, remove the writepage function for ext4, since the
        medium-term plan is to remove ->writepage() entirely. (The VM doesn't
        need or want writepage() for writeback, since it is fine with
        ->writepages() so long as ->migrate_folio() is implemented)"
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits)
        ext4: fix reserved cluster accounting in __es_remove_extent()
        ext4: fix inode leak in ext4_xattr_inode_create() on an error path
        ext4: allocate extended attribute value in vmalloc area
        ext4: avoid unaccounted block allocation when expanding inode
        ext4: initialize quota before expanding inode in setproject ioctl
        ext4: stop providing .writepage hook
        mm: export buffer_migrate_folio_norefs()
        ext4: switch to using write_cache_pages() for data=journal writeout
        jbd2: switch jbd2_submit_inode_data() to use fs-provided hook for data writeout
        ext4: switch to using ext4_do_writepages() for ordered data writeout
        ext4: move percpu_rwsem protection into ext4_writepages()
        ext4: provide ext4_do_writepages()
        ext4: add support for writepages calls that cannot map blocks
        ext4: drop pointless IO submission from ext4_bio_write_page()
        ext4: remove nr_submitted from ext4_bio_write_page()
        ext4: move keep_towrite handling to ext4_bio_write_page()
        ext4: handle redirtying in ext4_bio_write_page()
        ext4: fix kernel BUG in 'ext4_write_inline_data_end()'
        ext4: make ext4_mb_initialize_context return void
        ext4: fix deadlock due to mbcache entry corruption
        ...
      deb9acc1
    • Merge tag 'fs.idmapped.mnt_idmap.v6.2' of... · 9b93f506
      Linus Torvalds authored
      Merge tag 'fs.idmapped.mnt_idmap.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
      
      Pull idmapping updates from Christian Brauner:
       "Last cycle we've already made the interaction with idmapped mounts
        more robust and type safe by introducing the vfs{g,u}id_t type. This
        cycle we concluded the conversion and removed the legacy helpers.
      
        Currently we still pass around the plain namespace that was attached
        to a mount. This is in general pretty convenient but it makes it easy
        to conflate namespaces that are relevant on the filesystem level with
        namespaces that are relevant on the mount level. Especially for
        filesystem developers without detailed knowledge in this area this can
        be a potential source for bugs.
      
        Instead of passing the plain namespace we introduce a dedicated type
        struct mnt_idmap and replace the pointer with a pointer to a struct
        mnt_idmap. There are no semantic or size changes for the mount struct
        caused by this.
      
        We then start converting all places aware of idmapped mounts to rely
        on struct mnt_idmap. Once the conversion is done all helpers down to
        the really low-level make_vfs{g,u}id() and from_vfs{g,u}id() will take
        a struct mnt_idmap argument instead of two namespace arguments. This
        way it becomes impossible to conflate the two, eliminating the
        possibility of such bugs. Fwiw, I fixed some issues in that area a
        while ago in ntfs3 and ksmbd. Afterwards only
        low-level code can ultimately use the associated namespace for any
        permission checks. Even most of the vfs can be completely oblivious
        about this ultimately and filesystems will never interact with it in
        any form in the future.
      
        A struct mnt_idmap currently encompasses a simple refcount and pointer
        to the relevant namespace the mount is idmapped to. If a mount isn't
        idmapped then it will point to a static nop_mnt_idmap; if it points
        to anything else, the mount is idmapped. As usual there are no allocations
        or anything happening for non-idmapped mounts. Everything is carefully
        written to be a nop for non-idmapped mounts as has always been the
        case.
      
        If an idmapped mount is created a struct mnt_idmap is allocated and a
        reference taken on the relevant namespace. Each mount that gets
        idmapped or inherits the idmap simply bumps the reference count on
        struct mnt_idmap. Just a reminder that we only allow a mount to change
        its idmapping a single time and only if it hasn't already been
        attached to the filesystems and has no active writers.
      
        The actual changes are fairly straightforward but this will have huge
        benefits for maintenance and security in the long run even if it
        causes some churn.
      
        Note that this also makes it possible to extend struct mnt_idmap in
        the future. For example, it would be possible to place the namespace
        pointer in an anonymous union together with an idmapping struct. This
        would allow us to expose an api to userspace that would let it specify
        idmappings directly instead of having to go through the detour of
        setting up namespaces at all"
      
      * tag 'fs.idmapped.mnt_idmap.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
        acl: conver higher-level helpers to rely on mnt_idmap
        fs: introduce dedicated idmap type for mounts
      9b93f506
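      The lifetime rules in the pull message above (a static no-op idmap for
      non-idmapped mounts, a refcounted allocation otherwise) can be sketched
      as a small userspace model. All type and function names below are
      hypothetical stand-ins chosen for illustration; this is not the kernel
      implementation and it uses a plain counter instead of a real refcount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for a user namespace. */
struct user_ns_model {
    int id;
};

/* Model of the dedicated idmap type: a refcount plus a namespace pointer. */
struct idmap_model {
    struct user_ns_model *owner;
    unsigned int count;
};

static struct user_ns_model init_ns_model = { .id = 0 };

/* Shared by every non-idmapped mount; never allocated, never freed. */
static struct idmap_model nop_idmap_model = {
    .owner = &init_ns_model,
    .count = 1,
};

/* A mount is idmapped exactly when it does not point at the no-op idmap. */
static bool model_is_idmapped(const struct idmap_model *idmap)
{
    return idmap != &nop_idmap_model;
}

/* Creating an idmapped mount allocates an idmap holding one reference. */
static struct idmap_model *model_idmap_alloc(struct user_ns_model *ns)
{
    struct idmap_model *idmap = malloc(sizeof(*idmap));

    if (!idmap)
        return NULL;
    idmap->owner = ns;
    idmap->count = 1;
    return idmap;
}

/* Inheriting an idmap bumps the count; the no-op idmap is left untouched. */
static struct idmap_model *model_idmap_get(struct idmap_model *idmap)
{
    if (idmap != &nop_idmap_model)
        idmap->count++;
    return idmap;
}

/* Returns true when the last reference was dropped and the idmap freed. */
static bool model_idmap_put(struct idmap_model *idmap)
{
    if (idmap == &nop_idmap_model)
        return false;
    assert(idmap->count > 0);
    if (--idmap->count == 0) {
        free(idmap);
        return true;
    }
    return false;
}
```

      Because the non-idmapped case never touches the allocator or the
      counter, it stays a no-op, matching the description above.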
    • Merge tag 'fs.vfsuid.conversion.v6.2' of... · e1212e9b
      Linus Torvalds authored
      Merge tag 'fs.vfsuid.conversion.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
      
      Pull vfsuid updates from Christian Brauner:
       "Last cycle we introduced the vfs{g,u}id_t types and associated helpers
        to gain type safety when dealing with idmapped mounts. That initial
        work already converted a lot of places over but there were still some
        left.
      
        This converts all remaining places that still make use of non-type
        safe idmapping helpers to rely on the new type safe vfs{g,u}id based
        helpers.
      
        Afterwards it removes all the old non-type safe helpers"
      
      * tag 'fs.vfsuid.conversion.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
        fs: remove unused idmapping helpers
        ovl: port to vfs{g,u}id_t and associated helpers
        fuse: port to vfs{g,u}id_t and associated helpers
        ima: use type safe idmapping helpers
        apparmor: use type safe idmapping helpers
        caps: use type safe idmapping helpers
        fs: use type safe idmapping helpers
        mnt_idmapping: add missing helpers
      e1212e9b
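      The type-safety argument can be illustrated with wrapper structs: once
      filesystem-level and mount-level ids are distinct types, passing one
      where the other is expected no longer compiles. The sketch below is a
      userspace model with hypothetical names and a single contiguous mapping
      range, which is a deliberate simplification of real idmappings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Distinct wrapper types: mixing them up is a compile-time error. */
typedef struct { uint32_t val; } kuid_model_t;   /* id as seen by the fs */
typedef struct { uint32_t val; } vfsuid_model_t; /* id as seen via the mount */

#define MODEL_INVALID_ID UINT32_MAX

/* One contiguous range: fs ids [fs_first, fs_first + count) map to mnt_first.. */
struct id_range_model {
    uint32_t fs_first;
    uint32_t mnt_first;
    uint32_t count;
};

/* Map a filesystem-level id into the mount's view, or mark it invalid. */
static vfsuid_model_t model_make_vfsuid(const struct id_range_model *range,
                                        kuid_model_t kuid)
{
    vfsuid_model_t out = { MODEL_INVALID_ID };

    if (kuid.val >= range->fs_first &&
        kuid.val - range->fs_first < range->count)
        out.val = range->mnt_first + (kuid.val - range->fs_first);
    return out;
}

static bool model_vfsuid_valid(vfsuid_model_t vfsuid)
{
    return vfsuid.val != MODEL_INVALID_ID;
}

/* Comparisons go through an explicit helper; an invalid id never matches. */
static bool model_vfsuid_eq_kuid(vfsuid_model_t vfsuid, kuid_model_t kuid)
{
    return model_vfsuid_valid(vfsuid) && vfsuid.val == kuid.val;
}
```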
    • Merge tag 'fs.ovl.setgid.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping · cf619f89
      Linus Torvalds authored
      Pull setgid inheritance updates from Christian Brauner:
       "This contains the work to make setgid inheritance consistent between
        modifying a file and when changing ownership or mode as this has been
        a repeated source of very subtle bugs. The gist is that we perform the
        same permission checks in the write path as we do in the ownership and
        mode changing paths after this series where we're currently doing
        different things.
      
        We've already made setgid inheritance a lot more consistent and
        reliable in the last releases by moving setgid stripping from the
        individual filesystems up into the vfs. This aims to make the logic
        even more consistent and easier to understand and also to fix
        long-standing overlayfs setgid inheritance bugs. Miklos was nice
        enough to just let me carry the trivial overlayfs patches from Amir
        too.
      
        Below is a more detailed explanation how the current difference in
        setgid handling leads to very subtle bugs exemplified via overlayfs
        which is a victim of the current rules. I hope this explains why I
        think taking the regression risk here is worth it.
      
        A long while ago I found a few setgid inheritance bugs in overlayfs in
        the write path in certain conditions. Amir recently picked this back
        up in [1] and I jumped on board to fix this more generally.
      
        On the surface all that overlayfs would need to fix setgid inheritance
        would be to call file_remove_privs() or file_modified() but actually
        that isn't enough because the setgid inheritance api is wildly
        inconsistent in that area.
      
        Before this pr setgid stripping in file_remove_privs()'s old
        should_remove_suid() helper was inconsistent with other parts of the
        vfs. Specifically, it only raises ATTR_KILL_SGID if the inode is
        S_ISGID and S_IXGRP but not if the inode isn't in the caller's groups
        and the caller isn't privileged over the inode although we require
        this already in setattr_prepare() and setattr_copy() and so all
        filesystems implement this requirement implicitly because they have to
        use setattr_{prepare,copy}() anyway.
      
        But the inconsistency shows up in setgid stripping bugs for overlayfs
        in xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
        generic/687). For example, we test whether suid and setgid stripping
        works correctly when performing various write-like operations as an
        unprivileged user (fallocate, reflink, write, etc.):
      
            echo "Test 1 - qa_user, non-exec file $verb"
            setup_testfile
            chmod a+rws $junk_file
            commit_and_check "$qa_user" "$verb" 64k 64k
      
        The test basically creates a file with 6666 permissions. While the
        file has the S_ISUID and S_ISGID bits set it does not have the S_IXGRP
        bit set.
      
        On a regular filesystem like xfs what will happen is:
      
            sys_fallocate()
            -> vfs_fallocate()
               -> xfs_file_fallocate()
                  -> file_modified()
                     -> __file_remove_privs()
                        -> dentry_needs_remove_privs()
                           -> should_remove_suid()
                        -> __remove_privs()
                           newattrs.ia_valid = ATTR_FORCE | kill;
                           -> notify_change()
                              -> setattr_copy()
      
        In should_remove_suid() we can see that ATTR_KILL_SUID is raised
        unconditionally because the file in the test has S_ISUID set.
      
        But we also see that ATTR_KILL_SGID won't be set because while the
        file is S_ISGID it is not S_IXGRP (see above) which is a condition for
        ATTR_KILL_SGID being raised.
      
        So by the time we call notify_change() we have attr->ia_valid set to
        ATTR_KILL_SUID | ATTR_FORCE.
      
        Now notify_change() sees that ATTR_KILL_SUID is set and does:
      
            ia_valid      = attr->ia_valid |= ATTR_MODE
            attr->ia_mode = (inode->i_mode & ~S_ISUID);
      
        which means that when we call setattr_copy() later we will definitely
        update inode->i_mode. Note that attr->ia_mode still contains S_ISGID.
      
        Now we call into the filesystem's ->setattr() inode operation which
        will end up calling setattr_copy(). Since ATTR_MODE is set we will
        hit:
      
            if (ia_valid & ATTR_MODE) {
                    umode_t mode = attr->ia_mode;
                    vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
                    if (!vfsgid_in_group_p(vfsgid) &&
                        !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
                            mode &= ~S_ISGID;
                    inode->i_mode = mode;
            }
      
        and since the caller in the test is neither capable nor in the group
        of the inode the S_ISGID bit is stripped.
      
        But assume the file isn't suid. Then ATTR_KILL_SUID won't be raised,
        so no mode update happens at all and the setgid bit is not stripped
        even though it should be, because the inode isn't in the caller's
        groups and the caller isn't privileged over the inode.
      
        If overlayfs is in the mix things become a bit more complicated and
        the bug shows up more clearly.
      
        When e.g., ovl_setattr() is hit from ovl_fallocate()'s call to
        file_remove_privs() then ATTR_KILL_SUID and ATTR_KILL_SGID might be
        raised but because the check in notify_change() is questioning the
        ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be stripped
        the S_ISGID bit isn't removed even though it should be stripped:
      
            sys_fallocate()
            -> vfs_fallocate()
               -> ovl_fallocate()
                  -> file_remove_privs()
                     -> dentry_needs_remove_privs()
                        -> should_remove_suid()
                     -> __remove_privs()
                        newattrs.ia_valid = ATTR_FORCE | kill;
                        -> notify_change()
                           -> ovl_setattr()
                              /* TAKE ON MOUNTER'S CREDS */
                              -> ovl_do_notify_change()
                                 -> notify_change()
                              /* GIVE UP MOUNTER'S CREDS */
                 /* TAKE ON MOUNTER'S CREDS */
                 -> vfs_fallocate()
                    -> xfs_file_fallocate()
                       -> file_modified()
                          -> __file_remove_privs()
                             -> dentry_needs_remove_privs()
                                -> should_remove_suid()
                             -> __remove_privs()
                                newattrs.ia_valid = ATTR_FORCE | kill;
                                -> notify_change()
      
        The fix for all of this is to make file_remove_privs()'s
        should_remove_suid() helper perform the same checks as we already
        require in setattr_prepare() and setattr_copy() and have
        notify_change() not pointlessly require S_IXGRP again. It doesn't
        make any sense in the first place because the caller must calculate
        the flags via should_remove_suid() anyway which would raise
        ATTR_KILL_SGID.
      
        Note that some xfstests will now fail as these patches will cause the
        setgid bit to be lost in certain conditions for unprivileged users
        modifying a setgid file when they would've been kept otherwise. I
        think this risk is worth taking and I explained and mentioned this
        multiple times on the list [2].
      
        Enforcing the rules consistently across write operations and
        chmod/chown will lead to losing the setgid bit in cases where it
        might've been retained before.
      
        I've mentioned this a few times but it's worth repeating just to
        make sure that this is understood: for the sake of maintainability,
        consistency, and security this is a risk worth taking.
      
        If we really see regressions for workloads the fix is to have special
        setgid handling in the write path again with different semantics from
        chmod/chown and possibly additional duct tape for overlayfs. I'll
        update the relevant xfstests if you should decide to merge this
        second setgid cleanup.
      
        Before that people should be aware that there might be failures for
        fstests where unprivileged users modify a setgid file"
      
      Link: https://lore.kernel.org/linux-fsdevel/20221003123040.900827-1-amir73il@gmail.com [1]
      Link: https://lore.kernel.org/linux-fsdevel/20221122142010.zchf2jz2oymx55qi@wittgenstein [2]
      
      * tag 'fs.ovl.setgid.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
        fs: use consistent setgid checks in is_sxid()
        ovl: remove privs in ovl_fallocate()
        ovl: remove privs in ovl_copyfile()
        attr: use consistent sgid stripping checks
        attr: add setattr_should_drop_sgid()
        fs: move should_remove_suid()
        attr: add in_group_or_capable()
      cf619f89
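      The difference between the old and the new flag calculation described in
      the pull message above can be modeled in a few lines of userspace C.
      This is a sketch only: the MODEL_KILL_* values and both function names
      are hypothetical, the in_group_or_capable argument stands in for the
      group membership and capability checks, and the surrounding checks the
      kernel performs (for example that the file is a regular file) are left
      out.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical flag values for this model only. */
#define MODEL_KILL_SUID 0x1
#define MODEL_KILL_SGID 0x2

/* Old rule: setgid is only dropped when the group-execute bit is also set. */
static int old_kill_flags(mode_t mode)
{
    int kill = 0;

    if (mode & S_ISUID)
        kill |= MODEL_KILL_SUID;
    if ((mode & S_ISGID) && (mode & S_IXGRP))
        kill |= MODEL_KILL_SGID;
    return kill;
}

/*
 * New rule: setgid is also dropped without group-execute when the caller is
 * neither in the file's group nor privileged over the inode.
 */
static int new_kill_flags(mode_t mode, bool in_group_or_capable)
{
    int kill = 0;

    if (mode & S_ISUID)
        kill |= MODEL_KILL_SUID;
    if ((mode & S_ISGID) && ((mode & S_IXGRP) || !in_group_or_capable))
        kill |= MODEL_KILL_SGID;
    return kill;
}
```

      For the 6666 test file the old rule raises only the suid flag, so the
      setgid bit was stripped later more or less by accident; for a 2666 file
      the old rule raises nothing at all, which is the case the series fixes.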
    • Merge tag 'fs.acl.rework.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping · 6a518afc
      Linus Torvalds authored
      Pull VFS acl updates from Christian Brauner:
       "This contains the work that builds a dedicated vfs posix acl api.
      
        The origins of this work trace back to v5.19 but it took quite a while
        to understand the various filesystem specific implementations in
        sufficient detail and also come up with an acceptable solution.
      
        As we have discussed and seen multiple times the current state of how posix
        acls are handled isn't nice and comes with a lot of problems: The
        current way of handling posix acls via the generic xattr api is error
        prone, hard to maintain, and type unsafe for the vfs until we call
        into the filesystem's dedicated get and set inode operations.
      
        It is already the case that posix acls are special-cased to death all
        the way through the vfs. There are an uncounted number of hacks that
        operate on the uapi posix acl struct instead of the dedicated vfs
        struct posix_acl. And the vfs must be involved in order to interpret
        and fixup posix acls before storing them to the backing store, caching
        them, reporting them to userspace, or for permission checking.
      
        Currently a range of hacks and duct tape exist to make this work. As
        with most things this is really no one's fault; it's just something that
        happened over time. But the code is hard to understand and difficult
        to maintain and one is constantly at risk of introducing bugs and
        regressions when having to touch it.
      
        Instead of continuing to hack posix acls through the xattr handlers
        this series builds a dedicated posix acl api solely around the get and
        set inode operations.
      
        Going forward, the vfs_get_acl(), vfs_remove_acl(), and vfs_set_acl()
        helpers must be used in order to interact with posix acls. They
        operate directly on the vfs internal struct posix_acl instead of
        abusing the uapi posix acl struct as we currently do. In the end this
        removes all of the hackiness, makes the codepaths easier to maintain,
        and gets us type safety.
      
        This series passes the LTP and xfstests suites without any
        regressions. For xfstests the following combinations were tested:
         - xfs
         - ext4
         - btrfs
         - overlayfs
         - overlayfs on top of idmapped mounts
         - orangefs
         - (limited) cifs
      
        There are more simplifications for posix acls that we can make in the
        future if the basic api has made it.
      
        A few implementation details:
      
         - The series makes sure to retain exactly the same security and
           integrity module permission checks. Especially for the integrity
           modules this api is a win because right now they convert the uapi
           posix acl struct passed to them via a void pointer into the vfs
           struct posix_acl format to perform permission checking on the mode.
      
           There's a new dedicated security hook for setting posix acls which
           passes the vfs struct posix_acl not a void pointer. Basing checking
           on the posix acl stored in the uapi format is really unreliable.
           The vfs currently hacks around directly in the uapi struct storing
           values that frankly the security and integrity modules can't
           correctly interpret as evidenced by bugs we reported and fixed in
           this area. It's not necessarily even their fault; it's just that the
           format we provide to them is suboptimal.
      
         - Some filesystems like 9p and cifs need access to the dentry in
           order to get and set posix acls which is why they either only
           partially or not even at all implement get and set inode
           operations. For example, cifs allows setxattr() and getxattr()
           operations but doesn't allow permission checking based on posix
           acls because it can't implement a get acl inode operation.
      
           Thus, this patch series updates the set acl inode operation to take
           a dentry instead of an inode argument. However, for the get acl
           inode operation we can't do this as the old get acl method is
           called in e.g., generic_permission() and inode_permission(). These
           helpers in turn are called in various filesystems' permission inode
           operations. So passing a dentry argument to the old get acl inode
           operation would amount to passing a dentry to the permission inode
           operation which we shouldn't and probably can't do.
      
           So instead of extending the existing inode operation Christoph
           suggested to add a new one. He also requested to ensure that the
           get and set acl inode operation taking a dentry are consistently
           named. So for this version the old get acl operation is renamed to
           ->get_inode_acl() and a new ->get_acl() inode operation taking a
           dentry is added. With this we can give both 9p and cifs get and set
           acl inode operations and in turn remove their complex custom posix
           xattr handlers.
      
           In the future I hope to get rid of the inode method duplication but
           it isn't like we have never had this situation. Readdir is just one
           example. And frankly, the overall gain in type safety and the more
           pleasant api are simply too big of a benefit to not accept
           this duplication for a while.
      
         - We've done a full audit of every codepath using a variant of the
           current generic xattr api to get and set posix acls and
           surprisingly it isn't that many places. There's of course always a
           chance that we might have missed some and if so I'm sure we'll find
           them soon enough.
      
           The crucial codepaths to be converted are obviously stacking
           filesystems such as ecryptfs and overlayfs.
      
           For a list of all callers currently using generic xattr api helpers
           see [2] including comments whether they support posix acls or not.
      
         - The old vfs generic posix acl infrastructure doesn't obey the
           create and replace semantics promised on the setxattr(2) manpage.
           This patch series doesn't address this. It really is something we
           should revisit later though.
      
        The patches are roughly organized as follows:
      
         (1) Change existing set acl inode operation to take a dentry
             argument (Intended to be a non-functional change)
      
         (2) Rename existing get acl method (Intended to be a non-functional
             change)
      
         (3) Implement get and set acl inode operations for filesystems that
             couldn't implement one before because of the missing dentry.
             That's mostly 9p and cifs (Intended to be a non-functional
             change)
      
         (4) Build posix acl api, i.e., add vfs_get_acl(), vfs_remove_acl(),
             and vfs_set_acl() including security and integrity hooks
             (Intended to be a non-functional change)
      
         (5) Implement get and set acl inode operations for stacking
             filesystems (Intended to be a non-functional change)
      
         (6) Switch posix acl handling in stacking filesystems to new posix
             acl api now that all filesystems it can stack upon support it.
      
         (7) Switch vfs to new posix acl api (semantical change)
      
         (8) Remove all now unused helpers
      
         (9) Additional regression fixes reported after we merged this into
             linux-next
      
        Thanks to Seth for a lot of good discussion around this and
        encouragement and input from Christoph"
      
      * tag 'fs.acl.rework.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (36 commits)
        posix_acl: Fix the type of sentinel in get_acl
        orangefs: fix mode handling
        ovl: call posix_acl_release() after error checking
        evm: remove dead code in evm_inode_set_acl()
        cifs: check whether acl is valid early
        acl: make vfs_posix_acl_to_xattr() static
        acl: remove a slew of now unused helpers
        9p: use stub posix acl handlers
        cifs: use stub posix acl handlers
        ovl: use stub posix acl handlers
        ecryptfs: use stub posix acl handlers
        evm: remove evm_xattr_acl_change()
        xattr: use posix acl api
        ovl: use posix acl api
        ovl: implement set acl method
        ovl: implement get acl method
        ecryptfs: implement set acl method
        ecryptfs: implement get acl method
        ksmbd: use vfs_remove_acl()
        acl: add vfs_remove_acl()
        ...
      6a518afc
    • Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · bd907413
      Linus Torvalds authored
      Pull misc vfs updates from Al Viro:
       "misc pile"
      
      * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs: sysv: Fix sysv_nblocks() returns wrong value
        get rid of INT_LIMIT, use type_max() instead
        btrfs: replace INT_LIMIT(loff_t) with OFFSET_MAX
        fs: simplify vfs_get_super
        fs: drop useless condition from inode_needs_update_time
      bd907413
    • Merge tag 'pull-namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 13c574fe
      Linus Torvalds authored
      Pull namespace fix from Al Viro:
       "Fix weird corner case in copy_mnt_ns()"
      
      * tag 'pull-namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        copy_mnt_ns(): handle a corner case (overmounted mntns bindings) saner
      13c574fe
    • Merge tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 75f4d9af
      Linus Torvalds authored
      Pull iov_iter updates from Al Viro:
       "iov_iter work; most of that is about getting rid of direction
        misannotations and (hopefully) preventing more of the same for the
        future"
      
      * tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        use less confusing names for iov_iter direction initializers
        iov_iter: saner checks for attempt to copy to/from iterator
        [xen] fix "direction" argument of iov_iter_kvec()
        [vhost] fix 'direction' argument of iov_iter_{init,bvec}()
        [target] fix iov_iter_bvec() "direction" argument
        [s390] memcpy_real(): WRITE is "data source", not destination...
        [s390] zcore: WRITE is "data source", not destination...
        [infiniband] READ is "data destination", not source...
        [fsi] WRITE is "data source", not destination...
        [s390] copy_oldmem_kernel() - WRITE is "data source", not destination
        csum_and_copy_to_iter(): handle ITER_DISCARD
        get rid of unlikely() on page_copy_sane() calls
      75f4d9af
    • Merge tag 'pull-alpha' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 268369b1
      Linus Torvalds authored
      Pull alpha updates from Al Viro:
       "Alpha architecture cleanups and fixes.
      
        One thing *not* included is lazy FPU switching stuff - this pile is
        just the straightforward stuff"
      
      * tag 'pull-alpha' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        alpha: ret_from_fork can go straight to ret_to_user
        alpha: syscall exit cleanup
        alpha: fix handling of a3 on straced syscalls
        alpha: fix syscall entry in !AUDUT_SYSCALL case
        alpha: _TIF_ALLWORK_MASK is unused
        alpha: fix TIF_NOTIFY_SIGNAL handling
      268369b1
    • Merge tag 'pull-elfcore' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 405b2fc6
      Linus Torvalds authored
      Pull elf coredumping updates from Al Viro:
       "Unification of regset and non-regset sides of ELF coredump handling.
      
        Collecting per-thread register values is the only thing that needs to
        be ifdefed there..."
      
      * tag 'pull-elfcore' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        [elf] get rid of get_note_info_size()
        [elf] unify regset and non-regset cases
        [elf][non-regset] use elf_core_copy_task_regs() for dumper as well
        [elf][non-regset] uninline elf_core_copy_task_fpregs() (and lose pt_regs argument)
        elf_core_copy_task_regs(): task_pt_regs is defined everywhere
        [elf][regset] simplify thread list handling in fill_note_info()
        [elf][regset] clean fill_note_info() a bit
        kill extern of vsyscall32_sysctl
        kill coredump_params->regs
        kill signal_pt_regs()
      405b2fc6
    • Merge tag 'mm-nonmm-stable-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm · 8702f2c6
      Linus Torvalds authored
      Pull non-MM updates from Andrew Morton:
      
       - A ptrace API cleanup series from Sergey Shtylyov
      
       - Fixes and cleanups for kexec from ye xingchen
      
       - nilfs2 updates from Ryusuke Konishi
      
       - squashfs feature work from Xiaoming Ni: permit configuration of the
         filesystem's compression concurrency from the mount command line
      
       - A series from Akinobu Mita which addresses bound checking errors when
         writing to debugfs files
      
       - A series from Yang Yingliang to address rapidio memory leaks
      
       - A series from Zheng Yejian to address possible overflow errors in
         encode_comp_t()
      
       - And a whole shower of singleton patches all over the place
      
      * tag 'mm-nonmm-stable-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (79 commits)
        ipc: fix memory leak in init_mqueue_fs()
        hfsplus: fix bug causing custom uid and gid being unable to be assigned with mount
        rapidio: devices: fix missing put_device in mport_cdev_open
        kcov: fix spelling typos in comments
        hfs: Fix OOB Write in hfs_asc2mac
        hfs: fix OOB Read in __hfs_brec_find
        relay: fix type mismatch when allocating memory in relay_create_buf()
        ocfs2: always read both high and low parts of dinode link count
        io-mapping: move some code within the include guarded section
        kernel: kcsan: kcsan_test: build without structleak plugin
        mailmap: update email for Iskren Chernev
        eventfd: change int to __u64 in eventfd_signal() ifndef CONFIG_EVENTFD
        rapidio: fix possible UAF when kfifo_alloc() fails
        relay: use strscpy() is more robust and safer
        cpumask: limit visibility of FORCE_NR_CPUS
        acct: fix potential integer overflow in encode_comp_t()
        acct: fix accuracy loss for input value of encode_comp_t()
        linux/init.h: include <linux/build_bug.h> and <linux/stringify.h>
        rapidio: rio: fix possible name leak in rio_register_mport()
        rapidio: fix possible name leaks when rio_add_device() fails
        ...
      8702f2c6
    • Merge tag 'docs-6.2' of git://git.lwn.net/linux · a7cacfb0
      Linus Torvalds authored
      Pull documentation updates from Jonathan Corbet:
       "This was a not-too-busy cycle for documentation; highlights include:
      
         - The beginnings of a set of translations into Spanish, headed up by
           Carlos Bilbao
      
         - More Chinese translations
      
         - A change to the Sphinx "alabaster" theme by default for HTML
           generation.
      
           Unlike the previous default (Read the Docs), alabaster is shipped
           with Sphinx by default, reducing the number of other dependencies
           that need to be installed. It also (IMO) produces a cleaner and
           more readable result.
      
         - The ability to render the documentation into the texinfo format
           (something Sphinx could always do, we just never wired it up until
           now)
      
        Plus the usual collection of typo fixes, build-warning fixes, and
        minor updates"
      
      * tag 'docs-6.2' of git://git.lwn.net/linux: (67 commits)
        Documentation/features: Use loongarch instead of loong
        Documentation/features-refresh.sh: Only sed the beginning "arch" of ARCH_DIR
        docs/zh_CN: Fix '.. only::' directive's expression
        docs/sp_SP: Add memory-barriers.txt Spanish translation
        docs/zh_CN/LoongArch: Update links of LoongArch ISA Vol1 and ELF psABI
        docs/LoongArch: Update links of LoongArch ISA Vol1 and ELF psABI
        Documentation/features: Update feature lists for 6.1
        Documentation: Fixed a typo in bootconfig.rst
        docs/sp_SP: Add process coding-style translation
        docs/sp_SP: Add kernel-docs.rst Spanish translation
        docs: Create translations/sp_SP/process/, move submitting-patches.rst
        docs: Add book to process/kernel-docs.rst
        docs: Retire old resources from kernel-docs.rst
        docs: Update maintainer of kernel-docs.rst
        Documentation: riscv: Document the sv57 VM layout
        Documentation: USB: correct possessive "its" usage
        math64: fix kernel-doc return value warnings
        math64: add kernel-doc for DIV64_U64_ROUND_UP
        math64: favor kernel-doc from header files
        doc: add texinfodocs and infodocs targets
        ...
      a7cacfb0
    • Merge tag 'rust-6.2' of https://github.com/Rust-for-Linux/linux · 96f42635
      Linus Torvalds authored
      Pull rust updates from Miguel Ojeda:
       "The first set of changes after the merge, the major ones being:
      
         - String and formatting: new types 'CString', 'CStr', 'BStr' and
           'Formatter'; new macros 'c_str!', 'b_str!' and 'fmt!'.
      
         - Errors: the rest of the error codes from 'errno-base.h', as well as
           some 'From' trait implementations for the 'Error' type.
      
         - Printing: the rest of the 'pr_*!' levels and the continuation one
           'pr_cont!', as well as a new sample.
      
         - 'alloc' crate: new constructors 'try_with_capacity()' and
           'try_with_capacity_in()' for 'RawVec' and 'Vec'.
      
         - Procedural macros: new macros '#[vtable]' and 'concat_idents!', as
           well as better ergonomics for 'module!' users.
      
         - Asserting: new macros 'static_assert!', 'build_error!' and
           'build_assert!', as well as a new crate 'build_error' to support
           them.
      
         - Vocabulary types: new types 'Opaque' and 'Either'.
      
         - Debugging: new macro 'dbg!'"
      
      * tag 'rust-6.2' of https://github.com/Rust-for-Linux/linux: (28 commits)
        rust: types: add `Opaque` type
        rust: types: add `Either` type
        rust: build_assert: add `build_{error,assert}!` macros
        rust: add `build_error` crate
        rust: static_assert: add `static_assert!` macro
        rust: std_vendor: add `dbg!` macro based on `std`'s one
        rust: str: add `fmt!` macro
        rust: str: add `CString` type
        rust: str: add `Formatter` type
        rust: str: add `c_str!` macro
        rust: str: add `CStr` unit tests
        rust: str: implement several traits for `CStr`
        rust: str: add `CStr` type
        rust: str: add `b_str!` macro
        rust: str: add `BStr` type
        rust: alloc: add `Vec::try_with_capacity{,_in}()` constructors
        rust: alloc: add `RawVec::try_with_capacity_in()` constructor
        rust: prelude: add `error::code::*` constant items
        rust: error: add `From` implementations for `Error`
        rust: error: add codes from `errno-base.h`
        ...
      96f42635
    • Merge tag 'trace-tools-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · eb451153
      Linus Torvalds authored
      Pull tracing tools updates from Steven Rostedt:
      
       - New tool "rv" for starting and stopping runtime verification.
         Example:
      
            ./rv mon wip -r printk -v
      
         Enables the wake-in-preempt monitor and the printk reactor in verbose
         mode
      
       - Fix exit status of rtla usage() calls
      
      * tag 'trace-tools-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        Documentation/rv: Add verification/rv man pages
        tools/rv: Add in-kernel monitor interface
        rv: Add rv tool
        rtla: Fix exit status when returning from calls to usage()
      eb451153
    • Merge tag 'ktest-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest · 535ea85d
      Linus Torvalds authored
      Pull ktest updates from Steven Rostedt:
      
       - Fix minconfig test to unset the config and not rely on
         olddefconfig to do it, as some configs are set to default y
      
       - Fix reading grub2 menus for handling submenus
      
       - Add new ${shell <cmd>} to execute shell commands that will be useful
         for setting variables like: HOSTNAME := ${shell hostname}
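
The `${shell <cmd>}` substitution in the last item can be pictured as a textual expansion that runs the command and splices in its output. A rough Python sketch of that idea follows. It is not ktest.pl's Perl implementation: the expand_shell() helper, the regular expression, and the choice to trim surrounding whitespace from the output are all assumptions made for illustration.

```python
import re
import subprocess

# Matches ${shell <cmd>} where <cmd> contains no closing brace (assumption).
SHELL_REF = re.compile(r"\$\{shell ([^}]*)\}")


def expand_shell(value: str) -> str:
    """Replace each ${shell <cmd>} with the trimmed stdout of <cmd>."""
    def run(match: re.Match) -> str:
        # Run the captured command through the shell and capture its output.
        result = subprocess.run(match.group(1), shell=True,
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()
    return SHELL_REF.sub(run, value)
```

For instance, expand_shell("HOSTNAME := ${shell hostname}") would return the line with the machine's host name substituted in.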
      
      * tag 'ktest-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
        ktest.pl: Add shell commands to variables
        kest.pl: Fix grub2 menu handling for rebooting
        ktest.pl minconfig: Unset configs instead of just removing them
      535ea85d
    • Merge tag 'linux-kselftest-kunit-next-6.2-rc1' of... · e2ed78d5
      Linus Torvalds authored
      Merge tag 'linux-kselftest-kunit-next-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull KUnit updates from Shuah Khan:
       "Several enhancements, fixes, clean-ups, documentation updates,
        improvements to logging and KTAP compliance of KUnit test output:
      
         - log numbers in decimal and hex
      
         - parse KTAP compliant test output
      
         - allow conditionally exposing static symbols to tests when KUNIT is
           enabled
      
         - make static symbols visible during kunit testing
      
         - clean-ups to remove unused structure definition"
      
      * tag 'linux-kselftest-kunit-next-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (29 commits)
        Documentation: dev-tools: Clarify requirements for result description
        apparmor: test: make static symbols visible during kunit testing
        kunit: add macro to allow conditionally exposing static symbols to tests
        kunit: tool: make parser preserve whitespace when printing test log
        Documentation: kunit: Fix "How Do I Use This" / "Next Steps" sections
        kunit: tool: don't include KTAP headers and the like in the test log
        kunit: improve KTAP compliance of KUnit test output
        kunit: tool: parse KTAP compliant test output
        mm: slub: test: Use the kunit_get_current_test() function
        kunit: Use the static key when retrieving the current test
        kunit: Provide a static key to check if KUnit is actively running tests
        kunit: tool: make --json do nothing if --raw_ouput is set
        kunit: tool: tweak error message when no KTAP found
        kunit: remove KUNIT_INIT_MEM_ASSERTION macro
        Documentation: kunit: Remove redundant 'tips.rst' page
        Documentation: KUnit: reword description of assertions
        Documentation: KUnit: make usage.rst a superset of tips.rst, remove duplication
        kunit: eliminate KUNIT_INIT_*_ASSERT_STRUCT macros
        kunit: tool: remove redundant file.close() call in unit test
        kunit: tool: unit tests all check parser errors, standardize formatting a bit
        ...
      e2ed78d5
    • Merge tag 'linux-kselftest-next-6.2-rc1' of... · 23a68d14
      Linus Torvalds authored
      Merge tag 'linux-kselftest-next-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull Kselftest updates from Shuah Khan:
       "Several fixes and enhancements to existing tests and a few new tests:
      
         - add new amd-pstate tests and fix and enhance existing ones
      
         - add new watchdog tests and enhance existing ones to improve
           coverage
      
         - fixes to ftrace, splice_read, rtc, and efivars tests
      
         - fixes to handle egrep obsolescence in the latest grep release
      
         - miscellaneous spelling and SPDX fixes"
      
      * tag 'linux-kselftest-next-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (24 commits)
        selftests/ftrace: Use long for synthetic event probe test
        selftests/tpm2: Split async tests call to separate shell script runner
        selftests: splice_read: Fix sysfs read cases
        selftests: ftrace: Use "grep -E" instead of "egrep"
        selftests: gpio: Use "grep -E" instead of "egrep"
        selftests: kselftest_deps: Use "grep -E" instead of "egrep"
        selftests/efivarfs: Add checking of the test return value
        cpufreq: amd-pstate: fix spdxcheck warnings for amd-pstate-ut.c
        selftests: rtc: skip when RTC is not present
        selftests/ftrace: event_triggers: wait longer for test_event_enable
        selftests/vDSO: Add riscv getcpu & gettimeofday test
        Documentation: amd-pstate: Add tbench and gitsource test introduction
        selftests: amd-pstate: Trigger gitsource benchmark and test cpus
        selftests: amd-pstate: Trigger tbench benchmark and test cpus
        selftests: amd-pstate: Split basic.sh into run.sh and basic.sh.
        selftests: amd-pstate: Rename amd-pstate-ut.sh to basic.sh.
        selftests/ftrace: Convert tracer tests to use 'requires' to specify program dependency
        selftests/ftrace: Add check for ping command for trigger tests
        selftests/watchdog: Fix spelling mistake "Temeprature" -> "Temperature"
        selftests/watchdog: add test for WDIOC_GETTEMP
        ...
      23a68d14
    • Merge tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random · 268325bd
      Linus Torvalds authored
      Pull random number generator updates from Jason Donenfeld:
      
       - Replace prandom_u32_max() and various open-coded variants of it;
         there is now a new family of functions that uses fast rejection
         sampling to choose properly uniformly random numbers within an
         interval:
      
             get_random_u32_below(ceil) - [0, ceil)
             get_random_u32_above(floor) - (floor, U32_MAX]
             get_random_u32_inclusive(floor, ceil) - [floor, ceil]
      
         Coccinelle was used to convert all current users of
         prandom_u32_max(), as well as many open-coded patterns, resulting in
         improvements throughout the tree.
      
         I'll have a "late" 6.2-rc1 pull for you that removes the now unused
         prandom_u32_max() function, just in case any other trees add a new
         use case of it that needs to be converted. According to linux-next,
         there may be two trivial cases of prandom_u32_max() reintroductions
         that are fixable with a 's/.../.../'. So I'll have for you a final
         conversion patch doing that alongside the removal patch during the
         second week.
      
         This is a treewide change that touches many files throughout.
      
       - More consistent use of get_random_canary().
      
       - Updates to comments, documentation, tests, headers, and
         simplification in configuration.
      
       - The arch_get_random*_early() abstraction was only used by arm64 and
         wasn't entirely useful, so this has been replaced by code that works
         in all relevant contexts.
      
       - The kernel will use and manage random seeds in non-volatile EFI
         variables, refreshing a variable with a fresh seed when the RNG is
         initialized. The RNG GUID namespace is then hidden from efivarfs to
         prevent accidental leakage.
      
         These changes are split into random.c infrastructure code used in the
         EFI subsystem, in this pull request, and related support inside of
         EFISTUB, in Ard's EFI tree. These are co-dependent for full
         functionality, but the order of merging doesn't matter.
      
       - Part of the infrastructure added for the EFI support is also used for
         an improvement to the way vsprintf initializes its siphash key,
         replacing a sleep loop wart.
      
       - The hardware RNG framework now always calls its correct random.c
         input function, add_hwgenerator_randomness(), rather than sometimes
         going through helpers better suited for other cases.
      
       - The add_latent_entropy() function has long been called from the fork
         handler, but is a no-op when the latent entropy gcc plugin isn't
         used, which is fine for the purposes of latent entropy.
      
         But it was missing out on the cycle counter that was also being mixed
         in beside the latent entropy variable. So now, if the latent entropy
         gcc plugin isn't enabled, add_latent_entropy() will expand to a call
         to add_device_randomness(NULL, 0), which adds a cycle counter,
         without the absent latent entropy variable.
      
       - The RNG is now reseeded from a delayed worker, rather than on demand
         when used. Always running from a worker allows it to make use of the
         CPU RNG on platforms like S390x, whose instructions are too slow to
         do so from interrupts. It also has the effect of adding in new inputs
         more frequently with more regularity, amounting to a long term
         transcript of random values. Plus, it helps a bit with the upcoming
         vDSO implementation (which isn't yet ready for 6.2).
      
       - The jitter entropy algorithm now tries to execute on many different
         CPUs, round-robining, in hopes of hitting even more memory latencies
         and other unpredictable effects. It also will mix in a cycle counter
         when the entropy timer fires, in addition to being mixed in from the
         main loop, to account more explicitly for fluctuations in that timer
         firing. And the state it touches is now kept within the same cache
         line, so that it's assured that the different execution contexts will
         cause latencies.
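
The interval helpers named in the first item above rely on rejection sampling to avoid modulo bias. A minimal Python sketch of that general technique follows. The function names mirror the kernel helpers, but the code is an illustration under stated assumptions, not the kernel's implementation: `secrets.randbits` stands in for the kernel's 32-bit source, and the simple threshold test is a textbook form of rejection sampling rather than the "fast" variant the pull message refers to.

```python
import secrets

U32_MAX = 2**32 - 1


def get_random_u32() -> int:
    """Stand-in for a uniform 32-bit random source (assumption)."""
    return secrets.randbits(32)


def get_random_u32_below(ceil: int) -> int:
    """Uniform integer in [0, ceil)."""
    if not 0 < ceil <= U32_MAX:
        raise ValueError("ceil must be in 1..U32_MAX")
    # Reject the lowest (2**32 mod ceil) raw values so the accepted range
    # has a size that is an exact multiple of ceil; the final modulo is
    # then unbiased.
    threshold = 2**32 % ceil
    while True:
        r = get_random_u32()
        if r >= threshold:
            return r % ceil


def get_random_u32_above(floor: int) -> int:
    """Uniform integer in (floor, U32_MAX]."""
    if not 0 <= floor < U32_MAX:
        raise ValueError("floor must be in 0..U32_MAX-1")
    return floor + 1 + get_random_u32_below(U32_MAX - floor)


def get_random_u32_inclusive(floor: int, ceil: int) -> int:
    """Uniform integer in [floor, ceil]."""
    if not 0 <= floor <= ceil <= U32_MAX:
        raise ValueError("need 0 <= floor <= ceil <= U32_MAX")
    span = ceil - floor + 1
    if span == 2**32:
        # The whole 32-bit range: no reduction needed.
        return get_random_u32()
    return floor + get_random_u32_below(span)
```

A plain `get_random_u32() % ceil` would favor small results whenever 2**32 is not a multiple of ceil. Discarding the few raw values below the threshold removes that skew, at the cost of an occasional extra draw.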
      
      * tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits)
        random: include <linux/once.h> in the right header
        random: align entropy_timer_state to cache line
        random: mix in cycle counter when jitter timer fires
        random: spread out jitter callback to different CPUs
        random: remove extraneous period and add a missing one in comments
        efi: random: refresh non-volatile random seed when RNG is initialized
        vsprintf: initialize siphash key using notifier
        random: add back async readiness notifier
        random: reseed in delayed work rather than on-demand
        random: always mix cycle counter in add_latent_entropy()
        hw_random: use add_hwgenerator_randomness() for early entropy
        random: modernize documentation comment on get_random_bytes()
        random: adjust comment to account for removed function
        random: remove early archrandom abstraction
        random: use random.trust_{bootloader,cpu} command line option only
        stackprotector: actually use get_random_canary()
        stackprotector: move get_random_canary() into stackprotector.h
        treewide: use get_random_u32_inclusive() when possible
        treewide: use get_random_u32_{above,below}() instead of manual loop
        treewide: use get_random_u32_below() instead of deprecated function
        ...
      268325bd
    • Merge branch 'for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu · ca1443c7
      Linus Torvalds authored
      Pull percpu updates from Dennis Zhou:
       "Baoquan was nice enough to run some clean ups for percpu"
      
      * 'for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
        mm/percpu: remove unused PERCPU_DYNAMIC_EARLY_SLOTS
        mm/percpu.c: remove the lcm code since block size is fixed at page size
        mm/percpu: replace the goto with break
        mm/percpu: add comment to state the empty populated pages accounting
        mm/percpu: Update the code comment when creating new chunk
        mm/percpu: use list_first_entry_or_null in pcpu_reclaim_populated()
        mm/percpu: remove unused pcpu_map_extend_chunks
      ca1443c7
    • Merge tag 'livepatching-for-6.2' of... · e1a1ccef
      Linus Torvalds authored
      Merge tag 'livepatching-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
      
      Pull livepatching update from Petr Mladek:
      
       - code cleanup
      
      * tag 'livepatching-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
        livepatch: Move the result-invariant calculation out of the loop
      e1a1ccef
  2. 12 Dec, 2022 13 commits
    • Merge tag 'cgroup-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · a312a8cc
      Linus Torvalds authored
      Pull cgroup updates from Tejun Heo:
       "Nothing too interesting:
      
         - Add CONFIG_DEBUG_CGROUP_REF which makes cgroup refcnt operations
           kprobable
      
         - A couple cpuset optimizations
      
         - Other misc changes including doc and test updates"
      
      * tag 'cgroup-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup: remove rcu_read_lock()/rcu_read_unlock() in critical section of spin_lock_irq()
        cgroup/cpuset: Improve cpuset_css_alloc() description
        kselftest/cgroup: Add cleanup() to test_cpuset_prs.sh
        cgroup/cpuset: Optimize cpuset_attach() on v2
        cgroup/cpuset: Skip spread flags update on v2
        kselftest/cgroup: Fix gathering number of CPUs
        cgroup: cgroup refcnt functions should be exported when CONFIG_DEBUG_CGROUP_REF
        cgroup: Implement DEBUG_CGROUP_REF
      a312a8cc
    • Merge tag 'sched-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · bf57ae21
      Linus Torvalds authored
      Pull scheduler updates from Ingo Molnar:
      
       - Implement persistent user-requested affinity: introduce
         affinity_context::user_mask and unconditionally preserve the
         user-requested CPU affinity masks, for long-lived tasks to better
         interact with cpusets & CPU hotplug events over longer timespans,
         without destroying the original affinity intent if the underlying
         topology changes.
      
       - Uclamp updates: fix relationship between uclamp and fits_capacity()
      
       - PSI fixes
      
       - Misc fixes & updates
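
The persistent-affinity item above can be read as: remember the mask the user asked for, and re-derive the effective mask from it whenever the permitted CPUs change. A simplified Python model of that reading follows. It is a sketch only: the effective_cpus() helper is hypothetical, CPU sets are plain Python sets, and the fallback to the full permitted set when the intersection is empty is an assumption about the behavior, not a statement of what the scheduler code does.

```python
from typing import Optional, Set


def effective_cpus(user_mask: Optional[Set[int]], allowed: Set[int]) -> Set[int]:
    """Combine a remembered user request with the currently permitted CPUs.

    user_mask: CPUs the user requested, or None if no request was made.
    allowed:   CPUs currently permitted (e.g. by cpusets or hotplug state).
    """
    if user_mask:
        wanted = user_mask & allowed
        if wanted:
            # Honor the user's request within what is currently permitted.
            return wanted
    # No request, or none of the requested CPUs is available (assumed fallback).
    return set(allowed)
```

Because user_mask is never overwritten, a task that asked for CPUs {2, 3} gets {2} while only {0, 1, 2} are permitted and returns to {2, 3} once CPU 3 is permitted again.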
      
      * tag 'sched-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched: Clear ttwu_pending after enqueue_task()
        sched/psi: Use task->psi_flags to clear in CPU migration
        sched/psi: Stop relying on timer_pending() for poll_work rescheduling
        sched/psi: Fix avgs_work re-arm in psi_avgs_work()
        sched/psi: Fix possible missing or delayed pending event
        sched: Always clear user_cpus_ptr in do_set_cpus_allowed()
        sched: Enforce user requested affinity
        sched: Always preserve the user requested cpumask
        sched: Introduce affinity_context
        sched: Add __releases annotations to affine_move_task()
        sched/fair: Check if prev_cpu has highest spare cap in feec()
        sched/fair: Consider capacity inversion in util_fits_cpu()
        sched/fair: Detect capacity inversion
        sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition
        sched/uclamp: Make cpu_overutilized() use util_fits_cpu()
        sched/uclamp: Make asym_fits_capacity() use util_fits_cpu()
        sched/uclamp: Make select_idle_capacity() use util_fits_cpu()
        sched/uclamp: Fix fits_capacity() check in feec()
        sched/uclamp: Make task_fits_capacity() use util_fits_cpu()
        sched/uclamp: Fix relationship between uclamp and migration margin
      bf57ae21
    • Merge tag 'perf-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · add76959
      Linus Torvalds authored
      Pull perf events updates from Ingo Molnar:
      
       - Thoroughly rewrite the data structures that implement perf task
         context handling, with the goal of fixing various quirks and
         unfeatures both in already merged, and in upcoming proposed code.
      
         The old data structure is the per task and per cpu
         perf_event_contexts:
      
               task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context
                    ^                                 |    ^     |           ^
                    `---------------------------------'    |     `--> pmu ---'
                                                           v           ^
                                                      perf_event ------'
      
         In this new design this is replaced with a single task context and a
         single CPU context, plus intermediate data-structures:
      
               task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context
                    ^                           |   ^ ^
                    `---------------------------'   | |
                                                    | |    perf_cpu_pmu_context <--.
                                                    | `----.    ^                  |
                                                    |      |    |                  |
                                                    |      v    v                  |
                                                    | ,--> perf_event_pmu_context  |
                                                    | |                            |
                                                    | |                            |
                                                    v v                            |
                                               perf_event ---> pmu ----------------'
      
         [ See commit bd275681 for more details. ]
      
         This rewrite was developed by Peter Zijlstra and Ravi Bangoria.
      
       - Optimize perf_tp_event()
      
       - Update the Intel uncore PMU driver, extending it with UPI topology
         discovery on various hardware models.
      
       - Misc fixes & cleanups
      
      * tag 'perf-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
        perf/x86/intel/uncore: Fix reference count leak in __uncore_imc_init_box()
        perf/x86/intel/uncore: Fix reference count leak in snr_uncore_mmio_map()
        perf/x86/intel/uncore: Fix reference count leak in hswep_has_limit_sbox()
        perf/x86/intel/uncore: Fix reference count leak in sad_cfg_iio_topology()
        perf/x86/intel/uncore: Make set_mapping() procedure void
        perf/x86/intel/uncore: Update sysfs-devices-mapping file
        perf/x86/intel/uncore: Enable UPI topology discovery for Sapphire Rapids
        perf/x86/intel/uncore: Enable UPI topology discovery for Icelake Server
        perf/x86/intel/uncore: Get UPI NodeID and GroupID
        perf/x86/intel/uncore: Enable UPI topology discovery for Skylake Server
        perf/x86/intel/uncore: Generalize get_topology() for SKX PMUs
        perf/x86/intel/uncore: Disable I/O stacks to PMU mapping on ICX-D
        perf/x86/intel/uncore: Clear attr_update properly
        perf/x86/intel/uncore: Introduce UPI topology type
        perf/x86/intel/uncore: Generalize IIO topology support
        perf/core: Don't allow grouping events from different hw pmus
        perf/amd/ibs: Make IBS a core pmu
        perf: Fix function pointer case
        perf/x86/amd: Remove the repeated declaration
        perf: Fix possible memleak in pmu_dev_alloc()
        ...
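
The new perf layout in the second diagram above can be read as a set of plain pointer relationships. The following is a minimal user-space sketch, not the kernel's definitions: the struct names come from the diagram, while every field name except `perf_event_ctxp`, the single pointers standing in for what are lists in practice, and the `event_is_consistent()` helper are assumptions made for illustration.

```c
#include <stddef.h>

struct perf_event_context;
struct perf_event_pmu_context;
struct perf_cpu_pmu_context;

/* pmu -> perf_cpu_pmu_context */
struct pmu {
	struct perf_cpu_pmu_context *cpu_pmu_ctx;	/* hypothetical field name */
};

/* A single task context per task. */
struct task_struct {
	struct perf_event_context *perf_event_ctxp;
};

/* perf_event_context <-> task, perf_event_context <-> perf_event_pmu_context */
struct perf_event_context {
	struct task_struct *task;			/* hypothetical field name */
	struct perf_event_pmu_context *pmu_ctx;		/* one entry shown; assumed to be a list in practice */
};

/* perf_cpu_context -> perf_event_context */
struct perf_cpu_context {
	struct perf_event_context *ctx;			/* hypothetical field name */
};

/* Intermediate structure tying a context to one PMU. */
struct perf_event_pmu_context {
	struct perf_event_context *ctx;			/* hypothetical field name */
	struct perf_cpu_pmu_context *cpu_pmu_ctx;	/* hypothetical field name */
};

/* perf_cpu_pmu_context <-> perf_event_pmu_context */
struct perf_cpu_pmu_context {
	struct perf_event_pmu_context *pmu_ctx;		/* hypothetical field name */
};

/* perf_event points at its context, its per-PMU context and its pmu. */
struct perf_event {
	struct perf_event_context *ctx;
	struct perf_event_pmu_context *pmu_ctx;
	struct pmu *pmu;
};

/*
 * Hypothetical helper: the direct route (event -> ctx) and the indirect
 * route (event -> pmu_ctx -> ctx) should reach the same context.
 */
static int event_is_consistent(const struct perf_event *e)
{
	return e->ctx && e->pmu_ctx && e->pmu_ctx->ctx == e->ctx;
}
```

The sketch only illustrates that an event reaches its single task or CPU context both directly and through the per-PMU intermediate structure; see the commit referenced above for the real design.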
      add76959
    • Merge tag 'locking-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 617fe4fa
      Linus Torvalds authored
      Pull locking updates from Ingo Molnar:
       "Two changes in this cycle:
      
         - a micro-optimization in static_key_slow_inc_cpuslocked()
      
         - fix futex death-notification wakeup bug"
      
      * tag 'locking-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        futex: Resend potentially swallowed owner death notification
        jump_label: Use atomic_try_cmpxchg() in static_key_slow_inc_cpuslocked()
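
The kind of micro-optimization named in the jump_label commit can be illustrated in user space with C11 atomics. This is a sketch under stated assumptions, not the kernel code: `inc_if_positive()` is a hypothetical helper, and C11's `atomic_compare_exchange_weak()` stands in for the kernel's `atomic_try_cmpxchg()`. Both share the property that a failed exchange writes the current value back into the expected-value variable, so the retry loop needs no separate re-read.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper: increment *v only while it is positive.
 * Returns false if the value was zero or negative, in which case a
 * caller would fall back to some slower path.
 */
static bool inc_if_positive(atomic_int *v)
{
	int old = atomic_load(v);

	do {
		if (old <= 0)
			return false;
		/* On failure, 'old' is refreshed with the current value of *v. */
	} while (!atomic_compare_exchange_weak(v, &old, old + 1));

	return true;
}
```

With a plain compare-and-exchange that only returns the old value, the loop would have to compare that return value against `old` and reassign it by hand on every iteration.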
      617fe4fa
    • Merge tag 'x86_alternatives_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2f60f830
      Linus Torvalds authored
      Pull x86 alternative update from Borislav Petkov:
       "A single alternatives patching fix for modules:
      
         - Have alternatives patch the same sections in modules as in vmlinux"
      
      * tag 'x86_alternatives_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/alternative: Consistently patch SMP locks in vmlinux and modules
      2f60f830
    • Merge tag 'ras_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9196a0ba
      Linus Torvalds authored
      Pull x86 RAS updates from Borislav Petkov:
      
       - Fix confusing output from /sys/kernel/debug/ras/daemon_active
      
       - Add another MCE severity error case to the Intel error severity table
         to promote UC and AR errors to panic severity and remove the
         corresponding code condition doing that.
      
       - Make sure the thresholding and deferred error interrupts on AMD SMCA
         systems clear all the registers reporting an error so that there are
         no multiple errors logged for the same event
      
      * tag 'ras_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        RAS: Fix return value from show_trace()
        x86/mce: Use severity table to handle uncorrected errors in kernel
        x86/MCE/AMD: Clear DFR errors found in THR handler
      9196a0ba
    • Merge tag 'edac_updates_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · 7adcadb9
      Linus Torvalds authored
      Pull EDAC updates from Borislav Petkov:
      
       - Make ghes_edac a simple module like the rest of the EDAC drivers and
         drop the forced built-in only configuration by disentangling it from
         GHES (Jia He)
      
       - The usual small cleanups and improvements all over EDAC land
      
      * tag 'edac_updates_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
        EDAC/i10nm: fix refcount leak in pci_get_dev_wrapper()
        EDAC/i5400: Fix typo in comment: vaious -> various
        EDAC/mc_sysfs: Increase legacy channel support to 12
        MAINTAINERS: Make Mauro EDAC reviewer
        MAINTAINERS: Make Manivannan Sadhasivam the maintainer of qcom_edac
        EDAC/igen6: Return the correct error type when not the MC owner
        apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg()
        EDAC: Check for GHES preference in the chipset-specific EDAC drivers
        EDAC/ghes: Make ghes_edac a proper module
        EDAC/ghes: Prepare to make ghes_edac a proper module
        EDAC/ghes: Add a notifier for reporting memory errors
        efi/cper: Export several helpers for ghes_edac to use
        EDAC/i5000: Mark as BROKEN
      7adcadb9
    • Merge tag 'x86_fpu_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 40deb5e4
      Linus Torvalds authored
      Pull x86 fpu updates from Dave Hansen:
       "There are two little fixes in here, one to give better XSAVE warnings
        and another to address some undefined behavior in offsetof().
      
        There is also a collection of patches to fix some issues with ptrace
        and the protection keys register (PKRU). PKRU is a real oddity because
        it is exposed in the XSAVE-related ABIs, but it is generally managed
        without using XSAVE in the kernel. This fix thankfully came with a
        selftest to ward off future regressions.
      
        Summary:
      
         - Clarify XSAVE consistency warnings
      
         - Fix up ptrace interface to protection keys register (PKRU)
      
         - Avoid undefined compiler behavior with TYPE_ALIGN"
      
      * tag 'x86_fpu_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/fpu: Use _Alignof to avoid undefined behavior in TYPE_ALIGN
        selftests/vm/pkeys: Add a regression test for setting PKRU through ptrace
        x86/fpu: Emulate XRSTOR's behavior if the xfeatures PKRU bit is not set
        x86/fpu: Allow PKRU to be (once again) written by ptrace.
        x86/fpu: Add a pkru argument to copy_uabi_to_xstate()
        x86/fpu: Add a pkru argument to copy_uabi_from_kernel_to_xstate().
        x86/fpu: Take task_struct* in copy_sigframe_from_user_to_xstate()
        x86/fpu/xstate: Fix XSTATE_WARN_ON() to emit relevant diagnostics
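
The TYPE_ALIGN change can be illustrated with a small user-space sketch. The macro body below is an assumption based on the commit title (the kernel's actual definition may differ), and `struct sample` and `sample_align()` are hypothetical names. The older idiom is shown only in a comment: it computes alignment by declaring a fresh struct type inside `offsetof()`, which is the undefined behavior the message refers to.

```c
#include <stddef.h>

/*
 * Older idiom (comment only, not compiled here): a new type is
 * defined inside offsetof(), which the C standard leaves undefined.
 *
 *   #define TYPE_ALIGN(T) offsetof(struct { char x; T test; }, test)
 */

/* Replacement sketch: ask the compiler for the alignment directly. */
#define TYPE_ALIGN(T) _Alignof(T)

/* Hypothetical type used only to exercise the macro. */
struct sample {
	char c;
	long long ll;
};

static size_t sample_align(void)
{
	return TYPE_ALIGN(struct sample);
}
```

`_Alignof` is a C11 operator, so the sketch needs a C11-capable compiler.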
      40deb5e4
    • Merge tag 'x86_splitlock_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1cab145a
      Linus Torvalds authored
      Pull x86 splitlock updates from Dave Hansen:
       "Add a sysctl to control the split lock misery mode.
      
        This enables users to reduce the penalty inflicted on split lock
        users. There are some proprietary, binary-only games which became
        entirely unplayable with the old penalty.
      
        Anyone opting into the new mode is, of course, more exposed to the DoS
        nastiness inherent with split locks, but they can play their games
        again"
      
      * tag 'x86_splitlock_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/split_lock: Add sysctl to control the misery mode
      1cab145a
    • Merge tag 'x86_cache_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 287f037d
      Linus Torvalds authored
      Pull x86 cache resource control updates from Dave Hansen:
       "These declare the resource control (resctrl) MSRs a bit more normally
        and clean up an unnecessary structure member:
      
         - Remove unnecessary arch_has_empty_bitmaps structure member
      
         - Move resctrl MSR defines into msr-index.h, like normal MSRs"
      
      * tag 'x86_cache_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/resctrl: Move MSR defines into msr-index.h
        x86/resctrl: Remove arch_has_empty_bitmaps
      287f037d
    • Merge tag 'x86_tdx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a89ef2aa
      Linus Torvalds authored
      Pull x86 tdx updates from Dave Hansen:
       "This includes a single chunk of new functionality for TDX guests which
        allows them to talk to the trusted TDX module software and obtain an
        attestation report.
      
        This report can then be used to prove the trustworthiness of the guest
        to a third party and get access to things like storage encryption
        keys"
      
      * tag 'x86_tdx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        selftests/tdx: Test TDX attestation GetReport support
        virt: Add TDX guest driver
        x86/tdx: Add a wrapper to get TDREPORT0 from the TDX Module
      a89ef2aa
    • Merge tag 'x86_sgx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2da68a77
      Linus Torvalds authored
      Pull x86 sgx updates from Dave Hansen:
       "The biggest deal in this series is support for a new hardware feature
        that allows enclaves to detect and mitigate single-stepping attacks.
      
        There's also a minor performance tweak and a little piece of the
        kmap_atomic() -> kmap_local() transition.
      
        Summary:
      
         - Introduce a new SGX feature (Asynchronous Exit Notification) for
           bare-metal enclaves and KVM guests to mitigate single-step attacks
      
         - Increase batching to speed up enclave release
      
         - Replace kmap/kunmap_atomic() calls"
      
      * tag 'x86_sgx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/sgx: Replace kmap/kunmap_atomic() calls
        KVM/VMX: Allow exposing EDECCSSA user leaf function to KVM guest
        x86/sgx: Allow enclaves to use Asynchrounous Exit Notification
        x86/sgx: Reduce delay and interference of enclave release
      2da68a77
    • Merge tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl · c1f0fcd8
      Linus Torvalds authored
      Pull cxl updates from Dan Williams:
       "Compute Express Link (CXL) updates for 6.2.
      
        While it may seem backwards, the CXL update this time around includes
        some focus on CXL 1.x enabling where the work to date had been with
        CXL 2.0 (VH topologies) in mind.
      
        First generation CXL can mostly be supported via BIOS, similar to DDR,
        however it became clear there are use cases for OS native CXL error
        handling and some CXL 3.0 endpoint features can be deployed on CXL 1.x
        hosts (Restricted CXL Host (RCH) topologies). So, this update brings
        RCH topologies into the Linux CXL device model.
      
        In support of the ongoing CXL 2.0+ enabling, two new core kernel
        facilities are added.
      
        One is the ability for the kernel to flag collisions between userspace
        access to PCI configuration registers and kernel accesses. This is
        brought on by the PCIe Data-Object-Exchange (DOE) facility, a hardware
        mailbox over config-cycles.
      
        The other is a cpu_cache_invalidate_memregion() API that maps to
        wbinvd_on_all_cpus() on x86. To prevent abuse it is disabled in guest
        VMs and architectures that do not support it yet. The CXL paths that
        need it, dynamic memory region creation and security commands (erase /
        unlock), are disabled when it is not present.
      
        As for CXL 2.0+, this cycle the subsystem gains support for Persistent
        Memory Security commands, error handling in response to PCIe AER
        notifications, and support for the "XOR" host bridge interleave
        algorithm.
      
        Summary:
      
         - Add the cpu_cache_invalidate_memregion() API for cache flushing in
           response to physical memory reconfiguration, or memory-side data
           invalidation from operations like secure erase or memory-device
           unlock.
      
         - Add a facility for the kernel to warn about collisions between
           kernel and userspace access to PCI configuration registers
      
         - Add support for Restricted CXL Host (RCH) topologies (formerly CXL
           1.1)
      
         - Add handling and reporting of CXL errors reported via the PCIe AER
           mechanism
      
         - Add support for CXL Persistent Memory Security commands
      
         - Add support for the "XOR" algorithm for CXL host bridge interleave
      
         - Rework / simplify CXL to NVDIMM interactions
      
         - Miscellaneous cleanups and fixes"
      
      * tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (71 commits)
        cxl/region: Fix memdev reuse check
        cxl/pci: Remove endian confusion
        cxl/pci: Add some type-safety to the AER trace points
        cxl/security: Drop security command ioctl uapi
        cxl/mbox: Add variable output size validation for internal commands
        cxl/mbox: Enable cxl_mbox_send_cmd() users to validate output size
        cxl/security: Fix Get Security State output payload endian handling
        cxl: update names for interleave ways conversion macros
        cxl: update names for interleave granularity conversion macros
        cxl/acpi: Warn about an invalid CHBCR in an existing CHBS entry
        tools/testing/cxl: Require cache invalidation bypass
        cxl/acpi: Fail decoder add if CXIMS for HBIG is missing
        cxl/region: Fix spelling mistake "memergion" -> "memregion"
        cxl/regs: Fix sparse warning
        cxl/acpi: Set ACPI's CXL _OSC to indicate RCD mode support
        tools/testing/cxl: Add an RCH topology
        cxl/port: Add RCD endpoint port enumeration
        cxl/mem: Move devm_cxl_add_endpoint() from cxl_core to cxl_mem
        tools/testing/cxl: Add XOR Math support to cxl_test
        cxl/acpi: Support CXL XOR Interleave Math (CXIMS)
        ...
      c1f0fcd8