1. 15 Jul, 2022 1 commit
  2. 08 Jul, 2022 1 commit
    • ovl: turn of SB_POSIXACL with idmapped layers temporarily · 4a47c638
      Christian Brauner authored
      This cycle we added support for mounting overlayfs on top of idmapped
      mounts.  Recently I've started looking into potential corner cases when
      trying to add additional tests and I noticed that reporting for POSIX ACLs
      is currently wrong when using idmapped layers with overlayfs mounted on top
      of it.
      
      I have sent out a patch that fixes this and makes POSIX ACLs work
      correctly but the patch is a bit bigger and we're already at -rc5 so I
      recommend we simply don't raise SB_POSIXACL when idmapped layers are
      used. Then we can fix the VFS part described below for the next merge
      window so we can have good exposure in -next.
      
      I'm going to give a rather detailed explanation of both the origin of the
      problem and the solution so people know what's going on.
      
      Let's assume the user creates the following directory layout and they have
      a rootfs /var/lib/lxc/c2/rootfs. The files in this rootfs are owned as you
      would expect files on your host system to be owned. For example, ~/.bashrc
      for your regular user would be owned by 1000:1000 and /root/.bashrc would
      be owned by 0:0. IOW, this is just a regular boring filesystem tree on an
      ext4 or xfs filesystem.
      
      The user chooses to set POSIX ACLs using the setfacl binary granting the
      user with uid 4 read, write, and execute permissions for their .bashrc
      file:
      
              setfacl -m u:4:rwx /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc
      
      Now they want to expose the whole rootfs to a container using an idmapped
      mount. So they first create:
      
              mkdir -pv /vol/contpool/{ctrover,merge,lowermap,overmap}
              mkdir -pv /vol/contpool/ctrover/{over,work}
              chown 10000000:10000000 /vol/contpool/ctrover/{over,work}
      
      The user now creates an idmapped mount for the rootfs:
      
              mount-idmapped/mount-idmapped --map-mount=b:0:10000000:65536 \
                                            /var/lib/lxc/c2/rootfs \
                                            /vol/contpool/lowermap
      
      For example, this makes
      /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc, which is owned by uid and gid
      1000, show up as owned by uid and gid 10001000 at
      /vol/contpool/lowermap/home/ubuntu/.bashrc.
      
      Assume the user wants to expose these idmapped mounts through an overlayfs
      mount to a container.
      
             mount -t overlay overlay                      \
                   -o lowerdir=/vol/contpool/lowermap,     \
                      upperdir=/vol/contpool/overmap/over, \
                      workdir=/vol/contpool/overmap/work   \
                   /vol/contpool/merge
      
      The user can do this in two ways:
      
      (1) Mount overlayfs in the initial user namespace and expose it to the
          container.
      
      (2) Mount overlayfs on top of the idmapped mounts inside of the container's
          user namespace.
      
      Let's assume the user chooses option (1) and mounts overlayfs on the
      host and then changes into a container which uses the idmapping
      0:10000000:65536, which is the same one used for the two idmapped mounts.
      
      Now the user tries to retrieve the POSIX ACLs using the getfacl command:
      
              getfacl -n /vol/contpool/merge/home/ubuntu/.bashrc
      
      and to their surprise they see:
      
              # file: vol/contpool/merge/home/ubuntu/.bashrc
              # owner: 1000
              # group: 1000
              user::rw-
              user:4294967295:rwx
              group::r--
              mask::rwx
              other::r--
      
      indicating the uid wasn't correctly translated according to the idmapped
      mount. The problem is how we currently translate POSIX ACLs. Let's inspect
      the callchain in this example:
      
              idmapped mount /vol/contpool/merge:      0:10000000:65536
              caller's idmapping:                      0:10000000:65536
              overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */
      
              sys_getxattr()
              -> path_getxattr()
                 -> getxattr()
                    -> do_getxattr()
                        |> vfs_getxattr()
                        |  -> __vfs_getxattr()
                        |     -> handler->get == ovl_posix_acl_xattr_get()
                        |        -> ovl_xattr_get()
                        |           -> vfs_getxattr()
                        |              -> __vfs_getxattr()
                        |                 -> handler->get() /* lower filesystem callback */
                        |> posix_acl_fix_xattr_to_user()
                           {
                                    4 = make_kuid(&init_user_ns, 4);
                                    4 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 4);
                                    /* FAILURE */
                                   -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
                           }
      
      If the user chooses to use option (2) and mounts overlayfs on top of
      idmapped mounts inside the container things don't look that much better:
      
              idmapped mount /vol/contpool/merge:      0:10000000:65536
              caller's idmapping:                      0:10000000:65536
              overlayfs idmapping (ofs->creator_cred): 0:10000000:65536
      
              sys_getxattr()
              -> path_getxattr()
                 -> getxattr()
                    -> do_getxattr()
                        |> vfs_getxattr()
                        |  -> __vfs_getxattr()
                        |     -> handler->get == ovl_posix_acl_xattr_get()
                        |        -> ovl_xattr_get()
                        |           -> vfs_getxattr()
                        |              -> __vfs_getxattr()
                        |                 -> handler->get() /* lower filesystem callback */
                        |> posix_acl_fix_xattr_to_user()
                           {
                                    4 = make_kuid(&init_user_ns, 4);
                                    4 = mapped_kuid_fs(&init_user_ns, 4);
                                    /* FAILURE */
                                   -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
                           }
      
      As is easily seen, the problem arises because the idmapping of the lower
      mount isn't taken into account, as all of this happens in do_getxattr(). But
      do_getxattr() is always called on an overlayfs mount and inode and thus
      cannot possibly take the idmapping of the lower layers into account.
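
The failing translation can be reproduced with a small user-space model. This is only a sketch, not kernel code: `struct idmap`, `map_down()`, `map_up()` and `broken_acl_uid()` are hypothetical names. `map_down()` stands in for make_kuid() and mapped_kuid_fs(), `map_up()` stands in for from_kuid(), and every idmapping is assumed to be a single extent written first:lower_first:count as in the call chains above.

```c
#include <stdint.h>

#define INVALID_ID ((uint32_t)-1)   /* shows up as 4294967295 in getfacl -n */

/* One idmapping extent, written first:lower_first:count in the text. */
struct idmap {
    uint32_t first;
    uint32_t lower_first;
    uint32_t count;
};

/* Hypothetical stand-ins for the idmappings used in the example. */
static const struct idmap init_idmap = { 0, 0, 4294967295u };  /* 0:0:4k */
static const struct idmap ctr_idmap  = { 0, 10000000u, 65536u };

/* Map an id down through an idmapping (models make_kuid()). */
static uint32_t map_down(const struct idmap *m, uint32_t id)
{
    if (id >= m->first && id - m->first < m->count)
        return m->lower_first + (id - m->first);
    return INVALID_ID;
}

/* Map an id up through an idmapping (models from_kuid()). */
static uint32_t map_up(const struct idmap *m, uint32_t id)
{
    if (id >= m->lower_first && id - m->lower_first < m->count)
        return m->first + (id - m->lower_first);
    return INVALID_ID;
}

/*
 * Models posix_acl_fix_xattr_to_user() running on the overlayfs mount:
 * the lower mount's idmapping never enters the computation.
 */
static uint32_t broken_acl_uid(uint32_t raw, const struct idmap *caller)
{
    uint32_t kuid = map_down(&init_idmap, raw);  /* make_kuid() */

    kuid = map_down(&init_idmap, kuid);          /* no idmapped mount */
    return map_up(caller, kuid);                 /* from_kuid() */
}
```

Under these assumptions `broken_acl_uid(4, &ctr_idmap)` yields `INVALID_ID`, because 4 lies outside the range 10000000..10065535 that the caller's idmapping can map back.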
      
      This problem is similar for fscaps but there the translation happens as
      part of vfs_getxattr() already. Let's walk through an fscaps overlayfs
      callchain:
      
              setcap 'cap_net_raw+ep' /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc
      
      The expected outcome here is that we'll receive the cap_net_raw capability
      as we are able to map the uid associated with the fscap to 0 within our
      container.  IOW, we want to see 0 as the result of the idmapping
      translations.
      
      If the user chooses option (1) we get the following callchain for fscaps:
      
              idmapped mount /vol/contpool/merge:      0:10000000:65536
              caller's idmapping:                      0:10000000:65536
              overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */
      
              sys_getxattr()
              -> path_getxattr()
                 -> getxattr()
                    -> do_getxattr()
                         -> vfs_getxattr()
                            -> xattr_getsecurity()
                               -> security_inode_getsecurity()                                       ________________________________
                                  -> cap_inode_getsecurity()                                         |                              |
                                     {                                                               V                              |
                                              10000000 = make_kuid(0:0:4k /* overlayfs idmapping */, 10000000);                     |
                                              10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000);                  |
                                                     /* Expected result is 0 and thus that we own the fscap. */                     |
                                                     0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000);            |
                                     }                                                                                              |
                                     -> vfs_getxattr_alloc()                                                                        |
                                        -> handler->get == ovl_other_xattr_get()                                                    |
                                           -> vfs_getxattr()                                                                        |
                                              -> xattr_getsecurity()                                                                |
                                                 -> security_inode_getsecurity()                                                    |
                                                    -> cap_inode_getsecurity()                                                      |
                                                       {                                                                            |
                                                                      0 = make_kuid(0:0:4k /* lower s_user_ns */, 0);               |
                                                               10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); |
                                                               10000000 = from_kuid(0:0:4k /* overlayfs idmapping */, 10000000);    |
                                                               |____________________________________________________________________|
                                                       }
                                                       -> vfs_getxattr_alloc()
                                                          -> handler->get == /* lower filesystem callback */
      
      And if the user chooses option (2) we get:
      
              idmapped mount /vol/contpool/merge:      0:10000000:65536
              caller's idmapping:                      0:10000000:65536
              overlayfs idmapping (ofs->creator_cred): 0:10000000:65536
      
              sys_getxattr()
              -> path_getxattr()
                 -> getxattr()
                    -> do_getxattr()
                         -> vfs_getxattr()
                            -> xattr_getsecurity()
                               -> security_inode_getsecurity()                                                _______________________________
                                  -> cap_inode_getsecurity()                                                  |                             |
                                     {                                                                        V                             |
                                             10000000 = make_kuid(0:10000000:65536 /* overlayfs idmapping */, 0);                           |
                                             10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000);                           |
                                                     /* Expected result is 0 and thus that we own the fscap. */                             |
                                                    0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000);                     |
                                     }                                                                                                      |
                                     -> vfs_getxattr_alloc()                                                                                |
                                        -> handler->get == ovl_other_xattr_get()                                                            |
                                          |-> vfs_getxattr()                                                                                |
                                              -> xattr_getsecurity()                                                                        |
                                                 -> security_inode_getsecurity()                                                            |
                                                    -> cap_inode_getsecurity()                                                              |
                                                       {                                                                                    |
                                                                       0 = make_kuid(0:0:4k /* lower s_user_ns */, 0);                      |
                                                                10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0);        |
                                                                       0 = from_kuid(0:10000000:65536 /* overlayfs idmapping */, 10000000); |
                                                                       |____________________________________________________________________|
                                                       }
                                                       -> vfs_getxattr_alloc()
                                                          -> handler->get == /* lower filesystem callback */
      
      We can see how the translation happens correctly in those cases as the
      conversion happens within the vfs_getxattr() helper.
      
      For POSIX ACLs we need to do something similar. However, in contrast to
      fscaps we cannot apply the fix directly to the kernel internal posix acl
      data structure as this would alter the cached values and would also require
      a rework of how we currently deal with POSIX ACLs in general which almost
      never take the filesystem idmapping into account (the notable exception
      being FUSE but even there the implementation is special) and instead
      retrieve the raw values based on the initial idmapping.
      
      The correct values are then generated right before returning to
      userspace. The fix for this is to move taking the mount's idmapping into
      account directly in vfs_getxattr() instead of having it be part of
      posix_acl_fix_xattr_to_user().
      
      To this end we simply move the idmapped mount translation into a separate
      step performed in vfs_{g,s}etxattr() instead of in
      posix_acl_fix_xattr_{from,to}_user().
      
      To see how this fixes things let's go back to the original example. Assume
      the user chose option (1) and mounted overlayfs on top of idmapped mounts
      on the host:
      
              idmapped mount /vol/contpool/merge:      0:10000000:65536
              caller's idmapping:                      0:10000000:65536
              overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */
      
              sys_getxattr()
              -> path_getxattr()
                 -> getxattr()
                    -> do_getxattr()
                        |> vfs_getxattr()
                        |  |> __vfs_getxattr()
                        |  |  -> handler->get == ovl_posix_acl_xattr_get()
                        |  |     -> ovl_xattr_get()
                        |  |        -> vfs_getxattr()
                        |  |           |> __vfs_getxattr()
                        |  |           |  -> handler->get() /* lower filesystem callback */
                        |  |           |> posix_acl_getxattr_idmapped_mnt()
                        |  |              {
                        |  |                              4 = make_kuid(&init_user_ns, 4);
                        |  |                       10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
                        |  |                       10000004 = from_kuid(&init_user_ns, 10000004);
                        |  |                       |_______________________
                        |  |              }                               |
                        |  |                                              |
                        |  |> posix_acl_getxattr_idmapped_mnt()           |
                        |     {                                           |
                        |                                                 V
                        |             10000004 = make_kuid(&init_user_ns, 10000004);
                        |             10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004);
                        |             10000004 = from_kuid(&init_user_ns, 10000004);
                        |     }       |_________________________________________________
                        |                                                              |
                        |                                                              |
                        |> posix_acl_fix_xattr_to_user()                               |
                           {                                                           V
                                       10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
                                              /* SUCCESS */
                                              4 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000004);
                           }
      
      And similarly if the user chooses option (2) and mounts overlayfs on top of
      idmapped mounts inside the container:
      
              idmapped mount /vol/contpool/merge:      0:10000000:65536
              caller's idmapping:                      0:10000000:65536
              overlayfs idmapping (ofs->creator_cred): 0:10000000:65536
      
              sys_getxattr()
              -> path_getxattr()
                 -> getxattr()
                    -> do_getxattr()
                        |> vfs_getxattr()
                        |  |> __vfs_getxattr()
                        |  |  -> handler->get == ovl_posix_acl_xattr_get()
                        |  |     -> ovl_xattr_get()
                        |  |        -> vfs_getxattr()
                        |  |           |> __vfs_getxattr()
                        |  |           |  -> handler->get() /* lower filesystem callback */
                        |  |           |> posix_acl_getxattr_idmapped_mnt()
                        |  |              {
                        |  |                              4 = make_kuid(&init_user_ns, 4);
                        |  |                       10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
                        |  |                       10000004 = from_kuid(&init_user_ns, 10000004);
                        |  |                       |_______________________
                        |  |              }                               |
                        |  |                                              |
                        |  |> posix_acl_getxattr_idmapped_mnt()           |
                        |     {                                           V
                        |             10000004 = make_kuid(&init_user_ns, 10000004);
                        |             10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004);
                        |             10000004 = from_kuid(&init_user_ns, 10000004);
                        |             |_________________________________________________
                        |     }                                                        |
                        |                                                              |
                        |> posix_acl_fix_xattr_to_user()                               |
                           {                                                           V
                                       10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
                                              /* SUCCESS */
                                               4 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000004);
                           }
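
The three translation steps in the fixed call chains can be modelled in user space as follows. This is a sketch under stated assumptions rather than the kernel implementation: `struct idmap`, `map_down()`, `map_up()`, `apply_mnt_idmap()` and `fixed_acl_uid()` are hypothetical names, each idmapping is a single first:lower_first:count extent, and the filesystem's own idmapping is assumed to be the initial one.

```c
#include <stdint.h>

#define INVALID_ID ((uint32_t)-1)

struct idmap {
    uint32_t first;
    uint32_t lower_first;
    uint32_t count;
};

static const struct idmap init_idmap = { 0, 0, 4294967295u };  /* 0:0:4k */
static const struct idmap ctr_idmap  = { 0, 10000000u, 65536u };

/* Models make_kuid() and mapped_kuid_fs(). */
static uint32_t map_down(const struct idmap *m, uint32_t id)
{
    if (id >= m->first && id - m->first < m->count)
        return m->lower_first + (id - m->first);
    return INVALID_ID;
}

/* Models from_kuid(). */
static uint32_t map_up(const struct idmap *m, uint32_t id)
{
    if (id >= m->lower_first && id - m->lower_first < m->count)
        return m->first + (id - m->lower_first);
    return INVALID_ID;
}

/* Models one posix_acl_getxattr_idmapped_mnt() step for a given mount. */
static uint32_t apply_mnt_idmap(const struct idmap *mnt, uint32_t id)
{
    uint32_t kuid = map_down(&init_idmap, id);   /* make_kuid() */

    kuid = map_down(mnt, kuid);                  /* mapped_kuid_fs() */
    return map_up(&init_idmap, kuid);            /* from_kuid() */
}

/* Lower layer first, then the overlayfs mount, then the caller. */
static uint32_t fixed_acl_uid(uint32_t raw, const struct idmap *lower_mnt,
                              const struct idmap *caller)
{
    uint32_t id = apply_mnt_idmap(lower_mnt, raw);   /* lower idmapped mount */

    id = apply_mnt_idmap(&init_idmap, id);           /* overlayfs: not idmapped */
    return map_up(caller, map_down(&init_idmap, id)); /* fix_xattr_to_user */
}
```

With the lower mount and the caller both using 0:10000000:65536, `fixed_acl_uid(4, &ctr_idmap, &ctr_idmap)` returns 4, matching the SUCCESS line in the call chains. An id that the lower mount cannot map, such as 70000, stays invalid all the way up.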
      
      The last remaining problem we need to fix here is ovl_get_acl(). During
      ovl_permission() overlayfs will call:
      
              ovl_permission()
              -> generic_permission()
                 -> acl_permission_check()
                    -> check_acl()
                       -> get_acl()
                          -> inode->i_op->get_acl() == ovl_get_acl()
                             -> get_acl() /* on the underlying filesystem */
                                -> inode->i_op->get_acl() == /* lower filesystem callback */
                       -> posix_acl_permission()
      
      passing through the get_acl request to the underlying filesystem. This will
      retrieve the acls stored in the lower filesystem without taking the
      idmapping of the underlying mount into account as this would mean altering
      the cached values for the lower filesystem. The simple solution is to have
      ovl_get_acl() simply duplicate the ACLs, update the values according to the
      idmapped mount and return it to acl_permission_check() so it can be used in
      posix_acl_permission(). Since overlayfs doesn't cache ACLs they'll be
      released right after.
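
The "duplicate, then translate" idea can be sketched in user space like this. All names here (`struct acl_entry`, `clone_and_map_acl()`, `map_down()`) are hypothetical and not the kernel's API, and the model is simplified: every entry is treated as carrying an id, whereas real POSIX ACLs only store ids in named user and group entries.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INVALID_ID ((uint32_t)-1)

/* One idmapping extent, written first:lower_first:count in the text. */
struct idmap {
    uint32_t first;
    uint32_t lower_first;
    uint32_t count;
};

/* Simplified ACL entry: an id plus rwx permission bits. */
struct acl_entry {
    uint32_t id;
    unsigned int perm;
};

/* Map an id through the idmapping of the lower mount. */
static uint32_t map_down(const struct idmap *m, uint32_t id)
{
    if (id >= m->first && id - m->first < m->count)
        return m->lower_first + (id - m->first);
    return INVALID_ID;
}

/*
 * Return a translated copy of the cached entries, or NULL on failure.
 * The cached array belongs to the lower filesystem and is never modified;
 * the caller frees the copy once the permission check is done.
 */
static struct acl_entry *clone_and_map_acl(const struct acl_entry *cached,
                                           size_t n, const struct idmap *mnt)
{
    struct acl_entry *copy;
    size_t i;

    if (n == 0)
        return NULL;
    copy = malloc(n * sizeof(*copy));
    if (!copy)
        return NULL;
    memcpy(copy, cached, n * sizeof(*copy));
    for (i = 0; i < n; i++)
        copy[i].id = map_down(mnt, copy[i].id);
    return copy;
}
```

Working on a private copy is what keeps the lower filesystem's cached values intact while the permission check still sees ids translated for the idmapped mount.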
      
      Link: https://github.com/brauner/mount-idmapped/issues/9
      Cc: Seth Forshee <sforshee@digitalocean.com>
      Cc: Amir Goldstein <amir73il@gmail.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: linux-unionfs@vger.kernel.org
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
      Fixes: bc70682a ("ovl: support idmapped layers")
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  3. 03 Jul, 2022 4 commits
    • Linux 5.19-rc5 · 88084a3d
      Linus Torvalds authored
    • lockref: remove unused 'lockref_get_or_lock()' function · b8d5109f
      Linus Torvalds authored
      Looking at the conditional lock acquire functions in the kernel due to
      the new sparse support (see commit 4a557a5d "sparse: introduce
      conditional lock acquire function attribute"), it became obvious that
      the lockref code has a couple of them, but they don't match the usual
      naming convention for the other ones, and their return value logic is
      also reversed.
      
      In the other very similar places, the naming pattern is '*_and_lock()'
      (eg 'atomic_dec_and_lock()' and 'refcount_dec_and_lock()'), and the
      function returns true when the lock is taken.
      
      The lockref code is superficially very similar to the refcount code,
      only with the special "atomic wrt the embedded lock" semantics.  But
      instead of the '*_and_lock()' naming it uses '*_or_lock()'.
      
      And instead of returning true in case it took the lock, it returns true
      if it *didn't* take the lock.
      
      Now, arguably the lockref code is quite logical: it really is an "either
      decrement _or_ lock" kind of situation - and the return value is about
      whether the operation succeeded without any special care needed.
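
The two return-value conventions can be put side by side in a single-threaded toy model. This is a sketch only: `struct toy_ref`, `toy_dec_and_lock()` and `toy_put_or_lock()` are hypothetical names, and the `locked` flag merely stands in for the embedded spinlock, so none of the atomicity of the real primitives is modelled.

```c
#include <stdbool.h>

struct toy_ref {
    int count;
    bool locked;    /* stand-in for the embedded spinlock */
};

/*
 * '*_dec_and_lock()' convention: returns true when the count dropped to
 * zero, in which case the lock is held on return.
 */
static bool toy_dec_and_lock(struct toy_ref *r)
{
    if (r->count > 1) {
        r->count--;
        return false;       /* plain decrement, lock not taken */
    }
    r->locked = true;
    r->count--;
    return true;            /* count is now 0, lock is held */
}

/*
 * '*_put_or_lock()' convention: returns true when the decrement happened
 * without taking the lock. Otherwise the lock is taken, the count is left
 * alone, and false is returned.
 */
static bool toy_put_or_lock(struct toy_ref *r)
{
    if (r->count <= 1) {
        r->locked = true;
        return false;       /* lock is held, count unchanged */
    }
    r->count--;
    return true;            /* decremented, lock not taken */
}
```

Starting from a count of 2, the first call to either helper just decrements. On the second call `toy_dec_and_lock()` returns true with the lock held and the count at 0, while `toy_put_or_lock()` returns false with the lock held and the count still at 1.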
      
      So despite the similarities, the differences do make some sense, and
      maybe it's not worth trying to unify the different conditional locking
      primitives in this area.
      
      But while looking at this all, it did become obvious that the
      'lockref_get_or_lock()' function hasn't actually had any users for
      almost a decade.
      
      The only user it ever had was the shortlived 'd_rcu_to_refcount()'
      function, and it got removed and replaced with 'lockref_get_not_dead()'
      back in 2013 in commits 0d98439e ("vfs: use lockred 'dead' flag to
      mark unrecoverably dead dentries") and e5c832d5 ("vfs: fix dentry
      RCU to refcounting possibly sleeping dput()").
      
      In fact, that single use was removed less than a week after the whole
      function was introduced in commit b3abd802 ("lockref: add
      'lockref_get_or_lock()' helper") so this function has been around for a
      decade, but only had a user for six days.
      
      Let's just put this mis-designed and unused function out of its misery.
      
      We can think about the naming and semantic oddities of the remaining
      'lockref_put_or_lock()' later, but at least that function has users.
      
      And while the naming is different and the return value doesn't match,
      that function matches the whole '{atomic,refcount}_dec_and_test()'
      pattern much better (ie the magic happens when the count goes down to
      zero, not when it is incremented from zero).
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sparse: introduce conditional lock acquire function attribute · 4a557a5d
      Linus Torvalds authored
      The kernel tends to try to avoid conditional locking semantics because
      it makes it harder to think about and statically check locking rules,
      but we do have a few fundamental locking primitives that take locks
      conditionally - most obviously the 'trylock' functions.
      
      That has always been a problem for 'sparse' checking for locking
      imbalance, and we've had a special '__cond_lock()' macro that we've used
      to let sparse know how the locking works:
      
          # define __cond_lock(x,c)        ((c) ? ({ __acquire(x); 1; }) : 0)
      
      so that you can then use this to tell sparse that (for example) the
      spinlock trylock macro ends up acquiring the lock when it succeeds, but
      not when it fails:
      
          #define raw_spin_trylock(lock)  __cond_lock(lock, _raw_spin_trylock(lock))
      
      and then sparse can follow along the locking rules when you have code like
      
              if (!spin_trylock(&dentry->d_lock))
                      return LRU_SKIP;
              .. sparse sees that the lock is held here..
              spin_unlock(&dentry->d_lock);
      
      and sparse ends up happy about the lock contexts.
      
      However, this '__cond_lock()' use does result in very ugly header files,
      and requires you to basically wrap the real function with that macro
      that uses '__cond_lock'.  Which has made PeterZ NAK things that try to
      fix sparse warnings over the years [1].
      
      To solve this, there is now a very experimental patch to sparse that
      basically does the exact same thing as '__cond_lock()' did, but using a
      function attribute instead.  That seems to make PeterZ happy [2].
      
      Note that this does not replace existing use of '__cond_lock()', but
      only exposes the new proposed attribute and uses it for the previously
      unannotated 'refcount_dec_and_lock()' family of functions.
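
How a function-attribute annotation might look on a conditional-acquire function is sketched below in plain user-space C. The `__cond_acquires()` definition follows my understanding of what this commit adds for sparse builds (`__CHECKER__`), so treat the exact attribute arguments as an assumption. `struct toy_lock`, `toy_trylock()`, `toy_unlock()` and `toy_try_work()` are hypothetical names, and the lock is a plain flag rather than a real spinlock.

```c
#include <stdbool.h>

/* Assumed shape of the annotation: a no-op unless sparse is checking. */
#ifdef __CHECKER__
# define __cond_acquires(x) __attribute__((context(x, 0, -1)))
#else
# define __cond_acquires(x)
#endif

struct toy_lock {
    bool held;
};

/* The annotation sits on the declaration; no wrapper macro is needed. */
bool toy_trylock(struct toy_lock *lock) __cond_acquires(lock);

/* Returns true if the lock was acquired, false if it was already held. */
bool toy_trylock(struct toy_lock *lock)
{
    if (lock->held)
        return false;
    lock->held = true;
    return true;
}

static void toy_unlock(struct toy_lock *lock)
{
    lock->held = false;
}

/* The caller pattern that a checker has to follow. */
static int toy_try_work(struct toy_lock *lock)
{
    if (!toy_trylock(lock))
        return -1;          /* lock not taken on this path */
    /* the lock is held here */
    toy_unlock(lock);
    return 0;
}
```

Compared with the '__cond_lock()' approach, the real function keeps its own name and prototype, and the locking hint travels with the declaration instead of a wrapping macro.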
      
      For existing sparse installations, this will make no difference (a
      negative output context was ignored), but if you have the experimental
      sparse patch it will make sparse now understand code that uses those
      functions, the same way '__cond_lock()' makes sparse understand the very
      similar 'atomic_dec_and_lock()' uses that have the old '__cond_lock()'
      annotations.
      
      Note that in some cases this will silence existing context imbalance
      warnings.  But in other cases it may end up exposing new sparse warnings
      for code that sparse just didn't see the locking for at all before.
      
      This is a trial, in other words.  I'd expect that if it ends up being
      successful, and new sparse releases end up having this new attribute,
      we'll migrate the old-style '__cond_lock()' users to use the new-style
      '__cond_acquires' function attribute.
      
      The actual experimental sparse patch was posted in [3].
      
      Link: https://lore.kernel.org/all/20130930134434.GC12926@twins.programming.kicks-ass.net/ [1]
      Link: https://lore.kernel.org/all/Yr60tWxN4P568x3W@worktop.programming.kicks-ass.net/ [2]
      Link: https://lore.kernel.org/all/CAHk-=wjZfO9hGqJ2_hGQG3U_XzSh9_XaXze=HgPdvJbgrvASfA@mail.gmail.com/ [3]
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexander Aring <aahringo@redhat.com>
      Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'xfs-5.19-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 20855e4c
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "This fixes some stalling problems and corrects the last of the
        problems (I hope) observed during testing of the new atomic xattr
        update feature.
      
         - Fix statfs blocking on background inode gc workers
      
         - Fix some broken inode lock assertion code
      
         - Fix xattr leaf buffer leaks when cancelling a deferred xattr update
           operation
      
         - Clean up xattr recovery to make it easier to understand.
      
         - Fix xattr leaf block verifiers tripping over empty blocks.
      
         - Remove complicated and error prone xattr leaf block bholding mess.
      
         - Fix a bug where an rt extent crossing EOF was treated as "posteof"
           blocks and cleaned unnecessarily.
      
         - Fix a UAF when log shutdown races with unmount"
      
      * tag 'xfs-5.19-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: prevent a UAF when log IO errors race with unmount
        xfs: dont treat rt extents beyond EOF as eofblocks to be cleared
        xfs: don't hold xattr leaf buffers across transaction rolls
        xfs: empty xattr leaf header blocks are not corruption
        xfs: clean up the end of xfs_attri_item_recover
        xfs: always free xattri_leaf_bp when cancelling a deferred op
        xfs: use invalidate_lock to check the state of mmap_lock
        xfs: factor out the common lock flags assert
        xfs: introduce xfs_inodegc_push()
        xfs: bound maximum wait time for inodegc work
  4. 02 Jul, 2022 8 commits
  5. 01 Jul, 2022 19 commits
    • Merge tag 'libnvdimm-fixes-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 08986606
      Linus Torvalds authored
      Pull libnvdimm fix from Vishal Verma:
      
       - Fix a bug in the libnvdimm 'BTT' (Block Translation Table) driver
         where accounting for poison blocks to be cleared was off by one,
         causing a failure to clear the last badblock in an nvdimm region.
      
      * tag 'libnvdimm-fixes-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        nvdimm: Fix badblocks clear off-by-one error
    • Merge tag 'thermal-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 1ce8c443
      Linus Torvalds authored
      Pull thermal control fix from Rafael Wysocki:
       "Add a new CPU ID to the list of supported processors in the
        intel_tcc_cooling driver (Sumeet Pawnikar)"
      
      * tag 'thermal-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal: intel_tcc_cooling: Add TCC cooling support for RaptorLake
    • Merge tag 'pm-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 9ee78276
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These fix some issues in cpufreq drivers and some issues in devfreq:
      
         - Fix error code path issues related to PROBE_DEFER handling in devfreq
           (Christian Marangi)
      
         - Revert an editing accident in SPDX-License line in the devfreq
           passive governor (Lukas Bulwahn)
      
         - Fix refcount leak in of_get_devfreq_events() in the exynos-ppmu
           devfreq driver (Miaoqian Lin)
      
         - Use HZ_PER_KHZ macro in the passive devfreq governor (Yicong Yang)
      
         - Fix missing of_node_put for qoriq and pmac32 driver (Liang He)
      
         - Fix issues around throttle interrupt for qcom driver (Stephen Boyd)
      
         - Add MT8186 to cpufreq-dt-platdev blocklist (AngeloGioacchino Del
           Regno)
      
         - Make amd-pstate enable CPPC on resume from S3 (Jinzhou Su)"
      
      * tag 'pm-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM / devfreq: passive: revert an editing accident in SPDX-License line
        PM / devfreq: Fix kernel warning with cpufreq passive register fail
        PM / devfreq: Rework freq_table to be local to devfreq struct
        PM / devfreq: exynos-ppmu: Fix refcount leak in of_get_devfreq_events
        PM / devfreq: passive: Use HZ_PER_KHZ macro in units.h
        PM / devfreq: Fix cpufreq passive unregister erroring on PROBE_DEFER
        PM / devfreq: Mute warning on governor PROBE_DEFER
        PM / devfreq: Fix kernel panic with cpu based scaling to passive gov
        cpufreq: Add MT8186 to cpufreq-dt-platdev blocklist
        cpufreq: pmac32-cpufreq: Fix refcount leak bug
        cpufreq: qcom-hw: Don't do lmh things without a throttle interrupt
        drivers: cpufreq: Add missing of_node_put() in qoriq-cpufreq.c
        cpufreq: amd-pstate: Add resume and suspend callbacks
    • Merge branch 'pm-cpufreq' · bc621588
      Rafael J. Wysocki authored
      Merge cpufreq fixes for 5.19-rc5, including ARM cpufreq fixes and the
      following one:
      
       - Make amd-pstate enable CPPC on resume from S3 (Jinzhou Su).
      
      * pm-cpufreq:
        cpufreq: Add MT8186 to cpufreq-dt-platdev blocklist
        cpufreq: pmac32-cpufreq: Fix refcount leak bug
        cpufreq: qcom-hw: Don't do lmh things without a throttle interrupt
        drivers: cpufreq: Add missing of_node_put() in qoriq-cpufreq.c
        cpufreq: amd-pstate: Add resume and suspend callbacks
    • Merge tag 'hwmon-for-v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging · b336ad59
      Linus Torvalds authored
      
      Pull hwmon fixes from Guenter Roeck:
      
       - Fix error handling in ibmaem driver initialization
      
       - Fix bad data reported by occ driver after setting power cap
      
       - Fix typos in pmbus/ucd9200 driver comments
      
      * tag 'hwmon-for-v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (ibmaem) don't call platform_device_del() if platform_device_add() fails
        hwmon: (pmbus/ucd9200) fix typos in comments
        hwmon: (occ) Prevent power cap command overwriting poll response
    • hwmon: (ibmaem) don't call platform_device_del() if platform_device_add() fails · d0e51022
      Yang Yingliang authored
      If platform_device_add() fails, there is no need to call platform_device_del().
      Split platform_device_unregister() into platform_device_del() and
      platform_device_put(), so that platform_device_put() can be called separately.
      
      Fixes: 8808a793 ("ibmaem: new driver for power/energy/temp meters in IBM System X hardware")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Link: https://lore.kernel.org/r/20220701074153.4021556-1-yangyingliang@huawei.com
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · d0f67adb
      Linus Torvalds authored
      Pull arm64 fix from Catalin Marinas:
       "Restore TLB invalidation for the 'break-before-make' rule on
        contiguous ptes (missed in a recent clean-up)"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: hugetlb: Restore TLB invalidation for BBM on contiguous ptes
    • Merge tag 's390-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · cec84e75
      Linus Torvalds authored
      Pull s390 fixes from Alexander Gordeev:
      
       - Fix purgatory build process so bin2c tool does not get built
         unnecessarily and the Makefile is more consistent with other
         architectures.
      
       - Return to the earlier simple design of the
         arch_get_random_seed_long|int() and arch_get_random_long|int()
         callbacks as a result of changes in the generic RNG code.
      
       - Fix minor comment typos and spelling mistakes.
      
      * tag 's390-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/qdio: Fix spelling mistake
        s390/sclp: Fix typo in comments
        s390/archrandom: simplify back to earlier design and initialize earlier
        s390/purgatory: remove duplicated build rule of kexec-purgatory.o
        s390/purgatory: hard-code obj-y in Makefile
        s390: remove unneeded 'select BUILD_BIN2C'
    • Merge tag 'nfs-for-5.19-3' of git://git.linux-nfs.org/projects/anna/linux-nfs · 76ff294e
      Linus Torvalds authored
      Pull NFS client fixes from Anna Schumaker:
      
       - Allocate a fattr for _nfs4_discover_trunking()
      
       - Fix module reference count leak in nfs4_run_state_manager()
      
      * tag 'nfs-for-5.19-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
        NFSv4: Add an fattr allocation to _nfs4_discover_trunking()
        NFS: restore module put when manager exits.
    • Merge tag 'ceph-for-5.19-rc5' of https://github.com/ceph/ceph-client · 6f8693ea
      Linus Torvalds authored
      Pull ceph fix from Ilya Dryomov:
       "A ceph filesystem fix, marked for stable.
      
        There appears to be a deeper issue on the MDS side, but for now we are
        going with this one-liner to avoid busy looping and potential soft
        lockups"
      
      * tag 'ceph-for-5.19-rc5' of https://github.com/ceph/ceph-client:
        ceph: wait on async create before checking caps for syncfs
    • Merge tag 'for-5.19/dm-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm · 8300d380
      Linus Torvalds authored
      
      Pull device mapper fixes from Mike Snitzer:
       "Three fixes for invalid memory accesses discovered by using KASAN
        while running the lvm2 testsuite's dm-raid tests. Includes changes to
        MD's raid5.c given the dependency dm-raid has on the MD code"
      
      * tag 'for-5.19/dm-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm raid: fix KASAN warning in raid5_add_disks
        dm raid: fix KASAN warning in raid5_remove_disk
        dm raid: fix accesses beyond end of raid member array
    • Merge tag 'io_uring-5.19-2022-07-01' of git://git.kernel.dk/linux-block · 0a35d162
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "Two minor tweaks:
      
         - While we still can, adjust the send/recv based flags to be in
           ->ioprio rather than in ->addr2. This is consistent with eg accept,
           and also doesn't waste a full 64-bit field for flags (Pavel)
      
         - 5.18-stable fix for re-importing provided buffers. Not much real
           world relevance here as it'll only impact non-pollable files gone
           async, which is more of a practical test case rather than something
           that is used in the wild (Dylan)"
      
      * tag 'io_uring-5.19-2022-07-01' of git://git.kernel.dk/linux-block:
        io_uring: fix provided buffer import
        io_uring: keep sendrecv flags in ioprio
    • Merge tag 'block-5.19-2022-07-01' of git://git.kernel.dk/linux-block · d516e221
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - Fix for batch getting of tags in sbitmap (wuchi)
      
       - NVMe pull request via Christoph:
            - More quirks (Lamarque Vieira Souza, Pablo Greco)
            - Fix a fabrics disconnect regression (Ruozhu Li)
            - Fix a nvmet-tcp data_digest calculation regression (Sagi
              Grimberg)
            - Fix nvme-tcp send failure handling (Sagi Grimberg)
            - Fix a regression with nvmet-loop and passthrough controllers
              (Alan Adamson)
      
      * tag 'block-5.19-2022-07-01' of git://git.kernel.dk/linux-block:
        nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA IM2P33F8ABR1
        nvmet: add a clear_ids attribute for passthru targets
        nvme: fix regression when disconnect a recovering ctrl
        nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG SX6000LNP (AKA SPECTRIX S40G)
        nvme-tcp: always fail a request when sending it failed
        nvmet-tcp: fix regression in data_digest calculation
        lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 067c2273
      Linus Torvalds authored
      Pull SCSI fix from James Bottomley:
       "One simple driver fix for a dma overrun"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: hisi_sas: Limit max hw sectors for v3 HW
    • Merge tag 'ata-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · 690685ff
      Linus Torvalds authored
      Pull ATA fix from Damien Le Moal:
      
       - Fix a compilation warning with some versions of gcc/sparse when
         compiling the pata_cs5535 driver, from John.
      
      * tag 'ata-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
        ata: pata_cs5535: Fix W=1 warnings
    • arm64: hugetlb: Restore TLB invalidation for BBM on contiguous ptes · 41098230
      Will Deacon authored
      Commit fb396bb4 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()")
      removed TLB invalidation from get_clear_flush() [now get_clear_contig()]
      on the basis that the core TLB invalidation code is aware of hugetlb
      mappings backed by contiguous page-table entries and will cover the
      correct virtual address range.
      
      However, this change also resulted in the TLB invalidation being removed
      from the "break" step in the break-before-make (BBM) sequence used
      internally by huge_ptep_set_{access_flags,wrprotect}(), therefore
      making the BBM sequence unsafe irrespective of later invalidation.
      
      Although the architecture is desperately unclear about how exactly
      contiguous ptes should be updated in a live page-table, restore TLB
      invalidation to our BBM sequence under the assumption that BBM is the
      right thing to be doing in the first place.
      
      Fixes: fb396bb4 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()")
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Steve Capper <steve.capper@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Will Deacon <will@kernel.org>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Link: https://lore.kernel.org/r/20220629095349.25748-1-will@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 9650910d
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "Two small fixes
      
         - Initialize a spinlock in the stm32 reset code
      
         - Add dt bindings to the clk maintainer filepattern"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        MAINTAINERS: add include/dt-bindings/clock to COMMON CLK FRAMEWORK
        clk: stm32: rcc_reset: Fix missing spin_lock_init()
    • xfs: prevent a UAF when log IO errors race with unmount · 7561cea5
      Darrick J. Wong authored
      KASAN reported the following use after free bug when running
      generic/475:
      
       XFS (dm-0): Mounting V5 Filesystem
       XFS (dm-0): Starting recovery (logdev: internal)
       XFS (dm-0): Ending recovery (logdev: internal)
       Buffer I/O error on dev dm-0, logical block 20639616, async page read
       Buffer I/O error on dev dm-0, logical block 20639617, async page read
       XFS (dm-0): log I/O error -5
       XFS (dm-0): Filesystem has been shut down due to log error (0x2).
       XFS (dm-0): Unmounting Filesystem
       XFS (dm-0): Please unmount the filesystem and rectify the problem(s).
       ==================================================================
       BUG: KASAN: use-after-free in do_raw_spin_lock+0x246/0x270
       Read of size 4 at addr ffff888109dd84c4 by task 3:1H/136
      
       CPU: 3 PID: 136 Comm: 3:1H Not tainted 5.19.0-rc4-xfsx #rc4 8e53ab5ad0fddeb31cee5e7063ff9c361915a9c4
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
       Workqueue: xfs-log/dm-0 xlog_ioend_work [xfs]
       Call Trace:
        <TASK>
        dump_stack_lvl+0x34/0x44
        print_report.cold+0x2b8/0x661
        ? do_raw_spin_lock+0x246/0x270
        kasan_report+0xab/0x120
        ? do_raw_spin_lock+0x246/0x270
        do_raw_spin_lock+0x246/0x270
        ? rwlock_bug.part.0+0x90/0x90
        xlog_force_shutdown+0xf6/0x370 [xfs 4ad76ae0d6add7e8183a553e624c31e9ed567318]
        xlog_ioend_work+0x100/0x190 [xfs 4ad76ae0d6add7e8183a553e624c31e9ed567318]
        process_one_work+0x672/0x1040
        worker_thread+0x59b/0xec0
        ? __kthread_parkme+0xc6/0x1f0
        ? process_one_work+0x1040/0x1040
        ? process_one_work+0x1040/0x1040
        kthread+0x29e/0x340
        ? kthread_complete_and_exit+0x20/0x20
        ret_from_fork+0x1f/0x30
        </TASK>
      
       Allocated by task 154099:
        kasan_save_stack+0x1e/0x40
        __kasan_kmalloc+0x81/0xa0
        kmem_alloc+0x8d/0x2e0 [xfs]
        xlog_cil_init+0x1f/0x540 [xfs]
        xlog_alloc_log+0xd1e/0x1260 [xfs]
        xfs_log_mount+0xba/0x640 [xfs]
        xfs_mountfs+0xf2b/0x1d00 [xfs]
        xfs_fs_fill_super+0x10af/0x1910 [xfs]
        get_tree_bdev+0x383/0x670
        vfs_get_tree+0x7d/0x240
        path_mount+0xdb7/0x1890
        __x64_sys_mount+0x1fa/0x270
        do_syscall_64+0x2b/0x80
        entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
       Freed by task 154151:
        kasan_save_stack+0x1e/0x40
        kasan_set_track+0x21/0x30
        kasan_set_free_info+0x20/0x30
        ____kasan_slab_free+0x110/0x190
        slab_free_freelist_hook+0xab/0x180
        kfree+0xbc/0x310
        xlog_dealloc_log+0x1b/0x2b0 [xfs]
        xfs_unmountfs+0x119/0x200 [xfs]
        xfs_fs_put_super+0x6e/0x2e0 [xfs]
        generic_shutdown_super+0x12b/0x3a0
        kill_block_super+0x95/0xd0
        deactivate_locked_super+0x80/0x130
        cleanup_mnt+0x329/0x4d0
        task_work_run+0xc5/0x160
        exit_to_user_mode_prepare+0xd4/0xe0
        syscall_exit_to_user_mode+0x1d/0x40
        entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      This appears to be a race between the unmount process, which frees the
      CIL and waits for in-flight iclog IO, and the iclog IO completion.  When
      generic/475 runs, it starts fsstress in the background, waits a few
      seconds, and substitutes a dm-error device to simulate a disk falling
      out of a machine.  If fsstress encounters EIO on a pure data write,
      it will exit but the filesystem will still be online.
      
      The next thing the test does is unmount the filesystem, which tries to
      clean the log, free the CIL, and wait for iclog IO completion.  If an
      iclog was being written when the dm-error switch occurred, it can race
      with log unmounting as follows:
      
      Thread 1				Thread 2
      
      					xfs_log_unmount
      					xfs_log_clean
      					xfs_log_quiesce
      xlog_ioend_work
      <observe error>
      xlog_force_shutdown
      test_and_set_bit(XLOG_IOERROR)
      					xfs_log_force
      					<log is shut down, nop>
      					xfs_log_umount_write
      					<log is shut down, nop>
      					xlog_dealloc_log
      					xlog_cil_destroy
      					<wait for iclogs>
      spin_lock(&log->l_cilp->xc_push_lock)
      <KABOOM>
      
      Therefore, free the CIL after waiting for the iclogs to complete.  I
      /think/ this race has existed for quite a few years now, though I don't
      remember the ~2014 era logging code well enough to know if it was a real
      threat then or if the actual race was exposed only more recently.
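
      The ordering rule in this fix (wait for in-flight completion work
      before freeing the state that work touches) can be illustrated outside
      the kernel.  The following is a minimal Python sketch, not XFS code:
      the ToyLog class, its cil dictionary, and every method name are
      hypothetical stand-ins chosen for illustration, and a thread join plays
      the role of waiting for the iclogs.

```python
import threading


class ToyLog:
    """Toy stand-in for a log whose IO completion worker touches CIL state."""

    def __init__(self):
        # Hypothetical "CIL": a lock plus a counter bumped by the completion path.
        self.cil = {"push_lock": threading.Lock(), "shutdowns": 0}
        self._io_error = threading.Event()
        self._ioend = threading.Thread(target=self._ioend_work)
        self._ioend.start()

    def _ioend_work(self):
        # Completion handler: blocks until the IO "fails", then takes the CIL
        # lock, loosely mirroring spin_lock(&log->l_cilp->xc_push_lock).
        self._io_error.wait()
        with self.cil["push_lock"]:
            self.cil["shutdowns"] += 1

    def fail_io(self):
        """Simulate the device error that wakes the completion handler."""
        self._io_error.set()

    def dealloc(self):
        """Tear down in the safe order: wait for the worker, then drop the CIL."""
        self._ioend.join()                # wait for in-flight completion first
        cil, self.cil = self.cil, None    # only now release the shared state
        return cil
```

      Swapping the two statements in dealloc() would let the worker
      dereference self.cil after it has been cleared, which is the toy
      analogue of the use-after-free in the trace above.  Note that dealloc()
      blocks until fail_io() has been called, since the worker waits for it.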
      
      Fixes: ac983517 ("xfs: don't sleep in xlog_cil_force_lsn on shutdown")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • Merge tag 'drm-fixes-2022-07-01' of git://anongit.freedesktop.org/drm/drm · a175eca0
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Bit quieter this week, the main thing is it pulls in the fixes for the
         sysfb resource issue you were seeing. These had been queued for next
        so should have had some decent testing.
      
        Otherwise amdgpu, i915 and msm each have a few fixes, and vc4 has one.
      
        fbdev:
         - sysfb fixes/conflicting fb fixes
      
        amdgpu:
         - GPU recovery fix
      
         - Fix integer type usage in fourcc header for AMD modifiers
      
         - KFD TLB flush fix for gfx9 APUs
      
         - Display fix
      
        i915:
         - Fix ioctl argument error return
      
         - Fix d3cold disable to allow PCI upstream bridge D3 transition
      
         - Fix setting cache_dirty for dma-buf objects on discrete
      
        msm:
         - Fix to increment vsync_cnt before calling drm_crtc_handle_vblank so
           that userspace sees the value *after* it is incremented if waiting
           for vblank events
      
         - Fix to reset drm_dev to NULL in dp_display_unbind to avoid a crash
           in probe/bind error paths
      
         - Fix to resolve the smatch error of de-referencing before NULL check
           in dpu_encoder_phys_wb.c
      
         - Fix error return to userspace if fence-id allocation fails in
           submit ioctl
      
        vc4:
         - NULL ptr dereference fix"
      
      * tag 'drm-fixes-2022-07-01' of git://anongit.freedesktop.org/drm/drm:
        Revert "drm/amdgpu/display: set vblank_disable_immediate for DC"
        drm/amdgpu: To flush tlb for MMHUB of RAVEN series
        drm/fourcc: fix integer type usage in uapi header
        drm/amdgpu: fix adev variable used in amdgpu_device_gpu_recover()
        fbdev: Disable sysfb device registration when removing conflicting FBs
        firmware: sysfb: Add sysfb_disable() helper function
        firmware: sysfb: Make sysfb_create_simplefb() return a pdev pointer
        drm/msm/gem: Fix error return on fence id alloc fail
        drm/i915: tweak the ordering in cpu_write_needs_clflush
        drm/i915/dgfx: Disable d3cold at gfx root port
        drm/i915/gem: add missing else
        drm/vc4: perfmon: Fix variable dereferenced before check
        drm/msm/dpu: Fix variable dereferenced before check
        drm/msm/dp: reset drm_dev to NULL at dp_display_unbind()
        drm/msm/dpu: Increment vsync_cnt before waking up userspace
  6. 30 Jun, 2022 7 commits
    • Merge tag 'drm-misc-fixes-2022-06-30' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes · b8f0009b
      Dave Airlie authored
      A NULL pointer dereference fix for vc4, and 3 patches to improve the
      sysfb device behaviour when removing conflicting framebuffers
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Maxime Ripard <maxime@cerno.tech>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220630072404.2fa4z3nk5h5q34ci@houat
    • Merge tag 'net-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 5e837935
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from netfilter.
      
        Current release - new code bugs:
      
         - clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user()
      
         - mptcp:
            - invoke MP_FAIL response only when needed
            - fix shutdown vs fallback race
            - consistent map handling on failure
      
         - octeon_ep: use bitwise AND
      
        Previous releases - regressions:
      
         - tipc: move bc link creation back to tipc_node_create, fix NPD
      
        Previous releases - always broken:
      
         - tcp: add a missing nf_reset_ct() in 3WHS handling to prevent socket
           buffered skbs from keeping refcount on the conntrack module
      
         - ipv6: take care of disable_policy when restoring routes
      
         - tun: make sure to always disable and unlink NAPI instances
      
         - phy: don't trigger state machine while in suspend
      
         - netfilter: nf_tables: avoid skb access on nf_stolen
      
         - asix: fix "can't send until first packet is send" issue
      
         - usb: asix: do not force pause frames support
      
         - nxp-nci: don't issue a zero length i2c_master_read()
      
        Misc:
      
         - ncsi: allow use of proper "mellanox" DT vendor prefix
      
         - act_api: add a message for user space if any actions were already
           flushed before the error was hit"
      
      * tag 'net-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits)
        net: dsa: felix: fix race between reading PSFP stats and port stats
        selftest: tun: add test for NAPI dismantle
        net: tun: avoid disabling NAPI twice
        net: sparx5: mdb add/del handle non-sparx5 devices
        net: sfp: fix memory leak in sfp_probe()
        mlxsw: spectrum_router: Fix rollback in tunnel next hop init
        net: rose: fix UAF bugs caused by timer handler
        net: usb: ax88179_178a: Fix packet receiving
        net: bonding: fix use-after-free after 802.3ad slave unbind
        ipv6: fix lockdep splat in in6_dump_addrs()
        net: phy: ax88772a: fix lost pause advertisement configuration
        net: phy: Don't trigger state machine while in suspend
        usbnet: fix memory allocation in helpers
        selftests net: fix kselftest net fatal error
        NFC: nxp-nci: don't print header length mismatch on i2c error
        NFC: nxp-nci: Don't issue a zero length i2c_master_read()
        net: tipc: fix possible refcount leak in tipc_sk_create()
        nfc: nfcmrvl: Fix irq_of_parse_and_map() return value
        net: ipv6: unexport __init-annotated seg6_hmac_net_init()
        ipv6/sit: fix ipip6_tunnel_get_prl return value
        ...
    • vfs: fix copy_file_range() regression in cross-fs copies · 868f9f2f
      Amir Goldstein authored
      A regression has been reported by Nicolas Boichat, found while using the
      copy_file_range syscall to copy a tracefs file.
      
      Before commit 5dae222a ("vfs: allow copy_file_range to copy across
      devices") the kernel would return -EXDEV to userspace when trying to
      copy a file across different filesystems.  After this commit, the
      syscall doesn't fail anymore and instead returns zero (zero bytes
      copied), as this file's content is generated on-the-fly and thus reports
      a size of zero.
      
      Another regression has been reported by He Zhe - the assertion of
      WARN_ON_ONCE(ret == -EOPNOTSUPP) can be triggered from userspace when
      copying from a sysfs file whose read operation may return -EOPNOTSUPP.
      
      Since we do not have test coverage for copy_file_range() between any two
      types of filesystems, the best way to avoid these sorts of issues in the
      future is for the kernel to be more picky about filesystems that are
      allowed to do copy_file_range().
      
      This patch restores some cross-filesystem copy restrictions that existed
      prior to commit 5dae222a ("vfs: allow copy_file_range to copy across
      devices"), namely, cross-sb copy is not allowed for filesystems that do
      not implement ->copy_file_range().
      
      Filesystems that do implement ->copy_file_range() have full control of
      the result - if this method returns an error, the error is returned to
      the user.  Before this change this was only true for fs that did not
      implement the ->remap_file_range() operation (i.e.  nfsv3).
      
      Filesystems that do not implement ->copy_file_range() still fall back to
      the generic_copy_file_range() implementation when the copy is within the
      same sb.  This helps the kernel maintain a more consistent story
      about which filesystems support copy_file_range().
      
      nfsd and ksmbd servers are modified to fall-back to the
      generic_copy_file_range() implementation in case vfs_copy_file_range()
      fails with -EOPNOTSUPP or -EXDEV, which preserves behavior of
      server-side-copy.
      
      Fall-back to generic_copy_file_range() is not implemented for the smb
      operation FSCTL_DUPLICATE_EXTENTS_TO_FILE, which is arguably a correct
      change of behavior.
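
      For userspace callers, the practical consequence is that
      copy_file_range() may again fail with -EXDEV for cross-filesystem
      copies, and that a return of zero does not by itself prove the source
      is empty.  The following is a minimal Python sketch of a defensive
      copy loop, not part of this patch: the helper names copy_fd and
      _copy_read_write are hypothetical, and the set of errno values treated
      as "fall back to read/write" is an assumption of the sketch.

```python
import errno
import os

# Errors after which a plain read/write copy is attempted instead.
# The exact membership of this set is an assumption of the sketch.
_FALLBACK_ERRNOS = {errno.EXDEV, errno.EOPNOTSUPP, errno.ENOSYS, errno.EINVAL}


def _copy_read_write(src_fd, dst_fd, chunk):
    """Copy from src_fd to dst_fd with read()/write() until EOF."""
    total = 0
    while True:
        buf = os.read(src_fd, chunk)
        if not buf:
            return total
        view = memoryview(buf)
        while view:  # handle short writes
            written = os.write(dst_fd, view)
            view = view[written:]
        total += len(buf)


def copy_fd(src_fd, dst_fd, chunk=1 << 16):
    """Copy src_fd to dst_fd, preferring copy_file_range() where it works."""
    if not hasattr(os, "copy_file_range"):  # non-Linux or old Python
        return _copy_read_write(src_fd, dst_fd, chunk)
    total = 0
    while True:
        try:
            copied = os.copy_file_range(src_fd, dst_fd, chunk)
        except OSError as exc:
            if exc.errno in _FALLBACK_ERRNOS:
                return total + _copy_read_write(src_fd, dst_fd, chunk)
            raise
        if copied == 0:
            # Zero can mean EOF, or a source whose content is generated
            # on the fly and reports size zero; let read() decide.
            return total + _copy_read_write(src_fd, dst_fd, chunk)
        total += copied
```

      Both descriptors are used at their current file offsets, so the
      read/write fallback continues from wherever copy_file_range() stopped.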
      
      Fixes: 5dae222a ("vfs: allow copy_file_range to copy across devices")
      Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/
      Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/
      Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/
      Link: https://lore.kernel.org/linux-fsdevel/20210630161320.29006-1-lhenriques@suse.de/
      Reported-by: Nicolas Boichat <drinkcat@chromium.org>
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Signed-off-by: Luis Henriques <lhenriques@suse.de>
      Fixes: 64bf5ff5 ("vfs: no fallback for ->copy_file_range")
      Link: https://lore.kernel.org/linux-fsdevel/20f17f64-88cb-4e80-07c1-85cb96c83619@windriver.com/
      Reported-by: He Zhe <zhe.he@windriver.com>
      Tested-by: Namjae Jeon <linkinjeon@kernel.org>
      Tested-by: Luis Henriques <lhenriques@suse.de>
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • SUNRPC: Fix READ_PLUS crasher · a23dd544
      Chuck Lever authored
      Looks like there are still cases when "space_left - frag1bytes" can
      legitimately exceed PAGE_SIZE. Ensure that xdr->end always remains
      within the current encode buffer.
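
      The bound described above amounts to clamping the remaining space to
      one page.  The following Python sketch shows only that arithmetic and
      is not the kernel patch: integer offsets stand in for pointers, and
      the function name next_buffer_end and the 4096-byte PAGE_SIZE are
      assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def next_buffer_end(p, space_left, frag1bytes, page_size=PAGE_SIZE):
    """Return the end offset of the encode window that starts at offset p.

    The window never extends more than one page past p, even when
    space_left - frag1bytes is larger than a page.
    """
    remaining = space_left - frag1bytes
    return p + min(remaining, page_size)
```

      With space_left = 5000 and frag1bytes = 100, the unclamped end would
      lie 4900 bytes past p; the clamp keeps it at 4096.
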
      Reported-by: Bruce Fields <bfields@fieldses.org>
      Reported-by: Zorro Lang <zlang@redhat.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216151
      Fixes: 6c254bf3 ("SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSv4: Add an fattr allocation to _nfs4_discover_trunking() · 4f40a5b5
      Scott Mayhew authored
      This was missed in c3ed2227 ("NFSv4: Fix free of uninitialized
      nfs4_label on referral lookup.") and causes a panic when mounting
      with '-o trunkdiscovery':
      
      PID: 1604   TASK: ffff93dac3520000  CPU: 3   COMMAND: "mount.nfs"
       #0 [ffffb79140f738f8] machine_kexec at ffffffffaec64bee
       #1 [ffffb79140f73950] __crash_kexec at ffffffffaeda67fd
       #2 [ffffb79140f73a18] crash_kexec at ffffffffaeda76ed
       #3 [ffffb79140f73a30] oops_end at ffffffffaec2658d
       #4 [ffffb79140f73a50] general_protection at ffffffffaf60111e
          [exception RIP: nfs_fattr_init+0x5]
          RIP: ffffffffc0c18265  RSP: ffffb79140f73b08  RFLAGS: 00010246
          RAX: 0000000000000000  RBX: ffff93dac304a800  RCX: 0000000000000000
          RDX: ffffb79140f73bb0  RSI: ffff93dadc8cbb40  RDI: d03ee11cfaf6bd50
          RBP: ffffb79140f73be8   R8: ffffffffc0691560   R9: 0000000000000006
          R10: ffff93db3ffd3df8  R11: 0000000000000000  R12: ffff93dac4040000
          R13: ffff93dac2848e00  R14: ffffb79140f73b60  R15: ffffb79140f73b30
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #5 [ffffb79140f73b08] _nfs41_proc_get_locations at ffffffffc0c73d53 [nfsv4]
       #6 [ffffb79140f73bf0] nfs4_proc_get_locations at ffffffffc0c83e90 [nfsv4]
       #7 [ffffb79140f73c60] nfs4_discover_trunking at ffffffffc0c83fb7 [nfsv4]
       #8 [ffffb79140f73cd8] nfs_probe_fsinfo at ffffffffc0c0f95f [nfs]
       #9 [ffffb79140f73da0] nfs_probe_server at ffffffffc0c1026a [nfs]
          RIP: 00007f6254fce26e  RSP: 00007ffc69496ac8  RFLAGS: 00000246
          RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f6254fce26e
          RDX: 00005600220a82a0  RSI: 00005600220a64d0  RDI: 00005600220a6520
          RBP: 00007ffc69496c50   R8: 00005600220a8710   R9: 003035322e323231
          R10: 0000000000000000  R11: 0000000000000246  R12: 00007ffc69496c50
          R13: 00005600220a8440  R14: 0000000000000010  R15: 0000560020650ef9
          ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b
      
      Fixes: c3ed2227 ("NFSv4: Fix free of uninitialized nfs4_label on referral lookup.")
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      4f40a5b5
    • NFS: restore module put when manager exits. · 080abad7
      NeilBrown authored
      Commit f49169c9 ("NFSD: Remove svc_serv_ops::svo_module") removed
      calls to module_put_and_kthread_exit() from threads that acted as SUNRPC
      servers and had a related svc_serv_ops structure.  This was correct.
      
      It ALSO removed the module_put_and_kthread_exit() call from
      nfs4_run_state_manager() which is NOT a SUNRPC service.
      
      Consequently every time the NFSv4 state manager runs the module count
      increments and won't be decremented.  So the nfsv4 module cannot be
      unloaded.
      
      So restore the module_put_and_kthread_exit() call.
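
      The imbalance described above can be illustrated with a small
      userspace model. This is only a sketch, not the kernel code: it
      assumes that a module reference is taken each time the manager
      thread is started and that the thread itself is expected to drop it
      on exit. The names toy_module, toy_module_get(), toy_module_put(),
      run_manager_once() and refs_after_runs() are hypothetical and exist
      only for this illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for a kernel module's reference count. */
struct toy_module {
	int refcount;
};

static void toy_module_get(struct toy_module *m)
{
	m->refcount++;
}

static void toy_module_put(struct toy_module *m)
{
	/* Dropping a reference that was never taken would be a bug. */
	assert(m->refcount > 0);
	m->refcount--;
}

/*
 * Models one run of the state manager thread. The reference is taken
 * when the thread is started (an assumption of this sketch). If
 * put_on_exit is zero, the thread returns without dropping it, which
 * is the leak the commit message describes.
 */
static void run_manager_once(struct toy_module *m, int put_on_exit)
{
	toy_module_get(m);

	/* ... state recovery work would happen here ... */

	if (put_on_exit)
		toy_module_put(m); /* role played by module_put_and_kthread_exit() */
}

/* Returns the reference count left behind after the given number of runs. */
static int refs_after_runs(int runs, int put_on_exit)
{
	struct toy_module m = { 0 };
	int i;

	for (i = 0; i < runs; i++)
		run_manager_once(&m, put_on_exit);

	return m.refcount;
}
```

      In this model, refs_after_runs(n, 0) grows with n, which corresponds
      to a module that can no longer be unloaded, while
      refs_after_runs(n, 1) returns to zero after every run.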
      
      Fixes: f49169c9 ("NFSD: Remove svc_serv_ops::svo_module")
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      080abad7
    • Merge tag 'nvme-5.19-2022-06-30' of git://git.infradead.org/nvme into block-5.19 · f3163d85
      Jens Axboe authored
      Pull NVMe fixes from Christoph:
      
      "nvme fixes for Linux 5.19
      
       - more quirks (Lamarque Vieira Souza, Pablo Greco)
       - fix a fabrics disconnect regression (Ruozhu Li)
       - fix a nvmet-tcp data_digest calculation regression (Sagi Grimberg)
       - fix nvme-tcp send failure handling (Sagi Grimberg)
       - fix a regression with nvmet-loop and passthrough controllers
         (Alan Adamson)"
      
      * tag 'nvme-5.19-2022-06-30' of git://git.infradead.org/nvme:
        nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA IM2P33F8ABR1
        nvmet: add a clear_ids attribute for passthru targets
        nvme: fix regression when disconnect a recovering ctrl
        nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG SX6000LNP (AKA SPECTRIX S40G)
        nvme-tcp: always fail a request when sending it failed
        nvmet-tcp: fix regression in data_digest calculation
      f3163d85