An error occurred fetching the project authors.
  1. 19 Jan, 2023 8 commits
    • Christian Brauner's avatar
      fs: port vfs{g,u}id helpers to mnt_idmap · 4d7ca409
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevent on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      4d7ca409
    • Christian Brauner's avatar
      fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap · e67fe633
      Christian Brauner authored
      Convert to struct mnt_idmap.
      Remove legacy file_mnt_user_ns() and mnt_user_ns().
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevent on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      e67fe633
    • Christian Brauner's avatar
      fs: port privilege checking helpers to mnt_idmap · 9452e93e
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevent on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      9452e93e
    • Christian Brauner's avatar
      fs: port inode_owner_or_capable() to mnt_idmap · 01beba79
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevent on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      01beba79
    • Christian Brauner's avatar
      fs: port acl to mnt_idmap · 700b7940
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevent on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      700b7940
    • Christian Brauner's avatar
      fs: port ->permission() to pass mnt_idmap · 4609e1f1
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevent on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      4609e1f1
    • Christian Brauner's avatar
      fs: port ->set_acl() to pass mnt_idmap · 13e83a49
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevent on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      13e83a49
    • Christian Brauner's avatar
      fs: port ->get_acl() to pass mnt_idmap · 77435322
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevent on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      77435322
  2. 02 Dec, 2022 1 commit
  3. 31 Oct, 2022 1 commit
  4. 28 Oct, 2022 1 commit
  5. 20 Oct, 2022 6 commits
    • Christian Brauner's avatar
      acl: remove a slew of now unused helpers · 0a26bde2
      Christian Brauner authored
      Now that the posix acl api is active we can remove all the hacky helpers
      we had to keep around for all these years and also remove the set and
      get posix acl xattr handler methods as they aren't needed anymore.
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      0a26bde2
    • Christian Brauner's avatar
      xattr: use posix acl api · 318e6685
      Christian Brauner authored
      In previous patches we built a new posix api solely around get and set
      inode operations. Now that we have all the pieces in place we can switch
      the system calls and the vfs over to only rely on this api when
      interacting with posix acls. This finally removes all type unsafety and
      type conversion issues explained in detail in [1] that we aim to get rid
      of.
      
      With the new posix acl api we immediately translate into an appropriate
      kernel internal struct posix_acl format both when getting and setting
      posix acls. This is a stark contrast to before were we hacked unsafe raw
      values into the uapi struct that was stored in a void pointer relying
      and having filesystems and security modules hack around in the uapi
      struct as well.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      318e6685
    • Christian Brauner's avatar
      acl: add vfs_remove_acl() · aeb7f005
      Christian Brauner authored
      In previous patches we implemented get and set inode operations for all
      non-stacking filesystems that support posix acls but didn't yet
      implement get and/or set acl inode operations. This specifically
      affected cifs and 9p.
      
      Now we can build a posix acl api based solely on get and set inode
      operations. We add a new vfs_remove_acl() api that can be used to set
      posix acls. This finally removes all type unsafety and type conversion
      issues explained in detail in [1] that we aim to get rid of.
      
      After we finished building the vfs api we can switch stacking
      filesystems to rely on the new posix api and then finally switch the
      xattr system calls themselves to rely on the posix acl api.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      aeb7f005
    • Christian Brauner's avatar
      acl: add vfs_get_acl() · 4f353ba4
      Christian Brauner authored
      In previous patches we implemented get and set inode operations for all
      non-stacking filesystems that support posix acls but didn't yet
      implement get and/or set acl inode operations. This specifically
      affected cifs and 9p.
      
      Now we can build a posix acl api based solely on get and set inode
      operations. We add a new vfs_get_acl() api that can be used to get posix
      acls. This finally removes all type unsafety and type conversion issues
      explained in detail in [1] that we aim to get rid of.
      
      After we finished building the vfs api we can switch stacking
      filesystems to rely on the new posix api and then finally switch the
      xattr system calls themselves to rely on the posix acl api.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      4f353ba4
    • Christian Brauner's avatar
      acl: add vfs_set_acl() · e4cc9163
      Christian Brauner authored
      In previous patches we implemented get and set inode operations for all
      non-stacking filesystems that support posix acls but didn't yet
      implement get and/or set acl inode operations. This specifically
      affected cifs and 9p.
      
      Now we can build a posix acl api based solely on get and set inode
      operations. We add a new vfs_set_acl() api that can be used to set posix
      acls. This finally removes all type unsafety and type conversion issues
      explained in detail in [1] that we aim to get rid of.
      
      After we finished building the vfs api we can switch stacking
      filesystems to rely on the new posix api and then finally switch the
      xattr system calls themselves to rely on the posix acl api.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      e4cc9163
    • Christian Brauner's avatar
      fs: rename current get acl method · cac2f8b8
      Christian Brauner authored
      The current way of setting and getting posix acls through the generic
      xattr interface is error prone and type unsafe. The vfs needs to
      interpret and fixup posix acls before storing or reporting it to
      userspace. Various hacks exist to make this work. The code is hard to
      understand and difficult to maintain in it's current form. Instead of
      making this work by hacking posix acls through xattr handlers we are
      building a dedicated posix acl api around the get and set inode
      operations. This removes a lot of hackiness and makes the codepaths
      easier to maintain. A lot of background can be found in [1].
      
      The current inode operation for getting posix acls takes an inode
      argument but various filesystems (e.g., 9p, cifs, overlayfs) need access
      to the dentry. In contrast to the ->set_acl() inode operation we cannot
      simply extend ->get_acl() to take a dentry argument. The ->get_acl()
      inode operation is called from:
      
      acl_permission_check()
      -> check_acl()
         -> get_acl()
      
      which is part of generic_permission() which in turn is part of
      inode_permission(). Both generic_permission() and inode_permission() are
      called in the ->permission() handler of various filesystems (e.g.,
      overlayfs). So simply passing a dentry argument to ->get_acl() would
      amount to also having to pass a dentry argument to ->permission(). We
      should avoid this unnecessary change.
      
      So instead of extending the existing inode operation rename it from
      ->get_acl() to ->get_inode_acl() and add a ->get_acl() method later that
      passes a dentry argument and which filesystems that need access to the
      dentry can implement instead of ->get_inode_acl(). Filesystems like cifs
      which allow setting and getting posix acls but not using them for
      permission checking during lookup can simply not implement
      ->get_inode_acl().
      
      This is intended to be a non-functional change.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Suggested-by/Inspired-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      cac2f8b8
  6. 19 Oct, 2022 1 commit
    • Christian Brauner's avatar
      fs: pass dentry to set acl method · 138060ba
      Christian Brauner authored
      The current way of setting and getting posix acls through the generic
      xattr interface is error prone and type unsafe. The vfs needs to
      interpret and fixup posix acls before storing or reporting it to
      userspace. Various hacks exist to make this work. The code is hard to
      understand and difficult to maintain in it's current form. Instead of
      making this work by hacking posix acls through xattr handlers we are
      building a dedicated posix acl api around the get and set inode
      operations. This removes a lot of hackiness and makes the codepaths
      easier to maintain. A lot of background can be found in [1].
      
      Since some filesystem rely on the dentry being available to them when
      setting posix acls (e.g., 9p and cifs) they cannot rely on set acl inode
      operation. But since ->set_acl() is required in order to use the generic
      posix acl xattr handlers filesystems that do not implement this inode
      operation cannot use the handler and need to implement their own
      dedicated posix acl handlers.
      
      Update the ->set_acl() inode method to take a dentry argument. This
      allows all filesystems to rely on ->set_acl().
      
      As far as I can tell all codepaths can be switched to rely on the dentry
      instead of just the inode. Note that the original motivation for passing
      the dentry separate from the inode instead of just the dentry in the
      xattr handlers was because of security modules that call
      security_d_instantiate(). This hook is called during
      d_instantiate_new(), d_add(), __d_instantiate_anon(), and
      d_splice_alias() to initialize the inode's security context and possibly
      to set security.* xattrs. Since this only affects security.* xattrs this
      is completely irrelevant for posix acls.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      138060ba
  7. 03 Oct, 2022 1 commit
    • Jeff Layton's avatar
      tmpfs: add support for an i_version counter · 36f05cab
      Jeff Layton authored
      NFSv4 mandates a change attribute to avoid problems with timestamp
      granularity, which Linux implements using the i_version counter. This is
      particularly important when the underlying filesystem is fast.
      
      Give tmpfs an i_version counter. Since it doesn't have to be persistent,
      we can just turn on SB_I_VERSION and sprinkle some inode_inc_iversion
      calls in the right places.
      
      Also, while there is no formal spec for xattrs, most implementations
      update the ctime on setxattr. Fix shmem_xattr_handler_set to update the
      ctime and bump the i_version appropriately.
      
      Link: https://lkml.kernel.org/r/20220909130031.15477-1-jlayton@kernel.orgSigned-off-by: default avatarJeff Layton <jlayton@kernel.org>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      36f05cab
  8. 17 Sep, 2022 1 commit
  9. 31 Aug, 2022 3 commits
    • Christian Brauner's avatar
      acl: move idmapping handling into posix_acl_xattr_set() · 52edb408
      Christian Brauner authored
      The uapi POSIX ACL struct passed through the value argument during
      setxattr() contains {g,u}id values encoded via ACL_{GROUP,USER} entries
      that should actually be stored in the form of k{g,u}id_t (See [1] for a
      long explanation of the issue.).
      
      In 0c5fd887 ("acl: move idmapped mount fixup into vfs_{g,s}etxattr()")
      we took the mount's idmapping into account in order to let overlayfs
      handle POSIX ACLs on idmapped layers correctly. The fixup is currently
      performed directly in vfs_setxattr() which piles on top of the earlier
      hackiness by handling the mount's idmapping and stuff the vfs{g,u}id_t
      values into the uapi struct as well. While that is all correct and works
      fine it's just ugly.
      
      Now that we have introduced vfs_make_posix_acl() earlier move handling
      idmapped mounts out of vfs_setxattr() and into the POSIX ACL handler
      where it belongs.
      
      Note that we also need to call vfs_make_posix_acl() for EVM which
      interpretes POSIX ACLs during security_inode_setxattr(). Leave them a
      longer comment for future reference.
      
      All filesystems that support idmapped mounts via FS_ALLOW_IDMAP use the
      standard POSIX ACL xattr handlers and are covered by this change. This
      includes overlayfs which simply calls vfs_{g,s}etxattr().
      
      The following filesystems use custom POSIX ACL xattr handlers: 9p, cifs,
      ecryptfs, and ntfs3 (and overlayfs but we've covered that in the paragraph
      above) and none of them support idmapped mounts yet.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org/ [1]
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      Reviewed-by: default avatarSeth Forshee (DigitalOcean) <sforshee@kernel.org>
      52edb408
    • Christian Brauner's avatar
      acl: add vfs_set_acl_prepare() · 6b70fe06
      Christian Brauner authored
      Various filesystems store POSIX ACLs on the backing store in their uapi
      format. Such filesystems need to translate from the uapi POSIX ACL
      format into the VFS format during i_op->get_acl(). The VFS provides the
      posix_acl_from_xattr() helper for this task.
      
      But the usage of posix_acl_from_xattr() is currently ambiguous. It is
      intended to transform from a uapi POSIX ACL  to the VFS represenation.
      For example, when retrieving POSIX ACLs for permission checking during
      lookup or when calling getxattr() to retrieve system.posix_acl_{access,default}.
      
      Calling posix_acl_from_xattr() during i_op->get_acl() will map the raw
      {g,u}id values stored as ACL_{GROUP,USER} entries in the uapi POSIX ACL
      format into k{g,u}id_t in the filesystem's idmapping and return a struct
      posix_acl ready to be returned to the VFS for caching and to perform
      permission checks on.
      
      However, posix_acl_from_xattr() is also called during setxattr() for all
      filesystems that rely on VFS provides posix_acl_{access,default}_xattr_handler.
      The posix_acl_xattr_set() handler which is used for the ->set() method
      of posix_acl_{access,default}_xattr_handler uses posix_acl_from_xattr()
      to translate from the uapi POSIX ACL format to the VFS format so that it
      can be passed to the i_op->set_acl() handler of the filesystem or for
      direct caching in case no i_op->set_acl() handler is defined.
      
      During setxattr() the {g,u}id values stored as ACL_{GROUP,USER} entries
      in the uapi POSIX ACL format aren't raw {g,u}id values that need to be
      mapped according to the filesystem's idmapping. Instead they are {g,u}id
      values in the caller's idmapping which have been generated during
      posix_acl_fix_xattr_from_user(). In other words, they are k{g,u}id_t
      which are passed as raw {g,u}id values abusing the uapi POSIX ACL format
      (Please note that this type safety violation has existed since the
      introduction of k{g,u}id_t. Please see [1] for more details.).
      
      So when posix_acl_from_xattr() is called in posix_acl_xattr_set() the
      filesystem idmapping is completely irrelevant. Instead, we abuse the
      initial idmapping to recover the k{g,u}id_t base on the value stored in
      raw {g,u}id as ACL_{GROUP,USER} in the uapi POSIX ACL format.
      
      We need to clearly distinguish betweeen these two operations as it is
      really easy to confuse for filesystems as can be seen in ntfs3.
      
      In order to do this we factor out make_posix_acl() which takes callbacks
      allowing callers to pass dedicated methods to generate the correct
      k{g,u}id_t. This is just an internal static helper which is not exposed
      to any filesystems but it neatly encapsulates the basic logic of walking
      through a uapi POSIX ACL and returning an allocated VFS POSIX ACL with
      the correct k{g,u}id_t values.
      
      The posix_acl_from_xattr() helper can then be implemented as a simple
      call to make_posix_acl() with callbacks that generate the correct
      k{g,u}id_t from the raw {g,u}id values in ACL_{GROUP,USER} entries in
      the uapi POSIX ACL format as read from the backing store.
      
      For setxattr() we add a new helper vfs_set_acl_prepare() which has
      callbacks to map the POSIX ACLs from the uapi format with the k{g,u}id_t
      values stored in raw {g,u}id format in ACL_{GROUP,USER} entries into the
      correct k{g,u}id_t values in the filesystem idmapping. In contrast to
      posix_acl_from_xattr() the vfs_set_acl_prepare() helper needs to take
      the mount idmapping into account. The differences are explained in more
      detail in the kernel doc for the new functions.
      
      In follow up patches we will remove all abuses of posix_acl_from_xattr()
      for setxattr() operations and replace it with calls to vfs_set_acl_prepare().
      
      The new vfs_set_acl_prepare() helper allows us to deal with the
      ambiguity in how the POSI ACL uapi struct stores {g,u}id values
      depending on whether this is a getxattr() or setxattr() operation.
      
      This also allows us to remove the posix_acl_setxattr_idmapped_mnt()
      helper reducing the abuse of the POSIX ACL uapi format to pass values
      that should be distinct types in {g,u}id values stored as
      ACL_{GROUP,USER} entries.
      
      The removal of posix_acl_setxattr_idmapped_mnt() in turn allows us to
      re-constify the value parameter of vfs_setxattr() which in turn allows
      us to avoid the nasty cast from a const void pointer to a non-const void
      pointer on ovl_do_setxattr().
      
      Ultimately, the plan is to get rid of the type violations completely and
      never pass the values from k{g,u}id_t as raw {g,u}id in ACL_{GROUP,USER}
      entries in uapi POSIX ACL format. But that's a longer way to go and this
      is a preparatory step.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Co-Developed-by: default avatarSeth Forshee <sforshee@digitalocean.com>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      6b70fe06
    • Christian Brauner's avatar
      acl: return EOPNOTSUPP in posix_acl_fix_xattr_common() · 985a6d0b
      Christian Brauner authored
      Return EOPNOTSUPP when the POSIX ACL version doesn't match and zero if
      there are no entries. This will allow us to reuse the helper in
      posix_acl_from_xattr(). This change will have no user visible effects.
      
      Fixes: 0c5fd887 ("acl: move idmapped mount fixup into vfs_{g,s}etxattr()")
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      Reviewed-by: default avatarSeth Forshee (DigitalOcean) <sforshee@kernel.org&gt;>
      985a6d0b
  10. 17 Aug, 2022 1 commit
    • Christian Brauner's avatar
      acl: handle idmapped mounts for idmapped filesystems · abfcf55d
      Christian Brauner authored
      Ensure that POSIX ACLs checking, getting, and setting works correctly
      for filesystems mountable with a filesystem idmapping ("fs_idmapping")
      that want to support idmapped mounts ("mnt_idmapping").
      
      Note that no filesystems mountable with an fs_idmapping do yet support
      idmapped mounts. This is required infrastructure work to unblock this.
      
      As we explained in detail in [1] the fs_idmapping is irrelevant for
      getxattr() and setxattr() when mapping the ACL_{GROUP,USER} {g,u}ids
      stored in the uapi struct posix_acl_xattr_entry in
      posix_acl_fix_xattr_{from,to}_user().
      
      But for acl_permission_check() and posix_acl_{g,s}etxattr_idmapped_mnt()
      the fs_idmapping matters.
      
      acl_permission_check():
        During lookup POSIX ACLs are retrieved directly via i_op->get_acl() and
        are returned via the kernel internal struct posix_acl which contains
        e_{g,u}id members of type k{g,u}id_t that already take the
        fs_idmapping into acccount.
      
        For example, a POSIX ACL stored with u4 on the backing store is mapped
        to k10000004 in the fs_idmapping. The mnt_idmapping remaps the POSIX ACL
        to k20000004. In order to do that the fs_idmapping needs to be taken
        into account but that doesn't happen yet (Again, this is a
        counterfactual currently as fuse doesn't support idmapped mounts
        currently. It's just used as a convenient example.):
      
        fs_idmapping:  u0:k10000000:r65536
        mnt_idmapping: u0:v20000000:r65536
        ACL_USER:      k10000004
      
        acl_permission_check()
        -> check_acl()
           -> get_acl()
              -> i_op->get_acl() == fuse_get_acl()
                 -> posix_acl_from_xattr(u0:k10000000:r65536 /* fs_idmapping */, ...)
                    {
                            k10000004 = make_kuid(u0:k10000000:r65536 /* fs_idmapping */,
                                                  u4 /* ACL_USER */);
                    }
           -> posix_acl_permission()
              {
                      -1 = make_vfsuid(u0:v20000000:r65536 /* mnt_idmapping */,
                                       &init_user_ns,
                                       k10000004);
                      vfsuid_eq_kuid(-1, k10000004 /* caller_fsuid */)
              }
      
        In order to correctly map from the fs_idmapping into mnt_idmapping we
        require the relevant fs_idmaping to be passed:
      
        acl_permission_check()
        -> check_acl()
           -> get_acl()
              -> i_op->get_acl() == fuse_get_acl()
                 -> posix_acl_from_xattr(u0:k10000000:r65536 /* fs_idmapping */, ...)
                    {
                            k10000004 = make_kuid(u0:k10000000:r65536 /* fs_idmapping */,
                                                  u4 /* ACL_USER */);
                    }
           -> posix_acl_permission()
              {
                      v20000004 = make_vfsuid(u0:v20000000:r65536 /* mnt_idmapping */,
                                              u0:k10000000:r65536 /* fs_idmapping */,
                                              k10000004);
                      vfsuid_eq_kuid(v20000004, k10000004 /* caller_fsuid */)
              }
      
        The initial_idmapping is only correct for the current situation because
        all filesystems that currently support idmapped mounts do not support
        being mounted with an fs_idmapping.
      
        Note that ovl_get_acl() is used to retrieve the POSIX ACLs from the
        relevant lower layer and the lower layer's mnt_idmapping needs to be
        taken into account and so does the fs_idmapping. See 0c5fd887 ("acl:
        move idmapped mount fixup into vfs_{g,s}etxattr()") for more details.
      
      For posix_acl_{g,s}etxattr_idmapped_mnt() it is not as obvious why the
      fs_idmapping matters as it is for acl_permission_check(). Especially
      because it doesn't matter for posix_acl_fix_xattr_{from,to}_user() (See
      [1] for more context.).
      
      Because posix_acl_{g,s}etxattr_idmapped_mnt() operate on the uapi
      struct posix_acl_xattr_entry which contains {g,u}id_t values and thus
      give the impression that the fs_idmapping is irrelevant as at this point
      appropriate {g,u}id_t values have seemlingly been generated.
      
      As we've stated multiple times this assumption is wrong and in fact the
      uapi struct posix_acl_xattr_entry is taking idmappings into account
      depending at what place it is operated on.
      
      posix_acl_getxattr_idmapped_mnt()
        When posix_acl_getxattr_idmapped_mnt() is called the values stored in
        the uapi struct posix_acl_xattr_entry are mapped according to the
        fs_idmapping. This happened when they were read from the backing store
        and then translated from struct posix_acl into the uapi
        struct posix_acl_xattr_entry during posix_acl_to_xattr().
      
        In other words, the fs_idmapping matters as the values stored as
        {g,u}id_t in the uapi struct posix_acl_xattr_entry have been generated
        by it.
      
        So we need to take the fs_idmapping into account during make_vfsuid()
        in posix_acl_getxattr_idmapped_mnt().
      
      posix_acl_setxattr_idmapped_mnt()
        When posix_acl_setxattr_idmapped_mnt() is called the values stored as
        {g,u}id_t in uapi struct posix_acl_xattr_entry are intended to be the
        values that ultimately get turned back into a k{g,u}id_t in
        posix_acl_from_xattr() (which turns the uapi
        struct posix_acl_xattr_entry into the kernel internal struct posix_acl).
      
        In other words, the fs_idmapping matters as the values stored as
        {g,u}id_t in the uapi struct posix_acl_xattr_entry are intended to be
        the values that will be undone in the fs_idmapping when writing to the
        backing store.
      
        So we need to take the fs_idmapping into account during from_vfsuid()
        in posix_acl_setxattr_idmapped_mnt().
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Fixes: 0c5fd887 ("acl: move idmapped mount fixup into vfs_{g,s}etxattr()")
      Cc: Seth Forshee <sforshee@digitalocean.com>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      Reviewed-by: default avatarSeth Forshee <sforshee@digitalocean.com>
      Link: https://lore.kernel.org/r/20220816113514.43304-1-brauner@kernel.org
      abfcf55d
  11. 15 Jul, 2022 3 commits
    • Christian Brauner's avatar
      acl: make posix_acl_clone() available to overlayfs · 8043bffd
      Christian Brauner authored
      The ovl_get_acl() function needs to alter the POSIX ACLs retrieved from the
      lower filesystem. Instead of hand-rolling a overlayfs specific
      posix_acl_clone() variant allow export it. It's not special and it's not deeply
      internal anyway.
      
      Link: https://lore.kernel.org/r/20220708090134.385160-3-brauner@kernel.org
      Cc: Seth Forshee <sforshee@digitalocean.com>
      Cc: Amir Goldstein <amir73il@gmail.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Cc: linux-unionfs@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: default avatarSeth Forshee <sforshee@digitalocean.com>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      8043bffd
    • Christian Brauner's avatar
      acl: port to vfs{g,u}id_t · e933c15f
      Christian Brauner authored
      Port the few remaining pieces to vfs{g,u}id_t and associated type safe helpers.
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      e933c15f
    • Christian Brauner's avatar
      acl: move idmapped mount fixup into vfs_{g,s}etxattr() · 0c5fd887
      Christian Brauner authored
      This cycle we added support for mounting overlayfs on top of idmapped mounts.
      Recently I've started looking into potential corner cases when trying to add
      additional tests and I noticed that reporting for POSIX ACLs is currently wrong
      when using idmapped layers with overlayfs mounted on top of it.
      
      I'm going to give a rather detailed explanation to both the origin of the
      problem and the solution.
      
      Let's assume the user creates the following directory layout and they have a
      rootfs /var/lib/lxc/c1/rootfs. The files in this rootfs are owned as you would
      expect files on your host system to be owned. For example, ~/.bashrc for your
      regular user would be owned by 1000:1000 and /root/.bashrc would be owned by
      0:0. IOW, this is just regular boring filesystem tree on an ext4 or xfs
      filesystem.
      
      The user chooses to set POSIX ACLs using the setfacl binary granting the user
      with uid 4 read, write, and execute permissions for their .bashrc file:
      
              setfacl -m u:4:rwx /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc
      
      Now they to expose the whole rootfs to a container using an idmapped mount. So
      they first create:
      
              mkdir -pv /vol/contpool/{ctrover,merge,lowermap,overmap}
              mkdir -pv /vol/contpool/ctrover/{over,work}
              chown 10000000:10000000 /vol/contpool/ctrover/{over,work}
      
      The user now creates an idmapped mount for the rootfs:
      
              mount-idmapped/mount-idmapped --map-mount=b:0:10000000:65536 \
                                            /var/lib/lxc/c2/rootfs \
                                            /vol/contpool/lowermap
      
      This for example makes it so that /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc
      which is owned by uid and gid 1000 as being owned by uid and gid 10001000 at
      /vol/contpool/lowermap/home/ubuntu/.bashrc.
      
      Assume the user wants to expose these idmapped mounts through an overlayfs
      mount to a container.
      
             mount -t overlay overlay                      \
                   -o lowerdir=/vol/contpool/lowermap,     \
                      upperdir=/vol/contpool/overmap/over, \
                      workdir=/vol/contpool/overmap/work   \
                   /vol/contpool/merge
      
      The user can do this in two ways:
      
      (1) Mount overlayfs in the initial user namespace and expose it to the
          container.
      (2) Mount overlayfs on top of the idmapped mounts inside of the container's
          user namespace.
      
      Let's assume the user chooses the (1) option and mounts overlayfs on the host
      and then changes into a container which uses the idmapping 0:10000000:65536
      which is the same used for the two idmapped mounts.
      
      Now the user tries to retrieve the POSIX ACLs using the getfacl command
      
              getfacl -n /vol/contpool/lowermap/home/ubuntu/.bashrc
      
      and to their surprise they see:
      
              # file: vol/contpool/merge/home/ubuntu/.bashrc
              # owner: 1000
              # group: 1000
              user::rw-
              user:4294967295:rwx
              group::r--
              mask::rwx
              other::r--
      
      indicating the the uid wasn't correctly translated according to the idmapped
      mount. The problem is how we currently translate POSIX ACLs. Let's inspect the
      callchain in this example:
      
              idmapped mount /vol/contpool/merge:      0:10000000:65536
              caller's idmapping:                      0:10000000:65536
              overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */
      
              sys_getxattr()
              -> path_getxattr()
                 -> getxattr()
                    -> do_getxattr()
                        |> vfs_getxattr()
                        |  -> __vfs_getxattr()
                        |     -> handler->get == ovl_posix_acl_xattr_get()
                        |        -> ovl_xattr_get()
                        |           -> vfs_getxattr()
                        |              -> __vfs_getxattr()
                        |                 -> handler->get() /* lower filesystem callback */
                        |> posix_acl_fix_xattr_to_user()
                           {
                                    4 = make_kuid(&init_user_ns, 4);
                                    4 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 4);
                                    /* FAILURE */
                                   -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
                           }
      
      If the user chooses to use option (2) and mounts overlayfs on top of idmapped
      mounts inside the container things don't look that much better:
      
              idmapped mount /vol/contpool/merge:      0:10000000:65536
              caller's idmapping:                      0:10000000:65536
              overlayfs idmapping (ofs->creator_cred): 0:10000000:65536
      
              sys_getxattr()
              -> path_getxattr()
                 -> getxattr()
                    -> do_getxattr()
                        |> vfs_getxattr()
                        |  -> __vfs_getxattr()
                        |     -> handler->get == ovl_posix_acl_xattr_get()
                        |        -> ovl_xattr_get()
                        |           -> vfs_getxattr()
                        |              -> __vfs_getxattr()
                        |                 -> handler->get() /* lower filesystem callback */
                        |> posix_acl_fix_xattr_to_user()
                           {
                                    4 = make_kuid(&init_user_ns, 4);
                                    4 = mapped_kuid_fs(&init_user_ns, 4);
                                    /* FAILURE */
                                   -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
                           }
      
      As is easily seen the problem arises because the idmapping of the lower mount
      isn't taken into account as all of this happens in do_gexattr(). But
      do_getxattr() is always called on an overlayfs mount and inode and thus cannot
      possible take the idmapping of the lower layers into account.
      
      This problem is similar for fscaps but there the translation happens as part of
      vfs_getxattr() already. Let's walk through an fscaps overlayfs callchain:
      
              setcap 'cap_net_raw+ep' /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc
      
      The expected outcome here is that we'll receive the cap_net_raw capability as
      we are able to map the uid associated with the fscap to 0 within our container.
      IOW, we want to see 0 as the result of the idmapping translations.
      
      If the user chooses option (1) we get the following callchain for fscaps:
      
              idmapped mount /vol/contpool/merge:      0:10000000:65536
              caller's idmapping:                      0:10000000:65536
              overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */
      
              sys_getxattr()
              -> path_getxattr()
                 -> getxattr()
                    -> do_getxattr()
                         -> vfs_getxattr()
                            -> xattr_getsecurity()
                               -> security_inode_getsecurity()                                       ________________________________
                                  -> cap_inode_getsecurity()                                         |                              |
                                     {                                                               V                              |
                                              10000000 = make_kuid(0:0:4k /* overlayfs idmapping */, 10000000);                     |
                                              10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000);                  |
                                                     /* Expected result is 0 and thus that we own the fscap. */                     |
                                                     0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000);            |
                                     }                                                                                              |
                                     -> vfs_getxattr_alloc()                                                                        |
                                        -> handler->get == ovl_other_xattr_get()                                                    |
                                           -> vfs_getxattr()                                                                        |
                                              -> xattr_getsecurity()                                                                |
                                                 -> security_inode_getsecurity()                                                    |
                                                    -> cap_inode_getsecurity()                                                      |
                                                       {                                                                            |
                                                                      0 = make_kuid(0:0:4k /* lower s_user_ns */, 0);               |
                                                               10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); |
                                                               10000000 = from_kuid(0:0:4k /* overlayfs idmapping */, 10000000);    |
                                                               |____________________________________________________________________|
                                                       }
                                                       -> vfs_getxattr_alloc()
                                                          -> handler->get == /* lower filesystem callback */
      
      And if the user chooses option (2) we get:
      
              idmapped mount /vol/contpool/merge:      0:10000000:65536
              caller's idmapping:                      0:10000000:65536
              overlayfs idmapping (ofs->creator_cred): 0:10000000:65536
      
              sys_getxattr()
              -> path_getxattr()
                 -> getxattr()
                    -> do_getxattr()
                         -> vfs_getxattr()
                            -> xattr_getsecurity()
                               -> security_inode_getsecurity()                                                _______________________________
                                  -> cap_inode_getsecurity()                                                  |                             |
                                     {                                                                        V                             |
                                             10000000 = make_kuid(0:10000000:65536 /* overlayfs idmapping */, 0);                           |
                                             10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000);                           |
                                                     /* Expected result is 0 and thus that we own the fscap. */                             |
                                                    0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000);                     |
                                     }                                                                                                      |
                                     -> vfs_getxattr_alloc()                                                                                |
                                        -> handler->get == ovl_other_xattr_get()                                                            |
                                          |-> vfs_getxattr()                                                                                |
                                              -> xattr_getsecurity()                                                                        |
                                                 -> security_inode_getsecurity()                                                            |
                                                    -> cap_inode_getsecurity()                                                              |
                                                       {                                                                                    |
                                                                       0 = make_kuid(0:0:4k /* lower s_user_ns */, 0);                      |
                                                                10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0);        |
                                                                       0 = from_kuid(0:10000000:65536 /* overlayfs idmapping */, 10000000); |
                                                                       |____________________________________________________________________|
                                                       }
                                                       -> vfs_getxattr_alloc()
                                                          -> handler->get == /* lower filesystem callback */
      
      We can see how the translation happens correctly in those cases as the
      conversion happens within the vfs_getxattr() helper.
      
      For POSIX ACLs we need to do something similar. However, in contrast to fscaps
      we cannot apply the fix directly to the kernel internal posix acl data
      structure as this would alter the cached values and would also require a rework
      of how we currently deal with POSIX ACLs in general which almost never take the
      filesystem idmapping into account (the noteable exception being FUSE but even
      there the implementation is special) and instead retrieve the raw values based
      on the initial idmapping.
      
      The correct values are then generated right before returning to userspace. The
      fix for this is to move taking the mount's idmapping into account directly in
      vfs_getxattr() instead of having it be part of posix_acl_fix_xattr_to_user().
      
      To this end we split out two small and unexported helpers
      posix_acl_getxattr_idmapped_mnt() and posix_acl_setxattr_idmapped_mnt(). The
      former to be called in vfs_getxattr() and the latter to be called in
      vfs_setxattr().
      
      Let's go back to the original example. Assume the user chose option (1) and
      mounted overlayfs on top of idmapped mounts on the host:
      
              idmapped mount /vol/contpool/merge:      0:10000000:65536
              caller's idmapping:                      0:10000000:65536
              overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */
      
              sys_getxattr()
              -> path_getxattr()
                 -> getxattr()
                    -> do_getxattr()
                        |> vfs_getxattr()
                        |  |> __vfs_getxattr()
                        |  |  -> handler->get == ovl_posix_acl_xattr_get()
                        |  |     -> ovl_xattr_get()
                        |  |        -> vfs_getxattr()
                        |  |           |> __vfs_getxattr()
                        |  |           |  -> handler->get() /* lower filesystem callback */
                        |  |           |> posix_acl_getxattr_idmapped_mnt()
                        |  |              {
                        |  |                              4 = make_kuid(&init_user_ns, 4);
                        |  |                       10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
                        |  |                       10000004 = from_kuid(&init_user_ns, 10000004);
                        |  |                       |_______________________
                        |  |              }                               |
                        |  |                                              |
                        |  |> posix_acl_getxattr_idmapped_mnt()           |
                        |     {                                           |
                        |                                                 V
                        |             10000004 = make_kuid(&init_user_ns, 10000004);
                        |             10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004);
                        |             10000004 = from_kuid(&init_user_ns, 10000004);
                        |     }       |_________________________________________________
                        |                                                              |
                        |                                                              |
                        |> posix_acl_fix_xattr_to_user()                               |
                           {                                                           V
                                       10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
                                              /* SUCCESS */
                                              4 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000004);
                           }
      
      And similarly if the user chooses option (1) and mounted overayfs on top of
      idmapped mounts inside the container:
      
              idmapped mount /vol/contpool/merge:      0:10000000:65536
              caller's idmapping:                      0:10000000:65536
              overlayfs idmapping (ofs->creator_cred): 0:10000000:65536
      
              sys_getxattr()
              -> path_getxattr()
                 -> getxattr()
                    -> do_getxattr()
                        |> vfs_getxattr()
                        |  |> __vfs_getxattr()
                        |  |  -> handler->get == ovl_posix_acl_xattr_get()
                        |  |     -> ovl_xattr_get()
                        |  |        -> vfs_getxattr()
                        |  |           |> __vfs_getxattr()
                        |  |           |  -> handler->get() /* lower filesystem callback */
                        |  |           |> posix_acl_getxattr_idmapped_mnt()
                        |  |              {
                        |  |                              4 = make_kuid(&init_user_ns, 4);
                        |  |                       10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
                        |  |                       10000004 = from_kuid(&init_user_ns, 10000004);
                        |  |                       |_______________________
                        |  |              }                               |
                        |  |                                              |
                        |  |> posix_acl_getxattr_idmapped_mnt()           |
                        |     {                                           V
                        |             10000004 = make_kuid(&init_user_ns, 10000004);
                        |             10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004);
                        |             10000004 = from_kuid(0(&init_user_ns, 10000004);
                        |             |_________________________________________________
                        |     }                                                        |
                        |                                                              |
                        |> posix_acl_fix_xattr_to_user()                               |
                           {                                                           V
                                       10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
                                              /* SUCCESS */
                                              4 = from_kuid(0:10000000:65536 /* caller's idmappings */, 10000004);
                           }
      
      The last remaining problem we need to fix here is ovl_get_acl(). During
      ovl_permission() overlayfs will call:
      
              ovl_permission()
              -> generic_permission()
                 -> acl_permission_check()
                    -> check_acl()
                       -> get_acl()
                          -> inode->i_op->get_acl() == ovl_get_acl()
                              > get_acl() /* on the underlying filesystem)
                                ->inode->i_op->get_acl() == /*lower filesystem callback */
                       -> posix_acl_permission()
      
      passing through the get_acl request to the underlying filesystem. This will
      retrieve the acls stored in the lower filesystem without taking the idmapping
      of the underlying mount into account as this would mean altering the cached
      values for the lower filesystem. So we block using ACLs for now until we
      decided on a nice way to fix this. Note this limitation both in the
      documentation and in the code.
      
      The most straightforward solution would be to have ovl_get_acl() simply
      duplicate the ACLs, update the values according to the idmapped mount and
      return it to acl_permission_check() so it can be used in posix_acl_permission()
      forgetting them afterwards. This is a bit heavy handed but fairly
      straightforward otherwise.
      
      Link: https://github.com/brauner/mount-idmapped/issues/9
      Link: https://lore.kernel.org/r/20220708090134.385160-2-brauner@kernel.org
      Cc: Seth Forshee <sforshee@digitalocean.com>
      Cc: Amir Goldstein <amir73il@gmail.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Cc: linux-unionfs@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: default avatarSeth Forshee <sforshee@digitalocean.com>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      0c5fd887
  12. 19 Apr, 2022 1 commit
    • Christian Brauner's avatar
      fs: fix acl translation · 705191b0
      Christian Brauner authored
      Last cycle we extended the idmapped mounts infrastructure to support
      idmapped mounts of idmapped filesystems (No such filesystem yet exist.).
      Since then, the meaning of an idmapped mount is a mount whose idmapping
      is different from the filesystems idmapping.
      
      While doing that work we missed to adapt the acl translation helpers.
      They still assume that checking for the identity mapping is enough.  But
      they need to use the no_idmapping() helper instead.
      
      Note, POSIX ACLs are always translated right at the userspace-kernel
      boundary using the caller's current idmapping and the initial idmapping.
      The order depends on whether we're coming from or going to userspace.
      The filesystem's idmapping doesn't matter at the border.
      
      Consequently, if a non-idmapped mount is passed we need to make sure to
      always pass the initial idmapping as the mount's idmapping and not the
      filesystem idmapping.  Since it's irrelevant here it would yield invalid
      ids and prevent setting acls for filesystems that are mountable in a
      userns and support posix acls (tmpfs and fuse).
      
      I verified the regression reported in [1] and verified that this patch
      fixes it.  A regression test will be added to xfstests in parallel.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215849 [1]
      Fixes: bd303368 ("fs: support mapped mounts of mapped filesystems")
      Cc: Seth Forshee <sforshee@digitalocean.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: <stable@vger.kernel.org> # 5.17
      Cc: <regressions@lists.linux.dev>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      705191b0
  13. 05 Dec, 2021 1 commit
    • Christian Brauner's avatar
      fs: support mapped mounts of mapped filesystems · bd303368
      Christian Brauner authored
      In previous patches we added new and modified existing helpers to handle
      idmapped mounts of filesystems mounted with an idmapping. In this final
      patch we convert all relevant places in the vfs to actually pass the
      filesystem's idmapping into these helpers.
      
      With this the vfs is in shape to handle idmapped mounts of filesystems
      mounted with an idmapping. Note that this is just the generic
      infrastructure. Actually adding support for idmapped mounts to a
      filesystem mountable with an idmapping is follow-up work.
      
      In this patch we extend the definition of an idmapped mount from a mount
      that that has the initial idmapping attached to it to a mount that has
      an idmapping attached to it which is not the same as the idmapping the
      filesystem was mounted with.
      
      As before we do not allow the initial idmapping to be attached to a
      mount. In addition this patch prevents that the idmapping the filesystem
      was mounted with can be attached to a mount created based on this
      filesystem.
      
      This has multiple reasons and advantages. First, attaching the initial
      idmapping or the filesystem's idmapping doesn't make much sense as in
      both cases the values of the i_{g,u}id and other places where k{g,u}ids
      are used do not change. Second, a user that really wants to do this for
      whatever reason can just create a separate dedicated identical idmapping
      to attach to the mount. Third, we can continue to use the initial
      idmapping as an indicator that a mount is not idmapped allowing us to
      continue to keep passing the initial idmapping into the mapping helpers
      to tell them that something isn't an idmapped mount even if the
      filesystem is mounted with an idmapping.
      
      Link: https://lore.kernel.org/r/20211123114227.3124056-11-brauner@kernel.org (v1)
      Link: https://lore.kernel.org/r/20211130121032.3753852-11-brauner@kernel.org (v2)
      Link: https://lore.kernel.org/r/20211203111707.3901969-11-brauner@kernel.org
      Cc: Seth Forshee <sforshee@digitalocean.com>
      Cc: Amir Goldstein <amir73il@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      CC: linux-fsdevel@vger.kernel.org
      Reviewed-by: default avatarSeth Forshee <sforshee@digitalocean.com>
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      bd303368
  14. 03 Dec, 2021 2 commits
  15. 06 Nov, 2021 1 commit
  16. 18 Aug, 2021 2 commits
  17. 24 Jan, 2021 5 commits
    • Christian Brauner's avatar
      fs: make helpers idmap mount aware · 549c7297
      Christian Brauner authored
      Extend some inode methods with an additional user namespace argument. A
      filesystem that is aware of idmapped mounts will receive the user
      namespace the mount has been marked with. This can be used for
      additional permission checking and also to enable filesystems to
      translate between uids and gids if they need to. We have implemented all
      relevant helpers in earlier patches.
      
      As requested we simply extend the exisiting inode method instead of
      introducing new ones. This is a little more code churn but it's mostly
      mechanical and doesnt't leave us with additional inode methods.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      549c7297
    • Christian Brauner's avatar
      acl: handle idmapped mounts · e65ce2a5
      Christian Brauner authored
      The posix acl permission checking helpers determine whether a caller is
      privileged over an inode according to the acls associated with the
      inode. Add helpers that make it possible to handle acls on idmapped
      mounts.
      
      The vfs and the filesystems targeted by this first iteration make use of
      posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to
      translate basic posix access and default permissions such as the
      ACL_USER and ACL_GROUP type according to the initial user namespace (or
      the superblock's user namespace) to and from the caller's current user
      namespace. Adapt these two helpers to handle idmapped mounts whereby we
      either map from or into the mount's user namespace depending on in which
      direction we're translating.
      Similarly, cap_convert_nscap() is used by the vfs to translate user
      namespace and non-user namespace aware filesystem capabilities from the
      superblock's user namespace to the caller's user namespace. Enable it to
      handle idmapped mounts by accounting for the mount's user namespace.
      
      In addition the fileystems targeted in the first iteration of this patch
      series make use of the posix_acl_chmod() and, posix_acl_update_mode()
      helpers. Both helpers perform permission checks on the target inode. Let
      them handle idmapped mounts. These two helpers are called when posix
      acls are set by the respective filesystems to handle this case we extend
      the ->set() method to take an additional user namespace argument to pass
      the mount's user namespace down.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      e65ce2a5
    • Christian Brauner's avatar
      inode: make init and permission helpers idmapped mount aware · 21cb47be
      Christian Brauner authored
      The inode_owner_or_capable() helper determines whether the caller is the
      owner of the inode or is capable with respect to that inode. Allow it to
      handle idmapped mounts. If the inode is accessed through an idmapped
      mount it according to the mount's user namespace. Afterwards the checks
      are identical to non-idmapped mounts. If the initial user namespace is
      passed nothing changes so non-idmapped mounts will see identical
      behavior as before.
      
      Similarly, allow the inode_init_owner() helper to handle idmapped
      mounts. It initializes a new inode on idmapped mounts by mapping the
      fsuid and fsgid of the caller from the mount's user namespace. If the
      initial user namespace is passed nothing changes so non-idmapped mounts
      will see identical behavior as before.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      21cb47be
    • Christian Brauner's avatar
      namei: make permission helpers idmapped mount aware · 47291baa
      Christian Brauner authored
      The two helpers inode_permission() and generic_permission() are used by
      the vfs to perform basic permission checking by verifying that the
      caller is privileged over an inode. In order to handle idmapped mounts
      we extend the two helpers with an additional user namespace argument.
      On idmapped mounts the two helpers will make sure to map the inode
      according to the mount's user namespace and then peform identical
      permission checks to inode_permission() and generic_permission(). If the
      initial user namespace is passed nothing changes so non-idmapped mounts
      will see identical behavior as before.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-6-christian.brauner@ubuntu.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Acked-by: default avatarSerge Hallyn <serge@hallyn.com>
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      47291baa
    • Christian Brauner's avatar
      capability: handle idmapped mounts · 0558c1bf
      Christian Brauner authored
      In order to determine whether a caller holds privilege over a given
      inode the capability framework exposes the two helpers
      privileged_wrt_inode_uidgid() and capable_wrt_inode_uidgid(). The former
      verifies that the inode has a mapping in the caller's user namespace and
      the latter additionally verifies that the caller has the requested
      capability in their current user namespace.
      If the inode is accessed through an idmapped mount map it into the
      mount's user namespace. Afterwards the checks are identical to
      non-idmapped inodes. If the initial user namespace is passed all
      operations are a nop so non-idmapped mounts will not see a change in
      behavior.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-5-christian.brauner@ubuntu.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Acked-by: default avatarSerge Hallyn <serge@hallyn.com>
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      0558c1bf
  18. 08 Jun, 2020 1 commit
    • Linus Torvalds's avatar
      vfs: clean up posix_acl_permission() logic aroudn MAY_NOT_BLOCK · 63d72b93
      Linus Torvalds authored
      posix_acl_permission() does not care about MAY_NOT_BLOCK, and in fact
      the permission logic internally must not check that bit (it's only for
      upper layers to decide whether they can block to do IO to look up the
      acl information or not).
      
      But the way the code was written, it _looked_ like it cared, since the
      function explicitly did not mask that bit off.
      
      But it has exactly two callers: one for when that bit is set, which
      first clears the bit before calling posix_acl_permission(), and the
      other call site when that bit was clear.
      
      So stop the silly games "saving" the MAY_NOT_BLOCK bit that must not be
      used for the actual permission test, and that currently is pointlessly
      cleared by the callers when the function itself should just not care.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      63d72b93