    commoncap: handle idmapped mounts · 71bc356f
    Christian Brauner authored
    When interacting with user namespace aware and non-user namespace aware
    filesystem capabilities, the VFS performs various security checks to
    determine whether the caller can use the filesystem capabilities,
    whether they need to be removed, and so on. The main infrastructure for
    this resides in the capability codepaths, but they are called through
    the LSM security infrastructure even though they are technically neither
    an LSM nor optional. This extends the existing security hooks
    security_inode_removexattr(), security_inode_killpriv(), and
    security_inode_getsecurity() to pass down the mount's user namespace and
    makes them aware of idmapped mounts.
    
    In order to actually get filesystem capabilities from disk the
    capability infrastructure exposes the get_vfs_caps_from_disk() helper.
    For user namespace aware filesystem capabilities a root uid is stored
    alongside the capabilities.
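
    As a rough illustration of what "a root uid is stored alongside the
    capabilities" means, the sketch below reads the root uid out of a raw
    xattr value from userspace. It assumes the revision 3 layout (a
    little-endian magic word, two permitted/inheritable pairs, then the
    root uid, 24 bytes in total). The SKETCH_* names and
    sketch_read_rootid() are hypothetical and are not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors of the revision constants; assumed values. */
#define SKETCH_CAP_REVISION_MASK 0xFF000000u
#define SKETCH_CAP_REVISION_3    0x03000000u
/* magic_etc + 2 * (permitted, inheritable) + rootid, 4 bytes each. */
#define SKETCH_CAP_NS_XATTR_SIZE 24u
#define SKETCH_ROOTID_OFFSET     20u

/* Decode one little-endian 32-bit value, independent of host byte order. */
static uint32_t sketch_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Extract the root uid from a namespaced capability blob.
 * Returns 0 and stores the uid in *rootid on success, or -1 if the
 * blob does not have the size or revision assumed by this sketch.
 */
int sketch_read_rootid(const unsigned char *buf, size_t len, uint32_t *rootid)
{
	assert(buf != NULL && rootid != NULL);

	if (len != SKETCH_CAP_NS_XATTR_SIZE)
		return -1;
	if ((sketch_le32(buf) & SKETCH_CAP_REVISION_MASK) != SKETCH_CAP_REVISION_3)
		return -1;

	*rootid = sketch_le32(buf + SKETCH_ROOTID_OFFSET);
	return 0;
}
```

    The value returned here is still the raw on-disk uid; it has not yet
    been translated through any user namespace.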
    
    In order to determine whether the caller can make use of the filesystem
    capability or whether it needs to be ignored, the root uid is translated
    according to the superblock's user namespace. If it can be translated to
    uid 0 according to that id mapping, the caller can use the filesystem
    capabilities stored on disk. If we are accessing the inode that holds
    the filesystem capabilities through an idmapped mount we map the root
    uid according to the mount's user namespace. Afterwards the checks are
    identical to non-idmapped mounts: reading filesystem caps from disk
    enforces that the root uid associated with the filesystem capability
    must have a mapping in the superblock's user namespace and that the
    caller is either in the same user namespace or is a descendant of the
    superblock's user namespace. For filesystems that are mountable inside
    user namespaces, the caller can just mount the filesystem and won't
    usually need to idmap it. If they do want to idmap it, they can create an
    idmapped mount and mark it with a user namespace they created and which
    is thus a descendant of s_user_ns. For filesystems that are not
    mountable inside user namespaces the descendant rule is trivially true
    because the s_user_ns will be the initial user namespace.
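
    The translation steps described above can be modelled in userspace
    roughly as follows. This is a simplified sketch, not the kernel code:
    each namespace has a single mapping extent, ids are plain integers,
    and all sketch_* names are hypothetical. The direction in which the
    mount's mapping is applied is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INVALID_ID UINT32_MAX

/* Hypothetical single-extent model of a user namespace id mapping. */
struct sketch_userns {
	const struct sketch_userns *parent; /* NULL for the initial namespace */
	uint32_t first;       /* first id as seen inside the namespace */
	uint32_t lower_first; /* first matching kernel-wide id */
	uint32_t count;       /* number of ids covered by the extent */
};

/* Namespace-local id -> kernel-wide id, or SKETCH_INVALID_ID if unmapped. */
uint32_t sketch_map_down(const struct sketch_userns *ns, uint32_t id)
{
	assert(ns != NULL);
	if (id >= ns->first && id - ns->first < ns->count)
		return ns->lower_first + (id - ns->first);
	return SKETCH_INVALID_ID;
}

/* Kernel-wide id -> namespace-local id, or SKETCH_INVALID_ID if unmapped. */
uint32_t sketch_map_up(const struct sketch_userns *ns, uint32_t kid)
{
	assert(ns != NULL);
	if (kid >= ns->lower_first && kid - ns->lower_first < ns->count)
		return ns->first + (kid - ns->lower_first);
	return SKETCH_INVALID_ID;
}

/* Decide whether the caller may use capabilities tagged with disk_rootid. */
bool sketch_rootid_usable(const struct sketch_userns *sb_ns,
			  const struct sketch_userns *mnt_ns,
			  const struct sketch_userns *caller_ns,
			  uint32_t disk_rootid)
{
	const struct sketch_userns *ns;
	uint32_t kroot;

	/* The root uid must have a mapping in the superblock's namespace. */
	kroot = sketch_map_down(sb_ns, disk_rootid);
	if (kroot == SKETCH_INVALID_ID)
		return false;

	/* Assumed direction: remap the id through the mount's mapping. */
	kroot = sketch_map_down(mnt_ns, kroot);
	if (kroot == SKETCH_INVALID_ID)
		return false;

	/* Usable if it is uid 0 in the caller's namespace or an ancestor. */
	for (ns = caller_ns; ns != NULL; ns = ns->parent)
		if (sketch_map_up(ns, kroot) == 0)
			return true;
	return false;
}
```

    In this model, a capability stored with root uid 1000 is not usable
    by a caller whose namespace maps uid 0 to kernel id 100000, unless
    the mount carries a mapping that turns 1000 into 100000. Passing the
    initial namespace (an identity mapping) as mnt_ns leaves the result
    the same as if no mount mapping were applied.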
    
    If the initial user namespace is passed, nothing changes, so non-idmapped
    mounts will see the same behavior as before.
    
    Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Acked-by: James Morris <jamorris@linux.microsoft.com>
    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>