    fs: introduce MOUNT_ATTR_IDMAP · 9caccd41
    Christian Brauner authored
    Introduce a new bind mount property to allow idmapping mounts. The
    MOUNT_ATTR_IDMAP flag can be set via the new mount_setattr() syscall
    together with a file descriptor referring to a user namespace.
    
    The user namespace referenced by the namespace file descriptor will be
    attached to the bind mount. All interactions with the filesystem going
    through that mount will be mapped according to the mapping specified in
    the user namespace attached to it.
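
    As a rough illustration of the flow described above, the following
    userspace sketch clones a mount tree, attaches a user namespace to the
    detached clone via mount_setattr(), and then attaches the clone at a
    target path. The helper names userns_fd_of_pid() and idmap_bind_mount()
    are hypothetical, and the use of open_tree() and move_mount() around
    mount_setattr() is an assumption about how a caller would typically
    obtain and attach the detached mount. The fallback definitions are only
    there for older userspace headers.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/mount.h>

/* Fallbacks for older userspace headers (assumed to mirror the kernel UAPI). */
#ifndef SYS_open_tree
#define SYS_open_tree 428
#endif
#ifndef SYS_move_mount
#define SYS_move_mount 429
#endif
#ifndef SYS_mount_setattr
#define SYS_mount_setattr 442
#endif
#ifndef AT_RECURSIVE
#define AT_RECURSIVE 0x8000
#endif
#ifndef MOUNT_ATTR_IDMAP
#define MOUNT_ATTR_IDMAP 0x00100000
struct mount_attr {
	uint64_t attr_set;
	uint64_t attr_clr;
	uint64_t propagation;
	uint64_t userns_fd;
};
#endif

/* Hypothetical helper: open the user namespace of a process by pid. */
int userns_fd_of_pid(pid_t pid)
{
	char path[64];

	snprintf(path, sizeof(path), "/proc/%ld/ns/user", (long)pid);
	return open(path, O_RDONLY | O_CLOEXEC);
}

/*
 * Hypothetical helper: create an idmapped bind mount of @source at @target
 * using the mapping of the user namespace referred to by @userns_fd.
 * Returns 0 on success and -1 on failure (errno set by the failing syscall).
 */
int idmap_bind_mount(const char *source, const char *target, int userns_fd)
{
	struct mount_attr attr = {
		.attr_set  = MOUNT_ATTR_IDMAP,
		.userns_fd = (uint64_t)userns_fd,
	};
	int tree_fd;

	/* Detached clone of the source tree; not yet visible anywhere. */
	tree_fd = (int)syscall(SYS_open_tree, AT_FDCWD, source,
			       OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC | AT_RECURSIVE);
	if (tree_fd < 0)
		return -1;

	/* Attach the user namespace to the detached mount. */
	if (syscall(SYS_mount_setattr, tree_fd, "",
		    AT_EMPTY_PATH | AT_RECURSIVE, &attr, sizeof(attr)) < 0)
		goto fail;

	/* Make the idmapped mount visible at the target path. */
	if (syscall(SYS_move_mount, tree_fd, "", AT_FDCWD, target,
		    MOVE_MOUNT_F_EMPTY_PATH) < 0)
		goto fail;

	close(tree_fd);
	return 0;
fail:
	close(tree_fd);
	return -1;
}
```

    A sufficiently privileged caller could use this as
    idmap_bind_mount("/home/user", "/mnt", userns_fd_of_pid(pid)), where
    both paths and pid are placeholders. The mapping is applied while the
    mount is still detached, so there is never a visible mount that needs
    its mapping changed.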
    
    Using user namespaces to mark mounts means we can reuse all the existing
    infrastructure in the kernel for handling idmappings and can also use
    this for permission checking to allow unprivileged users to create
    idmapped mounts in the future.
    
    Idmapping a mount is decoupled from the caller's user and mount
    namespace. This means idmapped mounts can be created in the initial
    user namespace which is an important use-case for systemd-homed,
    USB sticks moved between systems, sharing data between the initial
    user namespace and unprivileged containers, and other use-cases that
    have been brought up. For example, assume a home directory where all
    files are owned by uid and gid 1000 and the home directory is brought to
    a new laptop where the user has id 12345. The system administrator can
    simply create a mount of this home directory with a mapping of
    1000:12345:1 and other mappings to indicate the ids should be kept.
    (With this it is also possible, for example, to create idmapped mounts
    on the host with an identity mapping 1:1:100000 where the root user is
    not mapped. A user with root access that has, for example, been pivot
    rooted into such a mount on the host will not be able to execute, read,
    write, or create files as root.)
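
    The arithmetic behind the two mappings above can be sketched as a
    lookup over mapping extents. This is a minimal userspace illustration,
    not the kernel's implementation. The names struct id_extent and
    map_id() are hypothetical, and it assumes that a mapping written as
    a:b:n means "n ids starting at a are mapped to n ids starting at b".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: ids [from, from + count) map to [to, to + count). */
struct id_extent {
	uint32_t from;
	uint32_t to;
	uint32_t count;
};

/*
 * Map @id through @map. On success store the mapped id in @out and return
 * true. Return false if no extent covers @id, i.e. the id is unmapped.
 */
static bool map_id(const struct id_extent *map, size_t n, uint32_t id,
		   uint32_t *out)
{
	for (size_t i = 0; i < n; i++) {
		/*
		 * Unsigned subtraction wraps around for id < from, so a
		 * single comparison checks both the lower and upper bound.
		 */
		uint32_t off = id - map[i].from;

		if (off < map[i].count) {
			*out = map[i].to + off;
			return true;
		}
	}
	return false;
}
```

    With the single extent { 1000, 12345, 1 }, id 1000 maps to 12345 and
    every other id is unmapped unless further extents are added. With the
    single extent { 1, 1, 100000 }, ids 1 through 100000 map to themselves
    while id 0 has no mapping, which is why root cannot act as root through
    such a mount.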
    
    Given that mapping a mount is decoupled from the caller's user namespace,
    a sufficiently privileged process such as a container manager can set up
    an idmapped mount for the container and the container can simply pivot
    root to it. There's no need for the container to do anything. The mount
    will appear correctly mapped independent of the user namespace the
    container uses. This means we don't need to mark a mount as idmappable.
    
    In order to create an idmapped mount the caller must currently be
    privileged in the user namespace of the superblock the mount belongs to.
    Once a mount has been idmapped we don't allow it to change its mapping.
    This keeps permission checking and life-cycle management simple. Users
    wanting to change the idmapping can always create a new detached mount
    with a different idmapping.
    
    Link: https://lore.kernel.org/r/20210121131959.646623-36-christian.brauner@ubuntu.com
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Mauricio Vásquez Bernal <mauricio@kinvolk.io>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>