    capabilities: require CAP_SETFCAP to map uid 0 · db2e718a
    Serge E. Hallyn authored
    cap_setfcap is required to create file capabilities.
    
    Since commit 8db6c34f ("Introduce v3 namespaced file capabilities"),
    a process running as uid 0 but without cap_setfcap is able to work
    around this as follows: unshare a new user namespace which maps parent
    uid 0 into the child namespace.
    
    While this task will not have new capabilities against the parent
    namespace, there is a loophole due to the way namespaced file
    capabilities are represented as xattrs.  File capabilities valid in
    userns 1 are distinguished from file capabilities valid in userns 2 by
    the kuid which underlies uid 0.  Therefore the restricted root process
    can unshare a new self-mapping namespace, add a namespaced file
    capability onto a file, then use that file capability in the parent
    namespace.
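
    The loophole can be illustrated with a small userspace model.  This is
    only a sketch under stated assumptions: struct ns_model, fscap_applies()
    and demo_loophole() are hypothetical names, and a namespace is reduced to
    the kuid backing its uid 0 plus a parent link.  It is not the kernel's
    implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a user namespace: only the kernel-level uid (kuid)
 * that backs uid 0 inside the namespace, plus a link to the parent. */
struct ns_model {
    const struct ns_model *parent;
    unsigned int root_kuid;
};

/* A namespaced file capability is modelled as just the kuid recorded in the
 * xattr.  In this model it is honoured in ns if that kuid backs uid 0 in ns
 * or in one of its ancestors. */
static bool fscap_applies(unsigned int xattr_root_kuid,
                          const struct ns_model *ns)
{
    for (; ns != NULL; ns = ns->parent)
        if (ns->root_kuid == xattr_root_kuid)
            return true;
    return false;
}

/* Walk through the scenario described above. */
static void demo_loophole(void)
{
    /* Restricted root: uid 0 in the parent namespace is backed by kuid 0. */
    struct ns_model parent = { NULL, 0 };
    /* Child that maps parent uid 0 onto its own uid 0 (self-mapping). */
    struct ns_model self_mapped = { &parent, 0 };
    /* Child whose uid 0 is backed by an ordinary uid (1000 is arbitrary). */
    struct ns_model unpriv_child = { &parent, 1000 };

    /* A file capability written in the self-mapping child records kuid 0,
     * which is indistinguishable from one valid in the parent. */
    assert(fscap_applies(self_mapped.root_kuid, &parent));

    /* One written in the other child does not apply in the parent. */
    assert(!fscap_applies(unpriv_child.root_kuid, &parent));
}
```

    Calling demo_loophole() from a main() exercises both cases; the first
    assertion is the escalation this patch closes.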
    
    To prevent that, do not allow mapping parent uid 0 if the process which
    opened the uid_map file does not have CAP_SETFCAP, which is the
    capability for setting file capabilities.
    
    As a further wrinkle: a task can unshare its user namespace, then open
    its uid_map file itself, and map (only) its own uid.  In this case we do
    not have the credential from before unshare, which was potentially more
    restricted.  So, when creating a user namespace, we record whether the
    creator had CAP_SETFCAP.  Then we can use that during map_write().
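
    The resulting decision can be sketched as a small userspace model.  The
    names below (struct map_write_ctx, root_map_allowed(), demo_scenarios())
    are hypothetical and only summarize the rule described above; the actual
    check in user_namespace.c operates on kernel credentials and map extents.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the facts the check needs. */
struct map_write_ctx {
    bool maps_parent_uid0;    /* the new map includes parent uid 0 */
    bool writer_in_new_ns;    /* uid_map was opened from inside the new ns */
    bool creator_had_setfcap; /* recorded when the namespace was created */
    bool opener_has_setfcap;  /* opener holds CAP_SETFCAP in the parent ns */
};

/* Returns true if the uid_map write may proceed under the rule above. */
static bool root_map_allowed(const struct map_write_ctx *c)
{
    /* Maps that do not include parent uid 0 are unaffected. */
    if (!c->maps_parent_uid0)
        return true;

    /* The task unshared and then opened its own uid_map: its pre-unshare
     * credential is gone, so fall back to what was recorded at creation. */
    if (c->writer_in_new_ns)
        return c->creator_had_setfcap;

    /* Otherwise the opener of uid_map must hold CAP_SETFCAP. */
    return c->opener_has_setfcap;
}

/* The three cases listed in this commit message, expressed in the model. */
static void demo_scenarios(void)
{
    /* 1. Unprivileged user: maps its own non-zero uid. */
    struct map_write_ctx user = { false, true, false, false };
    /* 2. Root with CAP_SETFCAP maps uid 0. */
    struct map_write_ctx root = { true, true, true, true };
    /* 3. Root that dropped CAP_SETFCAP maps uid 0. */
    struct map_write_ctx restricted = { true, true, false, false };

    assert(root_map_allowed(&user));
    assert(root_map_allowed(&root));
    assert(!root_map_allowed(&restricted));
}
```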
    
    With this patch:
    
    1. Unprivileged user can still unshare -Ur
    
       ubuntu@caps:~$ unshare -Ur
       root@caps:~# logout
    
    2. Root user can still unshare -Ur
    
       ubuntu@caps:~$ sudo bash
       root@caps:/home/ubuntu# unshare -Ur
       root@caps:/home/ubuntu# logout
    
    3. Root user without CAP_SETFCAP cannot unshare -Ur:
    
       root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
       root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
       unable to set CAP_SETFCAP effective capability: Operation not permitted
       root@caps:/home/ubuntu# unshare -Ur
       unshare: write failed /proc/self/uid_map: Operation not permitted
    
    Note: an alternative solution would be to allow uid 0 mappings by
    processes without CAP_SETFCAP, but to prevent such a namespace from
    writing any file capabilities.  This approach can be seen at [1].
    
    Background history: commit 95ebabde ("capabilities: Don't allow
    writing ambiguous v3 file capabilities") tried to fix the issue by
    preventing v3 fscaps from being written to disk when the root uid would
    map to the same uid in nested user namespaces.  This led to regressions
    for various workloads.  For example, see [2].  Ultimately this is a valid
    use case we have to support, so the change had to be reverted in commit
    3b0c2d3e ("Revert 95ebabde ("capabilities: Don't allow writing
    ambiguous v3 file capabilities")").
    
    Link: https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 [1]
    Link: https://github.com/containers/buildah/issues/3071 [2]
    Signed-off-by: Serge Hallyn <serge@hallyn.com>
    Reviewed-by: Andrew G. Morgan <morgan@kernel.org>
    Tested-by: Christian Brauner <christian.brauner@ubuntu.com>
    Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
    Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
    Cc: Eric Biederman <ebiederm@xmission.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
user_namespace.c 34.9 KB