• Aleksa Sarai's avatar
    fs: exec: apply CLOEXEC before changing dumpable task flags · 43fe8188
    Aleksa Sarai authored
    commit 613cc2b6 upstream.
    
    If you have a process that has set itself to be non-dumpable, and it
    then undergoes exec(2), any CLOEXEC file descriptors it has open are
    "exposed" during a race window between the dumpable flags of the process
    being reset for exec(2) and CLOEXEC being applied to the file
    descriptors. This can be exploited by a process by attempting to access
    /proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE.
    
    The race in question is after set_dumpable has been (for get_link,
    though the trace is basically the same for readlink):
    
    [vfs]
    -> proc_pid_link_inode_operations.get_link
       -> proc_pid_get_link
          -> proc_fd_access_allowed
             -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
    
    Which will return 0, during the race window and CLOEXEC file descriptors
    will still be open during this window because do_close_on_exec has not
    been called yet. As a result, the ordering of these calls should be
    reversed to avoid this race window.
    
    This is of particular concern to container runtimes, where joining a
    PID namespace with file descriptors referring to the host filesystem
    can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect
    against access of CLOEXEC file descriptors -- file descriptors which may
    reference filesystem objects the container shouldn't have access to).
    
    Cc: dev@opencontainers.org
    Reported-by: default avatarMichael Crosby <crosbymichael@gmail.com>
    Signed-off-by: default avatarAleksa Sarai <asarai@suse.de>
    Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
    43fe8188
exec.c 40 KB