    nsfs: add pid translation ioctls · ca567df7
    Christian Brauner authored
    Add ioctl()s to translate pids between pid namespaces.
    
    LXCFS is a tiny FUSE filesystem used to virtualize various aspects of
    procfs. LXCFS runs on the host. The files and directories it creates
    can be bind-mounted by, e.g., a container at startup and mounted over
    the procfs files the container wishes to have virtualized. When, e.g.,
    a read request for uptime is received, LXCFS receives the pid of the
    reader. To virtualize the corresponding read, LXCFS needs to know the
    pid of the init process of the reader's pid namespace. To find it,
    LXCFS first needs to fork() two helper processes. The first helper
    process calls setns() to join the reader's pid namespace. Since
    setns() on a pid namespace only affects children created afterwards,
    the second helper process is needed to get a process that is a proper
    member of that pid namespace. The second helper then creates a ucred
    (SCM_CREDENTIALS) message with ucred.pid set to 1 and sends it back to
    LXCFS. The kernel translates the ucred.pid field into the
    corresponding pid number in LXCFS's pid namespace. This way LXCFS
    learns the init pid number of the reader's pid namespace and can go on
    to virtualize. Since these two fork() calls are costly, LXCFS maintains
    an init pid cache that caches a given pid for a fixed amount of time.
    The cache is pruned during new read requests. However, even with the
    cache, the cost of the two fork() calls is significant when a very
    large number of containers are running. This patch adds an ns ioctl
    that lets a caller retrieve the init pid number of a pid namespace
    through a file descriptor referring to that pid namespace. This
    significantly improves performance with a very simple change.
    
    Support translating both pids and tgids. Other concepts can be added,
    but there are no obvious users for them right now.
    
    To protect against races, pidfds can be used to check whether the
    process is still valid. If needed, these ioctls can also be extended
    to operate on pidfds directly.
    
    Link: https://lore.kernel.org/r/20240619-work-ns_ioctl-v1-1-7c0097e6bb6b@kernel.org
    
    Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
    Signed-off-by: Christian Brauner <brauner@kernel.org>