• Christian Brauner's avatar
    ovl: modify layer parameter parsing · b36a5780
    Christian Brauner authored
    We ran into issues where mount(8) passed multiple lower layers as one
    big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
    option is limited to 256 bytes in strndup_user(). While this would be
    fixable by extending the fsconfig() buffer I'd rather encourage users to
    append layers via multiple fsconfig() calls as the interface allows
    nicely for this. This has also been requested as a feature before.
    
    With this port to the new mount api the following will be possible:
    
            fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);
    
            /* set upper layer */
            fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);
    
            /* append "/lower2", "/lower3", and "/lower4" */
            fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);
    
            /* turn index feature on */
            fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);
    
            /* append "/lower5" */
            fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);
    
    Specifying ':' would have been rejected so this isn't a regression. And
    we can't simply use "lowerdir=/lower" to append on top of existing
    layers as "lowerdir=/lower,lowerdir=/other-lower" would make
    "/other-lower" the only lower layer so we'd break uapi if we changed
    this. So the ':' prefix seems a good compromise.
    
    Users can choose to specify multiple layers at once or individual
    layers. A layer is appended if it starts with ":". This requires that
    the user has already added at least one layer before. If lowerdir is
    specified again without a leading ":" then all previous layers are
    dropped and replaced with the new layers. If lowerdir is specified and
    empty than all layers are simply dropped.
    
    An additional change is that overlayfs will now parse and resolve layers
    right when they are specified in fsconfig() instead of deferring until
    super block creation. This allows users to receive early errors.
    
    It also allows users to actually use up to 500 layers something which
    was theoretically possible but ended up not working due to the mount
    option string passed via mount(2) being too large.
    
    This also allows a more privileged process to set config options for a
    lesser privileged process as the creds for fsconfig() and the creds for
    fsopen() can differ. We could restrict that they match by enforcing that
    the creds of fsopen() and fsconfig() match but I don't see why that
    needs to be the case and allows for a good delegation mechanism.
    
    Plus, in the future it means we're able to extend overlayfs mount
    options and allow users to specify layers via file descriptors instead
    of paths:
    
            fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);
    
            /* append */
            fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);
    
            /* append */
            fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);
    
            /* clear all layers specified until now */
            fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);
    
    This would be especially nice if users create an overlayfs mount on top
    of idmapped layers or just in general private mounts created via
    open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
    anywhere in the filesystem. But for now just do the minimal thing.
    
    We should probably aim to move more validation into ovl_fs_parse_param()
    so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
    be done in additional patches later.
    
    This is now also rebased on top of the lazy lowerdata lookup which
    allows the specificatin of data only layers using the new "::" syntax.
    
    The rules are simple. A data only layers cannot be followed by any
    regular layers and data layers must be preceeded by at least one regular
    layer.
    
    Parsing the lowerdir mount option must change because of this. The
    original patchset used the old lowerdir parsing function to split a
    lowerdir mount option string such as:
    
            lowerdir=/lower1:/lower2::/lower3::/lower4
    
    simply replacing each non-escaped ":" by "\0". So sequences of
    non-escaped ":" were counted as layers. For example, the previous
    lowerdir mount option above would've counted 6 layers instead of 4 and a
    lowerdir mount option such as:
    
            lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"
    
    would be counted as 33 layers. Other than being ugly this didn't matter
    much because kern_path() would reject the first "\0" layer. However,
    this overcounting of layers becomes problematic when we base allocations
    on it where we very much only want to allocate space for 4 layers
    instead of 33.
    
    So the new parsing function rejects non-escaped sequences of colons
    other than ":" and "::" immediately instead of relying on kern_path().
    
    Link: https://github.com/util-linux/util-linux/issues/2287
    Link: https://github.com/util-linux/util-linux/issues/1992
    Link: https://bugs.archlinux.org/task/78702
    Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@braunerSigned-off-by: default avatarChristian Brauner <brauner@kernel.org>
    Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
    b36a5780
super.c 52.7 KB