1. 28 Feb, 2019 26 commits
    • vfs: Implement logging through fs_context · e7582e16
      David Howells authored
      Implement the ability for filesystems to log error, warning and
      informational messages through the fs_context.  In the future, these will
      be extractable by userspace by reading from an fd created by the fsopen()
      syscall.
      
      Error messages are prefixed with "e ", warnings with "w " and informational
      messages with "i ".
      
      In the future, inside the kernel, formatted messages will be malloc'd but
      unformatted messages will not be copied if they're either in the core .rodata
      section or in the .rodata section of the filesystem module pinned by
      fs_context::fs_type.  The messages will only be good till the fs_type is
      released.
      
      Note that the logging object will be shared between duplicated fs_context
      structures.  This is so that filesystems such as NFS, which do a mount
      within a mount, can get at least some of the errors from the inner mount.
      
      Five logging functions are provided for this:
      
       (1) void logfc(struct fs_context *fc, const char *fmt, ...);
      
           This logs a message into the context.  If the buffer is full, the
           earliest message is discarded.
      
       (2) void errorf(fc, fmt, ...);
      
           This wraps logfc() to log an error.
      
       (3) int invalf(fc, fmt, ...);
      
           This wraps errorf() and returns -EINVAL for convenience.
      
       (4) void warnf(fc, fmt, ...);
      
           This wraps logfc() to log a warning.
      
       (5) void infof(fc, fmt, ...);
      
           This wraps logfc() to log an informational message.
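
      The behaviour described above (level prefixes, and discarding the earliest
      message when the buffer is full) can be modelled in userspace.  This is a
      minimal sketch, not the kernel implementation: struct fc_log, LOG_SLOTS,
      LOG_MSG_MAX and the log_*() names are hypothetical, the real functions
      take a struct fs_context, and the errorf() equivalent is omitted because
      it would mirror log_warnf() with an 'e' prefix.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define LOG_SLOTS	4	/* hypothetical buffer capacity */
#define LOG_MSG_MAX	64	/* hypothetical per-message size limit */

/* Hypothetical stand-in for the logging object hung off an fs_context. */
struct fc_log {
	char msgs[LOG_SLOTS][LOG_MSG_MAX];
	unsigned int head;	/* next slot to write */
	unsigned int count;	/* number of messages currently held */
};

/* Store "<level> <message>"; when full, the earliest message is overwritten. */
void log_vmsg(struct fc_log *log, char level, const char *fmt, va_list ap)
{
	char *slot;

	assert(log != NULL);
	slot = log->msgs[log->head];
	slot[0] = level;
	slot[1] = ' ';
	vsnprintf(slot + 2, LOG_MSG_MAX - 2, fmt, ap);
	log->head = (log->head + 1) % LOG_SLOTS;
	if (log->count < LOG_SLOTS)
		log->count++;
}

void log_warnf(struct fc_log *log, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	log_vmsg(log, 'w', fmt, ap);
	va_end(ap);
}

void log_infof(struct fc_log *log, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	log_vmsg(log, 'i', fmt, ap);
	va_end(ap);
}

/* Log an error and hand back -EINVAL so it can sit on a return statement. */
int log_invalf(struct fc_log *log, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	log_vmsg(log, 'e', fmt, ap);
	va_end(ap);
	return -EINVAL;
}

/* Return the i-th oldest message still held, or NULL if there is none. */
const char *log_peek(const struct fc_log *log, unsigned int i)
{
	if (i >= log->count)
		return NULL;
	return log->msgs[(log->head + LOG_SLOTS - log->count + i) % LOG_SLOTS];
}
```

      A parameter parser in this model could then end with something like
      "return log_invalf(log, "bad value %d", v);".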
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Provide documentation for new mount API · 5fe1890d
      David Howells authored
      Provide documentation for the new mount API.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Remove kern_mount_data() · d911b458
      David Howells authored
      The kern_mount_data() isn't used any more so remove it.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • hugetlbfs: Convert to fs_context · 32021982
      David Howells authored
      Convert the hugetlbfs to use the fs_context during mount.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • cpuset: Use fs_context · a1875374
      David Howells authored
      Make the cpuset filesystem use the filesystem context.  This is potentially
      tricky as the cpuset fs is almost an alias for the cgroup filesystem, but
      with some special parameters.
      
      This can, however, be handled by setting up an appropriate cgroup
      filesystem and returning the root directory of that as the root dir of this
      one.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • kernfs, sysfs, cgroup, intel_rdt: Support fs_context · 23bf1b6b
      David Howells authored
      Make kernfs support superblock creation/mount/remount with fs_context.
      
      This requires that sysfs, cgroup and intel_rdt, which are built on kernfs,
      be made to support fs_context also.
      
      Notes:
      
       (1) A kernfs_fs_context struct is created to wrap fs_context and the
           kernfs mount parameters are moved in here (or are in fs_context).
      
       (2) kernfs_mount{,_ns}() are made into kernfs_get_tree().  The extra
           namespace tag parameter is passed in the context if desired.
      
       (3) kernfs_free_fs_context() is provided as a destructor for the
           kernfs_fs_context struct, but for the moment it does nothing except
           get called in the right places.
      
       (4) sysfs doesn't wrap kernfs_fs_context since it has no parameters to
           pass, but possibly this should be done anyway in case someone wants to
           add a parameter in future.
      
       (5) A cgroup_fs_context struct is created to wrap kernfs_fs_context and
           the cgroup v1 and v2 mount parameters are all moved there.
      
       (6) cgroup1 parameter parsing error messages are now handled by invalf(),
           which allows userspace to collect them directly.
      
       (7) cgroup1 parameter cleanup is now done in the context destructor rather
           than in the mount/get_tree and remount functions.
      
      Weirdies:
      
       (*) cgroup_do_get_tree() calls cset_cgroup_from_root() with locks held,
           but then uses the resulting pointer after dropping the locks.  I'm
           told this is okay and needs commenting.
      
       (*) The cgroup refcount web.  This really needs documenting.
      
       (*) cgroup2 only has one root?
      
      Add a suggestion from Thomas Gleixner in which the RDT enablement code is
      placed into its own function.
      
      [folded a leak fix from Andrey Vagin]
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc: Tejun Heo <tj@kernel.org>
      cc: Li Zefan <lizefan@huawei.com>
      cc: Johannes Weiner <hannes@cmpxchg.org>
      cc: cgroups@vger.kernel.org
      cc: fenghua.yu@intel.com
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • cgroup: store a reference to cgroup_ns into cgroup_fs_context · cca8f327
      Al Viro authored
      ... and trim cgroup_do_mount() arguments (renaming it to cgroup_do_get_tree())
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • cgroup_do_mount(): massage calling conventions · 71d883c3
      Al Viro authored
      pass it fs_context instead of fs_type/flags/root triple, have
      it return int instead of dentry and make it deal with setting
      fc->root.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • cgroup: stash cgroup_root reference into cgroup_fs_context · cf6299b1
      Al Viro authored
      Note that this reference is *NOT* contributing to refcount of
      cgroup_root in question and is valid only until cgroup_do_mount()
      returns.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • cgroup2: switch to option-by-option parsing · e34a98d5
      Al Viro authored
      [again, carved out of patch by dhowells]
      [NB: we probably want to handle "source" in parse_param here]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • cgroup1: switch to option-by-option parsing · 8d2451f4
      Al Viro authored
      [dhowells should be the author - it's carved out of his patch]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • cgroup: take options parsing into ->parse_monolithic() · f5dfb531
      Al Viro authored
      Store the results in cgroup_fs_context.  There's a nasty twist caused
      by the enabling/disabling subsystems - we can't do the checks sensitive
      to that until cgroup_mutex gets grabbed.  Frankly, these checks are
      complete bullshit (e.g. all,none combination is accepted if all subsystems
      are disabled; so's cpusets,none and all,cpusets when cpusets is disabled,
      etc.), but touching that would be a userland-visible behaviour change ;-/
      
      So we do parsing in ->parse_monolithic() and have the consistency checks
      done in check_cgroupfs_options(), with the latter called (on already parsed
      options) from cgroup1_get_tree() and cgroup1_reconfigure().
      
      Freeing the strdup'ed strings is done from fs_context destructor, which
      somewhat simplifies the life for cgroup1_{get_tree,reconfigure}().
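
      The split described above (parse everything first, run the consistency
      checks later on the already-parsed options) can be illustrated with a
      small userspace sketch.  Everything here is hypothetical: parse_monolithic()
      is a generic comma splitter rather than the cgroup method, struct demo_ctx
      stands in for cgroup_fs_context, and the all/none rule in
      demo_check_options() is a deliberately simplified placeholder, not the
      real cgroup1 policy.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Callback invoked once per option; value is NULL for a bare "key". */
typedef int (*param_fn)(void *ctx, const char *key, const char *value);

/* Split "a,b=c" in place and pass each option to parse_param(). */
int parse_monolithic(char *data, param_fn parse_param, void *ctx)
{
	char *opt = data;

	assert(parse_param != NULL);
	while (opt && *opt) {
		char *next = strchr(opt, ',');
		char *value;
		int ret;

		if (next)
			*next++ = '\0';
		if (*opt) {			/* skip empty options such as ",," */
			value = strchr(opt, '=');
			if (value)
				*value++ = '\0';
			ret = parse_param(ctx, opt, value);
			if (ret < 0)
				return ret;
		}
		opt = next;
	}
	return 0;
}

/* Hypothetical parsed-options record, filled in during parsing only. */
struct demo_ctx {
	int all;
	int none;
	char name[32];
};

int demo_parse_param(void *p, const char *key, const char *value)
{
	struct demo_ctx *ctx = p;

	if (strcmp(key, "all") == 0 && !value) {
		ctx->all = 1;
		return 0;
	}
	if (strcmp(key, "none") == 0 && !value) {
		ctx->none = 1;
		return 0;
	}
	if (strcmp(key, "name") == 0 && value) {
		snprintf(ctx->name, sizeof(ctx->name), "%s", value);
		return 0;
	}
	return -EINVAL;
}

/* Deferred check: runs later, once any locks it depends on are held. */
int demo_check_options(const struct demo_ctx *ctx)
{
	return (ctx->all && ctx->none) ? -EINVAL : 0;
}
```

      Note that parsing "all,none" succeeds in this model; only the deferred
      check rejects it, which is the ordering the commit message describes.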
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Al Viro's avatar
      7feeef58
    • cgroup: start switching to fs_context · 90129625
      Al Viro authored
      Unfortunately, cgroup is tangled into kernfs infrastructure.
      To avoid converting all kernfs-based filesystems at once,
      we need to untangle the remount part of things, instead of
      having it go through kernfs_sop_remount_fs().  Fortunately,
      it's not hard to do.
      
      This commit just gets cgroup/cgroup1 to use fs_context to
      deliver options on mount and remount paths.  Parsing those
      is going to be done in the next commits; for now we do
      pretty much what legacy case does.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • ipc: Convert mqueue fs to fs_context · 935c6912
      David Howells authored
      Convert the mqueue filesystem to use the filesystem context stuff.
      
      Notes:
      
       (1) The relevant ipc namespace is selected when the context is
           initialised (and it defaults to the current task's ipc namespace).
           The caller can override this before calling vfs_get_tree().
      
       (2) Rather than simply calling kern_mount_data(), mq_init_ns() and
           mq_internal_mount() create a context, adjust it and then do the rest
           of the mount procedure.
      
       (3) The lazy mqueue mounting on creation of a new namespace is retained
           from a previous patch, but the avoidance of sget() if no superblock
           yet exists is reverted and the superblock is again keyed on the
           namespace pointer.
      
           Yes, there was a performance gain in not searching the superblock
           hash, but it's only paid once per ipc namespace - and only if someone
           uses mqueue within that namespace, so I'm not sure it's worth it,
           especially as calling sget() allows avoidance of recursion.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • proc: Add fs_context support to procfs · 66f592e2
      David Howells authored
      Add fs_context support to procfs.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • procfs: Move proc_fill_super() to fs/proc/root.c · 60a3c3a5
      David Howells authored
      Move proc_fill_super() to fs/proc/root.c as that's where the other
      superblock stuff is.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
      cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • introduce cloning of fs_context · 0b52075e
      Al Viro authored
      new primitive: vfs_dup_fs_context().  Comes with fs_context
      method (->dup()) for copying the filesystem-specific parts
      of fs_context, along with LSM one (->fs_context_dup()) for
      doing the same to LSM parts.
      
      [needs better commit message, and change of Author:, anyway]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • convenience helpers: vfs_get_super() and sget_fc() · cb50b348
      Al Viro authored
      the former is an analogue of mount_{single,nodev} for use in
      ->get_tree() instances, the latter - analogue of sget() for the
      same.
      
      These are fairly similar to the originals, but the callback signature
      for sget_fc() is different from sget() ones, so getting bits and
      pieces shared would be too convoluted; we might get around to that
      later, but for now let's just remember to keep them in sync.  They
      do live next to each other, and changes in either won't be hard
      to spot.
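
      The find-or-create pattern behind sget() and its fs_context analogue can
      be sketched in userspace as below.  This is an illustration under
      assumptions, not the kernel code: struct sb and struct ctx are toy
      stand-ins for the superblock and fs_context, there is no locking, and the
      idea that the callbacks receive the context itself is an assumption made
      for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for struct super_block and struct fs_context. */
struct sb {
	struct sb *next;
	const void *key;	/* what this superblock is keyed on */
	int refs;
};

struct ctx {
	const void *key;	/* e.g. a namespace pointer */
};

static struct sb *all_sbs;	/* list of live superblocks (no locking here) */

/*
 * Return an existing superblock that test() accepts, taking a reference,
 * or allocate a new one and let set() initialise it from the context.
 */
struct sb *sget_sketch(struct ctx *fc,
		       int (*test)(const struct sb *, const struct ctx *),
		       int (*set)(struct sb *, const struct ctx *))
{
	struct sb *s;

	assert(fc != NULL && set != NULL);
	if (test) {
		for (s = all_sbs; s; s = s->next) {
			if (test(s, fc)) {
				s->refs++;
				return s;
			}
		}
	}

	s = calloc(1, sizeof(*s));
	if (!s)
		return NULL;
	if (set(s, fc) < 0) {
		free(s);
		return NULL;
	}
	s->refs = 1;
	s->next = all_sbs;
	all_sbs = s;
	return s;
}

/* Example callbacks: key the superblock on the pointer held in the context. */
int test_keyed(const struct sb *s, const struct ctx *fc)
{
	return s->key == fc->key;
}

int set_keyed(struct sb *s, const struct ctx *fc)
{
	s->key = fc->key;
	return 0;
}
```

      Passing a NULL test callback in this sketch always creates a fresh
      superblock, while test_keyed() shares one per key.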
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Implement a filesystem superblock creation/configuration context · 3e1aeb00
      David Howells authored
      [AV - unfuck kern_mount_data(); we want non-NULL ->mnt_ns on long-living
      mounts]
      [AV - reordering fs/namespace.c is badly overdue, but let's keep it
      separate from that series]
      [AV - drop simple_pin_fs() change]
      [AV - clean vfs_kern_mount() failure exits up]
      
      Implement a filesystem context concept to be used during superblock
      creation for mount and superblock reconfiguration for remount.
      
      The mounting procedure then becomes:
      
       (1) Allocate new fs_context context.
      
       (2) Configure the context.
      
       (3) Create superblock.
      
       (4) Query the superblock.
      
       (5) Create a mount for the superblock.
      
       (6) Destroy the context.
      
      Rather than calling fs_type->mount(), an fs_context struct is created and
      fs_type->init_fs_context() is called to set it up.  Pointers exist for the
      filesystem and LSM to hang their private data off.
      
      A set of operations has to be set by ->init_fs_context() to provide
      freeing, duplication, option parsing, binary data parsing, validation,
      mounting and superblock filling.
      
      Legacy filesystems are supported by the provision of a set of legacy
      fs_context operations that build up a list of mount options and then invoke
      fs_type->mount() from within the fs_context ->get_tree() operation.  This
      allows all filesystems to be accessed using fs_context.
      
      It should be noted that, whilst this patch adds a lot of lines of code,
      there is quite a bit of duplication with existing code that can be
      eliminated should all filesystems be converted over.
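
      The six-step procedure and the role of ->init_fs_context() can be modelled
      in userspace as follows.  This is a sketch under stated assumptions: all
      model_* and demo_* names are hypothetical, only three of the operations
      listed above are represented, and steps (4) and (5) are left as a comment
      because they have no meaningful userspace equivalent.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct model_fc;

/* Subset of the operations an ->init_fs_context() would install. */
struct model_fc_ops {
	void (*free)(struct model_fc *fc);
	int (*parse_param)(struct model_fc *fc, const char *key, const char *val);
	int (*get_tree)(struct model_fc *fc);
};

struct model_fc {
	const struct model_fc_ops *ops;
	void *fs_private;	/* filesystem's private data hangs off here */
	int root;		/* nonzero once a "superblock" exists */
};

struct model_fs_type {
	int (*init_fs_context)(struct model_fc *fc);
};

/* A toy filesystem with a single "size" parameter. */
struct demo_priv {
	long size;
};

static void demo_free(struct model_fc *fc)
{
	free(fc->fs_private);
	fc->fs_private = NULL;
}

static int demo_parse_param(struct model_fc *fc, const char *key, const char *val)
{
	struct demo_priv *priv = fc->fs_private;

	if (strcmp(key, "size") == 0 && val) {
		priv->size = strtol(val, NULL, 10);
		return 0;
	}
	return -EINVAL;
}

static int demo_get_tree(struct model_fc *fc)
{
	struct demo_priv *priv = fc->fs_private;

	if (priv->size <= 0)
		return -EINVAL;
	fc->root = 1;
	return 0;
}

static const struct model_fc_ops demo_ops = {
	.free		= demo_free,
	.parse_param	= demo_parse_param,
	.get_tree	= demo_get_tree,
};

int demo_init_fs_context(struct model_fc *fc)
{
	fc->fs_private = calloc(1, sizeof(struct demo_priv));
	if (!fc->fs_private)
		return -ENOMEM;
	fc->ops = &demo_ops;
	return 0;
}

const struct model_fs_type demo_fs_type = {
	.init_fs_context = demo_init_fs_context,
};

/* Walk the procedure for one key/value option; returns 0 or -errno. */
int model_mount(const struct model_fs_type *type, const char *key, const char *val)
{
	struct model_fc *fc = calloc(1, sizeof(*fc));	/* (1) allocate */
	int ret;

	assert(type != NULL && key != NULL);
	if (!fc)
		return -ENOMEM;
	ret = type->init_fs_context(fc);
	if (ret == 0)
		ret = fc->ops->parse_param(fc, key, val);	/* (2) configure */
	if (ret == 0)
		ret = fc->ops->get_tree(fc);		/* (3) create superblock */
	/* (4) query the superblock and (5) create a mount are not modelled. */
	if (fc->ops)
		fc->ops->free(fc);			/* (6) destroy */
	free(fc);
	return ret;
}
```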
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Put security flags into the fs_context struct · 846e5662
      David Howells authored
      Put security flags, such as SECURITY_LSM_NATIVE_LABELS, into the filesystem
      context so that the filesystem can communicate them to the LSM more easily.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • smack: Implement filesystem context security hooks · 2febd254
      David Howells authored
      Implement filesystem context security hooks for the smack LSM.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Casey Schaufler <casey@schaufler-ca.com>
      cc: linux-security-module@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • selinux: Implement the new mount API LSM hooks · 442155c1
      David Howells authored
      Implement the new mount API LSM hooks for SELinux.  At some point the old
      hooks will need to be removed.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Paul Moore <paul@paul-moore.com>
      cc: Stephen Smalley <sds@tycho.nsa.gov>
      cc: selinux@tycho.nsa.gov
      cc: linux-security-module@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Add LSM hooks for the new mount API · da2441fd
      David Howells authored
      Add LSM hooks for use by the new mount API and filesystem context code.
      This includes:
      
       (1) Hooks to handle allocation, duplication and freeing of the security
           record attached to a filesystem context.
      
       (2) A hook to snoop source specifications.  There may be multiple of these
           if the filesystem supports it.  They will be local files/devices if
           fs_context::source_is_dev is true and will be something else, possibly
           remote server specifications, if false.
      
       (3) A hook to snoop superblock configuration options in key[=val] form.
           If the LSM decides it wants to handle it, it can suppress the option
           being passed to the filesystem.  Note that 'val' may include commas
           and binary data with the fsopen patch.
      
       (4) A hook to perform validation and allocation after the configuration
           has been done but before the superblock is allocated and set up.
      
       (5) A hook to transfer the security from the context to a newly created
           superblock.
      
       (6) A hook to rule on whether a path point can be used as a mountpoint.
      
      These are intended to replace:
      
      	security_sb_copy_data
      	security_sb_kern_mount
      	security_sb_mount
      	security_sb_set_mnt_opts
      	security_sb_clone_mnt_opts
      	security_sb_parse_opts_str
      
      [AV -- some of the methods being replaced are already gone, some of the
      methods are not added for the lack of need]
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: linux-security-module@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Add configuration parser helpers · 31d921c7
      David Howells authored
      Because the new API passes in key,value parameters, match_token() cannot be
      used with it.  Instead, provide three new helpers to aid with parsing:
      
       (1) fs_parse().  This takes a parameter and a simple static description of
           all the parameters and maps the key name to an ID.  It returns 1 on a
           match, 0 on no match if unknowns should be ignored and some other
           negative error code on a parse error.
      
           The parameter description includes a list of key names to IDs, desired
           parameter types and a list of enumeration name -> ID mappings.
      
           [!] Note that for the moment I've required the key->ID mapping array
           to be sorted and unterminated.  The size of the
           array is noted in the fsconfig_parser struct.  This allows me to use
           bsearch(), but I'm not sure any performance gain is worth the hassle
           of requiring people to keep the array sorted.
      
           The parameter type array is sized according to the number of parameter
           IDs and is indexed directly.  The optional enum mapping array is an
           unterminated, unsorted list and the size goes into the fsconfig_parser
           struct.
      
           The function can do some additional things:
      
      	(a) If it's not ambiguous and no value is given, the prefix "no" on
      	    a key name is permitted to indicate that the parameter should
      	    be considered negatory.
      
      	(b) If the desired type is a single simple integer, it will perform
      	    an appropriate conversion and store the result in a union in
      	    the parse result.
      
      	(c) If the desired type is an enumeration, {key ID, name} will be
      	    looked up in the enumeration list and the matching value will
      	    be stored in the parse result union.
      
      	(d) Optionally generate an error if the key is unrecognised.
      
           This is called something like:
      
      	enum rdt_param {
      		Opt_cdp,
      		Opt_cdpl2,
      		Opt_mba_mpbs,
      		nr__rdt_params
      	};
      
      	const struct fs_parameter_spec rdt_param_specs[nr__rdt_params] = {
      		[Opt_cdp]	= { fs_param_is_bool },
      		[Opt_cdpl2]	= { fs_param_is_bool },
      		[Opt_mba_mpbs]	= { fs_param_is_bool },
      	};
      
      	const char *const rdt_param_keys[nr__rdt_params] = {
      		[Opt_cdp]	= "cdp",
      		[Opt_cdpl2]	= "cdpl2",
      		[Opt_mba_mpbs]	= "mba_mbps",
      	};
      
      	const struct fs_parameter_description rdt_parser = {
      		.name		= "rdt",
      		.nr_params	= nr__rdt_params,
      		.keys		= rdt_param_keys,
      		.specs		= rdt_param_specs,
      		.no_source	= true,
      	};
      
      	int rdt_parse_param(struct fs_context *fc,
      			    struct fs_parameter *param)
      	{
      		struct fs_parse_result parse;
      		struct rdt_fs_context *ctx = rdt_fc2context(fc);
      		int ret;
      
      		ret = fs_parse(fc, &rdt_parser, param, &parse);
      		if (ret < 0)
      			return ret;
      
      		switch (parse.key) {
      		case Opt_cdp:
      			ctx->enable_cdpl3 = true;
      			return 0;
      		case Opt_cdpl2:
      			ctx->enable_cdpl2 = true;
      			return 0;
      		case Opt_mba_mpbs:
      			ctx->enable_mba_mbps = true;
      			return 0;
      		}
      
      		return -EINVAL;
      	}
      
       (2) fs_lookup_param().  This takes a { dirfd, path, LOOKUP_EMPTY? } or
           string value and performs an appropriate path lookup to convert it
           into a path object, which it will then return.
      
           If the desired type was a blockdev, the type of the looked up inode
           will be checked to make sure it is one.
      
           This can be used like:
      
      	enum foo_param {
      		Opt_source,
      		nr__foo_params
      	};
      
      	const struct fs_parameter_spec foo_param_specs[nr__foo_params] = {
      		[Opt_source]	= { fs_param_is_blockdev },
      	};
      
      	const char *const foo_param_keys[nr__foo_params] = {
      		[Opt_source]	= "source",
      	};
      
      	const struct constant_table foo_param_alt_keys[] = {
      		{ "device",	Opt_source },
      	};
      
      	const struct fs_parameter_description foo_parser = {
      		.name		= "foo",
      		.nr_params	= nr__foo_params,
      		.nr_alt_keys	= ARRAY_SIZE(foo_param_alt_keys),
      		.keys		= foo_param_keys,
      		.alt_keys	= foo_param_alt_keys,
      		.specs		= foo_param_specs,
      	};
      
      	int foo_parse_param(struct fs_context *fc,
      			    struct fs_parameter *param)
      	{
      		struct fs_parse_result parse;
      		struct foo_fs_context *ctx = foo_fc2context(fc);
      		int ret;
      
      		ret = fs_parse(fc, &foo_parser, param, &parse);
      		if (ret < 0)
      			return ret;
      
      		switch (parse.key) {
      		case Opt_source:
      			return fs_lookup_param(fc, &foo_parser, param,
      					       &parse, &ctx->source);
      		default:
      			return -EINVAL;
      		}
      	}
      
       (3) lookup_constant().  This takes a table of named constants and looks up
           the given name within it.  The table is expected to be sorted such
           that bsearch() can be used upon it.
      
           Possibly I should require the table be terminated and just use a
           for-loop to scan it instead of using bsearch() to reduce hassle.
      
           Tables look something like:
      
      	static const struct constant_table bool_names[] = {
      		{ "0",		false },
      		{ "1",		true },
      		{ "false",	false },
      		{ "no",		false },
      		{ "true",	true },
      		{ "yes",	true },
      	};
      
           and a lookup is done with something like:
      
      	b = lookup_constant(bool_names, param->string, -1);
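
      As a rough userspace illustration of item (3), a sorted-table lookup with
      bsearch() might look like the following.  The layout of struct
      constant_table, the lookup_constant_n() helper and the macro wrapper are
      assumptions made for this sketch; only the table contents and the shape
      of the call come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Assumed layout: a name paired with the integer it maps to. */
struct constant_table {
	const char *name;
	int value;
};

/* bsearch() comparator: the key is a plain string, the element a table row. */
static int cmp_constant(const void *name, const void *entry)
{
	return strcmp(name, ((const struct constant_table *)entry)->name);
}

/* Look name up in a table sorted by strcmp() order; not_found if absent. */
int lookup_constant_n(const struct constant_table *tbl, size_t n,
		      const char *name, int not_found)
{
	const struct constant_table *hit;

	assert(tbl != NULL && name != NULL);
	hit = bsearch(name, tbl, n, sizeof(*tbl), cmp_constant);
	return hit ? hit->value : not_found;
}

/* Wrapper that sizes the table at compile time (arrays only, not pointers). */
#define lookup_constant(tbl, name, not_found) \
	lookup_constant_n(tbl, ARRAY_SIZE(tbl), name, not_found)

/* The table from the text; it is already in strcmp() order. */
static const struct constant_table bool_names[] = {
	{ "0",		false },
	{ "1",		true },
	{ "false",	false },
	{ "no",		false },
	{ "true",	true },
	{ "yes",	true },
};
```

      The trade-off mentioned above is visible here: an unsorted table would
      make bsearch() miss entries silently, whereas a terminated table scanned
      with a for-loop would not care about order.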
      
      Additionally, optional validation routines for the parameter description
      are provided that can be enabled at compile time.  A later patch will
      invoke these when a filesystem is registered.
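
      One check such a validation routine could plausibly make is that the key
      array really is sorted, since the key lookup in fs_parse() depends on it.
      The sketch below pairs a hypothetical validator with a key->ID lookup that
      also accepts the "no" prefix from (a).  Both function names are invented
      for illustration, and the "no value is given" condition is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical validator: true if keys[] is strictly ascending. */
bool keys_are_sorted(const char *const *keys, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (strcmp(keys[i - 1], keys[i]) >= 0)
			return false;
	return true;
}

/* bsearch() comparator: the key is a string, the element a string pointer. */
static int cmp_key(const void *name, const void *elem)
{
	return strcmp(name, *(const char *const *)elem);
}

/*
 * Map a key name to its index (the parameter ID) or return -1.
 * *negated is set when the match was only found after stripping "no".
 */
int find_key(const char *const *keys, size_t n, const char *name, bool *negated)
{
	const char *const *hit;

	assert(keys_are_sorted(keys, n));
	*negated = false;
	hit = bsearch(name, keys, n, sizeof(*keys), cmp_key);
	if (!hit && strncmp(name, "no", 2) == 0 && name[2] != '\0') {
		hit = bsearch(name + 2, keys, n, sizeof(*keys), cmp_key);
		*negated = (hit != NULL);
	}
	return hit ? (int)(hit - keys) : -1;
}
```

      Trying the exact name first means a key that genuinely begins with "no"
      is never misread as a negation, which is one way to honour the "not
      ambiguous" condition.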
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  2. 30 Jan, 2019 11 commits
    • vfs: Introduce logging functions · c6b82263
      David Howells authored
      Introduce a set of logging functions through which informational messages,
      warnings and error messages incurred by the mount procedure can be logged
      and, in a future patch, passed to userspace instead by way of the
      filesystem configuration context file descriptor.
      
      There are four functions:
      
       (1) infof(const char *fmt, ...);
      
           Logs an informational message.
      
       (2) warnf(const char *fmt, ...);
      
           Logs a warning message.
      
       (3) errorf(const char *fmt, ...);
      
           Logs an error message.
      
       (4) invalf(const char *fmt, ...);
      
           As errorf(), but returns -EINVAL so it can be used on a return statement.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • introduce fs_context methods · f3a09c92
      Al Viro authored
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs_context flavour for submounts · e1a91586
      Al Viro authored
      This is an eventual replacement for vfs_submount() uses.  Unlike the
      "mount" and "remount" cases, the users of that thing are not in VFS -
      they are buried in various ->d_automount() instances and rather than
      converting them all at once we introduce the (thankfully small and
      simple) infrastructure here and deal with the prospective users in
      afs, nfs, etc. parts of the series.
      
      Here we just introduce a new constructor (fs_context_for_submount())
      along with the corresponding enum constant to be put into fc->purpose
      for those.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • convert do_remount_sb() to fs_context · 8d0347f6
      David Howells authored
      Replace do_remount_sb() with a function, reconfigure_super(), that's
      fs_context aware.  The fs_context is expected to be parameterised already
      and have ->root pointing to the superblock to be reconfigured.
      
      A legacy wrapper is provided that is intended to be called from the
      fs_context ops when those appear, but for now is called directly from
      reconfigure_super().  This wrapper invokes the ->remount_fs() superblock op
      for the moment.  It is intended that the remount_fs() op will be phased
      out.
      
      The fs_context->purpose is set to FS_CONTEXT_FOR_RECONFIGURE to indicate
      that the context is being used for reconfiguration.
      
      do_umount_root() is provided to consolidate remount-to-R/O for umount and
      emergency remount by creating a context and invoking reconfiguration.
      
      do_remount(), do_umount() and do_emergency_remount_callback() are switched
      to use the new process.
      
      [AV -- fold UMOUNT and EMERGENCY_REMOUNT in; fixes the
      umount / bug, gets rid of pointless complexity]
      [AV -- set ->net_ns in all cases; nfs remount will need that]
      [AV -- shift security_sb_remount() call into reconfigure_super(); the callers
      that didn't do security_sb_remount() have NULL fc->security anyway, so it's
      a no-op for them]
      Signed-off-by: David Howells <dhowells@redhat.com>
      Co-developed-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs_get_tree(): evict the call of security_sb_kern_mount() · c9ce29ed
      Al Viro authored
      Right now vfs_get_tree() calls security_sb_kern_mount() (i.e.
      mount MAC) unless it gets MS_KERNMOUNT or MS_SUBMOUNT in flags.
      Doing it that way is both clumsy and imprecise.
      
      Consider the callers' tree of vfs_get_tree():
      vfs_get_tree()
              <- do_new_mount()
      	<- vfs_kern_mount()
      		<- simple_pin_fs()
      		<- vfs_submount()
      		<- kern_mount_data()
      		<- init_mount_tree()
      		<- btrfs_mount()
      			<- vfs_get_tree()
      		<- nfs_do_root_mount()
      			<- nfs4_try_mount()
      				<- nfs_fs_mount()
      					<- vfs_get_tree()
      			<- nfs4_referral_mount()
      
      do_new_mount() always does need MAC (we are guaranteed that neither
      MS_KERNMOUNT nor MS_SUBMOUNT will be passed there).
      
      simple_pin_fs(), vfs_submount() and kern_mount_data() pass explicit
      flags inhibiting that check.  So does nfs4_referral_mount() (the
      flags there are ultimately coming from vfs_submount()).
      
      init_mount_tree() is called too early for anything LSM-related; it
      doesn't matter whether we attempt those checks, they'll do nothing.
      
      Finally, in case of btrfs_mount() and nfs_fs_mount(), doing MAC
      is pointless - either the caller will do it, or the flags are
      such that we wouldn't have done it either.
      
      In other words, the one and only case when we want that check
      done is when we are called from do_new_mount(), and there we
      want it unconditionally.
      
      So let's simply move it there.  The superblock is still locked,
      so nobody is going to get access to it (via ustat(2), etc.)
      until we get a chance to apply the checks - we are free to
      move them to any point up to where we drop ->s_umount (in
      do_new_mount_fc()).
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • new helper: do_new_mount_fc() · 132e4608
      David Howells authored
      Create an fs_context-aware version of do_new_mount().  This takes an
      fs_context with a superblock already attached to it.
      
      Make do_new_mount() use do_new_mount_fc() rather than do_new_mount(); this
      allows the consolidation of the mount creation, check and add steps.
      
      To make this work, mount_too_revealing() is changed to take a superblock
      rather than a mount (which the fs_context doesn't have available), allowing
      this check to be done before the mount object is created.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Co-developed-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • teach vfs_get_tree() to handle subtype, switch do_new_mount() to it · a0c9a8b8
      Al Viro authored
      Roll the handling of subtypes into do_new_mount() and vfs_get_tree().  The
      former determines any subtype string and hangs it off the fs_context; the
      latter applies it.
      
      Make do_new_mount() create, parameterise and commit an fs_context and
      create a mount for itself rather than calling vfs_kern_mount().
      
      [AV -- missing kstrdup()]
      [AV -- ... and no kstrdup() if we get to setting ->s_submount - we
      simply transfer it from fc, leaving NULL behind]
      [AV -- constify ->s_submount, while we are at it]
      Reviewed-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • new helpers: vfs_create_mount(), fc_mount() · 8f291889
      Al Viro authored
      Create a new helper, vfs_create_mount(), that creates a detached vfsmount
      object from an fs_context that has a superblock attached to it.
      
      Almost all uses will be paired with immediately preceding vfs_get_tree();
      add a helper for such combination.
      
      Switch vfs_kern_mount() to use this.
      
      NOTE: mild behaviour change; passing NULL as 'device name' to
      something like procfs will change /proc/*/mountstats - "device none"
      instead of "no device".  That is consistent with /proc/mounts et al.
      
      [d'oh - EXPORT_SYMBOL_GPL slipped in by mistake; removed]
      [AV -- remove confused comment from vfs_create_mount()]
      [AV -- removed the second argument]
      Reviewed-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      8f291889
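The pairing that 8f291889 describes (get a tree, then create a detached mount from the same context) can be illustrated with a small userspace model. All names below (`model_fc`, `model_get_tree()`, `model_create_mount()`, `model_fc_mount()`) are hypothetical stand-ins, and the reference counting is a simplification assumed for the sketch, not the kernel's actual locking or refcount rules.

```c
#include <stdlib.h>

struct model_sb { int refs; };
struct model_fc { struct model_sb *sb; int fail_get_tree; };
struct model_mount { struct model_sb *sb; int attached; };

/* Stand-in for vfs_get_tree(): attach a superblock to the context. */
static int model_get_tree(struct model_fc *fc)
{
	if (fc->fail_get_tree)
		return -1;
	fc->sb = calloc(1, sizeof(*fc->sb));
	if (!fc->sb)
		return -1;
	fc->sb->refs = 1;		/* reference held by the context */
	return 0;
}

/* Stand-in for vfs_create_mount(): requires an attached superblock. */
static struct model_mount *model_create_mount(struct model_fc *fc)
{
	struct model_mount *m;

	if (!fc->sb)
		return NULL;
	m = calloc(1, sizeof(*m));
	if (!m)
		return NULL;
	m->sb = fc->sb;
	m->sb->refs++;			/* the mount holds its own reference */
	m->attached = 0;		/* detached: not in any namespace yet */
	return m;
}

/* Stand-in for fc_mount(): the common "get tree, then mount" pairing. */
static struct model_mount *model_fc_mount(struct model_fc *fc)
{
	if (model_get_tree(fc) != 0)
		return NULL;
	return model_create_mount(fc);
}
```

Callers that need to do something between the two steps can still call them separately; the combined helper only covers the common case.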
    • vfs: Introduce fs_context, switch vfs_kern_mount() to it. · 9bc61ab1
      David Howells authored
      Introduce a filesystem context concept to be used during superblock
      creation for mount and superblock reconfiguration for remount.  This is
      allocated at the beginning of the mount procedure and into it is placed:
      
       (1) Filesystem type.
      
       (2) Namespaces.
      
       (3) Source/Device names (there may be multiple).
      
       (4) Superblock flags (SB_*).
      
       (5) Security details.
      
       (6) Filesystem-specific data, as set by the mount options.
      
      Accessor functions are then provided to set up a context, parameterise it
      from monolithic mount data (the data page passed to mount(2)) and tear it
      down again.
      
      A legacy wrapper is provided that implements what will be the basic
      operations, wrapping access to filesystems that aren't yet aware of the
      fs_context.
      
      Finally, vfs_kern_mount() is changed to make use of the fs_context and
      mount_fs() is replaced by vfs_get_tree(), called from vfs_kern_mount().
      [AV -- add missing kstrdup()]
      [AV -- put_cred() can be unconditional - fc->cred can't be NULL]
      [AV -- take legacy_validate() contents into legacy_parse_monolithic()]
      [AV -- merge KERNEL_MOUNT and USER_MOUNT]
      [AV -- don't unlock superblock on success return from vfs_get_tree()]
      [AV -- kill 'reference' argument of init_fs_context()]
      Signed-off-by: David Howells <dhowells@redhat.com>
      Co-developed-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      9bc61ab1
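The set-up / parameterise / tear-down lifecycle that 9bc61ab1 describes can be sketched in userspace as below. This is not the kernel's struct fs_context: `model_fs_context` and its functions are hypothetical, namespaces and security details are left out, and splitting the monolithic data on commas is an assumption based on the conventional mount(2) option string, not something the commit message specifies.

```c
#include <stdlib.h>
#include <string.h>

#define MODEL_MAX_OPTS 8

struct model_fs_context {
	const char *fs_type;		/* (1) filesystem type */
	const char *source;		/* (3) source/device name */
	unsigned int sb_flags;		/* (4) superblock flags */
	char *opts[MODEL_MAX_OPTS];	/* (6) filesystem-specific options */
	int nr_opts;
};

/* Set up an empty context for the given type, source and flags. */
static void model_fc_init(struct model_fs_context *fc, const char *fs_type,
			  const char *source, unsigned int sb_flags)
{
	memset(fc, 0, sizeof(*fc));
	fc->fs_type = fs_type;
	fc->source = source;
	fc->sb_flags = sb_flags;
}

/* Parameterise from a comma-separated option string; empty items are skipped. */
static int model_fc_parse_monolithic(struct model_fs_context *fc,
				     const char *data)
{
	while (data && *data) {
		const char *comma = strchr(data, ',');
		size_t len = comma ? (size_t)(comma - data) : strlen(data);

		if (len > 0) {
			char *opt;

			if (fc->nr_opts == MODEL_MAX_OPTS)
				return -1;
			opt = malloc(len + 1);
			if (!opt)
				return -1;
			memcpy(opt, data, len);
			opt[len] = '\0';
			fc->opts[fc->nr_opts++] = opt;
		}
		data = comma ? comma + 1 : NULL;
	}
	return 0;
}

/* Tear the context down again, releasing everything it owns. */
static void model_fc_put(struct model_fs_context *fc)
{
	int i;

	for (i = 0; i < fc->nr_opts; i++)
		free(fc->opts[i]);
	memset(fc, 0, sizeof(*fc));
}
```

Keeping all of the state in one object is what lets the later steps (getting a tree, creating a mount) take a single argument instead of the type, flags, name and data separately.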
    • saner handling of temporary namespaces · 74e83122
      Al Viro authored
      mount_subtree() creates (and soon destroys) a temporary namespace,
      so that automounts could function normally.  These beasts should
      never become anyone's current namespaces; they don't, but it would
      be better to make prevention of that more straightforward.  And
      since they don't become anyone's current namespace, we don't need
      to bother with reserving procfs inums for those.
      
      Teach alloc_mnt_ns() to skip inum allocation if told so, adjust
      put_mnt_ns() accordingly, make mount_subtree() use temporary
      (anon) namespace.  is_anon_ns() checks if a namespace is such.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      74e83122
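The idea in 74e83122 (anonymous namespaces skip inum allocation, and release must skip it too) can be modelled in userspace as follows. The names are hypothetical, the counters stand in for the real inum allocator, and using a zero sequence number as the "anonymous" marker is an assumption of this sketch rather than something stated in the commit message.

```c
#include <stdbool.h>
#include <stdlib.h>

struct model_mnt_ns {
	unsigned int inum;	/* 0 means "none reserved" in this model */
	unsigned long seq;	/* 0 marks an anonymous namespace here */
};

static unsigned int model_next_inum = 1;
static unsigned long model_next_seq = 1;
static int model_live_inums;	/* inums currently reserved */

/* Allocate a namespace; anonymous ones get neither an inum nor a seq. */
static struct model_mnt_ns *model_alloc_mnt_ns(bool anon)
{
	struct model_mnt_ns *ns = calloc(1, sizeof(*ns));

	if (!ns)
		return NULL;
	if (!anon) {
		ns->inum = model_next_inum++;
		ns->seq = model_next_seq++;
		model_live_inums++;
	}
	return ns;
}

static bool model_is_anon_ns(const struct model_mnt_ns *ns)
{
	return ns->seq == 0;
}

/* Release a namespace; only non-anonymous ones have an inum to give back. */
static void model_put_mnt_ns(struct model_mnt_ns *ns)
{
	if (!model_is_anon_ns(ns))
		model_live_inums--;
	free(ns);
}
```

The point of pairing the two checks is symmetry: whatever allocation decides to skip, release has to skip as well.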
    • separate copying and locking mount tree on cross-userns copies · 3bd045cc
      Al Viro authored
      Rather than having propagate_mnt() check whether it is doing
      unprivileged copies, lock the copies before commit_tree().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      3bd045cc
  3. 17 Jan, 2019 3 commits
    • kill kernfs_pin_sb() · 6d7fbce7
      Al Viro authored
      unused now and impossible to use safely anyway.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      6d7fbce7
    • cgroup: saner refcounting for cgroup_root · 35ac1184
      Al Viro authored
      * make the reference from superblock to cgroup_root counting -
      do cgroup_put() in cgroup_kill_sb() whether we'd done
      percpu_ref_kill() or not; matching grab is done when we allocate
      a new root.  That gives the same refcounting rules for all callers
      of cgroup_do_mount() - a reference to cgroup_root has been grabbed
      by caller and it either is transferred to new superblock or dropped.
      
      * have cgroup_kill_sb() treat an already killed refcount as "just
      don't bother killing it, then".
      
      * after successful cgroup_do_mount() have cgroup1_mount() recheck
      if we'd raced with mount/umount from somebody else and cgroup_root
      got killed.  In that case we drop the superblock and bugger off
      with -ERESTARTSYS, same as if we'd found it in the list already
      dying.
      
      * don't bother with delayed initialization of refcount - it's
      unreliable and not needed.  No need to prevent attempts to bump
      the refcount if we find cgroup_root of another mount in progress -
      sget will reuse an existing superblock just fine and if the
      other sb manages to die before we get there, we'll catch
      that immediately after cgroup_do_mount().
      
      * don't bother with kernfs_pin_sb() - no need for doing that
      either.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      35ac1184
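The refcounting rule from 35ac1184 (the caller grabs a reference, which is then either transferred to the new superblock or dropped) can be modelled in userspace like this. Everything here is a hypothetical simplification: `model_root`, `model_do_mount()` and `model_kill_sb()` only mirror the ownership rule, not percpu_ref or any real cgroup code.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_root { int refs; bool killed; };
struct model_sb { struct model_root *root; };

static void model_root_get(struct model_root *r) { r->refs++; }
static void model_root_put(struct model_root *r) { r->refs--; }

/*
 * The caller arrives holding one reference to root.  On success that
 * reference moves into sb; on failure it is dropped.  Either way the
 * caller no longer owns it afterwards.
 */
static int model_do_mount(struct model_root *root, struct model_sb *sb,
			  bool fail)
{
	if (fail) {
		model_root_put(root);
		return -1;
	}
	sb->root = root;
	return 0;
}

/*
 * Killing the superblock always drops the superblock's reference,
 * whether or not this call is the one that marks the root as killed.
 */
static void model_kill_sb(struct model_sb *sb)
{
	struct model_root *root = sb->root;

	if (!root->killed)
		root->killed = true;	/* already killed: nothing more to do */
	model_root_put(root);
	sb->root = NULL;
}
```

Because every path through `model_do_mount()` consumes the caller's reference, callers never need to track whether the reference was transferred or not.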
    • fix cgroup_do_mount() handling of failure exits · 399504e2
      Al Viro authored
      same story as with last May fixes in sysfs (7b745a4e
      "unfuck sysfs_mount()"); new_sb is left uninitialized
      in case of early errors in kernfs_mount_ns() and papering
      over it by treating any error from kernfs_mount_ns() as
      equivalent to !new_sb ends up conflating the cases when
      objects had never been transferred to a superblock with
      ones when that has happened and resulting new superblock
      had been dropped.  Easily fixed (same way as in sysfs
      case).  Additionally, there's a superblock leak on
      kernfs_node_dentry() failure *and* a dentry leak inside
      kernfs_node_dentry() itself - the latter on probably
      impossible errors, but the former not impossible to trigger
      (as a matter of fact, injecting allocation failures
      at that point *does* trigger it).
      
      Cc: stable@kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      399504e2