  1. 02 Apr, 2020 2 commits
    • Al Viro's avatar
      follow_dotdot{,_rcu}(): massage loops · 12487f30
      Al Viro authored
      The logic in both of them is the same:
      	while true
      		if in process' root	// uncommon
      			break
      		if *not* in mount root	// normal case
      			find the parent
      			return
      		if at absolute root	// very uncommon
      			break
      		move to underlying mountpoint
      	report that we are in root
      
      Pull the common path out of the loop:
      	if in process' root		// uncommon
      		goto in_root
      	if unlikely(in mount root)
      		while true
      			if at absolute root
      				goto in_root
      			move to underlying mountpoint
      			if in process' root
      				goto in_root
      			if *not* in mount root
      				break
      	find the parent	// we are not in mount root
      	return
      in_root:
      	report that we are in root
      
      The reason for that transformation is that we get to keep the
      common path straight *and* get a separate block for "move
      through underlying mountpoints", which will allow us to sanitize
      NO_XDEV handling there.  What's more, the pared-down loops
      will be easier to deal with - in particular, non-RCU case
      has no need to grab mount_lock and rewriting it to the
      form that wouldn't do that is a non-trivial change.  Better
      do that with less stuff getting in the way...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      12487f30
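The loop transformation described in 12487f30 can be checked on a toy model. The sketch below is not the kernel code: `toy_dentry`, `toy_mount`, `toy_pos` and every `toy_*` function are hypothetical stand-ins invented for illustration, and locking, RCU and NO_XDEV handling are left out entirely. It only shows that the single-loop shape and the massaged shape visit the same states and report "in root" in the same situations.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's types. */
struct toy_dentry {
	struct toy_dentry *parent;	/* parent directory within the same mount */
};

struct toy_mount {
	struct toy_mount *parent;	/* NULL for the absolute root mount */
	struct toy_dentry *mountpoint;	/* where this mount sits in its parent */
	struct toy_dentry *root;	/* root directory of this mount */
};

struct toy_pos {
	struct toy_mount *mnt;
	struct toy_dentry *dentry;
};

static bool toy_same(const struct toy_pos *a, const struct toy_pos *b)
{
	return a->mnt == b->mnt && a->dentry == b->dentry;
}

/* Move to the underlying mountpoint; false if we are at the absolute root. */
static bool toy_follow_up(struct toy_pos *pos)
{
	if (!pos->mnt->parent)
		return false;
	pos->dentry = pos->mnt->mountpoint;
	pos->mnt = pos->mnt->parent;
	return true;
}

/* Original shape: one loop does everything. Returns false for "in root". */
static bool toy_dotdot_loop(struct toy_pos *pos, const struct toy_pos *root)
{
	for (;;) {
		if (toy_same(pos, root))		/* in process' root */
			break;
		if (pos->dentry != pos->mnt->root) {	/* normal case */
			pos->dentry = pos->dentry->parent;
			return true;
		}
		if (!toy_follow_up(pos))		/* at absolute root */
			break;
	}
	return false;
}

/* Massaged shape: the common path is straight-line code. */
static bool toy_dotdot_massaged(struct toy_pos *pos, const struct toy_pos *root)
{
	if (toy_same(pos, root))
		goto in_root;
	if (pos->dentry == pos->mnt->root) {		/* uncommon */
		for (;;) {
			if (!toy_follow_up(pos))
				goto in_root;
			if (toy_same(pos, root))
				goto in_root;
			if (pos->dentry != pos->mnt->root)
				break;
		}
	}
	pos->dentry = pos->dentry->parent;	/* we are not in mount root */
	return true;
in_root:
	return false;
}
```

Both functions take the current position and the process' root and return true when a parent was found; the mount-crossing block in the second one is the part that later commits can adjust without touching the common path.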
    • Al Viro's avatar
      lift all calls of step_into() out of follow_dotdot/follow_dotdot_rcu · c2df1968
      Al Viro authored
      lift step_into() into handle_dots() (where they merge with each other);
      have follow_... return dentry and pass inode/seq to the caller.
      
      [braino fix folded; kudos to Qian Cai <cai@lca.pw> for reporting it]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      c2df1968
  2. 14 Mar, 2020 38 commits
    • Al Viro's avatar
      follow_dotdot{,_rcu}(): switch to use of step_into() · 6dfd9fe5
      Al Viro authored
      gets the regular mount crossing on result of ..
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      6dfd9fe5
    • Al Viro's avatar
      handle_dots(), follow_dotdot{,_rcu}(): preparation to switch to step_into() · 7521f22b
      Al Viro authored
      Right now the tail ends of follow_dotdot{,_rcu}() are pretty
      much the open-coded analogues of step_into().  The differences:
      	* the lack of proper LOOKUP_NO_XDEV handling in non-RCU case
      (arguably a bug)
      	* the lack of ->d_manage() handling (again, arguably a bug)
      
      Adjust the calling conventions so that on the next step we could
      just switch those functions to returning step_into().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      7521f22b
    • Al Viro's avatar
      move handle_dots(), follow_dotdot() and follow_dotdot_rcu() past step_into() · 957dd41d
      Al Viro authored
      pure move; we are going to have step_into() called by that bunch.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      957dd41d
    • Al Viro's avatar
      follow_dotdot{,_rcu}(): lift LOOKUP_BENEATH checks out of loop · c9a0f75d
      Al Viro authored
      Behaviour change: LOOKUP_BENEATH lookup of .. in absolute root
      yields an error even if it's not the process' root.  That's
      possible only if you'd managed to escape chroot jail by way of
      procfs symlinks, but IMO the resulting behaviour is not worse -
      more consistent and easier to describe:
      	".." in root is "stay where you are", unless LOOKUP_BENEATH
      	has been given, in which case it's "fail with EXDEV".
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      c9a0f75d
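The rule stated in c9a0f75d is small enough to write down directly. This is a minimal sketch, assuming a made-up flag bit `TOY_LOOKUP_BENEATH` (not the kernel's value) and a hypothetical helper `toy_dotdot_in_root()`; it covers only the case where the walk is already in a root, whether that is the process' root or the absolute one.

```c
#include <errno.h>

#define TOY_LOOKUP_BENEATH 0x1u	/* hypothetical bit, not the kernel's value */

/* Result of ".." when the walk is already in a root (process' or absolute). */
static int toy_dotdot_in_root(unsigned int lookup_flags)
{
	if (lookup_flags & TOY_LOOKUP_BENEATH)
		return -EXDEV;	/* "fail with EXDEV" */
	return 0;		/* "stay where you are" */
}
```

With the check lifted out of the loop, the same answer is given for both kinds of root, which is the consistency the commit message argues for.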
    • Al Viro's avatar
      expand path_parent_directory() in its callers · a6a7eb76
      Al Viro authored
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      a6a7eb76
    • Al Viro's avatar
      path_parent_directory(): leave changing path->dentry to callers · 63b27720
      Al Viro authored
      Instead of returning 0, return new dentry; instead of returning
      -ENOENT, return NULL.  Adjust the callers accordingly.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      63b27720
    • Al Viro's avatar
      path_connected(): pass mount and dentry separately · 6b03f7ed
      Al Viro authored
      eventually we'll want to do that check *before* mangling
      nd->path.dentry...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      6b03f7ed
    • Al Viro's avatar
      do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case · 973d4b73
      Al Viro authored
      ... getting may_create_in_sticky() checks in FMODE_OPENED case as well.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      973d4b73
    • Al Viro's avatar
      do_last(): simplify the liveness analysis past finish_open_created · 8795e7d4
      Al Viro authored
      Don't mess with got_write there - it is guaranteed to be false on
      entry and it will be set true if and only if we decide to go for
      truncation and manage to get write access for that.
      
      Don't carry acc_mode through the entire thing - it's only used
      in that part.  And don't bother with gotos in there - compiler is
      quite capable of optimizing that.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      8795e7d4
    • Al Viro's avatar
      do_last(): don't bother with keeping got_write in FMODE_OPENED case · 59e96e65
      Al Viro authored
      it's easier to drop it right after lookup_open() and regain if
      needed (i.e. if we will need to truncate).  On the non-FMODE_OPENED
      path we do that anyway.  In case of FMODE_CREATED we won't be
      needing it.  And it's easier to prove correctness that way,
      especially since the initial failure to get write access is not
      always fatal; proving that we'll never end up truncating in that
      case is rather convoluted.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      59e96e65
    • Al Viro's avatar
      do_last(): merge the may_open() calls · 3ad5615a
      Al Viro authored
      have FMODE_OPENED case rejoin the main path at earlier point
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      3ad5615a
    • Al Viro's avatar
      atomic_open(): lift the call of may_open() into do_last() · 7be219b4
      Al Viro authored
      there we'll be able to merge it with its counterparts in other
      cases, and there's no reason to do it before the parent has
      been unlocked
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      7be219b4
    • Al Viro's avatar
      atomic_open(): return the right dentry in FMODE_OPENED case · 6fb968cd
      Al Viro authored
      ->atomic_open() might have used a different alias than the one we'd
      passed to it; in "not opened" case we take care of that, in "opened"
      one we don't.  Currently we don't care downstream of "opened" case
      which alias to return; however, that will change shortly when we
      get to unifying may_open() calls.
      
      It's not hard to get right in all cases, anyway.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      6fb968cd
    • Al Viro's avatar
      new helper: traverse_mounts() · 9deed3eb
      Al Viro authored
      common guts of follow_down() and follow_managed() taken to a new
      helper - traverse_mounts().  The remnants of follow_managed()
      are folded into its sole remaining caller (handle_mounts()).
      Calling conventions of handle_mounts() slightly sanitized -
      instead of the weird "1 for success, -E... for failure" that used
      to be imposed by the calling conventions of walk_component() et al.,
      we can use the normal "0 for success, -E... for failure".
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      9deed3eb
    • Al Viro's avatar
      massage __follow_mount_rcu() a bit · ea936aeb
      Al Viro authored
      make the loop more similar to that in follow_managed(), with
      explicit tracking of flags, etc.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      ea936aeb
    • Al Viro's avatar
      namei: have link_path_walk() maintain LOOKUP_PARENT · c108837e
      Al Viro authored
      set on entry, clear when we get to the last component.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      c108837e
    • Al Viro's avatar
      link_path_walk(): simplify stack handling · d8d4611a
      Al Viro authored
      We use nd->stack to store two things: pinning down the symlinks
      we are resolving and resuming the name traversal when a nested
      symlink is finished.
      
      Currently, nd->depth is used to keep track of both.  It's 0 when
      we call link_path_walk() for the first time (for the pathname
      itself) and 1 on all subsequent calls (for trailing symlinks,
      if any).  That's fine, as far as pinning symlinks goes - when
      handling a trailing symlink, the string we are interpreting
      is the body of symlink pinned down in nd->stack[0].  It's
      rather inconvenient with respect to handling nested symlinks,
      though - when we run out of a string we are currently interpreting,
      we need to decide whether it's a nested symlink (in which case
      we need to pick the string saved back when we started to interpret
      that nested symlink and resume its traversal) or not (in which
      case we are done with link_path_walk()).
      
      Current solution is a bit of a kludge - in handling of trailing symlink
      (in lookup_last() and open_last_lookups()) we clear nd->stack[0].name.
      That allows link_path_walk() to use the following rules when
      running out of a string to interpret:
      	* if nd->depth is zero, we are at the end of pathname itself.
      	* if nd->depth is positive, check the saved string; for
      nested symlink it will be non-NULL, for trailing symlink - NULL.
      
      It works, but it's rather non-obvious.  Note that we have two sets:
      the set of symlinks currently being traversed and the set of postponed
      pathname tails.  The former is stored in nd->stack[0..nd->depth-1].link
      and it's valid throughout the pathname resolution; the latter is valid only
      during an individual call of link_path_walk() and it occupies
      nd->stack[0..nd->depth-1].name for the first call of link_path_walk() and
      nd->stack[1..nd->depth-1].name for subsequent ones.  The kludge is basically
      a way to recognize the second set becoming empty.
      
      The things get simpler if we keep track of the second set's size
      explicitly and always store it in nd->stack[0..depth-1].name.
      We access the second set only inside link_path_walk(), so its
      size can live in a local variable; that way the check becomes
      trivial without the need of that kludge.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      d8d4611a
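The idea in d8d4611a, tracking the number of postponed pathname tails in a local variable, can be illustrated with a toy walker. Everything here is an assumption made for the sketch: `toy_link`, `toy_link_body()` and `toy_link_path_walk()` are hypothetical, symlinks are matched by bare component name regardless of directory, and the "walk" merely records the non-symlink components it reaches. The point is only the end-of-string check: with an explicit local `depth`, deciding between "resume a saved tail" and "done" is a single comparison.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_NESTING 8	/* arbitrary limit for the sketch */

struct toy_link {
	const char *name;	/* component that is a symlink */
	const char *body;	/* what it points to */
};

static const char *toy_link_body(const struct toy_link *links, size_t nlinks,
				 const char *comp, size_t len)
{
	for (size_t i = 0; i < nlinks; i++)
		if (strlen(links[i].name) == len &&
		    memcmp(links[i].name, comp, len) == 0)
			return links[i].body;
	return NULL;
}

/* Writes the '/'-joined non-symlink components into out; 0 or -E... */
static int toy_link_path_walk(const char *name,
			      const struct toy_link *links, size_t nlinks,
			      char *out, size_t outsz)
{
	const char *saved[TOY_MAX_NESTING];	/* postponed pathname tails */
	int depth = 0;				/* how many tails are saved */
	size_t used = 0;

	if (outsz == 0)
		return -ENAMETOOLONG;
	out[0] = '\0';
	for (;;) {
		while (*name == '/')
			name++;
		if (!*name) {
			if (!depth)
				return 0;	/* end of the pathname itself */
			name = saved[--depth];	/* resume after nested symlink */
			continue;
		}
		size_t len = strcspn(name, "/");
		const char *body = toy_link_body(links, nlinks, name, len);
		if (body) {
			if (depth == TOY_MAX_NESTING)
				return -ELOOP;
			saved[depth++] = name + len;	/* postpone the tail */
			name = body;
			continue;
		}
		if (used + len + (used ? 1 : 0) + 1 > outsz)
			return -ENAMETOOLONG;
		if (used)
			out[used++] = '/';
		memcpy(out + used, name, len);
		used += len;
		out[used] = '\0';
		name += len;
	}
}
```

For instance, with links `a -> b/c` and `b -> x`, walking `a/d` records `x/c/d`: two tails are postponed, then resumed in reverse order as each nested body runs out.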
    • Al Viro's avatar
      b1a81972
    • Al Viro's avatar
      namei: invert the meaning of WALK_FOLLOW · 8c4efe22
      Al Viro authored
      old flags & WALK_FOLLOW <=> new !(flags & WALK_TRAILING)
      That's what that flag had really been used for.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      8c4efe22
    • Al Viro's avatar
      sanitize handling of nd->last_type, kill LAST_BIND · b4c03536
      Al Viro authored
      ->last_type values are set in 3 places: path_init() (sets to LAST_ROOT),
      link_path_walk (LAST_NORM/DOT/DOTDOT) and pick_link (LAST_BIND).
      
      They are checked in walk_component(), lookup_last() and do_last().
      They also get copied to the caller by filename_parentat().  In the last
      3 cases the value is what we had at the return from link_path_walk().
      In case of walk_component() it's either directly downstream from
      assignment in link_path_walk() or, when called by lookup_last(), the
      value we have at the return from link_path_walk().
      
      The value at the entry into link_path_walk() can survive to return only
      if the pathname contains nothing but slashes.  Note that pick_link()
      never returns such - pure jumps are handled directly.  So for the calls
      of link_path_walk() for trailing symlinks it does not matter what value
      had been there at the entry; the value at the return won't depend upon it.
      
      There are 3 call chains that might have pick_link() storing LAST_BIND:
      
      1) pick_link() from step_into() from walk_component() from
      link_path_walk().  In that case we will either be parsing the next
      component immediately after return into link_path_walk(), which will
      overwrite the ->last_type before anyone has a chance to look at it,
      or we'll fail, in which case nobody will be looking at ->last_type at all.
      
      2) pick_link() from step_into() from walk_component() from lookup_last().
      The value is never looked at due to the above; it won't affect the value
      seen at return from any link_path_walk().
      
      3) pick_link() from step_into() from do_last().  Ditto.
      
      In other words, assignment in pick_link() is pointless, and so is
      LAST_BIND itself; nothing ever looks at that value.  Kill it off.
      And make link_path_walk() _always_ assign ->last_type - in the only
      case when the value at the entry might survive to the return that value
      is always LAST_ROOT, inherited from path_init().  Move that assignment
      from path_init() into the beginning of link_path_walk(), to consolidate
      the things.
      
      Historical note: LAST_BIND used to be used for the kludge with trailing
      pure jump symlinks (extra iteration through the top-level loop).
      No point keeping it anymore...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      b4c03536
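The end state argued for in b4c03536, where the walk itself always assigns the last-component type and the initial "root" value survives only for a pathname made of slashes, can be sketched in a few lines. The enum and `toy_walk_last_type()` below are hypothetical names for illustration; the real function does far more than classify components.

```c
#include <stddef.h>
#include <string.h>

/* Toy counterparts of the remaining LAST_* values (no LAST_BIND). */
enum toy_last_type {
	TOY_LAST_NORM,
	TOY_LAST_ROOT,
	TOY_LAST_DOT,
	TOY_LAST_DOTDOT
};

/* Returns the type of the last component seen while walking name. */
static enum toy_last_type toy_walk_last_type(const char *name)
{
	/* Assigned at the start of the walk; survives only if the
	 * pathname contains nothing but slashes. */
	enum toy_last_type type = TOY_LAST_ROOT;

	for (;;) {
		while (*name == '/')
			name++;
		if (!*name)
			return type;
		size_t len = strcspn(name, "/");
		if (len == 1 && name[0] == '.')
			type = TOY_LAST_DOT;
		else if (len == 2 && name[0] == '.' && name[1] == '.')
			type = TOY_LAST_DOTDOT;
		else
			type = TOY_LAST_NORM;
		name += len;
	}
}
```

Because every component overwrites `type`, whatever value was present on entry cannot influence the result once a single component has been parsed, which is why a separate marker for "just followed a symlink" has nothing left to do.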
    • Al Viro's avatar
      finally fold get_link() into pick_link() · ad6cc4c3
      Al Viro authored
      kill nd->link_inode, while we are at it
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      ad6cc4c3
    • Al Viro's avatar
      merging pick_link() with get_link(), part 6 · 06708adb
      Al Viro authored
      move the only remaining call of get_link() into pick_link()
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      06708adb
    • Al Viro's avatar
      merging pick_link() with get_link(), part 5 · b0417d2c
      Al Viro authored
      move get_link() call into step_into().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      b0417d2c
    • Al Viro's avatar
      merging pick_link() with get_link(), part 4 · 92d27016
      Al Viro authored
      Move the call of get_link() into walk_component().  Change the
      calling conventions for walk_component() to returning the link
      body to follow (if any).
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      92d27016
    • Al Viro's avatar
      merging pick_link() with get_link(), part 3 · 40fcf5a9
      Al Viro authored
      After a pure jump ("/" or procfs-style symlink) we don't need to
      hold the link anymore.  link_path_walk() dropped it if such case
      had been detected, lookup_last/do_last() (i.e. old trailing_symlink())
      left it on the stack - it ended up calling terminate_walk() shortly
      anyway, which would've purged the entire stack.
      
      Do it in get_link() itself instead.  Simpler logic that way...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      40fcf5a9
    • Al Viro's avatar
      merging pick_link() with get_link(), part 2 · 1ccac622
      Al Viro authored
      Fold trailing_symlink() into lookup_last() and do_last(), change
      the calling conventions of those two.  Rules change:
      	success, we are done => NULL instead of 0
      	error	=> ERR_PTR(-E...) instead of -E...
      	got a symlink to follow => return the path to be followed instead of 1
      
      The loops calling those (in path_lookupat() and path_openat()) adjusted.
      
      A subtle change of control flow here: originally a pure-jump trailing
      symlink ("/" or procfs one) would've passed through the upper level
      loop once more, with "" for path to traverse.  That would've brought
      us back to the lookup_last/do_last entry and we would've hit LAST_BIND
      case (LAST_BIND left from get_link() called by trailing_symlink())
      and pretty much skipped to the point right after where we'd left the
      sucker back when we picked that trailing symlink.
      
      Now we don't bother with that extra pass through the upper level
      loop - if get_link() says "I've just done a pure jump, nothing
      else to do", we just treat that as non-symlink case.
      
      Boilerplate added on that step will go away shortly - it'll migrate
      into walk_component() and then to step_into(), collapsing into the
      change of calling conventions for those.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      1ccac622
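The three-way return convention introduced in 1ccac622 (NULL for "done", an encoded error pointer for failure, a string for "symlink to follow") can be sketched in userspace. The `toy_err_ptr()`, `toy_is_err()` and `toy_ptr_err()` helpers below are local imitations written for this example, not the kernel's macros, and `toy_lookup_last()` and `toy_resolve()` are hypothetical: the table lookup stands in for the real last-component handling, and the hop limit is an arbitrary choice.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_ERRNO 4095	/* errors live in the top 4095 addresses */
#define TOY_MAX_HOPS 40		/* arbitrary limit for the sketch */

static inline const char *toy_err_ptr(long err)
{
	return (const char *)(intptr_t)err;
}

static inline int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

static inline long toy_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

struct toy_entry {
	const char *name;
	const char *link_body;	/* NULL if the entry is not a symlink */
};

/* NULL: done; toy_err_ptr(-E...): error; otherwise: body to follow. */
static const char *toy_lookup_last(const struct toy_entry *tab, size_t n,
				   const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0)
			return tab[i].link_body;
	return toy_err_ptr(-ENOENT);
}

/* Caller loop: keep going while there is a trailing symlink to follow. */
static int toy_resolve(const struct toy_entry *tab, size_t n,
		       const char *name, const char **final)
{
	const char *s;
	int hops = 0;

	while ((s = toy_lookup_last(tab, n, name)) != NULL) {
		if (toy_is_err(s))
			return (int)toy_ptr_err(s);
		if (++hops > TOY_MAX_HOPS)
			return -ELOOP;
		name = s;	/* walk the symlink body next */
	}
	*final = name;
	return 0;
}
```

A single pointer return replaces the old "0, 1 or -E..." integer plus a separately fetched link body, so the calling loop needs no extra state to know what to traverse next.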
    • Al Viro's avatar
      merging pick_link() with get_link(), part 1 · 43679723
      Al Viro authored
      Move restoring LOOKUP_PARENT and zeroing nd->stack[0].name past
      the call of get_link() (nothing _currently_ uses them in there).
      That allows moving the call of may_follow_link() into get_link()
      as well, since now the presence of LOOKUP_PARENT distinguishes
      the callers from each other (link_path_walk() has it, trailing_symlink()
      doesn't).
      
      Preparations for folding trailing_symlink() into callers (lookup_last()
      and do_last()) and changing the calling conventions of those.  Next
      stage after that will have get_link() call migrate into walk_component(),
      then - into step_into().  It's tricky enough to warrant doing that
      in stages, unfortunately...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      43679723
    • Al Viro's avatar
      a9dc1494
    • Al Viro's avatar
      LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() · 161aff1d
      Al Viro authored
      New LOOKUP flag, telling path_lookupat() to act as path_mountpointat().
      IOW, traverse mounts at the final point and skip revalidation of the
      location where it ends up.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      161aff1d
    • Al Viro's avatar
      fold handle_mounts() into step_into() · cbae4d12
      Al Viro authored
      The following is true:
      	* calls of handle_mounts() and step_into() are always
      paired in sequences like
      	err = handle_mounts(nd, dentry, &path, &inode, &seq);
      	if (unlikely(err < 0))
      		return err;
      	err = step_into(nd, &path, flags, inode, seq);
      	* in all such sequences path is uninitialized before and
      unused after this pair of calls
      	* in all such sequences inode and seq are unused afterwards.
      
      So the call of handle_mounts() can be shifted inside step_into(),
      turning 'path' into a local variable in the combined function.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      cbae4d12
    • Al Viro's avatar
      new step_into() flag: WALK_NOFOLLOW · aca2903e
      Al Viro authored
      Tells step_into() not to follow symlinks, regardless of LOOKUP_FOLLOW.
      Allows switching handle_lookup_down() to use of step_into(), getting
      all follow_managed() and step_into() calls paired.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      aca2903e
    • Al Viro's avatar
      step_into() callers: dismiss the symlink earlier · 56676ec3
      Al Viro authored
      We need to dismiss a symlink when we are done traversing it;
      currently that's done when we call step_into() for its last
      component.  For the cases when we do not call step_into()
      for that component (i.e. when it's . or ..) we do the same
      symlink dismissal after the call of handle_dots().
      
      What we need to guarantee is that the symlink won't be dismissed
      while we are still using nd->last.name - it's pointing into the
      body of said symlink.  step_into() is sufficiently late - by
      the time it's called we'd already obtained the dentry, so the
      name we'd been looking up is no longer needed.  However, it
      turns out to be cleaner to have that ("we are done with that
      component now, can dismiss the link") done explicitly - in the
      callers of step_into().
      
      In handle_dots() case we won't be using the component string
      at all, so for . and .. the corresponding point is actually
      _before_ the call of handle_dots(), not after it.
      
      Fix a minor irregularity in do_last(), while we are at it -
      if trailing symlink ended with . or .. we forgot to dismiss
      it.  Not a problem, since nameidata is about to be done with
      (neither . nor .. can be a trailing symlink, so this is the
      last iteration through the loop) and terminate_walk() will
      clean the stack anyway, but let's keep it more regular.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      56676ec3
    • Al Viro's avatar
      lookup_fast(): take mount traversal into callers · 20e34357
      Al Viro authored
      Current calling conventions: -E... on error, 0 on cache miss,
      result of handle_mounts(nd, dentry, path, inode, seqp) on
      success.  Turn that into returning ERR_PTR(-E...), NULL and dentry
      resp.; deal with handle_mounts() in the callers.  The thing
      is, they already do that in cache miss handling case, so we
      just need to supply dentry to them and unify the mount traversal
      in those cases.  Fewer arguments that way, and we get closer
      to merging handle_mounts() and step_into().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      20e34357
    • Al Viro's avatar
      teach handle_mounts() to handle RCU mode · c153007b
      Al Viro authored
      ... and make the callers of __follow_mount_rcu() use handle_mounts().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      c153007b
    • Al Viro's avatar
      lookup_fast(): consolidate the RCU success case · b023e172
      Al Viro authored
      1) in case of __follow_mount_rcu() failure, lookup_fast() proceeds
      to call unlazy_child() and, should it succeed, handle_mounts().
      Note that we have status > 0 (or we wouldn't be calling
      __follow_mount_rcu() at all), so all stuff conditional upon
      non-positive status won't be even touched.
      
      Consolidate just that sequence after the call of __follow_mount_rcu().
      
      2) calling d_is_negative() and keeping its result is pointless -
      we either don't get past checking ->d_seq (and don't use the results of
      d_is_negative() at all), or we are guaranteed that ->d_inode and
      type bits of ->d_flags had been consistent at the time of d_is_negative()
      call.  IOW, we could only get to the use of its result if it's
      equal to !inode.  The same ->d_seq check guarantees that after that point
      this CPU won't observe ->d_flags values older than ->d_inode update.
      So 'negative' variable is completely pointless these days.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      b023e172