- 02 Apr, 2020 21 commits
-
-
Al Viro authored
We fall back to lookup+create (instead of atomic_open) in several cases: 1) we don't have write access to filesystem and O_TRUNC is present in the flags. It's not something we want ->atomic_open() to see - it just might go ahead and truncate the file. However, we can pass it the flags sans O_TRUNC - eventually do_open() will call handle_truncate() anyway. 2) we have O_CREAT | O_EXCL and we can't write to parent. That's going to be an error, of course, but we want to know _which_ error should that be - might be EEXIST (if file exists), might be EACCES or EROFS. Simply stripping O_CREAT (and checking if we see ENOENT) would suffice, if not for O_EXCL. However, we used to have ->atomic_open() fully responsible for rejecting O_CREAT | O_EXCL on existing file and just stripping O_CREAT would've disarmed those checks. With nothing downstream to catch the problem - FMODE_OPENED used to be "don't bother with EEXIST checks, ->atomic_open() has done those". Now EEXIST checks downstream are skipped only if FMODE_CREATED is set - FMODE_OPENED alone is not enough. That has eliminated the need to fall back onto lookup+create path in this case. 3) O_WRONLY or O_RDWR when we have no write access to filesystem, with nothing else objectionable. Fallback is (and had always been) pointless. IOW, we don't really need that fallback; all we need in such cases is to trim O_TRUNC and O_CREAT properly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
argument had been unused since 1643b43f (lookup_open(): lift the "fallback to !O_CREAT" logics from atomic_open()) back in 2016 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Currently path_openat() has "EEXIST on O_EXCL|O_CREAT" checks done on one of the ways out of open_last_lookups(). There are 4 cases: 1) the last component is . or ..; check is not done. 2) we had FMODE_OPENED or FMODE_CREATED set while in lookup_open(); check is not done. 3) symlink to be traversed is found; check is not done (nor should it be) 4) everything else: check done (before complete_walk(), even). In case (1) O_EXCL|O_CREAT ends up failing with -EISDIR - that's open("/tmp/.", O_CREAT|O_EXCL, 0600) Note that in the same conditions open("/tmp", O_CREAT|O_EXCL, 0600) would have yielded EEXIST. Either error is allowed, switching to -EEXIST in these cases would've been more consistent. Case (2) is more subtle; first of all, if we have FMODE_CREATED set, the object hadn't existed prior to the call. The check should not be done in such a case. The rest is problematic, though - we have FMODE_OPENED set (i.e. it went through ->atomic_open() and got successfully opened there) FMODE_CREATED is *NOT* set O_CREAT and O_EXCL are both set. Any such case is a bug - either we failed to set FMODE_CREATED when we had, in fact, created an object (no such instances in the tree) or we have opened a pre-existing file despite having had both O_CREAT and O_EXCL passed. One of those was, in fact caught (and fixed) while sorting out this mess (gfs2 on cold dcache). And in such situations we should fail with EEXIST. Note that for (1) and (4) FMODE_CREATED is not set - for (1) there's nothing in handle_dots() to set it, for (4) we'd explicitly checked that. And (1), (2) and (4) are exactly the cases when we leave the loop in the caller, with do_open() called immediately after that loop. IOW, we can move the check over there, and make it If we have O_CREAT|O_EXCL and after successful pathname resolution FMODE_CREATED is *not* set, we must have run into a preexisting file and should fail with EEXIST. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
now we can have open_last_lookups() directly from the loop in path_openat() - the rest of do_last() never returns a symlink to follow, so we can bloody well leave the loop first. Rename the rest of that thing from do_last() to do_open() and make it return an int. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
... and adjust the caller (reserve_stack()). Rename to nd_alloc_stack(), while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
expand the call of nd_alloc_stack() into it (and don't recheck the depth on the second call) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
pick_link() needs to push onto stack; we start with using two-element array embedded into struct nameidata and the first time we need more than that we switch to separately allocated array. Allocation can fail, of course, and handling of that would be simple enough - we need to drop 'link' and bugger off. However, the things get more complicated in RCU mode. There we must do GFP_ATOMIC allocation. If that fails, we try to switch to non-RCU mode and repeat the allocation. To switch to non-RCU mode we need to grab references to 'link' and to everything in nameidata. The latter done by unlazy_walk(); the former - legitimize_path(). 'link' must go first - after unlazy_walk() we are out of RCU-critical period and it's too late to call legitimize_path() since the references in link->mnt and link->dentry might be pointing to freed and reused memory. So we do legitimize_path(), then unlazy_walk(). And that's where it gets too subtle: what to do if the former fails? We MUST do path_put(link) to avoid leaks. And we can't do that under rcu_read_lock(). Solution in mainline was to empty then nameidata manually, drop out of RCU mode and then do put_path(). In effect, we open-code the things eventual terminate_walk() would've done on error in RCU mode. That looks badly out of place and confusing. We could add a comment along the lines of the explanation above, but... there's a simpler solution. Call unlazy_walk() even if legitimaze_path() fails. It will take us out of RCU mode, so we'll be able to do path_put(link). Yes, it will do unnecessary work - attempt to grab references on the stuff in nameidata, only to have them dropped as soon as we return the error to upper layer and get terminate_walk() called there. So what? We are thoroughly off the fast path by that point - we had GFP_ATOMIC allocation fail, we had ->d_seq or mount_lock mismatch and we are about to try walking the same path from scratch in non-RCU mode. Which will need to do the same allocation, this time with GFP_KERNEL, so it will be able to apply memory pressure for blocking stuff. Compared to that the cost of several lockref_get_not_dead() is noise. And the logics become much easier to understand that way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
step_into() tries to avoid grabbing and dropping mount references on the steps that do not involve crossing mountpoints (which is obviously the majority of cases). So it uses a local struct path with unusual refcounting rules - path.mnt is pinned if and only if it's not equal to nd->path.mnt. We used to have similar beasts all over the place and we had quite a few bugs crop up in their handling - it's easy to get confused when changing e.g. cleanup on failure exits (or adding a new check, etc.) Now that's mostly gone - the step_into() instance (which is what we need them for) is the only one left. It is exposed to mount traversal and it's (shortly) seen by pick_link(). Since pick_link() needs to store it in link stack, where the normal rules apply, it has to make sure that mount is pinned regardless of nd->path.mnt value. That's done on all calls of pick_link() and very early in those. Let's do that in the caller (step_into()) instead - that way the fewer places need to be aware of such struct path instances. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
The only remaining caller (path_pts()) should be using follow_down() anyway. And clean path_pts() a bit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
new helper: choose_mountpoint(). Wrapper around choose_mountpoint_rcu(), similar to lookup_mnt() vs. __lookup_mnt(). follow_dotdot() switched to it. Now we don't grab mount_lock exclusive anymore; note that the primitive used non-RCU mount traversals in other direction (lookup_mnt()) doesn't bother with that either - it uses mount_lock seqcount instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
The loops in follow_dotdot{_rcu()} are doing the same thing: we have a mount and we want to find out how far up the chain of mounts do we need to go. We follow the chain of mount until we find one that is not directly overmounting the root of another mount. If such a mount is found, we want the location it's mounted upon. If we run out of chain (i.e. get to a mount that is not mounted on anything else) or run into process' root, we report failure. On success, we want (in RCU case) d_seq of resulting location sampled or (in non-RCU case) references to that location acquired. This commit introduces such primitive for RCU case and switches follow_dotdot_rcu() to it; non-RCU case will be go in the next commit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Change nd->path only after the loop is done and only in case we hadn't ended up finding ourselves in root. Same for NO_XDEV check. That separates the "check how far back do we need to go through the mount stack" logics from the rest of .. traversal. NOTE: path_get/path_put introduced here are temporary. They will go away later in the series. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Change nd->path only after the loop is done and only in case we hadn't ended up finding ourselves in root. Same for NO_XDEV check. Don't recheck mount_lock on each step either. That separates the "check how far back do we need to go through the mount stack" logics from the rest of .. traversal. Note that the sequence for d_seq/d_inode here is * sample mount_lock seqcount ... * sample d_seq * fetch d_inode * verify mount_lock seqcount The last step makes sure that d_inode value we'd got matches d_seq - it dentry is guaranteed to have been a mountpoint through the entire thing, so its d_inode must have been stable. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
The logics in both of them is the same: while true if in process' root // uncommon break if *not* in mount root // normal case find the parent return if at absolute root // very uncommon break move to underlying mountpoint report that we are in root Pull the common path out of the loop: if in process' root // uncommon goto in_root if unlikely(in mount root) while true if at absolute root goto in_root move to underlying mountpoint if in process' root goto in_root if in mount root break; find the parent // we are not in mount root return in_root: report that we are in root The reason for that transformation is that we get to keep the common path straight *and* get a separate block for "move through underlying mountpoints", which will allow to sanitize NO_XDEV handling there. What's more, the pared-down loops will be easier to deal with - in particular, non-RCU case has no need to grab mount_lock and rewriting it to the form that wouldn't do that is a non-trivial change. Better do that with less stuff getting in the way... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
lift step_into() into handle_dots() (where they merge with each other); have follow_... return dentry and pass inode/seq to the caller. [braino fix folded; kudos to Qian Cai <cai@lca.pw> for reporting it] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 14 Mar, 2020 19 commits
-
-
Al Viro authored
gets the regular mount crossing on result of .. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Right now the tail ends of follow_dotdot{,_rcu}() are pretty much the open-coded analogues of step_into(). The differences: * the lack of proper LOOKUP_NO_XDEV handling in non-RCU case (arguably a bug) * the lack of ->d_manage() handling (again, arguably a bug) Adjust the calling conventions so that on the next step with could just switch those functions to returning step_into(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
pure move; we are going to have step_into() called by that bunch. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Behaviour change: LOOKUP_BENEATH lookup of .. in absolute root yields an error even if it's not the process' root. That's possible only if you'd managed to escape chroot jail by way of procfs symlinks, but IMO the resulting behaviour is not worse - more consistent and easier to describe: ".." in root is "stay where you are", uness LOOKUP_BENEATH has been given, in which case it's "fail with EXDEV". Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Instead of returning 0, return new dentry; instead of returning -ENOENT, return NULL. Adjust the callers accordingly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
eventually we'll want to do that check *before* mangling nd->path.dentry... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
... getting may_create_in_sticky() checks in FMODE_OPENED case as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Don't mess with got_write there - it is guaranteed to be false on entry and it will be set true if and only if we decide to go for truncation and manage to get write access for that. Don't carry acc_mode through the entire thing - it's only used in that part. And don't bother with gotos in there - compiler is quite capable of optimizing that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
it's easier to drop it right after lookup_open() and regain if needed (i.e. if we will need to truncate). On the non-FMODE_OPENED path we do that anyway. In case of FMODE_CREATED we won't be needing it. And it's easier to prove correctness that way, especially since the initial failure to get write access is not always fatal; proving that we'll never end up truncating in that case is rather convoluted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
have FMODE_OPENED case rejoin the main path at earlier point Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
there we'll be able to merge it with its counterparts in other cases, and there's no reason to do it before the parent has been unlocked Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
->atomic_open() might have used a different alias than the one we'd passed to it; in "not opened" case we take care of that, in "opened" one we don't. Currently we don't care downstream of "opened" case which alias to return; however, that will change shortly when we get to unifying may_open() calls. It's not hard to get right in all cases, anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
common guts of follow_down() and follow_managed() taken to a new helper - traverse_mounts(). The remnants of follow_managed() are folded into its sole remaining caller (handle_mounts()). Calling conventions of handle_mounts() slightly sanitized - instead of the weird "1 for success, -E... for failure" that used to be imposed by the calling conventions of walk_component() et.al. we can use the normal "0 for success, -E... for failure". Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
make the loop more similar to that in follow_managed(), with explicit tracking of flags, etc. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
set on entry, clear when we get to the last component. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-