- 25 Nov, 2023 32 commits
-
-
Vegard Nossum authored
dget_dlock() requires dentry->d_lock to be held when called, yet contains a NULL check for dentry.

An audit of all calls to dget_dlock() shows that it is never called with a NULL pointer (as spin_lock()/spin_unlock() would crash in these cases):

    $ git grep -W '\<dget_dlock\>'
    arch/powerpc/platforms/cell/spufs/inode.c-      spin_lock(&dentry->d_lock);
    arch/powerpc/platforms/cell/spufs/inode.c-      if (simple_positive(dentry)) {
    arch/powerpc/platforms/cell/spufs/inode.c:              dget_dlock(dentry);
    fs/autofs/expire.c-             spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
    fs/autofs/expire.c-             if (simple_positive(child)) {
    fs/autofs/expire.c:                     dget_dlock(child);
    fs/autofs/root.c:               dget_dlock(active);
    fs/autofs/root.c-               spin_unlock(&active->d_lock);
    fs/autofs/root.c:               dget_dlock(expiring);
    fs/autofs/root.c-               spin_unlock(&expiring->d_lock);
    fs/ceph/dir.c-          if (!spin_trylock(&dentry->d_lock))
    fs/ceph/dir.c-                  continue;
    [...]
    fs/ceph/dir.c:          dget_dlock(dentry);
    fs/ceph/mds_client.c-           spin_lock(&alias->d_lock);
    [...]
    fs/ceph/mds_client.c:           dn = dget_dlock(alias);
    fs/configfs/inode.c-    spin_lock(&dentry->d_lock);
    fs/configfs/inode.c-    if (simple_positive(dentry)) {
    fs/configfs/inode.c:            dget_dlock(dentry);
    fs/libfs.c:                     found = dget_dlock(d);
    fs/libfs.c-                     spin_unlock(&d->d_lock);
    fs/libfs.c:                     found = dget_dlock(child);
    fs/libfs.c-                     spin_unlock(&child->d_lock);
    fs/libfs.c:                     child = dget_dlock(d);
    fs/libfs.c-                     spin_unlock(&d->d_lock);
    fs/ocfs2/dcache.c:              dget_dlock(dentry);
    fs/ocfs2/dcache.c-              spin_unlock(&dentry->d_lock);
    include/linux/dcache.h:static inline struct dentry *dget_dlock(struct dentry *dentry)

After taking out the NULL check, dget_dlock() becomes almost identical to __dget_dlock(); the only difference is that dget_dlock() returns the dentry that was passed in. These are static inline helpers, so we can rely on the compiler to discard unused return values. We can therefore also remove __dget_dlock() and replace calls to it with dget_dlock().

Also fix up and improve the kerneldoc comments while we're at it.

Al Viro pointed out that we can also clean up some of the callers to make use of the returned value, and provided a bit more info for the kerneldoc.

While preparing v2 I also noticed that the tabs used in the kerneldoc comments were causing the kerneldoc to get parsed incorrectly, so I also fixed this up (including for d_unhashed, which is otherwise unrelated).

Testing: x86 defconfig build + boot; make htmldocs for the kerneldoc warning. objdump shows there are code generation changes.

Link: https://lore.kernel.org/all/20231022164520.915013-1-vegard.nossum@oracle.com/
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by:
Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
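For reference, the cleaned-up helper described above boils down to something like this (a sketch of include/linux/dcache.h after the change, not a verbatim copy of the patch):

    static inline struct dentry *dget_dlock(struct dentry *dentry)
    {
            /* caller must already hold dentry->d_lock */
            dentry->d_lockref.count++;
            return dentry;
    }

Former __dget_dlock() callers simply ignore the returned pointer; since this is a static inline, the unused return value costs nothing.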
-
Al Viro authored
With the new ordering in __dentry_kill() it has become redundant - it's set if and only if both DCACHE_DENTRY_KILLED and DCACHE_SHRINK_LIST are set. We set it in __dentry_kill(), after having set DCACHE_DENTRY_KILLED, with the only condition being that DCACHE_SHRINK_LIST is there; all of that is done without dropping ->d_lock, and the only place that checks that flag (shrink_dentry_list()) does so under ->d_lock, after having found the victim on its shrink list. Since DCACHE_SHRINK_LIST is set only when placing a dentry onto a shrink list and removed only by shrink_dentry_list() itself, a check for DCACHE_DENTRY_KILLED there is equivalent to a check for DCACHE_MAY_FREE. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
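Concretely, the equivalence argued above means the test in shrink_dentry_list() can be written roughly as follows (a sketch; 'can_free' is just an illustrative local name):

    /*
     * Under ->d_lock, for a dentry we found on our own shrink list, so
     * DCACHE_SHRINK_LIST is known to be set; DCACHE_DENTRY_KILLED alone
     * now tells us that __dentry_kill() has already run and left the
     * empty husk for us to free.
     */
    bool can_free = dentry->d_flags & DCACHE_DENTRY_KILLED;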
-
Al Viro authored
-
Al Viro authored
... and hasn't since 2015. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
We only search in the damn thing under hlist_bl_lock(); RCU variant of insertion was, IIRC, pretty much cargo-culted - mea culpa... Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
... now that we never call d_genocide() other than from kill_litter_super() Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Failing ->fill_super() will be followed by ->kill_sb(), which should include kill_litter_super() if the call of simple_fill_super() had been asked to create anything besides the root dentry. So there's no need to empty the partially populated tree - it will be trimmed by inevitable kill_litter_super(). Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
-
Al Viro authored
Normally d_make_root() is used to create the root dentry of superblock; here we use it for a different purpose, but... idiomatic or not, we need the same operation. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
now that the only user of d_instantiate_anon() is gone... [braino fix folded - kudos to Dan Carpenter] Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
fast_dput() contains a small piece of code, preceded by scary comments about 5 times longer than the code itself. What is actually done there is a trimmed-down subset of retain_dentry() - in some situations we can tell that retain_dentry() would have returned true without ever needing ->d_lock, and that's what that code checks. If these checks come out true, fast_dput() can declare that we are done, without bothering with ->d_lock; otherwise it has to take the lock and do the full variant of the retain_dentry() checks.

The trimmed-down variant of the checks is hard to follow and is asking for trouble - if we ever decide to change the rules in retain_dentry(), we'll have to remember to update that code. It turns out that an equivalent variant of these checks, more obviously parallel to retain_dentry(), is not just possible but easy to unify with retain_dentry() itself, by passing it a new boolean argument ('locked') to distinguish between the full semantics and the trimmed-down one. Note that in the lockless case true is returned only when the locked variant would have returned true without ever needing the lock; false means "punt to the locking path and recheck there". Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
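The resulting calling pattern looks roughly like this (an illustrative sketch of the check/punt structure, not the verbatim fast_dput() body):

    /* lockless attempt: a "true" answer is only given when the locked
     * variant would also have said "keep it" without needing the lock */
    if (retain_dentry(dentry, false))
            return true;            /* done, ->d_lock never taken */

    /* otherwise punt: take the lock and redo the full check */
    spin_lock(&dentry->d_lock);
    if (retain_dentry(dentry, true)) {
            spin_unlock(&dentry->d_lock);
            return true;
    }
    /* fall through to the killing path with ->d_lock held */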
-
Al Viro authored
Currently we enter __dentry_kill() with parent (along with the victim dentry and victim's inode) held locked. Then we

    mark dentry refcount as dead
    call ->d_prune()
    remove dentry from hash
    remove it from the parent's list of children
    unlock the parent, don't need it from that point on
    detach dentry from inode, unlock dentry and drop the inode (via ->d_iput())
    call ->d_release()
    regain the lock on dentry
    check if it's on a shrink list (in which case freeing its empty husk has to be left to shrink_dentry_list()) or not (in which case we can free it ourselves). In the former case, mark it as an empty husk, so that shrink_dentry_list() would know it can free the sucker.
    drop the lock on dentry

... and usually the caller proceeds to drop a reference on the parent, possibly retaking the lock on it.

That is painful for a bunch of reasons, starting with the need to take locks out of order, but not limited to that - the parent of positive dentry can change if we drop its ->d_lock, so getting these locks has to be done with care. Moreover, as soon as dentry is out of the parent's list of children, shrink_dcache_for_umount() won't see it anymore, making it appear as if the parent is inexplicably busy. We do work around that by having shrink_dentry_list() decrement the parent's refcount first and put it on shrink list to be evicted once we are done with __dentry_kill() of child, but that may in some cases lead to ->d_iput() on child called after the parent got killed. That doesn't happen in cases where in-tree ->d_iput() instances might want to look at the parent, but that's brittle as hell.

Solution: do removal from the parent's list of children in the very end of __dentry_kill(). As the result, the callers do not need to lock the parent and by the time we really need the parent locked, dentry is negative and is guaranteed not to be moved around.

It does mean that ->d_prune() will be called with parent not locked. It also means that we might see dentries in process of being torn down while going through the parent's list of children; those dentries will be unhashed, negative and with refcount marked dead. In practice, that's enough for in-tree code that looks through the list of children to do the right thing as-is. Out-of-tree code might need to be adjusted.

Calling conventions: __dentry_kill(dentry) is called with dentry->d_lock held, along with ->i_lock of its inode (if any). It either returns the parent (locked, with refcount decremented to 0) or NULL (if there'd been no parent or if refcount decrement for parent hadn't reached 0).

lock_for_kill() is adjusted for new requirements - it doesn't touch the parent's ->d_lock at all. Callers adjusted.

Note that for dput() we don't need to bother with fast_dput() for the parent - we just need to check retain_dentry() for it, since its ->d_lock is still held since the moment when __dentry_kill() had taken it to remove the victim from the list of children.

The kludge with early decrement of parent's refcount in shrink_dentry_list() is no longer needed - shrink_dcache_for_umount() sees the half-killed dentries in the list of children for as long as they are pinning the parent. They are easily recognized and accounted for by select_collect(), so we know we are not done yet. As the result, we always have the expected ordering for ->d_iput()/->d_release() vs. __dentry_kill() of the parent, no exceptions.
Moreover, the current rules for shrink lists (one must make sure that shrink_dcache_for_umount() won't happen while any dentries from the superblock in question are on any shrink lists) are gone - shrink_dcache_for_umount() will do the right thing in all cases, taking such dentries out. Their empty husks (memory occupied by struct dentry itself + its external name, if any) will remain on the shrink lists, but they are no obstacles to filesystem shutdown. And such husks will get freed as soon as shrink_dentry_list() of the list they are on gets to them. Reviewed-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
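Seen from the caller's side, the new convention is roughly (sketch only; the retain/shrink decisions for the parent are elided):

    /* dentry->d_lock (and the inode's ->i_lock, if any) already held */
    struct dentry *parent = __dentry_kill(dentry);

    if (parent) {
            /*
             * parent comes back locked, with its refcount already down
             * to zero - decide on the spot whether to retain it or to
             * keep killing upwards
             */
    }
    /* NULL: no parent, or the parent is still pinned by other references */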
-
Al Viro authored
Instead of dropping aliases one by one, restarting, etc., just collect them into a shrink list and kill them off in one pass. We don't really need the restarts - one alias can't pin another (directory has only one alias, and couldn't be its own ancestor anyway), so collecting everything that is not busy and taking it out would take care of everything evictable that had been there as we entered the function. And new aliases added while we'd been dropping old ones could just as easily have appeared right as we return to caller... Reviewed-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
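The one-pass structure described above looks roughly like this (a sketch using the helpers named elsewhere in this series, not the verbatim d_prune_aliases()):

    LIST_HEAD(dispose);
    struct dentry *dentry;

    spin_lock(&inode->i_lock);
    hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias) {
            spin_lock(&dentry->d_lock);
            if (!dentry->d_lockref.count)
                    to_shrink_list(dentry, &dispose);   /* collect unpinned aliases */
            spin_unlock(&dentry->d_lock);
    }
    spin_unlock(&inode->i_lock);

    shrink_dentry_list(&dispose);   /* and kill them off in one go */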
-
Al Viro authored
Reviewed-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
The only thing it does if refcount is not zero is d_lru_del(); no point, IMO, seeing that plain dput() does nothing of that sort... Note that 2 of 3 current callers are guaranteed that refcount is 0. Acked-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Reviewed-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
That is to say, do *not* treat the ->d_inode or ->d_parent changes as "it's hard, return false; somebody must have grabbed it, so even if it has zero refcount, we don't need to bother killing it - the final dput() from whoever grabbed it would've done everything". First of all, that is not guaranteed. It might have been dropped by shrink_kill() handling of the victim's parent, which would've found it already on a shrink list (ours) and decided that they don't need to put it on their shrink list.

What's more, dentry_kill() is doing pretty much the same thing, cutting its own set of corners (it assumes that dentry can't go from positive to negative, so its inode can change but only once and only in one direction). Doing that right allows us to get rid of that not-quite-duplication and removes the only reason for re-incrementing refcount before the call of dentry_kill().

The replacement is called lock_for_kill(); it is called under rcu_read_lock() and with ->d_lock held. If it returns false, dentry has non-zero refcount and the same locks are held. If it returns true, dentry has zero refcount and all locks required by __dentry_kill() are taken.

Part of __lock_parent() had been lifted into lock_parent() to allow its reuse. Now it's called with rcu_read_lock already held and dentry already unlocked.

Note that this is not the final change - locking requirements for __dentry_kill() are going to change later in the series and the set of locks taken by lock_for_kill() will be adjusted. Both lock_parent() and __lock_parent() will be gone once that happens. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
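The contract described above, sketched from the caller's point of view (illustrative only; the exact set of locks taken changes later in the series):

    rcu_read_lock();
    spin_lock(&dentry->d_lock);
    if (lock_for_kill(dentry)) {
            /* refcount is zero; every lock __dentry_kill() needs is held */
            __dentry_kill(dentry);
    } else {
            /* somebody holds a reference; the same locks are still held */
            spin_unlock(&dentry->d_lock);
    }
    rcu_read_unlock();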
-
Al Viro authored
Calls of retain_dentry() happen immediately after getting false from fast_dput(), and getting true from retain_dentry() is treated the same way as a non-zero refcount would be treated by fast_dput() - unlock dentry and bugger off. Doing that in fast_dput() itself is simpler. Reviewed-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Instead of bumping it from 0 to 1, calling retain_dentry(), then decrementing it back to 0 (with ->d_lock held all the way through), just leave refcount at 0 through all of that. It will have a visible effect for ->d_delete() - now it can be called with refcount 0 instead of 1 and it can no longer play silly buggers with dropping/regaining ->d_lock. Not that any in-tree instances tried to (it's pretty hard to get right). Any out-of-tree ones will have to adjust (assuming they need any changes). Note that we do not need to extend rcu-critical area here - we have verified that refcount is non-negative after having grabbed ->d_lock, so nobody will be able to free dentry until they get into __dentry_kill(), which won't happen until they manage to grab ->d_lock. Reviewed-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
We have already checked it, and at that point the dentry did not look worth keeping. The only hard obstacle to evicting a dentry is a non-zero refcount; everything else is advisory - e.g. memory pressure could evict any dentry found with refcount zero. On the slow path in dentry_kill() we had dropped and regained ->d_lock; we must recheck the refcount, but everything else is not worth bothering with. Note that a filesystem cannot count upon ->d_delete() being called for a dentry - not even once. Again, memory pressure (as well as d_prune_aliases(), or an attempted rmdir() of an ancestor, or...) will not call ->d_delete() at all. So from the correctness point of view we are fine doing the check only once. And it makes things simpler down the road. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Currently we call it with refcount equal to 1 when called from dentry_kill(); all other callers have it equal to 0. Make it always be called with zero refcount; on this step we just decrement it before the calls in dentry_kill(). That is safe, since all places that care about the value of refcount either do that under ->d_lock or hold a reference to dentry in question. Either is sufficient to prevent observing a dentry immediately prior to __dentry_kill() getting called from dentry_kill(). Reviewed-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
retain_dentry() used to decrement refcount if and only if it returned true. Lift those decrements into the callers. Reviewed-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
... and rename it to to_shrink_list(), seeing that it no longer drops any references. Reviewed-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
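After the rename the helper is a pure "move it to the given shrink list" operation; a rough sketch (the exact body lives in fs/dcache.c and may differ in detail):

    static inline void to_shrink_list(struct dentry *dentry, struct list_head *list)
    {
            /* caller holds dentry->d_lock; no reference is dropped here */
            if (!(dentry->d_flags & DCACHE_SHRINK_LIST)) {
                    if (dentry->d_flags & DCACHE_LRU_LIST)
                            d_lru_del(dentry);      /* off the LRU first */
                    d_shrink_add(dentry, list);     /* sets DCACHE_SHRINK_LIST */
            }
    }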
-
Al Viro authored
By now there is only one place in the entire fast_dput() where we return false; that happens after the refcount had been decremented and found (under ->d_lock) to be zero. In that case, just prior to returning false to the caller, fast_dput() forcibly changes the refcount from 0 to 1. Lift that resetting of the refcount to 1 into the callers; later in the series it will be massaged out of existence. Reviewed-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
If refcount is less than 1, we should just warn, unlock dentry and return true, so that the caller doesn't try to do anything else. Taking care of that leaves the rest of the "lockref_put_return() has failed" case equivalent to "decrement refcount and rejoin the normal slow path after the point where we grab ->d_lock".

NOTE: lockref_put_return() is strictly a fastpath thing - unlike the rest of the lockref primitives, it does not contain a fallback. The caller (and it looks like fast_dput() is the only legitimate one in the entire kernel) has to do that itself. Reasons for lockref_put_return() failures:

    * ->d_lock held by somebody
    * refcount <= 0
    * ... or an architecture not supporting lockref's use of cmpxchg - sparc, anything non-SMP, config with spinlock debugging...

We could add a fallback, but it would be a clumsy API - we'd have to distinguish between:

    (1) refcount > 1 - decremented, lock not held on return
    (2) refcount < 1 - left alone, probably no sense to hold the lock
    (3) refcount is 1, no cmpxchg - decremented, lock held on return
    (4) refcount is 1, cmpxchg supported - decremented, lock *NOT* held on return.

We want to return with no lock held in case (4); that's the whole point of that thing. We very much do not want to have the fallback in case (3) return without a lock, since the caller might have to retake it in that case. So it wouldn't be more convenient than doing the fallback in the caller, and it would be very easy to screw up, especially since the test coverage would suck - no way to test (3) and (4) on the same kernel build. Reviewed-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
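The caller-side handling this leads to looks roughly like the following (a sketch of the fast_dput() fallback, not the verbatim code):

    int ret = lockref_put_return(&dentry->d_lockref);

    if (unlikely(ret < 0)) {                /* fastpath not available */
            spin_lock(&dentry->d_lock);
            if (WARN_ON_ONCE(dentry->d_lockref.count <= 0)) {
                    spin_unlock(&dentry->d_lock);
                    return true;            /* nothing sane left to do */
            }
            dentry->d_lockref.count--;
            /* rejoin the normal slow path here, with ->d_lock held */
    } else if (ret) {
            return true;                    /* refcount still positive - done */
    }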
-
Al Viro authored
->d_delete() is a way for a filesystem to tell that a dentry is not worth keeping cached. It is not guaranteed to be called every time a dentry's refcount drops down to zero; it is not guaranteed to be called before a dentry gets evicted. In other words, it is not suitable for any kind of keeping track of dentry state. None of the in-tree filesystems attempt to use it that way, fortunately. So the contortions done by fast_dput() (as well as dentry_kill()) are not warranted. fast_dput() certainly should treat having a ->d_delete() instance as "can't assume we'll be keeping it", but that's not different from the way we treat e.g. DCACHE_DONTCACHE (which is rather similar to making ->d_delete() return true when called). Reviewed-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
... we won't see DCACHE_MAY_FREE on anything that is *not* dead and checking d_flags is just as cheap as checking refcount. Reviewed-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
new helper unifying identical bits of shrink_dentry_list() and shrink_dcache_for_umount(). Reviewed-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Saves a pointer per struct dentry and actually makes things less clumsy. Cleaned up d_walk() and dcache_readdir() a bit by using hlist_for_... iterators. A couple of new helpers - d_first_child() and d_next_sibling() - make the expressions less awful. Reviewed-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
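The two helpers mentioned above are thin wrappers over the hlist; a sketch, assuming the sibling linkage ends up as a d_children list head in the parent and a d_sib node in each child:

    static inline struct dentry *d_first_child(const struct dentry *dentry)
    {
            return hlist_entry_safe(dentry->d_children.first, struct dentry, d_sib);
    }

    static inline struct dentry *d_next_sibling(const struct dentry *dentry)
    {
            return hlist_entry_safe(dentry->d_sib.next, struct dentry, d_sib);
    }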
-
Al Viro authored
->d_lock on parent does not stabilize ->d_inode of child. We don't do much with that inode in there, but we need at least to avoid struct inode getting freed under us... [rcu_read_lock() is not needed here, since parent's ->d_lock provides an rcu-critical area] Reviewed-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 18 Nov, 2023 8 commits
-
-
Al Viro authored
nfsd_client_rmdir() open-codes a subset of simple_recursive_removal(). Conversion to calling simple_recursive_removal() allows us to clean things up quite a bit. While we are at it, nfsdfs_create_files() doesn't need to mess with "pick the reference to struct nfsdfs_client from the already created parent" - the caller already knows it (that's where the parent got it from, after all), so we might as well just pass it as an explicit argument. So __get_nfsdfs_client() is only needed in get_nfsdfs_client() and can be folded in there. Incidentally, the locking in get_nfsdfs_client() is too heavy - we don't need ->i_rwsem for that, ->i_lock serves just fine. Reviewed-by:
Jeff Layton <jlayton@kernel.org> Tested-by:
Jeff Layton <jlayton@kernel.org> Acked-by:
Chuck Lever <chuck.lever@oracle.com> Acked-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
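For reference, the generic helper being switched to is void simple_recursive_removal(struct dentry *, void (*callback)(struct dentry *)) from fs/libfs.c, so the call site shrinks to little more than the following (a sketch; the NULL stands in for whatever per-dentry cleanup callback nfsd actually needs):

    /* tear down the whole subtree under 'dentry', unhashing and dropping
     * each entry; the optional callback runs for every victim */
    simple_recursive_removal(dentry, NULL);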
-
Al Viro authored
-
Al Viro authored
no users left Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
there's a strange comment in front of d_lookup() declaration:

    /* appendix may either be NULL or be used for transname suffixes */

Looks like nobody had been curious enough to track its history; it predates git, it predates bitkeeper and if you look through the pre-BK trees, you finally arrive at this in 2.1.44-for-davem:

     /* appendix may either be NULL or be used for transname suffixes */
    -extern struct dentry * d_lookup(struct inode * dir, struct qstr * name,
    -                                struct qstr * appendix);
    +extern struct dentry * d_lookup(struct dentry * dir, struct qstr * name);

In other words, it refers to the third argument d_lookup() used to have back then. It had been introduced in 2.1.43-pre, on June 12 1997, along with d_lookup(), only to be removed by July 4 1997, presumably when the Cthulhu-awful thing it used to be used for (look for CONFIG_TRANS_NAMES in 2.1.43-pre, and keep a heavy-duty barfbag ready) had been, er, noticed and recognized for what it had been.

Despite the appendectomy, the comment remained. Some things really need to be put out of their misery... Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
d_instantiate_unique() had been gone for 7 years; __d_lookup...() and shrink_dcache_for_umount() are fs/internal.h fodder. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Introduced in 2015 and never had any in-tree users... Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
the last user gone in 2021... Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
For bits 20..22 (inode type cached in ->d_flags) turn the definitions into expressions like (5 << 20); everything else turns into straight use of BIT() Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
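The pattern, shown on hypothetical flag names (the real definitions are in include/linux/dcache.h):

    #define DCACHE_SOME_FLAG        BIT(12)         /* ordinary single-bit flag */
    #define DCACHE_SOME_TYPE        (5 << 20)       /* bits 20..22 hold the cached inode type */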
-