    fs/dcache: Move the wakeup from __d_lookup_done() to the caller. · 45f78b0a
    Sebastian Andrzej Siewior authored
    __d_lookup_done() wakes waiters on dentry->d_wait.  On PREEMPT_RT we are
    not allowed to do that with preemption disabled, since the wakeup
    acquires wait_queue_head::lock, which is a "sleeping" spinlock on RT.
    
    Calling it under dentry->d_lock is not a problem, since that is also a
    "sleeping" spinlock on the same configs.  Unfortunately, two of its
    callers (__d_add() and __d_move()) are holding more than just ->d_lock
    and that needs to be dealt with.
    
    The key observation is that the wakeup can be moved to any point before
    ->d_lock is dropped.
    
    As a first step to solve this, move the wakeup out of the section that
    holds hlist_bl_lock().
    
    This is safe because:
    
    Waiters get inserted into ->d_wait only after they have taken ->d_lock
    and observed DCACHE_PAR_LOOKUP in the flags.  As long as they are woken
    up (and evicted from the queue) after __d_lookup_done() has cleared
    DCACHE_PAR_LOOKUP and before ->d_lock is dropped, we are safe: the
    waitqueue that ->d_wait points to cannot be destroyed before
    __d_lookup_done(dentry) has been called (under ->d_lock).
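    The waiter/waker protocol described above can be sketched as a
    hypothetical userspace model, assuming a pthread mutex in place of
    ->d_lock, a condition variable in place of the waitqueue and a bool in
    place of DCACHE_PAR_LOOKUP.  All model_* names are illustrative, not
    kernel APIs, and the sketch needs to be built with -pthread.  The waiter
    only sleeps after taking the lock and seeing the flag; the waker clears
    the flag, detaches the queue and wakes everyone before unlocking.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model, not kernel code: d_lock is a pthread
 * mutex and the waitqueue is a condition variable paired with it. */
struct model_dentry {
	pthread_mutex_t d_lock;
	pthread_cond_t *d_wait;   /* set only while the dentry is in-lookup */
	bool par_lookup;          /* stands in for DCACHE_PAR_LOOKUP */
};

/* Waiter side, loosely like d_wait_lookup(): it only sleeps on *d_wait
 * after taking d_lock and observing par_lookup set. */
static void model_wait_lookup(struct model_dentry *d)
{
	pthread_mutex_lock(&d->d_lock);
	while (d->par_lookup)
		pthread_cond_wait(d->d_wait, &d->d_lock);
	pthread_mutex_unlock(&d->d_lock);
}

/* Thread entry point for a waiter. */
static void *model_waiter(void *arg)
{
	model_wait_lookup(arg);
	return NULL;
}

/* Lookup-done side: clear the flag, detach the waitqueue and wake all
 * waiters while d_lock is still held. */
static void model_lookup_done(struct model_dentry *d)
{
	pthread_cond_t *wq;

	pthread_mutex_lock(&d->d_lock);
	assert(d->par_lookup);
	d->par_lookup = false;
	wq = d->d_wait;
	d->d_wait = NULL;
	pthread_cond_broadcast(wq);
	pthread_mutex_unlock(&d->d_lock);
}
```

    The condition variable can only be destroyed once model_lookup_done()
    has run, which mirrors the lifetime rule the argument relies on.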
    
    ->d_wait is set only by d_alloc_parallel(), and only when it returns a
    freshly allocated in-lookup dentry.  Whenever that happens, we are
    guaranteed that __d_lookup_done() will be called for the resulting
    dentry (under ->d_lock) before the wq in question gets destroyed.
    
    With two exceptions, wq lives in the call frame of the caller of
    d_alloc_parallel(), and there is an explicit d_lookup_done() on the
    resulting in-lookup dentry before that frame is left.
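    The common case, a wq on the caller's stack, can be sketched as a
    single-threaded model.  This is a hypothetical illustration, not kernel
    code: model_lookup_slow() only mimics the shape of __lookup_slow() and
    lookup_open(), and the model_* helpers are invented stand-ins for
    d_alloc_parallel() and d_lookup_done().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical waitqueue: just counts wakeups. */
struct model_wq { int wakeups; };

struct model_dentry {
	bool par_lookup;          /* stands in for DCACHE_PAR_LOOKUP */
	struct model_wq *d_wait;  /* points into the caller's frame */
};

/* Loosely mirrors d_alloc_parallel(): a fresh in-lookup dentry records wq. */
static void model_alloc_parallel(struct model_dentry *d, struct model_wq *wq)
{
	d->par_lookup = true;
	d->d_wait = wq;
}

/* Loosely mirrors d_lookup_done(): clear the flag and d_wait, then wake. */
static void model_lookup_done(struct model_dentry *d)
{
	struct model_wq *wq = d->d_wait;

	assert(d->par_lookup);
	d->par_lookup = false;
	d->d_wait = NULL;
	wq->wakeups++;
}

/* Shape of __lookup_slow()/lookup_open(): wq lives in this stack frame,
 * so lookup must be marked done before the function returns. */
static int model_lookup_slow(struct model_dentry *d)
{
	struct model_wq wq = { 0 };

	model_alloc_parallel(d, &wq);
	/* ... the filesystem's ->lookup() would run here ... */
	model_lookup_done(d);
	return wq.wakeups;
}
```

    After model_lookup_slow() returns, the dentry no longer points at the
    dead stack slot, which is the property the analysis needs.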
    
    One of those exceptions is nfs_call_unlink(), where wq is embedded into
    the (dynamically allocated) struct nfs_unlinkdata.  It is destroyed in
    nfs_async_unlink_release() after an explicit d_lookup_done() on the
    dentry that wq was attached to.
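    The heap-allocated case follows the same rule: the lookup is finished
    first and the containing object is freed second.  The sketch below is a
    hypothetical model; struct model_unlinkdata only stands in for
    struct nfs_unlinkdata, and the model_* functions only mimic the ordering
    in nfs_call_unlink() and nfs_async_unlink_release().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_wq { int wakeups; };

struct model_dentry {
	bool par_lookup;          /* stands in for DCACHE_PAR_LOOKUP */
	struct model_wq *d_wait;
};

/* Hypothetical stand-in for struct nfs_unlinkdata: wq is embedded in a
 * heap object instead of living on a stack frame. */
struct model_unlinkdata {
	struct model_wq wq;
	struct model_dentry *dentry;
};

static void model_lookup_done(struct model_dentry *d)
{
	struct model_wq *wq = d->d_wait;

	assert(d->par_lookup);
	d->par_lookup = false;
	d->d_wait = NULL;
	wq->wakeups++;
}

/* Allocate the container and attach its embedded wq to the dentry. */
static struct model_unlinkdata *model_call_unlink(struct model_dentry *d)
{
	struct model_unlinkdata *data = calloc(1, sizeof(*data));

	if (!data)
		return NULL;
	data->dentry = d;
	d->par_lookup = true;
	d->d_wait = &data->wq;
	return data;
}

/* Ordering that matters: finish the lookup, then free the wq's owner. */
static int model_async_unlink_release(struct model_unlinkdata *data)
{
	int wakeups;

	model_lookup_done(data->dentry);
	wakeups = data->wq.wakeups;
	free(data);
	return wakeups;
}
```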
    
    The remaining exception is d_add_ci().  There, wq is what we find in
    ->d_wait of the d_add_ci() argument.  The callers of d_add_ci() are two
    instances of ->lookup(), and they must have been given an in-lookup
    dentry, which means that they were called by __lookup_slow() or
    lookup_open(), with wq in the call frame of one of those.
    
    The result of d_alloc_parallel() in d_add_ci() is fed to
    d_splice_alias(), which either returns non-NULL (and d_add_ci() calls
    d_lookup_done()) or passes the dentry to __d_add(), which calls
    __d_lookup_done() under ->d_lock.  That concludes the analysis.
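    The two branches at the end of d_add_ci() can be checked with a small
    hypothetical model.  model_splice_alias() and model_d_add() are invented
    stand-ins for d_splice_alias() and __d_add(); only the branch structure
    is taken from the description above.  Either way, the in-lookup state of
    the dentry is finished before d_add_ci() returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_wq { int wakeups; };

struct model_dentry {
	bool par_lookup;          /* stands in for DCACHE_PAR_LOOKUP */
	struct model_wq *d_wait;
};

/* Like d_lookup_done(): a no-op once the dentry is no longer in-lookup. */
static void model_lookup_done(struct model_dentry *d)
{
	if (d->par_lookup) {
		struct model_wq *wq = d->d_wait;

		d->par_lookup = false;
		d->d_wait = NULL;
		wq->wakeups++;
	}
}

/* Hypothetical stand-in for __d_add(): ends the in-lookup state itself. */
static void model_d_add(struct model_dentry *d)
{
	model_lookup_done(d);
}

/* Hypothetical stand-in for d_splice_alias(): returns an existing alias
 * (non-NULL) or adds the dentry and returns NULL. */
static struct model_dentry *model_splice_alias(struct model_dentry *d,
					       struct model_dentry *alias)
{
	if (alias)
		return alias;
	model_d_add(d);
	return NULL;
}

/* Shape of the d_add_ci() tail described above. */
static struct model_dentry *model_add_ci(struct model_dentry *found,
					 struct model_dentry *alias)
{
	struct model_dentry *res = model_splice_alias(found, alias);

	if (res) {
		model_lookup_done(found);
		return res;
	}
	return found;
}
```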
    
    Let __d_lookup_unhash():
    
      1) Lock the lookup hash and clear DCACHE_PAR_LOOKUP
      2) Unhash the dentry
      3) Retrieve and clear dentry::d_wait
      4) Unlock the hash and return the retrieved waitqueue head pointer
    
    Let the caller handle the wakeup, and rename __d_lookup_done() to
    __d_lookup_unhash_wake() to force build failures for out-of-tree (OOT)
    code that used __d_lookup_done() and is not aware of the new return
    value.
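    Steps 1) to 4) and the caller-side wakeup can be sketched as a
    single-threaded model in which locks are plain booleans, so the lock
    context of the wakeup can be asserted.  This is a hypothetical
    illustration loosely modeled on __d_lookup_unhash() and
    __d_lookup_unhash_wake(), not their implementation; every model_* name
    and the global hash_lock_held flag are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_wq { int wakeups; };

struct model_dentry {
	bool d_lock_held;
	bool hashed;              /* on the in-lookup hash chain */
	bool par_lookup;          /* stands in for DCACHE_PAR_LOOKUP */
	struct model_wq *d_wait;
};

/* Stands in for hlist_bl_lock() on the in-lookup hash chain. */
static bool hash_lock_held;

/* The point of the change: wake with d_lock held, hash lock released. */
static void model_wake_up_all(struct model_wq *wq)
{
	assert(!hash_lock_held);
	wq->wakeups++;
}

/* Steps 1-4: all state changes happen under the hash lock. */
static struct model_wq *model_d_lookup_unhash(struct model_dentry *d)
{
	struct model_wq *wq;

	assert(d->d_lock_held);
	hash_lock_held = true;    /* 1) lock the hash ... */
	d->par_lookup = false;    /*    ... and clear the flag */
	d->hashed = false;        /* 2) unhash the dentry */
	wq = d->d_wait;           /* 3) retrieve and clear d_wait */
	d->d_wait = NULL;
	hash_lock_held = false;   /* 4) unlock, return the waitqueue */
	return wq;
}

/* The caller does the wakeup after the hash lock is gone. */
static void model_d_lookup_unhash_wake(struct model_dentry *d)
{
	d->d_lock_held = true;    /* spin_lock(&dentry->d_lock) */
	model_wake_up_all(model_d_lookup_unhash(d));
	d->d_lock_held = false;   /* spin_unlock(&dentry->d_lock) */
}
```

    Returning the pointer instead of waking inside the helper lets callers
    such as __d_add() and __d_move() pick their own wakeup point, as long as
    it stays before ->d_lock is dropped.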
    
    This does not yet solve the PREEMPT_RT problem completely because
    preemption is still disabled due to i_dir_seq being held for write. This
    will be addressed in subsequent steps.
    
    An alternative solution would be to switch the waitqueue to a simple
    waitqueue, but aside from Linus not being a fan of them, moving the wakeup
    closer to the place where dentry::d_lock is unlocked reduces lock
    contention time for the woken-up waiter.
    
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Link: https://lkml.kernel.org/r/20220613140712.77932-3-bigeasy@linutronix.de
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>