    kernfs: restructure removal path to fix possible premature return · 45a140e5
    Tejun Heo authored
    The recursive nature of kernfs_remove() means that, even if
    kernfs_remove() is not allowed to be called multiple times on the same
    node, there may be race conditions between the removal of a parent and
    its descendants.  While we could claim that kernfs_remove() shouldn't
    be called on one of the descendants while the removal of an ancestor
    is in progress, such a rule is unnecessarily restrictive and very
    difficult to enforce.  It's better to simply allow invoking
    kernfs_remove() as the caller sees fit as long as the caller ensures
    that the node is accessible.
    
    The current behavior in such situations is broken.  Whoever enters
    the removal path first takes the node off the hierarchy and then
    deactivates it.  Subsequent removers either return as soon as they
    notice that they aren't the first one or can't even find the target
    node because it has already been removed from the hierarchy.  In both
    cases, the subsequent removers may finish prematurely while the nodes
    that should be removed and drained are still being processed by the
    first one.
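    To make the failure concrete, here is a deterministic replay of one
    bad interleaving of the old path with two removers, A and B.  It is
    only an illustrative sketch: struct old_node and
    old_path_returns_early() are hypothetical and model nothing but
    "still linked" and "fully drained"; real kernfs nodes carry far more
    state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified node: only the two states that
 * matter for the race are modeled. */
struct old_node {
	bool linked;	/* still reachable in the hierarchy */
	bool drained;	/* all active references have been drained */
};

/*
 * Replays one interleaving of the old removal path with two removers,
 * A and B, and reports whether B returned before the node was drained.
 */
bool old_path_returns_early(void)
{
	struct old_node n = { .linked = true, .drained = false };
	bool b_returned, premature;

	n.linked = false;		/* A: takes the node off the hierarchy first */
	b_returned = !n.linked;		/* B: can't find the node, returns at once */
	premature = b_returned && !n.drained;
	n.drained = true;		/* A: deactivation and drain only finish now */

	return premature;
}
```

    old_path_returns_early() returns true: B is already back with its
    caller while A has not finished draining.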
    
    This patch restructures the removal path so that multiple removers,
    whether through recursion or direct invocation, always obey the
    following rules.
    
    * When there are multiple concurrent removers, only one puts the base
      ref.
    
    * Regardless of which one puts the base ref, all removers are blocked
      until the target node is fully deactivated and removed.
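    One way to picture the two rules is a remover that claims the removal
    under a lock, does the work with the lock dropped, and wakes everyone
    else only once it is done.  The sketch below uses userspace pthreads
    purely for illustration; struct sk_node, sk_remove() and the other
    sk_* helpers are hypothetical and are not the kernfs implementation,
    which uses kernfs_mutex and its own reference counting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define SK_MAX_THREADS	16

/* Hypothetical simplified node: a base reference plus removal state. */
struct sk_node {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int refcnt;		/* base reference, starts at 1 */
	bool removing;		/* some remover has claimed the removal */
	bool removed;		/* removal has fully finished */
};

void sk_node_init(struct sk_node *n)
{
	pthread_mutex_init(&n->lock, NULL);
	pthread_cond_init(&n->cond, NULL);
	n->refcnt = 1;
	n->removing = false;
	n->removed = false;
}

/* Placeholder for the actual work: deactivate, drain, unlink. */
static void sk_teardown(struct sk_node *n)
{
	(void)n;
}

/* Every caller returns only after the node is fully removed. */
bool sk_remove(struct sk_node *n)
{
	bool done;

	pthread_mutex_lock(&n->lock);
	if (!n->removing) {
		n->removing = true;		/* this caller owns the removal */
		pthread_mutex_unlock(&n->lock);	/* don't hold it while tearing down */
		sk_teardown(n);
		pthread_mutex_lock(&n->lock);
		n->removed = true;
		n->refcnt--;			/* only the owner puts the base ref */
		pthread_cond_broadcast(&n->cond);
	} else {
		while (!n->removed)		/* later removers wait for the owner */
			pthread_cond_wait(&n->cond, &n->lock);
	}
	done = n->removed;
	pthread_mutex_unlock(&n->lock);
	return done;
}

static void *sk_remove_thread(void *arg)
{
	/* Non-NULL return value means "returned after removal finished". */
	return sk_remove(arg) ? arg : NULL;
}

/* Runs up to SK_MAX_THREADS concurrent removers on @n and returns how
 * many of them observed a fully removed node on return. */
int sk_remove_concurrently(struct sk_node *n, int nthreads)
{
	pthread_t tids[SK_MAX_THREADS];
	int created = 0, ok = 0;

	if (nthreads > SK_MAX_THREADS)
		nthreads = SK_MAX_THREADS;
	for (; created < nthreads; created++)
		if (pthread_create(&tids[created], NULL, sk_remove_thread, n) != 0)
			break;
	for (int i = 0; i < created; i++) {
		void *ret;

		pthread_join(tids[i], &ret);
		if (ret)
			ok++;
	}
	return ok;
}
```

    Starting from refcnt == 1, sk_remove_concurrently(&n, 8) returns 8
    and leaves refcnt at 0: every remover came back only after removal
    finished, and the base ref was put exactly once.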
    
    To achieve the above, the removal path now first deactivates the
    subtree, drains it, and then unlinks the nodes one by one.
    __kernfs_deactivate() is called directly from __kernfs_remove() and
    drops and re-grabs kernfs_mutex for each descendant to drain active
    refs.  As this means that multiple removers can enter
    __kernfs_deactivate() for the same node, the function is updated so
    that it can handle multiple deactivators of the same node: only one
    actually deactivates, but all wait until draining completes.
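    The multiple-deactivator handling can be sketched the same way: the
    first caller marks the node deactivated so that no new active refs
    are handed out, and every caller, first or not, sleeps until the
    active count drops to zero.  This is again a hypothetical userspace
    model; struct ad_node and the ad_* functions are made up, and a
    pthread condition variable stands in for whatever wait mechanism
    kernfs actually uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define AD_MAX_THREADS	16

/* Hypothetical model of a node's active references. */
struct ad_node {
	pthread_mutex_t lock;
	pthread_cond_t drained;
	int active;		/* active references currently held */
	bool deactivated;	/* once set, no new active refs are granted */
};

void ad_init(struct ad_node *n)
{
	pthread_mutex_init(&n->lock, NULL);
	pthread_cond_init(&n->drained, NULL);
	n->active = 0;
	n->deactivated = false;
}

bool ad_get_active(struct ad_node *n)
{
	bool ok;

	pthread_mutex_lock(&n->lock);
	ok = !n->deactivated;
	if (ok)
		n->active++;
	pthread_mutex_unlock(&n->lock);
	return ok;
}

void ad_put_active(struct ad_node *n)
{
	pthread_mutex_lock(&n->lock);
	if (--n->active == 0 && n->deactivated)
		pthread_cond_broadcast(&n->drained);	/* wake all deactivators */
	pthread_mutex_unlock(&n->lock);
}

/* May be called by several threads at once: only the first one flips
 * @deactivated, but every caller waits for the drain.  Returns true for
 * the caller that actually deactivated the node. */
bool ad_deactivate(struct ad_node *n)
{
	bool first;

	pthread_mutex_lock(&n->lock);
	first = !n->deactivated;
	n->deactivated = true;
	while (n->active > 0)
		pthread_cond_wait(&n->drained, &n->lock);
	pthread_mutex_unlock(&n->lock);
	return first;
}

struct ad_result {
	struct ad_node *n;
	bool first;
	bool saw_drained;
};

static void *ad_thread(void *arg)
{
	struct ad_result *r = arg;

	r->first = ad_deactivate(r->n);
	pthread_mutex_lock(&r->n->lock);
	r->saw_drained = (r->n->active == 0);
	pthread_mutex_unlock(&r->n->lock);
	return NULL;
}

/* Holds one active ref while starting @nthreads deactivators, then drops
 * it.  Returns how many deactivators saw a drained node on return and
 * stores how many of them did the actual deactivation in *firsts. */
int ad_run(struct ad_node *n, int nthreads, int *firsts)
{
	pthread_t tids[AD_MAX_THREADS];
	struct ad_result res[AD_MAX_THREADS];
	int created = 0, drained = 0;

	*firsts = 0;
	if (nthreads > AD_MAX_THREADS)
		nthreads = AD_MAX_THREADS;
	if (!ad_get_active(n))		/* simulate an in-flight user */
		return -1;
	for (; created < nthreads; created++) {
		res[created] = (struct ad_result){ .n = n };
		if (pthread_create(&tids[created], NULL, ad_thread, &res[created]) != 0)
			break;
	}
	ad_put_active(n);		/* the user finishes, so the drain completes */
	for (int i = 0; i < created; i++) {
		pthread_join(tids[i], NULL);
		*firsts += res[i].first;
		drained += res[i].saw_drained;
	}
	return drained;
}
```

    With four deactivators, exactly one reports doing the deactivation,
    all four return with the active count at zero, and any later
    ad_get_active() fails.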
    
    The restructured removal path guarantees that a removed node gets
    unlinked only after the node is deactivated and drained.  Combined
    with proper handling of multiple deactivators, this guarantees that
    any invocation of kernfs_remove() returns only after the node itself
    and all its descendants are deactivated, drained and removed.
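    The ordering guarantee comes from walking the subtree twice: a first
    pass deactivates (and, in the real code, drains) every node, and only
    then does a second pass unlink nodes, children before their parent.
    The following single-threaded sketch records the order of events on
    a hypothetical struct tn tree; the traversal orders are choices made
    for this illustration, not necessarily those of the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TN_MAX_CHILDREN	4
#define TN_LOG_MAX	32

/* Hypothetical tree node; real kernfs nodes look nothing like this. */
struct tn {
	char name;
	struct tn *parent;
	struct tn *children[TN_MAX_CHILDREN];
	int nr_children;
	bool deactivated;
};

/* Event log: op is 'D' (deactivate and drain) or 'U' (unlink). */
static struct { char op, name; } tn_log[TN_LOG_MAX];
static int tn_nlog;

static void tn_record(char op, char name)
{
	if (tn_nlog < TN_LOG_MAX) {
		tn_log[tn_nlog].op = op;
		tn_log[tn_nlog].name = name;
		tn_nlog++;
	}
}

int tn_log_index(char op, char name)
{
	for (int i = 0; i < tn_nlog; i++)
		if (tn_log[i].op == op && tn_log[i].name == name)
			return i;
	return -1;
}

void tn_add_child(struct tn *parent, struct tn *child)
{
	assert(parent->nr_children < TN_MAX_CHILDREN);
	parent->children[parent->nr_children++] = child;
	child->parent = parent;
}

/* Pass 1: deactivate (and, in real code, drain) the whole subtree. */
static void tn_deactivate_subtree(struct tn *n)
{
	n->deactivated = true;
	tn_record('D', n->name);
	for (int i = 0; i < n->nr_children; i++)
		tn_deactivate_subtree(n->children[i]);
}

/* Pass 2: unlink children before their parent, one node at a time. */
static void tn_unlink_subtree(struct tn *n)
{
	while (n->nr_children > 0)	/* each call detaches one child from @n */
		tn_unlink_subtree(n->children[n->nr_children - 1]);

	if (n->parent) {
		struct tn *p = n->parent;

		for (int i = 0; i < p->nr_children; i++) {
			if (p->children[i] == n) {
				p->children[i] = p->children[--p->nr_children];
				break;
			}
		}
		n->parent = NULL;
	}
	tn_record('U', n->name);
}

/* Nothing is unlinked until the entire subtree has been deactivated. */
void tn_remove(struct tn *n)
{
	tn_deactivate_subtree(n);
	tn_unlink_subtree(n);
}
```

    Removing a from the tree r -> {a -> {c}, b} logs Da, Dc, Uc, Ua:
    both deactivations precede the first unlink, c is unlinked before a,
    and the sibling b is left untouched.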
    
    v2: Draining moved into a separate loop (it used to be in the same
        loop as unlink) and is done from __kernfs_deactivate().  This
        allows exposing deactivation as a separate interface later.

        Root node removal was broken in the v1 patch.  Fixed.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>