    xfs: per-cpu deferred inode inactivation queues · ab23a776
    Dave Chinner authored
    Move inode inactivation to background work contexts so that it no
    longer runs in the context that releases the final reference to an
    inode. This allows work that would otherwise block on inactivation
    to continue while the filesystem processes the inactivation in the
    background.
    
    A typical demonstration of this is unlinking an inode with lots of
    extents. The extents are removed during inactivation, so this blocks
    the process that unlinked the inode from the directory structure. By
    moving the inactivation to a background worker, the userspace
    application can keep working (e.g. unlinking the next inode in the
    directory) while the inactivation work on the previous inode is
    done by a different CPU.
    
    The implementation of the queue is relatively simple. We use a
    per-cpu lockless linked list (llist) to queue inodes for
    inactivation without requiring serialisation mechanisms, and a work
    item to allow the queue to be processed by a CPU bound worker
    thread. We also keep a count of the queue depth so that we can
    trigger work after a number of deferred inactivations have been
    queued.
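
    As a rough sketch, each CPU's queue pairs a lockless list with a
    work item and a depth counter. The struct and variable names below
    are illustrative, not necessarily those used in the patch:

        #include <linux/llist.h>
        #include <linux/percpu.h>
        #include <linux/workqueue.h>

        /*
         * One inactivation queue per CPU: a lockless list of inodes
         * awaiting inactivation, a work item that drains the list, and
         * a depth counter used to decide when to kick the worker.
         */
        struct inodegc_queue {
                struct llist_head list;   /* lockless singly-linked list */
                struct work_struct work;  /* per-cpu drain worker */
                unsigned int items;       /* current queue depth */
        };

        static DEFINE_PER_CPU(struct inodegc_queue, inodegc_queues);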
    
    The use of a bound workqueue with a single work depth allows the
    workqueue to run one work item per CPU. We queue the work item on
    the CPU we are currently running on, and so this essentially gives
    us affine per-cpu worker threads for the per-cpu queues. This
    maintains the effective CPU affinity that occurs within XFS at the
    AG level due to all objects in a directory being local to an AG.
    Hence inactivation work tends to run on the same CPU that last
    accessed all the objects that inactivation accesses and this
    maintains hot CPU caches for unlink workloads.
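
    Continuing the sketch: a workqueue created without WQ_UNBOUND is
    CPU-bound, so each work item runs on the CPU it was queued on, and
    queueing from the current CPU yields affine per-cpu workers. The
    helpers below are hypothetical, not the patch's actual functions:

        static struct workqueue_struct *inodegc_wq;

        /* Drain this CPU's queue; the actual inactivation is elided. */
        static void inodegc_worker(struct work_struct *work)
        {
                struct inodegc_queue *q =
                        container_of(work, struct inodegc_queue, work);
                struct llist_node *first = llist_del_all(&q->list);
                struct llist_node *pos, *n;

                /* The depth count is a heuristic; a racy reset is fine. */
                WRITE_ONCE(q->items, 0);
                llist_for_each_safe(pos, n, first) {
                        /* inactivate the inode embedding 'pos' */
                }
        }

        static int inodegc_init(void)
        {
                int cpu;

                /*
                 * No WQ_UNBOUND, so the workqueue is CPU-bound, and
                 * max_active of 1 allows one work item to run per CPU.
                 */
                inodegc_wq = alloc_workqueue("inodegc", WQ_MEM_RECLAIM, 1);
                if (!inodegc_wq)
                        return -ENOMEM;
                for_each_possible_cpu(cpu)
                        INIT_WORK(&per_cpu_ptr(&inodegc_queues, cpu)->work,
                                  inodegc_worker);
                return 0;
        }

        static void inodegc_queue(struct llist_node *node)
        {
                /* Pin to this CPU while touching its queue. */
                struct inodegc_queue *q = get_cpu_ptr(&inodegc_queues);

                llist_add(node, &q->list);
                q->items++;
                /* Run the worker on the CPU that queued the inode. */
                queue_work_on(smp_processor_id(), inodegc_wq, &q->work);
                put_cpu_ptr(&inodegc_queues);
        }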
    
    A depth of 32 inodes was chosen to match the number of inodes in an
    inode cluster buffer. This hopefully allows sequential
    allocation/unlink behaviours to defer inactivation of all the
    inodes in a single cluster buffer at a time, further helping
    maintain hot CPU and buffer cache accesses while running
    inactivations.
    
    A hard per-cpu queue throttle of 256 inodes has been set to avoid
    runaway queueing when inodes that take a long time to inactivate
    are being processed, for example when unlinking inodes with large
    numbers of extents that take a lot of processing to free.
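
    Putting the two thresholds together, the queueing path sketched
    above might grow into something like the following, where
    INODEGC_BATCH and INODEGC_MAX_DEPTH are made-up names for the 32
    and 256 values described here:

        #define INODEGC_BATCH     32    /* one inode cluster buffer */
        #define INODEGC_MAX_DEPTH 256   /* hard per-cpu throttle */

        static void inodegc_queue(struct llist_node *node)
        {
                struct inodegc_queue *q = get_cpu_ptr(&inodegc_queues);
                bool throttle;

                llist_add(node, &q->list);
                /* Kick the worker once a cluster buffer's worth queues. */
                if (++q->items >= INODEGC_BATCH)
                        queue_work_on(smp_processor_id(), inodegc_wq,
                                      &q->work);
                throttle = q->items >= INODEGC_MAX_DEPTH;
                put_cpu_ptr(&inodegc_queues);

                /*
                 * Throttle runaway queueing: flush_work() sleeps until
                 * this CPU's worker has drained the queue, so call it
                 * outside the preempt-disabled section above.
                 */
                if (throttle)
                        flush_work(&q->work);
        }
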
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    [djwong: tweak comments and tracepoints, convert opflags to state bits]
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>