• Jeff Mahoney's avatar
    btrfs: fix lockup in find_free_extent with read-only block groups · 14443937
    Jeff Mahoney authored
    If we have a block group that is all of the following:
    1) uncached in memory
    2) is read-only
    3) has a disk cache state that indicates we need to recreate the cache
    
    AND the file system has enough free space fragmentation such that the
    request for an extent of a given size can't be honored;
    
    AND have a single CPU core;
    
    AND it's the block group with the highest starting offset such that
    there are no opportunities (like reading from disk) for the loop to
    yield the CPU;
    
    We can end up with a lockup.
    
    The root cause is simple.  Once we're in the position that we've read in
    all of the other block groups directly and none of those block groups
    can honor the request, there are no more opportunities to sleep.  We end
    up trying to start a caching thread which never gets run if we only have
    one core.  This *should* present as a hung task waiting on the caching
    thread to make some progress, but it doesn't.  Instead, it degrades into
    a busy loop because of the placement of the read-only check.
    
    During the first pass through the loop, block_group->cached will be set
    to BTRFS_CACHE_STARTED and have_caching_bg will be set.  Then we hit the
    read-only check and short circuit the loop.  We're not yet in
    LOOP_CACHING_WAIT, so we skip that loop back before going through the
    loop again for other raid groups.
    
    Then we move to LOOP_CACHING_WAIT state.
    
    During the this pass through the loop, ->cached will still be
    BTRFS_CACHE_STARTED, which means it's not cached, so we'll enter
    cache_block_group, do a lot of nothing, and return, and also set
    have_caching_bg again.  Then we hit the read-only check and short circuit
    the loop.  The same thing happens as before except now we DO trigger
    the LOOP_CACHING_WAIT && have_caching_bg check and loop back up to the
    top.  We do this forever.
    
    There are two fixes in this patch since they address the same underlying
    bug.
    
    The first is to add a cond_resched to the end of the loop to ensure
    that the caching thread always has an opportunity to run.  This will
    fix the soft lockup issue, but find_free_extent will still loop doing
    nothing until the thread has completed.
    
    The second is to move the read-only check to the top of the loop.  We're
    never going to return an allocation within a read-only block group so
    we may as well skip it early.  The check for ->cached == BTRFS_CACHE_ERROR
    would cause the same problem except that BTRFS_CACHE_ERROR is considered
    a "done" state and we won't re-set have_caching_bg again.
    
    Many thanks to Stephan Kulow <coolo@suse.de> for his excellent help in
    the testing process.
    Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
    Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    14443937
extent-tree.c 300 KB