    btrfs: wait for actual caching progress during allocation · fc1f91b9
    Josef Bacik authored
    Recently we've been having mysterious hangs while running generic/475 on
    the CI system.  This turned out to be something like this:
    
      Task 1
      dmsetup suspend --nolockfs
      -> __dm_suspend
       -> dm_wait_for_completion
        -> dm_wait_for_bios_completion
         -> Unable to complete because of IOs on a plug in Task 2
    
      Task 2
      wb_workfn
      -> wb_writeback
       -> blk_start_plug
        -> writeback_sb_inodes
         -> Infinite loop unable to make an allocation
    
      Task 3
      cache_block_group
      -> read_extent_buffer_pages
       -> Waiting for IO to complete that can't be submitted because Task 1
          suspended the DM device
    
    The problem here is that Task 2 needs to schedule for the blk plug to
    flush.  Normally this would happen: we wait for the block group caching
    to finish (Task 3), and that schedule results in the block plug
    flushing.
    
    However, if there's enough free space available from the current caching
    to satisfy the allocation, we won't actually wait for the caching to
    complete.  That check only verifies that we have enough total free
    space, not that we can actually make the allocation.  In this particular
    case we were trying to allocate 9MiB, and we had 10MiB of free space,
    but we didn't have 9MiB of contiguous space to allocate, and thus the
    allocation failed and we looped.
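
The gap between "enough free space" and "can make the allocation" can be
sketched in a few lines of C.  This is only an illustration: struct
free_extent, total_free() and can_allocate() are hypothetical names, not
the btrfs free space structures or helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Hypothetical stand-in for one free range; not the btrfs structure. */
struct free_extent {
	uint64_t start;
	uint64_t len;
};

/* Total free bytes: roughly what a "do we have enough space" check sees. */
static uint64_t total_free(const struct free_extent *e, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += e[i].len;
	return sum;
}

/* True only if one contiguous range can hold the whole request. */
static bool can_allocate(const struct free_extent *e, size_t n, uint64_t bytes)
{
	for (size_t i = 0; i < n; i++)
		if (e[i].len >= bytes)
			return true;
	return false;
}
```

With two free ranges of 4MiB and 6MiB, total_free() reports 10MiB, so a
9MiB request passes the space check, yet can_allocate() is false for 9MiB
because no single range is large enough.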
    
    We specifically don't cycle through the find_free_extent (FFE) loop
    until we stop finding cached block groups, because we don't want to
    allocate new block groups just because we're caching.  So we short
    circuit the normal loop once we hit LOOP_CACHING_WAIT and we found a
    caching block group.
    
    This is normally fine, except in this particular case where the caching
    thread can't make progress because the DM device has been suspended.
    
    Fix this by waiting not only for free space to be >= the amount of space
    we want to allocate, but also for caching to make some progress from the
    time we start waiting.  This keeps us from busy looping when caching is
    taking a while but the block group still theoretically has enough space
    for us to allocate from, and it fixes this particular case by forcing us
    to actually sleep and wait for forward progress, which flushes the plug.
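
A minimal sketch of the two wait conditions, assuming the caching thread
bumps a progress counter each time it adds free space.  The struct and
function names here are hypothetical and only model the predicate; they
are not the kernel code touched by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a block group that is still being cached. */
struct caching_state {
	bool done;             /* caching has finished */
	uint64_t free_space;   /* free bytes discovered so far */
	unsigned int progress; /* assumed: bumped whenever caching adds space */
};

/* Old condition: return as soon as enough total free space is known. */
static bool old_wait_done(const struct caching_state *s, uint64_t num_bytes)
{
	return s->done || s->free_space >= num_bytes;
}

/*
 * New condition: additionally require that caching moved forward since
 * the waiter sampled start_progress, so a stalled caching thread makes
 * the caller sleep instead of returning immediately.
 */
static bool new_wait_done(const struct caching_state *s, uint64_t num_bytes,
			  unsigned int start_progress)
{
	return s->done ||
	       (s->progress != start_progress && s->free_space >= num_bytes);
}
```

With 10MiB of free space, a 9MiB request, and a stalled caching thread,
old_wait_done() is true on every call, so the caller never sleeps.
new_wait_done() stays false until the counter changes or caching
finishes, which is what forces the schedule.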
    
    With this fix we're no longer hanging with generic/475.
    
    CC: stable@vger.kernel.org # 6.1+
    Reviewed-by: Boris Burkov <boris@bur.io>
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
block-group.h 11.9 KB