    btrfs: add new unused block groups to the list of unused block groups · 12c5128f
    Filipe Manana authored
    Space reservations for metadata are, most of the time, pessimistic as we
    reserve space for worst possible cases - where tree heights are at the
    maximum possible height (8), we need to COW every extent buffer in a tree
    path, need to split extent buffers, etc.
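
    As a rough illustration of how pessimistic this is, the sketch below
    computes a worst case reservation for inserting items. It assumes the
    factor of 2 accounts for splitting each COWed extent buffer, and the
    16 KiB node size is only an example value. The function and macro names
    are hypothetical and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the node size is chosen at mkfs time. */
#define MAX_TREE_LEVEL   8      /* maximum tree height, as stated above */
#define EXAMPLE_NODESIZE 16384  /* hypothetical 16 KiB extent buffer size */

/*
 * Pessimistic reservation for inserting num_items items: every level of
 * the tree path may need to be COWed, and every COWed extent buffer may
 * also need to be split, hence the factor of 2 (an assumption of this
 * sketch).
 */
static uint64_t worst_case_insert_reservation(uint64_t nodesize,
                                              unsigned int num_items)
{
    assert(nodesize > 0);
    return nodesize * MAX_TREE_LEVEL * 2 * num_items;
}
```

    Under these assumptions a single item reserves 256 KiB of metadata
    space, even though most insertions touch only a leaf.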
    
    For data, we generally reserve the exact amount of space we are going to
    allocate. The exception here is when using compression, in which case we
    reserve space matching the uncompressed size, as the compression only
    happens at writeback time and in the worst possible case we need that
    amount of space in case the data is not compressible.
    
    This means that when there is no available space in the corresponding
    space_info object, we may need to allocate a new block group, and then
    that block group might not be used after all. In this case the block
    group is never added to the list of unused block groups and ends up
    never being deleted - except if we unmount and mount the fs again, as
    when reading block groups from disk we add unused ones to the list of
    unused block groups (fs_info->unused_bgs). Otherwise a block group is
    only added to the list of unused block groups when we deallocate the
    last extent from it, so if no extent is ever allocated, the block group
    is kept around forever.
    
    This also means that if we have a bunch of tasks reserving space in
    parallel, we can end up allocating many block groups that are never
    used, or that are kept around for too long without being used. This has
    the potential to result in ENOSPC failures if, for example, we
    over-allocate metadata block groups and then end up in a state without
    enough unallocated space to allocate a new data block group.
    
    This is more likely to happen with metadata reservations as of kernel
    6.7, namely since commit 28270e25 ("btrfs: always reserve space for
    delayed refs when starting transaction"), because we started to always
    reserve space for delayed references when starting a transaction handle
    for a non-zero number of items, and also to try to reserve space to fill
    the gap between the delayed block reserve's reserved space and its size.
    
    So to avoid this, when finishing the creation of a new block group, add the
    block group to the list of unused block groups if it's still unused at
    that time. This way the next time the cleaner kthread runs, it will delete
    the block group if it's still unused and not needed to satisfy existing
    space reservations.
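
    The behaviour described above can be modelled in user space as follows.
    This is a simplified sketch and not the kernel implementation: all
    struct and function names are hypothetical, the list is a plain singly
    linked list standing in for fs_info->unused_bgs, and pending
    reservations are tracked per block group here purely to keep the model
    small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model of a block group. */
struct block_group {
    uint64_t used;                    /* bytes allocated to extents */
    uint64_t reserved;                /* bytes held by pending reservations */
    bool on_unused_list;
    struct block_group *next_unused;
};

struct fs_model {
    struct block_group *unused_bgs;   /* stands in for fs_info->unused_bgs */
};

/* Add a block group to the unused list, at most once. */
static void mark_unused(struct fs_model *fs, struct block_group *bg)
{
    assert(fs != NULL && bg != NULL);
    if (bg->on_unused_list)
        return;
    bg->on_unused_list = true;
    bg->next_unused = fs->unused_bgs;
    fs->unused_bgs = bg;
}

/* The idea of the fix: at the end of creation, queue still-unused groups. */
static void finish_block_group_creation(struct fs_model *fs,
                                        struct block_group *bg)
{
    if (bg->used == 0)
        mark_unused(fs, bg);
}

/*
 * Model of one cleaner run: empty the list and count the block groups
 * that are still unused and not needed by any pending reservation.
 * A real implementation would delete those block groups here.
 */
static size_t cleaner_pass(struct fs_model *fs)
{
    size_t deleted = 0;
    struct block_group *bg = fs->unused_bgs;

    fs->unused_bgs = NULL;
    while (bg != NULL) {
        struct block_group *next = bg->next_unused;

        bg->on_unused_list = false;
        bg->next_unused = NULL;
        if (bg->used == 0 && bg->reserved == 0)
            deleted++;
        bg = next;
    }
    return deleted;
}
```

    In this model, a block group that received an extent before creation
    finished is never queued, and one that still backs a reservation is
    queued but survives the cleaner pass.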
    
    Reported-by: Ivan Shapovalov <intelfx@intelfx.name>
    Link: https://lore.kernel.org/linux-btrfs/9cdbf0ca9cdda1b4c84e15e548af7d7f9f926382.camel@intelfx.name/
    CC: stable@vger.kernel.org # 6.7+
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Reviewed-by: Boris Burkov <boris@bur.io>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
block-group.c 137 KB