    btrfs: reserve space for delayed refs on a per ref basis · 3ee56a58
    Filipe Manana authored
    Currently when reserving space for delayed refs we do it on a per ref head
    basis. This is generally enough because most back refs for an extent end
    up being inlined in the extent item - with the default leaf size of 16K we
    can have at most 33 inline back refs (this is calculated by the macro
    BTRFS_MAX_EXTENT_ITEM_SIZE()). The amount of bytes reserved for each ref
    head is given by btrfs_calc_delayed_ref_bytes(), which basically
    corresponds to a single path for insertion into the extent tree plus
    another path for insertion into the free space tree if it's enabled.
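The two numbers in the paragraph above can be reproduced with a little arithmetic. The following is a minimal userspace sketch, not kernel code: the structure sizes are assumptions taken from the on-disk format (101-byte header, 25-byte item, 24-byte extent item, 29-byte inline data ref), and the worst-case insertion cost is assumed to be nodesize * BTRFS_MAX_LEVEL * 2. The function names are hypothetical and only loosely mirror BTRFS_MAX_EXTENT_ITEM_SIZE() and btrfs_calc_delayed_ref_bytes().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed on-disk sizes in bytes (illustrative constants, not kernel headers). */
#define HDR_SIZE             101u /* struct btrfs_header */
#define ITEM_SIZE             25u /* struct btrfs_item */
#define EXTENT_ITEM_SIZE      24u /* struct btrfs_extent_item */
#define INLINE_DATA_REF_SIZE  29u /* type byte + struct btrfs_extent_data_ref */
#define MAX_LEVEL              8u /* BTRFS_MAX_LEVEL */

/* Assumed shape of BTRFS_MAX_EXTENT_ITEM_SIZE(): 1/16 of the leaf data area,
 * minus one item header. */
uint32_t max_extent_item_size(uint32_t nodesize)
{
	assert(nodesize > HDR_SIZE);
	return ((nodesize - HDR_SIZE) >> 4) - ITEM_SIZE;
}

/* How many inline data back refs fit after the fixed extent item. */
uint32_t max_inline_data_refs(uint32_t nodesize)
{
	return (max_extent_item_size(nodesize) - EXTENT_ITEM_SIZE) /
	       INLINE_DATA_REF_SIZE;
}

/* Worst case for one insertion: every level is COWed and may be split. */
uint64_t insert_path_bytes(uint32_t nodesize)
{
	return (uint64_t)nodesize * MAX_LEVEL * 2;
}

/* One extent tree path per item, doubled if the free space tree is enabled. */
uint64_t delayed_ref_bytes(uint32_t nodesize, uint32_t nr, bool free_space_tree)
{
	uint64_t bytes = insert_path_bytes(nodesize) * nr;

	return free_space_tree ? bytes * 2 : bytes;
}
```

Under these assumptions, a 16K leaf gives max_inline_data_refs(16384) == 33, matching the figure quoted above, and a single reservation is 256K, or 512K with the free space tree enabled.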
    
    However if we have reached the limit of inline refs or we have a mix of
    inline and non-inline refs, then we will need to insert a non-inline ref
    and update the existing extent item to adjust the total number of
    references for the extent. This implies we need reserved space for two
    insertion paths in the extent tree, but we only reserved for one path.
    The extent item and the non-inline ref item may be located in different
    leaves, or even if they are located in the same leaf, after updating the
    extent item and before inserting the non-inline ref item, the extent
    buffers in the btree path may have been written (due to memory pressure,
    for example), in which case we need to COW the entire path again. In this
    case, since we have not reserved enough space in the delayed refs block
    reserve, we will use the global block reserve.
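The shortfall described above can be modelled with a small sketch. This is an illustration under stated assumptions, not the kernel's logic: struct ref_update and both helpers are hypothetical names, and the rule "two paths when the inline limit is reached or non-inline refs already exist" is taken directly from the paragraph above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one delayed ref that is about to be run. */
struct ref_update {
	uint32_t inline_refs;     /* inline back refs already in the extent item */
	uint32_t max_inline_refs; /* limit for this leaf size, e.g. 33 for 16K */
	bool has_non_inline_refs; /* extent already has separate ref items */
};

/* Extent tree paths that may need to be COWed to add this ref. */
uint32_t extent_tree_paths_needed(const struct ref_update *u)
{
	assert(u != NULL);
	/* Non-inline case: one path for the extent item, one for the ref item. */
	if (u->has_non_inline_refs || u->inline_refs >= u->max_inline_refs)
		return 2;
	/* Inline case: only the extent item's path is touched. */
	return 1;
}

/* Bytes not covered by the reservation; in the scenario described above
 * these would come out of the global block reserve. */
uint64_t unreserved_bytes(const struct ref_update *u, uint64_t path_bytes,
			  uint32_t paths_reserved)
{
	uint32_t needed = extent_tree_paths_needed(u);

	if (needed <= paths_reserved)
		return 0;
	return (uint64_t)(needed - paths_reserved) * path_bytes;
}
```

With one path reserved per ref head, an extent that already holds 33 inline refs leaves one full path unreserved when another ref is added, while an extent with only a few inline refs leaves nothing unreserved.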
    
    If the fs does not have enough unallocated space left to allocate a new
    metadata block group and the available space in the existing
    metadata block groups is close to the maximum size of the global block
    reserve (512M), we may end up consuming too much of the free metadata
    space to the point where we can't commit any future transaction because it
    will fail, with -ENOSPC, during its commit when trying to allocate an
    extent for some COW operation (running delayed refs generated by running
    delayed refs or COWing the root tree's root node at commit_cowonly_roots()
    for example). Such a dramatic scenario can happen if we have many delayed
    refs that require the insertion of non-inline ref items, due to too many
    reflinks or snapshots. We also have situations where we use the global
    block reserve because we could not in advance know that we will need
    space to update some trees (block group creation for example), so this
    all adds up to increase the chances of exhausting the global block reserve
    and making any future transaction commit fail with -ENOSPC and turn
    the fs into RO mode, or fail the mount operation in case the mount needs
    to start and commit a transaction, such as when we have orphans to clean
    up. Such a case was reported by someone running a SLE (SUSE Linux
    Enterprise) distribution, where the fs had no
    more unallocated space that could be used to allocate a new metadata block
    group, and the available metadata space was about 1.5M, not enough to
    commit a transaction to clean up an orphan inode (or do relocation of data
    block groups that were far from being full).
    
    So reserve space for delayed refs by individual refs and not by ref heads,
    as we may need to COW multiple extent tree paths due to non-inline ref
    items.
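To make the difference between the two schemes concrete, here is a rough accounting sketch. It is a simplified model under assumptions, not the patch itself: both function names are hypothetical, path_bytes stands for the worst-case cost of one extent tree insertion, and the free space tree doubling and the release of unused reservations are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old scheme: one reservation per ref head, however many refs it holds. */
uint64_t reserve_per_head(uint64_t path_bytes, uint32_t nr_heads)
{
	return path_bytes * nr_heads;
}

/* Scheme described above: one reservation for each individual ref. */
uint64_t reserve_per_ref(uint64_t path_bytes, const uint32_t *refs_per_head,
			 uint32_t nr_heads)
{
	uint64_t total = 0;

	assert(refs_per_head != NULL || nr_heads == 0);
	for (uint32_t i = 0; i < nr_heads; i++)
		total += path_bytes * refs_per_head[i];
	return total;
}
```

For three ref heads carrying 1, 40 and 3 refs and a 256K path cost, the per-head scheme reserves 768K while the per-ref scheme reserves 11M, so a heavily reflinked extent no longer has to fall back on the global block reserve in this model.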

    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>