• Filipe Manana's avatar
    Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow · 1d512cb7
    Filipe Manana authored
    If we are using the NO_HOLES feature, we have a tiny time window when
    running delalloc for a nodatacow inode where we can race with a concurrent
    link or xattr add operation leading to a BUG_ON.
    
    This happens because at run_delalloc_nocow() we end up casting a leaf item
    of type BTRFS_INODE_[REF|EXTREF]_KEY or of type BTRFS_XATTR_ITEM_KEY to a
    file extent item (struct btrfs_file_extent_item) and then analyse its
    extent type field, which won't match any of the expected extent types
    (values BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]) and therefore trigger an
    explicit BUG_ON(1).
    
    The following sequence diagram shows how the race happens when running a
    no-cow dellaloc range [4K, 8K[ for inode 257 and we have the following
    neighbour leafs:
    
                 Leaf X (has N items)                    Leaf Y
    
     [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
                  slot N - 2         slot N - 1              slot 0
    
     (Note the implicit hole for inode 257 regarding the [0, 8K[ range)
    
           CPU 1                                         CPU 2
    
     run_dealloc_nocow()
       btrfs_lookup_file_extent()
         --> searches for a key with value
             (257 EXTENT_DATA 4096) in the
             fs/subvol tree
         --> returns us a path with
             path->nodes[0] == leaf X and
             path->slots[0] == N
    
       because path->slots[0] is >=
       btrfs_header_nritems(leaf X), it
       calls btrfs_next_leaf()
    
       btrfs_next_leaf()
         --> releases the path
    
                                                  hard link added to our inode,
                                                  with key (257 INODE_REF 500)
                                                  added to the end of leaf X,
                                                  so leaf X now has N + 1 keys
    
         --> searches for the key
             (257 INODE_REF 256), because
             it was the last key in leaf X
             before it released the path,
             with path->keep_locks set to 1
    
         --> ends up at leaf X again and
             it verifies that the key
             (257 INODE_REF 256) is no longer
             the last key in the leaf, so it
             returns with path->nodes[0] ==
             leaf X and path->slots[0] == N,
             pointing to the new item with
             key (257 INODE_REF 500)
    
       the loop iteration of run_dealloc_nocow()
       does not break out the loop and continues
       because the key referenced in the path
       at path->nodes[0] and path->slots[0] is
       for inode 257, its type is < BTRFS_EXTENT_DATA_KEY
       and its offset (500) is less then our delalloc
       range's end (8192)
    
       the item pointed by the path, an inode reference item,
       is (incorrectly) interpreted as a file extent item and
       we get an invalid extent type, leading to the BUG_ON(1):
    
       if (extent_type == BTRFS_FILE_EXTENT_REG ||
          extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
           (...)
       } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
           (...)
       } else {
           BUG_ON(1)
       }
    
    The same can happen if a xattr is added concurrently and ends up having
    a key with an offset smaller then the delalloc's range end.
    
    So fix this by skipping keys with a type smaller than
    BTRFS_EXTENT_DATA_KEY.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    1d512cb7
inode.c 267 KB