• Filipe Manana's avatar
    Btrfs: send, fix race with transaction commits that create snapshots · be6821f8
    Filipe Manana authored
    If we create a snapshot of a snapshot currently being used by a send
    operation, we can end up with send failing unexpectedly (returning
    -ENOENT error to user space for example). The following diagram shows
    how this happens.
    
                CPU 1                                   CPU2                                CPU3
    
     btrfs_ioctl_send()
      (...)
                                         create_snapshot()
                                          -> creates snapshot of a
                                             root used by the send
                                             task
                                          btrfs_commit_transaction()
                                           create_pending_snapshot()
      __get_inode_info()
       btrfs_search_slot()
        btrfs_search_slot_get_root()
         down_read commit_root_sem
    
         get reference on eb of the
         commit root
          -> eb with bytenr == X
    
         up_read commit_root_sem
    
                                            btrfs_cow_block(root node)
                                             btrfs_free_tree_block()
                                              -> creates delayed ref to
                                                 free the extent
    
                                           btrfs_run_delayed_refs()
                                            -> runs the delayed ref,
                                               adds extent to
                                               fs_info->pinned_extents
    
                                           btrfs_finish_extent_commit()
                                            unpin_extent_range()
                                             -> marks extent as free
                                                in the free space cache
    
                                          transaction commit finishes
    
                                                                           btrfs_start_transaction()
                                                                            (...)
                                                                            btrfs_cow_block()
                                                                             btrfs_alloc_tree_block()
                                                                              btrfs_reserve_extent()
                                                                               -> allocates extent at
                                                                                  bytenr == X
                                                                              btrfs_init_new_buffer(bytenr X)
                                                                               btrfs_find_create_tree_block()
                                                                                alloc_extent_buffer(bytenr X)
                                                                                 find_extent_buffer(bytenr X)
                                                                                  -> returns existing eb,
                                                                                     which the send task got
    
                                                                            (...)
                                                                             -> modifies content of the
                                                                                eb with bytenr == X
    
        -> uses an eb that now
           belongs to some other
           tree and no more matches
           the commit root of the
           snapshot, resuts will be
           unpredictable
    
    The consequences of this race can be various, and can lead to searches in
    the commit root performed by the send task failing unexpectedly (unable to
    find inode items, returning -ENOENT to user space, for example) or not
    failing because an inode item with the same number was added to the tree
    that reused the metadata extent, in which case send can behave incorrectly
    in the worst case or just fail later for some reason.
    
    Fix this by performing a copy of the commit root's extent buffer when doing
    a search in the context of a send operation.
    
    CC: stable@vger.kernel.org # 4.4.x: 1fc28d8e: Btrfs: move get root out of btrfs_search_slot to a helper
    CC: stable@vger.kernel.org # 4.4.x: f9ddfd05: Btrfs: remove unused check of skip_locking
    CC: stable@vger.kernel.org # 4.4.x
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    be6821f8
ctree.c 151 KB