• Nikolay Borisov's avatar
    btrfs: Ensure replaced device doesn't have pending chunk allocation · debd1c06
    Nikolay Borisov authored
    Recent FITRIM work, namely bbbf7243 ("btrfs: combine device update
    operations during transaction commit") combined the way certain
    operations are recoded in a transaction. As a result an ASSERT was added
    in dev_replace_finish to ensure the new code works correctly.
    Unfortunately I got reports that it's possible to trigger the assert,
    meaning that during a device replace it's possible to have an unfinished
    chunk allocation on the source device.
    
    This is supposed to be prevented by the fact that a transaction is
    committed before finishing the replace oepration and alter acquiring the
    chunk mutex. This is not sufficient since by the time the transaction is
    committed and the chunk mutex acquired it's possible to allocate a chunk
    depending on the workload being executed on the replaced device. This
    bug has been present ever since device replace was introduced but there
    was never code which checks for it.
    
    The correct way to fix is to ensure that there is no pending device
    modification operation when the chunk mutex is acquire and if there is
    repeat transaction commit. Unfortunately it's not possible to just
    exclude the source device from btrfs_fs_devices::dev_alloc_list since
    this causes ENOSPC to be hit in transaction commit.
    
    Fixing that in another way would need to add special cases to handle the
    last writes and forbid new ones. The looped transaction fix is more
    obvious, and can be easily backported. The runtime of dev-replace is
    long so there's no noticeable delay caused by that.
    Reported-by: default avatarDavid Sterba <dsterba@suse.com>
    Fixes: 391cd9df ("Btrfs: fix unprotected alloc list insertion during the finishing procedure of replace")
    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
    Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    debd1c06
dev-replace.c 31.4 KB