Commit 53e489bc authored by Filipe Manana's avatar Filipe Manana Committed by Chris Mason

Btrfs: check pending chunks when shrinking fs to avoid corruption

When we shrink the usable size of a device (its total_bytes), we go over
all the device extent items in the device tree and attempt to relocate
the chunk of any device extent that goes beyond the new usable size for
the device. We do that after setting the new usable size (total_bytes) in
the device object, so that all new allocations (and reallocations) don't
use areas of the device that go beyond the new (shorter) size. However we
were not considering that before setting the new size in the device,
pending chunks might have been created that use device extents that go
beyond the new size, and those device extents are not yet in the device
tree after we search the device tree - they are still attached to the
list of new block group for some ongoing transaction handle, and they are
only added to the device tree when the transaction handle is ended (via
btrfs_create_pending_block_groups()).

So check for pending chunks with device extents that go beyond the new
size and if any exists, commit the current transaction and repeat the
search in the device tree.

Not doing this it would mean we would return success to user space while
still having extents that go beyond the new size, and later user space
could override those locations on the device while the fs still references
them, causing all sorts of corruption and unexpected events.
Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
Signed-off-by: default avatarChris Mason <clm@fb.com>
parent 64ad6c48
......@@ -3965,6 +3965,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
int slot;
int failed = 0;
bool retried = false;
bool checked_pending_chunks = false;
struct extent_buffer *l;
struct btrfs_key key;
struct btrfs_super_block *super_copy = root->fs_info->super_copy;
......@@ -4045,15 +4046,6 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
goto again;
} else if (failed && retried) {
ret = -ENOSPC;
lock_chunks(root);
btrfs_device_set_total_bytes(device, old_size);
if (device->writeable)
device->fs_devices->total_rw_bytes += diff;
spin_lock(&root->fs_info->free_chunk_lock);
root->fs_info->free_chunk_space += diff;
spin_unlock(&root->fs_info->free_chunk_lock);
unlock_chunks(root);
goto done;
}
......@@ -4065,6 +4057,35 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
}
lock_chunks(root);
/*
* We checked in the above loop all device extents that were already in
* the device tree. However before we have updated the device's
* total_bytes to the new size, we might have had chunk allocations that
* have not complete yet (new block groups attached to transaction
* handles), and therefore their device extents were not yet in the
* device tree and we missed them in the loop above. So if we have any
* pending chunk using a device extent that overlaps the device range
* that we can not use anymore, commit the current transaction and
* repeat the search on the device tree - this way we guarantee we will
* not have chunks using device extents that end beyond 'new_size'.
*/
if (!checked_pending_chunks) {
u64 start = new_size;
u64 len = old_size - new_size;
if (contains_pending_extent(trans, device, &start, len)) {
unlock_chunks(root);
checked_pending_chunks = true;
failed = 0;
retried = false;
ret = btrfs_commit_transaction(trans, root);
if (ret)
goto done;
goto again;
}
}
btrfs_device_set_disk_total_bytes(device, new_size);
if (list_empty(&device->resized_list))
list_add_tail(&device->resized_list,
......@@ -4079,6 +4100,16 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
btrfs_end_transaction(trans, root);
done:
btrfs_free_path(path);
if (ret) {
lock_chunks(root);
btrfs_device_set_total_bytes(device, old_size);
if (device->writeable)
device->fs_devices->total_rw_bytes += diff;
spin_lock(&root->fs_info->free_chunk_lock);
root->fs_info->free_chunk_space += diff;
spin_unlock(&root->fs_info->free_chunk_lock);
unlock_chunks(root);
}
return ret;
}
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment