Commits · 99c15fec3280a766b0df2903542410c475585bed · Kirill Smelkov / linux

04 Mar, 2024 40 commits

btrfs: remove do_list variable at btrfs_set_delalloc_extent() · 99c15fec

Filipe Manana authored Feb 09, 2024

The "do_list" variable is only used once, plus its name/meaning is a bit
confusing, so remove it and directory use btrfs_is_free_space_inode().
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

99c15fec

btrfs: use assertion instead of BUG_ON when adding/removing to delalloc list · d23626d8

Filipe Manana authored Feb 09, 2024

When adding or removing and inode to/from the root's delalloc list,
instead of using a BUG_ON() to validate list emptiness, use ASSERT()
since this is to check logic errors rather than real errors.
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

d23626d8

btrfs: add lockdep assertion to remaining delalloc callbacks · b5d56392

Filipe Manana authored Feb 09, 2024

The merge and split callbacks for an inode's io tree are supposed to be
called while the io tree's spinlock is being held, so that the given
extent_state records are stable, not modified or freed while the callbacks
are using them. So add lockdep assertions in the callbacks.
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

b5d56392

btrfs: reduce inode lock critical section when setting and clearing delalloc · bdc0f89e

Filipe Manana authored Feb 09, 2024

When setting and clearing a delalloc range, at btrfs_set_delalloc_extent()
and btrfs_clear_delalloc_extent(), we are adding/removing the inode
to/from the root's list of delalloc inodes while under the protection of
the inode's lock. This however is not needed, we can add and remove the
inode to the root's list without holding the inode's lock because here
we are under the protection of the io tree's lock, reducing the size of
the critical section delimited by the inode's lock. The inode's lock is
used in many other places such as when finishing an ordered extent (when
calling btrfs_update_inode_bytes() or btrfs_delalloc_release_metadata(),
or decreasing the number of outstanding extents) or when reserving space
when doing a buffered or direct IO write (calls to functions from
delalloc-space.c).

So move the inode add/remove operations to the root's list of delalloc
inodes to outside the critical section delimited by the inode's lock.
This also allows us to get rid of the BTRFS_INODE_IN_DELALLOC_LIST flag
since we can rely on the inode's delalloc bytes counter to determine if
the inode is or is not in the list.

The following fio based test, that exercises IO to multiple files in the
same subvolume, was used to test:

   $ cat test.sh
   #!/bin/bash

   DEV=/dev/nullb0
   MNT=/mnt/nullb0
   MOUNT_OPTIONS="-o ssd"

   mkfs.btrfs -f $DEV &> /dev/null
   mount $MOUNT_OPTIONS $DEV $MNT

   fio --direct=0 --ioengine=sync --thread --directory=$MNT \
       --invalidate=1 --group_reporting=1 \
       --new_group --rw=randwrite --size=50m --numjobs=200 \
       --bs=4k --fsync_on_close=0 --fallocate=none --end_fsync=0 \
       --name=foo --filename_format=FioWorkloads.\$jobnum

   umount $MNT

The test was run on a non-debug kernel (Debian's default kernel config)
against a 16G null block device.

Result before this patch:

   WRITE: bw=81.9MiB/s (85.9MB/s), 81.9MiB/s-81.9MiB/s (85.9MB/s-85.9MB/s), io=9.77GiB (10.5GB), run=122136-122136msec

Result after this patch:

   WRITE: bw=86.8MiB/s (91.0MB/s), 86.8MiB/s-86.8MiB/s (91.0MB/s-91.0MB/s), io=9.77GiB (10.5GB), run=115180-115180msec
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

bdc0f89e

btrfs: rename btrfs_add_delalloc_inodes() to singular form · f4f15454

Filipe Manana authored Feb 08, 2024

The function btrfs_add_delalloc_inodes() adds a single inode its root's
list of delalloc inodes, so it doesn't make any sense at all for the
function's name to be plural. Rename it to the singular form
btrfs_add_delalloc_inode() to avoid any confusion.
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

f4f15454

btrfs: assert root delalloc lock is held at __btrfs_del_delalloc_inode() · f23f8952

Filipe Manana authored Feb 08, 2024

This function requires the delalloc lock of the inode's root to be held,
so assert it's held.
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

f23f8952

btrfs: stop passing root argument to __btrfs_del_delalloc_inode() · f5169f12

Filipe Manana authored Feb 08, 2024

There's no need to pass a root argument to __btrfs_del_delalloc_inode()
and btrfs_del_delalloc_inode(), we can just pass the inode since the root
is always the root associated to that inode. Some remove the root argument
from these functions.
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

f5169f12

btrfs: stop passing root argument to btrfs_add_delalloc_inodes() · 8a46e55a

Filipe Manana authored Feb 08, 2024

There's no need to pass a root argument to btrfs_add_delalloc_inodes(), we
can just pass the inode since the root is always the root associated to
the inode in the context it's called. So remove it and have the single
caller pass only the inode.
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

8a46e55a

btrfs: add forward declarations and headers, part 3 · 5693a128

David Sterba authored Jan 27, 2024

Do a cleanup in the rest of the headers:

- add forward declarations for types referenced by pointers
- add includes when types need them

This fixes potential compilation problems if the headers are reordered
or the missing includes are not provided indirectly.
Signed-off-by: David Sterba <dsterba@suse.com>

5693a128

btrfs: add forward declarations and headers, part 2 · 602035d7

David Sterba authored Jan 27, 2024

Do a cleanup in more headers:

- add forward declarations for types referenced by pointers
- add includes when types need them

This fixes potential compilation problems if the headers are reordered
or the missing includes are not provided indirectly.
Signed-off-by: David Sterba <dsterba@suse.com>

602035d7

btrfs: add forward declarations and headers, part 1 · 22b46bdc

David Sterba authored Jan 27, 2024

Do a cleanup in the short headers:

- add forward declarations for types referenced by pointers
- add includes when types need them

This fixes potential compilation problems if the headers are reordered
or the missing includes are not provided indirectly.
Signed-off-by: David Sterba <dsterba@suse.com>

22b46bdc

btrfs: hoist fs_info out of loops in end_bbio_data_write and end_bbio_data_read · e84bfffc

David Sterba authored Jan 24, 2024

The fs_info and sectorsize remain the same during the loops, no need to
set them on each iteration.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

e84bfffc

btrfs: add helper to get fs_info from struct inode pointer · 41044b41

David Sterba authored Sep 14, 2023

Add a convenience helper to get a fs_info from a VFS inode pointer
instead of open coding the chain or using btrfs_sb() that in some cases
does one more pointer hop.  This is implemented as a macro (still with
type checking) so we don't need full definitions of struct btrfs_inode,
btrfs_root or btrfs_fs_info.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

41044b41

btrfs: add helpers to get fs_info from page/folio pointers · b33d2e53

David Sterba authored Sep 14, 2023

Add convenience helpers to get a fs_info from a page or folio pointer
instead of open coding the chain or using btrfs_sb() that in some cases
does one more pointer hop.  This is implemented as a macro (still with
type checking) so we don't need full definitions of struct page, folio,
btrfs_root and btrfs_fs_info. The latter can't be static inlines as this
would create loop between ctree.h <-> fs.h, or the headers would have to
be restructured.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

b33d2e53

btrfs: add helpers to get inode from page/folio pointers · c8293894

David Sterba authored Sep 13, 2023

Add convenience helpers to get a struct btrfs_inode from a page or folio
pointer instead of open coding the chain or intermediate BTRFS_I. This
is implemented as a macro (still with type checking) so we don't need
full definitions of struct page or address_space.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

c8293894

btrfs: tests: allocate dummy fs_info and root in test_find_delalloc() · b2136cc2

David Sterba authored Jan 29, 2024

Allocate fs_info and root to have a valid fs_info pointer in case it's
dereferenced by a helper outside of tests, like find_lock_delalloc_range().
Signed-off-by: David Sterba <dsterba@suse.com>

b2136cc2

btrfs: mark __btrfs_add_free_space static · 737e6e5f

Lijuan Li authored Feb 06, 2024

__btrfs_add_free_space is only used in free-space-cache.c,
so mark it static.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Lijuan Li <lilijuan@iscas.ac.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

737e6e5f

btrfs: move transaction abort to the error site btrfs_rebuild_free_space_tree() · 3c9da0d5

David Sterba authored Jan 24, 2024

The recommended pattern for transaction abort after error is to place it
right after the error is handled. That way it's easier to locate where
it failed and help debugging.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

3c9da0d5

btrfs: move transaction abort to the error site in btrfs_create_free_space_tree() · ca7f7986

David Sterba authored Jan 24, 2024

The recommended pattern for transaction abort after error is to place it
right after the error is handled. That way it's easier to locate where
it failed and help debugging.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

ca7f7986

btrfs: move transaction abort to the error site in btrfs_delete_free_space_tree() · 0896ce75

David Sterba authored Jan 24, 2024

The recommended pattern for transaction abort after error is to place it
right after the error is handled. That way it's easier to locate where
it failed and help debugging.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

0896ce75

btrfs: unify handling of return values of btrfs_insert_empty_items() · 5378ea6e

David Sterba authored Jan 24, 2024

The error values returned by btrfs_insert_empty_items() are following
the common patter of 0/-errno, but some callers check for a value > 0,
which can't happen. Document that and update calls to not expect
positive values.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

5378ea6e

btrfs: change BUG_ON to assertion in reset_balance_state() · 53e4d8c2

David Sterba authored Jan 24, 2024

The balance state machine is complex so it's good to verify the
assumptions in helpers, however reset_balance_state() is used
at the end of balance and fs_info::balance_ctl is properly set up before
and protected by the exclusive op ownership in btrfs_balance().
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

53e4d8c2

btrfs: change BUG_ON to assertion when verifying root in btrfs_alloc_reserved_file_extent() · 3e1d51dd

David Sterba authored Jan 24, 2024

The file extents are normally reserved in subvolume roots but could be
also in the data reloc tree. Change the BUG_ON to assertions as this
verifies the usage assumptions.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

3e1d51dd

btrfs: change BUG_ON to assertion when verifying lockdep class setup · 504a00ac

David Sterba authored Jan 23, 2024

The BUG_ON in btrfs_set_buffer_lockdep_class() is a sanity check of the
level which is verified in callers, e.g. when initializing an extent
buffer or reading from an eb header. Change it to an assertion as this
would not happen unless things are really bad and would fail elsewhere
too.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

504a00ac

btrfs: change BUG_ON to assertion in btrfs_read_roots() · 2467d0fe

David Sterba authored Jan 24, 2024

There's one caller of btrfs_read_roots() and that already uses the
tree_root pointer, it's pointless to BUG_ON on it. As it's an assumption
of the initialization helpers make it an assert instead.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

2467d0fe

btrfs: defrag: change BUG_ON to assertion in btrfs_defrag_leaves() · 51d4be54

David Sterba authored Jan 19, 2024

The BUG_ON verifies a condition that should be guaranteed by the correct
use of the path search (with keep_locks and lowest_level set), an
assertion is the suitable check.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

51d4be54

btrfs: change BUG_ON to assertion when checking for delayed_node root · be73f444

David Sterba authored Jan 20, 2024

The pointer to root is initialized in btrfs_init_delayed_node(), no need
to check for it again. Change the BUG_ON to assertion.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

be73f444

btrfs: delayed-inode: drop pointless BUG_ON in __btrfs_remove_delayed_item() · 778e618b

David Sterba authored Jan 20, 2024

There's a BUG_ON checking for a valid pointer of fs_info::delayed_root
but it is valid since init_mount_fs_info() and has the same lifetime as
fs_info.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

778e618b

btrfs: export: handle invalid inode or root reference in btrfs_get_parent() · 26b66d1d

David Sterba authored Jan 19, 2024

The get_parent handler looks up a parent of a given dentry, this can be
either a subvolume or a directory. The search is set up with offset -1
but it's never expected to find such item, as it would break allowed
range of inode number or a root id. This means it's a corruption (ext4
also returns this error code).
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

26b66d1d

btrfs: handle invalid extent item reference found in check_committed_ref() · a4259b6c

David Sterba authored Jan 24, 2024

The check_committed_ref() helper looks up an extent item by a key,
allowing to do an inexact search when key->offset is -1.  It's never
expected to find such item, as it would break the allowed range of a
extent item offset.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

a4259b6c

btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks() · 7411055d

David Sterba authored Jan 23, 2024

The unhandled case in btrfs_relocate_sys_chunks() loop is a corruption,
as it could be caused only by two impossible conditions:

- at first the search key is set up to look for a chunk tree item, with
  offset -1, this is an inexact search and the key->offset will contain
  the correct offset upon a successful search, a valid chunk tree item
  cannot have an offset -1

- after first successful search, the found_key corresponds to a chunk
  item, the offset is decremented by 1 before the next loop, it's
  impossible to find a chunk item there due to alignment and size
  constraints
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

7411055d

btrfs: handle invalid root reference found in btrfs_init_root_free_objectid() · a6724290

David Sterba authored Jan 23, 2024

The btrfs_init_root_free_objectid() looks up a root by a key, allowing
to do an inexact search when key->offset is -1.  It's never expected to
find such item, as it would break the allowed range of a root id.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

a6724290

btrfs: handle invalid root reference found in btrfs_find_root() · 9dcb6ed9

David Sterba authored Jan 23, 2024

The btrfs_find_root() looks up a root by a key, allowing to do an
inexact search when key->offset is -1.  It's never expected to find such
item, as it would break allowed the range of a root id.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

9dcb6ed9

btrfs: handle root deletion lookup error in btrfs_del_root() · 0fe29838

David Sterba authored Jan 23, 2024

We're deleting a root and looking it up by key does not succeed, this
is an inconsistent state and we can't do anything. All callers handle
errors and abort a transaction.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

0fe29838

btrfs: handle block group lookup error when it's being removed · 97ec3320

David Sterba authored Jan 20, 2024

The unlikely case of lookup error in btrfs_remove_block_group() can be
handled properly, in its caller this would lead to a transaction abort.
We can't do anything else, a block group must have been loaded first.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

97ec3320

btrfs: handle invalid range and start in merge_extent_mapping() · c093bf30

David Sterba authored Jan 19, 2024

Turn a BUG_ON to a properly handled error and update the error message
in the caller.  It is expected that @em_in and @start passed to
btrfs_add_extent_mapping() overlap. Besides tests, the only caller
btrfs_get_extent() makes sure this is true.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

c093bf30

btrfs: handle directory and dentry mismatch in btrfs_may_delete() · 16865702

David Sterba authored Jan 19, 2024

The helper btrfs_may_delete() is a copy of generic fs/namei.c:may_delete()
to verify various conditions before deletion. There's a BUG_ON added
before linux.git started, we can turn it to a proper error handling
at least in our local helper. A mistmatch between directory and the
deleted dentry is clearly invalid.

This won't be probably ever hit due to the way how the parameters are
set from the caller btrfs_ioctl_snap_destroy(), using a VFS helper
lookup_one().
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

16865702

btrfs: use READ/WRITE_ONCE for fs_devices->read_policy · 4dc4a3be

Naohiro Aota authored Feb 02, 2024

Since we can read/modify the value from the sysfs interface concurrently,
it would be better to protect it from compiler optimizations.

Currently, there is only one read policy BTRFS_READ_POLICY_PID available,
so no actual problem can happen now. This is a preparation for the future
expansion.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

4dc4a3be

btrfs: preallocate temporary extent buffer for inode logging when needed · e383e158

Filipe Manana authored Jan 26, 2024

When logging an inode and we require to copy items from subvolume leaves
to the log tree, we clone each subvolume leaf and than use that clone to
copy items to the log tree. This is required to avoid possible deadlocks
as stated in commit 796787c9 ("btrfs: do not modify log tree while
holding a leaf from fs tree locked").

The cloning requires allocating an extent buffer (struct extent_buffer)
and then allocating pages (folios) to attach to the extent buffer. This
may be slow in case we are under memory pressure, and since we are doing
the cloning while holding a read lock on a subvolume leaf, it means we
can be blocking other operations on that leaf for significant periods of
time, which can increase latency on operations like creating other files,
renaming files, etc. Similarly because we're under a log transaction, we
may also cause extra delay on other tasks doing an fsync, because syncing
the log requires waiting for tasks that joined a log transaction to exit
the transaction.

So to improve this, for any inode logging operation that needs to copy
items from a subvolume leaf ("full sync" or "copy everything" bit set
in the inode), preallocate a dummy extent buffer before locking any
extent buffer from the subvolume tree, and even before joining a log
transaction, add it to the log context and then use it when we need to
copy items from a subvolume leaf to the log tree. This avoids making
other operations get extra latency when waiting to lock a subvolume
leaf that is used during inode logging and we are under heavy memory
pressure.

The following test script with bonnie++ was used to test this:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdh
  MNT=/mnt/sdh
  MOUNT_OPTIONS="-o ssd"

  MEMTOTAL_BYTES=`free -b | grep Mem: | awk '{ print $2 }'`
  NR_DIRECTORIES=20
  NR_FILES=20480
  DATASET_SIZE=$((MEMTOTAL_BYTES * 2 / 1048576))
  DIRECTORY_SIZE=$((MEMTOTAL_BYTES * 2 / NR_FILES))
  NR_FILES=$((NR_FILES / 1024))

  echo "performance" | \
      tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

  umount $DEV &> /dev/null
  mkfs.btrfs -f $MKFS_OPTIONS $DEV
  mount $MOUNT_OPTIONS $DEV $MNT

  bonnie++ -u root -d $MNT \
      -n $NR_FILES:$DIRECTORY_SIZE:$DIRECTORY_SIZE:$NR_DIRECTORIES \
      -r 0 -s $DATASET_SIZE -b

  umount $MNT

The results of this test on a 8G VM running a non-debug kernel (Debian's
default kernel config), were the following.

Before this change:

  Version 2.00a       ------Sequential Output------ --Sequential Input- --Random-
                      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
  Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  debian0       7501M  376k  99  1.4g  96  117m  14 1510k  99  2.5g  95 +++++ +++
  Latency             35068us   24976us    2944ms   30725us   71770us   26152us
  Version 2.00a       ------Sequential Create------ --------Random Create--------
  debian0             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
  files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  20:384100:384100/20 20480  32 20480  58 20480  48 20480  39 20480  56 20480  61
  Latency               411ms   11914us     119ms     617ms   10296us     110ms

After this change:

  Version 2.00a       ------Sequential Output------ --Sequential Input- --Random-
                      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
  Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  debian0       7501M  375k  99  1.4g  97  117m  14 1546k  99  2.3g  98 +++++ +++
  Latency             35975us  20945us    2144ms   10297us    2217us    6004us
  Version 2.00a       ------Sequential Create------ --------Random Create--------
  debian0             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
  files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  20:384100:384100/20 20480  35 20480  58 20480  48 20480  40 20480  57 20480  59
  Latency               320ms   11237us   77779us     518ms    6470us   86389us
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

e383e158

btrfs: add comment about list_is_singular() use at btrfs_delete_unused_bgs() · edebd19a

Filipe Manana authored Jan 25, 2024

At btrfs_delete_unused_bgs(), the use of the list_is_singular() check on
a block group may not be immediately obvious. It is there to prevent
losing raid profile information for a block group type (data, metadata or
system), as that information is removed from
fs_info->avail_[data|metadata|system]_alloc_bits when the last block group
of a given type is deleted. So deleting the block group would later result
in creating block groups of that type with a single profile (because
fs_info->avail_*_alloc_bits would have a value of 0).

This check was added in commit aefbe9a6 ("btrfs: Fix lost-data-profile
caused by auto removing bg").

So add a comment mentioning the need for the check.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

edebd19a