- 29 Apr, 2019 40 commits
-
-
Qu Wenruo authored
Since reloc tree doesn't contribute to qgroup numbers, just skip them. This should catch the final cause of unnecessary data ref processing when running balance of metadata with qgroups on. The 4G data 16 snapshots test (*) should explain it pretty well: | delayed subtree | refactor delayed ref | this patch --------------------------------------------------------------------- relocated | 22653 | 22673 | 22744 qgroup dirty | 122792 | 48360 | 70 time | 24.494 | 11.606 | 3.944 Finally, we're at the stage where qgroup + metadata balance cost no obvious overhead. Test environment: Test VM: - vRAM 8G - vCPU 8 - block dev vitrio-blk, 'unsafe' cache mode - host block 850evo Test workload: - Copy 4G data from /usr/ to one subvolume - Create 16 snapshots of that subvolume, and modify 3 files in each snapshot - Enable quota, rescan - Time "btrfs balance start -m" Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Similar to btrfs_inc_extent_ref(), use btrfs_ref to replace the long parameter list and the confusing @owner parameter. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Use the new btrfs_ref structure and replace parameter list to clean up the usage of owner and level to distinguish the extent types. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Since add_pinned_bytes() only needs to know if the extent is metadata and if it's a chunk tree extent, btrfs_ref is a perfect match for it, as we don't need various owner/level trick to determine extent type. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
It's a perfect match for btrfs_ref_tree_mod() to use btrfs_ref, as btrfs_ref describes a metadata/data reference update comprehensively. Now we have one less function use confusing owner/level trick. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Just like btrfs_add_delayed_tree_ref(), use btrfs_ref to refactor btrfs_add_delayed_data_ref(). Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
btrfs_add_delayed_tree_ref() has a longer and longer parameter list, and some callers like btrfs_inc_extent_ref() are using @owner as level for delayed tree ref. Instead of making the parameter list longer, use btrfs_ref to refactor it, so each parameter assignment should be self-explaining without dirty level/owner trick, and provides the basis for later refactoring. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
The process_func function pointer is local to __btrfs_mod_ref() and points to either btrfs_inc_extent_ref() or btrfs_free_extent(). Open code it to make later delayed ref refactor easier, so we can refactor btrfs_inc_extent_ref() and btrfs_free_extent() in different patches. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Current delayed ref interface has several problems: - Longer and longer parameter lists bytenr num_bytes parent ---------- so far so good ref_root owner offset ---------- I don't feel good now - Different interpretation of the same parameter Above @owner for data ref is inode number (u64), while for tree ref, it's level (int). They are even in different size range. For level we only need 0 ~ 8, while for ino it's BTRFS_FIRST_FREE_OBJECTID ~ BTRFS_LAST_FREE_OBJECTID. And @offset doesn't even make sense for tree ref. Such parameter reuse may look clever as an hidden union, but it destroys code readability. To solve both problems, we introduce a new structure, btrfs_ref to solve them: - Structure instead of long parameter list This makes later expansion easier, and is better documented. - Use btrfs_ref::type to distinguish data and tree ref - Use proper union to store data/tree ref specific structures. - Use separate functions to fill data/tree ref data, with a common generic function to fill common bytenr/num_bytes members. All parameters will find its place in btrfs_ref, and an extra member, @real_root, inspired by ref-verify code, is newly introduced for later qgroup code, to record which tree is triggered by this extent modification. This patch doesn't touch any code, but provides the basis for further refactoring. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
When finding out which inodes have references on a particular extent, done by backref.c:iterate_extent_inodes(), from the BTRFS_IOC_LOGICAL_INO (both v1 and v2) ioctl and from scrub we use the transaction join API to grab a reference on the currently running transaction, since in order to give accurate results we need to inspect the delayed references of the currently running transaction. However, if there is currently no running transaction, the join operation will create a new transaction. This is inefficient as the transaction will eventually be committed, doing unnecessary IO and introducing a potential point of failure that will lead to a transaction abort due to -ENOSPC, as recently reported [1]. That's because the join, creates the transaction but does not reserve any space, so when attempting to update the root item of the root passed to btrfs_join_transaction(), during the transaction commit, we can end up failling with -ENOSPC. Users of a join operation are supposed to actually do some filesystem changes and reserve space by some means, which is not the case of iterate_extent_inodes(), it is a read-only operation for all contextes from which it is called. The reported [1] -ENOSPC failure stack trace is the following: heisenberg kernel: ------------[ cut here ]------------ heisenberg kernel: BTRFS: Transaction aborted (error -28) heisenberg kernel: WARNING: CPU: 0 PID: 7137 at fs/btrfs/root-tree.c:136 btrfs_update_root+0x22b/0x320 [btrfs] (...) heisenberg kernel: CPU: 0 PID: 7137 Comm: btrfs-transacti Not tainted 4.19.0-4-amd64 #1 Debian 4.19.28-2 heisenberg kernel: Hardware name: FUJITSU LIFEBOOK U757/FJNB2A5, BIOS Version 1.21 03/19/2018 heisenberg kernel: RIP: 0010:btrfs_update_root+0x22b/0x320 [btrfs] (...) heisenberg kernel: RSP: 0018:ffffb5448828bd40 EFLAGS: 00010286 heisenberg kernel: RAX: 0000000000000000 RBX: ffff8ed56bccef50 RCX: 0000000000000006 heisenberg kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8ed6bda166a0 heisenberg kernel: RBP: 00000000ffffffe4 R08: 00000000000003df R09: 0000000000000007 heisenberg kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ed63396a078 heisenberg kernel: R13: ffff8ed092d7c800 R14: ffff8ed64f5db028 R15: ffff8ed6bd03d068 heisenberg kernel: FS: 0000000000000000(0000) GS:ffff8ed6bda00000(0000) knlGS:0000000000000000 heisenberg kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 heisenberg kernel: CR2: 00007f46f75f8000 CR3: 0000000310a0a002 CR4: 00000000003606f0 heisenberg kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 heisenberg kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 heisenberg kernel: Call Trace: heisenberg kernel: commit_fs_roots+0x166/0x1d0 [btrfs] heisenberg kernel: ? _cond_resched+0x15/0x30 heisenberg kernel: ? btrfs_run_delayed_refs+0xac/0x180 [btrfs] heisenberg kernel: btrfs_commit_transaction+0x2bd/0x870 [btrfs] heisenberg kernel: ? start_transaction+0x9d/0x3f0 [btrfs] heisenberg kernel: transaction_kthread+0x147/0x180 [btrfs] heisenberg kernel: ? btrfs_cleanup_transaction+0x530/0x530 [btrfs] heisenberg kernel: kthread+0x112/0x130 heisenberg kernel: ? kthread_bind+0x30/0x30 heisenberg kernel: ret_from_fork+0x35/0x40 heisenberg kernel: ---[ end trace 05de912e30e012d9 ]--- So fix that by using the attach API, which does not create a transaction when there is currently no running transaction. [1] https://lore.kernel.org/linux-btrfs/b2a668d7124f1d3e410367f587926f622b3f03a4.camel@scientia.net/Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from the device and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from the device and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from the device and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
None of the implementers of the submit_bio_hook use the bio_offset parameter, simply remove it. No functional changes. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
The btree submit hook queues the async csum and forwards the bio_offset parameter passed to btree_submit_bio_hook. This is redundant since btree_submit_bio_start calls btree_csum_one_bio which doesn't use the offset at all. No functional changes. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Buffered writeback always calls btrfs_csum_one_bio with the last 2 arguments being 0 irrespective of what the bio_offset has been passed to btrfs_submit_bio_start. Make this apparent by explicitly passing 0 for bio_offset when calling btrfs_wq_submit_bio from btrfs_submit_bio_hook. This will allow for further simplifications down the line. No functional changes. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
This function always uses the btree inode's io_tree. Stop taking the tree as a function argument and instead access it internally from read_extent_buffer_pages. No functional changes. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
The only possible 'private_data' that is passed to this function is actually an inode. Make that explicit by changing the signature of the call back. No functional changes. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
There is no need to use a typedef to define the type of the function and then use that to define the respective member in extent_io_ops. Define struct's member directly. No functional changes. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from the block group cache structure and can drop it from the parameters. Though the transaction is also availabe, it's not guaranteed to be non-NULL. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from the block group cache structure and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from the block group cache structure and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from the block group cache structure and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from the block group cache structure and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from the block group cache structure and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from the block group cache structure and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
It used to be called from only two places (truncate path and releasing a transaction handle), but commits 28bad212 ("btrfs: fix truncate throttling") and db2462a6 ("btrfs: don't run delayed refs in the end transaction logic") removed their calls to this function, so it's not used anymore. Just remove it and all its helpers. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
Previous patch made sure that btrfs_setxattr_trans() is called only when transaction NULL. Clean up btrfs_setxattr_trans() and drop the parameter. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
When the caller has already created the transaction handle, btrfs_setxattr() will use it. Also adds assert in btrfs_setxattr(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
btrfs_setxattr_trans() is called by 5 functions as below and all of them do updates. None of them would be roun on a read-only root. So its ok to remove the readonly root check here as it's a high-level conditon. 1. __btrfs_set_acl() btrfs_init_acl() btrfs_init_inode_security() 2. __btrfs_set_acl() btrfs_set_acl() 3. btrfs_set_prop() btrfs_set_prop_trans() / \ btrfs_ioctl_setflags() btrfs_xattr_handler_set_prop() 4. btrfs_xattr_handler_set() 5. btrfs_initxattrs() btrfs_xattr_security_init() btrfs_init_inode_security() Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
Preparatory patch, as we are going split the calls with and without transaction to use the respective btrfs_setxattr() and btrfs_setxattr_trans() functions. Export btrfs_setxattr() for calls outside of xattr.c. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
When trans is not NULL btrfs_setxattr() calls do_setxattr() directly with a check for readonly root. Rename do_setxattr() btrfs_setxattr() in preparation to call do_setxattr() directly instead. Preparatory patch, no functional changes. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
Rename btrfs_setxattr() to btrfs_setxattr_trans(), so that do_setxattr() can be renamed to btrfs_setxattr(). Preparatory patch, no functional changes. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Unlike btrfs_tree_lock() and btrfs_tree_read_lock(), the remaining functions in locking.c will not sleep, thus doesn't make much sense to record their execution time. Those events are introduced mainly for user space tool to audit and detect lock leakage or dead lock. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
There are two tree lock events which can sleep: - btrfs_tree_read_lock() - btrfs_tree_lock() Sometimes we may need to look into the concurrency picture of the fs. For that case, we need the execution time of above two functions and the owner of @eb. Here we introduce a trace events for user space tools like bcc, to get the execution time of above two functions, and get detailed owner info where eBPF code can't. All the overhead is hidden behind the trace events, so if events are not enabled, there is no overhead. These trace events also output bytenr and generation, allow them to be pared with unlock events to pin down deadlock. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
The member num_dirty_bgs of struct btrfs_transaction is not used anymore, it is set and incremented but nothing reads its value anymore. Its last read use was removed by commit 64403612 ("btrfs: rework btrfs_check_space_for_delayed_refs"). So just remove that member. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from the transaction and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from the transaction and can drop it from the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
-