- 31 Mar, 2018 7 commits
-
-
Filipe Manana authored
When logging an inode, at tree-log.c:copy_items(), if we call btrfs_next_leaf() at the loop which checks for the need to log holes, we need to make sure copy_items() returns the value 1 to its caller and not 0 (on success). This is because the path the caller passed was released and is now different from what is was before, and the caller expects a return value of 0 to mean both success and that the path has not changed, while a return value of 1 means both success and signals the caller that it can not reuse the path, it has to perform another tree search. Even though this is a case that should not be triggered on normal circumstances or very rare at least, its consequences can be very unpredictable (especially when replaying a log tree). Fixes: 16e7549f ("Btrfs: incompatible format change to remove hole extents") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
When we have the no-holes mode enabled and fsync a file after punching a hole in it, we can end up not logging the whole hole range in the log tree. This happens if the file has extent items that span more than one leaf and we punch a hole that covers a range that starts in a leaf but does not go beyond the offset of the first extent in the next leaf. Example: $ mkfs.btrfs -f -O no-holes -n 65536 /dev/sdb $ mount /dev/sdb /mnt $ for ((i = 0; i <= 831; i++)); do offset=$((i * 2 * 256 * 1024)) xfs_io -f -c "pwrite -S 0xab -b 256K $offset 256K" \ /mnt/foobar >/dev/null done $ sync # We now have 2 leafs in our filesystem fs tree, the first leaf has an # item corresponding the extent at file offset 216530944 and the second # leaf has a first item corresponding to the extent at offset 217055232. # Now we punch a hole that partially covers the range of the extent at # offset 216530944 but does go beyond the offset 217055232. $ xfs_io -c "fpunch $((216530944 + 128 * 1024 - 4000)) 256K" /mnt/foobar $ xfs_io -c "fsync" /mnt/foobar <power fail> # mount to replay the log $ mount /dev/sdb /mnt # Before this patch, only the subrange [216658016, 216662016[ (length of # 4000 bytes) was logged, leaving an incorrect file layout after log # replay. Fix this by checking if there is a hole between the last extent item that we processed and the first extent item in the next leaf, and if there is one, log an explicit hole extent item. Fixes: 16e7549f ("Btrfs: incompatible format change to remove hole extents") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We have a nice helper to do proper casting of a qgroup to a ulist aux value. And several places that could make use of it. Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
This reverts commit 48a89bc4. The idea to commit transaction and free some space after hitting qgroup limit is good, although the problem is it can easily cause deadlocks. One deadlock example is caused by trying to flush data while still holding it: Call Trace: __schedule+0x49d/0x10f0 schedule+0xc6/0x290 schedule_timeout+0x187/0x1c0 wait_for_completion+0x204/0x3a0 btrfs_wait_ordered_extents+0xa40/0xaf0 [btrfs] qgroup_reserve+0x913/0xa10 [btrfs] btrfs_qgroup_reserve_data+0x3ef/0x580 [btrfs] btrfs_check_data_free_space+0x96/0xd0 [btrfs] __btrfs_buffered_write+0x3ac/0xd40 [btrfs] btrfs_file_write_iter+0x62a/0xba0 [btrfs] __vfs_write+0x320/0x430 vfs_write+0x107/0x270 SyS_write+0xbf/0x150 do_syscall_64+0x1b0/0x3d0 entry_SYSCALL64_slow_path+0x25/0x25 Another can be caused by trying to commit one transaction while nesting with trans handle held by ourselves: btrfs_start_transaction() |- btrfs_qgroup_reserve_meta_pertrans() |- qgroup_reserve() |- btrfs_join_transaction() |- btrfs_commit_transaction() The retry is causing more problems than exppected when limit is enabled. At least a graceful EDQUOT is way better than deadlock. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Now trace_qgroup_meta_reserve() will have extra type parameter. And introduce two new trace events: 1) trace_qgroup_meta_free_all_pertrans() For btrfs_qgroup_free_meta_all_pertrans() 2) trace_qgroup_meta_convert() For btrfs_qgroup_convert_reserved_meta() Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
For quota disabled->enable case, it's possible that at reservation time quota was not enabled so no bytes were really reserved, while at release time, quota was enabled so we will try to release some bytes we didn't really own. Such situation can cause metadata reserveation underflow, for both types, also less possible for per-trans type since quota enable will commit transaction. To address this, record qgroup meta reserved bytes into root::qgroup_meta_rsv_pertrans and ::prealloc. So at releasing time we won't free any bytes we didn't reserve. For DATA, it's already handled by io_tree, so nothing needs to be done there. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Quite similar for delalloc, some modification to delayed-inode and delayed-item reservation. Also needs extra parameter for release case to distinguish normal release and error release. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
- 30 Mar, 2018 33 commits
-
-
Qu Wenruo authored
Before this patch, btrfs qgroup is mixing per-transcation meta rsv with preallocated meta rsv, making it quite easy to underflow qgroup meta reservation. Since we have the new qgroup meta rsv types, apply it to delalloc reservation. Now for delalloc, most of its reserved space will use META_PREALLOC qgroup rsv type. And for callers reducing outstanding extent like btrfs_finish_ordered_io(), they will convert corresponding META_PREALLOC reservation to META_PERTRANS. This is mainly due to the fact that current qgroup numbers will only be updated in btrfs_commit_transaction(), that's to say if we don't keep such placeholder reservation, we can exceed qgroup limitation. And for callers freeing outstanding extent in error handler, we will just free META_PREALLOC bytes. This behavior makes callers of btrfs_qgroup_release_meta() or btrfs_qgroup_convert_meta() to be aware of which type they are. So in this patch, btrfs_delalloc_release_metadata() and its callers get an extra parameter to info qgroup to do correct meta convert/release. The good news is, even we use the wrong type (convert or free), it won't cause obvious bug, as prealloc type is always in good shape, and the type only affects how per-trans meta is increased or not. So the worst case will be at most metadata limitation can be sometimes exceeded (no convert at all) or metadata limitation is reached too soon (no free at all). Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
For meta_prealloc reservation users, after btrfs_join_transaction() caller will modify tree so part (or even all) meta_prealloc reservation should be converted to meta_pertrans until transaction commit time. This patch introduces a new function, btrfs_qgroup_convert_reserved_meta() to do this for META_PREALLOC reservation user. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Since qgroup has seperate metadata reservation types now, we can completely get rid of the old root->qgroup_meta_rsv, which mostly acts as current META_PERTRANS reservation type. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Btrfs uses 2 different methods to reseve metadata qgroup space. 1) Reserve at btrfs_start_transaction() time This is quite straightforward, caller will use the trans handler allocated to modify b-trees. In this case, reserved metadata should be kept until qgroup numbers are updated. 2) Reserve by using block_rsv first, and later btrfs_join_transaction() This is more complicated, caller will reserve space using block_rsv first, and then later call btrfs_join_transaction() to get a trans handle. In this case, before we modify trees, the reserved space can be modified on demand, and after btrfs_join_transaction(), such reserved space should also be kept until qgroup numbers are updated. Since these two types behave differently, split the original "META" reservation type into 2 sub-types: META_PERTRANS: For above case 1) META_PREALLOC: For reservations that happened before btrfs_join_transaction() of case 2) NOTE: This patch will only convert existing qgroup meta reservation callers according to its situation, not ensuring all callers are at correct timing. Such fix will be added in later patches. Signed-off-by: Qu Wenruo <wqu@suse.com> [ update comments ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
So qgroup is switched to new separate types reservation system. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
When modifying qgroup relationship, for qgroup which only owns exclusive extents, we will go through quick update path. In this path, we will add/subtract exclusive and reference number for parent qgroup, since the source (child) qgroup only has exclusive extents, destination (parent) qgroup will also own or lose those extents exclusively. The same should be the same for reservation, since later reservation adding/releasing will also affect parent qgroup, without the reservation carried from child, parent will underflow reservation or have dead reservation which will never be freed. However original code doesn't do the same thing for reservation. It handles qgroup reservation quite differently: It removes qgroup reservation, as it's allocating space from the reserved qgroup for relationship adding. But does nothing for qgroup reservation if we're removing a qgroup relationship. According to the original code, it looks just like because we're adding qgroup->rfer, the code assumes we're writing new data, so it's follows the normal write routine, by reducing qgroup->reserved and adding qgroup->rfer/excl. This old behavior is wrong, and should be fixed to follow the same excl/rfer behavior. Just fix it by using the correct behavior described above. Fixes: 31193213 ("Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use.") Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Since most callers of qgroup_reserve() are already defined by type, converting qgroup_reserve() is quite an easy work. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Introduce helpers to: 1) Get total reserved space For limit calculation 2) Add/release reserved space for given type With underflow detection and warning 3) Add/release reserved space according to child qgroup Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Instead of single qgroup->reserved, use a new structure btrfs_qgroup_rsv to store different types of reservation. This patch only updates the header needed to compile. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Omar Sandoval authored
btrfs_orphan_add() has had this case commented out since it was first introduced in commit d68fc57b ("Btrfs: Metadata reservation for orphan inodes"). Most of the orphan cleanup code has been rewritten since then, so it's safe to say that this code isn't needed. Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> [ switch to bool ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Misono, Tomohiro authored
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
Any time the first block group of a new type is created, we add a new kobject to sysfs to hold the attributes for that type. Kobject-internal allocations always use GFP_KERNEL, making them prone to fs-reclaim races. While it appears as if this can occur any time a block group is created, the only times the first block group of a new type can be created in memory is at mount and when we create the first new block group during raid conversion. This patch adds a new list to track pending kobject additions and then handles them after we do chunk relocation. Between relocating the target chunk (or forcing allocation of a new chunk in the case of data) and removing the old chunk, we're in a safe place for fs-reclaim to occur. We're holding the volume mutex, which is already held across page faults, and the delete_unused_bgs_mutex, which will only stall the cleaner thread. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
Since commit 2be12ef7 (btrfs: Separate space_info create/update), we've separated out the creation and updating of the space info structures. That commit was a straightforward refactoring of the two parts of update_space_info, but we can go a step further. Since commits c59021f8 (Btrfs: fix OOPS of empty filesystem after balance) and b742bb82 (Btrfs: Link block groups of different raid types), we know that the space_info structures will be created at mount and there will only ever be, at most, three of them. This patch cleans out the create_space_info calls after __find_space_info returns NULL since __find_space_info *can't* return NULL. The initial cause for reviewing this was the kobject_add calls from create_space_info occuring in sites where fs-reclaim wasn't allowed. Now we are certain they occur only early in the mount process and are safe. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Liu Bo authored
Rebuild on missing device is as same as recover, after it's done, rbio has data which is consistent with on-disk data, so it can be cached to avoid further reads. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
While running btrfs/011, I hit the following lockdep splat. This is the important bit: pcpu_alloc+0x1ac/0x5e0 __percpu_counter_init+0x4e/0xb0 btrfs_init_fs_root+0x99/0x1c0 [btrfs] btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs] resolve_indirect_refs+0x130/0x830 [btrfs] find_parent_nodes+0x69e/0xff0 [btrfs] btrfs_find_all_roots_safe+0xa0/0x110 [btrfs] btrfs_find_all_roots+0x50/0x70 [btrfs] btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs] btrfs_commit_transaction+0x3ce/0x9b0 [btrfs] The percpu_counter_init call in btrfs_alloc_subvolume_writers uses GFP_KERNEL, which we can't do during transaction commit. This switches it to GFP_NOFS. ======================================================== WARNING: possible irq lock inversion dependency detected 4.12.14-kvmsmall #8 Tainted: G W -------------------------------------------------------- kswapd0/50 just changed the state of lock: (&delayed_node->mutex){+.+.-.}, at: [<ffffffffc06994fa>] __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] but this lock took another, RECLAIM_FS-unsafe lock in the past: (pcpu_alloc_mutex){+.+.+.} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Chain exists of: &delayed_node->mutex --> &found->groups_sem --> pcpu_alloc_mutex Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(pcpu_alloc_mutex); local_irq_disable(); lock(&delayed_node->mutex); lock(&found->groups_sem); <Interrupt> lock(&delayed_node->mutex); *** DEADLOCK *** 2 locks held by kswapd0/50: #0: (shrinker_rwsem){++++..}, at: [<ffffffff811dc11f>] shrink_slab+0x7f/0x5b0 #1: (&type->s_umount_key#30){+++++.}, at: [<ffffffff8126dec6>] trylock_super+0x16/0x50 the shortest dependencies between 2nd lock and 1st lock: -> (pcpu_alloc_mutex){+.+.+.} ops: 4904 { HARDIRQ-ON-W at: __mutex_lock+0x4e/0x8c0 pcpu_alloc+0x1ac/0x5e0 alloc_kmem_cache_cpus.isra.70+0x25/0xa0 __do_tune_cpucache+0x2c/0x220 do_tune_cpucache+0x26/0xc0 enable_cpucache+0x6d/0xf0 kmem_cache_init_late+0x42/0x75 start_kernel+0x343/0x4cb x86_64_start_kernel+0x127/0x134 secondary_startup_64+0xa5/0xb0 SOFTIRQ-ON-W at: __mutex_lock+0x4e/0x8c0 pcpu_alloc+0x1ac/0x5e0 alloc_kmem_cache_cpus.isra.70+0x25/0xa0 __do_tune_cpucache+0x2c/0x220 do_tune_cpucache+0x26/0xc0 enable_cpucache+0x6d/0xf0 kmem_cache_init_late+0x42/0x75 start_kernel+0x343/0x4cb x86_64_start_kernel+0x127/0x134 secondary_startup_64+0xa5/0xb0 RECLAIM_FS-ON-W at: __kmalloc+0x47/0x310 pcpu_extend_area_map+0x2b/0xc0 pcpu_alloc+0x3ec/0x5e0 alloc_kmem_cache_cpus.isra.70+0x25/0xa0 __do_tune_cpucache+0x2c/0x220 do_tune_cpucache+0x26/0xc0 enable_cpucache+0x6d/0xf0 __kmem_cache_create+0x1bf/0x390 create_cache+0xba/0x1b0 kmem_cache_create+0x1f8/0x2b0 ksm_init+0x6f/0x19d do_one_initcall+0x50/0x1b0 kernel_init_freeable+0x201/0x289 kernel_init+0xa/0x100 ret_from_fork+0x3a/0x50 INITIAL USE at: __mutex_lock+0x4e/0x8c0 pcpu_alloc+0x1ac/0x5e0 alloc_kmem_cache_cpus.isra.70+0x25/0xa0 setup_cpu_cache+0x2f/0x1f0 __kmem_cache_create+0x1bf/0x390 create_boot_cache+0x8b/0xb1 kmem_cache_init+0xa1/0x19e start_kernel+0x270/0x4cb x86_64_start_kernel+0x127/0x134 secondary_startup_64+0xa5/0xb0 } ... key at: [<ffffffff821d8e70>] pcpu_alloc_mutex+0x70/0xa0 ... acquired at: pcpu_alloc+0x1ac/0x5e0 __percpu_counter_init+0x4e/0xb0 btrfs_init_fs_root+0x99/0x1c0 [btrfs] btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs] resolve_indirect_refs+0x130/0x830 [btrfs] find_parent_nodes+0x69e/0xff0 [btrfs] btrfs_find_all_roots_safe+0xa0/0x110 [btrfs] btrfs_find_all_roots+0x50/0x70 [btrfs] btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs] btrfs_commit_transaction+0x3ce/0x9b0 [btrfs] transaction_kthread+0x176/0x1b0 [btrfs] kthread+0x102/0x140 ret_from_fork+0x3a/0x50 -> (&fs_info->commit_root_sem){++++..} ops: 1566382 { HARDIRQ-ON-W at: down_write+0x3e/0xa0 cache_block_group+0x287/0x420 [btrfs] find_free_extent+0x106c/0x12d0 [btrfs] btrfs_reserve_extent+0xd8/0x170 [btrfs] cow_file_range.isra.66+0x133/0x470 [btrfs] run_delalloc_range+0x121/0x410 [btrfs] writepage_delalloc.isra.50+0xfe/0x180 [btrfs] __extent_writepage+0x19a/0x360 [btrfs] extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs] extent_writepages+0x4d/0x60 [btrfs] do_writepages+0x1a/0x70 __filemap_fdatawrite_range+0xa7/0xe0 btrfs_rename+0x5ee/0xdb0 [btrfs] vfs_rename+0x52a/0x7e0 SyS_rename+0x351/0x3b0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 HARDIRQ-ON-R at: down_read+0x35/0x90 caching_thread+0x57/0x560 [btrfs] normal_work_helper+0x1c0/0x5e0 [btrfs] process_one_work+0x1e0/0x5c0 worker_thread+0x44/0x390 kthread+0x102/0x140 ret_from_fork+0x3a/0x50 SOFTIRQ-ON-W at: down_write+0x3e/0xa0 cache_block_group+0x287/0x420 [btrfs] find_free_extent+0x106c/0x12d0 [btrfs] btrfs_reserve_extent+0xd8/0x170 [btrfs] cow_file_range.isra.66+0x133/0x470 [btrfs] run_delalloc_range+0x121/0x410 [btrfs] writepage_delalloc.isra.50+0xfe/0x180 [btrfs] __extent_writepage+0x19a/0x360 [btrfs] extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs] extent_writepages+0x4d/0x60 [btrfs] do_writepages+0x1a/0x70 __filemap_fdatawrite_range+0xa7/0xe0 btrfs_rename+0x5ee/0xdb0 [btrfs] vfs_rename+0x52a/0x7e0 SyS_rename+0x351/0x3b0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 SOFTIRQ-ON-R at: down_read+0x35/0x90 caching_thread+0x57/0x560 [btrfs] normal_work_helper+0x1c0/0x5e0 [btrfs] process_one_work+0x1e0/0x5c0 worker_thread+0x44/0x390 kthread+0x102/0x140 ret_from_fork+0x3a/0x50 INITIAL USE at: down_write+0x3e/0xa0 cache_block_group+0x287/0x420 [btrfs] find_free_extent+0x106c/0x12d0 [btrfs] btrfs_reserve_extent+0xd8/0x170 [btrfs] cow_file_range.isra.66+0x133/0x470 [btrfs] run_delalloc_range+0x121/0x410 [btrfs] writepage_delalloc.isra.50+0xfe/0x180 [btrfs] __extent_writepage+0x19a/0x360 [btrfs] extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs] extent_writepages+0x4d/0x60 [btrfs] do_writepages+0x1a/0x70 __filemap_fdatawrite_range+0xa7/0xe0 btrfs_rename+0x5ee/0xdb0 [btrfs] vfs_rename+0x52a/0x7e0 SyS_rename+0x351/0x3b0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 } ... key at: [<ffffffffc0729578>] __key.61970+0x0/0xfffffffffff9aa88 [btrfs] ... acquired at: cache_block_group+0x287/0x420 [btrfs] find_free_extent+0x106c/0x12d0 [btrfs] btrfs_reserve_extent+0xd8/0x170 [btrfs] btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs] btrfs_create_tree+0xbb/0x2a0 [btrfs] btrfs_create_uuid_tree+0x37/0x140 [btrfs] open_ctree+0x23c0/0x2660 [btrfs] btrfs_mount+0xd36/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 btrfs_mount+0x18c/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 do_mount+0x1c1/0xcc0 SyS_mount+0x7e/0xd0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 -> (&found->groups_sem){++++..} ops: 2134587 { HARDIRQ-ON-W at: down_write+0x3e/0xa0 __link_block_group+0x34/0x130 [btrfs] btrfs_read_block_groups+0x33d/0x7b0 [btrfs] open_ctree+0x2054/0x2660 [btrfs] btrfs_mount+0xd36/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 btrfs_mount+0x18c/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 do_mount+0x1c1/0xcc0 SyS_mount+0x7e/0xd0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 HARDIRQ-ON-R at: down_read+0x35/0x90 btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs] open_ctree+0x207b/0x2660 [btrfs] btrfs_mount+0xd36/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 btrfs_mount+0x18c/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 do_mount+0x1c1/0xcc0 SyS_mount+0x7e/0xd0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 SOFTIRQ-ON-W at: down_write+0x3e/0xa0 __link_block_group+0x34/0x130 [btrfs] btrfs_read_block_groups+0x33d/0x7b0 [btrfs] open_ctree+0x2054/0x2660 [btrfs] btrfs_mount+0xd36/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 btrfs_mount+0x18c/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 do_mount+0x1c1/0xcc0 SyS_mount+0x7e/0xd0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 SOFTIRQ-ON-R at: down_read+0x35/0x90 btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs] open_ctree+0x207b/0x2660 [btrfs] btrfs_mount+0xd36/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 btrfs_mount+0x18c/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 do_mount+0x1c1/0xcc0 SyS_mount+0x7e/0xd0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 INITIAL USE at: down_write+0x3e/0xa0 __link_block_group+0x34/0x130 [btrfs] btrfs_read_block_groups+0x33d/0x7b0 [btrfs] open_ctree+0x2054/0x2660 [btrfs] btrfs_mount+0xd36/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 btrfs_mount+0x18c/0xf90 [btrfs] mount_fs+0x3a/0x160 vfs_kern_mount+0x66/0x150 do_mount+0x1c1/0xcc0 SyS_mount+0x7e/0xd0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 } ... key at: [<ffffffffc0729488>] __key.59101+0x0/0xfffffffffff9ab78 [btrfs] ... acquired at: find_free_extent+0xcb4/0x12d0 [btrfs] btrfs_reserve_extent+0xd8/0x170 [btrfs] btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs] __btrfs_cow_block+0x110/0x5b0 [btrfs] btrfs_cow_block+0xd7/0x290 [btrfs] btrfs_search_slot+0x1f6/0x960 [btrfs] btrfs_lookup_inode+0x2a/0x90 [btrfs] __btrfs_update_delayed_inode+0x65/0x210 [btrfs] btrfs_commit_inode_delayed_inode+0x121/0x130 [btrfs] btrfs_evict_inode+0x3fe/0x6a0 [btrfs] evict+0xc4/0x190 __dentry_kill+0xbf/0x170 dput+0x2ae/0x2f0 SyS_rename+0x2a6/0x3b0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 -> (&delayed_node->mutex){+.+.-.} ops: 5580204 { HARDIRQ-ON-W at: __mutex_lock+0x4e/0x8c0 btrfs_delayed_update_inode+0x46/0x6e0 [btrfs] btrfs_update_inode+0x83/0x110 [btrfs] btrfs_dirty_inode+0x62/0xe0 [btrfs] touch_atime+0x8c/0xb0 do_generic_file_read+0x818/0xb10 __vfs_read+0xdc/0x150 vfs_read+0x8a/0x130 SyS_read+0x45/0xa0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 SOFTIRQ-ON-W at: __mutex_lock+0x4e/0x8c0 btrfs_delayed_update_inode+0x46/0x6e0 [btrfs] btrfs_update_inode+0x83/0x110 [btrfs] btrfs_dirty_inode+0x62/0xe0 [btrfs] touch_atime+0x8c/0xb0 do_generic_file_read+0x818/0xb10 __vfs_read+0xdc/0x150 vfs_read+0x8a/0x130 SyS_read+0x45/0xa0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 IN-RECLAIM_FS-W at: __mutex_lock+0x4e/0x8c0 __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] btrfs_evict_inode+0x22c/0x6a0 [btrfs] evict+0xc4/0x190 dispose_list+0x35/0x50 prune_icache_sb+0x42/0x50 super_cache_scan+0x139/0x190 shrink_slab+0x262/0x5b0 shrink_node+0x2eb/0x2f0 kswapd+0x2eb/0x890 kthread+0x102/0x140 ret_from_fork+0x3a/0x50 INITIAL USE at: __mutex_lock+0x4e/0x8c0 btrfs_delayed_update_inode+0x46/0x6e0 [btrfs] btrfs_update_inode+0x83/0x110 [btrfs] btrfs_dirty_inode+0x62/0xe0 [btrfs] touch_atime+0x8c/0xb0 do_generic_file_read+0x818/0xb10 __vfs_read+0xdc/0x150 vfs_read+0x8a/0x130 SyS_read+0x45/0xa0 do_syscall_64+0x79/0x1e0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 } ... key at: [<ffffffffc072d488>] __key.56935+0x0/0xfffffffffff96b78 [btrfs] ... acquired at: __lock_acquire+0x264/0x11c0 lock_acquire+0xbd/0x1e0 __mutex_lock+0x4e/0x8c0 __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] btrfs_evict_inode+0x22c/0x6a0 [btrfs] evict+0xc4/0x190 dispose_list+0x35/0x50 prune_icache_sb+0x42/0x50 super_cache_scan+0x139/0x190 shrink_slab+0x262/0x5b0 shrink_node+0x2eb/0x2f0 kswapd+0x2eb/0x890 kthread+0x102/0x140 ret_from_fork+0x3a/0x50 stack backtrace: CPU: 1 PID: 50 Comm: kswapd0 Tainted: G W 4.12.14-kvmsmall #8 SLE15 (unreleased) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0x78/0xb7 print_irq_inversion_bug.part.38+0x19f/0x1aa check_usage_forwards+0x102/0x120 ? ret_from_fork+0x3a/0x50 ? check_usage_backwards+0x110/0x110 mark_lock+0x16c/0x270 __lock_acquire+0x264/0x11c0 ? pagevec_lookup_entries+0x1a/0x30 ? truncate_inode_pages_range+0x2b3/0x7f0 lock_acquire+0xbd/0x1e0 ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] __mutex_lock+0x4e/0x8c0 ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] ? btrfs_evict_inode+0x1f6/0x6a0 [btrfs] __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs] btrfs_evict_inode+0x22c/0x6a0 [btrfs] evict+0xc4/0x190 dispose_list+0x35/0x50 prune_icache_sb+0x42/0x50 super_cache_scan+0x139/0x190 shrink_slab+0x262/0x5b0 shrink_node+0x2eb/0x2f0 kswapd+0x2eb/0x890 kthread+0x102/0x140 ? mem_cgroup_shrink_node+0x2c0/0x2c0 ? kthread_create_on_node+0x40/0x40 ret_from_fork+0x3a/0x50 Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
Drop optimal argument from the function find_live_mirror() as we can deduce it in the function itself. Also rename optimal to preferred_mirror. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
Obtain the stripes info from the map directly and so no need to pass it as an argument. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
It's provided by transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
It's provided by the transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
It's provided by the transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
It's provided by the transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Added by 08e007d2 ("Btrfs: improve the noflush reservation") and made redundant by 17024ad0 ("Btrfs: fix early ENOSPC due to delalloc"). Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Added by b4570aa9 ("btrfs: fix compiling with CONFIG_BTRFS_DEBUG enabled.") and obsoleted by 2ff7e61e ("btrfs: take an fs_info directly when the root is not used otherwise"). Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Introduced by 5cdc7ad3 ("btrfs: Replace fs_info->workers with btrfs_workqueue.") but obsoleted by 2a458198 ("btrfs: factor btrfs_init_workqueues() out of open_ctree()"). Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Added in 38c227d8 ("Btrfs: snapshot-aware defrag") but subsequently made redundant by 0b246afa ("btrfs: root->fs_info cleanup, add fs_info convenience variables"). Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Added already unused in a6fa6fae ("btrfs: Add lzo compression support"). Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Added in b5d67f64 ("Btrfs: change scrub to support big blocks") but rendered redundant by be50a8dd ("Btrfs: Simplify scrub_setup_recheck_block()'s argument"). Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Added as part of 86d5f994 ("btrfs: convert prelimary reference tracking to use rbtrees") but never used. tmp_op_key essentially subsumed that variable. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
When checking the minimal nr_devs, there is one dead and meaningless condition: if (ndevs < devs_increment * sub_stripes || ndevs < devs_min) { ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This condition is meaningless, @devs_increment has nothing to do with @sub_stripes. In fact, in btrfs_raid_array[], profile with sub_stripes larger than 1 (RAID10) already has the @devs_increment set to 2. So no need to multiple it by @sub_stripes. And above condition is also dead. For RAID10, @devs_increment * @sub_stripes equals 4, which is also the @devs_min of RAID10. For other profiles, @sub_stripes is always 1, and since @ndevs is rounded down to @devs_increment, the condition will always be true. Remove the meaningless condition to make later reader wander less. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
This function is the entry to the extent allocator and as such has quite a number of parameters. Some of those have subtle effects on the allocation algorithm. Document the parameters. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
As with every function which deals with modifying the btree btrfs_uuid_tree_rem can fail for any number of reasons (ie. EIO/ENOMEM). Handle return error value from this function gracefully by aborting the transaction. Fixes: dd5f9615 ("Btrfs: maintain subvolume items in the UUID tree") Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
The kernel would like to have all stack VLA usage removed[1]. Unfortunately using an integer constant variable as the size of an array is still considered a VLA. Instead let's use directly sizeof(var) which removes the VLA usage. Use the occasion to remove csum_size altogether and use sizeof() also for the size passed to memcmp [1]: https://lkml.org/lkml/2018/3/7/621Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-