- 21 Aug, 2023 39 commits
-
-
Christoph Hellwig authored
Currently the logic that decides whether to compress in compress_file_range is a bit convoluted because it tries to share the code that creates inline extents between the compressible path [1] and the bail-to-uncompressed path. But the latter isn't needed at all, because cow_file_range as called by submit_uncompressed_range will already create inline extents as needed, so there is no need for special handling if we can live with the fact that it will be called a bit later in the ->ordered_func of the workqueue instead of right now. [1] there is undocumented logic that creates an uncompressed inline extent outside of the "shall not compress" logic if total_in is too small. This logic isn't explained in comments or any commit log I could find, so I've preserved it. Documentation explaining it would be appreciated if anyone understands this code. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Reorder compress_file_range so that the main compression flow happens in a straight line rather than in branches. To do this, ensure that pages is always zeroed before a page allocation happens, which allows the cleanup_and_bail_uncompressed label to clean up the page allocations as needed. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
The code in submit_compressed_extents just loops over the async_extents and doesn't need to be conditional on an inode being present, as there won't be any async_extent in the list if we created an inline extent. Merge the two functions to simplify the logic. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
There is no good reason to have the simple async_cow_start wrapper, merge the argument conversion into the main compress_file_range function. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Now that the ->inode check isn't needed in submit_compressed_extents any more, there is no reason to clear the field early. Always keep the inode around until the work item is finished, and remove the special casing and the counting of compressed extents in compress_file_range. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Instead of checking for a NULL pages array and explaining this with a cryptic comment, just check the compression type against BTRFS_COMPRESS_NONE to make the check self-explanatory. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
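A sketch of the change described above (the surrounding code is illustrative, not the literal btrfs source):

    /* before: relies on a side effect that needed a comment to explain */
    if (!async_extent->pages) {
            /* submit the range as uncompressed */
    }

    /* after: states the intent directly */
    if (async_extent->compress_type == BTRFS_COMPRESS_NONE) {
            /* submit the range as uncompressed */
    }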
-
Christoph Hellwig authored
Instead of a separate page_started argument that tells the callers that btrfs_run_delalloc_range already started writeback by itself, overload the return value with a positive 1 in addition to 0 and a negative error code to indicate that it has already started writeback, and remove the nr_written argument as the caller can calculate it directly based on the range, and in fact already does so for the case where writeback wasn't started yet. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
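A minimal sketch of the resulting calling convention (the caller code here is illustrative, not the literal patch):

    ret = btrfs_run_delalloc_range(inode, locked_page, start, end, wbc);
    if (ret < 0)
            goto error;    /* a real error */
    if (ret > 0)
            goto done;     /* writeback was already started internally */
    /* ret == 0: fall through to normal page writeback */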
-
Christoph Hellwig authored
Currently writepage_delalloc adds to delalloc_to_write in every loop iteration. That is not only more work than doing it once after the loop, but can also over-increment the counter due to rounding errors when a new loop iteration starts with an offset into a page. Add a new page_start variable instead of recalculating that value over and over, move the delalloc_to_write calculation out of the loop, use the DIV_ROUND_UP helper instead of open coding it and remove the pointless found local variable. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
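For illustration, a self-contained sketch of the post-loop accounting (the names follow the commit message; the real function works on struct page and writeback state):

    #include <stdio.h>

    #define PAGE_SIZE 4096UL
    #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

    int main(void)
    {
            unsigned long page_start = 0;                   /* computed once */
            unsigned long delalloc_end = 3 * PAGE_SIZE - 1; /* inclusive end */

            /* Counting once from page_start after the loop avoids the
             * over-increment that per-iteration rounding caused when an
             * iteration started at an offset into a page. */
            unsigned long delalloc_to_write =
                    DIV_ROUND_UP(delalloc_end + 1 - page_start, PAGE_SIZE);

            printf("delalloc_to_write = %lu\n", delalloc_to_write); /* 3 */
            return 0;
    }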
-
Christoph Hellwig authored
The return value from extent_write_locked_range is ignored, and that's fine because the error reporting happens through the mapping and ordered_extent. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
The return value from submit_uncompressed_range is ignored, and that's fine because the error reporting happens through the mapping and ordered_extent. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Move the printk that is supposed to help debug failures in submit_one_async_extent into submit_one_async_extent and make it conditional on actually having an error condition instead of spamming the log unconditionally. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
end_extent_writepage is a small helper that combines a call to btrfs_mark_ordered_io_finished with conditional error-only calls to btrfs_page_clear_uptodate and mapping_set_error, with a somewhat unfortunate calling convention that passes an inclusive end instead of the len expected by the underlying functions. Remove end_extent_writepage and open code it in the 4 callers. Out of those, two are already error-only and thus don't need the extra conditional, and one already has the mapping_set_error, so a duplicate call can be avoided. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
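Roughly, each open-coded call site ends up looking like this (a sketch under the assumption that callers now pass a length rather than an inclusive end; the exact code differs per caller):

    btrfs_mark_ordered_io_finished(inode, page, start, len, ret == 0);
    if (ret < 0) {
            btrfs_page_clear_uptodate(fs_info, page, start, len);
            mapping_set_error(page->mapping, ret);
    }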
-
Christoph Hellwig authored
btrfs_writepage_endio_finish_ordered is a small wrapper around btrfs_mark_ordered_io_finished that just changes the argument passing slightly, and adds a tracepoint. Move the tracepoint to btrfs_mark_ordered_io_finished, which means it now also covers the error handling in btrfs_cleanup_ordered_extent, and switch all callers to just call btrfs_mark_ordered_io_finished directly. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
There is a lot of complexity in __process_pages_contig to deal with the PAGE_LOCK case that can return an error unlike all the other actions. Open code the page iteration for page locking in lock_delalloc_pages and remove all the now unused code from __process_pages_contig. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
For NOCOW files, run_delalloc_nocow can still fall back to COW allocations when required and calls the fallback_to_cow helper for that. For such an allocation we can have multiple ordered_extents for existing extents that NOCOW overwrites and new allocations that fallback_to_cow creates. If one of the new extents is an inline extent, the writepage code would have to avoid normal page writeback for it, as indicated by the page_started return argument, which run_delalloc_nocow can't return. Fix this by never creating inline extents from fallback_to_cow. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
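A sketch of what this looks like at the call site, assuming cow_file_range grows a flag to suppress inline extent creation (the parameter names here are hypothetical; see the bool-parameters change two entries below):

    /* in fallback_to_cow(): never create an inline extent, so the
     * caller never needs the page_started special case */
    ret = cow_file_range(inode, locked_page, start, end, NULL,
                         false /* keep_locked */, true /* no_inline */);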
-
Christoph Hellwig authored
The int unlock that is used as a bool is not a very good way to describe the behavior, and the next patch will have to add another behavior modifier. We'll do that with two bool parameters instead of adding bit flags. The new parameter specifies that the pages should always be kept locked, which is the inverse of the old unlock argument. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> [ switch flags to bool ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Boris Burkov authored
btrfs_start_transaction reserves metadata space of the PERTRANS type before it identifies a transaction to start/join. This allows flushing when reserving that space without a deadlock. However, it results in a race which temporarily breaks qgroup rsv accounting.

T1                                      T2
start_transaction
do_stuff
                                        start_transaction
                                        qgroup_reserve_meta_pertrans
commit_transaction
qgroup_free_meta_all_pertrans
                                        hit an error starting txn
                                        goto reserve_fail
                                        qgroup_free_meta_pertrans (already freed!)

The basic issue is that there is nothing preventing another commit from committing before start_transaction finishes (in fact sometimes we intentionally wait for it) so any error path that frees the reserve is at risk of this race.

While this exact space was getting freed anyway, and it's not a huge deal to double free it (just a warning, the free code catches this), it can result in incorrectly freeing some other pertrans reservation in this same reservation, which could then lead to spuriously granting reservations we might not have the space for. Therefore, I do believe it is worth fixing.

To fix it, use the existing prealloc->pertrans conversion mechanism. When we first reserve the space, we reserve prealloc space and only when we are sure we have a transaction do we convert it to pertrans. This way any racing commits do not blow away our reservation, but we still get a pertrans reservation that is freed when _this_ transaction gets committed.

This issue can be reproduced by running generic/269 with either qgroups or squotas enabled via mkfs on the scratch device.

Reviewed-by: Josef Bacik <josef@toxicpanda.com> CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
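A sketch of the conversion flow the fix uses (simplified from the description above; error handling trimmed, and the surrounding code is illustrative):

    /* reserve as PREALLOC first - a racing commit won't free this */
    ret = btrfs_qgroup_reserve_meta(root, qgroup_reserved,
                                    BTRFS_QGROUP_RSV_META_PREALLOC, true);
    if (ret)
            goto reserve_fail;

    /* ... join/start the transaction ... */

    /* now that we hold a transaction, convert the reservation to
     * PERTRANS so it is freed when _this_ transaction commits */
    btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved);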
-
Boris Burkov authored
If we do a write whose bio suffers an error, we will never reclaim the qgroup reserved space for it. We allocate the space in the write_iter codepath, then release the reservation as we allocate the ordered extent, but we only create a delayed ref if the ordered extent finishes. If it has an error, we simply leak the rsv. This is apparent in running any error-injecting (dmerror) fstests like btrfs/146 or btrfs/160. Such tests fail because dmesg on umount complains about the leaked qgroup data space. When we clean up other aspects of space on failed ordered_extents, also free the qgroup rsv. Reviewed-by: Josef Bacik <josef@toxicpanda.com> CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
-
Goldwyn Rodrigues authored
While performing compressed writes, if the extent reservation fails, the async extent pages are first freed in the error check for return value ret, and then again at the out_free label. Remove the first call to free_async_extent_pages(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Initially we preallocate the btrfs_subpage structure in the main loop of alloc_extent_buffer(). But later commit fbca46eb ("btrfs: make nodesize >= PAGE_SIZE case to reuse the non-subpage routine") has made sure we only take the subpage routine if our nodesize is smaller than PAGE_SIZE. This means for that case we only need to allocate the subpage structure once anyway. So move the preallocation out of the main loop. This slightly reduces the work done while we hold the page lock, and makes the code a little easier to read. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
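A hedged sketch of the resulting shape (simplified; the real alloc_extent_buffer() has considerably more error handling):

    struct btrfs_subpage *prealloc = NULL;

    if (fs_info->nodesize < PAGE_SIZE) {
            /* allocate once, before the page-locking loop */
            prealloc = btrfs_alloc_subpage(fs_info, BTRFS_SUBPAGE_METADATA);
            if (IS_ERR(prealloc))
                    return ERR_CAST(prealloc);
    }

    for (i = 0; i < num_pages; i++, index++) {
            /* under the page lock: only attach the preallocated
             * structure, no allocation (and no failure path) here */
    }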
-
Christoph Hellwig authored
nr_alloc_stripes can't be one if we are writing to a replacement device, as it is incremented for that case right above. Remove the duplicate checks. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
The @path in scrub_simple_mirror() is no longer utilized after commit e02ee89b ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure"). Before that commit, we call find_first_extent_item() directly, which needs a path and that path can be reused. But after that switch commit, the extent search is done inside queue_scrub_stripe(), which will no longer accept a path from outside. So the @path variable can be safely removed. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ remove the stale comment ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Minjie Du authored
Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using the existing helper folio_next_index(). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Minjie Du <duminjie@vivo.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
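The pattern in question, before and after (a fragment for illustration):

    /* before */
    next = folio->index + folio_nr_pages(folio);
    /* after: equivalent, using the existing helper */
    next = folio_next_index(folio);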
-
David Sterba authored
There's a helper for obtaining the size of a struct member, we can use it instead of open coding the pointer magic. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
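A sketch of the pattern, assuming the helper meant here is sizeof_field() from <linux/stddef.h> (the struct is illustrative):

    struct item { u8 csum[32]; };

    /* open-coded pointer magic */
    len = sizeof(((struct item *)0)->csum);
    /* with the helper */
    len = sizeof_field(struct item, csum);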
-
Qu Wenruo authored
The integrity checker feature needs to be enabled at compile time (BTRFS_FS_CHECK_INTEGRITY) and then enabled by the mount options check_int*. Although it provides some unique features which can not be provided by any other sanity checks like the tree-checker, it not only has high CPU and memory overhead, but is also a maintenance burden. For example, it's the only caller of btrfs_map_block() with @need_raid_map = 0. Considering most btrfs developers are not even testing this feature, I'm here to propose deprecation of this feature. For now only warning messages will be printed, the feature itself still works. Removal time has been set to the 6.7 release. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
The function btrfs_free_excluded_extents() is only used by block-group.c, so move it into block-group.c and make it static. Also remove unnecessary variables that are used only once. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
The code for btrfs_add_excluded_extent() is trivial, it's just a set_extent_bit() call. However it's defined in extent-tree.c but it is only used (twice) in block-group.c. So open code it in block-group.c, reducing the need to export a trivial function. Also, since the only caller of btrfs_add_excluded_extent() is prepared to deal with errors, stop ignoring errors from the set_extent_bit() call. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
Currently find_first_extent_bit() returns 0 if it found a range in the given io tree and 1 if it didn't find any. There's no need to return any errors, so make the return value a boolean and invert the logic to make more sense: return true if it found a range and false if it didn't find any range. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
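A sketch of the new calling convention (the caller code is illustrative):

    u64 start, end, cur = 0;

    /* before: while (!find_first_extent_bit(...)) { ... } with 0/1 ints */
    while (find_first_extent_bit(tree, cur, &start, &end, EXTENT_DIRTY, NULL)) {
            /* process the found range [start, end] */
            cur = end + 1;
    }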
-
Filipe Manana authored
Currently btrfs_destroy_pinned_extent() is always returning 0 no matter what and its caller ignores its return value (as does everything up the call chain). This is because it is called in the transaction abort path, where we can't even deal with any errors since we are in a critical situation already and cleanup of resources is done in a best effort fashion. So make btrfs_destroy_pinned_extent() return void. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
Currently btrfs_destroy_marked_extents() is returning the value of the last call to find_first_extent_bit(), which returns a value of 1 meaning no more ranges were found in the dirty pages io tree. This value is useless to the single caller of btrfs_destroy_marked_extents(), which ignores any return value from btrfs_destroy_marked_extents(). This is because it's only used in the transaction abort path, where we can't even deal with any errors since we are in a critical situation already and cleanup of resources is done in a best effort fashion. So make btrfs_destroy_marked_extents() return void. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
Since add_new_free_space() is exported, used outside block-group.c, rename it to include the 'btrfs_' prefix. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
The documentation for add_new_free_space() is stale and no longer correct:

1) It's no longer used only when caching a block group. It's also called when creating a block group (btrfs_make_block_group()), when reading a block group at mount time (read_one_block_group()) and when reading the free space tree for a block group (typically the first time we attempt to allocate from the block group);

2) It has nothing to do with pinned extents. It only deals with the excluded extents io tree, which is used to track the locations of super blocks in order to make sure we never add the location of a super block to the free space cache of a block group.

So update the documentation and also add a description of the arguments and return values. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
After commit 6bfd0133 ("btrfs: raid56: switch scrub path to use a single function"), the raid56 implementation no longer uses different endio functions for RMW/recover/scrub. All read operations end in submit_read_wait_bio_list(), while all write operations end in submit_write_bios(). This means quite some trace events are out-of-date and no longer utilized. This patch would unify the trace events into just two:

- trace_raid56_read()
  Replaces trace_raid56_read_partial(), trace_raid56_scrub_read() and trace_raid56_scrub_read_recover().

- trace_raid56_write()
  Replaces trace_raid56_write_stripe() and trace_raid56_scrub_write_stripe().

Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
ACL support depends on the compile-time configuration option CONFIG_BTRFS_FS_POSIX_ACL. Prior to mounting a btrfs filesystem, it is not possible to determine whether ACL support has been compiled in. To address this, add a sysfs interface, /sys/fs/btrfs/features/acl, and check for ACL support in the system's btrfs.

To determine ACL support:

Return 0 indicates ACL is not supported:
$ cat /sys/fs/btrfs/features/acl
0

Return 1 indicates ACL is supported:
$ cat /sys/fs/btrfs/features/acl
1

IMO, this is a better approach, so that we also know if the kernel is older. On an older kernel:

$ ls /sys/fs/btrfs/features/acl
ls: cannot access '/sys/fs/btrfs/features/acl': No such file or directory

Mount a btrfs filesystem:

$ cat /proc/self/mounts | grep btrfs | grep -q noacl
$ echo $?
0

Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Commit aca43fe8 ("btrfs: remove unused raid56 functions which were dedicated for scrub") removed the special handling of RAID56 scrub for a missing device. As scrub now goes through full mirror_num based recovery, if it hits a missing device in RAID56 it just tries the next mirror, which goes through the BTRFS_RBIO_READ_REBUILD operation. This means there is no longer any use of the BTRFS_RBIO_REBUILD_MISSING operation and we can safely remove it. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
The function btrfs_map_block() is a critical part of the btrfs storage layer, which handles mapping of logical ranges to physical ranges. Thus it's better to have some basic explanation, especially on the following points:

- Segment split by various boundaries
  As a continuous logical range may be split into different segments, due to various factors like zones and RAID0/5/6/10 boundaries.

- The meaning of @mirror_num

- The possible single stripe optimization

- One deprecated parameter @need_raid_map
  Just explicitly mark it deprecated so we're aware of the problem.

Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Colin Ian King authored
The variables leaf and slot are initialized when declared but the values assigned to them are never read as they are being re-assigned later on. The initializations are redundant and can be removed. Cleans up clang scan build warnings:

fs/btrfs/tree-log.c:6797:25: warning: Value stored to 'leaf' during its initialization is never read [deadcode.DeadStores]
fs/btrfs/tree-log.c:6798:7: warning: Value stored to 'slot' during its initialization is never read [deadcode.DeadStores]

It's been there since b8aa330d ("Btrfs: improve performance on fsync of files with multiple hardlinks") without any usage so it's safe to be removed. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Colin Ian King authored
Variable stripe_nr is being divided by map->num_stripes however the result is never read. The division and assignment are redundant and can be removed. Cleans up clang scan build warning:

fs/btrfs/scrub.c:1264:3: warning: Value stored to 'stripe_nr' is never read [deadcode.DeadStores]

The code is a leftover from 6ded22c1 ("btrfs: reduce div64 calls by limiting the number of stripes of a chunk to u32") that converted div64 to normal division, it's the same but the previous version did not trigger a warning. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Julia Lawall authored
Use vcalloc, which checks for potential multiplication overflow. The changes were done using a Coccinelle semantic patch. Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
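The transformation, roughly (the types and variable names are illustrative, not the specific btrfs call sites):

    /* before: open-coded multiplication that can overflow */
    zones = vzalloc(nr_zones * sizeof(struct blk_zone));
    /* after: vcalloc() performs an overflow-checked zeroed allocation */
    zones = vcalloc(nr_zones, sizeof(struct blk_zone));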
-
- 20 Aug, 2023 1 commit
-
-
Linus Torvalds authored
-