- 16 May, 2022 40 commits
-
-
Christoph Hellwig authored
Reading a value from a different member of a union is not just a great way to obfuscate code, but also creates an aliasing violation. Switch btrfs_is_zoned to look at ->zone_size and remove the union. Note: union was to simplify the detection of zoned filesystem but now this is wrapped behind btrfs_is_zoned so we can drop the union. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> [ add note ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Lv Ruyi authored
iput() already handles NULL and non-NULL parameter, so it is not needed to check that. This unifies all iput calls. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
The bios added to ->bio_list are the original bios fed into btrfs_map_bio, which are never advanced. Just use the iter in the bio itself. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
All the scrub bios go straight to the block device or the raid56 code, none of which looks at the btrfs_bio. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Except for the spurious initialization of ->device just after allocation nothing uses the btrfs_bio, so just allocate a normal bio without extra data. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Prepare for further refactoring by moving this initialization to a single place instead of setting it in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Pass the block_device to bio_alloc_clone instead of setting it later. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Prepare for additional refactoring, btrfs_map_bio is direct caller of submit_stripe_bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
The I/O in repair_io_failue is synchronous and doesn't need a btrfs_bio, so just use an on-stack bio. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
The I/O in repair_io_failue is synchronous and doesn't need a btrfs_bio, so just use an on-stack bio. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
The I/O in repair_io_failue is synchronous and doesn't need a btrfs_bio, so just use an on-stack bio. Also cleanup the error handling to use goto labels and not discard the actual return values. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
btrfsic_read_block does not need the btrfs_bio structure, so switch to plain bio_alloc (that also does not fail as it's backed by a bioset). Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Require a separate call to the integrity checking helpers from the actual bio submission. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Split out two helpers to make __btrfsic_submit_bio more readable. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Johannes Thumshirn authored
The current auto-reclaim algorithm starts reclaiming all block groups with a zone_unusable value above a configured threshold. This is causing a lot of reclaim IO even if there would be enough free zones on the device. Instead of only accounting a block groups zone_unusable value, also take the ratio of free and not usable (written as well as zone_unusable) bytes a device has into account. Tested-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
For the non-zoned case we may want to set the threshold for reclaim to something below 50%. Change the acceptable threshold from 50-100 to 0-100. Tested-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
This will allow us to set a threshold for block groups to be automatically relocated even if we don't have zoned devices. We have found this feature invaluable at Facebook due to how our workload interacts with the allocator. We have been using this in production for months with only a single problem that has already been fixed. Tested-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
For non-zoned file systems it's useful to have the auto reclaim feature, however there are different use cases for non-zoned, for example we may not want to reclaim metadata chunks ever, only data chunks. Move this sysfs flag to per-space_info. This won't affect current users because this tunable only ever did anything for zoned, and that is currently hidden behind BTRFS_CONFIG_DEBUG. Tested-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> [ jth restore global bg_reclaim_threshold ] Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
When checking if we can do a NOCOW write against a range covered by a file extent item, we do a quick a check to determine if the inode's root was snapshotted in a generation older than the generation of the file extent item or not. This is to quickly determine if the extent is likely shared and avoid the expensive check for cross references (this was added in commit 78d4295b ("btrfs: lift some btrfs_cross_ref_exist checks in nocow path"). We restrict that check to the case where the inode is not a free space inode (since commit 27a7ff55 ("btrfs: skip file_extent generation check for free_space_inode in run_delalloc_nocow")). That is because when we had the inode cache feature, inode caches were backed by a free space inode that belonged to the inode's root. However we don't have support for the inode cache feature since kernel 5.11, so we don't need this check anymore since free space inodes are now always related to free space caches, which are always associated to the root tree (which can't be snapshotted, and its last_snapshot field is always 0). So remove that condition. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
Verifying if we can do a NOCOW write against a range fully or partially covered by a file extent item requires verifying several constraints, and these are currently duplicated at two different places: can_nocow_extent() and run_delalloc_nocow(). This change moves those checks into a common helper function to avoid duplication. It adds some comments and also preserves all existing behaviour like for example can_nocow_extent() treating errors from the calls to btrfs_cross_ref_exist() and csum_exist_in_range() as meaning we can not NOCOW, instead of propagating the error back to the caller. That specific behaviour is questionable but also reasonable to some degree. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Sweet Tea Dorminy authored
When allocating memory in a loop, each iteration should call memalloc_retry_wait() in order to prevent starving memory-freeing processes (and to mark where allocation loops are). Other filesystems do that as well. The bulk page allocation is the only place in btrfs with an allocation retry loop, so add an appropriate call to it. Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Sweet Tea Dorminy authored
While calling alloc_page() in a loop is an effective way to populate an array of pages, the MM subsystem provides a method to allocate pages in bulk. alloc_pages_bulk_array() populates the NULL slots in a page array, trying to grab more than one page at a time. Unfortunately, it doesn't guarantee allocating all slots in the array, but it's easy to call it in a loop and return an error if no progress occurs. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Sweet Tea Dorminy authored
Several functions currently populate an array of page pointers one allocated page at a time. Factor out the common code so as to allow improvements to all of the sites at once. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Yu Zhe authored
Explicit type casts are not necessary when it's void* to another pointer type. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
With the recent change in metadata handling, we can handle metadata in the following cases: - nodesize < PAGE_SIZE and sectorsize < PAGE_SIZE Go subpage routine for both metadata and data. - nodesize < PAGE_SIZE and sectorsize >= PAGE_SIZE Invalid case for now. As we require nodesize >= sectorsize. - nodesize >= PAGE_SIZE and sectorsize < PAGE_SIZE Go subpage routine for data, but regular page routine for metadata. - nodesize >= PAGE_SIZE and sectorsize >= PAGE_SIZE Go regular page routine for both metadata and data. Now we can handle any sectorsize < PAGE_SIZE, plus the existing sectorsize == PAGE_SIZE support. But here we introduce an artificial limit, any PAGE_SIZE > 4K case, we will only support 4K and PAGE_SIZE as sector size. The idea here is to reduce the test combinations, and push 4K as the default standard in the future. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
The reason why we only support 64K page size for subpage is, for 64K page size we can ensure no matter what the nodesize is, we can fit it into one page. When other page size come, especially like 16K, the limitation is a bit limiting. To remove such limitation, we allow nodesize >= PAGE_SIZE case to go the non-subpage routine. By this, we can allow 4K sectorsize on 16K page size. Although this introduces another smaller limitation, the metadata can not cross page boundary, which is already met by most recent mkfs. Another small improvement is, we can avoid the overhead for metadata if nodesize >= PAGE_SIZE. For 4K sector size and 64K page size/node size, or 4K sector size and 16K page size/node size, we don't need to allocate extra memory for the metadata pages. Please note that, this patch will not yet enable other page size support yet. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
In function btrfs_read_sys_array(), we allocate a real extent buffer using btrfs_find_create_tree_block(). Such extent buffer will be even cached into buffer_radix tree, and using btree inode address space. However we only use such extent buffer to enable the accessors, thus we don't even need to bother using real extent buffer, a dummy one is what we really need. And for dummy extent buffer, we no longer need to do any special handling for the first page, as subpage helper is already doing it properly. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Naohiro Aota authored
Relocation of a data block group creates ordered extents. They can cause a hang when a process is trying to thaw the filesystem. We should have called sb_start_write(), so the filesystem is not being frozen. Add an ASSERT to check it is protected. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Naohiro Aota authored
Add a function sb_write_started() to allow callers to verify if sb_start_write() is properly called. It will be used for assertion in btrfs. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Move code in btrfs_ioctl_balance to simplify its flow. This is possible thanks to the removal of balance v1 ioctl and ensuring 'arg' argument is always present. First move the code duplicating the userspace arg to the kernel 'barg'. This makes the out_unlock label redundant. Secondly, check the validity of bargs::flags before copying to the dynamically allocated 'bctl'. This removes the need for the out_bctl label. Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
With the removal of balance v1 ioctl the 'arg' argument is guaranteed to be present so simply remove all conditional code which checks for its presence. Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
The original code resets the page to 0x1 for not apparent reason, it's been like that since the initial 2007 code added in commit 07157aac ("Btrfs: Add file data csums back in via hooks in the extent map code"). It could mean that a failed buffer can be detected from the data but that's just a guess and any value is good. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
When doing a NOWAIT direct IO write, if we can NOCOW then it means we can proceed with the non-blocking, NOWAIT path. However reserving the metadata space and qgroup meta space can often result in blocking - flushing delalloc, wait for ordered extents to complete, trigger transaction commits, etc, going against the semantics of a NOWAIT write. So make the NOWAIT write path to try to reserve all the metadata it needs without resulting in a blocking behaviour - if we get -ENOSPC or -EDQUOT then return -EAGAIN to make the caller fallback to a blocking direct IO write. This is part of a patchset comprised of the following patches: btrfs: avoid blocking on page locks with nowait dio on compressed range btrfs: avoid blocking nowait dio when locking file range btrfs: avoid double nocow check when doing nowait dio writes btrfs: stop allocating a path when checking if cross reference exists btrfs: free path at can_nocow_extent() before checking for checksum items btrfs: release path earlier at can_nocow_extent() btrfs: avoid blocking when allocating context for nowait dio read/write btrfs: avoid blocking on space revervation when doing nowait dio writes The following test was run before and after applying this patchset: $ cat io-uring-nodatacow-test.sh #!/bin/bash DEV=/dev/sdc MNT=/mnt/sdc MOUNT_OPTIONS="-o ssd -o nodatacow" MKFS_OPTIONS="-R free-space-tree -O no-holes" NUM_JOBS=4 FILE_SIZE=8G RUN_TIME=300 cat <<EOF > /tmp/fio-job.ini [io_uring_rw] rw=randrw fsync=0 fallocate=posix group_reporting=1 direct=1 ioengine=io_uring iodepth=64 bssplit=4k/20:8k/20:16k/20:32k/10:64k/10:128k/5:256k/5:512k/5:1m/5 filesize=$FILE_SIZE runtime=$RUN_TIME time_based filename=foobar directory=$MNT numjobs=$NUM_JOBS thread EOF echo performance | \ tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor umount $MNT &> /dev/null mkfs.btrfs -f $MKFS_OPTIONS $DEV &> /dev/null mount $MOUNT_OPTIONS $DEV $MNT fio /tmp/fio-job.ini umount $MNT The test was run a 12 cores box with 64G of ram, using a non-debug kernel config (Debian's default config) and a spinning disk. Result before the patchset: READ: bw=407MiB/s (427MB/s), 407MiB/s-407MiB/s (427MB/s-427MB/s), io=119GiB (128GB), run=300175-300175msec WRITE: bw=407MiB/s (427MB/s), 407MiB/s-407MiB/s (427MB/s-427MB/s), io=119GiB (128GB), run=300175-300175msec Result after the patchset: READ: bw=436MiB/s (457MB/s), 436MiB/s-436MiB/s (457MB/s-457MB/s), io=128GiB (137GB), run=300044-300044msec WRITE: bw=435MiB/s (456MB/s), 435MiB/s-435MiB/s (456MB/s-456MB/s), io=128GiB (137GB), run=300044-300044msec That's about +7.2% throughput for reads and +6.9% for writes. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
When doing a NOWAIT direct IO read/write, we allocate a context object (struct btrfs_dio_data) with GFP_NOFS, which can result in blocking waiting for memory allocation (GFP_NOFS is __GFP_RECLAIM | __GFP_IO). This is undesirable for the NOWAIT semantics, so do the allocation with GFP_NOWAIT if we are serving a NOWAIT request and if the allocation fails return -EAGAIN, so that the caller can fallback to a blocking context and retry with a non-blocking write. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
At can_nocow_extent(), we are releasing the path only after checking if the block group that has the target extent is read only, and after checking if there's delalloc in the range in case our extent is a preallocated extent. The read only extent check can be expensive if we have a very large filesystem with many block groups, as well as the check for delalloc in the inode's io_tree in case the io_tree is big due to IO on other file ranges. Our path is holding a read lock on a leaf and there's no need to keep the lock while doing those two checks, so release the path before doing them, immediately after the last use of the leaf. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
When we look for checksum items, through csum_exist_in_range(), at can_nocow_extent(), we no longer need the path that we have previously allocated. Through csum_exist_in_range() -> btrfs_lookup_csums_range(), we also end up allocating a path, so we are adding unnecessary extra memory usage. So free the path before calling csum_exist_in_range(). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
At btrfs_cross_ref_exist() we always allocate a path, but we really don't need to because all its callers (only 2) already have an allocated path that is not being used when they call btrfs_cross_ref_exist(). So change btrfs_cross_ref_exist() to take a path as an argument and update both its callers to pass in the unused path they have when they call btrfs_cross_ref_exist(). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
When doing a NOWAIT direct IO write we are checking twice if we can COW into the target file range using can_nocow_extent() - once at the very beginning of the write path, at btrfs_write_check() via check_nocow_nolock(), and later again at btrfs_get_blocks_direct_write(). The can_nocow_extent() function does a lot of expensive things - searching for the file extent item in the inode's subvolume tree, searching for the extent item in the extent tree, checking delayed references, etc, so it isn't a very cheap call. We can remove the first check at btrfs_write_check(), and add there a quick check to verify if the inode has the NODATACOW or PREALLOC flags, and quickly bail out if it doesn't have neither of those flags, as that means we have to COW and therefore can't comply with the NOWAIT semantics. After this we do only one call to can_nocow_extent(), while we are at btrfs_get_blocks_direct_write(), where we have already locked the file range and we did a try lock on the range before, at btrfs_dio_iomap_begin() (since the previous patch in the series). Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
If we are doing a NOWAIT direct IO read/write, we can block when locking the file range at btrfs_dio_iomap_begin(), as it's possible the range (or a part of it) is already locked by another task (mmap writes, another direct IO read/write racing with us, fiemap, etc). We are also waiting for completion of any ordered extent we find in the range, which also can block us for a significant amount of time. There's also the incorrect fallback to buffered IO (returning -ENOTBLK) when we are dealing with a NOWAIT request and we can't proceed. In this case we should be returning -EAGAIN, as falling back to buffered IO can result in blocking for many different reasons, so that the caller can delegate a retry to a context where blocking is more acceptable. Fix these cases by: 1) Doing a try lock on the file range and failing with -EAGAIN if we can not lock right away; 2) Fail with -EAGAIN if we find an ordered extent; 3) Return -EAGAIN instead of -ENOTBLK when we need to fallback to buffered IO and we have a NOWAIT request. This will also allow us to avoid a duplicated check that verifies if we are able to do a NOCOW write for NOWAIT direct IO writes, done in the next patch. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
If we are doing NOWAIT direct IO read/write and our inode has compressed extents, we call filemap_fdatawrite_range() against the range in order to wait for compressed writeback to complete, since the generic code at iomap_dio_rw() calls filemap_write_and_wait_range() once, which is not enough to wait for compressed writeback to complete. This call to filemap_fdatawrite_range() can block on page locks, since the first writepages() on a range that we will try to compress results only in queuing a work to compress the data while holding the pages locked. Even though the generic code at iomap_dio_rw() will do the right thing and return -EAGAIN for NOWAIT requests in case there are pages in the range, we can still end up at btrfs_dio_iomap_begin() with pages in the range because either of the following can happen: 1) Memory mapped writes, as we haven't locked the range yet; 2) Buffered reads might have started, which lock the pages, and we do the filemap_fdatawrite_range() call before locking the file range. So don't call filemap_fdatawrite_range() at btrfs_dio_iomap_begin() if we are doing a NOWAIT read/write. Instead call filemap_range_needs_writeback() to check if there are any locked, dirty, or under writeback pages, and return -EAGAIN if that's the case. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-