- 13 Sep, 2024 3 commits
-
-
Keith Busch authored
Always defining the field will make using it easier and less error prone in future patches. There shouldn't be any downside to this: the field fits in what would otherwise be a 2-byte hole, so we're not saving space by conditionally leaving it out. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20240913182854.2445457-2-kbusch@meta.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
git://git.infradead.org/nvmeJens Axboe authored
Pull NVMe fixes from Keith: "nvme fixes for Linux 6.12 - A syntax cleanup (Shen) - Fix a Kconfig linking error (Arnd) - New queue-depth quirk (Keith)" * tag 'nvme-6.12-2024-09-13' of git://git.infradead.org/nvme: nvme-pci: qdepth 1 quirk nvme-tcp: fix link failure for TCP auth nvme: Convert comma to semicolon
-
Keith Busch authored
Another device has been reported to be unreliable if we have more than one outstanding command. In this new case, data corruption may occur. Since we have two devices now needing this quirky behavior, make a generic quirk flag. The same Apple quirk is clearly not "temporary", so update the comment while moving it. Link: https://lore.kernel.org/linux-nvme/191d810a4e3.fcc6066c765804.973611676137075390@collabora.com/Reported-by: Robert Beckett <bob.beckett@collabora.com> Reviewed-by: Christoph Hellwig hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
- 12 Sep, 2024 1 commit
-
-
Riyan Dhiman authored
The blk_add_partition() function initially used a single if-condition (IS_ERR(part)) to check for errors when adding a partition. This was modified to handle the specific case of -ENXIO separately, allowing the function to proceed without logging the error in this case. However, this change unintentionally left a path where md_autodetect_dev() could be called without confirming that part is a valid pointer. This commit separates the error handling logic by splitting the initial if-condition, improving code readability and handling specific error scenarios explicitly. The function now distinguishes the general error case from -ENXIO without altering the existing behavior of md_autodetect_dev() calls. Fixes: b7205307 (block: allow partitions on host aware zone devices) Signed-off-by: Riyan Dhiman <riyandhiman14@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240911132954.5874-1-riyandhiman14@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 11 Sep, 2024 5 commits
-
-
Colin Ian King authored
The static array vrate_adj_pct is read-only, so make it const as well. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20240911214124.197403-1-colin.i.king@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Kundan Kumar authored
Use newly added mm function unpin_user_folio() to put refs by npages count. Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com> Tested-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20240911064935.5630-5-kundan.kumar@samsung.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Kundan Kumar authored
Add a new function unpin_user_folio() to put the refs of a folio by npages count. The check for BIO_PAGE_PINNED flag is removed as it is already checked in bio_release_pages(). Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com> Tested-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240911064935.5630-4-kundan.kumar@samsung.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Kundan Kumar authored
Add a bigger size from folio to bio and skip merge processing for pages. Fetch the offset of page within a folio. Depending on the size of folio and folio_offset, fetch a larger length. This length may consist of multiple contiguous pages if folio is multiorder. Using the length calculate number of pages which will be added to bio and increment the loop counter to skip those pages. This technique helps to avoid overhead of merging pages which belong to same large order folio. Also folio-ize the functions bio_iov_add_page() and bio_iov_add_zone_append_page() Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com> Tested-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240911064935.5630-3-kundan.kumar@samsung.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Kundan Kumar authored
Added new bio_add_hw_folio() function as a wrapper around bio_add_hw_page(). This is a prep patch. Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com> Tested-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240911064935.5630-2-kundan.kumar@samsung.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 10 Sep, 2024 11 commits
-
-
Yu Kuai authored
Make code cleaner, there are no functional changes. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240909134154.954924-8-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
Now that 'bfqq_already_existing' is only used in one branch, it can be removed. There are no functional changes. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240909134154.954924-7-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
The local variable is used to call bfq_bfqq_resume_state() later, since 'bfqd->lock' is held, and bfqq status will not change between setting 'split' and calling bfq_bfqq_resume_state(), move forward bfq_bfqq_resume_state() so that 'split' can be removed. There are no functional chagnes. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240909134154.954924-6-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
It's not used, hence can be removed. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240909134154.954924-5-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
Because bfq_put_cooperator() is always followed by bfq_release_process_ref(). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240909134154.954924-4-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
Original state: Process 1 Process 2 Process 3 Process 4 (BIC1) (BIC2) (BIC3) (BIC4) Λ | | | \--------------\ \-------------\ \-------------\| V V V bfqq1--------->bfqq2---------->bfqq3----------->bfqq4 ref 0 1 2 4 After commit 0e456dba ("block, bfq: choose the last bfqq from merge chain in bfq_setup_cooperator()"), if P1 issues a new IO: Without the patch: Process 1 Process 2 Process 3 Process 4 (BIC1) (BIC2) (BIC3) (BIC4) Λ | | | \------------------------------\ \-------------\| V V bfqq1--------->bfqq2---------->bfqq3----------->bfqq4 ref 0 0 2 4 bfqq3 will be used to handle IO from P1, this is not expected, IO should be redirected to bfqq4; With the patch: ------------------------------------------- | | Process 1 Process 2 Process 3 | Process 4 (BIC1) (BIC2) (BIC3) | (BIC4) | | | | \-------------\ \-------------\| V V bfqq1--------->bfqq2---------->bfqq3----------->bfqq4 ref 0 0 2 4 IO is redirected to bfqq4, however, procress reference of bfqq3 is still 2, while there is only P2 using it. Fix the problem by calling bfq_merge_bfqqs() for each bfqq in the merge chain. Also change bfqq_merge_bfqqs() to return new_bfqq to simplify code. Fixes: 0e456dba ("block, bfq: choose the last bfqq from merge chain in bfq_setup_cooperator()") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240909134154.954924-3-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
After commit 42c306ed ("block, bfq: don't break merge chain in bfq_split_bfqq()"), if the current procress is the last holder of bfqq, the bfqq can be freed after bfq_split_bfqq(). Hence recored the bfqq and then access bfqq->waker_bfqq may trigger UAF. What's more, the waker_bfqq may in the merge chain of bfqq, hence just recored waker_bfqq is still not safe. Fix the problem by adding a helper bfq_waker_bfqq() to check if bfqq->waker_bfqq is in the merge chain, and current procress is the only holder. Fixes: 42c306ed ("block, bfq: don't break merge chain in bfq_split_bfqq()") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240909134154.954924-2-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
Currently, blk-throttle handle all IO fifo, hence if data IO is throttled and then meta IO is dispatched, the meta IO will have to wait for the data IO, causing priority inversion problems. This patch support to handle metadata first and then pay debt while throttling data. Test script: use cgroup v1 to throttle root cgroup, then create new dir and file while write back is throttled test() { mkdir /mnt/test/xxx touch /mnt/test/xxx/1 sync /mnt/test/xxx sync /mnt/test/xxx } mkfs.ext4 -F /dev/nvme0n1 -E lazy_itable_init=0,lazy_journal_init=0 mount /dev/nvme0n1 /mnt/test echo "259:0 $((1024*1024))" > /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device dd if=/dev/zero of=/mnt/test/foo1 bs=16M count=1 conv=fdatasync status=none & sleep 4 time test echo "259:0 0" > /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device sleep 1 umount /dev/nvme0n1 Test result: time cost for creating new dir and file before this patch: 14s after this patch: 0.1s Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20240903135149.271857-3-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
last_low_overflow_time is not used anymore after commit bf20ab53 ("blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW"). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20240903135149.271857-2-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Mikhail Lobanov authored
If the net_conf pointer is NULL and the code attempts to access its fields without a check, it will lead to a null pointer dereference. Add a NULL check before dereferencing the pointer. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 44ed167d ("drbd: rcu_read_lock() and rcu_dereference() for tconn->net_conf") Cc: stable@vger.kernel.org Signed-off-by: Mikhail Lobanov <m.lobanov@rosalinux.ru> Link: https://lore.kernel.org/r/20240909133740.84297-1-m.lobanov@rosalinux.ruSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Arnd Bergmann authored
The nvme fabric driver calls the nvme_tls_key_lookup() function from nvmf_parse_key() when the keyring is enabled, but this is broken in a configuration with CONFIG_NVME_FABRICS=y and CONFIG_NVME_TCP=m because this leads to the function definition being in a loadable module: x86_64-linux-ld: vmlinux.o: in function `nvmf_parse_key': fabrics.c:(.text+0xb1bdec): undefined reference to `nvme_tls_key_lookup' Move the 'select' up to CONFIG_NVME_FABRICS itself to force this part to be built-in as well if needed. Fixes: 5bc46b49 ("nvme-tcp: check for invalidated or revoked key") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
- 07 Sep, 2024 2 commits
-
-
Keith Busch authored
The single-queue optimized list flush doesn't have an unplug trace event to pair with the plug event. Add one. In the unlikely event an error occurs and falls back to the less optimized plug flush path, it's possible a 2nd unplug trace event will be logged, but it will show the remainig count that weren't previously handled. Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20240906194540.3719642-1-kbusch@meta.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Li Zetao authored
Since the debugfs_create_dir() never returns a null pointer, checking the return value for a null pointer is redundant. Since debugfs_create_file() can deal with a ERR_PTR() style pointer, drop the check. Since mtip_hw_debugfs_init does not pay attention to the return value, its return type can be changed to void. Signed-off-by: Li Zetao <lizetao1@huawei.com> Link: https://lore.kernel.org/r/20240907034046.3595268-1-lizetao1@huawei.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 06 Sep, 2024 11 commits
-
-
Shen Lichuan authored
To ensure code clarity and prevent potential errors, it's advisable to employ the ';' as a statement separator, except when ',' are intentionally used for specific purposes. Signed-off-by: Shen Lichuan <shenlichuan@vivo.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
Jens Axboe authored
Merge tag 'md-6.12-20240906' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.12/block Pull MD updates from Song: "This patch, by Xiao Ni, adds a sysfs entry 'new_level'." * tag 'md-6.12-20240906' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: Add new_level sysfs interface
-
git://git.infradead.org/nvmeJens Axboe authored
Pull NVMe updates from Keith: "nvme updates for Linux 6.12 - Asynchronous namespace scanning (Stuart) - TCP TLS updates (Hannes) - RDMA queue controller validation (Niklas) - Align field names to the spec (Anuj) - Metadata support validation (Puranjay)" * tag 'nvme-6.12-2024-09-06' of git://git.infradead.org/nvme: nvme: fix metadata handling in nvme-passthrough nvme: rename apptag and appmask to lbat and lbatm nvme-rdma: send cntlid in the RDMA_CM_REQUEST Private Data nvme-target: do not check authentication status for admin commands twice nvmet-auth: allow to clear DH-HMAC-CHAP keys nvme-sysfs: add 'tls_keyring' attribute nvme-sysfs: add 'tls_configured_key' sysfs attribute nvme: split off TLS sysfs attributes into a separate group nvme: add a newline to the 'tls_key' sysfs attribute nvme-tcp: check for invalidated or revoked key nvme-tcp: sanitize TLS key handling nvme-keyring: restrict match length for version '1' identifiers nvme_core: scan namespaces asynchronously
-
Xiao Ni authored
Now reshape supports two ways: with backup file or without backup file. For the situation without backup file, it needs to change data offset. It doesn't need systemd service mdadm-grow-continue. So it can finish the reshape job in one process environment. It can know the new level from mdadm --grow command and can change to new level after reshape finishes. For the situation with backup file, it needs systemd service mdadm-grow-continue to monitor reshape progress. So there are two process envolved. One is mdadm --grow command whick kicks off reshape and wakes up mdadm-grow-continue service. The second process is the service, which doesn't know the new level from the first process. In kernel space mddev->new_level is used to record the new level when doing reshape. This patch adds a new interface to help mdadm update new_level and sync it to metadata. Then mdadm-grow-continue can read the right new_level. Commit log revised by Song Liu. Please refer to the link for more details. Signed-off-by: Xiao Ni <xni@redhat.com> Link: https://lore.kernel.org/r/20240904235453.99120-1-xni@redhat.comSigned-off-by: Song Liu <song@kernel.org>
-
Sebastian Andrzej Siewior authored
The zram_table_entry::flags member is of type long and uses 8 bytes on a 64bit architecture. With a PAGE_SIZE of 256KiB we have PAGE_SHIFT of 18 which in turn leads to __NR_ZRAM_PAGEFLAGS = 27. This still fits in an ordinary integer. By reducing the size of `flags' to four bytes, the size of the struct goes back to 16 bytes. The padding between the lock and ac_time (if enabled) is also gone. Make zram_table_entry::flags an unsigned int and update the build test to reflect the change. Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20240906141520.730009-4-bigeasy@linutronix.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Sebastian Andrzej Siewior authored
The ZRAM_LOCK was used for locking and after the addition of spinlock_t the bit set and cleared but there no reader of it. Remove the ZRAM_LOCK bit. Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20240906141520.730009-3-bigeasy@linutronix.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Mike Galbraith authored
The bit spinlock disables preemption. The spinlock_t lock becomes a sleeping lock on PREEMPT_RT and it can not be acquired in this context. In this locked section, zs_free() acquires a zs_pool::lock, and there is access to zram::wb_limit_lock. Add a spinlock_t for locking. Keep the set/ clear ZRAM_LOCK bit after the lock has been acquired/ dropped. The size of struct zram_table_entry increases by 4 bytes due to lock and additional 4 bytes padding with CONFIG_ZRAM_TRACK_ENTRY_ACTIME enabled. Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20240906141520.730009-2-bigeasy@linutronix.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Wouter Verhelst authored
The version of the NBD protocol implemented by the kernel driver currently has a 32 bit field for length values. As the NBD protocol uses bytes as a unit of length, length values larger than 2^32 bytes cannot be expressed. Update the max_hw_discard_sectors field to match that. Signed-off-by: Wouter Verhelst <w@uter.be> Fixes: 26828324 ("nbd: use the atomic queue limits API in nbd_set_size") Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Cc: Eric Blake <eblake@redhat.Com> Link: https://lore.kernel.org/r/20240812133032.115134-8-w@uter.beSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Wouter Verhelst authored
Also handle NBD_FLAG_ROTATIONAL in our debug helper function Signed-off-by: Wouter Verhelst <w@uter.be> Cc: Eric Blake <eblake@redhat.Com> Link: https://lore.kernel.org/r/20240812133032.115134-6-w@uter.beSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Wouter Verhelst authored
The NBD protocol defines a message for zeroing out a region of an export Add support to the kernel driver for that message. Signed-off-by: Wouter Verhelst <w@uter.be> Cc: Eric Blake <eblake@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240812133032.115134-3-w@uter.beSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
BFQ has been lacking active maintenance for approximately two years, and it was recently transitioned to the Orphan state. However, there are still many users, I have decided to step forward and assume the role of maintainer to ensure continued support and development. While I may not be the one with the most extensive knowledge of BFQ's internals, I have been actively involved in its development since 2021. Moreover, our team continues to rigorously test BFQ in downstream kernels, ensuring it's stability and performance. Despite my confidence to maintain BFQ, I believe it is prudent to classify its state as "Odd Fixes" to accurately reflect my relatively new position as the maintainer. By assuming this responsibility, I am committed to providing the necessary support and addressing any issues that may arise with BFQ. As time progresses, we will reassess the situation and determine the appropriate state. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240906102153.612997-1-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 05 Sep, 2024 1 commit
-
-
Jens Axboe authored
Merge tag 'md-6.12-20240905' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.12/block Pull MD fix from Song: "This patch, from Mateusz Kusiak, improves the information reported in /proc/mdstat." * tag 'md-6.12-20240905' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: Report failed arrays as broken in mdstat
-
- 04 Sep, 2024 3 commits
-
-
Mateusz Kusiak authored
Depending on if array has personality, it is either reported as active or inactive. This patch adds third status "broken" for arrays with personality that became inoperative. The reason is end users tend to assume that "active" indicates array is operational. Add "broken" state for inoperative arrays with personality and refactor the code. Signed-off-by: Mateusz Kusiak <mateusz.kusiak@intel.com> Link: https://lore.kernel.org/r/20240903142949.53628-1-mateusz.kusiak@intel.comSigned-off-by: Song Liu <song@kernel.org>
-
Alexey Dobriyan authored
I independently rediscovered commit 22d24a54 block: fix overflow in blk_ioctl_discard() but for secure erase. Same problem: uint64_t r[2] = {512, 18446744073709551104ULL}; ioctl(fd, BLKSECDISCARD, r); will enter near infinite loop inside blkdev_issue_secure_erase(): a.out: attempt to access beyond end of device loop0: rw=5, sector=3399043073, nr_sectors = 1024 limit=2048 bio_check_eod: 3286214 callbacks suppressed Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Link: https://lore.kernel.org/r/9e64057f-650a-46d1-b9f7-34af391536ef@p183Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Alvaro Parker authored
The explanatory comment used `set_task_state` instead of `set_current_state` which is the function actually used in the code. Signed-off-by: Alvaro Parker <alparkerdf@gmail.com> Link: https://lore.kernel.org/r/20240903172214.520086-1-alparkerdf@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 03 Sep, 2024 3 commits
-
-
Jens Axboe authored
Nobody is maintaining this code, and it just falls under the umbrella of block layer code. But at least mark it as such, in case anyone wants to care more deeply about it and assume the responsibility of doing so. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
Instead of open coding it, there are no functional changes. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240902130329.3787024-5-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
Consider the following scenario: Process 1 Process 2 Process 3 Process 4 (BIC1) (BIC2) (BIC3) (BIC4) Λ | | | \-------------\ \-------------\ \--------------\| V V V bfqq1--------->bfqq2---------->bfqq3----------->bfqq4 ref 0 1 2 4 If Process 1 issue a new IO and bfqq2 is found, and then bfq_init_rq() decide to spilt bfqq2 by bfq_split_bfqq(). Howerver, procress reference of bfqq2 is 1 and bfq_split_bfqq() just clear the coop flag, which will break the merge chain. Expected result: caller will allocate a new bfqq for BIC1 Process 1 Process 2 Process 3 Process 4 (BIC1) (BIC2) (BIC3) (BIC4) | | | \-------------\ \--------------\| V V bfqq1--------->bfqq2---------->bfqq3----------->bfqq4 ref 0 0 1 3 Since the condition is only used for the last bfqq4 when the previous bfqq2 and bfqq3 are already splited. Fix the problem by checking if bfqq is the last one in the merge chain as well. Fixes: 36eca894 ("block, bfq: add Early Queue Merge (EQM)") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240902130329.3787024-4-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-