Commits · 2b018086143d638de8d67ae5be6e8c1afb413193 · Kirill Smelkov / linux

13 Sep, 2024 3 commits

blk-mq: unconditional nr_integrity_segments · 2b018086

Keith Busch authored Sep 13, 2024

Always defining the field will make using it easier and less error prone
in future patches.

There shouldn't be any downside to this: the field fits in what would
otherwise be a 2-byte hole, so we're not saving space by conditionally
leaving it out.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20240913182854.2445457-2-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
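
The padding argument can be illustrated with a small sketch. The struct and field layout below are hypothetical and do not mirror the kernel's struct request; they only show how a 2-byte member can occupy a hole that the compiler would otherwise fill with padding:

```c
#include <stddef.h>

/* Hypothetical layouts for illustration only; these are assumptions and
 * not the real struct request from the kernel. */
struct rq_without_field {
	unsigned short nr_phys_segments;   /* 2 bytes */
	/* the compiler typically inserts 2 bytes of padding here */
	unsigned int timeout;              /* typically 4-byte aligned */
};

struct rq_with_field {
	unsigned short nr_phys_segments;      /* 2 bytes */
	unsigned short nr_integrity_segments; /* occupies the former hole */
	unsigned int timeout;
};
```

On common ABIs, sizeof() reports the same value for both structs and offsetof(..., timeout) does not move. On a real kernel build, a tool such as pahole can show where such holes are.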

Merge tag 'nvme-6.12-2024-09-13' of git://git.infradead.org/nvme into for-6.12/block · d4d7c03f

Jens Axboe authored Sep 13, 2024

Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.12

 - A syntax cleanup (Shen)
 - Fix a Kconfig linking error (Arnd)
 - New queue-depth quirk (Keith)"

* tag 'nvme-6.12-2024-09-13' of git://git.infradead.org/nvme:
  nvme-pci: qdepth 1 quirk
  nvme-tcp: fix link failure for TCP auth
  nvme: Convert comma to semicolon

nvme-pci: qdepth 1 quirk · 83bdfcbd

Keith Busch authored Sep 11, 2024

Another device has been reported to be unreliable if we have more than
one outstanding command. In this new case, data corruption may occur.
Since we have two devices now needing this quirky behavior, make a
generic quirk flag.

The same Apple quirk is clearly not "temporary", so update the comment
while moving it.

Link: https://lore.kernel.org/linux-nvme/191d810a4e3.fcc6066c765804.973611676137075390@collabora.com/
Reported-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

12 Sep, 2024 1 commit

block: fix potential invalid pointer dereference in blk_add_partition · 26e197b7

Riyan Dhiman authored Sep 11, 2024

The blk_add_partition() function initially used a single if-condition
(IS_ERR(part)) to check for errors when adding a partition. This was
modified to handle the specific case of -ENXIO separately, allowing the
function to proceed without logging the error in this case. However,
this change unintentionally left a path where md_autodetect_dev()
could be called without confirming that part is a valid pointer.

This commit separates the error handling logic by splitting the
initial if-condition, improving code readability and handling specific
error scenarios explicitly. The function now distinguishes the general
error case from -ENXIO without altering the existing behavior of
md_autodetect_dev() calls.

Fixes: b7205307 (block: allow partitions on host aware zone devices)
Signed-off-by: Riyan Dhiman <riyandhiman14@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240911132954.5874-1-riyandhiman14@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
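
The split error handling described above can be sketched in userspace roughly as follows. The sketch_* helpers are simplified stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers, and partition_is_usable() is a hypothetical name; this is not the code from the patch:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers (simplified). */
#define SKETCH_MAX_ERRNO 4095

static void *sketch_err_ptr(intptr_t err) { return (void *)err; }
static intptr_t sketch_ptr_err(const void *p) { return (intptr_t)p; }
static bool sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/*
 * Returns true only when 'part' is a valid pointer that later code may use.
 * '*logged' reports whether the error would have been printed.
 */
static bool partition_is_usable(const void *part, bool *logged)
{
	*logged = false;
	if (sketch_is_err(part)) {
		/* -ENXIO is skipped silently; every other error is logged. */
		if (sketch_ptr_err(part) != -ENXIO)
			*logged = true;
		return false;   /* never continue with an error pointer */
	}
	return true;
}
```

The point of the outer check is that every error pointer, including -ENXIO, leaves the function before any code that would dereference it.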

11 Sep, 2024 5 commits

blk_iocost: make read-only static array vrate_adj_pct const · cc089684

Colin Ian King authored Sep 11, 2024

The static array vrate_adj_pct is read-only, so make it const as
well.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20240911214124.197403-1-colin.i.king@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: unpin user pages belonging to a folio at once · eb1d46fc

Kundan Kumar authored Sep 11, 2024

Use newly added mm function unpin_user_folio() to put refs by npages
count.
Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
Tested-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20240911064935.5630-5-kundan.kumar@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

mm: release number of pages of a folio · d3bfbfb1

Kundan Kumar authored Sep 11, 2024

Add a new function unpin_user_folio() to put the refs of a folio by
npages count.

The check for BIO_PAGE_PINNED flag is removed as it is already checked
in bio_release_pages().
Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
Tested-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20240911064935.5630-4-kundan.kumar@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: introduce folio awareness and add a bigger size from folio · ed9832bc

Kundan Kumar authored Sep 11, 2024

Add a bigger size from folio to bio and skip merge processing for pages.

Fetch the offset of page within a folio. Depending on the size of folio
and folio_offset, fetch a larger length. This length may consist of
multiple contiguous pages if folio is multiorder.

Using the length, calculate the number of pages that will be added to the
bio and increment the loop counter to skip those pages.

This technique avoids the overhead of merging pages that belong to the
same large-order folio.

Also folio-ize the functions bio_iov_add_page() and
bio_iov_add_zone_append_page().
Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
Tested-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20240911064935.5630-3-kundan.kumar@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
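
The length and page-count arithmetic described above can be sketched in userspace as follows. The names folio_span() and struct sketch_span, and the fixed 4 KiB page size, are assumptions made for illustration. This is not the kernel code, which works on struct folio and performs further checks that are omitted here:

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for this sketch */

struct sketch_span {
	size_t len;        /* bytes that can be added to the bio in one go */
	size_t num_pages;  /* pages covered, i.e. loop iterations to skip */
};

/*
 * folio_size: size of the folio in bytes
 * page_idx:   index of the current page within the folio
 * offset:     byte offset within that page
 * left:       bytes still to be added
 */
static struct sketch_span folio_span(size_t folio_size, size_t page_idx,
				     size_t offset, size_t left)
{
	size_t folio_offset = page_idx * SKETCH_PAGE_SIZE + offset;
	size_t avail = folio_size - folio_offset;
	struct sketch_span s;

	/* Take as much as the folio still offers, capped by what is left. */
	s.len = avail < left ? avail : left;
	/* Round up to whole pages, counting from the start of the page. */
	s.num_pages = (offset + s.len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	return s;
}
```

For a 16 KiB folio, starting at page index 1 with an in-page offset of 512 and plenty of data left, the sketch yields one 11776-byte chunk spanning three pages instead of three separate per-page additions.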

block: Added folio-ized version of bio_add_hw_page() · 7de98954

Kundan Kumar authored Sep 11, 2024

Added new bio_add_hw_folio() function as a wrapper around
bio_add_hw_page(). This is a prep patch.
Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
Tested-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20240911064935.5630-2-kundan.kumar@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

10 Sep, 2024 11 commits

block, bfq: factor out a helper to split bfqq in bfq_init_rq() · a7609d2a

Yu Kuai authored Sep 09, 2024

Make the code cleaner. There are no functional changes.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240909134154.954924-8-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block, bfq: remove local variable 'bfqq_already_existing' in bfq_init_rq() · 3c61429c

Yu Kuai authored Sep 09, 2024

Now that 'bfqq_already_existing' is only used in one branch, it can be
removed. There are no functional changes.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240909134154.954924-7-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block, bfq: remove local variable 'split' in bfq_init_rq() · e61e002a

Yu Kuai authored Sep 09, 2024

The local variable is only used to call bfq_bfqq_resume_state() later.
Since 'bfqd->lock' is held and the bfqq status will not change between
setting 'split' and calling bfq_bfqq_resume_state(), move
bfq_bfqq_resume_state() forward so that 'split' can be removed. There are
no functional changes.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240909134154.954924-6-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block, bfq: remove bfq_log_bfqg() · 553a606c

Yu Kuai authored Sep 09, 2024

It's not used, hence can be removed.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240909134154.954924-5-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator() · bc3b1e9e

Yu Kuai authored Sep 09, 2024

Because bfq_put_cooperator() is always followed by
bfq_release_process_ref().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240909134154.954924-4-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block, bfq: fix procress reference leakage for bfqq in merge chain · 73aeab37

Yu Kuai authored Sep 09, 2024

Original state:

        Process 1       Process 2       Process 3       Process 4
         (BIC1)          (BIC2)          (BIC3)          (BIC4)
          Λ                |               |               |
           \--------------\ \-------------\ \-------------\|
                           V               V               V
          bfqq1--------->bfqq2---------->bfqq3----------->bfqq4
    ref    0               1               2               4

After commit 0e456dba ("block, bfq: choose the last bfqq from merge
chain in bfq_setup_cooperator()"), if P1 issues a new IO:

Without the patch:

        Process 1       Process 2       Process 3       Process 4
         (BIC1)          (BIC2)          (BIC3)          (BIC4)
          Λ                |               |               |
           \------------------------------\ \-------------\|
                                           V               V
          bfqq1--------->bfqq2---------->bfqq3----------->bfqq4
    ref    0               0               2               4

bfqq3 will be used to handle IO from P1. This is not expected; IO
should be redirected to bfqq4.

With the patch:

          -------------------------------------------
          |                                         |
        Process 1       Process 2       Process 3   |   Process 4
         (BIC1)          (BIC2)          (BIC3)     |    (BIC4)
                           |               |        |      |
                            \-------------\ \-------------\|
                                           V               V
          bfqq1--------->bfqq2---------->bfqq3----------->bfqq4
    ref    0               0               2               4

IO is redirected to bfqq4; however, the process reference of bfqq3 is
still 2, while only P2 is using it.

Fix the problem by calling bfq_merge_bfqqs() for each bfqq in the merge
chain. Also change bfq_merge_bfqqs() to return new_bfqq to simplify the
code.

Fixes: 0e456dba ("block, bfq: choose the last bfqq from merge chain in bfq_setup_cooperator()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240909134154.954924-3-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block, bfq: fix uaf for accessing waker_bfqq after splitting · 1ba0403a

Yu Kuai authored Sep 09, 2024

After commit 42c306ed ("block, bfq: don't break merge chain in
bfq_split_bfqq()"), if the current process is the last holder of bfqq,
the bfqq can be freed after bfq_split_bfqq(). Hence recording the bfqq and
then accessing bfqq->waker_bfqq may trigger a UAF. What's more, the
waker_bfqq may be in the merge chain of bfqq, hence just recording
waker_bfqq is still not safe.

Fix the problem by adding a helper bfq_waker_bfqq() to check whether
bfqq->waker_bfqq is in the merge chain and the current process is the only
holder.

Fixes: 42c306ed ("block, bfq: don't break merge chain in bfq_split_bfqq()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240909134154.954924-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
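
The check performed by such a helper can be sketched in userspace along these lines. struct sketch_queue and safe_waker() are hypothetical, heavily simplified stand-ins for struct bfq_queue and bfq_waker_bfqq(); the real helper may differ in detail:

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct bfq_queue. */
struct sketch_queue {
	int process_refs;                 /* processes still using the queue */
	struct sketch_queue *new_queue;   /* next queue in the merge chain */
	struct sketch_queue *waker;       /* recorded waker queue, may be NULL */
};

/*
 * Return the waker only if it looks safe to keep using after a split.
 * If the waker sits in q's merge chain and has a single process
 * reference (the current process being the only holder), assume it may
 * be freed by the split and report NULL instead.
 */
static struct sketch_queue *safe_waker(const struct sketch_queue *q)
{
	struct sketch_queue *waker = q->waker;
	struct sketch_queue *next = q->new_queue;

	if (!waker)
		return NULL;

	for (; next; next = next->new_queue) {
		if (next == waker) {
			if (waker->process_refs == 1)
				return NULL;
			break;
		}
	}
	return waker;
}
```

The caller would evaluate this before splitting and use only the returned pointer afterwards, rather than reading the waker field from a queue that may already have been freed.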

blk-throttle: support prioritized processing of metadata · 29390bb5

Yu Kuai authored Sep 03, 2024

Currently, blk-throttle handles all IO in FIFO order, so if data IO is
throttled and meta IO is then dispatched, the meta IO has to wait for
the data IO, causing priority inversion problems.

This patch handles metadata first and then pays the debt while
throttling data.

Test script: use cgroup v1 to throttle the root cgroup, then create a new
dir and file while writeback is throttled:

test() {
  mkdir /mnt/test/xxx
  touch /mnt/test/xxx/1
  sync /mnt/test/xxx
  sync /mnt/test/xxx
}

mkfs.ext4 -F /dev/nvme0n1 -E lazy_itable_init=0,lazy_journal_init=0
mount /dev/nvme0n1 /mnt/test

echo "259:0 $((1024*1024))" > /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device
dd if=/dev/zero of=/mnt/test/foo1 bs=16M count=1 conv=fdatasync status=none &
sleep 4

time test
echo "259:0 0" > /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device

sleep 1
umount /dev/nvme0n1

Test result: time cost for creating new dir and file
before this patch:  14s
after this patch:   0.1s
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20240903135149.271857-3-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-throttle: remove last_low_overflow_time · 3bf73e62

Yu Kuai authored Sep 03, 2024

last_low_overflow_time is not used anymore after commit bf20ab53
("blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW").
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20240903135149.271857-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

drbd: Add NULL check for net_conf to prevent dereference in state validation · a5e61b50

Mikhail Lobanov authored Sep 09, 2024

If the net_conf pointer is NULL and the code attempts to access its
fields without a check, it will lead to a null pointer dereference.
Add a NULL check before dereferencing the pointer.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 44ed167d ("drbd: rcu_read_lock() and rcu_dereference() for tconn->net_conf")
Cc: stable@vger.kernel.org
Signed-off-by: Mikhail Lobanov <m.lobanov@rosalinux.ru>
Link: https://lore.kernel.org/r/20240909133740.84297-1-m.lobanov@rosalinux.ru
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme-tcp: fix link failure for TCP auth · 2d5a333e

Arnd Bergmann authored Sep 09, 2024

The nvme fabric driver calls the nvme_tls_key_lookup() function from
nvmf_parse_key() when the keyring is enabled, but this is broken in a
configuration with CONFIG_NVME_FABRICS=y and CONFIG_NVME_TCP=m because
this leads to the function definition being in a loadable module:

x86_64-linux-ld: vmlinux.o: in function `nvmf_parse_key':
fabrics.c:(.text+0xb1bdec): undefined reference to `nvme_tls_key_lookup'

Move the 'select' up to CONFIG_NVME_FABRICS itself to force this
part to be built-in as well if needed.

Fixes: 5bc46b49 ("nvme-tcp: check for invalidated or revoked key")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>

07 Sep, 2024 2 commits

blk-mq: add missing unplug trace event · acc8c0a9

Keith Busch authored Sep 06, 2024

The single-queue optimized list flush doesn't have an unplug trace event
to pair with the plug event. Add one.

In the unlikely event that an error occurs and the flush falls back to the
less optimized plug flush path, a second unplug trace event may be
logged, but it will show the remaining count of requests that weren't
previously handled.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20240906194540.3719642-1-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

mtip32xx: Remove redundant null pointer checks in mtip_hw_debugfs_init() · a02e98be

Li Zetao authored Sep 07, 2024

Since debugfs_create_dir() never returns a null pointer, checking
the return value for a null pointer is redundant. Since
debugfs_create_file() can deal with an ERR_PTR() style pointer, drop
the check. Since nothing pays attention to the return value of
mtip_hw_debugfs_init(), its return type can be changed to void.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Link: https://lore.kernel.org/r/20240907034046.3595268-1-lizetao1@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

06 Sep, 2024 11 commits

nvme: Convert comma to semicolon · 389e72c5

Shen Lichuan authored Sep 06, 2024

To ensure code clarity and prevent potential errors, it's advisable
to employ the ';' as a statement separator, except when ',' are
intentionally used for specific purposes.
Signed-off-by: Shen Lichuan <shenlichuan@vivo.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
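
As a hypothetical illustration of why the separator matters (this is not the code touched by the patch), the two functions below differ only in ',' versus ';' after the first assignment, yet behave differently when the condition is false:

```c
/* With a comma, both assignments form one expression statement, so both
 * belong to the body of the if. */
static int with_comma(int cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1,
		b = 2;
	return a + b;
}

/* With a semicolon, only the first assignment is conditional. */
static int with_semicolon(int cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1;
	b = 2;
	return a + b;
}
```

With cond equal to 0, with_comma() returns 0 while with_semicolon() returns 2. A stray comma can therefore silently change which statements an unbraced if controls.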

Merge tag 'md-6.12-20240906' of... · 68f31e88

Jens Axboe authored Sep 06, 2024

Merge tag 'md-6.12-20240906' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.12/block

Pull MD updates from Song:

"This patch, by Xiao Ni, adds a sysfs entry 'new_level'."

* tag 'md-6.12-20240906' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: Add new_level sysfs interface

Merge tag 'nvme-6.12-2024-09-06' of git://git.infradead.org/nvme into for-6.12/block · 98141430

Jens Axboe authored Sep 06, 2024

Pull NVMe updates from Keith:

"nvme updates for Linux 6.12

 - Asynchronous namespace scanning (Stuart)
 - TCP TLS updates (Hannes)
 - RDMA queue controller validation (Niklas)
 - Align field names to the spec (Anuj)
 - Metadata support validation (Puranjay)"

* tag 'nvme-6.12-2024-09-06' of git://git.infradead.org/nvme:
  nvme: fix metadata handling in nvme-passthrough
  nvme: rename apptag and appmask to lbat and lbatm
  nvme-rdma: send cntlid in the RDMA_CM_REQUEST Private Data
  nvme-target: do not check authentication status for admin commands twice
  nvmet-auth: allow to clear DH-HMAC-CHAP keys
  nvme-sysfs: add 'tls_keyring' attribute
  nvme-sysfs: add 'tls_configured_key' sysfs attribute
  nvme: split off TLS sysfs attributes into a separate group
  nvme: add a newline to the 'tls_key' sysfs attribute
  nvme-tcp: check for invalidated or revoked key
  nvme-tcp: sanitize TLS key handling
  nvme-keyring: restrict match length for version '1' identifiers
  nvme_core: scan namespaces asynchronously

md: Add new_level sysfs interface · d981ed84

Xiao Ni authored Sep 05, 2024

Reshape is currently supported in two ways: with a backup file or
without one. Without a backup file, reshape needs to change the data
offset. It doesn't need the systemd service mdadm-grow-continue, so the
reshape job can finish within a single process. That process knows the new
level from the mdadm --grow command and can change to the new level after
reshape finishes.

With a backup file, reshape needs the systemd service
mdadm-grow-continue to monitor reshape progress, so there are two
processes involved. One is the mdadm --grow command, which kicks off
reshape and wakes up the mdadm-grow-continue service. The second process
is the service, which doesn't know the new level from the first process.

In kernel space, mddev->new_level is used to record the new level when
doing reshape. This patch adds a new interface to help mdadm update
new_level and sync it to metadata. Then mdadm-grow-continue can read the
right new_level.

Commit log revised by Song Liu. Please refer to the link for more details.
Signed-off-by: Xiao Ni <xni@redhat.com>
Link: https://lore.kernel.org/r/20240904235453.99120-1-xni@redhat.com
Signed-off-by: Song Liu <song@kernel.org>

zram: Shrink zram_table_entry::flags. · 68d20eb6

Sebastian Andrzej Siewior authored Sep 06, 2024

The zram_table_entry::flags member is of type long and uses 8 bytes on a
64bit architecture. With a PAGE_SIZE of 256KiB we have PAGE_SHIFT of 18
which in turn leads to __NR_ZRAM_PAGEFLAGS = 27. This still fits in an
ordinary integer.
By reducing the size of `flags' to four bytes, the size of the struct
goes back to 16 bytes. The padding between the lock and ac_time (if
enabled) is also gone.

Make zram_table_entry::flags an unsigned int and update the build test
to reflect the change.
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20240906141520.730009-4-bigeasy@linutronix.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
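
The size effect can be approximated with a userspace sketch. The structs below are hypothetical stand-ins for struct zram_table_entry without the optional ac_time member. uint64_t models an unsigned long on a 64-bit architecture, and uint32_t models a 4-byte spinlock_t, which is an assumption about a non-debug configuration:

```c
#include <stdint.h>

/* Hypothetical stand-in: 8-byte flags, as with 'unsigned long' on 64-bit. */
struct entry_long_flags {
	uint64_t handle;
	uint64_t flags;
	uint32_t lock;    /* models a 4-byte lock */
	/* tail padding is typically added here to keep 8-byte alignment */
};

/* Hypothetical stand-in: 4-byte flags, as with 'unsigned int'. */
struct entry_int_flags {
	uint64_t handle;
	uint32_t flags;
	uint32_t lock;    /* shares an 8-byte slot with flags, no padding */
};
```

Under these assumptions the second layout is 16 bytes, while the first is larger because the 4-byte lock no longer shares an 8-byte slot with another member.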

zram: Remove ZRAM_LOCK · 6086aeb4

Sebastian Andrzej Siewior authored Sep 06, 2024

ZRAM_LOCK was used for locking. After the addition of the spinlock_t,
the bit is still set and cleared, but there is no reader of it.

Remove the ZRAM_LOCK bit.
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20240906141520.730009-3-bigeasy@linutronix.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

zram: Replace bit spinlocks with a spinlock_t. · 9518e5bf

Mike Galbraith authored Sep 06, 2024

The bit spinlock disables preemption. The spinlock_t lock becomes a sleeping
lock on PREEMPT_RT and it can not be acquired in this context. In this locked
section, zs_free() acquires a zs_pool::lock, and there is access to
zram::wb_limit_lock.

Add a spinlock_t for locking. Keep setting/clearing the ZRAM_LOCK bit after
the lock has been acquired/dropped. The size of struct zram_table_entry
increases by 4 bytes due to the lock, plus an additional 4 bytes of padding
with CONFIG_ZRAM_TRACK_ENTRY_ACTIME enabled.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20240906141520.730009-2-bigeasy@linutronix.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nbd: correct the maximum value for discard sectors · 296dbc72

Wouter Verhelst authored Aug 12, 2024

The version of the NBD protocol implemented by the kernel driver
currently has a 32-bit field for length values. As the NBD protocol uses
bytes as a unit of length, length values of 2^32 bytes or more cannot
be expressed.

Update the max_hw_discard_sectors field to match that.
Signed-off-by: Wouter Verhelst <w@uter.be>
Fixes: 26828324 ("nbd: use the atomic queue limits API in nbd_set_size")
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Eric Blake <eblake@redhat.Com>
Link: https://lore.kernel.org/r/20240812133032.115134-8-w@uter.be
Signed-off-by: Jens Axboe <axboe@kernel.dk>
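
One way to derive a sector limit from a 32-bit byte-length field is sketched below. The function name and the SKETCH_SECTOR_SIZE macro are assumptions for illustration, and the exact expression used by the patch may differ:

```c
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512u   /* bytes per sector, assumed */

/* Largest sector count whose length in bytes still fits in 32 bits. */
static uint32_t max_sectors_for_u32_byte_length(void)
{
	return UINT32_MAX / SKETCH_SECTOR_SIZE;
}
```

Any request of at most this many sectors has a byte length that fits in the 32-bit field; one sector more would not.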

nbd: nbd_bg_flags_show: add NBD_FLAG_ROTATIONAL · 41372f5c

Wouter Verhelst authored Aug 12, 2024

Also handle NBD_FLAG_ROTATIONAL in our debug helper function.
Signed-off-by: Wouter Verhelst <w@uter.be>
Cc: Eric Blake <eblake@redhat.Com>
Link: https://lore.kernel.org/r/20240812133032.115134-6-w@uter.be
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nbd: implement the WRITE_ZEROES command · e49dacc7

Wouter Verhelst authored Aug 12, 2024

The NBD protocol defines a message for zeroing out a region of an export.

Add support to the kernel driver for that message.
Signed-off-by: Wouter Verhelst <w@uter.be>
Cc: Eric Blake <eblake@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240812133032.115134-3-w@uter.be
Signed-off-by: Jens Axboe <axboe@kernel.dk>

MAINTAINERS: Move the BFQ io scheduler to Odd Fixes state · f55d3b82

Yu Kuai authored Sep 06, 2024

BFQ has been lacking active maintenance for approximately two years, and it
was recently transitioned to the Orphan state. However, since there are
still many users, I have decided to step forward and assume the role of
maintainer to ensure continued support and development.

While I may not be the one with the most extensive knowledge of BFQ's
internals, I have been actively involved in its development since 2021.
Moreover, our team continues to rigorously test BFQ in downstream kernels,
ensuring its stability and performance. Despite my confidence in maintaining
BFQ, I believe it is prudent to classify its state as "Odd Fixes" to
accurately reflect my relatively new position as the maintainer.

By assuming this responsibility, I am committed to providing the necessary
support and addressing any issues that may arise with BFQ. As time
progresses, we will reassess the situation and determine the appropriate
state.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240906102153.612997-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

05 Sep, 2024 1 commit

Merge tag 'md-6.12-20240905' of... · 9714452a

Jens Axboe authored Sep 05, 2024

Merge tag 'md-6.12-20240905' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.12/block

Pull MD fix from Song:

"This patch, from Mateusz Kusiak, improves the information reported in
/proc/mdstat."

* tag 'md-6.12-20240905' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: Report failed arrays as broken in mdstat

04 Sep, 2024 3 commits

md: Report failed arrays as broken in mdstat · 2d2b3bc1

Mateusz Kusiak authored Sep 03, 2024

Depending on whether an array has a personality, it is reported as either
active or inactive. This patch adds a third status, "broken", for arrays
with a personality that have become inoperative. The reason is that end
users tend to assume that "active" indicates the array is operational.

Add a "broken" state for inoperative arrays with a personality and refactor
the code.
Signed-off-by: Mateusz Kusiak <mateusz.kusiak@intel.com>
Link: https://lore.kernel.org/r/20240903142949.53628-1-mateusz.kusiak@intel.com
Signed-off-by: Song Liu <song@kernel.org>

block: fix integer overflow in BLKSECDISCARD · 697ba0b6

Alexey Dobriyan authored Sep 03, 2024

I independently rediscovered

	commit 22d24a54
	block: fix overflow in blk_ioctl_discard()

but for secure erase.

Same problem:

	uint64_t r[2] = {512, 18446744073709551104ULL};
	ioctl(fd, BLKSECDISCARD, r);

will enter near infinite loop inside blkdev_issue_secure_erase():

	a.out: attempt to access beyond end of device
	loop0: rw=5, sector=3399043073, nr_sectors = 1024 limit=2048
	bio_check_eod: 3286214 callbacks suppressed
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Link: https://lore.kernel.org/r/9e64057f-650a-46d1-b9f7-34af391536ef@p183
Signed-off-by: Jens Axboe <axboe@kernel.dk>
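
The wraparound behind this bug can be reproduced in userspace with the values from the report above. The two helpers are hypothetical and only contrast a naive range check with an overflow-aware one; they are not the kernel's implementation:

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive check: start + len can wrap around and look "in range". */
static bool naive_range_ok(uint64_t start, uint64_t len, uint64_t dev_bytes)
{
	return start + len <= dev_bytes;
}

/* Overflow-aware check: reject the range before the addition can wrap. */
static bool checked_range_ok(uint64_t start, uint64_t len, uint64_t dev_bytes)
{
	if (len > UINT64_MAX - start)
		return false;          /* start + len would overflow */
	return start + len <= dev_bytes;
}
```

With start = 512 and len = 18446744073709551104, the sum wraps to 0. The naive check therefore accepts the range on a 2048-sector device, while the overflow-aware check rejects it.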

block: fix comment to use set_current_state · 2be6190c

Alvaro Parker authored Sep 03, 2024

The explanatory comment used `set_task_state` instead of
`set_current_state` which is the function actually used in the code.
Signed-off-by: Alvaro Parker <alparkerdf@gmail.com>
Link: https://lore.kernel.org/r/20240903172214.520086-1-alparkerdf@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

03 Sep, 2024 3 commits

MAINTAINERS: move the BFQ io scheduler to orphan state · 761e5afb

Jens Axboe authored Sep 03, 2024

Nobody is maintaining this code, and it just falls under the umbrella
of block layer code. But at least mark it as such, in case anyone wants
to care more deeply about it and assume the responsibility of doing so.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block, bfq: use bfq_reassign_last_bfqq() in bfq_bfqq_move() · f45916ae

Yu Kuai authored Sep 02, 2024

Use the helper instead of open coding it. There are no functional changes.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240902130329.3787024-5-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block, bfq: don't break merge chain in bfq_split_bfqq() · 42c306ed

Yu Kuai authored Sep 02, 2024

Consider the following scenario:

    Process 1       Process 2       Process 3       Process 4
     (BIC1)          (BIC2)          (BIC3)          (BIC4)
      Λ               |               |                |
       \-------------\ \-------------\ \--------------\|
                      V               V                V
      bfqq1--------->bfqq2---------->bfqq3----------->bfqq4
ref    0              1               2                4

Suppose Process 1 issues a new IO, bfqq2 is found, and bfq_init_rq() then
decides to split bfqq2 via bfq_split_bfqq(). However, the process reference
of bfqq2 is 1 and bfq_split_bfqq() just clears the coop flag, which will
break the merge chain.

Expected result: caller will allocate a new bfqq for BIC1

    Process 1       Process 2       Process 3       Process 4
     (BIC1)          (BIC2)          (BIC3)          (BIC4)
                      |               |                |
                       \-------------\ \--------------\|
                                      V                V
      bfqq1--------->bfqq2---------->bfqq3----------->bfqq4
ref    0              0               1                3

The condition is only meant for the last queue, bfqq4, when the previous
bfqq2 and bfqq3 have already been split. Fix the problem by checking that
bfqq is the last one in the merge chain as well.

Fixes: 36eca894 ("block, bfq: add Early Queue Merge (EQM)")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240902130329.3787024-4-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
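
The extra condition can be sketched as a small predicate. struct sketch_bfqq and can_keep_queue_on_split() are hypothetical simplifications and not the kernel's data structures or function names:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a queue in a merge chain. */
struct sketch_bfqq {
	int process_refs;              /* processes still using this queue */
	struct sketch_bfqq *new_bfqq;  /* next queue in the chain, or NULL */
};

/*
 * A queue may simply be kept by its only remaining process on split, but
 * only if it is also the last queue in the merge chain. Otherwise keeping
 * it would cut the link to the queues that follow.
 */
static bool can_keep_queue_on_split(const struct sketch_bfqq *q)
{
	return q->process_refs == 1 && q->new_bfqq == NULL;
}
```

In the scenario above, bfqq2 has one process reference but still points to bfqq3, so the predicate is false and the caller allocates a new queue for BIC1 instead.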
