Commits · 23881aec85f3219e8462e87c708815ee2cd82358 · Kirill Smelkov / linux

21 Jul, 2023 2 commits

loop: deprecate autoloading callback loop_probe() · 23881aec

Mauricio Faria de Oliveira authored Jul 20, 2023

The 'probe' callback in __register_blkdev() is only used under the
CONFIG_BLOCK_LEGACY_AUTOLOAD deprecation guard.

The loop_probe() function is only used for that callback, so guard it
too, accordingly.

See commit fbdee71b ("block: deprecate autoloading based on dev_t").
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230720143033.841001-2-mfo@canonical.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

23881aec

sbitmap: fix batching wakeup · 10639737

David Jeffery authored Jul 21, 2023

Current code supposes that it is enough to provide forward progress by
just waking up one wait queue after one completion batch is done.

Unfortunately this way isn't enough, cause waiter can be added to wait
queue just after it is woken up.

Follows one example(64 depth, wake_batch is 8)

1) all 64 tags are active

2) in each wait queue, there is only one single waiter

3) each time one completion batch(8 completions) wakes up just one
   waiter in each wait queue, then immediately one new sleeper is added
   to this wait queue

4) after 64 completions, 8 waiters are wakeup, and there are still 8
   waiters in each wait queue

5) after another 8 active tags are completed, only one waiter can be
   wakeup, and the other 7 can't be waken up anymore.

Turns out it isn't easy to fix this problem, so simply wakeup enough
waiters for single batch.

Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20230721095715.232728-1-ming.lei@redhat.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

10639737

20 Jul, 2023 2 commits

blk-iocost: skip empty flush bio in iocost · 013adcbe

Chengming Zhou authored Jul 20, 2023

The flush bio may have data, may have no data (empty flush), we couldn't
calculate cost for empty flush bio. So we'd better just skip it for now.

Another side effect is that empty flush bio's bio_end_sector() is 0, cause
iocg->cursor reset to 0, may break the cost calculation of other bios.

This isn't good enough, since flush bio still consume the device bandwidth,
but flush request is special, can be merged randomly in the flush state
machine, we don't know how to calculate cost for it for now.

Its completion time also has flaws, which may include the pre-flush or
post-flush completion time, but I don't know if we need to fix that and
how to fix it.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230720121441.1408522-1-chengming.zhou@linux.devSigned-off-by: Jens Axboe <axboe@kernel.dk>

013adcbe

blk-mq: delete dead struct blk_mq_hw_ctx->queued field · 3641c90c

Chengming Zhou authored Jul 20, 2023

This counter is not used anywhere, so delete it.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230720095512.1403123-1-chengming.zhou@linux.devSigned-off-by: Jens Axboe <axboe@kernel.dk>

3641c90c

14 Jul, 2023 2 commits

blk-mq: Fix stall due to recursive flush plug · 70904263

Ross Lagerwall authored Jul 14, 2023

We have seen rare IO stalls as follows:

* blk_mq_plug_issue_direct() is entered with an mq_list containing two
requests.
* For the first request, it sets last == false and enters the driver's
queue_rq callback.
* The driver queue_rq callback indirectly calls schedule() which calls
blk_flush_plug(). This may happen if the driver has the
BLK_MQ_F_BLOCKING flag set and is allowed to sleep in ->queue_rq.
* blk_flush_plug() handles the remaining request in the mq_list. mq_list
is now empty.
* The original call to queue_rq resumes (with last == false).
* The loop in blk_mq_plug_issue_direct() terminates because there are no
remaining requests in mq_list.

The IO is now stalled because the last request submitted to the driver
had last == false and there was no subsequent call to commit_rqs().

Fix this by returning early in blk_mq_flush_plug_list() if rq_count is 0
which it will be in the recursive case, rather than checking if the
mq_list is empty. At the same time, adjust one of the callers to skip
the mq_list empty check as it is not necessary.

Fixes: dc5fc361 ("block: attempt direct issue of plug list")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230714101106.3635611-1-ross.lagerwall@citrix.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

70904263

block: queue data commands from the flush state machine at the head · 9f87fc4d

Christoph Hellwig authored Jul 14, 2023

We used to insert the data commands following a pre-flush to the head
of the queue until commit 1e82fadf ("blk-mq: do not do head insertions
post-pre-flush commands"). Not doing this seems to cause hangs of
such commands on NFS workloads when exported from file systems with
SATA SSDs. I have no idea why this would starve these workloads,
but doing a semantic revert of this patch (which looks quite different
due to various other changes) fixes the hangs.

Fixes: 1e82fadf ("blk-mq: do not do head insertions post-pre-flush commands")
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/20230714143014.11879-1-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

9f87fc4d

13 Jul, 2023 4 commits

Merge tag 'nvme-6.5-2023-07-13' of git://git.infradead.org/nvme into block-6.5 · 90b46229

Jens Axboe authored Jul 13, 2023

Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.5

 - Don't require quirk to use duplicate namespace identifiers
   (Christoph, Sagi)
 - One more BOGUS_NID quirk (Pankaj)
 - IO timeout and error hanlding fixes for PCI (Keith)
 - Enhanced metadata format mask fix (Ankit)
 - Association race condition fix for fibre channel (Michael)
 - Correct debugfs error checks (Minjie)
 - Use PAGE_SECTORS_SHIFT where needed (Damien)
 - Reduce kernel logs for legacy nguid attribute (Keith)
 - Use correct dma direction when unmapping metadata (Ming)"

* tag 'nvme-6.5-2023-07-13' of git://git.infradead.org/nvme:
  nvme-pci: fix DMA direction of unmapping integrity data
  nvme: don't reject probe due to duplicate IDs for single-ported PCIe devices
  nvme: ensure disabling pairs with unquiesce
  nvme-fc: fix race between error recovery and creating association
  nvme-fc: return non-zero status code when fails to create association
  nvme: fix parameter check in nvme_fault_inject_init()
  nvme: warn only once for legacy uuid attribute
  nvme: fix the NVME_ID_NS_NVM_STS_MASK definition
  nvmet: use PAGE_SECTORS_SHIFT
  nvme: add BOGUS_NID quirk for Samsung SM953

90b46229

blk-mq: fix start_time_ns and alloc_time_ns for pre-allocated rq · 5c17f45e

Chengming Zhou authored Jul 10, 2023

The iocost rely on rq start_time_ns and alloc_time_ns to tell saturation
state of the block device. Most of the time request is allocated after
rq_qos_throttle() and its alloc_time_ns or start_time_ns won't be affected.

But for plug batched allocation introduced by the commit 47c122e3
("block: pre-allocate requests if plug is started and is a batch"), we can
rq_qos_throttle() after the allocation of the request. This is what the
blk_mq_get_cached_request() does.

In this case, the cached request alloc_time_ns or start_time_ns is much
ahead if blocked in any qos ->throttle().

Fix it by setting alloc_time_ns and start_time_ns to now when the allocated
request is actually used.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230710105516.2053478-1-chengming.zhou@linux.devSigned-off-by: Jens Axboe <axboe@kernel.dk>

5c17f45e

nvme-pci: fix DMA direction of unmapping integrity data · b8f6446b

Ming Lei authored Jul 13, 2023

DMA direction should be taken in dma_unmap_page() for unmapping integrity
data.

Fix this DMA direction, and reported in Guangwu's test.
Reported-by: Guangwu Zhang <guazhang@redhat.com>
Fixes: 4aedb705 ("nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

b8f6446b

nvme: don't reject probe due to duplicate IDs for single-ported PCIe devices · ac522fc6

Christoph Hellwig authored Jul 13, 2023

While duplicate IDs are still very harmful, including the potential to easily
see changing devices in /dev/disk/by-id, it turn out they are extremely
common for cheap end user NVMe devices.

Relax our check for them for so that it doesn't reject the probe on
single-ported PCIe devices, but prints a big warning instead.  In doubt
we'd still like to see quirk entries to disable the potential for
changing supposed stable device identifier links, but this will at least
allow users how have two (or more) of these devices to use them without
having to manually add a new PCI ID entry with the quirk through sysfs or
by patching the kernel.

Fixes: 2079f41e ("nvme: check that EUI/GUID/UUID are globally unique")
Cc: stable@vger.kernel.org # 6.0+
Co-developed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

ac522fc6

12 Jul, 2023 6 commits

block/mq-deadline: Fix a bug in deadline_from_pos() · f673b4f5

Bart Van Assche authored Jul 12, 2023

A bug was introduced in deadline_from_pos() while implementing the
suggestion to use round_down() in the following code:

pos -= bdev_offset_from_zone_start(rq->q->disk->part0, pos);

This patch makes deadline_from_pos() use round_down() such that 'pos' is
rounded down.
Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Closes: https://lore.kernel.org/all/5zthzi3lppvcdp4nemum6qck4gpqbdhvgy4k3qwguhgzxc4quj@amulvgycq67h/
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Fixes: 0effb390 ("block: mq-deadline: Handle requeued requests correctly")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230712173344.2994513-1-bvanassche@acm.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

f673b4f5

nvme: ensure disabling pairs with unquiesce · 71a5bb15

Keith Busch authored Jul 12, 2023

If any error handling that disables the controller fails to queue the
reset work, like if the state changed to disconnected inbetween, then
the failed teardown needs to unquiesce the queues since it's no longer
paired with reset_work. Just make sure that the controller can be put
into a resetting state prior to starting the disable so that no other
handling can change the queue states while recovery is happening.
Reported-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

71a5bb15

nvme-fc: fix race between error recovery and creating association · ee6fdc50

Michael Liang authored Jul 07, 2023

There is a small race window between nvme-fc association creation and error
recovery. Fix this race condition by protecting accessing to controller
state and ASSOC_FAILED flag under nvme-fc controller lock.
Signed-off-by: Michael Liang <mliang@purestorage.com>
Reviewed-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

ee6fdc50

nvme-fc: return non-zero status code when fails to create association · 60e445bd

Michael Liang authored Jul 07, 2023

Return non-zero status code(-EIO) when needed, so re-connecting or
deleting controller will be triggered properly.
Signed-off-by: Michael Liang <mliang@purestorage.com>
Reviewed-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

60e445bd

nvme: fix parameter check in nvme_fault_inject_init() · 733ba346

Minjie Du authored Jul 12, 2023

Make IS_ERR() judge the debugfs_create_dir() function return.
Signed-off-by: Minjie Du <duminjie@vivo.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

733ba346

nvme: warn only once for legacy uuid attribute · b718ae83

Keith Busch authored Jul 12, 2023

Report the legacy fallback behavior for uuid attributes just once
instead of logging repeated warnings for the same condition every time
the attribute is read. The old behavior is too spamy on the kernel logs.

Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reported-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

b718ae83

10 Jul, 2023 4 commits

block: remove dead struc request->completion_data field · dc8cbb65

Jens Axboe authored Jul 10, 2023

It's no longer used. While in there, also update the comment as to why
it can coexist with the rb_node.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dc8cbb65

nvme: fix the NVME_ID_NS_NVM_STS_MASK definition · b938e660

Ankit Kumar authored Jun 23, 2023

As per NVMe command set specification 1.0c Storage tag size is 7 bits.

Fixes: 4020aad8 ("nvme: add support for enhanced metadata")
Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

b938e660

nvmet: use PAGE_SECTORS_SHIFT · 9bcf156f

Damien Le Moal authored Jul 07, 2023

Replace occurences of the pattern "PAGE_SHIFT - 9" in the passthru and
loop targets with PAGE_SECTORS_SHIFT.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

9bcf156f

nvme: add BOGUS_NID quirk for Samsung SM953 · e5bb0988

Pankaj Raghav authored Jul 04, 2023

Add the quirk as SM953 is reporting bogus namespace ID.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217593Reported-by: Clemens Springsguth <cspringsguth@gmail.com>
Tested-by: Clemens Springsguth <cspringsguth@gmail.com>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

e5bb0988

05 Jul, 2023 3 commits

blk-crypto: use dynamic lock class for blk_crypto_profile::lock · 2fb48d88

Eric Biggers authored Jun 09, 2023

When a device-mapper device is passing through the inline encryption
support of an underlying device, calls to blk_crypto_evict_key() take
the blk_crypto_profile::lock of the device-mapper device, then take the
blk_crypto_profile::lock of the underlying device (nested).  This isn't
a real deadlock, but it causes a lockdep report because there is only
one lock class for all instances of this lock.

Lockdep subclasses don't really work here because the hierarchy of block
devices is dynamic and could have more than 2 levels.

Instead, register a dynamic lock class for each blk_crypto_profile, and
associate that with the lock.

This avoids false-positive lockdep reports like the following:

    ============================================
    WARNING: possible recursive locking detected
    6.4.0-rc5 #2 Not tainted
    --------------------------------------------
    fscryptctl/1421 is trying to acquire lock:
    ffffff80829ca418 (&profile->lock){++++}-{3:3}, at: __blk_crypto_evict_key+0x44/0x1c0

                   but task is already holding lock:
    ffffff8086b68ca8 (&profile->lock){++++}-{3:3}, at: __blk_crypto_evict_key+0xc8/0x1c0

                   other info that might help us debug this:
     Possible unsafe locking scenario:

           CPU0
           ----
      lock(&profile->lock);
      lock(&profile->lock);

                    *** DEADLOCK ***

     May be due to missing lock nesting notation

Fixes: 1b262839 ("block: Keyslot Manager for Inline Encryption")
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230610061139.212085-1-ebiggers@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

2fb48d88

block/partition: fix signedness issue for Amiga partitions · 7eb1e476

Michael Schmitz authored Jul 05, 2023

Making 'blk' sector_t (i.e. 64 bit if LBD support is active) fails the
'blk>0' test in the partition block loop if a value of (signed int) -1 is
used to mark the end of the partition block list.

Explicitly cast 'blk' to signed int to allow use of -1 to terminate the
partition block linked list.

Fixes: b6f3f28f ("block: add overflow checks for Amiga partition support")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Link: https://lore.kernel.org/r/024ce4fa-cc6d-50a2-9aae-3701d0ebf668@xenosoft.deSigned-off-by: Michael Schmitz <schmitzmic@gmail.com>
Reviewed-by: Martin Steigerwald <martin@lichtvoll.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7eb1e476

gup: make the stack expansion warning a bit more targeted · 6cd06ab1

Linus Torvalds authored Jul 05, 2023

I added a warning about about GUP no longer expanding the stack in
commit a425ac53 ("gup: add warning if some caller would seem to want
stack expansion"), but didn't really expect anybody to hit it.

And it's true that nobody seems to have hit a _real_ case yet, but we
certainly have a number of reports of false positives.  Which not only
causes extra noise in itself, but might also end up hiding any real
cases if they do exist.

So let's tighten up the warning condition, and replace the simplistic

	vma = find_vma(mm, start);
	if (vma && (start < vma->vm_start)) {
		WARN_ON_ONCE(vma->vm_flags & VM_GROWSDOWN);

with a

	vma = gup_vma_lookup(mm, start);

helper function which works otherwise like just "vma_lookup()", but with
some heuristics for when to warn about gup no longer causing stack
expansion.

In particular, don't just warn for "below the stack", but warn if it's
_just_ below the stack (with "just below" arbitrarily defined as 64kB,
because why not?).  And rate-limit it to at most once per hour, which
means that any false positives shouldn't completely hide subsequent
reports, but we won't be flooding the logs about it either.

The previous code triggered when some GUP user (chromium crashpad)
accessing past the end of the previous vma, for example.  That has never
expanded the stack, it just causes GUP to return early, and as such we
shouldn't be warning about it.

This is still going trigger the randomized testers, but to mitigate the
noise from that, use "dump_stack()" instead of "WARN_ON_ONCE()" to get
the kernel call chain.  We'll get the relevant information, but syzbot
shouldn't get too upset about it.

Also, don't even bother with the GROWSUP case, which would be using
different heuristics entirely, but only happens on parisc.
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: John Hubbard <jhubbard@nvidia.com>
Reported-by: syzbot+6cf44e127903fdf9d929@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6cd06ab1

04 Jul, 2023 17 commits

Revert ".gitignore: ignore *.cover and *.mbx" · d5280145

Linus Torvalds authored Jul 04, 2023

This reverts commit 534066a9.

It's actively detrimental in that it hides files that shouldn't be
hidden.

If I have some b4 mbx file in my git directory, it either was already
applied with "git am" and is now stale, or maybe it's waiting for that
to happen.  In neither case is "ignore it" the right option.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d5280145

Merge tag 'core_guards_for_6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue · 04f2933d

Linus Torvalds authored Jul 04, 2023

Pull scope-based resource management infrastructure from Peter Zijlstra:
 "These are the first few patches in the Scope-based Resource Management
  series that introduce the infrastructure but not any conversions as of
  yet.

  Adding the infrastructure now allows multiple people to start using
  them.

  Of note is that Sparse will need some work since it doesn't yet
  understand this attribute and might have decl-after-stmt issues"

* tag 'core_guards_for_6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue:
  kbuild: Drop -Wdeclaration-after-statement
  locking: Introduce __cleanup() based infrastructure
  apparmor: Free up __cleanup() name
  dmaengine: ioat: Free up __cleanup() name

04f2933d

afs: Fix accidental truncation when storing data · 03275585

David Howells authored Jul 04, 2023

When an AFS FS.StoreData RPC call is made, amongst other things it is
given the resultant file size to be.  On the server, this is processed
by truncating the file to new size and then writing the data.

Now, kafs has a lock (vnode->io_lock) that serves to serialise
operations against a specific vnode (ie.  inode), but the parameters for
the op are set before the lock is taken.  This allows two writebacks
(say sync and kswapd) to race - and if writes are ongoing the writeback
for a later write could occur before the writeback for an earlier one if
the latter gets interrupted.

Note that afs_writepages() cannot take i_mutex and only takes a shared
lock on vnode->validate_lock.

Also note that the server does the truncation and the write inside a
lock, so there's no problem at that end.

Fix this by moving the calculation for the proposed new i_size inside
the vnode->io_lock.  Also reset the iterator (which we might have read
from) and update the mtime setting there.

Fixes: bd80d8a8 ("afs: Use ITER_XARRAY for writing")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/3526895.1687960024@warthog.procyon.org.uk/Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

03275585

Merge tag 'ovl-update-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs · 538140ca

Linus Torvalds authored Jul 04, 2023

Pull more overlayfs updates from Amir Goldstein:
 "This is a small 'move code around' followup by Christian to his work
  on porting overlayfs to the new mount api for 6.5. It makes things a
  bit cleaner and simpler for the next development cycle when I hand
  overlayfs back over to Miklos"

* tag 'ovl-update-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: move all parameter handling into params.{c,h}

538140ca

Merge tag 'gfs2-v6.4-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 94c76955

Linus Torvalds authored Jul 04, 2023

Pull gfs2 updates from Andreas Gruenbacher:

 - Move the freeze/thaw logic from glock callback context to process /
   worker thread context to prevent deadlocks

 - Fix a quota reference couting bug in do_qc()

 - Carry on deallocating inodes even when gfs2_rindex_update() fails

 - Retry filesystem-internal reads when they are interruped by a signal

 - Eliminate kmap_atomic() in favor of kmap_local_page() /
   memcpy_{from,to}_page()

 - Get rid of noop_direct_IO

 - And a few more minor fixes and cleanups

* tag 'gfs2-v6.4-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (23 commits)
  gfs2: Add quota_change type
  gfs2: Use memcpy_{from,to}_page where appropriate
  gfs2: Convert remaining kmap_atomic calls to kmap_local_page
  gfs2: Replace deprecated kmap_atomic with kmap_local_page
  gfs: Get rid of unnucessary locking in inode_go_dump
  gfs2: gfs2_freeze_lock_shared cleanup
  gfs2: Replace sd_freeze_state with SDF_FROZEN flag
  gfs2: Rework freeze / thaw logic
  gfs2: Rename SDF_{FS_FROZEN => FREEZE_INITIATOR}
  gfs2: Reconfiguring frozen filesystem already rejected
  gfs2: Rename gfs2_freeze_lock{ => _shared }
  gfs2: Rename the {freeze,thaw}_super callbacks
  gfs2: Rename remaining "transaction" glock references
  gfs2: retry interrupted internal reads
  gfs2: Fix possible data races in gfs2_show_options()
  gfs2: Fix duplicate should_fault_in_pages() call
  gfs2: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
  gfs2: Don't remember delete unless it's successful
  gfs2: Update rl_unlinked before releasing rgrp lock
  gfs2: Fix gfs2_qa_get imbalance in gfs2_quota_hold
  ...

94c76955

Merge tag 'pm-6.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · ccf46d85

Linus Torvalds authored Jul 04, 2023

Pull more power management updates from Rafael Wysocki:
 "These add support for new hardware (ap807 and AM62A7), fix several
  issues in cpufreq drivers and in the operating performance points
  (OPP) framework, fix up intel_idle after recent changes and add
  documentation.

  Specifics:

   - Add missing __init annotation to one function in the intel_idle
     drvier (Rafael Wysocki)

   - Make intel_pstate use a correct scaling factor when mapping HWP
     performance levels to frequency values on hybrid-capable systems
     with disabled E-cores (Srinivas Pandruvada)

   - Fix Kconfig dependencies of the cpufreq-dt-platform driver (Viresh
     Kumar)

   - Add support to build cpufreq-dt-platdev as a module (Zhipeng Wang)

   - Don't allocate Sparc's cpufreq_driver dynamically (Viresh Kumar)

   - Add support for TI's AM62A7 platform (Vibhore Vardhan)

   - Add support for Armada's ap807 platform (Russell King (Oracle))

   - Add support for StarFive JH7110 SoC (Mason Huo)

   - Fix voltage selection for Mediatek Socs (Daniel Golle)

   - Fix error handling in Tegra's cpufreq driver (Christophe JAILLET)

   - Document Qualcomm's IPQ8074 in DT bindings (Robert Marko)

   - Don't warn for disabling a non-existing frequency for imx6q cpufreq
     driver (Christoph Niedermaier)

   - Use dev_err_probe() in Qualcomm's cpufreq driver (Andrew Halaney)

   - Simplify performance state related logic in the OPP core (Viresh
     Kumar)

   - Fix use-after-free and improve locking around lazy_opp_tables
     (Viresh Kumar, Stephan Gerhold)

   - Minor cleanups - using dev_err_probe() and rate-limiting debug
     messages (Andrew Halaney, Adrián Larumbe)"

* tag 'pm-6.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
  cpufreq: intel_pstate: Fix scaling for hybrid-capable systems with disabled E-cores
  cpufreq: Make CONFIG_CPUFREQ_DT_PLATDEV depend on OF
  intel_idle: Add __init annotation to matchup_vm_state_with_baremetal()
  OPP: Properly propagate error along when failing to get icc_path
  OPP: Use dev_err_probe() when failing to get icc_path
  cpufreq: qcom-cpufreq-hw: Use dev_err_probe() when failing to get icc paths
  cpufreq: mediatek: correct voltages for MT7622 and MT7623
  cpufreq: armada-8k: add ap807 support
  OPP: Simplify the over-designed pstate <-> level dance
  OPP: pstate is only valid for genpd OPP tables
  OPP: don't drop performance constraint on OPP table removal
  OPP: Protect `lazy_opp_tables` list with `opp_table_lock`
  OPP: Staticize `lazy_opp_tables` in of.c
  cpufreq: dt-platdev: Support building as module
  opp: Fix use-after-free in lazy_opp_tables after probe deferral
  dt-bindings: cpufreq: qcom-cpufreq-nvmem: document IPQ8074
  cpufreq: dt-platdev: Blacklist ti,am62a7 SoC
  cpufreq: ti-cpufreq: Add support for AM62A7
  OPP: rate-limit debug messages when no change in OPP is required
  cpufreq: imx6q: don't warn for disabling a non-existing frequency
  ...

ccf46d85

Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · b869e9f4

Linus Torvalds authored Jul 04, 2023

Pull more clk updates from Stephen Boyd:
 "Another set of clk driver updates and fixes for the merge window. The
  driver updates needed more time to bake in linux-next.

  Updates:
   - Support for more clk controllers in Qualcomm SoCs such as SM8350,
     SM8450, SDX75, SC8280XP, and IPQ9574
   - Runtime PM enablement of some more Qualcomm clk controllers
   - Various fixes to Qualcomm clk driver data to use correct clk_ops
     and to check halt bits properly
   - AT91 updates to modernize with clk_parent_data structures

  Fixes:
   - Remove 'syscon' from dt binding fix for ti,j721e-system-controller
   - Fix determine rate in the Tegra driver that got wrecked by the
     refactorting of muxes this merge window"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (69 commits)
  clk: tegra: Avoid calling an uninitialized function
  dt-bindings: mfd: ti,j721e-system-controller: Remove syscon from example
  clk: at91: sama7g5: s/ep_chg_chg_id/ep_chg_id
  clk: at91: sama7g5: switch to parent_hw and parent_data
  clk: at91: sckc: switch to parent_data/parent_hw
  clk: at91: clk-sam9x60-pll: add support for parent_hw
  clk: at91: clk-utmi: add support for parent_hw
  clk: at91: clk-system: add support for parent_hw
  clk: at91: clk-programmable: add support for parent_hw
  clk: at91: clk-peripheral: add support for parent_hw
  clk: at91: clk-master: add support for parent_hw
  clk: at91: clk-generated: add support for parent_hw
  clk: at91: clk-main: add support for parent_data/parent_hw
  clk: qcom: gcc-sc8280xp: Add runtime PM
  clk: qcom: gpucc-sc8280xp: Add runtime PM
  clk: qcom: mmcc-msm8974: fix MDSS_GDSC power flags
  clk: qcom: gpucc-sm6375: Enable runtime pm
  dt-bindings: clock: sm6375-gpucc: Add VDD_GX
  clk: qcom: gcc-sm6115: Add missing PLL config properties
  clk: qcom: clk-alpha-pll: Add a way to update some bits of test_ctl(_hi)
  ...

b869e9f4

Merge tag 'firewire-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 · 406fb9eb

Linus Torvalds authored Jul 04, 2023

Pull firewire updates from Takashi Sakamoto:
 "This consist of three parts; UAPI update, OHCI driver update, and
  several bug fixes.

  Firstly, the 1394 OHCI specification defines method to retrieve
  hardware time stamps for asynchronous communication, which was
  previously unavailable in user space. This adds new events to the
  UAPI, allowing applications to retrieve the time when asynchronous
  packet are received and sent. The new events are tested in the
  bleeding edge of libhinawa and look to work well. The new version of
  libhinawa will be released after current merge window is closed:

    https://git.kernel.org/pub/scm/libs/ieee1394/libhinawa.git/

  Secondly, the FireWire stack includes a PCM device driver for 1394
  OHCI hardware, This change modernizes the driver by managed resource
  (devres) framework.

  Lastly, bug fixes for firewire-net and firewire-core"

* tag 'firewire-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: (25 commits)
  firewire: net: fix use after free in fwnet_finish_incoming_packet()
  firewire: core: obsolete usage of GFP_ATOMIC at building node tree
  firewire: ohci: release buffer for AR req/resp contexts when managed resource is released
  firewire: ohci: use devres for content of configuration ROM
  firewire: ohci: use devres for IT, IR, AT/receive, and AT/request contexts
  firewire: ohci: use devres for list of isochronous contexts
  firewire: ohci: use devres for requested IRQ
  firewire: ohci: use devres for misc DMA buffer
  firewire: ohci: use devres for MMIO region mapping
  firewire: ohci: use devres for PCI-related resources
  firewire: ohci: use devres for memory object of ohci structure
  firewire: fix warnings to generate UAPI documentation
  firewire: fix build failure due to missing module license
  firewire: cdev: implement new event relevant to phy packet with time stamp
  firewire: cdev: add new event to notify phy packet with time stamp
  firewire: cdev: code refactoring to dispatch event for phy packet
  firewire: cdev: implement new event to notify response subaction with time stamp
  firewire: cdev: add new event to notify response subaction with time stamp
  firewire: cdev: code refactoring to operate event of response
  firewire: core: implement variations to send request and wait for response with time stamp
  ...

406fb9eb

module: fix init_module_from_file() error handling · f1962207

Linus Torvalds authored Jul 04, 2023

Vegard Nossum pointed out two different problems with the error handling
in init_module_from_file():

 (a) the idempotent loading code didn't clean up properly in some error
     cases, leaving the on-stack 'struct idempotent' element still in
     the hash table

 (b) failure to read the module file would nonsensically update the
     'invalid_kread_bytes' stat counter with the error value

The first error is quite nasty, in that it can then cause subsequent
idempotent loads of that same file to access stale stack contents of the
previous failure.  The case may not happen in any normal situation
(explaining all the "Tested-by's on the original change), and requires
admin privileges, but syzkaller triggers random bad behavior as a
result:

    BUG: soft lockup in sys_finit_module
    BUG: unable to handle kernel paging request in init_module_from_file
    general protection fault in init_module_from_file
    INFO: task hung in init_module_from_file
    KASAN: out-of-bounds Read in init_module_from_file
    KASAN: slab-out-of-bounds Read in init_module_from_file
    ...

The second error is fairly benign and just leads to nonsensical stats
(and has been around since the debug stats were added).

Vegard also provided a patch for the idempotent loading issue, but I'd
rather re-organize the code and make it more legible using another level
of helper functions than add the usual "goto out" error handling.

Link: https://lore.kernel.org/lkml/20230704100852.23452-1-vegard.nossum@oracle.com/
Fixes: 9b9879fc ("modules: catch concurrent module loads, treat them as idempotent")
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reported-by: syzbot+9c2bdc9d24e4a7abe741@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f1962207

Merge branches 'pm-cpufreq' and 'pm-cpuidle' · 40c565a4

Rafael J. Wysocki authored Jul 04, 2023

Merge CPU power management updates for 6.5-rc1:

 - Add missing __init annotation to one function in the intel_idle
   drvier (Rafael Wysocki).

 - Make intel_pstate use a correct scaling factor when mapping HWP
   performance levels to frequency values on hybrid-capable systems
   with disabled E-cores (Srinivas Pandruvada).

 - Fix Kconfig dependencies of the cpufreq-dt-platform driver (Viresh
   Kumar).

 - Add support to build cpufreq-dt-platdev as a module (Zhipeng Wang).

 - Don't allocate Sparc's cpufreq_driver dynamically (Viresh Kumar).

 - Add support for TI's AM62A7 platform (Vibhore Vardhan).

 - Add support for Armada's ap807 platform (Russell King (Oracle)).

 - Add support for StarFive JH7110 SoC (Mason Huo).

 - Fix voltage selection for Mediatek Socs (Daniel Golle).

 - Fix error handling in Tegra's cpufreq driver (Christophe JAILLET).

 - Document Qualcomm's IPQ8074 in DT bindings (Robert Marko).

 - Don't warn for disabling a non-existing frequency for imx6q cpufreq
   driver (Christoph Niedermaier).

 - Use dev_err_probe() in Qualcomm's cpufreq driver (Andrew Halaney).

* pm-cpufreq:
  cpufreq: intel_pstate: Fix scaling for hybrid-capable systems with disabled E-cores
  cpufreq: Make CONFIG_CPUFREQ_DT_PLATDEV depend on OF
  cpufreq: qcom-cpufreq-hw: Use dev_err_probe() when failing to get icc paths
  cpufreq: mediatek: correct voltages for MT7622 and MT7623
  cpufreq: armada-8k: add ap807 support
  cpufreq: dt-platdev: Support building as module
  dt-bindings: cpufreq: qcom-cpufreq-nvmem: document IPQ8074
  cpufreq: dt-platdev: Blacklist ti,am62a7 SoC
  cpufreq: ti-cpufreq: Add support for AM62A7
  cpufreq: imx6q: don't warn for disabling a non-existing frequency
  cpufreq: sparc: Don't allocate cpufreq_driver dynamically
  cpufreq: tegra194: Fix an error handling path in tegra194_cpufreq_probe()
  cpufreq: dt-platdev: Add JH7110 SOC to the allowlist

* pm-cpuidle:
  intel_idle: Add __init annotation to matchup_vm_state_with_baremetal()

40c565a4

clk: tegra: Avoid calling an uninitialized function · f679e89a

Thierry Reding authored Jun 30, 2023

Commit 493ffb04 ("clk: tegra: super: Switch to determine_rate")
replaced clk_super_round_rate() by clk_super_determine_rate(), but
didn't update one callsite that was explicitly calling the old
tegra_clk_super_ops.round_rate() function, which was now NULL. This
resulted in a crash on Tegra30 systems during early boot.

Switch this callsite over to the clk_super_determine_rate() equivalent
to avoid the crash.

Fixes: 493ffb04 ("clk: tegra: super: Switch to determine_rate")
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20230630130748.840729-1-thierry.reding@gmail.comSigned-off-by: Stephen Boyd <sboyd@kernel.org>

f679e89a

mm: don't do validate_mm() unnecessarily and without mmap locking · b5641a5d

Linus Torvalds authored Jul 03, 2023

This is an addition to commit ae80b404 ("mm: validate the mm before
dropping the mmap lock"), because it turns out there were two problems,
but lockdep just stopped complaining after finding the first one.

The do_vmi_align_munmap() function now drops the mmap lock after doing
the validate_mm() call, but it turns out that one of the callers then
immediately calls validate_mm() again.

That's both a bit silly, and now (again) happens without the mmap lock
held.

So just remove that validate_mm() call from the caller, but make sure to
not lose any coverage by doing that mm sanity checking in the error path
of do_vmi_align_munmap() too.
Reported-and-tested-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/lkml/ZKN6CdkKyxBShPHi@xsang-OptiPlex-9020/
Fixes: 408579cd ("mm: Update do_vmi_align_munmap() return semantics")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b5641a5d

arch/arm64/mm/fault: Fix undeclared variable error in do_page_fault() · 24be4d0b

SeongJae Park authored Jul 04, 2023

Commit ae870a68 ("arm64/mm: Convert to using
lock_mm_and_find_vma()") made do_page_fault() to use 'vma' even if
CONFIG_PER_VMA_LOCK is not defined, but the declaration is still in the
ifdef.

As a result, building kernel without the config fails with undeclared
variable error as below:

    arch/arm64/mm/fault.c: In function 'do_page_fault':
    arch/arm64/mm/fault.c:624:2: error: 'vma' undeclared (first use in this function); did you mean 'vmap'?
      624 |  vma = lock_mm_and_find_vma(mm, addr, regs);
          |  ^~~
          |  vmap

Fix it by moving the declaration out of the ifdef.

Fixes: ae870a68 ("arm64/mm: Convert to using lock_mm_and_find_vma()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

24be4d0b

Merge tag 'block-6.5-2023-07-03' of git://git.kernel.dk/linux · e50df249

Linus Torvalds authored Jul 03, 2023

Pull more block updates from Jens Axboe:
 "Mostly items that came in a bit late for the initial pull request,
  wanted to make sure they had the appropriate amount of linux-next soak
  before going upstream.

  Outside of stragglers, just generic fixes for either merge window
  items, or longer standing bugs"

* tag 'block-6.5-2023-07-03' of git://git.kernel.dk/linux: (25 commits)
  md/raid0: add discard support for the 'original' layout
  nvme: disable controller on reset state failure
  nvme: sync timeout work on failed reset
  nvme: ensure unquiesce on teardown
  cdrom/gdrom: Fix build error
  nvme: improved uring polling
  block: add request polling helper
  nvme-mpath: fix I/O failure with EAGAIN when failing over I/O
  nvme: host: fix command name spelling
  blk-sysfs: add a new attr_group for blk_mq
  blk-iocost: move wbt_enable/disable_default() out of spinlock
  blk-wbt: cleanup rwb_enabled() and wbt_disabled()
  blk-wbt: remove dead code to handle wbt enable/disable with io inflight
  blk-wbt: don't create wbt sysfs entry if CONFIG_BLK_WBT is disabled
  blk-mq: fix two misuses on RQF_USE_SCHED
  blk-throttle: Fix io statistics for cgroup v1
  bcache: Fix bcache device claiming
  bcache: Alloc holder object before async registration
  raid10: avoid spin_lock from fastpath from raid10_unplug()
  md: fix 'delete_mutex' deadlock
  ...

e50df249

Merge tag 'io_uring-6.5-2023-07-03' of git://git.kernel.dk/linux · 4f528753

Linus Torvalds authored Jul 03, 2023

Pull io_uring fixes from Jens Axboe:
 "The fix for the msghdr->msg_inq assigned value being wrong, using -1
  instead of -1U for the signed type.

  Also a fix for ensuring when we're trying to run task_work on an
  exiting task, that we wait for it. This is not really a correctness
  thing as the work is being canceled, but it does help with ensuring
  file descriptors are closed when the task has exited."

* tag 'io_uring-6.5-2023-07-03' of git://git.kernel.dk/linux:
  io_uring: flush offloaded and delayed task_work on exit
  io_uring: remove io_fallback_tw() forward declaration
  io_uring/net: use proper value for msg_inq

4f528753

Merge tag 'hsi-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi · 69c9f230

Linus Torvalds authored Jul 03, 2023

Pull HSI updates from Sebastian Reichel:

 - fix build warning with W=1

 - drop error handling for debugfs

* tag 'hsi-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi:
  HSI: omap_ssi_port: Drop error checking for debugfs_create_dir
  HSI: fix ssi_waketest() declaration

69c9f230

Merge tag 'for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply · 0df24138

Linus Torvalds authored Jul 03, 2023

Pull power supply and reset updates from Sebastian Reichel:

 - Add new Qualcomm PMI8998/PM660 SMB2 charger

 - bq256xx: support systems without thermistors

 - cros_pchg: fix peripheral device status after system resume

 - axp20x_usb_power: add support for AXP192

 - qcom-pon: add support for pm8941

 - at91-reset: prepare to expose reset reason to sysfs

 - switch all I2C drivers back to use .probe instead of .probe_new

 - convert some more DT bindings to YAML

 - misc cleanups

* tag 'for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (28 commits)
  MAINTAINERS: add documentation file for Microchip SAMA5D2 shutdown controller
  dt-bindings: power: reset: atmel,sama5d2-shdwc: convert to yaml
  dt-bindings: power: reset: atmel,at91sam9260-shdwc: convert to yaml
  power: reset: at91-reset: change the power on reason prototype
  power: reset: qcom-pon: add support for pm8941-pon
  dt-bindings: power: reset: qcom-pon: define pm8941-pon
  power: supply: add Qualcomm PMI8998 SMB2 Charger driver
  dt-bindings: power: supply: qcom,pmi8998-charger: add bindings for smb2 driver
  power: supply: rt9467: Make charger-enable control as logic level
  power: supply: Switch i2c drivers back to use .probe()
  power: reset: add HAS_IOPORT dependencies
  dt-bindings: power: supply: axp20x: Add AXP192 compatible
  power: supply: axp20x_usb_power: Add support for AXP192
  power: supply: axp20x_usb_power: Remove variant IDs from VBUS polling check
  power: supply: axp20x_usb_power: Use regmap field for VBUS disabling
  power: supply: axp20x_usb_power: Use regmap fields for USB BC feature
  power: supply: axp20x_usb_power: Use regmap fields for VBUS monitor feature
  power: supply: axp20x_usb_power: Simplify USB current limit handling
  power: supply: hwmon: constify pointers to hwmon_channel_info
  power: supply: twl4030_madc_battery: Refactor twl4030_madc_bat_ext_changed()
  ...

0df24138