Commits · f8eacd8ad7a658b805c635f8ffad7913981f863c · Kirill Smelkov / linux

18 Oct, 2024 15 commits

Merge tag 'block-6.12-20241018' of git://git.kernel.dk/linux · f8eacd8a

Linus Torvalds authored Oct 18, 2024

Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
     - Fix target passthrough identifier (Nilay)
     - Fix tcp locking (Hannes)
     - Replace list with sbitmap for tracking RDMA rsp tags (Guixin)
     - Remove unnecessary fallthrough statements (Tokunori)
     - Remove ready-without-media support (Greg)
     - Fix multipath partition scan deadlock (Keith)
     - Fix concurrent PCI reset and remove queue mapping (Maurizio)
     - Fabrics shutdown fixes (Nilay)

 - Fix for a kerneldoc warning (Keith)

 - Fix a race with blk-rq-qos and wakeups (Omar)

 - Cleanup of checking for always-set tag_set (SurajSonawane2415)

 - Fix for a crash with CPU hotplug notifiers (Ming)

 - Don't allow zero-copy ublk on unprivileged device (Ming)

 - Use array_index_nospec() for CDROM (Josh)

 - Remove dead code in drbd (David)

 - Tweaks to elevator loading (Breno)

* tag 'block-6.12-20241018' of git://git.kernel.dk/linux:
  cdrom: Avoid barrier_nospec() in cdrom_ioctl_media_changed()
  nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function
  nvme: make keep-alive synchronous operation
  nvme-loop: flush off pending I/O while shutting down loop controller
  nvme-pci: fix race condition between reset and nvme_dev_disable()
  ublk: don't allow user copy for unprivileged device
  blk-rq-qos: fix crash on rq_qos_wait vs. rq_qos_wake_function race
  nvme-multipath: defer partition scanning
  blk-mq: setup queue ->tag_set before initializing hctx
  elevator: Remove argument from elevator_find_get
  elevator: do not request_module if elevator exists
  drbd: Remove unused conn_lowest_minor
  nvme: disable CC.CRIME (NVME_CC_CRIME)
  nvme: delete unnecessary fallthru comment
  nvmet-rdma: use sbitmap to replace rsp free list
  block: Fix elevator_get_default() checking for NULL q->tag_set
  nvme: tcp: avoid race between queue_lock lock and destroy
  nvmet-passthru: clear EUID/NGUID/UUID while using loop target
  block: fix blk_rq_map_integrity_sg kernel-doc

f8eacd8a

Merge tag 'io_uring-6.12-20241018' of git://git.kernel.dk/linux · a041f478

Linus Torvalds authored Oct 18, 2024

Pull io_uring fixes from Jens Axboe:

 - Fix a regression this merge window where cloning of registered
   buffers didn't take into account the dummy_ubuf

 - Fix a race with reading how many SQRING entries are available,
   causing userspace to need to loop around io_uring_sqring_wait()
   rather than being able to rely on SQEs being available when it
   returned

 - Ensure that the SQPOLL thread is TASK_RUNNING before running
   task_work off the cancelation exit path

* tag 'io_uring-6.12-20241018' of git://git.kernel.dk/linux:
  io_uring/sqpoll: ensure task state is TASK_RUNNING when running task_work
  io_uring/rsrc: ignore dummy_ubuf for buffer cloning
  io_uring/sqpoll: close race on waiting for sqring entries

a041f478

Merge tag 'v6.12-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · b04ae0f4

Linus Torvalds authored Oct 18, 2024

Pull smb client fixes from Steve French:

 - Fix possible double free setting xattrs

 - Fix slab out of bounds with large ioctl payload

 - Remove three unused functions, and an unused variable that could be
   confusing

* tag 'v6.12-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Remove unused functions
  smb/client: Fix logically dead code
  smb: client: fix OOBs when building SMB2_IOCTL request
  smb: client: fix possible double free in smb2_set_ea()

b04ae0f4

Merge tag 'xfs-6.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 568570fd

Linus Torvalds authored Oct 18, 2024

Pull xfs fixes from Carlos Maiolino:

 - Fix integer overflow in xrep_bmap

 - Fix stale delalloc punching for COW IO

* tag 'xfs-6.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: punch delalloc extents from the COW fork for COW writes
  xfs: set IOMAP_F_SHARED for all COW fork allocations
  xfs: share more code in xfs_buffered_write_iomap_begin
  xfs: support the COW fork in xfs_bmap_punch_delalloc_range
  xfs: IOMAP_ZERO and IOMAP_UNSHARE already hold invalidate_lock
  xfs: take XFS_MMAPLOCK_EXCL xfs_file_write_zero_eof
  xfs: factor out a xfs_file_write_zero_eof helper
  iomap: move locking out of iomap_write_delalloc_release
  iomap: remove iomap_file_buffered_write_punch_delalloc
  iomap: factor out a iomap_last_written_block helper
  xfs: fix integer overflow in xrep_bmap

568570fd

Merge tag 'pm-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 5e9ab267

Linus Torvalds authored Oct 18, 2024

Pull power management fixes from Rafael Wysocki:
 "These fix two issues in the amd-pstate cpufreq driver and update the
  intel_rapl power capping driver with a new processor ID.

  Specifics:

   - Enable ACPI CPPC in amd_pstate_register_driver() after disabling it
     in amd_pstate_unregister_driver() when switching driver operation
     modes (Dhananjay Ugwekar)

   - Make amd-pstate use nominal performance as the maximum performance
     level when boost is disabled (Mario Limonciello)

   - Add ArrowLake-H to the list of processors where PL4 is supported in
     the MSR part of the intel_rapl power capping driver (Srinivas
     Pandruvada)"

* tag 'pm-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  powercap: intel_rapl_msr: Add PL4 support for ArrowLake-H
  cpufreq/amd-pstate: Use nominal perf for limits when boost is disabled
  cpufreq/amd-pstate: Fix amd_pstate mode switch on shared memory systems

5e9ab267

Merge tag 'hwmon-for-v6.12-rc4' of... · 3b3a0ef6

Linus Torvalds authored Oct 18, 2024

Merge tag 'hwmon-for-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fix from Guenter Roeck:
 "Fix auto-detect regression in jc42 driver"

* tag 'hwmon-for-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  [PATCH] hwmon: (jc42) Properly detect TSE2004-compliant devices again

3b3a0ef6

Merge tag 'drm-fixes-2024-10-18' of https://gitlab.freedesktop.org/drm/kernel · 5d97dde4

Linus Torvalds authored Oct 18, 2024

Pull drm fixes from Dave Airlie:
 "Weekly fixes, msm and xe are the two main ones, with a bunch of
  scattered fixes including a largish revert in mgag200, then amdgpu,
  vmwgfx and scattering of other minor ones.

  All seems pretty regular.

  msm:
   - Display:
      - move CRTC resource assignment to atomic_check to make
        consecutive calls to atomic_check() consistent
      - fix rounding / sign-extension issues with pclk calculation in
        case of DSC
      - cleanups to drop incorrect null checks in dpu snapshots
      - fix to use kvzalloc in dpu snapshot to avoid allocation issues
        in heavily loaded system cases
      - Fix to not program merge_3d block if dual LM is not being used
      - Fix to not flush merge_3d block if it's not enabled, as this
        otherwise leads to false timeouts
   - GPU:
      - a7xx: add a fence wait before SMMU table update

  xe:
   - New workaround to Xe2 (Aradhya)
   - Fix unbalanced rpm put (Matthew Auld)
   - Remove fragile lock optimization (Matthew Brost)
   - Fix job release, delegating it to the drm scheduler (Matthew Brost)
   - Fix timestamp bit width for Xe2 (Lucas)
   - Fix external BO's dma-resv usage (Matthew Brost)
   - Fix returning success for timeout in wait_token (Nirmoy)
   - Initialize fence to avoid it being detected as signaled (Matthew
     Auld)
   - Improve cache flush for BMG (Matthew Auld)
   - Don't allow hflip for tile4 framebuffer on Xe2 (Juha-Pekka)

  amdgpu:
   - SR-IOV fix
   - CS chunk handling fix
   - MES fixes
   - SMU13 fixes

  amdkfd:
   - VRAM usage reporting fix

  radeon:
   - Fix possible_clones handling

  i915:
   - Two DP bandwidth related MST fixes

  ast:
   - Clear EDID on unplugged connectors

  host1x:
   - Fix boot on Tegra186
   - Set DMA parameters

  mgag200:
   - Revert VBLANK support

  panel:
   - himax-hx83192: Adjust power and gamma

  qaic:
   - Sgtable loop fixes

  vmwgfx:
   - Limit display layout allocation size
   - Handle allocation errors in connector checks
   - Clean up KMS code for 2d-only setup
   - Report surface-check errors correctly
   - Remove NULL test around kvfree()"

* tag 'drm-fixes-2024-10-18' of https://gitlab.freedesktop.org/drm/kernel: (45 commits)
  drm/ast: vga: Clear EDID if no display is connected
  drm/ast: sil164: Clear EDID if no display is connected
  Revert "drm/mgag200: Add vblank support"
  drm/amdgpu/swsmu: default to fullscreen 3D profile for dGPUs
  drm/i915/display: Don't allow tile4 framebuffer to do hflip on display20 or greater
  drm/xe/bmg: improve cache flushing behaviour
  drm/xe/xe_sync: initialise ufence.signalled
  drm/xe/ufence: ufence can be signaled right after wait_woken
  drm/xe: Use bookkeep slots for external BO's in exec IOCTL
  drm/xe/query: Increase timestamp width
  drm/xe: Don't free job in TDR
  drm/xe: Take job list lock in xe_sched_add_pending_job
  drm/xe: fix unbalanced rpm put() with declare_wedged()
  drm/xe: fix unbalanced rpm put() with fence_fini()
  drm/xe/xe2lpg: Extend Wa_15016589081 for xe2lpg
  drm/i915/dp_mst: Don't require DSC hblank quirk for a non-DSC compatible mode
  drm/i915/dp_mst: Handle error during DSC BW overhead/slice calculation
  drm/msm/a6xx+: Insert a fence wait before SMMU table update
  drm/msm/dpu: don't always program merge_3d block
  drm/msm/dpu: Don't always set merge_3d pending flush
  ...

5d97dde4

mm: fix follow_pfnmap API lockdep assert · b1b46751

Linus Torvalds authored Oct 18, 2024

The lockdep asserts for the new follow_pfnmap() API "knows" that a
pfnmap always has a vma->vm_file, since that's the only way to create
such a mapping.

And that's actually true for all the normal cases.  But not for the mmap
failure case, where the incomplete mapping is torn down and we have
cleared vma->vm_file because the failure occurred before the file was
linked to the vma.

So this codepath does actually need to check for vm_file being NULL.
Reported-by: Jann Horn <jannh@google.com>
Fixes: 6da8e963 ("mm: new follow_pfnmap API")
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
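
The NULL check described above can be sketched in user space as follows.
This is only an illustration under stated assumptions: the demo_* structures
and pfnmap_lock_ok() are hypothetical stand-ins, and which locks the real
lockdep assert accepts is assumed here rather than taken from the patch.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct demo_mapping { int i_mmap_lock_held; };
struct demo_file { struct demo_mapping *f_mapping; };
struct demo_vma {
	struct demo_file *vm_file;	/* NULL after a failed mmap() teardown */
	int mmap_lock_held;
};

/* Returns 1 if a lock that protects the pfnmap is held, else 0. */
static int pfnmap_lock_ok(const struct demo_vma *vma)
{
	/* Do not dereference vm_file unconditionally: it may be NULL. */
	const struct demo_mapping *mapping =
		vma->vm_file ? vma->vm_file->f_mapping : NULL;

	if (mapping)
		return mapping->i_mmap_lock_held || vma->mmap_lock_held;
	return vma->mmap_lock_held;
}
```

With vm_file set to NULL the helper falls back to the mmap lock alone
instead of crashing on the dereference.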

b1b46751

Merge branch 'pm-cpufreq' · cf8679bb

Rafael J. Wysocki authored Oct 18, 2024

Merge amd-pstate driver fixes for 6.12-rc4:

 - Enable ACPI CPPC in amd_pstate_register_driver() after disabling
   it in amd_pstate_unregister_driver() during driver operation mode
   switch (Dhananjay Ugwekar).

 - Make amd-pstate use nominal performance as the maximum performance
   level when boost is disabled (Mario Limonciello).

* pm-cpufreq:
  cpufreq/amd-pstate: Use nominal perf for limits when boost is disabled
  cpufreq/amd-pstate: Fix amd_pstate mode switch on shared memory systems

cf8679bb

Merge tag 'iommu-fixes-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux · 75aa74d5

Linus Torvalds authored Oct 18, 2024

Pull iommu fixes from Joerg Roedel:
 "ARM-SMMU fixes from Will Deacon:

   - Clarify warning message when failing to disable the MMU-500
     prefetcher

   - Fix undefined behaviour in calculation of L1 stream-table index
     when 32-bit StreamIDs are implemented

   - Replace a rogue comma with a semicolon

  Intel VT-d fix from Lu Baolu:

   - Fix incorrect pci_for_each_dma_alias() for non-PCI devices"

* tag 'iommu-fixes-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu/vt-d: Fix incorrect pci_for_each_dma_alias() for non-PCI devices
  iommu/arm-smmu-v3: Convert comma to semicolon
  iommu/arm-smmu-v3: Fix last_sid_idx calculation for sid_bits==32
  iommu/arm-smmu: Clarify MMU-500 CPRE workaround
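
The undefined behaviour mentioned for 32-bit StreamIDs is the usual hazard of
shifting a 32-bit value by its full width. A minimal sketch, assuming the
index is derived from the highest StreamID; DEMO_SPLIT, last_sid() and
last_l1_idx() are hypothetical names, not the driver's:

```c
#include <stdint.h>

/* Hypothetical split: low 8 bits index the L2 table, the rest the L1 table. */
#define DEMO_SPLIT 8

/* Highest StreamID for a given width; sid_bits may be as large as 32. */
static uint64_t last_sid(unsigned int sid_bits)
{
	/*
	 * (1U << 32) would be undefined for a 32-bit operand, so widen to
	 * 64 bits before shifting.
	 */
	return (UINT64_C(1) << sid_bits) - 1;
}

/* L1 index of the highest StreamID under the hypothetical split. */
static uint32_t last_l1_idx(unsigned int sid_bits)
{
	return (uint32_t)(last_sid(sid_bits) >> DEMO_SPLIT);
}
```

Widening before the shift keeps the sid_bits == 32 case well defined without
special-casing it.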

75aa74d5

Merge tag 'powerpc-6.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · ef444a0a

Linus Torvalds authored Oct 18, 2024

Pull powerpc fix from Madhavan Srinivasan:

 - To prevent possible memory leak, free "name" on error in
   opal_event_init()

Thanks to Michael Ellerman and 2639161967.

* tag 'powerpc-6.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv: Free name on error in opal_event_init()

ef444a0a

Merge tag 's390-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · c91c1461

Linus Torvalds authored Oct 18, 2024

Pull s390 fixes from Heiko Carstens:

 - Fix PCI error recovery by handling error events correctly

 - Fix CCA crypto card behavior within protected execution environment

 - Two KVM commits which fix virtual vs physical address handling bugs
   in KVM pfault handling

 - Fix return code handling in pckmo_key2protkey()

 - Deactivate sclp console as late as possible so that outstanding
   messages appear on the console instead of being dropped on reboot

 - Convert newlines to CRLF instead of LFCR for the sclp vt220 driver,
   as required by the vt220 specification

 - Initialize also psw mask in perf_arch_fetch_caller_regs() to make
   sure that user_mode(regs) will return false

 - Update defconfigs

* tag 's390-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: Update defconfigs
  s390: Initialize psw mask in perf_arch_fetch_caller_regs()
  s390/sclp_vt220: Convert newlines to CRLF instead of LFCR
  s390/sclp: Deactivate sclp after all its users
  s390/pkey_pckmo: Return with success for valid protected key types
  KVM: s390: Change virtual to physical address access in diag 0x258 handler
  KVM: s390: gaccess: Check if guest address is in memslot
  s390/ap: Fix CCA crypto card behavior within protected execution environment
  s390/pci: Handle PCI error codes other than 0x3a
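
The CRLF conversion mentioned for the sclp vt220 driver can be illustrated
with a small stand-alone sketch. lf_to_crlf() is a hypothetical helper and
does not mirror the driver's buffer handling; it only shows the byte order
(carriage return first, then line feed):

```c
#include <stddef.h>

/*
 * Copy src into dst, writing "\r\n" for every '\n'.
 * dst_size includes room for the terminating NUL; output is truncated
 * if it does not fit. Returns the number of bytes written, not counting
 * the NUL.
 */
static size_t lf_to_crlf(const char *src, char *dst, size_t dst_size)
{
	size_t out = 0;

	if (dst_size == 0)
		return 0;

	for (; *src != '\0'; src++) {
		size_t need = (*src == '\n') ? 2 : 1;

		if (out + need >= dst_size)
			break;	/* keep room for the terminating NUL */
		if (*src == '\n')
			dst[out++] = '\r';	/* carriage return first */
		dst[out++] = *src;
	}
	dst[out] = '\0';
	return out;
}
```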

c91c1461

Merge tag 'drm-xe-fixes-2024-10-17' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes · 83f00078

Dave Airlie authored Oct 18, 2024

Driver Changes:
- New workaround to Xe2 (Aradhya)
- Fix unbalanced rpm put (Matthew Auld)
- Remove fragile lock optimization (Matthew Brost)
- Fix job release, delegating it to the drm scheduler (Matthew Brost)
- Fix timestamp bit width for Xe2 (Lucas)
- Fix external BO's dma-resv usage (Matthew Brost)
- Fix returning success for timeout in wait_token (Nirmoy)
- Initialize fence to avoid it being detected as signaled (Matthew Auld)
- Improve cache flush for BMG (Matthew Auld)
- Don't allow hflip for tile4 framebuffer on Xe2 (Juha-Pekka)
Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/jkldrex5733ldxrla75b4ayvhujjhw2kccmasl5rotoufoacj4@pkvlrrv4orc7

83f00078

Merge tag 'x86_bugs_post_ibpb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ade8ff3b

Linus Torvalds authored Oct 17, 2024

Pull x86 IBPB fixes from Borislav Petkov:
 "This fixes the IBPB implementation of older AMDs (< gen4) that do not
  flush the RSB (Return Address Stack) so you can still do some leaking
  when using a "=ibpb" mitigation for Retbleed or SRSO. Fix it by doing
  the flushing in software on those generations.

  IBPB is not the default setting so this is not likely to affect
  anybody in practice"

* tag 'x86_bugs_post_ibpb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Do not use UNTRAIN_RET with IBPB on entry
  x86/bugs: Skip RSB fill at VMEXIT
  x86/entry: Have entry_ibpb() invalidate return predictions
  x86/cpufeatures: Add a IBPB_NO_RET BUG flag
  x86/cpufeatures: Define X86_FEATURE_AMD_IBPB_RET

ade8ff3b

cdrom: Avoid barrier_nospec() in cdrom_ioctl_media_changed() · b0bf1afd

Josh Poimboeuf authored Oct 17, 2024

The barrier_nospec() after the array bounds check is overkill and
painfully slow for arches which implement it.

Furthermore, most arches don't implement it, so they remain exposed to
Spectre v1 (which can affect pretty much any CPU with branch
prediction).

Instead, clamp the user pointer to a valid range so it's guaranteed to
be a valid array index even when the bounds check mispredicts.

Fixes: 8270cb10 ("cdrom: Fix spectre-v1 gadget")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/1d86f4d9d8fba68e5ca64cdeac2451b95a8bf872.1729202937.git.jpoimboe@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
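
The clamping idea in the cdrom change can be illustrated with a small
user-space sketch. It is not the kernel's array_index_nospec()
implementation: mask_if_below() and clamp_index() are hypothetical names, and
a real implementation also has to stop the compiler from turning the
arithmetic back into a branch.

```c
#include <limits.h>

/*
 * All-ones when index < size, zero otherwise, computed without a branch.
 * Assumes size <= LONG_MAX, so the top bit of (index | (size - 1 - index))
 * is set exactly when index is out of range.
 */
static unsigned long mask_if_below(unsigned long index, unsigned long size)
{
	unsigned long top = (index | (size - 1UL - index)) >>
			    (sizeof(unsigned long) * CHAR_BIT - 1);

	return top - 1UL;	/* 0 - 1 = ~0UL in range, 1 - 1 = 0 out of range */
}

/* Clamp a user-supplied index: unchanged when in range, 0 otherwise. */
static unsigned long clamp_index(unsigned long index, unsigned long size)
{
	return index & mask_if_below(index, size);
}
```

Because the result is produced by data flow rather than a conditional jump,
it stays a valid array index even if an earlier bounds check is
mispredicted.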

b0bf1afd

17 Oct, 2024 25 commits

Merge tag 'mm-hotfixes-stable-2024-10-17-16-08' of... · 4d939780

Linus Torvalds authored Oct 17, 2024

Merge tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "28 hotfixes. 13 are cc:stable. 23 are MM.

  It is the usual shower of unrelated singletons - please see the
  individual changelogs for details"

* tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
  maple_tree: add regression test for spanning store bug
  maple_tree: correct tree corruption on spanning store
  mm/mglru: only clear kswapd_failures if reclaimable
  mm/swapfile: skip HugeTLB pages for unuse_vma
  selftests: mm: fix the incorrect usage() info of khugepaged
  MAINTAINERS: add Jann as memory mapping/VMA reviewer
  mm: swap: prevent possible data-race in __try_to_reclaim_swap
  mm: khugepaged: fix the incorrect statistics when collapsing large file folios
  MAINTAINERS: kasan, kcov: add bugzilla links
  mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
  mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
  Docs/damon/maintainer-profile: update deprecated awslabs GitHub URLs
  Docs/damon/maintainer-profile: add missing '_' suffixes for external web links
  maple_tree: check for MA_STATE_BULK on setting wr_rebalance
  mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point
  mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets()
  mm: remove unused stub for can_swapin_thp()
  mailmap: add an entry for Andy Chiu
  MAINTAINERS: add memory mapping/VMA co-maintainers
  fs/proc: fix build with GCC 15 due to -Werror=unterminated-string-initialization
  ...

4d939780

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · d4b82e58

Linus Torvalds authored Oct 17, 2024

Pull clk fixes from Stephen Boyd:
 "Two clk driver fixes and a unit test fix:

   - Terminate the of_device_id table in the Samsung exynosautov920 clk
     driver so that device matching logic doesn't run off the end of the
     array into other memory and break matching for any kernel with this
     driver loaded

   - Properly limit the max clk ID in the Rockchip clk driver

   - Use clk kunit helpers in the clk tests so that memory isn't leaked
     after the test concludes"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: test: Fix some memory leaks
  clk: rockchip: fix finding of maximum clock ID
  clk: samsung: Fix out-of-bound access of of_match_node()
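
Why a missing table terminator matters can be shown with a simplified
sketch. struct match_entry, demo_table and find_match() are hypothetical, and
the sketch uses a NULL pointer as the sentinel, whereas kernel match tables
are assumed here to end with an empty entry:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a device match table entry. */
struct match_entry {
	const char *compatible;
	int data;
};

static const struct match_entry demo_table[] = {
	{ "vendor,clock-a", 1 },
	{ "vendor,clock-b", 2 },
	{ NULL, 0 },	/* sentinel: without it the loop below never stops */
};

/* Walk the table until the sentinel; return the match or NULL. */
static const struct match_entry *find_match(const struct match_entry *table,
					    const char *compatible)
{
	for (; table->compatible != NULL; table++) {
		if (strcmp(table->compatible, compatible) == 0)
			return table;
	}
	return NULL;
}
```

The lookup has no length argument, so the sentinel is the only thing that
keeps a failed match from reading past the end of the array.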

d4b82e58

Merge tag 'drm-misc-fixes-2024-10-17' of... · 49ff3e79

Dave Airlie authored Oct 18, 2024

Merge tag 'drm-misc-fixes-2024-10-17' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

ast:
- Clear EDID on unplugged connectors

host1x:
- Fix boot on Tegra186
- Set DMA parameters

mgag200:
- Revert VBLANK support

panel:
- himax-hx83192: Adjust power and gamma

qaic:
- Sgtable loop fixes

vmwgfx:
- Limit display layout allocation size
- Handle allocation errors in connector checks
- Clean up KMS code for 2d-only setup
- Report surface-check errors correctly
- Remove NULL test around kvfree()
Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017115516.GA196624@linux.fritz.box

49ff3e79

Merge tag 'drm-intel-fixes-2024-10-17' of... · 7626b4e9

Dave Airlie authored Oct 18, 2024

Merge tag 'drm-intel-fixes-2024-10-17' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Two DP bandwidth related MST fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZxDLdML9Dwqkb1AW@jlahtine-mobl.ger.corp.intel.com

7626b4e9

Merge tag 'amd-drm-fixes-6.12-2024-10-16' of... · 01541a87

Dave Airlie authored Oct 18, 2024

Merge tag 'amd-drm-fixes-6.12-2024-10-16' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.12-2024-10-16:

amdgpu:
- SR-IOV fix
- CS chunk handling fix
- MES fixes
- SMU13 fixes

amdkfd:
- VRAM usage reporting fix

radeon:
- Fix possible_clones handling
Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241016200514.3520286-1-alexander.deucher@amd.com

01541a87

Merge tag 'nvme-6.12-2024-10-18' of git://git.infradead.org/nvme into block-6.12 · de7007e9

Jens Axboe authored Oct 17, 2024

Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.12

 - Fix target passthrough identifier (Nilay)
 - Fix tcp locking (Hannes)
 - Replace list with sbitmap for tracking RDMA rsp tags (Guixin)
 - Remove unnecessary fallthrough statements (Tokunori)
 - Remove ready-without-media support (Greg)
 - Fix multipath partition scan deadlock (Keith)
 - Fix concurrent PCI reset and remove queue mapping (Maurizio)
 - Fabrics shutdown fixes (Nilay)"

* tag 'nvme-6.12-2024-10-18' of git://git.infradead.org/nvme:
  nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function
  nvme: make keep-alive synchronous operation
  nvme-loop: flush off pending I/O while shutting down loop controller
  nvme-pci: fix race condition between reset and nvme_dev_disable()
  nvme-multipath: defer partition scanning
  nvme: disable CC.CRIME (NVME_CC_CRIME)
  nvme: delete unnecessary fallthru comment
  nvmet-rdma: use sbitmap to replace rsp free list
  nvme: tcp: avoid race between queue_lock lock and destroy
  nvmet-passthru: clear EUID/NGUID/UUID while using loop target
  block: fix blk_rq_map_integrity_sg kernel-doc

de7007e9

nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function · 599d9f3a

Nilay Shroff authored Oct 16, 2024

We no longer need to acquire ctrl->lock before accessing the
NVMe controller state; instead we can use the helper
nvme_ctrl_state. So replace the use of ctrl->lock in the
nvme_keep_alive_finish function with a call to nvme_ctrl_state.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

599d9f3a

nvme: make keep-alive synchronous operation · d0692367

Nilay Shroff authored Oct 16, 2024

The nvme keep-alive operation, which executes at a periodic interval,
could potentially sneak in while shutting down a fabric controller.
This may lead to a race between the fabric controller admin queue
destroy code path (invoked while shutting down controller) and hw/hctx
queue dispatcher called from the nvme keep-alive async request queuing
operation. This race could lead to the kernel crash shown below:

Call Trace:
    autoremove_wake_function+0x0/0xbc (unreliable)
    __blk_mq_sched_dispatch_requests+0x114/0x24c
    blk_mq_sched_dispatch_requests+0x44/0x84
    blk_mq_run_hw_queue+0x140/0x220
    nvme_keep_alive_work+0xc8/0x19c [nvme_core]
    process_one_work+0x200/0x4e0
    worker_thread+0x340/0x504
    kthread+0x138/0x140
    start_kernel_thread+0x14/0x18

While shutting down a fabric controller, if an nvme keep-alive request
sneaks in, it is flushed off. The nvme_keep_alive_end_io function is
then invoked to handle the end of the keep-alive operation, which
decrements admin->q_usage_counter; assuming this is the last/only
request in the admin queue, admin->q_usage_counter drops to zero.
If that happens, the blk-mq destroy queue operation
(blk_mq_destroy_queue()), which could be running simultaneously on
another cpu (as this is the controller shutdown code path), makes
forward progress and deletes the admin queue. From this point onward
we are not supposed to access the admin queue resources. However, the
nvme keep-alive thread running the hw/hctx queue dispatch operation
hasn't yet finished its work, so it could still access the admin queue
resources after the admin queue has been deleted, and that causes the
above crash.

This fix helps avoid the observed crash by implementing keep-alive as a
synchronous operation so that we decrement admin->q_usage_counter only
after the keep-alive command has finished executing and returned the
command status to its caller (blk_execute_rq()). This ensures that the
fabric shutdown code path doesn't destroy the fabric admin queue until
the keep-alive request has finished execution and the keep-alive thread
is no longer running the hw/hctx queue dispatch operation.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

d0692367

nvme-loop: flush off pending I/O while shutting down loop controller · c199fac8

Nilay Shroff authored Oct 16, 2024

While shutting down the loop controller, we first quiesce the admin/IO
queue, delete the admin/IO tag-set and finally destroy the admin/IO queue.
However, it's quite possible that during the window between quiescing and
destroying the admin/IO queue, some admin/IO request sneaks in; if that
happens, we could encounter a hung task because the shutdown operation
can't make forward progress until any pending I/O is flushed off.

This commit ensures that before destroying the admin/IO queue, we
unquiesce the admin/IO queue so that any outstanding requests, which were
added after the admin/IO queue was quiesced, are flushed to
completion.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

c199fac8

nvme-pci: fix race condition between reset and nvme_dev_disable() · 26bc0a81

Maurizio Lombardi authored Oct 15, 2024

nvme_dev_disable() modifies the dev->online_queues field, therefore
nvme_pci_update_nr_queues() should avoid racing against it, otherwise
we could end up passing invalid values to blk_mq_update_nr_hw_queues().

 WARNING: CPU: 39 PID: 61303 at drivers/pci/msi/api.c:347
          pci_irq_get_affinity+0x187/0x210
 Workqueue: nvme-reset-wq nvme_reset_work [nvme]
 RIP: 0010:pci_irq_get_affinity+0x187/0x210
 Call Trace:
  <TASK>
  ? blk_mq_pci_map_queues+0x87/0x3c0
  ? pci_irq_get_affinity+0x187/0x210
  blk_mq_pci_map_queues+0x87/0x3c0
  nvme_pci_map_queues+0x189/0x460 [nvme]
  blk_mq_update_nr_hw_queues+0x2a/0x40
  nvme_reset_work+0x1be/0x2a0 [nvme]

Fix the bug by locking the shutdown_lock mutex before using
dev->online_queues. Give up if nvme_dev_disable() is running or if
it has been executed already.

Fixes: 949928c1 ("NVMe: Fix possible queue use after freed")
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
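
The locking rule described in this commit (take the lock without blocking,
then bail out if the disable path has already run) can be sketched in user
space. Everything here is hypothetical: struct fake_dev, the atomic_flag used
in place of the shutdown_lock mutex, and the assumption that one online queue
is the admin queue.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's device state. */
struct fake_dev {
	atomic_flag shutdown_lock;	/* stand-in for the shutdown_lock mutex */
	unsigned int online_queues;	/* cleared by the disable path */
	unsigned int nr_hw_queues;	/* what the block layer would be told */
};

static bool fake_trylock(struct fake_dev *dev)
{
	/* test_and_set returns the previous value: false means we got the lock */
	return !atomic_flag_test_and_set(&dev->shutdown_lock);
}

static void fake_unlock(struct fake_dev *dev)
{
	atomic_flag_clear(&dev->shutdown_lock);
}

/* Returns false when the update was skipped. */
static bool update_nr_queues(struct fake_dev *dev)
{
	/* Give up if the disable path currently holds the lock. */
	if (!fake_trylock(dev))
		return false;

	/* Give up if the disable path has already run. */
	if (dev->online_queues == 0) {
		fake_unlock(dev);
		return false;
	}

	/* Assumption: one online queue is the admin queue, the rest are I/O. */
	dev->nr_hw_queues = dev->online_queues - 1;
	fake_unlock(dev);
	return true;
}
```

Reading online_queues only while the lock is held means the value passed on
cannot change underneath the update.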

26bc0a81

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 6efbea77

Linus Torvalds authored Oct 17, 2024

Pull arm64 fixes from Will Deacon:

 - Disable software tag-based KASAN when compiling with GCC, as
   functions are incorrectly instrumented leading to a crash early
   during boot

 - Fix pkey configuration for kernel threads when POE is enabled

 - Fix invalid memory accesses in uprobes when targeting load-literal
   instructions

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  kasan: Disable Software Tag-Based KASAN with GCC
  Documentation/protection-keys: add AArch64 to documentation
  arm64: set POR_EL0 for kernel threads
  arm64: probes: Fix uprobes for big-endian kernels
  arm64: probes: Fix simulate_ldr*_literal()
  arm64: probes: Remove broken LDR (literal) uprobe support

6efbea77

Merge tag 'arm-fixes-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · c16e5c94

Linus Torvalds authored Oct 17, 2024

Pull SoC fixes from Arnd Bergmann:
 "Most of the fixes this time are for platform specific drivers,
  addressing issues found through build testing on freescale, ep93xx,
  starfive, and npcm platforms, as well as the ffa firmware.

  The fixes for the scmi firmware driver address compatibility problems
  found on broadcom machines.

  There are only two devicetree fixes, addressing incorrect pin
  configuration on broadcom and marvell machines.

  The changes to the Documentation and MAINTAINERS files are for
  clarification only"

* tag 'arm-fixes-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  firmware: arm_ffa: Avoid string-fortify warning caused by memcpy()
  firmware: arm_scmi: Queue in scmi layer for mailbox implementation
  firmware: arm_ffa: Avoid string-fortify warning in export_uuid()
  firmware: arm_scmi: Give SMC transport precedence over mailbox
  firmware: arm_scmi: Fix the double free in scmi_debugfs_common_setup()
  Documentation/process: maintainer-soc: clarify submitting patches
  dmaengine: cirrus: check that output may be truncated
  dmaengine: cirrus: ERR_CAST() ioremap error
  MAINTAINERS: use the canonical soc mailing list address and mark it as L:
  ARM: dts: bcm2837-rpi-cm3-io3: Fix HDMI hpd-gpio pin
  arm64: dts: marvell: cn9130-sr-som: fix cp0 mdio pin numbers
  soc: fsl: cpm1: qmc: Fix unused data compilation warning
  soc: fsl: cpm1: qmc: Do not use IS_ERR_VALUE() on error pointers
  reset: starfive: jh71x0: Fix accessing the empty member on JH7110 SoC
  reset: npcm: convert comma to semicolon

c16e5c94

Merge tag 'sound-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 5c94bdab

Linus Torvalds authored Oct 17, 2024

Pull sound fixes from Takashi Iwai:
 "A collection of small fixes, nothing really stands out:

   - Usual HD-audio quirks / device-specific fixes

   - Kconfig dependency fix for UM

   - A series of minor fixes for SoundWire

   - Updates of USB-audio LINE6 contact address"

* tag 'sound-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/conexant - Use cached pin control for Node 0x1d on HP EliteOne 1000 G2
  ALSA/hda: intel-sdw-acpi: add support for sdw-manager-list property read
  ALSA/hda: intel-sdw-acpi: simplify sdw-master-count property read
  ALSA/hda: intel-sdw-acpi: fetch fwnode once in sdw_intel_scan_controller()
  ALSA/hda: intel-sdw-acpi: cleanup sdw_intel_scan_controller
  ALSA: hda/tas2781: Add new quirk for Lenovo, ASUS, Dell projects
  ALSA: scarlett2: Add error check after retrieving PEQ filter values
  ALSA: hda/cs8409: Fix possible NULL dereference
  sound: Make CONFIG_SND depend on INDIRECT_IOMEM instead of UML
  ALSA: line6: update contact information
  ALSA: usb-audio: Fix NULL pointer deref in snd_usb_power_domain_set()
  ALSA: hda/conexant - Fix audio routing for HP EliteOne 1000 G2
  ALSA: hda: Sound support for HP Spectre x360 16 inch model 2024

5c94bdab

Merge tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 07d6bf63

Linus Torvalds authored Oct 17, 2024

Pull networking fixes from Paolo Abeni:
 "Current release - new code bugs:

   - eth: mlx5: HWS, don't destroy more bwc queue locks than allocated

  Previous releases - regressions:

   - ipv4: give an IPv4 dev to blackhole_netdev

   - udp: compute L4 checksum as usual when not segmenting the skb

   - tcp/dccp: don't use timer_pending() in reqsk_queue_unlink().

   - eth: mlx5e: don't call cleanup on profile rollback failure

   - eth: microchip: vcap api: fix memory leaks in
     vcap_api_encode_rule_test()

   - eth: enetc: disable Tx BD rings after they are empty

   - eth: macb: avoid 20s boot delay by skipping MDIO bus registration
     for fixed-link PHY

  Previous releases - always broken:

   - posix-clock: fix missing timespec64 check in pc_clock_settime()

   - genetlink: hold RCU in genlmsg_mcast()

   - mptcp: prevent MPC handshake on port-based signal endpoints

   - eth: vmxnet3: fix packet corruption in vmxnet3_xdp_xmit_frame

   - eth: stmmac: dwmac-tegra: fix link bring-up sequence

   - eth: bcmasp: fix potential memory leak in bcmasp_xmit()

  Misc:

   - add Andrew Lunn as a co-maintainer of all networking drivers"

* tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  net/mlx5e: Don't call cleanup on profile rollback failure
  net/mlx5: Unregister notifier on eswitch init failure
  net/mlx5: Fix command bitmask initialization
  net/mlx5: Check for invalid vector index on EQ creation
  net/mlx5: HWS, use lock classes for bwc locks
  net/mlx5: HWS, don't destroy more bwc queue locks than allocated
  net/mlx5: HWS, fixed double free in error flow of definer layout
  net/mlx5: HWS, removed wrong access to a number of rules variable
  mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow
  net: ethernet: mtk_eth_soc: fix memory corruption during fq dma init
  vmxnet3: Fix packet corruption in vmxnet3_xdp_xmit_frame
  net: dsa: vsc73xx: fix reception from VLAN-unaware bridges
  net: ravb: Only advertise Rx/Tx timestamps if hardware supports it
  net: microchip: vcap api: Fix memory leaks in vcap_api_encode_rule_test()
  net: phy: mdio-bcm-unimac: Add BCM6846 support
  dt-bindings: net: brcm,unimac-mdio: Add bcm6846-mdio
  udp: Compute L4 checksum as usual when not segmenting the skb
  genetlink: hold RCU in genlmsg_mcast()
  net: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361
  tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
  ...

maple_tree: add regression test for spanning store bug · e993457d

Lorenzo Stoakes authored Oct 07, 2024

Add a regression test to assert that performing a spanning store which
consumes the entirety of the rightmost right leaf node does not result in
maple tree corruption.

The test achieves this by building a test tree of 3 levels and establishing
a store which ultimately results in a spanned store of this nature.

Link: https://lkml.kernel.org/r/30cdc101a700d16e03ba2f9aa5d83f2efa894168.1728314403.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

maple_tree: correct tree corruption on spanning store · bea07fd6

Lorenzo Stoakes authored Oct 07, 2024

Patch series "maple_tree: correct tree corruption on spanning store", v3.

There has been a nasty yet subtle maple tree corruption bug that appears
to have been in existence since the inception of the algorithm.

This bug seems far more likely to happen since commit f8d112a4
("mm/mmap: avoid zeroing vma tree in mmap_region()"), which is the point
at which reports started to be submitted concerning this bug.

We were made definitely aware of the bug thanks to the kind efforts of
Bert Karwatzki who helped enormously in my being able to track this down
and identify the cause of it.

The bug arises when an attempt is made to perform a spanning store across
two leaf nodes, where the right leaf node is the rightmost child of the
shared parent, AND the store completely consumes the rightmost node.

This results in mas_wr_spanning_store() mistakenly duplicating the new and
existing entries at the maximum pivot within the range, and thus maple
tree corruption.

The fix patch corrects this by detecting this scenario and disallowing the
mistaken duplicate copy.

The fix patch commit message goes into great detail as to how this occurs.

This series also includes a test which reliably reproduces the issue, and
asserts that the fix works correctly.

Bert has kindly tested the fix and confirmed it resolved his issues.  Also
Mikhail Gavrilov kindly reported what appears to be precisely the same
bug, which this fix should also resolve.


This patch (of 2):

There has been a subtle bug present in the maple tree implementation from
its inception.

This arises from how stores are performed - when a store occurs, it will
overwrite overlapping ranges and adjust the tree as necessary to
accommodate this.

A range may always ultimately span two leaf nodes.  In this instance we
walk the two leaf nodes, determine which elements are not overwritten to
the left and to the right of the start and end of the ranges respectively
and then rebalance the tree to contain these entries and the newly
inserted one.
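
The overwrite semantics described above can be illustrated with a minimal
Python sketch. It models the tree as a flat sorted list of inclusive
(start, end, value) ranges rather than as maple tree nodes; the function
name store_range and the sample values are hypothetical and are not part of
the kernel API.

```python
def store_range(ranges, first, last, value):
    """Store value over [first, last] (inclusive), trimming whatever it overlaps."""
    result = []
    for start, end, old in ranges:
        if end < first or start > last:
            # No overlap: the existing entry survives untouched.
            result.append((start, end, old))
            continue
        if start < first:
            # Part of the old entry remains to the left of the new range.
            result.append((start, first - 1, old))
        if end > last:
            # Part of the old entry remains to the right of the new range.
            result.append((last + 1, end, old))
    result.append((first, last, value))
    result.sort(key=lambda r: r[0])
    return result


ranges = [(0x0000, 0x3FFF, "a"), (0x4000, 0x4FFF, "b"), (0x5000, 0xFFFF, "c")]
updated = store_range(ranges, 0x4800, 0x5FFF, "new")
```

In this model the entries "b" and "c" are each trimmed and the new entry
sits between them, which is the flat-list analogue of keeping the
non-overwritten elements on either side of the store.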

This kind of store is dubbed a 'spanning store' and is implemented by
mas_wr_spanning_store().

In order to reach this stage, mas_store_gfp() invokes
mas_wr_preallocate(), mas_wr_store_type() and mas_wr_walk() in turn to
walk the tree and update the object (mas) to traverse to the location
where the write should be performed, determining its store type.

When a spanning store is required, mas_wr_walk() returns false, stopping
at the parent node which contains the target range, and
mas_wr_store_type() marks the mas->store_type as wr_spanning_store to
denote this fact.

When we go to perform the store in mas_wr_spanning_store(), we first
determine the elements AFTER the END of the range we wish to store (that
is, to the right of the entry to be inserted) - we do this by walking to
the NEXT pivot in the tree (i.e.  r_mas.last + 1), starting at the node we
have just determined contains the range over which we intend to write.

We then turn our attention to the entries to the left of the entry we are
inserting, whose state is represented by l_mas, and copy these into a 'big
node', which is a special node with enough slots to contain two leaf
nodes' worth of data.

We then copy the entry we wish to store immediately after this - the copy
and the insertion of the new entry is performed by mas_store_b_node().

After this we copy the elements to the right of the end of the range which
we are inserting, if we have not exceeded the length of the node (i.e. 
r_mas.offset <= r_mas.end).

Herein lies the bug - under very specific circumstances, this logic can
break and corrupt the maple tree.

Consider the following tree:

Height
  0                             Root Node
                                 /      \
                 pivot = 0xffff /        \ pivot = ULONG_MAX
                               /          \
  1                       A [-----]       ...
                             /   \
             pivot = 0x4fff /     \ pivot = 0xffff
                           /       \
  2 (LEAVES)          B [-----]  [-----] C
                                      ^--- Last pivot 0xffff.

Now imagine we wish to store an entry in the range [0x4000, 0xffff] (note
that all ranges expressed in maple tree code are inclusive):

1. mas_store_gfp() descends the tree, finds node A at <=0xffff, then
   determines that this is a spanning store across nodes B and C. The mas
   state is set such that the current node from which we traverse further
   is node A.

2. In mas_wr_spanning_store() we try to find elements to the right of pivot
   0xffff by searching for an index of 0x10000:

    - mas_wr_walk_index() invokes mas_wr_walk_descend() and
      mas_wr_node_walk() in turn.

        - mas_wr_node_walk() loops over entries in node A until EITHER it
          finds an entry whose pivot equals or exceeds 0x10000 OR it
          reaches the final entry.

        - Since no entry has a pivot equal to or exceeding 0x10000, pivot
          0xffff is selected, leading to node C.

    - mas_wr_walk_traverse() resets the mas state to traverse node C. We
      loop around and invoke mas_wr_walk_descend() and mas_wr_node_walk()
      in turn once again.

         - Again, we reach the last entry in node C, which has a pivot of
           0xffff.

3. We then copy the elements to the left of 0x4000 in node B to the big
   node via mas_store_b_node(), and insert the new [0x4000, 0xffff] entry
   too.

4. We determine whether we have any entries to copy from the right of the
   end of the range by checking r_mas.offset <= r_mas.end. With r_mas set
   up at the entry at pivot 0xffff, this check passes, and we then
   DUPLICATE the entry at pivot 0xffff.

5. BUG! The maple tree is corrupted with a duplicate entry.
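
Steps 2 to 4 can be sketched in Python. This is a simplified model, not the
kernel code: node_walk is a hypothetical stand-in for the loop described
for mas_wr_node_walk(), and the contents of node C other than its last
pivot 0xffff are invented for illustration.

```python
def node_walk(pivots, index):
    """Return the offset of the first pivot >= index, else the final offset."""
    for offset, pivot in enumerate(pivots):
        if pivot >= index:
            return offset
    return len(pivots) - 1


node_a = [0x4FFF, 0xFFFF]          # pivots in node A, from the diagram
node_c = [0x7FFF, 0xBFFF, 0xFFFF]  # hypothetical pivots; only 0xffff is given

last = 0xFFFF                      # end of the range being stored
offset_in_a = node_walk(node_a, last + 1)  # falls through to the final entry
offset_in_c = node_walk(node_c, last + 1)  # again the final entry
end_of_c = len(node_c) - 1

# The pre-fix check: copy from the right whenever the offset is within the node.
copies_right_entry = offset_in_c <= end_of_c
# The entry the walk landed on lies inside the stored range, not beyond it.
entry_is_past_range = node_c[offset_in_c] > last
```

Here copies_right_entry is true although entry_is_past_range is false,
which corresponds to the duplicate copy in step 4.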

This requires a very specific set of circumstances - we must be spanning
the last element in a leaf node, which is the last element in the parent
node. In other words, it takes a spanning store across two leaf nodes
with a range that ends at that shared pivot.

A potential solution to this problem would simply be to reset the walk
each time we traverse r_mas, however given the rarity of this situation it
seems that would be rather inefficient.

Instead, this patch detects if the right hand node is populated, i.e.  has
anything we need to copy.

We do so by only copying elements from the right of the entry being
inserted when the maximum value present exceeds the last, rather than
basing this on offset position.
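
The difference between the two conditions can be sketched as follows. The
function names are hypothetical, the parameters only mirror the r_mas
fields named in this message, and the offsets are illustrative; the actual
kernel change may be expressed differently.

```python
def copy_right_by_offset(offset, end):
    """Old condition: copy if the offset has not run past the node's end."""
    return offset <= end


def copy_right_by_max(node_max, last):
    """New condition: copy only if the node holds something beyond `last`."""
    return node_max > last


# Failing case: the right node's maximum is 0xffff and the store also ends
# at 0xffff, with the walk parked on the node's final slot.
buggy = copy_right_by_offset(offset=2, end=2)
fixed = copy_right_by_max(node_max=0xFFFF, last=0xFFFF)

# Ordinary case: the store ends at 0xbfff, so 0xc000..0xffff must be kept.
ordinary = copy_right_by_max(node_max=0xFFFF, last=0xBFFF)
```

The offset-based check cannot tell these two cases apart, whereas the
comparison against the maximum skips the copy only when nothing lies to the
right of the stored range.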

The patch also updates some comments and eliminates the unused bool return
value in mas_wr_walk_index().

The work performed in commit f8d112a4 ("mm/mmap: avoid zeroing vma
tree in mmap_region()") seems to have made this event much more likely,
and that commit is the point at which reports concerning this bug
started to be submitted.

The motivation for this change arose from Bert Karwatzki's report of
encountering mm instability after the release of kernel v6.12-rc1 which,
after the use of CONFIG_DEBUG_VM_MAPLE_TREE and similar configuration
options, was identified as maple tree corruption.

After Bert very generously provided his time and ability to reproduce this
event consistently, I was able to finally identify that the issue
discussed in this commit message was occurring for him.

Link: https://lkml.kernel.org/r/cover.1728314402.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com
Fixes: 54a611b6 ("Maple Tree: add new data structure")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Bert Karwatzki <spasswolf@web.de>
Closes: https://lore.kernel.org/all/20241001023402.3374-1-spasswolf@web.de/
Tested-by: Bert Karwatzki <spasswolf@web.de>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Closes: https://lore.kernel.org/all/CABXGCsOPwuoNOqSMmAvWO2Fz4TEmPnjFj-b7iF+XFRu1h7-+Dg@mail.gmail.com/
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

io_uring/sqpoll: ensure task state is TASK_RUNNING when running task_work · 8f7033aa

Jens Axboe authored Oct 17, 2024

When the sqpoll is exiting and cancels pending work items, it may need
to run task_work. If this happens from within io_uring_cancel_generic(),
then it may be waiting on the io_uring_task waitqueue. This results in
the below splat from the scheduler, as an attempt may be made to grab
the ring mutex while in a TASK_INTERRUPTIBLE state.

Ensure that the task state is set appropriately for that, just like what
is done for the other cases in io_run_task_work().
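
The idea can be modelled with a small Python sketch. It is not the io_uring
code: the Task class and both helper functions are hypothetical, and only
the two task state values are taken from the kernel (state=1 in the splat
below is TASK_INTERRUPTIBLE).

```python
TASK_RUNNING = 0x0000
TASK_INTERRUPTIBLE = 0x0001


class Task:
    def __init__(self):
        self.state = TASK_RUNNING


def might_sleep_ok(task):
    """Model of the scheduler check: blocking is only legal when running."""
    return task.state == TASK_RUNNING


def run_task_work(task, reset_state):
    """Run task_work that takes a mutex; optionally reset the state first."""
    if reset_state:
        task.state = TASK_RUNNING
    return might_sleep_ok(task)  # False corresponds to the warning below


waiter = Task()
waiter.state = TASK_INTERRUPTIBLE  # as set by prepare_to_wait()
without_fix = run_task_work(waiter, reset_state=False)

waiter.state = TASK_INTERRUPTIBLE
with_fix = run_task_work(waiter, reset_state=True)
```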

do not call blocking ops when !TASK_RUNNING; state=1 set at [<0000000029387fd2>] prepare_to_wait+0x88/0x2fc
WARNING: CPU: 6 PID: 59939 at kernel/sched/core.c:8561 __might_sleep+0xf4/0x140
Modules linked in:
CPU: 6 UID: 0 PID: 59939 Comm: iou-sqp-59938 Not tainted 6.12.0-rc3-00113-g8d020023b155 #7456
Hardware name: linux,dummy-virt (DT)
pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : __might_sleep+0xf4/0x140
lr : __might_sleep+0xf4/0x140
sp : ffff80008c5e7830
x29: ffff80008c5e7830 x28: ffff0000d93088c0 x27: ffff60001c2d7230
x26: dfff800000000000 x25: ffff0000e16b9180 x24: ffff80008c5e7a50
x23: 1ffff000118bcf4a x22: ffff0000e16b9180 x21: ffff0000e16b9180
x20: 000000000000011b x19: ffff80008310fac0 x18: 1ffff000118bcd90
x17: 30303c5b20746120 x16: 74657320313d6574 x15: 0720072007200720
x14: 0720072007200720 x13: 0720072007200720 x12: ffff600036c64f0b
x11: 1fffe00036c64f0a x10: ffff600036c64f0a x9 : dfff800000000000
x8 : 00009fffc939b0f6 x7 : ffff0001b6327853 x6 : 0000000000000001
x5 : ffff0001b6327850 x4 : ffff600036c64f0b x3 : ffff8000803c35bc
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000e16b9180
Call trace:
 __might_sleep+0xf4/0x140
 mutex_lock+0x84/0x124
 io_handle_tw_list+0xf4/0x260
 tctx_task_work_run+0x94/0x340
 io_run_task_work+0x1ec/0x3c0
 io_uring_cancel_generic+0x364/0x524
 io_sq_thread+0x820/0x124c
 ret_from_fork+0x10/0x20

Cc: stable@vger.kernel.org
Fixes: af5d68f8 ("io_uring/sqpoll: manage task_work privately")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'mlx5-misc-fixes-2024-10-15' · cb560795

Paolo Abeni authored Oct 17, 2024

Tariq Toukan says:

====================
mlx5 misc fixes 2024-10-15

This patchset provides misc bug fixes from the team to the mlx5 core and
Eth drivers.

Series generated against:
commit 174714f0 ("selftests: drivers: net: fix name not defined")
====================

Link: https://patch.msgid.link/20241015093208.197603-1-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Don't call cleanup on profile rollback failure · 4dbc1d1a

Cosmin Ratiu authored Oct 15, 2024

When profile rollback fails in mlx5e_netdev_change_profile, the netdev
profile var is left set to NULL. Avoid a crash when unloading the driver
by not calling profile->cleanup in such a case.

This was encountered while testing. The original trigger was that the
wq rescuer thread creation got interrupted (presumably due to Ctrl+C-ing
modprobe), which gets converted to ENOMEM (-12) by mlx5e_priv_init. The
profile rollback then also fails for the same reason (signal still
active), so the profile is left as NULL, leading to a crash later in
_mlx5e_remove.

 [  732.473932] mlx5_core 0000:08:00.1: E-Switch: Unload vfs: mode(OFFLOADS), nvfs(2), necvfs(0), active vports(2)
 [  734.525513] workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
 [  734.557372] mlx5_core 0000:08:00.1: mlx5e_netdev_init_profile:6235:(pid 6086): mlx5e_priv_init failed, err=-12
 [  734.559187] mlx5_core 0000:08:00.1 eth3: mlx5e_netdev_change_profile: new profile init failed, -12
 [  734.560153] workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
 [  734.589378] mlx5_core 0000:08:00.1: mlx5e_netdev_init_profile:6235:(pid 6086): mlx5e_priv_init failed, err=-12
 [  734.591136] mlx5_core 0000:08:00.1 eth3: mlx5e_netdev_change_profile: failed to rollback to orig profile, -12
 [  745.537492] BUG: kernel NULL pointer dereference, address: 0000000000000008
 [  745.538222] #PF: supervisor read access in kernel mode
<snipped>
 [  745.551290] Call Trace:
 [  745.551590]  <TASK>
 [  745.551866]  ? __die+0x20/0x60
 [  745.552218]  ? page_fault_oops+0x150/0x400
 [  745.555307]  ? exc_page_fault+0x79/0x240
 [  745.555729]  ? asm_exc_page_fault+0x22/0x30
 [  745.556166]  ? mlx5e_remove+0x6b/0xb0 [mlx5_core]
 [  745.556698]  auxiliary_bus_remove+0x18/0x30
 [  745.557134]  device_release_driver_internal+0x1df/0x240
 [  745.557654]  bus_remove_device+0xd7/0x140
 [  745.558075]  device_del+0x15b/0x3c0
 [  745.558456]  mlx5_rescan_drivers_locked.part.0+0xb1/0x2f0 [mlx5_core]
 [  745.559112]  mlx5_unregister_device+0x34/0x50 [mlx5_core]
 [  745.559686]  mlx5_uninit_one+0x46/0xf0 [mlx5_core]
 [  745.560203]  remove_one+0x4e/0xd0 [mlx5_core]
 [  745.560694]  pci_device_remove+0x39/0xa0
 [  745.561112]  device_release_driver_internal+0x1df/0x240
 [  745.561631]  driver_detach+0x47/0x90
 [  745.562022]  bus_remove_driver+0x84/0x100
 [  745.562444]  pci_unregister_driver+0x3b/0x90
 [  745.562890]  mlx5_cleanup+0xc/0x1b [mlx5_core]
 [  745.563415]  __x64_sys_delete_module+0x14d/0x2f0
 [  745.563886]  ? kmem_cache_free+0x1b0/0x460
 [  745.564313]  ? lockdep_hardirqs_on_prepare+0xe2/0x190
 [  745.564825]  do_syscall_64+0x6d/0x140
 [  745.565223]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 [  745.565725] RIP: 0033:0x7f1579b1288b
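
The guard described in this commit can be sketched in Python, with None
standing in for a NULL profile pointer. The Profile and Priv classes and
both remove functions are hypothetical simplifications, not the driver's
data structures.

```python
class Profile:
    def __init__(self):
        self.cleaned = False

    def cleanup(self):
        self.cleaned = True


class Priv:
    def __init__(self, profile):
        self.profile = profile


def remove_unguarded(priv):
    # Dereferences priv.profile unconditionally; fails when it is None.
    priv.profile.cleanup()


def remove_guarded(priv):
    # Skip cleanup when a failed rollback left no profile attached.
    if priv.profile is not None:
        priv.profile.cleanup()
    return priv.profile is not None


broken = Priv(profile=None)  # state after a failed rollback
try:
    remove_unguarded(broken)
    crashed = False
except AttributeError:
    crashed = True  # Python's analogue of the NULL dereference

skipped = not remove_guarded(broken)
```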

Fixes: 3ef14e46 ("net/mlx5e: Separate between netdev objects and mlx5e profiles initialization")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: Unregister notifier on eswitch init failure · 1da9cfd6

Cosmin Ratiu authored Oct 15, 2024

It otherwise remains registered and a subsequent attempt at eswitch
enabling might trigger warnings of the sort:

[  682.589148] ------------[ cut here ]------------
[  682.590204] notifier callback eswitch_vport_event [mlx5_core] already registered
[  682.590256] WARNING: CPU: 13 PID: 2660 at kernel/notifier.c:31 notifier_chain_register+0x3e/0x90
[...snipped]
[  682.610052] Call Trace:
[  682.610369]  <TASK>
[  682.610663]  ? __warn+0x7c/0x110
[  682.611050]  ? notifier_chain_register+0x3e/0x90
[  682.611556]  ? report_bug+0x148/0x170
[  682.611977]  ? handle_bug+0x36/0x70
[  682.612384]  ? exc_invalid_op+0x13/0x60
[  682.612817]  ? asm_exc_invalid_op+0x16/0x20
[  682.613284]  ? notifier_chain_register+0x3e/0x90
[  682.613789]  atomic_notifier_chain_register+0x25/0x40
[  682.614322]  mlx5_eswitch_enable_locked+0x1d4/0x3b0 [mlx5_core]
[  682.614965]  mlx5_eswitch_enable+0xc9/0x100 [mlx5_core]
[  682.615551]  mlx5_device_enable_sriov+0x25/0x340 [mlx5_core]
[  682.616170]  mlx5_core_sriov_configure+0x50/0x170 [mlx5_core]
[  682.616789]  sriov_numvfs_store+0xb0/0x1b0
[  682.617248]  kernfs_fop_write_iter+0x117/0x1a0
[  682.617734]  vfs_write+0x231/0x3f0
[  682.618138]  ksys_write+0x63/0xe0
[  682.618536]  do_syscall_64+0x4c/0x100
[  682.618958]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
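
A minimal Python model of the error path, assuming a notifier chain that
warns on duplicate registration. NotifierChain, vport_event and enable are
hypothetical names used only for this sketch.

```python
class NotifierChain:
    def __init__(self):
        self.callbacks = []
        self.warnings = 0

    def register(self, cb):
        if cb in self.callbacks:
            self.warnings += 1  # models the "already registered" warning
            return
        self.callbacks.append(cb)

    def unregister(self, cb):
        if cb in self.callbacks:
            self.callbacks.remove(cb)


def vport_event():
    pass


def enable(chain, init_ok, unregister_on_error):
    """Register the handler, then run an init step that may fail."""
    chain.register(vport_event)
    if not init_ok:
        if unregister_on_error:
            chain.unregister(vport_event)
        return False
    return True


leaky = NotifierChain()
enable(leaky, init_ok=False, unregister_on_error=False)
enable(leaky, init_ok=True, unregister_on_error=False)  # second try warns

clean = NotifierChain()
enable(clean, init_ok=False, unregister_on_error=True)
enable(clean, init_ok=True, unregister_on_error=True)   # no warning
```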

Fixes: 7624e58a ("net/mlx5: E-switch, register event handler before arming the event")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: Fix command bitmask initialization · d62b1404

Shay Drory authored Oct 15, 2024

The command bitmask has a dedicated bit for the MANAGE_PAGES command. This
bit isn't initialized during command bitmask initialization, only during
MANAGE_PAGES.

In addition, mlx5_cmd_trigger_completions() tries to trigger completion
for the MANAGE_PAGES command as well.

Hence, if a health error occurs before any MANAGE_PAGES command has been
invoked (for example, during mlx5_enable_hca()),
mlx5_cmd_trigger_completions() will try to trigger completion for the
MANAGE_PAGES command, which will result in a null-ptr-deref error. [1]

Fix it by initializing the command bitmask correctly.

While at it, re-write the code for better understanding.
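
The failure mode can be sketched in Python under two stated assumptions
that the message does not spell out: a set bit marks a free command slot,
and a cleared bit inside the mask is treated as an in-flight command. The
slot count and all names below are hypothetical.

```python
MAX_REG_CMDS = 4          # hypothetical number of regular command slots
PAGES_BIT = MAX_REG_CMDS  # dedicated slot for MANAGE_PAGES


def init_bitmask(include_pages_bit):
    """Mark slots as free (bit set). The old init skipped the pages bit."""
    mask = (1 << MAX_REG_CMDS) - 1
    if include_pages_bit:
        mask |= 1 << PAGES_BIT
    return mask


def slots_to_complete(bitmask):
    """Slots that look in-flight: cleared bits, pages slot included."""
    full = (1 << (MAX_REG_CMDS + 1)) - 1
    busy = ~bitmask & full
    return [i for i in range(MAX_REG_CMDS + 1) if busy & (1 << i)]


# No command has been issued yet, so no slot has an entry behind it.
before = slots_to_complete(init_bitmask(include_pages_bit=False))
after = slots_to_complete(init_bitmask(include_pages_bit=True))
```

With the pages bit left out, the pages slot is reported as needing
completion even though no command was ever issued for it; with the bit
initialized, no slot is reported.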

[1]
BUG: KASAN: null-ptr-deref in mlx5_cmd_trigger_completions+0x1db/0x600 [mlx5_core]
Write of size 4 at addr 0000000000000214 by task kworker/u96:2/12078
CPU: 10 PID: 12078 Comm: kworker/u96:2 Not tainted 6.9.0-rc2_for_upstream_debug_2024_04_07_19_01 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core]
Call Trace:
 <TASK>
 dump_stack_lvl+0x7e/0xc0
 kasan_report+0xb9/0xf0
 kasan_check_range+0xec/0x190
 mlx5_cmd_trigger_completions+0x1db/0x600 [mlx5_core]
 mlx5_cmd_flush+0x94/0x240 [mlx5_core]
 enter_error_state+0x6c/0xd0 [mlx5_core]
 mlx5_fw_fatal_reporter_err_work+0xf3/0x480 [mlx5_core]
 process_one_work+0x787/0x1490
 ? lockdep_hardirqs_on_prepare+0x400/0x400
 ? pwq_dec_nr_in_flight+0xda0/0xda0
 ? assign_work+0x168/0x240
 worker_thread+0x586/0xd30
 ? rescuer_thread+0xae0/0xae0
 kthread+0x2df/0x3b0
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x2d/0x70
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork_asm+0x11/0x20
 </TASK>

Fixes: 9b98d395 ("net/mlx5: Start health poll at earlier stage of driver load")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: Check for invalid vector index on EQ creation · d4f25be2

Maher Sanalla authored Oct 15, 2024

Currently, the mlx5 driver does not enforce that the vector index is lower
than the maximum number of supported completion vectors when a new
completion EQ is requested. Thus, mlx5_comp_eqn_get() fails when trying to
acquire an IRQ with an improper vector index.

To prevent the case above, enforce that the vector index value is
valid and lower than the maximum in mlx5_comp_eqn_get() before handling the
request.
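
A minimal sketch of such a bounds check, in Python. The function below is a
hypothetical stand-in with a simplified signature, and returning -EINVAL is
an assumption; the message does not say which error the driver returns.

```python
import errno


def comp_eqn_get(vecidx, max_vectors):
    """Return 0 for a usable vector index, or a negative errno otherwise."""
    if vecidx < 0 or vecidx >= max_vectors:
        return -errno.EINVAL  # reject before any IRQ would be requested
    return 0
```

For example, with 8 supported vectors, indexes 0 through 7 are accepted and
index 8 is rejected up front.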

Fixes: f14c1a14 ("net/mlx5: Allocate completion EQs dynamically")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: HWS, use lock classes for bwc locks · 9addffa3

Cosmin Ratiu authored Oct 15, 2024

The HWS BWC API uses one lock per queue and usually acquires one of
them, except when doing changes which require locking all queues in
order. Naturally, lockdep isn't too happy about acquiring the same lock
class multiple times, so inform it that each queue lock is a different
class to avoid false positives.

Fixes: 2ca62599 ("net/mlx5: HWS, added send engine and context handling")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: HWS, don't destroy more bwc queue locks than allocated · 45bcbd49

Cosmin Ratiu authored Oct 15, 2024

hws_send_queues_bwc_locks_destroy destroyed more queue locks than
allocated, leading to memory corruption (occasionally) and warnings such
as DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock)) in __mutex_destroy because
sometimes, the 'mutex' being destroyed was random memory.
The severity of this problem is proportional to the number of queues
configured because the code overreaches beyond the end of the
bwc_send_queue_locks array by 2x its length.

Fix that by using the correct number of bwc queues.
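
The overrun can be modelled in Python, which counts out-of-bounds indexes
instead of corrupting memory as C would. The names and the queue count are
hypothetical, and the wrong count of three times the array length is only
one reading of "by 2x its length".

```python
class Lock:
    def __init__(self):
        self.destroyed = False


def destroy_locks(locks, count):
    """Destroy `count` locks; return how many indexes fell outside the array."""
    out_of_bounds = 0
    for i in range(count):
        if i < len(locks):
            locks[i].destroyed = True
        else:
            out_of_bounds += 1  # in C this would touch unrelated memory
    return out_of_bounds


allocated = 4  # hypothetical number of bwc queues
locks = [Lock() for _ in range(allocated)]
overrun = destroy_locks(locks, count=3 * allocated)  # illustrative wrong count
exact = destroy_locks([Lock() for _ in range(allocated)], count=allocated)
```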

Fixes: 2ca62599 ("net/mlx5: HWS, added send engine and context handling")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: HWS, fixed double free in error flow of definer layout · 5aa2184e

Yevgeny Kliteynik authored Oct 15, 2024

Fix error flow bug that could lead to double free of a buffer
during a failure to calculate a suitable definer layout.

Fixes: 74a778b4 ("net/mlx5: HWS, added definers handling")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
