1. 19 Oct, 2024 7 commits
    • Merge tag 'ftrace-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 06526daa
      Linus Torvalds authored
      Pull ftrace fixes from Steven Rostedt:
       "A couple of fixes to function graph infrastructure:
      
         - Fix allocation of idle shadow stacks during hotplug
      
           If function graph tracing is started while a CPU is offline and
           that CPU comes online during the trace, the idle task that
           represents the CPU will not get a shadow stack allocated for it.
           This means all function graph hooks that happen while that idle
           task is running (including in interrupt mode) will have all their
           events dropped.
      
           Switch over to the CPU hotplug mechanism, which gives any newly
           onlined CPU a callback that can allocate the shadow stack for
           its idle task.
      
         - Fix allocation size of the ret_stack_list array
      
           When function graph tracing converted over to allowing more than
           one user at a time, it had to convert its shadow stack from an
           array of ret_stack structures to an array of unsigned longs. The
           shadow stacks are allocated in batches of 32 at a time and assigned
           to every running task. The batch is held by the ret_stack_list
           array.
      
           But when the conversion happened, instead of allocating an array of
           32 pointers, it was allocated as a ret_stack itself (PAGE_SIZE).
           This ret_stack_list gets passed to a function that iterates over
           what it believes is its size defined by the
           FTRACE_RETSTACK_ALLOC_SIZE macro (which is 32).
      
           Luckily (PAGE_SIZE) is greater than 32 * sizeof(long), otherwise
           this would have been an array overflow. This still should be fixed
           and the ret_stack_list should be allocated to the size it is
           expected to be as someday it may end up being bigger than
           SHADOW_STACK_SIZE"
      
      * tag 'ftrace-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        fgraph: Allocate ret_stack_list with proper size
        fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks
    • Merge tag 'ipe-pr-20241018' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe · 8203ca38
      Linus Torvalds authored
      Pull ipe fixes from Fan Wu:
       "This addresses several issues identified by Luca when attempting to
        enable IPE on Debian and systemd:
      
         - address issues with IPE policy update errors and policy update
           version check, improving the clarity of error messages for better
           understanding by userspace programs.
      
         - enable IPE policies to be signed by secondary and platform
           keyrings, facilitating broader use across general Linux
           distributions like Debian.
      
         - updates the IPE entry in the MAINTAINERS file to reflect the new
           tree URL and my updated email from kernel.org"
      
      * tag 'ipe-pr-20241018' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
        MAINTAINERS: update IPE tree url and Fan Wu's email
        ipe: fallback to platform keyring also if key in trusted keyring is rejected
        ipe: allow secondary and platform keyrings to install/update policies
        ipe: also reject policy updates with the same version
        ipe: return -ESTALE instead of -EINVAL on update when new policy has a lower version
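The two policy-update commits listed above can be pictured with a small userspace sketch. It is not the IPE code: `policy_version`, `version_cmp()` and `check_policy_update()` are hypothetical names, and it assumes that an update carrying the same version is reported with -ESTALE just like an older one.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a policy version (major.minor.revision). */
struct policy_version {
	uint16_t major;
	uint16_t minor;
	uint16_t rev;
};

/* Returns <0, 0 or >0 when a is older than, equal to, or newer than b. */
static int version_cmp(const struct policy_version *a,
		       const struct policy_version *b)
{
	assert(a && b);
	if (a->major != b->major)
		return a->major < b->major ? -1 : 1;
	if (a->minor != b->minor)
		return a->minor < b->minor ? -1 : 1;
	if (a->rev != b->rev)
		return a->rev < b->rev ? -1 : 1;
	return 0;
}

/*
 * Accept an update only if it is strictly newer than the active policy.
 * Assumption: the equal-version case also returns -ESTALE.
 */
static int check_policy_update(const struct policy_version *active,
			       const struct policy_version *update)
{
	if (version_cmp(update, active) <= 0)
		return -ESTALE;
	return 0;
}
```

A distinct error code lets a userspace caller tell "this policy is not newer than the active one" apart from "this policy is malformed", which is the clarity improvement the pull message describes.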
    • Merge tag 'input-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · f9e48255
      Linus Torvalds authored
      Pull input fixes from Dmitry Torokhov:
      
       - a fix for Zinitix driver to not fail probing if the property enabling
         touch keys functionality is not defined. Support for touch keys was
         added in 6.12 merge window so this issue does not affect users of
         released kernels
      
       - a couple new vendor/device IDs in xpad driver to enable support for
         more hardware
      
      * tag 'input-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: zinitix - don't fail if linux,keycodes prop is absent
        Input: xpad - add support for MSI Claw A1M
        Input: xpad - add support for 8BitDo Ultimate 2C Wireless Controller
    • Merge tag '9p-for-6.12-rc4' of https://github.com/martinetd/linux · 9197b73f
      Linus Torvalds authored
      Pull 9p fixes from Dominique Martinet:
       "Mashed-up update that I sat on too long:
      
         - fix for multiple slabs created with the same name
      
         - enable multipage folios
      
         - theoretical fix to also look for opened fids by inode if none was
           found by dentry"
      
      [ Enabling multi-page folios should have been done during the merge
        window, but it's a one-liner, and the actual meat of the enablement
        is in netfs and already in use for other filesystems...  - Linus ]
      
      * tag '9p-for-6.12-rc4' of https://github.com/martinetd/linux:
        9p: Avoid creating multiple slab caches with the same name
        9p: Enable multipage folios
        9p: v9fs_fid_find: also lookup by inode if not found dentry
    • Merge tag 'rust-fixes-6.12-2' of https://github.com/Rust-for-Linux/linux · 4e6bd4a3
      Linus Torvalds authored
      Pull rust fixes from Miguel Ojeda:
       "Toolchain and infrastructure:
      
         - Fix several issues with the 'rustc-option' macro. It includes a
           refactor from Masahiro of three '{cc,rust}-*' macros, which is not
           a fix but avoids repeating the same commands (which would be
           several lines in the case of 'rustc-option').
      
         - Fix conditions for 'CONFIG_HAVE_CFI_ICALL_NORMALIZE_INTEGERS'. It
           includes the addition of 'CONFIG_RUSTC_LLVM_VERSION', which is not
           a fix but is needed for the actual fix.
      
        And a trivial grammar fix"
      
      * tag 'rust-fixes-6.12-2' of https://github.com/Rust-for-Linux/linux:
        cfi: fix conditions for HAVE_CFI_ICALL_NORMALIZE_INTEGERS
        kbuild: rust: add `CONFIG_RUSTC_LLVM_VERSION`
        kbuild: fix issues with rustc-option
        kbuild: refactor cc-option-yn, cc-disable-warning, rust-option-yn macros
        lib/Kconfig.debug: fix grammar in RUST_BUILD_ASSERT_ALLOW
    • fgraph: Allocate ret_stack_list with proper size · fae4078c
      Steven Rostedt authored
      The ret_stack_list is an array of ret_stack shadow stacks for the function
      graph usage. When the first function graph is enabled, all tasks in the
      system get a shadow stack. The ret_stack_list is a 32 element array of
      pointers to these shadow stacks. It allocates the shadow stack in batches
      (32 stacks at a time), assigns them to running tasks, and continues until
      all tasks are covered.
      
      When the function graph shadow stack changed from an array of
      ftrace_ret_stack structures to an array of longs, the allocation of
      ret_stack_list went from allocating an array of 32 elements to just a
      block defined by SHADOW_STACK_SIZE. Luckily, that's defined as PAGE_SIZE
      and is much more than enough to hold 32 pointers. But it is way overkill
      for the amount needed to allocate.
      
      Change the allocation of ret_stack_list back to a kcalloc() of
      FTRACE_RETSTACK_ALLOC_SIZE pointers.
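As a rough userspace illustration of the sizing described above (not the kernel patch itself), the batch list only needs room for 32 pointers. Here `calloc()` stands in for `kcalloc()`, `RETSTACK_ALLOC_SIZE` mirrors FTRACE_RETSTACK_ALLOC_SIZE, and `SHADOW_STACK_BYTES` is an assumed 4096-byte stand-in for SHADOW_STACK_SIZE; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define RETSTACK_ALLOC_SIZE 32   /* mirrors FTRACE_RETSTACK_ALLOC_SIZE */
#define SHADOW_STACK_BYTES 4096  /* assumed stand-in for SHADOW_STACK_SIZE */

/* Bytes actually needed for the batch list: 32 pointers, not a full page. */
static size_t ret_stack_list_bytes(void)
{
	return RETSTACK_ALLOC_SIZE * sizeof(unsigned long *);
}

/* Allocate a zeroed array of pointers, analogous to kcalloc(n, size, gfp). */
static unsigned long **alloc_ret_stack_list(void)
{
	return calloc(RETSTACK_ALLOC_SIZE, sizeof(unsigned long *));
}

/* Give every slot its own shadow stack; returns 0, or -1 on failure. */
static int fill_ret_stack_list(unsigned long **list)
{
	assert(list != NULL);
	for (int i = 0; i < RETSTACK_ALLOC_SIZE; i++) {
		list[i] = malloc(SHADOW_STACK_BYTES);
		if (!list[i]) {
			/* Unwind the stacks allocated so far. */
			while (i-- > 0) {
				free(list[i]);
				list[i] = NULL;
			}
			return -1;
		}
	}
	return 0;
}

/* Release the shadow stacks and the list that holds them. */
static void free_ret_stack_list(unsigned long **list)
{
	if (!list)
		return;
	for (int i = 0; i < RETSTACK_ALLOC_SIZE; i++)
		free(list[i]);
	free(list);
}
```

Sizing the list from the element count keeps the allocation correct even if the shadow stack size and the batch size later change independently.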
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Link: https://lore.kernel.org/20241018215212.23f13f40@rorschach
      Fixes: 42675b72 ("function_graph: Convert ret_stack to a series of longs")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks · 2c02f737
      Steven Rostedt authored
      The function graph infrastructure allocates a shadow stack for every task
      when enabled. This includes the idle tasks. The first time the function
      graph is invoked, the shadow stacks are created and never freed until the
      task exits. This includes the idle tasks.
      
      Only the idle tasks that were for online CPUs had their shadow stacks
      created when function graph tracing started. If function graph tracing is
      enabled and a CPU comes online, the idle task representing that CPU will
      not have its shadow stack created, and all function graph tracing for that
      idle task will be silently dropped.
      
      Instead, use the CPU hotplug mechanism to allocate the idle shadow stacks.
      This will include idle tasks for CPUs that come online during tracing.
      
      This issue can be reproduced by:
      
       # cd /sys/kernel/tracing
       # echo 0 > /sys/devices/system/cpu/cpu1/online
       # echo 0 > set_ftrace_pid
       # echo function_graph > current_tracer
       # echo 1 > options/funcgraph-proc
       # echo 1 > /sys/devices/system/cpu/cpu1/online
       # grep '<idle>' per_cpu/cpu1/trace | head
      
      Before, nothing would show up.
      
      After:
       1)    <idle>-0    |   0.811 us    |                        __enqueue_entity();
       1)    <idle>-0    |   5.626 us    |                      } /* enqueue_entity */
       1)    <idle>-0    |               |                      dl_server_update_idle_time() {
       1)    <idle>-0    |               |                        dl_scaled_delta_exec() {
       1)    <idle>-0    |   0.450 us    |                          arch_scale_cpu_capacity();
       1)    <idle>-0    |   1.242 us    |                        }
       1)    <idle>-0    |   1.908 us    |                      }
       1)    <idle>-0    |               |                      dl_server_start() {
       1)    <idle>-0    |               |                        enqueue_dl_entity() {
       1)    <idle>-0    |               |                          task_contending() {
      
      Note, if tracing stops and restarts, the old way would then initialize
      the onlined CPUs.
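The behaviour described in this commit can be modelled in userspace. The sketch below is a toy model, not the kernel implementation: every name in it is hypothetical, and the "hotplug core" is reduced to a flag plus a direct call. It assumes, as the commit message implies, that the online callback runs both for CPUs already online when tracing starts and for CPUs brought up later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS 4
#define SHADOW_STACK_BYTES 4096  /* assumed shadow stack size */

/* Toy model of a per-CPU idle task and its shadow stack. */
struct idle_task {
	unsigned long *ret_stack;
};

static struct idle_task idle_tasks[NR_CPUS];
static bool cpu_online[NR_CPUS];
static bool tracing_enabled;

/* Online callback: allocate the idle task's shadow stack if missing. */
static int fgraph_cpu_online(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	if (!idle_tasks[cpu].ret_stack) {
		idle_tasks[cpu].ret_stack = calloc(1, SHADOW_STACK_BYTES);
		if (!idle_tasks[cpu].ret_stack)
			return -1;
	}
	return 0;
}

/* Starting the tracer runs the callback on every CPU that is online now. */
static int start_tracing(void)
{
	tracing_enabled = true;
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu_online[cpu] && fgraph_cpu_online(cpu))
			return -1;
	}
	return 0;
}

/* Bringing a CPU up later runs the same callback while tracing is active. */
static int bring_cpu_online(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	cpu_online[cpu] = true;
	return tracing_enabled ? fgraph_cpu_online(cpu) : 0;
}
```

With only CPU 0 online at `start_tracing()`, CPU 1 has no shadow stack until `bring_cpu_online(1)` runs the callback, which mirrors the before/after difference in the reproduction steps.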
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/20241018214300.6df82178@rorschach
      Fixes: 868baf07 ("ftrace: Fix memory leak with function graph and cpu hotplug")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
  2. 18 Oct, 2024 21 commits
    • Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 3d5ad2d4
      Linus Torvalds authored
      Pull bpf fixes from Daniel Borkmann:
      
       - Fix BPF verifier to not affect subreg_def marks in its range
         propagation (Eduard Zingerman)
      
       - Fix a truncation bug in the BPF verifier's handling of
         coerce_reg_to_size_sx (Dimitar Kanaliev)
      
       - Fix the BPF verifier's delta propagation between linked registers
         under 32-bit addition (Daniel Borkmann)
      
       - Fix a NULL pointer dereference in BPF devmap due to missing rxq
         information (Florian Kauer)
      
       - Fix a memory leak in bpf_core_apply (Jiri Olsa)
      
       - Fix an UBSAN-reported array-index-out-of-bounds in BTF parsing for
         arrays of nested structs (Hou Tao)
      
       - Fix build ID fetching where memory areas backing the file were
         created with memfd_secret (Andrii Nakryiko)
      
       - Fix BPF task iterator tid filtering which was incorrectly using pid
         instead of tid (Jordan Rome)
      
       - Several fixes for BPF sockmap and BPF sockhash redirection in
         combination with vsocks (Michal Luczaj)
      
       - Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered (Andrea Parri)
      
       - Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the possibility
         of an infinite BPF tailcall (Pu Lehui)
      
       - Fix a build warning from resolve_btfids that bpf_lsm_key_free cannot
         be resolved (Thomas Weißschuh)
      
       - Fix a bug in kfunc BTF caching for modules where the wrong BTF object
         was returned (Toke Høiland-Jørgensen)
      
       - Fix a BPF selftest compilation error in cgroup-related tests with
         musl libc (Tony Ambardar)
      
       - Several fixes to BPF link info dumps to fill missing fields (Tyrone
         Wu)
      
       - Add BPF selftests for kfuncs from multiple modules, checking that the
         correct kfuncs are called (Simon Sundberg)
      
       - Ensure that internal and user-facing bpf_redirect flags don't overlap
         (Toke Høiland-Jørgensen)
      
       - Switch to use kvzalloc to allocate BPF verifier environment (Rik van
         Riel)
      
       - Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic splat
         under RT (Wander Lairson Costa)
      
      * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (38 commits)
        lib/buildid: Handle memfd_secret() files in build_id_parse()
        selftests/bpf: Add test case for delta propagation
        bpf: Fix print_reg_state's constant scalar dump
        bpf: Fix incorrect delta propagation between linked registers
        bpf: Properly test iter/task tid filtering
        bpf: Fix iter/task tid filtering
        riscv, bpf: Make BPF_CMPXCHG fully ordered
        bpf, vsock: Drop static vsock_bpf_prot initialization
        vsock: Update msg_count on read_skb()
        vsock: Update rx_bytes on read_skb()
        bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock
        selftests/bpf: Add asserts for netfilter link info
        bpf: Fix link info netfilter flags to populate defrag flag
        selftests/bpf: Add test for sign extension in coerce_subreg_to_size_sx()
        selftests/bpf: Add test for truncation after sign extension in coerce_reg_to_size_sx()
        bpf: Fix truncation bug in coerce_reg_to_size_sx()
        selftests/bpf: Assert link info uprobe_multi count & path_size if unset
        bpf: Fix unpopulated path_size when uprobe_multi fields unset
        selftests/bpf: Fix cross-compiling urandom_read
        selftests/bpf: Add test for kfunc module order
        ...
    • Merge tag 'linux_kselftest-fixes-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest · dbafeddb
      Linus Torvalds authored
      Pull kselftest fix from Shuah Khan:
      
       - fix test makefile to install tests directory without which the test
         fails with errors
      
      * tag 'linux_kselftest-fixes-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftest: hid: add the missing tests directory
    • Input: zinitix - don't fail if linux,keycodes prop is absent · 2de01e0e
      Nikita Travkin authored
      When initially adding the touchkey support, a mistake was made in the
      property parsing code. The possible negative errno from
      device_property_count_u32() was never checked, which was an oversight
      left from converting to it from the of_property as part of the review
      fixes.
      
      Re-add the correct handling of the absent property, in which case zero
      touchkeys should be assumed, which would disable the feature.
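A minimal sketch of the handling described above, written as plain C rather than driver code. `touchkey_count()`, its `present` flag and `MAX_TOUCHKEYS` are hypothetical; the function only models the decision "absent property means zero touchkeys, a real error is still reported".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_TOUCHKEYS 8  /* hypothetical upper bound for this sketch */

/*
 * present: whether the keycodes property exists at all.
 * count:   result of counting its elements, i.e. a non-negative count
 *          or a negative errno.
 * Returns the number of touchkeys to set up, or a negative errno.
 */
static int touchkey_count(bool present, int count)
{
	if (!present)
		return 0;		/* no property: feature disabled */
	if (count < 0)
		return count;		/* propagate a genuine read error */
	if (count > MAX_TOUCHKEYS)
		return -EINVAL;		/* more keys than can be handled */
	return count;
}
```

Returning zero for the absent case lets probing continue with the feature switched off instead of failing the whole device.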
      Reported-by: Jakob Hauser <jahau@rocketmail.com>
      Tested-by: Jakob Hauser <jahau@rocketmail.com>
      Fixes: 075d9b22 ("Input: zinitix - add touchkey support")
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Nikita Travkin <nikita@trvn.ru>
      Tested-by: Yassine Oudjana <y.oudjana@protonmail.com>
      Link: https://lore.kernel.org/r/20241004-zinitix-no-keycodes-v2-1-876dc9fea4b6@trvn.ru
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
    • Merge tag 'block-6.12-20241018' of git://git.kernel.dk/linux · f8eacd8a
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - NVMe pull request via Keith:
           - Fix target passthrough identifier (Nilay)
           - Fix tcp locking (Hannes)
           - Replace list with sbitmap for tracking RDMA rsp tags (Guixen)
           - Remove unnecessary fallthrough statements (Tokunori)
           - Remove ready-without-media support (Greg)
           - Fix multipath partition scan deadlock (Keith)
           - Fix concurrent PCI reset and remove queue mapping (Maurizio)
           - Fabrics shutdown fixes (Nilay)
      
       - Fix for a kerneldoc warning (Keith)
      
       - Fix a race with blk-rq-qos and wakeups (Omar)
      
       - Cleanup of checking for always-set tag_set (SurajSonawane2415)
      
       - Fix for a crash with CPU hotplug notifiers (Ming)
      
       - Don't allow zero-copy ublk on unprivileged device (Ming)
      
       - Use array_index_nospec() for CDROM (Josh)
      
       - Remove dead code in drbd (David)
      
       - Tweaks to elevator loading (Breno)
      
      * tag 'block-6.12-20241018' of git://git.kernel.dk/linux:
        cdrom: Avoid barrier_nospec() in cdrom_ioctl_media_changed()
        nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function
        nvme: make keep-alive synchronous operation
        nvme-loop: flush off pending I/O while shutting down loop controller
        nvme-pci: fix race condition between reset and nvme_dev_disable()
        ublk: don't allow user copy for unprivileged device
        blk-rq-qos: fix crash on rq_qos_wait vs. rq_qos_wake_function race
        nvme-multipath: defer partition scanning
        blk-mq: setup queue ->tag_set before initializing hctx
        elevator: Remove argument from elevator_find_get
        elevator: do not request_module if elevator exists
        drbd: Remove unused conn_lowest_minor
        nvme: disable CC.CRIME (NVME_CC_CRIME)
        nvme: delete unnecessary fallthru comment
        nvmet-rdma: use sbitmap to replace rsp free list
        block: Fix elevator_get_default() checking for NULL q->tag_set
        nvme: tcp: avoid race between queue_lock lock and destroy
        nvmet-passthru: clear EUID/NGUID/UUID while using loop target
        block: fix blk_rq_map_integrity_sg kernel-doc
    • Merge tag 'io_uring-6.12-20241018' of git://git.kernel.dk/linux · a041f478
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - Fix a regression this merge window where cloning of registered
         buffers didn't take into account the dummy_ubuf
      
       - Fix a race with reading how many SQRING entries are available,
         causing userspace to need to loop around io_uring_sqring_wait()
         rather than being able to rely on SQEs being available when it
         returned
      
       - Ensure that the SQPOLL thread is TASK_RUNNING before running
         task_work off the cancelation exit path
      
      * tag 'io_uring-6.12-20241018' of git://git.kernel.dk/linux:
        io_uring/sqpoll: ensure task state is TASK_RUNNING when running task_work
        io_uring/rsrc: ignore dummy_ubuf for buffer cloning
        io_uring/sqpoll: close race on waiting for sqring entries
    • Input: xpad - add support for MSI Claw A1M · 22a18935
      John Edwards authored
      Add MSI Claw A1M controller to xpad_device match table when in xinput mode.
      Add MSI VID as XPAD_XBOX360_VENDOR.
      Signed-off-by: John Edwards <uejji@uejji.net>
      Reviewed-by: Derek J. Clark <derekjohn.clark@gmail.com>
      Reviewed-by: Christopher Snowhill <kode54@gmail.com>
      Link: https://lore.kernel.org/r/20241010232020.3292284-4-uejji@uejji.net
      Cc: stable@vger.kernel.org
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
    • MAINTAINERS: update IPE tree url and Fan Wu's email · 917a15c3
      Fan Wu authored
      Update Integrity Policy Enforcement (IPE) LSM tree url and
      maintainer's email to the newly issued kernel.org tree/email.
      Signed-off-by: Fan Wu <wufan@kernel.org>
    • ipe: fallback to platform keyring also if key in trusted keyring is rejected · f40998a8
      Luca Boccassi authored
      If enabled, we fall back to the platform keyring if the trusted keyring
      doesn't have the key used to sign the IPE policy. But if pkcs7_verify()
      rejects the key for other reasons, such as usage restrictions, we do not
      fall back. Do so, following the same change in dm-verity.
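The fallback rule can be sketched in userspace as follows. This is an illustration under assumptions, not the IPE code: the verifier callbacks and stubs are hypothetical, and the choice of -ENOKEY for "key not found" and -EKEYREJECTED for "key rejected" is an assumption about how the two cases are reported (both are Linux-specific errno values).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical verifier: returns 0 on success or a negative errno. */
typedef int (*verify_fn)(const void *sig);

/* Stub verifiers used to exercise the sketch. */
static int stub_ok(const void *sig) { (void)sig; return 0; }
static int stub_no_key(const void *sig) { (void)sig; return -ENOKEY; }
static int stub_rejected(const void *sig) { (void)sig; return -EKEYREJECTED; }
static int stub_bad_msg(const void *sig) { (void)sig; return -EBADMSG; }

/*
 * Try the trusted keyring first. If the platform keyring is enabled,
 * retry there when the key was missing or (the new case) rejected.
 * Any other failure is returned unchanged.
 */
static int verify_policy_sig(const void *sig, verify_fn trusted,
			     verify_fn platform, bool platform_enabled)
{
	int rc;

	assert(trusted && platform);
	rc = trusted(sig);
	if (platform_enabled && (rc == -ENOKEY || rc == -EKEYREJECTED))
		rc = platform(sig);
	return rc;
}
```

Errors unrelated to key lookup, such as a malformed signature, still fail immediately, so the fallback does not widen what counts as a valid signature.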
      Signed-off-by: Luca Boccassi <bluca@debian.org>
      Suggested-by: Serge Hallyn <serge@hallyn.com>
      [FW: fixed some line length issues and a typo in the commit message]
      Signed-off-by: Fan Wu <wufan@kernel.org>
    • Merge tag 'v6.12-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · b04ae0f4
      Linus Torvalds authored
      Pull smb client fixes from Steve French:
      
       - Fix possible double free setting xattrs
      
       - Fix slab out of bounds with large ioctl payload
      
       - Remove three unused functions, and an unused variable that could be
         confusing
      
      * tag 'v6.12-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Remove unused functions
        smb/client: Fix logically dead code
        smb: client: fix OOBs when building SMB2_IOCTL request
        smb: client: fix possible double free in smb2_set_ea()
    • Merge tag 'xfs-6.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 568570fd
      Linus Torvalds authored
      Pull xfs fixes from Carlos Maiolino:
      
       - Fix integer overflow in xrep_bmap
      
       - Fix stale delalloc punching for COW IO
      
      * tag 'xfs-6.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: punch delalloc extents from the COW fork for COW writes
        xfs: set IOMAP_F_SHARED for all COW fork allocations
        xfs: share more code in xfs_buffered_write_iomap_begin
        xfs: support the COW fork in xfs_bmap_punch_delalloc_range
        xfs: IOMAP_ZERO and IOMAP_UNSHARE already hold invalidate_lock
        xfs: take XFS_MMAPLOCK_EXCL xfs_file_write_zero_eof
        xfs: factor out a xfs_file_write_zero_eof helper
        iomap: move locking out of iomap_write_delalloc_release
        iomap: remove iomap_file_buffered_write_punch_delalloc
        iomap: factor out a iomap_last_written_block helper
        xfs: fix integer overflow in xrep_bmap
    • Merge tag 'pm-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 5e9ab267
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These fix two issues in the amd-pstate cpufreq driver and update the
        intel_rapl power capping driver with a new processor ID.
      
        Specifics:
      
         - Enable ACPI CPPC in amd_pstate_register_driver() after disabling it
           in amd_pstate_unregister_driver() when switching driver operation
           modes (Dhananjay Ugwekar)
      
         - Make amd-pstate use nominal performance as the maximum performance
           level when boost is disabled (Mario Limonciello)
      
         - Add ArrowLake-H to the list of processors where PL4 is supported in
           the MSR part of the intel_rapl power capping driver (Srinivas
           Pandruvada)"
      
      * tag 'pm-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        powercap: intel_rapl_msr: Add PL4 support for ArrowLake-H
        cpufreq/amd-pstate: Use nominal perf for limits when boost is disabled
        cpufreq/amd-pstate: Fix amd_pstate mode switch on shared memory systems
    • Merge tag 'hwmon-for-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging · 3b3a0ef6
      Linus Torvalds authored
      Pull hwmon fix from Guenter Roeck:
       "Fix auto-detect regression in jc42 driver"
      
      * tag 'hwmon-for-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        [PATCH} hwmon: (jc42) Properly detect TSE2004-compliant devices again
    • Merge tag 'drm-fixes-2024-10-18' of https://gitlab.freedesktop.org/drm/kernel · 5d97dde4
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Weekly fixes, msm and xe are the two main ones, with a bunch of
        scattered fixes including a largish revert in mgag200, then amdgpu,
        vmwgfx and scattering of other minor ones.
      
        All seems pretty regular.
      
        msm:
         - Display:
            - move CRTC resource assignment to atomic_check to make
              consecutive calls to atomic_check() consistent
            - fix rounding / sign-extension issues with pclk calculation in
              case of DSC
            - cleanups to drop incorrect null checks in dpu snapshots
            - fix to use kvzalloc in dpu snapshot to avoid allocation issues
              in heavily loaded system cases
            - Fix to not program merge_3d block if dual LM is not being used
            - Fix to not flush merge_3d block if it's not enabled, as
              otherwise this leads to false timeouts
         - GPU:
            - a7xx: add a fence wait before SMMU table update
      
        xe:
         - New workaround to Xe2 (Aradhya)
         - Fix unbalanced rpm put (Matthew Auld)
         - Remove fragile lock optimization (Matthew Brost)
         - Fix job release, delegating it to the drm scheduler (Matthew Brost)
         - Fix timestamp bit width for Xe2 (Lucas)
         - Fix external BO's dma-resv usage (Matthew Brost)
         - Fix returning success for timeout in wait_token (Nirmoy)
         - Initialize fence to avoid it being detected as signaled (Matthew
           Auld)
         - Improve cache flush for BMG (Matthew Auld)
         - Don't allow hflip for tile4 framebuffer on Xe2 (Juha-Pekka)
      
        amdgpu:
         - SR-IOV fix
         - CS chunk handling fix
         - MES fixes
         - SMU13 fixes
      
        amdkfd:
         - VRAM usage reporting fix
      
        radeon:
         - Fix possible_clones handling
      
        i915:
         - Two DP bandwidth related MST fixes
      
        ast:
         - Clear EDID on unplugged connectors
      
        host1x:
         - Fix boot on Tegra186
         - Set DMA parameters
      
        mgag200:
         - Revert VBLANK support
      
        panel:
         - himax-hx83192: Adjust power and gamma
      
        qaic:
         - Sgtable loop fixes
      
        vmwgfx:
         - Limit display layout allocation size
         - Handle allocation errors in connector checks
         - Clean up KMS code for 2d-only setup
         - Report surface-check errors correctly
         - Remove NULL test around kvfree()"
      
      * tag 'drm-fixes-2024-10-18' of https://gitlab.freedesktop.org/drm/kernel: (45 commits)
        drm/ast: vga: Clear EDID if no display is connected
        drm/ast: sil164: Clear EDID if no display is connected
        Revert "drm/mgag200: Add vblank support"
        drm/amdgpu/swsmu: default to fullscreen 3D profile for dGPUs
        drm/i915/display: Don't allow tile4 framebuffer to do hflip on display20 or greater
        drm/xe/bmg: improve cache flushing behaviour
        drm/xe/xe_sync: initialise ufence.signalled
        drm/xe/ufence: ufence can be signaled right after wait_woken
        drm/xe: Use bookkeep slots for external BO's in exec IOCTL
        drm/xe/query: Increase timestamp width
        drm/xe: Don't free job in TDR
        drm/xe: Take job list lock in xe_sched_add_pending_job
        drm/xe: fix unbalanced rpm put() with declare_wedged()
        drm/xe: fix unbalanced rpm put() with fence_fini()
        drm/xe/xe2lpg: Extend Wa_15016589081 for xe2lpg
        drm/i915/dp_mst: Don't require DSC hblank quirk for a non-DSC compatible mode
        drm/i915/dp_mst: Handle error during DSC BW overhead/slice calculation
        drm/msm/a6xx+: Insert a fence wait before SMMU table update
        drm/msm/dpu: don't always program merge_3d block
        drm/msm/dpu: Don't always set merge_3d pending flush
        ...
    • mm: fix follow_pfnmap API lockdep assert · b1b46751
      Linus Torvalds authored
      The lockdep assert for the new follow_pfnmap() API "knows" that a
      pfnmap always has a vma->vm_file, since that's the only way to create
      such a mapping.
      
      And that's actually true for all the normal cases.  But not for the mmap
      failure case, where the incomplete mapping is torn down and we have
      cleared vma->vm_file because the failure occurred before the file was
      linked to the vma.
      
      So this codepath does actually need to check for vm_file being NULL.
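The shape of such a check can be shown with a toy model. The structures and the boolean "lock held" flags below are hypothetical stand-ins and do not reproduce the kernel's lockdep machinery; the sketch assumes that either the file mapping's lock or the mm's mmap lock satisfies the assertion, and the point it illustrates is only the NULL test on vm_file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; a boolean models "this lock is currently held". */
struct toy_mapping { bool i_mmap_locked; };
struct toy_file { struct toy_mapping *f_mapping; };
struct toy_mm { bool mmap_locked; };
struct toy_vma {
	struct toy_file *vm_file;	/* NULL on the mmap failure path */
	struct toy_mm *vm_mm;
};

/* Returns true when a lock protecting the pfnmap is held. */
static bool pfnmap_lock_held(const struct toy_vma *vma)
{
	const struct toy_mapping *mapping;

	assert(vma && vma->vm_mm);

	/* Only dereference vm_file when the mapping was fully set up. */
	mapping = vma->vm_file ? vma->vm_file->f_mapping : NULL;
	if (mapping && mapping->i_mmap_locked)
		return true;

	/* Without a file, only the mmap lock can satisfy the check. */
	return vma->vm_mm->mmap_locked;
}
```

A VMA torn down after a failed mmap has no file, so the check has to fall through to the mmap lock rather than dereference a NULL pointer.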
      Reported-by: Jann Horn <jannh@google.com>
      Fixes: 6da8e963 ("mm: new follow_pfnmap API")
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'pm-cpufreq' · cf8679bb
      Rafael J. Wysocki authored
      Merge amd-pstate driver fixes for 6.12-rc4:
      
       - Enable ACPI CPPC in amd_pstate_register_driver() after disabling
         it in amd_pstate_unregister_driver() during driver operation mode
         switch (Dhananjay Ugwekar).
      
       - Make amd-pstate use nominal performance as the maximum performance
         level when boost is disabled (Mario Limonciello).
      
      * pm-cpufreq:
        cpufreq/amd-pstate: Use nominal perf for limits when boost is disabled
        cpufreq/amd-pstate: Fix amd_pstate mode switch on shared memory systems
    • Merge tag 'iommu-fixes-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux · 75aa74d5
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
       "ARM-SMMU fixes from Will Deacon:
      
         - Clarify warning message when failing to disable the MMU-500
           prefetcher
      
         - Fix undefined behaviour in calculation of L1 stream-table index
           when 32-bit StreamIDs are implemented
      
         - Replace a rogue comma with a semicolon
      
        Intel VT-d fix from Lu Baolu:
      
         - Fix incorrect pci_for_each_dma_alias() for non-PCI devices"
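The undefined behaviour mentioned for 32-bit StreamIDs is the classic shift-by-width problem: shifting a 32-bit `1` left by 32 is undefined in C. A small sketch of a safe computation follows; `last_sid()`, `l1_index()` and the split value of 8 are assumptions made for illustration, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define L2_SPLIT 8  /* assumed number of StreamID bits resolved at level 2 */

/* Largest StreamID representable in sid_bits bits (1 to 32). */
static uint32_t last_sid(unsigned int sid_bits)
{
	assert(sid_bits >= 1 && sid_bits <= 32);
	/* Shift in 64 bits: "1 << 32" on a 32-bit int would be undefined. */
	return (uint32_t)((UINT64_C(1) << sid_bits) - 1);
}

/* Level-1 stream-table index for a given StreamID. */
static uint32_t l1_index(uint32_t sid)
{
	return sid >> L2_SPLIT;
}
```

Doing the shift in a 64-bit type keeps the `sid_bits == 32` case well defined, and the result still fits in 32 bits after subtracting one.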
      
      * tag 'iommu-fixes-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
        iommu/vt-d: Fix incorrect pci_for_each_dma_alias() for non-PCI devices
        iommu/arm-smmu-v3: Convert comma to semicolon
        iommu/arm-smmu-v3: Fix last_sid_idx calculation for sid_bits==32
        iommu/arm-smmu: Clarify MMU-500 CPRE workaround
      75aa74d5
    • Linus Torvalds's avatar
      Merge tag 'powerpc-6.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · ef444a0a
      Linus Torvalds authored
      Pull powerpc fix from Madhavan Srinivasan:
      
       - To prevent possible memory leak, free "name" on error in
         opal_event_init()
      
      Thanks to Michael Ellerman and 2639161967.
      
      * tag 'powerpc-6.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/powernv: Free name on error in opal_event_init()
      ef444a0a
    • Linus Torvalds's avatar
      Merge tag 's390-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · c91c1461
      Linus Torvalds authored
      Pull s390 fixes from Heiko Carstens:
      
       - Fix PCI error recovery by handling error events correctly
      
       - Fix CCA crypto card behavior within protected execution environment
      
       - Two KVM commits which fix virtual vs physical address handling bugs
         in KVM pfault handling
      
       - Fix return code handling in pckmo_key2protkey()
      
       - Deactivate sclp console as late as possible so that outstanding
         messages appear on the console instead of being dropped on reboot
      
       - Convert newlines to CRLF instead of LFCR for the sclp vt220 driver,
         as required by the vt220 specification
      
       - Initialize also psw mask in perf_arch_fetch_caller_regs() to make
         sure that user_mode(regs) will return false
      
       - Update defconfigs
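      The CRLF versus LFCR point above comes down to byte order: the
      carriage return has to be emitted before the line feed. A minimal
      userspace sketch follows; expand_newlines_crlf() is a hypothetical
      helper for illustration and is not the sclp vt220 driver code.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy src into dst, expanding every '\n' to "\r\n"
 * (CR first, then LF). Returns the number of bytes written, not counting
 * the terminating NUL. Returns 0 and stores an empty string if dst is too
 * small (an empty src also returns 0).
 */
static size_t expand_newlines_crlf(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    if (dst_size == 0)
        return 0;

    for (; *src; src++) {
        size_t need = (*src == '\n') ? 2 : 1;

        if (out + need >= dst_size) {   /* keep room for the terminator */
            dst[0] = '\0';
            return 0;
        }
        if (*src == '\n')
            dst[out++] = '\r';          /* carriage return goes first */
        dst[out++] = *src;
    }
    dst[out] = '\0';
    return out;
}
```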
      
      * tag 's390-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390: Update defconfigs
        s390: Initialize psw mask in perf_arch_fetch_caller_regs()
        s390/sclp_vt220: Convert newlines to CRLF instead of LFCR
        s390/sclp: Deactivate sclp after all its users
        s390/pkey_pckmo: Return with success for valid protected key types
        KVM: s390: Change virtual to physical address access in diag 0x258 handler
        KVM: s390: gaccess: Check if guest address is in memslot
        s390/ap: Fix CCA crypto card behavior within protected execution environment
        s390/pci: Handle PCI error codes other than 0x3a
      c91c1461
    • Dave Airlie's avatar
      Merge tag 'drm-xe-fixes-2024-10-17' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes · 83f00078
      Dave Airlie authored
      Driver Changes:
      - New workaround to Xe2 (Aradhya)
      - Fix unbalanced rpm put (Matthew Auld)
      - Remove fragile lock optimization (Matthew Brost)
      - Fix job release, delegating it to the drm scheduler (Matthew Brost)
      - Fix timestamp bit width for Xe2 (Lucas)
      - Fix external BO's dma-resv usage (Matthew Brost)
      - Fix returning success for timeout in wait_token (Nirmoy)
      - Initialize fence to avoid it being detected as signaled (Matthew Auld)
      - Improve cache flush for BMG (Matthew Auld)
      - Don't allow hflip for tile4 framebuffer on Xe2 (Juha-Pekka)
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Lucas De Marchi <lucas.demarchi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/jkldrex5733ldxrla75b4ayvhujjhw2kccmasl5rotoufoacj4@pkvlrrv4orc7
      83f00078
    • Linus Torvalds's avatar
      Merge tag 'x86_bugs_post_ibpb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ade8ff3b
      Linus Torvalds authored
      Pull x86 IBPB fixes from Borislav Petkov:
       "This fixes the IBPB implementation of older AMDs (< gen4) that do not
        flush the RSB (Return Address Stack) so you can still do some leaking
        when using a "=ibpb" mitigation for Retbleed or SRSO. Fix it by doing
        the flushing in software on those generations.
      
        IBPB is not the default setting so this is not likely to affect
        anybody in practice"
      
      * tag 'x86_bugs_post_ibpb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/bugs: Do not use UNTRAIN_RET with IBPB on entry
        x86/bugs: Skip RSB fill at VMEXIT
        x86/entry: Have entry_ibpb() invalidate return predictions
        x86/cpufeatures: Add a IBPB_NO_RET BUG flag
        x86/cpufeatures: Define X86_FEATURE_AMD_IBPB_RET
      ade8ff3b
    • Josh Poimboeuf's avatar
      cdrom: Avoid barrier_nospec() in cdrom_ioctl_media_changed() · b0bf1afd
      Josh Poimboeuf authored
      The barrier_nospec() after the array bounds check is overkill and
      painfully slow for arches which implement it.
      
      Furthermore, most arches don't implement it, so they remain exposed to
      Spectre v1 (which can affect pretty much any CPU with branch
      prediction).
      
      Instead, clamp the user pointer to a valid range so it's guaranteed to
      be a valid array index even when the bounds check mispredicts.
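
      The clamping idea can be sketched in plain C as a branchless mask.
      This is an illustration only: the names index_mask() and
      clamp_index() are hypothetical, and the sketch assumes
      size <= LONG_MAX and an arithmetic right shift of negative values.
      In the kernel, the array_index_nospec() helper serves this purpose
      and also hides the value from the optimizer, which this sketch does
      not attempt.

```c
#include <limits.h>

/*
 * All-ones when index < size, zero otherwise, computed without a branch.
 * If index >= size, either index itself or (size - 1 - index) has its top
 * bit set, so the complement is non-negative and the shift yields zero.
 */
static inline unsigned long index_mask(unsigned long index, unsigned long size)
{
    long v = (long)(index | (size - 1UL - index));

    return (unsigned long)(~v >> (sizeof(long) * CHAR_BIT - 1));
}

/* An out-of-range index collapses to 0, which is always a valid slot. */
static inline unsigned long clamp_index(unsigned long index, unsigned long size)
{
    return index & index_mask(index, size);
}
```

      Because no branch is involved, a mispredicted bounds check cannot
      steer a speculative load past the end of the array.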
      
      Fixes: 8270cb10 ("cdrom: Fix spectre-v1 gadget")
      Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
      Link: https://lore.kernel.org/r/1d86f4d9d8fba68e5ca64cdeac2451b95a8bf872.1729202937.git.jpoimboe@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b0bf1afd
  3. 17 Oct, 2024 12 commits
    • Linus Torvalds's avatar
      Merge tag 'mm-hotfixes-stable-2024-10-17-16-08' of... · 4d939780
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "28 hotfixes. 13 are cc:stable. 23 are MM.
      
        It is the usual shower of unrelated singletons - please see the
        individual changelogs for details"
      
      * tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
        maple_tree: add regression test for spanning store bug
        maple_tree: correct tree corruption on spanning store
        mm/mglru: only clear kswapd_failures if reclaimable
        mm/swapfile: skip HugeTLB pages for unuse_vma
        selftests: mm: fix the incorrect usage() info of khugepaged
        MAINTAINERS: add Jann as memory mapping/VMA reviewer
        mm: swap: prevent possible data-race in __try_to_reclaim_swap
        mm: khugepaged: fix the incorrect statistics when collapsing large file folios
        MAINTAINERS: kasan, kcov: add bugzilla links
        mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
        mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
        Docs/damon/maintainer-profile: update deprecated awslabs GitHub URLs
        Docs/damon/maintainer-profile: add missing '_' suffixes for external web links
        maple_tree: check for MA_STATE_BULK on setting wr_rebalance
        mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point
        mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets()
        mm: remove unused stub for can_swapin_thp()
        mailmap: add an entry for Andy Chiu
        MAINTAINERS: add memory mapping/VMA co-maintainers
        fs/proc: fix build with GCC 15 due to -Werror=unterminated-string-initialization
        ...
      4d939780
    • Linus Torvalds's avatar
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · d4b82e58
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "Two clk driver fixes and a unit test fix:
      
         - Terminate the of_device_id table in the Samsung exynosautov920 clk
           driver so that device matching logic doesn't run off the end of the
           array into other memory and break matching for any kernel with this
           driver loaded
      
         - Properly limit the max clk ID in the Rockchip clk driver
      
         - Use clk kunit helpers in the clk tests so that memory isn't leaked
           after the test concludes"
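
      The first fix concerns a sentinel-terminated match table. A minimal
      sketch of why the empty last entry matters is below; struct
      match_entry, example_table and find_match() are hypothetical
      stand-ins, not the kernel's struct of_device_id or of_match_node().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device match-table entry. */
struct match_entry {
    const char *compatible;
};

/* The NULL last entry is the sentinel that tells the walker where to stop. */
static const struct match_entry example_table[] = {
    { .compatible = "vendor,clock-a" },
    { .compatible = "vendor,clock-b" },
    { .compatible = NULL },     /* sentinel */
};

/* Walk the table until the sentinel; return the matching entry or NULL. */
static const struct match_entry *find_match(const struct match_entry *table,
                                            const char *compatible)
{
    for (; table->compatible; table++) {
        if (strcmp(table->compatible, compatible) == 0)
            return table;
    }
    return NULL;
}
```

      Without the sentinel, the loop has no stop condition for a string
      that is not in the table and keeps reading whatever memory follows
      the array.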
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: test: Fix some memory leaks
        clk: rockchip: fix finding of maximum clock ID
        clk: samsung: Fix out-of-bound access of of_match_node()
      d4b82e58
    • Dave Airlie's avatar
      Merge tag 'drm-misc-fixes-2024-10-17' of... · 49ff3e79
      Dave Airlie authored
      Merge tag 'drm-misc-fixes-2024-10-17' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
      
      Short summary of fixes pull:
      
      ast:
      - Clear EDID on unplugged connectors
      
      host1x:
      - Fix boot on Tegra186
      - Set DMA parameters
      
      mgag200:
      - Revert VBLANK support
      
      panel:
      - himax-hx83192: Adjust power and gamma
      
      qaic:
      - Sgtable loop fixes
      
      vmwgfx:
      - Limit display layout allocation size
      - Handle allocation errors in connector checks
      - Clean up KMS code for 2d-only setup
      - Report surface-check errors correctly
      - Remove NULL test around kvfree()
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Thomas Zimmermann <tzimmermann@suse.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20241017115516.GA196624@linux.fritz.box
      49ff3e79
    • Dave Airlie's avatar
      Merge tag 'drm-intel-fixes-2024-10-17' of... · 7626b4e9
      Dave Airlie authored
      Merge tag 'drm-intel-fixes-2024-10-17' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
      
      - Two DP bandwidth related MST fixes
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/ZxDLdML9Dwqkb1AW@jlahtine-mobl.ger.corp.intel.com
      7626b4e9
    • Dave Airlie's avatar
      Merge tag 'amd-drm-fixes-6.12-2024-10-16' of... · 01541a87
      Dave Airlie authored
      Merge tag 'amd-drm-fixes-6.12-2024-10-16' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
      
      amd-drm-fixes-6.12-2024-10-16:
      
      amdgpu:
      - SR-IOV fix
      - CS chunk handling fix
      - MES fixes
      - SMU13 fixes
      
      amdkfd:
      - VRAM usage reporting fix
      
      radeon:
      - Fix possible_clones handling
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Alex Deucher <alexander.deucher@amd.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20241016200514.3520286-1-alexander.deucher@amd.com
      01541a87
    • Andrii Nakryiko's avatar
      lib/buildid: Handle memfd_secret() files in build_id_parse() · 5ac9b4e9
      Andrii Nakryiko authored
      From memfd_secret(2) manpage:
      
        The memory areas backing the file created with memfd_secret(2) are
        visible only to the processes that have access to the file descriptor.
        The memory region is removed from the kernel page tables and only the
        page tables of the processes holding the file descriptor map the
        corresponding physical memory. (Thus, the pages in the region can't be
        accessed by the kernel itself, so that, for example, pointers to the
        region can't be passed to system calls.)
      
      We need to handle this special case gracefully in build ID fetching
      code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
      family of APIs. Original report and repro can be found in [0].
      
        [0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
      
      Fixes: de3ec364 ("lib/buildid: add single folio-based file reader abstraction")
      Reported-by: Yi Lai <yi1.lai@intel.com>
      Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
      Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
      Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
      5ac9b4e9
    • Jens Axboe's avatar
      Merge tag 'nvme-6.12-2024-10-18' of git://git.infradead.org/nvme into block-6.12 · de7007e9
      Jens Axboe authored
      Pull NVMe fixes from Keith:
      
      "nvme fixes for Linux 6.12
      
       - Fix target passthrough identifier (Nilay)
       - Fix tcp locking (Hannes)
       - Replace list with sbitmap for tracking RDMA rsp tags (Guixen)
       - Remove unnecessary fallthrough statements (Tokunori)
       - Remove ready-without-media support (Greg)
       - Fix multipath partition scan deadlock (Keith)
       - Fix concurrent PCI reset and remove queue mapping (Maurizio)
       - Fabrics shutdown fixes (Nilay)"
      
      * tag 'nvme-6.12-2024-10-18' of git://git.infradead.org/nvme:
        nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function
        nvme: make keep-alive synchronous operation
        nvme-loop: flush off pending I/O while shutting down loop controller
        nvme-pci: fix race condition between reset and nvme_dev_disable()
        nvme-multipath: defer partition scanning
        nvme: disable CC.CRIME (NVME_CC_CRIME)
        nvme: delete unnecessary fallthru comment
        nvmet-rdma: use sbitmap to replace rsp free list
        nvme: tcp: avoid race between queue_lock lock and destroy
        nvmet-passthru: clear EUID/NGUID/UUID while using loop target
        block: fix blk_rq_map_integrity_sg kernel-doc
      de7007e9
    • Luca Boccassi's avatar
      ipe: allow secondary and platform keyrings to install/update policies · 02e2f9aa
      Luca Boccassi authored
      The current policy management makes it impossible to use IPE
      in a general purpose distribution. In such cases the users are not
      building the kernel, the distribution is, and access to the private
      key included in the trusted keyring is, for obvious reasons, not
      available.
      This means that users have no way to enable IPE, since there will
      be no built-in generic policy, and no access to the key to sign
      updates validated by the trusted keyring.
      
      Just as we do for dm-verity, kernel modules and more, allow the
      secondary and platform keyrings to also validate policies. This
      allows users enrolling their own keys in UEFI db or MOK to also
      sign policies, and enroll them. This makes it sensible to enable
      IPE in general purpose distributions, as it becomes usable by
      any user wishing to do so. Keys in these keyrings can already
      load kernels and kernel modules, so there is no security
      downgrade.
      
      Add a kconfig each, like dm-verity does, but default to enabled if
      the dependencies are available.
      Signed-off-by: Luca Boccassi <bluca@debian.org>
      Reviewed-by: Serge Hallyn <serge@hallyn.com>
      [FW: fixed some style issues]
      Signed-off-by: Fan Wu <wufan@kernel.org>
      02e2f9aa
    • Luca Boccassi's avatar
      ipe: also reject policy updates with the same version · 5ceecb30
      Luca Boccassi authored
      Currently IPE accepts an update that has the same version as the policy
      being updated, but it neither makes it a no-op nor checks that the
      old and new policies are the same. So it is possible to change the
      content of a policy without changing its version. This is very
      confusing from userspace when managing policies.
      Instead change the update logic to reject updates that have the same
      version with ESTALE, as that is much clearer and more intuitive behaviour.
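
      The resulting rule can be sketched as a version comparison that
      accepts only strictly newer policies. This is a sketch under stated
      assumptions: struct policy_version, its field names, version_cmp()
      and check_policy_update() are hypothetical and are not taken from
      the IPE sources.

```c
#include <errno.h>

/* Hypothetical version triple; the field names are illustrative only. */
struct policy_version {
    unsigned int major;
    unsigned int minor;
    unsigned int rev;
};

/* Returns <0, 0 or >0 when a is older than, equal to, or newer than b. */
static int version_cmp(const struct policy_version *a,
                       const struct policy_version *b)
{
    if (a->major != b->major)
        return a->major < b->major ? -1 : 1;
    if (a->minor != b->minor)
        return a->minor < b->minor ? -1 : 1;
    if (a->rev != b->rev)
        return a->rev < b->rev ? -1 : 1;
    return 0;
}

/* Accept only strictly newer policies; same or older versions are stale. */
static int check_policy_update(const struct policy_version *cur,
                               const struct policy_version *next)
{
    if (version_cmp(next, cur) <= 0)
        return -ESTALE;
    return 0;
}
```

      A distinct error code lets a userspace updater tell "already at this
      version or newer" apart from a malformed policy.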
      Signed-off-by: Luca Boccassi <bluca@debian.org>
      Reviewed-by: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: Fan Wu <wufan@kernel.org>
      5ceecb30
    • Luca Boccassi's avatar
      ipe: return -ESTALE instead of -EINVAL on update when new policy has a lower version · 57994189
      Luca Boccassi authored
      When loading policies in userspace we want a recognizable error when an
      update attempts to use an old policy, as that is an error that needs
      to be treated differently from an invalid policy. Use -ESTALE as it is
      clear enough for an update mechanism.
      Signed-off-by: Luca Boccassi <bluca@debian.org>
      Reviewed-by: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: Fan Wu <wufan@kernel.org>
      57994189
    • Nilay Shroff's avatar
      nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function · 599d9f3a
      Nilay Shroff authored
      We no longer need to acquire ctrl->lock before accessing the
      NVMe controller state; we can use the nvme_ctrl_state helper
      instead. So replace the use of ctrl->lock in the
      nvme_keep_alive_finish function with a call to nvme_ctrl_state.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      599d9f3a
    • Nilay Shroff's avatar
      nvme: make keep-alive synchronous operation · d0692367
      Nilay Shroff authored
      The nvme keep-alive operation, which executes at a periodic interval,
      could potentially sneak in while shutting down a fabric controller.
      This may lead to a race between the fabric controller admin queue
      destroy code path (invoked while shutting down controller) and hw/hctx
      queue dispatcher called from the nvme keep-alive async request queuing
      operation. This race could lead to the kernel crash shown below:
      
      Call Trace:
          autoremove_wake_function+0x0/0xbc (unreliable)
          __blk_mq_sched_dispatch_requests+0x114/0x24c
          blk_mq_sched_dispatch_requests+0x44/0x84
          blk_mq_run_hw_queue+0x140/0x220
          nvme_keep_alive_work+0xc8/0x19c [nvme_core]
          process_one_work+0x200/0x4e0
          worker_thread+0x340/0x504
          kthread+0x138/0x140
          start_kernel_thread+0x14/0x18
      
      While shutting down a fabric controller, if an nvme keep-alive request
      sneaks in, it is flushed off. The nvme_keep_alive_end_io function is
      then invoked to handle the end of the keep-alive operation, which
      decrements admin->q_usage_counter; assuming this is the last/only
      request in the admin queue, admin->q_usage_counter becomes zero.
      If that happens, the blk-mq destroy queue operation
      (blk_mq_destroy_queue()), which could be running simultaneously on
      another cpu (as this is the controller shutdown code path), makes
      forward progress and deletes the admin queue. From this point onward
      we are not supposed to access the admin queue resources. However, the
      nvme keep-alive thread running the hw/hctx queue dispatch operation
      has not yet finished its work, so it could still access the admin
      queue resources after the admin queue has already been deleted, and
      that causes the above crash.
      
      This fix helps avoid the observed crash by implementing keep-alive as a
      synchronous operation so that we decrement admin->q_usage_counter only
      after the keep-alive command has finished executing and returned the
      command status back up to its caller (blk_execute_rq()). This ensures
      that the fabric shutdown code path doesn't destroy the fabric admin
      queue until the keep-alive request has finished execution and the
      keep-alive thread is no longer running the hw/hctx queue dispatch
      operation.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      d0692367