- 10 Dec, 2021 9 commits
-
-
Mark Rutland authored
Now that open-coded stack unwinds have been converted to arch_stack_walk(), we no longer need to expose any of unwind_frame(), walk_stackframe(), or start_backtrace() outside of stacktrace.c. Make those functions private to stacktrace.c, removing their prototypes from <asm/stacktrace.h> and marking them static. Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Reviewed-by:
Mark Brown <broonie@kernel.org> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Link: https://lore.kernel.org/r/20211129142849.3056714-10-mark.rutland@arm.com
Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com>
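The shape of the change, as a minimal sketch (declarations per the 5.16-era arm64 unwinder; the in-tree definitions carry full bodies, this only shows the visibility change):

    /* arch/arm64/kernel/stacktrace.c -- sketch, not the verbatim diff.
     * The prototypes move out of <asm/stacktrace.h> and become static. */
    static void start_backtrace(struct stackframe *frame,
                                unsigned long fp, unsigned long pc);
    static int unwind_frame(struct task_struct *tsk,
                            struct stackframe *frame);
    static void walk_stackframe(struct task_struct *tsk,
                                struct stackframe *frame,
                                bool (*fn)(void *, unsigned long), void *data);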
-
Madhavan T. Venkataraman authored
To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to substantially rework arm64's unwinding code. As part of this, we want to minimize the set of unwind interfaces we expose, and avoid open-coding of unwind logic.

Currently, dump_backtrace() walks the stack of the current task or a blocked task by calling start_backtrace() and iterating unwind steps using unwind_frame(). This can be written more simply in terms of arch_stack_walk(), considering three distinct cases:

1) When unwinding a blocked task, start_backtrace() is called with the blocked task's saved PC and FP, and the unwind proceeds immediately from this point without skipping any entries. This is functionally equivalent to calling arch_stack_walk() with the blocked task, which will start with the task's saved PC and FP. There is no functional change to this case.

2) When unwinding the current task without regs, start_backtrace() is called with dump_backtrace() as the PC and __builtin_frame_address(0) as the next frame, and the unwind proceeds immediately without skipping. This is *almost* functionally equivalent to calling arch_stack_walk() for the current task, which will start with its caller (i.e. an offset into dump_backtrace()) as the PC, and the caller's frame record as the next frame. The only difference is that dump_backtrace() will be reported with an offset (which is strictly more correct than currently). Otherwise there is no functional change to this case.

3) When unwinding the current task with regs, start_backtrace() is called with dump_backtrace() as the PC and __builtin_frame_address(0) as the next frame, and the unwind is performed silently until the next frame is the frame pointed to by regs->fp. Reporting starts from regs->pc and continues from the frame in regs->fp. Historically, this pre-unwind was necessary to correctly record return addresses rewritten by the ftrace graph caller, but this is no longer necessary as these are now recovered using the FP since commit: c6d3cd32 ("arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR"). This pre-unwind is not necessary to recover return addresses rewritten by kretprobes, which historically were not recovered, and are now recovered using the FP since commit: cd9bc2c9 ("arm64: Recover kretprobe modified return address in stacktrace"). Thus, this is functionally equivalent to calling arch_stack_walk() with the current task and regs, which will start with regs->pc as the PC and regs->fp as the next frame, without a pre-unwind.

This patch makes dump_backtrace() use arch_stack_walk(). This simplifies dump_backtrace() and will permit subsequent changes to the unwind code. Aside from the improved reporting when unwinding current without regs, there should be no functional change as a result of this patch. Signed-off-by:
Madhavan T. Venkataraman <madvenka@linux.microsoft.com> [Mark: elaborate commit message] Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Reviewed-by:
Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211129142849.3056714-9-mark.rutland@arm.com
Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com>
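A hedged sketch of where this lands (simplified: the in-tree function also validates regs/tsk and takes a reference on the task stack, and the exact printk format specifier may differ):

    static bool dump_backtrace_entry(void *arg, unsigned long where)
    {
            char *loglvl = arg;

            printk("%s %pSb\n", loglvl, (void *)where);
            return true;    /* keep walking until the unwinder is done */
    }

    void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
                        const char *loglvl)
    {
            printk("%sCall trace:\n", loglvl);
            arch_stack_walk(dump_backtrace_entry, (void *)loglvl, tsk, regs);
    }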
-
Madhavan T. Venkataraman authored
To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to substantially rework arm64's unwinding code. As part of this, we want to minimize the set of unwind interfaces we expose, and avoid open-coding of unwind logic outside of stacktrace.c.

Currently profile_pc() walks the stack of an interrupted context by calling start_backtrace() with the context's PC and FP, and iterating unwind steps using walk_stackframe(). This is functionally equivalent to calling arch_stack_walk() with the interrupted context's pt_regs, which will start with the PC and FP from the regs.

Make profile_pc() use arch_stack_walk(). This simplifies profile_pc(), and in future will allow us to make walk_stackframe() private to stacktrace.c. At the same time, we remove the early return for when regs->pc is not in lock functions, as this will be handled by the first call to the profile_pc_cb() callback.

There should be no functional change as a result of this patch. Signed-off-by:
Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Reviewed-by:
Mark Rutland <mark.rutland@arm.com> [Mark: remove early return, elaborate commit message, fix includes] Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Reviewed-by:
Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211129142849.3056714-8-mark.rutland@arm.com
Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com>
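A sketch of the converted function (close to the 5.16 shape described above; the callback returns false to stop the walk once a PC outside the lock functions is found):

    static bool profile_pc_cb(void *arg, unsigned long pc)
    {
            unsigned long *prof_pc = arg;

            if (in_lock_functions(pc))
                    return true;    /* keep unwinding */

            *prof_pc = pc;
            return false;           /* found it; terminate the walk */
    }

    unsigned long profile_pc(struct pt_regs *regs)
    {
            unsigned long prof_pc = 0;

            arch_stack_walk(profile_pc_cb, &prof_pc, current, regs);

            return prof_pc;
    }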
-
Madhavan T. Venkataraman authored
To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to substantially rework arm64's unwinding code. As part of this, we want to minimize the set of unwind interfaces we expose, and avoid open-coding of unwind logic outside of stacktrace.c.

Currently return_address() walks the stack of the current task by calling start_backtrace() with return_address as the PC and the frame pointer of return_address() as the next frame, iterating unwind steps using walk_stackframe(). This is functionally equivalent to calling arch_stack_walk() for the current stack, which will start from its caller (i.e. return_address()) as the PC and its caller's frame record as the next frame.

Make return_address() use arch_stack_walk(). This simplifies return_address(), and in future will allow us to make walk_stackframe() private to stacktrace.c.

There should be no functional change as a result of this patch. Signed-off-by:
Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Tested-by:
Mark Rutland <mark.rutland@arm.com> Reviewed-by:
Mark Brown <broonie@kernel.org> Reviewed-by:
Mark Rutland <mark.rutland@arm.com> [Mark: elaborate commit message, fix includes] Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20211129142849.3056714-7-mark.rutland@arm.com
Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com>
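A sketch of the converted function, assuming the historical arm64 convention of skipping two extra frames (return_address() itself plus its entry into the walker) before counting 'level':

    struct return_address_data {
            unsigned int level;
            void *addr;
    };

    static bool save_return_addr(void *d, unsigned long pc)
    {
            struct return_address_data *data = d;

            if (!data->level) {
                    data->addr = (void *)pc;
                    return false;   /* reached the requested level; stop */
            }
            --data->level;
            return true;            /* keep unwinding */
    }

    void *return_address(unsigned int level)
    {
            struct return_address_data data;

            data.level = level + 2;
            data.addr = NULL;

            arch_stack_walk(save_return_addr, &data, current, NULL);

            return data.level ? NULL : data.addr;
    }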
-
Madhavan T. Venkataraman authored
To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to substantially rework arm64's unwinding code. As part of this, we want to minimize the set of unwind interfaces we expose, and avoid open-coding of unwind logic outside of stacktrace.c.

Currently, __get_wchan() walks the stack of a blocked task by calling start_backtrace() with the task's saved PC and FP values, and iterating unwind steps using unwind_frame(). The initialization is functionally equivalent to calling arch_stack_walk() with the blocked task, which will start with the task's saved PC and FP values.

Currently __get_wchan() always performs an initial unwind step, which will skip __switch_to(), but as this is now marked as a __sched function, this no longer needs special handling and will be skipped in the same way as other sched functions.

Make __get_wchan() use arch_stack_walk(). This simplifies __get_wchan(), and in future will allow us to make unwind_frame() private to stacktrace.c. At the same time, we can simplify the try_get_task_stack() check and avoid the unnecessary `stack_page` variable.

The change to the skipping logic means we may terminate one frame earlier than previously where there are an excessive number of sched functions in the trace, but this isn't seen in practice, and wchan is best-effort anyway, so this should not be a problem.

Other than the above, there should be no functional change as a result of this patch. Signed-off-by:
Madhavan T. Venkataraman <madvenka@linux.microsoft.com> [Mark: rebase atop wchan changes, elaborate commit message, fix includes] Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Reviewed-by:
Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211129142849.3056714-6-mark.rutland@arm.com
Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com>
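A sketch of the result (mirroring the 5.16-era code; the count limit is what bounds the pathological all-sched-functions trace mentioned above):

    struct wchan_info {
            unsigned long pc;
            int count;
    };

    static bool get_wchan_cb(void *arg, unsigned long pc)
    {
            struct wchan_info *wchan_info = arg;

            if (!in_sched_functions(pc)) {
                    wchan_info->pc = pc;
                    return false;   /* first non-sched PC; stop the walk */
            }
            return wchan_info->count++ < 16;
    }

    unsigned long __get_wchan(struct task_struct *p)
    {
            struct wchan_info wchan_info = {
                    .pc = 0,
                    .count = 0,
            };

            if (!try_get_task_stack(p))
                    return 0;

            arch_stack_walk(get_wchan_cb, &wchan_info, p, NULL);

            put_task_stack(p);

            return wchan_info.pc;
    }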
-
Madhavan T. Venkataraman authored
To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to substantially rework arm64's unwinding code. As part of this, we want to minimize the set of unwind interfaces we expose, and avoid open-coding of unwind logic outside of stacktrace.c.

Currently perf_callchain_kernel() walks the stack of an interrupted context by calling start_backtrace() with the context's PC and FP, and iterating unwind steps using walk_stackframe(). This is functionally equivalent to calling arch_stack_walk() with the interrupted context's pt_regs, which will start with the PC and FP from the regs.

Make perf_callchain_kernel() use arch_stack_walk(). This simplifies perf_callchain_kernel(), and in future will allow us to make walk_stackframe() private to stacktrace.c.

At the same time, we update the callchain_trace() callback to check the return value of perf_callchain_store(), which indicates whether there is space for any further entries. When a non-zero value is returned, further calls will be ignored, and are redundant, so we can stop the unwind at this point. We also remove the stale and confusing comment for callchain_trace().

There should be no functional change as a result of this patch. Signed-off-by:
Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Tested-by:
Mark Rutland <mark.rutland@arm.com> Reviewed-by:
Mark Brown <broonie@kernel.org> Reviewed-by:
Mark Rutland <mark.rutland@arm.com> [Mark: elaborate commit message, remove comment, fix includes] Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20211129142849.3056714-5-mark.rutland@arm.com
Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com>
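A sketch of the updated callback and its caller (the guest-mode early return is elided; perf_callchain_store() returns non-zero once the entry buffer is full, which now terminates the walk):

    static bool callchain_trace(void *data, unsigned long pc)
    {
            struct perf_callchain_entry_ctx *entry = data;

            /* A non-zero return means no more space; stop unwinding. */
            return perf_callchain_store(entry, pc) == 0;
    }

    void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
                               struct pt_regs *regs)
    {
            arch_stack_walk(callchain_trace, entry, current, regs);
    }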
-
Mark Rutland authored
Unlike most architectures (and only in keeping with powerpc), arm64 has a non __sched() function on the path to our cpu_switch_to() assembly function.

It is expected that for a blocked task, in_sched_functions() can be used to skip all functions between the raw context switch assembly and the scheduler functions that call into __switch_to(). This is the behaviour expected by stack_trace_consume_entry_nosched(), and the behaviour we'd like to have such that we can simplify arm64's __get_wchan() implementation to use arch_stack_walk().

This patch marks arm64's __switch_to() as __sched. This *will not* change the behaviour of arm64's current __get_wchan() implementation, which always performs an initial unwind step which skips __switch_to(). This *will* change the behaviour of stack_trace_consume_entry_nosched() and stack_trace_save_tsk() to match their expected behaviour on blocked tasks, skipping all scheduler-internal functions including __switch_to().

Other than the above, there should be no functional change as a result of this patch. Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Reviewed-by:
Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211129142849.3056714-4-mark.rutland@arm.com
Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com>
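The annotation itself is tiny; a sketch follows (the real arm64 __switch_to() is also __notrace_funcgraph and performs considerably more state handling than shown). __sched places the function in the .sched.text section, which is exactly the address range in_sched_functions() tests:

    struct task_struct *__sched __switch_to(struct task_struct *prev,
                                            struct task_struct *next)
    {
            /* ... fpsimd/TLS/pointer-auth state handling elided ... */
            return cpu_switch_to(prev, next);
    }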
-
Mark Rutland authored
We added stack_info::kr_cur in commit: cd9bc2c9 ("arm64: Recover kretprobe modified return address in stacktrace") ... but didn't add anything in the corresponding comment block. For consistency, add a corresponding comment. Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Reviewed-by:
Mark Brown <broonie@kernel.org> Reviewed-by:
Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20211129142849.3056714-3-mark.rutland@arm.com
Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com>
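A sketch of the kind of comment added, assuming the 5.16-era layout of the unwinder's frame record state (other members elided):

    struct stackframe {
            unsigned long fp;
            unsigned long pc;
    #ifdef CONFIG_KRETPROBES
            /*
             * When KRETPROBES is selected, holds the kretprobe instance
             * associated with the most recently encountered replacement
             * return address.
             */
            struct llist_node *kr_cur;
    #endif
    };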
-
Peter Zijlstra authored
Make arch_stack_walk() available for ARCH_STACKWALK architectures without it being entangled in STACKTRACE. Link: https://lore.kernel.org/lkml/20211022152104.356586621@infradead.org/
Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> [Mark: rebase, drop unnecessary arm change] Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20211129142849.3056714-2-mark.rutland@arm.com
Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com>
-
- 28 Nov, 2021 8 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds authored
Pull vhost,virtio,vdpa bugfixes from Michael Tsirkin: "Misc fixes all over the place. Revert of virtio used length validation series: the approach taken does not seem to work, breaking too many guests in the process. We'll need to do length validation using some other approach" [ This merge also ends up reverting commit f7a36b03 ("vsock/virtio: suppress used length validation"), which came in through the networking tree in the meantime, and was part of that whole used length validation series - Linus ] * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa_sim: avoid putting an uninitialized iova_domain vhost-vdpa: clean irqs before reseting vdpa device virtio-blk: modify the value type of num in virtio_queue_rq() vhost/vsock: cleanup removing `len` variable vhost/vsock: fix incorrect used length reported to the guest Revert "virtio_ring: validate used buffer length" Revert "virtio-net: don't let virtio core to validate used length" Revert "virtio-blk: don't let virtio core to validate used length" Revert "virtio-scsi: don't let virtio core to validate used buffer length"
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull x86 build fix from Thomas Gleixner: "A single fix for a missing __init annotation of prepare_command_line()" * tag 'x86-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Mark prepare_command_line() __init
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull scheduler fix from Thomas Gleixner: "A single scheduler fix to ensure that there is no stale KASAN shadow state left on the idle task's stack when a CPU is brought up after it was brought down before" * tag 'sched-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/scs: Reset task stack state in bringup_cpu()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull perf fix from Thomas Gleixner: "A single fix for perf to prevent it from sending SIGTRAP to another task from a trace point event as it's not possible to deliver a synchronous signal to a different task from there" * tag 'perf-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Ignore sigtrap for tracepoints destined for other tasks
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull locking fixes from Thomas Gleixner: "Two regression fixes for reader writer semaphores: - Plug a race in the lock handoff which is caused by inconsistency of the reader and writer path and can lead to corruption of the underlying counter. - down_read_trylock() is suboptimal when the lock is contended and multiple readers trylock concurrently. That's due to the initial value being read non-atomically which results in at least two compare exchange loops. Making the initial readout atomic reduces this significantly. With 40 readers, by 11% in a benchmark which enforces contention on mmap_sem" * tag 'locking-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rwsem: Optimize down_read_trylock() under highly contended case locking/rwsem: Make handoff bit handling more consistent
-
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Linus Torvalds authored
Pull another tracing fix from Steven Rostedt: "Fix the fix of pid filtering. The setting of the pid filtering flag tested the "trace only this pid" case twice, and ignored the "trace everything but this pid" case. The 5.15 kernel does things a little differently due to the new sparse pid mask introduced in 5.16, and as the bug was discovered running the 5.15 kernel, and the first fix was initially done for that kernel, that fix handled both cases (only pid and all but pid), but the forward port to 5.16 created this bug" * tag 'trace-v5.16-rc2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Test the 'Do not trace this pid' case in create event
-
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Linus Torvalds authored
Pull iommu fixes from Joerg Roedel: - Intel VT-d fixes: - Remove unused PASID_DISABLED - Fix RCU locking - Fix for the unmap_pages call-back - Rockchip RK3568 address mask fix - AMD IOMMUv2 log message clarification * tag 'iommu-fixes-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Fix unmap_pages support iommu/vt-d: Fix an unbalanced rcu_read_lock/rcu_read_unlock() iommu/rockchip: Fix PAGE_DESC_HI_MASKs for RK3568 iommu/amd: Clarify AMD IOMMUv2 initialization messages iommu/vt-d: Remove unused PASID_DISABLED
-
- 27 Nov, 2021 17 commits
-
-
git://git.samba.org/ksmbd
Linus Torvalds authored
Pull ksmbd fixes from Steve French: "Five ksmbd server fixes, four of them for stable: - memleak fix - fix for default data stream on filesystems that don't support xattr - error logging fix - session setup fix - minor doc cleanup" * tag '5.16-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: fix memleak in get_file_stream_info() ksmbd: contain default data stream even if xattr is empty ksmbd: downgrade addition info error msg to debug in smb2_get_info_sec() docs: filesystem: cifs: ksmbd: Fix small layout issues ksmbd: Fix an error handling path in 'smb2_sess_setup()'
-
Guenter Roeck authored
Use the architecture independent Kconfig option PAGE_SIZE_LESS_THAN_64KB to indicate that VMXNET3 requires a page size smaller than 64kB. Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Guenter Roeck authored
NTFS_RW code allocates page size dependent arrays on the stack. This results in build failures if the page size is 64k or larger. fs/ntfs/aops.c: In function 'ntfs_write_mst_block': fs/ntfs/aops.c:1311:1: error: the frame size of 2240 bytes is larger than 2048 bytes Since commit f22969a6 ("powerpc/64s: Default to 64K pages for 64 bit book3s") this affects ppc:allmodconfig builds, but other architectures supporting page sizes of 64k or larger are also affected. Increasing the maximum frame size for affected architectures just to silence this error does not really help. The frame size would have to be set to a really large value for 256k pages. Also, a large frame size could potentially result in stack overruns in this code and elsewhere and is therefore not desirable. Make NTFS_RW dependent on page sizes smaller than 64k instead. Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Cc: Anton Altaparmakov <anton@tuxera.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Guenter Roeck authored
NTFS_RW and VMXNET3 require a page size smaller than 64kB. Add generic Kconfig option for use outside architecture code to avoid architecture specific Kconfig options in that code. Suggested-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Cc: Anton Altaparmakov <anton@tuxera.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Steven Rostedt (VMware) authored
When creating a new event (via a module, kprobe, eprobe, etc), the descriptors that are created must add flags for pid filtering if an instance has pid filtering enabled, as the flags are used at the time the event is executed to know if pid filtering should be done or not. The "Only trace this pid" case was added, but a cut and paste error made that case checked twice, instead of checking the "Trace all but this pid" case. Link: https://lore.kernel.org/all/202111280401.qC0z99JB-lkp@intel.com/ Fixes: 6cb20650 ("tracing: Check pid filtering when creating events") Reported-by:
kernel test robot <lkp@intel.com> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
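A sketch of the corrected check, assuming the 5.16 trace_pid_list API that replaced the old pid bitmask (simplified from kernel/trace/trace_events.c; the buggy version tested filtered_pids twice, while the fix makes the second test use filtered_no_pids):

    /* Flag the file if either pid list is active, so pid filtering
     * is applied when the event fires. */
    unsigned int first;

    if (trace_pid_list_first(tr->filtered_pids, &first) ||
        trace_pid_list_first(tr->filtered_no_pids, &first))
            file->flags |= EVENT_FILE_FL_PID_FILTER;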
-
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds authored
Pull xfs fixes from Darrick Wong: "Fixes for a resource leak and a build robot complaint about totally dead code: - Fix buffer resource leak that could lead to livelock on corrupt fs. - Remove unused function xfs_inew_wait to shut up the build robots" * tag 'xfs-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove xfs_inew_wait xfs: Fix the free logic of state in xfs_attr_node_hasname
-
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds authored
Pull iomap fixes from Darrick Wong: "A single iomap bug fix and a cleanup for 5.16-rc2. The bug fix changes how iomap deals with reading from an inline data region -- whereas the current code (incorrectly) lets the iomap read iter try for more bytes after reading the inline region (which zeroes the rest of the page!) and hopes the next iteration terminates, we surveyed the inline data implementations and realized that all of them also require that the inline data region end at EOF, so we can simply terminate the read. The second patch documents these assumptions in the code so that they're not subtle implications anymore, and cleans up some of the grosser parts of that function. Summary: - Fix an accounting problem where unaligned inline data reads can run off the end of the read iomap iterator. iomap has historically required that inline data mappings only exist at the end of a file, though this wasn't documented anywhere. - Document iomap_read_inline_data and change its return type to be appropriate for the information that it's actually returning" * tag 'iomap-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: iomap_read_inline_data cleanup iomap: Fix inline extent handling in iomap_readpage
-
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Linus Torvalds authored
Pull tracing fixes from Steven Rostedt: "Two fixes to event pid filtering: - Make sure newly created events reflect the current state of pid filtering - Take pid filtering into account when recording trigger events. (Also clean up the if statement to be cleaner)" * tag 'trace-v5.16-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix pid filtering when triggers are attached tracing: Check pid filtering when creating events
-
git://git.kernel.dk/linux-block
Linus Torvalds authored
Pull more io_uring fixes from Jens Axboe: "The locking fixup that was applied earlier this rc has both a deadlock and IRQ safety issue, let's get that ironed out before -rc3. This contains: - Link traversal locking fix (Pavel) - Cancelation fix (Pavel) - Relocate cond_resched() for huge buffer chain freeing, avoiding a softlockup warning (Ye) - Fix timespec validation (Ye)" * tag 'io_uring-5.16-2021-11-27' of git://git.kernel.dk/linux-block: io_uring: Fix undefined-behaviour in io_issue_sqe io_uring: fix soft lockup when call __io_remove_buffers io_uring: fix link traversal locking io_uring: fail cancellation for EXITING tasks
-
git://git.kernel.dk/linux-block
Linus Torvalds authored
Pull more block fixes from Jens Axboe: "Turns out that the flushing out of pending fixes before the Thanksgiving break didn't quite work out in terms of timing, so here's a followup set of fixes: - rq_qos_done() should be called regardless of whether or not we're the final put of the request, it's not related to the freeing of the state. This fixes an IO stall with wbt that a few users have reported, a regression in this release. - Only define zram_wb_devops if it's used, fixing a compilation warning for some compilers" * tag 'block-5.16-2021-11-27' of git://git.kernel.dk/linux-block: zram: only make zram_wb_devops for CONFIG_ZRAM_WRITEBACK block: call rq_qos_done() before ref check in batch completions
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds authored
Pull SCSI fixes from James Bottomley: "Twelve fixes, eleven in drivers (target, qla2xx, scsi_debug, mpt3sas, ufs). The core fix is a minor correction to the previous state update fix for the iscsi daemons" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: scsi_debug: Zero clear zones at reset write pointer scsi: core: sysfs: Fix setting device state to SDEV_RUNNING scsi: scsi_debug: Sanity check block descriptor length in resp_mode_select() scsi: target: configfs: Delete unnecessary checks for NULL scsi: target: core: Use RCU helpers for INQUIRY t10_alua_tg_pt_gp scsi: mpt3sas: Fix incorrect system timestamp scsi: mpt3sas: Fix system going into read-only mode scsi: mpt3sas: Fix kernel panic during drive powercycle test scsi: ufs: ufs-mediatek: Add put_device() after of_find_device_by_node() scsi: scsi_debug: Fix type in min_t to avoid stack OOB scsi: qla2xxx: edif: Fix off by one bug in qla_edif_app_getfcinfo() scsi: ufs: ufshpb: Fix warning in ufshpb_set_hpb_read_to_upiu()
-
git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds authored
Pull NFS client fixes from Trond Myklebust: "Highlights include: Stable fixes: - NFSv42: Fix pagecache invalidation after COPY/CLONE Bugfixes: - NFSv42: Don't fail clone() just because the server failed to return post-op attributes - SUNRPC: use different lockdep keys for INET6 and LOCAL - NFSv4.1: handle NFS4ERR_NOSPC from CREATE_SESSION - SUNRPC: fix header include guard in trace header" * tag 'nfs-for-5.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: use different lock keys for INET6 and LOCAL sunrpc: fix header include guard in trace header NFSv4.1: handle NFS4ERR_NOSPC by CREATE_SESSION NFSv42: Fix pagecache invalidation after COPY/CLONE NFS: Add a tracepoint to show the results of nfs_set_cache_invalid() NFSv42: Don't fail clone() unless the OP_CLONE operation failed
-
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Linus Torvalds authored
Pull erofs fix from Gao Xiang: "Fix an ABBA deadlock introduced by XArray conversion" * tag 'erofs-for-5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix deadlock when shrink erofs slab
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Linus Torvalds authored
Pull powerpc fixes from Michael Ellerman: "Fix KVM using a Power9 instruction on earlier CPUs, which could lead to the host SLB being incorrectly invalidated and a subsequent host crash. Fix kernel hardlockup on vmap stack overflow on 32-bit. Thanks to Christophe Leroy, Nicholas Piggin, and Fabiano Rosas" * tag 'powerpc-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/32: Fix hardlockup on vmap stack overflow KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB
-
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Linus Torvalds authored
Pull MIPS fixes from Thomas Bogendoerfer: - build fix for ZSTD enabled configs - fix for preempt warning - fix for loongson FTLB detection - fix for page table level selection * tag 'mips-fixes_5.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: use 3-level pgtable for 64KB page size on MIPS_VA_BITS_48 MIPS: loongson64: fix FTLB configuration MIPS: Fix using smp_processor_id() in preemptible in show_cpuinfo() MIPS: boot/compressed/: add __ashldi3 to target for ZSTD compression
-
Ye Bin authored
We got an issue as follows:

================================================================================
UBSAN: Undefined behaviour in ./include/linux/ktime.h:42:14
signed integer overflow: -4966321760114568020 * 1000000000 cannot be represented in type 'long long int'
CPU: 1 PID: 2186 Comm: syz-executor.2 Not tainted 4.19.90+ #12
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x3f0 arch/arm64/kernel/time.c:78
 show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x170/0x1dc lib/dump_stack.c:118
 ubsan_epilogue+0x18/0xb4 lib/ubsan.c:161
 handle_overflow+0x188/0x1dc lib/ubsan.c:192
 __ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213
 ktime_set include/linux/ktime.h:42 [inline]
 timespec64_to_ktime include/linux/ktime.h:78 [inline]
 io_timeout fs/io_uring.c:5153 [inline]
 io_issue_sqe+0x42c8/0x4550 fs/io_uring.c:5599
 __io_queue_sqe+0x1b0/0xbc0 fs/io_uring.c:5988
 io_queue_sqe+0x1ac/0x248 fs/io_uring.c:6067
 io_submit_sqe fs/io_uring.c:6137 [inline]
 io_submit_sqes+0xed8/0x1c88 fs/io_uring.c:6331
 __do_sys_io_uring_enter fs/io_uring.c:8170 [inline]
 __se_sys_io_uring_enter fs/io_uring.c:8129 [inline]
 __arm64_sys_io_uring_enter+0x490/0x980 fs/io_uring.c:8129
 invoke_syscall arch/arm64/kernel/syscall.c:53 [inline]
 el0_svc_common+0x374/0x570 arch/arm64/kernel/syscall.c:121
 el0_svc_handler+0x190/0x260 arch/arm64/kernel/syscall.c:190
 el0_svc+0x10/0x218 arch/arm64/kernel/entry.S:1017
================================================================================

ktime_set() only checks whether 'secs' exceeds KTIME_SEC_MAX, so a negative value passed in can still overflow the multiplication. To address this issue, we must also check whether 'sec' is negative. Signed-off-by:
Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20211118015907.844807-1-yebin10@huawei.com
Signed-off-by:
Jens Axboe <axboe@kernel.dk>
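A sketch of the kind of guard this implies for io_timeout()'s timespec handling (field names taken from the quoted trace; the exact placement in fs/io_uring.c may differ):

    /* Reject negative values before timespec64_to_ktime(): ktime_set()
     * only range-checks seconds against KTIME_SEC_MAX, so a negative
     * tv_sec can still overflow the nanosecond multiplication. */
    if (get_timespec64(&data->ts, u64_to_user_ptr(sqe->addr)))
            return -EFAULT;
    if (data->ts.tv_sec < 0 || data->ts.tv_nsec < 0)
            return -EINVAL;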
-
Ye Bin authored
I got an issue as follows:

[ 567.094140] __io_remove_buffers: [1]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
[ 594.360799] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
[ 594.364987] Modules linked in:
[ 594.365405] irq event stamp: 604180238
[ 594.365906] hardirqs last enabled at (604180237): [<ffffffff93fec9bd>] _raw_spin_unlock_irqrestore+0x2d/0x50
[ 594.367181] hardirqs last disabled at (604180238): [<ffffffff93fbbadb>] sysvec_apic_timer_interrupt+0xb/0xc0
[ 594.368420] softirqs last enabled at (569080666): [<ffffffff94200654>] __do_softirq+0x654/0xa9e
[ 594.369551] softirqs last disabled at (569080575): [<ffffffff913e1d6a>] irq_exit_rcu+0x1ca/0x250
[ 594.370692] CPU: 2 PID: 108 Comm: kworker/u32:5 Tainted: G L 5.15.0-next-20211112+ #88
[ 594.371891] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[ 594.373604] Workqueue: events_unbound io_ring_exit_work
[ 594.374303] RIP: 0010:_raw_spin_unlock_irqrestore+0x33/0x50
[ 594.375037] Code: 48 83 c7 18 53 48 89 f3 48 8b 74 24 10 e8 55 f5 55 fd 48 89 ef e8 ed a7 56 fd 80 e7 02 74 06 e8 43 13 7b fd fb bf 01 00 00 00 <e8> f8 78 474
[ 594.377433] RSP: 0018:ffff888101587a70 EFLAGS: 00000202
[ 594.378120] RAX: 0000000024030f0d RBX: 0000000000000246 RCX: 1ffffffff2f09106
[ 594.379053] RDX: 0000000000000000 RSI: ffffffff9449f0e0 RDI: 0000000000000001
[ 594.379991] RBP: ffffffff9586cdc0 R08: 0000000000000001 R09: fffffbfff2effcab
[ 594.380923] R10: ffffffff977fe557 R11: fffffbfff2effcaa R12: ffff8881b8f3def0
[ 594.381858] R13: 0000000000000246 R14: ffff888153a8b070 R15: 0000000000000000
[ 594.382787] FS: 0000000000000000(0000) GS:ffff888399c00000(0000) knlGS:0000000000000000
[ 594.383851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 594.384602] CR2: 00007fcbe71d2000 CR3: 00000000b4216000 CR4: 00000000000006e0
[ 594.385540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 594.386474] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 594.387403] Call Trace:
[ 594.387738] <TASK>
[ 594.388042] find_and_remove_object+0x118/0x160
[ 594.389321] delete_object_full+0xc/0x20
[ 594.389852] kfree+0x193/0x470
[ 594.390275] __io_remove_buffers.part.0+0xed/0x147
[ 594.390931] io_ring_ctx_free+0x342/0x6a2
[ 594.392159] io_ring_exit_work+0x41e/0x486
[ 594.396419] process_one_work+0x906/0x15a0
[ 594.399185] worker_thread+0x8b/0xd80
[ 594.400259] kthread+0x3bf/0x4a0
[ 594.401847] ret_from_fork+0x22/0x30
[ 594.402343] </TASK>

Message from syslogd@localhost at Nov 13 09:09:54 ...
kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
[ 596.793660] __io_remove_buffers: [2099199]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680

We can reproduce this issue with the following syzkaller log:

r0 = syz_io_uring_setup(0x401, &(0x7f0000000300), &(0x7f0000003000/0x2000)=nil, &(0x7f0000ff8000/0x4000)=nil, &(0x7f0000000280)=<r1=>0x0, &(0x7f0000000380)=<r2=>0x0)
sendmsg$ETHTOOL_MSG_FEATURES_SET(0xffffffffffffffff, &(0x7f0000003080)={0x0, 0x0, &(0x7f0000003040)={&(0x7f0000000040)=ANY=[], 0x18}}, 0x0)
syz_io_uring_submit(r1, r2, &(0x7f0000000240)=@IORING_OP_PROVIDE_BUFFERS={0x1f, 0x5, 0x0, 0x401, 0x1, 0x0, 0x100, 0x0, 0x1, {0xfffd}}, 0x0)
io_uring_enter(r0, 0x3a2d, 0x0, 0x0, 0x0, 0x0)

The cause of the issue is that 'buf->list' has 2,100,000 nodes, and freeing them all without yielding occupies the CPU for long enough to trigger the soft lockup. To solve this issue, add a schedule point to the freeing loop in '__io_remove_buffers()'.
After adding the schedule point, we re-ran the test and got the following data:

[ 240.141864] __io_remove_buffers: [1]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
[ 268.408260] __io_remove_buffers: [1]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
[ 275.899234] __io_remove_buffers: [2099199]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
[ 296.741404] __io_remove_buffers: [1]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
[ 305.090059] __io_remove_buffers: [2099199]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
[ 325.415746] __io_remove_buffers: [1]start ctx=0xffff8881b92d1000 bgid=65533 buf=0xffff8881a17d8f00
[ 333.160318] __io_remove_buffers: [2099199]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
...

Fixes: 8bab4c09 ("io_uring: allow conditional reschedule for intensive iterators")
Signed-off-by:
Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20211122024737.2198530-1-yebin10@huawei.com
Signed-off-by:
Jens Axboe <axboe@kernel.dk>
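A sketch of the loop with the added schedule point (simplified from __io_remove_buffers(); the surrounding locking and accounting are elided):

    while (!list_empty(&buf->list)) {
            struct io_buffer *nxt;

            nxt = list_first_entry(&buf->list, struct io_buffer, list);
            list_del(&nxt->list);
            kfree(nxt);
            cond_resched();  /* yield so a multi-million-node list can't hog the CPU */
    }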
-
- 26 Nov, 2021 6 commits
-
-
Steven Rostedt (VMware) authored
If an event is filtered by pid and a trigger that requires processing of the event is attached to it, the discard portion does not take the pid filtering into account, and the event will then be recorded when it should not have been. Cc: stable@vger.kernel.org Fixes: 3fdaf80f ("tracing: Implement event pid filtering") Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
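A hedged sketch of the missing piece of the discard logic (simplified: the real helper in trace_events.h also handles soft-disable and predicate filtering, and the surrounding control flow differs from what is shown here):

    /* Also discard the event when the instance's pid filter rejects
     * the current task, not only when soft-disabled or filtered. */
    if ((file->flags & EVENT_FILE_FL_PID_FILTER) &&
        trace_event_ignore_this_pid(file)) {
            __trace_event_discard_commit(buffer, event);
            return true;
    }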
-
Alex Williamson authored
When supporting only the .map and .unmap callbacks of iommu_ops, the IOMMU driver can make assumptions about the size and alignment used for mappings based on the driver provided pgsize_bitmap. VT-d previously used essentially PAGE_MASK for this bitmap as any power of two mapping was acceptably filled by native page sizes. However, with the .map_pages and .unmap_pages interface we're now getting page-size and count arguments. If we simply combine these as (page-size * count) and make use of the previous map/unmap functions internally, any size and alignment assumptions are very different. As an example, a given vfio device assignment VM will often create a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that does not support IOMMU super pages, the unmap_pages interface will ask to unmap 1024 4KB pages at the base IOVA. dma_pte_clear_level() will recurse down to level 2 of the page table where the first half of the pfn range exactly matches the entire pte level. We clear the pte, increment the pfn by the level size, but (oops) the next pte is on a new page, so we exit the loop and pop back up a level. When we then update the pfn based on that higher level, we seem to assume that the previous pfn value was at the start of the level. In this case the level size is 256K pfns, which we add to the base pfn and get a result of 0x7fe00, which is clearly greater than 0x401ff, so we're done. Meanwhile we never cleared the ptes for the remainder of the range. When the VM remaps this range, we're overwriting valid ptes and the VT-d driver complains loudly, as reported by the user report linked below. The fix for this seems relatively simple: if each iteration of the loop in dma_pte_clear_level() is assumed to clear to the end of the level pte page, then our next pfn should be calculated from level_pfn rather than our working pfn. Fixes: 3f34f125 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback") Reported-by:
Ajay Garg <ajaygargnsit@gmail.com> Signed-off-by:
Alex Williamson <alex.williamson@redhat.com> Tested-by:
Giovanni Cabiddu <giovanni.cabiddu@intel.com> Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/ Link: https://lore.kernel.org/r/163659074748.1617923.12716161410774184024.stgit@omen
Signed-off-by:
Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20211126135556.397932-3-baolu.lu@linux.intel.com
Signed-off-by:
Joerg Roedel <jroedel@suse.de>
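A sketch of the one-line fix described above (drivers/iommu/intel/iommu.c; the surrounding loop context is trimmed):

    /* Advance from the start of the level-aligned region we just
     * processed, not from the working pfn, so the remainder of the
     * range isn't skipped when popping back up a level. */
    pfn = level_pfn + level_size(level);  /* was: pfn += level_size(level); */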
-
Christophe JAILLET authored
If we return -EOPNOTSUPP, the RCU lock remains locked. This is spurious. Go through the end of the function instead. This way, the missing 'rcu_read_unlock()' is called. Fixes: 7afd7f6a ("iommu/vt-d: Check FL and SL capability sanity in scalable mode") Signed-off-by:
Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/40cc077ca5f543614eab2a10e84d29dd190273f6.1636217517.git.christophe.jaillet@wanadoo.fr
Signed-off-by:
Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20211126135556.397932-2-baolu.lu@linux.intel.com
Signed-off-by:
Joerg Roedel <jroedel@suse.de>
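A sketch of the pattern being fixed (the condition and variable names here are illustrative, not the exact VT-d code; the point is to break out to the common rcu_read_unlock() rather than returning with the lock held):

    int ret = 0;

    rcu_read_lock();
    for_each_active_iommu(iommu, drhd) {
            if (!sm_supported(iommu)) {     /* illustrative check */
                    ret = -EOPNOTSUPP;      /* record, don't return here */
                    break;
            }
            /* ... */
    }
    rcu_read_unlock();

    return ret;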
-
Alex Bee authored
With the submission of the iommu driver for RK3568 a subtle bug was introduced: PAGE_DESC_HI_MASK1 and PAGE_DESC_HI_MASK2 have to be the other way around - that leads to random errors, especially when addresses beyond 32 bit are used. Fix it. Fixes: c55356c5 ("iommu: rockchip: Add support for iommu v2") Signed-off-by:
Alex Bee <knaerzche@gmail.com> Tested-by:
Peter Geis <pgwipeout@gmail.com> Reviewed-by:
Heiko Stuebner <heiko@sntech.de> Tested-by:
Dan Johansen <strit@manjaro.org> Reviewed-by:
Benjamin Gaignard <benjamin.gaignard@collabora.com> Link: https://lore.kernel.org/r/20211124021325.858139-1-knaerzche@gmail.com
Signed-off-by:
Joerg Roedel <jroedel@suse.de>
-
Joerg Roedel authored
The messages printed on the initialization of the AMD IOMMUv2 driver have caused some confusion in the past. Clarify the messages to lower the confusion in the future. Cc: stable@vger.kernel.org Signed-off-by:
Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20211123105507.7654-3-joro@8bytes.org
-
Joerg Roedel authored
The macro is unused after commit 00ecd540 so it can be removed. Reported-by:
Linus Torvalds <torvalds@linux-foundation.org> Fixes: 00ecd540 ("iommu/vt-d: Clean up unused PASID updating functions") Signed-off-by:
Joerg Roedel <jroedel@suse.de> Reviewed-by:
Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20211123105507.7654-2-joro@8bytes.org
-