1. 10 Dec, 2021 9 commits
    • arm64: Make some stacktrace functions private · d2d1d264
      Mark Rutland authored
      Now that open-coded stack unwinds have been converted to
      arch_stack_walk(), we no longer need to expose any of unwind_frame(),
      walk_stackframe(), or start_backtrace() outside of stacktrace.c.
      
      Make those functions private to stacktrace.c, removing their prototypes
      from <asm/stacktrace.h> and marking them static.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
      Link: https://lore.kernel.org/r/20211129142849.3056714-10-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Make dump_backtrace() use arch_stack_walk() · 2dad6dc1
      Madhavan T. Venkataraman authored
      To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to
      substantially rework arm64's unwinding code. As part of this, we want to
      minimize the set of unwind interfaces we expose, and avoid open-coding
      of unwind logic.
      
      Currently, dump_backtrace() walks the stack of the current task or a
      blocked task by calling start_backtrace() and iterating unwind steps
      using unwind_frame(). This can be written more simply in terms of
      arch_stack_walk(), considering three distinct cases:
      
      1) When unwinding a blocked task, start_backtrace() is called with the
         blocked task's saved PC and FP, and the unwind proceeds immediately
         from this point without skipping any entries. This is functionally
         equivalent to calling arch_stack_walk() with the blocked task, which
         will start with the task's saved PC and FP.
      
         There is no functional change to this case.
      
      2) When unwinding the current task without regs, start_backtrace() is
         called with dump_backtrace() as the PC and __builtin_frame_address(0)
         as the next frame, and the unwind proceeds immediately without
         skipping. This is *almost* functionally equivalent to calling
         arch_stack_walk() for the current task, which will start with its
         caller (i.e. an offset into dump_backtrace()) as the PC, and the
         caller's frame record as the next frame.
      
         The only difference is that dump_backtrace() will be reported with
         an offset (which is strictly more correct than before). Otherwise
         there is no functional change to this case.
      
      3) When unwinding the current task with regs, start_backtrace() is
         called with dump_backtrace() as the PC and __builtin_frame_address(0)
         as the next frame, and the unwind is performed silently until the
         next frame is the frame pointed to by regs->fp. Reporting starts
         from regs->pc and continues from the frame in regs->fp.
      
         Historically, this pre-unwind was necessary to correctly record
         return addresses rewritten by the ftrace graph caller, but this is
         no longer necessary as these are now recovered using the FP since
         commit:
      
         c6d3cd32 ("arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR")
      
         This pre-unwind is not necessary to recover return addresses
         rewritten by kretprobes, which historically were not recovered, and
         are now recovered using the FP since commit:
      
         cd9bc2c9 ("arm64: Recover kretprobe modified return address in stacktrace")
      
         Thus, this is functionally equivalent to calling arch_stack_walk()
         with the current task and regs, which will start with regs->pc as the
         PC and regs->fp as the next frame, without a pre-unwind.
      
      This patch makes dump_backtrace() use arch_stack_walk(). This simplifies
      dump_backtrace() and will permit subsequent changes to the unwind code.
      
      Aside from the improved reporting when unwinding current without regs,
      there should be no functional change as a result of this patch.
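      The arch_stack_walk() flow described above is callback-driven: the
      unwinder hands each frame's PC to a consume callback and stops when
      that callback returns false. A stand-alone sketch of that shape
      (toy_stack_walk() and dump_entry() are illustrative stand-ins, not
      the kernel API, though the consume-function signature matches the
      kernel's stack_trace_consume_fn):

```c
#include <stdbool.h>
#include <stdio.h>

/* Same shape as the kernel's stack_trace_consume_fn. */
typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long pc);

/*
 * Walk a pretend stack (an array of PCs), handing each frame to the
 * consume callback; a false return stops the unwind early. Returns the
 * number of frames actually visited.
 */
static int toy_stack_walk(stack_trace_consume_fn fn, void *cookie,
			  const unsigned long *pcs, int n)
{
	int walked = 0;

	for (int i = 0; i < n; i++) {
		walked++;
		if (!fn(cookie, pcs[i]))
			break;
	}
	return walked;
}

/* dump_backtrace()-style consumer: print one numbered line per frame. */
static bool dump_entry(void *cookie, unsigned long pc)
{
	int *level = cookie;

	printf(" #%d: %#lx\n", (*level)++, pc);
	return true;
}
```

      With this shape, the three cases above differ only in where the
      initial PC/FP pair comes from (a blocked task's saved values, the
      current frame, or regs); the walk loop itself is shared.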
      Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
      [Mark: elaborate commit message]
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20211129142849.3056714-9-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Make profile_pc() use arch_stack_walk() · 22ecd975
      Madhavan T. Venkataraman authored
      To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to
      substantially rework arm64's unwinding code. As part of this, we want to
      minimize the set of unwind interfaces we expose, and avoid open-coding
      of unwind logic outside of stacktrace.c.
      
      Currently profile_pc() walks the stack of an interrupted context by
      calling start_backtrace() with the context's PC and FP, and iterating
      unwind steps using walk_stackframe(). This is functionally equivalent to
      calling arch_stack_walk() with the interrupted context's pt_regs, which
      will start with the PC and FP from the regs.
      
      Make profile_pc() use arch_stack_walk(). This simplifies profile_pc(),
      and in future will allow us to make walk_stackframe() private to
      stacktrace.c.
      
      At the same time, we remove the early return for the case where
      regs->pc is not in lock functions, as this will be handled by the first call to the
      profile_pc_cb() callback.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      [Mark: remove early return, elaborate commit message, fix includes]
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20211129142849.3056714-8-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Make return_address() use arch_stack_walk() · 39ef362d
      Madhavan T. Venkataraman authored
      To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to
      substantially rework arm64's unwinding code. As part of this, we want to
      minimize the set of unwind interfaces we expose, and avoid open-coding
      of unwind logic outside of stacktrace.c.
      
      Currently return_address() walks the stack of the current task by
      calling start_backtrace() with return_address as the PC and the frame
      pointer of return_address() as the next frame, iterating unwind steps
      using walk_stackframe(). This is functionally equivalent to calling
      arch_stack_walk() for the current stack, which will start from its
      caller (i.e. return_address()) as the PC and its caller's frame record
      as the next frame.
      
      Make return_address() use arch_stack_walk(). This simplifies
      return_address(), and in future will allow us to make walk_stackframe()
      private to stacktrace.c.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
      Tested-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      [Mark: elaborate commit message, fix includes]
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Link: https://lore.kernel.org/r/20211129142849.3056714-7-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Make __get_wchan() use arch_stack_walk() · 4f62bb7c
      Madhavan T. Venkataraman authored
      To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to
      substantially rework arm64's unwinding code. As part of this, we want to
      minimize the set of unwind interfaces we expose, and avoid open-coding
      of unwind logic outside of stacktrace.c.
      
      Currently, __get_wchan() walks the stack of a blocked task by calling
      start_backtrace() with the task's saved PC and FP values, and iterating
      unwind steps using unwind_frame(). The initialization is functionally
      equivalent to calling arch_stack_walk() with the blocked task, which
      will start with the task's saved PC and FP values.
      
      Currently __get_wchan() always performs an initial unwind step, which
      will skip __switch_to(), but as this is now marked as a __sched
      function, this no longer needs special handling and will be skipped in
      the same way as other sched functions.
      
      Make __get_wchan() use arch_stack_walk(). This simplifies __get_wchan(),
      and in future will allow us to make unwind_frame() private to
      stacktrace.c. At the same time, we can simplify the try_get_task_stack()
      check and avoid the unnecessary `stack_page` variable.
      
      The change to the skipping logic means we may terminate one frame
      earlier than previously where there are an excessive number of sched
      functions in the trace, but this isn't seen in practice, and wchan is
      best-effort anyway, so this should not be a problem.
      
      Other than the above, there should be no functional change as a result
      of this patch.
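      The wchan skipping behaviour can be pictured with a small
      stand-alone sketch; in_sched_range() and toy_get_wchan() below are
      hypothetical stand-ins for in_sched_functions() and the
      arch_stack_walk()-based __get_wchan(), not the kernel code:

```c
#include <stdbool.h>

/*
 * Stand-in for in_sched_functions(): pretend the scheduler's text
 * occupies the address range [0x1000, 0x2000).
 */
static bool in_sched_range(unsigned long pc)
{
	return pc >= 0x1000 && pc < 0x2000;
}

/*
 * wchan-style walk: skip leading scheduler frames and report the first
 * PC outside the sched range, or 0 if every frame is a sched function.
 */
static unsigned long toy_get_wchan(const unsigned long *pcs, int n)
{
	for (int i = 0; i < n; i++)
		if (!in_sched_range(pcs[i]))
			return pcs[i];
	return 0;
}
```

      In this model, once __switch_to() is marked __sched it falls inside
      the skipped range like any other scheduler function, so no special
      initial unwind step is needed.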
      Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
      [Mark: rebase atop wchan changes, elaborate commit message, fix includes]
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20211129142849.3056714-6-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Make perf_callchain_kernel() use arch_stack_walk() · ed876d35
      Madhavan T. Venkataraman authored
      To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to
      substantially rework arm64's unwinding code. As part of this, we want to
      minimize the set of unwind interfaces we expose, and avoid open-coding
      of unwind logic outside of stacktrace.c.
      
      Currently perf_callchain_kernel() walks the stack of an interrupted
      context by calling start_backtrace() with the context's PC and FP, and
      iterating unwind steps using walk_stackframe(). This is functionally
      equivalent to calling arch_stack_walk() with the interrupted context's
      pt_regs, which will start with the PC and FP from the regs.
      
      Make perf_callchain_kernel() use arch_stack_walk(). This simplifies
      perf_callchain_kernel(), and in future will allow us to make
      walk_stackframe() private to stacktrace.c.
      
      At the same time, we update the callchain_trace() callback to check the
      return value of perf_callchain_store(), which indicates whether there is
      space for any further entries. When a non-zero value is returned,
      further calls will be ignored, and are redundant, so we can stop the
      unwind at this point.
      
      We also remove the stale and confusing comment for callchain_trace.
      
      There should be no functional change as a result of this patch.
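      The early-stop logic can be sketched stand-alone; struct
      toy_callchain, toy_store() and toy_callchain_trace() below are
      illustrative stand-ins for the perf structures, mirroring only the
      return-value convention described above:

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_STACK 4

/* Toy stand-in for struct perf_callchain_entry. */
struct toy_callchain {
	unsigned long ip[TOY_MAX_STACK];
	size_t nr;
};

/* Mirrors perf_callchain_store(): 0 on success, non-zero when full. */
static int toy_store(struct toy_callchain *c, unsigned long ip)
{
	if (c->nr >= TOY_MAX_STACK)
		return -1;
	c->ip[c->nr++] = ip;
	return 0;
}

/*
 * The consume callback: a non-zero store result means the entry buffer
 * is full, so return false and let the unwinder stop walking instead of
 * making further (redundant, ignored) store calls.
 */
static bool toy_callchain_trace(void *cookie, unsigned long pc)
{
	return toy_store(cookie, pc) == 0;
}
```

      Once the buffer fills, the callback's false return terminates the
      walk at that frame rather than unwinding to the bottom of the stack.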
      Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
      Tested-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      [Mark: elaborate commit message, remove comment, fix includes]
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Link: https://lore.kernel.org/r/20211129142849.3056714-5-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Mark __switch_to() as __sched · 86bcbafc
      Mark Rutland authored
      Unlike most architectures (and only in keeping with powerpc), arm64 has
      a non __sched() function on the path to our cpu_switch_to() assembly
      function.
      
      It is expected that for a blocked task, in_sched_functions() can be used
      to skip all functions between the raw context switch assembly and the
      scheduler functions that call into __switch_to(). This is the behaviour
      expected by stack_trace_consume_entry_nosched(), and the behaviour we'd
      like to have such that we can simplify arm64's __get_wchan()
      implementation to use arch_stack_walk().
      
      This patch marks arm64's __switch_to() as __sched. This *will not* change
      the behaviour of arm64's current __get_wchan() implementation, which
      always performs an initial unwind step which skips __switch_to(). This
      *will* change the behaviour of stack_trace_consume_entry_nosched() and
      stack_trace_save_tsk() to match their expected behaviour on blocked
      tasks, skipping all scheduler-internal functions including
      __switch_to().
      
      Other than the above, there should be no functional change as a result
      of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20211129142849.3056714-4-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: Add comment for stack_info::kr_cur · 1e5428b2
      Mark Rutland authored
      We added stack_info::kr_cur in commit:
      
        cd9bc2c9 ("arm64: Recover kretprobe modified return address in stacktrace")
      
      ... but didn't add anything in the corresponding comment block.
      
      For consistency, add a corresponding comment.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20211129142849.3056714-3-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arch: Make ARCH_STACKWALK independent of STACKTRACE · 1614b2b1
      Peter Zijlstra authored
      Make arch_stack_walk() available for ARCH_STACKWALK architectures
      without it being entangled in STACKTRACE.
      
      Link: https://lore.kernel.org/lkml/20211022152104.356586621@infradead.org/
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      [Mark: rebase, drop unnecessary arm change]
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: https://lore.kernel.org/r/20211129142849.3056714-2-mark.rutland@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  2. 28 Nov, 2021 8 commits
  3. 27 Nov, 2021 17 commits
    • Merge tag '5.16-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd · 3498e7f2
      Linus Torvalds authored
      Pull ksmbd fixes from Steve French:
       "Five ksmbd server fixes, four of them for stable:
      
         - memleak fix
      
         - fix for default data stream on filesystems that don't support xattr
      
         - error logging fix
      
         - session setup fix
      
         - minor doc cleanup"
      
      * tag '5.16-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd:
        ksmbd: fix memleak in get_file_stream_info()
        ksmbd: contain default data stream even if xattr is empty
        ksmbd: downgrade addition info error msg to debug in smb2_get_info_sec()
        docs: filesystem: cifs: ksmbd: Fix small layout issues
        ksmbd: Fix an error handling path in 'smb2_sess_setup()'
    • vmxnet3: Use generic Kconfig option for page size limit · 00169a92
      Guenter Roeck authored
      Use the architecture independent Kconfig option PAGE_SIZE_LESS_THAN_64KB
      to indicate that VMXNET3 requires a page size smaller than 64kB.
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs: ntfs: Limit NTFS_RW to page sizes smaller than 64k · 4eec7faf
      Guenter Roeck authored
      NTFS_RW code allocates page size dependent arrays on the stack. This
      results in build failures if the page size is 64k or larger.
      
        fs/ntfs/aops.c: In function 'ntfs_write_mst_block':
        fs/ntfs/aops.c:1311:1: error:
      	the frame size of 2240 bytes is larger than 2048 bytes
      
      Since commit f22969a6 ("powerpc/64s: Default to 64K pages for 64 bit
      book3s") this affects ppc:allmodconfig builds, but other architectures
      supporting page sizes of 64k or larger are also affected.
      
      Increasing the maximum frame size for affected architectures just to
      silence this error does not really help.  The frame size would have to
      be set to a really large value for 256k pages.  Also, a large frame size
      could potentially result in stack overruns in this code and elsewhere
      and is therefore not desirable.  Make NTFS_RW dependent on page sizes
      smaller than 64k instead.
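      Expressed as a Kconfig dependency, the fix takes roughly this shape
      (a sketch of the pattern, not the exact fs/ntfs/Kconfig text):

```kconfig
config NTFS_RW
	bool "NTFS write support"
	depends on NTFS_FS
	depends on PAGE_SIZE_LESS_THAN_64KB
```

      With this, configs built for 64k-or-larger pages simply cannot
      enable NTFS_RW, so the oversized on-stack arrays are never compiled.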
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Anton Altaparmakov <anton@tuxera.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arch: Add generic Kconfig option indicating page size smaller than 64k · 1f0e290c
      Guenter Roeck authored
      NTFS_RW and VMXNET3 require a page size smaller than 64kB.  Add generic
      Kconfig option for use outside architecture code to avoid architecture
      specific Kconfig options in that code.
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Anton Altaparmakov <anton@tuxera.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tracing: Test the 'Do not trace this pid' case in create event · 27ff768f
      Steven Rostedt (VMware) authored
      When creating a new event (via a module, kprobe, eprobe, etc), the
      descriptors that are created must add flags for pid filtering if an
      instance has pid filtering enabled, as the flags are used at the time the
      event is executed to know if pid filtering should be done or not.
      
      The "Only trace this pid" case was added, but a cut and paste error made
      that case checked twice, instead of checking the "Trace all but this pid"
      case.
      
      Link: https://lore.kernel.org/all/202111280401.qC0z99JB-lkp@intel.com/
      
      Fixes: 6cb20650 ("tracing: Check pid filtering when creating events")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • Merge tag 'xfs-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 4f0dda35
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "Fixes for a resource leak and a build robot complaint about totally
        dead code:
      
         - Fix buffer resource leak that could lead to livelock on corrupt fs.
      
         - Remove unused function xfs_inew_wait to shut up the build robots"
      
      * tag 'xfs-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: remove xfs_inew_wait
        xfs: Fix the free logic of state in xfs_attr_node_hasname
    • Merge tag 'iomap-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · adfb743a
      Linus Torvalds authored
      Pull iomap fixes from Darrick Wong:
       "A single iomap bug fix and a cleanup for 5.16-rc2.
      
        The bug fix changes how iomap deals with reading from an inline data
        region -- whereas the current code (incorrectly) lets the iomap read
        iter try for more bytes after reading the inline region (which zeroes
        the rest of the page!) and hopes the next iteration terminates, we
        surveyed the inlinedata implementations and realized that all
        inlinedata implementations also require that the inlinedata region end
        at EOF, so we can simply terminate the read.
      
        The second patch documents these assumptions in the code so that
        they're not subtle implications anymore, and cleans up some of the
        grosser parts of that function.
      
        Summary:
      
         - Fix an accounting problem where unaligned inline data reads can run
           off the end of the read iomap iterator. iomap has historically
           required that inline data mappings only exist at the end of a file,
           though this wasn't documented anywhere.
      
         - Document iomap_read_inline_data and change its return type to be
           appropriate for the information that it's actually returning"
      
      * tag 'iomap-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: iomap_read_inline_data cleanup
        iomap: Fix inline extent handling in iomap_readpage
    • Merge tag 'trace-v5.16-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 86155d6b
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Two fixes to event pid filtering:
      
         - Make sure newly created events reflect the current state of pid
           filtering
      
         - Take pid filtering into account when recording trigger events.
           (Also clean up the if statement to be cleaner)"
      
      * tag 'trace-v5.16-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Fix pid filtering when triggers are attached
        tracing: Check pid filtering when creating events
    • Merge tag 'io_uring-5.16-2021-11-27' of git://git.kernel.dk/linux-block · 86799cdf
      Linus Torvalds authored
      Pull more io_uring fixes from Jens Axboe:
       "The locking fixup that was applied earlier this rc has both a deadlock
        and IRQ safety issue, let's get that ironed out before -rc3. This
        contains:
      
         - Link traversal locking fix (Pavel)
      
         - Cancelation fix (Pavel)
      
         - Relocate cond_resched() for huge buffer chain freeing, avoiding a
           softlockup warning (Ye)
      
         - Fix timespec validation (Ye)"
      
      * tag 'io_uring-5.16-2021-11-27' of git://git.kernel.dk/linux-block:
        io_uring: Fix undefined-behaviour in io_issue_sqe
        io_uring: fix soft lockup when call __io_remove_buffers
        io_uring: fix link traversal locking
        io_uring: fail cancellation for EXITING tasks
    • Merge tag 'block-5.16-2021-11-27' of git://git.kernel.dk/linux-block · 650c8edf
      Linus Torvalds authored
      Pull more block fixes from Jens Axboe:
       "Turns out that the flushing out of pending fixes before the
        Thanksgiving break didn't quite work out in terms of timing, so here's
        a followup set of fixes:
      
         - rq_qos_done() should be called regardless of whether or not we're
           the final put of the request, it's not related to the freeing of
           the state. This fixes an IO stall with wbt that a few users have
           reported, a regression in this release.
      
         - Only define zram_wb_devops if it's used, fixing a compilation
           warning for some compilers"
      
      * tag 'block-5.16-2021-11-27' of git://git.kernel.dk/linux-block:
        zram: only make zram_wb_devops for CONFIG_ZRAM_WRITEBACK
        block: call rq_qos_done() before ref check in batch completions
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 9e9fbe44
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Twelve fixes, eleven in drivers (target, qla2xx, scsi_debug, mpt3sas,
        ufs). The core fix is a minor correction to the previous state update
        fix for the iscsi daemons"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: scsi_debug: Zero clear zones at reset write pointer
        scsi: core: sysfs: Fix setting device state to SDEV_RUNNING
        scsi: scsi_debug: Sanity check block descriptor length in resp_mode_select()
        scsi: target: configfs: Delete unnecessary checks for NULL
        scsi: target: core: Use RCU helpers for INQUIRY t10_alua_tg_pt_gp
        scsi: mpt3sas: Fix incorrect system timestamp
        scsi: mpt3sas: Fix system going into read-only mode
        scsi: mpt3sas: Fix kernel panic during drive powercycle test
        scsi: ufs: ufs-mediatek: Add put_device() after of_find_device_by_node()
        scsi: scsi_debug: Fix type in min_t to avoid stack OOB
        scsi: qla2xxx: edif: Fix off by one bug in qla_edif_app_getfcinfo()
        scsi: ufs: ufshpb: Fix warning in ufshpb_set_hpb_read_to_upiu()
    • Merge tag 'nfs-for-5.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 74139277
      Linus Torvalds authored
      Pull NFS client fixes from Trond Myklebust:
       "Highlights include:
      
        Stable fixes:
      
         - NFSv42: Fix pagecache invalidation after COPY/CLONE
      
        Bugfixes:
      
         - NFSv42: Don't fail clone() just because the server failed to return
           post-op attributes
      
         - SUNRPC: use different lockdep keys for INET6 and LOCAL
      
         - NFSv4.1: handle NFS4ERR_NOSPC from CREATE_SESSION
      
         - SUNRPC: fix header include guard in trace header"
      
      * tag 'nfs-for-5.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        SUNRPC: use different lock keys for INET6 and LOCAL
        sunrpc: fix header include guard in trace header
        NFSv4.1: handle NFS4ERR_NOSPC by CREATE_SESSION
        NFSv42: Fix pagecache invalidation after COPY/CLONE
        NFS: Add a tracepoint to show the results of nfs_set_cache_invalid()
        NFSv42: Don't fail clone() unless the OP_CLONE operation failed
    • Merge tag 'erofs-for-5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 52dc4c64
      Linus Torvalds authored
      Pull erofs fix from Gao Xiang:
       "Fix an ABBA deadlock introduced by XArray conversion"
      
      * tag 'erofs-for-5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        erofs: fix deadlock when shrink erofs slab
    • Merge tag 'powerpc-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 7b65b798
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Fix KVM using a Power9 instruction on earlier CPUs, which could lead
        to the host SLB being incorrectly invalidated and a subsequent host
        crash.
      
        Fix kernel hardlockup on vmap stack overflow on 32-bit.
      
        Thanks to Christophe Leroy, Nicholas Piggin, and Fabiano Rosas"
      
      * tag 'powerpc-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/32: Fix hardlockup on vmap stack overflow
        KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB
    • Merge tag 'mips-fixes_5.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 6be08803
      Linus Torvalds authored
      Pull MIPS fixes from Thomas Bogendoerfer:
      
       - build fix for ZSTD enabled configs
      
       - fix for preempt warning
      
       - fix for loongson FTLB detection
      
       - fix for page table level selection
      
      * tag 'mips-fixes_5.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        MIPS: use 3-level pgtable for 64KB page size on MIPS_VA_BITS_48
        MIPS: loongson64: fix FTLB configuration
        MIPS: Fix using smp_processor_id() in preemptible in show_cpuinfo()
        MIPS: boot/compressed/: add __ashldi3 to target for ZSTD compression
    • io_uring: Fix undefined-behaviour in io_issue_sqe · f6223ff7
      Ye Bin authored
      We got the following issue:
      ================================================================================
      UBSAN: Undefined behaviour in ./include/linux/ktime.h:42:14
      signed integer overflow:
      -4966321760114568020 * 1000000000 cannot be represented in type 'long long int'
      CPU: 1 PID: 2186 Comm: syz-executor.2 Not tainted 4.19.90+ #12
      Hardware name: linux,dummy-virt (DT)
      Call trace:
       dump_backtrace+0x0/0x3f0 arch/arm64/kernel/time.c:78
       show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x170/0x1dc lib/dump_stack.c:118
       ubsan_epilogue+0x18/0xb4 lib/ubsan.c:161
       handle_overflow+0x188/0x1dc lib/ubsan.c:192
       __ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213
       ktime_set include/linux/ktime.h:42 [inline]
       timespec64_to_ktime include/linux/ktime.h:78 [inline]
       io_timeout fs/io_uring.c:5153 [inline]
       io_issue_sqe+0x42c8/0x4550 fs/io_uring.c:5599
       __io_queue_sqe+0x1b0/0xbc0 fs/io_uring.c:5988
       io_queue_sqe+0x1ac/0x248 fs/io_uring.c:6067
       io_submit_sqe fs/io_uring.c:6137 [inline]
       io_submit_sqes+0xed8/0x1c88 fs/io_uring.c:6331
       __do_sys_io_uring_enter fs/io_uring.c:8170 [inline]
       __se_sys_io_uring_enter fs/io_uring.c:8129 [inline]
       __arm64_sys_io_uring_enter+0x490/0x980 fs/io_uring.c:8129
       invoke_syscall arch/arm64/kernel/syscall.c:53 [inline]
       el0_svc_common+0x374/0x570 arch/arm64/kernel/syscall.c:121
       el0_svc_handler+0x190/0x260 arch/arm64/kernel/syscall.c:190
       el0_svc+0x10/0x218 arch/arm64/kernel/entry.S:1017
      ================================================================================
      
      ktime_set() only checks whether 'secs' exceeds KTIME_SEC_MAX; if we
      pass a negative value, the multiplication can still overflow.
      To address this issue, we must also check whether 'secs' is negative.
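      The overflow and the fix can be illustrated with a user-space sketch.
      This is not the kernel code: ktime_set_model() and
      timespec_valid_for_timeout() are hypothetical names that model,
      respectively, the one-sided bound check in ktime_set() and the
      negative-value validation this patch adds before the conversion.

      ```c
      #include <assert.h>
      #include <stdint.h>

      #define NSEC_PER_SEC  1000000000LL
      #define KTIME_MAX     INT64_MAX
      #define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)

      /* Simplified model of ktime_set(): only the *upper* bound on secs
       * is checked, so a large negative secs reaches the multiplication
       * and overflows (the UBSAN report above). */
      static int64_t ktime_set_model(int64_t secs, int64_t nsecs)
      {
          if (secs >= KTIME_SEC_MAX)
              return KTIME_MAX;
          return secs * NSEC_PER_SEC + nsecs; /* UB for very negative secs */
      }

      /* Hypothetical validation mirroring the fix: reject negative
       * timespec fields before converting to ktime (-EINVAL in kernel). */
      static int timespec_valid_for_timeout(int64_t tv_sec, int64_t tv_nsec)
      {
          if (tv_sec < 0 || tv_nsec < 0)
              return 0;
          return 1;
      }
      ```

      With this check in place, the value from the reproducer
      (-4966321760114568020 seconds) is rejected before the multiplication
      can occur.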
      Signed-off-by: default avatarYe Bin <yebin10@huawei.com>
      Link: https://lore.kernel.org/r/20211118015907.844807-1-yebin10@huawei.com
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f6223ff7
    • Ye Bin's avatar
      io_uring: fix soft lockup when call __io_remove_buffers · 1d0254e6
      Ye Bin authored
      I got the following issue:
      [ 567.094140] __io_remove_buffers: [1]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
      [  594.360799] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
      [  594.364987] Modules linked in:
      [  594.365405] irq event stamp: 604180238
      [  594.365906] hardirqs last  enabled at (604180237): [<ffffffff93fec9bd>] _raw_spin_unlock_irqrestore+0x2d/0x50
      [  594.367181] hardirqs last disabled at (604180238): [<ffffffff93fbbadb>] sysvec_apic_timer_interrupt+0xb/0xc0
      [  594.368420] softirqs last  enabled at (569080666): [<ffffffff94200654>] __do_softirq+0x654/0xa9e
      [  594.369551] softirqs last disabled at (569080575): [<ffffffff913e1d6a>] irq_exit_rcu+0x1ca/0x250
      [  594.370692] CPU: 2 PID: 108 Comm: kworker/u32:5 Tainted: G            L    5.15.0-next-20211112+ #88
      [  594.371891] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
      [  594.373604] Workqueue: events_unbound io_ring_exit_work
      [  594.374303] RIP: 0010:_raw_spin_unlock_irqrestore+0x33/0x50
      [  594.375037] Code: 48 83 c7 18 53 48 89 f3 48 8b 74 24 10 e8 55 f5 55 fd 48 89 ef e8 ed a7 56 fd 80 e7 02 74 06 e8 43 13 7b fd fb bf 01 00 00 00 <e8> f8 78 474
      [  594.377433] RSP: 0018:ffff888101587a70 EFLAGS: 00000202
      [  594.378120] RAX: 0000000024030f0d RBX: 0000000000000246 RCX: 1ffffffff2f09106
      [  594.379053] RDX: 0000000000000000 RSI: ffffffff9449f0e0 RDI: 0000000000000001
      [  594.379991] RBP: ffffffff9586cdc0 R08: 0000000000000001 R09: fffffbfff2effcab
      [  594.380923] R10: ffffffff977fe557 R11: fffffbfff2effcaa R12: ffff8881b8f3def0
      [  594.381858] R13: 0000000000000246 R14: ffff888153a8b070 R15: 0000000000000000
      [  594.382787] FS:  0000000000000000(0000) GS:ffff888399c00000(0000) knlGS:0000000000000000
      [  594.383851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  594.384602] CR2: 00007fcbe71d2000 CR3: 00000000b4216000 CR4: 00000000000006e0
      [  594.385540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  594.386474] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  594.387403] Call Trace:
      [  594.387738]  <TASK>
      [  594.388042]  find_and_remove_object+0x118/0x160
      [  594.389321]  delete_object_full+0xc/0x20
      [  594.389852]  kfree+0x193/0x470
      [  594.390275]  __io_remove_buffers.part.0+0xed/0x147
      [  594.390931]  io_ring_ctx_free+0x342/0x6a2
      [  594.392159]  io_ring_exit_work+0x41e/0x486
      [  594.396419]  process_one_work+0x906/0x15a0
      [  594.399185]  worker_thread+0x8b/0xd80
      [  594.400259]  kthread+0x3bf/0x4a0
      [  594.401847]  ret_from_fork+0x22/0x30
      [  594.402343]  </TASK>
      
      Message from syslogd@localhost at Nov 13 09:09:54 ...
      kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
      [  596.793660] __io_remove_buffers: [2099199]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
      
      We can reproduce this issue with the following syzkaller log:
      r0 = syz_io_uring_setup(0x401, &(0x7f0000000300), &(0x7f0000003000/0x2000)=nil, &(0x7f0000ff8000/0x4000)=nil, &(0x7f0000000280)=<r1=>0x0, &(0x7f0000000380)=<r2=>0x0)
      sendmsg$ETHTOOL_MSG_FEATURES_SET(0xffffffffffffffff, &(0x7f0000003080)={0x0, 0x0, &(0x7f0000003040)={&(0x7f0000000040)=ANY=[], 0x18}}, 0x0)
      syz_io_uring_submit(r1, r2, &(0x7f0000000240)=@IORING_OP_PROVIDE_BUFFERS={0x1f, 0x5, 0x0, 0x401, 0x1, 0x0, 0x100, 0x0, 0x1, {0xfffd}}, 0x0)
      io_uring_enter(r0, 0x3a2d, 0x0, 0x0, 0x0, 0x0)
      
      The cause of the above issue is that 'buf->list' has 2,100,000 nodes;
      iterating over all of them monopolizes the CPU and leads to a soft
      lockup.
      To solve this issue, add a scheduling point to the while loop in
      '__io_remove_buffers'.
      After adding the scheduling point, we re-ran the reproducer and got
      the following data:
      [  240.141864] __io_remove_buffers: [1]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
      [  268.408260] __io_remove_buffers: [1]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
      [  275.899234] __io_remove_buffers: [2099199]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
      [  296.741404] __io_remove_buffers: [1]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
      [  305.090059] __io_remove_buffers: [2099199]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
      [  325.415746] __io_remove_buffers: [1]start ctx=0xffff8881b92d1000 bgid=65533 buf=0xffff8881a17d8f00
      [  333.160318] __io_remove_buffers: [2099199]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
      ...
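      The scheduling-point fix described above can be sketched in user
      space. This is not the kernel code: cond_resched_stub() merely
      counts calls where the kernel would invoke cond_resched(), and the
      list walk is simplified to a singly linked free loop.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdlib.h>

      struct io_buffer {
          struct io_buffer *next;
      };

      static unsigned long resched_calls;

      /* Stands in for cond_resched(): in the kernel this yields the CPU
       * if a reschedule is pending, preventing the soft-lockup watchdog
       * from firing during a very long teardown loop. */
      static void cond_resched_stub(void)
      {
          resched_calls++;
      }

      /* Free every node, inserting a scheduling point each iteration,
       * as the fix does inside __io_remove_buffers(). */
      static unsigned long remove_buffers(struct io_buffer *head)
      {
          unsigned long removed = 0;

          while (head) {
              struct io_buffer *nxt = head->next;

              free(head);
              removed++;
              cond_resched_stub();
              head = nxt;
          }
          return removed;
      }
      ```

      With a list of ~2,100,000 nodes as in the reproducer, the loop now
      offers the scheduler a chance to run on every iteration instead of
      pinning the CPU for the whole teardown.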
      
      Fixes: 8bab4c09 ("io_uring: allow conditional reschedule for intensive iterators")
      Signed-off-by: default avatarYe Bin <yebin10@huawei.com>
      Link: https://lore.kernel.org/r/20211122024737.2198530-1-yebin10@huawei.com
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      1d0254e6
  4. 26 Nov, 2021 6 commits