1. 29 Aug, 2024 5 commits
  2. 27 Aug, 2024 4 commits
      s390/ftrace: Avoid calling unwinder in ftrace_return_address() · a84dd0d8
      Vasily Gorbik authored
      ftrace_return_address() is called extremely often from
      performance-critical code paths when debugging features like
      CONFIG_TRACE_IRQFLAGS are enabled. For example, with debug_defconfig,
      ftrace selftests on my LPAR currently execute ftrace_return_address()
      as follows:
      
      ftrace_return_address(0) - 0 times (common code uses __builtin_return_address(0) instead)
      ftrace_return_address(1) - 2,986,805,401 times (with this patch applied)
      ftrace_return_address(2) - 140 times
      ftrace_return_address(>2) - 0 times
      
      The use of __builtin_return_address(n) was replaced by return_address()
      with an unwinder call by commit cae74ba8 ("s390/ftrace:
      Use unwinder instead of __builtin_return_address()") because
      __builtin_return_address(n) simply walks the stack backchain and doesn't
      check for reaching the stack top. For shallow stacks with fewer than
      "n" frames, this results in reads at low addresses and random
      memory accesses.
      
      While calling the fully functional unwinder "works", it is very slow
      for this purpose. Moreover, potentially following stack switches and
      walking past IRQ context is simply the wrong thing to do for
      ftrace_return_address().
      
      Reimplement return_address() to essentially be __builtin_return_address(n)
      with checks for reaching the stack top. Since the ftrace_return_address(n)
      argument is always a constant, keep the implementation in the header,
      allowing both GCC and Clang to unroll the loop and optimize it to the
      bare minimum.
      
      Fixes: cae74ba8 ("s390/ftrace: Use unwinder instead of __builtin_return_address()")
      Cc: stable@vger.kernel.org
      Reported-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
      Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
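
      The idea of a backchain walk that stops at the stack top can be
      sketched in plain user-space C. This is a minimal illustration, not
      the kernel implementation: struct frame, struct stack_bounds,
      frame_in_bounds() and sketch_return_address() are hypothetical names,
      and the frame layout is simplified to the two fields the walk needs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stack frame (real s390 frames hold more). */
struct frame {
	uintptr_t back_chain;	/* address of the caller's frame, 0 at the top */
	uintptr_t return_addr;	/* saved return address of this frame */
};

/* Hypothetical bounds of the current stack: [low, high). */
struct stack_bounds {
	uintptr_t low;
	uintptr_t high;
};

/* A frame is usable only if it lies completely inside the stack. */
static int frame_in_bounds(const struct stack_bounds *b, uintptr_t sp)
{
	return sp >= b->low && sp < b->high &&
	       b->high - sp >= sizeof(struct frame);
}

/*
 * Follow the backchain n times, starting at sp, and return the saved
 * return address of the frame reached. Returns 0 instead of reading
 * outside the stack when fewer than n + 1 frames exist.
 */
static uintptr_t sketch_return_address(const struct stack_bounds *b,
				       uintptr_t sp, unsigned int n)
{
	const struct frame *f;
	unsigned int i;

	if (!frame_in_bounds(b, sp))
		return 0;
	f = (const struct frame *)sp;

	/* With a constant n, a compiler can fully unroll this loop. */
	for (i = 0; i < n; i++) {
		uintptr_t next = f->back_chain;

		if (!next || !frame_in_bounds(b, next))
			return 0;
		f = (const struct frame *)next;
	}
	return f->return_addr;
}
```

      Unlike a plain backchain walk, a request for a level deeper than the
      stack yields 0 rather than a read at a low or random address.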
      s390/build: Avoid relocation information in final vmlinux · 57216cc9
      Jens Remus authored
      Since commit 778666df ("s390: compile relocatable kernel without
      -fPIE") the kernel vmlinux ELF file is linked with --emit-relocs to
      preserve all relocations, so that all absolute relocations can be
      extracted using the 'relocs' tool to adjust them during boot.
      
      Port and adapt Petr Pavlu's x86 commit 9d9173e9 ("x86/build: Avoid
      relocation information in final vmlinux") to s390 to strip all
      relocations from the final vmlinux ELF file to optimize its size.
      Following is his original commit message with minor adaptations for s390:
      
      The Linux build process on s390 roughly consists of compiling all input
      files, statically linking them into a vmlinux ELF file, and then taking
      and turning this file into an actual bzImage bootable file.
      
      vmlinux has in this process two main purposes:
      1) It is an intermediate build target on the way to produce the final
         bootable image.
      2) It is a file that is expected to be used by debuggers and standard
         ELF tooling to work with the built kernel.
      
      For the second purpose, a vmlinux file is typically collected by various
      package build recipes, such as distribution spec files, including the
      kernel's own tar-pkg target.
      
      When the kernel is built, vmlinux also contains relocation information
      produced by using the --emit-relocs linker option. This is utilized by
      subsequent build steps to create relocs.S and produce a relocatable
      image. However, the information is not needed by debuggers and other
      standard ELF tooling.
      
      The issue is then that the collected vmlinux file and hence distribution
      packages end up unnecessarily large because of this extra data. The
      following is a size comparison of vmlinux v6.10 with and without the
      relocation information:
      
        | Configuration      | With relocs | Stripped relocs |
        | defconfig          |      696 MB |          320 MB |
        | -CONFIG_DEBUG_INFO |       48 MB |           32 MB |
      
      Optimize a resulting vmlinux by adding a postlink step that splits the
      relocation information into relocs.S and then strips it from the vmlinux
      binary.

      Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Jens Remus <jremus@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      s390/ftrace: Use kernel ftrace trampoline for modules · d759be28
      Vasily Gorbik authored
      Now that both the kernel modules area and the kernel image itself are
      located within 4 GB, there is no longer a need to maintain a separate
      ftrace_plt trampoline. Use the existing trampoline in the kernel.

      Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      s390/ftrace: Remove unused ftrace_plt_template* · 017f1f0d
      Vasily Gorbik authored
      Unused since commit b860b934 ("s390/ftrace: remove dead code").

      Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
  3. 22 Aug, 2024 4 commits
  4. 21 Aug, 2024 7 commits
  5. 07 Aug, 2024 14 commits
  6. 04 Aug, 2024 6 commits
      Linux 6.11-rc2 · de9c2c66
      Linus Torvalds authored
      profiling: remove profile=sleep support · b88f5538
      Tetsuo Handa authored
      The kernel sleep profile is no longer working due to a recursive locking
      bug introduced by commit 42a20f86 ("sched: Add wrapper for get_wchan()
      to keep task blocked").
      
      Booting with the 'profile=sleep' kernel command line option added or
      executing
      
        # echo -n sleep > /sys/kernel/profiling
      
      after boot causes the system to lock up.
      
      Lockdep reports
      
        kthreadd/3 is trying to acquire lock:
        ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: get_wchan+0x32/0x70
      
        but task is already holding lock:
        ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370
      
      with the call trace being
      
         lock_acquire+0xc8/0x2f0
         get_wchan+0x32/0x70
         __update_stats_enqueue_sleeper+0x151/0x430
         enqueue_entity+0x4b0/0x520
         enqueue_task_fair+0x92/0x6b0
         ttwu_do_activate+0x73/0x140
         try_to_wake_up+0x213/0x370
         swake_up_locked+0x20/0x50
         complete+0x2f/0x40
         kthread+0xfb/0x180
      
      However, since nobody noticed this regression for more than two years,
      let's remove 'profile=sleep' support based on the assumption that nobody
      needs this functionality.
      
      Fixes: 42a20f86 ("sched: Add wrapper for get_wchan() to keep task blocked")
      Cc: stable@vger.kernel.org # v5.16+
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Merge tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a5dbd76a
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
      
       - Prevent a deadlock on cpu_hotplug_lock in the aperf/mperf driver.
      
         A recent change in the ACPI code which consolidated code paths moved
         the invocation of init_freq_invariance_cppc() to a CPU hotplug
         handler. The first invocation on AMD CPUs ends up enabling a static
         branch, which deadlocks because the static branch enable tries to
         acquire cpu_hotplug_lock but that lock is already write-held by
         the hotplug machinery.

         Use static_branch_enable_cpuslocked() instead and take the hotplug
         lock for reading in the Intel code path, which is invoked from the
         architecture code outside of the CPU hotplug operations.
      
       - Fix the number of reserved bits in the sev_config structure bit field
         so that the bitfield does not exceed 64 bits.
      
       - Add missing Zen5 model numbers
      
       - Fix the alignment assumptions of pti_clone_pgtable() and
         clone_entry_text() on 32-bit:
      
         The code assumes PMD aligned code sections, but on 32-bit the kernel
         entry text is not PMD aligned. So depending on the code size and
         location, which is configuration and compiler dependent, entry text
         can cross a PMD boundary. As the start is not PMD aligned, adding
         PMD size to the start address yields an address larger than the end
         address, which results in partially mapped entry code for user
         space. That causes endless recursion on the first entry from
         userspace (usually #PF).
      
         Cure this by aligning the start address in the addition so it ends up
         at the next PMD start address.
      
         clone_entry_text() enforces PMD mapping, but on 32-bit the tail might
         eventually be PTE mapped, which causes the mapping to fail because the PMD for
         the tail is not a large page mapping. Use PTI_LEVEL_KERNEL_IMAGE for
         the clone() invocation which resolves to PTE on 32-bit and PMD on
         64-bit.
      
       - Zero the 8-byte case for get_user() on range check failure on 32-bit
      
         The recent consolidation of the 8-byte get_user() case broke the
         zeroing in the failure case again. Establish it by clearing ECX
         before the range check and not afterwards as that obviously can't be
         reached when the range check fails.
      
      * tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/uaccess: Zero the 8-byte get_range case on failure on 32-bit
        x86/mm: Fix pti_clone_entry_text() for i386
        x86/mm: Fix pti_clone_pgtable() alignment assumption
        x86/setup: Parse the builtin command line before merging
        x86/CPU/AMD: Add models 0x60-0x6f to the Zen5 range
        x86/sev: Fix __reserved field in sev_config
        x86/aperfmperf: Fix deadlock on cpu_hotplug_lock
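
      The pti_clone_pgtable() alignment problem described above can be
      reproduced with plain integer arithmetic. The sketch below is only an
      illustration: SKETCH_PMD_SIZE is an assumed 2 MiB value, and
      next_addr_naive(), next_addr_aligned() and pmds_visited() are
      hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

/* Assumed PMD size for illustration only (2 MiB). */
#define SKETCH_PMD_SIZE	(UINT64_C(1) << 21)
#define SKETCH_PMD_MASK	(~(SKETCH_PMD_SIZE - 1))

/* Naive step: only correct when addr is already PMD aligned. */
static uint64_t next_addr_naive(uint64_t addr)
{
	return addr + SKETCH_PMD_SIZE;
}

/* Aligned step: round down first, so the result is the next PMD start. */
static uint64_t next_addr_aligned(uint64_t addr)
{
	return (addr & SKETCH_PMD_MASK) + SKETCH_PMD_SIZE;
}

/* Count how many PMD-sized regions a walk over [start, end) visits. */
static unsigned int pmds_visited(uint64_t start, uint64_t end,
				 uint64_t (*step)(uint64_t))
{
	unsigned int count = 0;
	uint64_t addr;

	for (addr = start; addr < end; addr = step(addr))
		count++;
	return count;
}
```

      For start 0x1ff000 and end 0x201000, a range that crosses the PMD
      boundary at 0x200000, the naive step visits one region and skips the
      tail, while the aligned step visits both. For an already aligned
      start the two steps behave identically.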
      Merge tag 'timers-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 61ca6c78
      Linus Torvalds authored
      Pull timer fixes from Thomas Gleixner:
       "Two fixes for the timer/clocksource code:
      
         - The recent fix to make the takeover of the broadcast timer more
           reliable retrieves a per CPU pointer in preemptible context.

           This went unnoticed in testing as some compilers hoist the access
           into the non-preemptible section where the pointer is actually
           used, but obviously compilers can rightfully invoke it where the
           code put it.

           Move it into the non-preemptible section right to the actual usage
           site to cure it.
      
         - The clocksource watchdog is supposed to emit a warning when the
           retry count is greater than one and the number of retries reaches
           the limit.
      
           The condition is backwards and always warns when the count is
           greater than one. Fix up the condition to prevent spamming dmesg"
      
      * tag 'timers-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource: Fix brown-bag boolean thinko in cs_watchdog_read()
        tick/broadcast: Move per CPU pointer access into the atomic section
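
      The clocksource watchdog condition can be illustrated with two small
      predicates. This is a sketch under assumptions: the function and
      parameter names are hypothetical, and the "||" form is only one way
      to reproduce the described symptom, not a quote of the kernel code.

```c
#include <stdbool.h>

/* Intended rule: warn only if there was more than one retry AND the
 * number of retries reached the limit. */
static bool should_warn_intended(int nretries, int max_retries)
{
	return nretries > 1 && nretries >= max_retries;
}

/* Assumed faulty variant: warns whenever there was more than one retry,
 * regardless of whether the limit was reached. */
static bool should_warn_too_eager(int nretries, int max_retries)
{
	return nretries > 1 || nretries >= max_retries;
}
```

      With two retries and a limit of five, the eager variant already
      warns while the intended rule stays silent until the limit is hit.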
      Merge tag 'sched-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6cc82dc2
      Linus Torvalds authored
      Pull scheduler fixes from Thomas Gleixner:
      
       - When stime is larger than rtime due to accounting imprecision, then
         utime = rtime - stime becomes negative. As this is unsigned math, the
         result becomes a huge positive number.
      
         Cure it by resetting stime to rtime in that case, so utime becomes 0.
      
       - Restore consistent state when sched_cpu_deactivate() fails.
      
         When offlining a CPU fails in sched_cpu_deactivate() after the SMT
         present counter has been decremented, then the function aborts but
         fails to increment the SMT present counter and leaves it imbalanced.
         Consecutive operations cause it to underflow. Add the missing fixup
         for the error path.
      
         For SMT accounting the runqueue needs to be marked online again in
         the error exit path to restore consistent state.
      
      * tag 'sched-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/core: Fix unbalance set_rq_online/offline() in sched_cpu_deactivate()
        sched/core: Introduce sched_set_rq_on/offline() helper
        sched/smt: Fix unbalance sched_smt_present dec/inc
        sched/smt: Introduce sched_smt_present_inc/dec() helper
        sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime
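
      The unsigned wraparound behind the cputime fix is easy to show in
      isolation. The helpers below are a hypothetical sketch with made-up
      names; they only mirror the arithmetic described in the message,
      not the surrounding kernel accounting code.

```c
#include <stdint.h>

/* Unsigned subtraction wraps: if stime > rtime the result is huge. */
static uint64_t utime_naive(uint64_t rtime, uint64_t stime)
{
	return rtime - stime;
}

/* Clamp stime to rtime first, so utime becomes 0 instead of wrapping. */
static uint64_t utime_clamped(uint64_t rtime, uint64_t stime)
{
	if (stime > rtime)
		stime = rtime;
	return rtime - stime;
}
```

      With rtime 100 and stime 101, the naive version returns the largest
      64-bit value, while the clamped version returns 0. When stime does
      not exceed rtime, both return the same difference.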
      Merge tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1ddeb0ef
      Linus Torvalds authored
      Pull x86 perf fixes from Thomas Gleixner:
      
       - Move the smp_processor_id() invocation back into the non-preemptible
         region, so that the result is valid to use
      
       - Add the missing package C2 residency counters for Sierra Forest CPUs
         to make the newly added support actually useful
      
      * tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86: Fix smp_processor_id()-in-preemptible warnings
        perf/x86/intel/cstate: Add pkg C2 residency counter for Sierra Forest