1. 06 Sep, 2023 23 commits
    • LoongArch: Simplify the processing of jumping new kernel for KASLR · 9fbcc076
      Qing Zhang authored
      The modified relocate_kernel() returns the random_offset instead of the
      new kernel's entry point. In this way we share the start_kernel()
      processing with the normal kernel, which avoids calling 'jr a0' directly
      and allows some other operations (e.g., kasan_early_init) before
      start_kernel() when KASLR (CONFIG_RANDOMIZE_BASE) is turned on.
      Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • kasan: Add (pmd|pud)_init for LoongArch zero_(pud|p4d)_populate process · fb6d5c1d
      Qing Zhang authored
      LoongArch populates pmd/pud with invalid_pmd_table/invalid_pud_table in
      pagetable_init, so pmd_init/pud_init(p) is required. Define them as
      __weak in mm/kasan/init.c, like mm/sparse-vmemmap.c.
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • kasan: Add __HAVE_ARCH_SHADOW_MAP to support arch specific mapping · 9b04c764
      Qing Zhang authored
      MIPS, LoongArch and some other architectures have many holes between
      different segments and the valid address space (256T available) is
      insufficient to map all these segments to kasan shadow memory with the
      common formula provided by kasan core. So we need architecture specific
      mapping formulas to ensure different segments are mapped individually,
      and only limited space lengths of those specific segments are mapped to
      shadow.
      
      Therefore, when the incoming address is converted to a shadow address,
      we need to add a condition to determine whether it is valid.
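
      The per-segment mapping with a validity check can be sketched roughly
      as follows. This is an illustration only, not the kernel's
      implementation: the segment table, the addresses, every sketch_ name and
      the choice of returning 0 for an unmapped address are assumptions made
      for the example. Only the 1/8 scale (a shift by 3) follows generic KASAN.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic KASAN maps 8 bytes of memory to 1 shadow byte. */
#define SKETCH_SHADOW_SCALE_SHIFT 3

/* Hypothetical description of one address segment that gets shadow memory. */
struct sketch_segment {
	uint64_t start;       /* first address of the segment */
	uint64_t mapped_size; /* only this much of the segment is shadowed */
	uint64_t shadow_base; /* where this segment's shadow begins */
};

/* Made-up segments: each shadows 1 GiB, so each needs 128 MiB of shadow. */
static const struct sketch_segment sketch_segments[] = {
	{ UINT64_C(0x8000000000000000), UINT64_C(1) << 30, UINT64_C(0xffff800000000000) },
	{ UINT64_C(0x9000000000000000), UINT64_C(1) << 30, UINT64_C(0xffff800008000000) },
};

/*
 * Map an address to its shadow address, one segment at a time.
 * Returns 0 (an assumption of this sketch) when the address has no shadow,
 * so callers must check the result before touching shadow memory.
 */
static uint64_t sketch_mem_to_shadow(uint64_t addr)
{
	size_t n = sizeof(sketch_segments) / sizeof(sketch_segments[0]);

	for (size_t i = 0; i < n; i++) {
		const struct sketch_segment *seg = &sketch_segments[i];

		if (addr >= seg->start && addr - seg->start < seg->mapped_size)
			return seg->shadow_base +
			       ((addr - seg->start) >> SKETCH_SHADOW_SCALE_SHIFT);
	}
	return 0;
}
```

      With a single global formula the holes between segments would need
      shadow too; here each segment only consumes shadow for its mapped part.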
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Add KFENCE (Kernel Electric-Fence) support · 6ad3df56
      Enze Li authored
      The LoongArch architecture is quite different from other architectures.
      When the allocation for KFENCE is done, the allocated memory is mapped
      to the direct mapping configuration window [1] by default on LoongArch.
      This means that the page table mapped mode, which the KFENCE system
      requires, cannot be used, so the memory should be remapped to the
      appropriate region.
      
      This patch adds architecture specific implementation details for KFENCE.
      In particular, this implements the required interface in <asm/kfence.h>.
      
      This patch was tested by running the testcases, and all of them passed.
      
      [1] https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#virtual-address-space-and-address-translation-mode
      Signed-off-by: Enze Li <lienze@kylinos.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Get partial stack information when providing regs parameter · 95bb5b61
      Enze Li authored
      Currently, arch_stack_walk() can only get the full stack information
      including NMI.  This is because the implementation of arch_stack_walk()
      is forced to ignore the information passed by the regs parameter and use
      the current stack information instead.
      
      For some detection systems like KFENCE, only partial stack information
      is needed.  In particular, the stack frame where the interrupt occurred.
      
      To support KFENCE, this patch modifies the implementation of the
      arch_stack_walk() function so that if this function is called with the
      regs argument passed, it retains all the stack information in regs and
      uses it to provide accurate information.
      
      Before this patch:
      [    1.531195 ] ==================================================================
      [    1.531442 ] BUG: KFENCE: out-of-bounds read in stack_trace_save_regs+0x48/0x6c
      [    1.531442 ]
      [    1.531900 ] Out-of-bounds read at 0xffff800012267fff (1B left of kfence-#12):
      [    1.532046 ]  stack_trace_save_regs+0x48/0x6c
      [    1.532169 ]  kfence_report_error+0xa4/0x528
      [    1.532276 ]  kfence_handle_page_fault+0x124/0x270
      [    1.532388 ]  no_context+0x50/0x94
      [    1.532453 ]  do_page_fault+0x1a8/0x36c
      [    1.532524 ]  tlb_do_page_fault_0+0x118/0x1b4
      [    1.532623 ]  test_out_of_bounds_read+0xa0/0x1d8
      [    1.532745 ]  kunit_generic_run_threadfn_adapter+0x1c/0x28
      [    1.532854 ]  kthread+0x124/0x130
      [    1.532922 ]  ret_from_kernel_thread+0xc/0xa4
      <snip>
      
      After this patch:
      [    1.320220 ] ==================================================================
      [    1.320401 ] BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa8/0x1d8
      [    1.320401 ]
      [    1.320898 ] Out-of-bounds read at 0xffff800012257fff (1B left of kfence-#10):
      [    1.321134 ]  test_out_of_bounds_read+0xa8/0x1d8
      [    1.321264 ]  kunit_generic_run_threadfn_adapter+0x1c/0x28
      [    1.321392 ]  kthread+0x124/0x130
      [    1.321459 ]  ret_from_kernel_thread+0xc/0xa4
      <snip>
      Suggested-by: Jinyang He <hejinyang@loongson.cn>
      Signed-off-by: Enze Li <lienze@kylinos.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: mm: Add page table mapped mode support for virt_to_page() · 8b5cb1cb
      Enze Li authored
      According to the LoongArch documentation, there are two address
      translation modes: direct mapped address translation mode (DMW mode) and
      page table mapped address translation mode (TLB mode).
      
      Currently, virt_to_page() only supports direct mapped mode. This patch
      determines which mode is used, and adds corresponding handling functions
      for both modes.
      
      For more details on the two modes, see [1].
      
      [1] https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#virtual-address-space-and-address-translation-mode
      Signed-off-by: Enze Li <lienze@kylinos.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • kfence: Defer the assignment of the local variable addr · ec9fee79
      Enze Li authored
      The LoongArch architecture is different from other architectures. It
      needs to update __kfence_pool during arch_kfence_init_pool().
      
      This patch modifies the assignment location of the local variable addr
      in the kfence_init_pool() function to support the case of updating
      __kfence_pool in arch_kfence_init_pool().
      Acked-by: Marco Elver <elver@google.com>
      Signed-off-by: Enze Li <lienze@kylinos.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Allow building with kcov coverage · 2363088e
      Feiyang Chen authored
      Add ARCH_HAS_KCOV and HAVE_GCC_PLUGINS to the LoongArch Kconfig, and
      also disable instrumentation of the vdso.
      Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Provide kaslr_offset() to get kernel offset · b72961f8
      Feiyang Chen authored
      Provide kaslr_offset() to get the kernel offset when KASLR is enabled.
      Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Add basic KGDB & KDB support · e14dd076
      Qing Zhang authored
      KGDB is intended to be used as a source level debugger for the Linux
      kernel. It is used along with gdb to debug a Linux kernel. GDB can be
      used to "break in" to the kernel to inspect memory, variables and regs
      similar to the way an application developer would use GDB to debug an
      application. KDB is a frontend of KGDB which is similar to GDB.
      
      By now, in addition to the generic KGDB features, the LoongArch KGDB
      implements the following features:
      - Hardware breakpoints/watchpoints;
      - Software single-step support for KDB.
      
      Signed-off-by: Qing Zhang <zhangqing@loongson.cn>   # Framework & CoreFeature
      Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> # BreakPoint & SingleStep
      Signed-off-by: Hui Li <lihui@loongson.cn>           # Some Minor Improvements
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org> # Some Build Error Fixes
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Add Loongson Binary Translation (LBT) extension support · bd3c5798
      Qi Hu authored
      Loongson Binary Translation (LBT) is used to accelerate binary translation.
      It contains 4 scratch registers (scr0 to scr3), x86/ARM eflags (eflags)
      and the x87 fpu stack pointer (ftop).

      This patch adds kernel support for saving/restoring these registers,
      handling the LBT exception and maintaining sigcontext.
      Signed-off-by: Qi Hu <huqi@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • raid6: Add LoongArch SIMD recovery implementation · f2091321
      WANG Xuerui authored
      Similar to the syndrome calculation, the recovery algorithms also work
      on 64 bytes at a time to align with the L1 cache line size of current
      and future LoongArch cores (that we care about). This means
      unrolled-by-4 LSX and unrolled-by-2 LASX code.
      
      The assembly is originally based on the x86 SSSE3/AVX2 ports, but
      register allocation has been redone to take advantage of LSX/LASX's 32
      vector registers, and instruction sequence has been optimized to suit
      (e.g. LoongArch can perform per-byte srl and andi on vectors, but x86
      cannot).
      
      Performance numbers measured by instrumenting the raid6test code, on a
      3A5000 system clocked at 2.5GHz:
      
      > lasx  2data: 354.987 MiB/s
      > lasx  datap: 350.430 MiB/s
      > lsx   2data: 340.026 MiB/s
      > lsx   datap: 337.318 MiB/s
      > intx1 2data: 164.280 MiB/s
      > intx1 datap: 187.966 MiB/s
      
      Because recovery algorithms are chosen solely based on priority and
      availability, lasx is marked as priority 2 and lsx priority 1. At least
      for the current generation of LoongArch micro-architectures, LASX should
      always be faster than LSX whenever supported, and have similar power
      consumption characteristics (because the only known LASX-capable uarch,
      the LA464, always computes the full 256-bit result for vector ops).
      Acked-by: Song Liu <song@kernel.org>
      Signed-off-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • raid6: Add LoongArch SIMD syndrome calculation · 8f3f06df
      WANG Xuerui authored
      The algorithms work on 64 bytes at a time, which is the L1 cache line
      size of all current and future LoongArch cores (that we care about), as
      confirmed by Huacai. The code is based on the generic int.uc algorithm,
      unrolled 4 times for LSX and 2 times for LASX. Further unrolling does
      not meaningfully improve the performance according to experiments.
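
      For orientation, the P/Q syndrome that such code computes can be
      sketched in scalar C, one byte at a time. This is a plain reference
      form under stated assumptions, not the LSX/LASX code from this patch:
      the sketch_ names are hypothetical, and the SIMD versions process 64
      bytes per iteration instead of one.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply by 2 in GF(2^8) using the RAID-6 polynomial 0x11d. */
static uint8_t sketch_gf_mul2(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/*
 * Compute P (plain XOR parity) and Q (Reed-Solomon syndrome) over ndata
 * data blocks of `bytes` bytes each. Q is evaluated with Horner's rule,
 * walking from the highest-numbered data block down to block 0.
 */
static void sketch_raid6_gen_syndrome(int ndata, size_t bytes,
				      const uint8_t *const *data,
				      uint8_t *p, uint8_t *q)
{
	for (size_t off = 0; off < bytes; off++) {
		uint8_t wp = data[ndata - 1][off];
		uint8_t wq = wp;

		for (int z = ndata - 2; z >= 0; z--) {
			uint8_t wd = data[z][off];

			wp ^= wd;                      /* P = D0 ^ D1 ^ ... */
			wq = sketch_gf_mul2(wq) ^ wd;  /* Q = D0 ^ 2*D1 ^ 4*D2 ... */
		}
		p[off] = wp;
		q[off] = wq;
	}
}
```

      A vectorised version applies the same two updates to whole vector
      registers; unrolling then only changes how many registers are in
      flight per loop iteration.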
      
      Performance numbers measured during system boot on a 3A5000 @ 2.5GHz:
      
      > raid6: lasx     gen() 12726 MB/s
      > raid6: lsx      gen() 10001 MB/s
      > raid6: int64x8  gen()  2876 MB/s
      > raid6: int64x4  gen()  3867 MB/s
      > raid6: int64x2  gen()  2531 MB/s
      > raid6: int64x1  gen()  1945 MB/s
      
      Comparison of xor() speeds (from different boots but meaningful anyway):
      
      > lasx:    11226 MB/s
      > lsx:     6395 MB/s
      > int64x4: 2147 MB/s
      
      Performance as measured by raid6test:
      
      > raid6: lasx     gen() 25109 MB/s
      > raid6: lsx      gen() 13233 MB/s
      > raid6: int64x8  gen()  4164 MB/s
      > raid6: int64x4  gen()  6005 MB/s
      > raid6: int64x2  gen()  5781 MB/s
      > raid6: int64x1  gen()  4119 MB/s
      > raid6: using algorithm lasx gen() 25109 MB/s
      > raid6: .... xor() 14439 MB/s, rmw enabled
      Acked-by: Song Liu <song@kernel.org>
      Signed-off-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Add SIMD-optimized XOR routines · 75ded18a
      WANG Xuerui authored
      Add LSX and LASX implementations of xor operations, operating on 64
      bytes (one L1 cache line) at a time, for a balance between memory
      utilization and instruction mix. Huacai confirmed that all future
      LoongArch implementations by Loongson (that we care about) will likely also
      feature 64-byte cache lines, and experiments show no throughput
      improvement with further unrolling.
      
      Performance numbers measured during system boot on a 3A5000 @ 2.5GHz:
      
      > 8regs           : 12702 MB/sec
      > 8regs_prefetch  : 10920 MB/sec
      > 32regs          : 12686 MB/sec
      > 32regs_prefetch : 10918 MB/sec
      > lsx             : 17589 MB/sec
      > lasx            : 26116 MB/sec
      Acked-by: Song Liu <song@kernel.org>
      Signed-off-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Allow usage of LSX/LASX in the kernel · 2478e4b7
      Huacai Chen authored
      Allow usage of LSX/LASX in the kernel by extending kernel_fpu_begin()
      and kernel_fpu_end().
      Reviewed-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Define symbol 'fault' as a local label in fpu.S · 8f58c571
      Tiezhu Yang authored
      The initial aim is to silence the following objtool warnings:
      
        arch/loongarch/kernel/fpu.o: warning: objtool: _save_fp_context() falls through to next function fault()
        arch/loongarch/kernel/fpu.o: warning: objtool: _restore_fp_context() falls through to next function fault()
        arch/loongarch/kernel/fpu.o: warning: objtool: _save_lsx_context() falls through to next function fault()
        arch/loongarch/kernel/fpu.o: warning: objtool: _restore_lsx_context() falls through to next function fault()
        arch/loongarch/kernel/fpu.o: warning: objtool: _save_lasx_context() falls through to next function fault()
        arch/loongarch/kernel/fpu.o: warning: objtool: _restore_lasx_context() falls through to next function fault()
      
      Currently, SYM_FUNC_START()/SYM_FUNC_END() defines the symbol 'fault' as
      SYM_T_FUNC, which is STT_FUNC. The objtool warnings are generated through
      the following code:
      
      tools/objtool/include/objtool/check.h:
      
      static inline struct symbol *insn_func(struct instruction *insn)
      {
      	struct symbol *sym = insn->sym;
      
      	if (sym && sym->type != STT_FUNC)
      		sym = NULL;
      
      	return sym;
      }
      
      tools/objtool/check.c:
      
      static int validate_branch(struct objtool_file *file, struct symbol *func,
      			   struct instruction *insn, struct insn_state state)
      {
      	...
      		if (func && insn_func(insn) && func != insn_func(insn)->pfunc) {
      	...
      			WARN("%s() falls through to next function %s()",
      			     func->name, insn_func(insn)->name);
      			return 1;
      		}
      	...
      }
      
      We can see that the fixup can be a local label in the following code:
      
      arch/loongarch/include/asm/asm-extable.h:
      	.pushsection	__ex_table, "a";		\
      	.balign		4;				\
      	.long		((insn) - .);			\
      	.long		((fixup) - .);			\
      	.short		(type);				\
      	.short		(data);				\
      	.popsection;
      
      	.macro		_asm_extable, insn, fixup
      	__ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP, 0)
      	.endm
      
      Like arch/loongarch/lib/*.S, just define the symbol 'fault' as a local
      label in fpu.S.
      
      Before:
      
      $ readelf -s arch/loongarch/kernel/fpu.o | awk -F: /fault/'{print $2}'
       000000000000053c     8 FUNC    GLOBAL DEFAULT    1 fault
      
      After:
      
      $ readelf -s arch/loongarch/kernel/fpu.o | awk -F: /fault/'{print $2}'
       000000000000053c     0 NOTYPE  LOCAL  DEFAULT    1 .L_fpu_fault
      Co-developed-by: Youling Tang <tangyouling@loongson.cn>
      Signed-off-by: Youling Tang <tangyouling@loongson.cn>
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Adjust {copy, clear}_user exception handler behavior · 937f6593
      Weihao Li authored
      The {copy, clear}_user functions should return the number of bytes that
      could not be {copied, cleared}. So, try to {copy, clear} byte by byte
      when ld.{d,w,h} and st.{d,w,h} trap into an exception.
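
      The intended return-value semantics can be modelled in user space as
      below. This is a sketch under loud assumptions, not the assembly from
      this patch: the fault is simulated by a hypothetical `accessible`
      limit, wide accesses are modelled as 8-byte chunks only, and the
      sketch_ name is made up.

```c
#include <stddef.h>
#include <string.h>

/*
 * Model of a user copy. Bytes at offset >= accessible are treated as
 * unmapped, and any access touching them "faults". Returns the number of
 * bytes that could NOT be copied, which is 0 on full success.
 */
static size_t sketch_copy_user(unsigned char *dst, const unsigned char *src,
			       size_t len, size_t accessible)
{
	size_t done = 0;

	/* Fast path: 8-byte chunks while the whole chunk is accessible. */
	while (len - done >= 8 && done + 8 <= accessible) {
		memcpy(dst + done, src + done, 8);
		done += 8;
	}

	/*
	 * Tail and fault fallback: when a wide access would fault, retry
	 * byte by byte so every accessible byte is still copied and the
	 * count of uncopied bytes is exact.
	 */
	while (done < len && done < accessible) {
		dst[done] = src[done];
		done++;
	}

	return len - done;
}
```

      For a 32-byte request with only 13 accessible bytes, this model copies
      13 bytes and reports 19 uncopied, where stopping at the faulting
      8-byte access would have reported 24.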
      Reviewed-by: WANG Rui <wangrui@loongson.cn>
      Signed-off-by: Weihao Li <liweihao@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Use static defined zero page rather than allocated · 0921af6c
      Bibo Mao authored
      On LoongArch systems, only one page is needed for the zero page (no
      cache synonyms), and there is no COLOR_ZERO_PAGE, so zero_page_mask is
      useless and the macro __HAVE_COLOR_ZERO_PAGE is not necessary.

      Like other popular architectures, it is simpler to define the zero page
      in the kernel BSS segment rather than allocate it dynamically.
      Signed-off-by: Bibo Mao <maobibo@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: mm: Introduce unified function populate_kernel_pte() · 2bb20d29
      Bibo Mao authored
      The functions pcpu_populate_pte() and fixmap_pte() are similar: both
      populate one page from the kernel address space. There is also confusion
      between pgd and p4d in fixmap_pte(), e.g. pgd_none() always returns
      zero. This patch introduces a unified function populate_kernel_pte() and
      then replaces pcpu_populate_pte() and fixmap_pte().
      Signed-off-by: Bibo Mao <maobibo@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Code improvements in function pcpu_populate_pte() · f33efa90
      Bibo Mao authored
      Do some code improvements in function pcpu_populate_pte():
      1. Add memory allocation failure handling;
      2. Replace pgd_populate() with p4d_populate(), it will be useful if
         there are four-level page tables.
      Signed-off-by: Bibo Mao <maobibo@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Remove shm_align_mask and use SHMLBA instead · ad3ff105
      Huacai Chen authored
      Both shm_align_mask and SHMLBA want to avoid cache alias. But they are
      inconsistent: shm_align_mask is (PAGE_SIZE - 1) while SHMLBA is SZ_64K,
      but PAGE_SIZE is not always equal to SZ_64K.
      
      This may cause problems when shmat() is called twice. Fix this problem by
      removing shm_align_mask and using SHMLBA (strictly SHMLBA - 1) instead.
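
      The inconsistency is easy to see with plain arithmetic. The sketch
      below is illustrative only: the sketch_ helpers are hypothetical and
      merely round an address up with the two masks discussed above, taking
      SHMLBA as SZ_64K as stated in this message.

```c
#include <stdint.h>

#define SKETCH_SZ_64K  UINT64_C(0x10000)
#define SKETCH_SHMLBA  SKETCH_SZ_64K

/* Round addr up using a mask of the form (power of two - 1). */
static uint64_t sketch_align_up(uint64_t addr, uint64_t mask)
{
	return (addr + mask) & ~mask;
}

/* Alignment as a (PAGE_SIZE - 1) style mask would do it. */
static uint64_t sketch_align_page(uint64_t addr, uint64_t page_size)
{
	return sketch_align_up(addr, page_size - 1);
}

/* Alignment with (SHMLBA - 1), independent of the page size. */
static uint64_t sketch_align_shmlba(uint64_t addr)
{
	return sketch_align_up(addr, SKETCH_SHMLBA - 1);
}
```

      With 16K pages, address 0x12345 rounds to 0x14000 under the page mask
      but to 0x20000 under SHMLBA - 1; the two only agree when PAGE_SIZE is
      64K.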
      Reported-by: Jiantao Shan <shanjiantao@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: mm: Add p?d_leaf() definitions · 303be4b3
      Hongchen Zhang authored
      When running LTP, the test case ksm06 caused a panic at
      	break_ksm_pmd_entry
      	  -> pmd_leaf (Huge page table but False)
      	  -> pte_present (panic)
      
      The reason is that pmd_leaf() is not defined. So, like commit 501b8104
      ("mips: mm: add p?d_leaf() definitions"), add p?d_leaf() definitions for
      LoongArch.
      
      Fixes: 09cfefb7 ("LoongArch: Add memory management")
      Cc: stable@vger.kernel.org
      Acked-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Drop unused parse_r and parse_v macros · 8ff81bb2
      Nathan Chancellor authored
      When building with CONFIG_LTO_CLANG_FULL, there are several errors due
      to the way that parse_r is defined with an __asm__ statement in a
      header:
      
        ld.lld: error: ld-temp.o <inline asm>:105:1: macro 'parse_r' is already defined
        .macro  parse_r var r
        ^
      
      This was an issue for arch/mips as well, which was resolved by commit
      67512a8c ("MIPS: Avoid macro redefinitions").
      
      However, parse_r is unused in arch/loongarch after commit 83d8b389
      ("LoongArch: Simplify the invtlb wrappers"), so doing the same change
      does not make much sense now. Just remove parse_r (and parse_v, which
      is also unused) to resolve the redefinition error. If it needs to be
      brought back due to an actual use, it should be brought back with the
      same changes as the aforementioned arch/mips commit.
      
      Closes: https://github.com/ClangBuiltLinux/linux/issues/1924
      Reviewed-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
  2. 30 Aug, 2023 1 commit
  3. 27 Aug, 2023 2 commits
  4. 26 Aug, 2023 7 commits
    • Merge tag 'x86-urgent-2023-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 28f20a19
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Fix an FPU invalidation bug on exec(), and fix a performance
        regression due to a missing setting of X86_FEATURE_OSXSAVE"
      
      * tag 'x86-urgent-2023-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
        x86/fpu: Invalidate FPU state correctly on exec()
    • Merge tag 'irq-urgent-2023-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3b35375f
      Linus Torvalds authored
      Pull irq fix from Thomas Gleixner:
       "A last minute fix for a regression introduced in the v6.5 merge
        window.
      
        The conversion of the software based interrupt resend mechanism to
        hlist missed to add a check whether the descriptor is already enqueued
        and dropped the interrupt descriptor lookup for nested interrupts.
      
        The missing check whether the descriptor is already queued causes
        hlist corruption and can be observed in the wild. The dropped parent
        descriptor lookup has not yet caused problems, but it would result in
        stale interrupt line in the worst case.
      
        Add the missing enqueued check and bring the descriptor lookup back to
        cure this"
      
      * tag 'irq-urgent-2023-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Fix software resend lockup and nested resend
    • Merge tag 'loongarch-fixes-6.5-2' of... · c3137613
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
       "Fix a ptrace bug, a hw_breakpoint bug, some build errors/warnings and
        some trivial cleanups"
      
      * tag 'loongarch-fixes-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        LoongArch: Fix hw_breakpoint_control() for watchpoints
        LoongArch: Ensure FP/SIMD registers in the core dump file is up to date
        LoongArch: Put the body of play_dead() into arch_cpu_idle_dead()
        LoongArch: Add identifier names to arguments of die() declaration
        LoongArch: Return earlier in die() if notify_die() returns NOTIFY_STOP
        LoongArch: Do not kill the task in die() if notify_die() returns NOTIFY_STOP
        LoongArch: Remove <asm/export.h>
        LoongArch: Replace #include <asm/export.h> with #include <linux/export.h>
        LoongArch: Remove unneeded #include <asm/export.h>
        LoongArch: Replace -ffreestanding with finer-grained -fno-builtin's
        LoongArch: Remove redundant "source drivers/firmware/Kconfig"
    • genirq: Fix software resend lockup and nested resend · 9f5deb55
      Johan Hovold authored
      The switch to using hlist for managing software resend of interrupts
      broke resend in at least two ways:
      
      First, unconditionally adding interrupt descriptors to the resend list can
      corrupt the list when the descriptor in question has already been
      added. This causes the resend tasklet to loop indefinitely with interrupts
      disabled as was recently reported with the Lenovo ThinkPad X13s after
      threaded NAPI was disabled in the ath11k WiFi driver.
      
      This bug is easily fixed by restoring the old semantics of irq_sw_resend()
      so that it can be called also for descriptors that have already been marked
      for resend.
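
      The first failure mode can be illustrated with a deliberately
      simplified model. This is not the genirq code: it uses a singly linked
      list and an explicit `queued` flag, both of which are assumptions of
      the sketch, as are all sketch_ names. It only shows why an add that
      tolerates already-queued descriptors keeps the list well formed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an interrupt descriptor on a resend list. */
struct sketch_desc {
	struct sketch_desc *next;
	bool queued;
	int irq;
};

struct sketch_list {
	struct sketch_desc *head;
	size_t len;
};

/*
 * Queue a descriptor for resend. Calling this again for a descriptor that
 * is already queued is harmless and returns false. Without the check, the
 * second add would set d->next to d itself, and a walker of this list
 * would never terminate.
 */
static bool sketch_resend_add(struct sketch_list *list, struct sketch_desc *d)
{
	if (d->queued)
		return false;

	d->queued = true;
	d->next = list->head;
	list->head = d;
	list->len++;
	return true;
}
```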
      
      Second, the offending commit also broke software resend of nested
      interrupts by simply discarding the code that made sure that such
      interrupts are retriggered using the parent interrupt.
      
      Add back the corresponding code that adds the parent descriptor to the
      resend list.
      
      Fixes: bc06a9e0 ("genirq: Use hlist for managing resend handlers")
      Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/lkml/20230809073432.4193-1-johan+linaro@kernel.org/
      Link: https://lore.kernel.org/r/20230826154004.1417-1-johan+linaro@kernel.org
    • LoongArch: Fix hw_breakpoint_control() for watchpoints · 9730870b
      Huacai Chen authored
      In hw_breakpoint_control(), encode_ctrl_reg() has already encoded the
      MWPnCFG3_LoadEn/MWPnCFG3_StoreEn bits in info->ctrl. We don't need to
      add (1 << MWPnCFG3_LoadEn | 1 << MWPnCFG3_StoreEn) unconditionally.
      
      Otherwise we can't set read watchpoint and write watchpoint separately.
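
      The effect can be shown with a small bit-level model. This is a sketch
      only: the bit positions, the enum and all sketch_ names are assumptions
      made for illustration and are not taken from the kernel headers.

```c
#include <stdint.h>

/* Assumed bit positions for the load/store enable bits. */
#define SKETCH_LOAD_EN   8
#define SKETCH_STORE_EN  9

enum sketch_wp_type {
	SKETCH_WP_READ  = 1,
	SKETCH_WP_WRITE = 2,
	SKETCH_WP_RW    = 3,
};

/* Encode only the enable bits that the requested watchpoint type needs. */
static uint32_t sketch_encode_ctrl(enum sketch_wp_type type)
{
	uint32_t ctrl = 0;

	if (type & SKETCH_WP_READ)
		ctrl |= UINT32_C(1) << SKETCH_LOAD_EN;
	if (type & SKETCH_WP_WRITE)
		ctrl |= UINT32_C(1) << SKETCH_STORE_EN;
	return ctrl;
}

/* Before the fix: both bits are ORed in again, whatever was requested. */
static uint32_t sketch_ctrl_before(enum sketch_wp_type type)
{
	return sketch_encode_ctrl(type) |
	       (UINT32_C(1) << SKETCH_LOAD_EN | UINT32_C(1) << SKETCH_STORE_EN);
}

/* After the fix: the already-encoded value is used as is. */
static uint32_t sketch_ctrl_after(enum sketch_wp_type type)
{
	return sketch_encode_ctrl(type);
}
```

      In this model, read and write watchpoints produce the same control
      value before the fix and distinct values after it.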
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Ensure FP/SIMD registers in the core dump file is up to date · 656f9aec
      Huacai Chen authored
      This is a port of commit 379eb01c ("riscv: Ensure the value
      of FP registers in the core dump file is up to date").
      
      The values of FP/SIMD registers in the core dump file come from the
      thread.fpu. However, the kernel saves the FP/SIMD registers only before
      scheduling out the process. If no process switch happens during the
      exception handling, the kernel will not have a chance to save the latest
      values of FP/SIMD registers, so their values in the core dump file may
      be incorrect. To solve this problem, force fpr_get()/simd_get() to save
      the FP/SIMD registers into the thread.fpu if the target task equals the
      current task.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 7d2f353b
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "One clk driver fix and two clk framework fixes:
      
         - Fix an OOB access when devm_get_clk_from_child() is used and
           devm_clk_release() casts the void pointer to the wrong type
      
         - Move clk_rate_exclusive_{get,put}() within the correct ifdefs in
           clk.h so that the stubs are used when CONFIG_COMMON_CLK=n
      
         - Register the proper clk provider function depending on the value of
           #clock-cells in the TI keystone driver"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: Fix slab-out-of-bounds error in devm_clk_release()
        clk: Fix undefined reference to `clk_rate_exclusive_{get,put}'
        clk: keystone: syscon-clk: Fix audio refclk
  5. 25 Aug, 2023 7 commits
    • lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels · 382d4cd1
      Helge Deller authored
      On some architectures, the gcc compiler translates the 64-bit
      __builtin_clzll() function to a call to the libgcc function __clzdi2(),
      which should take a 64-bit parameter on 32- and 64-bit platforms.
      
      But in the current kernel code, the built-in __clzdi2() function is
      defined to operate (wrongly) on 32-bit parameters if BITS_PER_LONG ==
      32, thus the return values on 32-bit kernels are in the range from
      [0..31] instead of the expected [0..63] range.
      
      This patch fixes the in-kernel functions __clzdi2() and __ctzdi2() to
      take a 64-bit parameter on 32-bit kernels as well, thus it makes the
      functions identical for 32- and 64-bit kernels.
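
      The difference between the two behaviours can be modelled in portable
      C. This is a sketch, not the code in lib/clz_ctz.c: the sketch_ names
      are hypothetical, returning 64 (or 32) for a zero input is a choice
      made here for the example, and the "truncated" variant only models an
      argument that is treated as 32 bits wide.

```c
#include <stdint.h>

/* Count leading zeros of a full 64-bit value: result range 0..63. */
static int sketch_clz64(uint64_t val)
{
	int n = 0;

	if (val == 0)
		return 64;	/* sketch-only convention for zero */
	while (!(val & (UINT64_C(1) << 63))) {
		val <<= 1;
		n++;
	}
	return n;
}

/* Count trailing zeros of a full 64-bit value: result range 0..63. */
static int sketch_ctz64(uint64_t val)
{
	int n = 0;

	if (val == 0)
		return 64;	/* sketch-only convention for zero */
	while (!(val & 1)) {
		val >>= 1;
		n++;
	}
	return n;
}

/* Model of the bug: the argument is truncated to 32 bits first. */
static int sketch_clz_truncated(uint64_t val)
{
	uint32_t low = (uint32_t)val;
	int n = 0;

	if (low == 0)
		return 32;	/* sketch-only convention for zero */
	while (!(low & (UINT32_C(1) << 31))) {
		low <<= 1;
		n++;
	}
	return n;
}
```

      For the value 0x10000 the 64-bit version yields 47, while the
      truncated model yields 15, i.e. a result from the [0..31] range.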
      
      This bug went unnoticed since kernel 3.11 for over 10 years, and here
      are some possible reasons for that:
      
       a) Some architectures have assembly instructions to count the bits and
          which are used instead of calling __clzdi2(), e.g. on x86 the bsr
          instruction and on ppc cntlz is used. On such architectures the
          wrong __clzdi2() implementation isn't used and as such the bug has
          no effect and won't be noticed.
      
       b) Some architectures link to libgcc.a, and the in-kernel weak
          functions get replaced by the correct 64-bit variants from libgcc.a.
      
       c) __builtin_clzll() and __clzdi2() don't seem to be used in many
          places in the kernel, and most likely only in uncritical functions,
          e.g. when printing hex values via seq_put_hex_ll(). The wrong return
          value will still print the correct number, but just with the wrong
          formatting (e.g. with too many leading zeroes).
      
       d) 32-bit kernels aren't used that much any longer, so they are less
          tested.
      
      A trivial testcase to verify if the currently running 32-bit kernel is
      affected by the bug is to look at the output of /proc/self/maps:
      
      Here the kernel uses a correct implementation of __clzdi2():
      
        root@debian:~# cat /proc/self/maps
        00010000-00019000 r-xp 00000000 08:05 787324     /usr/bin/cat
        00019000-0001a000 rwxp 00009000 08:05 787324     /usr/bin/cat
        0001a000-0003b000 rwxp 00000000 00:00 0          [heap]
        f7551000-f770d000 r-xp 00000000 08:05 794765     /usr/lib/hppa-linux-gnu/libc.so.6
        ...
      
      and this kernel uses the broken implementation of __clzdi2():
      
        root@debian:~# cat /proc/self/maps
        0000000010000-0000000019000 r-xp 00000000 000000008:000000005 787324  /usr/bin/cat
        0000000019000-000000001a000 rwxp 000000009000 000000008:000000005 787324  /usr/bin/cat
        000000001a000-000000003b000 rwxp 00000000 00:00 0  [heap]
        00000000f73d1000-00000000f758d000 r-xp 00000000 000000008:000000005 794765  /usr/lib/hppa-linux-gnu/libc.so.6
        ...
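
      The extra leading zeroes follow from how a hex printer can derive the
      digit count from the leading-zero count. The sketch below is a model
      under stated assumptions, not the kernel's code: hex_digits() is a
      hypothetical helper, and the formula (64 - leading_zeros + 3) / 4 with
      padding up to a minimum width is assumed to resemble what
      seq_put_hex_ll() does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of hex digits printed for a 64-bit
 * value, given its leading-zero count and a minimum field width.
 * A leading-zero count that is too small yields too many digits. */
static int hex_digits(int leading_zeros, int min_width)
{
	int len;

	assert(leading_zeros >= 0 && leading_zeros < 64);
	/* Significant bits, rounded up to whole 4-bit hex digits. */
	len = (64 - leading_zeros + 3) / 4;
	return len < min_width ? min_width : len;
}
```

      Under these assumptions, the address 0x10000 with a minimum width of 8
      needs 8 digits when the leading-zero count is the correct 47, but 13
      digits when the count is 15, which matches the 13-digit
      "0000000010000" in the broken output.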
      Signed-off-by: Helge Deller <deller@gmx.de>
      Fixes: 4df87bb7 ("lib: add weak clz/ctz functions")
      Cc: Chanho Min <chanho.min@lge.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: stable@vger.kernel.org # v3.11+
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      382d4cd1
    • Linus Torvalds's avatar
      Merge tag 'mm-hotfixes-stable-2023-08-25-11-07' of... · 6f0edbb8
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2023-08-25-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "18 hotfixes. 13 are cc:stable and the remainder pertain to post-6.4
        issues or aren't considered suitable for a -stable backport"
      
      * tag 'mm-hotfixes-stable-2023-08-25-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        shmem: fix smaps BUG sleeping while atomic
        selftests: cachestat: catch failing fsync test on tmpfs
        selftests: cachestat: test for cachestat availability
        maple_tree: disable mas_wr_append() when other readers are possible
        madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check
        madvise:madvise_free_huge_pmd(): don't use mapcount() against large folio for sharing check
        madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check
        mm: multi-gen LRU: don't spin during memcg release
        mm: memory-failure: fix unexpected return value in soft_offline_page()
        radix tree: remove unused variable
        mm: add a call to flush_cache_vmap() in vmap_pfn()
        selftests/mm: FOLL_LONGTERM need to be updated to 0x100
        nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers()
        mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast
        selftests: cgroup: fix test_kmem_basic less than error
        mm: enable page walking API to lock vmas during the walk
        smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd()
        mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
      6f0edbb8
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 4942fed8
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
       "This is obviously not ideal, particularly for something this late in
        the cycle.
      
        Unfortunately we found some uABI issues in the vector support while
        reviewing the GDB port, which has triggered a revert -- probably a
        good sign we should have reviewed GDB before merging this, I guess I
        just dropped the ball because I was so worried about the context
        extension and libc stuff I forgot. Hence the late revert.
      
        There's some risk here as we're still exposing the vector context for
        signal handlers, but changing that would have meant reverting all of
        the vector support. The issues we've found so far have been fixed
        already and they weren't absolute showstoppers, so we're essentially
        just playing it safe by holding ptrace support for another release (or
        until we get through a proper userspace code review).
      
        Summary:
      
         - The vector ucontext extension has been extended with vlenb
      
         - The vector registers ELF core dump note type has been changed to
           avoid aliasing with the CSR type used in embedded systems
      
         - Support for accessing vector registers via ptrace() has been
           reverted
      
         - Another build fix for the ISA spec changes around Zifencei/Zicsr
           that manifests on some systems built with binutils-2.37 and
           gcc-11.2"
      
      * tag 'riscv-for-linus-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: Fix build errors using binutils2.37 toolchains
        RISC-V: vector: export VLENB csr in __sc_riscv_v_state
        RISC-V: Remove ptrace support for vectors
      4942fed8
    • Linus Torvalds's avatar
      Merge tag 'gpio-fixes-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 98c6b8a5
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
      
       - fix an irq mapping leak in gpio-sim
      
       - associate the GPIO device's software node with the irq domain in
         gpio-sim
      
      * tag 'gpio-fixes-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio: sim: pass the GPIO device's software node to irq domain
        gpio: sim: dispose of irq mappings before destroying the irq_sim domain
      98c6b8a5
    • Linus Torvalds's avatar
      Merge tag 'pinctrl-v6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · a87eaffb
      Linus Torvalds authored
      Pull pin control fixes from Linus Walleij:
       "Here are some Renesas and AMD driver fixes. The AMD fix affects
        important laptops in the wild, so this one is pretty important. It
        seems a bit tough to get this right.
      
         - Fix DT parsing and related locking in the Renesas driver.
      
         - Fix wakeup IRQs in the AMD driver once again. Really tricky this
           one"
      
      * tag 'pinctrl-v6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: amd: Mask wake bits on probe again
        pinctrl: renesas: rza2: Add lock around pinctrl_generic{{add,remove}_group,{add,remove}_function}
        pinctrl: renesas: rzv2m: Fix NULL pointer dereference in rzv2m_dt_subnode_to_map()
        pinctrl: renesas: rzg2l: Fix NULL pointer dereference in rzg2l_dt_subnode_to_map()
      a87eaffb
    • Linus Torvalds's avatar
      Merge tag 'sound-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · ced5bf24
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Hopefully the last bits for 6.5. The LOC count is slightly higher
        than wished, but it doesn't look scary.

        The biggest change is MAINTAINERS update for TI; it's good to have the
        update before the final release, so that people can contact the
        right people for bug reports (which shouldn't happen of course!)
      
        The rest are all device-specific fixes and quirks, most for various
        ASoC platforms"
      
      * tag 'sound-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ASoC: amd: yc: Fix a non-functional mic on Lenovo 82SJ
        ALSA: ymfpci: Fix the missing snd_card_free() call at probe error
        ASoC: cs35l41: Correct amp_gain_tlv values
        ASoC: amd: yc: Add VivoBook Pro 15 to quirks list for acp6x
        ASoC: tas2781: fixed register access error when switching to other chips
        ASoC: cs35l56: Add an ACPI match table
        ASoC: cs35l56: Read firmware uuid from a device property instead of _SUB
        ASoC: SOF: ipc4-pcm: fix possible null pointer deference
        MAINTAINERS: Add entries for TEXAS INSTRUMENTS ASoC DRIVERS
      ced5bf24
    • Tiezhu Yang's avatar
      LoongArch: Put the body of play_dead() into arch_cpu_idle_dead() · c337c849
      Tiezhu Yang authored
      The initial aim is to silence the following objtool warning:
      
      arch/loongarch/kernel/process.o: warning: objtool: arch_cpu_idle_dead() falls through to next function start_thread()
      
      According to tools/objtool/Documentation/objtool.txt, this is because
      the last instruction of arch_cpu_idle_dead() is a call to the noreturn
      function play_dead(). One simple way to silence the warning is to add
      play_dead() to objtool's hard-coded global_noreturns array, that is,
      to put "NORETURN(play_dead)" into tools/objtool/noreturns.h. This
      works well.
      
      But play_dead() is defined only once and called only by
      arch_cpu_idle_dead(), so an alternative is to move the body of
      play_dead() into its caller arch_cpu_idle_dead() and remove
      play_dead() altogether, which also avoids the overhead of the
      function call.
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      c337c849