Commits · ac9f97ff8b324905d457f2694490c63b9deccbc6 · Kirill Smelkov / linux

30 Aug, 2024 14 commits

powerpc/8xx: Inconditionally use task PGDIR in DTLB misses · ac9f97ff

Christophe Leroy authored Aug 20, 2024

At the time being, DATA TLB miss handlers use task PGDIR for user
addresses and swapper_pg_dir for kernel addresses.

Now that kernel part of swapper_pg_dir is copied into task PGDIR
at PGD allocation, it is possible to avoid the above logic and
always use task PGDIR.

But new kernel PGD entries can still be created after init, in
which case those PGD entries may miss in task PGDIR. This can be
handled in DATA TLB error handler.

However, it needs to be done in real mode because the missing
entry might be related to the stack.

So implement copy of missing PGD entry in the prolog of DATA TLB
ERROR handler just after the fixup of DAR.

Note that this is feasible because 8xx doesn't implement vmap or
ioremap with 8Mbytes pages but only 512kbytes pages which are at
PTE level.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/7a76a923d2a111f1d843d8b20b4df0c65d2f4a7b.1724173828.git.christophe.leroy@csgroup.eu

ac9f97ff

powerpc/8xx: Inconditionally use task PGDIR in ITLB misses · 33c52752

Christophe Leroy authored Aug 20, 2024

Now that modules exec page tables are preallocated, the instruction
TLBmiss handler can use task PGDIR inconditionally.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/774fd766a8b9bcb9173b5e677d5dad0df2d3970f.1724173828.git.christophe.leroy@csgroup.eu

33c52752

powerpc/8xx: Preallocate execmem page tables · 16a71c04

Christophe Leroy authored Aug 20, 2024

Preallocate execmem page tables before creating new PGDs so that
all PGD entries related to execmem can be copied in pgd_alloc().

On 8xx there are 32 Mbytes for execmem by default so this will use
32 kbytes.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/a7180cc1ba59dec4502af39b4e9f3ff91c57280d.1724173828.git.christophe.leroy@csgroup.eu

16a71c04

powerpc/8xx: Reduce default size of module/execmem area · c5eec4df

Christophe Leroy authored Aug 20, 2024

8xx boards don't have much memory, the two I know have respectively
32Mbytes and 128Mbytes, so there is no point in having 256 Mbytes of
memory for module text.

Reduce it to 32Mbytes for 8xx, that's more than enough.

Nevertheless, make it a configurable value so that it can be customised
if needed.

Also add a build verification for overlap of module execmem space
with user PMD.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/8db23b61e33a0d1913d814f94bfe71ba7ac78b0f.1724173828.git.christophe.leroy@csgroup.eu

c5eec4df

powerpc/8xx: Allow setting DATA alignment even with STRICT_KERNEL_RWX · bcf77a70

Christophe Leroy authored Aug 20, 2024

It is now possible to not pin kernel text with a 8Mbytes TLB, so
the alignment for STRICT_KERNEL_RWX can be relaxed.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/d0d8b05012b392dd166cfd911f14ba2741ce7e1e.1724173828.git.christophe.leroy@csgroup.eu

bcf77a70

Revert "powerpc/8xx: Always pin kernel text TLB" · 1a736d98

Christophe Leroy authored Aug 20, 2024

This reverts commit bccc5898.

When STRICT_KERNEL_RWX is selected, EXEC memory must stop where
RW memory start. When pinning iTLBs it means an 8M alignment for
RW data start. That may be acceptable on boards with a lot of
memory but one of my supported boards only has 32 Mbytes and this
forced alignment leads to a waste of almost 4 Mbytes with is more
than 10% of the total memory.

So revert commit bccc5898 ("powerpc/8xx: Always pin kernel text
TLB") but don't restore previous behaviour in ITLB miss handler
as now kernel PGD entries are copied into each process PGDIR.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/01b6780b860c8043b51a1ba9d83acfc6f2dde910.1724173828.git.christophe.leroy@csgroup.eu

1a736d98

powerpc/8xx: Copy kernel PGD entries into all PGDIRs · 985db026

Christophe Leroy authored Aug 20, 2024

In order to avoid having to select PGDIR at each TLB miss based on
fault address, copy kernel PGD entries into all PGDIRs in pgd_alloc().

At first it will be used for ITLB misses for kernel TEXT, then for
execmem then for kernel DATA.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/c6d2bf5af2ea909071a85bdca8b1f5dc2df134a8.1724173828.git.christophe.leroy@csgroup.eu

985db026

powerpc/8xx: Fix kernel vs user address comparison · 65a82e11

Christophe Leroy authored Aug 20, 2024

Since commit 9132a2e8 ("powerpc/8xx: Define a MODULE area below
kernel text"), module exec space is below PAGE_OFFSET so not only
space above PAGE_OFFSET, but space above TASK_SIZE need to be seen
as kernel space.

Until now the problem went undetected because by default TASK_SIZE
is 0x8000000 which means address space is determined by just
checking upper address bit. But when TASK_SIZE is over 0x80000000,
PAGE_OFFSET is used for comparison, leading to thinking module
addresses are part of user space.

Fix it by using TASK_SIZE instead of PAGE_OFFSET for address
comparison.

Fixes: 9132a2e8 ("powerpc/8xx: Define a MODULE area below kernel text")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/3f574c9845ff0a023b46cb4f38d2c45aecd769bd.1724173828.git.christophe.leroy@csgroup.eu

65a82e11

powerpc/8xx: Fix initial memory mapping · f9f2bff6

Christophe Leroy authored Aug 20, 2024

Commit cf209951 ("powerpc/8xx: Map linear memory with huge pages")
introduced an initial mapping of kernel TEXT using PAGE_KERNEL_TEXT,
but the pages that contain kernel TEXT may also contain kernel RODATA,
and depending on selected debug options PAGE_KERNEL_TEXT may be either
RWX or ROX. RODATA must be writable during init because it also
contains ro_after_init data.

So use PAGE_KERNEL_X instead to be sure it is RWX.

Fixes: cf209951 ("powerpc/8xx: Map linear memory with huge pages")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/dac7a828d8497c4548c91840575a706657baa4f1.1724173828.git.christophe.leroy@csgroup.eu

f9f2bff6

powerpc/powernv/pci: Remove obsoleted declaration for pnv_pci_init_ioda_hub · 10c8ac13

Gaosheng Cui authored Aug 22, 2024

The pnv_pci_init_ioda_hub() have been removed since
commit 5ac129cd ("powerpc/powernv/pci: Remove ioda1 support"),
and now it is useless, so remove it.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240822130043.783756-1-cuigaosheng1@huawei.com

10c8ac13

powerpc: Remove obsoleted declarations for use_cop and drop_cop · 600d6a7e

Gaosheng Cui authored Aug 22, 2024

The use_cop() and drop_cop() have been removed since
commit 6ff4d3e9 ("powerpc: Remove old unused icswx based
coprocessor support"), now they are useless, so remove them.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240822130609.786431-5-cuigaosheng1@huawei.com

600d6a7e

powerpc/pasemi: Remove obsoleted declaration for pas_pci_irq_fixup() · fe16a749

Gaosheng Cui authored Aug 22, 2024

The pas_pci_irq_fixup() have been removed since
commit 771f7404 ("pasemi_mac: Move the IRQ mapping from the
PCI layer to the driver"), and now it is useless, so remove it.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240822130609.786431-4-cuigaosheng1@huawei.com

fe16a749

powerpc/maple: Remove obsoleted declaration for maple_calibrate_decr() · 6745c5bb

Gaosheng Cui authored Aug 22, 2024

The maple_calibrate_decr() have been removed since
commit 10f7e7c1 ("[PATCH] ppc64: consolidate calibrate_decr
implementations"), and now it is useless, so remove it.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240822130609.786431-3-cuigaosheng1@huawei.com

6745c5bb

powerpc: Remove obsoleted declaration for _get_SP · dace02a9

Gaosheng Cui authored Aug 22, 2024

The implementation of _get_SP() was removed in commit f4db1967
("[POWERPC] Remove  _get_SP"), remove the now obsolete declaration.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Update change log to refer to correct commit per Christophe]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240822130609.786431-2-cuigaosheng1@huawei.com

dace02a9

29 Aug, 2024 4 commits

powerpc/pseries/dlpar: Use helper function for_each_child_of_node() · 46f4bbb8

Zhang Zekun authored Aug 22, 2024

for_each_child_of_node can help to iterate through the device_node,
and we don't need to use while loop. No functional change with this
conversion.
Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240822085430.25753-3-zhangzekun11@huawei.com

46f4bbb8

powerpc/powermac/pfunc_base: Use helper function for_each_child_of_node() · 197116e2

Zhang Zekun authored Aug 22, 2024

for_each_child_of_node() can help to iterate through the device_node,
and we don't need to do it manually. No functional change with this
conversion.
Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240822085430.25753-2-zhangzekun11@huawei.com

197116e2

powerpc/64s/mm: Move __real_pte stubs into hash-4k.h · 8ae4f16f

Michael Ellerman authored Aug 21, 2024

The stub versions of __real_pte() etc are only used with HPT & 4K pages,
so move them into the hash-4k.h header.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240821080729.872034-1-mpe@ellerman.id.au

8ae4f16f

powerpc/configs/64s: Enable DEFERRED_STRUCT_PAGE_INIT · d6b34416

Michael Ellerman authored Aug 20, 2024

It can speed up initialisation of page structs at boot on large
machines.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240820065705.660812-1-mpe@ellerman.id.au

d6b34416

27 Aug, 2024 1 commit

powerpc/xmon: Fix tmpstr length check in scanhex · 0405e128

Madhavan Srinivasan authored Aug 26, 2024

If a function name is greater than 63 characters long, xmon command
may not find them. For example, here is a test that executed an illegal
instruction in a kernel function and one of call stack function has a
name greater than 63 characters long:

  cpu 0x0: Vector: 700 (Program Check) at [c00000000a6577e0]
      pc: c0000000001aacb8: check__allowed__function__name__for__symbol__r4+0x8/0x10
      lr: c00000000019c1e0: check__allowed__function__name__for__symbol__r1+0x20/0x40
      sp: c00000000a657a80
     msr: 800000000288b033
    current = 0xc00000000a439900
    paca    = 0xc000000003e90000	 irqmask: 0x03	 irq_happened: 0x01
  .....
  [link register   ] c00000000019c1e0 check__allowed__function__name__for__symbol__r1+0x20/0x40
  [c00000000a657a80] c00000000a439900 (unreliable)
  [c00000000a657aa0] c0000000001021d8 check__allowed__function__name__for__symbol__r2_resolution_symbol+0x38/0x4c
  [c00000000a657ac0] c00000000019b424 power_pmu_event_init+0xa4/0xa50

and when executing a dump instruction (di) command for long function
name, xmon fails to find the function symbol:

  0:mon> di $check__allowed__function__name__for__symbol__r2_resolution_symbol
  unknown symbol 'check__allowed__function__name__for__symbol__r2_resolution_symb'
  0000000000000000  ********

This is because in scanhex(), tmpstr loop index is checked only for
a upper bound of 63.

Fix it by replacing the upper bound value with (KSYM_NAME_LEN-1).

With fix:

  0:mon> di $check__allowed__function__name__for__symbol__r2_resolution_symbol
  c0000000001021a0  3c4c0249	addis   r2,r12,585
  c0000000001021a4  3842ae60	addi    r2,r2,-20896
  c0000000001021a8  7c0802a6	mflr    r0
  c0000000001021ac  60000000	nop
  .....
Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Closes: https://lore.kernel.org/linuxppc-dev/CANiq72=QeTgtZL4k9=4CJP6C_Hv=rh3fsn3B9S3KFoPXkyWk3w@mail.gmail.com/Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240826064217.46658-1-maddy@linux.ibm.com

0405e128

21 Aug, 2024 6 commits

powerpc/code-patching: Add boot selftest for data patching · b7d47339

Benjamin Gray authored May 15, 2024

Extend the code patching selftests with some basic coverage of the new
data patching variants too.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240515024445.236364-6-bgray@linux.ibm.com

b7d47339

powerpc/32: Convert patch_instruction() to patch_uint() · 5799cd76

Benjamin Gray authored May 15, 2024

These changes are for patch_instruction() uses on data. Unlike ppc64
these should not be incorrect as-is, but using the patch_uint() alias
better reflects what kind of data being patched and allows for
benchmarking the effect of different patch_* implementations (e.g.,
skipping instruction flushing when patching data).
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Tested-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240515024445.236364-5-bgray@linux.ibm.com

5799cd76

powerpc/64: Convert patch_instruction() to patch_u32() · 90d4fed5

Benjamin Gray authored May 15, 2024

This use of patch_instruction() is working on 32 bit data, and can fail
if the data looks like a prefixed instruction and the extra write
crosses a page boundary. Use patch_u32() to fix the write size.

Fixes: 8734b41b ("powerpc/module_64: Fix livepatching for RO modules")
Link: https://lore.kernel.org/all/20230203004649.1f59dbd4@yea/Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Tested-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240515024445.236364-4-bgray@linux.ibm.com

90d4fed5

powerpc/code-patching: Add data patch alignment check · dbf828aa

Benjamin Gray authored May 15, 2024

The new data patching still needs to be aligned within a
cacheline too for the flushes to work correctly. To simplify
this requirement, we just say data patches must be aligned.

Detect when data patching is not aligned, returning an invalid
argument error.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240515024445.236364-3-bgray@linux.ibm.com

dbf828aa

powerpc/code-patching: Add generic memory patching · e6b8940e

Benjamin Gray authored May 15, 2024

patch_instruction() is designed for patching instructions in otherwise
readonly memory. Other consumers also sometimes need to patch readonly
memory, so have abused patch_instruction() for arbitrary data patches.

This is a problem on ppc64 as patch_instruction() decides on the patch
width using the 'instruction' opcode to see if it's a prefixed
instruction. Data that triggers this can lead to larger writes, possibly
crossing a page boundary and failing the write altogether.

Introduce patch_uint(), and patch_ulong(), with aliases patch_u32(), and
patch_u64() (on ppc64) designed for aligned data patches. The patch
size is now determined by the called function, and is passed as an
additional parameter to generic internals.

While the instruction flushing is not required for data patches, it
remains unconditional in this patch. A followup series is possible if
benchmarking shows fewer flushes gives an improvement in some
data-patching workload.

ppc32 does not support prefixed instructions, so is unaffected by the
original issue. Care is taken in not exposing the size parameter in the
public (non-static) interface, so the compiler can const-propagate it
away.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240515024445.236364-2-bgray@linux.ibm.com

e6b8940e

powerpc: Remove unused LHZX_BE macro · a540ad3e

Christophe Leroy authored Aug 21, 2024

LHZX_BE has been unused since commit dbf44daf ("bpf, ppc64: remove
ld_abs/ld_ind")

Remove it.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/fd332b01c47bb9cb6c3af1696a2e109be655f5b5.1724222856.git.christophe.leroy@csgroup.eu

a540ad3e

19 Aug, 2024 2 commits

MAINTAINERS: Mark powerpc spufs as orphaned · 81695066

Michael Ellerman authored Jul 26, 2024

Jeremy is no longer actively maintaining spufs, mark it as orphan.

Also drop the dead developerworks link.
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240726123322.1165562-2-mpe@ellerman.id.au

81695066

MAINTAINERS: Mark powerpc Cell as orphaned · db9a6391

Michael Ellerman authored Jul 26, 2024

Arnd is no longer actively maintaining Cell, mark it as orphan.

Also drop the dead developerworks link.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240726123322.1165562-1-mpe@ellerman.id.au

db9a6391

07 Aug, 2024 5 commits

powerpc: Remove useless config comment in asm/percpu.h · fa740ca8

Jinjie Ruan authored Aug 07, 2024

commit 0db880fc ("powerpc: Avoid nmi_enter/nmi_exit in real mode
interrupt.") has a config comment typo, and the #if/#else/#endif section
is small and doesn't nest additional #ifdefs so the comment is useless
and should be removed completely.
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240807025604.2817577-1-ruanjinjie@huawei.com

fa740ca8

powerpc/476: Drop explicit initialization of struct i2c_device_id::driver_data to 0 · c4afe3eb

Uwe Kleine-König authored Aug 04, 2024

This driver doesn't use the driver_data member of struct i2c_device_id,
so don't explicitly initialize this member.

This prepares putting driver_data in an anonymous union which requires
either no initialization or named designators. But it's also a nice
cleanup on its own.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240804112032.3628645-2-u.kleine-koenig@baylibre.com

c4afe3eb

macintosh/via-pmu-backlight: Use backlight power constants · c7907a47

Thomas Zimmermann authored Jul 31, 2024

Replace FB_BLANK_ constants with their counterparts from the
backlight subsystem. The values are identical, so there's no
change in functionality or semantics.

via-pmu-backlight.c already includes backlight.h where the
BACKLIGHT constants are defined.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240731130720.1148872-3-tzimmermann@suse.de

c7907a47

powerpc/traps: Use backlight power constants · 28455894

Thomas Zimmermann authored Jul 31, 2024

Replace FB_BLANK_ constants with their counterparts from the
backlight subsystem. The values are identical, so there's no
change in functionality or semantics.

traps.c already includes backlight.h where the BACKLIGHT
constants are defined.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240731130720.1148872-2-tzimmermann@suse.de

28455894

powerpc: Use of_property_present() · 8a93960a

Rob Herring (Arm) authored Jul 31, 2024

Use of_property_present() to test for property presence rather than
of_get_property(). This is part of a larger effort to remove callers
of of_get_property() and similar functions. of_get_property() leaks
the DT property data pointer which is a problem for dynamically
allocated nodes which may be freed.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240731191312.1710417-9-robh@kernel.org

8a93960a

04 Aug, 2024 8 commits

Linux 6.11-rc2 · de9c2c66
Linus Torvalds authored Aug 04, 2024

de9c2c66

profiling: remove profile=sleep support · b88f5538

Tetsuo Handa authored Aug 04, 2024

The kernel sleep profile is no longer working due to a recursive locking
bug introduced by commit 42a20f86 ("sched: Add wrapper for get_wchan()
to keep task blocked")

Booting with the 'profile=sleep' kernel command line option added or
executing

  # echo -n sleep > /sys/kernel/profiling

after boot causes the system to lock up.

Lockdep reports

  kthreadd/3 is trying to acquire lock:
  ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: get_wchan+0x32/0x70

  but task is already holding lock:
  ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370

with the call trace being

   lock_acquire+0xc8/0x2f0
   get_wchan+0x32/0x70
   __update_stats_enqueue_sleeper+0x151/0x430
   enqueue_entity+0x4b0/0x520
   enqueue_task_fair+0x92/0x6b0
   ttwu_do_activate+0x73/0x140
   try_to_wake_up+0x213/0x370
   swake_up_locked+0x20/0x50
   complete+0x2f/0x40
   kthread+0xfb/0x180

However, since nobody noticed this regression for more than two years,
let's remove 'profile=sleep' support based on the assumption that nobody
needs this functionality.

Fixes: 42a20f86 ("sched: Add wrapper for get_wchan() to keep task blocked")
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b88f5538

Merge tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a5dbd76a

Linus Torvalds authored Aug 04, 2024

Pull x86 fixes from Thomas Gleixner:

 - Prevent a deadlock on cpu_hotplug_lock in the aperf/mperf driver.

   A recent change in the ACPI code which consolidated code pathes moved
   the invocation of init_freq_invariance_cppc() to be moved to a CPU
   hotplug handler. The first invocation on AMD CPUs ends up enabling a
   static branch which dead locks because the static branch enable tries
   to acquire cpu_hotplug_lock but that lock is already held write by
   the hotplug machinery.

   Use static_branch_enable_cpuslocked() instead and take the hotplug
   lock read for the Intel code path which is invoked from the
   architecture code outside of the CPU hotplug operations.

 - Fix the number of reserved bits in the sev_config structure bit field
   so that the bitfield does not exceed 64 bit.

 - Add missing Zen5 model numbers

 - Fix the alignment assumptions of pti_clone_pgtable() and
   clone_entry_text() on 32-bit:

   The code assumes PMD aligned code sections, but on 32-bit the kernel
   entry text is not PMD aligned. So depending on the code size and
   location, which is configuration and compiler dependent, entry text
   can cross a PMD boundary. As the start is not PMD aligned adding PMD
   size to the start address is larger than the end address which
   results in partially mapped entry code for user space. That causes
   endless recursion on the first entry from userspace (usually #PF).

   Cure this by aligning the start address in the addition so it ends up
   at the next PMD start address.

   clone_entry_text() enforces PMD mapping, but on 32-bit the tail might
   eventually be PTE mapped, which causes a map fail because the PMD for
   the tail is not a large page mapping. Use PTI_LEVEL_KERNEL_IMAGE for
   the clone() invocation which resolves to PTE on 32-bit and PMD on
   64-bit.

 - Zero the 8-byte case for get_user() on range check failure on 32-bit

   The recend consolidation of the 8-byte get_user() case broke the
   zeroing in the failure case again. Establish it by clearing ECX
   before the range check and not afterwards as that obvioulsy can't be
   reached when the range check fails

* tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/uaccess: Zero the 8-byte get_range case on failure on 32-bit
  x86/mm: Fix pti_clone_entry_text() for i386
  x86/mm: Fix pti_clone_pgtable() alignment assumption
  x86/setup: Parse the builtin command line before merging
  x86/CPU/AMD: Add models 0x60-0x6f to the Zen5 range
  x86/sev: Fix __reserved field in sev_config
  x86/aperfmperf: Fix deadlock on cpu_hotplug_lock

a5dbd76a

Merge tag 'timers-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 61ca6c78

Linus Torvalds authored Aug 04, 2024

Pull timer fixes from Thomas Gleixner:
 "Two fixes for the timer/clocksource code:

   - The recent fix to make the take over of the broadcast timer more
     reliable retrieves a per CPU pointer in preemptible context.

     This went unnoticed in testing as some compilers hoist the access
     into the non-preemotible section where the pointer is actually
     used, but obviously compilers can rightfully invoke it where the
     code put it.

     Move it into the non-preemptible section right to the actual usage
     side to cure it.

   - The clocksource watchdog is supposed to emit a warning when the
     retry count is greater than one and the number of retries reaches
     the limit.

     The condition is backwards and warns always when the count is
     greater than one. Fixup the condition to prevent spamming dmesg"

* tag 'timers-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: Fix brown-bag boolean thinko in cs_watchdog_read()
  tick/broadcast: Move per CPU pointer access into the atomic section

61ca6c78

Merge tag 'sched-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6cc82dc2

Linus Torvalds authored Aug 04, 2024

Pull scheduler fixes from Thomas Gleixner:

 - When stime is larger than rtime due to accounting imprecision, then
   utime = rtime - stime becomes negative. As this is unsigned math, the
   result becomes a huge positive number.

   Cure it by resetting stime to rtime in that case, so utime becomes 0.

 - Restore consistent state when sched_cpu_deactivate() fails.

   When offlining a CPU fails in sched_cpu_deactivate() after the SMT
   present counter has been decremented, then the function aborts but
   fails to increment the SMT present counter and leaves it imbalanced.
   Consecutive operations cause it to underflow. Add the missing fixup
   for the error path.

   For SMT accounting the runqueue needs to marked online again in the
   error exit path to restore consistent state.

* tag 'sched-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Fix unbalance set_rq_online/offline() in sched_cpu_deactivate()
  sched/core: Introduce sched_set_rq_on/offline() helper
  sched/smt: Fix unbalance sched_smt_present dec/inc
  sched/smt: Introduce sched_smt_present_inc/dec() helper
  sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime

6cc82dc2

Merge tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1ddeb0ef

Linus Torvalds authored Aug 04, 2024

Pull x86 perf fixes from Thomas Gleixner:

 - Move the smp_processor_id() invocation back into the non-preemtible
   region, so that the result is valid to use

 - Add the missing package C2 residency counters for Sierra Forest CPUs
   to make the newly added support actually useful

* tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix smp_processor_id()-in-preemptible warnings
  perf/x86/intel/cstate: Add pkg C2 residency counter for Sierra Forest

1ddeb0ef

Merge tag 'irq-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 953f7764

Linus Torvalds authored Aug 04, 2024

Pull irq fixes from Thomas Gleixner:
 "A couple of fixes for interrupt chip drivers:

   - Make sure to skip the clear register space in the MBIGEN driver
     when calculating the node register index. Otherwise the clear
     register is clobbered and the wrong node registers are accessed.

   - Fix a signed/unsigned confusion in the loongarch CPU driver which
     converts an error code to a huge "valid" interrupt number.

   - Convert the mesion GPIO interrupt controller lock to a raw spinlock
     so it works on RT.

   - Add a missing static to a internal function in the pic32 EVIC
     driver"

* tag 'irq-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mbigen: Fix mbigen node address layout
  irqchip/meson-gpio: Convert meson_gpio_irq_controller::lock to 'raw_spinlock_t'
  irqchip/irq-pic32-evic: Add missing 'static' to internal function
  irqchip/loongarch-cpu: Fix return value of lpic_gsi_to_irq()

953f7764

Merge tag 'locking-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3bc70ad1

Linus Torvalds authored Aug 04, 2024

Pull locking fixes from Thomas Gleixner:
 "Two fixes for locking and jump labels:

   - Ensure that the atomic_cmpxchg() conditions are correct and
     evaluating to true on any non-zero value except 1. The missing
     check of the return value leads to inconsisted state of the jump
     label counter.

   - Add a missing type conversion in the paravirt spinlock code which
     makes loongson build again"

* tag 'locking-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  jump_label: Fix the fix, brown paper bags galore
  locking/pvqspinlock: Correct the type of "old" variable in pv_kick_node()

3bc70ad1