    powerpc: smp_send_stop do not offline stopped CPUs · de6e5d38
    Nicholas Piggin authored
    Marking CPUs stopped by smp_send_stop as offline can cause warnings
    due to cross-CPU wakeups. This trace was noticed on a busy system
    running a sysrq+c crash test, after the injected crash:
    
    WARNING: CPU: 51 PID: 1546 at kernel/sched/core.c:1179 set_task_cpu+0x22c/0x240
    CPU: 51 PID: 1546 Comm: kworker/u352:1 Tainted: G      D
    Workqueue: mlx5e mlx5e_update_stats_work [mlx5_core]
    [...]
    NIP [c00000000017c21c] set_task_cpu+0x22c/0x240
    LR [c00000000017d580] try_to_wake_up+0x230/0x720
    Call Trace:
    [c000000001017700] runqueues+0x0/0xb00 (unreliable)
    [c00000000017d580] try_to_wake_up+0x230/0x720
    [c00000000015a214] insert_work+0x104/0x140
    [c00000000015adb0] __queue_work+0x230/0x690
    [c000003fc5007910] [c00000000015b26c] queue_work_on+0x5c/0x90
    [c0080000135fc8f8] mlx5_cmd_exec+0x538/0xcb0 [mlx5_core]
    [c008000013608fd0] mlx5_core_access_reg+0x140/0x1d0 [mlx5_core]
    [c00800001362777c] mlx5e_update_pport_counters.constprop.59+0x6c/0x90 [mlx5_core]
    [c008000013628868] mlx5e_update_ndo_stats+0x28/0x90 [mlx5_core]
    [c008000013625558] mlx5e_update_stats_work+0x68/0xb0 [mlx5_core]
    [c00000000015bcec] process_one_work+0x1bc/0x5f0
    [c00000000015ecac] worker_thread+0xac/0x6b0
    [c000000000168338] kthread+0x168/0x1b0
    [c00000000000b628] ret_from_kernel_thread+0x5c/0xb4
    
    This happens for two reasons. First, the CPU is not really offline in
    the usual sense: processes and interrupts have not been migrated away
    from it. Second, smp_send_stop does not take effect atomically on all
    CPUs, so one CPU can have marked itself offline while another CPU is
    still running processes or interrupts that can affect the first CPU,
    for example the try_to_wake_up() -> set_task_cpu() path in the trace
    above.
    
    Fix this by simply not marking the CPU offline. A stopped CPU is
    effectively frozen in time, so "offline" does not reflect its state
    properly anyway. Nothing in the crash/panic path should walk online
    CPUs and synchronously wait for them, so this change should not
    introduce new hangs.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>