    sched/debug: Show the registers of 'current' in dump_cpu_task() · bc1cca97
    Zhen Lei authored
    The dump_cpu_task() function does not print registers on architectures
    that do not support NMIs.  However, registers can be useful for
    debugging.  Fortunately, in the case where dump_cpu_task() is invoked
    from an interrupt handler and is dumping the current CPU's stack, the
    get_irq_regs() function can be used to get the registers.
    
    Therefore, this commit makes dump_cpu_task() check to see if it is being
    asked to dump the current CPU's stack from within an interrupt handler,
    and, if so, it uses the get_irq_regs() function to obtain the registers.
    On systems that do support NMIs, this commit has the further advantage
    of avoiding a self-NMI in this case.
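
The decision described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the patch itself: `struct regs_model`, `struct dump_ctx`, `enum dump_path`, and `choose_dump_path()` are hypothetical names, and the fields merely stand in for kernel facilities such as get_irq_regs(). Which helpers the actual change uses to detect interrupt context or to request an NMI backtrace is not stated in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's saved register state. */
struct regs_model {
    unsigned long pc;
    unsigned long sp;
};

/* Hypothetical snapshot of the state the dump routine would consult. */
struct dump_ctx {
    int this_cpu;                      /* CPU the caller is running on */
    bool in_irq_handler;               /* true when called from an interrupt handler */
    const struct regs_model *irq_regs; /* models get_irq_regs(); may be NULL */
    bool nmi_backtrace_supported;      /* assumption: arch can NMI a CPU for a backtrace */
};

enum dump_path {
    DUMP_IRQ_REGS,      /* print the registers saved at interrupt entry */
    DUMP_NMI_BACKTRACE, /* ask the target CPU to dump itself via NMI */
    DUMP_TASK_ONLY      /* fall back to a plain task dump without registers */
};

/* Pick how to dump 'cpu', preferring the saved interrupt registers. */
enum dump_path choose_dump_path(const struct dump_ctx *ctx, int cpu)
{
    assert(ctx != NULL);

    /* Dumping ourselves from an interrupt handler: registers are at hand. */
    if (cpu == ctx->this_cpu && ctx->in_irq_handler && ctx->irq_regs != NULL)
        return DUMP_IRQ_REGS;

    /* Otherwise use an NMI backtrace where the architecture provides one. */
    if (ctx->nmi_backtrace_supported)
        return DUMP_NMI_BACKTRACE;

    return DUMP_TASK_ONLY;
}
```

In this model the saved-register path is checked first, so a CPU that is dumping itself from an interrupt handler never reaches the NMI path. That ordering mirrors the self-NMI avoidance mentioned above.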
    
    This is an example of an RCU self-detected stall on arm64, which does
    not support NMIs:
    [   27.501721] rcu: INFO: rcu_preempt self-detected stall on CPU
    [   27.502238] rcu:     0-....: (1250 ticks this GP) idle=4f7/1/0x4000000000000000 softirq=2594/2594 fqs=619
    [   27.502632]  (t=1251 jiffies g=2989 q=29 ncpus=4)
    [   27.503845] CPU: 0 PID: 306 Comm: test0 Not tainted 5.19.0-rc7-00009-g1c1a6c29ff99-dirty #46
    [   27.504732] Hardware name: linux,dummy-virt (DT)
    [   27.504947] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [   27.504998] pc : arch_counter_read+0x18/0x24
    [   27.505301] lr : arch_counter_read+0x18/0x24
    [   27.505328] sp : ffff80000b29bdf0
    [   27.505345] x29: ffff80000b29bdf0 x28: 0000000000000000 x27: 0000000000000000
    [   27.505475] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
    [   27.505553] x23: 0000000000001f40 x22: ffff800009849c48 x21: 000000065f871ae0
    [   27.505627] x20: 00000000000025ec x19: ffff80000a6eb300 x18: ffffffffffffffff
    [   27.505654] x17: 0000000000000001 x16: 0000000000000000 x15: ffff80000a6d0296
    [   27.505681] x14: ffffffffffffffff x13: ffff80000a29bc18 x12: 0000000000000426
    [   27.505709] x11: 0000000000000162 x10: ffff80000a2f3c18 x9 : ffff80000a29bc18
    [   27.505736] x8 : 00000000ffffefff x7 : ffff80000a2f3c18 x6 : 00000000759bd013
    [   27.505761] x5 : 01ffffffffffffff x4 : 0002dc6c00000000 x3 : 0000000000000017
    [   27.505787] x2 : 00000000000025ec x1 : ffff80000b29bdf0 x0 : 0000000075a30653
    [   27.505937] Call trace:
    [   27.506002]  arch_counter_read+0x18/0x24
    [   27.506171]  ktime_get+0x48/0xa0
    [   27.506207]  test_task+0x70/0xf0
    [   27.506227]  kthread+0x10c/0x110
    [   27.506243]  ret_from_fork+0x10/0x20
    
    This is a marked improvement over the old output:
    [   27.944550] rcu: INFO: rcu_preempt self-detected stall on CPU
    [   27.944980] rcu:     0-....: (1249 ticks this GP) idle=cbb/1/0x4000000000000000 softirq=2610/2610 fqs=614
    [   27.945407]  (t=1251 jiffies g=2681 q=28 ncpus=4)
    [   27.945731] Task dump for CPU 0:
    [   27.945844] task:test0           state:R  running task     stack:    0 pid:  306 ppid:     2 flags:0x0000000a
    [   27.946073] Call trace:
    [   27.946151]  dump_backtrace.part.0+0xc8/0xd4
    [   27.946378]  show_stack+0x18/0x70
    [   27.946405]  sched_show_task+0x150/0x180
    [   27.946427]  dump_cpu_task+0x44/0x54
    [   27.947193]  rcu_dump_cpu_stacks+0xec/0x130
    [   27.947212]  rcu_sched_clock_irq+0xb18/0xef0
    [   27.947231]  update_process_times+0x68/0xac
    [   27.947248]  tick_sched_handle+0x34/0x60
    [   27.947266]  tick_sched_timer+0x4c/0xa4
    [   27.947281]  __hrtimer_run_queues+0x178/0x360
    [   27.947295]  hrtimer_interrupt+0xe8/0x244
    [   27.947309]  arch_timer_handler_virt+0x38/0x4c
    [   27.947326]  handle_percpu_devid_irq+0x88/0x230
    [   27.947342]  generic_handle_domain_irq+0x2c/0x44
    [   27.947357]  gic_handle_irq+0x44/0xc4
    [   27.947376]  call_on_irq_stack+0x2c/0x54
    [   27.947415]  do_interrupt_handler+0x80/0x94
    [   27.947431]  el1_interrupt+0x34/0x70
    [   27.947447]  el1h_64_irq_handler+0x18/0x24
    [   27.947462]  el1h_64_irq+0x64/0x68                       <--- the above backtrace is worthless
    [   27.947474]  arch_counter_read+0x18/0x24
    [   27.947487]  ktime_get+0x48/0xa0
    [   27.947501]  test_task+0x70/0xf0
    [   27.947520]  kthread+0x10c/0x110
    [   27.947538]  ret_from_fork+0x10/0x20

    Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Juri Lelli <juri.lelli@redhat.com>
    Cc: Vincent Guittot <vincent.guittot@linaro.org>
    Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
    Cc: Ben Segall <bsegall@google.com>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
    Cc: Valentin Schneider <vschneid@redhat.com>