    kdb: Fix stack crawling on 'running' CPUs that aren't the master · 2277b492
    Douglas Anderson authored
    In kdb when you do 'btc' (back trace on CPU) it doesn't necessarily
    give you the right info.  Specifically on many architectures
    (including arm64, where I tested) you can't dump the stack of a
    "running" process that isn't the process running on the current CPU.
    This can be reproduced like this:
    
     echo SOFTLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
     # wait 2 seconds
     <sysrq>g
    
    Here's what I see now on rk3399-gru-kevin.  I see the stack crawl for
    the CPU that handled the sysrq but everything else just shows me stuck
    in __switch_to() which is bogus:
    
    ======
    
    [0]kdb> btc
    btc: cpu status: Currently on cpu 0
    Available cpus: 0, 1-3(I), 4, 5(I)
    Stack traceback for pid 0
    0xffffff801101a9c0        0        0  1    0   R  0xffffff801101b3b0 *swapper/0
    Call trace:
     dump_backtrace+0x0/0x138
     ...
     kgdb_compiled_brk_fn+0x34/0x44
     ...
     sysrq_handle_dbg+0x34/0x5c
    Stack traceback for pid 0
    0xffffffc0f175a040        0        0  1    1   I  0xffffffc0f175aa30  swapper/1
    Call trace:
     __switch_to+0x1e4/0x240
     0xffffffc0f65616c0
    Stack traceback for pid 0
    0xffffffc0f175d040        0        0  1    2   I  0xffffffc0f175da30  swapper/2
    Call trace:
     __switch_to+0x1e4/0x240
     0xffffffc0f65806c0
    Stack traceback for pid 0
    0xffffffc0f175b040        0        0  1    3   I  0xffffffc0f175ba30  swapper/3
    Call trace:
     __switch_to+0x1e4/0x240
     0xffffffc0f659f6c0
    Stack traceback for pid 1474
    0xffffffc0dde8b040     1474      727  1    4   R  0xffffffc0dde8ba30  bash
    Call trace:
     __switch_to+0x1e4/0x240
     __schedule+0x464/0x618
     0xffffffc0dde8b040
    Stack traceback for pid 0
    0xffffffc0f17b0040        0        0  1    5   I  0xffffffc0f17b0a30  swapper/5
    Call trace:
     __switch_to+0x1e4/0x240
     0xffffffc0f65dd6c0
    
    ======
    
    The problem is that 'btc' eventually boils down to
      show_stack(task_struct, NULL);
    
    ...and show_stack() doesn't work for "running" CPUs because their
    registers haven't been stashed.
    
    On x86 things might work better (I haven't tested) because kdb has a
    special case for x86 in kdb_show_stack() where it passes the stack
    pointer to show_stack().  This wouldn't work on arm64 where the stack
    crawling function seems to need the "fp" and "pc", not the "sp", which
    is presumably why arm64's show_stack() function totally ignores the
    "sp" parameter.
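
    To illustrate why "fp" and "pc" are enough for a frame-pointer based
    crawl, here is a minimal user-space sketch.  It is NOT the kernel's
    unwinder: struct frame_record and walk_frames() are hypothetical
    names, and the layout is a simplified model of an AArch64 frame
    record (a saved frame pointer followed by a return address).  Note
    that no stack pointer is needed anywhere in the walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of a frame record: the frame pointer points at a
 * pair {caller's frame record, return address}.  Hypothetical type,
 * for illustration only.
 */
struct frame_record {
	const struct frame_record *fp;	/* caller's record, NULL at the outermost frame */
	uintptr_t lr;			/* return address into the caller */
};

/*
 * Hypothetical helper: store up to 'max' addresses in 'out', starting
 * with the current pc and then following the chain of frame records.
 * Returns the number of entries written.
 */
static size_t walk_frames(const struct frame_record *fp, uintptr_t pc,
			  uintptr_t *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL || max == 0);

	if (n < max)
		out[n++] = pc;		/* where execution currently is */

	while (fp != NULL && n < max) {
		out[n++] = fp->lr;	/* who called the current frame */
		fp = fp->fp;		/* step to the caller's record */
	}

	return n;
}
```

    If the starting "fp" and "pc" are stale (as they are for a task that
    is still running on another CPU), the walk starts from wherever the
    task was last switched out, which would match the bogus
    __switch_to() traces above.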
    
    NOTE: we _can_ get a good stack dump for all the cpus if we manually
    switch each one to the kdb master and do a back trace.  AKA:
      cpu 4
      bt
    ...will give the expected trace.  That's because arm64's
    dump_backtrace will now see that "tsk == current" and go through a
    different path.
    
    In this patch I fix the problems by catching a request to stack crawl
    a task that's running on a CPU and then I ask that CPU to do the stack
    crawl.
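
    The decision described above can be sketched in plain C as follows.
    This is only a toy model of the idea, not the code in this patch:
    struct toy_task, enum crawl_path and choose_crawl_path() are
    hypothetical names, and the real code has to cope with details
    (how the other CPU is asked, locking, output) that are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of the "who should crawl this stack?" decision. */
enum crawl_path {
	CRAWL_SAVED_CONTEXT,	/* task is scheduled out: stashed registers are valid */
	CRAWL_LOCAL,		/* task is running on the CPU doing the crawl */
	CRAWL_ASK_OWNER_CPU,	/* task is running elsewhere: that CPU must crawl */
};

/* Toy stand-in for the few task fields this decision looks at. */
struct toy_task {
	int cpu;	/* CPU the task is (or was last) on */
	bool running;	/* true if it is executing on that CPU right now */
};

/*
 * Pick how to crawl the stack of 't' when the request is being
 * handled on 'this_cpu'.
 */
static enum crawl_path choose_crawl_path(const struct toy_task *t, int this_cpu)
{
	assert(t != NULL);

	if (!t->running)
		return CRAWL_SAVED_CONTEXT;
	if (t->cpu == this_cpu)
		return CRAWL_LOCAL;
	return CRAWL_ASK_OWNER_CPU;
}
```

    In the 'btc' output above, bash on CPU 4 is "R" while the request
    is handled on CPU 0, so in this model it would take the
    CRAWL_ASK_OWNER_CPU path instead of being crawled from stale
    registers.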
    
    NOTE: this will (presumably) change what stack crawls are printed for
    x86 machines.  Now kdb functions will show up in the stack crawl.
    Presumably this is OK but if it's not we can go back and add a special
    case for x86 again.
    Signed-off-by: Douglas Anderson <dianders@chromium.org>
    Acked-by: Will Deacon <will@kernel.org>
    Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
debug_core.c 27.3 KB