1. 23 Jul, 2024 35 commits
  2. 22 Jul, 2024 4 commits
  3. 18 Jul, 2024 1 commit
      drm/xe: Don't suspend device upon wedge · 90936a0a
      Matthew Brost authored
      When wedging a device, we shouldn't suspend the device, as the state
      needed for debug will be lost.
      
      Also, suspending appears not to work: the stack trace below pops up when
      trying to resume a wedged device:
      
      [  304.245044] INFO: task cat:12115 blocked for more than 151 seconds.
      [  304.251333]       Tainted: G        W          6.10.0-rc7-xe+ #3518
      [  304.257617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  304.265459] task:cat             state:D stack:13384 pid:12115 tgid:12115 ppid:3986   flags:0x00000006
      [  304.265465] Call Trace:
      [  304.265467]  <TASK>
      [  304.265469]  __schedule+0x3c4/0xdf0
      [  304.265478]  schedule+0x3c/0x140
      [  304.265481]  rpm_resume+0x1cc/0x740
      [  304.265484]  ? __pfx_autoremove_wake_function+0x10/0x10
      [  304.265489]  __pm_runtime_resume+0x49/0x80
      [  304.265494]  guc_info+0x6b/0xb0 [xe]
      [  304.265538]  ? __pfx___drm_printfn_seq_file+0x10/0x10
      [  304.265541]  ? __pfx___drm_puts_seq_file+0x10/0x10
      [  304.265545]  seq_read_iter+0x111/0x4c0
      [  304.265551]  seq_read+0xfc/0x140
      [  304.265556]  full_proxy_read+0x58/0x80
      [  304.265560]  vfs_read+0xa7/0x360
      [  304.265563]  ? find_held_lock+0x2b/0x80
      [  304.265568]  ksys_read+0x64/0xe0
      [  304.265571]  do_syscall_64+0x68/0x140
      [  304.265575]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
      [  304.265578] RIP: 0033:0x7f4254d14992
      [  304.265580] RSP: 002b:00007ffc558666f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
      [  304.265583] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f4254d14992
      [  304.265584] RDX: 0000000000020000 RSI: 00007f4254ebb000 RDI: 0000000000000003
      [  304.265586] RBP: 00007f4254ebb000 R08: 00007f4254eba010 R09: 00007f4254eba010
      [  304.265587] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000022000
      [  304.265588] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
      [  304.265593]  </TASK>
      [  304.265594]
                     Showing all locks held in the system:
      [  304.265598] 1 lock held by khungtaskd/57:
      [  304.265599]  #0: ffffffff8273b860 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x36/0x1c0
      [  304.265607] 3 locks held by kworker/6:1/90:
      [  304.265610] 1 lock held by in:imklog/547:
      [  304.265611]  #0: ffff88810498cd88 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x76/0xc0
      [  304.265620] 1 lock held by dmesg/1310:
      
      v2: Drop local 'err' variable (Jonathan)
      
      Fixes: 8ed9aaae ("drm/xe: Force wedged state and block GT reset upon any GPU hang")
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Matthew Brost <matthew.brost@intel.com>
      Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240716063902.1390130-2-matthew.brost@intel.com
      (cherry picked from commit 452bca0e)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>