    drm/xe: Sample ctx timestamp to determine if jobs have timed out · 7ddb9403
    Matthew Brost authored
    In the GuC TDR, sample the ctx timestamp to determine whether a job has
    timed out. Scheduling enable must be toggled for the timestamp to be
    sampled properly. If a job has not been running for longer than the
    timeout period, re-enable scheduling and restart the TDR.
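
    The decision described above can be sketched in standalone C. This is
    an illustration, not the driver code: it assumes the ctx timestamp is a
    free-running 32-bit counter and that a ticks-per-millisecond rate is
    known. The names tdr_decide, ticks_elapsed and enum tdr_action are
    hypothetical and do not come from the xe driver.

```c
#include <stdint.h>

/* Hypothetical outcome of one TDR pass; not a type from the xe driver. */
enum tdr_action {
	TDR_RESUME_AND_REARM,	/* job is within budget: re-enable scheduling, restart the TDR */
	TDR_HANDLE_TIMEOUT,	/* job ran for at least the timeout period */
};

/*
 * Ticks elapsed between two samples of a free-running 32-bit counter
 * (assumed width). Unsigned subtraction is modulo 2^32, so a single
 * wraparound between the two samples is handled.
 */
static uint32_t ticks_elapsed(uint32_t job_start, uint32_t now)
{
	return now - job_start;
}

/*
 * Compare how long the job has actually been running on the hardware
 * against the timeout. ticks_per_ms is an assumed, non-zero clock rate.
 */
static enum tdr_action tdr_decide(uint32_t job_start, uint32_t now,
				  uint32_t ticks_per_ms, uint32_t timeout_ms)
{
	uint32_t running_ms = ticks_elapsed(job_start, now) / ticks_per_ms;

	return running_ms >= timeout_ms ? TDR_HANDLE_TIMEOUT
					: TDR_RESUME_AND_REARM;
}
```

    Measuring elapsed counter ticks rather than wall-clock time means a job
    that sat queued without running is not treated as timed out, which is
    the case the message handles by re-enabling scheduling.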
    
    v2:
     - Use GT clock to msec helper (Umesh, off list)
     - s/ctx_timestamp_job/ctx_job_timestamp
    v3:
     - Fix state machine for TDR, mainly decouple sched disable and
       deregister (testing)
     - Rebase (CI)
    v4:
     - Fix checkpatch && newline issue (CI)
     - Do not deregister on wedged or unregistered (CI)
     - Fix refcounting bugs (CI)
     - Move devcoredump above VM / kernel job check (John H)
     - Add comment for check_timeout state usage (John H)
     - Assert pending disable not inflight when enabling scheduling (John H)
     - Use enable_scheduling in other scheduling enable code (John H)
     - Add comments on a few steps in TDR (John H)
     - Add assert for timestamp overflow protection (John H)
    v6:
     - Use mul_u64_u32_div (CI, checkpatch)
     - Change check time to dbg level (Paulo)
     - Add immediate mode to sched disable (inspection)
     - Use xe_gt_* messages (John H)
     - Fix typo in comment (John H)
     - Check timeout before clearing pending disable (Paulo)
    v7:
     - Fix ADJUST_FIVE_PERCENT macro (checkpatch)
     - Don't print sched disable failure message on GT reset (John H)
     - Move kernel / VM jobs WARNs near comment (John H)
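
    Several changelog items above concern the arithmetic: converting GT
    clock ticks to milliseconds, multiplying then dividing without
    overflow, a five percent adjustment, and timestamp overflow
    protection. A hedged userspace sketch follows. All function names are
    hypothetical, the 19.2 MHz clock used in the comments is only an
    example value, and scaling the running time up by 5% is an assumption
    about which direction the adjustment goes.

```c
#include <stdint.h>

/*
 * Portable stand-in for a multiply-then-divide helper: computes
 * a * mul / div exactly, without overflowing the intermediate, by
 * splitting a into quotient and remainder. Since rem < div <= 2^32 and
 * mul < 2^32, rem * mul fits in 64 bits. div must be non-zero.
 */
static uint64_t mul_div_sketch(uint64_t a, uint32_t mul, uint32_t div)
{
	uint64_t quot = a / div;
	uint64_t rem = a % div;

	return quot * mul + (rem * mul) / div;
}

/* Convert clock ticks to milliseconds for a clock running at clock_hz. */
static uint64_t ticks_to_ms(uint64_t ticks, uint32_t clock_hz)
{
	return mul_div_sketch(ticks, 1000, clock_hz);
}

/* Assumed form of the margin: scale the measured time up by 5%. */
static uint64_t adjust_five_percent(uint64_t t_ms)
{
	return mul_div_sketch(t_ms, 105, 100);
}

/*
 * Time in milliseconds until a 32-bit tick counter wraps. A timeout
 * should stay well below this for elapsed-tick math to be unambiguous.
 * Example: at 19.2 MHz the counter wraps after roughly 223.7 seconds.
 */
static uint64_t counter_wrap_ms(uint32_t clock_hz)
{
	return mul_div_sketch(1ULL << 32, 1000, clock_hz);
}
```

    A caller could check timeout_ms against counter_wrap_ms() once at
    setup, then compare adjust_five_percent(ticks_to_ms(...)) with the
    timeout on each TDR pass.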
    Signed-off-by: Matthew Brost <matthew.brost@intel.com>
    Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240611144053.2805091-12-matthew.brost@intel.com
xe_guc_submit.c 59.6 KB