-
Matthew Brost authored
In GuC TDR sample ctx timestamp to determine if jobs have timed out. The scheduling enable needs to be toggled to properly sample the timestamp. If a job has not been running for longer than the timeout period, re-enable scheduling and restart the TDR. v2: - Use GT clock to msec helper (Umesh, off list) - s/ctx_timestamp_job/ctx_job_timestamp v3: - Fix state machine for TDR, mainly decouple sched disable and deregister (testing) - Rebase (CI) v4: - Fix checkpatch && newline issue (CI) - Do not deregister on wedged or unregistered (CI) - Fix refcounting bugs (CI) - Move devcoredump above VM / kernel job check (John H) - Add comment for check_timeout state usage (John H) - Assert pending disable not inflight when enabling scheduling (John H) - Use enable_scheduling in other scheduling enable code (John H) - Add comments on a few steps in TDR (John H) - Add assert for timestamp overflow protection (John H) v6: - Use mul_u64_u32_div (CI, checkpath) - Change check time to dbg level (Paulo) - Add immediate mode to sched disable (inspection) - Use xe_gt_* messages (John H) - Fix typo in comment (John H) - Check timeout before clearing pending disable (Paulo) v7: - Fix ADJUST_FIVE_PERCENT macro (checkpatch) - Don't print sched disable failure message on GT reset (John H) - Move kernel / VM jobs WARNs near comment (John H) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240611144053.2805091-12-matthew.brost@intel.com
7ddb9403