    drm/i915: Allow error capture without a request · e8a3319c
    John Harrison authored
    There was a report of error captures occurring without any hung
    context being indicated despite the capture being initiated by a 'hung
    context notification' from GuC. The problem was not reproducible.
    However, this can happen if the context in question has no
    active requests. For example, if the hang was in the context switch
    itself then the breadcrumb write would have occurred and the KMD would
    see an idle context.
    
    In the interests of attempting to provide as much information as
    possible about a hang, it seems wise to include the engine info
    regardless of whether a request was found, rather than just
    pretending there was no hang at all.
    
    So update the error capture code to always record engine information
    if a context is given. This means updating record_context() to take a
    context instead of a request (which it only ever used to find the
    context anyway) and splitting the request-agnostic parts of
    intel_engine_coredump_add_request() out into a separate function.
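
    The intended control flow can be sketched roughly as below. This is
    a simplified, standalone model and not the actual i915 code: the
    struct layouts and the names dump_add_context(), dump_add_request()
    and capture_engine() are hypothetical stand-ins chosen for
    illustration. Only record_context() is named in this commit, and its
    real signature differs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's context and request objects. */
struct ctx { int id; };
struct request { struct ctx *ctx; unsigned int seqno; };

/* Hypothetical, much reduced engine dump record. */
struct engine_dump {
	bool has_context;
	int ctx_id;
	bool has_request;
	unsigned int seqno;
};

/* Takes the context directly instead of a request that was only
 * ever used to look the context up. */
static bool record_context(struct engine_dump *ee, const struct ctx *ce)
{
	if (!ce)
		return false;

	ee->has_context = true;
	ee->ctx_id = ce->id;
	return true;
}

/* Request-agnostic part: usable even when no request was found. */
static void dump_add_context(struct engine_dump *ee, const struct ctx *ce)
{
	record_context(ee, ce);
}

/* Request-specific part, layered on top of the context part. */
static void dump_add_request(struct engine_dump *ee, const struct request *rq)
{
	dump_add_context(ee, rq->ctx);
	ee->has_request = true;
	ee->seqno = rq->seqno;
}

/* Record engine info whenever a context is given, with or without
 * a request. Returns false only if there is nothing to record. */
static bool capture_engine(struct engine_dump *ee, const struct ctx *ce,
			   const struct request *rq)
{
	*ee = (struct engine_dump){0};

	if (rq)
		dump_add_request(ee, rq);
	else if (ce)
		dump_add_context(ee, ce);
	else
		return false;

	return true;
}
```

    In this model, a hung context with no active request still yields a
    dump entry carrying the context information, which is the behaviour
    the paragraph above describes.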
    
    v2: Remove a duplicate 'if' statement (Umesh) and fix a put of a null
    pointer.
    v3: Tidy up request locking code flow (Tvrtko)
    v4: Pull in improved info message from next patch and fix up potential
    leak of GuC register state (Daniele)
    Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
    Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> (v2)
    Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
    Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-5-John.C.Harrison@Intel.com
i915_gpu_error.c 55.4 KB