    drm/i915: Rely on accurate request tracking for finding hung batches · 8d9fc7fd
    Chris Wilson authored
    In the past, it was possible to have multiple batches per request due to
    a stray signal or ENOMEM. As a result we had to scan each active object
    (filtered by those having the COMMAND domain) for the one that contained
    the ACTHD pointer. This was then made more complicated by the
    introduction of ppgtt, whereby ACTHD then pointed into the address space
    of the context and so also needed to be taken into account.
    
    This is a fairly robust approach (though the implementation is a little
    fragile and depends upon the per-generation setup, registers and
    parameters). However, the requirements for hangstats meant we needed a
    robust method for associating batches with a particular request, and
    having that, we can rely upon it to find the associated batch object
    for error capture.
    
    If the batch buffer tracking is not robust enough, that should become
    apparent quite quickly through an erroneous error capture. That should
    also help to make sure that the runtime reporting to userspace is
    robust. It also means that we then report the oldest incomplete batch on
    each ring, which can be useful for determining the state of userspace at
    the time of a hang.
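
    As a rough illustration of "the oldest incomplete batch on each
    ring", the lookup can be sketched as a walk over a ring's requests in
    submission order, returning the first one whose seqno the hardware has
    not yet passed. Every name below (the sketch_* structs and functions)
    is a hypothetical stand-in, not the driver's actual code, and the
    request list is simplified to an array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct sketch_batch {
	uint64_t gtt_offset;         /* where the batch buffer is mapped */
};

struct sketch_request {
	uint32_t seqno;              /* assigned in submission order */
	struct sketch_batch *batch;  /* exactly one batch per request */
};

/* Wrap-safe comparison: true if seq1 is at or after seq2. */
static bool sketch_seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

/*
 * Return the oldest request the ring has not completed, or NULL if
 * everything up to completed_seqno has retired. reqs[] must be ordered
 * oldest first.
 */
static struct sketch_request *
sketch_find_active_request(struct sketch_request *reqs, size_t count,
			   uint32_t completed_seqno)
{
	for (size_t i = 0; i < count; i++) {
		if (!sketch_seqno_passed(completed_seqno, reqs[i].seqno))
			return &reqs[i];
	}
	return NULL;
}
```

    In this sketch the batch to capture is simply the batch member of the
    returned request, so no scan of active objects against ACTHD is needed.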
    
    v2: Use i915_gem_find_active_request (Mika)
    
    v3: remove check for ring->get_seqno, split long lines (Ben)
    
    v4: check that context is available (Chris)
        checkpatch warnings fixed
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
    Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v3)
    Cc: Ben Widawsky <benjamin.widawsky@intel.com>
    Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v3)
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
i915_gpu_error.c 31.6 KB