    drm/i915/execlists: Always clear pending&inflight requests on reset · 10e36489
    Chris Wilson authored
    If we skip the reset as we found the engine inactive at the time of the
    reset, we still need to clear the residual inflight & pending request
    bookkeeping to reflect the current state of HW.
    
    Otherwise, we may end up stuck in a loop like:
    
    <7> [416.490346] hangcheck rcs0
    <7> [416.490371] hangcheck 	Awake? 1
    <7> [416.490376] hangcheck 	Hangcheck: 8003 ms ago
    <7> [416.490380] hangcheck 	Reset count: 0 (global 0)
    <7> [416.490383] hangcheck 	Requests:
    <7> [416.491210] hangcheck 	RING_START: 0x0017b000
    <7> [416.491983] hangcheck 	RING_HEAD:  0x00000048
    <7> [416.491992] hangcheck 	RING_TAIL:  0x00000048
    <7> [416.492006] hangcheck 	RING_CTL:   0x00000000
    <7> [416.492037] hangcheck 	RING_MODE:  0x00000200 [idle]
    <7> [416.492044] hangcheck 	RING_IMR: 00000000
    <7> [416.492809] hangcheck 	ACTHD:  0x00000000_9ca00048
    <7> [416.492824] hangcheck 	BBADDR: 0x00000000_00001004
    <7> [416.492838] hangcheck 	DMA_FADDR: 0x00000000_00000000
    <7> [416.492845] hangcheck 	IPEIR: 0x00000000
    <7> [416.492852] hangcheck 	IPEHR: 0x00000000
    <7> [416.492863] hangcheck 	Execlist status: 0x00018001 00000000, entries 12
    <7> [416.492869] hangcheck 	Execlist CSB read 1, write 1, tasklet queued? no (enabled)
    <7> [416.492938] hangcheck 		Pending[0] ring:{start:0017b000, hwsp:fedf9000, seqno:00016fd6}, rq:  20ffa:16fd6!+  prio=-4094 @ 8307ms: signaled
    <7> [416.492972] hangcheck 		Queue priority hint: -4093
    <7> [416.492979] hangcheck 		Q  20ffa:16fd8-  prio=-4093 @ 8307ms: [i915]
    <7> [416.492985] hangcheck 		Q  20ffa:16fda  prio=-4094 @ 8307ms: [i915]
    <7> [416.492990] hangcheck 		Q  20ffa:16fdc  prio=-4094 @ 8307ms: [i915]
    <7> [416.492996] hangcheck 		Q  20ffa:16fde  prio=-4094 @ 8307ms: [i915]
    <7> [416.493001] hangcheck 		Q  20ffa:16fe0  prio=-4094 @ 8307ms: [i915]
    <7> [416.493007] hangcheck 		Q  20ffa:16fe2  prio=-4094 @ 8307ms: [i915]
    <7> [416.493013] hangcheck 		Q  20ffa:16fe4  prio=-4094 @ 8307ms: [i915]
    <7> [416.493021] hangcheck 		...skipping 21 queued requests...
    <7> [416.493027] hangcheck 		Q  20ffa:17010  prio=-4094 @ 8307ms: [i915]
    <7> [416.493081] hangcheck HWSP:
    <7> [416.493089] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    <7> [416.493094] hangcheck *
    <7> [416.493100] hangcheck [0040] 10008002 00000000 10000018 00000000 10000018 00000000 10000001 00000000
    <7> [416.493106] hangcheck [0060] 10000018 00000000 10000001 00000000 10000018 00000000 10000001 00000000
    <7> [416.493111] hangcheck *
    <7> [416.493117] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001
    <7> [416.493123] hangcheck [00c0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    <7> [416.493127] hangcheck *
    <7> [416.493132] hangcheck Idle? no
    <6> [416.512124] i915 0000:00:02.0: GPU HANG: ecode 11:0:0x00000000, hang on rcs0
    <6> [416.512205] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
    <6> [416.512207] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
    <6> [416.512208] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
    <6> [416.512210] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
    <6> [416.512212] [drm] GPU crash dump saved to /sys/class/drm/card0/error
    <5> [416.513602] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
    <7> [424.489258] hangcheck rcs0
    <7> [424.489263] hangcheck 	Awake? 1
    <7> [424.489267] hangcheck 	Hangcheck: 5954 ms ago
    <7> [424.489271] hangcheck 	Reset count: 1 (global 0)
    <7> [424.489274] hangcheck 	Requests:
    <7> [424.490128] hangcheck 	RING_START: 0x00000000
    <7> [424.490870] hangcheck 	RING_HEAD:  0x00000000
    <7> [424.490877] hangcheck 	RING_TAIL:  0x00000000
    <7> [424.490887] hangcheck 	RING_CTL:   0x00000000
    <7> [424.490897] hangcheck 	RING_MODE:  0x00000200 [idle]
    <7> [424.490904] hangcheck 	RING_IMR: 00000000
    <7> [424.490917] hangcheck 	ACTHD:  0x00000000_00000000
    <7> [424.490930] hangcheck 	BBADDR: 0x00000000_00000000
    <7> [424.490943] hangcheck 	DMA_FADDR: 0x00000000_00000000
    <7> [424.490950] hangcheck 	IPEIR: 0x00000000
    <7> [424.490956] hangcheck 	IPEHR: 0x00000000
    <7> [424.490968] hangcheck 	Execlist status: 0x00000001 00000000, entries 12
    <7> [424.490972] hangcheck 	Execlist CSB read 11, write 11, tasklet queued? no (enabled)
    <7> [424.490983] hangcheck 		Pending[0] ring:{start:0017b000, hwsp:fedf9000, seqno:00016fd6}, rq:  20ffa:16fd6!+  prio=-4094 @ 16305ms: signaled
    <7> [424.490989] hangcheck 		Queue priority hint: -4093
    <7> [424.490996] hangcheck 		Q  20ffa:16fd8-  prio=-4093 @ 16305ms: [i915]
    <7> [424.491001] hangcheck 		Q  20ffa:16fda  prio=-4094 @ 16305ms: [i915]
    <7> [424.491006] hangcheck 		Q  20ffa:16fdc  prio=-4094 @ 16305ms: [i915]
    <7> [424.491011] hangcheck 		Q  20ffa:16fde  prio=-4094 @ 16305ms: [i915]
    <7> [424.491016] hangcheck 		Q  20ffa:16fe0  prio=-4094 @ 16305ms: [i915]
    <7> [424.491022] hangcheck 		Q  20ffa:16fe2  prio=-4094 @ 16305ms: [i915]
    <7> [424.491048] hangcheck 		Q  20ffa:16fe4  prio=-4094 @ 16305ms: [i915]
    <7> [424.491057] hangcheck 		...skipping 21 queued requests...
    <7> [424.491063] hangcheck 		Q  20ffa:17010  prio=-4094 @ 16305ms: [i915]
    <7> [424.491095] hangcheck HWSP:
    <7> [424.491102] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    <7> [424.491106] hangcheck *
    <7> [424.491113] hangcheck [0040] 10008002 00000000 10000018 00000000 10000018 00000000 10000001 00000000
    <7> [424.491118] hangcheck [0060] 10000018 00000000 10000001 00000000 10000018 00000000 10000001 00000000
    <7> [424.491122] hangcheck *
    <7> [424.491127] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000b
    <7> [424.491133] hangcheck [00c0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    <7> [424.491136] hangcheck *
    <7> [424.491141] hangcheck Idle? no
    <5> [424.491834] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
    
    Here, because the pending array was not cleared on reset, the stale
    (already signaled) Pending[0] entry persists indefinitely.
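
The intent can be sketched with a small standalone model. This is only an illustration under stated assumptions, not the driver's code: `execlists_model`, `clear_ports`, `model_reset`, `model_is_idle` and `MAX_PORTS` are hypothetical names invented for the example. It shows the bookkeeping being cleared unconditionally, whether or not the hardware reset itself was skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PORTS 2 /* hypothetical port count for this model */

struct request { unsigned int seqno; };

/* Simplified stand-in for the engine's request bookkeeping. */
struct execlists_model {
	struct request *pending[MAX_PORTS + 1];
	struct request *inflight[MAX_PORTS + 1];
	bool hw_active; /* was the engine executing at reset time? */
	int hw_resets;  /* number of hardware resets actually performed */
};

/* Drop every tracked request so software state matches idle hardware. */
static void clear_ports(struct execlists_model *el)
{
	memset(el->pending, 0, sizeof(el->pending));
	memset(el->inflight, 0, sizeof(el->inflight));
}

static void model_reset(struct execlists_model *el)
{
	assert(el != NULL);

	/* Only touch the hardware if it was active at the time of reset. */
	if (el->hw_active) {
		el->hw_resets++;
		el->hw_active = false;
	}

	/* Always clear, even when the hardware reset was skipped. */
	clear_ports(el);
}

/* The model is idle once nothing is running, pending or inflight. */
static bool model_is_idle(const struct execlists_model *el)
{
	return !el->hw_active && !el->pending[0] && !el->inflight[0];
}
```

In this model, an engine that is inactive but still holds a stale `pending[0]` entry reports itself as busy until `model_reset()` runs. If the clear were placed inside the `hw_active` branch, the entry would survive every reset, which is the loop shown in the log above.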
    
    Fixes: fff8102a ("drm/i915/execlists: Process interrupted context on reset")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Andi Shyti <andi.shyti@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190730133035.1977-2-chris@chris-wilson.co.uk