drm/i915/execlists: Always clear pending&inflight requests on reset
If we skip the reset as we found the engine inactive at the time of the reset, we still need to clear the residual inflight & pending request bookkeeping to reflect the current state of HW. Otherwise, we may end up stuck in a loop like: <7> [416.490346] hangcheck rcs0 <7> [416.490371] hangcheck Awake? 1 <7> [416.490376] hangcheck Hangcheck: 8003 ms ago <7> [416.490380] hangcheck Reset count: 0 (global 0) <7> [416.490383] hangcheck Requests: <7> [416.491210] hangcheck RING_START: 0x0017b000 <7> [416.491983] hangcheck RING_HEAD: 0x00000048 <7> [416.491992] hangcheck RING_TAIL: 0x00000048 <7> [416.492006] hangcheck RING_CTL: 0x00000000 <7> [416.492037] hangcheck RING_MODE: 0x00000200 [idle] <7> [416.492044] hangcheck RING_IMR: 00000000 <7> [416.492809] hangcheck ACTHD: 0x00000000_9ca00048 <7> [416.492824] hangcheck BBADDR: 0x00000000_00001004 <7> [416.492838] hangcheck DMA_FADDR: 0x00000000_00000000 <7> [416.492845] hangcheck IPEIR: 0x00000000 <7> [416.492852] hangcheck IPEHR: 0x00000000 <7> [416.492863] hangcheck Execlist status: 0x00018001 00000000, entries 12 <7> [416.492869] hangcheck Execlist CSB read 1, write 1, tasklet queued? no (enabled) <7> [416.492938] hangcheck Pending[0] ring:{start:0017b000, hwsp:fedf9000, seqno:00016fd6}, rq: 20ffa:16fd6!+ prio=-4094 @ 8307ms: signaled <7> [416.492972] hangcheck Queue priority hint: -4093 <7> [416.492979] hangcheck Q 20ffa:16fd8- prio=-4093 @ 8307ms: [i915] <7> [416.492985] hangcheck Q 20ffa:16fda prio=-4094 @ 8307ms: [i915] <7> [416.492990] hangcheck Q 20ffa:16fdc prio=-4094 @ 8307ms: [i915] <7> [416.492996] hangcheck Q 20ffa:16fde prio=-4094 @ 8307ms: [i915] <7> [416.493001] hangcheck Q 20ffa:16fe0 prio=-4094 @ 8307ms: [i915] <7> [416.493007] hangcheck Q 20ffa:16fe2 prio=-4094 @ 8307ms: [i915] <7> [416.493013] hangcheck Q 20ffa:16fe4 prio=-4094 @ 8307ms: [i915] <7> [416.493021] hangcheck ...skipping 21 queued requests... <7> [416.493027] hangcheck Q 20ffa:17010 prio=-4094 @ 8307ms: [i915] <7> [416.493081] hangcheck HWSP: <7> [416.493089] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <7> [416.493094] hangcheck * <7> [416.493100] hangcheck [0040] 10008002 00000000 10000018 00000000 10000018 00000000 10000001 00000000 <7> [416.493106] hangcheck [0060] 10000018 00000000 10000001 00000000 10000018 00000000 10000001 00000000 <7> [416.493111] hangcheck * <7> [416.493117] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001 <7> [416.493123] hangcheck [00c0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <7> [416.493127] hangcheck * <7> [416.493132] hangcheck Idle? no <6> [416.512124] i915 0000:00:02.0: GPU HANG: ecode 11:0:0x00000000, hang on rcs0 <6> [416.512205] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace. <6> [416.512207] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel <6> [416.512208] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue. <6> [416.512210] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it. <6> [416.512212] [drm] GPU crash dump saved to /sys/class/drm/card0/error <5> [416.513602] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0 <7> [424.489258] hangcheck rcs0 <7> [424.489263] hangcheck Awake? 1 <7> [424.489267] hangcheck Hangcheck: 5954 ms ago <7> [424.489271] hangcheck Reset count: 1 (global 0) <7> [424.489274] hangcheck Requests: <7> [424.490128] hangcheck RING_START: 0x00000000 <7> [424.490870] hangcheck RING_HEAD: 0x00000000 <7> [424.490877] hangcheck RING_TAIL: 0x00000000 <7> [424.490887] hangcheck RING_CTL: 0x00000000 <7> [424.490897] hangcheck RING_MODE: 0x00000200 [idle] <7> [424.490904] hangcheck RING_IMR: 00000000 <7> [424.490917] hangcheck ACTHD: 0x00000000_00000000 <7> [424.490930] hangcheck BBADDR: 0x00000000_00000000 <7> [424.490943] hangcheck DMA_FADDR: 0x00000000_00000000 <7> [424.490950] hangcheck IPEIR: 0x00000000 <7> [424.490956] hangcheck IPEHR: 0x00000000 <7> [424.490968] hangcheck Execlist status: 0x00000001 00000000, entries 12 <7> [424.490972] hangcheck Execlist CSB read 11, write 11, tasklet queued? no (enabled) <7> [424.490983] hangcheck Pending[0] ring:{start:0017b000, hwsp:fedf9000, seqno:00016fd6}, rq: 20ffa:16fd6!+ prio=-4094 @ 16305ms: signaled <7> [424.490989] hangcheck Queue priority hint: -4093 <7> [424.490996] hangcheck Q 20ffa:16fd8- prio=-4093 @ 16305ms: [i915] <7> [424.491001] hangcheck Q 20ffa:16fda prio=-4094 @ 16305ms: [i915] <7> [424.491006] hangcheck Q 20ffa:16fdc prio=-4094 @ 16305ms: [i915] <7> [424.491011] hangcheck Q 20ffa:16fde prio=-4094 @ 16305ms: [i915] <7> [424.491016] hangcheck Q 20ffa:16fe0 prio=-4094 @ 16305ms: [i915] <7> [424.491022] hangcheck Q 20ffa:16fe2 prio=-4094 @ 16305ms: [i915] <7> [424.491048] hangcheck Q 20ffa:16fe4 prio=-4094 @ 16305ms: [i915] <7> [424.491057] hangcheck ...skipping 21 queued requests... <7> [424.491063] hangcheck Q 20ffa:17010 prio=-4094 @ 16305ms: [i915] <7> [424.491095] hangcheck HWSP: <7> [424.491102] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <7> [424.491106] hangcheck * <7> [424.491113] hangcheck [0040] 10008002 00000000 10000018 00000000 10000018 00000000 10000001 00000000 <7> [424.491118] hangcheck [0060] 10000018 00000000 10000001 00000000 10000018 00000000 10000001 00000000 <7> [424.491122] hangcheck * <7> [424.491127] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000b <7> [424.491133] hangcheck [00c0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <7> [424.491136] hangcheck * <7> [424.491141] hangcheck Idle? no <5> [424.491834] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0 Where not having cleared the pending array on reset, it persists indefinitely. Fixes: fff8102a ("drm/i915/execlists: Process interrupted context on reset") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190730133035.1977-2-chris@chris-wilson.co.uk
Showing
Please register or sign in to comment