-
Chris Wilson authored
On Tigerlake, we are seeing a repeat of commit d8f50531 ("drm/i915/icl: Forcibly evict stale csb entries") where, presumably, due to a missing Global Observation Point synchronisation, the write pointer of the CSB ringbuffer is updated _prior_ to the contents of the ringbuffer. That is we see the GPU report more context-switch entries for us to parse, but those entries have not been written, leading us to process stale events, and eventually report a hung GPU. However, this effect appears to be much more severe than we previously saw on Icelake (though it might be best if we try the same approach there as well and measure), and Bruce suggested the good idea of resetting the CSB entry after use so that we can detect when it has been updated by the GPU. By instrumenting how long that may be, we can set a reliable upper bound for how long we should wait for: 513 late, avg of 61 retries (590 ns), max of 1061 retries (10099 ns) Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045 References: d8f50531 ("drm/i915/icl: Forcibly evict stale csb entries") References: HSDES#22011327657, HSDES#1508287568 Suggested-by: Bruce Chang <yu.bruce.chang@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Bruce Chang <yu.bruce.chang@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: stable@vger.kernel.org # v5.4 Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-2-chris@chris-wilson.co.uk (cherry picked from commit 233c1ae3c83f21046c6c4083da904163ece8f110) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4a9bb58a