1. 13 May, 2019 1 commit
  2. 09 May, 2019 9 commits
  3. 08 May, 2019 3 commits
    • Chris Wilson's avatar
      drm/i915: Seal races between async GPU cancellation, retirement and signaling · 0152b3b3
      Chris Wilson authored
      Currently there is an underlying assumption that i915_request_unsubmit()
      is synchronous wrt the GPU -- that is the request is no longer in flight
      as we remove it. In the near future that may change, and this may upset
      our signaling as we can process an interrupt for that request while it
      is no longer in flight.
      
      CPU0					CPU1
      intel_engine_breadcrumbs_irq
      (queue request completion)
      					i915_request_cancel_signaling
      ...					...
      					i915_request_enable_signaling
      dma_fence_signal
      
      Hence in the time it took us to drop the lock to signal the request, a
      preemption event may have occurred and re-queued the request. In the
      process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and
      so reused the rq->signal_link that was in use on CPU0, leading to bad
      pointer chasing in intel_engine_breadcrumbs_irq.
      
      A related issue was that if someone started listening for a signal on a
      completed but no longer in-flight request, we missed the opportunity to
      immediately signal that request.
      
      Furthermore, as intel_contexts may be immediately released during
      request retirement, in order to be entirely sure that
      intel_engine_breadcrumbs_irq may no longer dereference the intel_context
      (ce->signals and ce->signal_link), we must wait for irq spinlock.
      
      In order to prevent the race, we use a bit in the fence.flags to signal
      the transfer onto the signal list inside intel_engine_breadcrumbs_irq.
      For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then
      quickly signals to any outside observer that the fence is indeed signaled.
      
      v2: Sketch out potential dma-fence API for manual signaling
      v3: And the test_and_set_bit()
      
      Fixes: 52c0fdb2 ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking")
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190508112452.18942-1-chris@chris-wilson.co.uk
      0152b3b3
    • Chris Wilson's avatar
      drm/i915/hangcheck: Replace hangcheck.seqno with RING_HEAD · 519a0194
      Chris Wilson authored
      After realising we need to sample RING_START to detect context switches
      from preemption events that do not allow for the seqno to advance, we
      can also realise that the seqno itself is just a distance along the ring
      and so can be replaced by sampling RING_HEAD.
      
      v2: Bonus comment for the mystery separate CS_STALL before MI_USER_INTERRUPT
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190508080704.24223-1-chris@chris-wilson.co.uk
      519a0194
    • Chris Wilson's avatar
      drm/i915: Reboot CI if forcewake fails · 18ecc6c5
      Chris Wilson authored
      If the HW fails to ack a change in forcewake status, the machine is as
      good as dead -- it may recover, but in reality it missed the mmio
      updates and is now in a very inconsistent state. If it happens, we can't
      trust the CI results (or at least the fails may be genuine but due to
      the HW being dead and not the actual test!) so reboot the machine (CI
      checks for a kernel taint in between each test and reboots if the
      machine is tainted).
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190508115245.27790-1-chris@chris-wilson.co.uk
      18ecc6c5
  4. 07 May, 2019 13 commits
  5. 06 May, 2019 5 commits
  6. 04 May, 2019 1 commit
    • Chris Wilson's avatar
      drm/i915: Disable semaphore busywaits on saturated systems · ca6e56f6
      Chris Wilson authored
      Asking the GPU to busywait on a memory address, perhaps not unexpectedly
      in hindsight for a shared system, leads to bus contention that affects
      CPU programs trying to concurrently access memory. This can manifest as
      a drop in transcode throughput on highly over-saturated workloads.
      
      The only clue offered by perf, is that the bus-cycles (perf stat -e
      bus-cycles) jumped by 50% when enabling semaphores. This corresponds
      with extra CPU active cycles being attributed to intel_idle's mwait.
      
      This patch introduces a heuristic to try and detect when more than one
      client is submitting to the GPU pushing it into an oversaturated state.
      As we already keep track of when the semaphores are signaled, we can
      inspect their state on submitting the busywait batch and if we planned
      to use a semaphore but were too late, conclude that the GPU is
      overloaded and not try to use semaphores in future requests. In
      practice, this means we optimistically try to use semaphores for the
      first frame of a transcode job split over multiple engines, and fail if
      there are multiple clients active and continue not to use semaphores for
      the subsequent frames in the sequence. Periodically, we try to
      optimistically switch semaphores back on whenever the client waits to
      catch up with the transcode results.
      
      With 1 client, on Broxton J3455, with the relative fps normalized by %cpu:
      
      x no semaphores
      + drm-tip
      * patched
      +------------------------------------------------------------------------+
      |                                                    *                   |
      |                                                    *+                  |
      |                                                    **+                 |
      |                                                    **+  x              |
      |                                x               *  +**+  x              |
      |                                x  x       *    *  +***x xx             |
      |                                x  x       *    * *+***x *x             |
      |                                x  x*   +  *    * *****x *x x           |
      |                         +    x xx+x*   + ***   * ********* x   *       |
      |                         +    x xx+x*   * *** +** ********* xx  *       |
      |    *   +         ++++*  +    x*x****+*+* ***+*************+x*  *       |
      |*+ +** *+ + +* + *++****** *xxx**********x***+*****************+*++    *|
      |                                   |__________A_____M_____|             |
      |                           |_______________A____M_________|             |
      |                                 |____________A___M________|            |
      +------------------------------------------------------------------------+
          N           Min           Max        Median           Avg        Stddev
      x 120       2.60475       3.50941       3.31123     3.2143953    0.21117399
      + 120        2.3826       3.57077       3.25101     3.1414161    0.28146407
      Difference at 95.0% confidence
      	-0.0729792 +/- 0.0629585
      	-2.27039% +/- 1.95864%
      	(Student's t, pooled s = 0.248814)
      * 120       2.35536       3.66713        3.2849     3.2059917    0.24618565
      No difference proven at 95.0% confidence
      
      With 10 clients over-saturating the pipeline:
      
      x no semaphores
      + drm-tip
      * patched
      +------------------------------------------------------------------------+
      |                     ++                                        **       |
      |                     ++                                        **       |
      |                     ++                                        **       |
      |                     ++                                        **       |
      |                     ++                                    xx ***       |
      |                     ++                                    xx ***       |
      |                     ++                                    xxx***       |
      |                     ++                                    xxx***       |
      |                    +++                                    xxx***       |
      |                    +++                                    xx****       |
      |                    +++                                    xx****       |
      |                    +++                                    xx****       |
      |                    +++                                    xx****       |
      |                    ++++                                   xx****       |
      |                   +++++                                   xx****       |
      |                   +++++                                 x x******      |
      |                  ++++++                                 xxx*******     |
      |                  ++++++                                 xxx*******     |
      |                  ++++++                                 xxx*******     |
      |                  ++++++                                 xx********     |
      |                  ++++++                               xxxx********     |
      |                  ++++++                               xxxx********     |
      |                ++++++++                             xxxxx*********     |
      |+ +  +        + ++++++++                           xxx*xx**********x*  *|
      |                                                         |__A__|        |
      |                 |__AM__|                                               |
      |                                                            |__A_|      |
      +------------------------------------------------------------------------+
          N           Min           Max        Median           Avg        Stddev
      x 120       2.47855        2.8972       2.72376     2.7193402   0.074604933
      + 120       1.17367       1.77459       1.71977     1.6966782   0.085850697
      Difference at 95.0% confidence
      	-1.02266 +/- 0.0203502
      	-37.607% +/- 0.748352%
      	(Student's t, pooled s = 0.0804246)
      * 120       2.57868       3.00821       2.80142     2.7923878   0.058646477
      Difference at 95.0% confidence
      	0.0730476 +/- 0.0169791
      	2.68622% +/- 0.624383%
      	(Student's t, pooled s = 0.0671018)
      
      Indicating that we've recovered the regression from enabling semaphores
      on this saturated setup, with a hint towards an overall improvement.
      
      Very similar, but of smaller magnitude, results are observed on both
      Skylake(gt2) and Kabylake(gt4). This may be due to the reduced impact of
      bus-cycles, where we see a 50% hit on Broxton, it is only 10% on the big
      core, in this particular test.
      
      One observation to make here is that for a greedy client trying to
      maximise its own throughput, using semaphores is the right choice. It is
      only the holistic system-wide view that semaphores of one client
      impacts another and reduces the overall throughput where we would choose
      to disable semaphores.
      
      The most noticeable negactive impact this has is on the no-op
      microbenchmarks, which are also very notable for having no cpu bus load.
      In particular, this increases the runtime and energy consumption of
      gem_exec_whisper.
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190504070707.30902-1-chris@chris-wilson.co.uk
      ca6e56f6
  7. 03 May, 2019 8 commits