1. 01 Sep, 2024 5 commits
    • drm/msm/a5xx: workaround early ring-buffer emptiness check · a30f9f65
      Vladimir Lypak authored
      There is another cause of a GPU soft lock-up on an empty ring-buffer:
      a race between the GPU executing its last commands and the CPU checking
      the ring for emptiness. On the GPU side, the retire IRQ is triggered by
      the CACHE_FLUSH_TS event, while the RPTR shadow (which is used to check
      ring emptiness) is updated a bit later by CP_CONTEXT_SWITCH_YIELD. Thus,
      if the GPU executes its last commands slowly enough, or we check the
      ring too early, we miss the chance to switch to a lower-priority ring
      because the current ring isn't empty yet. This can escalate to the
      lock-up situation described in the previous patch.

      To work around this issue, we keep track of the last submit sequence
      number for each ring and compare it with the one the GPU writes to
      memptrs while executing the CACHE_FLUSH_TS event.
      
      Fixes: b1fc2839 ("drm/msm: Implement preemption for A5XX targets")
      Signed-off-by: Vladimir Lypak <vladimir.lypak@gmail.com>
      Patchwork: https://patchwork.freedesktop.org/patch/612047/
      Signed-off-by: Rob Clark <robdclark@chromium.org>
    • drm/msm/a5xx: fix races in preemption evaluation stage · ce050f30
      Vladimir Lypak authored
      On A5XX GPUs, when preemption is used, the GPU inevitably ends up in a
      soft lock-up state in which it is stuck at an empty ring-buffer doing
      nothing. This appears as a full UI lock-up and is not detected as a GPU
      hang (because it isn't one). It happens because preemption is not
      triggered when it is needed. A new submit can sometimes recover from
      this state, but generally none arrives, because applications are
      waiting for old submits to retire.
      
      One of the reasons is a race between a5xx_submit and
      a5xx_preempt_trigger called from the IRQ handler during submit retire.
      The former updates ring->cur of a previously empty, non-current ring
      right after the latter has checked it for emptiness. Both threads can
      then simply exit: for the first, preempt_state wasn't NONE yet, and for
      the second, all rings appeared to be empty.
      
      To prevent such situations, we need to guarantee that preempt_trigger
      makes a decision after each submit or retire. To implement this, we
      serialize preemption initiation using a spinlock. If a switch is already
      in progress, we re-trigger preemption when it finishes.
      
      Fixes: b1fc2839 ("drm/msm: Implement preemption for A5XX targets")
      Signed-off-by: Vladimir Lypak <vladimir.lypak@gmail.com>
      Patchwork: https://patchwork.freedesktop.org/patch/612045/
      Signed-off-by: Rob Clark <robdclark@chromium.org>
    • drm/msm/a5xx: properly clear preemption records on resume · 64fd6d01
      Vladimir Lypak authored
      Two fields of preempt_record that are used by the CP aren't reset on
      resume: "data" and "info". This is the reason behind the faults that
      happen when we try to switch to the ring that was last active before
      suspend. In addition, those faults can't be recovered from, because we
      use suspend and resume to recover (which again keeps the stale values
      of those fields).
      
      Fixes: b1fc2839 ("drm/msm: Implement preemption for A5XX targets")
      Signed-off-by: Vladimir Lypak <vladimir.lypak@gmail.com>
      Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
      Patchwork: https://patchwork.freedesktop.org/patch/612043/
      Signed-off-by: Rob Clark <robdclark@chromium.org>
    • drm/msm/a5xx: disable preemption in submits by default · db9dec2d
      Vladimir Lypak authored
      Fine-grain preemption (switching from/to points within submits)
      requires extra handling in the command stream of those submits,
      especially when rendering with tiling (using GMEM). However, this
      handling is currently missing in Mesa (and always has been). As a
      result, we get random GPU faults and hangs if more than one priority
      level is used, because local preemption is enabled before the command
      stream from a submit is executed.

      That said, it was premature to enable local preemption by default,
      considering that even the downstream kernel only enables it if
      requested via UAPI.
      
      Fixes: a7a4c19c ("drm/msm/a5xx: fix setting of the CP_PREEMPT_ENABLE_LOCAL register")
      Signed-off-by: Vladimir Lypak <vladimir.lypak@gmail.com>
      Patchwork: https://patchwork.freedesktop.org/patch/612041/
      Signed-off-by: Rob Clark <robdclark@chromium.org>
    • drm/msm/adreno: Assign msm_gpu->pdev earlier to avoid nullptrs · 16007768
      Konrad Dybcio authored
      There are some cases, such as the one uncovered by Commit 46d4efcc
      ("drm/msm/a6xx: Avoid a nullptr dereference when speedbin setting fails")
      where
      
      msm_gpu_cleanup() : platform_set_drvdata(gpu->pdev, NULL);
      
      is called on gpu->pdev == NULL, as the GPU device has not been fully
      initialized yet.
      
      It turns out that more than just the aforementioned path causes this
      (e.g. when there's speedbin data in the catalog, but opp-supported-hw is
      missing in DT).
      
      Assigning msm_gpu->pdev earlier seems like the least painful solution
      to this, therefore do so.

      Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
      Patchwork: https://patchwork.freedesktop.org/patch/602742/
      Signed-off-by: Rob Clark <robdclark@chromium.org>
  2. 30 Aug, 2024 16 commits
  3. 29 Aug, 2024 19 commits