1. 09 Mar, 2023 1 commit
    • drm/i915/gsc: flush the GSC worker before wedging on unload · b09f9670
      Daniele Ceraolo Spurio authored
      If we unload the driver and wedge before the GSC worker is complete,
      the worker will hit an error on its submission to the GSC engine and
      then exit. This is hard to hit for a user, but it is reproducible
      with skipping selftests. The error is handled gracefully by the
      worker, so there are no functional issues, but we still end up with
      an error message in dmesg, which is something we want to avoid as
      this is a supported scenario. We could modify the worker to better
      handle a wedging occurring during its execution, but that gets
      complicated for a couple of reasons:
      - We do want the error on runtime wedging, because there are
        implications for subsystems outside of GT (i.e., PXP, HDCP); it's
        only the error on driver unload that we want to silence.
      - The worker is responsible for multiple submissions (GSC FW load,
        HuC auth, SW proxy), so all of those will have to be adapted to
        handle the wedged_on_fini scenario.
      Therefore, it's much simpler to just wait for the worker to be done
      before wedging on driver removal, also considering that the worker
      will likely already be idle in the great majority of non-selftest
      scenarios.
      Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
      Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230223172120.3304293-2-daniele.ceraolospurio@intel.com
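The ordering argument in the commit above (b09f9670) can be illustrated with a small, single-threaded model. This is only a sketch: `struct fake_gsc` and every function name below are hypothetical and are not taken from the i915 source. In the kernel, waiting for a work item is typically done with `flush_work()`; here the flush is modelled by running the pending work synchronously.

```c
#include <stdbool.h>

/* Hypothetical model of the GSC worker state; not the i915 structures. */
struct fake_gsc {
    bool work_pending;   /* the worker still has a submission to make */
    bool wedged;         /* the GT has been wedged */
    int errors_logged;   /* stands in for error messages in dmesg */
};

/* The worker submits to the engine; submitting to a wedged GT fails. */
static void worker_run(struct fake_gsc *g)
{
    if (!g->work_pending)
        return;
    g->work_pending = false;
    if (g->wedged)
        g->errors_logged++;
}

/* Model of a flush: run whatever is pending to completion. */
static void flush_worker(struct fake_gsc *g)
{
    worker_run(g);
}

static void wedge(struct fake_gsc *g)
{
    g->wedged = true;
}

/* Unload without the flush: the worker runs after the wedge and errors out. */
static void unload_wedge_first(struct fake_gsc *g)
{
    wedge(g);
    worker_run(g);
}

/* Unload with the flush: the worker finishes before the wedge. */
static void unload_flush_first(struct fake_gsc *g)
{
    flush_worker(g);
    wedge(g);
    worker_run(g);   /* nothing left to do, so nothing can fail */
}
```

With one pending submission, `unload_wedge_first()` leaves `errors_logged` at 1, while `unload_flush_first()` leaves it at 0. Runtime wedging is untouched in this model, which matches the commit's intent of silencing only the unload case.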
  2. 07 Mar, 2023 2 commits
  3. 03 Mar, 2023 4 commits
    • drm/i915/selftest: Fix ktime_get() and h/w access order · 4d14d771
      Anshuman Gupta authored
      Call ktime_get() after accessing the mmio or any other driver
      resource when wall time is used for calculations that depend on
      the inserted delay, so that any mmio and resource access latency
      is accounted for.
      
      Cc: Chris Wilson <chris.p.wilson@intel.com>
      Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
      Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230223100503.3323627-3-anshuman.gupta@intel.com
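The effect of the sampling order described in 4d14d771 can be shown with a simulated device. This is a sketch under stated assumptions, not the selftest code: `struct fake_hw` and all `fake_*` helpers are hypothetical, the counter is assumed to tick once per nanosecond, and the register value is assumed to be latched at the end of a read whose latency varies from one access to the next.

```c
#include <stdint.h>

/* Hypothetical simulated device; not an i915 structure. */
struct fake_hw {
    uint64_t now_ns;         /* simulated wall clock */
    uint64_t latency_ns[2];  /* cost of the first and second register read */
    unsigned int reads;      /* number of register reads so far */
};

struct sample {
    uint64_t ticks;       /* hardware counter delta */
    uint64_t elapsed_ns;  /* wall-clock delta */
};

/* Reading the register takes time; the value is latched when the read ends. */
static uint64_t fake_read_timestamp(struct fake_hw *hw)
{
    hw->now_ns += hw->latency_ns[hw->reads++ % 2];
    return hw->now_ns;  /* assumed rate: 1 tick per ns */
}

static uint64_t fake_ktime_get(const struct fake_hw *hw)
{
    return hw->now_ns;
}

static void fake_udelay(struct fake_hw *hw, uint64_t us)
{
    hw->now_ns += us * 1000;
}

/* Order from the commit: hardware access first, wall clock afterwards. */
static struct sample measure_hw_then_clock(struct fake_hw *hw)
{
    uint64_t t0 = fake_read_timestamp(hw);
    uint64_t k0 = fake_ktime_get(hw);
    fake_udelay(hw, 1);
    uint64_t t1 = fake_read_timestamp(hw);
    uint64_t k1 = fake_ktime_get(hw);
    return (struct sample){ t1 - t0, k1 - k0 };
}

/* Opposite order: the wall clock misses the latency of the following read. */
static struct sample measure_clock_then_hw(struct fake_hw *hw)
{
    uint64_t k0 = fake_ktime_get(hw);
    uint64_t t0 = fake_read_timestamp(hw);
    fake_udelay(hw, 1);
    uint64_t k1 = fake_ktime_get(hw);
    uint64_t t1 = fake_read_timestamp(hw);
    return (struct sample){ t1 - t0, k1 - k0 };
}
```

With read latencies of 50 ns and 300 ns, the first ordering yields matching deltas (1300 ticks over 1300 ns), while the second reports 1300 ticks over only 1050 ns, a disparity equal to the difference between the two read latencies.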
    • drm/i915/selftest: Fix engine timestamp and ktime disparity · 29b41cf7
      Anshuman Gupta authored
      While reading the engine timestamps there can be uncontrollable
      concurrent mmio access by other i915 child drivers and by GuC,
      so the read does not happen in the truly atomic context that this
      selftest expects, and reading the engine timestamps may incur mmio
      latency. Account for that latency when calculating the time taken
      to read the engine timestamp, so that the selftest can validate
      the timestamp and ktime pair.
      
      Cc: Chris Wilson <chris.p.wilson@intel.com>
      Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
      Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230223100503.3323627-2-anshuman.gupta@intel.com
    • drm/i915/active: Fix misuse of non-idle barriers as fence trackers · 50600605
      Janusz Krzysztofik authored
      Users reported oopses on list corruptions when using i915 perf with a
      number of concurrently running graphics applications.  Root cause analysis
      pointed at an issue in barrier processing code -- a race among perf open /
      close replacing active barriers with perf requests on kernel context and
      concurrent barrier preallocate / acquire operations performed during user
      context first pin / last unpin.
      
      When adding a request to a composite tracker, we try to reuse an existing
      fence tracker, already allocated and registered with that composite.  The
      tracker we obtain may already track another fence, may be an idle barrier,
      or an active barrier.
      
      If the tracker we get turns out to be a non-idle barrier then we try to
      delete that barrier from the list of barrier tasks it belongs to.
      However, while doing that we don't respect the return value from the
      function that performs the barrier deletion.  Should the deletion ever
      fail, we would end up reusing the tracker still registered as a barrier
      task.  Since the same structure field is reused with both fence callback
      lists and barrier tasks list, list corruptions would likely occur.
      
      Barriers are now deleted from a barrier tasks list by temporarily removing
      the list content, traversing that content while skipping over the node to
      be deleted, then populating the list back with the modified content.
      Should those intentionally racy concurrent deletion attempts not be
      serialized, one or more of them may fail because the list is temporarily
      empty.
      
      Related code that ignores the results of barrier deletion was initially
      introduced in v5.4 by commit d8af05ff ("drm/i915: Allow sharing the
      idle-barrier from other kernel requests").  However, all users of the
      barrier deletion routine were apparently serialized at that time, so the
      issue didn't manifest.  Results of git bisect with the help of a newly
      developed igt@gem_barrier_race@remote-request IGT test indicate that list
      corruptions might start to appear after commit 31177017 ("drm/i915/gt:
      Schedule request retirement when timeline idles"), introduced in v5.5.
      
      Respect results of barrier deletion attempts -- mark the barrier as idle
      only if successfully deleted from the list.  Then, before proceeding with
      setting our fence as the one currently tracked, make sure that the tracker
      we've got is not a non-idle barrier.  If that check fails then don't use
      that tracker but go back and try to acquire a new, usable one.
      
      v3: use unlikely() to document what outcome we expect (Andi),
        - fix bad grammar in commit description.
      v2: no code changes,
        - blame commit 31177017 ("drm/i915/gt: Schedule request retirement
          when timeline idles"), v5.5, not commit d8af05ff ("drm/i915: Allow
          sharing the idle-barrier from other kernel requests"), v5.4,
        - reword commit description.
      
      Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6333
      Fixes: 31177017 ("drm/i915/gt: Schedule request retirement when timeline idles")
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: stable@vger.kernel.org # v5.5
      Cc: Andi Shyti <andi.shyti@linux.intel.com>
      Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
      Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
      Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230302120820.48740-1-janusz.krzysztofik@linux.intel.com
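The fix described in 50600605 boils down to honouring the result of the deletion before reusing a tracker. The sketch below models that in single-threaded C; it is not the i915 code. `struct node`, `struct barrier_list`, `barrier_del()` and `tracker_claim()` are hypothetical names, and an empty list stands in for the moment when a concurrent deleter has temporarily taken the list content.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tracker node; the link field would be shared with fence callbacks. */
struct node {
    struct node *next;
    bool is_barrier;    /* still registered as a (non-idle) barrier task */
};

struct barrier_list {
    struct node *first;
};

/* Detach the whole list content, leaving the list temporarily empty. */
static struct node *list_take_all(struct barrier_list *l)
{
    struct node *head = l->first;

    l->first = NULL;
    return head;
}

static void list_push(struct barrier_list *l, struct node *n)
{
    n->next = l->first;
    l->first = n;
}

/* Delete by take-all, skip the victim, repopulate. Reports whether it was found. */
static bool barrier_del(struct barrier_list *l, struct node *victim)
{
    struct node *pos = list_take_all(l);
    struct node *next;
    bool found = false;

    for (; pos; pos = next) {
        next = pos->next;
        if (pos == victim) {
            found = true;
            continue;
        }
        list_push(l, pos);
    }
    return found;
}

/*
 * Mark the barrier idle only if the deletion succeeded, then refuse the
 * tracker if it is still a non-idle barrier so the caller can retry with
 * another one.
 */
static bool tracker_claim(struct barrier_list *l, struct node *t)
{
    if (t->is_barrier && barrier_del(l, t)) {
        t->is_barrier = false;
        t->next = NULL;
    }
    return !t->is_barrier;
}
```

A caller would loop, acquiring a candidate tracker and calling `tracker_claim()` until it returns true. A failed deletion then never leads to a node being linked into two lists through the same field.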
    • drm/i915/gsc: Fix the Driver-FLR completion · 0591bdad
      Alan Previn authored
      The Driver-FLR flow may inadvertently exit early, before the re-init
      of the internal HW state has fully completed, if we only poll
      GU_DEBUG Bit31 (polling for it to toggle from 0 -> 1). Instead
      we need a two-step wait-for-completion flow that also
      involves GU_CNTL. See the patch and new code comments for detail.
      This is new direction from HW architecture folks.
      
         v2: - Add error message for the teardown timeout (Anshuman)
             - Don't duplicate code in comments (Jani)
         v3: - Add get/put runtime-pm for this function. Though
               not functionally required during unload, it's so the uncore
               doesn't complain.
         v4: - Remove the get/put runtime-pm - that was for a prior
               version of this patch (not needed for drm-managed callback).
             - Remove the fixes tag since this is only for MTL and MTL
               still needs force probe (Daniele).
             - Bit 31 of GU_CNTL should be DRIVERFLR instead of
               DRIVERFLR_STATUS (Daniele).
      Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
      Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
      Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230224001758.544817-1-alan.previn.teres.alexis@intel.com
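A two-step wait of the kind 0591bdad describes might look like the sketch below. The commit message only says that the flow involves both GU_CNTL and GU_DEBUG and that bit 31 of GU_CNTL is DRIVERFLR. The order of the two waits (first the request bit clearing in GU_CNTL, then the status bit setting in GU_DEBUG), the simulated device, and all function names are assumptions made for illustration; the patch itself is the authority on the real sequence.

```c
#include <stdbool.h>
#include <stdint.h>

#define DRIVERFLR        (1u << 31)  /* GU_CNTL bit 31, per the commit message */
#define DRIVERFLR_STATUS (1u << 31)  /* GU_DEBUG bit 31, polled for 0 -> 1 */

/* Hypothetical simulated device; poll counts stand in for hardware timing. */
struct fake_dev {
    uint32_t gu_cntl;
    uint32_t gu_debug;
    int teardown_polls;  /* polls until the request bit clears */
    int reinit_polls;    /* further polls until the status bit sets */
    bool reinit_pending;
};

/* Advance the simulated hardware by one poll interval. */
static void fake_tick(struct fake_dev *d)
{
    if (d->gu_cntl & DRIVERFLR) {
        if (--d->teardown_polls <= 0) {
            d->gu_cntl &= ~DRIVERFLR;
            d->reinit_pending = true;
        }
    } else if (d->reinit_pending) {
        if (--d->reinit_polls <= 0) {
            d->gu_debug |= DRIVERFLR_STATUS;
            d->reinit_pending = false;
        }
    }
}

/* Bounded poll: true once (*reg & mask) == want, false on timeout. */
static bool wait_for_bits(struct fake_dev *d, const uint32_t *reg,
                          uint32_t mask, uint32_t want, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if ((*reg & mask) == want)
            return true;
        fake_tick(d);
    }
    return (*reg & mask) == want;
}

/* Returns 0 on success, -1 on teardown timeout, -2 on re-init timeout. */
static int driver_flr(struct fake_dev *d, int max_polls)
{
    d->gu_cntl |= DRIVERFLR;  /* trigger the reset */

    /* Step 1 (assumed): wait for the request bit to clear in GU_CNTL. */
    if (!wait_for_bits(d, &d->gu_cntl, DRIVERFLR, 0, max_polls))
        return -1;

    /* Step 2 (assumed): wait for the completion status in GU_DEBUG. */
    if (!wait_for_bits(d, &d->gu_debug, DRIVERFLR_STATUS,
                       DRIVERFLR_STATUS, max_polls))
        return -2;

    return 0;
}
```

Returning distinct codes for the two timeouts mirrors the v2 note about adding an error message for the teardown timeout, since it lets a caller report which phase stalled.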
  4. 01 Mar, 2023 2 commits
  5. 28 Feb, 2023 2 commits
  6. 27 Feb, 2023 1 commit
  7. 24 Feb, 2023 4 commits
  8. 17 Feb, 2023 4 commits
  9. 15 Feb, 2023 1 commit
  10. 10 Feb, 2023 1 commit
    • drm/i915/xehp: LNCF/LBCF workarounds should be on the GT list · 4583d6be
      Matt Roper authored
      Although registers in the L3 bank/node configuration ranges are marked
      as having "DEV" reset characteristics in the bspec, this appears to be a
      hold-over from pre-Xe_HP platforms.  In reality, these registers
      maintain their values across engine resets, meaning that workarounds
      and tuning settings targeting them should be placed on the GT
      workaround list rather than an engine workaround list.
      
      Note that an extra clue here is that these registers moved from the
      RENDER forcewake domain to the GT forcewake domain in Xe_HP; generally
      RCS/CCS engine resets should not lead to the reset of a register that
      lives outside the RENDER domain.
      
      Re-applying these registers on engine resets wouldn't actually hurt
      anything, but is unnecessary and just makes it more confusing to anyone
      trying to decipher how these registers really work.
      
      v2:
       - Also move DG2's Wa_14010648519 to the GT list.  (Gustavo)
      
      Cc: Gustavo Sousa <gustavo.sousa@intel.com>
      Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
      Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230209232228.859317-1-matthew.d.roper@intel.com
  11. 09 Feb, 2023 8 commits
  12. 08 Feb, 2023 4 commits
  13. 07 Feb, 2023 2 commits
  14. 06 Feb, 2023 3 commits
  15. 03 Feb, 2023 1 commit