1. 13 May, 2024 1 commit
  2. 08 May, 2024 3 commits
  3. 07 May, 2024 1 commit
    • Janusz Krzysztofik's avatar
      Revert "drm/i915: Remove extra multi-gt pm-references" · 749670a5
      Janusz Krzysztofik authored
      This reverts commit 1f33dc0c.
      
      There was a patch supposed to fix an issue of illegal attempts to free a
      still active i915 VMA object when parking a GT believed to be idle,
      reported by CI on 2-GT Meteor Lake.  As a solution, an extra wakeref for
      a Primary GT was acquired from i915_gem_do_execbuffer() -- see commit
      f56fe3e9 ("drm/i915: Fix a VMA UAF for multi-gt platform").
      
      However, that fix occurred insufficient -- the issue was still reported by
      CI.  That wakeref was released on exit from i915_gem_do_execbuffer(), then
      potentially before completion of the request and deactivation of its
      associated VMAs.  Moreover, CI reports indicated that single-GT platforms
      also suffered sporadically from the same race.
      
      Since that issue was fixed by another commit f3c71b2d ("drm/i915/vma:
      Fix UAF on destroy against retire race"), the changes introduced by that
      insufficient fix were dropped as no longer useful.  However, that series
      resulted in another VMA UAF scenario now being triggered in CI.
      
      <4> [260.290809] ------------[ cut here ]------------
      <4> [260.290988] list_del corruption. prev->next should be ffff888118c5d990, but was ffff888118c5a510. (prev=ffff888118c5a510)
      <4> [260.291004] WARNING: CPU: 2 PID: 1143 at lib/list_debug.c:62 __list_del_entry_valid_or_report+0xb7/0xe0
      ..
      <4> [260.291055] CPU: 2 PID: 1143 Comm: kms_plane Not tainted 6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1
      <4> [260.291058] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 01/31/2024
      <4> [260.291060] RIP: 0010:__list_del_entry_valid_or_report+0xb7/0xe0
      ...
      <4> [260.291087] Call Trace:
      <4> [260.291089]  <TASK>
      <4> [260.291124]  i915_vma_reopen+0x43/0x80 [i915]
      <4> [260.291298]  eb_lookup_vmas+0x9cb/0xcc0 [i915]
      <4> [260.291579]  i915_gem_do_execbuffer+0xc9a/0x26d0 [i915]
      <4> [260.291883]  i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915]
      ...
      <4> [260.292301]  </TASK>
      ...
      <4> [260.292506] ---[ end trace 0000000000000000 ]---
      <4> [260.292782] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6ca3: 0000 [#1] PREEMPT SMP NOPTI
      <4> [260.303575] CPU: 2 PID: 1143 Comm: kms_plane Tainted: G        W          6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1
      <4> [260.313851] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 01/31/2024
      <4> [260.326359] RIP: 0010:eb_validate_vmas+0x114/0xd80 [i915]
      ...
      <4> [260.428756] Call Trace:
      <4> [260.431192]  <TASK>
      <4> [639.283393]  i915_gem_do_execbuffer+0xd05/0x26d0 [i915]
      <4> [639.305245]  i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915]
      ...
      <4> [639.411134]  </TASK>
      ...
      <4> [639.449979] ---[ end trace 0000000000000000 ]---
      
      We defer actually closing, unbinding and destroying a VMA until next idle
      point, or until the object is freed in the meantime.  By postponing the
      unbind, we allow for the VMA to be reopened by the client, avoiding the
      work required to rebind the VMA.
      
      Starting from commit b0647a5e ("drm/i915: Avoid live-lock with
      i915_vma_parked()"), we assume that as long as a GT is held idle, no VMA
      would be reopened while we destroy them.  That assumption is no longer
      true in multi-GT configurations, where a VMA we reopen may be handled by a
      GT different from the one that we already keep active via its engine while
      we set up an execbuf request.
      
      Restoring the extra GT0 PM wakeref removed from i915_gem_do_execbuffer()
      processing path seems to fix this issue.
      
      Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10608Signed-off-by: default avatarJanusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
      Reviewed-by: default avatarNirmoy Das <nirmoy.das@linux.intel.com>
      Fixes: 1f33dc0c ("drm/i915: Remove extra multi-gt pm-references")
      Link: https://patchwork.freedesktop.org/patch/msgid/20240506180253.96858-2-janusz.krzysztofik@linux.intel.comSigned-off-by: default avatarRodrigo Vivi <rodrigo.vivi@intel.com>
      749670a5
  4. 30 Apr, 2024 1 commit
  5. 24 Apr, 2024 2 commits
  6. 19 Apr, 2024 1 commit
  7. 14 Apr, 2024 1 commit
  8. 05 Apr, 2024 1 commit
    • John Harrison's avatar
      drm/i915/guc: Fix the fix for reset lock confusion · 3563d855
      John Harrison authored
      The previous fix for the circlular lock splat about the busyness
      worker wasn't quite complete. Even though the reset-in-progress flag
      is cleared at the start of intel_uc_reset_finish, the entire function
      is still inside the reset mutex lock. Not sure why the patch appeared
      to fix the issue both locally and in CI. However, it is now back
      again.
      
      There is a further complication that the wedge code path within
      intel_gt_reset() jumps around so much that it results in nested
      reset_prepare/_finish calls. That is, the call sequence is:
        intel_gt_reset
        | reset_prepare
        | __intel_gt_set_wedged
        | | reset_prepare
        | | reset_finish
        | reset_finish
      
      The nested finish means that even if the clear of the in-progress flag
      was moved to the end of _finish, it would still be clear for the
      entire second call. Surprisingly, this does not seem to be causing any
      other problems at present.
      
      As an aside, a wedge on fini does not call the finish functions at
      all. The reset_in_progress flag is left set (twice).
      
      So instead of trying to cancel the worker anywhere at all in the reset
      path, just add a cancel to intel_guc_submission_fini instead. Note
      that it is not a problem if the worker is still active during a reset.
      Either it will run before the reset path starts locking things and
      will simply block the reset code for a tiny amount of time. Or it will
      run after the locks have been acquired and will early exit due to the
      try-lock.
      
      Also, do not use the reset-in-progress flag to decide whether a
      synchronous cancel is safe (from a lockdep perspective) or not.
      Instead, use the actual reset mutex state (both the genuine one and
      the custom rolled BACKOFF one).
      
      Fixes: 0e00a881 ("drm/i915/guc: Avoid circular locking issue on busyness flush")
      Signed-off-by: default avatarJohn Harrison <John.C.Harrison@Intel.com>
      Cc: Zhanjun Dong <zhanjun.dong@intel.com>
      Cc: John Harrison <John.C.Harrison@Intel.com>
      Cc: Andi Shyti <andi.shyti@linux.intel.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Nirmoy Das <nirmoy.das@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
      Cc: Andrzej Hajda <andrzej.hajda@intel.com>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
      Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
      Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
      Cc: Madhumitha Tolakanahalli Pradeep <madhumitha.tolakanahalli.pradeep@intel.com>
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Cc: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
      Reviewed-by: default avatarNirmoy Das <nirmoy.das@intel.com>
      Reviewed-by: default avatarAndi Shyti <andi.shyti@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240329235306.1559639-1-John.C.Harrison@Intel.com
      3563d855
  9. 03 Apr, 2024 1 commit
  10. 30 Mar, 2024 3 commits
  11. 28 Mar, 2024 2 commits
  12. 26 Mar, 2024 4 commits
    • Chris Wilson's avatar
      drm/i915/gt: Reset queue_priority_hint on parking · 98850e96
      Chris Wilson authored
      Originally, with strict in order execution, we could complete execution
      only when the queue was empty. Preempt-to-busy allows replacement of an
      active request that may complete before the preemption is processed by
      HW. If that happens, the request is retired from the queue, but the
      queue_priority_hint remains set, preventing direct submission until
      after the next CS interrupt is processed.
      
      This preempt-to-busy race can be triggered by the heartbeat, which will
      also act as the power-management barrier and upon completion allow us to
      idle the HW. We may process the completion of the heartbeat, and begin
      parking the engine before the CS event that restores the
      queue_priority_hint, causing us to fail the assertion that it is MIN.
      
      <3>[  166.210729] __engine_park:283 GEM_BUG_ON(engine->sched_engine->queue_priority_hint != (-((int)(~0U >> 1)) - 1))
      <0>[  166.210781] Dumping ftrace buffer:
      <0>[  166.210795] ---------------------------------
      ...
      <0>[  167.302811] drm_fdin-1097      2..s1. 165741070us : trace_ports: 0000:00:02.0 rcs0: promote { ccid:20 1217:2 prio 0 }
      <0>[  167.302861] drm_fdin-1097      2d.s2. 165741072us : execlists_submission_tasklet: 0000:00:02.0 rcs0: preempting last=1217:2, prio=0, hint=2147483646
      <0>[  167.302928] drm_fdin-1097      2d.s2. 165741072us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 1217:2, current 0
      <0>[  167.302992] drm_fdin-1097      2d.s2. 165741073us : __i915_request_submit: 0000:00:02.0 rcs0: fence 3:4660, current 4659
      <0>[  167.303044] drm_fdin-1097      2d.s1. 165741076us : execlists_submission_tasklet: 0000:00:02.0 rcs0: context:3 schedule-in, ccid:40
      <0>[  167.303095] drm_fdin-1097      2d.s1. 165741077us : trace_ports: 0000:00:02.0 rcs0: submit { ccid:40 3:4660* prio 2147483646 }
      <0>[  167.303159] kworker/-89       11..... 165741139us : i915_request_retire.part.0: 0000:00:02.0 rcs0: fence c90:2, current 2
      <0>[  167.303208] kworker/-89       11..... 165741148us : __intel_context_do_unpin: 0000:00:02.0 rcs0: context:c90 unpin
      <0>[  167.303272] kworker/-89       11..... 165741159us : i915_request_retire.part.0: 0000:00:02.0 rcs0: fence 1217:2, current 2
      <0>[  167.303321] kworker/-89       11..... 165741166us : __intel_context_do_unpin: 0000:00:02.0 rcs0: context:1217 unpin
      <0>[  167.303384] kworker/-89       11..... 165741170us : i915_request_retire.part.0: 0000:00:02.0 rcs0: fence 3:4660, current 4660
      <0>[  167.303434] kworker/-89       11d..1. 165741172us : __intel_context_retire: 0000:00:02.0 rcs0: context:1216 retire runtime: { total:56028ns, avg:56028ns }
      <0>[  167.303484] kworker/-89       11..... 165741198us : __engine_park: 0000:00:02.0 rcs0: parked
      <0>[  167.303534]   <idle>-0         5d.H3. 165741207us : execlists_irq_handler: 0000:00:02.0 rcs0: semaphore yield: 00000040
      <0>[  167.303583] kworker/-89       11..... 165741397us : __intel_context_retire: 0000:00:02.0 rcs0: context:1217 retire runtime: { total:325575ns, avg:0ns }
      <0>[  167.303756] kworker/-89       11..... 165741777us : __intel_context_retire: 0000:00:02.0 rcs0: context:c90 retire runtime: { total:0ns, avg:0ns }
      <0>[  167.303806] kworker/-89       11..... 165742017us : __engine_park: __engine_park:283 GEM_BUG_ON(engine->sched_engine->queue_priority_hint != (-((int)(~0U >> 1)) - 1))
      <0>[  167.303811] ---------------------------------
      <4>[  167.304722] ------------[ cut here ]------------
      <2>[  167.304725] kernel BUG at drivers/gpu/drm/i915/gt/intel_engine_pm.c:283!
      <4>[  167.304731] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      <4>[  167.304734] CPU: 11 PID: 89 Comm: kworker/11:1 Tainted: G        W          6.8.0-rc2-CI_DRM_14193-gc655e0fd2804+ #1
      <4>[  167.304736] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.2204210138 04/21/2022
      <4>[  167.304738] Workqueue: i915-unordered retire_work_handler [i915]
      <4>[  167.304839] RIP: 0010:__engine_park+0x3fd/0x680 [i915]
      <4>[  167.304937] Code: 00 48 c7 c2 b0 e5 86 a0 48 8d 3d 00 00 00 00 e8 79 48 d4 e0 bf 01 00 00 00 e8 ef 0a d4 e0 31 f6 bf 09 00 00 00 e8 03 49 c0 e0 <0f> 0b 0f 0b be 01 00 00 00 e8 f5 61 fd ff 31 c0 e9 34 fd ff ff 48
      <4>[  167.304940] RSP: 0018:ffffc9000059fce0 EFLAGS: 00010246
      <4>[  167.304942] RAX: 0000000000000200 RBX: 0000000000000000 RCX: 0000000000000006
      <4>[  167.304944] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
      <4>[  167.304946] RBP: ffff8881330ca1b0 R08: 0000000000000001 R09: 0000000000000001
      <4>[  167.304947] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8881330ca000
      <4>[  167.304948] R13: ffff888110f02aa0 R14: ffff88812d1d0205 R15: ffff88811277d4f0
      <4>[  167.304950] FS:  0000000000000000(0000) GS:ffff88844f780000(0000) knlGS:0000000000000000
      <4>[  167.304952] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4>[  167.304953] CR2: 00007fc362200c40 CR3: 000000013306e003 CR4: 0000000000770ef0
      <4>[  167.304955] PKRU: 55555554
      <4>[  167.304957] Call Trace:
      <4>[  167.304958]  <TASK>
      <4>[  167.305573]  ____intel_wakeref_put_last+0x1d/0x80 [i915]
      <4>[  167.305685]  i915_request_retire.part.0+0x34f/0x600 [i915]
      <4>[  167.305800]  retire_requests+0x51/0x80 [i915]
      <4>[  167.305892]  intel_gt_retire_requests_timeout+0x27f/0x700 [i915]
      <4>[  167.305985]  process_scheduled_works+0x2db/0x530
      <4>[  167.305990]  worker_thread+0x18c/0x350
      <4>[  167.305993]  kthread+0xfe/0x130
      <4>[  167.305997]  ret_from_fork+0x2c/0x50
      <4>[  167.306001]  ret_from_fork_asm+0x1b/0x30
      <4>[  167.306004]  </TASK>
      
      It is necessary for the queue_priority_hint to be lower than the next
      request submission upon waking up, as we rely on the hint to decide when
      to kick the tasklet to submit that first request.
      
      Fixes: 22b7a426 ("drm/i915/execlists: Preempt-to-busy")
      Closes: https://gitlab.freedesktop.org/drm/intel/issues/10154Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarJanusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: <stable@vger.kernel.org> # v5.4+
      Reviewed-by: default avatarRodrigo Vivi <rodrigo.vivi@intel.com>
      Reviewed-by: default avatarAndi Shyti <andi.shyti@linux.intel.com>
      Signed-off-by: default avatarAndi Shyti <andi.shyti@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240318135906.716055-2-janusz.krzysztofik@linux.intel.com
      98850e96
    • Janusz Krzysztofik's avatar
      Revert "drm/i915: Wait for active retire before i915_active_fini()" · d666a494
      Janusz Krzysztofik authored
      This reverts commit 7a2280e8, obsoleted
      by "drm/i915/vma: Fix UAF on destroy against retire race".
      Signed-off-by: default avatarJanusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
      Cc: Nirmoy Das <nirmoy.das@intel.com>
      Reviewed-by: default avatarNirmoy Das <nirmoy.das@intel.com>
      Signed-off-by: default avatarAndi Shyti <andi.shyti@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240305143747.335367-8-janusz.krzysztofik@linux.intel.com
      d666a494
    • Janusz Krzysztofik's avatar
      drm/i915: Remove extra multi-gt pm-references · 1f33dc0c
      Janusz Krzysztofik authored
      There was an attempt to fix an issue of illegal attempts to free a still
      active i915 VMA object when parking a GT believed to be idle, reported by
      CI on 2-GT Meteor Lake.  As a solution, an extra wakeref for a Primary GT
      was acquired from i915_gem_do_execbuffer() -- see commit f56fe3e9
      ("drm/i915: Fix a VMA UAF for multi-gt platform").
      
      However, that fix occurred insufficient -- the issue was still reported by
      CI.  That wakeref was released on exit from i915_gem_do_execbuffer(), then
      potentially before completion of the request and deactivation of its
      associated VMAs.  Moreover, CI reports indicated that single-GT platforms
      also suffered sporadically from the same race.
      
      Since the issue has now been fixed by a preceding patch "drm/i915/vma: Fix
      UAF on destroy against retire race", drop the no longer useful changes
      introduced by that insufficient fix.
      
      v3: Also drop the no longer used .wakeref_gt0 field from struct
          i915_execbuffer.
      v2: Avoid the word "revert" in commit message (Rodrigo),
        - update commit description reusing relevant chunks dropped from the
          description of the proper fix (Rodrigo).
      Signed-off-by: default avatarJanusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
      Cc: Nirmoy Das <nirmoy.das@intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Reviewed-by: default avatarNirmoy Das <nirmoy.das@intel.com>
      Signed-off-by: default avatarAndi Shyti <andi.shyti@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240305143747.335367-7-janusz.krzysztofik@linux.intel.com
      1f33dc0c
    • Janusz Krzysztofik's avatar
      drm/i915/vma: Fix UAF on destroy against retire race · f3c71b2d
      Janusz Krzysztofik authored
      Object debugging tools were sporadically reporting illegal attempts to
      free a still active i915 VMA object when parking a GT believed to be idle.
      
      [161.359441] ODEBUG: free active (active state 0) object: ffff88811643b958 object type: i915_active hint: __i915_vma_active+0x0/0x50 [i915]
      [161.360082] WARNING: CPU: 5 PID: 276 at lib/debugobjects.c:514 debug_print_object+0x80/0xb0
      ...
      [161.360304] CPU: 5 PID: 276 Comm: kworker/5:2 Not tainted 6.5.0-rc1-CI_DRM_13375-g003f860e5577+ #1
      [161.360314] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.2204210138 04/21/2022
      [161.360322] Workqueue: i915-unordered __intel_wakeref_put_work [i915]
      [161.360592] RIP: 0010:debug_print_object+0x80/0xb0
      ...
      [161.361347] debug_object_free+0xeb/0x110
      [161.361362] i915_active_fini+0x14/0x130 [i915]
      [161.361866] release_references+0xfe/0x1f0 [i915]
      [161.362543] i915_vma_parked+0x1db/0x380 [i915]
      [161.363129] __gt_park+0x121/0x230 [i915]
      [161.363515] ____intel_wakeref_put_last+0x1f/0x70 [i915]
      
      That has been tracked down to be happening when another thread is
      deactivating the VMA inside __active_retire() helper, after the VMA's
      active counter has been already decremented to 0, but before deactivation
      of the VMA's object is reported to the object debugging tool.
      
      We could prevent from that race by serializing i915_active_fini() with
      __active_retire() via ref->tree_lock, but that wouldn't stop the VMA from
      being used, e.g. from __i915_vma_retire() called at the end of
      __active_retire(), after that VMA has been already freed by a concurrent
      i915_vma_destroy() on return from the i915_active_fini().  Then, we should
      rather fix the issue at the VMA level, not in i915_active.
      
      Since __i915_vma_parked() is called from __gt_park() on last put of the
      GT's wakeref, the issue could be addressed by holding the GT wakeref long
      enough for __active_retire() to complete before that wakeref is released
      and the GT parked.
      
      I believe the issue was introduced by commit d9393973 ("drm/i915:
      Remove the vma refcount") which moved a call to i915_active_fini() from
      a dropped i915_vma_release(), called on last put of the removed VMA kref,
      to i915_vma_parked() processing path called on last put of a GT wakeref.
      However, its visibility to the object debugging tool was suppressed by a
      bug in i915_active that was fixed two weeks later with commit e92eb246
      ("drm/i915/active: Fix missing debug object activation").
      
      A VMA associated with a request doesn't acquire a GT wakeref by itself.
      Instead, it depends on a wakeref held directly by the request's active
      intel_context for a GT associated with its VM, and indirectly on that
      intel_context's engine wakeref if the engine belongs to the same GT as the
      VMA's VM.  Those wakerefs are released asynchronously to VMA deactivation.
      
      Fix the issue by getting a wakeref for the VMA's GT when activating it,
      and putting that wakeref only after the VMA is deactivated.  However,
      exclude global GTT from that processing path, otherwise the GPU never goes
      idle.  Since __i915_vma_retire() may be called from atomic contexts, use
      async variant of wakeref put.  Also, to avoid circular locking dependency,
      take care of acquiring the wakeref before VM mutex when both are needed.
      
      v7: Add inline comments with justifications for:
          - using untracked variants of intel_gt_pm_get/put() (Nirmoy),
          - using async variant of _put(),
          - not getting the wakeref in case of a global GTT,
          - always getting the first wakeref outside vm->mutex.
      v6: Since __i915_vma_active/retire() callbacks are not serialized, storing
          a wakeref tracking handle inside struct i915_vma is not safe, and
          there is no other good place for that.  Use untracked variants of
          intel_gt_pm_get/put_async().
      v5: Replace "tile" with "GT" across commit description (Rodrigo),
        - avoid mentioning multi-GT case in commit description (Rodrigo),
        - explain why we need to take a temporary wakeref unconditionally inside
          i915_vma_pin_ww() (Rodrigo).
      v4: Refresh on top of commit 5e4e06e4 ("drm/i915: Track gt pm
          wakerefs") (Andi),
        - for more easy backporting, split out removal of former insufficient
          workarounds and move them to separate patches (Nirmoy).
        - clean up commit message and description a bit.
      v3: Identify root cause more precisely, and a commit to blame,
        - identify and drop former workarounds,
        - update commit message and description.
      v2: Get the wakeref before VM mutex to avoid circular locking dependency,
        - drop questionable Fixes: tag.
      
      Fixes: d9393973 ("drm/i915: Remove the vma refcount")
      Closes: https://gitlab.freedesktop.org/drm/intel/issues/8875Signed-off-by: default avatarJanusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Nirmoy Das <nirmoy.das@intel.com>
      Cc: Andi Shyti <andi.shyti@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: stable@vger.kernel.org # v5.19+
      Reviewed-by: default avatarNirmoy Das <nirmoy.das@intel.com>
      Signed-off-by: default avatarAndi Shyti <andi.shyti@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240305143747.335367-6-janusz.krzysztofik@linux.intel.com
      f3c71b2d
  13. 20 Mar, 2024 1 commit
  14. 13 Mar, 2024 1 commit
  15. 11 Mar, 2024 1 commit
  16. 07 Mar, 2024 5 commits
  17. 06 Mar, 2024 1 commit
  18. 05 Mar, 2024 2 commits
    • Janusz Krzysztofik's avatar
      drm/i915/selftests: Fix dependency of some timeouts on HZ · 6ee3f54b
      Janusz Krzysztofik authored
      Third argument of i915_request_wait() accepts a timeout value in jiffies.
      Most users pass either a simple HZ based expression, or a result of
      msecs_to_jiffies(), or MAX_SCHEDULE_TIMEOUT, or a very small number not
      exceeding 4 if applicable as that value.  However, there is one user --
      intel_selftest_wait_for_rq() -- that passes a WAIT_FOR_RESET_TIME symbol,
      defined as a large constant value that most probably represents a desired
      timeout in ms.  While that usage results in the intended value of timeout
      on usual x86_64 kernel configurations, it is not portable across different
      architectures and custom kernel configs.
      
      Rename the symbol to clearly indicate intended units and convert it to
      jiffies before use.
      
      Fixes: 3a4bfa09 ("drm/i915/selftest: Fix workarounds selftest for GuC submission")
      Signed-off-by: default avatarJanusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
      Cc: Rahul Kumar Singh <rahul.kumar.singh@intel.com>
      Cc: John Harrison <John.C.Harrison@Intel.com>
      Cc: Matthew Brost <matthew.brost@intel.com>
      Reviewed-by: default avatarAndi Shyti <andi.shyti@linux.intel.com>
      Signed-off-by: default avatarAndi Shyti <andi.shyti@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240222113347.648945-2-janusz.krzysztofik@linux.intel.com
      6ee3f54b
    • Janusz Krzysztofik's avatar
      drm/i915/selftest_hangcheck: Check sanity with more patience · 6616e048
      Janusz Krzysztofik authored
      While trying to reproduce some other issues reported by CI for i915
      hangcheck live selftest, I found them hidden behind timeout failures
      reported by igt_hang_sanitycheck -- the very first hangcheck test case
      executed.
      
      Feb 22 19:49:06 DUT1394ACMR kernel: calling  mei_gsc_driver_init+0x0/0xff0 [mei_gsc] @ 121074
      Feb 22 19:49:06 DUT1394ACMR kernel: i915 0000:03:00.0: [drm] DRM_I915_DEBUG enabled
      Feb 22 19:49:06 DUT1394ACMR kernel: i915 0000:03:00.0: [drm] Cannot find any crtc or sizes
      Feb 22 19:49:06 DUT1394ACMR kernel: probe of i915.mei-gsc.768 returned 0 after 1475 usecs
      Feb 22 19:49:06 DUT1394ACMR kernel: probe of i915.mei-gscfi.768 returned 0 after 1441 usecs
      Feb 22 19:49:06 DUT1394ACMR kernel: initcall mei_gsc_driver_init+0x0/0xff0 [mei_gsc] returned 0 after 3010 usecs
      Feb 22 19:49:06 DUT1394ACMR kernel: i915 0000:03:00.0: [drm] DRM_I915_DEBUG_GEM enabled
      Feb 22 19:49:06 DUT1394ACMR kernel: i915 0000:03:00.0: [drm] DRM_I915_DEBUG_RUNTIME_PM enabled
      Feb 22 19:49:06 DUT1394ACMR kernel: i915: Performing live selftests with st_random_seed=0x4c26c048 st_timeout=500
      Feb 22 19:49:07 DUT1394ACMR kernel: i915: Running hangcheck
      Feb 22 19:49:07 DUT1394ACMR kernel: calling  mei_hdcp_driver_init+0x0/0xff0 [mei_hdcp] @ 121074
      Feb 22 19:49:07 DUT1394ACMR kernel: i915: Running intel_hangcheck_live_selftests/igt_hang_sanitycheck
      Feb 22 19:49:07 DUT1394ACMR kernel: probe of 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04 returned 0 after 1398 usecs
      Feb 22 19:49:07 DUT1394ACMR kernel: probe of i915.mei-gsc.768-b638ab7e-94e2-4ea2-a552-d1c54b627f04 returned 0 after 97 usecs
      Feb 22 19:49:07 DUT1394ACMR kernel: initcall mei_hdcp_driver_init+0x0/0xff0 [mei_hdcp] returned 0 after 101960 usecs
      Feb 22 19:49:07 DUT1394ACMR kernel: calling  mei_pxp_driver_init+0x0/0xff0 [mei_pxp] @ 121094
      Feb 22 19:49:07 DUT1394ACMR kernel: probe of 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1 returned 0 after 435 usecs
      Feb 22 19:49:07 DUT1394ACMR kernel: mei_pxp i915.mei-gsc.768-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: bound 0000:03:00.0 (ops i915_pxp_tee_component_ops [i915])
      Feb 22 19:49:07 DUT1394ACMR kernel: 100ms wait for request failed on rcs0, err=-62
      Feb 22 19:49:07 DUT1394ACMR kernel: probe of i915.mei-gsc.768-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1 returned 0 after 158425 usecs
      Feb 22 19:49:07 DUT1394ACMR kernel: initcall mei_pxp_driver_init+0x0/0xff0 [mei_pxp] returned 0 after 224159 usecs
      Feb 22 19:49:07 DUT1394ACMR kernel: i915/intel_hangcheck_live_selftests: igt_hang_sanitycheck failed with error -5
      Feb 22 19:49:07 DUT1394ACMR kernel: i915: probe of 0000:03:00.0 failed with error -5
      
      Those request waits, once timed out after 100ms, have never been
      confirmed to still persist over another 100ms, always being able to
      complete within the originally requested wait time doubled.
      
      Taking into account potentially significant additional concurrent workload
      generated by new auxiliary drivers that didn't exist before and now are
      loaded in parallel with the i915 module also when loaded in selftest mode,
      relax our expectations on time consumed by the sanity check request before
      it completes.
      Signed-off-by: default avatarJanusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
      Reviewed-by: default avatarAndi Shyti <andi.shyti@linux.intel.com>
      Signed-off-by: default avatarAndi Shyti <andi.shyti@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240228152500.38267-2-janusz.krzysztofik@linux.intel.com
      6616e048
  19. 04 Mar, 2024 1 commit
  20. 01 Mar, 2024 2 commits
  21. 28 Feb, 2024 1 commit
  22. 20 Feb, 2024 1 commit
  23. 15 Feb, 2024 1 commit
  24. 14 Feb, 2024 1 commit
  25. 13 Feb, 2024 1 commit