1. 06 Sep, 2021 4 commits
    • Daniel Vetter's avatar
      drm/i915: Drop code to handle set-vm races from execbuf · e1068a9e
      Daniel Vetter authored
      Changing the vm from a finalized gem ctx is no longer possible, which
      means we don't have to check for that anymore.
      
      I was pondering whether to keep the check as a WARN_ON, but things go
      boom real bad real fast if the vm of a vma is wrong. Plus we'd need to
      also get the ggtt vm for !full-ppgtt platforms. Ditching it all seemed
      like a better idea.
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      References: ccbc1b97 ("drm/i915/gem: Don't allow changing the VM on running contexts (v4)")
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-4-daniel.vetter@ffwll.ch
      e1068a9e
    • Daniel Vetter's avatar
      drm/i915: Keep gem ctx->vm alive until the final put · 8cf97637
      Daniel Vetter authored
      The comment added in
      
          commit b81dde71
          Author: Chris Wilson <chris@chris-wilson.co.uk>
          Date:   Tue May 21 22:11:29 2019 +0100
      
              drm/i915: Allow userspace to clone contexts on creation
      
      and moved in
      
          commit 27dbae8f
          Author: Chris Wilson <chris@chris-wilson.co.uk>
          Date:   Wed Nov 6 09:13:12 2019 +0000
      
              drm/i915/gem: Safely acquire the ctx->vm when copying
      
      suggested that i915_address_space were at least intended to be managed
      through SLAB_TYPESAFE_BY_RCU:
      
                      * This ppgtt may have be reallocated between
                      * the read and the kref, and reassigned to a third
                      * context. In order to avoid inadvertent sharing
                      * of this ppgtt with that third context (and not
                      * src), we have to confirm that we have the same
                      * ppgtt after passing through the strong memory
                      * barrier implied by a successful
                      * kref_get_unless_zero().
      
      But extensive git history search has not brough any such reuse to
      light.
      
      What has come to light though is that ever since
      
      commit 2850748e
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Fri Oct 4 14:39:58 2019 +0100
      
          drm/i915: Pull i915_vma_pin under the vm->mutex
      
      (yes this commit is earlier) the final i915_vma_put call has been
      moved from i915_gem_context_free (now called _release) to
      context_close, which means it's not actually safe anymore to access
      the ctx->vm pointer without lock helds, because it might disappear at
      any moment. Note that superficially things all still work, because the
      i915_address_space is RCU protected since
      
          commit b32fa811
          Author: Chris Wilson <chris@chris-wilson.co.uk>
          Date:   Thu Jun 20 19:37:05 2019 +0100
      
              drm/i915/gtt: Defer address space cleanup to an RCU worker
      
      except the very clever macro above (which is designed to protected
      against object reuse due to SLAB_TYPESAFE_BY_RCU or similar tricks)
      results in an endless loop if the refcount of the ctx->vm ever
      permanently drops to 0. Which it totally now can.
      
      Fix that by moving the final i915_vm_put to where it should be.
      
      Note that i915_gem_context is rcu protected, but _only_ the final
      kfree. This means anyone who chases a pointer to a gem ctx solely
      under the protection can pretty only call kref_get_unless_zero(). This
      seems to be pretty much the case, aside from a bunch of cases that
      consult the scheduling information without any further protection.
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Brost <matthew.brost@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: "Thomas Hellström" <thomas.hellstrom@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Fixes: 2850748e ("drm/i915: Pull i915_vma_pin under the vm->mutex")
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-3-daniel.vetter@ffwll.ch
      8cf97637
    • Daniel Vetter's avatar
      drm/i915: Release ctx->syncobj on final put, not on ctx close · c238980e
      Daniel Vetter authored
      gem context refcounting is another exercise in least locking design it
      seems, where most things get destroyed upon context closure (which can
      race with anything really). Only the actual memory allocation and the
      locks survive while holding a reference.
      
      This tripped up Jason when reimplementing the single timeline feature
      in
      
      commit 00dae4d3
      Author: Jason Ekstrand <jason@jlekstrand.net>
      Date:   Thu Jul 8 10:48:12 2021 -0500
      
          drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)
      
      We could fix the bug by holding ctx->mutex in execbuf and clear the
      pointer (again while holding the mutex) context_close, but it's
      cleaner to just make the context object actually invariant over its
      _entire_ lifetime. This way any other ioctl that's potentially racing,
      but holding a full reference, can still rely on ctx->syncobj being
      an immutable pointer. Which without this change, is not the case.
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Fixes: 00dae4d3 ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)")
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Brost <matthew.brost@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: "Thomas Hellström" <thomas.hellstrom@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-2-daniel.vetter@ffwll.ch
      c238980e
    • Daniel Vetter's avatar
      drm/i915: Release i915_gem_context from a worker · 75eefd82
      Daniel Vetter authored
      The only reason for this really is the i915_gem_engines->fence
      callback engines_notify(), which exists purely as a fairly funky
      reference counting scheme for that. Otherwise all other callers are
      from process context, and generally fairly benign locking context.
      
      Unfortunately untangling that requires some major surgery, and we have
      a few i915_gem_context reference counting bugs that need fixing, and
      they blow in the current hardirq calling context, so we need a
      stop-gap measure.
      
      Put a FIXME comment in when this should be removable again.
      
      v2: Fix mock_context(), noticed by intel-gfx-ci.
      Acked-by: default avatarAcked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-1-daniel.vetter@ffwll.ch
      75eefd82
  2. 03 Sep, 2021 5 commits
  3. 02 Sep, 2021 1 commit
  4. 31 Aug, 2021 1 commit
  5. 27 Aug, 2021 2 commits
  6. 26 Aug, 2021 1 commit
  7. 25 Aug, 2021 4 commits
  8. 24 Aug, 2021 2 commits
  9. 20 Aug, 2021 5 commits
  10. 19 Aug, 2021 1 commit
    • Matthew Brost's avatar
      drm/i915: Fix syncmap memory leak · faf89098
      Matthew Brost authored
      A small race exists between intel_gt_retire_requests_timeout and
      intel_timeline_exit which could result in the syncmap not getting
      free'd. Rather than work to hard to seal this race, simply cleanup the
      syncmap on fini.
      
      unreferenced object 0xffff88813bc53b18 (size 96):
        comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s)
        hex dump (first 32 bytes):
          01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00  ................
          00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00  ........kkkk....
        backtrace:
          [<00000000120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915]
          [<00000000042f6959>] __sync_set+0x1bb/0x240 [i915]
          [<0000000090f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915]
          [<0000000056a48219>] i915_request_await_object+0x222/0x360 [i915]
          [<00000000aaac4ee3>] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915]
          [<000000003c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915]
          [<00000000fd7a8e68>] drm_ioctl_kernel+0xb0/0xf0 [drm]
          [<00000000e721ee87>] drm_ioctl+0x305/0x3c0 [drm]
          [<000000008b0d8986>] __x64_sys_ioctl+0x71/0xb0
          [<0000000076c362a4>] do_syscall_64+0x33/0x80
          [<00000000eb7a4831>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Signed-off-by: default avatarMatthew Brost <matthew.brost@intel.com>
      Fixes: 531958f6 ("drm/i915/gt: Track timeline activeness in enter/exit")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: default avatarJohn Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: default avatarJohn Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210730195342.110234-1-matthew.brost@intel.com
      faf89098
  11. 18 Aug, 2021 2 commits
    • Matt Roper's avatar
      drm/i915/dg2: Maintain backward-compatible nested batch behavior · 9e9dfd08
      Matt Roper authored
      For tgl+, the per-context setting of MI_MODE[12] determines whether
      the bits of a nested MI_BATCH_BUFFER_START instruction should be
      interpreted in the traditional manner or whether they should
      instead use a new tgl+ meaning that breaks backward compatibility, but
      allows nesting into 3rd-level batchbuffers.  For previous platforms,
      the hardware default for this register bit is to maintain
      backward-compatible behavior unless a context intentionally opts into
      the new behavior; however Xe_HPG flips the hardware default behavior.
      
      From a SW perspective, we want to maintain the backward-compatible
      behavior for userspace, so we'll apply a fake workaround to set it back
      to the legacy behavior on platforms where the hardware default is to
      break compatibility.  At the moment there is no Linux userspace that
      utilizes third-level batchbuffers, so this will avoid userspace from
      needing to make any changes.  using the legacy meaning is the correct
      thing to do.  If/when we have userspace consumers that want to utilize
      third-level batch nesting, we can provide a context parameter to allow
      them to opt-in.
      
      Bspec: 45974, 45718
      Cc: John Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: default avatarMatt Roper <matthew.d.roper@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210805163647.801064-9-matthew.d.roper@intel.comReviewed-by: default avatarAnusha Srivatsa <anusha.srivatsa@intel.com>
      9e9dfd08
    • Kees Cook's avatar
      drm/i915: Use designated initializers for init/exit table · 90fd2194
      Kees Cook authored
      The kernel builds with -Werror=designated-init, and __designated_init
      is used by CONFIG_GCC_PLUGIN_RANDSTRUCT for automatically selected (all
      function pointer) structures. Include the field names in the init/exit
      table. Avoids warnings like:
      
      drivers/gpu/drm/i915/i915_module.c:59:4: error: positional initialization of field in 'struct' declared with 'designated_init' attribute [-Werror=designated-init]
      
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: dri-devel@lists.freedesktop.org
      Fixes: a04ea6ae ("drm/i915: Use a table for i915_init/exit (v2)")
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210817233357.2379455-1-keescook@chromium.org
      90fd2194
  12. 13 Aug, 2021 1 commit
  13. 12 Aug, 2021 3 commits
  14. 11 Aug, 2021 4 commits
  15. 10 Aug, 2021 4 commits