1. 13 Sep, 2021 23 commits
  2. 10 Sep, 2021 2 commits
    • Vinay Belgaumkar's avatar
      drm/i915: Get PM ref before accessing HW register · f25e3908
      Vinay Belgaumkar authored
      Seeing these errors when GT is likely in suspend state-
      "RPM wakelock ref not held during HW access"
      
      Ensure GT is awake before trying to access HW registers. Avoid
      reading the register if that is not the case.
      Signed-off-by: default avatarVinay Belgaumkar <vinay.belgaumkar@intel.com>
      Fixes: 41e5c17e ("drm/i915/guc/slpc: Sysfs hooks for SLPC")
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: default avatarJohn Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210907232704.12982-1-vinay.belgaumkar@intel.com
      f25e3908
    • Tvrtko Ursulin's avatar
      drm/i915: Use Transparent Hugepages when IOMMU is enabled · 74388ca4
      Tvrtko Ursulin authored
      Usage of Transparent Hugepages was disabled in 9987da4b
      ("drm/i915: Disable THP until we have a GPU read BW W/A"), but since it
      appears majority of performance regressions reported with an enabled IOMMU
      can be almost eliminated by turning them on, lets just do that.
      
      To err on the side of safety we keep the current default in cases where
      IOMMU is not active, and only when it is default to the "huge=within_size"
      mode. Although there probably would be wins to enable them throughout,
      more extensive testing across benchmarks and platforms would need to be
      done.
      
      With the patch and IOMMU enabled my local testing on a small Skylake part
      shows OglVSTangent regression being reduced from ~14% (IOMMU on versus
      IOMMU off) to ~2% (same comparison but with THP on).
      
      More detailed testing done in the below referenced Gitlab issue by Eero:
      
      Skylake GT4e:
      
      Performance drops from enabling IOMMU:
      
          30-35% SynMark CSDof
          20-25% Unigine Heaven, MemBW GPU write, SynMark VSTangent
          ~20% GLB Egypt  (1/2 screen window)
          10-15% GLB T-Rex (1/2 screen window)
          8-10% GfxBench T-Rex, MemBW GPU blit
          7-8% SynMark DeferredAA + TerrainFly* + ZBuffer
          6-7% GfxBench Manhattan 3.0 + 3.1, SynMark TexMem128 & CSCloth
          5-6% GfxBench CarChase, Unigine Valley
          3-5% GfxBench Vulkan & GL AztecRuins + ALU2, MemBW GPU texture,
               SynMark Fill*, Deferred, TerrainPan*
          1-2% Most of the other tests
      
      With the patch drops become:
      
          20-25% SynMark TexMem*
          15-20% GLB Egypt (1/2 screen window)
          10-15% GLB T-Rex (1/2 screen window)
          4-7% GfxBench T-Rex, GpuTest Triangle
          1-8% GfxBench ALU2 (offscreen 1%, onscreen 8%)
          3% GfxBench Manhattan 3.0, SynMark CSDof
          2-3% Unigine Heaven + Valley, MemBW GPU texture
          1-3 GfxBench Manhattan 3.1 + CarChase + Vulkan & GL AztecRuins
      
      Broxton:
      
      Performance drops from IOMMU, without patch:
      
          30% MemBW GPU write
          25% SynMark ZBuffer + Fill*
          20% MemBW GPU blit
          15% MemBW GPU blend, GpuTest Triangle
          10-15% MemBW GPU texture
          10% GLB Egypt, Unigine Heaven (had hangs), SynMark TerrainFly*
          7-9% GLB T-Rex, GfxBench Manhattan 3.0 + T-Rex,
               SynMark Deferred* + TexMem*
          6-8% GfxBench CarChase, Unigine Valley,
               SynMark CSCloth + ShMapVsm + TerrainPan*
          5-6% GfxBench Manhattan 3.1 + GL AztecRuins,
               SynMark CSDof + TexFilterTri
          2-4% GfxBench ALU2, SynMark DrvRes + GSCloth + ShMapPcf + Batch[0-5] +
               TexFilterAniso, GpuTest GiMark + 32-bit Julia
      
      And with patch:
      
          15-20% MemBW GPU texture
          10% SynMark TexMem*
          8-9% GLB Egypt (1/2 screen window)
          4-5% GLB T-Rex (1/2 screen window)
          3-6% GfxBench Manhattan 3.0, GpuTest FurMark,
               SynMark Deferred + TexFilterTri
          3-4% GfxBench Manhattan 3.1 + T-Rex, SynMark VSInstancing
          2-4% GpuTest Triangle, SynMark DeferredAA
          2-3% Unigine Heaven + Valley
          1-3% SynMark Terrain*
          1-2% GfxBench CarChase, SynMark TexFilterAniso + ZBuffer
      
      Tigerlake-H:
      
          20-25% MemBW GPU texture
          15-20% GpuTest Triangle
          13-15% SynMark TerrainFly* + DeferredAA + HdrBloom
          8-10% GfxBench Manhattan 3.1, SynMark TerrainPan* + DrvRes
          6-7% GfxBench Manhattan 3.0, SynMark TexMem*
          4-8% GLB onscreen Fill + T-Rex + Egypt (more in onscreen than
               offscreen versions of T-Rex/Egypt)
          4-6% GfxBench CarChase + GLES AztecRuins + ALU2, GpuTest 32-bit Julia,
               SynMark CSDof + DrvState
          3-5% GfxBench T-Rex + Egypt, Unigine Heaven + Valley, GpuTest Plot3D
          1-7% Media tests
          2-3% MemBW GPU blit
          1-3% Most of the rest of 3D tests
      
      With the patch:
      
          6-8% MemBW GPU blend => the only regression in these tests (compared
               to IOMMU without THP)
          4-6% SynMark DrvState (not impacted) + HdrBloom (improved)
          3-4% GLB T-Rex
          ~3% GLB Egypt, SynMark DrvRes
          1-3% GfxBench T-Rex + Egypt, SynMark TexFilterTri
          1-2% GfxBench CarChase + GLES AztecRuins, Unigine Valley,
              GpuTest Triangle
          ~1% GfxBench Manhattan 3.0/3.1, Unigine Heaven
      
      Perf of several tests actually improved with IOMMU + THP, compared to no
      IOMMU / no THP:
      
          10-15% SynMark Batch[0-3]
          5-10% MemBW GPU texture, SynMark ShMapVsm
          3-4% SynMark Fill* + Geom*
          2-3% SynMark TexMem512 + CSCloth
          1-2% SynMark TexMem128 + DeferredAA
      
      As a summary across all platforms, these are the benchmarks where enabling
      THP on top of IOMMU enabled brings regressions:
      
       * Skylake GT4e:
         20-25% SynMark TexMem*
         (whereas all MemBW GPU tests either improve or are not affected)
      
       * Broxton J4205:
         7% MemBW GPU texture
         2-3% SynMark TexMem*
      
       * Tigerlake-H:
         7% MemBW GPU blend
      
      Other benchmarks show either lowering of regressions or improvements.
      
      v2:
       * Add Kconfig dependency to transparent hugepages and some help text.
       * Move to helper for easier handling of kernel build options.
      
      v3:
       * Drop Kconfig. (Daniel)
      
      v4:
       * Add some benchmark results to commit message.
      
      v5:
       * Add explicit regression summary to commit message. (Eero)
      
      References: b901bb89 ("drm/i915/gemfs: enable THP")
      References: 9987da4b ("drm/i915: Disable THP until we have a GPU read BW W/A")
      References: https://gitlab.freedesktop.org/drm/intel/-/issues/430Co-developed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Eero Tamminen <eero.t.tamminen@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Signed-off-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarRodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210909114448.508493-1-tvrtko.ursulin@linux.intel.com
      74388ca4
  3. 08 Sep, 2021 4 commits
  4. 07 Sep, 2021 1 commit
  5. 06 Sep, 2021 10 commits
    • Daniel Vetter's avatar
      drm/i915: Stop rcu support for i915_address_space · dcc5d820
      Daniel Vetter authored
      The full audit is quite a bit of work:
      
      - i915_dpt has very simple lifetime (somehow we create a display pagetable vm
        per object, so its _very_ simple, there's only ever a single vma in there),
        and uses i915_vm_close(), which internally does a i915_vm_put(). No rcu.
      
        Aside: wtf is i915_dpt doing in the intel_display.c garbage collector as a new
        feature, instead of added as a separate file with some clean-ish interface.
      
        Also, i915_dpt unfortunately re-introduces some coding patterns from
        pre-dma_resv_lock conversion times.
      
      - i915_gem_proto_ctx is fully refcounted and no rcu, all protected by
        fpriv->proto_context_lock.
      
      - i915_gem_context is itself rcu protected, and that might leak to anything it
        points at. Before
      
      	commit cf977e18
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Wed Dec 2 11:21:40 2020 +0000
      
      	    drm/i915/gem: Spring clean debugfs
      
        and
      
      	commit db80a129
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Mon Jan 18 11:08:54 2021 +0000
      
      	    drm/i915/gem: Remove per-client stats from debugfs/i915_gem_objects
      
        we had a bunch of debugfs files that relied on rcu protecting everything, but
        those are gone now. The main one was removed even earlier with
      
        There doesn't seem to be anything left that's actually protecting
        stuff now that the ctx->vm itself is invariant. See
      
      	commit ccbc1b97
      	Author: Jason Ekstrand <jason@jlekstrand.net>
      	Date:   Thu Jul 8 10:48:30 2021 -0500
      
      	    drm/i915/gem: Don't allow changing the VM on running contexts (v4)
      
        Note that we drop the vm refcount before the final release of the gem context
        refcount, so this is all very dangerous even without rcu. Note that aside from
        later on creating new engines (a defunct feature) and debug output we're never
        looked at gem_ctx->vm for anything functional, hence why this is ok.
        Fingers crossed.
      
        Preceeding patches removed all vestiges of rcu use from gem_ctx->vm
        derferencing to make it clear it's really not used.
      
        The gem_ctx->rcu protection was introduced in
      
      	commit a4e7ccda
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Fri Oct 4 14:40:09 2019 +0100
      
      	    drm/i915: Move context management under GEM
      
        The commit message is somewhat entertaining because it fails to
        mention this fact completely, and compensates that by an in-commit
        changelog entry that claims that ctx->vm is protected by ctx->mutex.
        Which was the case _before_ this commit, but no longer after it.
      
      - intel_context holds a full reference. Unfortunately intel_context is also rcu
        protected and the reference to the ->vm is dropped before the
        rcu barrier - only the kfree is delayed. So again we need to check
        whether that leaks anywhere on the intel_context->vm. RCU is only
        used to protect intel_context sitting on the breadcrumb lists, which
        don't look at the vm anywhere, so we are fine.
      
        Nothing else relies on rcu protection of intel_context and hence is
        fully protected by the kref refcount alone, which protects
        intel_context->vm in turn.
      
        The breadcrumbs rcu usage was added in
      
      	commit c744d503
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Thu Nov 26 14:04:06 2020 +0000
      
      	    drm/i915/gt: Split the breadcrumb spinlock between global and contexts
      
        its parent commit added the intel_context rcu protection:
      
      	commit 14d1eaf0
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Thu Nov 26 14:04:05 2020 +0000
      
      	    drm/i915/gt: Protect context lifetime with RCU
      
        given some credence to my claim that I've actually caught them all.
      
      - drm_i915_gem_object's shares_resv_from pointer has a full refcount to the
        dma_resv, which is a sub-refcount that's released after the final
        i915_vm_put() has been called. Safe.
      
        Aside: Maybe we should have a struct dma_resv_shared which is just dma_resv +
        kref as a stand-alone thing. It's a pretty useful pattern which other drivers
        might want to copy.
      
        For a bit more context see
      
      	commit 4d8151ae
      	Author: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      	Date:   Tue Jun 1 09:46:41 2021 +0200
      
      	    drm/i915: Don't free shared locks while shared
      
      - the fpriv->vm_xa was relying on rcu_read_lock for lookup, but that
        was updated in a prep patch too to just be a spinlock-protected
        lookup.
      
      - intel_gt->vm is set at driver load in intel_gt_init() and released
        in intel_gt_driver_release(). There seems to be some issue that
        in some error paths this is called twice, but otherwise no rcu to be
        found anywhere. This was added in the below commit, which
        unfortunately doesn't explain why this complication exists.
      
      	commit e6ba7648
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Sat Dec 21 16:03:24 2019 +0000
      
      	    drm/i915: Remove i915->kernel_context
      
        The proper fix most likely for this is to start using drmm_ at large
        scale, but that's also huge amounts of work.
      
      - i915_vma->vm is some real pain, because rcu is rcu protected, at
        least in the vma lookup in the context lookup cache in
        eb_lookup_vma(). This was added in
      
      	commit 4ff4b44c
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Fri Jun 16 15:05:16 2017 +0100
      
      	    drm/i915: Store a direct lookup from object handle to vma
      
        This was changed to a radix tree from the hashtable in, but with the
        locking unchanged, in
      
      	commit d1b48c1e
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Wed Aug 16 09:52:08 2017 +0100
      
      	    drm/i915: Replace execbuf vma ht with an idr
      
        In
      
      	commit 93159e12
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Mon Mar 23 09:28:41 2020 +0000
      
      	    drm/i915/gem: Avoid gem_context->mutex for simple vma lookup
      
        the locking was changed from dev->struct_mutex to rcu, which added
        the requirement to rcu protect i915_vma. Somehow this was missed in
        review (or I'm completely blind).
      
        Irrespective of all that the vma lookup cache rcu_read_lock grabs a
        full reference of the vma and the rcu doesn't leak further. So no
        impact on i915_address_space from that.
      
        I have not found any other rcu use for i915_vma, but given that it
        seems broken I also didn't bother to do a careful in-depth audit.
      
      Alltogether there's nothing left in-tree anymore which requires that a
      pointer deref to an i915_address_space is safe undre rcu_read_lock
      only.
      
      rcu protection of i915_address_space was introduced in
      
      commit b32fa811
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Thu Jun 20 19:37:05 2019 +0100
      
          drm/i915/gtt: Defer address space cleanup to an RCU worker
      
      by mixing up a bugfixing (i915_address_space needs to be released from
      a worker) with enabling rcu support. The commit message also seems
      somewhat confused, because it talks about cleanup of WC pages
      requiring sleep, while the code and linked bugzilla are about a
      requirement to take dev->struct_mutex (which yes sleeps but it's a
      much more specific problem). Since final kref_put can be called from
      pretty much anywhere (including hardirq context through the
      scheduler's i915_active cleanup) we need a worker here. Hence that
      part must be kept.
      
      Ideally all these reclaim workers should have some kind of integration
      with our shrinkers, but for some of these it's rather tricky. Anyway,
      that's a preexisting condition in the codeebase that we wont fix in
      this patch here.
      
      We also remove the rcu_barrier in ggtt_cleanup_hw added in
      
      commit 60a4233a
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Mon Jul 29 14:24:12 2019 +0100
      
          drm/i915: Flush the i915_vm_release before ggtt shutdown
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-11-daniel.vetter@ffwll.ch
      dcc5d820
    • Daniel Vetter's avatar
      drm/i915: use xa_lock/unlock for fpriv->vm_xa lookups · 84315152
      Daniel Vetter authored
      We don't need the absolute speed of rcu for this. And
      i915_address_space in general dont need rcu protection anywhere else,
      after we've made gem contexts and engines a lot more immutable.
      
      Note that this semantically reverts
      
      commit aabbe344
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Fri Aug 30 19:03:25 2019 +0100
      
          drm/i915: Use RCU for unlocked vm_idr lookup
      
      except we have the conversion from idr to xarray in between.
      
      v2: kref_get_unless_zero is no longer required (Maarten)
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-10-daniel.vetter@ffwll.ch
      84315152
    • Daniel Vetter's avatar
      drm/i915: Drop __rcu from gem_context->vm · 9ec8795e
      Daniel Vetter authored
      It's been invariant since
      
          commit ccbc1b97
          Author: Jason Ekstrand <jason@jlekstrand.net>
          Date:   Thu Jul 8 10:48:30 2021 -0500
      
              drm/i915/gem: Don't allow changing the VM on running contexts (v4)
      
      this just completes the deed. I've tried to split out prep work for
      more careful review as much as possible, this is what's left:
      
      - get_ppgtt gets simplified since we don't need to grab a temporary
        reference - we can rely on the temporary reference for the gem_ctx
        while we inspect the vm. The new vm_id still needs a full
        i915_vm_open ofc. This also removes the final caller of context_get_vm_rcu
      
      - A pile of selftests can now just look at ctx->vm instead of
        rcu_dereference_protected( , true) or similar things.
      
      - All callers of i915_gem_context_vm also disappear.
      
      - I've changed the hugepage selftest to set scrub_64K without any
        locking, because when we inspect that setting we're also not taking
        any locks either. It works because it's a selftests that's careful
        (single threaded gives you nice ordering) and not a live driver
        where races can happen from anywhere.
      
      These can only be split up further if we have some intermediate state
      with a bunch more rcu_dereference_protected(ctx->vm, true), just to
      shut up lockdep and sparse.
      
      The conversion to __rcu happened in
      
      commit a4e7ccda
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Fri Oct 4 14:40:09 2019 +0100
      
          drm/i915: Move context management under GEM
      
      Note that we're not breaking the actual bugfix in there: The real
      bugfix is pushing the i915_vm_relase onto a separate worker, to avoid
      locking inversion issues. The rcu conversion was just thrown in for
      entertainment value on top (no vm lookup isn't even close to anything
      that's a hotpath where removing the single spinlock can be measured).
      
      v2: Rebase over the change to move the i915_vm_put() into
      i915_gem_context_release().
      
      v3: Trivial conflict against repainted shed.
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-9-daniel.vetter@ffwll.ch
      9ec8795e
    • Daniel Vetter's avatar
      drm/i915: Use i915_gem_context_get_eb_vm in intel_context_set_gem · 0483a301
      Daniel Vetter authored
      Since
      
      commit ccbc1b97
      Author: Jason Ekstrand <jason@jlekstrand.net>
      Date:   Thu Jul 8 10:48:30 2021 -0500
      
          drm/i915/gem: Don't allow changing the VM on running contexts (v4)
      
      the gem_ctx->vm can't change anymore. Plus we always set the
      intel_context->vm, so might as well use the helper we have for that.
      
      This makes it very clear that we always overwrite intel_context->vm
      for userspace contexts, since the default is gt->vm, which is
      explicitly reserved for kernel context use. It would be good to split
      things up a bit further and avoid any possibility for an accident
      where we run kernel stuff in userspace vm or the other way round.
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-8-daniel.vetter@ffwll.ch
      0483a301
    • Daniel Vetter's avatar
      drm/i915: Add i915_gem_context_is_full_ppgtt · a82a9979
      Daniel Vetter authored
      And use it anywhere we have open-coded checks for ctx->vm that really
      only check for full ppgtt.
      
      Plus for paranoia add a GEM_BUG_ON that checks it's really only set
      when we have full ppgtt, just in case. gem_context->vm is different
      since it's NULL in ggtt mode, unlike intel_context->vm or gt->vm,
      which is always set.
      
      v2: 0day found a testcase that I missed.
      
      v3: Repaint shed (Jon, Tvrtko)
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-7-daniel.vetter@ffwll.ch
      a82a9979
    • Daniel Vetter's avatar
      drm/i915: Use i915_gem_context_get_eb_vm in ctx_getparam · 24fad29e
      Daniel Vetter authored
      Consolidates the "which is the vm my execbuf runs in" code a bit. We
      do some get/put which isn't really required, but all the other users
      want the refcounting, and I figured doing a function just for this
      getparam to avoid 2 atomis is a bit much.
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-6-daniel.vetter@ffwll.ch
      24fad29e
    • Daniel Vetter's avatar
      drm/i915: Rename i915_gem_context_get_vm_rcu to i915_gem_context_get_eb_vm · c6d04e48
      Daniel Vetter authored
      The important part isn't so much that this does an rcu lookup - that's
      more an implementation detail, which will also be removed.
      
      The thing that makes this different from other functions is that it's
      gettting you the vm that batchbuffers will run in for that gem
      context, which is either a full ppgtt stored in gem->ctx, or the ggtt.
      
      We'll make more use of this function later on.
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-5-daniel.vetter@ffwll.ch
      c6d04e48
    • Daniel Vetter's avatar
      drm/i915: Drop code to handle set-vm races from execbuf · e1068a9e
      Daniel Vetter authored
      Changing the vm from a finalized gem ctx is no longer possible, which
      means we don't have to check for that anymore.
      
      I was pondering whether to keep the check as a WARN_ON, but things go
      boom real bad real fast if the vm of a vma is wrong. Plus we'd need to
      also get the ggtt vm for !full-ppgtt platforms. Ditching it all seemed
      like a better idea.
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      References: ccbc1b97 ("drm/i915/gem: Don't allow changing the VM on running contexts (v4)")
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-4-daniel.vetter@ffwll.ch
      e1068a9e
    • Daniel Vetter's avatar
      drm/i915: Keep gem ctx->vm alive until the final put · 8cf97637
      Daniel Vetter authored
      The comment added in
      
          commit b81dde71
          Author: Chris Wilson <chris@chris-wilson.co.uk>
          Date:   Tue May 21 22:11:29 2019 +0100
      
              drm/i915: Allow userspace to clone contexts on creation
      
      and moved in
      
          commit 27dbae8f
          Author: Chris Wilson <chris@chris-wilson.co.uk>
          Date:   Wed Nov 6 09:13:12 2019 +0000
      
              drm/i915/gem: Safely acquire the ctx->vm when copying
      
      suggested that i915_address_space were at least intended to be managed
      through SLAB_TYPESAFE_BY_RCU:
      
                      * This ppgtt may have be reallocated between
                      * the read and the kref, and reassigned to a third
                      * context. In order to avoid inadvertent sharing
                      * of this ppgtt with that third context (and not
                      * src), we have to confirm that we have the same
                      * ppgtt after passing through the strong memory
                      * barrier implied by a successful
                      * kref_get_unless_zero().
      
      But extensive git history search has not brough any such reuse to
      light.
      
      What has come to light though is that ever since
      
      commit 2850748e
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Fri Oct 4 14:39:58 2019 +0100
      
          drm/i915: Pull i915_vma_pin under the vm->mutex
      
      (yes this commit is earlier) the final i915_vma_put call has been
      moved from i915_gem_context_free (now called _release) to
      context_close, which means it's not actually safe anymore to access
      the ctx->vm pointer without lock helds, because it might disappear at
      any moment. Note that superficially things all still work, because the
      i915_address_space is RCU protected since
      
          commit b32fa811
          Author: Chris Wilson <chris@chris-wilson.co.uk>
          Date:   Thu Jun 20 19:37:05 2019 +0100
      
              drm/i915/gtt: Defer address space cleanup to an RCU worker
      
      except the very clever macro above (which is designed to protected
      against object reuse due to SLAB_TYPESAFE_BY_RCU or similar tricks)
      results in an endless loop if the refcount of the ctx->vm ever
      permanently drops to 0. Which it totally now can.
      
      Fix that by moving the final i915_vm_put to where it should be.
      
      Note that i915_gem_context is rcu protected, but _only_ the final
      kfree. This means anyone who chases a pointer to a gem ctx solely
      under the protection can pretty only call kref_get_unless_zero(). This
      seems to be pretty much the case, aside from a bunch of cases that
      consult the scheduling information without any further protection.
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Brost <matthew.brost@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: "Thomas Hellström" <thomas.hellstrom@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Fixes: 2850748e ("drm/i915: Pull i915_vma_pin under the vm->mutex")
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-3-daniel.vetter@ffwll.ch
      8cf97637
    • Daniel Vetter's avatar
      drm/i915: Release ctx->syncobj on final put, not on ctx close · c238980e
      Daniel Vetter authored
      gem context refcounting is another exercise in least locking design it
      seems, where most things get destroyed upon context closure (which can
      race with anything really). Only the actual memory allocation and the
      locks survive while holding a reference.
      
      This tripped up Jason when reimplementing the single timeline feature
      in
      
      commit 00dae4d3
      Author: Jason Ekstrand <jason@jlekstrand.net>
      Date:   Thu Jul 8 10:48:12 2021 -0500
      
          drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)
      
      We could fix the bug by holding ctx->mutex in execbuf and clear the
      pointer (again while holding the mutex) context_close, but it's
      cleaner to just make the context object actually invariant over its
      _entire_ lifetime. This way any other ioctl that's potentially racing,
      but holding a full reference, can still rely on ctx->syncobj being
      an immutable pointer. Which without this change, is not the case.
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Fixes: 00dae4d3 ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)")
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Brost <matthew.brost@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: "Thomas Hellström" <thomas.hellstrom@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-2-daniel.vetter@ffwll.ch
      c238980e