• Daniel Vetter's avatar
    drm/i915: Stop rcu support for i915_address_space · dcc5d820
    Daniel Vetter authored
    The full audit is quite a bit of work:
    
    - i915_dpt has very simple lifetime (somehow we create a display pagetable vm
      per object, so its _very_ simple, there's only ever a single vma in there),
      and uses i915_vm_close(), which internally does a i915_vm_put(). No rcu.
    
      Aside: wtf is i915_dpt doing in the intel_display.c garbage collector as a new
      feature, instead of added as a separate file with some clean-ish interface.
    
      Also, i915_dpt unfortunately re-introduces some coding patterns from
      pre-dma_resv_lock conversion times.
    
    - i915_gem_proto_ctx is fully refcounted and no rcu, all protected by
      fpriv->proto_context_lock.
    
    - i915_gem_context is itself rcu protected, and that might leak to anything it
      points at. Before
    
    	commit cf977e18
    	Author: Chris Wilson <chris@chris-wilson.co.uk>
    	Date:   Wed Dec 2 11:21:40 2020 +0000
    
    	    drm/i915/gem: Spring clean debugfs
    
      and
    
    	commit db80a129
    	Author: Chris Wilson <chris@chris-wilson.co.uk>
    	Date:   Mon Jan 18 11:08:54 2021 +0000
    
    	    drm/i915/gem: Remove per-client stats from debugfs/i915_gem_objects
    
      we had a bunch of debugfs files that relied on rcu protecting everything, but
      those are gone now. The main one was removed even earlier with
    
      There doesn't seem to be anything left that's actually protecting
      stuff now that the ctx->vm itself is invariant. See
    
    	commit ccbc1b97
    	Author: Jason Ekstrand <jason@jlekstrand.net>
    	Date:   Thu Jul 8 10:48:30 2021 -0500
    
    	    drm/i915/gem: Don't allow changing the VM on running contexts (v4)
    
      Note that we drop the vm refcount before the final release of the gem context
      refcount, so this is all very dangerous even without rcu. Note that aside from
      later on creating new engines (a defunct feature) and debug output we're never
      looked at gem_ctx->vm for anything functional, hence why this is ok.
      Fingers crossed.
    
      Preceeding patches removed all vestiges of rcu use from gem_ctx->vm
      derferencing to make it clear it's really not used.
    
      The gem_ctx->rcu protection was introduced in
    
    	commit a4e7ccda
    	Author: Chris Wilson <chris@chris-wilson.co.uk>
    	Date:   Fri Oct 4 14:40:09 2019 +0100
    
    	    drm/i915: Move context management under GEM
    
      The commit message is somewhat entertaining because it fails to
      mention this fact completely, and compensates that by an in-commit
      changelog entry that claims that ctx->vm is protected by ctx->mutex.
      Which was the case _before_ this commit, but no longer after it.
    
    - intel_context holds a full reference. Unfortunately intel_context is also rcu
      protected and the reference to the ->vm is dropped before the
      rcu barrier - only the kfree is delayed. So again we need to check
      whether that leaks anywhere on the intel_context->vm. RCU is only
      used to protect intel_context sitting on the breadcrumb lists, which
      don't look at the vm anywhere, so we are fine.
    
      Nothing else relies on rcu protection of intel_context and hence is
      fully protected by the kref refcount alone, which protects
      intel_context->vm in turn.
    
      The breadcrumbs rcu usage was added in
    
    	commit c744d503
    	Author: Chris Wilson <chris@chris-wilson.co.uk>
    	Date:   Thu Nov 26 14:04:06 2020 +0000
    
    	    drm/i915/gt: Split the breadcrumb spinlock between global and contexts
    
      its parent commit added the intel_context rcu protection:
    
    	commit 14d1eaf0
    	Author: Chris Wilson <chris@chris-wilson.co.uk>
    	Date:   Thu Nov 26 14:04:05 2020 +0000
    
    	    drm/i915/gt: Protect context lifetime with RCU
    
      given some credence to my claim that I've actually caught them all.
    
    - drm_i915_gem_object's shares_resv_from pointer has a full refcount to the
      dma_resv, which is a sub-refcount that's released after the final
      i915_vm_put() has been called. Safe.
    
      Aside: Maybe we should have a struct dma_resv_shared which is just dma_resv +
      kref as a stand-alone thing. It's a pretty useful pattern which other drivers
      might want to copy.
    
      For a bit more context see
    
    	commit 4d8151ae
    	Author: Thomas Hellström <thomas.hellstrom@linux.intel.com>
    	Date:   Tue Jun 1 09:46:41 2021 +0200
    
    	    drm/i915: Don't free shared locks while shared
    
    - the fpriv->vm_xa was relying on rcu_read_lock for lookup, but that
      was updated in a prep patch too to just be a spinlock-protected
      lookup.
    
    - intel_gt->vm is set at driver load in intel_gt_init() and released
      in intel_gt_driver_release(). There seems to be some issue that
      in some error paths this is called twice, but otherwise no rcu to be
      found anywhere. This was added in the below commit, which
      unfortunately doesn't explain why this complication exists.
    
    	commit e6ba7648
    	Author: Chris Wilson <chris@chris-wilson.co.uk>
    	Date:   Sat Dec 21 16:03:24 2019 +0000
    
    	    drm/i915: Remove i915->kernel_context
    
      The proper fix most likely for this is to start using drmm_ at large
      scale, but that's also huge amounts of work.
    
    - i915_vma->vm is some real pain, because rcu is rcu protected, at
      least in the vma lookup in the context lookup cache in
      eb_lookup_vma(). This was added in
    
    	commit 4ff4b44c
    	Author: Chris Wilson <chris@chris-wilson.co.uk>
    	Date:   Fri Jun 16 15:05:16 2017 +0100
    
    	    drm/i915: Store a direct lookup from object handle to vma
    
      This was changed to a radix tree from the hashtable in, but with the
      locking unchanged, in
    
    	commit d1b48c1e
    	Author: Chris Wilson <chris@chris-wilson.co.uk>
    	Date:   Wed Aug 16 09:52:08 2017 +0100
    
    	    drm/i915: Replace execbuf vma ht with an idr
    
      In
    
    	commit 93159e12
    	Author: Chris Wilson <chris@chris-wilson.co.uk>
    	Date:   Mon Mar 23 09:28:41 2020 +0000
    
    	    drm/i915/gem: Avoid gem_context->mutex for simple vma lookup
    
      the locking was changed from dev->struct_mutex to rcu, which added
      the requirement to rcu protect i915_vma. Somehow this was missed in
      review (or I'm completely blind).
    
      Irrespective of all that the vma lookup cache rcu_read_lock grabs a
      full reference of the vma and the rcu doesn't leak further. So no
      impact on i915_address_space from that.
    
      I have not found any other rcu use for i915_vma, but given that it
      seems broken I also didn't bother to do a careful in-depth audit.
    
    Alltogether there's nothing left in-tree anymore which requires that a
    pointer deref to an i915_address_space is safe undre rcu_read_lock
    only.
    
    rcu protection of i915_address_space was introduced in
    
    commit b32fa811
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Thu Jun 20 19:37:05 2019 +0100
    
        drm/i915/gtt: Defer address space cleanup to an RCU worker
    
    by mixing up a bugfixing (i915_address_space needs to be released from
    a worker) with enabling rcu support. The commit message also seems
    somewhat confused, because it talks about cleanup of WC pages
    requiring sleep, while the code and linked bugzilla are about a
    requirement to take dev->struct_mutex (which yes sleeps but it's a
    much more specific problem). Since final kref_put can be called from
    pretty much anywhere (including hardirq context through the
    scheduler's i915_active cleanup) we need a worker here. Hence that
    part must be kept.
    
    Ideally all these reclaim workers should have some kind of integration
    with our shrinkers, but for some of these it's rather tricky. Anyway,
    that's a preexisting condition in the codeebase that we wont fix in
    this patch here.
    
    We also remove the rcu_barrier in ggtt_cleanup_hw added in
    
    commit 60a4233a
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Mon Jul 29 14:24:12 2019 +0100
    
        drm/i915: Flush the i915_vm_release before ggtt shutdown
    Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
    Cc: Jon Bloomfield <jon.bloomfield@intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
    Cc: Dave Airlie <airlied@redhat.com>
    Cc: Jason Ekstrand <jason@jlekstrand.net>
    Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-11-daniel.vetter@ffwll.ch
    dcc5d820
intel_ggtt.c 38.8 KB