1. 05 Feb, 2019 10 commits
    • drm/i915: Pull i915_gem_active into the i915_active family · 21950ee7
      Chris Wilson authored
      Looking forward, we need to break the struct_mutex dependency on
      i915_gem_active. In the meantime, external use of i915_gem_active is
      quite beguiling; little do new users suspect that it implies a barrier,
      as each request it tracks must be ordered wrt the previous one. As one
      of many, it can be used to track activity across multiple timelines, a
      shared fence, which fits our unordered request submission much better. We
      need to steer external users away from the singular, exclusive fence
      imposed by i915_gem_active to i915_active instead. As part of that
      process, we move i915_gem_active out of i915_request.c into
      i915_active.c to start separating the two concepts, and rename it to
      i915_active_request (both to tie it to the concept of tracking just one
      request, and to give it a longer, less appealing name).
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-5-chris@chris-wilson.co.uk
    • drm/i915: Allocate active tracking nodes from a slabcache · 5f5c139d
      Chris Wilson authored
      Wrap the active tracking for GPU references in a slabcache for faster
      allocations, and hopefully better fragmentation reduction.
      
      v3: Nothing device specific left, it's just a slabcache that we can
      make global.
      v4: Include i915_active.h and don't put the initfunc under DEBUG_GEM
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-4-chris@chris-wilson.co.uk
    • drm/i915: Release the active tracker tree upon idling · a42375af
      Chris Wilson authored
      As soon as we detect that the active tracker is idle and we prepare to
      call the retire callback, release the storage for our tree of
      per-timeline nodes. We expect these to be infrequently used and quick
      to allocate, so there is little benefit in keeping the tree cached and
      we would prefer to return the pages back to the system in a timely
      fashion.
      
      This also means that when we finalize the struct as a whole, we know that,
      as the activity tracker must be idle, the tree has already been released.
      Indeed, we can reduce i915_active_fini() to just the assertions that there
      is nothing to do.
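The release-on-idle behaviour described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct tracker`, `struct node`, and every function name below are hypothetical stand-ins rather than the i915 structures, and a linked list replaces the driver's tree of per-timeline nodes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the i915 structures. */
struct node {
	unsigned long timeline;	/* timeline this node tracks */
	struct node *next;
};

struct tracker {
	struct node *nodes;	/* per-timeline nodes (a list here, a tree in the driver) */
	unsigned int count;	/* outstanding units of activity */
	void (*retire)(struct tracker *t);
};

static unsigned int retire_calls;

/* Example retire callback that only counts how often it ran. */
static void note_retire(struct tracker *t)
{
	(void)t;
	retire_calls++;
}

static void tracker_init(struct tracker *t, void (*retire)(struct tracker *))
{
	t->nodes = NULL;
	t->count = 0;
	t->retire = retire;
}

/* Record activity on a timeline, allocating its node on first use. */
static int tracker_acquire(struct tracker *t, unsigned long timeline)
{
	struct node *n;

	for (n = t->nodes; n; n = n->next)
		if (n->timeline == timeline)
			break;
	if (!n) {
		n = malloc(sizeof(*n));
		if (!n)
			return -1;
		n->timeline = timeline;
		n->next = t->nodes;
		t->nodes = n;
	}
	t->count++;
	return 0;
}

/* Drop one unit of activity; on idling, free the nodes, then retire. */
static void tracker_release(struct tracker *t)
{
	assert(t->count > 0);
	if (--t->count)
		return;
	while (t->nodes) {
		struct node *n = t->nodes;

		t->nodes = n->next;
		free(n);
	}
	if (t->retire)
		t->retire(t);
}

/* Nothing left to free: only assert that the tracker is idle and empty. */
static void tracker_fini(struct tracker *t)
{
	assert(t->count == 0);
	assert(t->nodes == NULL);
}
```

Because the storage is freed at the moment the count drops to zero, the finalizer in this sketch has no cleanup of its own and reduces to assertions.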
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-3-chris@chris-wilson.co.uk
    • drm/i915: Generalise GPU activity tracking · 64d6c500
      Chris Wilson authored
      We currently track GPU memory usage inside VMA, such that we never
      release memory used by the GPU until after it has finished accessing it.
      However, we may want to track other resources aside from VMA, or we may
      want to split a VMA into multiple independent regions and track each
      separately. For this purpose, generalise our request tracking (akin to
      struct reservation_object) so that we can embed it into other objects.
      
      v2: Tweak error handling during selftest setup.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-2-chris@chris-wilson.co.uk
    • drm/i915/selftests: Exercise some AB...BA preemption chains · a21f453c
      Chris Wilson authored
      Build a chain using 2 contexts (A, B) then request a preemption such
      that a later A request runs before the spinner in B.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190205123835.25331-1-chris@chris-wilson.co.uk
    • drm/i915/selftests: Context SSEU reconfiguration tests · c06ee6ff
      Tvrtko Ursulin authored
      Exercise the context image reconfiguration logic for idle and busy
      contexts, with the resets thrown into the mix as well.
      
      Free from the uAPI restrictions this test runs on all Gen9+ platforms
      with slice power gating.
      
      v2:
       * Rename some helpers for clarity.
       * Include subtest names in error logs.
       * Remove unnecessary function export.
      
      v3:
       * Rebase for RUNTIME_INFO.
      
      v4:
       * Fix incomplete unexport from v2. (Chris Wilson)
      
      v5:
       * Rebased for runtime pm api changes.
      
      v6:
       * Rebased for i915_reset.c.
      
      v7:
       * Tidy checkpatch warnings.
       * Consolidate error checking and logging a bit.
       * Skip idle test phase if something failed before it.
      
      v8:
       (Chris Wilson)
       * Fix i915_request_wait error handling.
       * No need to PIN_HIGH the VMA.
       * Remove pointless GEM_BUG_ON before pointer dereference.
      
      v9:
       * Avoid rq leak if rpcs query fails. (Chris)
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> # v6
      Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-5-tvrtko.ursulin@linux.intel.com
    • drm/i915: Expose RPCS (SSEU) configuration to userspace (Gen11 only) · e46c2e99
      Tvrtko Ursulin authored
      We want to allow userspace to reconfigure the subslice configuration on a
      per context basis.
      
      This is required for the functional requirement of shutting down non-VME
      enabled sub-slices on Gen11 parts.
      
      To do so, we expose a context parameter to allow adjustment of the RPCS
      register stored within the context image (and currently not accessible via
      LRI).
      
      If the context is adjusted before first use or whilst idle, the adjustment
      is for "free"; otherwise if the context is active we queue a request to do
      so (using the kernel context), following all other activity by that
      context, which is also marked as a barrier for all following submission
      against the same context.
      
      Since the overhead of device re-configuration during context switching can
      be significant, especially in multi-context workloads, we limit this new
      uAPI to only support the Gen11 VME use case. In this use case either the
      device is fully enabled, or exactly one slice and half of the subslices
      are enabled.
      
      Example usage:
      
      	struct drm_i915_gem_context_param_sseu sseu = { };
      	struct drm_i915_gem_context_param arg = {
      		.param = I915_CONTEXT_PARAM_SSEU,
      		.ctx_id = gem_context_create(fd),
      		.size = sizeof(sseu),
      		.value = to_user_pointer(&sseu)
      	};
      
      	/* Query device defaults. */
      	gem_context_get_param(fd, &arg);
      
      	/* Set VME configuration on a 1x6x8 part. */
      	sseu.slice_mask = 0x1;
      	sseu.subslice_mask = 0xe0;
      	gem_context_set_param(fd, &arg);
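
The restriction stated above (either the full device configuration, or exactly one slice with half of the subslices) could be checked along the following lines. This is a simplified sketch, not the kernel's validation code: `struct sseu_masks`, `count_bits()` and `vme_config_allowed()` are hypothetical names, and the real uAPI checks cover more than is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair of masks; not the uAPI struct. */
struct sseu_masks {
	uint64_t slice_mask;
	uint64_t subslice_mask;
};

/* Portable population count. */
static unsigned int count_bits(uint64_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Accept a request only if it is either the full device configuration
 * or exactly one slice with half of the device's subslices enabled.
 */
static bool vme_config_allowed(const struct sseu_masks *dev,
			       const struct sseu_masks *req)
{
	assert(dev && req);

	/* The request may only name hardware that exists. */
	if ((req->slice_mask & ~dev->slice_mask) ||
	    (req->subslice_mask & ~dev->subslice_mask))
		return false;

	/* Case 1: the device is fully enabled. */
	if (req->slice_mask == dev->slice_mask &&
	    req->subslice_mask == dev->subslice_mask)
		return true;

	/* Case 2: exactly one slice and half of the subslices. */
	return count_bits(req->slice_mask) == 1 &&
	       2 * count_bits(req->subslice_mask) ==
	       count_bits(dev->subslice_mask);
}
```

The mask values used to exercise such a check are arbitrary; which subslices actually contain VME blocks is a hardware property that this sketch does not encode.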
      
      v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu()
          (Lionel)
      
      v3: Add ability to program this per engine (Chris)
      
      v4: Move most get_sseu() into i915_gem_context.c (Lionel)
      
      v5: Validate sseu configuration against the device's capabilities (Lionel)
      
      v6: Change context powergating settings through MI_SDM on kernel context
          (Chris)
      
      v7: Synchronize the requests following a powergating setting change using
          a global dependency (Chris)
          Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
          Disable RPCS configuration setting for non capable users
          (Lionel/Tvrtko)
      
      v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
          s/dev_priv/i915/ (Tvrtko)
          Change uapi class/instance fields to u16 (Tvrtko)
          Bump mask fields to 64bits (Lionel)
          Don't return EPERM when dynamic sseu is disabled (Tvrtko)
      
      v9: Import context image into kernel context's ppgtt only when
          reconfiguring powergated slice/subslices (Chris)
          Use aliasing ppgtt when needed (Michel)
      
      Tvrtko Ursulin:
      
      v10:
       * Update for upstream changes.
       * Request submit needs a RPM reference.
       * Reject on !FULL_PPGTT for simplicity.
       * Pull out get/set param to helpers for readability and less indent.
       * Use i915_request_await_dma_fence in add_global_barrier to skip waits
         on the same timeline and avoid GEM_BUG_ON.
       * No need to explicitly assign a NULL pointer to engine in legacy mode.
       * No need to move gen8_make_rpcs up.
       * Factored out global barrier as prep patch.
       * Allow to only CAP_SYS_ADMIN if !Gen11.
      
      v11:
       * Remove engine vfunc in favour of local helper. (Chris Wilson)
       * Stop retiring requests before updates since it is not needed
         (Chris Wilson)
       * Implement direct CPU update path for idle contexts. (Chris Wilson)
       * Left side dependency needs only be on the same context timeline.
         (Chris Wilson)
       * It is sufficient to order the timeline. (Chris Wilson)
       * Reject !RCS configuration attempts with -ENODEV for now.
      
      v12:
       * Rebase for make_rpcs.
      
      v13:
       * Centralize SSEU normalization to make_rpcs.
       * Type width checking (uAPI <-> implementation).
       * Gen11 restrictions uAPI checks.
       * Gen11 subslice count differences handling.
       Chris Wilson:
       * args->size handling fixes.
       * Update context image from GGTT.
       * Postpone context image update to pinning.
       * Use i915_gem_active_raw instead of last_request_on_engine.
      
      v14:
       * Add activity tracker on intel_context to fix the lifetime issues
         and simplify the code. (Chris Wilson)
      
      v15:
       * Fix context pin leak if no space in ring by simplifying the
         context pinning sequence.
      
      v16:
       * Rebase for context get/set param locking changes.
       * Just -ENODEV on !Gen11. (Joonas)
      
      v17:
       * Fix one Gen11 subslice enablement rule.
       * Handle error from i915_sw_fence_await_sw_fence_gfp. (Chris Wilson)
      
      v18:
       * Update commit message. (Joonas)
       * Restrict uAPI to VME use case. (Joonas)
      
      v19:
       * Rebase.
      
      v20:
       * Rebase for ce->active_tracker.
      
      v21:
       * Rebase for IS_GEN changes.
      
      v22:
       * Reserve uAPI for flags straight away. (Chris Wilson)
      
      v23:
       * Rebase for RUNTIME_INFO.
      
      v24:
       * Added some headline docs for the uapi usage. (Joonas/Chris)
      
      v25:
       * Renamed class/instance to engine_class/engine_instance to avoid clash
         with C++ keyword. (Tony Ye)
      
      v26:
       * Rebased for runtime pm api changes.
      
      v27:
       * Rebased for intel_context_init.
       * Wrap commit msg to 75.
      
      v28:
       (Chris Wilson)
       * Use i915_gem_ggtt.
       * Use i915_request_await_dma_fence to show a better example.
      
      v29:
       * i915_timeline_set_barrier can now fail. (Chris Wilson)
      
      v30:
       * Capture some acks.
      
      v31:
       * Drop the WARN_ON from user controllable paths. (Chris Wilson)
       * Use overflows_type for all checks.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107634
      Issue: https://github.com/intel/media-driver/issues/267
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Zhipeng Gong <zhipeng.gong@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tony Ye <tony.ye@intel.com>
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Acked-by: Timo Aaltonen <timo.aaltonen@canonical.com>
      Acked-by: Takashi Iwai <tiwai@suse.de>
      Acked-by: Stéphane Marchesin <marcheu@chromium.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-4-tvrtko.ursulin@linux.intel.com
    • drm/i915: Add timeline barrier support · 78108584
      Tvrtko Ursulin authored
      Timeline barrier allows serialization between different timelines.
      
      After calling i915_timeline_set_barrier with a request, all following
      submissions on this timeline will be set up as depending on this request,
      or barrier. Once the barrier has been completed it automatically gets
      cleared and things continue as normal.
      
      This facility will be used by the upcoming context SSEU code.
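
The barrier semantics described above can be illustrated with a small userspace model. It is a sketch under stated assumptions: the structs and the functions `timeline_set_barrier()`, `timeline_submit()` and `request_ready()` are hypothetical simplifications and do not reproduce the signature or locking of i915_timeline_set_barrier. Ordering against previously set barriers is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the i915 structures. */
struct request {
	bool completed;
	const struct request *wait_on;	/* barrier this request depends on */
};

struct timeline {
	const struct request *barrier;	/* NULL when no barrier is set */
};

/* Install a request as the barrier for all following submissions. */
static void timeline_set_barrier(struct timeline *tl,
				 const struct request *rq)
{
	tl->barrier = rq;
}

/* Submit a request: it depends on the barrier while one is pending. */
static void timeline_submit(struct timeline *tl, struct request *rq)
{
	/* A completed barrier is cleared automatically. */
	if (tl->barrier && tl->barrier->completed)
		tl->barrier = NULL;

	rq->completed = false;
	rq->wait_on = tl->barrier;
}

/* A request may run once it has no unfinished barrier to wait upon. */
static bool request_ready(const struct request *rq)
{
	return !rq->wait_on || rq->wait_on->completed;
}
```

In this model a request submitted while the barrier is pending becomes ready only when the barrier completes, and later submissions proceed with no dependency at all.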
      
      v2:
       * Assert barrier has been retired on timeline_fini. (Chris Wilson)
       * Fix mock_timeline.
      
      v3:
       * Improved comment language. (Chris Wilson)
      
      v4:
       * Maintain ordering with previous barriers set on the timeline.
      
      v5:
       * Rebase.
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-3-tvrtko.ursulin@linux.intel.com
    • drm/i915/perf: lock powergating configuration to default when active · ec431eae
      Lionel Landwerlin authored
      If some of the contexts submitting workloads to the GPU have been
      configured to shut down slices/subslices, we might lose the NOA
      configurations written in the NOA muxes.
      
      One possible solution to this problem is to reprogram the NOA muxes
      when we switch to a new context. We initially tried this in the
      workaround batchbuffer but some concerns were raised about the cost
      of reprogramming at every context switch. This solution is also not
      without consequences from the userspace point of view. Reprogramming
      of the muxes can only happen once the powergating configuration has
      changed (which happens after context switch). This means for a window
      of time during the recording, counters recorded by the OA unit might
      be invalid. This requires userspace dealing with OA reports to discard
      the invalid values.
      
      Minimizing the reprogramming could be implemented by tracking the
      last programmed configuration somewhere in GGTT and using MI_PREDICATE
      to discard some of the programming commands, but the command streamer
      would still have to parse all the MI_LRI instructions in the
      workaround batchbuffer.
      
      Another solution, which this change implements, is to simply disregard
      the user requested configuration for the period of time when i915/perf
      is active.
      
      On most platforms there are no issues with this apart from a performance
      penalty for some media workloads that benefit from running on a partially
      powergated GPU. We already prevent RC6 from affecting the programming so
      it doesn't sound completely unreasonable to hold on powergating for the
      same reason.
      
      On Icelake, however, there would be a functional problem if the slices not
      containing the VME block were left enabled with a running media workload
      which explicitly disabled them. To avoid a GPU hang in this case, on
      Icelake we lock the enablement to only slices which contain VME blocks.
      Downside is that it means degraded GPU performance when OA is active but
      there is no known alternative solution for this.
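
The policy chosen here (ignore the per-context request while i915/perf is active, with Icelake locked to the VME-capable configuration) might be summarised as the selection below. This is a hedged sketch: `struct sseu_cfg`, `effective_sseu()` and its parameters are hypothetical, and the VME-only configuration is passed in rather than derived, since the source does not spell out how it is computed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical configuration record; not struct intel_sseu. */
struct sseu_cfg {
	uint64_t slice_mask;
	uint64_t subslice_mask;
};

/*
 * Pick the configuration to program for a context.
 *
 * ctx_req:     what the context asked for
 * dev_default: the device default configuration
 * vme_only:    assumed to enable only the parts containing VME blocks
 */
static struct sseu_cfg effective_sseu(const struct sseu_cfg *ctx_req,
				      const struct sseu_cfg *dev_default,
				      const struct sseu_cfg *vme_only,
				      bool oa_active, bool is_icelake)
{
	assert(ctx_req && dev_default && vme_only);

	/* Without an active OA stream the user's request is honoured. */
	if (!oa_active)
		return *ctx_req;

	/* While OA is active the request is disregarded. */
	return is_icelake ? *vme_only : *dev_default;
}
```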
      
      v2: Leave RPCS programming in intel_lrc.c (Lionel)
      
      v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
          More to_intel_context() (Tvrtko)
          s/dev_priv/i915/ (Tvrtko)
      
      Tvrtko Ursulin:
      
      v4:
       * Rebase for make_rpcs changes.
      
      v5:
       * Apply OA restriction from make_rpcs directly.
      
      v6:
       * Rebase for context image setup changes.
      
      v7:
       * Move stream assignment before metric enable.
      
      v8-9:
       * Rebase.
      
      v10:
       * Squashed with ICL support patch.
      
      Bspec: 21140
      Co-developed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v9
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-2-tvrtko.ursulin@linux.intel.com
    • drm/i915: Record the sseu configuration per-context & engine · 87f1ef22
      Lionel Landwerlin authored
      We want to expose the ability to reconfigure the slices, subslices and
      EUs per context and per engine. To facilitate that, store the current
      configuration on the context for each engine, which is initially set
      to the device default upon creation.
      
      v2: record sseu configuration per context & engine (Chris)
      
      v3: introduce the i915_gem_context_sseu to store powergating
          programming, sseu_dev_info has grown quite a bit (Lionel)
      
      v4: rename i915_gem_sseu into intel_sseu (Chris)
          use to_intel_context() (Chris)
      
      v5: More to_intel_context() (Tvrtko)
          Switch intel_sseu from union to struct (Tvrtko)
          Move context default sseu in existing loop (Chris)
      
      v6: s/intel_sseu_from_device_sseu/intel_device_default_sseu/ (Tvrtko)
      
      Tvrtko Ursulin:
      
      v7:
       * Pass intel_sseu by pointer instead of value to make_rpcs.
       * Rebase for make_rpcs changes.
      
      v8:
       * Rebase for RPCS edit on pin.
      
      v9:
       * Rebase for context image setup changes.
      
      v10:
       * Rename dev_priv to i915. (Chris Wilson)
      
      v11:
       * Rebase.
      
      v12:
       * Rebase for IS_GEN changes.
      
      v13:
       * Rebase for RUNTIME_INFO.
      
      v14:
       * Rebase for intel_context_init.
      
      v15:
       * Rebase for drm-tip changes.
      
      v16:
       * Moved struct intel_sseu definition to i915_gem_context.h.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-1-tvrtko.ursulin@linux.intel.com
  2. 04 Feb, 2019 2 commits
    • drm/i915: Trim NEWCLIENT boosting · 1413b2bc
      Chris Wilson authored
      Limit the NEWCLIENT boost so that it gives its small priority boost only to
      fresh clients that have no dependencies.
      
      The idea for using NEWCLIENT boosting, commit b16c7651 ("drm/i915:
      Priority boost for new clients"), is that short-lived streams are often
      interactive and require lower latency -- and that by executing those
      ahead of the long running hogs, the short-lived clients do little to
      interfere with the system throughput by virtue of their short-lived
      nature. However, we were only considering the client's own timeline for
      determining whether or not it was a fresh stream. This allowed for
      compositors to wake up before their vblank and bump all of their client
      streams. However, in testing with media-bench this results in chaining
      all cooperating contexts together preventing us from being able to
      reorder contexts to reduce bubbles (pipeline stalls), overall increasing
      latency, and reducing system throughput. The exact opposite of our
      intent. The compromise of applying the NEWCLIENT boost to strictly fresh
      clients (that do not wait upon anything else) should maintain the
      "real-time response under load" characteristics of FQ_CODEL, without
      locking together the long chains of dependencies across the system.
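
The trimmed rule can be sketched as follows: the boost applies only when the client's own timeline is fresh and the request waits on nothing else. The struct, the field names and the value of `NEWCLIENT_BOOST` are assumptions made for illustration and are not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical size of the small boost; the real value is not given here. */
#define NEWCLIENT_BOOST 1

/* Hypothetical summary of a request at submission time. */
struct submit_info {
	int priority;			/* priority requested by the client */
	bool timeline_fresh;		/* nothing earlier pending on its own timeline */
	unsigned int num_dependencies;	/* requests elsewhere that it waits upon */
};

/* Effective priority: boost strictly fresh, dependency-free clients only. */
static int submit_priority(const struct submit_info *info)
{
	assert(info);

	if (info->timeline_fresh && info->num_dependencies == 0)
		return info->priority + NEWCLIENT_BOOST;

	return info->priority;
}
```

Under the earlier behaviour described in the message only the freshness of the client's own timeline was considered, which in this model corresponds to dropping the `num_dependencies` test.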
      
      References: b16c7651 ("drm/i915: Priority boost for new clients")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190204150101.30759-1-chris@chris-wilson.co.uk
    • drm/i915: Allow normal clients to always preempt idle priority clients · 3d7a64b9
      Chris Wilson authored
      When first enabling preemption, we hesitated to make it a free-for-all
      where every higher priority client would force a preempt-to-idle cycle
      and take over from all lower priority clients. We hesitated because we
      were uncertain just how well preemption would work in practice, whether
      the preemption latency itself would detract from the latency gains for
      higher priority tasks and whether it would work at all. Since
      introducing preemption, we have been enabling it for more common tasks,
      even giving normal clients a small preemptive boost when they first
      start (to aid fairness and improve interactivity). Now let's take one
      step further and give permission for all normal (priority:0) clients to
      preempt any idle (priority:<0) task so that users running long compute
      jobs do not overly impact other jobs (i.e. their desktop) and the system
      remains responsive under such idle loads.
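
The new permission amounts to a simple comparison, sketched below with a hypothetical helper. It models only the rule added by this change (a normal client at priority 0 or above may preempt an idle client at priority below 0); the driver's other preemption criteria are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: true when a queued normal-priority request
 * (priority >= 0) may preempt a running idle-priority task (priority < 0).
 */
static bool normal_preempts_idle(int running_prio, int queued_prio)
{
	return running_prio < 0 && queued_prio >= 0;
}
```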
      
      References: f6322edd ("drm/i915/preemption: Allow preemption between submission ports")
      References: b16c7651 ("drm/i915: Priority boost for new clients")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Cc: "Bloomfield, Jon" <jon.bloomfield@intel.com>
      Cc: "Stead, Alan" <alan.stead@intel.com>
      Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190204084116.3013-1-chris@chris-wilson.co.uk
  3. 02 Feb, 2019 1 commit
  4. 31 Jan, 2019 1 commit
  5. 01 Feb, 2019 5 commits
  6. 31 Jan, 2019 9 commits
  7. 30 Jan, 2019 10 commits
  8. 29 Jan, 2019 2 commits