Commits · 16a8739402a004de7283c33427150f7189fccf4f · Kirill Smelkov / linux

20 Dec, 2017 2 commits

drm/i915: Tidy up GEM_TRACE around execlists · 16a87394

Chris Wilson authored Dec 20, 2017

Looking at the coordination of resets with the submission of execlists,
it will be useful to have a GEM_TRACE for when we issue the reset.
Whilst there tidy up the other GEM_TRACE to always include the engine
name, and be careful not to trust any pointers prior to asserts.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171220090626.31643-1-chris@chris-wilson.co.uk

16a87394

drm/i915/psr: Fix register name mess up. · 14c6547d

Dhinakaran Pandiyan authored Dec 19, 2017

Commit 77affa31 ("drm/i915/psr: Fix compiler warnings for
hsw_psr_disable()") swapped status and control registers while fixing
indentation. The _ctl at the end of the status register name must have to
led to this.

Fixes: 77affa31 ("drm/i915/psr: Fix compiler warnings for hsw_psr_disable()")
References: https://www.mrc-cbu.cam.ac.uk/people/matt.davis/cmabridge/
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171220043520.2599-1-dhinakaran.pandiyan@intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

14c6547d

19 Dec, 2017 10 commits

drm/i915: Avoid context dereference inside execlists_submission_tasklet · 16c8619a

Chris Wilson authored Dec 19, 2017

A lesson that has to be relearnt over and over again is that the request
does not keep a reference to the context and so we cannot freely
dereference the context from inside the execlists_submission_tasklet. In
particular, we try to do so in the new GEM_TRACE() so convert those over
to the port->context_id we keep for GEM debugging. This means the
tracing now depends on DRM_I915_GEM_DEBUG.

Fixes: bccd3b83 ("drm/i915: Use trace_printk to provide a death rattle for GEM")
References: https://bugs.freedesktop.org/show_bug.cgi?id=104066
References: https://bugs.freedesktop.org/show_bug.cgi?id=104162
References: https://bugs.freedesktop.org/show_bug.cgi?id=104242
References: https://bugs.freedesktop.org/show_bug.cgi?id=104310Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171219220916.30882-1-chris@chris-wilson.co.uk

16c8619a

drm/i915: Remove pointer indirection for hangcheck_state local · 93dff100

Chris Wilson authored Dec 19, 2017

Use the local on-stack struct directly rather than hide it behind a
pointer. This should be both clearer for the reader and the compiler (we
rely on the compiler seeing through the functions to spot uninitialized
uses of the local).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171219130948.6282-1-chris@chris-wilson.co.uk

93dff100

drm/i915: Show if we consider the engine is idle in the GPU error state · 398c8a30

Chris Wilson authored Dec 19, 2017

Useful for verifying our bookkeeper when we encounter is knowing whether
we think the engine is idle at the time of the GPU hang.

References: https://bugs.freedesktop.org/show_bug.cgi?id=104305Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171219131419.13117-1-chris@chris-wilson.co.uk

398c8a30

drm/i915: Implement WaDisableEarlyEOT. · a2b16588

Rafael Antognolli authored Dec 15, 2017

There seems to be another clock gating issue which the workaround is
described as:

   "WA: Set 0xE4F0[1] = 1 to disable Early EOT of thread."
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171216001117.14232-2-rafael.antognolli@intel.com

a2b16588

drm/i915: Implement WaDisableVFclkgate. · 01ab0f92

Rafael Antognolli authored Dec 15, 2017

This workaround supposedly fixes some hangs in the VF unit.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171216001117.14232-1-rafael.antognolli@intel.com

01ab0f92

drm/i915: Add pretty printer for modparams · acfb9973

Michal Wajdeczko authored Dec 19, 2017

We dump modparams in few places (debugfs, gpu_error) using different
functions. Lets add reusable function to avoid code duplication.

add/remove: 1/0 grow/shrink: 0/2 up/down: 1096/-2339 (-1243)
Function old new delta
i915_params_dump - 1096 +1096
i915_capabilities 1353 185 -1168
i915_error_state_to_str 5507 4336 -1171
Total: Before=1285716, After=1284473, chg -0.10%

v2: use forward decl rather than include (Chris)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171219114346.26308-3-michal.wajdeczko@intel.com

acfb9973

drm/i915: Convert intel_device_info_dump into pretty printer · eb10ed9a

Michal Wajdeczko authored Dec 19, 2017

Convert intel_device_info_dump into pretty printer to be
consistent with the rest of the driver code.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171219114346.26308-2-michal.wajdeczko@intel.com

eb10ed9a

drm/i915: Add pretty printer for device info flags · a8c9b849

Michal Wajdeczko authored Dec 19, 2017

We dump device flags in few places (init_early, debugfs, gpu_error)
using different functions. Lets add reusable function to avoid
code duplication.

add/remove: 1/0 grow/shrink: 0/3 up/down: 1296/-3572 (-2276)
Function                                     old     new   delta
intel_device_info_dump_flags                   -    1296   +1296
i915_capabilities                           2435    1353   -1082
i915_error_state_to_str                     6642    5507   -1135
intel_device_info_dump                      1507     152   -1355
Total: Before=1287992, After=1285716, chg -0.18%
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171219114346.26308-1-michal.wajdeczko@intel.com

a8c9b849

drm/i915: Fix indentation for intel_ddi_clk_select · 04bf68bb

Chris Wilson authored Dec 19, 2017

drivers/gpu/drm/i915/intel_ddi.c:2098 intel_ddi_clk_select() warn: inconsistent indenting

References: 8edcda12 ("drm/i915: Protect DDI port to DPLL map from theoretical race.")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171219112649.9388-1-chris@chris-wilson.co.uk

04bf68bb

drm/i915: Add a strong mb to resetting the has-CS-interrupt bit · fcb1de54

Chris Wilson authored Dec 19, 2017

After a reset, the state of the CSB registers are scrubbed and not valid
until a powercontext is reloaded. We only know when a powercontext has
been reloaded once we see a CS-interrupt, before then we must ignore the
CSB registers within the execlists_submission_tasklet. However, glk is
sporadically dying with an illegal CSB pointer value (both in the HSWP
and mmio) suggesting that it is running with the CS-interrupt bit set
before the powercontext has been reloaded. Make sure the clearing of
that bit is serialised on reset with the re-enabling of the tasklet.

References: https://bugs.freedesktop.org/show_bug.cgi?id=104262Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171219090110.11153-1-chris@chris-wilson.co.ukReviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

fcb1de54

18 Dec, 2017 8 commits

drm/i915/cnl: Add support for horizontal plane flipping · 5f8e3f57

Joonas Lahtinen authored Dec 15, 2017

CNL supports horizontal plane flipping on non-linear plane formats.

v2:
- Avoid BUG unlike elsewhere in the code (Ville)
- Hoist the rotation-tiling restriction check (Ville)

v3 (Rodrigo):
- Rebased after a while.
- Fix small indentation issues.

Bspec: 7656
Suggested-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171215213800.7896-1-rodrigo.vivi@intel.com

5f8e3f57

drm/i915: Protect DDI port to DPLL map from theoretical race. · 8edcda12

Rodrigo Vivi authored Dec 15, 2017

In case we have multiple modesets for different connectors
happening in parallel we could have a race on the RMW on these
shared registers.

This possibility was initially raised by Paulo when reviewing
commit '555e38d2 ("drm/i915/cnl: DDI - PLL mapping")'
but the original possibility comes from commit '5416d871
("drm/i915/skl: Set the eDP link rate on DPLL0")'. Or maybe
later when atomic commits entered into picture.

Apparently the discussion around this topic showed that the
right solution would be on serializing the atomic commits in
a way that we don't have the possibility of races here since
if that parallel modeset happenings apparently many other
things will be on fire.

Code is there since SKL and there was no report of issue,
but since we never looked back to that serialization possibility,
and also we don't have an igt case for that it is better to at
least protect this corner.
Suggested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Fixes: 555e38d2 ("drm/i915/cnl: DDI - PLL mapping")
Fixes: 5416d871 ("drm/i915/skl: Set the eDP link rate on DPLL0")
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20171215224310.19103-1-rodrigo.vivi@intel.com

8edcda12

drm/i915/selftests: Fix up igt_reset_engine · 3fb04cb0

Chris Wilson authored Dec 17, 2017

Now that we skip a per-engine reset on an idle engine, we need to update
the selftest to take that into account. In the process, we find that we
were not stressing the per-engine reset very hard, so add those missing
active resets.

v2: Actually test i915_reset_engine() by loading it with requests.

Fixes: f6ba181a ("drm/i915: Skip an engine reset if it recovered before our preparations")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104313Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171217132852.30642-3-chris@chris-wilson.co.ukReviewed-by: Michel Thierry <michel.thierry@intel.com>

3fb04cb0

drm/i915/trace: add hw_id to gem requests trace points · 151a99ec

Lionel Landwerlin authored Dec 18, 2017

When monitoring the GPU with i915 perf, reports are tagged with a hw
id. Gem context creation tracepoints already have a hw_id field,
unfortunately you only get this correlation between a process id and a
hw context id once when the context is created. It doesn't help if you
started monitoring after the process was initialized or if the drm fd
was transfered from one process to another.

This change adds the hw_id field to gem requests, so that correlation
can also be done on submission.

v2: Place hw_id at the end of the tracepoint to not disrupt too much
    existing tools (Chris)

v3: Reorder hw_id field again (Chris)

v4: Add missing hw_id to i915_gem_request_wait_begin tracepoint (Chris)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171218151959.14073-3-lionel.g.landwerlin@intel.com

151a99ec

drm/i915: reorder field in gem_request tracepoints · 3c2d0671

Lionel Landwerlin authored Dec 18, 2017

Let's make the order of the fields of the tracepoints involving gem
request match across i915. This makes userspace processing of
tracepoint a bit easier.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171218151959.14073-2-lionel.g.landwerlin@intel.com

3c2d0671

drm/i915: Show IPEIR and IPEHR in the engine dump · a0cf5790

Chris Wilson authored Dec 18, 2017

A useful bit of information for inspecting GPU stalls from
intel_engine_dump() are the error registers, IPEIR and IPEHR.

v2: Fixup gen changes in register offsets (Tvrtko)
v3: Old FADDR location as well
v4: Use I915_READ64_2x32
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171218123914.19027-1-chris@chris-wilson.co.ukReviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

a0cf5790

drm/i915: prefer i915_gem_object_has_pages() · b65a9b98

Matthew Auld authored Dec 18, 2017

We have an existing helper for testing obj->mm.pages, so use it.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171218103855.25274-1-matthew.auld@intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

b65a9b98

drm/i915: Re-enable GGTT earlier after GPU reset · a31d73c3

Chris Wilson authored Dec 17, 2017

Inside i915_gem_reset(), we start touching the HW and so require the
low-level HW to be re-enabled, in particular the PCI BARs.

Fixes: 7b6da818 ("drm/i915: Restore the kernel context after a GPU reset on an idle engine")
References: 0db8c961 ("drm/i915: Re-enable GTT following a device reset")
Testcase: igt/drv_hangman #i915g/i915gm
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171217132852.30642-1-chris@chris-wilson.co.ukReviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

a31d73c3

17 Dec, 2017 1 commit

drm/i915: Unifying debugfs return codes for unsupported features · ab309a6a

Michal Wajdeczko authored Dec 15, 2017

Instead of trying different seq_puts messages, lets use common
-ENODEV error code to indicate missing/unsupported feature.

v2: don't forget about guc_log_control fops (Sagar)
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171215143635.17884-1-michal.wajdeczko@intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

ab309a6a

16 Dec, 2017 2 commits

drm/i915: Restore the kernel context after a GPU reset on an idle engine · 7b6da818

Chris Wilson authored Dec 16, 2017

As part of the system requirement for powersaving is that we always have
a context loaded. Upon boot and resume, we load the kernel_context to
ensure that some valid state is set before powersaving kicks in, we
should do so after a full GPU reset as well. We only need to do so for
an idle engine, as any active engines will restart by executing the
stuck request, loading its context. For the idle engine, we create a
new request to load the kernel_context instead.

For whatever reason, perfoming a dummy execute on the idle engine after
reset papers over a subsequent GPU hang in rare circumstances, even on
machines not using contexts (e.g. Pineview).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104259
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104261Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171216000334.8197-1-chris@chris-wilson.co.uk

7b6da818

drm/i915: Skip an engine reset if it recovered before our preparations · f6ba181a

Chris Wilson authored Dec 16, 2017

At the beginning of a reset, we disable the submission method and find
the stuck request. We expect to find a stuck request for we have
declared the engine stalled. However, if we find no active request, the
engine must have recovered from its stall before we could issue a reset,
so let the engine continue on without a reset. If the engine is truly
stuck, we will back soon enough with the next reset attempt.

v2: Remove the stale debug message.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171216002206.31737-1-chris@chris-wilson.co.uk

f6ba181a

15 Dec, 2017 3 commits

x86/gpu: add CFL to early quirks · 33aa69ed

Lucas De Marchi authored Dec 13, 2017

CFL was missing from intel_early_ids[]. The PCI ID needs to be there to
allow the memory region to be stolen, otherwise we could have RAM being
arbitrarily overwritten if for example we keep using the UEFI framebuffer,
depending on how BIOS has set up the e820 map.

Fixes: b056f8f3 ("drm/i915/cfl: Add Coffee Lake PCI IDs for S Skus.")
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: <stable@vger.kernel.org> # v4.13+ 0890540e drm/i915: add GT number to intel_device_info
Cc: <stable@vger.kernel.org> # v4.13+ 41693fd5 drm/i915/kbl: Change a KBL pci id to GT2 from GT1.5
Cc: <stable@vger.kernel.org> # v4.13+
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213200425.2954-1-lucas.demarchi@intel.com

33aa69ed

drm/i915/lpe: Remove double-encapsulation of info string · 99cd05c4

Chris Wilson authored Dec 13, 2017

Just printk the string, or at least do not double up on the newlines!

Fixes: eef57324 ("drm/i915: setup bridge for HDMI LPE audio driver")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: Jerome Anand <jerome.anand@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213182858.2159-1-chris@chris-wilson.co.uk

99cd05c4

drm/i915: Allow internal page allocations to fail · ee2202d7

Chris Wilson authored Dec 15, 2017

Internal objects consistent of scratch pages not subject to the
persistence guarantees of user facing objects. They are used for
example, in ring buffers where they are only required for temporary
storage of commands that will be rewritten every time. As they are
temporary constructs, quietly report -ENOMEM back along the callchain
rather than subject the system to oomkiller if an allocation fails.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171215101753.1519-1-chris@chris-wilson.co.uk

ee2202d7

14 Dec, 2017 12 commits

drm/i915: Update DRIVER_DATE to 20171214 · ee5b5bf3
Rodrigo Vivi authored Dec 14, 2017
```
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
```
ee5b5bf3

Merge tag 'gvt-next-2017-12-14' of https://github.com/intel/gvt-linux into drm-intel-next-queued · 07825e4b

Rodrigo Vivi authored Dec 14, 2017

gvt-next-2017-12-14:

- fixes for two coverity scan errors (Colin)
- mmio switch code refine (Changbin)
- more virtual display dmabuf fixes (Tina/Gustavo)
- misc cleanups (Pei)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171214033434.jlppjlyal5d67ya7@zhen-hp.sh.intel.com

07825e4b

drm/i915: properly init lockdep class · 8d8c46fa

Sebastian Andrzej Siewior authored Dec 14, 2017

The code has an ifdef and uses two functions to either init the bare
spinlock or init it and set a lock-class. It is possible to do the same
thing without an ifdef.
With this patch (in debug case) we first use the "default" lock class
which is later overwritten to the supplied one. Without lockdep the set
name/class function vanishes.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171214131009.7479-1-joonas.lahtinen@linux.intel.comSigned-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

8d8c46fa

drm/i915: Show engine state when hangcheck detects a stall · 84ef3a72

Chris Wilson authored Dec 14, 2017

Knowing the state of the engine when hangcheck thinks it is stalling is
useful for both debugging hangcheck itself and the potential cause of an
unwanted stall.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171214122613.26134-1-chris@chris-wilson.co.uk

84ef3a72

drm/i915: make CS frequency read support missing more obvious · fe66e928

Lionel Landwerlin authored Dec 13, 2017

As suggested by Chris, we should make this more obvious for people
working with newer generations.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213171154.6201-1-lionel.g.landwerlin@intel.com

fe66e928

drm/i915/guc: Extract doorbell verification into a function · a00cfc95

Michał Winiarski authored Dec 13, 2017

We have the selftest that's checking doorbell create/destroy, so there's
no need to check all doorbells delaying the reset every time.
We do want to have that extra sanity check at module load/unload though.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-7-michal.winiarski@intel.com

a00cfc95

drm/i915/guc: Extract clients allocation to submission_init · 8ec52ec8

Michał Winiarski authored Dec 13, 2017

We can now move the clients allocation to submission_init path, rather
than keeping the condition inside submission_enable called on every
reset.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-6-michal.winiarski@intel.com

8ec52ec8

drm/i915/guc: Extract doorbell creation from client allocation · 9192d4fb

Michał Winiarski authored Dec 13, 2017

Full GPU reset causes GuC to be reset. This means that every time we're
doing a reset, we need to talk to GuC and tell it about doorbells.
Let's separate the communication part (create_doorbell) from our
internal bookkeeping (reserve_doorbell) so that we can cleanly separate
the initialization done at module load from reinitialization done at
reset in the following patch.
While I'm here, let's also add a proper (although slightly asymetric)
cleanup that doesn't try to communicate with GuC after it's already
gone, getting rid of "expected" warnings caused by GuC action failures
on module unload.

Note that I've also removed one of the tests (bitmap out of sync), since
it doesn't make much sense anymore - bitmaps are now not expected to
change during the lifetime of a client.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-5-michal.winiarski@intel.com

9192d4fb

drm/i915/guc: Call invalidate after changing the vfunc · aeb950bd

Michał Winiarski authored Dec 13, 2017

To make this operation a bit cleaner, we should make sure that the HW
can catch up by calling the new implementation right away.
Note that currently we're only touching the vfunc at module load time
(before GuC is even loaded), so this shouldn't cause any functional
changes.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-4-michal.winiarski@intel.com

aeb950bd

drm/i915/guc: Extract guc_init from guc_init_hw · 61b5c158

Michał Winiarski authored Dec 13, 2017

After GPU reset, GuC HW needs to be reinitialized (with FW reload).
Unfortunately, we're doing some extra work there (mostly allocating stuff),
work that can be moved to guc_init and called once at driver load time.

As a side effect we're no longer hitting an assert in
i915_ggtt_enable_guc on suspend/resume.

v2: Do not duplicate disable_communication / reset_guc_interrupts
v3: Add proper teardown after rebase

References: 04f7b24e ("drm/i915/guc: Assert that we switch between known ggtt->invalidate functions")
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-3-michal.winiarski@intel.com

61b5c158

drm/i915/guc: Move GuC workqueue allocations outside of the mutex · 3176ff49

Michał Winiarski authored Dec 13, 2017

This gets rid of the following lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
4.15.0-rc2-CI-Patchwork_7428+ #1 Not tainted
------------------------------------------------------
debugfs_test/1351 is trying to acquire lock:
 (&dev->struct_mutex){+.+.}, at: [<000000009d90d1a3>] i915_mutex_lock_interruptible+0x47/0x130 [i915]

but task is already holding lock:
 (&mm->mmap_sem){++++}, at: [<000000005df01c1e>] __do_page_fault+0x106/0x560

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #6 (&mm->mmap_sem){++++}:
       __might_fault+0x63/0x90
       _copy_to_user+0x1e/0x70
       filldir+0x8c/0xf0
       dcache_readdir+0xeb/0x160
       iterate_dir+0xe6/0x150
       SyS_getdents+0xa0/0x130
       entry_SYSCALL_64_fastpath+0x1c/0x89

-> #5 (&sb->s_type->i_mutex_key#5){++++}:
       lockref_get+0x9/0x20

-> #4 ((completion)&req.done){+.+.}:
       wait_for_common+0x54/0x210
       devtmpfs_create_node+0x130/0x150
       device_add+0x5ad/0x5e0
       device_create_groups_vargs+0xd4/0xe0
       device_create+0x35/0x40
       msr_device_create+0x22/0x40
       cpuhp_invoke_callback+0xc5/0xbf0
       cpuhp_thread_fun+0x167/0x210
       smpboot_thread_fn+0x17f/0x270
       kthread+0x173/0x1b0
       ret_from_fork+0x24/0x30

-> #3 (cpuhp_state-up){+.+.}:
       cpuhp_issue_call+0x132/0x1c0
       __cpuhp_setup_state_cpuslocked+0x12f/0x2a0
       __cpuhp_setup_state+0x3a/0x50
       page_writeback_init+0x3a/0x5c
       start_kernel+0x393/0x3e2
       secondary_startup_64+0xa5/0xb0

-> #2 (cpuhp_state_mutex){+.+.}:
       __mutex_lock+0x81/0x9b0
       __cpuhp_setup_state_cpuslocked+0x4b/0x2a0
       __cpuhp_setup_state+0x3a/0x50
       page_alloc_init+0x1f/0x26
       start_kernel+0x139/0x3e2
       secondary_startup_64+0xa5/0xb0

-> #1 (cpu_hotplug_lock.rw_sem){++++}:
       cpus_read_lock+0x34/0xa0
       apply_workqueue_attrs+0xd/0x40
       __alloc_workqueue_key+0x2c7/0x4e1
       intel_guc_submission_init+0x10c/0x650 [i915]
       intel_uc_init_hw+0x29e/0x460 [i915]
       i915_gem_init_hw+0xca/0x290 [i915]
       i915_gem_init+0x115/0x3a0 [i915]
       i915_driver_load+0x9a8/0x16c0 [i915]
       i915_pci_probe+0x2e/0x90 [i915]
       pci_device_probe+0x9c/0x120
       driver_probe_device+0x2a3/0x480
       __driver_attach+0xd9/0xe0
       bus_for_each_dev+0x57/0x90
       bus_add_driver+0x168/0x260
       driver_register+0x52/0xc0
       do_one_initcall+0x39/0x150
       do_init_module+0x56/0x1ef
       load_module+0x231c/0x2d70
       SyS_finit_module+0xa5/0xe0
       entry_SYSCALL_64_fastpath+0x1c/0x89

-> #0 (&dev->struct_mutex){+.+.}:
       lock_acquire+0xaf/0x200
       __mutex_lock+0x81/0x9b0
       i915_mutex_lock_interruptible+0x47/0x130 [i915]
       i915_gem_fault+0x201/0x760 [i915]
       __do_fault+0x15/0x70
       __handle_mm_fault+0x85b/0xe40
       handle_mm_fault+0x14f/0x2f0
       __do_page_fault+0x2d1/0x560
       page_fault+0x22/0x30

other info that might help us debug this:

Chain exists of:
  &dev->struct_mutex --> &sb->s_type->i_mutex_key#5 --> &mm->mmap_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_sem);
                               lock(&sb->s_type->i_mutex_key#5);
                               lock(&mm->mmap_sem);
  lock(&dev->struct_mutex);

 *** DEADLOCK ***

1 lock held by debugfs_test/1351:
 #0:  (&mm->mmap_sem){++++}, at: [<000000005df01c1e>] __do_page_fault+0x106/0x560

stack backtrace:
CPU: 2 PID: 1351 Comm: debugfs_test Not tainted 4.15.0-rc2-CI-Patchwork_7428+ #1
Hardware name:                  /NUC6i5SYB, BIOS SYSKLi35.86A.0057.2017.0119.1758 01/19/2017
Call Trace:
 dump_stack+0x5f/0x86
 print_circular_bug+0x230/0x3b0
 check_prev_add+0x439/0x7b0
 ? lockdep_init_map_crosslock+0x20/0x20
 ? unwind_get_return_address+0x16/0x30
 ? __lock_acquire+0x1385/0x15a0
 __lock_acquire+0x1385/0x15a0
 lock_acquire+0xaf/0x200
 ? i915_mutex_lock_interruptible+0x47/0x130 [i915]
 __mutex_lock+0x81/0x9b0
 ? i915_mutex_lock_interruptible+0x47/0x130 [i915]
 ? i915_mutex_lock_interruptible+0x47/0x130 [i915]
 ? i915_mutex_lock_interruptible+0x47/0x130 [i915]
 i915_mutex_lock_interruptible+0x47/0x130 [i915]
 ? __pm_runtime_resume+0x4f/0x80
 i915_gem_fault+0x201/0x760 [i915]
 __do_fault+0x15/0x70
 __handle_mm_fault+0x85b/0xe40
 handle_mm_fault+0x14f/0x2f0
 __do_page_fault+0x2d1/0x560
 page_fault+0x22/0x30
RIP: 0033:0x7f98d6f49116
RSP: 002b:00007ffd6ffc3278 EFLAGS: 00010283
RAX: 00007f98d39a2bc0 RBX: 0000000000000000 RCX: 0000000000001680
RDX: 0000000000001680 RSI: 00007ffd6ffc3400 RDI: 00007f98d39a2bc0
RBP: 00007ffd6ffc33a0 R08: 0000000000000000 R09: 00000000000005a0
R10: 000055e847c2a830 R11: 0000000000000002 R12: 0000000000000001
R13: 000055e847c1d040 R14: 00007ffd6ffc3400 R15: 00007f98d6752ba0

v2: Init preempt_work unconditionally (Chris)
v3: Mention that we need the enable_guc=1 for lockdep splat (Chris)

Testcase: igt/debugfs_test/read_all_entries # with i915.enable_guc=1
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-2-michal.winiarski@intel.com

3176ff49

drm/i915/guc: Move shared data allocation away from submission path · 1bbbca0d

Michał Winiarski authored Dec 13, 2017

We need shared data for actions (e.g. guc suspend/resume), and we're
using those with GuC submission disabled.
Let's introduce intel_guc_init and move shared data alloc there.

This fixes GPF during module unload with HuC, but without GuC submission:

BUG: unable to handle kernel NULL pointer dereference at 000000005aee7809
IP: intel_guc_suspend+0x34/0x140 [i915]
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: i915(O-) netconsole x86_pkg_temp_thermal
intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
mei_me i2c_i801 mei prime_numbers [last unloaded: i915]
CPU: 2 PID: 2794 Comm: rmmod Tainted: G     U  W  O 4.15.0-rc2+ #297
Hardware name: /NUC6i5SYB, BIOS SYSKLi35.86A.0054.2016.0930.1102 09/30/2016
task: 0000000055945c61 task.stack: 00000000264ccb43
RIP: 0010:intel_guc_suspend+0x34/0x140 [i915]
RSP: 0018:ffffc90000483df8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff880829180000 RCX: 0000000000000000
RDX: 0000000000000006 RSI: ffff880844c2c938 RDI: ffff880844c2c000
RBP: ffff880829180000 R08: 00000000a29c58c1 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa040ba40
R13: ffffffffa040bab0 R14: ffff88084a195060 R15: 000055df3ef357a0
FS:  00007ff43c043740(0000) GS:ffff88084e200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000f9 CR3: 000000083f179005 CR4: 00000000003606e0
Call Trace:
 i915_gem_suspend+0x9d/0x130 [i915]
 ? i915_driver_unload+0x68/0x180 [i915]
 i915_driver_unload+0x70/0x180 [i915]
 i915_pci_remove+0x15/0x20 [i915]
 pci_device_remove+0x36/0xb0
 device_release_driver_internal+0x15f/0x220
 driver_detach+0x3a/0x80
 bus_remove_driver+0x58/0xd0
 pci_unregister_driver+0x29/0x90
 SyS_delete_module+0x150/0x1e0
 entry_SYSCALL_64_fastpath+0x23/0x9a
RIP: 0033:0x7ff43b51b5c7
RSP: 002b:00007ffe6825a758 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007ff43b51b5c7
RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055df3ef35808
RBP: 0000000000000000 R08: 00007ffe682596d1 R09: 0000000000000000
R10: 00007ff43b594880 R11: 0000000000000206 R12: 000055df3ef357a0
R13: 00007ffe68259740 R14: 000055df3ef35260 R15: 000055df3ef357a0
Code: 00 00 02 74 03 31 c0 c3 53 48 89 fb 48 83 ec 10 e8 52 0f
f8 ff 48 b8 01 05 00 00 02 00 00 00 48 89 44 24 04 48 8b 83 00 12 00 00 <f6> 80
f9 00 00 00 01 0f 84 a7 00 00 00 f6 80 98 00 00 00 01 0f
RIP: intel_guc_suspend+0x34/0x140 [i915] RSP: ffffc90000483df8
CR2: 00000000000000f9
---[ end trace 23a192a61d937a3e ]---

Fixes: b8e5eb96 ("drm/i915/guc: Allocate separate shared data object for GuC communication")
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-1-michal.winiarski@intel.com

1bbbca0d

13 Dec, 2017 2 commits

drm/i915: Unwind i915_gem_init() failure · 6ca9a2be

Chris Wilson authored Dec 13, 2017

Since Michal introduced new user controllable errors other than -EIO
during i915_gem_init(), we need to actually unwind on the error path as
we have to abort the module load (and we expect to do so cleanly!).

As we now teardown key state and then mark the driver as wedged (on
EIO), we have to be careful to not allow ourselves to resume and
unwedge, thus attempting to use the uninitialised driver.

v2: Try not to free driver state for the suppressed EIO
v3: Use load-fault-injection to test both error/recovery paths.

References: 8620eb1d ("drm/i915/uc: Don't use -EIO to report missing firmware")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213134347.4608-1-chris@chris-wilson.co.uk

6ca9a2be

drm/i915: Ratelimit request allocation under oom · 31c70f97

Chris Wilson authored Dec 12, 2017

If we fail to allocate a request, we can reap the outstanding requests
and push them to the request's slab's freelist before trying again. This
forces us to ratelimit malicious clients that tie up all of the system
resources in requests, instead of causing a system-wide oom.

Testcase: igt/gem_shrink/execbuf1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171212180652.22061-3-chris@chris-wilson.co.uk

31c70f97