Commits · dc34ca9231f2631e635a4737242bc0f7fe5c4a45 · Kirill Smelkov / linux

15 Sep, 2021 3 commits

drm/i915: Mark GPU wedging on driver unregister unrecoverable · dc34ca92

Janusz Krzysztofik authored Sep 03, 2021

GPU wedged flag now set on driver unregister to prevent from further
using the GPU can be then cleared unintentionally when calling
__intel_gt_unset_wedged() still before the flag is finally marked
unrecoverable.  We need to have it marked unrecoverable earlier.
Implement that by replacing a call to intel_gt_set_wedged() in
intel_gt_driver_unregister() with intel_gt_set_wedged_on_fini().

With the above in place, intel_gt_set_wedged_on_fini() is now called
twice on driver remove, second time from __intel_gt_disable().  This
seems harmless, while dropping intel_gt_set_wedged_on_fini() from
__intel_gt_disable() proved to break some driver probe error unwind
paths as well as mock selftest exit path.
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210903142837.216978-1-janusz.krzysztofik@linux.intel.com

dc34ca92

drm/i915: Add mmap lock around vma_lookup() in the mman selftest. · ce079f6d

Maarten Lankhorst authored Sep 15, 2021

Add mmap_read_lock/unlock around vma_lookup(). The core code requires
this for lookups. Since we only check if the return value is NULL,
we can immediately unlock.

This fixes the following splat in the selftes:

i915: Running i915_gem_mman_live_selftests/igt_mmap
------------[ cut here ]------------
WARNING: CPU: 3 PID: 5654 at include/linux/mmap_lock.h:164 find_vma+0x4e/0xb0
Modules linked in: i915(+) vgem fuse snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_intel_dspcfg snd_hda_codec snd_hwdep e1000e snd_hda_core ptp snd_pcm ttm mei_me pps_core i2c_i801 prime_numbers i2c_smbus mei [last unloaded: i915]
CPU: 3 PID: 5654 Comm: i915_selftest Tainted: G     U            5.15.0-rc1-CI-Trybot_7984+ #1
Hardware name: Micro-Star International Co., Ltd. MS-7B54/Z370M MORTAR (MS-7B54), BIOS 1.00 10/31/2017
RIP: 0010:find_vma+0x4e/0xb0
Code: de 48 89 ef e8 d3 94 fe ff 48 85 c0 74 34 48 83 c4 08 5b 5d c3 48 8d bf 28 01 00 00 be ff ff ff ff e8 d6 46 8b 00 85 c0 75 c8 <0f> 0b 48 8b 85 b8 00 00 00 48 85 c0 75 c6 48 89 ef e8 12 26 87 00
RSP: 0018:ffffc900013df980 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00007f9df2b80000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffff822e314c RDI: ffffffff8233c83f
RBP: ffff88811bafc840 R08: ffff888107d0ddb8 R09: 00000000fffffffe
R10: 0000000000000001 R11: 00000000ffbae7ba R12: 0000000000000000
R13: 0000000000000000 R14: ffff88812a710000 R15: ffff888114fa42c0
FS:  00007f9def9d4c00(0000) GS:ffff888266580000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f799627fe50 CR3: 000000011bbc2006 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __igt_mmap+0xe0/0x490 [i915]
 igt_mmap+0xd2/0x160 [i915]
 ? __trace_bprintk+0x6e/0x80
 __i915_subtests.cold.7+0x42/0x92 [i915]
 ? i915_perf_selftests+0x20/0x20 [i915]
 ? __i915_nop_setup+0x10/0x10 [i915]
 __run_selftests.part.3+0x10d/0x172 [i915]
 i915_live_selftests.cold.5+0x1f/0x47 [i915]
 i915_pci_probe+0x93/0x1d0 [i915]
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/issues/4129
Link: https://patchwork.freedesktop.org/patch/msgid/20210915105946.394412-1-maarten.lankhorst@linux.intel.comReviewed-by: Matthew Auld <matthew.auld@intel.com>

ce079f6d

Merge drm/drm-next into drm-intel-gt-next · d5dd580d

Joonas Lahtinen authored Sep 15, 2021

Close the divergence which has caused patches not to apply and
have a solid baseline for the PXP patches that Rodrigo will send
a topic branch PR for.
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

d5dd580d

14 Sep, 2021 5 commits

drm/i915/dg2: Define MOCS table for DG2 · e9354051

Matt Roper authored Sep 03, 2021

Bspec: 45101, 45427
Cc: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210904003544.2422282-3-matthew.d.roper@intel.com

e9354051

drm/i915/xehpsdv: Define MOCS table for XeHP SDV · 50bc6486

Lucas De Marchi authored Sep 03, 2021

Like DG1, XeHP SDV doesn't have LLC/eDRAM control values due to being a
dgfx card. XeHP SDV adds 2 more bits: L3_GLBGO to "push the Go point to
memory for L3 destined transaction" and L3_LKP to "enable Lookup for
uncacheable accesses".

Bspec: 45101
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210904003544.2422282-2-matthew.d.roper@intel.com

50bc6486

drm/i915: Enable -Wsometimes-uninitialized · 43192617

Nathan Chancellor authored Aug 24, 2021

This warning helps catch uninitialized variables. It should have been
enabled at the same time as commit b2423184 ("drm/i915: Enable
-Wuninitialized") but I did not realize they were disabled separately.
Enable it now that i915 is clean so that it stays that way.
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210824225427.2065517-4-nathan@kernel.org

43192617

drm/i915/selftests: Always initialize err in igt_dmabuf_import_same_driver_lmem() · 46f20a35

Nathan Chancellor authored Aug 24, 2021

Clang warns:

drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:127:13: warning:
variable 'err' is used uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]
        } else if (PTR_ERR(import) != -EOPNOTSUPP) {
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:138:9: note:
uninitialized use occurs here
        return err;
               ^~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:127:9: note: remove
the 'if' if its condition is always true
        } else if (PTR_ERR(import) != -EOPNOTSUPP) {
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:95:9: note:
initialize the variable 'err' to silence this warning
        int err;
               ^
                = 0

The test is expected to pass if i915_gem_prime_import() returns
-EOPNOTSUPP so initialize err to zero in this case.

Fixes: cdb35d1e ("drm/i915/gem: Migrate to system at dma-buf attach time (v7)")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210824225427.2065517-3-nathan@kernel.org

46f20a35

drm/i915/selftests: Do not use import_obj uninitialized · 4796054b

Nathan Chancellor authored Aug 24, 2021

Clang warns a couple of times:

drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:63:6: warning:
variable 'import_obj' is used uninitialized whenever 'if' condition is
true [-Wsometimes-uninitialized]
        if (import != &obj->base) {
            ^~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:80:22: note:
uninitialized use occurs here
        i915_gem_object_put(import_obj);
                            ^~~~~~~~~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:63:2: note: remove
the 'if' if its condition is always false
        if (import != &obj->base) {
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:38:46: note:
initialize the variable 'import_obj' to silence this warning
        struct drm_i915_gem_object *obj, *import_obj;
                                                    ^
                                                     = NULL

Shuffle the import_obj initialization above these if statements so that
it is not used uninitialized.

Fixes: d7b2cb38 ("drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210824225427.2065517-2-nathan@kernel.org

4796054b

13 Sep, 2021 23 commits

drm/i915/guc: Add GuC kernel doc · 4f41ddc7

Matthew Brost authored Sep 09, 2021

Add GuC kernel doc for all structures added thus far for GuC submission
and update the main GuC submission section with the new interface
details.

v2:
 - Drop guc_active.lock DOC
v3:
 - Fixup a few kernel doc comments (Daniele)
v4 (Daniele):
 - Implement doc suggestions from John
 - Add kerneldoc for all members of the GuC structure and pull the file
   in i915.rst
v5 (Daniele):
 - Implement new doc suggestions from John
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-24-matthew.brost@intel.com

4f41ddc7

drm/i915/guc: Drop guc_active move everything into guc_state · af5bc9f2

Matthew Brost authored Sep 09, 2021

Now that we have locking hierarchy of sched_engine->lock ->
ce->guc_state everything from guc_active can be moved into guc_state and
protected the guc_state.lock.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-23-matthew.brost@intel.com

af5bc9f2

drm/i915/guc: Move fields protected by guc->contexts_lock into sub structure · 3cb3e343

Matthew Brost authored Sep 09, 2021

To make ownership of locking clear move fields (guc_id, guc_id_ref,
guc_id_link) to sub structure guc_id in intel_context.
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-22-matthew.brost@intel.com

3cb3e343

drm/i915/guc: Move GuC priority fields in context under guc_active · 9798b172

Matthew Brost authored Sep 09, 2021

Move GuC management fields in context under guc_active struct as this is
where the lock that protects theses fields lives. Also only set guc_prio
field once during context init.

v2:
 (Daniele)
  - set CONTEXT_SET_INIT
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-21-matthew.brost@intel.com

9798b172

drm/i915/guc: Drop pin count check trick between sched_disable and re-pin · 5b116c17

Matthew Brost authored Sep 09, 2021

Drop pin count check trick between a sched_disable and re-pin, now rely
on the lock and counter of the number of committed requests to determine
if scheduling should be disabled on the context.
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-20-matthew.brost@intel.com

5b116c17

drm/i915/guc: Proper xarray usage for contexts_lookup · 1424ba81

Matthew Brost authored Sep 09, 2021

Lock the xarray and take ref to the context if needed.

v2:
 (Checkpatch)
  - Add new line after declaration
 (Daniel Vetter)
  - Correct put / get accounting in xa_for_loops
v3:
 (Checkpatch)
  - Extra new line
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-19-matthew.brost@intel.com

1424ba81

drm/i915/guc: Rework and simplify locking · 0f797650

Matthew Brost authored Sep 09, 2021

Rework and simplify the locking with GuC subission. Drop
sched_state_no_lock and move all fields under the guc_state.sched_state
and protect all these fields with guc_state.lock . This requires
changing the locking hierarchy from guc_state.lock -> sched_engine.lock
to sched_engine.lock -> guc_state.lock.

v2:
 (Daniele)
  - Don't check fields outside of lock during sched disable, check less
    fields within lock as some of the outside are no longer needed
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-18-matthew.brost@intel.com

0f797650

drm/i915/guc: Move guc_blocked fence to struct guc_state · 52d66c06

Matthew Brost authored Sep 09, 2021

Move guc_blocked fence to struct guc_state as the lock which protects
the fence lives there.

s/ce->guc_blocked/ce->guc_state.blocked/g

v2:
 (Daniele)
  - s/blocked_fence/blocked/g
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-17-matthew.brost@intel.com

52d66c06

drm/i915/guc: Release submit fence from an irq_work · b0d83888

Matthew Brost authored Sep 09, 2021

A subsequent patch will flip the locking hierarchy from
ce->guc_state.lock -> sched_engine->lock to sched_engine->lock ->
ce->guc_state.lock. As such we need to release the submit fence for a
request from an IRQ to break a lock inversion - i.e. the fence must be
release went holding ce->guc_state.lock and the releasing of the can
acquire sched_engine->lock.

v2:
 (Daniele)
  - Delete request from list before calling irq_work_queue
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-16-matthew.brost@intel.com

b0d83888

drm/i915/guc: Reset LRC descriptor if register returns -ENODEV · ae36b629

Matthew Brost authored Sep 09, 2021

Reset LRC descriptor if a context register returns -ENODEV as this means
we are mid-reset.

Fixes: eb5e7da7 ("drm/i915/guc: Reset implementation for new GuC interface")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-15-matthew.brost@intel.com

ae36b629

drm/i915/guc: Don't touch guc_state.sched_state without a lock · f16d5cb9

Matthew Brost authored Sep 09, 2021

Before we did some clever tricks to not use the a lock when touching
guc_state.sched_state in certain cases. Don't do that, enforce the use
of the lock.

v2:
 (kernel test robo )
  - Add __maybe_unused to sched_state_is_init()

v3: rebase after the unused code path removal has been moved to an
earlier patch.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-14-matthew.brost@intel.com

f16d5cb9

drm/i915/guc: Take context ref when cancelling request · 422cda4f

Matthew Brost authored Sep 09, 2021

A context can get destroyed after cancelling a request, if a context or
GT reset occurs, so take a reference to context when cancelling a
request.

Fixes: 62eaf0ae ("drm/i915/guc: Support request cancellation")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-13-matthew.brost@intel.com

422cda4f

drm/i915/selftests: Add initial GuC selftest for scrubbing lost G2H · d2420c2e

Matthew Brost authored Sep 09, 2021

While debugging an issue with full GT resets I went down a rabbit hole
thinking the scrubbing of lost G2H wasn't working correctly. This proved
to be incorrect as this was working just fine but this chase inspired me
to write a selftest to prove that this works. This simple selftest
injects errors dropping various G2H and then issues a full GT reset
proving that the scrubbing of these G2H doesn't blow up.

v2:
 (Daniel Vetter)
  - Use ifdef instead of macros for selftests
v3:
 (Checkpatch)
  - A space after 'switch' statement
v4:
 (Daniele)
  - A comment saying GT won't idle if G2H are lost
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-12-matthew.brost@intel.com

d2420c2e

drm/i915/guc: Copy whole golden context, set engine state size of subset · d135865c

Matthew Brost authored Sep 09, 2021

When the GuC does a media reset, it copies a golden context state back
into the corrupted context's state. The address of the golden context
and the size of the engine state restore are passed in via the GuC ADS.
The i915 had a bug where it passed in the whole size of the golden
context, not the size of the engine state to restore resulting in a
memory corruption.

Also copy the entire golden context on init rather than just the engine
state that is restored.

v2 (Daniele): use defines to avoid duplicated const variables (John).

Fixes: 481d458c ("drm/i915/guc: Add golden context to GuC ADS")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-11-matthew.brost@intel.com

d135865c

drm/i915/guc: Don't enable scheduling on a banned context, guc_id invalid, not registered · 9888beaa

Matthew Brost authored Sep 09, 2021

When unblocking a context, do not enable scheduling if the context is
banned, guc_id invalid, or not registered.

v2:
 (Daniele)
  - Add helper for unblock

Fixes: 62eaf0ae ("drm/i915/guc: Support request cancellation")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-10-matthew.brost@intel.com

9888beaa

drm/i915/guc: Kick tasklet after queuing a request · cf37e5c8

Matthew Brost authored Sep 09, 2021

Kick tasklet after queuing a request so it submitted in a timely manner.

Fixes: 3a4cdf19 ("drm/i915/guc: Implement GuC context operations for new inteface")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-9-matthew.brost@intel.com

cf37e5c8

Revert "drm/i915/gt: Propagate change in error status to children on unhold" · ac653dd7

Matthew Brost authored Sep 09, 2021

Propagating errors to dependent fences is broken and can lead to errors
from one client ending up in another. In commit 3761baae ("Revert
"drm/i915: Propagate errors on awaiting already signaled fences""), we
attempted to get rid of fence error propagation but missed the case
added in commit 8e9f84cf ("drm/i915/gt: Propagate change in error
status to children on unhold"). Revert that one too. This error was
found by an up-and-coming selftest which triggers a reset during
request cancellation and verifies that subsequent requests complete
successfully.

v2:
 (Daniel Vetter)
  - Use revert
v3:
 (Jason)
  - Update commit message

v4 (Daniele):
 - fix checkpatch error in commit message.

References: '3761baae ("Revert "drm/i915: Propagate errors on awaiting already signaled fences"")'
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-8-matthew.brost@intel.com

ac653dd7

drm/i915/guc: Workaround reset G2H is received after schedule done G2H · 1ca36cff

Matthew Brost authored Sep 09, 2021

If the context is reset as a result of the request cancellation the
context reset G2H is received after schedule disable done G2H which is
the wrong order. The schedule disable done G2H release the waiting
request cancellation code which resubmits the context. This races
with the context reset G2H which also wants to resubmit the context but
in this case it really should be a NOP as request cancellation code owns
the resubmit. Use some clever tricks of checking the context state to
seal this race until the GuC firmware is fixed.

v2:
 (Checkpatch)
  - Fix typos
v3:
 (Daniele)
  - State that is a bug in the GuC firmware

Fixes: 62eaf0ae ("drm/i915/guc: Support request cancellation")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-7-matthew.brost@intel.com

1ca36cff

drm/i915/guc: Process all G2H message at once in work queue · d67e3d5a

Matthew Brost authored Sep 09, 2021

Rather than processing 1 G2H at a time and re-queuing the work queue if
more messages exist, process all the G2H in a single pass of the work
queue.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-6-matthew.brost@intel.com

d67e3d5a

drm/i915/guc: Don't drop ce->guc_active.lock when unwinding context · 88209a8e

Matthew Brost authored Sep 09, 2021

Don't drop ce->guc_active.lock when unwinding a context after reset.
At one point we had to drop this because of a lock inversion but that is
no longer the case. It is much safer to hold the lock so let's do that.

Fixes: eb5e7da7 ("drm/i915/guc: Reset implementation for new GuC interface")
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-5-matthew.brost@intel.com

88209a8e

drm/i915/guc: Unwind context requests in reverse order · c39f51cc

Matthew Brost authored Sep 09, 2021

When unwinding requests on a reset context, if other requests in the
context are in the priority list the requests could be resubmitted out
of seqno order. Traverse the list of active requests in reverse and
append to the head of the priority list to fix this.

Fixes: eb5e7da7 ("drm/i915/guc: Reset implementation for new GuC interface")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-4-matthew.brost@intel.com

c39f51cc

drm/i915/guc: Fix outstanding G2H accounting · 669b949c

Matthew Brost authored Sep 09, 2021

A small race that could result in incorrect accounting of the number
of outstanding G2H. Basically prior to this patch we did not increment
the number of outstanding G2H if we encoutered a GT reset while sending
a H2G. This was incorrect as the context state had already been updated
to anticipate a G2H response thus the counter should be incremented.

As part of this change we remove a legacy (now unused) path that was the
last caller requiring a G2H response that was not guaranteed to loop.
This allows us to simplify the accounting as we don't need to handle the
case where the send fails due to the channel being busy.

Also always use helper when decrementing this value.

v2 (Daniele): update GEM_BUG_ON check, pull in dead code removal from
later patch, remove loop param from context_deregister.

Fixes: f4eb1f3f ("drm/i915/guc: Ensure G2H response has space in buffer")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-3-matthew.brost@intel.com

669b949c

drm/i915/guc: Fix blocked context accounting · fc30a676

Matthew Brost authored Sep 09, 2021

Prior to this patch the blocked context counter was cleared on
init_sched_state (used during registering a context & resets) which is
incorrect. This state needs to be persistent or the counter can read the
incorrect value resulting in scheduling never getting enabled again.

Fixes: 62eaf0ae ("drm/i915/guc: Support request cancellation")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: <stable@vger.kernel.org>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-2-matthew.brost@intel.com

fc30a676

12 Sep, 2021 9 commits

Linux 5.15-rc1 · 6880fa6c
Linus Torvalds authored Sep 12, 2021

6880fa6c

Merge tag 'perf-tools-for-v5.15-2021-09-11' of... · b5b65f13

Linus Torvalds authored Sep 12, 2021

Merge tag 'perf-tools-for-v5.15-2021-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull more perf tools updates from Arnaldo Carvalho de Melo:

 - Add missing fields and remove some duplicate fields when printing a
   perf_event_attr.

 - Fix hybrid config terms list corruption.

 - Update kernel header copies, some resulted in new kernel features
   being automagically added to 'perf trace' syscall/tracepoint argument
   id->string translators.

 - Add a file generated during the documentation build to .gitignore.

 - Add an option to build without libbfd, as some distros, like Debian
   consider its ABI unstable.

 - Add support to print a textual representation of IBS raw sample data
   in 'perf report'.

 - Fix bpf 'perf test' sample mismatch reporting

 - Fix passing arguments to stackcollapse report in a 'perf script'
   python script.

 - Allow build-id with trailing zeros.

 - Look for ImageBase in PE file to compute .text offset.

* tag 'perf-tools-for-v5.15-2021-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (25 commits)
  tools headers UAPI: Update tools's copy of drm.h headers
  tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
  tools headers UAPI: Sync linux/fs.h with the kernel sources
  tools headers UAPI: Sync linux/in.h copy with the kernel sources
  perf tools: Add an option to build without libbfd
  perf tools: Allow build-id with trailing zeros
  perf tools: Fix hybrid config terms list corruption
  perf tools: Factor out copy_config_terms() and free_config_terms()
  perf tools: Fix perf_event_attr__fprintf() missing/dupl. fields
  perf tools: Ignore Documentation dependency file
  perf bpf: Provide a weak btf__load_from_kernel_by_id() for older libbpf versions
  tools include UAPI: Update linux/mount.h copy
  perf beauty: Cover more flags in the  move_mount syscall argument beautifier
  tools headers UAPI: Sync linux/prctl.h with the kernel sources
  tools include UAPI: Sync sound/asound.h copy with the kernel sources
  tools headers UAPI: Sync linux/kvm.h with the kernel sources
  tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
  perf report: Add support to print a textual representation of IBS raw sample data
  perf report: Add tools/arch/x86/include/asm/amd-ibs.h
  perf env: Add perf_env__cpuid, perf_env__{nr_}pmu_mappings
  ...

b5b65f13

Merge tag 'compiler-attributes-for-linus-v5.15-rc1-v2' of git://github.com/ojeda/linux · c3e46874

Linus Torvalds authored Sep 12, 2021

Pull compiler attributes updates from Miguel Ojeda:

 - Fix __has_attribute(__no_sanitize_coverage__) for GCC 4 (Marco Elver)

 - Add Nick as Reviewer for compiler_attributes.h (Nick Desaulniers)

 - Move __compiletime_{error|warning} (Nick Desaulniers)

* tag 'compiler-attributes-for-linus-v5.15-rc1-v2' of git://github.com/ojeda/linux:
  compiler_attributes.h: move __compiletime_{error|warning}
  MAINTAINERS: add Nick as Reviewer for compiler_attributes.h
  Compiler Attributes: fix __has_attribute(__no_sanitize_coverage__) for GCC 4

c3e46874

Merge tag 'auxdisplay-for-linus-v5.15-rc1' of git://github.com/ojeda/linux · d41adc4e

Linus Torvalds authored Sep 12, 2021

Pull auxdisplay updates from Miguel Ojeda:
 "An assortment of improvements for auxdisplay:

   - Replace symbolic permissions with octal permissions (Jinchao Wang)

   - ks0108: Switch to use module_parport_driver() (Andy Shevchenko)

   - charlcd: Drop unneeded initializers and switch to C99 style (Andy
     Shevchenko)

   - hd44780: Fix oops on module unloading (Lars Poeschel)

   - Add I2C gpio expander example (Ralf Schlatterbeck)"

* tag 'auxdisplay-for-linus-v5.15-rc1' of git://github.com/ojeda/linux:
  auxdisplay: Replace symbolic permissions with octal permissions
  auxdisplay: ks0108: Switch to use module_parport_driver()
  auxdisplay: charlcd: Drop unneeded initializers and switch to C99 style
  auxdisplay: hd44780: Fix oops on module unloading
  auxdisplay: Add I2C gpio expander example

d41adc4e

Merge tag 'smp-urgent-2021-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f306b90c

Linus Torvalds authored Sep 12, 2021

Pull CPU hotplug updates from Thomas Gleixner:
 "Updates for the SMP and CPU hotplug:

   - Remove DEFINE_SMP_CALL_CACHE_FUNCTION() which is a left over of the
     original hotplug code and now causing trouble with the ARM64 cache
     topology setup due to the pointless SMP function call.

     It's not longer required as the hotplug callbacks are guaranteed to
     be invoked on the upcoming CPU.

   - Remove the deprecated and now unused CPU hotplug functions

   - Rewrite the CPU hotplug API documentation"

* tag 'smp-urgent-2021-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation: core-api/cpuhotplug: Rewrite the API section
  cpu/hotplug: Remove deprecated CPU-hotplug functions.
  thermal: Replace deprecated CPU-hotplug functions.
  drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()

f306b90c

Merge tag 'char-misc-5.15-rc1-lkdtm' of... · d8e988b6

Linus Torvalds authored Sep 12, 2021

Merge tag 'char-misc-5.15-rc1-lkdtm' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull misc driver fix from Greg KH:
 "Here is a single patch for 5.15-rc1, for the lkdtm misc driver.

  It resolves a build issue that many people were hitting with your
  current tree, and Kees and others felt would be good to get merged
  before -rc1 comes out, to prevent them from having to constantly hit
  it as many development trees restart on -rc1, not older -rc releases.

  It has NOT been in linux-next, but has passed 0-day testing and looks
  'obviously correct' when reviewing it locally :)"

* tag 'char-misc-5.15-rc1-lkdtm' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  lkdtm: Use init_uts_ns.name instead of macros

d8e988b6

Merge tag 'for-linus-5.15-1' of git://github.com/cminyard/linux-ipmi · 1791596b

Linus Torvalds authored Sep 12, 2021

Pull IPMI updates from Corey Minyard:
 "A couple of very minor fixes for style and rate limiting.

  Nothing big, but probably needs to go in"

* tag 'for-linus-5.15-1' of git://github.com/cminyard/linux-ipmi:
  char: ipmi: use DEVICE_ATTR helper macro
  ipmi: rate limit ipmi smi_event failure message

1791596b

Merge tag 'sched_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 56c24438

Linus Torvalds authored Sep 12, 2021

Pull scheduler fixes from Borislav Petkov:

 - Make sure the idle timer expires in hardirq context, on PREEMPT_RT

 - Make sure the run-queue balance callback is invoked only on the
   outgoing CPU

* tag 'sched_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Prevent balance_push() on remote runqueues
  sched/idle: Make the idle timer expire in hard interrupt context

56c24438

Merge tag 'locking_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 165d05d8

Linus Torvalds authored Sep 12, 2021

Pull locking fixes from Borislav Petkov:

 - Fix the futex PI requeue machinery to not return to userspace in
   inconsistent state

 - Avoid a potential null pointer dereference in the ww_mutex deadlock
   check

 - Other smaller cleanups and optimizations

* tag 'locking_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rtmutex: Fix ww_mutex deadlock check
  futex: Remove unused variable 'vpid' in futex_proxy_trylock_atomic()
  futex: Avoid redundant task lookup
  futex: Clarify comment for requeue_pi_wake_futex()
  futex: Prevent inconsistent state and exit race
  futex: Return error code instead of assigning it without effect
  locking/rwsem: Add missing __init_rwsem() for PREEMPT_RT

165d05d8