Commits · 178ad0e280c088f5abfa61793cb992fa120d1830 · Kirill Smelkov / linux

02 Sep, 2024 26 commits

drm/amdgpu/mes11: implement mmio queue reset for gfx11 · 178ad0e2

Jiadong Zhu authored Jul 04, 2024

Implement queue reset for graphic and compute queue.

v2: use amdgpu_gfx_rlc funcs to enter/exit safe mode.
v3: use gfx_v11_0_request_gfx_index_mutex()
v4: fix mutex handling
Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

178ad0e2

drm/amdgpu/mes: implement amdgpu_mes_reset_hw_queue_mmio · 01b4ae38

Jiadong Zhu authored Jul 04, 2024

The reset_queue api could be used from kfd or kgd.

v2: add use_mmio parameter for mes_reset_legacy_queue.
Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

01b4ae38

drm/amdgpu/mes: modify mes api for mmio queue reset · 8b2429a1

Jiadong Zhu authored Jul 04, 2024

Add me/pipe/queue parameters for queue reset input.

v2: fix build (Alex)
Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8b2429a1

drm/amdgpu/gfx12: fallback to driver reset compute queue directly · 8fe4fde3

Alex Deucher authored Jul 01, 2024

Since the MES FW resets kernel compute queue always failed, this
may caused by the KIQ failed to process unmap KCQ. So, before MES
FW work properly that will fallback to driver executes dequeue and
resets SPI directly. Besides, rework the ring reset function and make
the busy ring type reset in each function respectively.
Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8fe4fde3

drm/amdgpu/gfx12: add ring reset callbacks · 24805998

Alex Deucher authored Jun 03, 2024

Add ring reset callbacks for gfx and compute.
Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

24805998

drm/amdgpu/gfx10: rework reset sequence · d1f21443

Alex Deucher authored Jul 01, 2024

To match other GFX IPs.
Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d1f21443

drm/amdgpu/gfx10: wait for reset done before remap · 097af47d

Jiadong Zhu authored Jul 02, 2024

There is a racing condition that cp firmware modifies
MQD in reset sequence after driver updates it for
remapping. We have to wait till CP_HQD_ACTIVE becoming
false then remap the queue.

v2: fix KIQ locking (Alex)
v3: fix KIQ locking harder (Jessie)
Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

097af47d

drm/amdgpu/gfx10: remap queue after reset successfully · 2f3806f7

Jiadong Zhu authored Jun 14, 2024

Kiq command unmap_queues only does the dequeueing action.
We have to map the queue back with clean mqd.

v2: fix up error handling (Alex)
Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2f3806f7

drm/amdgpu/gfx10: add ring reset callbacks · 1741281a

Alex Deucher authored May 24, 2024

Add ring reset callbacks for gfx and compute.

v2: fix gfx handling
v3: wait for KIQ to complete
Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1741281a

drm/amdgpu/gfx11: wait for reset done before remap · a10c9393

Jiadong Zhu authored Jul 02, 2024

There is a racing condition that cp firmware modifies
MQD in reset sequence after driver updates it for
remapping. We have to wait till CP_HQD_ACTIVE becoming
false then remap the queue.
Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a10c9393

drm/amdgpu/gfx11: rename gfx_v11_0_gfx_init_queue() · 7d8e9e65

Alex Deucher authored Jul 01, 2024

Rename to gfx_v11_0_kgq_init_queue() to better align with
the other naming in the file.
Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7d8e9e65

drm/amdgpu/gfx11: fallback to driver reset compute queue directly (v2) · 072b4414

Prike Liang authored Jun 14, 2024

Since the MES FW resets kernel compute queue always failed, this
may caused by the KIQ failed to process unmap KCQ. So, before MES
FW work properly that will fallback to driver executes dequeue and
resets SPI directly. Besides, rework the ring reset function and make
the busy ring type reset in each function respectively.
Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

072b4414

drm/amd/display: 3.2.299 · f2ea269b

Aric Cyr authored Aug 25, 2024

This version brings along the following:

- DCN35 fixes
- DML2 fixes
- IPS fixes
- ODM fixes
- Miscellaneous cleanups
- MST fixes
- SPL fixes
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f2ea269b

drm/amd/display: Fix flickering caused by dccg · 98887737

Hansen Dsouza authored Aug 14, 2024

Always allow un-gating. Follow legacy workaround for repeated
dppclk dto updates
Reviewed-by: Muhammad Ahmed <ahmed.ahmed@amd.com>
Signed-off-by: Hansen Dsouza <Hansen.Dsouza@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

98887737

drm/amd/display: Block timing sync for different signals in PMO · 29d3d6af

Dillon Varone authored Aug 22, 2024

PMO assumes that like timings can be synchronized, but DC only allows
this if the signal types match.
Reviewed-by: Austin Zheng <austin.zheng@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

29d3d6af

drm/amd/display: fix graphics hang in multi-display mst case · fc5da5c0

Gabe Teeger authored Aug 23, 2024

[what]
Graphics hang observed with 3 displays connected to DP2.0 mst dock.

[why]
There's a mismatch in dml and dc between the assignments of hpo link
encoders.

[how]
Add a new array in dml that tracks the current mapping of HPO stream
encoders to HPO link encoders in dc.
Reviewed-by: Sung joon Kim <sungjoon.kim@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Gabe Teeger <Gabe.Teeger@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

fc5da5c0

drm/amd/display: Add sharpness control interface · efaf1575

Relja Vojvodic authored Aug 21, 2024

- Add interface for controlling shapness level input into DCN.
- Update SPL to support custom sharpness values.
- Add support for different sharpness values depending on YUV/RGB
  content.
Reviewed-by: Samson Tam <samson.tam@amd.com>
Signed-off-by: Relja Vojvodic <Relja.Vojvodic@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

efaf1575

Revert "drm/amd/display: Wait for all pending cleared before full update" · 6e841094

Dillon Varone authored Aug 20, 2024

This reverts commit f0b7dcf2.

It is causing graphics hangs.
Reviewed-by: Martin Leung <martin.leung@amd.com>
Signed-off-by: Dillon Varone <Dillon.Varone@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6e841094

drm/amd/display: disable sharpness if HDR Multiplier is too large · 8a060e9c

Samson Tam authored Aug 21, 2024

[Why]
Certain profiles have higher HDR multiplier than SDR boost max which
is not currently supported

[How]
Disable sharpness for these profiles

Fixes: 1b0ce903 ("drm/amd/display: add improvements for text display and HDR DWM and MPO")
Reviewed-by: Martin Leung <martin.leung@amd.com>
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8a060e9c

drm/amd/display: Add dpia debug option to control power management · c24538c4

Meenakshikumar Somasundaram authored Aug 20, 2024

[Why]
To provide option to dpia control power management

[How]
By adding disable_usb4_pm_support bit field in dpia_debug option to
control dpia power management
Reviewed-by: Jun Lei <jun.lei@amd.com>
Signed-off-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c24538c4

drm/amdgpu/gfx11: add ring reset callbacks · b3e9bfd8

Alex Deucher authored May 24, 2024

Add ring reset callbacks for gfx and compute.
Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b3e9bfd8

drm/amd/display: re-enable Dynamic ODM policy · 0ba3cb8e

Samson Tam authored Aug 21, 2024

[Why]
Previous disable ODM policy due to underflow issue with sharpener.
Issue is resolved after updating sharpening policy to apply to
both windowed and fullscreen video

[How]
Remove sharpness check disabling Dynamic ODM policy
Reviewed-by: Martin Leung <martin.leung@amd.com>
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0ba3cb8e

drm/amd/display: Lock DC and exit IPS when changing backlight · 988fe286

Leo Li authored Aug 20, 2024

Backlight updates require aux and/or register access. Therefore, driver
needs to disallow IPS beforehand.

So, acquire the dc lock before calling into dc to update backlight - we
should be doing this regardless of IPS. Then, while the lock is held,
disallow IPS before calling into dc, then allow IPS afterwards (if it
was previously allowed).
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

988fe286

drm/amd/display: only trigger BIOS related assert for older ASICs · c66db9e9

Daniel Sa authored Aug 20, 2024

[Why]
Some asserts are always hit on startup/Pnp when they should only be used
to indicate when something has gone wrong.

[How]
Ignore result of getting function from bios cmd table for newer asics.
Reviewed-by: Jun Lei <jun.lei@amd.com>
Signed-off-by: Daniel Sa <Daniel.Sa@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c66db9e9

drm/amd/display: Fix DCN35 set min dispclk logic · 6f4835f9

Nicholas Susanto authored Aug 20, 2024

[Why]

Setting min dispclk to 50Mhz outside clock lowering function causes
unnecessary calls to SMU to lower dispclk and causes dentist hangs when
there is no stream on the pipes.

[How]

Move the set minimum dispclk logic inside the lowering dispclk if
statement.

Fixes: 23444132 ("DCN35 set min dispclk to 50Mhz")
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Nicholas Susanto <Nicholas.Susanto@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6f4835f9

drm/amdgpu/gfx9.4.3: Implement compute pipe reset · ad17b124

Prike Liang authored Aug 29, 2024

Implement the compute pipe reset, and the driver will
fallback to pipe reset when queue reset fails.
The pipe reset only deactivates the queue which is
scheduled in the pipe, and meanwhile the MEC pipe
will be reset to the firmware _start pointer. So,
it seems pipe reset will cost more cycles than the
queue reset; therefore, the driver tries to recover
by doing queue reset first.
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ad17b124

29 Aug, 2024 13 commits

drm/amdgpu: always allocate cleared VRAM for GEM allocations · 6c0a7c3c

Alex Deucher authored Mar 26, 2024

This adds allocation latency, but aligns better with user
expectations.  The latency should improve with the drm buddy
clearing patches that Arun has been working on.

In addition this fixes the high CPU spikes seen when doing
wipe on release.

v2: always set AMDGPU_GEM_CREATE_VRAM_CLEARED (Christian)

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3528
Fixes: a68c7eaa ("drm/amdgpu: Enable clear page functionality")
Acked-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> (v1)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Cc: Christian König <christian.koenig@amd.com>

6c0a7c3c

drm/amdgpu/mes: add mes mapping legacy queue switch · 52491d97

Jack Xiao authored Aug 22, 2024

For mes11 old firmware has issue to map legacy queue,
add a flag to switch mes to map legacy queue.

Fixes: f9d8c5c7 ("drm/amdgpu/gfx: enable mes to map legacy queue support")
Reported-by: Andrew Worsley <amworsley@gmail.com>
Link: https://lists.freedesktop.org/archives/amd-gfx/2024-August/112773.htmlSigned-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

52491d97

drm/amdkfd: Don't drain ih1 for APU · 96316211

Yifan Zhang authored Aug 27, 2024

ih1 is not initialized for APUs. Don't drain it or NULL pointer
error will be triggered.

Fixes: 6ef29715 ("drm/amdkfd: Change kfd/svm page fault drain handling")
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

96316211

drm/amdgpu/gfx12: return early in preempt_ib() · 1125f95c

Alex Deucher authored Aug 15, 2024

When MES is enabled KIQ is not available.  Return an error
when someone uses the debugfs preempt test interface in
that case.
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1125f95c

drm/amdgpu/gfx11: return early in preempt_ib() · 1e487c91

Alex Deucher authored Aug 14, 2024

When MES is enabled KIQ is not available.  Return an error
when someone uses the debugfs preempt test interface in
that case.
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1e487c91

drm/amd/display: Determine IPS mode by ASIC and PMFW versions · 28d43d08

Leo Li authored Aug 27, 2024

[Why]

DCN IPS interoperates with other system idle power features, such as
Zstates.

On DCN35, there is a known issue where system Z8 + DCN IPS2 causes a
hard hang. We observe this on systems where the SBIOS allows Z8.

Though there is a SBIOS fix, there's no guarantee that users will get it
any time soon, or even install it. A workaround is needed to prevent
this from rearing its head in the wild.

[How]

For DCN35, check the pmfw version to determine whether the SBIOS has the
fix. If not, set IPS1+RCG as the deepest possible state in all cases
except for s0ix and display off (DPMS). Otherwise, enable all IPS
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

28d43d08

drm/amdgpu: Move the dumping log out of for loop · 30e8f4c2

Sunil Khatri authored Aug 28, 2024

log message "Dumping IP State Completed" needs to
be logged only once when state dumping is complete.

Hence moving it out of the for loop.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Trigger Huang <Trigger.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

30e8f4c2

drm/amd/amdgpu: move drain_workqueue before shutdown is set · af76ca8e

Victor Zhao authored Aug 26, 2024

[background] when unloading amdgpu driver right after running a
workload, drain_workqueue is causing "Fence fallback timer
expired on ring sdma0.0". Under sriov, this issue will cause sriov
full access timeout and a reset happening.

move drain_workqueue before shutdown is set to allow ih process and
before enter full access under sriov to avoid full access time cost.
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

af76ca8e

drm/amdgpu: Do core dump immediately when job tmo · c67db6a6

Trigger Huang authored Aug 19, 2024

Do the coredump immediately after a job timeout to get a closer
representation of GPU's error status.

V2: This will skip printing vram_lost as the GPU reset is not
happened yet (Alex)

V3: Unconditionally call the core dump as we care about all the reset
functions(soft-recovery and queue reset and full adapter reset, Alex)

V4: Do the dump after adev->job_hang = true (Sunil)
Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Acked-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c67db6a6

drm/amdgpu: skip printing vram_lost if needed · 6122f5c7

Trigger Huang authored Aug 19, 2024

The vm lost status can only be obtained after a GPU reset occurs, but
sometimes a dev core dump can be happened before GPU reset. So a new
argument is added to tell the dev core dump implementation whether to
skip printing the vram_lost status in the dump.
And this patch is also trying to decouple the core dump function from
the GPU reset function, by replacing the argument amdgpu_reset_context
with amdgpu_job to specify the context for core dump.

V2: Inform user if VRAM lost check is skipped so users don't assume
VRAM wasn't lost (Alex)
Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6122f5c7

drm/amdgpu/gfx9: put queue resets behind a debug option · 7c1a2d8a

Alex Deucher authored Aug 20, 2024

Pending extended validation.
Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7c1a2d8a

drm/amdgpu: add experimental resets debug flag · a9b67c03

Alex Deucher authored Aug 20, 2024

Add this flag to enable experimental resets for testing before they
are fully validated.
Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a9b67c03

drm/amdgpu/display: Fix a mistake in revert commit · 7745a1de

Fangzhi Zuo authored Aug 27, 2024

[why]
It is to fix in try_disable_dsc() due to misrevert of
commit 338567d1 ("drm/amd/display: Fix MST BW calculation Regression")

[How]
Fix restoring minimum compression bw by 'max_kbps', instead of native bw 'stream_kbps'
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7745a1de

27 Aug, 2024 1 commit

drm/amdgpu/swsmu: always force a state reprogram on init · c50fe289

Alex Deucher authored Aug 22, 2024

Always reprogram the hardware state on init. This ensures
the PMFW state is explicitly programmed and we are not relying
on the default PMFW state.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3131Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c50fe289