Commits · ec9e2e7acc6dabb8f00c2c60785931310caaa883 · Kirill Smelkov / linux

21 Aug, 2024 25 commits

drm/amd/display: Fix construct_phy with MXM connector · ec9e2e7a

Ilya Bakoulin authored Aug 15, 2024

[Why/How]
The call to construct_phy will fail in cases where connector type is
MXM, and the dc_link won't be properly created/initialized.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ec9e2e7a

drm/amd/display: Support UHBR10 link rate on eDP · f3271893

Sung Joon Kim authored Aug 15, 2024

[why]
Supporting UHBR10 link rate on eDP leverages
the existing DP2.0 code but need to add some small
adjustments in code.

[how]
Acknowledge the given DPCD caps for UHBR10
link rate support and allow DP2.0 programming
sequence and link training for eDP.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Sung Joon Kim <Sungjoon.Kim@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f3271893

drm/amd/display: Hardware cursor changes color when switched to software cursor · 272e6aab

Nevenko Stupar authored Aug 15, 2024

[Why & How]
DCN4 Cursor has separate degamma block and should always
do Cursor degamma for Cursor color modes.
Reviewed-by: Chris Park <chris.park@amd.com>
Signed-off-by: Nevenko Stupar <Nevenko.Stupar@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

272e6aab

drm/amd/display: Allow UHBR Interop With eDP Supported Link Rates Table · 4e9e50b6

Michael Strauss authored Aug 15, 2024

[WHY]
eDP 2.0 is introducing support for UHBR link rates, however current eDP ILR
link optimization does not account for UHBR capabilities.
Either UHBR capabilities will be provided via the same 128b/132b rate DPCD caps
that are currently used on DP2.1, or Table 4-13 may be updated to include UHBR
rates.

[HOW]
Add extra Supported Link Rates table translations for UHBR10/13.5/20.
Update eDP link setting optimization search to be aware of 128b/132b DPCD
rate caps in order to unblock UHBR on panels with Supported Link Rates table.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4e9e50b6

drm/amd/display: Remove redundant check in DCN35 hwseq · 7c9cb6d1

Nicholas Susanto authored Aug 15, 2024

Removing redundant condition.
Reviewed-by: Hansen Dsouza <Hansen.Dsouza@amd.com>
Signed-off-by: Nicholas Susanto <Nicholas.Susanto@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7c9cb6d1

drm/amd/display: remove an extraneous call for checking dchub clock · 8783a184

Aurabindo Pillai authored Aug 15, 2024

when removing the amdgpu module and reinserting it, a call trace is
triggered:

[  334.230602] RIP: 0010:hubbub2_get_dchub_ref_freq+0xbb/0xe0 [amdgpu]
[  334.230807] Code: 25 28 00 00 00 75 3c 48 8d 65 f0 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 45 31 d2 45 31 db e9 55 a1 ca de <0f> 0b eb c6 0f 0b eb c2 d1 eb 8d 83 c0 63 ff ff 3d 20 4e 00 00 76
[  334.230809] RSP: 0018:ffffbc8b823fb540 EFLAGS: 00010246
[  334.230811] RAX: 0000000000001000 RBX: 00000000000186a0 RCX: 0000000000000000
[  334.230812] RDX: ffffbc8b823fb544 RSI: 0000000000000000 RDI: 0000000000000000
[  334.230813] RBP: ffffbc8b823fb560 R08: 0000000000000000 R09: 0000000000000000
[  334.230814] R10: 0000000000000000 R11: 000000000000000f R12: ffff9e644f1f2bb0
[  334.230815] R13: ffff9e6451361300 R14: 0000000000000000 R15: ffff9e6452c00000
[  334.230816] FS:  00007af7c8519000(0000) GS:ffff9e737dd00000(0000) knlGS:0000000000000000
[  334.230817] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  334.230818] CR2: 0000703576b9cbd0 CR3: 00000001095a2000 CR4: 0000000000750ee0
[  334.230819] PKRU: 55555554
[  334.230820] Call Trace:
[  334.230822]  <TASK>
[  334.230824]  ? show_regs+0x6d/0x80
[  334.230828]  ? __warn+0x89/0x160
[  334.230832]  ? hubbub2_get_dchub_ref_freq+0xbb/0xe0 [amdgpu]
[  334.231024]  ? report_bug+0x17e/0x1b0
[  334.231028]  ? handle_bug+0x46/0x90
[  334.231030]  ? exc_invalid_op+0x18/0x80
[  334.231032]  ? asm_exc_invalid_op+0x1b/0x20
[  334.231036]  ? hubbub2_get_dchub_ref_freq+0xbb/0xe0 [amdgpu]
[  334.231217]  dc_create_resource_pool+0xfd/0x320 [amdgpu]
[  334.231408]  dc_create+0x256/0x700 [amdgpu]
[  334.231588]  ? srso_alias_return_thunk+0x5/0x7f
[  334.231590]  ? dmi_matches+0xa0/0x230
[  334.231594]  amdgpu_dm_init+0x28c/0x25f0 [amdgpu]
[  334.231791]  ? prb_read_valid+0x1c/0x30
[  334.231795]  ? __irq_work_queue_local+0x43/0xf0
[  334.231798]  ? srso_alias_return_thunk+0x5/0x7f
[  334.231800]  ? irq_work_queue+0x2f/0x70
[  334.231802]  ? srso_alias_return_thunk+0x5/0x7f
[  334.231803]  ? __wake_up_klogd.part.0+0x40/0x70
[  334.231805]  ? srso_alias_return_thunk+0x5/0x7f
[  334.231807]  ? vprintk_emit+0xd9/0x210
[  334.231809]  ? set_dev_info+0x130/0x1c0
[  334.231812]  ? srso_alias_return_thunk+0x5/0x7f
[  334.231813]  ? dev_printk_emit+0xa1/0xe0
[  334.231819]  dm_hw_init+0x14/0x30 [amdgpu]
[  334.231993]  amdgpu_device_init+0x23c7/0x2fc0 [amdgpu]
[  334.232134]  ? pci_read_config_word+0x25/0x50
[  334.232139]  amdgpu_driver_load_kms+0x1a/0xd0 [amdgpu]
[  334.232284]  amdgpu_pci_probe+0x1f9/0x620 [amdgpu]

On DCN401, get_dchub_ref_freq() hook is called before init_hw() hook.
Hence, it is expected to trigger an assert. Remove the extraneous call
to get_dchub_ref_freq() to suppress the call trace
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8783a184

drm/amd/display: Update HPO I/O When Handling Link Retrain Automation Request · 9de60462

Michael Strauss authored Aug 15, 2024

[WHY]
Previous multi-display HPO fix moved where HPO I/O enable/disable is performed.
The codepath now taken to enable/disable HPO I/O is not used for compliance
test automation, meaning that if a compliance box being driven at a DP1 rate
requests retrain at UHBR, HPO I/O will remain off if it was previously off.

[HOW]
Explicitly update HPO I/O after allocating encoders for test request.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

9de60462

Revert "drm/amd/display: Update to using new dccg callbacks" · 18ac82c2

Hansen Dsouza authored Aug 15, 2024

[Why]
Revert updated DCCG wrappers due to regression

[How]
This reverts commit 680458d4.
Reviewed-by: Chris Park <chris.park@amd.com>
Signed-off-by: Hansen Dsouza <Hansen.Dsouza@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

18ac82c2

drm/amdgpu: Validate TA binary size · c0a04e35

Candice Li authored Aug 15, 2024

Add TA binary size validation to avoid OOB write.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c0a04e35

drm/amdkfd: Update BadOpcode Interrupt handling with MES · eb067d65

Mukul Joshi authored Aug 12, 2024

Based on the recommendation of MEC FW, update BadOpcode interrupt
handling by unmapping all queues, removing the queue that got the
interrupt from scheduling and remapping rest of the queues back when
using MES scheduler. This is done to prevent the case where unmapping
of the bad queue can fail thereby causing a GPU reset.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

eb067d65

drm/amdkfd: Update queue unmap after VM fault with MES · 9a16042f

Mukul Joshi authored Jun 03, 2024

MEC FW expects MES to unmap all queues when a VM fault is observed
on a queue and then resumed once the affected process is terminated.
Use the MES Suspend and Resume APIs to achieve this.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

9a16042f

drm/amdgpu: Implement MES Suspend and Resume APIs for GFX11 · ccf8ef6b

Mukul Joshi authored Jun 03, 2024

Add implementation for MES Suspend and Resume APIs to unmap/map
all queues for GFX11. Support for GFX12 will be added when the
corresponding firmware support is in place.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ccf8ef6b

drm/amdkfd: Enable processes isolation on gfx9 · 87758a0e

Amber Lin authored Apr 29, 2024

When amdgpu enable enforce_isolation, KFD enables single-process mode in
HWS and sets exec_cleaner_shader bit in MAP_PROCESS.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

87758a0e

drm/amdgpu/gfx_v9_4_3: Apply Isolation Enforcement to GFX & Compute rings · f846250b

Srinivasan Shanmugam authored May 14, 2024

This commit applies isolation enforcement to the GFX and Compute rings
in the gfx_v9_4_3 module.

The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and
`amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be
called when a ring begins and ends its use, respectively.

`amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring
begins its use. This function cancels any scheduled
`enforce_isolation_work` and, if necessary, signals the Kernel Fusion
Driver (KFD) to stop the runqueue.

`amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends
its use. This function schedules `enforce_isolation_work` to be run
after a delay.

These functions are part of the Enforce Isolation Handler, which
enforces shader isolation on AMD GPUs to prevent data leakage between
different processes.

The commit also includes a check for the type of the ring. If the type
of the ring is `AMDGPU_RING_TYPE_COMPUTE`, the `xcp_id` of the
`enforce_isolation` structure in the `gfx` structure of the
`amdgpu_device` is set to the `xcp_id` of the ring. This ensures that
the correct `xcp_id` is used when enforcing isolation on compute rings.
The `xcp_id` is an identifier for an XCP partition, and different rings
can be associated with different XCP partitions.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>

f846250b

drm/amdgpu/gfx9: Apply Isolation Enforcement to GFX & Compute rings · b710dbe5

Srinivasan Shanmugam authored Jul 18, 2024

This commit applies isolation enforcement to the GFX and Compute rings
in the gfx_v9_0 module.

The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and
`amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be
called when a ring begins and ends its use, respectively.

`amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring
begins its use. This function cancels any scheduled
`enforce_isolation_work` and, if necessary, signals the Kernel Fusion
Driver (KFD) to stop the runqueue.

`amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends
its use. This function schedules `enforce_isolation_work` to be run
after a delay.

These functions are part of the Enforce Isolation Handler, which
enforces shader isolation on AMD GPUs to prevent data leakage between
different processes.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>

b710dbe5

drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization · afefd6f2

Srinivasan Shanmugam authored Jun 06, 2024

This commit introduces the Enforce Isolation Handler designed to enforce
shader isolation on AMD GPUs, which helps to prevent data leakage
between different processes.

The handler counts the number of emitted fences for each GFX and compute
ring. If there are any fences, it schedules the `enforce_isolation_work`
to be run after a delay of `GFX_SLICE_PERIOD`. If there are no fences,
it signals the Kernel Fusion Driver (KFD) to resume the runqueue.

The function is synchronized using the `enforce_isolation_mutex`.

This commit also introduces a reference count mechanism
(kfd_sch_req_count) to keep track of the number of requests to enable
the KFD scheduler. When a request to enable the KFD scheduler is made,
the reference count is decremented. When the reference count reaches
zero, a delayed work is scheduled to enforce isolation after a delay of
GFX_SLICE_PERIOD.

When a request to disable the KFD scheduler is made, the function first
checks if the reference count is zero. If it is, it cancels the delayed
work for enforcing isolation and checks if the KFD scheduler is active.
If the KFD scheduler is active, it sends a request to stop the KFD
scheduler and sets the KFD scheduler state to inactive. Then, it
increments the reference count.

The function is synchronized using the kfd_sch_mutex to ensure that the
KFD scheduler state and reference count are updated atomically.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>

afefd6f2

drm/amdkfd: APIs to stop/start KFD scheduling · 234eebe1

Amber Lin authored Jul 29, 2024

Provide amdgpu_amdkfd_stop_sched() for amdgpu to stop KFD scheduling
compute work on HIQ. amdgpu_amdkfd_start_sched() resumes the scheduling.
When amdgpu_amdkfd_stop_sched is called, KFD will unmap queues from
runlist. If users send ioctls to KFD to create queues, they'll be added
but those queues won't be mapped to runlist (so not scheduled) until
amdgpu_amdkfd_start_sched is called.

v2: fix build (Alex)
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

234eebe1

drm/amdgpu/gfx9: Add cleaner shader support for GFX9.4.4 hardware · b1f49ff9

Srinivasan Shanmugam authored Jul 29, 2024

This commit extends the cleaner shader feature to support GFX9.4.4
hardware.

The cleaner shader feature is used to clear or initialize certain GPU
resources, such as Local Data Share (LDS), Vector General Purpose
Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). This
operation needs to be performed in isolation, while no other tasks
should be running on the GPU at the same time.

Previously, the cleaner shader feature was implemented for GFX9.4.3
hardware. This commit adds support for GFX9.4.4 hardware by allowing the
cleaner shader to be used with this hardware version.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b1f49ff9

drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.3 · 33528831

Srinivasan Shanmugam authored Jul 29, 2024

This commit adds the cleaner shader microcode for GFX9.4.3 GPUs. The
cleaner shader is a piece of GPU code that is used to clear or
initialize certain GPU resources, such as Local Data Share (LDS), Vector
General Purpose Registers (VGPRs), and Scalar General Purpose Registers
(SGPRs).

Clearing these resources is important for ensuring data isolation
between different workloads running on the GPU. Without the cleaner
shader, residual data from a previous workload could potentially be
accessed by a subsequent workload, leading to data leaks and incorrect
computation results.

The cleaner shader microcode is represented as an array of 32-bit words
(`gfx_9_4_3_cleaner_shader_hex`). This array is the binary
representation of the cleaner shader code, which is written in a
low-level GPU instruction set.

When the cleaner shader feature is enabled, the AMDGPU driver loads this
array into a specific location in the GPU memory. The GPU then reads
this memory location to fetch and execute the cleaner shader
instructions.

The cleaner shader is executed automatically by the GPU at the end of
each workload, before the next workload starts. This ensures that all
GPU resources are in a clean state before the start of each workload.

This addition is part of the cleaner shader feature implementation. The
cleaner shader feature helps improve GPU performance and resource
utilization by cleaning up GPU resources after they are used. It also
enhances security and reliability by preventing data leaks between
workloads.

v2: fix copyright date (Alex)

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

33528831

drm/amdgpu/gfx9: Implement cleaner shader support for GFX9.4.3 hardware · d4c38154

Srinivasan Shanmugam authored Jul 29, 2024

The patch modifies the gfx_v9_4_3_kiq_set_resources function to write
the cleaner shader's memory controller address to the ring buffer. It
also adds a new function, gfx_v9_4_3_ring_emit_cleaner_shader, which
emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer.

This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the
gfx_v9_4_3 module. This packet is used to emit the cleaner shader, which
is used to clear GPU memory before it's reused, helping to prevent data
leakage between different processes.

Finally, the patch updates the ring function structures to include the
new gfx_v9_4_3_ring_emit_cleaner_shader function. This allows the
cleaner shader to be emitted as part of the ring's operations.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d4c38154

drm/amdgpu/gfx9: Implement cleaner shader support for GFX9 hardware · c2e70d30

Srinivasan Shanmugam authored Jul 29, 2024

The patch modifies the gfx_v9_0_kiq_set_resources function to write
the cleaner shader's memory controller address to the ring buffer. It
also adds a new function, gfx_v9_0_ring_emit_cleaner_shader, which
emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer.

This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the
gfx_v9_0 module. This packet is used to emit the cleaner shader, which
is used to clear GPU memory before it's reused, helping to prevent data
leakage between different processes.

Finally, the patch updates the ring function structures to include the
new gfx_v9_0_ring_emit_cleaner_shader function. This allows the
cleaner shader to be emitted as part of the ring's operations.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c2e70d30

drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER for cleaner shader execution · 22ff907d

Srinivasan Shanmugam authored Jul 07, 2024

This commit adds the PACKET3_RUN_CLEANER_SHADER definition. This packet
is a command packet used to instruct the GPU to execute the cleaner
shader.

The cleaner shader is a piece of GPU code that is used to clear or
initialize certain GPU resources, such as Local Data Share (LDS), Vector
General Purpose Registers (VGPRs), and Scalar General Purpose Registers
(SGPRs). Clearing these resources is important for ensuring data
isolation between different workloads running on the GPU.

The PACKET3_RUN_CLEANER_SHADER packet is used to trigger the execution
of the cleaner shader on the GPU. The packet consists of a header
followed by a RESERVED field, which is programmed to zero. When the GPU
receives this packet, it fetches and executes the cleaner shader
instructions from the location specified in the packet.

The cleaner shader feature helps to enhances security and reliability by
preventing data leaks between workloads.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

22ff907d

drm/amdgpu: Add sysfs interface for running cleaner shader · d361ad5d

Srinivasan Shanmugam authored May 27, 2024

This patch adds a new sysfs interface for running the cleaner shader on
AMD GPUs. The cleaner shader is used to clear GPU memory before it's
reused, which can help prevent data leakage between different processes.

The new sysfs file is write-only and is named `run_cleaner_shader`.
Write the number of the partition to this file to trigger the cleaner shader
on that partition. There is only one partition on GPUs which do not
support partitioning.

Changes made in this patch:

- Added `amdgpu_set_run_cleaner_shader` function to handle writes to the
  `run_cleaner_shader` sysfs file.
- Added `run_cleaner_shader` to the list of device attributes in
  `amdgpu_device_attrs`.
- Updated `default_attr_update` to handle `run_cleaner_shader`.
- Added `AMDGPU_DEVICE_ATTR_WO` macro to create write-only device
  attributes.

v2: fix error handling (Alex)

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>

d361ad5d

drm/amdgpu: Add enforce_isolation sysfs attribute · e189be9b

Srinivasan Shanmugam authored May 27, 2024

This commit adds a new sysfs attribute 'enforce_isolation' to control
the 'enforce_isolation' setting per GPU. The attribute can be read and
written, and accepts values 0 (disabled) and 1 (enabled).

When 'enforce_isolation' is enabled, reserved VMIDs are allocated for
each ring. When it's disabled, the reserved VMIDs are freed.

The set function locks a mutex before changing the 'enforce_isolation'
flag and the VMIDs, and unlocks it afterwards. This ensures that these
operations are atomic and prevents race conditions and other concurrency
issues.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e189be9b

drm/amdgpu: Enforce isolation as part of the job · dba1a6cf

Srinivasan Shanmugam authored Mar 20, 2024

This patch adds a new parameter 'enforce_isolation' to the amdgpu_job
structure. This parameter is used to determine whether shader isolation
should be enforced for a job. The enforce_isolation parameter is then
stored in the amdgpu_job structure and used when flushing the VM.

The enforce_isolation field of the amdgpu_job structure is set directly
after the job is allocated

This change allows more fine-grained control over shader isolation,
making it possible to enforce isolation on a per-job basis rather than
globally. This can be useful in scenarios where only certain jobs
require isolation.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>

dba1a6cf

16 Aug, 2024 15 commits

drm/amdgpu: abort KIQ waits when there is a pending reset · 19cff165

Victor Skvortsov authored Aug 02, 2024

Stop waiting for the KIQ to return back when there is a reset pending.
It's quite likely that the KIQ will never response.
Signed-off-by: Koenig Christian <Christian.Koenig@amd.com>
Suggested-by: Lazar Lijo <Lijo.Lazar@amd.com>
Tested-by: Victor Skvortsov <victor.skvortsov@amd.com>
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

19cff165

drm/amdgpu: Make enforce_isolation setting per GPU · 96595204

Srinivasan Shanmugam authored Jul 29, 2024

This commit makes enforce_isolation setting to be per GPU and per
partition by adding the enforce_isolation array to the adev structure.
The adev variable is set based on the global enforce_isolation module
parameter during device initialization.

In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU
is used to determine whether to enforce isolation between graphics and
compute processes on that GPU.

In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU
and partition is used to determine whether to enforce isolation between
graphics and compute processes on that GPU and partition.

This allows the enforce_isolation setting to be controlled individually
for each GPU and each partition, which is useful in a system with
multiple GPUs and partitions where different isolation settings might be
desired for different GPUs and partitions.

v2: fix loop in amdgpu_vmid_mgr_init() (Alex)

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>

96595204

drm/amdgpu: Emit cleaner shader at end of IB submission · ee7a846e

Alex Deucher authored Mar 12, 2024

This commit introduces the emission of a cleaner shader at the end of
the IB submission process. This is achieved by adding a new function
pointer, `emit_cleaner_shader`, to the `amdgpu_ring_funcs` structure. If
the `emit_cleaner_shader` function is set in the ring functions, it is
called during the VM flush process.

The cleaner shader is only emitted if the `enable_cleaner_shader` flag
is set in the `amdgpu_device` structure. This allows the cleaner shader
emission to be controlled on a per-device basis.

By emitting a cleaner shader at the end of the IB submission, we can
ensure that the VM state is properly cleaned up after each submission.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>

ee7a846e

drm/amdgpu: Add infrastructure for Cleaner Shader feature · aec773a1

Srinivasan Shanmugam authored Jun 06, 2024

The cleaner shader is used by the CP firmware to clean LDS and GPRs
between processes on the CUs.

This adds an internal API for GFX IP code to allocate and initialize the
cleaner shader.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>

aec773a1

drm/amdgpu: handle enforce isolation on non-0 gfxhub · f49280ff

Alex Deucher authored Aug 14, 2024

Some chips have more than one gfxhub so check if we
are a gfxhub rather than just gfxhub 0.
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f49280ff

drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1 · 2dc3851e

Alex Deucher authored Aug 14, 2024

The workaround seems to cause stability issues on other
SDMA 5.2.x IPs.

Fixes: a03ebf11 ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556Acked-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2dc3851e

drm/amdgpu: add vcn ip dump support for vcn_v2_6 · 1a2103d6

Sunil Khatri authored Aug 05, 2024

Add support for logging the registers in devcoredump
buffer for vcn_v2_6.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1a2103d6

drm/amdgpu: add print support for vcn_v2_5 ip dump · bc62abe1

Sunil Khatri authored Aug 05, 2024

Add support for logging the registers in devcoredump
buffer for vcn_v2_5.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

bc62abe1

drm/amdgpu: add vcn_v2_5 ip dump support · 0eea81ee

Sunil Khatri authored Aug 05, 2024

Add support of vcn ip dump in the devcoredump
for vcn_v2_5.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0eea81ee

drm/amdgpu: add print support for vcn_v2_0 ip dump · b910cacb

Sunil Khatri authored Aug 05, 2024

Add support for logging the registers in devcoredump
buffer for vcn_v2_0.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b910cacb

drm/amdgpu: add vcn_v2_0 ip dump support · 2239aaa2

Sunil Khatri authored Aug 05, 2024

Add support of vcn ip dump in the devcoredump
for vcn_v2_0.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2239aaa2

drm/amdgpu: add print support for vcn_v1_0 ip dump · ef9f3b5f

Sunil Khatri authored Aug 05, 2024

Add support for logging the registers in devcoredump
buffer for vcn_v1_0.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ef9f3b5f

drm/amdgpu: add vcn_v1_0 ip dump support · 837cc7f1

Sunil Khatri authored Aug 05, 2024

Add support of vcn ip dump in the devcoredump
for vcn_v1_0.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

837cc7f1

drm/amdgpu: add print support for vcn_v4_0_5 ip dump · 439c3b12

Sunil Khatri authored Aug 05, 2024

Add support for logging the registers in devcoredump
buffer for vcn_v4_0_5.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

439c3b12

drm/amdgpu: add print support for vcn_v4_0 ip dump · 3a50a51d

Sunil Khatri authored Aug 05, 2024

Add support for logging the registers in devcoredump
buffer for vcn_v4_0.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3a50a51d