Commits · fe9e5f56feb287b3f14b0a5892061a1da2b89b5b · Kirill Smelkov / linux

09 Jun, 2023 40 commits

drm/amd/pm: Update PMFW headers for version 85.54 · fe9e5f56

Lijo Lazar authored Mar 10, 2023

It adds message support for FW notification on driver unload.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Expose mem temperature for GC version 9.4.3 · bfb4fd20

Asad Kamal authored Mar 08, 2023

Add mem temperature as part of hw mon attributes for GC version 9.4.3
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Update hw mon attributes for GC version 9.4.3 · 8572fa2a

Asad Kamal authored Mar 03, 2023

Update hw mon attributes for GC Version 9.4.3 to valid ones
on APU and Non APU systems

v2: Group checks along existing one
Added power limit & mclock for gc version 9.4.3
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Initialize power limit for SMU v13.0.6 · 909ae715

Lijo Lazar authored Feb 27, 2023

PMFW will initialize the power limit values even if PPT throttler
feature is disabled. Fetch the limit value from FW.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Keep interface version in PMFW header · 9661bf68

Lijo Lazar authored Feb 21, 2023

Use the interface version directly from PMFW interface header file rather
than keeping another definition in common smu13 file.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add ih for SMU v13.0.6 thermal throttling · 676915e4

Asad kamal authored Feb 15, 2023

Add interrupt handler for thermal throttler events from
PMFW on SMUv13.0.6
Signed-off-by: Asad kamal <asad.kamal@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Update pmfw header files for SMU v13.0.6 · 6d5f5eaf

Asad kamal authored Feb 13, 2023

Update driver interface for SMU v13.0.6 to be
compatible with PMFW v85.48 version
Signed-off-by: Asad kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Update gfx clock frequency for SMU v13.0.6 · a1b0dafa

Asad kamal authored Feb 08, 2023

Update gfx clock frequency from metric table for SMU v13.0.6
Signed-off-by: Asad kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Update pmfw header files for SMU v13.0.6 · 8d1c1bc1

Asad kamal authored Feb 08, 2023

Update driver metrics table for SMU v13.0.6 to be
compatible with PMFW v85.47 version
Signed-off-by: Asad kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix sdma instance · 1ad29cb3

Stanley.Yang authored Mar 22, 2023

Convert the logical instance to the device instance
when querying ras info.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
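
As a rough illustration of the logical-to-device instance conversion described above, here is a minimal sketch. It assumes the device instances that are present are described by a bitmask and that logical instance N is the N-th set bit. The function name and the mask representation are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the driver's API): return the device instance
 * number for a logical instance, given a bitmask of the device instances
 * that are present. Logical instance N maps to the N-th set bit.
 * Returns -1 if no such instance exists.
 */
static int logical_to_dev_inst(uint32_t inst_mask, int logical_inst)
{
    int dev_inst;

    for (dev_inst = 0; dev_inst < 32; dev_inst++) {
        if (!(inst_mask & (UINT32_C(1) << dev_inst)))
            continue;               /* this device instance is absent */
        if (logical_inst-- == 0)
            return dev_inst;        /* found the requested logical slot */
    }
    return -1;
}
```

With a mask of 0xA (device instances 1 and 3 present), logical instance 0 maps to device instance 1 and logical instance 1 maps to device instance 3.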

drm/amdgpu: change the print level to warn for ip block disabled · 0c451baf

Le Ma authored Mar 16, 2023

Avoid misleading users, as this is not a real error.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Increase Max GPU instance to 64 · 9e4216cf

Mukul Joshi authored May 05, 2023

Increase Max GPU instances to 64 to handle multi-socket
systems with the GFX 9.4.3 asic.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: increase AMDGPU_MAX_RINGS · bb0ed57b

Le Ma authored Mar 16, 2023

On newer GPUs, the number of kernel rings is increased.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Create VRAM BOs on GTT for GFXIP9.4.3 · 970c1646

Rajneesh Bhardwaj authored Jan 27, 2023

On a GFXIP9.4.3 APP APU, where there is no dedicated VRAM domain, handle
VRAM BO allocation requests on the CPU domain and validate them on GTT.

Support for handling multi-socket and multi-numa partitions within a
socket will be added by future patches; this change enables the 1P NPS1
asic bringup configuration.
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
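
The placement rule in this message can be sketched as follows. This is only an illustration under stated assumptions: the domain flag values, the struct, and the function name are hypothetical, and the real driver logic involves far more state than a single boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical domain flags for this sketch (not the driver's values). */
#define DOMAIN_CPU  0x1u
#define DOMAIN_GTT  0x2u
#define DOMAIN_VRAM 0x4u

struct placement {
    uint32_t alloc_domain;    /* where the BO is created */
    uint32_t validate_domain; /* where the BO is validated for GPU access */
};

/*
 * If VRAM is requested but the ASIC has no dedicated VRAM domain,
 * create the BO on the CPU domain and validate it on GTT.
 * Otherwise honor the request unchanged.
 */
static struct placement pick_placement(uint32_t requested,
                                       bool has_dedicated_vram)
{
    struct placement p = { requested, requested };

    if ((requested & DOMAIN_VRAM) && !has_dedicated_vram) {
        p.alloc_domain = DOMAIN_CPU;
        p.validate_domain = DOMAIN_GTT;
    }
    return p;
}
```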

drm/amdgpu: Implement new dummy vram manager · f431393d

Rajneesh Bhardwaj authored Jan 27, 2023

This adds a dummy vram manager to support ASICs that do not have a
dedicated or carved-out vram domain.
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Handle VRAM dependencies on GFXIP9.4.3 · 228ce176

Rajneesh Bhardwaj authored Jan 27, 2023

[For 1P NPS1 mode driver bringup]

Changes required to initialize the amdgpu driver with frontdoor firmware
loading and discovery=2 with the native mode SBIOS that enables CPU GPU
unified interleaved memory.

sudo modprobe amdgpu discovery=2

Once PSP TMR region is reported via the ACPI interface, the dependency
on the ip_discovery.bin will be removed.

Choice of where to allocate driver table is given to each IP version. In
general, both GTT and VRAM domains will be considered. If one of the
tables has a strict restriction for VRAM domain, then only VRAM domain
is considered.
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
(lijo: Modified the handling for SMU Tables)
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
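
The driver-table domain choice described in the last paragraph of this message could look roughly like the sketch below. The flag values, the `table_req` struct, and the function name are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical domain flags for this sketch (not the driver's values). */
#define DOMAIN_GTT  0x2u
#define DOMAIN_VRAM 0x4u

/* Hypothetical per-table requirement. */
struct table_req {
    bool vram_only; /* true if the table strictly requires VRAM */
};

/*
 * In general both GTT and VRAM are considered. If any one table has a
 * strict VRAM restriction, only VRAM is considered.
 */
static uint32_t pick_table_domain(const struct table_req *tables,
                                  size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (tables[i].vram_only)
            return DOMAIN_VRAM;
    }
    return DOMAIN_GTT | DOMAIN_VRAM;
}
```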

drm/amdgpu: Enable CG for IH v4.4.2 · 9faf929f

Asad kamal authored Feb 07, 2023

Enable clock gating on IH v4.4.2 versions.
Signed-off-by: Asad kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Enable persistent edc harvesting in APP APU · 8107e499

Hawking Zhang authored Jan 29, 2023

Persistent edc harvesting is supported in APP APU
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Initialize mmhub v1_8 ras function · 73c2b3fd

Hawking Zhang authored Jan 22, 2023

Initialize mmhub v1_8 ras function.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add reset_ras_error_status for mmhub v1_8 · ccfdbd4b

Hawking Zhang authored Jan 22, 2023

Add reset_ras_error_status callback for mmhub
v1_8. It will be used to reset mmhub error status.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add query_ras_error_status for mmhub v1_8 · 00c14522

Hawking Zhang authored Jan 22, 2023

Add query_ras_error_status callback for mmhub
v1_8. It will be used to log mmhub error status.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add reset_ras_error_count for mmhub v1_8 · a0cdb3d0

Hawking Zhang authored Jan 22, 2023

Add reset_ras_error_count callback for mmhub
v1_8. It will be used to reset mmhub ras error
count.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add query_ras_error_count for mmhub v1_8 · bc069d82

Hawking Zhang authored Feb 02, 2023

Add query_ras_error_count callback for mmhub v1_8.
It will be used to query and log mmhub error count
and memory block.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add mmhub v1_8_0 ras err status registers · 90cbee20

Hawking Zhang authored Dec 28, 2022

Add new ras error status registers introduced in
mmhub v1_8_0 to log mmea and mm_cane ras errors, including
MMEAx_UE|CE_ERR_STATUS_LO|HI
MM_CANE_UE|CE_ERR_STATUS_LO|HI
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Initialize sdma v4_4_2 ras function · 1e69fde7

Hawking Zhang authored Jan 22, 2023

Initialize sdma v4_4_2 ras function and interrupt
handler.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add reset_ras_error_count for sdma v4_4_2 · a64b1552

Hawking Zhang authored Jan 22, 2023

Add reset_ras_error_count callback for sdma
v4_4_2. It will be used to reset sdma ras error
count.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add query_ras_error_count for sdma v4_4_2 · dc37a919

Hawking Zhang authored Feb 05, 2023

Add query_ras_error_count callback for sdma
v4_4_2. It will be used to query and log sdma
uncorrectable error count and memory block.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add sdma v4_4_2 ras registers · d90d90a1

Hawking Zhang authored Dec 23, 2022

SDMA_UE_ERR_STATUS_HI|LO are introduced in v4_4_2
to replace SDMA_EDC_COUNTER/COUNTER2 registers to
log SDMA RAS errors
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add common helper to reset ras error · e53a3250

Hawking Zhang authored Feb 03, 2023

Add common helper to reset ras error status. It
applies to IP blocks that follow the new ras error
logging register design, and need to write 0 to
reset the error status. For IP blocks that don't
support the new design, please still implement an
IP-specific helper.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
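
A minimal sketch of the "write 0 to reset" idea follows. It uses a small in-memory array in place of real MMIO so it can run anywhere. The struct layout, the array, and the function name are hypothetical and do not reflect the helper's actual signature.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 8

/* Simulated register file standing in for MMIO in this sketch. */
static uint32_t fake_regs[NUM_REGS];

/* Hypothetical table entry naming one LO/HI error status register pair. */
struct err_status_reg {
    unsigned int lo; /* index of the *_ERR_STATUS_LO register */
    unsigned int hi; /* index of the *_ERR_STATUS_HI register */
};

/*
 * Reset every listed error status register pair by writing 0 to it.
 * Entries that point outside the simulated register file are skipped.
 */
static void reset_err_status(const struct err_status_reg *regs, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (regs[i].lo >= NUM_REGS || regs[i].hi >= NUM_REGS)
            continue;
        fake_regs[regs[i].lo] = 0;
        fake_regs[regs[i].hi] = 0;
    }
}
```

Keeping the register list in a table is one way an IP block could share a single reset loop instead of open-coding the writes.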

drm/amdgpu: Add common helper to query ras error (v2) · 322a7e00

Hawking Zhang authored Feb 02, 2023

Add common helper to query ras error status and
log error information, including memory block id
and error count. The helpers are applicable to IP
blocks that follow the new ras error logging design.
For IP blocks that don't support the new design,
please still implement an IP-specific helper to query
ras error.

v2: optimize struct amdgpu_ras_err_status_reg_entry
and the implementation in helper (Lijo/Tao)
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
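
To illustrate what querying an error status register pair might involve, here is a sketch that decodes a memory block id and an error count from LO/HI values. The bit layout below is invented purely for this example and should not be read as the hardware's register definition. The names are hypothetical as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented bit layout, for illustration only. */
#define STATUS_LO_VALID       (UINT32_C(1) << 0)      /* entry holds an error */
#define STATUS_LO_MEM_ID(v)   (((v) >> 1) & 0xffu)    /* memory block id */
#define STATUS_HI_ERR_CNT(v)  ((v) & 0xffffu)         /* error count */

struct err_info {
    uint32_t mem_id;
    uint32_t err_cnt;
};

/*
 * Decode one LO/HI status pair. Returns false, leaving *out untouched,
 * when the valid flag is clear, so callers can skip empty entries.
 */
static bool decode_err_status(uint32_t lo, uint32_t hi, struct err_info *out)
{
    if (!(lo & STATUS_LO_VALID))
        return false;

    out->mem_id = STATUS_LO_MEM_ID(lo);
    out->err_cnt = STATUS_HI_ERR_CNT(hi);
    return true;
}
```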

drm/amdgpu: Enable MGCG on SDMAv4.4.2 · cbf9e46a

Lijo Lazar authored Feb 03, 2023

Enable clock gating on SDMAv4.4.2 versions. Leave memory light sleep to
default.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: enable context empty interrupt on sdma v4.4.2 · 35ff4301

Le Ma authored Feb 03, 2023

With SDMA_CNTL.CTXEMPTY_INT_ENABLE set, the F32 clock can be gated when
SDMA finishes all jobs and goes idle.

No specific interrupt handling is required in the driver.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add vcn_4_0_3 codec query · 7b08b2e1

Sonny Jiang authored Jan 31, 2023

Add support for vcn_4_0_3 video codec query
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: bind cpu and hiveless gpu to a hive if xgmi connected · 1698e200

Jonathan Kim authored Feb 02, 2023

If a CPU and GPU are xGMI connected but the GPU is hiveless with
respect to other GPUs, create a new CPU-GPU hive using the GPU's PCI
device location ID as the new hive ID to maintain fine grain memory
access usage.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
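
The hive selection rule in this message can be sketched as below. The sketch assumes that a hive ID of 0 means "hiveless", which is an assumption of this example rather than something stated in the message. The function and parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the hive ID to report for a GPU.
 * Assumption for this sketch: hive ID 0 means the GPU is hiveless.
 *
 * - A GPU that already belongs to a hive keeps its hive ID.
 * - A hiveless GPU that is xGMI connected to the CPU gets a new
 *   CPU-GPU hive whose ID is the GPU's PCI device location ID.
 * - Otherwise the GPU stays hiveless.
 */
static uint64_t effective_hive_id(uint64_t gpu_hive_id,
                                  bool cpu_xgmi_connected,
                                  uint32_t pci_location_id)
{
    if (gpu_hive_id != 0)
        return gpu_hive_id;
    if (cpu_xgmi_connected)
        return pci_location_id;
    return 0;
}
```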

drm/amdkfd: Cleanup KFD nodes creation · 8c45a834

Philip Yang authored Jan 24, 2023

Allocating a kfd node outside the kfd->num_nodes loop is not needed and
causes a memory leak, because kfd->num_nodes is always at least 1.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
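
The pattern behind this cleanup can be sketched in plain C: allocate only inside the loop, so no earlier allocation is overwritten and leaked. The `node` struct and both functions are hypothetical stand-ins and are not the KFD code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a KFD node; not the driver's structure. */
struct node {
    int id;
};

/*
 * Allocate exactly num_nodes nodes, one per loop iteration. Nothing is
 * allocated before the loop, so no pointer is overwritten and leaked.
 * Returns 0 on success, or -1 on allocation failure after freeing
 * everything allocated so far.
 */
static int create_nodes(struct node **nodes, int num_nodes)
{
    int i;

    for (i = 0; i < num_nodes; i++) {
        nodes[i] = calloc(1, sizeof(*nodes[i]));
        if (!nodes[i])
            goto err_free;
        nodes[i]->id = i;
    }
    return 0;

err_free:
    while (i-- > 0) {
        free(nodes[i]);
        nodes[i] = NULL;
    }
    return -1;
}

/* Free all nodes and clear the pointers. */
static void destroy_nodes(struct node **nodes, int num_nodes)
{
    int i;

    for (i = 0; i < num_nodes; i++) {
        free(nodes[i]);
        nodes[i] = NULL;
    }
}
```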

drm/ttm: add NUMA node id to the pool · 4482d3c9

Rajneesh Bhardwaj authored Oct 12, 2022

This allows backing ttm_tt structure with pages from different NUMA
pools.
Tested-by: Graham Sider <graham.sider@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix mqd init on GFX v9.4.3 · c1d3f627

Lijo Lazar authored Jan 20, 2023

For MQD init, an XCC's queue is selected with GRBM select. However, for
initialization of MQD, values read from logical XCC0 registers are used.
This results in garbage values being read from XCC0 whose queue is not
selected. Change to read from the right XCC for MQD initialization.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd: fix compiler error to support older compilers · 5ca1ceeb

Harish Kasiviswanathan authored Jan 21, 2023

‘for’ loop initial declarations are only allowed in C99 or C11 mode
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
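
The compiler message quoted above is triggered by declaring the counter inside the `for` statement, as in `for (int i = 0; ...)`, when building in a pre-C99 mode. A minimal sketch of the portable form, with a hypothetical function name, declares the counter at the top of the block instead:

```c
/*
 * Sum an array using a loop counter declared at the top of the block.
 * This form is accepted in C89 mode as well as in C99 and later,
 * unlike "for (int i = 0; ...)".
 */
static int sum_array(const int *values, int count)
{
    int i;
    int total = 0;

    for (i = 0; i < count; i++)
        total += values[i];

    return total;
}
```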

drm/amdgpu: Enable CGCG/LS for GC 9.4.3 · b7c7011e

Lijo Lazar authored Jan 19, 2023

Enable coarse grain clockgating/light sleep for GC v9.4.3. Remove
programming that is not meant for GC 9.4.3.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Use unique doorbell range per xcc · 233bb373

Lijo Lazar authored Jan 19, 2023

Program different ranges in each XCC with MEC_DOORBELL_RANGE_LOWER/HIGHER.
Keeping the same range causes CPF in other XCCs also to be busy when an IB
packet is submitted to KCQ. Only the XCC which processes the packet
comes back to idle afterwards and this causes other CPs not to be idle.
This in turn affects clockgating behavior as RLC doesn't get idle
interrupt.

LOWER/HIGHER covers only KIQ/KCQs which are per XCC queues. Assigning
different ranges doesn't seem to have any side effect as user queue ranges
are outside of this range. User queue tests - PM4 through KFD and AQL
through rocr - have the same results after this change.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
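
One way to picture non-overlapping per-XCC doorbell ranges is the sketch below. It assumes each XCC gets a fixed-size, contiguous window starting from a common base. The base, the window size, and all names are hypothetical, and the message does not say how the actual ranges are computed.

```c
#include <stdint.h>

/* Inclusive doorbell range programmed into one XCC (hypothetical). */
struct doorbell_range {
    uint32_t lower;
    uint32_t upper;
};

/*
 * Give each XCC its own window of per_xcc doorbells (per_xcc >= 1),
 * laid out back to back from first. Windows of different XCCs never
 * overlap.
 */
static struct doorbell_range xcc_doorbell_range(uint32_t first,
                                                uint32_t per_xcc,
                                                uint32_t xcc_id)
{
    struct doorbell_range r;

    r.lower = first + xcc_id * per_xcc;
    r.upper = r.lower + per_xcc - 1;
    return r;
}
```

With a base of 0x10 and 8 doorbells per XCC, XCC 0 covers 0x10 to 0x17 and XCC 1 covers 0x18 to 0x1f.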
