Commits · 1698e200e88db96aef7d16aa3d63df68a209ffbd · Kirill Smelkov / linux

09 Jun, 2023 40 commits

drm/amdkfd: bind cpu and hiveless gpu to a hive if xgmi connected · 1698e200

Jonathan Kim authored Feb 02, 2023

If a CPU and GPU are xGMI connected but the GPU is hiveless with
respect to other GPUs, create a new CPU-GPU hive using the GPU's PCI
device location ID as the new hive ID to maintain fine grain memory
access usage.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Cleanup KFD nodes creation · 8c45a834

Philip Yang authored Jan 24, 2023

The kfd node allocation outside the kfd->num_nodes loop is not needed and
causes a memory leak, because kfd->num_nodes is always at least 1.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/ttm: add NUMA node id to the pool · 4482d3c9

Rajneesh Bhardwaj authored Oct 12, 2022

This allows backing ttm_tt structure with pages from different NUMA
pools.
Tested-by: Graham Sider <graham.sider@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix mqd init on GFX v9.4.3 · c1d3f627

Lijo Lazar authored Jan 20, 2023

For MQD init, an XCC's queue is selected with GRBM select. However, for
initialization of MQD, values read from logical XCC0 registers are used.
This results in garbage values being read from XCC0 whose queue is not
selected. Change to read from the right XCC for MQD initialization.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd: fix compiler error to support older compilers · 5ca1ceeb

Harish Kasiviswanathan authored Jan 21, 2023

‘for’ loop initial declarations are only allowed in C99 or C11 mode
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
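
The error quoted in this commit message is what GCC reports when a loop counter is declared inside the `for` statement and the compiler is not in C99 (or later) mode. A minimal sketch of the usual fix follows: declare the counter before the loop. The function `sum_first_n` is a hypothetical name used only for illustration and is not code from this commit.

```c
/* Hypothetical helper: sums 0..n-1, written so it builds in C89/gnu89 mode. */
int sum_first_n(int n)
{
    int i;          /* declared before the loop, not as `for (int i = 0; ...)` */
    int total = 0;

    for (i = 0; i < n; i++)
        total += i;

    return total;
}
```

Moving the declaration to the top of the function keeps the code acceptable to both older and newer compilers without touching build flags.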

drm/amdgpu: Enable CGCG/LS for GC 9.4.3 · b7c7011e

Lijo Lazar authored Jan 19, 2023

Enable coarse grain clockgating/light sleep for GC v9.4.3. Remove
programming that is not meant for GC 9.4.3.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Use unique doorbell range per xcc · 233bb373

Lijo Lazar authored Jan 19, 2023

Program different ranges in each XCC with MEC_DOORBELL_RANGE_LOWER/HIGHER.
Keeping the same range causes CPF in other XCCs also to be busy when an IB
packet is submitted to KCQ. Only the XCC which processes the packet
comes back to idle afterwards and this causes other CPs not to be idle.
This in turn affects clockgating behavior as RLC doesn't get idle
interrupt.

LOWER/HIGHER covers only KIQ/KCQs which are per XCC queues. Assigning
different ranges doesn't seem to have any side effect as user queue ranges
are outside of this range. User queue tests - PM4 through KFD and AQL
through rocr - have the same results after this change.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
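
The idea of giving each XCC its own doorbell window can be sketched as below. This is only an illustration under stated assumptions: the window size `DOORBELLS_PER_XCC`, the `doorbell_range` struct, and `xcc_doorbell_range()` are hypothetical, and the real driver programs the MEC_DOORBELL_RANGE_LOWER/HIGHER registers rather than returning a struct.

```c
#include <stdint.h>

/* Hypothetical: doorbell indices reserved for the KIQ/KCQs of one XCC. */
#define DOORBELLS_PER_XCC 32u

struct doorbell_range {
    uint32_t lower;  /* first doorbell index owned by this XCC */
    uint32_t upper;  /* last doorbell index owned by this XCC (inclusive) */
};

/* Compute a window for xcc_id that does not overlap any other XCC's window. */
struct doorbell_range xcc_doorbell_range(uint32_t base, uint32_t xcc_id)
{
    struct doorbell_range r;

    r.lower = base + xcc_id * DOORBELLS_PER_XCC;
    r.upper = r.lower + DOORBELLS_PER_XCC - 1;
    return r;
}
```

Because the windows are disjoint, a doorbell write aimed at one XCC's KCQ falls outside every other XCC's range, which matches the behavior the commit message describes.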

drm/amdgpu: Keep SDMAv4.4.2 active during reset · 7389c751

Lijo Lazar authored Jan 17, 2023

During an ASIC-wide reset, SDMA must not be clockgated and must be ready to
accept freeze requests from PMFW. For that, don't stop the SDMA engine
during reset and keep the clocks active.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Report XGMI IOLINKs for GFXIP9.4.3 · b2ef2fdf

Rajneesh Bhardwaj authored Jan 05, 2023

GFXIP 9.4.3 could be in APU or carveout mode, but we cannot use the
xgmi.connected_to_cpu flag to identify the iolink type. Use the appropriate
APU or carveout mode based condition to report the xgmi connection in the
kfd topology.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add num_xcps return · 13a94f3f

James Zhu authored Jan 10, 2023

Add num_xcps return.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: increase AMDGPU_MAX_HWIP_RINGS · 1bd99ca2

James Zhu authored Jan 10, 2023

[WA] Increase AMDGPU_MAX_HWIP_RINGS to 64 to support more compute
ring resources. This later needs a redesign around queue/priority/scheduler
factors to reduce AMDGPU_MAX_HWIP_RINGS.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: vcn_v4_0_3 load vcn fw once for all AIDs · f471de25

James Zhu authored Dec 19, 2022

Signed-off-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Populate VCN/JPEG harvest information · 52c293ab

Lijo Lazar authored Jan 10, 2023

Certain instances of VCN/JPEG IPs may not be usable. Fetch the information
from harvest table.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Correct dGPU MTYPE settings for gfx943 · d839a158

Graham Sider authored Jan 05, 2023

Revert temporary dGPU VRAM MTYPE setting and align with expected
coherency protocol.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Remove SMU powergate message call for SDMA · 30b52995

Asad kamal authored Jan 03, 2023

SDMA v4.4.2 doesn't need explicit power gating control through PMFW
Signed-off-by: Asad kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: enable vcn/jpeg on vcn_v4_0_3 · ed1f42f0

James Zhu authored Dec 17, 2022

Enable vcn/jpeg on vcn_v4_0_3.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: enable indirect_sram mode on vcn_v4_0_3 · e40b4b9a

James Zhu authored Dec 12, 2022

Enable indirect_sram mode on vcn_v4_0_3.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add unified queue support on vcn_v4_0_3 · da044aae

James Zhu authored Dec 17, 2022

Add unified queue support on vcn_v4_0_3.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add fwlog support on vcn_v4_0_3 · 2d7f1d51

James Zhu authored Dec 12, 2022

Add fwlog support on vcn_v4_0_3.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: increase MAX setting to hold more jpeg instances · 45ed97ad

James Zhu authored Dec 12, 2022

vcn_v4_0_3 increases the number of jpeg instances, so the MAX resource
settings need to increase accordingly.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Use discovery to get XCC/SDMA mask · 73fa2553

Lijo Lazar authored Nov 28, 2022

Get information about active XCC and SDMAs from discovery table.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Make VRAM discovery read optional · 44cbc453

Lijo Lazar authored Dec 01, 2022

When overridden with module param, directly read discovery info
from discovery binary instead of reading from VRAM.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Allocate GART table in RAM for AMD APU · c9a502e9

Felix Kuehling authored Nov 29, 2022

Some AMD APUs may not have dedicated VRAM. On such platforms the GART
table should be allocated in system memory. When the real VRAM size is
zero, place the GART table in system memory and create an SG BO to make
it GPU accessible.

v2: fix includes
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
(rajneesh: removed set_memory_wc workaround)
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add FGCG logic for GFX v9.4.3 · 34fd9d68

Lijo Lazar authored Dec 20, 2022

Add fine grain clock gating logic for GFX v9.4.3. The feature
will be controlled using CG flags. Also, make a change so that RLC safe
mode entry/exit is done only once during CG update sequence.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Make UTCL2 snoop CPU caches · 7a7aaab0

Rajneesh Bhardwaj authored Dec 20, 2022

On AMD APP APUs, to make UTCL2 snoop CPU caches, it's not sufficient to
rely on the xgmi connected flag, so add logic that uses is_app_apu to
program the PDE_REQUEST_PHYSICAL bit correctly for both gfxhub and
mmhub.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

amd/amdgpu: Set MTYPE_UC for access over PCIe · 85b45b60

Amber Lin authored Nov 28, 2022

For GFX v9_4_3, set MTYPE_UC for memory access over PCIe.

v4 - add missing indentation pointed out by Felix and add his
reviewed-by tag.
v3 - add missing logic for the svm path.
v2 - add amdgpu_xgmi_same_hive to separate access over xgmi from pcie
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix GFX v9.4.3 EOP buffer allocation · d524180b

Lijo Lazar authored Dec 19, 2022

Each compute cluster gets 8 compute queues in GFX v9.4.3. Fix the EOP
buffer allocation so that compute queue on every XCC gets a unique
address.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Tested-and-Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
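
One way to give every compute queue on every XCC a distinct EOP address is to index the buffer by both XCC and queue. The sketch below is illustrative only: the 8 queues per XCC come from the commit message, while `EOP_SLOT_SIZE` and the function `eop_offset()` are assumptions made up for this example.

```c
#include <stdint.h>

#define QUEUES_PER_XCC 8u      /* from the commit message */
#define EOP_SLOT_SIZE  4096u   /* assumed slot size, for illustration only */

/* Byte offset of the EOP slot for `queue` on `xcc_id` within one shared buffer. */
uint64_t eop_offset(uint32_t xcc_id, uint32_t queue)
{
    return ((uint64_t)xcc_id * QUEUES_PER_XCC + queue) * EOP_SLOT_SIZE;
}
```

Indexing by queue alone would make queue 0 of XCC 0 and queue 0 of XCC 1 share a slot; folding the XCC id into the index avoids that overlap.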

drm/amdgpu: Fix GFX 9.4.3 dma address capability · 12c4d7ed

Lijo Lazar authored Dec 15, 2022

ASICs with GFX 9.4.3 support 48-bit addressing.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
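
To make the 48-bit limit concrete, the sketch below builds an address mask in the same shape as the kernel's `DMA_BIT_MASK(n)` macro and checks whether an address fits. `ADDR_BIT_MASK` and `addr_fits()` are hypothetical userspace stand-ins; how the commit itself sets the mask is not shown on this page.

```c
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(n): n low bits set, all bits if n is 64. */
#define ADDR_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Returns 1 if addr is representable in `bits` address bits, 0 otherwise. */
int addr_fits(uint64_t addr, unsigned int bits)
{
    return (addr & ~ADDR_BIT_MASK(bits)) == 0;
}
```

With 48 bits the highest addressable byte is 0xFFFFFFFFFFFF, so any address with bit 48 or above set is rejected.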

drm/amdgpu: Fix semaphore release · a0a0c69c

Lijo Lazar authored Dec 14, 2022

Use the right register for semaphore release during invalidation.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Setup current_logical_xcc_id in MQD · c2d43918

Mukul Joshi authored Dec 09, 2022

Setup rolling current_logical_xcc_id in MQD for GFX9.4.3
to ensure each queue starts at a different place and prevent
hotspotting issues. Also, remove updating current_logical_xcc_id
during queue update.
Suggested-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
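
The "rolling" starting point described here behaves like a round-robin counter: each new queue takes the next XCC as its start, wrapping at the number of XCCs. A minimal sketch, assuming a hypothetical `xcc_roller` state struct and `next_start_xcc()` helper that are not taken from the driver:

```c
#include <stdint.h>

/* Hypothetical per-device state used to hand out starting XCC ids. */
struct xcc_roller {
    uint32_t next;     /* rolling counter, incremented per created queue */
    uint32_t num_xcc;  /* number of XCCs available; must be greater than 0 */
};

/* Return the starting logical XCC id for a new queue and advance the counter. */
uint32_t next_start_xcc(struct xcc_roller *r)
{
    uint32_t id = r->next % r->num_xcc;

    r->next++;
    return id;
}
```

Assigning the value once at queue creation, and not touching it on later queue updates, keeps consecutive queues spread across XCCs instead of piling onto the same one.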

drm/amdgpu: Remove unnecessary return value check · a820d3ca

Lijo Lazar authored Dec 01, 2022

There is no need to check return value, as the function internally
used - amdgpu_discovery_read_binary_from_vram() - returns void.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: correct the vmhub index when page fault occurs · 98b2e9ca

Le Ma authored Dec 09, 2022

AMDGPU_GFXHUB is bound to each xcc in logical order.
Thus convert the node_id to a logical xcc_id to index the
correct AMDGPU_GFXHUB. "node_id / 4" gives the correct
AMDGPU_MMHUB0 index.
Signed-off-by: Le Ma <le.ma@amd.com>
Tested-by: Asad kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
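
The two index rules in this message can be sketched as follows. The "node_id / 4" rule is taken from the commit text; the node-to-logical lookup table passed to `gfxhub_index()` is a hypothetical stand-in, since the actual mapping used by the driver is not shown here.

```c
#include <stdint.h>

#define NODES_PER_MMHUB 4u  /* from the "node_id / 4" rule in the commit message */

/* Index of the MMHUB instance that serves node_id. */
uint32_t mmhub_index(uint32_t node_id)
{
    return node_id / NODES_PER_MMHUB;
}

/*
 * Logical XCC id for node_id, used to pick the GFXHUB instance.
 * node_to_logical is a hypothetical caller-supplied table; -1 means "no mapping".
 */
int gfxhub_index(const int *node_to_logical, uint32_t table_len, uint32_t node_id)
{
    if (node_id >= table_len)
        return -1;
    return node_to_logical[node_id];
}
```

The point of the conversion is that a raw node_id may not match the logical order in which the hubs were registered, so indexing by it directly can select the wrong hub.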

drm/amdkfd: Update packet manager for GFX9.4.3 · 1794e9d7

Mukul Joshi authored Dec 08, 2022

In GFX 9.4.3, there can be more than 8 SDMA engines.
As a result, extended_engine_sel and engine_sel fields
in MAP_QUEUES packet need to be updated to allow correct
mapping of SDMA queues to these SDMA engines.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: set MTYPE in PTE for GFXIP 9.4.3 · 753b999a

Rajneesh Bhardwaj authored Dec 07, 2022

Apply the GFXIP 9.4.3 specific snoop and mtype settings for various
scenarios such as APU, APU in Carveout mode and dGPU mode.

Note: This is expected to change due to:
1 - NPS > 1 support in future
2 - Hardware bugs found during initial asic bringup.

Cc: Graham Sider <graham.sider@amd.com>
Cc: Hawking Zhang <hawking.zhang@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Use mask for active clusters · 7a1efad0

Lijo Lazar authored Nov 29, 2022

Use a mask of available active clusters instead of using only the number
of active clusters.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
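
A mask carries more information than a count because it also records which clusters are active, including gaps. The sketch below walks a mask and collects the active ids; `active_ids()` is a hypothetical helper written for this illustration, not a driver function.

```c
#include <stdint.h>

/* Write the ids of set bits in mask into out[]; return how many were written. */
unsigned int active_ids(uint32_t mask, unsigned int *out, unsigned int out_len)
{
    unsigned int n = 0;
    unsigned int id;

    for (id = 0; id < 32 && n < out_len; id++) {
        if (mask & (1u << id))
            out[n++] = id;
    }
    return n;
}
```

For example, a mask of 0xB describes three active clusters with ids 0, 1 and 3; a plain count of 3 could not express that cluster 2 is missing.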

drm/amdgpu: Derive active clusters from SDMA · bbca579f

Lijo Lazar authored Nov 28, 2022

SDMA instances per active cluster and SDMA instance mask are used
to find the number of active clusters.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
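
The derivation described here amounts to counting the SDMA instances present in the mask and dividing by the number of instances each cluster owns. A rough sketch under that reading; `popcount32()` and `active_clusters()` are hypothetical names, and the sketch assumes every active cluster contributes its full set of SDMA instances.

```c
#include <stdint.h>

/* Count set bits without relying on compiler builtins. */
static unsigned int popcount32(uint32_t v)
{
    unsigned int n = 0;

    while (v) {
        v &= v - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Number of active clusters implied by the SDMA instance mask. */
unsigned int active_clusters(uint32_t sdma_mask, unsigned int sdma_per_cluster)
{
    if (sdma_per_cluster == 0)
        return 0;  /* avoid dividing by zero on a bad configuration */
    return popcount32(sdma_mask) / sdma_per_cluster;
}
```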

drm/amdgpu: Move generic logic to soc config · dc6df209

Lijo Lazar authored Nov 28, 2022

Move soc specific configuration details to aqua vanjaram specific file.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix the KCQ hang when binding back · fee500fa

Shiwu Zhang authored Nov 18, 2022

Just like the KIQ, the KCQ needs to clear its doorbell related regs as well
to avoid hangs when the driver is loaded again after unloading.
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Skip TMR allocation if not required · 5b03127d

Lijo Lazar authored Nov 24, 2022

On ASICs with PSPv13.0.6, TMR is reserved at boot time. There is no need
to allocate TMR region by driver. However, it's still required to send
SETUP_TMR command to PSP.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add XCP IP callback funcs for each IP · 845c9b31

Lijo Lazar authored Sep 23, 2022

Initialize with the IP specific functions needed for GFXHUB, GFX and
SDMA.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
