Commits · fc021438d0ab7863dc93f84a557af6dc6255b881 · Kirill Smelkov / linux

09 Jun, 2023 40 commits

drm/amdgpu: Enable NPS4 CPX mode · fc021438

Philip Yang authored Apr 19, 2023

CPX compute mode is a valid mode for the NPS4 memory partition mode.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Move pgmap to amdgpu_kfd_dev structure · 610dab11

Philip Yang authored Mar 31, 2023

The VRAM pgmap resource is allocated every time compute partitions are
switched, because kfd_dev is re-initialized by post_partition_switch.
As a result, memory region resources leak and system memory usage
accounting becomes unbalanced.

The pgmap resource should be allocated and registered only once, when the
driver is loaded, and freed when the driver is unloaded, so move it from
kfd_dev to amdgpu_kfd_dev.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
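
The ownership change described in this commit can be sketched in plain C.
This is a minimal user-space model, not the driver's code: `device_ctx`,
`node_ctx`, `device_load`, `device_unload` and `node_reinit` are
hypothetical names standing in for the device-level and per-partition
structures.

```c
#include <stddef.h>

/* Hypothetical device-level state: lives from driver load to unload. */
struct device_ctx {
    int pgmap_registered;   /* 1 while the resource is registered */
    int pgmap_allocs;       /* how many times it has been allocated */
};

/* Hypothetical per-partition state: rebuilt on every partition switch. */
struct node_ctx {
    struct device_ctx *dev;
    int partition_mode;
};

/* Allocate and register the resource once, at load time. */
static void device_load(struct device_ctx *d)
{
    d->pgmap_allocs++;
    d->pgmap_registered = 1;
}

/* Release the resource once, at unload time. */
static void device_unload(struct device_ctx *d)
{
    d->pgmap_registered = 0;
}

/* Re-initialize the node after a switch; no allocation happens here. */
static void node_reinit(struct node_ctx *n, struct device_ctx *d, int mode)
{
    n->dev = d;
    n->partition_mode = mode;
}
```

Because the node only holds a pointer to the device-level state, any
number of re-initializations leaves the allocation count at one.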

drm/amdgpu: Skip halting RLC on GFX v9.4.3 · 00e1ab02

Lijo Lazar authored Mar 24, 2023

The RLC-PMFW handshake happens periodically when GFXCLK DPM is enabled,
and halting RLC may cause unexpected results. Avoid halting RLC from the
driver side.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix register accesses in GFX v9.4.3 · 1e91a5f7

Lijo Lazar authored Mar 16, 2023

Access registers with the right xcc id. Also, remove the unused logic, as
PG is not used in GFX v9.4.3.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Increase queue number per process to 255 on GFX9.4.3 · 3697b9bd

Mukul Joshi authored Mar 15, 2023

Increase the maximum number of queues that can be created per process
to 255 on GFX 9.4.3. There is no HWS limitation restricting the number
of queues that can be created.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Adjust the sequence to query ras error info · 9b337b7d

Hawking Zhang authored Mar 20, 2023

It turns out that STATUS_VALID_FLAG needs to be checked
ahead of any other fields. ADDRESS_VALID_FLAG and
ERR_INFO_VALID_FLAG only manage the ADDRESS and ERR_INFO
fields, respectively. The driver should continue to poll the
ERR CNT field even if ERR_INFO_VALID_FLAG is not set.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
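
The query order described in this commit can be sketched as follows. This
is an illustrative user-space model only: the bit positions, the
`ras_err_status` and `ras_err_report` structs, and `parse_ras_status` are
assumptions made for the example, not the driver's register definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen for illustration only. */
#define STATUS_VALID_FLAG    (1u << 0)
#define ADDRESS_VALID_FLAG   (1u << 1)
#define ERR_INFO_VALID_FLAG  (1u << 2)

/* Hypothetical snapshot of one error status register set. */
struct ras_err_status {
    uint32_t flags;
    uint64_t address;
    uint32_t err_info;
    uint32_t err_cnt;
};

struct ras_err_report {
    bool     has_address;
    bool     has_err_info;
    uint64_t address;
    uint32_t err_info;
    uint32_t err_cnt;
};

/* Returns false when the status is not valid; *out is zeroed in that case. */
static bool parse_ras_status(const struct ras_err_status *st,
                             struct ras_err_report *out)
{
    *out = (struct ras_err_report){0};

    /* STATUS_VALID_FLAG gates every other field, so check it first. */
    if (!(st->flags & STATUS_VALID_FLAG))
        return false;

    /* ADDRESS_VALID_FLAG only covers the ADDRESS field. */
    if (st->flags & ADDRESS_VALID_FLAG) {
        out->has_address = true;
        out->address = st->address;
    }

    /* ERR_INFO_VALID_FLAG only covers the ERR_INFO field. */
    if (st->flags & ERR_INFO_VALID_FLAG) {
        out->has_err_info = true;
        out->err_info = st->err_info;
    }

    /* The error count is read even when ERR_INFO_VALID_FLAG is clear. */
    out->err_cnt = st->err_cnt;
    return true;
}
```

A status with only STATUS_VALID_FLAG set still yields an error count, while
a status without it is ignored entirely, whatever the other flags say.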

drm/amdgpu: Initialize jpeg v4_0_3 ras function · 35d54e21

Hawking Zhang authored Mar 06, 2023

Initialize jpeg v4_0_3 ras function.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add reset_ras_error_count for jpeg v4_0_3 · 570df4bc

Hawking Zhang authored Mar 02, 2023

Add reset_ras_error_count callback for jpeg v4_0_3.
It will be used to reset jpeg ras error count.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add query_ras_error_count for jpeg v4_0_3 · 41e491d8

Hawking Zhang authored Mar 02, 2023

Add query_ras_error_count callback for jpeg v4_0_3.
It will be used to query and log jpeg error count.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Re-enable VCN RAS if DPG is enabled · 85f23b0a

Hawking Zhang authored Mar 02, 2023

VCN RAS enablement sequence needs to be added in
DPG HW init sequence.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Initialize vcn v4_0_3 ras function · c3f05ab8

Hawking Zhang authored Mar 06, 2023

Initialize vcn v4_0_3 ras function.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add reset_ras_error_count for vcn v4_0_3 · 6d39fa3f

Hawking Zhang authored Mar 02, 2023

Add reset_ras_error_count callback for vcn v4_0_3.
It will be used to reset vcn ras error count.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add query_ras_error_count for vcn v4_0_3 · 5e1e227f

Hawking Zhang authored Mar 01, 2023

Add query_ras_error_count callback for vcn v4_0_3.
It will be used to query and log vcn error count.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add vcn/jpeg ras err status registers · 6c2bebfc

Hawking Zhang authored Mar 01, 2023

Add new ras error status registers introduced in
vcn v4_0_3 to log vcn and jpeg ras error.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Check if the pointer is NULL before using it. · b4520bfd

Gavin Wan authored Mar 17, 2023

For SRIOV on some parts, the host driver does not post VBIOS. So the guest
cannot get bios information. Therefore, adev->virt.fw_reserve.p_pf2vf
and adev->mode_info.atom_context are NULL.
Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Zhigang Luo <Zhigang.Luo@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Set memory partitions to 1 for SRIOV. · 46f7b4de

Gavin Wan authored Apr 10, 2023

For SRIOV, the memory partitions are set by the host driver. Each VF has
only one memory partition. We need to set the memory partitions to 1 in
the guest driver for SRIOV.

V2: squash in fix ("drm/amdgpu: Fix memory range info of GC 9.4.3 VFs")
Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Acked-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Skip using MC FB Offset when APU flag is set for SRIOV. · b0a3bbf9

Gavin Wan authored Apr 03, 2023

MC_VM_FB_OFFSET is a PF-only register. It cannot be read on a VF.
So, the driver should not use the MC_VM_FB_OFFSET address to set the
address of dev->gmc.aper_base.
Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add PSP supporting PSP 13.0.6 SRIOV ucode init. · 63630c9e

Gavin Wan authored Mar 16, 2023

Add PSP support for PSP 13.0.6 SRIOV ucode init.
Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add PSP spatial partition interface · ba08e9cb

Lijo Lazar authored Mar 10, 2023

Add PSP ring command interface for spatial partitioning.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Return error on invalid compute mode · b6b85c8b

Lijo Lazar authored Mar 07, 2023

Return error if an invalid compute partition mode is requested.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add compute mode descriptor function · f9632096

Lijo Lazar authored Mar 07, 2023

Add a helper function to get the description of a compute partition mode.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
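
A descriptor helper of this kind, together with the invalid-mode check
from the preceding commit, might look like the sketch below. The enum, its
values, the strings, and both function names are hypothetical and chosen
for the example; only the idea of mapping a mode to a description and
rejecting an unknown mode comes from the commit messages.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical set of compute partition modes. */
enum compute_mode {
    MODE_SPX,
    MODE_DPX,
    MODE_TPX,
    MODE_QPX,
    MODE_CPX,
    MODE_COUNT
};

/* Map a mode to a short description; unknown values get a placeholder. */
static const char *compute_mode_desc(int mode)
{
    static const char *const names[MODE_COUNT] = {
        [MODE_SPX] = "SPX",
        [MODE_DPX] = "DPX",
        [MODE_TPX] = "TPX",
        [MODE_QPX] = "QPX",
        [MODE_CPX] = "CPX",
    };

    if (mode < 0 || mode >= MODE_COUNT)
        return "UNKNOWN";
    return names[mode];
}

/* Reject an invalid request with -EINVAL and leave *current untouched. */
static int set_compute_mode(int *current, int requested)
{
    if (requested < 0 || requested >= MODE_COUNT)
        return -EINVAL;
    *current = requested;
    return 0;
}
```

Returning an error instead of silently ignoring the request lets the
caller report the bad value, and the helper gives it a readable name to
put in that report.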

drm/amdgpu: Fix unmapping of aperture · a0ba1279

Lijo Lazar authored Mar 03, 2023

When the aperture size is zero, no mapping is done, so there is nothing
to unmap.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix xGMI access P2P mapping failure on GFXIP 9.4.3 · e181be58

Rajneesh Bhardwaj authored Feb 27, 2023

On GFXIP 9.4.3, we don't need to rely on xGMI hive info to determine P2P
access.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-and-tested-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Native mode memory partition support · fcfefd85

Rajneesh Bhardwaj authored Feb 27, 2023

For native mode, after the amdgpu_bo is created in the CPU domain, call
amdgpu_ttm_tt_set_mem_pool to select the TTM pool using bo->mem_id.
ttm_bo_validate will then allocate the memory from the correct memory
partition before mapping to GPUs.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-and-tested-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Set TTM pools for memory partitions · 1e03322c

Philip Yang authored Feb 27, 2023

For native mode only, create a TTM pool for each memory partition to store
the NUMA node id. The TTM pool is then selected by memory partition id to
allocate memory from the correct partition.
Acked-by: Christian König <christian.koenig@amd.com>
(rajneesh: changed need_swiotlb and need_dma32 to false for pool init)
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-and-tested-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
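
The per-partition pool selection described in this commit can be modelled
in a few lines of C. This sketch does not use the TTM API: `mem_pool`,
`mem_pools`, `mem_pools_init`, `mem_pool_select` and the
`MAX_MEM_PARTITIONS` limit are all hypothetical stand-ins.

```c
#include <stddef.h>

#define MAX_MEM_PARTITIONS 8   /* hypothetical upper bound */

/* Hypothetical pool: only remembers which NUMA node backs it. */
struct mem_pool {
    int numa_node;
};

struct mem_pools {
    struct mem_pool pool[MAX_MEM_PARTITIONS];
    int count;
};

/* Create one pool per memory partition, recording its NUMA node id. */
static int mem_pools_init(struct mem_pools *p, const int *numa_nodes, int count)
{
    if (count < 0 || count > MAX_MEM_PARTITIONS)
        return -1;
    for (int i = 0; i < count; i++)
        p->pool[i].numa_node = numa_nodes[i];
    p->count = count;
    return 0;
}

/* Pick the pool for a memory partition id; NULL if the id is out of range. */
static const struct mem_pool *mem_pool_select(const struct mem_pools *p,
                                              int mem_id)
{
    if (mem_id < 0 || mem_id >= p->count)
        return NULL;
    return &p->pool[mem_id];
}
```

Indexing the pools by partition id keeps the lookup constant-time and
makes an out-of-range id an explicit failure rather than a silent
fallback to the wrong NUMA node.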

drm/ttm: export ttm_pool_fini for cleanup · 6b43e1a0

Rajneesh Bhardwaj authored Feb 13, 2023

ttm_pool_init is exported and used outside of the ttm subsystem through
the amdgpu_ttm interface. Similarly, export ttm_pool_fini for proper
cleanup.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add auto mode for compute partition · 570de94b

Lijo Lazar authored Feb 13, 2023

When auto mode is specified, the driver will choose the right compute
partition mode.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Check memory ranges for valid xcp mode · 1589c82a

Lijo Lazar authored Feb 13, 2023

Also check the memory ranges available to the device when deciding on a
valid partition mode. Only select combinations are valid for a
particular mode.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Use xcc mask for identifying xcc · c4050ff1

Lijo Lazar authored Feb 09, 2023

Instead of the start xcc id and the number of xccs per node, use the xcc
mask, which is the mask of logical ids of the xccs belonging to a partition.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
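
The difference between the two representations can be sketched as
follows. The helper names (`xcc_mask_from_range`, `xcc_count`, `xcc_next`)
and the 32-bit mask width are assumptions made for this example, not the
driver's interface.

```c
#include <stdint.h>

/* Convert the old (start id, count) description into a bit mask. */
static uint32_t xcc_mask_from_range(int start, int num)
{
    uint32_t bits;

    if (num <= 0 || start < 0 || start >= 32)
        return 0;
    bits = (num >= 32) ? UINT32_MAX : ((1u << num) - 1u);
    return bits << start;
}

/* Number of xccs in the partition: one per set bit. */
static int xcc_count(uint32_t xcc_mask)
{
    int n = 0;

    for (; xcc_mask; xcc_mask &= xcc_mask - 1)
        n++;
    return n;
}

/*
 * Next logical xcc id after 'prev' that is set in the mask, or -1 when
 * there is none. Pass prev = -1 to get the first id.
 */
static int xcc_next(uint32_t xcc_mask, int prev)
{
    int id = (prev < -1) ? 0 : prev + 1;

    for (; id < 32; id++)
        if (xcc_mask & (1u << id))
            return id;
    return -1;
}
```

A mask can describe any set of logical ids, including a non-contiguous
one such as 0x5, which a start id plus a count cannot express.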

drm/amdkfd: Add xcp reference to kfd node · a75f2271

Lijo Lazar authored Feb 09, 2023

Fetch xcp information from xcp_mgr and also add xcc_mask to kfd node.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Move initialization of xcp before kfd · e47947ab

Lijo Lazar authored Feb 03, 2023

After partition switch, fill all relevant xcp information before kfd
starts initialization.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fill xcp mem node in aquavanjaram · 15e3eee8

Lijo Lazar authored Feb 03, 2023

Implement callbacks to fill memory node information in aquavanjaram.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add callback to fill xcp memory id · da539b21

Lijo Lazar authored Feb 03, 2023

Add callback in xcp interface to fill xcp memory id information. Memory
id is used to identify the range/partition of an XCP from the available
memory partitions in device. Also, fill the id information.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Initialize memory ranges for GC 9.4.3 · a433f1f5

Lijo Lazar authored Feb 14, 2023

GC 9.4.3 ASICs may have memory split into multiple partitions. Initialize
the memory partition information for each range. The information may be
in the form of a numa node id or a range of pages.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add memory partitions to gmc · 14493cb9

Lijo Lazar authored Feb 14, 2023

Some ASICs have the device memory divided into multiple partitions. The
partitions could be denoted by a numa node or by a range of pages.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add API to get numa information of XCC · fa0497c3

Lijo Lazar authored Feb 14, 2023

Add interface to get numa information of ACPI XCC object. The interface
uses logical id to identify an XCC.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Store additional numa node information · 1cc82301

Lijo Lazar authored Feb 14, 2023

Use a struct to store additional numa node information including size
and base address. Add numa_info pointer to xcc object to point to the
relevant structure based on its proximity domain.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Get supported memory partition modes · 0f2e1d62

Lijo Lazar authored Feb 17, 2023

Expand the interface to also return the supported memory partition modes
along with the current memory partition mode.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Move memory partition query to gmc · b6f90baa

Lijo Lazar authored Jan 31, 2023

The GMC block handles memory-related information, so it makes more sense
to keep the memory partition functions in the gmc block.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add utility functions for xcp · 4bdca205

Lijo Lazar authored Jan 25, 2023

Add utility functions to get details of xcp and iterate through
available xcps.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
