- 24 Mar, 2021 40 commits
-
-
Eric Huang authored
The flag is only applied on fine-grained memory. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Oak Zeng <Oak.Zeng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Eric Huang authored
Page tables in vram mapping to cpu is changed from uncached to cached in A+A, the snoop bit in VM_CONTEXTx_PAGE_TABLE_BASE_ADDR/ PDE0s/PDE1s/PDE2s/PTE.TFs has to be set so gpuvm walker snoop page table data out of CPU cache. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Eric Huang authored
New A+A HW supports cached vram mapped to cpu. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Dennis Li authored
When connected to a host via xGMI, system fatal errors may trigger warm reset, driver has no change to query edc status before reset. Therefore in this case, driver should harvest previous error loging registers during boot, instead of only resetting them. v2: 1. IP's ras_manager object is created when its ras feature is enabled, so change to query edc status after amdgpu_ras_late_init called 2. change to enable watchdog timer after finishing gfx edc init Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reivewed-by: Hawking Zhang <hawking.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Felix Kuehling authored
This is needed for best machine learning performance. XNACK can still be enabled per-process if needed. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Tested-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Harish Kasiviswanathan authored
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reivewed-by: Hawking Zhang <hawking.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Kenneth Feng authored
Export new data in the metrics table for gfx and memory utilization counter, and each hbm temperature as well. v2: change the metrics table version to v1.1 v3: fix the coding style v4: rebase against latest kernel Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Kevin Wang authored
add PSP RAP L0 check when RAP TA is loaded. Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Kevin Wang authored
RAP TA is an optional firmware. if it doesn’t exist, the driver should bypass psp_rap_invoke() function. 1. bypass psp_rap_invoke() when RAP TA is not loaded. 2. add new parameter (status) to query RAP TA status. (the status value is different with psp_ta_invoke(), 3. fix the 'rap_status' MThread critical problem. (used without lock) Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Kevin Wang authored
add aldebaran serial number support. (serial number from metrics table) Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Felix Kuehling authored
When there is no graphics support, KFD can use more of the VMIDs. Graphics VMIDs are only used for video decoding/encoding and post processing. With two VCE engines, there is no reason to reserve more than 2 VMIDs for that. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Dennis Li authored
SQ's watchdog timer monitors forward progress, a mask of which waves caused the watchdog timeout is recorded into ras status registers and then trigger a system fatal error event. v2: 1. change *query_timeout_status to *query_sq_timeout_status. 2. move query_sq_timeout_status into amdgpu_ras_do_recovery. 3. add module parameters to enable/disable fatal error event and modify the watchdog timer. v3: 1. remove unused parameters of *enable_watchdog_timer Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Dennis Li authored
The bank number of both VML2 and ATCL2 are changed to 8, so refine related codes to avoid defining long name arrays. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Dennis Li authored
add edc counter/status reset and query functions for gfx block of aldebaran. v2: change to clear edc counter explicitly aldebaran hardware will not clear edc counter after driver reading them, so driver should clear them explicitly. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Kevin Wang authored
add GC power brake feature support for Aldebaran. v2: squash in fixes (Alex) Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Hawking Zhang authored
The golden setting was changed recently. update to the latest one Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Hawking Zhang authored
golden settings that should be applied Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Hawking Zhang authored
Those registers should be programmed as one-time initialization Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Jonathan Kim authored
Initialization of TRAP_DATA0/1 is still required for the debugger to detect new waves on Aldebaran. Also, per-vmid global trap enablement may be required outside of debugger scope so move to init phase. v2: just add the gfx 9.4.2 changes (Alex) Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Jonathan Kim authored
Create dedicated Aldebaran kfd2kgd callbacks to prepare for new per-vmid register instructions for debug trap setting functions and sending host traps. v2: rebase (Alex) Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com> Reviewed-by: Oak Zeng <Oak.Zeng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Oak Zeng authored
MEC firmware can silently fail the queue preemption request without time out. In this case, HIQ's MQD's queue_doorbell_id will be set. Check this field to see whether last queue preemption was successful or not. Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Suggested-by: Jay Cornwall <Jay.Cornwall@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Oak Zeng authored
This is to keep wavefront context for debug purpose Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Hawking Zhang authored
Match existing asics. v2: rebase (Alex) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Lijo Lazar authored
Aldebaran has fine grained DPM for GFXCLK. Instead of a discrete level, user can specify a min/max range of GFXCLK for any profiling/tuning purpose.This option is available only in manual performance level mode. Select "manual" as power_dpm_force_performance_level and specify the min/max range using pp_dpm_sclk sysfs node. User cannot specify a min/max range outside of the default min/max range of the ASIC. If specified outside the range, values will be bound by the default min/max range. Ex: To use gfxclk min = 600MHz and max = 900MHz echo manual > /sys/bus/pci/devices/.../power_dpm_force_performance_level echo min 600 max 900 > /sys/bus/pci/devices/.../pp_dpm_sclk Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Kevin Wang authored
the following message is not supported. PPSMC_MSG_ReadSerialNumTop32 PPSMC_MSG_ReadSerialNumBottom32 Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alex Sierra authored
With a recent gart page table re-construction, the gart page table is now 2-level for some ASICs: PDB0->PTB. In the case of 2-level gart page table, the page_table_base of vmid0 should point to PDB0 instead of PTB. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Oak Zeng <Oak.Zeng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Oak Zeng authored
More accurate words are used to address a code review feedback Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Oak Zeng authored
For the new 2-level GART table, the last PDE0 points to PTB. Since PTB is in vram and right now we are runing under s=0 mode (vram is treated as FB carveout), so the s bit of this PDE0 should be set to 0. Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alex Sierra authored
update mmhub client id table for Aldebaran. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Dennis Li authored
Aldebaran can share the same initializing shader code witn arcturus. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Oak Zeng authored
With the 2-level gart page table, vram is squeezed into gart aperture and FB aperture is disabled. Therefore all VRAM virtual addresses are in the GART aperture. However currently PSP requires TMR addresses in FB aperture. So we need some design change at PSP FW level to support this 2-level gart table driver change. Right now this PSP FW support doesn't exist. To workaround this issue temporarily, FB aperture is added back and the gart aperture address is converted back to FB aperture for this PSP TMR address. Will revert it after we get a fix from PSP FW. v2: squash in tmr fix for other asics (Kevin) Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Oak Zeng authored
Set up HW for 2-level vmid0 page table: 1. Set up PAGE_TABLE_START/END registers. Currently only plan to do 2-level page table for ALDEBARAN, so only gfxhub1.0 and mmhub1.7 is changed. 2. Set page table base register. For 2-level page table, the page table base should point to PDB0. 3. Disable AGP and FB aperture as they are not used. Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Christian Konig <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Oak Zeng authored
If use gart for FB translation, allocate and fill PDB0. Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Christian Konig <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Oak Zeng authored
Add functions to allocate PDB0, map it for CPU access, and fill it. Those functions are only used for 2-level vmid0 page table construction Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Oak Zeng authored
If use gart for FB translation, we will squeeze vram into sysvm aperture. This requires 2 level gart table. Add page table depth and page table block size parameters to gmc. This is prepare work to 2-level gart table construction Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Christian Konig <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Oak Zeng authored
If use GART for FB translation, place both vram and gart to sysvm aperture. AGP aperture is not set up in this case because it is not used Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Christian Konig <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Oak Zeng authored
Modify the comment to reflect the fact that, if use GART for vram address translation for vmid0, [vram_start, vram_end] will be placed inside SYSVM aperture, together with GART. Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Christian Konig <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Oak Zeng authored
In amdgpu_gmc_gart_location function, gart_size is adjusted by a smu_prv_buffer_size. This logic shouldn't belong to this function. Move the logic to the mc_init functions Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Christian Konig <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Oak Zeng authored
On A+A platform, CPU write page directory and page table in cached mode. So it is necessary for page table walker to snoop CPU cache. This setting is necessary for page walker to snoop page directory and page table data out of CPU cache. Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Acked-by: Christian Konig <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Oak Zeng authored
On A+A platform, vram can be mapped as WB. Not necessarily to always map vram as WC on such platform. Calling function arch_io_reserve_memtype_wc will mark the whole vram region as WC. So don't call it for A+A platform. Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian Konig <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-