- 27 Aug, 2018 40 commits
-
-
Christian König authored
No more waiting for a fence done here. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rex Zhu authored
1. initialize kiq before initialize gfx ring. 2. set kiq ring ready immediately when kiq initialize successfully. 3. split function gfx_v9_0_kiq_resume into two functions. gfx_v9_0_kiq_resume is for kiq initialize. gfx_v9_0_kcq_resume is for kcq initialize. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rex Zhu authored
1. initialize kiq before initialize gfx ring. 2. set kiq ring ready immediately when kiq initialize successfully. 3. split function gfx_v8_0_kiq_resume into two functions. gfx_v8_0_kiq_resume is for kiq initialize. gfx_v8_0_kcq_resume is for kcq initialize. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rex Zhu authored
Send all kcq unmap_queue packets and then wait for complete. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rex Zhu authored
There are no any logical changes here. 1. if kcq can be enabled via kiq, we don't need to do kiq ring test. 2. amdgpu_ring_test_ring function can be used to sync the ring complete, remove the duplicate code. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rex Zhu authored
Send all kcq unmap_queue packets and then wait for complete. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rex Zhu authored
There are no any logical changes here. 1. if kcq can be enabled via kiq, we don't need to do kiq ring test. 2. amdgpu_ring_test_ring function can be used to sync the ring complete, remove the duplicate code. v2: alloc 6 (not 7) dws for unmap_queues Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rex Zhu authored
As we have unify powergate_uvd/vce/mmhub to set_powergating_by_smu, and set_powergating_by_smu was supported by both dpm and powerplay. so remove the else case. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rex Zhu authored
when hw_fini/suspend, smu only need to power on uvd block if uvd pg is supported, don't need to call uvd to do hw_init. v2: fix typo in patch descriptions and comments. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rex Zhu authored
For SI/Kv, the power state is managed by function amdgpu_pm_compute_clocks. when dpm enabled, we should call amdgpu_pm_compute_clocks to update current power state instand of set boot state. this change can fix the oops when kfd driver was enabled on Kv. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rex Zhu authored
Forgot to add vce pg support via smu for Kaveri/Mullins. Fixes: 561a5c83eadd ("drm/amd/pp: Unify powergate_uvd/vce/mmhub to set_powergating_by_smu") v2: refine patch descriptions suggested by Michel Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rex Zhu authored
when ac/dc switch, driver will be notified by acpi event. then the power source will be updated. so don't need to get power source when set power state. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rex Zhu authored
This is required by gfx hw and can fix the rlc hang when do s3 stree test on Cz/St. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Hang Zhou <hang.zhou@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Felix Kuehling authored
Set the VM size based on system memory size between the ASIC-specific limits given by min_vm_size and max_bits. GFXv9 GPUs will keep their default VM size of 256TB (48 bit). Only older GPUs will adjust VM size depending on system memory size. This makes more VM space available for ROCm applications on GFXv8 GPUs that want to map all available VRAM and system memory in their SVM address space. v2: * Clarify comment * Round up memory size before >> 30 * Round up automatic vm_size to power of two Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Junwei Zhang <Jerry.Zhang@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
Try to kill waves on the SQ. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
Try to kill waves on the SQ. v2: only for the GFX ring for now. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
Try to kill waves on the SQ. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
Instead of hammering hard on the GPU try a soft recovery first. v2: reorder code a bit v3: increase timeout to 10ms, increment GPU reset counter v4: squash in compile fix (Christian) Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com>
-
Huang Rui authored
The new bulk moving functionality is ready, the overhead of moving PD/PT bos to LRU is fixed. So move them on LRU again. Signed-off-by: Huang Rui <ray.huang@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Huang Rui authored
I continue to work for bulk moving that based on the proposal by Christian. Background: amdgpu driver will move all PD/PT and PerVM BOs into idle list. Then move all of them on the end of LRU list one by one. Thus, that cause so many BOs moved to the end of the LRU, and impact performance seriously. Then Christian provided a workaround to not move PD/PT BOs on LRU with below patch: Commit 0bbf32026cf5ba41e9922b30e26e1bed1ecd38ae ("drm/amdgpu: band aid validating VM PTs") However, the final solution should bulk move all PD/PT and PerVM BOs on the LRU instead of one by one. Whenever amdgpu_vm_validate_pt_bos() is called and we have BOs which need to be validated we move all BOs together to the end of the LRU without dropping the lock for the LRU. While doing so we note the beginning and end of this block in the LRU list. Now when amdgpu_vm_validate_pt_bos() is called and we don't have anything to do, we don't move every BO one by one, but instead cut the LRU list into pieces so that we bulk move everything to the end in just one operation. Test data: +--------------+-----------------+-----------+---------------------------------------+ | |The Talos |Clpeak(OCL)|BusSpeedReadback(OCL) | | |Principle(Vulkan)| | | +------------------------------------------------------------------------------------+ | | | |0.319 ms(1k) 0.314 ms(2K) 0.308 ms(4K) | | Original | 147.7 FPS | 76.86 us |0.307 ms(8K) 0.310 ms(16K) | +------------------------------------------------------------------------------------+ | Orignial + WA| | |0.254 ms(1K) 0.241 ms(2K) | |(don't move | 162.1 FPS | 42.15 us |0.230 ms(4K) 0.223 ms(8K) 0.204 ms(16K)| |PT BOs on LRU)| | | | +------------------------------------------------------------------------------------+ | Bulk move | 163.1 FPS | 40.52 us |0.244 ms(1K) 0.252 ms(2K) 0.213 ms(4K) | | | | |0.214 ms(8K) 0.225 ms(16K) | +--------------+-----------------+-----------+---------------------------------------+ After test them with above three benchmarks include vulkan and opencl. We can see the visible improvement than original, and even better than original with workaround. v2: move all BOs include idle, relocated, and moved list to the end of LRU and put them together. v3: remove unused parameter and use list_for_each_entry instead of the one with save entry. v4: move the amdgpu_vm_move_to_lru_tail after command submission, at that time, all bo will be back on idle list. v5: remove amdgpu_vm_move_to_lru_tail_by_list(), use bulk_moveable instread of validated, and move ttm_bo_bulk_move_lru_tail() also into amdgpu_vm_move_to_lru_tail(). v6: clean up and fix return value. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Huang Rui authored
This function allow us to bulk move a group of BOs to the tail of their LRU. The positions of group of BOs are stored on the (first, last) bulk_move_pos structure. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
When move a BO to the end of LRU, it need remember the BO positions. Make sure all moved bo in between "first" and "last". And they will be bulk moving together. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
Add bulk move pos to store the pointer of first and last buffer object. The list in between will be bulk moved on lru list. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
Add a helper to get the root PD address and remove the workarounds from the GMC9 code for that. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
We can easily figure out the address on the fly. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
sed -i "s/gart.robj/gart.bo/" drivers/gpu/drm/amd/amdgpu/*.c sed -i "s/gart.robj/gart.bo/" drivers/gpu/drm/amd/amdgpu/*.h Just cleaning up radeon leftovers. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
Move setting the GART addr for window based copies into the TTM code who uses it. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
Add a helper function for getting the root PD addr and cleanup join the two VM related functions and cleanup the function name. No functional change. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
Preparation for following changes. This validates the root PD twice, but the overhead of that should be minimal. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
Check if we should call the function instead of providing the forced flag. v2: rebase on KFD changes (Alex) Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
kbuild test robot authored
Fixes: d790449835e6 ("drm/amdgpu: use kiq to do invalidate tlb") Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
This adds support for LVDS displays. v2: add support for spread spectrum, sink detect v3: clean up enable_lvds_output v4: fix up link_detect v5: remove assert on 888 format Bug: https://bugs.freedesktop.org/show_bug.cgi?id=105880Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Emily Deng authored
When in gpu reset, don't use kiq, it will generate more TDR. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Nayan Deshmukh authored
do not remove entity from the rq if the current rq is from the least loaded scheduler. Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Dan Carpenter authored
The if statement isn't indented and it makes static checkers complain. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Wen Yang authored
Fix comile warning like, CC [M] drivers/gpu/drm/i915/gvt/execlist.o CC [M] drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.o CC [M] drivers/gpu/drm/radeon/btc_dpm.o CC [M] drivers/isdn/hisax/avm_a1p.o CC [M] drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_dpp.o drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c: In function ‘dcn10_update_mpcc’: drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:1903:9: warning: missing braces around initializer [-Wmissing-braces] struct mpcc_blnd_cfg blnd_cfg = {0}; ^ drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:1903:9: warning: (near initialization for ‘blnd_cfg.black_color’) [-Wmissing-braces] Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Emily Deng authored
For sriov, don't use kiq in exclusive mode, as don't know how long time it will take, some times it will occur exclusive timeout. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Evan Quan authored
Use the old doorbell range setting until the driver is able to support more sdma queues. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rex Zhu authored
In function ‘gfx_v9_0_check_fw_write_wait’: warning: enumeration value ‘CHIP_TAHITI’ not handled in switch [-Wswitch] Always add default case in case there is no match Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
Use a fixed number of entities for each hardware IP. The number of compute entities is reduced to four, SDMA keeps it two entities and all other engines just expose one entity. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-