- 14 Feb, 2023 26 commits
-
-
Bhawanpreet Lakha authored
[Why] We only allowed 1 overlay plane. But now some ASICS can support multiple overlay planes. [How] Use max_slave_planes as the number of overlays we can support. Also since we cannot draw cursor over a video plane, we need to make sure that we reject commits where the topmost plane is a video plane (overlay only). Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Wenjing Liu authored
[why] Link is a subcomponent in dc. DM should be aware of dc link structure as one of the abstracted objects maintained by dc. However it should have no idea of the existence of a link component in dc dedicated to maintain the states of dc link structure. As such we are moving link interfaces out of dc_link.h and directly added to dc.h. We are grandually fading out the explicit inclusion of dc_link header and eventually delete it. On dc side, since link is a subcomponent behind dc interfaces, it is not a good idea to implement dc interfaces in each individual subcomponent of link which is already a subcomponent of dc. So we are decoupling it by implementing a dc_link_exports in dc. This file will be a thin translation layer that breaks the dependency so link is able to make interface changes without breaking DM. Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Nicholas Kazlauskas authored
[Why] Request from HW team to update the latencies to the new measured values. [How] Update the values in the bounding box. Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Leo (Hanghong) Ma authored
[Why] The FreeSync active bit unconditionally set in HDMI VSIF. [How] Set this bit to true when FAMS is enable on desktop. Reviewed-by: Felipe Clark <felipe.clark@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Leo (Hanghong) Ma <hanghong.ma@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Nasir Osman authored
[why] HUBP_UNBOUNDED_REQ_MODE being enabled while the display is rotated (eg. going from Portrait mode to Landscape mode) appears to be causing a p-state hang, specifically during full screen mode on the Tiktok PC app. Unbounded request mode doesn't appear to be supported with rotation configs, hence disabling it. [how] Within DML, modified unbounded request mode to be configured only when the rotation angle of the plane is 0. Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Nasir Osman<nasir.osman@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Yifan Zha authored
[Why] Regression of commit 72fef498 ("drm/amdgpu: Remove writing GRBM_GFX_CNTL in RLCG interface under SRIOV") on GFX9. According to GFX9 VF using different method to access GC registers including MMIO(direct) and RLCG(indirect), removing GRBM_GFX_* writing would make PIPE/ME/VM/QUEUE selection chaos leading to some OCL benchmark failure. For example, using RLCG interface to program GRBM_GFX_CNTL/INDEX for selecting MEC(actually the value is only in scratch2/3), then using MMIO directly program a MEC register in VF driver. The register programming are invalid due to GC switched to incorrect ME. [How] With checking RLCG accessing flag, keep writing GRBM_GFX_* as a legacy way. But it is still skipped on GFX10+ to avoid violation occurrence. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Jiapeng Chong authored
Variable pre_connection_type is not effectively used, so delete it. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4031Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Jiapeng Chong authored
Variable ds_port is not effectively used, so delete it. drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_capability.c:280:35: warning: variable ‘ds_port’ set but not used. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4030Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Nasir Osman authored
[why] Newer ASICs such as DCN314 needs to allow for both self refresh and mem clk switching rather than just self refresh only. Otherwise, we can see some p-state hangs on ASICs that do support mem clk switching. [how] Added an allow_self_refresh_only flag for dcn30_internal_validate_bw and created a validate_bw method for DCN314 with the allow_self_refresh_only flag set to false (to support mem clk switching). Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Nasir Osman <nasir.osman@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Tom Chung authored
[Why] Dmub will cache the video position data during PSR-SU enable. The dmub will use an outdated MPO video position if user try to drag the video window and it will cause video glitch. [How] Disable the PSR-SU temporarily while user drag the video window. The PSR-SU will be re-enabled after the video window is stable. Reviewed-by: Sun peng Li <Sunpeng.Li@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Yang Li authored
drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_factory.c:145 get_ddc_line() warn: inconsistent indenting drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_factory.c:201 dc_link_construct_phy() warn: inconsistent indenting Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4026Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Yang Li authored
./drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c:1610:68-73: WARNING: conversion to bool not needed here Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4025Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Charlene Liu authored
[Why] In virtual link use case, link->ddc could be NULL. [How] Add null pointer check to avoid undefined behavior. Reviewed-by: Martin Leung <Martin.Leung@amd.com> Reviewed-by: Hansen Dsouza <hansen.dsouza@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Wenjing Liu authored
[why] We mistakenly toggle dpms state for non master pipe when handling link lost. A non master pipe doesn't connect to a backend. So it is toggling dpms for non master is undefined and caused NULL pointer dereference. [how] Add helper functions to find an array of active master pipes for current link and only toggle DPMS for active master pipes connected to the link. Add assert in case we get called to program dpms with non master pipe. Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Wenjing Liu authored
[why] A recent change was made to implement temporary workaround due DRM update in MST interfaces. The workaround is added into our generic deallocation MST sequence. This ticket is to extract this temporary workaround into its own function so it is differentiated from our generic sequence. Reviewed-by: Jerry Zuo <Jerry.Zuo@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Samson Tam authored
[Why] In disable_dangling_plane, for phantom pipes, we enable OTG so disable programming gets the double buffer update. But this causes an underflow to occur. [How] Enable DPG prior to enabling OTG. Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Samson Tam <samson.tam@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Daniel Miess authored
This reverts commit f5df7725 [Why] This commit causes corruption when viewing a P010 video clip on a 300Hz eDP Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Daniel Miess <Daniel.Miess@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Wenjing Liu authored
[why] a recent regression has caused us to mistakenly switch RX back to SST mode when there are remaining mst stream enabled to the link. We are missing a condition check for stream count before setting RX back to SST mode. [how] Add stream count check condition back and do some further refactor so the logic is easier to understand to prevent future coding error in this sequence. Reviewed-by: Samson Tam <Samson.Tam@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alvin Lee authored
[Why & How] - For prefetch max vratio check, use the calculated prefetch bandwidth from dml32_CalculatePrefetchSchedule instead of max prefetch bandwidth - Also multiply prefetch bandwidth by VRatio since scaling is not considered one calculating require prefetch bw Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Nicholas Kazlauskas authored
[Why] To align with DCN31 behavior. This helps avoid p-state hangs in the case where underflow does occur. [How] Flip the bit to true. Reviewed-by: Hansen Dsouza <hansen.dsouza@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Aurabindo Pillai authored
[Why & How] When k1 and k2 divider programming logic is executed for a phantom stream, the corresponding master stream should be used for the calculation. Fix the if condition to use the master stream for checking signal type instead of the phantom stream. Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alvin Lee authored
[Description] - Single 4K60 playing YUV420 MPO video blocks P-State because the required VRatio for prefetch is too high (luma plane for YUV420 is 1bpe, so swath height is 16 and prefetch requires more lines) - Allow max vratio per plane to be 7.9 for single display YUV420 MPO video cases - Ensure that global vratio prefetch (i.e. total prefetch BW vs. total active bandwidth) does not excited 4.0 Reviewed-by: Nevenko Stupar <Nevenko.Stupar@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Nicholas Kazlauskas authored
[Why] DOMAIN power gating control is now required to be done via firmware due to interlock with other power features. This is to avoid intermittent issues in the LB memories. [How] If the firmware supports the command then use the new firmware as the sequence can avoid potential display corruption issues. The command will be ignored on firmware that does not support DOMAIN power control and the pipes will remain always on - frequent PG cycling can cause the issue to occur on the old sequence, so we should avoid it. Reviewed-by: Hansen Dsouza <hansen.dsouza@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Kenneth Feng authored
implement mode2 reset on smu_v13_0_10 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Xiaogang Chen authored
When xnack is on user space can use svm page restore to set a vm range without setup it first, then use regular api to register. Currently kfd api and svm are not interoperable. We already have check on that, but for user buffer the mapping address is not same as buffer cpu virtual address. Add checking on that to avoid error propagate to hmm. Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Ma Jun authored
Checking INVOKE_CMD to fix the below warning info when unload or remove amdgpu driver [ 319.489809] Call Trace: [ 319.489810] <TASK> [ 319.489812] psp_ta_unload+0x9a/0xd0 [amdgpu] [ 319.489926] ? smu_smc_hw_cleanup+0x2f6/0x360 [amdgpu] [ 319.490072] psp_hw_fini+0xea/0x170 [amdgpu] [ 319.490231] amdgpu_device_fini_hw+0x2fc/0x413 [amdgpu] [ 319.490398] ? blocking_notifier_chain_unregister+0x56/0xb0 [ 319.490401] amdgpu_driver_unload_kms+0x51/0x60 [amdgpu] [ 319.490493] amdgpu_pci_remove+0x5a/0x140 [amdgpu] [ 319.490583] ? __pm_runtime_resume+0x60/0x90 [ 319.490586] pci_device_remove+0x3b/0xb0 [ 319.490588] __device_release_driver+0x1a8/0x2a0 [ 319.490591] driver_detach+0xf3/0x140 [ 319.490593] bus_remove_driver+0x6c/0xf0 [ 319.490595] driver_unregister+0x31/0x60 [ 319.490597] pci_unregister_driver+0x40/0x90 [ 319.490599] amdgpu_exit+0x15/0x44e [amdgpu] Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 09 Feb, 2023 14 commits
-
-
Alex Deucher authored
This reverts commit 3cc67fe1. Some users have reported flickerng with S/G display. We've tried extensively to reproduce and debug the issue on a wide variety of platform configurations (DRAM bandwidth, etc.) and a variety of monitors, but so far have not been able to. We disabled S/G display on a number of platforms to address this but that leads to failure to pin framebuffers errors and blank displays when there is memory pressure or no displays at all on systems with limited carveout (e.g., Chromebooks). We have a parameter to disable this as a debugging option as a way for users to disable this, depending on their use case, and for us to help debug this further. Having this enabled seems like the lesser of to evils. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
This reverts commit 2404f9b0. Some users have reported flickerng with S/G display. We've tried extensively to reproduce and debug the issue on a wide variety of platform configurations (DRAM bandwidth, etc.) and a variety of monitors, but so far have not been able to. We disabled S/G display on a number of platforms to address this but that leads to failure to pin framebuffers errors and blank displays when there is memory pressure or no displays at all on systems with limited carveout (e.g., Chromebooks). We have a parameter to disable this as a debugging option as a way for users to disable this, depending on their use case, and for us to help debug this further. Having this enabled seems like the lesser of to evils. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
This reverts commit f081cd4c. Some users have reported flickerng with S/G display. We've tried extensively to reproduce and debug the issue on a wide variety of platform configurations (DRAM bandwidth, etc.) and a variety of monitors, but so far have not been able to. We disabled S/G display on a number of platforms to address this but that leads to failure to pin framebuffers errors and blank displays when there is memory pressure or no displays at all on systems with limited carveout (e.g., Chromebooks). We have a parameter to disable this as a debugging option as a way for users to disable this, depending on their use case, and for us to help debug this further. Having this enabled seems like the lesser of to evils. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
Some users have reported flickerng with S/G display. We've tried extensively to reproduce and debug the issue on a wide variety of platform configurations (DRAM bandwidth, etc.) and a variety of monitors, but so far have not been able to. We disabled S/G display on a number of platforms to address this but that leads to failure to pin framebuffers errors and blank displays when there is memory pressure or no displays at all on systems with limited carveout (e.g., Chromebooks). Add a option to disable this as a debugging option as a way for users to disable this, depending on their use case, and for us to help debug this further. v2: fix typo Reviewed-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Colin Ian King authored
The function name is being reported as dc_link_contruct when it is actually dc_link_construct_phy. Fix this by using %s and the __func__ for the function name. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Ye Xingchen authored
link_hwss.h is included more than once in link_dpms.c . Signed-off-by: Ye Xingchen <ye.xingchen@zte.com.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Arnd Bergmann authored
When CONFIG_DRM_AMD_DC_DCN is disabled, the is_frl member is not defined: drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_validation.c: In function 'dp_active_dongle_validate_timing': drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_validation.c:126:66: error: 'const struct dc_dsc_config' has no member named 'is_frl' 126 | if (timing->flags.DSC && !timing->dsc_cfg.is_frl) | ^ Use the same #ifdef as the other references to this. Fixes: 54618888 ("drm/amd/display: break down dc_link.c") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Tom Rix authored
smatch reports drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c:90:6: warning: symbol 'should_disable_otg' was not declared. Should it be static? should_disable_otg() is only used in dcn315_clk_mgr.c, so it should be static Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
Use fb_start/end for consistency with gmc code for non- XGMI systems, they are equivalent to vram_start/end. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
Need to cover both FB and AGP apertures. v2: fix missed gfxhub_v3_0_3.c Fixes: c6eafee0 ("Revert "Revert "drm/amdgpu/gmc11: enable AGP aperture""") Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Hamza Mahfooz authored
As made mention of in commit 4ea7fc09 ("drm/amd/display: Do not program interrupt status on disabled crtc"), we shouldn't program disabled crtcs. So, filter out disabled crtcs in dm_set_vupdate_irq() and dm_set_vblank(). Reviewed-by: Harry Wentland <harry.wentland@amd.com> Fixes: 589d2739 ("drm/amd/display: Use crtc enable/disable_vblank hooks") Fixes: d2574c33 ("drm/amd/display: In VRR mode, do DRM core vblank handling at end of vblank. (v2)") Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Jane Jian authored
sriov does not need to init pptable from amdgpu driver we finish it from PF Signed-off-by: Jane Jian <Jane.Jian@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
JesseZhang authored
test ib function is not necessary on hw ring, so remove it. v2: squash in NULL check fix Signed-off-by: JesseZhang <Jesse.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Guilherme G. Piccoli authored
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini routine - such function is expected to be called only after the respective init function - drm_sched_init() - was executed successfully. Happens that we faced a driver probe failure in the Steam Deck recently, and the function drm_sched_fini() was called even without its counter-part had been previously called, causing the following oops: amdgpu: probe of 0000:04:00.0 failed with error -110 BUG: kernel NULL pointer dereference, address: 0000000000000090 PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338 Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022 RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched] [...] Call Trace: <TASK> amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu] amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu] amdgpu_driver_release_kms+0x16/0x30 [amdgpu] devm_drm_dev_init_release+0x49/0x70 [...] To prevent that, check if the drm_sched was properly initialized for a given ring before calling its fini counter-part. Notice ideally we'd use sched.ready for that; such field is set as the latest thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such field - in the above oops for example, it was a GFX ring causing the crash, and the sched.ready field was set to true in the ring init routine, regardless of the state of the DRM scheduler. Hence, we ended-up using sched.ops as per Christian's suggestion [0], and also removed the no_scheduler check [1]. [0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/ [1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/ Fixes: 067f44c8 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)") Suggested-by: Christian König <christian.koenig@amd.com> Cc: Guchun Chen <guchun.chen@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-