    drm/amdgpu: fix kernel page fault issue by ras recovery on sGPU · 12c17b9d
    Guchun Chen authored
    When running RAS uncorrectable error injection and triggering a GPU
    reset on sGPU, the issue below is observed. It is caused by accessing
    a list that has not been initialized.
    
    [   80.047227] BUG: unable to handle page fault for address: ffffffffc0f4f750
    [   80.047300] #PF: supervisor write access in kernel mode
    [   80.047351] #PF: error_code(0x0003) - permissions violation
    [   80.047404] PGD 12c20e067 P4D 12c20e067 PUD 12c210067 PMD 41c4ee067 PTE 404316061
    [   80.047477] Oops: 0003 [#1] SMP PTI
    [   80.047516] CPU: 7 PID: 377 Comm: kworker/7:2 Tainted: G           OE     5.4.0-rc7-guchchen #1
    [   80.047594] Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
    [   80.047888] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
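
The failure class named in the description (using a list before it is initialized) can be sketched in user space. This is only an illustration under stated assumptions, not the change made by this commit: `struct list_head`, `init_list_head()` and `list_add_tail()` below are simplified stand-ins modeled on the kernel's list helpers, and `list_count()`, `demo_single_device_list()`, `device_list` and `device_node` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct list_head (illustration only). */
struct list_head {
    struct list_head *next, *prev;
};

/* Modeled on INIT_LIST_HEAD(): an empty list points at itself. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Modeled on list_add_tail(): links entry in just before head. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    struct list_head *last = head->prev; /* stray pointer if head was never initialized */

    entry->next = head;
    entry->prev = last;
    last->next = entry; /* write through 'last': this is where a stray pointer faults */
    head->prev = entry;
}

/* Hypothetical helper: counts entries by walking until we are back at head. */
static size_t list_count(const struct list_head *head)
{
    size_t n = 0;

    for (const struct list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}

/* Hypothetical demo: a stack-local list that is initialized before use. */
static size_t demo_single_device_list(void)
{
    struct list_head device_list; /* holds stack garbage until initialized */
    struct list_head device_node;

    init_list_head(&device_list); /* omitting this call is the bug class described above */
    assert(list_count(&device_list) == 0);

    list_add_tail(&device_node, &device_list);
    return list_count(&device_list);
}
```

Calling `demo_single_device_list()` returns 1. If the `init_list_head()` call is dropped, `list_add_tail()` writes through whatever value happens to sit in `device_list.prev`, which is undefined behavior in user space and, in a kernel context, can surface as a write fault of the kind logged above.
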
    Signed-off-by: Guchun Chen <guchun.chen@amd.com>
    Reviewed-by: John Clements <John.Clements@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_ras.c 50.3 KB