1. 25 Jan, 2022 34 commits
  2. 21 Jan, 2022 2 commits
    • drm/amdgpu: fix the page fault caused by uninitialized variables · a357dca9
      Xiaojian Du authored
      This patch will fix the page fault caused by uninitialized variables.
      
      Error Log:
      ......
      [ 130.246323] [drm] GART: num cpu pages 131072, num gpu pages 131072
      [ 131.963112] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
      [ 131.963130] BUG: unable to handle page fault for address: 000000000002db80
      [ 131.963181] #PF: supervisor write access in kernel mode
      [ 131.963210] #PF: error_code(0x0002) - not-present page
      [ 131.963233] PGD 0 P4D 0
      [ 131.963253] Oops: 0002 [#1] SMP NOPTI
      [ 131.963273] CPU: 3 PID: 1411 Comm: modprobe Not tainted 5.13.0+ #1
      [ 131.963338] RIP: 0010:osq_lock+0x4d/0x120
      [ 131.963381] Code: 10 00 00 00 00 48 c7 02 00 00 00 00 89 42 14 87 07 85 c0 0f 84 d0 00 00 00 83 e8 01 48 98 48 03 0c c5 00 d9 ea 9c 48 89 4a 08 <48> 89 11 44 8b 42 10 45 85 c0 0f 85 af 00 00 00 55 48 89 fe 65 4c
      [ 131.963460] RSP: 0018:ffffa40481717768 EFLAGS: 00010202
      [ 131.963483] RAX: fffffffffffffffe RBX: ffffa40481717920 RCX: 000000000002db80
      [ 131.963520] RDX: ffff9256fecedb80 RSI: ffff9256cbed2e80 RDI: ffffa40481717ac4
      [ 131.963547] RBP: ffffa40481717808 R08: ffffa40481717920 R09: 00000000ffffffff
      [ 131.963582] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
      [ 131.963609] R13: ffffa40481717ac4 R14: ffffa40481717ab8 R15: ffff9256c9480000
      [ 131.963646] FS: 00007f23d9b9c540(0000) GS:ffff9256fecc0000(0000) knlGS:0000000000000000
      [ 131.963687] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 131.963721] CR2: 000000000002db80 CR3: 0000000008444000 CR4: 00000000000506e0
      [ 131.963758] Call Trace:
      [ 131.963772] ? __ww_mutex_lock.isra.0+0x3a2/0x760
      [ 131.963810] ? prb_read_valid+0x1c/0x20
      [ 131.963830] ? console_unlock+0x2fe/0x4f0
      [ 131.963849] __ww_mutex_lock_interruptible_slowpath+0x16/0x20
      [ 131.963882] ww_mutex_lock_interruptible+0x83/0x90
      [ 131.963908] amdgpu_bo_create_reserved+0xf0/0x1e0 [amdgpu]
      [ 131.964237] amdgpu_bo_create_kernel+0x17/0x80 [amdgpu]
      [ 131.964509] amdgpu_gmc_vram_checking+0x41/0xf0 [amdgpu]
      [ 131.964807] gmc_v10_0_hw_init+0x105/0x120 [amdgpu]
      [ 131.965108] amdgpu_device_init.cold+0x1aa4/0x1e3e [amdgpu]
      ......
      Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
      Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
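The commit message does not say which variable was left uninitialized. One pattern that would be consistent with the trace (a fault while taking a lock inside a create-and-reserve helper) is a "create or reuse" function that allocates only when the caller's handle is NULL and then locks whatever the handle points to. The user-space sketch below illustrates that pattern under this assumption; `struct buffer` and `buffer_create_reserved` are hypothetical names and are not the amdgpu API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lockable buffer object. */
struct buffer {
    int locked;
};

/*
 * Create-or-reuse helper (hypothetical): allocates only when *buf_ptr is
 * NULL, then locks whatever *buf_ptr points to.
 */
static int buffer_create_reserved(struct buffer **buf_ptr)
{
    assert(buf_ptr != NULL);

    if (*buf_ptr == NULL) {
        *buf_ptr = calloc(1, sizeof(**buf_ptr));
        if (*buf_ptr == NULL)
            return -1;
    }

    /*
     * If the caller left the handle uninitialized, *buf_ptr is garbage
     * here and this write goes through a bogus address.
     */
    (*buf_ptr)->locked = 1;
    return 0;
}
```

With this kind of helper the caller has to declare the handle as `struct buffer *buf = NULL;` before passing `&buf`. Leaving it uninitialized skips the allocation branch whenever the stack slot happens to be non-zero.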
    • drm/amdgpu: fix convert bad page retirement · 37ff945f
      Stanley.Yang authored
      PMFW reads the ECC info registers and stores the values in
      eccinfo_table in the following order:
      
      umc0 ch_inst 0, 1, 2 ... 7
      umc1 ch_inst 0, 1, 2 ... 7
      ...
      umc3 ch_inst 0, 1, 2 ... 7
      
      The driver should convert eccinfo_table_idx to channel_index according
      to channel_idx_tbl.
      Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
      Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
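The index conversion described in this commit message can be sketched as follows. The layout (4 UMC instances, 8 channel instances each, stored UMC-major) comes from the message; the table contents, the macro names, and the function `eccinfo_idx_to_channel_index` are hypothetical placeholders for illustration, not the driver's real values or API.

```c
#include <assert.h>
#include <stddef.h>

#define UMC_INST_NUM     4  /* umc0 .. umc3, as listed in the commit message */
#define CHANNEL_INST_NUM 8  /* ch_inst 0 .. 7 per UMC instance */

/*
 * Placeholder mapping from (umc_inst, ch_inst) to channel index.
 * These values are invented for the example; the real channel_idx_tbl
 * lives in the driver.
 */
static const unsigned int channel_idx_tbl[UMC_INST_NUM][CHANNEL_INST_NUM] = {
    { 0, 4,  8, 12, 16, 20, 24, 28 },
    { 1, 5,  9, 13, 17, 21, 25, 29 },
    { 2, 6, 10, 14, 18, 22, 26, 30 },
    { 3, 7, 11, 15, 19, 23, 27, 31 },
};

/*
 * Split a flat eccinfo_table index into (umc_inst, ch_inst), then look up
 * the channel index. Returns 0 on success, -1 if the index is out of range.
 */
static int eccinfo_idx_to_channel_index(unsigned int eccinfo_table_idx,
                                        unsigned int *channel_index)
{
    unsigned int umc_inst, ch_inst;

    assert(channel_index != NULL);

    if (eccinfo_table_idx >= UMC_INST_NUM * CHANNEL_INST_NUM)
        return -1;

    umc_inst = eccinfo_table_idx / CHANNEL_INST_NUM;
    ch_inst = eccinfo_table_idx % CHANNEL_INST_NUM;

    *channel_index = channel_idx_tbl[umc_inst][ch_inst];
    return 0;
}
```

For example, flat index 9 splits into umc1, ch_inst 1, so the lookup reads `channel_idx_tbl[1][1]`. Using the flat index directly as the channel index would skip that table lookup.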
  3. 20 Jan, 2022 4 commits