Commits · 4b617e2b9e5473747ef925dd2e47239acb9925c2 · Kirill Smelkov / linux

16 Sep, 2019 40 commits

drm/amdkfd: Swap trap temporary registers in gfx10 trap handler · 4b617e2b

Jay Cornwall authored Sep 12, 2019

ttmp[4:5] hold information useful to the debugger. Use ttmp[14:15]
instead, aligning implementation with gfx9 trap handler.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: shaoyun liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4b617e2b

drm/amdgpu: remove the redundant null checks · 20323246

zhong jiang authored Sep 03, 2019

debugfs_remove and kfree has taken the null check in account.
hence it is unnecessary to check it. Just remove the condition.
No functional change.

This issue was detected by using the Coccinelle software.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

20323246

drm/radeon: be quiet when no SAD block is found · 72496eb1

Jean Delvare authored Sep 04, 2019

It is fine for displays without audio functionality to not provide
any SAD block in their EDID. Do not log an error in that case,
just return quietly.

Inspired by a similar fix to the amdgpu driver in the context of bug
fdo#107825:
https://bugs.freedesktop.org/show_bug.cgi?id=107825Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

72496eb1

drm/amd: be quiet when no SAD block is found · ae2a3495

Jean Delvare authored Sep 04, 2019

It is fine for displays without audio functionality to not provide
any SAD block in their EDID. Do not log an error in that case,
just return quietly.

This fixes half of bug fdo#107825:
https://bugs.freedesktop.org/show_bug.cgi?id=107825Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ae2a3495

drm/amdgpu: Check for valid number of registers to read · 13238d4f

Trek authored Aug 31, 2019

Do not try to allocate any amount of memory requested by the user.
Instead limit it to 128 registers. Actually the longest series of
consecutive allowed registers are 48, mmGB_TILE_MODE0-31 and
mmGB_MACROTILE_MODE0-15 (0x2644-0x2673).

Bug: https://bugs.freedesktop.org/show_bug.cgi?id=111273Signed-off-by: Trek <trek00@inbox.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

13238d4f

drm/radeon: Bail earlier when radeon.cik_/si_support=0 is passed · 80f349ce

Hans de Goede authored Sep 07, 2019

Bail from the pci_driver probe function instead of from the drm_driver
load function.

This avoid /dev/dri/card0 temporarily getting registered and then
unregistered again, sending unwanted add / remove udev events to
userspace.

Specifically this avoids triggering the (userspace) bug fixed by this
plymouth merge-request:
https://gitlab.freedesktop.org/plymouth/plymouth/merge_requests/59

Note that despite that being an userspace bug, not sending unnecessary
udev events is a good idea in general.

BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1490490Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

80f349ce

drm/amd/display: rename variable eanble -> enable · 60233044

Colin Ian King authored Sep 13, 2019

There is a spelling mistake in the variable name eanble,
rename it to enable.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

60233044

Revert "drm/amdgpu/nbio7.4: add hw bug workaround for vega20" · 3e103fc3

Kent Russell authored Sep 10, 2019

This reverts commit e01f2d41.

VG20 did not require this workaround, as the fix is in the VBIOS.
Leave VG10/12 workaround as some older shipped cards do not have the
VBIOS fix in place, and the kernel workaround is required in those
situations
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3e103fc3

drm/amdgpu: add graceful VM fault handling v3 · ec671737

Christian König authored Dec 07, 2018

Next step towards HMM support. For now just silence the retry fault and
optionally redirect the request to the dummy page.

v2: make sure the VM is not destroyed while we handle the fault.
v3: fix VM destroy check, cleanup comments
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ec671737

drm/amdgpu: reserve the root PD while freeing PASIDs · b65709a9

Christian König authored Jul 17, 2019

Free the pasid only while the root PD is reserved. This prevents use after
free in the page fault handling.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b65709a9

drm/amdgpu: allocate PDs/PTs with no_gpu_wait in a page fault · 061468c4

Christian König authored Sep 16, 2019

While handling a page fault we can't wait for other ongoing GPU
operations or we can potentially run into deadlocks.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

061468c4

drm/amdgpu: allow direct submission of clears · 0f6064d6

Christian König authored Mar 28, 2019

For handling PD/PT clears directly in the fault handler.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0f6064d6

drm/amdgpu: allow direct submission of PTE updates · acb476f5

Christian König authored Mar 27, 2019

For handling PTE updates directly in the fault handler.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

acb476f5

drm/amdgpu: allow direct submission of PDE updates v2 · 807e2994

Christian König authored Mar 14, 2019

For handling PDE updates directly in the fault handler.

v2: fix typo in comment
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

807e2994

drm/amdgpu: allow direct submission in the VM backends v2 · 47ca7efa

Christian König authored Sep 16, 2019

This allows us to update page tables directly while in a page fault.

v2: use direct/delayed entities and still wait for moves
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

47ca7efa

drm/amdgpu: split the VM entity into direct and delayed · a2cf3247

Christian König authored Jul 19, 2019

For page fault handling we need to use a direct update which can't be
blocked by ongoing user CS.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a2cf3247

drm/ttm: return -EBUSY on pipelining with no_gpu_wait (v2) · 3084cf46

Christian König authored Sep 16, 2019

Setting the no_gpu_wait flag means that the allocate BO must be available
immediately and we can't wait for any GPU operation to finish.

v2: squash in mem leak fix, rebase
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3084cf46

drm/amdgpu: grab the id mgr lock while accessing passid_mapping · 6817bf28

Christian König authored Sep 09, 2019

Need to make sure that we actually dropping the right fence.
Could be done with RCU as well, but to complicated for a fix.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6817bf28

drm/amdgpu/SRIOV: Navi12 SRIOV VF doesn't load TOC · 1b657824

Jiange Zhao authored Sep 12, 2019

In SRIOV case, the autoload sequence is the same

as bare metal, except VF won't load TOC.
Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1b657824

drm/amdgpu/SRIOV: Navi10/12 VF doesn't support SMU · a4ac7693

Jiange Zhao authored Sep 12, 2019

In SRIOV case, SMU and powerplay are handled in HV.

VF shouldn't have control over SMU and powerplay.
Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a4ac7693

drm/amd/amdgpu: power up sdma engine when S3 resume back · a90a24d5

Prike Liang authored Sep 11, 2019

The sdma_v4 should be ungated when the IP resume back,
otherwise it will hang up and resume time out error.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a90a24d5

drm/amdgpu: For Navi12 SRIOV VF, register mailbox functions · b05b6903

Jiange Zhao authored Sep 11, 2019

Mailbox functions and interrupts are only for Navi12 VF.

Register functions and irqs during initialization.
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b05b6903

drm/amdgpu/sriov: add ring_stop before ring_create in psp v11 code · 51c0f58e

Jack Zhang authored Sep 10, 2019

psp  v11 code missed ring stop in ring create function(VMR)
while psp v3.1 code had the code. This will cause VM destroy1
fail and psp ring create fail.

For SIOV-VF, ring_stop should not be deleted in ring_create
function.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

51c0f58e

drm/amd/powerplay: check SMU engine readiness before proceeding on S3 resume · f7e3a577

Evan Quan authored Sep 11, 2019

This is especially needed for non-psp loading way.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f7e3a577

drm/amd/powerplay: properly set mp1 state for SW SMU suspend/reset routine · 0e0b89c0

Evan Quan authored Sep 11, 2019

Set mp1 state properly for SW SMU suspend/reset routine.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0e0b89c0

drm/amdgpu: Fix KFD-related kernel oops on Hawaii · d950800e

Felix Kuehling authored Sep 05, 2019

Hawaii needs to flush caches explicitly, submitting an IB in a user
VMID from kernel mode. There is no s_fence in this case.

Fixes: eb3961a5 ("drm/amdgpu: remove fence context from the job")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d950800e

drm/amdgpu: Fix mutex lock from atomic context. · 708901a6

Andrey Grodzovsky authored Sep 10, 2019

Problem:
amdgpu_ras_reserve_bad_pages was moved to amdgpu_ras_reset_gpu
because writing to EEPROM during ASIC reset was unstable.
But for ERREVENT_ATHUB_INTERRUPT amdgpu_ras_reset_gpu is called
directly from ISR context and so locking is not allowed. Also it's
irrelevant for this partilcular interrupt as this is generic RAS
interrupt and not memory errors specific.

Fix:
Avoid calling amdgpu_ras_reserve_bad_pages if not in task context.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

708901a6

drm/amdgpu: Add SRIOV mailbox backend for Navi1x · 3636169c

Jiange Zhao authored Sep 04, 2019

Mimic the ones for Vega10, add mailbox backend for Navi1x
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3636169c

drm/amdgpu: implement ras query function for pcie bif · 1a3f2e8c

Guchun Chen authored Sep 11, 2019

ras error query funtionality implementation
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1a3f2e8c

drm/amdgpu: add pcie bif ras related registers · d7b1ed4a

Guchun Chen authored Sep 11, 2019

These registers will be accessed for querying ras errors.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d7b1ed4a

drm/amdgpu: support pcie bif ras query and inject · d7bd680d

Guchun Chen authored Sep 11, 2019

Call pcie bif ras query/inject in amdgpu ras.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d7bd680d

drm/amdgpu: add ras error query count interface for nbio · 52652ef2

Guchun Chen authored Sep 04, 2019

Add the interface query_ras_error_count for nbio.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

52652ef2

drm/amdgpu: fix CPDMA hang in PRT mode for VEGA10 · ff9d0971

Tianci.Yin authored Sep 10, 2019

add and_mask since the programming logic of golden setting changed
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ff9d0971

drm/amdgpu: enable error injection to XGMI block via debugfs · f3170352

Hawking Zhang authored Sep 08, 2019

allow inject error to XGMI block via debugfs node ras_ctrl
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f3170352

drm/amdgpu: initialize ras structures for xgmi block (v2) · 029fbd43

Hawking Zhang authored Sep 10, 2019

init ras common interface and fs node for xgmi block

v2: remove unnecessary physical node number check before
invoking amdgpu_xgmi_ras_late_init
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

029fbd43

drm/amdkfd: fix the missed asic name while inited renoir_device_info · acb9acbe

Huang Rui authored Sep 10, 2019

This patch fixes null pointer issue below, I missed to init the asic renior name
while I rebase the patches.

[  106.004250] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  106.004254] #PF: supervisor read access in kernel mode
[  106.004256] #PF: error_code(0x0000) - not-present page
[  106.004257] PGD 0 P4D 0
[  106.004261] Oops: 0000 [#1] SMP NOPTI
[  106.004264] CPU: 3 PID: 1422 Comm: modprobe Not tainted 5.2.0-rc1-custom #1
[  106.004266] Hardware name: AMD Celadon-RN/Celadon-RN, BIOS
WCD9814N_Weekly_19_08_1 08/14/2019
[  106.004272] RIP: 0010:strncpy+0x12/0x30
[  106.004274] Code: c1 c0 11 48 c1 c6 15 48 31 d0 48 c1 c2 20 31 c2 89 d0 31 f0
41 5c 5d c3 55 48 85 d2 48 89 f8 48 89 e5 74 1e 48 01 fa 48 89 f9 <44> 0f b6 06
41 80 f8 01 44 88 01 48 83 de ff 48 83 c1 01 48 39 d1
[  106.004278] RSP: 0018:ffffc092c1fd37a8 EFLAGS: 00010286
[  106.004281] RAX: ffff9e943466a28c RBX: 00000000000036ed RCX: ffff9e943466a28c
[  106.004283] RDX: ffff9e943466a2ac RSI: 0000000000000000 RDI: ffff9e943466a28c
[  106.004285] RBP: ffffc092c1fd37a8 R08: ffff9e943d100000 R09: 0000000000000228
[  106.004287] R10: ffff9e94418dc5a8 R11: ffff9e944746c0d0 R12: 0000000000000000
[  106.004289] R13: ffff9e943fa1ec00 R14: ffff9e943466a200 R15: ffff9e943466a200
[  106.004291] FS:  00007f7a022c5540(0000) GS:ffff9e9447ac0000(0000)
knlGS:0000000000000000
[  106.004294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  106.004296] CR2: 0000000000000000 CR3: 00000001ff0b0000 CR4: 0000000000340ee0
[  106.004298] Call Trace:
[  106.004382]  kfd_topology_add_device+0x150/0x610 [amdgpu]
[  106.004445]  kgd2kfd_device_init+0x2e0/0x4f0 [amdgpu]
[  106.004509]  amdgpu_amdkfd_device_init+0x14c/0x1b0 [amdgpu]
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-and-Tested-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

acb9acbe

drm/amd/display: Implement voltage limitation for dali · d1082e23

Bhawanpreet Lakha authored Sep 06, 2019

[Why]
we only want the lowest voltage to be available for dali.

[How]
Use the get_highest_allowed_voltage_level function
to return 0 for dali
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d1082e23

drm/amd/display: add Asic ID for Dali · c4cacce7

Bhawanpreet Lakha authored Sep 06, 2019

Dali is a new asic revision based on raven2

Add the ID and ASICREV_IS_DALI define
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c4cacce7

drm/amdgpu: Allow to reset to EERPOM table. · 084fe13b

Andrey Grodzovsky authored Sep 09, 2019

The table grows quickly during debug/development effort when
multiple RAS errors are injected. Allow to avoid this by setting
table header back to empty if needed.

v2: Switch to debugfs entry instead of load time parameter.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

084fe13b

drm/amdgpu: Add amdgpu_ras_eeprom_reset_table · d01b400b

Andrey Grodzovsky authored Sep 09, 2019

This will allow to reset the table on the fly.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d01b400b