Commits · 061468c405fdf3b518d03bebbeafa0a9dc7300c2 · Kirill Smelkov / linux

16 Sep, 2019 40 commits

drm/amdgpu: allocate PDs/PTs with no_gpu_wait in a page fault · 061468c4

Christian König authored Sep 16, 2019

While handling a page fault we can't wait for other ongoing GPU
operations or we can potentially run into deadlocks.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: allow direct submission of clears · 0f6064d6

Christian König authored Mar 28, 2019

For handling PD/PT clears directly in the fault handler.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: allow direct submission of PTE updates · acb476f5

Christian König authored Mar 27, 2019

For handling PTE updates directly in the fault handler.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: allow direct submission of PDE updates v2 · 807e2994

Christian König authored Mar 14, 2019

For handling PDE updates directly in the fault handler.

v2: fix typo in comment
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: allow direct submission in the VM backends v2 · 47ca7efa

Christian König authored Sep 16, 2019

This allows us to update page tables directly while in a page fault.

v2: use direct/delayed entities and still wait for moves
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: split the VM entity into direct and delayed · a2cf3247

Christian König authored Jul 19, 2019

For page fault handling we need to use a direct update which can't be
blocked by ongoing user CS.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/ttm: return -EBUSY on pipelining with no_gpu_wait (v2) · 3084cf46

Christian König authored Sep 16, 2019

Setting the no_gpu_wait flag means that the allocated BO must be available
immediately and we can't wait for any GPU operation to finish.

v2: squash in mem leak fix, rebase
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
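
The behaviour described in this message can be sketched in plain C. This is only an illustration under stated assumptions: `sketch_ctx`, `sketch_mem` and `sketch_reserve` are hypothetical names, not the real TTM structures or functions, and "waiting" is modelled by simply clearing a flag.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the real TTM structures. */
struct sketch_ctx {
    bool no_gpu_wait;   /* caller must not wait for GPU work */
};

struct sketch_mem {
    bool gpu_busy;      /* an earlier GPU operation still uses this memory */
};

/*
 * Try to hand out memory. If the GPU still uses it, a normal caller
 * pipelines behind that work, but a no_gpu_wait caller gets -EBUSY
 * and can try something else instead of blocking.
 */
int sketch_reserve(const struct sketch_ctx *ctx, struct sketch_mem *mem)
{
    if (mem->gpu_busy) {
        if (ctx->no_gpu_wait)
            return -EBUSY;
        /* Stand-in for waiting on the pending GPU operation. */
        mem->gpu_busy = false;
    }
    return 0;
}
```

A page fault handler would set `no_gpu_wait` and treat `-EBUSY` as "pick another placement", while a regular command submission would leave the flag clear.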

drm/amdgpu: grab the id mgr lock while accessing passid_mapping · 6817bf28

Christian König authored Sep 09, 2019

We need to make sure that we are actually dropping the right fence.
This could be done with RCU as well, but that is too complicated for a fix.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/SRIOV: Navi12 SRIOV VF doesn't load TOC · 1b657824

Jiange Zhao authored Sep 12, 2019

In SRIOV case, the autoload sequence is the same
as bare metal, except VF won't load TOC.
Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/SRIOV: Navi10/12 VF doesn't support SMU · a4ac7693

Jiange Zhao authored Sep 12, 2019

In SRIOV case, SMU and powerplay are handled in HV.
VF shouldn't have control over SMU and powerplay.
Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: power up sdma engine when S3 resume back · a90a24d5

Prike Liang authored Sep 11, 2019

The sdma_v4 engine should be ungated when the IP resumes,
otherwise it will hang and the resume fails with a timeout error.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: For Navi12 SRIOV VF, register mailbox functions · b05b6903

Jiange Zhao authored Sep 11, 2019

Mailbox functions and interrupts are only for Navi12 VF.
Register functions and irqs during initialization.
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/sriov: add ring_stop before ring_create in psp v11 code · 51c0f58e

Jack Zhang authored Sep 10, 2019

The psp v11 code missed the ring stop in the ring create function (VMR),
while the psp v3.1 code had it. This causes VM destroy1 to fail
and the psp ring create to fail.

For SRIOV VF, ring_stop should not be deleted from the ring_create
function.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: check SMU engine readiness before proceeding on S3 resume · f7e3a577

Evan Quan authored Sep 11, 2019

This is especially needed for the non-psp loading path.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: properly set mp1 state for SW SMU suspend/reset routine · 0e0b89c0

Evan Quan authored Sep 11, 2019

Set mp1 state properly for SW SMU suspend/reset routine.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix KFD-related kernel oops on Hawaii · d950800e

Felix Kuehling authored Sep 05, 2019

Hawaii needs to flush caches explicitly, submitting an IB in a user
VMID from kernel mode. There is no s_fence in this case.

Fixes: eb3961a5 ("drm/amdgpu: remove fence context from the job")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix mutex lock from atomic context. · 708901a6

Andrey Grodzovsky authored Sep 10, 2019

Problem:
amdgpu_ras_reserve_bad_pages was moved to amdgpu_ras_reset_gpu
because writing to EEPROM during ASIC reset was unstable.
But for ERREVENT_ATHUB_INTERRUPT, amdgpu_ras_reset_gpu is called
directly from ISR context, so locking is not allowed. It is also
irrelevant for this particular interrupt, as this is a generic RAS
interrupt and not specific to memory errors.

Fix:
Avoid calling amdgpu_ras_reserve_bad_pages if not in task context.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
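
The fix pattern described above, skipping work that may sleep when the caller is not in task context, can be sketched as follows. Everything here is a hypothetical userspace stand-in: `sketch_in_task_context` replaces the kernel's own context check, and the `sketch_*` functions are not the driver's real functions.

```c
#include <stdbool.h>

/* Hypothetical flag standing in for the kernel's context check. */
static bool sketch_in_task_context = true;

/* Counts how often the sleeping helper ran, for demonstration only. */
static int sketch_reserve_calls;

/* Stand-in for a helper that takes a mutex and may therefore sleep. */
static void sketch_reserve_bad_pages(void)
{
    sketch_reserve_calls++;
}

/* Reset entry point that is reachable from task and interrupt context. */
void sketch_ras_reset(void)
{
    /* Only do the work that may sleep when sleeping is allowed. */
    if (sketch_in_task_context)
        sketch_reserve_bad_pages();

    /* The actual reset would be scheduled here in either case. */
}
```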

drm/amdgpu: Add SRIOV mailbox backend for Navi1x · 3636169c

Jiange Zhao authored Sep 04, 2019

Mimic the ones for Vega10, add mailbox backend for Navi1x
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: implement ras query function for pcie bif · 1a3f2e8c

Guchun Chen authored Sep 11, 2019

Implement the ras error query functionality.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add pcie bif ras related registers · d7b1ed4a

Guchun Chen authored Sep 11, 2019

These registers will be accessed for querying ras errors.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: support pcie bif ras query and inject · d7bd680d

Guchun Chen authored Sep 11, 2019

Call pcie bif ras query/inject in amdgpu ras.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add ras error query count interface for nbio · 52652ef2

Guchun Chen authored Sep 04, 2019

Add the interface query_ras_error_count for nbio.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix CPDMA hang in PRT mode for VEGA10 · ff9d0971

Tianci.Yin authored Sep 10, 2019

add and_mask since the programming logic of golden setting changed
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: enable error injection to XGMI block via debugfs · f3170352

Hawking Zhang authored Sep 08, 2019

Allow injecting errors into the XGMI block via the debugfs node ras_ctrl.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: initialize ras structures for xgmi block (v2) · 029fbd43

Hawking Zhang authored Sep 10, 2019

init ras common interface and fs node for xgmi block

v2: remove unnecessary physical node number check before
invoking amdgpu_xgmi_ras_late_init
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: fix the missed asic name while inited renoir_device_info · acb9acbe

Huang Rui authored Sep 10, 2019

This patch fixes the NULL pointer dereference below. I missed initializing
the renoir asic name while rebasing the patches.

[  106.004250] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  106.004254] #PF: supervisor read access in kernel mode
[  106.004256] #PF: error_code(0x0000) - not-present page
[  106.004257] PGD 0 P4D 0
[  106.004261] Oops: 0000 [#1] SMP NOPTI
[  106.004264] CPU: 3 PID: 1422 Comm: modprobe Not tainted 5.2.0-rc1-custom #1
[  106.004266] Hardware name: AMD Celadon-RN/Celadon-RN, BIOS WCD9814N_Weekly_19_08_1 08/14/2019
[  106.004272] RIP: 0010:strncpy+0x12/0x30
[  106.004274] Code: c1 c0 11 48 c1 c6 15 48 31 d0 48 c1 c2 20 31 c2 89 d0 31 f0 41 5c 5d c3 55 48 85 d2 48 89 f8 48 89 e5 74 1e 48 01 fa 48 89 f9 <44> 0f b6 06 41 80 f8 01 44 88 01 48 83 de ff 48 83 c1 01 48 39 d1
[  106.004278] RSP: 0018:ffffc092c1fd37a8 EFLAGS: 00010286
[  106.004281] RAX: ffff9e943466a28c RBX: 00000000000036ed RCX: ffff9e943466a28c
[  106.004283] RDX: ffff9e943466a2ac RSI: 0000000000000000 RDI: ffff9e943466a28c
[  106.004285] RBP: ffffc092c1fd37a8 R08: ffff9e943d100000 R09: 0000000000000228
[  106.004287] R10: ffff9e94418dc5a8 R11: ffff9e944746c0d0 R12: 0000000000000000
[  106.004289] R13: ffff9e943fa1ec00 R14: ffff9e943466a200 R15: ffff9e943466a200
[  106.004291] FS:  00007f7a022c5540(0000) GS:ffff9e9447ac0000(0000) knlGS:0000000000000000
[  106.004294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  106.004296] CR2: 0000000000000000 CR3: 00000001ff0b0000 CR4: 0000000000340ee0
[  106.004298] Call Trace:
[  106.004382]  kfd_topology_add_device+0x150/0x610 [amdgpu]
[  106.004445]  kgd2kfd_device_init+0x2e0/0x4f0 [amdgpu]
[  106.004509]  amdgpu_amdkfd_device_init+0x14c/0x1b0 [amdgpu]
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-and-Tested-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
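
The oops above comes from strncpy reading a NULL source (RSI is 0). A minimal sketch of why a forgotten field ends up NULL, and of a defensive copy, is shown below. The struct, its fields and values, and `sketch_copy_name` are hypothetical; the NULL check is an extra illustration, while the fix this message describes is initializing the name.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_NAME_LEN 32

/* Hypothetical, trimmed-down device description. */
struct sketch_device_info {
    const char *asic_name;
    unsigned int max_pasid_bits;
};

/*
 * With designated initializers every field that is left out is zeroed,
 * so forgetting .asic_name leaves a NULL pointer behind.
 */
static const struct sketch_device_info sketch_renoir_info = {
    .asic_name = "renoir",
    .max_pasid_bits = 16,   /* illustrative value only */
};

/* Copy the name, returning -1 instead of dereferencing a NULL source. */
int sketch_copy_name(char dst[SKETCH_NAME_LEN],
                     const struct sketch_device_info *info)
{
    if (info->asic_name == NULL)
        return -1;

    strncpy(dst, info->asic_name, SKETCH_NAME_LEN - 1);
    dst[SKETCH_NAME_LEN - 1] = '\0';   /* strncpy may not terminate */
    return 0;
}
```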

drm/amd/display: Implement voltage limitation for dali · d1082e23

Bhawanpreet Lakha authored Sep 06, 2019

[Why]
We only want the lowest voltage to be available for dali.

[How]
Use the get_highest_allowed_voltage_level function
to return 0 for dali
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: add Asic ID for Dali · c4cacce7

Bhawanpreet Lakha authored Sep 06, 2019

Dali is a new asic revision based on raven2

Add the ID and ASICREV_IS_DALI define
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Allow to reset to EERPOM table. · 084fe13b

Andrey Grodzovsky authored Sep 09, 2019

The table grows quickly during debug/development effort when
multiple RAS errors are injected. Allow avoiding this by resetting
the table header to empty when needed.

v2: Switch to debugfs entry instead of load time parameter.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add amdgpu_ras_eeprom_reset_table · d01b400b

Andrey Grodzovsky authored Sep 09, 2019

This allows resetting the table on the fly.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: rename umc ras_init to err_cnt_init · d99659a0

Tao Zhou authored Sep 06, 2019

This interface is specific to a particular version of umc; rename it to
distinguish it from ras_late_init.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: move umc ras init to umc block · 4930aabe

Tao Zhou authored Sep 05, 2019

move umc ras init from ras module to umc block, generic ras module
should pay less attention to specific ras block.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: move umc late init from gmc to umc block · 86edcc7d

Tao Zhou authored Sep 05, 2019

umc late init is umc specific, it's more suitable to be put in umc block
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove duplicated header file include · 1bd252c5

Guchun Chen authored Sep 10, 2019

amdgpu_ras.h is already included.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove needless usage of #ifdef · a35ad98b

Shirish S authored Sep 12, 2019

Define sched_policy for the case where CONFIG_HSA_AMD is not
enabled. With this, there is no need to check for CONFIG_HSA_AMD
elsewhere in the driver code.
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
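
The idea can be sketched as a single fallback definition that keeps every caller free of conditional compilation. The names below are hypothetical stand-ins: `SKETCH_CONFIG_HSA` plays the role of CONFIG_HSA_AMD, and the enum values are chosen for illustration only.

```c
#include <stdbool.h>

/* Hypothetical policy values for this sketch. */
enum sketch_policy {
    SKETCH_POLICY_HWS = 0,
    SKETCH_POLICY_NO_HWS = 2,
};

#ifdef SKETCH_CONFIG_HSA   /* stand-in for CONFIG_HSA_AMD */
/* Provided by the optional component when it is built. */
extern int sketch_sched_policy_value;
#else
/* Fallback so callers still compile and link without the component. */
static const int sketch_sched_policy_value = SKETCH_POLICY_HWS;
#endif

/* The caller needs no #ifdef of its own. */
bool sketch_disable_gfxoff(void)
{
    return sketch_sched_policy_value == SKETCH_POLICY_NO_HWS;
}
```

Keeping the conditional in one header-like place means a later caller cannot forget the guard and reintroduce an undefined-reference build error.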

drm/amdgpu: fix build error without CONFIG_HSA_AMD · 8c9f69bc

Shirish S authored Sep 10, 2019

If CONFIG_HSA_AMD is not set, build fails:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.o: In function `amdgpu_device_ip_early_init':
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1626: undefined reference to `sched_policy'

Use CONFIG_HSA_AMD to guard this.

Fixes: 1abb680ad371 ("drm/amdgpu: disable gfxoff while use no H/W scheduling policy")
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: update smu11_driver_if_arcturus.h · 38750f03

Evan Quan authored Sep 06, 2019

Also bump the SMU11_DRIVER_IF_VERSION_ARCT.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: issue DC-BTC for arcturus on SMU init · 04c572a0

Evan Quan authored Sep 05, 2019

Need to perform DC-BTC for arcturus on bootup.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Avoid RAS recovery init when no RAS support. · 4d1337d2

Andrey Grodzovsky authored Sep 06, 2019

Fixes driver load regression on APUs.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: cleanup PTE flag generation v3 · cbfae36c

Christian König authored Sep 02, 2019

Move the ASIC specific code into a new callback function.

v2: mask the flags for SI and CIK instead of a BUG_ON().
v3: remove last missed BUG_ON().
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
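
A per-ASIC callback that masks unsupported flags, rather than asserting on them, might look like the sketch below. The bit positions, the `sketch_asic` struct and all function names are hypothetical and do not reflect the driver's real definitions.

```c
#include <stdint.h>

/* Hypothetical flag bits for this sketch. */
#define SKETCH_PTE_VALID    (1ull << 0)
#define SKETCH_PTE_READABLE (1ull << 5)
#define SKETCH_PTE_WRITABLE (1ull << 6)
#define SKETCH_PTE_NEW_BIT  (1ull << 51)   /* only newer ASICs understand this */

/* Each ASIC generation supplies its own flag translation callback. */
struct sketch_asic {
    uint64_t (*get_pte_flags)(uint64_t requested);
};

/* Older hardware: silently mask off bits it does not support. */
static uint64_t sketch_old_asic_pte_flags(uint64_t requested)
{
    return requested & (SKETCH_PTE_VALID | SKETCH_PTE_READABLE |
                        SKETCH_PTE_WRITABLE);
}

/* Newer hardware: all requested bits are passed through. */
static uint64_t sketch_new_asic_pte_flags(uint64_t requested)
{
    return requested;
}

/* Generic code only calls the callback and stays ASIC agnostic. */
uint64_t sketch_pte_flags(const struct sketch_asic *asic, uint64_t requested)
{
    return asic->get_pte_flags(requested);
}
```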
