1. 28 Apr, 2022 7 commits
    • Marek Marczykowski-Górecki's avatar
      drm/amdgpu: do not use passthrough mode in Xen dom0 · 78b12008
      Marek Marczykowski-Górecki authored
      While technically Xen dom0 is a virtual machine too, it does have
      access to most of the hardware so it doesn't need to be considered a
      "passthrough". Commit b818a5d3 ("drm/amdgpu/gmc: use PCI BARs for
      APUs in passthrough") changed how FB is accessed based on passthrough
      mode. This breaks amdgpu in Xen dom0 with message like this:
      
          [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
      
      While the reason for this failure is unclear, the passthrough mode is
      not really necessary in Xen dom0 anyway. So, to unbreak booting affected
      kernels, disable passthrough mode in this case.
      
      Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1985
      Fixes: b818a5d3 ("drm/amdgpu/gmc: use PCI BARs for APUs in passthrough")
      Signed-off-by: default avatarMarek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      78b12008
    • Evan Quan's avatar
      drm/amd/pm: fix the compile warning · 555238d9
      Evan Quan authored
      Fix the compile warning below:
      drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/kv_dpm.c:1641
      kv_get_acp_boot_level() warn: always true condition '(table->entries[i]->clk >= 0) => (0-u32max >= 0)'
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      CC: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarEvan Quan <evan.quan@amd.com>
      Reviewed-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      555238d9
    • Mukul Joshi's avatar
      drm/amdkfd: Fix circular lock dependency warning · b179fc28
      Mukul Joshi authored
      [  168.544078] ======================================================
      [  168.550309] WARNING: possible circular locking dependency detected
      [  168.556523] 5.16.0-kfd-fkuehlin #148 Tainted: G            E
      [  168.562558] ------------------------------------------------------
      [  168.568764] kfdtest/3479 is trying to acquire lock:
      [  168.573672] ffffffffc0927a70 (&topology_lock){++++}-{3:3}, at:
      		kfd_topology_device_by_id+0x16/0x60 [amdgpu] [  168.583663]
                      but task is already holding lock:
      [  168.589529] ffff97d303dee668 (&mm->mmap_lock#2){++++}-{3:3}, at:
      		vm_mmap_pgoff+0xa9/0x180 [  168.597755]
                      which lock already depends on the new lock.
      
      [  168.605970]
                      the existing dependency chain (in reverse order) is:
      [  168.613487]
                      -> #3 (&mm->mmap_lock#2){++++}-{3:3}:
      [  168.619700]        lock_acquire+0xca/0x2e0
      [  168.623814]        down_read+0x3e/0x140
      [  168.627676]        do_user_addr_fault+0x40d/0x690
      [  168.632399]        exc_page_fault+0x6f/0x270
      [  168.636692]        asm_exc_page_fault+0x1e/0x30
      [  168.641249]        filldir64+0xc8/0x1e0
      [  168.645115]        call_filldir+0x7c/0x110
      [  168.649238]        ext4_readdir+0x58e/0x940
      [  168.653442]        iterate_dir+0x16a/0x1b0
      [  168.657558]        __x64_sys_getdents64+0x83/0x140
      [  168.662375]        do_syscall_64+0x35/0x80
      [  168.666492]        entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  168.672095]
                      -> #2 (&type->i_mutex_dir_key#6){++++}-{3:3}:
      [  168.679008]        lock_acquire+0xca/0x2e0
      [  168.683122]        down_read+0x3e/0x140
      [  168.686982]        path_openat+0x5b2/0xa50
      [  168.691095]        do_file_open_root+0xfc/0x190
      [  168.695652]        file_open_root+0xd8/0x1b0
      [  168.702010]        kernel_read_file_from_path_initns+0xc4/0x140
      [  168.709542]        _request_firmware+0x2e9/0x5e0
      [  168.715741]        request_firmware+0x32/0x50
      [  168.721667]        amdgpu_cgs_get_firmware_info+0x370/0xdd0 [amdgpu]
      [  168.730060]        smu7_upload_smu_firmware_image+0x53/0x190 [amdgpu]
      [  168.738414]        fiji_start_smu+0xcf/0x4e0 [amdgpu]
      [  168.745539]        pp_dpm_load_fw+0x21/0x30 [amdgpu]
      [  168.752503]        amdgpu_pm_load_smu_firmware+0x4b/0x80 [amdgpu]
      [  168.760698]        amdgpu_device_fw_loading+0xb8/0x140 [amdgpu]
      [  168.768412]        amdgpu_device_init.cold+0xdf6/0x1716 [amdgpu]
      [  168.776285]        amdgpu_driver_load_kms+0x15/0x120 [amdgpu]
      [  168.784034]        amdgpu_pci_probe+0x19b/0x3a0 [amdgpu]
      [  168.791161]        local_pci_probe+0x40/0x80
      [  168.797027]        work_for_cpu_fn+0x10/0x20
      [  168.802839]        process_one_work+0x273/0x5b0
      [  168.808903]        worker_thread+0x20f/0x3d0
      [  168.814700]        kthread+0x176/0x1a0
      [  168.819968]        ret_from_fork+0x1f/0x30
      [  168.825563]
                      -> #1 (&adev->pm.mutex){+.+.}-{3:3}:
      [  168.834721]        lock_acquire+0xca/0x2e0
      [  168.840364]        __mutex_lock+0xa2/0x930
      [  168.846020]        amdgpu_dpm_get_mclk+0x37/0x60 [amdgpu]
      [  168.853257]        amdgpu_amdkfd_get_local_mem_info+0xba/0xe0 [amdgpu]
      [  168.861547]        kfd_create_vcrat_image_gpu+0x1b1/0xbb0 [amdgpu]
      [  168.869478]        kfd_create_crat_image_virtual+0x447/0x510 [amdgpu]
      [  168.877884]        kfd_topology_add_device+0x5c8/0x6f0 [amdgpu]
      [  168.885556]        kgd2kfd_device_init.cold+0x385/0x4c5 [amdgpu]
      [  168.893347]        amdgpu_amdkfd_device_init+0x138/0x180 [amdgpu]
      [  168.901177]        amdgpu_device_init.cold+0x141b/0x1716 [amdgpu]
      [  168.909025]        amdgpu_driver_load_kms+0x15/0x120 [amdgpu]
      [  168.916458]        amdgpu_pci_probe+0x19b/0x3a0 [amdgpu]
      [  168.923442]        local_pci_probe+0x40/0x80
      [  168.929249]        work_for_cpu_fn+0x10/0x20
      [  168.935008]        process_one_work+0x273/0x5b0
      [  168.940944]        worker_thread+0x20f/0x3d0
      [  168.946623]        kthread+0x176/0x1a0
      [  168.951765]        ret_from_fork+0x1f/0x30
      [  168.957277]
                      -> #0 (&topology_lock){++++}-{3:3}:
      [  168.965993]        check_prev_add+0x8f/0xbf0
      [  168.971613]        __lock_acquire+0x1299/0x1ca0
      [  168.977485]        lock_acquire+0xca/0x2e0
      [  168.982877]        down_read+0x3e/0x140
      [  168.987975]        kfd_topology_device_by_id+0x16/0x60 [amdgpu]
      [  168.995583]        kfd_device_by_id+0xa/0x20 [amdgpu]
      [  169.002180]        kfd_mmap+0x95/0x200 [amdgpu]
      [  169.008293]        mmap_region+0x337/0x5a0
      [  169.013679]        do_mmap+0x3aa/0x540
      [  169.018678]        vm_mmap_pgoff+0xdc/0x180
      [  169.024095]        ksys_mmap_pgoff+0x186/0x1f0
      [  169.029734]        do_syscall_64+0x35/0x80
      [  169.035005]        entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  169.041754]
                      other info that might help us debug this:
      
      [  169.053276] Chain exists of:
                        &topology_lock --> &type->i_mutex_dir_key#6 --> &mm->mmap_lock#2
      
      [  169.068389]  Possible unsafe locking scenario:
      
      [  169.076661]        CPU0                    CPU1
      [  169.082383]        ----                    ----
      [  169.088087]   lock(&mm->mmap_lock#2);
      [  169.092922]                                lock(&type->i_mutex_dir_key#6);
      [  169.100975]                                lock(&mm->mmap_lock#2);
      [  169.108320]   lock(&topology_lock);
      [  169.112957]
                       *** DEADLOCK ***
      
      This commit fixes the deadlock warning by ensuring pm.mutex is not
      held while holding the topology lock. For this, kfd_local_mem_info
      is moved into the KFD dev struct and filled during device init.
      This cached value can then be used instead of querying the value
      again and again.
      Signed-off-by: default avatarMukul Joshi <mukul.joshi@amd.com>
      Reviewed-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      b179fc28
    • Mukul Joshi's avatar
      drm/amdkfd: Fix updating IO links during device removal · 98447635
      Mukul Joshi authored
      The logic to update the IO links when a KFD device
      is removed was not correct as it would miss updating
      the proximity domain values for some nodes where the
      node_from and node_to both were greater values than the
      proximity domain value of the KFD device being removed
      from topology.
      
      Fixes: 46d18d51 ("drm/amdkfd: Cleanup IO links during KFD device removal")
      Signed-off-by: default avatarMukul Joshi <mukul.joshi@amd.com>
      Reviewed-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      98447635
    • Christophe JAILLET's avatar
      drm/amdkfd: Use non-atomic bitmap functions when possible · b8b9ba58
      Christophe JAILLET authored
      All uses of the 'kfd->gtt_sa_bitmap' bitmap are protected with the
      'kfd->gtt_sa_lock' mutex.
      
      So:
         - prefer the non-atomic '__set_bit()' function
         - use the non-atomic 'bitmap_[set|clear]()' functions instead of
           equivalent 'for' loops. These functions can work on several bits at a
           time
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      b8b9ba58
    • Christophe JAILLET's avatar
      drm/amdkfd: Use bitmap_zalloc() when applicable · f43a9f18
      Christophe JAILLET authored
      'kfd->gtt_sa_bitmap' is a bitmap. So use 'bitmap_zalloc()' to simplify
      code, improve the semantic and avoid some open-coded arithmetic in
      allocator arguments.
      
      Also change the corresponding 'kfree()' into 'bitmap_free()' to keep
      consistency.
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      f43a9f18
    • Melissa Wen's avatar
      drm/amd/display: protect remaining FPU-code calls on dcn3.1.x · 7324d02a
      Melissa Wen authored
      From [1], I realized two other calls to dcn30 code are associated with
      FPU operations and are not protected by DC_FP_* macros:
      * dcn30_populate_dml_writeback_from_context()
      * dcn30_set_mcif_arb_params()
      
      So, since FPU-associated code is not fully isolated in dcn30, and
      dcn3.1.x reuses them, let's wrap their calls properly.
      
      Note: this patch complements the fix from [1].
      
      [1] https://lore.kernel.org/amd-gfx/20220329082957.1662655-1-chandan.vurdigerenataraj@amd.com/Reviewed-by: default avatarRodrigo Siqueira <Rodrigo.Siqueira@amd.com>
      Signed-off-by: default avatarMelissa Wen <mwen@igalia.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      7324d02a
  2. 26 Apr, 2022 17 commits
  3. 25 Apr, 2022 15 commits
  4. 22 Apr, 2022 1 commit