• Philip Yang's avatar
    drm/amdkfd: Fix lock dependency warning with srcu · 2a9de42e
    Philip Yang authored
    ======================================================
    WARNING: possible circular locking dependency detected
    6.5.0-kfd-yangp #2289 Not tainted
    ------------------------------------------------------
    kworker/0:2/996 is trying to acquire lock:
            (srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x5/0x1a0
    
    but task is already holding lock:
            ((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}, at:
    	process_one_work+0x211/0x560
    
    which lock already depends on the new lock.
    
    the existing dependency chain (in reverse order) is:
    
    -> #3 ((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}:
            __flush_work+0x88/0x4f0
            svm_range_list_lock_and_flush_work+0x3d/0x110 [amdgpu]
            svm_range_set_attr+0xd6/0x14c0 [amdgpu]
            kfd_ioctl+0x1d1/0x630 [amdgpu]
            __x64_sys_ioctl+0x88/0xc0
    
    -> #2 (&info->lock#2){+.+.}-{3:3}:
            __mutex_lock+0x99/0xc70
            amdgpu_amdkfd_gpuvm_restore_process_bos+0x54/0x740 [amdgpu]
            restore_process_helper+0x22/0x80 [amdgpu]
            restore_process_worker+0x2d/0xa0 [amdgpu]
            process_one_work+0x29b/0x560
            worker_thread+0x3d/0x3d0
    
    -> #1 ((work_completion)(&(&process->restore_work)->work)){+.+.}-{0:0}:
            __flush_work+0x88/0x4f0
            __cancel_work_timer+0x12c/0x1c0
            kfd_process_notifier_release_internal+0x37/0x1f0 [amdgpu]
            __mmu_notifier_release+0xad/0x240
            exit_mmap+0x6a/0x3a0
            mmput+0x6a/0x120
            do_exit+0x322/0xb90
            do_group_exit+0x37/0xa0
            __x64_sys_exit_group+0x18/0x20
            do_syscall_64+0x38/0x80
    
    -> #0 (srcu){.+.+}-{0:0}:
            __lock_acquire+0x1521/0x2510
            lock_sync+0x5f/0x90
            __synchronize_srcu+0x4f/0x1a0
            __mmu_notifier_release+0x128/0x240
            exit_mmap+0x6a/0x3a0
            mmput+0x6a/0x120
            svm_range_deferred_list_work+0x19f/0x350 [amdgpu]
            process_one_work+0x29b/0x560
            worker_thread+0x3d/0x3d0
    
    other info that might help us debug this:
    Chain exists of:
      srcu --> &info->lock#2 --> (work_completion)(&svms->deferred_list_work)
    
    Possible unsafe locking scenario:
    
            CPU0                    CPU1
            ----                    ----
            lock((work_completion)(&svms->deferred_list_work));
                            lock(&info->lock#2);
    			lock((work_completion)(&svms->deferred_list_work));
            sync(srcu);
    Signed-off-by: default avatarPhilip Yang <Philip.Yang@amd.com>
    Reviewed-by: default avatarFelix Kuehling <felix.kuehling@amd.com>
    Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
    2a9de42e
kfd_svm.c 114 KB