    drm/amdkfd: handle stale retry fault · 373e3ccd
    Philip Yang authored
    A retry fault interrupt may still be pending in the IH ring after the GPU
    page table is updated to recover the vm fault, because each page of the
    range generates a retry fault interrupt. There is a race if the application
    unmaps the range, which removes and frees it, before the retry fault work
    (restore_pages) handles the pending retry fault interrupt: because the
    range can no longer be found, the vm fault cannot be recovered and an
    incorrect GPU vm fault is reported to the application.
    
    Before unmap removes and frees the range, drain the retry fault interrupts
    from IH ring1 to ensure that no retry fault arrives after the range is
    removed.
    
    Draining retry fault interrupts skips a range that is already on the
    deferred list for removal, and a child range, which is split off by unmap,
    is not added to svms and has no interval notifier.
    Signed-off-by: Philip Yang <Philip.Yang@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
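
The race and the fix described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the driver code: `ih_ring`, `fault_entry`, `range`, `ring_push`, `handle_fault`, `ring_drain` and `unmap_range` are all hypothetical names invented for this example, and the ring is a plain array rather than a hardware interrupt ring. It shows that faults left in the ring become unrecoverable once the range is freed, and that processing the ring up to a checkpoint before freeing avoids this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical capacity, not a hardware value */

struct fault_entry { uint64_t addr; };

/* Simplified stand-in for an interrupt ring (assumption, not amdgpu's layout). */
struct ih_ring {
    struct fault_entry entries[RING_SIZE];
    uint32_t rptr; /* free-running index of the next entry to process */
    uint32_t wptr; /* free-running index of the next slot to write */
};

/* Simplified stand-in for a mapped range. */
struct range { uint64_t start, last; bool present; };

/* Queue one retry fault; returns false if the ring is full. */
static bool ring_push(struct ih_ring *r, uint64_t addr)
{
    if (r->wptr - r->rptr == RING_SIZE)
        return false;
    r->entries[r->wptr % RING_SIZE].addr = addr;
    r->wptr++;
    return true;
}

/* A fault is recoverable only while its range still exists. */
static bool handle_fault(const struct range *rg, uint64_t addr)
{
    return rg->present && addr >= rg->start && addr <= rg->last;
}

/*
 * Process every entry queued before the call (up to the write pointer
 * sampled on entry). Returns the number of faults that could not be
 * recovered, i.e. those that would be reported as real vm faults.
 */
static unsigned ring_drain(struct ih_ring *r, const struct range *rg)
{
    uint32_t checkpoint = r->wptr;
    unsigned unrecovered = 0;

    assert(r->wptr - r->rptr <= RING_SIZE);
    while (r->rptr != checkpoint) {
        if (!handle_fault(rg, r->entries[r->rptr % RING_SIZE].addr))
            unrecovered++;
        r->rptr++;
    }
    return unrecovered;
}

/*
 * Remove the range. With drain_first set, pending faults are handled
 * while the range still exists; otherwise they are handled afterwards,
 * which models the deferred retry fault work running too late.
 */
static unsigned unmap_range(struct ih_ring *r, struct range *rg, bool drain_first)
{
    unsigned unrecovered = 0;

    if (drain_first)
        unrecovered += ring_drain(r, rg);
    rg->present = false;                /* range removed and freed */
    unrecovered += ring_drain(r, rg);   /* stale entries, if any, handled late */
    return unrecovered;
}
```

Calling `unmap_range` with `drain_first` set to `false` on a ring that still holds faults for the range counts each of them as unrecovered, while `drain_first` set to `true` leaves none. The model does not cover the skip conditions for deferred-list and child ranges mentioned in the commit message.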
kfd_svm.c 83.4 KB