    drm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls · 01f64820
    Jonathan Kim authored
    On GFX9.4.1, the implicit wait count instruction on s_barrier is
    disabled by default in the driver during normal operation for
    performance reasons.
    
    There is a hardware bug in GFX9.4.1: if the implicit wait count
    instruction after an s_barrier instruction is disabled, any wave that
    hits an exception may step over the s_barrier when returning from the
    trap handler without the barrier logic being aware of it.  Other waves
    then wait at the barrier indefinitely, resulting in a shader hang.
    This bug has been corrected for GFX9.4.2 and onward.
    
    Since the debugger subscribes to hardware exceptions, in order to avoid
    this bug, the debugger must enable implicit wait count on s_barrier
    for a debug session and disable it on detach.
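
The attach/detach rule above can be sketched as a small state model. This is only an illustration: `struct dbg_model`, `dbg_model_attach` and `dbg_model_detach` are hypothetical names, not functions from the amdgpu or KFD code, and a single debug session per device is assumed.

```c
#include <stdbool.h>

/* Hypothetical per-device state; not a structure from the driver. */
struct dbg_model {
    bool implicit_wait_on_barrier; /* false is the driver default (performance) */
    bool session_active;           /* assumption: one debug session at a time */
};

/* Debugger attach: turn the implicit wait count on s_barrier back on. */
int dbg_model_attach(struct dbg_model *m)
{
    if (m->session_active)
        return -1; /* already attached */
    m->implicit_wait_on_barrier = true;
    m->session_active = true;
    return 0;
}

/* Debugger detach: restore the default, with the implicit wait disabled. */
int dbg_model_detach(struct dbg_model *m)
{
    if (!m->session_active)
        return -1; /* nothing to detach */
    m->implicit_wait_on_barrier = false;
    m->session_active = false;
    return 0;
}
```

In this model the setting is only ever enabled while a session is active, so normal operation keeps the default.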
    
    In order to change this setting in the device-global SQ_CONFIG
    register, the GFX pipeline must be idle.  GFX9.4.1 as a compute device
    will either dispatch work through the compute ring buffers used for
    image post processing or through the hardware scheduler by the KFD.
    
    Have the KGD suspend and drain the compute ring buffer, then suspend the
    hardware scheduler and block any future KFD process job requests before
    changing the implicit wait count setting.  Once set, resume all work.
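
The ordering described above can be sketched as follows. This is a simplified model under stated assumptions, not the code in gfx_v9_0.c: `struct model_dev`, `model_set_implicit_wait` and the `MODEL_IMPLICIT_WAIT_DISABLE` bit are hypothetical, and the register is a plain variable rather than a real MMIO access.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit standing in for the implicit wait count disable field. */
#define MODEL_IMPLICIT_WAIT_DISABLE (1u << 0)

/* Hypothetical device model; not the amdgpu device structure. */
struct model_dev {
    bool ring_running;     /* compute ring buffer is accepting work */
    unsigned ring_pending; /* jobs still queued on the compute ring */
    bool hws_running;      /* hardware scheduler is active */
    bool kfd_locked;       /* new KFD process job requests are blocked */
    uint32_t sq_config;    /* stand-in for the device-global SQ_CONFIG */
};

static bool model_pipeline_idle(const struct model_dev *d)
{
    return !d->ring_running && d->ring_pending == 0 &&
           !d->hws_running && d->kfd_locked;
}

int model_set_implicit_wait(struct model_dev *d, bool enable)
{
    /* 1. Suspend the compute ring, then drain what is already queued. */
    d->ring_running = false;
    while (d->ring_pending > 0)
        d->ring_pending--;

    /* 2. Suspend the hardware scheduler and block new KFD requests. */
    d->hws_running = false;
    d->kfd_locked = true;

    /* 3. Only touch the register once the pipeline is idle. */
    if (!model_pipeline_idle(d))
        return -1;
    if (enable)
        d->sq_config &= ~MODEL_IMPLICIT_WAIT_DISABLE;
    else
        d->sq_config |= MODEL_IMPLICIT_WAIT_DISABLE;

    /* 4. Resume all work. */
    d->kfd_locked = false;
    d->hws_running = true;
    d->ring_running = true;
    return 0;
}
```

The register write sits between the quiesce and resume steps so that no wave is in flight while the device-global setting changes.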
    Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
    Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
gfx_v9_0.c 241 KB