1. 18 Jul, 2023 38 commits
  2. 13 Jul, 2023 2 commits
    • Revert "drm/amdgpu: update kernel vcn ring test" · 093b21f4
      Saleemkhan Jamadar authored
      VCN FW dependencies; revert it to unblock others.
      
      This reverts commit 3ebfa943.
      Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
      Acked-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu/vkms: relax timer deactivation by hrtimer_try_to_cancel · 826c1e92
      Guchun Chen authored
      When the screen rotation loop below runs for thousands of iterations
      with virtual display enabled, a CPU hard lockup may occur, leaving the
      system unresponsive and eventually crashing it.
      
      while true; do
      	xrandr --output Virtual --rotate inverted
      	xrandr --output Virtual --rotate right
      	xrandr --output Virtual --rotate left
      	xrandr --output Virtual --rotate normal
      done
      
      NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
      
      ? hrtimer_run_softirq+0x140/0x140
      ? store_vblank+0xe0/0xe0 [drm]
      hrtimer_cancel+0x15/0x30
      amdgpu_vkms_disable_vblank+0x15/0x30 [amdgpu]
      drm_vblank_disable_and_save+0x185/0x1f0 [drm]
      drm_crtc_vblank_off+0x159/0x4c0 [drm]
      ? record_print_text.cold+0x11/0x11
      ? wait_for_completion_timeout+0x232/0x280
      ? drm_crtc_wait_one_vblank+0x40/0x40 [drm]
      ? bit_wait_io_timeout+0xe0/0xe0
      ? wait_for_completion_interruptible+0x1d7/0x320
      ? mutex_unlock+0x81/0xd0
      amdgpu_vkms_crtc_atomic_disable
      
      It's caused by a lock dependency deadlock between two CPUs in this
      scenario:
      
      CPU1                                             CPU2
      drm_crtc_vblank_off                              hrtimer_interrupt
          grab event_lock (irq disabled)                   __hrtimer_run_queues
              grab vbl_lock/vblank_time_lock                   amdgpu_vkms_vblank_simulate
                  amdgpu_vkms_disable_vblank                       drm_handle_vblank
                      hrtimer_cancel                                         grab dev->event_lock
      
      So CPU1 is stuck in hrtimer_cancel, which waits for the timer callback
      still running on the current clock base to finish, while that callback
      on CPU2 can never finish because it cannot acquire event_lock, which
      CPU1 holds. The NMI watchdog therefore reports a hard lockup once its
      threshold is exceeded, and other CPUs are subsequently blocked as well.
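The circular wait in the diagram can be illustrated with a generic wait-for check. This is a hypothetical userspace sketch, not DRM or hrtimer code: `has_wait_cycle()`, `NOT_WAITING` and the `CPU1`/`CPU2` indices are invented names. `waits_for[c]` records which CPU `c` is blocked on; because each CPU waits on at most one other in this model, following the single chain and spotting a revisit is enough to detect a deadlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: waits_for[c] is the CPU that c
 * is blocked on, or NOT_WAITING if c can make progress. */
enum { CPU1, CPU2, NR_MODEL_CPUS };
#define NOT_WAITING (-1)

/* Each CPU waits on at most one other here, so following the single
 * chain from 'start' is enough: revisiting a CPU means a cycle. */
static bool has_wait_cycle(const int waits_for[NR_MODEL_CPUS], int start)
{
	bool seen[NR_MODEL_CPUS] = { false };

	for (int cur = start; cur != NOT_WAITING; cur = waits_for[cur]) {
		assert(cur >= 0 && cur < NR_MODEL_CPUS);
		if (seen[cur])
			return true;  /* back at a waiter: nobody can proceed */
		seen[cur] = true;
	}
	return false;                 /* chain ends at a CPU that is not waiting */
}
```

For the lockup, CPU1 (inside hrtimer_cancel, holding event_lock) waits on the callback running on CPU2, and CPU2 waits on event_lock held by CPU1, so `{ [CPU1] = CPU2, [CPU2] = CPU1 }` forms a cycle. If CPU1 does not wait at all, the chain from CPU2 ends at CPU1, which eventually releases the lock.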
      
      So fix this by using hrtimer_try_to_cancel instead, since the
      disable_vblank callback does not need to wait for the handler to
      finish. It's also unnecessary to check the return value of
      hrtimer_try_to_cancel: even if it returns -1, meaning the timer
      callback is currently running, the timer will be reprogrammed by
      hrtimer_start when enable_vblank is called again.
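The difference between the two cancel paths, and why the -1 result can be ignored, can be sketched with a hypothetical userspace model (not kernel code). `struct model_timer` and all `model_*` functions are invented names. Only two things are taken from the kernel: the return convention of hrtimer_try_to_cancel() (0 = not active, 1 = was pending and removed, -1 = callback running), and the fact that hrtimer_cancel() keeps retrying while it gets -1. The `max_spins` bound is an assumption so that the model terminates instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of an hrtimer; not the kernel API. */
struct model_timer {
	bool enqueued;         /* pending in the timer queue */
	bool callback_running; /* callback executing on another CPU */
};

/* Same return convention as hrtimer_try_to_cancel(); never waits. */
static int model_try_to_cancel(struct model_timer *t)
{
	if (t->callback_running)
		return -1;            /* running callback cannot be stopped */
	if (t->enqueued) {
		t->enqueued = false;
		return 1;             /* was pending, now removed */
	}
	return 0;                     /* was not active */
}

/* Models hrtimer_cancel()'s retry loop. The real loop is unbounded;
 * max_spins (an assumption) lets the model report "still stuck". */
static bool model_cancel_bounded(struct model_timer *t, int max_spins)
{
	assert(max_spins > 0);
	for (int i = 0; i < max_spins; i++)
		if (model_try_to_cancel(t) >= 0)
			return true;
	return false; /* callback never finished, as on CPU2 in the lockup */
}

/* Stand-in for hrtimer_start(): (re)programs the expiry. */
static void model_start(struct model_timer *t)
{
	t->enqueued = true;
}

/* Fixed disable path: one non-blocking attempt, result ignored. */
static void model_disable_vblank(struct model_timer *t)
{
	(void)model_try_to_cancel(t);
}

/* Enable path: re-arms the timer whatever disable left behind. */
static void model_enable_vblank(struct model_timer *t)
{
	model_start(t);
}
```

With `callback_running` set, `model_cancel_bounded()` never succeeds, while `model_disable_vblank()` returns immediately. Once the callback has finished, `model_enable_vblank()` leaves the timer armed again, which is the property the commit message relies on when it skips the return-value check.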
      
      v2: only re-arm timer when vblank is enabled (Christian) and add a Fixes
      tag as well
      
      v3: drop warn printing (Christian)
      
      v4: drop superfluous check of vblank->enabled in the timer function, as
      it's guaranteed in drm_handle_vblank (Christian)
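The v2 and v4 notes suggest that the timer callback can rely on the vblank handler, which has nothing to do when vblank is disabled, instead of checking the enabled flag itself. The following is a hypothetical sketch of that shape. It assumes, without confirmation from this page, that the callback re-arms only when the handler reports that it handled a vblank. All `model_*` names and `struct model_vblank` are invented, and the code is not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names mirror, but are not, the kernel symbols. */
enum model_restart { MODEL_NORESTART, MODEL_RESTART };

struct model_vblank {
	bool enabled;        /* stands in for vblank->enabled */
	unsigned long count; /* vblank events delivered */
};

/* Models the early exit in the vblank handler: when vblank is disabled
 * there is nothing to do, so it reports false. */
static bool model_handle_vblank(struct model_vblank *vbl)
{
	if (!vbl->enabled)
		return false;
	vbl->count++;
	return true;
}

/* Timer callback sketch: re-arm only if the vblank was handled, so the
 * callback needs no enabled check of its own. */
static enum model_restart model_vblank_simulate(struct model_vblank *vbl)
{
	assert(vbl != NULL);
	return model_handle_vblank(vbl) ? MODEL_RESTART : MODEL_NORESTART;
}
```

In this model, a callback that fires after vblank has been disabled delivers no event and does not re-arm, so a timer left running by a -1 from the cancel attempt stops by itself.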
      
      Fixes: 84ec374b ("drm/amdgpu: create amdgpu_vkms (v4)")
      Cc: stable@vger.kernel.org
      Suggested-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Guchun Chen <guchun.chen@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>