1. 19 Oct, 2017 20 commits
  2. 09 Oct, 2017 18 commits
  3. 06 Oct, 2017 2 commits
    • drm/amd/sched: fix deadlock caused by unsignaled fences of deleted jobs · 79867462
      Nicolai Hähnle authored
      Highly concurrent Piglit runs can trigger a race condition where a pending
      SDMA job on a buffer object is never executed because the corresponding
      process is killed (perhaps due to a crash). Since the job's fences were
      never signaled, the buffer object was effectively leaked. Worse, the
      buffer was stuck wherever it happened to be at the time, possibly in VRAM.
      
      The symptom was user space processes stuck in interruptible waits with
      kernel stacks like:
      
          [<ffffffffbc5e6722>] dma_fence_default_wait+0x112/0x250
          [<ffffffffbc5e6399>] dma_fence_wait_timeout+0x39/0xf0
          [<ffffffffbc5e82d2>] reservation_object_wait_timeout_rcu+0x1c2/0x300
          [<ffffffffc03ce56f>] ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0 [ttm]
          [<ffffffffc03cf1ea>] ttm_mem_evict_first+0xba/0x1a0 [ttm]
          [<ffffffffc03cf611>] ttm_bo_mem_space+0x341/0x4c0 [ttm]
          [<ffffffffc03cfc54>] ttm_bo_validate+0xd4/0x150 [ttm]
          [<ffffffffc03cffbd>] ttm_bo_init_reserved+0x2ed/0x420 [ttm]
          [<ffffffffc042f523>] amdgpu_bo_create_restricted+0x1f3/0x470 [amdgpu]
          [<ffffffffc042f9fa>] amdgpu_bo_create+0xda/0x220 [amdgpu]
          [<ffffffffc04349ea>] amdgpu_gem_object_create+0xaa/0x140 [amdgpu]
          [<ffffffffc0434f97>] amdgpu_gem_create_ioctl+0x97/0x120 [amdgpu]
          [<ffffffffc037ddba>] drm_ioctl+0x1fa/0x480 [drm]
          [<ffffffffc041904f>] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
          [<ffffffffbc23db33>] do_vfs_ioctl+0xa3/0x5f0
          [<ffffffffbc23e0f9>] SyS_ioctl+0x79/0x90
          [<ffffffffbc864ffb>] entry_SYSCALL_64_fastpath+0x1e/0xad
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      Note: The correctness of this change depends on the earlier commit
      "drm/amd/sched: move adding finish callback to amd_sched_job_begin".
      
      v2: set an error on the finished fence
      Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amd/sched: NULL out the s_fence field after run_job · 29d25355
      Nicolai Hähnle authored
      amd_sched_process_job drops the fence reference, so NULL out the s_fence
      field before adding it as a callback to guard against accidentally using
      s_fence after it may have been freed.
      
      v2: add a clarifying comment
      Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
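
The idea behind commit 79867462 (signal the fences of jobs that will never run, and mark them with an error) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the scheduler's actual code: every `model_*` name and the `MODEL_ERR_CANCELLED` value are hypothetical, and the real driver works with kernel fence objects rather than these plain structs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error value; the code used by the real driver is not shown here. */
#define MODEL_ERR_CANCELLED (-1)

/* Hypothetical stand-in for a job's "finished" fence. */
struct model_fence {
    bool signaled;
    int error;              /* 0 on success, negative if the job never ran */
};

struct model_job {
    struct model_fence finished;
    struct model_job *next;
};

/* Hypothetical stand-in for a scheduler entity owned by one process. */
struct model_entity {
    struct model_job *head; /* jobs queued but not yet executed */
};

static void model_fence_signal(struct model_fence *f, int error)
{
    assert(!f->signaled);   /* a fence is signaled at most once */
    f->error = error;
    f->signaled = true;
}

/* Queue a job on the entity (ordering is irrelevant for this model). */
static void model_entity_push(struct model_entity *e, struct model_job *job)
{
    job->finished.signaled = false;
    job->finished.error = 0;
    job->next = e->head;
    e->head = job;
}

/*
 * Tear down an entity whose owning process has died. Instead of silently
 * dropping the pending jobs, signal each finished fence with an error so
 * that anything waiting on it can make progress.
 * Returns the number of jobs cancelled.
 */
static int model_entity_fini(struct model_entity *e)
{
    int cancelled = 0;

    while (e->head) {
        struct model_job *job = e->head;

        e->head = job->next;
        job->next = NULL;
        model_fence_signal(&job->finished, MODEL_ERR_CANCELLED);
        cancelled++;
    }
    return cancelled;
}
```

In this model, a waiter that finds `finished.signaled` set with a nonzero `finished.error` knows the job was dropped rather than executed, so it can release the buffer instead of blocking forever.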