• Chris Wilson's avatar
    drm/i915/gt: Close race between engine_park and intel_gt_retire_requests · ca1711d1
    Chris Wilson authored
    The general concept was that intel_timeline.active_count was locked by
    the intel_timeline.mutex. The exception was for power management, where
    the engine->kernel_context->timeline could be manipulated under the
    global wakeref.mutex.
    
    This was quite solid, as we always manipulated the timeline only while
    we held an engine wakeref.
    
    And then we started retiring requests outside of struct_mutex, only
    using the timelines.active_list and the timeline->mutex. There we
    started manipulating intel_timeline.active_count outside of an engine
    wakeref, and so introduced a race between __engine_park() and
    intel_gt_retire_requests(), a race that could result in the
    engine->kernel_context not being added to the active timelines and so
    losing requests, which caused us to keep the system permanently powered
    up [and unloadable].
    
    The race would be easy to close if we could take the engine wakeref for
    the timeline before we retire -- except timelines are not bound to any
    engine and so we would need to keep all active engines awake. The
    alternative is to guard intel_timeline_enter/intel_timeline_exit for use
    outside of the timeline->mutex.
    
    Fixes: e5dadff4 ("drm/i915: Protect request retirement with timeline->mutex")
    Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20191120165514.3955081-1-chris@chris-wilson.co.uk
    (cherry picked from commit a6edbca7)
    Signed-off-by: default avatarJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
    ca1711d1
intel_gt_requests.c 3.29 KB