    drm/i915: Replace global breadcrumbs with per-context interrupt tracking · 52c0fdb2
    Chris Wilson authored
    A few years ago, see commit 688e6c72 ("drm/i915: Slaughter the
    thundering i915_wait_request herd"), the issue of handling multiple
    clients waiting in parallel was brought to our attention. The
    requirement was that every client should be woken immediately upon its
    request being signaled, without incurring any cpu overhead.
    
    Handling certain fragility of our hw meant that we could not do a
    simple check inside the irq handler (some generations required almost
    unbounded delays before we could be sure of seqno coherency), and so
    request completion checking required delegation.
    
    Before commit 688e6c72, the solution was simple. Every client
    waiting on a request would be woken on every interrupt and each would do
    a heavyweight check to see if their request was complete. Commit
    688e6c72 introduced an rbtree so that only the earliest waiter on
    the global timeline would be woken, and it would then wake the next and
    so on. (Along with various complications to handle requests being
    reordered along the global timeline, and also a requirement for a
    kthread to provide a delegate for fence signaling that had no process
    context.)
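    (For illustration only, a minimal sketch of that wake-chain idea, using
    invented names - struct waiter, irq_wake_first_waiter(), waiter_retire()
    - rather than the driver's real types: waiters sit in an rbtree ordered
    by seqno, the interrupt wakes only the first, and a completing waiter
    hands the bottom-half role on to the next.)
    
    #include <linux/rbtree.h>
    #include <linux/sched.h>
    
    struct waiter {
            struct rb_node node;            /* ordered by seqno in the tree */
            struct task_struct *tsk;
            u32 seqno;
    };
    
    /* irq handler: wake only the earliest waiter on the global timeline */
    static void irq_wake_first_waiter(struct rb_root *waiters)
    {
            struct rb_node *first = rb_first(waiters);
    
            if (first)
                    wake_up_process(rb_entry(first, struct waiter, node)->tsk);
    }
    
    /* a completed waiter removes itself and delegates to the next in line */
    static void waiter_retire(struct rb_root *waiters, struct waiter *w)
    {
            struct rb_node *next = rb_next(&w->node);
    
            rb_erase(&w->node, waiters);
            if (next)
                    wake_up_process(rb_entry(next, struct waiter, node)->tsk);
    }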
    
    The global rbtree depends on knowing the execution timeline (and global
    seqno). Without knowing that order, we must instead check all contexts
    queued to the HW to see which may have advanced. We trim that list by
    only checking queued contexts that are being waited on, but still we
    keep a list of all active contexts and their active signalers that we
    inspect from inside the irq handler. By moving the waiters onto the fence
    signal list, we can combine the client wakeup with the dma_fence
    signaling (a dramatic reduction in complexity, but it does require the
    HW to be coherent: the seqno must be visible from the cpu before the
    interrupt is raised - we keep a timer backup just in case).
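    (A minimal sketch of that per-context signaling path, again with
    invented names - struct signaler, struct breadcrumbs, seqno_passed(),
    breadcrumbs_irq_signal() - standing in for the real driver structures:
    the irq handler walks only the signalers of contexts that are being
    waited on and completes their fences directly.)
    
    #include <linux/dma-fence.h>
    #include <linux/list.h>
    #include <linux/spinlock.h>
    
    struct signaler {
            struct list_head link;          /* on breadcrumbs.signalers */
            struct dma_fence *fence;
            u32 seqno;
    };
    
    struct breadcrumbs {
            spinlock_t irq_lock;
            struct list_head signalers;     /* only contexts with waiters */
    };
    
    /* seqno wraps, so compare as a signed delta */
    static bool seqno_passed(u32 hwsp_seqno, u32 seqno)
    {
            return (s32)(hwsp_seqno - seqno) >= 0;
    }
    
    /* called from the interrupt handler once the seqno write is visible */
    static void breadcrumbs_irq_signal(struct breadcrumbs *b, u32 hwsp_seqno)
    {
            struct signaler *s, *sn;
    
            spin_lock(&b->irq_lock);
            list_for_each_entry_safe(s, sn, &b->signalers, link) {
                    if (!seqno_passed(hwsp_seqno, s->seqno))
                            break;  /* kept in order of expected completion */
    
                    list_del(&s->link);
                    dma_fence_signal(s->fence); /* wakes the waiting client */
            }
            spin_unlock(&b->irq_lock);
    }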
    
    Having previously fixed all the issues with irq-seqno serialisation (by
    inserting delays onto the GPU after each request instead of random delays
    on the CPU after each interrupt), we can rely on the seqno state to
    perform direct wakeups from the interrupt handler. This allows us to
    preserve our single context switch behaviour of the current routine,
    with the only downside that we lose the RT priority sorting of wakeups.
    In general, direct wakeup latency of multiple clients is about the same
    (about 10% better in most cases) with a reduction in total CPU time spent
    in the waiter (about 20-50% depending on gen). Average herd behaviour is
    improved, but at the cost of not delegating wakeups on task_prio.
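    (On the waiting side this reduces, in sketch form, to the generic
    dma_fence wait: a cheap completion check first - the seqno may already
    be visible to the cpu - and otherwise a sleep that the interrupt ends by
    signaling the fence. wait_for_request() below is an invented wrapper,
    not a driver entry point.)
    
    #include <linux/dma-fence.h>
    
    static long wait_for_request(struct dma_fence *fence, long timeout)
    {
            /* fast path: the breadcrumb may already have landed */
            if (dma_fence_is_signaled(fence))
                    return timeout;
    
            /* otherwise sleep; dma_fence_signal() from the irq wakes us */
            return dma_fence_wait_timeout(fence, true, timeout);
    }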
    
    v2: Capture fence signaling state for error state and add comments to
    warm even the most cold of hearts.
    v3: Check if the request is still active before busywaiting
    v4: Reduce the amount of pointer misdirection with list_for_each_safe
    and using a local i915_request variable inside the loops (see the
    sketch after this list)
    v5: Add a missing pluralisation to a purely informative selftest message.
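    (The v4 point above, sketched with invented struct and field names: the
    entry-based iterator keeps a typed local request pointer instead of raw
    list nodes plus container_of() on every use.)
    
    #include <linux/list.h>
    #include <linux/types.h>
    
    struct rq_sketch {
            struct list_head signal_link;
            u32 seqno;
    };
    
    /* before: raw list nodes, container_of() for every dereference */
    static void walk_verbose(struct list_head *signalers)
    {
            struct list_head *pos, *next;
    
            list_for_each_safe(pos, next, signalers) {
                    struct rq_sketch *rq =
                            container_of(pos, struct rq_sketch, signal_link);
    
                    list_del(&rq->signal_link);     /* real work elided */
            }
    }
    
    /* after: list_for_each_entry_safe() with a local request variable */
    static void walk_terse(struct list_head *signalers)
    {
            struct rq_sketch *rq, *rn;
    
            list_for_each_entry_safe(rq, rn, signalers, signal_link)
                    list_del(&rq->signal_link);     /* real work elided */
    }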
    
    References: 688e6c72 ("drm/i915: Slaughter the thundering i915_wait_request herd")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-2-chris@chris-wilson.co.uk