    drm/i915: Implement inter-engine read-read optimisations · b4716185
    Chris Wilson authored
    Currently, we only track the last request globally across all engines.
    This prevents us from issuing concurrent read requests on e.g. the RCS
    and BCS engines (or more likely the render and media engines). Without
    semaphores, we incur costly stalls as we synchronise between rings -
    greatly impacting the current performance of Broadwell versus Haswell in
    certain workloads (like video decode). With the introduction of
    reference counted requests, it is much easier to track the last request
    per ring, as well as the last global write request so that we can
    optimise inter-engine read-read requests (as well as better optimise
    certain CPU waits).
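
    As a rough illustration of the tracking described above, the sketch below
    models one object with a per-engine "active" bitfield and a single last
    writer. It is a simplified standalone model, not the driver's code: the
    names tracked_object, mark_access, sync_mask and NUM_ENGINES are
    hypothetical, and request lifetimes and retirement are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_ENGINES 4 /* hypothetical engine count for this sketch */

/* Hypothetical per-object tracking state (not the real i915 structures). */
struct tracked_object {
	unsigned active;  /* bit i set: engine i has an outstanding access */
	int write_engine; /* engine holding the last write, or -1 if none */
};

/* Record an access by 'engine'; a write also becomes the last writer. */
static void mark_access(struct tracked_object *obj, int engine, bool write)
{
	assert(engine >= 0 && engine < NUM_ENGINES);
	obj->active |= 1u << engine;
	if (write)
		obj->write_engine = engine;
}

/*
 * Return the set of engines that engine 'to' has to wait for before it
 * touches the object. A read-only access waits only for the last writer,
 * so concurrent readers on other engines cause no stall. A write waits
 * for every engine with an outstanding access. Work already queued on
 * 'to' itself is ordered by that engine, so its own bit is masked out.
 */
static unsigned sync_mask(const struct tracked_object *obj, int to,
			  bool readonly)
{
	unsigned mask = 0;

	assert(to >= 0 && to < NUM_ENGINES);
	if (readonly) {
		if (obj->write_engine >= 0)
			mask = 1u << obj->write_engine;
	} else {
		mask = obj->active;
	}
	return mask & ~(1u << to);
}
```

    In this model, an object with reads outstanding on engines 0 and 1 and no
    writer yields an empty mask for a new read on engine 2, whereas a write
    from engine 2 yields a mask covering engines 0 and 1.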
    
    v2: Fix inverted readonly condition for nonblocking waits.
    v3: Handle non-contiguous engine array after waits
    v4: Rebase, tidy, rewrite ring list debugging
    v5: Use obj->active as a bitfield, it looks cool
    v6: Micro-optimise, mostly involving moving code around
    v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
    v8: Rebase
    v9: Refactor i915_gem_object_sync() to allow the compiler to better
    optimise it.
    
    Benchmark: igt/gem_read_read_speed
    hsw:gt3e (with semaphores):
    Before: Time to read-read 1024k:		275.794µs
    After:  Time to read-read 1024k:		123.260µs
    
    hsw:gt3e (w/o semaphores):
    Before: Time to read-read 1024k:		230.433µs
    After:  Time to read-read 1024k:		124.593µs
    
    bdw-u (w/o semaphores):             Before          After
    Time to read-read 1x1:            26.274µs       10.350µs
    Time to read-read 128x128:        40.097µs       21.366µs
    Time to read-read 256x256:        77.087µs       42.608µs
    Time to read-read 512x512:       281.999µs      181.155µs
    Time to read-read 1024x1024:    1196.141µs     1118.223µs
    Time to read-read 2048x2048:    5639.072µs     5225.837µs
    Time to read-read 4096x4096:   22401.662µs    21137.067µs
    Time to read-read 8192x8192:   89617.735µs    85637.681µs
    
    Testcase: igt/gem_concurrent_blit (read-read and friends)
    Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8]
    [danvet: s/\<rq\>/req/g]
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
intel_overlay.c 39.7 KB