Commit c6a2ac71 authored by Tvrtko Ursulin's avatar Tvrtko Ursulin

drm/i915: Execlists small cleanups and micro-optimisations

Assorted changes in the areas of code cleanup, reduction of
invariant conditional in the interrupt handler and lock
contention and MMIO access optimisation.

 * Remove needless initialization.
 * Improve cache locality by reorganizing code and/or using
   branch hints to keep unexpected or error conditions out
   of line.
 * Favor busy submit path vs. empty queue.
 * Less branching in hot-paths.

v2:

 * Avoid mmio reads when possible. (Chris Wilson)
 * Use natural integer size for csb indices.
 * Remove useless return value from execlists_update_context.
 * Extract 32-bit ppgtt PDPs update so it is out of line and
   shared with two callers.
 * Grab forcewake across all mmio operations to ease the
   load on uncore lock and use chepear mmio ops.

v3:

 * Removed some more pointless u8 data types.
 * Removed unused return from execlists_context_queue.
 * Commit message updates.

v4:
 * Unclumsify the unqueue if statement. (Chris Wilson)
 * Hide forcewake from the queuing function. (Chris Wilson)

Version 3 now makes the irq handling code path ~20% smaller on
48-bit PPGTT hardware, and a little bit less elsewhere. Hot
paths are mostly in-line now and hammering on the uncore
spinlock is greatly reduced together with mmio traffic to an
extent.

Benchmarking with "gem_latency -n 100" (keep submitting
batches with 100 nop instruction) shows approximately 4% higher
throughput, 2% less CPU time and 22% smaller latencies. This was
on a big-core while small-cores could benefit even more.

Most likely reason for the improvements are the MMIO
optimization and uncore lock traffic reduction.

One odd result is with "gem_latency -n 0" (dispatching empty
batches) which shows 5% more throughput, 8% less CPU time,
25% better producer and consumer latencies, but 15% higher
dispatch latency which is yet unexplained.
Signed-off-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1456505912-22286-1-git-send-email-tvrtko.ursulin@linux.intel.com
parent 3ba86073
This diff is collapsed.
...@@ -271,7 +271,8 @@ struct intel_engine_cs { ...@@ -271,7 +271,8 @@ struct intel_engine_cs {
spinlock_t execlist_lock; spinlock_t execlist_lock;
struct list_head execlist_queue; struct list_head execlist_queue;
struct list_head execlist_retired_req_list; struct list_head execlist_retired_req_list;
u8 next_context_status_buffer; unsigned int next_context_status_buffer;
unsigned int idle_lite_restore_wa;
bool disable_lite_restore_wa; bool disable_lite_restore_wa;
u32 ctx_desc_template; u32 ctx_desc_template;
u32 irq_keep_mask; /* bitmask for interrupts that should not be masked */ u32 irq_keep_mask; /* bitmask for interrupts that should not be masked */
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment