- 22 Aug, 2016 11 commits
-
-
David Weinehall authored
Fix minor whitespace issues plus a typo. Signed-off-by: David Weinehall <david.weinehall@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160822103245.24069-2-david.weinehall@linux.intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
-
Maarten Lankhorst authored
Merge commit 5e580523 reverts the version bumping parts of commit 4aa7fb9c. Bump the versions again and request the specific firmware version. The currently recommended versions are: SKL 1.26, KBL 1.01 and BXT 1.07. Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> Cc: Imre Deak <imre.deak@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97242 Cc: drm-intel-fixes@lists.freedesktop.org Fixes: 5e580523 ("Backmerge tag 'v4.7' into drm-next") Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471266567-22443-1-git-send-email-maarten.lankhorst@linux.intel.comReviewed-by: Imre Deak <imre.deak@intel.com>
-
Lyude authored
If we're enabling a pipe, we'll need to modify the watermarks on all active planes. Since those planes won't be added to the state on their own, we need to add them ourselves. Signed-off-by: Lyude <cpaul@redhat.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Cc: stable@vger.kernel.org Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471463761-26796-6-git-send-email-cpaul@redhat.com
-
Matt Roper authored
When we write watermark values to the hardware, those values are stored in dev_priv->wm.skl_hw. However with recent watermark changes, the results structure we're copying from only contains valid watermark and DDB values for the pipes that are actually changing; the values for other pipes remain 0. Thus a blind copy of the entire skl_wm_values structure will clobber the values for unchanged pipes...we need to be more selective and only copy over the values for the changing pipes. This mistake was hidden until recently due to another bug that caused us to erroneously re-calculate watermarks for all active pipes rather than changing pipes. Only when that bug was fixed was the impact of this bug discovered (e.g., modesets failing with "Requested display configuration exceeds system watermark limitations" messages and leaving watermarks non-functional, even ones initiated by intel_fbdev_restore_mode). Changes since v1: - Add a function for copying a pipe's wm values (skl_copy_wm_for_pipe()) so we can reuse this later Fixes: 734fa01f ("drm/i915/gen9: Calculate watermarks during atomic 'check' (v2)") Fixes: 9b613022 ("drm/i915/gen9: Re-allocate DDB only for changed pipes") Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Lyude <cpaul@redhat.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471463761-26796-4-git-send-email-cpaul@redhat.com
-
Lyude authored
Since the watermark calculations for Skylake are still broken, we're apt to hitting underruns very easily under multi-monitor configurations. While it would be lovely if this was fixed, it's not. Another problem that's been coming from this however, is the mysterious issue of underruns causing full system hangs. An easy way to reproduce this with a skylake system: - Get a laptop with a skylake GPU, and hook up two external monitors to it - Move the cursor from the built-in LCD to one of the external displays as quickly as you can - You'll get a few pipe underruns, and eventually the entire system will just freeze. After doing a lot of investigation and reading through the bspec, I found the existence of the SAGV, which is responsible for adjusting the system agent voltage and clock frequencies depending on how much power we need. According to the bspec: "The display engine access to system memory is blocked during the adjustment time. SAGV defaults to enabled. Software must use the GT-driver pcode mailbox to disable SAGV when the display engine is not able to tolerate the blocking time." The rest of the bspec goes on to explain that software can simply leave the SAGV enabled, and disable it when we use interlaced pipes/have more then one pipe active. Sure enough, with this patchset the system hangs resulting from pipe underruns on Skylake have completely vanished on my T460s. Additionally, the bspec mentions turning off the SAGV with more then one pipe enabled as a workaround for display underruns. While this patch doesn't entirely fix that, it looks like it does improve the situation a little bit so it's likely this is going to be required to make watermarks on Skylake fully functional. This will still need additional work in the future: we shouldn't be enabling the SAGV if any of the currently enabled planes can't enable WM levels that introduce latencies >= 30 µs. Changes since v11: - Add skl_can_enable_sagv() - Make sure we don't enable SAGV when not all planes can enable watermarks >= the SAGV engine block time. I was originally going to save this for later, but I recently managed to run into a machine that was having problems with a single pipe configuration + SAGV. - Make comparisons to I915_SKL_SAGV_NOT_CONTROLLED explicit - Change I915_SAGV_DYNAMIC_FREQ to I915_SAGV_ENABLE - Move printks outside of mutexes - Don't print error messages twice Changes since v10: - Apparently sandybridge_pcode_read actually writes values and reads them back, despite it's misleading function name. This means we've been doing this mostly wrong and have been writing garbage to the SAGV control. Because of this, we no longer attempt to read the SAGV status during initialization (since there are no helpers for this). - mlankhorst noticed that this patch was breaking on some very early pre-release Skylake machines, which apparently don't allow you to disable the SAGV. To prevent machines from failing tests due to SAGV errors, if the first time we try to control the SAGV results in the mailbox indicating an invalid command, we just disable future attempts to control the SAGV state by setting dev_priv->skl_sagv_status to I915_SKL_SAGV_NOT_CONTROLLED and make a note of it in dmesg. - Move mutex_unlock() a little higher in skl_enable_sagv(). This doesn't actually fix anything, but lets us release the lock a little sooner since we're finished with it. Changes since v9: - Only enable/disable sagv on Skylake Changes since v8: - Add intel_state->modeset guard to the conditional for skl_enable_sagv() Changes since v7: - Remove GEN9_SAGV_LOW_FREQ, replace with GEN9_SAGV_IS_ENABLED (that's all we use it for anyway) - Use GEN9_SAGV_IS_ENABLED instead of 0x1 for clarification - Fix a styling error that snuck past me Changes since v6: - Protect skl_enable_sagv() with intel_state->modeset conditional in intel_atomic_commit_tail() Changes since v5: - Don't use is_power_of_2. Makes things confusing - Don't use the old state to figure out whether or not to enable/disable the sagv, use the new one - Split the loop in skl_disable_sagv into it's own function - Move skl_sagv_enable/disable() calls into intel_atomic_commit_tail() Changes since v4: - Use is_power_of_2 against active_crtcs to check whether we have > 1 pipe enabled - Fix skl_sagv_get_hw_state(): (temp & 0x1) indicates disabled, 0x0 enabled - Call skl_sagv_enable/disable() from pre/post-plane updates Changes since v3: - Use time_before() to compare timeout to jiffies Changes since v2: - Really apply minor style nitpicks to patch this time Changes since v1: - Added comments about this probably being one of the requirements to fixing Skylake's watermark issues - Minor style nitpicks from Matt Roper - Disable these functions on Broxton, since it doesn't have an SAGV Signed-off-by: Lyude <cpaul@redhat.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471463761-26796-3-git-send-email-cpaul@redhat.com [mlankhorst: ENOSYS -> ENXIO, whitespace fixes]
-
Lyude authored
In order to add proper support for the SAGV, we need to be able to know what the cause of a failure to change the SAGV through the pcode mailbox was. The reasoning for this is that some very early pre-release Skylake machines don't actually allow you to control the SAGV on them, and indicate an invalid mailbox command was sent. This also might come in handy in the future for debugging. Changes since v1: - Add functions for interpreting gen6 mailbox error codes along with gen7+ error codes, and actually interpret those codes properly - Renamed patch to reflect new behavior Signed-off-by: Lyude <cpaul@redhat.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471463761-26796-2-git-send-email-cpaul@redhat.com [mlankhorst: -ENOSYS -> -ENXIO for checkpatch]
-
Paulo Zanoni authored
intel_fbc_pre_update() depends upon the new state being already pinned in place in the Global GTT (primarily for both fencing which wants both an offset and a fence register, if assigned). This requires the call to intel_fbc_pre_update() be after intel_pin_and_fence_fb() - but commit e8216e50 ("drm/i915/fbc: call intel_fbc_pre_update earlier during page flips") moved the code way too much up in its attempt to call it before the page flip. v2 (from Paulo): - Point the original bad commit. - Add a comment to maybe prevent further regressions. Fixes: e8216e50 ("drm/i915/fbc: call intel_fbc_pre_update earlier...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com> Cc: drm-intel-fixes@lists.freedesktop.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1471462904-842-1-git-send-email-paulo.r.zanoni@intel.com Cc: stable@vger.kernel.org
-
Daniel Vetter authored
This issue here is (I think) purely theoretical, since a compiler would need to be especially foolish to recompute the value of i915_gem_request_completed right after it was already used. Hence the additional barrier() is also not really a restriction. But I believe this to be at least permissible, and since our rcu trickery is a beast it's worth to annotate all the corner cases. Chris proposed to instead just wrap a READ_ONCE around request->fence.seqno in i915_gem_request_completed. But that has a measurable impact on code size, and everywhere we hold a full reference to the underlying request it's also not needed. And personally I'd like to have just enough barriers and locking needed for correctness, but not more - it makes it much easier in the future to understand what's going on. Since the busy ioctl has now fully embraced it's races there's no point annotating it there too. We really only need it in active_get_rcu, since that function _must_ deliver a correct snapshot of the active fences (and not chase something else). v2: Polish the comment a bit more (Chris). Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1471856122-466-1-git-send-email-daniel.vetter@ffwll.ch
-
Chris Wilson authored
Since by design, if not entirely by practice, nothing is allowed to access the scratch page we use to background fill the VM, then we do not need to ensure that it is coherent between the CPU and GPU. set_pages_uc() does a stop_machine() after changing the PAT, and that significantly impacts upon context creation throughput. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20160822074431.26872-1-chris@chris-wilson.co.ukReviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
Chris Wilson authored
The passed in flag that distinguishes i915_gem_pin_display from i915_gem_gtt is from node->info_ent->data not the data function parameter. Fixes: 6da84829 ("drm/i915: Focus debugfs/i915_gem_pinned to show...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160819115625.17688-1-chris@chris-wilson.co.ukReviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
Daniel Vetter authored
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 19 Aug, 2016 7 commits
-
-
Chris Wilson authored
Very old numbers indicate this is a 66% improvement when remapping the entire object for fence contention - due to the elimination of track_pfn_insert and its strcmp. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Testcase: igt/gem_fence_upload/performance Testcase: igt/gem_mmap_gtt Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-6-chris@chris-wilson.co.uk
-
Chris Wilson authored
As io_mapping.h now always allocates the struct, we can avoid that allocation and extra pointer dance by embedding the struct inside drm_i915_private Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-5-chris@chris-wilson.co.uk
-
Chris Wilson authored
Currently, we only allocate a structure to hold metadata if we need to allocate an ioremap for every access, such as on x86-32. However, it would be useful to store basic information about the io-mapping, such as its page protection, on all platforms. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: linux-mm@kvack.org Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-4-chris@chris-wilson.co.uk
-
Chris Wilson authored
Only fbc1 is tied to using a fence. Later iterations of fbc are more flexible and allow operation on unfenced frontbuffers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: "Zanoni, Paulo R" <paulo.r.zanoni@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-3-chris@chris-wilson.co.uk
-
Chris Wilson authored
If the frontbuffer doesn't have an associated fence, it will have a fence reg of -1. If we attempt to OR in this register into the FBC control register we end up setting all control bits, oops! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: "Zanoni, Paulo R" <paulo.r.zanoni@intel.com> Reviwed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-2-chris@chris-wilson.co.uk
-
Chris Wilson authored
What I never hit in testing, but Mika immediately did, was a GPU hang with a pending fence release (where a tiled object has been changed by the user to be untiled, and the update has not yet been committed to the fence register). As the stride/tiling is 0, this causes a divide-by-zero error when trying to write the new fence parameters: [ 28.784518] drm/i915: Resetting chip after gpu hang [ 28.784551] divide error: 0000 [#1] PREEMPT SMP [ 28.784565] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat mxm_wmi x86_pkg_temp_thermal snd_hda_codec_hdmi kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec serio_raw snd_hwdep snd_hda_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_timer snd_seq_device snd soundcore mac_hid wmi efivarfs autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid0 multipath linear psmouse e1000e ptp pps_core nvme nvme_core i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm video [ 28.784738] CPU: 0 PID: 1692 Comm: kworker/0:2 Not tainted 4.8.0-rc2+ #895 [ 28.784752] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 1803 05/09/2016 [ 28.784786] Workqueue: events_long i915_hangcheck_elapsed [i915] [ 28.784814] task: ffff923c18f59d40 task.stack: ffff923c1b7e4000 [ 28.784827] RIP: 0010:[<ffffffffc0475b5f>] [<ffffffffc0475b5f>] fence_write+0x9f/0x3b0 [i915] [ 28.784854] RSP: 0018:ffff923c1b7e7b30 EFLAGS: 00010246 [ 28.784866] RAX: 00000000008ca000 RBX: ffff923c18540000 RCX: 0000000000000020 [ 28.784880] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000596d000 [ 28.784894] RBP: ffff923c1b7e7b68 R08: 0000000000000000 R09: 0000000000000000 [ 28.784908] R10: 0000000000000000 R11: 00000000008ca000 R12: ffff923c1ef9d600 [ 28.784921] R13: 0000000000100040 R14: 0000000000100044 R15: ffff923c18549908 [ 28.784935] FS: 0000000000000000(0000) GS:ffff923c36c00000(0000) knlGS:0000000000000000 [ 28.784951] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 28.784962] CR2: 00007f193373c893 CR3: 0000000419c78000 CR4: 00000000003406f0 [ 28.784976] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 28.784990] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 28.785004] Stack: [ 28.785009] 000000000596c03b ffff923c1b7e7b68 ffff923c18549938 0000000000000009 [ 28.785026] ffff923c18540000 ffff923c18549280 ffff923c18547ce8 ffff923c1b7e7b90 [ 28.785044] ffffffffc04761f9 ffff923c18540000 ffff923c18547d00 ffff923c18548ff8 [ 28.785062] Call Trace: [ 28.785078] [<ffffffffc04761f9>] i915_gem_restore_fences+0x39/0x50 [i915] [ 28.785102] [<ffffffffc047fe89>] i915_gem_reset+0x179/0x300 [i915] Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> Fixes: 49ef5294 ("drm/i915: Move fence tracking from object to vma") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-1-chris@chris-wilson.co.uk
-
Dave Gordon authored
In the recent patch bc3d6744 drm/i915: Allow userspace to request no-error-capture upon ... the final version moved the flags and the associated #defines around so they were adjacent; unfortunately, they ended up between a comment and the thing (hw_id) to which the comment applies :( So this patch reshuffles the comment and subject back together. Also, as we're touching 'hw_id', let's change it from just 'unsigned' to a fully-specified 'unsigned int', because some code checking tools (including checkpatch) object to plain 'unsigned'. Fixes: bc3d6744 ("drm/i915: Allow userspace to request no-error-capture...") Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1471616622-6919-1-git-send-email-david.s.gordon@intel.comReviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
-
- 18 Aug, 2016 22 commits
-
-
Chris Wilson authored
If we need to use clflush to prepare our batch for reads from memory, we can bypass the cache instead by using non-temporal copies. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-39-chris@chris-wilson.co.uk
-
Chris Wilson authored
A significant proportion of the cmdparsing time for some batches is the cost to find the register in the mmiotable. We ensure that those tables are in ascending order such that we could do a binary search if it was ever merited. It is. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-38-chris@chris-wilson.co.uk
-
Chris Wilson authored
If the command descriptor says to skip it, ignore checking for anyother other conflict. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-37-chris@chris-wilson.co.uk
-
Chris Wilson authored
On the blitter (and in test code), we see long sequences of repeated commands, e.g. XY_PIXEL_BLT, XY_SCANLINE_BLT, or XY_SRC_COPY. For these, we can skip the hashtable lookup by remembering the previous command descriptor and doing a straightforward compare of the command header. The corollary is that we need to do one extra comparison before lookup up new commands. v2: Less magic mask (ok, it is still magic, but now you cannot see!) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-36-chris@chris-wilson.co.uk
-
Chris Wilson authored
The existing code's hashfunction is very suboptimal (most 3D commands use the same bucket degrading the hash to a long list). The code even acknowledge that the issue was known and the fix simple: /* * If we attempt to generate a perfect hash, we should be able to look at bits * 31:29 of a command from a batch buffer and use the full mask for that * client. The existing INSTR_CLIENT_MASK/SHIFT defines can be used for this. */ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-35-chris@chris-wilson.co.uk
-
Chris Wilson authored
For simplicity, we want to continue using a contiguous mapping of the command buffer, but we can reduce the number of vmappings we hold by switching over to a page-by-page copy from the user batch buffer to the shadow. The cost for saving one linear mapping is about 5% in trivial workloads - which is more or less the overhead in calling kmap_atomic(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-34-chris@chris-wilson.co.uk
-
Chris Wilson authored
The single largest factor in the overhead of parsing the commands is the setup of the virtual mapping to provide a continuous block for the batch buffer. If we keep those vmappings around (against the better judgement of mm/vmalloc.c, which we offset by handwaving and looking suggestively at the shrinker) we can dramatically improve the performance of the parser for small batches (such as media workloads). Furthermore, we can use the prepare shmem read/write functions to determine how best we need to clflush the range (rather than every page of the object). The impact of caching both src/dst vmaps is +80% on ivb and +140% on byt for the throughput on small batches. (Caching just the dst vmap and iterating over the src, doing a page by page copy is roughly 5% slower on both platforms. That may be an acceptable trade-off to eliminate one cached vmapping, and we may be able to reduce the per-page copying overhead further.) For *this* simple test case, the cmdparser is now within a factor of 2 of ideal performance. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-33-chris@chris-wilson.co.uk
-
Chris Wilson authored
Since I have been using the BCS_TIMESTAMP to measure latency of execution upon the blitter ring, allow regular userspace to also read from that register. They are already allowed RCS_TIMESTAMP! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-32-chris@chris-wilson.co.uk
-
Chris Wilson authored
If the developer adds a register in the wrong order, we BUG during boot. That makes development and testing very difficult. Let's be a bit more friendly and disable the command parser with a big warning if the tables are invalid. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-31-chris@chris-wilson.co.uk
-
Chris Wilson authored
Since commit 43566ded ("drm/i915: Broaden application of set-domain(GTT)") we allowed objects to be in the GTT domain, but unbound. Therefore removing the GTT cache domain when removing the GGTT vma is no longer semantically correct. An unfortunate side-effect is we lose the wondrously named i915_gem_object_finish_gtt(), not to be confused with i915_gem_gtt_finish_object()! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-30-chris@chris-wilson.co.uk
-
Chris Wilson authored
We track the LRU access for eviction and bump the last access for the user GGTT on set-to-gtt. When we do so we need to not only bump the primary GGTT VMA but all partials as well. Similarly we want to bump the last access tracking for when unpinning an object from the scanout so that they do not get promptly evicted and hopefully remain available for reuse on the next frame. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-29-chris@chris-wilson.co.uk
-
Chris Wilson authored
When using the aliasing ppgtt and pageflipping with the shrinker/eviction active, we note that we often have to rebind the backbuffer before flipping onto the scanout because it has an invalid alignment. If we store the worst-case alignment required for a VMA, we can avoid having to rebind at critical junctures. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-28-chris@chris-wilson.co.uk
-
Chris Wilson authored
The existing ABI says that scanouts are pinned into the mappable region so that legacy clients (e.g. old Xorg or plymouthd) can write directly into the scanout through a GTT mapping. However if the surface does not fit into the mappable region, we are better off just trying to fit it anywhere and hoping for the best. (Any userspace that is capable of using ginormous scanouts is also likely not to rely on pure GTT updates.) With the partial vma fault support, we are no longer restricted to only using scanouts that we can pin (though it is still preferred for performance reasons and for powersaving features like FBC). v2: Skip fence pinning when not mappable. v3: Add a comment to explain the possible ramifications of not being able to use fences for unmappable scanouts. v4: Rebase to skip over some local patches v5: Rebase to defer until after we have unmappable GTT fault support Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Deepak S <deepak.s@linux.intel.com> Cc: Damien Lespiau <damien.lespiau@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jani Nikula <jani.nikula@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-27-chris@chris-wilson.co.uk
-
Chris Wilson authored
Often times we do not want to evict mapped objects from the GGTT as these are quite expensive to teardown and frequently reused (causing an equally, if not more so, expensive setup). In particular, when faulting in a new object we want to avoid evicting an active object, or else we may trigger a page-fault-of-doom as we ping-pong between evicting two objects. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-26-chris@chris-wilson.co.uk
-
Chris Wilson authored
If FBC is set on a framebuffer that is unmapped, all GTT faults will be from a partial mapping. Writes by the user through the partial VMA are then untracked by the FBC and so we must use the ORIGIN_CPU when flushing the I915_GEM_DOMAIN_GTT. v2: Keep ORIGIN_CPU for set-to-domain(.write=CPU) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: "Zanoni, Paulo R" <paulo.r.zanoni@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-25-chris@chris-wilson.co.uk
-
Chris Wilson authored
If we want to create a partial vma from a chunk that is the same size as the object, create a normal ggtt vma instead. The benefit is that it will match future requests for the normal ggtt. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-24-chris@chris-wilson.co.uk
-
Chris Wilson authored
We want to always use the partial VMA as a fallback for a failure to bind the object into the GGTT. This extends the support partial objects in the GGTT to cover everything, not just objects too large. v2: Call the partial view, view not partial. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-23-chris@chris-wilson.co.uk
-
Chris Wilson authored
In order to support setting up fences for partial mappings of an object, we have to align those mappings with the fence. The minimum chunksize we choose is at least the size of a single tile row. v2: Make minimum chunk size a define for later use Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-22-chris@chris-wilson.co.uk
-
Chris Wilson authored
In order to handle tiled partial GTT mmappings, we need to associate the fence with an individual vma. v2: A couple of silly drops replaced spotted by Joonas Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-21-chris@chris-wilson.co.uk
-
Chris Wilson authored
Our current practice is to only name the actual list (here dev_priv->fence_list) using "list", and elements upon that list are referred to as "link". Further, the lru nature is of the list and not of the node and including in the name does not disambiguate the link from anything else. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-20-chris@chris-wilson.co.uk
-
Chris Wilson authored
Keep any error reported by the gup_worker until we are notified that the arena has changed (via the mmu-notifier). This has the importance of making two consecutive calls to i915_gem_object_get_pages() reporting the same error, and curtailing a loop of detecting a fault and requeueing a gup_worker. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-19-chris@chris-wilson.co.uk
-
Chris Wilson authored
If we have stolen available, make use of it for ringbuffer allocation. Previously this was restricted to !llc platforms, as writing to stolen requires a GGTT mapping - but now that we have partial mappable support, the mappable aperture isn't quite so precious so we can use it more freely and ringbuffers are a good user for the otherwise wasted stolen. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-18-chris@chris-wilson.co.uk
-