- 11 Jul, 2018 1 commit
-
-
Chris Wilson authored
Add a mutex into struct i915_address_space to be used while operating on the vma and their lists for a particular vm. As this may be called from the shrinker, we taint the mutex with fs_reclaim so that from the start lockdep warns us if we are caught holding the mutex across an allocation. (With such small steps we will eventually rid ourselves of struct_mutex recursion!) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20180711073608.20286-2-chris@chris-wilson.co.uk
-
- 10 Jul, 2018 10 commits
-
-
Paulo Zanoni authored
Now that our stolen memory is already reserved by the x86 subsystem (since commit "x86/gpu: reserve ICL's graphics stolen memory"), make use of it. Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: x86@kernel.org Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180504203252.28048-2-paulo.r.zanoni@intel.com
-
Paulo Zanoni authored
ICL changes the registers and addresses to 64 bits. I also briefly looked at implementing a u64 version of the PCI config read functions, but I concluded this wouldn't be trivial, so it's not worth doing for a single user that can't have any racing problems while reading the register in two separate operations. v2: - Scrub the development (non-public) changelog (Joonas). - Remove the i915.ko bits so this can be easily backported in order to properly avoid stolen memory even on machines without i915.ko (Joonas). - CC stable for the reasons above. Issue: VIZ-9250 CC: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Fixes: 41231001 ("drm/i915/icl: Add initial Icelake definitions.") Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180504203252.28048-1-paulo.r.zanoni@intel.com
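The "two separate operations" approach described above can be sketched in user-space C. This is only an illustration, not the patch itself: `cfg_read32_fn`, `fake_read32` and `cfg_read64_split` are hypothetical names, and the fake config space is an in-memory array rather than real PCI config space.

```c
#include <stdint.h>

/* Hypothetical callback type standing in for a 32-bit PCI config read. */
typedef uint32_t (*cfg_read32_fn)(void *dev, unsigned int offset);

/* Fake config space for illustration: an array of dwords indexed by offset / 4. */
static uint32_t fake_read32(void *dev, unsigned int offset)
{
	const uint32_t *space = dev;

	return space[offset / 4];
}

/*
 * Read a 64-bit value as two 32-bit halves: the low dword at `offset`
 * and the high dword at `offset + 4`. This is only safe when the
 * register cannot change between the two reads.
 */
static uint64_t cfg_read64_split(cfg_read32_fn read32, void *dev,
				 unsigned int offset)
{
	uint64_t lo = read32(dev, offset);
	uint64_t hi = read32(dev, offset + 4);

	return (hi << 32) | lo;
}
```

Because the value being read is fixed while it is sampled, the two halves cannot tear, which is the reason the commit gives for not needing a single 64-bit read.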
-
Chris Wilson authored
Following intel_gvt_init() failure, we missed unwinding our setup leaving pointers dangling past the module unload. For our example, the pm_qos: [ 441.057615] top: 000000006b3baf1c, n: 0000000054d8ef33, p: 0000000097cdf1a2 prev: 0000000054d8ef33, n: 0000000097cdf1a2, p: 000000006b3baf1c next: 0000000097cdf1a2, n: 000000006de8fc8b, p: 0000000081087253 [ 441.057627] WARNING: CPU: 4 PID: 9277 at lib/plist.c:42 plist_check_prev_next+0x2d/0x40 [ 441.057628] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core e1000e snd_pcm mei_me mei prime_numbers [last unloaded: i915] [ 441.057652] CPU: 4 PID: 9277 Comm: drv_selftest Tainted: G U 4.18.0-rc4-CI-CI_DRM_4464+ #1 [ 441.057653] Hardware name: System manufacturer System Product Name/Z170 PRO GAMING, BIOS 3402 04/26/2017 [ 441.057656] RIP: 0010:plist_check_prev_next+0x2d/0x40 [ 441.057657] Code: 08 48 39 f0 74 2b 49 89 f0 48 8b 4f 08 50 ff 32 52 48 89 fe 41 ff 70 08 48 8b 17 48 c7 c7 d8 ae 14 82 4d 8b 08 e8 63 0e 76 ff <0f> 0b 48 83 c4 20 c3 48 39 10 75 d0 f3 c3 0f 1f 44 00 00 41 54 55 [ 441.057717] RSP: 0018:ffffc900003a3a68 EFLAGS: 00010082 [ 441.057720] RAX: 0000000000000000 RBX: ffff8802193978c0 RCX: 0000000000000002 [ 441.057721] RDX: 0000000080000002 RSI: ffffffff820c65a4 RDI: 00000000ffffffff [ 441.057722] RBP: ffff8802193978c0 R08: 0000000000000000 R09: 0000000000000001 [ 441.057724] R10: ffffc900003a3a70 R11: 0000000000000000 R12: ffffffff82243de0 [ 441.057725] R13: ffffffff82243de0 R14: ffff88021a6c78c0 R15: 0000000077359400 [ 441.057726] FS: 00007fc23a4a9980(0000) GS:ffff880236d00000(0000) knlGS:0000000000000000 [ 441.057728] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 441.057729] CR2: 0000563e4503d038 CR3: 0000000138f86005 CR4: 00000000003606e0 [ 441.057730] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 
441.057731] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 441.057732] Call Trace: [ 441.057736] plist_check_list+0x2e/0x40 [ 441.057738] plist_add+0x23/0x130 [ 441.057743] pm_qos_update_target+0x1bd/0x2f0 [ 441.057771] i915_driver_load+0xec4/0x1060 [i915] [ 441.057775] ? trace_hardirqs_on_caller+0xe0/0x1b0 [ 441.057800] i915_pci_probe+0x29/0x90 [i915] [ 441.057804] pci_device_probe+0xa1/0x130 [ 441.057807] driver_probe_device+0x306/0x480 [ 441.057810] __driver_attach+0xdb/0x100 [ 441.057812] ? driver_probe_device+0x480/0x480 [ 441.057813] ? driver_probe_device+0x480/0x480 [ 441.057816] bus_for_each_dev+0x74/0xc0 [ 441.057819] bus_add_driver+0x15f/0x250 [ 441.057821] ? 0xffffffffa0696000 [ 441.057823] driver_register+0x56/0xe0 [ 441.057825] ? 0xffffffffa0696000 [ 441.057827] do_one_initcall+0x58/0x370 [ 441.057830] ? do_init_module+0x1d/0x1ea [ 441.057832] ? rcu_read_lock_sched_held+0x6f/0x80 [ 441.057834] ? kmem_cache_alloc_trace+0x282/0x2e0 [ 441.057838] do_init_module+0x56/0x1ea [ 441.057841] load_module+0x2435/0x2b20 [ 441.057852] ? 
__se_sys_finit_module+0xd3/0xf0 [ 441.057854] __se_sys_finit_module+0xd3/0xf0 [ 441.057861] do_syscall_64+0x55/0x190 [ 441.057863] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 441.057865] RIP: 0033:0x7fc239d75839 [ 441.057866] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48 [ 441.057927] RSP: 002b:00007fffb7825d38 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 441.057930] RAX: ffffffffffffffda RBX: 0000563e45035dd0 RCX: 00007fc239d75839 [ 441.057931] RDX: 0000000000000000 RSI: 0000563e4502f8a0 RDI: 0000000000000004 [ 441.057932] RBP: 0000563e4502f8a0 R08: 0000000000000004 R09: 0000000000000000 [ 441.057933] R10: 00007fffb7825ea0 R11: 0000000000000246 R12: 0000000000000000 [ 441.057934] R13: 0000563e4502f690 R14: 0000000000000000 R15: 000000000000003f [ 441.057940] irq event stamp: 231338 [ 441.057943] hardirqs last enabled at (231337): [<ffffffff8193e3fc>] _raw_spin_unlock_irqrestore+0x4c/0x60 [ 441.057944] hardirqs last disabled at (231338): [<ffffffff8193e26d>] _raw_spin_lock_irqsave+0xd/0x50 [ 441.057947] softirqs last enabled at (231024): [<ffffffff81c0034f>] __do_softirq+0x34f/0x505 [ 441.057949] softirqs last disabled at (231005): [<ffffffff8108c7b9>] irq_exit+0xa9/0xc0 [ 441.057951] WARNING: CPU: 4 PID: 9277 at lib/plist.c:42 plist_check_prev_next+0x2d/0x40 v2: Add a load failure point to intel_gvt_init() so that we always exercise this path in future. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107129 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180710143821.1889-1-chris@chris-wilson.co.uk
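The class of bug fixed here (a late init step fails and earlier setup is left registered) is usually avoided with goto-based unwinding plus a fault-injection point. The sketch below is a user-space model under stated assumptions: `struct fake_driver`, `fake_driver_load` and the `fail_at` knob are hypothetical and do not mirror the real i915 load sequence.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical driver state: each flag records one completed setup step. */
struct fake_driver {
	bool qos_registered;
	bool hw_ready;
	bool gvt_ready;
};

/*
 * Run three setup steps in order. `fail_at` is a hypothetical
 * fault-injection knob: the step with that number fails (0 = none).
 * On failure, every earlier step is undone in reverse order so that
 * nothing is left dangling after the load is abandoned.
 */
static int fake_driver_load(struct fake_driver *drv, int fail_at)
{
	int err = -ENODEV;

	drv->qos_registered = true;	/* step 1 */

	if (fail_at == 2)
		goto err_qos;
	drv->hw_ready = true;		/* step 2 */

	if (fail_at == 3)
		goto err_hw;
	drv->gvt_ready = true;		/* step 3 */

	return 0;

err_hw:
	drv->hw_ready = false;
err_qos:
	drv->qos_registered = false;
	return err;
}
```

Calling `fake_driver_load(&drv, 3)` models the v2 change: forcing the last step to fail means the unwind path is exercised on every test run rather than only when the hardware misbehaves.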
-
Chris Wilson authored
After handling a critical failure initialising GEM we need to unwind the modesetting setup. Testcase: igt/drv_module_reload/basic-reload-inject Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180710094421.16223-2-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>
-
Chris Wilson authored
On unwinding following a critical failure inside GEM init, we also need to be sure to flush the workers before unloading the module. Testcase: igt/drv_module_reload/basic-reload-inject Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180710094421.16223-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
In the next patch, we will make a fairly minor change to flush outstanding resets before suspend. In order to keep churn to a minimum in that functional patch, we fix up the comments and coding style now. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180709130208.11730-7-chris@chris-wilson.co.uk
-
Chris Wilson authored
Across a reset, the seqno (and thus hangcheck) should restart and the hangcheck naturally progress, for when it does not, we want to declare an emergency. Currently, we only detect if reset and reinit fails, but we do not detect if the call to reinit succeeds but the HW is fried - as we are resetting hangcheck on initialising the engine. Remove that and rely on the natural progress to reset the hangcheck timer. References: e21b1413 ("drm/i915: Mark the hangcheck as idle when unparking the engines") References: 1fd00c0f ("drm/i915: Declare the driver wedged if hangcheck makes no progress") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180709130208.11730-2-chris@chris-wilson.co.uk
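The idea of letting only real progress clear the stall counter can be modelled in a few lines. This is a simplified sketch, not the driver's hangcheck: `struct fake_hangcheck`, `hangcheck_sample` and `STALL_LIMIT` are hypothetical names, and the limit of three samples is an arbitrary assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define STALL_LIMIT 3	/* hypothetical: samples without progress before giving up */

struct fake_hangcheck {
	uint32_t last_seqno;	/* seqno seen at the previous sample */
	unsigned int stalled;	/* consecutive samples without progress */
};

/*
 * Sample the engine's seqno. Only an advancing seqno clears the stall
 * counter; there is deliberately no "reset on reinit" path, so an engine
 * that reinitialises successfully but never executes anything is still
 * caught. Returns true once the engine should be declared wedged.
 */
static bool hangcheck_sample(struct fake_hangcheck *hc, uint32_t seqno)
{
	if (seqno != hc->last_seqno) {
		hc->last_seqno = seqno;
		hc->stalled = 0;
		return false;
	}

	return ++hc->stalled >= STALL_LIMIT;
}
```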
-
Chris Wilson authored
In our swizzling selftests, we cannot predict the physical address of the target page (at least not simply!) and so skip bit17 swizzles. However, there are two bit17 swizzle modes and we only skipped one, with the second being observed on the lab gdg causing the test to fail, as soon as we hit a page with bit17 set in its address. Testcase: igt/drv_selftest/live_objects #gdg Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180709194915.5789-1-chris@chris-wilson.co.uk
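A minimal sketch of the check described above: a test that cannot know the physical address of a page has to skip every swizzle mode that involves bit 17, not just one of them. The enum and function names are hypothetical stand-ins for the driver's swizzle-mode constants.

```c
#include <stdbool.h>

/* Hypothetical swizzle modes: which address bits are XORed into bit 6. */
enum fake_swizzle {
	FAKE_SWIZZLE_NONE,
	FAKE_SWIZZLE_9,
	FAKE_SWIZZLE_9_10,
	FAKE_SWIZZLE_9_17,
	FAKE_SWIZZLE_9_10_17,
};

/*
 * Bit 17 comes from the physical address of the page, which the test
 * cannot predict, so both bit17 modes must be reported as untestable.
 */
static bool swizzle_needs_physical_address(enum fake_swizzle mode)
{
	switch (mode) {
	case FAKE_SWIZZLE_9_17:
	case FAKE_SWIZZLE_9_10_17:
		return true;
	default:
		return false;
	}
}
```

Checking only `FAKE_SWIZZLE_9_17` would reproduce the failure from the commit message: the test passes until it happens to touch a page with bit 17 set in its address.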
-
Chris Wilson authored
Be pessimistic and presume that we actually allocate every page we exercise via the mock_gtt (e.g. for gvt). In which case we have to keep our working set under the available physical memory to prevent oom. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180710080424.7821-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
Error messages are intended to be addressed to the user; be clear, succinct, instructive and unambiguous. Adding the function name to that message does not add any information the user requires and in the process makes the message less clear. E.g. [ 245.539711] i915 0000:00:02.0: [drm:i915_gem_init [i915]] Failed to initialize GPU, declaring it wedged! becomes [ 245.539711] i915 0000:00:02.0: Failed to initialize GPU, declaring it wedged! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180709134858.12446-1-chris@chris-wilson.co.uk
-
- 09 Jul, 2018 6 commits
-
-
Rodrigo Vivi authored
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
-
Chris Wilson authored
igt_mmap_offset_exhaustion() wants to test what happens when the mmap space is filled with zombie objects, objects discarded by userspace but still active on the GPU. As they are only protected by the active reference, we have to be certain that active reference is kept while we peek into our dangling pointer. That active reference should not be freed until we retire, but we do that retirement from a background thread. This leaves us with a subtle timing problem, exacerbated and highlighted by KASAN: <3>[ 132.380399] BUG: KASAN: use-after-free in drm_gem_create_mmap_offset+0x8c/0xd0 <3>[ 132.380430] Read of size 8 at addr ffff8801e13245f8 by task drv_selftest/5822 <4>[ 132.380470] CPU: 0 PID: 5822 Comm: drv_selftest Tainted: G U 4.18.0-rc3-g7ae7763aa2be-kasan_48+ #1 <4>[ 132.380473] Hardware name: Dell Inc. XPS 8300 /0Y2MRG, BIOS A06 10/17/2011 <4>[ 132.380475] Call Trace: <4>[ 132.380481] dump_stack+0x7c/0xbb <4>[ 132.380487] print_address_description+0x65/0x270 <4>[ 132.380493] kasan_report+0x25b/0x380 <4>[ 132.380497] ? drm_gem_create_mmap_offset+0x8c/0xd0 <4>[ 132.380503] drm_gem_create_mmap_offset+0x8c/0xd0 <4>[ 132.380584] i915_gem_object_create_mmap_offset+0x6d/0x100 [i915] <4>[ 132.380650] igt_mmap_offset_exhaustion+0x462/0x940 [i915] <4>[ 132.380714] ? i915_gem_close_object+0x740/0x740 [i915] <4>[ 132.380784] ? igt_gem_huge+0x269/0x3d0 [i915] <4>[ 132.380865] __i915_subtests+0x5a/0x160 [i915] <4>[ 132.380936] __run_selftests+0x1a2/0x2f0 [i915] <4>[ 132.381008] i915_live_selftests+0x4e/0x80 [i915] <4>[ 132.381071] i915_pci_probe+0xd8/0x1b0 [i915] <4>[ 132.381077] pci_device_probe+0x1c5/0x3a0 <4>[ 132.381087] driver_probe_device+0x6b6/0xcb0 <4>[ 132.381094] __driver_attach+0x22d/0x2c0 <4>[ 132.381100] ? driver_probe_device+0xcb0/0xcb0 <4>[ 132.381103] bus_for_each_dev+0x113/0x1a0 <4>[ 132.381108] ? check_flags.part.24+0x450/0x450 <4>[ 132.381112] ? 
subsys_dev_iter_exit+0x10/0x10 <4>[ 132.381123] bus_add_driver+0x38b/0x6e0 <4>[ 132.381131] driver_register+0x189/0x400 <4>[ 132.381136] ? 0xffffffffc12d8000 <4>[ 132.381140] do_one_initcall+0xa0/0x4c0 <4>[ 132.381145] ? initcall_blacklisted+0x180/0x180 <4>[ 132.381152] ? do_init_module+0x4a/0x54c <4>[ 132.381156] ? rcu_lockdep_current_cpu_online+0xdc/0x130 <4>[ 132.381161] ? kasan_unpoison_shadow+0x30/0x40 <4>[ 132.381169] do_init_module+0x1b5/0x54c <4>[ 132.381177] load_module+0x619e/0x9b70 <4>[ 132.381202] ? module_frob_arch_sections+0x20/0x20 <4>[ 132.381211] ? vfs_read+0x257/0x2f0 <4>[ 132.381214] ? vfs_read+0x257/0x2f0 <4>[ 132.381221] ? kernel_read+0x8b/0x130 <4>[ 132.381231] ? copy_strings_kernel+0x120/0x120 <4>[ 132.381244] ? __se_sys_finit_module+0x17c/0x1a0 <4>[ 132.381248] __se_sys_finit_module+0x17c/0x1a0 <4>[ 132.381252] ? __ia32_sys_init_module+0xa0/0xa0 <4>[ 132.381261] ? __se_sys_newstat+0x77/0xd0 <4>[ 132.381265] ? cp_new_stat+0x590/0x590 <4>[ 132.381269] ? kmem_cache_free+0x2f0/0x340 <4>[ 132.381285] do_syscall_64+0x97/0x400 <4>[ 132.381292] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 132.381295] RIP: 0033:0x7eff4af46839 <4>[ 132.381297] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48 <4>[ 132.381426] RSP: 002b:00007ffcd84f4cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 <4>[ 132.381432] RAX: ffffffffffffffda RBX: 000055dfdeb429a0 RCX: 00007eff4af46839 <4>[ 132.381435] RDX: 0000000000000000 RSI: 000055dfdeb43670 RDI: 0000000000000004 <4>[ 132.381437] RBP: 000055dfdeb43670 R08: 0000000000000004 R09: 0000000000000000 <4>[ 132.381440] R10: 00007ffcd84f4e60 R11: 0000000000000246 R12: 0000000000000000 <4>[ 132.381442] R13: 000055dfdeb3bec0 R14: 0000000000000000 R15: 000000000000003b <3>[ 132.381466] Allocated by task 5822: <4>[ 132.381485] kmem_cache_alloc+0xdf/0x2e0 <4>[ 132.381546] 
i915_gem_object_create_internal+0x24/0x1e0 [i915] <4>[ 132.381609] igt_mmap_offset_exhaustion+0x257/0x940 [i915] <4>[ 132.381677] __i915_subtests+0x5a/0x160 [i915] <4>[ 132.381742] __run_selftests+0x1a2/0x2f0 [i915] <4>[ 132.381806] i915_live_selftests+0x4e/0x80 [i915] <4>[ 132.381865] i915_pci_probe+0xd8/0x1b0 [i915] <4>[ 132.381868] pci_device_probe+0x1c5/0x3a0 <4>[ 132.381871] driver_probe_device+0x6b6/0xcb0 <4>[ 132.381874] __driver_attach+0x22d/0x2c0 <4>[ 132.381877] bus_for_each_dev+0x113/0x1a0 <4>[ 132.381880] bus_add_driver+0x38b/0x6e0 <4>[ 132.381884] driver_register+0x189/0x400 <4>[ 132.381886] do_one_initcall+0xa0/0x4c0 <4>[ 132.381889] do_init_module+0x1b5/0x54c <4>[ 132.381892] load_module+0x619e/0x9b70 <4>[ 132.381895] __se_sys_finit_module+0x17c/0x1a0 <4>[ 132.381898] do_syscall_64+0x97/0x400 <4>[ 132.381901] entry_SYSCALL_64_after_hwframe+0x49/0xbe <3>[ 132.381914] Freed by task 150: <4>[ 132.381931] kmem_cache_free+0xb7/0x340 <4>[ 132.381995] __i915_gem_free_objects+0x875/0xf50 [i915] <4>[ 132.382054] __i915_gem_free_work+0x69/0xb0 [i915] <4>[ 132.382058] process_one_work+0x78b/0x1740 <4>[ 132.382061] worker_thread+0x82/0xb80 <4>[ 132.382064] kthread+0x30c/0x3d0 <4>[ 132.382067] ret_from_fork+0x3a/0x50 <3>[ 132.382081] The buggy address belongs to the object at ffff8801e1324500 which belongs to the cache drm_i915_gem_object of size 1168 <3>[ 132.382133] The buggy address is located 248 bytes inside of 1168-byte region [ffff8801e1324500, ffff8801e1324990) <3>[ 132.382179] The buggy address belongs to the page: <0>[ 132.382202] page:ffffea000784c800 count:1 mapcount:0 mapping:ffff8801dedf6500 index:0xffff8801e1323ec0 compound_mapcount: 0 <0>[ 132.382251] flags: 0x8000000000008100(slab|head) <1>[ 132.382274] raw: 8000000000008100 ffff8801d6317440 ffff8801d6317440 ffff8801dedf6500 <1>[ 132.382307] raw: ffff8801e1323ec0 0000000000140013 00000001ffffffff 0000000000000000 <1>[ 132.382339] page dumped because: kasan: bad access detected <3>[ 132.382373] 
Memory state around the buggy address: <3>[ 132.382395] ffff8801e1324480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc <3>[ 132.382426] ffff8801e1324500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb <3>[ 132.382457] >ffff8801e1324580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb <3>[ 132.382488] ^ <3>[ 132.382517] ffff8801e1324600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb <3>[ 132.382548] ffff8801e1324680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb This patch tricks the system into running without the background retire thread, until after we finish the test. The only reaping should then be performed by the mmap offset routine to reclaim the space as required. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180709130208.11730-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
In igt_flush_test() we install a background timer in order to ensure that the wait completes within a certain time. We can now tell the wait that it has to complete within a timeout, and so no longer need the background timer. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-3-chris@chris-wilson.co.uk
-
Chris Wilson authored
With a broken GPU we expect it to fail during the initial GPU setup where we do a couple of context switches to record the defaults. This is a task that takes a few milliseconds even on the slowest of devices, but we may have to wait 60s for hangcheck to give in and declare the machine inoperable. This is a case where any GPU hang is unacceptable, both from a timeliness and practical standpoint. We can therefore set a timeout on our wait-for-idle that is shorter than the hangcheck (which may be up to 60s for declaring a wedged driver) and so detect the broken GPU much more quickly during driver load (and so prevent stalling userspace for ages). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-2-chris@chris-wilson.co.uk
-
Chris Wilson authored
Usually we have no idea about the upper bound we need to wait to catch up with userspace when idling the device, but in a few situations we know the system was idle beforehand and can provide a short timeout in order to very quickly catch a failure, long before hangcheck kicks in. In the following patches, we will use the timeout to curtail two overly long waits, where we know we can expect the GPU to complete within a reasonable time or declare it broken. In particular, with a broken GPU we expect it to fail during the initial GPU setup where we do a couple of context switches to record the defaults. This is a task that takes a few milliseconds even on the slowest of devices, but we may have to wait 60s for hangcheck to give in and declare the machine inoperable. This is a case where any GPU hang is unacceptable, both from a timeliness and practical standpoint. The other improvement is that in selftests, we do not need to arm an independent timer to inject a wedge, as we can just limit the timeout on the wait directly. v2: Include the timeout parameter in the trace. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-1-chris@chris-wilson.co.uk
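A bounded wait of this kind can be modelled with a fake device and a tick counter. This is a sketch under assumptions, not the driver's wait-for-idle: `struct fake_gpu`, `fake_gpu_tick` and `fake_wait_for_idle` are hypothetical, and time is counted in abstract ticks rather than jiffies.

```c
#include <errno.h>

/* Hypothetical device model. */
struct fake_gpu {
	unsigned int pending;	/* work units left; 0 means idle */
	unsigned int rate;	/* units retired per tick; 0 models a hung GPU */
};

/* Advance the fake device by one tick, never retiring more than is pending. */
static void fake_gpu_tick(struct fake_gpu *gpu)
{
	gpu->pending -= gpu->rate < gpu->pending ? gpu->rate : gpu->pending;
}

/*
 * Wait until the device is idle, but for at most `timeout_ticks` ticks.
 * Returns 0 once idle, or -ETIME if the budget runs out first, so the
 * caller can declare the device broken without a separate watchdog.
 */
static int fake_wait_for_idle(struct fake_gpu *gpu, unsigned int timeout_ticks)
{
	while (gpu->pending) {
		if (!timeout_ticks--)
			return -ETIME;
		fake_gpu_tick(gpu);
	}

	return 0;
}
```

With the timeout carried by the wait itself, a caller that knows the device was idle beforehand can pass a small budget and fail fast, which is also why the selftests no longer need an independent timer.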
-
Chris Wilson authored
i915g has a slightly different tiling layout, and so requires a different reference swizzle pattern. Testcase: igt/drv_selftests/live_objects #gdg Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180707100405.817-1-chris@chris-wilson.co.uk
-
- 07 Jul, 2018 1 commit
-
-
Chris Wilson authored
In the next patch, we will want a third distinct class of timeline that may overlap with the current pair of client and engine timeline classes. Rather than use the ad hoc markup of SINGLE_DEPTH_NESTING, initialise the different timeline classes with an explicit subclass. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706210710.16251-1-chris@chris-wilson.co.uk
-
- 06 Jul, 2018 22 commits
-
-
Chris Wilson authored
Inside the mock GEM device, we try to grab the runtime pm for the fake device to prevent it from ever suspending. However, if CONFIG_PM is not set, trying to obtain the wakeref returns an error which we WARN about. Suppress the expected warning. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706205947.11209-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
clflush is an unserialised instruction and the IA manual strongly advises you to serialise it with a mb. To be cautious, apply one before and one after, so that it is serialised with both writes and reads without worrying too much about the required direction. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706174926.4712-1-chris@chris-wilson.co.uk
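The "one barrier before, one after" pattern can be tried in user space with the SSE2 intrinsics `_mm_mfence()` and `_mm_clflush()`. The sketch below makes several assumptions: a 64-byte cacheline, a hypothetical helper name, and a fallback to a plain compiler/CPU barrier on builds without SSE2 (where no cachelines are flushed at all).

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define FAKE_CACHELINE 64	/* assumption: 64-byte cachelines */

/*
 * Flush every cacheline covering [addr, addr + len), with a full memory
 * barrier before and after, so the flushes are ordered against both
 * earlier writes and later reads. Returns the number of lines flushed.
 */
static size_t flush_range_serialised(const void *addr, size_t len)
{
	size_t lines = 0;
#if defined(__SSE2__)
	uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(FAKE_CACHELINE - 1);
	uintptr_t end = (uintptr_t)addr + len;

	_mm_mfence();
	for (; p < end; p += FAKE_CACHELINE) {
		_mm_clflush((const void *)p);
		lines++;
	}
	_mm_mfence();
#else
	/* No clflush available: only issue a full barrier. */
	(void)addr;
	(void)len;
	__sync_synchronize();
#endif
	return lines;
}
```

Flushing does not change the data, so the buffer contents are the same before and after the call; only the ordering and cache residency are affected.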
-
Chris Wilson authored
Using a VMA on more than one timeline concurrently is the exception rather than the rule (using it concurrently on multiple engines). As we expect to only use one active tracker, store the most recently used tracker inside the i915_vma itself and only fallback to the rbtree if we need a second or more concurrent active trackers. v2: Comments on how we overwrite any existing last_active cache. v3: __list_del_entry() before list_replace_init() is confusing and, much more important, entirely redundant. v4: Note that both last_active and the rbtree may be simultaneously tracking this timeline, albeit with different requests, and so the vma may be retired twice for the same timeline. v5: No, that list_del is required! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706123157.9645-1-chris@chris-wilson.co.uk
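The fast path described above, a single cached tracker with a slower lookup structure behind it, can be sketched as follows. This is a simplified model and not the patch's data layout: a small array stands in for the rbtree, the cache is a pointer into that array, and all names (`struct fake_vma`, `struct fake_tracker`, `fake_active_instance`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FAKE_MAX_TRACKERS 8	/* hypothetical capacity of the fallback store */

struct fake_tracker {
	uint64_t timeline;
	int in_use;
};

struct fake_vma {
	struct fake_tracker *last_active;		/* most recently used tracker */
	struct fake_tracker slow[FAKE_MAX_TRACKERS];	/* stands in for the rbtree */
	unsigned int slow_lookups;			/* how often the fallback was needed */
};

/* Find (or create) the tracker for `timeline`, trying the cache first. */
static struct fake_tracker *fake_active_instance(struct fake_vma *vma,
						 uint64_t timeline)
{
	struct fake_tracker *t = vma->last_active;
	size_t i;

	/* Common case: the same timeline as last time, no search needed. */
	if (t && t->timeline == timeline)
		return t;

	vma->slow_lookups++;
	for (i = 0; i < FAKE_MAX_TRACKERS; i++) {
		t = &vma->slow[i];
		if (t->in_use && t->timeline == timeline)
			goto found;
	}
	for (i = 0; i < FAKE_MAX_TRACKERS; i++) {
		t = &vma->slow[i];
		if (!t->in_use) {
			t->in_use = 1;
			t->timeline = timeline;
			goto found;
		}
	}
	return NULL;	/* fallback store is full */

found:
	vma->last_active = t;	/* overwrite the cache with the newest user */
	return t;
}
```

When a vma is only ever used on one timeline, `slow_lookups` stays at one (the initial insertion), which is the case the commit optimises for.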
-
Chris Wilson authored
In the next patch, we will want to be able to use more flexible request timelines that can hop between engines. From the vma pov, we can then not rely on the binding of this request to an engine and so can not ensure that different requests are ordered through a per-engine timeline, and so we must track activity of all timelines. (We track activity on the vma itself to prevent unbinding from HW before the HW has finished accessing it.) v2: Switch to a rbtree for 32b safety (since using u64 as a radixtree index is fraught with aliasing of unsigned longs). v3: s/lookup_active/active_instance/ because we can never agree on names Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706103947.15919-5-chris@chris-wilson.co.uk
-
Chris Wilson authored
i915_vma_move_to_active() has grown beyond its execbuf origins, and should take its rightful place in i915_vma.c as a method for i915_vma! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706103947.15919-4-chris@chris-wilson.co.uk
-
Chris Wilson authored
Handling such a late error in request construction is tricky, but to accommodate future patches which may allocate here, we potentially could err. To handle the error after already adjusting global state to track the new request, we must finish and submit the request. But we don't want to use the request as not everything is being tracked by it, so we opt to cancel the commands inside the request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706103947.15919-3-chris@chris-wilson.co.uk
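One way to picture "cancel the commands inside the request" is to overwrite the payload with no-ops while leaving the surrounding bookkeeping commands intact, so the request can still be submitted and retired. The sketch below assumes a flat, non-wrapping command buffer; the struct layout and the name `fake_request_skip` are hypothetical, and it relies only on the fact that a zero dword is MI_NOOP on Intel GPUs.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request: offsets are in dwords into a non-wrapping buffer. */
struct fake_request {
	uint32_t *ring;		/* command buffer */
	size_t infix;		/* first dword of the payload */
	size_t postfix;		/* first dword of the closing breadcrumb */
	int error;		/* error reported to anyone waiting on the request */
};

/*
 * Record the error and replace the payload with zero dwords (no-ops).
 * The commands before `infix` and from `postfix` onwards are untouched,
 * so submission and completion tracking still work.
 */
static void fake_request_skip(struct fake_request *rq, int error)
{
	rq->error = error;
	memset(rq->ring + rq->infix, 0,
	       (rq->postfix - rq->infix) * sizeof(*rq->ring));
}
```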
-
Chris Wilson authored
In the next patch, we will want to start skipping requests on failing to complete their payloads. So export the utility function currently used to make requests inoperable following a failed gpu reset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706103947.15919-2-chris@chris-wilson.co.uk
-
Chris Wilson authored
Currently all callers are responsible for adding the vma to the active timeline and then exporting its fence. Combine the two operations into i915_vma_move_to_active() to move all the extra handling from the callers to the single site. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706103947.15919-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
We always want to use a virtual address (i.e. use the GTT) for MI_STORE_DWORD_IMM, but forgot the ever so important flag in live_hangcheck for gen3. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706142323.25699-2-chris@chris-wilson.co.uk
-
Chris Wilson authored
Replace the magic bit with the proper symbolic name for instructing MI_STORE_DWORD_IMM to use a virtual address (on gen3) or the global GTT address (still virtual!) on gen4+. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180706142323.25699-1-chris@chris-wilson.co.uk Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
-
Chris Wilson authored
Limit the GTT size we try and allocate to ensure that it fits within RAM and does not trigger the oomkiller indiscriminately. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706125338.24432-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
We already manually control the CPU cache for our page table directories, so we can tell the dma mapper to skip doing it as well. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706122611.4142-2-chris@chris-wilson.co.uk
-
Chris Wilson authored
As we propagate back the error to the caller for them to handle, we do not need the lowest level spitting out a redundant warning upon an allocation failure inside dma_map_page(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706122611.4142-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
If we have just completed a WC write, we must ensure that the WCB (Write Combining Buffer) is flushed out to main memory before we can expect to see the results. This is especially important when mixing WC with GTT as the physical paths are different and cachelines are not naturally flushed. Testcase: igt/drv_selftests/live_coherency #gdg Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706115402.18547-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
If the GPU is irrecoverably wedged, we can not execute any requests making testing execlists (request execution) pointless. Skip! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706114510.18467-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
If the HW (or driver) doesn't support logical contexts, don't pretend we gain anything from trying to execute GPU commands with them. At best it reports -ENODEV, which is an unhelpful failure that we should just skip. v2: Be more specific and check the driver/engine caps for logical (HW) context support. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706101923.28548-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
Avoid looking at the magical engines[RCS] to decide if the HW and driver supports logical contexts, and instead record that knowledge during initialisation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706101442.21279-1-chris@chris-wilson.co.uk
-
Imre Deak authored
We can simplify the encoder's get_power_domains() hook by calling it only if the encoder is active. That way the hook can return its power domains unconditionally without checking the active state by calling encoder::get_hw_state(). This get_hw_state() query is in fact redundant since it's already done by intel_modeset_readout_hw_state() setting the encoder's crtc or leaving it NULL accordingly. Let's use this fact to decide if the encoder is active. While at it clarify the comment in intel_ddi_get_power_domains() about primary vs. fake MST encoders and make sure we never do an incorrect encoder->dig_port cast for fake MST encoders. Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180705122654.17072-1-imre.deak@intel.com
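The simplification can be illustrated with a small model: the caller decides whether an encoder is active by looking at its crtc pointer, so the hook itself can return its domains unconditionally. All types and names below are hypothetical stand-ins, not the i915 structures.

```c
#include <stddef.h>
#include <stdint.h>

struct fake_crtc {
	int pipe;
};

struct fake_encoder {
	struct fake_crtc *crtc;		/* NULL when readout found the encoder inactive */
	unsigned int domain_bit;	/* hypothetical power domain used by this encoder */
	uint64_t (*get_power_domains)(struct fake_encoder *encoder);
};

/* The hook no longer checks hardware state; it just reports its domains. */
static uint64_t fake_get_power_domains(struct fake_encoder *encoder)
{
	return UINT64_C(1) << encoder->domain_bit;
}

/* Only active encoders (those with a crtc) have their hook called. */
static uint64_t fake_collect_power_domains(struct fake_encoder *encoders,
					   size_t count)
{
	uint64_t mask = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		struct fake_encoder *encoder = &encoders[i];

		if (!encoder->crtc || !encoder->get_power_domains)
			continue;

		mask |= encoder->get_power_domains(encoder);
	}

	return mask;
}
```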
-
Maarten Lankhorst authored
This interface is deprecated, and has been replaced by the upstream drm crc interface. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Tomi Sarvela <tomi.p.sarvela@intel.com> Cc: Petri Latvala <petri.latvala@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180628072303.14175-1-maarten.lankhorst@linux.intel.com
-
Chris Wilson authored
If the GPU is terminally wedged we cannot submit any requests into a context, completely unfulfilling our purpose of doing so. As this expectedly fails, skip over the test. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706065332.15214-9-chris@chris-wilson.co.uk
-
Chris Wilson authored
We test the GPU handling of huge pages by submitting requests that write into a huge page, but if the GPU is irrecoverably wedged we cannot submit any requests. As the test expectedly fails, skip over it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706065332.15214-8-chris@chris-wilson.co.uk
-
Chris Wilson authored
If the GPU is irrecoverably wedged, we cannot submit any requests and so cannot make the GTT busy in order to test evicting active objects. As this expectedly fails, skip over the test. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706065332.15214-7-chris@chris-wilson.co.uk
-