1. 16 Oct, 2017 8 commits
  2. 13 Oct, 2017 20 commits
  3. 12 Oct, 2017 7 commits
    • Chris Wilson's avatar
      drm/i915/selftests: Exercise adding requests to a full GGTT · 9c1477e8
      Chris Wilson authored
      A bug recently encountered involved the issue where are we were
      submitting requests to different ppGTT, each would pin a segment of the
      GGTT for its logical context and ring. However, this is invisible to
      eviction as we do not tie the context/ring VMA to a request and so do
      not automatically wait upon it them (instead they are marked as pinned,
      preventing eviction entirely). Instead the eviction code must flush those
      contexts by switching to the kernel context. This selftest tries to
      fill the GGTT with contexts to exercise a path where the
      switch-to-kernel-context failed to make forward progress and we fail
      with ENOSPC.
      
      v2: Make the hole in the filled GGTT explicit.
      v3: Swap out the arbitrary timeout for a private notification from
      i915_gem_evict_something()
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171012125726.14736-3-chris@chris-wilson.co.ukReviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      9c1477e8
    • Chris Wilson's avatar
      drm/i915/selftests: Wrap a timer into a i915_sw_fence · 214707fc
      Chris Wilson authored
      For some selftests, we want to issue requests but delay them going to
      hardware. Furthermore, we don't want those requests to block
      indefinitely (or else we may hang the driver and block testing) so we
      want to employ a timeout. So naturally we want a fence that is
      automatically signaled by a timer.
      
      v2: Add kselftests.
      v3: Limit the API available to selftests; there isn't an overwhelming
      reason to export it universally.
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171012125726.14736-2-chris@chris-wilson.co.ukReviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      214707fc
    • Chris Wilson's avatar
      drm/i915: Fix eviction when the GGTT is idle but full · 55b4f1ce
      Chris Wilson authored
      In the full-ppgtt world, we can fill the GGTT full of context objects.
      These context objects are currently implicitly tracked by the requests
      that pin them i.e. they are only unpinned when the request is completed
      and retired, but we do not have the link from the vma to the request
      (anymore). In order to unpin those contexts, we have to issue another
      request and wait upon the switch to the kernel context.
      
      The bug during eviction was that we assumed that a full GGTT meant we
      would have requests on the GGTT timeline, and so we missed situations
      where those requests where merely in flight (and when even they have not
      yet been submitted to hw yet). The fix employed here is to change the
      already-is-idle test to no look at the execution timeline, but count the
      outstanding requests and then check that we have switched to the kernel
      context. Erring on the side of overkill here just means that we stall a
      little longer than may be strictly required, but we only expect to hit
      this path in extreme corner cases where returning an erroneous error is
      worse than the delay.
      
      v2: Logical inversion when swapping over branches.
      
      Fixes: 80b204bc ("drm/i915: Enable multiple timelines")
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171012125726.14736-1-chris@chris-wilson.co.uk
      55b4f1ce
    • Ville Syrjälä's avatar
      drm/i915: Start tracking PSR state in crtc state · 4d90f2d5
      Ville Syrjälä authored
      Add the minimal amount of PSR tracking into the crtc state. This allows
      precomputing the possibility of using PSR correctly, and it means we can
      safely call the psr enable/disable functions for any DP endcoder.
      
      As a nice bonus we get rid of some more crtc->config usage, which we
      want to kill off eventually.
      
      v2: Fix 'goto unlock' fail in intel_psr_enable() (Jani)
          Check intel_dp_is_edp() in is_edp_psr() (Jani)
      
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171012130201.21318-1-ville.syrjala@linux.intel.comReviewed-by: default avatarJani Nikula <jani.nikula@intel.com>
      Acked-by: default avatarRodrigo Vivi <rodrigo.vivi@intel.com>
      4d90f2d5
    • Jani Nikula's avatar
      drm/i915: Update DRIVER_DATE to 20171012 · fa9caf0b
      Jani Nikula authored
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      fa9caf0b
    • Joonas Lahtinen's avatar
      drm/i915: Simplify intel_sanitize_enable_ppgtt · 612dde7e
      Joonas Lahtinen authored
      Remove dead code around has_aliasing_ppgtt condition.
      Suggested-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171010143355.16577-1-joonas.lahtinen@linux.intel.com
      612dde7e
    • Chris Wilson's avatar
      drm/i915/userptr: Drop struct_mutex before cleanup · 7c781423
      Chris Wilson authored
      Purely to silence lockdep, as we know that no bo can exist at this time
      and so the inversion is impossible. Nevertheless, lockdep currently
      warns on unload:
      
      [  137.522565] WARNING: possible circular locking dependency detected
      [  137.522568] 4.14.0-rc4-CI-CI_DRM_3209+ #1 Tainted: G     U
      [  137.522570] ------------------------------------------------------
      [  137.522572] drv_module_relo/1532 is trying to acquire lock:
      [  137.522574]  ("i915-userptr-acquire"){+.+.}, at: [<ffffffff8109a831>] flush_workqueue+0x91/0x540
      [  137.522581]
                     but task is already holding lock:
      [  137.522583]  (&dev->struct_mutex){+.+.}, at: [<ffffffffa014fb3f>] i915_gem_fini+0x3f/0xc0 [i915]
      [  137.522605]
                     which lock already depends on the new lock.
      
      [  137.522608]
                     the existing dependency chain (in reverse order) is:
      [  137.522611]
                     -> #3 (&dev->struct_mutex){+.+.}:
      [  137.522615]        __lock_acquire+0x1420/0x15e0
      [  137.522618]        lock_acquire+0xb0/0x200
      [  137.522621]        __mutex_lock+0x86/0x9b0
      [  137.522623]        mutex_lock_interruptible_nested+0x1b/0x20
      [  137.522640]        i915_mutex_lock_interruptible+0x51/0x130 [i915]
      [  137.522657]        i915_gem_fault+0x20b/0x720 [i915]
      [  137.522660]        __do_fault+0x1e/0x80
      [  137.522662]        __handle_mm_fault+0xa08/0xed0
      [  137.522664]        handle_mm_fault+0x156/0x300
      [  137.522666]        __do_page_fault+0x2c5/0x570
      [  137.522668]        do_page_fault+0x28/0x250
      [  137.522671]        page_fault+0x22/0x30
      [  137.522672]
                     -> #2 (&mm->mmap_sem){++++}:
      [  137.522677]        __lock_acquire+0x1420/0x15e0
      [  137.522679]        lock_acquire+0xb0/0x200
      [  137.522682]        down_read+0x3e/0x70
      [  137.522699]        __i915_gem_userptr_get_pages_worker+0x141/0x240 [i915]
      [  137.522701]        process_one_work+0x233/0x660
      [  137.522704]        worker_thread+0x4e/0x3b0
      [  137.522706]        kthread+0x152/0x190
      [  137.522708]        ret_from_fork+0x27/0x40
      [  137.522710]
                     -> #1 ((&work->work)){+.+.}:
      [  137.522714]        __lock_acquire+0x1420/0x15e0
      [  137.522717]        lock_acquire+0xb0/0x200
      [  137.522719]        process_one_work+0x206/0x660
      [  137.522721]        worker_thread+0x4e/0x3b0
      [  137.522723]        kthread+0x152/0x190
      [  137.522725]        ret_from_fork+0x27/0x40
      [  137.522727]
                     -> #0 ("i915-userptr-acquire"){+.+.}:
      [  137.522731]        check_prev_add+0x430/0x840
      [  137.522733]        __lock_acquire+0x1420/0x15e0
      [  137.522735]        lock_acquire+0xb0/0x200
      [  137.522738]        flush_workqueue+0xb4/0x540
      [  137.522740]        drain_workqueue+0xd4/0x1b0
      [  137.522742]        destroy_workqueue+0x1c/0x200
      [  137.522758]        i915_gem_cleanup_userptr+0x15/0x20 [i915]
      [  137.522770]        i915_gem_fini+0x5f/0xc0 [i915]
      [  137.522782]        i915_driver_unload+0x122/0x180 [i915]
      [  137.522794]        i915_pci_remove+0x19/0x30 [i915]
      [  137.522797]        pci_device_remove+0x39/0xb0
      [  137.522800]        device_release_driver_internal+0x15d/0x220
      [  137.522803]        driver_detach+0x40/0x80
      [  137.522805]        bus_remove_driver+0x58/0xd0
      [  137.522807]        driver_unregister+0x2c/0x40
      [  137.522809]        pci_unregister_driver+0x36/0xb0
      [  137.522828]        i915_exit+0x1a/0x8b [i915]
      [  137.522831]        SyS_delete_module+0x18c/0x1e0
      [  137.522834]        entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  137.522835]
                     other info that might help us debug this:
      
      [  137.522838] Chain exists of:
                       "i915-userptr-acquire" --> &mm->mmap_sem --> &dev->struct_mutex
      
      [  137.522844]  Possible unsafe locking scenario:
      
      [  137.522846]        CPU0                    CPU1
      [  137.522848]        ----                    ----
      [  137.522850]   lock(&dev->struct_mutex);
      [  137.522852]                                lock(&mm->mmap_sem);
      [  137.522854]                                lock(&dev->struct_mutex);
      [  137.522857]   lock("i915-userptr-acquire");
      [  137.522859]
                      *** DEADLOCK ***
      
      [  137.522862] 3 locks held by drv_module_relo/1532:
      [  137.522864]  #0:  (&dev->mutex){....}, at: [<ffffffff8161d47b>] device_release_driver_internal+0x2b/0x220
      [  137.522869]  #1:  (&dev->mutex){....}, at: [<ffffffff8161d489>] device_release_driver_internal+0x39/0x220
      [  137.522873]  #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa014fb3f>] i915_gem_fini+0x3f/0xc0 [i915]
      [  137.522888]
                     stack backtrace:
      [  137.522891] CPU: 0 PID: 1532 Comm: drv_module_relo Tainted: G     U          4.14.0-rc4-CI-CI_DRM_3209+ #1
      [  137.522894] Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
      [  137.522897] Call Trace:
      [  137.522900]  dump_stack+0x68/0x9f
      [  137.522902]  print_circular_bug+0x235/0x3c0
      [  137.522905]  ? lockdep_init_map_crosslock+0x20/0x20
      [  137.522908]  check_prev_add+0x430/0x840
      [  137.522919]  ? i915_gem_fini+0x5f/0xc0 [i915]
      [  137.522922]  ? __kernel_text_address+0x12/0x40
      [  137.522925]  ? __save_stack_trace+0x66/0xd0
      [  137.522928]  __lock_acquire+0x1420/0x15e0
      [  137.522930]  ? __lock_acquire+0x1420/0x15e0
      [  137.522933]  ? lockdep_init_map_crosslock+0x20/0x20
      [  137.522936]  ? __this_cpu_preempt_check+0x13/0x20
      [  137.522938]  lock_acquire+0xb0/0x200
      [  137.522940]  ? flush_workqueue+0x91/0x540
      [  137.522943]  flush_workqueue+0xb4/0x540
      [  137.522945]  ? flush_workqueue+0x91/0x540
      [  137.522948]  ? __mutex_unlock_slowpath+0x43/0x2c0
      [  137.522951]  ? trace_hardirqs_on_caller+0xe3/0x1b0
      [  137.522954]  drain_workqueue+0xd4/0x1b0
      [  137.522956]  ? drain_workqueue+0xd4/0x1b0
      [  137.522958]  destroy_workqueue+0x1c/0x200
      [  137.522975]  i915_gem_cleanup_userptr+0x15/0x20 [i915]
      [  137.522987]  i915_gem_fini+0x5f/0xc0 [i915]
      [  137.523000]  i915_driver_unload+0x122/0x180 [i915]
      [  137.523015]  i915_pci_remove+0x19/0x30 [i915]
      [  137.523018]  pci_device_remove+0x39/0xb0
      [  137.523021]  device_release_driver_internal+0x15d/0x220
      [  137.523023]  driver_detach+0x40/0x80
      [  137.523026]  bus_remove_driver+0x58/0xd0
      [  137.523028]  driver_unregister+0x2c/0x40
      [  137.523030]  pci_unregister_driver+0x36/0xb0
      [  137.523049]  i915_exit+0x1a/0x8b [i915]
      [  137.523052]  SyS_delete_module+0x18c/0x1e0
      [  137.523055]  entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  137.523057] RIP: 0033:0x7f7bd0609287
      [  137.523059] RSP: 002b:00007ffef694bc18 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
      [  137.523062] RAX: ffffffffffffffda RBX: ffffffff81493f33 RCX: 00007f7bd0609287
      [  137.523065] RDX: 0000000000000001 RSI: 0000000000000800 RDI: 0000564f999f9fc8
      [  137.523067] RBP: ffffc90005c4ff88 R08: 0000000000000000 R09: 0000000000000080
      [  137.523069] R10: 00007f7bd20ef8c0 R11: 0000000000000246 R12: 0000000000000000
      [  137.523072] R13: 00007ffef694be00 R14: 0000000000000000 R15: 0000000000000000
      [  137.523075]  ? __this_cpu_preempt_check+0x13/0x20
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171011141857.14161-1-chris@chris-wilson.co.ukReviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      7c781423
  4. 11 Oct, 2017 5 commits
    • Jani Nikula's avatar
      drm/i915/dp: limit sink rates based on rate · a8a08886
      Jani Nikula authored
      Get rid of redundant intel_dp_num_rates(). We can simply look at the
      rate and limit based on that.
      
      Cc: Manasi Navare <manasi.d.navare@intel.com>
      Reviewed-by: <manasi.d.navare@intel.com>
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171009092959.29021-3-jani.nikula@intel.com
      a8a08886
    • Jani Nikula's avatar
      drm/i915/dp: centralize max source rate conditions more · fc603ca7
      Jani Nikula authored
      Turn intel_dp_source_supports_hbr2() into a simple helper to query the
      pre-filled source rates array, and move the conditions about which
      platforms support which rates to the single point of truth in
      intel_dp_set_source_rates().
      
      This also reduces the code paths you have to think about in the source
      rates initialization in intel_dp_set_source_rates(), making it easier to
      grasp.
      
      Cc: Manasi Navare <manasi.d.navare@intel.com>
      Reviewed-by: default avatarManasi Navare <manasi.d.navare@intel.com>
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171009092959.29021-2-jani.nikula@intel.com
      fc603ca7
    • Ville Syrjälä's avatar
      drm/i915: Allow PCH platforms fall back to BIOS LVDS mode · 3c7b6b3c
      Ville Syrjälä authored
      With intel_encoder_current_mode() using the normal state readout code it
      actually works on PCH platforms as well. So let's nuke the PCH check from
      intel_lvds_init(). I suppose there aren't any machines that actually
      need this, but at least we get to eliminate a few lines of code, and one
      FIXME.
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171009161951.22420-2-ville.syrjala@linux.intel.comReviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      3c7b6b3c
    • Ville Syrjälä's avatar
      drm/i915: Reuse normal state readout for LVDS/DVO fixed mode · de330815
      Ville Syrjälä authored
      Reuse the normal state readout code to get the fixed mode for LVDS/DVO
      encoders. This removes some partially duplicated state readout code
      from LVDS/DVO encoders. The duplicated code wasn't actually even
      populating the negative h/vsync flags, leading to possible state checker
      complaints. The normal readout code populates that stuff fully.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171009161951.22420-1-ville.syrjala@linux.intel.comReviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Tested-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      de330815
    • Daniel Vetter's avatar
      drm/i915: Use rcu instead of stop_machine in set_wedged · af7a8ffa
      Daniel Vetter authored
      stop_machine is not really a locking primitive we should use, except
      when the hw folks tell us the hw is broken and that's the only way to
      work around it.
      
      This patch tries to address the locking abuse of stop_machine() from
      
      commit 20e4933c
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Tue Nov 22 14:41:21 2016 +0000
      
          drm/i915: Stop the machine as we install the wedged submit_request handler
      
      Chris said parts of the reasons for going with stop_machine() was that
      it's no overhead for the fast-path. But these callbacks use irqsave
      spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
      
      To stay as close as possible to the stop_machine semantics we first
      update all the submit function pointers to the nop handler, then call
      synchronize_rcu() to make sure no new requests can be submitted. This
      should give us exactly the huge barrier we want.
      
      I pondered whether we should annotate engine->submit_request as __rcu
      and use rcu_assign_pointer and rcu_dereference on it. But the reason
      behind those is to make sure the compiler/cpu barriers are there for
      when you have an actual data structure you point at, to make sure all
      the writes are seen correctly on the read side. But we just have a
      function pointer, and .text isn't changed, so no need for these
      barriers and hence no need for annotations.
      
      Unfortunately there's a complication with the call to
      intel_engine_init_global_seqno:
      
      - Without stop_machine we must hold the corresponding spinlock.
      
      - Without stop_machine we must ensure that all requests are marked as
        having failed with dma_fence_set_error() before we call it. That
        means we need to split the nop request submission into two phases,
        both synchronized with rcu:
      
        1. Only stop submitting the requests to hw and mark them as failed.
      
        2. After all pending requests in the scheduler/ring are suitably
        marked up as failed and we can force complete them all, also force
        complete by calling intel_engine_init_global_seqno().
      
      This should fix the followwing lockdep splat:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G     U
      ------------------------------------------------------
      kworker/3:4/562 is trying to acquire lock:
       (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40
      
      but task is already holding lock:
       (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #6 (&dev->struct_mutex){+.+.}:
             __lock_acquire+0x1420/0x15e0
             lock_acquire+0xb0/0x200
             __mutex_lock+0x86/0x9b0
             mutex_lock_interruptible_nested+0x1b/0x20
             i915_mutex_lock_interruptible+0x51/0x130 [i915]
             i915_gem_fault+0x209/0x650 [i915]
             __do_fault+0x1e/0x80
             __handle_mm_fault+0xa08/0xed0
             handle_mm_fault+0x156/0x300
             __do_page_fault+0x2c5/0x570
             do_page_fault+0x28/0x250
             page_fault+0x22/0x30
      
      -> #5 (&mm->mmap_sem){++++}:
             __lock_acquire+0x1420/0x15e0
             lock_acquire+0xb0/0x200
             __might_fault+0x68/0x90
             _copy_to_user+0x23/0x70
             filldir+0xa5/0x120
             dcache_readdir+0xf9/0x170
             iterate_dir+0x69/0x1a0
             SyS_getdents+0xa5/0x140
             entry_SYSCALL_64_fastpath+0x1c/0xb1
      
      -> #4 (&sb->s_type->i_mutex_key#5){++++}:
             down_write+0x3b/0x70
             handle_create+0xcb/0x1e0
             devtmpfsd+0x139/0x180
             kthread+0x152/0x190
             ret_from_fork+0x27/0x40
      
      -> #3 ((complete)&req.done){+.+.}:
             __lock_acquire+0x1420/0x15e0
             lock_acquire+0xb0/0x200
             wait_for_common+0x58/0x210
             wait_for_completion+0x1d/0x20
             devtmpfs_create_node+0x13d/0x160
             device_add+0x5eb/0x620
             device_create_groups_vargs+0xe0/0xf0
             device_create+0x3a/0x40
             msr_device_create+0x2b/0x40
             cpuhp_invoke_callback+0xc9/0xbf0
             cpuhp_thread_fun+0x17b/0x240
             smpboot_thread_fn+0x18a/0x280
             kthread+0x152/0x190
             ret_from_fork+0x27/0x40
      
      -> #2 (cpuhp_state-up){+.+.}:
             __lock_acquire+0x1420/0x15e0
             lock_acquire+0xb0/0x200
             cpuhp_issue_call+0x133/0x1c0
             __cpuhp_setup_state_cpuslocked+0x139/0x2a0
             __cpuhp_setup_state+0x46/0x60
             page_writeback_init+0x43/0x67
             pagecache_init+0x3d/0x42
             start_kernel+0x3a8/0x3fc
             x86_64_start_reservations+0x2a/0x2c
             x86_64_start_kernel+0x6d/0x70
             verify_cpu+0x0/0xfb
      
      -> #1 (cpuhp_state_mutex){+.+.}:
             __lock_acquire+0x1420/0x15e0
             lock_acquire+0xb0/0x200
             __mutex_lock+0x86/0x9b0
             mutex_lock_nested+0x1b/0x20
             __cpuhp_setup_state_cpuslocked+0x53/0x2a0
             __cpuhp_setup_state+0x46/0x60
             page_alloc_init+0x28/0x30
             start_kernel+0x145/0x3fc
             x86_64_start_reservations+0x2a/0x2c
             x86_64_start_kernel+0x6d/0x70
             verify_cpu+0x0/0xfb
      
      -> #0 (cpu_hotplug_lock.rw_sem){++++}:
             check_prev_add+0x430/0x840
             __lock_acquire+0x1420/0x15e0
             lock_acquire+0xb0/0x200
             cpus_read_lock+0x3d/0xb0
             stop_machine+0x1c/0x40
             i915_gem_set_wedged+0x1a/0x20 [i915]
             i915_reset+0xb9/0x230 [i915]
             i915_reset_device+0x1f6/0x260 [i915]
             i915_handle_error+0x2d8/0x430 [i915]
             hangcheck_declare_hang+0xd3/0xf0 [i915]
             i915_hangcheck_elapsed+0x262/0x2d0 [i915]
             process_one_work+0x233/0x660
             worker_thread+0x4e/0x3b0
             kthread+0x152/0x190
             ret_from_fork+0x27/0x40
      
      other info that might help us debug this:
      
      Chain exists of:
        cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&dev->struct_mutex);
                                     lock(&mm->mmap_sem);
                                     lock(&dev->struct_mutex);
        lock(cpu_hotplug_lock.rw_sem);
      
       *** DEADLOCK ***
      
      3 locks held by kworker/3:4/562:
       #0:  ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
       #1:  ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
       #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
      
      stack backtrace:
      CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G     U          4.14.0-rc3-CI-CI_DRM_3179+ #1
      Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
      Workqueue: events_long i915_hangcheck_elapsed [i915]
      Call Trace:
       dump_stack+0x68/0x9f
       print_circular_bug+0x235/0x3c0
       ? lockdep_init_map_crosslock+0x20/0x20
       check_prev_add+0x430/0x840
       ? irq_work_queue+0x86/0xe0
       ? wake_up_klogd+0x53/0x70
       __lock_acquire+0x1420/0x15e0
       ? __lock_acquire+0x1420/0x15e0
       ? lockdep_init_map_crosslock+0x20/0x20
       lock_acquire+0xb0/0x200
       ? stop_machine+0x1c/0x40
       ? i915_gem_object_truncate+0x50/0x50 [i915]
       cpus_read_lock+0x3d/0xb0
       ? stop_machine+0x1c/0x40
       stop_machine+0x1c/0x40
       i915_gem_set_wedged+0x1a/0x20 [i915]
       i915_reset+0xb9/0x230 [i915]
       i915_reset_device+0x1f6/0x260 [i915]
       ? gen8_gt_irq_ack+0x170/0x170 [i915]
       ? work_on_cpu_safe+0x60/0x60
       i915_handle_error+0x2d8/0x430 [i915]
       ? vsnprintf+0xd1/0x4b0
       ? scnprintf+0x3a/0x70
       hangcheck_declare_hang+0xd3/0xf0 [i915]
       ? intel_runtime_pm_put+0x56/0xa0 [i915]
       i915_hangcheck_elapsed+0x262/0x2d0 [i915]
       process_one_work+0x233/0x660
       worker_thread+0x4e/0x3b0
       kthread+0x152/0x190
       ? process_one_work+0x660/0x660
       ? kthread_create_on_node+0x40/0x40
       ret_from_fork+0x27/0x40
      Setting dangerous option reset - tainting kernel
      i915 0000:00:02.0: Resetting chip after gpu hang
      Setting dangerous option reset - tainting kernel
      i915 0000:00:02.0: Resetting chip after gpu hang
      
      v2: Have 1 global synchronize_rcu() barrier across all engines, and
      improve commit message.
      
      v3: We need to protect the seqno update with the timeline spinlock (in
      set_wedged) to avoid racing with other updates of the seqno, like we
      already do in nop_submit_request (Chris).
      
      v4: Use two-phase sequence to plug the race Chris spotted where we can
      complete requests before they're marked up with -EIO.
      
      v5: Review from Chris:
      - simplify nop_submit_request.
      - Add comment to rcu_read_lock section.
      - Align comments with the new style.
      
      v6: Remove unused variable to appease CI.
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Marta Lofstedt <marta.lofstedt@intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171011091019.1425-1-daniel.vetter@ffwll.ch
      af7a8ffa