• Chris Wilson's avatar
    drm/i915/gem: Avoid rcu_barrier() from shrinker paths · 16c46fd5
    Chris Wilson authored
    As i915_gem_object_unbind() waits on an rcu_barrier() to flush vm
    releases (and destruction of their bound vma), we have to be careful not
    to invoke that barrier from beneath the shrinker:
    
    <4> [430.222671] WARNING: possible circular locking dependency detected
    <4> [430.222673] 5.4.0-rc8-CI-CI_DRM_7508+ #1 Tainted: G     U
    <4> [430.222675] ------------------------------------------------------
    <4> [430.222677] gem_pwrite/2317 is trying to acquire lock:
    <4> [430.222678] ffffffff82248218 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190
    <4> [430.222685]
    but task is already holding lock:
    <4> [430.222687] ffffffff82263a40 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.117+0x0/0x30
    <4> [430.222691]
    which lock already depends on the new lock.
    
    <4> [430.222693]
    the existing dependency chain (in reverse order) is:
    <4> [430.222695]
    -> #2 (fs_reclaim){+.+.}:
    <4> [430.222698]        fs_reclaim_acquire.part.117+0x24/0x30
    <4> [430.222702]        kmem_cache_alloc_trace+0x2a/0x2c0
    <4> [430.222705]        intel_cpuc_prepare+0x37/0x1a0
    <4> [430.222709]        cpuhp_invoke_callback+0x9b/0x9d0
    <4> [430.222712]        _cpu_up+0xa2/0x140
    <4> [430.222714]        do_cpu_up+0x61/0xa0
    <4> [430.222718]        smp_init+0x57/0x96
    <4> [430.222722]        kernel_init_freeable+0xac/0x1c7
    <4> [430.222725]        kernel_init+0x5/0x100
    <4> [430.222728]        ret_from_fork+0x24/0x50
    <4> [430.222729]
    -> #1 (cpu_hotplug_lock.rw_sem){++++}:
    <4> [430.222733]        cpus_read_lock+0x34/0xd0
    <4> [430.222734]        rcu_barrier+0xaa/0x190
    <4> [430.222736]        kernel_init+0x21/0x100
    <4> [430.222737]        ret_from_fork+0x24/0x50
    <4> [430.222739]
    -> #0 (rcu_state.barrier_mutex){+.+.}:
    <4> [430.222742]        __lock_acquire+0x1328/0x15d0
    <4> [430.222743]        lock_acquire+0xa7/0x1c0
    <4> [430.222746]        __mutex_lock+0x9a/0x9d0
    <4> [430.222747]        rcu_barrier+0x23/0x190
    <4> [430.222850]        i915_gem_object_unbind+0x264/0x3d0 [i915]
    <4> [430.222882]        i915_gem_shrink+0x297/0x5f0 [i915]
    <4> [430.222912]        i915_gem_shrink_all+0x38/0x60 [i915]
    <4> [430.222934]        i915_drop_caches_set+0x1f0/0x240 [i915]
    <4> [430.222938]        simple_attr_write+0xb0/0xd0
    <4> [430.222941]        full_proxy_write+0x51/0x80
    <4> [430.222943]        vfs_write+0xb9/0x1d0
    <4> [430.222944]        ksys_write+0x9f/0xe0
    <4> [430.222946]        do_syscall_64+0x4f/0x210
    <4> [430.222948]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
    <4> [430.222950]
    other info that might help us debug this:
    
    <4> [430.222952] Chain exists of:
      rcu_state.barrier_mutex --> cpu_hotplug_lock.rw_sem --> fs_reclaim
    
    <4> [430.222955]  Possible unsafe locking scenario:
    
    <4> [430.222957]        CPU0                    CPU1
    <4> [430.222958]        ----                    ----
    <4> [430.222960]   lock(fs_reclaim);
    <4> [430.222961]                                lock(cpu_hotplug_lock.rw_sem);
    <4> [430.222963]                                lock(fs_reclaim);
    <4> [430.222964]   lock(rcu_state.barrier_mutex);
    <4> [430.222966]
     *** DEADLOCK ***
    
    <4> [430.222968] 3 locks held by gem_pwrite/2317:
    <4> [430.222969]  #0: ffff88849e2d9408 (sb_writers#14){.+.+}, at: vfs_write+0x1a4/0x1d0
    <4> [430.222973]  #1: ffff888496976db0 (&attr->mutex){+.+.}, at: simple_attr_write+0x36/0xd0
    <4> [430.222976]  #2: ffffffff82263a40 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.117+0x0/0x30
    <4> [430.222980]
    stack backtrace:
    <4> [430.222982] CPU: 1 PID: 2317 Comm: gem_pwrite Tainted: G     U            5.4.0-rc8-CI-CI_DRM_7508+ #1
    <4> [430.222985] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.2321.A08.1909162051 09/16/2019
    <4> [430.222989] Call Trace:
    <4> [430.222992]  dump_stack+0x71/0x9b
    <4> [430.222995]  check_noncircular+0x19b/0x1c0
    <4> [430.222998]  ? __lock_acquire+0x1328/0x15d0
    <4> [430.222999]  __lock_acquire+0x1328/0x15d0
    <4> [430.223001]  ? mark_held_locks+0x49/0x70
    <4> [430.223003]  lock_acquire+0xa7/0x1c0
    <4> [430.223005]  ? rcu_barrier+0x23/0x190
    <4> [430.223008]  __mutex_lock+0x9a/0x9d0
    <4> [430.223009]  ? rcu_barrier+0x23/0x190
    <4> [430.223011]  ? rcu_barrier+0x23/0x190
    <4> [430.223013]  ? find_held_lock+0x2d/0x90
    <4> [430.223045]  ? i915_gem_object_unbind+0x24a/0x3d0 [i915]
    <4> [430.223048]  ? rcu_barrier+0x23/0x190
    <4> [430.223049]  rcu_barrier+0x23/0x190
    <4> [430.223081]  i915_gem_object_unbind+0x264/0x3d0 [i915]
    <4> [430.223119]  i915_gem_shrink+0x297/0x5f0 [i915]
    
    Closes: https://gitlab.freedesktop.org/drm/intel/issues/743Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: default avatarMatthew Auld <matthew.auld@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20191208161252.3015727-1-chris@chris-wilson.co.uk
    16c46fd5
i915_gem_domain.c 18.2 KB