    workqueue: Provide one lock class key per work_on_cpu() callsite · 265f3ed0
    Frederic Weisbecker authored
    All callers of work_on_cpu() share the same lock class key for all
    the functions queued. As a result, the workqueue-related locking
    scenario for a function A may be spuriously accounted as an inversion
    against the locking scenario of a function B, as in the following
    model:
    
    	long A(void *arg)
    	{
    		mutex_lock(&mutex);
    		mutex_unlock(&mutex);
    		return 0;
    	}

    	long B(void *arg)
    	{
    		return 0;
    	}
    
    	void launchA(void)
    	{
    		work_on_cpu(0, A, NULL);
    	}
    
    	void launchB(void)
    	{
    		mutex_lock(&mutex);
    		work_on_cpu(1, B, NULL);
    		mutex_unlock(&mutex);
    	}
    
    launchA and launchB running concurrently have no chance to deadlock.
    However, the above can be reported by lockdep as a possible locking
    inversion, because the works containing A() and B() are treated as
    belonging to the same lock class.
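
    Lockdep distinguishes lock classes by the address of a static lock
    class key, not by lock instance. work_on_cpu() initializes its
    on-stack work item with INIT_WORK_ONSTACK(), whose expansion embeds
    one static key at that single spot in kernel/workqueue.c, so every
    function queued through work_on_cpu() lands in the same class. A
    simplified sketch of the pre-fix logic (condensed from
    kernel/workqueue.c, details elided):

    	long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
    	{
    		struct work_for_cpu wfc = { .fn = fn, .arg = arg };

    		/*
    		 * INIT_WORK_ONSTACK() expands to a single static
    		 * lock_class_key right here, shared by every
    		 * work_on_cpu() caller, whatever function it queues.
    		 */
    		INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
    		schedule_work_on(cpu, &wfc.work);
    		flush_work(&wfc.work);
    		destroy_work_on_stack(&wfc.work);
    		return wfc.ret;
    	}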
    
    The following is a real-life example of such a spurious lockdep
    splat:
    
    	 ======================================================
    	 WARNING: possible circular locking dependency detected
    	 6.6.0-rc1-00065-g934ebd6e5359 #35409 Not tainted
    	 ------------------------------------------------------
    	 kworker/0:1/9 is trying to acquire lock:
    	 ffffffff9bc72f30 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_down+0x57/0x2b0
    
    	 but task is already holding lock:
    	 ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500
    
    	 which lock already depends on the new lock.
    
    	 the existing dependency chain (in reverse order) is:
    
    	 -> #2 ((work_completion)(&wfc.work)){+.+.}-{0:0}:
    			__flush_work+0x83/0x4e0
    			work_on_cpu+0x97/0xc0
    			rcu_nocb_cpu_offload+0x62/0xb0
    			rcu_nocb_toggle+0xd0/0x1d0
    			kthread+0xe6/0x120
    			ret_from_fork+0x2f/0x40
    			ret_from_fork_asm+0x1b/0x30
    
    	 -> #1 (rcu_state.barrier_mutex){+.+.}-{3:3}:
    			__mutex_lock+0x81/0xc80
    			rcu_nocb_cpu_deoffload+0x38/0xb0
    			rcu_nocb_toggle+0x144/0x1d0
    			kthread+0xe6/0x120
    			ret_from_fork+0x2f/0x40
    			ret_from_fork_asm+0x1b/0x30
    
    	 -> #0 (cpu_hotplug_lock){++++}-{0:0}:
    			__lock_acquire+0x1538/0x2500
    			lock_acquire+0xbf/0x2a0
    			percpu_down_write+0x31/0x200
    			_cpu_down+0x57/0x2b0
    			__cpu_down_maps_locked+0x10/0x20
    			work_for_cpu_fn+0x15/0x20
    			process_scheduled_works+0x2a7/0x500
    			worker_thread+0x173/0x330
    			kthread+0xe6/0x120
    			ret_from_fork+0x2f/0x40
    			ret_from_fork_asm+0x1b/0x30
    
    	 other info that might help us debug this:
    
    	 Chain exists of:
    	   cpu_hotplug_lock --> rcu_state.barrier_mutex --> (work_completion)(&wfc.work)
    
    	  Possible unsafe locking scenario:
    
    			CPU0                    CPU1
    			----                    ----
    	   lock((work_completion)(&wfc.work));
    									lock(rcu_state.barrier_mutex);
    									lock((work_completion)(&wfc.work));
    	   lock(cpu_hotplug_lock);
    
    	  *** DEADLOCK ***
    
    	 2 locks held by kworker/0:1/9:
    	  #0: ffff900481068b38 ((wq_completion)events){+.+.}-{0:0}, at: process_scheduled_works+0x212/0x500
    	  #1: ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500
    
    	 stack backtrace:
    	 CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.0-rc1-00065-g934ebd6e5359 #35409
    	 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
    	 Workqueue: events work_for_cpu_fn
    	 Call Trace:
    	 rcu-torture: rcu_torture_read_exit: Start of episode
    	  <TASK>
    	  dump_stack_lvl+0x4a/0x80
    	  check_noncircular+0x132/0x150
    	  __lock_acquire+0x1538/0x2500
    	  lock_acquire+0xbf/0x2a0
    	  ? _cpu_down+0x57/0x2b0
    	  percpu_down_write+0x31/0x200
    	  ? _cpu_down+0x57/0x2b0
    	  _cpu_down+0x57/0x2b0
    	  __cpu_down_maps_locked+0x10/0x20
    	  work_for_cpu_fn+0x15/0x20
    	  process_scheduled_works+0x2a7/0x500
    	  worker_thread+0x173/0x330
    	  ? __pfx_worker_thread+0x10/0x10
    	  kthread+0xe6/0x120
    	  ? __pfx_kthread+0x10/0x10
    	  ret_from_fork+0x2f/0x40
    	  ? __pfx_kthread+0x10/0x10
    	  ret_from_fork_asm+0x1b/0x30
    	  </TASK>
    
    Fix this by providing one lock class key per work_on_cpu() caller.
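
    One way to realize this, and a rough sketch of the approach the
    changelog describes, is to turn work_on_cpu() into a macro that
    declares a static struct lock_class_key at each expansion site and
    hands it to a key-taking variant. The helper names below
    (work_on_cpu_key(), INIT_WORK_ONSTACK_KEY()) follow the kernel's
    usual _key convention but should be read as illustrative:

    	/* Variant taking an explicit lockdep key (illustrative). */
    	long work_on_cpu_key(int cpu, long (*fn)(void *), void *arg,
    			     struct lock_class_key *key)
    	{
    		struct work_for_cpu wfc = { .fn = fn, .arg = arg };

    		INIT_WORK_ONSTACK_KEY(&wfc.work, work_for_cpu_fn, key);
    		schedule_work_on(cpu, &wfc.work);
    		flush_work(&wfc.work);
    		destroy_work_on_stack(&wfc.work);
    		return wfc.ret;
    	}

    	/*
    	 * Each expansion gets its own static key, hence each
    	 * work_on_cpu() callsite gets its own lockdep class.
    	 */
    	#define work_on_cpu(_cpu, _fn, _arg)			\
    	({							\
    		static struct lock_class_key __key;		\
    		work_on_cpu_key(_cpu, _fn, _arg, &__key);	\
    	})

    With per-callsite classes, the works running A() and B() no longer
    share a class, and the dependency chain above is never built.
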
    Reported-and-tested-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Tejun Heo <tj@kernel.org>