• Gautham R Shenoy's avatar
    rcu: fix hotplug vs rcu race · 8558f8f8
    Gautham R Shenoy authored
    Dhaval Giani reported this warning during cpu hotplug stress-tests:
    
    | On running kernel compiles in parallel with cpu hotplug:
    |
    | WARNING: at arch/x86/kernel/smp.c:118
    | native_smp_send_reschedule+0x21/0x36()
    | Modules linked in:
    | Pid: 27483, comm: cc1 Not tainted 2.6.26-rc7 #1
    | [...]
    |  [<c0110355>] native_smp_send_reschedule+0x21/0x36
    |  [<c014fe8f>] force_quiescent_state+0x47/0x57
    |  [<c014fef0>] call_rcu+0x51/0x6d
    |  [<c01713b3>] __fput+0x130/0x158
    |  [<c0171231>] fput+0x17/0x19
    |  [<c016fd99>] filp_close+0x4d/0x57
    |  [<c016fdff>] sys_close+0x5c/0x97
    
    IMHO the warning is a spurious one.
    
    cpu_online_map is updated by the _cpu_down() using stop_machine_run().
    Since force_quiescent_state is invoked from irqs disabled section,
    stop_machine_run() won't be executing while a cpu is executing
    force_quiescent_state(). Hence the cpu_online_map is stable while we're
    in the irq disabled section.
    
    However, a cpu might have been offlined _just_ before we disabled irqs
    while entering force_quiescent_state(). And rcu subsystem might not yet
    have handled the CPU_DEAD notification, leading to the offlined cpu's
    bit being set in the rcp->cpumask.
    
    Hence cpumask = (rcp->cpumask & cpu_online_map) to prevent sending
    smp_reschedule() to an offlined CPU.
    
    Here's the timeline:
    
    CPU_A						 CPU_B
    --------------------------------------------------------------
    cpu_down():					.
    .					   	.
    .						.
    stop_machine(): /* disables preemption,		.
    		 * and irqs */			.
    .						.
    .						.
    take_cpu_down();				.
    .						.
    .						.
    .						.
    cpu_disable(); /*this removes cpu 		.
    		*from cpu_online_map 		.
    		*/				.
    .						.
    .						.
    restart_machine(); /* enables irqs */		.
    ------WINDOW DURING WHICH rcp->cpumask is stale ---------------
    .						call_rcu();
    .						/* disables irqs here */
    .						.force_quiescent_state();
    .CPU_DEAD:					.for_each_cpu(rcp->cpumask)
    .						.   smp_send_reschedule();
    .						.
    .						.   WARN_ON() for offlined CPU!
    .
    .
    .
    rcu_cpu_notify:
    .
    -------- WINDOW ENDS ------------------------------------------
    rcu_offline_cpu() /* Which calls cpu_quiet()
    		   * which removes
    		   * cpu from rcp->cpumask.
    		   */
    
    If a new batch was started just before calling stop_machine_run(), the
    "tobe-offlined" cpu is still present in rcp-cpumask.
    
    During a cpu-offline, from take_cpu_down(), we queue an rt-prio idle
    task as the next task to be picked by the scheduler. We also call
    cpu_disable() which will disable any further interrupts and remove the
    cpu's bit from the cpu_online_map.
    
    Once the stop_machine_run() successfully calls take_cpu_down(), it calls
    schedule(). That's the last time a schedule is called on the offlined
    cpu, and hence the last time when rdp->passed_quiesc will be set to 1
    through rcu_qsctr_inc().
    
    But the cpu_quiet() will be on this cpu will be called only when the
    next RCU_SOFTIRQ occurs on this CPU. So at this time, the offlined CPU
    is still set in rcp->cpumask.
    
    Now coming back to the idle_task which truely offlines the CPU, it does
    check for a pending RCU and raises the softirq, since it will find
    rdp->passed_quiesc to be 0 in this case. However, since the cpu is
    offline I am not sure if the softirq will trigger on the CPU.
    
    Even if it doesn't the rcu_offline_cpu() will find that rcp->completed
    is not the same as rcp->cur, which means that our cpu could be holding
    up the grace period progression. Hence we call cpu_quiet() and move
    ahead.
    
    But because of the window explained in the timeline, we could still have
    a call_rcu() before the RCU subsystem executes it's CPU_DEAD
    notification, and we send smp_send_reschedule() to offlined cpu while
    trying to force the quiescent states. The appended patch adds comments
    and prevents checking for offlined cpu everytime.
    
    cpu_online_map is updated by the _cpu_down() using stop_machine_run().
    Since force_quiescent_state is invoked from irqs disabled section,
    stop_machine_run() won't be executing while a cpu is executing
    force_quiescent_state(). Hence the cpu_online_map is stable while we're
    in the irq disabled section.
    Reported-by: default avatarDhaval Giani <dhaval@linux.vnet.ibm.com>
    Signed-off-by: default avatarGautham R Shenoy <ego@in.ibm.com>
    Acked-by: default avatarDhaval Giani <dhaval@linux.vnet.ibm.com>
    Cc: Dipankar Sarma <dipankar@in.ibm.com>
    Cc: laijs@cn.fujitsu.com
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Rusty Russel <rusty@rustcorp.com.au>
    Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
    Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
    8558f8f8
rcuclassic.c 16.2 KB