    rcu: Clean up handling of tasks blocked across full-rcu_node offline · 962aff03
    Paul E. McKenney authored
    Commit 0aa04b05 ("rcu: Process offlining and onlining only at
    grace-period start") deferred handling of CPU-hotplug events until the
    start of the next grace period, but consider the following sequence
    of events:
    
    1.	A task is preempted within an RCU-preempt read-side critical
    	section.
    
    2.	The CPU that this task was running on goes offline, along with all
    	other CPUs sharing the corresponding leaf rcu_node structure.
    
    3.	The task resumes execution.
    
    4.	One of those CPUs comes back online before a new grace period starts.
    
    In step 2, the code in the next rcu_gp_init() invocation will (correctly)
    defer removing the leaf rcu_node structure from the upper-level bitmasks,
    and will (correctly) set that structure's ->wait_blkd_tasks field.  During
    the ensuing interval, RCU will (correctly) track the tasks preempted on
    that structure because they must block any subsequent grace period.
    
    In step 3, the code in rcu_read_unlock_special() will (correctly) remove
    the task from the leaf rcu_node structure.  From this point forward, RCU
    need not pay attention to this structure, at least not until one of the
    corresponding CPUs comes back online.
    
    In step 4, the code in the next rcu_gp_init() invocation will
    (incorrectly) invoke rcu_init_new_rnp().  This is incorrect because
    the corresponding rcu_cleanup_dead_rnp() was never invoked.  This is
    nevertheless harmless because the upper-level bits are still set.
    So, no harm, no foul, right?
    
    At least, all is well until a little further into that same rcu_gp_init()
    invocation, which will notice that there are no longer any tasks blocked
    on the leaf rcu_node structure, conclude that there is no longer anything
    left over from step 2's offline operation, and will therefore invoke
    rcu_cleanup_dead_rnp().  But this invocation of rcu_cleanup_dead_rnp()
    is for the beginning of the earlier offline interval, and the previous
    invocation of rcu_init_new_rnp() is for the end of that same interval.
    That is right, they are invoked out of order.
    
    That cannot be good, can it?
    
    It turns out that this is not a (correctness!) problem because
    rcu_cleanup_dead_rnp() checks to see if any of the corresponding CPUs
    are online, and refuses to do anything if so.  In other words, in the
    case where rcu_init_new_rnp() and rcu_cleanup_dead_rnp() execute out of
    order, they both have no effect.
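
    The no-op behavior can be illustrated with a small user-space model.
    This is only a sketch: struct toy_leaf, its fields, and the toy_*()
    helpers are hypothetical stand-ins, not the kernel's data structures
    or functions, and the model assumes that "initializing" a leaf amounts
    to setting its bit in the parent while "cleaning up" amounts to
    clearing it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a leaf rcu_node structure; field names are hypothetical. */
struct toy_leaf {
	unsigned long online_mask; /* CPUs under this leaf that are online */
	bool parent_bit_set;       /* this leaf's bit in the upper-level mask */
};

/* Toy analog of rcu_init_new_rnp(): (re)sets the upper-level bit. */
static void toy_init_new(struct toy_leaf *leaf)
{
	leaf->parent_bit_set = true; /* harmless if the bit is already set */
}

/* Toy analog of rcu_cleanup_dead_rnp(): refuses if any CPU is online. */
static void toy_cleanup_dead(struct toy_leaf *leaf)
{
	if (leaf->online_mask)
		return; /* a CPU is back online, so leave the bit alone */
	leaf->parent_bit_set = false;
}

/* Runs the two helpers in the "wrong" order and reports whether state changed. */
static bool toy_out_of_order_is_noop(void)
{
	/* State at step 4: one CPU is back, upper-level bit was never cleared. */
	struct toy_leaf leaf = { .online_mask = 0x1, .parent_bit_set = true };

	assert(leaf.parent_bit_set);
	toy_init_new(&leaf);     /* belongs to the end of the offline interval */
	toy_cleanup_dead(&leaf); /* belongs to its beginning */
	return leaf.online_mask == 0x1 && leaf.parent_bit_set;
}
```

    In this model, toy_out_of_order_is_noop() returns true only because of
    the online check in toy_cleanup_dead().  Remove that check and the
    upper-level bit would be cleared while a CPU is online, which is the
    sort of accident the ordering is meant to rule out.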
    
    But this is at best an accident waiting to happen.
    
    This commit therefore adds logic to rcu_gp_init() so that
    rcu_init_new_rnp() and rcu_cleanup_dead_rnp() are always invoked in
    order, and so that neither is invoked at all in cases where RCU had to
    pay attention to the leaf rcu_node structure during the entire time that
    all corresponding CPUs were offline.
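
    One way to picture the intended ordering is the following user-space
    sketch.  It is not the patch itself: struct model_leaf, the
    applied_mask, pending_mask, and nr_blocked fields, and the call
    counters are all hypothetical, and only the ->wait_blkd_tasks name
    is taken from the text above.  The model simply counts how often the
    "init new" and "cleanup dead" steps would run for one leaf at each
    grace-period start.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one leaf; every name except wait_blkd_tasks is hypothetical. */
struct model_leaf {
	unsigned long applied_mask; /* online CPUs as of the last grace-period start */
	unsigned long pending_mask; /* online CPUs right now, not yet applied */
	int nr_blocked;             /* tasks preempted within a read-side critical section */
	bool wait_blkd_tasks;       /* offline cleanup deferred due to blocked tasks */
	int init_calls;             /* times the "init new" step would run */
	int cleanup_calls;          /* times the "cleanup dead" step would run */
};

/* Per-leaf work at grace-period start, keeping init and cleanup in order. */
static void model_gp_init_leaf(struct model_leaf *leaf)
{
	unsigned long oldmask = leaf->applied_mask;

	leaf->applied_mask = leaf->pending_mask;

	/* Did the leaf go from "some CPUs" to "no CPUs" or vice versa? */
	if (!oldmask != !leaf->applied_mask) {
		if (!oldmask) {
			/* First CPU online: init only if the offline was cleaned up. */
			if (!leaf->wait_blkd_tasks)
				leaf->init_calls++;
		} else if (leaf->nr_blocked) {
			/* Last CPU offline, but blocked tasks remain: defer cleanup. */
			leaf->wait_blkd_tasks = true;
		} else {
			/* Last CPU offline and nothing blocked: clean up now. */
			leaf->cleanup_calls++;
		}
	}

	/* Deferred cleanup: finish it only if all CPUs are still offline. */
	if (leaf->wait_blkd_tasks &&
	    (!leaf->nr_blocked || leaf->applied_mask)) {
		leaf->wait_blkd_tasks = false;
		if (!leaf->applied_mask)
			leaf->cleanup_calls++;
	}
}

/* Replays steps 1-4 above; returns the total number of init/cleanup calls. */
static int model_replay_steps(void)
{
	struct model_leaf leaf = { .applied_mask = 0x1, .pending_mask = 0x1 };

	leaf.nr_blocked = 1;     /* step 1: task preempted in a critical section */
	leaf.pending_mask = 0;   /* step 2: all of the leaf's CPUs go offline */
	model_gp_init_leaf(&leaf);
	assert(leaf.wait_blkd_tasks);
	leaf.nr_blocked = 0;     /* step 3: task resumes and leaves the list */
	leaf.pending_mask = 0x1; /* step 4: a CPU returns before the next grace period */
	model_gp_init_leaf(&leaf);
	assert(!leaf.wait_blkd_tasks);
	return leaf.init_calls + leaf.cleanup_calls;
}
```

    In this model, replaying steps 1-4 runs neither step, because the leaf
    had to be tracked for the whole offline interval.  A plain offline
    followed by an online, with no blocked tasks, runs cleanup once and
    then init once, in that order.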
    
    And, while in the area, this commit reduces confusion by using formal
    parameters rather than local variables that just happen to have the same
    value at that particular point in the code.
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
tree.c 129 KB