1. 12 Jul, 2018 40 commits
    • Paul E. McKenney's avatar
      rcutorture: Change units of onoff_interval to jiffies · 028be12b
      Paul E. McKenney authored
      Some RCU bugs have been sensitive to the frequency of CPU-hotplug
      operations, which have been gradually increased over time.  But this
      frequency is now at the one-second lower limit that can be specified using
      the rcutorture.onoff_interval kernel parameter.  This commit therefore
      changes the units of rcutorture.onoff_interval from seconds to jiffies,
      and also sets the value specified for this kernel parameter in the TREE03
      rcutorture scenario to 200, which is 200 milliseconds for HZ=1000.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      028be12b
    • Paul E. McKenney's avatar
      rcu: Add diagnostics for offline CPUs failing to report QS · f2e2df59
      Paul E. McKenney authored
      CPUs are expected to report quiescent states when coming online and
      when going offline, and grace-period initialization is supposed to
      handle any race conditions where a CPU's ->qsmask bit is set just after
      it goes offline.  This commit adds diagnostics for the case where an
      offline CPU nevertheless has a grace period waiting on it.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      f2e2df59
    • Paul E. McKenney's avatar
      rcu: Record ->gp_state for both phases of grace-period initialization · fea3f222
      Paul E. McKenney authored
      Grace-period initialization first processes any recent CPU-hotplug
      operations, and then initializes state for the new grace period.  These
      two phases of initialization are currently not distinguished in debug
      prints, but the distinction is valuable in a number of debug situations.
      This commit therefore introduces two new values for ->gp_state,
      RCU_GP_ONOFF and RCU_GP_INIT, in order to make this distinction.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      fea3f222
    • Paul E. McKenney's avatar
      rcu: Add CPU online/offline state to dump_blkd_tasks() · 57738942
      Paul E. McKenney authored
      Interactions between CPU-hotplug operations and grace-period
      initialization can result in dump_blkd_tasks().  One of the first
      debugging actions in this case is to search back in dmesg to work
      out which of the affected rcu_node structure's CPUs are online and to
      determine the last CPU-hotplug operation affecting any of those CPUs.
      This can be laborious and error-prone, especially when console output
      is lost.
      
      This commit therefore causes dump_blkd_tasks() to dump the state of
      the affected rcu_node structure's CPUs and the last grace period during
      which the last offline and online operation affected each of these CPUs.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      57738942
    • Paul E. McKenney's avatar
      rcu: Add up-tree information to dump_blkd_tasks() diagnostics · ff3cee39
      Paul E. McKenney authored
      This commit updates dump_blkd_tasks() to print out quiescent-state
      bitmasks for the rcu_node structures further up the tree.  This
      information helps debugging of interactions between CPU-hotplug
      operations and RCU grace-period initialization.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      ff3cee39
    • Paul E. McKenney's avatar
      rcu: Remove CPU-hotplug failsafe from force-quiescent-state code path · e05121ba
      Paul E. McKenney authored
      Now that quiescent states for newly offlined CPUs are reported either
      when that CPU goes offline or at the end of grace-period initialization,
      the CPU-hotplug failsafe in the force-quiescent-state code path is no
      longer needed.
      
      This commit therefore removes this failsafe.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      e05121ba
    • Paul E. McKenney's avatar
      rcu: Remove failsafe check for lost quiescent state · 17a8212b
      Paul E. McKenney authored
      Now that quiescent-state reporting is fully event-driven, this commit
      removes the check for a lost quiescent state from force_qs_rnp().
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      17a8212b
    • Paul E. McKenney's avatar
      rcu: Move grace-period pre-init delay after pre-init · f34f2f58
      Paul E. McKenney authored
      The main race with the early part of grace-period initialization appears
      to be with CPU hotplug.  To more fully open this race window, this commit
      moves the rcu_gp_slow() from the beginning of the early initialization
      loop to follow that loop, thus widening the race window, especially for
      the rcu_node structures that are initialized last.  This commit also
      expands rcutree.gp_preinit_delay from 3 to 12, giving the same overall
      delay in the grace period, but concentrated in the spot where it will
      do the most good.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      f34f2f58
    • Paul E. McKenney's avatar
      rcu: Add RCU-preempt check for waiting on newly onlined CPU · 1f3e5f51
      Paul E. McKenney authored
      RCU should only be waiting on CPUs that were online at the time that the
      current grace period started.  Failure to abide by this rule can result
      in confusing splats during grace-period cleanup and initialization.
      This commit therefore adds a check to RCU-preempt's preempted-task
      queuing that checks for waiting on newly onlined CPUs.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      1f3e5f51
    • Paul E. McKenney's avatar
      rcu: Fix grace-period hangs due to race with CPU offline · 1e64b15a
      Paul E. McKenney authored
      Without special fail-safe quiescent-state-propagation checks, grace-period
      hangs can result from the following scenario:
      
      1.	CPU 1 goes offline.
      
      2.	Because CPU 1 is the only CPU in the system blocking the current
      	grace period, the grace period ends as soon as
      	rcu_cleanup_dying_idle_cpu()'s call to rcu_report_qs_rnp()
      	returns.
      
      3.	At this point, the leaf rcu_node structure's ->lock is no longer
      	held: rcu_report_qs_rnp() has released it, as it must in order
      	to awaken the RCU grace-period kthread.
      
      4.	At this point, that same leaf rcu_node structure's ->qsmaskinitnext
      	field still records CPU 1 as being online.  This is absolutely
      	necessary because the scheduler uses RCU (in this case on the
      	wake-up path while awakening RCU's grace-period kthread), and
      	->qsmaskinitnext contains RCU's idea as to which CPUs are online.
      	Therefore, invoking rcu_report_qs_rnp() after clearing CPU 1's
      	bit from ->qsmaskinitnext would result in a lockdep-RCU splat
      	due to RCU being used from an offline CPU.
      
      5.	RCU's grace-period kthread awakens, sees that the old grace period
      	has completed and that a new one is needed.  It therefore starts
      	a new grace period, but because CPU 1's leaf rcu_node structure's
      	->qsmaskinitnext field still shows CPU 1 as being online, this new
      	grace period is initialized to wait for a quiescent state from the
      	now-offline CPU 1.
      
      6.	Without the fail-safe force-quiescent-state checks, there would
      	be no quiescent state from the now-offline CPU 1, which would
      	eventually result in RCU CPU stall warnings and memory exhaustion.
      
      It would be good to get rid of the special fail-safe quiescent-state
      propagation checks, and thus it would be good to fix things so that
      the above scenario cannot happen.  This commit therefore adds a new
      ->ofl_lock to the rcu_state structure.  This lock is held by rcu_gp_init()
      across the applying of buffered online and offline operations to the
      rcu_node tree, and it is also held by rcu_cleanup_dying_idle_cpu()
      when buffering a new offline operation.  This prevents rcu_gp_init()
      from acquiring the leaf rcu_node structure's lock during the interval
      between when rcu_cleanup_dying_idle_cpu() invokes rcu_report_qs_rnp(),
      which releases ->lock and the re-acquisition of that same lock.
      This in turn prevents the failure scenario outlined above, and will
      hopefully eventually allow removal of the offline-CPU checks from the
      force-quiescent-state code path.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      1e64b15a
    • Paul E. McKenney's avatar
      rcu: Fix grace-period hangs from mid-init task resume · ec2c2976
      Paul E. McKenney authored
      Without special fail-safe quiescent-state-propagation checks, grace-period
      hangs can result from the following scenario:
      
      1.	A task running on a given CPU is preempted in its RCU read-side
      	critical section.
      
      2.	That CPU goes offline, and there are now no online CPUs
      	corresponding to that CPU's leaf rcu_node structure.
      
      3.	The rcu_gp_init() function does the first phase of grace-period
      	initialization, and sets the aforementioned leaf rcu_node
      	structure's ->qsmaskinit field to all zeroes.  Because there
      	is a blocked task, it does not propagate the zeroing of either
      	->qsmaskinit or ->qsmaskinitnext up the rcu_node tree.
      
      4.	The task resumes on some other CPU and exits its critical section.
      	There is no grace period in progress, so the resulting quiescent
      	state is not reported up the tree.
      
      5.	The rcu_gp_init() function does the second phase of grace-period
      	initialization, which results in the leaf rcu_node structure
      	being initialized to expect no further quiescent states, but
      	with that structure's parent expecting a quiescent-state report.
      
      	The parent will never receive a quiescent state from this leaf
      	rcu_node structure, so the grace period will hang, resulting in
      	RCU CPU stall warnings.
      
      It would be good to get rid of the special fail-safe quiescent-state
      propagation checks.  This commit therefore checks the leaf rcu_node
      structure's ->wait_blkd_tasks field during grace-period initialization.
      If this flag is set, the rcu_report_qs_rnp() is invoked to immediately
      report the possible quiescent state.  While in the neighborhood, this
      commit also report quiescent states for any CPUs that went offline between
      the two phases of grace-period initialization, thus reducing grace-period
      delays and hopefully eventually allowing removal of offline-CPU checks
      from the force-quiescent-state code path.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      ec2c2976
    • Paul E. McKenney's avatar
      rcu: Suppress false-positive splats from mid-init task resume · 0b107d24
      Paul E. McKenney authored
      Consider the following sequence of events in a PREEMPT=y kernel:
      
      1.	All CPUs corresponding to a given leaf rcu_node structure are
      	offline.
      
      2.	The first phase of the rcu_gp_init() function's grace-period
      	initialization runs, and sets that rcu_node structure's
      	->qsmaskinit to zero, as it should.
      
      3.	One of the CPUs corresponding to that rcu_node structure comes
      	back online.  Note that because this CPU came online after the
      	grace period started, this grace period can safely ignore this
      	newly onlined CPU.
      
      4.	A task running on the newly onlined CPU enters an RCU-preempt
      	read-side critical section, and is then preempted.  Because
      	the corresponding rcu_node structure's ->qsmask is zero,
      	rcu_preempt_ctxt_queue() leaves the rcu_node structure's
      	->gp_tasks field NULL, as it should.
      
      5.	The rcu_gp_init() function continues running the second phase of
      	grace-period initialization.  The ->qsmask field of the parent of
      	the aforementioned leaf rcu_node structure is set to not expect
      	a quiescent state from the leaf, as is only right and proper.
      
      	However, when rcu_gp_init() reaches the leaf, it invokes
      	rcu_preempt_check_blocked_tasks(), which sees that the leaf's
      	->blkd_tasks list is non-empty, and therefore sets the leaf's
      	->gp_tasks field to reference the first task on that list.
      
      6.	The grace period ends before the preempted task resumes, which
      	is perfectly fine, given that this grace period was under no
      	obligation to wait for that task to exit its late-starting
      	RCU-preempt read-side critical section.  Unfortunately, the
      	leaf's ->gp_tasks field is non-NULL, so rcu_gp_cleanup() splats.
      	After all, it appears to rcu_gp_cleanup() that the grace period
      	failed to wait for a task that was supposed to be blocking that
      	grace period.
      
      This commit avoids this false-positive splat by adding a check of both
      ->qsmaskinit and ->wait_blkd_tasks to rcu_preempt_check_blocked_tasks().
      If both ->qsmaskinit and ->wait_blkd_tasks are zero, then the task must
      have entered its RCU-preempt read-side critical section late (after all,
      the CPU that it is running on was not online at that time), which means
      that the upper-level rcu_node structure won't be waiting for anything
      on the leaf anyway.
      
      If ->wait_blkd_tasks is non-zero, then there is at least one task on
      ths rcu_node structure's ->blkd_tasks list whose RCU read-side
      critical section predates the current grace period.  If ->qsmaskinit
      is non-zero, there is at least one CPU that was online at the start
      of the current grace period.  Thus, if both are zero, there is nothing
      to wait for.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      0b107d24
    • Paul E. McKenney's avatar
      rcu: Suppress more involved false-positive preempted-task splats · 99990da1
      Paul E. McKenney authored
      Consider the following sequence of events in a PREEMPT=y kernel:
      
      1.	All but one of the CPUs corresponding to a given leaf rcu_node
      	structure go offline.  Each of these CPUs clears its bit in that
      	structure's ->qsmaskinitnext field.
      
      2.	A new grace period starts, and rcu_gp_init() scans the leaf
      	rcu_node structures, applying CPU-hotplug changes since the
      	start of the previous grace period, including those changes in
      	#1 above.  This copies each leaf structure's ->qsmaskinitnext
      	to its ->qsmask field, which represents the CPUs that this new
      	grace period will wait on.  Each copy operation is done holding
      	the corresponding leaf rcu_node structure's ->lock, and at the
      	end of this scan, rcu_gp_init() holds no locks.
      
      3.	The last CPU corresponding to #1's leaf rcu_node structure goes
      	offline, clearing its bit in that structure's ->qsmaskinitnext
      	field, but not touching the ->qsmaskinit field.  Note that
      	rcu_gp_init() is not currently holding any locks!  This CPU does
      	-not- report a quiescent state because the grace period has not
      	yet initialized itself sufficiently to have set any bits in any
      	of the leaf rcu_node structures' ->qsmask fields.
      
      4.	The rcu_gp_init() function continues initializing the new grace
      	period, copying each leaf rcu_node structure's ->qsmaskinit
      	field to its ->qsmask field while holding the corresponding ->lock.
      	This sets the ->qsmask bit corresponding to #3's CPU.
      
      5.	Before the grace period ends, #3's CPU comes back online.
      	Because te grace period has not yet done any force-quiescent-state
      	scans (which would report a quiescent state on behalf of any
      	offline CPUs), this CPU's ->qsmask bit is still set.
      
      6.	A task running on the newly onlined CPU is preempted while in
      	an RCU read-side critical section.  Because this CPU's ->qsmask
      	bit is net, not only does this task queue itself on the leaf
      	rcu_node structure's ->blkd_tasks list, it also sets that
      	structure's ->gp_tasks pointer to reference it.
      
      7.	The grace period started in #1 above comes to an end.  This
      	results in rcu_gp_cleanup() being invoked, which, among other
      	things, checks to make sure that there are no tasks blocking the
      	just-ended grace period, that is, that all ->gp_tasks pointers
      	are NULL.  The ->gp_tasks pointer corresponding to the task
      	preempted in #3 above is non-NULL, which results in a splat.
      
      This splat is a false positive.  The task's RCU read-side critical
      section cannot have begun before the just-ended grace period because
      this would mean either: (1) The CPU came online before the grace period
      started, which cannot have happened because the grace period started
      before that CPU went offline, or (2) The task started its RCU read-side
      critical section on some other CPU, but then it would have had to have
      been preempted before migrating to this CPU, which would mean that it
      would have instead queued itself on that other CPU's rcu_node structure.
      RCU's grace periods thus are working correctly.  Or, more accurately,
      that remaining bugs in RCU's grace periods are elsewhere.
      
      This commit eliminates this false positive by adding code to the end
      of rcu_cpu_starting() that reports a quiescent state to RCU, which has
      the side-effect of clearing that CPU's ->qsmask bit, preventing the
      above scenario.  This approach has the added benefit of more promptly
      reporting quiescent states corresponding to offline CPUs.  Nevertheless,
      this commit does -not- remove the need for the force-quiescent-state
      scans to check for offline CPUs, given that a CPU might remain offline
      indefinitely.  And without the checks in the force-quiescent-state scans,
      the grace period would also persist indefinitely, which could result in
      hangs or memory exhaustion.
      
      Note well that the call to rcu_report_qs_rnp() reporting the quiescent
      state must come -after- the setting of this CPU's bit in the leaf rcu_node
      structure's ->qsmaskinitnext field.  Otherwise, lockdep-RCU will complain
      bitterly about quiescent states coming from an offline CPU.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      99990da1
    • Paul E. McKenney's avatar
      rcu: Suppress false-positive preempted-task splats · fece2776
      Paul E. McKenney authored
      Consider the following sequence of events in a PREEMPT=y kernel:
      
      1.	All CPUs corresponding to a given rcu_node structure go offline.
      	A new grace period starts just after the CPU-hotplug code path
      	does its synchronize_rcu() for the last CPU, so at least this
      	CPU is present in that structure's ->qsmask.
      
      2.	Before the grace period ends, a CPU comes back online, and not
      	just any CPU, but the one corresponding to a non-zero bit in
      	the leaf rcu_node structure's ->qsmask.
      
      3.	A task running on the newly onlined CPU is preempted while in
      	an RCU read-side critical section.  Because this CPU's ->qsmask
      	bit is net, not only does this task queue itself on the leaf
      	rcu_node structure's ->blkd_tasks list, it also sets that
      	structure's ->gp_tasks pointer to reference it.
      
      4.	The grace period started in #1 above comes to an end.  This
      	results in rcu_gp_cleanup() being invoked, which, among other
      	things, checks to make sure that there are no tasks blocking the
      	just-ended grace period, that is, that all ->gp_tasks pointers
      	are NULL.  The ->gp_tasks pointer corresponding to the task
      	preempted in #3 above is non-NULL, which results in a splat.
      
      This splat is a false positive.  The task's RCU read-side critical
      section cannot have begun before the just-ended grace period because
      this would mean either: (1) The CPU came online before the grace period
      started, which cannot have happened because the grace period started
      before that CPU was all the way offline, or (2) The task started its
      RCU read-side critical section on some other CPU, but then it would
      have had to have been preempted before migrating to this CPU, which
      would mean that it would have instead queued itself on that other CPU's
      rcu_node structure.
      
      This commit eliminates this false positive by adding code to the end
      of rcu_cleanup_dying_idle_cpu() that reports a quiescent state to RCU,
      which has the side-effect of clearing that CPU's ->qsmask bit, preventing
      the above scenario.  This approach has the added benefit of more promptly
      reporting quiescent states corresponding to offline CPUs.
      
      Note well that the call to rcu_report_qs_rnp() reporting the quiescent
      state must come -before- the clearing of this CPU's bit in the leaf
      rcu_node structure's ->qsmaskinitnext field.  Otherwise, lockdep-RCU
      will complain bitterly about quiescent states coming from an offline CPU.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      fece2776
    • Paul E. McKenney's avatar
      rcu: Suppress false-positive offline-CPU lockdep-RCU splat · 5554788e
      Paul E. McKenney authored
      The rcu_lockdep_current_cpu_online() function currently checks only the
      RCU-sched data structures to determine whether or not RCU believes that a
      given CPU is offline.  Unfortunately, there are multiple flavors of RCU,
      which means that there is a short window of time during which the various
      flavors disagree as to whether or not a given CPU is offline.  This can
      result in false-positive lockdep-RCU splats in which some other flavor
      of RCU tries to do something based on its view that the CPU is online,
      only to get hit with a lockdep-RCU splat because RCU-sched instead
      believes that the CPU is offline.
      
      This commit therefore changes rcu_lockdep_current_cpu_online() to scan
      all RCU flavors and to consider a given CPU to be online if any of the
      RCU flavors believe it to be online, thus preventing these false-positive
      splats.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      5554788e
    • Paul E. McKenney's avatar
      rcu: Prevent useless FQS scan after all CPUs have checked in · 92816435
      Paul E. McKenney authored
      The force_qs_rnp() function checks for ->qsmask being all zero, that is,
      all CPUs for the current rcu_node structure having already passed through
      quiescent states.  But with RCU-preempt, this is not sufficient to report
      quiescent states further up the tree, so there are further checks that
      can initiate RCU priority boosting and also for races with CPU-hotplug
      operations.  However, if neither of these further checks apply, the code
      proceeds to carry out a useless scan of an all-zero ->qsmask.
      
      This commit therefore adds code to release the current rcu_node
      structure's lock and continue on to the next rcu_node structure, thereby
      avoiding this useless scan.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      92816435
    • Paul E. McKenney's avatar
      rcu: Replace smp_wmb() with smp_store_release() for stall check · 91f63ced
      Paul E. McKenney authored
      This commit gets rid of the smp_wmb() in record_gp_stall_check_time()
      in favor of an smp_store_release().
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      91f63ced
    • Paul E. McKenney's avatar
      rcu: Fix typo and add additional debug · 77cfc7bf
      Paul E. McKenney authored
      This commit fixes a typo and adds some additional debugging to the
      message emitted when a task blocking the current grace period is listed
      as blocking it when either that grace period ends or the next grace
      period begins.  This commit also reformats the console message for
      readability.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      77cfc7bf
    • Paul E. McKenney's avatar
      rcu: Make rcu_report_unblock_qs_rnp() warn on violated preconditions · c74859d1
      Paul E. McKenney authored
      If rcu_report_unblock_qs_rnp() is invoked on something other than
      preemptible RCU or if there are still preempted tasks blocking the
      current grace period, something went badly wrong in the caller.
      This commit therefore adds WARN_ON_ONCE() to these conditions, but
      leaving the legitimate reason for early exit (rnp->qsmask != 0)
      unwarned.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      c74859d1
    • Paul E. McKenney's avatar
      rcu: Make rcu_init_new_rnp() stop upon already-set bit · 8d672fa6
      Paul E. McKenney authored
      Currently, rcu_init_new_rnp() walks up the rcu_node combining tree,
      setting bits in the ->qsmaskinit fields on the way up.  It walks up
      unconditionally, regardless of the initial state of these bits.  This is
      OK because only the corresponding RCU grace-period kthread ever tests
      or sets these bits during runtime.  However, it is also pointless, and
      it increases both memory and lock contention (albeit only slightly), so
      this commit stops the walk as soon as an already-set bit is encountered.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      8d672fa6
    • Paul E. McKenney's avatar
      rcu: Fix an obsolete ->qsmaskinit comment · c50cbe53
      Paul E. McKenney authored
      Back in the old days, when grace-period initialization blocked CPU
      hotplug, the ->qsmaskinit mask was indeed updated at the time that
      a given CPU went offline.  However, with the deferral of these updates
      until the beginning of the next grace period in commit 0aa04b05
      ("rcu: Process offlining and onlining only at grace-period start"),
      it is instead ->qsmaskinitnext that gets updated at that time.
      
      This commit therefore updates the obsolete comment.  It also fixes
      punctuation while on the topic of comments mentioning ->qsmaskinit.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      c50cbe53
    • Paul E. McKenney's avatar
      rcu: Clean up handling of tasks blocked across full-rcu_node offline · 962aff03
      Paul E. McKenney authored
      Commit 0aa04b05 ("rcu: Process offlining and onlining only at
      grace-period start") deferred handling of CPU-hotplug events until the
      start of the next grace period, but consider the following sequence
      of events:
      
      1.	A task is preempted within an RCU-preempt read-side critical
      	section.
      
      2.	The CPU that this task was running on goes offline, along with all
      	other CPUs sharing the corresponding leaf rcu_node structure.
      
      3.	The task resumes execution.
      
      4.	One of those CPUs comes back online before a new grace period starts.
      
      In step 2, the code in the next rcu_gp_init() invocation will (correctly)
      defer removing the leaf rcu_node structure from the upper-level bitmasks,
      and will (correctly) set that structure's ->wait_blkd_tasks field.  During
      the ensuing interval, RCU will (correctly) track the tasks preempted on
      that structure because they must block any subsequent grace period.
      
      In step 3, the code in rcu_read_unlock_special() will (correctly) remove
      the task from the leaf rcu_node structure.  From this point forward, RCU
      need not pay attention to this structure, at least not until one of the
      corresponding CPUs comes back online.
      
      In step 4, the code in the next rcu_gp_init() invocation will
      (incorrectly) invoke rcu_init_new_rnp().  This is incorrect because
      the corresponding rcu_cleanup_dead_rnp() was never invoked.  This is
      nevertheless harmless because the upper-level bits are still set.
      So, no harm, no foul, right?
      
      At least, all is well until a little further into rcu_gp_init()
      invocation, which will notice that there are no longer any tasks blocked
      on the leaf rcu_node structure, conclude that there is no longer anything
      left over from step 2's offline operation, and will therefore invoke
      rcu_cleanup_dead_rnp().  But this invocation of rcu_cleanup_dead_rnp()
      is for the beginning of the earlier offline interval, and the previous
      invocation of rcu_init_new_rnp() is for the end of that same interval.
      That is right, they are invoked out of order.
      
      That cannot be good, can it?
      
      It turns out that this is not a (correctness!) problem because
      rcu_cleanup_dead_rnp() checks to see if any of the corresponding CPUs
      are online, and refuses to do anything if so.  In other words, in the
      case where rcu_init_new_rnp() and rcu_cleanup_dead_rnp() execute out of
      order, they both have no effect.
      
      But this is at best an accident waiting to happen.
      
      This commit therefore adds logic to rcu_gp_init() so that
      rcu_init_new_rnp() and rcu_cleanup_dead_rnp() are always invoked in
      order, and so that neither are invoked at all in cases where RCU had to
      pay attention to the leaf rcu_node structure during the entire time that
      all corresponding CPUs were offline.
      
      And, while in the area, this commit reduces confusion by using formal
      parameters rather than local variables that just happen to have the same
      value at that particular point in the code.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      962aff03
    • Joel Fernandes (Google)'s avatar
      rcu: Identify grace period is in progress as we advance up the tree · 226ca5e7
      Joel Fernandes (Google) authored
      There's no need to keep checking the same starting node for whether a
      grace period is in progress as we advance up the funnel lock loop. Its
      sufficient if we just checked it in the start, and then subsequently
      checked the internal nodes as we advanced up the combining tree. This
      also makes sense because the grace-period updates propogate from the
      root to the leaf, so there's a chance we may find a grace period has
      started as we advance up, lets check for the same.
      Reported-by: default avatarPaul McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarJoel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      226ca5e7
    • Joel Fernandes (Google)'s avatar
      rcu: Use better variable names in funnel locking loop · df2bf8f7
      Joel Fernandes (Google) authored
      The funnel locking loop in rcu_start_this_gp uses rcu_root as a
      temporary variable while walking the combining tree. This causes a
      tiresome exercise of a code reader reminding themselves that rcu_root
      may not be root. Lets just call it rnp, and rename other variables as
      well to be more appropriate.
      
      Original patch: https://patchwork.kernel.org/patch/10396577/Signed-off-by: default avatarJoel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Fix name in comment as well. ]
      df2bf8f7
    • Joel Fernandes's avatar
      rcu: Rename the grace-period-request variables and parameters · b73de91d
      Joel Fernandes authored
      The name 'c' is used for variables and parameters holding the requested
      grace-period sequence number.  However it is no longer very meaningful
      given the conversions from ->gpnum and (especially) ->completed to
      ->gp_seq. This commit therefore renames 'c' to 'gp_seq_req'.
      
      Previous patch discussion is at:
      https://patchwork.kernel.org/patch/10396579/Signed-off-by: default avatarJoel Fernandes <joel@joelfernandes.org>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      b73de91d
    • Paul E. McKenney's avatar
      rcu: Regularize resetting of rcu_data wrap indicator · 3d18469a
      Paul E. McKenney authored
      The rcu_data structure's ->gpwrap indicator is currently reset only
      when the CPU in question detects a new grace period.  This is in theory
      sufficient because any CPU that has been out of action for long enough
      that its ->gpwrap indicator is set is guaranteed to see both the end
      of an old grace period and the start of a new one.
      
      However, the current code leaves a short window during which the ->gpwrap
      indicator has been reset but the corresponding ->gp_seq counter has not
      yet been brought up to date.  This is harmless because interrupts are
      disabled, but it is likely to (at the very least) cause confusion.
      
      This commit therefore moves the resetting of ->gpwrap to follow the
      updating of ->gp_seq.  While in the area, it also resets ->gp_seq_needed.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      3d18469a
    • Paul E. McKenney's avatar
      rcutorture: Correctly handle grace-period sequence wrap · d7219312
      Paul E. McKenney authored
      The new ->gq_seq grace-period sequence numbers must be shifted down,
      which give artifacts when these numbers wrap.  This commit therefore
      enables rcutorture and rcuperf to handle grace-period sequence numbers
      even if they do wrap.  It does this by allowing a special subtraction
      function to be specified, and this function subtracts before shifting.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      d7219312
    • Paul E. McKenney's avatar
      rcu: Make rcu_start_this_gp() check for grace period already started · 2e3e5e55
      Paul E. McKenney authored
      In the old days of ->gpnum and ->completed, the code requesting a new
      grace period checked to see if that grace period had already started,
      bailing early if so.  The new-age ->gp_seq approach instead checks
      whether the grace period has already finished.  A compensating change
      pushed the requested grace period down to the bottom of the tree, thus
      reducing lock contention and even eliminating it in some cases.  But why
      not further reduce contention, especially on large systems, by doing both,
      especially given that the cost of doing both is extremely small?
      
      This commit therefore adds a new rcu_seq_started() function that checks
      whether a specified grace period has already started.  It then uses
      this new function in place of rcu_seq_done() in the rcu_start_this_gp()
      function's funnel locking code.
      Reported-by: default avatarJoel Fernandes <joel@joelfernandes.org>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      2e3e5e55
    • Joel Fernandes (Google)'s avatar
      rcu: Fix cpustart tracepoint gp_seq number · 5ca0905f
      Joel Fernandes (Google) authored
      The "cpustart" trace event shows a stale gp_seq. This is because it uses
      rdp->gp_seq, which is updated only at the end of the __note_gp_changes()
      function. This commit therefore instead uses rnp->gp_seq.
      
      An alternative fix would be to update rdp->gp_seq earlier, but this would
      break RCU's detection of the beginning of a new-to-this-CPU grace period.
      Signed-off-by: default avatarJoel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      5ca0905f
    • Joel Fernandes (Google)'s avatar
      rcu: Produce last "CleanupMore" trace only if late-breaking request · 5b55072f
      Joel Fernandes (Google) authored
      Currently Tree RCU's clean-up code emits a "CleanupMore" trace event in
      response to late-arriving grace-period requests even if the grace period
      was already requested. This makes "CleanupMore" show up an extra time (in
      addition to once for each rcu_node structure that was previously marked
      with the request), and for no good reason.  This commit therefore avoids
      emitting this trace message unless the the only request for this next
      grace period arrived during or after the cleanup scan of the rcu_node
      structures.
      Signed-off-by: default avatarJoel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      5b55072f
    • Paul E. McKenney's avatar
      rcu: Don't funnel-lock above leaf node if GP in progress · a2165e41
      Paul E. McKenney authored
      The old grace-period start code would acquire only the leaf's rcu_node
      structure's ->lock if that structure believed that a grace period was
      in progress.  The new code advances to the leaf's parent in this case,
      needlessly acquiring then leaf's parent's ->lock.  This commit therefore
      checks the grace-period state after marking the leaf with the need for
      the specified grace period, and if the leaf believes that a grace period
      is in progress, takes an early exit.
      Reported-by: default avatarJoel Fernandes <joel@joelfernandes.org>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Add "Startedleaf" tracing as suggested by Joel Fernandes. ]
      a2165e41
    • Paul E. McKenney's avatar
    • Paul E. McKenney's avatar
    • Paul E. McKenney's avatar
    • Paul E. McKenney's avatar
      rcu: Make simple callback acceleration refer to rdp->gp_seq_needed · e44e73ca
      Paul E. McKenney authored
      Now that the rcu_data structure contains ->gp_seq_needed, create an
      rcu_accelerate_cbs_unlocked() helper function that locklessly checks to
      see if new callbacks' required grace period has already been requested.
      If so, update the callback list locally and again locklessly.  (Though
      interrupts must be and are disabled to avoid racing with conflicting
      updates in interrupt handlers.)
      
      Otherwise, call rcu_accelerate_cbs() as before.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      e44e73ca
    • Paul E. McKenney's avatar
      rcu: Remove ->gpnum and ->completed · ff3bb6f4
      Paul E. McKenney authored
      Now that everything has been converted to use ->gp_seq instead of
      ->gpnum and ->completed, this commit removes ->gpnum and ->completed.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      ff3bb6f4
    • Paul E. McKenney's avatar
      rcu: Convert rcu_fqs tracepoint to ->gp_seq · fee5997c
      Paul E. McKenney authored
      This commit makes the rcu_fqs tracepoint use ->gp_seq instead of ->gpnum.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      fee5997c
    • Paul E. McKenney's avatar
      rcu: Convert rcu_quiescent_state_report tracepoint to ->gp_seq · db023296
      Paul E. McKenney authored
      This commit makes the rcu_quiescent_state_report tracepoint use ->gp_seq
      instead of ->gpnum.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      db023296
    • Paul E. McKenney's avatar
      rcu: Convert rcu_unlock_preempted_task tracepoint to ->gp_seq · 865aa1e0
      Paul E. McKenney authored
      This commit makes the rcu_unlock_preempted_task tracepoint use ->gp_seq
      instead of ->gpnum.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      865aa1e0
    • Paul E. McKenney's avatar
      rcu: Convert rcu_preempt_task tracepoint to ->gp_seq · 598ce094
      Paul E. McKenney authored
      This commit makes the rcu_preempt_task tracepoint use ->gp_seq instead
      of ->gpnum.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      598ce094