1. 12 Jul, 2018 40 commits
    • rcutorture: Fix rcu_barrier successes counter · bf5b6435
      Joel Fernandes (Google) authored
      The rcutorture test module currently increments both the success and
      error counters for the barrier test upon error, which results in
      misleading statistics being printed.  This commit therefore changes the
      code to increment the success counter only when the test actually passes.
      
      This change was tested by returning from the barrier callback without
      incrementing the callback counter, thus introducing what appeared to
      rcutorture to be rcu_barrier() failures.
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
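The corrected accounting described in the commit above can be sketched in a few lines. This is a minimal userspace illustration, not the rcutorture code: `struct barrier_stats` and `record_barrier_result()` are hypothetical names introduced only for this example.

```c
#include <stdbool.h>

/* Hypothetical counters standing in for rcutorture's barrier statistics. */
struct barrier_stats {
	long successes;
	long errors;
};

/*
 * Record the outcome of one barrier test.  Exactly one counter is
 * incremented per test, so successes + errors equals the number of
 * tests that were run.
 */
static void record_barrier_result(struct barrier_stats *st, bool passed)
{
	if (passed)
		st->successes++;
	else
		st->errors++;
}
```

With this shape, a failing test can no longer inflate the success count, which is the misleading behavior the commit message describes.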
    • rcutorture: Add support to detect if boost kthread prio is too low · 4babd855
      Joel Fernandes (Google) authored
      When rcutorture is built in to the kernel, an earlier patch detects
      that and raises the priority of RCU's kthreads to allow rcutorture's
      RCU priority boosting tests to succeed.
      
      However, if rcutorture is built as a module, those priorities must be
      raised manually via the rcutree.kthread_prio kernel boot parameter.
      If this manual step is not taken, rcutorture's RCU priority boosting
      tests will fail due to kthread starvation.  One approach would be to
      raise the default priority, but that risks breaking existing users.
      Another approach would be to allow runtime adjustment of RCU's kthread
      priorities, but that introduces numerous "interesting" race conditions.
      This patch therefore instead detects too-low priorities, and prints a
      message and disables the RCU priority boosting tests in that case.
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcutorture: Use monotonic timestamp for stall detection · 622be33f
      Arnd Bergmann authored
      The get_seconds() call is deprecated because it overflows on 32-bit
      architectures. The algorithm in rcu_torture_stall() can deal with
      the overflow, but another problem here is that using a CLOCK_REALTIME
      stamp can lead to a false-positive stall warning when a settimeofday()
      happens concurrently.
      
      Using ktime_get_seconds() instead avoids those issues and will never
      overflow. The added cast to 'unsigned long' however is necessary to
      make ULONG_CMP_LT() work correctly.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
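The role of the 'unsigned long' cast mentioned in the commit above can be illustrated with a small userspace sketch. The macro is written here from memory of the kernel's wraparound-tolerant comparison and should be treated as an assumption; `before_deadline()` is a hypothetical helper, not a function from rcutorture.

```c
#include <limits.h>
#include <stdbool.h>

/* Wraparound-tolerant "a < b", assumed to match the kernel's ULONG_CMP_LT(). */
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))

/*
 * Return true while "now" is still before "deadline".  Both values are
 * signed 64-bit seconds, as a ktime_get_seconds()-style source would
 * supply.  The casts force the subtraction inside the macro to be done
 * in unsigned long arithmetic, which is what makes the comparison modular.
 */
static bool before_deadline(long long now, long long deadline)
{
	return ULONG_CMP_LT((unsigned long)now, (unsigned long)deadline);
}
```

Without the casts, a 32-bit build would subtract two signed 64-bit values and compare the possibly negative result against ULONG_MAX / 2, which is presumably why the commit message calls the cast necessary.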
    • rcutorture: Make boost test more robust · 3b745c89
      Joel Fernandes (Google) authored
      Currently, with RCU_BOOST disabled, I get no failures when forcing
      rcutorture to test RCU boost priority inversion. The reason seems to be
      that we don't check for failures if the callback never ran at all for
      the duration of the boost-test loop.
      
      Further, the 'rtb' and 'rtbf' counters seem to be used inconsistently.
      'rtb' is incremented at the start of each test and 'rtbf' is incremented
      per-cpu on each failure of call_rcu. So it's possible that 'rtbf' > 'rtb'.
      
      To test the boost with rcutorture, I did the following on a 4-CPU x86 machine:
      
      modprobe rcutorture test_boost=2
      sleep 20
      rmmod rcutorture
      
      With patch:
      rtbf: 8 rtb: 12
      
      Without patch:
      rtbf: 0 rtb: 2
      
      In summary this patch:
       - Increments failed and total test counters once per boost-test.
       - Checks for failure cases correctly.
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcutorture: Disable RT throttling for boost tests · 450efca7
      Joel Fernandes (Google) authored
      Currently rcutorture is not able to torture RCU boosting properly. This
      is because rcutorture's boost threads, which are doing the torturing,
      may be throttled due to RT throttling.
      
      This patch makes rcutorture use the right torture technique (unthrottled
      rcutorture boost tasks) for torturing RCU so that the test fails
      correctly when no boost is available.
      
      Currently this requires accessing sysctl_sched_rt_runtime directly, but
      that should be OK since rcutorture is test code. Such direct access is
      also only possible if rcutorture is built in, so this patch makes it
      conditional on that.
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcutorture: Emphasize testing of single reader protection type · bf1bef50
      Paul E. McKenney authored
      For RCU implementations supporting multiple types of reader protection,
      rcutorture currently randomly selects the combinations of types of
      protection for each phase of each reader.  The problem with this is
      that, for example, given the four kinds of protection for RCU-sched
      (local_irq_disable(), local_bh_disable(), preempt_disable(), and
      rcu_read_lock_sched()), the reader will be protected by a single
      mechanism only 25% of the time.  We really need heavier testing of
      single read-side mechanisms.
      
      This commit therefore uses only a single mechanism about 60% of the time,
      half of the time explicitly and one-eighth of the time by chance.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
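The percentages quoted in the commit above can be checked by brute force. The sketch assumes that each of the four mechanisms is independently enabled with probability one half, which is consistent with the 25% figure but is not stated in the commit message; all function names are hypothetical.

```c
#define NR_MECHANISMS 4

/* Count the set bits in a small mask. */
static int count_bits(unsigned int mask)
{
	int n = 0;

	for (; mask; mask >>= 1)
		n += mask & 1;
	return n;
}

/* Number of the 2^NR_MECHANISMS masks that enable exactly one mechanism. */
static int single_mechanism_masks(void)
{
	int hits = 0;

	for (unsigned int mask = 0; mask < (1u << NR_MECHANISMS); mask++)
		if (count_bits(mask) == 1)
			hits++;
	return hits;
}

/*
 * Overall fraction of single-mechanism readers when half of the picks
 * force a single mechanism and the other half stay fully random.
 */
static double single_mechanism_fraction(void)
{
	double by_chance = (double)single_mechanism_masks() / (1 << NR_MECHANISMS);

	return 0.5 + 0.5 * by_chance;
}
```

Four of the sixteen masks enable exactly one mechanism (25%), so the new policy yields 0.5 + 0.5 * 0.25 = 0.625, matching the "about 60%" and "one-eighth of the time by chance" wording.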
    • rcutorture: Handle extended read-side critical sections · 2397d072
      Paul E. McKenney authored
      This commit enables rcutorture to test whether RCU properly aggregates
      different types of read-side critical sections into a larger section
      covering the set.  It does this by extending an initial read-side
      critical section randomly for a random number of extensions.  There is
      a new rcu_torture_ops field ->extendable that specifies what extensions
      are permitted for a given flavor of RCU (for example, SRCU does not
      permit any extensions, while RCU-sched permits all types).  Note that
      if a given operation (for example, local_bh_disable()) extends an RCU
      read-side critical section, then rcutorture feels free to also start
      and end the critical section with that operation's type of disabling.
      
      Disabling operations include local_bh_disable(), local_irq_disable(),
      and preempt_disable().  This commit also adds a new "busted_srcud"
      torture type, which verifies rcutorture's ability to detect extensions
      of RCU read-side critical sections that are not handled.  Gotta test
      the test, after all!
      
      Note that it is not legal to invoke local_bh_disable() with interrupts
      disabled, and this transition is avoided by overriding the random-number
      generator when it wants to call local_bh_disable() while interrupts
      are disabled.  The code instead leaves both interrupts and bh/softirq
      disabled in this case.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcutorture: Make rcu_torture_timer() use rcu_torture_one_read() · 241b4252
      Paul E. McKenney authored
      This commit saves a few lines of code by making rcu_torture_timer()
      invoke rcu_torture_one_read(), thus completing the consolidation of
      code between rcu_torture_timer() and rcu_torture_reader().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcutorture: Use per-CPU random state for rcu_torture_timer() · 3025520e
      Paul E. McKenney authored
      Currently, the rcu_torture_timer() function uses a single global
      torture_random_state structure protected by a single global lock.
      This conflicts to some extent with performance and scalability,
      but even more with the goal of consolidating read-side testing
      with rcu_torture_reader().  This commit therefore creates a per-CPU
      torture_random_state structure for use by rcu_torture_timer() and
      eliminates the lock.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Make rcu_torture_timer_rand static, per 0day Test Robot report. ]
    • rcutorture: Use atomic increment for n_rcu_torture_timers · 8da9a595
      Paul E. McKenney authored
      Currently, rcu_torture_timer() relies on a lock to guard updates to
      n_rcu_torture_timers.  Unfortunately, consolidating code with
      rcu_torture_reader() will dispense with this lock.  This commit
      therefore makes n_rcu_torture_timers be an atomic_long_t and uses
      atomic_long_inc() to carry out the update.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
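A lock-free counter of the kind described above can be sketched in userspace with C11 atomics. This is only an analogy to the kernel's atomic_long_t and atomic_long_inc(), which are a separate API; the variable and function names here are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for n_rcu_torture_timers. */
static atomic_long n_timer_calls;

/*
 * Called from each simulated timer handler.  No lock is needed because
 * the increment itself is atomic; relaxed ordering is enough for a pure
 * statistics counter.
 */
static void count_timer_call(void)
{
	atomic_fetch_add_explicit(&n_timer_calls, 1, memory_order_relaxed);
}

/* Read the counter for reporting. */
static long timer_calls(void)
{
	return atomic_load(&n_timer_calls);
}
```

The design point is the one the commit message makes: once the surrounding lock goes away, the counter has to protect itself.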
    • rcutorture: Extract common code from rcu_torture_reader() · 6b06aa72
      Paul E. McKenney authored
      This commit extracts the code executed on each pass through the loop
      in rcu_torture_reader() into a new rcu_torture_one_read() function.
      This new function will also be used by rcu_torture_timer().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcuperf: Remove unused torturing_tasks() function · 2d362584
      Paul E. McKenney authored
      The torturing_tasks() function in rcuperf.c is not used, so this commit
      removes it.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove rcutorture test version and sequence number · 6bea2cc5
      Paul E. McKenney authored
      Back when RCU had a debugfs interface, there was a test version and
      sequence number that allowed associating debugfs data with a particular
      test run, where the test run started with modprobe and ended with rmmod,
      which was how tests were run back on the old ABAT system within IBM.
      But rcutorture testing no longer runs on ABAT, and there is no longer an
      RCU debugfs interface, so there is no longer any need for test versions
      and sequence numbers.
      
      This commit therefore removes the rcutorture_record_test_transition()
      and rcutorture_record_progress() functions, and along with them the
      rcutorture_testseq and rcutorture_vernum variables that they update.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcutorture: Change units of onoff_interval to jiffies · 028be12b
      Paul E. McKenney authored
      Some RCU bugs have been sensitive to the frequency of CPU-hotplug
      operations, which has been gradually increased over time.  But this
      frequency is now at the one-second lower limit that can be specified using
      the rcutorture.onoff_interval kernel parameter.  This commit therefore
      changes the units of rcutorture.onoff_interval from seconds to jiffies,
      and also sets the value specified for this kernel parameter in the TREE03
      rcutorture scenario to 200, which is 200 milliseconds for HZ=1000.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
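Because the interval is now expressed in jiffies, its wall-clock length depends on the kernel's HZ setting. A minimal sketch of the conversion follows; `interval_jiffies_to_ms()` is a hypothetical helper written for this example, not a kernel function.

```c
/*
 * Convert an interval in jiffies to milliseconds for a given tick
 * rate (HZ).  Integer arithmetic is sufficient for small intervals
 * such as the ones used here; very large values could overflow.
 */
static unsigned long interval_jiffies_to_ms(unsigned long jiffies,
					    unsigned long hz)
{
	return jiffies * 1000UL / hz;
}
```

For the TREE03 value of 200, this gives 200 ms at HZ=1000, as the commit message states, but 800 ms at HZ=250 and 2000 ms at HZ=100.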
    • rcu: Add diagnostics for offline CPUs failing to report QS · f2e2df59
      Paul E. McKenney authored
      CPUs are expected to report quiescent states when coming online and
      when going offline, and grace-period initialization is supposed to
      handle any race conditions where a CPU's ->qsmask bit is set just after
      it goes offline.  This commit adds diagnostics for the case where an
      offline CPU nevertheless has a grace period waiting on it.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Record ->gp_state for both phases of grace-period initialization · fea3f222
      Paul E. McKenney authored
      Grace-period initialization first processes any recent CPU-hotplug
      operations, and then initializes state for the new grace period.  These
      two phases of initialization are currently not distinguished in debug
      prints, but the distinction is valuable in a number of debug situations.
      This commit therefore introduces two new values for ->gp_state,
      RCU_GP_ONOFF and RCU_GP_INIT, in order to make this distinction.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add CPU online/offline state to dump_blkd_tasks() · 57738942
      Paul E. McKenney authored
      Interactions between CPU-hotplug operations and grace-period
      initialization can result in calls to dump_blkd_tasks().  One of the first
      debugging actions in this case is to search back in dmesg to work
      out which of the affected rcu_node structure's CPUs are online and to
      determine the last CPU-hotplug operation affecting any of those CPUs.
      This can be laborious and error-prone, especially when console output
      is lost.
      
      This commit therefore causes dump_blkd_tasks() to dump the state of
      the affected rcu_node structure's CPUs and the last grace period during
      which the last offline and online operation affected each of these CPUs.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add up-tree information to dump_blkd_tasks() diagnostics · ff3cee39
      Paul E. McKenney authored
      This commit updates dump_blkd_tasks() to print out quiescent-state
      bitmasks for the rcu_node structures further up the tree.  This
      information helps debugging of interactions between CPU-hotplug
      operations and RCU grace-period initialization.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove CPU-hotplug failsafe from force-quiescent-state code path · e05121ba
      Paul E. McKenney authored
      Now that quiescent states for newly offlined CPUs are reported either
      when that CPU goes offline or at the end of grace-period initialization,
      the CPU-hotplug failsafe in the force-quiescent-state code path is no
      longer needed.
      
      This commit therefore removes this failsafe.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove failsafe check for lost quiescent state · 17a8212b
      Paul E. McKenney authored
      Now that quiescent-state reporting is fully event-driven, this commit
      removes the check for a lost quiescent state from force_qs_rnp().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Move grace-period pre-init delay after pre-init · f34f2f58
      Paul E. McKenney authored
      The main race with the early part of grace-period initialization appears
      to be with CPU hotplug.  To more fully open this race window, this commit
      moves the rcu_gp_slow() from the beginning of the early initialization
      loop to follow that loop, thus widening the race window, especially for
      the rcu_node structures that are initialized last.  This commit also
      expands rcutree.gp_preinit_delay from 3 to 12, giving the same overall
      delay in the grace period, but concentrated in the spot where it will
      do the most good.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add RCU-preempt check for waiting on newly onlined CPU · 1f3e5f51
      Paul E. McKenney authored
      RCU should only be waiting on CPUs that were online at the time that the
      current grace period started.  Failure to abide by this rule can result
      in confusing splats during grace-period cleanup and initialization.
      This commit therefore adds a check to RCU-preempt's preempted-task
      queuing that checks for waiting on newly onlined CPUs.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Fix grace-period hangs due to race with CPU offline · 1e64b15a
      Paul E. McKenney authored
      Without special fail-safe quiescent-state-propagation checks, grace-period
      hangs can result from the following scenario:
      
      1.	CPU 1 goes offline.
      
      2.	Because CPU 1 is the only CPU in the system blocking the current
      	grace period, the grace period ends as soon as
      	rcu_cleanup_dying_idle_cpu()'s call to rcu_report_qs_rnp()
      	returns.
      
      3.	At this point, the leaf rcu_node structure's ->lock is no longer
      	held: rcu_report_qs_rnp() has released it, as it must in order
      	to awaken the RCU grace-period kthread.
      
      4.	At this point, that same leaf rcu_node structure's ->qsmaskinitnext
      	field still records CPU 1 as being online.  This is absolutely
      	necessary because the scheduler uses RCU (in this case on the
      	wake-up path while awakening RCU's grace-period kthread), and
      	->qsmaskinitnext contains RCU's idea as to which CPUs are online.
      	Therefore, invoking rcu_report_qs_rnp() after clearing CPU 1's
      	bit from ->qsmaskinitnext would result in a lockdep-RCU splat
      	due to RCU being used from an offline CPU.
      
      5.	RCU's grace-period kthread awakens, sees that the old grace period
      	has completed and that a new one is needed.  It therefore starts
      	a new grace period, but because CPU 1's leaf rcu_node structure's
      	->qsmaskinitnext field still shows CPU 1 as being online, this new
      	grace period is initialized to wait for a quiescent state from the
      	now-offline CPU 1.
      
      6.	Without the fail-safe force-quiescent-state checks, there would
      	be no quiescent state from the now-offline CPU 1, which would
      	eventually result in RCU CPU stall warnings and memory exhaustion.
      
      It would be good to get rid of the special fail-safe quiescent-state
      propagation checks, and thus it would be good to fix things so that
      the above scenario cannot happen.  This commit therefore adds a new
      ->ofl_lock to the rcu_state structure.  This lock is held by rcu_gp_init()
      across the applying of buffered online and offline operations to the
      rcu_node tree, and it is also held by rcu_cleanup_dying_idle_cpu()
      when buffering a new offline operation.  This prevents rcu_gp_init()
      from acquiring the leaf rcu_node structure's lock during the interval
      between when rcu_cleanup_dying_idle_cpu() invokes rcu_report_qs_rnp(),
      which releases ->lock, and the re-acquisition of that same lock.
      This in turn prevents the failure scenario outlined above, and will
      hopefully eventually allow removal of the offline-CPU checks from the
      force-quiescent-state code path.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Fix grace-period hangs from mid-init task resume · ec2c2976
      Paul E. McKenney authored
      Without special fail-safe quiescent-state-propagation checks, grace-period
      hangs can result from the following scenario:
      
      1.	A task running on a given CPU is preempted in its RCU read-side
      	critical section.
      
      2.	That CPU goes offline, and there are now no online CPUs
      	corresponding to that CPU's leaf rcu_node structure.
      
      3.	The rcu_gp_init() function does the first phase of grace-period
      	initialization, and sets the aforementioned leaf rcu_node
      	structure's ->qsmaskinit field to all zeroes.  Because there
      	is a blocked task, it does not propagate the zeroing of either
      	->qsmaskinit or ->qsmaskinitnext up the rcu_node tree.
      
      4.	The task resumes on some other CPU and exits its critical section.
      	There is no grace period in progress, so the resulting quiescent
      	state is not reported up the tree.
      
      5.	The rcu_gp_init() function does the second phase of grace-period
      	initialization, which results in the leaf rcu_node structure
      	being initialized to expect no further quiescent states, but
      	with that structure's parent expecting a quiescent-state report.
      
      	The parent will never receive a quiescent state from this leaf
      	rcu_node structure, so the grace period will hang, resulting in
      	RCU CPU stall warnings.
      
      It would be good to get rid of the special fail-safe quiescent-state
      propagation checks.  This commit therefore checks the leaf rcu_node
      structure's ->wait_blkd_tasks field during grace-period initialization.
      If this flag is set, rcu_report_qs_rnp() is invoked to immediately
      report the possible quiescent state.  While in the neighborhood, this
      commit also reports quiescent states for any CPUs that went offline between
      the two phases of grace-period initialization, thus reducing grace-period
      delays and hopefully eventually allowing removal of offline-CPU checks
      from the force-quiescent-state code path.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Suppress false-positive splats from mid-init task resume · 0b107d24
      Paul E. McKenney authored
      Consider the following sequence of events in a PREEMPT=y kernel:
      
      1.	All CPUs corresponding to a given leaf rcu_node structure are
      	offline.
      
      2.	The first phase of the rcu_gp_init() function's grace-period
      	initialization runs, and sets that rcu_node structure's
      	->qsmaskinit to zero, as it should.
      
      3.	One of the CPUs corresponding to that rcu_node structure comes
      	back online.  Note that because this CPU came online after the
      	grace period started, this grace period can safely ignore this
      	newly onlined CPU.
      
      4.	A task running on the newly onlined CPU enters an RCU-preempt
      	read-side critical section, and is then preempted.  Because
      	the corresponding rcu_node structure's ->qsmask is zero,
      	rcu_preempt_ctxt_queue() leaves the rcu_node structure's
      	->gp_tasks field NULL, as it should.
      
      5.	The rcu_gp_init() function continues running the second phase of
      	grace-period initialization.  The ->qsmask field of the parent of
      	the aforementioned leaf rcu_node structure is set to not expect
      	a quiescent state from the leaf, as is only right and proper.
      
      	However, when rcu_gp_init() reaches the leaf, it invokes
      	rcu_preempt_check_blocked_tasks(), which sees that the leaf's
      	->blkd_tasks list is non-empty, and therefore sets the leaf's
      	->gp_tasks field to reference the first task on that list.
      
      6.	The grace period ends before the preempted task resumes, which
      	is perfectly fine, given that this grace period was under no
      	obligation to wait for that task to exit its late-starting
      	RCU-preempt read-side critical section.  Unfortunately, the
      	leaf's ->gp_tasks field is non-NULL, so rcu_gp_cleanup() splats.
      	After all, it appears to rcu_gp_cleanup() that the grace period
      	failed to wait for a task that was supposed to be blocking that
      	grace period.
      
      This commit avoids this false-positive splat by adding a check of both
      ->qsmaskinit and ->wait_blkd_tasks to rcu_preempt_check_blocked_tasks().
      If both ->qsmaskinit and ->wait_blkd_tasks are zero, then the task must
      have entered its RCU-preempt read-side critical section late (after all,
      the CPU that it is running on was not online at that time), which means
      that the upper-level rcu_node structure won't be waiting for anything
      on the leaf anyway.
      
      If ->wait_blkd_tasks is non-zero, then there is at least one task on
      this rcu_node structure's ->blkd_tasks list whose RCU read-side
      critical section predates the current grace period.  If ->qsmaskinit
      is non-zero, there is at least one CPU that was online at the start
      of the current grace period.  Thus, if both are zero, there is nothing
      to wait for.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
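The decision rule from the commit above reduces to a small predicate. The sketch below is a pared-down model, not the body of rcu_preempt_check_blocked_tasks(): `struct leaf_sketch` and `gp_must_wait_on_blocked()` are hypothetical, and the field meanings are taken from the commit message.

```c
#include <stdbool.h>

/* Hypothetical, pared-down view of a leaf rcu_node structure. */
struct leaf_sketch {
	unsigned long qsmaskinit;	/* CPUs online at grace-period start */
	bool wait_blkd_tasks;		/* some blocked reader predates this GP */
	bool blkd_tasks_nonempty;	/* ->blkd_tasks list has entries */
};

/*
 * Should the current grace period treat tasks queued on this leaf as
 * blocking it?  If no CPU was online at the start of the grace period
 * and no blocked reader predates it, any queued task must have entered
 * its critical section late and can be ignored.
 */
static bool gp_must_wait_on_blocked(const struct leaf_sketch *rnp)
{
	if (!rnp->qsmaskinit && !rnp->wait_blkd_tasks)
		return false;
	return rnp->blkd_tasks_nonempty;
}
```

In the false-positive scenario from the commit message, both fields are zero while the list is non-empty, so the predicate returns false and ->gp_tasks would be left NULL.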
    • rcu: Suppress more involved false-positive preempted-task splats · 99990da1
      Paul E. McKenney authored
      Consider the following sequence of events in a PREEMPT=y kernel:
      
      1.	All but one of the CPUs corresponding to a given leaf rcu_node
      	structure go offline.  Each of these CPUs clears its bit in that
      	structure's ->qsmaskinitnext field.
      
      2.	A new grace period starts, and rcu_gp_init() scans the leaf
      	rcu_node structures, applying CPU-hotplug changes since the
      	start of the previous grace period, including those changes in
      	#1 above.  This copies each leaf structure's ->qsmaskinitnext
      	to its ->qsmaskinit field, which represents the CPUs that this new
      	grace period will wait on.  Each copy operation is done holding
      	the corresponding leaf rcu_node structure's ->lock, and at the
      	end of this scan, rcu_gp_init() holds no locks.
      
      3.	The last CPU corresponding to #1's leaf rcu_node structure goes
      	offline, clearing its bit in that structure's ->qsmaskinitnext
      	field, but not touching the ->qsmaskinit field.  Note that
      	rcu_gp_init() is not currently holding any locks!  This CPU does
      	-not- report a quiescent state because the grace period has not
      	yet initialized itself sufficiently to have set any bits in any
      	of the leaf rcu_node structures' ->qsmask fields.
      
      4.	The rcu_gp_init() function continues initializing the new grace
      	period, copying each leaf rcu_node structure's ->qsmaskinit
      	field to its ->qsmask field while holding the corresponding ->lock.
      	This sets the ->qsmask bit corresponding to #3's CPU.
      
      5.	Before the grace period ends, #3's CPU comes back online.
      	Because the grace period has not yet done any force-quiescent-state
      	scans (which would report a quiescent state on behalf of any
      	offline CPUs), this CPU's ->qsmask bit is still set.
      
      6.	A task running on the newly onlined CPU is preempted while in
      	an RCU read-side critical section.  Because this CPU's ->qsmask
      	bit is set, not only does this task queue itself on the leaf
      	rcu_node structure's ->blkd_tasks list, it also sets that
      	structure's ->gp_tasks pointer to reference it.
      
      7.	The grace period started in #2 above comes to an end.  This
      	results in rcu_gp_cleanup() being invoked, which, among other
      	things, checks to make sure that there are no tasks blocking the
      	just-ended grace period, that is, that all ->gp_tasks pointers
      	are NULL.  The ->gp_tasks pointer corresponding to the task
      	preempted in #6 above is non-NULL, which results in a splat.
      
      This splat is a false positive.  The task's RCU read-side critical
      section cannot have begun before the just-ended grace period because
      this would mean either: (1) The CPU came online before the grace period
      started, which cannot have happened because the grace period started
      before that CPU went offline, or (2) The task started its RCU read-side
      critical section on some other CPU, but then it would have had to have
      been preempted before migrating to this CPU, which would mean that it
      would have instead queued itself on that other CPU's rcu_node structure.
      RCU's grace periods thus are working correctly.  Or, more accurately,
      any remaining bugs in RCU's grace periods are elsewhere.
      
      This commit eliminates this false positive by adding code to the end
      of rcu_cpu_starting() that reports a quiescent state to RCU, which has
      the side-effect of clearing that CPU's ->qsmask bit, preventing the
      above scenario.  This approach has the added benefit of more promptly
      reporting quiescent states corresponding to offline CPUs.  Nevertheless,
      this commit does -not- remove the need for the force-quiescent-state
      scans to check for offline CPUs, given that a CPU might remain offline
      indefinitely.  And without the checks in the force-quiescent-state scans,
      the grace period would also persist indefinitely, which could result in
      hangs or memory exhaustion.
      
      Note well that the call to rcu_report_qs_rnp() reporting the quiescent
      state must come -after- the setting of this CPU's bit in the leaf rcu_node
      structure's ->qsmaskinitnext field.  Otherwise, lockdep-RCU will complain
      bitterly about quiescent states coming from an offline CPU.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Suppress false-positive preempted-task splats · fece2776
      Paul E. McKenney authored
      Consider the following sequence of events in a PREEMPT=y kernel:
      
      1.	All CPUs corresponding to a given rcu_node structure go offline.
      	A new grace period starts just after the CPU-hotplug code path
      	does its synchronize_rcu() for the last CPU, so at least this
      	CPU is present in that structure's ->qsmask.
      
      2.	Before the grace period ends, a CPU comes back online, and not
      	just any CPU, but the one corresponding to a non-zero bit in
      	the leaf rcu_node structure's ->qsmask.
      
      3.	A task running on the newly onlined CPU is preempted while in
      	an RCU read-side critical section.  Because this CPU's ->qsmask
      	bit is set, not only does this task queue itself on the leaf
      	rcu_node structure's ->blkd_tasks list, it also sets that
      	structure's ->gp_tasks pointer to reference it.
      
      4.	The grace period started in #1 above comes to an end.  This
      	results in rcu_gp_cleanup() being invoked, which, among other
      	things, checks to make sure that there are no tasks blocking the
      	just-ended grace period, that is, that all ->gp_tasks pointers
      	are NULL.  The ->gp_tasks pointer corresponding to the task
      	preempted in #3 above is non-NULL, which results in a splat.
      
      This splat is a false positive.  The task's RCU read-side critical
      section cannot have begun before the just-ended grace period because
      this would mean either: (1) The CPU came online before the grace period
      started, which cannot have happened because the grace period started
      before that CPU was all the way offline, or (2) The task started its
      RCU read-side critical section on some other CPU, but then it would
      have had to have been preempted before migrating to this CPU, which
      would mean that it would have instead queued itself on that other CPU's
      rcu_node structure.
      
      This commit eliminates this false positive by adding code to the end
      of rcu_cleanup_dying_idle_cpu() that reports a quiescent state to RCU,
      which has the side-effect of clearing that CPU's ->qsmask bit, preventing
      the above scenario.  This approach has the added benefit of more promptly
      reporting quiescent states corresponding to offline CPUs.
      
      Note well that the call to rcu_report_qs_rnp() reporting the quiescent
      state must come -before- the clearing of this CPU's bit in the leaf
      rcu_node structure's ->qsmaskinitnext field.  Otherwise, lockdep-RCU
      will complain bitterly about quiescent states coming from an offline CPU.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      fece2776
    • rcu: Suppress false-positive offline-CPU lockdep-RCU splat · 5554788e
      Paul E. McKenney authored
      The rcu_lockdep_current_cpu_online() function currently checks only the
      RCU-sched data structures to determine whether or not RCU believes that a
      given CPU is offline.  Unfortunately, there are multiple flavors of RCU,
      which means that there is a short window of time during which the various
      flavors disagree as to whether or not a given CPU is offline.  This can
      result in false-positive lockdep-RCU splats in which some other flavor
      of RCU tries to do something based on its view that the CPU is online,
      only to get hit with a lockdep-RCU splat because RCU-sched instead
      believes that the CPU is offline.
      
      This commit therefore changes rcu_lockdep_current_cpu_online() to scan
      all RCU flavors and to consider a given CPU to be online if any of the
      RCU flavors believe it to be online, thus preventing these false-positive
      splats.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      5554788e
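      The "online if any flavor says so" rule described in the commit above
      can be sketched as follows.  This is a minimal userspace illustration,
      not the kernel code: struct toy_flavor, its online_mask field, and
      toy_cpu_online_any_flavor() are hypothetical names, and the sketch
      assumes that the CPU number is smaller than the number of bits in an
      unsigned long.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one RCU flavor's view of which CPUs are online. */
struct toy_flavor {
	const char *name;
	unsigned long online_mask;	/* Bit n set: this flavor sees CPU n as online. */
};

/*
 * Treat the CPU as online if at least one flavor believes it is online.
 * Checking only flavors[0] would model the single-flavor behavior that
 * the commit message describes as producing false positives.
 */
static bool toy_cpu_online_any_flavor(const struct toy_flavor *flavors,
				      size_t nflavors, unsigned int cpu)
{
	for (size_t i = 0; i < nflavors; i++)
		if (flavors[i].online_mask & (1UL << cpu))
			return true;
	return false;
}
```

      With two flavors whose masks are 0x1 and 0x3, CPU 1 counts as online
      even though the first flavor has not yet caught up, which is the window
      that the commit message describes.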
    • rcu: Prevent useless FQS scan after all CPUs have checked in · 92816435
      Paul E. McKenney authored
      The force_qs_rnp() function checks for ->qsmask being all zero, that is,
      all CPUs for the current rcu_node structure having already passed through
      quiescent states.  But with RCU-preempt, this is not sufficient to report
      quiescent states further up the tree, so there are further checks that
      can initiate RCU priority boosting and that detect races with CPU-hotplug
      operations.  However, if neither of these further checks applies, the code
      proceeds to carry out a useless scan of an all-zero ->qsmask.
      
      This commit therefore adds code to release the current rcu_node
      structure's lock and continue on to the next rcu_node structure, thereby
      avoiding this useless scan.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      92816435
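      The early exit described in the commit above can be sketched as
      follows.  This is a simplified model under stated assumptions, not the
      kernel's force_qs_rnp(): struct toy_rnp, its locked field (a stand-in
      for the rcu_node lock), and toy_force_qs() are hypothetical, and the
      boosting and CPU-hotplug checks are reduced to a comment.

```c
#include <stddef.h>

/* Hypothetical stand-in for a leaf rcu_node structure. */
struct toy_rnp {
	unsigned long qsmask;	/* CPUs that still owe a quiescent state. */
	int locked;		/* Stand-in for the node's lock. */
};

/*
 * Walk the leaves and return the number of per-CPU bit tests performed.
 * *pending accumulates the number of CPUs still owing a quiescent state.
 */
static unsigned int toy_force_qs(struct toy_rnp *rnps, size_t n,
				 unsigned int cpus_per_leaf,
				 unsigned int *pending)
{
	unsigned int tests = 0;

	for (size_t i = 0; i < n; i++) {
		struct toy_rnp *rnp = &rnps[i];

		rnp->locked = 1;
		if (rnp->qsmask == 0) {
			/* The boosting and CPU-hotplug checks would go here. */
			rnp->locked = 0;	/* Release the lock ... */
			continue;		/* ... and skip the useless scan. */
		}
		for (unsigned int cpu = 0; cpu < cpus_per_leaf; cpu++) {
			tests++;
			if (rnp->qsmask & (1UL << cpu))
				(*pending)++;	/* A real scan would check this CPU. */
		}
		rnp->locked = 0;
	}
	return tests;
}
```

      For three leaves with masks 0, 0x5, and 0 and four CPUs per leaf, only
      the middle leaf is scanned, so the function performs four bit tests
      rather than twelve.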
    • rcu: Replace smp_wmb() with smp_store_release() for stall check · 91f63ced
      Paul E. McKenney authored
      This commit gets rid of the smp_wmb() in record_gp_stall_check_time()
      in favor of an smp_store_release().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      91f63ced
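      The general pattern behind the commit above, replacing a write barrier
      plus a plain store with a single release store, can be illustrated in
      userspace with C11 atomics.  This is only an analogy: struct
      toy_stall_state, its start and deadline fields, and both helpers are
      hypothetical, and C11 memory_order_release merely approximates the
      kernel's smp_store_release().

```c
#include <stdatomic.h>

/* Hypothetical pair of values that must be published in order. */
struct toy_stall_state {
	_Atomic unsigned long start;	/* Written first. */
	_Atomic unsigned long deadline;	/* Published with release semantics. */
};

/*
 * Writer: the release store orders the earlier store to ->start before
 * the store to ->deadline, with no separate write barrier needed.
 */
static void toy_record_stall_time(struct toy_stall_state *s,
				  unsigned long now, unsigned long timeout)
{
	atomic_store_explicit(&s->start, now, memory_order_relaxed);
	atomic_store_explicit(&s->deadline, now + timeout, memory_order_release);
}

/*
 * Reader: an acquire load that observes the new ->deadline is guaranteed
 * to also observe the ->start value written before it.
 */
static unsigned long toy_read_start(struct toy_stall_state *s,
				    unsigned long *deadline)
{
	*deadline = atomic_load_explicit(&s->deadline, memory_order_acquire);
	return atomic_load_explicit(&s->start, memory_order_relaxed);
}
```

      The design point is that the ordering requirement is attached to the
      store that publishes the data, which documents the intent more directly
      than a standalone barrier between two stores.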
    • rcu: Fix typo and add additional debug · 77cfc7bf
      Paul E. McKenney authored
      This commit fixes a typo and adds some additional debugging to the
      message emitted when a task is still listed as blocking the current
      grace period at the time that grace period ends or the next grace
      period begins.  This commit also reformats the console message for
      readability.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      77cfc7bf
    • rcu: Make rcu_report_unblock_qs_rnp() warn on violated preconditions · c74859d1
      Paul E. McKenney authored
      If rcu_report_unblock_qs_rnp() is invoked on something other than
      preemptible RCU or if there are still preempted tasks blocking the
      current grace period, something went badly wrong in the caller.
      This commit therefore adds WARN_ON_ONCE() to these conditions, while
      leaving the legitimate reason for early exit (rnp->qsmask != 0)
      unwarned.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      c74859d1
    • rcu: Make rcu_init_new_rnp() stop upon already-set bit · 8d672fa6
      Paul E. McKenney authored
      Currently, rcu_init_new_rnp() walks up the rcu_node combining tree,
      setting bits in the ->qsmaskinit fields on the way up.  It walks up
      unconditionally, regardless of the initial state of these bits.  This is
      OK because only the corresponding RCU grace-period kthread ever tests
      or sets these bits during runtime.  However, it is also pointless, and
      it increases both memory and lock contention (albeit only slightly), so
      this commit stops the walk as soon as an already-set bit is encountered.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      8d672fa6
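      The shortened tree walk described in the commit above can be sketched
      as follows.  This is a toy model without locking, not the kernel's
      rcu_init_new_rnp(): struct toy_node and toy_init_new_node() are
      hypothetical, although the field names are chosen to echo the ones
      mentioned in the commit message.  The sketch assumes that a parent whose
      mask was already nonzero has its own bit set further up the tree, so
      the walk can stop there.

```c
#include <stddef.h>

/* Hypothetical stand-in for a node in the combining tree. */
struct toy_node {
	struct toy_node *parent;	/* NULL at the root. */
	unsigned long grpmask;		/* This node's bit in parent->qsmaskinit. */
	unsigned long qsmaskinit;	/* Children that have online CPUs. */
};

/*
 * Propagate a newly populated leaf up the tree, stopping as soon as a
 * parent is found whose mask was already nonzero.  Returns the number
 * of ancestor nodes that were written.
 */
static unsigned int toy_init_new_node(struct toy_node *leaf)
{
	unsigned int updated = 0;
	struct toy_node *np = leaf;

	while (np->parent != NULL) {
		unsigned long mask = np->grpmask;
		unsigned long oldmask;

		np = np->parent;
		oldmask = np->qsmaskinit;
		np->qsmaskinit |= mask;
		updated++;
		if (oldmask != 0)
			break;	/* Ancestors already know about this subtree. */
	}
	return updated;
}
```

      In a three-level tree, bringing up the first leaf writes both the
      middle node and the root, whereas bringing up a sibling leaf writes
      only the middle node.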
    • rcu: Fix an obsolete ->qsmaskinit comment · c50cbe53
      Paul E. McKenney authored
      Back in the old days, when grace-period initialization blocked CPU
      hotplug, the ->qsmaskinit mask was indeed updated at the time that
      a given CPU went offline.  However, with the deferral of these updates
      until the beginning of the next grace period in commit 0aa04b05
      ("rcu: Process offlining and onlining only at grace-period start"),
      it is instead ->qsmaskinitnext that gets updated at that time.
      
      This commit therefore updates the obsolete comment.  It also fixes
      punctuation while on the topic of comments mentioning ->qsmaskinit.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      c50cbe53
    • rcu: Clean up handling of tasks blocked across full-rcu_node offline · 962aff03
      Paul E. McKenney authored
      Commit 0aa04b05 ("rcu: Process offlining and onlining only at
      grace-period start") deferred handling of CPU-hotplug events until the
      start of the next grace period, but consider the following sequence
      of events:
      
      1.	A task is preempted within an RCU-preempt read-side critical
      	section.
      
      2.	The CPU that this task was running on goes offline, along with all
      	other CPUs sharing the corresponding leaf rcu_node structure.
      
      3.	The task resumes execution.
      
      4.	One of those CPUs comes back online before a new grace period starts.
      
      In step 2, the code in the next rcu_gp_init() invocation will (correctly)
      defer removing the leaf rcu_node structure from the upper-level bitmasks,
      and will (correctly) set that structure's ->wait_blkd_tasks field.  During
      the ensuing interval, RCU will (correctly) track the tasks preempted on
      that structure because they must block any subsequent grace period.
      
      In step 3, the code in rcu_read_unlock_special() will (correctly) remove
      the task from the leaf rcu_node structure.  From this point forward, RCU
      need not pay attention to this structure, at least not until one of the
      corresponding CPUs comes back online.
      
      In step 4, the code in the next rcu_gp_init() invocation will
      (incorrectly) invoke rcu_init_new_rnp().  This is incorrect because
      the corresponding rcu_cleanup_dead_rnp() was never invoked.  This is
      nevertheless harmless because the upper-level bits are still set.
      So, no harm, no foul, right?
      
      At least, all is well until a little further into the rcu_gp_init()
      invocation, which will notice that there are no longer any tasks blocked
      on the leaf rcu_node structure, conclude that there is no longer anything
      left over from step 2's offline operation, and will therefore invoke
      rcu_cleanup_dead_rnp().  But this invocation of rcu_cleanup_dead_rnp()
      is for the beginning of the earlier offline interval, and the previous
      invocation of rcu_init_new_rnp() is for the end of that same interval.
      That is right, they are invoked out of order.
      
      That cannot be good, can it?
      
      It turns out that this is not a (correctness!) problem because
      rcu_cleanup_dead_rnp() checks to see if any of the corresponding CPUs
      are online, and refuses to do anything if so.  In other words, in the
      case where rcu_init_new_rnp() and rcu_cleanup_dead_rnp() execute out of
      order, they both have no effect.
      
      But this is at best an accident waiting to happen.
      
      This commit therefore adds logic to rcu_gp_init() so that
      rcu_init_new_rnp() and rcu_cleanup_dead_rnp() are always invoked in
      order, and so that neither is invoked at all in cases where RCU had to
      pay attention to the leaf rcu_node structure during the entire time that
      all corresponding CPUs were offline.
      
      And, while in the area, this commit reduces confusion by using formal
      parameters rather than local variables that just happen to have the same
      value at that particular point in the code.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      962aff03
    • rcu: Identify grace period is in progress as we advance up the tree · 226ca5e7
      Joel Fernandes (Google) authored
      There's no need to keep checking the same starting node for whether a
      grace period is in progress as we advance up the funnel lock loop. It's
      sufficient to check it at the start, and then to check the internal
      nodes as we advance up the combining tree. This also makes sense
      because grace-period updates propagate from the root to the leaves, so
      there's a chance we may find that a grace period has started as we
      advance up, so let's check for that.
      Reported-by: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      226ca5e7
    • rcu: Use better variable names in funnel locking loop · df2bf8f7
      Joel Fernandes (Google) authored
      The funnel locking loop in rcu_start_this_gp() uses rcu_root as a
      temporary variable while walking the combining tree. This forces the
      reader to keep remembering that rcu_root may not be the root. Let's
      just call it rnp, and rename other variables as well to be more
      appropriate.
      
      Original patch: https://patchwork.kernel.org/patch/10396577/
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Fix name in comment as well. ]
      df2bf8f7
    • rcu: Rename the grace-period-request variables and parameters · b73de91d
      Joel Fernandes authored
      The name 'c' is used for variables and parameters holding the requested
      grace-period sequence number.  However, it is no longer very meaningful
      given the conversions from ->gpnum and (especially) ->completed to
      ->gp_seq. This commit therefore renames 'c' to 'gp_seq_req'.
      
      Previous patch discussion is at:
      https://patchwork.kernel.org/patch/10396579/
      Signed-off-by: Joel Fernandes <joel@joelfernandes.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b73de91d
    • rcu: Regularize resetting of rcu_data wrap indicator · 3d18469a
      Paul E. McKenney authored
      The rcu_data structure's ->gpwrap indicator is currently reset only
      when the CPU in question detects a new grace period.  This is in theory
      sufficient because any CPU that has been out of action for long enough
      that its ->gpwrap indicator is set is guaranteed to see both the end
      of an old grace period and the start of a new one.
      
      However, the current code leaves a short window during which the ->gpwrap
      indicator has been reset but the corresponding ->gp_seq counter has not
      yet been brought up to date.  This is harmless because interrupts are
      disabled, but it is likely to (at the very least) cause confusion.
      
      This commit therefore moves the resetting of ->gpwrap to follow the
      updating of ->gp_seq.  While in the area, it also resets ->gp_seq_needed.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      3d18469a
    • rcutorture: Correctly handle grace-period sequence wrap · d7219312
      Paul E. McKenney authored
      The new ->gp_seq grace-period sequence numbers must be shifted down,
      which gives rise to artifacts when these numbers wrap.  This commit
      therefore enables rcutorture and rcuperf to handle grace-period sequence
      numbers even if they do wrap.  It does this by allowing a special
      subtraction function to be specified, and this function subtracts
      before shifting.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      d7219312
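      The difference between shifting first and subtracting first, as
      described in the commit above, can be shown with plain unsigned
      arithmetic.  This is a sketch under assumptions: TOY_SEQ_SHIFT and both
      helper functions are hypothetical, and the shift of 2 is chosen only
      for illustration rather than taken from the commit.

```c
/* Hypothetical number of low-order state bits below the counter. */
#define TOY_SEQ_SHIFT 2

/* Shift each value down, then subtract: breaks when the raw value wraps. */
static unsigned long toy_diff_shift_first(unsigned long new_seq,
					  unsigned long old_seq)
{
	return (new_seq >> TOY_SEQ_SHIFT) - (old_seq >> TOY_SEQ_SHIFT);
}

/*
 * Subtract the raw values, then shift: unsigned subtraction is modular,
 * so the difference stays correct across a wrap of the raw value.
 */
static unsigned long toy_diff_subtract_first(unsigned long new_seq,
					     unsigned long old_seq)
{
	return (new_seq - old_seq) >> TOY_SEQ_SHIFT;
}
```

      Without a wrap, both helpers agree: for raw values 16 and 8 each
      returns 2.  Across a wrap, for example from ~3UL (the largest raw
      value with clear state bits) to 4, the raw difference is 8, so
      subtracting first still returns 2, while shifting first discards the
      high-order bits before the subtraction and returns a huge bogus value.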