1. 12 Jul, 2018 40 commits
    • rculist: Improve documentation for list_for_each_entry_from_rcu() · b7b6f94c
      NeilBrown authored
      Unfortunately the patch for adding list_for_each_entry_from_rcu()
      wasn't the final patch after all review.  It is functionally
      correct but the documentation was incomplete.
      
      This patch adds the missing documentation, which includes an update to
      the documentation for list_for_each_entry_continue_rcu() to match the
      documentation for the new list_for_each_entry_from_rcu(), and adds
      list_for_each_entry_from_rcu() and the already existing
      hlist_for_each_entry_from_rcu() to section 7 of whatisRCU.txt.
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Add grace-period number to rcutorture statistics printout · 52e17ba1
      Paul E. McKenney authored
      This commit adds the SRCU grace-period number to the rcutorture statistics
      printout, which allows it to be compared to the rcutorture "Writer stall
      state" message.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Print stall-warning NMI dyntick state in hexadecimal · 89b4cd4b
      Paul E. McKenney authored
      The ->dynticks_nmi_nesting field records the nesting depth of both
      interrupt and NMI handlers.  Because the kernel can enter interrupts
      and never leave them (and vice versa) and because NMIs can interrupt
      manipulation of the ->dynticks_nmi_nesting field, the values in this
      field must be both chosen and manipulated very carefully.  As a result,
      although the value is zero when the corresponding CPU is executing
      neither an interrupt nor an NMI handler, it is 4,611,686,018,427,387,906
      on 64-bit systems when there is a single level of interrupt/NMI handling
      in progress.
      
      This number is difficult to remember and interpret, so this commit
      switches the output to hexadecimal, resulting in the much nicer
      0x4000000000000002.
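      The arithmetic behind the two renderings is easy to check:
      0x4000000000000002 is 2^62 + 2.  The minimal user-space C sketch below,
      assuming a 64-bit unsigned long long, formats that value both ways.  The
      helper format_nesting() is hypothetical; it is not the kernel's
      stall-warning code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One level of interrupt/NMI nesting on a 64-bit system: 2^62 + 2. */
static const unsigned long long one_level_nesting = (1ULL << 62) + 2;

/* Render a nesting value in decimal (old style) and hexadecimal
 * (new style) into caller-supplied buffers. */
static void format_nesting(unsigned long long v,
			   char *dec, size_t declen,
			   char *hex, size_t hexlen)
{
	snprintf(dec, declen, "%llu", v);
	snprintf(hex, hexlen, "%#llx", v);  /* '#' adds the 0x prefix */
}
```

      The hexadecimal form makes the two components visible at a glance: the
      high-order 0x4 and the low-order nesting count of 2.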
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • MAINTAINERS: Update RCU, SRCU, and TORTURE-TEST entries · cfe15038
      Paul E. McKenney authored
      The RCU, SRCU, and TORTURE-TEST entries are missing some recent
      changes, so this commit brings them up to date.
      Reported-by: Andrea Parri <andrea.parri@amarulasolutions.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
    • rcu: Make rcu_seq_diff() more exact · 2ee5aca5
      Paul E. McKenney authored
      The current implementation of rcu_seq_diff() follows tradition in
      providing a rough-and-ready approximation of the number of elapsed grace
      periods between the two rcu_seq values.  However, this difference is
      used to flag RCU-failure "near misses", which can be a valuable debugging
      aid, so more exactitude would be an improvement.  This commit therefore
      improves the accuracy of rcu_seq_diff().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • doc: Update synchronize_rcu() definition in whatisRCU.txt · 264d4f88
      Andrea Parri authored
      The synchronize_rcu() definition based on RW-locks in whatisRCU.txt
      does not meet the "Memory-Barrier Guarantees" in Requirements.html;
      for example, the following SB-like test:
      
          P0:                      P1:
      
          WRITE_ONCE(x, 1);        WRITE_ONCE(y, 1);
          synchronize_rcu();       smp_mb();
          r0 = READ_ONCE(y);       r1 = READ_ONCE(x);
      
      should not be allowed to reach the state "r0 = 0 AND r1 = 0", but
      the current write_lock()+write_unlock() definition cannot ensure
      this.  This commit therefore inserts an smp_mb__after_spinlock()
      in order to cause this synchronize_rcu() implementation to provide
      this memory-barrier guarantee.
      Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Check the range of jiffies_till_{first,next}_fqs when setting them · 67abb96c
      Byungchul Park authored
      Currently, the ranges of jiffies_till_{first,next}_fqs are checked and
      adjusted over and over again in the loop in rcu_gp_kthread at runtime.
      
      However, it is sufficient to check them only when they are set, not on
      every pass through the loop.  This commit therefore checks them when
      they are set via sysfs.
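      The idea is to validate a value once, in the code path that stores it,
      so that the grace-period loop can use it unchecked.  The minimal
      user-space C sketch below assumes the same bounds the loop previously
      enforced (at most HZ jiffies, and at least one jiffy for the next-FQS
      delay) and an illustrative HZ of 1000.  The setter names are
      hypothetical stand-ins for the kernel's module-parameter callbacks.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Assumed tick rate for this sketch; the kernel's HZ is a build option. */
#define SKETCH_HZ 1000UL

/* Initial values are arbitrary for this sketch. */
static unsigned long jiffies_till_first_fqs;
static unsigned long jiffies_till_next_fqs = 1;

/* Parse an unsigned long; return 0 on success or -EINVAL on bad input. */
static int parse_jiffies(const char *val, unsigned long *out)
{
	char *end;

	errno = 0;
	*out = strtoul(val, &end, 0);
	if (errno != 0 || end == val || *end != '\0')
		return -EINVAL;
	return 0;
}

/* Setter for the first-FQS delay: clamp to at most one second. */
static int set_first_fqs_jiffies(const char *val)
{
	unsigned long j;
	int ret = parse_jiffies(val, &j);

	if (ret == 0)
		jiffies_till_first_fqs = j > SKETCH_HZ ? SKETCH_HZ : j;
	return ret;
}

/* Setter for the next-FQS delay: clamp to [1, HZ] jiffies. */
static int set_next_fqs_jiffies(const char *val)
{
	unsigned long j;
	int ret = parse_jiffies(val, &j);

	if (ret == 0)
		jiffies_till_next_fqs = j > SKETCH_HZ ? SKETCH_HZ : (j ? j : 1);
	return ret;
}
```

      Invalid input leaves the stored value untouched, so the loop never
      observes an out-of-range delay.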
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add diagnostics for rcutorture writer stall warning · 47199a08
      Paul E. McKenney authored
      This commit adds any in-the-future ->gp_seq_needed fields to the
      diagnostics for an rcutorture writer stall warning message.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add comment to the last sleep in the rcu tasks loop · cd23ac8d
      Steven Rostedt (VMware) authored
      At the end of rcu_tasks_kthread() there's a lonely
      schedule_timeout_uninterruptible() call with no apparent rationale for
      its existence. But there is. It is to keep the thread from going into
      a tight loop if there's some anomaly. That really needs a comment.
      
      Link: http://lkml.kernel.org/r/20180524223839.GU3803@linux.vnet.ibm.com
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Speed up calling of RCU tasks callbacks · c03be752
      Steven Rostedt (VMware) authored
      Joel Fernandes found that synchronize_rcu_tasks() was taking a
      significant amount of time.  He demonstrated this with the following test:
      
       # cd /sys/kernel/tracing
       # while [ 1 ]; do x=1; done &
       # echo '__schedule_bug:traceon' > set_ftrace_filter
       # time echo '!__schedule_bug:traceon' > set_ftrace_filter;
      
      real	0m1.064s
      user	0m0.000s
      sys	0m0.004s
      
      It takes a little over a second to complete the synchronization
      because a loop waits one second at a time for tasks to pass through
      their quiescent points whenever there is a task that must be waited
      for.
      
      After discussion, we came up with a simple way to wait for holdouts:
      start with a short wait and lengthen it on each iteration of the loop,
      but never beyond a full second.
      
      With the new patch we have:
      
       # time echo '!__schedule_bug:traceon' > set_ftrace_filter;
      
      real	0m0.131s
      user	0m0.000s
      sys	0m0.004s
      
      This reduces the wait to roughly 12% of its original duration.
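      The backoff can be pictured as a schedule of per-pass timeouts.  The
      minimal C sketch below is an illustration under stated assumptions, not
      the patch itself: it assumes the first pass waits HZ/10 and each later
      pass shrinks the divisor by one until the wait reaches a full second.
      Both function names are hypothetical.

```c
#include <assert.h>

/* Timeout for a given pass through the holdout loop: HZ/10 on the first
 * pass, lengthening by shrinking the divisor each pass, capped at HZ.
 * The starting divisor of 10 is an assumption of this sketch. */
static unsigned long holdout_timeout(unsigned int pass, unsigned long hz)
{
	unsigned int fract = pass < 9 ? 10 - pass : 1;

	return hz / fract;
}

/* Cumulative time spent waiting after the given number of passes. */
static unsigned long holdout_total(unsigned int passes, unsigned long hz)
{
	unsigned long sum = 0;

	for (unsigned int i = 0; i < passes; i++)
		sum += holdout_timeout(i, hz);
	return sum;
}
```

      Under this schedule, a holdout that clears quickly costs about a tenth
      of a second rather than a full second, while a long-lived holdout still
      settles into the original one-second polling interval.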
      
      Link: http://lkml.kernel.org/r/20180523063815.198302-2-joel@joelfernandes.org
      Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Suggested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add comment documenting how rcu_seq_snap works · 0d805a70
      Joel Fernandes (Google) authored
      rcu_seq_snap() may be tricky to decipher.  Let's document how it works
      with an example to make it easier.
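      The trick is easiest to see with concrete numbers.  The sketch below is
      a user-space model loosely following the kernel's rcu_seq helpers,
      assuming two low-order state bits and using plain loads and stores
      where the kernel uses READ_ONCE()/WRITE_ONCE(); the seq_*() names are
      hypothetical.  With a sequence value of 4 (idle, one grace period
      completed) the snapshot is 8, but with a value of 5 (grace period in
      progress) it is 12, because the in-progress grace period may have
      started before the caller's update and so cannot be counted.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Low RCU_SEQ_CTR_SHIFT bits: grace-period state (0 means idle).
 * Remaining bits: count of grace periods. */
#define RCU_SEQ_CTR_SHIFT  2
#define RCU_SEQ_STATE_MASK ((1UL << RCU_SEQ_CTR_SHIFT) - 1)

/* Wrap-safe "a is at or beyond b". */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

/* Start a grace period: the idle state becomes "in progress". */
static void seq_start(unsigned long *sp)
{
	*sp += 1;
}

/* End a grace period: round up to the next idle value. */
static void seq_end(unsigned long *sp)
{
	*sp = (*sp | RCU_SEQ_STATE_MASK) + 1;
}

/* Value *sp must reach before a full grace period is guaranteed to have
 * elapsed since this snapshot: skip any grace period already in
 * progress, then wait for the next complete one. */
static unsigned long seq_snap(const unsigned long *sp)
{
	return (*sp + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
}

/* Has the counter reached the snapshot? */
static bool seq_done(const unsigned long *sp, unsigned long s)
{
	return ULONG_CMP_GE(*sp, s);
}
```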
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Shrink comment as suggested by Peter Zijlstra. ]
    • rcu: Use RCU CPU stall timeout for rcu_check_gp_start_stall() · b06ae25a
      Paul E. McKenney authored
      Currently, rcu_check_gp_start_stall() waits for one second after the first
      request before complaining that a grace period has not yet started.  This
      was desirable while testing the conversion from ->future_gp_needed[] to
      ->gp_seq_needed, but it is a bit on the hair-trigger side for production
      use under heavy load.  This commit therefore makes this wait time be
      exactly that of the RCU CPU stall warning, allowing easy adjustment of
      both timeouts to suit the distribution or installation at hand.
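      The change amounts to comparing the age of the oldest grace-period
      request against the stall-warning timeout instead of a fixed one
      second.  The minimal C sketch below assumes jiffies-style wrapping
      counters and uses an illustrative timeout of 21000 jiffies;
      gp_start_stalled() is hypothetical and is not the kernel's
      rcu_check_gp_start_stall().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wrap-safe jiffies comparison in the style of the kernel's
 * time_after(): true if a is later than b. */
#define sketch_time_after(a, b) ((long)((b) - (a)) < 0)

/* Hypothetical helper: report a grace-period-start stall only once the
 * request is older than the RCU CPU stall-warning timeout. */
static bool gp_start_stalled(unsigned long now, unsigned long requested_at,
			     unsigned long stall_timeout)
{
	return sketch_time_after(now, requested_at + stall_timeout);
}
```

      Because the comparison uses a signed difference, it stays correct when
      the jiffies counter wraps between the request and the check.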
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove __maybe_unused from rcu_cpu_has_callbacks() · 51fbb910
      Paul E. McKenney authored
      The rcu_cpu_has_callbacks() function is now used in all configurations,
      so this commit removes the __maybe_unused.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove "inline" from rcu_perf_print_module_parms() · 96221795
      Paul E. McKenney authored
      This function is in rcuperf.c, which is not an include file, so there
      is no problem dropping the "inline", especially given that this function
      is invoked only twice per rcuperf run.  This commit therefore delegates
      the inlining decision to the compiler by dropping the "inline".
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove "inline" from rcu_torture_print_module_parms() · eac45e58
      Paul E. McKenney authored
      This function is in rcutorture.c, which is not an include file, so there
      is no problem dropping the "inline", especially given that this function
      is invoked only twice per rcutorture run.  This commit therefore delegates
      the inlining decision to the compiler by dropping the "inline".
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove "inline" from panic_on_rcu_stall() and rcu_blocking_is_gp() · 95394e69
      Paul E. McKenney authored
      These functions are in kernel/rcu/tree.c, which is not an include file,
      so there is no problem dropping the "inline", especially given that these
      functions are nowhere near a fastpath.  This commit therefore delegates
      the inlining decision to the compiler by dropping the "inline".
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove unused local variable "cpu" · ab6b8214
      Paul E. McKenney authored
      One danger of using __maybe_unused is that the compiler doesn't yell
      at you when you remove the last reference, witness rcu_bind_gp_kthread()
      and its local variable "cpu".  This commit removes this local variable.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove unused rcu_kick_nohz_cpu() function · 164ba3fc
      Paul E. McKenney authored
      The rcu_kick_nohz_cpu() function is no longer used, and the functionality
      it used to provide is now provided by a call to resched_cpu() in the
      force-quiescent-state function rcu_implicit_dynticks_qs().  This commit
      therefore removes rcu_kick_nohz_cpu().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Clarify and correct the rcu_preempt_qs() header comment · c7037ff5
      Paul E. McKenney authored
      The rcu_preempt_qs() function only applies to the CPU, not the task.
      A task really is allowed to invoke this function while in an RCU-preempt
      read-side critical section, but only if it has first added itself to
      some leaf rcu_node structure's ->blkd_tasks list.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Inline rcu_dynticks_momentary_idle() into its sole caller · 3b57a399
      Paul E. McKenney authored
      The rcu_dynticks_momentary_idle() function is invoked only from
      rcu_momentary_dyntick_idle(), and neither function is particularly
      large.  This commit therefore saves a few lines by inlining
      rcu_dynticks_momentary_idle() into rcu_momentary_dyntick_idle().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Mark task as .need_qs less aggressively · 15651201
      Paul E. McKenney authored
      If any scheduling-clock interrupt interrupts an RCU-preempt read-side
      critical section, the interrupted task's ->rcu_read_unlock_special.b.need_qs
      field is set.  This causes the outermost rcu_read_unlock() to incur the
      extra overhead of calling into rcu_read_unlock_special().  This commit
      reduces that overhead by setting ->rcu_read_unlock_special.b.need_qs only
      if the grace period has been in effect for more than one second.
      
      Why one second?  Because this is comfortably smaller than the minimum
      RCU CPU stall-warning timeout of three seconds, but long enough that the
      .need_qs marking should happen quite rarely.  And if your RCU read-side
      critical section has run on-CPU for a full second, it is not unreasonable
      to invest some CPU time in ending the grace period quickly.
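      The new condition can be modeled as a guard in the scheduling-clock
      path.  The minimal C sketch below uses a hypothetical, pared-down task
      structure and an assumed HZ of 1000; the kernel's actual check may
      involve additional per-CPU conditions that this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_HZ 1000UL  /* assumed tick rate */

/* Wrap-safe "a is later than b", in the style of the kernel's time_after(). */
#define sketch_time_after(a, b) ((long)((b) - (a)) < 0)

/* Hypothetical, pared-down task state for this sketch. */
struct sketch_task {
	int rcu_read_lock_nesting;  /* > 0 inside a read-side critical section */
	bool need_qs;               /* stands in for ->rcu_read_unlock_special.b.need_qs */
};

/* Scheduling-clock tick in this model: mark the task only if it is in a
 * read-side critical section and the grace period is over one second old. */
static void sketch_tick(struct sketch_task *t, unsigned long now,
			unsigned long gp_start)
{
	if (t->rcu_read_lock_nesting > 0 && !t->need_qs &&
	    sketch_time_after(now, gp_start + SKETCH_HZ))
		t->need_qs = true;
}
```

      Ticks during young grace periods leave need_qs clear, so the common
      rcu_read_unlock() path skips rcu_read_unlock_special().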
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Improve RCU-tasks naming and comments · 6f56f714
      Paul E. McKenney authored
      The naming and comments associated with some RCU-tasks code make
      the faulty assumption that context switches due to cond_resched()
      are voluntary.  As several people pointed out, this is not the case.
      This commit therefore updates function names and comments to better
      reflect current reality.
      Reported-by: Byungchul Park <byungchul.park@lge.com>
      Reported-by: Joel Fernandes <joel@joelfernandes.org>
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Use pr_fmt to prefix "rcu: " to logging output · a7538352
      Joe Perches authored
      This commit also adjusts some whitespace while in the area.
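      pr_fmt() is the kernel's hook for prefixing printk-family messages: a
      file defines it before including the printk headers, and pr_info() and
      friends pass their format strings through it.  Because pr_info() itself
      is kernel-only, the sketch below uses a hypothetical user-space stand-in
      that formats into a buffer with snprintf(); it relies on the GNU
      ##__VA_ARGS__ extension, as kernel code does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Every message formatted through pr_fmt() gains the "rcu: " prefix;
 * string-literal concatenation does the work at compile time. */
#define pr_fmt(fmt) "rcu: " fmt

/* Hypothetical stand-in for pr_info(): formats into buf instead of
 * the kernel log so that the result can be inspected. */
#define sketch_pr_info(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)
```

      Centralizing the prefix this way removes the need to repeat "rcu: " in
      each individual format string.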
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Revert string-breaking %s as requested by Andy Shevchenko. ]
    • rcu: rcupdate.h: Get rid of Sphinx warnings at rcu_pointer_handoff() · 1445e917
      Mauro Carvalho Chehab authored
      The code example in rcupdate.h currently produces lots of warnings:
      
      	./include/linux/rcupdate.h:572: WARNING: Unexpected indentation.
      	./include/linux/rcupdate.h:576: WARNING: Unexpected indentation.
      	./include/linux/rcupdate.h:580: WARNING: Block quote ends without a blank line; unexpected unindent.
      	./include/linux/rcupdate.h:582: WARNING: Block quote ends without a blank line; unexpected unindent.
      	./include/linux/rcupdate.h:582: WARNING: Inline literal start-string without end-string.
      
      This commit therefore changes it to a code-block.
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Improve rcu_note_voluntary_context_switch() reporting · 07f27570
      Byungchul Park authored
      We expect a TASKS_RCU quiescent state whenever cond_resched_tasks_rcu_qs()
      is called, regardless of whether the task is actually rescheduled.
      However, no quiescent state is currently reported when the task enters
      __schedule(), because __schedule() is called with preempt = true in
      that case.  This commit therefore reports the quiescent state
      unconditionally whenever cond_resched_tasks_rcu_qs() is called.
      
      In TINY_RCU, the rcu_bh quiescent state should also be reported when
      the tick interrupt arrives from user mode, but it currently is not.
      This commit therefore reports it.
      
      Lastly, in TREE_RCU, rcu_note_voluntary_context_switch() should be
      invoked when the tick interrupt arrives not only from user mode but
      also from idle, which is an extended quiescent state.
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Simplify rcutiny portion given no RCU-tasks for !PREEMPT. ]
    • rcu: Make rcu_read_unlock_special() static · 3949fa9b
      Paul E. McKenney authored
      Because rcu_read_unlock_special() is no longer used outside of
      kernel/rcu/tree_plugin.h, this commit makes it static.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add diagnostics for offline CPUs failing to report QS · f2e2df59
      Paul E. McKenney authored
      CPUs are expected to report quiescent states when coming online and
      when going offline, and grace-period initialization is supposed to
      handle any race conditions where a CPU's ->qsmask bit is set just after
      it goes offline.  This commit adds diagnostics for the case where an
      offline CPU nevertheless has a grace period waiting on it.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Record ->gp_state for both phases of grace-period initialization · fea3f222
      Paul E. McKenney authored
      Grace-period initialization first processes any recent CPU-hotplug
      operations, and then initializes state for the new grace period.  These
      two phases of initialization are currently not distinguished in debug
      prints, but the distinction is valuable in a number of debug situations.
      This commit therefore introduces two new values for ->gp_state,
      RCU_GP_ONOFF and RCU_GP_INIT, in order to make this distinction.
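      Separate states pay off in debug output, where a state-to-name table
      lets a print identify the exact phase.  The minimal C sketch below shows
      that pattern; the enum is an illustrative subset whose ordering and
      values are assumptions, not the kernel's definitions, and
      gp_state_name() is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Illustrative subset of grace-period states; ordering and values are
 * assumptions of this sketch. */
enum sketch_gp_state {
	RCU_GP_IDLE,      /* no grace period in progress */
	RCU_GP_ONOFF,     /* phase 1: applying buffered CPU-hotplug operations */
	RCU_GP_INIT,      /* phase 2: initializing the rcu_node tree */
	RCU_GP_WAIT_FQS,  /* waiting for quiescent states */
	RCU_GP_CLEANUP,   /* cleaning up after the grace period */
	SKETCH_GP_NSTATES
};

static const char *const gp_state_names[SKETCH_GP_NSTATES] = {
	[RCU_GP_IDLE]     = "RCU_GP_IDLE",
	[RCU_GP_ONOFF]    = "RCU_GP_ONOFF",
	[RCU_GP_INIT]     = "RCU_GP_INIT",
	[RCU_GP_WAIT_FQS] = "RCU_GP_WAIT_FQS",
	[RCU_GP_CLEANUP]  = "RCU_GP_CLEANUP",
};

/* Name a ->gp_state value for a debug print, tolerating garbage. */
static const char *gp_state_name(unsigned int s)
{
	return s < SKETCH_GP_NSTATES ? gp_state_names[s] : "???";
}
```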
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add CPU online/offline state to dump_blkd_tasks() · 57738942
      Paul E. McKenney authored
      Interactions between CPU-hotplug operations and grace-period
      initialization can result in calls to dump_blkd_tasks().  One of the first
      debugging actions in this case is to search back in dmesg to work
      out which of the affected rcu_node structure's CPUs are online and to
      determine the last CPU-hotplug operation affecting any of those CPUs.
      This can be laborious and error-prone, especially when console output
      is lost.
      
      This commit therefore causes dump_blkd_tasks() to dump the state of
      the affected rcu_node structure's CPUs, along with the grace periods during
      which the most recent offline and online operations affected each of these CPUs.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add up-tree information to dump_blkd_tasks() diagnostics · ff3cee39
      Paul E. McKenney authored
      This commit updates dump_blkd_tasks() to print out quiescent-state
      bitmasks for the rcu_node structures further up the tree.  This
      information helps in debugging interactions between CPU-hotplug
      operations and RCU grace-period initialization.
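      Walking from a leaf toward the root via ->parent pointers is a natural
      way to gather this up-tree information.  The minimal C sketch below
      uses a hypothetical, pared-down node structure, and its output format
      is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, pared-down rcu_node: just the fields this sketch needs. */
struct sketch_rcu_node {
	unsigned long qsmask;            /* children (or CPUs) still owing a QS */
	struct sketch_rcu_node *parent;  /* NULL at the root */
	int level;                       /* 0 at the root */
};

/* Print ->qsmask from the given node up to the root; return the number
 * of levels visited. */
static int dump_up_tree(const struct sketch_rcu_node *rnp)
{
	int n = 0;

	for (; rnp; rnp = rnp->parent, n++)
		printf("level %d: ->qsmask %#lx\n", rnp->level, rnp->qsmask);
	return n;
}
```

      A leaf whose ->qsmask is zero under a parent that still has the leaf's
      bit set is exactly the kind of inconsistency this output exposes.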
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove CPU-hotplug failsafe from force-quiescent-state code path · e05121ba
      Paul E. McKenney authored
      Now that quiescent states for newly offlined CPUs are reported either
      when that CPU goes offline or at the end of grace-period initialization,
      the CPU-hotplug failsafe in the force-quiescent-state code path is no
      longer needed.
      
      This commit therefore removes this failsafe.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove failsafe check for lost quiescent state · 17a8212b
      Paul E. McKenney authored
      Now that quiescent-state reporting is fully event-driven, this commit
      removes the check for a lost quiescent state from force_qs_rnp().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Move grace-period pre-init delay after pre-init · f34f2f58
      Paul E. McKenney authored
      The main race with the early part of grace-period initialization appears
      to be with CPU hotplug.  To open this race window more fully, this commit
      moves the rcu_gp_slow() call from the beginning of the early-initialization
      loop to just after that loop, which widens the window especially for
      the rcu_node structures that are initialized last.  This commit also
      expands rcutree.gp_preinit_delay from 3 to 12, giving the same overall
      delay in the grace period, but concentrated in the spot where it will
      do the most good.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add RCU-preempt check for waiting on newly onlined CPU · 1f3e5f51
      Paul E. McKenney authored
      RCU should only be waiting on CPUs that were online at the time that the
      current grace period started.  Failure to abide by this rule can result
      in confusing splats during grace-period cleanup and initialization.
      This commit therefore adds a check to RCU-preempt's preempted-task
      queuing that detects waiting on newly onlined CPUs.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Fix grace-period hangs due to race with CPU offline · 1e64b15a
      Paul E. McKenney authored
      Without special fail-safe quiescent-state-propagation checks, grace-period
      hangs can result from the following scenario:
      
      1.	CPU 1 goes offline.
      
      2.	Because CPU 1 is the only CPU in the system blocking the current
      	grace period, the grace period ends as soon as
      	rcu_cleanup_dying_idle_cpu()'s call to rcu_report_qs_rnp()
      	returns.
      
      3.	At this point, the leaf rcu_node structure's ->lock is no longer
      	held: rcu_report_qs_rnp() has released it, as it must in order
      	to awaken the RCU grace-period kthread.
      
      4.	At this point, that same leaf rcu_node structure's ->qsmaskinitnext
      	field still records CPU 1 as being online.  This is absolutely
      	necessary because the scheduler uses RCU (in this case on the
      	wake-up path while awakening RCU's grace-period kthread), and
      	->qsmaskinitnext contains RCU's idea as to which CPUs are online.
      	Therefore, invoking rcu_report_qs_rnp() after clearing CPU 1's
      	bit from ->qsmaskinitnext would result in a lockdep-RCU splat
      	due to RCU being used from an offline CPU.
      
      5.	RCU's grace-period kthread awakens, sees that the old grace period
      	has completed and that a new one is needed.  It therefore starts
      	a new grace period, but because CPU 1's leaf rcu_node structure's
      	->qsmaskinitnext field still shows CPU 1 as being online, this new
      	grace period is initialized to wait for a quiescent state from the
      	now-offline CPU 1.
      
      6.	Without the fail-safe force-quiescent-state checks, there would
      	be no quiescent state from the now-offline CPU 1, which would
      	eventually result in RCU CPU stall warnings and memory exhaustion.
      
      It would be good to get rid of the special fail-safe quiescent-state
      propagation checks, and thus it would be good to fix things so that
      the above scenario cannot happen.  This commit therefore adds a new
      ->ofl_lock to the rcu_state structure.  This lock is held by rcu_gp_init()
      while it applies buffered online and offline operations to the
      rcu_node tree, and it is also held by rcu_cleanup_dying_idle_cpu()
      when buffering a new offline operation.  This prevents rcu_gp_init()
      from acquiring the leaf rcu_node structure's lock during the interval
      between rcu_cleanup_dying_idle_cpu()'s invocation of rcu_report_qs_rnp(),
      which releases ->lock, and its re-acquisition of that same lock.
      This in turn prevents the failure scenario outlined above, and will
      hopefully eventually allow removal of the offline-CPU checks from the
      force-quiescent-state code path.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Fix grace-period hangs from mid-init task resume · ec2c2976
      Paul E. McKenney authored
      Without special fail-safe quiescent-state-propagation checks, grace-period
      hangs can result from the following scenario:
      
      1.	A task running on a given CPU is preempted in its RCU read-side
      	critical section.
      
      2.	That CPU goes offline, and there are now no online CPUs
      	corresponding to that CPU's leaf rcu_node structure.
      
      3.	The rcu_gp_init() function does the first phase of grace-period
      	initialization, and sets the aforementioned leaf rcu_node
      	structure's ->qsmaskinit field to all zeroes.  Because there
      	is a blocked task, it does not propagate the zeroing of either
      	->qsmaskinit or ->qsmaskinitnext up the rcu_node tree.
      
      4.	The task resumes on some other CPU and exits its critical section.
      	There is no grace period in progress, so the resulting quiescent
      	state is not reported up the tree.
      
      5.	The rcu_gp_init() function does the second phase of grace-period
      	initialization, which results in the leaf rcu_node structure
      	being initialized to expect no further quiescent states, but
      	with that structure's parent expecting a quiescent-state report.
      
      	The parent will never receive a quiescent state from this leaf
      	rcu_node structure, so the grace period will hang, resulting in
      	RCU CPU stall warnings.
      
      It would be good to get rid of the special fail-safe quiescent-state
      propagation checks.  This commit therefore checks the leaf rcu_node
      structure's ->wait_blkd_tasks field during grace-period initialization.
      If this flag is set, rcu_report_qs_rnp() is invoked to immediately
      report the possible quiescent state.  While in the neighborhood, this
      commit also reports quiescent states for any CPUs that went offline between
      the two phases of grace-period initialization, thus reducing grace-period
      delays and hopefully eventually allowing removal of offline-CPU checks
      from the force-quiescent-state code path.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Suppress false-positive splats from mid-init task resume · 0b107d24
      Paul E. McKenney authored
      Consider the following sequence of events in a PREEMPT=y kernel:
      
      1.	All CPUs corresponding to a given leaf rcu_node structure are
      	offline.
      
      2.	The first phase of the rcu_gp_init() function's grace-period
      	initialization runs, and sets that rcu_node structure's
      	->qsmaskinit to zero, as it should.
      
      3.	One of the CPUs corresponding to that rcu_node structure comes
      	back online.  Note that because this CPU came online after the
      	grace period started, this grace period can safely ignore this
      	newly onlined CPU.
      
      4.	A task running on the newly onlined CPU enters an RCU-preempt
      	read-side critical section, and is then preempted.  Because
      	the corresponding rcu_node structure's ->qsmask is zero,
      	rcu_preempt_ctxt_queue() leaves the rcu_node structure's
      	->gp_tasks field NULL, as it should.
      
      5.	The rcu_gp_init() function continues running the second phase of
      	grace-period initialization.  The ->qsmask field of the parent of
      	the aforementioned leaf rcu_node structure is set to not expect
      	a quiescent state from the leaf, as is only right and proper.
      
      	However, when rcu_gp_init() reaches the leaf, it invokes
      	rcu_preempt_check_blocked_tasks(), which sees that the leaf's
      	->blkd_tasks list is non-empty, and therefore sets the leaf's
      	->gp_tasks field to reference the first task on that list.
      
      6.	The grace period ends before the preempted task resumes, which
      	is perfectly fine, given that this grace period was under no
      	obligation to wait for that task to exit its late-starting
      	RCU-preempt read-side critical section.  Unfortunately, the
      	leaf's ->gp_tasks field is non-NULL, so rcu_gp_cleanup() splats.
      	After all, it appears to rcu_gp_cleanup() that the grace period
      	failed to wait for a task that was supposed to be blocking that
      	grace period.
      
      This commit avoids this false-positive splat by adding a check of both
      ->qsmaskinit and ->wait_blkd_tasks to rcu_preempt_check_blocked_tasks().
      If both ->qsmaskinit and ->wait_blkd_tasks are zero, then the task must
      have entered its RCU-preempt read-side critical section late (after all,
      the CPU that it is running on was not online at that time), which means
      that the upper-level rcu_node structure won't be waiting for anything
      on the leaf anyway.
      
      If ->wait_blkd_tasks is non-zero, then there is at least one task on
      this rcu_node structure's ->blkd_tasks list whose RCU read-side
      critical section predates the current grace period.  If ->qsmaskinit
      is non-zero, there is at least one CPU that was online at the start
      of the current grace period.  Thus, if both are zero, there is nothing
      to wait for.
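      The reasoning reduces to a single predicate on the leaf.  The sketch
      below models it in user-space C with a hypothetical, pared-down leaf
      structure; it is not rcu_preempt_check_blocked_tasks() itself, which
      does other work as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_task { int id; };

/* Hypothetical, pared-down leaf rcu_node. */
struct sketch_leaf {
	unsigned long qsmaskinit;       /* CPUs online at grace-period start */
	bool wait_blkd_tasks;           /* some blocked reader predates this GP */
	struct sketch_task *blkd_head;  /* first task on ->blkd_tasks, or NULL */
	struct sketch_task *gp_tasks;   /* first task blocking this GP, or NULL */
};

/* Model of the added check: point ->gp_tasks at the blocked-task list
 * only if this leaf can actually be holding up the grace period. */
static void sketch_check_blocked_tasks(struct sketch_leaf *rnp)
{
	if (!rnp->qsmaskinit && !rnp->wait_blkd_tasks)
		return;  /* only late-starting readers: nothing to wait for */
	rnp->gp_tasks = rnp->blkd_head;
}
```

      In the late-reader case, ->gp_tasks stays NULL, so grace-period cleanup
      has no reason to splat.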
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Suppress more involved false-positive preempted-task splats · 99990da1
      Paul E. McKenney authored
      Consider the following sequence of events in a PREEMPT=y kernel:
      
      1.	All but one of the CPUs corresponding to a given leaf rcu_node
      	structure go offline.  Each of these CPUs clears its bit in that
      	structure's ->qsmaskinitnext field.
      
      2.	A new grace period starts, and rcu_gp_init() scans the leaf
      	rcu_node structures, applying CPU-hotplug changes since the
      	start of the previous grace period, including those changes in
      	#1 above.  This copies each leaf structure's ->qsmaskinitnext
      	to its ->qsmaskinit field, which represents the CPUs that this new
      	grace period will wait on.  Each copy operation is done holding
      	the corresponding leaf rcu_node structure's ->lock, and at the
      	end of this scan, rcu_gp_init() holds no locks.
      
      3.	The last CPU corresponding to #1's leaf rcu_node structure goes
      	offline, clearing its bit in that structure's ->qsmaskinitnext
      	field, but not touching the ->qsmaskinit field.  Note that
      	rcu_gp_init() is not currently holding any locks!  This CPU does
      	-not- report a quiescent state because the grace period has not
      	yet initialized itself sufficiently to have set any bits in any
      	of the leaf rcu_node structures' ->qsmask fields.
      
      4.	The rcu_gp_init() function continues initializing the new grace
      	period, copying each leaf rcu_node structure's ->qsmaskinit
      	field to its ->qsmask field while holding the corresponding ->lock.
      	This sets the ->qsmask bit corresponding to #3's CPU.
      
      5.	Before the grace period ends, #3's CPU comes back online.
      	Because the grace period has not yet done any force-quiescent-state
      	scans (which would report a quiescent state on behalf of any
      	offline CPUs), this CPU's ->qsmask bit is still set.
      
      6.	A task running on the newly onlined CPU is preempted while in
      	an RCU read-side critical section.  Because this CPU's ->qsmask
      	bit is set, not only does this task queue itself on the leaf
      	rcu_node structure's ->blkd_tasks list, it also sets that
      	structure's ->gp_tasks pointer to reference it.
      
      7.	The grace period started in #2 above comes to an end.  This
      	results in rcu_gp_cleanup() being invoked, which, among other
      	things, checks to make sure that there are no tasks blocking the
      	just-ended grace period, that is, that all ->gp_tasks pointers
      	are NULL.  The ->gp_tasks pointer corresponding to the task
      	preempted in #6 above is non-NULL, which results in a splat.
      
      This splat is a false positive.  The task's RCU read-side critical
      section cannot have begun before the just-ended grace period because
      this would mean either: (1) The CPU came online before the grace period
      started, which cannot have happened because the grace period started
      before that CPU went offline, or (2) The task started its RCU read-side
      critical section on some other CPU, but then it would have had to have
      been preempted before migrating to this CPU, which would mean that it
      would have instead queued itself on that other CPU's rcu_node structure.
      RCU's grace periods are thus working correctly.  Or, more accurately,
      any remaining bugs in RCU's grace periods lie elsewhere.
      
      This commit eliminates this false positive by adding code to the end
      of rcu_cpu_starting() that reports a quiescent state to RCU, which has
      the side-effect of clearing that CPU's ->qsmask bit, preventing the
      above scenario.  This approach has the added benefit of more promptly
      reporting quiescent states corresponding to offline CPUs.  Nevertheless,
      this commit does -not- remove the need for the force-quiescent-state
      scans to check for offline CPUs, given that a CPU might remain offline
      indefinitely.  And without the checks in the force-quiescent-state scans,
      the grace period would also persist indefinitely, which could result in
      hangs or memory exhaustion.
      
      Note well that the call to rcu_report_qs_rnp() reporting the quiescent
      state must come -after- the setting of this CPU's bit in the leaf rcu_node
      structure's ->qsmaskinitnext field.  Otherwise, lockdep-RCU will complain
      bitterly about quiescent states coming from an offline CPU.
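      The ordering rule above can be illustrated with a hypothetical C model.
      The struct leaf_model type, report_qs_model(), and cpu_starting_model()
      are illustrative inventions that stand in for the leaf rcu_node
      structure, rcu_report_qs_rnp(), and the tail of rcu_cpu_starting().
      Locking, grace-period sequence numbers, and propagation up the
      rcu_node tree are omitted, and the lockdep-RCU complaint is modeled as
      an assert().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical leaf model: bit n of each mask corresponds to CPU n. */
struct leaf_model {
	unsigned long qsmaskinitnext;  /* CPUs RCU currently considers online */
	unsigned long qsmask;          /* CPUs the current GP still waits on */
};

/* Stand-in for lockdep-RCU: a quiescent state reported by a CPU that
 * RCU believes to be offline is treated as a bug. */
static void report_qs_model(struct leaf_model *rnp, unsigned long cpu_bit)
{
	assert(rnp->qsmaskinitnext & cpu_bit);  /* reporter must be online */
	rnp->qsmask &= ~cpu_bit;                /* CPU no longer blocks the GP */
}

/* Hypothetical model of the end of rcu_cpu_starting(): mark the CPU
 * online first, and only then report its quiescent state. */
static void cpu_starting_model(struct leaf_model *rnp, unsigned long cpu_bit)
{
	rnp->qsmaskinitnext |= cpu_bit;
	if (rnp->qsmask & cpu_bit)              /* GP still waiting on this CPU? */
		report_qs_model(rnp, cpu_bit);
}
```

      Swapping the two steps in cpu_starting_model() would trip the assert().
      That mirrors the lockdep-RCU complaint described above.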
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • Paul E. McKenney
      rcu: Suppress false-positive preempted-task splats · fece2776
      Paul E. McKenney authored
      Consider the following sequence of events in a PREEMPT=y kernel:
      
      1.	All CPUs corresponding to a given rcu_node structure go offline.
      	A new grace period starts just after the CPU-hotplug code path
      	does its synchronize_rcu() for the last CPU, so at least this
      	CPU is present in that structure's ->qsmask.
      
      2.	Before the grace period ends, a CPU comes back online, and not
      	just any CPU, but the one corresponding to a non-zero bit in
      	the leaf rcu_node structure's ->qsmask.
      
      3.	A task running on the newly onlined CPU is preempted while in
      	an RCU read-side critical section.  Because this CPU's ->qsmask
      	bit is set, not only does this task queue itself on the leaf
      	rcu_node structure's ->blkd_tasks list, it also sets that
      	structure's ->gp_tasks pointer to reference it.
      
      4.	The grace period started in #1 above comes to an end.  This
      	results in rcu_gp_cleanup() being invoked, which, among other
      	things, checks to make sure that there are no tasks blocking the
      	just-ended grace period, that is, that all ->gp_tasks pointers
      	are NULL.  The ->gp_tasks pointer corresponding to the task
      	preempted in #3 above is non-NULL, which results in a splat.
      
      This splat is a false positive.  The task's RCU read-side critical
      section cannot have begun before the just-ended grace period because
      this would mean either: (1) The CPU came online before the grace period
      started, which cannot have happened because the grace period started
      before that CPU was all the way offline, or (2) The task started its
      RCU read-side critical section on some other CPU, but then it would
      have had to have been preempted before migrating to this CPU, which
      would mean that it would have instead queued itself on that other CPU's
      rcu_node structure.
      
      This commit eliminates this false positive by adding code to the end
      of rcu_cleanup_dying_idle_cpu() that reports a quiescent state to RCU,
      which has the side-effect of clearing that CPU's ->qsmask bit, preventing
      the above scenario.  This approach has the added benefit of more promptly
      reporting quiescent states corresponding to offline CPUs.
      
      Note well that the call to rcu_report_qs_rnp() reporting the quiescent
      state must come -before- the clearing of this CPU's bit in the leaf
      rcu_node structure's ->qsmaskinitnext field.  Otherwise, lockdep-RCU
      will complain bitterly about quiescent states coming from an offline CPU.
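      The mirror-image rule for the outgoing CPU can be sketched the same way:
      report first, then mark the CPU offline.  As before, struct leaf_model,
      report_qs_model(), and cleanup_dying_idle_cpu_model() are hypothetical
      stand-ins for the leaf rcu_node structure, rcu_report_qs_rnp(), and
      rcu_cleanup_dying_idle_cpu().  Locking and tree propagation are omitted,
      and lockdep-RCU is modeled as an assert().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical leaf model: bit n of each mask corresponds to CPU n. */
struct leaf_model {
	unsigned long qsmaskinitnext;  /* CPUs RCU currently considers online */
	unsigned long qsmask;          /* CPUs the current GP still waits on */
};

/* Stand-in for lockdep-RCU: reporting from an "offline" CPU is a bug. */
static void report_qs_model(struct leaf_model *rnp, unsigned long cpu_bit)
{
	assert(rnp->qsmaskinitnext & cpu_bit);
	rnp->qsmask &= ~cpu_bit;
}

/* Hypothetical model of the end of rcu_cleanup_dying_idle_cpu(). */
static void cleanup_dying_idle_cpu_model(struct leaf_model *rnp,
					 unsigned long cpu_bit)
{
	if (rnp->qsmask & cpu_bit)
		report_qs_model(rnp, cpu_bit);  /* still online: allowed */
	rnp->qsmaskinitnext &= ~cpu_bit;        /* only now mark it offline */
}
```

      Once the outgoing CPU has reported, its ->qsmask bit is already clear.
      If it later comes back online, a task preempted there therefore has no
      reason to set ->gp_tasks for the current grace period.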
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • Paul E. McKenney
      rcu: Suppress false-positive offline-CPU lockdep-RCU splat · 5554788e
      Paul E. McKenney authored
      The rcu_lockdep_current_cpu_online() function currently checks only the
      RCU-sched data structures to determine whether or not RCU believes that a
      given CPU is offline.  Unfortunately, there are multiple flavors of RCU,
      which means that there is a short window of time during which the various
      flavors disagree as to whether or not a given CPU is offline.  This can
      result in false-positive lockdep-RCU splats in which some other flavor
      of RCU tries to do something based on its view that the CPU is online,
      only to get hit with a lockdep-RCU splat because RCU-sched instead
      believes that the CPU is offline.
      
      This commit therefore changes rcu_lockdep_current_cpu_online() to scan
      all RCU flavors and to consider a given CPU to be online if any of the
      RCU flavors believe it to be online, thus preventing these false-positive
      splats.
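      A hypothetical C model of the new "online if any flavor says so" rule
      follows.  The struct flavor_view type, cpu_online_any_flavor(), and
      the flavor labels used below are illustrative assumptions.  The real
      rcu_lockdep_current_cpu_online() consults the kernel's own per-flavor
      data structures, which this model does not reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-flavor view: bit n set if the flavor thinks CPU n is online. */
struct flavor_view {
	const char *name;          /* label only, e.g. "rcu_sched" */
	unsigned long online_mask;
};

/* A CPU counts as online if ANY flavor believes it is online, so a
 * flavor that lags behind during hotplug cannot cause a false splat. */
static bool cpu_online_any_flavor(const struct flavor_view *flavors,
				  size_t nflavors, unsigned int cpu)
{
	for (size_t i = 0; i < nflavors; i++)
		if (flavors[i].online_mask & (1UL << cpu))
			return true;
	return false;
}
```

      Taking the union of the flavors' views closes the short window in which
      they disagree.  The cost is that lockdep-RCU treats a CPU as offline
      only after every flavor agrees that it is.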
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>