1. 18 May, 2021 3 commits
    • Merge branches 'bitmaprange.2021.05.10c', 'doc.2021.05.10c',... · 641faf1b
      Paul E. McKenney authored
      Merge branches 'bitmaprange.2021.05.10c', 'doc.2021.05.10c', 'fixes.2021.05.13a', 'kvfree_rcu.2021.05.10c', 'mmdumpobj.2021.05.10c', 'nocb.2021.05.12a', 'srcu.2021.05.12a', 'tasks.2021.05.18a' and 'torture.2021.05.10c' into HEAD
      
      bitmaprange.2021.05.10c: Allow "all" for bitmap ranges.
      doc.2021.05.10c: Documentation updates.
      fixes.2021.05.13a: Miscellaneous fixes.
      kvfree_rcu.2021.05.10c: kvfree_rcu() updates.
      mmdumpobj.2021.05.10c: mem_dump_obj() updates.
      nocb.2021.05.12a: RCU NOCB CPU updates, including limited deoffloading.
      srcu.2021.05.12a: SRCU updates.
      tasks.2021.05.18a: Tasks-RCU updates.
      torture.2021.05.10c: Torture-test updates.
    • tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline · 474d0997
      Paul E. McKenney authored
      On some architectures, the no-op variant of show_rcu_tasks_gp_kthreads()
      gets "no previous prototype" compiler warnings.  These are false positives
      given that kernel/rcu/tasks.h is included only once.  But why put up
      with the compiler noise?
      
      This commit therefore adds "static inline" to this definition to force
      the compiler to accept this situation, while also moving it to its proper
      place in kernel/rcu/rcu.h.
      Reported-by: kernel test robot <lkp@intel.com>
      [ paulmck: Update per Stephen Rothwell feedback. ]
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
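      
      For illustration, the general shape of such a fix is a header-provided
      stub (a sketch; the exact Kconfig guard and the upstream placement in
      kernel/rcu/rcu.h may differ):
      
      	#ifdef CONFIG_TASKS_RCU_GENERIC
      	void show_rcu_tasks_gp_kthreads(void);
      	#else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
      	static inline void show_rcu_tasks_gp_kthreads(void) {}
      	#endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */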
    • rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states · cf868c2a
      Paul E. McKenney authored
      Heavy networking load can cause a CPU to execute continuously and
      indefinitely within ksoftirqd, in which case there will be no voluntary
      task switches and thus no RCU-tasks quiescent states.  This commit
      therefore causes the existing rcu_softirq_qs() to provide an RCU-tasks
      quiescent state.
      
      This of course means that __do_softirq() and its callers cannot be
      invoked from within a tracing trampoline.
      Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
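      
      A sketch of the shape of the change (simplified; the function lives in
      kernel/rcu/tree.c and may contain more than shown here):
      
      	void rcu_softirq_qs(void)
      	{
      		rcu_qs();
      		rcu_preempt_deferred_qs(current);
      		rcu_tasks_qs(current, false); /* New: RCU-tasks quiescent state. */
      	}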
  2. 13 May, 2021 4 commits
    • rcu: Add missing __releases() annotation · c70360c3
      Jules Irenge authored
      Sparse reports a warning at rcu_print_task_stall():
      
      "warning: context imbalance in rcu_print_task_stall - unexpected unlock"
      
      The root cause is a missing annotation on rcu_print_task_stall().
      
      This commit therefore adds the missing __releases(rnp->lock) annotation.
      Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
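      
      For reference, a sparse lock-context annotation of this kind looks
      roughly as follows (a sketch; the return type and the unlock helper
      are illustrative):
      
      	static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
      		__releases(rnp->lock)
      	{
      		int ndetected = 0;
      
      		/* ... count and print blocked tasks, then drop rnp->lock ... */
      		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
      		return ndetected;
      	}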
    • rcu: Remove obsolete rcu_read_unlock() deadlock commentary · 02238460
      Paul E. McKenney authored
      The deferred quiescent states resulting from the consolidation of RCU-bh
      and RCU-sched into RCU mean that rcu_read_unlock() will no longer attempt
      to acquire scheduler locks if interrupts were disabled across that call
      to rcu_read_unlock().  The cautions in the rcu_read_unlock() header
      comment are therefore obsolete.  This commit therefore removes them.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu: Improve comments describing RCU read-side critical sections · 1893afd6
      Paul E. McKenney authored
      There are a number of places that call out the fact that preempt-disable
      regions of code now act as RCU read-side critical sections, where
      preempt-disable regions of code include irq-disable regions of code,
      bh-disable regions of code, hardirq handlers, and NMI handlers.  However,
      someone relying solely on (for example) the call_rcu() header comment
      might well have no idea that preempt-disable regions of code have RCU
      semantics.
      
      This commit therefore updates the header comments for
      call_rcu(), synchronize_rcu(), rcu_dereference_bh_check(), and
      rcu_dereference_sched_check() to call out these new(ish) forms of RCU
      readers.
      Reported-by: Michel Lespinasse <michel@lespinasse.org>
      [ paulmck: Apply Matthew Wilcox and Michel Lespinasse feedback. ]
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
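      
      For example, under the consolidated flavors a preempt-disable region
      is a legitimate RCU reader (a hypothetical snippet; gp and
      do_something_with() are placeholders):
      
      	preempt_disable();
      	p = rcu_dereference_sched(gp); /* Disabled preemption acts as a reader. */
      	if (p)
      		do_something_with(p);
      	preempt_enable();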
    • rcu: Create an unrcu_pointer() to remove __rcu from a pointer · 76c8eaaf
      Paul E. McKenney authored
      The xchg() and cmpxchg() functions are sometimes used to carry out RCU
      updates.  Unfortunately, this can result in sparse warnings for both
      the old-value and new-value arguments, as well as for the return value.
      The arguments can be dealt with using RCU_INITIALIZER():
      
      	old_p = xchg(&p, RCU_INITIALIZER(new_p));
      
      But a sparse warning still remains due to assigning the __rcu pointer
      returned from xchg to the (most likely) non-__rcu pointer old_p.
      
      This commit therefore provides an unrcu_pointer() macro that strips
      the __rcu.  This macro can be used as follows:
      
      	old_p = unrcu_pointer(xchg(&p, RCU_INITIALIZER(new_p)));
      Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
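      
      A simplified sketch of how such a macro can be built from sparse's
      __force (the upstream definition in include/linux/rcupdate.h also
      sparse-checks its argument):
      
      	#define unrcu_pointer(p)					\
      	({								\
      		typeof(*p) *_________p1 = (typeof(*p) *__force)(p);	\
      		((typeof(*p) __force __kernel *)(_________p1));		\
      	})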
  3. 12 May, 2021 9 commits
    • srcu: Early test SRCU polling start · 0a580fa6
      Frederic Weisbecker authored
      Place an early call to start_poll_synchronize_srcu() before the invocation
      of call_srcu() on the same srcu_struct structure.
      
      After the later call to srcu_barrier(), the completion of the
      first grace period should be visible to a subsequent invocation of
      poll_state_synchronize_srcu(), and if not, warn.
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Uladzislau Rezki <urezki@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
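      
      The test pattern is roughly as follows (the APIs are real; my_srcu,
      rh, and my_cb are placeholders):
      
      	unsigned long cookie;
      
      	cookie = start_poll_synchronize_srcu(&my_srcu); /* Early in boot. */
      	call_srcu(&my_srcu, &rh, my_cb);
      	/* ... later, after initialization ... */
      	srcu_barrier(&my_srcu);
      	WARN_ON_ONCE(!poll_state_synchronize_srcu(&my_srcu, cookie));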
    • rcu: Fix various typos in comments · a616aec9
      Ingo Molnar authored
      Fix ~12 single-word typos in RCU code comments.
      
      [ paulmck: Apply feedback from Randy Dunlap. ]
      Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu/nocb: Unify timers · e75bcd48
      Frederic Weisbecker authored
      Now that ->nocb_timer and ->nocb_bypass_timer have become quite similar,
      this commit merges them together.  A new RCU_NOCB_WAKE_BYPASS wake level
      is introduced.  As a result, timers perform all kinds of deferred
      wakeups, but other deferred-wakeup callsites handle only non-bypass
      wakeups in order not to wake up rcuo too early.
      
      The timer also unconditionally executes a full barrier so as to order
      timer_pending() and callback enqueue, although the RCU_NOCB_WAKE_FORCE
      path that makes use of it is debatable: it should arguably test against
      the rdp leader instead of the current rdp.
      
      This unconditional full barrier shouldn't bring visible overhead since
      these timers almost never fire.
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu/nocb: Prepare for fine-grained deferred wakeup · 87090516
      Frederic Weisbecker authored
      Tuning the deferred wakeup level must be done from a safe wakeup
      point. Currently those sites are:
      
      * ->nocb_timer
      * user/idle/guest entry
      * CPU down
      * softirq/rcuc
      
      All of these sites perform the wake up for both RCU_NOCB_WAKE and
      RCU_NOCB_WAKE_FORCE.
      
      In order to merge ->nocb_timer and ->nocb_bypass_timer together, we plan
      to add a new RCU_NOCB_WAKE_BYPASS that really should be deferred until
      a timer fires so that we don't wake up the NOCB-gp kthread too early.
      
      To prepare for that, this commit specifies the per-callsite wakeup
      level/limit.
      
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      [ paulmck: Fix non-NOCB rcu_nocb_need_deferred_wakeup() definition. ]
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu/nocb: Only cancel nocb timer if not polling · f9fc166b
      Frederic Weisbecker authored
      This commit refrains from deleting the ->nocb_timer when rcu_nocb is
      polling because that timer should never have been queued in the polling
      case.
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu/nocb: Delete bypass_timer upon nocb_gp wakeup · 3b2348e2
      Frederic Weisbecker authored
      A NOCB-gp wakeup can safely delete the ->nocb_bypass_timer because
      nocb_gp_wait() will recheck the bypass state and rearm the bypass timer
      if necessary.  This commit therefore deletes this timer.
      Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup · b6e2c4ed
      Frederic Weisbecker authored
      When waking up in nocb_gp_wait(), there is no need to keep the nocb_timer
      around because this function will traverse the whole rdp list. Any
      update performed before the timer was armed will now be visible after
      the ->nocb_gp_lock acquire.
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu/nocb: Allow de-offloading rdp leader · 552cac80
      Frederic Weisbecker authored
      The only thing that prevented an rdp leader from being de-offloaded was
      the nocb_bypass_timer that used to lock the nocb_lock of the rdp leader.
      
      If an rdp gets de-offloaded, it will subtly ignore rcu_nocb_lock()
      calls and unsafely do its job in the timer.  Worse yet:  If it gets
      re-offloaded in the middle of the timer, rcu_nocb_unlock() would try to
      unlock, leaving it imbalanced.
      
      Now that the nocb_bypass_timer doesn't use the nocb_lock anymore,
      de-offloading the rdp leader is now safe.  This commit therefore allows
      the rdp leader to be de-offloaded.
      Reported-by: Paul E. McKenney <paulmck@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu/nocb: Directly call __wake_nocb_gp() from bypass timer · c7ef7500
      Frederic Weisbecker authored
      The bypass timer calls __call_rcu_nocb_wake() instead of directly
      calling __wake_nocb_gp().  The only difference here is that
      rdp->qlen_last_fqs_check gets overridden.  But resetting the deferred
      force quiescent state base shouldn't be relevant for that timer.  In fact
      the bypass queue in question can be for any rdp from the group and not
      necessarily the rdp leader on which the bypass timer is attached.
      
      This commit therefore calls __wake_nocb_gp() directly.  This way we
      don't even need to lock the ->nocb_lock.
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
  4. 10 May, 2021 24 commits
    • rcu: Don't penalize priority boosting when there is nothing to boost · 5390473e
      Paul E. McKenney authored
      RCU priority boosting cannot do anything unless there is at least one
      task blocking the current RCU grace period that was preempted within
      the RCU read-side critical section that it still resides in.  However,
      the current rcu_torture_boost_failed() code will count this as an RCU
      priority-boosting failure if there were no CPUs blocking the current
      grace period.  This situation can happen (for example) if the last CPU
      blocking the current grace period was subjected to vCPU preemption,
      which is always a risk for rcutorture guest OSes.
      
      This commit therefore causes rcu_torture_boost_failed() to refrain from
      reporting failure unless there is at least one task blocking the current
      RCU grace period that was preempted within the RCU read-side critical
      section that it still resides in.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu: Point to documentation of ordering guarantees · 3d3a0d1b
      Paul E. McKenney authored
      Add comments to synchronize_rcu() and friends that point to
      Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu: Make rcu_gp_cleanup() be noinline for tracing · 2f20de99
      Paul E. McKenney authored
      Although there are trace events for RCU grace periods, these are only
      enabled in CONFIG_RCU_TRACE=y kernels.  This commit therefore marks
      rcu_gp_cleanup() noinline in order to provide a function that can be
      traced that is invoked near the end of each grace period.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
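      
      The shape of the change is simply the following (a sketch):
      
      	/* noinline provides a traceable symbol at the end of each GP. */
      	static noinline void rcu_gp_cleanup(void)
      	{
      		/* ... grace-period cleanup ... */
      	}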
    • rcu: Restrict RCU_STRICT_GRACE_PERIOD to at most four CPUs · 4d80b8e1
      Paul E. McKenney authored
      Kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y can experience
      significant lock contention due to RCU's resulting focus on ending grace
      periods as soon as possible.  This is OK, but only if there are not very
      many CPUs.  This commit therefore puts this Kconfig option off-limits
      to systems with more than four CPUs.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu: Make show_rcu_gp_kthreads() dump rcu_node structures blocking GP · b1580501
      Paul E. McKenney authored
      Currently, show_rcu_gp_kthreads() only dumps rcu_node structures that
      have outdated ideas of the current grace-period number.  This commit
      also dumps those that are in any way blocking the current grace period.
      This helps diagnose RCU priority boosting failures.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu: Make RCU priority boosting work on single-CPU rcu_node structures · 3ef5a1c3
      Paul E. McKenney authored
      When any CPU comes online, it checks to see if an RCU-boost kthread has
      already been created for that CPU's leaf rcu_node structure, and if
      not, it creates one.  Unfortunately, it also verifies that this leaf
      rcu_node structure actually has at least one online CPU, and if not,
      it declines to create the kthread.  Although this behavior makes sense
      during early boot, especially on systems that claim far more CPUs than
      they actually have, it makes no sense for the first CPU to come online
      for a given rcu_node structure.  There is no point in checking because
      we know there is a CPU on its way in.
      
      The problem is that timing differences can cause this incoming CPU to not
      yet be reflected in the various bit masks even at rcutree_online_cpu()
      time, and there is no chance at rcutree_prepare_cpu() time.  Plus it
      would be better to create the RCU-boost kthread at rcutree_prepare_cpu()
      to handle the case where the CPU is involved in an RCU priority inversion
      very shortly after it comes online.
      
      This commit therefore moves the checking to rcu_prepare_kthreads(), which
      is called only at early boot, when the check is appropriate.  In addition,
      it makes rcutree_prepare_cpu() invoke rcu_spawn_one_boost_kthread(), which
      no longer does any checking for online CPUs.
      
      With this change, RCU priority boosting tests now pass for short rcutorture
      runs, even with single-CPU leaf rcu_node structures.
      
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Scott Wood <swood@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu: Add quiescent states and boost states to show_rcu_gp_kthreads() output · 396eba65
      Paul E. McKenney authored
      This commit adds each rcu_node structure's ->qsmask and "bBEG" output
      indicating whether: (1) There is a boost kthread, (2) A reader needs
      to be (or is in the process of being) boosted, (3) A reader is blocking
      an expedited grace period, and (4) A reader is blocking a normal grace
      period.  This helps diagnose RCU priority boosting failures.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu: Reject RCU_LOCKDEP_WARN() false positives · 30668200
      Paul E. McKenney authored
      If another lockdep report runs concurrently with an RCU lockdep report
      from RCU_LOCKDEP_WARN(), the following sequence of events can occur:
      
      1.	debug_lockdep_rcu_enabled() sees that lockdep is enabled
      	when called from (say) synchronize_rcu().
      
      2.	Lockdep is disabled by a concurrent lockdep report.
      
      3.	debug_lockdep_rcu_enabled() evaluates its lockdep-expression
      	argument, for example, lock_is_held(&rcu_bh_lock_map).
      
      4.	Because lockdep is now disabled, lock_is_held() plays it safe and
      	returns the constant 1.
      
      5.	But in this case, the constant 1 is not safe, because invoking
      	synchronize_rcu() under rcu_read_lock_bh() is disallowed.
      
      6.	debug_lockdep_rcu_enabled() wrongly invokes lockdep_rcu_suspicious(),
      	resulting in a false-positive splat.
      
      This commit therefore changes RCU_LOCKDEP_WARN() to check
      debug_lockdep_rcu_enabled() after checking the lockdep expression,
      so that any "safe" returns from lock_is_held() are rejected by
      debug_lockdep_rcu_enabled().  This requires memory ordering, which is
      supplied by READ_ONCE(debug_locks).  The resulting volatile accesses
      prevent the compiler from reordering and the fact that only one variable
      is being accessed prevents the underlying hardware from reordering.
      The combination works for IA64, which can reorder reads to the same
      location, but this is defeated by the volatile accesses, which compile
      to load instructions that provide ordering.
      
      Reported-by: syzbot+dde0cc33951735441301@syzkaller.appspotmail.com
      Reported-by: Matthew Wilcox <willy@infradead.org>
      Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Suggested-by: Boqun Feng <boqun.feng@gmail.com>
      Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
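      
      A sketch of the macro's shape after this change, with the lockdep
      expression evaluated before debug_lockdep_rcu_enabled() (simplified
      from include/linux/rcupdate.h):
      
      	#define RCU_LOCKDEP_WARN(c, s)					\
      	do {								\
      		static bool __section(".data.unlikely") __warned;	\
      		if ((c) && debug_lockdep_rcu_enabled() && !__warned) {	\
      			__warned = true;				\
      			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
      		}							\
      	} while (0)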
    • lockdep: Explicitly flag likely false-positive report · 1feb2cc8
      Paul E. McKenney authored
      The reason that lockdep_rcu_suspicious() prints the value of debug_locks
      is that a value of zero indicates a likely false positive.  This can
      work, but is a bit obtuse.  This commit therefore explicitly calls out
      the possibility of a false positive.
      Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu: Add ->gp_max to show_rcu_gp_kthreads() output · 27ba76e1
      Paul E. McKenney authored
      This commit adds ->gp_max to show_rcu_gp_kthreads() output in order to
      better diagnose RCU priority boosting failures.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu: Add ->rt_priority and ->gp_start to show_rcu_gp_kthreads() output · e44111ed
      Paul E. McKenney authored
      This commit adds ->rt_priority and ->gp_start to show_rcu_gp_kthreads()
      output in order to better diagnose RCU priority boosting failures.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu: Invoke rcu_spawn_core_kthreads() from rcu_spawn_gp_kthread() · 8e4b1d2b
      Paul E. McKenney authored
      Currently, rcu_spawn_core_kthreads() is invoked via an early_initcall(),
      which works, except that rcu_spawn_gp_kthread() is also invoked via an
      early_initcall() and rcu_spawn_core_kthreads() relies on adjustments to
      kthread_prio that are carried out by rcu_spawn_gp_kthread().  There is
      no guarantee of ordering among early_initcall() handlers, and thus no
      guarantee that kthread_prio will be properly checked and range-limited
      at the time that rcu_spawn_core_kthreads() needs it.
      
      In most cases, this bug is harmless.  After all, the only reason that
      rcu_spawn_gp_kthread() adjusts the value of kthread_prio is if the user
      specified a nonsensical value for this boot parameter, which experience
      indicates is rare.
      
      Nevertheless, a bug is a bug.  This commit therefore causes the
      rcu_spawn_core_kthreads() function to be invoked directly from
      rcu_spawn_gp_kthread() after any needed adjustments to kthread_prio have
      been carried out.
      
      Fixes: 48d07c04 ("rcu: Enable elimination of Tree-RCU softirq processing")
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
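      
      Schematically, the fix replaces ordering-by-luck between two
      early_initcall() handlers with a direct call (a sketch, not the exact
      diff):
      
      	static int __init rcu_spawn_gp_kthread(void)
      	{
      		/* ... sanity-check and clamp kthread_prio, spawn GP kthread ... */
      		rcu_spawn_core_kthreads(); /* Now ordered after the clamping. */
      		return 0;
      	}
      	early_initcall(rcu_spawn_gp_kthread);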
    • rcu: Improve tree.c comments and add code cleanups · 277ffe1b
      Zhouyi Zhou authored
      This commit cleans up some comments and code in kernel/rcu/tree.c.
      Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu: Remove the unused rcu_irq_exit_preempt() function · ce7c169d
      Paul E. McKenney authored
      Commit 9ee01e0f ("x86/entry: Clean up idtentry_enter/exit()
      leftovers") left the rcu_irq_exit_preempt() in place in order to avoid
      conflicts with the -rcu tree.  Now that this change has long since hit
      mainline, this commit removes the no-longer-used rcu_irq_exit_preempt()
      function.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcutorture: Move mem_dump_obj() tests into separate function · 7ab2bd31
      Paul E. McKenney authored
      To make the purpose of the code more apparent, this commit moves the
      tests of mem_dump_obj() to a new rcu_torture_mem_dump_obj() function
      and calls it from rcu_torture_cleanup().
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • torture: Don't cap remote runs by build-system number of CPUs · 3d78668e
      Paul E. McKenney authored
      Currently, if a torture scenario requires more CPUs than are present
      on the build system, kvm.sh and friends limit the CPUs available to
      that scenario.  This makes total sense when the build system and the
      system running the scenarios are one and the same, but not so much when
      remote systems might well have more CPUs.
      
      This commit therefore introduces a --remote flag to kvm.sh that suppresses
      this CPU-limiting behavior, and causes kvm-remote.sh to use this flag.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • torture: Make kvm-remote.sh account for network failure in pathname checks · c43d3b00
      Paul E. McKenney authored
      In a long-duration kvm-remote.sh run, almost all of the remote accesses will
      be simple file-existence checks.  These are thus the most likely to be caught
      out by network failures, which do happen from time to time.
      
      This commit therefore takes a first step towards tolerating temporary
      network outages by making the file-existence checks repeat in the face of
      such an outage.  They also print a message every minute during an outage,
      allowing the user to take appropriate action.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcutorture: Don't count CPU-stalled time against priority boosting · 063f5a4d
      Paul E. McKenney authored
      It will frequently be the case that rcu_torture_boost() will get a
      ->start_gp_poll() cookie that needs almost all of the current grace period
      plus an additional grace period to elapse before ->poll_gp_state() will
      return true.  It is quite possible that the current grace period will have
      (say) two seconds of stall by a CPU failing to pass through a quiescent
      state, followed by 300 milliseconds of delay due to a preempted reader.
      The next grace period might suffer only one second of stall by a CPU,
      followed by another 300 milliseconds of delay due to a preempted reader.
      This is an example of RCU priority boosting doing its job, but the full
      elapsed time of 3.6 seconds exceeds the 3.5-second limit.  In addition,
      there is no CPU stall in force at the 3.5-second mark, so this would
      nevertheless currently be counted as an RCU priority boosting failure.
      
      This commit therefore avoids this sort of false positive by resetting
      the gp_state_time timestamp any time that the current grace period is
      being blocked by a CPU.  This results in extremely frequent calls to
      the ->check_boost_failed() function, so this commit provides a lockless
      fastpath that is selected by supplying a NULL CPU-number pointer.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
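      
      The NULL-selected lockless fastpath might look roughly like this (a
      hypothetical sketch; check_boost_failed() follows the name in the text
      above, and fast_check()/slow_check() are stand-ins for the real logic):
      
      	static bool check_boost_failed(unsigned long gp_state, int *cpup)
      	{
      		if (!cpup)
      			return fast_check(gp_state);   /* Lockless, no diagnostics. */
      		return slow_check(gp_state, cpup);     /* Records the blocking CPU. */
      	}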
    • rcutorture: Forgive RCU boost failures when CPUs don't pass through QS · 0260b92e
      Paul E. McKenney authored
      Currently, rcu_torture_boost() runs CPU-bound at real-time priority
      to force RCU priority inversions.  It then checks that grace periods
      progress during this CPU-bound time.  If grace periods fail to progress,
      it reports an RCU priority boosting failure.
      
      However, it is possible (and sometimes does happen) that the grace period
      fails to progress due to a CPU failing to pass through a quiescent state
      for an extended time period (3.5 seconds by default).  This can happen
      due to vCPU preemption, long-running interrupts, and much else besides.
      There is nothing that RCU priority boosting can do about these situations,
      and so they should not be counted as RCU priority boosting failures.
      
      This commit therefore checks for CPUs (as opposed to preempted tasks)
      holding up a grace period, and flags the resulting RCU priority boosting
      failures, but neither splats about them nor counts them as errors.  It
      does rate-limit them to avoid flooding the console log.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcutorture: Add BUSTED-BOOST to test RCU priority boosting tests · d4240d62
      Paul E. McKenney authored
      This commit adds the BUSTED-BOOST rcutorture scenario, which can be
      used to test rcutorture's ability to test RCU priority boosting.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcutorture: Make rcu_torture_boost_failed() check for GP end · bcd4af44
      Paul E. McKenney authored
      It is possible that a delayed grace period that rcu_torture_boost()
      was polling for ended while rcu_torture_boost_failed() was printing the
      failure splat.  It would be good to know when this happens.  This commit
      therefore has rcu_torture_boost_failed() recheck the grace period after
      printing the splat, and print a message indicating whether or not the
      grace period has ended.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcutorture: Consolidate rcu_torture_boost() timing and statistics · 8c7ec02e
      Paul E. McKenney authored
      This commit consolidates two loops in rcu_torture_boost(), one of which
      counts the number of boost-test episodes and the other of which computes
      the start time of the next episode, into one loop that does both with but
      a single acquisition of boost_mutex.  This means that the count of the
      number of boost-test episodes is incremented after an episode completes
      rather than before it starts, but it also avoids the over-counting that
      was possible previously.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcutorture: Delay-based false positives for RCU priority boosting tests · 7b9dad7a
      Paul E. McKenney authored
      If an rcu_torture_boost() kthread determines that its grace period
      has not yet ended, it invokes rcu_torture_boost_failed() which checks
      whether enough time has elapsed for this to be considered a failure of
      RCU priority boosting, and, if so, flags the error.
      
      Unfortunately, that kthread might be preempted for some seconds between
      the time that it checks the grace period and the time that it checks the
      time.  This delay can result in a false positive, featuring a complaint
      that a particular grace period has not ended, followed by a diagnostic
      dump featuring a much later grace period.
      
      This commit avoids these false positives by rechecking for the end of
      the grace period after the time check.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • torture: Set kvm.sh language to English · 00ad25f6
      Paul E. McKenney authored
      Some of the code invoked directly and indirectly from kvm.sh parses
      the output of commands.  This parsing assumes English, which can cause
      failures if the user has set some other language.  In a few cases,
      there are language-independent commands available, but this is not
      always the case.  Therefore, as an alternative to polyglot parsing,
      this commit sets the LANG environment variable to en_US.UTF-8.
      Reported-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>