Commit cee1352f authored by Linus Torvalds

Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "The biggest change in this cycle is the conclusion of the big
  'simplify RCU to two primary flavors' consolidation work - i.e.
  there's a single RCU flavor for any kernel variant (PREEMPT and
  !PREEMPT):

    - Consolidate the RCU-bh, RCU-preempt, and RCU-sched flavors into a
      single flavor similar to RCU-sched in !PREEMPT kernels and into a
      single flavor similar to RCU-preempt (but also waiting on
      preempt-disabled sequences of code) in PREEMPT kernels.

      This branch also includes a refactoring of
      rcu_{nmi,irq}_{enter,exit}() from Byungchul Park.

    - Now that there is only one RCU flavor in any given running kernel,
      the many "rsp" pointers are no longer required, and this cleanup
      series removes them.

    - This branch carries out additional cleanups made possible by the
      RCU flavor consolidation, including inlining now-trivial
      functions, updating comments and definitions, and removing
      now-unneeded rcutorture scenarios.

    - Now that there is only one flavor of RCU in any running kernel,
      there is also only one rcu_data structure per CPU. This means that
      the rcu_dynticks structure can be merged into the rcu_data
      structure, a task taken on by this branch. This branch also
      contains a -rt-related fix from Mike Galbraith.

  There were also other updates:

    - Documentation updates, including some good-eye catches from Joel
      Fernandes.

    - SRCU updates, most notably changes enabling call_srcu() to be
      invoked very early in the boot sequence.

    - Torture-test updates, including some preliminary work towards
      making rcutorture better able to find problems that result in
      insufficient grace-period forward progress.

    - Initial changes to RCU to better promote forward progress of grace
      periods, including fixing a bug found by Marius Hillenbrand and
      David Woodhouse, with the fix suggested by Peter Zijlstra"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits)
  srcu: Make early-boot call_srcu() reuse workqueue lists
  rcutorture: Test early boot call_srcu()
  srcu: Make call_srcu() available during very early boot
  rcu: Convert rcu_state.ofl_lock to raw_spinlock_t
  rcu: Remove obsolete ->dynticks_fqs and ->cond_resched_completed
  rcu: Switch ->dynticks to rcu_data structure, remove rcu_dynticks
  rcu: Switch dyntick nesting counters to rcu_data structure
  rcu: Switch urgent quiescent-state requests to rcu_data structure
  rcu: Switch lazy counts to rcu_data structure
  rcu: Switch last accelerate/advance to rcu_data structure
  rcu: Switch ->tick_nohz_enabled_snap to rcu_data structure
  rcu: Merge rcu_dynticks structure into rcu_data structure
  rcu: Remove unused rcu_dynticks_snap() from Tiny RCU
  rcu: Convert "1UL << x" to "BIT(x)"
  rcu: Avoid resched_cpu() when rescheduling the current CPU
  rcu: More aggressively enlist scheduler aid for nohz_full CPUs
  rcu: Compute jiffies_till_sched_qs from other kernel parameters
  rcu: Provide functions for determining if call_rcu() has been invoked
  rcu: Eliminate ->rcu_qs_ctr from the rcu_dynticks structure
  rcu: Motivate Tiny RCU forward progress
  ...
parents e2b623fb d0346559
...@@ -1227,9 +1227,11 @@ to overflow the counter, this approach corrects the ...@@ -1227,9 +1227,11 @@ to overflow the counter, this approach corrects the
CPU enters the idle loop from process context. CPU enters the idle loop from process context.
</p><p>The <tt>-&gt;dynticks</tt> field counts the corresponding </p><p>The <tt>-&gt;dynticks</tt> field counts the corresponding
CPU's transitions to and from dyntick-idle mode, so that this counter CPU's transitions to and from either dyntick-idle or user mode, so
has an even value when the CPU is in dyntick-idle mode and an odd that this counter has an even value when the CPU is in dyntick-idle
value otherwise. mode or user mode and an odd value otherwise. The transitions to/from
user mode need to be counted for user mode adaptive-ticks support
(see timers/NO_HZ.txt).
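The even/odd convention described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the names `model_dynticks`, `dyn_init()`, `eqs_enter()`, `eqs_exit()`, `in_eqs()`, and `in_eqs_since()` are hypothetical, the model is single-threaded, and the atomic operations and memory ordering that a real per-CPU counter would need are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a per-CPU ->dynticks counter. */
struct model_dynticks {
	unsigned long dynticks;	/* even: dyntick-idle or user mode; odd: otherwise */
};

/* Assumption: the CPU starts out running kernel code, so the counter starts odd. */
static void dyn_init(struct model_dynticks *md)
{
	md->dynticks = 1;
}

/* Transition into dyntick-idle or user mode: the counter becomes even. */
static void eqs_enter(struct model_dynticks *md)
{
	assert(md->dynticks & 1UL);
	md->dynticks++;
}

/* Transition out of dyntick-idle or user mode: the counter becomes odd. */
static void eqs_exit(struct model_dynticks *md)
{
	assert(!(md->dynticks & 1UL));
	md->dynticks++;
}

/* Does this snapshot show the CPU in dyntick-idle or user mode? */
static bool in_eqs(unsigned long snap)
{
	return !(snap & 1UL);
}

/*
 * Has the CPU been in dyntick-idle or user mode at any time since the
 * snapshot was taken?  Either the snapshot itself was even, or the
 * counter has moved, which implies at least one transition.
 */
static bool in_eqs_since(const struct model_dynticks *md, unsigned long snap)
{
	return in_eqs(snap) || md->dynticks != snap;
}
```

In this model, an observer that took a snapshot while the counter was odd can later tell that the CPU passed through dyntick-idle or user mode simply because the counter changed, even if the CPU is busy again by the time of the second look.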
</p><p>The <tt>-&gt;rcu_need_heavy_qs</tt> field is used </p><p>The <tt>-&gt;rcu_need_heavy_qs</tt> field is used
to record the fact that the RCU core code would really like to to record the fact that the RCU core code would really like to
...@@ -1372,8 +1374,7 @@ that is, if the CPU is currently idle. ...@@ -1372,8 +1374,7 @@ that is, if the CPU is currently idle.
Accessor Functions</a></h3> Accessor Functions</a></h3>
<p>The following listing shows the <p>The following listing shows the
<tt>rcu_get_root()</tt>, <tt>rcu_for_each_node_breadth_first</tt>, <tt>rcu_get_root()</tt>, <tt>rcu_for_each_node_breadth_first</tt> and
<tt>rcu_for_each_nonleaf_node_breadth_first()</tt>, and
<tt>rcu_for_each_leaf_node()</tt> function and macros: <tt>rcu_for_each_leaf_node()</tt> function and macros:
<pre> <pre>
...@@ -1386,13 +1387,9 @@ Accessor Functions</a></h3> ...@@ -1386,13 +1387,9 @@ Accessor Functions</a></h3>
7 for ((rnp) = &amp;(rsp)-&gt;node[0]; \ 7 for ((rnp) = &amp;(rsp)-&gt;node[0]; \
8 (rnp) &lt; &amp;(rsp)-&gt;node[NUM_RCU_NODES]; (rnp)++) 8 (rnp) &lt; &amp;(rsp)-&gt;node[NUM_RCU_NODES]; (rnp)++)
9 9
10 #define rcu_for_each_nonleaf_node_breadth_first(rsp, rnp) \ 10 #define rcu_for_each_leaf_node(rsp, rnp) \
11 for ((rnp) = &amp;(rsp)-&gt;node[0]; \ 11 for ((rnp) = (rsp)-&gt;level[NUM_RCU_LVLS - 1]; \
12 (rnp) &lt; (rsp)-&gt;level[NUM_RCU_LVLS - 1]; (rnp)++) 12 (rnp) &lt; &amp;(rsp)-&gt;node[NUM_RCU_NODES]; (rnp)++)
13
14 #define rcu_for_each_leaf_node(rsp, rnp) \
15 for ((rnp) = (rsp)-&gt;level[NUM_RCU_LVLS - 1]; \
16 (rnp) &lt; &amp;(rsp)-&gt;node[NUM_RCU_NODES]; (rnp)++)
</pre> </pre>
<p>The <tt>rcu_get_root()</tt> simply returns a pointer to the <p>The <tt>rcu_get_root()</tt> simply returns a pointer to the
...@@ -1405,10 +1402,7 @@ macro takes advantage of the layout of the <tt>rcu_node</tt> ...@@ -1405,10 +1402,7 @@ macro takes advantage of the layout of the <tt>rcu_node</tt>
structures in the <tt>rcu_state</tt> structure's structures in the <tt>rcu_state</tt> structure's
<tt>-&gt;node[]</tt> array, performing a breadth-first traversal by <tt>-&gt;node[]</tt> array, performing a breadth-first traversal by
simply traversing the array in order. simply traversing the array in order.
The <tt>rcu_for_each_nonleaf_node_breadth_first()</tt> macro operates Similarly, the <tt>rcu_for_each_leaf_node()</tt> macro traverses only
similarly, but traverses only the first part of the array, thus excluding
the leaf <tt>rcu_node</tt> structures.
Finally, the <tt>rcu_for_each_leaf_node()</tt> macro traverses only
the last part of the array, thus traversing only the leaf the last part of the array, thus traversing only the leaf
<tt>rcu_node</tt> structures. <tt>rcu_node</tt> structures.
...@@ -1416,15 +1410,14 @@ the last part of the array, thus traversing only the leaf ...@@ -1416,15 +1410,14 @@ the last part of the array, thus traversing only the leaf
<tr><th>&nbsp;</th></tr> <tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr> <tr><th align="left">Quick Quiz:</th></tr>
<tr><td> <tr><td>
What do <tt>rcu_for_each_nonleaf_node_breadth_first()</tt> and What does
<tt>rcu_for_each_leaf_node()</tt> do if the <tt>rcu_node</tt> tree <tt>rcu_for_each_leaf_node()</tt> do if the <tt>rcu_node</tt> tree
contains only a single node? contains only a single node?
</td></tr> </td></tr>
<tr><th align="left">Answer:</th></tr> <tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff"> <tr><td bgcolor="#ffffff"><font color="ffffff">
In the single-node case, In the single-node case,
<tt>rcu_for_each_nonleaf_node_breadth_first()</tt> is a no-op <tt>rcu_for_each_leaf_node()</tt> traverses the single node.
and <tt>rcu_for_each_leaf_node()</tt> traverses the single node.
</font></td></tr> </font></td></tr>
<tr><td>&nbsp;</td></tr> <tr><td>&nbsp;</td></tr>
</table> </table>
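The array-layout trick behind the two remaining traversal macros can be tried out in userspace. The following is a minimal sketch, not the kernel code: `struct model_state`, `tree_init()`, `count_all()`, and `count_leaves()` are hypothetical names, and the node and level counts are runtime fields here rather than the compile-time constants `NUM_RCU_NODES` and `NUM_RCU_LVLS` shown in the listing.

```c
#include <assert.h>

#define MODEL_MAX_NODES 8
#define MODEL_MAX_LVLS  2

struct model_node {
	int id;			/* position in the breadth-first array */
};

/* Hypothetical stand-in for the ->node[] and ->level[] fields. */
struct model_state {
	struct model_node node[MODEL_MAX_NODES];	/* root first, leaves last */
	struct model_node *level[MODEL_MAX_LVLS];	/* first node of each level */
	int num_nodes;
	int num_lvls;
};

/* Visit every node by walking the array in order. */
#define model_for_each_node_breadth_first(sp, np) \
	for ((np) = &(sp)->node[0]; \
	     (np) < &(sp)->node[(sp)->num_nodes]; (np)++)

/* Visit only the leaves, which occupy the tail of the array. */
#define model_for_each_leaf_node(sp, np) \
	for ((np) = (sp)->level[(sp)->num_lvls - 1]; \
	     (np) < &(sp)->node[(sp)->num_nodes]; (np)++)

/*
 * Build a tree with one root and nleaves leaves (at most
 * MODEL_MAX_NODES - 1), or a single-node tree if nleaves is zero.
 */
static void tree_init(struct model_state *sp, int nleaves)
{
	int i;

	assert(nleaves >= 0 && nleaves < MODEL_MAX_NODES);
	sp->num_nodes = 1 + nleaves;
	sp->num_lvls = nleaves ? 2 : 1;
	sp->level[0] = &sp->node[0];
	if (nleaves)
		sp->level[1] = &sp->node[1];
	for (i = 0; i < sp->num_nodes; i++)
		sp->node[i].id = i;
}

static int count_all(struct model_state *sp)
{
	struct model_node *np;
	int n = 0;

	model_for_each_node_breadth_first(sp, np)
		n++;
	return n;
}

static int count_leaves(struct model_state *sp)
{
	struct model_node *np;
	int n = 0;

	model_for_each_leaf_node(sp, np)
		n++;
	return n;
}
```

With one root and four leaves, `count_all()` returns 5 and `count_leaves()` returns 4. In the single-node case the last level is also the first, so the leaf traversal visits the lone root node, which matches the Quick Quiz answer.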
......
...@@ -12,10 +12,9 @@ high efficiency and minimal disturbance, expedited grace periods accept ...@@ -12,10 +12,9 @@ high efficiency and minimal disturbance, expedited grace periods accept
lower efficiency and significant disturbance to attain shorter latencies. lower efficiency and significant disturbance to attain shorter latencies.
<p> <p>
There are three flavors of RCU (RCU-bh, RCU-preempt, and RCU-sched), There are two flavors of RCU (RCU-preempt and RCU-sched), with an earlier
but only two flavors of expedited grace periods because the RCU-bh third RCU-bh flavor having been implemented in terms of the other two.
expedited grace period maps onto the RCU-sched expedited grace period. Each of the two implementations is covered in its own section.
Each of the remaining two implementations is covered in its own section.
<ol> <ol>
<li> <a href="#Expedited Grace Period Design"> <li> <a href="#Expedited Grace Period Design">
...@@ -158,7 +157,7 @@ whether or not the current CPU is in an RCU read-side critical section. ...@@ -158,7 +157,7 @@ whether or not the current CPU is in an RCU read-side critical section.
The best that <tt>sync_sched_exp_handler()</tt> can do is to check The best that <tt>sync_sched_exp_handler()</tt> can do is to check
for idle, on the off-chance that the CPU went idle while the IPI for idle, on the off-chance that the CPU went idle while the IPI
was in flight. was in flight.
If the CPU is idle, then tt>sync_sched_exp_handler()</tt> reports If the CPU is idle, then <tt>sync_sched_exp_handler()</tt> reports
the quiescent state. the quiescent state.
<p> <p>
......
...@@ -1306,8 +1306,6 @@ doing so would degrade real-time response. ...@@ -1306,8 +1306,6 @@ doing so would degrade real-time response.
<p> <p>
This non-requirement appeared with preemptible RCU. This non-requirement appeared with preemptible RCU.
If you need a grace period that waits on non-preemptible code regions, use
<a href="#Sched Flavor">RCU-sched</a>.
<h2><a name="Parallelism Facts of Life">Parallelism Facts of Life</a></h2> <h2><a name="Parallelism Facts of Life">Parallelism Facts of Life</a></h2>
...@@ -2165,14 +2163,9 @@ however, this is not a panacea because there would be severe restrictions ...@@ -2165,14 +2163,9 @@ however, this is not a panacea because there would be severe restrictions
on what operations those callbacks could invoke. on what operations those callbacks could invoke.
<p> <p>
Perhaps surprisingly, <tt>synchronize_rcu()</tt>, Perhaps surprisingly, <tt>synchronize_rcu()</tt> and
<a href="#Bottom-Half Flavor"><tt>synchronize_rcu_bh()</tt></a>
(<a href="#Bottom-Half Flavor">discussed below</a>),
<a href="#Sched Flavor"><tt>synchronize_sched()</tt></a>,
<tt>synchronize_rcu_expedited()</tt>, <tt>synchronize_rcu_expedited()</tt>,
<tt>synchronize_rcu_bh_expedited()</tt>, and will operate normally
<tt>synchronize_sched_expedited()</tt>
will all operate normally
during very early boot, the reason being that there is only one CPU during very early boot, the reason being that there is only one CPU
and preemption is disabled. and preemption is disabled.
This means that the call <tt>synchronize_rcu()</tt> (or friends) This means that the call <tt>synchronize_rcu()</tt> (or friends)
...@@ -2269,12 +2262,23 @@ Thankfully, RCU update-side primitives, including ...@@ -2269,12 +2262,23 @@ Thankfully, RCU update-side primitives, including
The name notwithstanding, some Linux-kernel architectures The name notwithstanding, some Linux-kernel architectures
can have nested NMIs, which RCU must handle correctly. can have nested NMIs, which RCU must handle correctly.
Andy Lutomirski Andy Lutomirski
<a href="https://lkml.kernel.org/g/CALCETrXLq1y7e_dKFPgou-FKHB6Pu-r8+t-6Ds+8=va7anBWDA@mail.gmail.com">surprised me</a> <a href="https://lkml.kernel.org/r/CALCETrXLq1y7e_dKFPgou-FKHB6Pu-r8+t-6Ds+8=va7anBWDA@mail.gmail.com">surprised me</a>
with this requirement; with this requirement;
he also kindly surprised me with he also kindly surprised me with
<a href="https://lkml.kernel.org/g/CALCETrXSY9JpW3uE6H8WYk81sg56qasA2aqmjMPsq5dOtzso=g@mail.gmail.com">an algorithm</a> <a href="https://lkml.kernel.org/r/CALCETrXSY9JpW3uE6H8WYk81sg56qasA2aqmjMPsq5dOtzso=g@mail.gmail.com">an algorithm</a>
that meets this requirement. that meets this requirement.
<p>
Furthermore, NMI handlers can be interrupted by what appear to RCU
to be normal interrupts.
One way that this can happen is for code that directly invokes
<tt>rcu_irq_enter()</tt> and <tt>rcu_irq_exit()</tt> to be called
from an NMI handler.
This astonishing fact of life prompted the current code structure,
which has <tt>rcu_irq_enter()</tt> invoking <tt>rcu_nmi_enter()</tt>
and <tt>rcu_irq_exit()</tt> invoking <tt>rcu_nmi_exit()</tt>.
And yes, I also learned of this requirement the hard way.
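A rough userspace model shows why routing the irq entry and exit paths through the NMI paths makes an "interrupt" inside an NMI handler harmless. This is a sketch only: `struct model_cpu`, its fields, and the `model_*()` functions are hypothetical, the CPU is assumed to be idle before the outermost handler, and the real functions track considerably more state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for this model. */
struct model_cpu {
	long nesting;		/* depth of NMI/irq handlers currently running */
	bool rcu_watching;	/* false while the CPU is idle from RCU's viewpoint */
};

/* Outermost entry tells RCU to start paying attention to this CPU. */
static void model_nmi_enter(struct model_cpu *c)
{
	if (c->nesting++ == 0)
		c->rcu_watching = true;
}

/* Outermost exit tells RCU that the CPU is idle again. */
static void model_nmi_exit(struct model_cpu *c)
{
	assert(c->nesting > 0);
	if (--c->nesting == 0)
		c->rcu_watching = false;
}

/*
 * The irq paths simply reuse the NMI paths, so an irq entry that
 * happens inside an NMI handler is just one more nesting level.
 */
static void model_irq_enter(struct model_cpu *c)
{
	model_nmi_enter(c);
}

static void model_irq_exit(struct model_cpu *c)
{
	model_nmi_exit(c);
}
```

Because both kinds of entry share a single nesting count, the inner `model_irq_exit()` in an NMI, irq-enter, irq-exit, NMI-exit sequence cannot mistakenly mark the CPU idle while the NMI handler is still running.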
<h3><a name="Loadable Modules">Loadable Modules</a></h3> <h3><a name="Loadable Modules">Loadable Modules</a></h3>
<p> <p>
...@@ -2394,30 +2398,9 @@ when invoked from a CPU-hotplug notifier. ...@@ -2394,30 +2398,9 @@ when invoked from a CPU-hotplug notifier.
<p> <p>
RCU depends on the scheduler, and the scheduler uses RCU to RCU depends on the scheduler, and the scheduler uses RCU to
protect some of its data structures. protect some of its data structures.
This means the scheduler is forbidden from acquiring The preemptible-RCU <tt>rcu_read_unlock()</tt>
the runqueue locks and the priority-inheritance locks implementation must therefore be written carefully to avoid deadlocks
in the middle of an outermost RCU read-side critical section unless either involving the scheduler's runqueue and priority-inheritance locks.
(1)&nbsp;it releases them before exiting that same
RCU read-side critical section, or
(2)&nbsp;interrupts are disabled across
that entire RCU read-side critical section.
This same prohibition also applies (recursively!) to any lock that is acquired
while holding any lock to which this prohibition applies.
Adhering to this rule prevents preemptible RCU from invoking
<tt>rcu_read_unlock_special()</tt> while either runqueue or
priority-inheritance locks are held, thus avoiding deadlock.
<p>
Prior to v4.4, it was only necessary to disable preemption across
RCU read-side critical sections that acquired scheduler locks.
In v4.4, expedited grace periods started using IPIs, and these
IPIs could force a <tt>rcu_read_unlock()</tt> to take the slowpath.
Therefore, this expedited-grace-period change required disabling of
interrupts, not just preemption.
<p>
For RCU's part, the preemptible-RCU <tt>rcu_read_unlock()</tt>
implementation must be written carefully to avoid similar deadlocks.
In particular, <tt>rcu_read_unlock()</tt> must tolerate an In particular, <tt>rcu_read_unlock()</tt> must tolerate an
interrupt where the interrupt handler invokes both interrupt where the interrupt handler invokes both
<tt>rcu_read_lock()</tt> and <tt>rcu_read_unlock()</tt>. <tt>rcu_read_lock()</tt> and <tt>rcu_read_unlock()</tt>.
...@@ -2426,7 +2409,7 @@ negative nesting levels to avoid destructive recursion via ...@@ -2426,7 +2409,7 @@ negative nesting levels to avoid destructive recursion via
interrupt handler's use of RCU. interrupt handler's use of RCU.
<p> <p>
This pair of mutual scheduler-RCU requirements came as a This scheduler-RCU requirement came as a
<a href="https://lwn.net/Articles/453002/">complete surprise</a>. <a href="https://lwn.net/Articles/453002/">complete surprise</a>.
<p> <p>
...@@ -2437,9 +2420,28 @@ when running context-switch-heavy workloads when built with ...@@ -2437,9 +2420,28 @@ when running context-switch-heavy workloads when built with
<tt>CONFIG_NO_HZ_FULL=y</tt> <tt>CONFIG_NO_HZ_FULL=y</tt>
<a href="http://www.rdrop.com/users/paulmck/scalability/paper/BareMetal.2015.01.15b.pdf">did come as a surprise [PDF]</a>. <a href="http://www.rdrop.com/users/paulmck/scalability/paper/BareMetal.2015.01.15b.pdf">did come as a surprise [PDF]</a>.
RCU has made good progress towards meeting this requirement, even RCU has made good progress towards meeting this requirement, even
for context-switch-have <tt>CONFIG_NO_HZ_FULL=y</tt> workloads, for context-switch-heavy <tt>CONFIG_NO_HZ_FULL=y</tt> workloads,
but there is room for further improvement. but there is room for further improvement.
<p>
In the past, it was forbidden to disable interrupts across an
<tt>rcu_read_unlock()</tt> unless that interrupt-disabled region
of code also included the matching <tt>rcu_read_lock()</tt>.
Violating this restriction could result in deadlocks involving the
scheduler's runqueue and priority-inheritance spinlocks.
This restriction was lifted when interrupt-disabled calls to
<tt>rcu_read_unlock()</tt> started deferring the reporting of
the resulting RCU-preempt quiescent state until the end of that
interrupts-disabled region.
This deferred reporting means that the scheduler's runqueue and
priority-inheritance locks cannot be held while reporting an RCU-preempt
quiescent state, which lifts the earlier restriction, at least from
a deadlock perspective.
Unfortunately, real-time systems using RCU priority boosting may
need this restriction to remain in effect because deferred
quiescent-state reporting also defers deboosting, which in turn
degrades real-time latencies.
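The deferral described above can be pictured with a toy single-task model. It is a sketch under loud assumptions: `struct model_task` and every `model_*()` function are hypothetical, every outermost unlock is treated as needing to report a quiescent state, and the deferred report is made at the moment interrupts are re-enabled, which simplifies whatever mechanism the kernel actually uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state for one task on one CPU. */
struct model_task {
	int nesting;		/* rcu_read_lock() nesting depth */
	bool irqs_disabled;	/* are interrupts currently disabled? */
	bool deferred_qs;	/* quiescent state waiting to be reported */
	int qs_reported;	/* number of quiescent states reported so far */
};

static void model_read_lock(struct model_task *t)
{
	t->nesting++;
}

/* Outermost unlock: report now, or defer if interrupts are disabled. */
static void model_read_unlock(struct model_task *t)
{
	assert(t->nesting > 0);
	if (--t->nesting)
		return;
	if (t->irqs_disabled)
		t->deferred_qs = true;
	else
		t->qs_reported++;
}

static void model_irq_disable(struct model_task *t)
{
	t->irqs_disabled = true;
}

/* End of the interrupts-disabled region: flush any deferred report. */
static void model_irq_enable(struct model_task *t)
{
	t->irqs_disabled = false;
	if (t->deferred_qs) {
		t->deferred_qs = false;
		t->qs_reported++;
	}
}
```

In this model, a reader that calls `model_read_lock()` with interrupts enabled and `model_read_unlock()` with interrupts disabled reports nothing until `model_irq_enable()` runs. That delay is the deadlock-avoidance benefit and also the source of the deboosting latency mentioned above.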
<h3><a name="Tracing and RCU">Tracing and RCU</a></h3> <h3><a name="Tracing and RCU">Tracing and RCU</a></h3>
<p> <p>
...@@ -2850,15 +2852,22 @@ The other four flavors are listed below, with requirements for each ...@@ -2850,15 +2852,22 @@ The other four flavors are listed below, with requirements for each
described in a separate section. described in a separate section.
<ol> <ol>
<li> <a href="#Bottom-Half Flavor">Bottom-Half Flavor</a> <li> <a href="#Bottom-Half Flavor">Bottom-Half Flavor (Historical)</a>
<li> <a href="#Sched Flavor">Sched Flavor</a> <li> <a href="#Sched Flavor">Sched Flavor (Historical)</a>
<li> <a href="#Sleepable RCU">Sleepable RCU</a> <li> <a href="#Sleepable RCU">Sleepable RCU</a>
<li> <a href="#Tasks RCU">Tasks RCU</a> <li> <a href="#Tasks RCU">Tasks RCU</a>
<li> <a href="#Waiting for Multiple Grace Periods">
Waiting for Multiple Grace Periods</a>
</ol> </ol>
<h3><a name="Bottom-Half Flavor">Bottom-Half Flavor</a></h3> <h3><a name="Bottom-Half Flavor">Bottom-Half Flavor (Historical)</a></h3>
<p>
The RCU-bh flavor of RCU has since been expressed in terms of
the other RCU flavors as part of a consolidation of the three
flavors into a single flavor.
The read-side API remains, and continues to disable softirq and to
be accounted for by lockdep.
Much of the material in this section is therefore strictly historical
in nature.
<p> <p>
The softirq-disable (AKA &ldquo;bottom-half&rdquo;, The softirq-disable (AKA &ldquo;bottom-half&rdquo;,
...@@ -2918,8 +2927,20 @@ includes ...@@ -2918,8 +2927,20 @@ includes
<tt>call_rcu_bh()</tt>, <tt>call_rcu_bh()</tt>,
<tt>rcu_barrier_bh()</tt>, and <tt>rcu_barrier_bh()</tt>, and
<tt>rcu_read_lock_bh_held()</tt>. <tt>rcu_read_lock_bh_held()</tt>.
However, the update-side APIs are now simple wrappers for other RCU
flavors, namely RCU-sched in CONFIG_PREEMPT=n kernels and RCU-preempt
otherwise.
<h3><a name="Sched Flavor">Sched Flavor (Historical)</a></h3>
<h3><a name="Sched Flavor">Sched Flavor</a></h3> <p>
The RCU-sched flavor of RCU has since been expressed in terms of
the other RCU flavors as part of a consolidation of the three
flavors into a single flavor.
The read-side API remains, and continues to disable preemption and to
be accounted for by lockdep.
Much of the material in this section is therefore strictly historical
in nature.
<p> <p>
Before preemptible RCU, waiting for an RCU grace period had the Before preemptible RCU, waiting for an RCU grace period had the
...@@ -3139,94 +3160,14 @@ The tasks-RCU API is quite compact, consisting only of ...@@ -3139,94 +3160,14 @@ The tasks-RCU API is quite compact, consisting only of
<tt>call_rcu_tasks()</tt>, <tt>call_rcu_tasks()</tt>,
<tt>synchronize_rcu_tasks()</tt>, and <tt>synchronize_rcu_tasks()</tt>, and
<tt>rcu_barrier_tasks()</tt>. <tt>rcu_barrier_tasks()</tt>.
In <tt>CONFIG_PREEMPT=n</tt> kernels, trampolines cannot be preempted,
<h3><a name="Waiting for Multiple Grace Periods"> so these APIs map to
Waiting for Multiple Grace Periods</a></h3> <tt>call_rcu()</tt>,
<tt>synchronize_rcu()</tt>, and
<p> <tt>rcu_barrier()</tt>, respectively.
Perhaps you have an RCU protected data structure that is accessed from In <tt>CONFIG_PREEMPT=y</tt> kernels, trampolines can be preempted,
RCU read-side critical sections, from softirq handlers, and from and these three APIs are therefore implemented by separate functions
hardware interrupt handlers. that check for voluntary context switches.
That is three flavors of RCU, the normal flavor, the bottom-half flavor,
and the sched flavor.
How to wait for a compound grace period?
<p>
The best approach is usually to &ldquo;just say no!&rdquo; and
insert <tt>rcu_read_lock()</tt> and <tt>rcu_read_unlock()</tt>
around each RCU read-side critical section, regardless of what
environment it happens to be in.
But suppose that some of the RCU read-side critical sections are
on extremely hot code paths, and that use of <tt>CONFIG_PREEMPT=n</tt>
is not a viable option, so that <tt>rcu_read_lock()</tt> and
<tt>rcu_read_unlock()</tt> are not free.
What then?
<p>
You <i>could</i> wait on all three grace periods in succession, as follows:
<blockquote>
<pre>
1 synchronize_rcu();
2 synchronize_rcu_bh();
3 synchronize_sched();
</pre>
</blockquote>
<p>
This works, but triples the update-side latency penalty.
In cases where this is not acceptable, <tt>synchronize_rcu_mult()</tt>
may be used to wait on all three flavors of grace period concurrently:
<blockquote>
<pre>
1 synchronize_rcu_mult(call_rcu, call_rcu_bh, call_rcu_sched);
</pre>
</blockquote>
<p>
But what if it is necessary to also wait on SRCU?
This can be done as follows:
<blockquote>
<pre>
1 static void call_my_srcu(struct rcu_head *head,
2 void (*func)(struct rcu_head *head))
3 {
4 call_srcu(&amp;my_srcu, head, func);
5 }
6
7 synchronize_rcu_mult(call_rcu, call_rcu_bh, call_rcu_sched, call_my_srcu);
</pre>
</blockquote>
<p>
If you needed to wait on multiple different flavors of SRCU
(but why???), you would need to create a wrapper function resembling
<tt>call_my_srcu()</tt> for each SRCU flavor.
<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
But what if I need to wait for multiple RCU flavors, but I also need
the grace periods to be expedited?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
If you are using expedited grace periods, there should be less penalty
for waiting on them in succession.
But if that is nevertheless a problem, you can use workqueues
or multiple kthreads to wait on the various expedited grace
periods concurrently.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>
<p>
Again, it is usually better to adjust the RCU read-side critical sections
to use a single flavor of RCU, but when this is not feasible, you can use
<tt>synchronize_rcu_mult()</tt>.
<h2><a name="Possible Future Changes">Possible Future Changes</a></h2> <h2><a name="Possible Future Changes">Possible Future Changes</a></h2>
...@@ -3237,12 +3178,6 @@ If this becomes a serious problem, it will be necessary to rework the ...@@ -3237,12 +3178,6 @@ If this becomes a serious problem, it will be necessary to rework the
grace-period state machine so as to avoid the need for the additional grace-period state machine so as to avoid the need for the additional
latency. latency.
<p>
Expedited grace periods scan the CPUs, so their latency and overhead
increases with increasing numbers of CPUs.
If this becomes a serious problem on large systems, it will be necessary
to do some redesign to avoid this scalability problem.
<p> <p>
RCU disables CPU hotplug in a few places, perhaps most notably in the RCU disables CPU hotplug in a few places, perhaps most notably in the
<tt>rcu_barrier()</tt> operations. <tt>rcu_barrier()</tt> operations.
...@@ -3287,11 +3222,6 @@ Please note that arrangements that require RCU to remap CPU numbers will ...@@ -3287,11 +3222,6 @@ Please note that arrangements that require RCU to remap CPU numbers will
require extremely good demonstration of need and full exploration of require extremely good demonstration of need and full exploration of
alternatives. alternatives.
<p>
There is an embarrassingly large number of flavors of RCU, and this
number has been increasing over time.
Perhaps it will be possible to combine some at some future date.
<p> <p>
RCU's various kthreads are reasonably recent additions. RCU's various kthreads are reasonably recent additions.
It is quite likely that adjustments will be required to more gracefully It is quite likely that adjustments will be required to more gracefully
......
...@@ -16,12 +16,9 @@ o A CPU looping in an RCU read-side critical section. ...@@ -16,12 +16,9 @@ o A CPU looping in an RCU read-side critical section.
o A CPU looping with interrupts disabled. o A CPU looping with interrupts disabled.
o A CPU looping with preemption disabled. This condition can o A CPU looping with preemption disabled.
result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh
stalls.
o A CPU looping with bottom halves disabled. This condition can o A CPU looping with bottom halves disabled.
result in RCU-sched and RCU-bh stalls.
o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
without invoking schedule(). If the looping in the kernel is without invoking schedule(). If the looping in the kernel is
...@@ -87,9 +84,9 @@ o A hardware failure. This is quite unlikely, but has occurred ...@@ -87,9 +84,9 @@ o A hardware failure. This is quite unlikely, but has occurred
This resulted in a series of RCU CPU stall warnings, eventually This resulted in a series of RCU CPU stall warnings, eventually
leading to the realization that the CPU had failed. leading to the realization that the CPU had failed.
The RCU, RCU-sched, RCU-bh, and RCU-tasks implementations have CPU stall The RCU, RCU-sched, and RCU-tasks implementations have CPU stall warning.
warning. Note that SRCU does -not- have CPU stall warnings. Please note Note that SRCU does -not- have CPU stall warnings. Please note that
that RCU only detects CPU stalls when there is a grace period in progress. RCU only detects CPU stalls when there is a grace period in progress.
No grace period, no CPU stall warnings. No grace period, no CPU stall warnings.
To diagnose the cause of the stall, inspect the stack traces. To diagnose the cause of the stall, inspect the stack traces.
......
...@@ -934,7 +934,8 @@ c. Do you need to treat NMI handlers, hardirq handlers, ...@@ -934,7 +934,8 @@ c. Do you need to treat NMI handlers, hardirq handlers,
d. Do you need RCU grace periods to complete even in the face d. Do you need RCU grace periods to complete even in the face
of softirq monopolization of one or more of the CPUs? For of softirq monopolization of one or more of the CPUs? For
example, is your code subject to network-based denial-of-service example, is your code subject to network-based denial-of-service
attacks? If so, you need RCU-bh. attacks? If so, you should disable softirq across your readers,
for example, by using rcu_read_lock_bh().
e. Is your workload too update-intensive for normal use of e. Is your workload too update-intensive for normal use of
RCU, but inappropriate for other synchronization mechanisms? RCU, but inappropriate for other synchronization mechanisms?
......
...@@ -3540,14 +3540,14 @@ ...@@ -3540,14 +3540,14 @@
In kernels built with CONFIG_RCU_NOCB_CPU=y, set In kernels built with CONFIG_RCU_NOCB_CPU=y, set
the specified list of CPUs to be no-callback CPUs. the specified list of CPUs to be no-callback CPUs.
Invocation of these CPUs' RCU callbacks will Invocation of these CPUs' RCU callbacks will be
be offloaded to "rcuox/N" kthreads created for offloaded to "rcuox/N" kthreads created for that
that purpose, where "x" is "b" for RCU-bh, "p" purpose, where "x" is "p" for RCU-preempt, and
for RCU-preempt, and "s" for RCU-sched, and "N" "s" for RCU-sched, and "N" is the CPU number.
is the CPU number. This reduces OS jitter on the This reduces OS jitter on the offloaded CPUs,
offloaded CPUs, which can be useful for HPC and which can be useful for HPC and real-time
real-time workloads. It can also improve energy workloads. It can also improve energy efficiency
efficiency for asymmetric multiprocessors. for asymmetric multiprocessors.
rcu_nocb_poll [KNL] rcu_nocb_poll [KNL]
Rather than requiring that offloaded CPUs Rather than requiring that offloaded CPUs
...@@ -3601,7 +3601,14 @@ ...@@ -3601,7 +3601,14 @@
Set required age in jiffies for a Set required age in jiffies for a
given grace period before RCU starts given grace period before RCU starts
soliciting quiescent-state help from soliciting quiescent-state help from
rcu_note_context_switch(). rcu_note_context_switch(). If not specified, the
kernel will calculate a value based on the most
recent settings of rcutree.jiffies_till_first_fqs
and rcutree.jiffies_till_next_fqs.
This calculated value may be viewed in
rcutree.jiffies_to_sched_qs. Any attempt to
set rcutree.jiffies_to_sched_qs will be
cheerfully overwritten.
rcutree.jiffies_till_first_fqs= [KNL] rcutree.jiffies_till_first_fqs= [KNL]
Set delay from grace-period initialization to Set delay from grace-period initialization to
...@@ -3869,12 +3876,6 @@ ...@@ -3869,12 +3876,6 @@
rcupdate.rcu_self_test= [KNL] rcupdate.rcu_self_test= [KNL]
Run the RCU early boot self tests Run the RCU early boot self tests
rcupdate.rcu_self_test_bh= [KNL]
Run the RCU bh early boot self tests
rcupdate.rcu_self_test_sched= [KNL]
Run the RCU sched early boot self tests
rdinit= [KNL] rdinit= [KNL]
Format: <full_path> Format: <full_path>
Run specified binary instead of /init from the ramdisk, Run specified binary instead of /init from the ramdisk,
......
...@@ -321,7 +321,7 @@ To reduce its OS jitter, do at least one of the following: ...@@ -321,7 +321,7 @@ To reduce its OS jitter, do at least one of the following:
to do. to do.
Name: Name:
rcuob/%d, rcuop/%d, and rcuos/%d rcuop/%d and rcuos/%d
Purpose: Purpose:
Offload RCU callbacks from the corresponding CPU. Offload RCU callbacks from the corresponding CPU.
......
...@@ -182,7 +182,7 @@ static inline void list_replace_rcu(struct list_head *old, ...@@ -182,7 +182,7 @@ static inline void list_replace_rcu(struct list_head *old,
* @list: the RCU-protected list to splice * @list: the RCU-protected list to splice
* @prev: points to the last element of the existing list * @prev: points to the last element of the existing list
* @next: points to the first element of the existing list * @next: points to the first element of the existing list
* @sync: function to sync: synchronize_rcu(), synchronize_sched(), ... * @sync: synchronize_rcu, synchronize_rcu_expedited, ...
* *
* The list pointed to by @prev and @next can be RCU-read traversed * The list pointed to by @prev and @next can be RCU-read traversed
* concurrently with this function. * concurrently with this function.
...@@ -240,7 +240,7 @@ static inline void __list_splice_init_rcu(struct list_head *list, ...@@ -240,7 +240,7 @@ static inline void __list_splice_init_rcu(struct list_head *list,
* designed for stacks. * designed for stacks.
* @list: the RCU-protected list to splice * @list: the RCU-protected list to splice
* @head: the place in the existing list to splice the first list into * @head: the place in the existing list to splice the first list into
* @sync: function to sync: synchronize_rcu(), synchronize_sched(), ... * @sync: synchronize_rcu, synchronize_rcu_expedited, ...
*/ */
static inline void list_splice_init_rcu(struct list_head *list, static inline void list_splice_init_rcu(struct list_head *list,
struct list_head *head, struct list_head *head,
...@@ -255,7 +255,7 @@ static inline void list_splice_init_rcu(struct list_head *list, ...@@ -255,7 +255,7 @@ static inline void list_splice_init_rcu(struct list_head *list,
* list, designed for queues. * list, designed for queues.
* @list: the RCU-protected list to splice * @list: the RCU-protected list to splice
* @head: the place in the existing list to splice the first list into * @head: the place in the existing list to splice the first list into
* @sync: function to sync: synchronize_rcu(), synchronize_sched(), ... * @sync: synchronize_rcu, synchronize_rcu_expedited, ...
*/ */
static inline void list_splice_tail_init_rcu(struct list_head *list, static inline void list_splice_tail_init_rcu(struct list_head *list,
struct list_head *head, struct list_head *head,
...@@ -359,13 +359,12 @@ static inline void list_splice_tail_init_rcu(struct list_head *list, ...@@ -359,13 +359,12 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
* @type: the type of the struct this is embedded in. * @type: the type of the struct this is embedded in.
* @member: the name of the list_head within the struct. * @member: the name of the list_head within the struct.
* *
* This primitive may safely run concurrently with the _rcu list-mutation * This primitive may safely run concurrently with the _rcu
* primitives such as list_add_rcu(), but requires some implicit RCU * list-mutation primitives such as list_add_rcu(), but requires some
* read-side guarding. One example is running within a special * implicit RCU read-side guarding. One example is running within a special
* exception-time environment where preemption is disabled and where * exception-time environment where preemption is disabled and where lockdep
* lockdep cannot be invoked (in which case updaters must use RCU-sched, * cannot be invoked. Another example is when items are added to the list,
* as in synchronize_sched(), call_rcu_sched(), and friends). Another * but never deleted.
* example is when items are added to the list, but never deleted.
*/ */
#define list_entry_lockless(ptr, type, member) \ #define list_entry_lockless(ptr, type, member) \
container_of((typeof(ptr))READ_ONCE(ptr), type, member) container_of((typeof(ptr))READ_ONCE(ptr), type, member)
...@@ -376,13 +375,12 @@ static inline void list_splice_tail_init_rcu(struct list_head *list, ...@@ -376,13 +375,12 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
* @head: the head for your list. * @head: the head for your list.
* @member: the name of the list_struct within the struct. * @member: the name of the list_struct within the struct.
* *
* This primitive may safely run concurrently with the _rcu list-mutation * This primitive may safely run concurrently with the _rcu
* primitives such as list_add_rcu(), but requires some implicit RCU * list-mutation primitives such as list_add_rcu(), but requires some
* read-side guarding. One example is running within a special * implicit RCU read-side guarding. One example is running within a special
* exception-time environment where preemption is disabled and where * exception-time environment where preemption is disabled and where lockdep
* lockdep cannot be invoked (in which case updaters must use RCU-sched, * cannot be invoked. Another example is when items are added to the list,
* as in synchronize_sched(), call_rcu_sched(), and friends). Another * but never deleted.
* example is when items are added to the list, but never deleted.
*/ */
#define list_for_each_entry_lockless(pos, head, member) \ #define list_for_each_entry_lockless(pos, head, member) \
for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \ for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \
......
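Reviewer note: the list_entry_lockless() and list_for_each_entry_lockless() comments above name "items are added to the list, but never deleted" as one case where no explicit read-side marking is needed. The following is a minimal userspace sketch of that add-only case, not kernel code. READ_ONCE(), container_of(), and the list macros are simplified stand-ins written for this illustration, and `struct item`, `list_add_tail_sketch()`, and `sum_items()` are hypothetical names. The demo runs in a single thread, so it only shows the traversal mechanics, not the concurrency guarantees.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Userspace stand-in for the kernel's READ_ONCE(): a single volatile load. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Simplified container_of(): step back from the member to its container. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_entry_lockless(ptr, type, member) \
	container_of((__typeof__(ptr))READ_ONCE(ptr), type, member)

#define list_for_each_entry_lockless(pos, head, member) \
	for (pos = list_entry_lockless((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head); \
	     pos = list_entry_lockless(pos->member.next, __typeof__(*pos), member))

/* Hypothetical element type for this sketch. */
struct item {
	int value;
	struct list_head node;
};

/*
 * Append an entry. The kernel's list_add_tail_rcu() publishes the new
 * entry with rcu_assign_pointer(); a plain store is enough here because
 * the demo is single-threaded.
 */
static void list_add_tail_sketch(struct list_head *entry, struct list_head *head)
{
	entry->next = head;
	entry->prev = head->prev;
	head->prev->next = entry;
	head->prev = entry;
}

/* Build an add-only list and walk it with the lockless iterator. */
int sum_items(void)
{
	struct list_head head = { &head, &head };
	struct item a = { .value = 1 }, b = { .value = 2 }, c = { .value = 39 };
	struct item *pos;
	int sum = 0;

	list_add_tail_sketch(&a.node, &head);
	list_add_tail_sketch(&b.node, &head);
	list_add_tail_sketch(&c.node, &head);

	list_for_each_entry_lockless(pos, &head, node)
		sum += pos->value;
	return sum;
}
```

Calling sum_items() walks the three entries in insertion order and returns their sum. Because entries are never removed in this scenario, a reader can never follow a pointer into freed memory, which is why no grace period is involved.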
...@@ -48,23 +48,14 @@ ...@@ -48,23 +48,14 @@
#define ulong2long(a) (*(long *)(&(a))) #define ulong2long(a) (*(long *)(&(a)))
/* Exported common interfaces */ /* Exported common interfaces */
#ifdef CONFIG_PREEMPT_RCU
void call_rcu(struct rcu_head *head, rcu_callback_t func); void call_rcu(struct rcu_head *head, rcu_callback_t func);
#else /* #ifdef CONFIG_PREEMPT_RCU */
#define call_rcu call_rcu_sched
#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
void call_rcu_bh(struct rcu_head *head, rcu_callback_t func);
void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
void synchronize_sched(void);
void rcu_barrier_tasks(void); void rcu_barrier_tasks(void);
void synchronize_rcu(void);
#ifdef CONFIG_PREEMPT_RCU #ifdef CONFIG_PREEMPT_RCU
void __rcu_read_lock(void); void __rcu_read_lock(void);
void __rcu_read_unlock(void); void __rcu_read_unlock(void);
void synchronize_rcu(void);
/* /*
* Defined as a macro as it is a very low level header included from * Defined as a macro as it is a very low level header included from
...@@ -88,11 +79,6 @@ static inline void __rcu_read_unlock(void) ...@@ -88,11 +79,6 @@ static inline void __rcu_read_unlock(void)
preempt_enable(); preempt_enable();
} }
static inline void synchronize_rcu(void)
{
synchronize_sched();
}
static inline int rcu_preempt_depth(void) static inline int rcu_preempt_depth(void)
{ {
return 0; return 0;
...@@ -103,8 +89,6 @@ static inline int rcu_preempt_depth(void) ...@@ -103,8 +89,6 @@ static inline int rcu_preempt_depth(void)
/* Internal to kernel */ /* Internal to kernel */
void rcu_init(void); void rcu_init(void);
extern int rcu_scheduler_active __read_mostly; extern int rcu_scheduler_active __read_mostly;
void rcu_sched_qs(void);
void rcu_bh_qs(void);
void rcu_check_callbacks(int user); void rcu_check_callbacks(int user);
void rcu_report_dead(unsigned int cpu); void rcu_report_dead(unsigned int cpu);
void rcutree_migrate_callbacks(int cpu); void rcutree_migrate_callbacks(int cpu);
...@@ -135,11 +119,10 @@ static inline void rcu_init_nohz(void) { } ...@@ -135,11 +119,10 @@ static inline void rcu_init_nohz(void) { }
* RCU_NONIDLE - Indicate idle-loop code that needs RCU readers * RCU_NONIDLE - Indicate idle-loop code that needs RCU readers
* @a: Code that RCU needs to pay attention to. * @a: Code that RCU needs to pay attention to.
* *
* RCU, RCU-bh, and RCU-sched read-side critical sections are forbidden * RCU read-side critical sections are forbidden in the inner idle loop,
* in the inner idle loop, that is, between the rcu_idle_enter() and * that is, between the rcu_idle_enter() and the rcu_idle_exit() -- RCU
* the rcu_idle_exit() -- RCU will happily ignore any such read-side * will happily ignore any such read-side critical sections. However,
* critical sections. However, things like powertop need tracepoints * things like powertop need tracepoints in the inner idle loop.
* in the inner idle loop.
* *
* This macro provides the way out: RCU_NONIDLE(do_something_with_RCU()) * This macro provides the way out: RCU_NONIDLE(do_something_with_RCU())
* will tell RCU that it needs to pay attention, invoke its argument * will tell RCU that it needs to pay attention, invoke its argument
...@@ -167,20 +150,16 @@ static inline void rcu_init_nohz(void) { } ...@@ -167,20 +150,16 @@ static inline void rcu_init_nohz(void) { }
if (READ_ONCE((t)->rcu_tasks_holdout)) \ if (READ_ONCE((t)->rcu_tasks_holdout)) \
WRITE_ONCE((t)->rcu_tasks_holdout, false); \ WRITE_ONCE((t)->rcu_tasks_holdout, false); \
} while (0) } while (0)
#define rcu_note_voluntary_context_switch(t) \ #define rcu_note_voluntary_context_switch(t) rcu_tasks_qs(t)
do { \
rcu_all_qs(); \
rcu_tasks_qs(t); \
} while (0)
void call_rcu_tasks(struct rcu_head *head, rcu_callback_t func); void call_rcu_tasks(struct rcu_head *head, rcu_callback_t func);
void synchronize_rcu_tasks(void); void synchronize_rcu_tasks(void);
void exit_tasks_rcu_start(void); void exit_tasks_rcu_start(void);
void exit_tasks_rcu_finish(void); void exit_tasks_rcu_finish(void);
#else /* #ifdef CONFIG_TASKS_RCU */ #else /* #ifdef CONFIG_TASKS_RCU */
#define rcu_tasks_qs(t) do { } while (0) #define rcu_tasks_qs(t) do { } while (0)
#define rcu_note_voluntary_context_switch(t) rcu_all_qs() #define rcu_note_voluntary_context_switch(t) do { } while (0)
#define call_rcu_tasks call_rcu_sched #define call_rcu_tasks call_rcu
#define synchronize_rcu_tasks synchronize_sched #define synchronize_rcu_tasks synchronize_rcu
static inline void exit_tasks_rcu_start(void) { } static inline void exit_tasks_rcu_start(void) { }
static inline void exit_tasks_rcu_finish(void) { } static inline void exit_tasks_rcu_finish(void) { }
#endif /* #else #ifdef CONFIG_TASKS_RCU */ #endif /* #else #ifdef CONFIG_TASKS_RCU */
...@@ -325,9 +304,8 @@ static inline void rcu_preempt_sleep_check(void) { } ...@@ -325,9 +304,8 @@ static inline void rcu_preempt_sleep_check(void) { }
* Helper functions for rcu_dereference_check(), rcu_dereference_protected() * Helper functions for rcu_dereference_check(), rcu_dereference_protected()
* and rcu_assign_pointer(). Some of these could be folded into their * and rcu_assign_pointer(). Some of these could be folded into their
* callers, but they are left separate in order to ease introduction of * callers, but they are left separate in order to ease introduction of
* multiple flavors of pointers to match the multiple flavors of RCU * multiple pointers markings to match different RCU implementations
* (e.g., __rcu_bh, * __rcu_sched, and __srcu), should this make sense in * (e.g., __srcu), should this make sense in the future.
* the future.
*/ */
#ifdef __CHECKER__ #ifdef __CHECKER__
...@@ -686,14 +664,9 @@ static inline void rcu_read_unlock(void) ...@@ -686,14 +664,9 @@ static inline void rcu_read_unlock(void)
/** /**
* rcu_read_lock_bh() - mark the beginning of an RCU-bh critical section * rcu_read_lock_bh() - mark the beginning of an RCU-bh critical section
* *
* This is equivalent of rcu_read_lock(), but to be used when updates * This is equivalent of rcu_read_lock(), but also disables softirqs.
* are being done using call_rcu_bh() or synchronize_rcu_bh(). Since * Note that anything else that disables softirqs can also serve as
* both call_rcu_bh() and synchronize_rcu_bh() consider completion of a * an RCU read-side critical section.
* softirq handler to be a quiescent state, a process in RCU read-side
* critical section must be protected by disabling softirqs. Read-side
* critical sections in interrupt context can use just rcu_read_lock(),
* though this should at least be commented to avoid confusing people
* reading the code.
* *
* Note that rcu_read_lock_bh() and the matching rcu_read_unlock_bh() * Note that rcu_read_lock_bh() and the matching rcu_read_unlock_bh()
* must occur in the same context, for example, it is illegal to invoke * must occur in the same context, for example, it is illegal to invoke
...@@ -726,10 +699,9 @@ static inline void rcu_read_unlock_bh(void) ...@@ -726,10 +699,9 @@ static inline void rcu_read_unlock_bh(void)
/** /**
* rcu_read_lock_sched() - mark the beginning of a RCU-sched critical section * rcu_read_lock_sched() - mark the beginning of a RCU-sched critical section
* *
* This is equivalent of rcu_read_lock(), but to be used when updates * This is equivalent of rcu_read_lock(), but disables preemption.
* are being done using call_rcu_sched() or synchronize_rcu_sched(). * Read-side critical sections can also be introduced by anything else
* Read-side critical sections can also be introduced by anything that * that disables preemption, including local_irq_disable() and friends.
* disables preemption, including local_irq_disable() and friends.
* *
* Note that rcu_read_lock_sched() and the matching rcu_read_unlock_sched() * Note that rcu_read_lock_sched() and the matching rcu_read_unlock_sched()
* must occur in the same context, for example, it is illegal to invoke * must occur in the same context, for example, it is illegal to invoke
...@@ -885,4 +857,96 @@ static inline notrace void rcu_read_unlock_sched_notrace(void) ...@@ -885,4 +857,96 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
#endif /* #else #ifdef CONFIG_ARCH_WEAK_RELEASE_ACQUIRE */ #endif /* #else #ifdef CONFIG_ARCH_WEAK_RELEASE_ACQUIRE */
/* Has the specified rcu_head structure been handed to call_rcu()? */
/*
* rcu_head_init - Initialize rcu_head for rcu_head_after_call_rcu()
* @rhp: The rcu_head structure to initialize.
*
* If you intend to invoke rcu_head_after_call_rcu() to test whether a
* given rcu_head structure has already been passed to call_rcu(), then
* you must also invoke this rcu_head_init() function on it just after
* allocating that structure. Calls to this function must not race with
* calls to call_rcu(), rcu_head_after_call_rcu(), or callback invocation.
*/
static inline void rcu_head_init(struct rcu_head *rhp)
{
rhp->func = (rcu_callback_t)~0L;
}
/*
* rcu_head_after_call_rcu - Has this rcu_head been passed to call_rcu()?
* @rhp: The rcu_head structure to test.
* @func: The function passed to call_rcu() along with @rhp.
*
* Returns @true if the @rhp has been passed to call_rcu() with @func,
* and @false otherwise. Emits a warning in any other case, including
* the case where @rhp has already been invoked after a grace period.
* Calls to this function must not race with callback invocation. One way
* to avoid such races is to enclose the call to rcu_head_after_call_rcu()
* in an RCU read-side critical section that includes a read-side fetch
* of the pointer to the structure containing @rhp.
*/
static inline bool
rcu_head_after_call_rcu(struct rcu_head *rhp, rcu_callback_t f)
{
if (READ_ONCE(rhp->func) == f)
return true;
WARN_ON_ONCE(READ_ONCE(rhp->func) != (rcu_callback_t)~0L);
return false;
}
/* Transitional pre-consolidation compatibility definitions. */
static inline void synchronize_rcu_bh(void)
{
synchronize_rcu();
}
static inline void synchronize_rcu_bh_expedited(void)
{
synchronize_rcu_expedited();
}
static inline void call_rcu_bh(struct rcu_head *head, rcu_callback_t func)
{
call_rcu(head, func);
}
static inline void rcu_barrier_bh(void)
{
rcu_barrier();
}
static inline void synchronize_sched(void)
{
synchronize_rcu();
}
static inline void synchronize_sched_expedited(void)
{
synchronize_rcu_expedited();
}
static inline void call_rcu_sched(struct rcu_head *head, rcu_callback_t func)
{
call_rcu(head, func);
}
static inline void rcu_barrier_sched(void)
{
rcu_barrier();
}
static inline unsigned long get_state_synchronize_sched(void)
{
return get_state_synchronize_rcu();
}
static inline void cond_synchronize_sched(unsigned long oldstate)
{
cond_synchronize_rcu(oldstate);
}
#endif /* __LINUX_RCUPDATE_H */ #endif /* __LINUX_RCUPDATE_H */
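Reviewer note: the new rcu_head_init() and rcu_head_after_call_rcu() helpers rely on the ->func field moving through three values: the ~0L marker after initialization, the callback pointer after call_rcu(), and zero once the callback has been invoked (the __rcu_reclaim() hunk later in this commit does the zeroing). The userspace sketch below mocks that life cycle under stated assumptions: `mock_call_rcu()`, `mock_invoke_callbacks()`, `RCU_HEAD_UNQUEUED`, and the demo functions are hypothetical names, there is no real grace period, and the kernel's WARN_ON_ONCE() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};
typedef void (*rcu_callback_t)(struct rcu_head *head);

/* Hypothetical name for the ~0L marker that rcu_head_init() stores. */
#define RCU_HEAD_UNQUEUED ((rcu_callback_t)~0L)

static struct rcu_head *pending;	/* mock callback list */
static int freed;			/* number of callbacks invoked */

void rcu_head_init(struct rcu_head *rhp)
{
	rhp->func = RCU_HEAD_UNQUEUED;
}

/* The kernel version also warns when ->func is neither f nor the marker. */
bool rcu_head_after_call_rcu(struct rcu_head *rhp, rcu_callback_t f)
{
	return rhp->func == f;
}

/* Mock of call_rcu(): record the callback and queue the head. */
void mock_call_rcu(struct rcu_head *head, rcu_callback_t func)
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Pretend a grace period has elapsed and invoke everything queued. */
int mock_invoke_callbacks(void)
{
	int n = 0;

	while (pending) {
		struct rcu_head *head = pending;
		rcu_callback_t f = head->func;

		pending = head->next;
		head->func = (rcu_callback_t)0;	/* mirrors the __rcu_reclaim() change */
		f(head);
		n++;
	}
	return n;
}

void demo_free_cb(struct rcu_head *head)
{
	(void)head;
	freed++;
}

/* Returns a bitmask: bit 0 = not yet queued, bit 1 = queued, bit 2 = invoked. */
int demo_rcu_head_states(void)
{
	struct rcu_head rh;
	int state = 0;

	freed = 0;
	rcu_head_init(&rh);
	if (!rcu_head_after_call_rcu(&rh, demo_free_cb))
		state |= 1;
	mock_call_rcu(&rh, demo_free_cb);
	if (rcu_head_after_call_rcu(&rh, demo_free_cb))
		state |= 2;
	mock_invoke_callbacks();
	if (rh.func == NULL && freed == 1)
		state |= 4;
	return state;
}
```

demo_rcu_head_states() returns 7 when all three phases behave as described. In the kernel, the check is only meaningful if it cannot race with callback invocation, which is why the header comment suggests wrapping it in an RCU read-side critical section.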
...@@ -33,17 +33,17 @@ do { \ ...@@ -33,17 +33,17 @@ do { \
/** /**
* synchronize_rcu_mult - Wait concurrently for multiple grace periods * synchronize_rcu_mult - Wait concurrently for multiple grace periods
* @...: List of call_rcu() functions for the flavors to wait on. * @...: List of call_rcu() functions for different grace periods to wait on
* *
* This macro waits concurrently for multiple flavors of RCU grace periods. * This macro waits concurrently for multiple types of RCU grace periods.
* For example, synchronize_rcu_mult(call_rcu, call_rcu_bh) would wait * For example, synchronize_rcu_mult(call_rcu, call_rcu_tasks) would wait
* on concurrent RCU and RCU-bh grace periods. Waiting on a give SRCU * on concurrent RCU and RCU-tasks grace periods. Waiting on a give SRCU
* domain requires you to write a wrapper function for that SRCU domain's * domain requires you to write a wrapper function for that SRCU domain's
* call_srcu() function, supplying the corresponding srcu_struct. * call_srcu() function, supplying the corresponding srcu_struct.
* *
* If Tiny RCU, tell _wait_rcu_gp() not to bother waiting for RCU * If Tiny RCU, tell _wait_rcu_gp() does not bother waiting for RCU,
* or RCU-bh, given that anywhere synchronize_rcu_mult() can be called * given that anywhere synchronize_rcu_mult() can be called is automatically
* is automatically a grace period. * a grace period.
*/ */
#define synchronize_rcu_mult(...) \ #define synchronize_rcu_mult(...) \
_wait_rcu_gp(IS_ENABLED(CONFIG_TINY_RCU), __VA_ARGS__) _wait_rcu_gp(IS_ENABLED(CONFIG_TINY_RCU), __VA_ARGS__)
......
...@@ -27,12 +27,6 @@ ...@@ -27,12 +27,6 @@
#include <linux/ktime.h> #include <linux/ktime.h>
struct rcu_dynticks;
static inline int rcu_dynticks_snap(struct rcu_dynticks *rdtp)
{
return 0;
}
/* Never flag non-existent other CPUs! */ /* Never flag non-existent other CPUs! */
static inline bool rcu_eqs_special_set(int cpu) { return false; } static inline bool rcu_eqs_special_set(int cpu) { return false; }
...@@ -46,53 +40,28 @@ static inline void cond_synchronize_rcu(unsigned long oldstate) ...@@ -46,53 +40,28 @@ static inline void cond_synchronize_rcu(unsigned long oldstate)
might_sleep(); might_sleep();
} }
static inline unsigned long get_state_synchronize_sched(void) extern void rcu_barrier(void);
{
return 0;
}
static inline void cond_synchronize_sched(unsigned long oldstate)
{
might_sleep();
}
extern void rcu_barrier_bh(void);
extern void rcu_barrier_sched(void);
static inline void synchronize_rcu_expedited(void) static inline void synchronize_rcu_expedited(void)
{ {
synchronize_sched(); /* Only one CPU, so pretty fast anyway!!! */ synchronize_rcu();
} }
static inline void rcu_barrier(void) static inline void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
{ {
rcu_barrier_sched(); /* Only one CPU, so only one list of callbacks! */ call_rcu(head, func);
}
static inline void synchronize_rcu_bh(void)
{
synchronize_sched();
}
static inline void synchronize_rcu_bh_expedited(void)
{
synchronize_sched();
} }
static inline void synchronize_sched_expedited(void) void rcu_qs(void);
{
synchronize_sched();
}
static inline void kfree_call_rcu(struct rcu_head *head, static inline void rcu_softirq_qs(void)
rcu_callback_t func)
{ {
call_rcu(head, func); rcu_qs();
} }
#define rcu_note_context_switch(preempt) \ #define rcu_note_context_switch(preempt) \
do { \ do { \
rcu_sched_qs(); \ rcu_qs(); \
rcu_tasks_qs(current); \ rcu_tasks_qs(current); \
} while (0) } while (0)
...@@ -108,6 +77,7 @@ static inline int rcu_needs_cpu(u64 basemono, u64 *nextevt) ...@@ -108,6 +77,7 @@ static inline int rcu_needs_cpu(u64 basemono, u64 *nextevt)
*/ */
static inline void rcu_virt_note_context_switch(int cpu) { } static inline void rcu_virt_note_context_switch(int cpu) { }
static inline void rcu_cpu_stall_reset(void) { } static inline void rcu_cpu_stall_reset(void) { }
static inline int rcu_jiffies_till_stall_check(void) { return 21 * HZ; }
static inline void rcu_idle_enter(void) { } static inline void rcu_idle_enter(void) { }
static inline void rcu_idle_exit(void) { } static inline void rcu_idle_exit(void) { }
static inline void rcu_irq_enter(void) { } static inline void rcu_irq_enter(void) { }
...@@ -115,6 +85,11 @@ static inline void rcu_irq_exit_irqson(void) { } ...@@ -115,6 +85,11 @@ static inline void rcu_irq_exit_irqson(void) { }
static inline void rcu_irq_enter_irqson(void) { } static inline void rcu_irq_enter_irqson(void) { }
static inline void rcu_irq_exit(void) { } static inline void rcu_irq_exit(void) { }
static inline void exit_rcu(void) { } static inline void exit_rcu(void) { }
static inline bool rcu_preempt_need_deferred_qs(struct task_struct *t)
{
return false;
}
static inline void rcu_preempt_deferred_qs(struct task_struct *t) { }
#ifdef CONFIG_SRCU #ifdef CONFIG_SRCU
void rcu_scheduler_starting(void); void rcu_scheduler_starting(void);
#else /* #ifndef CONFIG_SRCU */ #else /* #ifndef CONFIG_SRCU */
......
...@@ -30,6 +30,7 @@ ...@@ -30,6 +30,7 @@
#ifndef __LINUX_RCUTREE_H #ifndef __LINUX_RCUTREE_H
#define __LINUX_RCUTREE_H #define __LINUX_RCUTREE_H
void rcu_softirq_qs(void);
void rcu_note_context_switch(bool preempt); void rcu_note_context_switch(bool preempt);
int rcu_needs_cpu(u64 basem, u64 *nextevt); int rcu_needs_cpu(u64 basem, u64 *nextevt);
void rcu_cpu_stall_reset(void); void rcu_cpu_stall_reset(void);
...@@ -44,41 +45,13 @@ static inline void rcu_virt_note_context_switch(int cpu) ...@@ -44,41 +45,13 @@ static inline void rcu_virt_note_context_switch(int cpu)
rcu_note_context_switch(false); rcu_note_context_switch(false);
} }
void synchronize_rcu_bh(void);
void synchronize_sched_expedited(void);
void synchronize_rcu_expedited(void); void synchronize_rcu_expedited(void);
void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func); void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
/**
* synchronize_rcu_bh_expedited - Brute-force RCU-bh grace period
*
* Wait for an RCU-bh grace period to elapse, but use a "big hammer"
* approach to force the grace period to end quickly. This consumes
* significant time on all CPUs and is unfriendly to real-time workloads,
* so is thus not recommended for any sort of common-case code. In fact,
* if you are using synchronize_rcu_bh_expedited() in a loop, please
* restructure your code to batch your updates, and then use a single
* synchronize_rcu_bh() instead.
*
* Note that it is illegal to call this function while holding any lock
* that is acquired by a CPU-hotplug notifier. And yes, it is also illegal
* to call this function from a CPU-hotplug notifier. Failing to observe
* these restriction will result in deadlock.
*/
static inline void synchronize_rcu_bh_expedited(void)
{
synchronize_sched_expedited();
}
void rcu_barrier(void); void rcu_barrier(void);
void rcu_barrier_bh(void);
void rcu_barrier_sched(void);
bool rcu_eqs_special_set(int cpu); bool rcu_eqs_special_set(int cpu);
unsigned long get_state_synchronize_rcu(void); unsigned long get_state_synchronize_rcu(void);
void cond_synchronize_rcu(unsigned long oldstate); void cond_synchronize_rcu(unsigned long oldstate);
unsigned long get_state_synchronize_sched(void);
void cond_synchronize_sched(unsigned long oldstate);
void rcu_idle_enter(void); void rcu_idle_enter(void);
void rcu_idle_exit(void); void rcu_idle_exit(void);
...@@ -93,7 +66,9 @@ void rcu_scheduler_starting(void); ...@@ -93,7 +66,9 @@ void rcu_scheduler_starting(void);
extern int rcu_scheduler_active __read_mostly; extern int rcu_scheduler_active __read_mostly;
void rcu_end_inkernel_boot(void); void rcu_end_inkernel_boot(void);
bool rcu_is_watching(void); bool rcu_is_watching(void);
#ifndef CONFIG_PREEMPT
void rcu_all_qs(void); void rcu_all_qs(void);
#endif
/* RCUtree hotplug events */ /* RCUtree hotplug events */
int rcutree_prepare_cpu(unsigned int cpu); int rcutree_prepare_cpu(unsigned int cpu);
......
...@@ -571,12 +571,8 @@ union rcu_special { ...@@ -571,12 +571,8 @@ union rcu_special {
struct { struct {
u8 blocked; u8 blocked;
u8 need_qs; u8 need_qs;
u8 exp_need_qs;
/* Otherwise the compiler can store garbage here: */
u8 pad;
} b; /* Bits. */ } b; /* Bits. */
u32 s; /* Set of bits. */ u16 s; /* Set of bits. */
}; };
enum perf_event_task_context { enum perf_event_task_context {
......
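Reviewer note: with exp_need_qs removed from union rcu_special, the two remaining u8 fields exactly fill a 16-bit word, so the explicit pad byte is no longer needed and the aggregate view shrinks from u32 to u16. The userspace sketch below illustrates that layout with fixed-width types from <stdint.h>. The names `rcu_special_sketch` and `special_demo()` are hypothetical, and the size check assumes a conventional ABI with no padding between adjacent uint8_t members.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mirror of the post-commit layout (names are illustrative). */
union rcu_special_sketch {
	struct {
		uint8_t blocked;
		uint8_t need_qs;
	} b;		/* Bits. */
	uint16_t s;	/* Set of bits. */
};

/* Returns 1 if the aggregate view behaves as the layout suggests. */
int special_demo(void)
{
	union rcu_special_sketch sp = { .s = 0 };
	int ok = 1;

	/* Two bytes of flags, two bytes of aggregate: no hidden padding. */
	ok &= (sizeof(sp) == 2);

	/* Setting any single flag makes the aggregate nonzero. */
	sp.b.need_qs = 1;
	ok &= (sp.s != 0);

	/* Clearing the aggregate clears every flag in one store. */
	sp.b.blocked = 1;
	sp.s = 0;
	ok &= (sp.b.blocked == 0 && sp.b.need_qs == 0);

	return ok;
}
```

The point of the aggregate member is that a single test of `s` answers "is any special handling pending?" without checking each flag separately.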
...@@ -105,12 +105,13 @@ struct srcu_struct { ...@@ -105,12 +105,13 @@ struct srcu_struct {
#define SRCU_STATE_SCAN2 2 #define SRCU_STATE_SCAN2 2
#define __SRCU_STRUCT_INIT(name, pcpu_name) \ #define __SRCU_STRUCT_INIT(name, pcpu_name) \
{ \ { \
.sda = &pcpu_name, \ .sda = &pcpu_name, \
.lock = __SPIN_LOCK_UNLOCKED(name.lock), \ .lock = __SPIN_LOCK_UNLOCKED(name.lock), \
.srcu_gp_seq_needed = 0 - 1, \ .srcu_gp_seq_needed = -1UL, \
__SRCU_DEP_MAP_INIT(name) \ .work = __DELAYED_WORK_INITIALIZER(name.work, NULL, 0), \
} __SRCU_DEP_MAP_INIT(name) \
}
/* /*
* Define and initialize a srcu struct at build time. * Define and initialize a srcu struct at build time.
......
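Reviewer note: in __SRCU_STRUCT_INIT(), the change from `0 - 1` to `-1UL` for .srcu_gp_seq_needed appears to be a readability change only, assuming the field is an unsigned long. Both expressions yield the all-ones value once stored in an unsigned long. The short userspace check below demonstrates this; the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Old spelling: int expression 0 - 1, converted on assignment. */
unsigned long seq_needed_old(void)
{
	unsigned long v = 0 - 1;

	return v;
}

/* New spelling: unsigned long arithmetic from the start. */
unsigned long seq_needed_new(void)
{
	unsigned long v = -1UL;

	return v;
}
```

Both functions return ULONG_MAX, because converting the int value -1 to unsigned long wraps modulo 2^N just as negating 1UL does.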
...@@ -77,7 +77,7 @@ void torture_shutdown_absorb(const char *title); ...@@ -77,7 +77,7 @@ void torture_shutdown_absorb(const char *title);
int torture_shutdown_init(int ssecs, void (*cleanup)(void)); int torture_shutdown_init(int ssecs, void (*cleanup)(void));
/* Task stuttering, which forces load/no-load transitions. */ /* Task stuttering, which forces load/no-load transitions. */
void stutter_wait(const char *title); bool stutter_wait(const char *title);
int torture_stutter_init(int s); int torture_stutter_init(int s);
/* Initialization and cleanup. */ /* Initialization and cleanup. */
......
...@@ -393,9 +393,8 @@ TRACE_EVENT(rcu_quiescent_state_report, ...@@ -393,9 +393,8 @@ TRACE_EVENT(rcu_quiescent_state_report,
* Tracepoint for quiescent states detected by force_quiescent_state(). * Tracepoint for quiescent states detected by force_quiescent_state().
* These trace events include the type of RCU, the grace-period number * These trace events include the type of RCU, the grace-period number
* that was blocked by the CPU, the CPU itself, and the type of quiescent * that was blocked by the CPU, the CPU itself, and the type of quiescent
* state, which can be "dti" for dyntick-idle mode, "kick" when kicking * state, which can be "dti" for dyntick-idle mode or "kick" when kicking
* a CPU that has been in dyntick-idle mode for too long, or "rqc" if the * a CPU that has been in dyntick-idle mode for too long.
* CPU got a quiescent state via its rcu_qs_ctr.
*/ */
TRACE_EVENT(rcu_fqs, TRACE_EVENT(rcu_fqs,
...@@ -705,20 +704,20 @@ TRACE_EVENT(rcu_torture_read, ...@@ -705,20 +704,20 @@ TRACE_EVENT(rcu_torture_read,
); );
/* /*
* Tracepoint for _rcu_barrier() execution. The string "s" describes * Tracepoint for rcu_barrier() execution. The string "s" describes
* the _rcu_barrier phase: * the rcu_barrier phase:
* "Begin": _rcu_barrier() started. * "Begin": rcu_barrier() started.
* "EarlyExit": _rcu_barrier() piggybacked, thus early exit. * "EarlyExit": rcu_barrier() piggybacked, thus early exit.
* "Inc1": _rcu_barrier() piggyback check counter incremented. * "Inc1": rcu_barrier() piggyback check counter incremented.
* "OfflineNoCB": _rcu_barrier() found callback on never-online CPU * "OfflineNoCB": rcu_barrier() found callback on never-online CPU
* "OnlineNoCB": _rcu_barrier() found online no-CBs CPU. * "OnlineNoCB": rcu_barrier() found online no-CBs CPU.
* "OnlineQ": _rcu_barrier() found online CPU with callbacks. * "OnlineQ": rcu_barrier() found online CPU with callbacks.
* "OnlineNQ": _rcu_barrier() found online CPU, no callbacks. * "OnlineNQ": rcu_barrier() found online CPU, no callbacks.
* "IRQ": An rcu_barrier_callback() callback posted on remote CPU. * "IRQ": An rcu_barrier_callback() callback posted on remote CPU.
* "IRQNQ": An rcu_barrier_callback() callback found no callbacks. * "IRQNQ": An rcu_barrier_callback() callback found no callbacks.
* "CB": An rcu_barrier_callback() invoked a callback, not the last. * "CB": An rcu_barrier_callback() invoked a callback, not the last.
* "LastCB": An rcu_barrier_callback() invoked the last callback. * "LastCB": An rcu_barrier_callback() invoked the last callback.
* "Inc2": _rcu_barrier() piggyback check counter incremented. * "Inc2": rcu_barrier() piggyback check counter incremented.
* The "cpu" argument is the CPU or -1 if meaningless, the "cnt" argument * The "cpu" argument is the CPU or -1 if meaningless, the "cnt" argument
* is the count of remaining callbacks, and "done" is the piggybacking count. * is the count of remaining callbacks, and "done" is the piggybacking count.
*/ */
......
...@@ -196,7 +196,7 @@ config RCU_BOOST ...@@ -196,7 +196,7 @@ config RCU_BOOST
This option boosts the priority of preempted RCU readers that This option boosts the priority of preempted RCU readers that
block the current preemptible RCU grace period for too long. block the current preemptible RCU grace period for too long.
This option also prevents heavy loads from blocking RCU This option also prevents heavy loads from blocking RCU
callback invocation for all flavors of RCU. callback invocation.
Say Y here if you are working with real-time apps or heavy loads Say Y here if you are working with real-time apps or heavy loads
Say N here if you are unsure. Say N here if you are unsure.
...@@ -225,12 +225,12 @@ config RCU_NOCB_CPU ...@@ -225,12 +225,12 @@ config RCU_NOCB_CPU
callback invocation to energy-efficient CPUs in battery-powered callback invocation to energy-efficient CPUs in battery-powered
asymmetric multiprocessors. asymmetric multiprocessors.
This option offloads callback invocation from the set of This option offloads callback invocation from the set of CPUs
CPUs specified at boot time by the rcu_nocbs parameter. specified at boot time by the rcu_nocbs parameter. For each
For each such CPU, a kthread ("rcuox/N") will be created to such CPU, a kthread ("rcuox/N") will be created to invoke
invoke callbacks, where the "N" is the CPU being offloaded, callbacks, where the "N" is the CPU being offloaded, and where
and where the "x" is "b" for RCU-bh, "p" for RCU-preempt, and the "p" for RCU-preempt (PREEMPT kernels) and "s" for RCU-sched
"s" for RCU-sched. Nothing prevents this kthread from running (!PREEMPT kernels). Nothing prevents this kthread from running
on the specified CPUs, but (1) the kthreads may be preempted on the specified CPUs, but (1) the kthreads may be preempted
between each callback, and (2) affinity or cgroups can be used between each callback, and (2) affinity or cgroups can be used
to force the kthreads to run on whatever set of CPUs is desired. to force the kthreads to run on whatever set of CPUs is desired.
......
...@@ -176,8 +176,9 @@ static inline unsigned long rcu_seq_diff(unsigned long new, unsigned long old) ...@@ -176,8 +176,9 @@ static inline unsigned long rcu_seq_diff(unsigned long new, unsigned long old)
/* /*
* debug_rcu_head_queue()/debug_rcu_head_unqueue() are used internally * debug_rcu_head_queue()/debug_rcu_head_unqueue() are used internally
* by call_rcu() and rcu callback execution, and are therefore not part of the * by call_rcu() and rcu callback execution, and are therefore not part
* RCU API. Leaving in rcupdate.h because they are used by all RCU flavors. * of the RCU API. These are in rcupdate.h because they are used by all
* RCU implementations.
*/ */
#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
...@@ -223,6 +224,7 @@ void kfree(const void *); ...@@ -223,6 +224,7 @@ void kfree(const void *);
*/ */
static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head) static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
{ {
rcu_callback_t f;
unsigned long offset = (unsigned long)head->func; unsigned long offset = (unsigned long)head->func;
rcu_lock_acquire(&rcu_callback_map); rcu_lock_acquire(&rcu_callback_map);
...@@ -233,7 +235,9 @@ static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head) ...@@ -233,7 +235,9 @@ static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
return true; return true;
} else { } else {
RCU_TRACE(trace_rcu_invoke_callback(rn, head);) RCU_TRACE(trace_rcu_invoke_callback(rn, head);)
head->func(head); f = head->func;
WRITE_ONCE(head->func, (rcu_callback_t)0L);
f(head);
rcu_lock_release(&rcu_callback_map); rcu_lock_release(&rcu_callback_map);
return false; return false;
} }
...@@ -328,40 +332,35 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt) ...@@ -328,40 +332,35 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
} }
} }
/* Returns first leaf rcu_node of the specified RCU flavor. */ /* Returns a pointer to the first leaf rcu_node structure. */
#define rcu_first_leaf_node(rsp) ((rsp)->level[rcu_num_lvls - 1]) #define rcu_first_leaf_node() (rcu_state.level[rcu_num_lvls - 1])
/* Is this rcu_node a leaf? */ /* Is this rcu_node a leaf? */
#define rcu_is_leaf_node(rnp) ((rnp)->level == rcu_num_lvls - 1) #define rcu_is_leaf_node(rnp) ((rnp)->level == rcu_num_lvls - 1)
/* Is this rcu_node the last leaf? */ /* Is this rcu_node the last leaf? */
#define rcu_is_last_leaf_node(rsp, rnp) ((rnp) == &(rsp)->node[rcu_num_nodes - 1]) #define rcu_is_last_leaf_node(rnp) ((rnp) == &rcu_state.node[rcu_num_nodes - 1])
/* /*
* Do a full breadth-first scan of the rcu_node structures for the * Do a full breadth-first scan of the {s,}rcu_node structures for the
* specified rcu_state structure. * specified state structure (for SRCU) or the only rcu_state structure
* (for RCU).
*/ */
#define rcu_for_each_node_breadth_first(rsp, rnp) \ #define srcu_for_each_node_breadth_first(sp, rnp) \
for ((rnp) = &(rsp)->node[0]; \ for ((rnp) = &(sp)->node[0]; \
(rnp) < &(rsp)->node[rcu_num_nodes]; (rnp)++) (rnp) < &(sp)->node[rcu_num_nodes]; (rnp)++)
#define rcu_for_each_node_breadth_first(rnp) \
srcu_for_each_node_breadth_first(&rcu_state, rnp)
/* /*
* Do a breadth-first scan of the non-leaf rcu_node structures for the * Scan the leaves of the rcu_node hierarchy for the rcu_state structure.
* specified rcu_state structure. Note that if there is a singleton * Note that if there is a singleton rcu_node tree with but one rcu_node
* rcu_node tree with but one rcu_node structure, this loop is a no-op. * structure, this loop -will- visit the rcu_node structure. It is still
* a leaf node, even if it is also the root node.
*/ */
#define rcu_for_each_nonleaf_node_breadth_first(rsp, rnp) \ #define rcu_for_each_leaf_node(rnp) \
for ((rnp) = &(rsp)->node[0]; !rcu_is_leaf_node(rsp, rnp); (rnp)++) for ((rnp) = rcu_first_leaf_node(); \
(rnp) < &rcu_state.node[rcu_num_nodes]; (rnp)++)
/*
* Scan the leaves of the rcu_node hierarchy for the specified rcu_state
* structure. Note that if there is a singleton rcu_node tree with but
* one rcu_node structure, this loop -will- visit the rcu_node structure.
* It is still a leaf node, even if it is also the root node.
*/
#define rcu_for_each_leaf_node(rsp, rnp) \
for ((rnp) = rcu_first_leaf_node(rsp); \
(rnp) < &(rsp)->node[rcu_num_nodes]; (rnp)++)
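The macros above drop the rsp argument and walk the single global rcu_state instead. Because the node array is stored in breadth-first order, the leaves are the tail of the array, so a leaf scan can start at the first leaf and run to the end. A userspace sketch of that layout, assuming a hypothetical two-level tree (`NUM_LVLS`, `NUM_NODES`, `struct node`, and `state` are illustrative names, not kernel identifiers):

```c
#include <assert.h>
#include <stddef.h>

#define NUM_LVLS  2	/* Hypothetical: one root level plus one leaf level. */
#define NUM_NODES 5	/* Hypothetical: 1 root + 4 leaves. */

struct node {
	int level;
};

/* Single global state, mirroring the one-flavor rcu_state in the diff. */
static struct state {
	struct node node[NUM_NODES];	/* Stored in breadth-first order. */
	struct node *level[NUM_LVLS];	/* First node of each level. */
} state;

#define first_leaf_node() (state.level[NUM_LVLS - 1])
#define for_each_node_breadth_first(np) \
	for ((np) = &state.node[0]; (np) < &state.node[NUM_NODES]; (np)++)
#define for_each_leaf_node(np) \
	for ((np) = first_leaf_node(); (np) < &state.node[NUM_NODES]; (np)++)

/* Build the root at index 0 and the leaves at indexes 1..4. */
static void state_init(void)
{
	int i;

	state.level[0] = &state.node[0];
	state.level[1] = &state.node[1];
	for (i = 0; i < NUM_NODES; i++)
		state.node[i].level = (i == 0) ? 0 : 1;
}

/* Count every node, root included. */
static int count_all(void)
{
	struct node *np;
	int n = 0;

	for_each_node_breadth_first(np)
		n++;
	return n;
}

/* Count only the leaves, checking that each visited node is a leaf. */
static int count_leaves(void)
{
	struct node *np;
	int n = 0;

	for_each_leaf_node(np) {
		assert(np->level == NUM_LVLS - 1);
		n++;
	}
	return n;
}
```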
/* /*
* Iterate over all possible CPUs in a leaf RCU node. * Iterate over all possible CPUs in a leaf RCU node.
...@@ -435,6 +434,12 @@ do { \ ...@@ -435,6 +434,12 @@ do { \
#endif /* #if defined(SRCU) || !defined(TINY_RCU) */ #endif /* #if defined(SRCU) || !defined(TINY_RCU) */
#ifdef CONFIG_SRCU
void srcu_init(void);
#else /* #ifdef CONFIG_SRCU */
static inline void srcu_init(void) { }
#endif /* #else #ifdef CONFIG_SRCU */
#ifdef CONFIG_TINY_RCU #ifdef CONFIG_TINY_RCU
/* Tiny RCU doesn't expedite, as its purpose in life is instead to be tiny. */ /* Tiny RCU doesn't expedite, as its purpose in life is instead to be tiny. */
static inline bool rcu_gp_is_normal(void) { return true; } static inline bool rcu_gp_is_normal(void) { return true; }
...@@ -515,29 +520,19 @@ void srcutorture_get_gp_data(enum rcutorture_type test_type, ...@@ -515,29 +520,19 @@ void srcutorture_get_gp_data(enum rcutorture_type test_type,
#ifdef CONFIG_TINY_RCU #ifdef CONFIG_TINY_RCU
static inline unsigned long rcu_get_gp_seq(void) { return 0; } static inline unsigned long rcu_get_gp_seq(void) { return 0; }
static inline unsigned long rcu_bh_get_gp_seq(void) { return 0; }
static inline unsigned long rcu_sched_get_gp_seq(void) { return 0; }
static inline unsigned long rcu_exp_batches_completed(void) { return 0; } static inline unsigned long rcu_exp_batches_completed(void) { return 0; }
static inline unsigned long rcu_exp_batches_completed_sched(void) { return 0; }
static inline unsigned long static inline unsigned long
srcu_batches_completed(struct srcu_struct *sp) { return 0; } srcu_batches_completed(struct srcu_struct *sp) { return 0; }
static inline void rcu_force_quiescent_state(void) { } static inline void rcu_force_quiescent_state(void) { }
static inline void rcu_bh_force_quiescent_state(void) { }
static inline void rcu_sched_force_quiescent_state(void) { }
static inline void show_rcu_gp_kthreads(void) { } static inline void show_rcu_gp_kthreads(void) { }
static inline int rcu_get_gp_kthreads_prio(void) { return 0; } static inline int rcu_get_gp_kthreads_prio(void) { return 0; }
#else /* #ifdef CONFIG_TINY_RCU */ #else /* #ifdef CONFIG_TINY_RCU */
unsigned long rcu_get_gp_seq(void); unsigned long rcu_get_gp_seq(void);
unsigned long rcu_bh_get_gp_seq(void);
unsigned long rcu_sched_get_gp_seq(void);
unsigned long rcu_exp_batches_completed(void); unsigned long rcu_exp_batches_completed(void);
unsigned long rcu_exp_batches_completed_sched(void);
unsigned long srcu_batches_completed(struct srcu_struct *sp); unsigned long srcu_batches_completed(struct srcu_struct *sp);
void show_rcu_gp_kthreads(void); void show_rcu_gp_kthreads(void);
int rcu_get_gp_kthreads_prio(void); int rcu_get_gp_kthreads_prio(void);
void rcu_force_quiescent_state(void); void rcu_force_quiescent_state(void);
void rcu_bh_force_quiescent_state(void);
void rcu_sched_force_quiescent_state(void);
extern struct workqueue_struct *rcu_gp_wq; extern struct workqueue_struct *rcu_gp_wq;
extern struct workqueue_struct *rcu_par_gp_wq; extern struct workqueue_struct *rcu_par_gp_wq;
#endif /* #else #ifdef CONFIG_TINY_RCU */ #endif /* #else #ifdef CONFIG_TINY_RCU */
......
...@@ -189,36 +189,6 @@ static struct rcu_perf_ops rcu_ops = { ...@@ -189,36 +189,6 @@ static struct rcu_perf_ops rcu_ops = {
.name = "rcu" .name = "rcu"
}; };
/*
* Definitions for rcu_bh perf testing.
*/
static int rcu_bh_perf_read_lock(void) __acquires(RCU_BH)
{
rcu_read_lock_bh();
return 0;
}
static void rcu_bh_perf_read_unlock(int idx) __releases(RCU_BH)
{
rcu_read_unlock_bh();
}
static struct rcu_perf_ops rcu_bh_ops = {
.ptype = RCU_BH_FLAVOR,
.init = rcu_sync_perf_init,
.readlock = rcu_bh_perf_read_lock,
.readunlock = rcu_bh_perf_read_unlock,
.get_gp_seq = rcu_bh_get_gp_seq,
.gp_diff = rcu_seq_diff,
.exp_completed = rcu_exp_batches_completed_sched,
.async = call_rcu_bh,
.gp_barrier = rcu_barrier_bh,
.sync = synchronize_rcu_bh,
.exp_sync = synchronize_rcu_bh_expedited,
.name = "rcu_bh"
};
/* /*
* Definitions for srcu perf testing. * Definitions for srcu perf testing.
*/ */
...@@ -305,36 +275,6 @@ static struct rcu_perf_ops srcud_ops = { ...@@ -305,36 +275,6 @@ static struct rcu_perf_ops srcud_ops = {
.name = "srcud" .name = "srcud"
}; };
/*
* Definitions for sched perf testing.
*/
static int sched_perf_read_lock(void)
{
preempt_disable();
return 0;
}
static void sched_perf_read_unlock(int idx)
{
preempt_enable();
}
static struct rcu_perf_ops sched_ops = {
.ptype = RCU_SCHED_FLAVOR,
.init = rcu_sync_perf_init,
.readlock = sched_perf_read_lock,
.readunlock = sched_perf_read_unlock,
.get_gp_seq = rcu_sched_get_gp_seq,
.gp_diff = rcu_seq_diff,
.exp_completed = rcu_exp_batches_completed_sched,
.async = call_rcu_sched,
.gp_barrier = rcu_barrier_sched,
.sync = synchronize_sched,
.exp_sync = synchronize_sched_expedited,
.name = "sched"
};
/* /*
* Definitions for RCU-tasks perf testing. * Definitions for RCU-tasks perf testing.
*/ */
...@@ -611,7 +551,7 @@ rcu_perf_cleanup(void) ...@@ -611,7 +551,7 @@ rcu_perf_cleanup(void)
kfree(writer_n_durations); kfree(writer_n_durations);
} }
/* Do flavor-specific cleanup operations. */ /* Do torture-type-specific cleanup operations. */
if (cur_ops->cleanup != NULL) if (cur_ops->cleanup != NULL)
cur_ops->cleanup(); cur_ops->cleanup();
...@@ -661,8 +601,7 @@ rcu_perf_init(void) ...@@ -661,8 +601,7 @@ rcu_perf_init(void)
long i; long i;
int firsterr = 0; int firsterr = 0;
static struct rcu_perf_ops *perf_ops[] = { static struct rcu_perf_ops *perf_ops[] = {
&rcu_ops, &rcu_bh_ops, &srcu_ops, &srcud_ops, &sched_ops, &rcu_ops, &srcu_ops, &srcud_ops, &tasks_ops,
&tasks_ops,
}; };
if (!torture_init_begin(perf_type, verbose)) if (!torture_init_begin(perf_type, verbose))
...@@ -680,6 +619,7 @@ rcu_perf_init(void) ...@@ -680,6 +619,7 @@ rcu_perf_init(void)
for (i = 0; i < ARRAY_SIZE(perf_ops); i++) for (i = 0; i < ARRAY_SIZE(perf_ops); i++)
pr_cont(" %s", perf_ops[i]->name); pr_cont(" %s", perf_ops[i]->name);
pr_cont("\n"); pr_cont("\n");
WARN_ON(!IS_MODULE(CONFIG_RCU_PERF_TEST));
firsterr = -EINVAL; firsterr = -EINVAL;
goto unwind; goto unwind;
} }
......
...@@ -66,15 +66,19 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@us.ibm.com> and Josh Triplett <josh@jos ...@@ -66,15 +66,19 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@us.ibm.com> and Josh Triplett <josh@jos
/* Bits for ->extendables field, extendables param, and related definitions. */ /* Bits for ->extendables field, extendables param, and related definitions. */
#define RCUTORTURE_RDR_SHIFT 8 /* Put SRCU index in upper bits. */ #define RCUTORTURE_RDR_SHIFT 8 /* Put SRCU index in upper bits. */
#define RCUTORTURE_RDR_MASK ((1 << RCUTORTURE_RDR_SHIFT) - 1) #define RCUTORTURE_RDR_MASK ((1 << RCUTORTURE_RDR_SHIFT) - 1)
#define RCUTORTURE_RDR_BH 0x1 /* Extend readers by disabling bh. */ #define RCUTORTURE_RDR_BH 0x01 /* Extend readers by disabling bh. */
#define RCUTORTURE_RDR_IRQ 0x2 /* ... disabling interrupts. */ #define RCUTORTURE_RDR_IRQ 0x02 /* ... disabling interrupts. */
#define RCUTORTURE_RDR_PREEMPT 0x4 /* ... disabling preemption. */ #define RCUTORTURE_RDR_PREEMPT 0x04 /* ... disabling preemption. */
#define RCUTORTURE_RDR_RCU 0x8 /* ... entering another RCU reader. */ #define RCUTORTURE_RDR_RBH 0x08 /* ... rcu_read_lock_bh(). */
#define RCUTORTURE_RDR_NBITS 4 /* Number of bits defined above. */ #define RCUTORTURE_RDR_SCHED 0x10 /* ... rcu_read_lock_sched(). */
#define RCUTORTURE_MAX_EXTEND (RCUTORTURE_RDR_BH | RCUTORTURE_RDR_IRQ | \ #define RCUTORTURE_RDR_RCU 0x20 /* ... entering another RCU reader. */
RCUTORTURE_RDR_PREEMPT) #define RCUTORTURE_RDR_NBITS 6 /* Number of bits defined above. */
#define RCUTORTURE_MAX_EXTEND \
(RCUTORTURE_RDR_BH | RCUTORTURE_RDR_IRQ | RCUTORTURE_RDR_PREEMPT | \
RCUTORTURE_RDR_RBH | RCUTORTURE_RDR_SCHED)
#define RCUTORTURE_RDR_MAX_LOOPS 0x7 /* Maximum reader extensions. */ #define RCUTORTURE_RDR_MAX_LOOPS 0x7 /* Maximum reader extensions. */
/* Must be power of two minus one. */ /* Must be power of two minus one. */
#define RCUTORTURE_RDR_MAX_SEGS (RCUTORTURE_RDR_MAX_LOOPS + 3)
torture_param(int, cbflood_inter_holdoff, HZ, torture_param(int, cbflood_inter_holdoff, HZ,
"Holdoff between floods (jiffies)"); "Holdoff between floods (jiffies)");
...@@ -89,6 +93,12 @@ torture_param(int, fqs_duration, 0, ...@@ -89,6 +93,12 @@ torture_param(int, fqs_duration, 0,
"Duration of fqs bursts (us), 0 to disable"); "Duration of fqs bursts (us), 0 to disable");
torture_param(int, fqs_holdoff, 0, "Holdoff time within fqs bursts (us)"); torture_param(int, fqs_holdoff, 0, "Holdoff time within fqs bursts (us)");
torture_param(int, fqs_stutter, 3, "Wait time between fqs bursts (s)"); torture_param(int, fqs_stutter, 3, "Wait time between fqs bursts (s)");
torture_param(bool, fwd_progress, 1, "Test grace-period forward progress");
torture_param(int, fwd_progress_div, 4, "Fraction of CPU stall to wait");
torture_param(int, fwd_progress_holdoff, 60,
"Time between forward-progress tests (s)");
torture_param(bool, fwd_progress_need_resched, 1,
"Hide cond_resched() behind need_resched()");
torture_param(bool, gp_cond, false, "Use conditional/async GP wait primitives"); torture_param(bool, gp_cond, false, "Use conditional/async GP wait primitives");
torture_param(bool, gp_exp, false, "Use expedited GP wait primitives"); torture_param(bool, gp_exp, false, "Use expedited GP wait primitives");
torture_param(bool, gp_normal, false, torture_param(bool, gp_normal, false,
...@@ -125,7 +135,7 @@ torture_param(int, verbose, 1, ...@@ -125,7 +135,7 @@ torture_param(int, verbose, 1,
static char *torture_type = "rcu"; static char *torture_type = "rcu";
module_param(torture_type, charp, 0444); module_param(torture_type, charp, 0444);
MODULE_PARM_DESC(torture_type, "Type of RCU to torture (rcu, rcu_bh, ...)"); MODULE_PARM_DESC(torture_type, "Type of RCU to torture (rcu, srcu, ...)");
static int nrealreaders; static int nrealreaders;
static int ncbflooders; static int ncbflooders;
...@@ -137,6 +147,7 @@ static struct task_struct **cbflood_task; ...@@ -137,6 +147,7 @@ static struct task_struct **cbflood_task;
static struct task_struct *fqs_task; static struct task_struct *fqs_task;
static struct task_struct *boost_tasks[NR_CPUS]; static struct task_struct *boost_tasks[NR_CPUS];
static struct task_struct *stall_task; static struct task_struct *stall_task;
static struct task_struct *fwd_prog_task;
static struct task_struct **barrier_cbs_tasks; static struct task_struct **barrier_cbs_tasks;
static struct task_struct *barrier_task; static struct task_struct *barrier_task;
...@@ -197,6 +208,18 @@ static const char * const rcu_torture_writer_state_names[] = { ...@@ -197,6 +208,18 @@ static const char * const rcu_torture_writer_state_names[] = {
"RTWS_STOPPING", "RTWS_STOPPING",
}; };
/* Record reader segment types and duration for first failing read. */
struct rt_read_seg {
int rt_readstate;
unsigned long rt_delay_jiffies;
unsigned long rt_delay_ms;
unsigned long rt_delay_us;
bool rt_preempted;
};
static int err_segs_recorded;
static struct rt_read_seg err_segs[RCUTORTURE_RDR_MAX_SEGS];
static int rt_read_nsegs;
static const char *rcu_torture_writer_state_getname(void) static const char *rcu_torture_writer_state_getname(void)
{ {
unsigned int i = READ_ONCE(rcu_torture_writer_state); unsigned int i = READ_ONCE(rcu_torture_writer_state);
...@@ -278,7 +301,8 @@ struct rcu_torture_ops { ...@@ -278,7 +301,8 @@ struct rcu_torture_ops {
void (*init)(void); void (*init)(void);
void (*cleanup)(void); void (*cleanup)(void);
int (*readlock)(void); int (*readlock)(void);
void (*read_delay)(struct torture_random_state *rrsp); void (*read_delay)(struct torture_random_state *rrsp,
struct rt_read_seg *rtrsp);
void (*readunlock)(int idx); void (*readunlock)(int idx);
unsigned long (*get_gp_seq)(void); unsigned long (*get_gp_seq)(void);
unsigned long (*gp_diff)(unsigned long new, unsigned long old); unsigned long (*gp_diff)(unsigned long new, unsigned long old);
...@@ -291,6 +315,7 @@ struct rcu_torture_ops { ...@@ -291,6 +315,7 @@ struct rcu_torture_ops {
void (*cb_barrier)(void); void (*cb_barrier)(void);
void (*fqs)(void); void (*fqs)(void);
void (*stats)(void); void (*stats)(void);
int (*stall_dur)(void);
int irq_capable; int irq_capable;
int can_boost; int can_boost;
int extendables; int extendables;
...@@ -310,12 +335,13 @@ static int rcu_torture_read_lock(void) __acquires(RCU) ...@@ -310,12 +335,13 @@ static int rcu_torture_read_lock(void) __acquires(RCU)
return 0; return 0;
} }
static void rcu_read_delay(struct torture_random_state *rrsp) static void
rcu_read_delay(struct torture_random_state *rrsp, struct rt_read_seg *rtrsp)
{ {
unsigned long started; unsigned long started;
unsigned long completed; unsigned long completed;
const unsigned long shortdelay_us = 200; const unsigned long shortdelay_us = 200;
const unsigned long longdelay_ms = 50; unsigned long longdelay_ms = 300;
unsigned long long ts; unsigned long long ts;
/* We want a short delay sometimes to make a reader delay the grace /* We want a short delay sometimes to make a reader delay the grace
...@@ -325,16 +351,23 @@ static void rcu_read_delay(struct torture_random_state *rrsp) ...@@ -325,16 +351,23 @@ static void rcu_read_delay(struct torture_random_state *rrsp)
if (!(torture_random(rrsp) % (nrealreaders * 2000 * longdelay_ms))) { if (!(torture_random(rrsp) % (nrealreaders * 2000 * longdelay_ms))) {
started = cur_ops->get_gp_seq(); started = cur_ops->get_gp_seq();
ts = rcu_trace_clock_local(); ts = rcu_trace_clock_local();
if (preempt_count() & (SOFTIRQ_MASK | HARDIRQ_MASK))
longdelay_ms = 5; /* Avoid triggering BH limits. */
mdelay(longdelay_ms); mdelay(longdelay_ms);
rtrsp->rt_delay_ms = longdelay_ms;
completed = cur_ops->get_gp_seq(); completed = cur_ops->get_gp_seq();
do_trace_rcu_torture_read(cur_ops->name, NULL, ts, do_trace_rcu_torture_read(cur_ops->name, NULL, ts,
started, completed); started, completed);
} }
if (!(torture_random(rrsp) % (nrealreaders * 2 * shortdelay_us))) if (!(torture_random(rrsp) % (nrealreaders * 2 * shortdelay_us))) {
udelay(shortdelay_us); udelay(shortdelay_us);
rtrsp->rt_delay_us = shortdelay_us;
}
if (!preempt_count() && if (!preempt_count() &&
!(torture_random(rrsp) % (nrealreaders * 500))) !(torture_random(rrsp) % (nrealreaders * 500))) {
torture_preempt_schedule(); /* QS only if preemptible. */ torture_preempt_schedule(); /* QS only if preemptible. */
rtrsp->rt_preempted = true;
}
} }
static void rcu_torture_read_unlock(int idx) __releases(RCU) static void rcu_torture_read_unlock(int idx) __releases(RCU)
...@@ -429,52 +462,13 @@ static struct rcu_torture_ops rcu_ops = { ...@@ -429,52 +462,13 @@ static struct rcu_torture_ops rcu_ops = {
.cb_barrier = rcu_barrier, .cb_barrier = rcu_barrier,
.fqs = rcu_force_quiescent_state, .fqs = rcu_force_quiescent_state,
.stats = NULL, .stats = NULL,
.stall_dur = rcu_jiffies_till_stall_check,
.irq_capable = 1, .irq_capable = 1,
.can_boost = rcu_can_boost(), .can_boost = rcu_can_boost(),
.extendables = RCUTORTURE_MAX_EXTEND,
.name = "rcu" .name = "rcu"
}; };
/*
* Definitions for rcu_bh torture testing.
*/
static int rcu_bh_torture_read_lock(void) __acquires(RCU_BH)
{
rcu_read_lock_bh();
return 0;
}
static void rcu_bh_torture_read_unlock(int idx) __releases(RCU_BH)
{
rcu_read_unlock_bh();
}
static void rcu_bh_torture_deferred_free(struct rcu_torture *p)
{
call_rcu_bh(&p->rtort_rcu, rcu_torture_cb);
}
static struct rcu_torture_ops rcu_bh_ops = {
.ttype = RCU_BH_FLAVOR,
.init = rcu_sync_torture_init,
.readlock = rcu_bh_torture_read_lock,
.read_delay = rcu_read_delay, /* just reuse rcu's version. */
.readunlock = rcu_bh_torture_read_unlock,
.get_gp_seq = rcu_bh_get_gp_seq,
.gp_diff = rcu_seq_diff,
.deferred_free = rcu_bh_torture_deferred_free,
.sync = synchronize_rcu_bh,
.exp_sync = synchronize_rcu_bh_expedited,
.call = call_rcu_bh,
.cb_barrier = rcu_barrier_bh,
.fqs = rcu_bh_force_quiescent_state,
.stats = NULL,
.irq_capable = 1,
.extendables = (RCUTORTURE_RDR_BH | RCUTORTURE_RDR_IRQ),
.ext_irq_conflict = RCUTORTURE_RDR_RCU,
.name = "rcu_bh"
};
/* /*
* Don't even think about trying any of these in real life!!! * Don't even think about trying any of these in real life!!!
* The names includes "busted", and they really means it! * The names includes "busted", and they really means it!
...@@ -531,7 +525,8 @@ static int srcu_torture_read_lock(void) __acquires(srcu_ctlp) ...@@ -531,7 +525,8 @@ static int srcu_torture_read_lock(void) __acquires(srcu_ctlp)
return srcu_read_lock(srcu_ctlp); return srcu_read_lock(srcu_ctlp);
} }
static void srcu_read_delay(struct torture_random_state *rrsp) static void
srcu_read_delay(struct torture_random_state *rrsp, struct rt_read_seg *rtrsp)
{ {
long delay; long delay;
const long uspertick = 1000000 / HZ; const long uspertick = 1000000 / HZ;
...@@ -541,10 +536,12 @@ static void srcu_read_delay(struct torture_random_state *rrsp) ...@@ -541,10 +536,12 @@ static void srcu_read_delay(struct torture_random_state *rrsp)
delay = torture_random(rrsp) % delay = torture_random(rrsp) %
(nrealreaders * 2 * longdelay * uspertick); (nrealreaders * 2 * longdelay * uspertick);
if (!delay && in_task()) if (!delay && in_task()) {
schedule_timeout_interruptible(longdelay); schedule_timeout_interruptible(longdelay);
else rtrsp->rt_delay_jiffies = longdelay;
rcu_read_delay(rrsp); } else {
rcu_read_delay(rrsp, rtrsp);
}
} }
static void srcu_torture_read_unlock(int idx) __releases(srcu_ctlp) static void srcu_torture_read_unlock(int idx) __releases(srcu_ctlp)
...@@ -662,48 +659,6 @@ static struct rcu_torture_ops busted_srcud_ops = { ...@@ -662,48 +659,6 @@ static struct rcu_torture_ops busted_srcud_ops = {
.name = "busted_srcud" .name = "busted_srcud"
}; };
/*
* Definitions for sched torture testing.
*/
static int sched_torture_read_lock(void)
{
preempt_disable();
return 0;
}
static void sched_torture_read_unlock(int idx)
{
preempt_enable();
}
static void rcu_sched_torture_deferred_free(struct rcu_torture *p)
{
call_rcu_sched(&p->rtort_rcu, rcu_torture_cb);
}
static struct rcu_torture_ops sched_ops = {
.ttype = RCU_SCHED_FLAVOR,
.init = rcu_sync_torture_init,
.readlock = sched_torture_read_lock,
.read_delay = rcu_read_delay, /* just reuse rcu's version. */
.readunlock = sched_torture_read_unlock,
.get_gp_seq = rcu_sched_get_gp_seq,
.gp_diff = rcu_seq_diff,
.deferred_free = rcu_sched_torture_deferred_free,
.sync = synchronize_sched,
.exp_sync = synchronize_sched_expedited,
.get_state = get_state_synchronize_sched,
.cond_sync = cond_synchronize_sched,
.call = call_rcu_sched,
.cb_barrier = rcu_barrier_sched,
.fqs = rcu_sched_force_quiescent_state,
.stats = NULL,
.irq_capable = 1,
.extendables = RCUTORTURE_MAX_EXTEND,
.name = "sched"
};
/* /*
* Definitions for RCU-tasks torture testing. * Definitions for RCU-tasks torture testing.
*/ */
...@@ -1116,7 +1071,8 @@ rcu_torture_writer(void *arg) ...@@ -1116,7 +1071,8 @@ rcu_torture_writer(void *arg)
break; break;
} }
} }
rcu_torture_current_version++; WRITE_ONCE(rcu_torture_current_version,
rcu_torture_current_version + 1);
/* Cycle through nesting levels of rcu_expedite_gp() calls. */ /* Cycle through nesting levels of rcu_expedite_gp() calls. */
if (can_expedite && if (can_expedite &&
!(torture_random(&rand) & 0xff & (!!expediting - 1))) { !(torture_random(&rand) & 0xff & (!!expediting - 1))) {
...@@ -1132,7 +1088,10 @@ rcu_torture_writer(void *arg) ...@@ -1132,7 +1088,10 @@ rcu_torture_writer(void *arg)
!rcu_gp_is_normal(); !rcu_gp_is_normal();
} }
rcu_torture_writer_state = RTWS_STUTTER; rcu_torture_writer_state = RTWS_STUTTER;
stutter_wait("rcu_torture_writer"); if (stutter_wait("rcu_torture_writer"))
for (i = 0; i < ARRAY_SIZE(rcu_tortures); i++)
if (list_empty(&rcu_tortures[i].rtort_free))
WARN_ON_ONCE(1);
} while (!torture_must_stop()); } while (!torture_must_stop());
/* Reset expediting back to unexpedited. */ /* Reset expediting back to unexpedited. */
if (expediting > 0) if (expediting > 0)
...@@ -1199,7 +1158,8 @@ static void rcu_torture_timer_cb(struct rcu_head *rhp) ...@@ -1199,7 +1158,8 @@ static void rcu_torture_timer_cb(struct rcu_head *rhp)
* change, do a ->read_delay(). * change, do a ->read_delay().
*/ */
static void rcutorture_one_extend(int *readstate, int newstate, static void rcutorture_one_extend(int *readstate, int newstate,
struct torture_random_state *trsp) struct torture_random_state *trsp,
struct rt_read_seg *rtrsp)
{ {
int idxnew = -1; int idxnew = -1;
int idxold = *readstate; int idxold = *readstate;
...@@ -1208,6 +1168,7 @@ static void rcutorture_one_extend(int *readstate, int newstate, ...@@ -1208,6 +1168,7 @@ static void rcutorture_one_extend(int *readstate, int newstate,
WARN_ON_ONCE(idxold < 0); WARN_ON_ONCE(idxold < 0);
WARN_ON_ONCE((idxold >> RCUTORTURE_RDR_SHIFT) > 1); WARN_ON_ONCE((idxold >> RCUTORTURE_RDR_SHIFT) > 1);
rtrsp->rt_readstate = newstate;
/* First, put new protection in place to avoid critical-section gap. */ /* First, put new protection in place to avoid critical-section gap. */
if (statesnew & RCUTORTURE_RDR_BH) if (statesnew & RCUTORTURE_RDR_BH)
...@@ -1216,6 +1177,10 @@ static void rcutorture_one_extend(int *readstate, int newstate, ...@@ -1216,6 +1177,10 @@ static void rcutorture_one_extend(int *readstate, int newstate,
local_irq_disable(); local_irq_disable();
if (statesnew & RCUTORTURE_RDR_PREEMPT) if (statesnew & RCUTORTURE_RDR_PREEMPT)
preempt_disable(); preempt_disable();
if (statesnew & RCUTORTURE_RDR_RBH)
rcu_read_lock_bh();
if (statesnew & RCUTORTURE_RDR_SCHED)
rcu_read_lock_sched();
if (statesnew & RCUTORTURE_RDR_RCU) if (statesnew & RCUTORTURE_RDR_RCU)
idxnew = cur_ops->readlock() << RCUTORTURE_RDR_SHIFT; idxnew = cur_ops->readlock() << RCUTORTURE_RDR_SHIFT;
...@@ -1226,12 +1191,16 @@ static void rcutorture_one_extend(int *readstate, int newstate, ...@@ -1226,12 +1191,16 @@ static void rcutorture_one_extend(int *readstate, int newstate,
local_bh_enable(); local_bh_enable();
if (statesold & RCUTORTURE_RDR_PREEMPT) if (statesold & RCUTORTURE_RDR_PREEMPT)
preempt_enable(); preempt_enable();
if (statesold & RCUTORTURE_RDR_RBH)
rcu_read_unlock_bh();
if (statesold & RCUTORTURE_RDR_SCHED)
rcu_read_unlock_sched();
if (statesold & RCUTORTURE_RDR_RCU) if (statesold & RCUTORTURE_RDR_RCU)
cur_ops->readunlock(idxold >> RCUTORTURE_RDR_SHIFT); cur_ops->readunlock(idxold >> RCUTORTURE_RDR_SHIFT);
/* Delay if neither beginning nor end and there was a change. */ /* Delay if neither beginning nor end and there was a change. */
if ((statesnew || statesold) && *readstate && newstate) if ((statesnew || statesold) && *readstate && newstate)
cur_ops->read_delay(trsp); cur_ops->read_delay(trsp, rtrsp);
/* Update the reader state. */ /* Update the reader state. */
if (idxnew == -1) if (idxnew == -1)
...@@ -1260,18 +1229,19 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp) ...@@ -1260,18 +1229,19 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp)
{ {
int mask = rcutorture_extend_mask_max(); int mask = rcutorture_extend_mask_max();
unsigned long randmask1 = torture_random(trsp) >> 8; unsigned long randmask1 = torture_random(trsp) >> 8;
unsigned long randmask2 = randmask1 >> 1; unsigned long randmask2 = randmask1 >> 3;
WARN_ON_ONCE(mask >> RCUTORTURE_RDR_SHIFT); WARN_ON_ONCE(mask >> RCUTORTURE_RDR_SHIFT);
/* Half the time lots of bits, half the time only one bit. */ /* Most of the time lots of bits, half the time only one bit. */
if (randmask1 & 0x1) if (!(randmask1 & 0x7))
mask = mask & randmask2; mask = mask & randmask2;
else else
mask = mask & (1 << (randmask2 % RCUTORTURE_RDR_NBITS)); mask = mask & (1 << (randmask2 % RCUTORTURE_RDR_NBITS));
/* Can't enable bh w/irq disabled. */
if ((mask & RCUTORTURE_RDR_IRQ) && if ((mask & RCUTORTURE_RDR_IRQ) &&
!(mask & RCUTORTURE_RDR_BH) && ((!(mask & RCUTORTURE_RDR_BH) && (oldmask & RCUTORTURE_RDR_BH)) ||
(oldmask & RCUTORTURE_RDR_BH)) (!(mask & RCUTORTURE_RDR_RBH) && (oldmask & RCUTORTURE_RDR_RBH))))
mask |= RCUTORTURE_RDR_BH; /* Can't enable bh w/irq disabled. */ mask |= RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH;
if ((mask & RCUTORTURE_RDR_IRQ) && if ((mask & RCUTORTURE_RDR_IRQ) &&
!(mask & cur_ops->ext_irq_conflict) && !(mask & cur_ops->ext_irq_conflict) &&
(oldmask & cur_ops->ext_irq_conflict)) (oldmask & cur_ops->ext_irq_conflict))
...@@ -1283,20 +1253,25 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp) ...@@ -1283,20 +1253,25 @@ rcutorture_extend_mask(int oldmask, struct torture_random_state *trsp)
* Do a randomly selected number of extensions of an existing RCU read-side * Do a randomly selected number of extensions of an existing RCU read-side
* critical section. * critical section.
*/ */
static void rcutorture_loop_extend(int *readstate, static struct rt_read_seg *
struct torture_random_state *trsp) rcutorture_loop_extend(int *readstate, struct torture_random_state *trsp,
struct rt_read_seg *rtrsp)
{ {
int i; int i;
int j;
int mask = rcutorture_extend_mask_max(); int mask = rcutorture_extend_mask_max();
WARN_ON_ONCE(!*readstate); /* -Existing- RCU read-side critsect! */ WARN_ON_ONCE(!*readstate); /* -Existing- RCU read-side critsect! */
if (!((mask - 1) & mask)) if (!((mask - 1) & mask))
return; /* Current RCU flavor not extendable. */ return rtrsp; /* Current RCU reader not extendable. */
i = (torture_random(trsp) >> 3) & RCUTORTURE_RDR_MAX_LOOPS; /* Bias towards larger numbers of loops. */
while (i--) { i = (torture_random(trsp) >> 3);
i = ((i | (i >> 3)) & RCUTORTURE_RDR_MAX_LOOPS) + 1;
for (j = 0; j < i; j++) {
mask = rcutorture_extend_mask(*readstate, trsp); mask = rcutorture_extend_mask(*readstate, trsp);
rcutorture_one_extend(readstate, mask, trsp); rcutorture_one_extend(readstate, mask, trsp, &rtrsp[j]);
} }
return &rtrsp[j];
} }
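The new rcutorture_loop_extend() replaces a masked random loop count with one built by ORing two shifted copies of the random value, then adding one. The sketch below isolates that arithmetic along with the `!((mask - 1) & mask)` test. The function names `biased_loop_count()` and `at_most_one_bit()` are hypothetical, and the random value is passed in rather than drawn from torture_random().

```c
#include <assert.h>

#define RDR_MAX_LOOPS 0x7	/* Same value as RCUTORTURE_RDR_MAX_LOOPS. */

/* Map a raw random value to a loop count in 1..RDR_MAX_LOOPS + 1. */
static int biased_loop_count(unsigned long rnd)
{
	unsigned long i = rnd >> 3;
	int n = (int)((i | (i >> 3)) & RDR_MAX_LOOPS) + 1;

	assert(n >= 1 && n <= RDR_MAX_LOOPS + 1);
	return n;
}

/* True if at most one bit of mask is set (zero also passes). */
static int at_most_one_bit(int mask)
{
	return !((mask - 1) & mask);
}
```

If the random bits are uniform and independent (an assumption about the generator), each of the three result bits is set with probability 3/4, so the mean count is 0.75 * 7 + 1 = 6.25, compared with a mean of 3.5 for the old 0..7 range. The maximum of eight loop segments, plus the initial and final rcutorture_one_extend() calls, adds up to RCUTORTURE_RDR_MAX_LOOPS + 3, which appears to be what RCUTORTURE_RDR_MAX_SEGS accounts for.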
/* /*
...@@ -1306,16 +1281,20 @@ static void rcutorture_loop_extend(int *readstate, ...@@ -1306,16 +1281,20 @@ static void rcutorture_loop_extend(int *readstate,
*/ */
static bool rcu_torture_one_read(struct torture_random_state *trsp) static bool rcu_torture_one_read(struct torture_random_state *trsp)
{ {
int i;
unsigned long started; unsigned long started;
unsigned long completed; unsigned long completed;
int newstate; int newstate;
struct rcu_torture *p; struct rcu_torture *p;
int pipe_count; int pipe_count;
int readstate = 0; int readstate = 0;
struct rt_read_seg rtseg[RCUTORTURE_RDR_MAX_SEGS] = { { 0 } };
struct rt_read_seg *rtrsp = &rtseg[0];
struct rt_read_seg *rtrsp1;
unsigned long long ts; unsigned long long ts;
newstate = rcutorture_extend_mask(readstate, trsp); newstate = rcutorture_extend_mask(readstate, trsp);
rcutorture_one_extend(&readstate, newstate, trsp); rcutorture_one_extend(&readstate, newstate, trsp, rtrsp++);
started = cur_ops->get_gp_seq(); started = cur_ops->get_gp_seq();
ts = rcu_trace_clock_local(); ts = rcu_trace_clock_local();
p = rcu_dereference_check(rcu_torture_current, p = rcu_dereference_check(rcu_torture_current,
...@@ -1325,12 +1304,12 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp) ...@@ -1325,12 +1304,12 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp)
torturing_tasks()); torturing_tasks());
if (p == NULL) { if (p == NULL) {
/* Wait for rcu_torture_writer to get underway */ /* Wait for rcu_torture_writer to get underway */
rcutorture_one_extend(&readstate, 0, trsp); rcutorture_one_extend(&readstate, 0, trsp, rtrsp);
return false; return false;
} }
if (p->rtort_mbtest == 0) if (p->rtort_mbtest == 0)
atomic_inc(&n_rcu_torture_mberror); atomic_inc(&n_rcu_torture_mberror);
rcutorture_loop_extend(&readstate, trsp); rtrsp = rcutorture_loop_extend(&readstate, trsp, rtrsp);
preempt_disable(); preempt_disable();
pipe_count = p->rtort_pipe_count; pipe_count = p->rtort_pipe_count;
if (pipe_count > RCU_TORTURE_PIPE_LEN) { if (pipe_count > RCU_TORTURE_PIPE_LEN) {
...@@ -1351,8 +1330,17 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp) ...@@ -1351,8 +1330,17 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp)
} }
__this_cpu_inc(rcu_torture_batch[completed]); __this_cpu_inc(rcu_torture_batch[completed]);
preempt_enable(); preempt_enable();
rcutorture_one_extend(&readstate, 0, trsp); rcutorture_one_extend(&readstate, 0, trsp, rtrsp);
WARN_ON_ONCE(readstate & RCUTORTURE_RDR_MASK); WARN_ON_ONCE(readstate & RCUTORTURE_RDR_MASK);
/* If error or close call, record the sequence of reader protections. */
if ((pipe_count > 1 || completed > 1) && !xchg(&err_segs_recorded, 1)) {
i = 0;
for (rtrsp1 = &rtseg[0]; rtrsp1 < rtrsp; rtrsp1++)
err_segs[i++] = *rtrsp1;
rt_read_nsegs = i;
}
return true; return true;
} }
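The block added at the end of rcu_torture_one_read() uses `!xchg(&err_segs_recorded, 1)` so that only the first failing (or close-call) reader copies its segment list into err_segs[]. A userspace sketch of the same record-once idea, using C11 atomic_exchange() in place of the kernel's xchg(); `struct seg`, `record_first()`, and the other names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_SEGS 10	/* Matches RCUTORTURE_RDR_MAX_LOOPS + 3 in the diff. */

struct seg {
	int readstate;
};

static atomic_int segs_recorded;	/* 0 until the first record is taken. */
static struct seg saved_segs[MAX_SEGS];
static int saved_nsegs;

/*
 * Copy segs[0..n-1] into saved_segs, but only for the first caller.
 * Returns 1 if this call did the recording and 0 otherwise.
 */
static int record_first(const struct seg *segs, int n)
{
	int i;

	assert(n >= 0 && n <= MAX_SEGS);
	/* atomic_exchange() returns the old value: nonzero means too late. */
	if (atomic_exchange(&segs_recorded, 1))
		return 0;
	for (i = 0; i < n; i++)
		saved_segs[i] = segs[i];
	saved_nsegs = n;
	return 1;
}
```

The exchange decides the winner in a single atomic step, so concurrent callers cannot both see the flag as clear and overwrite each other's record.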
...@@ -1387,6 +1375,9 @@ static void rcu_torture_timer(struct timer_list *unused) ...@@ -1387,6 +1375,9 @@ static void rcu_torture_timer(struct timer_list *unused)
static int static int
rcu_torture_reader(void *arg) rcu_torture_reader(void *arg)
{ {
unsigned long lastsleep = jiffies;
long myid = (long)arg;
int mynumonline = myid;
DEFINE_TORTURE_RANDOM(rand); DEFINE_TORTURE_RANDOM(rand);
struct timer_list t; struct timer_list t;
...@@ -1402,6 +1393,12 @@ rcu_torture_reader(void *arg) ...@@ -1402,6 +1393,12 @@ rcu_torture_reader(void *arg)
} }
if (!rcu_torture_one_read(&rand)) if (!rcu_torture_one_read(&rand))
schedule_timeout_interruptible(HZ); schedule_timeout_interruptible(HZ);
if (time_after(jiffies, lastsleep)) {
schedule_timeout_interruptible(1);
lastsleep = jiffies + 10;
}
while (num_online_cpus() < mynumonline && !torture_must_stop())
schedule_timeout_interruptible(HZ / 5);
stutter_wait("rcu_torture_reader"); stutter_wait("rcu_torture_reader");
} while (!torture_must_stop()); } while (!torture_must_stop());
if (irqreader && cur_ops->irq_capable) { if (irqreader && cur_ops->irq_capable) {
...@@ -1655,6 +1652,121 @@ static int __init rcu_torture_stall_init(void) ...@@ -1655,6 +1652,121 @@ static int __init rcu_torture_stall_init(void)
return torture_create_kthread(rcu_torture_stall, NULL, stall_task); return torture_create_kthread(rcu_torture_stall, NULL, stall_task);
} }
/* State structure for forward-progress self-propagating RCU callback. */
struct fwd_cb_state {
struct rcu_head rh;
int stop;
};
/*
* Forward-progress self-propagating RCU callback function. Because
* callbacks run from softirq, this function is an implicit RCU read-side
* critical section.
*/
static void rcu_torture_fwd_prog_cb(struct rcu_head *rhp)
{
struct fwd_cb_state *fcsp = container_of(rhp, struct fwd_cb_state, rh);
if (READ_ONCE(fcsp->stop)) {
WRITE_ONCE(fcsp->stop, 2);
return;
}
cur_ops->call(&fcsp->rh, rcu_torture_fwd_prog_cb);
}
/* Carry out grace-period forward-progress testing. */
static int rcu_torture_fwd_prog(void *args)
{
unsigned long cver;
unsigned long dur;
struct fwd_cb_state fcs;
unsigned long gps;
int idx;
int sd;
int sd4;
bool selfpropcb = false;
unsigned long stopat;
int tested = 0;
int tested_tries = 0;
static DEFINE_TORTURE_RANDOM(trs);
VERBOSE_TOROUT_STRING("rcu_torture_fwd_progress task started");
if (!IS_ENABLED(CONFIG_SMP) || !IS_ENABLED(CONFIG_RCU_BOOST))
set_user_nice(current, MAX_NICE);
if (cur_ops->call && cur_ops->sync && cur_ops->cb_barrier) {
init_rcu_head_on_stack(&fcs.rh);
selfpropcb = true;
}
do {
schedule_timeout_interruptible(fwd_progress_holdoff * HZ);
if (selfpropcb) {
WRITE_ONCE(fcs.stop, 0);
cur_ops->call(&fcs.rh, rcu_torture_fwd_prog_cb);
}
cver = READ_ONCE(rcu_torture_current_version);
gps = cur_ops->get_gp_seq();
sd = cur_ops->stall_dur() + 1;
sd4 = (sd + fwd_progress_div - 1) / fwd_progress_div;
dur = sd4 + torture_random(&trs) % (sd - sd4);
stopat = jiffies + dur;
while (time_before(jiffies, stopat) && !torture_must_stop()) {
idx = cur_ops->readlock();
udelay(10);
cur_ops->readunlock(idx);
if (!fwd_progress_need_resched || need_resched())
cond_resched();
}
tested_tries++;
if (!time_before(jiffies, stopat) && !torture_must_stop()) {
tested++;
cver = READ_ONCE(rcu_torture_current_version) - cver;
gps = rcutorture_seq_diff(cur_ops->get_gp_seq(), gps);
WARN_ON(!cver && gps < 2);
pr_alert("%s: Duration %ld cver %ld gps %ld\n", __func__, dur, cver, gps);
}
if (selfpropcb) {
WRITE_ONCE(fcs.stop, 1);
cur_ops->sync(); /* Wait for running CB to complete. */
cur_ops->cb_barrier(); /* Wait for queued callbacks. */
}
/* Avoid slow periods, better to test when busy. */
stutter_wait("rcu_torture_fwd_prog");
} while (!torture_must_stop());
if (selfpropcb) {
WARN_ON(READ_ONCE(fcs.stop) != 2);
destroy_rcu_head_on_stack(&fcs.rh);
}
/* Short runs might not contain a valid forward-progress attempt. */
WARN_ON(!tested && tested_tries >= 5);
pr_alert("%s: tested %d tested_tries %d\n", __func__, tested, tested_tries);
torture_kthread_stopping("rcu_torture_fwd_prog");
return 0;
}
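The test-duration arithmetic in rcu_torture_fwd_prog() above can be sketched in isolation. This is a minimal user-space illustration, not the rcutorture code itself: fwd_prog_duration() and div_round_up() are hypothetical helper names, the random value is passed in rather than drawn from torture_random(), and the guard against a zero-width range is an addition that the diff does not contain.

```c
#include <assert.h>

/* Round-up integer division, mirroring (sd + div - 1) / div in the diff. */
static int div_round_up(int n, int d)
{
	return (n + d - 1) / d;
}

/*
 * Hypothetical helper: pick a test duration between roughly 1/div of
 * the stall-warning interval and the full interval.  "rnd" stands in
 * for the value that torture_random() would supply.
 */
static int fwd_prog_duration(int stall_dur, int div, unsigned long rnd)
{
	int sd = stall_dur + 1;			/* Upper bound (exclusive). */
	int sd4 = div_round_up(sd, div);	/* Lower bound (inclusive). */

	/* Assumption, not in the original: avoid modulo by zero if div == 1. */
	if (sd == sd4)
		return sd4;
	return sd4 + (int)(rnd % (unsigned long)(sd - sd4));
}
```

With a stall interval of 20 and a divisor of 4, the result always falls in the range 6 through 20, so each pass stays below the point at which a stall warning would be expected.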
/* If forward-progress checking is requested and feasible, spawn the thread. */
static int __init rcu_torture_fwd_prog_init(void)
{
if (!fwd_progress)
return 0; /* Not requested, so don't do it. */
if (!cur_ops->stall_dur || cur_ops->stall_dur() <= 0) {
VERBOSE_TOROUT_STRING("rcu_torture_fwd_prog_init: Disabled, unsupported by RCU flavor under test");
return 0;
}
if (stall_cpu > 0) {
VERBOSE_TOROUT_STRING("rcu_torture_fwd_prog_init: Disabled, conflicts with CPU-stall testing");
if (IS_MODULE(CONFIG_RCU_TORTURE_TEST))
return -EINVAL; /* In module, can fail back to user. */
WARN_ON(1); /* Make sure rcutorture notices conflict. */
return 0;
}
if (fwd_progress_holdoff <= 0)
fwd_progress_holdoff = 1;
if (fwd_progress_div <= 0)
fwd_progress_div = 4;
return torture_create_kthread(rcu_torture_fwd_prog,
NULL, fwd_prog_task);
}
/* Callback function for RCU barrier testing. */ /* Callback function for RCU barrier testing. */
static void rcu_torture_barrier_cbf(struct rcu_head *rcu) static void rcu_torture_barrier_cbf(struct rcu_head *rcu)
{ {
...@@ -1817,6 +1929,7 @@ static enum cpuhp_state rcutor_hp; ...@@ -1817,6 +1929,7 @@ static enum cpuhp_state rcutor_hp;
static void static void
rcu_torture_cleanup(void) rcu_torture_cleanup(void)
{ {
int firsttime;
int flags = 0; int flags = 0;
unsigned long gp_seq = 0; unsigned long gp_seq = 0;
int i; int i;
...@@ -1828,6 +1941,7 @@ rcu_torture_cleanup(void) ...@@ -1828,6 +1941,7 @@ rcu_torture_cleanup(void)
} }
rcu_torture_barrier_cleanup(); rcu_torture_barrier_cleanup();
torture_stop_kthread(rcu_torture_fwd_prog, fwd_prog_task);
torture_stop_kthread(rcu_torture_stall, stall_task); torture_stop_kthread(rcu_torture_stall, stall_task);
torture_stop_kthread(rcu_torture_writer, writer_task); torture_stop_kthread(rcu_torture_writer, writer_task);
...@@ -1860,7 +1974,7 @@ rcu_torture_cleanup(void) ...@@ -1860,7 +1974,7 @@ rcu_torture_cleanup(void)
cpuhp_remove_state(rcutor_hp); cpuhp_remove_state(rcutor_hp);
/* /*
* Wait for all RCU callbacks to fire, then do flavor-specific * Wait for all RCU callbacks to fire, then do torture-type-specific
* cleanup operations. * cleanup operations.
*/ */
if (cur_ops->cb_barrier != NULL) if (cur_ops->cb_barrier != NULL)
...@@ -1870,6 +1984,33 @@ rcu_torture_cleanup(void) ...@@ -1870,6 +1984,33 @@ rcu_torture_cleanup(void)
rcu_torture_stats_print(); /* -After- the stats thread is stopped! */ rcu_torture_stats_print(); /* -After- the stats thread is stopped! */
if (err_segs_recorded) {
pr_alert("Failure/close-call rcutorture reader segments:\n");
if (rt_read_nsegs == 0)
pr_alert("\t: No segments recorded!!!\n");
firsttime = 1;
for (i = 0; i < rt_read_nsegs; i++) {
pr_alert("\t%d: %#x ", i, err_segs[i].rt_readstate);
if (err_segs[i].rt_delay_jiffies != 0) {
pr_cont("%s%ldjiffies", firsttime ? "" : "+",
err_segs[i].rt_delay_jiffies);
firsttime = 0;
}
if (err_segs[i].rt_delay_ms != 0) {
pr_cont("%s%ldms", firsttime ? "" : "+",
err_segs[i].rt_delay_ms);
firsttime = 0;
}
if (err_segs[i].rt_delay_us != 0) {
pr_cont("%s%ldus", firsttime ? "" : "+",
err_segs[i].rt_delay_us);
firsttime = 0;
}
pr_cont("%s\n",
err_segs[i].rt_preempted ? "preempted" : "");
}
}
if (atomic_read(&n_rcu_torture_error) || n_rcu_torture_barrier_error) if (atomic_read(&n_rcu_torture_error) || n_rcu_torture_barrier_error)
rcu_torture_print_module_parms(cur_ops, "End of test: FAILURE"); rcu_torture_print_module_parms(cur_ops, "End of test: FAILURE");
else if (torture_onoff_failures()) else if (torture_onoff_failures())
...@@ -1939,12 +2080,12 @@ static void rcu_test_debug_objects(void) ...@@ -1939,12 +2080,12 @@ static void rcu_test_debug_objects(void)
static int __init static int __init
rcu_torture_init(void) rcu_torture_init(void)
{ {
int i; long i;
int cpu; int cpu;
int firsterr = 0; int firsterr = 0;
static struct rcu_torture_ops *torture_ops[] = { static struct rcu_torture_ops *torture_ops[] = {
&rcu_ops, &rcu_bh_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops, &rcu_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops,
&busted_srcud_ops, &sched_ops, &tasks_ops, &busted_srcud_ops, &tasks_ops,
}; };
if (!torture_init_begin(torture_type, verbose)) if (!torture_init_begin(torture_type, verbose))
...@@ -1963,6 +2104,7 @@ rcu_torture_init(void) ...@@ -1963,6 +2104,7 @@ rcu_torture_init(void)
for (i = 0; i < ARRAY_SIZE(torture_ops); i++) for (i = 0; i < ARRAY_SIZE(torture_ops); i++)
pr_cont(" %s", torture_ops[i]->name); pr_cont(" %s", torture_ops[i]->name);
pr_cont("\n"); pr_cont("\n");
WARN_ON(!IS_MODULE(CONFIG_RCU_TORTURE_TEST));
firsterr = -EINVAL; firsterr = -EINVAL;
goto unwind; goto unwind;
} }
...@@ -2013,6 +2155,8 @@ rcu_torture_init(void) ...@@ -2013,6 +2155,8 @@ rcu_torture_init(void)
per_cpu(rcu_torture_batch, cpu)[i] = 0; per_cpu(rcu_torture_batch, cpu)[i] = 0;
} }
} }
err_segs_recorded = 0;
rt_read_nsegs = 0;
/* Start up the kthreads. */ /* Start up the kthreads. */
...@@ -2044,7 +2188,7 @@ rcu_torture_init(void) ...@@ -2044,7 +2188,7 @@ rcu_torture_init(void)
goto unwind; goto unwind;
} }
for (i = 0; i < nrealreaders; i++) { for (i = 0; i < nrealreaders; i++) {
firsterr = torture_create_kthread(rcu_torture_reader, NULL, firsterr = torture_create_kthread(rcu_torture_reader, (void *)i,
reader_tasks[i]); reader_tasks[i]);
if (firsterr) if (firsterr)
goto unwind; goto unwind;
...@@ -2098,6 +2242,9 @@ rcu_torture_init(void) ...@@ -2098,6 +2242,9 @@ rcu_torture_init(void)
if (firsterr) if (firsterr)
goto unwind; goto unwind;
firsterr = rcu_torture_stall_init(); firsterr = rcu_torture_stall_init();
if (firsterr)
goto unwind;
firsterr = rcu_torture_fwd_prog_init();
if (firsterr) if (firsterr)
goto unwind; goto unwind;
firsterr = rcu_torture_barrier_init(); firsterr = rcu_torture_barrier_init();
......
...@@ -34,6 +34,8 @@ ...@@ -34,6 +34,8 @@
#include "rcu.h" #include "rcu.h"
int rcu_scheduler_active __read_mostly; int rcu_scheduler_active __read_mostly;
static LIST_HEAD(srcu_boot_list);
static bool srcu_init_done;
static int init_srcu_struct_fields(struct srcu_struct *sp) static int init_srcu_struct_fields(struct srcu_struct *sp)
{ {
...@@ -46,6 +48,7 @@ static int init_srcu_struct_fields(struct srcu_struct *sp) ...@@ -46,6 +48,7 @@ static int init_srcu_struct_fields(struct srcu_struct *sp)
sp->srcu_gp_waiting = false; sp->srcu_gp_waiting = false;
sp->srcu_idx = 0; sp->srcu_idx = 0;
INIT_WORK(&sp->srcu_work, srcu_drive_gp); INIT_WORK(&sp->srcu_work, srcu_drive_gp);
INIT_LIST_HEAD(&sp->srcu_work.entry);
return 0; return 0;
} }
...@@ -179,8 +182,12 @@ void call_srcu(struct srcu_struct *sp, struct rcu_head *rhp, ...@@ -179,8 +182,12 @@ void call_srcu(struct srcu_struct *sp, struct rcu_head *rhp,
*sp->srcu_cb_tail = rhp; *sp->srcu_cb_tail = rhp;
sp->srcu_cb_tail = &rhp->next; sp->srcu_cb_tail = &rhp->next;
local_irq_restore(flags); local_irq_restore(flags);
if (!READ_ONCE(sp->srcu_gp_running)) if (!READ_ONCE(sp->srcu_gp_running)) {
schedule_work(&sp->srcu_work); if (likely(srcu_init_done))
schedule_work(&sp->srcu_work);
else if (list_empty(&sp->srcu_work.entry))
list_add(&sp->srcu_work.entry, &srcu_boot_list);
}
} }
EXPORT_SYMBOL_GPL(call_srcu); EXPORT_SYMBOL_GPL(call_srcu);
...@@ -204,3 +211,21 @@ void __init rcu_scheduler_starting(void) ...@@ -204,3 +211,21 @@ void __init rcu_scheduler_starting(void)
{ {
rcu_scheduler_active = RCU_SCHEDULER_RUNNING; rcu_scheduler_active = RCU_SCHEDULER_RUNNING;
} }
/*
* Queue work for srcu_struct structures with early boot callbacks.
* The work won't actually execute until the workqueue initialization
* phase that takes place after the scheduler starts.
*/
void __init srcu_init(void)
{
struct srcu_struct *sp;
srcu_init_done = true;
while (!list_empty(&srcu_boot_list)) {
sp = list_first_entry(&srcu_boot_list,
struct srcu_struct, srcu_work.entry);
list_del_init(&sp->srcu_work.entry);
schedule_work(&sp->srcu_work);
}
}
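The early-boot deferral used by call_srcu() and srcu_init() above follows a simple pattern: before initialization completes, work is parked on a list at most once; afterwards it is dispatched directly. The following is a minimal user-space sketch of that pattern, assuming a single thread (matching the "so early that no lock is required" situation). All names here (deferred_item, request_work, late_init, run_work) are hypothetical stand-ins, and a singly linked list with a flag replaces the kernel's list_head and list_empty() check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an srcu_struct carrying deferred work. */
struct deferred_item {
	struct deferred_item *next;	/* Link on the boot-time pending list. */
	bool queued;			/* Already parked on the pending list? */
	int runs;			/* Times the work has been dispatched. */
};

static struct deferred_item *boot_list;	/* Items parked before init. */
static bool init_done;			/* Set once dispatch is possible. */

/* Stand-in for schedule_work(): just count the dispatch. */
static void run_work(struct deferred_item *item)
{
	item->runs++;
}

/* Dispatch now if initialized, otherwise park the item exactly once. */
static void request_work(struct deferred_item *item)
{
	if (init_done) {
		run_work(item);
	} else if (!item->queued) {
		item->queued = true;
		item->next = boot_list;
		boot_list = item;
	}
}

/* Mark initialization complete and flush everything parked so far. */
static void late_init(void)
{
	init_done = true;
	while (boot_list) {
		struct deferred_item *item = boot_list;

		assert(item->queued);
		boot_list = item->next;
		item->next = NULL;
		item->queued = false;
		run_work(item);
	}
}
```

Repeated requests for the same item before late_init() result in a single dispatch, which is the role the list_empty() check plays in the diff.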
...@@ -51,6 +51,10 @@ module_param(exp_holdoff, ulong, 0444); ...@@ -51,6 +51,10 @@ module_param(exp_holdoff, ulong, 0444);
static ulong counter_wrap_check = (ULONG_MAX >> 2); static ulong counter_wrap_check = (ULONG_MAX >> 2);
module_param(counter_wrap_check, ulong, 0444); module_param(counter_wrap_check, ulong, 0444);
/* Early-boot callback-management, so early that no lock is required! */
static LIST_HEAD(srcu_boot_list);
static bool __read_mostly srcu_init_done;
static void srcu_invoke_callbacks(struct work_struct *work); static void srcu_invoke_callbacks(struct work_struct *work);
static void srcu_reschedule(struct srcu_struct *sp, unsigned long delay); static void srcu_reschedule(struct srcu_struct *sp, unsigned long delay);
static void process_srcu(struct work_struct *work); static void process_srcu(struct work_struct *work);
...@@ -105,7 +109,7 @@ static void init_srcu_struct_nodes(struct srcu_struct *sp, bool is_static) ...@@ -105,7 +109,7 @@ static void init_srcu_struct_nodes(struct srcu_struct *sp, bool is_static)
rcu_init_levelspread(levelspread, num_rcu_lvl); rcu_init_levelspread(levelspread, num_rcu_lvl);
/* Each pass through this loop initializes one srcu_node structure. */ /* Each pass through this loop initializes one srcu_node structure. */
rcu_for_each_node_breadth_first(sp, snp) { srcu_for_each_node_breadth_first(sp, snp) {
spin_lock_init(&ACCESS_PRIVATE(snp, lock)); spin_lock_init(&ACCESS_PRIVATE(snp, lock));
WARN_ON_ONCE(ARRAY_SIZE(snp->srcu_have_cbs) != WARN_ON_ONCE(ARRAY_SIZE(snp->srcu_have_cbs) !=
ARRAY_SIZE(snp->srcu_data_have_cbs)); ARRAY_SIZE(snp->srcu_data_have_cbs));
...@@ -235,7 +239,6 @@ static void check_init_srcu_struct(struct srcu_struct *sp) ...@@ -235,7 +239,6 @@ static void check_init_srcu_struct(struct srcu_struct *sp)
{ {
unsigned long flags; unsigned long flags;
WARN_ON_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INIT);
/* The smp_load_acquire() pairs with the smp_store_release(). */ /* The smp_load_acquire() pairs with the smp_store_release(). */
if (!rcu_seq_state(smp_load_acquire(&sp->srcu_gp_seq_needed))) /*^^^*/ if (!rcu_seq_state(smp_load_acquire(&sp->srcu_gp_seq_needed))) /*^^^*/
return; /* Already initialized. */ return; /* Already initialized. */
...@@ -561,7 +564,7 @@ static void srcu_gp_end(struct srcu_struct *sp) ...@@ -561,7 +564,7 @@ static void srcu_gp_end(struct srcu_struct *sp)
/* Initiate callback invocation as needed. */ /* Initiate callback invocation as needed. */
idx = rcu_seq_ctr(gpseq) % ARRAY_SIZE(snp->srcu_have_cbs); idx = rcu_seq_ctr(gpseq) % ARRAY_SIZE(snp->srcu_have_cbs);
rcu_for_each_node_breadth_first(sp, snp) { srcu_for_each_node_breadth_first(sp, snp) {
spin_lock_irq_rcu_node(snp); spin_lock_irq_rcu_node(snp);
cbs = false; cbs = false;
last_lvl = snp >= sp->level[rcu_num_lvls - 1]; last_lvl = snp >= sp->level[rcu_num_lvls - 1];
...@@ -701,7 +704,11 @@ static void srcu_funnel_gp_start(struct srcu_struct *sp, struct srcu_data *sdp, ...@@ -701,7 +704,11 @@ static void srcu_funnel_gp_start(struct srcu_struct *sp, struct srcu_data *sdp,
rcu_seq_state(sp->srcu_gp_seq) == SRCU_STATE_IDLE) { rcu_seq_state(sp->srcu_gp_seq) == SRCU_STATE_IDLE) {
WARN_ON_ONCE(ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed)); WARN_ON_ONCE(ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed));
srcu_gp_start(sp); srcu_gp_start(sp);
queue_delayed_work(rcu_gp_wq, &sp->work, srcu_get_delay(sp)); if (likely(srcu_init_done))
queue_delayed_work(rcu_gp_wq, &sp->work,
srcu_get_delay(sp));
else if (list_empty(&sp->work.work.entry))
list_add(&sp->work.work.entry, &srcu_boot_list);
} }
spin_unlock_irqrestore_rcu_node(sp, flags); spin_unlock_irqrestore_rcu_node(sp, flags);
} }
...@@ -980,7 +987,7 @@ EXPORT_SYMBOL_GPL(synchronize_srcu_expedited); ...@@ -980,7 +987,7 @@ EXPORT_SYMBOL_GPL(synchronize_srcu_expedited);
* There are memory-ordering constraints implied by synchronize_srcu(). * There are memory-ordering constraints implied by synchronize_srcu().
* On systems with more than one CPU, when synchronize_srcu() returns, * On systems with more than one CPU, when synchronize_srcu() returns,
* each CPU is guaranteed to have executed a full memory barrier since * each CPU is guaranteed to have executed a full memory barrier since
* the end of its last corresponding SRCU-sched read-side critical section * the end of its last corresponding SRCU read-side critical section
* whose beginning preceded the call to synchronize_srcu(). In addition, * whose beginning preceded the call to synchronize_srcu(). In addition,
* each CPU having an SRCU read-side critical section that extends beyond * each CPU having an SRCU read-side critical section that extends beyond
* the return from synchronize_srcu() is guaranteed to have executed a * the return from synchronize_srcu() is guaranteed to have executed a
...@@ -1308,3 +1315,17 @@ static int __init srcu_bootup_announce(void) ...@@ -1308,3 +1315,17 @@ static int __init srcu_bootup_announce(void)
return 0; return 0;
} }
early_initcall(srcu_bootup_announce); early_initcall(srcu_bootup_announce);
void __init srcu_init(void)
{
struct srcu_struct *sp;
srcu_init_done = true;
while (!list_empty(&srcu_boot_list)) {
sp = list_first_entry(&srcu_boot_list, struct srcu_struct,
work.work.entry);
check_init_srcu_struct(sp);
list_del_init(&sp->work.work.entry);
queue_work(rcu_gp_wq, &sp->work.work);
}
}
...@@ -46,69 +46,27 @@ struct rcu_ctrlblk { ...@@ -46,69 +46,27 @@ struct rcu_ctrlblk {
}; };
/* Definition for rcupdate control block. */ /* Definition for rcupdate control block. */
static struct rcu_ctrlblk rcu_sched_ctrlblk = { static struct rcu_ctrlblk rcu_ctrlblk = {
.donetail = &rcu_sched_ctrlblk.rcucblist, .donetail = &rcu_ctrlblk.rcucblist,
.curtail = &rcu_sched_ctrlblk.rcucblist, .curtail = &rcu_ctrlblk.rcucblist,
}; };
static struct rcu_ctrlblk rcu_bh_ctrlblk = { void rcu_barrier(void)
.donetail = &rcu_bh_ctrlblk.rcucblist,
.curtail = &rcu_bh_ctrlblk.rcucblist,
};
void rcu_barrier_bh(void)
{
wait_rcu_gp(call_rcu_bh);
}
EXPORT_SYMBOL(rcu_barrier_bh);
void rcu_barrier_sched(void)
{
wait_rcu_gp(call_rcu_sched);
}
EXPORT_SYMBOL(rcu_barrier_sched);
/*
* Helper function for rcu_sched_qs() and rcu_bh_qs().
* Also irqs are disabled to avoid confusion due to interrupt handlers
* invoking call_rcu().
*/
static int rcu_qsctr_help(struct rcu_ctrlblk *rcp)
{
if (rcp->donetail != rcp->curtail) {
rcp->donetail = rcp->curtail;
return 1;
}
return 0;
}
/*
* Record an rcu quiescent state. And an rcu_bh quiescent state while we
* are at it, given that any rcu quiescent state is also an rcu_bh
* quiescent state. Use "+" instead of "||" to defeat short circuiting.
*/
void rcu_sched_qs(void)
{ {
unsigned long flags; wait_rcu_gp(call_rcu);
local_irq_save(flags);
if (rcu_qsctr_help(&rcu_sched_ctrlblk) +
rcu_qsctr_help(&rcu_bh_ctrlblk))
raise_softirq(RCU_SOFTIRQ);
local_irq_restore(flags);
} }
EXPORT_SYMBOL(rcu_barrier);
/* /* Record an rcu quiescent state. */
* Record an rcu_bh quiescent state. void rcu_qs(void)
*/
void rcu_bh_qs(void)
{ {
unsigned long flags; unsigned long flags;
local_irq_save(flags); local_irq_save(flags);
if (rcu_qsctr_help(&rcu_bh_ctrlblk)) if (rcu_ctrlblk.donetail != rcu_ctrlblk.curtail) {
rcu_ctrlblk.donetail = rcu_ctrlblk.curtail;
raise_softirq(RCU_SOFTIRQ); raise_softirq(RCU_SOFTIRQ);
}
local_irq_restore(flags); local_irq_restore(flags);
} }
...@@ -120,34 +78,33 @@ void rcu_bh_qs(void) ...@@ -120,34 +78,33 @@ void rcu_bh_qs(void)
*/ */
void rcu_check_callbacks(int user) void rcu_check_callbacks(int user)
{ {
if (user) if (user) {
rcu_sched_qs(); rcu_qs();
if (user || !in_softirq()) } else if (rcu_ctrlblk.donetail != rcu_ctrlblk.curtail) {
rcu_bh_qs(); set_tsk_need_resched(current);
set_preempt_need_resched();
}
} }
/* /* Invoke the RCU callbacks whose grace period has elapsed. */
* Invoke the RCU callbacks on the specified rcu_ctrlkblk structure static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused)
* whose grace period has elapsed.
*/
static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
{ {
struct rcu_head *next, *list; struct rcu_head *next, *list;
unsigned long flags; unsigned long flags;
/* Move the ready-to-invoke callbacks to a local list. */ /* Move the ready-to-invoke callbacks to a local list. */
local_irq_save(flags); local_irq_save(flags);
if (rcp->donetail == &rcp->rcucblist) { if (rcu_ctrlblk.donetail == &rcu_ctrlblk.rcucblist) {
/* No callbacks ready, so just leave. */ /* No callbacks ready, so just leave. */
local_irq_restore(flags); local_irq_restore(flags);
return; return;
} }
list = rcp->rcucblist; list = rcu_ctrlblk.rcucblist;
rcp->rcucblist = *rcp->donetail; rcu_ctrlblk.rcucblist = *rcu_ctrlblk.donetail;
*rcp->donetail = NULL; *rcu_ctrlblk.donetail = NULL;
if (rcp->curtail == rcp->donetail) if (rcu_ctrlblk.curtail == rcu_ctrlblk.donetail)
rcp->curtail = &rcp->rcucblist; rcu_ctrlblk.curtail = &rcu_ctrlblk.rcucblist;
rcp->donetail = &rcp->rcucblist; rcu_ctrlblk.donetail = &rcu_ctrlblk.rcucblist;
local_irq_restore(flags); local_irq_restore(flags);
/* Invoke the callbacks on the local list. */ /* Invoke the callbacks on the local list. */
...@@ -162,37 +119,31 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp) ...@@ -162,37 +119,31 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
} }
} }
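The single-control-block callback list that Tiny RCU uses after this change can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel code: the struct and function names (cb, ctrlblk, enqueue, note_qs, process) are hypothetical, interrupt disabling and the softirq are omitted, and "invoking" a callback only sets a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rcu_head. */
struct cb {
	struct cb *next;
	int invoked;
};

/* Mirrors the three fields used from struct rcu_ctrlblk in the diff. */
struct ctrlblk {
	struct cb *list;	/* All queued callbacks. */
	struct cb **donetail;	/* Tail of callbacks whose grace period ended. */
	struct cb **curtail;	/* Tail of the whole list. */
};

static void ctrlblk_init(struct ctrlblk *c)
{
	c->list = NULL;
	c->donetail = &c->list;
	c->curtail = &c->list;
}

/* Analogous to call_rcu(): append at the current tail. */
static void enqueue(struct ctrlblk *c, struct cb *cb)
{
	cb->next = NULL;
	cb->invoked = 0;
	*c->curtail = cb;
	c->curtail = &cb->next;
}

/* Analogous to rcu_qs(): everything queued so far becomes ready. */
static int note_qs(struct ctrlblk *c)
{
	if (c->donetail != c->curtail) {
		c->donetail = c->curtail;
		return 1;	/* The kernel raises RCU_SOFTIRQ here. */
	}
	return 0;
}

/* Analogous to rcu_process_callbacks(): detach and run the ready part. */
static int process(struct ctrlblk *c)
{
	struct cb *list, *next;
	int n = 0;

	if (c->donetail == &c->list)
		return 0;	/* No callbacks ready. */
	list = c->list;
	c->list = *c->donetail;
	*c->donetail = NULL;
	if (c->curtail == c->donetail)
		c->curtail = &c->list;
	c->donetail = &c->list;
	for (; list; list = next) {
		next = list->next;
		list->invoked = 1;
		n++;
	}
	return n;
}
```

Callbacks queued after a quiescent state stay on the list until the next quiescent state, which is why process() splits the list at donetail rather than draining all of it.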
static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused)
{
__rcu_process_callbacks(&rcu_sched_ctrlblk);
__rcu_process_callbacks(&rcu_bh_ctrlblk);
}
/* /*
* Wait for a grace period to elapse. But it is illegal to invoke * Wait for a grace period to elapse. But it is illegal to invoke
* synchronize_sched() from within an RCU read-side critical section. * synchronize_rcu() from within an RCU read-side critical section.
* Therefore, any legal call to synchronize_sched() is a quiescent * Therefore, any legal call to synchronize_rcu() is a quiescent
* state, and so on a UP system, synchronize_sched() need do nothing. * state, and so on a UP system, synchronize_rcu() need do nothing.
* Ditto for synchronize_rcu_bh(). (But Lai Jiangshan points out the * (But Lai Jiangshan points out the benefits of doing might_sleep()
* benefits of doing might_sleep() to reduce latency.) * to reduce latency.)
* *
* Cool, huh? (Due to Josh Triplett.) * Cool, huh? (Due to Josh Triplett.)
*/ */
void synchronize_sched(void) void synchronize_rcu(void)
{ {
RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) || RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
lock_is_held(&rcu_lock_map) || lock_is_held(&rcu_lock_map) ||
lock_is_held(&rcu_sched_lock_map), lock_is_held(&rcu_sched_lock_map),
"Illegal synchronize_sched() in RCU read-side critical section"); "Illegal synchronize_rcu() in RCU read-side critical section");
} }
EXPORT_SYMBOL_GPL(synchronize_sched); EXPORT_SYMBOL_GPL(synchronize_rcu);
/* /*
* Helper function for call_rcu() and call_rcu_bh(). * Post an RCU callback to be invoked after the end of an RCU grace
* period. But since we have but one CPU, that would be after any
* quiescent state.
*/ */
static void __call_rcu(struct rcu_head *head, void call_rcu(struct rcu_head *head, rcu_callback_t func)
rcu_callback_t func,
struct rcu_ctrlblk *rcp)
{ {
unsigned long flags; unsigned long flags;
...@@ -201,39 +152,20 @@ static void __call_rcu(struct rcu_head *head, ...@@ -201,39 +152,20 @@ static void __call_rcu(struct rcu_head *head,
head->next = NULL; head->next = NULL;
local_irq_save(flags); local_irq_save(flags);
*rcp->curtail = head; *rcu_ctrlblk.curtail = head;
rcp->curtail = &head->next; rcu_ctrlblk.curtail = &head->next;
local_irq_restore(flags); local_irq_restore(flags);
if (unlikely(is_idle_task(current))) { if (unlikely(is_idle_task(current))) {
/* force scheduling for rcu_sched_qs() */ /* force scheduling for rcu_qs() */
resched_cpu(0); resched_cpu(0);
} }
} }
EXPORT_SYMBOL_GPL(call_rcu);
/*
* Post an RCU callback to be invoked after the end of an RCU-sched grace
* period. But since we have but one CPU, that would be after any
* quiescent state.
*/
void call_rcu_sched(struct rcu_head *head, rcu_callback_t func)
{
__call_rcu(head, func, &rcu_sched_ctrlblk);
}
EXPORT_SYMBOL_GPL(call_rcu_sched);
/*
* Post an RCU bottom-half callback to be invoked after any subsequent
* quiescent state.
*/
void call_rcu_bh(struct rcu_head *head, rcu_callback_t func)
{
__call_rcu(head, func, &rcu_bh_ctrlblk);
}
EXPORT_SYMBOL_GPL(call_rcu_bh);
void __init rcu_init(void) void __init rcu_init(void)
{ {
open_softirq(RCU_SOFTIRQ, rcu_process_callbacks); open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
rcu_early_boot_tests(); rcu_early_boot_tests();
srcu_init();
} }
This source diff could not be displayed because it is too large.
...@@ -34,34 +34,9 @@ ...@@ -34,34 +34,9 @@
#include "rcu_segcblist.h" #include "rcu_segcblist.h"
/*
* Dynticks per-CPU state.
*/
struct rcu_dynticks {
long dynticks_nesting; /* Track process nesting level. */
long dynticks_nmi_nesting; /* Track irq/NMI nesting level. */
atomic_t dynticks; /* Even value for idle, else odd. */
bool rcu_need_heavy_qs; /* GP old, need heavy quiescent state. */
unsigned long rcu_qs_ctr; /* Light universal quiescent state ctr. */
bool rcu_urgent_qs; /* GP old need light quiescent state. */
#ifdef CONFIG_RCU_FAST_NO_HZ
bool all_lazy; /* Are all CPU's CBs lazy? */
unsigned long nonlazy_posted;
/* # times non-lazy CBs posted to CPU. */
unsigned long nonlazy_posted_snap;
/* idle-period nonlazy_posted snapshot. */
unsigned long last_accelerate;
/* Last jiffy CBs were accelerated. */
unsigned long last_advance_all;
/* Last jiffy CBs were all advanced. */
int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */
#endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
};
/* Communicate arguments to a workqueue handler. */ /* Communicate arguments to a workqueue handler. */
struct rcu_exp_work { struct rcu_exp_work {
smp_call_func_t rew_func; smp_call_func_t rew_func;
struct rcu_state *rew_rsp;
unsigned long rew_s; unsigned long rew_s;
struct work_struct rew_work; struct work_struct rew_work;
}; };
...@@ -170,7 +145,7 @@ struct rcu_node { ...@@ -170,7 +145,7 @@ struct rcu_node {
* are indexed relative to this interval rather than the global CPU ID space. * are indexed relative to this interval rather than the global CPU ID space.
* This generates the bit for a CPU in node-local masks. * This generates the bit for a CPU in node-local masks.
*/ */
#define leaf_node_cpu_bit(rnp, cpu) (1UL << ((cpu) - (rnp)->grplo)) #define leaf_node_cpu_bit(rnp, cpu) (BIT((cpu) - (rnp)->grplo))
/* /*
* Union to allow "aggregate OR" operation on the need for a quiescent * Union to allow "aggregate OR" operation on the need for a quiescent
...@@ -189,12 +164,11 @@ struct rcu_data { ...@@ -189,12 +164,11 @@ struct rcu_data {
/* 1) quiescent-state and grace-period handling : */ /* 1) quiescent-state and grace-period handling : */
unsigned long gp_seq; /* Track rsp->rcu_gp_seq counter. */ unsigned long gp_seq; /* Track rsp->rcu_gp_seq counter. */
unsigned long gp_seq_needed; /* Track rsp->rcu_gp_seq_needed ctr. */ unsigned long gp_seq_needed; /* Track rsp->rcu_gp_seq_needed ctr. */
unsigned long rcu_qs_ctr_snap;/* Snapshot of rcu_qs_ctr to check */
/* for rcu_all_qs() invocations. */
union rcu_noqs cpu_no_qs; /* No QSes yet for this CPU. */ union rcu_noqs cpu_no_qs; /* No QSes yet for this CPU. */
bool core_needs_qs; /* Core waits for quiesc state. */ bool core_needs_qs; /* Core waits for quiesc state. */
bool beenonline; /* CPU online at least once. */ bool beenonline; /* CPU online at least once. */
bool gpwrap; /* Possible ->gp_seq wrap. */ bool gpwrap; /* Possible ->gp_seq wrap. */
bool deferred_qs; /* This CPU awaiting a deferred QS? */
struct rcu_node *mynode; /* This CPU's leaf of hierarchy */ struct rcu_node *mynode; /* This CPU's leaf of hierarchy */
unsigned long grpmask; /* Mask to apply to leaf qsmask. */ unsigned long grpmask; /* Mask to apply to leaf qsmask. */
unsigned long ticks_this_gp; /* The number of scheduling-clock */ unsigned long ticks_this_gp; /* The number of scheduling-clock */
...@@ -213,23 +187,27 @@ struct rcu_data { ...@@ -213,23 +187,27 @@ struct rcu_data {
long blimit; /* Upper limit on a processed batch */ long blimit; /* Upper limit on a processed batch */
/* 3) dynticks interface. */ /* 3) dynticks interface. */
struct rcu_dynticks *dynticks; /* Shared per-CPU dynticks state. */
int dynticks_snap; /* Per-GP tracking for dynticks. */ int dynticks_snap; /* Per-GP tracking for dynticks. */
long dynticks_nesting; /* Track process nesting level. */
/* 4) reasons this CPU needed to be kicked by force_quiescent_state */ long dynticks_nmi_nesting; /* Track irq/NMI nesting level. */
unsigned long dynticks_fqs; /* Kicked due to dynticks idle. */ atomic_t dynticks; /* Even value for idle, else odd. */
unsigned long cond_resched_completed; bool rcu_need_heavy_qs; /* GP old, so heavy quiescent state! */
/* Grace period that needs help */ bool rcu_urgent_qs; /* GP old need light quiescent state. */
/* from cond_resched(). */
/* 5) _rcu_barrier(), OOM callbacks, and expediting. */
struct rcu_head barrier_head;
#ifdef CONFIG_RCU_FAST_NO_HZ #ifdef CONFIG_RCU_FAST_NO_HZ
struct rcu_head oom_head; bool all_lazy; /* Are all CPU's CBs lazy? */
unsigned long nonlazy_posted; /* # times non-lazy CB posted to CPU. */
unsigned long nonlazy_posted_snap;
/* Nonlazy_posted snapshot. */
unsigned long last_accelerate; /* Last jiffy CBs were accelerated. */
unsigned long last_advance_all; /* Last jiffy CBs were all advanced. */
int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */
#endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */ #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
/* 4) rcu_barrier(), OOM callbacks, and expediting. */
struct rcu_head barrier_head;
int exp_dynticks_snap; /* Double-check need for IPI. */ int exp_dynticks_snap; /* Double-check need for IPI. */
/* 6) Callback offloading. */ /* 5) Callback offloading. */
#ifdef CONFIG_RCU_NOCB_CPU #ifdef CONFIG_RCU_NOCB_CPU
struct rcu_head *nocb_head; /* CBs waiting for kthread. */ struct rcu_head *nocb_head; /* CBs waiting for kthread. */
struct rcu_head **nocb_tail; struct rcu_head **nocb_tail;
...@@ -256,7 +234,7 @@ struct rcu_data { ...@@ -256,7 +234,7 @@ struct rcu_data {
/* Leader CPU takes GP-end wakeups. */ /* Leader CPU takes GP-end wakeups. */
#endif /* #ifdef CONFIG_RCU_NOCB_CPU */ #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
/* 7) Diagnostic data, including RCU CPU stall warnings. */ /* 6) Diagnostic data, including RCU CPU stall warnings. */
unsigned int softirq_snap; /* Snapshot of softirq activity. */ unsigned int softirq_snap; /* Snapshot of softirq activity. */
/* ->rcu_iw* fields protected by leaf rcu_node ->lock. */ /* ->rcu_iw* fields protected by leaf rcu_node ->lock. */
struct irq_work rcu_iw; /* Check for non-irq activity. */ struct irq_work rcu_iw; /* Check for non-irq activity. */
...@@ -266,9 +244,9 @@ struct rcu_data { ...@@ -266,9 +244,9 @@ struct rcu_data {
short rcu_ofl_gp_flags; /* ->gp_flags at last offline. */ short rcu_ofl_gp_flags; /* ->gp_flags at last offline. */
unsigned long rcu_onl_gp_seq; /* ->gp_seq at last online. */ unsigned long rcu_onl_gp_seq; /* ->gp_seq at last online. */
short rcu_onl_gp_flags; /* ->gp_flags at last online. */ short rcu_onl_gp_flags; /* ->gp_flags at last online. */
unsigned long last_fqs_resched; /* Time of last rcu_resched(). */
int cpu; int cpu;
struct rcu_state *rsp;
}; };
/* Values for nocb_defer_wakeup field in struct rcu_data. */ /* Values for nocb_defer_wakeup field in struct rcu_data. */
...@@ -314,8 +292,6 @@ struct rcu_state { ...@@ -314,8 +292,6 @@ struct rcu_state {
struct rcu_node *level[RCU_NUM_LVLS + 1]; struct rcu_node *level[RCU_NUM_LVLS + 1];
/* Hierarchy levels (+1 to */ /* Hierarchy levels (+1 to */
/* shut bogus gcc warning) */ /* shut bogus gcc warning) */
struct rcu_data __percpu *rda; /* pointer of percu rcu_data. */
call_rcu_func_t call; /* call_rcu() flavor. */
int ncpus; /* # CPUs seen so far. */ int ncpus; /* # CPUs seen so far. */
/* The following fields are guarded by the root rcu_node's lock. */ /* The following fields are guarded by the root rcu_node's lock. */
...@@ -334,7 +310,7 @@ struct rcu_state { ...@@ -334,7 +310,7 @@ struct rcu_state {
atomic_t barrier_cpu_count; /* # CPUs waiting on. */ atomic_t barrier_cpu_count; /* # CPUs waiting on. */
struct completion barrier_completion; /* Wake at barrier end. */ struct completion barrier_completion; /* Wake at barrier end. */
unsigned long barrier_sequence; /* ++ at start and end of */ unsigned long barrier_sequence; /* ++ at start and end of */
/* _rcu_barrier(). */ /* rcu_barrier(). */
/* End of fields guarded by barrier_mutex. */ /* End of fields guarded by barrier_mutex. */
struct mutex exp_mutex; /* Serialize expedited GP. */ struct mutex exp_mutex; /* Serialize expedited GP. */
...@@ -366,9 +342,8 @@ struct rcu_state { ...@@ -366,9 +342,8 @@ struct rcu_state {
/* jiffies. */ /* jiffies. */
const char *name; /* Name of structure. */ const char *name; /* Name of structure. */
char abbr; /* Abbreviated name. */ char abbr; /* Abbreviated name. */
struct list_head flavors; /* List of RCU flavors. */
spinlock_t ofl_lock ____cacheline_internodealigned_in_smp; raw_spinlock_t ofl_lock ____cacheline_internodealigned_in_smp;
/* Synchronize offline with */ /* Synchronize offline with */
/* GP pre-initialization. */ /* GP pre-initialization. */
}; };
...@@ -388,7 +363,6 @@ struct rcu_state { ...@@ -388,7 +363,6 @@ struct rcu_state {
#define RCU_GP_CLEANUP 7 /* Grace-period cleanup started. */ #define RCU_GP_CLEANUP 7 /* Grace-period cleanup started. */
#define RCU_GP_CLEANED 8 /* Grace-period cleanup complete. */ #define RCU_GP_CLEANED 8 /* Grace-period cleanup complete. */
#ifndef RCU_TREE_NONCORE
static const char * const gp_state_names[] = { static const char * const gp_state_names[] = {
"RCU_GP_IDLE", "RCU_GP_IDLE",
"RCU_GP_WAIT_GPS", "RCU_GP_WAIT_GPS",
...@@ -400,13 +374,29 @@ static const char * const gp_state_names[] = { ...@@ -400,13 +374,29 @@ static const char * const gp_state_names[] = {
"RCU_GP_CLEANUP", "RCU_GP_CLEANUP",
"RCU_GP_CLEANED", "RCU_GP_CLEANED",
}; };
#endif /* #ifndef RCU_TREE_NONCORE */
extern struct list_head rcu_struct_flavors;
/* Sequence through rcu_state structures for each RCU flavor. */ /*
#define for_each_rcu_flavor(rsp) \ * In order to export the rcu_state name to the tracing tools, it
list_for_each_entry((rsp), &rcu_struct_flavors, flavors) * needs to be added in the __tracepoint_string section.
* This requires defining a separate variable tp_<sname>_varname
* that points to the string being used, and this will allow
* the tracing userspace tools to be able to decipher the string
* address to the matching string.
*/
#ifdef CONFIG_PREEMPT_RCU
#define RCU_ABBR 'p'
#define RCU_NAME_RAW "rcu_preempt"
#else /* #ifdef CONFIG_PREEMPT_RCU */
#define RCU_ABBR 's'
#define RCU_NAME_RAW "rcu_sched"
#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
#ifndef CONFIG_TRACING
#define RCU_NAME RCU_NAME_RAW
#else /* #ifdef CONFIG_TRACING */
static char rcu_name[] = RCU_NAME_RAW;
static const char *tp_rcu_varname __used __tracepoint_string = rcu_name;
#define RCU_NAME rcu_name
#endif /* #else #ifdef CONFIG_TRACING */
/* /*
* RCU implementation internal declarations: * RCU implementation internal declarations:
...@@ -419,7 +409,7 @@ extern struct rcu_state rcu_bh_state; ...@@ -419,7 +409,7 @@ extern struct rcu_state rcu_bh_state;
extern struct rcu_state rcu_preempt_state; extern struct rcu_state rcu_preempt_state;
#endif /* #ifdef CONFIG_PREEMPT_RCU */ #endif /* #ifdef CONFIG_PREEMPT_RCU */
int rcu_dynticks_snap(struct rcu_dynticks *rdtp); int rcu_dynticks_snap(struct rcu_data *rdp);
#ifdef CONFIG_RCU_BOOST #ifdef CONFIG_RCU_BOOST
DECLARE_PER_CPU(unsigned int, rcu_cpu_kthread_status); DECLARE_PER_CPU(unsigned int, rcu_cpu_kthread_status);
...@@ -428,45 +418,37 @@ DECLARE_PER_CPU(unsigned int, rcu_cpu_kthread_loops); ...@@ -428,45 +418,37 @@ DECLARE_PER_CPU(unsigned int, rcu_cpu_kthread_loops);
DECLARE_PER_CPU(char, rcu_cpu_has_work); DECLARE_PER_CPU(char, rcu_cpu_has_work);
#endif /* #ifdef CONFIG_RCU_BOOST */ #endif /* #ifdef CONFIG_RCU_BOOST */
#ifndef RCU_TREE_NONCORE
/* Forward declarations for rcutree_plugin.h */ /* Forward declarations for rcutree_plugin.h */
static void rcu_bootup_announce(void); static void rcu_bootup_announce(void);
static void rcu_preempt_note_context_switch(bool preempt); static void rcu_qs(void);
static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp); static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp);
#ifdef CONFIG_HOTPLUG_CPU #ifdef CONFIG_HOTPLUG_CPU
static bool rcu_preempt_has_tasks(struct rcu_node *rnp); static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
#endif /* #ifdef CONFIG_HOTPLUG_CPU */ #endif /* #ifdef CONFIG_HOTPLUG_CPU */
static void rcu_print_detail_task_stall(struct rcu_state *rsp); static void rcu_print_detail_task_stall(void);
static int rcu_print_task_stall(struct rcu_node *rnp); static int rcu_print_task_stall(struct rcu_node *rnp);
static int rcu_print_task_exp_stall(struct rcu_node *rnp); static int rcu_print_task_exp_stall(struct rcu_node *rnp);
static void rcu_preempt_check_blocked_tasks(struct rcu_state *rsp, static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp);
struct rcu_node *rnp); static void rcu_flavor_check_callbacks(int user);
static void rcu_preempt_check_callbacks(void);
void call_rcu(struct rcu_head *head, rcu_callback_t func); void call_rcu(struct rcu_head *head, rcu_callback_t func);
static void __init __rcu_init_preempt(void); static void dump_blkd_tasks(struct rcu_node *rnp, int ncheck);
static void dump_blkd_tasks(struct rcu_state *rsp, struct rcu_node *rnp,
int ncheck);
static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags); static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags);
static void rcu_preempt_boost_start_gp(struct rcu_node *rnp); static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
static void invoke_rcu_callbacks_kthread(void); static void invoke_rcu_callbacks_kthread(void);
static bool rcu_is_callbacks_kthread(void); static bool rcu_is_callbacks_kthread(void);
#ifdef CONFIG_RCU_BOOST
static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
struct rcu_node *rnp);
#endif /* #ifdef CONFIG_RCU_BOOST */
static void __init rcu_spawn_boost_kthreads(void); static void __init rcu_spawn_boost_kthreads(void);
static void rcu_prepare_kthreads(int cpu); static void rcu_prepare_kthreads(int cpu);
static void rcu_cleanup_after_idle(void); static void rcu_cleanup_after_idle(void);
static void rcu_prepare_for_idle(void); static void rcu_prepare_for_idle(void);
static void rcu_idle_count_callbacks_posted(void); static void rcu_idle_count_callbacks_posted(void);
static bool rcu_preempt_has_tasks(struct rcu_node *rnp); static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
static bool rcu_preempt_need_deferred_qs(struct task_struct *t);
static void rcu_preempt_deferred_qs(struct task_struct *t);
static void print_cpu_stall_info_begin(void); static void print_cpu_stall_info_begin(void);
static void print_cpu_stall_info(struct rcu_state *rsp, int cpu); static void print_cpu_stall_info(int cpu);
static void print_cpu_stall_info_end(void); static void print_cpu_stall_info_end(void);
static void zero_cpu_stall_ticks(struct rcu_data *rdp); static void zero_cpu_stall_ticks(struct rcu_data *rdp);
static void increment_cpu_stall_ticks(void); static bool rcu_nocb_cpu_needs_barrier(int cpu);
static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu);
static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp); static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq); static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
static void rcu_init_one_nocb(struct rcu_node *rnp); static void rcu_init_one_nocb(struct rcu_node *rnp);
...@@ -481,11 +463,11 @@ static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp); ...@@ -481,11 +463,11 @@ static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp);
static void rcu_spawn_all_nocb_kthreads(int cpu); static void rcu_spawn_all_nocb_kthreads(int cpu);
static void __init rcu_spawn_nocb_kthreads(void); static void __init rcu_spawn_nocb_kthreads(void);
#ifdef CONFIG_RCU_NOCB_CPU #ifdef CONFIG_RCU_NOCB_CPU
static void __init rcu_organize_nocb_kthreads(struct rcu_state *rsp); static void __init rcu_organize_nocb_kthreads(void);
#endif /* #ifdef CONFIG_RCU_NOCB_CPU */ #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
static bool init_nocb_callback_list(struct rcu_data *rdp); static bool init_nocb_callback_list(struct rcu_data *rdp);
static void rcu_bind_gp_kthread(void); static void rcu_bind_gp_kthread(void);
static bool rcu_nohz_full_cpu(struct rcu_state *rsp); static bool rcu_nohz_full_cpu(void);
static void rcu_dynticks_task_enter(void); static void rcu_dynticks_task_enter(void);
static void rcu_dynticks_task_exit(void); static void rcu_dynticks_task_exit(void);
...@@ -496,5 +478,3 @@ void srcu_offline_cpu(unsigned int cpu); ...@@ -496,5 +478,3 @@ void srcu_offline_cpu(unsigned int cpu);
void srcu_online_cpu(unsigned int cpu) { } void srcu_online_cpu(unsigned int cpu) { }
void srcu_offline_cpu(unsigned int cpu) { } void srcu_offline_cpu(unsigned int cpu) { }
#endif /* #else #ifdef CONFIG_SRCU */ #endif /* #else #ifdef CONFIG_SRCU */
#endif /* #ifndef RCU_TREE_NONCORE */
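Reviewer note on the declarations above: most of these hunks drop a `struct rcu_state *rsp` parameter, which is possible once only one `rcu_state` instance exists. The following is a minimal user-space sketch of that refactoring pattern, not kernel code. The names `toy_state`, `toy_seq_start_old()`, and `toy_seq_start_new()` are hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for a per-flavor state structure. */
struct toy_state {
	unsigned long expedited_sequence;
	const char *name;
};

/*
 * "Before" shape: several instances may exist, so every helper
 * must be told which one to operate on.
 */
static void toy_seq_start_old(struct toy_state *sp)
{
	sp->expedited_sequence++;
}

/*
 * "After" shape: exactly one file-scope instance remains, so the
 * pointer argument carries no information and can be removed.
 */
static struct toy_state toy_state = { .expedited_sequence = 0, .name = "toy" };

static void toy_seq_start_new(void)
{
	toy_state.expedited_sequence++;
}
```

Callers of the "after" form no longer need to look up or thread through a state pointer, which is why so many signatures in this diff shrink to `(void)`.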
...@@ -25,39 +25,39 @@ ...@@ -25,39 +25,39 @@
/* /*
* Record the start of an expedited grace period. * Record the start of an expedited grace period.
*/ */
static void rcu_exp_gp_seq_start(struct rcu_state *rsp) static void rcu_exp_gp_seq_start(void)
{ {
rcu_seq_start(&rsp->expedited_sequence); rcu_seq_start(&rcu_state.expedited_sequence);
} }
/* /*
 * Return the value that the expedited-grace-period counter will have * Return the value that the expedited-grace-period counter will have
* at the end of the current grace period. * at the end of the current grace period.
*/ */
static __maybe_unused unsigned long rcu_exp_gp_seq_endval(struct rcu_state *rsp) static __maybe_unused unsigned long rcu_exp_gp_seq_endval(void)
{ {
return rcu_seq_endval(&rsp->expedited_sequence); return rcu_seq_endval(&rcu_state.expedited_sequence);
} }
/* /*
* Record the end of an expedited grace period. * Record the end of an expedited grace period.
*/ */
static void rcu_exp_gp_seq_end(struct rcu_state *rsp) static void rcu_exp_gp_seq_end(void)
{ {
rcu_seq_end(&rsp->expedited_sequence); rcu_seq_end(&rcu_state.expedited_sequence);
smp_mb(); /* Ensure that consecutive grace periods serialize. */ smp_mb(); /* Ensure that consecutive grace periods serialize. */
} }
/* /*
* Take a snapshot of the expedited-grace-period counter. * Take a snapshot of the expedited-grace-period counter.
*/ */
static unsigned long rcu_exp_gp_seq_snap(struct rcu_state *rsp) static unsigned long rcu_exp_gp_seq_snap(void)
{ {
unsigned long s; unsigned long s;
smp_mb(); /* Caller's modifications seen first by other CPUs. */ smp_mb(); /* Caller's modifications seen first by other CPUs. */
s = rcu_seq_snap(&rsp->expedited_sequence); s = rcu_seq_snap(&rcu_state.expedited_sequence);
trace_rcu_exp_grace_period(rsp->name, s, TPS("snap")); trace_rcu_exp_grace_period(rcu_state.name, s, TPS("snap"));
return s; return s;
} }
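Reviewer note on the four helpers above: they wrap a sequence counter that is bumped at the start and at the end of each expedited grace period, and a snapshot names the counter value at which a full grace period will have elapsed. The sketch below is a simplified, single-threaded user-space model of that idea. It assumes the low two bits of the counter hold an "in progress" state and the remaining bits count grace periods; that layout, every `toy_` name, and the omission of the memory barriers and tracing seen in the diff are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define TOY_SEQ_CTR_SHIFT	2
#define TOY_SEQ_STATE_MASK	((1UL << TOY_SEQ_CTR_SHIFT) - 1)

/* Hypothetical global counter standing in for ->expedited_sequence. */
static unsigned long toy_expedited_sequence;

/* Mark a grace period as in progress: state bits become nonzero. */
static void toy_exp_gp_seq_start(void)
{
	toy_expedited_sequence += 1;
}

/* Value the counter will have once the current grace period ends. */
static unsigned long toy_exp_gp_seq_endval(void)
{
	return (toy_expedited_sequence | TOY_SEQ_STATE_MASK) + 1;
}

/* Mark the grace period as finished: state bits return to zero. */
static void toy_exp_gp_seq_end(void)
{
	toy_expedited_sequence = toy_exp_gp_seq_endval();
}

/*
 * Snapshot: the smallest counter value that guarantees a full grace
 * period has elapsed after this call. If one is already in progress,
 * it does not count, so the snapshot points past the next one.
 */
static unsigned long toy_exp_gp_seq_snap(void)
{
	return (toy_expedited_sequence + 2 * TOY_SEQ_STATE_MASK + 1) &
	       ~TOY_SEQ_STATE_MASK;
}

/* Wrap-safe test of whether the counter has reached snapshot s. */
static bool toy_exp_gp_seq_done(unsigned long s)
{
	return ULONG_MAX / 2 >= toy_expedited_sequence - s;
}

/* Pick one of four wait queues from the grace-period number in s. */
static unsigned long toy_exp_wq_index(unsigned long s)
{
	return (s >> TOY_SEQ_CTR_SHIFT) & 0x3;
}
```

In this model, a caller takes a snapshot, then either waits until `toy_exp_gp_seq_done()` reports true or drives a grace period itself with the start/end pair. A snapshot taken while a grace period is already running is not satisfied by that grace period, only by the one after it.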
...@@ -66,9 +66,9 @@ static unsigned long rcu_exp_gp_seq_snap(struct rcu_state *rsp) ...@@ -66,9 +66,9 @@ static unsigned long rcu_exp_gp_seq_snap(struct rcu_state *rsp)
* if a full expedited grace period has elapsed since that snapshot * if a full expedited grace period has elapsed since that snapshot
* was taken. * was taken.
*/ */
static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s) static bool rcu_exp_gp_seq_done(unsigned long s)
{ {
return rcu_seq_done(&rsp->expedited_sequence, s); return rcu_seq_done(&rcu_state.expedited_sequence, s);
} }
/* /*
...@@ -78,26 +78,26 @@ static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s) ...@@ -78,26 +78,26 @@ static bool rcu_exp_gp_seq_done(struct rcu_state *rsp, unsigned long s)
* ever been online. This means that this function normally takes its * ever been online. This means that this function normally takes its
* no-work-to-do fastpath. * no-work-to-do fastpath.
*/ */
static void sync_exp_reset_tree_hotplug(struct rcu_state *rsp) static void sync_exp_reset_tree_hotplug(void)
{ {
bool done; bool done;
unsigned long flags; unsigned long flags;
unsigned long mask; unsigned long mask;
unsigned long oldmask; unsigned long oldmask;
int ncpus = smp_load_acquire(&rsp->ncpus); /* Order against locking. */ int ncpus = smp_load_acquire(&rcu_state.ncpus); /* Order vs. locking. */
struct rcu_node *rnp; struct rcu_node *rnp;
struct rcu_node *rnp_up; struct rcu_node *rnp_up;
/* If no new CPUs onlined since last time, nothing to do. */ /* If no new CPUs onlined since last time, nothing to do. */
if (likely(ncpus == rsp->ncpus_snap)) if (likely(ncpus == rcu_state.ncpus_snap))
return; return;
rsp->ncpus_snap = ncpus; rcu_state.ncpus_snap = ncpus;
/* /*
* Each pass through the following loop propagates newly onlined * Each pass through the following loop propagates newly onlined
* CPUs for the current rcu_node structure up the rcu_node tree. * CPUs for the current rcu_node structure up the rcu_node tree.
*/ */
rcu_for_each_leaf_node(rsp, rnp) { rcu_for_each_leaf_node(rnp) {
raw_spin_lock_irqsave_rcu_node(rnp, flags); raw_spin_lock_irqsave_rcu_node(rnp, flags);
if (rnp->expmaskinit == rnp->expmaskinitnext) { if (rnp->expmaskinit == rnp->expmaskinitnext) {
raw_spin_unlock_irqrestore_rcu_node(rnp, flags); raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
...@@ -135,13 +135,13 @@ static void sync_exp_reset_tree_hotplug(struct rcu_state *rsp) ...@@ -135,13 +135,13 @@ static void sync_exp_reset_tree_hotplug(struct rcu_state *rsp)
* Reset the ->expmask values in the rcu_node tree in preparation for * Reset the ->expmask values in the rcu_node tree in preparation for
* a new expedited grace period. * a new expedited grace period.
*/ */
static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp) static void __maybe_unused sync_exp_reset_tree(void)
{ {
unsigned long flags; unsigned long flags;
struct rcu_node *rnp; struct rcu_node *rnp;
sync_exp_reset_tree_hotplug(rsp); sync_exp_reset_tree_hotplug();
rcu_for_each_node_breadth_first(rsp, rnp) { rcu_for_each_node_breadth_first(rnp) {
raw_spin_lock_irqsave_rcu_node(rnp, flags); raw_spin_lock_irqsave_rcu_node(rnp, flags);
WARN_ON_ONCE(rnp->expmask); WARN_ON_ONCE(rnp->expmask);
rnp->expmask = rnp->expmaskinit; rnp->expmask = rnp->expmaskinit;
...@@ -194,7 +194,7 @@ static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp) ...@@ -194,7 +194,7 @@ static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp)
* *
* Caller must hold the specified rcu_node structure's ->lock. * Caller must hold the specified rcu_node structure's ->lock.
*/ */
static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp, static void __rcu_report_exp_rnp(struct rcu_node *rnp,
bool wake, unsigned long flags) bool wake, unsigned long flags)
__releases(rnp->lock) __releases(rnp->lock)
{ {
...@@ -212,7 +212,7 @@ static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp, ...@@ -212,7 +212,7 @@ static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
raw_spin_unlock_irqrestore_rcu_node(rnp, flags); raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
if (wake) { if (wake) {
smp_mb(); /* EGP done before wake_up(). */ smp_mb(); /* EGP done before wake_up(). */
swake_up_one(&rsp->expedited_wq); swake_up_one(&rcu_state.expedited_wq);
} }
break; break;
} }
...@@ -229,20 +229,19 @@ static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp, ...@@ -229,20 +229,19 @@ static void __rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
* Report expedited quiescent state for specified node. This is a * Report expedited quiescent state for specified node. This is a
* lock-acquisition wrapper function for __rcu_report_exp_rnp(). * lock-acquisition wrapper function for __rcu_report_exp_rnp().
*/ */
static void __maybe_unused rcu_report_exp_rnp(struct rcu_state *rsp, static void __maybe_unused rcu_report_exp_rnp(struct rcu_node *rnp, bool wake)
struct rcu_node *rnp, bool wake)
{ {
unsigned long flags; unsigned long flags;
raw_spin_lock_irqsave_rcu_node(rnp, flags); raw_spin_lock_irqsave_rcu_node(rnp, flags);
__rcu_report_exp_rnp(rsp, rnp, wake, flags); __rcu_report_exp_rnp(rnp, wake, flags);
} }
/* /*
* Report expedited quiescent state for multiple CPUs, all covered by the * Report expedited quiescent state for multiple CPUs, all covered by the
* specified leaf rcu_node structure. * specified leaf rcu_node structure.
*/ */
static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp, static void rcu_report_exp_cpu_mult(struct rcu_node *rnp,
unsigned long mask, bool wake) unsigned long mask, bool wake)
{ {
unsigned long flags; unsigned long flags;
...@@ -253,23 +252,23 @@ static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp, ...@@ -253,23 +252,23 @@ static void rcu_report_exp_cpu_mult(struct rcu_state *rsp, struct rcu_node *rnp,
return; return;
} }
rnp->expmask &= ~mask; rnp->expmask &= ~mask;
__rcu_report_exp_rnp(rsp, rnp, wake, flags); /* Releases rnp->lock. */ __rcu_report_exp_rnp(rnp, wake, flags); /* Releases rnp->lock. */
} }
/* /*
* Report expedited quiescent state for specified rcu_data (CPU). * Report expedited quiescent state for specified rcu_data (CPU).
*/ */
static void rcu_report_exp_rdp(struct rcu_state *rsp, struct rcu_data *rdp, static void rcu_report_exp_rdp(struct rcu_data *rdp)
bool wake)
{ {
rcu_report_exp_cpu_mult(rsp, rdp->mynode, rdp->grpmask, wake); WRITE_ONCE(rdp->deferred_qs, false);
rcu_report_exp_cpu_mult(rdp->mynode, rdp->grpmask, true);
} }
/* Common code for synchronize_{rcu,sched}_expedited() work-done checking. */ /* Common code for work-done checking. */
static bool sync_exp_work_done(struct rcu_state *rsp, unsigned long s) static bool sync_exp_work_done(unsigned long s)
{ {
if (rcu_exp_gp_seq_done(rsp, s)) { if (rcu_exp_gp_seq_done(s)) {
trace_rcu_exp_grace_period(rsp->name, s, TPS("done")); trace_rcu_exp_grace_period(rcu_state.name, s, TPS("done"));
/* Ensure test happens before caller kfree(). */ /* Ensure test happens before caller kfree(). */
smp_mb__before_atomic(); /* ^^^ */ smp_mb__before_atomic(); /* ^^^ */
return true; return true;
...@@ -284,28 +283,28 @@ static bool sync_exp_work_done(struct rcu_state *rsp, unsigned long s) ...@@ -284,28 +283,28 @@ static bool sync_exp_work_done(struct rcu_state *rsp, unsigned long s)
* with the mutex held, indicating that the caller must actually do the * with the mutex held, indicating that the caller must actually do the
* expedited grace period. * expedited grace period.
*/ */
static bool exp_funnel_lock(struct rcu_state *rsp, unsigned long s) static bool exp_funnel_lock(unsigned long s)
{ {
struct rcu_data *rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id()); struct rcu_data *rdp = per_cpu_ptr(&rcu_data, raw_smp_processor_id());
struct rcu_node *rnp = rdp->mynode; struct rcu_node *rnp = rdp->mynode;
struct rcu_node *rnp_root = rcu_get_root(rsp); struct rcu_node *rnp_root = rcu_get_root();
/* Low-contention fastpath. */ /* Low-contention fastpath. */
if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s) && if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s) &&
(rnp == rnp_root || (rnp == rnp_root ||
ULONG_CMP_LT(READ_ONCE(rnp_root->exp_seq_rq), s)) && ULONG_CMP_LT(READ_ONCE(rnp_root->exp_seq_rq), s)) &&
mutex_trylock(&rsp->exp_mutex)) mutex_trylock(&rcu_state.exp_mutex))
goto fastpath; goto fastpath;
/* /*
* Each pass through the following loop works its way up * Each pass through the following loop works its way up
* the rcu_node tree, returning if others have done the work or * the rcu_node tree, returning if others have done the work or
* otherwise falls through to acquire rsp->exp_mutex. The mapping * otherwise falls through to acquire ->exp_mutex. The mapping
* from CPU to rcu_node structure can be inexact, as it is just * from CPU to rcu_node structure can be inexact, as it is just
* promoting locality and is not strictly needed for correctness. * promoting locality and is not strictly needed for correctness.
*/ */
for (; rnp != NULL; rnp = rnp->parent) { for (; rnp != NULL; rnp = rnp->parent) {
if (sync_exp_work_done(rsp, s)) if (sync_exp_work_done(s))
return true; return true;
/* Work not done, either wait here or go up. */ /* Work not done, either wait here or go up. */
...@@ -314,68 +313,29 @@ static bool exp_funnel_lock(struct rcu_state *rsp, unsigned long s) ...@@ -314,68 +313,29 @@ static bool exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
/* Someone else doing GP, so wait for them. */ /* Someone else doing GP, so wait for them. */
spin_unlock(&rnp->exp_lock); spin_unlock(&rnp->exp_lock);
trace_rcu_exp_funnel_lock(rsp->name, rnp->level, trace_rcu_exp_funnel_lock(rcu_state.name, rnp->level,
rnp->grplo, rnp->grphi, rnp->grplo, rnp->grphi,
TPS("wait")); TPS("wait"));
wait_event(rnp->exp_wq[rcu_seq_ctr(s) & 0x3], wait_event(rnp->exp_wq[rcu_seq_ctr(s) & 0x3],
sync_exp_work_done(rsp, s)); sync_exp_work_done(s));
return true; return true;
} }
rnp->exp_seq_rq = s; /* Followers can wait on us. */ rnp->exp_seq_rq = s; /* Followers can wait on us. */
spin_unlock(&rnp->exp_lock); spin_unlock(&rnp->exp_lock);
trace_rcu_exp_funnel_lock(rsp->name, rnp->level, rnp->grplo, trace_rcu_exp_funnel_lock(rcu_state.name, rnp->level,
rnp->grphi, TPS("nxtlvl")); rnp->grplo, rnp->grphi, TPS("nxtlvl"));
} }
mutex_lock(&rsp->exp_mutex); mutex_lock(&rcu_state.exp_mutex);
fastpath: fastpath:
if (sync_exp_work_done(rsp, s)) { if (sync_exp_work_done(s)) {
mutex_unlock(&rsp->exp_mutex); mutex_unlock(&rcu_state.exp_mutex);
return true; return true;
} }
rcu_exp_gp_seq_start(rsp); rcu_exp_gp_seq_start();
trace_rcu_exp_grace_period(rsp->name, s, TPS("start")); trace_rcu_exp_grace_period(rcu_state.name, s, TPS("start"));
return false; return false;
} }
/* Invoked on each online non-idle CPU for expedited quiescent state. */
static void sync_sched_exp_handler(void *data)
{
struct rcu_data *rdp;
struct rcu_node *rnp;
struct rcu_state *rsp = data;
rdp = this_cpu_ptr(rsp->rda);
rnp = rdp->mynode;
if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
__this_cpu_read(rcu_sched_data.cpu_no_qs.b.exp))
return;
if (rcu_is_cpu_rrupt_from_idle()) {
rcu_report_exp_rdp(&rcu_sched_state,
this_cpu_ptr(&rcu_sched_data), true);
return;
}
__this_cpu_write(rcu_sched_data.cpu_no_qs.b.exp, true);
/* Store .exp before .rcu_urgent_qs. */
smp_store_release(this_cpu_ptr(&rcu_dynticks.rcu_urgent_qs), true);
resched_cpu(smp_processor_id());
}
/* Send IPI for expedited cleanup if needed at end of CPU-hotplug operation. */
static void sync_sched_exp_online_cleanup(int cpu)
{
struct rcu_data *rdp;
int ret;
struct rcu_node *rnp;
struct rcu_state *rsp = &rcu_sched_state;
rdp = per_cpu_ptr(rsp->rda, cpu);
rnp = rdp->mynode;
if (!(READ_ONCE(rnp->expmask) & rdp->grpmask))
return;
ret = smp_call_function_single(cpu, sync_sched_exp_handler, rsp, 0);
WARN_ON_ONCE(ret);
}
/* /*
* Select the CPUs within the specified rcu_node that the upcoming * Select the CPUs within the specified rcu_node that the upcoming
* expedited grace period needs to wait for. * expedited grace period needs to wait for.
...@@ -391,7 +351,6 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp) ...@@ -391,7 +351,6 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
struct rcu_exp_work *rewp = struct rcu_exp_work *rewp =
container_of(wp, struct rcu_exp_work, rew_work); container_of(wp, struct rcu_exp_work, rew_work);
struct rcu_node *rnp = container_of(rewp, struct rcu_node, rew); struct rcu_node *rnp = container_of(rewp, struct rcu_node, rew);
struct rcu_state *rsp = rewp->rew_rsp;
func = rewp->rew_func; func = rewp->rew_func;
raw_spin_lock_irqsave_rcu_node(rnp, flags); raw_spin_lock_irqsave_rcu_node(rnp, flags);
...@@ -400,15 +359,14 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp) ...@@ -400,15 +359,14 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
mask_ofl_test = 0; mask_ofl_test = 0;
for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) { for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) {
unsigned long mask = leaf_node_cpu_bit(rnp, cpu); unsigned long mask = leaf_node_cpu_bit(rnp, cpu);
struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu); struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
struct rcu_dynticks *rdtp = per_cpu_ptr(&rcu_dynticks, cpu);
int snap; int snap;
if (raw_smp_processor_id() == cpu || if (raw_smp_processor_id() == cpu ||
!(rnp->qsmaskinitnext & mask)) { !(rnp->qsmaskinitnext & mask)) {
mask_ofl_test |= mask; mask_ofl_test |= mask;
} else { } else {
snap = rcu_dynticks_snap(rdtp); snap = rcu_dynticks_snap(rdp);
if (rcu_dynticks_in_eqs(snap)) if (rcu_dynticks_in_eqs(snap))
mask_ofl_test |= mask; mask_ofl_test |= mask;
else else
...@@ -429,17 +387,16 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp) ...@@ -429,17 +387,16 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
/* IPI the remaining CPUs for expedited quiescent state. */ /* IPI the remaining CPUs for expedited quiescent state. */
for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) { for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) {
unsigned long mask = leaf_node_cpu_bit(rnp, cpu); unsigned long mask = leaf_node_cpu_bit(rnp, cpu);
struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu); struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
if (!(mask_ofl_ipi & mask)) if (!(mask_ofl_ipi & mask))
continue; continue;
retry_ipi: retry_ipi:
if (rcu_dynticks_in_eqs_since(rdp->dynticks, if (rcu_dynticks_in_eqs_since(rdp, rdp->exp_dynticks_snap)) {
rdp->exp_dynticks_snap)) {
mask_ofl_test |= mask; mask_ofl_test |= mask;
continue; continue;
} }
ret = smp_call_function_single(cpu, func, rsp, 0); ret = smp_call_function_single(cpu, func, NULL, 0);
if (!ret) { if (!ret) {
mask_ofl_ipi &= ~mask; mask_ofl_ipi &= ~mask;
continue; continue;
...@@ -450,7 +407,7 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp) ...@@ -450,7 +407,7 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
(rnp->expmask & mask)) { (rnp->expmask & mask)) {
/* Online, so delay for a bit and try again. */ /* Online, so delay for a bit and try again. */
raw_spin_unlock_irqrestore_rcu_node(rnp, flags); raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
trace_rcu_exp_grace_period(rsp->name, rcu_exp_gp_seq_endval(rsp), TPS("selectofl")); trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("selectofl"));
schedule_timeout_uninterruptible(1); schedule_timeout_uninterruptible(1);
goto retry_ipi; goto retry_ipi;
} }
...@@ -462,33 +419,31 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp) ...@@ -462,33 +419,31 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
/* Report quiescent states for those that went offline. */ /* Report quiescent states for those that went offline. */
mask_ofl_test |= mask_ofl_ipi; mask_ofl_test |= mask_ofl_ipi;
if (mask_ofl_test) if (mask_ofl_test)
rcu_report_exp_cpu_mult(rsp, rnp, mask_ofl_test, false); rcu_report_exp_cpu_mult(rnp, mask_ofl_test, false);
} }
/* /*
* Select the nodes that the upcoming expedited grace period needs * Select the nodes that the upcoming expedited grace period needs
* to wait for. * to wait for.
*/ */
static void sync_rcu_exp_select_cpus(struct rcu_state *rsp, static void sync_rcu_exp_select_cpus(smp_call_func_t func)
smp_call_func_t func)
{ {
int cpu; int cpu;
struct rcu_node *rnp; struct rcu_node *rnp;
trace_rcu_exp_grace_period(rsp->name, rcu_exp_gp_seq_endval(rsp), TPS("reset")); trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("reset"));
sync_exp_reset_tree(rsp); sync_exp_reset_tree();
trace_rcu_exp_grace_period(rsp->name, rcu_exp_gp_seq_endval(rsp), TPS("select")); trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("select"));
/* Schedule work for each leaf rcu_node structure. */ /* Schedule work for each leaf rcu_node structure. */
rcu_for_each_leaf_node(rsp, rnp) { rcu_for_each_leaf_node(rnp) {
rnp->exp_need_flush = false; rnp->exp_need_flush = false;
if (!READ_ONCE(rnp->expmask)) if (!READ_ONCE(rnp->expmask))
continue; /* Avoid early boot non-existent wq. */ continue; /* Avoid early boot non-existent wq. */
rnp->rew.rew_func = func; rnp->rew.rew_func = func;
rnp->rew.rew_rsp = rsp;
if (!READ_ONCE(rcu_par_gp_wq) || if (!READ_ONCE(rcu_par_gp_wq) ||
rcu_scheduler_active != RCU_SCHEDULER_RUNNING || rcu_scheduler_active != RCU_SCHEDULER_RUNNING ||
rcu_is_last_leaf_node(rsp, rnp)) { rcu_is_last_leaf_node(rnp)) {
/* No workqueues yet or last leaf, do direct call. */ /* No workqueues yet or last leaf, do direct call. */
sync_rcu_exp_select_node_cpus(&rnp->rew.rew_work); sync_rcu_exp_select_node_cpus(&rnp->rew.rew_work);
continue; continue;
...@@ -505,12 +460,12 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp, ...@@ -505,12 +460,12 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
} }
/* Wait for workqueue jobs (if any) to complete. */ /* Wait for workqueue jobs (if any) to complete. */
rcu_for_each_leaf_node(rsp, rnp) rcu_for_each_leaf_node(rnp)
if (rnp->exp_need_flush) if (rnp->exp_need_flush)
flush_work(&rnp->rew.rew_work); flush_work(&rnp->rew.rew_work);
} }
static void synchronize_sched_expedited_wait(struct rcu_state *rsp) static void synchronize_sched_expedited_wait(void)
{ {
int cpu; int cpu;
unsigned long jiffies_stall; unsigned long jiffies_stall;
...@@ -518,16 +473,16 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp) ...@@ -518,16 +473,16 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
unsigned long mask; unsigned long mask;
int ndetected; int ndetected;
struct rcu_node *rnp; struct rcu_node *rnp;
struct rcu_node *rnp_root = rcu_get_root(rsp); struct rcu_node *rnp_root = rcu_get_root();
int ret; int ret;
trace_rcu_exp_grace_period(rsp->name, rcu_exp_gp_seq_endval(rsp), TPS("startwait")); trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
jiffies_stall = rcu_jiffies_till_stall_check(); jiffies_stall = rcu_jiffies_till_stall_check();
jiffies_start = jiffies; jiffies_start = jiffies;
for (;;) { for (;;) {
ret = swait_event_timeout_exclusive( ret = swait_event_timeout_exclusive(
rsp->expedited_wq, rcu_state.expedited_wq,
sync_rcu_preempt_exp_done_unlocked(rnp_root), sync_rcu_preempt_exp_done_unlocked(rnp_root),
jiffies_stall); jiffies_stall);
if (ret > 0 || sync_rcu_preempt_exp_done_unlocked(rnp_root)) if (ret > 0 || sync_rcu_preempt_exp_done_unlocked(rnp_root))
...@@ -537,9 +492,9 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp) ...@@ -537,9 +492,9 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
continue; continue;
panic_on_rcu_stall(); panic_on_rcu_stall();
pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {", pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
rsp->name); rcu_state.name);
ndetected = 0; ndetected = 0;
rcu_for_each_leaf_node(rsp, rnp) { rcu_for_each_leaf_node(rnp) {
ndetected += rcu_print_task_exp_stall(rnp); ndetected += rcu_print_task_exp_stall(rnp);
for_each_leaf_node_possible_cpu(rnp, cpu) { for_each_leaf_node_possible_cpu(rnp, cpu) {
struct rcu_data *rdp; struct rcu_data *rdp;
...@@ -548,7 +503,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp) ...@@ -548,7 +503,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
if (!(rnp->expmask & mask)) if (!(rnp->expmask & mask))
continue; continue;
ndetected++; ndetected++;
rdp = per_cpu_ptr(rsp->rda, cpu); rdp = per_cpu_ptr(&rcu_data, cpu);
pr_cont(" %d-%c%c%c", cpu, pr_cont(" %d-%c%c%c", cpu,
"O."[!!cpu_online(cpu)], "O."[!!cpu_online(cpu)],
"o."[!!(rdp->grpmask & rnp->expmaskinit)], "o."[!!(rdp->grpmask & rnp->expmaskinit)],
...@@ -556,11 +511,11 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp) ...@@ -556,11 +511,11 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
} }
} }
pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n", pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
jiffies - jiffies_start, rsp->expedited_sequence, jiffies - jiffies_start, rcu_state.expedited_sequence,
rnp_root->expmask, ".T"[!!rnp_root->exp_tasks]); rnp_root->expmask, ".T"[!!rnp_root->exp_tasks]);
if (ndetected) { if (ndetected) {
pr_err("blocking rcu_node structures:"); pr_err("blocking rcu_node structures:");
rcu_for_each_node_breadth_first(rsp, rnp) { rcu_for_each_node_breadth_first(rnp) {
if (rnp == rnp_root) if (rnp == rnp_root)
continue; /* printed unconditionally */ continue; /* printed unconditionally */
if (sync_rcu_preempt_exp_done_unlocked(rnp)) if (sync_rcu_preempt_exp_done_unlocked(rnp))
...@@ -572,7 +527,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp) ...@@ -572,7 +527,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
} }
pr_cont("\n"); pr_cont("\n");
} }
rcu_for_each_leaf_node(rsp, rnp) { rcu_for_each_leaf_node(rnp) {
for_each_leaf_node_possible_cpu(rnp, cpu) { for_each_leaf_node_possible_cpu(rnp, cpu) {
mask = leaf_node_cpu_bit(rnp, cpu); mask = leaf_node_cpu_bit(rnp, cpu);
if (!(rnp->expmask & mask)) if (!(rnp->expmask & mask))
...@@ -590,21 +545,21 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp) ...@@ -590,21 +545,21 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
* grace period. Also update all the ->exp_seq_rq counters as needed * grace period. Also update all the ->exp_seq_rq counters as needed
* in order to avoid counter-wrap problems. * in order to avoid counter-wrap problems.
*/ */
static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s) static void rcu_exp_wait_wake(unsigned long s)
{ {
struct rcu_node *rnp; struct rcu_node *rnp;
synchronize_sched_expedited_wait(rsp); synchronize_sched_expedited_wait();
rcu_exp_gp_seq_end(rsp); rcu_exp_gp_seq_end();
trace_rcu_exp_grace_period(rsp->name, s, TPS("end")); trace_rcu_exp_grace_period(rcu_state.name, s, TPS("end"));
/* /*
* Switch over to wakeup mode, allowing the next GP, but -only- the * Switch over to wakeup mode, allowing the next GP, but -only- the
* next GP, to proceed. * next GP, to proceed.
*/ */
mutex_lock(&rsp->exp_wake_mutex); mutex_lock(&rcu_state.exp_wake_mutex);
rcu_for_each_node_breadth_first(rsp, rnp) { rcu_for_each_node_breadth_first(rnp) {
if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s)) { if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s)) {
spin_lock(&rnp->exp_lock); spin_lock(&rnp->exp_lock);
/* Recheck, avoid hang in case someone just arrived. */ /* Recheck, avoid hang in case someone just arrived. */
...@@ -613,24 +568,23 @@ static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s) ...@@ -613,24 +568,23 @@ static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
spin_unlock(&rnp->exp_lock); spin_unlock(&rnp->exp_lock);
} }
smp_mb(); /* All above changes before wakeup. */ smp_mb(); /* All above changes before wakeup. */
wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rsp->expedited_sequence) & 0x3]); wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rcu_state.expedited_sequence) & 0x3]);
} }
trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake")); trace_rcu_exp_grace_period(rcu_state.name, s, TPS("endwake"));
mutex_unlock(&rsp->exp_wake_mutex); mutex_unlock(&rcu_state.exp_wake_mutex);
} }
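rcu_exp_wait_wake() relies on two small pieces of arithmetic: a wrap-safe "less than" on free-running counters (`ULONG_CMP_LT`) and a mapping from a sequence value to one of four wait queues (`rcu_seq_ctr(...) & 0x3`). The user-space sketch below illustrates both. The macro body and the shift of 2 are assumptions reproduced from memory of the kernel headers, and the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumption: low-order bits of the sequence number hold state. */
#define SEQ_CTR_SHIFT 2

/*
 * Wrap-safe "a < b" for unsigned long counters: true when a trails b
 * by less than half the counter space, even across wraparound.
 */
static bool ulong_cmp_lt(unsigned long a, unsigned long b)
{
	return ULONG_MAX / 2 < a - b;
}

/* Index (0..3) of the wait queue used for sequence value s. */
static unsigned int exp_wq_index(unsigned long s)
{
	return (unsigned int)((s >> SEQ_CTR_SHIFT) & 0x3);
}
```

With only four queues, consecutive grace periods land on different queues, so waking one queue does not disturb tasks waiting on the next grace period.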
/* /*
* Common code to drive an expedited grace period forward, used by * Common code to drive an expedited grace period forward, used by
* workqueues and mid-boot-time tasks. * workqueues and mid-boot-time tasks.
*/ */
static void rcu_exp_sel_wait_wake(struct rcu_state *rsp, static void rcu_exp_sel_wait_wake(smp_call_func_t func, unsigned long s)
smp_call_func_t func, unsigned long s)
{ {
/* Initialize the rcu_node tree in preparation for the wait. */ /* Initialize the rcu_node tree in preparation for the wait. */
sync_rcu_exp_select_cpus(rsp, func); sync_rcu_exp_select_cpus(func);
/* Wait and clean up, including waking everyone. */ /* Wait and clean up, including waking everyone. */
rcu_exp_wait_wake(rsp, s); rcu_exp_wait_wake(s);
} }
/* /*
...@@ -641,15 +595,14 @@ static void wait_rcu_exp_gp(struct work_struct *wp) ...@@ -641,15 +595,14 @@ static void wait_rcu_exp_gp(struct work_struct *wp)
struct rcu_exp_work *rewp; struct rcu_exp_work *rewp;
rewp = container_of(wp, struct rcu_exp_work, rew_work); rewp = container_of(wp, struct rcu_exp_work, rew_work);
rcu_exp_sel_wait_wake(rewp->rew_rsp, rewp->rew_func, rewp->rew_s); rcu_exp_sel_wait_wake(rewp->rew_func, rewp->rew_s);
} }
/* /*
* Given an rcu_state pointer and a smp_call_function() handler, kick * Given a smp_call_function() handler, kick off the specified
* off the specified flavor of expedited grace period. * implementation of expedited grace period.
*/ */
static void _synchronize_rcu_expedited(struct rcu_state *rsp, static void _synchronize_rcu_expedited(smp_call_func_t func)
smp_call_func_t func)
{ {
struct rcu_data *rdp; struct rcu_data *rdp;
struct rcu_exp_work rew; struct rcu_exp_work rew;
...@@ -658,71 +611,37 @@ static void _synchronize_rcu_expedited(struct rcu_state *rsp, ...@@ -658,71 +611,37 @@ static void _synchronize_rcu_expedited(struct rcu_state *rsp,
/* If expedited grace periods are prohibited, fall back to normal. */ /* If expedited grace periods are prohibited, fall back to normal. */
if (rcu_gp_is_normal()) { if (rcu_gp_is_normal()) {
wait_rcu_gp(rsp->call); wait_rcu_gp(call_rcu);
return; return;
} }
/* Take a snapshot of the sequence number. */ /* Take a snapshot of the sequence number. */
s = rcu_exp_gp_seq_snap(rsp); s = rcu_exp_gp_seq_snap();
if (exp_funnel_lock(rsp, s)) if (exp_funnel_lock(s))
return; /* Someone else did our work for us. */ return; /* Someone else did our work for us. */
/* Ensure that load happens before action based on it. */ /* Ensure that load happens before action based on it. */
if (unlikely(rcu_scheduler_active == RCU_SCHEDULER_INIT)) { if (unlikely(rcu_scheduler_active == RCU_SCHEDULER_INIT)) {
/* Direct call during scheduler init and early_initcalls(). */ /* Direct call during scheduler init and early_initcalls(). */
rcu_exp_sel_wait_wake(rsp, func, s); rcu_exp_sel_wait_wake(func, s);
} else { } else {
/* Marshall arguments & schedule the expedited grace period. */ /* Marshall arguments & schedule the expedited grace period. */
rew.rew_func = func; rew.rew_func = func;
rew.rew_rsp = rsp;
rew.rew_s = s; rew.rew_s = s;
INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp); INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
queue_work(rcu_gp_wq, &rew.rew_work); queue_work(rcu_gp_wq, &rew.rew_work);
} }
/* Wait for expedited grace period to complete. */ /* Wait for expedited grace period to complete. */
rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id()); rdp = per_cpu_ptr(&rcu_data, raw_smp_processor_id());
rnp = rcu_get_root(rsp); rnp = rcu_get_root();
wait_event(rnp->exp_wq[rcu_seq_ctr(s) & 0x3], wait_event(rnp->exp_wq[rcu_seq_ctr(s) & 0x3],
sync_exp_work_done(rsp, s)); sync_exp_work_done(s));
smp_mb(); /* Workqueue actions happen before return. */ smp_mb(); /* Workqueue actions happen before return. */
/* Let the next expedited grace period start. */ /* Let the next expedited grace period start. */
mutex_unlock(&rsp->exp_mutex); mutex_unlock(&rcu_state.exp_mutex);
}
/**
* synchronize_sched_expedited - Brute-force RCU-sched grace period
*
* Wait for an RCU-sched grace period to elapse, but use a "big hammer"
* approach to force the grace period to end quickly. This consumes
* significant time on all CPUs and is unfriendly to real-time workloads,
* so is thus not recommended for any sort of common-case code. In fact,
* if you are using synchronize_sched_expedited() in a loop, please
* restructure your code to batch your updates, and then use a single
* synchronize_sched() instead.
*
* This implementation can be thought of as an application of sequence
* locking to expedited grace periods, but using the sequence counter to
* determine when someone else has already done the work instead of for
* retrying readers.
*/
void synchronize_sched_expedited(void)
{
struct rcu_state *rsp = &rcu_sched_state;
RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
lock_is_held(&rcu_lock_map) ||
lock_is_held(&rcu_sched_lock_map),
"Illegal synchronize_sched_expedited() in RCU read-side critical section");
/* If only one CPU, this is automatically a grace period. */
if (rcu_blocking_is_gp())
return;
_synchronize_rcu_expedited(rsp, sync_sched_exp_handler);
} }
EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
#ifdef CONFIG_PREEMPT_RCU #ifdef CONFIG_PREEMPT_RCU
...@@ -733,34 +652,78 @@ EXPORT_SYMBOL_GPL(synchronize_sched_expedited); ...@@ -733,34 +652,78 @@ EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
* ->expmask fields in the rcu_node tree. Otherwise, immediately * ->expmask fields in the rcu_node tree. Otherwise, immediately
* report the quiescent state. * report the quiescent state.
*/ */
static void sync_rcu_exp_handler(void *info) static void sync_rcu_exp_handler(void *unused)
{ {
struct rcu_data *rdp; unsigned long flags;
struct rcu_state *rsp = info; struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
struct rcu_node *rnp = rdp->mynode;
struct task_struct *t = current; struct task_struct *t = current;
/* /*
* Within an RCU read-side critical section, request that the next * First, the common case of not being in an RCU read-side
* rcu_read_unlock() report. Unless this RCU read-side critical * critical section. If also enabled or idle, immediately
* section has already blocked, in which case it is already set * report the quiescent state, otherwise defer.
* up for the expedited grace period to wait on it.
*/ */
if (t->rcu_read_lock_nesting > 0 && if (!t->rcu_read_lock_nesting) {
!t->rcu_read_unlock_special.b.blocked) { if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) ||
t->rcu_read_unlock_special.b.exp_need_qs = true; rcu_dynticks_curr_cpu_in_eqs()) {
rcu_report_exp_rdp(rdp);
} else {
rdp->deferred_qs = true;
set_tsk_need_resched(t);
set_preempt_need_resched();
}
return; return;
} }
/* /*
* We are either exiting an RCU read-side critical section (negative * Second, the less-common case of being in an RCU read-side
* values of t->rcu_read_lock_nesting) or are not in one at all * critical section. In this case we can count on a future
* (zero value of t->rcu_read_lock_nesting). Or we are in an RCU * rcu_read_unlock(). However, this rcu_read_unlock() might
* read-side critical section that blocked before this expedited * execute on some other CPU, but in that case there will be
* grace period started. Either way, we can immediately report * a future context switch. Either way, if the expedited
* the quiescent state. * grace period is still waiting on this CPU, set ->deferred_qs
* so that the eventual quiescent state will be reported.
* Note that there is a large group of race conditions that
* can have caused this quiescent state to already have been
* reported, so we really do need to check ->expmask.
*/
if (t->rcu_read_lock_nesting > 0) {
raw_spin_lock_irqsave_rcu_node(rnp, flags);
if (rnp->expmask & rdp->grpmask)
rdp->deferred_qs = true;
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}
/*
* The final and least likely case is where the interrupted
* code was just about to or just finished exiting the RCU-preempt
* read-side critical section, and no, we can't tell which.
* So either way, set ->deferred_qs to flag later code that
* a quiescent state is required.
*
* If the CPU is fully enabled (or if some buggy RCU-preempt
* read-side critical section is being used from idle), just
* invoke rcu_preempt_defer_qs() to immediately report the
* quiescent state. We cannot use rcu_read_unlock_special()
* because we are in an interrupt handler, which will cause that
* function to take an early exit without doing anything.
*
* Otherwise, force a context switch after the CPU enables everything.
*/ */
rdp = this_cpu_ptr(rsp->rda); rdp->deferred_qs = true;
rcu_report_exp_rdp(rsp, rdp, true); if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) ||
WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs())) {
rcu_preempt_deferred_qs(t);
} else {
set_tsk_need_resched(t);
set_preempt_need_resched();
}
}
/* PREEMPT=y, so no PREEMPT=n expedited grace period to clean up after. */
static void sync_sched_exp_online_cleanup(int cpu)
{
} }
/** /**
...@@ -780,11 +743,11 @@ static void sync_rcu_exp_handler(void *info) ...@@ -780,11 +743,11 @@ static void sync_rcu_exp_handler(void *info)
* you are using synchronize_rcu_expedited() in a loop, please restructure * you are using synchronize_rcu_expedited() in a loop, please restructure
* your code to batch your updates, and then use a single synchronize_rcu() * your code to batch your updates, and then use a single synchronize_rcu()
* instead. * instead.
*
* This has the same semantics as (but is more brutal than) synchronize_rcu().
*/ */
void synchronize_rcu_expedited(void) void synchronize_rcu_expedited(void)
{ {
struct rcu_state *rsp = rcu_state_p;
RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) || RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
lock_is_held(&rcu_lock_map) || lock_is_held(&rcu_lock_map) ||
lock_is_held(&rcu_sched_lock_map), lock_is_held(&rcu_sched_lock_map),
...@@ -792,19 +755,82 @@ void synchronize_rcu_expedited(void) ...@@ -792,19 +755,82 @@ void synchronize_rcu_expedited(void)
if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE) if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
return; return;
_synchronize_rcu_expedited(rsp, sync_rcu_exp_handler); _synchronize_rcu_expedited(sync_rcu_exp_handler);
} }
EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
#else /* #ifdef CONFIG_PREEMPT_RCU */ #else /* #ifdef CONFIG_PREEMPT_RCU */
/* Invoked on each online non-idle CPU for expedited quiescent state. */
static void sync_sched_exp_handler(void *unused)
{
struct rcu_data *rdp;
struct rcu_node *rnp;
rdp = this_cpu_ptr(&rcu_data);
rnp = rdp->mynode;
if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
__this_cpu_read(rcu_data.cpu_no_qs.b.exp))
return;
if (rcu_is_cpu_rrupt_from_idle()) {
rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
return;
}
__this_cpu_write(rcu_data.cpu_no_qs.b.exp, true);
/* Store .exp before .rcu_urgent_qs. */
smp_store_release(this_cpu_ptr(&rcu_data.rcu_urgent_qs), true);
set_tsk_need_resched(current);
set_preempt_need_resched();
}
/* Send IPI for expedited cleanup if needed at end of CPU-hotplug operation. */
static void sync_sched_exp_online_cleanup(int cpu)
{
struct rcu_data *rdp;
int ret;
struct rcu_node *rnp;
rdp = per_cpu_ptr(&rcu_data, cpu);
rnp = rdp->mynode;
if (!(READ_ONCE(rnp->expmask) & rdp->grpmask))
return;
ret = smp_call_function_single(cpu, sync_sched_exp_handler, NULL, 0);
WARN_ON_ONCE(ret);
}
/* /*
* Wait for an rcu-preempt grace period, but make it happen quickly. * Because a context switch is a grace period for !PREEMPT, any
* But because preemptible RCU does not exist, map to rcu-sched. * blocking grace-period wait automatically implies a grace period if
* there is only one CPU online at any point in time during execution of
* either synchronize_rcu() or synchronize_rcu_expedited(). It is OK to
* occasionally incorrectly indicate that there are multiple CPUs online
* when there was in fact only one the whole time, as this just adds some
* overhead: RCU still operates correctly.
*/ */
static int rcu_blocking_is_gp(void)
{
int ret;
might_sleep(); /* Check for RCU read-side critical section. */
preempt_disable();
ret = num_online_cpus() <= 1;
preempt_enable();
return ret;
}
/* PREEMPT=n implementation of synchronize_rcu_expedited(). */
void synchronize_rcu_expedited(void) void synchronize_rcu_expedited(void)
{ {
synchronize_sched_expedited(); RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
lock_is_held(&rcu_lock_map) ||
lock_is_held(&rcu_sched_lock_map),
"Illegal synchronize_rcu_expedited() in RCU read-side critical section");
/* If only one CPU, this is automatically a grace period. */
if (rcu_blocking_is_gp())
return;
_synchronize_rcu_expedited(sync_sched_exp_handler);
} }
EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
......
...@@ -38,8 +38,7 @@ ...@@ -38,8 +38,7 @@
#include "../locking/rtmutex_common.h" #include "../locking/rtmutex_common.h"
/* /*
* Control variables for per-CPU and per-rcu_node kthreads. These * Control variables for per-CPU and per-rcu_node kthreads.
* handle all flavors of RCU.
*/ */
static DEFINE_PER_CPU(struct task_struct *, rcu_cpu_kthread_task); static DEFINE_PER_CPU(struct task_struct *, rcu_cpu_kthread_task);
DEFINE_PER_CPU(unsigned int, rcu_cpu_kthread_status); DEFINE_PER_CPU(unsigned int, rcu_cpu_kthread_status);
...@@ -106,6 +105,8 @@ static void __init rcu_bootup_announce_oddness(void) ...@@ -106,6 +105,8 @@ static void __init rcu_bootup_announce_oddness(void)
pr_info("\tBoot-time adjustment of first FQS scan delay to %ld jiffies.\n", jiffies_till_first_fqs); pr_info("\tBoot-time adjustment of first FQS scan delay to %ld jiffies.\n", jiffies_till_first_fqs);
if (jiffies_till_next_fqs != ULONG_MAX) if (jiffies_till_next_fqs != ULONG_MAX)
pr_info("\tBoot-time adjustment of subsequent FQS scan delay to %ld jiffies.\n", jiffies_till_next_fqs); pr_info("\tBoot-time adjustment of subsequent FQS scan delay to %ld jiffies.\n", jiffies_till_next_fqs);
if (jiffies_till_sched_qs != ULONG_MAX)
pr_info("\tBoot-time adjustment of scheduler-enlistment delay to %ld jiffies.\n", jiffies_till_sched_qs);
if (rcu_kick_kthreads) if (rcu_kick_kthreads)
pr_info("\tKick kthreads if too-long grace period.\n"); pr_info("\tKick kthreads if too-long grace period.\n");
if (IS_ENABLED(CONFIG_DEBUG_OBJECTS_RCU_HEAD)) if (IS_ENABLED(CONFIG_DEBUG_OBJECTS_RCU_HEAD))
...@@ -123,12 +124,7 @@ static void __init rcu_bootup_announce_oddness(void) ...@@ -123,12 +124,7 @@ static void __init rcu_bootup_announce_oddness(void)
#ifdef CONFIG_PREEMPT_RCU #ifdef CONFIG_PREEMPT_RCU
RCU_STATE_INITIALIZER(rcu_preempt, 'p', call_rcu); static void rcu_report_exp_rnp(struct rcu_node *rnp, bool wake);
static struct rcu_state *const rcu_state_p = &rcu_preempt_state;
static struct rcu_data __percpu *const rcu_data_p = &rcu_preempt_data;
static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
bool wake);
static void rcu_read_unlock_special(struct task_struct *t); static void rcu_read_unlock_special(struct task_struct *t);
/* /*
...@@ -284,13 +280,10 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp) ...@@ -284,13 +280,10 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
* no need to check for a subsequent expedited GP. (Though we are * no need to check for a subsequent expedited GP. (Though we are
* still in a quiescent state in any case.) * still in a quiescent state in any case.)
*/ */
if (blkd_state & RCU_EXP_BLKD && if (blkd_state & RCU_EXP_BLKD && rdp->deferred_qs)
t->rcu_read_unlock_special.b.exp_need_qs) { rcu_report_exp_rdp(rdp);
t->rcu_read_unlock_special.b.exp_need_qs = false; else
rcu_report_exp_rdp(rdp->rsp, rdp, true); WARN_ON_ONCE(rdp->deferred_qs);
} else {
WARN_ON_ONCE(t->rcu_read_unlock_special.b.exp_need_qs);
}
} }
/* /*
...@@ -306,15 +299,15 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp) ...@@ -306,15 +299,15 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
* *
* Callers to this function must disable preemption. * Callers to this function must disable preemption.
*/ */
static void rcu_preempt_qs(void) static void rcu_qs(void)
{ {
RCU_LOCKDEP_WARN(preemptible(), "rcu_preempt_qs() invoked with preemption enabled!!!\n"); RCU_LOCKDEP_WARN(preemptible(), "rcu_qs() invoked with preemption enabled!!!\n");
if (__this_cpu_read(rcu_data_p->cpu_no_qs.s)) { if (__this_cpu_read(rcu_data.cpu_no_qs.s)) {
trace_rcu_grace_period(TPS("rcu_preempt"), trace_rcu_grace_period(TPS("rcu_preempt"),
__this_cpu_read(rcu_data_p->gp_seq), __this_cpu_read(rcu_data.gp_seq),
TPS("cpuqs")); TPS("cpuqs"));
__this_cpu_write(rcu_data_p->cpu_no_qs.b.norm, false); __this_cpu_write(rcu_data.cpu_no_qs.b.norm, false);
barrier(); /* Coordinate with rcu_preempt_check_callbacks(). */ barrier(); /* Coordinate with rcu_flavor_check_callbacks(). */
current->rcu_read_unlock_special.b.need_qs = false; current->rcu_read_unlock_special.b.need_qs = false;
} }
} }
...@@ -332,19 +325,20 @@ static void rcu_preempt_qs(void) ...@@ -332,19 +325,20 @@ static void rcu_preempt_qs(void)
* *
* Caller must disable interrupts. * Caller must disable interrupts.
*/ */
static void rcu_preempt_note_context_switch(bool preempt) void rcu_note_context_switch(bool preempt)
{ {
struct task_struct *t = current; struct task_struct *t = current;
struct rcu_data *rdp; struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
struct rcu_node *rnp; struct rcu_node *rnp;
barrier(); /* Avoid RCU read-side critical sections leaking down. */
trace_rcu_utilization(TPS("Start context switch"));
lockdep_assert_irqs_disabled(); lockdep_assert_irqs_disabled();
WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0); WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0);
if (t->rcu_read_lock_nesting > 0 && if (t->rcu_read_lock_nesting > 0 &&
!t->rcu_read_unlock_special.b.blocked) { !t->rcu_read_unlock_special.b.blocked) {
/* Possibly blocking in an RCU read-side critical section. */ /* Possibly blocking in an RCU read-side critical section. */
rdp = this_cpu_ptr(rcu_state_p->rda);
rnp = rdp->mynode; rnp = rdp->mynode;
raw_spin_lock_rcu_node(rnp); raw_spin_lock_rcu_node(rnp);
t->rcu_read_unlock_special.b.blocked = true; t->rcu_read_unlock_special.b.blocked = true;
...@@ -357,7 +351,7 @@ static void rcu_preempt_note_context_switch(bool preempt) ...@@ -357,7 +351,7 @@ static void rcu_preempt_note_context_switch(bool preempt)
*/ */
WARN_ON_ONCE((rdp->grpmask & rcu_rnp_online_cpus(rnp)) == 0); WARN_ON_ONCE((rdp->grpmask & rcu_rnp_online_cpus(rnp)) == 0);
WARN_ON_ONCE(!list_empty(&t->rcu_node_entry)); WARN_ON_ONCE(!list_empty(&t->rcu_node_entry));
trace_rcu_preempt_task(rdp->rsp->name, trace_rcu_preempt_task(rcu_state.name,
t->pid, t->pid,
(rnp->qsmask & rdp->grpmask) (rnp->qsmask & rdp->grpmask)
? rnp->gp_seq ? rnp->gp_seq
...@@ -371,6 +365,9 @@ static void rcu_preempt_note_context_switch(bool preempt) ...@@ -371,6 +365,9 @@ static void rcu_preempt_note_context_switch(bool preempt)
* behalf of preempted instance of __rcu_read_unlock(). * behalf of preempted instance of __rcu_read_unlock().
*/ */
rcu_read_unlock_special(t); rcu_read_unlock_special(t);
rcu_preempt_deferred_qs(t);
} else {
rcu_preempt_deferred_qs(t);
} }
/* /*
...@@ -382,8 +379,13 @@ static void rcu_preempt_note_context_switch(bool preempt) ...@@ -382,8 +379,13 @@ static void rcu_preempt_note_context_switch(bool preempt)
* grace period, then the fact that the task has been enqueued * grace period, then the fact that the task has been enqueued
* means that we continue to block the current grace period. * means that we continue to block the current grace period.
*/ */
rcu_preempt_qs(); rcu_qs();
if (rdp->deferred_qs)
rcu_report_exp_rdp(rdp);
trace_rcu_utilization(TPS("End context switch"));
barrier(); /* Avoid RCU read-side critical sections leaking up. */
} }
EXPORT_SYMBOL_GPL(rcu_note_context_switch);
/* /*
* Check for preempted RCU readers blocking the current grace period * Check for preempted RCU readers blocking the current grace period
...@@ -464,74 +466,56 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp) ...@@ -464,74 +466,56 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
} }
/* /*
* Handle special cases during rcu_read_unlock(), such as needing to * Report deferred quiescent states. The deferral time can
* notify RCU core processing or task having blocked during the RCU * be quite short, for example, in the case of the call from
* read-side critical section. * rcu_read_unlock_special().
*/ */
static void rcu_read_unlock_special(struct task_struct *t) static void
rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
{ {
bool empty_exp; bool empty_exp;
bool empty_norm; bool empty_norm;
bool empty_exp_now; bool empty_exp_now;
unsigned long flags;
struct list_head *np; struct list_head *np;
bool drop_boost_mutex = false; bool drop_boost_mutex = false;
struct rcu_data *rdp; struct rcu_data *rdp;
struct rcu_node *rnp; struct rcu_node *rnp;
union rcu_special special; union rcu_special special;
/* NMI handlers cannot block and cannot safely manipulate state. */
if (in_nmi())
return;
local_irq_save(flags);
/* /*
* If RCU core is waiting for this CPU to exit its critical section, * If RCU core is waiting for this CPU to exit its critical section,
* report the fact that it has exited. Because irqs are disabled, * report the fact that it has exited. Because irqs are disabled,
* t->rcu_read_unlock_special cannot change. * t->rcu_read_unlock_special cannot change.
*/ */
special = t->rcu_read_unlock_special; special = t->rcu_read_unlock_special;
rdp = this_cpu_ptr(&rcu_data);
if (!special.s && !rdp->deferred_qs) {
local_irq_restore(flags);
return;
}
if (special.b.need_qs) { if (special.b.need_qs) {
rcu_preempt_qs(); rcu_qs();
t->rcu_read_unlock_special.b.need_qs = false; t->rcu_read_unlock_special.b.need_qs = false;
if (!t->rcu_read_unlock_special.s) { if (!t->rcu_read_unlock_special.s && !rdp->deferred_qs) {
local_irq_restore(flags); local_irq_restore(flags);
return; return;
} }
} }
/* /*
* Respond to a request for an expedited grace period, but only if * Respond to a request by an expedited grace period for a
* we were not preempted, meaning that we were running on the same * quiescent state from this CPU. Note that requests from
* CPU throughout. If we were preempted, the exp_need_qs flag * tasks are handled when removing the task from the
* would have been cleared at the time of the first preemption, * blocked-tasks list below.
* and the quiescent state would be reported when we were dequeued.
*/ */
if (special.b.exp_need_qs) { if (rdp->deferred_qs) {
WARN_ON_ONCE(special.b.blocked); rcu_report_exp_rdp(rdp);
t->rcu_read_unlock_special.b.exp_need_qs = false;
rdp = this_cpu_ptr(rcu_state_p->rda);
rcu_report_exp_rdp(rcu_state_p, rdp, true);
if (!t->rcu_read_unlock_special.s) { if (!t->rcu_read_unlock_special.s) {
local_irq_restore(flags); local_irq_restore(flags);
return; return;
} }
} }
/* Hardware IRQ handlers cannot block, complain if they get here. */
if (in_irq() || in_serving_softirq()) {
lockdep_rcu_suspicious(__FILE__, __LINE__,
"rcu_read_unlock() from irq or softirq with blocking in critical section!!!\n");
pr_alert("->rcu_read_unlock_special: %#x (b: %d, enq: %d nq: %d)\n",
t->rcu_read_unlock_special.s,
t->rcu_read_unlock_special.b.blocked,
t->rcu_read_unlock_special.b.exp_need_qs,
t->rcu_read_unlock_special.b.need_qs);
local_irq_restore(flags);
return;
}
/* Clean up if blocked during RCU read-side critical section. */ /* Clean up if blocked during RCU read-side critical section. */
if (special.b.blocked) { if (special.b.blocked) {
t->rcu_read_unlock_special.b.blocked = false; t->rcu_read_unlock_special.b.blocked = false;
...@@ -582,7 +566,7 @@ static void rcu_read_unlock_special(struct task_struct *t) ...@@ -582,7 +566,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
rnp->grplo, rnp->grplo,
rnp->grphi, rnp->grphi,
!!rnp->gp_tasks); !!rnp->gp_tasks);
rcu_report_unblock_qs_rnp(rcu_state_p, rnp, flags); rcu_report_unblock_qs_rnp(rnp, flags);
} else { } else {
raw_spin_unlock_irqrestore_rcu_node(rnp, flags); raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
} }
...@@ -596,12 +580,78 @@ static void rcu_read_unlock_special(struct task_struct *t) ...@@ -596,12 +580,78 @@ static void rcu_read_unlock_special(struct task_struct *t)
* then we need to report up the rcu_node hierarchy. * then we need to report up the rcu_node hierarchy.
*/ */
if (!empty_exp && empty_exp_now) if (!empty_exp && empty_exp_now)
rcu_report_exp_rnp(rcu_state_p, rnp, true); rcu_report_exp_rnp(rnp, true);
} else { } else {
local_irq_restore(flags); local_irq_restore(flags);
} }
} }
/*
* Is a deferred quiescent-state pending, and are we also not in
* an RCU read-side critical section? It is the caller's responsibility
* to ensure it is otherwise safe to report any deferred quiescent
* states. The reason for this is that it is safe to report a
* quiescent state during context switch even though preemption
* is disabled. This function cannot be expected to understand these
* nuances, so the caller must handle them.
*/
static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
{
return (this_cpu_ptr(&rcu_data)->deferred_qs ||
READ_ONCE(t->rcu_read_unlock_special.s)) &&
t->rcu_read_lock_nesting <= 0;
}
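The predicate above can be modeled in user space to show which combinations allow a deferred quiescent state to be reported. This is only a sketch: `struct task_model` and `need_deferred_qs()` are hypothetical stand-ins for the kernel's `task_struct` fields and per-CPU `rcu_data`, and the READ_ONCE() access is replaced by a plain read.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the task_struct fields used here. */
struct task_model {
	int nesting;           /* models t->rcu_read_lock_nesting */
	unsigned int special;  /* models t->rcu_read_unlock_special.s */
};

/*
 * True when something is pending (a per-CPU deferred QS or any
 * per-task special bit) and the task is outside a read-side
 * critical section.
 */
static bool need_deferred_qs(bool cpu_deferred_qs, const struct task_model *t)
{
	return (cpu_deferred_qs || t->special) && t->nesting <= 0;
}
```

A positive nesting count always yields false, which is why callers that run inside a read-side critical section fall back to requesting a reschedule.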
/*
* Report a deferred quiescent state if needed and safe to do so.
* As with rcu_preempt_need_deferred_qs(), "safe" involves only
* not being in an RCU read-side critical section. The caller must
* evaluate safety in terms of interrupt, softirq, and preemption
* disabling.
*/
static void rcu_preempt_deferred_qs(struct task_struct *t)
{
unsigned long flags;
bool couldrecurse = t->rcu_read_lock_nesting >= 0;
if (!rcu_preempt_need_deferred_qs(t))
return;
if (couldrecurse)
t->rcu_read_lock_nesting -= INT_MIN;
local_irq_save(flags);
rcu_preempt_deferred_qs_irqrestore(t, flags);
if (couldrecurse)
t->rcu_read_lock_nesting += INT_MIN;
}
/*
* Handle special cases during rcu_read_unlock(), such as needing to
* notify RCU core processing or task having blocked during the RCU
* read-side critical section.
*/
static void rcu_read_unlock_special(struct task_struct *t)
{
unsigned long flags;
bool preempt_bh_were_disabled =
!!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
bool irqs_were_disabled;
/* NMI handlers cannot block and cannot safely manipulate state. */
if (in_nmi())
return;
local_irq_save(flags);
irqs_were_disabled = irqs_disabled_flags(flags);
if ((preempt_bh_were_disabled || irqs_were_disabled) &&
t->rcu_read_unlock_special.b.blocked) {
/* Need to defer quiescent state until everything is enabled. */
raise_softirq_irqoff(RCU_SOFTIRQ);
local_irq_restore(flags);
return;
}
rcu_preempt_deferred_qs_irqrestore(t, flags);
}
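The reworked rcu_read_unlock_special() above makes a three-way decision: do nothing in NMI context, defer through RCU_SOFTIRQ when the task blocked and something is still disabled, and otherwise report right away. The sketch below models only that decision in user space. The enum and function names are hypothetical, and the real function samples the preempt count and interrupt flags itself rather than taking them as parameters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three outcomes shown in the diff. */
enum unlock_special_action {
	UNLOCK_IGNORE_NMI,        /* in_nmi(): return without touching state */
	UNLOCK_DEFER_VIA_SOFTIRQ, /* raise RCU_SOFTIRQ and report later */
	UNLOCK_REPORT_NOW,        /* call the _irqrestore() reporting path */
};

static enum unlock_special_action
classify_unlock_special(bool in_nmi, bool preempt_bh_disabled,
			bool irqs_disabled, bool blocked)
{
	if (in_nmi)
		return UNLOCK_IGNORE_NMI;
	/* Deferral is needed only if the task blocked in its critical section. */
	if ((preempt_bh_disabled || irqs_disabled) && blocked)
		return UNLOCK_DEFER_VIA_SOFTIRQ;
	return UNLOCK_REPORT_NOW;
}
```

Note that a task that never blocked takes the immediate path even with preemption or interrupts disabled, matching the condition on `t->rcu_read_unlock_special.b.blocked` in the code above.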
/* /*
* Dump detailed information for all tasks blocking the current RCU * Dump detailed information for all tasks blocking the current RCU
* grace period on the specified rcu_node structure. * grace period on the specified rcu_node structure.
...@@ -633,12 +683,12 @@ static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp) ...@@ -633,12 +683,12 @@ static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp)
* Dump detailed information for all tasks blocking the current RCU * Dump detailed information for all tasks blocking the current RCU
* grace period. * grace period.
*/ */
static void rcu_print_detail_task_stall(struct rcu_state *rsp) static void rcu_print_detail_task_stall(void)
{ {
struct rcu_node *rnp = rcu_get_root(rsp); struct rcu_node *rnp = rcu_get_root();
rcu_print_detail_task_stall_rnp(rnp); rcu_print_detail_task_stall_rnp(rnp);
rcu_for_each_leaf_node(rsp, rnp) rcu_for_each_leaf_node(rnp)
rcu_print_detail_task_stall_rnp(rnp); rcu_print_detail_task_stall_rnp(rnp);
} }
...@@ -706,14 +756,13 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp) ...@@ -706,14 +756,13 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
* Also, if there are blocked tasks on the list, they automatically * Also, if there are blocked tasks on the list, they automatically
* block the newly created grace period, so set up ->gp_tasks accordingly. * block the newly created grace period, so set up ->gp_tasks accordingly.
*/ */
static void static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
rcu_preempt_check_blocked_tasks(struct rcu_state *rsp, struct rcu_node *rnp)
{ {
struct task_struct *t; struct task_struct *t;
RCU_LOCKDEP_WARN(preemptible(), "rcu_preempt_check_blocked_tasks() invoked with preemption enabled!!!\n"); RCU_LOCKDEP_WARN(preemptible(), "rcu_preempt_check_blocked_tasks() invoked with preemption enabled!!!\n");
if (WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp))) if (WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp)))
dump_blkd_tasks(rsp, rnp, 10); dump_blkd_tasks(rnp, 10);
if (rcu_preempt_has_tasks(rnp) && if (rcu_preempt_has_tasks(rnp) &&
(rnp->qsmaskinit || rnp->wait_blkd_tasks)) { (rnp->qsmaskinit || rnp->wait_blkd_tasks)) {
rnp->gp_tasks = rnp->blkd_tasks.next; rnp->gp_tasks = rnp->blkd_tasks.next;
...@@ -732,61 +781,37 @@ rcu_preempt_check_blocked_tasks(struct rcu_state *rsp, struct rcu_node *rnp) ...@@ -732,61 +781,37 @@ rcu_preempt_check_blocked_tasks(struct rcu_state *rsp, struct rcu_node *rnp)
* *
* Caller must disable hard irqs. * Caller must disable hard irqs.
*/ */
static void rcu_preempt_check_callbacks(void) static void rcu_flavor_check_callbacks(int user)
{ {
struct rcu_state *rsp = &rcu_preempt_state;
struct task_struct *t = current; struct task_struct *t = current;
if (t->rcu_read_lock_nesting == 0) { if (user || rcu_is_cpu_rrupt_from_idle()) {
rcu_preempt_qs(); rcu_note_voluntary_context_switch(current);
}
if (t->rcu_read_lock_nesting > 0 ||
(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
/* No QS, force context switch if deferred. */
if (rcu_preempt_need_deferred_qs(t)) {
set_tsk_need_resched(t);
set_preempt_need_resched();
}
} else if (rcu_preempt_need_deferred_qs(t)) {
rcu_preempt_deferred_qs(t); /* Report deferred QS. */
return;
} else if (!t->rcu_read_lock_nesting) {
rcu_qs(); /* Report immediate QS. */
return; return;
} }
/* If GP is oldish, ask for help from rcu_read_unlock_special(). */
if (t->rcu_read_lock_nesting > 0 && if (t->rcu_read_lock_nesting > 0 &&
__this_cpu_read(rcu_data_p->core_needs_qs) && __this_cpu_read(rcu_data.core_needs_qs) &&
__this_cpu_read(rcu_data_p->cpu_no_qs.b.norm) && __this_cpu_read(rcu_data.cpu_no_qs.b.norm) &&
!t->rcu_read_unlock_special.b.need_qs && !t->rcu_read_unlock_special.b.need_qs &&
time_after(jiffies, rsp->gp_start + HZ)) time_after(jiffies, rcu_state.gp_start + HZ))
t->rcu_read_unlock_special.b.need_qs = true; t->rcu_read_unlock_special.b.need_qs = true;
} }
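Reviewer note: the "GP is oldish" test above relies on `time_after()` staying correct when the jiffies counter wraps. The following is a minimal userspace sketch of that comparison, not kernel code. The names `sketch_time_after()` and `gp_is_oldish()` are hypothetical, and `hz` stands in for the kernel's `HZ`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel's time_after(a, b): true if a is
 * later than b, even if the counter wrapped in between.  The signed
 * interpretation of the unsigned difference does the work.
 */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Hypothetical helper: has the grace period that started at gp_start
 * been running for more than hz ticks as of "now"?
 */
static bool gp_is_oldish(unsigned long now, unsigned long gp_start,
			 unsigned long hz)
{
	assert(hz > 0 && hz <= LONG_MAX);
	return sketch_time_after(now, gp_start + hz);
}
```

With `gp_start` near `ULONG_MAX`, the sum `gp_start + hz` wraps to a small value, yet the comparison still reports the correct ordering as long as the two times are less than half the counter range apart.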
/**
* call_rcu() - Queue an RCU callback for invocation after a grace period.
* @head: structure to be used for queueing the RCU updates.
* @func: actual callback function to be invoked after the grace period
*
* The callback function will be invoked some time after a full grace
* period elapses, in other words after all pre-existing RCU read-side
* critical sections have completed. However, the callback function
* might well execute concurrently with RCU read-side critical sections
* that started after call_rcu() was invoked. RCU read-side critical
* sections are delimited by rcu_read_lock() and rcu_read_unlock(),
* and may be nested.
*
* Note that all CPUs must agree that the grace period extended beyond
* all pre-existing RCU read-side critical section. On systems with more
* than one CPU, this means that when "func()" is invoked, each CPU is
* guaranteed to have executed a full memory barrier since the end of its
* last RCU read-side critical section whose beginning preceded the call
* to call_rcu(). It also means that each CPU executing an RCU read-side
* critical section that continues beyond the start of "func()" must have
* executed a memory barrier after the call_rcu() but before the beginning
* of that RCU read-side critical section. Note that these guarantees
* include CPUs that are offline, idle, or executing in user mode, as
* well as CPUs that are executing in the kernel.
*
* Furthermore, if CPU A invoked call_rcu() and CPU B invoked the
* resulting RCU callback function "func()", then both CPU A and CPU B are
* guaranteed to execute a full memory barrier during the time interval
* between the call to call_rcu() and the invocation of "func()" -- even
* if CPU A and CPU B are the same CPU (but again only if the system has
* more than one CPU).
*/
void call_rcu(struct rcu_head *head, rcu_callback_t func)
{
__call_rcu(head, func, rcu_state_p, -1, 0);
}
EXPORT_SYMBOL_GPL(call_rcu);
/** /**
* synchronize_rcu - wait until a grace period has elapsed. * synchronize_rcu - wait until a grace period has elapsed.
* *
...@@ -797,14 +822,28 @@ EXPORT_SYMBOL_GPL(call_rcu); ...@@ -797,14 +822,28 @@ EXPORT_SYMBOL_GPL(call_rcu);
* concurrently with new RCU read-side critical sections that began while * concurrently with new RCU read-side critical sections that began while
* synchronize_rcu() was waiting. RCU read-side critical sections are * synchronize_rcu() was waiting. RCU read-side critical sections are
* delimited by rcu_read_lock() and rcu_read_unlock(), and may be nested. * delimited by rcu_read_lock() and rcu_read_unlock(), and may be nested.
* In addition, regions of code across which interrupts, preemption, or
* softirqs have been disabled also serve as RCU read-side critical
* sections. This includes hardware interrupt handlers, softirq handlers,
* and NMI handlers.
*
* Note that this guarantee implies further memory-ordering guarantees.
* On systems with more than one CPU, when synchronize_rcu() returns,
* each CPU is guaranteed to have executed a full memory barrier since
* the end of its last RCU read-side critical section whose beginning
* preceded the call to synchronize_rcu(). In addition, each CPU having
* an RCU read-side critical section that extends beyond the return from
* synchronize_rcu() is guaranteed to have executed a full memory barrier
* after the beginning of synchronize_rcu() and before the beginning of
* that RCU read-side critical section. Note that these guarantees include
* CPUs that are offline, idle, or executing in user mode, as well as CPUs
* that are executing in the kernel.
* *
* See the description of synchronize_sched() for more detailed * Furthermore, if CPU A invoked synchronize_rcu(), which returned
* information on memory-ordering guarantees. However, please note * to its caller on CPU B, then both CPU A and CPU B are guaranteed
* that -only- the memory-ordering guarantees apply. For example, * to have executed a full memory barrier during the execution of
* synchronize_rcu() is -not- guaranteed to wait on things like code * synchronize_rcu() -- even if CPU A and CPU B are the same CPU (but
* protected by preempt_disable(), instead, synchronize_rcu() is -only- * again only if the system has more than one CPU).
* guaranteed to wait on RCU read-side critical sections, that is, sections
* of code protected by rcu_read_lock().
*/ */
void synchronize_rcu(void) void synchronize_rcu(void)
{ {
...@@ -821,28 +860,6 @@ void synchronize_rcu(void) ...@@ -821,28 +860,6 @@ void synchronize_rcu(void)
} }
EXPORT_SYMBOL_GPL(synchronize_rcu); EXPORT_SYMBOL_GPL(synchronize_rcu);
/**
* rcu_barrier - Wait until all in-flight call_rcu() callbacks complete.
*
* Note that this primitive does not necessarily wait for an RCU grace period
* to complete. For example, if there are no RCU callbacks queued anywhere
* in the system, then rcu_barrier() is within its rights to return
* immediately, without waiting for anything, much less an RCU grace period.
*/
void rcu_barrier(void)
{
_rcu_barrier(rcu_state_p);
}
EXPORT_SYMBOL_GPL(rcu_barrier);
/*
* Initialize preemptible RCU's state structures.
*/
static void __init __rcu_init_preempt(void)
{
rcu_init_one(rcu_state_p);
}
/* /*
* Check for a task exiting while in a preemptible-RCU read-side * Check for a task exiting while in a preemptible-RCU read-side
* critical section, clean up if so. No need to issue warnings, * critical section, clean up if so. No need to issue warnings,
...@@ -859,6 +876,7 @@ void exit_rcu(void) ...@@ -859,6 +876,7 @@ void exit_rcu(void)
barrier(); barrier();
t->rcu_read_unlock_special.b.blocked = true; t->rcu_read_unlock_special.b.blocked = true;
__rcu_read_unlock(); __rcu_read_unlock();
rcu_preempt_deferred_qs(current);
} }
/* /*
...@@ -866,7 +884,7 @@ void exit_rcu(void) ...@@ -866,7 +884,7 @@ void exit_rcu(void)
* specified number of elements. * specified number of elements.
*/ */
static void static void
dump_blkd_tasks(struct rcu_state *rsp, struct rcu_node *rnp, int ncheck) dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
{ {
int cpu; int cpu;
int i; int i;
...@@ -893,7 +911,7 @@ dump_blkd_tasks(struct rcu_state *rsp, struct rcu_node *rnp, int ncheck) ...@@ -893,7 +911,7 @@ dump_blkd_tasks(struct rcu_state *rsp, struct rcu_node *rnp, int ncheck)
} }
pr_cont("\n"); pr_cont("\n");
for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++) { for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++) {
rdp = per_cpu_ptr(rsp->rda, cpu); rdp = per_cpu_ptr(&rcu_data, cpu);
onl = !!(rdp->grpmask & rcu_rnp_online_cpus(rnp)); onl = !!(rdp->grpmask & rcu_rnp_online_cpus(rnp));
pr_info("\t%d: %c online: %ld(%d) offline: %ld(%d)\n", pr_info("\t%d: %c online: %ld(%d) offline: %ld(%d)\n",
cpu, ".o"[onl], cpu, ".o"[onl],
...@@ -904,8 +922,6 @@ dump_blkd_tasks(struct rcu_state *rsp, struct rcu_node *rnp, int ncheck) ...@@ -904,8 +922,6 @@ dump_blkd_tasks(struct rcu_state *rsp, struct rcu_node *rnp, int ncheck)
#else /* #ifdef CONFIG_PREEMPT_RCU */ #else /* #ifdef CONFIG_PREEMPT_RCU */
static struct rcu_state *const rcu_state_p = &rcu_sched_state;
/* /*
* Tell them what RCU they are running. * Tell them what RCU they are running.
*/ */
...@@ -916,13 +932,84 @@ static void __init rcu_bootup_announce(void) ...@@ -916,13 +932,84 @@ static void __init rcu_bootup_announce(void)
} }
/* /*
* Because preemptible RCU does not exist, we never have to check for * Note a quiescent state for PREEMPT=n. Because we do not need to know
* CPUs being in quiescent states. * how many quiescent states passed, just if there was at least one since
* the start of the grace period, this just sets a flag. The caller must
* have disabled preemption.
*/ */
static void rcu_preempt_note_context_switch(bool preempt) static void rcu_qs(void)
{ {
RCU_LOCKDEP_WARN(preemptible(), "rcu_qs() invoked with preemption enabled!!!");
if (!__this_cpu_read(rcu_data.cpu_no_qs.s))
return;
trace_rcu_grace_period(TPS("rcu_sched"),
__this_cpu_read(rcu_data.gp_seq), TPS("cpuqs"));
__this_cpu_write(rcu_data.cpu_no_qs.b.norm, false);
if (!__this_cpu_read(rcu_data.cpu_no_qs.b.exp))
return;
__this_cpu_write(rcu_data.cpu_no_qs.b.exp, false);
rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
} }
/*
* Register an urgently needed quiescent state. If there is an
* emergency, invoke rcu_momentary_dyntick_idle() to do a heavy-weight
* dyntick-idle quiescent state visible to other CPUs, which will in
* some cases serve for expedited as well as normal grace periods.
* Either way, register a lightweight quiescent state.
*
* The barrier() calls are redundant in the common case when this is
* called externally, but just in case this is called from within this
* file.
*
*/
void rcu_all_qs(void)
{
unsigned long flags;
if (!raw_cpu_read(rcu_data.rcu_urgent_qs))
return;
preempt_disable();
/* Load rcu_urgent_qs before other flags. */
if (!smp_load_acquire(this_cpu_ptr(&rcu_data.rcu_urgent_qs))) {
preempt_enable();
return;
}
this_cpu_write(rcu_data.rcu_urgent_qs, false);
barrier(); /* Avoid RCU read-side critical sections leaking down. */
if (unlikely(raw_cpu_read(rcu_data.rcu_need_heavy_qs))) {
local_irq_save(flags);
rcu_momentary_dyntick_idle();
local_irq_restore(flags);
}
rcu_qs();
barrier(); /* Avoid RCU read-side critical sections leaking up. */
preempt_enable();
}
EXPORT_SYMBOL_GPL(rcu_all_qs);
/*
* Note a PREEMPT=n context switch. The caller must have disabled interrupts.
*/
void rcu_note_context_switch(bool preempt)
{
barrier(); /* Avoid RCU read-side critical sections leaking down. */
trace_rcu_utilization(TPS("Start context switch"));
rcu_qs();
/* Load rcu_urgent_qs before other flags. */
if (!smp_load_acquire(this_cpu_ptr(&rcu_data.rcu_urgent_qs)))
goto out;
this_cpu_write(rcu_data.rcu_urgent_qs, false);
if (unlikely(raw_cpu_read(rcu_data.rcu_need_heavy_qs)))
rcu_momentary_dyntick_idle();
if (!preempt)
rcu_tasks_qs(current);
out:
trace_rcu_utilization(TPS("End context switch"));
barrier(); /* Avoid RCU read-side critical sections leaking up. */
}
EXPORT_SYMBOL_GPL(rcu_note_context_switch);
/* /*
* Because preemptible RCU does not exist, there are never any preempted * Because preemptible RCU does not exist, there are never any preempted
* RCU readers. * RCU readers.
...@@ -940,11 +1027,21 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp) ...@@ -940,11 +1027,21 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
return false; return false;
} }
/*
* Because there is no preemptible RCU, there can be no deferred quiescent
* states.
*/
static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
{
return false;
}
static void rcu_preempt_deferred_qs(struct task_struct *t) { }
/* /*
* Because preemptible RCU does not exist, we never have to check for * Because preemptible RCU does not exist, we never have to check for
* tasks blocked within RCU read-side critical sections. * tasks blocked within RCU read-side critical sections.
*/ */
static void rcu_print_detail_task_stall(struct rcu_state *rsp) static void rcu_print_detail_task_stall(void)
{ {
} }
...@@ -972,36 +1069,54 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp) ...@@ -972,36 +1069,54 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
* so there is no need to check for blocked tasks. So check only for * so there is no need to check for blocked tasks. So check only for
* bogus qsmask values. * bogus qsmask values.
*/ */
static void static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
rcu_preempt_check_blocked_tasks(struct rcu_state *rsp, struct rcu_node *rnp)
{ {
WARN_ON_ONCE(rnp->qsmask); WARN_ON_ONCE(rnp->qsmask);
} }
/* /*
* Because preemptible RCU does not exist, it never has any callbacks * Check to see if this CPU is in a non-context-switch quiescent state
* to check. * (user mode or idle loop for rcu, non-softirq execution for rcu_bh).
* Also schedule RCU core processing.
*
* This function must be called from hardirq context. It is normally
* invoked from the scheduling-clock interrupt.
*/ */
static void rcu_preempt_check_callbacks(void) static void rcu_flavor_check_callbacks(int user)
{ {
} if (user || rcu_is_cpu_rrupt_from_idle()) {
/* /*
* Because preemptible RCU does not exist, rcu_barrier() is just * Get here if this CPU took its interrupt from user
* another name for rcu_barrier_sched(). * mode or from the idle loop, and if this is not a
*/ * nested interrupt. In this case, the CPU is in
void rcu_barrier(void) * a quiescent state, so note it.
{ *
rcu_barrier_sched(); * No memory barrier is required here because rcu_qs()
* references only CPU-local variables that other CPUs
* neither access nor modify, at least not while the
* corresponding CPU is online.
*/
rcu_qs();
}
} }
EXPORT_SYMBOL_GPL(rcu_barrier);
/* /* PREEMPT=n implementation of synchronize_rcu(). */
* Because preemptible RCU does not exist, it need not be initialized. void synchronize_rcu(void)
*/
static void __init __rcu_init_preempt(void)
{ {
RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
lock_is_held(&rcu_lock_map) ||
lock_is_held(&rcu_sched_lock_map),
"Illegal synchronize_rcu() in RCU read-side critical section");
if (rcu_blocking_is_gp())
return;
if (rcu_gp_is_expedited())
synchronize_rcu_expedited();
else
wait_rcu_gp(call_rcu);
} }
EXPORT_SYMBOL_GPL(synchronize_rcu);
/* /*
* Because preemptible RCU does not exist, tasks cannot possibly exit * Because preemptible RCU does not exist, tasks cannot possibly exit
...@@ -1015,7 +1130,7 @@ void exit_rcu(void) ...@@ -1015,7 +1130,7 @@ void exit_rcu(void)
* Dump the guaranteed-empty blocked-tasks state. Trust but verify. * Dump the guaranteed-empty blocked-tasks state. Trust but verify.
*/ */
static void static void
dump_blkd_tasks(struct rcu_state *rsp, struct rcu_node *rnp, int ncheck) dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
{ {
WARN_ON_ONCE(!list_empty(&rnp->blkd_tasks)); WARN_ON_ONCE(!list_empty(&rnp->blkd_tasks));
} }
...@@ -1212,21 +1327,20 @@ static void rcu_preempt_boost_start_gp(struct rcu_node *rnp) ...@@ -1212,21 +1327,20 @@ static void rcu_preempt_boost_start_gp(struct rcu_node *rnp)
* already exist. We only create this kthread for preemptible RCU. * already exist. We only create this kthread for preemptible RCU.
* Returns zero if all is well, a negated errno otherwise. * Returns zero if all is well, a negated errno otherwise.
*/ */
static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp, static int rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
struct rcu_node *rnp)
{ {
int rnp_index = rnp - &rsp->node[0]; int rnp_index = rnp - rcu_get_root();
unsigned long flags; unsigned long flags;
struct sched_param sp; struct sched_param sp;
struct task_struct *t; struct task_struct *t;
if (rcu_state_p != rsp) if (!IS_ENABLED(CONFIG_PREEMPT_RCU))
return 0; return 0;
if (!rcu_scheduler_fully_active || rcu_rnp_online_cpus(rnp) == 0) if (!rcu_scheduler_fully_active || rcu_rnp_online_cpus(rnp) == 0)
return 0; return 0;
rsp->boost = 1; rcu_state.boost = 1;
if (rnp->boost_kthread_task != NULL) if (rnp->boost_kthread_task != NULL)
return 0; return 0;
t = kthread_create(rcu_boost_kthread, (void *)rnp, t = kthread_create(rcu_boost_kthread, (void *)rnp,
...@@ -1244,9 +1358,7 @@ static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp, ...@@ -1244,9 +1358,7 @@ static int rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
static void rcu_kthread_do_work(void) static void rcu_kthread_do_work(void)
{ {
rcu_do_batch(&rcu_sched_state, this_cpu_ptr(&rcu_sched_data)); rcu_do_batch(this_cpu_ptr(&rcu_data));
rcu_do_batch(&rcu_bh_state, this_cpu_ptr(&rcu_bh_data));
rcu_do_batch(&rcu_preempt_state, this_cpu_ptr(&rcu_preempt_data));
} }
static void rcu_cpu_kthread_setup(unsigned int cpu) static void rcu_cpu_kthread_setup(unsigned int cpu)
...@@ -1268,9 +1380,9 @@ static int rcu_cpu_kthread_should_run(unsigned int cpu) ...@@ -1268,9 +1380,9 @@ static int rcu_cpu_kthread_should_run(unsigned int cpu)
} }
/* /*
* Per-CPU kernel thread that invokes RCU callbacks. This replaces the * Per-CPU kernel thread that invokes RCU callbacks. This replaces
* RCU softirq used in flavors and configurations of RCU that do not * the RCU softirq used in configurations of RCU that do not support RCU
* support RCU priority boosting. * priority boosting.
*/ */
static void rcu_cpu_kthread(unsigned int cpu) static void rcu_cpu_kthread(unsigned int cpu)
{ {
...@@ -1353,18 +1465,18 @@ static void __init rcu_spawn_boost_kthreads(void) ...@@ -1353,18 +1465,18 @@ static void __init rcu_spawn_boost_kthreads(void)
for_each_possible_cpu(cpu) for_each_possible_cpu(cpu)
per_cpu(rcu_cpu_has_work, cpu) = 0; per_cpu(rcu_cpu_has_work, cpu) = 0;
BUG_ON(smpboot_register_percpu_thread(&rcu_cpu_thread_spec)); BUG_ON(smpboot_register_percpu_thread(&rcu_cpu_thread_spec));
rcu_for_each_leaf_node(rcu_state_p, rnp) rcu_for_each_leaf_node(rnp)
(void)rcu_spawn_one_boost_kthread(rcu_state_p, rnp); (void)rcu_spawn_one_boost_kthread(rnp);
} }
static void rcu_prepare_kthreads(int cpu) static void rcu_prepare_kthreads(int cpu)
{ {
struct rcu_data *rdp = per_cpu_ptr(rcu_state_p->rda, cpu); struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
struct rcu_node *rnp = rdp->mynode; struct rcu_node *rnp = rdp->mynode;
/* Fire up the incoming CPU's kthread and leaf rcu_node kthread. */ /* Fire up the incoming CPU's kthread and leaf rcu_node kthread. */
if (rcu_scheduler_fully_active) if (rcu_scheduler_fully_active)
(void)rcu_spawn_one_boost_kthread(rcu_state_p, rnp); (void)rcu_spawn_one_boost_kthread(rnp);
} }
#else /* #ifdef CONFIG_RCU_BOOST */ #else /* #ifdef CONFIG_RCU_BOOST */
...@@ -1411,8 +1523,8 @@ static void rcu_prepare_kthreads(int cpu) ...@@ -1411,8 +1523,8 @@ static void rcu_prepare_kthreads(int cpu)
* 1 if so. This function is part of the RCU implementation; it is -not- * 1 if so. This function is part of the RCU implementation; it is -not-
* an exported member of the RCU API. * an exported member of the RCU API.
* *
 * Because we do not have RCU_FAST_NO_HZ, just check whether this CPU needs * Because we do not have RCU_FAST_NO_HZ, just check whether or not this
* any flavor of RCU. * CPU has RCU callbacks queued.
*/ */
int rcu_needs_cpu(u64 basemono, u64 *nextevt) int rcu_needs_cpu(u64 basemono, u64 *nextevt)
{ {
...@@ -1478,41 +1590,36 @@ static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY; ...@@ -1478,41 +1590,36 @@ static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY;
module_param(rcu_idle_lazy_gp_delay, int, 0644); module_param(rcu_idle_lazy_gp_delay, int, 0644);
/* /*
* Try to advance callbacks for all flavors of RCU on the current CPU, but * Try to advance callbacks on the current CPU, but only if it has been
* only if it has been awhile since the last time we did so. Afterwards, * awhile since the last time we did so. Afterwards, if there are any
* if there are any callbacks ready for immediate invocation, return true. * callbacks ready for immediate invocation, return true.
*/ */
static bool __maybe_unused rcu_try_advance_all_cbs(void) static bool __maybe_unused rcu_try_advance_all_cbs(void)
{ {
bool cbs_ready = false; bool cbs_ready = false;
struct rcu_data *rdp; struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
struct rcu_node *rnp; struct rcu_node *rnp;
struct rcu_state *rsp;
/* Exit early if we advanced recently. */ /* Exit early if we advanced recently. */
if (jiffies == rdtp->last_advance_all) if (jiffies == rdp->last_advance_all)
return false; return false;
rdtp->last_advance_all = jiffies; rdp->last_advance_all = jiffies;
for_each_rcu_flavor(rsp) { rnp = rdp->mynode;
rdp = this_cpu_ptr(rsp->rda);
rnp = rdp->mynode;
/* /*
* Don't bother checking unless a grace period has * Don't bother checking unless a grace period has
* completed since we last checked and there are * completed since we last checked and there are
* callbacks not yet ready to invoke. * callbacks not yet ready to invoke.
*/ */
if ((rcu_seq_completed_gp(rdp->gp_seq, if ((rcu_seq_completed_gp(rdp->gp_seq,
rcu_seq_current(&rnp->gp_seq)) || rcu_seq_current(&rnp->gp_seq)) ||
unlikely(READ_ONCE(rdp->gpwrap))) && unlikely(READ_ONCE(rdp->gpwrap))) &&
rcu_segcblist_pend_cbs(&rdp->cblist)) rcu_segcblist_pend_cbs(&rdp->cblist))
note_gp_changes(rsp, rdp); note_gp_changes(rdp);
if (rcu_segcblist_ready_cbs(&rdp->cblist)) if (rcu_segcblist_ready_cbs(&rdp->cblist))
cbs_ready = true; cbs_ready = true;
}
return cbs_ready; return cbs_ready;
} }
...@@ -1526,16 +1633,16 @@ static bool __maybe_unused rcu_try_advance_all_cbs(void) ...@@ -1526,16 +1633,16 @@ static bool __maybe_unused rcu_try_advance_all_cbs(void)
*/ */
int rcu_needs_cpu(u64 basemono, u64 *nextevt) int rcu_needs_cpu(u64 basemono, u64 *nextevt)
{ {
struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks); struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
unsigned long dj; unsigned long dj;
lockdep_assert_irqs_disabled(); lockdep_assert_irqs_disabled();
/* Snapshot to detect later posting of non-lazy callback. */ /* Snapshot to detect later posting of non-lazy callback. */
rdtp->nonlazy_posted_snap = rdtp->nonlazy_posted; rdp->nonlazy_posted_snap = rdp->nonlazy_posted;
/* If no callbacks, RCU doesn't need the CPU. */ /* If no callbacks, RCU doesn't need the CPU. */
if (!rcu_cpu_has_callbacks(&rdtp->all_lazy)) { if (!rcu_cpu_has_callbacks(&rdp->all_lazy)) {
*nextevt = KTIME_MAX; *nextevt = KTIME_MAX;
return 0; return 0;
} }
...@@ -1546,10 +1653,10 @@ int rcu_needs_cpu(u64 basemono, u64 *nextevt) ...@@ -1546,10 +1653,10 @@ int rcu_needs_cpu(u64 basemono, u64 *nextevt)
invoke_rcu_core(); invoke_rcu_core();
return 1; return 1;
} }
rdtp->last_accelerate = jiffies; rdp->last_accelerate = jiffies;
/* Request timer delay depending on laziness, and round. */ /* Request timer delay depending on laziness, and round. */
if (!rdtp->all_lazy) { if (!rdp->all_lazy) {
dj = round_up(rcu_idle_gp_delay + jiffies, dj = round_up(rcu_idle_gp_delay + jiffies,
rcu_idle_gp_delay) - jiffies; rcu_idle_gp_delay) - jiffies;
} else { } else {
...@@ -1572,10 +1679,8 @@ int rcu_needs_cpu(u64 basemono, u64 *nextevt) ...@@ -1572,10 +1679,8 @@ int rcu_needs_cpu(u64 basemono, u64 *nextevt)
static void rcu_prepare_for_idle(void) static void rcu_prepare_for_idle(void)
{ {
bool needwake; bool needwake;
struct rcu_data *rdp; struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
struct rcu_node *rnp; struct rcu_node *rnp;
struct rcu_state *rsp;
int tne; int tne;
lockdep_assert_irqs_disabled(); lockdep_assert_irqs_disabled();
...@@ -1584,10 +1689,10 @@ static void rcu_prepare_for_idle(void) ...@@ -1584,10 +1689,10 @@ static void rcu_prepare_for_idle(void)
/* Handle nohz enablement switches conservatively. */ /* Handle nohz enablement switches conservatively. */
tne = READ_ONCE(tick_nohz_active); tne = READ_ONCE(tick_nohz_active);
if (tne != rdtp->tick_nohz_enabled_snap) { if (tne != rdp->tick_nohz_enabled_snap) {
if (rcu_cpu_has_callbacks(NULL)) if (rcu_cpu_has_callbacks(NULL))
invoke_rcu_core(); /* force nohz to see update. */ invoke_rcu_core(); /* force nohz to see update. */
rdtp->tick_nohz_enabled_snap = tne; rdp->tick_nohz_enabled_snap = tne;
return; return;
} }
if (!tne) if (!tne)
...@@ -1598,10 +1703,10 @@ static void rcu_prepare_for_idle(void) ...@@ -1598,10 +1703,10 @@ static void rcu_prepare_for_idle(void)
* callbacks, invoke RCU core for the side-effect of recalculating * callbacks, invoke RCU core for the side-effect of recalculating
* idle duration on re-entry to idle. * idle duration on re-entry to idle.
*/ */
if (rdtp->all_lazy && if (rdp->all_lazy &&
rdtp->nonlazy_posted != rdtp->nonlazy_posted_snap) { rdp->nonlazy_posted != rdp->nonlazy_posted_snap) {
rdtp->all_lazy = false; rdp->all_lazy = false;
rdtp->nonlazy_posted_snap = rdtp->nonlazy_posted; rdp->nonlazy_posted_snap = rdp->nonlazy_posted;
invoke_rcu_core(); invoke_rcu_core();
return; return;
} }
...@@ -1610,19 +1715,16 @@ static void rcu_prepare_for_idle(void) ...@@ -1610,19 +1715,16 @@ static void rcu_prepare_for_idle(void)
* If we have not yet accelerated this jiffy, accelerate all * If we have not yet accelerated this jiffy, accelerate all
* callbacks on this CPU. * callbacks on this CPU.
*/ */
if (rdtp->last_accelerate == jiffies) if (rdp->last_accelerate == jiffies)
return; return;
rdtp->last_accelerate = jiffies; rdp->last_accelerate = jiffies;
for_each_rcu_flavor(rsp) { if (rcu_segcblist_pend_cbs(&rdp->cblist)) {
rdp = this_cpu_ptr(rsp->rda);
if (!rcu_segcblist_pend_cbs(&rdp->cblist))
continue;
rnp = rdp->mynode; rnp = rdp->mynode;
raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */ raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
needwake = rcu_accelerate_cbs(rsp, rnp, rdp); needwake = rcu_accelerate_cbs(rnp, rdp);
raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */ raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
if (needwake) if (needwake)
rcu_gp_kthread_wake(rsp); rcu_gp_kthread_wake();
} }
} }
...@@ -1650,104 +1752,23 @@ static void rcu_cleanup_after_idle(void) ...@@ -1650,104 +1752,23 @@ static void rcu_cleanup_after_idle(void)
*/ */
static void rcu_idle_count_callbacks_posted(void) static void rcu_idle_count_callbacks_posted(void)
{ {
__this_cpu_add(rcu_dynticks.nonlazy_posted, 1); __this_cpu_add(rcu_data.nonlazy_posted, 1);
}
/*
* Data for flushing lazy RCU callbacks at OOM time.
*/
static atomic_t oom_callback_count;
static DECLARE_WAIT_QUEUE_HEAD(oom_callback_wq);
/*
* RCU OOM callback -- decrement the outstanding count and deliver the
* wake-up if we are the last one.
*/
static void rcu_oom_callback(struct rcu_head *rhp)
{
if (atomic_dec_and_test(&oom_callback_count))
wake_up(&oom_callback_wq);
}
/*
* Post an rcu_oom_notify callback on the current CPU if it has at
* least one lazy callback. This will unnecessarily post callbacks
* to CPUs that already have a non-lazy callback at the end of their
* callback list, but this is an infrequent operation, so accept some
* extra overhead to keep things simple.
*/
static void rcu_oom_notify_cpu(void *unused)
{
struct rcu_state *rsp;
struct rcu_data *rdp;
for_each_rcu_flavor(rsp) {
rdp = raw_cpu_ptr(rsp->rda);
if (rcu_segcblist_n_lazy_cbs(&rdp->cblist)) {
atomic_inc(&oom_callback_count);
rsp->call(&rdp->oom_head, rcu_oom_callback);
}
}
}
/*
* If low on memory, ensure that each CPU has a non-lazy callback.
* This will wake up CPUs that have only lazy callbacks, in turn
* ensuring that they free up the corresponding memory in a timely manner.
* Because an uncertain amount of memory will be freed in some uncertain
* timeframe, we do not claim to have freed anything.
*/
static int rcu_oom_notify(struct notifier_block *self,
unsigned long notused, void *nfreed)
{
int cpu;
/* Wait for callbacks from earlier instance to complete. */
wait_event(oom_callback_wq, atomic_read(&oom_callback_count) == 0);
smp_mb(); /* Ensure callback reuse happens after callback invocation. */
/*
* Prevent premature wakeup: ensure that all increments happen
* before there is a chance of the counter reaching zero.
*/
atomic_set(&oom_callback_count, 1);
for_each_online_cpu(cpu) {
smp_call_function_single(cpu, rcu_oom_notify_cpu, NULL, 1);
cond_resched_tasks_rcu_qs();
}
/* Unconditionally decrement: no need to wake ourselves up. */
atomic_dec(&oom_callback_count);
return NOTIFY_OK;
} }
static struct notifier_block rcu_oom_nb = {
.notifier_call = rcu_oom_notify
};
static int __init rcu_register_oom_notifier(void)
{
register_oom_notifier(&rcu_oom_nb);
return 0;
}
early_initcall(rcu_register_oom_notifier);
#endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */ #endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
#ifdef CONFIG_RCU_FAST_NO_HZ #ifdef CONFIG_RCU_FAST_NO_HZ
static void print_cpu_stall_fast_no_hz(char *cp, int cpu) static void print_cpu_stall_fast_no_hz(char *cp, int cpu)
{ {
struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu); struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
unsigned long nlpd = rdtp->nonlazy_posted - rdtp->nonlazy_posted_snap; unsigned long nlpd = rdp->nonlazy_posted - rdp->nonlazy_posted_snap;
sprintf(cp, "last_accelerate: %04lx/%04lx, nonlazy_posted: %ld, %c%c", sprintf(cp, "last_accelerate: %04lx/%04lx, nonlazy_posted: %ld, %c%c",
rdtp->last_accelerate & 0xffff, jiffies & 0xffff, rdp->last_accelerate & 0xffff, jiffies & 0xffff,
ulong2long(nlpd), ulong2long(nlpd),
rdtp->all_lazy ? 'L' : '.', rdp->all_lazy ? 'L' : '.',
rdtp->tick_nohz_enabled_snap ? '.' : 'D'); rdp->tick_nohz_enabled_snap ? '.' : 'D');
} }
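Reviewer note: the stall-warning string built above truncates both timestamps to their low 16 bits and encodes two booleans as single characters. The sketch below is a userspace approximation of that formatting, using `snprintf()` in place of `sprintf()` and a plain cast in place of `ulong2long()`. The function name `format_fast_no_hz_sketch()` and its parameter list are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format the fast-no-hz stall fields into buf.  "now" stands in for
 * jiffies.  Returns the snprintf() result.
 */
static int format_fast_no_hz_sketch(char *buf, size_t len,
				    unsigned long last_accelerate,
				    unsigned long now,
				    unsigned long nonlazy_posted,
				    unsigned long nonlazy_posted_snap,
				    bool all_lazy,
				    bool tick_nohz_enabled_snap)
{
	/* Callbacks posted since the last snapshot (wraps safely). */
	unsigned long nlpd = nonlazy_posted - nonlazy_posted_snap;

	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
			"last_accelerate: %04lx/%04lx, nonlazy_posted: %ld, %c%c",
			last_accelerate & 0xffff, now & 0xffff,
			(long)nlpd,
			all_lazy ? 'L' : '.',
			tick_nohz_enabled_snap ? '.' : 'D');
}
```

For example, a `last_accelerate` of `0x12345` prints as `2345`, so a reader of the stall warning compares only the low 16 bits of the two timestamps.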
#else /* #ifdef CONFIG_RCU_FAST_NO_HZ */ #else /* #ifdef CONFIG_RCU_FAST_NO_HZ */
...@@ -1768,21 +1789,19 @@ static void print_cpu_stall_info_begin(void) ...@@ -1768,21 +1789,19 @@ static void print_cpu_stall_info_begin(void)
/* /*
* Print out diagnostic information for the specified stalled CPU. * Print out diagnostic information for the specified stalled CPU.
* *
* If the specified CPU is aware of the current RCU grace period * If the specified CPU is aware of the current RCU grace period, then
* (flavor specified by rsp), then print the number of scheduling * print the number of scheduling clock interrupts the CPU has taken
* clock interrupts the CPU has taken during the time that it has * during the time that it has been aware. Otherwise, print the number
* been aware. Otherwise, print the number of RCU grace periods * of RCU grace periods that this CPU is ignorant of, for example, "1"
* that this CPU is ignorant of, for example, "1" if the CPU was * if the CPU was aware of the previous grace period.
* aware of the previous grace period.
* *
* Also print out idle and (if CONFIG_RCU_FAST_NO_HZ) idle-entry info. * Also print out idle and (if CONFIG_RCU_FAST_NO_HZ) idle-entry info.
*/ */
static void print_cpu_stall_info(struct rcu_state *rsp, int cpu) static void print_cpu_stall_info(int cpu)
{ {
unsigned long delta; unsigned long delta;
char fast_no_hz[72]; char fast_no_hz[72];
struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu); struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
struct rcu_dynticks *rdtp = rdp->dynticks;
char *ticks_title; char *ticks_title;
unsigned long ticks_value; unsigned long ticks_value;
...@@ -1792,7 +1811,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu) ...@@ -1792,7 +1811,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu)
*/ */
touch_nmi_watchdog(); touch_nmi_watchdog();
ticks_value = rcu_seq_ctr(rsp->gp_seq - rdp->gp_seq); ticks_value = rcu_seq_ctr(rcu_state.gp_seq - rdp->gp_seq);
if (ticks_value) { if (ticks_value) {
ticks_title = "GPs behind"; ticks_title = "GPs behind";
} else { } else {
...@@ -1810,10 +1829,10 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu) ...@@ -1810,10 +1829,10 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu)
rdp->rcu_iw_pending ? (int)min(delta, 9UL) + '0' : rdp->rcu_iw_pending ? (int)min(delta, 9UL) + '0' :
"!."[!delta], "!."[!delta],
ticks_value, ticks_title, ticks_value, ticks_title,
rcu_dynticks_snap(rdtp) & 0xfff, rcu_dynticks_snap(rdp) & 0xfff,
rdtp->dynticks_nesting, rdtp->dynticks_nmi_nesting, rdp->dynticks_nesting, rdp->dynticks_nmi_nesting,
rdp->softirq_snap, kstat_softirqs_cpu(RCU_SOFTIRQ, cpu), rdp->softirq_snap, kstat_softirqs_cpu(RCU_SOFTIRQ, cpu),
READ_ONCE(rsp->n_force_qs) - rsp->n_force_qs_gpstart, READ_ONCE(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart,
fast_no_hz); fast_no_hz);
} }
...@@ -1823,20 +1842,12 @@ static void print_cpu_stall_info_end(void) ...@@ -1823,20 +1842,12 @@ static void print_cpu_stall_info_end(void)
pr_err("\t"); pr_err("\t");
} }
/* Zero ->ticks_this_gp for all flavors of RCU. */ /* Zero ->ticks_this_gp and snapshot the number of RCU softirq handlers. */
static void zero_cpu_stall_ticks(struct rcu_data *rdp) static void zero_cpu_stall_ticks(struct rcu_data *rdp)
{ {
rdp->ticks_this_gp = 0; rdp->ticks_this_gp = 0;
rdp->softirq_snap = kstat_softirqs_cpu(RCU_SOFTIRQ, smp_processor_id()); rdp->softirq_snap = kstat_softirqs_cpu(RCU_SOFTIRQ, smp_processor_id());
} WRITE_ONCE(rdp->last_fqs_resched, jiffies);
/* Increment ->ticks_this_gp for all flavors of RCU. */
static void increment_cpu_stall_ticks(void)
{
struct rcu_state *rsp;
for_each_rcu_flavor(rsp)
raw_cpu_inc(rsp->rda->ticks_this_gp);
} }
#ifdef CONFIG_RCU_NOCB_CPU #ifdef CONFIG_RCU_NOCB_CPU
...@@ -1958,17 +1969,17 @@ static void wake_nocb_leader_defer(struct rcu_data *rdp, int waketype, ...@@ -1958,17 +1969,17 @@ static void wake_nocb_leader_defer(struct rcu_data *rdp, int waketype,
if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT)
mod_timer(&rdp->nocb_timer, jiffies + 1); mod_timer(&rdp->nocb_timer, jiffies + 1);
WRITE_ONCE(rdp->nocb_defer_wakeup, waketype); WRITE_ONCE(rdp->nocb_defer_wakeup, waketype);
trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, reason); trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, reason);
raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags); raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
} }
/* /*
* Does the specified CPU need an RCU callback for the specified flavor * Does the specified CPU need an RCU callback for this invocation
* of rcu_barrier()? * of rcu_barrier()?
*/ */
static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu) static bool rcu_nocb_cpu_needs_barrier(int cpu)
{ {
struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu); struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
unsigned long ret; unsigned long ret;
#ifdef CONFIG_PROVE_RCU #ifdef CONFIG_PROVE_RCU
struct rcu_head *rhp; struct rcu_head *rhp;
...@@ -1979,7 +1990,7 @@ static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu) ...@@ -1979,7 +1990,7 @@ static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
* There needs to be a barrier before this function is called, * There needs to be a barrier before this function is called,
* but associated with a prior determination that no more * but associated with a prior determination that no more
* callbacks would be posted. In the worst case, the first * callbacks would be posted. In the worst case, the first
* barrier in _rcu_barrier() suffices (but the caller cannot * barrier in rcu_barrier() suffices (but the caller cannot
* necessarily rely on this, not a substitute for the caller * necessarily rely on this, not a substitute for the caller
* getting the concurrency design right!). There must also be * getting the concurrency design right!). There must also be
* a barrier between the following load an posting of a callback * a barrier between the following load an posting of a callback
...@@ -2037,7 +2048,7 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp, ...@@ -2037,7 +2048,7 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp,
/* If we are not being polled and there is a kthread, awaken it ... */ /* If we are not being polled and there is a kthread, awaken it ... */
t = READ_ONCE(rdp->nocb_kthread); t = READ_ONCE(rdp->nocb_kthread);
if (rcu_nocb_poll || !t) { if (rcu_nocb_poll || !t) {
trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
TPS("WakeNotPoll")); TPS("WakeNotPoll"));
return; return;
} }
...@@ -2046,7 +2057,7 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp, ...@@ -2046,7 +2057,7 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp,
if (!irqs_disabled_flags(flags)) { if (!irqs_disabled_flags(flags)) {
/* ... if queue was empty ... */ /* ... if queue was empty ... */
wake_nocb_leader(rdp, false); wake_nocb_leader(rdp, false);
trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
TPS("WakeEmpty")); TPS("WakeEmpty"));
} else { } else {
wake_nocb_leader_defer(rdp, RCU_NOCB_WAKE, wake_nocb_leader_defer(rdp, RCU_NOCB_WAKE,
...@@ -2057,7 +2068,7 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp, ...@@ -2057,7 +2068,7 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp,
/* ... or if many callbacks queued. */ /* ... or if many callbacks queued. */
if (!irqs_disabled_flags(flags)) { if (!irqs_disabled_flags(flags)) {
wake_nocb_leader(rdp, true); wake_nocb_leader(rdp, true);
trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
TPS("WakeOvf")); TPS("WakeOvf"));
} else { } else {
wake_nocb_leader_defer(rdp, RCU_NOCB_WAKE_FORCE, wake_nocb_leader_defer(rdp, RCU_NOCB_WAKE_FORCE,
...@@ -2065,7 +2076,7 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp, ...@@ -2065,7 +2076,7 @@ static void __call_rcu_nocb_enqueue(struct rcu_data *rdp,
} }
rdp->qlen_last_fqs_check = LONG_MAX / 2; rdp->qlen_last_fqs_check = LONG_MAX / 2;
} else { } else {
trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("WakeNot")); trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WakeNot"));
} }
return; return;
} }
...@@ -2087,12 +2098,12 @@ static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp, ...@@ -2087,12 +2098,12 @@ static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp,
return false; return false;
__call_rcu_nocb_enqueue(rdp, rhp, &rhp->next, 1, lazy, flags); __call_rcu_nocb_enqueue(rdp, rhp, &rhp->next, 1, lazy, flags);
if (__is_kfree_rcu_offset((unsigned long)rhp->func)) if (__is_kfree_rcu_offset((unsigned long)rhp->func))
trace_rcu_kfree_callback(rdp->rsp->name, rhp, trace_rcu_kfree_callback(rcu_state.name, rhp,
(unsigned long)rhp->func, (unsigned long)rhp->func,
-atomic_long_read(&rdp->nocb_q_count_lazy), -atomic_long_read(&rdp->nocb_q_count_lazy),
-atomic_long_read(&rdp->nocb_q_count)); -atomic_long_read(&rdp->nocb_q_count));
else else
trace_rcu_callback(rdp->rsp->name, rhp, trace_rcu_callback(rcu_state.name, rhp,
-atomic_long_read(&rdp->nocb_q_count_lazy), -atomic_long_read(&rdp->nocb_q_count_lazy),
-atomic_long_read(&rdp->nocb_q_count)); -atomic_long_read(&rdp->nocb_q_count));
...@@ -2142,7 +2153,7 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp) ...@@ -2142,7 +2153,7 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp)
struct rcu_node *rnp = rdp->mynode; struct rcu_node *rnp = rdp->mynode;
local_irq_save(flags); local_irq_save(flags);
c = rcu_seq_snap(&rdp->rsp->gp_seq); c = rcu_seq_snap(&rcu_state.gp_seq);
if (!rdp->gpwrap && ULONG_CMP_GE(rdp->gp_seq_needed, c)) { if (!rdp->gpwrap && ULONG_CMP_GE(rdp->gp_seq_needed, c)) {
local_irq_restore(flags); local_irq_restore(flags);
} else { } else {
...@@ -2150,7 +2161,7 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp) ...@@ -2150,7 +2161,7 @@ static void rcu_nocb_wait_gp(struct rcu_data *rdp)
needwake = rcu_start_this_gp(rnp, rdp, c); needwake = rcu_start_this_gp(rnp, rdp, c);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags); raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
if (needwake) if (needwake)
rcu_gp_kthread_wake(rdp->rsp); rcu_gp_kthread_wake();
} }
/* /*
...@@ -2187,7 +2198,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp) ...@@ -2187,7 +2198,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
/* Wait for callbacks to appear. */ /* Wait for callbacks to appear. */
if (!rcu_nocb_poll) { if (!rcu_nocb_poll) {
trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, TPS("Sleep")); trace_rcu_nocb_wake(rcu_state.name, my_rdp->cpu, TPS("Sleep"));
swait_event_interruptible_exclusive(my_rdp->nocb_wq, swait_event_interruptible_exclusive(my_rdp->nocb_wq,
!READ_ONCE(my_rdp->nocb_leader_sleep)); !READ_ONCE(my_rdp->nocb_leader_sleep));
raw_spin_lock_irqsave(&my_rdp->nocb_lock, flags); raw_spin_lock_irqsave(&my_rdp->nocb_lock, flags);
...@@ -2197,7 +2208,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp) ...@@ -2197,7 +2208,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
raw_spin_unlock_irqrestore(&my_rdp->nocb_lock, flags); raw_spin_unlock_irqrestore(&my_rdp->nocb_lock, flags);
} else if (firsttime) { } else if (firsttime) {
firsttime = false; /* Don't drown trace log with "Poll"! */ firsttime = false; /* Don't drown trace log with "Poll"! */
trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, TPS("Poll")); trace_rcu_nocb_wake(rcu_state.name, my_rdp->cpu, TPS("Poll"));
} }
/* /*
...@@ -2224,7 +2235,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp) ...@@ -2224,7 +2235,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
if (rcu_nocb_poll) { if (rcu_nocb_poll) {
schedule_timeout_interruptible(1); schedule_timeout_interruptible(1);
} else { } else {
trace_rcu_nocb_wake(my_rdp->rsp->name, my_rdp->cpu, trace_rcu_nocb_wake(rcu_state.name, my_rdp->cpu,
TPS("WokeEmpty")); TPS("WokeEmpty"));
} }
goto wait_again; goto wait_again;
...@@ -2269,7 +2280,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp) ...@@ -2269,7 +2280,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
static void nocb_follower_wait(struct rcu_data *rdp) static void nocb_follower_wait(struct rcu_data *rdp)
{ {
for (;;) { for (;;) {
trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("FollowerSleep")); trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FollowerSleep"));
swait_event_interruptible_exclusive(rdp->nocb_wq, swait_event_interruptible_exclusive(rdp->nocb_wq,
READ_ONCE(rdp->nocb_follower_head)); READ_ONCE(rdp->nocb_follower_head));
if (smp_load_acquire(&rdp->nocb_follower_head)) { if (smp_load_acquire(&rdp->nocb_follower_head)) {
...@@ -2277,7 +2288,7 @@ static void nocb_follower_wait(struct rcu_data *rdp) ...@@ -2277,7 +2288,7 @@ static void nocb_follower_wait(struct rcu_data *rdp)
return; return;
} }
WARN_ON(signal_pending(current)); WARN_ON(signal_pending(current));
trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("WokeEmpty")); trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
} }
} }
...@@ -2312,10 +2323,10 @@ static int rcu_nocb_kthread(void *arg) ...@@ -2312,10 +2323,10 @@ static int rcu_nocb_kthread(void *arg)
rdp->nocb_follower_tail = &rdp->nocb_follower_head; rdp->nocb_follower_tail = &rdp->nocb_follower_head;
raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags); raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
BUG_ON(!list); BUG_ON(!list);
trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("WokeNonEmpty")); trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeNonEmpty"));
/* Each pass through the following loop invokes a callback. */ /* Each pass through the following loop invokes a callback. */
trace_rcu_batch_start(rdp->rsp->name, trace_rcu_batch_start(rcu_state.name,
atomic_long_read(&rdp->nocb_q_count_lazy), atomic_long_read(&rdp->nocb_q_count_lazy),
atomic_long_read(&rdp->nocb_q_count), -1); atomic_long_read(&rdp->nocb_q_count), -1);
c = cl = 0; c = cl = 0;
...@@ -2323,23 +2334,23 @@ static int rcu_nocb_kthread(void *arg) ...@@ -2323,23 +2334,23 @@ static int rcu_nocb_kthread(void *arg)
next = list->next; next = list->next;
/* Wait for enqueuing to complete, if needed. */ /* Wait for enqueuing to complete, if needed. */
while (next == NULL && &list->next != tail) { while (next == NULL && &list->next != tail) {
trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
TPS("WaitQueue")); TPS("WaitQueue"));
schedule_timeout_interruptible(1); schedule_timeout_interruptible(1);
trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
TPS("WokeQueue")); TPS("WokeQueue"));
next = list->next; next = list->next;
} }
debug_rcu_head_unqueue(list); debug_rcu_head_unqueue(list);
local_bh_disable(); local_bh_disable();
if (__rcu_reclaim(rdp->rsp->name, list)) if (__rcu_reclaim(rcu_state.name, list))
cl++; cl++;
c++; c++;
local_bh_enable(); local_bh_enable();
cond_resched_tasks_rcu_qs(); cond_resched_tasks_rcu_qs();
list = next; list = next;
} }
trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1); trace_rcu_batch_end(rcu_state.name, c, !!list, 0, 0, 1);
smp_mb__before_atomic(); /* _add after CB invocation. */ smp_mb__before_atomic(); /* _add after CB invocation. */
atomic_long_add(-c, &rdp->nocb_q_count); atomic_long_add(-c, &rdp->nocb_q_count);
atomic_long_add(-cl, &rdp->nocb_q_count_lazy); atomic_long_add(-cl, &rdp->nocb_q_count_lazy);
...@@ -2367,7 +2378,7 @@ static void do_nocb_deferred_wakeup_common(struct rcu_data *rdp) ...@@ -2367,7 +2378,7 @@ static void do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
ndw = READ_ONCE(rdp->nocb_defer_wakeup); ndw = READ_ONCE(rdp->nocb_defer_wakeup);
WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT); WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
__wake_nocb_leader(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags); __wake_nocb_leader(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("DeferredWake")); trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
} }
/* Do a deferred wakeup of rcu_nocb_kthread() from a timer handler. */ /* Do a deferred wakeup of rcu_nocb_kthread() from a timer handler. */
...@@ -2393,7 +2404,6 @@ void __init rcu_init_nohz(void) ...@@ -2393,7 +2404,6 @@ void __init rcu_init_nohz(void)
{ {
int cpu; int cpu;
bool need_rcu_nocb_mask = false; bool need_rcu_nocb_mask = false;
struct rcu_state *rsp;
#if defined(CONFIG_NO_HZ_FULL) #if defined(CONFIG_NO_HZ_FULL)
if (tick_nohz_full_running && cpumask_weight(tick_nohz_full_mask)) if (tick_nohz_full_running && cpumask_weight(tick_nohz_full_mask))
...@@ -2427,11 +2437,9 @@ void __init rcu_init_nohz(void) ...@@ -2427,11 +2437,9 @@ void __init rcu_init_nohz(void)
if (rcu_nocb_poll) if (rcu_nocb_poll)
pr_info("\tPoll for callbacks from no-CBs CPUs.\n"); pr_info("\tPoll for callbacks from no-CBs CPUs.\n");
for_each_rcu_flavor(rsp) { for_each_cpu(cpu, rcu_nocb_mask)
for_each_cpu(cpu, rcu_nocb_mask) init_nocb_callback_list(per_cpu_ptr(&rcu_data, cpu));
init_nocb_callback_list(per_cpu_ptr(rsp->rda, cpu)); rcu_organize_nocb_kthreads();
rcu_organize_nocb_kthreads(rsp);
}
} }
/* Initialize per-rcu_data variables for no-CBs CPUs. */ /* Initialize per-rcu_data variables for no-CBs CPUs. */
...@@ -2446,16 +2454,15 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp) ...@@ -2446,16 +2454,15 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
/* /*
* If the specified CPU is a no-CBs CPU that does not already have its * If the specified CPU is a no-CBs CPU that does not already have its
* rcuo kthread for the specified RCU flavor, spawn it. If the CPUs are * rcuo kthread, spawn it. If the CPUs are brought online out of order,
* brought online out of order, this can require re-organizing the * this can require re-organizing the leader-follower relationships.
* leader-follower relationships.
*/ */
static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu) static void rcu_spawn_one_nocb_kthread(int cpu)
{ {
struct rcu_data *rdp; struct rcu_data *rdp;
struct rcu_data *rdp_last; struct rcu_data *rdp_last;
struct rcu_data *rdp_old_leader; struct rcu_data *rdp_old_leader;
struct rcu_data *rdp_spawn = per_cpu_ptr(rsp->rda, cpu); struct rcu_data *rdp_spawn = per_cpu_ptr(&rcu_data, cpu);
struct task_struct *t; struct task_struct *t;
/* /*
...@@ -2485,9 +2492,9 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu) ...@@ -2485,9 +2492,9 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
rdp_spawn->nocb_next_follower = rdp_old_leader; rdp_spawn->nocb_next_follower = rdp_old_leader;
} }
/* Spawn the kthread for this CPU and RCU flavor. */ /* Spawn the kthread for this CPU. */
t = kthread_run(rcu_nocb_kthread, rdp_spawn, t = kthread_run(rcu_nocb_kthread, rdp_spawn,
"rcuo%c/%d", rsp->abbr, cpu); "rcuo%c/%d", rcu_state.abbr, cpu);
BUG_ON(IS_ERR(t)); BUG_ON(IS_ERR(t));
WRITE_ONCE(rdp_spawn->nocb_kthread, t); WRITE_ONCE(rdp_spawn->nocb_kthread, t);
} }
...@@ -2498,11 +2505,8 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu) ...@@ -2498,11 +2505,8 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
*/ */
static void rcu_spawn_all_nocb_kthreads(int cpu) static void rcu_spawn_all_nocb_kthreads(int cpu)
{ {
struct rcu_state *rsp;
if (rcu_scheduler_fully_active) if (rcu_scheduler_fully_active)
for_each_rcu_flavor(rsp) rcu_spawn_one_nocb_kthread(cpu);
rcu_spawn_one_nocb_kthread(rsp, cpu);
} }
/* /*
...@@ -2526,7 +2530,7 @@ module_param(rcu_nocb_leader_stride, int, 0444); ...@@ -2526,7 +2530,7 @@ module_param(rcu_nocb_leader_stride, int, 0444);
/* /*
* Initialize leader-follower relationships for all no-CBs CPU. * Initialize leader-follower relationships for all no-CBs CPU.
*/ */
static void __init rcu_organize_nocb_kthreads(struct rcu_state *rsp) static void __init rcu_organize_nocb_kthreads(void)
{ {
int cpu; int cpu;
int ls = rcu_nocb_leader_stride; int ls = rcu_nocb_leader_stride;
...@@ -2548,7 +2552,7 @@ static void __init rcu_organize_nocb_kthreads(struct rcu_state *rsp) ...@@ -2548,7 +2552,7 @@ static void __init rcu_organize_nocb_kthreads(struct rcu_state *rsp)
* we will spawn the needed set of rcu_nocb_kthread() kthreads. * we will spawn the needed set of rcu_nocb_kthread() kthreads.
*/ */
for_each_cpu(cpu, rcu_nocb_mask) { for_each_cpu(cpu, rcu_nocb_mask) {
rdp = per_cpu_ptr(rsp->rda, cpu); rdp = per_cpu_ptr(&rcu_data, cpu);
if (rdp->cpu >= nl) { if (rdp->cpu >= nl) {
/* New leader, set up for followers & next leader. */ /* New leader, set up for followers & next leader. */
nl = DIV_ROUND_UP(rdp->cpu + 1, ls) * ls; nl = DIV_ROUND_UP(rdp->cpu + 1, ls) * ls;
...@@ -2585,7 +2589,7 @@ static bool init_nocb_callback_list(struct rcu_data *rdp) ...@@ -2585,7 +2589,7 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
#else /* #ifdef CONFIG_RCU_NOCB_CPU */ #else /* #ifdef CONFIG_RCU_NOCB_CPU */
static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu) static bool rcu_nocb_cpu_needs_barrier(int cpu)
{ {
WARN_ON_ONCE(1); /* Should be dead code. */ WARN_ON_ONCE(1); /* Should be dead code. */
return false; return false;
...@@ -2654,12 +2658,12 @@ static bool init_nocb_callback_list(struct rcu_data *rdp) ...@@ -2654,12 +2658,12 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
* This code relies on the fact that all NO_HZ_FULL CPUs are also * This code relies on the fact that all NO_HZ_FULL CPUs are also
* CONFIG_RCU_NOCB_CPU CPUs. * CONFIG_RCU_NOCB_CPU CPUs.
*/ */
static bool rcu_nohz_full_cpu(struct rcu_state *rsp) static bool rcu_nohz_full_cpu(void)
{ {
#ifdef CONFIG_NO_HZ_FULL #ifdef CONFIG_NO_HZ_FULL
if (tick_nohz_full_cpu(smp_processor_id()) && if (tick_nohz_full_cpu(smp_processor_id()) &&
(!rcu_gp_in_progress(rsp) || (!rcu_gp_in_progress() ||
ULONG_CMP_LT(jiffies, READ_ONCE(rsp->gp_start) + HZ))) ULONG_CMP_LT(jiffies, READ_ONCE(rcu_state.gp_start) + HZ)))
return true; return true;
#endif /* #ifdef CONFIG_NO_HZ_FULL */ #endif /* #ifdef CONFIG_NO_HZ_FULL */
return false; return false;
......
...@@ -203,11 +203,7 @@ void rcu_test_sync_prims(void) ...@@ -203,11 +203,7 @@ void rcu_test_sync_prims(void)
if (!IS_ENABLED(CONFIG_PROVE_RCU)) if (!IS_ENABLED(CONFIG_PROVE_RCU))
return; return;
synchronize_rcu(); synchronize_rcu();
synchronize_rcu_bh();
synchronize_sched();
synchronize_rcu_expedited(); synchronize_rcu_expedited();
synchronize_rcu_bh_expedited();
synchronize_sched_expedited();
} }
#if !defined(CONFIG_TINY_RCU) || defined(CONFIG_SRCU) #if !defined(CONFIG_TINY_RCU) || defined(CONFIG_SRCU)
...@@ -298,7 +294,7 @@ EXPORT_SYMBOL_GPL(rcu_read_lock_held); ...@@ -298,7 +294,7 @@ EXPORT_SYMBOL_GPL(rcu_read_lock_held);
* *
* Check debug_lockdep_rcu_enabled() to prevent false positives during boot. * Check debug_lockdep_rcu_enabled() to prevent false positives during boot.
* *
* Note that rcu_read_lock() is disallowed if the CPU is either idle or * Note that rcu_read_lock_bh() is disallowed if the CPU is either idle or
* offline from an RCU perspective, so check for those as well. * offline from an RCU perspective, so check for those as well.
*/ */
int rcu_read_lock_bh_held(void) int rcu_read_lock_bh_held(void)
...@@ -336,7 +332,7 @@ void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array, ...@@ -336,7 +332,7 @@ void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array,
int i; int i;
int j; int j;
/* Initialize and register callbacks for each flavor specified. */ /* Initialize and register callbacks for each crcu_array element. */
for (i = 0; i < n; i++) { for (i = 0; i < n; i++) {
if (checktiny && if (checktiny &&
(crcu_array[i] == call_rcu || (crcu_array[i] == call_rcu ||
...@@ -472,6 +468,7 @@ int rcu_jiffies_till_stall_check(void) ...@@ -472,6 +468,7 @@ int rcu_jiffies_till_stall_check(void)
} }
return till_stall_check * HZ + RCU_STALL_DELAY_DELTA; return till_stall_check * HZ + RCU_STALL_DELAY_DELTA;
} }
EXPORT_SYMBOL_GPL(rcu_jiffies_till_stall_check);
void rcu_sysrq_start(void) void rcu_sysrq_start(void)
{ {
...@@ -701,19 +698,19 @@ static int __noreturn rcu_tasks_kthread(void *arg) ...@@ -701,19 +698,19 @@ static int __noreturn rcu_tasks_kthread(void *arg)
/* /*
* Wait for all pre-existing t->on_rq and t->nvcsw * Wait for all pre-existing t->on_rq and t->nvcsw
* transitions to complete. Invoking synchronize_sched() * transitions to complete. Invoking synchronize_rcu()
* suffices because all these transitions occur with * suffices because all these transitions occur with
* interrupts disabled. Without this synchronize_sched(), * interrupts disabled. Without this synchronize_rcu(),
* a read-side critical section that started before the * a read-side critical section that started before the
* grace period might be incorrectly seen as having started * grace period might be incorrectly seen as having started
* after the grace period. * after the grace period.
* *
* This synchronize_sched() also dispenses with the * This synchronize_rcu() also dispenses with the
* need for a memory barrier on the first store to * need for a memory barrier on the first store to
* ->rcu_tasks_holdout, as it forces the store to happen * ->rcu_tasks_holdout, as it forces the store to happen
* after the beginning of the grace period. * after the beginning of the grace period.
*/ */
synchronize_sched(); synchronize_rcu();
/* /*
* There were callbacks, so we need to wait for an * There were callbacks, so we need to wait for an
...@@ -740,7 +737,7 @@ static int __noreturn rcu_tasks_kthread(void *arg) ...@@ -740,7 +737,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
* This does only part of the job, ensuring that all * This does only part of the job, ensuring that all
* tasks that were previously exiting reach the point * tasks that were previously exiting reach the point
* where they have disabled preemption, allowing the * where they have disabled preemption, allowing the
* later synchronize_sched() to finish the job. * later synchronize_rcu() to finish the job.
*/ */
synchronize_srcu(&tasks_rcu_exit_srcu); synchronize_srcu(&tasks_rcu_exit_srcu);
...@@ -790,20 +787,20 @@ static int __noreturn rcu_tasks_kthread(void *arg) ...@@ -790,20 +787,20 @@ static int __noreturn rcu_tasks_kthread(void *arg)
* cause their RCU-tasks read-side critical sections to * cause their RCU-tasks read-side critical sections to
* extend past the end of the grace period. However, * extend past the end of the grace period. However,
* because these ->nvcsw updates are carried out with * because these ->nvcsw updates are carried out with
* interrupts disabled, we can use synchronize_sched() * interrupts disabled, we can use synchronize_rcu()
* to force the needed ordering on all such CPUs. * to force the needed ordering on all such CPUs.
* *
* This synchronize_sched() also confines all * This synchronize_rcu() also confines all
* ->rcu_tasks_holdout accesses to be within the grace * ->rcu_tasks_holdout accesses to be within the grace
* period, avoiding the need for memory barriers for * period, avoiding the need for memory barriers for
* ->rcu_tasks_holdout accesses. * ->rcu_tasks_holdout accesses.
* *
* In addition, this synchronize_sched() waits for exiting * In addition, this synchronize_rcu() waits for exiting
* tasks to complete their final preempt_disable() region * tasks to complete their final preempt_disable() region
* of execution, cleaning up after the synchronize_srcu() * of execution, cleaning up after the synchronize_srcu()
* above. * above.
*/ */
synchronize_sched(); synchronize_rcu();
/* Invoke the callbacks. */ /* Invoke the callbacks. */
while (list) { while (list) {
...@@ -870,15 +867,10 @@ static void __init rcu_tasks_bootup_oddness(void) ...@@ -870,15 +867,10 @@ static void __init rcu_tasks_bootup_oddness(void)
#ifdef CONFIG_PROVE_RCU #ifdef CONFIG_PROVE_RCU
/* /*
* Early boot self test parameters, one for each flavor * Early boot self test parameters.
*/ */
static bool rcu_self_test; static bool rcu_self_test;
static bool rcu_self_test_bh;
static bool rcu_self_test_sched;
module_param(rcu_self_test, bool, 0444); module_param(rcu_self_test, bool, 0444);
module_param(rcu_self_test_bh, bool, 0444);
module_param(rcu_self_test_sched, bool, 0444);
static int rcu_self_test_counter; static int rcu_self_test_counter;
...@@ -888,25 +880,16 @@ static void test_callback(struct rcu_head *r) ...@@ -888,25 +880,16 @@ static void test_callback(struct rcu_head *r)
pr_info("RCU test callback executed %d\n", rcu_self_test_counter); pr_info("RCU test callback executed %d\n", rcu_self_test_counter);
} }
DEFINE_STATIC_SRCU(early_srcu);
static void early_boot_test_call_rcu(void) static void early_boot_test_call_rcu(void)
{ {
static struct rcu_head head; static struct rcu_head head;
static struct rcu_head shead;
call_rcu(&head, test_callback); call_rcu(&head, test_callback);
} if (IS_ENABLED(CONFIG_SRCU))
call_srcu(&early_srcu, &shead, test_callback);
static void early_boot_test_call_rcu_bh(void)
{
static struct rcu_head head;
call_rcu_bh(&head, test_callback);
}
static void early_boot_test_call_rcu_sched(void)
{
static struct rcu_head head;
call_rcu_sched(&head, test_callback);
} }
void rcu_early_boot_tests(void) void rcu_early_boot_tests(void)
...@@ -915,10 +898,6 @@ void rcu_early_boot_tests(void) ...@@ -915,10 +898,6 @@ void rcu_early_boot_tests(void)
if (rcu_self_test) if (rcu_self_test)
early_boot_test_call_rcu(); early_boot_test_call_rcu();
if (rcu_self_test_bh)
early_boot_test_call_rcu_bh();
if (rcu_self_test_sched)
early_boot_test_call_rcu_sched();
rcu_test_sync_prims(); rcu_test_sync_prims();
} }
...@@ -930,16 +909,11 @@ static int rcu_verify_early_boot_tests(void) ...@@ -930,16 +909,11 @@ static int rcu_verify_early_boot_tests(void)
if (rcu_self_test) { if (rcu_self_test) {
early_boot_test_counter++; early_boot_test_counter++;
rcu_barrier(); rcu_barrier();
if (IS_ENABLED(CONFIG_SRCU)) {
early_boot_test_counter++;
srcu_barrier(&early_srcu);
}
} }
if (rcu_self_test_bh) {
early_boot_test_counter++;
rcu_barrier_bh();
}
if (rcu_self_test_sched) {
early_boot_test_counter++;
rcu_barrier_sched();
}
if (rcu_self_test_counter != early_boot_test_counter) { if (rcu_self_test_counter != early_boot_test_counter) {
WARN_ON(1); WARN_ON(1);
ret = -1; ret = -1;
......
...@@ -301,7 +301,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) ...@@ -301,7 +301,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
pending >>= softirq_bit; pending >>= softirq_bit;
} }
rcu_bh_qs(); if (__this_cpu_read(ksoftirqd) == current)
rcu_softirq_qs();
local_irq_disable(); local_irq_disable();
pending = local_softirq_pending(); pending = local_softirq_pending();
......
...@@ -573,7 +573,7 @@ static int stutter; ...@@ -573,7 +573,7 @@ static int stutter;
* Block until the stutter interval ends. This must be called periodically * Block until the stutter interval ends. This must be called periodically
* by all running kthreads that need to be subject to stuttering. * by all running kthreads that need to be subject to stuttering.
*/ */
void stutter_wait(const char *title) bool stutter_wait(const char *title)
{ {
int spt; int spt;
...@@ -590,6 +590,7 @@ void stutter_wait(const char *title) ...@@ -590,6 +590,7 @@ void stutter_wait(const char *title)
} }
torture_shutdown_absorb(title); torture_shutdown_absorb(title);
} }
return !!spt;
} }
EXPORT_SYMBOL_GPL(stutter_wait); EXPORT_SYMBOL_GPL(stutter_wait);
......
...@@ -120,7 +120,6 @@ then ...@@ -120,7 +120,6 @@ then
parse-build.sh $resdir/Make.out $title parse-build.sh $resdir/Make.out $title
else else
# Build failed. # Build failed.
cp $builddir/Make*.out $resdir
cp $builddir/.config $resdir || : cp $builddir/.config $resdir || :
echo Build failed, not running KVM, see $resdir. echo Build failed, not running KVM, see $resdir.
if test -f $builddir.wait if test -f $builddir.wait
......
...@@ -3,9 +3,7 @@ TREE02 ...@@ -3,9 +3,7 @@ TREE02
TREE03 TREE03
TREE04 TREE04
TREE05 TREE05
TREE06
TREE07 TREE07
TREE08
TREE09 TREE09
SRCU-N SRCU-N
SRCU-P SRCU-P
......
rcutorture.torture_type=srcud rcutorture.torture_type=srcud
rcupdate.rcu_self_test=1
rcutorture.torture_type=srcud rcutorture.torture_type=srcud
rcupdate.rcu_self_test=1
rcupdate.rcu_self_test=1 rcupdate.rcu_self_test=1
rcupdate.rcu_self_test_bh=1
rcutorture.torture_type=rcu_bh
rcutorture.torture_type=rcu_bh maxcpus=8 nr_cpus=43 maxcpus=8 nr_cpus=43
rcutree.gp_preinit_delay=3 rcutree.gp_preinit_delay=3
rcutree.gp_init_delay=3 rcutree.gp_init_delay=3
rcutree.gp_cleanup_delay=3 rcutree.gp_cleanup_delay=3
......
rcutorture.torture_type=rcu_bh rcutree.rcu_fanout_leaf=4 nohz_full=1-7 rcutree.rcu_fanout_leaf=4 nohz_full=1-7
rcutorture.torture_type=sched
rcupdate.rcu_self_test_sched=1
rcutree.gp_preinit_delay=3 rcutree.gp_preinit_delay=3
rcutree.gp_init_delay=3 rcutree.gp_init_delay=3
rcutree.gp_cleanup_delay=3 rcutree.gp_cleanup_delay=3
rcupdate.rcu_self_test=1
rcupdate.rcu_self_test=1 rcupdate.rcu_self_test=1
rcupdate.rcu_self_test_bh=1
rcupdate.rcu_self_test_sched=1
rcutree.rcu_fanout_exact=1 rcutree.rcu_fanout_exact=1
rcutree.gp_preinit_delay=3 rcutree.gp_preinit_delay=3
rcutree.gp_init_delay=3 rcutree.gp_init_delay=3
......
rcutorture.torture_type=sched
rcupdate.rcu_self_test=1 rcupdate.rcu_self_test=1
rcupdate.rcu_self_test_sched=1
rcutree.rcu_fanout_exact=1 rcutree.rcu_fanout_exact=1
rcu_nocbs=0-7 rcu_nocbs=0-7