Commit 7073bc66 authored by Linus Torvalds's avatar Linus Torvalds

Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "The main RCU changes in this cycle are:

   - the combination of tree geometry-initialization simplifications and
     OS-jitter-reduction changes to expedited grace periods.  These two
     are stacked due to the large number of conflicts that would
     otherwise result.

   - privatize smp_mb__after_unlock_lock().

     This commit moves the definition of smp_mb__after_unlock_lock() to
     kernel/rcu/tree.h, in recognition of the fact that RCU is the only
     thing using this, that nothing else is likely to use it, and that
     it is likely to go away completely.

   - documentation updates.

   - torture-test updates.

   - misc fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  rcu,locking: Privatize smp_mb__after_unlock_lock()
  rcu: Silence lockdep false positive for expedited grace periods
  rcu: Don't disable CPU hotplug during OOM notifiers
  scripts: Make checkpatch.pl warn on expedited RCU grace periods
  rcu: Update MAINTAINERS entry
  rcu: Clarify CONFIG_RCU_EQS_DEBUG help text
  rcu: Fix backwards RCU_LOCKDEP_WARN() in synchronize_rcu_tasks()
  rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()
  rcu: Make rcu_is_watching() really notrace
  cpu: Wait for RCU grace periods concurrently
  rcu: Create a synchronize_rcu_mult()
  rcu: Fix obsolete priority-boosting comment
  rcu: Use WRITE_ONCE in RCU_INIT_POINTER
  rcu: Hide RCU_NOCB_CPU behind RCU_EXPERT
  rcu: Add RCU-sched flavors of get-state and cond-sync
  rcu: Add fastpath bypassing funnel locking
  rcu: Rename RCU_GP_DONE_FQS to RCU_GP_DOING_FQS
  rcu: Pull out wait_event*() condition into helper function
  documentation: Describe new expedited stall warnings
  rcu: Add stall warnings to synchronize_sched_expedited()
  ...
parents d4c90396 f612a7b1
......@@ -28,7 +28,7 @@ o You must use one of the rcu_dereference() family of primitives
o Avoid cancellation when using the "+" and "-" infix arithmetic
operators. For example, for a given variable "x", avoid
"(x-x)". There are similar arithmetic pitfalls from other
arithmetic operatiors, such as "(x*0)", "(x/(x+1))" or "(x%1)".
arithmetic operators, such as "(x*0)", "(x/(x+1))" or "(x%1)".
The compiler is within its rights to substitute zero for all of
these expressions, so that subsequent accesses no longer depend
on the rcu_dereference(), again possibly resulting in bugs due
......
......@@ -26,12 +26,6 @@ CONFIG_RCU_CPU_STALL_TIMEOUT
Stall-warning messages may be enabled and disabled completely via
/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress.
CONFIG_RCU_CPU_STALL_INFO
This kernel configuration parameter causes the stall warning to
print out additional per-CPU diagnostic information, including
information on scheduling-clock ticks and RCU's idle-CPU tracking.
RCU_STALL_DELAY_DELTA
Although the lockdep facility is extremely useful, it does add
......@@ -101,15 +95,13 @@ interact. Please note that it is not possible to entirely eliminate this
sort of false positive without resorting to things like stop_machine(),
which is overkill for this sort of problem.
If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set,
more information is printed with the stall-warning message, for example:
Recent kernels will print a long form of the stall-warning message:
INFO: rcu_preempt detected stall on CPU
0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 softirq=82/543
(t=65000 jiffies)
In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is
printed:
In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed:
INFO: rcu_preempt detected stall on CPU
0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 nonlazy_posted: 25 .D
......@@ -171,6 +163,23 @@ message will be about three times the interval between the beginning
of the stall and the first message.
Stall Warnings for Expedited Grace Periods
If an expedited grace period detects a stall, it will place a message
like the following in dmesg:
INFO: rcu_sched detected expedited stalls on CPUs: { 1 2 6 } 26009 jiffies s: 1043
This indicates that CPUs 1, 2, and 6 have failed to respond to a
reschedule IPI, that the expedited grace period has been going on for
26,009 jiffies, and that the expedited grace-period sequence counter is
1043. The fact that this last value is odd indicates that an expedited
grace period is in flight.
It is entirely possible to see stall warnings from normal and from
expedited grace periods at about the same time from the same run.
What Causes RCU CPU Stall Warnings?
So your kernel printed an RCU CPU stall warning. The next question is
......
......@@ -237,42 +237,26 @@ o "ktl" is the low-order 16 bits (in hexadecimal) of the count of
The output of "cat rcu/rcu_preempt/rcuexp" looks as follows:
s=21872 d=21872 w=0 tf=0 wd1=0 wd2=0 n=0 sc=21872 dt=21872 dl=0 dx=21872
s=21872 wd0=0 wd1=0 wd2=0 wd3=5 n=0 enq=0 sc=21872
These fields are as follows:
o "s" is the starting sequence number.
o "s" is the sequence number, with an odd number indicating that
an expedited grace period is in progress.
o "d" is the ending sequence number. When the starting and ending
numbers differ, there is an expedited grace period in progress.
o "w" is the number of times that the sequence numbers have been
in danger of wrapping.
o "tf" is the number of times that contention has resulted in a
failure to begin an expedited grace period.
o "wd1" and "wd2" are the number of times that an attempt to
start an expedited grace period found that someone else had
completed an expedited grace period that satisfies the
o "wd0", "wd1", "wd2", and "wd3" are the number of times that an
attempt to start an expedited grace period found that someone
else had completed an expedited grace period that satisfies the
attempted request. "Our work is done."
o "n" is number of times that contention was so great that
the request was demoted from an expedited grace period to
a normal grace period.
o "n" is number of times that a concurrent CPU-hotplug operation
forced a fallback to a normal grace period.
o "enq" is the number of quiescent states still outstanding.
o "sc" is the number of times that the attempt to start a
new expedited grace period succeeded.
o "dt" is the number of times that we attempted to update
the "d" counter.
o "dl" is the number of times that we failed to update the "d"
counter.
o "dx" is the number of times that we succeeded in updating
the "d" counter.
The output of "cat rcu/rcu_preempt/rcugp" looks as follows:
......
......@@ -883,7 +883,7 @@ All: lockdep-checked RCU-protected pointer access
rcu_access_pointer
rcu_dereference_raw
rcu_lockdep_assert
RCU_LOCKDEP_WARN
rcu_sleep_check
RCU_NONIDLE
......
......@@ -3137,22 +3137,35 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
in a given burst of a callback-flood test.
rcutorture.fqs_duration= [KNL]
Set duration of force_quiescent_state bursts.
Set duration of force_quiescent_state bursts
in microseconds.
rcutorture.fqs_holdoff= [KNL]
Set holdoff time within force_quiescent_state bursts.
Set holdoff time within force_quiescent_state bursts
in microseconds.
rcutorture.fqs_stutter= [KNL]
Set wait time between force_quiescent_state bursts.
Set wait time between force_quiescent_state bursts
in seconds.
rcutorture.gp_cond= [KNL]
Use conditional/asynchronous update-side
primitives, if available.
rcutorture.gp_exp= [KNL]
Use expedited update-side primitives.
Use expedited update-side primitives, if available.
rcutorture.gp_normal= [KNL]
Use normal (non-expedited) update-side primitives.
If both gp_exp and gp_normal are set, do both.
If neither gp_exp nor gp_normal are set, still
do both.
Use normal (non-expedited) asynchronous
update-side primitives, if available.
rcutorture.gp_sync= [KNL]
Use normal (non-expedited) synchronous
update-side primitives, if available. If all
of rcutorture.gp_cond=, rcutorture.gp_exp=,
rcutorture.gp_normal=, and rcutorture.gp_sync=
are zero, rcutorture acts as if is interpreted
they are all non-zero.
rcutorture.n_barrier_cbs= [KNL]
Set callbacks/threads for rcu_barrier() testing.
......@@ -3179,9 +3192,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Set time (s) between CPU-hotplug operations, or
zero to disable CPU-hotplug testing.
rcutorture.torture_runnable= [BOOT]
Start rcutorture running at boot time.
rcutorture.shuffle_interval= [KNL]
Set task-shuffle interval (s). Shuffling tasks
allows some CPUs to go into dyntick-idle mode
......@@ -3222,6 +3232,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Test RCU's dyntick-idle handling. See also the
rcutorture.shuffle_interval parameter.
rcutorture.torture_runnable= [BOOT]
Start rcutorture running at boot time.
rcutorture.torture_type= [KNL]
Specify the RCU implementation to test.
......
This diff is collapsed.
......@@ -8518,7 +8518,7 @@ M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
M: Josh Triplett <josh@joshtriplett.org>
R: Steven Rostedt <rostedt@goodmis.org>
R: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
R: Lai Jiangshan <laijs@cn.fujitsu.com>
R: Lai Jiangshan <jiangshanlai@gmail.com>
L: linux-kernel@vger.kernel.org
S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
......@@ -8545,7 +8545,7 @@ M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
M: Josh Triplett <josh@joshtriplett.org>
R: Steven Rostedt <rostedt@goodmis.org>
R: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
R: Lai Jiangshan <laijs@cn.fujitsu.com>
R: Lai Jiangshan <jiangshanlai@gmail.com>
L: linux-kernel@vger.kernel.org
W: http://www.rdrop.com/users/paulmck/RCU/
S: Supported
......@@ -9417,7 +9417,7 @@ F: include/linux/sl?b*.h
F: mm/sl?b*
SLEEPABLE READ-COPY UPDATE (SRCU)
M: Lai Jiangshan <laijs@cn.fujitsu.com>
M: Lai Jiangshan <jiangshanlai@gmail.com>
M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
M: Josh Triplett <josh@joshtriplett.org>
R: Steven Rostedt <rostedt@goodmis.org>
......
......@@ -28,8 +28,6 @@
#include <asm/synch.h>
#include <asm/ppc-opcode.h>
#define smp_mb__after_unlock_lock() smp_mb() /* Full ordering for lock. */
#ifdef CONFIG_PPC64
/* use 0x800000yy when locked, where yy == CPU number */
#ifdef __BIG_ENDIAN__
......
......@@ -54,9 +54,9 @@ static DEFINE_MUTEX(mce_chrdev_read_mutex);
#define rcu_dereference_check_mce(p) \
({ \
rcu_lockdep_assert(rcu_read_lock_sched_held() || \
lockdep_is_held(&mce_chrdev_read_mutex), \
"suspicious rcu_dereference_check_mce() usage"); \
RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() && \
!lockdep_is_held(&mce_chrdev_read_mutex), \
"suspicious rcu_dereference_check_mce() usage"); \
smp_load_acquire(&(p)); \
})
......
......@@ -136,7 +136,7 @@ enum ctx_state ist_enter(struct pt_regs *regs)
preempt_count_add(HARDIRQ_OFFSET);
/* This code is a bit fragile. Test it. */
rcu_lockdep_assert(rcu_is_watching(), "ist_enter didn't work");
RCU_LOCKDEP_WARN(!rcu_is_watching(), "ist_enter didn't work");
return prev_state;
}
......
......@@ -110,8 +110,8 @@ static DEFINE_MUTEX(dev_opp_list_lock);
#define opp_rcu_lockdep_assert() \
do { \
rcu_lockdep_assert(rcu_read_lock_held() || \
lockdep_is_held(&dev_opp_list_lock), \
RCU_LOCKDEP_WARN(!rcu_read_lock_held() && \
!lockdep_is_held(&dev_opp_list_lock), \
"Missing rcu_read_lock() or " \
"dev_opp_list_lock protection"); \
} while (0)
......
......@@ -86,8 +86,8 @@ static inline struct file *__fcheck_files(struct files_struct *files, unsigned i
static inline struct file *fcheck_files(struct files_struct *files, unsigned int fd)
{
rcu_lockdep_assert(rcu_read_lock_held() ||
lockdep_is_held(&files->file_lock),
RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
!lockdep_is_held(&files->file_lock),
"suspicious rcu_dereference_check() usage");
return __fcheck_files(files, fd);
}
......
This diff is collapsed.
......@@ -37,6 +37,16 @@ static inline void cond_synchronize_rcu(unsigned long oldstate)
might_sleep();
}
static inline unsigned long get_state_synchronize_sched(void)
{
return 0;
}
static inline void cond_synchronize_sched(unsigned long oldstate)
{
might_sleep();
}
static inline void rcu_barrier_bh(void)
{
wait_rcu_gp(call_rcu_bh);
......
......@@ -76,6 +76,8 @@ void rcu_barrier_bh(void);
void rcu_barrier_sched(void);
unsigned long get_state_synchronize_rcu(void);
void cond_synchronize_rcu(unsigned long oldstate);
unsigned long get_state_synchronize_sched(void);
void cond_synchronize_sched(unsigned long oldstate);
extern unsigned long rcutorture_testseq;
extern unsigned long rcutorture_vernum;
......
......@@ -130,16 +130,6 @@ do { \
#define smp_mb__before_spinlock() smp_wmb()
#endif
/*
* Place this after a lock-acquisition primitive to guarantee that
* an UNLOCK+LOCK pair act as a full barrier. This guarantee applies
* if the UNLOCK and LOCK are executed by the same CPU or if the
* UNLOCK and LOCK operate on the same lock variable.
*/
#ifndef smp_mb__after_unlock_lock
#define smp_mb__after_unlock_lock() do { } while (0)
#endif
/**
* raw_spin_unlock_wait - wait until the spinlock gets unlocked
* @lock: the spinlock in question.
......
......@@ -212,6 +212,9 @@ struct callback_head {
};
#define rcu_head callback_head
typedef void (*rcu_callback_t)(struct rcu_head *head);
typedef void (*call_rcu_func_t)(struct rcu_head *head, rcu_callback_t func);
/* clocksource cycle base type */
typedef u64 cycle_t;
......
......@@ -661,7 +661,6 @@ TRACE_EVENT(rcu_torture_read,
* Tracepoint for _rcu_barrier() execution. The string "s" describes
* the _rcu_barrier phase:
* "Begin": _rcu_barrier() started.
* "Check": _rcu_barrier() checking for piggybacking.
* "EarlyExit": _rcu_barrier() piggybacked, thus early exit.
* "Inc1": _rcu_barrier() piggyback check counter incremented.
* "OfflineNoCB": _rcu_barrier() found callback on never-online CPU
......
......@@ -538,15 +538,6 @@ config RCU_STALL_COMMON
config CONTEXT_TRACKING
bool
config RCU_USER_QS
bool
help
This option sets hooks on kernel / userspace boundaries and
puts RCU in extended quiescent state when the CPU runs in
userspace. It means that when a CPU runs in userspace, it is
excluded from the global RCU state machine and thus doesn't
try to keep the timer tick on for RCU.
config CONTEXT_TRACKING_FORCE
bool "Force context tracking"
depends on CONTEXT_TRACKING
......@@ -707,6 +698,7 @@ config RCU_BOOST_DELAY
config RCU_NOCB_CPU
bool "Offload RCU callback processing from boot-selected CPUs"
depends on TREE_RCU || PREEMPT_RCU
depends on RCU_EXPERT || NO_HZ_FULL
default n
help
Use this option to reduce OS jitter for aggressive HPC or
......
......@@ -107,8 +107,8 @@ static DEFINE_SPINLOCK(release_agent_path_lock);
struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
#define cgroup_assert_mutex_or_rcu_locked() \
rcu_lockdep_assert(rcu_read_lock_held() || \
lockdep_is_held(&cgroup_mutex), \
RCU_LOCKDEP_WARN(!rcu_read_lock_held() && \
!lockdep_is_held(&cgroup_mutex), \
"cgroup_mutex or RCU read lock required");
/*
......
......@@ -382,14 +382,14 @@ static int _cpu_down(unsigned int cpu, int tasks_frozen)
* will observe it.
*
* For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might
* not imply sync_sched(), so explicitly call both.
* not imply sync_sched(), so wait for both.
*
* Do sync before park smpboot threads to take care the rcu boost case.
*/
#ifdef CONFIG_PREEMPT
synchronize_sched();
#endif
synchronize_rcu();
if (IS_ENABLED(CONFIG_PREEMPT))
synchronize_rcu_mult(call_rcu, call_rcu_sched);
else
synchronize_rcu();
smpboot_park_threads(cpu);
......
......@@ -451,9 +451,8 @@ EXPORT_SYMBOL(pid_task);
*/
struct task_struct *find_task_by_pid_ns(pid_t nr, struct pid_namespace *ns)
{
rcu_lockdep_assert(rcu_read_lock_held(),
"find_task_by_pid_ns() needs rcu_read_lock()"
" protection");
RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
"find_task_by_pid_ns() needs rcu_read_lock() protection");
return pid_task(find_pid_ns(nr, ns), PIDTYPE_PID);
}
......
......@@ -635,6 +635,8 @@ static struct rcu_torture_ops sched_ops = {
.deferred_free = rcu_sched_torture_deferred_free,
.sync = synchronize_sched,
.exp_sync = synchronize_sched_expedited,
.get_state = get_state_synchronize_sched,
.cond_sync = cond_synchronize_sched,
.call = call_rcu_sched,
.cb_barrier = rcu_barrier_sched,
.fqs = rcu_sched_force_quiescent_state,
......@@ -684,10 +686,20 @@ static struct rcu_torture_ops tasks_ops = {
#define RCUTORTURE_TASKS_OPS &tasks_ops,
static bool __maybe_unused torturing_tasks(void)
{
return cur_ops == &tasks_ops;
}
#else /* #ifdef CONFIG_TASKS_RCU */
#define RCUTORTURE_TASKS_OPS
static bool torturing_tasks(void)
{
return false;
}
#endif /* #else #ifdef CONFIG_TASKS_RCU */
/*
......@@ -823,9 +835,7 @@ rcu_torture_cbflood(void *arg)
}
if (err) {
VERBOSE_TOROUT_STRING("rcu_torture_cbflood disabled: Bad args or OOM");
while (!torture_must_stop())
schedule_timeout_interruptible(HZ);
return 0;
goto wait_for_stop;
}
VERBOSE_TOROUT_STRING("rcu_torture_cbflood task started");
do {
......@@ -844,6 +854,7 @@ rcu_torture_cbflood(void *arg)
stutter_wait("rcu_torture_cbflood");
} while (!torture_must_stop());
vfree(rhp);
wait_for_stop:
torture_kthread_stopping("rcu_torture_cbflood");
return 0;
}
......@@ -1088,7 +1099,8 @@ static void rcu_torture_timer(unsigned long unused)
p = rcu_dereference_check(rcu_torture_current,
rcu_read_lock_bh_held() ||
rcu_read_lock_sched_held() ||
srcu_read_lock_held(srcu_ctlp));
srcu_read_lock_held(srcu_ctlp) ||
torturing_tasks());
if (p == NULL) {
/* Leave because rcu_torture_writer is not yet underway */
cur_ops->readunlock(idx);
......@@ -1162,7 +1174,8 @@ rcu_torture_reader(void *arg)
p = rcu_dereference_check(rcu_torture_current,
rcu_read_lock_bh_held() ||
rcu_read_lock_sched_held() ||
srcu_read_lock_held(srcu_ctlp));
srcu_read_lock_held(srcu_ctlp) ||
torturing_tasks());
if (p == NULL) {
/* Wait for rcu_torture_writer to get underway */
cur_ops->readunlock(idx);
......@@ -1507,7 +1520,7 @@ static int rcu_torture_barrier_init(void)
int i;
int ret;
if (n_barrier_cbs == 0)
if (n_barrier_cbs <= 0)
return 0;
if (cur_ops->call == NULL || cur_ops->cb_barrier == NULL) {
pr_alert("%s" TORTURE_FLAG
......@@ -1786,12 +1799,15 @@ rcu_torture_init(void)
writer_task);
if (firsterr)
goto unwind;
fakewriter_tasks = kzalloc(nfakewriters * sizeof(fakewriter_tasks[0]),
GFP_KERNEL);
if (fakewriter_tasks == NULL) {
VERBOSE_TOROUT_ERRSTRING("out of memory");
firsterr = -ENOMEM;
goto unwind;
if (nfakewriters > 0) {
fakewriter_tasks = kzalloc(nfakewriters *
sizeof(fakewriter_tasks[0]),
GFP_KERNEL);
if (fakewriter_tasks == NULL) {
VERBOSE_TOROUT_ERRSTRING("out of memory");
firsterr = -ENOMEM;
goto unwind;
}
}
for (i = 0; i < nfakewriters; i++) {
firsterr = torture_create_kthread(rcu_torture_fakewriter,
......@@ -1818,7 +1834,7 @@ rcu_torture_init(void)
if (firsterr)
goto unwind;
}
if (test_no_idle_hz) {
if (test_no_idle_hz && shuffle_interval > 0) {
firsterr = torture_shuffle_init(shuffle_interval * HZ);
if (firsterr)
goto unwind;
......
......@@ -252,14 +252,15 @@ static bool srcu_readers_active_idx_check(struct srcu_struct *sp, int idx)
}
/**
* srcu_readers_active - returns approximate number of readers.
* srcu_readers_active - returns true if there are readers. and false
* otherwise
* @sp: which srcu_struct to count active readers (holding srcu_read_lock).
*
* Note that this is not an atomic primitive, and can therefore suffer
* severe errors when invoked on an active srcu_struct. That said, it
* can be useful as an error check at cleanup time.
*/
static int srcu_readers_active(struct srcu_struct *sp)
static bool srcu_readers_active(struct srcu_struct *sp)
{
int cpu;
unsigned long sum = 0;
......@@ -414,11 +415,11 @@ static void __synchronize_srcu(struct srcu_struct *sp, int trycount)
struct rcu_head *head = &rcu.head;
bool done = false;
rcu_lockdep_assert(!lock_is_held(&sp->dep_map) &&
!lock_is_held(&rcu_bh_lock_map) &&
!lock_is_held(&rcu_lock_map) &&
!lock_is_held(&rcu_sched_lock_map),
"Illegal synchronize_srcu() in same-type SRCU (or RCU) read-side critical section");
RCU_LOCKDEP_WARN(lock_is_held(&sp->dep_map) ||
lock_is_held(&rcu_bh_lock_map) ||
lock_is_held(&rcu_lock_map) ||
lock_is_held(&rcu_sched_lock_map),
"Illegal synchronize_srcu() in same-type SRCU (or in RCU) read-side critical section");
might_sleep();
init_completion(&rcu.completion);
......
......@@ -191,10 +191,10 @@ static void rcu_process_callbacks(struct softirq_action *unused)
*/
void synchronize_sched(void)
{
rcu_lockdep_assert(!lock_is_held(&rcu_bh_lock_map) &&
!lock_is_held(&rcu_lock_map) &&
!lock_is_held(&rcu_sched_lock_map),
"Illegal synchronize_sched() in RCU read-side critical section");
RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
lock_is_held(&rcu_lock_map) ||
lock_is_held(&rcu_sched_lock_map),
"Illegal synchronize_sched() in RCU read-side critical section");
cond_resched();
}
EXPORT_SYMBOL_GPL(synchronize_sched);
......
This diff is collapsed.
......@@ -27,6 +27,7 @@
#include <linux/threads.h>
#include <linux/cpumask.h>
#include <linux/seqlock.h>
#include <linux/stop_machine.h>
/*
* Define shape of hierarchy based on NR_CPUS, CONFIG_RCU_FANOUT, and
......@@ -36,8 +37,6 @@
* Of course, your mileage may vary.
*/
#define MAX_RCU_LVLS 4
#ifdef CONFIG_RCU_FANOUT
#define RCU_FANOUT CONFIG_RCU_FANOUT
#else /* #ifdef CONFIG_RCU_FANOUT */
......@@ -66,38 +65,53 @@
#if NR_CPUS <= RCU_FANOUT_1
# define RCU_NUM_LVLS 1
# define NUM_RCU_LVL_0 1
# define NUM_RCU_LVL_1 (NR_CPUS)
# define NUM_RCU_LVL_2 0
# define NUM_RCU_LVL_3 0
# define NUM_RCU_LVL_4 0
# define NUM_RCU_NODES NUM_RCU_LVL_0
# define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0 }
# define RCU_NODE_NAME_INIT { "rcu_node_0" }
# define RCU_FQS_NAME_INIT { "rcu_node_fqs_0" }
# define RCU_EXP_NAME_INIT { "rcu_node_exp_0" }
# define RCU_EXP_SCHED_NAME_INIT \
{ "rcu_node_exp_sched_0" }
#elif NR_CPUS <= RCU_FANOUT_2
# define RCU_NUM_LVLS 2
# define NUM_RCU_LVL_0 1
# define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
# define NUM_RCU_LVL_2 (NR_CPUS)
# define NUM_RCU_LVL_3 0
# define NUM_RCU_LVL_4 0
# define NUM_RCU_NODES (NUM_RCU_LVL_0 + NUM_RCU_LVL_1)
# define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0, NUM_RCU_LVL_1 }
# define RCU_NODE_NAME_INIT { "rcu_node_0", "rcu_node_1" }
# define RCU_FQS_NAME_INIT { "rcu_node_fqs_0", "rcu_node_fqs_1" }
# define RCU_EXP_NAME_INIT { "rcu_node_exp_0", "rcu_node_exp_1" }
# define RCU_EXP_SCHED_NAME_INIT \
{ "rcu_node_exp_sched_0", "rcu_node_exp_sched_1" }
#elif NR_CPUS <= RCU_FANOUT_3
# define RCU_NUM_LVLS 3
# define NUM_RCU_LVL_0 1
# define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_2)
# define NUM_RCU_LVL_2 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
# define NUM_RCU_LVL_3 (NR_CPUS)
# define NUM_RCU_LVL_4 0
# define NUM_RCU_NODES (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2)
# define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0, NUM_RCU_LVL_1, NUM_RCU_LVL_2 }
# define RCU_NODE_NAME_INIT { "rcu_node_0", "rcu_node_1", "rcu_node_2" }
# define RCU_FQS_NAME_INIT { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2" }
# define RCU_EXP_NAME_INIT { "rcu_node_exp_0", "rcu_node_exp_1", "rcu_node_exp_2" }
# define RCU_EXP_SCHED_NAME_INIT \
{ "rcu_node_exp_sched_0", "rcu_node_exp_sched_1", "rcu_node_exp_sched_2" }
#elif NR_CPUS <= RCU_FANOUT_4
# define RCU_NUM_LVLS 4
# define NUM_RCU_LVL_0 1
# define NUM_RCU_LVL_1 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_3)
# define NUM_RCU_LVL_2 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_2)
# define NUM_RCU_LVL_3 DIV_ROUND_UP(NR_CPUS, RCU_FANOUT_1)
# define NUM_RCU_LVL_4 (NR_CPUS)
# define NUM_RCU_NODES (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2 + NUM_RCU_LVL_3)
# define NUM_RCU_LVL_INIT { NUM_RCU_LVL_0, NUM_RCU_LVL_1, NUM_RCU_LVL_2, NUM_RCU_LVL_3 }
# define RCU_NODE_NAME_INIT { "rcu_node_0", "rcu_node_1", "rcu_node_2", "rcu_node_3" }
# define RCU_FQS_NAME_INIT { "rcu_node_fqs_0", "rcu_node_fqs_1", "rcu_node_fqs_2", "rcu_node_fqs_3" }
# define RCU_EXP_NAME_INIT { "rcu_node_exp_0", "rcu_node_exp_1", "rcu_node_exp_2", "rcu_node_exp_3" }
# define RCU_EXP_SCHED_NAME_INIT \
{ "rcu_node_exp_sched_0", "rcu_node_exp_sched_1", "rcu_node_exp_sched_2", "rcu_node_exp_sched_3" }
#else
# error "CONFIG_RCU_FANOUT insufficient for NR_CPUS"
#endif /* #if (NR_CPUS) <= RCU_FANOUT_1 */
#define RCU_SUM (NUM_RCU_LVL_0 + NUM_RCU_LVL_1 + NUM_RCU_LVL_2 + NUM_RCU_LVL_3 + NUM_RCU_LVL_4)
#define NUM_RCU_NODES (RCU_SUM - NR_CPUS)
extern int rcu_num_lvls;
extern int rcu_num_nodes;
......@@ -236,6 +250,8 @@ struct rcu_node {
int need_future_gp[2];
/* Counts of upcoming no-CB GP requests. */
raw_spinlock_t fqslock ____cacheline_internodealigned_in_smp;
struct mutex exp_funnel_mutex ____cacheline_internodealigned_in_smp;
} ____cacheline_internodealigned_in_smp;
/*
......@@ -287,12 +303,13 @@ struct rcu_data {
bool gpwrap; /* Possible gpnum/completed wrap. */
struct rcu_node *mynode; /* This CPU's leaf of hierarchy */
unsigned long grpmask; /* Mask to apply to leaf qsmask. */
#ifdef CONFIG_RCU_CPU_STALL_INFO
unsigned long ticks_this_gp; /* The number of scheduling-clock */
/* ticks this CPU has handled */
/* during and after the last grace */
/* period it is aware of. */
#endif /* #ifdef CONFIG_RCU_CPU_STALL_INFO */
struct cpu_stop_work exp_stop_work;
/* Expedited grace-period control */
/* for CPU stopping. */
/* 2) batch handling */
/*
......@@ -355,11 +372,13 @@ struct rcu_data {
unsigned long n_rp_nocb_defer_wakeup;
unsigned long n_rp_need_nothing;
/* 6) _rcu_barrier() and OOM callbacks. */
/* 6) _rcu_barrier(), OOM callbacks, and expediting. */
struct rcu_head barrier_head;
#ifdef CONFIG_RCU_FAST_NO_HZ
struct rcu_head oom_head;
#endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
struct mutex exp_funnel_mutex;
bool exp_done; /* Expedited QS for this CPU? */
/* 7) Callback offloading. */
#ifdef CONFIG_RCU_NOCB_CPU
......@@ -387,9 +406,7 @@ struct rcu_data {
#endif /* #ifdef CONFIG_RCU_NOCB_CPU */
/* 8) RCU CPU stall data. */
#ifdef CONFIG_RCU_CPU_STALL_INFO
unsigned int softirq_snap; /* Snapshot of softirq activity. */
#endif /* #ifdef CONFIG_RCU_CPU_STALL_INFO */
int cpu;
struct rcu_state *rsp;
......@@ -442,9 +459,9 @@ do { \
*/
struct rcu_state {
struct rcu_node node[NUM_RCU_NODES]; /* Hierarchy. */
struct rcu_node *level[RCU_NUM_LVLS]; /* Hierarchy levels. */
u32 levelcnt[MAX_RCU_LVLS + 1]; /* # nodes in each level. */
u8 levelspread[RCU_NUM_LVLS]; /* kids/node in each level. */
struct rcu_node *level[RCU_NUM_LVLS + 1];
/* Hierarchy levels (+1 to */
/* shut bogus gcc warning) */
u8 flavor_mask; /* bit in flavor mask. */
struct rcu_data __percpu *rda; /* pointer of percu rcu_data. */
void (*call)(struct rcu_head *head, /* call_rcu() flavor. */
......@@ -479,21 +496,18 @@ struct rcu_state {
struct mutex barrier_mutex; /* Guards barrier fields. */
atomic_t barrier_cpu_count; /* # CPUs waiting on. */
struct completion barrier_completion; /* Wake at barrier end. */
unsigned long n_barrier_done; /* ++ at start and end of */
unsigned long barrier_sequence; /* ++ at start and end of */
/* _rcu_barrier(). */
/* End of fields guarded by barrier_mutex. */
atomic_long_t expedited_start; /* Starting ticket. */
atomic_long_t expedited_done; /* Done ticket. */
atomic_long_t expedited_wrap; /* # near-wrap incidents. */
atomic_long_t expedited_tryfail; /* # acquisition failures. */
unsigned long expedited_sequence; /* Take a ticket. */
atomic_long_t expedited_workdone0; /* # done by others #0. */
atomic_long_t expedited_workdone1; /* # done by others #1. */
atomic_long_t expedited_workdone2; /* # done by others #2. */
atomic_long_t expedited_workdone3; /* # done by others #3. */
atomic_long_t expedited_normal; /* # fallbacks to normal. */
atomic_long_t expedited_stoppedcpus; /* # successful stop_cpus. */
atomic_long_t expedited_done_tries; /* # tries to update _done. */
atomic_long_t expedited_done_lost; /* # times beaten to _done. */
atomic_long_t expedited_done_exit; /* # times exited _done loop. */
atomic_t expedited_need_qs; /* # CPUs left to check in. */
wait_queue_head_t expedited_wq; /* Wait for check-ins. */
unsigned long jiffies_force_qs; /* Time at which to invoke */
/* force_quiescent_state(). */
......@@ -527,7 +541,11 @@ struct rcu_state {
/* Values for rcu_state structure's gp_flags field. */
#define RCU_GP_WAIT_INIT 0 /* Initial state. */
#define RCU_GP_WAIT_GPS 1 /* Wait for grace-period start. */
#define RCU_GP_WAIT_FQS 2 /* Wait for force-quiescent-state time. */
#define RCU_GP_DONE_GPS 2 /* Wait done for grace-period start. */
#define RCU_GP_WAIT_FQS 3 /* Wait for force-quiescent-state time. */
#define RCU_GP_DOING_FQS 4 /* Wait done for force-quiescent-state time. */
#define RCU_GP_CLEANUP 5 /* Grace-period cleanup started. */
#define RCU_GP_CLEANED 6 /* Grace-period cleanup complete. */
extern struct list_head rcu_struct_flavors;
......@@ -635,3 +653,15 @@ static inline void rcu_nocb_q_lengths(struct rcu_data *rdp, long *ql, long *qll)
#endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
}
#endif /* #ifdef CONFIG_RCU_TRACE */
/*
* Place this after a lock-acquisition primitive to guarantee that
* an UNLOCK+LOCK pair act as a full barrier. This guarantee applies
* if the UNLOCK and LOCK are executed by the same CPU or if the
* UNLOCK and LOCK operate on the same lock variable.
*/
#ifdef CONFIG_PPC
#define smp_mb__after_unlock_lock() smp_mb() /* Full ordering for lock. */
#else /* #ifdef CONFIG_PPC */
#define smp_mb__after_unlock_lock() do { } while (0)
#endif /* #else #ifdef CONFIG_PPC */
......@@ -82,10 +82,8 @@ static void __init rcu_bootup_announce_oddness(void)
pr_info("\tRCU lockdep checking is enabled.\n");
if (IS_ENABLED(CONFIG_RCU_TORTURE_TEST_RUNNABLE))
pr_info("\tRCU torture testing starts during boot.\n");
if (IS_ENABLED(CONFIG_RCU_CPU_STALL_INFO))
pr_info("\tAdditional per-CPU info printed with stalls.\n");
if (NUM_RCU_LVL_4 != 0)
pr_info("\tFour-level hierarchy is enabled.\n");
if (RCU_NUM_LVLS >= 4)
pr_info("\tFour(or more)-level hierarchy is enabled.\n");
if (RCU_FANOUT_LEAF != 16)
pr_info("\tBuild-time adjustment of leaf fanout to %d.\n",
RCU_FANOUT_LEAF);
......@@ -418,8 +416,6 @@ static void rcu_print_detail_task_stall(struct rcu_state *rsp)
rcu_print_detail_task_stall_rnp(rnp);
}
#ifdef CONFIG_RCU_CPU_STALL_INFO
static void rcu_print_task_stall_begin(struct rcu_node *rnp)
{
pr_err("\tTasks blocked on level-%d rcu_node (CPUs %d-%d):",
......@@ -431,18 +427,6 @@ static void rcu_print_task_stall_end(void)
pr_cont("\n");
}
#else /* #ifdef CONFIG_RCU_CPU_STALL_INFO */
static void rcu_print_task_stall_begin(struct rcu_node *rnp)
{
}
static void rcu_print_task_stall_end(void)
{
}
#endif /* #else #ifdef CONFIG_RCU_CPU_STALL_INFO */
/*
* Scan the current list of tasks blocked within RCU read-side critical
* sections, printing out the tid of each.
......@@ -538,10 +522,10 @@ EXPORT_SYMBOL_GPL(call_rcu);
*/
void synchronize_rcu(void)
{
rcu_lockdep_assert(!lock_is_held(&rcu_bh_lock_map) &&
!lock_is_held(&rcu_lock_map) &&
!lock_is_held(&rcu_sched_lock_map),
"Illegal synchronize_rcu() in RCU read-side critical section");
RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
lock_is_held(&rcu_lock_map) ||
lock_is_held(&rcu_sched_lock_map),
"Illegal synchronize_rcu() in RCU read-side critical section");
if (!rcu_scheduler_active)
return;
if (rcu_gp_is_expedited())
......@@ -552,8 +536,6 @@ void synchronize_rcu(void)
EXPORT_SYMBOL_GPL(synchronize_rcu);
static DECLARE_WAIT_QUEUE_HEAD(sync_rcu_preempt_exp_wq);
static unsigned long sync_rcu_preempt_exp_count;
static DEFINE_MUTEX(sync_rcu_preempt_exp_mutex);
/*
* Return non-zero if there are any tasks in RCU read-side critical
......@@ -573,7 +555,7 @@ static int rcu_preempted_readers_exp(struct rcu_node *rnp)
* for the current expedited grace period. Works only for preemptible
* RCU -- other RCU implementation use other means.
*
* Caller must hold sync_rcu_preempt_exp_mutex.
* Caller must hold the root rcu_node's exp_funnel_mutex.
*/
static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
{
......@@ -589,7 +571,7 @@ static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
* recursively up the tree. (Calm down, calm down, we do the recursion
* iteratively!)
*
* Caller must hold sync_rcu_preempt_exp_mutex.
* Caller must hold the root rcu_node's exp_funnel_mutex.
*/
static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
bool wake)
......@@ -628,7 +610,7 @@ static void rcu_report_exp_rnp(struct rcu_state *rsp, struct rcu_node *rnp,
* set the ->expmask bits on the leaf rcu_node structures to tell phase 2
* that work is needed here.
*
* Caller must hold sync_rcu_preempt_exp_mutex.
* Caller must hold the root rcu_node's exp_funnel_mutex.
*/
static void
sync_rcu_preempt_exp_init1(struct rcu_state *rsp, struct rcu_node *rnp)
......@@ -671,7 +653,7 @@ sync_rcu_preempt_exp_init1(struct rcu_state *rsp, struct rcu_node *rnp)
* invoke rcu_report_exp_rnp() to clear out the upper-level ->expmask bits,
* enabling rcu_read_unlock_special() to do the bit-clearing.
*
* Caller must hold sync_rcu_preempt_exp_mutex.
* Caller must hold the root rcu_node's exp_funnel_mutex.
*/
static void
sync_rcu_preempt_exp_init2(struct rcu_state *rsp, struct rcu_node *rnp)
......@@ -719,51 +701,17 @@ sync_rcu_preempt_exp_init2(struct rcu_state *rsp, struct rcu_node *rnp)
void synchronize_rcu_expedited(void)
{
struct rcu_node *rnp;
struct rcu_node *rnp_unlock;
struct rcu_state *rsp = rcu_state_p;
unsigned long snap;
int trycount = 0;
unsigned long s;
smp_mb(); /* Caller's modifications seen first by other CPUs. */
snap = READ_ONCE(sync_rcu_preempt_exp_count) + 1;
smp_mb(); /* Above access cannot bleed into critical section. */
s = rcu_exp_gp_seq_snap(rsp);
/*
* Block CPU-hotplug operations. This means that any CPU-hotplug
* operation that finds an rcu_node structure with tasks in the
* process of being boosted will know that all tasks blocking
* this expedited grace period will already be in the process of
* being boosted. This simplifies the process of moving tasks
* from leaf to root rcu_node structures.
*/
if (!try_get_online_cpus()) {
/* CPU-hotplug operation in flight, fall back to normal GP. */
wait_rcu_gp(call_rcu);
return;
}
rnp_unlock = exp_funnel_lock(rsp, s);
if (rnp_unlock == NULL)
return; /* Someone else did our work for us. */
/*
* Acquire lock, falling back to synchronize_rcu() if too many
* lock-acquisition failures. Of course, if someone does the
* expedited grace period for us, just leave.
*/
while (!mutex_trylock(&sync_rcu_preempt_exp_mutex)) {
if (ULONG_CMP_LT(snap,
READ_ONCE(sync_rcu_preempt_exp_count))) {
put_online_cpus();
goto mb_ret; /* Others did our work for us. */
}
if (trycount++ < 10) {
udelay(trycount * num_online_cpus());
} else {
put_online_cpus();
wait_rcu_gp(call_rcu);
return;
}
}
if (ULONG_CMP_LT(snap, READ_ONCE(sync_rcu_preempt_exp_count))) {
put_online_cpus();
goto unlock_mb_ret; /* Others did our work for us. */
}
rcu_exp_gp_seq_start(rsp);
/* force all RCU readers onto ->blkd_tasks lists. */
synchronize_sched_expedited();
......@@ -779,20 +727,14 @@ void synchronize_rcu_expedited(void)
rcu_for_each_leaf_node(rsp, rnp)
sync_rcu_preempt_exp_init2(rsp, rnp);
put_online_cpus();
/* Wait for snapshotted ->blkd_tasks lists to drain. */
rnp = rcu_get_root(rsp);
wait_event(sync_rcu_preempt_exp_wq,
sync_rcu_preempt_exp_done(rnp));
/* Clean up and exit. */
smp_mb(); /* ensure expedited GP seen before counter increment. */
WRITE_ONCE(sync_rcu_preempt_exp_count, sync_rcu_preempt_exp_count + 1);
unlock_mb_ret:
mutex_unlock(&sync_rcu_preempt_exp_mutex);
mb_ret:
smp_mb(); /* ensure subsequent action seen after grace period. */
rcu_exp_gp_seq_end(rsp);
mutex_unlock(&rnp_unlock->exp_funnel_mutex);
}
EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
......@@ -1061,8 +1003,7 @@ static int rcu_boost(struct rcu_node *rnp)
}
/*
* Priority-boosting kthread. One per leaf rcu_node and one for the
* root rcu_node.
* Priority-boosting kthread, one per leaf rcu_node.
*/
static int rcu_boost_kthread(void *arg)
{
......@@ -1680,12 +1621,10 @@ static int rcu_oom_notify(struct notifier_block *self,
*/
atomic_set(&oom_callback_count, 1);
get_online_cpus();
for_each_online_cpu(cpu) {
smp_call_function_single(cpu, rcu_oom_notify_cpu, NULL, 1);
cond_resched_rcu_qs();
}
put_online_cpus();
/* Unconditionally decrement: no need to wake ourselves up. */
atomic_dec(&oom_callback_count);
......@@ -1706,8 +1645,6 @@ early_initcall(rcu_register_oom_notifier);
#endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
#ifdef CONFIG_RCU_CPU_STALL_INFO
#ifdef CONFIG_RCU_FAST_NO_HZ
static void print_cpu_stall_fast_no_hz(char *cp, int cpu)
......@@ -1796,33 +1733,6 @@ static void increment_cpu_stall_ticks(void)
raw_cpu_inc(rsp->rda->ticks_this_gp);
}
#else /* #ifdef CONFIG_RCU_CPU_STALL_INFO */
static void print_cpu_stall_info_begin(void)
{
pr_cont(" {");
}
static void print_cpu_stall_info(struct rcu_state *rsp, int cpu)
{
pr_cont(" %d", cpu);
}
static void print_cpu_stall_info_end(void)
{
pr_cont("} ");
}
static void zero_cpu_stall_ticks(struct rcu_data *rdp)
{
}
static void increment_cpu_stall_ticks(void)
{
}
#endif /* #else #ifdef CONFIG_RCU_CPU_STALL_INFO */
#ifdef CONFIG_RCU_NOCB_CPU
/*
......
......@@ -81,9 +81,9 @@ static void r_stop(struct seq_file *m, void *v)
static int show_rcubarrier(struct seq_file *m, void *v)
{
struct rcu_state *rsp = (struct rcu_state *)m->private;
seq_printf(m, "bcc: %d nbd: %lu\n",
seq_printf(m, "bcc: %d bseq: %lu\n",
atomic_read(&rsp->barrier_cpu_count),
rsp->n_barrier_done);
rsp->barrier_sequence);
return 0;
}
......@@ -185,18 +185,15 @@ static int show_rcuexp(struct seq_file *m, void *v)
{
struct rcu_state *rsp = (struct rcu_state *)m->private;
seq_printf(m, "s=%lu d=%lu w=%lu tf=%lu wd1=%lu wd2=%lu n=%lu sc=%lu dt=%lu dl=%lu dx=%lu\n",
atomic_long_read(&rsp->expedited_start),
atomic_long_read(&rsp->expedited_done),
atomic_long_read(&rsp->expedited_wrap),
atomic_long_read(&rsp->expedited_tryfail),
seq_printf(m, "s=%lu wd0=%lu wd1=%lu wd2=%lu wd3=%lu n=%lu enq=%d sc=%lu\n",
rsp->expedited_sequence,
atomic_long_read(&rsp->expedited_workdone0),
atomic_long_read(&rsp->expedited_workdone1),
atomic_long_read(&rsp->expedited_workdone2),
atomic_long_read(&rsp->expedited_workdone3),
atomic_long_read(&rsp->expedited_normal),
atomic_long_read(&rsp->expedited_stoppedcpus),
atomic_long_read(&rsp->expedited_done_tries),
atomic_long_read(&rsp->expedited_done_lost),
atomic_long_read(&rsp->expedited_done_exit));
atomic_read(&rsp->expedited_need_qs),
rsp->expedited_sequence / 2);
return 0;
}
......
......@@ -62,6 +62,55 @@ MODULE_ALIAS("rcupdate");
module_param(rcu_expedited, int, 0);
#if defined(CONFIG_DEBUG_LOCK_ALLOC) && defined(CONFIG_PREEMPT_COUNT)
/**
* rcu_read_lock_sched_held() - might we be in RCU-sched read-side critical section?
*
* If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an
* RCU-sched read-side critical section. In absence of
* CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side
* critical section unless it can prove otherwise. Note that disabling
* of preemption (including disabling irqs) counts as an RCU-sched
* read-side critical section. This is useful for debug checks in functions
* that required that they be called within an RCU-sched read-side
* critical section.
*
* Check debug_lockdep_rcu_enabled() to prevent false positives during boot
* and while lockdep is disabled.
*
* Note that if the CPU is in the idle loop from an RCU point of
* view (ie: that we are in the section between rcu_idle_enter() and
* rcu_idle_exit()) then rcu_read_lock_held() returns false even if the CPU
* did an rcu_read_lock(). The reason for this is that RCU ignores CPUs
* that are in such a section, considering these as in extended quiescent
* state, so such a CPU is effectively never in an RCU read-side critical
* section regardless of what RCU primitives it invokes. This state of
* affairs is required --- we need to keep an RCU-free window in idle
* where the CPU may possibly enter into low power mode. This way we can
* notice an extended quiescent state to other CPUs that started a grace
* period. Otherwise we would delay any grace period as long as we run in
* the idle task.
*
* Similarly, we avoid claiming an SRCU read lock held if the current
* CPU is offline.
*/
int rcu_read_lock_sched_held(void)
{
int lockdep_opinion = 0;
if (!debug_lockdep_rcu_enabled())
return 1;
if (!rcu_is_watching())
return 0;
if (!rcu_lockdep_current_cpu_online())
return 0;
if (debug_locks)
lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
return lockdep_opinion || preempt_count() != 0 || irqs_disabled();
}
EXPORT_SYMBOL(rcu_read_lock_sched_held);
#endif
#ifndef CONFIG_TINY_RCU
static atomic_t rcu_expedited_nesting =
......@@ -269,20 +318,37 @@ void wakeme_after_rcu(struct rcu_head *head)
rcu = container_of(head, struct rcu_synchronize, head);
complete(&rcu->completion);
}
EXPORT_SYMBOL_GPL(wakeme_after_rcu);
void wait_rcu_gp(call_rcu_func_t crf)
void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array,
struct rcu_synchronize *rs_array)
{
struct rcu_synchronize rcu;
int i;
init_rcu_head_on_stack(&rcu.head);
init_completion(&rcu.completion);
/* Will wake me after RCU finished. */
crf(&rcu.head, wakeme_after_rcu);
/* Wait for it. */
wait_for_completion(&rcu.completion);
destroy_rcu_head_on_stack(&rcu.head);
/* Initialize and register callbacks for each flavor specified. */
for (i = 0; i < n; i++) {
if (checktiny &&
(crcu_array[i] == call_rcu ||
crcu_array[i] == call_rcu_bh)) {
might_sleep();
continue;
}
init_rcu_head_on_stack(&rs_array[i].head);
init_completion(&rs_array[i].completion);
(crcu_array[i])(&rs_array[i].head, wakeme_after_rcu);
}
/* Wait for all callbacks to be invoked. */
for (i = 0; i < n; i++) {
if (checktiny &&
(crcu_array[i] == call_rcu ||
crcu_array[i] == call_rcu_bh))
continue;
wait_for_completion(&rs_array[i].completion);
destroy_rcu_head_on_stack(&rs_array[i].head);
}
}
EXPORT_SYMBOL_GPL(wait_rcu_gp);
EXPORT_SYMBOL_GPL(__wait_rcu_gp);
#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
void init_rcu_head(struct rcu_head *head)
......@@ -523,8 +589,8 @@ EXPORT_SYMBOL_GPL(call_rcu_tasks);
void synchronize_rcu_tasks(void)
{
/* Complain if the scheduler has not started. */
rcu_lockdep_assert(!rcu_scheduler_active,
"synchronize_rcu_tasks called too soon");
RCU_LOCKDEP_WARN(!rcu_scheduler_active,
"synchronize_rcu_tasks called too soon");
/* Wait for the grace period. */
wait_rcu_gp(call_rcu_tasks);
......
......@@ -2200,8 +2200,8 @@ unsigned long to_ratio(u64 period, u64 runtime)
#ifdef CONFIG_SMP
inline struct dl_bw *dl_bw_of(int i)
{
rcu_lockdep_assert(rcu_read_lock_sched_held(),
"sched RCU must be held");
RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
"sched RCU must be held");
return &cpu_rq(i)->rd->dl_bw;
}
......@@ -2210,8 +2210,8 @@ static inline int dl_bw_cpus(int i)
struct root_domain *rd = cpu_rq(i)->rd;
int cpus = 0;
rcu_lockdep_assert(rcu_read_lock_sched_held(),
"sched RCU must be held");
RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
"sched RCU must be held");
for_each_cpu_and(i, rd->span, cpu_active_mask)
cpus++;
......
......@@ -92,12 +92,10 @@ config NO_HZ_FULL
depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
# We need at least one periodic CPU for timekeeping
depends on SMP
# RCU_USER_QS dependency
depends on HAVE_CONTEXT_TRACKING
# VIRT_CPU_ACCOUNTING_GEN dependency
depends on HAVE_VIRT_CPU_ACCOUNTING_GEN
select NO_HZ_COMMON
select RCU_USER_QS
select RCU_NOCB_CPU
select VIRT_CPU_ACCOUNTING_GEN
select IRQ_WORK
......
......@@ -338,20 +338,20 @@ static void workqueue_sysfs_unregister(struct workqueue_struct *wq);
#include <trace/events/workqueue.h>
#define assert_rcu_or_pool_mutex() \
rcu_lockdep_assert(rcu_read_lock_sched_held() || \
lockdep_is_held(&wq_pool_mutex), \
"sched RCU or wq_pool_mutex should be held")
RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() && \
!lockdep_is_held(&wq_pool_mutex), \
"sched RCU or wq_pool_mutex should be held")
#define assert_rcu_or_wq_mutex(wq) \
rcu_lockdep_assert(rcu_read_lock_sched_held() || \
lockdep_is_held(&wq->mutex), \
"sched RCU or wq->mutex should be held")
RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() && \
!lockdep_is_held(&wq->mutex), \
"sched RCU or wq->mutex should be held")
#define assert_rcu_or_wq_mutex_or_pool_mutex(wq) \
rcu_lockdep_assert(rcu_read_lock_sched_held() || \
lockdep_is_held(&wq->mutex) || \
lockdep_is_held(&wq_pool_mutex), \
"sched RCU, wq->mutex or wq_pool_mutex should be held")
RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() && \
!lockdep_is_held(&wq->mutex) && \
!lockdep_is_held(&wq_pool_mutex), \
"sched RCU, wq->mutex or wq_pool_mutex should be held")
#define for_each_cpu_worker_pool(pool, cpu) \
for ((pool) = &per_cpu(cpu_worker_pools, cpu)[0]; \
......
......@@ -1353,20 +1353,6 @@ config RCU_CPU_STALL_TIMEOUT
RCU grace period persists, additional CPU stall warnings are
printed at more widely spaced intervals.
config RCU_CPU_STALL_INFO
bool "Print additional diagnostics on RCU CPU stall"
depends on (TREE_RCU || PREEMPT_RCU) && DEBUG_KERNEL
default y
help
For each stalled CPU that is aware of the current RCU grace
period, print out additional per-CPU diagnostic information
regarding scheduling-clock ticks, idle state, and,
for RCU_FAST_NO_HZ kernels, idle-entry state.
Say N if you are unsure.
Say Y if you want to enable such diagnostics.
config RCU_TRACE
bool "Enable tracing for RCU"
depends on DEBUG_KERNEL
......@@ -1379,7 +1365,7 @@ config RCU_TRACE
Say N if you are unsure.
config RCU_EQS_DEBUG
bool "Use this when adding any sort of NO_HZ support to your arch"
bool "Provide debugging asserts for adding NO_HZ support to an arch"
depends on DEBUG_KERNEL
help
This option provides consistency checks in RCU's handling of
......
......@@ -5011,6 +5011,7 @@ sub process {
"memory barrier without comment\n" . $herecurr);
}
}
# check for waitqueue_active without a comment.
if ($line =~ /\bwaitqueue_active\s*\(/) {
if (!ctx_has_comment($first_line, $linenr)) {
......@@ -5018,6 +5019,24 @@ sub process {
"waitqueue_active without comment\n" . $herecurr);
}
}
# Check for expedited grace periods that interrupt non-idle non-nohz
# online CPUs. These expedited can therefore degrade real-time response
# if used carelessly, and should be avoided where not absolutely
# needed. It is always OK to use synchronize_rcu_expedited() and
# synchronize_sched_expedited() at boot time (before real-time applications
# start) and in error situations where real-time response is compromised in
# any case. Note that synchronize_srcu_expedited() does -not- interrupt
# other CPUs, so don't warn on uses of synchronize_srcu_expedited().
# Of course, nothing comes for free, and srcu_read_lock() and
# srcu_read_unlock() do contain full memory barriers in payment for
# synchronize_srcu_expedited() non-interruption properties.
if ($line =~ /\b(synchronize_rcu_expedited|synchronize_sched_expedited)\(/) {
WARN("EXPEDITED_RCU_GRACE_PERIOD",
"expedited RCU grace periods should be avoided where they can degrade real-time response\n" . $herecurr);
}
# check of hardware specific defines
if ($line =~ m@^.\s*\#\s*if.*\b(__i386__|__powerpc64__|__sun__|__s390x__)\b@ && $realfile !~ m@include/asm-@) {
CHK("ARCH_DEFINES",
......
......@@ -400,9 +400,9 @@ static bool verify_new_ex(struct dev_cgroup *dev_cgroup,
{
bool match = false;
rcu_lockdep_assert(rcu_read_lock_held() ||
lockdep_is_held(&devcgroup_mutex),
"device_cgroup:verify_new_ex called without proper synchronization");
RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
lockdep_is_held(&devcgroup_mutex),
"device_cgroup:verify_new_ex called without proper synchronization");
if (dev_cgroup->behavior == DEVCG_DEFAULT_ALLOW) {
if (behavior == DEVCG_DEFAULT_ALLOW) {
......
......@@ -5,6 +5,6 @@ CONFIG_PREEMPT_NONE=n
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=n
#CHECK#CONFIG_PROVE_RCU=n
CONFIG_PROVE_LOCKING=y
#CHECK#CONFIG_PROVE_RCU=y
CONFIG_RCU_EXPERT=y
......@@ -13,7 +13,6 @@ CONFIG_MAXSMP=y
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ZERO=y
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_RCU_CPU_STALL_INFO=n
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_RCU_EXPERT=y
......@@ -17,7 +17,6 @@ CONFIG_RCU_FANOUT_LEAF=3
CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=n
CONFIG_RCU_CPU_STALL_INFO=n
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_RCU_EXPERT=y
......@@ -17,6 +17,5 @@ CONFIG_RCU_FANOUT_LEAF=3
CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=n
CONFIG_RCU_CPU_STALL_INFO=n
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
......@@ -13,7 +13,6 @@ CONFIG_RCU_FANOUT=2
CONFIG_RCU_FANOUT_LEAF=2
CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_RCU_CPU_STALL_INFO=n
CONFIG_RCU_BOOST=y
CONFIG_RCU_KTHREAD_PRIO=2
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
......
......@@ -17,6 +17,5 @@ CONFIG_RCU_FANOUT=4
CONFIG_RCU_FANOUT_LEAF=4
CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_RCU_CPU_STALL_INFO=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_RCU_EXPERT=y
......@@ -17,6 +17,5 @@ CONFIG_RCU_NOCB_CPU_NONE=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
#CHECK#CONFIG_PROVE_RCU=y
CONFIG_RCU_CPU_STALL_INFO=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_RCU_EXPERT=y
......@@ -18,6 +18,5 @@ CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
#CHECK#CONFIG_PROVE_RCU=y
CONFIG_RCU_CPU_STALL_INFO=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
CONFIG_RCU_EXPERT=y
......@@ -17,6 +17,5 @@ CONFIG_RCU_FANOUT=2
CONFIG_RCU_FANOUT_LEAF=2
CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_RCU_CPU_STALL_INFO=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_RCU_EXPERT=y
......@@ -19,7 +19,6 @@ CONFIG_RCU_NOCB_CPU_ALL=y
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_PROVE_LOCKING=y
#CHECK#CONFIG_PROVE_RCU=y
CONFIG_RCU_CPU_STALL_INFO=n
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_RCU_EXPERT=y
......@@ -17,6 +17,5 @@ CONFIG_RCU_FANOUT_LEAF=2
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_RCU_CPU_STALL_INFO=n
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
......@@ -13,7 +13,6 @@ CONFIG_SUSPEND=n
CONFIG_HIBERNATION=n
CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_RCU_CPU_STALL_INFO=n
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
#CHECK#CONFIG_RCU_EXPERT=n
......@@ -16,7 +16,6 @@ CONFIG_PROVE_LOCKING -- Do several, covering CONFIG_DEBUG_LOCK_ALLOC=y and not.
CONFIG_PROVE_RCU -- Hardwired to CONFIG_PROVE_LOCKING.
CONFIG_RCU_BOOST -- one of PREEMPT_RCU.
CONFIG_RCU_KTHREAD_PRIO -- set to 2 for _BOOST testing.
CONFIG_RCU_CPU_STALL_INFO -- Now default, avoid at least twice.
CONFIG_RCU_FANOUT -- Cover hierarchy, but overlap with others.
CONFIG_RCU_FANOUT_LEAF -- Do one non-default.
CONFIG_RCU_FAST_NO_HZ -- Do one, but not with CONFIG_RCU_NOCB_CPU_ALL.
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment