Commit 31da0670 authored by Paul E. McKenney


Merge branches 'consolidate.2019.08.01b', 'fixes.2019.08.12a', 'lists.2019.08.13a' and 'torture.2019.08.01b' into HEAD

consolidate.2019.08.01b: Further consolidation cleanups
fixes.2019.08.12a: Miscellaneous fixes
lists.2019.08.13a: Optional lockdep arguments for RCU list macros
torture.2019.08.01b: Torture-test updates
@@ -2129,6 +2129,8 @@ Some of the relevant points of interest are as follows:
<li> <a href="#Hotplug CPU">Hotplug CPU</a>.
<li> <a href="#Scheduler and RCU">Scheduler and RCU</a>.
<li> <a href="#Tracing and RCU">Tracing and RCU</a>.
<li> <a href="#Accesses to User Memory and RCU">
Accesses to User Memory and RCU</a>.
<li> <a href="#Energy Efficiency">Energy Efficiency</a>.
<li> <a href="#Scheduling-Clock Interrupts and RCU">
Scheduling-Clock Interrupts and RCU</a>.
@@ -2512,7 +2514,7 @@ disabled across the entire RCU read-side critical section.
<p>
It is possible to use tracing on RCU code, but tracing itself
uses RCU.
-For this reason, <tt>rcu_dereference_raw_notrace()</tt>
+For this reason, <tt>rcu_dereference_raw_check()</tt>
is provided for use by tracing, which avoids the destructive
recursion that could otherwise ensue.
This API is also used by virtualization in some architectures,
@@ -2521,6 +2523,75 @@ cannot be used.
The tracing folks both located the requirement and provided the
needed fix, so this surprise requirement was relatively painless.
<h3><a name="Accesses to User Memory and RCU">
Accesses to User Memory and RCU</a></h3>
<p>
The kernel needs to access user-space memory, for example, to access
data referenced by system-call parameters.
The <tt>get_user()</tt> macro does this job.
<p>
However, user-space memory might well be paged out, which means
that <tt>get_user()</tt> might well page-fault and thus block while
waiting for the resulting I/O to complete.
It would be a very bad thing for the compiler to reorder
a <tt>get_user()</tt> invocation into an RCU read-side critical
section.
For example, suppose that the source code looked like this:
<blockquote>
<pre>
1 rcu_read_lock();
2 p = rcu_dereference(gp);
3 v = p-&gt;value;
4 rcu_read_unlock();
5 get_user(user_v, user_p);
6 do_something_with(v, user_v);
</pre>
</blockquote>
<p>
The compiler must not be permitted to transform this source code into
the following:
<blockquote>
<pre>
1 rcu_read_lock();
2 p = rcu_dereference(gp);
3 get_user(user_v, user_p); // BUG: POSSIBLE PAGE FAULT!!!
4 v = p-&gt;value;
5 rcu_read_unlock();
6 do_something_with(v, user_v);
</pre>
</blockquote>
<p>
If the compiler did make this transformation in a
<tt>CONFIG_PREEMPT=n</tt> kernel build, and if <tt>get_user()</tt> did
page fault, the result would be a quiescent state in the middle
of an RCU read-side critical section.
This misplaced quiescent state could result in line&nbsp;4 being
a use-after-free access, which could be bad for your kernel's
actuarial statistics.
Similar examples can be constructed with the call to <tt>get_user()</tt>
preceding the <tt>rcu_read_lock()</tt>.
<p>
Unfortunately, <tt>get_user()</tt> doesn't have any particular
ordering properties, and in some architectures the underlying <tt>asm</tt>
isn't even marked <tt>volatile</tt>.
And even if it was marked <tt>volatile</tt>, the above access to
<tt>p-&gt;value</tt> is not volatile, so the compiler would not have any
reason to keep those two accesses in order.
<p>
Therefore, the Linux-kernel definitions of <tt>rcu_read_lock()</tt>
and <tt>rcu_read_unlock()</tt> must act as compiler barriers,
at least for outermost instances of <tt>rcu_read_lock()</tt> and
<tt>rcu_read_unlock()</tt> within a nested set of RCU read-side critical
sections.
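<p>
For illustration (a simplified sketch, not the kernel's exact
definitions): in a <tt>CONFIG_PREEMPT=n</tt> build, the outermost
<tt>rcu_read_lock()</tt> and <tt>rcu_read_unlock()</tt> boil down to
compiler barriers, conceptually like this:

<blockquote>
<pre>
 1 static inline void rcu_read_lock(void)
 2 {
 3   preempt_disable(); /* includes barrier() when CONFIG_PREEMPT=n */
 4 }
 5
 6 static inline void rcu_read_unlock(void)
 7 {
 8   preempt_enable();  /* likewise includes a compiler barrier */
 9 }
</pre>
</blockquote>

<p>
Because the compiler may not move memory accesses across
<tt>barrier()</tt>, the <tt>get_user()</tt> in the earlier example cannot
be hoisted above the outermost <tt>rcu_read_unlock()</tt>, and the access
to <tt>p-&gt;value</tt> cannot sink below it.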
<h3><a name="Energy Efficiency">Energy Efficiency</a></h3> <h3><a name="Energy Efficiency">Energy Efficiency</a></h3>
<p> <p>
......
...@@ -57,6 +57,12 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that ...@@ -57,6 +57,12 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
CONFIG_PREEMPT_RCU case, you might see stall-warning CONFIG_PREEMPT_RCU case, you might see stall-warning
messages. messages.
You can use the rcutree.kthread_prio kernel boot parameter to
increase the scheduling priority of RCU's kthreads, which can
help avoid this problem. However, please note that doing this
can increase your system's context-switch rate and thus degrade
performance.
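For example (an illustrative value, not a recommendation), booting
with rcutree.kthread_prio=2 runs RCU's kthreads at SCHED_FIFO
priority 2; to get past a CPU-bound real-time task, the chosen
value needs to exceed that task's priority.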
o A periodic interrupt whose handler takes longer than the time
interval between successive pairs of interrupts. This can
prevent RCU's kthreads and softirq handlers from running.
...
@@ -4047,6 +4047,10 @@
rcutorture.verbose= [KNL]
Enable additional printk() statements.
rcupdate.rcu_cpu_stall_ftrace_dump= [KNL]
Dump ftrace buffer after reporting RCU CPU
stall warning.
rcupdate.rcu_cpu_stall_suppress= [KNL]
Suppress RCU CPU stall warning messages.
...
@@ -9326,7 +9326,7 @@ F: drivers/misc/lkdtm/*
LINUX KERNEL MEMORY CONSISTENCY MODEL (LKMM)
M: Alan Stern <stern@rowland.harvard.edu>
-M: Andrea Parri <andrea.parri@amarulasolutions.com>
+M: Andrea Parri <parri.andrea@gmail.com>
M: Will Deacon <will@kernel.org>
M: Peter Zijlstra <peterz@infradead.org>
M: Boqun Feng <boqun.feng@gmail.com>
...
@@ -264,15 +264,13 @@ int __cpu_disable(void)
return 0;
}
static DECLARE_COMPLETION(cpu_died);
/*
* called on the thread which is asking for a CPU to be shutdown -
* waits until shutdown has completed, or it is timed out.
*/
void __cpu_die(unsigned int cpu)
{
-if (!wait_for_completion_timeout(&cpu_died, msecs_to_jiffies(5000))) {
+if (!cpu_wait_death(cpu, 5)) {
pr_err("CPU%u: cpu didn't die\n", cpu);
return;
}
@@ -319,7 +317,7 @@ void arch_cpu_idle_dead(void)
* this returns, power and/or clocks can be removed at any point
* from this CPU and its cache by platform_cpu_kill().
*/
-complete(&cpu_died);
+(void)cpu_report_death();
/*
* Ensure that the cache lines associated with that completion are
...
@@ -535,7 +535,7 @@ static inline void note_hpte_modification(struct kvm *kvm,
*/
static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm)
{
-return rcu_dereference_raw_notrace(kvm->memslots[0]);
+return rcu_dereference_raw_check(kvm->memslots[0]);
}
extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
...
@@ -29,6 +29,7 @@
static bool pci_mmcfg_running_state;
static bool pci_mmcfg_arch_init_failed;
static DEFINE_MUTEX(pci_mmcfg_lock);
#define pci_mmcfg_lock_held() lock_is_held(&(pci_mmcfg_lock).dep_map)
LIST_HEAD(pci_mmcfg_list);
@@ -54,7 +55,7 @@ static void list_add_sorted(struct pci_mmcfg_region *new)
struct pci_mmcfg_region *cfg;
/* keep list sorted by segment and starting bus number */
-list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
+list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list, pci_mmcfg_lock_held()) {
if (cfg->segment > new->segment ||
(cfg->segment == new->segment &&
cfg->start_bus >= new->start_bus)) {
@@ -118,7 +119,7 @@ struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
{
struct pci_mmcfg_region *cfg;
-list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
+list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list, pci_mmcfg_lock_held())
if (cfg->segment == segment &&
cfg->start_bus <= bus && bus <= cfg->end_bus)
return cfg;
...
@@ -14,6 +14,7 @@
#include <linux/slab.h>
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/lockdep.h>
#include <linux/pci.h>
#include <linux/interrupt.h>
#include <linux/kmod.h>
@@ -80,6 +81,7 @@ struct acpi_ioremap {
static LIST_HEAD(acpi_ioremaps);
static DEFINE_MUTEX(acpi_ioremap_lock);
#define acpi_ioremap_lock_held() lock_is_held(&acpi_ioremap_lock.dep_map)
static void __init acpi_request_region (struct acpi_generic_address *gas,
unsigned int length, char *desc)
@@ -206,7 +208,7 @@ acpi_map_lookup(acpi_physical_address phys, acpi_size size)
{
struct acpi_ioremap *map;
-list_for_each_entry_rcu(map, &acpi_ioremaps, list)
+list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
if (map->phys <= phys &&
phys + size <= map->phys + map->size)
return map;
@@ -249,7 +251,7 @@ acpi_map_lookup_virt(void __iomem *virt, acpi_size size)
{
struct acpi_ioremap *map;
-list_for_each_entry_rcu(map, &acpi_ioremaps, list)
+list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
if (map->virt <= virt &&
virt + size <= map->virt + map->size)
return map;
...
extern int device_links_read_lock_held(void);
extern int device_links_check_suppliers(struct device *dev);
extern void device_links_driver_bound(struct device *dev);
extern void device_links_driver_cleanup(struct device *dev);
...
@@ -68,6 +68,11 @@ void device_links_read_unlock(int idx)
{
srcu_read_unlock(&device_links_srcu, idx);
}
int device_links_read_lock_held(void)
{
return srcu_read_lock_held(&device_links_srcu);
}
#else /* !CONFIG_SRCU */
static DECLARE_RWSEM(device_links_lock);
@@ -91,6 +96,13 @@ void device_links_read_unlock(int not_used)
{
up_read(&device_links_lock);
}
#ifdef CONFIG_DEBUG_LOCK_ALLOC
int device_links_read_lock_held(void)
{
return lockdep_is_held(&device_links_lock);
}
#endif
#endif /* !CONFIG_SRCU */
/**
...
@@ -287,7 +287,8 @@ static int rpm_get_suppliers(struct device *dev)
{
struct device_link *link;
-list_for_each_entry_rcu(link, &dev->links.suppliers, c_node) {
+list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
device_links_read_lock_held()) {
int retval;
if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
@@ -309,7 +310,8 @@ static void rpm_put_suppliers(struct device *dev)
{
struct device_link *link;
-list_for_each_entry_rcu(link, &dev->links.suppliers, c_node) {
+list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
device_links_read_lock_held()) {
if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
continue;
@@ -1640,7 +1642,8 @@ void pm_runtime_clean_up_links(struct device *dev)
idx = device_links_read_lock();
-list_for_each_entry_rcu(link, &dev->links.consumers, s_node) {
+list_for_each_entry_rcu(link, &dev->links.consumers, s_node,
device_links_read_lock_held()) {
if (link->flags & DL_FLAG_STATELESS)
continue;
@@ -1662,7 +1665,8 @@ void pm_runtime_get_suppliers(struct device *dev)
idx = device_links_read_lock();
-list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
+list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
device_links_read_lock_held())
if (link->flags & DL_FLAG_PM_RUNTIME) {
link->supplier_preactivated = true;
refcount_inc(&link->rpm_active);
@@ -1683,7 +1687,8 @@ void pm_runtime_put_suppliers(struct device *dev)
idx = device_links_read_lock();
-list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
+list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
device_links_read_lock_held())
if (link->supplier_preactivated) {
link->supplier_preactivated = false;
if (refcount_dec_not_one(&link->rpm_active))
...
@@ -31,9 +31,7 @@ struct rcu_sync {
*/
static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
{
-RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
+RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
!rcu_read_lock_bh_held() &&
!rcu_read_lock_sched_held(),
"suspicious rcu_sync_is_idle() usage");
return !READ_ONCE(rsp->gp_state); /* GP_IDLE */
}
...
@@ -40,6 +40,24 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
*/
#define list_next_rcu(list) (*((struct list_head __rcu **)(&(list)->next)))
/*
* Check during list traversal that we are within an RCU reader
*/
#define check_arg_count_one(dummy)
#ifdef CONFIG_PROVE_RCU_LIST
#define __list_check_rcu(dummy, cond, extra...) \
({ \
check_arg_count_one(extra); \
RCU_LOCKDEP_WARN(!cond && !rcu_read_lock_any_held(), \
"RCU-list traversed in non-reader section!"); \
})
#else
#define __list_check_rcu(dummy, cond, extra...) \
({ check_arg_count_one(extra); })
#endif
/*
* Insert a new entry between two known consecutive entries.
*
@@ -343,13 +361,15 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.
* @member: the name of the list_head within the struct.
* @cond: optional lockdep expression if called from non-RCU protection.
*
* This list-traversal primitive may safely run concurrently with
* the _rcu list-mutation primitives such as list_add_rcu()
* as long as the traversal is guarded by rcu_read_lock().
*/
-#define list_for_each_entry_rcu(pos, head, member) \
+#define list_for_each_entry_rcu(pos, head, member, cond...) \
-for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
+for (__list_check_rcu(dummy, ## cond, 0), \
pos = list_entry_rcu((head)->next, typeof(*pos), member); \
&pos->member != (head); \
pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
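As an illustration of the new optional argument (a minimal sketch; struct
foo, foo_list, and foo_mutex are hypothetical names, not part of this
patch), a traversal that may be invoked either under rcu_read_lock() or
with the update-side mutex held can pass the corresponding lockdep
expression so that CONFIG_PROVE_RCU_LIST does not complain about the
locked caller:

#include <linux/mutex.h>
#include <linux/rculist.h>

struct foo {
	int val;
	struct list_head node;
};

static LIST_HEAD(foo_list);
static DEFINE_MUTEX(foo_mutex);	/* update-side lock for foo_list */

/* Caller holds either rcu_read_lock() or foo_mutex. */
static int foo_sum(void)
{
	struct foo *p;
	int sum = 0;

	list_for_each_entry_rcu(p, &foo_list, node,
				lockdep_is_held(&foo_mutex))
		sum += p->val;
	return sum;
}

The hlist variant below takes the same optional argument; the
fib_get_table() hunk near the end of this merge passes
lockdep_rtnl_is_held() in exactly this way.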
@@ -616,13 +636,15 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n,
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.
* @member: the name of the hlist_node within the struct.
* @cond: optional lockdep expression if called from non-RCU protection.
*
* This list-traversal primitive may safely run concurrently with
* the _rcu list-mutation primitives such as hlist_add_head_rcu()
* as long as the traversal is guarded by rcu_read_lock().
*/
-#define hlist_for_each_entry_rcu(pos, head, member) \
+#define hlist_for_each_entry_rcu(pos, head, member, cond...) \
-for (pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
+for (__list_check_rcu(dummy, ## cond, 0), \
pos = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(head)),\
typeof(*(pos)), member); \
pos; \
pos = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(\
@@ -642,10 +664,10 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n,
* not do any RCU debugging or tracing.
*/
#define hlist_for_each_entry_rcu_notrace(pos, head, member) \
-for (pos = hlist_entry_safe (rcu_dereference_raw_notrace(hlist_first_rcu(head)),\
+for (pos = hlist_entry_safe(rcu_dereference_raw_check(hlist_first_rcu(head)),\
typeof(*(pos)), member); \
pos; \
-pos = hlist_entry_safe(rcu_dereference_raw_notrace(hlist_next_rcu(\
+pos = hlist_entry_safe(rcu_dereference_raw_check(hlist_next_rcu(\
&(pos)->member)), typeof(*(pos)), member))
/**
...
@@ -221,6 +221,7 @@ int debug_lockdep_rcu_enabled(void);
int rcu_read_lock_held(void);
int rcu_read_lock_bh_held(void);
int rcu_read_lock_sched_held(void);
int rcu_read_lock_any_held(void);
#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
@@ -241,6 +242,12 @@ static inline int rcu_read_lock_sched_held(void)
{
return !preemptible();
}
static inline int rcu_read_lock_any_held(void)
{
return !preemptible();
}
#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
#ifdef CONFIG_PROVE_RCU
@@ -476,7 +483,7 @@ do { \
* The no-tracing version of rcu_dereference_raw() must not call
* rcu_read_lock_held().
*/
-#define rcu_dereference_raw_notrace(p) __rcu_dereference_check((p), 1, __rcu)
+#define rcu_dereference_raw_check(p) __rcu_dereference_check((p), 1, __rcu)
/**
* rcu_dereference_protected() - fetch RCU pointer when updates prevented
...
@@ -620,7 +620,7 @@ static void print_lock(struct held_lock *hlock)
return;
}
-printk(KERN_CONT "%p", hlock->instance);
+printk(KERN_CONT "%px", hlock->instance);
print_lock_name(lock);
printk(KERN_CONT ", at: %pS\n", (void *)hlock->acquire_ip);
}
...
@@ -8,6 +8,17 @@ menu "RCU Debugging"
config PROVE_RCU
def_bool PROVE_LOCKING
config PROVE_RCU_LIST
bool "RCU list lockdep debugging"
depends on PROVE_RCU && RCU_EXPERT
default n
help
Enable RCU lockdep checking for list usages. By default it is
turned off since there are several list RCU users that still
need to be converted to pass a lockdep expression. To prevent
false-positive splats, we keep it default disabled but once all
users are converted, we can remove this config option.
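As a practical note, this means a lockdep-enabled build: PROVE_RCU
defaults on when CONFIG_PROVE_LOCKING=y, and CONFIG_RCU_EXPERT=y must
also be set before CONFIG_PROVE_RCU_LIST=y can be selected.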
config TORTURE_TEST
tristate
default n
...
@@ -227,6 +227,7 @@ static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
#ifdef CONFIG_RCU_STALL_COMMON
extern int rcu_cpu_stall_ftrace_dump;
extern int rcu_cpu_stall_suppress;
extern int rcu_cpu_stall_timeout;
int rcu_jiffies_till_stall_check(void);
...
@@ -76,27 +76,6 @@ static inline bool rcu_segcblist_restempty(struct rcu_segcblist *rsclp, int seg)
return !*rsclp->tails[seg];
}
/*
* Interim function to return rcu_segcblist head pointer. Longer term, the
* rcu_segcblist will be used more pervasively, removing the need for this
* function.
*/
static inline struct rcu_head *rcu_segcblist_head(struct rcu_segcblist *rsclp)
{
return rsclp->head;
}
/*
* Interim function to return rcu_segcblist head pointer. Longer term, the
* rcu_segcblist will be used more pervasively, removing the need for this
* function.
*/
static inline struct rcu_head **rcu_segcblist_tail(struct rcu_segcblist *rsclp)
{
WARN_ON_ONCE(rcu_segcblist_empty(rsclp));
return rsclp->tails[RCU_NEXT_TAIL];
}
void rcu_segcblist_init(struct rcu_segcblist *rsclp);
void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
...
@@ -89,7 +89,7 @@ torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable
static char *perf_type = "rcu";
module_param(perf_type, charp, 0444);
-MODULE_PARM_DESC(perf_type, "Type of RCU to performance-test (rcu, rcu_bh, ...)");
+MODULE_PARM_DESC(perf_type, "Type of RCU to performance-test (rcu, srcu, ...)");
static int nrealreaders;
static int nrealwriters;
@@ -375,6 +375,14 @@ rcu_perf_writer(void *arg)
if (holdoff)
schedule_timeout_uninterruptible(holdoff * HZ);
/*
* Wait until rcu_end_inkernel_boot() is called for normal GP tests
* so that RCU is not always expedited for normal GP tests.
* The system_state test is approximate, but works well in practice.
*/
while (!gp_exp && system_state != SYSTEM_RUNNING)
schedule_timeout_uninterruptible(1);
t = ktime_get_mono_fast_ns();
if (atomic_inc_return(&n_rcu_perf_writer_started) >= nrealwriters) {
t_rcu_perf_writer_started = t;
...
@@ -161,6 +161,7 @@ static atomic_long_t n_rcu_torture_timers;
static long n_barrier_attempts;
static long n_barrier_successes; /* did rcu_barrier test succeed? */
static struct list_head rcu_torture_removed;
static unsigned long shutdown_jiffies;
static int rcu_torture_writer_state;
#define RTWS_FIXED_DELAY 0
@@ -228,6 +229,15 @@ static u64 notrace rcu_trace_clock_local(void)
}
#endif /* #else #ifdef CONFIG_RCU_TRACE */
/*
* Stop aggressive CPU-hog tests a bit before the end of the test in order
* to avoid interfering with test shutdown.
*/
static bool shutdown_time_arrived(void)
{
return shutdown_secs && time_after(jiffies, shutdown_jiffies - 30 * HZ);
}
static unsigned long boost_starttime; /* jiffies of next boost test start. */
static DEFINE_MUTEX(boost_mutex); /* protect setting boost_starttime */
/* and boost task create/destroy. */
@@ -1713,12 +1723,14 @@ static void rcu_torture_fwd_cb_cr(struct rcu_head *rhp)
}
// Give the scheduler a chance, even on nohz_full CPUs.
-static void rcu_torture_fwd_prog_cond_resched(void)
+static void rcu_torture_fwd_prog_cond_resched(unsigned long iter)
{
if (IS_ENABLED(CONFIG_PREEMPT) && IS_ENABLED(CONFIG_NO_HZ_FULL)) {
-if (need_resched())
+// Real call_rcu() floods hit userspace, so emulate that.
if (need_resched() || (iter & 0xfff))
schedule();
} else {
// No userspace emulation: CB invocation throttles call_rcu()
cond_resched();
}
}
@@ -1746,7 +1758,7 @@ static unsigned long rcu_torture_fwd_prog_cbfree(void)
spin_unlock_irqrestore(&rcu_fwd_lock, flags);
kfree(rfcp);
freed++;
-rcu_torture_fwd_prog_cond_resched();
+rcu_torture_fwd_prog_cond_resched(freed);
}
return freed;
}
@@ -1785,15 +1797,17 @@ static void rcu_torture_fwd_prog_nr(int *tested, int *tested_tries)
WRITE_ONCE(rcu_fwd_startat, jiffies);
stopat = rcu_fwd_startat + dur;
while (time_before(jiffies, stopat) &&
!shutdown_time_arrived() &&
!READ_ONCE(rcu_fwd_emergency_stop) && !torture_must_stop()) {
idx = cur_ops->readlock();
udelay(10);
cur_ops->readunlock(idx);
if (!fwd_progress_need_resched || need_resched())
-rcu_torture_fwd_prog_cond_resched();
+rcu_torture_fwd_prog_cond_resched(1);
}
(*tested_tries)++;
if (!time_before(jiffies, stopat) &&
!shutdown_time_arrived() &&
!READ_ONCE(rcu_fwd_emergency_stop) && !torture_must_stop()) {
(*tested)++;
cver = READ_ONCE(rcu_torture_current_version) - cver;
@@ -1852,6 +1866,7 @@ static void rcu_torture_fwd_prog_cr(void)
gps = cur_ops->get_gp_seq();
rcu_launder_gp_seq_start = gps;
while (time_before(jiffies, stopat) &&
!shutdown_time_arrived() &&
!READ_ONCE(rcu_fwd_emergency_stop) && !torture_must_stop()) {
rfcp = READ_ONCE(rcu_fwd_cb_head);
rfcpn = NULL;
@@ -1875,7 +1890,7 @@ static void rcu_torture_fwd_prog_cr(void)
rfcp->rfc_gps = 0;
}
cur_ops->call(&rfcp->rh, rcu_torture_fwd_cb_cr);
-rcu_torture_fwd_prog_cond_resched();
+rcu_torture_fwd_prog_cond_resched(n_launders + n_max_cbs);
}
stoppedat = jiffies;
n_launders_cb_snap = READ_ONCE(n_launders_cb);
@@ -1884,7 +1899,8 @@ static void rcu_torture_fwd_prog_cr(void)
cur_ops->cb_barrier(); /* Wait for callbacks to be invoked. */
(void)rcu_torture_fwd_prog_cbfree();
-if (!torture_must_stop() && !READ_ONCE(rcu_fwd_emergency_stop)) {
+if (!torture_must_stop() && !READ_ONCE(rcu_fwd_emergency_stop) &&
!shutdown_time_arrived()) {
WARN_ON(n_max_gps < MIN_FWD_CBS_LAUNDERED);
pr_alert("%s Duration %lu barrier: %lu pending %ld n_launders: %ld n_launders_sa: %ld n_max_gps: %ld n_max_cbs: %ld cver %ld gps %ld\n",
__func__,
@@ -2465,6 +2481,7 @@ rcu_torture_init(void)
goto unwind;
rcutor_hp = firsterr;
}
shutdown_jiffies = jiffies + shutdown_secs * HZ;
firsterr = torture_shutdown_init(shutdown_secs, rcu_torture_cleanup);
if (firsterr)
goto unwind;
...
@@ -1279,8 +1279,9 @@ void srcu_torture_stats_print(struct srcu_struct *ssp, char *tt, char *tf)
c0 = l0 - u0;
c1 = l1 - u1;
-pr_cont(" %d(%ld,%ld %1p)",
+pr_cont(" %d(%ld,%ld %c)",
-cpu, c0, c1, rcu_segcblist_head(&sdp->srcu_cblist));
+cpu, c0, c1,
"C."[rcu_segcblist_empty(&sdp->srcu_cblist)]);
s0 += c0;
s1 += c1;
}
...
@@ -781,7 +781,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
* other hand, if the CPU is not in an RCU read-side critical section,
* the IPI handler reports the quiescent state immediately.
*
-* Although this is a greate improvement over previous expedited
+* Although this is a great improvement over previous expedited
* implementations, it is still unfriendly to real-time workloads, so is
* thus not recommended for any sort of common-case code. In fact, if
* you are using synchronize_rcu_expedited() in a loop, please restructure
@@ -792,6 +792,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
*/
void synchronize_rcu_expedited(void)
{
bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
struct rcu_exp_work rew;
struct rcu_node *rnp;
unsigned long s;
@@ -817,7 +818,7 @@ void synchronize_rcu_expedited(void)
return; /* Someone else did our work for us. */
/* Ensure that load happens before action based on it. */
-if (unlikely(rcu_scheduler_active == RCU_SCHEDULER_INIT)) {
+if (unlikely(boottime)) {
/* Direct call during scheduler init and early_initcalls(). */
rcu_exp_sel_wait_wake(s);
} else {
@@ -835,5 +836,8 @@ void synchronize_rcu_expedited(void)
/* Let the next expedited grace period start. */
mutex_unlock(&rcu_state.exp_mutex);
if (likely(!boottime))
destroy_work_on_stack(&rew.rew_work);
}
EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
@@ -288,7 +288,6 @@ void rcu_note_context_switch(bool preempt)
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
struct rcu_node *rnp;
barrier(); /* Avoid RCU read-side critical sections leaking down. */
trace_rcu_utilization(TPS("Start context switch"));
lockdep_assert_irqs_disabled();
WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0);
@@ -331,7 +330,6 @@ void rcu_note_context_switch(bool preempt)
if (rdp->exp_deferred_qs)
rcu_report_exp_rdp(rdp);
trace_rcu_utilization(TPS("End context switch"));
barrier(); /* Avoid RCU read-side critical sections leaking up. */
}
EXPORT_SYMBOL_GPL(rcu_note_context_switch);
@@ -815,11 +813,6 @@ static void rcu_qs(void)
* dyntick-idle quiescent state visible to other CPUs, which will in
* some cases serve for expedited as well as normal grace periods.
* Either way, register a lightweight quiescent state.
*
* The barrier() calls are redundant in the common case when this is
* called externally, but just in case this is called from within this
* file.
*
*/
void rcu_all_qs(void)
{
@@ -834,14 +827,12 @@ void rcu_all_qs(void)
return;
}
this_cpu_write(rcu_data.rcu_urgent_qs, false);
barrier(); /* Avoid RCU read-side critical sections leaking down. */
if (unlikely(raw_cpu_read(rcu_data.rcu_need_heavy_qs))) {
local_irq_save(flags);
rcu_momentary_dyntick_idle();
local_irq_restore(flags);
}
rcu_qs();
barrier(); /* Avoid RCU read-side critical sections leaking up. */
preempt_enable();
}
EXPORT_SYMBOL_GPL(rcu_all_qs);
@@ -851,7 +842,6 @@ EXPORT_SYMBOL_GPL(rcu_all_qs);
*/
void rcu_note_context_switch(bool preempt)
{
barrier(); /* Avoid RCU read-side critical sections leaking down. */
trace_rcu_utilization(TPS("Start context switch"));
rcu_qs();
/* Load rcu_urgent_qs before other flags. */
@@ -864,7 +854,6 @@ void rcu_note_context_switch(bool preempt)
rcu_tasks_qs(current);
out:
trace_rcu_utilization(TPS("End context switch"));
barrier(); /* Avoid RCU read-side critical sections leaking up. */
}
EXPORT_SYMBOL_GPL(rcu_note_context_switch);
@@ -1121,7 +1110,7 @@ static void rcu_preempt_boost_start_gp(struct rcu_node *rnp)
* already exist. We only create this kthread for preemptible RCU.
* Returns zero if all is well, a negated errno otherwise.
*/
-static int rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
+static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
{
int rnp_index = rnp - rcu_get_root();
unsigned long flags;
@@ -1129,25 +1118,27 @@ static int rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
struct task_struct *t;
if (!IS_ENABLED(CONFIG_PREEMPT_RCU))
-return 0;
+return;
if (!rcu_scheduler_fully_active || rcu_rnp_online_cpus(rnp) == 0)
-return 0;
+return;
rcu_state.boost = 1;
if (rnp->boost_kthread_task != NULL)
-return 0;
+return;
t = kthread_create(rcu_boost_kthread, (void *)rnp,
"rcub/%d", rnp_index);
-if (IS_ERR(t))
+if (WARN_ON_ONCE(IS_ERR(t)))
-return PTR_ERR(t);
+return;
raw_spin_lock_irqsave_rcu_node(rnp, flags);
rnp->boost_kthread_task = t;
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
sp.sched_priority = kthread_prio;
sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
wake_up_process(t); /* get to TASK_INTERRUPTIBLE quickly. */
return 0;
}
/*
@@ -1188,7 +1179,7 @@ static void __init rcu_spawn_boost_kthreads(void)
struct rcu_node *rnp;
rcu_for_each_leaf_node(rnp)
-(void)rcu_spawn_one_boost_kthread(rnp);
+rcu_spawn_one_boost_kthread(rnp);
}
static void rcu_prepare_kthreads(int cpu)
@@ -1198,7 +1189,7 @@ static void rcu_prepare_kthreads(int cpu)
/* Fire up the incoming CPU's kthread and leaf rcu_node kthread. */
if (rcu_scheduler_fully_active)
-(void)rcu_spawn_one_boost_kthread(rnp);
+rcu_spawn_one_boost_kthread(rnp);
}
#else /* #ifdef CONFIG_RCU_BOOST */
...
@@ -527,6 +527,8 @@ static void check_cpu_stall(struct rcu_data *rdp)
/* We haven't checked in, so go dump stack. */
print_cpu_stall();
if (rcu_cpu_stall_ftrace_dump)
rcu_ftrace_dump(DUMP_ALL);
} else if (rcu_gp_in_progress() &&
ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
@@ -534,6 +536,8 @@ static void check_cpu_stall(struct rcu_data *rdp)
/* They had a few time units to dump stack, so complain. */
print_other_cpu_stall(gs2);
if (rcu_cpu_stall_ftrace_dump)
rcu_ftrace_dump(DUMP_ALL);
}
}
...
@@ -61,9 +61,15 @@ module_param(rcu_normal_after_boot, int, 0);
#ifdef CONFIG_DEBUG_LOCK_ALLOC
/**
-* rcu_read_lock_sched_held() - might we be in RCU-sched read-side critical section?
+* rcu_read_lock_held_common() - might we be in RCU-sched read-side critical section?
* @ret: Best guess answer if lockdep cannot be relied on
*
-* If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an
+* Returns true if lockdep must be ignored, in which case *ret contains
* the best guess described below. Otherwise returns false, in which
* case *ret tells the caller nothing and the caller should instead
* consult lockdep.
*
* If CONFIG_DEBUG_LOCK_ALLOC is selected, set *ret to nonzero iff in an
* RCU-sched read-side critical section. In absence of
* CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side
* critical section unless it can prove otherwise. Note that disabling
@@ -75,35 +81,45 @@ module_param(rcu_normal_after_boot, int, 0);
* Check debug_lockdep_rcu_enabled() to prevent false positives during boot
* and while lockdep is disabled.
*
-* Note that if the CPU is in the idle loop from an RCU point of
+* Note that if the CPU is in the idle loop from an RCU point of view (ie:
-* view (ie: that we are in the section between rcu_idle_enter() and
+* that we are in the section between rcu_idle_enter() and rcu_idle_exit())
-* rcu_idle_exit()) then rcu_read_lock_held() returns false even if the CPU
+* then rcu_read_lock_held() sets *ret to false even if the CPU did an
-* did an rcu_read_lock(). The reason for this is that RCU ignores CPUs
+* rcu_read_lock(). The reason for this is that RCU ignores CPUs that are
-* that are in such a section, considering these as in extended quiescent
+* in such a section, considering these as in extended quiescent state,
-* state, so such a CPU is effectively never in an RCU read-side critical
+* so such a CPU is effectively never in an RCU read-side critical section
-* section regardless of what RCU primitives it invokes. This state of
+* regardless of what RCU primitives it invokes. This state of affairs is
-* affairs is required --- we need to keep an RCU-free window in idle
+* required --- we need to keep an RCU-free window in idle where the CPU may
-* where the CPU may possibly enter into low power mode. This way we can
+* possibly enter into low power mode. This way we can notice an extended
-* notice an extended quiescent state to other CPUs that started a grace
+* quiescent state to other CPUs that started a grace period. Otherwise
-* period. Otherwise we would delay any grace period as long as we run in
+* we would delay any grace period as long as we run in the idle task.
-* the idle task.
+*
-*
+* Similarly, we avoid claiming an RCU read lock held if the current
* Similarly, we avoid claiming an SRCU read lock held if the current
* CPU is offline.
*/
static bool rcu_read_lock_held_common(bool *ret)
{
if (!debug_lockdep_rcu_enabled()) {
*ret = 1;
return true;
}
if (!rcu_is_watching()) {
*ret = 0;
return true;
}
if (!rcu_lockdep_current_cpu_online()) {
*ret = 0;
return true;
}
return false;
}
int rcu_read_lock_sched_held(void)
{
-int lockdep_opinion = 0;
+bool ret;
-if (!debug_lockdep_rcu_enabled())
+if (rcu_read_lock_held_common(&ret))
-return 1;
+return ret;
-if (!rcu_is_watching())
+return lock_is_held(&rcu_sched_lock_map) || !preemptible();
return 0;
if (!rcu_lockdep_current_cpu_online())
return 0;
if (debug_locks)
lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
return lockdep_opinion || !preemptible();
}
EXPORT_SYMBOL(rcu_read_lock_sched_held);
#endif
@@ -136,8 +152,7 @@ static atomic_t rcu_expedited_nesting = ATOMIC_INIT(1);
*/
bool rcu_gp_is_expedited(void)
{
-return rcu_expedited || atomic_read(&rcu_expedited_nesting) ||
+return rcu_expedited || atomic_read(&rcu_expedited_nesting);
rcu_scheduler_active == RCU_SCHEDULER_INIT;
}
EXPORT_SYMBOL_GPL(rcu_gp_is_expedited);
@@ -261,12 +276,10 @@ NOKPROBE_SYMBOL(debug_lockdep_rcu_enabled);
*/
int rcu_read_lock_held(void)
{
-if (!debug_lockdep_rcu_enabled())
+bool ret;
return 1;
-if (!rcu_is_watching())
+if (rcu_read_lock_held_common(&ret))
-return 0;
+return ret;
if (!rcu_lockdep_current_cpu_online())
return 0;
return lock_is_held(&rcu_lock_map);
}
EXPORT_SYMBOL_GPL(rcu_read_lock_held);
@@ -288,16 +301,28 @@ EXPORT_SYMBOL_GPL(rcu_read_lock_held);
*/
int rcu_read_lock_bh_held(void)
{
-if (!debug_lockdep_rcu_enabled())
+bool ret;
return 1;
-if (!rcu_is_watching())
+if (rcu_read_lock_held_common(&ret))
-return 0;
+return ret;
if (!rcu_lockdep_current_cpu_online())
return 0;
return in_softirq() || irqs_disabled();
}
EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
int rcu_read_lock_any_held(void)
{
bool ret;
if (rcu_read_lock_held_common(&ret))
return ret;
if (lock_is_held(&rcu_lock_map) ||
lock_is_held(&rcu_bh_lock_map) ||
lock_is_held(&rcu_sched_lock_map))
return 1;
return !preemptible();
}
EXPORT_SYMBOL_GPL(rcu_read_lock_any_held);
#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
/**
@@ -437,6 +462,8 @@ EXPORT_SYMBOL_GPL(rcutorture_sched_setaffinity);
#endif
#ifdef CONFIG_RCU_STALL_COMMON
int rcu_cpu_stall_ftrace_dump __read_mostly;
module_param(rcu_cpu_stall_ftrace_dump, int, 0644);
int rcu_cpu_stall_suppress __read_mostly; /* 1 = suppress stall warnings. */
EXPORT_SYMBOL_GPL(rcu_cpu_stall_suppress);
module_param(rcu_cpu_stall_suppress, int, 0644);
...
@@ -3486,8 +3486,36 @@ void scheduler_tick(void)
struct tick_work {
int cpu;
atomic_t state;
struct delayed_work work;
};
/* Values for ->state, see diagram below. */
#define TICK_SCHED_REMOTE_OFFLINE 0
#define TICK_SCHED_REMOTE_OFFLINING 1
#define TICK_SCHED_REMOTE_RUNNING 2
/*
* State diagram for ->state:
*
*
* TICK_SCHED_REMOTE_OFFLINE
* | ^
* | |
* | | sched_tick_remote()
* | |
* | |
* +--TICK_SCHED_REMOTE_OFFLINING
* | ^
* | |
* sched_tick_start() | | sched_tick_stop()
* | |
* V |
* TICK_SCHED_REMOTE_RUNNING
*
*
* Other transitions get WARN_ON_ONCE(), except that sched_tick_remote()
* and sched_tick_start() are happy to leave the state in RUNNING.
*/
static struct tick_work __percpu *tick_work_cpu;
@@ -3500,6 +3528,7 @@ static void sched_tick_remote(struct work_struct *work)
struct task_struct *curr;
struct rq_flags rf;
u64 delta;
int os;
/*
* Handle the tick only if it appears the remote CPU is running in full
@@ -3513,7 +3542,7 @@ static void sched_tick_remote(struct work_struct *work)
rq_lock_irq(rq, &rf);
curr = rq->curr;
-if (is_idle_task(curr))
+if (is_idle_task(curr) || cpu_is_offline(cpu))
goto out_unlock;
update_rq_clock(rq);
@@ -3533,13 +3562,18 @@ static void sched_tick_remote(struct work_struct *work)
/*
* Run the remote tick once per second (1Hz). This arbitrary
* frequency is large enough to avoid overload but short enough
-* to keep scheduler internal stats reasonably up to date.
+* to keep scheduler internal stats reasonably up to date. But
* first update state to reflect hotplug activity if required.
*/
os = atomic_fetch_add_unless(&twork->state, -1, TICK_SCHED_REMOTE_RUNNING);
WARN_ON_ONCE(os == TICK_SCHED_REMOTE_OFFLINE);
if (os == TICK_SCHED_REMOTE_RUNNING)
queue_delayed_work(system_unbound_wq, dwork, HZ);
}
static void sched_tick_start(int cpu)
{
int os;
struct tick_work *twork;
if (housekeeping_cpu(cpu, HK_FLAG_TICK))
@@ -3548,15 +3582,20 @@ static void sched_tick_start(int cpu)
WARN_ON_ONCE(!tick_work_cpu);
twork = per_cpu_ptr(tick_work_cpu, cpu);
os = atomic_xchg(&twork->state, TICK_SCHED_REMOTE_RUNNING);
WARN_ON_ONCE(os == TICK_SCHED_REMOTE_RUNNING);
if (os == TICK_SCHED_REMOTE_OFFLINE) {
twork->cpu = cpu;
INIT_DELAYED_WORK(&twork->work, sched_tick_remote);
queue_delayed_work(system_unbound_wq, &twork->work, HZ);
}
}
#ifdef CONFIG_HOTPLUG_CPU
static void sched_tick_stop(int cpu)
{
struct tick_work *twork;
int os;
if (housekeeping_cpu(cpu, HK_FLAG_TICK))
return;
@@ -3564,7 +3603,10 @@ static void sched_tick_stop(int cpu)
WARN_ON_ONCE(!tick_work_cpu);
twork = per_cpu_ptr(tick_work_cpu, cpu);
-cancel_delayed_work_sync(&twork->work);
+/* There cannot be competing actions, but don't rely on stop-machine. */
os = atomic_xchg(&twork->state, TICK_SCHED_REMOTE_OFFLINING);
WARN_ON_ONCE(os != TICK_SCHED_REMOTE_RUNNING);
/* Don't cancel, as this would mess up the state machine. */
}
#endif /* CONFIG_HOTPLUG_CPU */
@@ -3572,7 +3614,6 @@ int __init sched_tick_offload_init(void)
{
tick_work_cpu = alloc_percpu(struct tick_work);
BUG_ON(!tick_work_cpu);
return 0;
}
...
@@ -241,13 +241,14 @@ static void do_idle(void)
check_pgt_cache();
rmb();
local_irq_disable();
if (cpu_is_offline(cpu)) {
-tick_nohz_idle_stop_tick_protected();
+tick_nohz_idle_stop_tick();
cpuhp_report_idle_dead();
arch_cpu_idle_dead();
}
local_irq_disable();
arch_cpu_idle_enter();
/*
...
@@ -263,7 +263,6 @@ static void torture_onoff_cleanup(void)
onoff_task = NULL;
#endif /* #ifdef CONFIG_HOTPLUG_CPU */
}
EXPORT_SYMBOL_GPL(torture_onoff_cleanup);
/*
* Print online/offline testing statistics.
@@ -449,7 +448,6 @@ static void torture_shuffle_cleanup(void)
}
shuffler_task = NULL;
}
EXPORT_SYMBOL_GPL(torture_shuffle_cleanup);
/*
* Variables for auto-shutdown. This allows "lights out" torture runs
...
@@ -6,22 +6,22 @@
/*
* Traverse the ftrace_global_list, invoking all entries. The reason that we
-* can use rcu_dereference_raw_notrace() is that elements removed from this list
+* can use rcu_dereference_raw_check() is that elements removed from this list
* are simply leaked, so there is no need to interact with a grace-period
-* mechanism. The rcu_dereference_raw_notrace() calls are needed to handle
+* mechanism. The rcu_dereference_raw_check() calls are needed to handle
* concurrent insertions into the ftrace_global_list.
*
* Silly Alpha and silly pointer-speculation compiler optimizations!
*/
#define do_for_each_ftrace_op(op, list) \
-op = rcu_dereference_raw_notrace(list); \
+op = rcu_dereference_raw_check(list); \
do
/*
* Optimized for just a single item in the list (as that is the normal case).
*/
#define while_for_each_ftrace_op(op) \
-while (likely(op = rcu_dereference_raw_notrace((op)->next)) && \
+while (likely(op = rcu_dereference_raw_check((op)->next)) && \
unlikely((op) != &ftrace_list_end))
extern struct ftrace_ops __rcu *ftrace_ops_list;
...
@@ -2642,10 +2642,10 @@ static void ftrace_exports(struct ring_buffer_event *event)
preempt_disable_notrace();
-export = rcu_dereference_raw_notrace(ftrace_exports_list);
+export = rcu_dereference_raw_check(ftrace_exports_list);
while (export) {
trace_process_export(export, event);
-export = rcu_dereference_raw_notrace(export->next);
+export = rcu_dereference_raw_check(export->next);
}
preempt_enable_notrace();
...
@@ -124,7 +124,8 @@ struct fib_table *fib_get_table(struct net *net, u32 id)
h = id & (FIB_TABLE_HASHSZ - 1);
head = &net->ipv4.fib_table_hash[h];
-hlist_for_each_entry_rcu(tb, head, tb_hlist) {
+hlist_for_each_entry_rcu(tb, head, tb_hlist,
lockdep_rtnl_is_held()) {
if (tb->tb_id == id)
return tb;
}
...
@@ -227,7 +227,7 @@ then
must_continue=yes
fi
last_ts="`tail $resdir/console.log | grep '^\[ *[0-9]\+\.[0-9]\+]' | tail -1 | sed -e 's/^\[ *//' -e 's/\..*$//'`"
-if test -z "last_ts"
+if test -z "$last_ts"
then
last_ts=0
fi
...
@@ -3,3 +3,4 @@ rcutree.gp_preinit_delay=12
rcutree.gp_init_delay=3
rcutree.gp_cleanup_delay=3
rcutree.kthread_prio=2
threadirqs