- 21 Apr, 2017 5 commits
-
Paul E. McKenney authored
doc.2017.04.12a: Documentation updates
fixes.2017.04.19a: Miscellaneous fixes
srcu.2017.04.21a: Parallelize SRCU callback handling
-
Paul E. McKenney authored
Currently, a call to schedule() acts as a Tasks RCU quiescent state only if a context switch actually takes place. However, just the call to schedule() guarantees that the calling task has moved off of whatever tracing trampoline it might have been on previously. This commit therefore plumbs schedule()'s "preempt" parameter into rcu_note_context_switch(), which then records the Tasks RCU quiescent state, but only if this call to schedule() was -not- due to a preemption. To avoid adding overhead to the common-case context-switch path, this commit hides the rcu_note_context_switch() check under an existing non-common-case check. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
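A minimal sketch of the plumbing described above (surrounding bookkeeping elided; rcu_note_voluntary_context_switch_lite() is assumed here as the Tasks-RCU hook):

	/* Called from __schedule(), which now passes its "preempt" flag down. */
	void rcu_note_context_switch(bool preempt)
	{
		/* ... existing context-switch bookkeeping ... */
		if (!preempt)
			/* Voluntary schedule(): the task is off any tracing
			 * trampoline, so record a Tasks RCU quiescent state. */
			rcu_note_voluntary_context_switch_lite(current);
	}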
-
Paul E. McKenney authored
Although Tree SRCU does reduce delays when there is at least one synchronize_srcu_expedited() invocation pending, srcu_schedule_cbs_snp() still waits for SRCU_INTERVAL before invoking callbacks. Since synchronize_srcu_expedited() now posts a callback and waits for that callback to do a wakeup, this destroys the expedited nature of synchronize_srcu_expedited(). This destruction became apparent to Marc Zyngier in the guise of a guest-OS bootup slowdown from five seconds to no fewer than forty seconds. This commit therefore invokes callbacks immediately at the end of the grace period when there is at least one synchronize_srcu_expedited() invocation pending. This brought Marc's guest-OS bootup times back into the realm of reason. Reported-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Marc Zyngier <marc.zyngier@arm.com>
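The shape of the fix, as a sketch (srcu_exp_pending() is a hypothetical predicate standing in for however the implementation detects pending expedited requests):

	/* At grace-period end: skip the usual pause when an expedited
	 * synchronize_srcu_expedited() request is pending. */
	unsigned long delay = srcu_exp_pending(sp) ? 0 : SRCU_INTERVAL;

	srcu_schedule_cbs_snp(sp, snp, delay);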
-
Paul E. McKenney authored
Peter Zijlstra proposed using SRCU to reduce mmap_sem contention [1,2]; however, there are workloads that could result in a high volume of concurrent invocations of call_srcu(), which with current SRCU would result in excessive lock contention on the srcu_struct structure's ->queue_lock, which protects SRCU's callback lists. This commit therefore moves SRCU to per-CPU callback lists, thus greatly reducing contention.

Because a given SRCU instance no longer has a single centralized callback list, starting grace periods and invoking callbacks are both more complex than in the single-list Classic SRCU implementation. Starting grace periods and handling callbacks are now handled using an srcu_node tree that is in some ways similar to the rcu_node trees used by RCU-bh, RCU-preempt, and RCU-sched (for example, the srcu_node tree shape is controlled by exactly the same Kconfig options and boot parameters that control the shape of the rcu_node tree).

In addition, the old per-CPU srcu_array structure is now named srcu_data and contains an rcu_segcblist structure named ->srcu_cblist for its callbacks (and a spinlock to protect this). The srcu_struct gets an srcu_gp_seq that is used to associate callback segments with the corresponding completion-time grace-period number. These completion-time grace-period numbers are propagated up the srcu_node tree so that the grace-period workqueue handler can determine both whether additional grace periods are needed and where to look for callbacks that are ready to be invoked.

The srcu_barrier() function must now wait on all instances of the per-CPU ->srcu_cblist. Because each ->srcu_cblist is protected by ->lock, srcu_barrier() can remotely add the needed callbacks. In theory, it could also remotely start grace periods, but in practice doing so is complex and racy. And interestingly enough, it is never necessary for srcu_barrier() to start a grace period because srcu_barrier() only enqueues a callback when a callback is already present--and it turns out that a grace period has to have already been started for this pre-existing callback. Furthermore, it is only the callback that srcu_barrier() needs to wait on, not any particular grace period. Therefore, a new rcu_segcblist_entrain() function enqueues the srcu_barrier() function's callback into the same segment occupied by the last pre-existing callback in the list. The special case where all the pre-existing callbacks are on a different list (because they are in the process of being invoked) is handled by enqueuing srcu_barrier()'s callback into the RCU_DONE_TAIL segment, relying on the done-callbacks check that takes place after all callbacks are invoked.

The readers use the same algorithm as before. Note that there is a separate srcu_idx that tells the readers which counter to increment. This unfortunately cannot be combined with srcu_gp_seq because they need to be incremented at different times.

This commit introduces some ugly #ifdefs in rcutorture. These will go away when I feel good enough about Tree SRCU to ditch Classic SRCU.

Some crude performance comparisons, courtesy of a quickly hacked rcuperf asynchronous-grace-period capability:

	Callback Queuing Overhead
	-------------------------
	# CPUS		Classic SRCU	Tree SRCU
	------		------------	---------
	     2		    0.349 us	 0.342 us
	    16		    31.66 us	  0.4  us
	    41		   ---------	 0.417 us

The times are the 90th percentiles, a statistic that was chosen to reject the overheads of the occasional srcu_barrier() call needed to avoid OOMing the test machine. The rcuperf test hangs when running Classic SRCU at 41 CPUs, hence the line of dashes. Despite the hacks to both the rcuperf code and the statistics, this is a convincing demonstration of Tree SRCU's performance and scalability advantages.

[1] https://lwn.net/Articles/309030/
[2] https://patchwork.kernel.org/patch/5108281/

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Fix initialization if synchronize_srcu_expedited() called first. ]
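A sketch of rcu_segcblist_entrain() as described above (segment indices RCU_DONE_TAIL through RCU_NEXT_TAIL follow rcu_segcblist conventions; minor details are assumptions):

	bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
				   struct rcu_head *rhp, bool lazy)
	{
		int i;

		if (rcu_segcblist_n_cbs(rsclp) == 0)
			return false;	/* Empty list: caller need not wait. */
		WRITE_ONCE(rsclp->len, rsclp->len + 1);
		if (lazy)
			rsclp->len_lazy++;
		rhp->next = NULL;
		/* Find the segment holding the last pre-existing callback,
		 * falling back to RCU_DONE_TAIL if all segments are empty. */
		for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--)
			if (rsclp->tails[i] != rsclp->tails[i - 1])
				break;
		*rsclp->tails[i] = rhp;
		for (; i <= RCU_NEXT_TAIL; i++)
			rsclp->tails[i] = &rhp->next;
		return true;
	}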
-
Paul E. McKenney authored
Parallelizing SRCU callback handling will increase the size of srcu_struct, which will move the kvm structure's kvm_arch field out of reach of powerpc's current assembly code, which will result in the following sort of build error:

	arch/powerpc/kvm/book3s_hv_rmhandlers.S:617: Error: operand out of range (0x000000000000b328 is not between 0xffffffffffff8000 and 0x0000000000007fff)

This commit moves the srcu_struct fields in the kvm structure to follow the kvm_arch field, which will allow powerpc's assembly code to continue to be able to reach the kvm_arch field. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Michael Ellerman <michaele@au1.ibm.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Paolo Bonzini <pbonzini@redhat.com> [ paulmck: Moved this commit to precede SRCU callback parallelization, and reworded the commit log into future tense, all in the name of bisectability. ]
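A toy userspace illustration of the offset problem (all names and sizes here are made up; the point is that powerpc's load/store displacement field is a 16-bit signed immediate, so fields past +/-32KB from the base pointer are unreachable):

	#include <stdio.h>
	#include <stddef.h>

	struct big_srcu { char pad[48 * 1024]; };  /* Stand-in for a grown srcu_struct. */
	struct fake_arch { long regs[8]; };

	struct kvm_bad  { struct big_srcu srcu; struct fake_arch arch; };
	struct kvm_good { struct fake_arch arch; struct big_srcu srcu; };

	int main(void)
	{
		printf("bad:  arch at offset %zu (> 32767: asm cannot reach)\n",
		       offsetof(struct kvm_bad, arch));
		printf("good: arch at offset %zu\n",
		       offsetof(struct kvm_good, arch));
		return 0;
	}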
-
- 19 Apr, 2017 10 commits
-
Paul E. McKenney authored
This commit just changes a "the the" to "the" to reduce repetition. Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Nicholas Mc Guire authored
This commit makes the parse_rcu_nocb_poll() function assign true (rather than the constant 1) to the bool variable rcu_nocb_poll. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Nicholas Mc Guire authored
The beenonline variable is declared bool so there is no need for an explicit comparison, especially not against the constant zero. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
Currently, the rcutorture scripting will give an error message if running a duplicate scenario that happens also to have a non-existent build directory (b1, b2, ... in the rcutorture directory). Worse yet, if the build directory has already been created and used for a real build, the script will silently grab the wrong Kconfig fragment, which could cause confusion to the poor sap (me) analyzing old test results. At least the actual test runs correctly... This commit therefore accesses the Kconfig fragment from the results directory corresponding to the first of the duplicate scenarios, for which a build was actually carried out. This prevents both the messages and at least one form of later confusion. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Michael S. Tsirkin authored
sparse is unhappy about this code in hlist_add_tail_rcu:

	struct hlist_node *i, *last = NULL;

	for (i = hlist_first_rcu(h); i; i = hlist_next_rcu(i))
		last = i;

This is because hlist_first_rcu() and hlist_next_rcu() return __rcu pointers. It's a false positive: hlist_add_tail_rcu() is a write-side primitive and so does not need to be called in a read-side critical section. The following trivial patch disables the warning without changing the behaviour in any way. Note: __hlist_for_each_rcu would also remove the warning, but it would be confusing, since it calls rcu_dereference and is designed to run in the rcu read side critical section. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
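The resulting loop, per the patch description (a sketch of the change):

	struct hlist_node *i, *last = NULL;

	/* Note: write side code, so rcu accessors are not needed. */
	for (i = h->first; i; i = i->next)
		last = i;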
-
Paul E. McKenney authored
The rcu_nocb_gp_cleanup() function is now invoked elsewhere, so this commit drags this comment into the year 2017. Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
If you set RCU_FANOUT_LEAF too high, you can get lock contention on the leaf rcu_node, and you should boot with the skew_tick kernel parameter set in order to avoid this lock contention. This commit therefore upgrades the RCU_FANOUT_LEAF help text to explicitly state this. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
The comment header for callback_head (and thus for rcu_head) states that the bottom two bits of a pointer to these structures must be zero. This is obsolete: The new requirement is that only the bottom bit need be zero. This commit therefore updates this comment. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
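For reference, the structure in question, reproduced from memory of include/linux/types.h around this time (the alignment attribute is what keeps the low-order pointer bit clear and thus available):

	struct callback_head {
		struct callback_head *next;
		void (*func)(struct callback_head *head);
	} __attribute__((aligned(sizeof(void *))));
	#define rcu_head callback_head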
-
Paul E. McKenney authored
This commit changes lockdep splats to begin lines with "WARNING" and to use pr_warn() instead of printk(). This change eases scripted analysis of kernel console output. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
-
- 18 Apr, 2017 25 commits
-
Paul E. McKenney authored
A group of Linux kernel hackers reported chasing a bug that resulted from their assumption that SLAB_DESTROY_BY_RCU provided an existence guarantee, that is, that no block from such a slab would be reallocated during an RCU read-side critical section. Of course, that is not the case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire slab of blocks. However, there is a phrase for this, namely "type safety". This commit therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order to avoid future instances of this sort of confusion. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <linux-mm@kvack.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> [ paulmck: Add comments mentioning the old name, as requested by Eric Dumazet, in order to help people familiar with the old name find the new one. ] Acked-by: David Rientjes <rientjes@google.com>
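The usage pattern this implies, as a sketch with assumed types and helpers (lookup_conn(), put_conn(), and the conn layout are illustrative only): because a block can be freed and reallocated as another object of the same type within a read-side critical section, readers must revalidate identity after acquiring a reference.

	struct conn *c;

	rcu_read_lock();
	c = lookup_conn(key);	/* Cache created with SLAB_TYPESAFE_BY_RCU. */
	if (c && !atomic_inc_not_zero(&c->refcnt))
		c = NULL;	/* Being freed, though the memory is still a conn. */
	else if (c && c->key != key) {
		put_conn(c);	/* Block was reused for a different connection. */
		c = NULL;
	}
	rcu_read_unlock();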
-
Paul E. McKenney authored
The TREE_SRCU rewrite is large and a bit on the non-simple side, so this commit helps reduce risk by allowing the old v4.11 SRCU algorithm to be selected using a new CLASSIC_SRCU Kconfig option that depends on RCU_EXPERT. The default is to use the new TREE_SRCU and TINY_SRCU algorithms, in order to help them get the testing that they need. However, if your users do not require the update-side scalability that is to be provided by TREE_SRCU, select RCU_EXPERT and then CLASSIC_SRCU to revert to the old classic SRCU algorithm. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
The srcu_torture_stats() function is adapted to the specific srcu_struct layout traditionally used by SRCU. This commit therefore adds support for Tiny SRCU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
In response to automated complaints about modifications to SRCU increasing its size, this commit creates a tiny SRCU that is used in SMP=n && PREEMPT=n builds. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
The MM-notifier code currently dynamically initializes the srcu_struct named "srcu" at subsys_initcall() time, and includes a BUG_ON() to check this initialization in do_mmu_notifier_register(). Unfortunately, there is no foolproof way to verify that an srcu_struct has been initialized, given the possibility of an srcu_struct being allocated on the stack or on the heap. This means that creating an srcu_struct_is_initialized() function is not a reasonable course of action. Nor is peppering do_mmu_notifier_register() with SRCU-specific #ifdefs an attractive alternative. This commit therefore uses DEFINE_STATIC_SRCU() to initialize this srcu_struct at compile time, thus eliminating both the subsys_initcall()-time initialization and the runtime BUG_ON(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <linux-mm@kvack.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Vegard Nossum <vegard.nossum@oracle.com>
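The shape of the change, sketched (the srcu_struct really is named "srcu" per the text above; the initcall details are assumptions):

	/* Before: runtime initialization, not usable until subsys_initcall(). */
	static struct srcu_struct srcu;

	static int __init mmu_notifier_init(void)
	{
		return init_srcu_struct(&srcu);
	}
	subsys_initcall(mmu_notifier_init);

	/* After: compile-time initialization, usable from early boot onward. */
	DEFINE_STATIC_SRCU(srcu);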
-
Paul E. McKenney authored
SRCU's implementation of expedited grace periods has always assumed that the SRCU instance is idle when the expedited request arrives. This commit improves this a bit by maintaining a count of the number of outstanding expedited requests, thus allowing prior non-expedited grace periods to accommodate these requests by shifting to expedited mode. However, any non-expedited wait already in progress will still wait for the full duration. Improved control of expedited grace periods is planned, but one step at a time. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
Updating ->srcu_state and ->srcu_gp_seq will lead to extremely complex race conditions given multiple callback queues, so this commit takes advantage of the two-bit state now available in rcu_seq counters to store the state in the bottom two bits of ->srcu_gp_seq. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
This commit increases the number of reserved bits at the bottom of an rcu_seq grace-period counter from one to two, as will be needed to accommodate SRCU's three-state grace periods. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
The expedited grace-period code contains several open-coded shifts that know the format of an rcu_seq grace-period counter, which is not particularly good style. This commit therefore creates a new rcu_seq_ctr() function that extracts the counter portion, and an rcu_seq_state() function that extracts the low-order state bit. This commit prepares for SRCU callback parallelization, which will require two state bits. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
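A sketch of the two accessors (one reserved state bit at this point in the series; the commit above widens RCU_SEQ_CTR_SHIFT to two):

	#define RCU_SEQ_CTR_SHIFT	1
	#define RCU_SEQ_STATE_MASK	((1 << RCU_SEQ_CTR_SHIFT) - 1)

	/* Counter portion of an rcu_seq grace-period counter. */
	static inline unsigned long rcu_seq_ctr(unsigned long s)
	{
		return s >> RCU_SEQ_CTR_SHIFT;
	}

	/* Low-order state bit(s). */
	static inline int rcu_seq_state(unsigned long s)
	{
		return s & RCU_SEQ_STATE_MASK;
	}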
-
Paul E. McKenney authored
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
This commit makes the num_rcu_lvl[] array external so that SRCU can make use of it for initializing its upcoming srcu_node tree. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
This commit moves rcu_for_each_node_breadth_first(), rcu_for_each_nonleaf_node_breadth_first(), and rcu_for_each_leaf_node() from kernel/rcu/tree.h to kernel/rcu/rcu.h so that SRCU can access them. This commit is code-movement only. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
The levelcnt[] array is identical to num_rcu_lvl[], so this commit removes levelcnt[]. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
This commit moves the rcu_init_levelspread() function from kernel/rcu/tree.c to kernel/rcu/rcu.h so that SRCU can access it. This is another step towards enabling SRCU to create its own combining tree. This commit is code-movement only, give or take knock-on adjustments. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
This commit moves the C preprocessor code that defines the default shape of the rcu_node combining tree to a new include/linux/rcu_node_tree.h file as a first step towards enabling SRCU to create its own combining tree, which in turn enables SRCU to implement per-CPU callback handling, thus avoiding contention on the lock currently guarding the single list of callbacks. Note that users of SRCU still need to know the size of the srcu_struct structure, hence include/linux rather than kernel/rcu. This commit is code-movement only. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
This commit switches SRCU from custom-built callback queues to the new rcu_segcblist structure. This change associates grace-period sequence numbers with groups of callbacks, which will be needed for efficient processing of per-CPU callbacks. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
This commit adds grace-period sequence numbers, which will be used to handle mid-boot grace periods and per-CPU callback lists. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
The current SRCU grace-period processing might never reach the last portion of srcu_advance_batches(). This is OK given the current implementation, as the first portion, up to the try_check_zero() following the srcu_flip(), is sufficient to drive grace periods forward. However, it has the unfortunate side-effect of making it impossible to determine when a given grace period has ended, and it will be necessary to efficiently trace ends of grace periods in order to efficiently handle per-CPU SRCU callback lists. This commit therefore adds states to the SRCU grace-period processing, so that the end of a given SRCU grace period is marked by the transition to the SRCU_STATE_DONE state. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
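The resulting state space, with names assumed to be consistent with the commit text (only SRCU_STATE_DONE is named above):

	#define SRCU_STATE_IDLE		0	/* No grace period in flight. */
	#define SRCU_STATE_SCAN1	1	/* Waiting for pre-flip readers. */
	#define SRCU_STATE_SCAN2	2	/* Waiting for post-flip readers. */
	#define SRCU_STATE_DONE		3	/* Grace period has ended. */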
-
Paul E. McKenney authored
This commit simplifies the SRCU state machine by pushing the srcu_advance_batches() idle-SRCU fastpath into the common case. This is done by giving srcu_reschedule() a delay parameter, which is zero in the call from srcu_advance_batches(). This commit is a step towards numbering callbacks in order to efficiently handle per-CPU callback lists. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Dmitry Vyukov authored
The rcu_seq_end() function increments seq, signifying completion of a grace period, and after that checks that the seq is even and wakes _synchronize_rcu_expedited(). The _synchronize_rcu_expedited() function uses wait_event() to wait for an even seq. The problem is that wait_event() can return as soon as seq becomes even, without waiting for the wakeup. In that case the warning in rcu_seq_end() can falsely fire if the next expedited grace period starts before the check. The fix is to check that seq has a good value before incrementing it. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: syzkaller@googlegroups.com Cc: linux-kernel@vger.kernel.org Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: josh@joshtriplett.org Cc: jiangshanlai@gmail.com Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

---

syzkaller-triggered warning:

	WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533 rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
	CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
	Workqueue: events wait_rcu_exp_gp
	Call Trace:
	 __dump_stack lib/dump_stack.c:15 [inline]
	 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
	 panic+0x1fb/0x412 kernel/panic.c:179
	 __warn+0x1c4/0x1e0 kernel/panic.c:540
	 warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
	 rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
	 rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline]
	 rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517
	 rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline]
	 wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570
	 process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096
	 worker_thread+0x223/0x19c0 kernel/workqueue.c:2230
	 kthread+0x326/0x3f0 kernel/kthread.c:227
	 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
---
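The fix in sketch form: perform the sanity check before the increment, so it examines the odd (in-progress) value rather than racing with a subsequent grace period.

	static void rcu_seq_end(unsigned long *sp)
	{
		smp_mb(); /* Ensure update-side operation before counter increment. */
		WARN_ON_ONCE(!(*sp & 0x1));	/* Moved to precede the increment. */
		WRITE_ONCE(*sp, *sp + 1);
	}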
-
Paul E. McKenney authored
Expedited grace periods use workqueue handlers that wake up the requesters, but there is no lock mediating this wakeup. Therefore, memory barriers are required to ensure that the handler's memory references are seen by all to occur before synchronize_*_expedited() returns to its caller. Possibly detected by syzkaller. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
This commit moves rcu_seq_start(), rcu_seq_end(), rcu_seq_snap(), and rcu_seq_done() from kernel/rcu/tree.c to kernel/rcu/rcu.h. This will allow SRCU to use these functions, which in turn will allow SRCU to move from a single global callback queue to a per-CPU callback queue. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
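Sketches of two of the moved helpers, as they stood with a single state bit (so grace-period numbers advance by two per cycle):

	/* Snapshot the number of the grace period that will fully follow
	 * all updates made so far by this CPU. */
	static inline unsigned long rcu_seq_snap(unsigned long *sp)
	{
		unsigned long s;

		s = (READ_ONCE(*sp) + 3) & ~0x1;
		smp_mb(); /* Above access must not bleed into critical section. */
		return s;
	}

	/* Has the grace period captured by snapshot s completed? */
	static inline bool rcu_seq_done(unsigned long *sp, unsigned long s)
	{
		return ULONG_CMP_GE(READ_ONCE(*sp), s);
	}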
-
Paul E. McKenney authored
This commit adds single-element dequeue functions to rcu_segcblist. These are less efficient than using the extract and insert functions, but allow more precise debugging code. These functions are thus expected to be used only in debug builds, for example, CONFIG_PROVE_RCU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
This commit checks for pre-scheduler state, and if the system is that early in the boot process, synchronize_srcu() and friends are no-ops. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
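A sketch of the check, with its placement in synchronize_srcu() assumed (the rcu_scheduler_active flag already exists for the analogous RCU case):

	void synchronize_srcu(struct srcu_struct *sp)
	{
		/* Pre-scheduler boot: one task, so no reader can be inside
		 * an SRCU read-side critical section; the GP is a no-op. */
		if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
			return;
		/* ... normal grace-period processing ... */
	}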
-
Paul E. McKenney authored
This is primarily a code-movement commit in preparation for allowing SRCU to handle early-boot SRCU grace periods. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-