Commit a5fb42e6 authored by Paul E. McKenney, committed by Linus Torvalds

[PATCH] RCU documentation

Signed-off-by: <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
parent c4afd441
RCU on Uniprocessor Systems
A common misconception is that, on UP systems, the call_rcu() primitive
may immediately invoke its function, and that the synchronize_kernel()
primitive may return immediately. The basis of this misconception
is that since there is only one CPU, it should not be necessary to
wait for anything else to get done, since there are no other CPUs for
anything else to be happening on. Although this approach will sort of
work a surprising amount of the time, it is a very bad idea in general.
This document presents two examples that demonstrate exactly how bad an
idea this is.
Example 1: softirq Suicide
Suppose that an RCU-based algorithm scans a linked list containing
elements A, B, and C in process context, and can delete elements from
this same list in softirq context. Suppose that the process-context scan
is referencing element B when it is interrupted by softirq processing,
which deletes element B, and then invokes call_rcu() to free element B
after a grace period.
Now, if call_rcu() were to directly invoke its arguments, then upon return
from softirq, the list scan would find itself referencing a newly freed
element B. This situation can greatly decrease the life expectancy of
your kernel.
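To make this concrete, here is a minimal sketch of the two contexts
involved. The struct foo, foo_list, foo_lock, and the free_foo()
callback are all invented for illustration, and the sketch assumes the
call_rcu() interface that passes the embedded struct rcu_head to the
callback (older interfaces passed a separate argument instead):

	#include <linux/kernel.h>
	#include <linux/list.h>
	#include <linux/rcupdate.h>
	#include <linux/slab.h>
	#include <linux/spinlock.h>

	struct foo {
		struct list_head list;
		struct rcu_head rcu;
		int key;
	};

	static LIST_HEAD(foo_list);
	static DEFINE_SPINLOCK(foo_lock);	/* serializes updaters */

	/* Deferred free, invoked only after a grace period. */
	static void free_foo(struct rcu_head *head)
	{
		kfree(container_of(head, struct foo, rcu));
	}

	/* Process-context scan, which softirq may interrupt at any point. */
	static void scan_foo(void)
	{
		struct foo *p;

		rcu_read_lock();
		list_for_each_entry_rcu(p, &foo_list, list)
			printk(KERN_INFO "key %d\n", p->key);
		rcu_read_unlock();
	}

	/* Softirq-context deletion of element p. */
	static void delete_foo(struct foo *p)
	{
		spin_lock(&foo_lock);
		list_del_rcu(&p->list);
		spin_unlock(&foo_lock);
		call_rcu(&p->rcu, free_foo);	/* must not free p immediately */
	}

If call_rcu() invoked free_foo() immediately, a softirq running
delete_foo() in the middle of scan_foo() would free the very element
that the interrupted scan is still referencing.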
Example 2: Function-Call Fatality
Of course, one could avert the suicide described in the preceding example
by having call_rcu() directly invoke its arguments only if it was called
from process context. However, this can fail in a similar manner.
Suppose that an RCU-based algorithm again scans a linked list containing
elements A, B, and C in process context, but that it invokes a function
on each element as it is scanned. Suppose further that this function
deletes element B from the list, then passes it to call_rcu() for deferred
freeing. This may be a bit unconventional, but it is perfectly legal
RCU usage, since call_rcu() must wait for a grace period to elapse.
Therefore, in this case, allowing call_rcu() to immediately invoke
its arguments would cause it to fail to make the fundamental guarantee
underlying RCU, namely that call_rcu() defers invoking its arguments until
all RCU read-side critical sections currently executing have completed.
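A sketch of such a scan follows; the structure, list, and lock names
are invented, and the call_rcu() interface is assumed as in the
previous example. Note that list_del_rcu() leaves the element's
forward pointer intact, so the iteration can continue through the
deleted element, but only as long as the element has not yet been
freed:

	#include <linux/kernel.h>
	#include <linux/list.h>
	#include <linux/rcupdate.h>
	#include <linux/slab.h>
	#include <linux/spinlock.h>

	struct foo {
		struct list_head list;
		struct rcu_head rcu;
		int key;
	};

	static LIST_HEAD(foo_list);
	static DEFINE_SPINLOCK(foo_lock);

	static void free_foo(struct rcu_head *head)
	{
		kfree(container_of(head, struct foo, rcu));
	}

	/* Invoked on each element during the scan; deletes stale elements. */
	static void maybe_delete(struct foo *p)
	{
		if (p->key < 0) {			/* arbitrary criterion */
			spin_lock(&foo_lock);
			list_del_rcu(&p->list);		/* p->list.next stays valid */
			spin_unlock(&foo_lock);
			call_rcu(&p->rcu, free_foo);
		}
	}

	/* The scan still needs p to find its successor after maybe_delete()
	 * returns, so call_rcu() must not invoke free_foo() immediately. */
	static void scan_and_filter(void)
	{
		struct foo *p;

		rcu_read_lock();
		list_for_each_entry_rcu(p, &foo_list, list)
			maybe_delete(p);
		rcu_read_unlock();
	}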
Quick Quiz: why is it -not- legal to invoke synchronize_kernel() in
this case?
Summary
Permitting call_rcu() to immediately invoke its arguments or permitting
synchronize_kernel() to immediately return breaks RCU, even on a UP system.
So do not do it! Even on a UP system, the RCU infrastructure -must-
respect grace periods.
Answer to Quick Quiz
The calling function is scanning an RCU-protected linked list, and
is therefore within an RCU read-side critical section. Therefore,
the called function has been invoked within an RCU read-side critical
section, and is not permitted to block.
Review Checklist for RCU Patches
This document contains a checklist for producing and reviewing patches
that make use of RCU. Violating any of the rules listed below will
result in the same sorts of problems that leaving out a locking primitive
would cause. This list is based on experiences reviewing such patches
over a rather long period of time, but improvements are always welcome!
0. Is RCU being applied to a read-mostly situation? If the data
structure is updated more than about 10% of the time, then
you should strongly consider some other approach, unless
detailed performance measurements show that RCU is nonetheless
the right tool for the job.
The other exception would be where performance is not an issue,
and RCU provides a simpler implementation. An example of this
situation is the dynamic NMI code in the Linux 2.6 kernel,
at least on architectures where NMIs are rare.
1. Does the update code have proper mutual exclusion?
RCU does allow -readers- to run (almost) naked, but -writers- must
still use some sort of mutual exclusion, such as:
a. locking,
b. atomic operations, or
c. restricting updates to a single task.
If you choose #b, be prepared to describe how you have handled
memory barriers on weakly ordered machines (pretty much all of
them -- even x86 allows reads to be reordered), and be prepared
to explain why this added complexity is worthwhile. If you
choose #c, be prepared to explain how this single task does not
become a major bottleneck on big multiprocessor machines.
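As a sketch of option #a, the updater below wraps its list
manipulation in a spinlock while readers run locklessly; the
structure, list, and lock names are invented for illustration:

	#include <linux/errno.h>
	#include <linux/list.h>
	#include <linux/rcupdate.h>
	#include <linux/slab.h>
	#include <linux/spinlock.h>

	struct foo {
		struct list_head list;
		int key;
	};

	static LIST_HEAD(foo_list);
	static DEFINE_SPINLOCK(foo_list_lock);	/* excludes other updaters only */

	/* Updater: serialized against other updaters by the lock, but runs
	 * concurrently with lockless RCU readers. */
	static int add_foo(int key)
	{
		struct foo *p = kmalloc(sizeof(*p), GFP_KERNEL);

		if (!p)
			return -ENOMEM;
		p->key = key;
		spin_lock(&foo_list_lock);
		list_add_rcu(&p->list, &foo_list);
		spin_unlock(&foo_list_lock);
		return 0;
	}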
2. Do the RCU read-side critical sections make proper use of
rcu_read_lock() and friends? These primitives are needed
to suppress preemption (or bottom halves, in the case of
rcu_read_lock_bh()) in the read-side critical sections,
and are also an excellent aid to readability.
3. Does the update code tolerate concurrent accesses?
The whole point of RCU is to permit readers to run without
any locks or atomic operations. This means that readers will
be running while updates are in progress. There are a number
of ways to handle this concurrency, depending on the situation:
a. Make updates appear atomic to readers. For example,
pointer updates to properly aligned fields will appear
atomic, as will individual atomic primitives. Operations
performed under a lock and sequences of multiple atomic
primitives will -not- appear to be atomic.
This is almost always the best approach.
b. Carefully order the updates and the reads so that
readers see valid data at all phases of the update.
This is often more difficult than it sounds, especially
given modern CPUs' tendency to reorder memory references.
One must usually liberally sprinkle memory barriers
(smp_wmb(), smp_rmb(), smp_mb()) through the code,
making it difficult to understand and to test.
It is usually better to group the changing data into
a separate structure, so that the change may be made
to appear atomic by updating a pointer to reference
a new structure containing updated values.
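For instance, the "separate structure" approach might look like the
following sketch, which publishes a fully initialized replacement
structure and defers freeing of the old one. The struct config and
the names used are invented, and rcu_assign_pointer() is discussed
further in item 4c below:

	#include <linux/errno.h>
	#include <linux/rcupdate.h>
	#include <linux/slab.h>
	#include <linux/spinlock.h>

	struct config {
		int threshold;
		int timeout;
	};

	static struct config *cur_config;	/* RCU-protected pointer */
	static DEFINE_SPINLOCK(config_lock);	/* serializes updaters */

	/* Update both fields "atomically" from the readers' point of view
	 * by swapping in a completely new structure rather than modifying
	 * the old one in place. */
	static int update_config(int threshold, int timeout)
	{
		struct config *newc, *oldc;

		newc = kmalloc(sizeof(*newc), GFP_KERNEL);
		if (!newc)
			return -ENOMEM;
		newc->threshold = threshold;
		newc->timeout = timeout;

		spin_lock(&config_lock);
		oldc = cur_config;
		rcu_assign_pointer(cur_config, newc);	/* see item 4c */
		spin_unlock(&config_lock);

		synchronize_kernel();	/* wait for pre-existing readers */
		kfree(oldc);
		return 0;
	}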
4. Weakly ordered CPUs pose special challenges. Almost all CPUs
are weakly ordered -- even i386 CPUs allow reads to be reordered.
RCU code must take all of the following measures to prevent
memory-corruption problems:
a. Readers must maintain proper ordering of their memory
accesses. The rcu_dereference() primitive ensures that
the CPU picks up the pointer before it picks up the data
that the pointer points to. This really is necessary
on Alpha CPUs. If you don't believe me, see:
http://www.openvms.compaq.com/wizard/wiz_2637.html
The rcu_dereference() primitive is also an excellent
documentation aid, letting the person reading the code
know exactly which pointers are protected by RCU.
The rcu_dereference() primitive is used by the various
"_rcu()" list-traversal primitives, such as the
list_for_each_entry_rcu().
b. If the list macros are being used, the list_add_rcu(),
list_add_tail_rcu(), and list_del_rcu() primitives must
be used in order to prevent weakly ordered machines from
misordering structure initialization and pointer planting.
Similarly, if the hlist macros are being used, the
hlist_del_rcu() and hlist_add_head_rcu() primitives
are required.
c. Updates must ensure that initialization of a given
structure happens before pointers to that structure are
publicized. Use the rcu_assign_pointer() primitive
when publicizing a pointer to a structure that can
be traversed by an RCU read-side critical section.
[The rcu_assign_pointer() primitive is in process.]
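On the read side, the matching sketch for the update_config() example
in item 3 would use rcu_dereference() when picking up the pointer;
again, the names are invented:

	#include <linux/rcupdate.h>

	struct config {
		int threshold;
		int timeout;
	};

	extern struct config *cur_config;	/* published via rcu_assign_pointer() */

	/* Reader: rcu_dereference() orders the fetch of the pointer before
	 * the fetches of the fields it points to, which matters on Alpha.
	 * (Assumes cur_config is never NULL.) */
	static int read_threshold(void)
	{
		struct config *c;
		int ret;

		rcu_read_lock();
		c = rcu_dereference(cur_config);
		ret = c->threshold;
		rcu_read_unlock();
		return ret;
	}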
5. If call_rcu(), or a related primitive such as call_rcu_bh(),
is used, the callback function must be written to be called
from softirq context. In particular, it cannot block.
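A typical callback simply maps from the rcu_head back to the
enclosing structure and frees it, doing nothing that could block.
The sketch below assumes the call_rcu() interface that passes the
embedded struct rcu_head to the callback (older interfaces passed a
separate argument instead), and the structure name is invented:

	#include <linux/kernel.h>
	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct foo {
		struct rcu_head rcu;
		int key;
	};

	/* Runs in softirq context after a grace period: no sleeping
	 * allocations, no semaphores, no copy_to_user(). */
	static void free_foo(struct rcu_head *head)
	{
		struct foo *p = container_of(head, struct foo, rcu);

		kfree(p);	/* kfree() is legal in softirq context */
	}

	/* The updater queues the deferred free like this:
	 *
	 *	call_rcu(&p->rcu, free_foo);
	 */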
6. Since synchronize_kernel() blocks, it cannot be called from
any sort of irq context.
7. If the updater uses call_rcu(), then the corresponding readers
must use rcu_read_lock() and rcu_read_unlock(). If the updater
uses call_rcu_bh(), then the corresponding readers must use
rcu_read_lock_bh() and rcu_read_unlock_bh(). Mixing things up
will result in confusion and broken kernels.
One exception to this rule: rcu_read_lock() and rcu_read_unlock()
may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
in cases where local bottom halves are already known to be
disabled, for example, in irq or softirq context. Commenting
such cases is a must, of course! And the jury is still out on
whether the increased speed is worth it.
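For example, in the following sketch (names invented, and the
call_rcu_bh() interface assumed to mirror call_rcu()) the reader uses
the _bh flavor, so the updater must use call_rcu_bh() rather than
call_rcu():

	#include <linux/kernel.h>
	#include <linux/list.h>
	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct foo {
		struct list_head list;
		struct rcu_head rcu;
		int key;
	};

	extern struct list_head foo_list;

	/* Reader: bottom-half flavor of the read-side markers. */
	static int count_foo(void)
	{
		struct foo *p;
		int n = 0;

		rcu_read_lock_bh();
		list_for_each_entry_rcu(p, &foo_list, list)
			n++;
		rcu_read_unlock_bh();
		return n;
	}

	static void free_foo(struct rcu_head *head)
	{
		kfree(container_of(head, struct foo, rcu));
	}

	/* Updater: must use call_rcu_bh() to match the readers above.
	 * The caller is assumed to hold the update-side lock. */
	static void delete_foo(struct foo *p)
	{
		list_del_rcu(&p->list);
		call_rcu_bh(&p->rcu, free_foo);
	}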
8. Although synchronize_kernel() is a bit slower than is call_rcu(),
it usually results in simpler code. So, unless update performance
is important or the updaters cannot block, synchronize_kernel()
should be used in preference to call_rcu().
9. All RCU list-traversal primitives, which include
list_for_each_rcu(), list_for_each_entry_rcu(),
list_for_each_continue_rcu(), and list_for_each_safe_rcu(),
must be within an RCU read-side critical section. RCU
read-side critical sections are delimited by rcu_read_lock()
and rcu_read_unlock(), or by similar primitives such as
rcu_read_lock_bh() and rcu_read_unlock_bh().
Use of the _rcu() list-traversal primitives outside of an
RCU read-side critical section causes no harm other than
a slight performance degradation on Alpha CPUs and some
confusion on the part of people trying to read the code.
Another way of thinking of this is "If you are holding the
lock that prevents the data structure from changing, why do
you also need RCU-based protection?" That said, there may
well be situations where use of the _rcu() list-traversal
primitives while the update-side lock is held results in
simpler and more maintainable code. The jury is still out
on this question.
10. Conversely, if you are in an RCU read-side critical section,
you -must- use the "_rcu()" variants of the list macros.
Failing to do so will break Alpha and confuse people reading
your code.
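The contrast between these two rules might look like the following
sketch, in which the reader runs under rcu_read_lock() and therefore
must use the _rcu() iterator, while a lookup made with the
update-side lock held may use the plain iterator (names invented):

	#include <linux/list.h>
	#include <linux/rcupdate.h>
	#include <linux/spinlock.h>

	struct foo {
		struct list_head list;
		int key;
	};

	extern struct list_head foo_list;
	extern spinlock_t foo_list_lock;

	/* Reader: inside an RCU read-side critical section, so the _rcu()
	 * iterator is mandatory (item 10). */
	static int foo_exists(int key)
	{
		struct foo *p;
		int found = 0;

		rcu_read_lock();
		list_for_each_entry_rcu(p, &foo_list, list) {
			if (p->key == key) {
				found = 1;
				break;
			}
		}
		rcu_read_unlock();
		return found;
	}

	/* Lookup with foo_list_lock held: the lock excludes all updaters,
	 * so the plain iterator suffices (item 9), though the _rcu() form
	 * would do no harm beyond the issues noted above. */
	static struct foo *find_foo_locked(int key)
	{
		struct foo *p;

		list_for_each_entry(p, &foo_list, list)
			if (p->key == key)
				return p;
		return NULL;
	}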
RCU Concepts
The basic idea behind RCU is to split destructive operations into two
parts, one that prevents anyone from seeing the data item being destroyed,
and one that actually carries out the destruction. A "grace period"
must elapse between the two parts, and this grace period must be long
enough that any readers accessing the item being deleted have since
dropped their references. For example, an RCU-protected deletion from a
linked list would first remove the item from the list, wait for a grace
period to elapse, then free the element. See the listRCU.txt file for
more information on using RCU with linked lists.
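As a sketch of that two-part deletion (structure, list, and lock
names invented; synchronize_kernel() is the grace-period primitive
discussed in the FAQ below):

	#include <linux/list.h>
	#include <linux/rcupdate.h>
	#include <linux/slab.h>
	#include <linux/spinlock.h>

	struct foo {
		struct list_head list;
		int key;
	};

	extern struct list_head foo_list;
	extern spinlock_t foo_list_lock;

	/* Part one: hide p from new readers.  Part two: destroy it, but
	 * only after all pre-existing readers have finished. */
	static void delete_foo(struct foo *p)
	{
		spin_lock(&foo_list_lock);
		list_del_rcu(&p->list);
		spin_unlock(&foo_list_lock);

		synchronize_kernel();	/* wait out the grace period */

		kfree(p);
	}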
Frequently Asked Questions
o Why would anyone want to use RCU?
The advantage of RCU's two-part approach is that RCU readers need
not acquire any locks, perform any atomic instructions, write to
shared memory, or (on CPUs other than Alpha) execute any memory
barriers. The fact that these operations are quite expensive
on modern CPUs is what gives RCU its performance advantages
in read-mostly situations. The fact that RCU readers need not
acquire locks can also greatly simplify deadlock-avoidance code.
o How can the updater tell when a grace period has completed
if the RCU readers give no indication when they are done?
Just as with spinlocks, RCU readers are not permitted to
block, switch to user-mode execution, or enter the idle loop.
Therefore, as soon as a CPU is seen passing through any of these
three states, we know that that CPU has exited any previous RCU
read-side critical sections. So, if we remove an item from a
linked list, and then wait until all CPUs have switched context,
executed in user mode, or executed in the idle loop, we can
safely free up that item.
o If I am running on a uniprocessor kernel, which can only do one
thing at a time, why should I wait for a grace period?
See the UP.txt file in this directory.
o How can I see where RCU is currently used in the Linux kernel?
Search for "rcu_read_lock", "call_rcu", and "synchronize_kernel".
o What guidelines should I follow when writing code that uses RCU?
See the checklist.txt file in this directory.
o Where can I find more information on RCU?
See the RTFP.txt file in this directory.