aboutsummaryrefslogtreecommitdiff
path: root/Documentation/RCU/whatisRCU.txt
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/RCU/whatisRCU.txt')
-rw-r--r--Documentation/RCU/whatisRCU.txt241
1 files changed, 186 insertions, 55 deletions
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 15da16861fa..49b8551a3b6 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -1,3 +1,12 @@
+Please note that the "What is RCU?" LWN series is an excellent place
+to start learning about RCU:
+
+1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
+2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
+3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
+4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/
+
+
What is RCU?
RCU is a synchronization mechanism that was added to the Linux kernel
@@ -128,10 +137,10 @@ rcu_read_lock()
Used by a reader to inform the reclaimer that the reader is
entering an RCU read-side critical section. It is illegal
to block while in an RCU read-side critical section, though
- kernels built with CONFIG_PREEMPT_RCU can preempt RCU read-side
- critical sections. Any RCU-protected data structure accessed
- during an RCU read-side critical section is guaranteed to remain
- unreclaimed for the full duration of that critical section.
+ kernels built with CONFIG_TREE_PREEMPT_RCU can preempt RCU
+ read-side critical sections. Any RCU-protected data structure
+ accessed during an RCU read-side critical section is guaranteed to
+ remain unreclaimed for the full duration of that critical section.
Reference counts may be used in conjunction with RCU to maintain
longer-term references to data structures.
@@ -184,7 +193,17 @@ synchronize_rcu()
blocking, it registers a function and argument which are invoked
after all ongoing RCU read-side critical sections have completed.
This callback variant is particularly useful in situations where
- it is illegal to block.
+ it is illegal to block or where update-side performance is
+ critically important.
+
+ However, the call_rcu() API should not be used lightly, as use
+ of the synchronize_rcu() API generally results in simpler code.
+ In addition, the synchronize_rcu() API has the nice property
+ of automatically limiting update rate should grace periods
+ be delayed. This property results in system resilience in face
+ of denial-of-service attacks. Code using call_rcu() should limit
+ update rate in order to gain this same sort of resilience. See
+ checklist.txt for some approaches to limiting the update rate.
rcu_assign_pointer()
@@ -200,10 +219,11 @@ rcu_assign_pointer()
the new value, and also executes any memory-barrier instructions
required for a given CPU architecture.
- Perhaps more important, it serves to document which pointers
- are protected by RCU. That said, rcu_assign_pointer() is most
- frequently used indirectly, via the _rcu list-manipulation
- primitives such as list_add_rcu().
+ Perhaps just as important, it serves to document (1) which
+ pointers are protected by RCU and (2) the point at which a
+ given structure becomes accessible to other CPUs. That said,
+ rcu_assign_pointer() is most frequently used indirectly, via
+ the _rcu list-manipulation primitives such as list_add_rcu().
rcu_dereference()
@@ -245,9 +265,9 @@ rcu_dereference()
rcu_read_lock();
p = rcu_dereference(head.next);
rcu_read_unlock();
- x = p->address;
+ x = p->address; /* BUG!!! */
rcu_read_lock();
- y = p->data;
+ y = p->data; /* BUG!!! */
rcu_read_unlock();
Holding a reference from one RCU read-side critical section
@@ -258,9 +278,11 @@ rcu_dereference()
locking.
As with rcu_assign_pointer(), an important function of
- rcu_dereference() is to document which pointers are protected
- by RCU. And, again like rcu_assign_pointer(), rcu_dereference()
- is typically used indirectly, via the _rcu list-manipulation
+ rcu_dereference() is to document which pointers are protected by
+ RCU, in particular, flagging a pointer that is subject to changing
+ at any time, including immediately after the rcu_dereference().
+ And, again like rcu_assign_pointer(), rcu_dereference() is
+ typically used indirectly, via the _rcu list-manipulation
primitives, such as list_for_each_entry_rcu().
The following diagram shows how each API communicates among the
@@ -302,14 +324,17 @@ used as follows:
Defer Protect
a. synchronize_rcu() rcu_read_lock() / rcu_read_unlock()
- call_rcu()
+ call_rcu() rcu_dereference()
-b. call_rcu_bh() rcu_read_lock_bh() / rcu_read_unlock_bh()
+b. synchronize_rcu_bh() rcu_read_lock_bh() / rcu_read_unlock_bh()
+ call_rcu_bh() rcu_dereference_bh()
-c. synchronize_sched() preempt_disable() / preempt_enable()
+c. synchronize_sched() rcu_read_lock_sched() / rcu_read_unlock_sched()
+ call_rcu_sched() preempt_disable() / preempt_enable()
local_irq_save() / local_irq_restore()
hardirq enter / hardirq exit
NMI enter / NMI exit
+ rcu_dereference_sched()
These three mechanisms are used as follows:
@@ -327,7 +352,7 @@ for specialized uses, but are relatively uncommon.
3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
This section shows a simple use of the core RCU API to protect a
-global pointer to a dynamically allocated structure. More typical
+global pointer to a dynamically allocated structure. More-typical
uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt.
struct foo {
@@ -357,7 +382,7 @@ uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt.
struct foo *new_fp;
struct foo *old_fp;
- new_fp = kmalloc(sizeof(*fp), GFP_KERNEL);
+ new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
spin_lock(&foo_mutex);
old_fp = gbl_foo;
*new_fp = *old_fp;
@@ -410,6 +435,8 @@ o Use synchronize_rcu() -after- removing a data element from an
data item.
See checklist.txt for additional rules to follow when using RCU.
+And again, more-typical uses of RCU may be found in listRCU.txt,
+arrayRCU.txt, and NMI-RCU.txt.
4. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
@@ -456,7 +483,7 @@ The foo_update_a() function might then be written as follows:
struct foo *new_fp;
struct foo *old_fp;
- new_fp = kmalloc(sizeof(*fp), GFP_KERNEL);
+ new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
spin_lock(&foo_mutex);
old_fp = gbl_foo;
*new_fp = *old_fp;
@@ -472,6 +499,8 @@ The foo_reclaim() function might appear as follows:
{
struct foo *fp = container_of(rp, struct foo, rcu);
+ foo_cleanup(fp->a);
+
kfree(fp);
}
@@ -494,6 +523,12 @@ o Use call_rcu() -after- removing a data element from an
read-side critical sections that might be referencing that
data item.
+If the callback for call_rcu() is not doing anything more than calling
+kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
+to avoid having to write your own callback:
+
+ kfree_rcu(old_fp, rcu);
+
Again, see checklist.txt for additional rules governing the use of RCU.
@@ -513,7 +548,7 @@ production-quality implementation, and see:
for papers describing the Linux kernel RCU implementation. The OLS'01
and OLS'02 papers are a good introduction, and the dissertation provides
-more details on the current implementation.
+more details on the current implementation as of early 2004.
5A. "TOY" IMPLEMENTATION #1: LOCKING
@@ -567,7 +602,7 @@ The rcu_read_lock() and rcu_read_unlock() primitive read-acquire
and release a global reader-writer lock. The synchronize_rcu()
primitive write-acquires this same lock, then immediately releases
it. This means that once synchronize_rcu() exits, all RCU read-side
-critical sections that were in progress before synchonize_rcu() was
+critical sections that were in progress before synchronize_rcu() was
called are guaranteed to have completed -- there is no way that
synchronize_rcu() would have been able to write-acquire the lock
otherwise.
@@ -600,7 +635,7 @@ are the same as those shown in the preceding section, so they are omitted.
{
int cpu;
- for_each_cpu(cpu)
+ for_each_possible_cpu(cpu)
run_on(cpu);
}
@@ -672,8 +707,9 @@ diff shows how closely related RCU and reader-writer locking can be.
+ spin_lock(&listmutex);
list_for_each_entry(p, head, lp) {
if (p->key == key) {
- list_del(&p->list);
+ - list_del(&p->list);
- write_unlock(&listmutex);
+ + list_del_rcu(&p->list);
+ spin_unlock(&listmutex);
+ synchronize_rcu();
kfree(p);
@@ -721,7 +757,7 @@ Or, for those who prefer a side-by-side listing:
5 write_lock(&listmutex); 5 spin_lock(&listmutex);
6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) {
7 if (p->key == key) { 7 if (p->key == key) {
- 8 list_del(&p->list); 8 list_del(&p->list);
+ 8 list_del(&p->list); 8 list_del_rcu(&p->list);
9 write_unlock(&listmutex); 9 spin_unlock(&listmutex);
10 synchronize_rcu();
10 kfree(p); 11 kfree(p);
@@ -734,7 +770,7 @@ Or, for those who prefer a side-by-side listing:
Either way, the differences are quite small. Read-side locking moves
to rcu_read_lock() and rcu_read_unlock, update-side locking moves from
-from a reader-writer lock to a simple spinlock, and a synchronize_rcu()
+a reader-writer lock to a simple spinlock, and a synchronize_rcu()
precedes the kfree().
However, there is one potential catch: the read-side and update-side
@@ -745,8 +781,8 @@ a single atomic update, converting to RCU will require special care.
Also, the presence of synchronize_rcu() means that the RCU version of
delete() can now block. If this is a problem, there is a callback-based
-mechanism that never blocks, namely call_rcu(), that can be used in
-place of synchronize_rcu().
+mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can
+be used in place of synchronize_rcu().
7. FULL LIST OF RCU APIs
@@ -756,46 +792,140 @@ Linux-kernel source code, but it helps to have a full list of the
APIs, since there does not appear to be a way to categorize them
in docbook. Here is the list, by category.
-Markers for RCU read-side critical sections:
+RCU list traversal:
- rcu_read_lock
- rcu_read_unlock
- rcu_read_lock_bh
- rcu_read_unlock_bh
-
-RCU pointer/list traversal:
-
- rcu_dereference
- list_for_each_rcu (to be deprecated in favor of
- list_for_each_entry_rcu)
- list_for_each_safe_rcu (deprecated, not used)
+ list_entry_rcu
+ list_first_entry_rcu
+ list_next_rcu
list_for_each_entry_rcu
- list_for_each_continue_rcu (to be deprecated in favor of new
- list_for_each_entry_continue_rcu)
+ list_for_each_entry_continue_rcu
+ hlist_first_rcu
+ hlist_next_rcu
+ hlist_pprev_rcu
hlist_for_each_entry_rcu
+ hlist_for_each_entry_rcu_bh
+ hlist_for_each_entry_continue_rcu
+ hlist_for_each_entry_continue_rcu_bh
+ hlist_nulls_first_rcu
+ hlist_nulls_for_each_entry_rcu
+ hlist_bl_first_rcu
+ hlist_bl_for_each_entry_rcu
-RCU pointer update:
+RCU pointer/list update:
rcu_assign_pointer
list_add_rcu
list_add_tail_rcu
list_del_rcu
list_replace_rcu
- hlist_del_rcu
+ hlist_add_after_rcu
+ hlist_add_before_rcu
hlist_add_head_rcu
-
-RCU grace period:
-
- synchronize_kernel (deprecated)
- synchronize_net
- synchronize_sched
- synchronize_rcu
- call_rcu
- call_rcu_bh
+ hlist_del_rcu
+ hlist_del_init_rcu
+ hlist_replace_rcu
+ list_splice_init_rcu()
+ hlist_nulls_del_init_rcu
+ hlist_nulls_del_rcu
+ hlist_nulls_add_head_rcu
+ hlist_bl_add_head_rcu
+ hlist_bl_del_init_rcu
+ hlist_bl_del_rcu
+ hlist_bl_set_first_rcu
+
+RCU: Critical sections Grace period Barrier
+
+ rcu_read_lock synchronize_net rcu_barrier
+ rcu_read_unlock synchronize_rcu
+ rcu_dereference synchronize_rcu_expedited
+ rcu_read_lock_held call_rcu
+ rcu_dereference_check kfree_rcu
+ rcu_dereference_protected
+
+bh: Critical sections Grace period Barrier
+
+ rcu_read_lock_bh call_rcu_bh rcu_barrier_bh
+ rcu_read_unlock_bh synchronize_rcu_bh
+ rcu_dereference_bh synchronize_rcu_bh_expedited
+ rcu_dereference_bh_check
+ rcu_dereference_bh_protected
+ rcu_read_lock_bh_held
+
+sched: Critical sections Grace period Barrier
+
+ rcu_read_lock_sched synchronize_sched rcu_barrier_sched
+ rcu_read_unlock_sched call_rcu_sched
+ [preempt_disable] synchronize_sched_expedited
+ [and friends]
+ rcu_read_lock_sched_notrace
+ rcu_read_unlock_sched_notrace
+ rcu_dereference_sched
+ rcu_dereference_sched_check
+ rcu_dereference_sched_protected
+ rcu_read_lock_sched_held
+
+
+SRCU: Critical sections Grace period Barrier
+
+ srcu_read_lock synchronize_srcu srcu_barrier
+ srcu_read_unlock call_srcu
+ srcu_dereference synchronize_srcu_expedited
+ srcu_dereference_check
+ srcu_read_lock_held
+
+SRCU: Initialization/cleanup
+ init_srcu_struct
+ cleanup_srcu_struct
+
+All: lockdep-checked RCU-protected pointer access
+
+ rcu_access_index
+ rcu_access_pointer
+ rcu_dereference_index_check
+ rcu_dereference_raw
+ rcu_lockdep_assert
+ rcu_sleep_check
+ RCU_NONIDLE
See the comment headers in the source code (or the docbook generated
from them) for more information.
+However, given that there are no fewer than four families of RCU APIs
+in the Linux kernel, how do you choose which one to use? The following
+list can be helpful:
+
+a. Will readers need to block? If so, you need SRCU.
+
+b. What about the -rt patchset? If readers would need to block
+ in an non-rt kernel, you need SRCU. If readers would block
+ in a -rt kernel, but not in a non-rt kernel, SRCU is not
+ necessary.
+
+c. Do you need to treat NMI handlers, hardirq handlers,
+ and code segments with preemption disabled (whether
+ via preempt_disable(), local_irq_save(), local_bh_disable(),
+ or some other mechanism) as if they were explicit RCU readers?
+ If so, RCU-sched is the only choice that will work for you.
+
+d. Do you need RCU grace periods to complete even in the face
+ of softirq monopolization of one or more of the CPUs? For
+ example, is your code subject to network-based denial-of-service
+ attacks? If so, you need RCU-bh.
+
+e. Is your workload too update-intensive for normal use of
+ RCU, but inappropriate for other synchronization mechanisms?
+ If so, consider SLAB_DESTROY_BY_RCU. But please be careful!
+
+f. Do you need read-side critical sections that are respected
+ even though they are in the middle of the idle loop, during
+ user-mode execution, or on an offlined CPU? If so, SRCU is the
+ only choice that will work for you.
+
+g. Otherwise, use RCU.
+
+Of course, this all assumes that you have determined that RCU is in fact
+the right tool for your job.
+
8. ANSWERS TO QUICK QUIZZES
@@ -807,7 +937,8 @@ Quick Quiz #1: Why is this argument naive? How could a deadlock
Answer: Consider the following sequence of events:
1. CPU 0 acquires some unrelated lock, call it
- "problematic_lock".
+ "problematic_lock", disabling irq via
+ spin_lock_irqsave().
2. CPU 1 enters synchronize_rcu(), write-acquiring
rcu_gp_mutex.
@@ -894,7 +1025,7 @@ Answer: Just as PREEMPT_RT permits preemption of spinlock
ACKNOWLEDGEMENTS
My thanks to the people who helped make this human-readable, including
-Jon Walpole, Josh Triplett, Serge Hallyn, and Suzanne Wood.
+Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.
For more information, see http://www.rdrop.com/users/paulmck/RCU.