Diffstat (limited to 'Documentation/RCU/whatisRCU.txt')

 Documentation/RCU/whatisRCU.txt | 241 ++++++++++++++++++++++++++++--------
 1 file changed, 186 insertions(+), 55 deletions(-)
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 15da16861fa..49b8551a3b6 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -1,3 +1,12 @@
+Please note that the "What is RCU?" LWN series is an excellent place
+to start learning about RCU:
+
+1.	What is RCU, Fundamentally?  http://lwn.net/Articles/262464/
+2.	What is RCU? Part 2: Usage   http://lwn.net/Articles/263130/
+3.	RCU part 3: the RCU API      http://lwn.net/Articles/264090/
+4.	The RCU API, 2010 Edition    http://lwn.net/Articles/418853/
+
+
 What is RCU?
 
 RCU is a synchronization mechanism that was added to the Linux kernel
@@ -128,10 +137,10 @@ rcu_read_lock()
 	Used by a reader to inform the reclaimer that the reader is
 	entering an RCU read-side critical section.  It is illegal
 	to block while in an RCU read-side critical section, though
-	kernels built with CONFIG_PREEMPT_RCU can preempt RCU read-side
-	critical sections.  Any RCU-protected data structure accessed
-	during an RCU read-side critical section is guaranteed to remain
-	unreclaimed for the full duration of that critical section.
+	kernels built with CONFIG_TREE_PREEMPT_RCU can preempt RCU
+	read-side critical sections.  Any RCU-protected data structure
+	accessed during an RCU read-side critical section is guaranteed to
+	remain unreclaimed for the full duration of that critical section.
 	Reference counts may be used in conjunction with RCU to maintain
 	longer-term references to data structures.
 
@@ -184,7 +193,17 @@ synchronize_rcu()
 	blocking, it registers a function and argument which are invoked
 	after all ongoing RCU read-side critical sections have completed.
 	This callback variant is particularly useful in situations where
-	it is illegal to block.
+	it is illegal to block or where update-side performance is
+	critically important.
+
+	However, the call_rcu() API should not be used lightly, as use
+	of the synchronize_rcu() API generally results in simpler code.
+	In addition, the synchronize_rcu() API has the nice property
+	of automatically limiting update rate should grace periods
+	be delayed.  This property results in system resilience in face
+	of denial-of-service attacks.  Code using call_rcu() should limit
+	update rate in order to gain this same sort of resilience.  See
+	checklist.txt for some approaches to limiting the update rate.
 
 rcu_assign_pointer()
 
@@ -200,10 +219,11 @@ rcu_assign_pointer()
 	the new value, and also executes any memory-barrier instructions
 	required for a given CPU architecture.
 
-	Perhaps more important, it serves to document which pointers
-	are protected by RCU.  That said, rcu_assign_pointer() is most
-	frequently used indirectly, via the _rcu list-manipulation
-	primitives such as list_add_rcu().
+	Perhaps just as important, it serves to document (1) which
+	pointers are protected by RCU and (2) the point at which a
+	given structure becomes accessible to other CPUs.  That said,
+	rcu_assign_pointer() is most frequently used indirectly, via
+	the _rcu list-manipulation primitives such as list_add_rcu().
 
 rcu_dereference()
 
@@ -245,9 +265,9 @@ rcu_dereference()
 		rcu_read_lock();
 		p = rcu_dereference(head.next);
 		rcu_read_unlock();
-		x = p->address;
+		x = p->address;	/* BUG!!! */
 		rcu_read_lock();
-		y = p->data;
+		y = p->data;	/* BUG!!! */
 		rcu_read_unlock();
 
 	Holding a reference from one RCU read-side critical section
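As the /* BUG!!! */ annotations in the hunk above emphasize, a pointer
obtained from rcu_dereference() may only be used within its enclosing
read-side critical section.  A minimal sketch of a correct reader,
assuming the gbl_foo pointer and struct foo defined later in this
document:

	int foo_get_a(void)
	{
		int retval;

		rcu_read_lock();	/* enter the read-side critical section */
		retval = rcu_dereference(gbl_foo)->a;
		rcu_read_unlock();	/* only now may the structure be reclaimed */
		return retval;
	}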
@@ -258,9 +278,11 @@ rcu_dereference()
 	locking.
 
 	As with rcu_assign_pointer(), an important function of
-	rcu_dereference() is to document which pointers are protected
-	by RCU.  And, again like rcu_assign_pointer(), rcu_dereference()
-	is typically used indirectly, via the _rcu list-manipulation
+	rcu_dereference() is to document which pointers are protected by
+	RCU, in particular, flagging a pointer that is subject to changing
+	at any time, including immediately after the rcu_dereference().
+	And, again like rcu_assign_pointer(), rcu_dereference() is
+	typically used indirectly, via the _rcu list-manipulation
 	primitives, such as list_for_each_entry_rcu().
 
 The following diagram shows how each API communicates among the
@@ -302,14 +324,17 @@ used as follows:
 	Defer			Protect
 
 a.	synchronize_rcu()	rcu_read_lock() / rcu_read_unlock()
-	call_rcu()
+	call_rcu()		rcu_dereference()
 
-b.	call_rcu_bh()		rcu_read_lock_bh() / rcu_read_unlock_bh()
+b.	synchronize_rcu_bh()	rcu_read_lock_bh() / rcu_read_unlock_bh()
+	call_rcu_bh()		rcu_dereference_bh()
 
-c.	synchronize_sched()	preempt_disable() / preempt_enable()
+c.	synchronize_sched()	rcu_read_lock_sched() / rcu_read_unlock_sched()
+	call_rcu_sched()	preempt_disable() / preempt_enable()
 				local_irq_save() / local_irq_restore()
 				hardirq enter / hardirq exit
 				NMI enter / NMI exit
+				rcu_dereference_sched()
 
 These three mechanisms are used as follows:
@@ -327,7 +352,7 @@ for specialized uses, but are relatively uncommon.
 3.  WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
 
 This section shows a simple use of the core RCU API to protect a
-global pointer to a dynamically allocated structure.  More typical
+global pointer to a dynamically allocated structure.  More-typical
 uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt.
 
 	struct foo {
@@ -357,7 +382,7 @@ uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt.
 		struct foo *new_fp;
 		struct foo *old_fp;
 
-		new_fp = kmalloc(sizeof(*fp), GFP_KERNEL);
+		new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
 		spin_lock(&foo_mutex);
 		old_fp = gbl_foo;
 		*new_fp = *old_fp;
@@ -410,6 +435,8 @@ o	Use synchronize_rcu() -after- removing a data element from an
 	data item.
 
 See checklist.txt for additional rules to follow when using RCU.
+And again, more-typical uses of RCU may be found in listRCU.txt,
+arrayRCU.txt, and NMI-RCU.txt.
 
 
 4.  WHAT IF MY UPDATING THREAD CANNOT BLOCK?
@@ -456,7 +483,7 @@ The foo_update_a() function might then be written as follows:
 		struct foo *new_fp;
 		struct foo *old_fp;
 
-		new_fp = kmalloc(sizeof(*fp), GFP_KERNEL);
+		new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
 		spin_lock(&foo_mutex);
 		old_fp = gbl_foo;
 		*new_fp = *old_fp;
@@ -472,6 +499,8 @@ The foo_reclaim() function might appear as follows:
 	{
 		struct foo *fp = container_of(rp, struct foo, rcu);
 
+		foo_cleanup(fp->a);
+
 		kfree(fp);
 	}
 
@@ -494,6 +523,12 @@ o	Use call_rcu() -after- removing a data element from an
 	read-side critical sections that might be referencing that
 	data item.
 
+If the callback for call_rcu() is not doing anything more than calling
+kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
+to avoid having to write your own callback:
+
+	kfree_rcu(old_fp, rcu);
+
 Again, see checklist.txt for additional rules governing the use of RCU.
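Taken together, the hunks above suggest a simpler form of foo_update_a()
whenever the RCU callback would do nothing beyond kfree().  A sketch,
assuming the document's foo_mutex and gbl_foo, and assuming that struct
foo embeds a struct rcu_head named rcu:

	void foo_update_a(int new_a)
	{
		struct foo *new_fp;
		struct foo *old_fp;

		new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
		spin_lock(&foo_mutex);
		old_fp = gbl_foo;
		*new_fp = *old_fp;
		new_fp->a = new_a;
		rcu_assign_pointer(gbl_foo, new_fp);
		spin_unlock(&foo_mutex);
		kfree_rcu(old_fp, rcu);	/* in place of call_rcu(&old_fp->rcu, foo_reclaim) */
	}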
"TOY" IMPLEMENTATION #1: LOCKING @@ -567,7 +602,7 @@ The rcu_read_lock() and rcu_read_unlock() primitive read-acquire and release a global reader-writer lock. The synchronize_rcu() primitive write-acquires this same lock, then immediately releases it. This means that once synchronize_rcu() exits, all RCU read-side -critical sections that were in progress before synchonize_rcu() was +critical sections that were in progress before synchronize_rcu() was called are guaranteed to have completed -- there is no way that synchronize_rcu() would have been able to write-acquire the lock otherwise. @@ -600,7 +635,7 @@ are the same as those shown in the preceding section, so they are omitted. { int cpu; - for_each_cpu(cpu) + for_each_possible_cpu(cpu) run_on(cpu); } @@ -672,8 +707,9 @@ diff shows how closely related RCU and reader-writer locking can be. + spin_lock(&listmutex); list_for_each_entry(p, head, lp) { if (p->key == key) { - list_del(&p->list); + - list_del(&p->list); - write_unlock(&listmutex); + + list_del_rcu(&p->list); + spin_unlock(&listmutex); + synchronize_rcu(); kfree(p); @@ -721,7 +757,7 @@ Or, for those who prefer a side-by-side listing: 5 write_lock(&listmutex); 5 spin_lock(&listmutex); 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) { 7 if (p->key == key) { 7 if (p->key == key) { - 8 list_del(&p->list); 8 list_del(&p->list); + 8 list_del(&p->list); 8 list_del_rcu(&p->list); 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex); 10 synchronize_rcu(); 10 kfree(p); 11 kfree(p); @@ -734,7 +770,7 @@ Or, for those who prefer a side-by-side listing: Either way, the differences are quite small. Read-side locking moves to rcu_read_lock() and rcu_read_unlock, update-side locking moves from -from a reader-writer lock to a simple spinlock, and a synchronize_rcu() +a reader-writer lock to a simple spinlock, and a synchronize_rcu() precedes the kfree(). However, there is one potential catch: the read-side and update-side @@ -745,8 +781,8 @@ a single atomic update, converting to RCU will require special care. Also, the presence of synchronize_rcu() means that the RCU version of delete() can now block. If this is a problem, there is a callback-based -mechanism that never blocks, namely call_rcu(), that can be used in -place of synchronize_rcu(). +mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can +be used in place of synchronize_rcu(). 7. FULL LIST OF RCU APIs @@ -756,46 +792,140 @@ Linux-kernel source code, but it helps to have a full list of the APIs, since there does not appear to be a way to categorize them in docbook. Here is the list, by category. 
@@ -756,46 +792,140 @@
 Linux-kernel source code, but it helps to have a full list of the APIs,
 since there does not appear to be a way to categorize them in docbook.
 Here is the list, by category.
 
-Markers for RCU read-side critical sections:
+RCU list traversal:
 
-	rcu_read_lock
-	rcu_read_unlock
-	rcu_read_lock_bh
-	rcu_read_unlock_bh
-
-RCU pointer/list traversal:
-
-	rcu_dereference
-	list_for_each_rcu		(to be deprecated in favor of
-					 list_for_each_entry_rcu)
-	list_for_each_safe_rcu		(deprecated, not used)
+	list_entry_rcu
+	list_first_entry_rcu
+	list_next_rcu
 	list_for_each_entry_rcu
-	list_for_each_continue_rcu	(to be deprecated in favor of new
-					 list_for_each_entry_continue_rcu)
+	list_for_each_entry_continue_rcu
+	hlist_first_rcu
+	hlist_next_rcu
+	hlist_pprev_rcu
 	hlist_for_each_entry_rcu
+	hlist_for_each_entry_rcu_bh
+	hlist_for_each_entry_continue_rcu
+	hlist_for_each_entry_continue_rcu_bh
+	hlist_nulls_first_rcu
+	hlist_nulls_for_each_entry_rcu
+	hlist_bl_first_rcu
+	hlist_bl_for_each_entry_rcu
 
-RCU pointer update:
+RCU pointer/list update:
 
 	rcu_assign_pointer
 	list_add_rcu
 	list_add_tail_rcu
 	list_del_rcu
 	list_replace_rcu
-	hlist_del_rcu
+	hlist_add_after_rcu
+	hlist_add_before_rcu
 	hlist_add_head_rcu
-
-RCU grace period:
-
-	synchronize_kernel (deprecated)
-	synchronize_net
-	synchronize_sched
-	synchronize_rcu
-	call_rcu
-	call_rcu_bh
+	hlist_del_rcu
+	hlist_del_init_rcu
+	hlist_replace_rcu
+	list_splice_init_rcu()
+	hlist_nulls_del_init_rcu
+	hlist_nulls_del_rcu
+	hlist_nulls_add_head_rcu
+	hlist_bl_add_head_rcu
+	hlist_bl_del_init_rcu
+	hlist_bl_del_rcu
+	hlist_bl_set_first_rcu
+
+RCU:	Critical sections	Grace period		Barrier
+
+	rcu_read_lock		synchronize_net		rcu_barrier
+	rcu_read_unlock		synchronize_rcu
+	rcu_dereference		synchronize_rcu_expedited
+	rcu_read_lock_held	call_rcu
+	rcu_dereference_check	kfree_rcu
+	rcu_dereference_protected
+
+bh:	Critical sections	Grace period		Barrier
+
+	rcu_read_lock_bh	call_rcu_bh		rcu_barrier_bh
+	rcu_read_unlock_bh	synchronize_rcu_bh
+	rcu_dereference_bh	synchronize_rcu_bh_expedited
+	rcu_dereference_bh_check
+	rcu_dereference_bh_protected
+	rcu_read_lock_bh_held
+
+sched:	Critical sections	Grace period		Barrier
+
+	rcu_read_lock_sched	synchronize_sched	rcu_barrier_sched
+	rcu_read_unlock_sched	call_rcu_sched
+	[preempt_disable]	synchronize_sched_expedited
+	[and friends]
+	rcu_read_lock_sched_notrace
+	rcu_read_unlock_sched_notrace
+	rcu_dereference_sched
+	rcu_dereference_sched_check
+	rcu_dereference_sched_protected
+	rcu_read_lock_sched_held
+
+
+SRCU:	Critical sections	Grace period		Barrier
+
+	srcu_read_lock		synchronize_srcu	srcu_barrier
+	srcu_read_unlock	call_srcu
+	srcu_dereference	synchronize_srcu_expedited
+	srcu_dereference_check
+	srcu_read_lock_held
+
+SRCU:	Initialization/cleanup
+	init_srcu_struct
+	cleanup_srcu_struct
+
+All:	lockdep-checked RCU-protected pointer access
+
+	rcu_access_index
+	rcu_access_pointer
+	rcu_dereference_index_check
+	rcu_dereference_raw
+	rcu_lockdep_assert
+	rcu_sleep_check
+	RCU_NONIDLE
 
 See the comment headers in the source code (or the docbook generated
 from them) for more information.
 
+However, given that there are no fewer than four families of RCU APIs
+in the Linux kernel, how do you choose which one to use?  The following
+list can be helpful:
+
+a.	Will readers need to block?  If so, you need SRCU.
+
+b.	What about the -rt patchset?  If readers would need to block
+	in a non-rt kernel, you need SRCU.  If readers would block
+	in a -rt kernel, but not in a non-rt kernel, SRCU is not
+	necessary.
+
+c.	Do you need to treat NMI handlers, hardirq handlers,
+	and code segments with preemption disabled (whether
+	via preempt_disable(), local_irq_save(), local_bh_disable(),
+	or some other mechanism) as if they were explicit RCU readers?
+	If so, RCU-sched is the only choice that will work for you.
+
+d.	Do you need RCU grace periods to complete even in the face
+	of softirq monopolization of one or more of the CPUs?  For
+	example, is your code subject to network-based denial-of-service
+	attacks?  If so, you need RCU-bh.
+
+e.	Is your workload too update-intensive for normal use of
+	RCU, but inappropriate for other synchronization mechanisms?
+	If so, consider SLAB_DESTROY_BY_RCU.  But please be careful!
+
+f.	Do you need read-side critical sections that are respected
+	even though they are in the middle of the idle loop, during
+	user-mode execution, or on an offlined CPU?  If so, SRCU is the
+	only choice that will work for you.
+
+g.	Otherwise, use RCU.
+
+Of course, this all assumes that you have determined that RCU is in fact
+the right tool for your job.
+
 
 8.  ANSWERS TO QUICK QUIZZES
 
@@ -807,7 +937,8 @@ Quick Quiz #1:  Why is this argument naive?  How could a deadlock
 Answer:		Consider the following sequence of events:
 
 		1.	CPU 0 acquires some unrelated lock, call it
-			"problematic_lock".
+			"problematic_lock", disabling irq via
+			spin_lock_irqsave().
 
 		2.	CPU 1 enters synchronize_rcu(), write-acquiring
 			rcu_gp_mutex.
@@ -894,7 +1025,7 @@ Answer:		Just as PREEMPT_RT permits preemption of spinlock
 ACKNOWLEDGEMENTS
 
 My thanks to the people who helped make this human-readable, including
-Jon Walpole, Josh Triplett, Serge Hallyn, and Suzanne Wood.
+Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.
 
 
 For more information, see http://www.rdrop.com/users/paulmck/RCU.
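For case (a) in the new choice list above, where readers must be able
to block, SRCU is the required family.  A minimal sketch of the pattern;
my_srcu, reader(), and updater() are hypothetical names, and the
srcu_struct must first be set up with init_srcu_struct():

	static struct srcu_struct my_srcu;

	void reader(void)
	{
		int idx;

		idx = srcu_read_lock(&my_srcu);
		/* SRCU read-side critical section: blocking is permitted here. */
		srcu_read_unlock(&my_srcu, idx);
	}

	void updater(void)
	{
		/* First unlink the element from the enclosing structure, then: */
		synchronize_srcu(&my_srcu);	/* wait for all SRCU readers */
		/* Only now is it safe to free the unlinked element. */
	}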
