diff options
Diffstat (limited to 'Documentation/RCU')
| -rw-r--r-- | Documentation/RCU/00-INDEX | 4 | ||||
| -rw-r--r-- | Documentation/RCU/RTFP.txt | 149 | ||||
| -rw-r--r-- | Documentation/RCU/checklist.txt | 30 | ||||
| -rw-r--r-- | Documentation/RCU/rcu_dereference.txt | 371 | ||||
| -rw-r--r-- | Documentation/RCU/stallwarn.txt | 2 | ||||
| -rw-r--r-- | Documentation/RCU/trace.txt | 22 | ||||
| -rw-r--r-- | Documentation/RCU/whatisRCU.txt | 55 |
7 files changed, 579 insertions, 54 deletions
diff --git a/Documentation/RCU/00-INDEX b/Documentation/RCU/00-INDEX index 1d7a885761f..f773a264ae0 100644 --- a/Documentation/RCU/00-INDEX +++ b/Documentation/RCU/00-INDEX @@ -8,8 +8,12 @@ listRCU.txt - Using RCU to Protect Read-Mostly Linked Lists lockdep.txt - RCU and lockdep checking +lockdep-splat.txt + - RCU Lockdep splats explained. NMI-RCU.txt - Using RCU to Protect Dynamic NMI Handlers +rcu_dereference.txt + - Proper care and feeding of return values from rcu_dereference() rcubarrier.txt - RCU and Unloadable Modules rculist_nulls.txt diff --git a/Documentation/RCU/RTFP.txt b/Documentation/RCU/RTFP.txt index 273e654d7d0..2f0fcb2112d 100644 --- a/Documentation/RCU/RTFP.txt +++ b/Documentation/RCU/RTFP.txt @@ -31,6 +31,14 @@ has lapsed, so this approach may be used in non-GPL software, if desired. (In contrast, implementation of RCU is permitted only in software licensed under either GPL or LGPL. Sorry!!!) +In 1987, Rashid et al. described lazy TLB-flush [RichardRashid87a]. +At first glance, this has nothing to do with RCU, but nevertheless +this paper helped inspire the update-side batching used in the later +RCU implementation in DYNIX/ptx. In 1988, Barbara Liskov published +a description of Argus that noted that use of out-of-date values can +be tolerated in some situations. Thus, this paper provides some early +theoretical justification for use of stale data. + In 1990, Pugh [Pugh90] noted that explicitly tracking which threads were reading a given data structure permitted deferred free to operate in the presence of non-terminating threads. However, this explicit @@ -41,11 +49,11 @@ providing a fine-grained locking design, however, it would be interesting to see how much of the performance advantage reported in 1990 remains today. -At about this same time, Adams [Adams91] described ``chaotic relaxation'', -where the normal barriers between successive iterations of convergent -numerical algorithms are relaxed, so that iteration $n$ might use -data from iteration $n-1$ or even $n-2$. This introduces error, -which typically slows convergence and thus increases the number of +At about this same time, Andrews [Andrews91textbook] described ``chaotic +relaxation'', where the normal barriers between successive iterations +of convergent numerical algorithms are relaxed, so that iteration $n$ +might use data from iteration $n-1$ or even $n-2$. This introduces +error, which typically slows convergence and thus increases the number of iterations required. However, this increase is sometimes more than made up for by a reduction in the number of expensive barrier operations, which are otherwise required to synchronize the threads at the end @@ -55,7 +63,8 @@ is thus inapplicable to most data structures in operating-system kernels. In 1992, Henry (now Alexia) Massalin completed a dissertation advising parallel programmers to defer processing when feasible to simplify -synchronization. RCU makes extremely heavy use of this advice. +synchronization [HMassalinPhD]. RCU makes extremely heavy use of +this advice. In 1993, Jacobson [Jacobson93] verbally described what is perhaps the simplest deferred-free technique: simply waiting a fixed amount of time @@ -90,27 +99,29 @@ mechanism, which is quite similar to RCU [Gamsa99]. These operating systems made pervasive use of RCU in place of "existence locks", which greatly simplifies locking hierarchies and helps avoid deadlocks. -2001 saw the first RCU presentation involving Linux [McKenney01a] -at OLS. The resulting abundance of RCU patches was presented the -following year [McKenney02a], and use of RCU in dcache was first -described that same year [Linder02a]. +The year 2000 saw an email exchange that would likely have +led to yet another independent invention of something like RCU +[RustyRussell2000a,RustyRussell2000b]. Instead, 2001 saw the first +RCU presentation involving Linux [McKenney01a] at OLS. The resulting +abundance of RCU patches was presented the following year [McKenney02a], +and use of RCU in dcache was first described that same year [Linder02a]. Also in 2002, Michael [Michael02b,Michael02a] presented "hazard-pointer" techniques that defer the destruction of data structures to simplify non-blocking synchronization (wait-free synchronization, lock-free synchronization, and obstruction-free synchronization are all examples of -non-blocking synchronization). In particular, this technique eliminates -locking, reduces contention, reduces memory latency for readers, and -parallelizes pipeline stalls and memory latency for writers. However, -these techniques still impose significant read-side overhead in the -form of memory barriers. Researchers at Sun worked along similar lines -in the same timeframe [HerlihyLM02]. These techniques can be thought -of as inside-out reference counts, where the count is represented by the -number of hazard pointers referencing a given data structure rather than -the more conventional counter field within the data structure itself. -The key advantage of inside-out reference counts is that they can be -stored in immortal variables, thus allowing races between access and -deletion to be avoided. +non-blocking synchronization). The corresponding journal article appeared +in 2004 [MagedMichael04a]. This technique eliminates locking, reduces +contention, reduces memory latency for readers, and parallelizes pipeline +stalls and memory latency for writers. However, these techniques still +impose significant read-side overhead in the form of memory barriers. +Researchers at Sun worked along similar lines in the same timeframe +[HerlihyLM02]. These techniques can be thought of as inside-out reference +counts, where the count is represented by the number of hazard pointers +referencing a given data structure rather than the more conventional +counter field within the data structure itself. The key advantage +of inside-out reference counts is that they can be stored in immortal +variables, thus allowing races between access and deletion to be avoided. By the same token, RCU can be thought of as a "bulk reference count", where some form of reference counter covers all reference by a given CPU @@ -123,8 +134,10 @@ can be thought of in other terms as well. In 2003, the K42 group described how RCU could be used to create hot-pluggable implementations of operating-system functions [Appavoo03a]. -Later that year saw a paper describing an RCU implementation of System -V IPC [Arcangeli03], and an introduction to RCU in Linux Journal +Later that year saw a paper describing an RCU implementation +of System V IPC [Arcangeli03] (following up on a suggestion by +Hugh Dickins [Dickins02a] and an implementation by Mingming Cao +[MingmingCao2002IPCRCU]), and an introduction to RCU in Linux Journal [McKenney03a]. 2004 has seen a Linux-Journal article on use of RCU in dcache @@ -383,6 +396,21 @@ for Programming Languages and Operating Systems}" } } +@phdthesis{HMassalinPhD +,author="H. Massalin" +,title="Synthesis: An Efficient Implementation of Fundamental Operating +System Services" +,school="Columbia University" +,address="New York, NY" +,year="1992" +,annotation={ + Mondo optimizing compiler. + Wait-free stuff. + Good advice: defer work to avoid synchronization. See page 90 + (PDF page 106), Section 5.4, fourth bullet point. +} +} + @unpublished{Jacobson93 ,author="Van Jacobson" ,title="Avoid Read-Side Locking Via Delayed Free" @@ -671,6 +699,20 @@ Orran Krieger and Rusty Russell and Dipankar Sarma and Maneesh Soni" [Viewed October 18, 2004]" } +@conference{Michael02b +,author="Maged M. Michael" +,title="High Performance Dynamic Lock-Free Hash Tables and List-Based Sets" +,Year="2002" +,Month="August" +,booktitle="{Proceedings of the 14\textsuperscript{th} Annual ACM +Symposium on Parallel +Algorithms and Architecture}" +,pages="73-82" +,annotation={ +Like the title says... +} +} + @Conference{Linder02a ,Author="Hanna Linder and Dipankar Sarma and Maneesh Soni" ,Title="Scalability of the Directory Entry Cache" @@ -727,6 +769,24 @@ Andrea Arcangeli and Andi Kleen and Orran Krieger and Rusty Russell" } } +@conference{Michael02a +,author="Maged M. Michael" +,title="Safe Memory Reclamation for Dynamic Lock-Free Objects Using Atomic +Reads and Writes" +,Year="2002" +,Month="August" +,booktitle="{Proceedings of the 21\textsuperscript{st} Annual ACM +Symposium on Principles of Distributed Computing}" +,pages="21-30" +,annotation={ + Each thread keeps an array of pointers to items that it is + currently referencing. Sort of an inside-out garbage collection + mechanism, but one that requires the accessing code to explicitly + state its needs. Also requires read-side memory barriers on + most architectures. +} +} + @unpublished{Dickins02a ,author="Hugh Dickins" ,title="Use RCU for System-V IPC" @@ -735,6 +795,17 @@ Andrea Arcangeli and Andi Kleen and Orran Krieger and Rusty Russell" ,note="private communication" } +@InProceedings{HerlihyLM02 +,author={Maurice Herlihy and Victor Luchangco and Mark Moir} +,title="The Repeat Offender Problem: A Mechanism for Supporting Dynamic-Sized, +Lock-Free Data Structures" +,booktitle={Proceedings of 16\textsuperscript{th} International +Symposium on Distributed Computing} +,year=2002 +,month="October" +,pages="339-353" +} + @unpublished{Sarma02b ,Author="Dipankar Sarma" ,Title="Some dcache\_rcu benchmark numbers" @@ -749,6 +820,19 @@ Andrea Arcangeli and Andi Kleen and Orran Krieger and Rusty Russell" } } +@unpublished{MingmingCao2002IPCRCU +,Author="Mingming Cao" +,Title="[PATCH]updated ipc lock patch" +,month="October" +,year="2002" +,note="Available: +\url{https://lkml.org/lkml/2002/10/24/262} +[Viewed February 15, 2014]" +,annotation={ + Mingming Cao's patch to introduce RCU to SysV IPC. +} +} + @unpublished{LinusTorvalds2003a ,Author="Linus Torvalds" ,Title="Re: {[PATCH]} small fixes in brlock.h" @@ -982,6 +1066,23 @@ Realtime Applications" } } +@article{MagedMichael04a +,author="Maged M. Michael" +,title="Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects" +,Year="2004" +,Month="June" +,journal="IEEE Transactions on Parallel and Distributed Systems" +,volume="15" +,number="6" +,pages="491-504" +,url="Available: +\url{http://www.research.ibm.com/people/m/michael/ieeetpds-2004.pdf} +[Viewed March 1, 2005]" +,annotation={ + New canonical hazard-pointer citation. +} +} + @phdthesis{PaulEdwardMcKenneyPhD ,author="Paul E. McKenney" ,title="Exploiting Deferred Destruction: diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 91266193b8f..877947130eb 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -114,12 +114,16 @@ over a rather long period of time, but improvements are always welcome! http://www.openvms.compaq.com/wizard/wiz_2637.html The rcu_dereference() primitive is also an excellent - documentation aid, letting the person reading the code - know exactly which pointers are protected by RCU. + documentation aid, letting the person reading the + code know exactly which pointers are protected by RCU. Please note that compilers can also reorder code, and they are becoming increasingly aggressive about doing - just that. The rcu_dereference() primitive therefore - also prevents destructive compiler optimizations. + just that. The rcu_dereference() primitive therefore also + prevents destructive compiler optimizations. However, + with a bit of devious creativity, it is possible to + mishandle the return value from rcu_dereference(). + Please see rcu_dereference.txt in this directory for + more information. The rcu_dereference() primitive is used by the various "_rcu()" list-traversal primitives, such @@ -256,10 +260,10 @@ over a rather long period of time, but improvements are always welcome! variations on this theme. b. Limiting update rate. For example, if updates occur only - once per hour, then no explicit rate limiting is required, - unless your system is already badly broken. The dcache - subsystem takes this approach -- updates are guarded - by a global lock, limiting their rate. + once per hour, then no explicit rate limiting is + required, unless your system is already badly broken. + Older versions of the dcache subsystem take this approach, + guarding updates with a global lock, limiting their rate. c. Trusted update -- if updates can only be done manually by superuser or some other trusted user, then it might not @@ -268,7 +272,8 @@ over a rather long period of time, but improvements are always welcome! the machine. d. Use call_rcu_bh() rather than call_rcu(), in order to take - advantage of call_rcu_bh()'s faster grace periods. + advantage of call_rcu_bh()'s faster grace periods. (This + is only a partial solution, though.) e. Periodically invoke synchronize_rcu(), permitting a limited number of updates per grace period. @@ -276,6 +281,13 @@ over a rather long period of time, but improvements are always welcome! The same cautions apply to call_rcu_bh(), call_rcu_sched(), call_srcu(), and kfree_rcu(). + Note that although these primitives do take action to avoid memory + exhaustion when any given CPU has too many callbacks, a determined + user could still exhaust memory. This is especially the case + if a system with a large number of CPUs has been configured to + offload all of its RCU callbacks onto a single CPU, or if the + system has relatively little free memory. + 9. All RCU list-traversal primitives, which include rcu_dereference(), list_for_each_entry_rcu(), and list_for_each_safe_rcu(), must be either within an RCU read-side diff --git a/Documentation/RCU/rcu_dereference.txt b/Documentation/RCU/rcu_dereference.txt new file mode 100644 index 00000000000..ceb05da5a5a --- /dev/null +++ b/Documentation/RCU/rcu_dereference.txt @@ -0,0 +1,371 @@ +PROPER CARE AND FEEDING OF RETURN VALUES FROM rcu_dereference() + +Most of the time, you can use values from rcu_dereference() or one of +the similar primitives without worries. Dereferencing (prefix "*"), +field selection ("->"), assignment ("="), address-of ("&"), addition and +subtraction of constants, and casts all work quite naturally and safely. + +It is nevertheless possible to get into trouble with other operations. +Follow these rules to keep your RCU code working properly: + +o You must use one of the rcu_dereference() family of primitives + to load an RCU-protected pointer, otherwise CONFIG_PROVE_RCU + will complain. Worse yet, your code can see random memory-corruption + bugs due to games that compilers and DEC Alpha can play. + Without one of the rcu_dereference() primitives, compilers + can reload the value, and won't your code have fun with two + different values for a single pointer! Without rcu_dereference(), + DEC Alpha can load a pointer, dereference that pointer, and + return data preceding initialization that preceded the store of + the pointer. + + In addition, the volatile cast in rcu_dereference() prevents the + compiler from deducing the resulting pointer value. Please see + the section entitled "EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH" + for an example where the compiler can in fact deduce the exact + value of the pointer, and thus cause misordering. + +o Do not use single-element RCU-protected arrays. The compiler + is within its right to assume that the value of an index into + such an array must necessarily evaluate to zero. The compiler + could then substitute the constant zero for the computation, so + that the array index no longer depended on the value returned + by rcu_dereference(). If the array index no longer depends + on rcu_dereference(), then both the compiler and the CPU + are within their rights to order the array access before the + rcu_dereference(), which can cause the array access to return + garbage. + +o Avoid cancellation when using the "+" and "-" infix arithmetic + operators. For example, for a given variable "x", avoid + "(x-x)". There are similar arithmetic pitfalls from other + arithmetic operatiors, such as "(x*0)", "(x/(x+1))" or "(x%1)". + The compiler is within its rights to substitute zero for all of + these expressions, so that subsequent accesses no longer depend + on the rcu_dereference(), again possibly resulting in bugs due + to misordering. + + Of course, if "p" is a pointer from rcu_dereference(), and "a" + and "b" are integers that happen to be equal, the expression + "p+a-b" is safe because its value still necessarily depends on + the rcu_dereference(), thus maintaining proper ordering. + +o Avoid all-zero operands to the bitwise "&" operator, and + similarly avoid all-ones operands to the bitwise "|" operator. + If the compiler is able to deduce the value of such operands, + it is within its rights to substitute the corresponding constant + for the bitwise operation. Once again, this causes subsequent + accesses to no longer depend on the rcu_dereference(), causing + bugs due to misordering. + + Please note that single-bit operands to bitwise "&" can also + be dangerous. At this point, the compiler knows that the + resulting value can only take on one of two possible values. + Therefore, a very small amount of additional information will + allow the compiler to deduce the exact value, which again can + result in misordering. + +o If you are using RCU to protect JITed functions, so that the + "()" function-invocation operator is applied to a value obtained + (directly or indirectly) from rcu_dereference(), you may need to + interact directly with the hardware to flush instruction caches. + This issue arises on some systems when a newly JITed function is + using the same memory that was used by an earlier JITed function. + +o Do not use the results from the boolean "&&" and "||" when + dereferencing. For example, the following (rather improbable) + code is buggy: + + int a[2]; + int index; + int force_zero_index = 1; + + ... + + r1 = rcu_dereference(i1) + r2 = a[r1 && force_zero_index]; /* BUGGY!!! */ + + The reason this is buggy is that "&&" and "||" are often compiled + using branches. While weak-memory machines such as ARM or PowerPC + do order stores after such branches, they can speculate loads, + which can result in misordering bugs. + +o Do not use the results from relational operators ("==", "!=", + ">", ">=", "<", or "<=") when dereferencing. For example, + the following (quite strange) code is buggy: + + int a[2]; + int index; + int flip_index = 0; + + ... + + r1 = rcu_dereference(i1) + r2 = a[r1 != flip_index]; /* BUGGY!!! */ + + As before, the reason this is buggy is that relational operators + are often compiled using branches. And as before, although + weak-memory machines such as ARM or PowerPC do order stores + after such branches, but can speculate loads, which can again + result in misordering bugs. + +o Be very careful about comparing pointers obtained from + rcu_dereference() against non-NULL values. As Linus Torvalds + explained, if the two pointers are equal, the compiler could + substitute the pointer you are comparing against for the pointer + obtained from rcu_dereference(). For example: + + p = rcu_dereference(gp); + if (p == &default_struct) + do_default(p->a); + + Because the compiler now knows that the value of "p" is exactly + the address of the variable "default_struct", it is free to + transform this code into the following: + + p = rcu_dereference(gp); + if (p == &default_struct) + do_default(default_struct.a); + + On ARM and Power hardware, the load from "default_struct.a" + can now be speculated, such that it might happen before the + rcu_dereference(). This could result in bugs due to misordering. + + However, comparisons are OK in the following cases: + + o The comparison was against the NULL pointer. If the + compiler knows that the pointer is NULL, you had better + not be dereferencing it anyway. If the comparison is + non-equal, the compiler is none the wiser. Therefore, + it is safe to compare pointers from rcu_dereference() + against NULL pointers. + + o The pointer is never dereferenced after being compared. + Since there are no subsequent dereferences, the compiler + cannot use anything it learned from the comparison + to reorder the non-existent subsequent dereferences. + This sort of comparison occurs frequently when scanning + RCU-protected circular linked lists. + + o The comparison is against a pointer that references memory + that was initialized "a long time ago." The reason + this is safe is that even if misordering occurs, the + misordering will not affect the accesses that follow + the comparison. So exactly how long ago is "a long + time ago"? Here are some possibilities: + + o Compile time. + + o Boot time. + + o Module-init time for module code. + + o Prior to kthread creation for kthread code. + + o During some prior acquisition of the lock that + we now hold. + + o Before mod_timer() time for a timer handler. + + There are many other possibilities involving the Linux + kernel's wide array of primitives that cause code to + be invoked at a later time. + + o The pointer being compared against also came from + rcu_dereference(). In this case, both pointers depend + on one rcu_dereference() or another, so you get proper + ordering either way. + + That said, this situation can make certain RCU usage + bugs more likely to happen. Which can be a good thing, + at least if they happen during testing. An example + of such an RCU usage bug is shown in the section titled + "EXAMPLE OF AMPLIFIED RCU-USAGE BUG". + + o All of the accesses following the comparison are stores, + so that a control dependency preserves the needed ordering. + That said, it is easy to get control dependencies wrong. + Please see the "CONTROL DEPENDENCIES" section of + Documentation/memory-barriers.txt for more details. + + o The pointers are not equal -and- the compiler does + not have enough information to deduce the value of the + pointer. Note that the volatile cast in rcu_dereference() + will normally prevent the compiler from knowing too much. + +o Disable any value-speculation optimizations that your compiler + might provide, especially if you are making use of feedback-based + optimizations that take data collected from prior runs. Such + value-speculation optimizations reorder operations by design. + + There is one exception to this rule: Value-speculation + optimizations that leverage the branch-prediction hardware are + safe on strongly ordered systems (such as x86), but not on weakly + ordered systems (such as ARM or Power). Choose your compiler + command-line options wisely! + + +EXAMPLE OF AMPLIFIED RCU-USAGE BUG + +Because updaters can run concurrently with RCU readers, RCU readers can +see stale and/or inconsistent values. If RCU readers need fresh or +consistent values, which they sometimes do, they need to take proper +precautions. To see this, consider the following code fragment: + + struct foo { + int a; + int b; + int c; + }; + struct foo *gp1; + struct foo *gp2; + + void updater(void) + { + struct foo *p; + + p = kmalloc(...); + if (p == NULL) + deal_with_it(); + p->a = 42; /* Each field in its own cache line. */ + p->b = 43; + p->c = 44; + rcu_assign_pointer(gp1, p); + p->b = 143; + p->c = 144; + rcu_assign_pointer(gp2, p); + } + + void reader(void) + { + struct foo *p; + struct foo *q; + int r1, r2; + + p = rcu_dereference(gp2); + if (p == NULL) + return; + r1 = p->b; /* Guaranteed to get 143. */ + q = rcu_dereference(gp1); /* Guaranteed non-NULL. */ + if (p == q) { + /* The compiler decides that q->c is same as p->c. */ + r2 = p->c; /* Could get 44 on weakly order system. */ + } + do_something_with(r1, r2); + } + +You might be surprised that the outcome (r1 == 143 && r2 == 44) is possible, +but you should not be. After all, the updater might have been invoked +a second time between the time reader() loaded into "r1" and the time +that it loaded into "r2". The fact that this same result can occur due +to some reordering from the compiler and CPUs is beside the point. + +But suppose that the reader needs a consistent view? + +Then one approach is to use locking, for example, as follows: + + struct foo { + int a; + int b; + int c; + spinlock_t lock; + }; + struct foo *gp1; + struct foo *gp2; + + void updater(void) + { + struct foo *p; + + p = kmalloc(...); + if (p == NULL) + deal_with_it(); + spin_lock(&p->lock); + p->a = 42; /* Each field in its own cache line. */ + p->b = 43; + p->c = 44; + spin_unlock(&p->lock); + rcu_assign_pointer(gp1, p); + spin_lock(&p->lock); + p->b = 143; + p->c = 144; + spin_unlock(&p->lock); + rcu_assign_pointer(gp2, p); + } + + void reader(void) + { + struct foo *p; + struct foo *q; + int r1, r2; + + p = rcu_dereference(gp2); + if (p == NULL) + return; + spin_lock(&p->lock); + r1 = p->b; /* Guaranteed to get 143. */ + q = rcu_dereference(gp1); /* Guaranteed non-NULL. */ + if (p == q) { + /* The compiler decides that q->c is same as p->c. */ + r2 = p->c; /* Locking guarantees r2 == 144. */ + } + spin_unlock(&p->lock); + do_something_with(r1, r2); + } + +As always, use the right tool for the job! + + +EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH + +If a pointer obtained from rcu_dereference() compares not-equal to some +other pointer, the compiler normally has no clue what the value of the +first pointer might be. This lack of knowledge prevents the compiler +from carrying out optimizations that otherwise might destroy the ordering +guarantees that RCU depends on. And the volatile cast in rcu_dereference() +should prevent the compiler from guessing the value. + +But without rcu_dereference(), the compiler knows more than you might +expect. Consider the following code fragment: + + struct foo { + int a; + int b; + }; + static struct foo variable1; + static struct foo variable2; + static struct foo *gp = &variable1; + + void updater(void) + { + initialize_foo(&variable2); + rcu_assign_pointer(gp, &variable2); + /* + * The above is the only store to gp in this translation unit, + * and the address of gp is not exported in any way. + */ + } + + int reader(void) + { + struct foo *p; + + p = gp; + barrier(); + if (p == &variable1) + return p->a; /* Must be variable1.a. */ + else + return p->b; /* Must be variable2.b. */ + } + +Because the compiler can see all stores to "gp", it knows that the only +possible values of "gp" are "variable1" on the one hand and "variable2" +on the other. The comparison in reader() therefore tells the compiler +the exact value of "p" even in the not-equals case. This allows the +compiler to make the return values independent of the load from "gp", +in turn destroying the ordering between this load and the loads of the +return values. This can result in "p->b" returning pre-initialization +garbage values. + +In short, rcu_dereference() is -not- optional when you are going to +dereference the resulting pointer. diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 6f3a0057548..68fe3ad2701 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -24,7 +24,7 @@ CONFIG_RCU_CPU_STALL_TIMEOUT timing of the next warning for the current stall. Stall-warning messages may be enabled and disabled completely via - /sys/module/rcutree/parameters/rcu_cpu_stall_suppress. + /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress. CONFIG_RCU_CPU_STALL_VERBOSE diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index f3778f8952d..910870b15ac 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -396,14 +396,14 @@ o Each element of the form "3/3 ..>. 0:7 ^0" represents one rcu_node The output of "cat rcu/rcu_sched/rcu_pending" looks as follows: - 0!np=26111 qsp=29 rpq=5386 cbr=1 cng=570 gpc=3674 gps=577 nn=15903 - 1!np=28913 qsp=35 rpq=6097 cbr=1 cng=448 gpc=3700 gps=554 nn=18113 - 2!np=32740 qsp=37 rpq=6202 cbr=0 cng=476 gpc=4627 gps=546 nn=20889 - 3 np=23679 qsp=22 rpq=5044 cbr=1 cng=415 gpc=3403 gps=347 nn=14469 - 4!np=30714 qsp=4 rpq=5574 cbr=0 cng=528 gpc=3931 gps=639 nn=20042 - 5 np=28910 qsp=2 rpq=5246 cbr=0 cng=428 gpc=4105 gps=709 nn=18422 - 6!np=38648 qsp=5 rpq=7076 cbr=0 cng=840 gpc=4072 gps=961 nn=25699 - 7 np=37275 qsp=2 rpq=6873 cbr=0 cng=868 gpc=3416 gps=971 nn=25147 + 0!np=26111 qsp=29 rpq=5386 cbr=1 cng=570 gpc=3674 gps=577 nn=15903 ndw=0 + 1!np=28913 qsp=35 rpq=6097 cbr=1 cng=448 gpc=3700 gps=554 nn=18113 ndw=0 + 2!np=32740 qsp=37 rpq=6202 cbr=0 cng=476 gpc=4627 gps=546 nn=20889 ndw=0 + 3 np=23679 qsp=22 rpq=5044 cbr=1 cng=415 gpc=3403 gps=347 nn=14469 ndw=0 + 4!np=30714 qsp=4 rpq=5574 cbr=0 cng=528 gpc=3931 gps=639 nn=20042 ndw=0 + 5 np=28910 qsp=2 rpq=5246 cbr=0 cng=428 gpc=4105 gps=709 nn=18422 ndw=0 + 6!np=38648 qsp=5 rpq=7076 cbr=0 cng=840 gpc=4072 gps=961 nn=25699 ndw=0 + 7 np=37275 qsp=2 rpq=6873 cbr=0 cng=868 gpc=3416 gps=971 nn=25147 ndw=0 The fields are as follows: @@ -432,6 +432,10 @@ o "gpc" is the number of times that an old grace period had o "gps" is the number of times that a new grace period had started, but this CPU was not yet aware of it. +o "ndw" is the number of times that a wakeup of an rcuo + callback-offload kthread had to be deferred in order to avoid + deadlock. + o "nn" is the number of times that this CPU needed nothing. @@ -443,7 +447,7 @@ The output of "cat rcu/rcuboost" looks as follows: balk: nt=0 egt=6541 bt=0 nb=0 ny=126 nos=0 This information is output only for rcu_preempt. Each two-line entry -corresponds to a leaf rcu_node strcuture. The fields are as follows: +corresponds to a leaf rcu_node structure. The fields are as follows: o "n:m" is the CPU-number range for the corresponding two-line entry. In the sample output above, the first entry covers diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index 0f0fb7c432c..49b8551a3b6 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -326,11 +326,11 @@ used as follows: a. synchronize_rcu() rcu_read_lock() / rcu_read_unlock() call_rcu() rcu_dereference() -b. call_rcu_bh() rcu_read_lock_bh() / rcu_read_unlock_bh() - rcu_dereference_bh() +b. synchronize_rcu_bh() rcu_read_lock_bh() / rcu_read_unlock_bh() + call_rcu_bh() rcu_dereference_bh() c. synchronize_sched() rcu_read_lock_sched() / rcu_read_unlock_sched() - preempt_disable() / preempt_enable() + call_rcu_sched() preempt_disable() / preempt_enable() local_irq_save() / local_irq_restore() hardirq enter / hardirq exit NMI enter / NMI exit @@ -794,10 +794,22 @@ in docbook. Here is the list, by category. RCU list traversal: + list_entry_rcu + list_first_entry_rcu + list_next_rcu list_for_each_entry_rcu + list_for_each_entry_continue_rcu + hlist_first_rcu + hlist_next_rcu + hlist_pprev_rcu hlist_for_each_entry_rcu + hlist_for_each_entry_rcu_bh + hlist_for_each_entry_continue_rcu + hlist_for_each_entry_continue_rcu_bh + hlist_nulls_first_rcu hlist_nulls_for_each_entry_rcu - list_for_each_entry_continue_rcu + hlist_bl_first_rcu + hlist_bl_for_each_entry_rcu RCU pointer/list update: @@ -806,28 +818,38 @@ RCU pointer/list update: list_add_tail_rcu list_del_rcu list_replace_rcu - hlist_del_rcu hlist_add_after_rcu hlist_add_before_rcu hlist_add_head_rcu + hlist_del_rcu + hlist_del_init_rcu hlist_replace_rcu list_splice_init_rcu() + hlist_nulls_del_init_rcu + hlist_nulls_del_rcu + hlist_nulls_add_head_rcu + hlist_bl_add_head_rcu + hlist_bl_del_init_rcu + hlist_bl_del_rcu + hlist_bl_set_first_rcu RCU: Critical sections Grace period Barrier rcu_read_lock synchronize_net rcu_barrier rcu_read_unlock synchronize_rcu rcu_dereference synchronize_rcu_expedited - call_rcu - kfree_rcu - + rcu_read_lock_held call_rcu + rcu_dereference_check kfree_rcu + rcu_dereference_protected bh: Critical sections Grace period Barrier rcu_read_lock_bh call_rcu_bh rcu_barrier_bh rcu_read_unlock_bh synchronize_rcu_bh rcu_dereference_bh synchronize_rcu_bh_expedited - + rcu_dereference_bh_check + rcu_dereference_bh_protected + rcu_read_lock_bh_held sched: Critical sections Grace period Barrier @@ -835,7 +857,12 @@ sched: Critical sections Grace period Barrier rcu_read_unlock_sched call_rcu_sched [preempt_disable] synchronize_sched_expedited [and friends] + rcu_read_lock_sched_notrace + rcu_read_unlock_sched_notrace rcu_dereference_sched + rcu_dereference_sched_check + rcu_dereference_sched_protected + rcu_read_lock_sched_held SRCU: Critical sections Grace period Barrier @@ -843,6 +870,8 @@ SRCU: Critical sections Grace period Barrier srcu_read_lock synchronize_srcu srcu_barrier srcu_read_unlock call_srcu srcu_dereference synchronize_srcu_expedited + srcu_dereference_check + srcu_read_lock_held SRCU: Initialization/cleanup init_srcu_struct @@ -850,9 +879,13 @@ SRCU: Initialization/cleanup All: lockdep-checked RCU-protected pointer access - rcu_dereference_check - rcu_dereference_protected + rcu_access_index rcu_access_pointer + rcu_dereference_index_check + rcu_dereference_raw + rcu_lockdep_assert + rcu_sleep_check + RCU_NONIDLE See the comment headers in the source code (or the docbook generated from them) for more information. |
