aboutsummaryrefslogtreecommitdiff
path: root/Documentation/memory-barriers.txt
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/memory-barriers.txt')
-rw-r--r--Documentation/memory-barriers.txt1913
1 files changed, 1913 insertions, 0 deletions
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
new file mode 100644
index 00000000000..f8550310a6d
--- /dev/null
+++ b/Documentation/memory-barriers.txt
@@ -0,0 +1,1913 @@
+ ============================
+ LINUX KERNEL MEMORY BARRIERS
+ ============================
+
+By: David Howells <dhowells@redhat.com>
+
+Contents:
+
+ (*) Abstract memory access model.
+
+ - Device operations.
+ - Guarantees.
+
+ (*) What are memory barriers?
+
+ - Varieties of memory barrier.
+ - What may not be assumed about memory barriers?
+ - Data dependency barriers.
+ - Control dependencies.
+ - SMP barrier pairing.
+ - Examples of memory barrier sequences.
+
+ (*) Explicit kernel barriers.
+
+ - Compiler barrier.
+ - The CPU memory barriers.
+ - MMIO write barrier.
+
+ (*) Implicit kernel memory barriers.
+
+ - Locking functions.
+ - Interrupt disabling functions.
+ - Miscellaneous functions.
+
+ (*) Inter-CPU locking barrier effects.
+
+ - Locks vs memory accesses.
+ - Locks vs I/O accesses.
+
+ (*) Where are memory barriers needed?
+
+ - Interprocessor interaction.
+ - Atomic operations.
+ - Accessing devices.
+ - Interrupts.
+
+ (*) Kernel I/O barrier effects.
+
+ (*) Assumed minimum execution ordering model.
+
+ (*) The effects of the cpu cache.
+
+ - Cache coherency.
+ - Cache coherency vs DMA.
+ - Cache coherency vs MMIO.
+
+ (*) The things CPUs get up to.
+
+ - And then there's the Alpha.
+
+ (*) References.
+
+
+============================
+ABSTRACT MEMORY ACCESS MODEL
+============================
+
+Consider the following abstract model of the system:
+
+ : :
+ : :
+ : :
+ +-------+ : +--------+ : +-------+
+ | | : | | : | |
+ | | : | | : | |
+ | CPU 1 |<----->| Memory |<----->| CPU 2 |
+ | | : | | : | |
+ | | : | | : | |
+ +-------+ : +--------+ : +-------+
+ ^ : ^ : ^
+ | : | : |
+ | : | : |
+ | : v : |
+ | : +--------+ : |
+ | : | | : |
+ | : | | : |
+ +---------->| Device |<----------+
+ : | | :
+ : | | :
+ : +--------+ :
+ : :
+
+Each CPU executes a program that generates memory access operations. In the
+abstract CPU, memory operation ordering is very relaxed, and a CPU may actually
+perform the memory operations in any order it likes, provided program causality
+appears to be maintained. Similarly, the compiler may also arrange the
+instructions it emits in any order it likes, provided it doesn't affect the
+apparent operation of the program.
+
+So in the above diagram, the effects of the memory operations performed by a
+CPU are perceived by the rest of the system as the operations cross the
+interface between the CPU and rest of the system (the dotted lines).
+
+
+For example, consider the following sequence of events:
+
+ CPU 1 CPU 2
+ =============== ===============
+ { A == 1; B == 2 }
+ A = 3; x = A;
+ B = 4; y = B;
+
+The set of accesses as seen by the memory system in the middle can be arranged
+in 24 different combinations:
+
+ STORE A=3, STORE B=4, x=LOAD A->3, y=LOAD B->4
+ STORE A=3, STORE B=4, y=LOAD B->4, x=LOAD A->3
+ STORE A=3, x=LOAD A->3, STORE B=4, y=LOAD B->4
+ STORE A=3, x=LOAD A->3, y=LOAD B->2, STORE B=4
+ STORE A=3, y=LOAD B->2, STORE B=4, x=LOAD A->3
+ STORE A=3, y=LOAD B->2, x=LOAD A->3, STORE B=4
+ STORE B=4, STORE A=3, x=LOAD A->3, y=LOAD B->4
+ STORE B=4, ...
+ ...
+
+and can thus result in four different combinations of values:
+
+ x == 1, y == 2
+ x == 1, y == 4
+ x == 3, y == 2
+ x == 3, y == 4
+
+
+Furthermore, the stores committed by a CPU to the memory system may not be
+perceived by the loads made by another CPU in the same order as the stores were
+committed.
+
+
+As a further example, consider this sequence of events:
+
+ CPU 1 CPU 2
+ =============== ===============
+ { A == 1, B == 2, C = 3, P == &A, Q == &C }
+ B = 4; Q = P;
+ P = &B D = *Q;
+
+There is an obvious data dependency here, as the value loaded into D depends on
+the address retrieved from P by CPU 2. At the end of the sequence, any of the
+following results are possible:
+
+ (Q == &A) and (D == 1)
+ (Q == &B) and (D == 2)
+ (Q == &B) and (D == 4)
+
+Note that CPU 2 will never try and load C into D because the CPU will load P
+into Q before issuing the load of *Q.
+
+
+DEVICE OPERATIONS
+-----------------
+
+Some devices present their control interfaces as collections of memory
+locations, but the order in which the control registers are accessed is very
+important. For instance, imagine an ethernet card with a set of internal
+registers that are accessed through an address port register (A) and a data
+port register (D). To read internal register 5, the following code might then
+be used:
+
+ *A = 5;
+ x = *D;
+
+but this might show up as either of the following two sequences:
+
+ STORE *A = 5, x = LOAD *D
+ x = LOAD *D, STORE *A = 5
+
+the second of which will almost certainly result in a malfunction, since it set
+the address _after_ attempting to read the register.
+
+
+GUARANTEES
+----------
+
+There are some minimal guarantees that may be expected of a CPU:
+
+ (*) On any given CPU, dependent memory accesses will be issued in order, with
+ respect to itself. This means that for:
+
+ Q = P; D = *Q;
+
+ the CPU will issue the following memory operations:
+
+ Q = LOAD P, D = LOAD *Q
+
+ and always in that order.
+
+ (*) Overlapping loads and stores within a particular CPU will appear to be
+ ordered within that CPU. This means that for:
+
+ a = *X; *X = b;
+
+ the CPU will only issue the following sequence of memory operations:
+
+ a = LOAD *X, STORE *X = b
+
+ And for:
+
+ *X = c; d = *X;
+
+ the CPU will only issue:
+
+ STORE *X = c, d = LOAD *X
+
+ (Loads and stores overlap if they are targetted at overlapping pieces of
+ memory).
+
+And there are a number of things that _must_ or _must_not_ be assumed:
+
+ (*) It _must_not_ be assumed that independent loads and stores will be issued
+ in the order given. This means that for:
+
+ X = *A; Y = *B; *D = Z;
+
+ we may get any of the following sequences:
+
+ X = LOAD *A, Y = LOAD *B, STORE *D = Z
+ X = LOAD *A, STORE *D = Z, Y = LOAD *B
+ Y = LOAD *B, X = LOAD *A, STORE *D = Z
+ Y = LOAD *B, STORE *D = Z, X = LOAD *A
+ STORE *D = Z, X = LOAD *A, Y = LOAD *B
+ STORE *D = Z, Y = LOAD *B, X = LOAD *A
+
+ (*) It _must_ be assumed that overlapping memory accesses may be merged or
+ discarded. This means that for:
+
+ X = *A; Y = *(A + 4);
+
+ we may get any one of the following sequences:
+
+ X = LOAD *A; Y = LOAD *(A + 4);
+ Y = LOAD *(A + 4); X = LOAD *A;
+ {X, Y} = LOAD {*A, *(A + 4) };
+
+ And for:
+
+ *A = X; Y = *A;
+
+ we may get either of:
+
+ STORE *A = X; Y = LOAD *A;
+ STORE *A = Y;
+
+
+=========================
+WHAT ARE MEMORY BARRIERS?
+=========================
+
+As can be seen above, independent memory operations are effectively performed
+in random order, but this can be a problem for CPU-CPU interaction and for I/O.
+What is required is some way of intervening to instruct the compiler and the
+CPU to restrict the order.
+
+Memory barriers are such interventions. They impose a perceived partial
+ordering between the memory operations specified on either side of the barrier.
+They request that the sequence of memory events generated appears to other
+parts of the system as if the barrier is effective on that CPU.
+
+
+VARIETIES OF MEMORY BARRIER
+---------------------------
+
+Memory barriers come in four basic varieties:
+
+ (1) Write (or store) memory barriers.
+
+ A write memory barrier gives a guarantee that all the STORE operations
+ specified before the barrier will appear to happen before all the STORE
+ operations specified after the barrier with respect to the other
+ components of the system.
+
+ A write barrier is a partial ordering on stores only; it is not required
+ to have any effect on loads.
+
+ A CPU can be viewed as as commiting a sequence of store operations to the
+ memory system as time progresses. All stores before a write barrier will
+ occur in the sequence _before_ all the stores after the write barrier.
+
+ [!] Note that write barriers should normally be paired with read or data
+ dependency barriers; see the "SMP barrier pairing" subsection.
+
+
+ (2) Data dependency barriers.
+
+ A data dependency barrier is a weaker form of read barrier. In the case
+ where two loads are performed such that the second depends on the result
+ of the first (eg: the first load retrieves the address to which the second
+ load will be directed), a data dependency barrier would be required to
+ make sure that the target of the second load is updated before the address
+ obtained by the first load is accessed.
+
+ A data dependency barrier is a partial ordering on interdependent loads
+ only; it is not required to have any effect on stores, independent loads
+ or overlapping loads.
+
+ As mentioned in (1), the other CPUs in the system can be viewed as
+ committing sequences of stores to the memory system that the CPU being
+ considered can then perceive. A data dependency barrier issued by the CPU
+ under consideration guarantees that for any load preceding it, if that
+ load touches one of a sequence of stores from another CPU, then by the
+ time the barrier completes, the effects of all the stores prior to that
+ touched by the load will be perceptible to any loads issued after the data
+ dependency barrier.
+
+ See the "Examples of memory barrier sequences" subsection for diagrams
+ showing the ordering constraints.
+
+ [!] Note that the first load really has to have a _data_ dependency and
+ not a control dependency. If the address for the second load is dependent
+ on the first load, but the dependency is through a conditional rather than
+ actually loading the address itself, then it's a _control_ dependency and
+ a full read barrier or better is required. See the "Control dependencies"
+ subsection for more information.
+
+ [!] Note that data dependency barriers should normally be paired with
+ write barriers; see the "SMP barrier pairing" subsection.
+
+
+ (3) Read (or load) memory barriers.
+
+ A read barrier is a data dependency barrier plus a guarantee that all the
+ LOAD operations specified before the barrier will appear to happen before
+ all the LOAD operations specified after the barrier with respect to the
+ other components of the system.
+
+ A read barrier is a partial ordering on loads only; it is not required to
+ have any effect on stores.
+
+ Read memory barriers imply data dependency barriers, and so can substitute
+ for them.
+
+ [!] Note that read barriers should normally be paired with write barriers;
+ see the "SMP barrier pairing" subsection.
+
+
+ (4) General memory barriers.
+
+ A general memory barrier is a combination of both a read memory barrier
+ and a write memory barrier. It is a partial ordering over both loads and
+ stores.
+
+ General memory barriers imply both read and write memory barriers, and so
+ can substitute for either.
+
+
+And a couple of implicit varieties:
+
+ (5) LOCK operations.
+
+ This acts as a one-way permeable barrier. It guarantees that all memory
+ operations after the LOCK operation will appear to happen after the LOCK
+ operation with respect to the other components of the system.
+
+ Memory operations that occur before a LOCK operation may appear to happen
+ after it completes.
+
+ A LOCK operation should almost always be paired with an UNLOCK operation.
+
+
+ (6) UNLOCK operations.
+
+ This also acts as a one-way permeable barrier. It guarantees that all
+ memory operations before the UNLOCK operation will appear to happen before
+ the UNLOCK operation with respect to the other components of the system.
+
+ Memory operations that occur after an UNLOCK operation may appear to
+ happen before it completes.
+
+ LOCK and UNLOCK operations are guaranteed to appear with respect to each
+ other strictly in the order specified.
+
+ The use of LOCK and UNLOCK operations generally precludes the need for
+ other sorts of memory barrier (but note the exceptions mentioned in the
+ subsection "MMIO write barrier").
+
+
+Memory barriers are only required where there's a possibility of interaction
+between two CPUs or between a CPU and a device. If it can be guaranteed that
+there won't be any such interaction in any particular piece of code, then
+memory barriers are unnecessary in that piece of code.
+
+
+Note that these are the _minimum_ guarantees. Different architectures may give
+more substantial guarantees, but they may _not_ be relied upon outside of arch
+specific code.
+
+
+WHAT MAY NOT BE ASSUMED ABOUT MEMORY BARRIERS?
+----------------------------------------------
+
+There are certain things that the Linux kernel memory barriers do not guarantee:
+
+ (*) There is no guarantee that any of the memory accesses specified before a
+ memory barrier will be _complete_ by the completion of a memory barrier
+ instruction; the barrier can be considered to draw a line in that CPU's
+ access queue that accesses of the appropriate type may not cross.
+
+ (*) There is no guarantee that issuing a memory barrier on one CPU will have
+ any direct effect on another CPU or any other hardware in the system. The
+ indirect effect will be the order in which the second CPU sees the effects
+ of the first CPU's accesses occur, but see the next point:
+
+ (*) There is no guarantee that the a CPU will see the correct order of effects
+ from a second CPU's accesses, even _if_ the second CPU uses a memory
+ barrier, unless the first CPU _also_ uses a matching memory barrier (see
+ the subsection on "SMP Barrier Pairing").
+
+ (*) There is no guarantee that some intervening piece of off-the-CPU
+ hardware[*] will not reorder the memory accesses. CPU cache coherency
+ mechanisms should propagate the indirect effects of a memory barrier
+ between CPUs, but might not do so in order.
+
+ [*] For information on bus mastering DMA and coherency please read:
+
+ Documentation/pci.txt
+ Documentation/DMA-mapping.txt
+ Documentation/DMA-API.txt
+
+
+DATA DEPENDENCY BARRIERS
+------------------------
+
+The usage requirements of data dependency barriers are a little subtle, and
+it's not always obvious that they're needed. To illustrate, consider the
+following sequence of events:
+
+ CPU 1 CPU 2
+ =============== ===============
+ { A == 1, B == 2, C = 3, P == &A, Q == &C }
+ B = 4;
+ <write barrier>
+ P = &B
+ Q = P;
+ D = *Q;
+
+There's a clear data dependency here, and it would seem that by the end of the
+sequence, Q must be either &A or &B, and that:
+
+ (Q == &A) implies (D == 1)
+ (Q == &B) implies (D == 4)
+
+But! CPU 2's perception of P may be updated _before_ its perception of B, thus
+leading to the following situation:
+
+ (Q == &B) and (D == 2) ????
+
+Whilst this may seem like a failure of coherency or causality maintenance, it
+isn't, and this behaviour can be observed on certain real CPUs (such as the DEC
+Alpha).
+
+To deal with this, a data dependency barrier must be inserted between the
+address load and the data load:
+
+ CPU 1 CPU 2
+ =============== ===============
+ { A == 1, B == 2, C = 3, P == &A, Q == &C }
+ B = 4;
+ <write barrier>
+ P = &B
+ Q = P;
+ <data dependency barrier>
+ D = *Q;
+
+This enforces the occurrence of one of the two implications, and prevents the
+third possibility from arising.
+
+[!] Note that this extremely counterintuitive situation arises most easily on
+machines with split caches, so that, for example, one cache bank processes
+even-numbered cache lines and the other bank processes odd-numbered cache
+lines. The pointer P might be stored in an odd-numbered cache line, and the
+variable B might be stored in an even-numbered cache line. Then, if the
+even-numbered bank of the reading CPU's cache is extremely busy while the
+odd-numbered bank is idle, one can see the new value of the pointer P (&B),
+but the old value of the variable B (1).
+
+
+Another example of where data dependency barriers might by required is where a
+number is read from memory and then used to calculate the index for an array
+access:
+
+ CPU 1 CPU 2
+ =============== ===============
+ { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
+ M[1] = 4;
+ <write barrier>
+ P = 1
+ Q = P;
+ <data dependency barrier>
+ D = M[Q];
+
+
+The data dependency barrier is very important to the RCU system, for example.
+See rcu_dereference() in include/linux/rcupdate.h. This permits the current
+target of an RCU'd pointer to be replaced with a new modified target, without
+the replacement target appearing to be incompletely initialised.
+
+See also the subsection on "Cache Coherency" for a more thorough example.
+
+
+CONTROL DEPENDENCIES
+--------------------
+
+A control dependency requires a full read memory barrier, not simply a data
+dependency barrier to make it work correctly. Consider the following bit of
+code:
+
+ q = &a;
+ if (p)
+ q = &b;
+ <data dependency barrier>
+ x = *q;
+
+This will not have the desired effect because there is no actual data
+dependency, but rather a control dependency that the CPU may short-circuit by
+attempting to predict the outcome in advance. In such a case what's actually
+required is:
+
+ q = &a;
+ if (p)
+ q = &b;
+ <read barrier>
+ x = *q;
+
+
+SMP BARRIER PAIRING
+-------------------
+
+When dealing with CPU-CPU interactions, certain types of memory barrier should
+always be paired. A lack of appropriate pairing is almost certainly an error.
+
+A write barrier should always be paired with a data dependency barrier or read
+barrier, though a general barrier would also be viable. Similarly a read
+barrier or a data dependency barrier should always be paired with at least an
+write barrier, though, again, a general barrier is viable:
+
+ CPU 1 CPU 2
+ =============== ===============
+ a = 1;
+ <write barrier>
+ b = 2; x = a;
+ <read barrier>
+ y = b;
+
+Or:
+
+ CPU 1 CPU 2
+ =============== ===============================
+ a = 1;
+ <write barrier>
+ b = &a; x = b;
+ <data dependency barrier>
+ y = *x;
+
+Basically, the read barrier always has to be there, even though it can be of
+the "weaker" type.
+
+
+EXAMPLES OF MEMORY BARRIER SEQUENCES
+------------------------------------
+
+Firstly, write barriers act as a partial orderings on store operations.
+Consider the following sequence of events:
+
+ CPU 1
+ =======================
+ STORE A = 1
+ STORE B = 2
+ STORE C = 3
+ <write barrier>
+ STORE D = 4
+ STORE E = 5
+
+This sequence of events is committed to the memory coherence system in an order
+that the rest of the system might perceive as the unordered set of { STORE A,
+STORE B, STORE C } all occuring before the unordered set of { STORE D, STORE E
+}:
+
+ +-------+ : :
+ | | +------+
+ | |------>| C=3 | } /\
+ | | : +------+ }----- \ -----> Events perceptible
+ | | : | A=1 | } \/ to rest of system
+ | | : +------+ }
+ | CPU 1 | : | B=2 | }
+ | | +------+ }
+ | | wwwwwwwwwwwwwwww } <--- At this point the write barrier
+ | | +------+ } requires all stores prior to the
+ | | : | E=5 | } barrier to be committed before
+ | | : +------+ } further stores may be take place.
+ | |------>| D=4 | }
+ | | +------+
+ +-------+ : :
+ |
+ | Sequence in which stores committed to memory system
+ | by CPU 1
+ V
+
+
+Secondly, data dependency barriers act as a partial orderings on data-dependent
+loads. Consider the following sequence of events:
+
+ CPU 1 CPU 2
+ ======================= =======================
+ STORE A = 1
+ STORE B = 2
+ <write barrier>
+ STORE C = &B LOAD X
+ STORE D = 4 LOAD C (gets &B)
+ LOAD *C (reads B)
+
+Without intervention, CPU 2 may perceive the events on CPU 1 in some
+effectively random order, despite the write barrier issued by CPU 1:
+
+ +-------+ : : : :
+ | | +------+ +-------+ | Sequence of update
+ | |------>| B=2 |----- --->| Y->8 | | of perception on
+ | | : +------+ \ +-------+ | CPU 2
+ | CPU 1 | : | A=1 | \ --->| C->&Y | V
+ | | +------+ | +-------+
+ | | wwwwwwwwwwwwwwww | : :
+ | | +------+ | : :
+ | | : | C=&B |--- | : : +-------+
+ | | : +------+ \ | +-------+ | |
+ | |------>| D=4 | ----------->| C->&B |------>| |
+ | | +------+ | +-------+ | |
+ +-------+ : : | : : | |
+ | : : | |
+ | : : | CPU 2 |
+ | +-------+ | |
+ Apparently incorrect ---> | | B->7 |------>| |
+ perception of B (!) | +-------+ | |
+ | : : | |
+ | +-------+ | |
+ The load of X holds ---> \ | X->9 |------>| |
+ up the maintenance \ +-------+ | |
+ of coherence of B ----->| B->2 | +-------+
+ +-------+
+ : :
+
+
+In the above example, CPU 2 perceives that B is 7, despite the load of *C
+(which would be B) coming after the the LOAD of C.
+
+If, however, a data dependency barrier were to be placed between the load of C
+and the load of *C (ie: B) on CPU 2, then the following will occur:
+
+ +-------+ : : : :
+ | | +------+ +-------+
+ | |------>| B=2 |----- --->| Y->8 |
+ | | : +------+ \ +-------+
+ | CPU 1 | : | A=1 | \ --->| C->&Y |
+ | | +------+ | +-------+
+ | | wwwwwwwwwwwwwwww | : :
+ | | +------+ | : :
+ | | : | C=&B |--- | : : +-------+
+ | | : +------+ \ | +-------+ | |
+ | |------>| D=4 | ----------->| C->&B |------>| |
+ | | +------+ | +-------+ | |
+ +-------+ : : | : : | |
+ | : : | |
+ | : : | CPU 2 |
+ | +-------+ | |
+ \ | X->9 |------>| |
+ \ +-------+ | |
+ ----->| B->2 | | |
+ +-------+ | |
+ Makes sure all effects ---> ddddddddddddddddd | |
+ prior to the store of C +-------+ | |
+ are perceptible to | B->2 |------>| |
+ successive loads +-------+ | |
+ : : +-------+
+
+
+And thirdly, a read barrier acts as a partial order on loads. Consider the
+following sequence of events:
+
+ CPU 1 CPU 2
+ ======================= =======================
+ STORE A=1
+ STORE B=2
+ STORE C=3
+ <write barrier>
+ STORE D=4
+ STORE E=5
+ LOAD A
+ LOAD B
+ LOAD C
+ LOAD D
+ LOAD E
+
+Without intervention, CPU 2 may then choose to perceive the events on CPU 1 in
+some effectively random order, despite the write barrier issued by CPU 1:
+
+ +-------+ : :
+ | | +------+
+ | |------>| C=3 | }
+ | | : +------+ }
+ | | : | A=1 | }
+ | | : +------+ }
+ | CPU 1 | : | B=2 | }---
+ | | +------+ } \
+ | | wwwwwwwwwwwww} \
+ | | +------+ } \ : : +-------+
+ | | : | E=5 | } \ +-------+ | |
+ | | : +------+ } \ { | C->3 |------>| |
+ | |------>| D=4 | } \ { +-------+ : | |
+ | | +------+ \ { | E->5 | : | |
+ +-------+ : : \ { +-------+ : | |
+ Transfer -->{ | A->1 | : | CPU 2 |
+ from CPU 1 { +-------+ : | |
+ to CPU 2 { | D->4 | : | |
+ { +-------+ : | |
+ { | B->2 |------>| |
+ +-------+ | |
+ : : +-------+
+
+
+If, however, a read barrier were to be placed between the load of C and the
+load of D on CPU 2, then the partial ordering imposed by CPU 1 will be
+perceived correctly by CPU 2.
+
+ +-------+ : :
+ | | +------+
+ | |------>| C=3 | }
+ | | : +------+ }
+ | | : | A=1 | }---
+ | | : +------+ } \
+ | CPU 1 | : | B=2 | } \
+ | | +------+ \
+ | | wwwwwwwwwwwwwwww \
+ | | +------+ \ : : +-------+
+ | | : | E=5 | } \ +-------+ | |
+ | | : +------+ }--- \ { | C->3 |------>| |
+ | |------>| D=4 | } \ \ { +-------+ : | |
+ | | +------+ \ -->{ | B->2 | : | |
+ +-------+ : : \ { +-------+ : | |
+ \ { | A->1 | : | CPU 2 |
+ \ +-------+ | |
+ At this point the read ----> \ rrrrrrrrrrrrrrrrr | |
+ barrier causes all effects \ +-------+ | |
+ prior to the storage of C \ { | E->5 | : | |
+ to be perceptible to CPU 2 -->{ +-------+ : | |
+ { | D->4 |------>| |
+ +-------+ | |
+ : : +-------+
+
+
+========================
+EXPLICIT KERNEL BARRIERS
+========================
+
+The Linux kernel has a variety of different barriers that act at different
+levels:
+
+ (*) Compiler barrier.
+
+ (*) CPU memory barriers.
+
+ (*) MMIO write barrier.
+
+
+COMPILER BARRIER
+----------------
+
+The Linux kernel has an explicit compiler barrier function that prevents the
+compiler from moving the memory accesses either side of it to the other side:
+
+ barrier();
+
+This a general barrier - lesser varieties of compiler barrier do not exist.
+
+The compiler barrier has no direct effect on the CPU, which may then reorder
+things however it wishes.
+
+
+CPU MEMORY BARRIERS
+-------------------
+
+The Linux kernel has eight basic CPU memory barriers:
+
+ TYPE MANDATORY SMP CONDITIONAL
+ =============== ======================= ===========================
+ GENERAL mb() smp_mb()
+ WRITE wmb() smp_wmb()
+ READ rmb() smp_rmb()
+ DATA DEPENDENCY read_barrier_depends() smp_read_barrier_depends()
+
+
+All CPU memory barriers unconditionally imply compiler barriers.
+
+SMP memory barriers are reduced to compiler barriers on uniprocessor compiled
+systems because it is assumed that a CPU will be appear to be self-consistent,
+and will order overlapping accesses correctly with respect to itself.
+
+[!] Note that SMP memory barriers _must_ be used to control the ordering of
+references to shared memory on SMP systems, though the use of locking instead
+is sufficient.
+
+Mandatory barriers should not be used to control SMP effects, since mandatory
+barriers unnecessarily impose overhead on UP systems. They may, however, be
+used to control MMIO effects on accesses through relaxed memory I/O windows.
+These are required even on non-SMP systems as they affect the order in which
+memory operations appear to a device by prohibiting both the compiler and the
+CPU from reordering them.
+
+
+There are some more advanced barrier functions:
+
+ (*) set_mb(var, value)
+ (*) set_wmb(var, value)
+
+ These assign the value to the variable and then insert at least a write
+ barrier after it, depending on the function. They aren't guaranteed to
+ insert anything more than a compiler barrier in a UP compilation.
+
+
+ (*) smp_mb__before_atomic_dec();
+ (*) smp_mb__after_atomic_dec();
+ (*) smp_mb__before_atomic_inc();
+ (*) smp_mb__after_atomic_inc();
+
+ These are for use with atomic add, subtract, increment and decrement
+ functions, especially when used for reference counting. These functions
+ do not imply memory barriers.
+
+ As an example, consider a piece of code that marks an object as being dead
+ and then decrements the object's reference count:
+
+ obj->dead = 1;
+ smp_mb__before_atomic_dec();
+ atomic_dec(&obj->ref_count);
+
+ This makes sure that the death mark on the object is perceived to be set
+ *before* the reference counter is decremented.
+
+ See Documentation/atomic_ops.txt for more information. See the "Atomic
+ operations" subsection for information on where to use these.
+
+
+ (*) smp_mb__before_clear_bit(void);
+ (*) smp_mb__after_clear_bit(void);
+
+ These are for use similar to the atomic inc/dec barriers. These are
+ typically used for bitwise unlocking operations, so care must be taken as
+ there are no implicit memory barriers here either.
+
+ Consider implementing an unlock operation of some nature by clearing a
+ locking bit. The clear_bit() would then need to be barriered like this:
+
+ smp_mb__before_clear_bit();
+ clear_bit( ... );
+
+ This prevents memory operations before the clear leaking to after it. See
+ the subsection on "Locking Functions" with reference to UNLOCK operation
+ implications.
+
+ See Documentation/atomic_ops.txt for more information. See the "Atomic
+ operations" subsection for information on where to use these.
+
+
+MMIO WRITE BARRIER
+------------------
+
+The Linux kernel also has a special barrier for use with memory-mapped I/O
+writes:
+
+ mmiowb();
+
+This is a variation on the mandatory write barrier that causes writes to weakly
+ordered I/O regions to be partially ordered. Its effects may go beyond the
+CPU->Hardware interface and actually affect the hardware at some level.
+
+See the subsection "Locks vs I/O accesses" for more information.
+
+
+===============================
+IMPLICIT KERNEL MEMORY BARRIERS
+===============================
+
+Some of the other functions in the linux kernel imply memory barriers, amongst
+which are locking, scheduling and memory allocation functions.
+
+This specification is a _minimum_ guarantee; any particular architecture may
+provide more substantial guarantees, but these may not be relied upon outside
+of arch specific code.
+
+
+LOCKING FUNCTIONS
+-----------------
+
+The Linux kernel has a number of locking constructs:
+
+ (*) spin locks
+ (*) R/W spin locks
+ (*) mutexes
+ (*) semaphores
+ (*) R/W semaphores
+ (*) RCU
+
+In all cases there are variants on "LOCK" operations and "UNLOCK" operations
+for each construct. These operations all imply certain barriers:
+
+ (1) LOCK operation implication:
+
+ Memory operations issued after the LOCK will be completed after the LOCK
+ operation has completed.
+
+ Memory operations issued before the LOCK may be completed after the LOCK
+ operation has completed.
+
+ (2) UNLOCK operation implication:
+
+ Memory operations issued before the UNLOCK will be completed before the
+ UNLOCK operation has completed.
+
+ Memory operations issued after the UNLOCK may be completed before the
+ UNLOCK operation has completed.
+
+ (3) LOCK vs LOCK implication:
+
+ All LOCK operations issued before another LOCK operation will be completed
+ before that LOCK operation.
+
+ (4) LOCK vs UNLOCK implication:
+
+ All LOCK operations issued before an UNLOCK operation will be completed
+ before the UNLOCK operation.
+
+ All UNLOCK operations issued before a LOCK operation will be completed
+ before the LOCK operation.
+
+ (5) Failed conditional LOCK implication:
+
+ Certain variants of the LOCK operation may fail, either due to being
+ unable to get the lock immediately, or due to receiving an unblocked
+ signal whilst asleep waiting for the lock to become available. Failed
+ locks do not imply any sort of barrier.
+
+Therefore, from (1), (2) and (4) an UNLOCK followed by an unconditional LOCK is
+equivalent to a full barrier, but a LOCK followed by an UNLOCK is not.
+
+[!] Note: one of the consequence of LOCKs and UNLOCKs being only one-way
+ barriers is that the effects instructions outside of a critical section may
+ seep into the inside of the critical section.
+
+Locks and semaphores may not provide any guarantee of ordering on UP compiled
+systems, and so cannot be counted on in such a situation to actually achieve
+anything at all - especially with respect to I/O accesses - unless combined
+with interrupt disabling operations.
+
+See also the section on "Inter-CPU locking barrier effects".
+
+
+As an example, consider the following:
+
+ *A = a;
+ *B = b;
+ LOCK
+ *C = c;
+ *D = d;
+ UNLOCK
+ *E = e;
+ *F = f;
+
+The following sequence of events is acceptable:
+
+ LOCK, {*F,*A}, *E, {*C,*D}, *B, UNLOCK
+
+ [+] Note that {*F,*A} indicates a combined access.
+
+But none of the following are:
+
+ {*F,*A}, *B, LOCK, *C, *D, UNLOCK, *E
+ *A, *B, *C, LOCK, *D, UNLOCK, *E, *F
+ *A, *B, LOCK, *C, UNLOCK, *D, *E, *F
+ *B, LOCK, *C, *D, UNLOCK, {*F,*A}, *E
+
+
+
+INTERRUPT DISABLING FUNCTIONS
+-----------------------------
+
+Functions that disable interrupts (LOCK equivalent) and enable interrupts
+(UNLOCK equivalent) will act as compiler barriers only. So if memory or I/O
+barriers are required in such a situation, they must be provided from some
+other means.
+
+
+MISCELLANEOUS FUNCTIONS
+-----------------------
+
+Other functions that imply barriers:
+
+ (*) schedule() and similar imply full memory barriers.
+
+ (*) Memory allocation and release functions imply full memory barriers.
+
+
+=================================
+INTER-CPU LOCKING BARRIER EFFECTS
+=================================
+
+On SMP systems locking primitives give a more substantial form of barrier: one
+that does affect memory access ordering on other CPUs, within the context of
+conflict on any particular lock.
+
+
+LOCKS VS MEMORY ACCESSES
+------------------------
+
+Consider the following: the system has a pair of spinlocks (N) and (Q), and
+three CPUs; then should the following sequence of events occur:
+
+ CPU 1 CPU 2
+ =============================== ===============================
+ *A = a; *E = e;
+ LOCK M LOCK Q
+ *B = b; *F = f;
+ *C = c; *G = g;
+ UNLOCK M UNLOCK Q
+ *D = d; *H = h;
+
+Then there is no guarantee as to what order CPU #3 will see the accesses to *A
+through *H occur in, other than the constraints imposed by the separate locks
+on the separate CPUs. It might, for example, see:
+
+ *E, LOCK M, LOCK Q, *G, *C, *F, *A, *B, UNLOCK Q, *D, *H, UNLOCK M
+
+But it won't see any of:
+
+ *B, *C or *D preceding LOCK M