Merge remote-tracking branch 'origin/master'

author: Eli Bendersky <eliben@chromium.org> 2013-07-18 18:00:27 -0700
committer: Eli Bendersky <eliben@chromium.org> 2013-07-18 18:00:27 -0700
commit: 4412ea4b8e019d00dc7574fe1723eea0473a8ec1 (patch)
tree: 2badd5ce0727bfad02f10d0d82c8bcfa65677676 /docs
parent: 4a9f2a703db400ccf760f34101bcdd57642f96e4 (diff)
parent: 5b548094edef39376e17445aea28ad2b37d701c4 (diff)
1 files changed, 174 insertions, 13 deletions
diff --git a/docs/PNaClLangRef.rst b/docs/PNaClLangRef.rst
index 4f322a3eff..eef6023e2b 100644
--- a/docs/PNaClLangRef.rst
+++ b/docs/PNaClLangRef.rst
@@ -113,21 +113,139 @@ Volatile Memory Accesses
 
 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_
 
-TODO: are we going to promote volatile to atomic?
+PNaCl bitcode does not support volatile memory accesses.
+
+.. note::
+
+    The C11/C++11 standards mandate that ``volatile`` accesses execute
+    in program order (but are not fences, so other memory operations can
+    reorder around them), are not necessarily atomic, and can’t be
+    elided. They can be separated into smaller width accesses.
+
+    The PNaCl toolchain applies regular LLVM optimizations along these
+    guidelines, and it further prevents any load/store (even
+    non-``volatile`` and non-atomic ones) from moving above or below a
+    ``volatile`` operation: they act as compiler barriers before
+    optimizations occur. The PNaCl toolchain freezes ``volatile``
+    accesses after optimizations into atomic accesses with sequentially
+    consistent memory ordering. This eases the support of legacy
+    (i.e. non-C11/C++11) code, and combined with builtin fences these
+    programs can do meaningful cross-thread communication without
+    changing code. It also reflects the original code's intent and
+    guarantees better portability.
+
+    Relaxed ordering could be used instead, but for the first release it
+    is more conservative to apply sequential consistency. Future
+    releases may change what happens at compile-time, but
+    already-released pexes will continue using sequential consistency.
+
+    The PNaCl toolchain also requires that ``volatile`` accesses be at
+    least naturally aligned, and tries to guarantee this alignment.
 
 Memory Model for Concurrent Operations
 --------------------------------------
 
 `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_
 
-TODO.
+The memory model offered by PNaCl relies on the same coding guidelines
+as the C11/C++11 one: concurrent accesses must always occur through
+atomic primitives (offered by `atomic intrinsics`_), and these accesses
+must always occur with the same size for the same memory
+location. Visibility of stores is provided on a happens-before basis
+that relates memory locations to each other as the C11/C++11 standards
+do.
+
+.. note::
+
+    As in C11/C++11 some atomic accesses may be implemented with locks
+    on certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always
+    be ``1``, signifying that all types are sometimes lock-free. The
+    ``is_lock_free`` methods report at runtime whether the current
+    platform's implementation is lock-free.
+
+    The PNaCl toolchain supports concurrent memory accesses through
+    legacy GCC-style ``__sync_*`` builtins, as well as through C11/C++11
+    atomic primitives. ``volatile`` memory accesses can also be used,
+    though these are discouraged, and aren't present in bitcode.
+
+    PNaCl supports concurrency and parallelism with some restrictions:
+
+    * Threading is explicitly supported.
+    * Inter-process communication through shared memory is limited to
+      operations which are lock-free on the current platform
+      (``is_lock_free`` methods). This may change at a later date.
+    * Direct interaction with device memory isn't supported.
+    * Signal handling isn't supported; PNaCl therefore promotes all
+      primitives to cross-thread (instead of single-thread). This may
+      change at a later date. Note that using atomic operations which
+      aren't lock-free may lead to deadlocks when handling asynchronous
+      signals.
+    * ``volatile`` and atomic operations are address-free (operations on
+      the same memory location via two different addresses work
+      atomically), as intended by the C11/C++11 standards. This is
+      critical for inter-process communication as well as synchronous
+      "external modifications" such as mapping underlying memory at
+      multiple locations.
+
+    Setting up the above mechanisms requires assistance from the
+    embedding sandbox's runtime (e.g. NaCl's Pepper APIs), but using
+    them once set up can be done through regular C/C++ code.
+
+    The PNaCl toolchain currently optimizes for memory ordering as LLVM
+    normally does, but at pexe creation time it promotes all
+    ``volatile`` accesses as well as all atomic accesses to be
+    sequentially consistent. Other memory orderings will be supported in
+    a future release, but pexes generated with the current toolchain
+    will continue functioning with sequential consistency. Using
+    sequential consistency provides a total ordering for all
+    sequentially-consistent operations on all addresses.
+
+    This means that ``volatile`` and atomic memory accesses can only be
+    re-ordered in some limited way before the pexe is created, and will
+    act as fences for all memory accesses (even non-atomic and
+    non-``volatile``) after pexe creation. Non-atomic and
+    non-``volatile`` memory accesses may be reordered (unless a fence
+    intervenes), separated, elided or fused according to C and C++'s
+    memory model before the pexe is created as well as after its
+    creation.
 
 Atomic Memory Ordering Constraints
 ----------------------------------
 
 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_
 
-TODO.
+PNaCl bitcode currently supports sequential consistency only, through
+its `atomic intrinsics`_.
+
+.. note::
+
+    Atomics follow the same ordering constraints as in regular LLVM, but
+    all accesses are promoted to sequential consistency (the strongest
+    memory ordering) at pexe creation time. As more C11/C++11 code
+    allows us to understand performance and portability needs, we intend
+    to support the full gamut of C11/C++11 memory orderings:
+
+       - Relaxed: no operation orders memory.
+       - Consume: a load operation performs a consume operation on the
+         affected memory location (currently unsupported by LLVM).
+       - Acquire: a load operation performs an acquire operation on the
+         affected memory location.
+       - Release: a store operation performs a release operation on the
+         affected memory location.
+       - Acquire-release: load and store operations perform acquire and
+         release operations on the affected memory.
+       - Sequentially consistent: same as acquire-release, but providing
+         a global total ordering for all affected locations.
+
+    As in C11/C++11:
+
+      - Atomic accesses must at least be naturally aligned.
+      - Some accesses may not actually be atomic on certain platforms,
+        requiring an implementation that uses a global lock.
+      - An atomic memory location must always be accessed with atomic
+        primitives, and these primitives must always be of the same bit
+        size for that location.
+      - Not all memory orderings are valid for all atomic operations.
 
 Fast-Math Flags
 ---------------
@@ -277,14 +395,6 @@ Only the LLVM instructions listed here are supported by PNaCl bitcode.
   The pointer argument of these instructions must be a *normalized* pointer
   (see :ref:`pointer types <pointertypes>`).
 
-* ``fence``
-* ``cmpxchg``, ``atomicrmw``
-
-  The pointer argument of these instructions must be a *normalized* pointer
-  (see :ref:`pointer types <pointertypes>`).
-
-  TODO(jfb): this may change
-
 * ``trunc``
 * ``zext``
 * ``sext``
@@ -323,8 +433,6 @@ Intrinsic Functions
 
 The only intrinsics supported by PNaCl bitcode are the following.
 
-TODO(jfb): atomics
-
 * ``llvm.memcpy``
 * ``llvm.memmove``
 * ``llvm.memset``
@@ -365,3 +473,56 @@ TODO(jfb): atomics
 
   TODO: describe
 
+.. _atomic intrinsics:
+
+* ``llvm.nacl.atomic.store``
+* ``llvm.nacl.atomic.load``
+* ``llvm.nacl.atomic.rmw``
+* ``llvm.nacl.atomic.cmpxchg``
+* ``llvm.nacl.atomic.fence``
+
+  .. code-block:: llvm
+
+    declare iN @llvm.nacl.atomic.load.<size>(
+            iN* <source>, i32 <memory_order>)
+    declare void @llvm.nacl.atomic.store.<size>(
+            iN <operand>, iN* <destination>, i32 <memory_order>)
+    declare iN @llvm.nacl.atomic.rmw.<size>(
+            i32 <computation>, iN* <object>, iN <operand>, i32 <memory_order>)
+    declare iN @llvm.nacl.atomic.cmpxchg.<size>(
+            iN* <object>, iN <expected>, iN <desired>,
+            i32 <memory_order_success>, i32 <memory_order_failure>)
+    declare void @llvm.nacl.atomic.fence(i32 <memory_order>)
+
+  Each of these intrinsics is overloaded on the ``iN`` argument, which
+  is reflected through ``<size>`` in the overload's name. Integral types
+  of 8, 16, 32 and 64-bit width are supported for these arguments.
+
+  The ``@llvm.nacl.atomic.rmw`` intrinsic implements the following
+  read-modify-write operations, from the general and arithmetic sections
+  of the C11/C++11 standards:
+
+   - ``add``
+   - ``sub``
+   - ``or``
+   - ``and``
+   - ``xor``
+   - ``exchange``
+
+  For all of these read-modify-write operations, the returned value is
+  that at ``object`` before the computation. The ``computation``
+  argument must be a compile-time constant.
+
+  All atomic intrinsics also support C11/C++11 memory orderings, which
+  must be compile-time constants. Those are detailed in `Atomic Memory
+  Ordering Constraints`_.
+
+  Integer values for these computations and memory orderings are defined
+  in ``"llvm/IR/NaClAtomicIntrinsics.h"``.
+
+  .. note::
+
+      These intrinsics allow PNaCl to support C11/C++11 style atomic
+      operations as well as some legacy GCC-style ``__sync_*`` builtins
+      while remaining stable as the LLVM codebase changes. The user
+      isn't expected to use these intrinsics directly.
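
Illustration (not part of the patch): the text added above says that read-modify-write operations return the value held at ``object`` before the computation, and that the compare-and-exchange intrinsic takes separate success and failure orderings. Since users are expected to write regular C11/C++11 atomics rather than call the intrinsics directly, the same semantics can be sketched in portable C++11. The helper names below are hypothetical and do not appear in the commit, and how the PNaCl toolchain lowers these calls to ``llvm.nacl.atomic.*`` is assumed here rather than shown.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: atomically adds `delta` to `object` and returns the
// value that was stored *before* the addition (read-modify-write "add").
std::int32_t add_and_get_previous(std::atomic<std::int32_t>& object,
                                  std::int32_t delta) {
    return object.fetch_add(delta, std::memory_order_seq_cst);
}

// Hypothetical helper: stores `desired` only if `object` currently holds
// `expected`. Separate success and failure orderings are passed, mirroring
// the two ordering arguments of the compare-and-exchange declaration.
bool replace_if_equal(std::atomic<std::int32_t>& object,
                      std::int32_t expected, std::int32_t desired) {
    return object.compare_exchange_strong(expected, desired,
                                          std::memory_order_seq_cst,
                                          std::memory_order_seq_cst);
}

// Small self-check exercising both helpers on a single thread.
void self_check() {
    std::atomic<std::int32_t> counter(10);

    // fetch_add hands back the old value; the new value is in `counter`.
    std::int32_t previous = add_and_get_previous(counter, 5);
    assert(previous == 10);
    assert(counter.load() == 15);

    // Succeeds because the stored value matches the expected one.
    assert(replace_if_equal(counter, 15, 42));

    // Fails because the stored value is now 42, not 15; nothing is written.
    assert(!replace_if_equal(counter, 15, 0));
    assert(counter.load() == 42);
}
```

Sequentially consistent ordering is used throughout because, per the text above, that is the only ordering pexes currently carry; weaker orderings written in source would be promoted to it at pexe creation time.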