Rework PNaCl memory ordering

This CL reworks memory ordering as specified by PNaCl. The documentation needed some clarification, and the implementation needs a bit more work around volatile and __sync_synchronize to offer stronger guarantees than what LLVM intends to offer for legacy code.

There is a companion patch with Clang changes:
https://codereview.chromium.org/22294002

R=eliben@chromium.org
TEST= ninja check-all
BUG= https://code.google.com/p/nativeclient/issues/detail?id=3475

Review URL: https://codereview.chromium.org/22240002
author: JF Bastien <jfb@chromium.org> 2013-08-06 16:14:36 -0700
committer: JF Bastien <jfb@chromium.org> 2013-08-06 16:14:36 -0700
commit: 77f169c9afeaf7384360ff6d56b73cc4d3200f5b (patch)
tree: af76df0613246defa73290adf854e349cad79c3e
parent: b6846e1a64c3a56be80f1b7bd2d5bf10cfabc36f (diff)
3 files changed, 158 insertions, 110 deletions
diff --git a/docs/PNaClDeveloperGuide.rst b/docs/PNaClDeveloperGuide.rst
index 9c27ae5c14..e807d572f7 100644
--- a/docs/PNaClDeveloperGuide.rst
+++ b/docs/PNaClDeveloperGuide.rst
@@ -14,126 +14,159 @@ TODO
 Memory Model and Atomics
 ========================
 
-Volatile Memory Accesses
-------------------------
-
-The C11/C++11 standards mandate that ``volatile`` accesses execute in program
-order (but are not fences, so other memory operations can reorder around them),
-are not necessarily atomic, and can’t be elided. They can be separated into
-smaller width accesses.
-
-The PNaCl toolchain applies regular LLVM optimizations along these guidelines,
-and it further prevents any load/store (even non-``volatile`` and non-atomic
-ones) from moving above or below a volatile operations: they act as compiler
-barriers before optimizations occur. The PNaCl toolchain freezes ``volatile``
-accesses after optimizations into atomic accesses with sequentially consistent
-memory ordering. This eases the support of legacy (i.e. non-C11/C++11) code, and
-combined with builtin fences these programs can do meaningful cross-thread
-communication without changing code. It also reflects the original code's intent
-and guarantees better portability.
-
-Relaxed ordering could be used instead, but for the first release it is more
-conservative to apply sequential consistency. Future releases may change what
-happens at compile-time, but already-released pexes will continue using
-sequential consistency.
-
-The PNaCl toolchain also requires that ``volatile`` accesses be at least
-naturally aligned, and tries to guarantee this alignment.
-
 Memory Model for Concurrent Operations
 --------------------------------------
 
-The memory model offered by PNaCl relies on the same coding guidelines as the
-C11/C++11 one: concurrent accesses must always occur through atomic primitives
-(offered by `atomic intrinsics <PNaClLangRef.html#atomicintrinsics>`_), and
-these accesses must always occur with the same size for the same memory
-location. Visibility of stores is provided on a happens-before basis that
-relates memory locations to each other as the C11/C++11 standards do.
-
-As in C11/C++11 some atomic accesses may be implemented with locks on certain
-platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be ``1``, signifying
-that all types are sometimes lock-free. The ``is_lock_free`` methods will return
-the current platform's implementation at translation time.
-
-The PNaCl toolchain supports concurrent memory accesses through legacy GCC-style
-``__sync_*`` builtins, as well as through C11/C++11 atomic primitives.
-``volatile`` memory accesses can also be used, though these are discouraged, and
-aren't present in bitcode.
+The memory model offered by PNaCl relies on the same coding guidelines
+as the C11/C++11 one: concurrent accesses must always occur through
+atomic primitives (offered by `atomic intrinsics
+<PNaClLangRef.html#atomicintrinsics>`_), and these accesses must always
+occur with the same size for the same memory location. Visibility of
+stores is provided on a happens-before basis that relates memory
+locations to each other as the C11/C++11 standards do.
+
+Non-atomic memory accesses may be reordered, separated, elided or fused
+according to C and C++'s memory model before the pexe is created as well
+as after its creation.
+
+As in C11/C++11 some atomic accesses may be implemented with locks on
+certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be
+``1``, signifying that all types are sometimes lock-free. The
+``is_lock_free`` methods and ``atomic_is_lock_free`` will return the
+current platform's implementation at translation time. These macros,
+methods and functions are in the C11 header ``<stdatomic.h>`` and the
+C++11 header ``<atomic>``.
+
+The PNaCl toolchain supports concurrent memory accesses through legacy
+GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic
+primitives.  ``volatile`` memory accesses can also be used, though these
+are discouraged. See `Volatile Memory Accesses`_.
 
 PNaCl supports concurrency and parallelism with some restrictions:
 
-* Threading is explicitly supported.
+* Threading is explicitly supported and has no restrictions over what
+  prevalent implementations offer. See `Threading`_.
+  
+* ``volatile`` and atomic operations are address-free (operations on the
+  same memory location via two different addresses work atomically), as
+  intended by the C11/C++11 standards. This is critical in supporting
+  synchronous "external modifications" such as mapping underlying memory
+  at multiple locations.
 
-* Inter-process communication through shared memory is limited to operations
-  which are lock-free on the current platform (``is_lock_free`` methods). This
-  may change at a later date.
+* Inter-process communication through shared memory is currently not
+  supported. See `Future Directions`_.
 
-* Direct interaction with device memory isn't supported.
+* Signal handling isn't supported; PNaCl therefore promotes all
+  primitives to cross-thread (instead of single-thread). This may change
+  at a later date. Note that using atomic operations which aren't
+  lock-free may lead to deadlocks when handling asynchronous
+  signals. See `Future Directions`_.
 
-* Signal handling isn't supported, PNaCl therefore promotes all primitives to
-  cross-thread (instead of single-thread). This may change at a later date. Note
-  that using atomic operations which aren't lock-free may lead to deadlocks when
-  handling asynchronous signals.
-  
-* ``volatile`` and atomic operations are address-free (operations on the same
-  memory location via two different addresses work atomically), as intended by
-  the C11/C++11 standards. This is critical for inter-process communication as
-  well as synchronous "external modifications" such as mapping underlying memory
-  at multiple locations.
+* Direct interaction with device memory isn't supported, and there is no
+  intent to support it. The embedding sandbox's runtime can offer APIs
+  to indirectly access devices.
 
-Setting up the above mechanisms requires assistance from the embedding sandbox's
-runtime (e.g. NaCl's Pepper APIs), but using them once setup can be done through
-regular C/C++ code.
-
-The PNaCl toolchain currently optimizes for memory ordering as LLVM normally
-does, but at pexe creation time it promotes all ``volatile`` accesses as well as
-all atomic accesses to be sequentially consistent. Other memory orderings will
-be supported in a future release, but pexes generated with the current toolchain
-will continue functioning with sequential consistency. Using sequential
-consistency provides a total ordering for all sequentially-consistent operations
-on all addresses.
-
-This means that ``volatile`` and atomic memory accesses can only be re-ordered
-in some limited way before the pexe is created, and will act as fences for all
-memory accesses (even non-atomic and non-``volatile``) after pexe creation.
-Non-atomic and non-``volatile`` memory accesses may be reordered (unless a fence
-intervenes), separated, elided or fused according to C and C++'s memory model
-before the pexe is created as well as after its creation.
+Setting up the above mechanisms requires assistance from the embedding
+sandbox's runtime (e.g. NaCl's Pepper APIs), but using them once setup
+can be done through regular C/C++ code.
 
 Atomic Memory Ordering Constraints
 ----------------------------------
 
-Atomics follow the same ordering constraints as in regular LLVM, but
-all accesses are promoted to sequential consistency (the strongest
-memory ordering) at pexe creation time. As more C11/C++11 code
-allows us to understand performance and portability needs we intend
-to support the full gamut of C11/C++11 memory orderings:
+Atomics follow the same ordering constraints as in regular C11/C++11,
+but all accesses are promoted to sequential consistency (the strongest
+memory ordering) at pexe creation time. As more C11/C++11 code allows us
+to understand performance and portability needs we intend to support the
+full gamut of C11/C++11 memory orderings:
 
 - Relaxed: no operation orders memory.
-- Consume: a load operation performs a consume operation on the affected memory
-  location (currently unsupported by LLVM).
-- Acquire: a load operation performs an acquire operation on the affected memory
-  location.
-- Release: a store operation performs a release operation on the affected memory
-  location.
+- Consume: a load operation performs a consume operation on the affected
+  memory location (note: currently unsupported by LLVM).
+- Acquire: a load operation performs an acquire operation on the
+  affected memory location.
+- Release: a store operation performs a release operation on the
+  affected memory location.
 - Acquire-release: load and store operations perform acquire and release
   operations on the affected memory.
-- Sequentially consistent: same as acquire-release, but providing a global total
-  ordering for all affected locations.
+- Sequentially consistent: same as acquire-release, but providing a
+  global total ordering for all affected locations.
 
 As in C11/C++11:
 
 - Atomic accesses must at least be naturally aligned.
-- Some accesses may not actually be atomic on certain platforms, requiring an
-  implementation that uses a global lock.
-- An atomic memory location must always be accessed with atomic primitives, and
-  these primitives must always be of the same bit size for that location.
+- Some accesses may not actually be atomic on certain platforms,
+  requiring an implementation that uses global lock(s).
+- An atomic memory location must always be accessed with atomic
+  primitives, and these primitives must always be of the same bit size
+  for that location.
 - Not all memory orderings are valid for all atomic operations.
 
+Volatile Memory Accesses
+------------------------
+
+The C11/C++11 standards mandate that ``volatile`` accesses execute in
+program order (but are not fences, so other memory operations can
+reorder around them), are not necessarily atomic, and can’t be
+elided. They can be separated into smaller width accesses.
+
+Before any optimizations occur the PNaCl toolchain transforms
+``volatile`` loads and stores into sequentially consistent ``volatile``
+atomic loads and stores, and applies regular compiler optimizations
+along the above guidelines. This orders ``volatiles`` according to the
+atomic rules, and means that fences (including ``__sync_synchronize``)
+act in a better-defined manner. Regular memory accesses still do not
+have ordering guarantees with ``volatile`` and atomic accesses, though
+the internal representation of ``__sync_synchronize`` attempts to
+prevent reordering of memory accesses to objects which may escape.
+
+Relaxed ordering could be used instead, but for the first release it is
+more conservative to apply sequential consistency. Future releases may
+change what happens at compile-time, but already-released pexes will
+continue using sequential consistency.
+
+The PNaCl toolchain also requires that ``volatile`` accesses be at least
+naturally aligned, and tries to guarantee this alignment.
+
+The above guarantees ease the support of legacy (i.e. non-C11/C++11)
+code, and combined with builtin fences these programs can do meaningful
+cross-thread communication without changing code. They also better
+reflect the original code's intent and guarantee better portability.
+
+Threading
+=========
+
+Threading is explicitly supported through C11/C++11's threading
+libraries as well as POSIX threads.
+
+Communication between threads should use atomic primitives as described
+in `Memory Model and Atomics`_.
+
 Inline Assembly
 ===============
 
 Inline assembly isn't supported by PNaCl because it isn't portable. The
 one current exception is the common compiler barrier idiom
 ``asm("":::"memory")``, which gets transformed to a sequentially
-consistent memory barrier (equivalent to ``__sync_synchronize()``).
+consistent memory barrier (equivalent to ``__sync_synchronize()``). In
+PNaCl this barrier is only guaranteed to order ``volatile`` and atomic
+memory accesses, though in practice the implementation attempts to also
+prevent reordering of memory accesses to objects which may escape.
+
+Future Directions
+=================
+
+Inter-Process Communication
+---------------------------
+
+Inter-process communication through shared memory is currently not
+supported by PNaCl.  When implemented, it may be limited to operations
+which are lock-free on the current platform (``is_lock_free``
+methods). It will rely on the address-free property discussed in `Memory
+Model for Concurrent Operations`_.
+
+Signal Handling
+---------------
+
+Untrusted signal handling currently isn't supported by PNaCl. When
+supported, the impact of ``volatile`` and atomics for same-thread signal
+handling will need to be carefully detailed.
diff --git a/docs/PNaClLangRef.rst b/docs/PNaClLangRef.rst
index 75218a0c02..624bebbda6 100644
--- a/docs/PNaClLangRef.rst
+++ b/docs/PNaClLangRef.rst
@@ -143,25 +143,30 @@ Volatile Memory Accesses
 
 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_
 
-PNaCl bitcode does not support volatile memory accesses. The ``volatile``
-attribute on loads and stores is not supported. See the
+PNaCl bitcode does not support volatile memory accesses. The
+``volatile`` attribute on loads and stores is not supported. See the
 `PNaCl Developer's Guide <PNaClDeveloperGuide.html>`_ for more details.
 
 Memory Model for Concurrent Operations
 --------------------------------------
 
-`LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_
+`LLVM LangRef: Memory Model for Concurrent Operations
+<LangRef.html#memmodel>`_
 
-See the `PNaCl Developer's Guide <PNaClDeveloperGuide.html>`_ for more details.
+See the `PNaCl Developer's Guide <PNaClDeveloperGuide.html>`_ for more
+details.
 
 Atomic Memory Ordering Constraints
 ----------------------------------
 
 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_
 
-PNaCl bitcode currently supports sequential consistency only, through its
-`atomic intrinsics`_. See the
-`PNaCl Developer's Guide <PNaClDeveloperGuide.html>`_ for more details.
+PNaCl bitcode currently supports sequential consistency only, through
+its `atomic intrinsics`_. See the `PNaCl Developer's Guide
+<PNaClDeveloperGuide.html>`_ for more details.
+
+The integer values for memory ordering constraints are in
+``"llvm/IR/NaClAtomicIntrinsics.h"``.
 
 Fast-Math Flags
 ---------------
@@ -410,6 +415,8 @@ The only intrinsics supported by PNaCl bitcode are the following.
 * ``llvm.nacl.atomic.rmw``
 * ``llvm.nacl.atomic.cmpxchg``
 * ``llvm.nacl.atomic.fence``
+* ``llvm.nacl.atomic.fence.all``
+* ``llvm.nacl.atomic.is.lock.free``
 
   See :ref:`atomic intrinsics <atomicintrinsics>`.
 
@@ -434,9 +441,9 @@ Setjmp and Longjmp
     declare void @llvm.nacl.longjmp(i8* %jmpbuf, i32)
     declare i32 @llvm.nacl.setjmp(i8* %jmpbuf)
 
-These intrinsics implement the semantics of C11 ``setjmp`` and ``longjmp``. The
-``jmpbuf`` pointer must be 64-bit aligned and point to at least 1024 bytes of
-allocated memory.
+These intrinsics implement the semantics of C11 ``setjmp`` and
+``longjmp``. The ``jmpbuf`` pointer must be 64-bit aligned and point to
+at least 1024 bytes of allocated memory.
 
 .. _atomicintrinsics:
 
@@ -455,10 +462,11 @@ Atomic intrinsics
             iN* <object>, iN <expected>, iN <desired>,
             i32 <memory_order_success>, i32 <memory_order_failure>)
     declare void @llvm.nacl.atomic.fence(i32 <memory_order>)
+    declare void @llvm.nacl.atomic.fence.all()
 
-Each of these intrinsics is overloaded on the ``iN`` argument, which
-is reflected through ``<size>`` in the overload's name. Integral types
-of 8, 16, 32 and 64-bit width are supported for these arguments.
+Each of these intrinsics is overloaded on the ``iN`` argument, which is
+reflected through ``<size>`` in the overload's name. Integral types of
+8, 16, 32 and 64-bit width are supported for these arguments.
 
 The ``@llvm.nacl.atomic.rmw`` intrinsic implements the following
 read-modify-write operations, from the general and arithmetic sections
@@ -472,8 +480,8 @@ of the C11/C++11 standards:
  - ``exchange``
 
 For all of these read-modify-write operations, the returned value is
-that at ``object`` before the computation. The ``computation``
-argument must be a compile-time constant.
+that at ``object`` before the computation. The ``computation`` argument
+must be a compile-time constant.
 
 All atomic intrinsics also support C11/C++11 memory orderings, which
 must be compile-time constants. Those are detailed in `Atomic Memory
@@ -482,12 +490,17 @@ Ordering Constraints`_.
 Integer values for these computations and memory orderings are defined
 in ``"llvm/IR/NaClAtomicIntrinsics.h"``.
 
+The ``@llvm.nacl.atomic.fence.all`` intrinsic is equivalent to the
+``@llvm.nacl.atomic.fence`` intrinsic with sequentially consistent
+ordering and compiler barriers preventing most non-atomic memory
+accesses from reordering around it.
+
 .. note::
 
     These intrinsics allow PNaCl to support C11/C++11 style atomic
     operations as well as some legacy GCC-style ``__sync_*`` builtins
-    while remaining stable as the LLVM codebase changes. The user
-    isn't expected to use these intrinsics directly.
+    while remaining stable as the LLVM codebase changes. The user isn't
+    expected to use these intrinsics directly.
 
 .. code-block:: llvm
 
diff --git a/include/llvm/IR/Intrinsics.td b/include/llvm/IR/Intrinsics.td
index 15567eb2db..c82051ae7d 100644
--- a/include/llvm/IR/Intrinsics.td
+++ b/include/llvm/IR/Intrinsics.td
@@ -526,6 +526,8 @@ def int_nacl_atomic_cmpxchg : Intrinsic<[llvm_anyint_ty],
     [IntrReadWriteArgMem]>;
 def int_nacl_atomic_fence : Intrinsic<[], [llvm_i32_ty],
     [IntrReadWriteArgMem]>;
+def int_nacl_atomic_fence_all : Intrinsic<[], [],
+    [IntrReadWriteArgMem]>;
 def int_nacl_atomic_is_lock_free : Intrinsic<[llvm_i1_ty],
     [llvm_i32_ty, llvm_ptr_ty], [IntrNoMem]>,
     GCCBuiltin<"__nacl_atomic_is_lock_free">;
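
Illustration (not part of the patch): the updated developer guide says that communication between threads should go through atomic primitives, and that visibility of stores follows a happens-before relation. A minimal C++11 sketch of that guideline might look like the following. The names `payload`, `ready`, `producer`, `consumer` and `run_handoff` are hypothetical and do not come from the patch. The sequentially consistent ordering is spelled out explicitly only to mirror what the guide says PNaCl promotes all atomic accesses to at pexe creation time.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names for illustration; none of them come from the patch.
static int payload = 0;                 // ordinary, non-atomic data
static std::atomic<bool> ready(false);  // flag shared between the threads

// Writer: fill in the payload, then publish it through the atomic flag.
static void producer() {
  payload = 42;
  ready.store(true, std::memory_order_seq_cst);
}

// Reader: wait until the flag is set, then read the payload.
static int consumer() {
  while (!ready.load(std::memory_order_seq_cst)) {
    std::this_thread::yield();
  }
  // The load that observed `true` synchronizes with the store above,
  // so the earlier write to `payload` is visible here.
  return payload;
}

// Runs one producer/consumer handoff and returns the value the reader saw.
int run_handoff() {
  payload = 0;
  ready.store(false);
  int seen = 0;
  std::thread reader([&seen] { seen = consumer(); });
  std::thread writer(producer);
  writer.join();
  reader.join();
  return seen;
}
```

Calling `run_handoff()` from a test or from `main` exercises the handoff once. Per the guide, legacy code that uses a `volatile` flag with `__sync_synchronize()` is still supported but discouraged, which is why this sketch uses `std::atomic` instead.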