author: JF Bastien <jfb@chromium.org> 2013-08-06 16:14:36 -0700
committer: JF Bastien <jfb@chromium.org> 2013-08-06 16:14:36 -0700
commit: 77f169c9afeaf7384360ff6d56b73cc4d3200f5b
tree: af76df0613246defa73290adf854e349cad79c3e
parent: b6846e1a64c3a56be80f1b7bd2d5bf10cfabc36f
Rework PNaCl memory ordering

This CL reworks memory ordering as specified by PNaCl. The documentation needed some clarification, and the implementation needs a bit more work around volatile and __sync_synchronize to offer stronger guarantees than what LLVM intends to offer for legacy code.

There is a companion patch with Clang changes: https://codereview.chromium.org/22294002

R=eliben@chromium.org
TEST= ninja check-all
BUG= https://code.google.com/p/nativeclient/issues/detail?id=3475

Review URL: https://codereview.chromium.org/22240002

-rw-r--r--  docs/PNaClDeveloperGuide.rst   219
-rw-r--r--  docs/PNaClLangRef.rst           47
-rw-r--r--  include/llvm/IR/Intrinsics.td    2
3 files changed, 158 insertions, 110 deletions

diff --git a/docs/PNaClDeveloperGuide.rst b/docs/PNaClDeveloperGuide.rst
index 9c27ae5c14..e807d572f7 100644
--- a/docs/PNaClDeveloperGuide.rst
+++ b/docs/PNaClDeveloperGuide.rst
@@ -14,126 +14,159 @@ TODO
Memory Model and Atomics
========================
-Volatile Memory Accesses
-------------------------
-
-The C11/C++11 standards mandate that ``volatile`` accesses execute in program
-order (but are not fences, so other memory operations can reorder around them),
-are not necessarily atomic, and can’t be elided. They can be separated into
-smaller width accesses.
-
-The PNaCl toolchain applies regular LLVM optimizations along these guidelines,
-and it further prevents any load/store (even non-``volatile`` and non-atomic
-ones) from moving above or below a volatile operations: they act as compiler
-barriers before optimizations occur. The PNaCl toolchain freezes ``volatile``
-accesses after optimizations into atomic accesses with sequentially consistent
-memory ordering. This eases the support of legacy (i.e. non-C11/C++11) code, and
-combined with builtin fences these programs can do meaningful cross-thread
-communication without changing code. It also reflects the original code's intent
-and guarantees better portability.
-
-Relaxed ordering could be used instead, but for the first release it is more
-conservative to apply sequential consistency. Future releases may change what
-happens at compile-time, but already-released pexes will continue using
-sequential consistency.
-
-The PNaCl toolchain also requires that ``volatile`` accesses be at least
-naturally aligned, and tries to guarantee this alignment.
-
Memory Model for Concurrent Operations
--------------------------------------
-The memory model offered by PNaCl relies on the same coding guidelines as the
-C11/C++11 one: concurrent accesses must always occur through atomic primitives
-(offered by `atomic intrinsics <PNaClLangRef.html#atomicintrinsics>`_), and
-these accesses must always occur with the same size for the same memory
-location. Visibility of stores is provided on a happens-before basis that
-relates memory locations to each other as the C11/C++11 standards do.
-
-As in C11/C++11 some atomic accesses may be implemented with locks on certain
-platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be ``1``, signifying
-that all types are sometimes lock-free. The ``is_lock_free`` methods will return
-the current platform's implementation at translation time.
-
-The PNaCl toolchain supports concurrent memory accesses through legacy GCC-style
-``__sync_*`` builtins, as well as through C11/C++11 atomic primitives.
-``volatile`` memory accesses can also be used, though these are discouraged, and
-aren't present in bitcode.
+The memory model offered by PNaCl relies on the same coding guidelines
+as the C11/C++11 one: concurrent accesses must always occur through
+atomic primitives (offered by `atomic intrinsics
+<PNaClLangRef.html#atomicintrinsics>`_), and these accesses must always
+occur with the same size for the same memory location. Visibility of
+stores is provided on a happens-before basis that relates memory
+locations to each other as the C11/C++11 standards do.
+
+Non-atomic memory accesses may be reordered, separated, elided or fused
+according to C and C++'s memory model before the pexe is created as well
+as after its creation.
+
+As in C11/C++11 some atomic accesses may be implemented with locks on
+certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be
+``1``, signifying that all types are sometimes lock-free. The
+``is_lock_free`` methods and ``atomic_is_lock_free`` will return the
+current platform's implementation at translation time. These macros,
+methods and functions are in the C11 header ``<stdatomic.h>`` and the
+C++11 header ``<atomic>``.
+
+The PNaCl toolchain supports concurrent memory accesses through legacy
+GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic
+primitives. ``volatile`` memory accesses can also be used, though these
+are discouraged. See `Volatile Memory Accesses`_.
PNaCl supports concurrency and parallelism with some restrictions:
-* Threading is explicitly supported.
+* Threading is explicitly supported and has no restrictions over what
+ prevalent implementations offer. See `Threading`_.
+
+* ``volatile`` and atomic operations are address-free (operations on the
+ same memory location via two different addresses work atomically), as
+ intended by the C11/C++11 standards. This is critical in supporting
+ synchronous "external modifications" such as mapping underlying memory
+ at multiple locations.
-* Inter-process communication through shared memory is limited to operations
- which are lock-free on the current platform (``is_lock_free`` methods). This
- may change at a later date.
+* Inter-process communication through shared memory is currently not
+ supported. See `Future Directions`_.
-* Direct interaction with device memory isn't supported.
+* Signal handling isn't supported; PNaCl therefore promotes all
+ primitives to cross-thread (instead of single-thread). This may change
+ at a later date. Note that using atomic operations which aren't
+ lock-free may lead to deadlocks when handling asynchronous
+ signals. See `Future Directions`_.
-* Signal handling isn't supported, PNaCl therefore promotes all primitives to
- cross-thread (instead of single-thread). This may change at a later date. Note
- that using atomic operations which aren't lock-free may lead to deadlocks when
- handling asynchronous signals.
-
-* ``volatile`` and atomic operations are address-free (operations on the same
- memory location via two different addresses work atomically), as intended by
- the C11/C++11 standards. This is critical for inter-process communication as
- well as synchronous "external modifications" such as mapping underlying memory
- at multiple locations.
+* Direct interaction with device memory isn't supported, and there is no
+ intent to support it. The embedding sandbox's runtime can offer APIs
+ to indirectly access devices.
-Setting up the above mechanisms requires assistance from the embedding sandbox's
-runtime (e.g. NaCl's Pepper APIs), but using them once setup can be done through
-regular C/C++ code.
-
-The PNaCl toolchain currently optimizes for memory ordering as LLVM normally
-does, but at pexe creation time it promotes all ``volatile`` accesses as well as
-all atomic accesses to be sequentially consistent. Other memory orderings will
-be supported in a future release, but pexes generated with the current toolchain
-will continue functioning with sequential consistency. Using sequential
-consistency provides a total ordering for all sequentially-consistent operations
-on all addresses.
-
-This means that ``volatile`` and atomic memory accesses can only be re-ordered
-in some limited way before the pexe is created, and will act as fences for all
-memory accesses (even non-atomic and non-``volatile``) after pexe creation.
-Non-atomic and non-``volatile`` memory accesses may be reordered (unless a fence
-intervenes), separated, elided or fused according to C and C++'s memory model
-before the pexe is created as well as after its creation.
+Setting up the above mechanisms requires assistance from the embedding
+sandbox's runtime (e.g. NaCl's Pepper APIs), but using them once setup
+can be done through regular C/C++ code.
Atomic Memory Ordering Constraints
----------------------------------
-Atomics follow the same ordering constraints as in regular LLVM, but
-all accesses are promoted to sequential consistency (the strongest
-memory ordering) at pexe creation time. As more C11/C++11 code
-allows us to understand performance and portability needs we intend
-to support the full gamut of C11/C++11 memory orderings:
+Atomics follow the same ordering constraints as in regular C11/C++11,
+but all accesses are promoted to sequential consistency (the strongest
+memory ordering) at pexe creation time. As more C11/C++11 code allows us
+to understand performance and portability needs we intend to support the
+full gamut of C11/C++11 memory orderings:
- Relaxed: no operation orders memory.
-- Consume: a load operation performs a consume operation on the affected memory
- location (currently unsupported by LLVM).
-- Acquire: a load operation performs an acquire operation on the affected memory
- location.
-- Release: a store operation performs a release operation on the affected memory
- location.
+- Consume: a load operation performs a consume operation on the affected
+ memory location (note: currently unsupported by LLVM).
+- Acquire: a load operation performs an acquire operation on the
+ affected memory location.
+- Release: a store operation performs a release operation on the
+ affected memory location.
- Acquire-release: load and store operations perform acquire and release
operations on the affected memory.
-- Sequentially consistent: same as acquire-release, but providing a global total
- ordering for all affected locations.
+- Sequentially consistent: same as acquire-release, but providing a
+ global total ordering for all affected locations.
As in C11/C++11:
- Atomic accesses must at least be naturally aligned.
-- Some accesses may not actually be atomic on certain platforms, requiring an
- implementation that uses a global lock.
-- An atomic memory location must always be accessed with atomic primitives, and
- these primitives must always be of the same bit size for that location.
+- Some accesses may not actually be atomic on certain platforms,
+ requiring an implementation that uses global lock(s).
+- An atomic memory location must always be accessed with atomic
+ primitives, and these primitives must always be of the same bit size
+ for that location.
- Not all memory orderings are valid for all atomic operations.
+Volatile Memory Accesses
+------------------------
+
+The C11/C++11 standards mandate that ``volatile`` accesses execute in
+program order (but are not fences, so other memory operations can
+reorder around them), are not necessarily atomic, and can’t be
+elided. They can be separated into smaller width accesses.
+
+Before any optimizations occur the PNaCl toolchain transforms
+``volatile`` loads and stores into sequentially consistent ``volatile``
+atomic loads and stores, and applies regular compiler optimizations
+along the above guidelines. This orders ``volatile`` accesses according to the
+atomic rules, and means that fences (including ``__sync_synchronize``)
+act in a better-defined manner. Regular memory accesses still do not
+have ordering guarantees with ``volatile`` and atomic accesses, though
+the internal representation of ``__sync_synchronize`` attempts to
+prevent reordering of memory accesses to objects which may escape.
+
+Relaxed ordering could be used instead, but for the first release it is
+more conservative to apply sequential consistency. Future releases may
+change what happens at compile-time, but already-released pexes will
+continue using sequential consistency.
+
+The PNaCl toolchain also requires that ``volatile`` accesses be at least
+naturally aligned, and tries to guarantee this alignment.
+
+The above guarantees ease the support of legacy (i.e. non-C11/C++11)
+code, and combined with builtin fences these programs can do meaningful
+cross-thread communication without changing code. They also better
+reflect the original code's intent and guarantee better portability.
+
+Threading
+=========
+
+Threading is explicitly supported through C11/C++11's threading
+libraries as well as POSIX threads.
+
+Communication between threads should use atomic primitives as described
+in `Memory Model and Atomics`_.
+
Inline Assembly
===============
Inline assembly isn't supported by PNaCl because it isn't portable. The
one current exception is the common compiler barrier idiom
``asm("":::"memory")``, which gets transformed to a sequentially
-consistent memory barrier (equivalent to ``__sync_synchronize()``).
+consistent memory barrier (equivalent to ``__sync_synchronize()``). In
+PNaCl this barrier is only guaranteed to order ``volatile`` and atomic
+memory accesses, though in practice the implementation attempts to also
+prevent reordering of memory accesses to objects which may escape.
+
+Future Directions
+=================
+
+Inter-Process Communication
+---------------------------
+
+Inter-process communication through shared memory is currently not
+supported by PNaCl. When implemented, it may be limited to operations
+which are lock-free on the current platform (``is_lock_free``
+methods). It will rely on the address-free property discussed in `Memory
+Model for Concurrent Operations`_.
+
+Signal Handling
+---------------
+
+Untrusted signal handling currently isn't supported by PNaCl. When
+supported, the impact of ``volatile`` and atomics for same-thread signal
+handling will need to be carefully detailed.
diff --git a/docs/PNaClLangRef.rst b/docs/PNaClLangRef.rst
index 75218a0c02..624bebbda6 100644
--- a/docs/PNaClLangRef.rst
+++ b/docs/PNaClLangRef.rst
@@ -143,25 +143,30 @@ Volatile Memory Accesses
`LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_
-PNaCl bitcode does not support volatile memory accesses. The ``volatile``
-attribute on loads and stores is not supported. See the
+PNaCl bitcode does not support volatile memory accesses. The
+``volatile`` attribute on loads and stores is not supported. See the
`PNaCl Developer's Guide <PNaClDeveloperGuide.html>`_ for more details.
Memory Model for Concurrent Operations
--------------------------------------
-`LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_
+`LLVM LangRef: Memory Model for Concurrent Operations
+<LangRef.html#memmodel>`_
-See the `PNaCl Developer's Guide <PNaClDeveloperGuide.html>`_ for more details.
+See the `PNaCl Developer's Guide <PNaClDeveloperGuide.html>`_ for more
+details.
Atomic Memory Ordering Constraints
----------------------------------
`LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_
-PNaCl bitcode currently supports sequential consistency only, through its
-`atomic intrinsics`_. See the
-`PNaCl Developer's Guide <PNaClDeveloperGuide.html>`_ for more details.
+PNaCl bitcode currently supports sequential consistency only, through
+its `atomic intrinsics`_. See the `PNaCl Developer's Guide
+<PNaClDeveloperGuide.html>`_ for more details.
+
+The integer values for memory ordering constraints are in
+``"llvm/IR/NaClAtomicIntrinsics.h"``.
Fast-Math Flags
---------------
@@ -410,6 +415,8 @@ The only intrinsics supported by PNaCl bitcode are the following.
* ``llvm.nacl.atomic.rmw``
* ``llvm.nacl.atomic.cmpxchg``
* ``llvm.nacl.atomic.fence``
+* ``llvm.nacl.atomic.fence.all``
+* ``llvm.nacl.atomic.is.lock.free``
See :ref:`atomic intrinsics <atomicintrinsics>`.
@@ -434,9 +441,9 @@ Setjmp and Longjmp
declare void @llvm.nacl.longjmp(i8* %jmpbuf, i32)
declare i32 @llvm.nacl.setjmp(i8* %jmpbuf)
-These intrinsics implement the semantics of C11 ``setjmp`` and ``longjmp``. The
-``jmpbuf`` pointer must be 64-bit aligned and point to at least 1024 bytes of
-allocated memory.
+These intrinsics implement the semantics of C11 ``setjmp`` and
+``longjmp``. The ``jmpbuf`` pointer must be 64-bit aligned and point to
+at least 1024 bytes of allocated memory.
.. _atomicintrinsics:
@@ -455,10 +462,11 @@ Atomic intrinsics
iN* <object>, iN <expected>, iN <desired>,
i32 <memory_order_success>, i32 <memory_order_failure>)
declare void @llvm.nacl.atomic.fence(i32 <memory_order>)
+ declare void @llvm.nacl.atomic.fence.all()
-Each of these intrinsics is overloaded on the ``iN`` argument, which
-is reflected through ``<size>`` in the overload's name. Integral types
-of 8, 16, 32 and 64-bit width are supported for these arguments.
+Each of these intrinsics is overloaded on the ``iN`` argument, which is
+reflected through ``<size>`` in the overload's name. Integral types of
+8, 16, 32 and 64-bit width are supported for these arguments.
The ``@llvm.nacl.atomic.rmw`` intrinsic implements the following
read-modify-write operations, from the general and arithmetic sections
@@ -472,8 +480,8 @@ of the C11/C++11 standards:
- ``exchange``
For all of these read-modify-write operations, the returned value is
-that at ``object`` before the computation. The ``computation``
-argument must be a compile-time constant.
+that at ``object`` before the computation. The ``computation`` argument
+must be a compile-time constant.
All atomic intrinsics also support C11/C++11 memory orderings, which
must be compile-time constants. Those are detailed in `Atomic Memory
@@ -482,12 +490,17 @@ Ordering Constraints`_.
Integer values for these computations and memory orderings are defined
in ``"llvm/IR/NaClAtomicIntrinsics.h"``.
+The ``@llvm.nacl.atomic.fence.all`` intrinsic is equivalent to the
+``@llvm.nacl.atomic.fence`` intrinsic with sequentially consistent
+ordering and compiler barriers preventing most non-atomic memory
+accesses from reordering around it.
+
.. note::
These intrinsics allow PNaCl to support C11/C++11 style atomic
operations as well as some legacy GCC-style ``__sync_*`` builtins
- while remaining stable as the LLVM codebase changes. The user
- isn't expected to use these intrinsics directly.
+ while remaining stable as the LLVM codebase changes. The user isn't
+ expected to use these intrinsics directly.
.. code-block:: llvm
diff --git a/include/llvm/IR/Intrinsics.td b/include/llvm/IR/Intrinsics.td
index 15567eb2db..c82051ae7d 100644
--- a/include/llvm/IR/Intrinsics.td
+++ b/include/llvm/IR/Intrinsics.td
@@ -526,6 +526,8 @@ def int_nacl_atomic_cmpxchg : Intrinsic<[llvm_anyint_ty],
[IntrReadWriteArgMem]>;
def int_nacl_atomic_fence : Intrinsic<[], [llvm_i32_ty],
[IntrReadWriteArgMem]>;
+def int_nacl_atomic_fence_all : Intrinsic<[], [],
+ [IntrReadWriteArgMem]>;
def int_nacl_atomic_is_lock_free : Intrinsic<[llvm_i1_ty],
[llvm_i32_ty, llvm_ptr_ty], [IntrNoMem]>,
GCCBuiltin<"__nacl_atomic_is_lock_free">;
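
Illustrative note (not part of this commit): the developer guide text in the diff above says that communication between threads should use atomic primitives, and that PNaCl currently promotes all atomic accesses to sequential consistency. A minimal C++11 sketch of that pattern follows. The names ``payload``, ``ready``, ``producer``, ``consumer`` and ``run_example`` are hypothetical, and the code relies only on the standard ``<atomic>`` and ``<thread>`` headers; it is a sketch under those assumptions rather than code taken from the PNaCl toolchain.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, chosen for illustration only.
static int payload = 0;                 // ordinary, non-atomic data
static std::atomic<bool> ready(false);  // flag shared between threads

// Writes the data, then publishes it through the atomic flag.
static void producer() {
  payload = 42;
  // Sequentially consistent store: the only ordering PNaCl bitcode
  // supports at the time of this commit, per the documentation above.
  ready.store(true, std::memory_order_seq_cst);
}

// Spins until the flag is set, then reads the data.
static int consumer() {
  while (!ready.load(std::memory_order_seq_cst)) {
    std::this_thread::yield();
  }
  // The load above observed the store in producer(), so the write to
  // payload happens-before this read.
  return payload;
}

// Runs one producer and one consumer thread and returns what was read.
int run_example() {
  payload = 0;
  ready.store(false, std::memory_order_seq_cst);
  int seen = 0;
  std::thread t1(producer);
  std::thread t2([&seen] { seen = consumer(); });
  t1.join();
  t2.join();
  return seen;
}
```

Legacy code that uses a ``volatile`` flag plus ``__sync_synchronize()`` in place of ``std::atomic`` is the case the reworked "Volatile Memory Accesses" section is meant to ease; the atomic form shown here is the one the guide recommends.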