author | JF Bastien <jfb@chromium.org> | 2013-08-06 16:14:36 -0700 |
---|---|---|
committer | JF Bastien <jfb@chromium.org> | 2013-08-06 16:14:36 -0700 |
commit | 77f169c9afeaf7384360ff6d56b73cc4d3200f5b (patch) | |
tree | af76df0613246defa73290adf854e349cad79c3e | |
parent | b6846e1a64c3a56be80f1b7bd2d5bf10cfabc36f (diff) |
Rework PNaCl memory ordering
This CL reworks memory ordering as specified by PNaCl. The documentation needed some clarification, and the implementation needs a bit more work around volatile and __sync_synchronize to offer stronger guarantees than what LLVM intends to offer for legacy code.
There is a companion patch with Clang changes:
https://codereview.chromium.org/22294002
R=eliben@chromium.org
TEST= ninja check-all
BUG= https://code.google.com/p/nativeclient/issues/detail?id=3475
Review URL: https://codereview.chromium.org/22240002
-rw-r--r-- | docs/PNaClDeveloperGuide.rst | 219 | ||||
-rw-r--r-- | docs/PNaClLangRef.rst | 47 | ||||
-rw-r--r-- | include/llvm/IR/Intrinsics.td | 2 |
3 files changed, 158 insertions, 110 deletions
diff --git a/docs/PNaClDeveloperGuide.rst b/docs/PNaClDeveloperGuide.rst
index 9c27ae5c14..e807d572f7 100644
--- a/docs/PNaClDeveloperGuide.rst
+++ b/docs/PNaClDeveloperGuide.rst
@@ -14,126 +14,159 @@ TODO
 Memory Model and Atomics
 ========================

-Volatile Memory Accesses
-------------------------
-
-The C11/C++11 standards mandate that ``volatile`` accesses execute in program
-order (but are not fences, so other memory operations can reorder around them),
-are not necessarily atomic, and can’t be elided. They can be separated into
-smaller width accesses.
-
-The PNaCl toolchain applies regular LLVM optimizations along these guidelines,
-and it further prevents any load/store (even non-``volatile`` and non-atomic
-ones) from moving above or below a volatile operations: they act as compiler
-barriers before optimizations occur. The PNaCl toolchain freezes ``volatile``
-accesses after optimizations into atomic accesses with sequentially consistent
-memory ordering. This eases the support of legacy (i.e. non-C11/C++11) code, and
-combined with builtin fences these programs can do meaningful cross-thread
-communication without changing code. It also reflects the original code's intent
-and guarantees better portability.
-
-Relaxed ordering could be used instead, but for the first release it is more
-conservative to apply sequential consistency. Future releases may change what
-happens at compile-time, but already-released pexes will continue using
-sequential consistency.
-
-The PNaCl toolchain also requires that ``volatile`` accesses be at least
-naturally aligned, and tries to guarantee this alignment.
-
 Memory Model for Concurrent Operations
 --------------------------------------

-The memory model offered by PNaCl relies on the same coding guidelines as the
-C11/C++11 one: concurrent accesses must always occur through atomic primitives
-(offered by `atomic intrinsics <PNaClLangRef.html#atomicintrinsics>`_), and
-these accesses must always occur with the same size for the same memory
-location. Visibility of stores is provided on a happens-before basis that
-relates memory locations to each other as the C11/C++11 standards do.
-
-As in C11/C++11 some atomic accesses may be implemented with locks on certain
-platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be ``1``, signifying
-that all types are sometimes lock-free. The ``is_lock_free`` methods will return
-the current platform's implementation at translation time.
-
-The PNaCl toolchain supports concurrent memory accesses through legacy GCC-style
-``__sync_*`` builtins, as well as through C11/C++11 atomic primitives.
-``volatile`` memory accesses can also be used, though these are discouraged, and
-aren't present in bitcode.
+The memory model offered by PNaCl relies on the same coding guidelines
+as the C11/C++11 one: concurrent accesses must always occur through
+atomic primitives (offered by `atomic intrinsics
+<PNaClLangRef.html#atomicintrinsics>`_), and these accesses must always
+occur with the same size for the same memory location. Visibility of
+stores is provided on a happens-before basis that relates memory
+locations to each other as the C11/C++11 standards do.
+
+Non-atomic memory accesses may be reordered, separated, elided or fused
+according to C and C++'s memory model before the pexe is created as well
+as after its creation.
+
+As in C11/C++11 some atomic accesses may be implemented with locks on
+certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be
+``1``, signifying that all types are sometimes lock-free. The
+``is_lock_free`` methods and ``atomic_is_lock_free`` will return the
+current platform's implementation at translation time. These macros,
+methods and functions are in the C11 header ``<stdatomic.h>`` and the
+C++11 header ``<atomic>``.
+
+The PNaCl toolchain supports concurrent memory accesses through legacy
+GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic
+primitives. ``volatile`` memory accesses can also be used, though these
+are discouraged. See `Volatile Memory Accesses`_.

 PNaCl supports concurrency and parallelism with some restrictions:

-* Threading is explicitly supported.
+* Threading is explicitly supported and has no restrictions over what
+  prevalent implementations offer. See `Threading`_.
+
+* ``volatile`` and atomic operations are address-free (operations on the
+  same memory location via two different addresses work atomically), as
+  intended by the C11/C++11 standards. This is critical in supporting
+  synchronous "external modifications" such as mapping underlying memory
+  at multiple locations.

-* Inter-process communication through shared memory is limited to operations
-  which are lock-free on the current platform (``is_lock_free`` methods). This
-  may change at a later date.
+* Inter-process communication through shared memory is currently not
+  supported. See `Future Directions`_.

-* Direct interaction with device memory isn't supported.
+* Signal handling isn't supported, PNaCl therefore promotes all
+  primitives to cross-thread (instead of single-thread). This may change
+  at a later date. Note that using atomic operations which aren't
+  lock-free may lead to deadlocks when handling asynchronous
+  signals. See `Future Directions`_.

-* Signal handling isn't supported, PNaCl therefore promotes all primitives to
-  cross-thread (instead of single-thread). This may change at a later date. Note
-  that using atomic operations which aren't lock-free may lead to deadlocks when
-  handling asynchronous signals.
-
-* ``volatile`` and atomic operations are address-free (operations on the same
-  memory location via two different addresses work atomically), as intended by
-  the C11/C++11 standards. This is critical for inter-process communication as
-  well as synchronous "external modifications" such as mapping underlying memory
-  at multiple locations.
+* Direct interaction with device memory isn't supported, and there is no
+  intent to support it. The embedding sandbox's runtime can offer APIs
+  to indirectly access devices.

-Setting up the above mechanisms requires assistance from the embedding sandbox's
-runtime (e.g. NaCl's Pepper APIs), but using them once setup can be done through
-regular C/C++ code.
-
-The PNaCl toolchain currently optimizes for memory ordering as LLVM normally
-does, but at pexe creation time it promotes all ``volatile`` accesses as well as
-all atomic accesses to be sequentially consistent. Other memory orderings will
-be supported in a future release, but pexes generated with the current toolchain
-will continue functioning with sequential consistency. Using sequential
-consistency provides a total ordering for all sequentially-consistent operations
-on all addresses.
-
-This means that ``volatile`` and atomic memory accesses can only be re-ordered
-in some limited way before the pexe is created, and will act as fences for all
-memory accesses (even non-atomic and non-``volatile``) after pexe creation.
-Non-atomic and non-``volatile`` memory accesses may be reordered (unless a fence
-intervenes), separated, elided or fused according to C and C++'s memory model
-before the pexe is created as well as after its creation.
+Setting up the above mechanisms requires assistance from the embedding
+sandbox's runtime (e.g. NaCl's Pepper APIs), but using them once setup
+can be done through regular C/C++ code.

 Atomic Memory Ordering Constraints
 ----------------------------------

-Atomics follow the same ordering constraints as in regular LLVM, but
-all accesses are promoted to sequential consistency (the strongest
-memory ordering) at pexe creation time. As more C11/C++11 code
-allows us to understand performance and portability needs we intend
-to support the full gamut of C11/C++11 memory orderings:
+Atomics follow the same ordering constraints as in regular C11/C++11,
+but all accesses are promoted to sequential consistency (the strongest
+memory ordering) at pexe creation time. As more C11/C++11 code allows us
+to understand performance and portability needs we intend to support the
+full gamut of C11/C++11 memory orderings:

 - Relaxed: no operation orders memory.
-- Consume: a load operation performs a consume operation on the affected memory
-  location (currently unsupported by LLVM).
-- Acquire: a load operation performs an acquire operation on the affected memory
-  location.
-- Release: a store operation performs a release operation on the affected memory
-  location.
+- Consume: a load operation performs a consume operation on the affected
+  memory location (note: currently unsupported by LLVM).
+- Acquire: a load operation performs an acquire operation on the
+  affected memory location.
+- Release: a store operation performs a release operation on the
+  affected memory location.
 - Acquire-release: load and store operations perform acquire and release
   operations on the affected memory.
-- Sequentially consistent: same as acquire-release, but providing a global total
-  ordering for all affected locations.
+- Sequentially consistent: same as acquire-release, but providing a
+  global total ordering for all affected locations.

 As in C11/C++11:

 - Atomic accesses must at least be naturally aligned.
-- Some accesses may not actually be atomic on certain platforms, requiring an
-  implementation that uses a global lock.
-- An atomic memory location must always be accessed with atomic primitives, and
-  these primitives must always be of the same bit size for that location.
+- Some accesses may not actually be atomic on certain platforms,
+  requiring an implementation that uses global lock(s).
+- An atomic memory location must always be accessed with atomic
+  primitives, and these primitives must always be of the same bit size
+  for that location.
 - Not all memory orderings are valid for all atomic operations.

+Volatile Memory Accesses
+------------------------
+
+The C11/C++11 standards mandate that ``volatile`` accesses execute in
+program order (but are not fences, so other memory operations can
+reorder around them), are not necessarily atomic, and can’t be
+elided. They can be separated into smaller width accesses.
+
+Before any optimizations occur the PNaCl toolchain transforms
+``volatile`` loads and stores into sequentially consistent ``volatile``
+atomic loads and stores, and applies regular compiler optimizations
+along the above guidelines. This orders ``volatiles`` according to the
+atomic rules, and means that fences (including ``__sync_synchronize``)
+act in a better-defined manner. Regular memory accesses still do not
+have ordering guarantees with ``volatile`` and atomic accesses, though
+the internal representation of ``__sync_synchronize`` attempts to
+prevent reordering of memory accesses to objects which may escape.
+
+Relaxed ordering could be used instead, but for the first release it is
+more conservative to apply sequential consistency. Future releases may
+change what happens at compile-time, but already-released pexes will
+continue using sequential consistency.
+
+The PNaCl toolchain also requires that ``volatile`` accesses be at least
+naturally aligned, and tries to guarantee this alignment.
+
+The above guarantees ease the support of legacy (i.e. non-C11/C++11)
+code, and combined with builtin fences these programs can do meaningful
+cross-thread communication without changing code. They also better
+reflect the original code's intent and guarantee better portability.
+
+Threading
+=========
+
+Threading is explicitly supported through C11/C++11's threading
+libraries as well as POSIX threads.
+
+Communication between threads should use atomic primitives as described
+in `Memory Model and Atomics`_.
+
 Inline Assembly
 ===============

 Inline assembly isn't supported by PNaCl because it isn't portable. The
 one current exception is the common compiler barrier idiom
 ``asm("":::"memory")``, which gets transformed to a sequentially
-consistent memory barrier (equivalent to ``__sync_synchronize()``).
+consistent memory barrier (equivalent to ``__sync_synchronize()``). In
+PNaCl this barrier is only guaranteed to order ``volatile`` and atomic
+memory accesses, though in practice the implementation attempts to also
+prevent reordering of memory accesses to objects which may escape.
+
+Future Directions
+=================
+
+Inter-Process Communication
+---------------------------
+
+Inter-process communication through shared memory is currently not
+supported by PNaCl. When implemented, it may be limited to operations
+which are lock-free on the current platform (``is_lock_free``
+methods). It will rely on the address-free properly discussed in `Memory
+Model for Concurrent Operations`_.
+
+Signal Handling
+---------------
+
+Untrusted signal handling currently isn't supported by PNaCl. When
+supported, the impact of ``volatile`` and atomics for same-thread signal
+handling will need to be carefully detailed.
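The Developer Guide text in the diff above asks that cross-thread communication go through atomic primitives, with sequentially consistent ordering. A minimal C++11 sketch of that guideline follows: a plain value is published through an atomic flag. The names `payload`, `ready`, `produce`, `consume` and `run_message_passing` are hypothetical and do not come from the patch. The code uses only the standard `<atomic>` and `<thread>` headers and nothing PNaCl-specific.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, for illustration only (not part of the patch).
static int payload = 0;                 // plain, non-atomic data
static std::atomic<bool> ready(false);  // atomic flag used to publish payload

// Producer: write the data first, then publish it with a sequentially
// consistent store. The store keeps the plain write from moving after it.
void produce(int value) {
  payload = value;
  ready.store(true, std::memory_order_seq_cst);
}

// Consumer: spin until the flag is observed, then read the data. The
// sequentially consistent load synchronizes with the producer's store,
// so the read of payload happens after the producer's write.
int consume() {
  while (!ready.load(std::memory_order_seq_cst)) {
    std::this_thread::yield();
  }
  return payload;
}

// Runs one producer and one consumer and returns what the consumer saw.
int run_message_passing(int value) {
  payload = 0;
  ready.store(false, std::memory_order_seq_cst);
  int seen = -1;
  std::thread consumer([&seen] { seen = consume(); });
  std::thread producer([value] { produce(value); });
  producer.join();
  consumer.join();  // join orders the write to seen before the read below
  return seen;
}
```

Calling `run_message_passing(42)` hands the value from the producer thread to the consumer thread. The patch states that all atomic accesses are promoted to sequential consistency at pexe creation time, so requesting `std::memory_order_seq_cst` explicitly here matches the ordering the toolchain is described as providing.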
diff --git a/docs/PNaClLangRef.rst b/docs/PNaClLangRef.rst
index 75218a0c02..624bebbda6 100644
--- a/docs/PNaClLangRef.rst
+++ b/docs/PNaClLangRef.rst
@@ -143,25 +143,30 @@ Volatile Memory Accesses

 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_

-PNaCl bitcode does not support volatile memory accesses. The ``volatile``
-attribute on loads and stores is not supported. See the
+PNaCl bitcode does not support volatile memory accesses. The
+``volatile`` attribute on loads and stores is not supported. See the
 `PNaCl Developer's Guide <PNaClDeveloperGuide.html>`_ for more details.

 Memory Model for Concurrent Operations
 --------------------------------------

-`LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_
+`LLVM LangRef: Memory Model for Concurrent Operations
+<LangRef.html#memmodel>`_

-See the `PNaCl Developer's Guide <PNaClDeveloperGuide.html>`_ for more details.
+See the `PNaCl Developer's Guide <PNaClDeveloperGuide.html>`_ for more
+details.

 Atomic Memory Ordering Constraints
 ----------------------------------

 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_

-PNaCl bitcode currently supports sequential consistency only, through its
-`atomic intrinsics`_. See the
-`PNaCl Developer's Guide <PNaClDeveloperGuide.html>`_ for more details.
+PNaCl bitcode currently supports sequential consistency only, through
+its `atomic intrinsics`_. See the `PNaCl Developer's Guide
+<PNaClDeveloperGuide.html>`_ for more details.
+
+The integer values for memory ordering constraints are in
+``"llvm/IR/NaClAtomicIntrinsics.h"``.

 Fast-Math Flags
 ---------------
@@ -410,6 +415,8 @@ The only intrinsics supported by PNaCl bitcode are the following.
 * ``llvm.nacl.atomic.rmw``
 * ``llvm.nacl.atomic.cmpxchg``
 * ``llvm.nacl.atomic.fence``
+* ``llvm.nacl.atomic.fence.all``
+* ``llvm.nacl.atomic.is.lock.free``

 See :ref:`atomic intrinsics <atomicintrinsics>`.

@@ -434,9 +441,9 @@ Setjmp and Longjmp
   declare void @llvm.nacl.longjmp(i8* %jmpbuf, i32)
   declare i32 @llvm.nacl.setjmp(i8* %jmpbuf)

-These intrinsics implement the semantics of C11 ``setjmp`` and ``longjmp``. The
-``jmpbuf`` pointer must be 64-bit aligned and point to at least 1024 bytes of
-allocated memory.
+These intrinsics implement the semantics of C11 ``setjmp`` and
+``longjmp``. The ``jmpbuf`` pointer must be 64-bit aligned and point to
+at least 1024 bytes of allocated memory.

 .. _atomicintrinsics:

@@ -455,10 +462,11 @@ Atomic intrinsics
       iN* <object>, iN <expected>, iN <desired>,
       i32 <memory_order_success>, i32 <memory_order_failure>)
   declare void @llvm.nacl.atomic.fence(i32 <memory_order>)
+  declare void @llvm.nacl.atomic.fence.all()

-Each of these intrinsics is overloaded on the ``iN`` argument, which
-is reflected through ``<size>`` in the overload's name. Integral types
-of 8, 16, 32 and 64-bit width are supported for these arguments.
+Each of these intrinsics is overloaded on the ``iN`` argument, which is
+reflected through ``<size>`` in the overload's name. Integral types of
+8, 16, 32 and 64-bit width are supported for these arguments.

 The ``@llvm.nacl.atomic.rmw`` intrinsic implements the following
 read-modify-write operations, from the general and arithmetic sections
@@ -472,8 +480,8 @@ of the C11/C++11 standards:
 - ``exchange``

 For all of these read-modify-write operations, the returned value is
-that at ``object`` before the computation. The ``computation``
-argument must be a compile-time constant.
+that at ``object`` before the computation. The ``computation`` argument
+must be a compile-time constant.

 All atomic intrinsics also support C11/C++11 memory orderings, which
 must be compile-time constants. Those are detailed in `Atomic Memory
@@ -482,12 +490,17 @@ Ordering Constraints`_.
 Integer values for these computations and memory orderings are defined
 in ``"llvm/IR/NaClAtomicIntrinsics.h"``.

+The ``@llvm.nacl.atomic.fence.all`` intrinsic is equivalent to the
+``@llvm.nacl.atomic.fence`` intrinsic with sequentially consistent
+ordering and compiler barriers preventing most non-atomic memory
+accesses from reordering around it.
+
 .. note::

   These intrinsics allow PNaCl to support C11/C++11 style atomic
   operations as well as some legacy GCC-style ``__sync_*`` builtins
-  while remaining stable as the LLVM codebase changes. The user
-  isn't expected to use these intrinsics directly.
+  while remaining stable as the LLVM codebase changes. The user isn't
+  expected to use these intrinsics directly.

 .. code-block:: llvm

diff --git a/include/llvm/IR/Intrinsics.td b/include/llvm/IR/Intrinsics.td
index 15567eb2db..c82051ae7d 100644
--- a/include/llvm/IR/Intrinsics.td
+++ b/include/llvm/IR/Intrinsics.td
@@ -526,6 +526,8 @@ def int_nacl_atomic_cmpxchg : Intrinsic<[llvm_anyint_ty],
     [IntrReadWriteArgMem]>;
 def int_nacl_atomic_fence : Intrinsic<[], [llvm_i32_ty],
     [IntrReadWriteArgMem]>;
+def int_nacl_atomic_fence_all : Intrinsic<[], [],
+    [IntrReadWriteArgMem]>;
 def int_nacl_atomic_is_lock_free : Intrinsic<[llvm_i1_ty],
     [llvm_i32_ty, llvm_ptr_ty], [IntrNoMem]>,
     GCCBuiltin<"__nacl_atomic_is_lock_free">;
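The LangRef note in the diff above says users are expected to write C11/C++11 atomics rather than call the `llvm.nacl.atomic.*` intrinsics directly. As a rough sketch of what that source-level code might look like, the C++11 fragment below exercises a read-modify-write, an exchange, a compare-and-exchange, a fence and a lock-freedom query. The `AtomicDemo` struct and `run_atomic_demo` function are hypothetical names for illustration. Which intrinsic each call is lowered to is an assumption based on the intrinsic names, not something the patch spells out.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical result record, for illustration only (not part of the patch).
struct AtomicDemo {
  std::uint32_t before_add;       // value returned by fetch_add
  std::uint32_t before_exchange;  // value returned by exchange
  bool swapped;                   // whether compare_exchange_strong succeeded
  std::uint32_t final_value;      // value loaded at the end
  bool lock_free;                 // platform-dependent, so not asserted on
};

AtomicDemo run_atomic_demo() {
  std::atomic<std::uint32_t> counter(10);
  AtomicDemo r;

  // Read-modify-write operations return the value held before the update.
  r.before_add = counter.fetch_add(5, std::memory_order_seq_cst);
  r.before_exchange = counter.exchange(100, std::memory_order_seq_cst);

  // Compare-and-exchange takes separate success and failure orderings.
  std::uint32_t expected = 100;
  r.swapped = counter.compare_exchange_strong(
      expected, 7, std::memory_order_seq_cst, std::memory_order_seq_cst);

  // Stand-alone sequentially consistent fence.
  std::atomic_thread_fence(std::memory_order_seq_cst);

  r.final_value = counter.load(std::memory_order_seq_cst);
  r.lock_free = counter.is_lock_free();
  return r;
}
```

With the starting value of 10, `before_add` is 10, `before_exchange` is 15, the compare-and-exchange succeeds, and `final_value` is 7. The `lock_free` result depends on the platform, which is consistent with the Developer Guide's statement that `is_lock_free` reports the current platform's implementation.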