aboutsummaryrefslogtreecommitdiff
path: root/docs/ProgrammersManual.rst
diff options
context:
space:
mode:
Diffstat (limited to 'docs/ProgrammersManual.rst')
-rw-r--r--docs/ProgrammersManual.rst3165
1 files changed, 3165 insertions, 0 deletions
diff --git a/docs/ProgrammersManual.rst b/docs/ProgrammersManual.rst
new file mode 100644
index 0000000000..f626c60ee6
--- /dev/null
+++ b/docs/ProgrammersManual.rst
@@ -0,0 +1,3165 @@
+========================
+LLVM Programmer's Manual
+========================
+
+.. contents::
+ :local:
+
+.. warning::
+ This is a work in progress.
+
+.. sectionauthor:: Chris Lattner <sabre@nondot.org>,
+ Dinakar Dhurjati <dhurjati@cs.uiuc.edu>,
+ Gabor Greif <ggreif@gmail.com>,
+ Joel Stanley <jstanley@cs.uiuc.edu>,
+ Reid Spencer <rspencer@x10sys.com> and
+ Owen Anderson <owen@apple.com>
+
+.. _introduction:
+
+Introduction
+============
+
+This document is meant to highlight some of the important classes and interfaces
+available in the LLVM source-base. This manual is not intended to explain what
+LLVM is, how it works, and what LLVM code looks like. It assumes that you know
+the basics of LLVM and are interested in writing transformations or otherwise
+analyzing or manipulating the code.
+
+This document should get you oriented so that you can find your way in the
+continuously growing source code that makes up the LLVM infrastructure. Note
+that this manual is not intended to serve as a replacement for reading the
+source code, so if you think there should be a method in one of these classes to
+do something, but it's not listed, check the source. Links to the `doxygen
+<http://llvm.org/doxygen/>`__ sources are provided to make this as easy as
+possible.
+
+The first section of this document describes general information that is useful
+to know when working in the LLVM infrastructure, and the second describes the
+Core LLVM classes. In the future this manual will be extended with information
+describing how to use extension libraries, such as dominator information, CFG
+traversal routines, and useful utilities like the ``InstVisitor`` (`doxygen
+<http://llvm.org/doxygen/InstVisitor_8h-source.html>`__) template.
+
+.. _general:
+
+General Information
+===================
+
+This section contains general information that is useful if you are working in
+the LLVM source-base, but that isn't specific to any particular API.
+
+.. _stl:
+
+The C++ Standard Template Library
+---------------------------------
+
+LLVM makes heavy use of the C++ Standard Template Library (STL), perhaps much
+more than you are used to, or have seen before. Because of this, you might want
+to do a little background reading in the techniques used and capabilities of the
+library. There are many good pages that discuss the STL, and several books on
+the subject that you can get, so it will not be discussed in this document.
+
+Here are some useful links:
+
+#. `Dinkumware C++ Library reference
+ <http://www.dinkumware.com/manuals/#Standard C++ Library>`_ - an excellent
+ reference for the STL and other parts of the standard C++ library.
+
+#. `C++ In a Nutshell <http://www.tempest-sw.com/cpp/>`_ - This is an O'Reilly
+ book in the making. It has a decent Standard Library Reference that rivals
+ Dinkumware's, and is unfortunately no longer free since the book has been
+ published.
+
+#. `C++ Frequently Asked Questions <http://www.parashift.com/c++-faq-lite/>`_.
+
+#. `SGI's STL Programmer's Guide <http://www.sgi.com/tech/stl/>`_ - Contains a
+ useful `Introduction to the STL
+ <http://www.sgi.com/tech/stl/stl_introduction.html>`_.
+
+#. `Bjarne Stroustrup's C++ Page
+ <http://www.research.att.com/%7Ebs/C++.html>`_.
+
+#. `Bruce Eckel's Thinking in C++, 2nd ed. Volume 2 Revision 4.0
+ (even better, get the book) <http://64.78.49.204/>`_.
+
+You are also encouraged to take a look at the :ref:`LLVM Coding Standards
+<coding_standards>` guide which focuses on how to write maintainable code more
+than where to put your curly braces.
+
+.. _resources:
+
+Other useful references
+-----------------------
+
+#. `Using static and shared libraries across platforms
+ <http://www.fortran-2000.com/ArnaudRecipes/sharedlib.html>`_
+
+.. _apis:
+
+Important and useful LLVM APIs
+==============================
+
+Here we highlight some LLVM APIs that are generally useful and good to know
+about when writing transformations.
+
+.. _isa:
+
+The ``isa<>``, ``cast<>`` and ``dyn_cast<>`` templates
+------------------------------------------------------
+
+The LLVM source-base makes extensive use of a custom form of RTTI. These
+templates have many similarities to the C++ ``dynamic_cast<>`` operator, but
+they don't have some drawbacks (primarily stemming from the fact that
+``dynamic_cast<>`` only works on classes that have a v-table). Because they are
+used so often, you must know what they do and how they work. All of these
+templates are defined in the ``llvm/Support/Casting.h`` (`doxygen
+<http://llvm.org/doxygen/Casting_8h-source.html>`__) file (note that you very
+rarely have to include this file directly).
+
+``isa<>``:
+ The ``isa<>`` operator works exactly like the Java "``instanceof``" operator.
+ It returns true or false depending on whether a reference or pointer points to
+ an instance of the specified class. This can be very useful for constraint
+ checking of various sorts (example below).
+
+``cast<>``:
+ The ``cast<>`` operator is a "checked cast" operation. It converts a pointer
+ or reference from a base class to a derived class, causing an assertion
+ failure if it is not really an instance of the right type. This should be
+ used in cases where you have some information that makes you believe that
+ something is of the right type. An example of the ``isa<>`` and ``cast<>``
+ template is:
+
+ .. code-block:: c++
+
+ static bool isLoopInvariant(const Value *V, const Loop *L) {
+ if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V))
+ return true;
+
+ // Otherwise, it must be an instruction...
+ return !L->contains(cast<Instruction>(V)->getParent());
+ }
+
+ Note that you should **not** use an ``isa<>`` test followed by a ``cast<>``,
+ for that use the ``dyn_cast<>`` operator.
+
+``dyn_cast<>``:
+ The ``dyn_cast<>`` operator is a "checking cast" operation. It checks to see
+ if the operand is of the specified type, and if so, returns a pointer to it
+ (this operator does not work with references). If the operand is not of the
+ correct type, a null pointer is returned. Thus, this works very much like
+ the ``dynamic_cast<>`` operator in C++, and should be used in the same
+ circumstances. Typically, the ``dyn_cast<>`` operator is used in an ``if``
+ statement or some other flow control statement like this:
+
+ .. code-block:: c++
+
+ if (AllocationInst *AI = dyn_cast<AllocationInst>(Val)) {
+ // ...
+ }
+
+ This form of the ``if`` statement effectively combines together a call to
+ ``isa<>`` and a call to ``cast<>`` into one statement, which is very
+ convenient.
+
+ Note that the ``dyn_cast<>`` operator, like C++'s ``dynamic_cast<>`` or Java's
+ ``instanceof`` operator, can be abused. In particular, you should not use big
+ chained ``if/then/else`` blocks to check for lots of different variants of
+ classes. If you find yourself wanting to do this, it is much cleaner and more
+ efficient to use the ``InstVisitor`` class to dispatch over the instruction
+ type directly.
+
+``cast_or_null<>``:
+ The ``cast_or_null<>`` operator works just like the ``cast<>`` operator,
+ except that it allows for a null pointer as an argument (which it then
+ propagates). This can sometimes be useful, allowing you to combine several
+ null checks into one.
+
+``dyn_cast_or_null<>``:
+ The ``dyn_cast_or_null<>`` operator works just like the ``dyn_cast<>``
+ operator, except that it allows for a null pointer as an argument (which it
+ then propagates). This can sometimes be useful, allowing you to combine
+ several null checks into one.
+
+These five templates can be used with any classes, whether they have a v-table
+or not. If you want to add support for these templates, see the document
+:ref:`How to set up LLVM-style RTTI for your class hierarchy
+<how-to-set-up-llvm-style-rtti>`
+
+.. _string_apis:
+
+Passing strings (the ``StringRef`` and ``Twine`` classes)
+---------------------------------------------------------
+
+Although LLVM generally does not do much string manipulation, we do have several
+important APIs which take strings. Two important examples are the Value class
+-- which has names for instructions, functions, etc. -- and the ``StringMap``
+class which is used extensively in LLVM and Clang.
+
+These are generic classes, and they need to be able to accept strings which may
+have embedded null characters. Therefore, they cannot simply take a ``const
+char *``, and taking a ``const std::string&`` requires clients to perform a heap
+allocation which is usually unnecessary. Instead, many LLVM APIs use a
+``StringRef`` or a ``const Twine&`` for passing strings efficiently.
+
+.. _StringRef:
+
+The ``StringRef`` class
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``StringRef`` data type represents a reference to a constant string (a
+character array and a length) and supports the common operations available on
+``std::string``, but does not require heap allocation.
+
+It can be implicitly constructed using a C style null-terminated string, an
+``std::string``, or explicitly with a character pointer and length. For
+example, the ``StringRef`` find function is declared as:
+
+.. code-block:: c++
+
+ iterator find(StringRef Key);
+
+and clients can call it using any one of:
+
+.. code-block:: c++
+
+ Map.find("foo"); // Lookup "foo"
+ Map.find(std::string("bar")); // Lookup "bar"
+ Map.find(StringRef("\0baz", 4)); // Lookup "\0baz"
+
+Similarly, APIs which need to return a string may return a ``StringRef``
+instance, which can be used directly or converted to an ``std::string`` using
+the ``str`` member function. See ``llvm/ADT/StringRef.h`` (`doxygen
+<http://llvm.org/doxygen/classllvm_1_1StringRef_8h-source.html>`__) for more
+information.
+
+You should rarely use the ``StringRef`` class directly, because it contains
+pointers to external memory it is not generally safe to store an instance of the
+class (unless you know that the external storage will not be freed).
+``StringRef`` is small and pervasive enough in LLVM that it should always be
+passed by value.
+
+The ``Twine`` class
+^^^^^^^^^^^^^^^^^^^
+
+The ``Twine`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1Twine.html>`__)
+class is an efficient way for APIs to accept concatenated strings. For example,
+a common LLVM paradigm is to name one instruction based on the name of another
+instruction with a suffix, for example:
+
+.. code-block:: c++
+
+ New = CmpInst::Create(..., SO->getName() + ".cmp");
+
+The ``Twine`` class is effectively a lightweight `rope
+<http://en.wikipedia.org/wiki/Rope_(computer_science)>`_ which points to
+temporary (stack allocated) objects. Twines can be implicitly constructed as
+the result of the plus operator applied to strings (i.e., a C strings, an
+``std::string``, or a ``StringRef``). The twine delays the actual concatenation
+of strings until it is actually required, at which point it can be efficiently
+rendered directly into a character array. This avoids unnecessary heap
+allocation involved in constructing the temporary results of string
+concatenation. See ``llvm/ADT/Twine.h`` (`doxygen
+<http://llvm.org/doxygen/Twine_8h_source.html>`__) and :ref:`here <dss_twine>`
+for more information.
+
+As with a ``StringRef``, ``Twine`` objects point to external memory and should
+almost never be stored or mentioned directly. They are intended solely for use
+when defining a function which should be able to efficiently accept concatenated
+strings.
+
+.. _DEBUG:
+
+The ``DEBUG()`` macro and ``-debug`` option
+-------------------------------------------
+
+Often when working on your pass you will put a bunch of debugging printouts and
+other code into your pass. After you get it working, you want to remove it, but
+you may need it again in the future (to work out new bugs that you run across).
+
+Naturally, because of this, you don't want to delete the debug printouts, but
+you don't want them to always be noisy. A standard compromise is to comment
+them out, allowing you to enable them if you need them in the future.
+
+The ``llvm/Support/Debug.h`` (`doxygen
+<http://llvm.org/doxygen/Debug_8h-source.html>`__) file provides a macro named
+``DEBUG()`` that is a much nicer solution to this problem. Basically, you can
+put arbitrary code into the argument of the ``DEBUG`` macro, and it is only
+executed if '``opt``' (or any other tool) is run with the '``-debug``' command
+line argument:
+
+.. code-block:: c++
+
+ DEBUG(errs() << "I am here!\n");
+
+Then you can run your pass like this:
+
+.. code-block:: none
+
+ $ opt < a.bc > /dev/null -mypass
+ <no output>
+ $ opt < a.bc > /dev/null -mypass -debug
+ I am here!
+
+Using the ``DEBUG()`` macro instead of a home-brewed solution allows you to not
+have to create "yet another" command line option for the debug output for your
+pass. Note that ``DEBUG()`` macros are disabled for optimized builds, so they
+do not cause a performance impact at all (for the same reason, they should also
+not contain side-effects!).
+
+One additional nice thing about the ``DEBUG()`` macro is that you can enable or
+disable it directly in gdb. Just use "``set DebugFlag=0``" or "``set
+DebugFlag=1``" from the gdb if the program is running. If the program hasn't
+been started yet, you can always just run it with ``-debug``.
+
+.. _DEBUG_TYPE:
+
+Fine grained debug info with ``DEBUG_TYPE`` and the ``-debug-only`` option
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Sometimes you may find yourself in a situation where enabling ``-debug`` just
+turns on **too much** information (such as when working on the code generator).
+If you want to enable debug information with more fine-grained control, you
+define the ``DEBUG_TYPE`` macro and the ``-debug`` only option as follows:
+
+.. code-block:: c++
+
+ #undef DEBUG_TYPE
+ DEBUG(errs() << "No debug type\n");
+ #define DEBUG_TYPE "foo"
+ DEBUG(errs() << "'foo' debug type\n");
+ #undef DEBUG_TYPE
+ #define DEBUG_TYPE "bar"
+ DEBUG(errs() << "'bar' debug type\n"));
+ #undef DEBUG_TYPE
+ #define DEBUG_TYPE ""
+ DEBUG(errs() << "No debug type (2)\n");
+
+Then you can run your pass like this:
+
+.. code-block:: none
+
+ $ opt < a.bc > /dev/null -mypass
+ <no output>
+ $ opt < a.bc > /dev/null -mypass -debug
+ No debug type
+ 'foo' debug type
+ 'bar' debug type
+ No debug type (2)
+ $ opt < a.bc > /dev/null -mypass -debug-only=foo
+ 'foo' debug type
+ $ opt < a.bc > /dev/null -mypass -debug-only=bar
+ 'bar' debug type
+
+Of course, in practice, you should only set ``DEBUG_TYPE`` at the top of a file,
+to specify the debug type for the entire module (if you do this before you
+``#include "llvm/Support/Debug.h"``, you don't have to insert the ugly
+``#undef``'s). Also, you should use names more meaningful than "foo" and "bar",
+because there is no system in place to ensure that names do not conflict. If
+two different modules use the same string, they will all be turned on when the
+name is specified. This allows, for example, all debug information for
+instruction scheduling to be enabled with ``-debug-type=InstrSched``, even if
+the source lives in multiple files.
+
+The ``DEBUG_WITH_TYPE`` macro is also available for situations where you would
+like to set ``DEBUG_TYPE``, but only for one specific ``DEBUG`` statement. It
+takes an additional first parameter, which is the type to use. For example, the
+preceding example could be written as:
+
+.. code-block:: c++
+
+ DEBUG_WITH_TYPE("", errs() << "No debug type\n");
+ DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n");
+ DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n"));
+ DEBUG_WITH_TYPE("", errs() << "No debug type (2)\n");
+
+.. _Statistic:
+
+The ``Statistic`` class & ``-stats`` option
+-------------------------------------------
+
+The ``llvm/ADT/Statistic.h`` (`doxygen
+<http://llvm.org/doxygen/Statistic_8h-source.html>`__) file provides a class
+named ``Statistic`` that is used as a unified way to keep track of what the LLVM
+compiler is doing and how effective various optimizations are. It is useful to
+see what optimizations are contributing to making a particular program run
+faster.
+
+Often you may run your pass on some big program, and you're interested to see
+how many times it makes a certain transformation. Although you can do this with
+hand inspection, or some ad-hoc method, this is a real pain and not very useful
+for big programs. Using the ``Statistic`` class makes it very easy to keep
+track of this information, and the calculated information is presented in a
+uniform manner with the rest of the passes being executed.
+
+There are many examples of ``Statistic`` uses, but the basics of using it are as
+follows:
+
+#. Define your statistic like this:
+
+ .. code-block:: c++
+
+ #define DEBUG_TYPE "mypassname" // This goes before any #includes.
+ STATISTIC(NumXForms, "The # of times I did stuff");
+
+ The ``STATISTIC`` macro defines a static variable, whose name is specified by
+ the first argument. The pass name is taken from the ``DEBUG_TYPE`` macro, and
+ the description is taken from the second argument. The variable defined
+ ("NumXForms" in this case) acts like an unsigned integer.
+
+#. Whenever you make a transformation, bump the counter:
+
+ .. code-block:: c++
+
+ ++NumXForms; // I did stuff!
+
+That's all you have to do. To get '``opt``' to print out the statistics
+gathered, use the '``-stats``' option:
+
+.. code-block:: none
+
+ $ opt -stats -mypassname < program.bc > /dev/null
+ ... statistics output ...
+
+When running ``opt`` on a C file from the SPEC benchmark suite, it gives a
+report that looks like this:
+
+.. code-block:: none
+
+ 7646 bitcodewriter - Number of normal instructions
+ 725 bitcodewriter - Number of oversized instructions
+ 129996 bitcodewriter - Number of bitcode bytes written
+ 2817 raise - Number of insts DCEd or constprop'd
+ 3213 raise - Number of cast-of-self removed
+ 5046 raise - Number of expression trees converted
+ 75 raise - Number of other getelementptr's formed
+ 138 raise - Number of load/store peepholes
+ 42 deadtypeelim - Number of unused typenames removed from symtab
+ 392 funcresolve - Number of varargs functions resolved
+ 27 globaldce - Number of global variables removed
+ 2 adce - Number of basic blocks removed
+ 134 cee - Number of branches revectored
+ 49 cee - Number of setcc instruction eliminated
+ 532 gcse - Number of loads removed
+ 2919 gcse - Number of instructions removed
+ 86 indvars - Number of canonical indvars added
+ 87 indvars - Number of aux indvars removed
+ 25 instcombine - Number of dead inst eliminate
+ 434 instcombine - Number of insts combined
+ 248 licm - Number of load insts hoisted
+ 1298 licm - Number of insts hoisted to a loop pre-header
+ 3 licm - Number of insts hoisted to multiple loop preds (bad, no loop pre-header)
+ 75 mem2reg - Number of alloca's promoted
+ 1444 cfgsimplify - Number of blocks simplified
+
+Obviously, with so many optimizations, having a unified framework for this stuff
+is very nice. Making your pass fit well into the framework makes it more
+maintainable and useful.
+
+.. _ViewGraph:
+
+Viewing graphs while debugging code
+-----------------------------------
+
+Several of the important data structures in LLVM are graphs: for example CFGs
+made out of LLVM :ref:`BasicBlocks <BasicBlock>`, CFGs made out of LLVM
+:ref:`MachineBasicBlocks <MachineBasicBlock>`, and :ref:`Instruction Selection
+DAGs <SelectionDAG>`. In many cases, while debugging various parts of the
+compiler, it is nice to instantly visualize these graphs.
+
+LLVM provides several callbacks that are available in a debug build to do
+exactly that. If you call the ``Function::viewCFG()`` method, for example, the
+current LLVM tool will pop up a window containing the CFG for the function where
+each basic block is a node in the graph, and each node contains the instructions
+in the block. Similarly, there also exists ``Function::viewCFGOnly()`` (does
+not include the instructions), the ``MachineFunction::viewCFG()`` and
+``MachineFunction::viewCFGOnly()``, and the ``SelectionDAG::viewGraph()``
+methods. Within GDB, for example, you can usually use something like ``call
+DAG.viewGraph()`` to pop up a window. Alternatively, you can sprinkle calls to
+these functions in your code in places you want to debug.
+
+Getting this to work requires a small amount of configuration. On Unix systems
+with X11, install the `graphviz <http://www.graphviz.org>`_ toolkit, and make
+sure 'dot' and 'gv' are in your path. If you are running on Mac OS/X, download
+and install the Mac OS/X `Graphviz program
+<http://www.pixelglow.com/graphviz/>`_ and add
+``/Applications/Graphviz.app/Contents/MacOS/`` (or wherever you install it) to
+your path. Once in your system and path are set up, rerun the LLVM configure
+script and rebuild LLVM to enable this functionality.
+
+``SelectionDAG`` has been extended to make it easier to locate *interesting*
+nodes in large complex graphs. From gdb, if you ``call DAG.setGraphColor(node,
+"color")``, then the next ``call DAG.viewGraph()`` would highlight the node in
+the specified color (choices of colors can be found at `colors
+<http://www.graphviz.org/doc/info/colors.html>`_.) More complex node attributes
+can be provided with ``call DAG.setGraphAttrs(node, "attributes")`` (choices can
+be found at `Graph attributes <http://www.graphviz.org/doc/info/attrs.html>`_.)
+If you want to restart and clear all the current graph attributes, then you can
+``call DAG.clearGraphAttrs()``.
+
+Note that graph visualization features are compiled out of Release builds to
+reduce file size. This means that you need a Debug+Asserts or Release+Asserts
+build to use these features.
+
+.. _datastructure:
+
+Picking the Right Data Structure for a Task
+===========================================
+
+LLVM has a plethora of data structures in the ``llvm/ADT/`` directory, and we
+commonly use STL data structures. This section describes the trade-offs you
+should consider when you pick one.
+
+The first step is a choose your own adventure: do you want a sequential
+container, a set-like container, or a map-like container? The most important
+thing when choosing a container is the algorithmic properties of how you plan to
+access the container. Based on that, you should use:
+
+
+* a :ref:`map-like <ds_map>` container if you need efficient look-up of a
+ value based on another value. Map-like containers also support efficient
+ queries for containment (whether a key is in the map). Map-like containers
+ generally do not support efficient reverse mapping (values to keys). If you
+ need that, use two maps. Some map-like containers also support efficient
+ iteration through the keys in sorted order. Map-like containers are the most
+ expensive sort, only use them if you need one of these capabilities.
+
+* a :ref:`set-like <ds_set>` container if you need to put a bunch of stuff into
+ a container that automatically eliminates duplicates. Some set-like
+ containers support efficient iteration through the elements in sorted order.
+ Set-like containers are more expensive than sequential containers.
+
+* a :ref:`sequential <ds_sequential>` container provides the most efficient way
+ to add elements and keeps track of the order they are added to the collection.
+ They permit duplicates and support efficient iteration, but do not support
+ efficient look-up based on a key.
+
+* a :ref:`string <ds_string>` container is a specialized sequential container or
+ reference structure that is used for character or byte arrays.
+
+* a :ref:`bit <ds_bit>` container provides an efficient way to store and
+ perform set operations on sets of numeric id's, while automatically
+ eliminating duplicates. Bit containers require a maximum of 1 bit for each
+ identifier you want to store.
+
+Once the proper category of container is determined, you can fine tune the
+memory use, constant factors, and cache behaviors of access by intelligently
+picking a member of the category. Note that constant factors and cache behavior
+can be a big deal. If you have a vector that usually only contains a few
+elements (but could contain many), for example, it's much better to use
+:ref:`SmallVector <dss_smallvector>` than :ref:`vector <dss_vector>`. Doing so
+avoids (relatively) expensive malloc/free calls, which dwarf the cost of adding
+the elements to the container.
+
+.. _ds_sequential:
+
+Sequential Containers (std::vector, std::list, etc)
+---------------------------------------------------
+
+There are a variety of sequential containers available for you, based on your
+needs. Pick the first in this section that will do what you want.
+
+.. _dss_arrayref:
+
+llvm/ADT/ArrayRef.h
+^^^^^^^^^^^^^^^^^^^
+
+The ``llvm::ArrayRef`` class is the preferred class to use in an interface that
+accepts a sequential list of elements in memory and just reads from them. By
+taking an ``ArrayRef``, the API can be passed a fixed size array, an
+``std::vector``, an ``llvm::SmallVector`` and anything else that is contiguous
+in memory.
+
+.. _dss_fixedarrays:
+
+Fixed Size Arrays
+^^^^^^^^^^^^^^^^^
+
+Fixed size arrays are very simple and very fast. They are good if you know
+exactly how many elements you have, or you have a (low) upper bound on how many
+you have.
+
+.. _dss_heaparrays:
+
+Heap Allocated Arrays
+^^^^^^^^^^^^^^^^^^^^^
+
+Heap allocated arrays (``new[]`` + ``delete[]``) are also simple. They are good
+if the number of elements is variable, if you know how many elements you will
+need before the array is allocated, and if the array is usually large (if not,
+consider a :ref:`SmallVector <dss_smallvector>`). The cost of a heap allocated
+array is the cost of the new/delete (aka malloc/free). Also note that if you
+are allocating an array of a type with a constructor, the constructor and
+destructors will be run for every element in the array (re-sizable vectors only
+construct those elements actually used).
+
+.. _dss_tinyptrvector:
+
+llvm/ADT/TinyPtrVector.h
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+``TinyPtrVector<Type>`` is a highly specialized collection class that is
+optimized to avoid allocation in the case when a vector has zero or one
+elements. It has two major restrictions: 1) it can only hold values of pointer
+type, and 2) it cannot hold a null pointer.
+
+Since this container is highly specialized, it is rarely used.
+
+.. _dss_smallvector:
+
+llvm/ADT/SmallVector.h
+^^^^^^^^^^^^^^^^^^^^^^
+
+``SmallVector<Type, N>`` is a simple class that looks and smells just like
+``vector<Type>``: it supports efficient iteration, lays out elements in memory
+order (so you can do pointer arithmetic between elements), supports efficient
+push_back/pop_back operations, supports efficient random access to its elements,
+etc.
+
+The advantage of SmallVector is that it allocates space for some number of
+elements (N) **in the object itself**. Because of this, if the SmallVector is
+dynamically smaller than N, no malloc is performed. This can be a big win in
+cases where the malloc/free call is far more expensive than the code that
+fiddles around with the elements.
+
+This is good for vectors that are "usually small" (e.g. the number of
+predecessors/successors of a block is usually less than 8). On the other hand,
+this makes the size of the SmallVector itself large, so you don't want to
+allocate lots of them (doing so will waste a lot of space). As such,
+SmallVectors are most useful when on the stack.
+
+SmallVector also provides a nice portable and efficient replacement for
+``alloca``.
+
+.. _dss_vector:
+
+<vector>
+^^^^^^^^
+
+``std::vector`` is well loved and respected. It is useful when SmallVector
+isn't: when the size of the vector is often large (thus the small optimization
+will rarely be a benefit) or if you will be allocating many instances of the
+vector itself (which would waste space for elements that aren't in the
+container). vector is also useful when interfacing with code that expects
+vectors :).
+
+One worthwhile note about std::vector: avoid code like this:
+
+.. code-block:: c++
+
+ for ( ... ) {
+ std::vector<foo> V;
+ // make use of V.
+ }
+
+Instead, write this as:
+
+.. code-block:: c++
+
+ std::vector<foo> V;
+ for ( ... ) {
+ // make use of V.
+ V.clear();
+ }
+
+Doing so will save (at least) one heap allocation and free per iteration of the
+loop.
+
+.. _dss_deque:
+
+<deque>
+^^^^^^^
+
+``std::deque`` is, in some senses, a generalized version of ``std::vector``.
+Like ``std::vector``, it provides constant time random access and other similar
+properties, but it also provides efficient access to the front of the list. It
+does not guarantee continuity of elements within memory.
+
+In exchange for this extra flexibility, ``std::deque`` has significantly higher
+constant factor costs than ``std::vector``. If possible, use ``std::vector`` or
+something cheaper.
+
+.. _dss_list:
+
+<list>
+^^^^^^
+
+``std::list`` is an extremely inefficient class that is rarely useful. It
+performs a heap allocation for every element inserted into it, thus having an
+extremely high constant factor, particularly for small data types.
+``std::list`` also only supports bidirectional iteration, not random access
+iteration.
+
+In exchange for this high cost, std::list supports efficient access to both ends
+of the list (like ``std::deque``, but unlike ``std::vector`` or
+``SmallVector``). In addition, the iterator invalidation characteristics of
+std::list are stronger than that of a vector class: inserting or removing an
+element into the list does not invalidate iterator or pointers to other elements
+in the list.
+
+.. _dss_ilist:
+
+llvm/ADT/ilist.h
+^^^^^^^^^^^^^^^^
+
+``ilist<T>`` implements an 'intrusive' doubly-linked list. It is intrusive,
+because it requires the element to store and provide access to the prev/next
+pointers for the list.
+
+``ilist`` has the same drawbacks as ``std::list``, and additionally requires an
+``ilist_traits`` implementation for the element type, but it provides some novel
+characteristics. In particular, it can efficiently store polymorphic objects,
+the traits class is informed when an element is inserted or removed from the
+list, and ``ilist``\ s are guaranteed to support a constant-time splice
+operation.
+
+These properties are exactly what we want for things like ``Instruction``\ s and
+basic blocks, which is why these are implemented with ``ilist``\ s.
+
+Related classes of interest are explained in the following subsections:
+
+* :ref:`ilist_traits <dss_ilist_traits>`
+
+* :ref:`iplist <dss_iplist>`
+
+* :ref:`llvm/ADT/ilist_node.h <dss_ilist_node>`
+
+* :ref:`Sentinels <dss_ilist_sentinel>`
+
+.. _dss_packedvector:
+
+llvm/ADT/PackedVector.h
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Useful for storing a vector of values using only a few number of bits for each
+value. Apart from the standard operations of a vector-like container, it can
+also perform an 'or' set operation.
+
+For example:
+
+.. code-block:: c++
+
+ enum State {
+ None = 0x0,
+ FirstCondition = 0x1,
+ SecondCondition = 0x2,
+ Both = 0x3
+ };
+
+ State get() {
+ PackedVector<State, 2> Vec1;
+ Vec1.push_back(FirstCondition);
+
+ PackedVector<State, 2> Vec2;
+ Vec2.push_back(SecondCondition);
+
+ Vec1 |= Vec2;
+ return Vec1[0]; // returns 'Both'.
+ }
+
+.. _dss_ilist_traits:
+
+ilist_traits
+^^^^^^^^^^^^
+
+``ilist_traits<T>`` is ``ilist<T>``'s customization mechanism. ``iplist<T>``
+(and consequently ``ilist<T>``) publicly derive from this traits class.
+
+.. _dss_iplist:
+
+iplist
+^^^^^^
+
+``iplist<T>`` is ``ilist<T>``'s base and as such supports a slightly narrower
+interface. Notably, inserters from ``T&`` are absent.
+
+``ilist_traits<T>`` is a public base of this class and can be used for a wide
+variety of customizations.
+
+.. _dss_ilist_node:
+
+llvm/ADT/ilist_node.h
+^^^^^^^^^^^^^^^^^^^^^
+
+``ilist_node<T>`` implements a the forward and backward links that are expected
+by the ``ilist<T>`` (and analogous containers) in the default manner.
+
+``ilist_node<T>``\ s are meant to be embedded in the node type ``T``, usually
+``T`` publicly derives from ``ilist_node<T>``.
+
+.. _dss_ilist_sentinel:
+
+Sentinels
+^^^^^^^^^
+
+``ilist``\ s have another specialty that must be considered. To be a good
+citizen in the C++ ecosystem, it needs to support the standard container
+operations, such as ``begin`` and ``end`` iterators, etc. Also, the
+``operator--`` must work correctly on the ``end`` iterator in the case of
+non-empty ``ilist``\ s.
+
+The only sensible solution to this problem is to allocate a so-called *sentinel*
+along with the intrusive list, which serves as the ``end`` iterator, providing
+the back-link to the last element. However conforming to the C++ convention it
+is illegal to ``operator++`` beyond the sentinel and it also must not be
+dereferenced.
+
+These constraints allow for some implementation freedom to the ``ilist`` how to
+allocate and store the sentinel. The corresponding policy is dictated by
+``ilist_traits<T>``. By default a ``T`` gets heap-allocated whenever the need
+for a sentinel arises.
+
+While the default policy is sufficient in most cases, it may break down when
+``T`` does not provide a default constructor. Also, in the case of many
+instances of ``ilist``\ s, the memory overhead of the associated sentinels is
+wasted. To alleviate the situation with numerous and voluminous
+``T``-sentinels, sometimes a trick is employed, leading to *ghostly sentinels*.
+
+Ghostly sentinels are obtained by specially-crafted ``ilist_traits<T>`` which
+superpose the sentinel with the ``ilist`` instance in memory. Pointer
+arithmetic is used to obtain the sentinel, which is relative to the ``ilist``'s
+``this`` pointer. The ``ilist`` is augmented by an extra pointer, which serves
+as the back-link of the sentinel. This is the only field in the ghostly
+sentinel which can be legally accessed.
+
+.. _dss_other:
+
+Other Sequential Container options
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Other STL containers are available, such as ``std::string``.
+
+There are also various STL adapter classes such as ``std::queue``,
+``std::priority_queue``, ``std::stack``, etc. These provide simplified access
+to an underlying container but don't affect the cost of the container itself.
+
+.. _ds_string:
+
+String-like containers
+----------------------
+
+There are a variety of ways to pass around and use strings in C and C++, and
+LLVM adds a few new options to choose from. Pick the first option on this list
+that will do what you need, they are ordered according to their relative cost.
+
+Note that is is generally preferred to *not* pass strings around as ``const
+char*``'s. These have a number of problems, including the fact that they
+cannot represent embedded nul ("\0") characters, and do not have a length
+available efficiently. The general replacement for '``const char*``' is
+StringRef.
+
+For more information on choosing string containers for APIs, please see
+:ref:`Passing Strings <string_apis>`.
+
+.. _dss_stringref:
+
+llvm/ADT/StringRef.h
+^^^^^^^^^^^^^^^^^^^^
+
+The StringRef class is a simple value class that contains a pointer to a
+character and a length, and is quite related to the :ref:`ArrayRef
+<dss_arrayref>` class (but specialized for arrays of characters). Because
+StringRef carries a length with it, it safely handles strings with embedded nul
+characters in it, getting the length does not require a strlen call, and it even
+has very convenient APIs for slicing and dicing the character range that it
+represents.
+
+StringRef is ideal for passing simple strings around that are known to be live,
+either because they are C string literals, std::string, a C array, or a
+SmallVector. Each of these cases has an efficient implicit conversion to
+StringRef, which doesn't result in a dynamic strlen being executed.
+
+StringRef has a few major limitations which make more powerful string containers
+useful:
+
+#. You cannot directly convert a StringRef to a 'const char*' because there is
+ no way to add a trailing nul (unlike the .c_str() method on various stronger
+ classes).
+
+#. StringRef doesn't own or keep alive the underlying string bytes.
+ As such it can easily lead to dangling pointers, and is not suitable for
+ embedding in datastructures in most cases (instead, use an std::string or
+ something like that).
+
+#. For the same reason, StringRef cannot be used as the return value of a
+ method if the method "computes" the result string. Instead, use std::string.
+
+#. StringRef's do not allow you to mutate the pointed-to string bytes and it
+ doesn't allow you to insert or remove bytes from the range. For editing
+ operations like this, it interoperates with the :ref:`Twine <dss_twine>`
+ class.
+
+Because of its strengths and limitations, it is very common for a function to
+take a StringRef and for a method on an object to return a StringRef that points
+into some string that it owns.
+
+.. _dss_twine:
+
+llvm/ADT/Twine.h
+^^^^^^^^^^^^^^^^
+
+The Twine class is used as an intermediary datatype for APIs that want to take a
+string that can be constructed inline with a series of concatenations. Twine
+works by forming recursive instances of the Twine datatype (a simple value
+object) on the stack as temporary objects, linking them together