diff options
Diffstat (limited to 'docs/ProgrammersManual.rst')
-rw-r--r-- | docs/ProgrammersManual.rst | 3165 |
1 files changed, 3165 insertions, 0 deletions
diff --git a/docs/ProgrammersManual.rst b/docs/ProgrammersManual.rst new file mode 100644 index 0000000000..f626c60ee6 --- /dev/null +++ b/docs/ProgrammersManual.rst @@ -0,0 +1,3165 @@ +======================== +LLVM Programmer's Manual +======================== + +.. contents:: + :local: + +.. warning:: + This is a work in progress. + +.. sectionauthor:: Chris Lattner <sabre@nondot.org>, + Dinakar Dhurjati <dhurjati@cs.uiuc.edu>, + Gabor Greif <ggreif@gmail.com>, + Joel Stanley <jstanley@cs.uiuc.edu>, + Reid Spencer <rspencer@x10sys.com> and + Owen Anderson <owen@apple.com> + +.. _introduction: + +Introduction +============ + +This document is meant to highlight some of the important classes and interfaces +available in the LLVM source-base. This manual is not intended to explain what +LLVM is, how it works, and what LLVM code looks like. It assumes that you know +the basics of LLVM and are interested in writing transformations or otherwise +analyzing or manipulating the code. + +This document should get you oriented so that you can find your way in the +continuously growing source code that makes up the LLVM infrastructure. Note +that this manual is not intended to serve as a replacement for reading the +source code, so if you think there should be a method in one of these classes to +do something, but it's not listed, check the source. Links to the `doxygen +<http://llvm.org/doxygen/>`__ sources are provided to make this as easy as +possible. + +The first section of this document describes general information that is useful +to know when working in the LLVM infrastructure, and the second describes the +Core LLVM classes. In the future this manual will be extended with information +describing how to use extension libraries, such as dominator information, CFG +traversal routines, and useful utilities like the ``InstVisitor`` (`doxygen +<http://llvm.org/doxygen/InstVisitor_8h-source.html>`__) template. + +.. _general: + +General Information +=================== + +This section contains general information that is useful if you are working in +the LLVM source-base, but that isn't specific to any particular API. + +.. _stl: + +The C++ Standard Template Library +--------------------------------- + +LLVM makes heavy use of the C++ Standard Template Library (STL), perhaps much +more than you are used to, or have seen before. Because of this, you might want +to do a little background reading in the techniques used and capabilities of the +library. There are many good pages that discuss the STL, and several books on +the subject that you can get, so it will not be discussed in this document. + +Here are some useful links: + +#. `Dinkumware C++ Library reference + <http://www.dinkumware.com/manuals/#Standard C++ Library>`_ - an excellent + reference for the STL and other parts of the standard C++ library. + +#. `C++ In a Nutshell <http://www.tempest-sw.com/cpp/>`_ - This is an O'Reilly + book in the making. It has a decent Standard Library Reference that rivals + Dinkumware's, and is unfortunately no longer free since the book has been + published. + +#. `C++ Frequently Asked Questions <http://www.parashift.com/c++-faq-lite/>`_. + +#. `SGI's STL Programmer's Guide <http://www.sgi.com/tech/stl/>`_ - Contains a + useful `Introduction to the STL + <http://www.sgi.com/tech/stl/stl_introduction.html>`_. + +#. `Bjarne Stroustrup's C++ Page + <http://www.research.att.com/%7Ebs/C++.html>`_. + +#. `Bruce Eckel's Thinking in C++, 2nd ed. Volume 2 Revision 4.0 + (even better, get the book) <http://64.78.49.204/>`_. + +You are also encouraged to take a look at the :ref:`LLVM Coding Standards +<coding_standards>` guide which focuses on how to write maintainable code more +than where to put your curly braces. + +.. _resources: + +Other useful references +----------------------- + +#. `Using static and shared libraries across platforms + <http://www.fortran-2000.com/ArnaudRecipes/sharedlib.html>`_ + +.. _apis: + +Important and useful LLVM APIs +============================== + +Here we highlight some LLVM APIs that are generally useful and good to know +about when writing transformations. + +.. _isa: + +The ``isa<>``, ``cast<>`` and ``dyn_cast<>`` templates +------------------------------------------------------ + +The LLVM source-base makes extensive use of a custom form of RTTI. These +templates have many similarities to the C++ ``dynamic_cast<>`` operator, but +they don't have some drawbacks (primarily stemming from the fact that +``dynamic_cast<>`` only works on classes that have a v-table). Because they are +used so often, you must know what they do and how they work. All of these +templates are defined in the ``llvm/Support/Casting.h`` (`doxygen +<http://llvm.org/doxygen/Casting_8h-source.html>`__) file (note that you very +rarely have to include this file directly). + +``isa<>``: + The ``isa<>`` operator works exactly like the Java "``instanceof``" operator. + It returns true or false depending on whether a reference or pointer points to + an instance of the specified class. This can be very useful for constraint + checking of various sorts (example below). + +``cast<>``: + The ``cast<>`` operator is a "checked cast" operation. It converts a pointer + or reference from a base class to a derived class, causing an assertion + failure if it is not really an instance of the right type. This should be + used in cases where you have some information that makes you believe that + something is of the right type. An example of the ``isa<>`` and ``cast<>`` + template is: + + .. code-block:: c++ + + static bool isLoopInvariant(const Value *V, const Loop *L) { + if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V)) + return true; + + // Otherwise, it must be an instruction... + return !L->contains(cast<Instruction>(V)->getParent()); + } + + Note that you should **not** use an ``isa<>`` test followed by a ``cast<>``, + for that use the ``dyn_cast<>`` operator. + +``dyn_cast<>``: + The ``dyn_cast<>`` operator is a "checking cast" operation. It checks to see + if the operand is of the specified type, and if so, returns a pointer to it + (this operator does not work with references). If the operand is not of the + correct type, a null pointer is returned. Thus, this works very much like + the ``dynamic_cast<>`` operator in C++, and should be used in the same + circumstances. Typically, the ``dyn_cast<>`` operator is used in an ``if`` + statement or some other flow control statement like this: + + .. code-block:: c++ + + if (AllocationInst *AI = dyn_cast<AllocationInst>(Val)) { + // ... + } + + This form of the ``if`` statement effectively combines together a call to + ``isa<>`` and a call to ``cast<>`` into one statement, which is very + convenient. + + Note that the ``dyn_cast<>`` operator, like C++'s ``dynamic_cast<>`` or Java's + ``instanceof`` operator, can be abused. In particular, you should not use big + chained ``if/then/else`` blocks to check for lots of different variants of + classes. If you find yourself wanting to do this, it is much cleaner and more + efficient to use the ``InstVisitor`` class to dispatch over the instruction + type directly. + +``cast_or_null<>``: + The ``cast_or_null<>`` operator works just like the ``cast<>`` operator, + except that it allows for a null pointer as an argument (which it then + propagates). This can sometimes be useful, allowing you to combine several + null checks into one. + +``dyn_cast_or_null<>``: + The ``dyn_cast_or_null<>`` operator works just like the ``dyn_cast<>`` + operator, except that it allows for a null pointer as an argument (which it + then propagates). This can sometimes be useful, allowing you to combine + several null checks into one. + +These five templates can be used with any classes, whether they have a v-table +or not. If you want to add support for these templates, see the document +:ref:`How to set up LLVM-style RTTI for your class hierarchy +<how-to-set-up-llvm-style-rtti>` + +.. _string_apis: + +Passing strings (the ``StringRef`` and ``Twine`` classes) +--------------------------------------------------------- + +Although LLVM generally does not do much string manipulation, we do have several +important APIs which take strings. Two important examples are the Value class +-- which has names for instructions, functions, etc. -- and the ``StringMap`` +class which is used extensively in LLVM and Clang. + +These are generic classes, and they need to be able to accept strings which may +have embedded null characters. Therefore, they cannot simply take a ``const +char *``, and taking a ``const std::string&`` requires clients to perform a heap +allocation which is usually unnecessary. Instead, many LLVM APIs use a +``StringRef`` or a ``const Twine&`` for passing strings efficiently. + +.. _StringRef: + +The ``StringRef`` class +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``StringRef`` data type represents a reference to a constant string (a +character array and a length) and supports the common operations available on +``std::string``, but does not require heap allocation. + +It can be implicitly constructed using a C style null-terminated string, an +``std::string``, or explicitly with a character pointer and length. For +example, the ``StringRef`` find function is declared as: + +.. code-block:: c++ + + iterator find(StringRef Key); + +and clients can call it using any one of: + +.. code-block:: c++ + + Map.find("foo"); // Lookup "foo" + Map.find(std::string("bar")); // Lookup "bar" + Map.find(StringRef("\0baz", 4)); // Lookup "\0baz" + +Similarly, APIs which need to return a string may return a ``StringRef`` +instance, which can be used directly or converted to an ``std::string`` using +the ``str`` member function. See ``llvm/ADT/StringRef.h`` (`doxygen +<http://llvm.org/doxygen/classllvm_1_1StringRef_8h-source.html>`__) for more +information. + +You should rarely use the ``StringRef`` class directly, because it contains +pointers to external memory it is not generally safe to store an instance of the +class (unless you know that the external storage will not be freed). +``StringRef`` is small and pervasive enough in LLVM that it should always be +passed by value. + +The ``Twine`` class +^^^^^^^^^^^^^^^^^^^ + +The ``Twine`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1Twine.html>`__) +class is an efficient way for APIs to accept concatenated strings. For example, +a common LLVM paradigm is to name one instruction based on the name of another +instruction with a suffix, for example: + +.. code-block:: c++ + + New = CmpInst::Create(..., SO->getName() + ".cmp"); + +The ``Twine`` class is effectively a lightweight `rope +<http://en.wikipedia.org/wiki/Rope_(computer_science)>`_ which points to +temporary (stack allocated) objects. Twines can be implicitly constructed as +the result of the plus operator applied to strings (i.e., a C strings, an +``std::string``, or a ``StringRef``). The twine delays the actual concatenation +of strings until it is actually required, at which point it can be efficiently +rendered directly into a character array. This avoids unnecessary heap +allocation involved in constructing the temporary results of string +concatenation. See ``llvm/ADT/Twine.h`` (`doxygen +<http://llvm.org/doxygen/Twine_8h_source.html>`__) and :ref:`here <dss_twine>` +for more information. + +As with a ``StringRef``, ``Twine`` objects point to external memory and should +almost never be stored or mentioned directly. They are intended solely for use +when defining a function which should be able to efficiently accept concatenated +strings. + +.. _DEBUG: + +The ``DEBUG()`` macro and ``-debug`` option +------------------------------------------- + +Often when working on your pass you will put a bunch of debugging printouts and +other code into your pass. After you get it working, you want to remove it, but +you may need it again in the future (to work out new bugs that you run across). + +Naturally, because of this, you don't want to delete the debug printouts, but +you don't want them to always be noisy. A standard compromise is to comment +them out, allowing you to enable them if you need them in the future. + +The ``llvm/Support/Debug.h`` (`doxygen +<http://llvm.org/doxygen/Debug_8h-source.html>`__) file provides a macro named +``DEBUG()`` that is a much nicer solution to this problem. Basically, you can +put arbitrary code into the argument of the ``DEBUG`` macro, and it is only +executed if '``opt``' (or any other tool) is run with the '``-debug``' command +line argument: + +.. code-block:: c++ + + DEBUG(errs() << "I am here!\n"); + +Then you can run your pass like this: + +.. code-block:: none + + $ opt < a.bc > /dev/null -mypass + <no output> + $ opt < a.bc > /dev/null -mypass -debug + I am here! + +Using the ``DEBUG()`` macro instead of a home-brewed solution allows you to not +have to create "yet another" command line option for the debug output for your +pass. Note that ``DEBUG()`` macros are disabled for optimized builds, so they +do not cause a performance impact at all (for the same reason, they should also +not contain side-effects!). + +One additional nice thing about the ``DEBUG()`` macro is that you can enable or +disable it directly in gdb. Just use "``set DebugFlag=0``" or "``set +DebugFlag=1``" from the gdb if the program is running. If the program hasn't +been started yet, you can always just run it with ``-debug``. + +.. _DEBUG_TYPE: + +Fine grained debug info with ``DEBUG_TYPE`` and the ``-debug-only`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Sometimes you may find yourself in a situation where enabling ``-debug`` just +turns on **too much** information (such as when working on the code generator). +If you want to enable debug information with more fine-grained control, you +define the ``DEBUG_TYPE`` macro and the ``-debug`` only option as follows: + +.. code-block:: c++ + + #undef DEBUG_TYPE + DEBUG(errs() << "No debug type\n"); + #define DEBUG_TYPE "foo" + DEBUG(errs() << "'foo' debug type\n"); + #undef DEBUG_TYPE + #define DEBUG_TYPE "bar" + DEBUG(errs() << "'bar' debug type\n")); + #undef DEBUG_TYPE + #define DEBUG_TYPE "" + DEBUG(errs() << "No debug type (2)\n"); + +Then you can run your pass like this: + +.. code-block:: none + + $ opt < a.bc > /dev/null -mypass + <no output> + $ opt < a.bc > /dev/null -mypass -debug + No debug type + 'foo' debug type + 'bar' debug type + No debug type (2) + $ opt < a.bc > /dev/null -mypass -debug-only=foo + 'foo' debug type + $ opt < a.bc > /dev/null -mypass -debug-only=bar + 'bar' debug type + +Of course, in practice, you should only set ``DEBUG_TYPE`` at the top of a file, +to specify the debug type for the entire module (if you do this before you +``#include "llvm/Support/Debug.h"``, you don't have to insert the ugly +``#undef``'s). Also, you should use names more meaningful than "foo" and "bar", +because there is no system in place to ensure that names do not conflict. If +two different modules use the same string, they will all be turned on when the +name is specified. This allows, for example, all debug information for +instruction scheduling to be enabled with ``-debug-type=InstrSched``, even if +the source lives in multiple files. + +The ``DEBUG_WITH_TYPE`` macro is also available for situations where you would +like to set ``DEBUG_TYPE``, but only for one specific ``DEBUG`` statement. It +takes an additional first parameter, which is the type to use. For example, the +preceding example could be written as: + +.. code-block:: c++ + + DEBUG_WITH_TYPE("", errs() << "No debug type\n"); + DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n"); + DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n")); + DEBUG_WITH_TYPE("", errs() << "No debug type (2)\n"); + +.. _Statistic: + +The ``Statistic`` class & ``-stats`` option +------------------------------------------- + +The ``llvm/ADT/Statistic.h`` (`doxygen +<http://llvm.org/doxygen/Statistic_8h-source.html>`__) file provides a class +named ``Statistic`` that is used as a unified way to keep track of what the LLVM +compiler is doing and how effective various optimizations are. It is useful to +see what optimizations are contributing to making a particular program run +faster. + +Often you may run your pass on some big program, and you're interested to see +how many times it makes a certain transformation. Although you can do this with +hand inspection, or some ad-hoc method, this is a real pain and not very useful +for big programs. Using the ``Statistic`` class makes it very easy to keep +track of this information, and the calculated information is presented in a +uniform manner with the rest of the passes being executed. + +There are many examples of ``Statistic`` uses, but the basics of using it are as +follows: + +#. Define your statistic like this: + + .. code-block:: c++ + + #define DEBUG_TYPE "mypassname" // This goes before any #includes. + STATISTIC(NumXForms, "The # of times I did stuff"); + + The ``STATISTIC`` macro defines a static variable, whose name is specified by + the first argument. The pass name is taken from the ``DEBUG_TYPE`` macro, and + the description is taken from the second argument. The variable defined + ("NumXForms" in this case) acts like an unsigned integer. + +#. Whenever you make a transformation, bump the counter: + + .. code-block:: c++ + + ++NumXForms; // I did stuff! + +That's all you have to do. To get '``opt``' to print out the statistics +gathered, use the '``-stats``' option: + +.. code-block:: none + + $ opt -stats -mypassname < program.bc > /dev/null + ... statistics output ... + +When running ``opt`` on a C file from the SPEC benchmark suite, it gives a +report that looks like this: + +.. code-block:: none + + 7646 bitcodewriter - Number of normal instructions + 725 bitcodewriter - Number of oversized instructions + 129996 bitcodewriter - Number of bitcode bytes written + 2817 raise - Number of insts DCEd or constprop'd + 3213 raise - Number of cast-of-self removed + 5046 raise - Number of expression trees converted + 75 raise - Number of other getelementptr's formed + 138 raise - Number of load/store peepholes + 42 deadtypeelim - Number of unused typenames removed from symtab + 392 funcresolve - Number of varargs functions resolved + 27 globaldce - Number of global variables removed + 2 adce - Number of basic blocks removed + 134 cee - Number of branches revectored + 49 cee - Number of setcc instruction eliminated + 532 gcse - Number of loads removed + 2919 gcse - Number of instructions removed + 86 indvars - Number of canonical indvars added + 87 indvars - Number of aux indvars removed + 25 instcombine - Number of dead inst eliminate + 434 instcombine - Number of insts combined + 248 licm - Number of load insts hoisted + 1298 licm - Number of insts hoisted to a loop pre-header + 3 licm - Number of insts hoisted to multiple loop preds (bad, no loop pre-header) + 75 mem2reg - Number of alloca's promoted + 1444 cfgsimplify - Number of blocks simplified + +Obviously, with so many optimizations, having a unified framework for this stuff +is very nice. Making your pass fit well into the framework makes it more +maintainable and useful. + +.. _ViewGraph: + +Viewing graphs while debugging code +----------------------------------- + +Several of the important data structures in LLVM are graphs: for example CFGs +made out of LLVM :ref:`BasicBlocks <BasicBlock>`, CFGs made out of LLVM +:ref:`MachineBasicBlocks <MachineBasicBlock>`, and :ref:`Instruction Selection +DAGs <SelectionDAG>`. In many cases, while debugging various parts of the +compiler, it is nice to instantly visualize these graphs. + +LLVM provides several callbacks that are available in a debug build to do +exactly that. If you call the ``Function::viewCFG()`` method, for example, the +current LLVM tool will pop up a window containing the CFG for the function where +each basic block is a node in the graph, and each node contains the instructions +in the block. Similarly, there also exists ``Function::viewCFGOnly()`` (does +not include the instructions), the ``MachineFunction::viewCFG()`` and +``MachineFunction::viewCFGOnly()``, and the ``SelectionDAG::viewGraph()`` +methods. Within GDB, for example, you can usually use something like ``call +DAG.viewGraph()`` to pop up a window. Alternatively, you can sprinkle calls to +these functions in your code in places you want to debug. + +Getting this to work requires a small amount of configuration. On Unix systems +with X11, install the `graphviz <http://www.graphviz.org>`_ toolkit, and make +sure 'dot' and 'gv' are in your path. If you are running on Mac OS/X, download +and install the Mac OS/X `Graphviz program +<http://www.pixelglow.com/graphviz/>`_ and add +``/Applications/Graphviz.app/Contents/MacOS/`` (or wherever you install it) to +your path. Once in your system and path are set up, rerun the LLVM configure +script and rebuild LLVM to enable this functionality. + +``SelectionDAG`` has been extended to make it easier to locate *interesting* +nodes in large complex graphs. From gdb, if you ``call DAG.setGraphColor(node, +"color")``, then the next ``call DAG.viewGraph()`` would highlight the node in +the specified color (choices of colors can be found at `colors +<http://www.graphviz.org/doc/info/colors.html>`_.) More complex node attributes +can be provided with ``call DAG.setGraphAttrs(node, "attributes")`` (choices can +be found at `Graph attributes <http://www.graphviz.org/doc/info/attrs.html>`_.) +If you want to restart and clear all the current graph attributes, then you can +``call DAG.clearGraphAttrs()``. + +Note that graph visualization features are compiled out of Release builds to +reduce file size. This means that you need a Debug+Asserts or Release+Asserts +build to use these features. + +.. _datastructure: + +Picking the Right Data Structure for a Task +=========================================== + +LLVM has a plethora of data structures in the ``llvm/ADT/`` directory, and we +commonly use STL data structures. This section describes the trade-offs you +should consider when you pick one. + +The first step is a choose your own adventure: do you want a sequential +container, a set-like container, or a map-like container? The most important +thing when choosing a container is the algorithmic properties of how you plan to +access the container. Based on that, you should use: + + +* a :ref:`map-like <ds_map>` container if you need efficient look-up of a + value based on another value. Map-like containers also support efficient + queries for containment (whether a key is in the map). Map-like containers + generally do not support efficient reverse mapping (values to keys). If you + need that, use two maps. Some map-like containers also support efficient + iteration through the keys in sorted order. Map-like containers are the most + expensive sort, only use them if you need one of these capabilities. + +* a :ref:`set-like <ds_set>` container if you need to put a bunch of stuff into + a container that automatically eliminates duplicates. Some set-like + containers support efficient iteration through the elements in sorted order. + Set-like containers are more expensive than sequential containers. + +* a :ref:`sequential <ds_sequential>` container provides the most efficient way + to add elements and keeps track of the order they are added to the collection. + They permit duplicates and support efficient iteration, but do not support + efficient look-up based on a key. + +* a :ref:`string <ds_string>` container is a specialized sequential container or + reference structure that is used for character or byte arrays. + +* a :ref:`bit <ds_bit>` container provides an efficient way to store and + perform set operations on sets of numeric id's, while automatically + eliminating duplicates. Bit containers require a maximum of 1 bit for each + identifier you want to store. + +Once the proper category of container is determined, you can fine tune the +memory use, constant factors, and cache behaviors of access by intelligently +picking a member of the category. Note that constant factors and cache behavior +can be a big deal. If you have a vector that usually only contains a few +elements (but could contain many), for example, it's much better to use +:ref:`SmallVector <dss_smallvector>` than :ref:`vector <dss_vector>`. Doing so +avoids (relatively) expensive malloc/free calls, which dwarf the cost of adding +the elements to the container. + +.. _ds_sequential: + +Sequential Containers (std::vector, std::list, etc) +--------------------------------------------------- + +There are a variety of sequential containers available for you, based on your +needs. Pick the first in this section that will do what you want. + +.. _dss_arrayref: + +llvm/ADT/ArrayRef.h +^^^^^^^^^^^^^^^^^^^ + +The ``llvm::ArrayRef`` class is the preferred class to use in an interface that +accepts a sequential list of elements in memory and just reads from them. By +taking an ``ArrayRef``, the API can be passed a fixed size array, an +``std::vector``, an ``llvm::SmallVector`` and anything else that is contiguous +in memory. + +.. _dss_fixedarrays: + +Fixed Size Arrays +^^^^^^^^^^^^^^^^^ + +Fixed size arrays are very simple and very fast. They are good if you know +exactly how many elements you have, or you have a (low) upper bound on how many +you have. + +.. _dss_heaparrays: + +Heap Allocated Arrays +^^^^^^^^^^^^^^^^^^^^^ + +Heap allocated arrays (``new[]`` + ``delete[]``) are also simple. They are good +if the number of elements is variable, if you know how many elements you will +need before the array is allocated, and if the array is usually large (if not, +consider a :ref:`SmallVector <dss_smallvector>`). The cost of a heap allocated +array is the cost of the new/delete (aka malloc/free). Also note that if you +are allocating an array of a type with a constructor, the constructor and +destructors will be run for every element in the array (re-sizable vectors only +construct those elements actually used). + +.. _dss_tinyptrvector: + +llvm/ADT/TinyPtrVector.h +^^^^^^^^^^^^^^^^^^^^^^^^ + +``TinyPtrVector<Type>`` is a highly specialized collection class that is +optimized to avoid allocation in the case when a vector has zero or one +elements. It has two major restrictions: 1) it can only hold values of pointer +type, and 2) it cannot hold a null pointer. + +Since this container is highly specialized, it is rarely used. + +.. _dss_smallvector: + +llvm/ADT/SmallVector.h +^^^^^^^^^^^^^^^^^^^^^^ + +``SmallVector<Type, N>`` is a simple class that looks and smells just like +``vector<Type>``: it supports efficient iteration, lays out elements in memory +order (so you can do pointer arithmetic between elements), supports efficient +push_back/pop_back operations, supports efficient random access to its elements, +etc. + +The advantage of SmallVector is that it allocates space for some number of +elements (N) **in the object itself**. Because of this, if the SmallVector is +dynamically smaller than N, no malloc is performed. This can be a big win in +cases where the malloc/free call is far more expensive than the code that +fiddles around with the elements. + +This is good for vectors that are "usually small" (e.g. the number of +predecessors/successors of a block is usually less than 8). On the other hand, +this makes the size of the SmallVector itself large, so you don't want to +allocate lots of them (doing so will waste a lot of space). As such, +SmallVectors are most useful when on the stack. + +SmallVector also provides a nice portable and efficient replacement for +``alloca``. + +.. _dss_vector: + +<vector> +^^^^^^^^ + +``std::vector`` is well loved and respected. It is useful when SmallVector +isn't: when the size of the vector is often large (thus the small optimization +will rarely be a benefit) or if you will be allocating many instances of the +vector itself (which would waste space for elements that aren't in the +container). vector is also useful when interfacing with code that expects +vectors :). + +One worthwhile note about std::vector: avoid code like this: + +.. code-block:: c++ + + for ( ... ) { + std::vector<foo> V; + // make use of V. + } + +Instead, write this as: + +.. code-block:: c++ + + std::vector<foo> V; + for ( ... ) { + // make use of V. + V.clear(); + } + +Doing so will save (at least) one heap allocation and free per iteration of the +loop. + +.. _dss_deque: + +<deque> +^^^^^^^ + +``std::deque`` is, in some senses, a generalized version of ``std::vector``. +Like ``std::vector``, it provides constant time random access and other similar +properties, but it also provides efficient access to the front of the list. It +does not guarantee continuity of elements within memory. + +In exchange for this extra flexibility, ``std::deque`` has significantly higher +constant factor costs than ``std::vector``. If possible, use ``std::vector`` or +something cheaper. + +.. _dss_list: + +<list> +^^^^^^ + +``std::list`` is an extremely inefficient class that is rarely useful. It +performs a heap allocation for every element inserted into it, thus having an +extremely high constant factor, particularly for small data types. +``std::list`` also only supports bidirectional iteration, not random access +iteration. + +In exchange for this high cost, std::list supports efficient access to both ends +of the list (like ``std::deque``, but unlike ``std::vector`` or +``SmallVector``). In addition, the iterator invalidation characteristics of +std::list are stronger than that of a vector class: inserting or removing an +element into the list does not invalidate iterator or pointers to other elements +in the list. + +.. _dss_ilist: + +llvm/ADT/ilist.h +^^^^^^^^^^^^^^^^ + +``ilist<T>`` implements an 'intrusive' doubly-linked list. It is intrusive, +because it requires the element to store and provide access to the prev/next +pointers for the list. + +``ilist`` has the same drawbacks as ``std::list``, and additionally requires an +``ilist_traits`` implementation for the element type, but it provides some novel +characteristics. In particular, it can efficiently store polymorphic objects, +the traits class is informed when an element is inserted or removed from the +list, and ``ilist``\ s are guaranteed to support a constant-time splice +operation. + +These properties are exactly what we want for things like ``Instruction``\ s and +basic blocks, which is why these are implemented with ``ilist``\ s. + +Related classes of interest are explained in the following subsections: + +* :ref:`ilist_traits <dss_ilist_traits>` + +* :ref:`iplist <dss_iplist>` + +* :ref:`llvm/ADT/ilist_node.h <dss_ilist_node>` + +* :ref:`Sentinels <dss_ilist_sentinel>` + +.. _dss_packedvector: + +llvm/ADT/PackedVector.h +^^^^^^^^^^^^^^^^^^^^^^^ + +Useful for storing a vector of values using only a few number of bits for each +value. Apart from the standard operations of a vector-like container, it can +also perform an 'or' set operation. + +For example: + +.. code-block:: c++ + + enum State { + None = 0x0, + FirstCondition = 0x1, + SecondCondition = 0x2, + Both = 0x3 + }; + + State get() { + PackedVector<State, 2> Vec1; + Vec1.push_back(FirstCondition); + + PackedVector<State, 2> Vec2; + Vec2.push_back(SecondCondition); + + Vec1 |= Vec2; + return Vec1[0]; // returns 'Both'. + } + +.. _dss_ilist_traits: + +ilist_traits +^^^^^^^^^^^^ + +``ilist_traits<T>`` is ``ilist<T>``'s customization mechanism. ``iplist<T>`` +(and consequently ``ilist<T>``) publicly derive from this traits class. + +.. _dss_iplist: + +iplist +^^^^^^ + +``iplist<T>`` is ``ilist<T>``'s base and as such supports a slightly narrower +interface. Notably, inserters from ``T&`` are absent. + +``ilist_traits<T>`` is a public base of this class and can be used for a wide +variety of customizations. + +.. _dss_ilist_node: + +llvm/ADT/ilist_node.h +^^^^^^^^^^^^^^^^^^^^^ + +``ilist_node<T>`` implements a the forward and backward links that are expected +by the ``ilist<T>`` (and analogous containers) in the default manner. + +``ilist_node<T>``\ s are meant to be embedded in the node type ``T``, usually +``T`` publicly derives from ``ilist_node<T>``. + +.. _dss_ilist_sentinel: + +Sentinels +^^^^^^^^^ + +``ilist``\ s have another specialty that must be considered. To be a good +citizen in the C++ ecosystem, it needs to support the standard container +operations, such as ``begin`` and ``end`` iterators, etc. Also, the +``operator--`` must work correctly on the ``end`` iterator in the case of +non-empty ``ilist``\ s. + +The only sensible solution to this problem is to allocate a so-called *sentinel* +along with the intrusive list, which serves as the ``end`` iterator, providing +the back-link to the last element. However conforming to the C++ convention it +is illegal to ``operator++`` beyond the sentinel and it also must not be +dereferenced. + +These constraints allow for some implementation freedom to the ``ilist`` how to +allocate and store the sentinel. The corresponding policy is dictated by +``ilist_traits<T>``. By default a ``T`` gets heap-allocated whenever the need +for a sentinel arises. + +While the default policy is sufficient in most cases, it may break down when +``T`` does not provide a default constructor. Also, in the case of many +instances of ``ilist``\ s, the memory overhead of the associated sentinels is +wasted. To alleviate the situation with numerous and voluminous +``T``-sentinels, sometimes a trick is employed, leading to *ghostly sentinels*. + +Ghostly sentinels are obtained by specially-crafted ``ilist_traits<T>`` which +superpose the sentinel with the ``ilist`` instance in memory. Pointer +arithmetic is used to obtain the sentinel, which is relative to the ``ilist``'s +``this`` pointer. The ``ilist`` is augmented by an extra pointer, which serves +as the back-link of the sentinel. This is the only field in the ghostly +sentinel which can be legally accessed. + +.. _dss_other: + +Other Sequential Container options +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Other STL containers are available, such as ``std::string``. + +There are also various STL adapter classes such as ``std::queue``, +``std::priority_queue``, ``std::stack``, etc. These provide simplified access +to an underlying container but don't affect the cost of the container itself. + +.. _ds_string: + +String-like containers +---------------------- + +There are a variety of ways to pass around and use strings in C and C++, and +LLVM adds a few new options to choose from. Pick the first option on this list +that will do what you need, they are ordered according to their relative cost. + +Note that is is generally preferred to *not* pass strings around as ``const +char*``'s. These have a number of problems, including the fact that they +cannot represent embedded nul ("\0") characters, and do not have a length +available efficiently. The general replacement for '``const char*``' is +StringRef. + +For more information on choosing string containers for APIs, please see +:ref:`Passing Strings <string_apis>`. + +.. _dss_stringref: + +llvm/ADT/StringRef.h +^^^^^^^^^^^^^^^^^^^^ + +The StringRef class is a simple value class that contains a pointer to a +character and a length, and is quite related to the :ref:`ArrayRef +<dss_arrayref>` class (but specialized for arrays of characters). Because +StringRef carries a length with it, it safely handles strings with embedded nul +characters in it, getting the length does not require a strlen call, and it even +has very convenient APIs for slicing and dicing the character range that it +represents. + +StringRef is ideal for passing simple strings around that are known to be live, +either because they are C string literals, std::string, a C array, or a +SmallVector. Each of these cases has an efficient implicit conversion to +StringRef, which doesn't result in a dynamic strlen being executed. + +StringRef has a few major limitations which make more powerful string containers +useful: + +#. You cannot directly convert a StringRef to a 'const char*' because there is + no way to add a trailing nul (unlike the .c_str() method on various stronger + classes). + +#. StringRef doesn't own or keep alive the underlying string bytes. + As such it can easily lead to dangling pointers, and is not suitable for + embedding in datastructures in most cases (instead, use an std::string or + something like that). + +#. For the same reason, StringRef cannot be used as the return value of a + method if the method "computes" the result string. Instead, use std::string. + +#. StringRef's do not allow you to mutate the pointed-to string bytes and it + doesn't allow you to insert or remove bytes from the range. For editing + operations like this, it interoperates with the :ref:`Twine <dss_twine>` + class. + +Because of its strengths and limitations, it is very common for a function to +take a StringRef and for a method on an object to return a StringRef that points +into some string that it owns. + +.. _dss_twine: + +llvm/ADT/Twine.h +^^^^^^^^^^^^^^^^ + +The Twine class is used as an intermediary datatype for APIs that want to take a +string that can be constructed inline with a series of concatenations. Twine +works by forming recursive instances of the Twine datatype (a simple value +object) on the stack as temporary objects, linking them together |