From b715b205a41dc53942acbcfbfdbdce6bdaf9a4aa Mon Sep 17 00:00:00 2001 From: Sean Silva Date: Tue, 4 Dec 2012 03:20:08 +0000 Subject: docs: Convert ProgrammersManual to reST. Patch by Alexander Zinenko! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169208 91177308-0d34-0410-b5e6-96231b3b80d8 --- docs/ProgrammersManual.html | 4156 ------------------------------------------- 1 file changed, 4156 deletions(-) delete mode 100644 docs/ProgrammersManual.html (limited to 'docs/ProgrammersManual.html') diff --git a/docs/ProgrammersManual.html b/docs/ProgrammersManual.html deleted file mode 100644 index 64ddb9d105..0000000000 --- a/docs/ProgrammersManual.html +++ /dev/null @@ -1,4156 +0,0 @@ - - - - - LLVM Programmer's Manual - - - - -

- LLVM Programmer's Manual -

- -
    -
  1. Introduction
  2. -
  3. General Information - -
  4. -
  5. Important and useful LLVM APIs - -
  6. -
  7. Picking the Right Data Structure for a Task - -
  8. -
  9. Helpful Hints for Common Operations - -
  10. - -
  11. Threads and LLVM - -
  12. - -
  13. Advanced Topics -
  14. - -
  15. The Core LLVM Class Hierarchy Reference - -
  16. -
- -
-

Written by Chris Lattner, - Dinakar Dhurjati, - Gabor Greif, - Joel Stanley, - Reid Spencer and - Owen Anderson

-
- - -

- Introduction -

- - -
- -

This document is meant to highlight some of the important classes and -interfaces available in the LLVM source-base. This manual is not -intended to explain what LLVM is, how it works, and what LLVM code looks -like. It assumes that you know the basics of LLVM and are interested -in writing transformations or otherwise analyzing or manipulating the -code.

- -

This document should get you oriented so that you can find your -way in the continuously growing source code that makes up the LLVM -infrastructure. Note that this manual is not intended to serve as a -replacement for reading the source code, so if you think there should be -a method in one of these classes to do something, but it's not listed, -check the source. Links to the doxygen sources -are provided to make this as easy as possible.

- -

The first section of this document describes general information that is -useful to know when working in the LLVM infrastructure, and the second describes -the Core LLVM classes. In the future this manual will be extended with -information describing how to use extension libraries, such as dominator -information, CFG traversal routines, and useful utilities like the InstVisitor template.

- -
- - -

- General Information -

- - -
- -

This section contains general information that is useful if you are working -in the LLVM source-base, but that isn't specific to any particular API.

- - -

- The C++ Standard Template Library -

- -
- -

LLVM makes heavy use of the C++ Standard Template Library (STL), -perhaps much more than you are used to, or have seen before. Because of -this, you might want to do a little background reading in the -techniques used and capabilities of the library. There are many good -pages that discuss the STL, and several books on the subject that you -can get, so it will not be discussed in this document.

- -

Here are some useful links:

- -
    - -
  1. Dinkumware -C++ Library reference - an excellent reference for the STL and other parts -of the standard C++ library.
  2. - -
  3. C++ In a Nutshell - This is an -O'Reilly book in the making. It has a decent Standard Library -Reference that rivals Dinkumware's, and is unfortunately no longer free since the -book has been published.
  4. - -
  5. C++ Frequently Asked -Questions
  6. - -
  7. SGI's STL Programmer's Guide - -Contains a useful Introduction to the -STL.
  8. - -
  9. Bjarne Stroustrup's C++ -Page
  10. - -
  11. -Bruce Eckel's Thinking in C++, 2nd ed. Volume 2 Revision 4.0 (even better, get -the book).
  12. - -
- -

You are also encouraged to take a look at the LLVM Coding Standards guide which focuses on how -to write maintainable code more than where to put your curly braces.

- -
- - -

- Other useful references -

- -
- -
    -
  1. Using -static and shared libraries across platforms
  2. -
- -
- -
- - -

- Important and useful LLVM APIs -

- - -
- -

Here we highlight some LLVM APIs that are generally useful and good to -know about when writing transformations.

- - -

- The isa<>, cast<> and - dyn_cast<> templates -

- -
- -

The LLVM source-base makes extensive use of a custom form of RTTI. -These templates have many similarities to the C++ dynamic_cast<> -operator, but they don't have some drawbacks (primarily stemming from -the fact that dynamic_cast<> only works on classes that -have a v-table). Because they are used so often, you must know what they -do and how they work. All of these templates are defined in the llvm/Support/Casting.h -file (note that you very rarely have to include this file directly).

- -
-
isa<>:
- -

The isa<> operator works exactly like the Java - "instanceof" operator. It returns true or false depending on whether - a reference or pointer points to an instance of the specified class. This can - be very useful for constraint checking of various sorts (example below).

-
- -
cast<>:
- -

The cast<> operator is a "checked cast" operation. It - converts a pointer or reference from a base class to a derived class, causing - an assertion failure if it is not really an instance of the right type. This - should be used in cases where you have some information that makes you believe - that something is of the right type. An example of the isa<> - and cast<> template is:

- -
-
-static bool isLoopInvariant(const Value *V, const Loop *L) {
-  if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V))
-    return true;
-
-  // Otherwise, it must be an instruction...
-  return !L->contains(cast<Instruction>(V)->getParent());
-}
-
-
- -

Note that you should not use an isa<> test followed - by a cast<>, for that use the dyn_cast<> - operator.

- -
- -
dyn_cast<>:
- -

The dyn_cast<> operator is a "checking cast" operation. - It checks to see if the operand is of the specified type, and if so, returns a - pointer to it (this operator does not work with references). If the operand is - not of the correct type, a null pointer is returned. Thus, this works very - much like the dynamic_cast<> operator in C++, and should be - used in the same circumstances. Typically, the dyn_cast<> - operator is used in an if statement or some other flow control - statement like this:

- -
-
-if (AllocationInst *AI = dyn_cast<AllocationInst>(Val)) {
-  // ...
-}
-
-
- -

This form of the if statement effectively combines together a call - to isa<> and a call to cast<> into one - statement, which is very convenient.

- -

Note that the dyn_cast<> operator, like C++'s - dynamic_cast<> or Java's instanceof operator, can be - abused. In particular, you should not use big chained if/then/else - blocks to check for lots of different variants of classes. If you find - yourself wanting to do this, it is much cleaner and more efficient to use the - InstVisitor class to dispatch over the instruction type directly.

- -
- -
cast_or_null<>:
- -

The cast_or_null<> operator works just like the - cast<> operator, except that it allows for a null pointer as an - argument (which it then propagates). This can sometimes be useful, allowing - you to combine several null checks into one.

- -
dyn_cast_or_null<>:
- -

The dyn_cast_or_null<> operator works just like the - dyn_cast<> operator, except that it allows for a null pointer - as an argument (which it then propagates). This can sometimes be useful, - allowing you to combine several null checks into one.

- -
- -

These five templates can be used with any classes, whether they have a -v-table or not. If you want to add support for these templates, see the -document How to set up LLVM-style -RTTI for your class hierarchy . -

- -
- - - -

- Passing strings (the StringRef -and Twine classes) -

- -
- -

Although LLVM generally does not do much string manipulation, we do have -several important APIs which take strings. Two important examples are the -Value class -- which has names for instructions, functions, etc. -- and the -StringMap class which is used extensively in LLVM and Clang.

- -

These are generic classes, and they need to be able to accept strings which -may have embedded null characters. Therefore, they cannot simply take -a const char *, and taking a const std::string& requires -clients to perform a heap allocation which is usually unnecessary. Instead, -many LLVM APIs use a StringRef or a const Twine& for -passing strings efficiently.

- - -

- The StringRef class -

- -
- -

The StringRef data type represents a reference to a constant string -(a character array and a length) and supports the common operations available -on std:string, but does not require heap allocation.

- -

It can be implicitly constructed using a C style null-terminated string, -an std::string, or explicitly with a character pointer and length. -For example, the StringRef find function is declared as:

- -
-  iterator find(StringRef Key);
-
- -

and clients can call it using any one of:

- -
-  Map.find("foo");                 // Lookup "foo"
-  Map.find(std::string("bar"));    // Lookup "bar"
-  Map.find(StringRef("\0baz", 4)); // Lookup "\0baz"
-
- -

Similarly, APIs which need to return a string may return a StringRef -instance, which can be used directly or converted to an std::string -using the str member function. See -"llvm/ADT/StringRef.h" -for more information.

- -

You should rarely use the StringRef class directly, because it contains -pointers to external memory it is not generally safe to store an instance of the -class (unless you know that the external storage will not be freed). StringRef is -small and pervasive enough in LLVM that it should always be passed by value.

- -
- - -

- The Twine class -

- -
- -

The Twine class is an -efficient way for APIs to accept concatenated strings. For example, a common -LLVM paradigm is to name one instruction based on -the name of another instruction with a suffix, for example:

- -
-
-    New = CmpInst::Create(..., SO->getName() + ".cmp");
-
-
- -

The Twine class is effectively a lightweight -rope -which points to temporary (stack allocated) objects. Twines can be implicitly -constructed as the result of the plus operator applied to strings (i.e., a C -strings, an std::string, or a StringRef). The twine delays -the actual concatenation of strings until it is actually required, at which -point it can be efficiently rendered directly into a character array. This -avoids unnecessary heap allocation involved in constructing the temporary -results of string concatenation. See -"llvm/ADT/Twine.h" -and here for more information.

- -

As with a StringRef, Twine objects point to external memory -and should almost never be stored or mentioned directly. They are intended -solely for use when defining a function which should be able to efficiently -accept concatenated strings.

- -
- -
- - -

- The DEBUG() macro and -debug option -

- -
- -

Often when working on your pass you will put a bunch of debugging printouts -and other code into your pass. After you get it working, you want to remove -it, but you may need it again in the future (to work out new bugs that you run -across).

- -

Naturally, because of this, you don't want to delete the debug printouts, -but you don't want them to always be noisy. A standard compromise is to comment -them out, allowing you to enable them if you need them in the future.

- -

The "llvm/Support/Debug.h" -file provides a macro named DEBUG() that is a much nicer solution to -this problem. Basically, you can put arbitrary code into the argument of the -DEBUG macro, and it is only executed if 'opt' (or any other -tool) is run with the '-debug' command line argument:

- -
-
-DEBUG(errs() << "I am here!\n");
-
-
- -

Then you can run your pass like this:

- -
-
-$ opt < a.bc > /dev/null -mypass
-<no output>
-$ opt < a.bc > /dev/null -mypass -debug
-I am here!
-
-
- -

Using the DEBUG() macro instead of a home-brewed solution allows you -to not have to create "yet another" command line option for the debug output for -your pass. Note that DEBUG() macros are disabled for optimized builds, -so they do not cause a performance impact at all (for the same reason, they -should also not contain side-effects!).

- -

One additional nice thing about the DEBUG() macro is that you can -enable or disable it directly in gdb. Just use "set DebugFlag=0" or -"set DebugFlag=1" from the gdb if the program is running. If the -program hasn't been started yet, you can always just run it with --debug.

- - -

- Fine grained debug info with DEBUG_TYPE and - the -debug-only option -

- -
- -

Sometimes you may find yourself in a situation where enabling -debug -just turns on too much information (such as when working on the code -generator). If you want to enable debug information with more fine-grained -control, you define the DEBUG_TYPE macro and the -debug only -option as follows:

- -
-
-#undef  DEBUG_TYPE
-DEBUG(errs() << "No debug type\n");
-#define DEBUG_TYPE "foo"
-DEBUG(errs() << "'foo' debug type\n");
-#undef  DEBUG_TYPE
-#define DEBUG_TYPE "bar"
-DEBUG(errs() << "'bar' debug type\n"));
-#undef  DEBUG_TYPE
-#define DEBUG_TYPE ""
-DEBUG(errs() << "No debug type (2)\n");
-
-
- -

Then you can run your pass like this:

- -
-
-$ opt < a.bc > /dev/null -mypass
-<no output>
-$ opt < a.bc > /dev/null -mypass -debug
-No debug type
-'foo' debug type
-'bar' debug type
-No debug type (2)
-$ opt < a.bc > /dev/null -mypass -debug-only=foo
-'foo' debug type
-$ opt < a.bc > /dev/null -mypass -debug-only=bar
-'bar' debug type
-
-
- -

Of course, in practice, you should only set DEBUG_TYPE at the top of -a file, to specify the debug type for the entire module (if you do this before -you #include "llvm/Support/Debug.h", you don't have to insert the ugly -#undef's). Also, you should use names more meaningful than "foo" and -"bar", because there is no system in place to ensure that names do not -conflict. If two different modules use the same string, they will all be turned -on when the name is specified. This allows, for example, all debug information -for instruction scheduling to be enabled with -debug-type=InstrSched, -even if the source lives in multiple files.

- -

The DEBUG_WITH_TYPE macro is also available for situations where you -would like to set DEBUG_TYPE, but only for one specific DEBUG -statement. It takes an additional first parameter, which is the type to use. For -example, the preceding example could be written as:

- - -
-
-DEBUG_WITH_TYPE("", errs() << "No debug type\n");
-DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n");
-DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n"));
-DEBUG_WITH_TYPE("", errs() << "No debug type (2)\n");
-
-
- -
- -
- - -

- The Statistic class & -stats - option -

- -
- -

The "llvm/ADT/Statistic.h" file -provides a class named Statistic that is used as a unified way to -keep track of what the LLVM compiler is doing and how effective various -optimizations are. It is useful to see what optimizations are contributing to -making a particular program run faster.

- -

Often you may run your pass on some big program, and you're interested to see -how many times it makes a certain transformation. Although you can do this with -hand inspection, or some ad-hoc method, this is a real pain and not very useful -for big programs. Using the Statistic class makes it very easy to -keep track of this information, and the calculated information is presented in a -uniform manner with the rest of the passes being executed.

- -

There are many examples of Statistic uses, but the basics of using -it are as follows:

- -
    -
  1. Define your statistic like this:

    - -
    -
    -#define DEBUG_TYPE "mypassname"   // This goes before any #includes.
    -STATISTIC(NumXForms, "The # of times I did stuff");
    -
    -
    - -

    The STATISTIC macro defines a static variable, whose name is - specified by the first argument. The pass name is taken from the DEBUG_TYPE - macro, and the description is taken from the second argument. The variable - defined ("NumXForms" in this case) acts like an unsigned integer.

  2. - -
  3. Whenever you make a transformation, bump the counter:

    - -
    -
    -++NumXForms;   // I did stuff!
    -
    -
    - -
  4. -
- -

That's all you have to do. To get 'opt' to print out the - statistics gathered, use the '-stats' option:

- -
-
-$ opt -stats -mypassname < program.bc > /dev/null
-... statistics output ...
-
-
- -

When running opt on a C file from the SPEC benchmark -suite, it gives a report that looks like this:

- -
-
-   7646 bitcodewriter   - Number of normal instructions
-    725 bitcodewriter   - Number of oversized instructions
- 129996 bitcodewriter   - Number of bitcode bytes written
-   2817 raise           - Number of insts DCEd or constprop'd
-   3213 raise           - Number of cast-of-self removed
-   5046 raise           - Number of expression trees converted
-     75 raise           - Number of other getelementptr's formed
-    138 raise           - Number of load/store peepholes
-     42 deadtypeelim    - Number of unused typenames removed from symtab
-    392 funcresolve     - Number of varargs functions resolved
-     27 globaldce       - Number of global variables removed
-      2 adce            - Number of basic blocks removed
-    134 cee             - Number of branches revectored
-     49 cee             - Number of setcc instruction eliminated
-    532 gcse            - Number of loads removed
-   2919 gcse            - Number of instructions removed
-     86 indvars         - Number of canonical indvars added
-     87 indvars         - Number of aux indvars removed
-     25 instcombine     - Number of dead inst eliminate
-    434 instcombine     - Number of insts combined
-    248 licm            - Number of load insts hoisted
-   1298 licm            - Number of insts hoisted to a loop pre-header
-      3 licm            - Number of insts hoisted to multiple loop preds (bad, no loop pre-header)
-     75 mem2reg         - Number of alloca's promoted
-   1444 cfgsimplify     - Number of blocks simplified
-
-
- -

Obviously, with so many optimizations, having a unified framework for this -stuff is very nice. Making your pass fit well into the framework makes it more -maintainable and useful.

- -
- - -

- Viewing graphs while debugging code -

- -
- -

Several of the important data structures in LLVM are graphs: for example -CFGs made out of LLVM BasicBlocks, CFGs made out of -LLVM MachineBasicBlocks, and -Instruction Selection -DAGs. In many cases, while debugging various parts of the compiler, it is -nice to instantly visualize these graphs.

- -

LLVM provides several callbacks that are available in a debug build to do -exactly that. If you call the Function::viewCFG() method, for example, -the current LLVM tool will pop up a window containing the CFG for the function -where each basic block is a node in the graph, and each node contains the -instructions in the block. Similarly, there also exists -Function::viewCFGOnly() (does not include the instructions), the -MachineFunction::viewCFG() and MachineFunction::viewCFGOnly(), -and the SelectionDAG::viewGraph() methods. Within GDB, for example, -you can usually use something like call DAG.viewGraph() to pop -up a window. Alternatively, you can sprinkle calls to these functions in your -code in places you want to debug.

- -

Getting this to work requires a small amount of configuration. On Unix -systems with X11, install the graphviz -toolkit, and make sure 'dot' and 'gv' are in your path. If you are running on -Mac OS/X, download and install the Mac OS/X Graphviz program, and add -/Applications/Graphviz.app/Contents/MacOS/ (or wherever you install -it) to your path. Once in your system and path are set up, rerun the LLVM -configure script and rebuild LLVM to enable this functionality.

- -

SelectionDAG has been extended to make it easier to locate -interesting nodes in large complex graphs. From gdb, if you -call DAG.setGraphColor(node, "color"), then the -next call DAG.viewGraph() would highlight the node in the -specified color (choices of colors can be found at colors.) More -complex node attributes can be provided with call -DAG.setGraphAttrs(node, "attributes") (choices can be -found at Graph -Attributes.) If you want to restart and clear all the current graph -attributes, then you can call DAG.clearGraphAttrs().

- -

Note that graph visualization features are compiled out of Release builds -to reduce file size. This means that you need a Debug+Asserts or -Release+Asserts build to use these features.

- -
- -
- - -

- Picking the Right Data Structure for a Task -

- - -
- -

LLVM has a plethora of data structures in the llvm/ADT/ directory, - and we commonly use STL data structures. This section describes the trade-offs - you should consider when you pick one.

- -

-The first step is a choose your own adventure: do you want a sequential -container, a set-like container, or a map-like container? The most important -thing when choosing a container is the algorithmic properties of how you plan to -access the container. Based on that, you should use:

- - - -

-Once the proper category of container is determined, you can fine tune the -memory use, constant factors, and cache behaviors of access by intelligently -picking a member of the category. Note that constant factors and cache behavior -can be a big deal. If you have a vector that usually only contains a few -elements (but could contain many), for example, it's much better to use -SmallVector than vector -. Doing so avoids (relatively) expensive malloc/free calls, which dwarf the -cost of adding the elements to the container.

- - -

- Sequential Containers (std::vector, std::list, etc) -

- -
-There are a variety of sequential containers available for you, based on your -needs. Pick the first in this section that will do what you want. - - -

- llvm/ADT/ArrayRef.h -

- -
-

The llvm::ArrayRef class is the preferred class to use in an interface that - accepts a sequential list of elements in memory and just reads from them. By - taking an ArrayRef, the API can be passed a fixed size array, an std::vector, - an llvm::SmallVector and anything else that is contiguous in memory. -

-
- - - - -

- Fixed Size Arrays -

- -
-

Fixed size arrays are very simple and very fast. They are good if you know -exactly how many elements you have, or you have a (low) upper bound on how many -you have.

-
- - -

- Heap Allocated Arrays -

- -
-

Heap allocated arrays (new[] + delete[]) are also simple. They are good if -the number of elements is variable, if you know how many elements you will need -before the array is allocated, and if the array is usually large (if not, -consider a SmallVector). The cost of a heap -allocated array is the cost of the new/delete (aka malloc/free). Also note that -if you are allocating an array of a type with a constructor, the constructor and -destructors will be run for every element in the array (re-sizable vectors only -construct those elements actually used).

-
- - -

- "llvm/ADT/TinyPtrVector.h" -

- - -
-

TinyPtrVector<Type> is a highly specialized collection class -that is optimized to avoid allocation in the case when a vector has zero or one -elements. It has two major restrictions: 1) it can only hold values of pointer -type, and 2) it cannot hold a null pointer.

- -

Since this container is highly specialized, it is rarely used.

- -
- - -

- "llvm/ADT/SmallVector.h" -

- -
-

SmallVector<Type, N> is a simple class that looks and smells -just like vector<Type>: -it supports efficient iteration, lays out elements in memory order (so you can -do pointer arithmetic between elements), supports efficient push_back/pop_back -operations, supports efficient random access to its elements, etc.

- -

The advantage of SmallVector is that it allocates space for -some number of elements (N) in the object itself. Because of this, if -the SmallVector is dynamically smaller than N, no malloc is performed. This can -be a big win in cases where the malloc/free call is far more expensive than the -code that fiddles around with the elements.

- -

This is good for vectors that are "usually small" (e.g. the number of -predecessors/successors of a block is usually less than 8). On the other hand, -this makes the size of the SmallVector itself large, so you don't want to -allocate lots of them (doing so will waste a lot of space). As such, -SmallVectors are most useful when on the stack.

- -

SmallVector also provides a nice portable and efficient replacement for -alloca.

- -
- - -

- <vector> -

- -
-

-std::vector is well loved and respected. It is useful when SmallVector isn't: -when the size of the vector is often large (thus the small optimization will -rarely be a benefit) or if you will be allocating many instances of the vector -itself (which would waste space for elements that aren't in the container). -vector is also useful when interfacing with code that expects vectors :). -

- -

One worthwhile note about std::vector: avoid code like this:

- -
-
-for ( ... ) {
-   std::vector<foo> V;
-   // make use of V.
-}
-
-
- -

Instead, write this as:

- -
-
-std::vector<foo> V;
-for ( ... ) {
-   // make use of V.
-   V.clear();
-}
-
-
- -

Doing so will save (at least) one heap allocation and free per iteration of -the loop.

- -
- - -

- <deque> -

- -
-

std::deque is, in some senses, a generalized version of std::vector. Like -std::vector, it provides constant time random access and other similar -properties, but it also provides efficient access to the front of the list. It -does not guarantee continuity of elements within memory.

- -

In exchange for this extra flexibility, std::deque has significantly higher -constant factor costs than std::vector. If possible, use std::vector or -something cheaper.

-
- - -

- <list> -

- -
-

std::list is an extremely inefficient class that is rarely useful. -It performs a heap allocation for every element inserted into it, thus having an -extremely high constant factor, particularly for small data types. std::list -also only supports bidirectional iteration, not random access iteration.

- -

In exchange for this high cost, std::list supports efficient access to both -ends of the list (like std::deque, but unlike std::vector or SmallVector). In -addition, the iterator invalidation characteristics of std::list are stronger -than that of a vector class: inserting or removing an element into the list does -not invalidate iterator or pointers to other elements in the list.

-
- - -

- llvm/ADT/ilist.h -

- -
-

ilist<T> implements an 'intrusive' doubly-linked list. It is -intrusive, because it requires the element to store and provide access to the -prev/next pointers for the list.

- -

ilist has the same drawbacks as std::list, and additionally -requires an ilist_traits implementation for the element type, but it -provides some novel characteristics. In particular, it can efficiently store -polymorphic objects, the traits class is informed when an element is inserted or -removed from the list, and ilists are guaranteed to support a -constant-time splice operation.

- -

These properties are exactly what we want for things like -Instructions and basic blocks, which is why these are implemented with -ilists.

- -Related classes of interest are explained in the following subsections: - -
- - -

- llvm/ADT/PackedVector.h -

- -
-

-Useful for storing a vector of values using only a few number of bits for each -value. Apart from the standard operations of a vector-like container, it can -also perform an 'or' set operation. -

- -

For example:

- -
-
-enum State {
-    None = 0x0,
-    FirstCondition = 0x1,
-    SecondCondition = 0x2,
-    Both = 0x3
-};
-
-State get() {
-    PackedVector<State, 2> Vec1;
-    Vec1.push_back(FirstCondition);
-
-    PackedVector<State, 2> Vec2;
-    Vec2.push_back(SecondCondition);
-
-    Vec1 |= Vec2;
-    return Vec1[0]; // returns 'Both'.
-}
-
-
- -
- - -

- ilist_traits -

- -
-

ilist_traits<T> is ilist<T>'s customization -mechanism. iplist<T> (and consequently ilist<T>) -publicly derive from this traits class.

-
- - -

- iplist -

- -
-

iplist<T> is ilist<T>'s base and as such -supports a slightly narrower interface. Notably, inserters from -T& are absent.

- -

ilist_traits<T> is a public base of this class and can be -used for a wide variety of customizations.

-
- - -

- llvm/ADT/ilist_node.h -

- -
-

ilist_node<T> implements a the forward and backward links -that are expected by the ilist<T> (and analogous containers) -in the default manner.

- -

ilist_node<T>s are meant to be embedded in the node type -T, usually T publicly derives from -ilist_node<T>.

-
- - -

- Sentinels -

- -
-

ilists have another specialty that must be considered. To be a good -citizen in the C++ ecosystem, it needs to support the standard container -operations, such as begin and end iterators, etc. Also, the -operator-- must work correctly on the end iterator in the -case of non-empty ilists.

- -

The only sensible solution to this problem is to allocate a so-called -sentinel along with the intrusive list, which serves as the end -iterator, providing the back-link to the last element. However conforming to the -C++ convention it is illegal to operator++ beyond the sentinel and it -also must not be dereferenced.

- -

These constraints allow for some implementation freedom to the ilist -how to allocate and store the sentinel. The corresponding policy is dictated -by ilist_traits<T>. By default a T gets heap-allocated -whenever the need for a sentinel arises.

- -

While the default policy is sufficient in most cases, it may break down when -T does not provide a default constructor. Also, in the case of many -instances of ilists, the memory overhead of the associated sentinels -is wasted. To alleviate the situation with numerous and voluminous -T-sentinels, sometimes a trick is employed, leading to ghostly -sentinels.

- -

Ghostly sentinels are obtained by specially-crafted ilist_traits<T> -which superpose the sentinel with the ilist instance in memory. Pointer -arithmetic is used to obtain the sentinel, which is relative to the -ilist's this pointer. The ilist is augmented by an -extra pointer, which serves as the back-link of the sentinel. This is the only -field in the ghostly sentinel which can be legally accessed.

-
- - -

- Other Sequential Container options -

- -
-

Other STL containers are available, such as std::string.

- -

There are also various STL adapter classes such as std::queue, -std::priority_queue, std::stack, etc. These provide simplified access to an -underlying container but don't affect the cost of the container itself.

- -
-
- - -

- String-like containers -

- -
- -

-There are a variety of ways to pass around and use strings in C and C++, and -LLVM adds a few new options to choose from. Pick the first option on this list -that will do what you need, they are ordered according to their relative cost. -

-

-Note that is is generally preferred to not pass strings around as -"const char*"'s. These have a number of problems, including the fact -that they cannot represent embedded nul ("\0") characters, and do not have a -length available efficiently. The general replacement for 'const -char*' is StringRef. -

- -

For more information on choosing string containers for APIs, please see -Passing strings.

- - - -

- llvm/ADT/StringRef.h -

- -
-

-The StringRef class is a simple value class that contains a pointer to a -character and a length, and is quite related to the ArrayRef class (but specialized for arrays of -characters). Because StringRef carries a length with it, it safely handles -strings with embedded nul characters in it, getting the length does not require -a strlen call, and it even has very convenient APIs for slicing and dicing the -character range that it represents. -

- -

-StringRef is ideal for passing simple strings around that are known to be live, -either because they are C string literals, std::string, a C array, or a -SmallVector. Each of these cases has an efficient implicit conversion to -StringRef, which doesn't result in a dynamic strlen being executed. -

- -

StringRef has a few major limitations which make more powerful string -containers useful:

- -
    -
  1. You cannot directly convert a StringRef to a 'const char*' because there is -no way to add a trailing nul (unlike the .c_str() method on various stronger -classes).
  2. - - -
  3. StringRef doesn't own or keep alive the underlying string bytes. -As such it can easily lead to dangling pointers, and is not suitable for -embedding in datastructures in most cases (instead, use an std::string or -something like that).
  4. - -
  5. For the same reason, StringRef cannot be used as the return value of a -method if the method "computes" the result string. Instead, use -std::string.
  6. - -
  7. StringRef's do not allow you to mutate the pointed-to string bytes and it -doesn't allow you to insert or remove bytes from the range. For editing -operations like this, it interoperates with the Twine class.
  8. -
- -

Because of its strengths and limitations, it is very common for a function to -take a StringRef and for a method on an object to return a StringRef that -points into some string that it owns.

- -
- - -

- llvm/ADT/Twine.h -

- -
-

- The Twine class is used as an intermediary datatype for APIs that want to take - a string that can be constructed inline with a series of concatenations. - Twine works by forming recursive instances of the Twine datatype (a simple - value object) on the stack as temporary objects, linking them together into a - tree which is then linearized when the Twine is consumed. Twine is only safe - to use as the argument to a function, and should always be a const reference, - e.g.: -

- -
-    void foo(const Twine &T);
-    ...
-    StringRef X = ...
-    unsigned i = ...
-    foo(X + "." + Twine(i));
-  
- -

This example forms a string like "blarg.42" by concatenating the values - together, and does not form intermediate strings containing "blarg" or - "blarg.". -

- -

Because Twine is constructed with temporary objects on the stack, and - because these instances are destroyed at the end of the current statement, - it is an inherently dangerous API. For example, this simple variant contains - undefined behavior and will probably crash:

- -
-    void foo(const Twine &T);
-    ...
-    StringRef X = ...
-    unsigned i = ...
-    const Twine &Tmp = X + "." + Twine(i);
-    foo(Tmp);
-  
- -

... because the temporaries are destroyed before the call. That said, - Twine's are much more efficient than intermediate std::string temporaries, and - they work really well with StringRef. Just be aware of their limitations.

- -
- - - -

- llvm/ADT/SmallString.h -

- -
- -

SmallString is a subclass of SmallVector that -adds some convenience APIs like += that takes StringRef's. SmallString avoids -allocating memory in the case when the preallocated space is enough to hold its -data, and it calls back to general heap allocation when required. Since it owns -its data, it is very safe to use and supports full mutation of the string.

- -

Like SmallVector's, the big downside to SmallString is their sizeof. While -they are optimized for small strings, they themselves are not particularly -small. This means that they work great for temporary scratch buffers on the -stack, but should not generally be put into the heap: it is very rare to -see a SmallString as the member of a frequently-allocated heap data structure -or returned by-value. -

- -
- - -

- std::string -

- -
- -

The standard C++ std::string class is a very general class that (like - SmallString) owns its underlying data. sizeof(std::string) is very reasonable - so it can be embedded into heap data structures and returned by-value. - On the other hand, std::string is highly inefficient for inline editing (e.g. - concatenating a bunch of stuff together) and because it is provided by the - standard library, its performance characteristics depend a lot of the host - standard library (e.g. libc++ and MSVC provide a highly optimized string - class, GCC contains a really slow implementation). -

- -

The major disadvantage of std::string is that almost every operation that - makes them larger can allocate memory, which is slow. As such, it is better - to use SmallVector or Twine as a scratch buffer, but then use std::string to - persist the result.

- - -
- - -
- - - -

- Set-Like Containers (std::set, SmallSet, SetVector, etc) -

- -
- -

Set-like containers are useful when you need to canonicalize multiple values -into a single representation. There are several different choices for how to do -this, providing various trade-offs.

- - -

- A sorted 'vector' -

- -
- -

If you intend to insert a lot of elements, then do a lot of queries, a -great approach is to use a vector (or other sequential container) with -std::sort+std::unique to remove duplicates. This approach works really well if -your usage pattern has these two distinct phases (insert then query), and can be -coupled with a good choice of sequential container. -

- -

-This combination provides the several nice properties: the result data is -contiguous in memory (good for cache locality), has few allocations, is easy to -address (iterators in the final vector are just indices or pointers), and can be -efficiently queried with a standard binary or radix search.

- -
- - -

- "llvm/ADT/SmallSet.h" -

- -
- -

If you have a set-like data structure that is usually small and whose elements -are reasonably small, a SmallSet<Type, N> is a good choice. This set -has space for N elements in place (thus, if the set is dynamically smaller than -N, no malloc traffic is required) and accesses them with a simple linear search. -When the set grows beyond 'N' elements, it allocates a more expensive representation that -guarantees efficient access (for most types, it falls back to std::set, but for -pointers it uses something far better, SmallPtrSet).

- -

The magic of this class is that it handles small sets extremely efficiently, -but gracefully handles extremely large sets without loss of efficiency. The -drawback is that the interface is quite small: it supports insertion, queries -and erasing, but does not support iteration.

- -
- - -

- "llvm/ADT/SmallPtrSet.h" -

- -
- -

SmallPtrSet has all the advantages of SmallSet (and a SmallSet of pointers is -transparently implemented with a SmallPtrSet), but also supports iterators. If -more than 'N' insertions are performed, a single quadratically -probed hash table is allocated and grows as needed, providing extremely -efficient access (constant time insertion/deleting/queries with low constant -factors) and is very stingy with malloc traffic.

- -

Note that, unlike std::set, the iterators of SmallPtrSet are invalidated -whenever an insertion occurs. Also, the values visited by the iterators are not -visited in sorted order.

- -
- - -

- "llvm/ADT/DenseSet.h" -

- -
- -

-DenseSet is a simple quadratically probed hash table. It excels at supporting -small values: it uses a single allocation to hold all of the pairs that -are currently inserted in the set. DenseSet is a great way to unique small -values that are not simple pointers (use SmallPtrSet for pointers). Note that DenseSet has -the same requirements for the value type that DenseMap has. -

- -
- - -

- "llvm/ADT/SparseSet.h" -

- -
- -

SparseSet holds a small number of objects identified by unsigned keys of -moderate size. It uses a lot of memory, but provides operations that are -almost as fast as a vector. Typical keys are physical registers, virtual -registers, or numbered basic blocks.

- -

SparseSet is useful for algorithms that need very fast clear/find/insert/erase -and fast iteration over small sets. It is not intended for building composite -data structures.

- -
- - -

- "llvm/ADT/FoldingSet.h" -

- -
- -

-FoldingSet is an aggregate class that is really good at uniquing -expensive-to-create or polymorphic objects. It is a combination of a chained -hash table with intrusive links (uniqued objects are required to inherit from -FoldingSetNode) that uses SmallVector as part of -its ID process.

- -

Consider a case where you want to implement a "getOrCreateFoo" method for -a complex object (for example, a node in the code generator). The client has a -description of *what* it wants to generate (it knows the opcode and all the -operands), but we don't want to 'new' a node, then try inserting it into a set -only to find out it already exists, at which point we would have to delete it -and return the node that already exists. -

- -

To support this style of client, FoldingSet perform a query with a -FoldingSetNodeID (which wraps SmallVector) that can be used to describe the -element that we want to query for. The query either returns the element -matching the ID or it returns an opaque ID that indicates where insertion should -take place. Construction of the ID usually does not require heap traffic.

- -

Because FoldingSet uses intrusive links, it can support polymorphic objects -in the set (for example, you can have SDNode instances mixed with LoadSDNodes). -Because the elements are individually allocated, pointers to the elements are -stable: inserting or removing elements does not invalidate any pointers to other -elements. -

- -
- - -

- <set> -

- -
- -

std::set is a reasonable all-around set class, which is decent at -many things but great at nothing. std::set allocates memory for each element -inserted (thus it is very malloc intensive) and typically stores three pointers -per element in the set (thus adding a large amount of per-element space -overhead). It offers guaranteed log(n) performance, which is not particularly -fast from a complexity standpoint (particularly if the elements of the set are -expensive to compare, like strings), and has extremely high constant factors for -lookup, insertion and removal.

- -

The advantages of std::set are that its iterators are stable (deleting or -inserting an element from the set does not affect iterators or pointers to other -elements) and that iteration over the set is guaranteed to be in sorted order. -If the elements in the set are large, then the relative overhead of the pointers -and malloc traffic is not a big deal, but if the elements of the set are small, -std::set is almost never a good choice.

- -
- - -

- "llvm/ADT/SetVector.h" -

- -
-

LLVM's SetVector<Type> is an adapter class that combines your choice of -a set-like container along with a Sequential -Container. The important property -that this provides is efficient insertion with uniquing (duplicate elements are -ignored) with iteration support. It implements this by inserting elements into -both a set-like container and the sequential container, using the set-like -container for uniquing and the sequential container for it