Date: Tue, 4 Dec 2012 03:20:08 +0000 Subject: docs: Convert ProgrammersManual to reST. Patch by Alexander Zinenko! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169208 91177308-0d34-0410-b5e6-96231b3b80d8 --- docs/ProgrammersManual.html | 4156 ------------------------------------------- 1 file changed, 4156 deletions(-) delete mode 100644 docs/ProgrammersManual.html (limited to 'docs/ProgrammersManual.html') diff --git a/docs/ProgrammersManual.html b/docs/ProgrammersManual.html deleted file mode 100644 index 64ddb9d105..0000000000 --- a/docs/ProgrammersManual.html +++ /dev/null @@ -1,4156 +0,0 @@ - - - - - LLVM Programmer's Manual - - - - -

- LLVM Programmer's Manual -

- -

Introduction
General Information -
- The C++ Standard Template Library
-
Important and useful LLVM APIs -
-
Picking the Right Data Structure for a Task -
-
Helpful Hints for Common Operations -
-
Threads and LLVM -
-
Advanced Topics -
- The ValueSymbolTable class
- The User and owned Use classes' memory layout
The Core LLVM Class Hierarchy Reference -
- The Type class
- The Module class
- The Value class -
  - The User class -
    - The Instruction class
    - The Constant class -
      - The GlobalValue class -
        -
        The Function class
        -
        The GlobalVariable class
        -
        -
      -
    -
  - The BasicBlock class
  - The Argument class
  -
-

- -

Written by Chris Lattner, - Dinakar Dhurjati, - Gabor Greif, - Joel Stanley, - Reid Spencer and - Owen Anderson

- - -

- Introduction -

- - -

- -

This document is meant to highlight some of the important classes and -interfaces available in the LLVM source-base. This manual is not -intended to explain what LLVM is, how it works, and what LLVM code looks -like. It assumes that you know the basics of LLVM and are interested -in writing transformations or otherwise analyzing or manipulating the -code.

- -

This document should get you oriented so that you can find your -way in the continuously growing source code that makes up the LLVM -infrastructure. Note that this manual is not intended to serve as a -replacement for reading the source code, so if you think there should be -a method in one of these classes to do something, but it's not listed, -check the source. Links to the doxygen sources -are provided to make this as easy as possible.

- -

The first section of this document describes general information that is -useful to know when working in the LLVM infrastructure, and the second describes -the Core LLVM classes. In the future this manual will be extended with -information describing how to use extension libraries, such as dominator -information, CFG traversal routines, and useful utilities like the InstVisitor template.

- -

- - -

- General Information -

- - -

- -

This section contains general information that is useful if you are working -in the LLVM source-base, but that isn't specific to any particular API.

- - -

- The C++ Standard Template Library -

- -

LLVM makes heavy use of the C++ Standard Template Library (STL), -perhaps much more than you are used to, or have seen before. Because of -this, you might want to do a little background reading in the -techniques used and capabilities of the library. There are many good -pages that discuss the STL, and several books on the subject that you -can get, so it will not be discussed in this document.

- -

Here are some useful links:

- -

Dinkumware -C++ Library reference - an excellent reference for the STL and other parts -of the standard C++ library.
C++ In a Nutshell - This is an -O'Reilly book in the making. It has a decent Standard Library -Reference that rivals Dinkumware's, and is unfortunately no longer free since the -book has been published.
C++ Frequently Asked -Questions
SGI's STL Programmer's Guide - -Contains a useful Introduction to the -STL.
Bjarne Stroustrup's C++ -Page
-Bruce Eckel's Thinking in C++, 2nd ed. Volume 2 Revision 4.0 (even better, get -the book).

- -

You are also encouraged to take a look at the LLVM Coding Standards guide which focuses on how -to write maintainable code more than where to put your curly braces.

- -

- - -

- Other useful references -

- -

Using -static and shared libraries across platforms

- -

- - -

- Important and useful LLVM APIs -

- - -

- -

Here we highlight some LLVM APIs that are generally useful and good to -know about when writing transformations.

- - -

- The `isa<>`, `cast<>` and - `dyn_cast<>` templates -

- -

The LLVM source-base makes extensive use of a custom form of RTTI. -These templates have many similarities to the C++ dynamic_cast<> -operator, but they don't have some drawbacks (primarily stemming from -the fact that dynamic_cast<> only works on classes that -have a v-table). Because they are used so often, you must know what they -do and how they work. All of these templates are defined in the llvm/Support/Casting.h -file (note that you very rarely have to include this file directly).

- -

isa<>:

The isa<> operator works exactly like the Java - "instanceof" operator. It returns true or false depending on whether - a reference or pointer points to an instance of the specified class. This can - be very useful for constraint checking of various sorts (example below).

cast<>:

The cast<> operator is a "checked cast" operation. It - converts a pointer or reference from a base class to a derived class, causing - an assertion failure if it is not really an instance of the right type. This - should be used in cases where you have some information that makes you believe - that something is of the right type. An example of the isa<> - and cast<> template is:

- -

-static bool isLoopInvariant(const Value *V, const Loop *L) {
-  if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V))
-    return true;
-
-  // Otherwise, it must be an instruction...
-  return !L->contains(cast<Instruction>(V)->getParent());
-}
-

- -

Note that you should not use an isa<> test followed - by a cast<>, for that use the dyn_cast<> - operator.

- -

dyn_cast<>:

The dyn_cast<> operator is a "checking cast" operation. - It checks to see if the operand is of the specified type, and if so, returns a - pointer to it (this operator does not work with references). If the operand is - not of the correct type, a null pointer is returned. Thus, this works very - much like the dynamic_cast<> operator in C++, and should be - used in the same circumstances. Typically, the dyn_cast<> - operator is used in an if statement or some other flow control - statement like this:

- -

-if (AllocationInst *AI = dyn_cast<AllocationInst>(Val)) {
-  // ...
-}
-

- -

This form of the if statement effectively combines together a call - to isa<> and a call to cast<> into one - statement, which is very convenient.

- -

Note that the dyn_cast<> operator, like C++'s - dynamic_cast<> or Java's instanceof operator, can be - abused. In particular, you should not use big chained if/then/else - blocks to check for lots of different variants of classes. If you find - yourself wanting to do this, it is much cleaner and more efficient to use the - InstVisitor class to dispatch over the instruction type directly.

- -

cast_or_null<>:

The cast_or_null<> operator works just like the - cast<> operator, except that it allows for a null pointer as an - argument (which it then propagates). This can sometimes be useful, allowing - you to combine several null checks into one.

dyn_cast_or_null<>:

The dyn_cast_or_null<> operator works just like the - dyn_cast<> operator, except that it allows for a null pointer - as an argument (which it then propagates). This can sometimes be useful, - allowing you to combine several null checks into one.

- -

These five templates can be used with any classes, whether they have a -v-table or not. If you want to add support for these templates, see the -document How to set up LLVM-style -RTTI for your class hierarchy . -

- -

- - - -

- Passing strings (the `StringRef` -and `Twine` classes) -

- -

Although LLVM generally does not do much string manipulation, we do have -several important APIs which take strings. Two important examples are the -Value class -- which has names for instructions, functions, etc. -- and the -StringMap class which is used extensively in LLVM and Clang.

- -

These are generic classes, and they need to be able to accept strings which -may have embedded null characters. Therefore, they cannot simply take -a const char *, and taking a const std::string& requires -clients to perform a heap allocation which is usually unnecessary. Instead, -many LLVM APIs use a StringRef or a const Twine& for -passing strings efficiently.

- - -

- The `StringRef` class -

- -

The StringRef data type represents a reference to a constant string -(a character array and a length) and supports the common operations available -on std:string, but does not require heap allocation.

- -

It can be implicitly constructed using a C style null-terminated string, -an std::string, or explicitly with a character pointer and length. -For example, the StringRef find function is declared as:

- -

-  iterator find(StringRef Key);
-

- -

and clients can call it using any one of:

- -

-  Map.find("foo");                 // Lookup "foo"
-  Map.find(std::string("bar"));    // Lookup "bar"
-  Map.find(StringRef("\0baz", 4)); // Lookup "\0baz"
-

- -

Similarly, APIs which need to return a string may return a StringRef -instance, which can be used directly or converted to an std::string -using the str member function. See -"llvm/ADT/StringRef.h" -for more information.

- -

You should rarely use the StringRef class directly, because it contains -pointers to external memory it is not generally safe to store an instance of the -class (unless you know that the external storage will not be freed). StringRef is -small and pervasive enough in LLVM that it should always be passed by value.

- -

- - -

- The `Twine` class -

- -

The Twine class is an -efficient way for APIs to accept concatenated strings. For example, a common -LLVM paradigm is to name one instruction based on -the name of another instruction with a suffix, for example:

- -

-    New = CmpInst::Create(..., SO->getName() + ".cmp");
-

- -

The Twine class is effectively a lightweight -rope -which points to temporary (stack allocated) objects. Twines can be implicitly -constructed as the result of the plus operator applied to strings (i.e., a C -strings, an std::string, or a StringRef). The twine delays -the actual concatenation of strings until it is actually required, at which -point it can be efficiently rendered directly into a character array. This -avoids unnecessary heap allocation involved in constructing the temporary -results of string concatenation. See -"llvm/ADT/Twine.h" -and here for more information.

- -

As with a StringRef, Twine objects point to external memory -and should almost never be stored or mentioned directly. They are intended -solely for use when defining a function which should be able to efficiently -accept concatenated strings.

- -

- - -

- The `DEBUG()` macro and `-debug` option -

- -

Often when working on your pass you will put a bunch of debugging printouts -and other code into your pass. After you get it working, you want to remove -it, but you may need it again in the future (to work out new bugs that you run -across).

- -

Naturally, because of this, you don't want to delete the debug printouts, -but you don't want them to always be noisy. A standard compromise is to comment -them out, allowing you to enable them if you need them in the future.

- -

The "llvm/Support/Debug.h" -file provides a macro named DEBUG() that is a much nicer solution to -this problem. Basically, you can put arbitrary code into the argument of the -DEBUG macro, and it is only executed if 'opt' (or any other -tool) is run with the '-debug' command line argument:

- -

-DEBUG(errs() << "I am here!\n");
-

- -

Then you can run your pass like this:

- -

-$ opt < a.bc > /dev/null -mypass
-<no output>
-$ opt < a.bc > /dev/null -mypass -debug
-I am here!
-

- -

Using the DEBUG() macro instead of a home-brewed solution allows you -to not have to create "yet another" command line option for the debug output for -your pass. Note that DEBUG() macros are disabled for optimized builds, -so they do not cause a performance impact at all (for the same reason, they -should also not contain side-effects!).

- -

One additional nice thing about the DEBUG() macro is that you can -enable or disable it directly in gdb. Just use "set DebugFlag=0" or -"set DebugFlag=1" from the gdb if the program is running. If the -program hasn't been started yet, you can always just run it with --debug.

- - -

- Fine grained debug info with `DEBUG_TYPE` and - the `-debug-only` option -

- -

Sometimes you may find yourself in a situation where enabling -debug -just turns on too much information (such as when working on the code -generator). If you want to enable debug information with more fine-grained -control, you define the DEBUG_TYPE macro and the -debug only -option as follows:

- -

-#undef  DEBUG_TYPE
-DEBUG(errs() << "No debug type\n");
-#define DEBUG_TYPE "foo"
-DEBUG(errs() << "'foo' debug type\n");
-#undef  DEBUG_TYPE
-#define DEBUG_TYPE "bar"
-DEBUG(errs() << "'bar' debug type\n"));
-#undef  DEBUG_TYPE
-#define DEBUG_TYPE ""
-DEBUG(errs() << "No debug type (2)\n");
-

- -

Then you can run your pass like this:

- -

-$ opt < a.bc > /dev/null -mypass
-<no output>
-$ opt < a.bc > /dev/null -mypass -debug
-No debug type
-'foo' debug type
-'bar' debug type
-No debug type (2)
-$ opt < a.bc > /dev/null -mypass -debug-only=foo
-'foo' debug type
-$ opt < a.bc > /dev/null -mypass -debug-only=bar
-'bar' debug type
-

- -

Of course, in practice, you should only set DEBUG_TYPE at the top of -a file, to specify the debug type for the entire module (if you do this before -you #include "llvm/Support/Debug.h", you don't have to insert the ugly -#undef's). Also, you should use names more meaningful than "foo" and -"bar", because there is no system in place to ensure that names do not -conflict. If two different modules use the same string, they will all be turned -on when the name is specified. This allows, for example, all debug information -for instruction scheduling to be enabled with -debug-type=InstrSched, -even if the source lives in multiple files.

- -

The DEBUG_WITH_TYPE macro is also available for situations where you -would like to set DEBUG_TYPE, but only for one specific DEBUG -statement. It takes an additional first parameter, which is the type to use. For -example, the preceding example could be written as:

- - -

-DEBUG_WITH_TYPE("", errs() << "No debug type\n");
-DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n");
-DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n"));
-DEBUG_WITH_TYPE("", errs() << "No debug type (2)\n");
-

- -

- - -

- The `Statistic` class & `-stats` - option -

- -

The "llvm/ADT/Statistic.h" file -provides a class named Statistic that is used as a unified way to -keep track of what the LLVM compiler is doing and how effective various -optimizations are. It is useful to see what optimizations are contributing to -making a particular program run faster.

- -

Often you may run your pass on some big program, and you're interested to see -how many times it makes a certain transformation. Although you can do this with -hand inspection, or some ad-hoc method, this is a real pain and not very useful -for big programs. Using the Statistic class makes it very easy to -keep track of this information, and the calculated information is presented in a -uniform manner with the rest of the passes being executed.

- -

There are many examples of Statistic uses, but the basics of using -it are as follows:

- -

Define your statistic like this:
- -
-
```
-#define DEBUG_TYPE "mypassname"   // This goes before any #includes.
-STATISTIC(NumXForms, "The # of times I did stuff");
-
```
-
- -
The STATISTIC macro defines a static variable, whose name is - specified by the first argument. The pass name is taken from the DEBUG_TYPE - macro, and the description is taken from the second argument. The variable - defined ("NumXForms" in this case) acts like an unsigned integer.
Whenever you make a transformation, bump the counter:
- -
-
```
-++NumXForms;   // I did stuff!
-
```
-
- -

- -

That's all you have to do. To get 'opt' to print out the - statistics gathered, use the '-stats' option:

- -

-$ opt -stats -mypassname < program.bc > /dev/null
-... statistics output ...
-

- -

When running opt on a C file from the SPEC benchmark -suite, it gives a report that looks like this:

- -

-   7646 bitcodewriter   - Number of normal instructions
-    725 bitcodewriter   - Number of oversized instructions
- 129996 bitcodewriter   - Number of bitcode bytes written
-   2817 raise           - Number of insts DCEd or constprop'd
-   3213 raise           - Number of cast-of-self removed
-   5046 raise           - Number of expression trees converted
-     75 raise           - Number of other getelementptr's formed
-    138 raise           - Number of load/store peepholes
-     42 deadtypeelim    - Number of unused typenames removed from symtab
-    392 funcresolve     - Number of varargs functions resolved
-     27 globaldce       - Number of global variables removed
-      2 adce            - Number of basic blocks removed
-    134 cee             - Number of branches revectored
-     49 cee             - Number of setcc instruction eliminated
-    532 gcse            - Number of loads removed
-   2919 gcse            - Number of instructions removed
-     86 indvars         - Number of canonical indvars added
-     87 indvars         - Number of aux indvars removed
-     25 instcombine     - Number of dead inst eliminate
-    434 instcombine     - Number of insts combined
-    248 licm            - Number of load insts hoisted
-   1298 licm            - Number of insts hoisted to a loop pre-header
-      3 licm            - Number of insts hoisted to multiple loop preds (bad, no loop pre-header)
-     75 mem2reg         - Number of alloca's promoted
-   1444 cfgsimplify     - Number of blocks simplified
-

- -

Obviously, with so many optimizations, having a unified framework for this -stuff is very nice. Making your pass fit well into the framework makes it more -maintainable and useful.

- -

- - -

- Viewing graphs while debugging code -

- -

Several of the important data structures in LLVM are graphs: for example -CFGs made out of LLVM BasicBlocks, CFGs made out of -LLVM MachineBasicBlocks, and -Instruction Selection -DAGs. In many cases, while debugging various parts of the compiler, it is -nice to instantly visualize these graphs.

- -

LLVM provides several callbacks that are available in a debug build to do -exactly that. If you call the Function::viewCFG() method, for example, -the current LLVM tool will pop up a window containing the CFG for the function -where each basic block is a node in the graph, and each node contains the -instructions in the block. Similarly, there also exists -Function::viewCFGOnly() (does not include the instructions), the -MachineFunction::viewCFG() and MachineFunction::viewCFGOnly(), -and the SelectionDAG::viewGraph() methods. Within GDB, for example, -you can usually use something like call DAG.viewGraph() to pop -up a window. Alternatively, you can sprinkle calls to these functions in your -code in places you want to debug.

- -

Getting this to work requires a small amount of configuration. On Unix -systems with X11, install the graphviz -toolkit, and make sure 'dot' and 'gv' are in your path. If you are running on -Mac OS/X, download and install the Mac OS/X Graphviz program, and add -/Applications/Graphviz.app/Contents/MacOS/ (or wherever you install -it) to your path. Once in your system and path are set up, rerun the LLVM -configure script and rebuild LLVM to enable this functionality.

- -

SelectionDAG has been extended to make it easier to locate -interesting nodes in large complex graphs. From gdb, if you -call DAG.setGraphColor(node, "color"), then the -next call DAG.viewGraph() would highlight the node in the -specified color (choices of colors can be found at colors.) More -complex node attributes can be provided with call -DAG.setGraphAttrs(node, "attributes") (choices can be -found at Graph -Attributes.) If you want to restart and clear all the current graph -attributes, then you can call DAG.clearGraphAttrs().

- -

Note that graph visualization features are compiled out of Release builds -to reduce file size. This means that you need a Debug+Asserts or -Release+Asserts build to use these features.

- -

- - -

- Picking the Right Data Structure for a Task -

- - -

- -

LLVM has a plethora of data structures in the llvm/ADT/ directory, - and we commonly use STL data structures. This section describes the trade-offs - you should consider when you pick one.

- -

-The first step is a choose your own adventure: do you want a sequential -container, a set-like container, or a map-like container? The most important -thing when choosing a container is the algorithmic properties of how you plan to -access the container. Based on that, you should use:

- -

a map-like container if you need efficient look-up - of an value based on another value. Map-like containers also support - efficient queries for containment (whether a key is in the map). Map-like - containers generally do not support efficient reverse mapping (values to - keys). If you need that, use two maps. Some map-like containers also - support efficient iteration through the keys in sorted order. Map-like - containers are the most expensive sort, only use them if you need one of - these capabilities.
a set-like container if you need to put a bunch of - stuff into a container that automatically eliminates duplicates. Some - set-like containers support efficient iteration through the elements in - sorted order. Set-like containers are more expensive than sequential - containers. -
a sequential container provides - the most efficient way to add elements and keeps track of the order they are - added to the collection. They permit duplicates and support efficient - iteration, but do not support efficient look-up based on a key. -
a string container is a specialized sequential - container or reference structure that is used for character or byte - arrays.
a bit container provides an efficient way to store and - perform set operations on sets of numeric id's, while automatically - eliminating duplicates. Bit containers require a maximum of 1 bit for each - identifier you want to store. -

- -

-Once the proper category of container is determined, you can fine tune the -memory use, constant factors, and cache behaviors of access by intelligently -picking a member of the category. Note that constant factors and cache behavior -can be a big deal. If you have a vector that usually only contains a few -elements (but could contain many), for example, it's much better to use -SmallVector than vector -. Doing so avoids (relatively) expensive malloc/free calls, which dwarf the -cost of adding the elements to the container.

- - -

- - -

TinyPtrVector<Type> is a highly specialized collection class -that is optimized to avoid allocation in the case when a vector has zero or one -elements. It has two major restrictions: 1) it can only hold values of pointer -type, and 2) it cannot hold a null pointer.

- -

Since this container is highly specialized, it is rarely used.

- -

- - -

- "llvm/ADT/SmallVector.h" -

- -

SmallVector<Type, N> is a simple class that looks and smells -just like vector<Type>: -it supports efficient iteration, lays out elements in memory order (so you can -do pointer arithmetic between elements), supports efficient push_back/pop_back -operations, supports efficient random access to its elements, etc.

- -

The advantage of SmallVector is that it allocates space for -some number of elements (N) in the object itself. Because of this, if -the SmallVector is dynamically smaller than N, no malloc is performed. This can -be a big win in cases where the malloc/free call is far more expensive than the -code that fiddles around with the elements.

- -

This is good for vectors that are "usually small" (e.g. the number of -predecessors/successors of a block is usually less than 8). On the other hand, -this makes the size of the SmallVector itself large, so you don't want to -allocate lots of them (doing so will waste a lot of space). As such, -SmallVectors are most useful when on the stack.

- -

SmallVector also provides a nice portable and efficient replacement for -alloca.

- -

- - -

- <vector> -

- -

-std::vector is well loved and respected. It is useful when SmallVector isn't: -when the size of the vector is often large (thus the small optimization will -rarely be a benefit) or if you will be allocating many instances of the vector -itself (which would waste space for elements that aren't in the container). -vector is also useful when interfacing with code that expects vectors :). -

- -

One worthwhile note about std::vector: avoid code like this:

- -

-for ( ... ) {
-   std::vector<foo> V;
-   // make use of V.
-}
-

- -

Instead, write this as:

- -

-std::vector<foo> V;
-for ( ... ) {
-   // make use of V.
-   V.clear();
-}
-

- -

Doing so will save (at least) one heap allocation and free per iteration of -the loop.

- -

- - -

- <deque> -

- -

std::deque is, in some senses, a generalized version of std::vector. Like -std::vector, it provides constant time random access and other similar -properties, but it also provides efficient access to the front of the list. It -does not guarantee continuity of elements within memory.

- -

In exchange for this extra flexibility, std::deque has significantly higher -constant factor costs than std::vector. If possible, use std::vector or -something cheaper.

- - -

- <list> -

- -

std::list is an extremely inefficient class that is rarely useful. -It performs a heap allocation for every element inserted into it, thus having an -extremely high constant factor, particularly for small data types. std::list -also only supports bidirectional iteration, not random access iteration.

- -

In exchange for this high cost, std::list supports efficient access to both -ends of the list (like std::deque, but unlike std::vector or SmallVector). In -addition, the iterator invalidation characteristics of std::list are stronger -than that of a vector class: inserting or removing an element into the list does -not invalidate iterator or pointers to other elements in the list.

- - -

- llvm/ADT/ilist.h -

- -

ilist<T> implements an 'intrusive' doubly-linked list. It is -intrusive, because it requires the element to store and provide access to the -prev/next pointers for the list.

- -

ilist has the same drawbacks as std::list, and additionally -requires an ilist_traits implementation for the element type, but it -provides some novel characteristics. In particular, it can efficiently store -polymorphic objects, the traits class is informed when an element is inserted or -removed from the list, and ilists are guaranteed to support a -constant-time splice operation.

- -

These properties are exactly what we want for things like -Instructions and basic blocks, which is why these are implemented with -ilists.

- -Related classes of interest are explained in the following subsections: -

ilist_traits
iplist
llvm/ADT/ilist_node.h
Sentinels

- - -

- llvm/ADT/PackedVector.h -

- -

-Useful for storing a vector of values using only a few number of bits for each -value. Apart from the standard operations of a vector-like container, it can -also perform an 'or' set operation. -

- -

For example:

- -

-enum State {
-    None = 0x0,
-    FirstCondition = 0x1,
-    SecondCondition = 0x2,
-    Both = 0x3
-};
-
-State get() {
-    PackedVector<State, 2> Vec1;
-    Vec1.push_back(FirstCondition);
-
-    PackedVector<State, 2> Vec2;
-    Vec2.push_back(SecondCondition);
-
-    Vec1 |= Vec2;
-    return Vec1[0]; // returns 'Both'.
-}
-

- -

- -

ilists have another specialty that must be considered. To be a good -citizen in the C++ ecosystem, it needs to support the standard container -operations, such as begin and end iterators, etc. Also, the -operator-- must work correctly on the end iterator in the -case of non-empty ilists.

- -

The only sensible solution to this problem is to allocate a so-called -sentinel along with the intrusive list, which serves as the end -iterator, providing the back-link to the last element. However conforming to the -C++ convention it is illegal to operator++ beyond the sentinel and it -also must not be dereferenced.

- -

These constraints allow for some implementation freedom to the ilist -how to allocate and store the sentinel. The corresponding policy is dictated -by ilist_traits<T>. By default a T gets heap-allocated -whenever the need for a sentinel arises.

- -

While the default policy is sufficient in most cases, it may break down when -T does not provide a default constructor. Also, in the case of many -instances of ilists, the memory overhead of the associated sentinels -is wasted. To alleviate the situation with numerous and voluminous -T-sentinels, sometimes a trick is employed, leading to ghostly -sentinels.

- -

Ghostly sentinels are obtained by specially-crafted ilist_traits<T> -which superpose the sentinel with the ilist instance in memory. Pointer -arithmetic is used to obtain the sentinel, which is relative to the -ilist's this pointer. The ilist is augmented by an -extra pointer, which serves as the back-link of the sentinel. This is the only -field in the ghostly sentinel which can be legally accessed.

- - -

- Other Sequential Container options -

- -

Other STL containers are available, such as std::string.

- -

There are also various STL adapter classes such as std::queue, -std::priority_queue, std::stack, etc. These provide simplified access to an -underlying container but don't affect the cost of the container itself.

- -

- - -

- String-like containers -

- -

-There are a variety of ways to pass around and use strings in C and C++, and -LLVM adds a few new options to choose from. Pick the first option on this list -that will do what you need, they are ordered according to their relative cost. -

-Note that is is generally preferred to not pass strings around as -"const char*"'s. These have a number of problems, including the fact -that they cannot represent embedded nul ("\0") characters, and do not have a -length available efficiently. The general replacement for 'const -char*' is StringRef. -

- -

For more information on choosing string containers for APIs, please see -Passing strings.

- - - -

- llvm/ADT/StringRef.h -

- -

-The StringRef class is a simple value class that contains a pointer to a -character and a length, and is quite related to the ArrayRef class (but specialized for arrays of -characters). Because StringRef carries a length with it, it safely handles -strings with embedded nul characters in it, getting the length does not require -a strlen call, and it even has very convenient APIs for slicing and dicing the -character range that it represents. -

- -

-StringRef is ideal for passing simple strings around that are known to be live, -either because they are C string literals, std::string, a C array, or a -SmallVector. Each of these cases has an efficient implicit conversion to -StringRef, which doesn't result in a dynamic strlen being executed. -

- -

StringRef has a few major limitations which make more powerful string -containers useful:

- -

You cannot directly convert a StringRef to a 'const char*' because there is -no way to add a trailing nul (unlike the .c_str() method on various stronger -classes).
StringRef doesn't own or keep alive the underlying string bytes. -As such it can easily lead to dangling pointers, and is not suitable for -embedding in datastructures in most cases (instead, use an std::string or -something like that).
For the same reason, StringRef cannot be used as the return value of a -method if the method "computes" the result string. Instead, use -std::string.
StringRef's do not allow you to mutate the pointed-to string bytes and it -doesn't allow you to insert or remove bytes from the range. For editing -operations like this, it interoperates with the Twine class.

- -

Because of its strengths and limitations, it is very common for a function to -take a StringRef and for a method on an object to return a StringRef that -points into some string that it owns.

- -

- - -

- llvm/ADT/Twine.h -

- -

- The Twine class is used as an intermediary datatype for APIs that want to take - a string that can be constructed inline with a series of concatenations. - Twine works by forming recursive instances of the Twine datatype (a simple - value object) on the stack as temporary objects, linking them together into a - tree which is then linearized when the Twine is consumed. Twine is only safe - to use as the argument to a function, and should always be a const reference, - e.g.: -

- -

-    void foo(const Twine &T);
-    ...
-    StringRef X = ...
-    unsigned i = ...
-    foo(X + "." + Twine(i));
-

- -

This example forms a string like "blarg.42" by concatenating the values - together, and does not form intermediate strings containing "blarg" or - "blarg.". -

- -

Because Twine is constructed with temporary objects on the stack, and - because these instances are destroyed at the end of the current statement, - it is an inherently dangerous API. For example, this simple variant contains - undefined behavior and will probably crash:

- -

-    void foo(const Twine &T);
-    ...
-    StringRef X = ...
-    unsigned i = ...
-    const Twine &Tmp = X + "." + Twine(i);
-    foo(Tmp);
-

- -

... because the temporaries are destroyed before the call. That said, - Twine's are much more efficient than intermediate std::string temporaries, and - they work really well with StringRef. Just be aware of their limitations.

- -

- - - -

- llvm/ADT/SmallString.h -

- -

SmallString is a subclass of SmallVector that -adds some convenience APIs like += that takes StringRef's. SmallString avoids -allocating memory in the case when the preallocated space is enough to hold its -data, and it calls back to general heap allocation when required. Since it owns -its data, it is very safe to use and supports full mutation of the string.

- -

Like SmallVector's, the big downside to SmallString is their sizeof. While -they are optimized for small strings, they themselves are not particularly -small. This means that they work great for temporary scratch buffers on the -stack, but should not generally be put into the heap: it is very rare to -see a SmallString as the member of a frequently-allocated heap data structure -or returned by-value. -

- -

- - -

- std::string -

- -

The standard C++ std::string class is a very general class that (like - SmallString) owns its underlying data. sizeof(std::string) is very reasonable - so it can be embedded into heap data structures and returned by-value. - On the other hand, std::string is highly inefficient for inline editing (e.g. - concatenating a bunch of stuff together) and because it is provided by the - standard library, its performance characteristics depend a lot of the host - standard library (e.g. libc++ and MSVC provide a highly optimized string - class, GCC contains a really slow implementation). -

- -

The major disadvantage of std::string is that almost every operation that - makes them larger can allocate memory, which is slow. As such, it is better - to use SmallVector or Twine as a scratch buffer, but then use std::string to - persist the result.

- -

-DenseSet is a simple quadratically probed hash table. It excels at supporting -small values: it uses a single allocation to hold all of the pairs that -are currently inserted in the set. DenseSet is a great way to unique small -values that are not simple pointers (use SmallPtrSet for pointers). Note that DenseSet has -the same requirements for the value type that DenseMap has. -

- -

- - -

- "llvm/ADT/SparseSet.h" -

- -

SparseSet holds a small number of objects identified by unsigned keys of -moderate size. It uses a lot of memory, but provides operations that are -almost as fast as a vector. Typical keys are physical registers, virtual -registers, or numbered basic blocks.

- -

SparseSet is useful for algorithms that need very fast clear/find/insert/erase -and fast iteration over small sets. It is not intended for building composite -data structures.

- -

- - -

- "llvm/ADT/FoldingSet.h" -

- -

-FoldingSet is an aggregate class that is really good at uniquing -expensive-to-create or polymorphic objects. It is a combination of a chained -hash table with intrusive links (uniqued objects are required to inherit from -FoldingSetNode) that uses SmallVector as part of -its ID process.

- -

Consider a case where you want to implement a "getOrCreateFoo" method for -a complex object (for example, a node in the code generator). The client has a -description of *what* it wants to generate (it knows the opcode and all the -operands), but we don't want to 'new' a node, then try inserting it into a set -only to find out it already exists, at which point we would have to delete it -and return the node that already exists. -

- -

To support this style of client, FoldingSet perform a query with a -FoldingSetNodeID (which wraps SmallVector) that can be used to describe the -element that we want to query for. The query either returns the element -matching the ID or it returns an opaque ID that indicates where insertion should -take place. Construction of the ID usually does not require heap traffic.

- -

Because FoldingSet uses intrusive links, it can support polymorphic objects -in the set (for example, you can have SDNode instances mixed with LoadSDNodes). -Because the elements are individually allocated, pointers to the elements are -stable: inserting or removing elements does not invalidate any pointers to other -elements. -

- -

- - -

- <set> -

- -

std::set is a reasonable all-around set class, which is decent at -many things but great at nothing. std::set allocates memory for each element -inserted (thus it is very malloc intensive) and typically stores three pointers -per element in the set (thus adding a large amount of per-element space -overhead). It offers guaranteed log(n) performance, which is not particularly -fast from a complexity standpoint (particularly if the elements of the set are -expensive to compare, like strings), and has extremely high constant factors for -lookup, insertion and removal.

- -

The advantages of std::set are that its iterators are stable (deleting or -inserting an element from the set does not affect iterators or pointers to other -elements) and that iteration over the set is guaranteed to be in sorted order. -If the elements in the set are large, then the relative overhead of the pointers -and malloc traffic is not a big deal, but if the elements of the set are small, -std::set is almost never a good choice.

- -

- - -

- "llvm/ADT/SetVector.h" -

- -

LLVM's SetVector<Type> is an adapter class that combines your choice of -a set-like container along with a Sequential -Container. The important property -that this provides is efficient insertion with uniquing (duplicate elements are -ignored) with iteration support. It implements this by inserting elements into -both a set-like container and the sequential container, using the set-like -container for uniquing and the sequential container for iteration. -

- -

The difference between SetVector and other sets is that the order of -iteration is guaranteed to match the order of insertion into the SetVector. -This property is really important for things like sets of pointers. Because -pointer values are non-deterministic (e.g. vary across runs of the program on -different machines), iterating over the pointers in the set will -not be in a well-defined order.

- -

-The drawback of SetVector is that it requires twice as much space as a normal -set and has the sum of constant factors from the set-like container and the -sequential container that it uses. Use it *only* if you need to iterate over -the elements in a deterministic order. SetVector is also expensive to delete -elements out of (linear time), unless you use it's "pop_back" method, which is -faster. -

- -

SetVector is an adapter class that defaults to - using std::vector and a size 16 SmallSet for the underlying - containers, so it is quite expensive. However, - "llvm/ADT/SetVector.h" also provides a SmallSetVector - class, which defaults to using a SmallVector and SmallSet - of a specified size. If you use this, and if your sets are dynamically - smaller than N, you will save a lot of heap traffic.

- -

- - -

- "llvm/ADT/UniqueVector.h" -

- -

-UniqueVector is similar to SetVector, but it -retains a unique ID for each element inserted into the set. It internally -contains a map and a vector, and it assigns a unique ID for each value inserted -into the set.

- -

UniqueVector is very expensive: its cost is the sum of the cost of -maintaining both the map and vector, it has high complexity, high constant -factors, and produces a lot of malloc traffic. It should be avoided.

- -

- - -

- "llvm/ADT/ImmutableSet.h" -

- -

-ImmutableSet is an immutable (functional) set implementation based on an AVL -tree. -Adding or removing elements is done through a Factory object and results in the -creation of a new ImmutableSet object. -If an ImmutableSet already exists with the given contents, then the existing one -is returned; equality is compared with a FoldingSetNodeID. -The time and space complexity of add or remove operations is logarithmic in the -size of the original set. - -

-There is no method for returning an element of the set, you can only check for -membership. - -

- - - -

- Other Set-Like Container Options -

- -

-The STL provides several other options, such as std::multiset and the various -"hash_set" like containers (whether from C++ TR1 or from the SGI library). We -never use hash_set and unordered_set because they are generally very expensive -(each insertion requires a malloc) and very non-portable. -

- -

std::multiset is useful if you're not interested in elimination of -duplicates, but has all the drawbacks of std::set. A sorted vector (where you -don't delete duplicate entries) or some other approach is almost always -better.

- -

- - -

- Map-Like Containers (std::map, DenseMap, etc) -

- -

-Map-like containers are useful when you want to associate data to a key. As -usual, there are a lot of different ways to do this. :) - - -

- A sorted 'vector' -

- -

-If your usage pattern follows a strict insert-then-query approach, you can -trivially use the same approach as sorted vectors -for set-like containers. The only difference is that your query function -(which uses std::lower_bound to get efficient log(n) lookup) should only compare -the key, not both the key and value. This yields the same advantages as sorted -vectors for sets. -

- - -

- "llvm/ADT/StringMap.h" -

- -

-Strings are commonly used as keys in maps, and they are difficult to support -efficiently: they are variable length, inefficient to hash and compare when -long, expensive to copy, etc. StringMap is a specialized container designed to -cope with these issues. It supports mapping an arbitrary range of bytes to an -arbitrary other object.

- -

The StringMap implementation uses a quadratically-probed hash table, where -the buckets store a pointer to the heap allocated entries (and some other -stuff). The entries in the map must be heap allocated because the strings are -variable length. The string data (key) and the element object (value) are -stored in the same allocation with the string data immediately after the element -object. This container guarantees the "(char*)(&Value+1)" points -to the key string for a value.

- -

The StringMap is very fast for several reasons: quadratic probing is very -cache efficient for lookups, the hash value of strings in buckets is not -recomputed when looking up an element, StringMap rarely has to touch the -memory for unrelated objects when looking up a value (even when hash collisions -happen), hash table growth does not recompute the hash values for strings -already in the table, and each pair in the map is store in a single allocation -(the string data is stored in the same allocation as the Value of a pair).

- -

StringMap also provides query methods that take byte ranges, so it only ever -copies a string if a value is inserted into the table.

- -

StringMap iteratation order, however, is not guaranteed to be deterministic, -so any uses which require that should instead use a std::map.

- - -

- "llvm/ADT/IndexedMap.h" -

- -

-IndexedMap is a specialized container for mapping small dense integers (or -values that can be mapped to small dense integers) to some other type. It is -internally implemented as a vector with a mapping function that maps the keys to -the dense integer range. -

- -

-This is useful for cases like virtual registers in the LLVM code generator: they -have a dense mapping that is offset by a compile-time constant (the first -virtual register ID).

- -

- - -

- "llvm/ADT/DenseMap.h" -

- -

-DenseMap is a simple quadratically probed hash table. It excels at supporting -small keys and values: it uses a single allocation to hold all of the pairs that -are currently inserted in the map. DenseMap is a great way to map pointers to -pointers, or map other small types to each other. -

- -

-There are several aspects of DenseMap that you should be aware of, however. The -iterators in a DenseMap are invalidated whenever an insertion occurs, unlike -map. Also, because DenseMap allocates space for a large number of key/value -pairs (it starts with 64 by default), it will waste a lot of space if your keys -or values are large. Finally, you must implement a partial specialization of -DenseMapInfo for the key that you want, if it isn't already supported. This -is required to tell DenseMap about two special marker values (which can never be -inserted into the map) that it needs internally.

- -

-DenseMap's find_as() method supports lookup operations using an alternate key -type. This is useful in cases where the normal key type is expensive to -construct, but cheap to compare against. The DenseMapInfo is responsible for -defining the appropriate comparison and hashing methods for each alternate -key type used. -

- -

- -

-ImmutableMap is an immutable (functional) map implementation based on an AVL -tree. -Adding or removing elements is done through a Factory object and results in the -creation of a new ImmutableMap object. -If an ImmutableMap already exists with the given key set, then the existing one -is returned; equality is compared with a FoldingSetNodeID. -The time and space complexity of add or remove operations is logarithmic in the -size of the original map. - -

- - -

- Other Map-Like Container Options -

- -

-The STL provides several other options, such as std::multimap and the various -"hash_map" like containers (whether from C++ TR1 or from the SGI library). We -never use hash_set and unordered_set because they are generally very expensive -(each insertion requires a malloc) and very non-portable.

- -

std::multimap is useful if you want to map a key to multiple values, but has -all the drawbacks of std::map. A sorted vector or some other approach is almost -always better.

- -

- - -

- Bit storage containers (BitVector, SparseBitVector) -

- -

Unlike the other containers, there are only two bit storage containers, and -choosing when to use each is relatively straightforward.

- -

One additional option is -std::vector<bool>: we discourage its use for two reasons 1) the -implementation in many common compilers (e.g. commonly available versions of -GCC) is extremely inefficient and 2) the C++ standards committee is likely to -deprecate this container and/or change it significantly somehow. In any case, -please don't use it.

- - -

- BitVector -

- -

The BitVector container provides a dynamic size set of bits for manipulation. -It supports individual bit setting/testing, as well as set operations. The set -operations take time O(size of bitvector), but operations are performed one word -at a time, instead of one bit at a time. This makes the BitVector very fast for -set operations compared to other containers. Use the BitVector when you expect -the number of set bits to be high (IE a dense set). -

- - -

- SmallBitVector -

- -

The SmallBitVector container provides the same interface as BitVector, but -it is optimized for the case where only a small number of bits, less than -25 or so, are needed. It also transparently supports larger bit counts, but -slightly less efficiently than a plain BitVector, so SmallBitVector should -only be used when larger counts are rare. -

- -

-At this time, SmallBitVector does not support set operations (and, or, xor), -and its operator[] does not provide an assignable lvalue. -

- - -

- -

It's quite common to have a Function instance that you'd like to -transform in some way; in particular, you'd like to manipulate its -BasicBlocks. To facilitate this, you'll need to iterate over all of -the BasicBlocks that constitute the Function. The following is -an example that prints the name of a BasicBlock and the number of -Instructions it contains:

- -

-// func is a pointer to a Function instance
-for (Function::iterator i = func->begin(), e = func->end(); i != e; ++i)
-  // Print out the name of the basic block if it has one, and then the
-  // number of instructions that it contains
-  errs() << "Basic block (name=" << i->getName() << ") has "
-             << i->size() << " instructions.\n";
-

- -

Note that i can be used as if it were a pointer for the purposes of -invoking member functions of the Instruction class. This is -because the indirection operator is overloaded for the iterator -classes. In the above code, the expression i->size() is -exactly equivalent to (*i).size() just like you'd expect.

- -

- - -

- Iterating over the `Instruction`s in a `BasicBlock` -

- -

Just like when dealing with BasicBlocks in Functions, it's -easy to iterate over the individual instructions that make up -BasicBlocks. Here's a code snippet that prints out each instruction in -a BasicBlock:

- -

-// blk is a pointer to a BasicBlock instance
-for (BasicBlock::iterator i = blk->begin(), e = blk->end(); i != e; ++i)
-   // The next statement works since operator<<(ostream&,...)
-   // is overloaded for Instruction&
-   errs() << *i << "\n";
-

- -

However, this isn't really the best way to print out the contents of a -BasicBlock! Since the ostream operators are overloaded for virtually -anything you'll care about, you could have just invoked the print routine on the -basic block itself: errs() << *blk << "\n";.

- -

- - -

- Iterating over the `Instruction`s in a `Function` -

- -

If you're finding that you commonly iterate over a Function's -BasicBlocks and then that BasicBlock's Instructions, -InstIterator should be used instead. You'll need to include llvm/Support/InstIterator.h, -and then instantiate InstIterators explicitly in your code. Here's a -small example that shows how to dump all instructions in a function to the standard error stream:

- -

-#include "llvm/Support/InstIterator.h"
-
-// F is a pointer to a Function instance
-for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
-  errs() << *I << "\n";
-

- -

Easy, isn't it? You can also use InstIterators to fill a -work list with its initial contents. For example, if you wanted to -initialize a work list to contain all instructions in a Function -F, all you would need to do is something like:

- -

-std::set<Instruction*> worklist;
-// or better yet, SmallPtrSet<Instruction*, 64> worklist;
-
-for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
-   worklist.insert(&*I);
-

- -

The STL set worklist would now contain all instructions in the -Function pointed to by F.

- -

- - -

- Turning an iterator into a class pointer (and - vice-versa) -

- -

Sometimes, it'll be useful to grab a reference (or pointer) to a class -instance when all you've got at hand is an iterator. Well, extracting -a reference or a pointer from an iterator is very straight-forward. -Assuming that i is a BasicBlock::iterator and j -is a BasicBlock::const_iterator:

- -

-Instruction& inst = *i;   // Grab reference to instruction reference
-Instruction* pinst = &*i; // Grab pointer to instruction reference
-const Instruction& inst = *j;
-

- -

However, the iterators you'll be working with in the LLVM framework are -special: they will automatically convert to a ptr-to-instance type whenever they -need to. Instead of dereferencing the iterator and then taking the address of -the result, you can simply assign the iterator to the proper pointer type and -you get the dereference and address-of operation as a result of the assignment -(behind the scenes, this is a result of overloading casting mechanisms). Thus -the last line of the last example,

- -

-Instruction *pinst = &*i;
-

- -

is semantically equivalent to

- -

-Instruction *pinst = i;
-

- -

It's also possible to turn a class pointer into the corresponding iterator, -and this is a constant time operation (very efficient). The following code -snippet illustrates use of the conversion constructors provided by LLVM -iterators. By using these, you can explicitly grab the iterator of something -without actually obtaining it via iteration over some structure:

- -

-void printNextInstruction(Instruction* inst) {
-  BasicBlock::iterator it(inst);
-  ++it; // After this line, it refers to the instruction after *inst
-  if (it != inst->getParent()->end()) errs() << *it << "\n";
-}
-

- -

Unfortunately, these implicit conversions come at a cost; they prevent -these iterators from conforming to standard iterator conventions, and thus -from being usable with standard algorithms and containers. For example, they -prevent the following code, where B is a BasicBlock, -from compiling:

- -

-  llvm::SmallVector<llvm::Instruction *, 16>(B->begin(), B->end());
-

- -

Because of this, these implicit conversions may be removed some day, -and operator* changed to return a pointer instead of a reference.

- -

- - -

- Finding call sites: a slightly more complex - example -

- -

Say that you're writing a FunctionPass and would like to count all the -locations in the entire module (that is, across every Function) where a -certain function (i.e., some Function*) is already in scope. As you'll -learn later, you may want to use an InstVisitor to accomplish this in a -much more straight-forward manner, but this example will allow us to explore how -you'd do it if you didn't have InstVisitor around. In pseudo-code, this -is what we want to do:

- -

-initialize callCounter to zero
-for each Function f in the Module
-  for each BasicBlock b in f
-    for each Instruction i in b
-      if (i is a CallInst and calls the given function)
-        increment callCounter
-

- -

And the actual code is (remember, because we're writing a -FunctionPass, our FunctionPass-derived class simply has to -override the runOnFunction method):

- -

-Function* targetFunc = ...;
-
-class OurFunctionPass : public FunctionPass {
-  public:
-    OurFunctionPass(): callCounter(0) { }
-
-    virtual runOnFunction(Function& F) {
-      for (Function::iterator b = F.begin(), be = F.end(); b != be; ++b) {
-        for (BasicBlock::iterator i = b->begin(), ie = b->end(); i != ie; ++i) {
-          if (CallInst* callInst = dyn_cast<CallInst>(&*i)) {
-            // We know we've encountered a call instruction, so we
-            // need to determine if it's a call to the
-            // function pointed to by m_func or not.
-            if (callInst->getCalledFunction() == targetFunc)
-              ++callCounter;
-          }
-        }
-      }
-    }
-
-  private:
-    unsigned callCounter;
-};
-

- -

- - -

- Treating calls and invokes the same way -

- -

You may have noticed that the previous example was a bit oversimplified in -that it did not deal with call sites generated by 'invoke' instructions. In -this, and in other situations, you may find that you want to treat -CallInsts and InvokeInsts the same way, even though their -most-specific common base class is Instruction, which includes lots of -less closely-related things. For these cases, LLVM provides a handy wrapper -class called CallSite. -It is essentially a wrapper around an Instruction pointer, with some -methods that provide functionality common to CallInsts and -InvokeInsts.

- -

This class has "value semantics": it should be passed by value, not by -reference and it should not be dynamically allocated or deallocated using -operator new or operator delete. It is efficiently copyable, -assignable and constructable, with costs equivalents to that of a bare pointer. -If you look at its definition, it has only a single pointer member.

- -

- - -

- Iterating over def-use & use-def chains -

- -

Frequently, we might have an instance of the Value Class and we want to -determine which Users use the Value. The list of all -Users of a particular Value is called a def-use chain. -For example, let's say we have a Function* named F to a -particular function foo. Finding all of the instructions that -use foo is as simple as iterating over the def-use chain -of F:

- -

-Function *F = ...;
-
-for (Value::use_iterator i = F->use_begin(), e = F->use_end(); i != e; ++i)
-  if (Instruction *Inst = dyn_cast<Instruction>(*i)) {
-    errs() << "F is used in instruction:\n";
-    errs() << *Inst << "\n";
-  }
-

- -

Note that dereferencing a Value::use_iterator is not a very cheap -operation. Instead of performing *i above several times, consider -doing it only once in the loop body and reusing its result.

- -

Alternatively, it's common to have an instance of the User Class and need to know what -Values are used by it. The list of all Values used by a -User is known as a use-def chain. Instances of class -Instruction are common Users, so we might want to iterate over -all of the values that a particular instruction uses (that is, the operands of -the particular Instruction):

- -

-Instruction *pi = ...;
-
-for (User::op_iterator i = pi->op_begin(), e = pi->op_end(); i != e; ++i) {
-  Value *v = *i;
-  // ...
-}
-

- -

Declaring objects as const is an important tool of enforcing -mutation free algorithms (such as analyses, etc.). For this purpose above -iterators come in constant flavors as Value::const_use_iterator -and Value::const_op_iterator. They automatically arise when -calling use/op_begin() on const Value*s or -const User*s respectively. Upon dereferencing, they return -const Use*s. Otherwise the above patterns remain unchanged.

- -

- - -

- Iterating over predecessors & -successors of blocks -

- -

Iterating over the predecessors and successors of a block is quite easy -with the routines defined in "llvm/Support/CFG.h". Just use code like -this to iterate over all predecessors of BB:

- -

-#include "llvm/Support/CFG.h"
-BasicBlock *BB = ...;
-
-for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {
-  BasicBlock *Pred = *PI;
-  // ...
-}
-

- -

Similarly, to iterate over successors use -succ_iterator/succ_begin/succ_end.

- -

- - -

- Making simple changes -

- -

There are some primitive transformation operations present in the LLVM -infrastructure that are worth knowing about. When performing -transformations, it's fairly common to manipulate the contents of basic -blocks. This section describes some of the common methods for doing so -and gives example code.

- - -

- Creating and inserting new - `Instruction`s -

- -

Instantiating Instructions

- -

Creation of Instructions is straight-forward: simply call the -constructor for the kind of instruction to instantiate and provide the necessary -parameters. For example, an AllocaInst only requires a -(const-ptr-to) Type. Thus:

- -

-AllocaInst* ai = new AllocaInst(Type::Int32Ty);
-

- -

will create an AllocaInst instance that represents the allocation of -one integer in the current stack frame, at run time. Each Instruction -subclass is likely to have varying default parameters which change the semantics -of the instruction, so refer to the doxygen documentation for the subclass of -Instruction that you're interested in instantiating.

- -

Naming values

- -

It is very useful to name the values of instructions when you're able to, as -this facilitates the debugging of your transformations. If you end up looking -at generated LLVM machine code, you definitely want to have logical names -associated with the results of instructions! By supplying a value for the -Name (default) parameter of the Instruction constructor, you -associate a logical name with the result of the instruction's execution at -run time. For example, say that I'm writing a transformation that dynamically -allocates space for an integer on the stack, and that integer is going to be -used as some kind of index by some other code. To accomplish this, I place an -AllocaInst at the first point in the first BasicBlock of some -Function, and I'm intending to use it within the same -Function. I might do:

- -

-AllocaInst* pa = new AllocaInst(Type::Int32Ty, 0, "indexLoc");
-

- -

where indexLoc is now the logical name of the instruction's -execution value, which is a pointer to an integer on the run time stack.

- -

Inserting instructions

- -

There are essentially two ways to insert an Instruction -into an existing sequence of instructions that form a BasicBlock:

- -

Insertion into an explicit instruction list - -
Given a BasicBlock* pb, an Instruction* pi within that - BasicBlock, and a newly-created instruction we wish to insert - before *pi, we do the following:
- -
-
```
-BasicBlock *pb = ...;
-Instruction *pi = ...;
-Instruction *newInst = new Instruction(...);
-
-pb->getInstList().insert(pi, newInst); // Inserts newInst before pi in pb
-
```
-
- -
Appending to the end of a BasicBlock is so common that - the Instruction class and Instruction-derived - classes provide constructors which take a pointer to a - BasicBlock to be appended to. For example code that - looked like:
- -
-
```
-BasicBlock *pb = ...;
-Instruction *newInst = new Instruction(...);
-
-pb->getInstList().push_back(newInst); // Appends newInst to pb
-
```
-
- -
becomes:
- -
-
```
-BasicBlock *pb = ...;
-Instruction *newInst = new Instruction(..., pb);
-
```
-
- -
which is much cleaner, especially if you are creating - long instruction streams.
Insertion into an implicit instruction list - -
Instruction instances that are already in BasicBlocks - are implicitly associated with an existing instruction list: the instruction - list of the enclosing basic block. Thus, we could have accomplished the same - thing as the above code without being given a BasicBlock by doing: -
- -
-
```
-Instruction *pi = ...;
-Instruction *newInst = new Instruction(...);
-
-pi->getParent()->getInstList().insert(pi, newInst);
-
```
-
- -
In fact, this sequence of steps occurs so frequently that the - Instruction class and Instruction-derived classes provide - constructors which take (as a default parameter) a pointer to an - Instruction which the newly-created Instruction should - precede. That is, Instruction constructors are capable of - inserting the newly-created instance into the BasicBlock of a - provided instruction, immediately before that instruction. Using an - Instruction constructor with a insertBefore (default) - parameter, the above code becomes:
- -
-
```
-Instruction* pi = ...;
-Instruction* newInst = new Instruction(..., pi);
-
```
-
- -
which is much cleaner, especially if you're creating a lot of - instructions and adding them to BasicBlocks.

- -

- - -

- Deleting `Instruction`s -

- -

Deleting an instruction from an existing sequence of instructions that form a -BasicBlock is very straight-forward: just -call the instruction's eraseFromParent() method. For example:

- -

-Instruction *I = .. ;
-I->eraseFromParent();
-

- -

This unlinks the instruction from its containing basic block and deletes -it. If you'd just like to unlink the instruction from its containing basic -block but not delete it, you can use the removeFromParent() method.

- -

- -

You can use Value::replaceAllUsesWith and -User::replaceUsesOfWith to change more than one use at a time. See the -doxygen documentation for the Value Class -and User Class, respectively, for more -information.

- - - -

- - -

- Deleting `GlobalVariable`s -

- -

Deleting a global variable from a module is just as easy as deleting an -Instruction. First, you must have a pointer to the global variable that you wish - to delete. You use this pointer to erase it from its parent, the module. - For example:

- -

-GlobalVariable *GV = .. ;
-
-GV->eraseFromParent();
-

- -

- - -

- How to Create Types -

- -

In generating IR, you may need some complex types. If you know these types -statically, you can use TypeBuilder<...>::get(), defined -in llvm/Support/TypeBuilder.h, to retrieve them. TypeBuilder -has two forms depending on whether you're building types for cross-compilation -or native library use. TypeBuilder<T, true> requires -that T be independent of the host environment, meaning that it's built -out of types from -the llvm::types -namespace and pointers, functions, arrays, etc. built of -those. TypeBuilder<T, false> additionally allows native C types -whose size may depend on the host compiler. For example,

- -

-FunctionType *ft = TypeBuilder<types::i<8>(types::i<32>*), true>::get();
-

- -

is easier to read and write than the equivalent

- -

-std::vector<const Type*> params;
-params.push_back(PointerType::getUnqual(Type::Int32Ty));
-FunctionType *ft = FunctionType::get(Type::Int8Ty, params, false);
-

- -

See the class -comment for more details.

- -

- - -

- Threads and LLVM -

- - -

-This section describes the interaction of the LLVM APIs with multithreading, -both on the part of client applications, and in the JIT, in the hosted -application. -

- -

-Note that LLVM's support for multithreading is still relatively young. Up -through version 2.5, the execution of threaded hosted applications was -supported, but not threaded client access to the APIs. While this use case is -now supported, clients must adhere to the guidelines specified below to -ensure proper operation in multithreaded mode. -

- -

-Note that, on Unix-like platforms, LLVM requires the presence of GCC's atomic -intrinsics in order to support threaded operation. If you need a -multhreading-capable LLVM on a platform without a suitably modern system -compiler, consider compiling LLVM and LLVM-GCC in single-threaded mode, and -using the resultant compiler to build a copy of LLVM with multithreading -support. -

- - -

- Entering and Exiting Multithreaded Mode -

- -

-In order to properly protect its internal data structures while avoiding -excessive locking overhead in the single-threaded case, the LLVM must intialize -certain data structures necessary to provide guards around its internals. To do -so, the client program must invoke llvm_start_multithreaded() before -making any concurrent LLVM API calls. To subsequently tear down these -structures, use the llvm_stop_multithreaded() call. You can also use -the llvm_is_multithreaded() call to check the status of multithreaded -mode. -

- -

-Note that both of these calls must be made in isolation. That is to -say that no other LLVM API calls may be executing at any time during the -execution of llvm_start_multithreaded() or llvm_stop_multithreaded -. It's is the client's responsibility to enforce this isolation. -

- -

-The return value of llvm_start_multithreaded() indicates the success or -failure of the initialization. Failure typically indicates that your copy of -LLVM was built without multithreading support, typically because GCC atomic -intrinsics were not found in your system compiler. In this case, the LLVM API -will not be safe for concurrent calls. However, it will be safe for -hosting threaded applications in the JIT, though care -must be taken to ensure that side exits and the like do not accidentally -result in concurrent LLVM API calls. -

- - -

- Ending Execution with `llvm_shutdown()` -

- -

-When you are done using the LLVM APIs, you should call llvm_shutdown() -to deallocate memory used for internal structures. This will also invoke -llvm_stop_multithreaded() if LLVM is operating in multithreaded mode. -As such, llvm_shutdown() requires the same isolation guarantees as -llvm_stop_multithreaded(). -

- -

-Note that, if you use scope-based shutdown, you can use the -llvm_shutdown_obj class, which calls llvm_shutdown() in its -destructor. -

- - -

- Lazy Initialization with `ManagedStatic` -

- -

-ManagedStatic is a utility class in LLVM used to implement static -initialization of static resources, such as the global type tables. Before the -invocation of llvm_shutdown(), it implements a simple lazy -initialization scheme. Once llvm_start_multithreaded() returns, -however, it uses double-checked locking to implement thread-safe lazy -initialization. -

- -

-Note that, because no other threads are allowed to issue LLVM API calls before -llvm_start_multithreaded() returns, it is possible to have -ManagedStatics of llvm::sys::Mutexs. -

- -

-The llvm_acquire_global_lock() and llvm_release_global_lock -APIs provide access to the global lock used to implement the double-checked -locking for lazy initialization. These should only be used internally to LLVM, -and only if you know what you're doing! -

- - -

- Achieving Isolation with `LLVMContext` -

- -

-LLVMContext is an opaque class in the LLVM API which clients can use -to operate multiple, isolated instances of LLVM concurrently within the same -address space. For instance, in a hypothetical compile-server, the compilation -of an individual translation unit is conceptually independent from all the -others, and it would be desirable to be able to compile incoming translation -units concurrently on independent server threads. Fortunately, -LLVMContext exists to enable just this kind of scenario! -

- -

-Conceptually, LLVMContext provides isolation. Every LLVM entity -(Modules, Values, Types, Constants, etc.) -in LLVM's in-memory IR belongs to an LLVMContext. Entities in -different contexts cannot interact with each other: Modules in -different contexts cannot be linked together, Functions cannot be added -to Modules in different contexts, etc. What this means is that is is -safe to compile on multiple threads simultaneously, as long as no two threads -operate on entities within the same context. -

- -

-In practice, very few places in the API require the explicit specification of a -LLVMContext, other than the Type creation/lookup APIs. -Because every Type carries a reference to its owning context, most -other entities can determine what context they belong to by looking at their -own Type. If you are adding new entities to LLVM IR, please try to -maintain this interface design. -

- -

-For clients that do not require the benefits of isolation, LLVM -provides a convenience API getGlobalContext(). This returns a global, -lazily initialized LLVMContext that may be used in situations where -isolation is not a concern. -

- - -

- Threads and the JIT -

- -

-LLVM's "eager" JIT compiler is safe to use in threaded programs. Multiple -threads can call ExecutionEngine::getPointerToFunction() or -ExecutionEngine::runFunction() concurrently, and multiple threads can -run code output by the JIT concurrently. The user must still ensure that only -one thread accesses IR in a given LLVMContext while another thread -might be modifying it. One way to do that is to always hold the JIT lock while -accessing IR outside the JIT (the JIT modifies the IR by adding -CallbackVHs). Another way is to only -call getPointerToFunction() from the LLVMContext's thread. -

- -

When the JIT is configured to compile lazily (using -ExecutionEngine::DisableLazyCompilation(false)), there is currently a -race condition in -updating call sites after a function is lazily-jitted. It's still possible to -use the lazy JIT in a threaded program if you ensure that only one thread at a -time can call any particular lazy stub and that the JIT lock guards any IR -access, but we suggest using only the eager JIT in threaded programs. -

- -

- - -

- Advanced Topics -

- - -

-This section describes some of the advanced or obscure API's that most clients -do not need to be aware of. These API's tend manage the inner workings of the -LLVM system, and only need to be accessed in unusual circumstances. -

- - - -

- The `ValueSymbolTable` class -

- -

The -ValueSymbolTable class provides a symbol table that the Function and -Module classes use for naming value definitions. The symbol table -can provide a name for any Value. -

- -

Note that the SymbolTable class should not be directly accessed -by most clients. It should only be used when iteration over the symbol table -names themselves are required, which is very special purpose. Note that not -all LLVM -Values have names, and those without names (i.e. they have -an empty name) do not exist in the symbol table. -

- -

Symbol tables support iteration over the values in the symbol -table with begin/end/iterator and supports querying to see if a -specific name is in the symbol table (with lookup). The -ValueSymbolTable class exposes no public mutator methods, instead, -simply call setName on a value, which will autoinsert it into the -appropriate symbol table.

- -

- - - - -

- The `User` and owned `Use` classes' memory layout -

- -

The -User class provides a basis for expressing the ownership of User -towards other -Values. The -Use helper class is employed to do the bookkeeping and to facilitate O(1) -addition and removal.

- - -

- - Interaction and relationship between `User` and `Use` objects - -

- -

-A subclass of User can choose between incorporating its Use objects -or refer to them out-of-line by means of a pointer. A mixed variant -(some Uses inline others hung off) is impractical and breaks the invariant -that the Use objects belonging to the same User form a contiguous array. -

- -

-We have 2 different layouts in the User (sub)classes: -

Layout a) -The Use object(s) are inside (resp. at fixed offset) of the User -object and there are a fixed number of them.
- -
Layout b) -The Use object(s) are referenced by a pointer to an -array from the User object and there may be a variable -number of them.
-

-As of v2.4 each layout still possesses a direct pointer to the -start of the array of Uses. Though not mandatory for layout a), -we stick to this redundancy for the sake of simplicity. -The User object also stores the number of Use objects it -has. (Theoretically this information can also be calculated -given the scheme presented below.)

-Special forms of allocation operators (operator new) -enforce the following memory layouts:

- -

Layout a) is modelled by prepending the User object by the Use[] array.

- -

-...---.---.---.---.-------...
-  | P | P | P | P | User
-'''---'---'---'---'-------'''
-

- -

Layout b) is modelled by pointing at the Use[] array.

-.-------...
-| User
-'-------'''
-    |
-    v
-    .---.---.---.---...
-    | P | P | P | P |
-    '---'---'---'---'''
-

-(In the above figures 'P' stands for the Use** that - is stored in each Use object in the member Use::Prev) - -

- - -

- The waymarking algorithm -

- -

-Since the Use objects are deprived of the direct (back)pointer to -their User objects, there must be a fast and exact method to -recover it. This is accomplished by the following scheme:

- -A bit-encoding in the 2 LSBits (least significant bits) of the Use::Prev allows to find the -start of the User object: -

00 —> binary digit 0
01 —> binary digit 1
10 —> stop and calculate (s)
11 —> full stop (S)

-Given a Use*, all we have to do is to walk till we get -a stop and we either have a User immediately behind or -we have to walk to the next stop picking up digits -and calculating the offset:

-.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.----------------
-| 1 | s | 1 | 0 | 1 | 0 | s | 1 | 1 | 0 | s | 1 | 1 | s | 1 | S | User (or User*)
-'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'----------------
-    |+15                |+10            |+6         |+3     |+1
-    |                   |               |           |       |__>
-    |                   |               |           |__________>
-    |                   |               |______________________>
-    |                   |______________________________________>
-    |__________________________________________________________>
-

-Only the significant number of bits need to be stored between the -stops, so that the worst case is 20 memory accesses when there are -1000 Use objects associated with a User.

- -

- - -

- Reference implementation -

- -

-The following literate Haskell fragment demonstrates the concept:

- -

-> import Test.QuickCheck
-> 
-> digits :: Int -> [Char] -> [Char]
-> digits 0 acc = '0' : acc
-> digits 1 acc = '1' : acc
-> digits n acc = digits (n `div` 2) $ digits (n `mod` 2) acc
-> 
-> dist :: Int -> [Char] -> [Char]
-> dist 0 [] = ['S']
-> dist 0 acc = acc
-> dist 1 acc = let r = dist 0 acc in 's' : digits (length r) r
-> dist n acc = dist (n - 1) $ dist 1 acc
-> 
-> takeLast n ss = reverse $ take n $ reverse ss
-> 
-> test = takeLast 40 $ dist 20 []
-> 
-

-Printing <test> gives: "1s100000s11010s10100s1111s1010s110s11s1S"

-The reverse algorithm computes the length of the string just by examining -a certain prefix:

- -

-> pref :: [Char] -> Int
-> pref "S" = 1
-> pref ('s':'1':rest) = decode 2 1 rest
-> pref (_:rest) = 1 + pref rest
-> 
-> decode walk acc ('0':rest) = decode (walk + 1) (acc * 2) rest
-> decode walk acc ('1':rest) = decode (walk + 1) (acc * 2 + 1) rest
-> decode walk acc _ = walk + acc
-> 
-

-Now, as expected, printing <pref test> gives 40.

-We can quickCheck this with following property:

- -

-> testcase = dist 2000 []
-> testcaseLength = length testcase
-> 
-> identityProp n = n > 0 && n <= testcaseLength ==> length arr == pref arr
->     where arr = takeLast n testcase
-> 
-

-As expected <quickCheck identityProp> gives:

- -

-*Main> quickCheck identityProp
-OK, passed 100 tests.
-

-Let's be a bit more exhaustive:

- -

-> 
-> deepCheck p = check (defaultConfig { configMaxTest = 500 }) p
-> 
-

-And here is the result of <deepCheck identityProp>:

- -

-*Main> deepCheck identityProp
-OK, passed 500 tests.
-

- -

- - -

- Tagging considerations -

- -

-To maintain the invariant that the 2 LSBits of each Use** in Use -never change after being set up, setters of Use::Prev must re-tag the -new Use** on every modification. Accordingly getters must strip the -tag bits.

-For layout b) instead of the User we find a pointer (User* with LSBit set). -Following this pointer brings us to the User. A portable trick ensures -that the first bytes of User (if interpreted as a pointer) never has -the LSBit set. (Portability is relying on the fact that all known compilers place the -vptr in the first word of the instances.)

- -

- - -

- The Core LLVM Class Hierarchy Reference -

- - -

#include "llvm/Type.h" -
doxygen info: Type Class

- -

The Core LLVM classes are the primary means of representing the program -being inspected or transformed. The core LLVM classes are defined in -header files in the include/llvm/ directory, and implemented in -the lib/VMCore directory.

- - -

- The `Type` class and Derived Types -

- -

Type is a superclass of all type classes. Every Value has - a Type. Type cannot be instantiated directly but only - through its subclasses. Certain primitive types (VoidType, - LabelType, FloatType and DoubleType) have hidden - subclasses. They are hidden because they offer no useful functionality beyond - what the Type class offers except to distinguish themselves from - other subclasses of Type.

All other types are subclasses of DerivedType. Types can be - named, but this is not a requirement. There exists exactly - one instance of a given shape at any one time. This allows type equality to - be performed with address equality of the Type Instance. That is, given two - Type* values, the types are identical if the pointers are identical. -

- - -

- Important Public Methods -

- -

bool isIntegerTy() const: Returns true for any integer type.
bool isFloatingPointTy(): Return true if this is one of the five - floating point types.
bool isSized(): Return true if the type has known size. Things - that don't have a size are abstract types, labels and void.

- - -

- Important Derived Types -

IntegerType

Subclass of DerivedType that represents integer types of any bit width. - Any bit width between IntegerType::MIN_INT_BITS (1) and - IntegerType::MAX_INT_BITS (~8 million) can be represented. -

static const IntegerType* get(unsigned NumBits): get an integer - type of a specific bit width.
unsigned getBitWidth() const: Get the bit width of an integer - type.

SequentialType

This is subclassed by ArrayType, PointerType and VectorType. -

const Type * getElementType() const: Returns the type of each - of the elements in the sequential type.

ArrayType

This is a subclass of SequentialType and defines the interface for array - types. -

unsigned getNumElements() const: Returns the number of - elements in the array.

PointerType

Subclass of SequentialType for pointer types.

VectorType

Subclass of SequentialType for vector types. A - vector type is similar to an ArrayType but is distinguished because it is - a first class type whereas ArrayType is not. Vector types are used for - vector operations and are usually small vectors of of an integer or floating - point type.

StructType

Subclass of DerivedTypes for struct types.

FunctionType

Subclass of DerivedTypes for function types. -

bool isVarArg() const: Returns true if it's a vararg - function
const Type * getReturnType() const: Returns the - return type of the function.
const Type * getParamType (unsigned i): Returns - the type of the ith parameter.
const unsigned getNumParams() const: Returns the - number of formal parameters.

- -

- - -

- The `Module` class -

- -

#include "llvm/Module.h"
doxygen info: -Module Class

- -

The Module class represents the top level structure present in LLVM -programs. An LLVM module is effectively either a translation unit of the -original program or a combination of several translation units merged by the -linker. The Module class keeps track of a list of Functions, a list of GlobalVariables, and a SymbolTable. Additionally, it contains a few -helpful member functions that try to make common operations easy.

- - -

- Important Public Members of the `Module` class -

- -

Module::Module(std::string name = "") - -
Constructing a Module is easy. You can optionally -provide a name for it (probably based on the name of the translation unit).
-
Module::iterator - Typedef for function list iterator
- Module::const_iterator - Typedef for const_iterator.
- - begin(), end() - size(), empty() - -
These are forwarding methods that make it easy to access the contents of - a Module object's Function - list.
Module::FunctionListType &getFunctionList() - -
Returns the list of Functions. This is - necessary to use when you need to update the list or perform a complex - action that doesn't have a forwarding method.
- -

- -

Module::global_iterator - Typedef for global variable list iterator
- - Module::const_global_iterator - Typedef for const_iterator.
- - global_begin(), global_end() - global_size(), global_empty() - -
These are forwarding methods that make it easy to access the contents of - a Module object's GlobalVariable list.
Module::GlobalListType &getGlobalList() - -
Returns the list of GlobalVariables. This is necessary to - use when you need to update the list or perform a complex action that - doesn't have a forwarding method.
- -

- -

SymbolTable *getSymbolTable() - -
Return a reference to the SymbolTable - for this Module.
- -

- -

Function *getFunction(StringRef Name) const - - -
Look up the specified function in the Module SymbolTable. If it does not exist, return - null.
Function *getOrInsertFunction(const - std::string &Name, const FunctionType *T) - -
Look up the specified function in the Module SymbolTable. If it does not exist, add an - external declaration for the function and return it.
std::string getTypeName(const Type *Ty) - -
If there is at least one entry in the SymbolTable for the specified Type, return it. Otherwise return the empty - string.
bool addTypeName(const std::string &Name, const Type *Ty) - -
Insert an entry in the SymbolTable - mapping Name to Ty. If there is already an entry for this - name, true is returned and the SymbolTable is not modified.

- -

- - -

- The `Value` class -

- -

#include "llvm/Value.h" -
-doxygen info: Value Class

- -

The Value class is the most important class in the LLVM Source -base. It represents a typed value that may be used (among other things) as an -operand to an instruction. There are many different types of Values, -such as Constants,Arguments. Even Instructions and Functions are Values.

- -

A particular Value may be used many times in the LLVM representation -for a program. For example, an incoming argument to a function (represented -with an instance of the Argument class) is "used" by -every instruction in the function that references the argument. To keep track -of this relationship, the Value class keeps a list of all of the Users that is using it (the User class is a base class for all nodes in the LLVM -graph that can refer to Values). This use list is how LLVM represents -def-use information in the program, and is accessible through the use_* -methods, shown below.

- -

Because LLVM is a typed representation, every LLVM Value is typed, -and this Type is available through the getType() -method. In addition, all LLVM values can be named. The "name" of the -Value is a symbolic string printed in the LLVM code:

- -

-%foo = add i32 1, 2
-

- -

The name of this instruction is "foo". NOTE -that the name of any value may be missing (an empty string), so names should -ONLY be used for debugging (making the source code easier to read, -debugging printouts), they should not be used to keep track of values or map -between them. For this purpose, use a std::map of pointers to the -Value itself instead.

- -

One important aspect of LLVM is that there is no distinction between an SSA -variable and the operation that produces it. Because of this, any reference to -the value produced by an instruction (or the value available as an incoming -argument, for example) is represented as a direct pointer to the instance of -the class that -represents this value. Although this may take some getting used to, it -simplifies the representation and makes it easier to manipulate.

- - -

- Important Public Members of the `Value` class -

- -

Value::use_iterator - Typedef for iterator over the -use-list
- Value::const_use_iterator - Typedef for const_iterator over -the use-list
- unsigned use_size() - Returns the number of users of the -value.
- bool use_empty() - Returns true if there are no users.
- use_iterator use_begin() - Get an iterator to the start of -the use-list.
- use_iterator use_end() - Get an iterator to the end of the -use-list.
- User *use_back() - Returns the last -element in the list. -
These methods are the interface to access the def-use -information in LLVM. As with all other iterators in LLVM, the naming -conventions follow the conventions defined by the STL.
-
Type *getType() const -
This method returns the Type of the Value.
-
bool hasName() const
- std::string getName() const
- void setName(const std::string &Name) -
This family of methods is used to access and assign a name to a Value, -be aware of the precaution above.
-
void replaceAllUsesWith(Value *V) - -
This method traverses the use list of a Value changing all Users of the current value to refer to - "V" instead. For example, if you detect that an instruction always - produces a constant value (for example through constant folding), you can - replace all uses of the instruction with the constant like this:
- -
-
```
-Inst->replaceAllUsesWith(ConstVal);
-
```
-
- -

- -

- - -

- The `User` class -

- -

-#include "llvm/User.h"
-doxygen info: User Class
-Superclass: Value

- -

The User class is the common base class of all LLVM nodes that may -refer to Values. It exposes a list of "Operands" -that are all of the Values that the User is -referring to. The User class itself is a subclass of -Value.

- -

The operands of a User point directly to the LLVM Value that it refers to. Because LLVM uses Static -Single Assignment (SSA) form, there can only be one definition referred to, -allowing this direct connection. This connection provides the use-def -information in LLVM.

- - -

- Important Public Members of the `User` class -

- -

The User class exposes the operand list in two ways: through -an index access interface and through an iterator based interface.

- -

Value *getOperand(unsigned i)
- unsigned getNumOperands() -
These two methods expose the operands of the User in a -convenient form for direct access.
User::op_iterator - Typedef for iterator over the operand -list
- op_iterator op_begin() - Get an iterator to the start of -the operand list.
- op_iterator op_end() - Get an iterator to the end of the -operand list. -
Together, these methods make up the iterator based interface to -the operands of a User.

- -

- - -

- The `Instruction` class -

- -

#include "llvm/Instruction.h"
-doxygen info: Instruction Class
-Superclasses: User, Value

- -

The Instruction class is the common base class for all LLVM -instructions. It provides only a few methods, but is a very commonly used -class. The primary data tracked by the Instruction class itself is the -opcode (instruction type) and the parent BasicBlock the Instruction is embedded -into. To represent a specific type of instruction, one of many subclasses of -Instruction are used.

- -

Because the Instruction class subclasses the User class, its operands can be accessed in the same -way as for other Users (with the -getOperand()/getNumOperands() and -op_begin()/op_end() methods).

An important file for -the Instruction class is the llvm/Instruction.def file. This -file contains some meta-data about the various different types of instructions -in LLVM. It describes the enum values that are used as opcodes (for example -Instruction::Add and Instruction::ICmp), as well as the -concrete sub-classes of Instruction that implement the instruction (for -example BinaryOperator and CmpInst). Unfortunately, the use of macros in -this file confuses doxygen, so these enum values don't show up correctly in the -doxygen output.

- - -

- - Important Subclasses of the `Instruction` class - -

BinaryOperator -
This subclasses represents all two operand instructions whose operands - must be the same type, except for the comparison instructions.
CastInst -
This subclass is the parent of the 12 casting instructions. It provides - common operations on cast instructions.
-
CmpInst -
This subclass respresents the two comparison instructions, - ICmpInst (integer opreands), and - FCmpInst (floating point operands).
-
TerminatorInst -
This subclass is the parent of all terminator instructions (those which - can terminate a block).
-

- - -

- - Important Public Members of the `Instruction` class - -

- -

BasicBlock *getParent() -
Returns the BasicBlock that -this Instruction is embedded into.
bool mayWriteToMemory() -
Returns true if the instruction writes to memory, i.e. it is a - call,free,invoke, or store.
unsigned getOpcode() -
Returns the opcode for the Instruction.
Instruction *clone() const -
Returns another instance of the specified instruction, identical -in all ways to the original except that the instruction has no parent -(ie it's not embedded into a BasicBlock), -and it has no name

- -

- - -

- The `Constant` class and subclasses -

- -

Constant represents a base class for different types of constants. It -is subclassed by ConstantInt, ConstantArray, etc. for representing -the various types of Constants. GlobalValue is also -a subclass, which represents the address of a global variable or function. -

- - -

Important Subclasses of Constant

ConstantInt : This subclass of Constant represents an integer constant of - any width. -
- const APInt& getValue() const: Returns the underlying - value of this constant, an APInt value.
- int64_t getSExtValue() const: Converts the underlying APInt - value to an int64_t via sign extension. If the value (not the bit width) - of the APInt is too large to fit in an int64_t, an assertion will result. - For this reason, use of this method is discouraged.
- uint64_t getZExtValue() const: Converts the underlying APInt - value to a uint64_t via zero extension. IF the value (not the bit width) - of the APInt is too large to fit in a uint64_t, an assertion will result. - For this reason, use of this method is discouraged.
- static ConstantInt* get(const APInt& Val): Returns the - ConstantInt object that represents the value provided by Val. - The type is implied as the IntegerType that corresponds to the bit width - of Val.
- static ConstantInt* get(const Type *Ty, uint64_t Val): - Returns the ConstantInt object that represents the value provided by - Val for integer type Ty.
-
ConstantFP : This class represents a floating point constant. -
- double getValue() const: Returns the underlying value of - this constant.
-
ConstantArray : This represents a constant array. -
- const std::vector<Use> &getValues() const: Returns - a vector of component constants that makeup this array.
-
ConstantStruct : This represents a constant struct. -
- const std::vector<Use> &getValues() const: Returns - a vector of component constants that makeup this array.
-
GlobalValue : This represents either a global variable or a function. In - either case, the value is a constant fixed address (after linking). -

- -

- - -

- The `GlobalValue` class -

- -

#include "llvm/GlobalValue.h"
-doxygen info: GlobalValue -Class
-Superclasses: Constant, -User, Value

- -

Global values (GlobalVariables or Functions) are the only LLVM values that are -visible in the bodies of all Functions. -Because they are visible at global scope, they are also subject to linking with -other globals defined in different translation units. To control the linking -process, GlobalValues know their linkage rules. Specifically, -GlobalValues know whether they have internal or external linkage, as -defined by the LinkageTypes enumeration.

- -

If a GlobalValue has internal linkage (equivalent to being -static in C), it is not visible to code outside the current translation -unit, and does not participate in linking. If it has external linkage, it is -visible to external code, and does participate in linking. In addition to -linkage information, GlobalValues keep track of which Module they are currently part of.

- -

Because GlobalValues are memory objects, they are always referred to -by their address. As such, the Type of a -global is always a pointer to its contents. It is important to remember this -when using the GetElementPtrInst instruction because this pointer must -be dereferenced first. For example, if you have a GlobalVariable (a -subclass of GlobalValue) that is an array of 24 ints, type [24 x -i32], then the GlobalVariable is a pointer to that array. Although -the address of the first element of this array and the value of the -GlobalVariable are the same, they have different types. The -GlobalVariable's type is [24 x i32]. The first element's type -is i32. Because of this, accessing a global value requires you to -dereference the pointer with GetElementPtrInst first, then its elements -can be accessed. This is explained in the LLVM -Language Reference Manual.

- - -

- - Important Public Members of the `GlobalValue` class - -

- -

bool hasInternalLinkage() const
- bool hasExternalLinkage() const
- void setInternalLinkage(bool HasInternalLinkage) -
These methods manipulate the linkage characteristics of the GlobalValue.
-

-
Module *getParent() -
This returns the Module that the -GlobalValue is currently embedded into.

- -

- - -

- The `Function` class -

- -

#include "llvm/Function.h"
doxygen -info: Function Class
-Superclasses: GlobalValue, -Constant, -User, -Value

- -

The Function class represents a single procedure in LLVM. It is -actually one of the more complex classes in the LLVM hierarchy because it must -keep track of a large amount of data. The Function class keeps track -of a list of BasicBlocks, a list of formal -Arguments, and a -SymbolTable.

- -

The list of BasicBlocks is the most -commonly used part of Function objects. The list imposes an implicit -ordering of the blocks in the function, which indicate how the code will be -laid out by the backend. Additionally, the first BasicBlock is the implicit entry node for the -Function. It is not legal in LLVM to explicitly branch to this initial -block. There are no implicit exit nodes, and in fact there may be multiple exit -nodes from a single Function. If the BasicBlock list is empty, this indicates that -the Function is actually a function declaration: the actual body of the -function hasn't been linked in yet.

- -

In addition to a list of BasicBlocks, the -Function class also keeps track of the list of formal Arguments that the function receives. This -container manages the lifetime of the Argument -nodes, just like the BasicBlock list does for -the BasicBlocks.

- -

The SymbolTable is a very rarely used -LLVM feature that is only used when you have to look up a value by name. Aside -from that, the SymbolTable is used -internally to make sure that there are not conflicts between the names of Instructions, BasicBlocks, or Arguments in the function body.

- -

Note that Function is a GlobalValue -and therefore also a Constant. The value of the function -is its address (after linking) which is guaranteed to be constant.

- - -

- - Important Public Members of the `Function` class - -

- -

Function(const FunctionType - *Ty, LinkageTypes Linkage, const std::string &N = "", Module* Parent = 0) - -
Constructor used when you need to create new Functions to add - the program. The constructor must specify the type of the function to - create and what type of linkage the function should have. The FunctionType argument - specifies the formal arguments and return value for the function. The same - FunctionType value can be used to - create multiple functions. The Parent argument specifies the Module - in which the function is defined. If this argument is provided, the function - will automatically be inserted into that module's list of - functions.
bool isDeclaration() - -
Return whether or not the Function has a body defined. If the - function is "external", it does not have a body, and thus must be resolved - by linking with a function defined in a different translation unit.
Function::iterator - Typedef for basic block list iterator
- Function::const_iterator - Typedef for const_iterator.
- - begin(), end() - size(), empty() - -
These are forwarding methods that make it easy to access the contents of - a Function object's BasicBlock - list.
Function::BasicBlockListType &getBasicBlockList() - -
Returns the list of BasicBlocks. This - is necessary to use when you need to update the list or perform a complex - action that doesn't have a forwarding method.
Function::arg_iterator - Typedef for the argument list -iterator
- Function::const_arg_iterator - Typedef for const_iterator.
- - arg_begin(), arg_end() - arg_size(), arg_empty() - -
These are forwarding methods that make it easy to access the contents of - a Function object's Argument - list.
Function::ArgumentListType &getArgumentList() - -
Returns the list of Arguments. This is - necessary to use when you need to update the list or perform a complex - action that doesn't have a forwarding method.
BasicBlock &getEntryBlock() - -
Returns the entry BasicBlock for the - function. Because the entry block for the function is always the first - block, this returns the first block of the Function.
Type *getReturnType()
- FunctionType *getFunctionType() - -
This traverses the Type of the - Function and returns the return type of the function, or the FunctionType of the actual - function.
SymbolTable *getSymbolTable() - -
Return a pointer to the SymbolTable - for this Function.

- -

- - -

- The `GlobalVariable` class -

- -

#include "llvm/GlobalVariable.h" -
-doxygen info: GlobalVariable - Class
-Superclasses: GlobalValue, -Constant, -User, -Value

- -

Global variables are represented with the (surprise surprise) -GlobalVariable class. Like functions, GlobalVariables are also -subclasses of GlobalValue, and as such are -always referenced by their address (global values must live in memory, so their -"name" refers to their constant address). See -GlobalValue for more on this. Global -variables may have an initial value (which must be a -Constant), and if they have an initializer, -they may be marked as "constant" themselves (indicating that their contents -never change at runtime).

- - -

- - Important Public Members of the `GlobalVariable` class - -

- -

GlobalVariable(const Type *Ty, bool - isConstant, LinkageTypes& Linkage, Constant - *Initializer = 0, const std::string &Name = "", Module* Parent = 0) - -
Create a new global variable of the specified type. If - isConstant is true then the global variable will be marked as - unchanging for the program. The Linkage parameter specifies the type of - linkage (internal, external, weak, linkonce, appending) for the variable. - If the linkage is InternalLinkage, WeakAnyLinkage, WeakODRLinkage, - LinkOnceAnyLinkage or LinkOnceODRLinkage, then the resultant - global variable will have internal linkage. AppendingLinkage concatenates - together all instances (in different translation units) of the variable - into a single variable but is only applicable to arrays. See - the LLVM Language Reference for - further details on linkage types. Optionally an initializer, a name, and the - module to put the variable into may be specified for the global variable as - well.
bool isConstant() const - -
Returns true if this is a global variable that is known not to - be modified at runtime.
bool hasInitializer() - -
Returns true if this GlobalVariable has an intializer.
Constant *getInitializer() - -
Returns the initial value for a GlobalVariable. It is not legal - to call this method if there is no initializer.

- -

- - -

- The `BasicBlock` class -

- -

#include "llvm/BasicBlock.h"
-doxygen info: BasicBlock -Class
-Superclass: Value

- -

This class represents a single entry single exit section of the code, -commonly known as a basic block by the compiler community. The -BasicBlock class maintains a list of Instructions, which form the body of the block. -Matching the language definition, the last element of this list of instructions -is always a terminator instruction (a subclass of the TerminatorInst class).

- -

In addition to tracking the list of instructions that make up the block, the -BasicBlock class also keeps track of the Function that it is embedded into.

- -

Note that BasicBlocks themselves are Values, because they are referenced by instructions -like branches and can go in the switch tables. BasicBlocks have type -label.

- - -

- - Important Public Members of the `BasicBlock` class - -

- -

BasicBlock(const std::string &Name = "", Function *Parent = 0) - -
The BasicBlock constructor is used to create new basic blocks for -insertion into a function. The constructor optionally takes a name for the new -block, and a Function to insert it into. If -the Parent parameter is specified, the new BasicBlock is -automatically inserted at the end of the specified Function, if not specified, the BasicBlock must be -manually inserted into the Function.
BasicBlock::iterator - Typedef for instruction list iterator
-BasicBlock::const_iterator - Typedef for const_iterator.
-begin(), end(), front(), back(), -size(), empty() -STL-style functions for accessing the instruction list. - -
These methods and typedefs are forwarding functions that have the same -semantics as the standard library methods of the same names. These methods -expose the underlying instruction list of a basic block in a way that is easy to -manipulate. To get the full complement of container operations (including -operations to update the list), you must use the getInstList() -method.
BasicBlock::InstListType &getInstList() - -
This method is used to get access to the underlying container that actually -holds the Instructions. This method must be used when there isn't a forwarding -function in the BasicBlock class for the operation that you would like -to perform. Because there are no forwarding functions for "updating" -operations, you need to use this if you want to update the contents of a -BasicBlock.
Function *getParent() - -
Returns a pointer to Function the block is -embedded into, or a null pointer if it is homeless.
TerminatorInst *getTerminator() - -
Returns a pointer to the terminator instruction that appears at the end of -the BasicBlock. If there is no terminator instruction, or if the last -instruction in the block is not a terminator, then a null pointer is -returned.

- -

- - -

- The `Argument` class -

- -

This subclass of Value defines the interface for incoming formal -arguments to a function. A Function maintains a list of its formal -arguments. An argument has a pointer to the parent Function.

- -

- - -

- - Dinakar Dhurjati and - Chris Lattner
- The LLVM Compiler Infrastructure
- Last modified: $Date$ -

- - - -- cgit v1.2.3-70-g09d2

- LLVM Programmer's Manual -

- Introduction -

- General Information -

- The C++ Standard Template Library -

- Other useful references -

- Important and useful LLVM APIs -

- The isa<>, cast<> and - dyn_cast<> templates -

- Passing strings (the StringRef -and Twine classes) -

- The StringRef class -

- The Twine class -

- The DEBUG() macro and -debug option -

- Fine grained debug info with DEBUG_TYPE and - the -debug-only option -

- The Statistic class & -stats - option -

- Viewing graphs while debugging code -

- Picking the Right Data Structure for a Task -

- Sequential Containers (std::vector, std::list, etc) -

- llvm/ADT/ArrayRef.h -

- Fixed Size Arrays -

- Heap Allocated Arrays -

- "llvm/ADT/TinyPtrVector.h" -

- "llvm/ADT/SmallVector.h" -

- <vector> -

- <deque> -

- <list> -

- llvm/ADT/ilist.h -

- llvm/ADT/PackedVector.h -

- ilist_traits -

- iplist -

- llvm/ADT/ilist_node.h -

- Sentinels -

- Other Sequential Container options -

- String-like containers -

- llvm/ADT/StringRef.h -

- llvm/ADT/Twine.h -

- llvm/ADT/SmallString.h -

- std::string -

- Set-Like Containers (std::set, SmallSet, SetVector, etc) -

- A sorted 'vector' -

- "llvm/ADT/SmallSet.h" -

- "llvm/ADT/SmallPtrSet.h" -

- "llvm/ADT/DenseSet.h" -

- "llvm/ADT/SparseSet.h" -

- "llvm/ADT/FoldingSet.h" -

- <set> -

- "llvm/ADT/SetVector.h" -

- "llvm/ADT/UniqueVector.h" -

- "llvm/ADT/ImmutableSet.h" -

- Other Set-Like Container Options -

- Map-Like Containers (std::map, DenseMap, etc) -

- A sorted 'vector' -

- "llvm/ADT/StringMap.h" -

- "llvm/ADT/IndexedMap.h" -

- "llvm/ADT/DenseMap.h" -

- "llvm/ADT/ValueMap.h" -

- "llvm/ADT/IntervalMap.h" -

- <map> -

- "llvm/ADT/MapVector.h" -

- "llvm/ADT/IntEqClasses.h" -

- "llvm/ADT/ImmutableMap.h" -

- Other Map-Like Container Options -

- Bit storage containers (BitVector, SparseBitVector) -

- BitVector -

- SmallBitVector -

- SparseBitVector -

- Helpful Hints for Common Operations -

- Basic Inspection and Traversal Routines -

- Iterating over the BasicBlocks in a Function -

- Iterating over the Instructions in a BasicBlock -

- Iterating over the Instructions in a Function -

- Turning an iterator into a class pointer (and - vice-versa) -

- Finding call sites: a slightly more complex - example -

- Treating calls and invokes the same way -

- Iterating over def-use & use-def chains -

- Iterating over predecessors & -successors of blocks -

- Making simple changes -

- Creating and inserting new - Instructions -

- Deleting Instructions -

- Replacing an Instruction with another - Value -

Replacing individual instructions

Deleting Instructions

- The `isa<>`, `cast<>` and - `dyn_cast<>` templates -

- Passing strings (the `StringRef` -and `Twine` classes) -

- The `StringRef` class -

- The `Twine` class -

- The `DEBUG()` macro and `-debug` option -

- Fine grained debug info with `DEBUG_TYPE` and - the `-debug-only` option -

- The `Statistic` class & `-stats` - option -

- Iterating over the `BasicBlock`s in a `Function` -

- Iterating over the `Instruction`s in a `BasicBlock` -

- Iterating over the `Instruction`s in a `Function` -

- Creating and inserting new - `Instruction`s -

- Deleting `Instruction`s -

- Replacing an `Instruction` with another - `Value` -

Deleting `Instruction`s

Replacing multiple uses of `User`s and `Value`s

- Deleting `GlobalVariable`s -

- Ending Execution with `llvm_shutdown()` -

- Lazy Initialization with `ManagedStatic` -

- Achieving Isolation with `LLVMContext` -

- The `ValueSymbolTable` class -

- The `User` and owned `Use` classes' memory layout -

- - Interaction and relationship between `User` and `Use` objects - -

- The `Type` class and Derived Types -

- The `Module` class -

- Important Public Members of the `Module` class -

- The `Value` class -

- Important Public Members of the `Value` class -

- The `User` class -

- Important Public Members of the `User` class -

- The `Instruction` class -

- - Important Subclasses of the `Instruction` class - -

- - Important Public Members of the `Instruction` class - -

- The `Constant` class and subclasses -

- The `GlobalValue` class -

- - Important Public Members of the `GlobalValue` class - -

- The `Function` class -

- - Important Public Members of the `Function` class - -

- The `GlobalVariable` class -

- - Important Public Members of the `GlobalVariable` class - -

- The `BasicBlock` class -

- - Important Public Members of the `BasicBlock` class - -

- The `Argument` class -