Customizing LLVMC: Reference Manual


Author:	Mikhail Glushenkov <foldr@codedegers.com>

LLVMC is a generic compiler driver, designed to be customizable and extensible. It plays the same role for LLVM as the gcc program does for GCC: its job is essentially to transform a set of input files into a set of targets, depending on configuration rules and user options. What makes LLVMC different is that these transformation rules are completely customizable. In fact, LLVMC knows nothing about the specifics of transformation (even the command-line options are mostly not hard-coded) and regards the transformation structure as an abstract graph. The structure of this graph is completely determined by plugins, which can be either statically or dynamically linked. This makes it possible to easily adapt LLVMC for other purposes, for example, as a build tool for game resources.

Because LLVMC employs TableGen [1] as its configuration language, you need to be familiar with it to customize LLVMC.

Contents

Compiling with LLVMC
Predefined options
Compiling LLVMC plugins
Customizing LLVMC: the compilation graph
Describing options
- External options
Conditional evaluation
Writing a tool description
- Actions
Language map
More advanced topics
- Hooks and environment variables
- How plugins are loaded
- Debugging
References

Compiling with LLVMC

LLVMC tries hard to be as compatible with gcc as possible, although there are some small differences. Most of the time, however, you shouldn't notice them:

    $ # This works as expected:
    $ llvmc -O3 -Wall hello.cpp
    $ ./a.out
    hello

One nice feature of LLVMC is that one doesn't have to distinguish between different compilers for different languages (think g++ and gcc): the right toolchain is chosen automatically based on input language names (which are, in turn, determined from file extensions). If you want to force files ending with ".c" to compile as C++, use the -x option, just as you would with gcc:

    $ # hello.c is really a C++ file
    $ llvmc -x c++ hello.c
    $ ./a.out
    hello

On the other hand, when using LLVMC as a linker to combine several C++ object files, you should provide the --linker option, since it's impossible for LLVMC to choose the right linker in that case:

    $ llvmc -c hello.cpp
    $ llvmc hello.o
    [A lot of link-time errors skipped]
    $ llvmc --linker=c++ hello.o
    $ ./a.out
    hello

By default, LLVMC uses llvm-gcc to compile the source code. It is also possible to choose the work-in-progress clang compiler with the -clang option.

Predefined options

LLVMC has some built-in options that can't be overridden in the configuration libraries:

-o FILE - Output file name.
-x LANGUAGE - Specify the language of the following input files until the next -x option.
-load PLUGIN_NAME - Load the specified plugin DLL. Example: -load $LLVM_DIR/Release/lib/LLVMCSimple.so.
-v - Enable verbose mode, i.e. print out all executed commands.
--view-graph - Show a graphical representation of the compilation graph. Requires that you have the dot and gv programs installed. Hidden option, useful for debugging.
--write-graph - Write a compilation-graph.dot file in the current directory with the compilation graph description in the Graphviz format. Hidden option, useful for debugging.
--save-temps - Write temporary files to the current directory and do not delete them on exit. Hidden option, useful for debugging.
--help, --help-hidden, --version - These options have their standard meaning.
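The "-x applies until the next -x" rule amounts to a single left-to-right pass over the arguments. The following C++ sketch is a hypothetical model, not LLVMC's actual option handling. `InputFile` and `assignLanguages` are invented names, an empty language stands for "determine it from the file extension", and every option other than -x and -o is simply skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One input file plus the language it should be treated as.
// An empty `language` means "determine it from the file extension".
struct InputFile {
    std::string name;
    std::string language;
};

// Hypothetical sketch: scan the arguments left to right, remembering the
// most recent "-x LANGUAGE". Each input file gets whichever language is
// current at the position where it appears.
std::vector<InputFile> assignLanguages(const std::vector<std::string>& args) {
    std::vector<InputFile> inputs;
    std::string current;  // stays empty until the first -x is seen
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (arg == "-x" && i + 1 < args.size()) {
            current = args[++i];  // applies to the inputs that follow
        } else if (arg == "-o" && i + 1 < args.size()) {
            ++i;                  // skip the output file name
        } else if (!arg.empty() && arg[0] == '-') {
            continue;             // other options do not affect languages
        } else {
            inputs.push_back(InputFile{arg, current});
        }
    }
    return inputs;
}
```

In this model, `llvmc a.c -x c++ b.c` treats a.c according to its .c extension and b.c as C++.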

Compiling LLVMC plugins

It's easiest to start working on your own LLVMC plugin by copying the skeleton project, which lives under $LLVMC_DIR/plugins/Simple:

    $ cd $LLVMC_DIR/plugins
    $ cp -r Simple MyPlugin
    $ cd MyPlugin
    $ ls
    Makefile PluginMain.cpp Simple.td

As you can see, our basic plugin consists of only two files (not counting the build script). Simple.td contains the TableGen description of the compilation graph; its format is documented in the following sections. PluginMain.cpp is just a helper file used to compile the auto-generated C++ code produced from the TableGen source. It can also contain hook definitions (see below).

The first thing you should do is change the LLVMC_PLUGIN variable in the Makefile to avoid conflicts (this variable is used to name the resulting library):

    LLVMC_PLUGIN=MyPlugin

It is also a good idea to rename Simple.td to something less generic:

    $ mv Simple.td MyPlugin.td

Note that the plugin source directory must be placed under $LLVMC_DIR/plugins to make use of the existing build infrastructure. To build a version of the LLVMC executable called mydriver with your plugin compiled in, use the following command:

    $ cd $LLVMC_DIR
    $ make BUILTIN_PLUGINS=MyPlugin DRIVER_NAME=mydriver

To build your plugin as a dynamic library, just cd to its source directory and run make. The resulting file will be called LLVMC$(LLVMC_PLUGIN).$(DLL_EXTENSION) (in our case, LLVMCMyPlugin.so). This library can then be loaded with the -load option. Example:

    $ cd $LLVMC_DIR/plugins/MyPlugin
    $ make
    $ llvmc -load $LLVM_DIR/Release/lib/LLVMCMyPlugin.so

Sometimes, you will want a 'bare-bones' version of LLVMC that has no built-in plugins. It can be compiled with the following command:

    $ cd $LLVMC_DIR
    $ make BUILTIN_PLUGINS=""

Customizing LLVMC: the compilation graph

Each TableGen configuration file should include the common definitions:

    include "llvm/CompilerDriver/Common.td"

Internally, LLVMC stores information about possible source transformations in the form of a graph. Nodes in this graph represent tools, and edges between two nodes represent a transformation path. A special "root" node is used to mark entry points for the transformations. LLVMC also assigns a weight to each edge (more on this later) to choose between several alternative edges.

The definition of the compilation graph (see file plugins/Base/Base.td for an example) is just a list of edges:

    def CompilationGraph : CompilationGraph<[
        Edge<"root", "llvm_gcc_c">,
        Edge<"root", "llvm_gcc_assembler">,
        ...

        Edge<"llvm_gcc_c", "llc">,
        Edge<"llvm_gcc_cpp", "llc">,
        ...

        OptionalEdge<"llvm_gcc_c", "opt", (case (switch_on "opt"),
                                                (inc_weight))>,
        OptionalEdge<"llvm_gcc_cpp", "opt", (case (switch_on "opt"),
                                                  (inc_weight))>,
        ...

        OptionalEdge<"llvm_gcc_assembler", "llvm_gcc_cpp_linker",
            (case (input_languages_contain "c++"), (inc_weight),
                  (or (parameter_equals "linker", "g++"),
                      (parameter_equals "linker", "c++")), (inc_weight))>,
        ...

        ]>;

As you can see, edges can be either default or optional; optional edges are distinguished by an additional case expression used to calculate the weight of the edge. Notice also that tools are referred to by their names (as strings). This makes it possible to add edges to an existing compilation graph in plugins without having to know about all tool definitions used in the graph.

Default edges are assigned a weight of 1, and optional edges get a weight of 0 + 2*N, where N is the number of tests that evaluated to true in the case expression. It is also possible to provide an integer parameter to inc_weight and dec_weight; in this case, the weight is increased (or decreased) by the provided value instead of the default 2. The default weight of an optional edge can also be changed by using the default clause of the case construct.

When passing an input file through the graph, LLVMC picks the edge with the maximum weight. To avoid ambiguity, there should be only one default edge between two nodes (with the exception of the root node, which gets special treatment: there you are allowed to specify one default edge per language).
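The weighting and selection rules can be restated as a small C++ sketch. It is a hypothetical model, not LLVMC's generated code. Each clause of an optional edge's case expression is represented as an already-evaluated (test result, weight change) pair, a bare inc_weight contributes 2, and a (default) clause is just a clause whose test is always true. How LLVMC resolves ties is not described in this manual. The sketch simply prefers the first edge listed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One already-evaluated clause of an optional edge's case expression:
// whether its test was true, and how much its (inc_weight)/(dec_weight)
// changes the weight (+2 or -2 when no argument is given).
struct Clause {
    bool testIsTrue;
    int delta;
};

struct Edge {
    std::string to;                // name of the target tool
    bool optional;                 // false for a plain Edge<...>
    std::vector<Clause> clauses;   // only meaningful for OptionalEdge<...>
};

// Default edges weigh 1. Optional edges start at 0 and add the delta of
// every clause whose test is true.
int edgeWeight(const Edge& e) {
    if (!e.optional) return 1;
    int weight = 0;
    for (const Clause& c : e.clauses)
        if (c.testIsTrue) weight += c.delta;
    return weight;
}

// Pick the outgoing edge with the maximum weight. In this sketch a tie is
// resolved in favor of the edge listed first.
const Edge& chooseEdge(const std::vector<Edge>& outgoing) {
    assert(!outgoing.empty());
    const Edge* best = &outgoing.front();
    for (const Edge& e : outgoing)
        if (edgeWeight(e) > edgeWeight(*best)) best = &e;
    return *best;
}
```

With the graph above, the optional llvm_gcc_c -> opt edge weighs 2 when -opt is given and therefore beats the default llvm_gcc_c -> llc edge (weight 1). Without -opt it weighs 0 and llc is chosen.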

When multiple plugins are loaded, their compilation graphs are merged together. Since multiple edges with the same end nodes are not allowed (i.e. the graph is not a multigraph), an edge defined in several plugins will be replaced by the definition from the plugin that was loaded last. Plugin load order can be controlled by using the plugin priority feature described in the section "How plugins are loaded".
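Because an edge is identified by its two end nodes, the "last loaded plugin wins" rule behaves like inserting into a map keyed by (source, target). The C++ sketch below is a hypothetical model of that merge and not LLVMC's implementation. `EdgeKey`, `PluginGraph`, and `mergeGraphs` are invented names, and a plain string stands in for a full edge definition.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// An edge is identified by its (source tool, target tool) pair.
using EdgeKey = std::pair<std::string, std::string>;
// Placeholder for a full edge definition (kind, weight expression, ...).
using EdgeDef = std::string;
// The edges contributed by one plugin.
using PluginGraph = std::vector<std::pair<EdgeKey, EdgeDef>>;

// Merge plugin graphs in load order. Assigning through operator[]
// overwrites any existing entry with the same key, so for a given pair of
// nodes the definition from the plugin loaded last wins.
std::map<EdgeKey, EdgeDef> mergeGraphs(const std::vector<PluginGraph>& inLoadOrder) {
    std::map<EdgeKey, EdgeDef> merged;
    for (const PluginGraph& plugin : inLoadOrder)
        for (const std::pair<EdgeKey, EdgeDef>& edge : plugin)
            merged[edge.first] = edge.second;
    return merged;
}
```

Swapping the load order of two plugins that both define the same edge therefore changes which definition survives. Edges defined by only one plugin are unaffected.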

To get a visual representation of the compilation graph (useful for debugging), run llvmc --view-graph. You will need dot [2] and gsview [3] installed for this to work properly.

Describing options

Command-line options that the plugin supports are defined by using an OptionList:

    def Options : OptionList<[
    (switch_option "E", (help "Help string")),
    (alias_option "quiet", "q"),
    ...
    ]>;

As you can see, the option list is just a list of DAGs, where each DAG is an option description consisting of the option name and some properties. A plugin can define more than one option list (they are all merged together in the end), which can be handy if one wants to separate option groups syntactically.

Possible option types:
- switch_option - a simple boolean switch, for example -time.
- parameter_option - an option that takes an argument, for example -std=c99.
- parameter_list_option - same as the above, but more than one occurrence of the option is allowed.
- prefix_option - same as parameter_option, but the option name and parameter value are not separated.
- prefix_list_option - same as the above, but more than one occurrence of the option is allowed; example: -lm -lpthread.
- alias_option - a special option type for creating aliases. Unlike other option types, aliases are not allowed to have any properties besides the aliased option name. Usage example: (alias_option "preprocess", "E")
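The practical difference between the parameter and prefix forms, and between the single and list variants, is easiest to see on concrete arguments. The C++ sketch below is purely illustrative: LLVMC generates its real option handling from the TableGen description, and `startsWith`, `parameterValue`, and `prefixListValues` are hypothetical helpers. `parameterValue` handles only the `-name=value` spelling used in the -std=c99 example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// True if `arg` begins with `prefix`.
bool startsWith(const std::string& arg, const std::string& prefix) {
    return arg.compare(0, prefix.size(), prefix) == 0;
}

// parameter_option: the value follows "-name=", as in -std=c99. Only one
// occurrence is expected, so this sketch returns the first one it finds,
// or an empty string if the option is absent.
std::string parameterValue(const std::vector<std::string>& args,
                           const std::string& name) {
    const std::string prefix = "-" + name + "=";
    for (const std::string& arg : args)
        if (startsWith(arg, prefix)) return arg.substr(prefix.size());
    return std::string();
}

// prefix_list_option: the name and value are written together (-lm), and
// every occurrence is kept, in command-line order.
std::vector<std::string> prefixListValues(const std::vector<std::string>& args,
                                          const std::string& name) {
    const std::string prefix = "-" + name;
    std::vector<std::string> values;
    for (const std::string& arg : args)
        if (arg.size() > prefix.size() && startsWith(arg, prefix))
            values.push_back(arg.substr(prefix.size()));
    return values;
}
```

For the arguments `-std=c99 -lm hello.c -lpthread`, this yields "c99" for the std parameter and the list {"m", "pthread"} for the l prefix list.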

Possible option properties:
- help - help string associated with this option. Used for --help output.
- required - this option must be specified.
- hidden - this option should not appear in the --help output (but should appear in the --help-hidden output).
- really_hidden - the option should not appear in any help output.
- extern - this option is defined in some other plugin, see below.

External options

Sometimes, when linking several plugins together, one plugin needs to access options defined in some other plugin. Because of the way options are implemented, such options should be marked as extern. This is what the extern option property is for. Example:

    ...
    (switch_option "E", (extern))
    ...

See also the section on plugin priorities.

Conditional evaluation

The 'case' construct is the main means by which programmability is achieved in LLVMC. It can be used to calculate edge weights, program actions and modify the shell commands to be executed. The 'case' expression is modeled after the similarly-named construct in functional languages and takes the form (case (test_1), statement_1, (test_2), statement_2, ... (test_N), statement_N). The statements are evaluated only if the corresponding tests evaluate to true.

Examples:

    // Edge weight calculation

    // Increases edge weight by 5 if "-A" is provided on the
    // command line, and by 5 more if "-B" is also provided.
    (case
        (switch_on "A"), (inc_weight 5),
        (switch_on "B"), (inc_weight 5))

    // Tool command line specification

    // Evaluates to "cmdline1" if the option "-A" is provided on the
    // command line; to "cmdline2" if "-B" is provided;
    // otherwise to "cmdline3".
    (case
        (switch_on "A"), "cmdline1",
        (switch_on "B"), "cmdline2",
        (default), "cmdline3")

Note the slight difference in 'case' expression handling between the contexts of edge weights and command line specification. In the second example, the value of the "B" switch is never checked when switch "A" is enabled, and the whole expression always evaluates to "cmdline1" in that case. In the first example, by contrast, both tests are checked, and both weight increments apply when both switches are given.
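The two evaluation modes can be modeled side by side. In the hypothetical C++ sketch below, each test is reduced to an already-computed boolean, so it only illustrates which clauses take effect. It does not show that LLVMC skips evaluating later tests in the command-line context. `sumMatching` mirrors the edge-weight reading and `firstMatching` mirrors the command-line reading. Both names are invented.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Edge-weight context: every clause whose test is true contributes its
// weight change, so several increments can add up.
int sumMatching(const std::vector<std::pair<bool, int>>& clauses) {
    int total = 0;
    for (const std::pair<bool, int>& clause : clauses)
        if (clause.first) total += clause.second;
    return total;
}

// Command-line context: the first clause whose test is true determines the
// result, and later clauses are ignored. This sketch returns an empty
// string if no test is true; a final (default) clause rules that out.
std::string firstMatching(
        const std::vector<std::pair<bool, std::string>>& clauses) {
    for (const std::pair<bool, std::string>& clause : clauses)
        if (clause.first) return clause.second;
    return std::string();
}
```

With both -A and -B given, the weight example adds up to 10, while the command-line example still yields "cmdline1".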

Case expressions can also be nested, i.e. the following is legal:

    (case (switch_on "E"), (case (switch_on "o"), ..., (default), ...),
          (default), ...)

You should, however, try to avoid doing that because it hurts readability. It is usually better to split tool descriptions and/or use TableGen inheritance instead.

Possible tests are:
- switch_on - Returns true if a given command-line switch is provided by the user. Example: (switch_on "opt").
- parameter_equals - Returns true if a command-line parameter equals a given value. Example: (parameter_equals "W", "all").
- element_in_list - Returns true if a command-line parameter list contains a given value. Example: (element_in_list "l", "pthread").
- input_languages_contain - Returns true if a given language belongs to the current input language set. Example: (input_languages_contain "c++").
- in_language - Evaluates to true if the input file language equals the argument. At the moment, works only with cmd_line and actions (on non-join nodes). Example: (in_language "c++").
- not_empty - Returns true if a given option (which should be either a parameter or a parameter list) is set by the user. Example: (not_empty "o").
- default - Always evaluates to true. Should always be the last test in the case expression.
- and - A standard logical combinator that returns true iff all of its arguments return true. Used like this: (and (test1), (test2), ... (testN)). Nesting of and and or is allowed, but not encouraged.
- or - Another logical combinator that returns true if at least one of its arguments returns true. Example: (or (test1), (test2), ... (testN)).
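To make the semantics of these tests concrete, here is a hypothetical C++ model of a parsed command line and several of the tests listed above. `ParsedCommandLine` and the function names are invented for illustration. In LLVMC the tests are turned into auto-generated C++ code from the TableGen source, and the shape of that code is not described in this manual.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical parsed command line: switches that were given, parameter
// options with their value, and list options with all of their values.
struct ParsedCommandLine {
    std::set<std::string> switches;
    std::map<std::string, std::string> parameters;
    std::map<std::string, std::vector<std::string>> lists;
};

// switch_on
bool switchOn(const ParsedCommandLine& cl, const std::string& name) {
    return cl.switches.count(name) != 0;
}

// parameter_equals
bool parameterEquals(const ParsedCommandLine& cl, const std::string& name,
                     const std::string& value) {
    auto it = cl.parameters.find(name);
    return it != cl.parameters.end() && it->second == value;
}

// element_in_list
bool elementInList(const ParsedCommandLine& cl, const std::string& name,
                   const std::string& value) {
    auto it = cl.lists.find(name);
    return it != cl.lists.end() &&
           std::find(it->second.begin(), it->second.end(), value) != it->second.end();
}

// not_empty: applies to parameters and parameter lists alike.
bool notEmpty(const ParsedCommandLine& cl, const std::string& name) {
    auto p = cl.parameters.find(name);
    if (p != cl.parameters.end() && !p->second.empty()) return true;
    auto l = cl.lists.find(name);
    return l != cl.lists.end() && !l->second.empty();
}
```

In this model, the and and or combinators correspond directly to && and || over these functions. For example, the (or (parameter_equals "linker", "g++"), (parameter_equals "linker", "c++")) test from the compilation graph becomes `parameterEquals(cl, "linker", "g++") || parameterEquals(cl, "linker", "c++")`.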

Writing a tool description

As was said earlier, nodes in the compilation graph represent tools, which are described separately. A tool definition looks like this (taken from the include/llvm/CompilerDriver/Tools.td file):

    def llvm_gcc_cpp : Tool<[
        (in_language "c++"),
        (out_language "llvm-assembler"),
        (output_suffix "bc"),
        (cmd_line "llvm-g++ -c $INFILE -o $OUTFILE -emit-llvm"),
        (sink)
        ]>;

This defines a new tool called llvm_gcc_cpp, which is an alias for llvm-g++. As you can see, a tool definition is just a list of properties; most of them should be self-explanatory. The sink property means that this tool should be passed all command-line options that aren't mentioned in the option list.

The complete list of all currently implemented tool properties follows.

Possible tool properties:
- in_language - input language name. Can be either a string or a list, in case the tool supports multiple input languages.
- out_language - output language name. Tools are not allowed to have multiple output languages.
- output_suffix - output file suffix. Can also be changed dynamically, see documentation on actions.
- cmd_line - the actual command used to run the tool. You can use $INFILE and $OUTFILE variables, output redirection with >, hook invocations ($CALL), environment variables (via $ENV) and the case construct.
- join - this tool is a "join node" in the graph, i.e. it gets a list of input files and joins them together. Used for linkers.
- sink - all command-line options that are not handled by other tools are passed to this tool.
- actions - a single big case expression that specifies how this tool reacts to command-line options (described in more detail below).
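As a rough illustration of what the $INFILE and $OUTFILE variables mean, the C++ sketch below substitutes them into a cmd_line template. It rests on stated assumptions. `replaceAll` and `expandCmdLine` are hypothetical names, and hooks ($CALL), $ENV, redirection and case are deliberately not handled. When a tool receives several inputs, they are joined with spaces, as the llvm_gcc_linker example in the next section suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Replace every occurrence of `var` in `text` with `value`.
std::string replaceAll(std::string text, const std::string& var,
                       const std::string& value) {
    std::string::size_type pos = 0;
    while ((pos = text.find(var, pos)) != std::string::npos) {
        text.replace(pos, var.size(), value);
        pos += value.size();  // resume searching after the inserted text
    }
    return text;
}

// Expand $INFILE and $OUTFILE in a cmd_line template. When a tool gets
// several inputs (as a join tool does), they are separated by spaces.
std::string expandCmdLine(const std::string& cmdLine,
                          const std::vector<std::string>& inputs,
                          const std::string& output) {
    std::string joined;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        if (i != 0) joined += ' ';
        joined += inputs[i];
    }
    return replaceAll(replaceAll(cmdLine, "$INFILE", joined), "$OUTFILE", output);
}
```

Applied to the llvm_gcc_cpp command line with input hello.cpp and output hello.bc, this produces `llvm-g++ -c hello.cpp -o hello.bc -emit-llvm`.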

Actions

A tool often needs to react to command-line options, and this is precisely what the actions property is for. The next example illustrates this feature:

    def llvm_gcc_linker : Tool<[
        (in_language "object-code"),
        (out_language "executable"),
        (output_suffix "out"),
        (cmd_line "llvm-gcc $INFILE -o $OUTFILE"),
        (join),
        (actions (case (not_empty "L"), (forward "L"),
                       (not_empty "l"), (forward "l"),
                       (not_empty "dummy"),
                           [(append_cmd "-dummy1"), (append_cmd "-dummy2")]))
        ]>;

The actions tool property is implemented on top of the omnipresent case expression. It associates one or more different actions with given conditions. In the example, the actions are forward, which forwards a given option unchanged, and append_cmd, which appends a given string to the tool execution command. Multiple actions can be associated with a single condition by using a list of actions (used in the example to append some dummy options). The same case construct can also be used in the cmd_line property to modify the tool command line.
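The way forward and append_cmd extend a base command line can be sketched as follows. This is a hypothetical C++ model, not the code LLVMC generates. Conditions are pre-evaluated booleans, each condition carries the list of arguments its actions produce, and every clause whose condition holds is applied in order. The example above implies this last behavior, since -L and -l should both be forwarded when both are given. Where forwarded options land relative to the rest of the command is an assumption of the sketch: they are simply appended.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A pre-evaluated condition with the arguments its actions produce:
// (forward "L") contributes the -L options as given, and each
// (append_cmd "...") contributes its string.
struct ActionClause {
    bool conditionIsTrue;
    std::vector<std::string> producedArgs;
};

// Start from the expanded cmd_line and append what every enabled clause
// contributes, in clause order.
std::string buildCommand(const std::string& base,
                         const std::vector<ActionClause>& clauses) {
    std::string cmd = base;
    for (const ActionClause& c : clauses)
        if (c.conditionIsTrue)
            for (const std::string& arg : c.producedArgs)
                cmd += " " + arg;
    return cmd;
}
```

Modeled this way, the dummy clause adds both -dummy1 and -dummy2 whenever its single condition is true.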

The "join" property used in the example means that this tool behaves like a linker.

The list of all possible actions follows.

Possible actions:
- append_cmd - append a string to the tool invocation command. Example: (case (switch_on "pthread"), (append_cmd "-lpthread"))
- forward - forward an option unchanged. Example: (forward "Wall").
- forward_as - change the name of an option, but forward the argument unchanged. Example: (forward_as "O0", "--disable-optimization").
- output_suffix - modify the output suffix of this tool. Example: (output_suffix "i").
- stop_compilation - stop compilation after this tool processes its input. Used without arguments.
- unpack_values - used for splitting and forwarding comma-separated lists of options, e.g. -Wa,-foo=bar,-baz is converted to -foo=bar -baz and appended to the tool invocation command. Example: (unpack_values "Wa,").
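The splitting performed by unpack_values can be sketched in a few lines of C++. The function name `unpackValues` is hypothetical. The sketch assumes the option prefix (here -Wa,) is matched literally, and it drops empty pieces such as those produced by a trailing comma, which is a choice of the sketch rather than documented LLVMC behavior.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split the part of an option after its prefix on commas, e.g.
// unpackValues("-Wa,-foo=bar,-baz", "-Wa,") yields {"-foo=bar", "-baz"}.
// Returns an empty list if `option` does not start with `prefix`.
std::vector<std::string> unpackValues(const std::string& option,
                                      const std::string& prefix) {
    std::vector<std::string> values;
    if (option.compare(0, prefix.size(), prefix) != 0) return values;
    std::istringstream rest(option.substr(prefix.size()));
    std::string piece;
    while (std::getline(rest, piece, ','))
        if (!piece.empty()) values.push_back(piece);  // skip empty pieces
    return values;
}
```

Joining the resulting pieces with spaces gives the `-foo=bar -baz` text that is appended to the tool invocation command.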

Language map

If you are adding support for a new language to LLVMC, you'll need to modify the language map, which defines mappings from file extensions to language names. It is used to choose the proper toolchain(s) for a given input file set. A language map definition looks like this:

    def LanguageMap : LanguageMap<
        [LangToSuffixes<"c++", ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"]>,
         LangToSuffixes<"c", ["c"]>,
         ...
        ]>;

For example, without those definitions the following command wouldn't work:

    $ llvmc hello.cpp
    llvmc: Unknown suffix: cpp
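A language map is essentially a lookup table from file suffixes to language names. The hypothetical C++ sketch below holds the two entries shown in the example LanguageMap. The function name and the exception are invented, and the error text only imitates the message above. Since the map lists both "C" and "c" as separate suffixes for different languages, the lookup is case-sensitive.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// The two entries from the example LanguageMap: suffix -> language name.
const std::map<std::string, std::string> kLanguageMap = {
    {"cc", "c++"}, {"cp", "c++"}, {"cxx", "c++"}, {"cpp", "c++"},
    {"CPP", "c++"}, {"c++", "c++"}, {"C", "c++"}, {"c", "c"},
};

// Return the language for a file name based on the text after its last
// dot (case-sensitive); throw if the suffix is not in the map.
std::string languageOf(const std::string& fileName) {
    const std::string::size_type dot = fileName.rfind('.');
    const std::string suffix =
        dot == std::string::npos ? std::string() : fileName.substr(dot + 1);
    const auto it = kLanguageMap.find(suffix);
    if (it == kLanguageMap.end())
        throw std::runtime_error("Unknown suffix: " + suffix);
    return it->second;
}
```

Removing the "cpp" entry from this table would make `languageOf("hello.cpp")` fail in the same way as the command shown above.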

Language map entries should be added only for tools that are linked with the root node. Since tools are not allowed to have multiple output languages, for nodes "inside" the graph the input and output languages should match. This is enforced at compile time.

References

[1] TableGen Fundamentals, http://llvm.cs.uiuc.edu/docs/TableGenFundamentals.html

[2] Graphviz, http://www.graphviz.org/

[3] Ghostview, http://pages.cs.wisc.edu/~ghost/

Mikhail Glushenkov
LLVM Compiler Infrastructure
Last modified: $Date: 2008-12-11 11:34:48 -0600 (Thu, 11 Dec 2008) $
