From 68319f87cc07756e29c6b98efd934577312561ec Mon Sep 17 00:00:00 2001 From: Mikhail Glushenkov Date: Thu, 11 Dec 2008 23:24:40 +0000 Subject: Update the auto-generated llvmc documentation. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60909 91177308-0d34-0410-b5e6-96231b3b80d8 --- docs/CompilerDriverTutorial.html | 603 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 603 insertions(+) create mode 100644 docs/CompilerDriverTutorial.html (limited to 'docs/CompilerDriverTutorial.html') diff --git a/docs/CompilerDriverTutorial.html b/docs/CompilerDriverTutorial.html new file mode 100644 index 0000000000..2eb452af0f --- /dev/null +++ b/docs/CompilerDriverTutorial.html @@ -0,0 +1,603 @@ + + + + + + +Customizing LLVMC: Reference Manual + + + + +
+

Customizing LLVMC: Reference Manual

+ +++ + + + +
Author:Mikhail Glushenkov <foldr@codedegers.com>
+

LLVMC is a generic compiler driver, designed to be customizable and extensible. It plays the same role for LLVM as the gcc program does for GCC: LLVMC's job is essentially to transform a set of input files into a set of targets depending on configuration rules and user options. What makes LLVMC different is that these transformation rules are completely customizable. In fact, LLVMC knows nothing about the specifics of transformation (even the command-line options are mostly not hard-coded) and regards the transformation structure as an abstract graph. The structure of this graph is completely determined by plugins, which can be either statically or dynamically linked. This makes it possible to easily adapt LLVMC for other purposes, for example as a build tool for game resources.

+

Because LLVMC employs TableGen [1] as its configuration language, you need to be familiar with it to customize LLVMC.

+ +
+

Compiling with LLVMC

+

LLVMC tries hard to be as compatible with gcc as possible, although there are some small differences. Most of the time, however, you shouldn't notice them:

+
+$ # This works as expected:
+$ llvmc -O3 -Wall hello.cpp
+$ ./a.out
+hello
+
+

One nice feature of LLVMC is that you don't have to distinguish between different compilers for different languages (think g++ and gcc): the right toolchain is chosen automatically based on input language names (which are, in turn, determined from file extensions). If you want to force files ending with ".c" to compile as C++, use the -x option, just as you would with gcc:

+
+$ # hello.c is really a C++ file
+$ llvmc -x c++ hello.c
+$ ./a.out
+hello
+
+

On the other hand, when using LLVMC as a linker to combine several C++ object files, you should provide the --linker option, since it's impossible for LLVMC to choose the right linker in that case:

+
+$ llvmc -c hello.cpp
+$ llvmc hello.o
+[A lot of link-time errors skipped]
+$ llvmc --linker=c++ hello.o
+$ ./a.out
+hello
+
+

By default, LLVMC uses llvm-gcc to compile the source code. It is also possible to choose the work-in-progress clang compiler with the -clang option.

+
+
+

Predefined options

+

LLVMC has some built-in options that can't be overridden in the configuration libraries:

+
    +
  • -o FILE - Output file name.
  • +
  • -x LANGUAGE - Specify the language of the following input files until the next -x option.
  • +
  • -load PLUGIN_NAME - Load the specified plugin DLL. Example: -load $LLVM_DIR/Release/lib/LLVMCSimple.so.
  • +
  • -v - Enable verbose mode, i.e. print out all executed commands.
  • +
  • --view-graph - Show a graphical representation of the compilation graph. Requires the dot and gv programs to be installed. Hidden option, useful for debugging.
  • +
  • --write-graph - Write a compilation-graph.dot file in the current directory with the compilation graph description in the Graphviz format. Hidden option, useful for debugging.
  • +
  • --save-temps - Write temporary files to the current directory and do not delete them on exit. Hidden option, useful for debugging.
  • +
  • --help, --help-hidden, --version - These options have their standard meaning.
  • +
+
+
+

Compiling LLVMC plugins

+

It's easiest to start working on your own LLVMC plugin by copying the skeleton project which lives under $LLVMC_DIR/plugins/Simple:

+
+$ cd $LLVMC_DIR/plugins
+$ cp -r Simple MyPlugin
+$ cd MyPlugin
+$ ls
+Makefile PluginMain.cpp Simple.td
+
+

As you can see, our basic plugin consists of only two files (not counting the build script). Simple.td contains a TableGen description of the compilation graph; its format is documented in the following sections. PluginMain.cpp is just a helper file used to compile the auto-generated C++ code produced from the TableGen source. It can also contain hook definitions (see below).

+

The first thing you should do is change the LLVMC_PLUGIN variable in the Makefile to avoid conflicts (since this variable is used to name the resulting library):

+
+LLVMC_PLUGIN=MyPlugin
+
+

It is also a good idea to rename Simple.td to something less generic:

+
+$ mv Simple.td MyPlugin.td
+
+

Note that the plugin source directory must be placed under $LLVMC_DIR/plugins to make use of the existing build infrastructure. To build a version of the LLVMC executable called mydriver with your plugin compiled in, use the following command:

+
+$ cd $LLVMC_DIR
+$ make BUILTIN_PLUGINS=MyPlugin DRIVER_NAME=mydriver
+
+

To build your plugin as a dynamic library, just cd to its source directory and run make. The resulting file will be called LLVMC$(LLVMC_PLUGIN).$(DLL_EXTENSION) (in our case, LLVMCMyPlugin.so). This library can then be loaded with the -load option. Example:

+
+$ cd $LLVMC_DIR/plugins/Simple
+$ make
+$ llvmc -load $LLVM_DIR/Release/lib/LLVMCSimple.so
+
+

Sometimes, you will want a 'bare-bones' version of LLVMC that has no built-in plugins. It can be compiled with the following command:

+
+$ cd $LLVMC_DIR
+$ make BUILTIN_PLUGINS=""
+
+
+
+

Customizing LLVMC: the compilation graph

+

Each TableGen configuration file should include the common definitions:

+
+include "llvm/CompilerDriver/Common.td"
+
+

Internally, LLVMC stores information about possible source transformations in the form of a graph. Nodes in this graph represent tools, and edges between two nodes represent a transformation path. A special "root" node is used to mark entry points for the transformations. LLVMC also assigns a weight to each edge (more on this later) to choose between several alternative edges.

+

The definition of the compilation graph (see file plugins/Base/Base.td for an example) is just a list of edges:

+
+def CompilationGraph : CompilationGraph<[
+    Edge<"root", "llvm_gcc_c">,
+    Edge<"root", "llvm_gcc_assembler">,
+    ...
+
+    Edge<"llvm_gcc_c", "llc">,
+    Edge<"llvm_gcc_cpp", "llc">,
+    ...
+
+    OptionalEdge<"llvm_gcc_c", "opt", (case (switch_on "opt"),
+                                      (inc_weight))>,
+    OptionalEdge<"llvm_gcc_cpp", "opt", (case (switch_on "opt"),
+                                              (inc_weight))>,
+    ...
+
+    OptionalEdge<"llvm_gcc_assembler", "llvm_gcc_cpp_linker",
+        (case (input_languages_contain "c++"), (inc_weight),
+              (or (parameter_equals "linker", "g++"),
+                  (parameter_equals "linker", "c++")), (inc_weight))>,
+    ...
+
+    ]>;
+
+

As you can see, the edges can be either default or optional, where optional edges are differentiated by an additional case expression used to calculate the weight of this edge. Notice also that we refer to tools via their names (as strings). This makes it possible to add edges to an existing compilation graph in plugins without having to know about all tool definitions used in the graph.

+

The default edges are assigned a weight of 1, and optional edges get a weight of 0 + 2*N, where N is the number of tests that evaluated to true in the case expression. It is also possible to provide an integer parameter to inc_weight and dec_weight; in this case, the weight is increased (or decreased) by the provided value instead of the default 2. The default weight of an optional edge can be changed by using the default clause of the case construct.

+

When passing an input file through the graph, LLVMC picks the edge with the maximum weight. To avoid ambiguity, there should be only one default edge between two nodes (with the exception of the root node, which gets special treatment: there you are allowed to specify one default edge per language).
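The weight rules above can be illustrated with a minimal C++ sketch. It is not LLVMC's implementation: the Edge struct, weight and pickTarget are hypothetical names invented for this illustration, only the default increment of 2 is modelled, and keeping the first edge on a tie is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical model of an outgoing edge; not LLVMC's real data structure.
struct Edge {
    std::string target;  // name of the tool this edge leads to
    bool optional;       // true for OptionalEdge, false for Edge
    int testsPassed;     // N: number of case tests that evaluated to true
};

// Default edges weigh 1; optional edges weigh 0 + 2*N.
int weight(const Edge& e) {
    return e.optional ? 2 * e.testsPassed : 1;
}

// Returns the target of the heaviest edge, or "" if there are no edges.
// On a tie the first edge seen is kept (an assumption of this sketch).
std::string pickTarget(const std::vector<Edge>& edges) {
    const Edge* best = nullptr;
    for (const Edge& e : edges) {
        if (best == nullptr || weight(e) > weight(*best)) best = &e;
    }
    return best ? best->target : "";
}
```

With a default edge to "llc" and an optional edge to "opt", the optional edge wins only once at least one of its tests is true, since 2 is greater than 1 while 0 is not.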

+

When multiple plugins are loaded, their compilation graphs are merged together. Since multiple edges that have the same end nodes are not allowed (i.e. the graph is not a multigraph), an edge defined in several plugins will be replaced by the definition from the plugin that was loaded last. Plugin load order can be controlled by using the plugin priority feature described below.

+

To get a visual representation of the compilation graph (useful for debugging), run llvmc --view-graph. You will need dot and gsview installed for this to work properly.

+
+
+

Describing options

+

Command-line options that the plugin supports are defined by using an OptionList:

+
+def Options : OptionList<[
+(switch_option "E", (help "Help string")),
+(alias_option "quiet", "q")
+...
+]>;
+
+

As you can see, the option list is just a list of DAGs, where each DAG is an option description consisting of the option name and some properties. A plugin can define more than one option list (they are all merged together in the end), which can be handy if one wants to separate option groups syntactically.

+
    +
  • Possible option types:

    +
    +
      +
    • switch_option - a simple boolean switch, for example -time.
    • +
    • parameter_option - option that takes an argument, for example -std=c99.
    • +
    • parameter_list_option - same as the above, but more than one occurrence of the option is allowed.
    • +
    • prefix_option - same as the parameter_option, but the option name and parameter value are not separated.
    • +
    • prefix_list_option - same as the above, but more than one occurrence of the option is allowed; example: -lm -lpthread.
    • +
    • alias_option - a special option type for creating aliases. Unlike other option types, aliases are not allowed to have any properties besides the aliased option name. Usage example: (alias_option "preprocess", "E")
    • +
    +
    +
  • +
  • Possible option properties:

    +
    +
      +
    • help - help string associated with this option. Used for --help output.
    • +
    • required - this option is obligatory.
    • +
    • hidden - this option should not appear in the --help output (but should appear in the --help-hidden output).
    • +
    • really_hidden - the option should not appear in any help output.
    • +
    • extern - this option is defined in some other plugin, see below.
    • +
    +
    +
  • +
+
+

External options

+

Sometimes, when linking several plugins together, one plugin needs to access options defined in some other plugin. Because of the way options are implemented, such options should be marked with the extern option property. Example:

+
+...
+(switch_option "E", (extern))
+...
+
+

See also the section on plugin priorities.

+
+
+
+

Conditional evaluation

+

The 'case' construct is the main means by which programmability is achieved in LLVMC. It can be used to calculate edge weights, program actions and modify the shell commands to be executed. The 'case' expression is designed after the similarly-named construct in functional languages and takes the form (case (test_1), statement_1, (test_2), statement_2, ... (test_N), statement_N). The statements are evaluated only if the corresponding tests evaluate to true.

+

Examples:

+
+// Edge weight calculation
+
+// Increases edge weight by 5 if "-A" is provided on the
+// command-line, and by 5 more if "-B" is also provided.
+(case
+    (switch_on "A"), (inc_weight 5),
+    (switch_on "B"), (inc_weight 5))
+
+
+// Tool command line specification
+
+// Evaluates to "cmdline1" if the option "-A" is provided on the
+// command line; to "cmdline2" if "-B" is provided;
+// otherwise to "cmdline3".
+
+(case
+    (switch_on "A"), "cmdline1",
+    (switch_on "B"), "cmdline2",
+    (default), "cmdline3")
+
+

Note the slight difference in 'case' expression handling in the contexts of edge weights and command line specification: in the first example both tests are checked and their weight increments add up, while in the second example the value of the "B" switch is never checked when switch "A" is enabled, and the whole expression always evaluates to "cmdline1" in that case.
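The two evaluation modes can be contrasted in a small C++ model. This is only a sketch of the semantics described above, not LLVMC code: Switches, Test, switchOn, defaultTest, caseWeight and caseCmdLine are hypothetical names, and the set of strings stands in for the switches given on the command line.

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the switches present on the command line.
using Switches = std::set<std::string>;
using Test = std::function<bool(const Switches&)>;

// Models (switch_on "name").
Test switchOn(const std::string& name) {
    return [name](const Switches& s) { return s.count(name) > 0; };
}

// Models (default): always true.
Test defaultTest() {
    return [](const Switches&) { return true; };
}

// Edge-weight context: every clause whose test is true contributes.
int caseWeight(const std::vector<std::pair<Test, int>>& clauses,
               const Switches& s) {
    int total = 0;
    for (const auto& c : clauses)
        if (c.first(s)) total += c.second;
    return total;
}

// Command-line context: the first clause whose test is true wins.
std::string caseCmdLine(
    const std::vector<std::pair<Test, std::string>>& clauses,
    const Switches& s) {
    for (const auto& c : clauses)
        if (c.first(s)) return c.second;
    return "";
}
```

With both "A" and "B" present, the weight clauses from the first example give 5 + 5 = 10, while the command-line clauses from the second example stop at "cmdline1".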

+

Case expressions can also be nested, i.e. the following is legal:

+
+(case (switch_on "E"), (case (switch_on "o"), ..., (default), ...),
+      (default), ...)
+
+

You should, however, try to avoid doing that because it hurts readability. It is usually better to split tool descriptions and/or use TableGen inheritance instead.

+
    +
  • Possible tests are:
      +
    • switch_on - Returns true if a given command-line switch is provided by the user. Example: (switch_on "opt").
    • +
    • parameter_equals - Returns true if a command-line parameter equals a given value. Example: (parameter_equals "W", "all").
    • +
    • element_in_list - Returns true if a command-line parameter list contains a given value. Example: (element_in_list "l", "pthread").
    • +
    • input_languages_contain - Returns true if a given language belongs to the current input language set. Example: (input_languages_contain "c++").
    • +
    • in_language - Evaluates to true if the input file language equals the argument. At the moment works only with cmd_line and actions (on non-join nodes). Example: (in_language "c++").
    • +
    • not_empty - Returns true if a given option (which should be either a parameter or a parameter list) is set by the user. Example: (not_empty "o").
    • +
    • default - Always evaluates to true. Should always be the last test in the case expression.
    • +
    • and - A standard logical combinator that returns true if and only if all of its arguments return true. Used like this: (and (test1), (test2), ... (testN)). Nesting of and and or is allowed, but not encouraged.
    • +
    • or - Another logical combinator that returns true if at least one of its arguments returns true. Example: (or (test1), (test2), ... (testN)).
    • +
    +
  • +
+
+
+

Writing a tool description

+

As was said earlier, nodes in the compilation graph represent tools, which are described separately. A tool definition looks like this (taken from the include/llvm/CompilerDriver/Tools.td file):

+
+def llvm_gcc_cpp : Tool<[
+    (in_language "c++"),
+    (out_language "llvm-assembler"),
+    (output_suffix "bc"),
+    (cmd_line "llvm-g++ -c $INFILE -o $OUTFILE -emit-llvm"),
+    (sink)
+    ]>;
+
+

This defines a new tool called llvm_gcc_cpp, which is an alias for llvm-g++. As you can see, a tool definition is just a list of properties; most of them should be self-explanatory. The sink property means that this tool should be passed all command-line options that aren't mentioned in the option list.

+

The complete list of all currently implemented tool properties follows.

+
    +
  • Possible tool properties:
      +
    • in_language - input language name. Can be either a string or a list, in case the tool supports multiple input languages.
    • +
    • out_language - output language name. Tools are not allowed to have multiple output languages.
    • +
    • output_suffix - output file suffix. Can also be changed dynamically, see documentation on actions.
    • +
    • cmd_line - the actual command used to run the tool. You can use $INFILE and $OUTFILE variables, output redirection with >, hook invocations ($CALL), environment variables (via $ENV) and the case construct.
    • +
    • join - this tool is a "join node" in the graph, i.e. it gets a list of input files and joins them together. Used for linkers.
    • +
    • sink - all command-line options that are not handled by other tools are passed to this tool.
    • +
    • actions - A single big case expression that specifies how this tool reacts to command-line options (described in more detail below).
    • +
    +
  • +
+
+

Actions

+

A tool often needs to react to command-line options, and this is precisely what the actions property is for. The next example illustrates this feature:

+
+def llvm_gcc_linker : Tool<[
+    (in_language "object-code"),
+    (out_language "executable"),
+    (output_suffix "out"),
+    (cmd_line "llvm-gcc $INFILE -o $OUTFILE"),
+    (join),
+    (actions (case (not_empty "L"), (forward "L"),
+                   (not_empty "l"), (forward "l"),
+                   (not_empty "dummy"),
+                             [(append_cmd "-dummy1"), (append_cmd "-dummy2")]))
+    ]>;
+
+

The actions tool property is implemented on top of the omnipresent case expression. It associates one or more different actions with given conditions. In the example, the actions are forward, which forwards a given option unchanged, and append_cmd, which appends a given string to the tool execution command. Multiple actions can be associated with a single condition by using a list of actions (used in the example to append some dummy options). The same case construct can also be used in the cmd_line property to modify the tool command line.

+

The "join" property used in the example means that this tool behaves like a linker.

+

The list of all possible actions follows.

+
    +
  • Possible actions:

    +
    +
      +
    • append_cmd - append a string to the tool invocation command. Example: (case (switch_on "pthread"), (append_cmd "-lpthread"))
    • +
    • forward - forward an option unchanged. Example: (forward "Wall").
    • +
    • forward_as - Change the name of an option, but forward the argument unchanged. Example: (forward_as "O0", "--disable-optimization").
    • +
    • output_suffix - modify the output suffix of this tool. Example: (output_suffix "i").
    • +
    • stop_compilation - stop compilation after this tool processes its input. Used without arguments.
    • +
    • unpack_values - used for splitting and forwarding comma-separated lists of options, e.g. -Wa,-foo=bar,-baz is converted to -foo=bar -baz and appended to the tool invocation command. Example: (unpack_values "Wa,").
    • +
    +
    +
  • +
+
+
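The transformation performed by unpack_values in the list above can be sketched in C++ as follows. This is an illustration of the described behaviour only; unpackValues is a hypothetical helper, and returning an empty string for a non-matching argument is a choice made for this sketch.

```cpp
#include <string>

// Splits the comma-separated payload of an option such as
// "-Wa,-foo=bar,-baz" and joins the pieces with spaces.
// `name` is the option name as written in the plugin, e.g. "Wa,".
// Returns "" if `arg` is not that option (a choice made for this sketch).
std::string unpackValues(const std::string& name, const std::string& arg) {
    const std::string prefix = "-" + name;
    if (arg.compare(0, prefix.size(), prefix) != 0) return "";
    std::string out = arg.substr(prefix.size());
    for (char& ch : out)
        if (ch == ',') ch = ' ';
    return out;
}
```

Under these assumptions, unpackValues("Wa,", "-Wa,-foo=bar,-baz") yields the string "-foo=bar -baz", which would then be appended to the tool invocation command.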
+
+

Language map

+

If you are adding support for a new language to LLVMC, you'll need to modify the language map, which defines mappings from file extensions to language names. It is used to choose the proper toolchain(s) for a given input file set. A language map definition looks like this:

+
+def LanguageMap : LanguageMap<
+    [LangToSuffixes<"c++", ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"]>,
+     LangToSuffixes<"c", ["c"]>,
+     ...
+    ]>;
+
+

For example, without those definitions the following command wouldn't work:

+
+$ llvmc hello.cpp
+llvmc: Unknown suffix: cpp
+
+

The language map entries should be added only for tools that are linked with the root node. Since tools are not allowed to have multiple output languages, for nodes "inside" the graph the input and output languages should match. This is enforced at compile-time.
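The suffix lookup that the language map enables can be modelled with a short C++ sketch. It is not the generated LLVMC code: suffixToLanguage and languageOf are hypothetical names, the table only mirrors the two LangToSuffixes entries shown above, and the exception text is modelled on the error message in the example.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Suffix -> language table mirroring the two LangToSuffixes entries above.
const std::map<std::string, std::string>& suffixToLanguage() {
    static const std::map<std::string, std::string> table = {
        {"cc", "c++"}, {"cp", "c++"}, {"cxx", "c++"}, {"cpp", "c++"},
        {"CPP", "c++"}, {"c++", "c++"}, {"C", "c++"},
        {"c", "c"},
    };
    return table;
}

// Returns the language for a file name. Throws std::runtime_error when the
// suffix is not in the map (the message format is an assumption).
std::string languageOf(const std::string& fileName) {
    const std::string::size_type dot = fileName.rfind('.');
    const std::string suffix =
        (dot == std::string::npos) ? "" : fileName.substr(dot + 1);
    const auto it = suffixToLanguage().find(suffix);
    if (it == suffixToLanguage().end())
        throw std::runtime_error("Unknown suffix: " + suffix);
    return it->second;
}
```

Note that the lookup is case-sensitive, which is why "C" and "c" can map to different languages.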

+
+
+

More advanced topics

+
+

Hooks and environment variables

+

Normally, LLVMC executes programs from the system PATH. Sometimes, this is not sufficient: for example, we may want to specify tool names in the configuration file. This can be achieved via the mechanism of hooks. To write your own hooks, just add their definitions to PluginMain.cpp or drop a .cpp file into the $LLVMC_DIR/driver directory. Hooks should live in the hooks namespace and have the signature std::string hooks::MyHookName (void). They can be used from the cmd_line tool property:

+
+(cmd_line "$CALL(MyHook)/path/to/file -o $CALL(AnotherHook)")
+
+
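A pair of hook definitions matching the signature given above might look like the following sketch. The hook names come from the cmd_line example; the returned values are placeholders invented for this illustration, and nothing here is taken from an actual plugin.

```cpp
#include <string>

namespace hooks {

// Hypothetical hook: returns the directory prefix for the tool binary.
// The path is a placeholder chosen for this sketch.
std::string MyHook() {
    return "/opt/mytools";
}

// Hypothetical hook: returns the output file name (placeholder value).
std::string AnotherHook() {
    return "out.bc";
}

} // namespace hooks
```

Assuming $CALL is replaced by the hook's return value, the cmd_line above would become "/opt/mytools/path/to/file -o out.bc" with these definitions.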

It is also possible to use environment variables in the same manner:

+
+(cmd_line "$ENV(VAR1)/path/to/file -o $ENV(VAR2)")
+
+
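The kind of substitution that $ENV implies can be sketched in C++ as a plain string rewrite. This is only a model under stated assumptions, not how LLVMC implements it: expandEnv is a hypothetical helper, unset variables expand to an empty string, and an unterminated "$ENV(" is left untouched.

```cpp
#include <cstdlib>
#include <string>

// Replaces each "$ENV(NAME)" in `cmd` with the value of the environment
// variable NAME (empty string if unset). An unterminated "$ENV(" is left
// as-is. Both behaviours are assumptions of this sketch.
std::string expandEnv(const std::string& cmd) {
    const std::string marker = "$ENV(";
    std::string out;
    std::string::size_type pos = 0;
    while (true) {
        const std::string::size_type start = cmd.find(marker, pos);
        if (start == std::string::npos) break;
        const std::string::size_type close =
            cmd.find(')', start + marker.size());
        if (close == std::string::npos) break;
        // Copy the text before the marker, then the variable's value.
        out.append(cmd, pos, start - pos);
        const std::string name = cmd.substr(
            start + marker.size(), close - start - marker.size());
        if (const char* value = std::getenv(name.c_str())) out += value;
        pos = close + 1;
    }
    // Copy whatever follows the last expanded marker.
    out.append(cmd, pos, std::string::npos);
    return out;
}
```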

To change the command line string based on user-provided options, use the case expression (documented above):

+
+(cmd_line
+  (case
+    (switch_on "E"),
+       "llvm-g++ -E -x c $INFILE -o $OUTFILE",
+    (default),
+       "llvm-g++ -c -x c $INFILE -o $OUTFILE -emit-llvm"))
+
+
+
+

How plugins are loaded

+

It is possible for LLVMC plugins to depend on each other. For example, one can create edges between nodes defined in some other plugin. To make this work, however, that plugin should be loaded first. To achieve this, the concept of plugin priority was introduced. By default, every plugin has priority zero; to specify the priority explicitly, put the following line in your plugin's TableGen file:

+
+def Priority : PluginPriority<$PRIORITY_VALUE>;
+// Where PRIORITY_VALUE is some integer > 0
+
+

Plugins are loaded in order of their (increasing) priority, starting with 0. Therefore, the plugin with the highest priority value will be loaded last.
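The interaction between load order and graph merging (an edge defined in several plugins is taken from the plugin loaded last) can be modelled with a short C++ sketch. Plugin, loadOrder and mergeGraphs are hypothetical names, each edge definition is reduced to a label string, and the tie-break among equal priorities is an assumption.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

using EdgeKey = std::pair<std::string, std::string>;  // (from, to)

// Hypothetical plugin record for this sketch.
struct Plugin {
    std::string name;
    int priority;  // 0 unless the plugin defines PluginPriority
    std::map<EdgeKey, std::string> edges;  // label stands in for the edge
};

// Orders plugins by increasing priority. stable_sort keeps the original
// order among equal priorities; that tie-break is an assumption.
std::vector<Plugin> loadOrder(std::vector<Plugin> plugins) {
    std::stable_sort(plugins.begin(), plugins.end(),
                     [](const Plugin& a, const Plugin& b) {
                         return a.priority < b.priority;
                     });
    return plugins;
}

// Merges graphs in load order; a later plugin's edge replaces an earlier
// one with the same end nodes.
std::map<EdgeKey, std::string> mergeGraphs(const std::vector<Plugin>& ordered) {
    std::map<EdgeKey, std::string> merged;
    for (const Plugin& p : ordered)
        for (const auto& e : p.edges) merged[e.first] = e.second;
    return merged;
}
```

In this model a plugin with priority 1 always overrides the edges of a priority-0 plugin, regardless of the order in which the two were listed.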

+
+
+

Debugging

+

When writing LLVMC plugins, it can be useful to get a visual view of the resulting compilation graph. This can be achieved via the command line option --view-graph. This command assumes that Graphviz [2] and Ghostview [3] are installed. There is also a --write-graph option that creates a Graphviz source file (compilation-graph.dot) in the current directory.

+
+
+
+

References

+ + + + + +
[1] TableGen Fundamentals, http://llvm.cs.uiuc.edu/docs/TableGenFundamentals.html
+ + + + + +
[2] Graphviz, http://www.graphviz.org/
+ + + + + +
[3] Ghostview, http://pages.cs.wisc.edu/~ghost/
+
+
+ Mikhail Glushenkov
+ LLVM Compiler Infrastructure
+ + Last modified: $Date: 2008-12-11 11:34:48 -0600 (Thu, 11 Dec 2008) $ +
+
+