author | Matt Beaumont-Gay <matthewbg@google.com> | 2012-12-14 17:55:15 +0000 |
---|---|---|
committer | Matt Beaumont-Gay <matthewbg@google.com> | 2012-12-14 17:55:15 +0000 |
commit | 6aed25d93d1cfcde5809a73ffa7dc1b0d6396f66 (patch) | |
tree | 57e2fdf1caf960d8d878e0289f32af6759832b49 /docs | |
parent | 7139cfb19b1cc28dfd5e274c07ec68835bc6d6d6 (diff) | |
parent | 1ad9253c9d34ccbce3e7e4ea5d87c266cbf93410 (diff) | |
Updating branches/google/stable to r169803
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/google/stable@170212 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs')
88 files changed, 36742 insertions, 43431 deletions
diff --git a/docs/BitCodeFormat.rst b/docs/BitCodeFormat.rst index bd26f7b150..333e79b864 100644 --- a/docs/BitCodeFormat.rst +++ b/docs/BitCodeFormat.rst @@ -54,8 +54,8 @@ structure. This structure consists of the following concepts: * Abbreviations, which specify compression optimizations for the file. -Note that the `llvm-bcanalyzer <CommandGuide/html/llvm-bcanalyzer.html>`_ tool -can be used to dump and inspect arbitrary bitstreams, which is very useful for +Note that the :doc:`llvm-bcanalyzer <CommandGuide/llvm-bcanalyzer>` tool can be +used to dump and inspect arbitrary bitstreams, which is very useful for understanding the encoding. .. _magic number: diff --git a/docs/BranchWeightMetadata.rst b/docs/BranchWeightMetadata.rst index f0df971f87..2667ce3589 100644 --- a/docs/BranchWeightMetadata.rst +++ b/docs/BranchWeightMetadata.rst @@ -27,8 +27,8 @@ Supported Instructions ``BranchInst`` ^^^^^^^^^^^^^^ -Metadata is only assign to the conditional branches. There are two extra -operarands, for the true and the false branch. +Metadata is only assigned to the conditional branches. There are two extra +operarands for the true and the false branch. .. code-block:: llvm @@ -41,8 +41,8 @@ operarands, for the true and the false branch. ``SwitchInst`` ^^^^^^^^^^^^^^ -Branch weights are assign to every case (including ``default`` case which is -always case #0). +Branch weights are assigned to every case (including the ``default`` case which +is always case #0). .. code-block:: llvm @@ -55,7 +55,7 @@ always case #0). ``IndirectBrInst`` ^^^^^^^^^^^^^^^^^^ -Branch weights are assign to every destination. +Branch weights are assigned to every destination. .. code-block:: llvm diff --git a/docs/CodeGenerator.rst b/docs/CodeGenerator.rst index 900fb8a81f..ce23667eb3 100644 --- a/docs/CodeGenerator.rst +++ b/docs/CodeGenerator.rst @@ -172,7 +172,7 @@ architecture. These target descriptions often have a large amount of common information (e.g., an ``add`` instruction is almost identical to a ``sub`` instruction). In order to allow the maximum amount of commonality to be factored out, the LLVM code generator uses the -`TableGen <TableGenFundamentals.html>`_ tool to describe big chunks of the +:doc:`TableGen <TableGenFundamentals>` tool to describe big chunks of the target machine, which allows the use of domain-specific and target-specific abstractions to reduce the amount of repetition. @@ -224,13 +224,13 @@ The ``DataLayout`` class ------------------------ The ``DataLayout`` class is the only required target description class, and it -is the only class that is not extensible (you cannot derived a new class from +is the only class that is not extensible (you cannot derive a new class from it). ``DataLayout`` specifies information about how the target lays out memory for structures, the alignment requirements for various data types, the size of pointers in the target, and whether the target is little-endian or big-endian. -.. _targetlowering: +.. _TargetLowering: The ``TargetLowering`` class ---------------------------- @@ -248,7 +248,9 @@ operations. Among other things, this class indicates: * the type to use for shift amounts, and * various high-level characteristics, like whether it is profitable to turn - division by a constant into a multiplication sequence + division by a constant into a multiplication sequence. + +.. 
_TargetRegisterInfo: The ``TargetRegisterInfo`` class -------------------------------- @@ -771,6 +773,8 @@ value of type i1, i8, i16, or i64 would be illegal, as would a DAG that uses a SREM or UREM operation. The `legalize types`_ and `legalize operations`_ phases are responsible for turning an illegal DAG into a legal DAG. +.. _SelectionDAG-Process: + SelectionDAG Instruction Selection Process ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -838,8 +842,7 @@ Initial SelectionDAG Construction ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The initial SelectionDAG is na\ :raw-html:`ï`\ vely peephole expanded from -the LLVM input by the ``SelectionDAGLowering`` class in the -``lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp`` file. The intent of this pass +the LLVM input by the ``SelectionDAGBuilder`` class. The intent of this pass is to expose as much low-level, target-specific details to the SelectionDAG as possible. This pass is mostly hard-coded (e.g. an LLVM ``add`` turns into an ``SDNode add`` while a ``getelementptr`` is expanded into the obvious @@ -875,7 +878,7 @@ found, the elements are converted to scalars ("scalarizing"). A target implementation tells the legalizer which types are supported (and which register class to use for them) by calling the ``addRegisterClass`` method in -its TargetLowering constructor. +its ``TargetLowering`` constructor. .. _legalize operations: .. _Legalizer: @@ -969,7 +972,8 @@ The ``FADDS`` instruction is a simple binary single-precision add instruction. To perform this pattern match, the PowerPC backend includes the following instruction definitions: -:: +.. code-block:: text + :emphasize-lines: 4-5,9 def FMADDS : AForm_1<59, 29, (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB), @@ -981,10 +985,10 @@ instruction definitions: "fadds $FRT, $FRA, $FRB", [(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))]>; -The portion of the instruction definition in bold indicates the pattern used to -match the instruction. The DAG operators (like ``fmul``/``fadd``) are defined -in the ``include/llvm/Target/TargetSelectionDAG.td`` file. " ``F4RC``" is the -register class of the input and result values. +The highlighted portion of the instruction definitions indicates the pattern +used to match the instructions. The DAG operators (like ``fmul``/``fadd``) +are defined in the ``include/llvm/Target/TargetSelectionDAG.td`` file. +"``F4RC``" is the register class of the input and result values. The TableGen DAG instruction selector generator reads the instruction patterns in the ``.td`` file and automatically builds parts of the pattern matching code @@ -1728,6 +1732,8 @@ This section of the document explains features or design decisions that are specific to the code generator for a particular target. First we start with a table that summarizes what features are supported by each target. +.. 
_target-feature-matrix: + Target Feature Matrix --------------------- @@ -1763,7 +1769,6 @@ Here is the table: :raw-html:`<tr>` :raw-html:`<th>Feature</th>` :raw-html:`<th>ARM</th>` -:raw-html:`<th>CellSPU</th>` :raw-html:`<th>Hexagon</th>` :raw-html:`<th>MBlaze</th>` :raw-html:`<th>MSP430</th>` @@ -1778,7 +1783,6 @@ Here is the table: :raw-html:`<tr>` :raw-html:`<td><a href="#feat_reliable">is generally reliable</a></td>` :raw-html:`<td class="yes"></td> <!-- ARM -->` -:raw-html:`<td class="no"></td> <!-- CellSPU -->` :raw-html:`<td class="yes"></td> <!-- Hexagon -->` :raw-html:`<td class="no"></td> <!-- MBlaze -->` :raw-html:`<td class="unknown"></td> <!-- MSP430 -->` @@ -1793,7 +1797,6 @@ Here is the table: :raw-html:`<tr>` :raw-html:`<td><a href="#feat_asmparser">assembly parser</a></td>` :raw-html:`<td class="no"></td> <!-- ARM -->` -:raw-html:`<td class="no"></td> <!-- CellSPU -->` :raw-html:`<td class="no"></td> <!-- Hexagon -->` :raw-html:`<td class="yes"></td> <!-- MBlaze -->` :raw-html:`<td class="no"></td> <!-- MSP430 -->` @@ -1808,7 +1811,6 @@ Here is the table: :raw-html:`<tr>` :raw-html:`<td><a href="#feat_disassembler">disassembler</a></td>` :raw-html:`<td class="yes"></td> <!-- ARM -->` -:raw-html:`<td class="no"></td> <!-- CellSPU -->` :raw-html:`<td class="no"></td> <!-- Hexagon -->` :raw-html:`<td class="yes"></td> <!-- MBlaze -->` :raw-html:`<td class="no"></td> <!-- MSP430 -->` @@ -1823,7 +1825,6 @@ Here is the table: :raw-html:`<tr>` :raw-html:`<td><a href="#feat_inlineasm">inline asm</a></td>` :raw-html:`<td class="yes"></td> <!-- ARM -->` -:raw-html:`<td class="no"></td> <!-- CellSPU -->` :raw-html:`<td class="yes"></td> <!-- Hexagon -->` :raw-html:`<td class="yes"></td> <!-- MBlaze -->` :raw-html:`<td class="unknown"></td> <!-- MSP430 -->` @@ -1838,7 +1839,6 @@ Here is the table: :raw-html:`<tr>` :raw-html:`<td><a href="#feat_jit">jit</a></td>` :raw-html:`<td class="partial"><a href="#feat_jit_arm">*</a></td> <!-- ARM -->` -:raw-html:`<td class="no"></td> <!-- CellSPU -->` :raw-html:`<td class="no"></td> <!-- Hexagon -->` :raw-html:`<td class="no"></td> <!-- MBlaze -->` :raw-html:`<td class="unknown"></td> <!-- MSP430 -->` @@ -1853,7 +1853,6 @@ Here is the table: :raw-html:`<tr>` :raw-html:`<td><a href="#feat_objectwrite">.o file writing</a></td>` :raw-html:`<td class="no"></td> <!-- ARM -->` -:raw-html:`<td class="no"></td> <!-- CellSPU -->` :raw-html:`<td class="no"></td> <!-- Hexagon -->` :raw-html:`<td class="yes"></td> <!-- MBlaze -->` :raw-html:`<td class="no"></td> <!-- MSP430 -->` @@ -1868,7 +1867,6 @@ Here is the table: :raw-html:`<tr>` :raw-html:`<td><a hr:raw-html:`ef="#feat_tailcall">tail calls</a></td>` :raw-html:`<td class="yes"></td> <!-- ARM -->` -:raw-html:`<td class="no"></td> <!-- CellSPU -->` :raw-html:`<td class="yes"></td> <!-- Hexagon -->` :raw-html:`<td class="no"></td> <!-- MBlaze -->` :raw-html:`<td class="unknown"></td> <!-- MSP430 -->` @@ -1883,7 +1881,6 @@ Here is the table: :raw-html:`<tr>` :raw-html:`<td><a href="#feat_segstacks">segmented stacks</a></td>` :raw-html:`<td class="no"></td> <!-- ARM -->` -:raw-html:`<td class="no"></td> <!-- CellSPU -->` :raw-html:`<td class="no"></td> <!-- Hexagon -->` :raw-html:`<td class="no"></td> <!-- MBlaze -->` :raw-html:`<td class="no"></td> <!-- MSP430 -->` @@ -1992,8 +1989,8 @@ Tail call optimization Tail call optimization, callee reusing the stack of the caller, is currently supported on x86/x86-64 and PowerPC. 
It is performed if: -* Caller and callee have the calling convention ``fastcc`` or ``cc 10`` (GHC - call convention). +* Caller and callee have the calling convention ``fastcc``, ``cc 10`` (GHC + calling convention) or ``cc 11`` (HiPE calling convention). * The call is a tail call - in tail position (ret immediately follows call and ret uses value of call or is void). diff --git a/docs/CodingStandards.rst b/docs/CodingStandards.rst index 90835307b1..8003c12497 100644 --- a/docs/CodingStandards.rst +++ b/docs/CodingStandards.rst @@ -284,17 +284,10 @@ listed. We prefer these ``#include``\s to be listed in this order: #. Main Module Header #. Local/Private Headers -#. ``llvm/*`` -#. ``llvm/Analysis/*`` -#. ``llvm/Assembly/*`` -#. ``llvm/Bitcode/*`` -#. ``llvm/CodeGen/*`` -#. ... -#. ``llvm/Support/*`` -#. ``llvm/Config/*`` +#. ``llvm/...`` #. System ``#include``\s -and each category should be sorted by name. +and each category should be sorted lexicographically by the full path. The `Main Module Header`_ file applies to ``.cpp`` files which implement an interface defined by a ``.h`` file. This ``#include`` should always be included @@ -409,7 +402,8 @@ code. That said, LLVM does make extensive use of a hand-rolled form of RTTI that use templates like `isa<>, cast<>, and dyn_cast<> <ProgrammersManual.html#isa>`_. -This form of RTTI is opt-in and can be added to any class. It is also +This form of RTTI is opt-in and can be +:doc:`added to any class <HowToSetUpLLVMStyleRTTI>`. It is also substantially more efficient than ``dynamic_cast<>``. .. _static constructor: @@ -713,8 +707,8 @@ sort of thing is: .. code-block:: c++ bool FoundFoo = false; - for (unsigned i = 0, e = BarList.size(); i != e; ++i) - if (BarList[i]->isFoo()) { + for (unsigned I = 0, E = BarList.size(); I != E; ++I) + if (BarList[I]->isFoo()) { FoundFoo = true; break; } @@ -732,8 +726,8 @@ code to be structured like this: /// \returns true if the specified list has an element that is a foo. static bool containsFoo(const std::vector<Bar*> &List) { - for (unsigned i = 0, e = List.size(); i != e; ++i) - if (List[i]->isFoo()) + for (unsigned I = 0, E = List.size(); I != E; ++I) + if (List[I]->isFoo()) return true; return false; } @@ -820,8 +814,8 @@ Here are some examples of good and bad names: Vehicle MakeVehicle(VehicleType Type) { VehicleMaker M; // Might be OK if having a short life-span. - Tire tmp1 = M.makeTire(); // Bad -- 'tmp1' provides no information. - Light headlight = M.makeLight("head"); // Good -- descriptive. + Tire Tmp1 = M.makeTire(); // Bad -- 'Tmp1' provides no information. + Light Headlight = M.makeLight("head"); // Good -- descriptive. ... } @@ -841,9 +835,9 @@ enforced, and hopefully what to do about it. Here is one complete example: .. code-block:: c++ - inline Value *getOperand(unsigned i) { - assert(i < Operands.size() && "getOperand() out of range!"); - return Operands[i]; + inline Value *getOperand(unsigned I) { + assert(I < Operands.size() && "getOperand() out of range!"); + return Operands[I]; } Here are more examples: @@ -1035,7 +1029,7 @@ form has two problems. First it may be less efficient than evaluating it at the start of the loop. In this case, the cost is probably minor --- a few extra loads every time through the loop. However, if the base expression is more complex, then the cost can rise quickly. 
I've seen loops where the end -expression was actually something like: "``SomeMap[x]->end()``" and map lookups +expression was actually something like: "``SomeMap[X]->end()``" and map lookups really aren't cheap. By writing it in the second form consistently, you eliminate the issue entirely and don't even have to think about it. @@ -1111,27 +1105,27 @@ macros. For example, this is good: .. code-block:: c++ - if (x) ... - for (i = 0; i != 100; ++i) ... - while (llvm_rocks) ... + if (X) ... + for (I = 0; I != 100; ++I) ... + while (LLVMRocks) ... somefunc(42); assert(3 != 4 && "laws of math are failing me"); - a = foo(42, 92) + bar(x); + A = foo(42, 92) + bar(X); and this is bad: .. code-block:: c++ - if(x) ... - for(i = 0; i != 100; ++i) ... - while(llvm_rocks) ... + if(X) ... + for(I = 0; I != 100; ++I) ... + while(LLVMRocks) ... somefunc (42); assert (3 != 4 && "laws of math are failing me"); - a = foo (42, 92) + bar (x); + A = foo (42, 92) + bar (X); The reason for doing this is not completely arbitrary. This style makes control flow operators stand out more, and makes expressions flow better. The function @@ -1139,11 +1133,11 @@ call operator binds very tightly as a postfix operator. Putting a space after a function name (as in the last example) makes it appear that the code might bind the arguments of the left-hand-side of a binary operator with the argument list of a function and the name of the right side. More specifically, it is easy to -misread the "``a``" example as: +misread the "``A``" example as: .. code-block:: c++ - a = foo ((42, 92) + bar) (x); + A = foo ((42, 92) + bar) (X); when skimming through the code. By avoiding a space in a function, we avoid this misinterpretation. diff --git a/docs/CommandGuide/FileCheck.rst b/docs/CommandGuide/FileCheck.rst index 51a9bf6293..256970b362 100644 --- a/docs/CommandGuide/FileCheck.rst +++ b/docs/CommandGuide/FileCheck.rst @@ -1,94 +1,78 @@ FileCheck - Flexible pattern matching file verifier =================================================== - SYNOPSIS -------- - -**FileCheck** *match-filename* [*--check-prefix=XXX*] [*--strict-whitespace*] - +:program:`FileCheck` *match-filename* [*--check-prefix=XXX*] [*--strict-whitespace*] DESCRIPTION ----------- +:program:`FileCheck` reads two files (one from standard input, and one +specified on the command line) and uses one to verify the other. This +behavior is particularly useful for the testsuite, which wants to verify that +the output of some tool (e.g. :program:`llc`) contains the expected information +(for example, a movsd from esp or whatever is interesting). This is similar to +using :program:`grep`, but it is optimized for matching multiple different +inputs in one file in a specific order. -**FileCheck** reads two files (one from standard input, and one specified on the -command line) and uses one to verify the other. This behavior is particularly -useful for the testsuite, which wants to verify that the output of some tool -(e.g. llc) contains the expected information (for example, a movsd from esp or -whatever is interesting). This is similar to using grep, but it is optimized -for matching multiple different inputs in one file in a specific order. - -The *match-filename* file specifies the file that contains the patterns to +The ``match-filename`` file specifies the file that contains the patterns to match. The file to verify is always read from standard input. - OPTIONS ------- - - -**-help** +.. option:: -help Print a summary of command line options. +.. 
option:: --check-prefix prefix + FileCheck searches the contents of ``match-filename`` for patterns to match. + By default, these patterns are prefixed with "``CHECK:``". If you'd like to + use a different prefix (e.g. because the same input file is checking multiple + different tool or options), the :option:`--check-prefix` argument allows you + to specify a specific prefix to match. -**--check-prefix** *prefix* +.. option:: --input-file filename - FileCheck searches the contents of *match-filename* for patterns to match. By - default, these patterns are prefixed with "CHECK:". If you'd like to use a - different prefix (e.g. because the same input file is checking multiple - different tool or options), the **--check-prefix** argument allows you to specify - a specific prefix to match. + File to check (defaults to stdin). - - -**--strict-whitespace** +.. option:: --strict-whitespace By default, FileCheck canonicalizes input horizontal whitespace (spaces and tabs) which causes it to ignore these differences (a space will match a tab). - The --strict-whitespace argument disables this behavior. - + The :option:`--strict-whitespace` argument disables this behavior. - -**-version** +.. option:: -version Show the version number of this program. - - - EXIT STATUS ----------- - -If **FileCheck** verifies that the file matches the expected contents, it exits -with 0. Otherwise, if not, or if an error occurs, it will exit with a non-zero -value. - +If :program:`FileCheck` verifies that the file matches the expected contents, +it exits with 0. Otherwise, if not, or if an error occurs, it will exit with a +non-zero value. TUTORIAL -------- - FileCheck is typically used from LLVM regression tests, being invoked on the RUN line of the test. A simple example of using FileCheck from a RUN line looks like this: - .. code-block:: llvm ; RUN: llvm-as < %s | llc -march=x86-64 | FileCheck %s - -This syntax says to pipe the current file ("%s") into llvm-as, pipe that into -llc, then pipe the output of llc into FileCheck. This means that FileCheck will -be verifying its standard input (the llc output) against the filename argument -specified (the original .ll file specified by "%s"). To see how this works, -let's look at the rest of the .ll file (after the RUN line): - +This syntax says to pipe the current file ("``%s``") into ``llvm-as``, pipe +that into ``llc``, then pipe the output of ``llc`` into ``FileCheck``. This +means that FileCheck will be verifying its standard input (the llc output) +against the filename argument specified (the original ``.ll`` file specified by +"``%s``"). To see how this works, let's look at the rest of the ``.ll`` file +(after the RUN line): .. code-block:: llvm @@ -108,32 +92,30 @@ let's look at the rest of the .ll file (after the RUN line): ret void } +Here you can see some "``CHECK:``" lines specified in comments. Now you can +see how the file is piped into ``llvm-as``, then ``llc``, and the machine code +output is what we are verifying. FileCheck checks the machine code output to +verify that it matches what the "``CHECK:``" lines specify. -Here you can see some "CHECK:" lines specified in comments. Now you can see -how the file is piped into llvm-as, then llc, and the machine code output is -what we are verifying. FileCheck checks the machine code output to verify that -it matches what the "CHECK:" lines specify. 
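An illustrative aside, not taken from the patch: a minimal, self-contained test in the style this tutorial describes. The function name ``@add_two`` and the strings being checked are assumptions chosen for the sketch; all FileCheck requires is that each ``CHECK:`` pattern occurs, in order, somewhere in the piped llc output.

.. code-block:: llvm

   ; RUN: llvm-as < %s | llc -march=x86-64 | FileCheck %s
   ; The CHECK lines below assert that the function's label and a return
   ; instruction appear, in that order, in the generated assembly.

   define i32 @add_two(i32 %a) nounwind {
   entry:
     %sum = add i32 %a, 2
     ret i32 %sum
   }
   ; CHECK: add_two
   ; CHECK: ret

Because matching is substring based, ``CHECK: add_two`` matches the emitted label whether or not the target prepends an underscore to symbol names.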
- -The syntax of the CHECK: lines is very simple: they are fixed strings that +The syntax of the "``CHECK:``" lines is very simple: they are fixed strings that must occur in order. FileCheck defaults to ignoring horizontal whitespace differences (e.g. a space is allowed to match a tab) but otherwise, the contents -of the CHECK: line is required to match some thing in the test file exactly. +of the "``CHECK:``" line is required to match some thing in the test file exactly. One nice thing about FileCheck (compared to grep) is that it allows merging test cases together into logical groups. For example, because the test above -is checking for the "sub1:" and "inc4:" labels, it will not match unless there -is a "subl" in between those labels. If it existed somewhere else in the file, -that would not count: "grep subl" matches if subl exists anywhere in the -file. +is checking for the "``sub1:``" and "``inc4:``" labels, it will not match +unless there is a "``subl``" in between those labels. If it existed somewhere +else in the file, that would not count: "``grep subl``" matches if "``subl``" +exists anywhere in the file. The FileCheck -check-prefix option ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The FileCheck -check-prefix option allows multiple test configurations to be -driven from one .ll file. This is useful in many circumstances, for example, -testing different architectural variants with llc. Here's a simple example: - +The FileCheck :option:`-check-prefix` option allows multiple test +configurations to be driven from one `.ll` file. This is useful in many +circumstances, for example, testing different architectural variants with +:program:`llc`. Here's a simple example: .. code-block:: llvm @@ -152,21 +134,17 @@ testing different architectural variants with llc. Here's a simple example: ; X64: pinsrd $1, %edi, %xmm0 } - In this case, we're testing that we get the expected code generation with both 32-bit and 64-bit code generation. - The "CHECK-NEXT:" directive ~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Sometimes you want to match lines and would like to verify that matches happen on exactly consecutive lines with no other lines in between them. In -this case, you can use CHECK: and CHECK-NEXT: directives to specify this. If -you specified a custom check prefix, just use "<PREFIX>-NEXT:". For -example, something like this works as you'd expect: - +this case, you can use "``CHECK:``" and "``CHECK-NEXT:``" directives to specify +this. If you specified a custom check prefix, just use "``<PREFIX>-NEXT:``". +For example, something like this works as you'd expect: .. code-block:: llvm @@ -188,22 +166,18 @@ example, something like this works as you'd expect: ; CHECK-NEXT: ret } - -CHECK-NEXT: directives reject the input unless there is exactly one newline -between it an the previous directive. A CHECK-NEXT cannot be the first -directive in a file. - +"``CHECK-NEXT:``" directives reject the input unless there is exactly one +newline between it and the previous directive. A "``CHECK-NEXT:``" cannot be +the first directive in a file. The "CHECK-NOT:" directive ~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The CHECK-NOT: directive is used to verify that a string doesn't occur +The "``CHECK-NOT:``" directive is used to verify that a string doesn't occur between two matches (or before the first match, or after the last match). For example, to verify that a load is removed by a transformation, a test like this can be used: - .. 
code-block:: llvm define i8 @coerce_offset0(i32 %V, i32* %P) { @@ -219,27 +193,22 @@ can be used: ; CHECK: ret i8 } - - FileCheck Pattern Matching Syntax ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The CHECK: and CHECK-NOT: directives both take a pattern to match. For most -uses of FileCheck, fixed string matching is perfectly sufficient. For some -things, a more flexible form of matching is desired. To support this, FileCheck -allows you to specify regular expressions in matching strings, surrounded by -double braces: **{{yourregex}}**. Because we want to use fixed string -matching for a majority of what we do, FileCheck has been designed to support -mixing and matching fixed string matching with regular expressions. This allows -you to write things like this: - +The "``CHECK:``" and "``CHECK-NOT:``" directives both take a pattern to match. +For most uses of FileCheck, fixed string matching is perfectly sufficient. For +some things, a more flexible form of matching is desired. To support this, +FileCheck allows you to specify regular expressions in matching strings, +surrounded by double braces: ``{{yourregex}}``. Because we want to use fixed +string matching for a majority of what we do, FileCheck has been designed to +support mixing and matching fixed string matching with regular expressions. +This allows you to write things like this: .. code-block:: llvm ; CHECK: movhpd {{[0-9]+}}(%esp), {{%xmm[0-7]}} - In this case, any offset from the ESP register will be allowed, and any xmm register will be allowed. @@ -247,19 +216,16 @@ Because regular expressions are enclosed with double braces, they are visually distinct, and you don't need to use escape characters within the double braces like you would in C. In the rare case that you want to match double braces explicitly from the input, you can use something ugly like -**{{[{][{]}}** as your pattern. - +``{{[{][{]}}`` as your pattern. FileCheck Variables ~~~~~~~~~~~~~~~~~~~ - It is often useful to match a pattern and then verify that it occurs again later in the file. For codegen tests, this can be useful to allow any register, -but verify that that register is used consistently later. To do this, FileCheck -allows named variables to be defined and substituted into patterns. Here is a -simple example: - +but verify that that register is used consistently later. To do this, +:program:`FileCheck` allows named variables to be defined and substituted into +patterns. Here is a simple example: .. code-block:: llvm @@ -267,18 +233,46 @@ simple example: ; CHECK: notw [[REGISTER:%[a-z]+]] ; CHECK: andw {{.*}}[[REGISTER]] +The first check line matches a regex ``%[a-z]+`` and captures it into the +variable ``REGISTER``. The second line verifies that whatever is in +``REGISTER`` occurs later in the file after an "``andw``". :program:`FileCheck` +variable references are always contained in ``[[ ]]`` pairs, and their names can +be formed with the regex ``[a-zA-Z][a-zA-Z0-9]*``. If a colon follows the name, +then it is a definition of the variable; otherwise, it is a use. + +:program:`FileCheck` variables can be defined multiple times, and uses always +get the latest value. Variables can also be used later on the same line they +were defined on. For example: + +.. code-block:: llvm + + ; CHECK: op [[REG:r[0-9]+]], [[REG]] + +Can be useful if you want the operands of ``op`` to be the same register, +and don't care exactly which register it is. 
+ +FileCheck Expressions +~~~~~~~~~~~~~~~~~~~~~ + +Sometimes there's a need to verify output which refers line numbers of the +match file, e.g. when testing compiler diagnostics. This introduces a certain +fragility of the match file structure, as "``CHECK:``" lines contain absolute +line numbers in the same file, which have to be updated whenever line numbers +change due to text addition or deletion. + +To support this case, FileCheck allows using ``[[@LINE]]``, +``[[@LINE+<offset>]]``, ``[[@LINE-<offset>]]`` expressions in patterns. These +expressions expand to a number of the line where a pattern is located (with an +optional integer offset). + +This way match patterns can be put near the relevant test lines and include +relative line number references, for example: + +.. code-block:: c++ + + // CHECK: test.cpp:[[@LINE+4]]:6: error: expected ';' after top level declarator + // CHECK-NEXT: {{^int a}} + // CHECK-NEXT: {{^ \^}} + // CHECK-NEXT: {{^ ;}} + int a -The first check line matches a regex (**%[a-z]+**) and captures it into -the variable "REGISTER". The second line verifies that whatever is in REGISTER -occurs later in the file after an "andw". FileCheck variable references are -always contained in **[[ ]]** pairs, are named, and their names can be -name, then it is a definition of the variable, if not, it is a use. - -FileCheck variables can be defined multiple times, and uses always get the -latest value. Note that variables are all read at the start of a "CHECK" line -and are all defined at the end. This means that if you have something like -"**CHECK: [[XYZ:.\\*]]x[[XYZ]]**", the check line will read the previous -value of the XYZ variable and define a new one after the match is performed. If -you need to do something like this you can probably take advantage of the fact -that FileCheck is not actually line-oriented when it matches, this allows you to -define two separate CHECK lines that match on the same line. diff --git a/docs/CommandGuide/bugpoint.rst b/docs/CommandGuide/bugpoint.rst index c1b3b6eca6..e4663e5d44 100644 --- a/docs/CommandGuide/bugpoint.rst +++ b/docs/CommandGuide/bugpoint.rst @@ -1,19 +1,15 @@ bugpoint - automatic test case reduction tool ============================================= - SYNOPSIS -------- - **bugpoint** [*options*] [*input LLVM ll/bc files*] [*LLVM passes*] **--args** *program arguments* - DESCRIPTION ----------- - **bugpoint** narrows down the source of problems in LLVM tools and passes. It can be used to debug three types of failures: optimizer crashes, miscompilations by optimizers, or bad native code generation (including problems in the static @@ -22,82 +18,61 @@ For more information on the design and inner workings of **bugpoint**, as well a advice for using bugpoint, see *llvm/docs/Bugpoint.html* in the LLVM distribution. - OPTIONS ------- - - **--additional-so** *library* Load the dynamic shared object *library* into the test program whenever it is run. This is useful if you are debugging programs which depend on non-LLVM libraries (such as the X or curses libraries) to run. - - **--append-exit-code**\ =\ *{true,false}* Append the test programs exit code to the output file so that a change in exit code is considered a test failure. Defaults to false. - - **--args** *program args* - Pass all arguments specified after -args to the test program whenever it runs. - Note that if any of the *program args* start with a '-', you should use: - + Pass all arguments specified after **--args** to the test program whenever it runs. 
+ Note that if any of the *program args* start with a "``-``", you should use: - .. code-block:: perl + .. code-block:: bash bugpoint [bugpoint args] --args -- [program args] - - The "--" right after the **--args** option tells **bugpoint** to consider any - options starting with ``-`` to be part of the **--args** option, not as options to - **bugpoint** itself. - - + The "``--``" right after the **--args** option tells **bugpoint** to consider + any options starting with "``-``" to be part of the **--args** option, not as + options to **bugpoint** itself. **--tool-args** *tool args* - Pass all arguments specified after --tool-args to the LLVM tool under test + Pass all arguments specified after **--tool-args** to the LLVM tool under test (**llc**, **lli**, etc.) whenever it runs. You should use this option in the following way: - - .. code-block:: perl + .. code-block:: bash bugpoint [bugpoint args] --tool-args -- [tool args] - - The "--" right after the **--tool-args** option tells **bugpoint** to consider any - options starting with ``-`` to be part of the **--tool-args** option, not as - options to **bugpoint** itself. (See **--args**, above.) - - + The "``--``" right after the **--tool-args** option tells **bugpoint** to + consider any options starting with "``-``" to be part of the **--tool-args** + option, not as options to **bugpoint** itself. (See **--args**, above.) **--safe-tool-args** *tool args* Pass all arguments specified after **--safe-tool-args** to the "safe" execution tool. - - **--gcc-tool-args** *gcc tool args* Pass all arguments specified after **--gcc-tool-args** to the invocation of **gcc**. - - **--opt-args** *opt args* Pass all arguments specified after **--opt-args** to the invocation of **opt**. - - **--disable-{dce,simplifycfg}** Do not run the specified passes to clean up and reduce the size of the test @@ -105,36 +80,26 @@ OPTIONS reduce test programs. If you're trying to find a bug in one of these passes, **bugpoint** may crash. - - **--enable-valgrind** Use valgrind to find faults in the optimization phase. This will allow bugpoint to find otherwise asymptomatic problems caused by memory mis-management. - - **-find-bugs** Continually randomize the specified passes and run them on the test program until a bug is found or the user kills **bugpoint**. - - **-help** Print a summary of command line options. - - **--input** *filename* Open *filename* and redirect the standard input of the test program, whenever it runs, to come from that file. - - **--load** *plugin* Load the dynamic object *plugin* into **bugpoint** itself. This object should @@ -143,20 +108,15 @@ OPTIONS optimizations, use the **-help** and **--load** options together; for example: - .. code-block:: perl + .. code-block:: bash bugpoint --load myNewPass.so -help - - - **--mlimit** *megabytes* Specifies an upper limit on memory usage of the optimization and codegen. Set to zero to disable the limit. - - **--output** *filename* Whenever the test program produces output on its standard output stream, it @@ -164,14 +124,10 @@ OPTIONS do not use this option, **bugpoint** will attempt to generate a reference output by compiling the program with the "safe" backend and running it. - - **--profile-info-file** *filename* Profile file loaded by **--profile-loader**. 
- - **--run-{int,jit,llc,custom}** Whenever the test program is compiled, **bugpoint** should generate code for it @@ -179,8 +135,6 @@ OPTIONS interpreter, the JIT compiler, the static native code compiler, or a custom command (see **--exec-command**) respectively. - - **--safe-{llc,custom}** When debugging a code generator, **bugpoint** should use the specified code @@ -192,16 +146,12 @@ OPTIONS respectively. The interpreter and the JIT backends cannot currently be used as the "safe" backends. - - **--exec-command** *command* This option defines the command to use with the **--run-custom** and **--safe-custom** options to execute the bitcode testcase. This can be useful for cross-compilation. - - **--compile-command** *command* This option defines the command to use with the **--compile-custom** @@ -210,38 +160,28 @@ OPTIONS generate a reduced unit test, you may add CHECK directives to the testcase and pass the name of an executable compile-command script in this form: - .. code-block:: sh #!/bin/sh llc "$@" not FileCheck [bugpoint input file].ll < bugpoint-test-program.s - This script will "fail" as long as FileCheck passes. So the result will be the minimum bitcode that passes FileCheck. - - **--safe-path** *path* This option defines the path to the command to execute with the **--safe-{int,jit,llc,custom}** option. - - - EXIT STATUS ----------- - If **bugpoint** succeeds in finding a problem, it will exit with 0. Otherwise, if an error occurs, it will exit with a non-zero value. - SEE ALSO -------- - opt|opt diff --git a/docs/CommandGuide/lit.rst b/docs/CommandGuide/lit.rst index 9e96cd2a4b..1dcaff10bf 100644 --- a/docs/CommandGuide/lit.rst +++ b/docs/CommandGuide/lit.rst @@ -1,351 +1,282 @@ lit - LLVM Integrated Tester ============================ - SYNOPSIS -------- - -**lit** [*options*] [*tests*] - +:program:`lit` [*options*] [*tests*] DESCRIPTION ----------- +:program:`lit` is a portable tool for executing LLVM and Clang style test +suites, summarizing their results, and providing indication of failures. +:program:`lit` is designed to be a lightweight testing tool with as simple a +user interface as possible. -**lit** is a portable tool for executing LLVM and Clang style test suites, -summarizing their results, and providing indication of failures. **lit** is -designed to be a lightweight testing tool with as simple a user interface as -possible. - -**lit** should be run with one or more *tests* to run specified on the command -line. Tests can be either individual test files or directories to search for -tests (see "TEST DISCOVERY"). +:program:`lit` should be run with one or more *tests* to run specified on the +command line. Tests can be either individual test files or directories to +search for tests (see :ref:`test-discovery`). Each specified test will be executed (potentially in parallel) and once all -tests have been run **lit** will print summary information on the number of tests -which passed or failed (see "TEST STATUS RESULTS"). The **lit** program will -execute with a non-zero exit code if any tests fail. - -By default **lit** will use a succinct progress display and will only print -summary information for test failures. See "OUTPUT OPTIONS" for options -controlling the **lit** progress display and output. +tests have been run :program:`lit` will print summary information on the number +of tests which passed or failed (see :ref:`test-status-results`). The +:program:`lit` program will execute with a non-zero exit code if any tests +fail. 
-**lit** also includes a number of options for controlling how tests are executed -(specific features may depend on the particular test format). See "EXECUTION -OPTIONS" for more information. +By default :program:`lit` will use a succinct progress display and will only +print summary information for test failures. See :ref:`output-options` for +options controlling the :program:`lit` progress display and output. -Finally, **lit** also supports additional options for only running a subset of -the options specified on the command line, see "SELECTION OPTIONS" for -more information. +:program:`lit` also includes a number of options for controlling how tests are +executed (specific features may depend on the particular test format). See +:ref:`execution-options` for more information. -Users interested in the **lit** architecture or designing a **lit** testing -implementation should see "LIT INFRASTRUCTURE" +Finally, :program:`lit` also supports additional options for only running a +subset of the options specified on the command line, see +:ref:`selection-options` for more information. +Users interested in the :program:`lit` architecture or designing a +:program:`lit` testing implementation should see :ref:`lit-infrastructure`. GENERAL OPTIONS --------------- +.. option:: -h, --help + Show the :program:`lit` help message. -**-h**, **--help** - - Show the **lit** help message. - - - -**-j** *N*, **--threads**\ =\ *N* - - Run *N* tests in parallel. By default, this is automatically chosen to match - the number of detected available CPUs. - - +.. option:: -j N, --threads=N -**--config-prefix**\ =\ *NAME* + Run ``N`` tests in parallel. By default, this is automatically chosen to + match the number of detected available CPUs. - Search for *NAME.cfg* and *NAME.site.cfg* when searching for test suites, - instead of *lit.cfg* and *lit.site.cfg*. +.. option:: --config-prefix=NAME + Search for :file:`{NAME}.cfg` and :file:`{NAME}.site.cfg` when searching for + test suites, instead of :file:`lit.cfg` and :file:`lit.site.cfg`. +.. option:: --param NAME, --param NAME=VALUE -**--param** *NAME*, **--param** *NAME*\ =\ *VALUE* - - Add a user defined parameter *NAME* with the given *VALUE* (or the empty - string if not given). The meaning and use of these parameters is test suite + Add a user defined parameter ``NAME`` with the given ``VALUE`` (or the empty + string if not given). The meaning and use of these parameters is test suite dependent. - - +.. _output-options: OUTPUT OPTIONS -------------- - - -**-q**, **--quiet** +.. option:: -q, --quiet Suppress any output except for test failures. - - -**-s**, **--succinct** +.. option:: -s, --succinct Show less output, for example don't show information on tests that pass. - - -**-v**, **--verbose** +.. option:: -v, --verbose Show more information on test failures, for example the entire test output instead of just the test result. - - -**--no-progress-bar** +.. option:: --no-progress-bar Do not use curses based progress bar. - - +.. _execution-options: EXECUTION OPTIONS ----------------- +.. option:: --path=PATH + Specify an additional ``PATH`` to use when searching for executables in tests. -**--path**\ =\ *PATH* - - Specify an addition *PATH* to use when searching for executables in tests. - - - -**--vg** - - Run individual tests under valgrind (using the memcheck tool). The - *--error-exitcode* argument for valgrind is used so that valgrind failures will - cause the program to exit with a non-zero status. 
- - When this option is enabled, **lit** will also automatically provide a - "valgrind" feature that can be used to conditionally disable (or expect failure - in) certain tests. - - - -**--vg-arg**\ =\ *ARG* - - When *--vg* is used, specify an additional argument to pass to valgrind itself. - +.. option:: --vg + Run individual tests under valgrind (using the memcheck tool). The + ``--error-exitcode`` argument for valgrind is used so that valgrind failures + will cause the program to exit with a non-zero status. -**--vg-leak** + When this option is enabled, :program:`lit` will also automatically provide a + "``valgrind``" feature that can be used to conditionally disable (or expect + failure in) certain tests. - When *--vg* is used, enable memory leak checks. When this option is enabled, - **lit** will also automatically provide a "vg_leak" feature that can be - used to conditionally disable (or expect failure in) certain tests. +.. option:: --vg-arg=ARG + When :option:`--vg` is used, specify an additional argument to pass to + :program:`valgrind` itself. +.. option:: --vg-leak + When :option:`--vg` is used, enable memory leak checks. When this option is + enabled, :program:`lit` will also automatically provide a "``vg_leak``" + feature that can be used to conditionally disable (or expect failure in) + certain tests. -**--time-tests** - - Track the wall time individual tests take to execute and includes the results in - the summary output. This is useful for determining which tests in a test suite - take the most time to execute. Note that this option is most useful with *-j - 1*. - +.. option:: --time-tests + Track the wall time individual tests take to execute and includes the results + in the summary output. This is useful for determining which tests in a test + suite take the most time to execute. Note that this option is most useful + with ``-j 1``. +.. _selection-options: SELECTION OPTIONS ----------------- +.. option:: --max-tests=N + Run at most ``N`` tests and then terminate. -**--max-tests**\ =\ *N* - - Run at most *N* tests and then terminate. - - - -**--max-time**\ =\ *N* +.. option:: --max-time=N - Spend at most *N* seconds (approximately) running tests and then terminate. + Spend at most ``N`` seconds (approximately) running tests and then terminate. - - -**--shuffle** +.. option:: --shuffle Run the tests in a random order. - - - ADDITIONAL OPTIONS ------------------ +.. option:: --debug + Run :program:`lit` in debug mode, for debugging configuration issues and + :program:`lit` itself. -**--debug** - - Run **lit** in debug mode, for debugging configuration issues and **lit** itself. - - - -**--show-suites** +.. option:: --show-suites List the discovered test suites as part of the standard output. - - -**--no-tcl-as-sh** +.. option:: --no-tcl-as-sh Run Tcl scripts internally (instead of converting to shell scripts). +.. option:: --repeat=N - -**--repeat**\ =\ *N* - - Run each test *N* times. Currently this is primarily useful for timing tests, - other results are not collated in any reasonable fashion. - - - + Run each test ``N`` times. Currently this is primarily useful for timing + tests, other results are not collated in any reasonable fashion. EXIT STATUS ----------- - -**lit** will exit with an exit code of 1 if there are any FAIL or XPASS -results. Otherwise, it will exit with the status 0. Other exit codes are used +:program:`lit` will exit with an exit code of 1 if there are any FAIL or XPASS +results. Otherwise, it will exit with the status 0. 
Other exit codes are used for non-test related failures (for example a user error or an internal program error). +.. _test-discovery: TEST DISCOVERY -------------- +The inputs passed to :program:`lit` can be either individual tests, or entire +directories or hierarchies of tests to run. When :program:`lit` starts up, the +first thing it does is convert the inputs into a complete list of tests to run +as part of *test discovery*. -The inputs passed to **lit** can be either individual tests, or entire -directories or hierarchies of tests to run. When **lit** starts up, the first -thing it does is convert the inputs into a complete list of tests to run as part -of *test discovery*. +In the :program:`lit` model, every test must exist inside some *test suite*. +:program:`lit` resolves the inputs specified on the command line to test suites +by searching upwards from the input path until it finds a :file:`lit.cfg` or +:file:`lit.site.cfg` file. These files serve as both a marker of test suites +and as configuration files which :program:`lit` loads in order to understand +how to find and run the tests inside the test suite. -In the **lit** model, every test must exist inside some *test suite*. **lit** -resolves the inputs specified on the command line to test suites by searching -upwards from the input path until it finds a *lit.cfg* or *lit.site.cfg* -file. These files serve as both a marker of test suites and as configuration -files which **lit** loads in order to understand how to find and run the tests -inside the test suite. - -Once **lit** has mapped the inputs into test suites it traverses the list of -inputs adding tests for individual files and recursively searching for tests in -directories. +Once :program:`lit` has mapped the inputs into test suites it traverses the +list of inputs adding tests for individual files and recursively searching for +tests in directories. This behavior makes it easy to specify a subset of tests to run, while still allowing the test suite configuration to control exactly how tests are -interpreted. In addition, **lit** always identifies tests by the test suite they -are in, and their relative path inside the test suite. For appropriately -configured projects, this allows **lit** to provide convenient and flexible -support for out-of-tree builds. +interpreted. In addition, :program:`lit` always identifies tests by the test +suite they are in, and their relative path inside the test suite. For +appropriately configured projects, this allows :program:`lit` to provide +convenient and flexible support for out-of-tree builds. +.. _test-status-results: TEST STATUS RESULTS ------------------- - Each test ultimately produces one of the following six results: - **PASS** The test succeeded. - - **XFAIL** - The test failed, but that is expected. This is used for test formats which allow + The test failed, but that is expected. This is used for test formats which allow specifying that a test does not currently work, but wish to leave it in the test suite. - - **XPASS** - The test succeeded, but it was expected to fail. This is used for tests which + The test succeeded, but it was expected to fail. This is used for tests which were specified as expected to fail, but are now succeeding (generally because the feature they test was broken and has been fixed). - - **FAIL** The test failed. - - **UNRESOLVED** - The test result could not be determined. For example, this occurs when the test + The test result could not be determined. 
For example, this occurs when the test could not be run, the test itself is invalid, or the test was interrupted. - - **UNSUPPORTED** - The test is not supported in this environment. This is used by test formats + The test is not supported in this environment. This is used by test formats which can report unsupported tests. - - Depending on the test format tests may produce additional information about -their status (generally only for failures). See the Output|"OUTPUT OPTIONS" +their status (generally only for failures). See the :ref:`output-options` section for more information. +.. _lit-infrastructure: LIT INFRASTRUCTURE ------------------ +This section describes the :program:`lit` testing architecture for users interested in +creating a new :program:`lit` testing implementation, or extending an existing one. -This section describes the **lit** testing architecture for users interested in -creating a new **lit** testing implementation, or extending an existing one. - -**lit** proper is primarily an infrastructure for discovering and running +:program:`lit` proper is primarily an infrastructure for discovering and running arbitrary tests, and to expose a single convenient interface to these -tests. **lit** itself doesn't know how to run tests, rather this logic is +tests. :program:`lit` itself doesn't know how to run tests, rather this logic is defined by *test suites*. TEST SUITES ~~~~~~~~~~~ - -As described in "TEST DISCOVERY", tests are always located inside a *test -suite*. Test suites serve to define the format of the tests they contain, the +As described in :ref:`test-discovery`, tests are always located inside a *test +suite*. Test suites serve to define the format of the tests they contain, the logic for finding those tests, and any additional information to run the tests. -**lit** identifies test suites as directories containing *lit.cfg* or -*lit.site.cfg* files (see also **--config-prefix**). Test suites are initially -discovered by recursively searching up the directory hierarchy for all the input -files passed on the command line. You can use **--show-suites** to display the -discovered test suites at startup. +:program:`lit` identifies test suites as directories containing ``lit.cfg`` or +``lit.site.cfg`` files (see also :option:`--config-prefix`). Test suites are +initially discovered by recursively searching up the directory hierarchy for +all the input files passed on the command line. You can use +:option:`--show-suites` to display the discovered test suites at startup. -Once a test suite is discovered, its config file is loaded. Config files -themselves are Python modules which will be executed. When the config file is +Once a test suite is discovered, its config file is loaded. Config files +themselves are Python modules which will be executed. When the config file is executed, two important global variables are predefined: - **lit** The global **lit** configuration object (a *LitConfig* instance), which defines the builtin test formats, global configuration parameters, and other helper routines for implementing test configurations. - - **config** This is the config object (a *TestingConfig* instance) for the test suite, - which the config file is expected to populate. The following variables are also + which the config file is expected to populate. 
The following variables are also available on the *config* object, some of which must be set by the config and others are optional or predefined: @@ -353,135 +284,133 @@ executed, two important global variables are predefined: diagnostics. **test_format** *[required]* The test format object which will be used to - discover and run tests in the test suite. Generally this will be a builtin test + discover and run tests in the test suite. Generally this will be a builtin test format available from the *lit.formats* module. - **test_src_root** The filesystem path to the test suite root. For out-of-dir + **test_src_root** The filesystem path to the test suite root. For out-of-dir builds this is the directory that will be scanned for tests. **test_exec_root** For out-of-dir builds, the path to the test suite root inside - the object directory. This is where tests will be run and temporary output files + the object directory. This is where tests will be run and temporary output files placed. **environment** A dictionary representing the environment to use when executing tests in the suite. **suffixes** For **lit** test formats which scan directories for tests, this - variable is a list of suffixes to identify test files. Used by: *ShTest*, + variable is a list of suffixes to identify test files. Used by: *ShTest*, *TclTest*. **substitutions** For **lit** test formats which substitute variables into a test - script, the list of substitutions to perform. Used by: *ShTest*, *TclTest*. + script, the list of substitutions to perform. Used by: *ShTest*, *TclTest*. **unsupported** Mark an unsupported directory, all tests within it will be - reported as unsupported. Used by: *ShTest*, *TclTest*. + reported as unsupported. Used by: *ShTest*, *TclTest*. **parent** The parent configuration, this is the config object for the directory containing the test suite, or None. - **root** The root configuration. This is the top-most **lit** configuration in + **root** The root configuration. This is the top-most :program:`lit` configuration in the project. **on_clone** The config is actually cloned for every subdirectory inside a test - suite, to allow local configuration on a per-directory basis. The *on_clone* + suite, to allow local configuration on a per-directory basis. The *on_clone* variable can be set to a Python function which will be called whenever a - configuration is cloned (for a subdirectory). The function should takes three + configuration is cloned (for a subdirectory). The function should takes three arguments: (1) the parent configuration, (2) the new configuration (which the *on_clone* function will generally modify), and (3) the test path to the new directory being scanned. - - - TEST DISCOVERY ~~~~~~~~~~~~~~ - -Once test suites are located, **lit** recursively traverses the source directory -(following *test_src_root*) looking for tests. When **lit** enters a -sub-directory, it first checks to see if a nested test suite is defined in that -directory. If so, it loads that test suite recursively, otherwise it -instantiates a local test config for the directory (see "LOCAL CONFIGURATION -FILES"). +Once test suites are located, :program:`lit` recursively traverses the source +directory (following *test_src_root*) looking for tests. When :program:`lit` +enters a sub-directory, it first checks to see if a nested test suite is +defined in that directory. 
If so, it loads that test suite recursively, +otherwise it instantiates a local test config for the directory (see +:ref:`local-configuration-files`). Tests are identified by the test suite they are contained within, and the -relative path inside that suite. Note that the relative path may not refer to an -actual file on disk; some test formats (such as *GoogleTest*) define "virtual -tests" which have a path that contains both the path to the actual test file and -a subpath to identify the virtual test. +relative path inside that suite. Note that the relative path may not refer to +an actual file on disk; some test formats (such as *GoogleTest*) define +"virtual tests" which have a path that contains both the path to the actual +test file and a subpath to identify the virtual test. +.. _local-configuration-files: LOCAL CONFIGURATION FILES ~~~~~~~~~~~~~~~~~~~~~~~~~ - -When **lit** loads a subdirectory in a test suite, it instantiates a local test -configuration by cloning the configuration for the parent direction -- the root -of this configuration chain will always be a test suite. Once the test -configuration is cloned **lit** checks for a *lit.local.cfg* file in the -subdirectory. If present, this file will be loaded and can be used to specialize -the configuration for each individual directory. This facility can be used to -define subdirectories of optional tests, or to change other configuration -parameters -- for example, to change the test format, or the suffixes which -identify test files. - +When :program:`lit` loads a subdirectory in a test suite, it instantiates a +local test configuration by cloning the configuration for the parent direction +--- the root of this configuration chain will always be a test suite. Once the +test configuration is cloned :program:`lit` checks for a *lit.local.cfg* file +in the subdirectory. If present, this file will be loaded and can be used to +specialize the configuration for each individual directory. This facility can +be used to define subdirectories of optional tests, or to change other +configuration parameters --- for example, to change the test format, or the +suffixes which identify test files. TEST RUN OUTPUT FORMAT ~~~~~~~~~~~~~~~~~~~~~~ - -The b<lit> output for a test run conforms to the following schema, in both short -and verbose modes (although in short mode no PASS lines will be shown). This -schema has been chosen to be relatively easy to reliably parse by a machine (for -example in buildbot log scraping), and for other tools to generate. +The :program:`lit` output for a test run conforms to the following schema, in +both short and verbose modes (although in short mode no PASS lines will be +shown). This schema has been chosen to be relatively easy to reliably parse by +a machine (for example in buildbot log scraping), and for other tools to +generate. Each test result is expected to appear on a line that matches: -<result code>: <test name> (<progress info>) +.. code-block:: none + + <result code>: <test name> (<progress info>) -where <result-code> is a standard test result such as PASS, FAIL, XFAIL, XPASS, -UNRESOLVED, or UNSUPPORTED. The performance result codes of IMPROVED and +where ``<result-code>`` is a standard test result such as PASS, FAIL, XFAIL, +XPASS, UNRESOLVED, or UNSUPPORTED. The performance result codes of IMPROVED and REGRESSED are also allowed. -The <test name> field can consist of an arbitrary string containing no newline. +The ``<test name>`` field can consist of an arbitrary string containing no +newline. 
-The <progress info> field can be used to report progress information such as -(1/300) or can be empty, but even when empty the parentheses are required. +The ``<progress info>`` field can be used to report progress information such +as (1/300) or can be empty, but even when empty the parentheses are required. Each test result may include additional (multiline) log information in the -following format. +following format: + +.. code-block:: none -<log delineator> TEST '(<test name>)' <trailing delineator> -... log message ... -<log delineator> + <log delineator> TEST '(<test name>)' <trailing delineator> + ... log message ... + <log delineator> -where <test name> should be the name of a preceding reported test, <log -delineator> is a string of '\*' characters *at least* four characters long (the -recommended length is 20), and <trailing delineator> is an arbitrary (unparsed) -string. +where ``<test name>`` should be the name of a preceding reported test, ``<log +delineator>`` is a string of "*" characters *at least* four characters long +(the recommended length is 20), and ``<trailing delineator>`` is an arbitrary +(unparsed) string. The following is an example of a test run output which consists of four tests A, -B, C, and D, and a log message for the failing test C:: +B, C, and D, and a log message for the failing test C: + +.. code-block:: none PASS: A (1 of 4) PASS: B (2 of 4) FAIL: C (3 of 4) - \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* TEST 'C' FAILED \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* + ******************** TEST 'C' FAILED ******************** Test 'C' failed as a result of exit code 1. - \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* + ******************** PASS: D (4 of 4) - LIT EXAMPLE TESTS ~~~~~~~~~~~~~~~~~ - -The **lit** distribution contains several example implementations of test suites -in the *ExampleTests* directory. - +The :program:`lit` distribution contains several example implementations of +test suites in the *ExampleTests* directory. SEE ALSO -------- - valgrind(1) diff --git a/docs/CommandGuide/llc.rst b/docs/CommandGuide/llc.rst index 6f1c486c3f..70354b0343 100644 --- a/docs/CommandGuide/llc.rst +++ b/docs/CommandGuide/llc.rst @@ -1,251 +1,187 @@ llc - LLVM static compiler ========================== - SYNOPSIS -------- - -**llc** [*options*] [*filename*] - +:program:`llc` [*options*] [*filename*] DESCRIPTION ----------- - -The **llc** command compiles LLVM source inputs into assembly language for a -specified architecture. The assembly language output can then be passed through -a native assembler and linker to generate a native executable. +The :program:`llc` command compiles LLVM source inputs into assembly language +for a specified architecture. The assembly language output can then be passed +through a native assembler and linker to generate a native executable. The choice of architecture for the output assembly code is automatically -determined from the input file, unless the **-march** option is used to override -the default. - +determined from the input file, unless the :option:`-march` option is used to +override the default. OPTIONS ------- +If ``filename`` is "``-``" or omitted, :program:`llc` reads from standard input. +Otherwise, it will from ``filename``. Inputs can be in either the LLVM assembly +language format (``.ll``) or the LLVM bitcode format (``.bc``). -If *filename* is - or omitted, **llc** reads from standard input. Otherwise, it -will from *filename*. 
Inputs can be in either the LLVM assembly language -format (.ll) or the LLVM bitcode format (.bc). +If the :option:`-o` option is omitted, then :program:`llc` will send its output +to standard output if the input is from standard input. If the :option:`-o` +option specifies "``-``", then the output will also be sent to standard output. -If the **-o** option is omitted, then **llc** will send its output to standard -output if the input is from standard input. If the **-o** option specifies -, -then the output will also be sent to standard output. +If no :option:`-o` option is specified and an input file other than "``-``" is +specified, then :program:`llc` creates the output filename by taking the input +filename, removing any existing ``.bc`` extension, and adding a ``.s`` suffix. -If no **-o** option is specified and an input file other than - is specified, -then **llc** creates the output filename by taking the input filename, -removing any existing *.bc* extension, and adding a *.s* suffix. - -Other **llc** options are as follows: +Other :program:`llc` options are described below. End-user Options ~~~~~~~~~~~~~~~~ - - -**-help** +.. option:: -help Print a summary of command line options. +.. option:: -O=uint + Generate code at different optimization levels. These correspond to the + ``-O0``, ``-O1``, ``-O2``, and ``-O3`` optimization levels used by + :program:`llvm-gcc` and :program:`clang`. -**-O**\ =\ *uint* - - Generate code at different optimization levels. These correspond to the *-O0*, - *-O1*, *-O2*, and *-O3* optimization levels used by **llvm-gcc** and - **clang**. - - - -**-mtriple**\ =\ *target triple* +.. option:: -mtriple=<target triple> Override the target triple specified in the input file with the specified string. - - -**-march**\ =\ *arch* +.. option:: -march=<arch> Specify the architecture for which to generate assembly, overriding the target - encoded in the input file. See the output of **llc -help** for a list of + encoded in the input file. See the output of ``llc -help`` for a list of valid architectures. By default this is inferred from the target triple or autodetected to the current architecture. - - -**-mcpu**\ =\ *cpuname* +.. option:: -mcpu=<cpuname> Specify a specific chip in the current architecture to generate code for. By default this is inferred from the target triple and autodetected to the current architecture. For a list of available CPUs, use: - **llvm-as < /dev/null | llc -march=xyz -mcpu=help** + .. code-block:: none + llvm-as < /dev/null | llc -march=xyz -mcpu=help -**-mattr**\ =\ *a1,+a2,-a3,...* +.. option:: -mattr=a1,+a2,-a3,... Override or control specific attributes of the target, such as whether SIMD operations are enabled or not. The default set of attributes is set by the current CPU. For a list of available attributes, use: - **llvm-as < /dev/null | llc -march=xyz -mattr=help** + .. code-block:: none + llvm-as < /dev/null | llc -march=xyz -mattr=help -**--disable-fp-elim** +.. option:: --disable-fp-elim Disable frame pointer elimination optimization. - - -**--disable-excess-fp-precision** +.. option:: --disable-excess-fp-precision Disable optimizations that may produce excess precision for floating point. Note that this option can dramatically slow down code on some systems (e.g. X86). - - -**--enable-no-infs-fp-math** +.. option:: --enable-no-infs-fp-math Enable optimizations that assume no Inf values. - - -**--enable-no-nans-fp-math** +.. option:: --enable-no-nans-fp-math Enable optimizations that assume no NAN values. 
- - -**--enable-unsafe-fp-math** +.. option:: --enable-unsafe-fp-math Enable optimizations that make unsafe assumptions about IEEE math (e.g. that addition is associative) or may not work for all input ranges. These optimizations allow the code generator to make use of some instructions which - would otherwise not be usable (such as fsin on X86). - + would otherwise not be usable (such as ``fsin`` on X86). +.. option:: --enable-correct-eh-support -**--enable-correct-eh-support** + Instruct the **lowerinvoke** pass to insert code for correct exception + handling support. This is expensive and is by default omitted for efficiency. - Instruct the **lowerinvoke** pass to insert code for correct exception handling - support. This is expensive and is by default omitted for efficiency. - - - -**--stats** +.. option:: --stats Print statistics recorded by code-generation passes. - - -**--time-passes** +.. option:: --time-passes Record the amount of time needed for each pass and print a report to standard error. +.. option:: --load=<dso_path> - -**--load**\ =\ *dso_path* - - Dynamically load *dso_path* (a path to a dynamically shared object) that - implements an LLVM target. This will permit the target name to be used with the - **-march** option so that code can be generated for that target. - - - + Dynamically load ``dso_path`` (a path to a dynamically shared object) that + implements an LLVM target. This will permit the target name to be used with + the :option:`-march` option so that code can be generated for that target. Tuning/Configuration Options ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - - -**--print-machineinstrs** +.. option:: --print-machineinstrs Print generated machine code between compilation phases (useful for debugging). +.. option:: --regalloc=<allocator> - -**--regalloc**\ =\ *allocator* - - Specify the register allocator to use. The default *allocator* is *local*. + Specify the register allocator to use. The default ``allocator`` is *local*. Valid register allocators are: - *simple* Very simple "always spill" register allocator - - *local* Local register allocator - - *linearscan* Linear scan global register allocator - - *iterativescan* Iterative scan global register allocator - - - - -**--spiller**\ =\ *spiller* +.. option:: --spiller=<spiller> Specify the spiller to use for register allocators that support it. Currently - this option is used only by the linear scan register allocator. The default - *spiller* is *local*. Valid spillers are: - + this option is used only by the linear scan register allocator. The default + ``spiller`` is *local*. Valid spillers are: *simple* Simple spiller - - *local* Local spiller - - - - - Intel IA-32-specific Options ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. option:: --x86-asm-syntax=[att|intel] - -**--x86-asm-syntax=att|intel** - - Specify whether to emit assembly code in AT&T syntax (the default) or intel + Specify whether to emit assembly code in AT&T syntax (the default) or Intel syntax. - - - - EXIT STATUS ----------- - -If **llc** succeeds, it will exit with 0. Otherwise, if an error occurs, -it will exit with a non-zero value. - +If :program:`llc` succeeds, it will exit with 0. Otherwise, if an error +occurs, it will exit with a non-zero value. 
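As a brief, hedged example of how the options above combine in practice (the
input file name, target, and CPU below are placeholders rather than
recommendations):

.. code-block:: sh

   # Compile LLVM IR to assembly for an explicitly selected target,
   # overriding whatever triple the input file records.
   llc -O=2 -march=x86-64 -mcpu=generic -o hello.s hello.ll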
SEE ALSO -------- +lli -lli|lli diff --git a/docs/CommandGuide/llvm-cov.rst b/docs/CommandGuide/llvm-cov.rst index 09275f6af7..524f24087f 100644 --- a/docs/CommandGuide/llvm-cov.rst +++ b/docs/CommandGuide/llvm-cov.rst @@ -1,51 +1,39 @@ llvm-cov - emit coverage information ==================================== - SYNOPSIS -------- - -**llvm-cov** [-gcno=filename] [-gcda=filename] [dump] - +:program:`llvm-cov` [-gcno=filename] [-gcda=filename] [dump] DESCRIPTION ----------- - -The experimental **llvm-cov** tool reads in description file generated by compiler -and coverage data file generated by instrumented program. This program assumes -that the description and data file uses same format as gcov files. - +The experimental :program:`llvm-cov` tool reads in description file generated +by compiler and coverage data file generated by instrumented program. This +program assumes that the description and data file uses same format as gcov +files. OPTIONS ------- +.. option:: -gcno=filename + This option selects input description file generated by compiler while + instrumenting program. -**-gcno=filename]** - - This option selects input description file generated by compiler while instrumenting - program. - - - -**-gcda=filename]** +.. option:: -gcda=filename This option selects coverage data file generated by instrumented compiler. +.. option:: -dump - -**-dump** - - This options enables output dump that is suitable for a developer to help debug - **llvm-cov** itself. - - - + This options enables output dump that is suitable for a developer to help + debug :program:`llvm-cov` itself. EXIT STATUS ----------- +:program:`llvm-cov` returns 1 if it cannot read input files. Otherwise, it +exits with zero. -**llvm-cov** returns 1 if it cannot read input files. Otherwise, it exits with zero. diff --git a/docs/CommandGuide/llvm-link.rst b/docs/CommandGuide/llvm-link.rst index 63019d7cca..e4f2228841 100644 --- a/docs/CommandGuide/llvm-link.rst +++ b/docs/CommandGuide/llvm-link.rst @@ -1,96 +1,74 @@ llvm-link - LLVM linker ======================= - SYNOPSIS -------- - -**llvm-link** [*options*] *filename ...* - +:program:`llvm-link` [*options*] *filename ...* DESCRIPTION ----------- +:program:`llvm-link` takes several LLVM bitcode files and links them together +into a single LLVM bitcode file. It writes the output file to standard output, +unless the :option:`-o` option is used to specify a filename. -**llvm-link** takes several LLVM bitcode files and links them together into a -single LLVM bitcode file. It writes the output file to standard output, unless -the **-o** option is used to specify a filename. - -**llvm-link** attempts to load the input files from the current directory. If -that fails, it looks for each file in each of the directories specified by the -**-L** options on the command line. The library search paths are global; each -one is searched for every input file if necessary. The directories are searched -in the order they were specified on the command line. - +:program:`llvm-link` attempts to load the input files from the current +directory. If that fails, it looks for each file in each of the directories +specified by the :option:`-L` options on the command line. The library search +paths are global; each one is searched for every input file if necessary. The +directories are searched in the order they were specified on the command line. OPTIONS ------- +.. option:: -L directory + Add the specified ``directory`` to the library search path. 
When looking for + libraries, :program:`llvm-link` will look in path name for libraries. This + option can be specified multiple times; :program:`llvm-link` will search + inside these directories in the order in which they were specified on the + command line. -**-L** *directory* - - Add the specified *directory* to the library search path. When looking for - libraries, **llvm-link** will look in path name for libraries. This option can be - specified multiple times; **llvm-link** will search inside these directories in - the order in which they were specified on the command line. - - - -**-f** - - Enable binary output on terminals. Normally, **llvm-link** will refuse to - write raw bitcode output if the output stream is a terminal. With this option, - **llvm-link** will write raw bitcode regardless of the output device. - - +.. option:: -f -**-o** *filename* + Enable binary output on terminals. Normally, :program:`llvm-link` will refuse + to write raw bitcode output if the output stream is a terminal. With this + option, :program:`llvm-link` will write raw bitcode regardless of the output + device. - Specify the output file name. If *filename* is ``-``, then **llvm-link** will - write its output to standard output. +.. option:: -o filename + Specify the output file name. If ``filename`` is "``-``", then + :program:`llvm-link` will write its output to standard output. - -**-S** +.. option:: -S Write output in LLVM intermediate language (instead of bitcode). +.. option:: -d - -**-d** - - If specified, **llvm-link** prints a human-readable version of the output + If specified, :program:`llvm-link` prints a human-readable version of the output bitcode file to standard error. - - -**-help** +.. option:: -help Print a summary of command line options. +.. option:: -v - -**-v** - - Verbose mode. Print information about what **llvm-link** is doing. This - typically includes a message for each bitcode file linked in and for each + Verbose mode. Print information about what :program:`llvm-link` is doing. + This typically includes a message for each bitcode file linked in and for each library found. - - - EXIT STATUS ----------- - -If **llvm-link** succeeds, it will exit with 0. Otherwise, if an error +If :program:`llvm-link` succeeds, it will exit with 0. Otherwise, if an error occurs, it will exit with a non-zero value. - SEE ALSO -------- +gccld -gccld|gccld diff --git a/docs/CommandGuide/llvm-stress.rst b/docs/CommandGuide/llvm-stress.rst index 44aa32c755..fb006f562b 100644 --- a/docs/CommandGuide/llvm-stress.rst +++ b/docs/CommandGuide/llvm-stress.rst @@ -1,48 +1,34 @@ llvm-stress - generate random .ll files ======================================= - SYNOPSIS -------- - -**llvm-stress** [-size=filesize] [-seed=initialseed] [-o=outfile] - +:program:`llvm-stress` [-size=filesize] [-seed=initialseed] [-o=outfile] DESCRIPTION ----------- - -The **llvm-stress** tool is used to generate random .ll files that can be used to -test different components of LLVM. - +The :program:`llvm-stress` tool is used to generate random ``.ll`` files that +can be used to test different components of LLVM. OPTIONS ------- - - -**-o** *filename* +.. option:: -o filename Specify the output filename. +.. option:: -size size + Specify the size of the generated ``.ll`` file. -**-size** *size* - - Specify the size of the generated .ll file. - - - -**-seed** *seed* +.. option:: -seed seed Specify the seed to be used for the randomly generated instructions. - - - EXIT STATUS ----------- +:program:`llvm-stress` returns 0. 
-**llvm-stress** returns 0. diff --git a/docs/CommandGuide/opt.rst b/docs/CommandGuide/opt.rst index 72f19034c9..179c297c22 100644 --- a/docs/CommandGuide/opt.rst +++ b/docs/CommandGuide/opt.rst @@ -1,183 +1,143 @@ opt - LLVM optimizer ==================== - SYNOPSIS -------- - -**opt** [*options*] [*filename*] - +:program:`opt` [*options*] [*filename*] DESCRIPTION ----------- +The :program:`opt` command is the modular LLVM optimizer and analyzer. It +takes LLVM source files as input, runs the specified optimizations or analyses +on it, and then outputs the optimized file or the analysis results. The +function of :program:`opt` depends on whether the :option:`-analyze` option is +given. -The **opt** command is the modular LLVM optimizer and analyzer. It takes LLVM -source files as input, runs the specified optimizations or analyses on it, and then -outputs the optimized file or the analysis results. The function of -**opt** depends on whether the **-analyze** option is given. - -When **-analyze** is specified, **opt** performs various analyses of the input -source. It will usually print the results on standard output, but in a few -cases, it will print output to standard error or generate a file with the -analysis output, which is usually done when the output is meant for another +When :option:`-analyze` is specified, :program:`opt` performs various analyses +of the input source. It will usually print the results on standard output, but +in a few cases, it will print output to standard error or generate a file with +the analysis output, which is usually done when the output is meant for another program. -While **-analyze** is *not* given, **opt** attempts to produce an optimized -output file. The optimizations available via **opt** depend upon what -libraries were linked into it as well as any additional libraries that have -been loaded with the **-load** option. Use the **-help** option to determine -what optimizations you can use. - -If *filename* is omitted from the command line or is *-*, **opt** reads its -input from standard input. Inputs can be in either the LLVM assembly language -format (.ll) or the LLVM bitcode format (.bc). +While :option:`-analyze` is *not* given, :program:`opt` attempts to produce an +optimized output file. The optimizations available via :program:`opt` depend +upon what libraries were linked into it as well as any additional libraries +that have been loaded with the :option:`-load` option. Use the :option:`-help` +option to determine what optimizations you can use. -If an output filename is not specified with the **-o** option, **opt** -writes its output to the standard output. +If ``filename`` is omitted from the command line or is "``-``", :program:`opt` +reads its input from standard input. Inputs can be in either the LLVM assembly +language format (``.ll``) or the LLVM bitcode format (``.bc``). +If an output filename is not specified with the :option:`-o` option, +:program:`opt` writes its output to the standard output. OPTIONS ------- +.. option:: -f + Enable binary output on terminals. Normally, :program:`opt` will refuse to + write raw bitcode output if the output stream is a terminal. With this option, + :program:`opt` will write raw bitcode regardless of the output device. -**-f** - - Enable binary output on terminals. Normally, **opt** will refuse to - write raw bitcode output if the output stream is a terminal. With this option, - **opt** will write raw bitcode regardless of the output device. - - - -**-help** +.. 
option:: -help Print a summary of command line options. - - -**-o** *filename* +.. option:: -o <filename> Specify the output filename. - - -**-S** +.. option:: -S Write output in LLVM intermediate language (instead of bitcode). +.. option:: -{passname} + :program:`opt` provides the ability to run any of LLVM's optimization or + analysis passes in any order. The :option:`-help` option lists all the passes + available. The order in which the options occur on the command line are the + order in which they are executed (within pass constraints). -**-{passname}** - - **opt** provides the ability to run any of LLVM's optimization or analysis passes - in any order. The **-help** option lists all the passes available. The order in - which the options occur on the command line are the order in which they are - executed (within pass constraints). - - - -**-std-compile-opts** +.. option:: -std-compile-opts This is short hand for a standard list of *compile time optimization* passes. - This is typically used to optimize the output from the llvm-gcc front end. It - might be useful for other front end compilers as well. To discover the full set - of options available, use the following command: - + This is typically used to optimize the output from the llvm-gcc front end. It + might be useful for other front end compilers as well. To discover the full + set of options available, use the following command: .. code-block:: sh llvm-as < /dev/null | opt -std-compile-opts -disable-output -debug-pass=Arguments +.. option:: -disable-inlining + This option is only meaningful when :option:`-std-compile-opts` is given. It + simply removes the inlining pass from the standard list. +.. option:: -disable-opt -**-disable-inlining** - - This option is only meaningful when **-std-compile-opts** is given. It simply - removes the inlining pass from the standard list. - - - -**-disable-opt** - - This option is only meaningful when **-std-compile-opts** is given. It disables - most, but not all, of the **-std-compile-opts**. The ones that remain are - **-verify**, **-lower-setjmp**, and **-funcresolve**. + This option is only meaningful when :option:`-std-compile-opts` is given. It + disables most, but not all, of the :option:`-std-compile-opts`. The ones that + remain are :option:`-verify`, :option:`-lower-setjmp`, and + :option:`-funcresolve`. - - -**-strip-debug** +.. option:: -strip-debug This option causes opt to strip debug information from the module before - applying other optimizations. It is essentially the same as **-strip** but it - ensures that stripping of debug information is done first. - - - -**-verify-each** - - This option causes opt to add a verify pass after every pass otherwise specified - on the command line (including **-verify**). This is useful for cases where it - is suspected that a pass is creating an invalid module but it is not clear which - pass is doing it. The combination of **-std-compile-opts** and **-verify-each** - can quickly track down this kind of problem. + applying other optimizations. It is essentially the same as :option:`-strip` + but it ensures that stripping of debug information is done first. +.. option:: -verify-each + This option causes opt to add a verify pass after every pass otherwise + specified on the command line (including :option:`-verify`). This is useful + for cases where it is suspected that a pass is creating an invalid module but + it is not clear which pass is doing it. 
The combination of + :option:`-std-compile-opts` and :option:`-verify-each` can quickly track down + this kind of problem. -**-profile-info-file** *filename* +.. option:: -profile-info-file <filename> - Specify the name of the file loaded by the -profile-loader option. + Specify the name of the file loaded by the ``-profile-loader`` option. - - -**-stats** +.. option:: -stats Print statistics. - - -**-time-passes** +.. option:: -time-passes Record the amount of time needed for each pass and print it to standard error. +.. option:: -debug + If this is a debug build, this option will enable debug printouts from passes + which use the ``DEBUG()`` macro. See the `LLVM Programmer's Manual + <../ProgrammersManual.html>`_, section ``#DEBUG`` for more information. -**-debug** - - If this is a debug build, this option will enable debug printouts - from passes which use the *DEBUG()* macro. See the **LLVM Programmer's - Manual**, section *#DEBUG* for more information. - - - -**-load**\ =\ *plugin* - - Load the dynamic object *plugin*. This object should register new optimization - or analysis passes. Once loaded, the object will add new command line options to - enable various optimizations or analyses. To see the new complete list of - optimizations, use the **-help** and **-load** options together. For example: +.. option:: -load=<plugin> + Load the dynamic object ``plugin``. This object should register new + optimization or analysis passes. Once loaded, the object will add new command + line options to enable various optimizations or analyses. To see the new + complete list of optimizations, use the :option:`-help` and :option:`-load` + options together. For example: .. code-block:: sh opt -load=plugin.so -help - - - -**-p** +.. option:: -p Print module after each transformation. - - - EXIT STATUS ----------- - -If **opt** succeeds, it will exit with 0. Otherwise, if an error +If :program:`opt` succeeds, it will exit with 0. Otherwise, if an error occurs, it will exit with a non-zero value. + diff --git a/docs/CommandGuide/tblgen.rst b/docs/CommandGuide/tblgen.rst index 2d191676d9..1858ee447d 100644 --- a/docs/CommandGuide/tblgen.rst +++ b/docs/CommandGuide/tblgen.rst @@ -1,186 +1,129 @@ tblgen - Target Description To C++ Code Generator ================================================= - SYNOPSIS -------- - -**tblgen** [*options*] [*filename*] - +:program:`tblgen` [*options*] [*filename*] DESCRIPTION ----------- +:program:`tblgen` translates from target description (``.td``) files into C++ +code that can be included in the definition of an LLVM target library. Most +users of LLVM will not need to use this program. It is only for assisting with +writing an LLVM target backend. -**tblgen** translates from target description (.td) files into C++ code that can -be included in the definition of an LLVM target library. Most users of LLVM will -not need to use this program. It is only for assisting with writing an LLVM -target backend. - -The input and output of **tblgen** is beyond the scope of this short -introduction. Please see the *CodeGeneration* page in the LLVM documentation. - -The *filename* argument specifies the name of a Target Description (.td) file -to read as input. +The input and output of :program:`tblgen` is beyond the scope of this short +introduction. Please see :doc:`../TableGenFundamentals`. +The *filename* argument specifies the name of a Target Description (``.td``) +file to read as input. OPTIONS ------- - - -**-help** +.. option:: -help Print a summary of command line options. 
+.. option:: -o filename + Specify the output file name. If ``filename`` is ``-``, then + :program:`tblgen` sends its output to standard output. -**-o** *filename* - - Specify the output file name. If *filename* is ``-``, then **tblgen** - sends its output to standard output. - - - -**-I** *directory* - - Specify where to find other target description files for inclusion. The - *directory* value should be a full or partial path to a directory that contains - target description files. - - - -**-asmparsernum** *N* +.. option:: -I directory - Make -gen-asm-parser emit assembly writer number *N*. + Specify where to find other target description files for inclusion. The + ``directory`` value should be a full or partial path to a directory that + contains target description files. +.. option:: -asmparsernum N + Make -gen-asm-parser emit assembly writer number ``N``. -**-asmwriternum** *N* +.. option:: -asmwriternum N - Make -gen-asm-writer emit assembly writer number *N*. + Make -gen-asm-writer emit assembly writer number ``N``. - - -**-class** *class Name* +.. option:: -class className Print the enumeration list for this class. - - -**-print-records** +.. option:: -print-records Print all records to standard output (default). - - -**-print-enums** +.. option:: -print-enums Print enumeration values for a class - - -**-print-sets** +.. option:: -print-sets Print expanded sets for testing DAG exprs. - - -**-gen-emitter** +.. option:: -gen-emitter Generate machine code emitter. - - -**-gen-register-info** +.. option:: -gen-register-info Generate registers and register classes info. - - -**-gen-instr-info** +.. option:: -gen-instr-info Generate instruction descriptions. - - -**-gen-asm-writer** +.. option:: -gen-asm-writer Generate the assembly writer. - - -**-gen-disassembler** +.. option:: -gen-disassembler Generate disassembler. - - -**-gen-pseudo-lowering** +.. option:: -gen-pseudo-lowering Generate pseudo instruction lowering. - - -**-gen-dag-isel** +.. option:: -gen-dag-isel Generate a DAG (Directed Acycle Graph) instruction selector. - - -**-gen-asm-matcher** +.. option:: -gen-asm-matcher Generate assembly instruction matcher. - - -**-gen-dfa-packetizer** +.. option:: -gen-dfa-packetizer Generate DFA Packetizer for VLIW targets. - - -**-gen-fast-isel** +.. option:: -gen-fast-isel Generate a "fast" instruction selector. - - -**-gen-subtarget** +.. option:: -gen-subtarget Generate subtarget enumerations. - - -**-gen-intrinsic** +.. option:: -gen-intrinsic Generate intrinsic information. - - -**-gen-tgt-intrinsic** +.. option:: -gen-tgt-intrinsic Generate target intrinsic information. - - -**-gen-enhanced-disassembly-info** +.. option:: -gen-enhanced-disassembly-info Generate enhanced disassembly info. - - -**-version** +.. option:: -version Show the version number of this program. - - - EXIT STATUS ----------- - -If **tblgen** succeeds, it will exit with 0. Otherwise, if an error +If :program:`tblgen` succeeds, it will exit with 0. Otherwise, if an error occurs, it will exit with a non-zero value. 
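For orientation, a typical invocation pairs one of the ``-gen-*`` modes with an
include path and a target description file. The paths below assume a
hypothetical LLVM source checkout and are illustrative only:

.. code-block:: sh

   # Emit the generated register descriptions for the X86 backend,
   # resolving included .td files relative to the given include directory.
   tblgen -gen-register-info -I /path/to/llvm/include \
     /path/to/llvm/lib/Target/X86/X86.td -o X86GenRegisterInfo.inc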
diff --git a/docs/CompilerWriterInfo.rst b/docs/CompilerWriterInfo.rst index e41f5f9eec..7504d3c75a 100644 --- a/docs/CompilerWriterInfo.rst +++ b/docs/CompilerWriterInfo.rst @@ -87,7 +87,7 @@ Intel - Official manuals and docs Other x86-specific information ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -* `Calling conventions for different C++ compilers and operating systems <http://www.agner.org/assem/calling_conventions.pdf>`_ +* `Calling conventions for different C++ compilers and operating systems <http://www.agner.org/optimize/calling_conventions.pdf>`_ Other relevant lists -------------------- diff --git a/docs/DeveloperPolicy.rst b/docs/DeveloperPolicy.rst index e35e729556..925e769b86 100644 --- a/docs/DeveloperPolicy.rst +++ b/docs/DeveloperPolicy.rst @@ -26,8 +26,8 @@ This policy is also designed to accomplish the following objectives: #. Keep the top of Subversion trees as stable as possible. -#. Establish awareness of the project's `copyright, license, and patent - policies`_ with contributors to the project. +#. Establish awareness of the project's :ref:`copyright, license, and patent + policies <copyright-license-patents>` with contributors to the project. This policy is aimed at frequent contributors to LLVM. People interested in contributing one-off patches can do so in an informal way by sending them to the @@ -180,8 +180,8 @@ Developers are required to create test cases for any bugs fixed and any new features added. Some tips for getting your testcase approved: * All feature and regression test cases are added to the ``llvm/test`` - directory. The appropriate sub-directory should be selected (see the `Testing - Guide <TestingGuide.html>`_ for details). + directory. The appropriate sub-directory should be selected (see the + :doc:`Testing Guide <TestingGuide>` for details). * Test cases should be written in `LLVM assembly language <LangRef.html>`_ unless the feature or regression being tested requires another language @@ -401,7 +401,7 @@ Hacker!" in the commit message. Overall, please do not add contributor names to the source code. -.. _copyright, license, and patent policies: +.. _copyright-license-patents: Copyright, License, and Patents =============================== diff --git a/docs/GCCFEBuildInstrs.html b/docs/GCCFEBuildInstrs.html deleted file mode 100644 index 0caf9d8618..0000000000 --- a/docs/GCCFEBuildInstrs.html +++ /dev/null @@ -1,279 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> -<html> -<head> - <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> - <link rel="stylesheet" href="_static/llvm.css" type="text/css" media="screen"> - <title>Building the LLVM GCC Front-End</title> -</head> -<body> - -<h1> - Building the LLVM GCC Front-End -</h1> - -<ol> - <li><a href="#instructions">Building llvm-gcc from Source</a></li> - <li><a href="#ada">Building the Ada front-end</a></li> - <li><a href="#fortran">Building the Fortran front-end</a></li> - <li><a href="#license">License Information</a></li> -</ol> - -<div class="doc_author"> - <p>Written by the LLVM Team</p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="instructions">Building llvm-gcc from Source</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>This section describes how to acquire and build llvm-gcc 4.2, which is based -on the GCC 4.2.1 front-end. Supported languages are Ada, C, C++, Fortran, -Objective-C and Objective-C++. 
Note that the instructions for building these -front-ends are completely different (and much easier!) than those for building -llvm-gcc3 in the past.</p> - -<ol> - <li><p>Retrieve the appropriate llvm-gcc-4.2-<i>version</i>.source.tar.gz - archive from the <a href="http://llvm.org/releases/">LLVM web - site</a>.</p> - - <p>It is also possible to download the sources of the llvm-gcc front end - from a read-only mirror using subversion. To check out the 4.2 code - for first time use:</p> - -<div class="doc_code"> -<pre> -svn co http://llvm.org/svn/llvm-project/llvm-gcc-4.2/trunk <i>dst-directory</i> -</pre> -</div> - - <p>After that, the code can be be updated in the destination directory - using:</p> - -<div class="doc_code"> -<pre>svn update</pre> -</div> - - <p>The mirror is brought up to date every evening.</p></li> - - <li>Follow the directions in the top-level <tt>README.LLVM</tt> file for - up-to-date instructions on how to build llvm-gcc. See below for building - with support for Ada or Fortran. -</ol> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="ada">Building the Ada front-end</a></h2> -<!-- *********************************************************************** --> - -<div> -<p>Building with support for Ada amounts to following the directions in the -top-level <tt>README.LLVM</tt> file, adding ",ada" to EXTRALANGS, for example: -<tt>EXTRALANGS=,ada</tt></p> - -<p>There are some complications however:</p> - -<ol> - <li><p>The only platform for which the Ada front-end is known to build is - 32 bit intel x86 running linux. It is unlikely to build for other - systems without some work.</p></li> - <li><p>The build requires having a compiler that supports Ada, C and C++. - The Ada front-end is written in Ada so an Ada compiler is needed to - build it. Compilers known to work with the - <a href="http://llvm.org/releases/download.html">LLVM 2.7 release</a> - are <a href="http://gcc.gnu.org/releases.html">gcc-4.2</a> and the - 2005, 2006 and 2007 versions of the - <a href="http://libre.adacore.com/">GNAT GPL Edition</a>. - <b>GNAT GPL 2008, gcc-4.3 and later will not work</b>. - The LLVM parts of llvm-gcc are written in C++ so a C++ compiler is - needed to build them. The rest of gcc is written in C. - Some linux distributions provide a version of gcc that supports all - three languages (the Ada part often comes as an add-on package to - the rest of gcc). Otherwise it is possible to combine two versions - of gcc, one that supports Ada and C (such as the - <a href="http://libre.adacore.com/">2007 GNAT GPL Edition</a>) - and another which supports C++, see below.</p></li> - <li><p>Because the Ada front-end is experimental, it is wise to build the - compiler with checking enabled. 
This causes it to run much slower, but - helps catch mistakes in the compiler (please report any problems using - <a href="http://llvm.org/bugs/">LLVM bugzilla</a>).</p></li> - <li><p>The Ada front-end <a href="http://llvm.org/PR2007">fails to - bootstrap</a>, due to lack of LLVM support for - <tt>setjmp</tt>/<tt>longjmp</tt> style exception handling (used - internally by the compiler), so you must specify - <tt>--disable-bootstrap</tt>.</p></li> -</ol> - -<p>Supposing appropriate compilers are available, llvm-gcc with Ada support can - be built on an x86-32 linux box using the following recipe:</p> - -<ol> - <li><p>Download the <a href="http://llvm.org/releases/download.html">LLVM source</a> - and unpack it:</p> - -<pre class="doc_code"> -wget http://llvm.org/releases/2.7/llvm-2.7.tgz -tar xzf llvm-2.7.tgz -mv llvm-2.7 llvm -</pre> - - <p>or <a href="GettingStarted.html#checkout">check out the - latest version from subversion</a>:</p> - -<pre class="doc_code">svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm</pre> - - </li> - - <li><p>Download the - <a href="http://llvm.org/releases/download.html">llvm-gcc-4.2 source</a> - and unpack it:</p> - -<pre class="doc_code"> -wget http://llvm.org/releases/2.7/llvm-gcc-4.2-2.7.source.tgz -tar xzf llvm-gcc-4.2-2.7.source.tgz -mv llvm-gcc-4.2-2.7.source llvm-gcc-4.2 -</pre> - - <p>or <a href="GettingStarted.html#checkout">check out the - latest version from subversion</a>:</p> - -<pre class="doc_code"> -svn co http://llvm.org/svn/llvm-project/llvm-gcc-4.2/trunk llvm-gcc-4.2 -</pre> - </li> - - <li><p>Make a build directory <tt>llvm-objects</tt> for llvm and make it the - current directory:</p> - -<pre class="doc_code"> -mkdir llvm-objects -cd llvm-objects -</pre> - </li> - - <li><p>Configure LLVM (here it is configured to install into <tt>/usr/local</tt>):</p> - -<pre class="doc_code"> -../llvm/configure --prefix=<b>/usr/local</b> --enable-optimized --enable-assertions -</pre> - - <p>If you have a multi-compiler setup and the C++ compiler is not the - default, then you can configure like this:</p> - -<pre class="doc_code"> -CXX=<b>PATH_TO_C++_COMPILER</b> ../llvm/configure --prefix=<b>/usr/local</b> --enable-optimized --enable-assertions -</pre> - - <p>To compile without checking (not recommended), replace - <tt>--enable-assertions</tt> with <tt>--disable-assertions</tt>.</p> - - </li> - - <li><p>Build LLVM:</p> - -<pre class="doc_code"> -make -</pre> - </li> - - <li><p>Install LLVM (optional):</p> - -<pre class="doc_code"> -make install -</pre> - </li> - - <li><p>Make a build directory <tt>llvm-gcc-4.2-objects</tt> for llvm-gcc and make it the - current directory:</p> - -<pre class="doc_code"> -cd .. -mkdir llvm-gcc-4.2-objects -cd llvm-gcc-4.2-objects -</pre> - </li> - - <li><p>Configure llvm-gcc (here it is configured to install into <tt>/usr/local</tt>). - The <tt>--enable-checking</tt> flag turns on sanity checks inside the compiler. - To turn off these checks (not recommended), replace <tt>--enable-checking</tt> - with <tt>--disable-checking</tt>. 
- Additional languages can be appended to the <tt>--enable-languages</tt> switch, - for example <tt>--enable-languages=ada,c,c++</tt>.</p> - -<pre class="doc_code"> -../llvm-gcc-4.2/configure --prefix=<b>/usr/local</b> --enable-languages=ada,c \ - --enable-checking --enable-llvm=$PWD/../llvm-objects \ - --disable-bootstrap --disable-multilib -</pre> - - <p>If you have a multi-compiler setup, then you can configure like this:</p> - -<pre class="doc_code"> -export CC=<b>PATH_TO_C_AND_ADA_COMPILER</b> -export CXX=<b>PATH_TO_C++_COMPILER</b> -../llvm-gcc-4.2/configure --prefix=<b>/usr/local</b> --enable-languages=ada,c \ - --enable-checking --enable-llvm=$PWD/../llvm-objects \ - --disable-bootstrap --disable-multilib -</pre> - </li> - - <li><p>Build and install the compiler:</p> - -<pre class="doc_code"> -make -make install -</pre> - </li> -</ol> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="fortran">Building the Fortran front-end</a></h2> -<!-- *********************************************************************** --> - -<div> -<p>To build with support for Fortran, follow the directions in the top-level -<tt>README.LLVM</tt> file, adding ",fortran" to EXTRALANGS, for example:</p> - -<pre class="doc_code"> -EXTRALANGS=,fortran -</pre> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="license">License Information</a></h2> -<!-- *********************************************************************** --> - -<div> -<p> -The LLVM GCC frontend is licensed to you under the GNU General Public License -and the GNU Lesser General Public License. Please see the files COPYING and -COPYING.LIB for more details. -</p> - -<p> -More information is <a href="FAQ.html#license">available in the FAQ</a>. 
-</p> -</div> - -<!-- *********************************************************************** --> - -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a> - - <a href="http://llvm.org/">LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> - -</body> -</html> diff --git a/docs/GarbageCollection.html b/docs/GarbageCollection.html deleted file mode 100644 index 5bc70f1bb0..0000000000 --- a/docs/GarbageCollection.html +++ /dev/null @@ -1,1389 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> -<html> -<head> - <meta http-equiv="Content-Type" Content="text/html; charset=UTF-8" > - <title>Accurate Garbage Collection with LLVM</title> - <link rel="stylesheet" href="_static/llvm.css" type="text/css"> - <style type="text/css"> - .rowhead { text-align: left; background: inherit; } - .indent { padding-left: 1em; } - .optl { color: #BFBFBF; } - </style> -</head> -<body> - -<h1> - Accurate Garbage Collection with LLVM -</h1> - -<ol> - <li><a href="#introduction">Introduction</a> - <ul> - <li><a href="#feature">Goals and non-goals</a></li> - </ul> - </li> - - <li><a href="#quickstart">Getting started</a> - <ul> - <li><a href="#quickstart-compiler">In your compiler</a></li> - <li><a href="#quickstart-runtime">In your runtime library</a></li> - <li><a href="#shadow-stack">About the shadow stack</a></li> - </ul> - </li> - - <li><a href="#core">Core support</a> - <ul> - <li><a href="#gcattr">Specifying GC code generation: - <tt>gc "..."</tt></a></li> - <li><a href="#gcroot">Identifying GC roots on the stack: - <tt>llvm.gcroot</tt></a></li> - <li><a href="#barriers">Reading and writing references in the heap</a> - <ul> - <li><a href="#gcwrite">Write barrier: <tt>llvm.gcwrite</tt></a></li> - <li><a href="#gcread">Read barrier: <tt>llvm.gcread</tt></a></li> - </ul> - </li> - </ul> - </li> - - <li><a href="#plugin">Compiler plugin interface</a> - <ul> - <li><a href="#collector-algos">Overview of available features</a></li> - <li><a href="#stack-map">Computing stack maps</a></li> - <li><a href="#init-roots">Initializing roots to null: - <tt>InitRoots</tt></a></li> - <li><a href="#custom">Custom lowering of intrinsics: <tt>CustomRoots</tt>, - <tt>CustomReadBarriers</tt>, and <tt>CustomWriteBarriers</tt></a></li> - <li><a href="#safe-points">Generating safe points: - <tt>NeededSafePoints</tt></a></li> - <li><a href="#assembly">Emitting assembly code: - <tt>GCMetadataPrinter</tt></a></li> - </ul> - </li> - - <li><a href="#runtime-impl">Implementing a collector runtime</a> - <ul> - <li><a href="#gcdescriptors">Tracing GC pointers from heap - objects</a></li> - </ul> - </li> - - <li><a href="#references">References</a></li> - -</ol> - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a> and - Gordon Henriksen</p> -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="introduction">Introduction</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>Garbage collection is a widely used technique that frees the programmer from -having to know the lifetimes of heap objects, making software easier to produce -and maintain. 
Many programming languages rely on garbage collection for -automatic memory management. There are two primary forms of garbage collection: -conservative and accurate.</p> - -<p>Conservative garbage collection often does not require any special support -from either the language or the compiler: it can handle non-type-safe -programming languages (such as C/C++) and does not require any special -information from the compiler. The -<a href="http://www.hpl.hp.com/personal/Hans_Boehm/gc/">Boehm collector</a> is -an example of a state-of-the-art conservative collector.</p> - -<p>Accurate garbage collection requires the ability to identify all pointers in -the program at run-time (which requires that the source-language be type-safe in -most cases). Identifying pointers at run-time requires compiler support to -locate all places that hold live pointer variables at run-time, including the -<a href="#gcroot">processor stack and registers</a>.</p> - -<p>Conservative garbage collection is attractive because it does not require any -special compiler support, but it does have problems. In particular, because the -conservative garbage collector cannot <i>know</i> that a particular word in the -machine is a pointer, it cannot move live objects in the heap (preventing the -use of compacting and generational GC algorithms) and it can occasionally suffer -from memory leaks due to integer values that happen to point to objects in the -program. In addition, some aggressive compiler transformations can break -conservative garbage collectors (though these seem rare in practice).</p> - -<p>Accurate garbage collectors do not suffer from any of these problems, but -they can suffer from degraded scalar optimization of the program. In particular, -because the runtime must be able to identify and update all pointers active in -the program, some optimizations are less effective. In practice, however, the -locality and performance benefits of using aggressive garbage collection -techniques dominates any low-level losses.</p> - -<p>This document describes the mechanisms and interfaces provided by LLVM to -support accurate garbage collection.</p> - -<!-- ======================================================================= --> -<h3> - <a name="feature">Goals and non-goals</a> -</h3> - -<div> - -<p>LLVM's intermediate representation provides <a href="#intrinsics">garbage -collection intrinsics</a> that offer support for a broad class of -collector models. For instance, the intrinsics permit:</p> - -<ul> - <li>semi-space collectors</li> - <li>mark-sweep collectors</li> - <li>generational collectors</li> - <li>reference counting</li> - <li>incremental collectors</li> - <li>concurrent collectors</li> - <li>cooperative collectors</li> -</ul> - -<p>We hope that the primitive support built into the LLVM IR is sufficient to -support a broad class of garbage collected languages including Scheme, ML, Java, -C#, Perl, Python, Lua, Ruby, other scripting languages, and more.</p> - -<p>However, LLVM does not itself provide a garbage collector—this should -be part of your language's runtime library. LLVM provides a framework for -compile time <a href="#plugin">code generation plugins</a>. The role of these -plugins is to generate code and data structures which conforms to the <em>binary -interface</em> specified by the <em>runtime library</em>. This is similar to the -relationship between LLVM and DWARF debugging info, for example. 
The -difference primarily lies in the lack of an established standard in the domain -of garbage collection—thus the plugins.</p> - -<p>The aspects of the binary interface with which LLVM's GC support is -concerned are:</p> - -<ul> - <li>Creation of GC-safe points within code where collection is allowed to - execute safely.</li> - <li>Computation of the stack map. For each safe point in the code, object - references within the stack frame must be identified so that the - collector may traverse and perhaps update them.</li> - <li>Write barriers when storing object references to the heap. These are - commonly used to optimize incremental scans in generational - collectors.</li> - <li>Emission of read barriers when loading object references. These are - useful for interoperating with concurrent collectors.</li> -</ul> - -<p>There are additional areas that LLVM does not directly address:</p> - -<ul> - <li>Registration of global roots with the runtime.</li> - <li>Registration of stack map entries with the runtime.</li> - <li>The functions used by the program to allocate memory, trigger a - collection, etc.</li> - <li>Computation or compilation of type maps, or registration of them with - the runtime. These are used to crawl the heap for object - references.</li> -</ul> - -<p>In general, LLVM's support for GC does not include features which can be -adequately addressed with other features of the IR and does not specify a -particular binary interface. On the plus side, this means that you should be -able to integrate LLVM with an existing runtime. On the other hand, it leaves -a lot of work for the developer of a novel language. However, it's easy to get -started quickly and scale up to a more sophisticated implementation as your -compiler matures.</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="quickstart">Getting started</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>Using a GC with LLVM implies many things, for example:</p> - -<ul> - <li>Write a runtime library or find an existing one which implements a GC - heap.<ol> - <li>Implement a memory allocator.</li> - <li>Design a binary interface for the stack map, used to identify - references within a stack frame on the machine stack.*</li> - <li>Implement a stack crawler to discover functions on the call stack.*</li> - <li>Implement a registry for global roots.</li> - <li>Design a binary interface for type maps, used to identify references - within heap objects.</li> - <li>Implement a collection routine bringing together all of the above.</li> - </ol></li> - <li>Emit compatible code from your compiler.<ul> - <li>Initialization in the main function.</li> - <li>Use the <tt>gc "..."</tt> attribute to enable GC code generation - (or <tt>F.setGC("...")</tt>).</li> - <li>Use <tt>@llvm.gcroot</tt> to mark stack roots.</li> - <li>Use <tt>@llvm.gcread</tt> and/or <tt>@llvm.gcwrite</tt> to - manipulate GC references, if necessary.</li> - <li>Allocate memory using the GC allocation routine provided by the - runtime library.</li> - <li>Generate type maps according to your runtime's binary interface.</li> - </ul></li> - <li>Write a compiler plugin to interface LLVM with the runtime library.*<ul> - <li>Lower <tt>@llvm.gcread</tt> and <tt>@llvm.gcwrite</tt> to appropriate - code sequences.*</li> - <li>Compile LLVM's stack map to the binary form expected by the - runtime.</li> - </ul></li> - <li>Load the plugin into the 
compiler. Use <tt>llc -load</tt> or link the - plugin statically with your language's compiler.*</li> - <li>Link program executables with the runtime.</li> -</ul> - -<p>To help with several of these tasks (those indicated with a *), LLVM -includes a highly portable, built-in ShadowStack code generator. It is compiled -into <tt>llc</tt> and works even with the interpreter and C backends.</p> - -<!-- ======================================================================= --> -<h3> - <a name="quickstart-compiler">In your compiler</a> -</h3> - -<div> - -<p>To turn the shadow stack on for your functions, first call:</p> - -<div class="doc_code"><pre ->F.setGC("shadow-stack");</pre></div> - -<p>for each function your compiler emits. Since the shadow stack is built into -LLVM, you do not need to load a plugin.</p> - -<p>Your compiler must also use <tt>@llvm.gcroot</tt> as documented. -Don't forget to create a root for each intermediate value that is generated -when evaluating an expression. In <tt>h(f(), g())</tt>, the result of -<tt>f()</tt> could easily be collected if evaluating <tt>g()</tt> triggers a -collection.</p> - -<p>There's no need to use <tt>@llvm.gcread</tt> and <tt>@llvm.gcwrite</tt> over -plain <tt>load</tt> and <tt>store</tt> for now. You will need them when -switching to a more advanced GC.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="quickstart-runtime">In your runtime</a> -</h3> - -<div> - -<p>The shadow stack doesn't imply a memory allocation algorithm. A semispace -collector or building atop <tt>malloc</tt> are great places to start, and can -be implemented with very little code.</p> - -<p>When it comes time to collect, however, your runtime needs to traverse the -stack roots, and for this it needs to integrate with the shadow stack. Luckily, -doing so is very simple. (This code is heavily commented to help you -understand the data structure, but there are only 20 lines of meaningful -code.)</p> - -<pre class="doc_code"> -/// @brief The map for a single function's stack frame. One of these is -/// compiled as constant data into the executable for each function. -/// -/// Storage of metadata values is elided if the %metadata parameter to -/// @llvm.gcroot is null. -struct FrameMap { - int32_t NumRoots; //< Number of roots in stack frame. - int32_t NumMeta; //< Number of metadata entries. May be < NumRoots. - const void *Meta[0]; //< Metadata for each root. -}; - -/// @brief A link in the dynamic shadow stack. One of these is embedded in the -/// stack frame of each function on the call stack. -struct StackEntry { - StackEntry *Next; //< Link to next stack entry (the caller's). - const FrameMap *Map; //< Pointer to constant FrameMap. - void *Roots[0]; //< Stack roots (in-place array). -}; - -/// @brief The head of the singly-linked list of StackEntries. Functions push -/// and pop onto this in their prologue and epilogue. -/// -/// Since there is only a global list, this technique is not threadsafe. -StackEntry *llvm_gc_root_chain; - -/// @brief Calls Visitor(root, meta) for each GC root on the stack. -/// root and meta are exactly the values passed to -/// <tt>@llvm.gcroot</tt>. -/// -/// Visitor could be a function to recursively mark live objects. Or it -/// might copy them to another heap or generation. -/// -/// @param Visitor A function to invoke for every GC root on the stack. 
-void visitGCRoots(void (*Visitor)(void **Root, const void *Meta)) { - for (StackEntry *R = llvm_gc_root_chain; R; R = R->Next) { - unsigned i = 0; - - // For roots [0, NumMeta), the metadata pointer is in the FrameMap. - for (unsigned e = R->Map->NumMeta; i != e; ++i) - Visitor(&R->Roots[i], R->Map->Meta[i]); - - // For roots [NumMeta, NumRoots), the metadata pointer is null. - for (unsigned e = R->Map->NumRoots; i != e; ++i) - Visitor(&R->Roots[i], NULL); - } -}</pre> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="shadow-stack">About the shadow stack</a> -</h3> - -<div> - -<p>Unlike many GC algorithms which rely on a cooperative code generator to -compile stack maps, this algorithm carefully maintains a linked list of stack -roots [<a href="#henderson02">Henderson2002</a>]. This so-called "shadow stack" -mirrors the machine stack. Maintaining this data structure is slower than using -a stack map compiled into the executable as constant data, but has a significant -portability advantage because it requires no special support from the target -code generator, and does not require tricky platform-specific code to crawl -the machine stack.</p> - -<p>The tradeoff for this simplicity and portability is:</p> - -<ul> - <li>High overhead per function call.</li> - <li>Not thread-safe.</li> -</ul> - -<p>Still, it's an easy way to get started. After your compiler and runtime are -up and running, writing a <a href="#plugin">plugin</a> will allow you to take -advantage of <a href="#collector-algos">more advanced GC features</a> of LLVM -in order to improve performance.</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="core">IR features</a><a name="intrinsics"></a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>This section describes the garbage collection facilities provided by the -<a href="LangRef.html">LLVM intermediate representation</a>. The exact behavior -of these IR features is specified by the binary interface implemented by a -<a href="#plugin">code generation plugin</a>, not by this document.</p> - -<p>These facilities are limited to those strictly necessary; they are not -intended to be a complete interface to any garbage collector. A program will -need to interface with the GC library using the facilities provided by that -program.</p> - -<!-- ======================================================================= --> -<h3> - <a name="gcattr">Specifying GC code generation: <tt>gc "..."</tt></a> -</h3> - -<div> - -<div class="doc_code"><tt> - define <i>ty</i> @<i>name</i>(...) <span style="text-decoration: underline">gc "<i>name</i>"</span> { ... -</tt></div> - -<p>The <tt>gc</tt> function attribute is used to specify the desired GC style -to the compiler. Its programmatic equivalent is the <tt>setGC</tt> method of -<tt>Function</tt>.</p> - -<p>Setting <tt>gc "<i>name</i>"</tt> on a function triggers a search for a -matching code generation plugin "<i>name</i>"; it is that plugin which defines -the exact nature of the code generated to support GC. 
If none is found, the -compiler will raise an error.</p> - -<p>Specifying the GC style on a per-function basis allows LLVM to link together -programs that use different garbage collection algorithms (or none at all).</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="gcroot">Identifying GC roots on the stack: <tt>llvm.gcroot</tt></a> -</h3> - -<div> - -<div class="doc_code"><tt> - void @llvm.gcroot(i8** %ptrloc, i8* %metadata) -</tt></div> - -<p>The <tt>llvm.gcroot</tt> intrinsic is used to inform LLVM that a stack -variable references an object on the heap and is to be tracked for garbage -collection. The exact impact on generated code is specified by a <a -href="#plugin">compiler plugin</a>. All calls to <tt>llvm.gcroot</tt> <b>must</b> reside - inside the first basic block.</p> - -<p>A compiler which uses mem2reg to raise imperative code using <tt>alloca</tt> -into SSA form need only add a call to <tt>@llvm.gcroot</tt> for those variables -which a pointers into the GC heap.</p> - -<p>It is also important to mark intermediate values with <tt>llvm.gcroot</tt>. -For example, consider <tt>h(f(), g())</tt>. Beware leaking the result of -<tt>f()</tt> in the case that <tt>g()</tt> triggers a collection. Note, that -stack variables must be initialized and marked with <tt>llvm.gcroot</tt> in -function's prologue.</p> - -<p>The first argument <b>must</b> be a value referring to an alloca instruction -or a bitcast of an alloca. The second contains a pointer to metadata that -should be associated with the pointer, and <b>must</b> be a constant or global -value address. If your target collector uses tags, use a null pointer for -metadata.</p> - -<p>The <tt>%metadata</tt> argument can be used to avoid requiring heap objects -to have 'isa' pointers or tag bits. [<a href="#appel89">Appel89</a>, <a -href="#goldberg91">Goldberg91</a>, <a href="#tolmach94">Tolmach94</a>] If -specified, its value will be tracked along with the location of the pointer in -the stack frame.</p> - -<p>Consider the following fragment of Java code:</p> - -<pre class="doc_code"> - { - Object X; // A null-initialized reference to an object - ... - } -</pre> - -<p>This block (which may be located in the middle of a function or in a loop -nest), could be compiled to this LLVM code:</p> - -<pre class="doc_code"> -Entry: - ;; In the entry block for the function, allocate the - ;; stack space for X, which is an LLVM pointer. - %X = alloca %Object* - - ;; Tell LLVM that the stack space is a stack root. - ;; Java has type-tags on objects, so we pass null as metadata. - %tmp = bitcast %Object** %X to i8** - call void @llvm.gcroot(i8** %tmp, i8* null) - ... - - ;; "CodeBlock" is the block corresponding to the start - ;; of the scope above. -CodeBlock: - ;; Java null-initializes pointers. - store %Object* null, %Object** %X - - ... - - ;; As the pointer goes out of scope, store a null value into - ;; it, to indicate that the value is no longer live. - store %Object* null, %Object** %X - ... -</pre> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="barriers">Reading and writing references in the heap</a> -</h3> - -<div> - -<p>Some collectors need to be informed when the mutator (the program that needs -garbage collection) either reads a pointer from or writes a pointer to a field -of a heap object. The code fragments inserted at these points are called -<em>read barriers</em> and <em>write barriers</em>, respectively. 
The amount of -code that needs to be executed is usually quite small and not on the critical -path of any computation, so the overall performance impact of the barrier is -tolerable.</p> - -<p>Barriers often require access to the <em>object pointer</em> rather than the -<em>derived pointer</em> (which is a pointer to the field within the -object). Accordingly, these intrinsics take both pointers as separate arguments -for completeness. In this snippet, <tt>%object</tt> is the object pointer, and -<tt>%derived</tt> is the derived pointer:</p> - -<blockquote><pre> - ;; An array type. - %class.Array = type { %class.Object, i32, [0 x %class.Object*] } - ... - - ;; Load the object pointer from a gcroot. - %object = load %class.Array** %object_addr - - ;; Compute the derived pointer. - %derived = getelementptr %object, i32 0, i32 2, i32 %n</pre></blockquote> - -<p>LLVM does not enforce this relationship between the object and derived -pointer (although a <a href="#plugin">plugin</a> might). However, it would be -an unusual collector that violated it.</p> - -<p>The use of these intrinsics is naturally optional if the target GC does -require the corresponding barrier. Such a GC plugin will replace the intrinsic -calls with the corresponding <tt>load</tt> or <tt>store</tt> instruction if they -are used.</p> - -<!-- ======================================================================= --> -<h4> - <a name="gcwrite">Write barrier: <tt>llvm.gcwrite</tt></a> -</h4> - -<div> - -<div class="doc_code"><tt> -void @llvm.gcwrite(i8* %value, i8* %object, i8** %derived) -</tt></div> - -<p>For write barriers, LLVM provides the <tt>llvm.gcwrite</tt> intrinsic -function. It has exactly the same semantics as a non-volatile <tt>store</tt> to -the derived pointer (the third argument). The exact code generated is specified -by a <a href="#plugin">compiler plugin</a>.</p> - -<p>Many important algorithms require write barriers, including generational -and concurrent collectors. Additionally, write barriers could be used to -implement reference counting.</p> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="gcread">Read barrier: <tt>llvm.gcread</tt></a> -</h4> - -<div> - -<div class="doc_code"><tt> -i8* @llvm.gcread(i8* %object, i8** %derived)<br> -</tt></div> - -<p>For read barriers, LLVM provides the <tt>llvm.gcread</tt> intrinsic function. -It has exactly the same semantics as a non-volatile <tt>load</tt> from the -derived pointer (the second argument). The exact code generated is specified by -a <a href="#plugin">compiler plugin</a>.</p> - -<p>Read barriers are needed by fewer algorithms than write barriers, and may -have a greater performance impact since pointer reads are more frequent than -writes.</p> - -</div> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="plugin">Implementing a collector plugin</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>User code specifies which GC code generation to use with the <tt>gc</tt> -function attribute or, equivalently, with the <tt>setGC</tt> method of -<tt>Function</tt>.</p> - -<p>To implement a GC plugin, it is necessary to subclass -<tt>llvm::GCStrategy</tt>, which can be accomplished in a few lines of -boilerplate code. LLVM's infrastructure provides access to several important -algorithms. 
For an uncontroversial collector, all that remains may be to -compile LLVM's computed stack map to assembly code (using the binary -representation expected by the runtime library). This can be accomplished in -about 100 lines of code.</p> - -<p>This is not the appropriate place to implement a garbage collected heap or a -garbage collector itself. That code should exist in the language's runtime -library. The compiler plugin is responsible for generating code which -conforms to the binary interface defined by library, most essentially the -<a href="#stack-map">stack map</a>.</p> - -<p>To subclass <tt>llvm::GCStrategy</tt> and register it with the compiler:</p> - -<blockquote><pre>// lib/MyGC/MyGC.cpp - Example LLVM GC plugin - -#include "llvm/CodeGen/GCStrategy.h" -#include "llvm/CodeGen/GCMetadata.h" -#include "llvm/Support/Compiler.h" - -using namespace llvm; - -namespace { - class LLVM_LIBRARY_VISIBILITY MyGC : public GCStrategy { - public: - MyGC() {} - }; - - GCRegistry::Add<MyGC> - X("mygc", "My bespoke garbage collector."); -}</pre></blockquote> - -<p>This boilerplate collector does nothing. More specifically:</p> - -<ul> - <li><tt>llvm.gcread</tt> calls are replaced with the corresponding - <tt>load</tt> instruction.</li> - <li><tt>llvm.gcwrite</tt> calls are replaced with the corresponding - <tt>store</tt> instruction.</li> - <li>No safe points are added to the code.</li> - <li>The stack map is not compiled into the executable.</li> -</ul> - -<p>Using the LLVM makefiles (like the <a -href="http://llvm.org/viewvc/llvm-project/llvm/trunk/projects/sample/">sample -project</a>), this code can be compiled as a plugin using a simple -makefile:</p> - -<blockquote><pre -># lib/MyGC/Makefile - -LEVEL := ../.. -LIBRARYNAME = <var>MyGC</var> -LOADABLE_MODULE = 1 - -include $(LEVEL)/Makefile.common</pre></blockquote> - -<p>Once the plugin is compiled, code using it may be compiled using <tt>llc --load=<var>MyGC.so</var></tt> (though <var>MyGC.so</var> may have some other -platform-specific extension):</p> - -<blockquote><pre ->$ cat sample.ll -define void @f() gc "mygc" { -entry: - ret void -} -$ llvm-as < sample.ll | llc -load=MyGC.so</pre></blockquote> - -<p>It is also possible to statically link the collector plugin into tools, such -as a language-specific compiler front-end.</p> - -<!-- ======================================================================= --> -<h3> - <a name="collector-algos">Overview of available features</a> -</h3> - -<div> - -<p><tt>GCStrategy</tt> provides a range of features through which a plugin -may do useful work. Some of these are callbacks, some are algorithms that can -be enabled, disabled, or customized. 
This matrix summarizes the supported (and -planned) features and correlates them with the collection techniques which -typically require them.</p> - -<table> - <tr> - <th>Algorithm</th> - <th>Done</th> - <th>shadow stack</th> - <th>refcount</th> - <th>mark-sweep</th> - <th>copying</th> - <th>incremental</th> - <th>threaded</th> - <th>concurrent</th> - </tr> - <tr> - <th class="rowhead"><a href="#stack-map">stack map</a></th> - <td>✔</td> - <td></td> - <td></td> - <td>✘</td> - <td>✘</td> - <td>✘</td> - <td>✘</td> - <td>✘</td> - </tr> - <tr> - <th class="rowhead"><a href="#init-roots">initialize roots</a></th> - <td>✔</td> - <td>✘</td> - <td>✘</td> - <td>✘</td> - <td>✘</td> - <td>✘</td> - <td>✘</td> - <td>✘</td> - </tr> - <tr class="doc_warning"> - <th class="rowhead">derived pointers</th> - <td>NO</td> - <td></td> - <td></td> - <td></td> - <td></td> - <td></td> - <td>✘*</td> - <td>✘*</td> - </tr> - <tr> - <th class="rowhead"><em><a href="#custom">custom lowering</a></em></th> - <td>✔</td> - <th></th> - <th></th> - <th></th> - <th></th> - <th></th> - <th></th> - <th></th> - </tr> - <tr> - <th class="rowhead indent">gcroot</th> - <td>✔</td> - <td>✘</td> - <td>✘</td> - <td></td> - <td></td> - <td></td> - <td></td> - <td></td> - </tr> - <tr> - <th class="rowhead indent">gcwrite</th> - <td>✔</td> - <td></td> - <td>✘</td> - <td></td> - <td></td> - <td>✘</td> - <td></td> - <td>✘</td> - </tr> - <tr> - <th class="rowhead indent">gcread</th> - <td>✔</td> - <td></td> - <td></td> - <td></td> - <td></td> - <td></td> - <td></td> - <td>✘</td> - </tr> - <tr> - <th class="rowhead"><em><a href="#safe-points">safe points</a></em></th> - <td></td> - <th></th> - <th></th> - <th></th> - <th></th> - <th></th> - <th></th> - <th></th> - </tr> - <tr> - <th class="rowhead indent">in calls</th> - <td>✔</td> - <td></td> - <td></td> - <td>✘</td> - <td>✘</td> - <td>✘</td> - <td>✘</td> - <td>✘</td> - </tr> - <tr> - <th class="rowhead indent">before calls</th> - <td>✔</td> - <td></td> - <td></td> - <td></td> - <td></td> - <td></td> - <td>✘</td> - <td>✘</td> - </tr> - <tr class="doc_warning"> - <th class="rowhead indent">for loops</th> - <td>NO</td> - <td></td> - <td></td> - <td></td> - <td></td> - <td></td> - <td>✘</td> - <td>✘</td> - </tr> - <tr> - <th class="rowhead indent">before escape</th> - <td>✔</td> - <td></td> - <td></td> - <td></td> - <td></td> - <td></td> - <td>✘</td> - <td>✘</td> - </tr> - <tr class="doc_warning"> - <th class="rowhead">emit code at safe points</th> - <td>NO</td> - <td></td> - <td></td> - <td></td> - <td></td> - <td></td> - <td>✘</td> - <td>✘</td> - </tr> - <tr> - <th class="rowhead"><em>output</em></th> - <td></td> - <th></th> - <th></th> - <th></th> - <th></th> - <th></th> - <th></th> - <th></th> - </tr> - <tr> - <th class="rowhead indent"><a href="#assembly">assembly</a></th> - <td>✔</td> - <td></td> - <td></td> - <td>✘</td> - <td>✘</td> - <td>✘</td> - <td>✘</td> - <td>✘</td> - </tr> - <tr class="doc_warning"> - <th class="rowhead indent">JIT</th> - <td>NO</td> - <td></td> - <td></td> - <td class="optl">✘</td> - <td class="optl">✘</td> - <td class="optl">✘</td> - <td class="optl">✘</td> - <td class="optl">✘</td> - </tr> - <tr class="doc_warning"> - <th class="rowhead indent">obj</th> - <td>NO</td> - <td></td> - <td></td> - <td class="optl">✘</td> - <td class="optl">✘</td> - <td class="optl">✘</td> - <td class="optl">✘</td> - <td class="optl">✘</td> - </tr> - <tr class="doc_warning"> - <th class="rowhead">live analysis</th> - <td>NO</td> - <td></td> - <td></td> - <td class="optl">✘</td> - <td 
class="optl">✘</td> - <td class="optl">✘</td> - <td class="optl">✘</td> - <td class="optl">✘</td> - </tr> - <tr class="doc_warning"> - <th class="rowhead">register map</th> - <td>NO</td> - <td></td> - <td></td> - <td class="optl">✘</td> - <td class="optl">✘</td> - <td class="optl">✘</td> - <td class="optl">✘</td> - <td class="optl">✘</td> - </tr> - <tr> - <td colspan="10"> - <div><span class="doc_warning">*</span> Derived pointers only pose a - hazard to copying collectors.</div> - <div><span class="optl">✘</span> in gray denotes a feature which - could be utilized if available.</div> - </td> - </tr> -</table> - -<p>To be clear, the collection techniques above are defined as:</p> - -<dl> - <dt>Shadow Stack</dt> - <dd>The mutator carefully maintains a linked list of stack roots.</dd> - <dt>Reference Counting</dt> - <dd>The mutator maintains a reference count for each object and frees an - object when its count falls to zero.</dd> - <dt>Mark-Sweep</dt> - <dd>When the heap is exhausted, the collector marks reachable objects starting - from the roots, then deallocates unreachable objects in a sweep - phase.</dd> - <dt>Copying</dt> - <dd>As reachability analysis proceeds, the collector copies objects from one - heap area to another, compacting them in the process. Copying collectors - enable highly efficient "bump pointer" allocation and can improve locality - of reference.</dd> - <dt>Incremental</dt> - <dd>(Including generational collectors.) Incremental collectors generally have - all the properties of a copying collector (regardless of whether the - mature heap is compacting), but bring the added complexity of requiring - write barriers.</dd> - <dt>Threaded</dt> - <dd>Denotes a multithreaded mutator; the collector must still stop the mutator - ("stop the world") before beginning reachability analysis. Stopping a - multithreaded mutator is a complicated problem. It generally requires - highly platform specific code in the runtime, and the production of - carefully designed machine code at safe points.</dd> - <dt>Concurrent</dt> - <dd>In this technique, the mutator and the collector run concurrently, with - the goal of eliminating pause times. In a <em>cooperative</em> collector, - the mutator further aids with collection should a pause occur, allowing - collection to take advantage of multiprocessor hosts. The "stop the world" - problem of threaded collectors is generally still present to a limited - extent. Sophisticated marking algorithms are necessary. Read barriers may - be necessary.</dd> -</dl> - -<p>As the matrix indicates, LLVM's garbage collection infrastructure is already -suitable for a wide variety of collectors, but does not currently extend to -multithreaded programs. This will be added in the future as there is -interest.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="stack-map">Computing stack maps</a> -</h3> - -<div> - -<p>LLVM automatically computes a stack map. One of the most important features -of a <tt>GCStrategy</tt> is to compile this information into the executable in -the binary representation expected by the runtime library.</p> - -<p>The stack map consists of the location and identity of each GC root in the -each function in the module. 
For each root:</p> - -<ul> - <li><tt>RootNum</tt>: The index of the root.</li> - <li><tt>StackOffset</tt>: The offset of the object relative to the frame - pointer.</li> - <li><tt>RootMetadata</tt>: The value passed as the <tt>%metadata</tt> - parameter to the <a href="#gcroot"><tt>@llvm.gcroot</tt></a> intrinsic.</li> -</ul> - -<p>Also, for the function as a whole:</p> - -<ul> - <li><tt>getFrameSize()</tt>: The overall size of the function's initial - stack frame, not accounting for any dynamic allocation.</li> - <li><tt>roots_size()</tt>: The count of roots in the function.</li> -</ul> - -<p>To access the stack map, use <tt>GCFunctionMetadata::roots_begin()</tt> and --<tt>end()</tt> from the <tt><a -href="#assembly">GCMetadataPrinter</a></tt>:</p> - -<blockquote><pre ->for (iterator I = begin(), E = end(); I != E; ++I) { - GCFunctionInfo *FI = *I; - unsigned FrameSize = FI->getFrameSize(); - size_t RootCount = FI->roots_size(); - - for (GCFunctionInfo::roots_iterator RI = FI->roots_begin(), - RE = FI->roots_end(); - RI != RE; ++RI) { - int RootNum = RI->Num; - int RootStackOffset = RI->StackOffset; - Constant *RootMetadata = RI->Metadata; - } -}</pre></blockquote> - -<p>If the <tt>llvm.gcroot</tt> intrinsic is eliminated before code generation by -a custom lowering pass, LLVM will compute an empty stack map. This may be useful -for collector plugins which implement reference counting or a shadow stack.</p> - -</div> - - -<!-- ======================================================================= --> -<h3> - <a name="init-roots">Initializing roots to null: <tt>InitRoots</tt></a> -</h3> - -<div> - -<blockquote><pre ->MyGC::MyGC() { - InitRoots = true; -}</pre></blockquote> - -<p>When set, LLVM will automatically initialize each root to <tt>null</tt> upon -entry to the function. This prevents the GC's sweep phase from visiting -uninitialized pointers, which will almost certainly cause it to crash. This -initialization occurs before custom lowering, so the two may be used -together.</p> - -<p>Since LLVM does not yet compute liveness information, there is no means of -distinguishing an uninitialized stack root from an initialized one. Therefore, -this feature should be used by all GC plugins. It is enabled by default.</p> - -</div> - - -<!-- ======================================================================= --> -<h3> - <a name="custom">Custom lowering of intrinsics: <tt>CustomRoots</tt>, - <tt>CustomReadBarriers</tt>, and <tt>CustomWriteBarriers</tt></a> -</h3> - -<div> - -<p>For GCs which use barriers or unusual treatment of stack roots, these -flags allow the collector to perform arbitrary transformations of the LLVM -IR:</p> - -<blockquote><pre ->class MyGC : public GCStrategy { -public: - MyGC() { - CustomRoots = true; - CustomReadBarriers = true; - CustomWriteBarriers = true; - } - - virtual bool initializeCustomLowering(Module &M); - virtual bool performCustomLowering(Function &F); -};</pre></blockquote> - -<p>If any of these flags are set, then LLVM suppresses its default lowering for -the corresponding intrinsics and instead calls -<tt>performCustomLowering</tt>.</p> - -<p>LLVM's default action for each intrinsic is as follows:</p> - -<ul> - <li><tt>llvm.gcroot</tt>: Leave it alone. 
The code generator must see it - or the stack map will not be computed.</li> - <li><tt>llvm.gcread</tt>: Substitute a <tt>load</tt> instruction.</li> - <li><tt>llvm.gcwrite</tt>: Substitute a <tt>store</tt> instruction.</li> -</ul> - -<p>If <tt>CustomReadBarriers</tt> or <tt>CustomWriteBarriers</tt> are specified, -then <tt>performCustomLowering</tt> <strong>must</strong> eliminate the -corresponding barriers.</p> - -<p><tt>performCustomLowering</tt> must comply with the same restrictions as <a -href="WritingAnLLVMPass.html#runOnFunction"><tt ->FunctionPass::runOnFunction</tt></a>. -Likewise, <tt>initializeCustomLowering</tt> has the same semantics as <a -href="WritingAnLLVMPass.html#doInitialization_mod"><tt ->Pass::doInitialization(Module&)</tt></a>.</p> - -<p>The following can be used as a template:</p> - -<blockquote><pre ->#include "llvm/Module.h" -#include "llvm/IntrinsicInst.h" - -bool MyGC::initializeCustomLowering(Module &M) { - return false; -} - -bool MyGC::performCustomLowering(Function &F) { - bool MadeChange = false; - - for (Function::iterator BB = F.begin(), E = F.end(); BB != E; ++BB) - for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E; ) - if (IntrinsicInst *CI = dyn_cast<IntrinsicInst>(II++)) - if (Function *F = CI->getCalledFunction()) - switch (F->getIntrinsicID()) { - case Intrinsic::gcwrite: - // Handle llvm.gcwrite. - CI->eraseFromParent(); - MadeChange = true; - break; - case Intrinsic::gcread: - // Handle llvm.gcread. - CI->eraseFromParent(); - MadeChange = true; - break; - case Intrinsic::gcroot: - // Handle llvm.gcroot. - CI->eraseFromParent(); - MadeChange = true; - break; - } - - return MadeChange; -}</pre></blockquote> - -</div> - - -<!-- ======================================================================= --> -<h3> - <a name="safe-points">Generating safe points: <tt>NeededSafePoints</tt></a> -</h3> - -<div> - -<p>LLVM can compute four kinds of safe points:</p> - -<blockquote><pre ->namespace GC { - /// PointKind - The type of a collector-safe point. - /// - enum PointKind { - Loop, //< Instr is a loop (backwards branch). - Return, //< Instr is a return instruction. - PreCall, //< Instr is a call instruction. - PostCall //< Instr is the return address of a call. 
- }; -}</pre></blockquote> - -<p>A collector can request any combination of the four by setting the -<tt>NeededSafePoints</tt> mask:</p> - -<blockquote><pre ->MyGC::MyGC() { - NeededSafePoints = 1 << GC::Loop - | 1 << GC::Return - | 1 << GC::PreCall - | 1 << GC::PostCall; -}</pre></blockquote> - -<p>It can then use the following routines to access safe points.</p> - -<blockquote><pre ->for (iterator I = begin(), E = end(); I != E; ++I) { - GCFunctionInfo *MD = *I; - size_t PointCount = MD->size(); - - for (GCFunctionInfo::iterator PI = MD->begin(), - PE = MD->end(); PI != PE; ++PI) { - GC::PointKind PointKind = PI->Kind; - unsigned PointNum = PI->Num; - } -} -</pre></blockquote> - -<p>Almost every collector requires <tt>PostCall</tt> safe points, since these -correspond to the moments when the function is suspended during a call to a -subroutine.</p> - -<p>Threaded programs generally require <tt>Loop</tt> safe points to guarantee -that the application will reach a safe point within a bounded amount of time, -even if it is executing a long-running loop which contains no function -calls.</p> - -<p>Threaded collectors may also require <tt>Return</tt> and <tt>PreCall</tt> -safe points to implement "stop the world" techniques using self-modifying code, -where it is important that the program not exit the function without reaching a -safe point (because only the topmost function has been patched).</p> - -</div> - - -<!-- ======================================================================= --> -<h3> - <a name="assembly">Emitting assembly code: <tt>GCMetadataPrinter</tt></a> -</h3> - -<div> - -<p>LLVM allows a plugin to print arbitrary assembly code before and after the -rest of a module's assembly code. At the end of the module, the GC can compile -the LLVM stack map into assembly code. (At the beginning, this information is not -yet computed.)</p> - -<p>Since AsmWriter and CodeGen are separate components of LLVM, a separate -abstract base class and registry is provided for printing assembly code, the -<tt>GCMetadaPrinter</tt> and <tt>GCMetadataPrinterRegistry</tt>. The AsmWriter -will look for such a subclass if the <tt>GCStrategy</tt> sets -<tt>UsesMetadata</tt>:</p> - -<blockquote><pre ->MyGC::MyGC() { - UsesMetadata = true; -}</pre></blockquote> - -<p>This separation allows JIT-only clients to be smaller.</p> - -<p>Note that LLVM does not currently have analogous APIs to support code -generation in the JIT, nor using the object writers.</p> - -<blockquote><pre ->// lib/MyGC/MyGCPrinter.cpp - Example LLVM GC printer - -#include "llvm/CodeGen/GCMetadataPrinter.h" -#include "llvm/Support/Compiler.h" - -using namespace llvm; - -namespace { - class LLVM_LIBRARY_VISIBILITY MyGCPrinter : public GCMetadataPrinter { - public: - virtual void beginAssembly(std::ostream &OS, AsmPrinter &AP, - const TargetAsmInfo &TAI); - - virtual void finishAssembly(std::ostream &OS, AsmPrinter &AP, - const TargetAsmInfo &TAI); - }; - - GCMetadataPrinterRegistry::Add<MyGCPrinter> - X("mygc", "My bespoke garbage collector."); -}</pre></blockquote> - -<p>The collector should use <tt>AsmPrinter</tt> and <tt>TargetAsmInfo</tt> to -print portable assembly code to the <tt>std::ostream</tt>. The collector itself -contains the stack map for the entire module, and may access the -<tt>GCFunctionInfo</tt> using its own <tt>begin()</tt> and <tt>end()</tt> -methods. 
Here's a realistic example:</p> - -<blockquote><pre ->#include "llvm/CodeGen/AsmPrinter.h" -#include "llvm/Function.h" -#include "llvm/Target/TargetMachine.h" -#include "llvm/DataLayout.h" -#include "llvm/Target/TargetAsmInfo.h" - -void MyGCPrinter::beginAssembly(std::ostream &OS, AsmPrinter &AP, - const TargetAsmInfo &TAI) { - // Nothing to do. -} - -void MyGCPrinter::finishAssembly(std::ostream &OS, AsmPrinter &AP, - const TargetAsmInfo &TAI) { - // Set up for emitting addresses. - const char *AddressDirective; - int AddressAlignLog; - if (AP.TM.getDataLayout()->getPointerSize() == sizeof(int32_t)) { - AddressDirective = TAI.getData32bitsDirective(); - AddressAlignLog = 2; - } else { - AddressDirective = TAI.getData64bitsDirective(); - AddressAlignLog = 3; - } - - // Put this in the data section. - AP.SwitchToDataSection(TAI.getDataSection()); - - // For each function... - for (iterator FI = begin(), FE = end(); FI != FE; ++FI) { - GCFunctionInfo &MD = **FI; - - // Emit this data structure: - // - // struct { - // int32_t PointCount; - // struct { - // void *SafePointAddress; - // int32_t LiveCount; - // int32_t LiveOffsets[LiveCount]; - // } Points[PointCount]; - // } __gcmap_<FUNCTIONNAME>; - - // Align to address width. - AP.EmitAlignment(AddressAlignLog); - - // Emit the symbol by which the stack map entry can be found. - std::string Symbol; - Symbol += TAI.getGlobalPrefix(); - Symbol += "__gcmap_"; - Symbol += MD.getFunction().getName(); - if (const char *GlobalDirective = TAI.getGlobalDirective()) - OS << GlobalDirective << Symbol << "\n"; - OS << TAI.getGlobalPrefix() << Symbol << ":\n"; - - // Emit PointCount. - AP.EmitInt32(MD.size()); - AP.EOL("safe point count"); - - // And each safe point... - for (GCFunctionInfo::iterator PI = MD.begin(), - PE = MD.end(); PI != PE; ++PI) { - // Align to address width. - AP.EmitAlignment(AddressAlignLog); - - // Emit the address of the safe point. - OS << AddressDirective - << TAI.getPrivateGlobalPrefix() << "label" << PI->Num; - AP.EOL("safe point address"); - - // Emit the stack frame size. - AP.EmitInt32(MD.getFrameSize()); - AP.EOL("stack frame size"); - - // Emit the number of live roots in the function. - AP.EmitInt32(MD.live_size(PI)); - AP.EOL("live root count"); - - // And for each live root... - for (GCFunctionInfo::live_iterator LI = MD.live_begin(PI), - LE = MD.live_end(PI); - LI != LE; ++LI) { - // Print its offset within the stack frame. - AP.EmitInt32(LI->StackOffset); - AP.EOL("stack offset"); - } - } - } -} -</pre></blockquote> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="references">References</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p><a name="appel89">[Appel89]</a> Runtime Tags Aren't Necessary. Andrew -W. Appel. Lisp and Symbolic Computation 19(7):703-705, July 1989.</p> - -<p><a name="goldberg91">[Goldberg91]</a> Tag-free garbage collection for -strongly typed programming languages. Benjamin Goldberg. ACM SIGPLAN -PLDI'91.</p> - -<p><a name="tolmach94">[Tolmach94]</a> Tag-free garbage collection using -explicit type parameters. Andrew Tolmach. Proceedings of the 1994 ACM -conference on LISP and functional programming.</p> - -<p><a name="henderson02">[Henderson2002]</a> <a -href="http://citeseer.ist.psu.edu/henderson02accurate.html"> -Accurate Garbage Collection in an Uncooperative Environment</a>. -Fergus Henderson. 
International Symposium on Memory Management 2002.</p> - -</div> - - -<!-- *********************************************************************** --> - -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org/">LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> - -</body> -</html> diff --git a/docs/GarbageCollection.rst b/docs/GarbageCollection.rst new file mode 100644 index 0000000000..b0b2718409 --- /dev/null +++ b/docs/GarbageCollection.rst @@ -0,0 +1,1051 @@ +===================================== +Accurate Garbage Collection with LLVM +===================================== + +.. contents:: + :local: + +.. sectionauthor:: Chris Lattner <sabre@nondot.org> and + Gordon Henriksen + +Introduction +============ + +Garbage collection is a widely used technique that frees the programmer from +having to know the lifetimes of heap objects, making software easier to produce +and maintain. Many programming languages rely on garbage collection for +automatic memory management. There are two primary forms of garbage collection: +conservative and accurate. + +Conservative garbage collection often does not require any special support from +either the language or the compiler: it can handle non-type-safe programming +languages (such as C/C++) and does not require any special information from the +compiler. The `Boehm collector +<http://www.hpl.hp.com/personal/Hans_Boehm/gc/>`__ is an example of a +state-of-the-art conservative collector. + +Accurate garbage collection requires the ability to identify all pointers in the +program at run-time (which requires that the source-language be type-safe in +most cases). Identifying pointers at run-time requires compiler support to +locate all places that hold live pointer variables at run-time, including the +:ref:`processor stack and registers <gcroot>`. + +Conservative garbage collection is attractive because it does not require any +special compiler support, but it does have problems. In particular, because the +conservative garbage collector cannot *know* that a particular word in the +machine is a pointer, it cannot move live objects in the heap (preventing the +use of compacting and generational GC algorithms) and it can occasionally suffer +from memory leaks due to integer values that happen to point to objects in the +program. In addition, some aggressive compiler transformations can break +conservative garbage collectors (though these seem rare in practice). + +Accurate garbage collectors do not suffer from any of these problems, but they +can suffer from degraded scalar optimization of the program. In particular, +because the runtime must be able to identify and update all pointers active in +the program, some optimizations are less effective. In practice, however, the +locality and performance benefits of using aggressive garbage collection +techniques dominates any low-level losses. + +This document describes the mechanisms and interfaces provided by LLVM to +support accurate garbage collection. + +.. 
_feature: + +Goals and non-goals +------------------- + +LLVM's intermediate representation provides :ref:`garbage collection intrinsics +<gc_intrinsics>` that offer support for a broad class of collector models. For +instance, the intrinsics permit: + +* semi-space collectors + +* mark-sweep collectors + +* generational collectors + +* reference counting + +* incremental collectors + +* concurrent collectors + +* cooperative collectors + +We hope that the primitive support built into the LLVM IR is sufficient to +support a broad class of garbage collected languages including Scheme, ML, Java, +C#, Perl, Python, Lua, Ruby, other scripting languages, and more. + +However, LLVM does not itself provide a garbage collector --- this should be +part of your language's runtime library. LLVM provides a framework for compile +time :ref:`code generation plugins <plugin>`. The role of these plugins is to +generate code and data structures which conforms to the *binary interface* +specified by the *runtime library*. This is similar to the relationship between +LLVM and DWARF debugging info, for example. The difference primarily lies in +the lack of an established standard in the domain of garbage collection --- thus +the plugins. + +The aspects of the binary interface with which LLVM's GC support is +concerned are: + +* Creation of GC-safe points within code where collection is allowed to execute + safely. + +* Computation of the stack map. For each safe point in the code, object + references within the stack frame must be identified so that the collector may + traverse and perhaps update them. + +* Write barriers when storing object references to the heap. These are commonly + used to optimize incremental scans in generational collectors. + +* Emission of read barriers when loading object references. These are useful + for interoperating with concurrent collectors. + +There are additional areas that LLVM does not directly address: + +* Registration of global roots with the runtime. + +* Registration of stack map entries with the runtime. + +* The functions used by the program to allocate memory, trigger a collection, + etc. + +* Computation or compilation of type maps, or registration of them with the + runtime. These are used to crawl the heap for object references. + +In general, LLVM's support for GC does not include features which can be +adequately addressed with other features of the IR and does not specify a +particular binary interface. On the plus side, this means that you should be +able to integrate LLVM with an existing runtime. On the other hand, it leaves a +lot of work for the developer of a novel language. However, it's easy to get +started quickly and scale up to a more sophisticated implementation as your +compiler matures. + +.. _quickstart: + +Getting started +=============== + +Using a GC with LLVM implies many things, for example: + +* Write a runtime library or find an existing one which implements a GC heap. + + #. Implement a memory allocator. + + #. Design a binary interface for the stack map, used to identify references + within a stack frame on the machine stack.\* + + #. Implement a stack crawler to discover functions on the call stack.\* + + #. Implement a registry for global roots. + + #. Design a binary interface for type maps, used to identify references + within heap objects. + + #. Implement a collection routine bringing together all of the above. + +* Emit compatible code from your compiler. + + * Initialization in the main function. 
+ + * Use the ``gc "..."`` attribute to enable GC code generation (or + ``F.setGC("...")``). + + * Use ``@llvm.gcroot`` to mark stack roots. + + * Use ``@llvm.gcread`` and/or ``@llvm.gcwrite`` to manipulate GC references, + if necessary. + + * Allocate memory using the GC allocation routine provided by the runtime + library. + + * Generate type maps according to your runtime's binary interface. + +* Write a compiler plugin to interface LLVM with the runtime library.\* + + * Lower ``@llvm.gcread`` and ``@llvm.gcwrite`` to appropriate code + sequences.\* + + * Compile LLVM's stack map to the binary form expected by the runtime. + +* Load the plugin into the compiler. Use ``llc -load`` or link the plugin + statically with your language's compiler.\* + +* Link program executables with the runtime. + +To help with several of these tasks (those indicated with a \*), LLVM includes a +highly portable, built-in ShadowStack code generator. It is compiled into +``llc`` and works even with the interpreter and C backends. + +.. _quickstart-compiler: + +In your compiler +---------------- + +To turn the shadow stack on for your functions, first call: + +.. code-block:: c++ + + F.setGC("shadow-stack"); + +for each function your compiler emits. Since the shadow stack is built into +LLVM, you do not need to load a plugin. + +Your compiler must also use ``@llvm.gcroot`` as documented. Don't forget to +create a root for each intermediate value that is generated when evaluating an +expression. In ``h(f(), g())``, the result of ``f()`` could easily be collected +if evaluating ``g()`` triggers a collection. + +There's no need to use ``@llvm.gcread`` and ``@llvm.gcwrite`` over plain +``load`` and ``store`` for now. You will need them when switching to a more +advanced GC. + +.. _quickstart-runtime: + +In your runtime +--------------- + +The shadow stack doesn't imply a memory allocation algorithm. A semispace +collector or building atop ``malloc`` are great places to start, and can be +implemented with very little code. + +When it comes time to collect, however, your runtime needs to traverse the stack +roots, and for this it needs to integrate with the shadow stack. Luckily, doing +so is very simple. (This code is heavily commented to help you understand the +data structure, but there are only 20 lines of meaningful code.) + +.. code-block:: c++ + + /// @brief The map for a single function's stack frame. One of these is + /// compiled as constant data into the executable for each function. + /// + /// Storage of metadata values is elided if the %metadata parameter to + /// @llvm.gcroot is null. + struct FrameMap { + int32_t NumRoots; //< Number of roots in stack frame. + int32_t NumMeta; //< Number of metadata entries. May be < NumRoots. + const void *Meta[0]; //< Metadata for each root. + }; + + /// @brief A link in the dynamic shadow stack. One of these is embedded in + /// the stack frame of each function on the call stack. + struct StackEntry { + StackEntry *Next; //< Link to next stack entry (the caller's). + const FrameMap *Map; //< Pointer to constant FrameMap. + void *Roots[0]; //< Stack roots (in-place array). + }; + + /// @brief The head of the singly-linked list of StackEntries. Functions push + /// and pop onto this in their prologue and epilogue. + /// + /// Since there is only a global list, this technique is not threadsafe. + StackEntry *llvm_gc_root_chain; + + /// @brief Calls Visitor(root, meta) for each GC root on the stack. 
+ /// root and meta are exactly the values passed to + /// @llvm.gcroot. + /// + /// Visitor could be a function to recursively mark live objects. Or it + /// might copy them to another heap or generation. + /// + /// @param Visitor A function to invoke for every GC root on the stack. + void visitGCRoots(void (*Visitor)(void **Root, const void *Meta)) { + for (StackEntry *R = llvm_gc_root_chain; R; R = R->Next) { + unsigned i = 0; + + // For roots [0, NumMeta), the metadata pointer is in the FrameMap. + for (unsigned e = R->Map->NumMeta; i != e; ++i) + Visitor(&R->Roots[i], R->Map->Meta[i]); + + // For roots [NumMeta, NumRoots), the metadata pointer is null. + for (unsigned e = R->Map->NumRoots; i != e; ++i) + Visitor(&R->Roots[i], NULL); + } + } + +.. _shadow-stack: + +About the shadow stack +---------------------- + +Unlike many GC algorithms which rely on a cooperative code generator to compile +stack maps, this algorithm carefully maintains a linked list of stack roots +[:ref:`Henderson2002 <henderson02>`]. This so-called "shadow stack" mirrors the +machine stack. Maintaining this data structure is slower than using a stack map +compiled into the executable as constant data, but has a significant portability +advantage because it requires no special support from the target code generator, +and does not require tricky platform-specific code to crawl the machine stack. + +The tradeoff for this simplicity and portability is: + +* High overhead per function call. + +* Not thread-safe. + +Still, it's an easy way to get started. After your compiler and runtime are up +and running, writing a plugin_ will allow you to take advantage of :ref:`more +advanced GC features <collector-algos>` of LLVM in order to improve performance. + +.. _gc_intrinsics: + +IR features +=========== + +This section describes the garbage collection facilities provided by the +:doc:`LLVM intermediate representation <LangRef>`. The exact behavior of these +IR features is specified by the binary interface implemented by a :ref:`code +generation plugin <plugin>`, not by this document. + +These facilities are limited to those strictly necessary; they are not intended +to be a complete interface to any garbage collector. A program will need to +interface with the GC library using the facilities provided by that program. + +.. _gcattr: + +Specifying GC code generation: ``gc "..."`` +------------------------------------------- + +.. code-block:: llvm + + define ty @name(...) gc "name" { ... + +The ``gc`` function attribute is used to specify the desired GC style to the +compiler. Its programmatic equivalent is the ``setGC`` method of ``Function``. + +Setting ``gc "name"`` on a function triggers a search for a matching code +generation plugin "*name*"; it is that plugin which defines the exact nature of +the code generated to support GC. If none is found, the compiler will raise an +error. + +Specifying the GC style on a per-function basis allows LLVM to link together +programs that use different garbage collection algorithms (or none at all). + +.. _gcroot: + +Identifying GC roots on the stack: ``llvm.gcroot`` +-------------------------------------------------- + +.. code-block:: llvm + + void @llvm.gcroot(i8** %ptrloc, i8* %metadata) + +The ``llvm.gcroot`` intrinsic is used to inform LLVM that a stack variable +references an object on the heap and is to be tracked for garbage collection. +The exact impact on generated code is specified by a :ref:`compiler plugin +<plugin>`. 
All calls to ``llvm.gcroot`` **must** reside inside the first basic +block. + +A compiler which uses mem2reg to raise imperative code using ``alloca`` into SSA +form need only add a call to ``@llvm.gcroot`` for those variables which a +pointers into the GC heap. + +It is also important to mark intermediate values with ``llvm.gcroot``. For +example, consider ``h(f(), g())``. Beware leaking the result of ``f()`` in the +case that ``g()`` triggers a collection. Note, that stack variables must be +initialized and marked with ``llvm.gcroot`` in function's prologue. + +The first argument **must** be a value referring to an alloca instruction or a +bitcast of an alloca. The second contains a pointer to metadata that should be +associated with the pointer, and **must** be a constant or global value +address. If your target collector uses tags, use a null pointer for metadata. + +The ``%metadata`` argument can be used to avoid requiring heap objects to have +'isa' pointers or tag bits. [Appel89_, Goldberg91_, Tolmach94_] If specified, +its value will be tracked along with the location of the pointer in the stack +frame. + +Consider the following fragment of Java code: + +.. code-block:: java + + { + Object X; // A null-initialized reference to an object + ... + } + +This block (which may be located in the middle of a function or in a loop nest), +could be compiled to this LLVM code: + +.. code-block:: llvm + + Entry: + ;; In the entry block for the function, allocate the + ;; stack space for X, which is an LLVM pointer. + %X = alloca %Object* + + ;; Tell LLVM that the stack space is a stack root. + ;; Java has type-tags on objects, so we pass null as metadata. + %tmp = bitcast %Object** %X to i8** + call void @llvm.gcroot(i8** %tmp, i8* null) + ... + + ;; "CodeBlock" is the block corresponding to the start + ;; of the scope above. + CodeBlock: + ;; Java null-initializes pointers. + store %Object* null, %Object** %X + + ... + + ;; As the pointer goes out of scope, store a null value into + ;; it, to indicate that the value is no longer live. + store %Object* null, %Object** %X + ... + +.. _barriers: + +Reading and writing references in the heap +------------------------------------------ + +Some collectors need to be informed when the mutator (the program that needs +garbage collection) either reads a pointer from or writes a pointer to a field +of a heap object. The code fragments inserted at these points are called *read +barriers* and *write barriers*, respectively. The amount of code that needs to +be executed is usually quite small and not on the critical path of any +computation, so the overall performance impact of the barrier is tolerable. + +Barriers often require access to the *object pointer* rather than the *derived +pointer* (which is a pointer to the field within the object). Accordingly, +these intrinsics take both pointers as separate arguments for completeness. In +this snippet, ``%object`` is the object pointer, and ``%derived`` is the derived +pointer: + +.. code-block:: llvm + + ;; An array type. + %class.Array = type { %class.Object, i32, [0 x %class.Object*] } + ... + + ;; Load the object pointer from a gcroot. + %object = load %class.Array** %object_addr + + ;; Compute the derived pointer. + %derived = getelementptr %object, i32 0, i32 2, i32 %n + +LLVM does not enforce this relationship between the object and derived pointer +(although a plugin_ might). However, it would be an unusual collector that +violated it. 
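+
+Continuing the snippet above, and assuming a value ``%value`` of type
+``%class.Object*`` is to be stored into the array element (``%value`` is not
+part of the original example), a front end that wants the write barrier
+described below might emit something like:
+
+.. code-block:: llvm
+
+   ;; The barrier intrinsics traffic in i8* pointers, so cast the operands.
+   %value.i8   = bitcast %class.Object* %value to i8*
+   %object.i8  = bitcast %class.Array* %object to i8*
+   %derived.pp = bitcast %class.Object** %derived to i8**
+
+   ;; Same semantics as storing %value through %derived with a plain store,
+   ;; but the GC plugin decides what code is actually generated.
+   call void @llvm.gcwrite(i8* %value.i8, i8* %object.i8, i8** %derived.pp)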
+ +The use of these intrinsics is naturally optional if the target GC does require +the corresponding barrier. Such a GC plugin will replace the intrinsic calls +with the corresponding ``load`` or ``store`` instruction if they are used. + +.. _gcwrite: + +Write barrier: ``llvm.gcwrite`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: llvm + + void @llvm.gcwrite(i8* %value, i8* %object, i8** %derived) + +For write barriers, LLVM provides the ``llvm.gcwrite`` intrinsic function. It +has exactly the same semantics as a non-volatile ``store`` to the derived +pointer (the third argument). The exact code generated is specified by a +compiler plugin_. + +Many important algorithms require write barriers, including generational and +concurrent collectors. Additionally, write barriers could be used to implement +reference counting. + +.. _gcread: + +Read barrier: ``llvm.gcread`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: llvm + + i8* @llvm.gcread(i8* %object, i8** %derived) + +For read barriers, LLVM provides the ``llvm.gcread`` intrinsic function. It has +exactly the same semantics as a non-volatile ``load`` from the derived pointer +(the second argument). The exact code generated is specified by a compiler +plugin_. + +Read barriers are needed by fewer algorithms than write barriers, and may have a +greater performance impact since pointer reads are more frequent than writes. + +.. _plugin: + +Implementing a collector plugin +=============================== + +User code specifies which GC code generation to use with the ``gc`` function +attribute or, equivalently, with the ``setGC`` method of ``Function``. + +To implement a GC plugin, it is necessary to subclass ``llvm::GCStrategy``, +which can be accomplished in a few lines of boilerplate code. LLVM's +infrastructure provides access to several important algorithms. For an +uncontroversial collector, all that remains may be to compile LLVM's computed +stack map to assembly code (using the binary representation expected by the +runtime library). This can be accomplished in about 100 lines of code. + +This is not the appropriate place to implement a garbage collected heap or a +garbage collector itself. That code should exist in the language's runtime +library. The compiler plugin is responsible for generating code which conforms +to the binary interface defined by library, most essentially the :ref:`stack map +<stack-map>`. + +To subclass ``llvm::GCStrategy`` and register it with the compiler: + +.. code-block:: c++ + + // lib/MyGC/MyGC.cpp - Example LLVM GC plugin + + #include "llvm/CodeGen/GCStrategy.h" + #include "llvm/CodeGen/GCMetadata.h" + #include "llvm/Support/Compiler.h" + + using namespace llvm; + + namespace { + class LLVM_LIBRARY_VISIBILITY MyGC : public GCStrategy { + public: + MyGC() {} + }; + + GCRegistry::Add<MyGC> + X("mygc", "My bespoke garbage collector."); + } + +This boilerplate collector does nothing. More specifically: + +* ``llvm.gcread`` calls are replaced with the corresponding ``load`` + instruction. + +* ``llvm.gcwrite`` calls are replaced with the corresponding ``store`` + instruction. + +* No safe points are added to the code. + +* The stack map is not compiled into the executable. + +Using the LLVM makefiles (like the `sample project +<http://llvm.org/viewvc/llvm-project/llvm/trunk/projects/sample/>`__), this code +can be compiled as a plugin using a simple makefile: + +.. code-block:: make + + # lib/MyGC/Makefile + + LEVEL := ../.. 
+ LIBRARYNAME = MyGC + LOADABLE_MODULE = 1 + + include $(LEVEL)/Makefile.common + +Once the plugin is compiled, code using it may be compiled using ``llc +-load=MyGC.so`` (though MyGC.so may have some other platform-specific +extension): + +:: + + $ cat sample.ll + define void @f() gc "mygc" { + entry: + ret void + } + $ llvm-as < sample.ll | llc -load=MyGC.so + +It is also possible to statically link the collector plugin into tools, such as +a language-specific compiler front-end. + +.. _collector-algos: + +Overview of available features +------------------------------ + +``GCStrategy`` provides a range of features through which a plugin may do useful +work. Some of these are callbacks, some are algorithms that can be enabled, +disabled, or customized. This matrix summarizes the supported (and planned) +features and correlates them with the collection techniques which typically +require them. + +.. |v| unicode:: 0x2714 + :trim: + +.. |x| unicode:: 0x2718 + :trim: + ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| Algorithm | Done | Shadow | refcount | mark- | copying | incremental | threaded | concurrent | +| | | stack | | sweep | | | | | ++============+======+========+==========+=======+=========+=============+==========+============+ +| stack map | |v| | | | |x| | |x| | |x| | |x| | |x| | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| initialize | |v| | |x| | |x| | |x| | |x| | |x| | |x| | |x| | +| roots | | | | | | | | | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| derived | NO | | | | | | **N**\* | **N**\* | +| pointers | | | | | | | | | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| **custom | |v| | | | | | | | | +| lowering** | | | | | | | | | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| *gcroot* | |v| | |x| | |x| | | | | | | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| *gcwrite* | |v| | | |x| | | | |x| | | |x| | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| *gcread* | |v| | | | | | | | |x| | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| **safe | | | | | | | | | +| points** | | | | | | | | | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| *in | |v| | | | |x| | |x| | |x| | |x| | |x| | +| calls* | | | | | | | | | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| *before | |v| | | | | | | |x| | |x| | +| calls* | | | | | | | | | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| *for | NO | | | | | | **N** | **N** | +| loops* | | | | | | | | | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| *before | |v| | | | | | | |x| | |x| | +| escape* | | | | | | | | | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| emit code | NO | | | | | | **N** | **N** | +| at safe | | | | | | | | | +| points | | | | | | | | | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| **output** | | | | | | | | | 
++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| *assembly* | |v| | | | |x| | |x| | |x| | |x| | |x| | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| *JIT* | NO | | | **?** | **?** | **?** | **?** | **?** | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| *obj* | NO | | | **?** | **?** | **?** | **?** | **?** | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| live | NO | | | **?** | **?** | **?** | **?** | **?** | +| analysis | | | | | | | | | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| register | NO | | | **?** | **?** | **?** | **?** | **?** | +| map | | | | | | | | | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| \* Derived pointers only pose a hasard to copying collections. | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ +| **?** denotes a feature which could be utilized if available. | ++------------+------+--------+----------+-------+---------+-------------+----------+------------+ + +To be clear, the collection techniques above are defined as: + +Shadow Stack + The mutator carefully maintains a linked list of stack roots. + +Reference Counting + The mutator maintains a reference count for each object and frees an object + when its count falls to zero. + +Mark-Sweep + When the heap is exhausted, the collector marks reachable objects starting + from the roots, then deallocates unreachable objects in a sweep phase. + +Copying + As reachability analysis proceeds, the collector copies objects from one heap + area to another, compacting them in the process. Copying collectors enable + highly efficient "bump pointer" allocation and can improve locality of + reference. + +Incremental + (Including generational collectors.) Incremental collectors generally have all + the properties of a copying collector (regardless of whether the mature heap + is compacting), but bring the added complexity of requiring write barriers. + +Threaded + Denotes a multithreaded mutator; the collector must still stop the mutator + ("stop the world") before beginning reachability analysis. Stopping a + multithreaded mutator is a complicated problem. It generally requires highly + platform specific code in the runtime, and the production of carefully + designed machine code at safe points. + +Concurrent + In this technique, the mutator and the collector run concurrently, with the + goal of eliminating pause times. In a *cooperative* collector, the mutator + further aids with collection should a pause occur, allowing collection to take + advantage of multiprocessor hosts. The "stop the world" problem of threaded + collectors is generally still present to a limited extent. Sophisticated + marking algorithms are necessary. Read barriers may be necessary. + +As the matrix indicates, LLVM's garbage collection infrastructure is already +suitable for a wide variety of collectors, but does not currently extend to +multithreaded programs. This will be added in the future as there is +interest. + +.. _stack-map: + +Computing stack maps +-------------------- + +LLVM automatically computes a stack map. 
One of the most important features +of a ``GCStrategy`` is to compile this information into the executable in +the binary representation expected by the runtime library. + +The stack map consists of the location and identity of each GC root in the +each function in the module. For each root: + +* ``RootNum``: The index of the root. + +* ``StackOffset``: The offset of the object relative to the frame pointer. + +* ``RootMetadata``: The value passed as the ``%metadata`` parameter to the + ``@llvm.gcroot`` intrinsic. + +Also, for the function as a whole: + +* ``getFrameSize()``: The overall size of the function's initial stack frame, + not accounting for any dynamic allocation. + +* ``roots_size()``: The count of roots in the function. + +To access the stack map, use ``GCFunctionMetadata::roots_begin()`` and +-``end()`` from the :ref:`GCMetadataPrinter <assembly>`: + +.. code-block:: c++ + + for (iterator I = begin(), E = end(); I != E; ++I) { + GCFunctionInfo *FI = *I; + unsigned FrameSize = FI->getFrameSize(); + size_t RootCount = FI->roots_size(); + + for (GCFunctionInfo::roots_iterator RI = FI->roots_begin(), + RE = FI->roots_end(); + RI != RE; ++RI) { + int RootNum = RI->Num; + int RootStackOffset = RI->StackOffset; + Constant *RootMetadata = RI->Metadata; + } + } + +If the ``llvm.gcroot`` intrinsic is eliminated before code generation by a +custom lowering pass, LLVM will compute an empty stack map. This may be useful +for collector plugins which implement reference counting or a shadow stack. + +.. _init-roots: + +Initializing roots to null: ``InitRoots`` +----------------------------------------- + +.. code-block:: c++ + + MyGC::MyGC() { + InitRoots = true; + } + +When set, LLVM will automatically initialize each root to ``null`` upon entry to +the function. This prevents the GC's sweep phase from visiting uninitialized +pointers, which will almost certainly cause it to crash. This initialization +occurs before custom lowering, so the two may be used together. + +Since LLVM does not yet compute liveness information, there is no means of +distinguishing an uninitialized stack root from an initialized one. Therefore, +this feature should be used by all GC plugins. It is enabled by default. + +.. _custom: + +Custom lowering of intrinsics: ``CustomRoots``, ``CustomReadBarriers``, and ``CustomWriteBarriers`` +--------------------------------------------------------------------------------------------------- + +For GCs which use barriers or unusual treatment of stack roots, these flags +allow the collector to perform arbitrary transformations of the LLVM IR: + +.. code-block:: c++ + + class MyGC : public GCStrategy { + public: + MyGC() { + CustomRoots = true; + CustomReadBarriers = true; + CustomWriteBarriers = true; + } + + virtual bool initializeCustomLowering(Module &M); + virtual bool performCustomLowering(Function &F); + }; + +If any of these flags are set, then LLVM suppresses its default lowering for the +corresponding intrinsics and instead calls ``performCustomLowering``. + +LLVM's default action for each intrinsic is as follows: + +* ``llvm.gcroot``: Leave it alone. The code generator must see it or the stack + map will not be computed. + +* ``llvm.gcread``: Substitute a ``load`` instruction. + +* ``llvm.gcwrite``: Substitute a ``store`` instruction. + +If ``CustomReadBarriers`` or ``CustomWriteBarriers`` are specified, then +``performCustomLowering`` **must** eliminate the corresponding barriers. 
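+
+As a minimal sketch (the helper name here is invented, not an LLVM API),
+eliminating a write barrier can be as simple as re-expressing the call as the
+plain ``store`` it stands for; such a helper could be invoked from
+``performCustomLowering`` in place of the ``// Handle llvm.gcwrite.`` comment
+in the template below:
+
+.. code-block:: c++
+
+   #include "llvm/Instructions.h"
+   #include "llvm/IntrinsicInst.h"
+
+   // Lower one llvm.gcwrite(value, object, derived) call to an ordinary
+   // store of the value through the derived pointer, then erase the call.
+   static void lowerGCWriteToStore(llvm::IntrinsicInst *CI) {
+     // Operand 0 is the value being written; operand 2 is the derived
+     // pointer. The object pointer (operand 1) is unused by a plain store.
+     new llvm::StoreInst(CI->getArgOperand(0), CI->getArgOperand(2), CI);
+     CI->eraseFromParent();
+   }
+
+A collector that actually needs the barrier would instead emit its barrier
+sequence (for example, a card-table or remembered-set update) at the same
+point, before or alongside the store.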
+ +``performCustomLowering`` must comply with the same restrictions as +`FunctionPass::runOnFunction <WritingAnLLVMPass.html#runOnFunction>`__ +Likewise, ``initializeCustomLowering`` has the same semantics as +`Pass::doInitialization(Module&) +<WritingAnLLVMPass.html#doInitialization_mod>`__ + +The following can be used as a template: + +.. code-block:: c++ + + #include "llvm/Module.h" + #include "llvm/IntrinsicInst.h" + + bool MyGC::initializeCustomLowering(Module &M) { + return false; + } + + bool MyGC::performCustomLowering(Function &F) { + bool MadeChange = false; + + for (Function::iterator BB = F.begin(), E = F.end(); BB != E; ++BB) + for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E; ) + if (IntrinsicInst *CI = dyn_cast<IntrinsicInst>(II++)) + if (Function *F = CI->getCalledFunction()) + switch (F->getIntrinsicID()) { + case Intrinsic::gcwrite: + // Handle llvm.gcwrite. + CI->eraseFromParent(); + MadeChange = true; + break; + case Intrinsic::gcread: + // Handle llvm.gcread. + CI->eraseFromParent(); + MadeChange = true; + break; + case Intrinsic::gcroot: + // Handle llvm.gcroot. + CI->eraseFromParent(); + MadeChange = true; + break; + } + + return MadeChange; + } + +.. _safe-points: + +Generating safe points: ``NeededSafePoints`` +-------------------------------------------- + +LLVM can compute four kinds of safe points: + +.. code-block:: c++ + + namespace GC { + /// PointKind - The type of a collector-safe point. + /// + enum PointKind { + Loop, //< Instr is a loop (backwards branch). + Return, //< Instr is a return instruction. + PreCall, //< Instr is a call instruction. + PostCall //< Instr is the return address of a call. + }; + } + +A collector can request any combination of the four by setting the +``NeededSafePoints`` mask: + +.. code-block:: c++ + + MyGC::MyGC() { + NeededSafePoints = 1 << GC::Loop + | 1 << GC::Return + | 1 << GC::PreCall + | 1 << GC::PostCall; + } + +It can then use the following routines to access safe points. + +.. code-block:: c++ + + for (iterator I = begin(), E = end(); I != E; ++I) { + GCFunctionInfo *MD = *I; + size_t PointCount = MD->size(); + + for (GCFunctionInfo::iterator PI = MD->begin(), + PE = MD->end(); PI != PE; ++PI) { + GC::PointKind PointKind = PI->Kind; + unsigned PointNum = PI->Num; + } + } + +Almost every collector requires ``PostCall`` safe points, since these correspond +to the moments when the function is suspended during a call to a subroutine. + +Threaded programs generally require ``Loop`` safe points to guarantee that the +application will reach a safe point within a bounded amount of time, even if it +is executing a long-running loop which contains no function calls. + +Threaded collectors may also require ``Return`` and ``PreCall`` safe points to +implement "stop the world" techniques using self-modifying code, where it is +important that the program not exit the function without reaching a safe point +(because only the topmost function has been patched). + +.. _assembly: + +Emitting assembly code: ``GCMetadataPrinter`` +--------------------------------------------- + +LLVM allows a plugin to print arbitrary assembly code before and after the rest +of a module's assembly code. At the end of the module, the GC can compile the +LLVM stack map into assembly code. (At the beginning, this information is not +yet computed.) 
+ +Since AsmWriter and CodeGen are separate components of LLVM, a separate abstract +base class and registry is provided for printing assembly code, the +``GCMetadaPrinter`` and ``GCMetadataPrinterRegistry``. The AsmWriter will look +for such a subclass if the ``GCStrategy`` sets ``UsesMetadata``: + +.. code-block:: c++ + + MyGC::MyGC() { + UsesMetadata = true; + } + +This separation allows JIT-only clients to be smaller. + +Note that LLVM does not currently have analogous APIs to support code generation +in the JIT, nor using the object writers. + +.. code-block:: c++ + + // lib/MyGC/MyGCPrinter.cpp - Example LLVM GC printer + + #include "llvm/CodeGen/GCMetadataPrinter.h" + #include "llvm/Support/Compiler.h" + + using namespace llvm; + + namespace { + class LLVM_LIBRARY_VISIBILITY MyGCPrinter : public GCMetadataPrinter { + public: + virtual void beginAssembly(std::ostream &OS, AsmPrinter &AP, + const TargetAsmInfo &TAI); + + virtual void finishAssembly(std::ostream &OS, AsmPrinter &AP, + const TargetAsmInfo &TAI); + }; + + GCMetadataPrinterRegistry::Add<MyGCPrinter> + X("mygc", "My bespoke garbage collector."); + } + +The collector should use ``AsmPrinter`` and ``TargetAsmInfo`` to print portable +assembly code to the ``std::ostream``. The collector itself contains the stack +map for the entire module, and may access the ``GCFunctionInfo`` using its own +``begin()`` and ``end()`` methods. Here's a realistic example: + +.. code-block:: c++ + + #include "llvm/CodeGen/AsmPrinter.h" + #include "llvm/Function.h" + #include "llvm/Target/TargetMachine.h" + #include "llvm/DataLayout.h" + #include "llvm/Target/TargetAsmInfo.h" + + void MyGCPrinter::beginAssembly(std::ostream &OS, AsmPrinter &AP, + const TargetAsmInfo &TAI) { + // Nothing to do. + } + + void MyGCPrinter::finishAssembly(std::ostream &OS, AsmPrinter &AP, + const TargetAsmInfo &TAI) { + // Set up for emitting addresses. + const char *AddressDirective; + int AddressAlignLog; + if (AP.TM.getDataLayout()->getPointerSize() == sizeof(int32_t)) { + AddressDirective = TAI.getData32bitsDirective(); + AddressAlignLog = 2; + } else { + AddressDirective = TAI.getData64bitsDirective(); + AddressAlignLog = 3; + } + + // Put this in the data section. + AP.SwitchToDataSection(TAI.getDataSection()); + + // For each function... + for (iterator FI = begin(), FE = end(); FI != FE; ++FI) { + GCFunctionInfo &MD = **FI; + + // Emit this data structure: + // + // struct { + // int32_t PointCount; + // struct { + // void *SafePointAddress; + // int32_t LiveCount; + // int32_t LiveOffsets[LiveCount]; + // } Points[PointCount]; + // } __gcmap_<FUNCTIONNAME>; + + // Align to address width. + AP.EmitAlignment(AddressAlignLog); + + // Emit the symbol by which the stack map entry can be found. + std::string Symbol; + Symbol += TAI.getGlobalPrefix(); + Symbol += "__gcmap_"; + Symbol += MD.getFunction().getName(); + if (const char *GlobalDirective = TAI.getGlobalDirective()) + OS << GlobalDirective << Symbol << "\n"; + OS << TAI.getGlobalPrefix() << Symbol << ":\n"; + + // Emit PointCount. + AP.EmitInt32(MD.size()); + AP.EOL("safe point count"); + + // And each safe point... + for (GCFunctionInfo::iterator PI = MD.begin(), + PE = MD.end(); PI != PE; ++PI) { + // Align to address width. + AP.EmitAlignment(AddressAlignLog); + + // Emit the address of the safe point. + OS << AddressDirective + << TAI.getPrivateGlobalPrefix() << "label" << PI->Num; + AP.EOL("safe point address"); + + // Emit the stack frame size. 
+ AP.EmitInt32(MD.getFrameSize()); + AP.EOL("stack frame size"); + + // Emit the number of live roots in the function. + AP.EmitInt32(MD.live_size(PI)); + AP.EOL("live root count"); + + // And for each live root... + for (GCFunctionInfo::live_iterator LI = MD.live_begin(PI), + LE = MD.live_end(PI); + LI != LE; ++LI) { + // Print its offset within the stack frame. + AP.EmitInt32(LI->StackOffset); + AP.EOL("stack offset"); + } + } + } + } + +References +========== + +.. _appel89: + +[Appel89] Runtime Tags Aren't Necessary. Andrew W. Appel. Lisp and Symbolic +Computation 19(7):703-705, July 1989. + +.. _goldberg91: + +[Goldberg91] Tag-free garbage collection for strongly typed programming +languages. Benjamin Goldberg. ACM SIGPLAN PLDI'91. + +.. _tolmach94: + +[Tolmach94] Tag-free garbage collection using explicit type parameters. Andrew +Tolmach. Proceedings of the 1994 ACM conference on LISP and functional +programming. + +.. _henderson02: + +[Henderson2002] `Accurate Garbage Collection in an Uncooperative Environment +<http://citeseer.ist.psu.edu/henderson02accurate.html>`__ + diff --git a/docs/GetElementPtr.rst b/docs/GetElementPtr.rst index f6f904b2e3..3b57d78cf1 100644 --- a/docs/GetElementPtr.rst +++ b/docs/GetElementPtr.rst @@ -22,7 +22,7 @@ Address Computation When people are first confronted with the GEP instruction, they tend to relate it to known concepts from other programming paradigms, most notably C array indexing and field selection. GEP closely resembles C array indexing and field -selection, however it's is a little different and this leads to the following +selection, however it is a little different and this leads to the following questions. What is the first index of the GEP instruction? @@ -190,7 +190,7 @@ In this example, we have a global variable, ``%MyVar`` that is a pointer to a structure containing a pointer to an array of 40 ints. The GEP instruction seems to be accessing the 18th integer of the structure's array of ints. However, this is actually an illegal GEP instruction. It won't compile. The reason is that the -pointer in the structure <i>must</i> be dereferenced in order to index into the +pointer in the structure *must* be dereferenced in order to index into the array of 40 ints. Since the GEP instruction never accesses memory, it is illegal. @@ -416,7 +416,7 @@ arithmetic, and inttoptr sequences. Can I compute the distance between two objects, and add that value to one address to compute the other address? --------------------------------------------------------------------------------------------------------------- -As with arithmetic on null, You can use GEP to compute an address that way, but +As with arithmetic on null, you can use GEP to compute an address that way, but you can't use that pointer to actually access the object if you do, unless the object is managed outside of LLVM. diff --git a/docs/GettingStarted.rst b/docs/GettingStarted.rst index 68768921f6..8902684c98 100644 --- a/docs/GettingStarted.rst +++ b/docs/GettingStarted.rst @@ -94,7 +94,7 @@ Here's the short story for getting up and running quickly with LLVM: running ``svn update``. * It is also possible to use CMake instead of the makefiles. With CMake it is - also possible to generate project files for several IDEs: Eclipse CDT4, + possible to generate project files for several IDEs: Xcode, Eclipse CDT4, CodeBlocks, Qt-Creator (use the CodeBlocks generator), KDevelop3. * If you get an "internal compiler error (ICE)" or test failures, see @@ -583,7 +583,7 @@ git-imap-send. 
Here is an example to generate the patchset in Gmail's [Drafts]. Then, your .git/config should have [imap] sections. -.. code-block:: bash +.. code-block:: ini [imap] host = imaps://imap.gmail.com @@ -842,12 +842,39 @@ any subdirectories that it contains. Entering any directory inside the LLVM object tree and typing ``gmake`` should rebuild anything in or below that directory that is out of date. +This does not apply to building the documentation. +LLVM's (non-Doxygen) documentation is produced with the +`Sphinx <http://sphinx-doc.org/>`_ documentation generation system. +There are some HTML documents that have not yet been converted to the new +system (which uses the easy-to-read and easy-to-write +`reStructuredText <http://sphinx-doc.org/rest.html>`_ plaintext markup +language). +The generated documentation is built in the ``SRC_ROOT/docs`` directory using +a special makefile. +For instructions on how to install Sphinx, see +`Sphinx Introduction for LLVM Developers +<http://lld.llvm.org/sphinx_intro.html>`_. +After following the instructions there for installing Sphinx, build the LLVM +HTML documentation by doing the following: + +.. code-block:: bash + + $ cd SRC_ROOT/docs + $ make -f Makefile.sphinx + +This creates a ``_build/html`` sub-directory with all of the HTML files, not +just the generated ones. +This directory corresponds to ``llvm.org/docs``. +For example, ``_build/html/SphinxQuickstartTemplate.html`` corresponds to +``llvm.org/docs/SphinxQuickstartTemplate.html``. +The :doc:`SphinxQuickstartTemplate` is useful when creating a new document. + Cross-Compiling LLVM -------------------- It is possible to cross-compile LLVM itself. That is, you can create LLVM executables and libraries to be hosted on a platform different from the platform -where they are build (a Canadian Cross build). To configure a cross-compile, +where they are built (a Canadian Cross build). To configure a cross-compile, supply the configure script with ``--build`` and ``--host`` options that are different. The values of these options must be legal target triples that your GCC compiler supports. @@ -1073,8 +1100,8 @@ module that must be checked out (usually to ``projects/test-suite``). This module contains a comprehensive correctness, performance, and benchmarking test suite for LLVM. It is a separate Subversion module because not every LLVM user is interested in downloading or building such a comprehensive test suite. For -further details on this test suite, please see the `Testing -Guide <TestingGuide.html>`_ document. +further details on this test suite, please see the :doc:`Testing Guide +<TestingGuide>` document. .. _tools: @@ -1250,8 +1277,8 @@ Example with clang % lli hello.bc - The second examples shows how to invoke the LLVM JIT, `lli - <CommandGuide/html/lli.html>`_. + The second examples shows how to invoke the LLVM JIT, :doc:`lli + <CommandGuide/lli>`. #. 
Use the ``llvm-dis`` utility to take a look at the LLVM assembly code: diff --git a/docs/HowToReleaseLLVM.html b/docs/HowToReleaseLLVM.html deleted file mode 100644 index 30c3d5da5e..0000000000 --- a/docs/HowToReleaseLLVM.html +++ /dev/null @@ -1,581 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> -<html> -<head> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <title>How To Release LLVM To The Public</title> - <link rel="stylesheet" href="_static/llvm.css" type="text/css"> -</head> -<body> - -<h1>How To Release LLVM To The Public</h1> -<ol> - <li><a href="#introduction">Introduction</a></li> - <li><a href="#criteria">Qualification Criteria</a></li> - <li><a href="#introduction">Release Timeline</a></li> - <li><a href="#process">Release Process</a></li> -</ol> -<div class="doc_author"> - <p>Written by <a href="mailto:tonic@nondot.org">Tanya Lattner</a>, - <a href="mailto:rspencer@x10sys.com">Reid Spencer</a>, - <a href="mailto:criswell@cs.uiuc.edu">John Criswell</a>, & - <a href="mailto:wendling@apple.com">Bill Wendling</a> - </p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="introduction">Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>This document contains information about successfully releasing LLVM — - including subprojects: e.g., <tt>clang</tt> and <tt>dragonegg</tt> — to - the public. It is the Release Manager's responsibility to ensure that a high - quality build of LLVM is released.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="process">Release Timeline</a></h2> -<!-- *********************************************************************** --> -<div> - -<p>LLVM is released on a time based schedule — roughly every 6 months. We - do not normally have dot releases because of the nature of LLVM's incremental - development philosophy. That said, the only thing preventing dot releases for - critical bug fixes from happening is a lack of resources — testers, - machines, time, etc. And, because of the high quality we desire for LLVM - releases, we cannot allow for a truncated form of release qualification.</p> - -<p>The release process is roughly as follows:</p> - -<ul> - <li><p>Set code freeze and branch creation date for 6 months after last code - freeze date. Announce release schedule to the LLVM community and update - the website.</p></li> - - <li><p>Create release branch and begin release process.</p></li> - - <li><p>Send out release candidate sources for first round of testing. Testing - lasts 7-10 days. During the first round of testing, any regressions found - should be fixed. Patches are merged from mainline into the release - branch. Also, all features need to be completed during this time. Any - features not completed at the end of the first round of testing will be - removed or disabled for the release.</p></li> - - <li><p>Generate and send out the second release candidate sources. Only - <em>critial</em> bugs found during this testing phase will be fixed. Any - bugs introduced by merged patches will be fixed. 
If so a third round of - testing is needed.</p></li> - - <li><p>The release notes are updated.</p></li> - - <li><p>Finally, release!</p></li> -</ul> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="process">Release Process</a></h2> -<!-- *********************************************************************** --> - -<div> - -<ol> - <li><a href="#release-admin">Release Administrative Tasks</a> - <ol> - <li><a href="#branch">Create Release Branch</a></li> - <li><a href="#verchanges">Update Version Numbers</a></li> - </ol> - </li> - <li><a href="#release-build">Building the Release</a> - <ol> - <li><a href="#dist">Build the LLVM Source Distributions</a></li> - <li><a href="#build">Build LLVM</a></li> - <li><a href="#clangbin">Build the Clang Binary Distribution</a></li> - <li><a href="#target-build">Target Specific Build Details</a></li> - </ol> - </li> - <li><a href="#release-qualify">Release Qualification Criteria</a> - <ol> - <li><a href="#llvm-qualify">Qualify LLVM</a></li> - <li><a href="#clang-qualify">Qualify Clang</a></li> - <li><a href="#targets">Specific Target Qualification Details</a></li> - </ol> - </li> - - <li><a href="#commTest">Community Testing</a></li> - <li><a href="#release-patch">Release Patch Rules</a></li> - <li><a href="#release-final">Release final tasks</a> - <ol> - <li><a href="#updocs">Update Documentation</a></li> - <li><a href="#tag">Tag the LLVM Final Release</a></li> - <li><a href="#updemo">Update the LLVM Demo Page</a></li> - <li><a href="#webupdates">Update the LLVM Website</a></li> - <li><a href="#announce">Announce the Release</a></li> - </ol> - </li> -</ol> - -<!-- ======================================================================= --> -<h3><a name="release-admin">Release Administrative Tasks</a></h3> - -<div> - -<p>This section describes a few administrative tasks that need to be done for - the release process to begin. Specifically, it involves:</p> - -<ul> - <li>Creating the release branch,</li> - <li>Setting version numbers, and</li> - <li>Tagging release candidates for the release team to begin testing</li> -</ul> - -<!-- ======================================================================= --> -<h4><a name="branch">Create Release Branch</a></h4> - -<div> - -<p>Branch the Subversion trunk using the following procedure:</p> - -<ol> - <li><p>Remind developers that the release branching is imminent and to refrain - from committing patches that might break the build. E.g., new features, - large patches for works in progress, an overhaul of the type system, an - exciting new TableGen feature, etc.</p></li> - - <li><p>Verify that the current Subversion trunk is in decent shape by - examining nightly tester and buildbot results.</p></li> - - <li><p>Create the release branch for <tt>llvm</tt>, <tt>clang</tt>, - the <tt>test-suite</tt>, and <tt>dragonegg</tt> from the last known good - revision. The branch's name is <tt>release_<i>XY</i></tt>, - where <tt>X</tt> is the major and <tt>Y</tt> the minor release - numbers. 
The branches should be created using the following commands:</p> - -<div class="doc_code"> -<pre> -$ svn copy https://llvm.org/svn/llvm-project/llvm/trunk \ - https://llvm.org/svn/llvm-project/llvm/branches/release_<i>XY</i> - -$ svn copy https://llvm.org/svn/llvm-project/cfe/trunk \ - https://llvm.org/svn/llvm-project/cfe/branches/release_<i>XY</i> - -$ svn copy https://llvm.org/svn/llvm-project/dragonegg/trunk \ - https://llvm.org/svn/llvm-project/dragonegg/branches/release_<i>XY</i> - -$ svn copy https://llvm.org/svn/llvm-project/test-suite/trunk \ - https://llvm.org/svn/llvm-project/test-suite/branches/release_<i>XY</i> -</pre> -</div></li> - - <li><p>Advise developers that they may now check their patches into the - Subversion tree again.</p></li> - - <li><p>The Release Manager should switch to the release branch, because all - changes to the release will now be done in the branch. The easiest way to - do this is to grab a working copy using the following commands:</p> - -<div class="doc_code"> -<pre> -$ svn co https://llvm.org/svn/llvm-project/llvm/branches/release_<i>XY</i> llvm-<i>X.Y</i> - -$ svn co https://llvm.org/svn/llvm-project/cfe/branches/release_<i>XY</i> clang-<i>X.Y</i> - -$ svn co https://llvm.org/svn/llvm-project/dragonegg/branches/release_<i>XY</i> dragonegg-<i>X.Y</i> - -$ svn co https://llvm.org/svn/llvm-project/test-suite/branches/release_<i>XY</i> test-suite-<i>X.Y</i> -</pre> -</div></li> -</ol> - -</div> - -<!-- ======================================================================= --> -<h4><a name="verchanges">Update LLVM Version</a></h4> - -<div> - -<p>After creating the LLVM release branch, update the release branches' - <tt>autoconf</tt> and <tt>configure.ac</tt> versions from '<tt>X.Ysvn</tt>' - to '<tt>X.Y</tt>'. Update it on mainline as well to be the next version - ('<tt>X.Y+1svn</tt>'). Regenerate the configure scripts for both - <tt>llvm</tt> and the <tt>test-suite</tt>.</p> - -<p>In addition, the version numbers of all the Bugzilla components must be - updated for the next release.</p> - -</div> - -<!-- ======================================================================= --> -<h4><a name="dist">Build the LLVM Release Candidates</a></h4> - -<div> - -<p>Create release candidates for <tt>llvm</tt>, <tt>clang</tt>, - <tt>dragonegg</tt>, and the LLVM <tt>test-suite</tt> by tagging the branch - with the respective release candidate number. 
For instance, to - create <b>Release Candidate 1</b> you would issue the following commands:</p> - -<div class="doc_code"> -<pre> -$ svn mkdir https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_<i>XY</i> -$ svn copy https://llvm.org/svn/llvm-project/llvm/branches/release_<i>XY</i> \ - https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_<i>XY</i>/rc1 - -$ svn mkdir https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_<i>XY</i> -$ svn copy https://llvm.org/svn/llvm-project/cfe/branches/release_<i>XY</i> \ - https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_<i>XY</i>/rc1 - -$ svn mkdir https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_<i>XY</i> -$ svn copy https://llvm.org/svn/llvm-project/dragonegg/branches/release_<i>XY</i> \ - https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_<i>XY</i>/rc1 - -$ svn mkdir https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_<i>XY</i> -$ svn copy https://llvm.org/svn/llvm-project/test-suite/branches/release_<i>XY</i> \ - https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_<i>XY</i>/rc1 -</pre> -</div> - -<p>Similarly, <b>Release Candidate 2</b> would be named <tt>RC2</tt> and so - on. This keeps a permanent copy of the release candidate around for people to - export and build as they wish. The final released sources will be tagged in - the <tt>RELEASE_<i>XY</i></tt> directory as <tt>Final</tt> - (c.f. <a href="#tag">Tag the LLVM Final Release</a>).</p> - -<p>The Release Manager may supply pre-packaged source tarballs for users. This - can be done with the following commands:</p> - -<div class="doc_code"> -<pre> -$ svn export https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_<i>XY</i>/rc1 llvm-<i>X.Y</i>rc1 -$ svn export https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_<i>XY</i>/rc1 clang-<i>X.Y</i>rc1 -$ svn export https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_<i>XY</i>/rc1 dragonegg-<i>X.Y</i>rc1 -$ svn export https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_<i>XY</i>/rc1 llvm-test-<i>X.Y</i>rc1 - -$ tar -cvf - llvm-<i>X.Y</i>rc1 | gzip > llvm-<i>X.Y</i>rc1.src.tar.gz -$ tar -cvf - clang-<i>X.Y</i>rc1 | gzip > clang-<i>X.Y</i>rc1.src.tar.gz -$ tar -cvf - dragonegg-<i>X.Y</i>rc1 | gzip > dragonegg-<i>X.Y</i>rc1.src.tar.gz -$ tar -cvf - llvm-test-<i>X.Y</i>rc1 | gzip > llvm-test-<i>X.Y</i>rc1.src.tar.gz -</pre> -</div> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3><a name="release-build">Building the Release</a></h3> - -<div> - -<p>The builds of <tt>llvm</tt>, <tt>clang</tt>, and <tt>dragonegg</tt> - <em>must</em> be free of errors and warnings in Debug, Release+Asserts, and - Release builds. If all builds are clean, then the release passes Build - Qualification.</p> - -<p>The <tt>make</tt> options for building the different modes:</p> - -<table> - <tr><th>Mode</th><th>Options</th></tr> - <tr align="left"><td>Debug</td><td><tt>ENABLE_OPTIMIZED=0</tt></td></tr> - <tr align="left"><td>Release+Asserts</td><td><tt>ENABLE_OPTIMIZED=1</tt></td></tr> - <tr align="left"><td>Release</td><td><tt>ENABLE_OPTIMIZED=1 DISABLE_ASSERTIONS=1</tt></td></tr> -</table> - -<!-- ======================================================================= --> -<h4><a name="build">Build LLVM</a></h4> - -<div> - -<p>Build <tt>Debug</tt>, <tt>Release+Asserts</tt>, and <tt>Release</tt> versions - of <tt>llvm</tt> on all supported platforms. 
Directions to build - <tt>llvm</tt> are <a href="GettingStarted.html#quickstart">here</a>.</p> - -</div> - -<!-- ======================================================================= --> -<h4><a name="clangbin">Build Clang Binary Distribution</a></h4> - -<div> - -<p>Creating the <tt>clang</tt> binary distribution - (Debug/Release+Asserts/Release) requires performing the following steps for - each supported platform:</p> - -<ol> - <li>Build clang according to the directions - <a href="http://clang.llvm.org/get_started.html">here</a>.</li> - - <li>Build both a Debug and Release version of clang. The binary will be the - Release build.</lI> - - <li>Package <tt>clang</tt> (details to follow).</li> -</ol> - -</div> - -<!-- ======================================================================= --> -<h4><a name="target-build">Target Specific Build Details</a></h4> - -<div> - -<p>The table below specifies which compilers are used for each Arch/OS - combination when qualifying the build of <tt>llvm</tt>, <tt>clang</tt>, - and <tt>dragonegg</tt>.</p> - -<table> - <tr><th>Architecture</th> <th>OS</th> <th>compiler</th></tr> - <tr><td>x86-32</td> <td>Mac OS 10.5</td> <td>gcc 4.0.1</td></tr> - <tr><td>x86-32</td> <td>Linux</td> <td>gcc 4.2.X, gcc 4.3.X</td></tr> - <tr><td>x86-32</td> <td>FreeBSD</td> <td>gcc 4.2.X</td></tr> - <tr><td>x86-32</td> <td>mingw</td> <td>gcc 3.4.5</td></tr> - <tr><td>x86-64</td> <td>Mac OS 10.5</td> <td>gcc 4.0.1</td></tr> - <tr><td>x86-64</td> <td>Linux</td> <td>gcc 4.2.X, gcc 4.3.X</td></tr> - <tr><td>x86-64</td> <td>FreeBSD</td> <td>gcc 4.2.X</td></tr> -</table> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3><a name="release-qualify">Building the Release</a></h3> - -<div> - -<p>A release is qualified when it has no regressions from the previous release - (or baseline). Regressions are related to correctness first and performance - second. (We may tolerate some minor performance regressions if they are - deemed necessary for the general quality of the compiler.)</p> - -<p><b>Regressions are new failures in the set of tests that are used to qualify - each product and only include things on the list. Every release will have - some bugs in it. It is the reality of developing a complex piece of - software. We need a very concrete and definitive release criteria that - ensures we have monotonically improving quality on some metric. The metric we - use is described below. This doesn't mean that we don't care about other - criteria, but these are the criteria which we found to be most important and - which must be satisfied before a release can go out</b></p> - -<!-- ======================================================================= --> -<h4><a name="llvm-qualify">Qualify LLVM</a></h4> - -<div> - -<p>LLVM is qualified when it has a clean test run without a front-end. 
And it - has no regressions when using either <tt>clang</tt> or <tt>dragonegg</tt> - with the <tt>test-suite</tt> from the previous release.</p> - -</div> - -<!-- ======================================================================= --> -<h4><a name="clang-qualify">Qualify Clang</a></h4> - -<div> - -<p><tt>Clang</tt> is qualified when front-end specific tests in the - <tt>llvm</tt> dejagnu test suite all pass, clang's own test suite passes - cleanly, and there are no regressions in the <tt>test-suite</tt>.</p> - -</div> - -<!-- ======================================================================= --> -<h4><a name="targets">Specific Target Qualification Details</a></h4> - -<div> - -<table> - <tr><th>Architecture</th> <th>OS</th> <th>clang baseline</th> <th>tests</th></tr> - <tr><td>x86-32</td> <td>Linux</td> <td>last release</td> <td>llvm dejagnu, clang tests, test-suite (including spec)</td></tr> - <tr><td>x86-32</td> <td>FreeBSD</td> <td>last release</td> <td>llvm dejagnu, clang tests, test-suite</td></tr> - <tr><td>x86-32</td> <td>mingw</td> <td>none</td> <td>QT</td></tr> - <tr><td>x86-64</td> <td>Mac OS 10.X</td> <td>last release</td> <td>llvm dejagnu, clang tests, test-suite (including spec)</td></tr> - <tr><td>x86-64</td> <td>Linux</td> <td>last release</td> <td>llvm dejagnu, clang tests, test-suite (including spec)</td></tr> - <tr><td>x86-64</td> <td>FreeBSD</td> <td>last release</td> <td>llvm dejagnu, clang tests, test-suite</td></tr> -</table> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3><a name="commTest">Community Testing</a></h3> -<div> - -<p>Once all testing has been completed and appropriate bugs filed, the release - candidate tarballs are put on the website and the LLVM community is - notified. Ask that all LLVM developers test the release in 2 ways:</p> - -<ol> - <li>Download <tt>llvm-<i>X.Y</i></tt>, <tt>llvm-test-<i>X.Y</i></tt>, and the - appropriate <tt>clang</tt> binary. Build LLVM. Run <tt>make check</tt> and - the full LLVM test suite (<tt>make TEST=nightly report</tt>).</li> - - <li>Download <tt>llvm-<i>X.Y</i></tt>, <tt>llvm-test-<i>X.Y</i></tt>, and the - <tt>clang</tt> sources. Compile everything. Run <tt>make check</tt> and - the full LLVM test suite (<tt>make TEST=nightly report</tt>).</li> -</ol> - -<p>Ask LLVM developers to submit the test suite report and <tt>make check</tt> - results to the list. Verify that there are no regressions from the previous - release. The results are not used to qualify a release, but to spot other - potential problems. For unsupported targets, verify that <tt>make check</tt> - is at least clean.</p> - -<p>During the first round of testing, all regressions must be fixed before the - second release candidate is tagged.</p> - -<p>If this is the second round of testing, the testing is only to ensure that - bug fixes previously merged in have not created new major problems. 
<i>This - is not the time to solve additional and unrelated bugs!</i> If no patches are - merged in, the release is determined to be ready and the release manager may - move onto the next stage.</p> - -</div> - -<!-- ======================================================================= --> -<h3><a name="release-patch">Release Patch Rules</a></h3> - -<div> - -<p>Below are the rules regarding patching the release branch:</p> - -<ol> - <li><p>Patches applied to the release branch may only be applied by the - release manager.</p></li> - - <li><p>During the first round of testing, patches that fix regressions or that - are small and relatively risk free (verified by the appropriate code - owner) are applied to the branch. Code owners are asked to be very - conservative in approving patches for the branch. We reserve the right to - reject any patch that does not fix a regression as previously - defined.</p></li> - - <li><p>During the remaining rounds of testing, only patches that fix critical - regressions may be applied.</p></li> -</ol> - -</div> - -<!-- ======================================================================= --> -<h3><a name="release-final">Release Final Tasks</a></h3> - -<div> - -<p>The final stages of the release process involves tagging the "final" release - branch, updating documentation that refers to the release, and updating the - demo page.</p> - -<!-- ======================================================================= --> -<h4><a name="updocs">Update Documentation</a></h4> - -<div> - -<p>Review the documentation and ensure that it is up to date. The "Release - Notes" must be updated to reflect new features, bug fixes, new known issues, - and changes in the list of supported platforms. The "Getting Started Guide" - should be updated to reflect the new release version number tag available from - Subversion and changes in basic system requirements. Merge both changes from - mainline into the release branch.</p> - -</div> - -<!-- ======================================================================= --> -<h4><a name="tag">Tag the LLVM Final Release</a></h4> - -<div> - -<p>Tag the final release sources using the following procedure:</p> - -<div class="doc_code"> -<pre> -$ svn copy https://llvm.org/svn/llvm-project/llvm/branches/release_XY \ - https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_<i>XY</i>/Final - -$ svn copy https://llvm.org/svn/llvm-project/cfe/branches/release_XY \ - https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_<i>XY</i>/Final - -$ svn copy https://llvm.org/svn/llvm-project/dragonegg/branches/release_XY \ - https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_<i>XY</i>/Final - -$ svn copy https://llvm.org/svn/llvm-project/test-suite/branches/release_XY \ - https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_<i>XY</i>/Final -</pre> -</div> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3><a name="updemo">Update the LLVM Demo Page</a></h3> - -<div> - -<p>The LLVM demo page must be updated to use the new release. This consists of - using the new <tt>clang</tt> binary and building LLVM.</p> - -<!-- ======================================================================= --> -<h4><a name="webupdates">Update the LLVM Website</a></h4> - -<div> - -<p>The website must be updated before the release announcement is sent out. 
Here - is what to do:</p> - -<ol> - <li>Check out the <tt>www</tt> module from Subversion.</li> - - <li>Create a new subdirectory <tt>X.Y</tt> in the releases directory.</li> - - <li>Commit the <tt>llvm</tt>, <tt>test-suite</tt>, <tt>clang</tt> source, - <tt>clang binaries</tt>, <tt>dragonegg</tt> source, and <tt>dragonegg</tt> - binaries in this new directory.</li> - - <li>Copy and commit the <tt>llvm/docs</tt> and <tt>LICENSE.txt</tt> files - into this new directory. The docs should be built with - <tt>BUILD_FOR_WEBSITE=1</tt>.</li> - - <li>Commit the <tt>index.html</tt> to the <tt>release/X.Y</tt> directory to - redirect (use from previous release.</li> - - <li>Update the <tt>releases/download.html</tt> file with the new release.</li> - - <li>Update the <tt>releases/index.html</tt> with the new release and link to - release documentation.</li> - - <li>Finally, update the main page (<tt>index.html</tt> and sidebar) to point - to the new release and release announcement. Make sure this all gets - committed back into Subversion.</li> -</ol> - -</div> - -<!-- ======================================================================= --> -<h4><a name="announce">Announce the Release</a></h4> - -<div> - -<p>Have Chris send out the release announcement when everything is finished.</p> - -</div> - -</div> - -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a> - <br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/HowToReleaseLLVM.rst b/docs/HowToReleaseLLVM.rst new file mode 100644 index 0000000000..eb6c838a21 --- /dev/null +++ b/docs/HowToReleaseLLVM.rst @@ -0,0 +1,427 @@ +================================= +How To Release LLVM To The Public +================================= + +.. contents:: + :local: + :depth: 1 + +.. sectionauthor:: Tanya Lattner <tonic@nondot.org>, + Reid Spencer <rspencer@x10sys.com>, + John Criswell <criswell@cs.uiuc.edu> and + Bill Wendling <wendling@apple.com> + +Introduction +============ + +This document contains information about successfully releasing LLVM --- +including subprojects: e.g., ``clang`` and ``dragonegg`` --- to the public. It +is the Release Manager's responsibility to ensure that a high quality build of +LLVM is released. + +.. _timeline: + +Release Timeline +================ + +LLVM is released on a time based schedule --- roughly every 6 months. We do +not normally have dot releases because of the nature of LLVM's incremental +development philosophy. That said, the only thing preventing dot releases for +critical bug fixes from happening is a lack of resources --- testers, +machines, time, etc. And, because of the high quality we desire for LLVM +releases, we cannot allow for a truncated form of release qualification. + +The release process is roughly as follows: + +* Set code freeze and branch creation date for 6 months after last code freeze + date. Announce release schedule to the LLVM community and update the website. + +* Create release branch and begin release process. + +* Send out release candidate sources for first round of testing. Testing lasts + 7-10 days. 
During the first round of testing, any regressions found should be + fixed. Patches are merged from mainline into the release branch. Also, all + features need to be completed during this time. Any features not completed at + the end of the first round of testing will be removed or disabled for the + release. + +* Generate and send out the second release candidate sources. Only *critial* + bugs found during this testing phase will be fixed. Any bugs introduced by + merged patches will be fixed. If so a third round of testing is needed. + +* The release notes are updated. + +* Finally, release! + +Release Process +=============== + +.. contents:: + :local: + +Release Administrative Tasks +---------------------------- + +This section describes a few administrative tasks that need to be done for the +release process to begin. Specifically, it involves: + +* Creating the release branch, + +* Setting version numbers, and + +* Tagging release candidates for the release team to begin testing. + +Create Release Branch +^^^^^^^^^^^^^^^^^^^^^ + +Branch the Subversion trunk using the following procedure: + +#. Remind developers that the release branching is imminent and to refrain from + committing patches that might break the build. E.g., new features, large + patches for works in progress, an overhaul of the type system, an exciting + new TableGen feature, etc. + +#. Verify that the current Subversion trunk is in decent shape by + examining nightly tester and buildbot results. + +#. Create the release branch for ``llvm``, ``clang``, the ``test-suite``, and + ``dragonegg`` from the last known good revision. The branch's name is + ``release_XY``, where ``X`` is the major and ``Y`` the minor release + numbers. The branches should be created using the following commands: + + :: + + $ svn copy https://llvm.org/svn/llvm-project/llvm/trunk \ + https://llvm.org/svn/llvm-project/llvm/branches/release_XY + + $ svn copy https://llvm.org/svn/llvm-project/cfe/trunk \ + https://llvm.org/svn/llvm-project/cfe/branches/release_XY + + $ svn copy https://llvm.org/svn/llvm-project/dragonegg/trunk \ + https://llvm.org/svn/llvm-project/dragonegg/branches/release_XY + + $ svn copy https://llvm.org/svn/llvm-project/test-suite/trunk \ + https://llvm.org/svn/llvm-project/test-suite/branches/release_XY + +#. Advise developers that they may now check their patches into the Subversion + tree again. + +#. The Release Manager should switch to the release branch, because all changes + to the release will now be done in the branch. The easiest way to do this is + to grab a working copy using the following commands: + + :: + + $ svn co https://llvm.org/svn/llvm-project/llvm/branches/release_XY llvm-X.Y + + $ svn co https://llvm.org/svn/llvm-project/cfe/branches/release_XY clang-X.Y + + $ svn co https://llvm.org/svn/llvm-project/dragonegg/branches/release_XY dragonegg-X.Y + + $ svn co https://llvm.org/svn/llvm-project/test-suite/branches/release_XY test-suite-X.Y + +Update LLVM Version +^^^^^^^^^^^^^^^^^^^ + +After creating the LLVM release branch, update the release branches' +``autoconf`` and ``configure.ac`` versions from '``X.Ysvn``' to '``X.Y``'. +Update it on mainline as well to be the next version ('``X.Y+1svn``'). +Regenerate the configure scripts for both ``llvm`` and the ``test-suite``. + +In addition, the version numbers of all the Bugzilla components must be updated +for the next release. 
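+
+As a sketch only (the exact contents of ``configure.ac`` vary between
+releases, and the ``AutoRegen.sh`` step assumes the in-tree helper script
+under ``autoconf/``), the version bump on the release branch might look like:
+
+::
+
+  $ cd llvm-X.Y
+  $ # In autoconf/configure.ac, change the version in AC_INIT from
+  $ # 'X.Ysvn' to 'X.Y', then regenerate the configure script.
+  $ cd autoconf && ./AutoRegen.sh && cd ..
+  $ svn commit -m "Bump version to X.Y for the release branch."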
+ +Build the LLVM Release Candidates +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Create release candidates for ``llvm``, ``clang``, ``dragonegg``, and the LLVM +``test-suite`` by tagging the branch with the respective release candidate +number. For instance, to create **Release Candidate 1** you would issue the +following commands: + +:: + + $ svn mkdir https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_XY + $ svn copy https://llvm.org/svn/llvm-project/llvm/branches/release_XY \ + https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_XY/rc1 + + $ svn mkdir https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_XY + $ svn copy https://llvm.org/svn/llvm-project/cfe/branches/release_XY \ + https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_XY/rc1 + + $ svn mkdir https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_XY + $ svn copy https://llvm.org/svn/llvm-project/dragonegg/branches/release_XY \ + https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_XY/rc1 + + $ svn mkdir https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_XY + $ svn copy https://llvm.org/svn/llvm-project/test-suite/branches/release_XY \ + https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_XY/rc1 + +Similarly, **Release Candidate 2** would be named ``RC2`` and so on. This keeps +a permanent copy of the release candidate around for people to export and build +as they wish. The final released sources will be tagged in the ``RELEASE_XY`` +directory as ``Final`` (c.f. :ref:`tag`). + +The Release Manager may supply pre-packaged source tarballs for users. This can +be done with the following commands: + +:: + + $ svn export https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_XY/rc1 llvm-X.Yrc1 + $ svn export https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_XY/rc1 clang-X.Yrc1 + $ svn export https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_XY/rc1 dragonegg-X.Yrc1 + $ svn export https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_XY/rc1 llvm-test-X.Yrc1 + + $ tar -cvf - llvm-X.Yrc1 | gzip > llvm-X.Yrc1.src.tar.gz + $ tar -cvf - clang-X.Yrc1 | gzip > clang-X.Yrc1.src.tar.gz + $ tar -cvf - dragonegg-X.Yrc1 | gzip > dragonegg-X.Yrc1.src.tar.gz + $ tar -cvf - llvm-test-X.Yrc1 | gzip > llvm-test-X.Yrc1.src.tar.gz + +Building the Release +-------------------- + +The builds of ``llvm``, ``clang``, and ``dragonegg`` *must* be free of +errors and warnings in Debug, Release+Asserts, and Release builds. If all +builds are clean, then the release passes Build Qualification. + +The ``make`` options for building the different modes: + ++-----------------+---------------------------------------------+ +| Mode | Options | ++=================+=============================================+ +| Debug | ``ENABLE_OPTIMIZED=0`` | ++-----------------+---------------------------------------------+ +| Release+Asserts | ``ENABLE_OPTIMIZED=1`` | ++-----------------+---------------------------------------------+ +| Release | ``ENABLE_OPTIMIZED=1 DISABLE_ASSERTIONS=1`` | ++-----------------+---------------------------------------------+ + +Build LLVM +^^^^^^^^^^ + +Build ``Debug``, ``Release+Asserts``, and ``Release`` versions +of ``llvm`` on all supported platforms. Directions to build ``llvm`` +are :ref:`here <getting_started>`. + +Build Clang Binary Distribution +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Creating the ``clang`` binary distribution (Debug/Release+Asserts/Release) +requires performing the following steps for each supported platform: + +#. 
Build clang according to the directions `here + <http://clang.llvm.org/get_started.html>`__. + +#. Build both a Debug and Release version of clang. The binary will be the + Release build. + +#. Package ``clang`` (details to follow). + +Target Specific Build Details +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The table below specifies which compilers are used for each Arch/OS combination +when qualifying the build of ``llvm``, ``clang``, and ``dragonegg``. + ++--------------+---------------+----------------------+ +| Architecture | OS | compiler | ++==============+===============+======================+ +| x86-32 | Mac OS 10.5 | gcc 4.0.1 | ++--------------+---------------+----------------------+ +| x86-32 | Linux | gcc 4.2.X, gcc 4.3.X | ++--------------+---------------+----------------------+ +| x86-32 | FreeBSD | gcc 4.2.X | ++--------------+---------------+----------------------+ +| x86-32 | mingw | gcc 3.4.5 | ++--------------+---------------+----------------------+ +| x86-64 | Mac OS 10.5 | gcc 4.0.1 | ++--------------+---------------+----------------------+ +| x86-64 | Linux | gcc 4.2.X, gcc 4.3.X | ++--------------+---------------+----------------------+ +| x86-64 | FreeBSD | gcc 4.2.X | ++--------------+---------------+----------------------+ + +Release Qualification Criteria +------------------------------ + +A release is qualified when it has no regressions from the previous release (or +baseline). Regressions are related to correctness first and performance second. +(We may tolerate some minor performance regressions if they are deemed +necessary for the general quality of the compiler.) + +**Regressions are new failures in the set of tests that are used to qualify +each product and only include things on the list. Every release will have +some bugs in it. It is the reality of developing a complex piece of +software. We need a very concrete and definitive release criteria that +ensures we have monotonically improving quality on some metric. The metric we +use is described below. This doesn't mean that we don't care about other +criteria, but these are the criteria which we found to be most important and +which must be satisfied before a release can go out.** + +Qualify LLVM +^^^^^^^^^^^^ + +LLVM is qualified when it has a clean test run without a front-end. And it has +no regressions when using either ``clang`` or ``dragonegg`` with the +``test-suite`` from the previous release. + +Qualify Clang +^^^^^^^^^^^^^ + +``Clang`` is qualified when front-end specific tests in the ``llvm`` dejagnu +test suite all pass, clang's own test suite passes cleanly, and there are no +regressions in the ``test-suite``. 
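+
+In practice, the qualification runs reduce to the regression suites plus a
+full ``test-suite`` run compared against the previous release. A rough sketch,
+assuming the ``test-suite`` is checked out under ``projects/test-suite`` and
+the commands are run from the appropriate object directories (exact targets
+may differ by setup):
+
+::
+
+  $ make check                 # LLVM and front-end specific regression tests
+  $ make TEST=nightly report   # full test-suite run against the baseline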
+ +Specific Target Qualification Details +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + ++--------------+-------------+----------------+-----------------------------+ +| Architecture | OS | clang baseline | tests | ++==============+=============+================+=============================+ +| x86-32 | Linux | last release | llvm dejagnu, | +| | | | clang tests, | +| | | | test-suite (including spec) | ++--------------+-------------+----------------+-----------------------------+ +| x86-32 | FreeBSD | last release | llvm dejagnu, | +| | | | clang tests, | +| | | | test-suite | ++--------------+-------------+----------------+-----------------------------+ +| x86-32 | mingw | none | QT | ++--------------+-------------+----------------+-----------------------------+ +| x86-64 | Mac OS 10.X | last release | llvm dejagnu, | +| | | | clang tests, | +| | | | test-suite (including spec) | ++--------------+-------------+----------------+-----------------------------+ +| x86-64 | Linux | last release | llvm dejagnu, | +| | | | clang tests, | +| | | | test-suite (including spec) | ++--------------+-------------+----------------+-----------------------------+ +| x86-64 | FreeBSD | last release | llvm dejagnu, | +| | | | clang tests, | +| | | | test-suite | ++--------------+-------------+----------------+-----------------------------+ + +Community Testing +----------------- + +Once all testing has been completed and appropriate bugs filed, the release +candidate tarballs are put on the website and the LLVM community is notified. +Ask that all LLVM developers test the release in 2 ways: + +#. Download ``llvm-X.Y``, ``llvm-test-X.Y``, and the appropriate ``clang`` + binary. Build LLVM. Run ``make check`` and the full LLVM test suite (``make + TEST=nightly report``). + +#. Download ``llvm-X.Y``, ``llvm-test-X.Y``, and the ``clang`` sources. Compile + everything. Run ``make check`` and the full LLVM test suite (``make + TEST=nightly report``). + +Ask LLVM developers to submit the test suite report and ``make check`` results +to the list. Verify that there are no regressions from the previous release. +The results are not used to qualify a release, but to spot other potential +problems. For unsupported targets, verify that ``make check`` is at least +clean. + +During the first round of testing, all regressions must be fixed before the +second release candidate is tagged. + +If this is the second round of testing, the testing is only to ensure that bug +fixes previously merged in have not created new major problems. *This is not +the time to solve additional and unrelated bugs!* If no patches are merged in, +the release is determined to be ready and the release manager may move onto the +next stage. + +Release Patch Rules +------------------- + +Below are the rules regarding patching the release branch: + +#. Patches applied to the release branch may only be applied by the release + manager. + +#. During the first round of testing, patches that fix regressions or that are + small and relatively risk free (verified by the appropriate code owner) are + applied to the branch. Code owners are asked to be very conservative in + approving patches for the branch. We reserve the right to reject any patch + that does not fix a regression as previously defined. + +#. During the remaining rounds of testing, only patches that fix critical + regressions may be applied. 
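+
+For example, once a fix has been approved under the rules above, the release
+manager typically merges the corresponding trunk revision into the release
+branch working copy with a cherry-pick style ``svn merge`` (the revision
+number and commit message here are placeholders):
+
+::
+
+  $ cd llvm-X.Y
+  $ svn merge -c <revision> https://llvm.org/svn/llvm-project/llvm/trunk
+  $ svn commit -m "Merge r<revision> into the X.Y release branch."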
+ +Release Final Tasks +------------------- + +The final stages of the release process involves tagging the "final" release +branch, updating documentation that refers to the release, and updating the +demo page. + +Update Documentation +^^^^^^^^^^^^^^^^^^^^ + +Review the documentation and ensure that it is up to date. The "Release Notes" +must be updated to reflect new features, bug fixes, new known issues, and +changes in the list of supported platforms. The "Getting Started Guide" should +be updated to reflect the new release version number tag available from +Subversion and changes in basic system requirements. Merge both changes from +mainline into the release branch. + +.. _tag: + +Tag the LLVM Final Release +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Tag the final release sources using the following procedure: + +:: + + $ svn copy https://llvm.org/svn/llvm-project/llvm/branches/release_XY \ + https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_XY/Final + + $ svn copy https://llvm.org/svn/llvm-project/cfe/branches/release_XY \ + https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_XY/Final + + $ svn copy https://llvm.org/svn/llvm-project/dragonegg/branches/release_XY \ + https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_XY/Final + + $ svn copy https://llvm.org/svn/llvm-project/test-suite/branches/release_XY \ + https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_XY/Final + +Update the LLVM Demo Page +------------------------- + +The LLVM demo page must be updated to use the new release. This consists of +using the new ``clang`` binary and building LLVM. + +Update the LLVM Website +^^^^^^^^^^^^^^^^^^^^^^^ + +The website must be updated before the release announcement is sent out. Here +is what to do: + +#. Check out the ``www`` module from Subversion. + +#. Create a new subdirectory ``X.Y`` in the releases directory. + +#. Commit the ``llvm``, ``test-suite``, ``clang`` source, ``clang binaries``, + ``dragonegg`` source, and ``dragonegg`` binaries in this new directory. + +#. Copy and commit the ``llvm/docs`` and ``LICENSE.txt`` files into this new + directory. The docs should be built with ``BUILD_FOR_WEBSITE=1``. + +#. Commit the ``index.html`` to the ``release/X.Y`` directory to redirect (use + from previous release). + +#. Update the ``releases/download.html`` file with the new release. + +#. Update the ``releases/index.html`` with the new release and link to release + documentation. + +#. Finally, update the main page (``index.html`` and sidebar) to point to the + new release and release announcement. Make sure this all gets committed back + into Subversion. + +Announce the Release +^^^^^^^^^^^^^^^^^^^^ + +Have Chris send out the release announcement when everything is finished. + diff --git a/docs/HowToUseInstrMappings.rst b/docs/HowToUseInstrMappings.rst index b51e74e23c..bf9278e770 100755..100644 --- a/docs/HowToUseInstrMappings.rst +++ b/docs/HowToUseInstrMappings.rst @@ -120,7 +120,7 @@ to include relevant information in its definition. For example, consider following to be the current definitions of ADD, ADD_pt (true) and ADD_pf (false) instructions: -.. code-block::llvm +.. code-block:: llvm def ADD : ALU32_rr<(outs IntRegs:$dst), (ins IntRegs:$a, IntRegs:$b), "$dst = add($a, $b)", @@ -141,7 +141,7 @@ In this step, we modify these instructions to include the information required by the relationship model, <tt>getPredOpcode</tt>, so that they can be related. -.. code-block::llvm +.. 
code-block:: llvm def ADD : PredRel, ALU32_rr<(outs IntRegs:$dst), (ins IntRegs:$a, IntRegs:$b), "$dst = add($a, $b)", diff --git a/docs/LLVMBuild.html b/docs/LLVMBuild.html deleted file mode 100644 index 9e7f8c7657..0000000000 --- a/docs/LLVMBuild.html +++ /dev/null @@ -1,368 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> -<html> -<head> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <title>LLVMBuild Documentation</title> - <link rel="stylesheet" href="_static/llvm.css" type="text/css"> -</head> -<body> - -<h1>LLVMBuild Guide</h1> - -<ol> - <li><a href="#introduction">Introduction</a></li> - <li><a href="#projectorg">Project Organization</a></li> - <li><a href="#buildintegration">Build Integration</a></li> - <li><a href="#componentoverview">Component Overview</a></li> - <li><a href="#formatreference">Format Reference</a></li> -</ol> - -<!-- *********************************************************************** --> -<h2><a name="introduction">Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - <p>This document describes the <tt>LLVMBuild</tt> organization and files which - we use to describe parts of the LLVM ecosystem. For description of specific - LLVMBuild related tools, please see the command guide.</p> - - <p>LLVM is designed to be a modular set of libraries which can be flexibly - mixed together in order to build a variety of tools, like compilers, JITs, - custom code generators, optimization passes, interpreters, and so on. Related - projects in the LLVM system like Clang and LLDB also tend to follow this - philosophy.</p> - - <p>In order to support this usage style, LLVM has a fairly strict structure as - to how the source code and various components are organized. The - <tt>LLVMBuild.txt</tt> files are the explicit specification of that structure, - and are used by the build systems and other tools in order to develop the LLVM - project.</p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="projectorg">Project Organization</a></h2> -<!-- *********************************************************************** --> - -<!-- FIXME: We should probably have an explicit top level project object. Good -place to hang project level data, name, etc. Also useful for serving as the -$ROOT of project trees for things which can be checked out separately. --> - -<div> - <p>The source code for LLVM projects using the LLVMBuild system (LLVM, Clang, - and LLDB) is organized into <em>components</em>, which define the separate - pieces of functionality that make up the project. These projects may consist - of many libraries, associated tools, build tools, or other utility tools (for - example, testing tools).</p> - - <p>For the most part, the project contents are organized around defining one - main component per each subdirectory. Each such directory contains - an <tt>LLVMBuild.txt</tt> which contains the component definitions.</p> - - <p>The component descriptions for the project as a whole are automatically - gathered by the LLVMBuild tools. The tools automatically traverse the source - directory structure to find all of the component description files. 
NOTE: For - performance/sanity reasons, we only traverse into subdirectories when the - parent itself contains an <tt>LLVMBuild.txt</tt> description file.</p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="buildintegration">Build Integration</a></h2> -<!-- *********************************************************************** --> - -<div> - <p>The LLVMBuild files themselves are just a declarative way to describe the - project structure. The actual building of the LLVM project is handled by - another build system (currently we support - both <a href="MakefileGuide.html">Makefiles</a> - and <a href="CMake.html">CMake</a>.</p> - - <p>The build system implementation will load the relevant contents of the - LLVMBuild files and use that to drive the actual project build. Typically, the - build system will only need to load this information at "configure" time, and - use it to generative native information. Build systems will also handle - automatically reconfiguring their information when the contents of - the <i>LLVMBuild.txt</i> files change.</p> - - <p>Developers generally are not expected to need to be aware of the details of - how the LLVMBuild system is integrated into their build. Ideally, LLVM - developers who are not working on the build system would only ever need to - modify the contents of the <i>LLVMBuild.txt</i> description files (although we - have not reached this goal yet).</p> - - <p>For more information on the utility tool we provide to help interfacing - with the build system, please see - the <a href="CommandGuide/html/llvm-build.html">llvm-build</a> - documentation.</p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="componentoverview">Component Overview</a></h2> -<!-- *********************************************************************** --> - -<div> - <p>As mentioned earlier, LLVM projects are organized into - logical <em>components</em>. Every component is typically grouped into its - own subdirectory. Generally, a component is organized around a coherent group - of sources which have some kind of clear API separation from other parts of - the code.</p> - - <p>LLVM primarily uses the following types of components:</p> - <ul> - <li><em>Libraries</em> - Library components define a distinct API which can - be independently linked into LLVM client applications. Libraries typically - have private and public header files, and may specify a link of required - libraries that they build on top of.</li> - - <li><em>Build Tools</em> - Build tools are applications which are designed - to be run as part of the build process (typically to generate other source - files). Currently, LLVM uses one main build tool - called <a href="TableGenFundamentals.html">TableGen</a> to generate a - variety of source files.</li> - - <li><em>Tools</em> - Command line applications which are built using the - LLVM component libraries. Most LLVM tools are small and are primarily - frontends to the library interfaces.</li> - -<!-- FIXME: We also need shared libraries as a first class component, but this - is not yet implemented. --> - </ul> - - <p>Components are described using <em>LLVMBuild.txt</em> files in the - directories that define the component. 
See - the <a href="#formatreference">Format Reference</a> section for information on - the exact format of these files.</p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="formatreference">LLVMBuild Format Reference</a></h2> -<!-- *********************************************************************** --> - -<div> - <p>LLVMBuild files are written in a simple variant of the INI or configuration - file format (<a href="http://en.wikipedia.org/wiki/INI_file">Wikipedia - entry</a>). The format defines a list of sections each of which may contain - some number of properties. A simple example of the file format is below:</p> - <div class="doc_code"> - <pre> -<i>; Comments start with a semi-colon.</i> - -<i>; Sections are declared using square brackets.</i> -[component_0] - -<i>; Properties are declared using '=' and are contained in the previous section. -; -; We support simple string and boolean scalar values and list values, where -; items are separated by spaces. There is no support for quoting, and so -; property values may not contain spaces.</i> -property_name = property_value -list_property_name = value_1 value_2 <em>...</em> value_n -boolean_property_name = 1 <em>(or 0)</em> -</pre> - </div> - - <p>LLVMBuild files are expected to define a strict set of sections and - properties. An typical component description file for a library - component would look typically look like the following example:</p> - <div class="doc_code"> - <pre> -[component_0] -type = Library -name = Linker -parent = Libraries -required_libraries = Archive BitReader Core Support TransformUtils -</pre> - </div> - - <p>A full description of the exact sections and properties which are allowed - follows.</p> - - <p>Each file may define exactly one common component, named "common". The - common component may define the following properties:</p> - <ul> - <li><i>subdirectories</i> <b>[optional]</b> - <p>If given, a list of the names of the subdirectories from the current - subpath to search for additional LLVMBuild files.</p></li> - </ul> - - <p>Each file may define multiple components. Each component is described by a - section who name starts with "component". The remainder of the section name is - ignored, but each section name must be unique. Typically components are just - number in order for files with multiple components ("component_0", - "component_1", and so on).<p> - - <p><b>Section names not matching this format (or the "common" section) are - currently unused and are disallowed.</b></p> - - <p>Every component is defined by the properties in the section. The exact list - of properties that are allowed depends on the component - type. Components <b>may not</b> define any properties other than those - expected by the component type.</p> - - <p>Every component must define the following properties:</p> - <ul> - <li><i>type</i> <b>[required]</b> - <p>The type of the component. Supported component types are - detailed below. Most components will define additional properties which - may be required or optional.</p></li> - - <li><i>name</i> <b>[required]</b> - <p>The name of the component. Names are required to be unique - across the entire project.</p></li> - - <li><i>parent</i> <b>[required]</b> - <p>The name of the logical parent of the component. Components are - organized into a logical tree to make it easier to navigate and organize - groups of components. The parents have no semantics as far as the project - build is concerned, however. 
Typically, the parent will be the main - component of the parent directory.</p> - - <!-- FIXME: Should we make the parent optional, and default to parent - directories component? --> - - <p>Components may reference the root pseudo component using '$ROOT' to - indicate they should logically be grouped at the top-level.</p> - </li> - </ul> - - <p>Components may define the following properties:</p> - <ul> - <li><i>dependencies</i> <b>[optional]</b> - <p>If specified, a list of names of components which <i>must</i> be built - prior to this one. This should only be exactly those components which - produce some tool or source code required for building the - component.</p> - - <p><em>NOTE:</em> Group and LibraryGroup components have no semantics for - the actual build, and are not allowed to specify dependencies.</p></li> - </ul> - - <p>The following section lists the available component types, as well as the - properties which are associated with that component.</p> - - <ul> - <li><i>type = Group</i> - <p>Group components exist purely to allow additional arbitrary structuring - of the logical components tree. For example, one might define a - "Libraries" group to hold all of the root library components.</p> - - <p>Group components have no additionally properties.</p> - </li> - - <li><i>type = Library</i> - <p>Library components define an individual library which should be built - from the source code in the component directory.</p> - - <p>Components with this type use the following properties:</p> - <ul> - <li><i>library_name</i> <b>[optional]</b> - <p>If given, the name to use for the actual library file on disk. If - not given, the name is derived from the component name - itself.</p></li> - - <li><i>required_libraries</i> <b>[optional]</b> - <p>If given, a list of the names of Library or LibraryGroup components - which must also be linked in whenever this library is used. That is, - the link time dependencies for this component. When tools are built, - the build system will include the transitive closure of - all <i>required_libraries</i> for the components the tool needs.</p></li> - - <li><i>add_to_library_groups</i> <b>[optional]</b> - <p>If given, a list of the names of LibraryGroup components which this - component is also part of. This allows nesting groups of - components. For example, the <i>X86</i> target might define a library - group for all of the <i>X86</i> components. That library group might - then be included in the <i>all-targets</i> library group.</p></li> - - <li><i>installed</i> <b>[optional]</b> <b>[boolean]</b> - <p>Whether this library is installed. Libraries that are not installed - are only reported by <tt>llvm-config</tt> when it is run as part of a - development directory.</p></li> - </ul> - </li> - - <li><i>type = LibraryGroup</i> - <p>LibraryGroup components are a mechanism to allow easy definition of - useful sets of related components. 
In particular, we use them to easily - specify things like "all targets", or "all assembly printers".</p> - - <p>Components with this type use the following properties:</p> - <ul> - <li><i>required_libraries</i> <b>[optional]</b> - <p>See the Library type for a description of this property.</p></li> - - <li><i>add_to_library_groups</i> <b>[optional]</b> - <p>See the Library type for a description of this property.</p></li> - </ul> - </li> - - <li><i>type = TargetGroup</i> - <p>TargetGroup components are an extension of LibraryGroups, specifically - for defining LLVM targets (which are handled specially in a few - places).</p> - - <p>The name of the component should always be the name of the target.</p> - - <p>Components with this type use the LibraryGroup properties in addition - to:</p> - <ul> - <li><i>has_asmparser</i> <b>[optional]</b> <b>[boolean]</b> - <p>Whether this target defines an assembly parser.</p></li> - <li><i>has_asmprinter</i> <b>[optional]</b> <b>[boolean]</b> - <p>Whether this target defines an assembly printer.</p></li> - <li><i>has_disassembler</i> <b>[optional]</b> <b>[boolean]</b> - <p>Whether this target defines a disassembler.</p></li> - <li><i>has_jit</i> <b>[optional]</b> <b>[boolean]</b> - <p>Whether this target supports JIT compilation.</p></li> - </ul> - </li> - - <li><i>type = Tool</i> - <p>Tool components define standalone command line tools which should be - built from the source code in the component directory and linked.</p> - - <p>Components with this type use the following properties:</p> - <ul> - <li><i>required_libraries</i> <b>[optional]</b> - - <p>If given, a list of the names of Library or LibraryGroup components - which this tool is required to be linked with. <b>NOTE:</b> The values - should be the component names, which may not always match up with the - actual library names on disk.</p> - - <p>Build systems are expected to properly include all of the libraries - required by the linked components (i.e., the transitive closer - of <em>required_libraries</em>).</p> - - <p>Build systems are also expected to understand that those library - components must be built prior to linking -- they do not also need to - be listed under <i>dependencies</i>.</p></li> - </ul> - </li> - - <li><i>type = BuildTool</i> - <p>BuildTool components are like Tool components, except that the tool is - supposed to be built for the platform where the build is running (instead - of that platform being targetted). 
Build systems are expected to handle
- the fact that required libraries may need to be built for multiple
- platforms in order to be able to link this tool.</p>
-
- <p>BuildTool components currently use the exact same properties as Tool
- components, the type distinction is only used to differentiate what the
- tool is built for.</p>
- </li>
- </ul>
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
- <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
- <a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
-
- <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br>
- Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/LLVMBuild.rst b/docs/LLVMBuild.rst
new file mode 100644
index 0000000000..d9215dd8eb
--- /dev/null
+++ b/docs/LLVMBuild.rst
@@ -0,0 +1,325 @@
+===============
+LLVMBuild Guide
+===============
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+This document describes the ``LLVMBuild`` organization and files which
+we use to describe parts of the LLVM ecosystem. For a description of
+specific LLVMBuild-related tools, please see the command guide.
+
+LLVM is designed to be a modular set of libraries which can be flexibly
+mixed together in order to build a variety of tools, like compilers,
+JITs, custom code generators, optimization passes, interpreters, and so
+on. Related projects in the LLVM system like Clang and LLDB also tend to
+follow this philosophy.
+
+In order to support this usage style, LLVM has a fairly strict structure
+as to how the source code and various components are organized. The
+``LLVMBuild.txt`` files are the explicit specification of that
+structure, and are used by the build systems and other tools in order to
+develop the LLVM project.
+
+Project Organization
+====================
+
+The source code for LLVM projects using the LLVMBuild system (LLVM,
+Clang, and LLDB) is organized into *components*, which define the
+separate pieces of functionality that make up the project. These
+projects may consist of many libraries, associated tools, build tools,
+or other utility tools (for example, testing tools).
+
+For the most part, the project contents are organized around defining
+one main component per subdirectory. Each such directory contains an
+``LLVMBuild.txt`` file with the component definitions.
+
+The component descriptions for the project as a whole are automatically
+gathered by the LLVMBuild tools. The tools automatically traverse the
+source directory structure to find all of the component description
+files. NOTE: For performance/sanity reasons, we only traverse into
+subdirectories when the parent itself contains an ``LLVMBuild.txt``
+description file.
+
+Build Integration
+=================
+
+The LLVMBuild files themselves are just a declarative way to describe
+the project structure. The actual building of the LLVM project is
+handled by another build system (currently we support both
+:doc:`Makefiles <MakefileGuide>` and :doc:`CMake <CMake>`).
+
+The build system implementation will load the relevant contents of the
+LLVMBuild files and use that to drive the actual project build.
+Typically, the build system will only need to load this information at
+"configure" time, and use it to generate native information.
Build
+systems will also handle automatically reconfiguring their information
+when the contents of the ``LLVMBuild.txt`` files change.
+
+Developers generally are not expected to need to be aware of the details
+of how the LLVMBuild system is integrated into their build. Ideally,
+LLVM developers who are not working on the build system would only ever
+need to modify the contents of the ``LLVMBuild.txt`` description files
+(although we have not reached this goal yet).
+
+For more information on the utility tool we provide to help interface
+with the build system, please see the :doc:`llvm-build
+<CommandGuide/llvm-build>` documentation.
+
+Component Overview
+==================
+
+As mentioned earlier, LLVM projects are organized into logical
+*components*. Every component is typically grouped into its own
+subdirectory. Generally, a component is organized around a coherent
+group of sources which have some kind of clear API separation from other
+parts of the code.
+
+LLVM primarily uses the following types of components:
+
+- *Libraries* - Library components define a distinct API which can be
+  independently linked into LLVM client applications. Libraries typically
+  have private and public header files, and may specify a list of required
+  libraries that they build on top of.
+- *Build Tools* - Build tools are applications which are designed to be run
+  as part of the build process (typically to generate other source files).
+  Currently, LLVM uses one main build tool called :doc:`TableGen
+  <TableGenFundamentals>` to generate a variety of source files.
+- *Tools* - Command line applications which are built using the LLVM
+  component libraries. Most LLVM tools are small and are primarily
+  frontends to the library interfaces.
+
+Components are described using ``LLVMBuild.txt`` files in the directories
+that define the component. See the `LLVMBuild Format Reference`_ section
+for information on the exact format of these files.
+
+LLVMBuild Format Reference
+==========================
+
+LLVMBuild files are written in a simple variant of the INI or configuration
+file format (`Wikipedia entry`_). The format defines a list of sections
+each of which may contain some number of properties. A simple example of
+the file format is below:
+
+.. _Wikipedia entry: http://en.wikipedia.org/wiki/INI_file
+
+.. code-block:: ini
+
+  ; Comments start with a semi-colon.
+
+  ; Sections are declared using square brackets.
+  [component_0]
+
+  ; Properties are declared using '=' and are contained in the previous section.
+  ;
+  ; We support simple string and boolean scalar values and list values, where
+  ; items are separated by spaces. There is no support for quoting, and so
+  ; property values may not contain spaces.
+  property_name = property_value
+  list_property_name = value_1 value_2 ... value_n
+  boolean_property_name = 1 (or 0)
+
+LLVMBuild files are expected to define a strict set of sections and
+properties. A typical component description file for a library
+component would typically look like the following example:
+
+.. code-block:: ini
+
+  [component_0]
+  type = Library
+  name = Linker
+  parent = Libraries
+  required_libraries = Archive BitReader Core Support TransformUtils
+
+A full description of the exact sections and properties which are
+allowed follows.
+
+Each file may define exactly one common component, named ``common``.
The
+common component may define the following properties:
+
+- ``subdirectories`` **[optional]**
+
+  If given, a list of the names of the subdirectories from the current
+  subpath to search for additional LLVMBuild files.
+
+Each file may define multiple components. Each component is described by a
+section whose name starts with ``component``. The remainder of the section
+name is ignored, but each section name must be unique. Typically, components
+are just numbered in order for files with multiple components
+(``component_0``, ``component_1``, and so on).
+
+.. warning::
+
+   Section names not matching this format (or the ``common`` section) are
+   currently unused and are disallowed.
+
+Every component is defined by the properties in the section. The exact
+list of properties that are allowed depends on the component type.
+Components **may not** define any properties other than those expected
+by the component type.
+
+Every component must define the following properties:
+
+- ``type`` **[required]**
+
+  The type of the component. Supported component types are detailed
+  below. Most components will define additional properties which may be
+  required or optional.
+
+- ``name`` **[required]**
+
+  The name of the component. Names are required to be unique across the
+  entire project.
+
+- ``parent`` **[required]**
+
+  The name of the logical parent of the component. Components are
+  organized into a logical tree to make it easier to navigate and
+  organize groups of components. The parents have no semantics as far
+  as the project build is concerned, however. Typically, the parent
+  will be the main component of the parent directory.
+
+  Components may reference the root pseudo component using ``$ROOT`` to
+  indicate they should logically be grouped at the top-level.
+
+Components may define the following properties:
+
+- ``dependencies`` **[optional]**
+
+  If specified, a list of names of components which *must* be built
+  prior to this one. This should be exactly those components which
+  produce some tool or source code required for building the component.
+
+  .. note::
+
+     ``Group`` and ``LibraryGroup`` components have no semantics for the
+     actual build, and are not allowed to specify dependencies.
+
+The following section lists the available component types, as well as
+the properties which are associated with each component type.
+
+- ``type = Group``
+
+  Group components exist purely to allow additional arbitrary structuring
+  of the logical components tree. For example, one might define a
+  ``Libraries`` group to hold all of the root library components.
+
+  ``Group`` components have no additional properties.
+
+- ``type = Library``
+
+  Library components define an individual library which should be built
+  from the source code in the component directory.
+
+  Components with this type use the following properties:
+
+  - ``library_name`` **[optional]**
+
+    If given, the name to use for the actual library file on disk. If
+    not given, the name is derived from the component name itself.
+
+  - ``required_libraries`` **[optional]**
+
+    If given, a list of the names of ``Library`` or ``LibraryGroup``
+    components which must also be linked in whenever this library is
+    used. That is, the link time dependencies for this component. When
+    tools are built, the build system will include the transitive closure
+    of all ``required_libraries`` for the components the tool needs.
+
+  - ``add_to_library_groups`` **[optional]**
+
+    If given, a list of the names of ``LibraryGroup`` components which
+    this component is also part of. This allows nesting groups of
+    components. For example, the ``X86`` target might define a library
+    group for all of the ``X86`` components. That library group might
+    then be included in the ``all-targets`` library group.
+
+  - ``installed`` **[optional]** **[boolean]**
+
+    Whether this library is installed. Libraries that are not installed
+    are only reported by ``llvm-config`` when it is run as part of a
+    development directory.
+
+- ``type = LibraryGroup``
+
+  ``LibraryGroup`` components are a mechanism to allow easy definition of
+  useful sets of related components. In particular, we use them to easily
+  specify things like "all targets", or "all assembly printers".
+
+  Components with this type use the following properties:
+
+  - ``required_libraries`` **[optional]**
+
+    See the ``Library`` type for a description of this property.
+
+  - ``add_to_library_groups`` **[optional]**
+
+    See the ``Library`` type for a description of this property.
+
+- ``type = TargetGroup``
+
+  ``TargetGroup`` components are an extension of ``LibraryGroup``\s,
+  specifically for defining LLVM targets (which are handled specially in a
+  few places).
+
+  The name of the component should always be the name of the target.
+
+  Components with this type use the ``LibraryGroup`` properties in
+  addition to:
+
+  - ``has_asmparser`` **[optional]** **[boolean]**
+
+    Whether this target defines an assembly parser.
+
+  - ``has_asmprinter`` **[optional]** **[boolean]**
+
+    Whether this target defines an assembly printer.
+
+  - ``has_disassembler`` **[optional]** **[boolean]**
+
+    Whether this target defines a disassembler.
+
+  - ``has_jit`` **[optional]** **[boolean]**
+
+    Whether this target supports JIT compilation.
+
+- ``type = Tool``
+
+  ``Tool`` components define standalone command line tools which should be
+  built from the source code in the component directory and linked.
+
+  Components with this type use the following properties:
+
+  - ``required_libraries`` **[optional]**
+
+    If given, a list of the names of ``Library`` or ``LibraryGroup``
+    components which this tool is required to be linked with.
+
+    .. note::
+
+       The values should be the component names, which may not always
+       match up with the actual library names on disk.
+
+    Build systems are expected to properly include all of the libraries
+    required by the linked components (i.e., the transitive closure of
+    ``required_libraries``).
+
+    Build systems are also expected to understand that those library
+    components must be built prior to linking -- they do not also need
+    to be listed under ``dependencies``.
+
+- ``type = BuildTool``
+
+  ``BuildTool`` components are like ``Tool`` components, except that the
+  tool is supposed to be built for the platform where the build is running
+  (instead of that platform being targeted). Build systems are expected
+  to handle the fact that required libraries may need to be built for
+  multiple platforms in order to be able to link this tool.
+
+  ``BuildTool`` components currently use the exact same properties as
+  ``Tool`` components; the type distinction is only used to differentiate
+  what the tool is built for.
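For illustration only, here is a minimal sketch of a single ``LLVMBuild.txt``
file that combines several of the properties described above. The component
names ``MyLib`` and ``my-tool`` are invented for this example; ``Support``,
``Libraries``, and ``Tools`` follow names used elsewhere in the LLVM tree, but
the exact values here are illustrative rather than a real directory's
description.

.. code-block:: ini

  ; Hypothetical LLVMBuild.txt defining two components in one directory.
  ; component_0: a library that links against the Support library.
  [component_0]
  type = Library
  name = MyLib
  parent = Libraries
  required_libraries = Support

  ; component_1: a command line tool linked against the library above.
  ; MyLib does not need to be repeated under 'dependencies'; the build
  ; system derives the build ordering from required_libraries.
  [component_1]
  type = Tool
  name = my-tool
  parent = Tools
  required_libraries = MyLib

As in the examples above, the section names (``component_0``, ``component_1``)
only need to be unique within the file, while the ``name`` properties must be
unique across the entire project.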
+ diff --git a/docs/LangRef.html b/docs/LangRef.html deleted file mode 100644 index ed47f1f00e..0000000000 --- a/docs/LangRef.html +++ /dev/null @@ -1,8776 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> -<html> -<head> - <title>LLVM Assembly Language Reference Manual</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <meta name="description" - content="LLVM Assembly Language Reference Manual."> - <link rel="stylesheet" href="_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>LLVM Language Reference Manual</h1> -<ol> - <li><a href="#abstract">Abstract</a></li> - <li><a href="#introduction">Introduction</a></li> - <li><a href="#identifiers">Identifiers</a></li> - <li><a href="#highlevel">High Level Structure</a> - <ol> - <li><a href="#modulestructure">Module Structure</a></li> - <li><a href="#linkage">Linkage Types</a> - <ol> - <li><a href="#linkage_private">'<tt>private</tt>' Linkage</a></li> - <li><a href="#linkage_linker_private">'<tt>linker_private</tt>' Linkage</a></li> - <li><a href="#linkage_linker_private_weak">'<tt>linker_private_weak</tt>' Linkage</a></li> - <li><a href="#linkage_internal">'<tt>internal</tt>' Linkage</a></li> - <li><a href="#linkage_available_externally">'<tt>available_externally</tt>' Linkage</a></li> - <li><a href="#linkage_linkonce">'<tt>linkonce</tt>' Linkage</a></li> - <li><a href="#linkage_common">'<tt>common</tt>' Linkage</a></li> - <li><a href="#linkage_weak">'<tt>weak</tt>' Linkage</a></li> - <li><a href="#linkage_appending">'<tt>appending</tt>' Linkage</a></li> - <li><a href="#linkage_externweak">'<tt>extern_weak</tt>' Linkage</a></li> - <li><a href="#linkage_linkonce_odr">'<tt>linkonce_odr</tt>' Linkage</a></li> - <li><a href="#linkage_linkonce_odr_auto_hide">'<tt>linkonce_odr_auto_hide</tt>' Linkage</a></li> - <li><a href="#linkage_weak">'<tt>weak_odr</tt>' Linkage</a></li> - <li><a href="#linkage_external">'<tt>external</tt>' Linkage</a></li> - <li><a href="#linkage_dllimport">'<tt>dllimport</tt>' Linkage</a></li> - <li><a href="#linkage_dllexport">'<tt>dllexport</tt>' Linkage</a></li> - </ol> - </li> - <li><a href="#callingconv">Calling Conventions</a></li> - <li><a href="#namedtypes">Named Types</a></li> - <li><a href="#globalvars">Global Variables</a></li> - <li><a href="#functionstructure">Functions</a></li> - <li><a href="#aliasstructure">Aliases</a></li> - <li><a href="#namedmetadatastructure">Named Metadata</a></li> - <li><a href="#paramattrs">Parameter Attributes</a></li> - <li><a href="#fnattrs">Function Attributes</a></li> - <li><a href="#gc">Garbage Collector Names</a></li> - <li><a href="#moduleasm">Module-Level Inline Assembly</a></li> - <li><a href="#datalayout">Data Layout</a></li> - <li><a href="#pointeraliasing">Pointer Aliasing Rules</a></li> - <li><a href="#volatile">Volatile Memory Accesses</a></li> - <li><a href="#memmodel">Memory Model for Concurrent Operations</a></li> - <li><a href="#ordering">Atomic Memory Ordering Constraints</a></li> - </ol> - </li> - <li><a href="#typesystem">Type System</a> - <ol> - <li><a href="#t_classifications">Type Classifications</a></li> - <li><a href="#t_primitive">Primitive Types</a> - <ol> - <li><a href="#t_integer">Integer Type</a></li> - <li><a href="#t_floating">Floating Point Types</a></li> - <li><a href="#t_x86mmx">X86mmx Type</a></li> - <li><a href="#t_void">Void Type</a></li> - <li><a href="#t_label">Label Type</a></li> - <li><a 
href="#t_metadata">Metadata Type</a></li> - </ol> - </li> - <li><a href="#t_derived">Derived Types</a> - <ol> - <li><a href="#t_aggregate">Aggregate Types</a> - <ol> - <li><a href="#t_array">Array Type</a></li> - <li><a href="#t_struct">Structure Type</a></li> - <li><a href="#t_opaque">Opaque Structure Types</a></li> - <li><a href="#t_vector">Vector Type</a></li> - </ol> - </li> - <li><a href="#t_function">Function Type</a></li> - <li><a href="#t_pointer">Pointer Type</a></li> - </ol> - </li> - </ol> - </li> - <li><a href="#constants">Constants</a> - <ol> - <li><a href="#simpleconstants">Simple Constants</a></li> - <li><a href="#complexconstants">Complex Constants</a></li> - <li><a href="#globalconstants">Global Variable and Function Addresses</a></li> - <li><a href="#undefvalues">Undefined Values</a></li> - <li><a href="#poisonvalues">Poison Values</a></li> - <li><a href="#blockaddress">Addresses of Basic Blocks</a></li> - <li><a href="#constantexprs">Constant Expressions</a></li> - </ol> - </li> - <li><a href="#othervalues">Other Values</a> - <ol> - <li><a href="#inlineasm">Inline Assembler Expressions</a></li> - <li><a href="#metadata">Metadata Nodes and Metadata Strings</a> - <ol> - <li><a href="#tbaa">'<tt>tbaa</tt>' Metadata</a></li> - <li><a href="#tbaa.struct">'<tt>tbaa.struct</tt>' Metadata</a></li> - <li><a href="#fpmath">'<tt>fpmath</tt>' Metadata</a></li> - <li><a href="#range">'<tt>range</tt>' Metadata</a></li> - </ol> - </li> - </ol> - </li> - <li><a href="#module_flags">Module Flags Metadata</a> - <ol> - <li><a href="#objc_gc_flags">Objective-C Garbage Collection Module Flags Metadata</a></li> - </ol> - </li> - <li><a href="#intrinsic_globals">Intrinsic Global Variables</a> - <ol> - <li><a href="#intg_used">The '<tt>llvm.used</tt>' Global Variable</a></li> - <li><a href="#intg_compiler_used">The '<tt>llvm.compiler.used</tt>' - Global Variable</a></li> - <li><a href="#intg_global_ctors">The '<tt>llvm.global_ctors</tt>' - Global Variable</a></li> - <li><a href="#intg_global_dtors">The '<tt>llvm.global_dtors</tt>' - Global Variable</a></li> - </ol> - </li> - <li><a href="#instref">Instruction Reference</a> - <ol> - <li><a href="#terminators">Terminator Instructions</a> - <ol> - <li><a href="#i_ret">'<tt>ret</tt>' Instruction</a></li> - <li><a href="#i_br">'<tt>br</tt>' Instruction</a></li> - <li><a href="#i_switch">'<tt>switch</tt>' Instruction</a></li> - <li><a href="#i_indirectbr">'<tt>indirectbr</tt>' Instruction</a></li> - <li><a href="#i_invoke">'<tt>invoke</tt>' Instruction</a></li> - <li><a href="#i_resume">'<tt>resume</tt>' Instruction</a></li> - <li><a href="#i_unreachable">'<tt>unreachable</tt>' Instruction</a></li> - </ol> - </li> - <li><a href="#binaryops">Binary Operations</a> - <ol> - <li><a href="#i_add">'<tt>add</tt>' Instruction</a></li> - <li><a href="#i_fadd">'<tt>fadd</tt>' Instruction</a></li> - <li><a href="#i_sub">'<tt>sub</tt>' Instruction</a></li> - <li><a href="#i_fsub">'<tt>fsub</tt>' Instruction</a></li> - <li><a href="#i_mul">'<tt>mul</tt>' Instruction</a></li> - <li><a href="#i_fmul">'<tt>fmul</tt>' Instruction</a></li> - <li><a href="#i_udiv">'<tt>udiv</tt>' Instruction</a></li> - <li><a href="#i_sdiv">'<tt>sdiv</tt>' Instruction</a></li> - <li><a href="#i_fdiv">'<tt>fdiv</tt>' Instruction</a></li> - <li><a href="#i_urem">'<tt>urem</tt>' Instruction</a></li> - <li><a href="#i_srem">'<tt>srem</tt>' Instruction</a></li> - <li><a href="#i_frem">'<tt>frem</tt>' Instruction</a></li> - </ol> - </li> - <li><a href="#bitwiseops">Bitwise Binary 
Operations</a> - <ol> - <li><a href="#i_shl">'<tt>shl</tt>' Instruction</a></li> - <li><a href="#i_lshr">'<tt>lshr</tt>' Instruction</a></li> - <li><a href="#i_ashr">'<tt>ashr</tt>' Instruction</a></li> - <li><a href="#i_and">'<tt>and</tt>' Instruction</a></li> - <li><a href="#i_or">'<tt>or</tt>' Instruction</a></li> - <li><a href="#i_xor">'<tt>xor</tt>' Instruction</a></li> - </ol> - </li> - <li><a href="#vectorops">Vector Operations</a> - <ol> - <li><a href="#i_extractelement">'<tt>extractelement</tt>' Instruction</a></li> - <li><a href="#i_insertelement">'<tt>insertelement</tt>' Instruction</a></li> - <li><a href="#i_shufflevector">'<tt>shufflevector</tt>' Instruction</a></li> - </ol> - </li> - <li><a href="#aggregateops">Aggregate Operations</a> - <ol> - <li><a href="#i_extractvalue">'<tt>extractvalue</tt>' Instruction</a></li> - <li><a href="#i_insertvalue">'<tt>insertvalue</tt>' Instruction</a></li> - </ol> - </li> - <li><a href="#memoryops">Memory Access and Addressing Operations</a> - <ol> - <li><a href="#i_alloca">'<tt>alloca</tt>' Instruction</a></li> - <li><a href="#i_load">'<tt>load</tt>' Instruction</a></li> - <li><a href="#i_store">'<tt>store</tt>' Instruction</a></li> - <li><a href="#i_fence">'<tt>fence</tt>' Instruction</a></li> - <li><a href="#i_cmpxchg">'<tt>cmpxchg</tt>' Instruction</a></li> - <li><a href="#i_atomicrmw">'<tt>atomicrmw</tt>' Instruction</a></li> - <li><a href="#i_getelementptr">'<tt>getelementptr</tt>' Instruction</a></li> - </ol> - </li> - <li><a href="#convertops">Conversion Operations</a> - <ol> - <li><a href="#i_trunc">'<tt>trunc .. to</tt>' Instruction</a></li> - <li><a href="#i_zext">'<tt>zext .. to</tt>' Instruction</a></li> - <li><a href="#i_sext">'<tt>sext .. to</tt>' Instruction</a></li> - <li><a href="#i_fptrunc">'<tt>fptrunc .. to</tt>' Instruction</a></li> - <li><a href="#i_fpext">'<tt>fpext .. to</tt>' Instruction</a></li> - <li><a href="#i_fptoui">'<tt>fptoui .. to</tt>' Instruction</a></li> - <li><a href="#i_fptosi">'<tt>fptosi .. to</tt>' Instruction</a></li> - <li><a href="#i_uitofp">'<tt>uitofp .. to</tt>' Instruction</a></li> - <li><a href="#i_sitofp">'<tt>sitofp .. to</tt>' Instruction</a></li> - <li><a href="#i_ptrtoint">'<tt>ptrtoint .. to</tt>' Instruction</a></li> - <li><a href="#i_inttoptr">'<tt>inttoptr .. to</tt>' Instruction</a></li> - <li><a href="#i_bitcast">'<tt>bitcast .. 
to</tt>' Instruction</a></li> - </ol> - </li> - <li><a href="#otherops">Other Operations</a> - <ol> - <li><a href="#i_icmp">'<tt>icmp</tt>' Instruction</a></li> - <li><a href="#i_fcmp">'<tt>fcmp</tt>' Instruction</a></li> - <li><a href="#i_phi">'<tt>phi</tt>' Instruction</a></li> - <li><a href="#i_select">'<tt>select</tt>' Instruction</a></li> - <li><a href="#i_call">'<tt>call</tt>' Instruction</a></li> - <li><a href="#i_va_arg">'<tt>va_arg</tt>' Instruction</a></li> - <li><a href="#i_landingpad">'<tt>landingpad</tt>' Instruction</a></li> - </ol> - </li> - </ol> - </li> - <li><a href="#intrinsics">Intrinsic Functions</a> - <ol> - <li><a href="#int_varargs">Variable Argument Handling Intrinsics</a> - <ol> - <li><a href="#int_va_start">'<tt>llvm.va_start</tt>' Intrinsic</a></li> - <li><a href="#int_va_end">'<tt>llvm.va_end</tt>' Intrinsic</a></li> - <li><a href="#int_va_copy">'<tt>llvm.va_copy</tt>' Intrinsic</a></li> - </ol> - </li> - <li><a href="#int_gc">Accurate Garbage Collection Intrinsics</a> - <ol> - <li><a href="#int_gcroot">'<tt>llvm.gcroot</tt>' Intrinsic</a></li> - <li><a href="#int_gcread">'<tt>llvm.gcread</tt>' Intrinsic</a></li> - <li><a href="#int_gcwrite">'<tt>llvm.gcwrite</tt>' Intrinsic</a></li> - </ol> - </li> - <li><a href="#int_codegen">Code Generator Intrinsics</a> - <ol> - <li><a href="#int_returnaddress">'<tt>llvm.returnaddress</tt>' Intrinsic</a></li> - <li><a href="#int_frameaddress">'<tt>llvm.frameaddress</tt>' Intrinsic</a></li> - <li><a href="#int_stacksave">'<tt>llvm.stacksave</tt>' Intrinsic</a></li> - <li><a href="#int_stackrestore">'<tt>llvm.stackrestore</tt>' Intrinsic</a></li> - <li><a href="#int_prefetch">'<tt>llvm.prefetch</tt>' Intrinsic</a></li> - <li><a href="#int_pcmarker">'<tt>llvm.pcmarker</tt>' Intrinsic</a></li> - <li><a href="#int_readcyclecounter">'<tt>llvm.readcyclecounter</tt>' Intrinsic</a></li> - </ol> - </li> - <li><a href="#int_libc">Standard C Library Intrinsics</a> - <ol> - <li><a href="#int_memcpy">'<tt>llvm.memcpy.*</tt>' Intrinsic</a></li> - <li><a href="#int_memmove">'<tt>llvm.memmove.*</tt>' Intrinsic</a></li> - <li><a href="#int_memset">'<tt>llvm.memset.*</tt>' Intrinsic</a></li> - <li><a href="#int_sqrt">'<tt>llvm.sqrt.*</tt>' Intrinsic</a></li> - <li><a href="#int_powi">'<tt>llvm.powi.*</tt>' Intrinsic</a></li> - <li><a href="#int_sin">'<tt>llvm.sin.*</tt>' Intrinsic</a></li> - <li><a href="#int_cos">'<tt>llvm.cos.*</tt>' Intrinsic</a></li> - <li><a href="#int_pow">'<tt>llvm.pow.*</tt>' Intrinsic</a></li> - <li><a href="#int_exp">'<tt>llvm.exp.*</tt>' Intrinsic</a></li> - <li><a href="#int_log">'<tt>llvm.log.*</tt>' Intrinsic</a></li> - <li><a href="#int_fma">'<tt>llvm.fma.*</tt>' Intrinsic</a></li> - <li><a href="#int_fabs">'<tt>llvm.fabs.*</tt>' Intrinsic</a></li> - <li><a href="#int_floor">'<tt>llvm.floor.*</tt>' Intrinsic</a></li> - </ol> - </li> - <li><a href="#int_manip">Bit Manipulation Intrinsics</a> - <ol> - <li><a href="#int_bswap">'<tt>llvm.bswap.*</tt>' Intrinsics</a></li> - <li><a href="#int_ctpop">'<tt>llvm.ctpop.*</tt>' Intrinsic </a></li> - <li><a href="#int_ctlz">'<tt>llvm.ctlz.*</tt>' Intrinsic </a></li> - <li><a href="#int_cttz">'<tt>llvm.cttz.*</tt>' Intrinsic </a></li> - </ol> - </li> - <li><a href="#int_overflow">Arithmetic with Overflow Intrinsics</a> - <ol> - <li><a href="#int_sadd_overflow">'<tt>llvm.sadd.with.overflow.*</tt> Intrinsics</a></li> - <li><a href="#int_uadd_overflow">'<tt>llvm.uadd.with.overflow.*</tt> Intrinsics</a></li> - <li><a href="#int_ssub_overflow">'<tt>llvm.ssub.with.overflow.*</tt> 
Intrinsics</a></li> - <li><a href="#int_usub_overflow">'<tt>llvm.usub.with.overflow.*</tt> Intrinsics</a></li> - <li><a href="#int_smul_overflow">'<tt>llvm.smul.with.overflow.*</tt> Intrinsics</a></li> - <li><a href="#int_umul_overflow">'<tt>llvm.umul.with.overflow.*</tt> Intrinsics</a></li> - </ol> - </li> - <li><a href="#spec_arithmetic">Specialised Arithmetic Intrinsics</a> - <ol> - <li><a href="#fmuladd">'<tt>llvm.fmuladd</tt> Intrinsic</a></li> - </ol> - </li> - <li><a href="#int_fp16">Half Precision Floating Point Intrinsics</a> - <ol> - <li><a href="#int_convert_to_fp16">'<tt>llvm.convert.to.fp16</tt>' Intrinsic</a></li> - <li><a href="#int_convert_from_fp16">'<tt>llvm.convert.from.fp16</tt>' Intrinsic</a></li> - </ol> - </li> - <li><a href="#int_debugger">Debugger intrinsics</a></li> - <li><a href="#int_eh">Exception Handling intrinsics</a></li> - <li><a href="#int_trampoline">Trampoline Intrinsics</a> - <ol> - <li><a href="#int_it">'<tt>llvm.init.trampoline</tt>' Intrinsic</a></li> - <li><a href="#int_at">'<tt>llvm.adjust.trampoline</tt>' Intrinsic</a></li> - </ol> - </li> - <li><a href="#int_memorymarkers">Memory Use Markers</a> - <ol> - <li><a href="#int_lifetime_start">'<tt>llvm.lifetime.start</tt>' Intrinsic</a></li> - <li><a href="#int_lifetime_end">'<tt>llvm.lifetime.end</tt>' Intrinsic</a></li> - <li><a href="#int_invariant_start">'<tt>llvm.invariant.start</tt>' Intrinsic</a></li> - <li><a href="#int_invariant_end">'<tt>llvm.invariant.end</tt>' Intrinsic</a></li> - </ol> - </li> - <li><a href="#int_general">General intrinsics</a> - <ol> - <li><a href="#int_var_annotation"> - '<tt>llvm.var.annotation</tt>' Intrinsic</a></li> - <li><a href="#int_annotation"> - '<tt>llvm.annotation.*</tt>' Intrinsic</a></li> - <li><a href="#int_trap"> - '<tt>llvm.trap</tt>' Intrinsic</a></li> - <li><a href="#int_debugtrap"> - '<tt>llvm.debugtrap</tt>' Intrinsic</a></li> - <li><a href="#int_stackprotector"> - '<tt>llvm.stackprotector</tt>' Intrinsic</a></li> - <li><a href="#int_objectsize"> - '<tt>llvm.objectsize</tt>' Intrinsic</a></li> - <li><a href="#int_expect"> - '<tt>llvm.expect</tt>' Intrinsic</a></li> - <li><a href="#int_donothing"> - '<tt>llvm.donothing</tt>' Intrinsic</a></li> - </ol> - </li> - </ol> - </li> -</ol> - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a> - and <a href="mailto:vadve@cs.uiuc.edu">Vikram Adve</a></p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="abstract">Abstract</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>This document is a reference manual for the LLVM assembly language. LLVM is - a Static Single Assignment (SSA) based representation that provides type - safety, low-level operations, flexibility, and the capability of representing - 'all' high-level languages cleanly. It is the common code representation - used throughout all phases of the LLVM compilation strategy.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="introduction">Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>The LLVM code representation is designed to be used in three different forms: - as an in-memory compiler IR, as an on-disk bitcode representation (suitable - for fast loading by a Just-In-Time compiler), and as a human readable - assembly language representation. 
This allows LLVM to provide a powerful - intermediate representation for efficient compiler transformations and - analysis, while providing a natural means to debug and visualize the - transformations. The three different forms of LLVM are all equivalent. This - document describes the human readable representation and notation.</p> - -<p>The LLVM representation aims to be light-weight and low-level while being - expressive, typed, and extensible at the same time. It aims to be a - "universal IR" of sorts, by being at a low enough level that high-level ideas - may be cleanly mapped to it (similar to how microprocessors are "universal - IR's", allowing many source languages to be mapped to them). By providing - type information, LLVM can be used as the target of optimizations: for - example, through pointer analysis, it can be proven that a C automatic - variable is never accessed outside of the current function, allowing it to - be promoted to a simple SSA value instead of a memory location.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="wellformed">Well-Formedness</a> -</h4> - -<div> - -<p>It is important to note that this document describes 'well formed' LLVM - assembly language. There is a difference between what the parser accepts and - what is considered 'well formed'. For example, the following instruction is - syntactically okay, but not well formed:</p> - -<pre class="doc_code"> -%x = <a href="#i_add">add</a> i32 1, %x -</pre> - -<p>because the definition of <tt>%x</tt> does not dominate all of its uses. The - LLVM infrastructure provides a verification pass that may be used to verify - that an LLVM module is well formed. This pass is automatically run by the - parser after parsing input assembly and by the optimizer before it outputs - bitcode. The violations pointed out by the verifier pass indicate bugs in - transformation passes or input to the parser.</p> - -</div> - -</div> - -<!-- Describe the typesetting conventions here. --> - -<!-- *********************************************************************** --> -<h2><a name="identifiers">Identifiers</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>LLVM identifiers come in two basic types: global and local. Global - identifiers (functions, global variables) begin with the <tt>'@'</tt> - character. Local identifiers (register names, types) begin with - the <tt>'%'</tt> character. Additionally, there are three different formats - for identifiers, for different purposes:</p> - -<ol> - <li>Named values are represented as a string of characters with their prefix. - For example, <tt>%foo</tt>, <tt>@DivisionByZero</tt>, - <tt>%a.really.long.identifier</tt>. The actual regular expression used is - '<tt>[%@][a-zA-Z$._][a-zA-Z$._0-9]*</tt>'. Identifiers which require - other characters in their names can be surrounded with quotes. Special - characters may be escaped using <tt>"\xx"</tt> where <tt>xx</tt> is the - ASCII code for the character in hexadecimal. In this way, any character - can be used in a name value, even quotes themselves.</li> - - <li>Unnamed values are represented as an unsigned numeric value with their - prefix. 
For example, <tt>%12</tt>, <tt>@2</tt>, <tt>%44</tt>.</li> - - <li>Constants, which are described in a <a href="#constants">section about - constants</a>, below.</li> -</ol> - -<p>LLVM requires that values start with a prefix for two reasons: Compilers - don't need to worry about name clashes with reserved words, and the set of - reserved words may be expanded in the future without penalty. Additionally, - unnamed identifiers allow a compiler to quickly come up with a temporary - variable without having to avoid symbol table conflicts.</p> - -<p>Reserved words in LLVM are very similar to reserved words in other - languages. There are keywords for different opcodes - ('<tt><a href="#i_add">add</a></tt>', - '<tt><a href="#i_bitcast">bitcast</a></tt>', - '<tt><a href="#i_ret">ret</a></tt>', etc...), for primitive type names - ('<tt><a href="#t_void">void</a></tt>', - '<tt><a href="#t_primitive">i32</a></tt>', etc...), and others. These - reserved words cannot conflict with variable names, because none of them - start with a prefix character (<tt>'%'</tt> or <tt>'@'</tt>).</p> - -<p>Here is an example of LLVM code to multiply the integer variable - '<tt>%X</tt>' by 8:</p> - -<p>The easy way:</p> - -<pre class="doc_code"> -%result = <a href="#i_mul">mul</a> i32 %X, 8 -</pre> - -<p>After strength reduction:</p> - -<pre class="doc_code"> -%result = <a href="#i_shl">shl</a> i32 %X, i8 3 -</pre> - -<p>And the hard way:</p> - -<pre class="doc_code"> -%0 = <a href="#i_add">add</a> i32 %X, %X <i>; yields {i32}:%0</i> -%1 = <a href="#i_add">add</a> i32 %0, %0 <i>; yields {i32}:%1</i> -%result = <a href="#i_add">add</a> i32 %1, %1 -</pre> - -<p>This last way of multiplying <tt>%X</tt> by 8 illustrates several important - lexical features of LLVM:</p> - -<ol> - <li>Comments are delimited with a '<tt>;</tt>' and go until the end of - line.</li> - - <li>Unnamed temporaries are created when the result of a computation is not - assigned to a named value.</li> - - <li>Unnamed temporaries are numbered sequentially</li> -</ol> - -<p>It also shows a convention that we follow in this document. When - demonstrating instructions, we will follow an instruction with a comment that - defines the type and name of value produced. Comments are shown in italic - text.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="highlevel">High Level Structure</a></h2> -<!-- *********************************************************************** --> -<div> -<!-- ======================================================================= --> -<h3> - <a name="modulestructure">Module Structure</a> -</h3> - -<div> - -<p>LLVM programs are composed of <tt>Module</tt>s, each of which is a - translation unit of the input programs. Each module consists of functions, - global variables, and symbol table entries. Modules may be combined together - with the LLVM linker, which merges function (and global variable) - definitions, resolves forward declarations, and merges symbol table - entries. 
Here is an example of the "hello world" module:</p> - -<pre class="doc_code"> -<i>; Declare the string constant as a global constant.</i> -<a href="#identifiers">@.str</a> = <a href="#linkage_private">private</a> <a href="#globalvars">unnamed_addr</a> <a href="#globalvars">constant</a> <a href="#t_array">[13 x i8]</a> c"hello world\0A\00" - -<i>; External declaration of the puts function</i> -<a href="#functionstructure">declare</a> i32 @puts(i8* <a href="#nocapture">nocapture</a>) <a href="#fnattrs">nounwind</a> - -<i>; Definition of main function</i> -define i32 @main() { <i>; i32()* </i> - <i>; Convert [13 x i8]* to i8 *...</i> - %cast210 = <a href="#i_getelementptr">getelementptr</a> [13 x i8]* @.str, i64 0, i64 0 - - <i>; Call puts function to write out the string to stdout.</i> - <a href="#i_call">call</a> i32 @puts(i8* %cast210) - <a href="#i_ret">ret</a> i32 0 -} - -<i>; Named metadata</i> -!1 = metadata !{i32 42} -!foo = !{!1, null} -</pre> - -<p>This example is made up of a <a href="#globalvars">global variable</a> named - "<tt>.str</tt>", an external declaration of the "<tt>puts</tt>" function, - a <a href="#functionstructure">function definition</a> for - "<tt>main</tt>" and <a href="#namedmetadatastructure">named metadata</a> - "<tt>foo</tt>".</p> - -<p>In general, a module is made up of a list of global values (where both - functions and global variables are global values). Global values are - represented by a pointer to a memory location (in this case, a pointer to an - array of char, and a pointer to a function), and have one of the - following <a href="#linkage">linkage types</a>.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="linkage">Linkage Types</a> -</h3> - -<div> - -<p>All Global Variables and Functions have one of the following types of - linkage:</p> - -<dl> - <dt><tt><b><a name="linkage_private">private</a></b></tt></dt> - <dd>Global values with "<tt>private</tt>" linkage are only directly accessible - by objects in the current module. In particular, linking code into a - module with an private global value may cause the private to be renamed as - necessary to avoid collisions. Because the symbol is private to the - module, all references can be updated. This doesn't show up in any symbol - table in the object file.</dd> - - <dt><tt><b><a name="linkage_linker_private">linker_private</a></b></tt></dt> - <dd>Similar to <tt>private</tt>, but the symbol is passed through the - assembler and evaluated by the linker. Unlike normal strong symbols, they - are removed by the linker from the final linked image (executable or - dynamic library).</dd> - - <dt><tt><b><a name="linkage_linker_private_weak">linker_private_weak</a></b></tt></dt> - <dd>Similar to "<tt>linker_private</tt>", but the symbol is weak. Note that - <tt>linker_private_weak</tt> symbols are subject to coalescing by the - linker. The symbols are removed by the linker from the final linked image - (executable or dynamic library).</dd> - - <dt><tt><b><a name="linkage_internal">internal</a></b></tt></dt> - <dd>Similar to private, but the value shows as a local symbol - (<tt>STB_LOCAL</tt> in the case of ELF) in the object file. This - corresponds to the notion of the '<tt>static</tt>' keyword in C.</dd> - - <dt><tt><b><a name="linkage_available_externally">available_externally</a></b></tt></dt> - <dd>Globals with "<tt>available_externally</tt>" linkage are never emitted - into the object file corresponding to the LLVM module. 
They exist to - allow inlining and other optimizations to take place given knowledge of - the definition of the global, which is known to be somewhere outside the - module. Globals with <tt>available_externally</tt> linkage are allowed to - be discarded at will, and are otherwise the same as <tt>linkonce_odr</tt>. - This linkage type is only allowed on definitions, not declarations.</dd> - - <dt><tt><b><a name="linkage_linkonce">linkonce</a></b></tt></dt> - <dd>Globals with "<tt>linkonce</tt>" linkage are merged with other globals of - the same name when linkage occurs. This can be used to implement - some forms of inline functions, templates, or other code which must be - generated in each translation unit that uses it, but where the body may - be overridden with a more definitive definition later. Unreferenced - <tt>linkonce</tt> globals are allowed to be discarded. Note that - <tt>linkonce</tt> linkage does not actually allow the optimizer to - inline the body of this function into callers because it doesn't know if - this definition of the function is the definitive definition within the - program or whether it will be overridden by a stronger definition. - To enable inlining and other optimizations, use "<tt>linkonce_odr</tt>" - linkage.</dd> - - <dt><tt><b><a name="linkage_weak">weak</a></b></tt></dt> - <dd>"<tt>weak</tt>" linkage has the same merging semantics as - <tt>linkonce</tt> linkage, except that unreferenced globals with - <tt>weak</tt> linkage may not be discarded. This is used for globals that - are declared "weak" in C source code.</dd> - - <dt><tt><b><a name="linkage_common">common</a></b></tt></dt> - <dd>"<tt>common</tt>" linkage is most similar to "<tt>weak</tt>" linkage, but - they are used for tentative definitions in C, such as "<tt>int X;</tt>" at - global scope. - Symbols with "<tt>common</tt>" linkage are merged in the same way as - <tt>weak symbols</tt>, and they may not be deleted if unreferenced. - <tt>common</tt> symbols may not have an explicit section, - must have a zero initializer, and may not be marked '<a - href="#globalvars"><tt>constant</tt></a>'. Functions and aliases may not - have common linkage.</dd> - - - <dt><tt><b><a name="linkage_appending">appending</a></b></tt></dt> - <dd>"<tt>appending</tt>" linkage may only be applied to global variables of - pointer to array type. When two global variables with appending linkage - are linked together, the two global arrays are appended together. This is - the LLVM, typesafe, equivalent of having the system linker append together - "sections" with identical names when .o files are linked.</dd> - - <dt><tt><b><a name="linkage_externweak">extern_weak</a></b></tt></dt> - <dd>The semantics of this linkage follow the ELF object file model: the symbol - is weak until linked, if not linked, the symbol becomes null instead of - being an undefined reference.</dd> - - <dt><tt><b><a name="linkage_linkonce_odr">linkonce_odr</a></b></tt></dt> - <dt><tt><b><a name="linkage_weak_odr">weak_odr</a></b></tt></dt> - <dd>Some languages allow differing globals to be merged, such as two functions - with different semantics. Other languages, such as <tt>C++</tt>, ensure - that only equivalent globals are ever merged (the "one definition rule" - — "ODR"). Such languages can use the <tt>linkonce_odr</tt> - and <tt>weak_odr</tt> linkage types to indicate that the global will only - be merged with equivalent globals. 
These linkage types are otherwise the - same as their non-<tt>odr</tt> versions.</dd> - - <dt><tt><b><a name="linkage_linkonce_odr_auto_hide">linkonce_odr_auto_hide</a></b></tt></dt> - <dd>Similar to "<tt>linkonce_odr</tt>", but nothing in the translation unit - takes the address of this definition. For instance, functions that had an - inline definition, but the compiler decided not to inline it. - <tt>linkonce_odr_auto_hide</tt> may have only <tt>default</tt> visibility. - The symbols are removed by the linker from the final linked image - (executable or dynamic library).</dd> - - <dt><tt><b><a name="linkage_external">external</a></b></tt></dt> - <dd>If none of the above identifiers are used, the global is externally - visible, meaning that it participates in linkage and can be used to - resolve external symbol references.</dd> -</dl> - -<p>The next two types of linkage are targeted for Microsoft Windows platform - only. They are designed to support importing (exporting) symbols from (to) - DLLs (Dynamic Link Libraries).</p> - -<dl> - <dt><tt><b><a name="linkage_dllimport">dllimport</a></b></tt></dt> - <dd>"<tt>dllimport</tt>" linkage causes the compiler to reference a function - or variable via a global pointer to a pointer that is set up by the DLL - exporting the symbol. On Microsoft Windows targets, the pointer name is - formed by combining <code>__imp_</code> and the function or variable - name.</dd> - - <dt><tt><b><a name="linkage_dllexport">dllexport</a></b></tt></dt> - <dd>"<tt>dllexport</tt>" linkage causes the compiler to provide a global - pointer to a pointer in a DLL, so that it can be referenced with the - <tt>dllimport</tt> attribute. On Microsoft Windows targets, the pointer - name is formed by combining <code>__imp_</code> and the function or - variable name.</dd> -</dl> - -<p>For example, since the "<tt>.LC0</tt>" variable is defined to be internal, if - another module defined a "<tt>.LC0</tt>" variable and was linked with this - one, one of the two would be renamed, preventing a collision. Since - "<tt>main</tt>" and "<tt>puts</tt>" are external (i.e., lacking any linkage - declarations), they are accessible outside of the current module.</p> - -<p>It is illegal for a function <i>declaration</i> to have any linkage type - other than <tt>external</tt>, <tt>dllimport</tt> - or <tt>extern_weak</tt>.</p> - -<p>Aliases can have only <tt>external</tt>, <tt>internal</tt>, <tt>weak</tt> - or <tt>weak_odr</tt> linkages.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="callingconv">Calling Conventions</a> -</h3> - -<div> - -<p>LLVM <a href="#functionstructure">functions</a>, <a href="#i_call">calls</a> - and <a href="#i_invoke">invokes</a> can all have an optional calling - convention specified for the call. The calling convention of any pair of - dynamic caller/callee must match, or the behavior of the program is - undefined. The following calling conventions are supported by LLVM, and more - may be added in the future:</p> - -<dl> - <dt><b>"<tt>ccc</tt>" - The C calling convention</b>:</dt> - <dd>This calling convention (the default if no other calling convention is - specified) matches the target C calling conventions. 
This calling - convention supports varargs function calls and tolerates some mismatch in - the declared prototype and implemented declaration of the function (as - does normal C).</dd> - - <dt><b>"<tt>fastcc</tt>" - The fast calling convention</b>:</dt> - <dd>This calling convention attempts to make calls as fast as possible - (e.g. by passing things in registers). This calling convention allows the - target to use whatever tricks it wants to produce fast code for the - target, without having to conform to an externally specified ABI - (Application Binary Interface). - <a href="CodeGenerator.html#tailcallopt">Tail calls can only be optimized - when this or the GHC convention is used.</a> This calling convention - does not support varargs and requires the prototype of all callees to - exactly match the prototype of the function definition.</dd> - - <dt><b>"<tt>coldcc</tt>" - The cold calling convention</b>:</dt> - <dd>This calling convention attempts to make code in the caller as efficient - as possible under the assumption that the call is not commonly executed. - As such, these calls often preserve all registers so that the call does - not break any live ranges in the caller side. This calling convention - does not support varargs and requires the prototype of all callees to - exactly match the prototype of the function definition.</dd> - - <dt><b>"<tt>cc <em>10</em></tt>" - GHC convention</b>:</dt> - <dd>This calling convention has been implemented specifically for use by the - <a href="http://www.haskell.org/ghc">Glasgow Haskell Compiler (GHC)</a>. - It passes everything in registers, going to extremes to achieve this by - disabling callee save registers. This calling convention should not be - used lightly but only for specific situations such as an alternative to - the <em>register pinning</em> performance technique often used when - implementing functional programming languages.At the moment only X86 - supports this convention and it has the following limitations: - <ul> - <li>On <em>X86-32</em> only supports up to 4 bit type parameters. No - floating point types are supported.</li> - <li>On <em>X86-64</em> only supports up to 10 bit type parameters and - 6 floating point parameters.</li> - </ul> - This calling convention supports - <a href="CodeGenerator.html#tailcallopt">tail call optimization</a> but - requires both the caller and callee are using it. - </dd> - - <dt><b>"<tt>cc <<em>n</em>></tt>" - Numbered convention</b>:</dt> - <dd>Any calling convention may be specified by number, allowing - target-specific calling conventions to be used. Target specific calling - conventions start at 64.</dd> -</dl> - -<p>More calling conventions can be added/defined on an as-needed basis, to - support Pascal conventions or any other well-known target-independent - convention.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="visibility">Visibility Styles</a> -</h3> - -<div> - -<p>All Global Variables and Functions have one of the following visibility - styles:</p> - -<dl> - <dt><b>"<tt>default</tt>" - Default style</b>:</dt> - <dd>On targets that use the ELF object file format, default visibility means - that the declaration is visible to other modules and, in shared libraries, - means that the declared entity may be overridden. On Darwin, default - visibility means that the declaration is visible to other modules. 
Default - visibility corresponds to "external linkage" in the language.</dd> - - <dt><b>"<tt>hidden</tt>" - Hidden style</b>:</dt> - <dd>Two declarations of an object with hidden visibility refer to the same - object if they are in the same shared object. Usually, hidden visibility - indicates that the symbol will not be placed into the dynamic symbol - table, so no other module (executable or shared library) can reference it - directly.</dd> - - <dt><b>"<tt>protected</tt>" - Protected style</b>:</dt> - <dd>On ELF, protected visibility indicates that the symbol will be placed in - the dynamic symbol table, but that references within the defining module - will bind to the local symbol. That is, the symbol cannot be overridden by - another module.</dd> -</dl> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="namedtypes">Named Types</a> -</h3> - -<div> - -<p>LLVM IR allows you to specify name aliases for certain types. This can make - it easier to read the IR and make the IR more condensed (particularly when - recursive types are involved). An example of a name specification is:</p> - -<pre class="doc_code"> -%mytype = type { %mytype*, i32 } -</pre> - -<p>You may give a name to any <a href="#typesystem">type</a> except - "<a href="#t_void">void</a>". Type name aliases may be used anywhere a type - is expected with the syntax "%mytype".</p> - -<p>Note that type names are aliases for the structural type that they indicate, - and that you can therefore specify multiple names for the same type. This - often leads to confusing behavior when dumping out a .ll file. Since LLVM IR - uses structural typing, the name is not part of the type. When printing out - LLVM IR, the printer will pick <em>one name</em> to render all types of a - particular shape. This means that if you have code where two different - source types end up having the same LLVM type, that the dumper will sometimes - print the "wrong" or unexpected type. This is an important design point and - isn't going to change.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="globalvars">Global Variables</a> -</h3> - -<div> - -<p>Global variables define regions of memory allocated at compilation time - instead of run-time. Global variables may optionally be initialized, may - have an explicit section to be placed in, and may have an optional explicit - alignment specified.</p> - -<p>A variable may be defined as <tt>thread_local</tt>, which - means that it will not be shared by threads (each thread will have a - separated copy of the variable). Not all targets support thread-local - variables. Optionally, a TLS model may be specified:</p> - -<dl> - <dt><b><tt>localdynamic</tt></b>:</dt> - <dd>For variables that are only used within the current shared library.</dd> - - <dt><b><tt>initialexec</tt></b>:</dt> - <dd>For variables in modules that will not be loaded dynamically.</dd> - - <dt><b><tt>localexec</tt></b>:</dt> - <dd>For variables defined in the executable and only used within it.</dd> -</dl> - -<p>The models correspond to the ELF TLS models; see - <a href="http://people.redhat.com/drepper/tls.pdf">ELF - Handling For Thread-Local Storage</a> for more information on under which - circumstances the different models may be used. 
The target may choose a - different TLS model if the specified model is not supported, or if a better - choice of model can be made.</p> - -<p>A variable may be defined as a global - "constant," which indicates that the contents of the variable - will <b>never</b> be modified (enabling better optimization, allowing the - global data to be placed in the read-only section of an executable, etc). - Note that variables that need runtime initialization cannot be marked - "constant" as there is a store to the variable.</p> - -<p>LLVM explicitly allows <em>declarations</em> of global variables to be marked - constant, even if the final definition of the global is not. This capability - can be used to enable slightly better optimization of the program, but - requires the language definition to guarantee that optimizations based on the - 'constantness' are valid for the translation units that do not include the - definition.</p> - -<p>As SSA values, global variables define pointer values that are in scope - (i.e. they dominate) all basic blocks in the program. Global variables - always define a pointer to their "content" type because they describe a - region of memory, and all memory objects in LLVM are accessed through - pointers.</p> - -<p>Global variables can be marked with <tt>unnamed_addr</tt> which indicates - that the address is not significant, only the content. Constants marked - like this can be merged with other constants if they have the same - initializer. Note that a constant with significant address <em>can</em> - be merged with a <tt>unnamed_addr</tt> constant, the result being a - constant whose address is significant.</p> - -<p>A global variable may be declared to reside in a target-specific numbered - address space. For targets that support them, address spaces may affect how - optimizations are performed and/or what target instructions are used to - access the variable. The default address space is zero. The address space - qualifier must precede any other attributes.</p> - -<p>LLVM allows an explicit section to be specified for globals. If the target - supports it, it will emit globals to the section specified.</p> - -<p>An explicit alignment may be specified for a global, which must be a power - of 2. If not present, or if the alignment is set to zero, the alignment of - the global is set by the target to whatever it feels convenient. If an - explicit alignment is specified, the global is forced to have exactly that - alignment. Targets and optimizers are not allowed to over-align the global - if the global has an assigned section. 
In this case, the extra alignment
-   could be observable: for example, code could assume that the globals are
-   densely packed in their section and try to iterate over them as an array;
-   alignment padding would break this iteration.</p>
-
-<p>For example, the following defines a global in a numbered address space with
-   an initializer, section, and alignment:</p>
-
-<pre class="doc_code">
-@G = addrspace(5) constant float 1.0, section "foo", align 4
-</pre>
-
-<p>The following example defines a thread-local global with
-   the <tt>initialexec</tt> TLS model:</p>
-
-<pre class="doc_code">
-@G = thread_local(initialexec) global i32 0, align 4
-</pre>
-
-</div>
-
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="functionstructure">Functions</a>
-</h3>
-
-<div>
-
-<p>LLVM function definitions consist of the "<tt>define</tt>" keyword, an
-   optional <a href="#linkage">linkage type</a>, an optional
-   <a href="#visibility">visibility style</a>, an optional
-   <a href="#callingconv">calling convention</a>,
-   an optional <tt>unnamed_addr</tt> attribute, a return type, an optional
-   <a href="#paramattrs">parameter attribute</a> for the return type, a function
-   name, a (possibly empty) argument list (each with optional
-   <a href="#paramattrs">parameter attributes</a>), optional
-   <a href="#fnattrs">function attributes</a>, an optional section, an optional
-   alignment, an optional <a href="#gc">garbage collector name</a>, an opening
-   curly brace, a list of basic blocks, and a closing curly brace.</p>
-
-<p>LLVM function declarations consist of the "<tt>declare</tt>" keyword, an
-   optional <a href="#linkage">linkage type</a>, an optional
-   <a href="#visibility">visibility style</a>, an optional
-   <a href="#callingconv">calling convention</a>,
-   an optional <tt>unnamed_addr</tt> attribute, a return type, an optional
-   <a href="#paramattrs">parameter attribute</a> for the return type, a function
-   name, a possibly empty list of arguments, an optional alignment, and an
-   optional <a href="#gc">garbage collector name</a>.</p>
-
-<p>A function definition contains a list of basic blocks, forming the CFG
-   (Control Flow Graph) for the function. Each basic block optionally starts
-   with a label (giving the basic block a symbol table entry), contains a list
-   of instructions, and ends with a <a href="#terminators">terminator</a>
-   instruction (such as a branch or function return).</p>
-
-<p>The first basic block in a function is special in two ways: it is immediately
-   executed on entrance to the function, and it is not allowed to have
-   predecessor basic blocks (i.e. there can not be any branches to the entry
-   block of a function). Because the block can have no predecessors, it also
-   cannot have any <a href="#i_phi">PHI nodes</a>.</p>
-
-<p>LLVM allows an explicit section to be specified for functions. If the target
-   supports it, it will emit functions to the section specified.</p>
-
-<p>An explicit alignment may be specified for a function. If not present, or if
-   the alignment is set to zero, the alignment of the function is set by the
-   target to whatever it feels convenient. If an explicit alignment is
-   specified, the function is forced to have at least that much alignment. All
-   alignments must be a power of 2.</p>
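-
-<p>For illustration, the following definition (a made-up function, not taken
-   from any real module) combines several of the optional pieces described
-   above: a linkage type, a calling convention, function attributes, and an
-   explicit alignment:</p>
-
-<pre class="doc_code">
-define internal fastcc i32 @add1(i32 %x) nounwind optsize align 16 {
-entry:
-  %r = add i32 %x, 1
-  ret i32 %r
-}
-</pre>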
-
-<p>If the <tt>unnamed_addr</tt> attribute is given, the address is known not to
-   be significant, and two identical functions can be merged.</p>
-
-<h5>Syntax:</h5>
-<pre class="doc_code">
-define [<a href="#linkage">linkage</a>] [<a href="#visibility">visibility</a>]
-       [<a href="#callingconv">cconv</a>] [<a href="#paramattrs">ret attrs</a>]
-       <ResultType> @<FunctionName> ([argument list])
-       [<a href="#fnattrs">fn Attrs</a>] [section "name"] [align N]
-       [<a href="#gc">gc</a>] { ... }
-</pre>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="aliasstructure">Aliases</a>
-</h3>
-
-<div>
-
-<p>Aliases act as a "second name" for the aliasee value (which can be a
-   function, a global variable, another alias, or a bitcast of a global value).
-   Aliases may have an optional <a href="#linkage">linkage type</a>, and an
-   optional <a href="#visibility">visibility style</a>.</p>
-
-<h5>Syntax:</h5>
-<pre class="doc_code">
-@<Name> = alias [Linkage] [Visibility] <AliaseeTy> @<Aliasee>
-</pre>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="namedmetadatastructure">Named Metadata</a>
-</h3>
-
-<div>
-
-<p>Named metadata is a collection of metadata. <a href="#metadata">Metadata
-   nodes</a> (but not metadata strings) are the only valid operands for
-   named metadata.</p>
-
-<h5>Syntax:</h5>
-<pre class="doc_code">
-; Some unnamed metadata nodes, which are referenced by the named metadata.
-!0 = metadata !{metadata !"zero"}
-!1 = metadata !{metadata !"one"}
-!2 = metadata !{metadata !"two"}
-; A named metadata.
-!name = !{!0, !1, !2}
-</pre>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="paramattrs">Parameter Attributes</a>
-</h3>
-
-<div>
-
-<p>The return type and each parameter of a function type may have a set of
-   <i>parameter attributes</i> associated with them. Parameter attributes are
-   used to communicate additional information about the result or parameters of
-   a function. Parameter attributes are considered to be part of the function,
-   not of the function type, so functions with different parameter attributes
-   can have the same function type.</p>
-
-<p>Parameter attributes are simple keywords that follow the type specified. If
-   multiple parameter attributes are needed, they are space separated. For
-   example:</p>
-
-<pre class="doc_code">
-declare i32 @printf(i8* noalias nocapture, ...)
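-; Two additional, purely illustrative declarations (the function names are
-; hypothetical); they show the byval and sret attributes described below.
-declare void @consume_pair({ i32, i32 }* byval align 4)
-declare void @produce_pair({ i32, i32 }* sret)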
-declare i32 @atoi(i8 zeroext)
-declare signext i8 @returns_signed_char()
-</pre>
-
-<p>Note that any attributes for the function result (<tt>nounwind</tt>,
-   <tt>readonly</tt>) come immediately after the argument list.</p>
-
-<p>Currently, only the following parameter attributes are defined:</p>
-
-<dl>
-  <dt><tt><b>zeroext</b></tt></dt>
-  <dd>This indicates to the code generator that the parameter or return value
-      should be zero-extended to the extent required by the target's ABI (which
-      is usually 32-bits, but is 8-bits for an i1 on x86-64) by the caller (for
-      a parameter) or the callee (for a return value).</dd>
-
-  <dt><tt><b>signext</b></tt></dt>
-  <dd>This indicates to the code generator that the parameter or return value
-      should be sign-extended to the extent required by the target's ABI (which
-      is usually 32-bits) by the caller (for a parameter) or the callee (for a
-      return value).</dd>
-
-  <dt><tt><b>inreg</b></tt></dt>
-  <dd>This indicates that this parameter or return value should be treated in a
-      special target-dependent fashion while emitting code for a function
-      call or return (usually, by putting it in a register as opposed to memory,
-      though some targets use it to distinguish between two different kinds of
-      registers). Use of this attribute is target-specific.</dd>
-
-  <dt><tt><b><a name="byval">byval</a></b></tt></dt>
-  <dd><p>This indicates that the pointer parameter should really be passed by
-      value to the function. The attribute implies that a hidden copy of the
-      pointee is made between the caller and the callee, so the callee is
-      unable to modify the value in the caller. This attribute is only valid on
-      LLVM pointer arguments. It is generally used to pass structs and arrays
-      by value, but is also valid on pointers to scalars. The copy is considered
-      to belong to the caller, not the callee (for example,
-      <tt><a href="#readonly">readonly</a></tt> functions should not write to
-      <tt>byval</tt> parameters). This is not a valid attribute for return
-      values.</p>
-
-      <p>The byval attribute also supports specifying an alignment with
-      the align attribute. It indicates the alignment of the stack slot to
-      form and the known alignment of the pointer specified to the call site. If
-      the alignment is not specified, then the code generator makes a
-      target-specific assumption.</p></dd>
-
-  <dt><tt><b><a name="sret">sret</a></b></tt></dt>
-  <dd>This indicates that the pointer parameter specifies the address of a
-      structure that is the return value of the function in the source program.
-      This pointer must be guaranteed by the caller to be valid: loads and
-      stores to the structure may be assumed by the callee not to trap and
-      to be properly aligned. This may only be applied to the first parameter.
-      This is not a valid attribute for return values.</dd>
-
-  <dt><tt><b><a name="noalias">noalias</a></b></tt></dt>
-  <dd>This indicates that pointer values
-      <a href="#pointeraliasing"><i>based</i></a> on the argument or return
-      value do not alias pointer values which are not <i>based</i> on it,
-      ignoring certain "irrelevant" dependencies.
-      For a call to the parent function, dependencies between memory
-      references from before or after the call and from those during the call
-      are "irrelevant" to the <tt>noalias</tt> keyword for the arguments and
-      return value used in that call.
-      The caller shares the responsibility with the callee for ensuring that
-      these requirements are met.
- For further details, please see the discussion of the NoAlias response in - <a href="AliasAnalysis.html#MustMayNo">alias analysis</a>.<br> -<br> - Note that this definition of <tt>noalias</tt> is intentionally - similar to the definition of <tt>restrict</tt> in C99 for function - arguments, though it is slightly weaker. -<br> - For function return values, C99's <tt>restrict</tt> is not meaningful, - while LLVM's <tt>noalias</tt> is. - </dd> - - <dt><tt><b><a name="nocapture">nocapture</a></b></tt></dt> - <dd>This indicates that the callee does not make any copies of the pointer - that outlive the callee itself. This is not a valid attribute for return - values.</dd> - - <dt><tt><b><a name="nest">nest</a></b></tt></dt> - <dd>This indicates that the pointer parameter can be excised using the - <a href="#int_trampoline">trampoline intrinsics</a>. This is not a valid - attribute for return values.</dd> -</dl> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="gc">Garbage Collector Names</a> -</h3> - -<div> - -<p>Each function may specify a garbage collector name, which is simply a - string:</p> - -<pre class="doc_code"> -define void @f() gc "name" { ... } -</pre> - -<p>The compiler declares the supported values of <i>name</i>. Specifying a - collector which will cause the compiler to alter its output in order to - support the named garbage collection algorithm.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="fnattrs">Function Attributes</a> -</h3> - -<div> - -<p>Function attributes are set to communicate additional information about a - function. Function attributes are considered to be part of the function, not - of the function type, so functions with different parameter attributes can - have the same function type.</p> - -<p>Function attributes are simple keywords that follow the type specified. If - multiple attributes are needed, they are space separated. For example:</p> - -<pre class="doc_code"> -define void @f() noinline { ... } -define void @f() alwaysinline { ... } -define void @f() alwaysinline optsize { ... } -define void @f() optsize { ... } -</pre> - -<dl> - <dt><tt><b>address_safety</b></tt></dt> - <dd>This attribute indicates that the address safety analysis - is enabled for this function. </dd> - - <dt><tt><b>alignstack(<<em>n</em>>)</b></tt></dt> - <dd>This attribute indicates that, when emitting the prologue and epilogue, - the backend should forcibly align the stack pointer. Specify the - desired alignment, which must be a power of two, in parentheses. - - <dt><tt><b>alwaysinline</b></tt></dt> - <dd>This attribute indicates that the inliner should attempt to inline this - function into callers whenever possible, ignoring any active inlining size - threshold for this caller.</dd> - - <dt><tt><b>nonlazybind</b></tt></dt> - <dd>This attribute suppresses lazy symbol binding for the function. This - may make calls to the function faster, at the cost of extra program - startup time if the function is not called during program startup.</dd> - - <dt><tt><b>inlinehint</b></tt></dt> - <dd>This attribute indicates that the source code contained a hint that inlining - this function is desirable (such as the "inline" keyword in C/C++). It - is just a hint; it imposes no requirements on the inliner.</dd> - - <dt><tt><b>naked</b></tt></dt> - <dd>This attribute disables prologue / epilogue emission for the function. 
- This can have very system-specific consequences.</dd> - - <dt><tt><b>noimplicitfloat</b></tt></dt> - <dd>This attributes disables implicit floating point instructions.</dd> - - <dt><tt><b>noinline</b></tt></dt> - <dd>This attribute indicates that the inliner should never inline this - function in any situation. This attribute may not be used together with - the <tt>alwaysinline</tt> attribute.</dd> - - <dt><tt><b>noredzone</b></tt></dt> - <dd>This attribute indicates that the code generator should not use a red - zone, even if the target-specific ABI normally permits it.</dd> - - <dt><tt><b>noreturn</b></tt></dt> - <dd>This function attribute indicates that the function never returns - normally. This produces undefined behavior at runtime if the function - ever does dynamically return.</dd> - - <dt><tt><b>nounwind</b></tt></dt> - <dd>This function attribute indicates that the function never returns with an - unwind or exceptional control flow. If the function does unwind, its - runtime behavior is undefined.</dd> - - <dt><tt><b>optsize</b></tt></dt> - <dd>This attribute suggests that optimization passes and code generator passes - make choices that keep the code size of this function low, and otherwise - do optimizations specifically to reduce code size.</dd> - - <dt><tt><b>readnone</b></tt></dt> - <dd>This attribute indicates that the function computes its result (or decides - to unwind an exception) based strictly on its arguments, without - dereferencing any pointer arguments or otherwise accessing any mutable - state (e.g. memory, control registers, etc) visible to caller functions. - It does not write through any pointer arguments - (including <tt><a href="#byval">byval</a></tt> arguments) and never - changes any state visible to callers. This means that it cannot unwind - exceptions by calling the <tt>C++</tt> exception throwing methods.</dd> - - <dt><tt><b><a name="readonly">readonly</a></b></tt></dt> - <dd>This attribute indicates that the function does not write through any - pointer arguments (including <tt><a href="#byval">byval</a></tt> - arguments) or otherwise modify any state (e.g. memory, control registers, - etc) visible to caller functions. It may dereference pointer arguments - and read state that may be set in the caller. A readonly function always - returns the same value (or unwinds an exception identically) when called - with the same set of arguments and global state. It cannot unwind an - exception by calling the <tt>C++</tt> exception throwing methods.</dd> - - <dt><tt><b><a name="returns_twice">returns_twice</a></b></tt></dt> - <dd>This attribute indicates that this function can return twice. The - C <code>setjmp</code> is an example of such a function. The compiler - disables some optimizations (like tail calls) in the caller of these - functions.</dd> - - <dt><tt><b><a name="ssp">ssp</a></b></tt></dt> - <dd>This attribute indicates that the function should emit a stack smashing - protector. It is in the form of a "canary"—a random value placed on - the stack before the local variables that's checked upon return from the - function to see if it has been overwritten. 
A heuristic is used to
-      determine if a function needs stack protectors or not.<br>
-<br>
-      If a function that has an <tt>ssp</tt> attribute is inlined into a
-      function that doesn't have an <tt>ssp</tt> attribute, then the resulting
-      function will have an <tt>ssp</tt> attribute.</dd>
-
-  <dt><tt><b>sspreq</b></tt></dt>
-  <dd>This attribute indicates that the function should <em>always</em> emit a
-      stack smashing protector. This overrides
-      the <tt><a href="#ssp">ssp</a></tt> function attribute.<br>
-<br>
-      If a function that has an <tt>sspreq</tt> attribute is inlined into a
-      function that doesn't have an <tt>sspreq</tt> attribute or which has
-      an <tt>ssp</tt> attribute, then the resulting function will have
-      an <tt>sspreq</tt> attribute.</dd>
-
-  <dt><tt><b><a name="uwtable">uwtable</a></b></tt></dt>
-  <dd>This attribute indicates that the ABI being targeted requires that
-      an unwind table entry be produced for this function even if we can
-      show that no exceptions pass by it. This is normally the case for
-      the ELF x86-64 ABI, but it can be disabled for some compilation
-      units.</dd>
-</dl>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="moduleasm">Module-Level Inline Assembly</a>
-</h3>
-
-<div>
-
-<p>Modules may contain "module-level inline asm" blocks, which correspond to
-   the GCC "file scope inline asm" blocks. These blocks are internally
-   concatenated by LLVM and treated as a single unit, but may be separated in
-   the <tt>.ll</tt> file if desired. The syntax is very simple:</p>
-
-<pre class="doc_code">
-module asm "inline asm code goes here"
-module asm "more can go here"
-</pre>
-
-<p>The strings can contain any character by escaping non-printable characters.
-   The escape sequence used is simply "\xx" where "xx" is the two-digit hex code
-   for the number.</p>
-
-<p>The inline asm code is simply printed to the machine code .s file when
-   assembly code is generated.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="datalayout">Data Layout</a>
-</h3>
-
-<div>
-
-<p>A module may specify a target-specific data layout string that specifies how
-   data is to be laid out in memory. The syntax for the data layout is
-   simply:</p>
-
-<pre class="doc_code">
-target datalayout = "<i>layout specification</i>"
-</pre>
-
-<p>The <i>layout specification</i> consists of a list of specifications
-   separated by the minus sign character ('-'). Each specification starts with
-   a letter and may include other information after the letter to define some
-   aspect of the data layout. The specifications accepted are as follows:</p>
-
-<dl>
-  <dt><tt>E</tt></dt>
-  <dd>Specifies that the target lays out data in big-endian form. That is, the
-      bits with the most significance have the lowest address location.</dd>
-
-  <dt><tt>e</tt></dt>
-  <dd>Specifies that the target lays out data in little-endian form. That is,
-      the bits with the least significance have the lowest address
-      location.</dd>
-
-  <dt><tt>S<i>size</i></tt></dt>
-  <dd>Specifies the natural alignment of the stack in bits. Alignment promotion
-      of stack variables is limited to the natural stack alignment to avoid
-      dynamic stack realignment. The stack alignment must be a multiple of
-      8 bits.
If omitted, the natural stack alignment defaults to "unspecified", - which does not prevent any alignment promotions.</dd> - - <dt><tt>p[n]:<i>size</i>:<i>abi</i>:<i>pref</i></tt></dt> - <dd>This specifies the <i>size</i> of a pointer and its <i>abi</i> and - <i>preferred</i> alignments for address space <i>n</i>. All sizes are in - bits. Specifying the <i>pref</i> alignment is optional. If omitted, the - preceding <tt>:</tt> should be omitted too. The address space, - <i>n</i> is optional, and if not specified, denotes the default address - space 0. The value of <i>n</i> must be in the range [1,2^23).</dd> - - <dt><tt>i<i>size</i>:<i>abi</i>:<i>pref</i></tt></dt> - <dd>This specifies the alignment for an integer type of a given bit - <i>size</i>. The value of <i>size</i> must be in the range [1,2^23).</dd> - - <dt><tt>v<i>size</i>:<i>abi</i>:<i>pref</i></tt></dt> - <dd>This specifies the alignment for a vector type of a given bit - <i>size</i>.</dd> - - <dt><tt>f<i>size</i>:<i>abi</i>:<i>pref</i></tt></dt> - <dd>This specifies the alignment for a floating point type of a given bit - <i>size</i>. Only values of <i>size</i> that are supported by the target - will work. 32 (float) and 64 (double) are supported on all targets; - 80 or 128 (different flavors of long double) are also supported on some - targets. - - <dt><tt>a<i>size</i>:<i>abi</i>:<i>pref</i></tt></dt> - <dd>This specifies the alignment for an aggregate type of a given bit - <i>size</i>.</dd> - - <dt><tt>s<i>size</i>:<i>abi</i>:<i>pref</i></tt></dt> - <dd>This specifies the alignment for a stack object of a given bit - <i>size</i>.</dd> - - <dt><tt>n<i>size1</i>:<i>size2</i>:<i>size3</i>...</tt></dt> - <dd>This specifies a set of native integer widths for the target CPU - in bits. For example, it might contain "n32" for 32-bit PowerPC, - "n32:64" for PowerPC 64, or "n8:16:32:64" for X86-64. Elements of - this set are considered to support most general arithmetic - operations efficiently.</dd> -</dl> - -<p>When constructing the data layout for a given target, LLVM starts with a - default set of specifications which are then (possibly) overridden by the - specifications in the <tt>datalayout</tt> keyword. 
The default specifications - are given in this list:</p> - -<ul> - <li><tt>E</tt> - big endian</li> - <li><tt>p:64:64:64</tt> - 64-bit pointers with 64-bit alignment</li> - <li><tt>p1:32:32:32</tt> - 32-bit pointers with 32-bit alignment for - address space 1</li> - <li><tt>p2:16:32:32</tt> - 16-bit pointers with 32-bit alignment for - address space 2</li> - <li><tt>i1:8:8</tt> - i1 is 8-bit (byte) aligned</li> - <li><tt>i8:8:8</tt> - i8 is 8-bit (byte) aligned</li> - <li><tt>i16:16:16</tt> - i16 is 16-bit aligned</li> - <li><tt>i32:32:32</tt> - i32 is 32-bit aligned</li> - <li><tt>i64:32:64</tt> - i64 has ABI alignment of 32-bits but preferred - alignment of 64-bits</li> - <li><tt>f32:32:32</tt> - float is 32-bit aligned</li> - <li><tt>f64:64:64</tt> - double is 64-bit aligned</li> - <li><tt>v64:64:64</tt> - 64-bit vector is 64-bit aligned</li> - <li><tt>v128:128:128</tt> - 128-bit vector is 128-bit aligned</li> - <li><tt>a0:0:1</tt> - aggregates are 8-bit aligned</li> - <li><tt>s0:64:64</tt> - stack objects are 64-bit aligned</li> -</ul> - -<p>When LLVM is determining the alignment for a given type, it uses the - following rules:</p> - -<ol> - <li>If the type sought is an exact match for one of the specifications, that - specification is used.</li> - - <li>If no match is found, and the type sought is an integer type, then the - smallest integer type that is larger than the bitwidth of the sought type - is used. If none of the specifications are larger than the bitwidth then - the largest integer type is used. For example, given the default - specifications above, the i7 type will use the alignment of i8 (next - largest) while both i65 and i256 will use the alignment of i64 (largest - specified).</li> - - <li>If no match is found, and the type sought is a vector type, then the - largest vector type that is smaller than the sought vector type will be - used as a fall back. This happens because <128 x double> can be - implemented in terms of 64 <2 x double>, for example.</li> -</ol> - -<p>The function of the data layout string may not be what you expect. Notably, - this is not a specification from the frontend of what alignment the code - generator should use.</p> - -<p>Instead, if specified, the target data layout is required to match what the - ultimate <em>code generator</em> expects. This string is used by the - mid-level optimizers to - improve code, and this only works if it matches what the ultimate code - generator uses. If you would like to generate IR that does not embed this - target-specific detail into the IR, then you don't have to specify the - string. This will disable some optimizations that require precise layout - information, but this also prevents those optimizations from introducing - target specificity into the IR.</p> - - - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="pointeraliasing">Pointer Aliasing Rules</a> -</h3> - -<div> - -<p>Any memory access must be done through a pointer value associated -with an address range of the memory access, otherwise the behavior -is undefined. Pointer values are associated with address ranges -according to the following rules:</p> - -<ul> - <li>A pointer value is associated with the addresses associated with - any value it is <i>based</i> on. 
- <li>An address of a global variable is associated with the address - range of the variable's storage.</li> - <li>The result value of an allocation instruction is associated with - the address range of the allocated storage.</li> - <li>A null pointer in the default address-space is associated with - no address.</li> - <li>An integer constant other than zero or a pointer value returned - from a function not defined within LLVM may be associated with address - ranges allocated through mechanisms other than those provided by - LLVM. Such ranges shall not overlap with any ranges of addresses - allocated by mechanisms provided by LLVM.</li> -</ul> - -<p>A pointer value is <i>based</i> on another pointer value according - to the following rules:</p> - -<ul> - <li>A pointer value formed from a - <tt><a href="#i_getelementptr">getelementptr</a></tt> operation - is <i>based</i> on the first operand of the <tt>getelementptr</tt>.</li> - <li>The result value of a - <tt><a href="#i_bitcast">bitcast</a></tt> is <i>based</i> on the operand - of the <tt>bitcast</tt>.</li> - <li>A pointer value formed by an - <tt><a href="#i_inttoptr">inttoptr</a></tt> is <i>based</i> on all - pointer values that contribute (directly or indirectly) to the - computation of the pointer's value.</li> - <li>The "<i>based</i> on" relationship is transitive.</li> -</ul> - -<p>Note that this definition of <i>"based"</i> is intentionally - similar to the definition of <i>"based"</i> in C99, though it is - slightly weaker.</p> - -<p>LLVM IR does not associate types with memory. The result type of a -<tt><a href="#i_load">load</a></tt> merely indicates the size and -alignment of the memory from which to load, as well as the -interpretation of the value. The first operand type of a -<tt><a href="#i_store">store</a></tt> similarly only indicates the size -and alignment of the store.</p> - -<p>Consequently, type-based alias analysis, aka TBAA, aka -<tt>-fstrict-aliasing</tt>, is not applicable to general unadorned -LLVM IR. <a href="#metadata">Metadata</a> may be used to encode -additional information which specialized optimization passes may use -to implement type-based alias analysis.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="volatile">Volatile Memory Accesses</a> -</h3> - -<div> - -<p>Certain memory accesses, such as <a href="#i_load"><tt>load</tt></a>s, <a -href="#i_store"><tt>store</tt></a>s, and <a -href="#int_memcpy"><tt>llvm.memcpy</tt></a>s may be marked <tt>volatile</tt>. -The optimizers must not change the number of volatile operations or change their -order of execution relative to other volatile operations. The optimizers -<i>may</i> change the order of volatile operations relative to non-volatile -operations. This is not Java's "volatile" and has no cross-thread -synchronization behavior.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="memmodel">Memory Model for Concurrent Operations</a> -</h3> - -<div> - -<p>The LLVM IR does not define any way to start parallel threads of execution -or to register signal handlers. Nonetheless, there are platform-specific -ways to create them, and we define LLVM IR's behavior in their presence. This -model is inspired by the C++0x memory model.</p> - -<p>For a more informal introduction to this model, see the -<a href="Atomics.html">LLVM Atomic Instructions and Concurrency Guide</a>. 
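-
-<p>As a small illustration (the globals and functions here are invented for
-   this example), a release store that is observed by an acquire load creates a
-   <i>synchronizes-with</i> edge between the two threads involved, using the
-   orderings described under <a href="#ordering">Atomic Memory Ordering
-   Constraints</a> below:</p>
-
-<pre class="doc_code">
-@data = global i32 0
-@flag = global i32 0
-
-define void @producer() {
-  store i32 42, i32* @data                        ; ordinary store
-  store atomic i32 1, i32* @flag release, align 4 ; publish the data
-  ret void
-}
-
-define i32 @consumer() {
-  %f = load atomic i32* @flag acquire, align 4    ; observe the flag
-  %d = load i32* @data                            ; sees 42 if %f is 1
-  ret i32 %d
-}
-</pre>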
- -<p>We define a <i>happens-before</i> partial order as the least partial order -that</p> -<ul> - <li>Is a superset of single-thread program order, and</li> - <li>When a <i>synchronizes-with</i> <tt>b</tt>, includes an edge from - <tt>a</tt> to <tt>b</tt>. <i>Synchronizes-with</i> pairs are introduced - by platform-specific techniques, like pthread locks, thread - creation, thread joining, etc., and by atomic instructions. - (See also <a href="#ordering">Atomic Memory Ordering Constraints</a>). - </li> -</ul> - -<p>Note that program order does not introduce <i>happens-before</i> edges -between a thread and signals executing inside that thread.</p> - -<p>Every (defined) read operation (load instructions, memcpy, atomic -loads/read-modify-writes, etc.) <var>R</var> reads a series of bytes written by -(defined) write operations (store instructions, atomic -stores/read-modify-writes, memcpy, etc.). For the purposes of this section, -initialized globals are considered to have a write of the initializer which is -atomic and happens before any other read or write of the memory in question. -For each byte of a read <var>R</var>, <var>R<sub>byte</sub></var> may see -any write to the same byte, except:</p> - -<ul> - <li>If <var>write<sub>1</sub></var> happens before - <var>write<sub>2</sub></var>, and <var>write<sub>2</sub></var> happens - before <var>R<sub>byte</sub></var>, then <var>R<sub>byte</sub></var> - does not see <var>write<sub>1</sub></var>. - <li>If <var>R<sub>byte</sub></var> happens before - <var>write<sub>3</sub></var>, then <var>R<sub>byte</sub></var> does not - see <var>write<sub>3</sub></var>. -</ul> - -<p>Given that definition, <var>R<sub>byte</sub></var> is defined as follows: -<ul> - <li>If <var>R</var> is volatile, the result is target-dependent. (Volatile - is supposed to give guarantees which can support - <code>sig_atomic_t</code> in C/C++, and may be used for accesses to - addresses which do not behave like normal memory. It does not generally - provide cross-thread synchronization.) - <li>Otherwise, if there is no write to the same byte that happens before - <var>R<sub>byte</sub></var>, <var>R<sub>byte</sub></var> returns - <tt>undef</tt> for that byte. - <li>Otherwise, if <var>R<sub>byte</sub></var> may see exactly one write, - <var>R<sub>byte</sub></var> returns the value written by that - write.</li> - <li>Otherwise, if <var>R</var> is atomic, and all the writes - <var>R<sub>byte</sub></var> may see are atomic, it chooses one of the - values written. See the <a href="#ordering">Atomic Memory Ordering - Constraints</a> section for additional constraints on how the choice - is made. - <li>Otherwise <var>R<sub>byte</sub></var> returns <tt>undef</tt>.</li> -</ul> - -<p><var>R</var> returns the value composed of the series of bytes it read. -This implies that some bytes within the value may be <tt>undef</tt> -<b>without</b> the entire value being <tt>undef</tt>. Note that this only -defines the semantics of the operation; it doesn't mean that targets will -emit more than one instruction to read the series of bytes.</p> - -<p>Note that in cases where none of the atomic intrinsics are used, this model -places only one restriction on IR transformations on top of what is required -for single-threaded execution: introducing a store to a byte which might not -otherwise be stored is not allowed in general. 
(Specifically, in the case -where another thread might write to and read from an address, introducing a -store can change a load that may see exactly one write into a load that may -see multiple writes.)</p> - -<!-- FIXME: This model assumes all targets where concurrency is relevant have -a byte-size store which doesn't affect adjacent bytes. As far as I can tell, -none of the backends currently in the tree fall into this category; however, -there might be targets which care. If there are, we want a paragraph -like the following: - -Targets may specify that stores narrower than a certain width are not -available; on such a target, for the purposes of this model, treat any -non-atomic write with an alignment or width less than the minimum width -as if it writes to the relevant surrounding bytes. ---> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="ordering">Atomic Memory Ordering Constraints</a> -</h3> - -<div> - -<p>Atomic instructions (<a href="#i_cmpxchg"><code>cmpxchg</code></a>, -<a href="#i_atomicrmw"><code>atomicrmw</code></a>, -<a href="#i_fence"><code>fence</code></a>, -<a href="#i_load"><code>atomic load</code></a>, and -<a href="#i_store"><code>atomic store</code></a>) take an ordering parameter -that determines which other atomic instructions on the same address they -<i>synchronize with</i>. These semantics are borrowed from Java and C++0x, -but are somewhat more colloquial. If these descriptions aren't precise enough, -check those specs (see spec references in the -<a href="Atomics.html#introduction">atomics guide</a>). -<a href="#i_fence"><code>fence</code></a> instructions -treat these orderings somewhat differently since they don't take an address. -See that instruction's documentation for details.</p> - -<p>For a simpler introduction to the ordering constraints, see the -<a href="Atomics.html">LLVM Atomic Instructions and Concurrency Guide</a>.</p> - -<dl> -<dt><code>unordered</code></dt> -<dd>The set of values that can be read is governed by the happens-before -partial order. A value cannot be read unless some operation wrote it. -This is intended to provide a guarantee strong enough to model Java's -non-volatile shared variables. This ordering cannot be specified for -read-modify-write operations; it is not strong enough to make them atomic -in any interesting way.</dd> -<dt><code>monotonic</code></dt> -<dd>In addition to the guarantees of <code>unordered</code>, there is a single -total order for modifications by <code>monotonic</code> operations on each -address. All modification orders must be compatible with the happens-before -order. There is no guarantee that the modification orders can be combined to -a global total order for the whole program (and this often will not be -possible). The read in an atomic read-modify-write operation -(<a href="#i_cmpxchg"><code>cmpxchg</code></a> and -<a href="#i_atomicrmw"><code>atomicrmw</code></a>) -reads the value in the modification order immediately before the value it -writes. If one atomic read happens before another atomic read of the same -address, the later read must see the same value or a later value in the -address's modification order. This disallows reordering of -<code>monotonic</code> (or stronger) operations on the same address. If an -address is written <code>monotonic</code>ally by one thread, and other threads -<code>monotonic</code>ally read that address repeatedly, the other threads must -eventually see the write. 
This corresponds to the C++0x/C1x -<code>memory_order_relaxed</code>.</dd> -<dt><code>acquire</code></dt> -<dd>In addition to the guarantees of <code>monotonic</code>, -a <i>synchronizes-with</i> edge may be formed with a <code>release</code> -operation. This is intended to model C++'s <code>memory_order_acquire</code>.</dd> -<dt><code>release</code></dt> -<dd>In addition to the guarantees of <code>monotonic</code>, if this operation -writes a value which is subsequently read by an <code>acquire</code> operation, -it <i>synchronizes-with</i> that operation. (This isn't a complete -description; see the C++0x definition of a release sequence.) This corresponds -to the C++0x/C1x <code>memory_order_release</code>.</dd> -<dt><code>acq_rel</code> (acquire+release)</dt><dd>Acts as both an -<code>acquire</code> and <code>release</code> operation on its address. -This corresponds to the C++0x/C1x <code>memory_order_acq_rel</code>.</dd> -<dt><code>seq_cst</code> (sequentially consistent)</dt><dd> -<dd>In addition to the guarantees of <code>acq_rel</code> -(<code>acquire</code> for an operation which only reads, <code>release</code> -for an operation which only writes), there is a global total order on all -sequentially-consistent operations on all addresses, which is consistent with -the <i>happens-before</i> partial order and with the modification orders of -all the affected addresses. Each sequentially-consistent read sees the last -preceding write to the same address in this global order. This corresponds -to the C++0x/C1x <code>memory_order_seq_cst</code> and Java volatile.</dd> -</dl> - -<p id="singlethread">If an atomic operation is marked <code>singlethread</code>, -it only <i>synchronizes with</i> or participates in modification and seq_cst -total orderings with other operations running in the same thread (for example, -in signal handlers).</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="typesystem">Type System</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>The LLVM type system is one of the most important features of the - intermediate representation. Being typed enables a number of optimizations - to be performed on the intermediate representation directly, without having - to do extra analyses on the side before the transformation. A strong type - system makes it easier to read the generated code and enables novel analyses - and transformations that are not feasible to perform on normal three address - code representations.</p> - -<!-- ======================================================================= --> -<h3> - <a name="t_classifications">Type Classifications</a> -</h3> - -<div> - -<p>The types fall into a few useful classifications:</p> - -<table border="1" cellspacing="0" cellpadding="4"> - <tbody> - <tr><th>Classification</th><th>Types</th></tr> - <tr> - <td><a href="#t_integer">integer</a></td> - <td><tt>i1, i2, i3, ... i8, ... i16, ... i32, ... i64, ... 
</tt></td> - </tr> - <tr> - <td><a href="#t_floating">floating point</a></td> - <td><tt>half, float, double, x86_fp80, fp128, ppc_fp128</tt></td> - </tr> - <tr> - <td><a name="t_firstclass">first class</a></td> - <td><a href="#t_integer">integer</a>, - <a href="#t_floating">floating point</a>, - <a href="#t_pointer">pointer</a>, - <a href="#t_vector">vector</a>, - <a href="#t_struct">structure</a>, - <a href="#t_array">array</a>, - <a href="#t_label">label</a>, - <a href="#t_metadata">metadata</a>. - </td> - </tr> - <tr> - <td><a href="#t_primitive">primitive</a></td> - <td><a href="#t_label">label</a>, - <a href="#t_void">void</a>, - <a href="#t_integer">integer</a>, - <a href="#t_floating">floating point</a>, - <a href="#t_x86mmx">x86mmx</a>, - <a href="#t_metadata">metadata</a>.</td> - </tr> - <tr> - <td><a href="#t_derived">derived</a></td> - <td><a href="#t_array">array</a>, - <a href="#t_function">function</a>, - <a href="#t_pointer">pointer</a>, - <a href="#t_struct">structure</a>, - <a href="#t_vector">vector</a>, - <a href="#t_opaque">opaque</a>. - </td> - </tr> - </tbody> -</table> - -<p>The <a href="#t_firstclass">first class</a> types are perhaps the most - important. Values of these types are the only ones which can be produced by - instructions.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="t_primitive">Primitive Types</a> -</h3> - -<div> - -<p>The primitive types are the fundamental building blocks of the LLVM - system.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="t_integer">Integer Type</a> -</h4> - -<div> - -<h5>Overview:</h5> -<p>The integer type is a very simple type that simply specifies an arbitrary - bit width for the integer type desired. Any bit width from 1 bit to - 2<sup>23</sup>-1 (about 8 million) can be specified.</p> - -<h5>Syntax:</h5> -<pre> - iN -</pre> - -<p>The number of bits the integer will occupy is specified by the <tt>N</tt> - value.</p> - -<h5>Examples:</h5> -<table class="layout"> - <tr class="layout"> - <td class="left"><tt>i1</tt></td> - <td class="left">a single-bit integer.</td> - </tr> - <tr class="layout"> - <td class="left"><tt>i32</tt></td> - <td class="left">a 32-bit integer.</td> - </tr> - <tr class="layout"> - <td class="left"><tt>i1942652</tt></td> - <td class="left">a really big integer of over 1 million bits.</td> - </tr> -</table> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="t_floating">Floating Point Types</a> -</h4> - -<div> - -<table> - <tbody> - <tr><th>Type</th><th>Description</th></tr> - <tr><td><tt>half</tt></td><td>16-bit floating point value</td></tr> - <tr><td><tt>float</tt></td><td>32-bit floating point value</td></tr> - <tr><td><tt>double</tt></td><td>64-bit floating point value</td></tr> - <tr><td><tt>fp128</tt></td><td>128-bit floating point value (112-bit mantissa)</td></tr> - <tr><td><tt>x86_fp80</tt></td><td>80-bit floating point value (X87)</td></tr> - <tr><td><tt>ppc_fp128</tt></td><td>128-bit floating point value (two 64-bits)</td></tr> - </tbody> -</table> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="t_x86mmx">X86mmx Type</a> -</h4> - -<div> - -<h5>Overview:</h5> -<p>The x86mmx type represents a value held in an MMX register on an x86 machine. The operations allowed on it are quite limited: parameters and return values, load and store, and bitcast. 
User-specified MMX instructions are represented as intrinsic or asm calls with arguments and/or results of this type. There are no arrays, vectors or constants of this type.</p> - -<h5>Syntax:</h5> -<pre> - x86mmx -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="t_void">Void Type</a> -</h4> - -<div> - -<h5>Overview:</h5> -<p>The void type does not represent any value and has no size.</p> - -<h5>Syntax:</h5> -<pre> - void -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="t_label">Label Type</a> -</h4> - -<div> - -<h5>Overview:</h5> -<p>The label type represents code labels.</p> - -<h5>Syntax:</h5> -<pre> - label -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="t_metadata">Metadata Type</a> -</h4> - -<div> - -<h5>Overview:</h5> -<p>The metadata type represents embedded metadata. No derived types may be - created from metadata except for <a href="#t_function">function</a> - arguments. - -<h5>Syntax:</h5> -<pre> - metadata -</pre> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="t_derived">Derived Types</a> -</h3> - -<div> - -<p>The real power in LLVM comes from the derived types in the system. This is - what allows a programmer to represent arrays, functions, pointers, and other - useful types. Each of these types contain one or more element types which - may be a primitive type, or another derived type. For example, it is - possible to have a two dimensional array, using an array as the element type - of another array.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="t_aggregate">Aggregate Types</a> -</h4> - -<div> - -<p>Aggregate Types are a subset of derived types that can contain multiple - member types. <a href="#t_array">Arrays</a> and - <a href="#t_struct">structs</a> are aggregate types. - <a href="#t_vector">Vectors</a> are not considered to be aggregate types.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="t_array">Array Type</a> -</h4> - -<div> - -<h5>Overview:</h5> -<p>The array type is a very simple derived type that arranges elements - sequentially in memory. 
The array type requires a size (number of elements) - and an underlying data type.</p> - -<h5>Syntax:</h5> -<pre> - [<# elements> x <elementtype>] -</pre> - -<p>The number of elements is a constant integer value; <tt>elementtype</tt> may - be any type with a size.</p> - -<h5>Examples:</h5> -<table class="layout"> - <tr class="layout"> - <td class="left"><tt>[40 x i32]</tt></td> - <td class="left">Array of 40 32-bit integer values.</td> - </tr> - <tr class="layout"> - <td class="left"><tt>[41 x i32]</tt></td> - <td class="left">Array of 41 32-bit integer values.</td> - </tr> - <tr class="layout"> - <td class="left"><tt>[4 x i8]</tt></td> - <td class="left">Array of 4 8-bit integer values.</td> - </tr> -</table> -<p>Here are some examples of multidimensional arrays:</p> -<table class="layout"> - <tr class="layout"> - <td class="left"><tt>[3 x [4 x i32]]</tt></td> - <td class="left">3x4 array of 32-bit integer values.</td> - </tr> - <tr class="layout"> - <td class="left"><tt>[12 x [10 x float]]</tt></td> - <td class="left">12x10 array of single precision floating point values.</td> - </tr> - <tr class="layout"> - <td class="left"><tt>[2 x [3 x [4 x i16]]]</tt></td> - <td class="left">2x3x4 array of 16-bit integer values.</td> - </tr> -</table> - -<p>There is no restriction on indexing beyond the end of the array implied by - a static type (though there are restrictions on indexing beyond the bounds - of an allocated object in some cases). This means that single-dimension - 'variable sized array' addressing can be implemented in LLVM with a zero - length array type. An implementation of 'pascal style arrays' in LLVM could - use the type "<tt>{ i32, [0 x float]}</tt>", for example.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="t_function">Function Type</a> -</h4> - -<div> - -<h5>Overview:</h5> -<p>The function type can be thought of as a function signature. It consists of - a return type and a list of formal parameter types. The return type of a - function type is a first class type or a void type.</p> - -<h5>Syntax:</h5> -<pre> - <returntype> (<parameter list>) -</pre> - -<p>...where '<tt><parameter list></tt>' is a comma-separated list of type - specifiers. Optionally, the parameter list may include a type <tt>...</tt>, - which indicates that the function takes a variable number of arguments. - Variable argument functions can access their arguments with - the <a href="#int_varargs">variable argument handling intrinsic</a> - functions. '<tt><returntype></tt>' is any type except - <a href="#t_label">label</a>.</p> - -<h5>Examples:</h5> -<table class="layout"> - <tr class="layout"> - <td class="left"><tt>i32 (i32)</tt></td> - <td class="left">function taking an <tt>i32</tt>, returning an <tt>i32</tt> - </td> - </tr><tr class="layout"> - <td class="left"><tt>float (i16, i32 *) * - </tt></td> - <td class="left"><a href="#t_pointer">Pointer</a> to a function that takes - an <tt>i16</tt> and a <a href="#t_pointer">pointer</a> to <tt>i32</tt>, - returning <tt>float</tt>. - </td> - </tr><tr class="layout"> - <td class="left"><tt>i32 (i8*, ...)</tt></td> - <td class="left">A vararg function that takes at least one - <a href="#t_pointer">pointer</a> to <tt>i8 </tt> (char in C), - which returns an integer. This is the signature for <tt>printf</tt> in - LLVM. 
- </td> - </tr><tr class="layout"> - <td class="left"><tt>{i32, i32} (i32)</tt></td> - <td class="left">A function taking an <tt>i32</tt>, returning a - <a href="#t_struct">structure</a> containing two <tt>i32</tt> values - </td> - </tr> -</table> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="t_struct">Structure Type</a> -</h4> - -<div> - -<h5>Overview:</h5> -<p>The structure type is used to represent a collection of data members together - in memory. The elements of a structure may be any type that has a size.</p> - -<p>Structures in memory are accessed using '<tt><a href="#i_load">load</a></tt>' - and '<tt><a href="#i_store">store</a></tt>' by getting a pointer to a field - with the '<tt><a href="#i_getelementptr">getelementptr</a></tt>' instruction. - Structures in registers are accessed using the - '<tt><a href="#i_extractvalue">extractvalue</a></tt>' and - '<tt><a href="#i_insertvalue">insertvalue</a></tt>' instructions.</p> - -<p>Structures may optionally be "packed" structures, which indicate that the - alignment of the struct is one byte, and that there is no padding between - the elements. In non-packed structs, padding between field types is inserted - as defined by the DataLayout string in the module, which is required to match - what the underlying code generator expects.</p> - -<p>Structures can either be "literal" or "identified". A literal structure is - defined inline with other types (e.g. <tt>{i32, i32}*</tt>) whereas identified - types are always defined at the top level with a name. Literal types are - uniqued by their contents and can never be recursive or opaque since there is - no way to write one. Identified types can be recursive, can be opaqued, and are - never uniqued. -</p> - -<h5>Syntax:</h5> -<pre> - %T1 = type { <type list> } <i>; Identified normal struct type</i> - %T2 = type <{ <type list> }> <i>; Identified packed struct type</i> -</pre> - -<h5>Examples:</h5> -<table class="layout"> - <tr class="layout"> - <td class="left"><tt>{ i32, i32, i32 }</tt></td> - <td class="left">A triple of three <tt>i32</tt> values</td> - </tr> - <tr class="layout"> - <td class="left"><tt>{ float, i32 (i32) * }</tt></td> - <td class="left">A pair, where the first element is a <tt>float</tt> and the - second element is a <a href="#t_pointer">pointer</a> to a - <a href="#t_function">function</a> that takes an <tt>i32</tt>, returning - an <tt>i32</tt>.</td> - </tr> - <tr class="layout"> - <td class="left"><tt><{ i8, i32 }></tt></td> - <td class="left">A packed struct known to be 5 bytes in size.</td> - </tr> -</table> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="t_opaque">Opaque Structure Types</a> -</h4> - -<div> - -<h5>Overview:</h5> -<p>Opaque structure types are used to represent named structure types that do - not have a body specified. This corresponds (for example) to the C notion of - a forward declared structure.</p> - -<h5>Syntax:</h5> -<pre> - %X = type opaque - %52 = type opaque -</pre> - -<h5>Examples:</h5> -<table class="layout"> - <tr class="layout"> - <td class="left"><tt>opaque</tt></td> - <td class="left">An opaque type.</td> - </tr> -</table> - -</div> - - - -<!-- _______________________________________________________________________ --> -<h4> - <a name="t_pointer">Pointer Type</a> -</h4> - -<div> - -<h5>Overview:</h5> -<p>The pointer type is used to specify memory locations. 
- Pointers are commonly used to reference objects in memory.</p> - -<p>Pointer types may have an optional address space attribute defining the - numbered address space where the pointed-to object resides. The default - address space is number zero. The semantics of non-zero address - spaces are target-specific.</p> - -<p>Note that LLVM does not permit pointers to void (<tt>void*</tt>) nor does it - permit pointers to labels (<tt>label*</tt>). Use <tt>i8*</tt> instead.</p> - -<h5>Syntax:</h5> -<pre> - <type> * -</pre> - -<h5>Examples:</h5> -<table class="layout"> - <tr class="layout"> - <td class="left"><tt>[4 x i32]*</tt></td> - <td class="left">A <a href="#t_pointer">pointer</a> to <a - href="#t_array">array</a> of four <tt>i32</tt> values.</td> - </tr> - <tr class="layout"> - <td class="left"><tt>i32 (i32*) *</tt></td> - <td class="left"> A <a href="#t_pointer">pointer</a> to a <a - href="#t_function">function</a> that takes an <tt>i32*</tt>, returning an - <tt>i32</tt>.</td> - </tr> - <tr class="layout"> - <td class="left"><tt>i32 addrspace(5)*</tt></td> - <td class="left">A <a href="#t_pointer">pointer</a> to an <tt>i32</tt> value - that resides in address space #5.</td> - </tr> -</table> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="t_vector">Vector Type</a> -</h4> - -<div> - -<h5>Overview:</h5> -<p>A vector type is a simple derived type that represents a vector of elements. - Vector types are used when multiple primitive data are operated in parallel - using a single instruction (SIMD). A vector type requires a size (number of - elements) and an underlying primitive data type. Vector types are considered - <a href="#t_firstclass">first class</a>.</p> - -<h5>Syntax:</h5> -<pre> - < <# elements> x <elementtype> > -</pre> - -<p>The number of elements is a constant integer value larger than 0; elementtype - may be any integer or floating point type, or a pointer to these types. - Vectors of size zero are not allowed. </p> - -<h5>Examples:</h5> -<table class="layout"> - <tr class="layout"> - <td class="left"><tt><4 x i32></tt></td> - <td class="left">Vector of 4 32-bit integer values.</td> - </tr> - <tr class="layout"> - <td class="left"><tt><8 x float></tt></td> - <td class="left">Vector of 8 32-bit floating-point values.</td> - </tr> - <tr class="layout"> - <td class="left"><tt><2 x i64></tt></td> - <td class="left">Vector of 2 64-bit integer values.</td> - </tr> - <tr class="layout"> - <td class="left"><tt><4 x i64*></tt></td> - <td class="left">Vector of 4 pointers to 64-bit integer values.</td> - </tr> -</table> - -</div> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="constants">Constants</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>LLVM has several different basic types of constants. This section describes - them all and their syntax.</p> - -<!-- ======================================================================= --> -<h3> - <a name="simpleconstants">Simple Constants</a> -</h3> - -<div> - -<dl> - <dt><b>Boolean constants</b></dt> - <dd>The two strings '<tt>true</tt>' and '<tt>false</tt>' are both valid - constants of the <tt><a href="#t_integer">i1</a></tt> type.</dd> - - <dt><b>Integer constants</b></dt> - <dd>Standard integers (such as '4') are constants of - the <a href="#t_integer">integer</a> type. 
Negative numbers may be used - with integer types.</dd> - - <dt><b>Floating point constants</b></dt> - <dd>Floating point constants use standard decimal notation (e.g. 123.421), - exponential notation (e.g. 1.23421e+2), or a more precise hexadecimal - notation (see below). The assembler requires the exact decimal value of a - floating-point constant. For example, the assembler accepts 1.25 but - rejects 1.3 because 1.3 is a repeating decimal in binary. Floating point - constants must have a <a href="#t_floating">floating point</a> type. </dd> - - <dt><b>Null pointer constants</b></dt> - <dd>The identifier '<tt>null</tt>' is recognized as a null pointer constant - and must be of <a href="#t_pointer">pointer type</a>.</dd> -</dl> - -<p>The one non-intuitive notation for constants is the hexadecimal form of - floating point constants. For example, the form '<tt>double - 0x432ff973cafa8000</tt>' is equivalent to (but harder to read than) - '<tt>double 4.5e+15</tt>'. The only time hexadecimal floating point - constants are required (and the only time that they are generated by the - disassembler) is when a floating point constant must be emitted but it cannot - be represented as a decimal floating point number in a reasonable number of - digits. For example, NaNs, infinities, and other special values are - represented in their IEEE hexadecimal format so that assembly and disassembly - do not cause any bits to change in the constants.</p> - -<p>When using the hexadecimal form, constants of types half, float, and double are - represented using the 16-digit form shown above (which matches the IEEE 754 - representation for double); half and float values must, however, be exactly - representable as IEEE 754 half and single precision, respectively. - Hexadecimal format is always used - for long double, and there are three forms of long double. The 80-bit format - used by x86 is represented as <tt>0xK</tt> followed by 20 hexadecimal digits. - The 128-bit format used by PowerPC (two adjacent doubles) is represented - by <tt>0xM</tt> followed by 32 hexadecimal digits. The IEEE 128-bit format - is represented by <tt>0xL</tt> followed by 32 hexadecimal digits; no - currently supported target uses this format. Long doubles will only work if - they match the long double format on your target. The IEEE 16-bit format - (half precision) is represented by <tt>0xH</tt> followed by 4 hexadecimal - digits. All hexadecimal formats are big-endian (sign bit at the left).</p> - -<p>There are no constants of type x86mmx.</p> -</div> - -<!-- ======================================================================= --> -<h3> -<a name="aggregateconstants"></a> <!-- old anchor --> -<a name="complexconstants">Complex Constants</a> -</h3> - -<div> - -<p>Complex constants are a (potentially recursive) combination of simple - constants and smaller complex constants.</p> - -<dl> - <dt><b>Structure constants</b></dt> - <dd>Structure constants are represented with notation similar to structure - type definitions (a comma separated list of elements, surrounded by braces - (<tt>{}</tt>)). For example: "<tt>{ i32 4, float 17.0, i32* @G }</tt>", - where "<tt>@G</tt>" is declared as "<tt>@G = external global i32</tt>". 
- Structure constants must have <a href="#t_struct">structure type</a>, and - the number and types of elements must match those specified by the - type.</dd> - - <dt><b>Array constants</b></dt> - <dd>Array constants are represented with notation similar to array type - definitions (a comma separated list of elements, surrounded by square - brackets (<tt>[]</tt>)). For example: "<tt>[ i32 42, i32 11, i32 74 - ]</tt>". Array constants must have <a href="#t_array">array type</a>, and - the number and types of elements must match those specified by the - type.</dd> - - <dt><b>Vector constants</b></dt> - <dd>Vector constants are represented with notation similar to vector type - definitions (a comma separated list of elements, surrounded by - less-than/greater-than's (<tt><></tt>)). For example: "<tt>< i32 - 42, i32 11, i32 74, i32 100 ></tt>". Vector constants must - have <a href="#t_vector">vector type</a>, and the number and types of - elements must match those specified by the type.</dd> - - <dt><b>Zero initialization</b></dt> - <dd>The string '<tt>zeroinitializer</tt>' can be used to zero initialize a - value to zero of <em>any</em> type, including scalar and - <a href="#t_aggregate">aggregate</a> types. - This is often used to avoid having to print large zero initializers - (e.g. for large arrays) and is always exactly equivalent to using explicit - zero initializers.</dd> - - <dt><b>Metadata node</b></dt> - <dd>A metadata node is a structure-like constant with - <a href="#t_metadata">metadata type</a>. For example: "<tt>metadata !{ - i32 0, metadata !"test" }</tt>". Unlike other constants that are meant to - be interpreted as part of the instruction stream, metadata is a place to - attach additional information such as debug info.</dd> -</dl> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="globalconstants">Global Variable and Function Addresses</a> -</h3> - -<div> - -<p>The addresses of <a href="#globalvars">global variables</a> - and <a href="#functionstructure">functions</a> are always implicitly valid - (link-time) constants. These constants are explicitly referenced when - the <a href="#identifiers">identifier for the global</a> is used and always - have <a href="#t_pointer">pointer</a> type. For example, the following is a - legal LLVM file:</p> - -<pre class="doc_code"> -@X = global i32 17 -@Y = global i32 42 -@Z = global [2 x i32*] [ i32* @X, i32* @Y ] -</pre> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="undefvalues">Undefined Values</a> -</h3> - -<div> - -<p>The string '<tt>undef</tt>' can be used anywhere a constant is expected, and - indicates that the user of the value may receive an unspecified bit-pattern. - Undefined values may be of any type (other than '<tt>label</tt>' - or '<tt>void</tt>') and be used anywhere a constant is permitted.</p> - -<p>Undefined values are useful because they indicate to the compiler that the - program is well defined no matter what value is used. This gives the - compiler more freedom to optimize. Here are some examples of (potentially - surprising) transformations that are valid (in pseudo IR):</p> - - -<pre class="doc_code"> - %A = add %X, undef - %B = sub %X, undef - %C = xor %X, undef -Safe: - %A = undef - %B = undef - %C = undef -</pre> - -<p>This is safe because all of the output bits are affected by the undef bits. 
- Any output bit can have a zero or one depending on the input bits.</p> - -<pre class="doc_code"> - %A = or %X, undef - %B = and %X, undef -Safe: - %A = -1 - %B = 0 -Unsafe: - %A = undef - %B = undef -</pre> - -<p>These logical operations have bits that are not always affected by the input. - For example, if <tt>%X</tt> has a zero bit, then the output of the - '<tt>and</tt>' operation will always be a zero for that bit, no matter what - the corresponding bit from the '<tt>undef</tt>' is. As such, it is unsafe to - optimize or assume that the result of the '<tt>and</tt>' is '<tt>undef</tt>'. - However, it is safe to assume that all bits of the '<tt>undef</tt>' could be - 0, and optimize the '<tt>and</tt>' to 0. Likewise, it is safe to assume that - all the bits of the '<tt>undef</tt>' operand to the '<tt>or</tt>' could be - set, allowing the '<tt>or</tt>' to be folded to -1.</p> - -<pre class="doc_code"> - %A = select undef, %X, %Y - %B = select undef, 42, %Y - %C = select %X, %Y, undef -Safe: - %A = %X (or %Y) - %B = 42 (or %Y) - %C = %Y -Unsafe: - %A = undef - %B = undef - %C = undef -</pre> - -<p>This set of examples shows that undefined '<tt>select</tt>' (and conditional - branch) conditions can go <em>either way</em>, but they have to come from one - of the two operands. In the <tt>%A</tt> example, if <tt>%X</tt> and - <tt>%Y</tt> were both known to have a clear low bit, then <tt>%A</tt> would - have to have a cleared low bit. However, in the <tt>%C</tt> example, the - optimizer is allowed to assume that the '<tt>undef</tt>' operand could be the - same as <tt>%Y</tt>, allowing the whole '<tt>select</tt>' to be - eliminated.</p> - -<pre class="doc_code"> - %A = xor undef, undef - - %B = undef - %C = xor %B, %B - - %D = undef - %E = icmp lt %D, 4 - %F = icmp gte %D, 4 - -Safe: - %A = undef - %B = undef - %C = undef - %D = undef - %E = undef - %F = undef -</pre> - -<p>This example points out that two '<tt>undef</tt>' operands are not - necessarily the same. This can be surprising to people (and also matches C - semantics) where they assume that "<tt>X^X</tt>" is always zero, even - if <tt>X</tt> is undefined. This isn't true for a number of reasons, but the - short answer is that an '<tt>undef</tt>' "variable" can arbitrarily change - its value over its "live range". This is true because the variable doesn't - actually <em>have a live range</em>. Instead, the value is logically read - from arbitrary registers that happen to be around when needed, so the value - is not necessarily consistent over time. In fact, <tt>%A</tt> and <tt>%C</tt> - need to have the same semantics or the core LLVM "replace all uses with" - concept would not hold.</p> - -<pre class="doc_code"> - %A = fdiv undef, %X - %B = fdiv %X, undef -Safe: - %A = undef -b: unreachable -</pre> - -<p>These examples show the crucial difference between an <em>undefined - value</em> and <em>undefined behavior</em>. An undefined value (like - '<tt>undef</tt>') is allowed to have an arbitrary bit-pattern. This means that - the <tt>%A</tt> operation can be constant folded to '<tt>undef</tt>', because - the '<tt>undef</tt>' could be an SNaN, and <tt>fdiv</tt> is not (currently) - defined on SNaN's. However, in the second example, we can make a more - aggressive assumption: because the <tt>undef</tt> is allowed to be an - arbitrary value, we are allowed to assume that it could be zero. Since a - divide by zero has <em>undefined behavior</em>, we are allowed to assume that - the operation does not execute at all. 
This allows us to delete the divide and - all code after it. Because the undefined operation "can't happen", the - optimizer can assume that it occurs in dead code.</p> - -<pre class="doc_code"> -a: store undef -> %X -b: store %X -> undef -Safe: -a: <deleted> -b: unreachable -</pre> - -<p>These examples reiterate the <tt>fdiv</tt> example: a store <em>of</em> an - undefined value can be assumed to not have any effect; we can assume that the - value is overwritten with bits that happen to match what was already there. - However, a store <em>to</em> an undefined location could clobber arbitrary - memory, therefore, it has undefined behavior.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="poisonvalues">Poison Values</a> -</h3> - -<div> - -<p>Poison values are similar to <a href="#undefvalues">undef values</a>, however - they also represent the fact that an instruction or constant expression which - cannot evoke side effects has nevertheless detected a condition which results - in undefined behavior.</p> - -<p>There is currently no way of representing a poison value in the IR; they - only exist when produced by operations such as - <a href="#i_add"><tt>add</tt></a> with the <tt>nsw</tt> flag.</p> - -<p>Poison value behavior is defined in terms of value <i>dependence</i>:</p> - -<ul> -<li>Values other than <a href="#i_phi"><tt>phi</tt></a> nodes depend on - their operands.</li> - -<li><a href="#i_phi"><tt>Phi</tt></a> nodes depend on the operand corresponding - to their dynamic predecessor basic block.</li> - -<li>Function arguments depend on the corresponding actual argument values in - the dynamic callers of their functions.</li> - -<li><a href="#i_call"><tt>Call</tt></a> instructions depend on the - <a href="#i_ret"><tt>ret</tt></a> instructions that dynamically transfer - control back to them.</li> - -<li><a href="#i_invoke"><tt>Invoke</tt></a> instructions depend on the - <a href="#i_ret"><tt>ret</tt></a>, <a href="#i_resume"><tt>resume</tt></a>, - or exception-throwing call instructions that dynamically transfer control - back to them.</li> - -<li>Non-volatile loads and stores depend on the most recent stores to all of the - referenced memory addresses, following the order in the IR - (including loads and stores implied by intrinsics such as - <a href="#int_memcpy"><tt>@llvm.memcpy</tt></a>.)</li> - -<!-- TODO: In the case of multiple threads, this only applies if the store - "happens-before" the load or store. --> - -<!-- TODO: floating-point exception state --> - -<li>An instruction with externally visible side effects depends on the most - recent preceding instruction with externally visible side effects, following - the order in the IR. 
(This includes - <a href="#volatile">volatile operations</a>.)</li> - -<li>An instruction <i>control-depends</i> on a - <a href="#terminators">terminator instruction</a> - if the terminator instruction has multiple successors and the instruction - is always executed when control transfers to one of the successors, and - may not be executed when control is transferred to another.</li> - -<li>Additionally, an instruction also <i>control-depends</i> on a terminator - instruction if the set of instructions it otherwise depends on would be - different if the terminator had transferred control to a different - successor.</li> - -<li>Dependence is transitive.</li> - -</ul> - -<p>Poison Values have the same behavior as <a href="#undefvalues">undef values</a>, - with the additional effect that any instruction which has a <i>dependence</i> - on a poison value has undefined behavior.</p> - -<p>Here are some examples:</p> - -<pre class="doc_code"> -entry: - %poison = sub nuw i32 0, 1 ; Results in a poison value. - %still_poison = and i32 %poison, 0 ; 0, but also poison. - %poison_yet_again = getelementptr i32* @h, i32 %still_poison - store i32 0, i32* %poison_yet_again ; memory at @h[0] is poisoned - - store i32 %poison, i32* @g ; Poison value stored to memory. - %poison2 = load i32* @g ; Poison value loaded back from memory. - - store volatile i32 %poison, i32* @g ; External observation; undefined behavior. - - %narrowaddr = bitcast i32* @g to i16* - %wideaddr = bitcast i32* @g to i64* - %poison3 = load i16* %narrowaddr ; Returns a poison value. - %poison4 = load i64* %wideaddr ; Returns a poison value. - - %cmp = icmp slt i32 %poison, 0 ; Returns a poison value. - br i1 %cmp, label %true, label %end ; Branch to either destination. - -true: - store volatile i32 0, i32* @g ; This is control-dependent on %cmp, so - ; it has undefined behavior. - br label %end - -end: - %p = phi i32 [ 0, %entry ], [ 1, %true ] - ; Both edges into this PHI are - ; control-dependent on %cmp, so this - ; always results in a poison value. - - store volatile i32 0, i32* @g ; This would depend on the store in %true - ; if %cmp is true, or the store in %entry - ; otherwise, so this is undefined behavior. - - br i1 %cmp, label %second_true, label %second_end - ; The same branch again, but this time the - ; true block doesn't have side effects. - -second_true: - ; No side effects! - ret void - -second_end: - store volatile i32 0, i32* @g ; This time, the instruction always depends - ; on the store in %end. Also, it is - ; control-equivalent to %end, so this is - ; well-defined (ignoring earlier undefined - ; behavior in this example). -</pre> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="blockaddress">Addresses of Basic Blocks</a> -</h3> - -<div> - -<p><b><tt>blockaddress(@function, %block)</tt></b></p> - -<p>The '<tt>blockaddress</tt>' constant computes the address of the specified - basic block in the specified function, and always has an i8* type. Taking - the address of the entry block is illegal.</p> - -<p>This value only has defined behavior when used as an operand to the - '<a href="#i_indirectbr"><tt>indirectbr</tt></a>' instruction, or for - comparisons against null. Pointer equality tests between label addresses - result in undefined behavior — though, again, comparison against null - is ok, and no label is equal to the null pointer. This may be passed around - as an opaque pointer sized value as long as the bits are not inspected. 
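-<p>For illustration only, here is a minimal sketch of taking block addresses and branching through one of them with '<tt>indirectbr</tt>' (the function and block names are invented for this example):</p> - -<pre class="doc_code"> -define i32 @example(i1 %which) { -entry: -  ; Pick one of two block addresses; both blocks are listed in the -  ; indirectbr destination list below.  The entry block's address is -  ; never taken. -  %dest = select i1 %which, i8* blockaddress(@example, %one), i8* blockaddress(@example, %two) -  indirectbr i8* %dest, [ label %one, label %two ] - -one: -  ret i32 1 - -two: -  ret i32 2 -} -</pre> - -<p>As noted above, such a value may also be passed around as an opaque pointer-sized quantity as long as its bits are not inspected.</p>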
This - allows <tt>ptrtoint</tt> and arithmetic to be performed on these values so - long as the original value is reconstituted before the <tt>indirectbr</tt> - instruction.</p> - -<p>Finally, some targets may provide defined semantics when using the value as - the operand to an inline assembly, but that is target specific.</p> - -</div> - - -<!-- ======================================================================= --> -<h3> - <a name="constantexprs">Constant Expressions</a> -</h3> - -<div> - -<p>Constant expressions are used to allow expressions involving other constants - to be used as constants. Constant expressions may be of - any <a href="#t_firstclass">first class</a> type and may involve any LLVM - operation that does not have side effects (e.g. load and call are not - supported). The following is the syntax for constant expressions:</p> - -<dl> - <dt><b><tt>trunc (CST to TYPE)</tt></b></dt> - <dd>Truncate a constant to another type. The bit size of CST must be larger - than the bit size of TYPE. Both types must be integers.</dd> - - <dt><b><tt>zext (CST to TYPE)</tt></b></dt> - <dd>Zero extend a constant to another type. The bit size of CST must be - smaller than the bit size of TYPE. Both types must be integers.</dd> - - <dt><b><tt>sext (CST to TYPE)</tt></b></dt> - <dd>Sign extend a constant to another type. The bit size of CST must be - smaller than the bit size of TYPE. Both types must be integers.</dd> - - <dt><b><tt>fptrunc (CST to TYPE)</tt></b></dt> - <dd>Truncate a floating point constant to another floating point type. The - size of CST must be larger than the size of TYPE. Both types must be - floating point.</dd> - - <dt><b><tt>fpext (CST to TYPE)</tt></b></dt> - <dd>Floating point extend a constant to another type. The size of CST must be - smaller or equal to the size of TYPE. Both types must be floating - point.</dd> - - <dt><b><tt>fptoui (CST to TYPE)</tt></b></dt> - <dd>Convert a floating point constant to the corresponding unsigned integer - constant. TYPE must be a scalar or vector integer type. CST must be of - scalar or vector floating point type. Both CST and TYPE must be scalars, - or vectors of the same number of elements. If the value won't fit in the - integer type, the results are undefined.</dd> - - <dt><b><tt>fptosi (CST to TYPE)</tt></b></dt> - <dd>Convert a floating point constant to the corresponding signed integer - constant. TYPE must be a scalar or vector integer type. CST must be of - scalar or vector floating point type. Both CST and TYPE must be scalars, - or vectors of the same number of elements. If the value won't fit in the - integer type, the results are undefined.</dd> - - <dt><b><tt>uitofp (CST to TYPE)</tt></b></dt> - <dd>Convert an unsigned integer constant to the corresponding floating point - constant. TYPE must be a scalar or vector floating point type. CST must be - of scalar or vector integer type. Both CST and TYPE must be scalars, or - vectors of the same number of elements. If the value won't fit in the - floating point type, the results are undefined.</dd> - - <dt><b><tt>sitofp (CST to TYPE)</tt></b></dt> - <dd>Convert a signed integer constant to the corresponding floating point - constant. TYPE must be a scalar or vector floating point type. CST must be - of scalar or vector integer type. Both CST and TYPE must be scalars, or - vectors of the same number of elements. 
If the value won't fit in the - floating point type, the results are undefined.</dd> - - <dt><b><tt>ptrtoint (CST to TYPE)</tt></b></dt> - <dd>Convert a pointer typed constant to the corresponding integer constant - <tt>TYPE</tt> must be an integer type. <tt>CST</tt> must be of pointer - type. The <tt>CST</tt> value is zero extended, truncated, or unchanged to - make it fit in <tt>TYPE</tt>.</dd> - - <dt><b><tt>inttoptr (CST to TYPE)</tt></b></dt> - <dd>Convert an integer constant to a pointer constant. TYPE must be a pointer - type. CST must be of integer type. The CST value is zero extended, - truncated, or unchanged to make it fit in a pointer size. This one is - <i>really</i> dangerous!</dd> - - <dt><b><tt>bitcast (CST to TYPE)</tt></b></dt> - <dd>Convert a constant, CST, to another TYPE. The constraints of the operands - are the same as those for the <a href="#i_bitcast">bitcast - instruction</a>.</dd> - - <dt><b><tt>getelementptr (CSTPTR, IDX0, IDX1, ...)</tt></b></dt> - <dt><b><tt>getelementptr inbounds (CSTPTR, IDX0, IDX1, ...)</tt></b></dt> - <dd>Perform the <a href="#i_getelementptr">getelementptr operation</a> on - constants. As with the <a href="#i_getelementptr">getelementptr</a> - instruction, the index list may have zero or more indexes, which are - required to make sense for the type of "CSTPTR".</dd> - - <dt><b><tt>select (COND, VAL1, VAL2)</tt></b></dt> - <dd>Perform the <a href="#i_select">select operation</a> on constants.</dd> - - <dt><b><tt>icmp COND (VAL1, VAL2)</tt></b></dt> - <dd>Performs the <a href="#i_icmp">icmp operation</a> on constants.</dd> - - <dt><b><tt>fcmp COND (VAL1, VAL2)</tt></b></dt> - <dd>Performs the <a href="#i_fcmp">fcmp operation</a> on constants.</dd> - - <dt><b><tt>extractelement (VAL, IDX)</tt></b></dt> - <dd>Perform the <a href="#i_extractelement">extractelement operation</a> on - constants.</dd> - - <dt><b><tt>insertelement (VAL, ELT, IDX)</tt></b></dt> - <dd>Perform the <a href="#i_insertelement">insertelement operation</a> on - constants.</dd> - - <dt><b><tt>shufflevector (VEC1, VEC2, IDXMASK)</tt></b></dt> - <dd>Perform the <a href="#i_shufflevector">shufflevector operation</a> on - constants.</dd> - - <dt><b><tt>extractvalue (VAL, IDX0, IDX1, ...)</tt></b></dt> - <dd>Perform the <a href="#i_extractvalue">extractvalue operation</a> on - constants. The index list is interpreted in a similar manner as indices in - a '<a href="#i_getelementptr">getelementptr</a>' operation. At least one - index value must be specified.</dd> - - <dt><b><tt>insertvalue (VAL, ELT, IDX0, IDX1, ...)</tt></b></dt> - <dd>Perform the <a href="#i_insertvalue">insertvalue operation</a> on - constants. The index list is interpreted in a similar manner as indices in - a '<a href="#i_getelementptr">getelementptr</a>' operation. At least one - index value must be specified.</dd> - - <dt><b><tt>OPCODE (LHS, RHS)</tt></b></dt> - <dd>Perform the specified operation of the LHS and RHS constants. OPCODE may - be any of the <a href="#binaryops">binary</a> - or <a href="#bitwiseops">bitwise binary</a> operations. The constraints - on operands are the same as those for the corresponding instruction - (e.g. 
no bitwise operations on floating point values are allowed).</dd> -</dl> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="othervalues">Other Values</a></h2> -<!-- *********************************************************************** --> -<div> -<!-- ======================================================================= --> -<h3> -<a name="inlineasm">Inline Assembler Expressions</a> -</h3> - -<div> - -<p>LLVM supports inline assembler expressions (as opposed - to <a href="#moduleasm">Module-Level Inline Assembly</a>) through the use of - a special value. This value represents the inline assembler as a string - (containing the instructions to emit), a list of operand constraints (stored - as a string), a flag that indicates whether or not the inline asm - expression has side effects, and a flag indicating whether the function - containing the asm needs to align its stack conservatively. An example - inline assembler expression is:</p> - -<pre class="doc_code"> -i32 (i32) asm "bswap $0", "=r,r" -</pre> - -<p>Inline assembler expressions may <b>only</b> be used as the callee operand of - a <a href="#i_call"><tt>call</tt></a> or an - <a href="#i_invoke"><tt>invoke</tt></a> instruction. - Thus, typically we have:</p> - -<pre class="doc_code"> -%X = call i32 asm "<a href="#int_bswap">bswap</a> $0", "=r,r"(i32 %Y) -</pre> - -<p>Inline asms with side effects not visible in the constraint list must be - marked as having side effects. This is done through the use of the - '<tt>sideeffect</tt>' keyword, like so:</p> - -<pre class="doc_code"> -call void asm sideeffect "eieio", ""() -</pre> - -<p>In some cases inline asms will contain code that will not work unless the - stack is aligned in some way, such as calls or SSE instructions on x86, - yet will not contain code that does that alignment within the asm. - The compiler should make conservative assumptions about what the asm might - contain and should generate its usual stack alignment code in the prologue - if the '<tt>alignstack</tt>' keyword is present:</p> - -<pre class="doc_code"> -call void asm alignstack "eieio", ""() -</pre> - -<p>Inline asms also support using non-standard assembly dialects. The assumed - dialect is ATT. When the '<tt>inteldialect</tt>' keyword is present, the - inline asm is using the Intel dialect. Currently, ATT and Intel are the - only supported dialects. An example is:</p> - -<pre class="doc_code"> -call void asm inteldialect "eieio", ""() -</pre> - -<p>If multiple keywords appear the '<tt>sideeffect</tt>' keyword must come - first, the '<tt>alignstack</tt>' keyword second and the - '<tt>inteldialect</tt>' keyword last.</p> - -<!-- -<p>TODO: The format of the asm and constraints string still need to be - documented here. Constraints on what can be done (e.g. duplication, moving, - etc need to be documented). This is probably best done by reference to - another document that covers inline asm from a holistic perspective.</p> - --> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="inlineasm_md">Inline Asm Metadata</a> -</h4> - -<div> - -<p>The call instructions that wrap inline asm nodes may have a - "<tt>!srcloc</tt>" MDNode attached to it that contains a list of constant - integers. If present, the code generator will use the integer as the - location cookie value when report errors through the <tt>LLVMContext</tt> - error reporting mechanisms. 
This allows a front-end to correlate backend - errors that occur with inline asm back to the source code that produced it. - For example:</p> - -<pre class="doc_code"> -call void asm sideeffect "something bad", ""()<b>, !srcloc !42</b> -... -!42 = !{ i32 1234567 } -</pre> - -<p>It is up to the front-end to make sense of the magic numbers it places in the - IR. If the MDNode contains multiple constants, the code generator will use - the one that corresponds to the line of the asm that the error occurs on.</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="metadata">Metadata Nodes and Metadata Strings</a> -</h3> - -<div> - -<p>LLVM IR allows metadata to be attached to instructions in the program that - can convey extra information about the code to the optimizers and code - generator. One example application of metadata is source-level debug - information. There are two metadata primitives: strings and nodes. All - metadata has the <tt>metadata</tt> type and is identified in syntax by a - preceding exclamation point ('<tt>!</tt>').</p> - -<p>A metadata string is a string surrounded by double quotes. It can contain - any character by escaping non-printable characters with "<tt>\xx</tt>" where - "<tt>xx</tt>" is the two digit hex code. For example: - "<tt>!"test\00"</tt>".</p> - -<p>Metadata nodes are represented with notation similar to structure constants - (a comma separated list of elements, surrounded by braces and preceded by an - exclamation point). Metadata nodes can have any values as their operands. For - example:</p> - -<div class="doc_code"> -<pre> -!{ metadata !"test\00", i32 10} -</pre> -</div> - -<p>A <a href="#namedmetadatastructure">named metadata</a> is a collection of - metadata nodes, which can be looked up in the module symbol table. For - example:</p> - -<div class="doc_code"> -<pre> -!foo = metadata !{!4, !3} -</pre> -</div> - -<p>Metadata can be used as function arguments. Here the <tt>llvm.dbg.value</tt> - function is using two metadata arguments:</p> - -<div class="doc_code"> -<pre> -call void @llvm.dbg.value(metadata !24, i64 0, metadata !25) -</pre> -</div> - -<p>Metadata can be attached to an instruction. Here metadata <tt>!21</tt> is - attached to the <tt>add</tt> instruction using the <tt>!dbg</tt> - identifier:</p> - -<div class="doc_code"> -<pre> -%indvar.next = add i64 %indvar, 1, !dbg !21 -</pre> -</div> - -<p>More information about specific metadata nodes recognized by the optimizers - and code generator is found below.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="tbaa">'<tt>tbaa</tt>' Metadata</a> -</h4> - -<div> - -<p>In LLVM IR, memory does not have types, so LLVM's own type system is not - suitable for doing TBAA. Instead, metadata is added to the IR to describe - a type system of a higher level language. This can be used to implement - typical C/C++ TBAA, but it can also be used to implement custom alias - analysis behavior for other languages.</p> - -<p>The current metadata format is very simple. TBAA metadata nodes have up to - three fields, e.g.:</p> - -<div class="doc_code"> -<pre> -!0 = metadata !{ metadata !"an example type tree" } -!1 = metadata !{ metadata !"int", metadata !0 } -!2 = metadata !{ metadata !"float", metadata !0 } -!3 = metadata !{ metadata !"const float", metadata !2, i64 1 } -</pre> -</div> - -<p>The first field is an identity field. 
It can be any value, usually - a metadata string, which uniquely identifies the type. The most important - name in the tree is the name of the root node. Two trees with - different root node names are entirely disjoint, even if they - have leaves with common names.</p> - -<p>The second field identifies the type's parent node in the tree, or - is null or omitted for a root node. A type is considered to alias - all of its descendants and all of its ancestors in the tree. Also, - a type is considered to alias all types in other trees, so that - bitcode produced from multiple front-ends is handled conservatively.</p> - -<p>If the third field is present, it's an integer which if equal to 1 - indicates that the type is "constant" (meaning - <tt>pointsToConstantMemory</tt> should return true; see - <a href="AliasAnalysis.html#OtherItfs">other useful - <tt>AliasAnalysis</tt> methods</a>).</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="tbaa.struct">'<tt>tbaa.struct</tt>' Metadata</a> -</h4> - -<div> - -<p>The <a href="#int_memcpy"><tt>llvm.memcpy</tt></a> is often used to implement -aggregate assignment operations in C and similar languages, however it is -defined to copy a contiguous region of memory, which is more than strictly -necessary for aggregate types which contain holes due to padding. Also, it -doesn't contain any TBAA information about the fields of the aggregate.</p> - -<p><tt>!tbaa.struct</tt> metadata can describe which memory subregions in a memcpy -are padding and what the TBAA tags of the struct are.</p> - -<p>The current metadata format is very simple. <tt>!tbaa.struct</tt> metadata nodes - are a list of operands which are in conceptual groups of three. For each - group of three, the first operand gives the byte offset of a field in bytes, - the second gives its size in bytes, and the third gives its - tbaa tag. e.g.:</p> - -<div class="doc_code"> -<pre> -!4 = metadata !{ i64 0, i64 4, metadata !1, i64 8, i64 4, metadata !2 } -</pre> -</div> - -<p>This describes a struct with two fields. The first is at offset 0 bytes - with size 4 bytes, and has tbaa tag !1. The second is at offset 8 bytes - and has size 4 bytes and has tbaa tag !2.</p> - -<p>Note that the fields need not be contiguous. In this example, there is a - 4 byte gap between the two fields. This gap represents padding which - does not carry useful data and need not be preserved.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="fpmath">'<tt>fpmath</tt>' Metadata</a> -</h4> - -<div> - -<p><tt>fpmath</tt> metadata may be attached to any instruction of floating point - type. It can be used to express the maximum acceptable error in the result of - that instruction, in ULPs, thus potentially allowing the compiler to use a - more efficient but less accurate method of computing it. ULP is defined as - follows:</p> - -<blockquote> - -<p>If <tt>x</tt> is a real number that lies between two finite consecutive - floating-point numbers <tt>a</tt> and <tt>b</tt>, without being equal to one - of them, then <tt>ulp(x) = |b - a|</tt>, otherwise <tt>ulp(x)</tt> is the - distance between the two non-equal finite floating-point numbers nearest - <tt>x</tt>. 
Moreover, <tt>ulp(NaN)</tt> is <tt>NaN</tt>.</p> - -</blockquote> - -<p>The metadata node shall consist of a single positive floating point number - representing the maximum relative error, for example:</p> - -<div class="doc_code"> -<pre> -!0 = metadata !{ float 2.5 } ; maximum acceptable inaccuracy is 2.5 ULPs -</pre> -</div> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="range">'<tt>range</tt>' Metadata</a> -</h4> - -<div> -<p><tt>range</tt> metadata may be attached only to loads of integer types. It - expresses the possible ranges the loaded value is in. The ranges are - represented with a flattened list of integers. The loaded value is known to - be in the union of the ranges defined by each consecutive pair. Each pair - has the following properties:</p> -<ul> - <li>The type must match the type loaded by the instruction.</li> - <li>The pair <tt>a,b</tt> represents the range <tt>[a,b)</tt>.</li> - <li>Both <tt>a</tt> and <tt>b</tt> are constants.</li> - <li>The range is allowed to wrap.</li> - <li>The range should not represent the full or empty set. That is, - <tt>a!=b</tt>. </li> -</ul> -<p> In addition, the pairs must be in signed order of the lower bound and - they must be non-contiguous.</p> - -<p>Examples:</p> -<div class="doc_code"> -<pre> - %a = load i8* %x, align 1, !range !0 ; Can only be 0 or 1 - %b = load i8* %y, align 1, !range !1 ; Can only be 255 (-1), 0 or 1 - %c = load i8* %z, align 1, !range !2 ; Can only be 0, 1, 3, 4 or 5 - %d = load i8* %z, align 1, !range !3 ; Can only be -2, -1, 3, 4 or 5 -... -!0 = metadata !{ i8 0, i8 2 } -!1 = metadata !{ i8 255, i8 2 } -!2 = metadata !{ i8 0, i8 2, i8 3, i8 6 } -!3 = metadata !{ i8 -2, i8 0, i8 3, i8 6 } -</pre> -</div> -</div> -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="module_flags">Module Flags Metadata</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>Information about the module as a whole is difficult to convey to LLVM's - subsystems. The LLVM IR isn't sufficient to transmit this - information. The <tt>llvm.module.flags</tt> named metadata exists in order to - facilitate this. These flags are in the form of key / value pairs — - much like a dictionary — making it easy for any subsystem who cares - about a flag to look it up.</p> - -<p>The <tt>llvm.module.flags</tt> metadata contains a list of metadata - triplets. Each triplet has the following form:</p> - -<ul> - <li>The first element is a <i>behavior</i> flag, which specifies the behavior - when two (or more) modules are merged together, and it encounters two (or - more) metadata with the same ID. The supported behaviors are described - below.</li> - - <li>The second element is a metadata string that is a unique ID for the - metadata. How each ID is interpreted is documented below.</li> - - <li>The third element is the value of the flag.</li> -</ul> - -<p>When two (or more) modules are merged together, the resulting - <tt>llvm.module.flags</tt> metadata is the union of the - modules' <tt>llvm.module.flags</tt> metadata. 
The only exception being a flag - with the <i>Override</i> behavior, which may override another flag's value - (see below).</p> - -<p>The following behaviors are supported:</p> - -<table border="1" cellspacing="0" cellpadding="4"> - <tbody> - <tr> - <th>Value</th> - <th>Behavior</th> - </tr> - <tr> - <td>1</td> - <td align="left"> - <dl> - <dt><b>Error</b></dt> - <dd>Emits an error if two values disagree. It is an error to have an ID - with both an Error and a Warning behavior.</dd> - </dl> - </td> - </tr> - <tr> - <td>2</td> - <td align="left"> - <dl> - <dt><b>Warning</b></dt> - <dd>Emits a warning if two values disagree.</dd> - </dl> - </td> - </tr> - <tr> - <td>3</td> - <td align="left"> - <dl> - <dt><b>Require</b></dt> - <dd>Emits an error when the specified value is not present or doesn't - have the specified value. It is an error for two (or more) - <tt>llvm.module.flags</tt> with the same ID to have the Require - behavior but different values. There may be multiple Require flags - per ID.</dd> - </dl> - </td> - </tr> - <tr> - <td>4</td> - <td align="left"> - <dl> - <dt><b>Override</b></dt> - <dd>Uses the specified value if the two values disagree. It is an - error for two (or more) <tt>llvm.module.flags</tt> with the same - ID to have the Override behavior but different values.</dd> - </dl> - </td> - </tr> - </tbody> -</table> - -<p>An example of module flags:</p> - -<pre class="doc_code"> -!0 = metadata !{ i32 1, metadata !"foo", i32 1 } -!1 = metadata !{ i32 4, metadata !"bar", i32 37 } -!2 = metadata !{ i32 2, metadata !"qux", i32 42 } -!3 = metadata !{ i32 3, metadata !"qux", - metadata !{ - metadata !"foo", i32 1 - } -} -!llvm.module.flags = !{ !0, !1, !2, !3 } -</pre> - -<ul> - <li><p>Metadata <tt>!0</tt> has the ID <tt>!"foo"</tt> and the value '1'. The - behavior if two or more <tt>!"foo"</tt> flags are seen is to emit an - error if their values are not equal.</p></li> - - <li><p>Metadata <tt>!1</tt> has the ID <tt>!"bar"</tt> and the value '37'. The - behavior if two or more <tt>!"bar"</tt> flags are seen is to use the - value '37' if their values are not equal.</p></li> - - <li><p>Metadata <tt>!2</tt> has the ID <tt>!"qux"</tt> and the value '42'. The - behavior if two or more <tt>!"qux"</tt> flags are seen is to emit a - warning if their values are not equal.</p></li> - - <li><p>Metadata <tt>!3</tt> has the ID <tt>!"qux"</tt> and the value:</p> - -<pre class="doc_code"> -metadata !{ metadata !"foo", i32 1 } -</pre> - - <p>The behavior is to emit an error if the <tt>llvm.module.flags</tt> does - not contain a flag with the ID <tt>!"foo"</tt> that has the value - '1'. If two or more <tt>!"qux"</tt> flags exist, then they must have - the same value or an error will be issued.</p></li> -</ul> - - -<!-- ======================================================================= --> -<h3> -<a name="objc_gc_flags">Objective-C Garbage Collection Module Flags Metadata</a> -</h3> - -<div> - -<p>On the Mach-O platform, Objective-C stores metadata about garbage collection - in a special section called "image info". The metadata consists of a version - number and a bitmask specifying what types of garbage collection are - supported (if any) by the file. 
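-<p>For illustration only, a module built for Objective-C ABI version 2 might carry image info module flags such as the following, using the keys described in the table below (this is a sketch, not prescribed output; the behavior codes shown are assumptions, and the garbage collection flags discussed below are omitted):</p> - -<pre class="doc_code"> -!0 = metadata !{ i32 1, metadata !"Objective-C Version", i32 2 } -!1 = metadata !{ i32 1, metadata !"Objective-C Image Info Version", i32 0 } -!2 = metadata !{ i32 1, metadata !"Objective-C Image Info Section", metadata !"__DATA,__objc_imageinfo, regular, no_dead_strip" } -!llvm.module.flags = !{ !0, !1, !2 } -</pre>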
If two or more modules are linked together - their garbage collection metadata needs to be merged rather than appended - together.</p> - -<p>The Objective-C garbage collection module flags metadata consists of the - following key-value pairs:</p> - -<table border="1" cellspacing="0" cellpadding="4"> - <col width="30%"> - <tbody> - <tr> - <th>Key</th> - <th>Value</th> - </tr> - <tr> - <td><tt>Objective-C Version</tt></td> - <td align="left"><b>[Required]</b> — The Objective-C ABI - version. Valid values are 1 and 2.</td> - </tr> - <tr> - <td><tt>Objective-C Image Info Version</tt></td> - <td align="left"><b>[Required]</b> — The version of the image info - section. Currently always 0.</td> - </tr> - <tr> - <td><tt>Objective-C Image Info Section</tt></td> - <td align="left"><b>[Required]</b> — The section to place the - metadata. Valid values are <tt>"__OBJC, __image_info, regular"</tt> for - Objective-C ABI version 1, and <tt>"__DATA,__objc_imageinfo, regular, - no_dead_strip"</tt> for Objective-C ABI version 2.</td> - </tr> - <tr> - <td><tt>Objective-C Garbage Collection</tt></td> - <td align="left"><b>[Required]</b> — Specifies whether garbage - collection is supported or not. Valid values are 0, for no garbage - collection, and 2, for garbage collection supported.</td> - </tr> - <tr> - <td><tt>Objective-C GC Only</tt></td> - <td align="left"><b>[Optional]</b> — Specifies that only garbage - collection is supported. If present, its value must be 6. This flag - requires that the <tt>Objective-C Garbage Collection</tt> flag have the - value 2.</td> - </tr> - </tbody> -</table> - -<p>Some important flag interactions:</p> - -<ul> - <li>If a module with <tt>Objective-C Garbage Collection</tt> set to 0 is - merged with a module with <tt>Objective-C Garbage Collection</tt> set to - 2, then the resulting module has the <tt>Objective-C Garbage - Collection</tt> flag set to 0.</li> - - <li>A module with <tt>Objective-C Garbage Collection</tt> set to 0 cannot be - merged with a module with <tt>Objective-C GC Only</tt> set to 6.</li> -</ul> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="intrinsic_globals">Intrinsic Global Variables</a> -</h2> -<!-- *********************************************************************** --> -<div> -<p>LLVM has a number of "magic" global variables that contain data that affect -code generation or other IR semantics. These are documented here. All globals -of this sort should have a section specified as "<tt>llvm.metadata</tt>". This -section and all globals that start with "<tt>llvm.</tt>" are reserved for use -by LLVM.</p> - -<!-- ======================================================================= --> -<h3> -<a name="intg_used">The '<tt>llvm.used</tt>' Global Variable</a> -</h3> - -<div> - -<p>The <tt>@llvm.used</tt> global is an array with i8* element type which has <a -href="#linkage_appending">appending linkage</a>. This array contains a list of -pointers to global variables and functions which may optionally have a pointer -cast formed of bitcast or getelementptr. 
For example, a legal use of it is:</p> - -<div class="doc_code"> -<pre> -@X = global i8 4 -@Y = global i32 123 - -@llvm.used = appending global [2 x i8*] [ - i8* @X, - i8* bitcast (i32* @Y to i8*) -], section "llvm.metadata" -</pre> -</div> - -<p>If a global variable appears in the <tt>@llvm.used</tt> list, then the - compiler, assembler, and linker are required to treat the symbol as if there - is a reference to the global that it cannot see. For example, if a variable - has internal linkage and no references other than that from - the <tt>@llvm.used</tt> list, it cannot be deleted. This is commonly used to - represent references from inline asms and other things the compiler cannot - "see", and corresponds to "<tt>attribute((used))</tt>" in GNU C.</p> - -<p>On some targets, the code generator must emit a directive to the assembler or - object file to prevent the assembler and linker from molesting the - symbol.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="intg_compiler_used"> - The '<tt>llvm.compiler.used</tt>' Global Variable - </a> -</h3> - -<div> - -<p>The <tt>@llvm.compiler.used</tt> directive is the same as the - <tt>@llvm.used</tt> directive, except that it only prevents the compiler from - touching the symbol. On targets that support it, this allows an intelligent - linker to optimize references to the symbol without being impeded as it would - be by <tt>@llvm.used</tt>.</p> - -<p>This is a rare construct that should only be used in rare circumstances, and - should not be exposed to source languages.</p> - -</div> - -<!-- ======================================================================= --> -<h3> -<a name="intg_global_ctors">The '<tt>llvm.global_ctors</tt>' Global Variable</a> -</h3> - -<div> - -<div class="doc_code"> -<pre> -%0 = type { i32, void ()* } -@llvm.global_ctors = appending global [1 x %0] [%0 { i32 65535, void ()* @ctor }] -</pre> -</div> - -<p>The <tt>@llvm.global_ctors</tt> array contains a list of constructor - functions and associated priorities. The functions referenced by this array - will be called in ascending order of priority (i.e. lowest first) when the - module is loaded. The order of functions with the same priority is not - defined.</p> - -</div> - -<!-- ======================================================================= --> -<h3> -<a name="intg_global_dtors">The '<tt>llvm.global_dtors</tt>' Global Variable</a> -</h3> - -<div> - -<div class="doc_code"> -<pre> -%0 = type { i32, void ()* } -@llvm.global_dtors = appending global [1 x %0] [%0 { i32 65535, void ()* @dtor }] -</pre> -</div> - -<p>The <tt>@llvm.global_dtors</tt> array contains a list of destructor functions - and associated priorities. The functions referenced by this array will be - called in descending order of priority (i.e. highest first) when the module - is loaded. 
The order of functions with the same priority is not defined.</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="instref">Instruction Reference</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>The LLVM instruction set consists of several different classifications of - instructions: <a href="#terminators">terminator - instructions</a>, <a href="#binaryops">binary instructions</a>, - <a href="#bitwiseops">bitwise binary instructions</a>, - <a href="#memoryops">memory instructions</a>, and - <a href="#otherops">other instructions</a>.</p> - -<!-- ======================================================================= --> -<h3> - <a name="terminators">Terminator Instructions</a> -</h3> - -<div> - -<p>As mentioned <a href="#functionstructure">previously</a>, every basic block - in a program ends with a "Terminator" instruction, which indicates which - block should be executed after the current block is finished. These - terminator instructions typically yield a '<tt>void</tt>' value: they produce - control flow, not values (the one exception being the - '<a href="#i_invoke"><tt>invoke</tt></a>' instruction).</p> - -<p>The terminator instructions are: - '<a href="#i_ret"><tt>ret</tt></a>', - '<a href="#i_br"><tt>br</tt></a>', - '<a href="#i_switch"><tt>switch</tt></a>', - '<a href="#i_indirectbr"><tt>indirectbr</tt></a>', - '<a href="#i_invoke"><tt>invoke</tt></a>', - '<a href="#i_resume"><tt>resume</tt></a>', and - '<a href="#i_unreachable"><tt>unreachable</tt></a>'.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_ret">'<tt>ret</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - ret <type> <value> <i>; Return a value from a non-void function</i> - ret void <i>; Return from void function</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>ret</tt>' instruction is used to return control flow (and optionally - a value) from a function back to the caller.</p> - -<p>There are two forms of the '<tt>ret</tt>' instruction: one that returns a - value and then causes control flow, and one that just causes control flow to - occur.</p> - -<h5>Arguments:</h5> -<p>The '<tt>ret</tt>' instruction optionally accepts a single argument, the - return value. The type of the return value must be a - '<a href="#t_firstclass">first class</a>' type.</p> - -<p>A function is not <a href="#wellformed">well formed</a> if it has a - non-void return type and contains a '<tt>ret</tt>' instruction with no return - value or a return value with a type that does not match its type, or if it - has a void return type and contains a '<tt>ret</tt>' instruction with a - return value.</p> - -<h5>Semantics:</h5> -<p>When the '<tt>ret</tt>' instruction is executed, control flow returns back to - the calling function's context. If the caller is a - "<a href="#i_call"><tt>call</tt></a>" instruction, execution continues at the - instruction after the call. If the caller was an - "<a href="#i_invoke"><tt>invoke</tt></a>" instruction, execution continues at - the beginning of the "normal" destination block. 
If the instruction returns - a value, that value shall set the call or invoke instruction's return - value.</p> - -<h5>Example:</h5> -<pre> - ret i32 5 <i>; Return an integer value of 5</i> - ret void <i>; Return from a void function</i> - ret { i32, i8 } { i32 4, i8 2 } <i>; Return a struct of values 4 and 2</i> -</pre> - -</div> -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_br">'<tt>br</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - br i1 <cond>, label <iftrue>, label <iffalse> - br label <dest> <i>; Unconditional branch</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>br</tt>' instruction is used to cause control flow to transfer to a - different basic block in the current function. There are two forms of this - instruction, corresponding to a conditional branch and an unconditional - branch.</p> - -<h5>Arguments:</h5> -<p>The conditional branch form of the '<tt>br</tt>' instruction takes a single - '<tt>i1</tt>' value and two '<tt>label</tt>' values. The unconditional form - of the '<tt>br</tt>' instruction takes a single '<tt>label</tt>' value as a - target.</p> - -<h5>Semantics:</h5> -<p>Upon execution of a conditional '<tt>br</tt>' instruction, the '<tt>i1</tt>' - argument is evaluated. If the value is <tt>true</tt>, control flows to the - '<tt>iftrue</tt>' <tt>label</tt> argument. If "cond" is <tt>false</tt>, - control flows to the '<tt>iffalse</tt>' <tt>label</tt> argument.</p> - -<h5>Example:</h5> -<pre> -Test: - %cond = <a href="#i_icmp">icmp</a> eq i32 %a, %b - br i1 %cond, label %IfEqual, label %IfUnequal -IfEqual: - <a href="#i_ret">ret</a> i32 1 -IfUnequal: - <a href="#i_ret">ret</a> i32 0 -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_switch">'<tt>switch</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - switch <intty> <value>, label <defaultdest> [ <intty> <val>, label <dest> ... ] -</pre> - -<h5>Overview:</h5> -<p>The '<tt>switch</tt>' instruction is used to transfer control flow to one of - several different places. It is a generalization of the '<tt>br</tt>' - instruction, allowing a branch to occur to one of many possible - destinations.</p> - -<h5>Arguments:</h5> -<p>The '<tt>switch</tt>' instruction uses three parameters: an integer - comparison value '<tt>value</tt>', a default '<tt>label</tt>' destination, - and an array of pairs of comparison value constants and '<tt>label</tt>'s. - The table is not allowed to contain duplicate constant entries.</p> - -<h5>Semantics:</h5> -<p>The <tt>switch</tt> instruction specifies a table of values and - destinations. When the '<tt>switch</tt>' instruction is executed, this table - is searched for the given value. If the value is found, control flow is - transferred to the corresponding destination; otherwise, control flow is - transferred to the default destination.</p> - -<h5>Implementation:</h5> -<p>Depending on properties of the target machine and the particular - <tt>switch</tt> instruction, this instruction may be code generated in - different ways. 
For example, it could be generated as a series of chained - conditional branches or with a lookup table.</p> - -<h5>Example:</h5> -<pre> - <i>; Emulate a conditional br instruction</i> - %Val = <a href="#i_zext">zext</a> i1 %value to i32 - switch i32 %Val, label %truedest [ i32 0, label %falsedest ] - - <i>; Emulate an unconditional br instruction</i> - switch i32 0, label %dest [ ] - - <i>; Implement a jump table:</i> - switch i32 %val, label %otherwise [ i32 0, label %onzero - i32 1, label %onone - i32 2, label %ontwo ] -</pre> - -</div> - - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_indirectbr">'<tt>indirectbr</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - indirectbr <somety>* <address>, [ label <dest1>, label <dest2>, ... ] -</pre> - -<h5>Overview:</h5> - -<p>The '<tt>indirectbr</tt>' instruction implements an indirect branch to a label - within the current function, whose address is specified by - "<tt>address</tt>". Address must be derived from a <a - href="#blockaddress">blockaddress</a> constant.</p> - -<h5>Arguments:</h5> - -<p>The '<tt>address</tt>' argument is the address of the label to jump to. The - rest of the arguments indicate the full set of possible destinations that the - address may point to. Blocks are allowed to occur multiple times in the - destination list, though this isn't particularly useful.</p> - -<p>This destination list is required so that dataflow analysis has an accurate - understanding of the CFG.</p> - -<h5>Semantics:</h5> - -<p>Control transfers to the block specified in the address argument. All - possible destination blocks must be listed in the label list, otherwise this - instruction has undefined behavior. This implies that jumps to labels - defined in other functions have undefined behavior as well.</p> - -<h5>Implementation:</h5> - -<p>This is typically implemented with a jump through a register.</p> - -<h5>Example:</h5> -<pre> - indirectbr i8* %Addr, [ label %bb1, label %bb2, label %bb3 ] -</pre> - -</div> - - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_invoke">'<tt>invoke</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = invoke [<a href="#callingconv">cconv</a>] [<a href="#paramattrs">ret attrs</a>] <ptr to function ty> <function ptr val>(<function args>) [<a href="#fnattrs">fn attrs</a>] - to label <normal label> unwind label <exception label> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>invoke</tt>' instruction causes control to transfer to a specified - function, with the possibility of control flow transfer to either the - '<tt>normal</tt>' label or the '<tt>exception</tt>' label. If the callee - function returns with the "<tt><a href="#i_ret">ret</a></tt>" instruction, - control flow will return to the "normal" label. If the callee (or any - indirect callees) returns via the "<a href="#i_resume"><tt>resume</tt></a>" - instruction or other exception handling mechanism, control is interrupted and - continued at the dynamically nearest "exception" label.</p> - -<p>The '<tt>exception</tt>' label is a - <i><a href="ExceptionHandling.html#overview">landing pad</a></i> for the - exception. As such, '<tt>exception</tt>' label is required to have the - "<a href="#i_landingpad"><tt>landingpad</tt></a>" instruction, which contains - the information about the behavior of the program after unwinding - happens, as its first non-PHI instruction. 
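-   As a minimal sketch of how these pieces fit together (the callee
-   <tt>@might_throw</tt> is hypothetical, and the personality routine shown is
-   simply a typical choice):
-<pre>
-define void @f() {
-entry:
-  invoke void @might_throw()
-          to label %cont unwind label %lpad   <i>; "normal" and "exception" labels</i>
-
-cont:                                    <i>; reached when @might_throw returns via 'ret'</i>
-  ret void
-
-lpad:                                    <i>; landing pad: first non-PHI instruction is 'landingpad'</i>
-  %exn = landingpad { i8*, i32 } personality i32 (...)* @__gxx_personality_v0
-           cleanup
-  resume { i8*, i32 } %exn               <i>; continue unwinding out of @f</i>
-}
-
-declare void @might_throw()
-declare i32 @__gxx_personality_v0(...)
-</pre>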
-The restrictions on the
-   "<tt>landingpad</tt>" instruction tightly couple it to the
-   "<tt>invoke</tt>" instruction, so that the important information contained
-   within the "<tt>landingpad</tt>" instruction can't be lost through normal
-   code motion.</p>
-
-<h5>Arguments:</h5>
-<p>This instruction requires several arguments:</p>
-
-<ol>
-  <li>The optional "cconv" marker indicates which <a href="#callingconv">calling
-      convention</a> the call should use. If none is specified, the call
-      defaults to using C calling conventions.</li>
-
-  <li>The optional <a href="#paramattrs">Parameter Attributes</a> list for
-      return values. Only '<tt>zeroext</tt>', '<tt>signext</tt>', and
-      '<tt>inreg</tt>' attributes are valid here.</li>
-
-  <li>'<tt>ptr to function ty</tt>': shall be the signature of the pointer to
-      function value being invoked. In most cases, this is a direct function
-      invocation, but indirect <tt>invoke</tt>s are just as possible, branching
-      off an arbitrary pointer to function value.</li>
-
-  <li>'<tt>function ptr val</tt>': An LLVM value containing a pointer to a
-      function to be invoked.</li>
-
-  <li>'<tt>function args</tt>': argument list whose types match the function
-      signature argument types and parameter attributes. All arguments must be
-      of <a href="#t_firstclass">first class</a> type. If the function
-      signature indicates the function accepts a variable number of arguments,
-      the extra arguments can be specified.</li>
-
-  <li>'<tt>normal label</tt>': the label reached when the called function
-      executes a '<tt><a href="#i_ret">ret</a></tt>' instruction.</li>
-
-  <li>'<tt>exception label</tt>': the label reached when a callee returns via
-      the <a href="#i_resume"><tt>resume</tt></a> instruction or other exception
-      handling mechanism.</li>
-
-  <li>The optional <a href="#fnattrs">function attributes</a> list. Only
-      '<tt>noreturn</tt>', '<tt>nounwind</tt>', '<tt>readonly</tt>' and
-      '<tt>readnone</tt>' attributes are valid here.</li>
-</ol>
-
-<h5>Semantics:</h5>
-<p>This instruction is designed to operate as a standard
-   '<tt><a href="#i_call">call</a></tt>' instruction in most regards. The
-   primary difference is that it establishes an association with a label, which
-   is used by the runtime library to unwind the stack.</p>
-
-<p>This instruction is used in languages with destructors to ensure that proper
-   cleanup is performed in the case of either a <tt>longjmp</tt> or a thrown
-   exception. Additionally, this is important for the implementation of
-   '<tt>catch</tt>' clauses in high-level languages that support them.</p>
-
-<p>For the purposes of the SSA form, the definition of the value returned by the
-   '<tt>invoke</tt>' instruction is deemed to occur on the edge from the current
-   block to the "normal" label.
If the callee unwinds then no return value is - available.</p> - -<h5>Example:</h5> -<pre> - %retval = invoke i32 @Test(i32 15) to label %Continue - unwind label %TestCleanup <i>; {i32}:retval set</i> - %retval = invoke <a href="#callingconv">coldcc</a> i32 %Testfnptr(i32 15) to label %Continue - unwind label %TestCleanup <i>; {i32}:retval set</i> -</pre> - -</div> - - <!-- _______________________________________________________________________ --> - -<h4> - <a name="i_resume">'<tt>resume</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - resume <type> <value> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>resume</tt>' instruction is a terminator instruction that has no - successors.</p> - -<h5>Arguments:</h5> -<p>The '<tt>resume</tt>' instruction requires one argument, which must have the - same type as the result of any '<tt>landingpad</tt>' instruction in the same - function.</p> - -<h5>Semantics:</h5> -<p>The '<tt>resume</tt>' instruction resumes propagation of an existing - (in-flight) exception whose unwinding was interrupted with - a <a href="#i_landingpad"><tt>landingpad</tt></a> instruction.</p> - -<h5>Example:</h5> -<pre> - resume { i8*, i32 } %exn -</pre> - -</div> - -<!-- _______________________________________________________________________ --> - -<h4> - <a name="i_unreachable">'<tt>unreachable</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - unreachable -</pre> - -<h5>Overview:</h5> -<p>The '<tt>unreachable</tt>' instruction has no defined semantics. This - instruction is used to inform the optimizer that a particular portion of the - code is not reachable. This can be used to indicate that the code after a - no-return function cannot be reached, and other facts.</p> - -<h5>Semantics:</h5> -<p>The '<tt>unreachable</tt>' instruction has no defined semantics.</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="binaryops">Binary Operations</a> -</h3> - -<div> - -<p>Binary operators are used to do most of the computation in a program. They - require two operands of the same type, execute an operation on them, and - produce a single value. The operands might represent multiple data, as is - the case with the <a href="#t_vector">vector</a> data type. The result value - has the same type as its operands.</p> - -<p>There are several different binary operators:</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_add">'<tt>add</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = add <ty> <op1>, <op2> <i>; yields {ty}:result</i> - <result> = add nuw <ty> <op1>, <op2> <i>; yields {ty}:result</i> - <result> = add nsw <ty> <op1>, <op2> <i>; yields {ty}:result</i> - <result> = add nuw nsw <ty> <op1>, <op2> <i>; yields {ty}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>add</tt>' instruction returns the sum of its two operands.</p> - -<h5>Arguments:</h5> -<p>The two arguments to the '<tt>add</tt>' instruction must - be <a href="#t_integer">integer</a> or <a href="#t_vector">vector</a> of - integer values. 
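-   For instance, with vector operands the addition is performed element-wise
-   (a small sketch; the operand name is a placeholder):
-<pre>
-  %v = add <4 x i32> %x, <i32 1, i32 1, i32 1, i32 1>   <i>; yields <4 x i32>: each lane of %x plus 1</i>
-</pre>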
Both arguments must have identical types.</p> - -<h5>Semantics:</h5> -<p>The value produced is the integer sum of the two operands.</p> - -<p>If the sum has unsigned overflow, the result returned is the mathematical - result modulo 2<sup>n</sup>, where n is the bit width of the result.</p> - -<p>Because LLVM integers use a two's complement representation, this instruction - is appropriate for both signed and unsigned integers.</p> - -<p><tt>nuw</tt> and <tt>nsw</tt> stand for "No Unsigned Wrap" - and "No Signed Wrap", respectively. If the <tt>nuw</tt> and/or - <tt>nsw</tt> keywords are present, the result value of the <tt>add</tt> - is a <a href="#poisonvalues">poison value</a> if unsigned and/or signed overflow, - respectively, occurs.</p> - -<h5>Example:</h5> -<pre> - <result> = add i32 4, %var <i>; yields {i32}:result = 4 + %var</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_fadd">'<tt>fadd</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = fadd <ty> <op1>, <op2> <i>; yields {ty}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>fadd</tt>' instruction returns the sum of its two operands.</p> - -<h5>Arguments:</h5> -<p>The two arguments to the '<tt>fadd</tt>' instruction must be - <a href="#t_floating">floating point</a> or <a href="#t_vector">vector</a> of - floating point values. Both arguments must have identical types.</p> - -<h5>Semantics:</h5> -<p>The value produced is the floating point sum of the two operands.</p> - -<h5>Example:</h5> -<pre> - <result> = fadd float 4.0, %var <i>; yields {float}:result = 4.0 + %var</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_sub">'<tt>sub</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = sub <ty> <op1>, <op2> <i>; yields {ty}:result</i> - <result> = sub nuw <ty> <op1>, <op2> <i>; yields {ty}:result</i> - <result> = sub nsw <ty> <op1>, <op2> <i>; yields {ty}:result</i> - <result> = sub nuw nsw <ty> <op1>, <op2> <i>; yields {ty}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>sub</tt>' instruction returns the difference of its two - operands.</p> - -<p>Note that the '<tt>sub</tt>' instruction is used to represent the - '<tt>neg</tt>' instruction present in most other intermediate - representations.</p> - -<h5>Arguments:</h5> -<p>The two arguments to the '<tt>sub</tt>' instruction must - be <a href="#t_integer">integer</a> or <a href="#t_vector">vector</a> of - integer values. Both arguments must have identical types.</p> - -<h5>Semantics:</h5> -<p>The value produced is the integer difference of the two operands.</p> - -<p>If the difference has unsigned overflow, the result returned is the - mathematical result modulo 2<sup>n</sup>, where n is the bit width of the - result.</p> - -<p>Because LLVM integers use a two's complement representation, this instruction - is appropriate for both signed and unsigned integers.</p> - -<p><tt>nuw</tt> and <tt>nsw</tt> stand for "No Unsigned Wrap" - and "No Signed Wrap", respectively. 
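-   As a small sketch of what these flags mean in practice (constants chosen
-   arbitrarily, using the rule already given for '<tt>add</tt>'; the same idea
-   applies here):
-<pre>
-  %a = add i8 255, 1             <i>; no flags: wraps, yields {i8}:result = 0</i>
-  %b = add nuw i8 255, 1         <i>; unsigned overflow with nuw: poison value</i>
-  %c = add nsw i8 127, 1         <i>; signed overflow with nsw: poison value</i>
-</pre>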
If the <tt>nuw</tt> and/or - <tt>nsw</tt> keywords are present, the result value of the <tt>sub</tt> - is a <a href="#poisonvalues">poison value</a> if unsigned and/or signed overflow, - respectively, occurs.</p> - -<h5>Example:</h5> -<pre> - <result> = sub i32 4, %var <i>; yields {i32}:result = 4 - %var</i> - <result> = sub i32 0, %val <i>; yields {i32}:result = -%var</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_fsub">'<tt>fsub</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = fsub <ty> <op1>, <op2> <i>; yields {ty}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>fsub</tt>' instruction returns the difference of its two - operands.</p> - -<p>Note that the '<tt>fsub</tt>' instruction is used to represent the - '<tt>fneg</tt>' instruction present in most other intermediate - representations.</p> - -<h5>Arguments:</h5> -<p>The two arguments to the '<tt>fsub</tt>' instruction must be - <a href="#t_floating">floating point</a> or <a href="#t_vector">vector</a> of - floating point values. Both arguments must have identical types.</p> - -<h5>Semantics:</h5> -<p>The value produced is the floating point difference of the two operands.</p> - -<h5>Example:</h5> -<pre> - <result> = fsub float 4.0, %var <i>; yields {float}:result = 4.0 - %var</i> - <result> = fsub float -0.0, %val <i>; yields {float}:result = -%var</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_mul">'<tt>mul</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = mul <ty> <op1>, <op2> <i>; yields {ty}:result</i> - <result> = mul nuw <ty> <op1>, <op2> <i>; yields {ty}:result</i> - <result> = mul nsw <ty> <op1>, <op2> <i>; yields {ty}:result</i> - <result> = mul nuw nsw <ty> <op1>, <op2> <i>; yields {ty}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>mul</tt>' instruction returns the product of its two operands.</p> - -<h5>Arguments:</h5> -<p>The two arguments to the '<tt>mul</tt>' instruction must - be <a href="#t_integer">integer</a> or <a href="#t_vector">vector</a> of - integer values. Both arguments must have identical types.</p> - -<h5>Semantics:</h5> -<p>The value produced is the integer product of the two operands.</p> - -<p>If the result of the multiplication has unsigned overflow, the result - returned is the mathematical result modulo 2<sup>n</sup>, where n is the bit - width of the result.</p> - -<p>Because LLVM integers use a two's complement representation, and the result - is the same width as the operands, this instruction returns the correct - result for both signed and unsigned integers. If a full product - (e.g. <tt>i32</tt>x<tt>i32</tt>-><tt>i64</tt>) is needed, the operands should - be sign-extended or zero-extended as appropriate to the width of the full - product.</p> - -<p><tt>nuw</tt> and <tt>nsw</tt> stand for "No Unsigned Wrap" - and "No Signed Wrap", respectively. 
If the <tt>nuw</tt> and/or - <tt>nsw</tt> keywords are present, the result value of the <tt>mul</tt> - is a <a href="#poisonvalues">poison value</a> if unsigned and/or signed overflow, - respectively, occurs.</p> - -<h5>Example:</h5> -<pre> - <result> = mul i32 4, %var <i>; yields {i32}:result = 4 * %var</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_fmul">'<tt>fmul</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = fmul <ty> <op1>, <op2> <i>; yields {ty}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>fmul</tt>' instruction returns the product of its two operands.</p> - -<h5>Arguments:</h5> -<p>The two arguments to the '<tt>fmul</tt>' instruction must be - <a href="#t_floating">floating point</a> or <a href="#t_vector">vector</a> of - floating point values. Both arguments must have identical types.</p> - -<h5>Semantics:</h5> -<p>The value produced is the floating point product of the two operands.</p> - -<h5>Example:</h5> -<pre> - <result> = fmul float 4.0, %var <i>; yields {float}:result = 4.0 * %var</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_udiv">'<tt>udiv</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = udiv <ty> <op1>, <op2> <i>; yields {ty}:result</i> - <result> = udiv exact <ty> <op1>, <op2> <i>; yields {ty}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>udiv</tt>' instruction returns the quotient of its two operands.</p> - -<h5>Arguments:</h5> -<p>The two arguments to the '<tt>udiv</tt>' instruction must be - <a href="#t_integer">integer</a> or <a href="#t_vector">vector</a> of integer - values. Both arguments must have identical types.</p> - -<h5>Semantics:</h5> -<p>The value produced is the unsigned integer quotient of the two operands.</p> - -<p>Note that unsigned integer division and signed integer division are distinct - operations; for signed integer division, use '<tt>sdiv</tt>'.</p> - -<p>Division by zero leads to undefined behavior.</p> - -<p>If the <tt>exact</tt> keyword is present, the result value of the - <tt>udiv</tt> is a <a href="#poisonvalues">poison value</a> if %op1 is not a - multiple of %op2 (as such, "((a udiv exact b) mul b) == a").</p> - - -<h5>Example:</h5> -<pre> - <result> = udiv i32 4, %var <i>; yields {i32}:result = 4 / %var</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_sdiv">'<tt>sdiv</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = sdiv <ty> <op1>, <op2> <i>; yields {ty}:result</i> - <result> = sdiv exact <ty> <op1>, <op2> <i>; yields {ty}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>sdiv</tt>' instruction returns the quotient of its two operands.</p> - -<h5>Arguments:</h5> -<p>The two arguments to the '<tt>sdiv</tt>' instruction must be - <a href="#t_integer">integer</a> or <a href="#t_vector">vector</a> of integer - values. Both arguments must have identical types.</p> - -<h5>Semantics:</h5> -<p>The value produced is the signed integer quotient of the two operands rounded - towards zero.</p> - -<p>Note that signed integer division and unsigned integer division are distinct - operations; for unsigned integer division, use '<tt>udiv</tt>'.</p> - -<p>Division by zero leads to undefined behavior. 
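-   As a small sketch of the rounding rule above and of the overflow case
-   discussed next:
-<pre>
-  %q1 = sdiv i32 7, -2            <i>; yields {i32}:result = -3 (rounded towards zero)</i>
-  %q2 = sdiv i32 -2147483648, -1  <i>; undefined: the mathematical quotient does not fit in i32</i>
-</pre>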
Overflow also leads to - undefined behavior; this is a rare case, but can occur, for example, by doing - a 32-bit division of -2147483648 by -1.</p> - -<p>If the <tt>exact</tt> keyword is present, the result value of the - <tt>sdiv</tt> is a <a href="#poisonvalues">poison value</a> if the result would - be rounded.</p> - -<h5>Example:</h5> -<pre> - <result> = sdiv i32 4, %var <i>; yields {i32}:result = 4 / %var</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_fdiv">'<tt>fdiv</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = fdiv <ty> <op1>, <op2> <i>; yields {ty}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>fdiv</tt>' instruction returns the quotient of its two operands.</p> - -<h5>Arguments:</h5> -<p>The two arguments to the '<tt>fdiv</tt>' instruction must be - <a href="#t_floating">floating point</a> or <a href="#t_vector">vector</a> of - floating point values. Both arguments must have identical types.</p> - -<h5>Semantics:</h5> -<p>The value produced is the floating point quotient of the two operands.</p> - -<h5>Example:</h5> -<pre> - <result> = fdiv float 4.0, %var <i>; yields {float}:result = 4.0 / %var</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_urem">'<tt>urem</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = urem <ty> <op1>, <op2> <i>; yields {ty}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>urem</tt>' instruction returns the remainder from the unsigned - division of its two arguments.</p> - -<h5>Arguments:</h5> -<p>The two arguments to the '<tt>urem</tt>' instruction must be - <a href="#t_integer">integer</a> or <a href="#t_vector">vector</a> of integer - values. Both arguments must have identical types.</p> - -<h5>Semantics:</h5> -<p>This instruction returns the unsigned integer <i>remainder</i> of a division. - This instruction always performs an unsigned division to get the - remainder.</p> - -<p>Note that unsigned integer remainder and signed integer remainder are - distinct operations; for signed integer remainder, use '<tt>srem</tt>'.</p> - -<p>Taking the remainder of a division by zero leads to undefined behavior.</p> - -<h5>Example:</h5> -<pre> - <result> = urem i32 4, %var <i>; yields {i32}:result = 4 % %var</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_srem">'<tt>srem</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = srem <ty> <op1>, <op2> <i>; yields {ty}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>srem</tt>' instruction returns the remainder from the signed - division of its two operands. This instruction can also take - <a href="#t_vector">vector</a> versions of the values in which case the - elements must be integers.</p> - -<h5>Arguments:</h5> -<p>The two arguments to the '<tt>srem</tt>' instruction must be - <a href="#t_integer">integer</a> or <a href="#t_vector">vector</a> of integer - values. Both arguments must have identical types.</p> - -<h5>Semantics:</h5> -<p>This instruction returns the <i>remainder</i> of a division (where the result - is either zero or has the same sign as the dividend, <tt>op1</tt>), not the - <i>modulo</i> operator (where the result is either zero or has the same sign - as the divisor, <tt>op2</tt>) of a value. 
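-   As a small sketch of the sign rule (constants chosen arbitrarily):
-<pre>
-  %r1 = srem i32 -7, 2            <i>; yields {i32}:result = -1 (sign of the dividend)</i>
-  %r2 = srem i32 7, -2            <i>; yields {i32}:result = 1</i>
-</pre>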
- For more information about the difference, - see <a href="http://mathforum.org/dr.math/problems/anne.4.28.99.html">The - Math Forum</a>. For a table of how this is implemented in various languages, - please see <a href="http://en.wikipedia.org/wiki/Modulo_operation"> - Wikipedia: modulo operation</a>.</p> - -<p>Note that signed integer remainder and unsigned integer remainder are - distinct operations; for unsigned integer remainder, use '<tt>urem</tt>'.</p> - -<p>Taking the remainder of a division by zero leads to undefined behavior. - Overflow also leads to undefined behavior; this is a rare case, but can - occur, for example, by taking the remainder of a 32-bit division of - -2147483648 by -1. (The remainder doesn't actually overflow, but this rule - lets srem be implemented using instructions that return both the result of - the division and the remainder.)</p> - -<h5>Example:</h5> -<pre> - <result> = srem i32 4, %var <i>; yields {i32}:result = 4 % %var</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_frem">'<tt>frem</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = frem <ty> <op1>, <op2> <i>; yields {ty}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>frem</tt>' instruction returns the remainder from the division of - its two operands.</p> - -<h5>Arguments:</h5> -<p>The two arguments to the '<tt>frem</tt>' instruction must be - <a href="#t_floating">floating point</a> or <a href="#t_vector">vector</a> of - floating point values. Both arguments must have identical types.</p> - -<h5>Semantics:</h5> -<p>This instruction returns the <i>remainder</i> of a division. The remainder - has the same sign as the dividend.</p> - -<h5>Example:</h5> -<pre> - <result> = frem float 4.0, %var <i>; yields {float}:result = 4.0 % %var</i> -</pre> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="bitwiseops">Bitwise Binary Operations</a> -</h3> - -<div> - -<p>Bitwise binary operators are used to do various forms of bit-twiddling in a - program. They are generally very efficient instructions and can commonly be - strength reduced from other instructions. They require two operands of the - same type, execute an operation on them, and produce a single value. The - resulting value is the same type as its operands.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_shl">'<tt>shl</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = shl <ty> <op1>, <op2> <i>; yields {ty}:result</i> - <result> = shl nuw <ty> <op1>, <op2> <i>; yields {ty}:result</i> - <result> = shl nsw <ty> <op1>, <op2> <i>; yields {ty}:result</i> - <result> = shl nuw nsw <ty> <op1>, <op2> <i>; yields {ty}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>shl</tt>' instruction returns the first operand shifted to the left - a specified number of bits.</p> - -<h5>Arguments:</h5> -<p>Both arguments to the '<tt>shl</tt>' instruction must be the - same <a href="#t_integer">integer</a> or <a href="#t_vector">vector</a> of - integer type. '<tt>op2</tt>' is treated as an unsigned value.</p> - -<h5>Semantics:</h5> -<p>The value produced is <tt>op1</tt> * 2<sup><tt>op2</tt></sup> mod - 2<sup>n</sup>, where <tt>n</tt> is the width of the result. If <tt>op2</tt> - is (statically or dynamically) negative or equal to or larger than the number - of bits in <tt>op1</tt>, the result is undefined. 
-If the arguments are
-   vectors, each vector element of <tt>op1</tt> is shifted by the corresponding
-   shift amount in <tt>op2</tt>.</p>
-
-<p>If the <tt>nuw</tt> keyword is present, then the shift produces a
-   <a href="#poisonvalues">poison value</a> if it shifts out any non-zero bits. If
-   the <tt>nsw</tt> keyword is present, then the shift produces a
-   <a href="#poisonvalues">poison value</a> if it shifts out any bits that disagree
-   with the resultant sign bit. As such, NUW/NSW have the same semantics as
-   they would if the shift were expressed as a mul instruction with the same
-   nsw/nuw bits in (mul %op1, (shl 1, %op2)).</p>
-
-<h5>Example:</h5>
-<pre>
-  <result> = shl i32 4, %var   <i>; yields {i32}: 4 << %var</i>
-  <result> = shl i32 4, 2      <i>; yields {i32}: 16</i>
-  <result> = shl i32 1, 10     <i>; yields {i32}: 1024</i>
-  <result> = shl i32 1, 32     <i>; undefined</i>
-  <result> = shl <2 x i32> < i32 1, i32 1>, < i32 1, i32 2>   <i>; yields: result=<2 x i32> < i32 2, i32 4></i>
-</pre>
-
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4>
-  <a name="i_lshr">'<tt>lshr</tt>' Instruction</a>
-</h4>
-
-<div>
-
-<h5>Syntax:</h5>
-<pre>
-  <result> = lshr <ty> <op1>, <op2>         <i>; yields {ty}:result</i>
-  <result> = lshr exact <ty> <op1>, <op2>   <i>; yields {ty}:result</i>
-</pre>
-
-<h5>Overview:</h5>
-<p>The '<tt>lshr</tt>' instruction (logical shift right) returns the first
-   operand shifted to the right a specified number of bits with zero fill.</p>
-
-<h5>Arguments:</h5>
-<p>Both arguments to the '<tt>lshr</tt>' instruction must be the same
-   <a href="#t_integer">integer</a> or <a href="#t_vector">vector</a> of integer
-   type. '<tt>op2</tt>' is treated as an unsigned value.</p>
-
-<h5>Semantics:</h5>
-<p>This instruction always performs a logical shift right operation. The most
-   significant bits of the result will be filled with zero bits after the shift.
-   If <tt>op2</tt> is (statically or dynamically) equal to or larger than the
-   number of bits in <tt>op1</tt>, the result is undefined. If the arguments are
-   vectors, each vector element of <tt>op1</tt> is shifted by the corresponding
-   shift amount in <tt>op2</tt>.</p>
-
-<p>If the <tt>exact</tt> keyword is present, the result value of the
-   <tt>lshr</tt> is a <a href="#poisonvalues">poison value</a> if any of the bits
-   shifted out are non-zero.</p>
-
-<h5>Example:</h5>
-<pre>
-  <result> = lshr i32 4, 1   <i>; yields {i32}:result = 2</i>
-  <result> = lshr i32 4, 2   <i>; yields {i32}:result = 1</i>
-  <result> = lshr i8  4, 3   <i>; yields {i8}:result = 0</i>
-  <result> = lshr i8 -2, 1   <i>; yields {i8}:result = 0x7F</i>
-  <result> = lshr i32 1, 32  <i>; undefined</i>
-  <result> = lshr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 2>   <i>; yields: result=<2 x i32> < i32 0x7FFFFFFF, i32 1></i>
-</pre>
-
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4>
-  <a name="i_ashr">'<tt>ashr</tt>' Instruction</a>
-</h4>
-
-<div>
-
-<h5>Syntax:</h5>
-<pre>
-  <result> = ashr <ty> <op1>, <op2>         <i>; yields {ty}:result</i>
-  <result> = ashr exact <ty> <op1>, <op2>   <i>; yields {ty}:result</i>
-</pre>
-
-<h5>Overview:</h5>
-<p>The '<tt>ashr</tt>' instruction (arithmetic shift right) returns the first
-   operand shifted to the right a specified number of bits with sign
-   extension.</p>
-
-<h5>Arguments:</h5>
-<p>Both arguments to the '<tt>ashr</tt>' instruction must be the same
-   <a href="#t_integer">integer</a> or <a href="#t_vector">vector</a> of integer
-   type.
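-   To make the contrast with '<tt>lshr</tt>' concrete (a small sketch; the
-   sign-extending behavior is described under Semantics below):
-<pre>
-  %l = lshr i8 -2, 1   <i>; logical:    yields {i8}:result = 0x7F</i>
-  %a = ashr i8 -2, 1   <i>; arithmetic: yields {i8}:result = -1</i>
-</pre>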
'<tt>op2</tt>' is treated as an unsigned value.</p> - -<h5>Semantics:</h5> -<p>This instruction always performs an arithmetic shift right operation, The - most significant bits of the result will be filled with the sign bit - of <tt>op1</tt>. If <tt>op2</tt> is (statically or dynamically) equal to or - larger than the number of bits in <tt>op1</tt>, the result is undefined. If - the arguments are vectors, each vector element of <tt>op1</tt> is shifted by - the corresponding shift amount in <tt>op2</tt>.</p> - -<p>If the <tt>exact</tt> keyword is present, the result value of the - <tt>ashr</tt> is a <a href="#poisonvalues">poison value</a> if any of the bits - shifted out are non-zero.</p> - -<h5>Example:</h5> -<pre> - <result> = ashr i32 4, 1 <i>; yields {i32}:result = 2</i> - <result> = ashr i32 4, 2 <i>; yields {i32}:result = 1</i> - <result> = ashr i8 4, 3 <i>; yields {i8}:result = 0</i> - <result> = ashr i8 -2, 1 <i>; yields {i8}:result = -1</i> - <result> = ashr i32 1, 32 <i>; undefined</i> - <result> = ashr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 3> <i>; yields: result=<2 x i32> < i32 -1, i32 0></i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_and">'<tt>and</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = and <ty> <op1>, <op2> <i>; yields {ty}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>and</tt>' instruction returns the bitwise logical and of its two - operands.</p> - -<h5>Arguments:</h5> -<p>The two arguments to the '<tt>and</tt>' instruction must be - <a href="#t_integer">integer</a> or <a href="#t_vector">vector</a> of integer - values. Both arguments must have identical types.</p> - -<h5>Semantics:</h5> -<p>The truth table used for the '<tt>and</tt>' instruction is:</p> - -<table border="1" cellspacing="0" cellpadding="4"> - <tbody> - <tr> - <th>In0</th> - <th>In1</th> - <th>Out</th> - </tr> - <tr> - <td>0</td> - <td>0</td> - <td>0</td> - </tr> - <tr> - <td>0</td> - <td>1</td> - <td>0</td> - </tr> - <tr> - <td>1</td> - <td>0</td> - <td>0</td> - </tr> - <tr> - <td>1</td> - <td>1</td> - <td>1</td> - </tr> - </tbody> -</table> - -<h5>Example:</h5> -<pre> - <result> = and i32 4, %var <i>; yields {i32}:result = 4 & %var</i> - <result> = and i32 15, 40 <i>; yields {i32}:result = 8</i> - <result> = and i32 4, 8 <i>; yields {i32}:result = 0</i> -</pre> -</div> -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_or">'<tt>or</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = or <ty> <op1>, <op2> <i>; yields {ty}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>or</tt>' instruction returns the bitwise logical inclusive or of its - two operands.</p> - -<h5>Arguments:</h5> -<p>The two arguments to the '<tt>or</tt>' instruction must be - <a href="#t_integer">integer</a> or <a href="#t_vector">vector</a> of integer - values. 
Both arguments must have identical types.</p> - -<h5>Semantics:</h5> -<p>The truth table used for the '<tt>or</tt>' instruction is:</p> - -<table border="1" cellspacing="0" cellpadding="4"> - <tbody> - <tr> - <th>In0</th> - <th>In1</th> - <th>Out</th> - </tr> - <tr> - <td>0</td> - <td>0</td> - <td>0</td> - </tr> - <tr> - <td>0</td> - <td>1</td> - <td>1</td> - </tr> - <tr> - <td>1</td> - <td>0</td> - <td>1</td> - </tr> - <tr> - <td>1</td> - <td>1</td> - <td>1</td> - </tr> - </tbody> -</table> - -<h5>Example:</h5> -<pre> - <result> = or i32 4, %var <i>; yields {i32}:result = 4 | %var</i> - <result> = or i32 15, 40 <i>; yields {i32}:result = 47</i> - <result> = or i32 4, 8 <i>; yields {i32}:result = 12</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_xor">'<tt>xor</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = xor <ty> <op1>, <op2> <i>; yields {ty}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>xor</tt>' instruction returns the bitwise logical exclusive or of - its two operands. The <tt>xor</tt> is used to implement the "one's - complement" operation, which is the "~" operator in C.</p> - -<h5>Arguments:</h5> -<p>The two arguments to the '<tt>xor</tt>' instruction must be - <a href="#t_integer">integer</a> or <a href="#t_vector">vector</a> of integer - values. Both arguments must have identical types.</p> - -<h5>Semantics:</h5> -<p>The truth table used for the '<tt>xor</tt>' instruction is:</p> - -<table border="1" cellspacing="0" cellpadding="4"> - <tbody> - <tr> - <th>In0</th> - <th>In1</th> - <th>Out</th> - </tr> - <tr> - <td>0</td> - <td>0</td> - <td>0</td> - </tr> - <tr> - <td>0</td> - <td>1</td> - <td>1</td> - </tr> - <tr> - <td>1</td> - <td>0</td> - <td>1</td> - </tr> - <tr> - <td>1</td> - <td>1</td> - <td>0</td> - </tr> - </tbody> -</table> - -<h5>Example:</h5> -<pre> - <result> = xor i32 4, %var <i>; yields {i32}:result = 4 ^ %var</i> - <result> = xor i32 15, 40 <i>; yields {i32}:result = 39</i> - <result> = xor i32 4, 8 <i>; yields {i32}:result = 12</i> - <result> = xor i32 %V, -1 <i>; yields {i32}:result = ~%V</i> -</pre> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="vectorops">Vector Operations</a> -</h3> - -<div> - -<p>LLVM supports several instructions to represent vector operations in a - target-independent manner. These instructions cover the element-access and - vector-specific operations needed to process vectors effectively. While LLVM - does directly support these vector operations, many sophisticated algorithms - will want to use target-specific intrinsics to take full advantage of a - specific target.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_extractelement">'<tt>extractelement</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = extractelement <n x <ty>> <val>, i32 <idx> <i>; yields <ty></i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>extractelement</tt>' instruction extracts a single scalar element - from a vector at a specified index.</p> - - -<h5>Arguments:</h5> -<p>The first operand of an '<tt>extractelement</tt>' instruction is a value - of <a href="#t_vector">vector</a> type. The second operand is an index - indicating the position from which to extract the element. 
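-   A small sketch (the vector and index names are placeholders):
-<pre>
-  %elt = extractelement <4 x float> %vec, i32 %ix   <i>; yields float; %ix is computed at run time</i>
-</pre>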
The index may be - a variable.</p> - -<h5>Semantics:</h5> -<p>The result is a scalar of the same type as the element type of - <tt>val</tt>. Its value is the value at position <tt>idx</tt> of - <tt>val</tt>. If <tt>idx</tt> exceeds the length of <tt>val</tt>, the - results are undefined.</p> - -<h5>Example:</h5> -<pre> - <result> = extractelement <4 x i32> %vec, i32 0 <i>; yields i32</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_insertelement">'<tt>insertelement</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = insertelement <n x <ty>> <val>, <ty> <elt>, i32 <idx> <i>; yields <n x <ty>></i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>insertelement</tt>' instruction inserts a scalar element into a - vector at a specified index.</p> - -<h5>Arguments:</h5> -<p>The first operand of an '<tt>insertelement</tt>' instruction is a value - of <a href="#t_vector">vector</a> type. The second operand is a scalar value - whose type must equal the element type of the first operand. The third - operand is an index indicating the position at which to insert the value. - The index may be a variable.</p> - -<h5>Semantics:</h5> -<p>The result is a vector of the same type as <tt>val</tt>. Its element values - are those of <tt>val</tt> except at position <tt>idx</tt>, where it gets the - value <tt>elt</tt>. If <tt>idx</tt> exceeds the length of <tt>val</tt>, the - results are undefined.</p> - -<h5>Example:</h5> -<pre> - <result> = insertelement <4 x i32> %vec, i32 1, i32 0 <i>; yields <4 x i32></i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_shufflevector">'<tt>shufflevector</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <m x i32> <mask> <i>; yields <m x <ty>></i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>shufflevector</tt>' instruction constructs a permutation of elements - from two input vectors, returning a vector with the same element type as the - input and length that is the same as the shuffle mask.</p> - -<h5>Arguments:</h5> -<p>The first two operands of a '<tt>shufflevector</tt>' instruction are vectors - with the same type. The third argument is a shuffle mask whose - element type is always 'i32'. The result of the instruction is a vector - whose length is the same as the shuffle mask and whose element type is the - same as the element type of the first two operands.</p> - -<p>The shuffle mask operand is required to be a constant vector with either - constant integer or undef values.</p> - -<h5>Semantics:</h5> -<p>The elements of the two input vectors are numbered from left to right across - both of the vectors. The shuffle mask operand specifies, for each element of - the result vector, which element of the two input vectors the result element - gets. The element selector may be undef (meaning "don't care") and the - second operand may be undef if performing a shuffle from only one vector.</p> - -<h5>Example:</h5> -<pre> - <result> = shufflevector <4 x i32> %v1, <4 x i32> %v2, - <4 x i32> <i32 0, i32 4, i32 1, i32 5> <i>; yields <4 x i32></i> - <result> = shufflevector <4 x i32> %v1, <4 x i32> undef, - <4 x i32> <i32 0, i32 1, i32 2, i32 3> <i>; yields <4 x i32></i> - Identity shuffle. 
- <result> = shufflevector <8 x i32> %v1, <8 x i32> undef, - <4 x i32> <i32 0, i32 1, i32 2, i32 3> <i>; yields <4 x i32></i> - <result> = shufflevector <4 x i32> %v1, <4 x i32> %v2, - <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7 > <i>; yields <8 x i32></i> -</pre> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="aggregateops">Aggregate Operations</a> -</h3> - -<div> - -<p>LLVM supports several instructions for working with - <a href="#t_aggregate">aggregate</a> values.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_extractvalue">'<tt>extractvalue</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = extractvalue <aggregate type> <val>, <idx>{, <idx>}* -</pre> - -<h5>Overview:</h5> -<p>The '<tt>extractvalue</tt>' instruction extracts the value of a member field - from an <a href="#t_aggregate">aggregate</a> value.</p> - -<h5>Arguments:</h5> -<p>The first operand of an '<tt>extractvalue</tt>' instruction is a value - of <a href="#t_struct">struct</a> or - <a href="#t_array">array</a> type. The operands are constant indices to - specify which value to extract in a similar manner as indices in a - '<tt><a href="#i_getelementptr">getelementptr</a></tt>' instruction.</p> - <p>The major differences to <tt>getelementptr</tt> indexing are:</p> - <ul> - <li>Since the value being indexed is not a pointer, the first index is - omitted and assumed to be zero.</li> - <li>At least one index must be specified.</li> - <li>Not only struct indices but also array indices must be in - bounds.</li> - </ul> - -<h5>Semantics:</h5> -<p>The result is the value at the position in the aggregate specified by the - index operands.</p> - -<h5>Example:</h5> -<pre> - <result> = extractvalue {i32, float} %agg, 0 <i>; yields i32</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_insertvalue">'<tt>insertvalue</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = insertvalue <aggregate type> <val>, <ty> <elt>, <idx>{, <idx>}* <i>; yields <aggregate type></i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>insertvalue</tt>' instruction inserts a value into a member field - in an <a href="#t_aggregate">aggregate</a> value.</p> - -<h5>Arguments:</h5> -<p>The first operand of an '<tt>insertvalue</tt>' instruction is a value - of <a href="#t_struct">struct</a> or - <a href="#t_array">array</a> type. The second operand is a first-class - value to insert. The following operands are constant indices indicating - the position at which to insert the value in a similar manner as indices in a - '<tt><a href="#i_extractvalue">extractvalue</a></tt>' instruction. The - value to insert must have the same type as the value identified by the - indices.</p> - -<h5>Semantics:</h5> -<p>The result is an aggregate of the same type as <tt>val</tt>. 
Its value is - that of <tt>val</tt> except that the value at the position specified by the - indices is that of <tt>elt</tt>.</p> - -<h5>Example:</h5> -<pre> - %agg1 = insertvalue {i32, float} undef, i32 1, 0 <i>; yields {i32 1, float undef}</i> - %agg2 = insertvalue {i32, float} %agg1, float %val, 1 <i>; yields {i32 1, float %val}</i> - %agg3 = insertvalue {i32, {float}} %agg1, float %val, 1, 0 <i>; yields {i32 1, float %val}</i> -</pre> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="memoryops">Memory Access and Addressing Operations</a> -</h3> - -<div> - -<p>A key design point of an SSA-based representation is how it represents - memory. In LLVM, no memory locations are in SSA form, which makes things - very simple. This section describes how to read, write, and allocate - memory in LLVM.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_alloca">'<tt>alloca</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = alloca <type>[, <ty> <NumElements>][, align <alignment>] <i>; yields {type*}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>alloca</tt>' instruction allocates memory on the stack frame of the - currently executing function, to be automatically released when this function - returns to its caller. The object is always allocated in the generic address - space (address space zero).</p> - -<h5>Arguments:</h5> -<p>The '<tt>alloca</tt>' instruction - allocates <tt>sizeof(<type>)*NumElements</tt> bytes of memory on the - runtime stack, returning a pointer of the appropriate type to the program. - If "NumElements" is specified, it is the number of elements allocated, - otherwise "NumElements" is defaulted to be one. If a constant alignment is - specified, the value result of the allocation is guaranteed to be aligned to - at least that boundary. If not specified, or if zero, the target can choose - to align the allocation on any convenient boundary compatible with the - type.</p> - -<p>'<tt>type</tt>' may be any sized type.</p> - -<h5>Semantics:</h5> -<p>Memory is allocated; a pointer is returned. The operation is undefined if - there is insufficient stack space for the allocation. '<tt>alloca</tt>'d - memory is automatically released when the function returns. The - '<tt>alloca</tt>' instruction is commonly used to represent automatic - variables that must have an address available. When the function returns - (either with the <tt><a href="#i_ret">ret</a></tt> - or <tt><a href="#i_resume">resume</a></tt> instructions), the memory is - reclaimed. Allocating zero bytes is legal, but the result is undefined. 
- The order in which memory is allocated (ie., which way the stack grows) is - not specified.</p> - -<p> - -<h5>Example:</h5> -<pre> - %ptr = alloca i32 <i>; yields {i32*}:ptr</i> - %ptr = alloca i32, i32 4 <i>; yields {i32*}:ptr</i> - %ptr = alloca i32, i32 4, align 1024 <i>; yields {i32*}:ptr</i> - %ptr = alloca i32, align 1024 <i>; yields {i32*}:ptr</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_load">'<tt>load</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = load [volatile] <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>][, !invariant.load !<index>] - <result> = load atomic [volatile] <ty>* <pointer> [singlethread] <ordering>, align <alignment> - !<index> = !{ i32 1 } -</pre> - -<h5>Overview:</h5> -<p>The '<tt>load</tt>' instruction is used to read from memory.</p> - -<h5>Arguments:</h5> -<p>The argument to the '<tt>load</tt>' instruction specifies the memory address - from which to load. The pointer must point to - a <a href="#t_firstclass">first class</a> type. If the <tt>load</tt> is - marked as <tt>volatile</tt>, then the optimizer is not allowed to modify the - number or order of execution of this <tt>load</tt> with other <a - href="#volatile">volatile operations</a>.</p> - -<p>If the <code>load</code> is marked as <code>atomic</code>, it takes an extra - <a href="#ordering">ordering</a> and optional <code>singlethread</code> - argument. The <code>release</code> and <code>acq_rel</code> orderings are - not valid on <code>load</code> instructions. Atomic loads produce <a - href="#memorymodel">defined</a> results when they may see multiple atomic - stores. The type of the pointee must be an integer type whose bit width - is a power of two greater than or equal to eight and less than or equal - to a target-specific size limit. <code>align</code> must be explicitly - specified on atomic loads, and the load has undefined behavior if the - alignment is not set to a value which is at least the size in bytes of - the pointee. <code>!nontemporal</code> does not have any defined semantics - for atomic loads.</p> - -<p>The optional constant <tt>align</tt> argument specifies the alignment of the - operation (that is, the alignment of the memory address). A value of 0 or an - omitted <tt>align</tt> argument means that the operation has the abi - alignment for the target. It is the responsibility of the code emitter to - ensure that the alignment information is correct. Overestimating the - alignment results in undefined behavior. Underestimating the alignment may - produce less efficient code. An alignment of 1 is always safe.</p> - -<p>The optional <tt>!nontemporal</tt> metadata must reference a single - metatadata name <index> corresponding to a metadata node with - one <tt>i32</tt> entry of value 1. The existence of - the <tt>!nontemporal</tt> metatadata on the instruction tells the optimizer - and code generator that this load is not expected to be reused in the cache. - The code generator may select special instructions to save cache bandwidth, - such as the <tt>MOVNT</tt> instruction on x86.</p> - -<p>The optional <tt>!invariant.load</tt> metadata must reference a single - metatadata name <index> corresponding to a metadata node with no - entries. The existence of the <tt>!invariant.load</tt> metatadata on the - instruction tells the optimizer and code generator that this load address - points to memory which does not change value during program execution. 
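-   A rough sketch of both annotations (the pointer names and the metadata
-   numbering are illustrative only):
-<pre>
-  %a = load i32* %sptr, align 4, !nontemporal !0      <i>; streaming load, little cache reuse expected</i>
-  %b = load i32* %cptr, align 4, !invariant.load !1   <i>; the loaded location never changes</i>
-
-  !0 = metadata !{i32 1}   <i>; the required single i32 1 entry (defined at module level)</i>
-  !1 = metadata !{}        <i>; an empty metadata node (defined at module level)</i>
-</pre>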
- The optimizer may then move this load around, for example, by hoisting it - out of loops using loop invariant code motion.</p> - -<h5>Semantics:</h5> -<p>The location of memory pointed to is loaded. If the value being loaded is of - scalar type then the number of bytes read does not exceed the minimum number - of bytes needed to hold all bits of the type. For example, loading an - <tt>i24</tt> reads at most three bytes. When loading a value of a type like - <tt>i20</tt> with a size that is not an integral number of bytes, the result - is undefined if the value was not originally written using a store of the - same type.</p> - -<h5>Examples:</h5> -<pre> - %ptr = <a href="#i_alloca">alloca</a> i32 <i>; yields {i32*}:ptr</i> - <a href="#i_store">store</a> i32 3, i32* %ptr <i>; yields {void}</i> - %val = load i32* %ptr <i>; yields {i32}:val = i32 3</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_store">'<tt>store</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - store [volatile] <ty> <value>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>] <i>; yields {void}</i> - store atomic [volatile] <ty> <value>, <ty>* <pointer> [singlethread] <ordering>, align <alignment> <i>; yields {void}</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>store</tt>' instruction is used to write to memory.</p> - -<h5>Arguments:</h5> -<p>There are two arguments to the '<tt>store</tt>' instruction: a value to store - and an address at which to store it. The type of the - '<tt><pointer></tt>' operand must be a pointer to - the <a href="#t_firstclass">first class</a> type of the - '<tt><value></tt>' operand. If the <tt>store</tt> is marked as - <tt>volatile</tt>, then the optimizer is not allowed to modify the number or - order of execution of this <tt>store</tt> with other <a - href="#volatile">volatile operations</a>.</p> - -<p>If the <code>store</code> is marked as <code>atomic</code>, it takes an extra - <a href="#ordering">ordering</a> and optional <code>singlethread</code> - argument. The <code>acquire</code> and <code>acq_rel</code> orderings aren't - valid on <code>store</code> instructions. Atomic loads produce <a - href="#memorymodel">defined</a> results when they may see multiple atomic - stores. The type of the pointee must be an integer type whose bit width - is a power of two greater than or equal to eight and less than or equal - to a target-specific size limit. <code>align</code> must be explicitly - specified on atomic stores, and the store has undefined behavior if the - alignment is not set to a value which is at least the size in bytes of - the pointee. <code>!nontemporal</code> does not have any defined semantics - for atomic stores.</p> - -<p>The optional constant "align" argument specifies the alignment of the - operation (that is, the alignment of the memory address). A value of 0 or an - omitted "align" argument means that the operation has the abi - alignment for the target. It is the responsibility of the code emitter to - ensure that the alignment information is correct. Overestimating the - alignment results in an undefined behavior. Underestimating the alignment may - produce less efficient code. An alignment of 1 is always safe.</p> - -<p>The optional !nontemporal metadata must reference a single metatadata - name <index> corresponding to a metadata node with one i32 entry of - value 1. 
The existence of the !nontemporal metatadata on the - instruction tells the optimizer and code generator that this load is - not expected to be reused in the cache. The code generator may - select special instructions to save cache bandwidth, such as the - MOVNT instruction on x86.</p> - - -<h5>Semantics:</h5> -<p>The contents of memory are updated to contain '<tt><value></tt>' at the - location specified by the '<tt><pointer></tt>' operand. If - '<tt><value></tt>' is of scalar type then the number of bytes written - does not exceed the minimum number of bytes needed to hold all bits of the - type. For example, storing an <tt>i24</tt> writes at most three bytes. When - writing a value of a type like <tt>i20</tt> with a size that is not an - integral number of bytes, it is unspecified what happens to the extra bits - that do not belong to the type, but they will typically be overwritten.</p> - -<h5>Example:</h5> -<pre> - %ptr = <a href="#i_alloca">alloca</a> i32 <i>; yields {i32*}:ptr</i> - store i32 3, i32* %ptr <i>; yields {void}</i> - %val = <a href="#i_load">load</a> i32* %ptr <i>; yields {i32}:val = i32 3</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> -<a name="i_fence">'<tt>fence</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - fence [singlethread] <ordering> <i>; yields {void}</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>fence</tt>' instruction is used to introduce happens-before edges -between operations.</p> - -<h5>Arguments:</h5> <p>'<code>fence</code>' instructions take an <a -href="#ordering">ordering</a> argument which defines what -<i>synchronizes-with</i> edges they add. They can only be given -<code>acquire</code>, <code>release</code>, <code>acq_rel</code>, and -<code>seq_cst</code> orderings.</p> - -<h5>Semantics:</h5> -<p>A fence <var>A</var> which has (at least) <code>release</code> ordering -semantics <i>synchronizes with</i> a fence <var>B</var> with (at least) -<code>acquire</code> ordering semantics if and only if there exist atomic -operations <var>X</var> and <var>Y</var>, both operating on some atomic object -<var>M</var>, such that <var>A</var> is sequenced before <var>X</var>, -<var>X</var> modifies <var>M</var> (either directly or through some side effect -of a sequence headed by <var>X</var>), <var>Y</var> is sequenced before -<var>B</var>, and <var>Y</var> observes <var>M</var>. This provides a -<i>happens-before</i> dependency between <var>A</var> and <var>B</var>. Rather -than an explicit <code>fence</code>, one (but not both) of the atomic operations -<var>X</var> or <var>Y</var> might provide a <code>release</code> or -<code>acquire</code> (resp.) ordering constraint and still -<i>synchronize-with</i> the explicit <code>fence</code> and establish the -<i>happens-before</i> edge.</p> - -<p>A <code>fence</code> which has <code>seq_cst</code> ordering, in addition to -having both <code>acquire</code> and <code>release</code> semantics specified -above, participates in the global program order of other <code>seq_cst</code> -operations and/or fences.</p> - -<p>The optional "<a href="#singlethread"><code>singlethread</code></a>" argument -specifies that the fence only synchronizes with other fences in the same -thread. 
-(This is useful for interacting with signal handlers.)</p>
-
-<h5>Example:</h5>
-<pre>
-  fence acquire                          <i>; yields {void}</i>
-  fence singlethread seq_cst             <i>; yields {void}</i>
-</pre>
-
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4>
-<a name="i_cmpxchg">'<tt>cmpxchg</tt>' Instruction</a>
-</h4>
-
-<div>
-
-<h5>Syntax:</h5>
-<pre>
-  cmpxchg [volatile] <ty>* <pointer>, <ty> <cmp>, <ty> <new> [singlethread] <ordering>   <i>; yields {ty}</i>
-</pre>
-
-<h5>Overview:</h5>
-<p>The '<tt>cmpxchg</tt>' instruction is used to atomically modify memory.
-It loads a value in memory and compares it to a given value. If they are
-equal, it stores a new value into the memory.</p>
-
-<h5>Arguments:</h5>
-<p>There are three arguments to the '<code>cmpxchg</code>' instruction: an
-address to operate on, a value to compare to the value currently at that
-address, and a new value to place at that address if the compared values are
-equal. The type of '<var><cmp></var>' must be an integer type whose
-bit width is a power of two greater than or equal to eight and less than
-or equal to a target-specific size limit. '<var><cmp></var>' and
-'<var><new></var>' must have the same type, and the type of
-'<var><pointer></var>' must be a pointer to that type. If the
-<code>cmpxchg</code> is marked as <code>volatile</code>, then the
-optimizer is not allowed to modify the number or order of execution
-of this <code>cmpxchg</code> with other <a href="#volatile">volatile
-operations</a>.</p>
-
-<!-- FIXME: Extend allowed types. -->
-
-<p>The <a href="#ordering"><var>ordering</var></a> argument specifies how this
-<code>cmpxchg</code> synchronizes with other atomic operations.</p>
-
-<p>The optional "<code>singlethread</code>" argument declares that the
-<code>cmpxchg</code> is only atomic with respect to code (usually signal
-handlers) running in the same thread as the <code>cmpxchg</code>. Otherwise the
-cmpxchg is atomic with respect to all other code in the system.</p>
-
-<p>The pointer passed into cmpxchg must have alignment greater than or equal to
-the size in memory of the operand.</p>
-
-<h5>Semantics:</h5>
-<p>The contents of memory at the location specified by the
-'<tt><pointer></tt>' operand are read and compared to
-'<tt><cmp></tt>'; if they are equal,
-'<tt><new></tt>' is written. The original value at the location
-is returned.</p>
-
-<p>A successful <code>cmpxchg</code> is a read-modify-write instruction for the
-purpose of identifying <a href="#release_sequence">release sequences</a>. A
-failed <code>cmpxchg</code> is equivalent to an atomic load with an ordering
-parameter determined by dropping any <code>release</code> part of the
-<code>cmpxchg</code>'s ordering.</p>
-
-<!--
-FIXME: Is compare_exchange_weak() necessary? (Consider after we've done
-optimization work on ARM.)
-
-FIXME: Is a weaker ordering constraint on failure helpful in practice?
--->
-
-<h5>Example:</h5>
-<pre>
-entry:
-  %orig = <a href="#i_load">load</a> atomic i32* %ptr unordered, align 4   <i>; yields {i32}</i>
-  <a href="#i_br">br</a> label %loop
-
-loop:
-  %cmp = <a href="#i_phi">phi</a> i32 [ %orig, %entry ], [%old, %loop]
-  %squared = <a href="#i_mul">mul</a> i32 %cmp, %cmp
-  %old = cmpxchg i32* %ptr, i32 %cmp, i32 %squared acq_rel   <i>; yields {i32}</i>
-  %success = <a href="#i_icmp">icmp</a> eq i32 %cmp, %old
-  <a href="#i_br">br</a> i1 %success, label %done, label %loop
-
-done:
-  ...
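-  <i>; This is the usual compare-and-swap retry idiom: %old is the value that was</i>
-  <i>; actually in memory; whenever it differs from %cmp the loop retries with the</i>
-  <i>; freshly observed value, so *%ptr ends up being squared atomically.</i>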
-</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> -<a name="i_atomicrmw">'<tt>atomicrmw</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - atomicrmw [volatile] <operation> <ty>* <pointer>, <ty> <value> [singlethread] <ordering> <i>; yields {ty}</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>atomicrmw</tt>' instruction is used to atomically modify memory.</p> - -<h5>Arguments:</h5> -<p>There are three arguments to the '<code>atomicrmw</code>' instruction: an -operation to apply, an address whose value to modify, an argument to the -operation. The operation must be one of the following keywords:</p> -<ul> - <li>xchg</li> - <li>add</li> - <li>sub</li> - <li>and</li> - <li>nand</li> - <li>or</li> - <li>xor</li> - <li>max</li> - <li>min</li> - <li>umax</li> - <li>umin</li> -</ul> - -<p>The type of '<var><value></var>' must be an integer type whose -bit width is a power of two greater than or equal to eight and less than -or equal to a target-specific size limit. The type of the -'<code><pointer></code>' operand must be a pointer to that type. -If the <code>atomicrmw</code> is marked as <code>volatile</code>, then the -optimizer is not allowed to modify the number or order of execution of this -<code>atomicrmw</code> with other <a href="#volatile">volatile - operations</a>.</p> - -<!-- FIXME: Extend allowed types. --> - -<h5>Semantics:</h5> -<p>The contents of memory at the location specified by the -'<tt><pointer></tt>' operand are atomically read, modified, and written -back. The original value at the location is returned. The modification is -specified by the <var>operation</var> argument:</p> - -<ul> - <li>xchg: <code>*ptr = val</code></li> - <li>add: <code>*ptr = *ptr + val</code></li> - <li>sub: <code>*ptr = *ptr - val</code></li> - <li>and: <code>*ptr = *ptr & val</code></li> - <li>nand: <code>*ptr = ~(*ptr & val)</code></li> - <li>or: <code>*ptr = *ptr | val</code></li> - <li>xor: <code>*ptr = *ptr ^ val</code></li> - <li>max: <code>*ptr = *ptr > val ? *ptr : val</code> (using a signed comparison)</li> - <li>min: <code>*ptr = *ptr < val ? *ptr : val</code> (using a signed comparison)</li> - <li>umax: <code>*ptr = *ptr > val ? *ptr : val</code> (using an unsigned comparison)</li> - <li>umin: <code>*ptr = *ptr < val ? *ptr : val</code> (using an unsigned comparison)</li> -</ul> - -<h5>Example:</h5> -<pre> - %old = atomicrmw add i32* %ptr, i32 1 acquire <i>; yields {i32}</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_getelementptr">'<tt>getelementptr</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = getelementptr <pty>* <ptrval>{, <ty> <idx>}* - <result> = getelementptr inbounds <pty>* <ptrval>{, <ty> <idx>}* - <result> = getelementptr <ptr vector> ptrval, <vector index type> idx -</pre> - -<h5>Overview:</h5> -<p>The '<tt>getelementptr</tt>' instruction is used to get the address of a - subelement of an <a href="#t_aggregate">aggregate</a> data structure. - It performs address calculation only and does not access memory.</p> - -<h5>Arguments:</h5> -<p>The first argument is always a pointer or a vector of pointers, - and forms the basis of the - calculation. The remaining arguments are indices that indicate which of the - elements of the aggregate object are indexed. The interpretation of each - index is dependent on the type being indexed into. 
The first index always - indexes the pointer value given as the first argument, the second index - indexes a value of the type pointed to (not necessarily the value directly - pointed to, since the first index can be non-zero), etc. The first type - indexed into must be a pointer value, subsequent types can be arrays, - vectors, and structs. Note that subsequent types being indexed into - can never be pointers, since that would require loading the pointer before - continuing calculation.</p> - -<p>The type of each index argument depends on the type it is indexing into. - When indexing into a (optionally packed) structure, only <tt>i32</tt> - integer <b>constants</b> are allowed. When indexing into an array, pointer - or vector, integers of any width are allowed, and they are not required to be - constant. These integers are treated as signed values where relevant.</p> - -<p>For example, let's consider a C code fragment and how it gets compiled to - LLVM:</p> - -<pre class="doc_code"> -struct RT { - char A; - int B[10][20]; - char C; -}; -struct ST { - int X; - double Y; - struct RT Z; -}; - -int *foo(struct ST *s) { - return &s[1].Z.B[5][13]; -} -</pre> - -<p>The LLVM code generated by Clang is:</p> - -<pre class="doc_code"> -%struct.RT = <a href="#namedtypes">type</a> { i8, [10 x [20 x i32]], i8 } -%struct.ST = <a href="#namedtypes">type</a> { i32, double, %struct.RT } - -define i32* @foo(%struct.ST* %s) nounwind uwtable readnone optsize ssp { -entry: - %arrayidx = getelementptr inbounds %struct.ST* %s, i64 1, i32 2, i32 1, i64 5, i64 13 - ret i32* %arrayidx -} -</pre> - -<h5>Semantics:</h5> -<p>In the example above, the first index is indexing into the - '<tt>%struct.ST*</tt>' type, which is a pointer, yielding a - '<tt>%struct.ST</tt>' = '<tt>{ i32, double, %struct.RT }</tt>' type, a - structure. The second index indexes into the third element of the structure, - yielding a '<tt>%struct.RT</tt>' = '<tt>{ i8 , [10 x [20 x i32]], i8 }</tt>' - type, another structure. The third index indexes into the second element of - the structure, yielding a '<tt>[10 x [20 x i32]]</tt>' type, an array. The - two dimensions of the array are subscripted into, yielding an '<tt>i32</tt>' - type. The '<tt>getelementptr</tt>' instruction returns a pointer to this - element, thus computing a value of '<tt>i32*</tt>' type.</p> - -<p>Note that it is perfectly legal to index partially through a structure, - returning a pointer to an inner element. Because of this, the LLVM code for - the given testcase is equivalent to:</p> - -<pre class="doc_code"> -define i32* @foo(%struct.ST* %s) { - %t1 = getelementptr %struct.ST* %s, i32 1 <i>; yields %struct.ST*:%t1</i> - %t2 = getelementptr %struct.ST* %t1, i32 0, i32 2 <i>; yields %struct.RT*:%t2</i> - %t3 = getelementptr %struct.RT* %t2, i32 0, i32 1 <i>; yields [10 x [20 x i32]]*:%t3</i> - %t4 = getelementptr [10 x [20 x i32]]* %t3, i32 0, i32 5 <i>; yields [20 x i32]*:%t4</i> - %t5 = getelementptr [20 x i32]* %t4, i32 0, i32 13 <i>; yields i32*:%t5</i> - ret i32* %t5 -} -</pre> - -<p>If the <tt>inbounds</tt> keyword is present, the result value of the - <tt>getelementptr</tt> is a <a href="#poisonvalues">poison value</a> if the - base pointer is not an <i>in bounds</i> address of an allocated object, - or if any of the addresses that would be formed by successive addition of - the offsets implied by the indices to the base address with infinitely - precise signed arithmetic are not an <i>in bounds</i> address of that - allocated object. 
The <i>in bounds</i> addresses for an allocated object - are all the addresses that point into the object, plus the address one - byte past the end. - In cases where the base is a vector of pointers the <tt>inbounds</tt> keyword - applies to each of the computations element-wise. </p> - -<p>If the <tt>inbounds</tt> keyword is not present, the offsets are added to - the base address with silently-wrapping two's complement arithmetic. If the - offsets have a different width from the pointer, they are sign-extended or - truncated to the width of the pointer. The result value of the - <tt>getelementptr</tt> may be outside the object pointed to by the base - pointer. The result value may not necessarily be used to access memory - though, even if it happens to point into allocated storage. See the - <a href="#pointeraliasing">Pointer Aliasing Rules</a> section for more - information.</p> - -<p>The getelementptr instruction is often confusing. For some more insight into - how it works, see <a href="GetElementPtr.html">the getelementptr FAQ</a>.</p> - -<h5>Example:</h5> -<pre> - <i>; yields [12 x i8]*:aptr</i> - %aptr = getelementptr {i32, [12 x i8]}* %saptr, i64 0, i32 1 - <i>; yields i8*:vptr</i> - %vptr = getelementptr {i32, <2 x i8>}* %svptr, i64 0, i32 1, i32 1 - <i>; yields i8*:eptr</i> - %eptr = getelementptr [12 x i8]* %aptr, i64 0, i32 1 - <i>; yields i32*:iptr</i> - %iptr = getelementptr [10 x i32]* @arr, i16 0, i16 0 -</pre> - -<p>In cases where the pointer argument is a vector of pointers, only a - single index may be used, and the number of vector elements has to be - the same. For example: </p> -<pre class="doc_code"> - %A = getelementptr <4 x i8*> %ptrs, <4 x i64> %offsets, -</pre> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="convertops">Conversion Operations</a> -</h3> - -<div> - -<p>The instructions in this category are the conversion instructions (casting) - which all take a single operand and a type. They perform various bit - conversions on the operand.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_trunc">'<tt>trunc .. to</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = trunc <ty> <value> to <ty2> <i>; yields ty2</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>trunc</tt>' instruction truncates its operand to the - type <tt>ty2</tt>.</p> - -<h5>Arguments:</h5> -<p>The '<tt>trunc</tt>' instruction takes a value to trunc, and a type to trunc it to. - Both types must be of <a href="#t_integer">integer</a> types, or vectors - of the same number of integers. - The bit size of the <tt>value</tt> must be larger than - the bit size of the destination type, <tt>ty2</tt>. - Equal sized types are not allowed.</p> - -<h5>Semantics:</h5> -<p>The '<tt>trunc</tt>' instruction truncates the high order bits - in <tt>value</tt> and converts the remaining bits to <tt>ty2</tt>. Since the - source size must be larger than the destination size, <tt>trunc</tt> cannot - be a <i>no-op cast</i>. It will always truncate bits.</p> - -<h5>Example:</h5> -<pre> - %X = trunc i32 257 to i8 <i>; yields i8:1</i> - %Y = trunc i32 123 to i1 <i>; yields i1:true</i> - %Z = trunc i32 122 to i1 <i>; yields i1:false</i> - %W = trunc <2 x i16> <i16 8, i16 7> to <2 x i8> <i>; yields <i8 8, i8 7></i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_zext">'<tt>zext .. 
to</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = zext <ty> <value> to <ty2> <i>; yields ty2</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>zext</tt>' instruction zero extends its operand to type - <tt>ty2</tt>.</p> - - -<h5>Arguments:</h5> -<p>The '<tt>zext</tt>' instruction takes a value to cast, and a type to cast it to. - Both types must be of <a href="#t_integer">integer</a> types, or vectors - of the same number of integers. - The bit size of the <tt>value</tt> must be smaller than - the bit size of the destination type, - <tt>ty2</tt>.</p> - -<h5>Semantics:</h5> -<p>The <tt>zext</tt> fills the high order bits of the <tt>value</tt> with zero - bits until it reaches the size of the destination type, <tt>ty2</tt>.</p> - -<p>When zero extending from i1, the result will always be either 0 or 1.</p> - -<h5>Example:</h5> -<pre> - %X = zext i32 257 to i64 <i>; yields i64:257</i> - %Y = zext i1 true to i32 <i>; yields i32:1</i> - %Z = zext <2 x i16> <i16 8, i16 7> to <2 x i32> <i>; yields <i32 8, i32 7></i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_sext">'<tt>sext .. to</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = sext <ty> <value> to <ty2> <i>; yields ty2</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>sext</tt>' sign extends <tt>value</tt> to the type <tt>ty2</tt>.</p> - -<h5>Arguments:</h5> -<p>The '<tt>sext</tt>' instruction takes a value to cast, and a type to cast it to. - Both types must be of <a href="#t_integer">integer</a> types, or vectors - of the same number of integers. - The bit size of the <tt>value</tt> must be smaller than - the bit size of the destination type, - <tt>ty2</tt>.</p> - -<h5>Semantics:</h5> -<p>The '<tt>sext</tt>' instruction performs a sign extension by copying the sign - bit (highest order bit) of the <tt>value</tt> until it reaches the bit size - of the type <tt>ty2</tt>.</p> - -<p>When sign extending from i1, the extension always results in -1 or 0.</p> - -<h5>Example:</h5> -<pre> - %X = sext i8 -1 to i16 <i>; yields i16 :65535</i> - %Y = sext i1 true to i32 <i>; yields i32:-1</i> - %Z = sext <2 x i16> <i16 8, i16 7> to <2 x i32> <i>; yields <i32 8, i32 7></i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_fptrunc">'<tt>fptrunc .. to</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = fptrunc <ty> <value> to <ty2> <i>; yields ty2</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>fptrunc</tt>' instruction truncates <tt>value</tt> to type - <tt>ty2</tt>.</p> - -<h5>Arguments:</h5> -<p>The '<tt>fptrunc</tt>' instruction takes a <a href="#t_floating">floating - point</a> value to cast and a <a href="#t_floating">floating point</a> type - to cast it to. The size of <tt>value</tt> must be larger than the size of - <tt>ty2</tt>. This implies that <tt>fptrunc</tt> cannot be used to make a - <i>no-op cast</i>.</p> - -<h5>Semantics:</h5> -<p>The '<tt>fptrunc</tt>' instruction truncates a <tt>value</tt> from a larger - <a href="#t_floating">floating point</a> type to a smaller - <a href="#t_floating">floating point</a> type. 
If the value cannot fit - within the destination type, <tt>ty2</tt>, then the results are - undefined.</p> - -<h5>Example:</h5> -<pre> - %X = fptrunc double 123.0 to float <i>; yields float:123.0</i> - %Y = fptrunc double 1.0E+300 to float <i>; yields undefined</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_fpext">'<tt>fpext .. to</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = fpext <ty> <value> to <ty2> <i>; yields ty2</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>fpext</tt>' extends a floating point <tt>value</tt> to a larger - floating point value.</p> - -<h5>Arguments:</h5> -<p>The '<tt>fpext</tt>' instruction takes a - <a href="#t_floating">floating point</a> <tt>value</tt> to cast, and - a <a href="#t_floating">floating point</a> type to cast it to. The source - type must be smaller than the destination type.</p> - -<h5>Semantics:</h5> -<p>The '<tt>fpext</tt>' instruction extends the <tt>value</tt> from a smaller - <a href="#t_floating">floating point</a> type to a larger - <a href="#t_floating">floating point</a> type. The <tt>fpext</tt> cannot be - used to make a <i>no-op cast</i> because it always changes bits. Use - <tt>bitcast</tt> to make a <i>no-op cast</i> for a floating point cast.</p> - -<h5>Example:</h5> -<pre> - %X = fpext float 3.125 to double <i>; yields double:3.125000e+00</i> - %Y = fpext double %X to fp128 <i>; yields fp128:0xL00000000000000004000900000000000</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_fptoui">'<tt>fptoui .. to</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = fptoui <ty> <value> to <ty2> <i>; yields ty2</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>fptoui</tt>' converts a floating point <tt>value</tt> to its - unsigned integer equivalent of type <tt>ty2</tt>.</p> - -<h5>Arguments:</h5> -<p>The '<tt>fptoui</tt>' instruction takes a value to cast, which must be a - scalar or vector <a href="#t_floating">floating point</a> value, and a type - to cast it to <tt>ty2</tt>, which must be an <a href="#t_integer">integer</a> - type. If <tt>ty</tt> is a vector floating point type, <tt>ty2</tt> must be a - vector integer type with the same number of elements as <tt>ty</tt></p> - -<h5>Semantics:</h5> -<p>The '<tt>fptoui</tt>' instruction converts its - <a href="#t_floating">floating point</a> operand into the nearest (rounding - towards zero) unsigned integer value. If the value cannot fit - in <tt>ty2</tt>, the results are undefined.</p> - -<h5>Example:</h5> -<pre> - %X = fptoui double 123.0 to i32 <i>; yields i32:123</i> - %Y = fptoui float 1.0E+300 to i1 <i>; yields undefined:1</i> - %Z = fptoui float 1.04E+17 to i8 <i>; yields undefined:1</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_fptosi">'<tt>fptosi .. to</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = fptosi <ty> <value> to <ty2> <i>; yields ty2</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>fptosi</tt>' instruction converts - <a href="#t_floating">floating point</a> <tt>value</tt> to - type <tt>ty2</tt>.</p> - -<h5>Arguments:</h5> -<p>The '<tt>fptosi</tt>' instruction takes a value to cast, which must be a - scalar or vector <a href="#t_floating">floating point</a> value, and a type - to cast it to <tt>ty2</tt>, which must be an <a href="#t_integer">integer</a> - type. 
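<p>For illustration only, a hedged sketch of the scalar and vector forms of
   '<tt>fptosi</tt>' (the constants are made up; rounding is toward zero, as
   described in the semantics below):</p>

<pre>
  %a = fptosi float -3.5 to i32                                  <i>; yields i32:-3</i>
  %v = fptosi <2 x float> <float 1.5, float -2.5> to <2 x i32>   <i>; yields <2 x i32> <i32 1, i32 -2></i>
</pre>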
If <tt>ty</tt> is a vector floating point type, <tt>ty2</tt> must be a - vector integer type with the same number of elements as <tt>ty</tt></p> - -<h5>Semantics:</h5> -<p>The '<tt>fptosi</tt>' instruction converts its - <a href="#t_floating">floating point</a> operand into the nearest (rounding - towards zero) signed integer value. If the value cannot fit in <tt>ty2</tt>, - the results are undefined.</p> - -<h5>Example:</h5> -<pre> - %X = fptosi double -123.0 to i32 <i>; yields i32:-123</i> - %Y = fptosi float 1.0E-247 to i1 <i>; yields undefined:1</i> - %Z = fptosi float 1.04E+17 to i8 <i>; yields undefined:1</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_uitofp">'<tt>uitofp .. to</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = uitofp <ty> <value> to <ty2> <i>; yields ty2</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>uitofp</tt>' instruction regards <tt>value</tt> as an unsigned - integer and converts that value to the <tt>ty2</tt> type.</p> - -<h5>Arguments:</h5> -<p>The '<tt>uitofp</tt>' instruction takes a value to cast, which must be a - scalar or vector <a href="#t_integer">integer</a> value, and a type to cast - it to <tt>ty2</tt>, which must be an <a href="#t_floating">floating point</a> - type. If <tt>ty</tt> is a vector integer type, <tt>ty2</tt> must be a vector - floating point type with the same number of elements as <tt>ty</tt></p> - -<h5>Semantics:</h5> -<p>The '<tt>uitofp</tt>' instruction interprets its operand as an unsigned - integer quantity and converts it to the corresponding floating point - value. If the value cannot fit in the floating point value, the results are - undefined.</p> - -<h5>Example:</h5> -<pre> - %X = uitofp i32 257 to float <i>; yields float:257.0</i> - %Y = uitofp i8 -1 to double <i>; yields double:255.0</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_sitofp">'<tt>sitofp .. to</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = sitofp <ty> <value> to <ty2> <i>; yields ty2</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>sitofp</tt>' instruction regards <tt>value</tt> as a signed integer - and converts that value to the <tt>ty2</tt> type.</p> - -<h5>Arguments:</h5> -<p>The '<tt>sitofp</tt>' instruction takes a value to cast, which must be a - scalar or vector <a href="#t_integer">integer</a> value, and a type to cast - it to <tt>ty2</tt>, which must be an <a href="#t_floating">floating point</a> - type. If <tt>ty</tt> is a vector integer type, <tt>ty2</tt> must be a vector - floating point type with the same number of elements as <tt>ty</tt></p> - -<h5>Semantics:</h5> -<p>The '<tt>sitofp</tt>' instruction interprets its operand as a signed integer - quantity and converts it to the corresponding floating point value. If the - value cannot fit in the floating point value, the results are undefined.</p> - -<h5>Example:</h5> -<pre> - %X = sitofp i32 257 to float <i>; yields float:257.0</i> - %Y = sitofp i8 -1 to double <i>; yields double:-1.0</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_ptrtoint">'<tt>ptrtoint .. 
to</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = ptrtoint <ty> <value> to <ty2> <i>; yields ty2</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>ptrtoint</tt>' instruction converts the pointer or a vector of - pointers <tt>value</tt> to - the integer (or vector of integers) type <tt>ty2</tt>.</p> - -<h5>Arguments:</h5> -<p>The '<tt>ptrtoint</tt>' instruction takes a <tt>value</tt> to cast, which - must be a a value of type <a href="#t_pointer">pointer</a> or a vector of - pointers, and a type to cast it to - <tt>ty2</tt>, which must be an <a href="#t_integer">integer</a> or a vector - of integers type.</p> - -<h5>Semantics:</h5> -<p>The '<tt>ptrtoint</tt>' instruction converts <tt>value</tt> to integer type - <tt>ty2</tt> by interpreting the pointer value as an integer and either - truncating or zero extending that value to the size of the integer type. If - <tt>value</tt> is smaller than <tt>ty2</tt> then a zero extension is done. If - <tt>value</tt> is larger than <tt>ty2</tt> then a truncation is done. If they - are the same size, then nothing is done (<i>no-op cast</i>) other than a type - change.</p> - -<h5>Example:</h5> -<pre> - %X = ptrtoint i32* %P to i8 <i>; yields truncation on 32-bit architecture</i> - %Y = ptrtoint i32* %P to i64 <i>; yields zero extension on 32-bit architecture</i> - %Z = ptrtoint <4 x i32*> %P to <4 x i64><i>; yields vector zero extension for a vector of addresses on 32-bit architecture</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_inttoptr">'<tt>inttoptr .. to</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = inttoptr <ty> <value> to <ty2> <i>; yields ty2</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>inttoptr</tt>' instruction converts an integer <tt>value</tt> to a - pointer type, <tt>ty2</tt>.</p> - -<h5>Arguments:</h5> -<p>The '<tt>inttoptr</tt>' instruction takes an <a href="#t_integer">integer</a> - value to cast, and a type to cast it to, which must be a - <a href="#t_pointer">pointer</a> type.</p> - -<h5>Semantics:</h5> -<p>The '<tt>inttoptr</tt>' instruction converts <tt>value</tt> to type - <tt>ty2</tt> by applying either a zero extension or a truncation depending on - the size of the integer <tt>value</tt>. If <tt>value</tt> is larger than the - size of a pointer then a truncation is done. If <tt>value</tt> is smaller - than the size of a pointer then a zero extension is done. If they are the - same size, nothing is done (<i>no-op cast</i>).</p> - -<h5>Example:</h5> -<pre> - %X = inttoptr i32 255 to i32* <i>; yields zero extension on 64-bit architecture</i> - %Y = inttoptr i32 255 to i32* <i>; yields no-op on 32-bit architecture</i> - %Z = inttoptr i64 0 to i32* <i>; yields truncation on 32-bit architecture</i> - %Z = inttoptr <4 x i32> %G to <4 x i8*><i>; yields truncation of vector G to four pointers</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_bitcast">'<tt>bitcast .. 
to</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = bitcast <ty> <value> to <ty2> <i>; yields ty2</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>bitcast</tt>' instruction converts <tt>value</tt> to type - <tt>ty2</tt> without changing any bits.</p> - -<h5>Arguments:</h5> -<p>The '<tt>bitcast</tt>' instruction takes a value to cast, which must be a - non-aggregate first class value, and a type to cast it to, which must also be - a non-aggregate <a href="#t_firstclass">first class</a> type. The bit sizes - of <tt>value</tt> and the destination type, <tt>ty2</tt>, must be - identical. If the source type is a pointer, the destination type must also be - a pointer. This instruction supports bitwise conversion of vectors to - integers and to vectors of other types (as long as they have the same - size).</p> - -<h5>Semantics:</h5> -<p>The '<tt>bitcast</tt>' instruction converts <tt>value</tt> to type - <tt>ty2</tt>. It is always a <i>no-op cast</i> because no bits change with - this conversion. The conversion is done as if the <tt>value</tt> had been - stored to memory and read back as type <tt>ty2</tt>. - Pointer (or vector of pointers) types may only be converted to other pointer - (or vector of pointers) types with this instruction. To convert - pointers to other types, use the <a href="#i_inttoptr">inttoptr</a> or - <a href="#i_ptrtoint">ptrtoint</a> instructions first.</p> - -<h5>Example:</h5> -<pre> - %X = bitcast i8 255 to i8 <i>; yields i8 :-1</i> - %Y = bitcast i32* %x to sint* <i>; yields sint*:%x</i> - %Z = bitcast <2 x int> %V to i64; <i>; yields i64: %V</i> - %Z = bitcast <2 x i32*> %V to <2 x i64*> <i>; yields <2 x i64*></i> -</pre> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="otherops">Other Operations</a> -</h3> - -<div> - -<p>The instructions in this category are the "miscellaneous" instructions, which - defy better classification.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_icmp">'<tt>icmp</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = icmp <cond> <ty> <op1>, <op2> <i>; yields {i1} or {<N x i1>}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>icmp</tt>' instruction returns a boolean value or a vector of - boolean values based on comparison of its two integer, integer vector, - pointer, or pointer vector operands.</p> - -<h5>Arguments:</h5> -<p>The '<tt>icmp</tt>' instruction takes three operands. The first operand is - the condition code indicating the kind of comparison to perform. It is not a - value, just a keyword. The possible condition code are:</p> - -<ol> - <li><tt>eq</tt>: equal</li> - <li><tt>ne</tt>: not equal </li> - <li><tt>ugt</tt>: unsigned greater than</li> - <li><tt>uge</tt>: unsigned greater or equal</li> - <li><tt>ult</tt>: unsigned less than</li> - <li><tt>ule</tt>: unsigned less or equal</li> - <li><tt>sgt</tt>: signed greater than</li> - <li><tt>sge</tt>: signed greater or equal</li> - <li><tt>slt</tt>: signed less than</li> - <li><tt>sle</tt>: signed less or equal</li> -</ol> - -<p>The remaining two arguments must be <a href="#t_integer">integer</a> or - <a href="#t_pointer">pointer</a> or integer <a href="#t_vector">vector</a> - typed. They must also be identical types.</p> - -<h5>Semantics:</h5> -<p>The '<tt>icmp</tt>' compares <tt>op1</tt> and <tt>op2</tt> according to the - condition code given as <tt>cond</tt>. 
The comparison performed always yields - either an <a href="#t_integer"><tt>i1</tt></a> or vector of <tt>i1</tt> - result, as follows:</p> - -<ol> - <li><tt>eq</tt>: yields <tt>true</tt> if the operands are equal, - <tt>false</tt> otherwise. No sign interpretation is necessary or - performed.</li> - - <li><tt>ne</tt>: yields <tt>true</tt> if the operands are unequal, - <tt>false</tt> otherwise. No sign interpretation is necessary or - performed.</li> - - <li><tt>ugt</tt>: interprets the operands as unsigned values and yields - <tt>true</tt> if <tt>op1</tt> is greater than <tt>op2</tt>.</li> - - <li><tt>uge</tt>: interprets the operands as unsigned values and yields - <tt>true</tt> if <tt>op1</tt> is greater than or equal - to <tt>op2</tt>.</li> - - <li><tt>ult</tt>: interprets the operands as unsigned values and yields - <tt>true</tt> if <tt>op1</tt> is less than <tt>op2</tt>.</li> - - <li><tt>ule</tt>: interprets the operands as unsigned values and yields - <tt>true</tt> if <tt>op1</tt> is less than or equal to <tt>op2</tt>.</li> - - <li><tt>sgt</tt>: interprets the operands as signed values and yields - <tt>true</tt> if <tt>op1</tt> is greater than <tt>op2</tt>.</li> - - <li><tt>sge</tt>: interprets the operands as signed values and yields - <tt>true</tt> if <tt>op1</tt> is greater than or equal - to <tt>op2</tt>.</li> - - <li><tt>slt</tt>: interprets the operands as signed values and yields - <tt>true</tt> if <tt>op1</tt> is less than <tt>op2</tt>.</li> - - <li><tt>sle</tt>: interprets the operands as signed values and yields - <tt>true</tt> if <tt>op1</tt> is less than or equal to <tt>op2</tt>.</li> -</ol> - -<p>If the operands are <a href="#t_pointer">pointer</a> typed, the pointer - values are compared as if they were integers.</p> - -<p>If the operands are integer vectors, then they are compared element by - element. The result is an <tt>i1</tt> vector with the same number of elements - as the values being compared. Otherwise, the result is an <tt>i1</tt>.</p> - -<h5>Example:</h5> -<pre> - <result> = icmp eq i32 4, 5 <i>; yields: result=false</i> - <result> = icmp ne float* %X, %X <i>; yields: result=false</i> - <result> = icmp ult i16 4, 5 <i>; yields: result=true</i> - <result> = icmp sgt i16 4, 5 <i>; yields: result=false</i> - <result> = icmp ule i16 -4, 5 <i>; yields: result=false</i> - <result> = icmp sge i16 4, 5 <i>; yields: result=false</i> -</pre> - -<p>Note that the code generator does not yet support vector types with - the <tt>icmp</tt> instruction.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_fcmp">'<tt>fcmp</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = fcmp <cond> <ty> <op1>, <op2> <i>; yields {i1} or {<N x i1>}:result</i> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>fcmp</tt>' instruction returns a boolean value or vector of boolean - values based on comparison of its operands.</p> - -<p>If the operands are floating point scalars, then the result type is a boolean -(<a href="#t_integer"><tt>i1</tt></a>).</p> - -<p>If the operands are floating point vectors, then the result type is a vector - of boolean with the same number of elements as the operands being - compared.</p> - -<h5>Arguments:</h5> -<p>The '<tt>fcmp</tt>' instruction takes three operands. The first operand is - the condition code indicating the kind of comparison to perform. It is not a - value, just a keyword. 
The possible condition code are:</p> - -<ol> - <li><tt>false</tt>: no comparison, always returns false</li> - <li><tt>oeq</tt>: ordered and equal</li> - <li><tt>ogt</tt>: ordered and greater than </li> - <li><tt>oge</tt>: ordered and greater than or equal</li> - <li><tt>olt</tt>: ordered and less than </li> - <li><tt>ole</tt>: ordered and less than or equal</li> - <li><tt>one</tt>: ordered and not equal</li> - <li><tt>ord</tt>: ordered (no nans)</li> - <li><tt>ueq</tt>: unordered or equal</li> - <li><tt>ugt</tt>: unordered or greater than </li> - <li><tt>uge</tt>: unordered or greater than or equal</li> - <li><tt>ult</tt>: unordered or less than </li> - <li><tt>ule</tt>: unordered or less than or equal</li> - <li><tt>une</tt>: unordered or not equal</li> - <li><tt>uno</tt>: unordered (either nans)</li> - <li><tt>true</tt>: no comparison, always returns true</li> -</ol> - -<p><i>Ordered</i> means that neither operand is a QNAN while - <i>unordered</i> means that either operand may be a QNAN.</p> - -<p>Each of <tt>val1</tt> and <tt>val2</tt> arguments must be either - a <a href="#t_floating">floating point</a> type or - a <a href="#t_vector">vector</a> of floating point type. They must have - identical types.</p> - -<h5>Semantics:</h5> -<p>The '<tt>fcmp</tt>' instruction compares <tt>op1</tt> and <tt>op2</tt> - according to the condition code given as <tt>cond</tt>. If the operands are - vectors, then the vectors are compared element by element. Each comparison - performed always yields an <a href="#t_integer">i1</a> result, as - follows:</p> - -<ol> - <li><tt>false</tt>: always yields <tt>false</tt>, regardless of operands.</li> - - <li><tt>oeq</tt>: yields <tt>true</tt> if both operands are not a QNAN and - <tt>op1</tt> is equal to <tt>op2</tt>.</li> - - <li><tt>ogt</tt>: yields <tt>true</tt> if both operands are not a QNAN and - <tt>op1</tt> is greater than <tt>op2</tt>.</li> - - <li><tt>oge</tt>: yields <tt>true</tt> if both operands are not a QNAN and - <tt>op1</tt> is greater than or equal to <tt>op2</tt>.</li> - - <li><tt>olt</tt>: yields <tt>true</tt> if both operands are not a QNAN and - <tt>op1</tt> is less than <tt>op2</tt>.</li> - - <li><tt>ole</tt>: yields <tt>true</tt> if both operands are not a QNAN and - <tt>op1</tt> is less than or equal to <tt>op2</tt>.</li> - - <li><tt>one</tt>: yields <tt>true</tt> if both operands are not a QNAN and - <tt>op1</tt> is not equal to <tt>op2</tt>.</li> - - <li><tt>ord</tt>: yields <tt>true</tt> if both operands are not a QNAN.</li> - - <li><tt>ueq</tt>: yields <tt>true</tt> if either operand is a QNAN or - <tt>op1</tt> is equal to <tt>op2</tt>.</li> - - <li><tt>ugt</tt>: yields <tt>true</tt> if either operand is a QNAN or - <tt>op1</tt> is greater than <tt>op2</tt>.</li> - - <li><tt>uge</tt>: yields <tt>true</tt> if either operand is a QNAN or - <tt>op1</tt> is greater than or equal to <tt>op2</tt>.</li> - - <li><tt>ult</tt>: yields <tt>true</tt> if either operand is a QNAN or - <tt>op1</tt> is less than <tt>op2</tt>.</li> - - <li><tt>ule</tt>: yields <tt>true</tt> if either operand is a QNAN or - <tt>op1</tt> is less than or equal to <tt>op2</tt>.</li> - - <li><tt>une</tt>: yields <tt>true</tt> if either operand is a QNAN or - <tt>op1</tt> is not equal to <tt>op2</tt>.</li> - - <li><tt>uno</tt>: yields <tt>true</tt> if either operand is a QNAN.</li> - - <li><tt>true</tt>: always yields <tt>true</tt>, regardless of operands.</li> -</ol> - -<h5>Example:</h5> -<pre> - <result> = fcmp oeq float 4.0, 5.0 <i>; yields: result=false</i> - <result> 
= fcmp one float 4.0, 5.0 <i>; yields: result=true</i> - <result> = fcmp olt float 4.0, 5.0 <i>; yields: result=true</i> - <result> = fcmp ueq double 1.0, 2.0 <i>; yields: result=false</i> -</pre> - -<p>Note that the code generator does not yet support vector types with - the <tt>fcmp</tt> instruction.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_phi">'<tt>phi</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = phi <ty> [ <val0>, <label0>], ... -</pre> - -<h5>Overview:</h5> -<p>The '<tt>phi</tt>' instruction is used to implement the φ node in the - SSA graph representing the function.</p> - -<h5>Arguments:</h5> -<p>The type of the incoming values is specified with the first type field. After - this, the '<tt>phi</tt>' instruction takes a list of pairs as arguments, with - one pair for each predecessor basic block of the current block. Only values - of <a href="#t_firstclass">first class</a> type may be used as the value - arguments to the PHI node. Only labels may be used as the label - arguments.</p> - -<p>There must be no non-phi instructions between the start of a basic block and - the PHI instructions: i.e. PHI instructions must be first in a basic - block.</p> - -<p>For the purposes of the SSA form, the use of each incoming value is deemed to - occur on the edge from the corresponding predecessor block to the current - block (but after any definition of an '<tt>invoke</tt>' instruction's return - value on the same edge).</p> - -<h5>Semantics:</h5> -<p>At runtime, the '<tt>phi</tt>' instruction logically takes on the value - specified by the pair corresponding to the predecessor basic block that - executed just prior to the current block.</p> - -<h5>Example:</h5> -<pre> -Loop: ; Infinite loop that counts from 0 on up... - %indvar = phi i32 [ 0, %LoopHeader ], [ %nextindvar, %Loop ] - %nextindvar = add i32 %indvar, 1 - br label %Loop -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_select">'<tt>select</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = select <i>selty</i> <cond>, <ty> <val1>, <ty> <val2> <i>; yields ty</i> - - <i>selty</i> is either i1 or {<N x i1>} -</pre> - -<h5>Overview:</h5> -<p>The '<tt>select</tt>' instruction is used to choose one value based on a - condition, without branching.</p> - - -<h5>Arguments:</h5> -<p>The '<tt>select</tt>' instruction requires an 'i1' value or a vector of 'i1' - values indicating the condition, and two values of the - same <a href="#t_firstclass">first class</a> type. 
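<p>As a hedged illustration of the vector-condition form described below
   (selection is done element by element; the constants are made up):</p>

<pre>
  %m = select <2 x i1> <i1 true, i1 false>, <2 x i8> <i8 17, i8 17>, <2 x i8> <i8 42, i8 42>   <i>; yields <2 x i8> <i8 17, i8 42></i>
</pre>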
If the val1/val2 are - vectors and the condition is a scalar, then entire vectors are selected, not - individual elements.</p> - -<h5>Semantics:</h5> -<p>If the condition is an i1 and it evaluates to 1, the instruction returns the - first value argument; otherwise, it returns the second value argument.</p> - -<p>If the condition is a vector of i1, then the value arguments must be vectors - of the same size, and the selection is done element by element.</p> - -<h5>Example:</h5> -<pre> - %X = select i1 true, i8 17, i8 42 <i>; yields i8:17</i> -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_call">'<tt>call</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <result> = [tail] call [<a href="#callingconv">cconv</a>] [<a href="#paramattrs">ret attrs</a>] <ty> [<fnty>*] <fnptrval>(<function args>) [<a href="#fnattrs">fn attrs</a>] -</pre> - -<h5>Overview:</h5> -<p>The '<tt>call</tt>' instruction represents a simple function call.</p> - -<h5>Arguments:</h5> -<p>This instruction requires several arguments:</p> - -<ol> - <li>The optional "tail" marker indicates that the callee function does not - access any allocas or varargs in the caller. Note that calls may be - marked "tail" even if they do not occur before - a <a href="#i_ret"><tt>ret</tt></a> instruction. If the "tail" marker is - present, the function call is eligible for tail call optimization, - but <a href="CodeGenerator.html#tailcallopt">might not in fact be - optimized into a jump</a>. The code generator may optimize calls marked - "tail" with either 1) automatic <a href="CodeGenerator.html#sibcallopt"> - sibling call optimization</a> when the caller and callee have - matching signatures, or 2) forced tail call optimization when the - following extra requirements are met: - <ul> - <li>Caller and callee both have the calling - convention <tt>fastcc</tt>.</li> - <li>The call is in tail position (ret immediately follows call and ret - uses value of call or is void).</li> - <li>Option <tt>-tailcallopt</tt> is enabled, - or <code>llvm::GuaranteedTailCallOpt</code> is <code>true</code>.</li> - <li><a href="CodeGenerator.html#tailcallopt">Platform specific - constraints are met.</a></li> - </ul> - </li> - - <li>The optional "cconv" marker indicates which <a href="#callingconv">calling - convention</a> the call should use. If none is specified, the call - defaults to using C calling conventions. The calling convention of the - call must match the calling convention of the target function, or else the - behavior is undefined.</li> - - <li>The optional <a href="#paramattrs">Parameter Attributes</a> list for - return values. Only '<tt>zeroext</tt>', '<tt>signext</tt>', and - '<tt>inreg</tt>' attributes are valid here.</li> - - <li>'<tt>ty</tt>': the type of the call instruction itself which is also the - type of the return value. Functions that return no value are marked - <tt><a href="#t_void">void</a></tt>.</li> - - <li>'<tt>fnty</tt>': shall be the signature of the pointer to function value - being invoked. The argument types must match the types implied by this - signature. This type can be omitted if the function is not varargs and if - the function type does not return a pointer to a function.</li> - - <li>'<tt>fnptrval</tt>': An LLVM value containing a pointer to a function to - be invoked. 
In most cases, this is a direct function invocation, but - indirect <tt>call</tt>s are just as possible, calling an arbitrary pointer - to function value.</li> - - <li>'<tt>function args</tt>': argument list whose types match the function - signature argument types and parameter attributes. All arguments must be - of <a href="#t_firstclass">first class</a> type. If the function - signature indicates the function accepts a variable number of arguments, - the extra arguments can be specified.</li> - - <li>The optional <a href="#fnattrs">function attributes</a> list. Only - '<tt>noreturn</tt>', '<tt>nounwind</tt>', '<tt>readonly</tt>' and - '<tt>readnone</tt>' attributes are valid here.</li> -</ol> - -<h5>Semantics:</h5> -<p>The '<tt>call</tt>' instruction is used to cause control flow to transfer to - a specified function, with its incoming arguments bound to the specified - values. Upon a '<tt><a href="#i_ret">ret</a></tt>' instruction in the called - function, control flow continues with the instruction after the function - call, and the return value of the function is bound to the result - argument.</p> - -<h5>Example:</h5> -<pre> - %retval = call i32 @test(i32 %argc) - call i32 (i8*, ...)* @printf(i8* %msg, i32 12, i8 42) <i>; yields i32</i> - %X = tail call i32 @foo() <i>; yields i32</i> - %Y = tail call <a href="#callingconv">fastcc</a> i32 @foo() <i>; yields i32</i> - call void %foo(i8 97 signext) - - %struct.A = type { i32, i8 } - %r = call %struct.A @foo() <i>; yields { 32, i8 }</i> - %gr = extractvalue %struct.A %r, 0 <i>; yields i32</i> - %gr1 = extractvalue %struct.A %r, 1 <i>; yields i8</i> - %Z = call void @foo() noreturn <i>; indicates that %foo never returns normally</i> - %ZZ = call zeroext i32 @bar() <i>; Return value is %zero extended</i> -</pre> - -<p>llvm treats calls to some functions with names and arguments that match the -standard C99 library as being the C99 library functions, and may perform -optimizations or generate code for them under that assumption. This is -something we'd like to change in the future to provide better support for -freestanding environments and non-C-based languages.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_va_arg">'<tt>va_arg</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <resultval> = va_arg <va_list*> <arglist>, <argty> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>va_arg</tt>' instruction is used to access arguments passed through - the "variable argument" area of a function call. It is used to implement the - <tt>va_arg</tt> macro in C.</p> - -<h5>Arguments:</h5> -<p>This instruction takes a <tt>va_list*</tt> value and the type of the - argument. It returns a value of the specified argument type and increments - the <tt>va_list</tt> to point to the next argument. The actual type - of <tt>va_list</tt> is target specific.</p> - -<h5>Semantics:</h5> -<p>The '<tt>va_arg</tt>' instruction loads an argument of the specified type - from the specified <tt>va_list</tt> and causes the <tt>va_list</tt> to point - to the next argument. 
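<p>A minimal hedged sketch, assuming <tt>%ap</tt> is an <tt>i8**</tt> slot that
   has already been initialized with <tt>llvm.va_start</tt> (as in the full
   example in the variable argument intrinsics section):</p>

<pre>
  %int = va_arg i8** %ap, i32       <i>; read the next argument as an i32</i>
  %dbl = va_arg i8** %ap, double    <i>; then read the following argument as a double</i>
</pre>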
For more information, see the variable argument - handling <a href="#int_varargs">Intrinsic Functions</a>.</p> - -<p>It is legal for this instruction to be called in a function which does not - take a variable number of arguments, for example, the <tt>vfprintf</tt> - function.</p> - -<p><tt>va_arg</tt> is an LLVM instruction instead of - an <a href="#intrinsics">intrinsic function</a> because it takes a type as an - argument.</p> - -<h5>Example:</h5> -<p>See the <a href="#int_varargs">variable argument processing</a> section.</p> - -<p>Note that the code generator does not yet fully support va_arg on many - targets. Also, it does not currently support va_arg with aggregate types on - any target.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="i_landingpad">'<tt>landingpad</tt>' Instruction</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - <resultval> = landingpad <resultty> personality <type> <pers_fn> <clause>+ - <resultval> = landingpad <resultty> personality <type> <pers_fn> cleanup <clause>* - - <clause> := catch <type> <value> - <clause> := filter <array constant type> <array constant> -</pre> - -<h5>Overview:</h5> -<p>The '<tt>landingpad</tt>' instruction is used by - <a href="ExceptionHandling.html#overview">LLVM's exception handling - system</a> to specify that a basic block is a landing pad — one where - the exception lands, and corresponds to the code found in the - <i><tt>catch</tt></i> portion of a <i><tt>try/catch</tt></i> sequence. It - defines values supplied by the personality function (<tt>pers_fn</tt>) upon - re-entry to the function. The <tt>resultval</tt> has the - type <tt>resultty</tt>.</p> - -<h5>Arguments:</h5> -<p>This instruction takes a <tt>pers_fn</tt> value. This is the personality - function associated with the unwinding mechanism. The optional - <tt>cleanup</tt> flag indicates that the landing pad block is a cleanup.</p> - -<p>A <tt>clause</tt> begins with the clause type — <tt>catch</tt> - or <tt>filter</tt> — and contains the global variable representing the - "type" that may be caught or filtered respectively. Unlike the - <tt>catch</tt> clause, the <tt>filter</tt> clause takes an array constant as - its argument. Use "<tt>[0 x i8**] undef</tt>" for a filter which cannot - throw. The '<tt>landingpad</tt>' instruction must contain <em>at least</em> - one <tt>clause</tt> or the <tt>cleanup</tt> flag.</p> - -<h5>Semantics:</h5> -<p>The '<tt>landingpad</tt>' instruction defines the values which are set by the - personality function (<tt>pers_fn</tt>) upon re-entry to the function, and - therefore the "result type" of the <tt>landingpad</tt> instruction. As with - calling conventions, how the personality function results are represented in - LLVM IR is target specific.</p> - -<p>The clauses are applied in order from top to bottom. If two - <tt>landingpad</tt> instructions are merged together through inlining, the - clauses from the calling function are appended to the list of clauses. - When the call stack is being unwound due to an exception being thrown, the - exception is compared against each <tt>clause</tt> in turn. 
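<p>For illustration, a hedged sketch reusing the personality function and
   typeinfo globals from the examples below; the clauses are tested from top
   to bottom:</p>

<pre>
  %res = landingpad { i8*, i32 } personality i32 (...)* @__gxx_personality_v0
           catch i8** @_ZTIi     <i>; checked first</i>
           catch i8** @_ZTId     <i>; checked only if the first clause did not match</i>
</pre>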
If it doesn't - match any of the clauses, and the <tt>cleanup</tt> flag is not set, then - unwinding continues further up the call stack.</p> - -<p>The <tt>landingpad</tt> instruction has several restrictions:</p> - -<ul> - <li>A landing pad block is a basic block which is the unwind destination of an - '<tt>invoke</tt>' instruction.</li> - <li>A landing pad block must have a '<tt>landingpad</tt>' instruction as its - first non-PHI instruction.</li> - <li>There can be only one '<tt>landingpad</tt>' instruction within the landing - pad block.</li> - <li>A basic block that is not a landing pad block may not include a - '<tt>landingpad</tt>' instruction.</li> - <li>All '<tt>landingpad</tt>' instructions in a function must have the same - personality function.</li> -</ul> - -<h5>Example:</h5> -<pre> - ;; A landing pad which can catch an integer. - %res = landingpad { i8*, i32 } personality i32 (...)* @__gxx_personality_v0 - catch i8** @_ZTIi - ;; A landing pad that is a cleanup. - %res = landingpad { i8*, i32 } personality i32 (...)* @__gxx_personality_v0 - cleanup - ;; A landing pad which can catch an integer and can only throw a double. - %res = landingpad { i8*, i32 } personality i32 (...)* @__gxx_personality_v0 - catch i8** @_ZTIi - filter [1 x i8**] [@_ZTId] -</pre> - -</div> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="intrinsics">Intrinsic Functions</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>LLVM supports the notion of an "intrinsic function". These functions have - well known names and semantics and are required to follow certain - restrictions. Overall, these intrinsics represent an extension mechanism for - the LLVM language that does not require changing all of the transformations - in LLVM when adding to the language (or the bitcode reader/writer, the - parser, etc...).</p> - -<p>Intrinsic function names must all start with an "<tt>llvm.</tt>" prefix. This - prefix is reserved in LLVM for intrinsic names; thus, function names may not - begin with this prefix. Intrinsic functions must always be external - functions: you cannot define the body of intrinsic functions. Intrinsic - functions may only be used in call or invoke instructions: it is illegal to - take the address of an intrinsic function. Additionally, because intrinsic - functions are part of the LLVM language, it is required if any are added that - they be documented here.</p> - -<p>Some intrinsic functions can be overloaded, i.e., the intrinsic represents a - family of functions that perform the same operation but on different data - types. Because LLVM can represent over 8 million different integer types, - overloading is used commonly to allow an intrinsic function to operate on any - integer type. One or more of the argument types or the result type can be - overloaded to accept any integer type. Argument types may also be defined as - exactly matching a previous argument's type or the result type. This allows - an intrinsic function which accepts multiple arguments, but needs all of them - to be of the same type, to only be overloaded with respect to a single - argument or the result.</p> - -<p>Overloaded intrinsics will have the names of its overloaded argument types - encoded into its function name, each preceded by a period. Only those types - which are overloaded result in a name suffix. Arguments whose type is matched - against another type do not. 
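<p>As a hedged illustration of this naming scheme, two members of the
   overloaded <tt>llvm.ctpop</tt> family discussed next might be declared
   as:</p>

<pre>
  declare i8  @llvm.ctpop.i8(i8 %val)
  declare i29 @llvm.ctpop.i29(i29 %val)
</pre>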
For example, the <tt>llvm.ctpop</tt> function - can take an integer of any width and returns an integer of exactly the same - integer width. This leads to a family of functions such as - <tt>i8 @llvm.ctpop.i8(i8 %val)</tt> and <tt>i29 @llvm.ctpop.i29(i29 - %val)</tt>. Only one type, the return type, is overloaded, and only one type - suffix is required. Because the argument's type is matched against the return - type, it does not require its own name suffix.</p> - -<p>To learn how to add an intrinsic function, please see the - <a href="ExtendingLLVM.html">Extending LLVM Guide</a>.</p> - -<!-- ======================================================================= --> -<h3> - <a name="int_varargs">Variable Argument Handling Intrinsics</a> -</h3> - -<div> - -<p>Variable argument support is defined in LLVM with - the <a href="#i_va_arg"><tt>va_arg</tt></a> instruction and these three - intrinsic functions. These functions are related to the similarly named - macros defined in the <tt><stdarg.h></tt> header file.</p> - -<p>All of these functions operate on arguments that use a target-specific value - type "<tt>va_list</tt>". The LLVM assembly language reference manual does - not define what this type is, so all transformations should be prepared to - handle these functions regardless of the type used.</p> - -<p>This example shows how the <a href="#i_va_arg"><tt>va_arg</tt></a> - instruction and the variable argument handling intrinsic functions are - used.</p> - -<pre class="doc_code"> -define i32 @test(i32 %X, ...) { - ; Initialize variable argument processing - %ap = alloca i8* - %ap2 = bitcast i8** %ap to i8* - call void @llvm.va_start(i8* %ap2) - - ; Read a single integer argument - %tmp = va_arg i8** %ap, i32 - - ; Demonstrate usage of llvm.va_copy and llvm.va_end - %aq = alloca i8* - %aq2 = bitcast i8** %aq to i8* - call void @llvm.va_copy(i8* %aq2, i8* %ap2) - call void @llvm.va_end(i8* %aq2) - - ; Stop processing of arguments. - call void @llvm.va_end(i8* %ap2) - ret i32 %tmp -} - -declare void @llvm.va_start(i8*) -declare void @llvm.va_copy(i8*, i8*) -declare void @llvm.va_end(i8*) -</pre> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_va_start">'<tt>llvm.va_start</tt>' Intrinsic</a> -</h4> - - -<div> - -<h5>Syntax:</h5> -<pre> - declare void %llvm.va_start(i8* <arglist>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.va_start</tt>' intrinsic initializes <tt>*<arglist></tt> - for subsequent use by <tt><a href="#i_va_arg">va_arg</a></tt>.</p> - -<h5>Arguments:</h5> -<p>The argument is a pointer to a <tt>va_list</tt> element to initialize.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.va_start</tt>' intrinsic works just like the <tt>va_start</tt> - macro available in C. In a target-dependent way, it initializes - the <tt>va_list</tt> element to which the argument points, so that the next - call to <tt>va_arg</tt> will produce the first variable argument passed to - the function. 
Unlike the C <tt>va_start</tt> macro, this intrinsic does not - need to know the last argument of the function as the compiler can figure - that out.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_va_end">'<tt>llvm.va_end</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare void @llvm.va_end(i8* <arglist>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.va_end</tt>' intrinsic destroys <tt>*<arglist></tt>, - which has been initialized previously - with <tt><a href="#int_va_start">llvm.va_start</a></tt> - or <tt><a href="#i_va_copy">llvm.va_copy</a></tt>.</p> - -<h5>Arguments:</h5> -<p>The argument is a pointer to a <tt>va_list</tt> to destroy.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.va_end</tt>' intrinsic works just like the <tt>va_end</tt> - macro available in C. In a target-dependent way, it destroys - the <tt>va_list</tt> element to which the argument points. Calls - to <a href="#int_va_start"><tt>llvm.va_start</tt></a> - and <a href="#int_va_copy"> <tt>llvm.va_copy</tt></a> must be matched exactly - with calls to <tt>llvm.va_end</tt>.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_va_copy">'<tt>llvm.va_copy</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare void @llvm.va_copy(i8* <destarglist>, i8* <srcarglist>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.va_copy</tt>' intrinsic copies the current argument position - from the source argument list to the destination argument list.</p> - -<h5>Arguments:</h5> -<p>The first argument is a pointer to a <tt>va_list</tt> element to initialize. - The second argument is a pointer to a <tt>va_list</tt> element to copy - from.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.va_copy</tt>' intrinsic works just like the <tt>va_copy</tt> - macro available in C. In a target-dependent way, it copies the - source <tt>va_list</tt> element into the destination <tt>va_list</tt> - element. This intrinsic is necessary because - the <tt><a href="#int_va_start"> llvm.va_start</a></tt> intrinsic may be - arbitrarily complex and require, for example, memory allocation.</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="int_gc">Accurate Garbage Collection Intrinsics</a> -</h3> - -<div> - -<p>LLVM support for <a href="GarbageCollection.html">Accurate Garbage -Collection</a> (GC) requires the implementation and generation of these -intrinsics. These intrinsics allow identification of <a href="#int_gcroot">GC -roots on the stack</a>, as well as garbage collector implementations that -require <a href="#int_gcread">read</a> and <a href="#int_gcwrite">write</a> -barriers. Front-ends for type-safe garbage collected languages should generate -these intrinsics to make use of the LLVM garbage collectors. 
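<p>A minimal hedged sketch of such front-end output, assuming the built-in
   "shadow-stack" collector and a hypothetical runtime allocator
   <tt>@allocate_object</tt>:</p>

<pre class="doc_code">
declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata)
declare i8* @allocate_object()                        ; hypothetical runtime allocator

define void @frontend_output() gc "shadow-stack" {
entry:
  %root = alloca i8*                                  ; stack slot holding a GC root
  call void @llvm.gcroot(i8** %root, i8* null)        ; register the slot with the collector
  %obj = call i8* @allocate_object()
  store i8* %obj, i8** %root                          ; the collector can now find %obj
  ret void
}
</pre>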
For more details, -see <a href="GarbageCollection.html">Accurate Garbage Collection with -LLVM</a>.</p> - -<p>The garbage collection intrinsics only operate on objects in the generic - address space (address space zero).</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_gcroot">'<tt>llvm.gcroot</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.gcroot</tt>' intrinsic declares the existence of a GC root to - the code generator, and allows some metadata to be associated with it.</p> - -<h5>Arguments:</h5> -<p>The first argument specifies the address of a stack object that contains the - root pointer. The second pointer (which must be either a constant or a - global value address) contains the meta-data to be associated with the - root.</p> - -<h5>Semantics:</h5> -<p>At runtime, a call to this intrinsic stores a null pointer into the "ptrloc" - location. At compile-time, the code generator generates information to allow - the runtime to find the pointer at GC safe points. The '<tt>llvm.gcroot</tt>' - intrinsic may only be used in a function which <a href="#gc">specifies a GC - algorithm</a>.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_gcread">'<tt>llvm.gcread</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare i8* @llvm.gcread(i8* %ObjPtr, i8** %Ptr) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.gcread</tt>' intrinsic identifies reads of references from heap - locations, allowing garbage collector implementations that require read - barriers.</p> - -<h5>Arguments:</h5> -<p>The second argument is the address to read from, which should be an address - allocated from the garbage collector. The first object is a pointer to the - start of the referenced object, if needed by the language runtime (otherwise - null).</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.gcread</tt>' intrinsic has the same semantics as a load - instruction, but may be replaced with substantially more complex code by the - garbage collector runtime, as needed. The '<tt>llvm.gcread</tt>' intrinsic - may only be used in a function which <a href="#gc">specifies a GC - algorithm</a>.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_gcwrite">'<tt>llvm.gcwrite</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare void @llvm.gcwrite(i8* %P1, i8* %Obj, i8** %P2) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.gcwrite</tt>' intrinsic identifies writes of references to heap - locations, allowing garbage collector implementations that require write - barriers (such as generational or reference counting collectors).</p> - -<h5>Arguments:</h5> -<p>The first argument is the reference to store, the second is the start of the - object to store it to, and the third is the address of the field of Obj to - store to. If the runtime does not require a pointer to the object, Obj may - be null.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.gcwrite</tt>' intrinsic has the same semantics as a store - instruction, but may be replaced with substantially more complex code by the - garbage collector runtime, as needed. 
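<p>A hedged sketch of a field store routed through the write barrier
   (<tt>%ref</tt>, <tt>%obj</tt> and <tt>%fieldptr</tt> are hypothetical
   values):</p>

<pre>
  ;; instead of:  store i8* %ref, i8** %fieldptr
  call void @llvm.gcwrite(i8* %ref, i8* %obj, i8** %fieldptr)
</pre>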
The '<tt>llvm.gcwrite</tt>' intrinsic - may only be used in a function which <a href="#gc">specifies a GC - algorithm</a>.</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="int_codegen">Code Generator Intrinsics</a> -</h3> - -<div> - -<p>These intrinsics are provided by LLVM to expose special features that may - only be implemented with code generator support.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_returnaddress">'<tt>llvm.returnaddress</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare i8 *@llvm.returnaddress(i32 <level>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.returnaddress</tt>' intrinsic attempts to compute a - target-specific value indicating the return address of the current function - or one of its callers.</p> - -<h5>Arguments:</h5> -<p>The argument to this intrinsic indicates which function to return the address - for. Zero indicates the calling function, one indicates its caller, etc. - The argument is <b>required</b> to be a constant integer value.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.returnaddress</tt>' intrinsic either returns a pointer - indicating the return address of the specified call frame, or zero if it - cannot be identified. The value returned by this intrinsic is likely to be - incorrect or 0 for arguments other than zero, so it should only be used for - debugging purposes.</p> - -<p>Note that calling this intrinsic does not prevent function inlining or other - aggressive transformations, so the value returned may not be that of the - obvious source-language caller.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_frameaddress">'<tt>llvm.frameaddress</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare i8* @llvm.frameaddress(i32 <level>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.frameaddress</tt>' intrinsic attempts to return the - target-specific frame pointer value for the specified stack frame.</p> - -<h5>Arguments:</h5> -<p>The argument to this intrinsic indicates which function to return the frame - pointer for. Zero indicates the calling function, one indicates its caller, - etc. The argument is <b>required</b> to be a constant integer value.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.frameaddress</tt>' intrinsic either returns a pointer - indicating the frame address of the specified call frame, or zero if it - cannot be identified. The value returned by this intrinsic is likely to be - incorrect or 0 for arguments other than zero, so it should only be used for - debugging purposes.</p> - -<p>Note that calling this intrinsic does not prevent function inlining or other - aggressive transformations, so the value returned may not be that of the - obvious source-language caller.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_stacksave">'<tt>llvm.stacksave</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare i8* @llvm.stacksave() -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.stacksave</tt>' intrinsic is used to remember the current state - of the function stack, for use - with <a href="#int_stackrestore"> <tt>llvm.stackrestore</tt></a>. 
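For example, a dynamically sized allocation can be bracketed as follows (a minimal sketch; <tt>%n</tt> is a hypothetical element count):
<pre>
  %sp  = call i8* @llvm.stacksave()
  %vla = alloca i32, i32 %n               ; dynamically sized stack allocation
  ; ... use %vla ...
  call void @llvm.stackrestore(i8* %sp)   ; pops %vla off the stack
</pre>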
This is - useful for implementing language features like scoped automatic variable - sized arrays in C99.</p> - -<h5>Semantics:</h5> -<p>This intrinsic returns a opaque pointer value that can be passed - to <a href="#int_stackrestore"><tt>llvm.stackrestore</tt></a>. When - an <tt>llvm.stackrestore</tt> intrinsic is executed with a value saved - from <tt>llvm.stacksave</tt>, it effectively restores the state of the stack - to the state it was in when the <tt>llvm.stacksave</tt> intrinsic executed. - In practice, this pops any <a href="#i_alloca">alloca</a> blocks from the - stack that were allocated after the <tt>llvm.stacksave</tt> was executed.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_stackrestore">'<tt>llvm.stackrestore</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare void @llvm.stackrestore(i8* %ptr) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.stackrestore</tt>' intrinsic is used to restore the state of - the function stack to the state it was in when the - corresponding <a href="#int_stacksave"><tt>llvm.stacksave</tt></a> intrinsic - executed. This is useful for implementing language features like scoped - automatic variable sized arrays in C99.</p> - -<h5>Semantics:</h5> -<p>See the description - for <a href="#int_stacksave"><tt>llvm.stacksave</tt></a>.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_prefetch">'<tt>llvm.prefetch</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare void @llvm.prefetch(i8* <address>, i32 <rw>, i32 <locality>, i32 <cache type>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.prefetch</tt>' intrinsic is a hint to the code generator to - insert a prefetch instruction if supported; otherwise, it is a noop. - Prefetches have no effect on the behavior of the program but can change its - performance characteristics.</p> - -<h5>Arguments:</h5> -<p><tt>address</tt> is the address to be prefetched, <tt>rw</tt> is the - specifier determining if the fetch should be for a read (0) or write (1), - and <tt>locality</tt> is a temporal locality specifier ranging from (0) - no - locality, to (3) - extremely local keep in cache. The <tt>cache type</tt> - specifies whether the prefetch is performed on the data (1) or instruction (0) - cache. The <tt>rw</tt>, <tt>locality</tt> and <tt>cache type</tt> arguments - must be constant integers.</p> - -<h5>Semantics:</h5> -<p>This intrinsic does not modify the behavior of the program. In particular, - prefetches cannot trap and do not produce a value. On targets that support - this intrinsic, the prefetch can provide hints to the processor cache for - better performance.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_pcmarker">'<tt>llvm.pcmarker</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare void @llvm.pcmarker(i32 <id>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.pcmarker</tt>' intrinsic is a method to export a Program - Counter (PC) in a region of code to simulators and other tools. The method - is target specific, but it is expected that the marker will use exported - symbols to transmit the PC of the marker. The marker makes no guarantees - that it will remain with any specific instruction after optimizations. It is - possible that the presence of a marker will inhibit optimizations. 
The - intended use is to be inserted after optimizations to allow correlations of - simulation runs.</p> - -<h5>Arguments:</h5> -<p><tt>id</tt> is a numerical id identifying the marker.</p> - -<h5>Semantics:</h5> -<p>This intrinsic does not modify the behavior of the program. Backends that do - not support this intrinsic may ignore it.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_readcyclecounter">'<tt>llvm.readcyclecounter</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare i64 @llvm.readcyclecounter() -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.readcyclecounter</tt>' intrinsic provides access to the cycle - counter register (or similar low latency, high accuracy clocks) on those - targets that support it. On X86, it should map to RDTSC. On Alpha, it - should map to RPCC. As the backing counters overflow quickly (on the order - of 9 seconds on alpha), this should only be used for small timings.</p> - -<h5>Semantics:</h5> -<p>When directly supported, reading the cycle counter should not modify any - memory. Implementations are allowed to either return a application specific - value or a system wide value. On backends without support, this is lowered - to a constant 0.</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="int_libc">Standard C Library Intrinsics</a> -</h3> - -<div> - -<p>LLVM provides intrinsics for a few important standard C library functions. - These intrinsics allow source-language front-ends to pass information about - the alignment of the pointer arguments to the code generator, providing - opportunity for more efficient code generation.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_memcpy">'<tt>llvm.memcpy</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use <tt>llvm.memcpy</tt> on any - integer bit width and for different address spaces. Not all targets support - all bit widths however.</p> - -<pre> - declare void @llvm.memcpy.p0i8.p0i8.i32(i8* <dest>, i8* <src>, - i32 <len>, i32 <align>, i1 <isvolatile>) - declare void @llvm.memcpy.p0i8.p0i8.i64(i8* <dest>, i8* <src>, - i64 <len>, i32 <align>, i1 <isvolatile>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.memcpy.*</tt>' intrinsics copy a block of memory from the - source location to the destination location.</p> - -<p>Note that, unlike the standard libc function, the <tt>llvm.memcpy.*</tt> - intrinsics do not return a value, takes extra alignment/isvolatile arguments - and the pointers can be in specified address spaces.</p> - -<h5>Arguments:</h5> - -<p>The first argument is a pointer to the destination, the second is a pointer - to the source. The third argument is an integer argument specifying the - number of bytes to copy, the fourth argument is the alignment of the - source and destination locations, and the fifth is a boolean indicating a - volatile access.</p> - -<p>If the call to this intrinsic has an alignment value that is not 0 or 1, - then the caller guarantees that both the source and destination pointers are - aligned to that boundary.</p> - -<p>If the <tt>isvolatile</tt> parameter is <tt>true</tt>, the - <tt>llvm.memcpy</tt> call is a <a href="#volatile">volatile operation</a>. 
- The detailed access behavior is not very cleanly specified and it is unwise - to depend on it.</p> - -<h5>Semantics:</h5> - -<p>The '<tt>llvm.memcpy.*</tt>' intrinsics copy a block of memory from the - source location to the destination location, which are not allowed to - overlap. It copies "len" bytes of memory over. If the argument is known to - be aligned to some boundary, this can be specified as the fourth argument, - otherwise it should be set to 0 or 1.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_memmove">'<tt>llvm.memmove</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use llvm.memmove on any integer bit - width and for different address space. Not all targets support all bit - widths however.</p> - -<pre> - declare void @llvm.memmove.p0i8.p0i8.i32(i8* <dest>, i8* <src>, - i32 <len>, i32 <align>, i1 <isvolatile>) - declare void @llvm.memmove.p0i8.p0i8.i64(i8* <dest>, i8* <src>, - i64 <len>, i32 <align>, i1 <isvolatile>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.memmove.*</tt>' intrinsics move a block of memory from the - source location to the destination location. It is similar to the - '<tt>llvm.memcpy</tt>' intrinsic but allows the two memory locations to - overlap.</p> - -<p>Note that, unlike the standard libc function, the <tt>llvm.memmove.*</tt> - intrinsics do not return a value, takes extra alignment/isvolatile arguments - and the pointers can be in specified address spaces.</p> - -<h5>Arguments:</h5> - -<p>The first argument is a pointer to the destination, the second is a pointer - to the source. The third argument is an integer argument specifying the - number of bytes to copy, the fourth argument is the alignment of the - source and destination locations, and the fifth is a boolean indicating a - volatile access.</p> - -<p>If the call to this intrinsic has an alignment value that is not 0 or 1, - then the caller guarantees that the source and destination pointers are - aligned to that boundary.</p> - -<p>If the <tt>isvolatile</tt> parameter is <tt>true</tt>, the - <tt>llvm.memmove</tt> call is a <a href="#volatile">volatile operation</a>. - The detailed access behavior is not very cleanly specified and it is unwise - to depend on it.</p> - -<h5>Semantics:</h5> - -<p>The '<tt>llvm.memmove.*</tt>' intrinsics copy a block of memory from the - source location to the destination location, which may overlap. It copies - "len" bytes of memory over. If the argument is known to be aligned to some - boundary, this can be specified as the fourth argument, otherwise it should - be set to 0 or 1.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_memset">'<tt>llvm.memset.*</tt>' Intrinsics</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use llvm.memset on any integer bit - width and for different address spaces. 
However, not all targets support all - bit widths.</p> - -<pre> - declare void @llvm.memset.p0i8.i32(i8* <dest>, i8 <val>, - i32 <len>, i32 <align>, i1 <isvolatile>) - declare void @llvm.memset.p0i8.i64(i8* <dest>, i8 <val>, - i64 <len>, i32 <align>, i1 <isvolatile>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.memset.*</tt>' intrinsics fill a block of memory with a - particular byte value.</p> - -<p>Note that, unlike the standard libc function, the <tt>llvm.memset</tt> - intrinsic does not return a value and takes extra alignment/volatile - arguments. Also, the destination can be in an arbitrary address space.</p> - -<h5>Arguments:</h5> -<p>The first argument is a pointer to the destination to fill, the second is the - byte value with which to fill it, the third argument is an integer argument - specifying the number of bytes to fill, and the fourth argument is the known - alignment of the destination location.</p> - -<p>If the call to this intrinsic has an alignment value that is not 0 or 1, - then the caller guarantees that the destination pointer is aligned to that - boundary.</p> - -<p>If the <tt>isvolatile</tt> parameter is <tt>true</tt>, the - <tt>llvm.memset</tt> call is a <a href="#volatile">volatile operation</a>. - The detailed access behavior is not very cleanly specified and it is unwise - to depend on it.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.memset.*</tt>' intrinsics fill "len" bytes of memory starting - at the destination location. If the argument is known to be aligned to some - boundary, this can be specified as the fourth argument, otherwise it should - be set to 0 or 1.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_sqrt">'<tt>llvm.sqrt.*</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use <tt>llvm.sqrt</tt> on any - floating point or vector of floating point type. Not all targets support all - types however.</p> - -<pre> - declare float @llvm.sqrt.f32(float %Val) - declare double @llvm.sqrt.f64(double %Val) - declare x86_fp80 @llvm.sqrt.f80(x86_fp80 %Val) - declare fp128 @llvm.sqrt.f128(fp128 %Val) - declare ppc_fp128 @llvm.sqrt.ppcf128(ppc_fp128 %Val) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.sqrt</tt>' intrinsics return the sqrt of the specified operand, - returning the same value as the libm '<tt>sqrt</tt>' functions would. - Unlike <tt>sqrt</tt> in libm, however, <tt>llvm.sqrt</tt> has undefined - behavior for negative numbers other than -0.0 (which allows for better - optimization, because there is no need to worry about errno being - set). <tt>llvm.sqrt(-0.0)</tt> is defined to return -0.0 like IEEE sqrt.</p> - -<h5>Arguments:</h5> -<p>The argument and return value are floating point numbers of the same - type.</p> - -<h5>Semantics:</h5> -<p>This function returns the sqrt of the specified operand if it is a - nonnegative floating point number.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_powi">'<tt>llvm.powi.*</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use <tt>llvm.powi</tt> on any - floating point or vector of floating point type. 
Not all targets support all - types however.</p> - -<pre> - declare float @llvm.powi.f32(float %Val, i32 %power) - declare double @llvm.powi.f64(double %Val, i32 %power) - declare x86_fp80 @llvm.powi.f80(x86_fp80 %Val, i32 %power) - declare fp128 @llvm.powi.f128(fp128 %Val, i32 %power) - declare ppc_fp128 @llvm.powi.ppcf128(ppc_fp128 %Val, i32 %power) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.powi.*</tt>' intrinsics return the first operand raised to the - specified (positive or negative) power. The order of evaluation of - multiplications is not defined. When a vector of floating point type is - used, the second argument remains a scalar integer value.</p> - -<h5>Arguments:</h5> -<p>The second argument is an integer power, and the first is a value to raise to - that power.</p> - -<h5>Semantics:</h5> -<p>This function returns the first value raised to the second power with an - unspecified sequence of rounding operations.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_sin">'<tt>llvm.sin.*</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use <tt>llvm.sin</tt> on any - floating point or vector of floating point type. Not all targets support all - types however.</p> - -<pre> - declare float @llvm.sin.f32(float %Val) - declare double @llvm.sin.f64(double %Val) - declare x86_fp80 @llvm.sin.f80(x86_fp80 %Val) - declare fp128 @llvm.sin.f128(fp128 %Val) - declare ppc_fp128 @llvm.sin.ppcf128(ppc_fp128 %Val) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.sin.*</tt>' intrinsics return the sine of the operand.</p> - -<h5>Arguments:</h5> -<p>The argument and return value are floating point numbers of the same - type.</p> - -<h5>Semantics:</h5> -<p>This function returns the sine of the specified operand, returning the same - values as the libm <tt>sin</tt> functions would, and handles error conditions - in the same way.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_cos">'<tt>llvm.cos.*</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use <tt>llvm.cos</tt> on any - floating point or vector of floating point type. Not all targets support all - types however.</p> - -<pre> - declare float @llvm.cos.f32(float %Val) - declare double @llvm.cos.f64(double %Val) - declare x86_fp80 @llvm.cos.f80(x86_fp80 %Val) - declare fp128 @llvm.cos.f128(fp128 %Val) - declare ppc_fp128 @llvm.cos.ppcf128(ppc_fp128 %Val) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.cos.*</tt>' intrinsics return the cosine of the operand.</p> - -<h5>Arguments:</h5> -<p>The argument and return value are floating point numbers of the same - type.</p> - -<h5>Semantics:</h5> -<p>This function returns the cosine of the specified operand, returning the same - values as the libm <tt>cos</tt> functions would, and handles error conditions - in the same way.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_pow">'<tt>llvm.pow.*</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use <tt>llvm.pow</tt> on any - floating point or vector of floating point type. 
Not all targets support all - types however.</p> - -<pre> - declare float @llvm.pow.f32(float %Val, float %Power) - declare double @llvm.pow.f64(double %Val, double %Power) - declare x86_fp80 @llvm.pow.f80(x86_fp80 %Val, x86_fp80 %Power) - declare fp128 @llvm.pow.f128(fp128 %Val, fp128 %Power) - declare ppc_fp128 @llvm.pow.ppcf128(ppc_fp128 %Val, ppc_fp128 Power) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.pow.*</tt>' intrinsics return the first operand raised to the - specified (positive or negative) power.</p> - -<h5>Arguments:</h5> -<p>The second argument is a floating point power, and the first is a value to - raise to that power.</p> - -<h5>Semantics:</h5> -<p>This function returns the first value raised to the second power, returning - the same values as the libm <tt>pow</tt> functions would, and handles error - conditions in the same way.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_exp">'<tt>llvm.exp.*</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use <tt>llvm.exp</tt> on any - floating point or vector of floating point type. Not all targets support all - types however.</p> - -<pre> - declare float @llvm.exp.f32(float %Val) - declare double @llvm.exp.f64(double %Val) - declare x86_fp80 @llvm.exp.f80(x86_fp80 %Val) - declare fp128 @llvm.exp.f128(fp128 %Val) - declare ppc_fp128 @llvm.exp.ppcf128(ppc_fp128 %Val) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.exp.*</tt>' intrinsics perform the exp function.</p> - -<h5>Arguments:</h5> -<p>The argument and return value are floating point numbers of the same - type.</p> - -<h5>Semantics:</h5> -<p>This function returns the same values as the libm <tt>exp</tt> functions - would, and handles error conditions in the same way.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_log">'<tt>llvm.log.*</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use <tt>llvm.log</tt> on any - floating point or vector of floating point type. Not all targets support all - types however.</p> - -<pre> - declare float @llvm.log.f32(float %Val) - declare double @llvm.log.f64(double %Val) - declare x86_fp80 @llvm.log.f80(x86_fp80 %Val) - declare fp128 @llvm.log.f128(fp128 %Val) - declare ppc_fp128 @llvm.log.ppcf128(ppc_fp128 %Val) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.log.*</tt>' intrinsics perform the log function.</p> - -<h5>Arguments:</h5> -<p>The argument and return value are floating point numbers of the same - type.</p> - -<h5>Semantics:</h5> -<p>This function returns the same values as the libm <tt>log</tt> functions - would, and handles error conditions in the same way.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_fma">'<tt>llvm.fma.*</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use <tt>llvm.fma</tt> on any - floating point or vector of floating point type. 
Not all targets support all - types however.</p> - -<pre> - declare float @llvm.fma.f32(float %a, float %b, float %c) - declare double @llvm.fma.f64(double %a, double %b, double %c) - declare x86_fp80 @llvm.fma.f80(x86_fp80 %a, x86_fp80 %b, x86_fp80 %c) - declare fp128 @llvm.fma.f128(fp128 %a, fp128 %b, fp128 %c) - declare ppc_fp128 @llvm.fma.ppcf128(ppc_fp128 %a, ppc_fp128 %b, ppc_fp128 %c) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.fma.*</tt>' intrinsics perform the fused multiply-add - operation.</p> - -<h5>Arguments:</h5> -<p>The argument and return value are floating point numbers of the same - type.</p> - -<h5>Semantics:</h5> -<p>This function returns the same values as the libm <tt>fma</tt> functions - would.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_fabs">'<tt>llvm.fabs.*</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use <tt>llvm.fabs</tt> on any - floating point or vector of floating point type. Not all targets support all - types however.</p> - -<pre> - declare float @llvm.fabs.f32(float %Val) - declare double @llvm.fabs.f64(double %Val) - declare x86_fp80 @llvm.fabs.f80(x86_fp80 %Val) - declare fp128 @llvm.fabs.f128(fp128 %Val) - declare ppc_fp128 @llvm.fabs.ppcf128(ppc_fp128 %Val) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.fabs.*</tt>' intrinsics return the absolute value of - the operand.</p> - -<h5>Arguments:</h5> -<p>The argument and return value are floating point numbers of the same - type.</p> - -<h5>Semantics:</h5> -<p>This function returns the same values as the libm <tt>fabs</tt> functions - would, and handles error conditions in the same way.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_floor">'<tt>llvm.floor.*</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use <tt>llvm.floor</tt> on any - floating point or vector of floating point type. Not all targets support all - types however.</p> - -<pre> - declare float @llvm.floor.f32(float %Val) - declare double @llvm.floor.f64(double %Val) - declare x86_fp80 @llvm.floor.f80(x86_fp80 %Val) - declare fp128 @llvm.floor.f128(fp128 %Val) - declare ppc_fp128 @llvm.floor.ppcf128(ppc_fp128 %Val) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.floor.*</tt>' intrinsics return the floor of - the operand.</p> - -<h5>Arguments:</h5> -<p>The argument and return value are floating point numbers of the same - type.</p> - -<h5>Semantics:</h5> -<p>This function returns the same values as the libm <tt>floor</tt> functions - would, and handles error conditions in the same way.</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="int_manip">Bit Manipulation Intrinsics</a> -</h3> - -<div> - -<p>LLVM provides intrinsics for a few important bit manipulation operations. - These allow efficient code generation for some algorithms.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_bswap">'<tt>llvm.bswap.*</tt>' Intrinsics</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic function. You can use bswap on any integer - type that is an even number of bytes (i.e. 
BitWidth % 16 == 0).</p> - -<pre> - declare i16 @llvm.bswap.i16(i16 <id>) - declare i32 @llvm.bswap.i32(i32 <id>) - declare i64 @llvm.bswap.i64(i64 <id>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.bswap</tt>' family of intrinsics is used to byte swap integer - values with an even number of bytes (positive multiple of 16 bits). These - are useful for performing operations on data that is not in the target's - native byte order.</p> - -<h5>Semantics:</h5> -<p>The <tt>llvm.bswap.i16</tt> intrinsic returns an i16 value that has the high - and low byte of the input i16 swapped. Similarly, - the <tt>llvm.bswap.i32</tt> intrinsic returns an i32 value that has the four - bytes of the input i32 swapped, so that if the input bytes are numbered 0, 1, - 2, 3 then the returned i32 will have its bytes in 3, 2, 1, 0 order. - The <tt>llvm.bswap.i48</tt>, <tt>llvm.bswap.i64</tt> and other intrinsics - extend this concept to additional even-byte lengths (6 bytes, 8 bytes and - more, respectively).</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_ctpop">'<tt>llvm.ctpop.*</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use llvm.ctpop on any integer bit - width, or on any vector with integer elements. Not all targets support all - bit widths or vector types, however.</p> - -<pre> - declare i8 @llvm.ctpop.i8(i8 <src>) - declare i16 @llvm.ctpop.i16(i16 <src>) - declare i32 @llvm.ctpop.i32(i32 <src>) - declare i64 @llvm.ctpop.i64(i64 <src>) - declare i256 @llvm.ctpop.i256(i256 <src>) - declare <2 x i32> @llvm.ctpop.v2i32(<2 x i32> <src>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.ctpop</tt>' family of intrinsics counts the number of bits set - in a value.</p> - -<h5>Arguments:</h5> -<p>The only argument is the value to be counted. The argument may be of any - integer type, or a vector with integer elements. - The return type must match the argument type.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.ctpop</tt>' intrinsic counts the 1's in a variable, or within each - element of a vector.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_ctlz">'<tt>llvm.ctlz.*</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use <tt>llvm.ctlz</tt> on any - integer bit width, or any vector whose elements are integers. Not all - targets support all bit widths or vector types, however.</p> - -<pre> - declare i8 @llvm.ctlz.i8 (i8 <src>, i1 <is_zero_undef>) - declare i16 @llvm.ctlz.i16 (i16 <src>, i1 <is_zero_undef>) - declare i32 @llvm.ctlz.i32 (i32 <src>, i1 <is_zero_undef>) - declare i64 @llvm.ctlz.i64 (i64 <src>, i1 <is_zero_undef>) - declare i256 @llvm.ctlz.i256(i256 <src>, i1 <is_zero_undef>) - declare <2 x i32> @llvm.ctlz.v2i32(<2 x i32> <src>, i1 <is_zero_undef>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.ctlz</tt>' family of intrinsic functions counts the number of - leading zeros in a variable.</p> - -<h5>Arguments:</h5> -<p>The first argument is the value to be counted. This argument may be of any - integer type, or a vector with integer element type. The return type - must match the first argument type.</p> - -<p>The second argument must be a constant and is a flag to indicate whether the - intrinsic should ensure that a zero as the first argument produces a defined - result.
Historically some architectures did not provide a defined result for - zero values as efficiently, and many algorithms are now predicated on - avoiding zero-value inputs.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.ctlz</tt>' intrinsic counts the leading (most significant) - zeros in a variable, or within each element of the vector. - If <tt>src == 0</tt> then the result is the size in bits of the type of - <tt>src</tt> if <tt>is_zero_undef == 0</tt> and <tt>undef</tt> otherwise. - For example, <tt>llvm.ctlz(i32 2) = 30</tt>.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_cttz">'<tt>llvm.cttz.*</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use <tt>llvm.cttz</tt> on any - integer bit width, or any vector of integer elements. Not all targets - support all bit widths or vector types, however.</p> - -<pre> - declare i8 @llvm.cttz.i8 (i8 <src>, i1 <is_zero_undef>) - declare i16 @llvm.cttz.i16 (i16 <src>, i1 <is_zero_undef>) - declare i32 @llvm.cttz.i32 (i32 <src>, i1 <is_zero_undef>) - declare i64 @llvm.cttz.i64 (i64 <src>, i1 <is_zero_undef>) - declare i256 @llvm.cttz.i256(i256 <src>, i1 <is_zero_undef>) - declare <2 x i32> @llvm.cttz.v2i32(<2 x i32> <src>, i1 <is_zero_undef>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.cttz</tt>' family of intrinsic functions counts the number of - trailing zeros.</p> - -<h5>Arguments:</h5> -<p>The first argument is the value to be counted. This argument may be of any - integer type, or a vector with integer element type. The return type - must match the first argument type.</p> - -<p>The second argument must be a constant and is a flag to indicate whether the - intrinsic should ensure that a zero as the first argument produces a defined - result. Historically some architectures did not provide a defined result for - zero values as efficiently, and many algorithms are now predicated on - avoiding zero-value inputs.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.cttz</tt>' intrinsic counts the trailing (least significant) - zeros in a variable, or within each element of a vector. - If <tt>src == 0</tt> then the result is the size in bits of the type of - <tt>src</tt> if <tt>is_zero_undef == 0</tt> and <tt>undef</tt> otherwise. - For example, <tt>llvm.cttz(2) = 1</tt>.</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="int_overflow">Arithmetic with Overflow Intrinsics</a> -</h3> - -<div> - -<p>LLVM provides intrinsics for some arithmetic with overflow operations.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_sadd_overflow"> - '<tt>llvm.sadd.with.overflow.*</tt>' Intrinsics - </a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic.
You can use <tt>llvm.sadd.with.overflow</tt> - on any integer bit width.</p> - -<pre> - declare {i16, i1} @llvm.sadd.with.overflow.i16(i16 %a, i16 %b) - declare {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b) - declare {i64, i1} @llvm.sadd.with.overflow.i64(i64 %a, i64 %b) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.sadd.with.overflow</tt>' family of intrinsic functions perform - a signed addition of the two arguments, and indicate whether an overflow - occurred during the signed summation.</p> - -<h5>Arguments:</h5> -<p>The arguments (%a and %b) and the first element of the result structure may - be of integer types of any bit width, but they must have the same bit - width. The second element of the result structure must be of - type <tt>i1</tt>. <tt>%a</tt> and <tt>%b</tt> are the two values that will - undergo signed addition.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.sadd.with.overflow</tt>' family of intrinsic functions perform - a signed addition of the two variables. They return a structure — the - first element of which is the signed summation, and the second element of - which is a bit specifying if the signed summation resulted in an - overflow.</p> - -<h5>Examples:</h5> -<pre> - %res = call {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b) - %sum = extractvalue {i32, i1} %res, 0 - %obit = extractvalue {i32, i1} %res, 1 - br i1 %obit, label %overflow, label %normal -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_uadd_overflow"> - '<tt>llvm.uadd.with.overflow.*</tt>' Intrinsics - </a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use <tt>llvm.uadd.with.overflow</tt> - on any integer bit width.</p> - -<pre> - declare {i16, i1} @llvm.uadd.with.overflow.i16(i16 %a, i16 %b) - declare {i32, i1} @llvm.uadd.with.overflow.i32(i32 %a, i32 %b) - declare {i64, i1} @llvm.uadd.with.overflow.i64(i64 %a, i64 %b) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.uadd.with.overflow</tt>' family of intrinsic functions perform - an unsigned addition of the two arguments, and indicate whether a carry - occurred during the unsigned summation.</p> - -<h5>Arguments:</h5> -<p>The arguments (%a and %b) and the first element of the result structure may - be of integer types of any bit width, but they must have the same bit - width. The second element of the result structure must be of - type <tt>i1</tt>. <tt>%a</tt> and <tt>%b</tt> are the two values that will - undergo unsigned addition.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.uadd.with.overflow</tt>' family of intrinsic functions perform - an unsigned addition of the two arguments. They return a structure — - the first element of which is the sum, and the second element of which is a - bit specifying if the unsigned summation resulted in a carry.</p> - -<h5>Examples:</h5> -<pre> - %res = call {i32, i1} @llvm.uadd.with.overflow.i32(i32 %a, i32 %b) - %sum = extractvalue {i32, i1} %res, 0 - %obit = extractvalue {i32, i1} %res, 1 - br i1 %obit, label %carry, label %normal -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_ssub_overflow"> - '<tt>llvm.ssub.with.overflow.*</tt>' Intrinsics - </a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. 
You can use <tt>llvm.ssub.with.overflow</tt> - on any integer bit width.</p> - -<pre> - declare {i16, i1} @llvm.ssub.with.overflow.i16(i16 %a, i16 %b) - declare {i32, i1} @llvm.ssub.with.overflow.i32(i32 %a, i32 %b) - declare {i64, i1} @llvm.ssub.with.overflow.i64(i64 %a, i64 %b) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.ssub.with.overflow</tt>' family of intrinsic functions perform - a signed subtraction of the two arguments, and indicate whether an overflow - occurred during the signed subtraction.</p> - -<h5>Arguments:</h5> -<p>The arguments (%a and %b) and the first element of the result structure may - be of integer types of any bit width, but they must have the same bit - width. The second element of the result structure must be of - type <tt>i1</tt>. <tt>%a</tt> and <tt>%b</tt> are the two values that will - undergo signed subtraction.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.ssub.with.overflow</tt>' family of intrinsic functions perform - a signed subtraction of the two arguments. They return a structure — - the first element of which is the subtraction, and the second element of - which is a bit specifying if the signed subtraction resulted in an - overflow.</p> - -<h5>Examples:</h5> -<pre> - %res = call {i32, i1} @llvm.ssub.with.overflow.i32(i32 %a, i32 %b) - %sum = extractvalue {i32, i1} %res, 0 - %obit = extractvalue {i32, i1} %res, 1 - br i1 %obit, label %overflow, label %normal -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_usub_overflow"> - '<tt>llvm.usub.with.overflow.*</tt>' Intrinsics - </a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use <tt>llvm.usub.with.overflow</tt> - on any integer bit width.</p> - -<pre> - declare {i16, i1} @llvm.usub.with.overflow.i16(i16 %a, i16 %b) - declare {i32, i1} @llvm.usub.with.overflow.i32(i32 %a, i32 %b) - declare {i64, i1} @llvm.usub.with.overflow.i64(i64 %a, i64 %b) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.usub.with.overflow</tt>' family of intrinsic functions perform - an unsigned subtraction of the two arguments, and indicate whether an - overflow occurred during the unsigned subtraction.</p> - -<h5>Arguments:</h5> -<p>The arguments (%a and %b) and the first element of the result structure may - be of integer types of any bit width, but they must have the same bit - width. The second element of the result structure must be of - type <tt>i1</tt>. <tt>%a</tt> and <tt>%b</tt> are the two values that will - undergo unsigned subtraction.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.usub.with.overflow</tt>' family of intrinsic functions perform - an unsigned subtraction of the two arguments. They return a structure — - the first element of which is the subtraction, and the second element of - which is a bit specifying if the unsigned subtraction resulted in an - overflow.</p> - -<h5>Examples:</h5> -<pre> - %res = call {i32, i1} @llvm.usub.with.overflow.i32(i32 %a, i32 %b) - %sum = extractvalue {i32, i1} %res, 0 - %obit = extractvalue {i32, i1} %res, 1 - br i1 %obit, label %overflow, label %normal -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_smul_overflow"> - '<tt>llvm.smul.with.overflow.*</tt>' Intrinsics - </a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. 
You can use <tt>llvm.smul.with.overflow</tt> - on any integer bit width.</p> - -<pre> - declare {i16, i1} @llvm.smul.with.overflow.i16(i16 %a, i16 %b) - declare {i32, i1} @llvm.smul.with.overflow.i32(i32 %a, i32 %b) - declare {i64, i1} @llvm.smul.with.overflow.i64(i64 %a, i64 %b) -</pre> - -<h5>Overview:</h5> - -<p>The '<tt>llvm.smul.with.overflow</tt>' family of intrinsic functions perform - a signed multiplication of the two arguments, and indicate whether an - overflow occurred during the signed multiplication.</p> - -<h5>Arguments:</h5> -<p>The arguments (%a and %b) and the first element of the result structure may - be of integer types of any bit width, but they must have the same bit - width. The second element of the result structure must be of - type <tt>i1</tt>. <tt>%a</tt> and <tt>%b</tt> are the two values that will - undergo signed multiplication.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.smul.with.overflow</tt>' family of intrinsic functions perform - a signed multiplication of the two arguments. They return a structure — - the first element of which is the multiplication, and the second element of - which is a bit specifying if the signed multiplication resulted in an - overflow.</p> - -<h5>Examples:</h5> -<pre> - %res = call {i32, i1} @llvm.smul.with.overflow.i32(i32 %a, i32 %b) - %sum = extractvalue {i32, i1} %res, 0 - %obit = extractvalue {i32, i1} %res, 1 - br i1 %obit, label %overflow, label %normal -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_umul_overflow"> - '<tt>llvm.umul.with.overflow.*</tt>' Intrinsics - </a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. You can use <tt>llvm.umul.with.overflow</tt> - on any integer bit width.</p> - -<pre> - declare {i16, i1} @llvm.umul.with.overflow.i16(i16 %a, i16 %b) - declare {i32, i1} @llvm.umul.with.overflow.i32(i32 %a, i32 %b) - declare {i64, i1} @llvm.umul.with.overflow.i64(i64 %a, i64 %b) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.umul.with.overflow</tt>' family of intrinsic functions perform - a unsigned multiplication of the two arguments, and indicate whether an - overflow occurred during the unsigned multiplication.</p> - -<h5>Arguments:</h5> -<p>The arguments (%a and %b) and the first element of the result structure may - be of integer types of any bit width, but they must have the same bit - width. The second element of the result structure must be of - type <tt>i1</tt>. <tt>%a</tt> and <tt>%b</tt> are the two values that will - undergo unsigned multiplication.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.umul.with.overflow</tt>' family of intrinsic functions perform - an unsigned multiplication of the two arguments. 
They return a structure - — the first element of which is the multiplication, and the second - element of which is a bit specifying if the unsigned multiplication resulted - in an overflow.</p> - -<h5>Examples:</h5> -<pre> - %res = call {i32, i1} @llvm.umul.with.overflow.i32(i32 %a, i32 %b) - %sum = extractvalue {i32, i1} %res, 0 - %obit = extractvalue {i32, i1} %res, 1 - br i1 %obit, label %overflow, label %normal -</pre> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="spec_arithmetic">Specialised Arithmetic Intrinsics</a> -</h3> - -<!-- _______________________________________________________________________ --> - -<h4> - <a name="fmuladd">'<tt>llvm.fmuladd.*</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare float @llvm.fmuladd.f32(float %a, float %b, float %c) - declare double @llvm.fmuladd.f64(double %a, double %b, double %c) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.fmuladd.*</tt>' intrinsic functions represent multiply-add -expressions that can be fused if the code generator determines that the fused -expression would be legal and efficient.</p> - -<h5>Arguments:</h5> -<p>The '<tt>llvm.fmuladd.*</tt>' intrinsics each take three arguments: two -multiplicands, a and b, and an addend c.</p> - -<h5>Semantics:</h5> -<p>The expression:</p> -<pre> - %0 = call float @llvm.fmuladd.f32(%a, %b, %c) -</pre> -<p>is equivalent to the expression a * b + c, except that rounding will not be -performed between the multiplication and addition steps if the code generator -fuses the operations. Fusion is not guaranteed, even if the target platform -supports it. If a fused multiply-add is required the corresponding llvm.fma.* -intrinsic function should be used instead.</p> - -<h5>Examples:</h5> -<pre> - %r2 = call float @llvm.fmuladd.f32(float %a, float %b, float %c) ; yields {float}:r2 = (a * b) + c -</pre> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="int_fp16">Half Precision Floating Point Intrinsics</a> -</h3> - -<div> - -<p>For most target platforms, half precision floating point is a storage-only - format. This means that it is - a dense encoding (in memory) but does not support computation in the - format.</p> - -<p>This means that code must first load the half-precision floating point - value as an i16, then convert it to float with <a - href="#int_convert_from_fp16"><tt>llvm.convert.from.fp16</tt></a>. - Computation can then be performed on the float value (including extending to - double etc). 
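For example, the load-and-widen half of that round trip might look like this (a minimal sketch; <tt>@x</tt> is assumed to be a global <tt>i16</tt> holding the half value):
<pre>
  %h = load i16* @x, align 2                        ; half value in its i16 storage form
  %f = call float @llvm.convert.from.fp16(i16 %h)   ; widen to float
  %r = fadd float %f, 1.0
</pre>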
To store the value back to memory, it is first converted to - float if needed, then converted to i16 with - <a href="#int_convert_to_fp16"><tt>llvm.convert.to.fp16</tt></a>, then - storing as an i16 value.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_convert_to_fp16"> - '<tt>llvm.convert.to.fp16</tt>' Intrinsic - </a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare i16 @llvm.convert.to.fp16(f32 %a) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.convert.to.fp16</tt>' intrinsic function performs - a conversion from single precision floating point format to half precision - floating point format.</p> - -<h5>Arguments:</h5> -<p>The intrinsic function contains single argument - the value to be - converted.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.convert.to.fp16</tt>' intrinsic function performs - a conversion from single precision floating point format to half precision - floating point format. The return value is an <tt>i16</tt> which - contains the converted number.</p> - -<h5>Examples:</h5> -<pre> - %res = call i16 @llvm.convert.to.fp16(f32 %a) - store i16 %res, i16* @x, align 2 -</pre> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_convert_from_fp16"> - '<tt>llvm.convert.from.fp16</tt>' Intrinsic - </a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare f32 @llvm.convert.from.fp16(i16 %a) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.convert.from.fp16</tt>' intrinsic function performs - a conversion from half precision floating point format to single precision - floating point format.</p> - -<h5>Arguments:</h5> -<p>The intrinsic function contains single argument - the value to be - converted.</p> - -<h5>Semantics:</h5> -<p>The '<tt>llvm.convert.from.fp16</tt>' intrinsic function performs a - conversion from half single precision floating point format to single - precision floating point format. The input half-float value is represented by - an <tt>i16</tt> value.</p> - -<h5>Examples:</h5> -<pre> - %a = load i16* @x, align 2 - %res = call f32 @llvm.convert.from.fp16(i16 %a) -</pre> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="int_debugger">Debugger Intrinsics</a> -</h3> - -<div> - -<p>The LLVM debugger intrinsics (which all start with <tt>llvm.dbg.</tt> - prefix), are described in - the <a href="SourceLevelDebugging.html#format_common_intrinsics">LLVM Source - Level Debugging</a> document.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="int_eh">Exception Handling Intrinsics</a> -</h3> - -<div> - -<p>The LLVM exception handling intrinsics (which all start with - <tt>llvm.eh.</tt> prefix), are described in - the <a href="ExceptionHandling.html#format_common_intrinsics">LLVM Exception - Handling</a> document.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="int_trampoline">Trampoline Intrinsics</a> -</h3> - -<div> - -<p>These intrinsics make it possible to excise one parameter, marked with - the <a href="#nest"><tt>nest</tt></a> attribute, from a function. - The result is a callable - function pointer lacking the nest parameter - the caller does not need to - provide a value for it. 
Instead, the value to use is stored in advance in a - "trampoline", a block of memory usually allocated on the stack, which also - contains code to splice the nest value into the argument list. This is used - to implement the GCC nested function address extension.</p> - -<p>For example, if the function is - <tt>i32 f(i8* nest %c, i32 %x, i32 %y)</tt> then the resulting function - pointer has signature <tt>i32 (i32, i32)*</tt>. It can be created as - follows:</p> - -<pre class="doc_code"> - %tramp = alloca [10 x i8], align 4 ; size and alignment only correct for X86 - %tramp1 = getelementptr [10 x i8]* %tramp, i32 0, i32 0 - call i8* @llvm.init.trampoline(i8* %tramp1, i8* bitcast (i32 (i8*, i32, i32)* @f to i8*), i8* %nval) - %p = call i8* @llvm.adjust.trampoline(i8* %tramp1) - %fp = bitcast i8* %p to i32 (i32, i32)* -</pre> - -<p>The call <tt>%val = call i32 %fp(i32 %x, i32 %y)</tt> is then equivalent - to <tt>%val = call i32 %f(i8* %nval, i32 %x, i32 %y)</tt>.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_it"> - '<tt>llvm.init.trampoline</tt>' Intrinsic - </a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare void @llvm.init.trampoline(i8* <tramp>, i8* <func>, i8* <nval>) -</pre> - -<h5>Overview:</h5> -<p>This fills the memory pointed to by <tt>tramp</tt> with executable code, - turning it into a trampoline.</p> - -<h5>Arguments:</h5> -<p>The <tt>llvm.init.trampoline</tt> intrinsic takes three arguments, all - pointers. The <tt>tramp</tt> argument must point to a sufficiently large and - sufficiently aligned block of memory; this memory is written to by the - intrinsic. Note that the size and the alignment are target-specific - LLVM - currently provides no portable way of determining them, so a front-end that - generates this intrinsic needs to have some target-specific knowledge. - The <tt>func</tt> argument must hold a function bitcast to - an <tt>i8*</tt>.</p> - -<h5>Semantics:</h5> -<p>The block of memory pointed to by <tt>tramp</tt> is filled with target - dependent code, turning it into a function. Then <tt>tramp</tt> needs to be - passed to <a href="#int_at">llvm.adjust.trampoline</a> to get a pointer - which can be <a href="#int_trampoline">bitcast (to a new function) and - called</a>. The new function's signature is the same as that of - <tt>func</tt> with any arguments marked with the <tt>nest</tt> attribute - removed. At most one such <tt>nest</tt> argument is allowed, and it must be of - pointer type. Calling the new function is equivalent to calling <tt>func</tt> - with the same argument list, but with <tt>nval</tt> used for the missing - <tt>nest</tt> argument. 
If, after calling <tt>llvm.init.trampoline</tt>, the - memory pointed to by <tt>tramp</tt> is modified, then the effect of any later call - to the returned function pointer is undefined.</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_at"> - '<tt>llvm.adjust.trampoline</tt>' Intrinsic - </a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare i8* @llvm.adjust.trampoline(i8* <tramp>) -</pre> - -<h5>Overview:</h5> -<p>This performs any required machine-specific adjustment to the address of a - trampoline (passed as <tt>tramp</tt>).</p> - -<h5>Arguments:</h5> -<p><tt>tramp</tt> must point to a block of memory which already has trampoline code - filled in by a previous call to <a href="#int_it"><tt>llvm.init.trampoline</tt> - </a>.</p> - -<h5>Semantics:</h5> -<p>On some architectures the address of the code to be executed needs to be - different from the address where the trampoline is actually stored. This - intrinsic returns the executable address corresponding to <tt>tramp</tt> - after performing the required machine-specific adjustments. - The pointer returned can then be <a href="#int_trampoline"> bitcast and - executed</a>. -</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="int_memorymarkers">Memory Use Markers</a> -</h3> - -<div> - -<p>This class of intrinsics exists to provide information about the lifetime of memory - objects and ranges where variables are immutable.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_lifetime_start">'<tt>llvm.lifetime.start</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare void @llvm.lifetime.start(i64 <size>, i8* nocapture <ptr>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.lifetime.start</tt>' intrinsic specifies the start of a memory - object's lifetime.</p> - -<h5>Arguments:</h5> -<p>The first argument is a constant integer representing the size of the - object, or -1 if it is variable sized. The second argument is a pointer to - the object.</p> - -<h5>Semantics:</h5> -<p>This intrinsic indicates that before this point in the code, the value of the - memory pointed to by <tt>ptr</tt> is dead. This means that it is known to - never be used and has an undefined value. A load from the pointer that - precedes this intrinsic can be replaced with - <tt>'<a href="#undefvalues">undef</a>'</tt>.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_lifetime_end">'<tt>llvm.lifetime.end</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare void @llvm.lifetime.end(i64 <size>, i8* nocapture <ptr>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.lifetime.end</tt>' intrinsic specifies the end of a memory - object's lifetime.</p> - -<h5>Arguments:</h5> -<p>The first argument is a constant integer representing the size of the - object, or -1 if it is variable sized. The second argument is a pointer to - the object.</p> - -<h5>Semantics:</h5> -<p>This intrinsic indicates that after this point in the code, the value of the - memory pointed to by <tt>ptr</tt> is dead. This means that it is known to - never be used and has an undefined value. Any stores into the memory object - following this intrinsic may be removed as dead.</p>
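<p>For example, the two markers are typically paired around the range in which a fixed-size stack slot is actually used (a minimal sketch):</p>
<pre>
  %buf = alloca [16 x i8]
  %p = bitcast [16 x i8]* %buf to i8*
  call void @llvm.lifetime.start(i64 16, i8* %p)   ; %buf becomes live here
  ; ... use %buf ...
  call void @llvm.lifetime.end(i64 16, i8* %p)     ; %buf is dead from here on
</pre>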
- -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_invariant_start">'<tt>llvm.invariant.start</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare {}* @llvm.invariant.start(i64 <size>, i8* nocapture <ptr>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.invariant.start</tt>' intrinsic specifies that the contents of - a memory object will not change.</p> - -<h5>Arguments:</h5> -<p>The first argument is a constant integer representing the size of the - object, or -1 if it is variable sized. The second argument is a pointer to - the object.</p> - -<h5>Semantics:</h5> -<p>This intrinsic indicates that until an <tt>llvm.invariant.end</tt> that uses - the return value, the referenced memory location is constant and - unchanging.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_invariant_end">'<tt>llvm.invariant.end</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare void @llvm.invariant.end({}* <start>, i64 <size>, i8* nocapture <ptr>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.invariant.end</tt>' intrinsic specifies that the contents of - a memory object are mutable.</p> - -<h5>Arguments:</h5> -<p>The first argument is the matching <tt>llvm.invariant.start</tt> intrinsic. - The second argument is a constant integer representing the size of the - object, or -1 if it is variable sized and the third argument is a pointer - to the object.</p> - -<h5>Semantics:</h5> -<p>This intrinsic indicates that the memory is mutable again.</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="int_general">General Intrinsics</a> -</h3> - -<div> - -<p>This class of intrinsics is designed to be generic and has no specific - purpose.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_var_annotation">'<tt>llvm.var.annotation</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare void @llvm.var.annotation(i8* <val>, i8* <str>, i8* <str>, i32 <int>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.var.annotation</tt>' intrinsic.</p> - -<h5>Arguments:</h5> -<p>The first argument is a pointer to a value, the second is a pointer to a - global string, the third is a pointer to a global string which is the source - file name, and the last argument is the line number.</p> - -<h5>Semantics:</h5> -<p>This intrinsic allows annotation of local variables with arbitrary strings. - This can be useful for special purpose optimizations that want to look for - these annotations. These have no other defined use; they are ignored by code - generation and optimization.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_annotation">'<tt>llvm.annotation.*</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<p>This is an overloaded intrinsic. 
You can use '<tt>llvm.annotation</tt>' on - any integer bit width.</p> - -<pre> - declare i8 @llvm.annotation.i8(i8 <val>, i8* <str>, i8* <str>, i32 <int>) - declare i16 @llvm.annotation.i16(i16 <val>, i8* <str>, i8* <str>, i32 <int>) - declare i32 @llvm.annotation.i32(i32 <val>, i8* <str>, i8* <str>, i32 <int>) - declare i64 @llvm.annotation.i64(i64 <val>, i8* <str>, i8* <str>, i32 <int>) - declare i256 @llvm.annotation.i256(i256 <val>, i8* <str>, i8* <str>, i32 <int>) -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.annotation</tt>' intrinsic.</p> - -<h5>Arguments:</h5> -<p>The first argument is an integer value (result of some expression), the - second is a pointer to a global string, the third is a pointer to a global - string which is the source file name, and the last argument is the line - number. It returns the value of the first argument.</p> - -<h5>Semantics:</h5> -<p>This intrinsic allows annotations to be put on arbitrary expressions with - arbitrary strings. This can be useful for special purpose optimizations that - want to look for these annotations. These have no other defined use; they - are ignored by code generation and optimization.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_trap">'<tt>llvm.trap</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare void @llvm.trap() noreturn nounwind -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.trap</tt>' intrinsic.</p> - -<h5>Arguments:</h5> -<p>None.</p> - -<h5>Semantics:</h5> -<p>This intrinsic is lowered to the target dependent trap instruction. If the - target does not have a trap instruction, this intrinsic will be lowered to - a call of the <tt>abort()</tt> function.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_debugtrap">'<tt>llvm.debugtrap</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare void @llvm.debugtrap() nounwind -</pre> - -<h5>Overview:</h5> -<p>The '<tt>llvm.debugtrap</tt>' intrinsic.</p> - -<h5>Arguments:</h5> -<p>None.</p> - -<h5>Semantics:</h5> -<p>This intrinsic is lowered to code which is intended to cause an execution - trap with the intention of requesting the attention of a debugger.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_stackprotector">'<tt>llvm.stackprotector</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare void @llvm.stackprotector(i8* <guard>, i8** <slot>) -</pre> - -<h5>Overview:</h5> -<p>The <tt>llvm.stackprotector</tt> intrinsic takes the <tt>guard</tt> and - stores it onto the stack at <tt>slot</tt>. The stack slot is adjusted to - ensure that it is placed on the stack before local variables.</p> - -<h5>Arguments:</h5> -<p>The <tt>llvm.stackprotector</tt> intrinsic requires two pointer - arguments. The first argument is the value loaded from the stack - guard <tt>@__stack_chk_guard</tt>. The second variable is an <tt>alloca</tt> - that has enough space to hold the value of the guard.</p> - -<h5>Semantics:</h5> -<p>This intrinsic causes the prologue/epilogue inserter to force the position of - the <tt>AllocaInst</tt> stack slot to be before local variables on the - stack. This is to ensure that if a local variable on the stack is - overwritten, it will destroy the value of the guard. When the function exits, - the guard on the stack is checked against the original guard. 
If they are - different, then the program aborts by calling the <tt>__stack_chk_fail()</tt> - function.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_objectsize">'<tt>llvm.objectsize</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare i32 @llvm.objectsize.i32(i8* <object>, i1 <min>) - declare i64 @llvm.objectsize.i64(i8* <object>, i1 <min>) -</pre> - -<h5>Overview:</h5> -<p>The <tt>llvm.objectsize</tt> intrinsic is designed to provide information to - the optimizers to determine at compile time whether a) an operation (like - memcpy) will overflow a buffer that corresponds to an object, or b) that a - runtime check for overflow isn't necessary. An object in this context means - an allocation of a specific class, structure, array, or other object.</p> - -<h5>Arguments:</h5> -<p>The <tt>llvm.objectsize</tt> intrinsic takes two arguments. The first - argument is a pointer to or into the <tt>object</tt>. The second argument - is a boolean and determines whether <tt>llvm.objectsize</tt> returns 0 (if - true) or -1 (if false) when the object size is unknown. - The second argument only accepts constants.</p> - -<h5>Semantics:</h5> -<p>The <tt>llvm.objectsize</tt> intrinsic is lowered to a constant representing - the size of the object concerned. If the size cannot be determined at compile - time, <tt>llvm.objectsize</tt> returns <tt>i32/i64 -1 or 0</tt> - (depending on the <tt>min</tt> argument).</p> - -</div> -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_expect">'<tt>llvm.expect</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare i32 @llvm.expect.i32(i32 <val>, i32 <expected_val>) - declare i64 @llvm.expect.i64(i64 <val>, i64 <expected_val>) -</pre> - -<h5>Overview:</h5> -<p>The <tt>llvm.expect</tt> intrinsic provides information about expected (the - most probable) value of <tt>val</tt>, which can be used by optimizers.</p> - -<h5>Arguments:</h5> -<p>The <tt>llvm.expect</tt> intrinsic takes two arguments. The first - argument is a value. The second argument is an expected value, this needs to - be a constant value, variables are not allowed.</p> - -<h5>Semantics:</h5> -<p>This intrinsic is lowered to the <tt>val</tt>.</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="int_donothing">'<tt>llvm.donothing</tt>' Intrinsic</a> -</h4> - -<div> - -<h5>Syntax:</h5> -<pre> - declare void @llvm.donothing() nounwind readnone -</pre> - -<h5>Overview:</h5> -<p>The <tt>llvm.donothing</tt> intrinsic doesn't perform any operation. 
It's the -only intrinsic that can be called with an invoke instruction.</p> - -<h5>Arguments:</h5> -<p>None.</p> - -<h5>Semantics:</h5> -<p>This intrinsic does nothing, and it's removed by optimizers and ignored by -codegen.</p> -</div> - -</div> - -</div> -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> - -</body> -</html> diff --git a/docs/LangRef.rst b/docs/LangRef.rst new file mode 100644 index 0000000000..1ea475dee6 --- /dev/null +++ b/docs/LangRef.rst @@ -0,0 +1,8298 @@ +============================== +LLVM Language Reference Manual +============================== + +.. contents:: + :local: + :depth: 3 + +Abstract +======== + +This document is a reference manual for the LLVM assembly language. LLVM +is a Static Single Assignment (SSA) based representation that provides +type safety, low-level operations, flexibility, and the capability of +representing 'all' high-level languages cleanly. It is the common code +representation used throughout all phases of the LLVM compilation +strategy. + +Introduction +============ + +The LLVM code representation is designed to be used in three different +forms: as an in-memory compiler IR, as an on-disk bitcode representation +(suitable for fast loading by a Just-In-Time compiler), and as a human +readable assembly language representation. This allows LLVM to provide a +powerful intermediate representation for efficient compiler +transformations and analysis, while providing a natural means to debug +and visualize the transformations. The three different forms of LLVM are +all equivalent. This document describes the human readable +representation and notation. + +The LLVM representation aims to be light-weight and low-level while +being expressive, typed, and extensible at the same time. It aims to be +a "universal IR" of sorts, by being at a low enough level that +high-level ideas may be cleanly mapped to it (similar to how +microprocessors are "universal IR's", allowing many source languages to +be mapped to them). By providing type information, LLVM can be used as +the target of optimizations: for example, through pointer analysis, it +can be proven that a C automatic variable is never accessed outside of +the current function, allowing it to be promoted to a simple SSA value +instead of a memory location. + +.. _wellformed: + +Well-Formedness +--------------- + +It is important to note that this document describes 'well formed' LLVM +assembly language. There is a difference between what the parser accepts +and what is considered 'well formed'. For example, the following +instruction is syntactically okay, but not well formed: + +.. code-block:: llvm + + %x = add i32 1, %x + +because the definition of ``%x`` does not dominate all of its uses. The +LLVM infrastructure provides a verification pass that may be used to +verify that an LLVM module is well formed. This pass is automatically +run by the parser after parsing input assembly and by the optimizer +before it outputs bitcode. 
The violations pointed out by the verifier +pass indicate bugs in transformation passes or input to the parser. + +.. _identifiers: + +Identifiers +=========== + +LLVM identifiers come in two basic types: global and local. Global +identifiers (functions, global variables) begin with the ``'@'`` +character. Local identifiers (register names, types) begin with the +``'%'`` character. Additionally, there are three different formats for +identifiers, for different purposes: + +#. Named values are represented as a string of characters with their + prefix. For example, ``%foo``, ``@DivisionByZero``, + ``%a.really.long.identifier``. The actual regular expression used is + '``[%@][a-zA-Z$._][a-zA-Z$._0-9]*``'. Identifiers which require other + characters in their names can be surrounded with quotes. Special + characters may be escaped using ``"\xx"`` where ``xx`` is the ASCII + code for the character in hexadecimal. In this way, any character can + be used in a name value, even quotes themselves. +#. Unnamed values are represented as an unsigned numeric value with + their prefix. For example, ``%12``, ``@2``, ``%44``. +#. Constants, which are described in the section Constants_ below. + +LLVM requires that values start with a prefix for two reasons: Compilers +don't need to worry about name clashes with reserved words, and the set +of reserved words may be expanded in the future without penalty. +Additionally, unnamed identifiers allow a compiler to quickly come up +with a temporary variable without having to avoid symbol table +conflicts. + +Reserved words in LLVM are very similar to reserved words in other +languages. There are keywords for different opcodes ('``add``', +'``bitcast``', '``ret``', etc...), for primitive type names ('``void``', +'``i32``', etc...), and others. These reserved words cannot conflict +with variable names, because none of them start with a prefix character +(``'%'`` or ``'@'``). + +Here is an example of LLVM code to multiply the integer variable +'``%X``' by 8: + +The easy way: + +.. code-block:: llvm + + %result = mul i32 %X, 8 + +After strength reduction: + +.. code-block:: llvm + + %result = shl i32 %X, i8 3 + +And the hard way: + +.. code-block:: llvm + + %0 = add i32 %X, %X ; yields {i32}:%0 + %1 = add i32 %0, %0 ; yields {i32}:%1 + %result = add i32 %1, %1 + +This last way of multiplying ``%X`` by 8 illustrates several important +lexical features of LLVM: + +#. Comments are delimited with a '``;``' and go until the end of line. +#. Unnamed temporaries are created when the result of a computation is + not assigned to a named value. +#. Unnamed temporaries are numbered sequentially + +It also shows a convention that we follow in this document. When +demonstrating instructions, we will follow an instruction with a comment +that defines the type and name of value produced. + +High Level Structure +==================== + +Module Structure +---------------- + +LLVM programs are composed of ``Module``'s, each of which is a +translation unit of the input programs. Each module consists of +functions, global variables, and symbol table entries. Modules may be +combined together with the LLVM linker, which merges function (and +global variable) definitions, resolves forward declarations, and merges +symbol table entries. Here is an example of the "hello world" module: + +.. code-block:: llvm + + ; Declare the string constant as a global constant. 
+ @.str = private unnamed_addr constant [13 x i8] c"hello world\0A\00" + + ; External declaration of the puts function + declare i32 @puts(i8* nocapture) nounwind + + ; Definition of main function + define i32 @main() { ; i32()*  + ; Convert [13 x i8]* to i8 *... + %cast210 = getelementptr [13 x i8]* @.str, i64 0, i64 0 + + ; Call puts function to write out the string to stdout. + call i32 @puts(i8* %cast210) + ret i32 0 + } + + ; Named metadata + !1 = metadata !{i32 42} + !foo = !{!1, null} + +This example is made up of a :ref:`global variable <globalvars>` named +"``.str``", an external declaration of the "``puts``" function, a +:ref:`function definition <functionstructure>` for "``main``" and +:ref:`named metadata <namedmetadatastructure>` "``foo``". + +In general, a module is made up of a list of global values (where both +functions and global variables are global values). Global values are +represented by a pointer to a memory location (in this case, a pointer +to an array of char, and a pointer to a function), and have one of the +following :ref:`linkage types <linkage>`. + +.. _linkage: + +Linkage Types +------------- + +All Global Variables and Functions have one of the following types of +linkage: + +``private`` + Global values with "``private``" linkage are only directly + accessible by objects in the current module. In particular, linking + code into a module with an private global value may cause the + private to be renamed as necessary to avoid collisions. Because the + symbol is private to the module, all references can be updated. This + doesn't show up in any symbol table in the object file. +``linker_private`` + Similar to ``private``, but the symbol is passed through the + assembler and evaluated by the linker. Unlike normal strong symbols, + they are removed by the linker from the final linked image + (executable or dynamic library). +``linker_private_weak`` + Similar to "``linker_private``", but the symbol is weak. Note that + ``linker_private_weak`` symbols are subject to coalescing by the + linker. The symbols are removed by the linker from the final linked + image (executable or dynamic library). +``internal`` + Similar to private, but the value shows as a local symbol + (``STB_LOCAL`` in the case of ELF) in the object file. This + corresponds to the notion of the '``static``' keyword in C. +``available_externally`` + Globals with "``available_externally``" linkage are never emitted + into the object file corresponding to the LLVM module. They exist to + allow inlining and other optimizations to take place given knowledge + of the definition of the global, which is known to be somewhere + outside the module. Globals with ``available_externally`` linkage + are allowed to be discarded at will, and are otherwise the same as + ``linkonce_odr``. This linkage type is only allowed on definitions, + not declarations. +``linkonce`` + Globals with "``linkonce``" linkage are merged with other globals of + the same name when linkage occurs. This can be used to implement + some forms of inline functions, templates, or other code which must + be generated in each translation unit that uses it, but where the + body may be overridden with a more definitive definition later. + Unreferenced ``linkonce`` globals are allowed to be discarded. 
Note + that ``linkonce`` linkage does not actually allow the optimizer to + inline the body of this function into callers because it doesn't + know if this definition of the function is the definitive definition + within the program or whether it will be overridden by a stronger + definition. To enable inlining and other optimizations, use + "``linkonce_odr``" linkage. +``weak`` + "``weak``" linkage has the same merging semantics as ``linkonce`` + linkage, except that unreferenced globals with ``weak`` linkage may + not be discarded. This is used for globals that are declared "weak" + in C source code. +``common`` + "``common``" linkage is most similar to "``weak``" linkage, but they + are used for tentative definitions in C, such as "``int X;``" at + global scope. Symbols with "``common``" linkage are merged in the + same way as ``weak symbols``, and they may not be deleted if + unreferenced. ``common`` symbols may not have an explicit section, + must have a zero initializer, and may not be marked + ':ref:`constant <globalvars>`'. Functions and aliases may not have + common linkage. + +.. _linkage_appending: + +``appending`` + "``appending``" linkage may only be applied to global variables of + pointer to array type. When two global variables with appending + linkage are linked together, the two global arrays are appended + together. This is the LLVM, typesafe, equivalent of having the + system linker append together "sections" with identical names when + .o files are linked. +``extern_weak`` + The semantics of this linkage follow the ELF object file model: the + symbol is weak until linked, if not linked, the symbol becomes null + instead of being an undefined reference. +``linkonce_odr``, ``weak_odr`` + Some languages allow differing globals to be merged, such as two + functions with different semantics. Other languages, such as + ``C++``, ensure that only equivalent globals are ever merged (the + "one definition rule" — "ODR"). Such languages can use the + ``linkonce_odr`` and ``weak_odr`` linkage types to indicate that the + global will only be merged with equivalent globals. These linkage + types are otherwise the same as their non-``odr`` versions. +``linkonce_odr_auto_hide`` + Similar to "``linkonce_odr``", but nothing in the translation unit + takes the address of this definition. For instance, functions that + had an inline definition, but the compiler decided not to inline it. + ``linkonce_odr_auto_hide`` may have only ``default`` visibility. The + symbols are removed by the linker from the final linked image + (executable or dynamic library). +``external`` + If none of the above identifiers are used, the global is externally + visible, meaning that it participates in linkage and can be used to + resolve external symbol references. + +The next two types of linkage are targeted for Microsoft Windows +platform only. They are designed to support importing (exporting) +symbols from (to) DLLs (Dynamic Link Libraries). + +``dllimport`` + "``dllimport``" linkage causes the compiler to reference a function + or variable via a global pointer to a pointer that is set up by the + DLL exporting the symbol. On Microsoft Windows targets, the pointer + name is formed by combining ``__imp_`` and the function or variable + name. +``dllexport`` + "``dllexport``" linkage causes the compiler to provide a global + pointer to a pointer in a DLL, so that it can be referenced with the + ``dllimport`` attribute. 
On Microsoft Windows targets, the pointer + name is formed by combining ``__imp_`` and the function or variable + name. + +For example, since the "``.LC0``" variable is defined to be internal, if +another module defined a "``.LC0``" variable and was linked with this +one, one of the two would be renamed, preventing a collision. Since +"``main``" and "``puts``" are external (i.e., lacking any linkage +declarations), they are accessible outside of the current module. + +It is illegal for a function *declaration* to have any linkage type +other than ``external``, ``dllimport`` or ``extern_weak``. + +Aliases can have only ``external``, ``internal``, ``weak`` or +``weak_odr`` linkages. + +.. _callingconv: + +Calling Conventions +------------------- + +LLVM :ref:`functions <functionstructure>`, :ref:`calls <i_call>` and +:ref:`invokes <i_invoke>` can all have an optional calling convention +specified for the call. The calling convention of any pair of dynamic +caller/callee must match, or the behavior of the program is undefined. +The following calling conventions are supported by LLVM, and more may be +added in the future: + +"``ccc``" - The C calling convention + This calling convention (the default if no other calling convention + is specified) matches the target C calling conventions. This calling + convention supports varargs function calls and tolerates some + mismatch in the declared prototype and implemented declaration of + the function (as does normal C). +"``fastcc``" - The fast calling convention + This calling convention attempts to make calls as fast as possible + (e.g. by passing things in registers). This calling convention + allows the target to use whatever tricks it wants to produce fast + code for the target, without having to conform to an externally + specified ABI (Application Binary Interface). `Tail calls can only + be optimized when this, the GHC or the HiPE convention is + used. <CodeGenerator.html#id80>`_ This calling convention does not + support varargs and requires the prototype of all callees to exactly + match the prototype of the function definition. +"``coldcc``" - The cold calling convention + This calling convention attempts to make code in the caller as + efficient as possible under the assumption that the call is not + commonly executed. As such, these calls often preserve all registers + so that the call does not break any live ranges in the caller side. + This calling convention does not support varargs and requires the + prototype of all callees to exactly match the prototype of the + function definition. +"``cc 10``" - GHC convention + This calling convention has been implemented specifically for use by + the `Glasgow Haskell Compiler (GHC) <http://www.haskell.org/ghc>`_. + It passes everything in registers, going to extremes to achieve this + by disabling callee save registers. This calling convention should + not be used lightly but only for specific situations such as an + alternative to the *register pinning* performance technique often + used when implementing functional programming languages. At the + moment only X86 supports this convention and it has the following + limitations: + + - On *X86-32* only supports up to 4 bit type parameters. No + floating point types are supported. + - On *X86-64* only supports up to 10 bit type parameters and 6 + floating point parameters. + + This calling convention supports `tail call + optimization <CodeGenerator.html#id80>`_ but requires both the + caller and callee are using it. 
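+
+    As a hedged sketch of the syntax only (the function, its arguments, and
+    the call site are hypothetical):
+
+    .. code-block:: llvm
+
+        define cc 10 void @ghc_worker(i64 %sp, i64 %hp) nounwind { ... }
+
+        ; the convention at the call site must match the callee
+        tail call cc 10 void @ghc_worker(i64 %sp.new, i64 %hp.new) nounwind
+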
+"``cc 11``" - The HiPE calling convention + This calling convention has been implemented specifically for use by + the `High-Performance Erlang + (HiPE) <http://www.it.uu.se/research/group/hipe/>`_ compiler, *the* + native code compiler of the `Ericsson's Open Source Erlang/OTP + system <http://www.erlang.org/download.shtml>`_. It uses more + registers for argument passing than the ordinary C calling + convention and defines no callee-saved registers. The calling + convention properly supports `tail call + optimization <CodeGenerator.html#id80>`_ but requires that both the + caller and the callee use it. It uses a *register pinning* + mechanism, similar to GHC's convention, for keeping frequently + accessed runtime components pinned to specific hardware registers. + At the moment only X86 supports this convention (both 32 and 64 + bit). +"``cc <n>``" - Numbered convention + Any calling convention may be specified by number, allowing + target-specific calling conventions to be used. Target specific + calling conventions start at 64. + +More calling conventions can be added/defined on an as-needed basis, to +support Pascal conventions or any other well-known target-independent +convention. + +Visibility Styles +----------------- + +All Global Variables and Functions have one of the following visibility +styles: + +"``default``" - Default style + On targets that use the ELF object file format, default visibility + means that the declaration is visible to other modules and, in + shared libraries, means that the declared entity may be overridden. + On Darwin, default visibility means that the declaration is visible + to other modules. Default visibility corresponds to "external + linkage" in the language. +"``hidden``" - Hidden style + Two declarations of an object with hidden visibility refer to the + same object if they are in the same shared object. Usually, hidden + visibility indicates that the symbol will not be placed into the + dynamic symbol table, so no other module (executable or shared + library) can reference it directly. +"``protected``" - Protected style + On ELF, protected visibility indicates that the symbol will be + placed in the dynamic symbol table, but that references within the + defining module will bind to the local symbol. That is, the symbol + cannot be overridden by another module. + +Named Types +----------- + +LLVM IR allows you to specify name aliases for certain types. This can +make it easier to read the IR and make the IR more condensed +(particularly when recursive types are involved). An example of a name +specification is: + +.. code-block:: llvm + + %mytype = type { %mytype*, i32 } + +You may give a name to any :ref:`type <typesystem>` except +":ref:`void <t_void>`". Type name aliases may be used anywhere a type is +expected with the syntax "%mytype". + +Note that type names are aliases for the structural type that they +indicate, and that you can therefore specify multiple names for the same +type. This often leads to confusing behavior when dumping out a .ll +file. Since LLVM IR uses structural typing, the name is not part of the +type. When printing out LLVM IR, the printer will pick *one name* to +render all types of a particular shape. This means that if you have code +where two different source types end up having the same LLVM type, that +the dumper will sometimes print the "wrong" or unexpected type. This is +an important design point and isn't going to change. + +.. 
_globalvars: + +Global Variables +---------------- + +Global variables define regions of memory allocated at compilation time +instead of run-time. Global variables may optionally be initialized, may +have an explicit section to be placed in, and may have an optional +explicit alignment specified. + +A variable may be defined as ``thread_local``, which means that it will +not be shared by threads (each thread will have a separated copy of the +variable). Not all targets support thread-local variables. Optionally, a +TLS model may be specified: + +``localdynamic`` + For variables that are only used within the current shared library. +``initialexec`` + For variables in modules that will not be loaded dynamically. +``localexec`` + For variables defined in the executable and only used within it. + +The models correspond to the ELF TLS models; see `ELF Handling For +Thread-Local Storage <http://people.redhat.com/drepper/tls.pdf>`_ for +more information on under which circumstances the different models may +be used. The target may choose a different TLS model if the specified +model is not supported, or if a better choice of model can be made. + +A variable may be defined as a global "constant," which indicates that +the contents of the variable will **never** be modified (enabling better +optimization, allowing the global data to be placed in the read-only +section of an executable, etc). Note that variables that need runtime +initialization cannot be marked "constant" as there is a store to the +variable. + +LLVM explicitly allows *declarations* of global variables to be marked +constant, even if the final definition of the global is not. This +capability can be used to enable slightly better optimization of the +program, but requires the language definition to guarantee that +optimizations based on the 'constantness' are valid for the translation +units that do not include the definition. + +As SSA values, global variables define pointer values that are in scope +(i.e. they dominate) all basic blocks in the program. Global variables +always define a pointer to their "content" type because they describe a +region of memory, and all memory objects in LLVM are accessed through +pointers. + +Global variables can be marked with ``unnamed_addr`` which indicates +that the address is not significant, only the content. Constants marked +like this can be merged with other constants if they have the same +initializer. Note that a constant with significant address *can* be +merged with a ``unnamed_addr`` constant, the result being a constant +whose address is significant. + +A global variable may be declared to reside in a target-specific +numbered address space. For targets that support them, address spaces +may affect how optimizations are performed and/or what target +instructions are used to access the variable. The default address space +is zero. The address space qualifier must precede any other attributes. + +LLVM allows an explicit section to be specified for globals. If the +target supports it, it will emit globals to the section specified. + +An explicit alignment may be specified for a global, which must be a +power of 2. If not present, or if the alignment is set to zero, the +alignment of the global is set by the target to whatever it feels +convenient. If an explicit alignment is specified, the global is forced +to have exactly that alignment. Targets and optimizers are not allowed +to over-align the global if the global has an assigned section. 
In this +case, the extra alignment could be observable: for example, code could +assume that the globals are densely packed in their section and try to +iterate over them as an array, alignment padding would break this +iteration. + +For example, the following defines a global in a numbered address space +with an initializer, section, and alignment: + +.. code-block:: llvm + + @G = addrspace(5) constant float 1.0, section "foo", align 4 + +The following example defines a thread-local global with the +``initialexec`` TLS model: + +.. code-block:: llvm + + @G = thread_local(initialexec) global i32 0, align 4 + +.. _functionstructure: + +Functions +--------- + +LLVM function definitions consist of the "``define``" keyword, an +optional :ref:`linkage type <linkage>`, an optional :ref:`visibility +style <visibility>`, an optional :ref:`calling convention <callingconv>`, +an optional ``unnamed_addr`` attribute, a return type, an optional +:ref:`parameter attribute <paramattrs>` for the return type, a function +name, a (possibly empty) argument list (each with optional :ref:`parameter +attributes <paramattrs>`), optional :ref:`function attributes <fnattrs>`, +an optional section, an optional alignment, an optional :ref:`garbage +collector name <gc>`, an opening curly brace, a list of basic blocks, +and a closing curly brace. + +LLVM function declarations consist of the "``declare``" keyword, an +optional :ref:`linkage type <linkage>`, an optional :ref:`visibility +style <visibility>`, an optional :ref:`calling convention <callingconv>`, +an optional ``unnamed_addr`` attribute, a return type, an optional +:ref:`parameter attribute <paramattrs>` for the return type, a function +name, a possibly empty list of arguments, an optional alignment, and an +optional :ref:`garbage collector name <gc>`. + +A function definition contains a list of basic blocks, forming the CFG +(Control Flow Graph) for the function. Each basic block may optionally +start with a label (giving the basic block a symbol table entry), +contains a list of instructions, and ends with a +:ref:`terminator <terminators>` instruction (such as a branch or function +return). + +The first basic block in a function is special in two ways: it is +immediately executed on entrance to the function, and it is not allowed +to have predecessor basic blocks (i.e. there can not be any branches to +the entry block of a function). Because the block can have no +predecessors, it also cannot have any :ref:`PHI nodes <i_phi>`. + +LLVM allows an explicit section to be specified for functions. If the +target supports it, it will emit functions to the section specified. + +An explicit alignment may be specified for a function. If not present, +or if the alignment is set to zero, the alignment of the function is set +by the target to whatever it feels convenient. If an explicit alignment +is specified, the function is forced to have at least that much +alignment. All alignments must be a power of 2. + +If the ``unnamed_addr`` attribute is given, the address is know to not +be significant and two identical functions can be merged. + +Syntax:: + + define [linkage] [visibility] + [cconv] [ret attrs] + <ResultType> @<FunctionName> ([argument list]) + [fn Attrs] [section "name"] [align N] + [gc] { ... } + +Aliases +------- + +Aliases act as "second name" for the aliasee value (which can be either +function, global variable, another alias or bitcast of global value). 
+Aliases may have an optional :ref:`linkage type <linkage>`, and an optional +:ref:`visibility style <visibility>`. + +Syntax:: + + @<Name> = alias [Linkage] [Visibility] <AliaseeTy> @<Aliasee> + +.. _namedmetadatastructure: + +Named Metadata +-------------- + +Named metadata is a collection of metadata. :ref:`Metadata +nodes <metadata>` (but not metadata strings) are the only valid +operands for a named metadata. + +Syntax:: + + ; Some unnamed metadata nodes, which are referenced by the named metadata. + !0 = metadata !{metadata !"zero"} + !1 = metadata !{metadata !"one"} + !2 = metadata !{metadata !"two"} + ; A named metadata. + !name = !{!0, !1, !2} + +.. _paramattrs: + +Parameter Attributes +-------------------- + +The return type and each parameter of a function type may have a set of +*parameter attributes* associated with them. Parameter attributes are +used to communicate additional information about the result or +parameters of a function. Parameter attributes are considered to be part +of the function, not of the function type, so functions with different +parameter attributes can have the same function type. + +Parameter attributes are simple keywords that follow the type specified. +If multiple parameter attributes are needed, they are space separated. +For example: + +.. code-block:: llvm + + declare i32 @printf(i8* noalias nocapture, ...) + declare i32 @atoi(i8 zeroext) + declare signext i8 @returns_signed_char() + +Note that any attributes for the function result (``nounwind``, +``readonly``) come immediately after the argument list. + +Currently, only the following parameter attributes are defined: + +``zeroext`` + This indicates to the code generator that the parameter or return + value should be zero-extended to the extent required by the target's + ABI (which is usually 32-bits, but is 8-bits for a i1 on x86-64) by + the caller (for a parameter) or the callee (for a return value). +``signext`` + This indicates to the code generator that the parameter or return + value should be sign-extended to the extent required by the target's + ABI (which is usually 32-bits) by the caller (for a parameter) or + the callee (for a return value). +``inreg`` + This indicates that this parameter or return value should be treated + in a special target-dependent fashion during while emitting code for + a function call or return (usually, by putting it in a register as + opposed to memory, though some targets use it to distinguish between + two different kinds of registers). Use of this attribute is + target-specific. +``byval`` + This indicates that the pointer parameter should really be passed by + value to the function. The attribute implies that a hidden copy of + the pointee is made between the caller and the callee, so the callee + is unable to modify the value in the caller. This attribute is only + valid on LLVM pointer arguments. It is generally used to pass + structs and arrays by value, but is also valid on pointers to + scalars. The copy is considered to belong to the caller not the + callee (for example, ``readonly`` functions should not write to + ``byval`` parameters). This is not a valid attribute for return + values. + + The byval attribute also supports specifying an alignment with the + align attribute. It indicates the alignment of the stack slot to + form and the known alignment of the pointer specified to the call + site. If the alignment is not specified, then the code generator + makes a target-specific assumption. 
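+
+    A minimal sketch of ``byval`` with an explicit alignment; the struct type
+    and the function name are hypothetical:
+
+    .. code-block:: llvm
+
+        %struct.Point = type { i32, i32 }
+
+        declare void @takes_point(%struct.Point* byval align 8)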
+ +``sret`` + This indicates that the pointer parameter specifies the address of a + structure that is the return value of the function in the source + program. This pointer must be guaranteed by the caller to be valid: + loads and stores to the structure may be assumed by the callee to + not to trap and to be properly aligned. This may only be applied to + the first parameter. This is not a valid attribute for return + values. +``noalias`` + This indicates that pointer values `*based* <pointeraliasing>` on + the argument or return value do not alias pointer values which are + not *based* on it, ignoring certain "irrelevant" dependencies. For a + call to the parent function, dependencies between memory references + from before or after the call and from those during the call are + "irrelevant" to the ``noalias`` keyword for the arguments and return + value used in that call. The caller shares the responsibility with + the callee for ensuring that these requirements are met. For further + details, please see the discussion of the NoAlias response in `alias + analysis <AliasAnalysis.html#MustMayNo>`_. + + Note that this definition of ``noalias`` is intentionally similar + to the definition of ``restrict`` in C99 for function arguments, + though it is slightly weaker. + + For function return values, C99's ``restrict`` is not meaningful, + while LLVM's ``noalias`` is. +``nocapture`` + This indicates that the callee does not make any copies of the + pointer that outlive the callee itself. This is not a valid + attribute for return values. + +.. _nest: + +``nest`` + This indicates that the pointer parameter can be excised using the + :ref:`trampoline intrinsics <int_trampoline>`. This is not a valid + attribute for return values. + +.. _gc: + +Garbage Collector Names +----------------------- + +Each function may specify a garbage collector name, which is simply a +string: + +.. code-block:: llvm + + define void @f() gc "name" { ... } + +The compiler declares the supported values of *name*. Specifying a +collector which will cause the compiler to alter its output in order to +support the named garbage collection algorithm. + +.. _fnattrs: + +Function Attributes +------------------- + +Function attributes are set to communicate additional information about +a function. Function attributes are considered to be part of the +function, not of the function type, so functions with different function +attributes can have the same function type. + +Function attributes are simple keywords that follow the type specified. +If multiple attributes are needed, they are space separated. For +example: + +.. code-block:: llvm + + define void @f() noinline { ... } + define void @f() alwaysinline { ... } + define void @f() alwaysinline optsize { ... } + define void @f() optsize { ... } + +``address_safety`` + This attribute indicates that the address safety analysis is enabled + for this function. +``alignstack(<n>)`` + This attribute indicates that, when emitting the prologue and + epilogue, the backend should forcibly align the stack pointer. + Specify the desired alignment, which must be a power of two, in + parentheses. +``alwaysinline`` + This attribute indicates that the inliner should attempt to inline + this function into callers whenever possible, ignoring any active + inlining size threshold for this caller. +``nonlazybind`` + This attribute suppresses lazy symbol binding for the function. 
This + may make calls to the function faster, at the cost of extra program + startup time if the function is not called during program startup. +``inlinehint`` + This attribute indicates that the source code contained a hint that + inlining this function is desirable (such as the "inline" keyword in + C/C++). It is just a hint; it imposes no requirements on the + inliner. +``naked`` + This attribute disables prologue / epilogue emission for the + function. This can have very system-specific consequences. +``noimplicitfloat`` + This attributes disables implicit floating point instructions. +``noinline`` + This attribute indicates that the inliner should never inline this + function in any situation. This attribute may not be used together + with the ``alwaysinline`` attribute. +``noredzone`` + This attribute indicates that the code generator should not use a + red zone, even if the target-specific ABI normally permits it. +``noreturn`` + This function attribute indicates that the function never returns + normally. This produces undefined behavior at runtime if the + function ever does dynamically return. +``nounwind`` + This function attribute indicates that the function never returns + with an unwind or exceptional control flow. If the function does + unwind, its runtime behavior is undefined. +``optsize`` + This attribute suggests that optimization passes and code generator + passes make choices that keep the code size of this function low, + and otherwise do optimizations specifically to reduce code size. +``readnone`` + This attribute indicates that the function computes its result (or + decides to unwind an exception) based strictly on its arguments, + without dereferencing any pointer arguments or otherwise accessing + any mutable state (e.g. memory, control registers, etc) visible to + caller functions. It does not write through any pointer arguments + (including ``byval`` arguments) and never changes any state visible + to callers. This means that it cannot unwind exceptions by calling + the ``C++`` exception throwing methods. +``readonly`` + This attribute indicates that the function does not write through + any pointer arguments (including ``byval`` arguments) or otherwise + modify any state (e.g. memory, control registers, etc) visible to + caller functions. It may dereference pointer arguments and read + state that may be set in the caller. A readonly function always + returns the same value (or unwinds an exception identically) when + called with the same set of arguments and global state. It cannot + unwind an exception by calling the ``C++`` exception throwing + methods. +``returns_twice`` + This attribute indicates that this function can return twice. The C + ``setjmp`` is an example of such a function. The compiler disables + some optimizations (like tail calls) in the caller of these + functions. +``ssp`` + This attribute indicates that the function should emit a stack + smashing protector. It is in the form of a "canary"—a random value + placed on the stack before the local variables that's checked upon + return from the function to see if it has been overwritten. A + heuristic is used to determine if a function needs stack protectors + or not. + + If a function that has an ``ssp`` attribute is inlined into a + function that doesn't have an ``ssp`` attribute, then the resulting + function will have an ``ssp`` attribute. +``sspreq`` + This attribute indicates that the function should *always* emit a + stack smashing protector. 
This overrides the ``ssp`` function + attribute. + + If a function that has an ``sspreq`` attribute is inlined into a + function that doesn't have an ``sspreq`` attribute or which has an + ``ssp`` attribute, then the resulting function will have an + ``sspreq`` attribute. +``uwtable`` + This attribute indicates that the ABI being targeted requires that + an unwind table entry be produce for this function even if we can + show that no exceptions passes by it. This is normally the case for + the ELF x86-64 abi, but it can be disabled for some compilation + units. + +.. _moduleasm: + +Module-Level Inline Assembly +---------------------------- + +Modules may contain "module-level inline asm" blocks, which corresponds +to the GCC "file scope inline asm" blocks. These blocks are internally +concatenated by LLVM and treated as a single unit, but may be separated +in the ``.ll`` file if desired. The syntax is very simple: + +.. code-block:: llvm + + module asm "inline asm code goes here" + module asm "more can go here" + +The strings can contain any character by escaping non-printable +characters. The escape sequence used is simply "\\xx" where "xx" is the +two digit hex code for the number. + +The inline asm code is simply printed to the machine code .s file when +assembly code is generated. + +Data Layout +----------- + +A module may specify a target specific data layout string that specifies +how data is to be laid out in memory. The syntax for the data layout is +simply: + +.. code-block:: llvm + + target datalayout = "layout specification" + +The *layout specification* consists of a list of specifications +separated by the minus sign character ('-'). Each specification starts +with a letter and may include other information after the letter to +define some aspect of the data layout. The specifications accepted are +as follows: + +``E`` + Specifies that the target lays out data in big-endian form. That is, + the bits with the most significance have the lowest address + location. +``e`` + Specifies that the target lays out data in little-endian form. That + is, the bits with the least significance have the lowest address + location. +``S<size>`` + Specifies the natural alignment of the stack in bits. Alignment + promotion of stack variables is limited to the natural stack + alignment to avoid dynamic stack realignment. The stack alignment + must be a multiple of 8-bits. If omitted, the natural stack + alignment defaults to "unspecified", which does not prevent any + alignment promotions. +``p[n]:<size>:<abi>:<pref>`` + This specifies the *size* of a pointer and its ``<abi>`` and + ``<pref>``\erred alignments for address space ``n``. All sizes are in + bits. Specifying the ``<pref>`` alignment is optional. If omitted, the + preceding ``:`` should be omitted too. The address space, ``n`` is + optional, and if not specified, denotes the default address space 0. + The value of ``n`` must be in the range [1,2^23). +``i<size>:<abi>:<pref>`` + This specifies the alignment for an integer type of a given bit + ``<size>``. The value of ``<size>`` must be in the range [1,2^23). +``v<size>:<abi>:<pref>`` + This specifies the alignment for a vector type of a given bit + ``<size>``. +``f<size>:<abi>:<pref>`` + This specifies the alignment for a floating point type of a given bit + ``<size>``. Only values of ``<size>`` that are supported by the target + will work. 32 (float) and 64 (double) are supported on all targets; 80 + or 128 (different flavors of long double) are also supported on some + targets. 
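+
+    As a purely illustrative sketch of how the entries introduced so far can
+    be combined into one layout string (not the exact string used by any
+    particular target):
+
+    .. code-block:: llvm
+
+        target datalayout = "e-S128-p:64:64:64-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v128:128:128"
+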
+``a<size>:<abi>:<pref>`` + This specifies the alignment for an aggregate type of a given bit + ``<size>``. +``s<size>:<abi>:<pref>`` + This specifies the alignment for a stack object of a given bit + ``<size>``. +``n<size1>:<size2>:<size3>...`` + This specifies a set of native integer widths for the target CPU in + bits. For example, it might contain ``n32`` for 32-bit PowerPC, + ``n32:64`` for PowerPC 64, or ``n8:16:32:64`` for X86-64. Elements of + this set are considered to support most general arithmetic operations + efficiently. + +When constructing the data layout for a given target, LLVM starts with a +default set of specifications which are then (possibly) overridden by +the specifications in the ``datalayout`` keyword. The default +specifications are given in this list: + +- ``E`` - big endian +- ``p:64:64:64`` - 64-bit pointers with 64-bit alignment +- ``p1:32:32:32`` - 32-bit pointers with 32-bit alignment for address + space 1 +- ``p2:16:32:32`` - 16-bit pointers with 32-bit alignment for address + space 2 +- ``i1:8:8`` - i1 is 8-bit (byte) aligned +- ``i8:8:8`` - i8 is 8-bit (byte) aligned +- ``i16:16:16`` - i16 is 16-bit aligned +- ``i32:32:32`` - i32 is 32-bit aligned +- ``i64:32:64`` - i64 has ABI alignment of 32-bits but preferred + alignment of 64-bits +- ``f32:32:32`` - float is 32-bit aligned +- ``f64:64:64`` - double is 64-bit aligned +- ``v64:64:64`` - 64-bit vector is 64-bit aligned +- ``v128:128:128`` - 128-bit vector is 128-bit aligned +- ``a0:0:1`` - aggregates are 8-bit aligned +- ``s0:64:64`` - stack objects are 64-bit aligned + +When LLVM is determining the alignment for a given type, it uses the +following rules: + +#. If the type sought is an exact match for one of the specifications, + that specification is used. +#. If no match is found, and the type sought is an integer type, then + the smallest integer type that is larger than the bitwidth of the + sought type is used. If none of the specifications are larger than + the bitwidth then the largest integer type is used. For example, + given the default specifications above, the i7 type will use the + alignment of i8 (next largest) while both i65 and i256 will use the + alignment of i64 (largest specified). +#. If no match is found, and the type sought is a vector type, then the + largest vector type that is smaller than the sought vector type will + be used as a fall back. This happens because <128 x double> can be + implemented in terms of 64 <2 x double>, for example. + +The function of the data layout string may not be what you expect. +Notably, this is not a specification from the frontend of what alignment +the code generator should use. + +Instead, if specified, the target data layout is required to match what +the ultimate *code generator* expects. This string is used by the +mid-level optimizers to improve code, and this only works if it matches +what the ultimate code generator uses. If you would like to generate IR +that does not embed this target-specific detail into the IR, then you +don't have to specify the string. This will disable some optimizations +that require precise layout information, but this also prevents those +optimizations from introducing target specificity into the IR. + +.. _pointeraliasing: + +Pointer Aliasing Rules +---------------------- + +Any memory access must be done through a pointer value associated with +an address range of the memory access, otherwise the behavior is +undefined. 
Pointer values are associated with address ranges according +to the following rules: + +- A pointer value is associated with the addresses associated with any + value it is *based* on. +- An address of a global variable is associated with the address range + of the variable's storage. +- The result value of an allocation instruction is associated with the + address range of the allocated storage. +- A null pointer in the default address-space is associated with no + address. +- An integer constant other than zero or a pointer value returned from + a function not defined within LLVM may be associated with address + ranges allocated through mechanisms other than those provided by + LLVM. Such ranges shall not overlap with any ranges of addresses + allocated by mechanisms provided by LLVM. + +A pointer value is *based* on another pointer value according to the +following rules: + +- A pointer value formed from a ``getelementptr`` operation is *based* + on the first operand of the ``getelementptr``. +- The result value of a ``bitcast`` is *based* on the operand of the + ``bitcast``. +- A pointer value formed by an ``inttoptr`` is *based* on all pointer + values that contribute (directly or indirectly) to the computation of + the pointer's value. +- The "*based* on" relationship is transitive. + +Note that this definition of *"based"* is intentionally similar to the +definition of *"based"* in C99, though it is slightly weaker. + +LLVM IR does not associate types with memory. The result type of a +``load`` merely indicates the size and alignment of the memory from +which to load, as well as the interpretation of the value. The first +operand type of a ``store`` similarly only indicates the size and +alignment of the store. + +Consequently, type-based alias analysis, aka TBAA, aka +``-fstrict-aliasing``, is not applicable to general unadorned LLVM IR. +:ref:`Metadata <metadata>` may be used to encode additional information +which specialized optimization passes may use to implement type-based +alias analysis. + +.. _volatile: + +Volatile Memory Accesses +------------------------ + +Certain memory accesses, such as :ref:`load <i_load>`'s, +:ref:`store <i_store>`'s, and :ref:`llvm.memcpy <int_memcpy>`'s may be +marked ``volatile``. The optimizers must not change the number of +volatile operations or change their order of execution relative to other +volatile operations. The optimizers *may* change the order of volatile +operations relative to non-volatile operations. This is not Java's +"volatile" and has no cross-thread synchronization behavior. + +.. _memmodel: + +Memory Model for Concurrent Operations +-------------------------------------- + +The LLVM IR does not define any way to start parallel threads of +execution or to register signal handlers. Nonetheless, there are +platform-specific ways to create them, and we define LLVM IR's behavior +in their presence. This model is inspired by the C++0x memory model. + +For a more informal introduction to this model, see the :doc:`Atomics`. + +We define a *happens-before* partial order as the least partial order +that + +- Is a superset of single-thread program order, and +- When a *synchronizes-with* ``b``, includes an edge from ``a`` to + ``b``. *Synchronizes-with* pairs are introduced by platform-specific + techniques, like pthread locks, thread creation, thread joining, + etc., and by atomic instructions. (See also :ref:`Atomic Memory Ordering + Constraints <ordering>`). 
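+
+As an illustration of how a *synchronizes-with* edge can arise from atomic
+instructions, here is a minimal sketch; the globals and function names are
+hypothetical:
+
+.. code-block:: llvm
+
+    @data = global i32 0
+    @flag = global i32 0
+
+    define void @producer() {
+      store i32 42, i32* @data                        ; ordinary store
+      store atomic i32 1, i32* @flag release, align 4 ; publish the data
+      ret void
+    }
+
+    define i32 @consumer() {
+      %f = load atomic i32* @flag acquire, align 4
+      ; if %f is 1, this acquire load synchronizes-with the release store
+      ; above, so the ordinary load below must observe the 42 written by
+      ; @producer
+      %d = load i32* @data
+      ret i32 %d
+    }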
+ +Note that program order does not introduce *happens-before* edges +between a thread and signals executing inside that thread. + +Every (defined) read operation (load instructions, memcpy, atomic +loads/read-modify-writes, etc.) R reads a series of bytes written by +(defined) write operations (store instructions, atomic +stores/read-modify-writes, memcpy, etc.). For the purposes of this +section, initialized globals are considered to have a write of the +initializer which is atomic and happens before any other read or write +of the memory in question. For each byte of a read R, R\ :sub:`byte` +may see any write to the same byte, except: + +- If write\ :sub:`1` happens before write\ :sub:`2`, and + write\ :sub:`2` happens before R\ :sub:`byte`, then + R\ :sub:`byte` does not see write\ :sub:`1`. +- If R\ :sub:`byte` happens before write\ :sub:`3`, then + R\ :sub:`byte` does not see write\ :sub:`3`. + +Given that definition, R\ :sub:`byte` is defined as follows: + +- If R is volatile, the result is target-dependent. (Volatile is + supposed to give guarantees which can support ``sig_atomic_t`` in + C/C++, and may be used for accesses to addresses which do not behave + like normal memory. It does not generally provide cross-thread + synchronization.) +- Otherwise, if there is no write to the same byte that happens before + R\ :sub:`byte`, R\ :sub:`byte` returns ``undef`` for that byte. +- Otherwise, if R\ :sub:`byte` may see exactly one write, + R\ :sub:`byte` returns the value written by that write. +- Otherwise, if R is atomic, and all the writes R\ :sub:`byte` may + see are atomic, it chooses one of the values written. See the :ref:`Atomic + Memory Ordering Constraints <ordering>` section for additional + constraints on how the choice is made. +- Otherwise R\ :sub:`byte` returns ``undef``. + +R returns the value composed of the series of bytes it read. This +implies that some bytes within the value may be ``undef`` **without** +the entire value being ``undef``. Note that this only defines the +semantics of the operation; it doesn't mean that targets will emit more +than one instruction to read the series of bytes. + +Note that in cases where none of the atomic intrinsics are used, this +model places only one restriction on IR transformations on top of what +is required for single-threaded execution: introducing a store to a byte +which might not otherwise be stored is not allowed in general. +(Specifically, in the case where another thread might write to and read +from an address, introducing a store can change a load that may see +exactly one write into a load that may see multiple writes.) + +.. _ordering: + +Atomic Memory Ordering Constraints +---------------------------------- + +Atomic instructions (:ref:`cmpxchg <i_cmpxchg>`, +:ref:`atomicrmw <i_atomicrmw>`, :ref:`fence <i_fence>`, +:ref:`atomic load <i_load>`, and :ref:`atomic store <i_store>`) take +an ordering parameter that determines which other atomic instructions on +the same address they *synchronize with*. These semantics are borrowed +from Java and C++0x, but are somewhat more colloquial. If these +descriptions aren't precise enough, check those specs (see spec +references in the :doc:`atomics guide <Atomics>`). +:ref:`fence <i_fence>` instructions treat these orderings somewhat +differently since they don't take an address. See that instruction's +documentation for details. + +For a simpler introduction to the ordering constraints, see the +:doc:`Atomics`. 
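+
+Syntactically, the ordering keyword is written at the end of the atomic
+instruction. The following fragments are a hedged sketch of where it appears;
+the pointer operands are hypothetical and shown out of context:
+
+.. code-block:: llvm
+
+    %old = atomicrmw add i32* %counter, i32 1 seq_cst
+    %val = load atomic i32* %ptr acquire, align 4
+    store atomic i32 %val, i32* %other release, align 4
+    fence seq_cst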
+ +``unordered`` + The set of values that can be read is governed by the happens-before + partial order. A value cannot be read unless some operation wrote + it. This is intended to provide a guarantee strong enough to model + Java's non-volatile shared variables. This ordering cannot be + specified for read-modify-write operations; it is not strong enough + to make them atomic in any interesting way. +``monotonic`` + In addition to the guarantees of ``unordered``, there is a single + total order for modifications by ``monotonic`` operations on each + address. All modification orders must be compatible with the + happens-before order. There is no guarantee that the modification + orders can be combined to a global total order for the whole program + (and this often will not be possible). The read in an atomic + read-modify-write operation (:ref:`cmpxchg <i_cmpxchg>` and + :ref:`atomicrmw <i_atomicrmw>`) reads the value in the modification + order immediately before the value it writes. If one atomic read + happens before another atomic read of the same address, the later + read must see the same value or a later value in the address's + modification order. This disallows reordering of ``monotonic`` (or + stronger) operations on the same address. If an address is written + ``monotonic``-ally by one thread, and other threads ``monotonic``-ally + read that address repeatedly, the other threads must eventually see + the write. This corresponds to the C++0x/C1x + ``memory_order_relaxed``. +``acquire`` + In addition to the guarantees of ``monotonic``, a + *synchronizes-with* edge may be formed with a ``release`` operation. + This is intended to model C++'s ``memory_order_acquire``. +``release`` + In addition to the guarantees of ``monotonic``, if this operation + writes a value which is subsequently read by an ``acquire`` + operation, it *synchronizes-with* that operation. (This isn't a + complete description; see the C++0x definition of a release + sequence.) This corresponds to the C++0x/C1x + ``memory_order_release``. +``acq_rel`` (acquire+release) + Acts as both an ``acquire`` and ``release`` operation on its + address. This corresponds to the C++0x/C1x ``memory_order_acq_rel``. +``seq_cst`` (sequentially consistent) + In addition to the guarantees of ``acq_rel`` (``acquire`` for an + operation which only reads, ``release`` for an operation which only + writes), there is a global total order on all + sequentially-consistent operations on all addresses, which is + consistent with the *happens-before* partial order and with the + modification orders of all the affected addresses. Each + sequentially-consistent read sees the last preceding write to the + same address in this global order. This corresponds to the C++0x/C1x + ``memory_order_seq_cst`` and Java volatile. + +.. _singlethread: + +If an atomic operation is marked ``singlethread``, it only *synchronizes +with* or participates in modification and seq\_cst total orderings with +other operations running in the same thread (for example, in signal +handlers). + +.. _fastmath: + +Fast-Math Flags +--------------- + +LLVM IR floating-point binary ops (:ref:`fadd <i_fadd>`, +:ref:`fsub <i_fsub>`, :ref:`fmul <i_fmul>`, :ref:`fdiv <i_fdiv>`, +:ref:`frem <i_frem>`) have the following flags that can be set to enable +otherwise unsafe floating point operations: + +``nnan`` + No NaNs - Allow optimizations to assume the arguments and result are not + NaN.
Such optimizations are required to retain defined behavior over + NaNs, but the value of the result is undefined. + +``ninf`` + No Infs - Allow optimizations to assume the arguments and result are not + +/-Inf. Such optimizations are required to retain defined behavior over + +/-Inf, but the value of the result is undefined. + +``nsz`` + No Signed Zeros - Allow optimizations to treat the sign of a zero + argument or result as insignificant. + +``arcp`` + Allow Reciprocal - Allow optimizations to use the reciprocal of an + argument rather than perform division. + +``fast`` + Fast - Allow algebraically equivalent transformations that may + dramatically change results in floating point (e.g. reassociate). This + flag implies all the others. + +.. _typesystem: + +Type System +=========== + +The LLVM type system is one of the most important features of the +intermediate representation. Being typed enables a number of +optimizations to be performed on the intermediate representation +directly, without having to do extra analyses on the side before the +transformation. A strong type system makes it easier to read the +generated code and enables novel analyses and transformations that are +not feasible to perform on normal three address code representations. + +Type Classifications +-------------------- + +The types fall into a few useful classifications: + + +.. list-table:: + :header-rows: 1 + + * - Classification + - Types + + * - :ref:`integer <t_integer>` + - ``i1``, ``i2``, ``i3``, ... ``i8``, ... ``i16``, ... ``i32``, ... + ``i64``, ... + + * - :ref:`floating point <t_floating>` + - ``half``, ``float``, ``double``, ``x86_fp80``, ``fp128``, + ``ppc_fp128`` + + + * - first class + + .. _t_firstclass: + + - :ref:`integer <t_integer>`, :ref:`floating point <t_floating>`, + :ref:`pointer <t_pointer>`, :ref:`vector <t_vector>`, + :ref:`structure <t_struct>`, :ref:`array <t_array>`, + :ref:`label <t_label>`, :ref:`metadata <t_metadata>`. + + * - :ref:`primitive <t_primitive>` + - :ref:`label <t_label>`, + :ref:`void <t_void>`, + :ref:`integer <t_integer>`, + :ref:`floating point <t_floating>`, + :ref:`x86mmx <t_x86mmx>`, + :ref:`metadata <t_metadata>`. + + * - :ref:`derived <t_derived>` + - :ref:`array <t_array>`, + :ref:`function <t_function>`, + :ref:`pointer <t_pointer>`, + :ref:`structure <t_struct>`, + :ref:`vector <t_vector>`, + :ref:`opaque <t_opaque>`. + +The :ref:`first class <t_firstclass>` types are perhaps the most important. +Values of these types are the only ones which can be produced by +instructions. + +.. _t_primitive: + +Primitive Types +--------------- + +The primitive types are the fundamental building blocks of the LLVM +system. + +.. _t_integer: + +Integer Type +^^^^^^^^^^^^ + +Overview: +""""""""" + +The integer type is a very simple type that simply specifies an +arbitrary bit width for the integer type desired. Any bit width from 1 +bit to 2\ :sup:`23`\ -1 (about 8 million) can be specified. + +Syntax: +""""""" + +:: + + iN + +The number of bits the integer will occupy is specified by the ``N`` +value. + +Examples: +""""""""" + ++----------------+------------------------------------------------+ +| ``i1`` | a single-bit integer. | ++----------------+------------------------------------------------+ +| ``i32`` | a 32-bit integer. | ++----------------+------------------------------------------------+ +| ``i1942652`` | a really big integer of over 1 million bits. | ++----------------+------------------------------------------------+ + +.. 
_t_floating: + +Floating Point Types +^^^^^^^^^^^^^^^^^^^^ + +.. list-table:: + :header-rows: 1 + + * - Type + - Description + + * - ``half`` + - 16-bit floating point value + + * - ``float`` + - 32-bit floating point value + + * - ``double`` + - 64-bit floating point value + + * - ``fp128`` + - 128-bit floating point value (112-bit mantissa) + + * - ``x86_fp80`` + - 80-bit floating point value (X87) + + * - ``ppc_fp128`` + - 128-bit floating point value (two 64-bits) + +.. _t_x86mmx: + +X86mmx Type +^^^^^^^^^^^ + +Overview: +""""""""" + +The x86mmx type represents a value held in an MMX register on an x86 +machine. The operations allowed on it are quite limited: parameters and +return values, load and store, and bitcast. User-specified MMX +instructions are represented as intrinsic or asm calls with arguments +and/or results of this type. There are no arrays, vectors or constants +of this type. + +Syntax: +""""""" + +:: + + x86mmx + +.. _t_void: + +Void Type +^^^^^^^^^ + +Overview: +""""""""" + +The void type does not represent any value and has no size. + +Syntax: +""""""" + +:: + + void + +.. _t_label: + +Label Type +^^^^^^^^^^ + +Overview: +""""""""" + +The label type represents code labels. + +Syntax: +""""""" + +:: + + label + +.. _t_metadata: + +Metadata Type +^^^^^^^^^^^^^ + +Overview: +""""""""" + +The metadata type represents embedded metadata. No derived types may be +created from metadata except for :ref:`function <t_function>` arguments. + +Syntax: +""""""" + +:: + + metadata + +.. _t_derived: + +Derived Types +------------- + +The real power in LLVM comes from the derived types in the system. This +is what allows a programmer to represent arrays, functions, pointers, +and other useful types. Each of these types contain one or more element +types which may be a primitive type, or another derived type. For +example, it is possible to have a two dimensional array, using an array +as the element type of another array. + +.. _t_aggregate: + +Aggregate Types +^^^^^^^^^^^^^^^ + +Aggregate Types are a subset of derived types that can contain multiple +member types. :ref:`Arrays <t_array>` and :ref:`structs <t_struct>` are +aggregate types. :ref:`Vectors <t_vector>` are not considered to be +aggregate types. + +.. _t_array: + +Array Type +^^^^^^^^^^ + +Overview: +""""""""" + +The array type is a very simple derived type that arranges elements +sequentially in memory. The array type requires a size (number of +elements) and an underlying data type. + +Syntax: +""""""" + +:: + + [<# elements> x <elementtype>] + +The number of elements is a constant integer value; ``elementtype`` may +be any type with a size. + +Examples: +""""""""" + ++------------------+--------------------------------------+ +| ``[40 x i32]`` | Array of 40 32-bit integer values. | ++------------------+--------------------------------------+ +| ``[41 x i32]`` | Array of 41 32-bit integer values. | ++------------------+--------------------------------------+ +| ``[4 x i8]`` | Array of 4 8-bit integer values. | ++------------------+--------------------------------------+ + +Here are some examples of multidimensional arrays: + ++-----------------------------+----------------------------------------------------------+ +| ``[3 x [4 x i32]]`` | 3x4 array of 32-bit integer values. | ++-----------------------------+----------------------------------------------------------+ +| ``[12 x [10 x float]]`` | 12x10 array of single precision floating point values. 
| ++-----------------------------+----------------------------------------------------------+ +| ``[2 x [3 x [4 x i16]]]`` | 2x3x4 array of 16-bit integer values. | ++-----------------------------+----------------------------------------------------------+ + +There is no restriction on indexing beyond the end of the array implied +by a static type (though there are restrictions on indexing beyond the +bounds of an allocated object in some cases). This means that +single-dimension 'variable sized array' addressing can be implemented in +LLVM with a zero length array type. An implementation of 'pascal style +arrays' in LLVM could use the type "``{ i32, [0 x float]}``", for +example. + +.. _t_function: + +Function Type +^^^^^^^^^^^^^ + +Overview: +""""""""" + +The function type can be thought of as a function signature. It consists +of a return type and a list of formal parameter types. The return type +of a function type is a first class type or a void type. + +Syntax: +""""""" + +:: + + <returntype> (<parameter list>) + +...where '``<parameter list>``' is a comma-separated list of type +specifiers. Optionally, the parameter list may include a type ``...``, +which indicates that the function takes a variable number of arguments. +Variable argument functions can access their arguments with the +:ref:`variable argument handling intrinsic <int_varargs>` functions. +'``<returntype>``' is any type except :ref:`label <t_label>`. + +Examples: +""""""""" + ++---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| ``i32 (i32)`` | function taking an ``i32``, returning an ``i32`` | ++---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| ``float (i16, i32 *) *`` | :ref:`Pointer <t_pointer>` to a function that takes an ``i16`` and a :ref:`pointer <t_pointer>` to ``i32``, returning ``float``. | ++---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| ``i32 (i8*, ...)`` | A vararg function that takes at least one :ref:`pointer <t_pointer>` to ``i8`` (char in C), which returns an integer. This is the signature for ``printf`` in LLVM. | ++---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| ``{i32, i32} (i32)`` | A function taking an ``i32``, returning a :ref:`structure <t_struct>` containing two ``i32`` values | ++---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +.. _t_struct: + +Structure Type +^^^^^^^^^^^^^^ + +Overview: +""""""""" + +The structure type is used to represent a collection of data members +together in memory. The elements of a structure may be any type that has +a size. + +Structures in memory are accessed using '``load``' and '``store``' by +getting a pointer to a field with the '``getelementptr``' instruction. +Structures in registers are accessed using the '``extractvalue``' and +'``insertvalue``' instructions. 
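+
+For illustration, the following non-normative sketch (the names ``%pair``, ``@p``, and the two functions are hypothetical) contrasts access to a structure through memory with access to a structure value held in a register:
+
+.. code-block:: llvm
+
+    %pair = type { i32, float }
+    @p = global %pair zeroinitializer
+
+    define float @in_memory() {
+      %addr = getelementptr %pair* @p, i32 0, i32 1   ; address of the float field
+      %val = load float* %addr                        ; load the field from memory
+      ret float %val
+    }
+
+    define i32 @in_register(%pair %agg) {
+      %updated = insertvalue %pair %agg, i32 7, 0     ; new struct value, field 0 = 7
+      %x = extractvalue %pair %updated, 0             ; read field 0 (yields 7)
+      ret i32 %x
+    }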
+ +Structures may optionally be "packed" structures, which indicate that +the alignment of the struct is one byte, and that there is no padding +between the elements. In non-packed structs, padding between field types +is inserted as defined by the DataLayout string in the module, which is +required to match what the underlying code generator expects. + +Structures can either be "literal" or "identified". A literal structure +is defined inline with other types (e.g. ``{i32, i32}*``) whereas +identified types are always defined at the top level with a name. +Literal types are uniqued by their contents and can never be recursive +or opaque since there is no way to write one. Identified types can be +recursive, can be opaqued, and are never uniqued. + +Syntax: +""""""" + +:: + + %T1 = type { <type list> } ; Identified normal struct type + %T2 = type <{ <type list> }> ; Identified packed struct type + +Examples: +""""""""" + ++------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| ``{ i32, i32, i32 }`` | A triple of three ``i32`` values | ++------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| ``{ float, i32 (i32) * }`` | A pair, where the first element is a ``float`` and the second element is a :ref:`pointer <t_pointer>` to a :ref:`function <t_function>` that takes an ``i32``, returning an ``i32``. | ++------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| ``<{ i8, i32 }>`` | A packed struct known to be 5 bytes in size. | ++------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +.. _t_opaque: + +Opaque Structure Types +^^^^^^^^^^^^^^^^^^^^^^ + +Overview: +""""""""" + +Opaque structure types are used to represent named structure types that +do not have a body specified. This corresponds (for example) to the C +notion of a forward declared structure. + +Syntax: +""""""" + +:: + + %X = type opaque + %52 = type opaque + +Examples: +""""""""" + ++--------------+-------------------+ +| ``opaque`` | An opaque type. | ++--------------+-------------------+ + +.. _t_pointer: + +Pointer Type +^^^^^^^^^^^^ + +Overview: +""""""""" + +The pointer type is used to specify memory locations. Pointers are +commonly used to reference objects in memory. + +Pointer types may have an optional address space attribute defining the +numbered address space where the pointed-to object resides. The default +address space is number zero. The semantics of non-zero address spaces +are target-specific. + +Note that LLVM does not permit pointers to void (``void*``) nor does it +permit pointers to labels (``label*``). Use ``i8*`` instead. + +Syntax: +""""""" + +:: + + <type> * + +Examples: +""""""""" + ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| ``[4 x i32]*`` | A :ref:`pointer <t_pointer>` to :ref:`array <t_array>` of four ``i32`` values. 
| ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| ``i32 (i32*) *`` | A :ref:`pointer <t_pointer>` to a :ref:`function <t_function>` that takes an ``i32*``, returning an ``i32``. | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ +| ``i32 addrspace(5)*`` | A :ref:`pointer <t_pointer>` to an ``i32`` value that resides in address space #5. | ++-------------------------+--------------------------------------------------------------------------------------------------------------+ + +.. _t_vector: + +Vector Type +^^^^^^^^^^^ + +Overview: +""""""""" + +A vector type is a simple derived type that represents a vector of +elements. Vector types are used when multiple primitive data are +operated in parallel using a single instruction (SIMD). A vector type +requires a size (number of elements) and an underlying primitive data +type. Vector types are considered :ref:`first class <t_firstclass>`. + +Syntax: +""""""" + +:: + + < <# elements> x <elementtype> > + +The number of elements is a constant integer value larger than 0; +elementtype may be any integer or floating point type, or a pointer to +these types. Vectors of size zero are not allowed. + +Examples: +""""""""" + ++-------------------+--------------------------------------------------+ +| ``<4 x i32>`` | Vector of 4 32-bit integer values. | ++-------------------+--------------------------------------------------+ +| ``<8 x float>`` | Vector of 8 32-bit floating-point values. | ++-------------------+--------------------------------------------------+ +| ``<2 x i64>`` | Vector of 2 64-bit integer values. | ++-------------------+--------------------------------------------------+ +| ``<4 x i64*>`` | Vector of 4 pointers to 64-bit integer values. | ++-------------------+--------------------------------------------------+ + +Constants +========= + +LLVM has several different basic types of constants. This section +describes them all and their syntax. + +Simple Constants +---------------- + +**Boolean constants** + The two strings '``true``' and '``false``' are both valid constants + of the ``i1`` type. +**Integer constants** + Standard integers (such as '4') are constants of the + :ref:`integer <t_integer>` type. Negative numbers may be used with + integer types. +**Floating point constants** + Floating point constants use standard decimal notation (e.g. + 123.421), exponential notation (e.g. 1.23421e+2), or a more precise + hexadecimal notation (see below). The assembler requires the exact + decimal value of a floating-point constant. For example, the + assembler accepts 1.25 but rejects 1.3 because 1.3 is a repeating + decimal in binary. Floating point constants must have a :ref:`floating + point <t_floating>` type. +**Null pointer constants** + The identifier '``null``' is recognized as a null pointer constant + and must be of :ref:`pointer type <t_pointer>`. + +The one non-intuitive notation for constants is the hexadecimal form of +floating point constants. For example, the form +'``double 0x432ff973cafa8000``' is equivalent to (but harder to read +than) '``double 4.5e+15``'. The only time hexadecimal floating point +constants are required (and the only time that they are generated by the +disassembler) is when a floating point constant must be emitted but it +cannot be represented as a decimal floating point number in a reasonable +number of digits. 
For example, NaNs, infinities, and other special +values are represented in their IEEE hexadecimal format so that assembly +and disassembly do not cause any bits to change in the constants. + +When using the hexadecimal form, constants of types half, float, and +double are represented using the 16-digit form shown above (which +matches the IEEE 754 representation for double); half and float values +must, however, be exactly representable as IEEE 754 half and single +precision, respectively. Hexadecimal format is always used for long +double, and there are three forms of long double. The 80-bit format used +by x86 is represented as ``0xK`` followed by 20 hexadecimal digits. The +128-bit format used by PowerPC (two adjacent doubles) is represented by +``0xM`` followed by 32 hexadecimal digits. The IEEE 128-bit format is +represented by ``0xL`` followed by 32 hexadecimal digits; no currently +supported target uses this format. Long doubles will only work if they +match the long double format on your target. The IEEE 16-bit format +(half precision) is represented by ``0xH`` followed by 4 hexadecimal +digits. All hexadecimal formats are big-endian (sign bit at the left). + +There are no constants of type x86mmx. + +Complex Constants +----------------- + +Complex constants are a (potentially recursive) combination of simple +constants and smaller complex constants. + +**Structure constants** + Structure constants are represented with notation similar to + structure type definitions (a comma separated list of elements, + surrounded by braces (``{}``)). For example: + "``{ i32 4, float 17.0, i32* @G }``", where "``@G``" is declared as + "``@G = external global i32``". Structure constants must have + :ref:`structure type <t_struct>`, and the number and types of elements + must match those specified by the type. +**Array constants** + Array constants are represented with notation similar to array type + definitions (a comma separated list of elements, surrounded by + square brackets (``[]``)). For example: + "``[ i32 42, i32 11, i32 74 ]``". Array constants must have + :ref:`array type <t_array>`, and the number and types of elements must + match those specified by the type. +**Vector constants** + Vector constants are represented with notation similar to vector + type definitions (a comma separated list of elements, surrounded by + less-than/greater-than's (``<>``)). For example: + "``< i32 42, i32 11, i32 74, i32 100 >``". Vector constants + must have :ref:`vector type <t_vector>`, and the number and types of + elements must match those specified by the type. +**Zero initialization** + The string '``zeroinitializer``' can be used to zero initialize a + value to zero of *any* type, including scalar and + :ref:`aggregate <t_aggregate>` types. This is often used to avoid + having to print large zero initializers (e.g. for large arrays) and + is always exactly equivalent to using explicit zero initializers. +**Metadata node** + A metadata node is a structure-like constant with :ref:`metadata + type <t_metadata>`. For example: + "``metadata !{ i32 0, metadata !"test" }``". Unlike other + constants that are meant to be interpreted as part of the + instruction stream, metadata is a place to attach additional + information such as debug info. + +Global Variable and Function Addresses +-------------------------------------- + +The addresses of :ref:`global variables <globalvars>` and +:ref:`functions <functionstructure>` are always implicitly valid +(link-time) constants.
These constants are explicitly referenced when +the :ref:`identifier for the global <identifiers>` is used and always have +:ref:`pointer <t_pointer>` type. For example, the following is a legal LLVM +file: + +.. code-block:: llvm + + @X = global i32 17 + @Y = global i32 42 + @Z = global [2 x i32*] [ i32* @X, i32* @Y ] + +.. _undefvalues: + +Undefined Values +---------------- + +The string '``undef``' can be used anywhere a constant is expected, and +indicates that the user of the value may receive an unspecified +bit-pattern. Undefined values may be of any type (other than '``label``' +or '``void``') and be used anywhere a constant is permitted. + +Undefined values are useful because they indicate to the compiler that +the program is well defined no matter what value is used. This gives the +compiler more freedom to optimize. Here are some examples of +(potentially surprising) transformations that are valid (in pseudo IR): + +.. code-block:: llvm + + %A = add %X, undef + %B = sub %X, undef + %C = xor %X, undef + Safe: + %A = undef + %B = undef + %C = undef + +This is safe because all of the output bits are affected by the undef +bits. Any output bit can have a zero or one depending on the input bits. + +.. code-block:: llvm + + %A = or %X, undef + %B = and %X, undef + Safe: + %A = -1 + %B = 0 + Unsafe: + %A = undef + %B = undef + +These logical operations have bits that are not always affected by the +input. For example, if ``%X`` has a zero bit, then the output of the +'``and``' operation will always be a zero for that bit, no matter what +the corresponding bit from the '``undef``' is. As such, it is unsafe to +optimize or assume that the result of the '``and``' is '``undef``'. +However, it is safe to assume that all bits of the '``undef``' could be +0, and optimize the '``and``' to 0. Likewise, it is safe to assume that +all the bits of the '``undef``' operand to the '``or``' could be set, +allowing the '``or``' to be folded to -1. + +.. code-block:: llvm + + %A = select undef, %X, %Y + %B = select undef, 42, %Y + %C = select %X, %Y, undef + Safe: + %A = %X (or %Y) + %B = 42 (or %Y) + %C = %Y + Unsafe: + %A = undef + %B = undef + %C = undef + +This set of examples shows that undefined '``select``' (and conditional +branch) conditions can go *either way*, but they have to come from one +of the two operands. In the ``%A`` example, if ``%X`` and ``%Y`` were +both known to have a clear low bit, then ``%A`` would have to have a +cleared low bit. However, in the ``%C`` example, the optimizer is +allowed to assume that the '``undef``' operand could be the same as +``%Y``, allowing the whole '``select``' to be eliminated. + +.. code-block:: llvm + + %A = xor undef, undef + + %B = undef + %C = xor %B, %B + + %D = undef + %E = icmp lt %D, 4 + %F = icmp gte %D, 4 + + Safe: + %A = undef + %B = undef + %C = undef + %D = undef + %E = undef + %F = undef + +This example points out that two '``undef``' operands are not +necessarily the same. This can be surprising to people (and also matches +C semantics) where they assume that "``X^X``" is always zero, even if +``X`` is undefined. This isn't true for a number of reasons, but the +short answer is that an '``undef``' "variable" can arbitrarily change +its value over its "live range". This is true because the variable +doesn't actually *have a live range*. Instead, the value is logically +read from arbitrary registers that happen to be around when needed, so +the value is not necessarily consistent over time. 
In fact, ``%A`` and +``%C`` need to have the same semantics or the core LLVM "replace all +uses with" concept would not hold. + +.. code-block:: llvm + + %A = fdiv undef, %X + %B = fdiv %X, undef + Safe: + %A = undef + b: unreachable + +These examples show the crucial difference between an *undefined value* +and *undefined behavior*. An undefined value (like '``undef``') is +allowed to have an arbitrary bit-pattern. This means that the ``%A`` +operation can be constant folded to '``undef``', because the '``undef``' +could be an SNaN, and ``fdiv`` is not (currently) defined on SNaN's. +However, in the second example, we can make a more aggressive +assumption: because the ``undef`` is allowed to be an arbitrary value, +we are allowed to assume that it could be zero. Since a divide by zero +has *undefined behavior*, we are allowed to assume that the operation +does not execute at all. This allows us to delete the divide and all +code after it. Because the undefined operation "can't happen", the +optimizer can assume that it occurs in dead code. + +.. code-block:: llvm + + a: store undef -> %X + b: store %X -> undef + Safe: + a: <deleted> + b: unreachable + +These examples reiterate the ``fdiv`` example: a store *of* an undefined +value can be assumed to not have any effect; we can assume that the +value is overwritten with bits that happen to match what was already +there. However, a store *to* an undefined location could clobber +arbitrary memory, therefore, it has undefined behavior. + +.. _poisonvalues: + +Poison Values +------------- + +Poison values are similar to :ref:`undef values <undefvalues>`, however +they also represent the fact that an instruction or constant expression +which cannot evoke side effects has nevertheless detected a condition +which results in undefined behavior. + +There is currently no way of representing a poison value in the IR; they +only exist when produced by operations such as :ref:`add <i_add>` with +the ``nsw`` flag. + +Poison value behavior is defined in terms of value *dependence*: + +- Values other than :ref:`phi <i_phi>` nodes depend on their operands. +- :ref:`Phi <i_phi>` nodes depend on the operand corresponding to + their dynamic predecessor basic block. +- Function arguments depend on the corresponding actual argument values + in the dynamic callers of their functions. +- :ref:`Call <i_call>` instructions depend on the :ref:`ret <i_ret>` + instructions that dynamically transfer control back to them. +- :ref:`Invoke <i_invoke>` instructions depend on the + :ref:`ret <i_ret>`, :ref:`resume <i_resume>`, or exception-throwing + call instructions that dynamically transfer control back to them. +- Non-volatile loads and stores depend on the most recent stores to all + of the referenced memory addresses, following the order in the IR + (including loads and stores implied by intrinsics such as + :ref:`@llvm.memcpy <int_memcpy>`.) +- An instruction with externally visible side effects depends on the + most recent preceding instruction with externally visible side + effects, following the order in the IR. (This includes :ref:`volatile + operations <volatile>`.) +- An instruction *control-depends* on a :ref:`terminator + instruction <terminators>` if the terminator instruction has + multiple successors and the instruction is always executed when + control transfers to one of the successors, and may not be executed + when control is transferred to another. 
+ +- Additionally, an instruction also *control-depends* on a terminator + instruction if the set of instructions it otherwise depends on would + be different if the terminator had transferred control to a different + successor. +- Dependence is transitive. + +Poison values have the same behavior as :ref:`undef values <undefvalues>`, +with the additional effect that any instruction which has a *dependence* +on a poison value has undefined behavior. + +Here are some examples: + +.. code-block:: llvm + + entry: + %poison = sub nuw i32 0, 1 ; Results in a poison value. + %still_poison = and i32 %poison, 0 ; 0, but also poison. + %poison_yet_again = getelementptr i32* @h, i32 %still_poison + store i32 0, i32* %poison_yet_again ; memory at @h[0] is poisoned + + store i32 %poison, i32* @g ; Poison value stored to memory. + %poison2 = load i32* @g ; Poison value loaded back from memory. + + store volatile i32 %poison, i32* @g ; External observation; undefined behavior. + + %narrowaddr = bitcast i32* @g to i16* + %wideaddr = bitcast i32* @g to i64* + %poison3 = load i16* %narrowaddr ; Returns a poison value. + %poison4 = load i64* %wideaddr ; Returns a poison value. + + %cmp = icmp slt i32 %poison, 0 ; Returns a poison value. + br i1 %cmp, label %true, label %end ; Branch to either destination. + + true: + store volatile i32 0, i32* @g ; This is control-dependent on %cmp, so + ; it has undefined behavior. + br label %end + + end: + %p = phi i32 [ 0, %entry ], [ 1, %true ] + ; Both edges into this PHI are + ; control-dependent on %cmp, so this + ; always results in a poison value. + + store volatile i32 0, i32* @g ; This would depend on the store in %true + ; if %cmp is true, or the store in %entry + ; otherwise, so this is undefined behavior. + + br i1 %cmp, label %second_true, label %second_end + ; The same branch again, but this time the + ; true block doesn't have side effects. + + second_true: + ; No side effects! + ret void + + second_end: + store volatile i32 0, i32* @g ; This time, the instruction always depends + ; on the store in %end. Also, it is + ; control-equivalent to %end, so this is + ; well-defined (ignoring earlier undefined + ; behavior in this example). + +.. _blockaddress: + +Addresses of Basic Blocks +------------------------- + +``blockaddress(@function, %block)`` + +The '``blockaddress``' constant computes the address of the specified +basic block in the specified function, and always has an ``i8*`` type. +Taking the address of the entry block is illegal. + +This value only has defined behavior when used as an operand to the +':ref:`indirectbr <i_indirectbr>`' instruction, or for comparisons +against null. Pointer equality tests between label addresses result in +undefined behavior — though, again, comparison against null is ok, and +no label is equal to the null pointer. This may be passed around as an +opaque pointer-sized value as long as the bits are not inspected. This +allows ``ptrtoint`` and arithmetic to be performed on these values so +long as the original value is reconstituted before the ``indirectbr`` +instruction. + +Finally, some targets may provide defined semantics when using the value +as the operand to an inline assembly, but that is target specific. + +Constant Expressions +-------------------- + +Constant expressions are used to allow expressions involving other +constants to be used as constants. Constant expressions may be of any +:ref:`first class <t_firstclass>` type and may involve any LLVM operation +that does not have side effects (e.g.
load and call are not supported). +The following is the syntax for constant expressions: + +``trunc (CST to TYPE)`` + Truncate a constant to another type. The bit size of CST must be + larger than the bit size of TYPE. Both types must be integers. +``zext (CST to TYPE)`` + Zero extend a constant to another type. The bit size of CST must be + smaller than the bit size of TYPE. Both types must be integers. +``sext (CST to TYPE)`` + Sign extend a constant to another type. The bit size of CST must be + smaller than the bit size of TYPE. Both types must be integers. +``fptrunc (CST to TYPE)`` + Truncate a floating point constant to another floating point type. + The size of CST must be larger than the size of TYPE. Both types + must be floating point. +``fpext (CST to TYPE)`` + Floating point extend a constant to another type. The size of CST + must be smaller or equal to the size of TYPE. Both types must be + floating point. +``fptoui (CST to TYPE)`` + Convert a floating point constant to the corresponding unsigned + integer constant. TYPE must be a scalar or vector integer type. CST + must be of scalar or vector floating point type. Both CST and TYPE + must be scalars, or vectors of the same number of elements. If the + value won't fit in the integer type, the results are undefined. +``fptosi (CST to TYPE)`` + Convert a floating point constant to the corresponding signed + integer constant. TYPE must be a scalar or vector integer type. CST + must be of scalar or vector floating point type. Both CST and TYPE + must be scalars, or vectors of the same number of elements. If the + value won't fit in the integer type, the results are undefined. +``uitofp (CST to TYPE)`` + Convert an unsigned integer constant to the corresponding floating + point constant. TYPE must be a scalar or vector floating point type. + CST must be of scalar or vector integer type. Both CST and TYPE must + be scalars, or vectors of the same number of elements. If the value + won't fit in the floating point type, the results are undefined. +``sitofp (CST to TYPE)`` + Convert a signed integer constant to the corresponding floating + point constant. TYPE must be a scalar or vector floating point type. + CST must be of scalar or vector integer type. Both CST and TYPE must + be scalars, or vectors of the same number of elements. If the value + won't fit in the floating point type, the results are undefined. +``ptrtoint (CST to TYPE)`` + Convert a pointer typed constant to the corresponding integer + constant ``TYPE`` must be an integer type. ``CST`` must be of + pointer type. The ``CST`` value is zero extended, truncated, or + unchanged to make it fit in ``TYPE``. +``inttoptr (CST to TYPE)`` + Convert an integer constant to a pointer constant. TYPE must be a + pointer type. CST must be of integer type. The CST value is zero + extended, truncated, or unchanged to make it fit in a pointer size. + This one is *really* dangerous! +``bitcast (CST to TYPE)`` + Convert a constant, CST, to another TYPE. The constraints of the + operands are the same as those for the :ref:`bitcast + instruction <i_bitcast>`. +``getelementptr (CSTPTR, IDX0, IDX1, ...)``, ``getelementptr inbounds (CSTPTR, IDX0, IDX1, ...)`` + Perform the :ref:`getelementptr operation <i_getelementptr>` on + constants. As with the :ref:`getelementptr <i_getelementptr>` + instruction, the index list may have zero or more indexes, which are + required to make sense for the type of "CSTPTR". 
+``select (COND, VAL1, VAL2)`` + Perform the :ref:`select operation <i_select>` on constants. +``icmp COND (VAL1, VAL2)`` + Performs the :ref:`icmp operation <i_icmp>` on constants. +``fcmp COND (VAL1, VAL2)`` + Performs the :ref:`fcmp operation <i_fcmp>` on constants. +``extractelement (VAL, IDX)`` + Perform the :ref:`extractelement operation <i_extractelement>` on + constants. +``insertelement (VAL, ELT, IDX)`` + Perform the :ref:`insertelement operation <i_insertelement>` on + constants. +``shufflevector (VEC1, VEC2, IDXMASK)`` + Perform the :ref:`shufflevector operation <i_shufflevector>` on + constants. +``extractvalue (VAL, IDX0, IDX1, ...)`` + Perform the :ref:`extractvalue operation <i_extractvalue>` on + constants. The index list is interpreted in a similar manner as + indices in a ':ref:`getelementptr <i_getelementptr>`' operation. At + least one index value must be specified. +``insertvalue (VAL, ELT, IDX0, IDX1, ...)`` + Perform the :ref:`insertvalue operation <i_insertvalue>` on constants. + The index list is interpreted in a similar manner as indices in a + ':ref:`getelementptr <i_getelementptr>`' operation. At least one index + value must be specified. +``OPCODE (LHS, RHS)`` + Perform the specified operation of the LHS and RHS constants. OPCODE + may be any of the :ref:`binary <binaryops>` or :ref:`bitwise + binary <bitwiseops>` operations. The constraints on operands are + the same as those for the corresponding instruction (e.g. no bitwise + operations on floating point values are allowed). + +Other Values +============ + +Inline Assembler Expressions +---------------------------- + +LLVM supports inline assembler expressions (as opposed to :ref:`Module-Level +Inline Assembly <moduleasm>`) through the use of a special value. This +value represents the inline assembler as a string (containing the +instructions to emit), a list of operand constraints (stored as a +string), a flag that indicates whether or not the inline asm expression +has side effects, and a flag indicating whether the function containing +the asm needs to align its stack conservatively. An example inline +assembler expression is: + +.. code-block:: llvm + + i32 (i32) asm "bswap $0", "=r,r" + +Inline assembler expressions may **only** be used as the callee operand +of a :ref:`call <i_call>` or an :ref:`invoke <i_invoke>` instruction. +Thus, typically we have: + +.. code-block:: llvm + + %X = call i32 asm "bswap $0", "=r,r"(i32 %Y) + +Inline asms with side effects not visible in the constraint list must be +marked as having side effects. This is done through the use of the +'``sideeffect``' keyword, like so: + +.. code-block:: llvm + + call void asm sideeffect "eieio", ""() + +In some cases inline asms will contain code that will not work unless +the stack is aligned in some way, such as calls or SSE instructions on +x86, yet will not contain code that does that alignment within the asm. +The compiler should make conservative assumptions about what the asm +might contain and should generate its usual stack alignment code in the +prologue if the '``alignstack``' keyword is present: + +.. code-block:: llvm + + call void asm alignstack "eieio", ""() + +Inline asms also support using non-standard assembly dialects. The +assumed dialect is ATT. When the '``inteldialect``' keyword is present, +the inline asm is using the Intel dialect. Currently, ATT and Intel are +the only supported dialects. An example is: + +.. 
code-block:: llvm + + call void asm inteldialect "eieio", ""() + +If multiple keywords appear, the '``sideeffect``' keyword must come +first, the '``alignstack``' keyword second, and the '``inteldialect``' +keyword last. + +Inline Asm Metadata +^^^^^^^^^^^^^^^^^^^ + +The call instructions that wrap inline asm nodes may have a +"``!srcloc``" MDNode attached to them that contains a list of constant +integers. If present, the code generator will use the integer as the +location cookie value when reporting errors through the ``LLVMContext`` +error reporting mechanisms. This allows a front-end to correlate backend +errors that occur with inline asm back to the source code that produced +it. For example: + +.. code-block:: llvm + + call void asm sideeffect "something bad", ""(), !srcloc !42 + ... + !42 = !{ i32 1234567 } + +It is up to the front-end to make sense of the magic numbers it places +in the IR. If the MDNode contains multiple constants, the code generator +will use the one that corresponds to the line of the asm that the error +occurs on. + +.. _metadata: + +Metadata Nodes and Metadata Strings +----------------------------------- + +LLVM IR allows metadata to be attached to instructions in the program +that can convey extra information about the code to the optimizers and +code generator. One example application of metadata is source-level +debug information. There are two metadata primitives: strings and nodes. +All metadata has the ``metadata`` type and is identified in syntax by a +preceding exclamation point ('``!``'). + +A metadata string is a string surrounded by double quotes. It can +contain any character by escaping non-printable characters with +"``\xx``" where "``xx``" is the two digit hex code. For example: +"``!"test\00"``". + +Metadata nodes are represented with notation similar to structure +constants (a comma separated list of elements, surrounded by braces and +preceded by an exclamation point). Metadata nodes can have any values as +their operands. For example: + +.. code-block:: llvm + + !{ metadata !"test\00", i32 10} + +A :ref:`named metadata <namedmetadatastructure>` is a collection of +metadata nodes, which can be looked up in the module symbol table. For +example: + +.. code-block:: llvm + + !foo = metadata !{!4, !3} + +Metadata can be used as function arguments. Here the ``llvm.dbg.value`` +function is using two metadata arguments: + +.. code-block:: llvm + + call void @llvm.dbg.value(metadata !24, i64 0, metadata !25) + +Metadata can be attached to an instruction. Here metadata ``!21`` is +attached to the ``add`` instruction using the ``!dbg`` identifier: + +.. code-block:: llvm + + %indvar.next = add i64 %indvar, 1, !dbg !21 + +More information about specific metadata nodes recognized by the +optimizers and code generator is found below. + +'``tbaa``' Metadata +^^^^^^^^^^^^^^^^^^^ + +In LLVM IR, memory does not have types, so LLVM's own type system is not +suitable for doing TBAA. Instead, metadata is added to the IR to +describe a type system of a higher level language. This can be used to +implement typical C/C++ TBAA, but it can also be used to implement +custom alias analysis behavior for other languages. + +The current metadata format is very simple. TBAA metadata nodes have up +to three fields, e.g.: + +..
code-block:: llvm + + !0 = metadata !{ metadata !"an example type tree" } + !1 = metadata !{ metadata !"int", metadata !0 } + !2 = metadata !{ metadata !"float", metadata !0 } + !3 = metadata !{ metadata !"const float", metadata !2, i64 1 } + +The first field is an identity field. It can be any value, usually a +metadata string, which uniquely identifies the type. The most important +name in the tree is the name of the root node. Two trees with different +root node names are entirely disjoint, even if they have leaves with +common names. + +The second field identifies the type's parent node in the tree, or is +null or omitted for a root node. A type is considered to alias all of +its descendants and all of its ancestors in the tree. Also, a type is +considered to alias all types in other trees, so that bitcode produced +from multiple front-ends is handled conservatively. + +If the third field is present, it's an integer which if equal to 1 +indicates that the type is "constant" (meaning +``pointsToConstantMemory`` should return true; see `other useful +AliasAnalysis methods <AliasAnalysis.html#OtherItfs>`_). + +'``tbaa.struct``' Metadata +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The :ref:`llvm.memcpy <int_memcpy>` is often used to implement +aggregate assignment operations in C and similar languages, however it +is defined to copy a contiguous region of memory, which is more than +strictly necessary for aggregate types which contain holes due to +padding. Also, it doesn't contain any TBAA information about the fields +of the aggregate. + +``!tbaa.struct`` metadata can describe which memory subregions in a +memcpy are padding and what the TBAA tags of the struct are. + +The current metadata format is very simple. ``!tbaa.struct`` metadata +nodes are a list of operands which are in conceptual groups of three. +For each group of three, the first operand gives the byte offset of a +field in bytes, the second gives its size in bytes, and the third gives +its tbaa tag. e.g.: + +.. code-block:: llvm + + !4 = metadata !{ i64 0, i64 4, metadata !1, i64 8, i64 4, metadata !2 } + +This describes a struct with two fields. The first is at offset 0 bytes +with size 4 bytes, and has tbaa tag !1. The second is at offset 8 bytes +and has size 4 bytes and has tbaa tag !2. + +Note that the fields need not be contiguous. In this example, there is a +4 byte gap between the two fields. This gap represents padding which +does not carry useful data and need not be preserved. + +'``fpmath``' Metadata +^^^^^^^^^^^^^^^^^^^^^ + +``fpmath`` metadata may be attached to any instruction of floating point +type. It can be used to express the maximum acceptable error in the +result of that instruction, in ULPs, thus potentially allowing the +compiler to use a more efficient but less accurate method of computing +it. ULP is defined as follows: + + If ``x`` is a real number that lies between two finite consecutive + floating-point numbers ``a`` and ``b``, without being equal to one + of them, then ``ulp(x) = |b - a|``, otherwise ``ulp(x)`` is the + distance between the two non-equal finite floating-point numbers + nearest ``x``. Moreover, ``ulp(NaN)`` is ``NaN``. + +The metadata node shall consist of a single positive floating point +number representing the maximum relative error, for example: + +.. code-block:: llvm + + !0 = metadata !{ float 2.5 } ; maximum acceptable inaccuracy is 2.5 ULPs + +'``range``' Metadata +^^^^^^^^^^^^^^^^^^^^ + +``range`` metadata may be attached only to loads of integer types. 
It +expresses the possible ranges the loaded value is in. The ranges are +represented with a flattened list of integers. The loaded value is known +to be in the union of the ranges defined by each consecutive pair. Each +pair has the following properties: + +- The type must match the type loaded by the instruction. +- The pair ``a,b`` represents the range ``[a,b)``. +- Both ``a`` and ``b`` are constants. +- The range is allowed to wrap. +- The range should not represent the full or empty set. That is, + ``a!=b``. + +In addition, the pairs must be in signed order of the lower bound and +they must be non-contiguous. + +Examples: + +.. code-block:: llvm + + %a = load i8* %x, align 1, !range !0 ; Can only be 0 or 1 + %b = load i8* %y, align 1, !range !1 ; Can only be 255 (-1), 0 or 1 + %c = load i8* %z, align 1, !range !2 ; Can only be 0, 1, 3, 4 or 5 + %d = load i8* %z, align 1, !range !3 ; Can only be -2, -1, 3, 4 or 5 + ... + !0 = metadata !{ i8 0, i8 2 } + !1 = metadata !{ i8 255, i8 2 } + !2 = metadata !{ i8 0, i8 2, i8 3, i8 6 } + !3 = metadata !{ i8 -2, i8 0, i8 3, i8 6 } + +Module Flags Metadata +===================== + +Information about the module as a whole is difficult to convey to LLVM's +subsystems. The LLVM IR isn't sufficient to transmit this information. +The ``llvm.module.flags`` named metadata exists in order to facilitate +this. These flags are in the form of key / value pairs — much like a +dictionary — making it easy for any subsystem who cares about a flag to +look it up. + +The ``llvm.module.flags`` metadata contains a list of metadata triplets. +Each triplet has the following form: + +- The first element is a *behavior* flag, which specifies the behavior + when two (or more) modules are merged together, and it encounters two + (or more) metadata with the same ID. The supported behaviors are + described below. +- The second element is a metadata string that is a unique ID for the + metadata. How each ID is interpreted is documented below. +- The third element is the value of the flag. + +When two (or more) modules are merged together, the resulting +``llvm.module.flags`` metadata is the union of the modules' +``llvm.module.flags`` metadata. The only exception being a flag with the +*Override* behavior, which may override another flag's value (see +below). + +The following behaviors are supported: + +.. list-table:: + :header-rows: 1 + :widths: 10 90 + + * - Value + - Behavior + + * - 1 + - **Error** + Emits an error if two values disagree. It is an error to have an + ID with both an Error and a Warning behavior. + + * - 2 + - **Warning** + Emits a warning if two values disagree. + + * - 3 + - **Require** + Emits an error when the specified value is not present or doesn't + have the specified value. It is an error for two (or more) + ``llvm.module.flags`` with the same ID to have the Require behavior + but different values. There may be multiple Require flags per ID. + + * - 4 + - **Override** + Uses the specified value if the two values disagree. It is an + error for two (or more) ``llvm.module.flags`` with the same ID + to have the Override behavior but different values. + +An example of module flags: + +.. 
code-block:: llvm + + !0 = metadata !{ i32 1, metadata !"foo", i32 1 } + !1 = metadata !{ i32 4, metadata !"bar", i32 37 } + !2 = metadata !{ i32 2, metadata !"qux", i32 42 } + !3 = metadata !{ i32 3, metadata !"qux", + metadata !{ + metadata !"foo", i32 1 + } + } + !llvm.module.flags = !{ !0, !1, !2, !3 } + +- Metadata ``!0`` has the ID ``!"foo"`` and the value '1'. The behavior + if two or more ``!"foo"`` flags are seen is to emit an error if their + values are not equal. + +- Metadata ``!1`` has the ID ``!"bar"`` and the value '37'. The + behavior if two or more ``!"bar"`` flags are seen is to use the value + '37' if their values are not equal. + +- Metadata ``!2`` has the ID ``!"qux"`` and the value '42'. The + behavior if two or more ``!"qux"`` flags are seen is to emit a + warning if their values are not equal. + +- Metadata ``!3`` has the ID ``!"qux"`` and the value: + + :: + + metadata !{ metadata !"foo", i32 1 } + + The behavior is to emit an error if the ``llvm.module.flags`` does + not contain a flag with the ID ``!"foo"`` that has the value '1'. If + two or more ``!"qux"`` flags exist, then they must have the same + value or an error will be issued. + +Objective-C Garbage Collection Module Flags Metadata +---------------------------------------------------- + +On the Mach-O platform, Objective-C stores metadata about garbage +collection in a special section called "image info". The metadata +consists of a version number and a bitmask specifying what types of +garbage collection are supported (if any) by the file. If two or more +modules are linked together their garbage collection metadata needs to +be merged rather than appended together. + +The Objective-C garbage collection module flags metadata consists of the +following key-value pairs: + +.. list-table:: + :header-rows: 1 + :widths: 30 70 + + * - Key + - Value + + * - ``Objective-C Version`` + - **[Required]** — The Objective-C ABI version. Valid values are 1 and 2. + + * - ``Objective-C Image Info Version`` + - **[Required]** — The version of the image info section. Currently + always 0. + + * - ``Objective-C Image Info Section`` + - **[Required]** — The section to place the metadata. Valid values are + ``"__OBJC, __image_info, regular"`` for Objective-C ABI version 1, and + ``"__DATA,__objc_imageinfo, regular, no_dead_strip"`` for + Objective-C ABI version 2. + + * - ``Objective-C Garbage Collection`` + - **[Required]** — Specifies whether garbage collection is supported or + not. Valid values are 0, for no garbage collection, and 2, for garbage + collection supported. + + * - ``Objective-C GC Only`` + - **[Optional]** — Specifies that only garbage collection is supported. + If present, its value must be 6. This flag requires that the + ``Objective-C Garbage Collection`` flag have the value 2. + +Some important flag interactions: + +- If a module with ``Objective-C Garbage Collection`` set to 0 is + merged with a module with ``Objective-C Garbage Collection`` set to + 2, then the resulting module has the + ``Objective-C Garbage Collection`` flag set to 0. +- A module with ``Objective-C Garbage Collection`` set to 0 cannot be + merged with a module with ``Objective-C GC Only`` set to 6. + +Intrinsic Global Variables +========================== + +LLVM has a number of "magic" global variables that contain data that +affect code generation or other IR semantics. These are documented here. +All globals of this sort should have a section specified as +"``llvm.metadata``". 
This section and all globals that start with +"``llvm.``" are reserved for use by LLVM. + +The '``llvm.used``' Global Variable +----------------------------------- + +The ``@llvm.used`` global is an array with i8\* element type which has +:ref:`appending linkage <linkage_appending>`. This array contains a list of +pointers to global variables and functions which may optionally have a +pointer cast formed of bitcast or getelementptr. For example, a legal +use of it is: + +.. code-block:: llvm + + @X = global i8 4 + @Y = global i32 123 + + @llvm.used = appending global [2 x i8*] [ + i8* @X, + i8* bitcast (i32* @Y to i8*) + ], section "llvm.metadata" + +If a global variable appears in the ``@llvm.used`` list, then the +compiler, assembler, and linker are required to treat the symbol as if +there is a reference to the global that it cannot see. For example, if a +variable has internal linkage and no references other than that from the +``@llvm.used`` list, it cannot be deleted. This is commonly used to +represent references from inline asms and other things the compiler +cannot "see", and corresponds to "``attribute((used))``" in GNU C. + +On some targets, the code generator must emit a directive to the +assembler or object file to prevent the assembler and linker from +molesting the symbol. + +The '``llvm.compiler.used``' Global Variable +-------------------------------------------- + +The ``@llvm.compiler.used`` directive is the same as the ``@llvm.used`` +directive, except that it only prevents the compiler from touching the +symbol. On targets that support it, this allows an intelligent linker to +optimize references to the symbol without being impeded as it would be +by ``@llvm.used``. + +This is a rare construct that should only be used in rare circumstances, +and should not be exposed to source languages. + +The '``llvm.global_ctors``' Global Variable +------------------------------------------- + +.. code-block:: llvm + + %0 = type { i32, void ()* } + @llvm.global_ctors = appending global [1 x %0] [%0 { i32 65535, void ()* @ctor }] + +The ``@llvm.global_ctors`` array contains a list of constructor +functions and associated priorities. The functions referenced by this +array will be called in ascending order of priority (i.e. lowest first) +when the module is loaded. The order of functions with the same priority +is not defined. + +The '``llvm.global_dtors``' Global Variable +------------------------------------------- + +.. code-block:: llvm + + %0 = type { i32, void ()* } + @llvm.global_dtors = appending global [1 x %0] [%0 { i32 65535, void ()* @dtor }] + +The ``@llvm.global_dtors`` array contains a list of destructor functions +and associated priorities. The functions referenced by this array will +be called in descending order of priority (i.e. highest first) when the +module is loaded. The order of functions with the same priority is not +defined. + +Instruction Reference +===================== + +The LLVM instruction set consists of several different classifications +of instructions: :ref:`terminator instructions <terminators>`, :ref:`binary +instructions <binaryops>`, :ref:`bitwise binary +instructions <bitwiseops>`, :ref:`memory instructions <memoryops>`, and +:ref:`other instructions <otherops>`. + +.. _terminators: + +Terminator Instructions +----------------------- + +As mentioned :ref:`previously <functionstructure>`, every basic block in a +program ends with a "Terminator" instruction, which indicates which +block should be executed after the current block is finished. 
These +terminator instructions typically yield a '``void``' value: they produce +control flow, not values (the one exception being the +':ref:`invoke <i_invoke>`' instruction). + +The terminator instructions are: ':ref:`ret <i_ret>`', +':ref:`br <i_br>`', ':ref:`switch <i_switch>`', +':ref:`indirectbr <i_indirectbr>`', ':ref:`invoke <i_invoke>`', +':ref:`resume <i_resume>`', and ':ref:`unreachable <i_unreachable>`'. + +.. _i_ret: + +'``ret``' Instruction +^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + ret <type> <value> ; Return a value from a non-void function + ret void ; Return from void function + +Overview: +""""""""" + +The '``ret``' instruction is used to return control flow (and optionally +a value) from a function back to the caller. + +There are two forms of the '``ret``' instruction: one that returns a +value and then causes control flow, and one that just causes control +flow to occur. + +Arguments: +"""""""""" + +The '``ret``' instruction optionally accepts a single argument, the +return value. The type of the return value must be a ':ref:`first +class <t_firstclass>`' type. + +A function is not :ref:`well formed <wellformed>` if it it has a non-void +return type and contains a '``ret``' instruction with no return value or +a return value with a type that does not match its type, or if it has a +void return type and contains a '``ret``' instruction with a return +value. + +Semantics: +"""""""""" + +When the '``ret``' instruction is executed, control flow returns back to +the calling function's context. If the caller is a +":ref:`call <i_call>`" instruction, execution continues at the +instruction after the call. If the caller was an +":ref:`invoke <i_invoke>`" instruction, execution continues at the +beginning of the "normal" destination block. If the instruction returns +a value, that value shall set the call or invoke instruction's return +value. + +Example: +"""""""" + +.. code-block:: llvm + + ret i32 5 ; Return an integer value of 5 + ret void ; Return from a void function + ret { i32, i8 } { i32 4, i8 2 } ; Return a struct of values 4 and 2 + +.. _i_br: + +'``br``' Instruction +^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + br i1 <cond>, label <iftrue>, label <iffalse> + br label <dest> ; Unconditional branch + +Overview: +""""""""" + +The '``br``' instruction is used to cause control flow to transfer to a +different basic block in the current function. There are two forms of +this instruction, corresponding to a conditional branch and an +unconditional branch. + +Arguments: +"""""""""" + +The conditional branch form of the '``br``' instruction takes a single +'``i1``' value and two '``label``' values. The unconditional form of the +'``br``' instruction takes a single '``label``' value as a target. + +Semantics: +"""""""""" + +Upon execution of a conditional '``br``' instruction, the '``i1``' +argument is evaluated. If the value is ``true``, control flows to the +'``iftrue``' ``label`` argument. If "cond" is ``false``, control flows +to the '``iffalse``' ``label`` argument. + +Example: +"""""""" + +.. code-block:: llvm + + Test: + %cond = icmp eq i32 %a, %b + br i1 %cond, label %IfEqual, label %IfUnequal + IfEqual: + ret i32 1 + IfUnequal: + ret i32 0 + +.. _i_switch: + +'``switch``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + switch <intty> <value>, label <defaultdest> [ <intty> <val>, label <dest> ... ] + +Overview: +""""""""" + +The '``switch``' instruction is used to transfer control flow to one of +several different places. 
It is a generalization of the '``br``' +instruction, allowing a branch to occur to one of many possible +destinations. + +Arguments: +"""""""""" + +The '``switch``' instruction uses three parameters: an integer +comparison value '``value``', a default '``label``' destination, and an +array of pairs of comparison value constants and '``label``'s. The table +is not allowed to contain duplicate constant entries. + +Semantics: +"""""""""" + +The ``switch`` instruction specifies a table of values and destinations. +When the '``switch``' instruction is executed, this table is searched +for the given value. If the value is found, control flow is transferred +to the corresponding destination; otherwise, control flow is transferred +to the default destination. + +Implementation: +""""""""""""""" + +Depending on properties of the target machine and the particular +``switch`` instruction, this instruction may be code generated in +different ways. For example, it could be generated as a series of +chained conditional branches or with a lookup table. + +Example: +"""""""" + +.. code-block:: llvm + + ; Emulate a conditional br instruction + %Val = zext i1 %value to i32 + switch i32 %Val, label %truedest [ i32 0, label %falsedest ] + + ; Emulate an unconditional br instruction + switch i32 0, label %dest [ ] + + ; Implement a jump table: + switch i32 %val, label %otherwise [ i32 0, label %onzero + i32 1, label %onone + i32 2, label %ontwo ] + +.. _i_indirectbr: + +'``indirectbr``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + indirectbr <somety>* <address>, [ label <dest1>, label <dest2>, ... ] + +Overview: +""""""""" + +The '``indirectbr``' instruction implements an indirect branch to a +label within the current function, whose address is specified by +"``address``". Address must be derived from a +:ref:`blockaddress <blockaddress>` constant. + +Arguments: +"""""""""" + +The '``address``' argument is the address of the label to jump to. The +rest of the arguments indicate the full set of possible destinations +that the address may point to. Blocks are allowed to occur multiple +times in the destination list, though this isn't particularly useful. + +This destination list is required so that dataflow analysis has an +accurate understanding of the CFG. + +Semantics: +"""""""""" + +Control transfers to the block specified in the address argument. All +possible destination blocks must be listed in the label list, otherwise +this instruction has undefined behavior. This implies that jumps to +labels defined in other functions have undefined behavior as well. + +Implementation: +""""""""""""""" + +This is typically implemented with a jump through a register. + +Example: +"""""""" + +.. code-block:: llvm + + indirectbr i8* %Addr, [ label %bb1, label %bb2, label %bb3 ] + +.. _i_invoke: + +'``invoke``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = invoke [cconv] [ret attrs] <ptr to function ty> <function ptr val>(<function args>) [fn attrs] + to label <normal label> unwind label <exception label> + +Overview: +""""""""" + +The '``invoke``' instruction causes control to transfer to a specified +function, with the possibility of control flow transfer to either the +'``normal``' label or the '``exception``' label. If the callee function +returns with the "``ret``" instruction, control flow will return to the +"normal" label. 
If the callee (or any indirect callees) returns via the +":ref:`resume <i_resume>`" instruction or other exception handling +mechanism, control is interrupted and continued at the dynamically +nearest "exception" label. + +The '``exception``' label is a `landing +pad <ExceptionHandling.html#overview>`_ for the exception. As such, +'``exception``' label is required to have the +":ref:`landingpad <i_landingpad>`" instruction, which contains the +information about the behavior of the program after unwinding happens, +as its first non-PHI instruction. The restrictions on the +"``landingpad``" instruction's tightly couples it to the "``invoke``" +instruction, so that the important information contained within the +"``landingpad``" instruction can't be lost through normal code motion. + +Arguments: +"""""""""" + +This instruction requires several arguments: + +#. The optional "cconv" marker indicates which :ref:`calling + convention <callingconv>` the call should use. If none is + specified, the call defaults to using C calling conventions. +#. The optional :ref:`Parameter Attributes <paramattrs>` list for return + values. Only '``zeroext``', '``signext``', and '``inreg``' attributes + are valid here. +#. '``ptr to function ty``': shall be the signature of the pointer to + function value being invoked. In most cases, this is a direct + function invocation, but indirect ``invoke``'s are just as possible, + branching off an arbitrary pointer to function value. +#. '``function ptr val``': An LLVM value containing a pointer to a + function to be invoked. +#. '``function args``': argument list whose types match the function + signature argument types and parameter attributes. All arguments must + be of :ref:`first class <t_firstclass>` type. If the function signature + indicates the function accepts a variable number of arguments, the + extra arguments can be specified. +#. '``normal label``': the label reached when the called function + executes a '``ret``' instruction. +#. '``exception label``': the label reached when a callee returns via + the :ref:`resume <i_resume>` instruction or other exception handling + mechanism. +#. The optional :ref:`function attributes <fnattrs>` list. Only + '``noreturn``', '``nounwind``', '``readonly``' and '``readnone``' + attributes are valid here. + +Semantics: +"""""""""" + +This instruction is designed to operate as a standard '``call``' +instruction in most regards. The primary difference is that it +establishes an association with a label, which is used by the runtime +library to unwind the stack. + +This instruction is used in languages with destructors to ensure that +proper cleanup is performed in the case of either a ``longjmp`` or a +thrown exception. Additionally, this is important for implementation of +'``catch``' clauses in high-level languages that support them. + +For the purposes of the SSA form, the definition of the value returned +by the '``invoke``' instruction is deemed to occur on the edge from the +current block to the "normal" label. If the callee unwinds then no +return value is available. + +Example: +"""""""" + +.. code-block:: llvm + + %retval = invoke i32 @Test(i32 15) to label %Continue + unwind label %TestCleanup ; {i32}:retval set + %retval = invoke coldcc i32 %Testfnptr(i32 15) to label %Continue + unwind label %TestCleanup ; {i32}:retval set + +.. 
_i_resume: + +'``resume``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + resume <type> <value> + +Overview: +""""""""" + +The '``resume``' instruction is a terminator instruction that has no +successors. + +Arguments: +"""""""""" + +The '``resume``' instruction requires one argument, which must have the +same type as the result of any '``landingpad``' instruction in the same +function. + +Semantics: +"""""""""" + +The '``resume``' instruction resumes propagation of an existing +(in-flight) exception whose unwinding was interrupted with a +:ref:`landingpad <i_landingpad>` instruction. + +Example: +"""""""" + +.. code-block:: llvm + + resume { i8*, i32 } %exn + +.. _i_unreachable: + +'``unreachable``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + unreachable + +Overview: +""""""""" + +The '``unreachable``' instruction has no defined semantics. This +instruction is used to inform the optimizer that a particular portion of +the code is not reachable. This can be used to indicate that the code +after a no-return function cannot be reached, and other facts. + +Semantics: +"""""""""" + +The '``unreachable``' instruction has no defined semantics. + +.. _binaryops: + +Binary Operations +----------------- + +Binary operators are used to do most of the computation in a program. +They require two operands of the same type, execute an operation on +them, and produce a single value. The operands might represent multiple +data, as is the case with the :ref:`vector <t_vector>` data type. The +result value has the same type as its operands. + +There are several different binary operators: + +.. _i_add: + +'``add``' Instruction +^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = add <ty> <op1>, <op2> ; yields {ty}:result + <result> = add nuw <ty> <op1>, <op2> ; yields {ty}:result + <result> = add nsw <ty> <op1>, <op2> ; yields {ty}:result + <result> = add nuw nsw <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``add``' instruction returns the sum of its two operands. + +Arguments: +"""""""""" + +The two arguments to the '``add``' instruction must be +:ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both +arguments must have identical types. + +Semantics: +"""""""""" + +The value produced is the integer sum of the two operands. + +If the sum has unsigned overflow, the result returned is the +mathematical result modulo 2\ :sup:`n`\ , where n is the bit width of +the result. + +Because LLVM integers use a two's complement representation, this +instruction is appropriate for both signed and unsigned integers. + +``nuw`` and ``nsw`` stand for "No Unsigned Wrap" and "No Signed Wrap", +respectively. If the ``nuw`` and/or ``nsw`` keywords are present, the +result value of the ``add`` is a :ref:`poison value <poisonvalues>` if +unsigned and/or signed overflow, respectively, occurs. + +Example: +"""""""" + +.. code-block:: llvm + + <result> = add i32 4, %var ; yields {i32}:result = 4 + %var + +.. _i_fadd: + +'``fadd``' Instruction +^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = fadd [fast-math flags]* <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``fadd``' instruction returns the sum of its two operands. + +Arguments: +"""""""""" + +The two arguments to the '``fadd``' instruction must be :ref:`floating +point <t_floating>` or :ref:`vector <t_vector>` of floating point values. +Both arguments must have identical types. 
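
For illustration only, a hedged sketch of the scalar and vector forms; the
``fast`` flag on the vector form is one of the fast-math flags covered under
Semantics below, and the value names are hypothetical:

.. code-block:: llvm

  %sum  = fadd float %a, %b                ; scalar floating point sum
  %vsum = fadd fast <4 x float> %va, %vb   ; vector sum with fast-math hints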
+ +Semantics: +"""""""""" + +The value produced is the floating point sum of the two operands. This +instruction can also take any number of :ref:`fast-math flags <fastmath>`, +which are optimization hints to enable otherwise unsafe floating point +optimizations: + +Example: +"""""""" + +.. code-block:: llvm + + <result> = fadd float 4.0, %var ; yields {float}:result = 4.0 + %var + +'``sub``' Instruction +^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = sub <ty> <op1>, <op2> ; yields {ty}:result + <result> = sub nuw <ty> <op1>, <op2> ; yields {ty}:result + <result> = sub nsw <ty> <op1>, <op2> ; yields {ty}:result + <result> = sub nuw nsw <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``sub``' instruction returns the difference of its two operands. + +Note that the '``sub``' instruction is used to represent the '``neg``' +instruction present in most other intermediate representations. + +Arguments: +"""""""""" + +The two arguments to the '``sub``' instruction must be +:ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both +arguments must have identical types. + +Semantics: +"""""""""" + +The value produced is the integer difference of the two operands. + +If the difference has unsigned overflow, the result returned is the +mathematical result modulo 2\ :sup:`n`\ , where n is the bit width of +the result. + +Because LLVM integers use a two's complement representation, this +instruction is appropriate for both signed and unsigned integers. + +``nuw`` and ``nsw`` stand for "No Unsigned Wrap" and "No Signed Wrap", +respectively. If the ``nuw`` and/or ``nsw`` keywords are present, the +result value of the ``sub`` is a :ref:`poison value <poisonvalues>` if +unsigned and/or signed overflow, respectively, occurs. + +Example: +"""""""" + +.. code-block:: llvm + + <result> = sub i32 4, %var ; yields {i32}:result = 4 - %var + <result> = sub i32 0, %val ; yields {i32}:result = -%var + +.. _i_fsub: + +'``fsub``' Instruction +^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = fsub [fast-math flags]* <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``fsub``' instruction returns the difference of its two operands. + +Note that the '``fsub``' instruction is used to represent the '``fneg``' +instruction present in most other intermediate representations. + +Arguments: +"""""""""" + +The two arguments to the '``fsub``' instruction must be :ref:`floating +point <t_floating>` or :ref:`vector <t_vector>` of floating point values. +Both arguments must have identical types. + +Semantics: +"""""""""" + +The value produced is the floating point difference of the two operands. +This instruction can also take any number of :ref:`fast-math +flags <fastmath>`, which are optimization hints to enable otherwise +unsafe floating point optimizations: + +Example: +"""""""" + +.. code-block:: llvm + + <result> = fsub float 4.0, %var ; yields {float}:result = 4.0 - %var + <result> = fsub float -0.0, %val ; yields {float}:result = -%var + +'``mul``' Instruction +^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = mul <ty> <op1>, <op2> ; yields {ty}:result + <result> = mul nuw <ty> <op1>, <op2> ; yields {ty}:result + <result> = mul nsw <ty> <op1>, <op2> ; yields {ty}:result + <result> = mul nuw nsw <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``mul``' instruction returns the product of its two operands. 
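
As a hedged aside, when a full product is needed (a point expanded on under
Semantics below), the usual idiom is to widen the operands before the
multiply; the value names here are hypothetical:

.. code-block:: llvm

  %a64  = sext i32 %a to i64        ; sign-extend both operands first
  %b64  = sext i32 %b to i64
  %full = mul i64 %a64, %b64        ; full 64-bit product of two i32 values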
+ +Arguments: +"""""""""" + +The two arguments to the '``mul``' instruction must be +:ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both +arguments must have identical types. + +Semantics: +"""""""""" + +The value produced is the integer product of the two operands. + +If the result of the multiplication has unsigned overflow, the result +returned is the mathematical result modulo 2\ :sup:`n`\ , where n is the +bit width of the result. + +Because LLVM integers use a two's complement representation, and the +result is the same width as the operands, this instruction returns the +correct result for both signed and unsigned integers. If a full product +(e.g. ``i32`` * ``i32`` -> ``i64``) is needed, the operands should be +sign-extended or zero-extended as appropriate to the width of the full +product. + +``nuw`` and ``nsw`` stand for "No Unsigned Wrap" and "No Signed Wrap", +respectively. If the ``nuw`` and/or ``nsw`` keywords are present, the +result value of the ``mul`` is a :ref:`poison value <poisonvalues>` if +unsigned and/or signed overflow, respectively, occurs. + +Example: +"""""""" + +.. code-block:: llvm + + <result> = mul i32 4, %var ; yields {i32}:result = 4 * %var + +.. _i_fmul: + +'``fmul``' Instruction +^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = fmul [fast-math flags]* <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``fmul``' instruction returns the product of its two operands. + +Arguments: +"""""""""" + +The two arguments to the '``fmul``' instruction must be :ref:`floating +point <t_floating>` or :ref:`vector <t_vector>` of floating point values. +Both arguments must have identical types. + +Semantics: +"""""""""" + +The value produced is the floating point product of the two operands. +This instruction can also take any number of :ref:`fast-math +flags <fastmath>`, which are optimization hints to enable otherwise +unsafe floating point optimizations: + +Example: +"""""""" + +.. code-block:: llvm + + <result> = fmul float 4.0, %var ; yields {float}:result = 4.0 * %var + +'``udiv``' Instruction +^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = udiv <ty> <op1>, <op2> ; yields {ty}:result + <result> = udiv exact <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``udiv``' instruction returns the quotient of its two operands. + +Arguments: +"""""""""" + +The two arguments to the '``udiv``' instruction must be +:ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both +arguments must have identical types. + +Semantics: +"""""""""" + +The value produced is the unsigned integer quotient of the two operands. + +Note that unsigned integer division and signed integer division are +distinct operations; for signed integer division, use '``sdiv``'. + +Division by zero leads to undefined behavior. + +If the ``exact`` keyword is present, the result value of the ``udiv`` is +a :ref:`poison value <poisonvalues>` if %op1 is not a multiple of %op2 (as +such, "((a udiv exact b) mul b) == a"). + +Example: +"""""""" + +.. code-block:: llvm + + <result> = udiv i32 4, %var ; yields {i32}:result = 4 / %var + +'``sdiv``' Instruction +^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = sdiv <ty> <op1>, <op2> ; yields {ty}:result + <result> = sdiv exact <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``sdiv``' instruction returns the quotient of its two operands. 
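
A hedged sketch of the ``exact`` form described under Semantics below, as it
is commonly produced when dividing a byte offset already known to be a
multiple of the element size; the value names are illustrative:

.. code-block:: llvm

  %bytes = sub nsw i64 %end, %begin      ; byte distance between two offsets
  %count = sdiv exact i64 %bytes, 4      ; assumes %bytes is a multiple of 4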
+ +Arguments: +"""""""""" + +The two arguments to the '``sdiv``' instruction must be +:ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both +arguments must have identical types. + +Semantics: +"""""""""" + +The value produced is the signed integer quotient of the two operands +rounded towards zero. + +Note that signed integer division and unsigned integer division are +distinct operations; for unsigned integer division, use '``udiv``'. + +Division by zero leads to undefined behavior. Overflow also leads to +undefined behavior; this is a rare case, but can occur, for example, by +doing a 32-bit division of -2147483648 by -1. + +If the ``exact`` keyword is present, the result value of the ``sdiv`` is +a :ref:`poison value <poisonvalues>` if the result would be rounded. + +Example: +"""""""" + +.. code-block:: llvm + + <result> = sdiv i32 4, %var ; yields {i32}:result = 4 / %var + +.. _i_fdiv: + +'``fdiv``' Instruction +^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = fdiv [fast-math flags]* <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``fdiv``' instruction returns the quotient of its two operands. + +Arguments: +"""""""""" + +The two arguments to the '``fdiv``' instruction must be :ref:`floating +point <t_floating>` or :ref:`vector <t_vector>` of floating point values. +Both arguments must have identical types. + +Semantics: +"""""""""" + +The value produced is the floating point quotient of the two operands. +This instruction can also take any number of :ref:`fast-math +flags <fastmath>`, which are optimization hints to enable otherwise +unsafe floating point optimizations: + +Example: +"""""""" + +.. code-block:: llvm + + <result> = fdiv float 4.0, %var ; yields {float}:result = 4.0 / %var + +'``urem``' Instruction +^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = urem <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``urem``' instruction returns the remainder from the unsigned +division of its two arguments. + +Arguments: +"""""""""" + +The two arguments to the '``urem``' instruction must be +:ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both +arguments must have identical types. + +Semantics: +"""""""""" + +This instruction returns the unsigned integer *remainder* of a division. +This instruction always performs an unsigned division to get the +remainder. + +Note that unsigned integer remainder and signed integer remainder are +distinct operations; for signed integer remainder, use '``srem``'. + +Taking the remainder of a division by zero leads to undefined behavior. + +Example: +"""""""" + +.. code-block:: llvm + + <result> = urem i32 4, %var ; yields {i32}:result = 4 % %var + +'``srem``' Instruction +^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = srem <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``srem``' instruction returns the remainder from the signed +division of its two operands. This instruction can also take +:ref:`vector <t_vector>` versions of the values in which case the elements +must be integers. + +Arguments: +"""""""""" + +The two arguments to the '``srem``' instruction must be +:ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both +arguments must have identical types. 
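
To make the sign convention described below concrete, here is a small
hedged sketch with constant operands, chosen so the results can be checked
by hand:

.. code-block:: llvm

  %r1 = srem i32 7, 3       ; yields {i32}:result = 1
  %r2 = srem i32 -7, 3      ; yields {i32}:result = -1 (sign of the dividend)
  %r3 = srem i32 7, -3      ; yields {i32}:result = 1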
+ +Semantics: +"""""""""" + +This instruction returns the *remainder* of a division (where the result +is either zero or has the same sign as the dividend, ``op1``), not the +*modulo* operator (where the result is either zero or has the same sign +as the divisor, ``op2``) of a value. For more information about the +difference, see `The Math +Forum <http://mathforum.org/dr.math/problems/anne.4.28.99.html>`_. For a +table of how this is implemented in various languages, please see +`Wikipedia: modulo +operation <http://en.wikipedia.org/wiki/Modulo_operation>`_. + +Note that signed integer remainder and unsigned integer remainder are +distinct operations; for unsigned integer remainder, use '``urem``'. + +Taking the remainder of a division by zero leads to undefined behavior. +Overflow also leads to undefined behavior; this is a rare case, but can +occur, for example, by taking the remainder of a 32-bit division of +-2147483648 by -1. (The remainder doesn't actually overflow, but this +rule lets srem be implemented using instructions that return both the +result of the division and the remainder.) + +Example: +"""""""" + +.. code-block:: llvm + + <result> = srem i32 4, %var ; yields {i32}:result = 4 % %var + +.. _i_frem: + +'``frem``' Instruction +^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = frem [fast-math flags]* <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``frem``' instruction returns the remainder from the division of +its two operands. + +Arguments: +"""""""""" + +The two arguments to the '``frem``' instruction must be :ref:`floating +point <t_floating>` or :ref:`vector <t_vector>` of floating point values. +Both arguments must have identical types. + +Semantics: +"""""""""" + +This instruction returns the *remainder* of a division. The remainder +has the same sign as the dividend. This instruction can also take any +number of :ref:`fast-math flags <fastmath>`, which are optimization hints +to enable otherwise unsafe floating point optimizations: + +Example: +"""""""" + +.. code-block:: llvm + + <result> = frem float 4.0, %var ; yields {float}:result = 4.0 % %var + +.. _bitwiseops: + +Bitwise Binary Operations +------------------------- + +Bitwise binary operators are used to do various forms of bit-twiddling +in a program. They are generally very efficient instructions and can +commonly be strength reduced from other instructions. They require two +operands of the same type, execute an operation on them, and produce a +single value. The resulting value is the same type as its operands. + +'``shl``' Instruction +^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = shl <ty> <op1>, <op2> ; yields {ty}:result + <result> = shl nuw <ty> <op1>, <op2> ; yields {ty}:result + <result> = shl nsw <ty> <op1>, <op2> ; yields {ty}:result + <result> = shl nuw nsw <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``shl``' instruction returns the first operand shifted to the left +a specified number of bits. + +Arguments: +"""""""""" + +Both arguments to the '``shl``' instruction must be the same +:ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer type. +'``op2``' is treated as an unsigned value. + +Semantics: +"""""""""" + +The value produced is ``op1`` \* 2\ :sup:`op2` mod 2\ :sup:`n`, +where ``n`` is the width of the result. If ``op2`` is (statically or +dynamically) negative or equal to or larger than the number of bits in +``op1``, the result is undefined. 
If the arguments are vectors, each +vector element of ``op1`` is shifted by the corresponding shift amount +in ``op2``. + +If the ``nuw`` keyword is present, then the shift produces a :ref:`poison +value <poisonvalues>` if it shifts out any non-zero bits. If the +``nsw`` keyword is present, then the shift produces a :ref:`poison +value <poisonvalues>` if it shifts out any bits that disagree with the +resultant sign bit. As such, NUW/NSW have the same semantics as they +would if the shift were expressed as a mul instruction with the same +nsw/nuw bits in (mul %op1, (shl 1, %op2)). + +Example: +"""""""" + +.. code-block:: llvm + + <result> = shl i32 4, %var ; yields {i32}: 4 << %var + <result> = shl i32 4, 2 ; yields {i32}: 16 + <result> = shl i32 1, 10 ; yields {i32}: 1024 + <result> = shl i32 1, 32 ; undefined + <result> = shl <2 x i32> < i32 1, i32 1>, < i32 1, i32 2> ; yields: result=<2 x i32> < i32 2, i32 4> + +'``lshr``' Instruction +^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = lshr <ty> <op1>, <op2> ; yields {ty}:result + <result> = lshr exact <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``lshr``' instruction (logical shift right) returns the first +operand shifted to the right a specified number of bits with zero fill. + +Arguments: +"""""""""" + +Both arguments to the '``lshr``' instruction must be the same +:ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer type. +'``op2``' is treated as an unsigned value. + +Semantics: +"""""""""" + +This instruction always performs a logical shift right operation. The +most significant bits of the result will be filled with zero bits after +the shift. If ``op2`` is (statically or dynamically) equal to or larger +than the number of bits in ``op1``, the result is undefined. If the +arguments are vectors, each vector element of ``op1`` is shifted by the +corresponding shift amount in ``op2``. + +If the ``exact`` keyword is present, the result value of the ``lshr`` is +a :ref:`poison value <poisonvalues>` if any of the bits shifted out are +non-zero. + +Example: +"""""""" + +.. code-block:: llvm + + <result> = lshr i32 4, 1 ; yields {i32}:result = 2 + <result> = lshr i32 4, 2 ; yields {i32}:result = 1 + <result> = lshr i8 4, 3 ; yields {i8}:result = 0 + <result> = lshr i8 -2, 1 ; yields {i8}:result = 0x7FFFFFFF + <result> = lshr i32 1, 32 ; undefined + <result> = lshr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 2> ; yields: result=<2 x i32> < i32 0x7FFFFFFF, i32 1> + +'``ashr``' Instruction +^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = ashr <ty> <op1>, <op2> ; yields {ty}:result + <result> = ashr exact <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``ashr``' instruction (arithmetic shift right) returns the first +operand shifted to the right a specified number of bits with sign +extension. + +Arguments: +"""""""""" + +Both arguments to the '``ashr``' instruction must be the same +:ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer type. +'``op2``' is treated as an unsigned value. + +Semantics: +"""""""""" + +This instruction always performs an arithmetic shift right operation, +The most significant bits of the result will be filled with the sign bit +of ``op1``. If ``op2`` is (statically or dynamically) equal to or larger +than the number of bits in ``op1``, the result is undefined. If the +arguments are vectors, each vector element of ``op1`` is shifted by the +corresponding shift amount in ``op2``. 
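
To make the sign-fill behavior concrete, a brief hedged comparison with
``lshr`` on a negative operand (constants chosen so the results are easy to
verify):

.. code-block:: llvm

  %s = ashr i32 -8, 2     ; yields {i32}:result = -2 (sign bits shifted in)
  %u = lshr i32 -8, 2     ; yields {i32}:result = 1073741822 (zero bits shifted in)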
+ +If the ``exact`` keyword is present, the result value of the ``ashr`` is +a :ref:`poison value <poisonvalues>` if any of the bits shifted out are +non-zero. + +Example: +"""""""" + +.. code-block:: llvm + + <result> = ashr i32 4, 1 ; yields {i32}:result = 2 + <result> = ashr i32 4, 2 ; yields {i32}:result = 1 + <result> = ashr i8 4, 3 ; yields {i8}:result = 0 + <result> = ashr i8 -2, 1 ; yields {i8}:result = -1 + <result> = ashr i32 1, 32 ; undefined + <result> = ashr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 3> ; yields: result=<2 x i32> < i32 -1, i32 0> + +'``and``' Instruction +^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = and <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``and``' instruction returns the bitwise logical and of its two +operands. + +Arguments: +"""""""""" + +The two arguments to the '``and``' instruction must be +:ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both +arguments must have identical types. + +Semantics: +"""""""""" + +The truth table used for the '``and``' instruction is: + ++-----+-----+-----+ +| In0 | In1 | Out | ++-----+-----+-----+ +| 0 | 0 | 0 | ++-----+-----+-----+ +| 0 | 1 | 0 | ++-----+-----+-----+ +| 1 | 0 | 0 | ++-----+-----+-----+ +| 1 | 1 | 1 | ++-----+-----+-----+ + +Example: +"""""""" + +.. code-block:: llvm + + <result> = and i32 4, %var ; yields {i32}:result = 4 & %var + <result> = and i32 15, 40 ; yields {i32}:result = 8 + <result> = and i32 4, 8 ; yields {i32}:result = 0 + +'``or``' Instruction +^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = or <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``or``' instruction returns the bitwise logical inclusive or of its +two operands. + +Arguments: +"""""""""" + +The two arguments to the '``or``' instruction must be +:ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both +arguments must have identical types. + +Semantics: +"""""""""" + +The truth table used for the '``or``' instruction is: + ++-----+-----+-----+ +| In0 | In1 | Out | ++-----+-----+-----+ +| 0 | 0 | 0 | ++-----+-----+-----+ +| 0 | 1 | 1 | ++-----+-----+-----+ +| 1 | 0 | 1 | ++-----+-----+-----+ +| 1 | 1 | 1 | ++-----+-----+-----+ + +Example: +"""""""" + +:: + + <result> = or i32 4, %var ; yields {i32}:result = 4 | %var + <result> = or i32 15, 40 ; yields {i32}:result = 47 + <result> = or i32 4, 8 ; yields {i32}:result = 12 + +'``xor``' Instruction +^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = xor <ty> <op1>, <op2> ; yields {ty}:result + +Overview: +""""""""" + +The '``xor``' instruction returns the bitwise logical exclusive or of +its two operands. The ``xor`` is used to implement the "one's +complement" operation, which is the "~" operator in C. + +Arguments: +"""""""""" + +The two arguments to the '``xor``' instruction must be +:ref:`integer <t_integer>` or :ref:`vector <t_vector>` of integer values. Both +arguments must have identical types. + +Semantics: +"""""""""" + +The truth table used for the '``xor``' instruction is: + ++-----+-----+-----+ +| In0 | In1 | Out | ++-----+-----+-----+ +| 0 | 0 | 0 | ++-----+-----+-----+ +| 0 | 1 | 1 | ++-----+-----+-----+ +| 1 | 0 | 1 | ++-----+-----+-----+ +| 1 | 1 | 0 | ++-----+-----+-----+ + +Example: +"""""""" + +.. 
code-block:: llvm + + <result> = xor i32 4, %var ; yields {i32}:result = 4 ^ %var + <result> = xor i32 15, 40 ; yields {i32}:result = 39 + <result> = xor i32 4, 8 ; yields {i32}:result = 12 + <result> = xor i32 %V, -1 ; yields {i32}:result = ~%V + +Vector Operations +----------------- + +LLVM supports several instructions to represent vector operations in a +target-independent manner. These instructions cover the element-access +and vector-specific operations needed to process vectors effectively. +While LLVM does directly support these vector operations, many +sophisticated algorithms will want to use target-specific intrinsics to +take full advantage of a specific target. + +.. _i_extractelement: + +'``extractelement``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = extractelement <n x <ty>> <val>, i32 <idx> ; yields <ty> + +Overview: +""""""""" + +The '``extractelement``' instruction extracts a single scalar element +from a vector at a specified index. + +Arguments: +"""""""""" + +The first operand of an '``extractelement``' instruction is a value of +:ref:`vector <t_vector>` type. The second operand is an index indicating +the position from which to extract the element. The index may be a +variable. + +Semantics: +"""""""""" + +The result is a scalar of the same type as the element type of ``val``. +Its value is the value at position ``idx`` of ``val``. If ``idx`` +exceeds the length of ``val``, the results are undefined. + +Example: +"""""""" + +.. code-block:: llvm + + <result> = extractelement <4 x i32> %vec, i32 0 ; yields i32 + +.. _i_insertelement: + +'``insertelement``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = insertelement <n x <ty>> <val>, <ty> <elt>, i32 <idx> ; yields <n x <ty>> + +Overview: +""""""""" + +The '``insertelement``' instruction inserts a scalar element into a +vector at a specified index. + +Arguments: +"""""""""" + +The first operand of an '``insertelement``' instruction is a value of +:ref:`vector <t_vector>` type. The second operand is a scalar value whose +type must equal the element type of the first operand. The third operand +is an index indicating the position at which to insert the value. The +index may be a variable. + +Semantics: +"""""""""" + +The result is a vector of the same type as ``val``. Its element values +are those of ``val`` except at position ``idx``, where it gets the value +``elt``. If ``idx`` exceeds the length of ``val``, the results are +undefined. + +Example: +"""""""" + +.. code-block:: llvm + + <result> = insertelement <4 x i32> %vec, i32 1, i32 0 ; yields <4 x i32> + +.. _i_shufflevector: + +'``shufflevector``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <m x i32> <mask> ; yields <m x <ty>> + +Overview: +""""""""" + +The '``shufflevector``' instruction constructs a permutation of elements +from two input vectors, returning a vector with the same element type as +the input and length that is the same as the shuffle mask. + +Arguments: +"""""""""" + +The first two operands of a '``shufflevector``' instruction are vectors +with the same type. The third argument is a shuffle mask whose element +type is always 'i32'. The result of the instruction is a vector whose +length is the same as the shuffle mask and whose element type is the +same as the element type of the first two operands. 
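
As an aside, a common hedged idiom built from ``insertelement`` and
``shufflevector`` is broadcasting ("splatting") a scalar into every lane;
it relies on the constant mask described in the next paragraph, and the
value names here are hypothetical:

.. code-block:: llvm

  %tmp   = insertelement <4 x float> undef, float %x, i32 0
  %splat = shufflevector <4 x float> %tmp, <4 x float> undef,
                         <4 x i32> zeroinitializer   ; every lane selects element 0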
+ +The shuffle mask operand is required to be a constant vector with either +constant integer or undef values. + +Semantics: +"""""""""" + +The elements of the two input vectors are numbered from left to right +across both of the vectors. The shuffle mask operand specifies, for each +element of the result vector, which element of the two input vectors the +result element gets. The element selector may be undef (meaning "don't +care") and the second operand may be undef if performing a shuffle from +only one vector. + +Example: +"""""""" + +.. code-block:: llvm + + <result> = shufflevector <4 x i32> %v1, <4 x i32> %v2, + <4 x i32> <i32 0, i32 4, i32 1, i32 5> ; yields <4 x i32> + <result> = shufflevector <4 x i32> %v1, <4 x i32> undef, + <4 x i32> <i32 0, i32 1, i32 2, i32 3> ; yields <4 x i32> - Identity shuffle. + <result> = shufflevector <8 x i32> %v1, <8 x i32> undef, + <4 x i32> <i32 0, i32 1, i32 2, i32 3> ; yields <4 x i32> + <result> = shufflevector <4 x i32> %v1, <4 x i32> %v2, + <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7 > ; yields <8 x i32> + +Aggregate Operations +-------------------- + +LLVM supports several instructions for working with +:ref:`aggregate <t_aggregate>` values. + +.. _i_extractvalue: + +'``extractvalue``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = extractvalue <aggregate type> <val>, <idx>{, <idx>}* + +Overview: +""""""""" + +The '``extractvalue``' instruction extracts the value of a member field +from an :ref:`aggregate <t_aggregate>` value. + +Arguments: +"""""""""" + +The first operand of an '``extractvalue``' instruction is a value of +:ref:`struct <t_struct>` or :ref:`array <t_array>` type. The operands are +constant indices to specify which value to extract in a similar manner +as indices in a '``getelementptr``' instruction. + +The major differences to ``getelementptr`` indexing are: + +- Since the value being indexed is not a pointer, the first index is + omitted and assumed to be zero. +- At least one index must be specified. +- Not only struct indices but also array indices must be in bounds. + +Semantics: +"""""""""" + +The result is the value at the position in the aggregate specified by +the index operands. + +Example: +"""""""" + +.. code-block:: llvm + + <result> = extractvalue {i32, float} %agg, 0 ; yields i32 + +.. _i_insertvalue: + +'``insertvalue``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = insertvalue <aggregate type> <val>, <ty> <elt>, <idx>{, <idx>}* ; yields <aggregate type> + +Overview: +""""""""" + +The '``insertvalue``' instruction inserts a value into a member field in +an :ref:`aggregate <t_aggregate>` value. + +Arguments: +"""""""""" + +The first operand of an '``insertvalue``' instruction is a value of +:ref:`struct <t_struct>` or :ref:`array <t_array>` type. The second operand is +a first-class value to insert. The following operands are constant +indices indicating the position at which to insert the value in a +similar manner as indices in a '``extractvalue``' instruction. The value +to insert must have the same type as the value identified by the +indices. + +Semantics: +"""""""""" + +The result is an aggregate of the same type as ``val``. Its value is +that of ``val`` except that the value at the position specified by the +indices is that of ``elt``. + +Example: +"""""""" + +.. 
code-block:: llvm + + %agg1 = insertvalue {i32, float} undef, i32 1, 0 ; yields {i32 1, float undef} + %agg2 = insertvalue {i32, float} %agg1, float %val, 1 ; yields {i32 1, float %val} + %agg3 = insertvalue {i32, {float}} %agg1, float %val, 1, 0 ; yields {i32 1, float %val} + +.. _memoryops: + +Memory Access and Addressing Operations +--------------------------------------- + +A key design point of an SSA-based representation is how it represents +memory. In LLVM, no memory locations are in SSA form, which makes things +very simple. This section describes how to read, write, and allocate +memory in LLVM. + +.. _i_alloca: + +'``alloca``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = alloca <type>[, <ty> <NumElements>][, align <alignment>] ; yields {type*}:result + +Overview: +""""""""" + +The '``alloca``' instruction allocates memory on the stack frame of the +currently executing function, to be automatically released when this +function returns to its caller. The object is always allocated in the +generic address space (address space zero). + +Arguments: +"""""""""" + +The '``alloca``' instruction allocates ``sizeof(<type>)*NumElements`` +bytes of memory on the runtime stack, returning a pointer of the +appropriate type to the program. If "NumElements" is specified, it is +the number of elements allocated, otherwise "NumElements" is defaulted +to be one. If a constant alignment is specified, the value result of the +allocation is guaranteed to be aligned to at least that boundary. If not +specified, or if zero, the target can choose to align the allocation on +any convenient boundary compatible with the type. + +'``type``' may be any sized type. + +Semantics: +"""""""""" + +Memory is allocated; a pointer is returned. The operation is undefined +if there is insufficient stack space for the allocation. '``alloca``'d +memory is automatically released when the function returns. The +'``alloca``' instruction is commonly used to represent automatic +variables that must have an address available. When the function returns +(either with the ``ret`` or ``resume`` instructions), the memory is +reclaimed. Allocating zero bytes is legal, but the result is undefined. +The order in which memory is allocated (ie., which way the stack grows) +is not specified. + +Example: +"""""""" + +.. code-block:: llvm + + %ptr = alloca i32 ; yields {i32*}:ptr + %ptr = alloca i32, i32 4 ; yields {i32*}:ptr + %ptr = alloca i32, i32 4, align 1024 ; yields {i32*}:ptr + %ptr = alloca i32, align 1024 ; yields {i32*}:ptr + +.. _i_load: + +'``load``' Instruction +^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = load [volatile] <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>][, !invariant.load !<index>] + <result> = load atomic [volatile] <ty>* <pointer> [singlethread] <ordering>, align <alignment> + !<index> = !{ i32 1 } + +Overview: +""""""""" + +The '``load``' instruction is used to read from memory. + +Arguments: +"""""""""" + +The argument to the '``load``' instruction specifies the memory address +from which to load. The pointer must point to a :ref:`first +class <t_firstclass>` type. If the ``load`` is marked as ``volatile``, +then the optimizer is not allowed to modify the number or order of +execution of this ``load`` with other :ref:`volatile +operations <volatile>`. + +If the ``load`` is marked as ``atomic``, it takes an extra +:ref:`ordering <ordering>` and optional ``singlethread`` argument. 
The +``release`` and ``acq_rel`` orderings are not valid on ``load`` +instructions. Atomic loads produce :ref:`defined <memmodel>` results +when they may see multiple atomic stores. The type of the pointee must +be an integer type whose bit width is a power of two greater than or +equal to eight and less than or equal to a target-specific size limit. +``align`` must be explicitly specified on atomic loads, and the load has +undefined behavior if the alignment is not set to a value which is at +least the size in bytes of the pointee. ``!nontemporal`` does not have +any defined semantics for atomic loads. + +The optional constant ``align`` argument specifies the alignment of the +operation (that is, the alignment of the memory address). A value of 0 +or an omitted ``align`` argument means that the operation has the abi +alignment for the target. It is the responsibility of the code emitter +to ensure that the alignment information is correct. Overestimating the +alignment results in undefined behavior. Underestimating the alignment +may produce less efficient code. An alignment of 1 is always safe. + +The optional ``!nontemporal`` metadata must reference a single +metatadata name <index> corresponding to a metadata node with one +``i32`` entry of value 1. The existence of the ``!nontemporal`` +metatadata on the instruction tells the optimizer and code generator +that this load is not expected to be reused in the cache. The code +generator may select special instructions to save cache bandwidth, such +as the ``MOVNT`` instruction on x86. + +The optional ``!invariant.load`` metadata must reference a single +metatadata name <index> corresponding to a metadata node with no +entries. The existence of the ``!invariant.load`` metatadata on the +instruction tells the optimizer and code generator that this load +address points to memory which does not change value during program +execution. The optimizer may then move this load around, for example, by +hoisting it out of loops using loop invariant code motion. + +Semantics: +"""""""""" + +The location of memory pointed to is loaded. If the value being loaded +is of scalar type then the number of bytes read does not exceed the +minimum number of bytes needed to hold all bits of the type. For +example, loading an ``i24`` reads at most three bytes. When loading a +value of a type like ``i20`` with a size that is not an integral number +of bytes, the result is undefined if the value was not originally +written using a store of the same type. + +Examples: +""""""""" + +.. code-block:: llvm + + %ptr = alloca i32 ; yields {i32*}:ptr + store i32 3, i32* %ptr ; yields {void} + %val = load i32* %ptr ; yields {i32}:val = i32 3 + +.. _i_store: + +'``store``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + store [volatile] <ty> <value>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>] ; yields {void} + store atomic [volatile] <ty> <value>, <ty>* <pointer> [singlethread] <ordering>, align <alignment> ; yields {void} + +Overview: +""""""""" + +The '``store``' instruction is used to write to memory. + +Arguments: +"""""""""" + +There are two arguments to the '``store``' instruction: a value to store +and an address at which to store it. The type of the '``<pointer>``' +operand must be a pointer to the :ref:`first class <t_firstclass>` type of +the '``<value>``' operand. 
If the ``store`` is marked as ``volatile``, +then the optimizer is not allowed to modify the number or order of +execution of this ``store`` with other :ref:`volatile +operations <volatile>`. + +If the ``store`` is marked as ``atomic``, it takes an extra +:ref:`ordering <ordering>` and optional ``singlethread`` argument. The +``acquire`` and ``acq_rel`` orderings aren't valid on ``store`` +instructions. Atomic loads produce :ref:`defined <memmodel>` results +when they may see multiple atomic stores. The type of the pointee must +be an integer type whose bit width is a power of two greater than or +equal to eight and less than or equal to a target-specific size limit. +``align`` must be explicitly specified on atomic stores, and the store +has undefined behavior if the alignment is not set to a value which is +at least the size in bytes of the pointee. ``!nontemporal`` does not +have any defined semantics for atomic stores. + +The optional constant "align" argument specifies the alignment of the +operation (that is, the alignment of the memory address). A value of 0 +or an omitted "align" argument means that the operation has the abi +alignment for the target. It is the responsibility of the code emitter +to ensure that the alignment information is correct. Overestimating the +alignment results in an undefined behavior. Underestimating the +alignment may produce less efficient code. An alignment of 1 is always +safe. + +The optional !nontemporal metadata must reference a single metatadata +name <index> corresponding to a metadata node with one i32 entry of +value 1. The existence of the !nontemporal metatadata on the instruction +tells the optimizer and code generator that this load is not expected to +be reused in the cache. The code generator may select special +instructions to save cache bandwidth, such as the MOVNT instruction on +x86. + +Semantics: +"""""""""" + +The contents of memory are updated to contain '``<value>``' at the +location specified by the '``<pointer>``' operand. If '``<value>``' is +of scalar type then the number of bytes written does not exceed the +minimum number of bytes needed to hold all bits of the type. For +example, storing an ``i24`` writes at most three bytes. When writing a +value of a type like ``i20`` with a size that is not an integral number +of bytes, it is unspecified what happens to the extra bits that do not +belong to the type, but they will typically be overwritten. + +Example: +"""""""" + +.. code-block:: llvm + + %ptr = alloca i32 ; yields {i32*}:ptr + store i32 3, i32* %ptr ; yields {void} + %val = load i32* %ptr ; yields {i32}:val = i32 3 + +.. _i_fence: + +'``fence``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + fence [singlethread] <ordering> ; yields {void} + +Overview: +""""""""" + +The '``fence``' instruction is used to introduce happens-before edges +between operations. + +Arguments: +"""""""""" + +'``fence``' instructions take an :ref:`ordering <ordering>` argument which +defines what *synchronizes-with* edges they add. They can only be given +``acquire``, ``release``, ``acq_rel``, and ``seq_cst`` orderings. 
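
Before the formal rules below, a hedged sketch of the usual pairing: a
``release`` fence before an atomic store publishes data that a consumer
reads after an ``acquire`` fence. The globals and value names here are
purely illustrative:

.. code-block:: llvm

  ; Producer: write the payload, then publish it via @flag
  store i32 %data, i32* @payload
  fence release
  store atomic i32 1, i32* @flag monotonic, align 4

  ; Consumer: once @flag is observed as 1, acquire before reading @payload
  %f = load atomic i32* @flag monotonic, align 4
  fence acquire
  %d = load i32* @payload

In this sketch the fences, rather than the atomic accesses themselves, carry
the ``release`` and ``acquire`` semantics, which is exactly the situation the
Semantics paragraph below describes.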
+ +Semantics: +"""""""""" + +A fence A which has (at least) ``release`` ordering semantics +*synchronizes with* a fence B with (at least) ``acquire`` ordering +semantics if and only if there exist atomic operations X and Y, both +operating on some atomic object M, such that A is sequenced before X, X +modifies M (either directly or through some side effect of a sequence +headed by X), Y is sequenced before B, and Y observes M. This provides a +*happens-before* dependency between A and B. Rather than an explicit +``fence``, one (but not both) of the atomic operations X or Y might +provide a ``release`` or ``acquire`` (resp.) ordering constraint and +still *synchronize-with* the explicit ``fence`` and establish the +*happens-before* edge. + +A ``fence`` which has ``seq_cst`` ordering, in addition to having both +``acquire`` and ``release`` semantics specified above, participates in +the global program order of other ``seq_cst`` operations and/or fences. + +The optional ":ref:`singlethread <singlethread>`" argument specifies +that the fence only synchronizes with other fences in the same thread. +(This is useful for interacting with signal handlers.) + +Example: +"""""""" + +.. code-block:: llvm + + fence acquire ; yields {void} + fence singlethread seq_cst ; yields {void} + +.. _i_cmpxchg: + +'``cmpxchg``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + cmpxchg [volatile] <ty>* <pointer>, <ty> <cmp>, <ty> <new> [singlethread] <ordering> ; yields {ty} + +Overview: +""""""""" + +The '``cmpxchg``' instruction is used to atomically modify memory. It +loads a value in memory and compares it to a given value. If they are +equal, it stores a new value into the memory. + +Arguments: +"""""""""" + +There are three arguments to the '``cmpxchg``' instruction: an address +to operate on, a value to compare to the value currently be at that +address, and a new value to place at that address if the compared values +are equal. The type of '<cmp>' must be an integer type whose bit width +is a power of two greater than or equal to eight and less than or equal +to a target-specific size limit. '<cmp>' and '<new>' must have the same +type, and the type of '<pointer>' must be a pointer to that type. If the +``cmpxchg`` is marked as ``volatile``, then the optimizer is not allowed +to modify the number or order of execution of this ``cmpxchg`` with +other :ref:`volatile operations <volatile>`. + +The :ref:`ordering <ordering>` argument specifies how this ``cmpxchg`` +synchronizes with other atomic operations. + +The optional "``singlethread``" argument declares that the ``cmpxchg`` +is only atomic with respect to code (usually signal handlers) running in +the same thread as the ``cmpxchg``. Otherwise the cmpxchg is atomic with +respect to all other code in the system. + +The pointer passed into cmpxchg must have alignment greater than or +equal to the size in memory of the operand. + +Semantics: +"""""""""" + +The contents of memory at the location specified by the '``<pointer>``' +operand is read and compared to '``<cmp>``'; if the read value is the +equal, '``<new>``' is written. The original value at the location is +returned. + +A successful ``cmpxchg`` is a read-modify-write instruction for the purpose +of identifying release sequences. A failed ``cmpxchg`` is equivalent to an +atomic load with an ordering parameter determined by dropping any +``release`` part of the ``cmpxchg``'s ordering. + +Example: +"""""""" + +.. 
code-block:: llvm + + entry: + %orig = atomic load i32* %ptr unordered ; yields {i32} + br label %loop + + loop: + %cmp = phi i32 [ %orig, %entry ], [%old, %loop] + %squared = mul i32 %cmp, %cmp + %old = cmpxchg i32* %ptr, i32 %cmp, i32 %squared ; yields {i32} + %success = icmp eq i32 %cmp, %old + br i1 %success, label %done, label %loop + + done: + ... + +.. _i_atomicrmw: + +'``atomicrmw``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + atomicrmw [volatile] <operation> <ty>* <pointer>, <ty> <value> [singlethread] <ordering> ; yields {ty} + +Overview: +""""""""" + +The '``atomicrmw``' instruction is used to atomically modify memory. + +Arguments: +"""""""""" + +There are three arguments to the '``atomicrmw``' instruction: an +operation to apply, an address whose value to modify, an argument to the +operation. The operation must be one of the following keywords: + +- xchg +- add +- sub +- and +- nand +- or +- xor +- max +- min +- umax +- umin + +The type of '<value>' must be an integer type whose bit width is a power +of two greater than or equal to eight and less than or equal to a +target-specific size limit. The type of the '``<pointer>``' operand must +be a pointer to that type. If the ``atomicrmw`` is marked as +``volatile``, then the optimizer is not allowed to modify the number or +order of execution of this ``atomicrmw`` with other :ref:`volatile +operations <volatile>`. + +Semantics: +"""""""""" + +The contents of memory at the location specified by the '``<pointer>``' +operand are atomically read, modified, and written back. The original +value at the location is returned. The modification is specified by the +operation argument: + +- xchg: ``*ptr = val`` +- add: ``*ptr = *ptr + val`` +- sub: ``*ptr = *ptr - val`` +- and: ``*ptr = *ptr & val`` +- nand: ``*ptr = ~(*ptr & val)`` +- or: ``*ptr = *ptr | val`` +- xor: ``*ptr = *ptr ^ val`` +- max: ``*ptr = *ptr > val ? *ptr : val`` (using a signed comparison) +- min: ``*ptr = *ptr < val ? *ptr : val`` (using a signed comparison) +- umax: ``*ptr = *ptr > val ? *ptr : val`` (using an unsigned + comparison) +- umin: ``*ptr = *ptr < val ? *ptr : val`` (using an unsigned + comparison) + +Example: +"""""""" + +.. code-block:: llvm + + %old = atomicrmw add i32* %ptr, i32 1 acquire ; yields {i32} + +.. _i_getelementptr: + +'``getelementptr``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = getelementptr <pty>* <ptrval>{, <ty> <idx>}* + <result> = getelementptr inbounds <pty>* <ptrval>{, <ty> <idx>}* + <result> = getelementptr <ptr vector> ptrval, <vector index type> idx + +Overview: +""""""""" + +The '``getelementptr``' instruction is used to get the address of a +subelement of an :ref:`aggregate <t_aggregate>` data structure. It performs +address calculation only and does not access memory. + +Arguments: +"""""""""" + +The first argument is always a pointer or a vector of pointers, and +forms the basis of the calculation. The remaining arguments are indices +that indicate which of the elements of the aggregate object are indexed. +The interpretation of each index is dependent on the type being indexed +into. The first index always indexes the pointer value given as the +first argument, the second index indexes a value of the type pointed to +(not necessarily the value directly pointed to, since the first index +can be non-zero), etc. The first type indexed into must be a pointer +value, subsequent types can be arrays, vectors, and structs. 
Note that +subsequent types being indexed into can never be pointers, since that +would require loading the pointer before continuing calculation. + +The type of each index argument depends on the type it is indexing into. +When indexing into a (optionally packed) structure, only ``i32`` integer +**constants** are allowed (when using a vector of indices they must all +be the **same** ``i32`` integer constant). When indexing into an array, +pointer or vector, integers of any width are allowed, and they are not +required to be constant. These integers are treated as signed values +where relevant. + +For example, let's consider a C code fragment and how it gets compiled +to LLVM: + +.. code-block:: c + + struct RT { + char A; + int B[10][20]; + char C; + }; + struct ST { + int X; + double Y; + struct RT Z; + }; + + int *foo(struct ST *s) { + return &s[1].Z.B[5][13]; + } + +The LLVM code generated by Clang is: + +.. code-block:: llvm + + %struct.RT = type { i8, [10 x [20 x i32]], i8 } + %struct.ST = type { i32, double, %struct.RT } + + define i32* @foo(%struct.ST* %s) nounwind uwtable readnone optsize ssp { + entry: + %arrayidx = getelementptr inbounds %struct.ST* %s, i64 1, i32 2, i32 1, i64 5, i64 13 + ret i32* %arrayidx + } + +Semantics: +"""""""""" + +In the example above, the first index is indexing into the +'``%struct.ST*``' type, which is a pointer, yielding a '``%struct.ST``' += '``{ i32, double, %struct.RT }``' type, a structure. The second index +indexes into the third element of the structure, yielding a +'``%struct.RT``' = '``{ i8 , [10 x [20 x i32]], i8 }``' type, another +structure. The third index indexes into the second element of the +structure, yielding a '``[10 x [20 x i32]]``' type, an array. The two +dimensions of the array are subscripted into, yielding an '``i32``' +type. The '``getelementptr``' instruction returns a pointer to this +element, thus computing a value of '``i32*``' type. + +Note that it is perfectly legal to index partially through a structure, +returning a pointer to an inner element. Because of this, the LLVM code +for the given testcase is equivalent to: + +.. code-block:: llvm + + define i32* @foo(%struct.ST* %s) { + %t1 = getelementptr %struct.ST* %s, i32 1 ; yields %struct.ST*:%t1 + %t2 = getelementptr %struct.ST* %t1, i32 0, i32 2 ; yields %struct.RT*:%t2 + %t3 = getelementptr %struct.RT* %t2, i32 0, i32 1 ; yields [10 x [20 x i32]]*:%t3 + %t4 = getelementptr [10 x [20 x i32]]* %t3, i32 0, i32 5 ; yields [20 x i32]*:%t4 + %t5 = getelementptr [20 x i32]* %t4, i32 0, i32 13 ; yields i32*:%t5 + ret i32* %t5 + } + +If the ``inbounds`` keyword is present, the result value of the +``getelementptr`` is a :ref:`poison value <poisonvalues>` if the base +pointer is not an *in bounds* address of an allocated object, or if any +of the addresses that would be formed by successive addition of the +offsets implied by the indices to the base address with infinitely +precise signed arithmetic are not an *in bounds* address of that +allocated object. The *in bounds* addresses for an allocated object are +all the addresses that point into the object, plus the address one byte +past the end. In cases where the base is a vector of pointers the +``inbounds`` keyword applies to each of the computations element-wise. + +If the ``inbounds`` keyword is not present, the offsets are added to the +base address with silently-wrapping two's complement arithmetic. 
If the +offsets have a different width from the pointer, they are sign-extended +or truncated to the width of the pointer. The result value of the +``getelementptr`` may be outside the object pointed to by the base +pointer. The result value may not necessarily be used to access memory +though, even if it happens to point into allocated storage. See the +:ref:`Pointer Aliasing Rules <pointeraliasing>` section for more +information. + +The getelementptr instruction is often confusing. For some more insight +into how it works, see :doc:`the getelementptr FAQ <GetElementPtr>`. + +Example: +"""""""" + +.. code-block:: llvm + + ; yields [12 x i8]*:aptr + %aptr = getelementptr {i32, [12 x i8]}* %saptr, i64 0, i32 1 + ; yields i8*:vptr + %vptr = getelementptr {i32, <2 x i8>}* %svptr, i64 0, i32 1, i32 1 + ; yields i8*:eptr + %eptr = getelementptr [12 x i8]* %aptr, i64 0, i32 1 + ; yields i32*:iptr + %iptr = getelementptr [10 x i32]* @arr, i16 0, i16 0 + +In cases where the pointer argument is a vector of pointers, each index +must be a vector with the same number of elements. For example: + +.. code-block:: llvm + + %A = getelementptr <4 x i8*> %ptrs, <4 x i64> %offsets, + +Conversion Operations +--------------------- + +The instructions in this category are the conversion instructions +(casting) which all take a single operand and a type. They perform +various bit conversions on the operand. + +'``trunc .. to``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = trunc <ty> <value> to <ty2> ; yields ty2 + +Overview: +""""""""" + +The '``trunc``' instruction truncates its operand to the type ``ty2``. + +Arguments: +"""""""""" + +The '``trunc``' instruction takes a value to trunc, and a type to trunc +it to. Both types must be of :ref:`integer <t_integer>` types, or vectors +of the same number of integers. The bit size of the ``value`` must be +larger than the bit size of the destination type, ``ty2``. Equal sized +types are not allowed. + +Semantics: +"""""""""" + +The '``trunc``' instruction truncates the high order bits in ``value`` +and converts the remaining bits to ``ty2``. Since the source size must +be larger than the destination size, ``trunc`` cannot be a *no-op cast*. +It will always truncate bits. + +Example: +"""""""" + +.. code-block:: llvm + + %X = trunc i32 257 to i8 ; yields i8:1 + %Y = trunc i32 123 to i1 ; yields i1:true + %Z = trunc i32 122 to i1 ; yields i1:false + %W = trunc <2 x i16> <i16 8, i16 7> to <2 x i8> ; yields <i8 8, i8 7> + +'``zext .. to``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = zext <ty> <value> to <ty2> ; yields ty2 + +Overview: +""""""""" + +The '``zext``' instruction zero extends its operand to type ``ty2``. + +Arguments: +"""""""""" + +The '``zext``' instruction takes a value to cast, and a type to cast it +to. Both types must be of :ref:`integer <t_integer>` types, or vectors of +the same number of integers. The bit size of the ``value`` must be +smaller than the bit size of the destination type, ``ty2``. + +Semantics: +"""""""""" + +The ``zext`` fills the high order bits of the ``value`` with zero bits +until it reaches the size of the destination type, ``ty2``. + +When zero extending from i1, the result will always be either 0 or 1. + +Example: +"""""""" + +.. code-block:: llvm + + %X = zext i32 257 to i64 ; yields i64:257 + %Y = zext i1 true to i32 ; yields i32:1 + %Z = zext <2 x i16> <i16 8, i16 7> to <2 x i32> ; yields <i32 8, i32 7> + +'``sext .. 
to``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = sext <ty> <value> to <ty2> ; yields ty2 + +Overview: +""""""""" + +The '``sext``' sign extends ``value`` to the type ``ty2``. + +Arguments: +"""""""""" + +The '``sext``' instruction takes a value to cast, and a type to cast it +to. Both types must be of :ref:`integer <t_integer>` types, or vectors of +the same number of integers. The bit size of the ``value`` must be +smaller than the bit size of the destination type, ``ty2``. + +Semantics: +"""""""""" + +The '``sext``' instruction performs a sign extension by copying the sign +bit (highest order bit) of the ``value`` until it reaches the bit size +of the type ``ty2``. + +When sign extending from i1, the extension always results in -1 or 0. + +Example: +"""""""" + +.. code-block:: llvm + + %X = sext i8 -1 to i16 ; yields i16 :65535 + %Y = sext i1 true to i32 ; yields i32:-1 + %Z = sext <2 x i16> <i16 8, i16 7> to <2 x i32> ; yields <i32 8, i32 7> + +'``fptrunc .. to``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = fptrunc <ty> <value> to <ty2> ; yields ty2 + +Overview: +""""""""" + +The '``fptrunc``' instruction truncates ``value`` to type ``ty2``. + +Arguments: +"""""""""" + +The '``fptrunc``' instruction takes a :ref:`floating point <t_floating>` +value to cast and a :ref:`floating point <t_floating>` type to cast it to. +The size of ``value`` must be larger than the size of ``ty2``. This +implies that ``fptrunc`` cannot be used to make a *no-op cast*. + +Semantics: +"""""""""" + +The '``fptrunc``' instruction truncates a ``value`` from a larger +:ref:`floating point <t_floating>` type to a smaller :ref:`floating +point <t_floating>` type. If the value cannot fit within the +destination type, ``ty2``, then the results are undefined. + +Example: +"""""""" + +.. code-block:: llvm + + %X = fptrunc double 123.0 to float ; yields float:123.0 + %Y = fptrunc double 1.0E+300 to float ; yields undefined + +'``fpext .. to``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = fpext <ty> <value> to <ty2> ; yields ty2 + +Overview: +""""""""" + +The '``fpext``' extends a floating point ``value`` to a larger floating +point value. + +Arguments: +"""""""""" + +The '``fpext``' instruction takes a :ref:`floating point <t_floating>` +``value`` to cast, and a :ref:`floating point <t_floating>` type to cast it +to. The source type must be smaller than the destination type. + +Semantics: +"""""""""" + +The '``fpext``' instruction extends the ``value`` from a smaller +:ref:`floating point <t_floating>` type to a larger :ref:`floating +point <t_floating>` type. The ``fpext`` cannot be used to make a +*no-op cast* because it always changes bits. Use ``bitcast`` to make a +*no-op cast* for a floating point cast. + +Example: +"""""""" + +.. code-block:: llvm + + %X = fpext float 3.125 to double ; yields double:3.125000e+00 + %Y = fpext double %X to fp128 ; yields fp128:0xL00000000000000004000900000000000 + +'``fptoui .. to``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = fptoui <ty> <value> to <ty2> ; yields ty2 + +Overview: +""""""""" + +The '``fptoui``' converts a floating point ``value`` to its unsigned +integer equivalent of type ``ty2``. 
+ +Arguments: +"""""""""" + +The '``fptoui``' instruction takes a value to cast, which must be a +scalar or vector :ref:`floating point <t_floating>` value, and a type to +cast it to ``ty2``, which must be an :ref:`integer <t_integer>` type. If +``ty`` is a vector floating point type, ``ty2`` must be a vector integer +type with the same number of elements as ``ty`` + +Semantics: +"""""""""" + +The '``fptoui``' instruction converts its :ref:`floating +point <t_floating>` operand into the nearest (rounding towards zero) +unsigned integer value. If the value cannot fit in ``ty2``, the results +are undefined. + +Example: +"""""""" + +.. code-block:: llvm + + %X = fptoui double 123.0 to i32 ; yields i32:123 + %Y = fptoui float 1.0E+300 to i1 ; yields undefined:1 + %Z = fptoui float 1.04E+17 to i8 ; yields undefined:1 + +'``fptosi .. to``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = fptosi <ty> <value> to <ty2> ; yields ty2 + +Overview: +""""""""" + +The '``fptosi``' instruction converts :ref:`floating point <t_floating>` +``value`` to type ``ty2``. + +Arguments: +"""""""""" + +The '``fptosi``' instruction takes a value to cast, which must be a +scalar or vector :ref:`floating point <t_floating>` value, and a type to +cast it to ``ty2``, which must be an :ref:`integer <t_integer>` type. If +``ty`` is a vector floating point type, ``ty2`` must be a vector integer +type with the same number of elements as ``ty`` + +Semantics: +"""""""""" + +The '``fptosi``' instruction converts its :ref:`floating +point <t_floating>` operand into the nearest (rounding towards zero) +signed integer value. If the value cannot fit in ``ty2``, the results +are undefined. + +Example: +"""""""" + +.. code-block:: llvm + + %X = fptosi double -123.0 to i32 ; yields i32:-123 + %Y = fptosi float 1.0E-247 to i1 ; yields undefined:1 + %Z = fptosi float 1.04E+17 to i8 ; yields undefined:1 + +'``uitofp .. to``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = uitofp <ty> <value> to <ty2> ; yields ty2 + +Overview: +""""""""" + +The '``uitofp``' instruction regards ``value`` as an unsigned integer +and converts that value to the ``ty2`` type. + +Arguments: +"""""""""" + +The '``uitofp``' instruction takes a value to cast, which must be a +scalar or vector :ref:`integer <t_integer>` value, and a type to cast it to +``ty2``, which must be an :ref:`floating point <t_floating>` type. If +``ty`` is a vector integer type, ``ty2`` must be a vector floating point +type with the same number of elements as ``ty`` + +Semantics: +"""""""""" + +The '``uitofp``' instruction interprets its operand as an unsigned +integer quantity and converts it to the corresponding floating point +value. If the value cannot fit in the floating point value, the results +are undefined. + +Example: +"""""""" + +.. code-block:: llvm + + %X = uitofp i32 257 to float ; yields float:257.0 + %Y = uitofp i8 -1 to double ; yields double:255.0 + +'``sitofp .. to``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = sitofp <ty> <value> to <ty2> ; yields ty2 + +Overview: +""""""""" + +The '``sitofp``' instruction regards ``value`` as a signed integer and +converts that value to the ``ty2`` type. + +Arguments: +"""""""""" + +The '``sitofp``' instruction takes a value to cast, which must be a +scalar or vector :ref:`integer <t_integer>` value, and a type to cast it to +``ty2``, which must be an :ref:`floating point <t_floating>` type. 
If +``ty`` is a vector integer type, ``ty2`` must be a vector floating point +type with the same number of elements as ``ty`` + +Semantics: +"""""""""" + +The '``sitofp``' instruction interprets its operand as a signed integer +quantity and converts it to the corresponding floating point value. If +the value cannot fit in the floating point value, the results are +undefined. + +Example: +"""""""" + +.. code-block:: llvm + + %X = sitofp i32 257 to float ; yields float:257.0 + %Y = sitofp i8 -1 to double ; yields double:-1.0 + +.. _i_ptrtoint: + +'``ptrtoint .. to``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = ptrtoint <ty> <value> to <ty2> ; yields ty2 + +Overview: +""""""""" + +The '``ptrtoint``' instruction converts the pointer or a vector of +pointers ``value`` to the integer (or vector of integers) type ``ty2``. + +Arguments: +"""""""""" + +The '``ptrtoint``' instruction takes a ``value`` to cast, which must be +a a value of type :ref:`pointer <t_pointer>` or a vector of pointers, and a +type to cast it to ``ty2``, which must be an :ref:`integer <t_integer>` or +a vector of integers type. + +Semantics: +"""""""""" + +The '``ptrtoint``' instruction converts ``value`` to integer type +``ty2`` by interpreting the pointer value as an integer and either +truncating or zero extending that value to the size of the integer type. +If ``value`` is smaller than ``ty2`` then a zero extension is done. If +``value`` is larger than ``ty2`` then a truncation is done. If they are +the same size, then nothing is done (*no-op cast*) other than a type +change. + +Example: +"""""""" + +.. code-block:: llvm + + %X = ptrtoint i32* %P to i8 ; yields truncation on 32-bit architecture + %Y = ptrtoint i32* %P to i64 ; yields zero extension on 32-bit architecture + %Z = ptrtoint <4 x i32*> %P to <4 x i64>; yields vector zero extension for a vector of addresses on 32-bit architecture + +.. _i_inttoptr: + +'``inttoptr .. to``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = inttoptr <ty> <value> to <ty2> ; yields ty2 + +Overview: +""""""""" + +The '``inttoptr``' instruction converts an integer ``value`` to a +pointer type, ``ty2``. + +Arguments: +"""""""""" + +The '``inttoptr``' instruction takes an :ref:`integer <t_integer>` value to +cast, and a type to cast it to, which must be a :ref:`pointer <t_pointer>` +type. + +Semantics: +"""""""""" + +The '``inttoptr``' instruction converts ``value`` to type ``ty2`` by +applying either a zero extension or a truncation depending on the size +of the integer ``value``. If ``value`` is larger than the size of a +pointer then a truncation is done. If ``value`` is smaller than the size +of a pointer then a zero extension is done. If they are the same size, +nothing is done (*no-op cast*). + +Example: +"""""""" + +.. code-block:: llvm + + %X = inttoptr i32 255 to i32* ; yields zero extension on 64-bit architecture + %Y = inttoptr i32 255 to i32* ; yields no-op on 32-bit architecture + %Z = inttoptr i64 0 to i32* ; yields truncation on 32-bit architecture + %Z = inttoptr <4 x i32> %G to <4 x i8*>; yields truncation of vector G to four pointers + +.. _i_bitcast: + +'``bitcast .. to``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = bitcast <ty> <value> to <ty2> ; yields ty2 + +Overview: +""""""""" + +The '``bitcast``' instruction converts ``value`` to type ``ty2`` without +changing any bits. 
+ +Arguments: +"""""""""" + +The '``bitcast``' instruction takes a value to cast, which must be a +non-aggregate first class value, and a type to cast it to, which must +also be a non-aggregate :ref:`first class <t_firstclass>` type. The bit +sizes of ``value`` and the destination type, ``ty2``, must be identical. +If the source type is a pointer, the destination type must also be a +pointer. This instruction supports bitwise conversion of vectors to +integers and to vectors of other types (as long as they have the same +size). + +Semantics: +"""""""""" + +The '``bitcast``' instruction converts ``value`` to type ``ty2``. It is +always a *no-op cast* because no bits change with this conversion. The +conversion is done as if the ``value`` had been stored to memory and +read back as type ``ty2``. Pointer (or vector of pointers) types may +only be converted to other pointer (or vector of pointers) types with +this instruction. To convert pointers to other types, use the +:ref:`inttoptr <i_inttoptr>` or :ref:`ptrtoint <i_ptrtoint>` instructions +first. + +Example: +"""""""" + +.. code-block:: llvm + + %X = bitcast i8 255 to i8 ; yields i8 :-1 + %Y = bitcast i32* %x to sint* ; yields sint*:%x + %Z = bitcast <2 x int> %V to i64; ; yields i64: %V + %Z = bitcast <2 x i32*> %V to <2 x i64*> ; yields <2 x i64*> + +.. _otherops: + +Other Operations +---------------- + +The instructions in this category are the "miscellaneous" instructions, +which defy better classification. + +.. _i_icmp: + +'``icmp``' Instruction +^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = icmp <cond> <ty> <op1>, <op2> ; yields {i1} or {<N x i1>}:result + +Overview: +""""""""" + +The '``icmp``' instruction returns a boolean value or a vector of +boolean values based on comparison of its two integer, integer vector, +pointer, or pointer vector operands. + +Arguments: +"""""""""" + +The '``icmp``' instruction takes three operands. The first operand is +the condition code indicating the kind of comparison to perform. It is +not a value, just a keyword. The possible condition code are: + +#. ``eq``: equal +#. ``ne``: not equal +#. ``ugt``: unsigned greater than +#. ``uge``: unsigned greater or equal +#. ``ult``: unsigned less than +#. ``ule``: unsigned less or equal +#. ``sgt``: signed greater than +#. ``sge``: signed greater or equal +#. ``slt``: signed less than +#. ``sle``: signed less or equal + +The remaining two arguments must be :ref:`integer <t_integer>` or +:ref:`pointer <t_pointer>` or integer :ref:`vector <t_vector>` typed. They +must also be identical types. + +Semantics: +"""""""""" + +The '``icmp``' compares ``op1`` and ``op2`` according to the condition +code given as ``cond``. The comparison performed always yields either an +:ref:`i1 <t_integer>` or vector of ``i1`` result, as follows: + +#. ``eq``: yields ``true`` if the operands are equal, ``false`` + otherwise. No sign interpretation is necessary or performed. +#. ``ne``: yields ``true`` if the operands are unequal, ``false`` + otherwise. No sign interpretation is necessary or performed. +#. ``ugt``: interprets the operands as unsigned values and yields + ``true`` if ``op1`` is greater than ``op2``. +#. ``uge``: interprets the operands as unsigned values and yields + ``true`` if ``op1`` is greater than or equal to ``op2``. +#. ``ult``: interprets the operands as unsigned values and yields + ``true`` if ``op1`` is less than ``op2``. +#. 
``ule``: interprets the operands as unsigned values and yields + ``true`` if ``op1`` is less than or equal to ``op2``. +#. ``sgt``: interprets the operands as signed values and yields ``true`` + if ``op1`` is greater than ``op2``. +#. ``sge``: interprets the operands as signed values and yields ``true`` + if ``op1`` is greater than or equal to ``op2``. +#. ``slt``: interprets the operands as signed values and yields ``true`` + if ``op1`` is less than ``op2``. +#. ``sle``: interprets the operands as signed values and yields ``true`` + if ``op1`` is less than or equal to ``op2``. + +If the operands are :ref:`pointer <t_pointer>` typed, the pointer values +are compared as if they were integers. + +If the operands are integer vectors, then they are compared element by +element. The result is an ``i1`` vector with the same number of elements +as the values being compared. Otherwise, the result is an ``i1``. + +Example: +"""""""" + +.. code-block:: llvm + + <result> = icmp eq i32 4, 5 ; yields: result=false + <result> = icmp ne float* %X, %X ; yields: result=false + <result> = icmp ult i16 4, 5 ; yields: result=true + <result> = icmp sgt i16 4, 5 ; yields: result=false + <result> = icmp ule i16 -4, 5 ; yields: result=false + <result> = icmp sge i16 4, 5 ; yields: result=false + +Note that the code generator does not yet support vector types with the +``icmp`` instruction. + +.. _i_fcmp: + +'``fcmp``' Instruction +^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = fcmp <cond> <ty> <op1>, <op2> ; yields {i1} or {<N x i1>}:result + +Overview: +""""""""" + +The '``fcmp``' instruction returns a boolean value or vector of boolean +values based on comparison of its operands. + +If the operands are floating point scalars, then the result type is a +boolean (:ref:`i1 <t_integer>`). + +If the operands are floating point vectors, then the result type is a +vector of boolean with the same number of elements as the operands being +compared. + +Arguments: +"""""""""" + +The '``fcmp``' instruction takes three operands. The first operand is +the condition code indicating the kind of comparison to perform. It is +not a value, just a keyword. The possible condition code are: + +#. ``false``: no comparison, always returns false +#. ``oeq``: ordered and equal +#. ``ogt``: ordered and greater than +#. ``oge``: ordered and greater than or equal +#. ``olt``: ordered and less than +#. ``ole``: ordered and less than or equal +#. ``one``: ordered and not equal +#. ``ord``: ordered (no nans) +#. ``ueq``: unordered or equal +#. ``ugt``: unordered or greater than +#. ``uge``: unordered or greater than or equal +#. ``ult``: unordered or less than +#. ``ule``: unordered or less than or equal +#. ``une``: unordered or not equal +#. ``uno``: unordered (either nans) +#. ``true``: no comparison, always returns true + +*Ordered* means that neither operand is a QNAN while *unordered* means +that either operand may be a QNAN. + +Each of ``val1`` and ``val2`` arguments must be either a :ref:`floating +point <t_floating>` type or a :ref:`vector <t_vector>` of floating point +type. They must have identical types. + +Semantics: +"""""""""" + +The '``fcmp``' instruction compares ``op1`` and ``op2`` according to the +condition code given as ``cond``. If the operands are vectors, then the +vectors are compared element by element. Each comparison performed +always yields an :ref:`i1 <t_integer>` result, as follows: + +#. ``false``: always yields ``false``, regardless of operands. +#. 
``oeq``: yields ``true`` if both operands are not a QNAN and ``op1`` + is equal to ``op2``. +#. ``ogt``: yields ``true`` if both operands are not a QNAN and ``op1`` + is greater than ``op2``. +#. ``oge``: yields ``true`` if both operands are not a QNAN and ``op1`` + is greater than or equal to ``op2``. +#. ``olt``: yields ``true`` if both operands are not a QNAN and ``op1`` + is less than ``op2``. +#. ``ole``: yields ``true`` if both operands are not a QNAN and ``op1`` + is less than or equal to ``op2``. +#. ``one``: yields ``true`` if both operands are not a QNAN and ``op1`` + is not equal to ``op2``. +#. ``ord``: yields ``true`` if both operands are not a QNAN. +#. ``ueq``: yields ``true`` if either operand is a QNAN or ``op1`` is + equal to ``op2``. +#. ``ugt``: yields ``true`` if either operand is a QNAN or ``op1`` is + greater than ``op2``. +#. ``uge``: yields ``true`` if either operand is a QNAN or ``op1`` is + greater than or equal to ``op2``. +#. ``ult``: yields ``true`` if either operand is a QNAN or ``op1`` is + less than ``op2``. +#. ``ule``: yields ``true`` if either operand is a QNAN or ``op1`` is + less than or equal to ``op2``. +#. ``une``: yields ``true`` if either operand is a QNAN or ``op1`` is + not equal to ``op2``. +#. ``uno``: yields ``true`` if either operand is a QNAN. +#. ``true``: always yields ``true``, regardless of operands. + +Example: +"""""""" + +.. code-block:: llvm + + <result> = fcmp oeq float 4.0, 5.0 ; yields: result=false + <result> = fcmp one float 4.0, 5.0 ; yields: result=true + <result> = fcmp olt float 4.0, 5.0 ; yields: result=true + <result> = fcmp ueq double 1.0, 2.0 ; yields: result=false + +Note that the code generator does not yet support vector types with the +``fcmp`` instruction. + +.. _i_phi: + +'``phi``' Instruction +^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = phi <ty> [ <val0>, <label0>], ... + +Overview: +""""""""" + +The '``phi``' instruction is used to implement the φ node in the SSA +graph representing the function. + +Arguments: +"""""""""" + +The type of the incoming values is specified with the first type field. +After this, the '``phi``' instruction takes a list of pairs as +arguments, with one pair for each predecessor basic block of the current +block. Only values of :ref:`first class <t_firstclass>` type may be used as +the value arguments to the PHI node. Only labels may be used as the +label arguments. + +There must be no non-phi instructions between the start of a basic block +and the PHI instructions: i.e. PHI instructions must be first in a basic +block. + +For the purposes of the SSA form, the use of each incoming value is +deemed to occur on the edge from the corresponding predecessor block to +the current block (but after any definition of an '``invoke``' +instruction's return value on the same edge). + +Semantics: +"""""""""" + +At runtime, the '``phi``' instruction logically takes on the value +specified by the pair corresponding to the predecessor basic block that +executed just prior to the current block. + +Example: +"""""""" + +.. code-block:: llvm + + Loop: ; Infinite loop that counts from 0 on up... + %indvar = phi i32 [ 0, %LoopHeader ], [ %nextindvar, %Loop ] + %nextindvar = add i32 %indvar, 1 + br label %Loop + +.. 
_i_select: + +'``select``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = select selty <cond>, <ty> <val1>, <ty> <val2> ; yields ty + + selty is either i1 or {<N x i1>} + +Overview: +""""""""" + +The '``select``' instruction is used to choose one value based on a +condition, without branching. + +Arguments: +"""""""""" + +The '``select``' instruction requires an 'i1' value or a vector of 'i1' +values indicating the condition, and two values of the same :ref:`first +class <t_firstclass>` type. If the val1/val2 are vectors and the +condition is a scalar, then entire vectors are selected, not individual +elements. + +Semantics: +"""""""""" + +If the condition is an i1 and it evaluates to 1, the instruction returns +the first value argument; otherwise, it returns the second value +argument. + +If the condition is a vector of i1, then the value arguments must be +vectors of the same size, and the selection is done element by element. + +Example: +"""""""" + +.. code-block:: llvm + + %X = select i1 true, i8 17, i8 42 ; yields i8:17 + +.. _i_call: + +'``call``' Instruction +^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <result> = [tail] call [cconv] [ret attrs] <ty> [<fnty>*] <fnptrval>(<function args>) [fn attrs] + +Overview: +""""""""" + +The '``call``' instruction represents a simple function call. + +Arguments: +"""""""""" + +This instruction requires several arguments: + +#. The optional "tail" marker indicates that the callee function does + not access any allocas or varargs in the caller. Note that calls may + be marked "tail" even if they do not occur before a + :ref:`ret <i_ret>` instruction. If the "tail" marker is present, the + function call is eligible for tail call optimization, but `might not + in fact be optimized into a jump <CodeGenerator.html#tailcallopt>`_. + The code generator may optimize calls marked "tail" with either 1) + automatic `sibling call + optimization <CodeGenerator.html#sibcallopt>`_ when the caller and + callee have matching signatures, or 2) forced tail call optimization + when the following extra requirements are met: + + - Caller and callee both have the calling convention ``fastcc``. + - The call is in tail position (ret immediately follows call and ret + uses value of call or is void). + - Option ``-tailcallopt`` is enabled, or + ``llvm::GuaranteedTailCallOpt`` is ``true``. + - `Platform specific constraints are + met. <CodeGenerator.html#tailcallopt>`_ + +#. The optional "cconv" marker indicates which :ref:`calling + convention <callingconv>` the call should use. If none is + specified, the call defaults to using C calling conventions. The + calling convention of the call must match the calling convention of + the target function, or else the behavior is undefined. +#. The optional :ref:`Parameter Attributes <paramattrs>` list for return + values. Only '``zeroext``', '``signext``', and '``inreg``' attributes + are valid here. +#. '``ty``': the type of the call instruction itself which is also the + type of the return value. Functions that return no value are marked + ``void``. +#. '``fnty``': shall be the signature of the pointer to function value + being invoked. The argument types must match the types implied by + this signature. This type can be omitted if the function is not + varargs and if the function type does not return a pointer to a + function. +#. '``fnptrval``': An LLVM value containing a pointer to a function to + be invoked. 
In most cases, this is a direct function invocation, but
+   indirect calls are just as possible, calling an arbitrary pointer
+   to a function value.
+#. '``function args``': argument list whose types match the function
+   signature argument types and parameter attributes. All arguments must
+   be of :ref:`first class <t_firstclass>` type. If the function signature
+   indicates the function accepts a variable number of arguments, the
+   extra arguments can be specified.
+#. The optional :ref:`function attributes <fnattrs>` list. Only
+   '``noreturn``', '``nounwind``', '``readonly``' and '``readnone``'
+   attributes are valid here.
+
+Semantics:
+""""""""""
+
+The '``call``' instruction is used to cause control flow to transfer to
+a specified function, with its incoming arguments bound to the specified
+values. Upon a '``ret``' instruction in the called function, control
+flow continues with the instruction after the function call, and the
+return value of the function is bound to the result argument.
+
+Example:
+""""""""
+
+.. code-block:: llvm
+
+   %retval = call i32 @test(i32 %argc)
+   call i32 (i8*, ...)* @printf(i8* %msg, i32 12, i8 42) ; yields i32
+   %X = tail call i32 @foo() ; yields i32
+   %Y = tail call fastcc i32 @foo() ; yields i32
+   call void %foo(i8 97 signext)
+
+   %struct.A = type { i32, i8 }
+   %r = call %struct.A @foo() ; yields { i32, i8 }
+   %gr = extractvalue %struct.A %r, 0 ; yields i32
+   %gr1 = extractvalue %struct.A %r, 1 ; yields i8
+   %Z = call void @foo() noreturn ; indicates that %foo never returns normally
+   %ZZ = call zeroext i32 @bar() ; Return value is zero extended
+
+LLVM treats calls to some functions with names and arguments that match
+the standard C99 library as being the C99 library functions, and may
+perform optimizations or generate code for them under that assumption.
+This is something we'd like to change in the future to provide better
+support for freestanding environments and non-C-based languages.
+
+.. _i_va_arg:
+
+'``va_arg``' Instruction
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      <resultval> = va_arg <va_list*> <arglist>, <argty>
+
+Overview:
+"""""""""
+
+The '``va_arg``' instruction is used to access arguments passed through
+the "variable argument" area of a function call. It is used to implement
+the ``va_arg`` macro in C.
+
+Arguments:
+""""""""""
+
+This instruction takes a ``va_list*`` value and the type of the
+argument. It returns a value of the specified argument type and
+increments the ``va_list`` to point to the next argument. The actual
+type of ``va_list`` is target specific.
+
+Semantics:
+""""""""""
+
+The '``va_arg``' instruction loads an argument of the specified type
+from the specified ``va_list`` and causes the ``va_list`` to point to
+the next argument. For more information, see the variable argument
+handling :ref:`Intrinsic Functions <int_varargs>`.
+
+It is legal for this instruction to be called in a function which does
+not take a variable number of arguments, for example, the ``vfprintf``
+function.
+
+``va_arg`` is an LLVM instruction instead of an :ref:`intrinsic
+function <intrinsics>` because it takes a type as an argument.
+
+Example:
+""""""""
+
+See the :ref:`variable argument processing <int_varargs>` section.
+
+Note that the code generator does not yet fully support va\_arg on many
+targets. Also, it does not currently support va\_arg with aggregate
+types on any target.
+
+.. 
_i_landingpad: + +'``landingpad``' Instruction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + <resultval> = landingpad <resultty> personality <type> <pers_fn> <clause>+ + <resultval> = landingpad <resultty> personality <type> <pers_fn> cleanup <clause>* + + <clause> := catch <type> <value> + <clause> := filter <array constant type> <array constant> + +Overview: +""""""""" + +The '``landingpad``' instruction is used by `LLVM's exception handling +system <ExceptionHandling.html#overview>`_ to specify that a basic block +is a landing pad — one where the exception lands, and corresponds to the +code found in the ``catch`` portion of a ``try``/``catch`` sequence. It +defines values supplied by the personality function (``pers_fn``) upon +re-entry to the function. The ``resultval`` has the type ``resultty``. + +Arguments: +"""""""""" + +This instruction takes a ``pers_fn`` value. This is the personality +function associated with the unwinding mechanism. The optional +``cleanup`` flag indicates that the landing pad block is a cleanup. + +A ``clause`` begins with the clause type — ``catch`` or ``filter`` — and +contains the global variable representing the "type" that may be caught +or filtered respectively. Unlike the ``catch`` clause, the ``filter`` +clause takes an array constant as its argument. Use +"``[0 x i8**] undef``" for a filter which cannot throw. The +'``landingpad``' instruction must contain *at least* one ``clause`` or +the ``cleanup`` flag. + +Semantics: +"""""""""" + +The '``landingpad``' instruction defines the values which are set by the +personality function (``pers_fn``) upon re-entry to the function, and +therefore the "result type" of the ``landingpad`` instruction. As with +calling conventions, how the personality function results are +represented in LLVM IR is target specific. + +The clauses are applied in order from top to bottom. If two +``landingpad`` instructions are merged together through inlining, the +clauses from the calling function are appended to the list of clauses. +When the call stack is being unwound due to an exception being thrown, +the exception is compared against each ``clause`` in turn. If it doesn't +match any of the clauses, and the ``cleanup`` flag is not set, then +unwinding continues further up the call stack. + +The ``landingpad`` instruction has several restrictions: + +- A landing pad block is a basic block which is the unwind destination + of an '``invoke``' instruction. +- A landing pad block must have a '``landingpad``' instruction as its + first non-PHI instruction. +- There can be only one '``landingpad``' instruction within the landing + pad block. +- A basic block that is not a landing pad block may not include a + '``landingpad``' instruction. +- All '``landingpad``' instructions in a function must have the same + personality function. + +Example: +"""""""" + +.. code-block:: llvm + + ;; A landing pad which can catch an integer. + %res = landingpad { i8*, i32 } personality i32 (...)* @__gxx_personality_v0 + catch i8** @_ZTIi + ;; A landing pad that is a cleanup. + %res = landingpad { i8*, i32 } personality i32 (...)* @__gxx_personality_v0 + cleanup + ;; A landing pad which can catch an integer and can only throw a double. + %res = landingpad { i8*, i32 } personality i32 (...)* @__gxx_personality_v0 + catch i8** @_ZTIi + filter [1 x i8**] [@_ZTId] + +.. _intrinsics: + +Intrinsic Functions +=================== + +LLVM supports the notion of an "intrinsic function". 
These functions +have well known names and semantics and are required to follow certain +restrictions. Overall, these intrinsics represent an extension mechanism +for the LLVM language that does not require changing all of the +transformations in LLVM when adding to the language (or the bitcode +reader/writer, the parser, etc...). + +Intrinsic function names must all start with an "``llvm.``" prefix. This +prefix is reserved in LLVM for intrinsic names; thus, function names may +not begin with this prefix. Intrinsic functions must always be external +functions: you cannot define the body of intrinsic functions. Intrinsic +functions may only be used in call or invoke instructions: it is illegal +to take the address of an intrinsic function. Additionally, because +intrinsic functions are part of the LLVM language, it is required if any +are added that they be documented here. + +Some intrinsic functions can be overloaded, i.e., the intrinsic +represents a family of functions that perform the same operation but on +different data types. Because LLVM can represent over 8 million +different integer types, overloading is used commonly to allow an +intrinsic function to operate on any integer type. One or more of the +argument types or the result type can be overloaded to accept any +integer type. Argument types may also be defined as exactly matching a +previous argument's type or the result type. This allows an intrinsic +function which accepts multiple arguments, but needs all of them to be +of the same type, to only be overloaded with respect to a single +argument or the result. + +Overloaded intrinsics will have the names of its overloaded argument +types encoded into its function name, each preceded by a period. Only +those types which are overloaded result in a name suffix. Arguments +whose type is matched against another type do not. For example, the +``llvm.ctpop`` function can take an integer of any width and returns an +integer of exactly the same integer width. This leads to a family of +functions such as ``i8 @llvm.ctpop.i8(i8 %val)`` and +``i29 @llvm.ctpop.i29(i29 %val)``. Only one type, the return type, is +overloaded, and only one type suffix is required. Because the argument's +type is matched against the return type, it does not require its own +name suffix. + +To learn how to add an intrinsic function, please see the `Extending +LLVM Guide <ExtendingLLVM.html>`_. + +.. _int_varargs: + +Variable Argument Handling Intrinsics +------------------------------------- + +Variable argument support is defined in LLVM with the +:ref:`va_arg <i_va_arg>` instruction and these three intrinsic +functions. These functions are related to the similarly named macros +defined in the ``<stdarg.h>`` header file. + +All of these functions operate on arguments that use a target-specific +value type "``va_list``". The LLVM assembly language reference manual +does not define what this type is, so all transformations should be +prepared to handle these functions regardless of the type used. + +This example shows how the :ref:`va_arg <i_va_arg>` instruction and the +variable argument handling intrinsic functions are used. + +.. code-block:: llvm + + define i32 @test(i32 %X, ...) 
{
+      ; Initialize variable argument processing
+      %ap = alloca i8*
+      %ap2 = bitcast i8** %ap to i8*
+      call void @llvm.va_start(i8* %ap2)
+
+      ; Read a single integer argument
+      %tmp = va_arg i8** %ap, i32
+
+      ; Demonstrate usage of llvm.va_copy and llvm.va_end
+      %aq = alloca i8*
+      %aq2 = bitcast i8** %aq to i8*
+      call void @llvm.va_copy(i8* %aq2, i8* %ap2)
+      call void @llvm.va_end(i8* %aq2)
+
+      ; Stop processing of arguments.
+      call void @llvm.va_end(i8* %ap2)
+      ret i32 %tmp
+    }
+
+    declare void @llvm.va_start(i8*)
+    declare void @llvm.va_copy(i8*, i8*)
+    declare void @llvm.va_end(i8*)
+
+.. _int_va_start:
+
+'``llvm.va_start``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare void @llvm.va_start(i8* <arglist>)
+
+Overview:
+"""""""""
+
+The '``llvm.va_start``' intrinsic initializes ``*<arglist>`` for
+subsequent use by ``va_arg``.
+
+Arguments:
+""""""""""
+
+The argument is a pointer to a ``va_list`` element to initialize.
+
+Semantics:
+""""""""""
+
+The '``llvm.va_start``' intrinsic works just like the ``va_start`` macro
+available in C. In a target-dependent way, it initializes the
+``va_list`` element to which the argument points, so that the next call
+to ``va_arg`` will produce the first variable argument passed to the
+function. Unlike the C ``va_start`` macro, this intrinsic does not need
+to know the last argument of the function as the compiler can figure
+that out.
+
+'``llvm.va_end``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare void @llvm.va_end(i8* <arglist>)
+
+Overview:
+"""""""""
+
+The '``llvm.va_end``' intrinsic destroys ``*<arglist>``, which has been
+initialized previously with ``llvm.va_start`` or ``llvm.va_copy``.
+
+Arguments:
+""""""""""
+
+The argument is a pointer to a ``va_list`` to destroy.
+
+Semantics:
+""""""""""
+
+The '``llvm.va_end``' intrinsic works just like the ``va_end`` macro
+available in C. In a target-dependent way, it destroys the ``va_list``
+element to which the argument points. Calls to
+:ref:`llvm.va_start <int_va_start>` and
+:ref:`llvm.va_copy <int_va_copy>` must be matched exactly with calls to
+``llvm.va_end``.
+
+.. _int_va_copy:
+
+'``llvm.va_copy``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare void @llvm.va_copy(i8* <destarglist>, i8* <srcarglist>)
+
+Overview:
+"""""""""
+
+The '``llvm.va_copy``' intrinsic copies the current argument position
+from the source argument list to the destination argument list.
+
+Arguments:
+""""""""""
+
+The first argument is a pointer to a ``va_list`` element to initialize.
+The second argument is a pointer to a ``va_list`` element to copy from.
+
+Semantics:
+""""""""""
+
+The '``llvm.va_copy``' intrinsic works just like the ``va_copy`` macro
+available in C. In a target-dependent way, it copies the source
+``va_list`` element into the destination ``va_list`` element. This
+intrinsic is necessary because the ``llvm.va_start`` intrinsic may be
+arbitrarily complex and require, for example, memory allocation.
+
+Accurate Garbage Collection Intrinsics
+--------------------------------------
+
+LLVM support for `Accurate Garbage Collection <GarbageCollection.html>`_
+(GC) requires the implementation and generation of these intrinsics.
+These intrinsics allow identification of :ref:`GC roots on the
+stack <int_gcroot>`, as well as garbage collector implementations that
+require :ref:`read <int_gcread>` and :ref:`write <int_gcwrite>` barriers.
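+
+As a rough sketch of the expected usage (the function name, the choice
+of the built-in ``shadow-stack`` collector, and the null metadata below
+are illustrative assumptions, not requirements), a front-end using the
+:ref:`llvm.gcroot <int_gcroot>` intrinsic described below might emit
+code along these lines to register a stack slot as a root:
+
+.. code-block:: llvm
+
+    define void @frame() gc "shadow-stack" {
+    entry:
+      ; stack slot that will hold a heap reference
+      %root = alloca i8*
+      store i8* null, i8** %root
+      ; register the slot with the collector (no metadata attached)
+      call void @llvm.gcroot(i8** %root, i8* null)
+      ; ... allocate a heap object and store its pointer into %root ...
+      ret void
+    }
+
+    declare void @llvm.gcroot(i8**, i8*)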
+Front-ends for type-safe garbage collected languages should generate +these intrinsics to make use of the LLVM garbage collectors. For more +details, see `Accurate Garbage Collection with +LLVM <GarbageCollection.html>`_. + +The garbage collection intrinsics only operate on objects in the generic +address space (address space zero). + +.. _int_gcroot: + +'``llvm.gcroot``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata) + +Overview: +""""""""" + +The '``llvm.gcroot``' intrinsic declares the existence of a GC root to +the code generator, and allows some metadata to be associated with it. + +Arguments: +"""""""""" + +The first argument specifies the address of a stack object that contains +the root pointer. The second pointer (which must be either a constant or +a global value address) contains the meta-data to be associated with the +root. + +Semantics: +"""""""""" + +At runtime, a call to this intrinsic stores a null pointer into the +"ptrloc" location. At compile-time, the code generator generates +information to allow the runtime to find the pointer at GC safe points. +The '``llvm.gcroot``' intrinsic may only be used in a function which +:ref:`specifies a GC algorithm <gc>`. + +.. _int_gcread: + +'``llvm.gcread``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i8* @llvm.gcread(i8* %ObjPtr, i8** %Ptr) + +Overview: +""""""""" + +The '``llvm.gcread``' intrinsic identifies reads of references from heap +locations, allowing garbage collector implementations that require read +barriers. + +Arguments: +"""""""""" + +The second argument is the address to read from, which should be an +address allocated from the garbage collector. The first object is a +pointer to the start of the referenced object, if needed by the language +runtime (otherwise null). + +Semantics: +"""""""""" + +The '``llvm.gcread``' intrinsic has the same semantics as a load +instruction, but may be replaced with substantially more complex code by +the garbage collector runtime, as needed. The '``llvm.gcread``' +intrinsic may only be used in a function which :ref:`specifies a GC +algorithm <gc>`. + +.. _int_gcwrite: + +'``llvm.gcwrite``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void @llvm.gcwrite(i8* %P1, i8* %Obj, i8** %P2) + +Overview: +""""""""" + +The '``llvm.gcwrite``' intrinsic identifies writes of references to heap +locations, allowing garbage collector implementations that require write +barriers (such as generational or reference counting collectors). + +Arguments: +"""""""""" + +The first argument is the reference to store, the second is the start of +the object to store it to, and the third is the address of the field of +Obj to store to. If the runtime does not require a pointer to the +object, Obj may be null. + +Semantics: +"""""""""" + +The '``llvm.gcwrite``' intrinsic has the same semantics as a store +instruction, but may be replaced with substantially more complex code by +the garbage collector runtime, as needed. The '``llvm.gcwrite``' +intrinsic may only be used in a function which :ref:`specifies a GC +algorithm <gc>`. + +Code Generator Intrinsics +------------------------- + +These intrinsics are provided by LLVM to expose special features that +may only be implemented with code generator support. 
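+
+As a rough sketch of how a pair of these intrinsics is typically
+combined (the function name and the element type below are illustrative
+assumptions), the :ref:`llvm.stacksave <int_stacksave>` and
+:ref:`llvm.stackrestore <int_stackrestore>` intrinsics described below
+can bracket a dynamically sized ``alloca`` so that its stack memory is
+reclaimed at the end of a scope:
+
+.. code-block:: llvm
+
+    define void @scoped_buffer(i32 %n) {
+    entry:
+      ; remember the current stack state
+      %sp = call i8* @llvm.stacksave()
+      ; dynamically sized allocation, live only until the restore
+      %buf = alloca i8, i32 %n
+      ; ... use %buf ...
+      ; release %buf (and anything allocated after the stacksave)
+      call void @llvm.stackrestore(i8* %sp)
+      ret void
+    }
+
+    declare i8* @llvm.stacksave()
+    declare void @llvm.stackrestore(i8*)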
+ +'``llvm.returnaddress``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i8 *@llvm.returnaddress(i32 <level>) + +Overview: +""""""""" + +The '``llvm.returnaddress``' intrinsic attempts to compute a +target-specific value indicating the return address of the current +function or one of its callers. + +Arguments: +"""""""""" + +The argument to this intrinsic indicates which function to return the +address for. Zero indicates the calling function, one indicates its +caller, etc. The argument is **required** to be a constant integer +value. + +Semantics: +"""""""""" + +The '``llvm.returnaddress``' intrinsic either returns a pointer +indicating the return address of the specified call frame, or zero if it +cannot be identified. The value returned by this intrinsic is likely to +be incorrect or 0 for arguments other than zero, so it should only be +used for debugging purposes. + +Note that calling this intrinsic does not prevent function inlining or +other aggressive transformations, so the value returned may not be that +of the obvious source-language caller. + +'``llvm.frameaddress``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i8* @llvm.frameaddress(i32 <level>) + +Overview: +""""""""" + +The '``llvm.frameaddress``' intrinsic attempts to return the +target-specific frame pointer value for the specified stack frame. + +Arguments: +"""""""""" + +The argument to this intrinsic indicates which function to return the +frame pointer for. Zero indicates the calling function, one indicates +its caller, etc. The argument is **required** to be a constant integer +value. + +Semantics: +"""""""""" + +The '``llvm.frameaddress``' intrinsic either returns a pointer +indicating the frame address of the specified call frame, or zero if it +cannot be identified. The value returned by this intrinsic is likely to +be incorrect or 0 for arguments other than zero, so it should only be +used for debugging purposes. + +Note that calling this intrinsic does not prevent function inlining or +other aggressive transformations, so the value returned may not be that +of the obvious source-language caller. + +.. _int_stacksave: + +'``llvm.stacksave``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i8* @llvm.stacksave() + +Overview: +""""""""" + +The '``llvm.stacksave``' intrinsic is used to remember the current state +of the function stack, for use with +:ref:`llvm.stackrestore <int_stackrestore>`. This is useful for +implementing language features like scoped automatic variable sized +arrays in C99. + +Semantics: +"""""""""" + +This intrinsic returns a opaque pointer value that can be passed to +:ref:`llvm.stackrestore <int_stackrestore>`. When an +``llvm.stackrestore`` intrinsic is executed with a value saved from +``llvm.stacksave``, it effectively restores the state of the stack to +the state it was in when the ``llvm.stacksave`` intrinsic executed. In +practice, this pops any :ref:`alloca <i_alloca>` blocks from the stack that +were allocated after the ``llvm.stacksave`` was executed. + +.. _int_stackrestore: + +'``llvm.stackrestore``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void @llvm.stackrestore(i8* %ptr) + +Overview: +""""""""" + +The '``llvm.stackrestore``' intrinsic is used to restore the state of +the function stack to the state it was in when the corresponding +:ref:`llvm.stacksave <int_stacksave>` intrinsic executed. 
This is +useful for implementing language features like scoped automatic variable +sized arrays in C99. + +Semantics: +"""""""""" + +See the description for :ref:`llvm.stacksave <int_stacksave>`. + +'``llvm.prefetch``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void @llvm.prefetch(i8* <address>, i32 <rw>, i32 <locality>, i32 <cache type>) + +Overview: +""""""""" + +The '``llvm.prefetch``' intrinsic is a hint to the code generator to +insert a prefetch instruction if supported; otherwise, it is a noop. +Prefetches have no effect on the behavior of the program but can change +its performance characteristics. + +Arguments: +"""""""""" + +``address`` is the address to be prefetched, ``rw`` is the specifier +determining if the fetch should be for a read (0) or write (1), and +``locality`` is a temporal locality specifier ranging from (0) - no +locality, to (3) - extremely local keep in cache. The ``cache type`` +specifies whether the prefetch is performed on the data (1) or +instruction (0) cache. The ``rw``, ``locality`` and ``cache type`` +arguments must be constant integers. + +Semantics: +"""""""""" + +This intrinsic does not modify the behavior of the program. In +particular, prefetches cannot trap and do not produce a value. On +targets that support this intrinsic, the prefetch can provide hints to +the processor cache for better performance. + +'``llvm.pcmarker``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void @llvm.pcmarker(i32 <id>) + +Overview: +""""""""" + +The '``llvm.pcmarker``' intrinsic is a method to export a Program +Counter (PC) in a region of code to simulators and other tools. The +method is target specific, but it is expected that the marker will use +exported symbols to transmit the PC of the marker. The marker makes no +guarantees that it will remain with any specific instruction after +optimizations. It is possible that the presence of a marker will inhibit +optimizations. The intended use is to be inserted after optimizations to +allow correlations of simulation runs. + +Arguments: +"""""""""" + +``id`` is a numerical id identifying the marker. + +Semantics: +"""""""""" + +This intrinsic does not modify the behavior of the program. Backends +that do not support this intrinsic may ignore it. + +'``llvm.readcyclecounter``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i64 @llvm.readcyclecounter() + +Overview: +""""""""" + +The '``llvm.readcyclecounter``' intrinsic provides access to the cycle +counter register (or similar low latency, high accuracy clocks) on those +targets that support it. On X86, it should map to RDTSC. On Alpha, it +should map to RPCC. As the backing counters overflow quickly (on the +order of 9 seconds on alpha), this should only be used for small +timings. + +Semantics: +"""""""""" + +When directly supported, reading the cycle counter should not modify any +memory. Implementations are allowed to either return a application +specific value or a system wide value. On backends without support, this +is lowered to a constant 0. + +Standard C Library Intrinsics +----------------------------- + +LLVM provides intrinsics for a few important standard C library +functions. These intrinsics allow source-language front-ends to pass +information about the alignment of the pointer arguments to the code +generator, providing opportunity for more efficient code generation. + +.. 
_int_memcpy:
+
+'``llvm.memcpy``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.memcpy`` on any
+integer bit width and for different address spaces. Not all targets
+support all bit widths however.
+
+::
+
+      declare void @llvm.memcpy.p0i8.p0i8.i32(i8* <dest>, i8* <src>,
+                                              i32 <len>, i32 <align>, i1 <isvolatile>)
+      declare void @llvm.memcpy.p0i8.p0i8.i64(i8* <dest>, i8* <src>,
+                                              i64 <len>, i32 <align>, i1 <isvolatile>)
+
+Overview:
+"""""""""
+
+The '``llvm.memcpy.*``' intrinsics copy a block of memory from the
+source location to the destination location.
+
+Note that, unlike the standard libc function, the ``llvm.memcpy.*``
+intrinsics do not return a value, take extra alignment/isvolatile
+arguments, and the pointers can be in specified address spaces.
+
+Arguments:
+""""""""""
+
+The first argument is a pointer to the destination, the second is a
+pointer to the source. The third argument is an integer argument
+specifying the number of bytes to copy, the fourth argument is the
+alignment of the source and destination locations, and the fifth is a
+boolean indicating a volatile access.
+
+If the call to this intrinsic has an alignment value that is not 0 or 1,
+then the caller guarantees that both the source and destination pointers
+are aligned to that boundary.
+
+If the ``isvolatile`` parameter is ``true``, the ``llvm.memcpy`` call is
+a :ref:`volatile operation <volatile>`. The detailed access behavior is not
+very cleanly specified and it is unwise to depend on it.
+
+Semantics:
+""""""""""
+
+The '``llvm.memcpy.*``' intrinsics copy a block of memory from the
+source location to the destination location, which are not allowed to
+overlap. It copies "len" bytes of memory over. If the argument is known
+to be aligned to some boundary, this can be specified as the fourth
+argument, otherwise it should be set to 0 or 1.
+
+'``llvm.memmove``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.memmove`` on any
+integer bit width and for different address spaces. Not all targets
+support all bit widths however.
+
+::
+
+      declare void @llvm.memmove.p0i8.p0i8.i32(i8* <dest>, i8* <src>,
+                                               i32 <len>, i32 <align>, i1 <isvolatile>)
+      declare void @llvm.memmove.p0i8.p0i8.i64(i8* <dest>, i8* <src>,
+                                               i64 <len>, i32 <align>, i1 <isvolatile>)
+
+Overview:
+"""""""""
+
+The '``llvm.memmove.*``' intrinsics move a block of memory from the
+source location to the destination location. It is similar to the
+'``llvm.memcpy``' intrinsic but allows the two memory locations to
+overlap.
+
+Note that, unlike the standard libc function, the ``llvm.memmove.*``
+intrinsics do not return a value, take extra alignment/isvolatile
+arguments, and the pointers can be in specified address spaces.
+
+Arguments:
+""""""""""
+
+The first argument is a pointer to the destination, the second is a
+pointer to the source. The third argument is an integer argument
+specifying the number of bytes to copy, the fourth argument is the
+alignment of the source and destination locations, and the fifth is a
+boolean indicating a volatile access.
+
+If the call to this intrinsic has an alignment value that is not 0 or 1,
+then the caller guarantees that the source and destination pointers are
+aligned to that boundary.
+
+If the ``isvolatile`` parameter is ``true``, the ``llvm.memmove`` call
+is a :ref:`volatile operation <volatile>`. 
The detailed access behavior is +not very cleanly specified and it is unwise to depend on it. + +Semantics: +"""""""""" + +The '``llvm.memmove.*``' intrinsics copy a block of memory from the +source location to the destination location, which may overlap. It +copies "len" bytes of memory over. If the argument is known to be +aligned to some boundary, this can be specified as the fourth argument, +otherwise it should be set to 0 or 1. + +'``llvm.memset.*``' Intrinsics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use llvm.memset on any integer +bit width and for different address spaces. However, not all targets +support all bit widths. + +:: + + declare void @llvm.memset.p0i8.i32(i8* <dest>, i8 <val>, + i32 <len>, i32 <align>, i1 <isvolatile>) + declare void @llvm.memset.p0i8.i64(i8* <dest>, i8 <val>, + i64 <len>, i32 <align>, i1 <isvolatile>) + +Overview: +""""""""" + +The '``llvm.memset.*``' intrinsics fill a block of memory with a +particular byte value. + +Note that, unlike the standard libc function, the ``llvm.memset`` +intrinsic does not return a value and takes extra alignment/volatile +arguments. Also, the destination can be in an arbitrary address space. + +Arguments: +"""""""""" + +The first argument is a pointer to the destination to fill, the second +is the byte value with which to fill it, the third argument is an +integer argument specifying the number of bytes to fill, and the fourth +argument is the known alignment of the destination location. + +If the call to this intrinsic has an alignment value that is not 0 or 1, +then the caller guarantees that the destination pointer is aligned to +that boundary. + +If the ``isvolatile`` parameter is ``true``, the ``llvm.memset`` call is +a :ref:`volatile operation <volatile>`. The detailed access behavior is not +very cleanly specified and it is unwise to depend on it. + +Semantics: +"""""""""" + +The '``llvm.memset.*``' intrinsics fill "len" bytes of memory starting +at the destination location. If the argument is known to be aligned to +some boundary, this can be specified as the fourth argument, otherwise +it should be set to 0 or 1. + +'``llvm.sqrt.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.sqrt`` on any +floating point or vector of floating point type. Not all targets support +all types however. + +:: + + declare float @llvm.sqrt.f32(float %Val) + declare double @llvm.sqrt.f64(double %Val) + declare x86_fp80 @llvm.sqrt.f80(x86_fp80 %Val) + declare fp128 @llvm.sqrt.f128(fp128 %Val) + declare ppc_fp128 @llvm.sqrt.ppcf128(ppc_fp128 %Val) + +Overview: +""""""""" + +The '``llvm.sqrt``' intrinsics return the sqrt of the specified operand, +returning the same value as the libm '``sqrt``' functions would. Unlike +``sqrt`` in libm, however, ``llvm.sqrt`` has undefined behavior for +negative numbers other than -0.0 (which allows for better optimization, +because there is no need to worry about errno being set). +``llvm.sqrt(-0.0)`` is defined to return -0.0 like IEEE sqrt. + +Arguments: +"""""""""" + +The argument and return value are floating point numbers of the same +type. + +Semantics: +"""""""""" + +This function returns the sqrt of the specified operand if it is a +nonnegative floating point number. + +'``llvm.powi.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.powi`` on any +floating point or vector of floating point type. 
Not all targets support +all types however. + +:: + + declare float @llvm.powi.f32(float %Val, i32 %power) + declare double @llvm.powi.f64(double %Val, i32 %power) + declare x86_fp80 @llvm.powi.f80(x86_fp80 %Val, i32 %power) + declare fp128 @llvm.powi.f128(fp128 %Val, i32 %power) + declare ppc_fp128 @llvm.powi.ppcf128(ppc_fp128 %Val, i32 %power) + +Overview: +""""""""" + +The '``llvm.powi.*``' intrinsics return the first operand raised to the +specified (positive or negative) power. The order of evaluation of +multiplications is not defined. When a vector of floating point type is +used, the second argument remains a scalar integer value. + +Arguments: +"""""""""" + +The second argument is an integer power, and the first is a value to +raise to that power. + +Semantics: +"""""""""" + +This function returns the first value raised to the second power with an +unspecified sequence of rounding operations. + +'``llvm.sin.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.sin`` on any +floating point or vector of floating point type. Not all targets support +all types however. + +:: + + declare float @llvm.sin.f32(float %Val) + declare double @llvm.sin.f64(double %Val) + declare x86_fp80 @llvm.sin.f80(x86_fp80 %Val) + declare fp128 @llvm.sin.f128(fp128 %Val) + declare ppc_fp128 @llvm.sin.ppcf128(ppc_fp128 %Val) + +Overview: +""""""""" + +The '``llvm.sin.*``' intrinsics return the sine of the operand. + +Arguments: +"""""""""" + +The argument and return value are floating point numbers of the same +type. + +Semantics: +"""""""""" + +This function returns the sine of the specified operand, returning the +same values as the libm ``sin`` functions would, and handles error +conditions in the same way. + +'``llvm.cos.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.cos`` on any +floating point or vector of floating point type. Not all targets support +all types however. + +:: + + declare float @llvm.cos.f32(float %Val) + declare double @llvm.cos.f64(double %Val) + declare x86_fp80 @llvm.cos.f80(x86_fp80 %Val) + declare fp128 @llvm.cos.f128(fp128 %Val) + declare ppc_fp128 @llvm.cos.ppcf128(ppc_fp128 %Val) + +Overview: +""""""""" + +The '``llvm.cos.*``' intrinsics return the cosine of the operand. + +Arguments: +"""""""""" + +The argument and return value are floating point numbers of the same +type. + +Semantics: +"""""""""" + +This function returns the cosine of the specified operand, returning the +same values as the libm ``cos`` functions would, and handles error +conditions in the same way. + +'``llvm.pow.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.pow`` on any +floating point or vector of floating point type. Not all targets support +all types however. + +:: + + declare float @llvm.pow.f32(float %Val, float %Power) + declare double @llvm.pow.f64(double %Val, double %Power) + declare x86_fp80 @llvm.pow.f80(x86_fp80 %Val, x86_fp80 %Power) + declare fp128 @llvm.pow.f128(fp128 %Val, fp128 %Power) + declare ppc_fp128 @llvm.pow.ppcf128(ppc_fp128 %Val, ppc_fp128 Power) + +Overview: +""""""""" + +The '``llvm.pow.*``' intrinsics return the first operand raised to the +specified (positive or negative) power. + +Arguments: +"""""""""" + +The second argument is a floating point power, and the first is a value +to raise to that power. 
+ +Semantics: +"""""""""" + +This function returns the first value raised to the second power, +returning the same values as the libm ``pow`` functions would, and +handles error conditions in the same way. + +'``llvm.exp.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.exp`` on any +floating point or vector of floating point type. Not all targets support +all types however. + +:: + + declare float @llvm.exp.f32(float %Val) + declare double @llvm.exp.f64(double %Val) + declare x86_fp80 @llvm.exp.f80(x86_fp80 %Val) + declare fp128 @llvm.exp.f128(fp128 %Val) + declare ppc_fp128 @llvm.exp.ppcf128(ppc_fp128 %Val) + +Overview: +""""""""" + +The '``llvm.exp.*``' intrinsics perform the exp function. + +Arguments: +"""""""""" + +The argument and return value are floating point numbers of the same +type. + +Semantics: +"""""""""" + +This function returns the same values as the libm ``exp`` functions +would, and handles error conditions in the same way. + +'``llvm.exp2.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.exp2`` on any +floating point or vector of floating point type. Not all targets support +all types however. + +:: + + declare float @llvm.exp2.f32(float %Val) + declare double @llvm.exp2.f64(double %Val) + declare x86_fp80 @llvm.exp2.f80(x86_fp80 %Val) + declare fp128 @llvm.exp2.f128(fp128 %Val) + declare ppc_fp128 @llvm.exp2.ppcf128(ppc_fp128 %Val) + +Overview: +""""""""" + +The '``llvm.exp2.*``' intrinsics perform the exp2 function. + +Arguments: +"""""""""" + +The argument and return value are floating point numbers of the same +type. + +Semantics: +"""""""""" + +This function returns the same values as the libm ``exp2`` functions +would, and handles error conditions in the same way. + +'``llvm.log.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.log`` on any +floating point or vector of floating point type. Not all targets support +all types however. + +:: + + declare float @llvm.log.f32(float %Val) + declare double @llvm.log.f64(double %Val) + declare x86_fp80 @llvm.log.f80(x86_fp80 %Val) + declare fp128 @llvm.log.f128(fp128 %Val) + declare ppc_fp128 @llvm.log.ppcf128(ppc_fp128 %Val) + +Overview: +""""""""" + +The '``llvm.log.*``' intrinsics perform the log function. + +Arguments: +"""""""""" + +The argument and return value are floating point numbers of the same +type. + +Semantics: +"""""""""" + +This function returns the same values as the libm ``log`` functions +would, and handles error conditions in the same way. + +'``llvm.log10.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.log10`` on any +floating point or vector of floating point type. Not all targets support +all types however. + +:: + + declare float @llvm.log10.f32(float %Val) + declare double @llvm.log10.f64(double %Val) + declare x86_fp80 @llvm.log10.f80(x86_fp80 %Val) + declare fp128 @llvm.log10.f128(fp128 %Val) + declare ppc_fp128 @llvm.log10.ppcf128(ppc_fp128 %Val) + +Overview: +""""""""" + +The '``llvm.log10.*``' intrinsics perform the log10 function. + +Arguments: +"""""""""" + +The argument and return value are floating point numbers of the same +type. + +Semantics: +"""""""""" + +This function returns the same values as the libm ``log10`` functions +would, and handles error conditions in the same way. 
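+
+As an illustrative sketch only (not part of the normative text above), a call
+to one of these libm-style intrinsics looks like any other call; the function
+name ``@log10_example`` below is purely hypothetical:
+
+.. code-block:: llvm
+
+      declare double @llvm.log10.f64(double)
+
+      define double @log10_example(double %x) {
+      entry:
+        ; behaves like the libm log10 function, including error handling
+        %r = call double @llvm.log10.f64(double %x)
+        ret double %r
+      }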
+ +'``llvm.log2.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.log2`` on any +floating point or vector of floating point type. Not all targets support +all types however. + +:: + + declare float @llvm.log2.f32(float %Val) + declare double @llvm.log2.f64(double %Val) + declare x86_fp80 @llvm.log2.f80(x86_fp80 %Val) + declare fp128 @llvm.log2.f128(fp128 %Val) + declare ppc_fp128 @llvm.log2.ppcf128(ppc_fp128 %Val) + +Overview: +""""""""" + +The '``llvm.log2.*``' intrinsics perform the log2 function. + +Arguments: +"""""""""" + +The argument and return value are floating point numbers of the same +type. + +Semantics: +"""""""""" + +This function returns the same values as the libm ``log2`` functions +would, and handles error conditions in the same way. + +'``llvm.fma.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.fma`` on any +floating point or vector of floating point type. Not all targets support +all types however. + +:: + + declare float @llvm.fma.f32(float %a, float %b, float %c) + declare double @llvm.fma.f64(double %a, double %b, double %c) + declare x86_fp80 @llvm.fma.f80(x86_fp80 %a, x86_fp80 %b, x86_fp80 %c) + declare fp128 @llvm.fma.f128(fp128 %a, fp128 %b, fp128 %c) + declare ppc_fp128 @llvm.fma.ppcf128(ppc_fp128 %a, ppc_fp128 %b, ppc_fp128 %c) + +Overview: +""""""""" + +The '``llvm.fma.*``' intrinsics perform the fused multiply-add +operation. + +Arguments: +"""""""""" + +The argument and return value are floating point numbers of the same +type. + +Semantics: +"""""""""" + +This function returns the same values as the libm ``fma`` functions +would. + +'``llvm.fabs.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.fabs`` on any +floating point or vector of floating point type. Not all targets support +all types however. + +:: + + declare float @llvm.fabs.f32(float %Val) + declare double @llvm.fabs.f64(double %Val) + declare x86_fp80 @llvm.fabs.f80(x86_fp80 %Val) + declare fp128 @llvm.fabs.f128(fp128 %Val) + declare ppc_fp128 @llvm.fabs.ppcf128(ppc_fp128 %Val) + +Overview: +""""""""" + +The '``llvm.fabs.*``' intrinsics return the absolute value of the +operand. + +Arguments: +"""""""""" + +The argument and return value are floating point numbers of the same +type. + +Semantics: +"""""""""" + +This function returns the same values as the libm ``fabs`` functions +would, and handles error conditions in the same way. + +'``llvm.floor.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.floor`` on any +floating point or vector of floating point type. Not all targets support +all types however. + +:: + + declare float @llvm.floor.f32(float %Val) + declare double @llvm.floor.f64(double %Val) + declare x86_fp80 @llvm.floor.f80(x86_fp80 %Val) + declare fp128 @llvm.floor.f128(fp128 %Val) + declare ppc_fp128 @llvm.floor.ppcf128(ppc_fp128 %Val) + +Overview: +""""""""" + +The '``llvm.floor.*``' intrinsics return the floor of the operand. + +Arguments: +"""""""""" + +The argument and return value are floating point numbers of the same +type. + +Semantics: +"""""""""" + +This function returns the same values as the libm ``floor`` functions +would, and handles error conditions in the same way. + +'``llvm.ceil.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. 
You can use ``llvm.ceil`` on any
+floating point or vector of floating point type. Not all targets support
+all types however.
+
+::
+
+      declare float @llvm.ceil.f32(float %Val)
+      declare double @llvm.ceil.f64(double %Val)
+      declare x86_fp80 @llvm.ceil.f80(x86_fp80 %Val)
+      declare fp128 @llvm.ceil.f128(fp128 %Val)
+      declare ppc_fp128 @llvm.ceil.ppcf128(ppc_fp128 %Val)
+
+Overview:
+"""""""""
+
+The '``llvm.ceil.*``' intrinsics return the ceiling of the operand.
+
+Arguments:
+""""""""""
+
+The argument and return value are floating point numbers of the same
+type.
+
+Semantics:
+""""""""""
+
+This function returns the same values as the libm ``ceil`` functions
+would, and handles error conditions in the same way.
+
+'``llvm.trunc.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.trunc`` on any
+floating point or vector of floating point type. Not all targets support
+all types however.
+
+::
+
+      declare float @llvm.trunc.f32(float %Val)
+      declare double @llvm.trunc.f64(double %Val)
+      declare x86_fp80 @llvm.trunc.f80(x86_fp80 %Val)
+      declare fp128 @llvm.trunc.f128(fp128 %Val)
+      declare ppc_fp128 @llvm.trunc.ppcf128(ppc_fp128 %Val)
+
+Overview:
+"""""""""
+
+The '``llvm.trunc.*``' intrinsics return the operand rounded to the
+nearest integer not larger in magnitude than the operand.
+
+Arguments:
+""""""""""
+
+The argument and return value are floating point numbers of the same
+type.
+
+Semantics:
+""""""""""
+
+This function returns the same values as the libm ``trunc`` functions
+would, and handles error conditions in the same way.
+
+'``llvm.rint.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.rint`` on any
+floating point or vector of floating point type. Not all targets support
+all types however.
+
+::
+
+      declare float @llvm.rint.f32(float %Val)
+      declare double @llvm.rint.f64(double %Val)
+      declare x86_fp80 @llvm.rint.f80(x86_fp80 %Val)
+      declare fp128 @llvm.rint.f128(fp128 %Val)
+      declare ppc_fp128 @llvm.rint.ppcf128(ppc_fp128 %Val)
+
+Overview:
+"""""""""
+
+The '``llvm.rint.*``' intrinsics return the operand rounded to the
+nearest integer. It may raise an inexact floating-point exception if the
+operand isn't an integer.
+
+Arguments:
+""""""""""
+
+The argument and return value are floating point numbers of the same
+type.
+
+Semantics:
+""""""""""
+
+This function returns the same values as the libm ``rint`` functions
+would, and handles error conditions in the same way.
+
+'``llvm.nearbyint.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.nearbyint`` on any
+floating point or vector of floating point type. Not all targets support
+all types however.
+
+::
+
+      declare float @llvm.nearbyint.f32(float %Val)
+      declare double @llvm.nearbyint.f64(double %Val)
+      declare x86_fp80 @llvm.nearbyint.f80(x86_fp80 %Val)
+      declare fp128 @llvm.nearbyint.f128(fp128 %Val)
+      declare ppc_fp128 @llvm.nearbyint.ppcf128(ppc_fp128 %Val)
+
+Overview:
+"""""""""
+
+The '``llvm.nearbyint.*``' intrinsics return the operand rounded to the
+nearest integer.
+
+Arguments:
+""""""""""
+
+The argument and return value are floating point numbers of the same
+type.
+
+Semantics:
+""""""""""
+
+This function returns the same values as the libm ``nearbyint``
+functions would, and handles error conditions in the same way.
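+
+As a hedged, non-normative example of how these rounding intrinsics are used
+(the function name ``@round_demo`` is hypothetical), note that ``floor`` and
+``trunc`` differ only for negative inputs:
+
+.. code-block:: llvm
+
+      declare double @llvm.floor.f64(double)
+      declare double @llvm.trunc.f64(double)
+
+      define double @round_demo(double %x) {
+      entry:
+        %f = call double @llvm.floor.f64(double %x)   ; -2.5 rounds to -3.0
+        %t = call double @llvm.trunc.f64(double %x)   ; -2.5 rounds to -2.0
+        %d = fsub double %f, %t
+        ret double %d
+      }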
+
+Bit Manipulation Intrinsics
+---------------------------
+
+LLVM provides intrinsics for a few important bit manipulation
+operations. These allow efficient code generation for some algorithms.
+
+'``llvm.bswap.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic function. You can use bswap on any
+integer type that is an even number of bytes (i.e. BitWidth % 16 == 0).
+
+::
+
+      declare i16 @llvm.bswap.i16(i16 <id>)
+      declare i32 @llvm.bswap.i32(i32 <id>)
+      declare i64 @llvm.bswap.i64(i64 <id>)
+
+Overview:
+"""""""""
+
+The '``llvm.bswap``' family of intrinsics is used to byte swap integer
+values with an even number of bytes (positive multiple of 16 bits).
+These are useful for performing operations on data that is not in the
+target's native byte order.
+
+Semantics:
+""""""""""
+
+The ``llvm.bswap.i16`` intrinsic returns an i16 value that has the high
+and low byte of the input i16 swapped. Similarly, the ``llvm.bswap.i32``
+intrinsic returns an i32 value that has the four bytes of the input i32
+swapped, so that if the input bytes are numbered 0, 1, 2, 3 then the
+returned i32 will have its bytes in 3, 2, 1, 0 order. The
+``llvm.bswap.i48``, ``llvm.bswap.i64`` and other intrinsics extend this
+concept to additional even-byte lengths (6 bytes, 8 bytes and more,
+respectively).
+
+'``llvm.ctpop.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use llvm.ctpop on any integer
+bit width, or on any vector with integer elements. Not all targets
+support all bit widths or vector types, however.
+
+::
+
+      declare i8 @llvm.ctpop.i8(i8 <src>)
+      declare i16 @llvm.ctpop.i16(i16 <src>)
+      declare i32 @llvm.ctpop.i32(i32 <src>)
+      declare i64 @llvm.ctpop.i64(i64 <src>)
+      declare i256 @llvm.ctpop.i256(i256 <src>)
+      declare <2 x i32> @llvm.ctpop.v2i32(<2 x i32> <src>)
+
+Overview:
+"""""""""
+
+The '``llvm.ctpop``' family of intrinsics counts the number of bits set
+in a value.
+
+Arguments:
+""""""""""
+
+The only argument is the value to be counted. The argument may be of any
+integer type, or a vector with integer elements. The return type must
+match the argument type.
+
+Semantics:
+""""""""""
+
+The '``llvm.ctpop``' intrinsic counts the 1's in a variable, or within
+each element of a vector.
+
+'``llvm.ctlz.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.ctlz`` on any
+integer bit width, or any vector whose elements are integers. Not all
+targets support all bit widths or vector types, however.
+
+::
+
+      declare i8 @llvm.ctlz.i8 (i8 <src>, i1 <is_zero_undef>)
+      declare i16 @llvm.ctlz.i16 (i16 <src>, i1 <is_zero_undef>)
+      declare i32 @llvm.ctlz.i32 (i32 <src>, i1 <is_zero_undef>)
+      declare i64 @llvm.ctlz.i64 (i64 <src>, i1 <is_zero_undef>)
+      declare i256 @llvm.ctlz.i256(i256 <src>, i1 <is_zero_undef>)
+      declare <2 x i32> @llvm.ctlz.v2i32(<2 x i32> <src>, i1 <is_zero_undef>)
+
+Overview:
+"""""""""
+
+The '``llvm.ctlz``' family of intrinsic functions counts the number of
+leading zeros in a variable.
+
+Arguments:
+""""""""""
+
+The first argument is the value to be counted. This argument may be of
+any integer type, or a vector with integer element type. The return
+type must match the first argument type.
+
+The second argument must be a constant and is a flag to indicate whether
+the intrinsic should ensure that a zero as the first argument produces a
+defined result.
Historically some architectures did not provide a
+defined result for zero values as efficiently, and many algorithms are
+now predicated on avoiding zero-value inputs.
+
+Semantics:
+""""""""""
+
+The '``llvm.ctlz``' intrinsic counts the leading (most significant)
+zeros in a variable, or within each element of the vector. If
+``src == 0`` then the result is the size in bits of the type of ``src``
+if ``is_zero_undef == 0`` and ``undef`` otherwise. For example,
+``llvm.ctlz(i32 2) = 30``.
+
+'``llvm.cttz.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.cttz`` on any
+integer bit width, or any vector of integer elements. Not all targets
+support all bit widths or vector types, however.
+
+::
+
+      declare i8 @llvm.cttz.i8 (i8 <src>, i1 <is_zero_undef>)
+      declare i16 @llvm.cttz.i16 (i16 <src>, i1 <is_zero_undef>)
+      declare i32 @llvm.cttz.i32 (i32 <src>, i1 <is_zero_undef>)
+      declare i64 @llvm.cttz.i64 (i64 <src>, i1 <is_zero_undef>)
+      declare i256 @llvm.cttz.i256(i256 <src>, i1 <is_zero_undef>)
+      declare <2 x i32> @llvm.cttz.v2i32(<2 x i32> <src>, i1 <is_zero_undef>)
+
+Overview:
+"""""""""
+
+The '``llvm.cttz``' family of intrinsic functions counts the number of
+trailing zeros.
+
+Arguments:
+""""""""""
+
+The first argument is the value to be counted. This argument may be of
+any integer type, or a vector with integer element type. The return
+type must match the first argument type.
+
+The second argument must be a constant and is a flag to indicate whether
+the intrinsic should ensure that a zero as the first argument produces a
+defined result. Historically some architectures did not provide a
+defined result for zero values as efficiently, and many algorithms are
+now predicated on avoiding zero-value inputs.
+
+Semantics:
+""""""""""
+
+The '``llvm.cttz``' intrinsic counts the trailing (least significant)
+zeros in a variable, or within each element of a vector. If ``src == 0``
+then the result is the size in bits of the type of ``src`` if
+``is_zero_undef == 0`` and ``undef`` otherwise. For example,
+``llvm.cttz(2) = 1``.
+
+Arithmetic with Overflow Intrinsics
+-----------------------------------
+
+LLVM provides intrinsics for some arithmetic with overflow operations.
+
+'``llvm.sadd.with.overflow.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.sadd.with.overflow``
+on any integer bit width.
+
+::
+
+      declare {i16, i1} @llvm.sadd.with.overflow.i16(i16 %a, i16 %b)
+      declare {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
+      declare {i64, i1} @llvm.sadd.with.overflow.i64(i64 %a, i64 %b)
+
+Overview:
+"""""""""
+
+The '``llvm.sadd.with.overflow``' family of intrinsic functions perform
+a signed addition of the two arguments, and indicate whether an overflow
+occurred during the signed summation.
+
+Arguments:
+""""""""""
+
+The arguments (%a and %b) and the first element of the result structure
+may be of integer types of any bit width, but they must have the same
+bit width. The second element of the result structure must be of type
+``i1``. ``%a`` and ``%b`` are the two values that will undergo signed
+addition.
+
+Semantics:
+""""""""""
+
+The '``llvm.sadd.with.overflow``' family of intrinsic functions perform
+a signed addition of the two variables.
They return a structure — the +first element of which is the signed summation, and the second element +of which is a bit specifying if the signed summation resulted in an +overflow. + +Examples: +""""""""" + +.. code-block:: llvm + + %res = call {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b) + %sum = extractvalue {i32, i1} %res, 0 + %obit = extractvalue {i32, i1} %res, 1 + br i1 %obit, label %overflow, label %normal + +'``llvm.uadd.with.overflow.*``' Intrinsics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.uadd.with.overflow`` +on any integer bit width. + +:: + + declare {i16, i1} @llvm.uadd.with.overflow.i16(i16 %a, i16 %b) + declare {i32, i1} @llvm.uadd.with.overflow.i32(i32 %a, i32 %b) + declare {i64, i1} @llvm.uadd.with.overflow.i64(i64 %a, i64 %b) + +Overview: +""""""""" + +The '``llvm.uadd.with.overflow``' family of intrinsic functions perform +an unsigned addition of the two arguments, and indicate whether a carry +occurred during the unsigned summation. + +Arguments: +"""""""""" + +The arguments (%a and %b) and the first element of the result structure +may be of integer types of any bit width, but they must have the same +bit width. The second element of the result structure must be of type +``i1``. ``%a`` and ``%b`` are the two values that will undergo unsigned +addition. + +Semantics: +"""""""""" + +The '``llvm.uadd.with.overflow``' family of intrinsic functions perform +an unsigned addition of the two arguments. They return a structure — the +first element of which is the sum, and the second element of which is a +bit specifying if the unsigned summation resulted in a carry. + +Examples: +""""""""" + +.. code-block:: llvm + + %res = call {i32, i1} @llvm.uadd.with.overflow.i32(i32 %a, i32 %b) + %sum = extractvalue {i32, i1} %res, 0 + %obit = extractvalue {i32, i1} %res, 1 + br i1 %obit, label %carry, label %normal + +'``llvm.ssub.with.overflow.*``' Intrinsics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.ssub.with.overflow`` +on any integer bit width. + +:: + + declare {i16, i1} @llvm.ssub.with.overflow.i16(i16 %a, i16 %b) + declare {i32, i1} @llvm.ssub.with.overflow.i32(i32 %a, i32 %b) + declare {i64, i1} @llvm.ssub.with.overflow.i64(i64 %a, i64 %b) + +Overview: +""""""""" + +The '``llvm.ssub.with.overflow``' family of intrinsic functions perform +a signed subtraction of the two arguments, and indicate whether an +overflow occurred during the signed subtraction. + +Arguments: +"""""""""" + +The arguments (%a and %b) and the first element of the result structure +may be of integer types of any bit width, but they must have the same +bit width. The second element of the result structure must be of type +``i1``. ``%a`` and ``%b`` are the two values that will undergo signed +subtraction. + +Semantics: +"""""""""" + +The '``llvm.ssub.with.overflow``' family of intrinsic functions perform +a signed subtraction of the two arguments. They return a structure — the +first element of which is the subtraction, and the second element of +which is a bit specifying if the signed subtraction resulted in an +overflow. + +Examples: +""""""""" + +.. 
code-block:: llvm + + %res = call {i32, i1} @llvm.ssub.with.overflow.i32(i32 %a, i32 %b) + %sum = extractvalue {i32, i1} %res, 0 + %obit = extractvalue {i32, i1} %res, 1 + br i1 %obit, label %overflow, label %normal + +'``llvm.usub.with.overflow.*``' Intrinsics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.usub.with.overflow`` +on any integer bit width. + +:: + + declare {i16, i1} @llvm.usub.with.overflow.i16(i16 %a, i16 %b) + declare {i32, i1} @llvm.usub.with.overflow.i32(i32 %a, i32 %b) + declare {i64, i1} @llvm.usub.with.overflow.i64(i64 %a, i64 %b) + +Overview: +""""""""" + +The '``llvm.usub.with.overflow``' family of intrinsic functions perform +an unsigned subtraction of the two arguments, and indicate whether an +overflow occurred during the unsigned subtraction. + +Arguments: +"""""""""" + +The arguments (%a and %b) and the first element of the result structure +may be of integer types of any bit width, but they must have the same +bit width. The second element of the result structure must be of type +``i1``. ``%a`` and ``%b`` are the two values that will undergo unsigned +subtraction. + +Semantics: +"""""""""" + +The '``llvm.usub.with.overflow``' family of intrinsic functions perform +an unsigned subtraction of the two arguments. They return a structure — +the first element of which is the subtraction, and the second element of +which is a bit specifying if the unsigned subtraction resulted in an +overflow. + +Examples: +""""""""" + +.. code-block:: llvm + + %res = call {i32, i1} @llvm.usub.with.overflow.i32(i32 %a, i32 %b) + %sum = extractvalue {i32, i1} %res, 0 + %obit = extractvalue {i32, i1} %res, 1 + br i1 %obit, label %overflow, label %normal + +'``llvm.smul.with.overflow.*``' Intrinsics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.smul.with.overflow`` +on any integer bit width. + +:: + + declare {i16, i1} @llvm.smul.with.overflow.i16(i16 %a, i16 %b) + declare {i32, i1} @llvm.smul.with.overflow.i32(i32 %a, i32 %b) + declare {i64, i1} @llvm.smul.with.overflow.i64(i64 %a, i64 %b) + +Overview: +""""""""" + +The '``llvm.smul.with.overflow``' family of intrinsic functions perform +a signed multiplication of the two arguments, and indicate whether an +overflow occurred during the signed multiplication. + +Arguments: +"""""""""" + +The arguments (%a and %b) and the first element of the result structure +may be of integer types of any bit width, but they must have the same +bit width. The second element of the result structure must be of type +``i1``. ``%a`` and ``%b`` are the two values that will undergo signed +multiplication. + +Semantics: +"""""""""" + +The '``llvm.smul.with.overflow``' family of intrinsic functions perform +a signed multiplication of the two arguments. They return a structure — +the first element of which is the multiplication, and the second element +of which is a bit specifying if the signed multiplication resulted in an +overflow. + +Examples: +""""""""" + +.. code-block:: llvm + + %res = call {i32, i1} @llvm.smul.with.overflow.i32(i32 %a, i32 %b) + %sum = extractvalue {i32, i1} %res, 0 + %obit = extractvalue {i32, i1} %res, 1 + br i1 %obit, label %overflow, label %normal + +'``llvm.umul.with.overflow.*``' Intrinsics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use ``llvm.umul.with.overflow`` +on any integer bit width. 
+
+::
+
+      declare {i16, i1} @llvm.umul.with.overflow.i16(i16 %a, i16 %b)
+      declare {i32, i1} @llvm.umul.with.overflow.i32(i32 %a, i32 %b)
+      declare {i64, i1} @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
+
+Overview:
+"""""""""
+
+The '``llvm.umul.with.overflow``' family of intrinsic functions perform
+an unsigned multiplication of the two arguments, and indicate whether an
+overflow occurred during the unsigned multiplication.
+
+Arguments:
+""""""""""
+
+The arguments (%a and %b) and the first element of the result structure
+may be of integer types of any bit width, but they must have the same
+bit width. The second element of the result structure must be of type
+``i1``. ``%a`` and ``%b`` are the two values that will undergo unsigned
+multiplication.
+
+Semantics:
+""""""""""
+
+The '``llvm.umul.with.overflow``' family of intrinsic functions perform
+an unsigned multiplication of the two arguments. They return a structure
+— the first element of which is the multiplication, and the second
+element of which is a bit specifying if the unsigned multiplication
+resulted in an overflow.
+
+Examples:
+"""""""""
+
+.. code-block:: llvm
+
+      %res = call {i32, i1} @llvm.umul.with.overflow.i32(i32 %a, i32 %b)
+      %sum = extractvalue {i32, i1} %res, 0
+      %obit = extractvalue {i32, i1} %res, 1
+      br i1 %obit, label %overflow, label %normal
+
+Specialised Arithmetic Intrinsics
+---------------------------------
+
+'``llvm.fmuladd.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare float @llvm.fmuladd.f32(float %a, float %b, float %c)
+      declare double @llvm.fmuladd.f64(double %a, double %b, double %c)
+
+Overview:
+"""""""""
+
+The '``llvm.fmuladd.*``' intrinsic functions represent multiply-add
+expressions that can be fused if the code generator determines that the
+fused expression would be legal and efficient.
+
+Arguments:
+""""""""""
+
+The '``llvm.fmuladd.*``' intrinsics each take three arguments: two
+multiplicands, a and b, and an addend c.
+
+Semantics:
+""""""""""
+
+The expression:
+
+::
+
+      %0 = call float @llvm.fmuladd.f32(%a, %b, %c)
+
+is equivalent to the expression a \* b + c, except that rounding will
+not be performed between the multiplication and addition steps if the
+code generator fuses the operations. Fusion is not guaranteed, even if
+the target platform supports it. If a fused multiply-add is required,
+the corresponding llvm.fma.\* intrinsic function should be used instead.
+
+Examples:
+"""""""""
+
+.. code-block:: llvm
+
+      %r2 = call float @llvm.fmuladd.f32(float %a, float %b, float %c) ; yields {float}:r2 = (a * b) + c
+
+Half Precision Floating Point Intrinsics
+----------------------------------------
+
+For most target platforms, half precision floating point is a
+storage-only format. This means that it is a dense encoding (in memory)
+but does not support computation in the format.
+
+This means that code must first load the half-precision floating point
+value as an i16, then convert it to float with
+:ref:`llvm.convert.from.fp16 <int_convert_from_fp16>`. Computation can
+then be performed on the float value (including extending to double
+etc). To store the value back to memory, it is first converted to float
+if needed, then converted to i16 with
+:ref:`llvm.convert.to.fp16 <int_convert_to_fp16>`, and then stored as an
+i16 value.
+
+.. _int_convert_to_fp16:
+
+'``llvm.convert.to.fp16``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare i16 @llvm.convert.to.fp16(f32 %a)
+
+Overview:
+"""""""""
+
+The '``llvm.convert.to.fp16``' intrinsic function performs a conversion
+from single precision floating point format to half precision floating
+point format.
+
+Arguments:
+""""""""""
+
+The intrinsic function takes a single argument - the value to be
+converted.
+
+Semantics:
+""""""""""
+
+The '``llvm.convert.to.fp16``' intrinsic function performs a conversion
+from single precision floating point format to half precision floating
+point format. The return value is an ``i16`` which contains the
+converted number.
+
+Examples:
+"""""""""
+
+.. code-block:: llvm
+
+      %res = call i16 @llvm.convert.to.fp16(f32 %a)
+      store i16 %res, i16* @x, align 2
+
+.. _int_convert_from_fp16:
+
+'``llvm.convert.from.fp16``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare f32 @llvm.convert.from.fp16(i16 %a)
+
+Overview:
+"""""""""
+
+The '``llvm.convert.from.fp16``' intrinsic function performs a
+conversion from half precision floating point format to single precision
+floating point format.
+
+Arguments:
+""""""""""
+
+The intrinsic function takes a single argument - the value to be
+converted.
+
+Semantics:
+""""""""""
+
+The '``llvm.convert.from.fp16``' intrinsic function performs a
+conversion from half precision floating point format to single
+precision floating point format. The input half-float value is
+represented by an ``i16`` value.
+
+Examples:
+"""""""""
+
+.. code-block:: llvm
+
+      %a = load i16* @x, align 2
+      %res = call f32 @llvm.convert.from.fp16(i16 %a)
+
+Debugger Intrinsics
+-------------------
+
+The LLVM debugger intrinsics (which all start with the ``llvm.dbg.``
+prefix) are described in the `LLVM Source Level
+Debugging <SourceLevelDebugging.html#format_common_intrinsics>`_
+document.
+
+Exception Handling Intrinsics
+-----------------------------
+
+The LLVM exception handling intrinsics (which all start with the
+``llvm.eh.`` prefix) are described in the `LLVM Exception
+Handling <ExceptionHandling.html#format_common_intrinsics>`_ document.
+
+.. _int_trampoline:
+
+Trampoline Intrinsics
+---------------------
+
+These intrinsics make it possible to excise one parameter, marked with
+the :ref:`nest <nest>` attribute, from a function. The result is a
+callable function pointer lacking the nest parameter - the caller does
+not need to provide a value for it. Instead, the value to use is stored
+in advance in a "trampoline", a block of memory usually allocated on the
+stack, which also contains code to splice the nest value into the
+argument list. This is used to implement the GCC nested function address
+extension.
+
+For example, if the function is ``i32 f(i8* nest %c, i32 %x, i32 %y)``
+then the resulting function pointer has signature ``i32 (i32, i32)*``.
+It can be created as follows:
+
+.. code-block:: llvm
+
+      %tramp = alloca [10 x i8], align 4    ; size and alignment only correct for X86
+      %tramp1 = getelementptr [10 x i8]* %tramp, i32 0, i32 0
+      call void @llvm.init.trampoline(i8* %tramp1, i8* bitcast (i32 (i8*, i32, i32)* @f to i8*), i8* %nval)
+      %p = call i8* @llvm.adjust.trampoline(i8* %tramp1)
+      %fp = bitcast i8* %p to i32 (i32, i32)*
+
+The call ``%val = call i32 %fp(i32 %x, i32 %y)`` is then equivalent to
+``%val = call i32 %f(i8* %nval, i32 %x, i32 %y)``.
+
+.. _int_it:
+
+'``llvm.init.trampoline``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare void @llvm.init.trampoline(i8* <tramp>, i8* <func>, i8* <nval>)
+
+Overview:
+"""""""""
+
+This fills the memory pointed to by ``tramp`` with executable code,
+turning it into a trampoline.
+
+Arguments:
+""""""""""
+
+The ``llvm.init.trampoline`` intrinsic takes three arguments, all
+pointers. The ``tramp`` argument must point to a sufficiently large and
+sufficiently aligned block of memory; this memory is written to by the
+intrinsic. Note that the size and the alignment are target-specific -
+LLVM currently provides no portable way of determining them, so a
+front-end that generates this intrinsic needs to have some
+target-specific knowledge. The ``func`` argument must hold a function
+bitcast to an ``i8*``.
+
+Semantics:
+""""""""""
+
+The block of memory pointed to by ``tramp`` is filled with target
+dependent code, turning it into a function. Then ``tramp`` needs to be
+passed to :ref:`llvm.adjust.trampoline <int_at>` to get a pointer which can
+be :ref:`bitcast (to a new function) and called <int_trampoline>`. The new
+function's signature is the same as that of ``func`` with any arguments
+marked with the ``nest`` attribute removed. At most one such ``nest``
+argument is allowed, and it must be of pointer type. Calling the new
+function is equivalent to calling ``func`` with the same argument list,
+but with ``nval`` used for the missing ``nest`` argument. If, after
+calling ``llvm.init.trampoline``, the memory pointed to by ``tramp`` is
+modified, then the effect of any later call to the returned function
+pointer is undefined.
+
+.. _int_at:
+
+'``llvm.adjust.trampoline``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare i8* @llvm.adjust.trampoline(i8* <tramp>)
+
+Overview:
+"""""""""
+
+This performs any required machine-specific adjustment to the address of
+a trampoline (passed as ``tramp``).
+
+Arguments:
+""""""""""
+
+``tramp`` must point to a block of memory which already has trampoline
+code filled in by a previous call to
+:ref:`llvm.init.trampoline <int_it>`.
+
+Semantics:
+""""""""""
+
+On some architectures the address of the code to be executed needs to be
+different to the address where the trampoline is actually stored. This
+intrinsic returns the executable address corresponding to ``tramp``
+after performing the required machine specific adjustments. The pointer
+returned can then be :ref:`bitcast and executed <int_trampoline>`.
+
+Memory Use Markers
+------------------
+
+This class of intrinsics provides information about the lifetime of
+memory objects and ranges where variables are immutable.
+
+'``llvm.lifetime.start``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare void @llvm.lifetime.start(i64 <size>, i8* nocapture <ptr>)
+
+Overview:
+"""""""""
+
+The '``llvm.lifetime.start``' intrinsic specifies the start of a memory
+object's lifetime.
+
+Arguments:
+""""""""""
+
+The first argument is a constant integer representing the size of the
+object, or -1 if it is variable sized. The second argument is a pointer
+to the object.
+
+Semantics:
+""""""""""
+
+This intrinsic indicates that before this point in the code, the value
+of the memory pointed to by ``ptr`` is dead. This means that it is known
+to never be used and has an undefined value. A load from the pointer
+that precedes this intrinsic can be replaced with ``'undef'``.
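+
+A minimal sketch of how a front end might bracket an ``alloca`` with this
+intrinsic and its counterpart ``llvm.lifetime.end`` (described next); the
+function ``@lifetime_example`` and the 16-byte buffer are assumptions made
+purely for illustration:
+
+.. code-block:: llvm
+
+      declare void @llvm.lifetime.start(i64, i8* nocapture)
+      declare void @llvm.lifetime.end(i64, i8* nocapture)
+
+      define void @lifetime_example() {
+      entry:
+        %buf = alloca [16 x i8], align 4
+        %p = getelementptr [16 x i8]* %buf, i32 0, i32 0
+        call void @llvm.lifetime.start(i64 16, i8* %p)
+        ; ... code that uses %buf goes here ...
+        call void @llvm.lifetime.end(i64 16, i8* %p)
+        ret void
+      }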
+ +'``llvm.lifetime.end``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void @llvm.lifetime.end(i64 <size>, i8* nocapture <ptr>) + +Overview: +""""""""" + +The '``llvm.lifetime.end``' intrinsic specifies the end of a memory +object's lifetime. + +Arguments: +"""""""""" + +The first argument is a constant integer representing the size of the +object, or -1 if it is variable sized. The second argument is a pointer +to the object. + +Semantics: +"""""""""" + +This intrinsic indicates that after this point in the code, the value of +the memory pointed to by ``ptr`` is dead. This means that it is known to +never be used and has an undefined value. Any stores into the memory +object following this intrinsic may be removed as dead. + +'``llvm.invariant.start``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare {}* @llvm.invariant.start(i64 <size>, i8* nocapture <ptr>) + +Overview: +""""""""" + +The '``llvm.invariant.start``' intrinsic specifies that the contents of +a memory object will not change. + +Arguments: +"""""""""" + +The first argument is a constant integer representing the size of the +object, or -1 if it is variable sized. The second argument is a pointer +to the object. + +Semantics: +"""""""""" + +This intrinsic indicates that until an ``llvm.invariant.end`` that uses +the return value, the referenced memory location is constant and +unchanging. + +'``llvm.invariant.end``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void @llvm.invariant.end({}* <start>, i64 <size>, i8* nocapture <ptr>) + +Overview: +""""""""" + +The '``llvm.invariant.end``' intrinsic specifies that the contents of a +memory object are mutable. + +Arguments: +"""""""""" + +The first argument is the matching ``llvm.invariant.start`` intrinsic. +The second argument is a constant integer representing the size of the +object, or -1 if it is variable sized and the third argument is a +pointer to the object. + +Semantics: +"""""""""" + +This intrinsic indicates that the memory is mutable again. + +General Intrinsics +------------------ + +This class of intrinsics is designed to be generic and has no specific +purpose. + +'``llvm.var.annotation``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void @llvm.var.annotation(i8* <val>, i8* <str>, i8* <str>, i32 <int>) + +Overview: +""""""""" + +The '``llvm.var.annotation``' intrinsic. + +Arguments: +"""""""""" + +The first argument is a pointer to a value, the second is a pointer to a +global string, the third is a pointer to a global string which is the +source file name, and the last argument is the line number. + +Semantics: +"""""""""" + +This intrinsic allows annotation of local variables with arbitrary +strings. This can be useful for special purpose optimizations that want +to look for these annotations. These have no other defined use; they are +ignored by code generation and optimization. + +'``llvm.annotation.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +This is an overloaded intrinsic. You can use '``llvm.annotation``' on +any integer bit width. 
+ +:: + + declare i8 @llvm.annotation.i8(i8 <val>, i8* <str>, i8* <str>, i32 <int>) + declare i16 @llvm.annotation.i16(i16 <val>, i8* <str>, i8* <str>, i32 <int>) + declare i32 @llvm.annotation.i32(i32 <val>, i8* <str>, i8* <str>, i32 <int>) + declare i64 @llvm.annotation.i64(i64 <val>, i8* <str>, i8* <str>, i32 <int>) + declare i256 @llvm.annotation.i256(i256 <val>, i8* <str>, i8* <str>, i32 <int>) + +Overview: +""""""""" + +The '``llvm.annotation``' intrinsic. + +Arguments: +"""""""""" + +The first argument is an integer value (result of some expression), the +second is a pointer to a global string, the third is a pointer to a +global string which is the source file name, and the last argument is +the line number. It returns the value of the first argument. + +Semantics: +"""""""""" + +This intrinsic allows annotations to be put on arbitrary expressions +with arbitrary strings. This can be useful for special purpose +optimizations that want to look for these annotations. These have no +other defined use; they are ignored by code generation and optimization. + +'``llvm.trap``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void @llvm.trap() noreturn nounwind + +Overview: +""""""""" + +The '``llvm.trap``' intrinsic. + +Arguments: +"""""""""" + +None. + +Semantics: +"""""""""" + +This intrinsic is lowered to the target dependent trap instruction. If +the target does not have a trap instruction, this intrinsic will be +lowered to a call of the ``abort()`` function. + +'``llvm.debugtrap``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void @llvm.debugtrap() nounwind + +Overview: +""""""""" + +The '``llvm.debugtrap``' intrinsic. + +Arguments: +"""""""""" + +None. + +Semantics: +"""""""""" + +This intrinsic is lowered to code which is intended to cause an +execution trap with the intention of requesting the attention of a +debugger. + +'``llvm.stackprotector``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void @llvm.stackprotector(i8* <guard>, i8** <slot>) + +Overview: +""""""""" + +The ``llvm.stackprotector`` intrinsic takes the ``guard`` and stores it +onto the stack at ``slot``. The stack slot is adjusted to ensure that it +is placed on the stack before local variables. + +Arguments: +"""""""""" + +The ``llvm.stackprotector`` intrinsic requires two pointer arguments. +The first argument is the value loaded from the stack guard +``@__stack_chk_guard``. The second variable is an ``alloca`` that has +enough space to hold the value of the guard. + +Semantics: +"""""""""" + +This intrinsic causes the prologue/epilogue inserter to force the +position of the ``AllocaInst`` stack slot to be before local variables +on the stack. This is to ensure that if a local variable on the stack is +overwritten, it will destroy the value of the guard. When the function +exits, the guard on the stack is checked against the original guard. If +they are different, then the program aborts by calling the +``__stack_chk_fail()`` function. 
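+
+A hedged sketch of IR that a front end or pass might emit for this intrinsic;
+the function ``@protected_frame`` is hypothetical, and the guard global
+follows the ``@__stack_chk_guard`` name mentioned above:
+
+.. code-block:: llvm
+
+      @__stack_chk_guard = external global i8*
+
+      declare void @llvm.stackprotector(i8*, i8**)
+
+      define void @protected_frame() {
+      entry:
+        ; reserve a slot for the guard and register it with the intrinsic
+        %guard.slot = alloca i8*
+        %guard = load i8** @__stack_chk_guard
+        call void @llvm.stackprotector(i8* %guard, i8** %guard.slot)
+        ; ... local variables and function body ...
+        ret void
+      }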
+ +'``llvm.objectsize``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i32 @llvm.objectsize.i32(i8* <object>, i1 <min>) + declare i64 @llvm.objectsize.i64(i8* <object>, i1 <min>) + +Overview: +""""""""" + +The ``llvm.objectsize`` intrinsic is designed to provide information to +the optimizers to determine at compile time whether a) an operation +(like memcpy) will overflow a buffer that corresponds to an object, or +b) that a runtime check for overflow isn't necessary. An object in this +context means an allocation of a specific class, structure, array, or +other object. + +Arguments: +"""""""""" + +The ``llvm.objectsize`` intrinsic takes two arguments. The first +argument is a pointer to or into the ``object``. The second argument is +a boolean and determines whether ``llvm.objectsize`` returns 0 (if true) +or -1 (if false) when the object size is unknown. The second argument +only accepts constants. + +Semantics: +"""""""""" + +The ``llvm.objectsize`` intrinsic is lowered to a constant representing +the size of the object concerned. If the size cannot be determined at +compile time, ``llvm.objectsize`` returns ``i32/i64 -1 or 0`` (depending +on the ``min`` argument). + +'``llvm.expect``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i32 @llvm.expect.i32(i32 <val>, i32 <expected_val>) + declare i64 @llvm.expect.i64(i64 <val>, i64 <expected_val>) + +Overview: +""""""""" + +The ``llvm.expect`` intrinsic provides information about expected (the +most probable) value of ``val``, which can be used by optimizers. + +Arguments: +"""""""""" + +The ``llvm.expect`` intrinsic takes two arguments. The first argument is +a value. The second argument is an expected value, this needs to be a +constant value, variables are not allowed. + +Semantics: +"""""""""" + +This intrinsic is lowered to the ``val``. + +'``llvm.donothing``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void @llvm.donothing() nounwind readnone + +Overview: +""""""""" + +The ``llvm.donothing`` intrinsic doesn't perform any operation. It's the +only intrinsic that can be called with an invoke instruction. + +Arguments: +"""""""""" + +None. + +Semantics: +"""""""""" + +This intrinsic does nothing, and it's removed by optimizers and ignored +by codegen. diff --git a/docs/Makefile.sphinx b/docs/Makefile.sphinx index 81c13de9cd..3746522db6 100644 --- a/docs/Makefile.sphinx +++ b/docs/Makefile.sphinx @@ -49,7 +49,7 @@ html: @# FIXME: Remove this `cp` once HTML->Sphinx transition is completed. @# Kind of a hack, but HTML-formatted docs are on the way out anyway. @echo "Copying legacy HTML-formatted docs into $(BUILDDIR)/html" - @cp -a *.html tutorial $(BUILDDIR)/html + @cp -a *.html $(BUILDDIR)/html @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." dirhtml: diff --git a/docs/MakefileGuide.rst b/docs/MakefileGuide.rst index d2bdd24a9e..2c1d33e962 100644 --- a/docs/MakefileGuide.rst +++ b/docs/MakefileGuide.rst @@ -339,7 +339,7 @@ the invocation of ``make check-local`` in the ``test`` directory. The intended usage for this is to assist in running specific suites of tests. If ``TESTSUITE`` is not set, the implementation of ``check-local`` should run all normal tests. It is up to the project to define what different values for -``TESTSUTE`` will do. See the `Testing Guide <TestingGuide.html>`_ for further +``TESTSUTE`` will do. See the :doc:`Testing Guide <TestingGuide>` for further details. 
``check-local`` diff --git a/docs/Passes.html b/docs/Passes.html index aa9f8bc247..7bffc54d8d 100644 --- a/docs/Passes.html +++ b/docs/Passes.html @@ -175,7 +175,6 @@ perl -e '$/ = undef; for (split(/\n/, <>)) { s:^ *///? ?::; print " <p>\n" if ! <tr><td><a href="#simplify-libcalls">-simplify-libcalls</a></td><td>Simplify well-known library calls</td></tr> <tr><td><a href="#simplifycfg">-simplifycfg</a></td><td>Simplify the CFG</td></tr> <tr><td><a href="#sink">-sink</a></td><td>Code sinking</td></tr> -<tr><td><a href="#sretpromotion">-sretpromotion</a></td><td>Promote sret arguments to multiple ret values</td></tr> <tr><td><a href="#strip">-strip</a></td><td>Strip all symbols from a module</td></tr> <tr><td><a href="#strip-dead-debug-info">-strip-dead-debug-info</a></td><td>Strip debug info for unused symbols</td></tr> <tr><td><a href="#strip-dead-prototypes">-strip-dead-prototypes</a></td><td>Strip Unused Function Prototypes</td></tr> @@ -1715,29 +1714,6 @@ if (X < 3) {</pre> <!-------------------------------------------------------------------------- --> <h3> - <a name="sretpromotion">-sretpromotion: Promote sret arguments to multiple ret values</a> -</h3> -<div> - <p> - This pass finds functions that return a struct (using a pointer to the struct - as the first argument of the function, marked with the '<tt>sret</tt>' attribute) and - replaces them with a new function that simply returns each of the elements of - that struct (using multiple return values). - </p> - - <p> - This pass works under a number of conditions: - </p> - - <ul> - <li>The returned struct must not contain other structs</li> - <li>The returned struct must only be used to load values from</li> - <li>The placeholder struct passed in is the result of an <tt>alloca</tt></li> - </ul> -</div> - -<!-------------------------------------------------------------------------- --> -<h3> <a name="strip">-strip: Strip all symbols from a module</a> </h3> <div> diff --git a/docs/Phabricator.rst b/docs/Phabricator.rst index 13ef9eddd3..b45449793e 100644 --- a/docs/Phabricator.rst +++ b/docs/Phabricator.rst @@ -50,8 +50,8 @@ reviewer understand your code. 
To get a full diff, use one of the following commands (or just use Arcanist to upload your patch): -* git diff -U999999 other-branch -* svn diff --diff-cmd=diff -x -U999999 +* ``git diff -U999999 other-branch`` +* ``svn diff --diff-cmd=diff -x -U999999`` To upload a new patch: diff --git a/docs/ProgrammersManual.html b/docs/ProgrammersManual.html deleted file mode 100644 index 64ddb9d105..0000000000 --- a/docs/ProgrammersManual.html +++ /dev/null @@ -1,4156 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> -<html> -<head> - <meta http-equiv="Content-type" content="text/html;charset=UTF-8"> - <title>LLVM Programmer's Manual</title> - <link rel="stylesheet" href="_static/llvm.css" type="text/css"> -</head> -<body> - -<h1> - LLVM Programmer's Manual -</h1> - -<ol> - <li><a href="#introduction">Introduction</a></li> - <li><a href="#general">General Information</a> - <ul> - <li><a href="#stl">The C++ Standard Template Library</a></li> -<!-- - <li>The <tt>-time-passes</tt> option</li> - <li>How to use the LLVM Makefile system</li> - <li>How to write a regression test</li> - ---> - </ul> - </li> - <li><a href="#apis">Important and useful LLVM APIs</a> - <ul> - <li><a href="#isa">The <tt>isa<></tt>, <tt>cast<></tt> -and <tt>dyn_cast<></tt> templates</a> </li> - <li><a href="#string_apis">Passing strings (the <tt>StringRef</tt> -and <tt>Twine</tt> classes)</a> - <ul> - <li><a href="#StringRef">The <tt>StringRef</tt> class</a> </li> - <li><a href="#Twine">The <tt>Twine</tt> class</a> </li> - </ul> - </li> - <li><a href="#DEBUG">The <tt>DEBUG()</tt> macro and <tt>-debug</tt> -option</a> - <ul> - <li><a href="#DEBUG_TYPE">Fine grained debug info with <tt>DEBUG_TYPE</tt> -and the <tt>-debug-only</tt> option</a> </li> - </ul> - </li> - <li><a href="#Statistic">The <tt>Statistic</tt> class & <tt>-stats</tt> -option</a></li> -<!-- - <li>The <tt>InstVisitor</tt> template - <li>The general graph API ---> - <li><a href="#ViewGraph">Viewing graphs while debugging code</a></li> - </ul> - </li> - <li><a href="#datastructure">Picking the Right Data Structure for a Task</a> - <ul> - <li><a href="#ds_sequential">Sequential Containers (std::vector, std::list, etc)</a> - <ul> - <li><a href="#dss_arrayref">llvm/ADT/ArrayRef.h</a></li> - <li><a href="#dss_fixedarrays">Fixed Size Arrays</a></li> - <li><a href="#dss_heaparrays">Heap Allocated Arrays</a></li> - <li><a href="#dss_tinyptrvector">"llvm/ADT/TinyPtrVector.h"</a></li> - <li><a href="#dss_smallvector">"llvm/ADT/SmallVector.h"</a></li> - <li><a href="#dss_vector"><vector></a></li> - <li><a href="#dss_deque"><deque></a></li> - <li><a href="#dss_list"><list></a></li> - <li><a href="#dss_ilist">llvm/ADT/ilist.h</a></li> - <li><a href="#dss_packedvector">llvm/ADT/PackedVector.h</a></li> - <li><a href="#dss_other">Other Sequential Container Options</a></li> - </ul></li> - <li><a href="#ds_string">String-like containers</a> - <ul> - <li><a href="#dss_stringref">llvm/ADT/StringRef.h</a></li> - <li><a href="#dss_twine">llvm/ADT/Twine.h</a></li> - <li><a href="#dss_smallstring">llvm/ADT/SmallString.h</a></li> - <li><a href="#dss_stdstring">std::string</a></li> - </ul></li> - <li><a href="#ds_set">Set-Like Containers (std::set, SmallSet, SetVector, etc)</a> - <ul> - <li><a href="#dss_sortedvectorset">A sorted 'vector'</a></li> - <li><a href="#dss_smallset">"llvm/ADT/SmallSet.h"</a></li> - <li><a href="#dss_smallptrset">"llvm/ADT/SmallPtrSet.h"</a></li> - <li><a href="#dss_denseset">"llvm/ADT/DenseSet.h"</a></li> - <li><a 
href="#dss_sparseset">"llvm/ADT/SparseSet.h"</a></li> - <li><a href="#dss_FoldingSet">"llvm/ADT/FoldingSet.h"</a></li> - <li><a href="#dss_set"><set></a></li> - <li><a href="#dss_setvector">"llvm/ADT/SetVector.h"</a></li> - <li><a href="#dss_uniquevector">"llvm/ADT/UniqueVector.h"</a></li> - <li><a href="#dss_immutableset">"llvm/ADT/ImmutableSet.h"</a></li> - <li><a href="#dss_otherset">Other Set-Like Container Options</a></li> - </ul></li> - <li><a href="#ds_map">Map-Like Containers (std::map, DenseMap, etc)</a> - <ul> - <li><a href="#dss_sortedvectormap">A sorted 'vector'</a></li> - <li><a href="#dss_stringmap">"llvm/ADT/StringMap.h"</a></li> - <li><a href="#dss_indexedmap">"llvm/ADT/IndexedMap.h"</a></li> - <li><a href="#dss_densemap">"llvm/ADT/DenseMap.h"</a></li> - <li><a href="#dss_valuemap">"llvm/ADT/ValueMap.h"</a></li> - <li><a href="#dss_intervalmap">"llvm/ADT/IntervalMap.h"</a></li> - <li><a href="#dss_map"><map></a></li> - <li><a href="#dss_mapvector">"llvm/ADT/MapVector.h"</a></li> - <li><a href="#dss_inteqclasses">"llvm/ADT/IntEqClasses.h"</a></li> - <li><a href="#dss_immutablemap">"llvm/ADT/ImmutableMap.h"</a></li> - <li><a href="#dss_othermap">Other Map-Like Container Options</a></li> - </ul></li> - <li><a href="#ds_bit">BitVector-like containers</a> - <ul> - <li><a href="#dss_bitvector">A dense bitvector</a></li> - <li><a href="#dss_smallbitvector">A "small" dense bitvector</a></li> - <li><a href="#dss_sparsebitvector">A sparse bitvector</a></li> - </ul></li> - </ul> - </li> - <li><a href="#common">Helpful Hints for Common Operations</a> - <ul> - <li><a href="#inspection">Basic Inspection and Traversal Routines</a> - <ul> - <li><a href="#iterate_function">Iterating over the <tt>BasicBlock</tt>s -in a <tt>Function</tt></a> </li> - <li><a href="#iterate_basicblock">Iterating over the <tt>Instruction</tt>s -in a <tt>BasicBlock</tt></a> </li> - <li><a href="#iterate_institer">Iterating over the <tt>Instruction</tt>s -in a <tt>Function</tt></a> </li> - <li><a href="#iterate_convert">Turning an iterator into a -class pointer</a> </li> - <li><a href="#iterate_complex">Finding call sites: a more -complex example</a> </li> - <li><a href="#calls_and_invokes">Treating calls and invokes -the same way</a> </li> - <li><a href="#iterate_chains">Iterating over def-use & -use-def chains</a> </li> - <li><a href="#iterate_preds">Iterating over predecessors & -successors of blocks</a></li> - </ul> - </li> - <li><a href="#simplechanges">Making simple changes</a> - <ul> - <li><a href="#schanges_creating">Creating and inserting new - <tt>Instruction</tt>s</a> </li> - <li><a href="#schanges_deleting">Deleting <tt>Instruction</tt>s</a> </li> - <li><a href="#schanges_replacing">Replacing an <tt>Instruction</tt> -with another <tt>Value</tt></a> </li> - <li><a href="#schanges_deletingGV">Deleting <tt>GlobalVariable</tt>s</a> </li> - </ul> - </li> - <li><a href="#create_types">How to Create Types</a></li> -<!-- - <li>Working with the Control Flow Graph - <ul> - <li>Accessing predecessors and successors of a <tt>BasicBlock</tt> - <li> - <li> - </ul> ---> - </ul> - </li> - - <li><a href="#threading">Threads and LLVM</a> - <ul> - <li><a href="#startmultithreaded">Entering and Exiting Multithreaded Mode - </a></li> - <li><a href="#shutdown">Ending execution with <tt>llvm_shutdown()</tt></a></li> - <li><a href="#managedstatic">Lazy initialization with <tt>ManagedStatic</tt></a></li> - <li><a href="#llvmcontext">Achieving Isolation with <tt>LLVMContext</tt></a></li> - <li><a href="#jitthreading">Threads and 
the JIT</a></li> - </ul> - </li> - - <li><a href="#advanced">Advanced Topics</a> - <ul> - - <li><a href="#SymbolTable">The <tt>ValueSymbolTable</tt> class</a></li> - <li><a href="#UserLayout">The <tt>User</tt> and owned <tt>Use</tt> classes' memory layout</a></li> - </ul></li> - - <li><a href="#coreclasses">The Core LLVM Class Hierarchy Reference</a> - <ul> - <li><a href="#Type">The <tt>Type</tt> class</a> </li> - <li><a href="#Module">The <tt>Module</tt> class</a></li> - <li><a href="#Value">The <tt>Value</tt> class</a> - <ul> - <li><a href="#User">The <tt>User</tt> class</a> - <ul> - <li><a href="#Instruction">The <tt>Instruction</tt> class</a></li> - <li><a href="#Constant">The <tt>Constant</tt> class</a> - <ul> - <li><a href="#GlobalValue">The <tt>GlobalValue</tt> class</a> - <ul> - <li><a href="#Function">The <tt>Function</tt> class</a></li> - <li><a href="#GlobalVariable">The <tt>GlobalVariable</tt> class</a></li> - </ul> - </li> - </ul> - </li> - </ul> - </li> - <li><a href="#BasicBlock">The <tt>BasicBlock</tt> class</a></li> - <li><a href="#Argument">The <tt>Argument</tt> class</a></li> - </ul> - </li> - </ul> - </li> -</ol> - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>, - <a href="mailto:dhurjati@cs.uiuc.edu">Dinakar Dhurjati</a>, - <a href="mailto:ggreif@gmail.com">Gabor Greif</a>, - <a href="mailto:jstanley@cs.uiuc.edu">Joel Stanley</a>, - <a href="mailto:rspencer@x10sys.com">Reid Spencer</a> and - <a href="mailto:owen@apple.com">Owen Anderson</a></p> -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="introduction">Introduction </a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>This document is meant to highlight some of the important classes and -interfaces available in the LLVM source-base. This manual is not -intended to explain what LLVM is, how it works, and what LLVM code looks -like. It assumes that you know the basics of LLVM and are interested -in writing transformations or otherwise analyzing or manipulating the -code.</p> - -<p>This document should get you oriented so that you can find your -way in the continuously growing source code that makes up the LLVM -infrastructure. Note that this manual is not intended to serve as a -replacement for reading the source code, so if you think there should be -a method in one of these classes to do something, but it's not listed, -check the source. Links to the <a href="/doxygen/">doxygen</a> sources -are provided to make this as easy as possible.</p> - -<p>The first section of this document describes general information that is -useful to know when working in the LLVM infrastructure, and the second describes -the Core LLVM classes. 
In the future this manual will be extended with -information describing how to use extension libraries, such as dominator -information, CFG traversal routines, and useful utilities like the <tt><a -href="/doxygen/InstVisitor_8h-source.html">InstVisitor</a></tt> template.</p> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="general">General Information</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>This section contains general information that is useful if you are working -in the LLVM source-base, but that isn't specific to any particular API.</p> - -<!-- ======================================================================= --> -<h3> - <a name="stl">The C++ Standard Template Library</a> -</h3> - -<div> - -<p>LLVM makes heavy use of the C++ Standard Template Library (STL), -perhaps much more than you are used to, or have seen before. Because of -this, you might want to do a little background reading in the -techniques used and capabilities of the library. There are many good -pages that discuss the STL, and several books on the subject that you -can get, so it will not be discussed in this document.</p> - -<p>Here are some useful links:</p> - -<ol> - -<li><a href="http://www.dinkumware.com/manuals/#Standard C++ Library">Dinkumware -C++ Library reference</a> - an excellent reference for the STL and other parts -of the standard C++ library.</li> - -<li><a href="http://www.tempest-sw.com/cpp/">C++ In a Nutshell</a> - This is an -O'Reilly book in the making. It has a decent Standard Library -Reference that rivals Dinkumware's, and is unfortunately no longer free since the -book has been published.</li> - -<li><a href="http://www.parashift.com/c++-faq-lite/">C++ Frequently Asked -Questions</a></li> - -<li><a href="http://www.sgi.com/tech/stl/">SGI's STL Programmer's Guide</a> - -Contains a useful <a -href="http://www.sgi.com/tech/stl/stl_introduction.html">Introduction to the -STL</a>.</li> - -<li><a href="http://www.research.att.com/%7Ebs/C++.html">Bjarne Stroustrup's C++ -Page</a></li> - -<li><a href="http://64.78.49.204/"> -Bruce Eckel's Thinking in C++, 2nd ed. Volume 2 Revision 4.0 (even better, get -the book).</a></li> - -</ol> - -<p>You are also encouraged to take a look at the <a -href="CodingStandards.html">LLVM Coding Standards</a> guide which focuses on how -to write maintainable code more than where to put your curly braces.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="stl">Other useful references</a> -</h3> - -<div> - -<ol> -<li><a href="http://www.fortran-2000.com/ArnaudRecipes/sharedlib.html">Using -static and shared libraries across platforms</a></li> -</ol> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="apis">Important and useful LLVM APIs</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>Here we highlight some LLVM APIs that are generally useful and good to -know about when writing transformations.</p> - -<!-- ======================================================================= --> -<h3> - <a name="isa">The <tt>isa<></tt>, <tt>cast<></tt> and - <tt>dyn_cast<></tt> templates</a> -</h3> - -<div> - -<p>The LLVM source-base makes extensive use of a custom form of RTTI. 
-These templates have many similarities to the C++ <tt>dynamic_cast<></tt> -operator, but they don't have some drawbacks (primarily stemming from -the fact that <tt>dynamic_cast<></tt> only works on classes that -have a v-table). Because they are used so often, you must know what they -do and how they work. All of these templates are defined in the <a - href="/doxygen/Casting_8h-source.html"><tt>llvm/Support/Casting.h</tt></a> -file (note that you very rarely have to include this file directly).</p> - -<dl> - <dt><tt>isa<></tt>: </dt> - - <dd><p>The <tt>isa<></tt> operator works exactly like the Java - "<tt>instanceof</tt>" operator. It returns true or false depending on whether - a reference or pointer points to an instance of the specified class. This can - be very useful for constraint checking of various sorts (example below).</p> - </dd> - - <dt><tt>cast<></tt>: </dt> - - <dd><p>The <tt>cast<></tt> operator is a "checked cast" operation. It - converts a pointer or reference from a base class to a derived class, causing - an assertion failure if it is not really an instance of the right type. This - should be used in cases where you have some information that makes you believe - that something is of the right type. An example of the <tt>isa<></tt> - and <tt>cast<></tt> template is:</p> - -<div class="doc_code"> -<pre> -static bool isLoopInvariant(const <a href="#Value">Value</a> *V, const Loop *L) { - if (isa<<a href="#Constant">Constant</a>>(V) || isa<<a href="#Argument">Argument</a>>(V) || isa<<a href="#GlobalValue">GlobalValue</a>>(V)) - return true; - - // <i>Otherwise, it must be an instruction...</i> - return !L->contains(cast<<a href="#Instruction">Instruction</a>>(V)->getParent()); -} -</pre> -</div> - - <p>Note that you should <b>not</b> use an <tt>isa<></tt> test followed - by a <tt>cast<></tt>, for that use the <tt>dyn_cast<></tt> - operator.</p> - - </dd> - - <dt><tt>dyn_cast<></tt>:</dt> - - <dd><p>The <tt>dyn_cast<></tt> operator is a "checking cast" operation. - It checks to see if the operand is of the specified type, and if so, returns a - pointer to it (this operator does not work with references). If the operand is - not of the correct type, a null pointer is returned. Thus, this works very - much like the <tt>dynamic_cast<></tt> operator in C++, and should be - used in the same circumstances. Typically, the <tt>dyn_cast<></tt> - operator is used in an <tt>if</tt> statement or some other flow control - statement like this:</p> - -<div class="doc_code"> -<pre> -if (<a href="#AllocationInst">AllocationInst</a> *AI = dyn_cast<<a href="#AllocationInst">AllocationInst</a>>(Val)) { - // <i>...</i> -} -</pre> -</div> - - <p>This form of the <tt>if</tt> statement effectively combines together a call - to <tt>isa<></tt> and a call to <tt>cast<></tt> into one - statement, which is very convenient.</p> - - <p>Note that the <tt>dyn_cast<></tt> operator, like C++'s - <tt>dynamic_cast<></tt> or Java's <tt>instanceof</tt> operator, can be - abused. In particular, you should not use big chained <tt>if/then/else</tt> - blocks to check for lots of different variants of classes. If you find - yourself wanting to do this, it is much cleaner and more efficient to use the - <tt>InstVisitor</tt> class to dispatch over the instruction type directly.</p> - - </dd> - - <dt><tt>cast_or_null<></tt>: </dt> - - <dd><p>The <tt>cast_or_null<></tt> operator works just like the - <tt>cast<></tt> operator, except that it allows for a null pointer as an - argument (which it then propagates). 
This can sometimes be useful, allowing - you to combine several null checks into one.</p></dd> - - <dt><tt>dyn_cast_or_null<></tt>: </dt> - - <dd><p>The <tt>dyn_cast_or_null<></tt> operator works just like the - <tt>dyn_cast<></tt> operator, except that it allows for a null pointer - as an argument (which it then propagates). This can sometimes be useful, - allowing you to combine several null checks into one.</p></dd> - -</dl> - -<p>These five templates can be used with any classes, whether they have a -v-table or not. If you want to add support for these templates, see the -document <a href="HowToSetUpLLVMStyleRTTI.html">How to set up LLVM-style -RTTI for your class hierarchy </a>. -</p> - -</div> - - -<!-- ======================================================================= --> -<h3> - <a name="string_apis">Passing strings (the <tt>StringRef</tt> -and <tt>Twine</tt> classes)</a> -</h3> - -<div> - -<p>Although LLVM generally does not do much string manipulation, we do have -several important APIs which take strings. Two important examples are the -Value class -- which has names for instructions, functions, etc. -- and the -StringMap class which is used extensively in LLVM and Clang.</p> - -<p>These are generic classes, and they need to be able to accept strings which -may have embedded null characters. Therefore, they cannot simply take -a <tt>const char *</tt>, and taking a <tt>const std::string&</tt> requires -clients to perform a heap allocation which is usually unnecessary. Instead, -many LLVM APIs use a <tt>StringRef</tt> or a <tt>const Twine&</tt> for -passing strings efficiently.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="StringRef">The <tt>StringRef</tt> class</a> -</h4> - -<div> - -<p>The <tt>StringRef</tt> data type represents a reference to a constant string -(a character array and a length) and supports the common operations available -on <tt>std:string</tt>, but does not require heap allocation.</p> - -<p>It can be implicitly constructed using a C style null-terminated string, -an <tt>std::string</tt>, or explicitly with a character pointer and length. -For example, the <tt>StringRef</tt> find function is declared as:</p> - -<pre class="doc_code"> - iterator find(StringRef Key); -</pre> - -<p>and clients can call it using any one of:</p> - -<pre class="doc_code"> - Map.find("foo"); <i>// Lookup "foo"</i> - Map.find(std::string("bar")); <i>// Lookup "bar"</i> - Map.find(StringRef("\0baz", 4)); <i>// Lookup "\0baz"</i> -</pre> - -<p>Similarly, APIs which need to return a string may return a <tt>StringRef</tt> -instance, which can be used directly or converted to an <tt>std::string</tt> -using the <tt>str</tt> member function. See -"<tt><a href="/doxygen/classllvm_1_1StringRef_8h-source.html">llvm/ADT/StringRef.h</a></tt>" -for more information.</p> - -<p>You should rarely use the <tt>StringRef</tt> class directly, because it contains -pointers to external memory it is not generally safe to store an instance of the -class (unless you know that the external storage will not be freed). StringRef is -small and pervasive enough in LLVM that it should always be passed by value.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="Twine">The <tt>Twine</tt> class</a> -</h4> - -<div> - -<p>The <tt><a href="/doxygen/classllvm_1_1Twine.html">Twine</a></tt> class is an -efficient way for APIs to accept concatenated strings. 
For example, a common -LLVM paradigm is to name one instruction based on -the name of another instruction with a suffix, for example:</p> - -<div class="doc_code"> -<pre> - New = CmpInst::Create(<i>...</i>, SO->getName() + ".cmp"); -</pre> -</div> - -<p>The <tt>Twine</tt> class is effectively a lightweight -<a href="http://en.wikipedia.org/wiki/Rope_(computer_science)">rope</a> -which points to temporary (stack allocated) objects. Twines can be implicitly -constructed as the result of the plus operator applied to strings (i.e., a C -strings, an <tt>std::string</tt>, or a <tt>StringRef</tt>). The twine delays -the actual concatenation of strings until it is actually required, at which -point it can be efficiently rendered directly into a character array. This -avoids unnecessary heap allocation involved in constructing the temporary -results of string concatenation. See -"<tt><a href="/doxygen/Twine_8h_source.html">llvm/ADT/Twine.h</a></tt>" -and <a href="#dss_twine">here</a> for more information.</p> - -<p>As with a <tt>StringRef</tt>, <tt>Twine</tt> objects point to external memory -and should almost never be stored or mentioned directly. They are intended -solely for use when defining a function which should be able to efficiently -accept concatenated strings.</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="DEBUG">The <tt>DEBUG()</tt> macro and <tt>-debug</tt> option</a> -</h3> - -<div> - -<p>Often when working on your pass you will put a bunch of debugging printouts -and other code into your pass. After you get it working, you want to remove -it, but you may need it again in the future (to work out new bugs that you run -across).</p> - -<p> Naturally, because of this, you don't want to delete the debug printouts, -but you don't want them to always be noisy. A standard compromise is to comment -them out, allowing you to enable them if you need them in the future.</p> - -<p>The "<tt><a href="/doxygen/Debug_8h-source.html">llvm/Support/Debug.h</a></tt>" -file provides a macro named <tt>DEBUG()</tt> that is a much nicer solution to -this problem. Basically, you can put arbitrary code into the argument of the -<tt>DEBUG</tt> macro, and it is only executed if '<tt>opt</tt>' (or any other -tool) is run with the '<tt>-debug</tt>' command line argument:</p> - -<div class="doc_code"> -<pre> -DEBUG(errs() << "I am here!\n"); -</pre> -</div> - -<p>Then you can run your pass like this:</p> - -<div class="doc_code"> -<pre> -$ opt < a.bc > /dev/null -mypass -<i><no output></i> -$ opt < a.bc > /dev/null -mypass -debug -I am here! -</pre> -</div> - -<p>Using the <tt>DEBUG()</tt> macro instead of a home-brewed solution allows you -to not have to create "yet another" command line option for the debug output for -your pass. Note that <tt>DEBUG()</tt> macros are disabled for optimized builds, -so they do not cause a performance impact at all (for the same reason, they -should also not contain side-effects!).</p> - -<p>One additional nice thing about the <tt>DEBUG()</tt> macro is that you can -enable or disable it directly in gdb. Just use "<tt>set DebugFlag=0</tt>" or -"<tt>set DebugFlag=1</tt>" from the gdb if the program is running. 
If the -program hasn't been started yet, you can always just run it with -<tt>-debug</tt>.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="DEBUG_TYPE">Fine grained debug info with <tt>DEBUG_TYPE</tt> and - the <tt>-debug-only</tt> option</a> -</h4> - -<div> - -<p>Sometimes you may find yourself in a situation where enabling <tt>-debug</tt> -just turns on <b>too much</b> information (such as when working on the code -generator). If you want to enable debug information with more fine-grained -control, you define the <tt>DEBUG_TYPE</tt> macro and the <tt>-debug</tt> only -option as follows:</p> - -<div class="doc_code"> -<pre> -#undef DEBUG_TYPE -DEBUG(errs() << "No debug type\n"); -#define DEBUG_TYPE "foo" -DEBUG(errs() << "'foo' debug type\n"); -#undef DEBUG_TYPE -#define DEBUG_TYPE "bar" -DEBUG(errs() << "'bar' debug type\n")); -#undef DEBUG_TYPE -#define DEBUG_TYPE "" -DEBUG(errs() << "No debug type (2)\n"); -</pre> -</div> - -<p>Then you can run your pass like this:</p> - -<div class="doc_code"> -<pre> -$ opt < a.bc > /dev/null -mypass -<i><no output></i> -$ opt < a.bc > /dev/null -mypass -debug -No debug type -'foo' debug type -'bar' debug type -No debug type (2) -$ opt < a.bc > /dev/null -mypass -debug-only=foo -'foo' debug type -$ opt < a.bc > /dev/null -mypass -debug-only=bar -'bar' debug type -</pre> -</div> - -<p>Of course, in practice, you should only set <tt>DEBUG_TYPE</tt> at the top of -a file, to specify the debug type for the entire module (if you do this before -you <tt>#include "llvm/Support/Debug.h"</tt>, you don't have to insert the ugly -<tt>#undef</tt>'s). Also, you should use names more meaningful than "foo" and -"bar", because there is no system in place to ensure that names do not -conflict. If two different modules use the same string, they will all be turned -on when the name is specified. This allows, for example, all debug information -for instruction scheduling to be enabled with <tt>-debug-type=InstrSched</tt>, -even if the source lives in multiple files.</p> - -<p>The <tt>DEBUG_WITH_TYPE</tt> macro is also available for situations where you -would like to set <tt>DEBUG_TYPE</tt>, but only for one specific <tt>DEBUG</tt> -statement. It takes an additional first parameter, which is the type to use. For -example, the preceding example could be written as:</p> - - -<div class="doc_code"> -<pre> -DEBUG_WITH_TYPE("", errs() << "No debug type\n"); -DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n"); -DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n")); -DEBUG_WITH_TYPE("", errs() << "No debug type (2)\n"); -</pre> -</div> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="Statistic">The <tt>Statistic</tt> class & <tt>-stats</tt> - option</a> -</h3> - -<div> - -<p>The "<tt><a -href="/doxygen/Statistic_8h-source.html">llvm/ADT/Statistic.h</a></tt>" file -provides a class named <tt>Statistic</tt> that is used as a unified way to -keep track of what the LLVM compiler is doing and how effective various -optimizations are. It is useful to see what optimizations are contributing to -making a particular program run faster.</p> - -<p>Often you may run your pass on some big program, and you're interested to see -how many times it makes a certain transformation. Although you can do this with -hand inspection, or some ad-hoc method, this is a real pain and not very useful -for big programs. 
Using the <tt>Statistic</tt> class makes it very easy to -keep track of this information, and the calculated information is presented in a -uniform manner with the rest of the passes being executed.</p> - -<p>There are many examples of <tt>Statistic</tt> uses, but the basics of using -it are as follows:</p> - -<ol> - <li><p>Define your statistic like this:</p> - -<div class="doc_code"> -<pre> -#define <a href="#DEBUG_TYPE">DEBUG_TYPE</a> "mypassname" <i>// This goes before any #includes.</i> -STATISTIC(NumXForms, "The # of times I did stuff"); -</pre> -</div> - - <p>The <tt>STATISTIC</tt> macro defines a static variable, whose name is - specified by the first argument. The pass name is taken from the DEBUG_TYPE - macro, and the description is taken from the second argument. The variable - defined ("NumXForms" in this case) acts like an unsigned integer.</p></li> - - <li><p>Whenever you make a transformation, bump the counter:</p> - -<div class="doc_code"> -<pre> -++NumXForms; // <i>I did stuff!</i> -</pre> -</div> - - </li> - </ol> - - <p>That's all you have to do. To get '<tt>opt</tt>' to print out the - statistics gathered, use the '<tt>-stats</tt>' option:</p> - -<div class="doc_code"> -<pre> -$ opt -stats -mypassname < program.bc > /dev/null -<i>... statistics output ...</i> -</pre> -</div> - - <p> When running <tt>opt</tt> on a C file from the SPEC benchmark -suite, it gives a report that looks like this:</p> - -<div class="doc_code"> -<pre> - 7646 bitcodewriter - Number of normal instructions - 725 bitcodewriter - Number of oversized instructions - 129996 bitcodewriter - Number of bitcode bytes written - 2817 raise - Number of insts DCEd or constprop'd - 3213 raise - Number of cast-of-self removed - 5046 raise - Number of expression trees converted - 75 raise - Number of other getelementptr's formed - 138 raise - Number of load/store peepholes - 42 deadtypeelim - Number of unused typenames removed from symtab - 392 funcresolve - Number of varargs functions resolved - 27 globaldce - Number of global variables removed - 2 adce - Number of basic blocks removed - 134 cee - Number of branches revectored - 49 cee - Number of setcc instruction eliminated - 532 gcse - Number of loads removed - 2919 gcse - Number of instructions removed - 86 indvars - Number of canonical indvars added - 87 indvars - Number of aux indvars removed - 25 instcombine - Number of dead inst eliminate - 434 instcombine - Number of insts combined - 248 licm - Number of load insts hoisted - 1298 licm - Number of insts hoisted to a loop pre-header - 3 licm - Number of insts hoisted to multiple loop preds (bad, no loop pre-header) - 75 mem2reg - Number of alloca's promoted - 1444 cfgsimplify - Number of blocks simplified -</pre> -</div> - -<p>Obviously, with so many optimizations, having a unified framework for this -stuff is very nice. Making your pass fit well into the framework makes it more -maintainable and useful.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="ViewGraph">Viewing graphs while debugging code</a> -</h3> - -<div> - -<p>Several of the important data structures in LLVM are graphs: for example -CFGs made out of LLVM <a href="#BasicBlock">BasicBlock</a>s, CFGs made out of -LLVM <a href="CodeGenerator.html#machinebasicblock">MachineBasicBlock</a>s, and -<a href="CodeGenerator.html#selectiondag_intro">Instruction Selection -DAGs</a>. 
In many cases, while debugging various parts of the compiler, it is -nice to instantly visualize these graphs.</p> - -<p>LLVM provides several callbacks that are available in a debug build to do -exactly that. If you call the <tt>Function::viewCFG()</tt> method, for example, -the current LLVM tool will pop up a window containing the CFG for the function -where each basic block is a node in the graph, and each node contains the -instructions in the block. Similarly, there also exists -<tt>Function::viewCFGOnly()</tt> (does not include the instructions), the -<tt>MachineFunction::viewCFG()</tt> and <tt>MachineFunction::viewCFGOnly()</tt>, -and the <tt>SelectionDAG::viewGraph()</tt> methods. Within GDB, for example, -you can usually use something like <tt>call DAG.viewGraph()</tt> to pop -up a window. Alternatively, you can sprinkle calls to these functions in your -code in places you want to debug.</p> - -<p>Getting this to work requires a small amount of configuration. On Unix -systems with X11, install the <a href="http://www.graphviz.org">graphviz</a> -toolkit, and make sure 'dot' and 'gv' are in your path. If you are running on -Mac OS/X, download and install the Mac OS/X <a -href="http://www.pixelglow.com/graphviz/">Graphviz program</a>, and add -<tt>/Applications/Graphviz.app/Contents/MacOS/</tt> (or wherever you install -it) to your path. Once in your system and path are set up, rerun the LLVM -configure script and rebuild LLVM to enable this functionality.</p> - -<p><tt>SelectionDAG</tt> has been extended to make it easier to locate -<i>interesting</i> nodes in large complex graphs. From gdb, if you -<tt>call DAG.setGraphColor(<i>node</i>, "<i>color</i>")</tt>, then the -next <tt>call DAG.viewGraph()</tt> would highlight the node in the -specified color (choices of colors can be found at <a -href="http://www.graphviz.org/doc/info/colors.html">colors</a>.) More -complex node attributes can be provided with <tt>call -DAG.setGraphAttrs(<i>node</i>, "<i>attributes</i>")</tt> (choices can be -found at <a href="http://www.graphviz.org/doc/info/attrs.html">Graph -Attributes</a>.) If you want to restart and clear all the current graph -attributes, then you can <tt>call DAG.clearGraphAttrs()</tt>. </p> - -<p>Note that graph visualization features are compiled out of Release builds -to reduce file size. This means that you need a Debug+Asserts or -Release+Asserts build to use these features.</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="datastructure">Picking the Right Data Structure for a Task</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>LLVM has a plethora of data structures in the <tt>llvm/ADT/</tt> directory, - and we commonly use STL data structures. This section describes the trade-offs - you should consider when you pick one.</p> - -<p> -The first step is a choose your own adventure: do you want a sequential -container, a set-like container, or a map-like container? The most important -thing when choosing a container is the algorithmic properties of how you plan to -access the container. Based on that, you should use:</p> - -<ul> -<li>a <a href="#ds_map">map-like</a> container if you need efficient look-up - of an value based on another value. Map-like containers also support - efficient queries for containment (whether a key is in the map). Map-like - containers generally do not support efficient reverse mapping (values to - keys). 
If you need that, use two maps. Some map-like containers also - support efficient iteration through the keys in sorted order. Map-like - containers are the most expensive sort, only use them if you need one of - these capabilities.</li> - -<li>a <a href="#ds_set">set-like</a> container if you need to put a bunch of - stuff into a container that automatically eliminates duplicates. Some - set-like containers support efficient iteration through the elements in - sorted order. Set-like containers are more expensive than sequential - containers. -</li> - -<li>a <a href="#ds_sequential">sequential</a> container provides - the most efficient way to add elements and keeps track of the order they are - added to the collection. They permit duplicates and support efficient - iteration, but do not support efficient look-up based on a key. -</li> - -<li>a <a href="#ds_string">string</a> container is a specialized sequential - container or reference structure that is used for character or byte - arrays.</li> - -<li>a <a href="#ds_bit">bit</a> container provides an efficient way to store and - perform set operations on sets of numeric id's, while automatically - eliminating duplicates. Bit containers require a maximum of 1 bit for each - identifier you want to store. -</li> -</ul> - -<p> -Once the proper category of container is determined, you can fine tune the -memory use, constant factors, and cache behaviors of access by intelligently -picking a member of the category. Note that constant factors and cache behavior -can be a big deal. If you have a vector that usually only contains a few -elements (but could contain many), for example, it's much better to use -<a href="#dss_smallvector">SmallVector</a> than <a href="#dss_vector">vector</a> -. Doing so avoids (relatively) expensive malloc/free calls, which dwarf the -cost of adding the elements to the container. </p> - -<!-- ======================================================================= --> -<h3> - <a name="ds_sequential">Sequential Containers (std::vector, std::list, etc)</a> -</h3> - -<div> -There are a variety of sequential containers available for you, based on your -needs. Pick the first in this section that will do what you want. - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_arrayref">llvm/ADT/ArrayRef.h</a> -</h4> - -<div> -<p>The llvm::ArrayRef class is the preferred class to use in an interface that - accepts a sequential list of elements in memory and just reads from them. By - taking an ArrayRef, the API can be passed a fixed size array, an std::vector, - an llvm::SmallVector and anything else that is contiguous in memory. -</p> -</div> - - - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_fixedarrays">Fixed Size Arrays</a> -</h4> - -<div> -<p>Fixed size arrays are very simple and very fast. They are good if you know -exactly how many elements you have, or you have a (low) upper bound on how many -you have.</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_heaparrays">Heap Allocated Arrays</a> -</h4> - -<div> -<p>Heap allocated arrays (new[] + delete[]) are also simple. They are good if -the number of elements is variable, if you know how many elements you will need -before the array is allocated, and if the array is usually large (if not, -consider a <a href="#dss_smallvector">SmallVector</a>). 
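<p>As a brief aside, here is a minimal sketch of the SmallVector alternative just mentioned. It is only an illustration: the element type, the inline capacity of 8, and the function name are arbitrary choices for this example, not part of any LLVM interface.</p>

<div class="doc_code">
<pre>
#include "llvm/ADT/SmallVector.h"
using namespace llvm;

// Gather the non-zero entries of Data into a buffer that is usually small.
// With an inline capacity of 8, no heap allocation happens unless more than
// 8 elements end up being kept.
void gatherNonZero(const unsigned *Data, unsigned N,
                   SmallVectorImpl<unsigned> &Result) {
  for (unsigned i = 0; i != N; ++i)
    if (Data[i] != 0)
      Result.push_back(Data[i]);
}

// A typical caller keeps the buffer on the stack:
//   SmallVector<unsigned, 8> NonZero;
//   gatherNonZero(Data, N, NonZero);
</pre>
</div>

<p>If the result usually fits in the inline capacity, the malloc/free pair that a heap-allocated array would need is avoided entirely.</p>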
The cost of a heap -allocated array is the cost of the new/delete (aka malloc/free). Also note that -if you are allocating an array of a type with a constructor, the constructor and -destructors will be run for every element in the array (re-sizable vectors only -construct those elements actually used).</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_tinyptrvector">"llvm/ADT/TinyPtrVector.h"</a> -</h4> - - -<div> -<p><tt>TinyPtrVector<Type></tt> is a highly specialized collection class -that is optimized to avoid allocation in the case when a vector has zero or one -elements. It has two major restrictions: 1) it can only hold values of pointer -type, and 2) it cannot hold a null pointer.</p> - -<p>Since this container is highly specialized, it is rarely used.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_smallvector">"llvm/ADT/SmallVector.h"</a> -</h4> - -<div> -<p><tt>SmallVector<Type, N></tt> is a simple class that looks and smells -just like <tt>vector<Type></tt>: -it supports efficient iteration, lays out elements in memory order (so you can -do pointer arithmetic between elements), supports efficient push_back/pop_back -operations, supports efficient random access to its elements, etc.</p> - -<p>The advantage of SmallVector is that it allocates space for -some number of elements (N) <b>in the object itself</b>. Because of this, if -the SmallVector is dynamically smaller than N, no malloc is performed. This can -be a big win in cases where the malloc/free call is far more expensive than the -code that fiddles around with the elements.</p> - -<p>This is good for vectors that are "usually small" (e.g. the number of -predecessors/successors of a block is usually less than 8). On the other hand, -this makes the size of the SmallVector itself large, so you don't want to -allocate lots of them (doing so will waste a lot of space). As such, -SmallVectors are most useful when on the stack.</p> - -<p>SmallVector also provides a nice portable and efficient replacement for -<tt>alloca</tt>.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_vector"><vector></a> -</h4> - -<div> -<p> -std::vector is well loved and respected. It is useful when SmallVector isn't: -when the size of the vector is often large (thus the small optimization will -rarely be a benefit) or if you will be allocating many instances of the vector -itself (which would waste space for elements that aren't in the container). -vector is also useful when interfacing with code that expects vectors :). -</p> - -<p>One worthwhile note about std::vector: avoid code like this:</p> - -<div class="doc_code"> -<pre> -for ( ... ) { - std::vector<foo> V; - // make use of V. -} -</pre> -</div> - -<p>Instead, write this as:</p> - -<div class="doc_code"> -<pre> -std::vector<foo> V; -for ( ... ) { - // make use of V. - V.clear(); -} -</pre> -</div> - -<p>Doing so will save (at least) one heap allocation and free per iteration of -the loop.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_deque"><deque></a> -</h4> - -<div> -<p>std::deque is, in some senses, a generalized version of std::vector. Like -std::vector, it provides constant time random access and other similar -properties, but it also provides efficient access to the front of the list. 
It -does not guarantee continuity of elements within memory.</p> - -<p>In exchange for this extra flexibility, std::deque has significantly higher -constant factor costs than std::vector. If possible, use std::vector or -something cheaper.</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_list"><list></a> -</h4> - -<div> -<p>std::list is an extremely inefficient class that is rarely useful. -It performs a heap allocation for every element inserted into it, thus having an -extremely high constant factor, particularly for small data types. std::list -also only supports bidirectional iteration, not random access iteration.</p> - -<p>In exchange for this high cost, std::list supports efficient access to both -ends of the list (like std::deque, but unlike std::vector or SmallVector). In -addition, the iterator invalidation characteristics of std::list are stronger -than that of a vector class: inserting or removing an element into the list does -not invalidate iterator or pointers to other elements in the list.</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_ilist">llvm/ADT/ilist.h</a> -</h4> - -<div> -<p><tt>ilist<T></tt> implements an 'intrusive' doubly-linked list. It is -intrusive, because it requires the element to store and provide access to the -prev/next pointers for the list.</p> - -<p><tt>ilist</tt> has the same drawbacks as <tt>std::list</tt>, and additionally -requires an <tt>ilist_traits</tt> implementation for the element type, but it -provides some novel characteristics. In particular, it can efficiently store -polymorphic objects, the traits class is informed when an element is inserted or -removed from the list, and <tt>ilist</tt>s are guaranteed to support a -constant-time splice operation.</p> - -<p>These properties are exactly what we want for things like -<tt>Instruction</tt>s and basic blocks, which is why these are implemented with -<tt>ilist</tt>s.</p> - -Related classes of interest are explained in the following subsections: - <ul> - <li><a href="#dss_ilist_traits">ilist_traits</a></li> - <li><a href="#dss_iplist">iplist</a></li> - <li><a href="#dss_ilist_node">llvm/ADT/ilist_node.h</a></li> - <li><a href="#dss_ilist_sentinel">Sentinels</a></li> - </ul> -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_packedvector">llvm/ADT/PackedVector.h</a> -</h4> - -<div> -<p> -Useful for storing a vector of values using only a few number of bits for each -value. Apart from the standard operations of a vector-like container, it can -also perform an 'or' set operation. -</p> - -<p>For example:</p> - -<div class="doc_code"> -<pre> -enum State { - None = 0x0, - FirstCondition = 0x1, - SecondCondition = 0x2, - Both = 0x3 -}; - -State get() { - PackedVector<State, 2> Vec1; - Vec1.push_back(FirstCondition); - - PackedVector<State, 2> Vec2; - Vec2.push_back(SecondCondition); - - Vec1 |= Vec2; - return Vec1[0]; // returns 'Both'. -} -</pre> -</div> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_ilist_traits">ilist_traits</a> -</h4> - -<div> -<p><tt>ilist_traits<T></tt> is <tt>ilist<T></tt>'s customization -mechanism. 
<tt>iplist<T></tt> (and consequently <tt>ilist<T></tt>) -publicly derive from this traits class.</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_iplist">iplist</a> -</h4> - -<div> -<p><tt>iplist<T></tt> is <tt>ilist<T></tt>'s base and as such -supports a slightly narrower interface. Notably, inserters from -<tt>T&</tt> are absent.</p> - -<p><tt>ilist_traits<T></tt> is a public base of this class and can be -used for a wide variety of customizations.</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_ilist_node">llvm/ADT/ilist_node.h</a> -</h4> - -<div> -<p><tt>ilist_node<T></tt> implements a the forward and backward links -that are expected by the <tt>ilist<T></tt> (and analogous containers) -in the default manner.</p> - -<p><tt>ilist_node<T></tt>s are meant to be embedded in the node type -<tt>T</tt>, usually <tt>T</tt> publicly derives from -<tt>ilist_node<T></tt>.</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_ilist_sentinel">Sentinels</a> -</h4> - -<div> -<p><tt>ilist</tt>s have another specialty that must be considered. To be a good -citizen in the C++ ecosystem, it needs to support the standard container -operations, such as <tt>begin</tt> and <tt>end</tt> iterators, etc. Also, the -<tt>operator--</tt> must work correctly on the <tt>end</tt> iterator in the -case of non-empty <tt>ilist</tt>s.</p> - -<p>The only sensible solution to this problem is to allocate a so-called -<i>sentinel</i> along with the intrusive list, which serves as the <tt>end</tt> -iterator, providing the back-link to the last element. However conforming to the -C++ convention it is illegal to <tt>operator++</tt> beyond the sentinel and it -also must not be dereferenced.</p> - -<p>These constraints allow for some implementation freedom to the <tt>ilist</tt> -how to allocate and store the sentinel. The corresponding policy is dictated -by <tt>ilist_traits<T></tt>. By default a <tt>T</tt> gets heap-allocated -whenever the need for a sentinel arises.</p> - -<p>While the default policy is sufficient in most cases, it may break down when -<tt>T</tt> does not provide a default constructor. Also, in the case of many -instances of <tt>ilist</tt>s, the memory overhead of the associated sentinels -is wasted. To alleviate the situation with numerous and voluminous -<tt>T</tt>-sentinels, sometimes a trick is employed, leading to <i>ghostly -sentinels</i>.</p> - -<p>Ghostly sentinels are obtained by specially-crafted <tt>ilist_traits<T></tt> -which superpose the sentinel with the <tt>ilist</tt> instance in memory. Pointer -arithmetic is used to obtain the sentinel, which is relative to the -<tt>ilist</tt>'s <tt>this</tt> pointer. The <tt>ilist</tt> is augmented by an -extra pointer, which serves as the back-link of the sentinel. This is the only -field in the ghostly sentinel which can be legally accessed.</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_other">Other Sequential Container options</a> -</h4> - -<div> -<p>Other STL containers are available, such as std::string.</p> - -<p>There are also various STL adapter classes such as std::queue, -std::priority_queue, std::stack, etc. 
These provide simplified access to an -underlying container but don't affect the cost of the container itself.</p> - -</div> -</div> - -<!-- ======================================================================= --> -<h3> - <a name="ds_string">String-like containers</a> -</h3> - -<div> - -<p> -There are a variety of ways to pass around and use strings in C and C++, and -LLVM adds a few new options to choose from. Pick the first option on this list -that will do what you need, they are ordered according to their relative cost. -</p> -<p> -Note that is is generally preferred to <em>not</em> pass strings around as -"<tt>const char*</tt>"'s. These have a number of problems, including the fact -that they cannot represent embedded nul ("\0") characters, and do not have a -length available efficiently. The general replacement for '<tt>const -char*</tt>' is StringRef. -</p> - -<p>For more information on choosing string containers for APIs, please see -<a href="#string_apis">Passing strings</a>.</p> - - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_stringref">llvm/ADT/StringRef.h</a> -</h4> - -<div> -<p> -The StringRef class is a simple value class that contains a pointer to a -character and a length, and is quite related to the <a -href="#dss_arrayref">ArrayRef</a> class (but specialized for arrays of -characters). Because StringRef carries a length with it, it safely handles -strings with embedded nul characters in it, getting the length does not require -a strlen call, and it even has very convenient APIs for slicing and dicing the -character range that it represents. -</p> - -<p> -StringRef is ideal for passing simple strings around that are known to be live, -either because they are C string literals, std::string, a C array, or a -SmallVector. Each of these cases has an efficient implicit conversion to -StringRef, which doesn't result in a dynamic strlen being executed. -</p> - -<p>StringRef has a few major limitations which make more powerful string -containers useful:</p> - -<ol> -<li>You cannot directly convert a StringRef to a 'const char*' because there is -no way to add a trailing nul (unlike the .c_str() method on various stronger -classes).</li> - - -<li>StringRef doesn't own or keep alive the underlying string bytes. -As such it can easily lead to dangling pointers, and is not suitable for -embedding in datastructures in most cases (instead, use an std::string or -something like that).</li> - -<li>For the same reason, StringRef cannot be used as the return value of a -method if the method "computes" the result string. Instead, use -std::string.</li> - -<li>StringRef's do not allow you to mutate the pointed-to string bytes and it -doesn't allow you to insert or remove bytes from the range. For editing -operations like this, it interoperates with the <a -href="#dss_twine">Twine</a> class.</li> -</ol> - -<p>Because of its strengths and limitations, it is very common for a function to -take a StringRef and for a method on an object to return a StringRef that -points into some string that it owns.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_twine">llvm/ADT/Twine.h</a> -</h4> - -<div> - <p> - The Twine class is used as an intermediary datatype for APIs that want to take - a string that can be constructed inline with a series of concatenations. 
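<p>Before looking at how Twine is implemented, here is a minimal sketch of the receiving side of such an API. The function name is purely illustrative; the point is only that the callee renders the Twine into real storage once, at the moment the bytes are needed.</p>

<div class="doc_code">
<pre>
#include "llvm/ADT/Twine.h"
#include <string>
using namespace llvm;

// A sink that accepts a possibly-concatenated name.  The concatenation is
// only materialized here, by the call to str().
void setWidgetName(const Twine &Name) {
  std::string Storage = Name.str();
  // ... use Storage ...
}

// Callers build the name inline, with no intermediate std::string:
//   setWidgetName(SO->getName() + ".split");
</pre>
</div>

<p>The examples that follow show the caller side and the lifetime rules in more detail.</p>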
- Twine works by forming recursive instances of the Twine datatype (a simple - value object) on the stack as temporary objects, linking them together into a - tree which is then linearized when the Twine is consumed. Twine is only safe - to use as the argument to a function, and should always be a const reference, - e.g.: - </p> - - <pre> - void foo(const Twine &T); - ... - StringRef X = ... - unsigned i = ... - foo(X + "." + Twine(i)); - </pre> - - <p>This example forms a string like "blarg.42" by concatenating the values - together, and does not form intermediate strings containing "blarg" or - "blarg.". - </p> - - <p>Because Twine is constructed with temporary objects on the stack, and - because these instances are destroyed at the end of the current statement, - it is an inherently dangerous API. For example, this simple variant contains - undefined behavior and will probably crash:</p> - - <pre> - void foo(const Twine &T); - ... - StringRef X = ... - unsigned i = ... - const Twine &Tmp = X + "." + Twine(i); - foo(Tmp); - </pre> - - <p>... because the temporaries are destroyed before the call. That said, - Twine's are much more efficient than intermediate std::string temporaries, and - they work really well with StringRef. Just be aware of their limitations.</p> - -</div> - - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_smallstring">llvm/ADT/SmallString.h</a> -</h4> - -<div> - -<p>SmallString is a subclass of <a href="#dss_smallvector">SmallVector</a> that -adds some convenience APIs like += that takes StringRef's. SmallString avoids -allocating memory in the case when the preallocated space is enough to hold its -data, and it calls back to general heap allocation when required. Since it owns -its data, it is very safe to use and supports full mutation of the string.</p> - -<p>Like SmallVector's, the big downside to SmallString is their sizeof. While -they are optimized for small strings, they themselves are not particularly -small. This means that they work great for temporary scratch buffers on the -stack, but should not generally be put into the heap: it is very rare to -see a SmallString as the member of a frequently-allocated heap data structure -or returned by-value. -</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_stdstring">std::string</a> -</h4> - -<div> - - <p>The standard C++ std::string class is a very general class that (like - SmallString) owns its underlying data. sizeof(std::string) is very reasonable - so it can be embedded into heap data structures and returned by-value. - On the other hand, std::string is highly inefficient for inline editing (e.g. - concatenating a bunch of stuff together) and because it is provided by the - standard library, its performance characteristics depend a lot of the host - standard library (e.g. libc++ and MSVC provide a highly optimized string - class, GCC contains a really slow implementation). - </p> - - <p>The major disadvantage of std::string is that almost every operation that - makes them larger can allocate memory, which is slow. 
As such, it is better - to use SmallVector or Twine as a scratch buffer, but then use std::string to - persist the result.</p> - - -</div> - -<!-- end of strings --> -</div> - - -<!-- ======================================================================= --> -<h3> - <a name="ds_set">Set-Like Containers (std::set, SmallSet, SetVector, etc)</a> -</h3> - -<div> - -<p>Set-like containers are useful when you need to canonicalize multiple values -into a single representation. There are several different choices for how to do -this, providing various trade-offs.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_sortedvectorset">A sorted 'vector'</a> -</h4> - -<div> - -<p>If you intend to insert a lot of elements, then do a lot of queries, a -great approach is to use a vector (or other sequential container) with -std::sort+std::unique to remove duplicates. This approach works really well if -your usage pattern has these two distinct phases (insert then query), and can be -coupled with a good choice of <a href="#ds_sequential">sequential container</a>. -</p> - -<p> -This combination provides the several nice properties: the result data is -contiguous in memory (good for cache locality), has few allocations, is easy to -address (iterators in the final vector are just indices or pointers), and can be -efficiently queried with a standard binary or radix search.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_smallset">"llvm/ADT/SmallSet.h"</a> -</h4> - -<div> - -<p>If you have a set-like data structure that is usually small and whose elements -are reasonably small, a <tt>SmallSet<Type, N></tt> is a good choice. This set -has space for N elements in place (thus, if the set is dynamically smaller than -N, no malloc traffic is required) and accesses them with a simple linear search. -When the set grows beyond 'N' elements, it allocates a more expensive representation that -guarantees efficient access (for most types, it falls back to std::set, but for -pointers it uses something far better, <a -href="#dss_smallptrset">SmallPtrSet</a>).</p> - -<p>The magic of this class is that it handles small sets extremely efficiently, -but gracefully handles extremely large sets without loss of efficiency. The -drawback is that the interface is quite small: it supports insertion, queries -and erasing, but does not support iteration.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_smallptrset">"llvm/ADT/SmallPtrSet.h"</a> -</h4> - -<div> - -<p>SmallPtrSet has all the advantages of <tt>SmallSet</tt> (and a <tt>SmallSet</tt> of pointers is -transparently implemented with a <tt>SmallPtrSet</tt>), but also supports iterators. If -more than 'N' insertions are performed, a single quadratically -probed hash table is allocated and grows as needed, providing extremely -efficient access (constant time insertion/deleting/queries with low constant -factors) and is very stingy with malloc traffic.</p> - -<p>Note that, unlike <tt>std::set</tt>, the iterators of <tt>SmallPtrSet</tt> are invalidated -whenever an insertion occurs. Also, the values visited by the iterators are not -visited in sorted order.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_denseset">"llvm/ADT/DenseSet.h"</a> -</h4> - -<div> - -<p> -DenseSet is a simple quadratically probed hash table. 
It excels at supporting -small values: it uses a single allocation to hold all of the pairs that -are currently inserted in the set. DenseSet is a great way to unique small -values that are not simple pointers (use <a -href="#dss_smallptrset">SmallPtrSet</a> for pointers). Note that DenseSet has -the same requirements for the value type that <a -href="#dss_densemap">DenseMap</a> has. -</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_sparseset">"llvm/ADT/SparseSet.h"</a> -</h4> - -<div> - -<p>SparseSet holds a small number of objects identified by unsigned keys of -moderate size. It uses a lot of memory, but provides operations that are -almost as fast as a vector. Typical keys are physical registers, virtual -registers, or numbered basic blocks.</p> - -<p>SparseSet is useful for algorithms that need very fast clear/find/insert/erase -and fast iteration over small sets. It is not intended for building composite -data structures.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_FoldingSet">"llvm/ADT/FoldingSet.h"</a> -</h4> - -<div> - -<p> -FoldingSet is an aggregate class that is really good at uniquing -expensive-to-create or polymorphic objects. It is a combination of a chained -hash table with intrusive links (uniqued objects are required to inherit from -FoldingSetNode) that uses <a href="#dss_smallvector">SmallVector</a> as part of -its ID process.</p> - -<p>Consider a case where you want to implement a "getOrCreateFoo" method for -a complex object (for example, a node in the code generator). The client has a -description of *what* it wants to generate (it knows the opcode and all the -operands), but we don't want to 'new' a node, then try inserting it into a set -only to find out it already exists, at which point we would have to delete it -and return the node that already exists. -</p> - -<p>To support this style of client, FoldingSet perform a query with a -FoldingSetNodeID (which wraps SmallVector) that can be used to describe the -element that we want to query for. The query either returns the element -matching the ID or it returns an opaque ID that indicates where insertion should -take place. Construction of the ID usually does not require heap traffic.</p> - -<p>Because FoldingSet uses intrusive links, it can support polymorphic objects -in the set (for example, you can have SDNode instances mixed with LoadSDNodes). -Because the elements are individually allocated, pointers to the elements are -stable: inserting or removing elements does not invalidate any pointers to other -elements. -</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_set"><set></a> -</h4> - -<div> - -<p><tt>std::set</tt> is a reasonable all-around set class, which is decent at -many things but great at nothing. std::set allocates memory for each element -inserted (thus it is very malloc intensive) and typically stores three pointers -per element in the set (thus adding a large amount of per-element space -overhead). 
It offers guaranteed log(n) performance, which is not particularly -fast from a complexity standpoint (particularly if the elements of the set are -expensive to compare, like strings), and has extremely high constant factors for -lookup, insertion and removal.</p> - -<p>The advantages of std::set are that its iterators are stable (deleting or -inserting an element from the set does not affect iterators or pointers to other -elements) and that iteration over the set is guaranteed to be in sorted order. -If the elements in the set are large, then the relative overhead of the pointers -and malloc traffic is not a big deal, but if the elements of the set are small, -std::set is almost never a good choice.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_setvector">"llvm/ADT/SetVector.h"</a> -</h4> - -<div> -<p>LLVM's SetVector<Type> is an adapter class that combines your choice of -a set-like container along with a <a href="#ds_sequential">Sequential -Container</a>. The important property -that this provides is efficient insertion with uniquing (duplicate elements are -ignored) with iteration support. It implements this by inserting elements into -both a set-like container and the sequential container, using the set-like -container for uniquing and the sequential container for iteration. -</p> - -<p>The difference between SetVector and other sets is that the order of -iteration is guaranteed to match the order of insertion into the SetVector. -This property is really important for things like sets of pointers. Because -pointer values are non-deterministic (e.g. vary across runs of the program on -different machines), iterating over the pointers in the set will -not be in a well-defined order.</p> - -<p> -The drawback of SetVector is that it requires twice as much space as a normal -set and has the sum of constant factors from the set-like container and the -sequential container that it uses. Use it *only* if you need to iterate over -the elements in a deterministic order. SetVector is also expensive to delete -elements out of (linear time), unless you use it's "pop_back" method, which is -faster. -</p> - -<p><tt>SetVector</tt> is an adapter class that defaults to - using <tt>std::vector</tt> and a size 16 <tt>SmallSet</tt> for the underlying - containers, so it is quite expensive. However, - <tt>"llvm/ADT/SetVector.h"</tt> also provides a <tt>SmallSetVector</tt> - class, which defaults to using a <tt>SmallVector</tt> and <tt>SmallSet</tt> - of a specified size. If you use this, and if your sets are dynamically - smaller than <tt>N</tt>, you will save a lot of heap traffic.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_uniquevector">"llvm/ADT/UniqueVector.h"</a> -</h4> - -<div> - -<p> -UniqueVector is similar to <a href="#dss_setvector">SetVector</a>, but it -retains a unique ID for each element inserted into the set. It internally -contains a map and a vector, and it assigns a unique ID for each value inserted -into the set.</p> - -<p>UniqueVector is very expensive: its cost is the sum of the cost of -maintaining both the map and vector, it has high complexity, high constant -factors, and produces a lot of malloc traffic. 
It should be avoided.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_immutableset">"llvm/ADT/ImmutableSet.h"</a> -</h4> - -<div> - -<p> -ImmutableSet is an immutable (functional) set implementation based on an AVL -tree. -Adding or removing elements is done through a Factory object and results in the -creation of a new ImmutableSet object. -If an ImmutableSet already exists with the given contents, then the existing one -is returned; equality is compared with a FoldingSetNodeID. -The time and space complexity of add or remove operations is logarithmic in the -size of the original set. - -<p> -There is no method for returning an element of the set, you can only check for -membership. - -</div> - - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_otherset">Other Set-Like Container Options</a> -</h4> - -<div> - -<p> -The STL provides several other options, such as std::multiset and the various -"hash_set" like containers (whether from C++ TR1 or from the SGI library). We -never use hash_set and unordered_set because they are generally very expensive -(each insertion requires a malloc) and very non-portable. -</p> - -<p>std::multiset is useful if you're not interested in elimination of -duplicates, but has all the drawbacks of std::set. A sorted vector (where you -don't delete duplicate entries) or some other approach is almost always -better.</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="ds_map">Map-Like Containers (std::map, DenseMap, etc)</a> -</h3> - -<div> -Map-like containers are useful when you want to associate data to a key. As -usual, there are a lot of different ways to do this. :) - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_sortedvectormap">A sorted 'vector'</a> -</h4> - -<div> - -<p> -If your usage pattern follows a strict insert-then-query approach, you can -trivially use the same approach as <a href="#dss_sortedvectorset">sorted vectors -for set-like containers</a>. The only difference is that your query function -(which uses std::lower_bound to get efficient log(n) lookup) should only compare -the key, not both the key and value. This yields the same advantages as sorted -vectors for sets. -</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_stringmap">"llvm/ADT/StringMap.h"</a> -</h4> - -<div> - -<p> -Strings are commonly used as keys in maps, and they are difficult to support -efficiently: they are variable length, inefficient to hash and compare when -long, expensive to copy, etc. StringMap is a specialized container designed to -cope with these issues. It supports mapping an arbitrary range of bytes to an -arbitrary other object.</p> - -<p>The StringMap implementation uses a quadratically-probed hash table, where -the buckets store a pointer to the heap allocated entries (and some other -stuff). The entries in the map must be heap allocated because the strings are -variable length. The string data (key) and the element object (value) are -stored in the same allocation with the string data immediately after the element -object. 
-This container guarantees that "<tt>(char*)(&Value+1)</tt>" points
-to the key string for a value.</p>
-
-<p>The StringMap is very fast for several reasons: quadratic probing is very
-cache efficient for lookups, the hash value of strings in buckets is not
-recomputed when looking up an element, StringMap rarely has to touch the
-memory for unrelated objects when looking up a value (even when hash collisions
-happen), hash table growth does not recompute the hash values for strings
-already in the table, and each pair in the map is stored in a single allocation
-(the string data is stored in the same allocation as the Value of a pair).</p>
-
-<p>StringMap also provides query methods that take byte ranges, so it only ever
-copies a string if a value is inserted into the table.</p>
-
-<p>StringMap iteration order, however, is not guaranteed to be deterministic,
-so any uses which require that should instead use a std::map.</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4>
-  <a name="dss_indexedmap">"llvm/ADT/IndexedMap.h"</a>
-</h4>
-
-<div>
-<p>
-IndexedMap is a specialized container for mapping small dense integers (or
-values that can be mapped to small dense integers) to some other type.  It is
-internally implemented as a vector with a mapping function that maps the keys
-to the dense integer range.
-</p>
-
-<p>
-This is useful for cases like virtual registers in the LLVM code generator: they
-have a dense mapping that is offset by a compile-time constant (the first
-virtual register ID).</p>
-
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4>
-  <a name="dss_densemap">"llvm/ADT/DenseMap.h"</a>
-</h4>
-
-<div>
-
-<p>
-DenseMap is a simple quadratically probed hash table.  It excels at supporting
-small keys and values: it uses a single allocation to hold all of the pairs that
-are currently inserted in the map.  DenseMap is a great way to map pointers to
-pointers, or map other small types to each other.
-</p>
-
-<p>
-There are several aspects of DenseMap that you should be aware of, however.  The
-iterators in a DenseMap are invalidated whenever an insertion occurs, unlike
-std::map.  Also, because DenseMap allocates space for a large number of key/value
-pairs (it starts with 64 by default), it will waste a lot of space if your keys
-or values are large.  Finally, you must implement a partial specialization of
-DenseMapInfo for the key that you want, if it isn't already supported.  This
-is required to tell DenseMap about two special marker values (which can never be
-inserted into the map) that it needs internally.</p>
-
-<p>
-DenseMap's find_as() method supports lookup operations using an alternate key
-type.  This is useful in cases where the normal key type is expensive to
-construct, but cheap to compare against.  The DenseMapInfo is responsible for
-defining the appropriate comparison and hashing methods for each alternate
-key type used.
-</p>
-
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4>
-  <a name="dss_valuemap">"llvm/ADT/ValueMap.h"</a>
-</h4>
-
-<div>
-
-<p>
-ValueMap is a wrapper around a <a href="#dss_densemap">DenseMap</a> mapping
-Value*s (or subclasses) to another type.  When a Value is deleted or RAUW'ed,
-ValueMap will update itself so the new version of the key is mapped to the same
-value, just as if the key were a WeakVH.
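-</p>
-
-<p>For illustration, a short sketch that relies on this updating behavior (the
-function and names are invented; <tt>V</tt> and <tt>NewV</tt> are assumed to be
-existing, distinct <tt>Value</tt>s of the same type):</p>
-
-<div class="doc_code">
-<pre>
-#include "llvm/ADT/ValueMap.h"
-#include "llvm/Value.h"
-using namespace llvm;
-
-void trackThroughRAUW(Value *V, Value *NewV) {
-  ValueMap<Value*, unsigned> Weights;
-  Weights[V] = 7;
-  V->replaceAllUsesWith(NewV);
-  // <i>With the default Config, the entry is now keyed by NewV,</i>
-  // <i>so Weights.count(NewV) == 1 and Weights.count(V) == 0.</i>
-}
-</pre>
-</div>
-
-<p>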
You can configure exactly how this -happens, and what else happens on these two events, by passing -a <code>Config</code> parameter to the ValueMap template.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_intervalmap">"llvm/ADT/IntervalMap.h"</a> -</h4> - -<div> - -<p> IntervalMap is a compact map for small keys and values. It maps key -intervals instead of single keys, and it will automatically coalesce adjacent -intervals. When then map only contains a few intervals, they are stored in the -map object itself to avoid allocations.</p> - -<p> The IntervalMap iterators are quite big, so they should not be passed around -as STL iterators. The heavyweight iterators allow a smaller data structure.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_map"><map></a> -</h4> - -<div> - -<p> -std::map has similar characteristics to <a href="#dss_set">std::set</a>: it uses -a single allocation per pair inserted into the map, it offers log(n) lookup with -an extremely large constant factor, imposes a space penalty of 3 pointers per -pair in the map, etc.</p> - -<p>std::map is most useful when your keys or values are very large, if you need -to iterate over the collection in sorted order, or if you need stable iterators -into the map (i.e. they don't get invalidated if an insertion or deletion of -another element takes place).</p> - -</div> - - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_mapvector">"llvm/ADT/MapVector.h"</a> -</h4> -<div> - -<p> MapVector<KeyT,ValueT> provides a subset of the DenseMap interface. - The main difference is that the iteration order is guaranteed to be - the insertion order, making it an easy (but somewhat expensive) solution - for non-deterministic iteration over maps of pointers. </p> - -<p> It is implemented by mapping from key to an index in a vector of key,value - pairs. This provides fast lookup and iteration, but has two main drawbacks: - The key is stored twice and it doesn't support removing elements. </p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_inteqclasses">"llvm/ADT/IntEqClasses.h"</a> -</h4> - -<div> - -<p>IntEqClasses provides a compact representation of equivalence classes of -small integers. Initially, each integer in the range 0..n-1 has its own -equivalence class. Classes can be joined by passing two class representatives to -the join(a, b) method. Two integers are in the same class when findLeader() -returns the same representative.</p> - -<p>Once all equivalence classes are formed, the map can be compressed so each -integer 0..n-1 maps to an equivalence class number in the range 0..m-1, where m -is the total number of equivalence classes. The map must be uncompressed before -it can be edited again.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_immutablemap">"llvm/ADT/ImmutableMap.h"</a> -</h4> - -<div> - -<p> -ImmutableMap is an immutable (functional) map implementation based on an AVL -tree. -Adding or removing elements is done through a Factory object and results in the -creation of a new ImmutableMap object. -If an ImmutableMap already exists with the given key set, then the existing one -is returned; equality is compared with a FoldingSetNodeID. 
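-</p>
-
-<p>A brief sketch of the Factory-based interface (the key and value types are
-chosen arbitrarily for the example):</p>
-
-<div class="doc_code">
-<pre>
-#include "llvm/ADT/ImmutableMap.h"
-
-llvm::ImmutableMap<unsigned, unsigned>::Factory F;
-llvm::ImmutableMap<unsigned, unsigned> Empty = F.getEmptyMap();
-llvm::ImmutableMap<unsigned, unsigned> One   = F.add(Empty, 1, 10); // <i>new map; Empty is untouched</i>
-llvm::ImmutableMap<unsigned, unsigned> Same  = F.add(Empty, 1, 10); // <i>canonicalized to the map already built</i>
-</pre>
-</div>
-
-<p>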
-The time and space complexity of add or remove operations is logarithmic in the -size of the original map. - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_othermap">Other Map-Like Container Options</a> -</h4> - -<div> - -<p> -The STL provides several other options, such as std::multimap and the various -"hash_map" like containers (whether from C++ TR1 or from the SGI library). We -never use hash_set and unordered_set because they are generally very expensive -(each insertion requires a malloc) and very non-portable.</p> - -<p>std::multimap is useful if you want to map a key to multiple values, but has -all the drawbacks of std::map. A sorted vector or some other approach is almost -always better.</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="ds_bit">Bit storage containers (BitVector, SparseBitVector)</a> -</h3> - -<div> -<p>Unlike the other containers, there are only two bit storage containers, and -choosing when to use each is relatively straightforward.</p> - -<p>One additional option is -<tt>std::vector<bool></tt>: we discourage its use for two reasons 1) the -implementation in many common compilers (e.g. commonly available versions of -GCC) is extremely inefficient and 2) the C++ standards committee is likely to -deprecate this container and/or change it significantly somehow. In any case, -please don't use it.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_bitvector">BitVector</a> -</h4> - -<div> -<p> The BitVector container provides a dynamic size set of bits for manipulation. -It supports individual bit setting/testing, as well as set operations. The set -operations take time O(size of bitvector), but operations are performed one word -at a time, instead of one bit at a time. This makes the BitVector very fast for -set operations compared to other containers. Use the BitVector when you expect -the number of set bits to be high (IE a dense set). -</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_smallbitvector">SmallBitVector</a> -</h4> - -<div> -<p> The SmallBitVector container provides the same interface as BitVector, but -it is optimized for the case where only a small number of bits, less than -25 or so, are needed. It also transparently supports larger bit counts, but -slightly less efficiently than a plain BitVector, so SmallBitVector should -only be used when larger counts are rare. -</p> - -<p> -At this time, SmallBitVector does not support set operations (and, or, xor), -and its operator[] does not provide an assignable lvalue. -</p> -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="dss_sparsebitvector">SparseBitVector</a> -</h4> - -<div> -<p> The SparseBitVector container is much like BitVector, with one major -difference: Only the bits that are set, are stored. This makes the -SparseBitVector much more space efficient than BitVector when the set is sparse, -as well as making set operations O(number of set bits) instead of O(size of -universe). The downside to the SparseBitVector is that setting and testing of random bits is O(N), and on large SparseBitVectors, this can be slower than BitVector. In our implementation, setting or testing bits in sorted order -(either forwards or reverse) is O(1) worst case. 
Testing and setting bits within 128 bits (depends on size) of the current bit is also O(1). As a general statement, testing/setting bits in a SparseBitVector is O(distance away from last set bit). -</p> -</div> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="common">Helpful Hints for Common Operations</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>This section describes how to perform some very simple transformations of -LLVM code. This is meant to give examples of common idioms used, showing the -practical side of LLVM transformations. <p> Because this is a "how-to" section, -you should also read about the main classes that you will be working with. The -<a href="#coreclasses">Core LLVM Class Hierarchy Reference</a> contains details -and descriptions of the main classes that you should know about.</p> - -<!-- NOTE: this section should be heavy on example code --> -<!-- ======================================================================= --> -<h3> - <a name="inspection">Basic Inspection and Traversal Routines</a> -</h3> - -<div> - -<p>The LLVM compiler infrastructure have many different data structures that may -be traversed. Following the example of the C++ standard template library, the -techniques used to traverse these various data structures are all basically the -same. For a enumerable sequence of values, the <tt>XXXbegin()</tt> function (or -method) returns an iterator to the start of the sequence, the <tt>XXXend()</tt> -function returns an iterator pointing to one past the last valid element of the -sequence, and there is some <tt>XXXiterator</tt> data type that is common -between the two operations.</p> - -<p>Because the pattern for iteration is common across many different aspects of -the program representation, the standard template library algorithms may be used -on them, and it is easier to remember how to iterate. First we show a few common -examples of the data structures that need to be traversed. Other data -structures are traversed in very similar ways.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="iterate_function">Iterating over the </a><a - href="#BasicBlock"><tt>BasicBlock</tt></a>s in a <a - href="#Function"><tt>Function</tt></a> -</h4> - -<div> - -<p>It's quite common to have a <tt>Function</tt> instance that you'd like to -transform in some way; in particular, you'd like to manipulate its -<tt>BasicBlock</tt>s. To facilitate this, you'll need to iterate over all of -the <tt>BasicBlock</tt>s that constitute the <tt>Function</tt>. The following is -an example that prints the name of a <tt>BasicBlock</tt> and the number of -<tt>Instruction</tt>s it contains:</p> - -<div class="doc_code"> -<pre> -// <i>func is a pointer to a Function instance</i> -for (Function::iterator i = func->begin(), e = func->end(); i != e; ++i) - // <i>Print out the name of the basic block if it has one, and then the</i> - // <i>number of instructions that it contains</i> - errs() << "Basic block (name=" << i->getName() << ") has " - << i->size() << " instructions.\n"; -</pre> -</div> - -<p>Note that i can be used as if it were a pointer for the purposes of -invoking member functions of the <tt>Instruction</tt> class. This is -because the indirection operator is overloaded for the iterator -classes. 
In the above code, the expression <tt>i->size()</tt> is -exactly equivalent to <tt>(*i).size()</tt> just like you'd expect.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="iterate_basicblock">Iterating over the </a><a - href="#Instruction"><tt>Instruction</tt></a>s in a <a - href="#BasicBlock"><tt>BasicBlock</tt></a> -</h4> - -<div> - -<p>Just like when dealing with <tt>BasicBlock</tt>s in <tt>Function</tt>s, it's -easy to iterate over the individual instructions that make up -<tt>BasicBlock</tt>s. Here's a code snippet that prints out each instruction in -a <tt>BasicBlock</tt>:</p> - -<div class="doc_code"> -<pre> -// <i>blk is a pointer to a BasicBlock instance</i> -for (BasicBlock::iterator i = blk->begin(), e = blk->end(); i != e; ++i) - // <i>The next statement works since operator<<(ostream&,...)</i> - // <i>is overloaded for Instruction&</i> - errs() << *i << "\n"; -</pre> -</div> - -<p>However, this isn't really the best way to print out the contents of a -<tt>BasicBlock</tt>! Since the ostream operators are overloaded for virtually -anything you'll care about, you could have just invoked the print routine on the -basic block itself: <tt>errs() << *blk << "\n";</tt>.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="iterate_institer">Iterating over the </a><a - href="#Instruction"><tt>Instruction</tt></a>s in a <a - href="#Function"><tt>Function</tt></a> -</h4> - -<div> - -<p>If you're finding that you commonly iterate over a <tt>Function</tt>'s -<tt>BasicBlock</tt>s and then that <tt>BasicBlock</tt>'s <tt>Instruction</tt>s, -<tt>InstIterator</tt> should be used instead. You'll need to include <a -href="/doxygen/InstIterator_8h-source.html"><tt>llvm/Support/InstIterator.h</tt></a>, -and then instantiate <tt>InstIterator</tt>s explicitly in your code. Here's a -small example that shows how to dump all instructions in a function to the standard error stream:<p> - -<div class="doc_code"> -<pre> -#include "<a href="/doxygen/InstIterator_8h-source.html">llvm/Support/InstIterator.h</a>" - -// <i>F is a pointer to a Function instance</i> -for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) - errs() << *I << "\n"; -</pre> -</div> - -<p>Easy, isn't it? You can also use <tt>InstIterator</tt>s to fill a -work list with its initial contents. For example, if you wanted to -initialize a work list to contain all instructions in a <tt>Function</tt> -F, all you would need to do is something like:</p> - -<div class="doc_code"> -<pre> -std::set<Instruction*> worklist; -// or better yet, SmallPtrSet<Instruction*, 64> worklist; - -for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) - worklist.insert(&*I); -</pre> -</div> - -<p>The STL set <tt>worklist</tt> would now contain all instructions in the -<tt>Function</tt> pointed to by F.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="iterate_convert">Turning an iterator into a class pointer (and - vice-versa)</a> -</h4> - -<div> - -<p>Sometimes, it'll be useful to grab a reference (or pointer) to a class -instance when all you've got at hand is an iterator. Well, extracting -a reference or a pointer from an iterator is very straight-forward. 
-Assuming that <tt>i</tt> is a <tt>BasicBlock::iterator</tt> and <tt>j</tt> -is a <tt>BasicBlock::const_iterator</tt>:</p> - -<div class="doc_code"> -<pre> -Instruction& inst = *i; // <i>Grab reference to instruction reference</i> -Instruction* pinst = &*i; // <i>Grab pointer to instruction reference</i> -const Instruction& inst = *j; -</pre> -</div> - -<p>However, the iterators you'll be working with in the LLVM framework are -special: they will automatically convert to a ptr-to-instance type whenever they -need to. Instead of dereferencing the iterator and then taking the address of -the result, you can simply assign the iterator to the proper pointer type and -you get the dereference and address-of operation as a result of the assignment -(behind the scenes, this is a result of overloading casting mechanisms). Thus -the last line of the last example,</p> - -<div class="doc_code"> -<pre> -Instruction *pinst = &*i; -</pre> -</div> - -<p>is semantically equivalent to</p> - -<div class="doc_code"> -<pre> -Instruction *pinst = i; -</pre> -</div> - -<p>It's also possible to turn a class pointer into the corresponding iterator, -and this is a constant time operation (very efficient). The following code -snippet illustrates use of the conversion constructors provided by LLVM -iterators. By using these, you can explicitly grab the iterator of something -without actually obtaining it via iteration over some structure:</p> - -<div class="doc_code"> -<pre> -void printNextInstruction(Instruction* inst) { - BasicBlock::iterator it(inst); - ++it; // <i>After this line, it refers to the instruction after *inst</i> - if (it != inst->getParent()->end()) errs() << *it << "\n"; -} -</pre> -</div> - -<p>Unfortunately, these implicit conversions come at a cost; they prevent -these iterators from conforming to standard iterator conventions, and thus -from being usable with standard algorithms and containers. For example, they -prevent the following code, where <tt>B</tt> is a <tt>BasicBlock</tt>, -from compiling:</p> - -<div class="doc_code"> -<pre> - llvm::SmallVector<llvm::Instruction *, 16>(B->begin(), B->end()); -</pre> -</div> - -<p>Because of this, these implicit conversions may be removed some day, -and <tt>operator*</tt> changed to return a pointer instead of a reference.</p> - -</div> - -<!--_______________________________________________________________________--> -<h4> - <a name="iterate_complex">Finding call sites: a slightly more complex - example</a> -</h4> - -<div> - -<p>Say that you're writing a FunctionPass and would like to count all the -locations in the entire module (that is, across every <tt>Function</tt>) where a -certain function (i.e., some <tt>Function</tt>*) is already in scope. As you'll -learn later, you may want to use an <tt>InstVisitor</tt> to accomplish this in a -much more straight-forward manner, but this example will allow us to explore how -you'd do it if you didn't have <tt>InstVisitor</tt> around. 
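-</p>
-
-<p>For comparison, a rough sketch of what the <tt>InstVisitor</tt>-based version
-might look like (the class name is invented, and the <tt>InstVisitor</tt> header
-has lived in different directories across LLVM releases):</p>
-
-<div class="doc_code">
-<pre>
-#include "llvm/Support/InstVisitor.h"   // <i>location may vary by release</i>
-
-struct CountCallsTo : public InstVisitor<CountCallsTo> {
-  Function *Target;
-  unsigned Count;
-  CountCallsTo(Function *T) : Target(T), Count(0) {}
-
-  void visitCallInst(CallInst &CI) {
-    if (CI.getCalledFunction() == Target)
-      ++Count;
-  }
-};
-
-// <i>CountCallsTo CC(targetFunc); CC.visit(M);  // M is the Module</i>
-</pre>
-</div>
-
-<p>Back to the version that walks the IR by hand.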
-In pseudo-code, this
-is what we want to do:</p>
-
-<div class="doc_code">
-<pre>
-initialize callCounter to zero
-for each Function f in the Module
-  for each BasicBlock b in f
-    for each Instruction i in b
-      if (i is a CallInst and calls the given function)
-        increment callCounter
-</pre>
-</div>
-
-<p>And the actual code is (remember, because we're writing a
-<tt>FunctionPass</tt>, our <tt>FunctionPass</tt>-derived class simply has to
-override the <tt>runOnFunction</tt> method):</p>
-
-<div class="doc_code">
-<pre>
-Function* targetFunc = ...;
-
-class OurFunctionPass : public FunctionPass {
-  public:
-    OurFunctionPass(): callCounter(0) { }
-
-    virtual bool runOnFunction(Function& F) {
-      for (Function::iterator b = F.begin(), be = F.end(); b != be; ++b) {
-        for (BasicBlock::iterator i = b->begin(), ie = b->end(); i != ie; ++i) {
-          if (<a href="#CallInst">CallInst</a>* callInst = <a href="#isa">dyn_cast</a><<a
-              href="#CallInst">CallInst</a>>(&*i)) {
-            // <i>We know we've encountered a call instruction, so we</i>
-            // <i>need to determine if it's a call to the</i>
-            // <i>function pointed to by targetFunc or not.</i>
-            if (callInst->getCalledFunction() == targetFunc)
-              ++callCounter;
-          }
-        }
-      }
-      return false;  // <i>we only counted calls; the IR was not modified</i>
-    }
-
-  private:
-    unsigned callCounter;
-};
-</pre>
-</div>
-
-</div>
-
-<!--_______________________________________________________________________-->
-<h4>
-  <a name="calls_and_invokes">Treating calls and invokes the same way</a>
-</h4>
-
-<div>
-
-<p>You may have noticed that the previous example was a bit oversimplified in
-that it did not deal with call sites generated by 'invoke' instructions.  In
-this, and in other situations, you may find that you want to treat
-<tt>CallInst</tt>s and <tt>InvokeInst</tt>s the same way, even though their
-most-specific common base class is <tt>Instruction</tt>, which includes lots of
-less closely-related things.  For these cases, LLVM provides a handy wrapper
-class called <a
-href="http://llvm.org/doxygen/classllvm_1_1CallSite.html"><tt>CallSite</tt></a>.
-It is essentially a wrapper around an <tt>Instruction</tt> pointer, with some
-methods that provide functionality common to <tt>CallInst</tt>s and
-<tt>InvokeInst</tt>s.</p>
-
-<p>This class has "value semantics": it should be passed by value, not by
-reference, and it should not be dynamically allocated or deallocated using
-<tt>operator new</tt> or <tt>operator delete</tt>.  It is efficiently copyable,
-assignable and constructible, with costs equivalent to those of a bare pointer.
-If you look at its definition, it has only a single pointer member.</p>
-
-</div>
-
-<!--_______________________________________________________________________-->
-<h4>
-  <a name="iterate_chains">Iterating over def-use & use-def chains</a>
-</h4>
-
-<div>
-
-<p>Frequently, we might have an instance of the <a
-href="/doxygen/classllvm_1_1Value.html">Value Class</a> and we want to
-determine which <tt>User</tt>s use the <tt>Value</tt>.  The list of all
-<tt>User</tt>s of a particular <tt>Value</tt> is called a <i>def-use</i> chain.
-For example, let's say we have a <tt>Function*</tt> named <tt>F</tt> that points
-to a particular function <tt>foo</tt>.
Finding all of the instructions that -<i>use</i> <tt>foo</tt> is as simple as iterating over the <i>def-use</i> chain -of <tt>F</tt>:</p> - -<div class="doc_code"> -<pre> -Function *F = ...; - -for (Value::use_iterator i = F->use_begin(), e = F->use_end(); i != e; ++i) - if (Instruction *Inst = dyn_cast<Instruction>(*i)) { - errs() << "F is used in instruction:\n"; - errs() << *Inst << "\n"; - } -</pre> -</div> - -<p>Note that dereferencing a <tt>Value::use_iterator</tt> is not a very cheap -operation. Instead of performing <tt>*i</tt> above several times, consider -doing it only once in the loop body and reusing its result.</p> - -<p>Alternatively, it's common to have an instance of the <a -href="/doxygen/classllvm_1_1User.html">User Class</a> and need to know what -<tt>Value</tt>s are used by it. The list of all <tt>Value</tt>s used by a -<tt>User</tt> is known as a <i>use-def</i> chain. Instances of class -<tt>Instruction</tt> are common <tt>User</tt>s, so we might want to iterate over -all of the values that a particular instruction uses (that is, the operands of -the particular <tt>Instruction</tt>):</p> - -<div class="doc_code"> -<pre> -Instruction *pi = ...; - -for (User::op_iterator i = pi->op_begin(), e = pi->op_end(); i != e; ++i) { - Value *v = *i; - // <i>...</i> -} -</pre> -</div> - -<p>Declaring objects as <tt>const</tt> is an important tool of enforcing -mutation free algorithms (such as analyses, etc.). For this purpose above -iterators come in constant flavors as <tt>Value::const_use_iterator</tt> -and <tt>Value::const_op_iterator</tt>. They automatically arise when -calling <tt>use/op_begin()</tt> on <tt>const Value*</tt>s or -<tt>const User*</tt>s respectively. Upon dereferencing, they return -<tt>const Use*</tt>s. Otherwise the above patterns remain unchanged.</p> - -</div> - -<!--_______________________________________________________________________--> -<h4> - <a name="iterate_preds">Iterating over predecessors & -successors of blocks</a> -</h4> - -<div> - -<p>Iterating over the predecessors and successors of a block is quite easy -with the routines defined in <tt>"llvm/Support/CFG.h"</tt>. Just use code like -this to iterate over all predecessors of BB:</p> - -<div class="doc_code"> -<pre> -#include "llvm/Support/CFG.h" -BasicBlock *BB = ...; - -for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) { - BasicBlock *Pred = *PI; - // <i>...</i> -} -</pre> -</div> - -<p>Similarly, to iterate over successors use -succ_iterator/succ_begin/succ_end.</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="simplechanges">Making simple changes</a> -</h3> - -<div> - -<p>There are some primitive transformation operations present in the LLVM -infrastructure that are worth knowing about. When performing -transformations, it's fairly common to manipulate the contents of basic -blocks. This section describes some of the common methods for doing so -and gives example code.</p> - -<!--_______________________________________________________________________--> -<h4> - <a name="schanges_creating">Creating and inserting new - <tt>Instruction</tt>s</a> -</h4> - -<div> - -<p><i>Instantiating Instructions</i></p> - -<p>Creation of <tt>Instruction</tt>s is straight-forward: simply call the -constructor for the kind of instruction to instantiate and provide the necessary -parameters. For example, an <tt>AllocaInst</tt> only <i>requires</i> a -(const-ptr-to) <tt>Type</tt>. 
Thus:</p> - -<div class="doc_code"> -<pre> -AllocaInst* ai = new AllocaInst(Type::Int32Ty); -</pre> -</div> - -<p>will create an <tt>AllocaInst</tt> instance that represents the allocation of -one integer in the current stack frame, at run time. Each <tt>Instruction</tt> -subclass is likely to have varying default parameters which change the semantics -of the instruction, so refer to the <a -href="/doxygen/classllvm_1_1Instruction.html">doxygen documentation for the subclass of -Instruction</a> that you're interested in instantiating.</p> - -<p><i>Naming values</i></p> - -<p>It is very useful to name the values of instructions when you're able to, as -this facilitates the debugging of your transformations. If you end up looking -at generated LLVM machine code, you definitely want to have logical names -associated with the results of instructions! By supplying a value for the -<tt>Name</tt> (default) parameter of the <tt>Instruction</tt> constructor, you -associate a logical name with the result of the instruction's execution at -run time. For example, say that I'm writing a transformation that dynamically -allocates space for an integer on the stack, and that integer is going to be -used as some kind of index by some other code. To accomplish this, I place an -<tt>AllocaInst</tt> at the first point in the first <tt>BasicBlock</tt> of some -<tt>Function</tt>, and I'm intending to use it within the same -<tt>Function</tt>. I might do:</p> - -<div class="doc_code"> -<pre> -AllocaInst* pa = new AllocaInst(Type::Int32Ty, 0, "indexLoc"); -</pre> -</div> - -<p>where <tt>indexLoc</tt> is now the logical name of the instruction's -execution value, which is a pointer to an integer on the run time stack.</p> - -<p><i>Inserting instructions</i></p> - -<p>There are essentially two ways to insert an <tt>Instruction</tt> -into an existing sequence of instructions that form a <tt>BasicBlock</tt>:</p> - -<ul> - <li>Insertion into an explicit instruction list - - <p>Given a <tt>BasicBlock* pb</tt>, an <tt>Instruction* pi</tt> within that - <tt>BasicBlock</tt>, and a newly-created instruction we wish to insert - before <tt>*pi</tt>, we do the following: </p> - -<div class="doc_code"> -<pre> -BasicBlock *pb = ...; -Instruction *pi = ...; -Instruction *newInst = new Instruction(...); - -pb->getInstList().insert(pi, newInst); // <i>Inserts newInst before pi in pb</i> -</pre> -</div> - - <p>Appending to the end of a <tt>BasicBlock</tt> is so common that - the <tt>Instruction</tt> class and <tt>Instruction</tt>-derived - classes provide constructors which take a pointer to a - <tt>BasicBlock</tt> to be appended to. For example code that - looked like: </p> - -<div class="doc_code"> -<pre> -BasicBlock *pb = ...; -Instruction *newInst = new Instruction(...); - -pb->getInstList().push_back(newInst); // <i>Appends newInst to pb</i> -</pre> -</div> - - <p>becomes: </p> - -<div class="doc_code"> -<pre> -BasicBlock *pb = ...; -Instruction *newInst = new Instruction(..., pb); -</pre> -</div> - - <p>which is much cleaner, especially if you are creating - long instruction streams.</p></li> - - <li>Insertion into an implicit instruction list - - <p><tt>Instruction</tt> instances that are already in <tt>BasicBlock</tt>s - are implicitly associated with an existing instruction list: the instruction - list of the enclosing basic block. 
Thus, we could have accomplished the same - thing as the above code without being given a <tt>BasicBlock</tt> by doing: - </p> - -<div class="doc_code"> -<pre> -Instruction *pi = ...; -Instruction *newInst = new Instruction(...); - -pi->getParent()->getInstList().insert(pi, newInst); -</pre> -</div> - - <p>In fact, this sequence of steps occurs so frequently that the - <tt>Instruction</tt> class and <tt>Instruction</tt>-derived classes provide - constructors which take (as a default parameter) a pointer to an - <tt>Instruction</tt> which the newly-created <tt>Instruction</tt> should - precede. That is, <tt>Instruction</tt> constructors are capable of - inserting the newly-created instance into the <tt>BasicBlock</tt> of a - provided instruction, immediately before that instruction. Using an - <tt>Instruction</tt> constructor with a <tt>insertBefore</tt> (default) - parameter, the above code becomes:</p> - -<div class="doc_code"> -<pre> -Instruction* pi = ...; -Instruction* newInst = new Instruction(..., pi); -</pre> -</div> - - <p>which is much cleaner, especially if you're creating a lot of - instructions and adding them to <tt>BasicBlock</tt>s.</p></li> -</ul> - -</div> - -<!--_______________________________________________________________________--> -<h4> - <a name="schanges_deleting">Deleting <tt>Instruction</tt>s</a> -</h4> - -<div> - -<p>Deleting an instruction from an existing sequence of instructions that form a -<a href="#BasicBlock"><tt>BasicBlock</tt></a> is very straight-forward: just -call the instruction's eraseFromParent() method. For example:</p> - -<div class="doc_code"> -<pre> -<a href="#Instruction">Instruction</a> *I = .. ; -I->eraseFromParent(); -</pre> -</div> - -<p>This unlinks the instruction from its containing basic block and deletes -it. If you'd just like to unlink the instruction from its containing basic -block but not delete it, you can use the <tt>removeFromParent()</tt> method.</p> - -</div> - -<!--_______________________________________________________________________--> -<h4> - <a name="schanges_replacing">Replacing an <tt>Instruction</tt> with another - <tt>Value</tt></a> -</h4> - -<div> - -<h5><i>Replacing individual instructions</i></h5> - -<p>Including "<a href="/doxygen/BasicBlockUtils_8h-source.html">llvm/Transforms/Utils/BasicBlockUtils.h</a>" -permits use of two very useful replace functions: <tt>ReplaceInstWithValue</tt> -and <tt>ReplaceInstWithInst</tt>.</p> - -<h5><a name="schanges_deleting">Deleting <tt>Instruction</tt>s</a></h5> - -<div> -<ul> - <li><tt>ReplaceInstWithValue</tt> - - <p>This function replaces all uses of a given instruction with a value, - and then removes the original instruction. The following example - illustrates the replacement of the result of a particular - <tt>AllocaInst</tt> that allocates memory for a single integer with a null - pointer to an integer.</p> - -<div class="doc_code"> -<pre> -AllocaInst* instToReplace = ...; -BasicBlock::iterator ii(instToReplace); - -ReplaceInstWithValue(instToReplace->getParent()->getInstList(), ii, - Constant::getNullValue(PointerType::getUnqual(Type::Int32Ty))); -</pre></div></li> - - <li><tt>ReplaceInstWithInst</tt> - - <p>This function replaces a particular instruction with another - instruction, inserting the new instruction into the basic block at the - location where the old instruction was, and replacing any uses of the old - instruction with the new instruction. 
The following example illustrates - the replacement of one <tt>AllocaInst</tt> with another.</p> - -<div class="doc_code"> -<pre> -AllocaInst* instToReplace = ...; -BasicBlock::iterator ii(instToReplace); - -ReplaceInstWithInst(instToReplace->getParent()->getInstList(), ii, - new AllocaInst(Type::Int32Ty, 0, "ptrToReplacedInt")); -</pre></div></li> -</ul> - -</div> - -<h5><i>Replacing multiple uses of <tt>User</tt>s and <tt>Value</tt>s</i></h5> - -<p>You can use <tt>Value::replaceAllUsesWith</tt> and -<tt>User::replaceUsesOfWith</tt> to change more than one use at a time. See the -doxygen documentation for the <a href="/doxygen/classllvm_1_1Value.html">Value Class</a> -and <a href="/doxygen/classllvm_1_1User.html">User Class</a>, respectively, for more -information.</p> - -<!-- Value::replaceAllUsesWith User::replaceUsesOfWith Point out: -include/llvm/Transforms/Utils/ especially BasicBlockUtils.h with: -ReplaceInstWithValue, ReplaceInstWithInst --> - -</div> - -<!--_______________________________________________________________________--> -<h4> - <a name="schanges_deletingGV">Deleting <tt>GlobalVariable</tt>s</a> -</h4> - -<div> - -<p>Deleting a global variable from a module is just as easy as deleting an -Instruction. First, you must have a pointer to the global variable that you wish - to delete. You use this pointer to erase it from its parent, the module. - For example:</p> - -<div class="doc_code"> -<pre> -<a href="#GlobalVariable">GlobalVariable</a> *GV = .. ; - -GV->eraseFromParent(); -</pre> -</div> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="create_types">How to Create Types</a> -</h3> - -<div> - -<p>In generating IR, you may need some complex types. If you know these types -statically, you can use <tt>TypeBuilder<...>::get()</tt>, defined -in <tt>llvm/Support/TypeBuilder.h</tt>, to retrieve them. <tt>TypeBuilder</tt> -has two forms depending on whether you're building types for cross-compilation -or native library use. <tt>TypeBuilder<T, true></tt> requires -that <tt>T</tt> be independent of the host environment, meaning that it's built -out of types from -the <a href="/doxygen/namespacellvm_1_1types.html"><tt>llvm::types</tt></a> -namespace and pointers, functions, arrays, etc. built of -those. <tt>TypeBuilder<T, false></tt> additionally allows native C types -whose size may depend on the host compiler. For example,</p> - -<div class="doc_code"> -<pre> -FunctionType *ft = TypeBuilder<types::i<8>(types::i<32>*), true>::get(); -</pre> -</div> - -<p>is easier to read and write than the equivalent</p> - -<div class="doc_code"> -<pre> -std::vector<const Type*> params; -params.push_back(PointerType::getUnqual(Type::Int32Ty)); -FunctionType *ft = FunctionType::get(Type::Int8Ty, params, false); -</pre> -</div> - -<p>See the <a href="/doxygen/TypeBuilder_8h-source.html#l00001">class -comment</a> for more details.</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="threading">Threads and LLVM</a> -</h2> -<!-- *********************************************************************** --> - -<div> -<p> -This section describes the interaction of the LLVM APIs with multithreading, -both on the part of client applications, and in the JIT, in the hosted -application. -</p> - -<p> -Note that LLVM's support for multithreading is still relatively young. 
-Up
-through version 2.5, the execution of threaded hosted applications was
-supported, but not threaded client access to the APIs.  While this use case is
-now supported, clients <em>must</em> adhere to the guidelines specified below to
-ensure proper operation in multithreaded mode.
-</p>
-
-<p>
-Note that, on Unix-like platforms, LLVM requires the presence of GCC's atomic
-intrinsics in order to support threaded operation.  If you need a
-multithreading-capable LLVM on a platform without a suitably modern system
-compiler, consider compiling LLVM and LLVM-GCC in single-threaded mode, and
-using the resultant compiler to build a copy of LLVM with multithreading
-support.
-</p>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="startmultithreaded">Entering and Exiting Multithreaded Mode</a>
-</h3>
-
-<div>
-
-<p>
-In order to properly protect its internal data structures while avoiding
-excessive locking overhead in the single-threaded case, LLVM must initialize
-certain data structures necessary to provide guards around its internals.  To do
-so, the client program must invoke <tt>llvm_start_multithreaded()</tt> before
-making any concurrent LLVM API calls.  To subsequently tear down these
-structures, use the <tt>llvm_stop_multithreaded()</tt> call.  You can also use
-the <tt>llvm_is_multithreaded()</tt> call to check the status of multithreaded
-mode.
-</p>
-
-<p>
-Note that both of these calls must be made <em>in isolation</em>.  That is to
-say that no other LLVM API calls may be executing at any time during the
-execution of <tt>llvm_start_multithreaded()</tt> or <tt>llvm_stop_multithreaded
-</tt>.  It is the client's responsibility to enforce this isolation.
-</p>
-
-<p>
-The return value of <tt>llvm_start_multithreaded()</tt> indicates the success or
-failure of the initialization.  Failure typically indicates that your copy of
-LLVM was built without multithreading support, usually because GCC atomic
-intrinsics were not found in your system compiler.  In this case, the LLVM API
-will not be safe for concurrent calls.  However, it <em>will</em> be safe for
-hosting threaded applications in the JIT, though <a href="#jitthreading">care
-must be taken</a> to ensure that side exits and the like do not accidentally
-result in concurrent LLVM API calls.
-</p>
-</div>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="shutdown">Ending Execution with <tt>llvm_shutdown()</tt></a>
-</h3>
-
-<div>
-<p>
-When you are done using the LLVM APIs, you should call <tt>llvm_shutdown()</tt>
-to deallocate memory used for internal structures.  This will also invoke
-<tt>llvm_stop_multithreaded()</tt> if LLVM is operating in multithreaded mode.
-As such, <tt>llvm_shutdown()</tt> requires the same isolation guarantees as
-<tt>llvm_stop_multithreaded()</tt>.
-</p>
-
-<p>
-Note that, if you prefer scope-based shutdown, you can use the
-<tt>llvm_shutdown_obj</tt> class, which calls <tt>llvm_shutdown()</tt> in its
-destructor.</p>
-</div>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="managedstatic">Lazy Initialization with <tt>ManagedStatic</tt></a>
-</h3>
-
-<div>
-<p>
-<tt>ManagedStatic</tt> is a utility class in LLVM used to implement static
-initialization of static resources, such as the global type tables.  Before the
-invocation of <tt>llvm_shutdown()</tt>, it implements a simple lazy
-initialization scheme.
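-</p>
-
-<p>For illustration, a typical use looks roughly like this (the registry type and
-the function are made up for the example):</p>
-
-<div class="doc_code">
-<pre>
-#include "llvm/Support/ManagedStatic.h"
-#include <string>
-#include <vector>
-
-static llvm::ManagedStatic<std::vector<std::string> > KnownNames;
-
-void rememberName(const std::string &S) {
-  KnownNames->push_back(S);  // <i>first access constructs the vector lazily</i>
-}                            // <i>llvm_shutdown() will destroy it</i>
-</pre>
-</div>
-
-<p>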
Once <tt>llvm_start_multithreaded()</tt> returns, -however, it uses double-checked locking to implement thread-safe lazy -initialization. -</p> - -<p> -Note that, because no other threads are allowed to issue LLVM API calls before -<tt>llvm_start_multithreaded()</tt> returns, it is possible to have -<tt>ManagedStatic</tt>s of <tt>llvm::sys::Mutex</tt>s. -</p> - -<p> -The <tt>llvm_acquire_global_lock()</tt> and <tt>llvm_release_global_lock</tt> -APIs provide access to the global lock used to implement the double-checked -locking for lazy initialization. These should only be used internally to LLVM, -and only if you know what you're doing! -</p> -</div> - -<!-- ======================================================================= --> -<h3> - <a name="llvmcontext">Achieving Isolation with <tt>LLVMContext</tt></a> -</h3> - -<div> -<p> -<tt>LLVMContext</tt> is an opaque class in the LLVM API which clients can use -to operate multiple, isolated instances of LLVM concurrently within the same -address space. For instance, in a hypothetical compile-server, the compilation -of an individual translation unit is conceptually independent from all the -others, and it would be desirable to be able to compile incoming translation -units concurrently on independent server threads. Fortunately, -<tt>LLVMContext</tt> exists to enable just this kind of scenario! -</p> - -<p> -Conceptually, <tt>LLVMContext</tt> provides isolation. Every LLVM entity -(<tt>Module</tt>s, <tt>Value</tt>s, <tt>Type</tt>s, <tt>Constant</tt>s, etc.) -in LLVM's in-memory IR belongs to an <tt>LLVMContext</tt>. Entities in -different contexts <em>cannot</em> interact with each other: <tt>Module</tt>s in -different contexts cannot be linked together, <tt>Function</tt>s cannot be added -to <tt>Module</tt>s in different contexts, etc. What this means is that is is -safe to compile on multiple threads simultaneously, as long as no two threads -operate on entities within the same context. -</p> - -<p> -In practice, very few places in the API require the explicit specification of a -<tt>LLVMContext</tt>, other than the <tt>Type</tt> creation/lookup APIs. -Because every <tt>Type</tt> carries a reference to its owning context, most -other entities can determine what context they belong to by looking at their -own <tt>Type</tt>. If you are adding new entities to LLVM IR, please try to -maintain this interface design. -</p> - -<p> -For clients that do <em>not</em> require the benefits of isolation, LLVM -provides a convenience API <tt>getGlobalContext()</tt>. This returns a global, -lazily initialized <tt>LLVMContext</tt> that may be used in situations where -isolation is not a concern. -</p> -</div> - -<!-- ======================================================================= --> -<h3> - <a name="jitthreading">Threads and the JIT</a> -</h3> - -<div> -<p> -LLVM's "eager" JIT compiler is safe to use in threaded programs. Multiple -threads can call <tt>ExecutionEngine::getPointerToFunction()</tt> or -<tt>ExecutionEngine::runFunction()</tt> concurrently, and multiple threads can -run code output by the JIT concurrently. The user must still ensure that only -one thread accesses IR in a given <tt>LLVMContext</tt> while another thread -might be modifying it. One way to do that is to always hold the JIT lock while -accessing IR outside the JIT (the JIT <em>modifies</em> the IR by adding -<tt>CallbackVH</tt>s). Another way is to only -call <tt>getPointerToFunction()</tt> from the <tt>LLVMContext</tt>'s thread. 
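-</p>
-
-<p>For example, following the pattern above, each worker thread might do
-something like this (the function's prototype, <tt>i32 ()</tt>, is just an
-assumption made for the cast):</p>
-
-<div class="doc_code">
-<pre>
-ExecutionEngine *EE = ...;   // <i>shared engine, created with the eager JIT</i>
-Function *F = ...;           // <i>IR that no other thread is modifying</i>
-
-void *Raw = EE->getPointerToFunction(F);   // <i>safe from several threads</i>
-int (*FP)() = (int (*)())(intptr_t)Raw;
-int Result = FP();                         // <i>JIT'd code may run concurrently</i>
-</pre>
-</div>
-
-<p>However the pointer is obtained, access to the IR itself must still be
-synchronized as described above.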
-</p> - -<p>When the JIT is configured to compile lazily (using -<tt>ExecutionEngine::DisableLazyCompilation(false)</tt>), there is currently a -<a href="http://llvm.org/bugs/show_bug.cgi?id=5184">race condition</a> in -updating call sites after a function is lazily-jitted. It's still possible to -use the lazy JIT in a threaded program if you ensure that only one thread at a -time can call any particular lazy stub and that the JIT lock guards any IR -access, but we suggest using only the eager JIT in threaded programs. -</p> -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="advanced">Advanced Topics</a> -</h2> -<!-- *********************************************************************** --> - -<div> -<p> -This section describes some of the advanced or obscure API's that most clients -do not need to be aware of. These API's tend manage the inner workings of the -LLVM system, and only need to be accessed in unusual circumstances. -</p> - - -<!-- ======================================================================= --> -<h3> - <a name="SymbolTable">The <tt>ValueSymbolTable</tt> class</a> -</h3> - -<div> -<p>The <tt><a href="http://llvm.org/doxygen/classllvm_1_1ValueSymbolTable.html"> -ValueSymbolTable</a></tt> class provides a symbol table that the <a -href="#Function"><tt>Function</tt></a> and <a href="#Module"> -<tt>Module</tt></a> classes use for naming value definitions. The symbol table -can provide a name for any <a href="#Value"><tt>Value</tt></a>. -</p> - -<p>Note that the <tt>SymbolTable</tt> class should not be directly accessed -by most clients. It should only be used when iteration over the symbol table -names themselves are required, which is very special purpose. Note that not -all LLVM -<tt><a href="#Value">Value</a></tt>s have names, and those without names (i.e. they have -an empty name) do not exist in the symbol table. -</p> - -<p>Symbol tables support iteration over the values in the symbol -table with <tt>begin/end/iterator</tt> and supports querying to see if a -specific name is in the symbol table (with <tt>lookup</tt>). The -<tt>ValueSymbolTable</tt> class exposes no public mutator methods, instead, -simply call <tt>setName</tt> on a value, which will autoinsert it into the -appropriate symbol table.</p> - -</div> - - - -<!-- ======================================================================= --> -<h3> - <a name="UserLayout">The <tt>User</tt> and owned <tt>Use</tt> classes' memory layout</a> -</h3> - -<div> -<p>The <tt><a href="http://llvm.org/doxygen/classllvm_1_1User.html"> -User</a></tt> class provides a basis for expressing the ownership of <tt>User</tt> -towards other <tt><a href="http://llvm.org/doxygen/classllvm_1_1Value.html"> -Value</a></tt>s. The <tt><a href="http://llvm.org/doxygen/classllvm_1_1Use.html"> -Use</a></tt> helper class is employed to do the bookkeeping and to facilitate <i>O(1)</i> -addition and removal.</p> - -<!-- ______________________________________________________________________ --> -<h4> - <a name="Use2User"> - Interaction and relationship between <tt>User</tt> and <tt>Use</tt> objects - </a> -</h4> - -<div> -<p> -A subclass of <tt>User</tt> can choose between incorporating its <tt>Use</tt> objects -or refer to them out-of-line by means of a pointer. A mixed variant -(some <tt>Use</tt>s inline others hung off) is impractical and breaks the invariant -that the <tt>Use</tt> objects belonging to the same <tt>User</tt> form a contiguous array. 
-</p> - -<p> -We have 2 different layouts in the <tt>User</tt> (sub)classes: -<ul> -<li><p>Layout a) -The <tt>Use</tt> object(s) are inside (resp. at fixed offset) of the <tt>User</tt> -object and there are a fixed number of them.</p> - -<li><p>Layout b) -The <tt>Use</tt> object(s) are referenced by a pointer to an -array from the <tt>User</tt> object and there may be a variable -number of them.</p> -</ul> -<p> -As of v2.4 each layout still possesses a direct pointer to the -start of the array of <tt>Use</tt>s. Though not mandatory for layout a), -we stick to this redundancy for the sake of simplicity. -The <tt>User</tt> object also stores the number of <tt>Use</tt> objects it -has. (Theoretically this information can also be calculated -given the scheme presented below.)</p> -<p> -Special forms of allocation operators (<tt>operator new</tt>) -enforce the following memory layouts:</p> - -<ul> -<li><p>Layout a) is modelled by prepending the <tt>User</tt> object by the <tt>Use[]</tt> array.</p> - -<pre> -...---.---.---.---.-------... - | P | P | P | P | User -'''---'---'---'---'-------''' -</pre> - -<li><p>Layout b) is modelled by pointing at the <tt>Use[]</tt> array.</p> -<pre> -.-------... -| User -'-------''' - | - v - .---.---.---.---... - | P | P | P | P | - '---'---'---'---''' -</pre> -</ul> -<i>(In the above figures '<tt>P</tt>' stands for the <tt>Use**</tt> that - is stored in each <tt>Use</tt> object in the member <tt>Use::Prev</tt>)</i> - -</div> - -<!-- ______________________________________________________________________ --> -<h4> - <a name="Waymarking">The waymarking algorithm</a> -</h4> - -<div> -<p> -Since the <tt>Use</tt> objects are deprived of the direct (back)pointer to -their <tt>User</tt> objects, there must be a fast and exact method to -recover it. 
This is accomplished by the following scheme:</p> - -A bit-encoding in the 2 LSBits (least significant bits) of the <tt>Use::Prev</tt> allows to find the -start of the <tt>User</tt> object: -<ul> -<li><tt>00</tt> —> binary digit 0</li> -<li><tt>01</tt> —> binary digit 1</li> -<li><tt>10</tt> —> stop and calculate (<tt>s</tt>)</li> -<li><tt>11</tt> —> full stop (<tt>S</tt>)</li> -</ul> -<p> -Given a <tt>Use*</tt>, all we have to do is to walk till we get -a stop and we either have a <tt>User</tt> immediately behind or -we have to walk to the next stop picking up digits -and calculating the offset:</p> -<pre> -.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---------------- -| 1 | s | 1 | 0 | 1 | 0 | s | 1 | 1 | 0 | s | 1 | 1 | s | 1 | S | User (or User*) -'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---------------- - |+15 |+10 |+6 |+3 |+1 - | | | | |__> - | | | |__________> - | | |______________________> - | |______________________________________> - |__________________________________________________________> -</pre> -<p> -Only the significant number of bits need to be stored between the -stops, so that the <i>worst case is 20 memory accesses</i> when there are -1000 <tt>Use</tt> objects associated with a <tt>User</tt>.</p> - -</div> - -<!-- ______________________________________________________________________ --> -<h4> - <a name="ReferenceImpl">Reference implementation</a> -</h4> - -<div> -<p> -The following literate Haskell fragment demonstrates the concept:</p> - -<div class="doc_code"> -<pre> -> import Test.QuickCheck -> -> digits :: Int -> [Char] -> [Char] -> digits 0 acc = '0' : acc -> digits 1 acc = '1' : acc -> digits n acc = digits (n `div` 2) $ digits (n `mod` 2) acc -> -> dist :: Int -> [Char] -> [Char] -> dist 0 [] = ['S'] -> dist 0 acc = acc -> dist 1 acc = let r = dist 0 acc in 's' : digits (length r) r -> dist n acc = dist (n - 1) $ dist 1 acc -> -> takeLast n ss = reverse $ take n $ reverse ss -> -> test = takeLast 40 $ dist 20 [] -> -</pre> -</div> -<p> -Printing <test> gives: <tt>"1s100000s11010s10100s1111s1010s110s11s1S"</tt></p> -<p> -The reverse algorithm computes the length of the string just by examining -a certain prefix:</p> - -<div class="doc_code"> -<pre> -> pref :: [Char] -> Int -> pref "S" = 1 -> pref ('s':'1':rest) = decode 2 1 rest -> pref (_:rest) = 1 + pref rest -> -> decode walk acc ('0':rest) = decode (walk + 1) (acc * 2) rest -> decode walk acc ('1':rest) = decode (walk + 1) (acc * 2 + 1) rest -> decode walk acc _ = walk + acc -> -</pre> -</div> -<p> -Now, as expected, printing <pref test> gives <tt>40</tt>.</p> -<p> -We can <i>quickCheck</i> this with following property:</p> - -<div class="doc_code"> -<pre> -> testcase = dist 2000 [] -> testcaseLength = length testcase -> -> identityProp n = n > 0 && n <= testcaseLength ==> length arr == pref arr -> where arr = takeLast n testcase -> -</pre> -</div> -<p> -As expected <quickCheck identityProp> gives:</p> - -<pre> -*Main> quickCheck identityProp -OK, passed 100 tests. -</pre> -<p> -Let's be a bit more exhaustive:</p> - -<div class="doc_code"> -<pre> -> -> deepCheck p = check (defaultConfig { configMaxTest = 500 }) p -> -</pre> -</div> -<p> -And here is the result of <deepCheck identityProp>:</p> - -<pre> -*Main> deepCheck identityProp -OK, passed 500 tests. 
-</pre> - -</div> - -<!-- ______________________________________________________________________ --> -<h4> - <a name="Tagging">Tagging considerations</a> -</h4> - -<div> - -<p> -To maintain the invariant that the 2 LSBits of each <tt>Use**</tt> in <tt>Use</tt> -never change after being set up, setters of <tt>Use::Prev</tt> must re-tag the -new <tt>Use**</tt> on every modification. Accordingly getters must strip the -tag bits.</p> -<p> -For layout b) instead of the <tt>User</tt> we find a pointer (<tt>User*</tt> with LSBit set). -Following this pointer brings us to the <tt>User</tt>. A portable trick ensures -that the first bytes of <tt>User</tt> (if interpreted as a pointer) never has -the LSBit set. (Portability is relying on the fact that all known compilers place the -<tt>vptr</tt> in the first word of the instances.)</p> - -</div> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="coreclasses">The Core LLVM Class Hierarchy Reference </a> -</h2> -<!-- *********************************************************************** --> - -<div> -<p><tt>#include "<a href="/doxygen/Type_8h-source.html">llvm/Type.h</a>"</tt> -<br>doxygen info: <a href="/doxygen/classllvm_1_1Type.html">Type Class</a></p> - -<p>The Core LLVM classes are the primary means of representing the program -being inspected or transformed. The core LLVM classes are defined in -header files in the <tt>include/llvm/</tt> directory, and implemented in -the <tt>lib/VMCore</tt> directory.</p> - -<!-- ======================================================================= --> -<h3> - <a name="Type">The <tt>Type</tt> class and Derived Types</a> -</h3> - -<div> - - <p><tt>Type</tt> is a superclass of all type classes. Every <tt>Value</tt> has - a <tt>Type</tt>. <tt>Type</tt> cannot be instantiated directly but only - through its subclasses. Certain primitive types (<tt>VoidType</tt>, - <tt>LabelType</tt>, <tt>FloatType</tt> and <tt>DoubleType</tt>) have hidden - subclasses. They are hidden because they offer no useful functionality beyond - what the <tt>Type</tt> class offers except to distinguish themselves from - other subclasses of <tt>Type</tt>.</p> - <p>All other types are subclasses of <tt>DerivedType</tt>. Types can be - named, but this is not a requirement. There exists exactly - one instance of a given shape at any one time. This allows type equality to - be performed with address equality of the Type Instance. That is, given two - <tt>Type*</tt> values, the types are identical if the pointers are identical. - </p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="m_Type">Important Public Methods</a> -</h4> - -<div> - -<ul> - <li><tt>bool isIntegerTy() const</tt>: Returns true for any integer type.</li> - - <li><tt>bool isFloatingPointTy()</tt>: Return true if this is one of the five - floating point types.</li> - - <li><tt>bool isSized()</tt>: Return true if the type has known size. Things - that don't have a size are abstract types, labels and void.</li> - -</ul> -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="derivedtypes">Important Derived Types</a> -</h4> -<div> -<dl> - <dt><tt>IntegerType</tt></dt> - <dd>Subclass of DerivedType that represents integer types of any bit width. - Any bit width between <tt>IntegerType::MIN_INT_BITS</tt> (1) and - <tt>IntegerType::MAX_INT_BITS</tt> (~8 million) can be represented. 
- <ul> - <li><tt>static const IntegerType* get(unsigned NumBits)</tt>: get an integer - type of a specific bit width.</li> - <li><tt>unsigned getBitWidth() const</tt>: Get the bit width of an integer - type.</li> - </ul> - </dd> - <dt><tt>SequentialType</tt></dt> - <dd>This is subclassed by ArrayType, PointerType and VectorType. - <ul> - <li><tt>const Type * getElementType() const</tt>: Returns the type of each - of the elements in the sequential type. </li> - </ul> - </dd> - <dt><tt>ArrayType</tt></dt> - <dd>This is a subclass of SequentialType and defines the interface for array - types. - <ul> - <li><tt>unsigned getNumElements() const</tt>: Returns the number of - elements in the array. </li> - </ul> - </dd> - <dt><tt>PointerType</tt></dt> - <dd>Subclass of SequentialType for pointer types.</dd> - <dt><tt>VectorType</tt></dt> - <dd>Subclass of SequentialType for vector types. A - vector type is similar to an ArrayType but is distinguished because it is - a first class type whereas ArrayType is not. Vector types are used for - vector operations and are usually small vectors of of an integer or floating - point type.</dd> - <dt><tt>StructType</tt></dt> - <dd>Subclass of DerivedTypes for struct types.</dd> - <dt><tt><a name="FunctionType">FunctionType</a></tt></dt> - <dd>Subclass of DerivedTypes for function types. - <ul> - <li><tt>bool isVarArg() const</tt>: Returns true if it's a vararg - function</li> - <li><tt> const Type * getReturnType() const</tt>: Returns the - return type of the function.</li> - <li><tt>const Type * getParamType (unsigned i)</tt>: Returns - the type of the ith parameter.</li> - <li><tt> const unsigned getNumParams() const</tt>: Returns the - number of formal parameters.</li> - </ul> - </dd> -</dl> -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="Module">The <tt>Module</tt> class</a> -</h3> - -<div> - -<p><tt>#include "<a -href="/doxygen/Module_8h-source.html">llvm/Module.h</a>"</tt><br> doxygen info: -<a href="/doxygen/classllvm_1_1Module.html">Module Class</a></p> - -<p>The <tt>Module</tt> class represents the top level structure present in LLVM -programs. An LLVM module is effectively either a translation unit of the -original program or a combination of several translation units merged by the -linker. The <tt>Module</tt> class keeps track of a list of <a -href="#Function"><tt>Function</tt></a>s, a list of <a -href="#GlobalVariable"><tt>GlobalVariable</tt></a>s, and a <a -href="#SymbolTable"><tt>SymbolTable</tt></a>. Additionally, it contains a few -helpful member functions that try to make common operations easy.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="m_Module">Important Public Members of the <tt>Module</tt> class</a> -</h4> - -<div> - -<ul> - <li><tt>Module::Module(std::string name = "")</tt> - - <p>Constructing a <a href="#Module">Module</a> is easy. 
You can optionally -provide a name for it (probably based on the name of the translation unit).</p> - </li> - - <li><tt>Module::iterator</tt> - Typedef for function list iterator<br> - <tt>Module::const_iterator</tt> - Typedef for const_iterator.<br> - - <tt>begin()</tt>, <tt>end()</tt> - <tt>size()</tt>, <tt>empty()</tt> - - <p>These are forwarding methods that make it easy to access the contents of - a <tt>Module</tt> object's <a href="#Function"><tt>Function</tt></a> - list.</p></li> - - <li><tt>Module::FunctionListType &getFunctionList()</tt> - - <p> Returns the list of <a href="#Function"><tt>Function</tt></a>s. This is - necessary to use when you need to update the list or perform a complex - action that doesn't have a forwarding method.</p> - - <p><!-- Global Variable --></p></li> -</ul> - -<hr> - -<ul> - <li><tt>Module::global_iterator</tt> - Typedef for global variable list iterator<br> - - <tt>Module::const_global_iterator</tt> - Typedef for const_iterator.<br> - - <tt>global_begin()</tt>, <tt>global_end()</tt> - <tt>global_size()</tt>, <tt>global_empty()</tt> - - <p> These are forwarding methods that make it easy to access the contents of - a <tt>Module</tt> object's <a - href="#GlobalVariable"><tt>GlobalVariable</tt></a> list.</p></li> - - <li><tt>Module::GlobalListType &getGlobalList()</tt> - - <p>Returns the list of <a - href="#GlobalVariable"><tt>GlobalVariable</tt></a>s. This is necessary to - use when you need to update the list or perform a complex action that - doesn't have a forwarding method.</p> - - <p><!-- Symbol table stuff --> </p></li> -</ul> - -<hr> - -<ul> - <li><tt><a href="#SymbolTable">SymbolTable</a> *getSymbolTable()</tt> - - <p>Return a reference to the <a href="#SymbolTable"><tt>SymbolTable</tt></a> - for this <tt>Module</tt>.</p> - - <p><!-- Convenience methods --></p></li> -</ul> - -<hr> - -<ul> - - <li><tt><a href="#Function">Function</a> *getFunction(StringRef Name) const - </tt> - - <p>Look up the specified function in the <tt>Module</tt> <a - href="#SymbolTable"><tt>SymbolTable</tt></a>. If it does not exist, return - <tt>null</tt>.</p></li> - - <li><tt><a href="#Function">Function</a> *getOrInsertFunction(const - std::string &Name, const <a href="#FunctionType">FunctionType</a> *T)</tt> - - <p>Look up the specified function in the <tt>Module</tt> <a - href="#SymbolTable"><tt>SymbolTable</tt></a>. If it does not exist, add an - external declaration for the function and return it.</p></li> - - <li><tt>std::string getTypeName(const <a href="#Type">Type</a> *Ty)</tt> - - <p>If there is at least one entry in the <a - href="#SymbolTable"><tt>SymbolTable</tt></a> for the specified <a - href="#Type"><tt>Type</tt></a>, return it. Otherwise return the empty - string.</p></li> - - <li><tt>bool addTypeName(const std::string &Name, const <a - href="#Type">Type</a> *Ty)</tt> - - <p>Insert an entry in the <a href="#SymbolTable"><tt>SymbolTable</tt></a> - mapping <tt>Name</tt> to <tt>Ty</tt>. If there is already an entry for this - name, true is returned and the <a - href="#SymbolTable"><tt>SymbolTable</tt></a> is not modified.</p></li> -</ul> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="Value">The <tt>Value</tt> class</a> -</h3> - -<div> - -<p><tt>#include "<a href="/doxygen/Value_8h-source.html">llvm/Value.h</a>"</tt> -<br> -doxygen info: <a href="/doxygen/classllvm_1_1Value.html">Value Class</a></p> - -<p>The <tt>Value</tt> class is the most important class in the LLVM Source -base. 
It represents a typed value that may be used (among other things) as an -operand to an instruction. There are many different types of <tt>Value</tt>s, -such as <a href="#Constant"><tt>Constant</tt></a>s,<a -href="#Argument"><tt>Argument</tt></a>s. Even <a -href="#Instruction"><tt>Instruction</tt></a>s and <a -href="#Function"><tt>Function</tt></a>s are <tt>Value</tt>s.</p> - -<p>A particular <tt>Value</tt> may be used many times in the LLVM representation -for a program. For example, an incoming argument to a function (represented -with an instance of the <a href="#Argument">Argument</a> class) is "used" by -every instruction in the function that references the argument. To keep track -of this relationship, the <tt>Value</tt> class keeps a list of all of the <a -href="#User"><tt>User</tt></a>s that is using it (the <a -href="#User"><tt>User</tt></a> class is a base class for all nodes in the LLVM -graph that can refer to <tt>Value</tt>s). This use list is how LLVM represents -def-use information in the program, and is accessible through the <tt>use_</tt>* -methods, shown below.</p> - -<p>Because LLVM is a typed representation, every LLVM <tt>Value</tt> is typed, -and this <a href="#Type">Type</a> is available through the <tt>getType()</tt> -method. In addition, all LLVM values can be named. The "name" of the -<tt>Value</tt> is a symbolic string printed in the LLVM code:</p> - -<div class="doc_code"> -<pre> -%<b>foo</b> = add i32 1, 2 -</pre> -</div> - -<p><a name="nameWarning">The name of this instruction is "foo".</a> <b>NOTE</b> -that the name of any value may be missing (an empty string), so names should -<b>ONLY</b> be used for debugging (making the source code easier to read, -debugging printouts), they should not be used to keep track of values or map -between them. For this purpose, use a <tt>std::map</tt> of pointers to the -<tt>Value</tt> itself instead.</p> - -<p>One important aspect of LLVM is that there is no distinction between an SSA -variable and the operation that produces it. Because of this, any reference to -the value produced by an instruction (or the value available as an incoming -argument, for example) is represented as a direct pointer to the instance of -the class that -represents this value. Although this may take some getting used to, it -simplifies the representation and makes it easier to manipulate.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="m_Value">Important Public Members of the <tt>Value</tt> class</a> -</h4> - -<div> - -<ul> - <li><tt>Value::use_iterator</tt> - Typedef for iterator over the -use-list<br> - <tt>Value::const_use_iterator</tt> - Typedef for const_iterator over -the use-list<br> - <tt>unsigned use_size()</tt> - Returns the number of users of the -value.<br> - <tt>bool use_empty()</tt> - Returns true if there are no users.<br> - <tt>use_iterator use_begin()</tt> - Get an iterator to the start of -the use-list.<br> - <tt>use_iterator use_end()</tt> - Get an iterator to the end of the -use-list.<br> - <tt><a href="#User">User</a> *use_back()</tt> - Returns the last -element in the list. - <p> These methods are the interface to access the def-use -information in LLVM. 
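-For example, a minimal sketch of the usual pattern, assuming <tt>V</tt> is a
-<tt>Value*</tt> whose users you want to visit (the printout via <tt>errs()</tt>
-is purely illustrative):</p>
-
-<div class="doc_code">
-<pre>
-for (Value::use_iterator UI = V->use_begin(), E = V->use_end(); UI != E; ++UI)
-  if (Instruction *I = dyn_cast<Instruction>(*UI))
-    errs() << "V is used by instruction: " << *I << "\n";
-</pre>
-</div>
-<p>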
As with all other iterators in LLVM, the naming -conventions follow the conventions defined by the <a href="#stl">STL</a>.</p> - </li> - <li><tt><a href="#Type">Type</a> *getType() const</tt> - <p>This method returns the Type of the Value.</p> - </li> - <li><tt>bool hasName() const</tt><br> - <tt>std::string getName() const</tt><br> - <tt>void setName(const std::string &Name)</tt> - <p> This family of methods is used to access and assign a name to a <tt>Value</tt>, -be aware of the <a href="#nameWarning">precaution above</a>.</p> - </li> - <li><tt>void replaceAllUsesWith(Value *V)</tt> - - <p>This method traverses the use list of a <tt>Value</tt> changing all <a - href="#User"><tt>User</tt>s</a> of the current value to refer to - "<tt>V</tt>" instead. For example, if you detect that an instruction always - produces a constant value (for example through constant folding), you can - replace all uses of the instruction with the constant like this:</p> - -<div class="doc_code"> -<pre> -Inst->replaceAllUsesWith(ConstVal); -</pre> -</div> - -</ul> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="User">The <tt>User</tt> class</a> -</h3> - -<div> - -<p> -<tt>#include "<a href="/doxygen/User_8h-source.html">llvm/User.h</a>"</tt><br> -doxygen info: <a href="/doxygen/classllvm_1_1User.html">User Class</a><br> -Superclass: <a href="#Value"><tt>Value</tt></a></p> - -<p>The <tt>User</tt> class is the common base class of all LLVM nodes that may -refer to <a href="#Value"><tt>Value</tt></a>s. It exposes a list of "Operands" -that are all of the <a href="#Value"><tt>Value</tt></a>s that the User is -referring to. The <tt>User</tt> class itself is a subclass of -<tt>Value</tt>.</p> - -<p>The operands of a <tt>User</tt> point directly to the LLVM <a -href="#Value"><tt>Value</tt></a> that it refers to. Because LLVM uses Static -Single Assignment (SSA) form, there can only be one definition referred to, -allowing this direct connection. This connection provides the use-def -information in LLVM.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="m_User">Important Public Members of the <tt>User</tt> class</a> -</h4> - -<div> - -<p>The <tt>User</tt> class exposes the operand list in two ways: through -an index access interface and through an iterator based interface.</p> - -<ul> - <li><tt>Value *getOperand(unsigned i)</tt><br> - <tt>unsigned getNumOperands()</tt> - <p> These two methods expose the operands of the <tt>User</tt> in a -convenient form for direct access.</p></li> - - <li><tt>User::op_iterator</tt> - Typedef for iterator over the operand -list<br> - <tt>op_iterator op_begin()</tt> - Get an iterator to the start of -the operand list.<br> - <tt>op_iterator op_end()</tt> - Get an iterator to the end of the -operand list. 
- <p> Together, these methods make up the iterator based interface to -the operands of a <tt>User</tt>.</p></li> -</ul> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="Instruction">The <tt>Instruction</tt> class</a> -</h3> - -<div> - -<p><tt>#include "</tt><tt><a -href="/doxygen/Instruction_8h-source.html">llvm/Instruction.h</a>"</tt><br> -doxygen info: <a href="/doxygen/classllvm_1_1Instruction.html">Instruction Class</a><br> -Superclasses: <a href="#User"><tt>User</tt></a>, <a -href="#Value"><tt>Value</tt></a></p> - -<p>The <tt>Instruction</tt> class is the common base class for all LLVM -instructions. It provides only a few methods, but is a very commonly used -class. The primary data tracked by the <tt>Instruction</tt> class itself is the -opcode (instruction type) and the parent <a -href="#BasicBlock"><tt>BasicBlock</tt></a> the <tt>Instruction</tt> is embedded -into. To represent a specific type of instruction, one of many subclasses of -<tt>Instruction</tt> are used.</p> - -<p> Because the <tt>Instruction</tt> class subclasses the <a -href="#User"><tt>User</tt></a> class, its operands can be accessed in the same -way as for other <a href="#User"><tt>User</tt></a>s (with the -<tt>getOperand()</tt>/<tt>getNumOperands()</tt> and -<tt>op_begin()</tt>/<tt>op_end()</tt> methods).</p> <p> An important file for -the <tt>Instruction</tt> class is the <tt>llvm/Instruction.def</tt> file. This -file contains some meta-data about the various different types of instructions -in LLVM. It describes the enum values that are used as opcodes (for example -<tt>Instruction::Add</tt> and <tt>Instruction::ICmp</tt>), as well as the -concrete sub-classes of <tt>Instruction</tt> that implement the instruction (for -example <tt><a href="#BinaryOperator">BinaryOperator</a></tt> and <tt><a -href="#CmpInst">CmpInst</a></tt>). Unfortunately, the use of macros in -this file confuses doxygen, so these enum values don't show up correctly in the -<a href="/doxygen/classllvm_1_1Instruction.html">doxygen output</a>.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="s_Instruction"> - Important Subclasses of the <tt>Instruction</tt> class - </a> -</h4> -<div> - <ul> - <li><tt><a name="BinaryOperator">BinaryOperator</a></tt> - <p>This subclasses represents all two operand instructions whose operands - must be the same type, except for the comparison instructions.</p></li> - <li><tt><a name="CastInst">CastInst</a></tt> - <p>This subclass is the parent of the 12 casting instructions. 
It provides - common operations on cast instructions.</p> - <li><tt><a name="CmpInst">CmpInst</a></tt> - <p>This subclass respresents the two comparison instructions, - <a href="LangRef.html#i_icmp">ICmpInst</a> (integer opreands), and - <a href="LangRef.html#i_fcmp">FCmpInst</a> (floating point operands).</p> - <li><tt><a name="TerminatorInst">TerminatorInst</a></tt> - <p>This subclass is the parent of all terminator instructions (those which - can terminate a block).</p> - </ul> - </div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="m_Instruction"> - Important Public Members of the <tt>Instruction</tt> class - </a> -</h4> - -<div> - -<ul> - <li><tt><a href="#BasicBlock">BasicBlock</a> *getParent()</tt> - <p>Returns the <a href="#BasicBlock"><tt>BasicBlock</tt></a> that -this <tt>Instruction</tt> is embedded into.</p></li> - <li><tt>bool mayWriteToMemory()</tt> - <p>Returns true if the instruction writes to memory, i.e. it is a - <tt>call</tt>,<tt>free</tt>,<tt>invoke</tt>, or <tt>store</tt>.</p></li> - <li><tt>unsigned getOpcode()</tt> - <p>Returns the opcode for the <tt>Instruction</tt>.</p></li> - <li><tt><a href="#Instruction">Instruction</a> *clone() const</tt> - <p>Returns another instance of the specified instruction, identical -in all ways to the original except that the instruction has no parent -(ie it's not embedded into a <a href="#BasicBlock"><tt>BasicBlock</tt></a>), -and it has no name</p></li> -</ul> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="Constant">The <tt>Constant</tt> class and subclasses</a> -</h3> - -<div> - -<p>Constant represents a base class for different types of constants. It -is subclassed by ConstantInt, ConstantArray, etc. for representing -the various types of Constants. <a href="#GlobalValue">GlobalValue</a> is also -a subclass, which represents the address of a global variable or function. -</p> - -<!-- _______________________________________________________________________ --> -<h4>Important Subclasses of Constant</h4> -<div> -<ul> - <li>ConstantInt : This subclass of Constant represents an integer constant of - any width. - <ul> - <li><tt>const APInt& getValue() const</tt>: Returns the underlying - value of this constant, an APInt value.</li> - <li><tt>int64_t getSExtValue() const</tt>: Converts the underlying APInt - value to an int64_t via sign extension. If the value (not the bit width) - of the APInt is too large to fit in an int64_t, an assertion will result. - For this reason, use of this method is discouraged.</li> - <li><tt>uint64_t getZExtValue() const</tt>: Converts the underlying APInt - value to a uint64_t via zero extension. IF the value (not the bit width) - of the APInt is too large to fit in a uint64_t, an assertion will result. - For this reason, use of this method is discouraged.</li> - <li><tt>static ConstantInt* get(const APInt& Val)</tt>: Returns the - ConstantInt object that represents the value provided by <tt>Val</tt>. - The type is implied as the IntegerType that corresponds to the bit width - of <tt>Val</tt>.</li> - <li><tt>static ConstantInt* get(const Type *Ty, uint64_t Val)</tt>: - Returns the ConstantInt object that represents the value provided by - <tt>Val</tt> for integer type <tt>Ty</tt>.</li> - </ul> - </li> - <li>ConstantFP : This class represents a floating point constant. - <ul> - <li><tt>double getValue() const</tt>: Returns the underlying value of - this constant. 
</li> - </ul> - </li> - <li>ConstantArray : This represents a constant array. - <ul> - <li><tt>const std::vector<Use> &getValues() const</tt>: Returns - a vector of component constants that makeup this array. </li> - </ul> - </li> - <li>ConstantStruct : This represents a constant struct. - <ul> - <li><tt>const std::vector<Use> &getValues() const</tt>: Returns - a vector of component constants that makeup this array. </li> - </ul> - </li> - <li>GlobalValue : This represents either a global variable or a function. In - either case, the value is a constant fixed address (after linking). - </li> -</ul> -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="GlobalValue">The <tt>GlobalValue</tt> class</a> -</h3> - -<div> - -<p><tt>#include "<a -href="/doxygen/GlobalValue_8h-source.html">llvm/GlobalValue.h</a>"</tt><br> -doxygen info: <a href="/doxygen/classllvm_1_1GlobalValue.html">GlobalValue -Class</a><br> -Superclasses: <a href="#Constant"><tt>Constant</tt></a>, -<a href="#User"><tt>User</tt></a>, <a href="#Value"><tt>Value</tt></a></p> - -<p>Global values (<a href="#GlobalVariable"><tt>GlobalVariable</tt></a>s or <a -href="#Function"><tt>Function</tt></a>s) are the only LLVM values that are -visible in the bodies of all <a href="#Function"><tt>Function</tt></a>s. -Because they are visible at global scope, they are also subject to linking with -other globals defined in different translation units. To control the linking -process, <tt>GlobalValue</tt>s know their linkage rules. Specifically, -<tt>GlobalValue</tt>s know whether they have internal or external linkage, as -defined by the <tt>LinkageTypes</tt> enumeration.</p> - -<p>If a <tt>GlobalValue</tt> has internal linkage (equivalent to being -<tt>static</tt> in C), it is not visible to code outside the current translation -unit, and does not participate in linking. If it has external linkage, it is -visible to external code, and does participate in linking. In addition to -linkage information, <tt>GlobalValue</tt>s keep track of which <a -href="#Module"><tt>Module</tt></a> they are currently part of.</p> - -<p>Because <tt>GlobalValue</tt>s are memory objects, they are always referred to -by their <b>address</b>. As such, the <a href="#Type"><tt>Type</tt></a> of a -global is always a pointer to its contents. It is important to remember this -when using the <tt>GetElementPtrInst</tt> instruction because this pointer must -be dereferenced first. For example, if you have a <tt>GlobalVariable</tt> (a -subclass of <tt>GlobalValue)</tt> that is an array of 24 ints, type <tt>[24 x -i32]</tt>, then the <tt>GlobalVariable</tt> is a pointer to that array. Although -the address of the first element of this array and the value of the -<tt>GlobalVariable</tt> are the same, they have different types. The -<tt>GlobalVariable</tt>'s type is <tt>[24 x i32]</tt>. The first element's type -is <tt>i32.</tt> Because of this, accessing a global value requires you to -dereference the pointer with <tt>GetElementPtrInst</tt> first, then its elements -can be accessed. 
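-For example, a hedged sketch of emitting such an access with an
-<tt>IRBuilder</tt> (here <tt>GV</tt> is assumed to be the <tt>[24 x i32]</tt>
-global and <tt>Builder</tt> an already-positioned <tt>IRBuilder<></tt>):</p>
-
-<div class="doc_code">
-<pre>
-Value *Idx[] = { Builder.getInt32(0),    // step through the pointer itself
-                 Builder.getInt32(5) };  // then select element #5 of the array
-Value *EltPtr = Builder.CreateGEP(GV, Idx, "elt.ptr");
-Value *Elt    = Builder.CreateLoad(EltPtr, "elt");
-</pre>
-</div>
-<p>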
This is explained in the <a href="LangRef.html#globalvars">LLVM -Language Reference Manual</a>.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="m_GlobalValue"> - Important Public Members of the <tt>GlobalValue</tt> class - </a> -</h4> - -<div> - -<ul> - <li><tt>bool hasInternalLinkage() const</tt><br> - <tt>bool hasExternalLinkage() const</tt><br> - <tt>void setInternalLinkage(bool HasInternalLinkage)</tt> - <p> These methods manipulate the linkage characteristics of the <tt>GlobalValue</tt>.</p> - <p> </p> - </li> - <li><tt><a href="#Module">Module</a> *getParent()</tt> - <p> This returns the <a href="#Module"><tt>Module</tt></a> that the -GlobalValue is currently embedded into.</p></li> -</ul> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="Function">The <tt>Function</tt> class</a> -</h3> - -<div> - -<p><tt>#include "<a -href="/doxygen/Function_8h-source.html">llvm/Function.h</a>"</tt><br> doxygen -info: <a href="/doxygen/classllvm_1_1Function.html">Function Class</a><br> -Superclasses: <a href="#GlobalValue"><tt>GlobalValue</tt></a>, -<a href="#Constant"><tt>Constant</tt></a>, -<a href="#User"><tt>User</tt></a>, -<a href="#Value"><tt>Value</tt></a></p> - -<p>The <tt>Function</tt> class represents a single procedure in LLVM. It is -actually one of the more complex classes in the LLVM hierarchy because it must -keep track of a large amount of data. The <tt>Function</tt> class keeps track -of a list of <a href="#BasicBlock"><tt>BasicBlock</tt></a>s, a list of formal -<a href="#Argument"><tt>Argument</tt></a>s, and a -<a href="#SymbolTable"><tt>SymbolTable</tt></a>.</p> - -<p>The list of <a href="#BasicBlock"><tt>BasicBlock</tt></a>s is the most -commonly used part of <tt>Function</tt> objects. The list imposes an implicit -ordering of the blocks in the function, which indicate how the code will be -laid out by the backend. Additionally, the first <a -href="#BasicBlock"><tt>BasicBlock</tt></a> is the implicit entry node for the -<tt>Function</tt>. It is not legal in LLVM to explicitly branch to this initial -block. There are no implicit exit nodes, and in fact there may be multiple exit -nodes from a single <tt>Function</tt>. If the <a -href="#BasicBlock"><tt>BasicBlock</tt></a> list is empty, this indicates that -the <tt>Function</tt> is actually a function declaration: the actual body of the -function hasn't been linked in yet.</p> - -<p>In addition to a list of <a href="#BasicBlock"><tt>BasicBlock</tt></a>s, the -<tt>Function</tt> class also keeps track of the list of formal <a -href="#Argument"><tt>Argument</tt></a>s that the function receives. This -container manages the lifetime of the <a href="#Argument"><tt>Argument</tt></a> -nodes, just like the <a href="#BasicBlock"><tt>BasicBlock</tt></a> list does for -the <a href="#BasicBlock"><tt>BasicBlock</tt></a>s.</p> - -<p>The <a href="#SymbolTable"><tt>SymbolTable</tt></a> is a very rarely used -LLVM feature that is only used when you have to look up a value by name. 
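-In practice you almost never need it; walking the function's contents directly
-is the common pattern, as in this sketch (assuming <tt>F</tt> is a
-<tt>Function*</tt> and the <tt>errs()</tt> output is illustrative):</p>
-
-<div class="doc_code">
-<pre>
-for (Function::arg_iterator AI = F->arg_begin(), AE = F->arg_end(); AI != AE; ++AI)
-  errs() << "Argument: " << *AI << "\n";
-
-for (Function::iterator BB = F->begin(), BE = F->end(); BB != BE; ++BB)
-  errs() << "Block " << BB->getName() << " contains "
-         << BB->size() << " instructions\n";
-</pre>
-</div>
-<p>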
Aside -from that, the <a href="#SymbolTable"><tt>SymbolTable</tt></a> is used -internally to make sure that there are not conflicts between the names of <a -href="#Instruction"><tt>Instruction</tt></a>s, <a -href="#BasicBlock"><tt>BasicBlock</tt></a>s, or <a -href="#Argument"><tt>Argument</tt></a>s in the function body.</p> - -<p>Note that <tt>Function</tt> is a <a href="#GlobalValue">GlobalValue</a> -and therefore also a <a href="#Constant">Constant</a>. The value of the function -is its address (after linking) which is guaranteed to be constant.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="m_Function"> - Important Public Members of the <tt>Function</tt> class - </a> -</h4> - -<div> - -<ul> - <li><tt>Function(const </tt><tt><a href="#FunctionType">FunctionType</a> - *Ty, LinkageTypes Linkage, const std::string &N = "", Module* Parent = 0)</tt> - - <p>Constructor used when you need to create new <tt>Function</tt>s to add - the program. The constructor must specify the type of the function to - create and what type of linkage the function should have. The <a - href="#FunctionType"><tt>FunctionType</tt></a> argument - specifies the formal arguments and return value for the function. The same - <a href="#FunctionType"><tt>FunctionType</tt></a> value can be used to - create multiple functions. The <tt>Parent</tt> argument specifies the Module - in which the function is defined. If this argument is provided, the function - will automatically be inserted into that module's list of - functions.</p></li> - - <li><tt>bool isDeclaration()</tt> - - <p>Return whether or not the <tt>Function</tt> has a body defined. If the - function is "external", it does not have a body, and thus must be resolved - by linking with a function defined in a different translation unit.</p></li> - - <li><tt>Function::iterator</tt> - Typedef for basic block list iterator<br> - <tt>Function::const_iterator</tt> - Typedef for const_iterator.<br> - - <tt>begin()</tt>, <tt>end()</tt> - <tt>size()</tt>, <tt>empty()</tt> - - <p>These are forwarding methods that make it easy to access the contents of - a <tt>Function</tt> object's <a href="#BasicBlock"><tt>BasicBlock</tt></a> - list.</p></li> - - <li><tt>Function::BasicBlockListType &getBasicBlockList()</tt> - - <p>Returns the list of <a href="#BasicBlock"><tt>BasicBlock</tt></a>s. This - is necessary to use when you need to update the list or perform a complex - action that doesn't have a forwarding method.</p></li> - - <li><tt>Function::arg_iterator</tt> - Typedef for the argument list -iterator<br> - <tt>Function::const_arg_iterator</tt> - Typedef for const_iterator.<br> - - <tt>arg_begin()</tt>, <tt>arg_end()</tt> - <tt>arg_size()</tt>, <tt>arg_empty()</tt> - - <p>These are forwarding methods that make it easy to access the contents of - a <tt>Function</tt> object's <a href="#Argument"><tt>Argument</tt></a> - list.</p></li> - - <li><tt>Function::ArgumentListType &getArgumentList()</tt> - - <p>Returns the list of <a href="#Argument"><tt>Argument</tt></a>s. This is - necessary to use when you need to update the list or perform a complex - action that doesn't have a forwarding method.</p></li> - - <li><tt><a href="#BasicBlock">BasicBlock</a> &getEntryBlock()</tt> - - <p>Returns the entry <a href="#BasicBlock"><tt>BasicBlock</tt></a> for the - function. 
Because the entry block for the function is always the first - block, this returns the first block of the <tt>Function</tt>.</p></li> - - <li><tt><a href="#Type">Type</a> *getReturnType()</tt><br> - <tt><a href="#FunctionType">FunctionType</a> *getFunctionType()</tt> - - <p>This traverses the <a href="#Type"><tt>Type</tt></a> of the - <tt>Function</tt> and returns the return type of the function, or the <a - href="#FunctionType"><tt>FunctionType</tt></a> of the actual - function.</p></li> - - <li><tt><a href="#SymbolTable">SymbolTable</a> *getSymbolTable()</tt> - - <p> Return a pointer to the <a href="#SymbolTable"><tt>SymbolTable</tt></a> - for this <tt>Function</tt>.</p></li> -</ul> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="GlobalVariable">The <tt>GlobalVariable</tt> class</a> -</h3> - -<div> - -<p><tt>#include "<a -href="/doxygen/GlobalVariable_8h-source.html">llvm/GlobalVariable.h</a>"</tt> -<br> -doxygen info: <a href="/doxygen/classllvm_1_1GlobalVariable.html">GlobalVariable - Class</a><br> -Superclasses: <a href="#GlobalValue"><tt>GlobalValue</tt></a>, -<a href="#Constant"><tt>Constant</tt></a>, -<a href="#User"><tt>User</tt></a>, -<a href="#Value"><tt>Value</tt></a></p> - -<p>Global variables are represented with the (surprise surprise) -<tt>GlobalVariable</tt> class. Like functions, <tt>GlobalVariable</tt>s are also -subclasses of <a href="#GlobalValue"><tt>GlobalValue</tt></a>, and as such are -always referenced by their address (global values must live in memory, so their -"name" refers to their constant address). See -<a href="#GlobalValue"><tt>GlobalValue</tt></a> for more on this. Global -variables may have an initial value (which must be a -<a href="#Constant"><tt>Constant</tt></a>), and if they have an initializer, -they may be marked as "constant" themselves (indicating that their contents -never change at runtime).</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="m_GlobalVariable"> - Important Public Members of the <tt>GlobalVariable</tt> class - </a> -</h4> - -<div> - -<ul> - <li><tt>GlobalVariable(const </tt><tt><a href="#Type">Type</a> *Ty, bool - isConstant, LinkageTypes& Linkage, <a href="#Constant">Constant</a> - *Initializer = 0, const std::string &Name = "", Module* Parent = 0)</tt> - - <p>Create a new global variable of the specified type. If - <tt>isConstant</tt> is true then the global variable will be marked as - unchanging for the program. The Linkage parameter specifies the type of - linkage (internal, external, weak, linkonce, appending) for the variable. - If the linkage is InternalLinkage, WeakAnyLinkage, WeakODRLinkage, - LinkOnceAnyLinkage or LinkOnceODRLinkage, then the resultant - global variable will have internal linkage. AppendingLinkage concatenates - together all instances (in different translation units) of the variable - into a single variable but is only applicable to arrays. See - the <a href="LangRef.html#modulestructure">LLVM Language Reference</a> for - further details on linkage types. 
Optionally an initializer, a name, and the - module to put the variable into may be specified for the global variable as - well.</p></li> - - <li><tt>bool isConstant() const</tt> - - <p>Returns true if this is a global variable that is known not to - be modified at runtime.</p></li> - - <li><tt>bool hasInitializer()</tt> - - <p>Returns true if this <tt>GlobalVariable</tt> has an intializer.</p></li> - - <li><tt><a href="#Constant">Constant</a> *getInitializer()</tt> - - <p>Returns the initial value for a <tt>GlobalVariable</tt>. It is not legal - to call this method if there is no initializer.</p></li> -</ul> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="BasicBlock">The <tt>BasicBlock</tt> class</a> -</h3> - -<div> - -<p><tt>#include "<a -href="/doxygen/BasicBlock_8h-source.html">llvm/BasicBlock.h</a>"</tt><br> -doxygen info: <a href="/doxygen/classllvm_1_1BasicBlock.html">BasicBlock -Class</a><br> -Superclass: <a href="#Value"><tt>Value</tt></a></p> - -<p>This class represents a single entry single exit section of the code, -commonly known as a basic block by the compiler community. The -<tt>BasicBlock</tt> class maintains a list of <a -href="#Instruction"><tt>Instruction</tt></a>s, which form the body of the block. -Matching the language definition, the last element of this list of instructions -is always a terminator instruction (a subclass of the <a -href="#TerminatorInst"><tt>TerminatorInst</tt></a> class).</p> - -<p>In addition to tracking the list of instructions that make up the block, the -<tt>BasicBlock</tt> class also keeps track of the <a -href="#Function"><tt>Function</tt></a> that it is embedded into.</p> - -<p>Note that <tt>BasicBlock</tt>s themselves are <a -href="#Value"><tt>Value</tt></a>s, because they are referenced by instructions -like branches and can go in the switch tables. <tt>BasicBlock</tt>s have type -<tt>label</tt>.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="m_BasicBlock"> - Important Public Members of the <tt>BasicBlock</tt> class - </a> -</h4> - -<div> -<ul> - -<li><tt>BasicBlock(const std::string &Name = "", </tt><tt><a - href="#Function">Function</a> *Parent = 0)</tt> - -<p>The <tt>BasicBlock</tt> constructor is used to create new basic blocks for -insertion into a function. The constructor optionally takes a name for the new -block, and a <a href="#Function"><tt>Function</tt></a> to insert it into. If -the <tt>Parent</tt> parameter is specified, the new <tt>BasicBlock</tt> is -automatically inserted at the end of the specified <a -href="#Function"><tt>Function</tt></a>, if not specified, the BasicBlock must be -manually inserted into the <a href="#Function"><tt>Function</tt></a>.</p></li> - -<li><tt>BasicBlock::iterator</tt> - Typedef for instruction list iterator<br> -<tt>BasicBlock::const_iterator</tt> - Typedef for const_iterator.<br> -<tt>begin()</tt>, <tt>end()</tt>, <tt>front()</tt>, <tt>back()</tt>, -<tt>size()</tt>, <tt>empty()</tt> -STL-style functions for accessing the instruction list. - -<p>These methods and typedefs are forwarding functions that have the same -semantics as the standard library methods of the same names. These methods -expose the underlying instruction list of a basic block in a way that is easy to -manipulate. 
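-For example, a short sketch of read-only iteration (assuming <tt>BB</tt> is a
-<tt>BasicBlock*</tt>):</p>
-
-<div class="doc_code">
-<pre>
-for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I)
-  errs() << *I << "\n";   // prints each Instruction in the block
-</pre>
-</div>
-<p>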
To get the full complement of container operations (including -operations to update the list), you must use the <tt>getInstList()</tt> -method.</p></li> - -<li><tt>BasicBlock::InstListType &getInstList()</tt> - -<p>This method is used to get access to the underlying container that actually -holds the Instructions. This method must be used when there isn't a forwarding -function in the <tt>BasicBlock</tt> class for the operation that you would like -to perform. Because there are no forwarding functions for "updating" -operations, you need to use this if you want to update the contents of a -<tt>BasicBlock</tt>.</p></li> - -<li><tt><a href="#Function">Function</a> *getParent()</tt> - -<p> Returns a pointer to <a href="#Function"><tt>Function</tt></a> the block is -embedded into, or a null pointer if it is homeless.</p></li> - -<li><tt><a href="#TerminatorInst">TerminatorInst</a> *getTerminator()</tt> - -<p> Returns a pointer to the terminator instruction that appears at the end of -the <tt>BasicBlock</tt>. If there is no terminator instruction, or if the last -instruction in the block is not a terminator, then a null pointer is -returned.</p></li> - -</ul> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="Argument">The <tt>Argument</tt> class</a> -</h3> - -<div> - -<p>This subclass of Value defines the interface for incoming formal -arguments to a function. A Function maintains a list of its formal -arguments. An argument has a pointer to the parent Function.</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01 Strict"></a> - - <a href="mailto:dhurjati@cs.uiuc.edu">Dinakar Dhurjati</a> and - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> - -</body> -</html> diff --git a/docs/ProgrammersManual.rst b/docs/ProgrammersManual.rst new file mode 100644 index 0000000000..2b272de425 --- /dev/null +++ b/docs/ProgrammersManual.rst @@ -0,0 +1,3166 @@ +======================== +LLVM Programmer's Manual +======================== + +.. contents:: + :local: + +.. warning:: + This is a work in progress. + +.. sectionauthor:: Chris Lattner <sabre@nondot.org>, + Dinakar Dhurjati <dhurjati@cs.uiuc.edu>, + Gabor Greif <ggreif@gmail.com>, + Joel Stanley <jstanley@cs.uiuc.edu>, + Reid Spencer <rspencer@x10sys.com> and + Owen Anderson <owen@apple.com> + +.. _introduction: + +Introduction +============ + +This document is meant to highlight some of the important classes and interfaces +available in the LLVM source-base. This manual is not intended to explain what +LLVM is, how it works, and what LLVM code looks like. It assumes that you know +the basics of LLVM and are interested in writing transformations or otherwise +analyzing or manipulating the code. + +This document should get you oriented so that you can find your way in the +continuously growing source code that makes up the LLVM infrastructure. 
Note +that this manual is not intended to serve as a replacement for reading the +source code, so if you think there should be a method in one of these classes to +do something, but it's not listed, check the source. Links to the `doxygen +<http://llvm.org/doxygen/>`__ sources are provided to make this as easy as +possible. + +The first section of this document describes general information that is useful +to know when working in the LLVM infrastructure, and the second describes the +Core LLVM classes. In the future this manual will be extended with information +describing how to use extension libraries, such as dominator information, CFG +traversal routines, and useful utilities like the ``InstVisitor`` (`doxygen +<http://llvm.org/doxygen/InstVisitor_8h-source.html>`__) template. + +.. _general: + +General Information +=================== + +This section contains general information that is useful if you are working in +the LLVM source-base, but that isn't specific to any particular API. + +.. _stl: + +The C++ Standard Template Library +--------------------------------- + +LLVM makes heavy use of the C++ Standard Template Library (STL), perhaps much +more than you are used to, or have seen before. Because of this, you might want +to do a little background reading in the techniques used and capabilities of the +library. There are many good pages that discuss the STL, and several books on +the subject that you can get, so it will not be discussed in this document. + +Here are some useful links: + +#. `cppreference.com + <http://en.cppreference.com/w/>`_ - an excellent + reference for the STL and other parts of the standard C++ library. + +#. `C++ In a Nutshell <http://www.tempest-sw.com/cpp/>`_ - This is an O'Reilly + book in the making. It has a decent Standard Library Reference that rivals + Dinkumware's, and is unfortunately no longer free since the book has been + published. + +#. `C++ Frequently Asked Questions <http://www.parashift.com/c++-faq-lite/>`_. + +#. `SGI's STL Programmer's Guide <http://www.sgi.com/tech/stl/>`_ - Contains a + useful `Introduction to the STL + <http://www.sgi.com/tech/stl/stl_introduction.html>`_. + +#. `Bjarne Stroustrup's C++ Page + <http://www.research.att.com/%7Ebs/C++.html>`_. + +#. `Bruce Eckel's Thinking in C++, 2nd ed. Volume 2 Revision 4.0 + (even better, get the book) + <http://www.mindview.net/Books/TICPP/ThinkingInCPP2e.html>`_. + +You are also encouraged to take a look at the :ref:`LLVM Coding Standards +<coding_standards>` guide which focuses on how to write maintainable code more +than where to put your curly braces. + +.. _resources: + +Other useful references +----------------------- + +#. `Using static and shared libraries across platforms + <http://www.fortran-2000.com/ArnaudRecipes/sharedlib.html>`_ + +.. _apis: + +Important and useful LLVM APIs +============================== + +Here we highlight some LLVM APIs that are generally useful and good to know +about when writing transformations. + +.. _isa: + +The ``isa<>``, ``cast<>`` and ``dyn_cast<>`` templates +------------------------------------------------------ + +The LLVM source-base makes extensive use of a custom form of RTTI. These +templates have many similarities to the C++ ``dynamic_cast<>`` operator, but +they don't have some drawbacks (primarily stemming from the fact that +``dynamic_cast<>`` only works on classes that have a v-table). Because they are +used so often, you must know what they do and how they work. 
All of these +templates are defined in the ``llvm/Support/Casting.h`` (`doxygen +<http://llvm.org/doxygen/Casting_8h-source.html>`__) file (note that you very +rarely have to include this file directly). + +``isa<>``: + The ``isa<>`` operator works exactly like the Java "``instanceof``" operator. + It returns true or false depending on whether a reference or pointer points to + an instance of the specified class. This can be very useful for constraint + checking of various sorts (example below). + +``cast<>``: + The ``cast<>`` operator is a "checked cast" operation. It converts a pointer + or reference from a base class to a derived class, causing an assertion + failure if it is not really an instance of the right type. This should be + used in cases where you have some information that makes you believe that + something is of the right type. An example of the ``isa<>`` and ``cast<>`` + template is: + + .. code-block:: c++ + + static bool isLoopInvariant(const Value *V, const Loop *L) { + if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V)) + return true; + + // Otherwise, it must be an instruction... + return !L->contains(cast<Instruction>(V)->getParent()); + } + + Note that you should **not** use an ``isa<>`` test followed by a ``cast<>``, + for that use the ``dyn_cast<>`` operator. + +``dyn_cast<>``: + The ``dyn_cast<>`` operator is a "checking cast" operation. It checks to see + if the operand is of the specified type, and if so, returns a pointer to it + (this operator does not work with references). If the operand is not of the + correct type, a null pointer is returned. Thus, this works very much like + the ``dynamic_cast<>`` operator in C++, and should be used in the same + circumstances. Typically, the ``dyn_cast<>`` operator is used in an ``if`` + statement or some other flow control statement like this: + + .. code-block:: c++ + + if (AllocationInst *AI = dyn_cast<AllocationInst>(Val)) { + // ... + } + + This form of the ``if`` statement effectively combines together a call to + ``isa<>`` and a call to ``cast<>`` into one statement, which is very + convenient. + + Note that the ``dyn_cast<>`` operator, like C++'s ``dynamic_cast<>`` or Java's + ``instanceof`` operator, can be abused. In particular, you should not use big + chained ``if/then/else`` blocks to check for lots of different variants of + classes. If you find yourself wanting to do this, it is much cleaner and more + efficient to use the ``InstVisitor`` class to dispatch over the instruction + type directly. + +``cast_or_null<>``: + The ``cast_or_null<>`` operator works just like the ``cast<>`` operator, + except that it allows for a null pointer as an argument (which it then + propagates). This can sometimes be useful, allowing you to combine several + null checks into one. + +``dyn_cast_or_null<>``: + The ``dyn_cast_or_null<>`` operator works just like the ``dyn_cast<>`` + operator, except that it allows for a null pointer as an argument (which it + then propagates). This can sometimes be useful, allowing you to combine + several null checks into one. + +These five templates can be used with any classes, whether they have a v-table +or not. If you want to add support for these templates, see the document +:ref:`How to set up LLVM-style RTTI for your class hierarchy +<how-to-set-up-llvm-style-rtti>` + +.. 
_string_apis: + +Passing strings (the ``StringRef`` and ``Twine`` classes) +--------------------------------------------------------- + +Although LLVM generally does not do much string manipulation, we do have several +important APIs which take strings. Two important examples are the Value class +-- which has names for instructions, functions, etc. -- and the ``StringMap`` +class which is used extensively in LLVM and Clang. + +These are generic classes, and they need to be able to accept strings which may +have embedded null characters. Therefore, they cannot simply take a ``const +char *``, and taking a ``const std::string&`` requires clients to perform a heap +allocation which is usually unnecessary. Instead, many LLVM APIs use a +``StringRef`` or a ``const Twine&`` for passing strings efficiently. + +.. _StringRef: + +The ``StringRef`` class +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``StringRef`` data type represents a reference to a constant string (a +character array and a length) and supports the common operations available on +``std::string``, but does not require heap allocation. + +It can be implicitly constructed using a C style null-terminated string, an +``std::string``, or explicitly with a character pointer and length. For +example, the ``StringRef`` find function is declared as: + +.. code-block:: c++ + + iterator find(StringRef Key); + +and clients can call it using any one of: + +.. code-block:: c++ + + Map.find("foo"); // Lookup "foo" + Map.find(std::string("bar")); // Lookup "bar" + Map.find(StringRef("\0baz", 4)); // Lookup "\0baz" + +Similarly, APIs which need to return a string may return a ``StringRef`` +instance, which can be used directly or converted to an ``std::string`` using +the ``str`` member function. See ``llvm/ADT/StringRef.h`` (`doxygen +<http://llvm.org/doxygen/classllvm_1_1StringRef_8h-source.html>`__) for more +information. + +You should rarely use the ``StringRef`` class directly, because it contains +pointers to external memory it is not generally safe to store an instance of the +class (unless you know that the external storage will not be freed). +``StringRef`` is small and pervasive enough in LLVM that it should always be +passed by value. + +The ``Twine`` class +^^^^^^^^^^^^^^^^^^^ + +The ``Twine`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1Twine.html>`__) +class is an efficient way for APIs to accept concatenated strings. For example, +a common LLVM paradigm is to name one instruction based on the name of another +instruction with a suffix, for example: + +.. code-block:: c++ + + New = CmpInst::Create(..., SO->getName() + ".cmp"); + +The ``Twine`` class is effectively a lightweight `rope +<http://en.wikipedia.org/wiki/Rope_(computer_science)>`_ which points to +temporary (stack allocated) objects. Twines can be implicitly constructed as +the result of the plus operator applied to strings (i.e., a C strings, an +``std::string``, or a ``StringRef``). The twine delays the actual concatenation +of strings until it is actually required, at which point it can be efficiently +rendered directly into a character array. This avoids unnecessary heap +allocation involved in constructing the temporary results of string +concatenation. See ``llvm/ADT/Twine.h`` (`doxygen +<http://llvm.org/doxygen/Twine_8h_source.html>`__) and :ref:`here <dss_twine>` +for more information. + +As with a ``StringRef``, ``Twine`` objects point to external memory and should +almost never be stored or mentioned directly. 
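+For illustration, a minimal sketch of the intended pattern (``setBlockName``,
+``BB`` and ``Counter`` are hypothetical names used only for this example;
+``Value::setName`` and ``Twine`` themselves are real):
+
+.. code-block:: c++
+
+  // Accept arbitrarily concatenated names without forcing a heap allocation.
+  void setBlockName(llvm::BasicBlock *BB, const llvm::Twine &Name) {
+    BB->setName(Name);  // the concatenation is rendered only here
+  }
+
+  // ... at a use site, with BB and Counter in scope:
+  setBlockName(BB, llvm::Twine("loop.body.") + llvm::Twine(Counter));
+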
They are intended solely for use +when defining a function which should be able to efficiently accept concatenated +strings. + +.. _DEBUG: + +The ``DEBUG()`` macro and ``-debug`` option +------------------------------------------- + +Often when working on your pass you will put a bunch of debugging printouts and +other code into your pass. After you get it working, you want to remove it, but +you may need it again in the future (to work out new bugs that you run across). + +Naturally, because of this, you don't want to delete the debug printouts, but +you don't want them to always be noisy. A standard compromise is to comment +them out, allowing you to enable them if you need them in the future. + +The ``llvm/Support/Debug.h`` (`doxygen +<http://llvm.org/doxygen/Debug_8h-source.html>`__) file provides a macro named +``DEBUG()`` that is a much nicer solution to this problem. Basically, you can +put arbitrary code into the argument of the ``DEBUG`` macro, and it is only +executed if '``opt``' (or any other tool) is run with the '``-debug``' command +line argument: + +.. code-block:: c++ + + DEBUG(errs() << "I am here!\n"); + +Then you can run your pass like this: + +.. code-block:: none + + $ opt < a.bc > /dev/null -mypass + <no output> + $ opt < a.bc > /dev/null -mypass -debug + I am here! + +Using the ``DEBUG()`` macro instead of a home-brewed solution allows you to not +have to create "yet another" command line option for the debug output for your +pass. Note that ``DEBUG()`` macros are disabled for optimized builds, so they +do not cause a performance impact at all (for the same reason, they should also +not contain side-effects!). + +One additional nice thing about the ``DEBUG()`` macro is that you can enable or +disable it directly in gdb. Just use "``set DebugFlag=0``" or "``set +DebugFlag=1``" from the gdb if the program is running. If the program hasn't +been started yet, you can always just run it with ``-debug``. + +.. _DEBUG_TYPE: + +Fine grained debug info with ``DEBUG_TYPE`` and the ``-debug-only`` option +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Sometimes you may find yourself in a situation where enabling ``-debug`` just +turns on **too much** information (such as when working on the code generator). +If you want to enable debug information with more fine-grained control, you +define the ``DEBUG_TYPE`` macro and the ``-debug`` only option as follows: + +.. code-block:: c++ + + #undef DEBUG_TYPE + DEBUG(errs() << "No debug type\n"); + #define DEBUG_TYPE "foo" + DEBUG(errs() << "'foo' debug type\n"); + #undef DEBUG_TYPE + #define DEBUG_TYPE "bar" + DEBUG(errs() << "'bar' debug type\n")); + #undef DEBUG_TYPE + #define DEBUG_TYPE "" + DEBUG(errs() << "No debug type (2)\n"); + +Then you can run your pass like this: + +.. code-block:: none + + $ opt < a.bc > /dev/null -mypass + <no output> + $ opt < a.bc > /dev/null -mypass -debug + No debug type + 'foo' debug type + 'bar' debug type + No debug type (2) + $ opt < a.bc > /dev/null -mypass -debug-only=foo + 'foo' debug type + $ opt < a.bc > /dev/null -mypass -debug-only=bar + 'bar' debug type + +Of course, in practice, you should only set ``DEBUG_TYPE`` at the top of a file, +to specify the debug type for the entire module (if you do this before you +``#include "llvm/Support/Debug.h"``, you don't have to insert the ugly +``#undef``'s). Also, you should use names more meaningful than "foo" and "bar", +because there is no system in place to ensure that names do not conflict. 
If +two different modules use the same string, they will all be turned on when the +name is specified. This allows, for example, all debug information for +instruction scheduling to be enabled with ``-debug-type=InstrSched``, even if +the source lives in multiple files. + +The ``DEBUG_WITH_TYPE`` macro is also available for situations where you would +like to set ``DEBUG_TYPE``, but only for one specific ``DEBUG`` statement. It +takes an additional first parameter, which is the type to use. For example, the +preceding example could be written as: + +.. code-block:: c++ + + DEBUG_WITH_TYPE("", errs() << "No debug type\n"); + DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n"); + DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n")); + DEBUG_WITH_TYPE("", errs() << "No debug type (2)\n"); + +.. _Statistic: + +The ``Statistic`` class & ``-stats`` option +------------------------------------------- + +The ``llvm/ADT/Statistic.h`` (`doxygen +<http://llvm.org/doxygen/Statistic_8h-source.html>`__) file provides a class +named ``Statistic`` that is used as a unified way to keep track of what the LLVM +compiler is doing and how effective various optimizations are. It is useful to +see what optimizations are contributing to making a particular program run +faster. + +Often you may run your pass on some big program, and you're interested to see +how many times it makes a certain transformation. Although you can do this with +hand inspection, or some ad-hoc method, this is a real pain and not very useful +for big programs. Using the ``Statistic`` class makes it very easy to keep +track of this information, and the calculated information is presented in a +uniform manner with the rest of the passes being executed. + +There are many examples of ``Statistic`` uses, but the basics of using it are as +follows: + +#. Define your statistic like this: + + .. code-block:: c++ + + #define DEBUG_TYPE "mypassname" // This goes before any #includes. + STATISTIC(NumXForms, "The # of times I did stuff"); + + The ``STATISTIC`` macro defines a static variable, whose name is specified by + the first argument. The pass name is taken from the ``DEBUG_TYPE`` macro, and + the description is taken from the second argument. The variable defined + ("NumXForms" in this case) acts like an unsigned integer. + +#. Whenever you make a transformation, bump the counter: + + .. code-block:: c++ + + ++NumXForms; // I did stuff! + +That's all you have to do. To get '``opt``' to print out the statistics +gathered, use the '``-stats``' option: + +.. code-block:: none + + $ opt -stats -mypassname < program.bc > /dev/null + ... statistics output ... + +When running ``opt`` on a C file from the SPEC benchmark suite, it gives a +report that looks like this: + +.. 
code-block:: none + + 7646 bitcodewriter - Number of normal instructions + 725 bitcodewriter - Number of oversized instructions + 129996 bitcodewriter - Number of bitcode bytes written + 2817 raise - Number of insts DCEd or constprop'd + 3213 raise - Number of cast-of-self removed + 5046 raise - Number of expression trees converted + 75 raise - Number of other getelementptr's formed + 138 raise - Number of load/store peepholes + 42 deadtypeelim - Number of unused typenames removed from symtab + 392 funcresolve - Number of varargs functions resolved + 27 globaldce - Number of global variables removed + 2 adce - Number of basic blocks removed + 134 cee - Number of branches revectored + 49 cee - Number of setcc instruction eliminated + 532 gcse - Number of loads removed + 2919 gcse - Number of instructions removed + 86 indvars - Number of canonical indvars added + 87 indvars - Number of aux indvars removed + 25 instcombine - Number of dead inst eliminate + 434 instcombine - Number of insts combined + 248 licm - Number of load insts hoisted + 1298 licm - Number of insts hoisted to a loop pre-header + 3 licm - Number of insts hoisted to multiple loop preds (bad, no loop pre-header) + 75 mem2reg - Number of alloca's promoted + 1444 cfgsimplify - Number of blocks simplified + +Obviously, with so many optimizations, having a unified framework for this stuff +is very nice. Making your pass fit well into the framework makes it more +maintainable and useful. + +.. _ViewGraph: + +Viewing graphs while debugging code +----------------------------------- + +Several of the important data structures in LLVM are graphs: for example CFGs +made out of LLVM :ref:`BasicBlocks <BasicBlock>`, CFGs made out of LLVM +:ref:`MachineBasicBlocks <MachineBasicBlock>`, and :ref:`Instruction Selection +DAGs <SelectionDAG>`. In many cases, while debugging various parts of the +compiler, it is nice to instantly visualize these graphs. + +LLVM provides several callbacks that are available in a debug build to do +exactly that. If you call the ``Function::viewCFG()`` method, for example, the +current LLVM tool will pop up a window containing the CFG for the function where +each basic block is a node in the graph, and each node contains the instructions +in the block. Similarly, there also exists ``Function::viewCFGOnly()`` (does +not include the instructions), the ``MachineFunction::viewCFG()`` and +``MachineFunction::viewCFGOnly()``, and the ``SelectionDAG::viewGraph()`` +methods. Within GDB, for example, you can usually use something like ``call +DAG.viewGraph()`` to pop up a window. Alternatively, you can sprinkle calls to +these functions in your code in places you want to debug. + +Getting this to work requires a small amount of configuration. On Unix systems +with X11, install the `graphviz <http://www.graphviz.org>`_ toolkit, and make +sure 'dot' and 'gv' are in your path. If you are running on Mac OS/X, download +and install the Mac OS/X `Graphviz program +<http://www.pixelglow.com/graphviz/>`_ and add +``/Applications/Graphviz.app/Contents/MacOS/`` (or wherever you install it) to +your path. Once in your system and path are set up, rerun the LLVM configure +script and rebuild LLVM to enable this functionality. + +``SelectionDAG`` has been extended to make it easier to locate *interesting* +nodes in large complex graphs. 
From gdb, if you ``call DAG.setGraphColor(node, +"color")``, then the next ``call DAG.viewGraph()`` would highlight the node in +the specified color (choices of colors can be found at `colors +<http://www.graphviz.org/doc/info/colors.html>`_.) More complex node attributes +can be provided with ``call DAG.setGraphAttrs(node, "attributes")`` (choices can +be found at `Graph attributes <http://www.graphviz.org/doc/info/attrs.html>`_.) +If you want to restart and clear all the current graph attributes, then you can +``call DAG.clearGraphAttrs()``. + +Note that graph visualization features are compiled out of Release builds to +reduce file size. This means that you need a Debug+Asserts or Release+Asserts +build to use these features. + +.. _datastructure: + +Picking the Right Data Structure for a Task +=========================================== + +LLVM has a plethora of data structures in the ``llvm/ADT/`` directory, and we +commonly use STL data structures. This section describes the trade-offs you +should consider when you pick one. + +The first step is a choose your own adventure: do you want a sequential +container, a set-like container, or a map-like container? The most important +thing when choosing a container is the algorithmic properties of how you plan to +access the container. Based on that, you should use: + + +* a :ref:`map-like <ds_map>` container if you need efficient look-up of a + value based on another value. Map-like containers also support efficient + queries for containment (whether a key is in the map). Map-like containers + generally do not support efficient reverse mapping (values to keys). If you + need that, use two maps. Some map-like containers also support efficient + iteration through the keys in sorted order. Map-like containers are the most + expensive sort, only use them if you need one of these capabilities. + +* a :ref:`set-like <ds_set>` container if you need to put a bunch of stuff into + a container that automatically eliminates duplicates. Some set-like + containers support efficient iteration through the elements in sorted order. + Set-like containers are more expensive than sequential containers. + +* a :ref:`sequential <ds_sequential>` container provides the most efficient way + to add elements and keeps track of the order they are added to the collection. + They permit duplicates and support efficient iteration, but do not support + efficient look-up based on a key. + +* a :ref:`string <ds_string>` container is a specialized sequential container or + reference structure that is used for character or byte arrays. + +* a :ref:`bit <ds_bit>` container provides an efficient way to store and + perform set operations on sets of numeric id's, while automatically + eliminating duplicates. Bit containers require a maximum of 1 bit for each + identifier you want to store. + +Once the proper category of container is determined, you can fine tune the +memory use, constant factors, and cache behaviors of access by intelligently +picking a member of the category. Note that constant factors and cache behavior +can be a big deal. If you have a vector that usually only contains a few +elements (but could contain many), for example, it's much better to use +:ref:`SmallVector <dss_smallvector>` than :ref:`vector <dss_vector>`. Doing so +avoids (relatively) expensive malloc/free calls, which dwarf the cost of adding +the elements to the container. + +.. 
_ds_sequential: + +Sequential Containers (std::vector, std::list, etc) +--------------------------------------------------- + +There are a variety of sequential containers available for you, based on your +needs. Pick the first in this section that will do what you want. + +.. _dss_arrayref: + +llvm/ADT/ArrayRef.h +^^^^^^^^^^^^^^^^^^^ + +The ``llvm::ArrayRef`` class is the preferred class to use in an interface that +accepts a sequential list of elements in memory and just reads from them. By +taking an ``ArrayRef``, the API can be passed a fixed size array, an +``std::vector``, an ``llvm::SmallVector`` and anything else that is contiguous +in memory. + +.. _dss_fixedarrays: + +Fixed Size Arrays +^^^^^^^^^^^^^^^^^ + +Fixed size arrays are very simple and very fast. They are good if you know +exactly how many elements you have, or you have a (low) upper bound on how many +you have. + +.. _dss_heaparrays: + +Heap Allocated Arrays +^^^^^^^^^^^^^^^^^^^^^ + +Heap allocated arrays (``new[]`` + ``delete[]``) are also simple. They are good +if the number of elements is variable, if you know how many elements you will +need before the array is allocated, and if the array is usually large (if not, +consider a :ref:`SmallVector <dss_smallvector>`). The cost of a heap allocated +array is the cost of the new/delete (aka malloc/free). Also note that if you +are allocating an array of a type with a constructor, the constructor and +destructors will be run for every element in the array (re-sizable vectors only +construct those elements actually used). + +.. _dss_tinyptrvector: + +llvm/ADT/TinyPtrVector.h +^^^^^^^^^^^^^^^^^^^^^^^^ + +``TinyPtrVector<Type>`` is a highly specialized collection class that is +optimized to avoid allocation in the case when a vector has zero or one +elements. It has two major restrictions: 1) it can only hold values of pointer +type, and 2) it cannot hold a null pointer. + +Since this container is highly specialized, it is rarely used. + +.. _dss_smallvector: + +llvm/ADT/SmallVector.h +^^^^^^^^^^^^^^^^^^^^^^ + +``SmallVector<Type, N>`` is a simple class that looks and smells just like +``vector<Type>``: it supports efficient iteration, lays out elements in memory +order (so you can do pointer arithmetic between elements), supports efficient +push_back/pop_back operations, supports efficient random access to its elements, +etc. + +The advantage of SmallVector is that it allocates space for some number of +elements (N) **in the object itself**. Because of this, if the SmallVector is +dynamically smaller than N, no malloc is performed. This can be a big win in +cases where the malloc/free call is far more expensive than the code that +fiddles around with the elements. + +This is good for vectors that are "usually small" (e.g. the number of +predecessors/successors of a block is usually less than 8). On the other hand, +this makes the size of the SmallVector itself large, so you don't want to +allocate lots of them (doing so will waste a lot of space). As such, +SmallVectors are most useful when on the stack. + +SmallVector also provides a nice portable and efficient replacement for +``alloca``. + +.. _dss_vector: + +<vector> +^^^^^^^^ + +``std::vector`` is well loved and respected. It is useful when SmallVector +isn't: when the size of the vector is often large (thus the small optimization +will rarely be a benefit) or if you will be allocating many instances of the +vector itself (which would waste space for elements that aren't in the +container). 
vector is also useful when interfacing with code that expects +vectors :). + +One worthwhile note about std::vector: avoid code like this: + +.. code-block:: c++ + + for ( ... ) { + std::vector<foo> V; + // make use of V. + } + +Instead, write this as: + +.. code-block:: c++ + + std::vector<foo> V; + for ( ... ) { + // make use of V. + V.clear(); + } + +Doing so will save (at least) one heap allocation and free per iteration of the +loop. + +.. _dss_deque: + +<deque> +^^^^^^^ + +``std::deque`` is, in some senses, a generalized version of ``std::vector``. +Like ``std::vector``, it provides constant time random access and other similar +properties, but it also provides efficient access to the front of the list. It +does not guarantee continuity of elements within memory. + +In exchange for this extra flexibility, ``std::deque`` has significantly higher +constant factor costs than ``std::vector``. If possible, use ``std::vector`` or +something cheaper. + +.. _dss_list: + +<list> +^^^^^^ + +``std::list`` is an extremely inefficient class that is rarely useful. It +performs a heap allocation for every element inserted into it, thus having an +extremely high constant factor, particularly for small data types. +``std::list`` also only supports bidirectional iteration, not random access +iteration. + +In exchange for this high cost, std::list supports efficient access to both ends +of the list (like ``std::deque``, but unlike ``std::vector`` or +``SmallVector``). In addition, the iterator invalidation characteristics of +std::list are stronger than that of a vector class: inserting or removing an +element into the list does not invalidate iterator or pointers to other elements +in the list. + +.. _dss_ilist: + +llvm/ADT/ilist.h +^^^^^^^^^^^^^^^^ + +``ilist<T>`` implements an 'intrusive' doubly-linked list. It is intrusive, +because it requires the element to store and provide access to the prev/next +pointers for the list. + +``ilist`` has the same drawbacks as ``std::list``, and additionally requires an +``ilist_traits`` implementation for the element type, but it provides some novel +characteristics. In particular, it can efficiently store polymorphic objects, +the traits class is informed when an element is inserted or removed from the +list, and ``ilist``\ s are guaranteed to support a constant-time splice +operation. + +These properties are exactly what we want for things like ``Instruction``\ s and +basic blocks, which is why these are implemented with ``ilist``\ s. + +Related classes of interest are explained in the following subsections: + +* :ref:`ilist_traits <dss_ilist_traits>` + +* :ref:`iplist <dss_iplist>` + +* :ref:`llvm/ADT/ilist_node.h <dss_ilist_node>` + +* :ref:`Sentinels <dss_ilist_sentinel>` + +.. _dss_packedvector: + +llvm/ADT/PackedVector.h +^^^^^^^^^^^^^^^^^^^^^^^ + +Useful for storing a vector of values using only a few number of bits for each +value. Apart from the standard operations of a vector-like container, it can +also perform an 'or' set operation. + +For example: + +.. code-block:: c++ + + enum State { + None = 0x0, + FirstCondition = 0x1, + SecondCondition = 0x2, + Both = 0x3 + }; + + State get() { + PackedVector<State, 2> Vec1; + Vec1.push_back(FirstCondition); + + PackedVector<State, 2> Vec2; + Vec2.push_back(SecondCondition); + + Vec1 |= Vec2; + return Vec1[0]; // returns 'Both'. + } + +.. _dss_ilist_traits: + +ilist_traits +^^^^^^^^^^^^ + +``ilist_traits<T>`` is ``ilist<T>``'s customization mechanism. 
``iplist<T>`` +(and consequently ``ilist<T>``) publicly derive from this traits class. + +.. _dss_iplist: + +iplist +^^^^^^ + +``iplist<T>`` is ``ilist<T>``'s base and as such supports a slightly narrower +interface. Notably, inserters from ``T&`` are absent. + +``ilist_traits<T>`` is a public base of this class and can be used for a wide +variety of customizations. + +.. _dss_ilist_node: + +llvm/ADT/ilist_node.h +^^^^^^^^^^^^^^^^^^^^^ + +``ilist_node<T>`` implements a the forward and backward links that are expected +by the ``ilist<T>`` (and analogous containers) in the default manner. + +``ilist_node<T>``\ s are meant to be embedded in the node type ``T``, usually +``T`` publicly derives from ``ilist_node<T>``. + +.. _dss_ilist_sentinel: + +Sentinels +^^^^^^^^^ + +``ilist``\ s have another specialty that must be considered. To be a good +citizen in the C++ ecosystem, it needs to support the standard container +operations, such as ``begin`` and ``end`` iterators, etc. Also, the +``operator--`` must work correctly on the ``end`` iterator in the case of +non-empty ``ilist``\ s. + +The only sensible solution to this problem is to allocate a so-called *sentinel* +along with the intrusive list, which serves as the ``end`` iterator, providing +the back-link to the last element. However conforming to the C++ convention it +is illegal to ``operator++`` beyond the sentinel and it also must not be +dereferenced. + +These constraints allow for some implementation freedom to the ``ilist`` how to +allocate and store the sentinel. The corresponding policy is dictated by +``ilist_traits<T>``. By default a ``T`` gets heap-allocated whenever the need +for a sentinel arises. + +While the default policy is sufficient in most cases, it may break down when +``T`` does not provide a default constructor. Also, in the case of many +instances of ``ilist``\ s, the memory overhead of the associated sentinels is +wasted. To alleviate the situation with numerous and voluminous +``T``-sentinels, sometimes a trick is employed, leading to *ghostly sentinels*. + +Ghostly sentinels are obtained by specially-crafted ``ilist_traits<T>`` which +superpose the sentinel with the ``ilist`` instance in memory. Pointer +arithmetic is used to obtain the sentinel, which is relative to the ``ilist``'s +``this`` pointer. The ``ilist`` is augmented by an extra pointer, which serves +as the back-link of the sentinel. This is the only field in the ghostly +sentinel which can be legally accessed. + +.. _dss_other: + +Other Sequential Container options +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Other STL containers are available, such as ``std::string``. + +There are also various STL adapter classes such as ``std::queue``, +``std::priority_queue``, ``std::stack``, etc. These provide simplified access +to an underlying container but don't affect the cost of the container itself. + +.. _ds_string: + +String-like containers +---------------------- + +There are a variety of ways to pass around and use strings in C and C++, and +LLVM adds a few new options to choose from. Pick the first option on this list +that will do what you need, they are ordered according to their relative cost. + +Note that is is generally preferred to *not* pass strings around as ``const +char*``'s. These have a number of problems, including the fact that they +cannot represent embedded nul ("\0") characters, and do not have a length +available efficiently. The general replacement for '``const char*``' is +StringRef. 
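+
+As a rough illustration (``isInternalName`` below is a hypothetical helper,
+not an LLVM API), a function that takes a ``StringRef`` can be called with a
+string literal, an ``std::string``, or a ``SmallString`` alike, without
+forcing the caller to produce a null-terminated copy:
+
+.. code-block:: c++
+
+  #include "llvm/ADT/SmallString.h"
+  #include "llvm/ADT/StringRef.h"
+  #include <string>
+
+  // Hypothetical helper: any contiguous string-like argument converts to a
+  // StringRef at the call site without copying the character data.
+  static bool isInternalName(llvm::StringRef Name) {
+    return Name.startswith("__");
+  }
+
+  void example() {
+    std::string Str = "__foo";
+    llvm::SmallString<32> Small;
+    Small += "__bar";
+    isInternalName("__baz");      // string literal
+    isInternalName(Str);          // std::string
+    isInternalName(Small.str());  // SmallString via its str() method
+  }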
+ +For more information on choosing string containers for APIs, please see +:ref:`Passing Strings <string_apis>`. + +.. _dss_stringref: + +llvm/ADT/StringRef.h +^^^^^^^^^^^^^^^^^^^^ + +The StringRef class is a simple value class that contains a pointer to a +character and a length, and is quite related to the :ref:`ArrayRef +<dss_arrayref>` class (but specialized for arrays of characters). Because +StringRef carries a length with it, it safely handles strings with embedded nul +characters in it, getting the length does not require a strlen call, and it even +has very convenient APIs for slicing and dicing the character range that it +represents. + +StringRef is ideal for passing simple strings around that are known to be live, +either because they are C string literals, std::string, a C array, or a +SmallVector. Each of these cases has an efficient implicit conversion to +StringRef, which doesn't result in a dynamic strlen being executed. + +StringRef has a few major limitations which make more powerful string containers +useful: + +#. You cannot directly convert a StringRef to a 'const char*' because there is + no way to add a trailing nul (unlike the .c_str() method on various stronger + classes). + +#. StringRef doesn't own or keep alive the underlying string bytes. + As such it can easily lead to dangling pointers, and is not suitable for + embedding in datastructures in most cases (instead, use an std::string or + something like that). + +#. For the same reason, StringRef cannot be used as the return value of a + method if the method "computes" the result string. Instead, use std::string. + +#. StringRef's do not allow you to mutate the pointed-to string bytes and it + doesn't allow you to insert or remove bytes from the range. For editing + operations like this, it interoperates with the :ref:`Twine <dss_twine>` + class. + +Because of its strengths and limitations, it is very common for a function to +take a StringRef and for a method on an object to return a StringRef that points +into some string that it owns. + +.. _dss_twine: + +llvm/ADT/Twine.h +^^^^^^^^^^^^^^^^ + +The Twine class is used as an intermediary datatype for APIs that want to take a +string that can be constructed inline with a series of concatenations. Twine +works by forming recursive instances of the Twine datatype (a simple value +object) on the stack as temporary objects, linking them together into a tree +which is then linearized when the Twine is consumed. Twine is only safe to use +as the argument to a function, and should always be a const reference, e.g.: + +.. code-block:: c++ + + void foo(const Twine &T); + ... + StringRef X = ... + unsigned i = ... + foo(X + "." + Twine(i)); + +This example forms a string like "blarg.42" by concatenating the values +together, and does not form intermediate strings containing "blarg" or "blarg.". + +Because Twine is constructed with temporary objects on the stack, and because +these instances are destroyed at the end of the current statement, it is an +inherently dangerous API. For example, this simple variant contains undefined +behavior and will probably crash: + +.. code-block:: c++ + + void foo(const Twine &T); + ... + StringRef X = ... + unsigned i = ... + const Twine &Tmp = X + "." + Twine(i); + foo(Tmp); + +... because the temporaries are destroyed before the call. That said, Twine's +are much more efficient than intermediate std::string temporaries, and they work +really well with StringRef. Just be aware of their limitations. + +.. 
_dss_smallstring: + +llvm/ADT/SmallString.h +^^^^^^^^^^^^^^^^^^^^^^ + +SmallString is a subclass of :ref:`SmallVector <dss_smallvector>` that adds some +convenience APIs like += that takes StringRef's. SmallString avoids allocating +memory in the case when the preallocated space is enough to hold its data, and +it calls back to general heap allocation when required. Since it owns its data, +it is very safe to use and supports full mutation of the string. + +Like SmallVector's, the big downside to SmallString is their sizeof. While they +are optimized for small strings, they themselves are not particularly small. +This means that they work great for temporary scratch buffers on the stack, but +should not generally be put into the heap: it is very rare to see a SmallString +as the member of a frequently-allocated heap data structure or returned +by-value. + +.. _dss_stdstring: + +std::string +^^^^^^^^^^^ + +The standard C++ std::string class is a very general class that (like +SmallString) owns its underlying data. sizeof(std::string) is very reasonable +so it can be embedded into heap data structures and returned by-value. On the +other hand, std::string is highly inefficient for inline editing (e.g. +concatenating a bunch of stuff together) and because it is provided by the +standard library, its performance characteristics depend a lot of the host +standard library (e.g. libc++ and MSVC provide a highly optimized string class, +GCC contains a really slow implementation). + +The major disadvantage of std::string is that almost every operation that makes +them larger can allocate memory, which is slow. As such, it is better to use +SmallVector or Twine as a scratch buffer, but then use std::string to persist +the result. + +.. _ds_set: + +Set-Like Containers (std::set, SmallSet, SetVector, etc) +-------------------------------------------------------- + +Set-like containers are useful when you need to canonicalize multiple values +into a single representation. There are several different choices for how to do +this, providing various trade-offs. + +.. _dss_sortedvectorset: + +A sorted 'vector' +^^^^^^^^^^^^^^^^^ + +If you intend to insert a lot of elements, then do a lot of queries, a great +approach is to use a vector (or other sequential container) with +std::sort+std::unique to remove duplicates. This approach works really well if +your usage pattern has these two distinct phases (insert then query), and can be +coupled with a good choice of :ref:`sequential container <ds_sequential>`. + +This combination provides the several nice properties: the result data is +contiguous in memory (good for cache locality), has few allocations, is easy to +address (iterators in the final vector are just indices or pointers), and can be +efficiently queried with a standard binary or radix search. + +.. _dss_smallset: + +llvm/ADT/SmallSet.h +^^^^^^^^^^^^^^^^^^^ + +If you have a set-like data structure that is usually small and whose elements +are reasonably small, a ``SmallSet<Type, N>`` is a good choice. This set has +space for N elements in place (thus, if the set is dynamically smaller than N, +no malloc traffic is required) and accesses them with a simple linear search. +When the set grows beyond 'N' elements, it allocates a more expensive +representation that guarantees efficient access (for most types, it falls back +to std::set, but for pointers it uses something far better, :ref:`SmallPtrSet +<dss_smallptrset>`. 
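+
+As a minimal sketch (the function and its arguments here are hypothetical,
+not taken from LLVM), the typical usage pattern looks like this:
+
+.. code-block:: c++
+
+  #include "llvm/ADT/SmallSet.h"
+
+  // Count the distinct values in Regs.  With N = 8, no heap allocation
+  // occurs unless more than 8 distinct values are inserted.
+  unsigned countDistinct(const unsigned *Regs, unsigned NumRegs) {
+    llvm::SmallSet<unsigned, 8> Seen;
+    for (unsigned i = 0; i != NumRegs; ++i)
+      Seen.insert(Regs[i]);   // duplicates are silently ignored
+    return Seen.size();
+  }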
+ +The magic of this class is that it handles small sets extremely efficiently, but +gracefully handles extremely large sets without loss of efficiency. The +drawback is that the interface is quite small: it supports insertion, queries +and erasing, but does not support iteration. + +.. _dss_smallptrset: + +llvm/ADT/SmallPtrSet.h +^^^^^^^^^^^^^^^^^^^^^^ + +SmallPtrSet has all the advantages of ``SmallSet`` (and a ``SmallSet`` of +pointers is transparently implemented with a ``SmallPtrSet``), but also supports +iterators. If more than 'N' insertions are performed, a single quadratically +probed hash table is allocated and grows as needed, providing extremely +efficient access (constant time insertion/deleting/queries with low constant +factors) and is very stingy with malloc traffic. + +Note that, unlike ``std::set``, the iterators of ``SmallPtrSet`` are invalidated +whenever an insertion occurs. Also, the values visited by the iterators are not +visited in sorted order. + +.. _dss_denseset: + +llvm/ADT/DenseSet.h +^^^^^^^^^^^^^^^^^^^ + +DenseSet is a simple quadratically probed hash table. It excels at supporting +small values: it uses a single allocation to hold all of the pairs that are +currently inserted in the set. DenseSet is a great way to unique small values +that are not simple pointers (use :ref:`SmallPtrSet <dss_smallptrset>` for +pointers). Note that DenseSet has the same requirements for the value type that +:ref:`DenseMap <dss_densemap>` has. + +.. _dss_sparseset: + +llvm/ADT/SparseSet.h +^^^^^^^^^^^^^^^^^^^^ + +SparseSet holds a small number of objects identified by unsigned keys of +moderate size. It uses a lot of memory, but provides operations that are almost +as fast as a vector. Typical keys are physical registers, virtual registers, or +numbered basic blocks. + +SparseSet is useful for algorithms that need very fast clear/find/insert/erase +and fast iteration over small sets. It is not intended for building composite +data structures. + +.. _dss_FoldingSet: + +llvm/ADT/FoldingSet.h +^^^^^^^^^^^^^^^^^^^^^ + +FoldingSet is an aggregate class that is really good at uniquing +expensive-to-create or polymorphic objects. It is a combination of a chained +hash table with intrusive links (uniqued objects are required to inherit from +FoldingSetNode) that uses :ref:`SmallVector <dss_smallvector>` as part of its ID +process. + +Consider a case where you want to implement a "getOrCreateFoo" method for a +complex object (for example, a node in the code generator). The client has a +description of **what** it wants to generate (it knows the opcode and all the +operands), but we don't want to 'new' a node, then try inserting it into a set +only to find out it already exists, at which point we would have to delete it +and return the node that already exists. + +To support this style of client, FoldingSet perform a query with a +FoldingSetNodeID (which wraps SmallVector) that can be used to describe the +element that we want to query for. The query either returns the element +matching the ID or it returns an opaque ID that indicates where insertion should +take place. Construction of the ID usually does not require heap traffic. + +Because FoldingSet uses intrusive links, it can support polymorphic objects in +the set (for example, you can have SDNode instances mixed with LoadSDNodes). +Because the elements are individually allocated, pointers to the elements are +stable: inserting or removing elements does not invalidate any pointers to other +elements. + +.. 
_dss_set: + +<set> +^^^^^ + +``std::set`` is a reasonable all-around set class, which is decent at many +things but great at nothing. std::set allocates memory for each element +inserted (thus it is very malloc intensive) and typically stores three pointers +per element in the set (thus adding a large amount of per-element space +overhead). It offers guaranteed log(n) performance, which is not particularly +fast from a complexity standpoint (particularly if the elements of the set are +expensive to compare, like strings), and has extremely high constant factors for +lookup, insertion and removal. + +The advantages of std::set are that its iterators are stable (deleting or +inserting an element from the set does not affect iterators or pointers to other +elements) and that iteration over the set is guaranteed to be in sorted order. +If the elements in the set are large, then the relative overhead of the pointers +and malloc traffic is not a big deal, but if the elements of the set are small, +std::set is almost never a good choice. + +.. _dss_setvector: + +llvm/ADT/SetVector.h +^^^^^^^^^^^^^^^^^^^^ + +LLVM's ``SetVector<Type>`` is an adapter class that combines your choice of a +set-like container along with a :ref:`Sequential Container <ds_sequential>` The +important property that this provides is efficient insertion with uniquing +(duplicate elements are ignored) with iteration support. It implements this by +inserting elements into both a set-like container and the sequential container, +using the set-like container for uniquing and the sequential container for +iteration. + +The difference between SetVector and other sets is that the order of iteration +is guaranteed to match the order of insertion into the SetVector. This property +is really important for things like sets of pointers. Because pointer values +are non-deterministic (e.g. vary across runs of the program on different +machines), iterating over the pointers in the set will not be in a well-defined +order. + +The drawback of SetVector is that it requires twice as much space as a normal +set and has the sum of constant factors from the set-like container and the +sequential container that it uses. Use it **only** if you need to iterate over +the elements in a deterministic order. SetVector is also expensive to delete +elements out of (linear time), unless you use it's "pop_back" method, which is +faster. + +``SetVector`` is an adapter class that defaults to using ``std::vector`` and a +size 16 ``SmallSet`` for the underlying containers, so it is quite expensive. +However, ``"llvm/ADT/SetVector.h"`` also provides a ``SmallSetVector`` class, +which defaults to using a ``SmallVector`` and ``SmallSet`` of a specified size. +If you use this, and if your sets are dynamically smaller than ``N``, you will +save a lot of heap traffic. + +.. _dss_uniquevector: + +llvm/ADT/UniqueVector.h +^^^^^^^^^^^^^^^^^^^^^^^ + +UniqueVector is similar to :ref:`SetVector <dss_setvector>` but it retains a +unique ID for each element inserted into the set. It internally contains a map +and a vector, and it assigns a unique ID for each value inserted into the set. + +UniqueVector is very expensive: its cost is the sum of the cost of maintaining +both the map and vector, it has high complexity, high constant factors, and +produces a lot of malloc traffic. It should be avoided. + +.. _dss_immutableset: + +llvm/ADT/ImmutableSet.h +^^^^^^^^^^^^^^^^^^^^^^^ + +ImmutableSet is an immutable (functional) set implementation based on an AVL +tree. 
Adding or removing elements is done through a Factory object and results +in the creation of a new ImmutableSet object. If an ImmutableSet already exists +with the given contents, then the existing one is returned; equality is compared +with a FoldingSetNodeID. The time and space complexity of add or remove +operations is logarithmic in the size of the original set. + +There is no method for returning an element of the set, you can only check for +membership. + +.. _dss_otherset: + +Other Set-Like Container Options +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The STL provides several other options, such as std::multiset and the various +"hash_set" like containers (whether from C++ TR1 or from the SGI library). We +never use hash_set and unordered_set because they are generally very expensive +(each insertion requires a malloc) and very non-portable. + +std::multiset is useful if you're not interested in elimination of duplicates, +but has all the drawbacks of std::set. A sorted vector (where you don't delete +duplicate entries) or some other approach is almost always better. + +.. _ds_map: + +Map-Like Containers (std::map, DenseMap, etc) +--------------------------------------------- + +Map-like containers are useful when you want to associate data to a key. As +usual, there are a lot of different ways to do this. :) + +.. _dss_sortedvectormap: + +A sorted 'vector' +^^^^^^^^^^^^^^^^^ + +If your usage pattern follows a strict insert-then-query approach, you can +trivially use the same approach as :ref:`sorted vectors for set-like containers +<dss_sortedvectorset>`. The only difference is that your query function (which +uses std::lower_bound to get efficient log(n) lookup) should only compare the +key, not both the key and value. This yields the same advantages as sorted +vectors for sets. + +.. _dss_stringmap: + +llvm/ADT/StringMap.h +^^^^^^^^^^^^^^^^^^^^ + +Strings are commonly used as keys in maps, and they are difficult to support +efficiently: they are variable length, inefficient to hash and compare when +long, expensive to copy, etc. StringMap is a specialized container designed to +cope with these issues. It supports mapping an arbitrary range of bytes to an +arbitrary other object. + +The StringMap implementation uses a quadratically-probed hash table, where the +buckets store a pointer to the heap allocated entries (and some other stuff). +The entries in the map must be heap allocated because the strings are variable +length. The string data (key) and the element object (value) are stored in the +same allocation with the string data immediately after the element object. +This container guarantees the "``(char*)(&Value+1)``" points to the key string +for a value. + +The StringMap is very fast for several reasons: quadratic probing is very cache +efficient for lookups, the hash value of strings in buckets is not recomputed +when looking up an element, StringMap rarely has to touch the memory for +unrelated objects when looking up a value (even when hash collisions happen), +hash table growth does not recompute the hash values for strings already in the +table, and each pair in the map is store in a single allocation (the string data +is stored in the same allocation as the Value of a pair). + +StringMap also provides query methods that take byte ranges, so it only ever +copies a string if a value is inserted into the table. + +StringMap iteratation order, however, is not guaranteed to be deterministic, so +any uses which require that should instead use a std::map. + +.. 
_dss_indexmap: + +llvm/ADT/IndexedMap.h +^^^^^^^^^^^^^^^^^^^^^ + +IndexedMap is a specialized container for mapping small dense integers (or +values that can be mapped to small dense integers) to some other type. It is +internally implemented as a vector with a mapping function that maps the keys +to the dense integer range. + +This is useful for cases like virtual registers in the LLVM code generator: they +have a dense mapping that is offset by a compile-time constant (the first +virtual register ID). + +.. _dss_densemap: + +llvm/ADT/DenseMap.h +^^^^^^^^^^^^^^^^^^^ + +DenseMap is a simple quadratically probed hash table. It excels at supporting +small keys and values: it uses a single allocation to hold all of the pairs +that are currently inserted in the map. DenseMap is a great way to map +pointers to pointers, or map other small types to each other. + +There are several aspects of DenseMap that you should be aware of, however. +The iterators in a DenseMap are invalidated whenever an insertion occurs, +unlike map. Also, because DenseMap allocates space for a large number of +key/value pairs (it starts with 64 by default), it will waste a lot of space if +your keys or values are large. Finally, you must implement a partial +specialization of DenseMapInfo for the key that you want, if it isn't already +supported. This is required to tell DenseMap about two special marker values +(which can never be inserted into the map) that it needs internally. + +DenseMap's find_as() method supports lookup operations using an alternate key +type. This is useful in cases where the normal key type is expensive to +construct, but cheap to compare against. The DenseMapInfo is responsible for +defining the appropriate comparison and hashing methods for each alternate key +type used. + +.. _dss_valuemap: + +llvm/ADT/ValueMap.h +^^^^^^^^^^^^^^^^^^^ + +ValueMap is a wrapper around a :ref:`DenseMap <dss_densemap>` mapping +``Value*``\ s (or subclasses) to another type. When a Value is deleted or +RAUW'ed, ValueMap will update itself so the new version of the key is mapped to +the same value, just as if the key were a WeakVH. You can configure exactly how +this happens, and what else happens on these two events, by passing a ``Config`` +parameter to the ValueMap template. + +.. _dss_intervalmap: + +llvm/ADT/IntervalMap.h +^^^^^^^^^^^^^^^^^^^^^^ + +IntervalMap is a compact map for small keys and values. It maps key intervals +instead of single keys, and it will automatically coalesce adjacent intervals. +When then map only contains a few intervals, they are stored in the map object +itself to avoid allocations. + +The IntervalMap iterators are quite big, so they should not be passed around as +STL iterators. The heavyweight iterators allow a smaller data structure. + +.. _dss_map: + +<map> +^^^^^ + +std::map has similar characteristics to :ref:`std::set <dss_set>`: it uses a +single allocation per pair inserted into the map, it offers log(n) lookup with +an extremely large constant factor, imposes a space penalty of 3 pointers per +pair in the map, etc. + +std::map is most useful when your keys or values are very large, if you need to +iterate over the collection in sorted order, or if you need stable iterators +into the map (i.e. they don't get invalidated if an insertion or deletion of +another element takes place). + +.. _dss_mapvector: + +llvm/ADT/MapVector.h +^^^^^^^^^^^^^^^^^^^^ + +``MapVector<KeyT,ValueT>`` provides a subset of the DenseMap interface. 
The +main difference is that the iteration order is guaranteed to be the insertion +order, making it an easy (but somewhat expensive) solution for non-deterministic +iteration over maps of pointers. + +It is implemented by mapping from key to an index in a vector of key,value +pairs. This provides fast lookup and iteration, but has two main drawbacks: The +key is stored twice and it doesn't support removing elements. + +.. _dss_inteqclasses: + +llvm/ADT/IntEqClasses.h +^^^^^^^^^^^^^^^^^^^^^^^ + +IntEqClasses provides a compact representation of equivalence classes of small +integers. Initially, each integer in the range 0..n-1 has its own equivalence +class. Classes can be joined by passing two class representatives to the +join(a, b) method. Two integers are in the same class when findLeader() returns +the same representative. + +Once all equivalence classes are formed, the map can be compressed so each +integer 0..n-1 maps to an equivalence class number in the range 0..m-1, where m +is the total number of equivalence classes. The map must be uncompressed before +it can be edited again. + +.. _dss_immutablemap: + +llvm/ADT/ImmutableMap.h +^^^^^^^^^^^^^^^^^^^^^^^ + +ImmutableMap is an immutable (functional) map implementation based on an AVL +tree. Adding or removing elements is done through a Factory object and results +in the creation of a new ImmutableMap object. If an ImmutableMap already exists +with the given key set, then the existing one is returned; equality is compared +with a FoldingSetNodeID. The time and space complexity of add or remove +operations is logarithmic in the size of the original map. + +.. _dss_othermap: + +Other Map-Like Container Options +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The STL provides several other options, such as std::multimap and the various +"hash_map" like containers (whether from C++ TR1 or from the SGI library). We +never use hash_set and unordered_set because they are generally very expensive +(each insertion requires a malloc) and very non-portable. + +std::multimap is useful if you want to map a key to multiple values, but has all +the drawbacks of std::map. A sorted vector or some other approach is almost +always better. + +.. _ds_bit: + +Bit storage containers (BitVector, SparseBitVector) +--------------------------------------------------- + +Unlike the other containers, there are only two bit storage containers, and +choosing when to use each is relatively straightforward. + +One additional option is ``std::vector<bool>``: we discourage its use for two +reasons 1) the implementation in many common compilers (e.g. commonly +available versions of GCC) is extremely inefficient and 2) the C++ standards +committee is likely to deprecate this container and/or change it significantly +somehow. In any case, please don't use it. + +.. _dss_bitvector: + +BitVector +^^^^^^^^^ + +The BitVector container provides a dynamic size set of bits for manipulation. +It supports individual bit setting/testing, as well as set operations. The set +operations take time O(size of bitvector), but operations are performed one word +at a time, instead of one bit at a time. This makes the BitVector very fast for +set operations compared to other containers. Use the BitVector when you expect +the number of set bits to be high (i.e. a dense set). + +.. 
_dss_smallbitvector: + +SmallBitVector +^^^^^^^^^^^^^^ + +The SmallBitVector container provides the same interface as BitVector, but it is +optimized for the case where only a small number of bits, less than 25 or so, +are needed. It also transparently supports larger bit counts, but slightly less +efficiently than a plain BitVector, so SmallBitVector should only be used when +larger counts are rare. + +At this time, SmallBitVector does not support set operations (and, or, xor), and +its operator[] does not provide an assignable lvalue. + +.. _dss_sparsebitvector: + +SparseBitVector +^^^^^^^^^^^^^^^ + +The SparseBitVector container is much like BitVector, with one major difference: +Only the bits that are set, are stored. This makes the SparseBitVector much +more space efficient than BitVector when the set is sparse, as well as making +set operations O(number of set bits) instead of O(size of universe). The +downside to the SparseBitVector is that setting and testing of random bits is +O(N), and on large SparseBitVectors, this can be slower than BitVector. In our +implementation, setting or testing bits in sorted order (either forwards or +reverse) is O(1) worst case. Testing and setting bits within 128 bits (depends +on size) of the current bit is also O(1). As a general statement, +testing/setting bits in a SparseBitVector is O(distance away from last set bit). + +.. _common: + +Helpful Hints for Common Operations +=================================== + +This section describes how to perform some very simple transformations of LLVM +code. This is meant to give examples of common idioms used, showing the +practical side of LLVM transformations. + +Because this is a "how-to" section, you should also read about the main classes +that you will be working with. The :ref:`Core LLVM Class Hierarchy Reference +<coreclasses>` contains details and descriptions of the main classes that you +should know about. + +.. _inspection: + +Basic Inspection and Traversal Routines +--------------------------------------- + +The LLVM compiler infrastructure have many different data structures that may be +traversed. Following the example of the C++ standard template library, the +techniques used to traverse these various data structures are all basically the +same. For a enumerable sequence of values, the ``XXXbegin()`` function (or +method) returns an iterator to the start of the sequence, the ``XXXend()`` +function returns an iterator pointing to one past the last valid element of the +sequence, and there is some ``XXXiterator`` data type that is common between the +two operations. + +Because the pattern for iteration is common across many different aspects of the +program representation, the standard template library algorithms may be used on +them, and it is easier to remember how to iterate. First we show a few common +examples of the data structures that need to be traversed. Other data +structures are traversed in very similar ways. + +.. _iterate_function: + +Iterating over the ``BasicBlock`` in a ``Function`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +It's quite common to have a ``Function`` instance that you'd like to transform +in some way; in particular, you'd like to manipulate its ``BasicBlock``\ s. To +facilitate this, you'll need to iterate over all of the ``BasicBlock``\ s that +constitute the ``Function``. The following is an example that prints the name +of a ``BasicBlock`` and the number of ``Instruction``\ s it contains: + +.. 
code-block:: c++ + + // func is a pointer to a Function instance + for (Function::iterator i = func->begin(), e = func->end(); i != e; ++i) + // Print out the name of the basic block if it has one, and then the + // number of instructions that it contains + errs() << "Basic block (name=" << i->getName() << ") has " + << i->size() << " instructions.\n"; + +Note that i can be used as if it were a pointer for the purposes of invoking +member functions of the ``Instruction`` class. This is because the indirection +operator is overloaded for the iterator classes. In the above code, the +expression ``i->size()`` is exactly equivalent to ``(*i).size()`` just like +you'd expect. + +.. _iterate_basicblock: + +Iterating over the ``Instruction`` in a ``BasicBlock`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Just like when dealing with ``BasicBlock``\ s in ``Function``\ s, it's easy to +iterate over the individual instructions that make up ``BasicBlock``\ s. Here's +a code snippet that prints out each instruction in a ``BasicBlock``: + +.. code-block:: c++ + + // blk is a pointer to a BasicBlock instance + for (BasicBlock::iterator i = blk->begin(), e = blk->end(); i != e; ++i) + // The next statement works since operator<<(ostream&,...) + // is overloaded for Instruction& + errs() << *i << "\n"; + + +However, this isn't really the best way to print out the contents of a +``BasicBlock``! Since the ostream operators are overloaded for virtually +anything you'll care about, you could have just invoked the print routine on the +basic block itself: ``errs() << *blk << "\n";``. + +.. _iterate_insiter: + +Iterating over the ``Instruction`` in a ``Function`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +If you're finding that you commonly iterate over a ``Function``'s +``BasicBlock``\ s and then that ``BasicBlock``'s ``Instruction``\ s, +``InstIterator`` should be used instead. You'll need to include +``llvm/Support/InstIterator.h`` (`doxygen +<http://llvm.org/doxygen/InstIterator_8h-source.html>`__) and then instantiate +``InstIterator``\ s explicitly in your code. Here's a small example that shows +how to dump all instructions in a function to the standard error stream: + +.. code-block:: c++ + + #include "llvm/Support/InstIterator.h" + + // F is a pointer to a Function instance + for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) + errs() << *I << "\n"; + +Easy, isn't it? You can also use ``InstIterator``\ s to fill a work list with +its initial contents. For example, if you wanted to initialize a work list to +contain all instructions in a ``Function`` F, all you would need to do is +something like: + +.. code-block:: c++ + + std::set<Instruction*> worklist; + // or better yet, SmallPtrSet<Instruction*, 64> worklist; + + for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) + worklist.insert(&*I); + +The STL set ``worklist`` would now contain all instructions in the ``Function`` +pointed to by F. + +.. _iterate_convert: + +Turning an iterator into a class pointer (and vice-versa) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Sometimes, it'll be useful to grab a reference (or pointer) to a class instance +when all you've got at hand is an iterator. Well, extracting a reference or a +pointer from an iterator is very straight-forward. Assuming that ``i`` is a +``BasicBlock::iterator`` and ``j`` is a ``BasicBlock::const_iterator``: + +.. 
code-block:: c++ + + Instruction& inst = *i; // Grab reference to instruction reference + Instruction* pinst = &*i; // Grab pointer to instruction reference + const Instruction& inst = *j; + +However, the iterators you'll be working with in the LLVM framework are special: +they will automatically convert to a ptr-to-instance type whenever they need to. +Instead of derferencing the iterator and then taking the address of the result, +you can simply assign the iterator to the proper pointer type and you get the +dereference and address-of operation as a result of the assignment (behind the +scenes, this is a result of overloading casting mechanisms). Thus the last line +of the last example, + +.. code-block:: c++ + + Instruction *pinst = &*i; + +is semantically equivalent to + +.. code-block:: c++ + + Instruction *pinst = i; + +It's also possible to turn a class pointer into the corresponding iterator, and +this is a constant time operation (very efficient). The following code snippet +illustrates use of the conversion constructors provided by LLVM iterators. By +using these, you can explicitly grab the iterator of something without actually +obtaining it via iteration over some structure: + +.. code-block:: c++ + + void printNextInstruction(Instruction* inst) { + BasicBlock::iterator it(inst); + ++it; // After this line, it refers to the instruction after *inst + if (it != inst->getParent()->end()) errs() << *it << "\n"; + } + +Unfortunately, these implicit conversions come at a cost; they prevent these +iterators from conforming to standard iterator conventions, and thus from being +usable with standard algorithms and containers. For example, they prevent the +following code, where ``B`` is a ``BasicBlock``, from compiling: + +.. code-block:: c++ + + llvm::SmallVector<llvm::Instruction *, 16>(B->begin(), B->end()); + +Because of this, these implicit conversions may be removed some day, and +``operator*`` changed to return a pointer instead of a reference. + +.. _iterate_complex: + +Finding call sites: a slightly more complex example +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Say that you're writing a FunctionPass and would like to count all the locations +in the entire module (that is, across every ``Function``) where a certain +function (i.e., some ``Function *``) is already in scope. As you'll learn +later, you may want to use an ``InstVisitor`` to accomplish this in a much more +straight-forward manner, but this example will allow us to explore how you'd do +it if you didn't have ``InstVisitor`` around. In pseudo-code, this is what we +want to do: + +.. code-block:: none + + initialize callCounter to zero + for each Function f in the Module + for each BasicBlock b in f + for each Instruction i in b + if (i is a CallInst and calls the given function) + increment callCounter + +And the actual code is (remember, because we're writing a ``FunctionPass``, our +``FunctionPass``-derived class simply has to override the ``runOnFunction`` +method): + +.. code-block:: c++ + + Function* targetFunc = ...; + + class OurFunctionPass : public FunctionPass { + public: + OurFunctionPass(): callCounter(0) { } + + virtual runOnFunction(Function& F) { + for (Function::iterator b = F.begin(), be = F.end(); b != be; ++b) { + for (BasicBlock::iterator i = b->begin(), ie = b->end(); i != ie; ++i) { + if (CallInst* callInst = dyn_cast<CallInst>(&*i)) { + // We know we've encountered a call instruction, so we + // need to determine if it's a call to the + // function pointed to by m_func or not. 
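+ // Note that getCalledFunction() returns a null pointer for indirect
+ // calls, so such call sites can never compare equal to targetFunc below.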
+ if (callInst->getCalledFunction() == targetFunc) + ++callCounter; + } + } + } + } + + private: + unsigned callCounter; + }; + +.. _calls_and_invokes: + +Treating calls and invokes the same way +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +You may have noticed that the previous example was a bit oversimplified in that +it did not deal with call sites generated by 'invoke' instructions. In this, +and in other situations, you may find that you want to treat ``CallInst``\ s and +``InvokeInst``\ s the same way, even though their most-specific common base +class is ``Instruction``, which includes lots of less closely-related things. +For these cases, LLVM provides a handy wrapper class called ``CallSite`` +(`doxygen <http://llvm.org/doxygen/classllvm_1_1CallSite.html>`__) It is +essentially a wrapper around an ``Instruction`` pointer, with some methods that +provide functionality common to ``CallInst``\ s and ``InvokeInst``\ s. + +This class has "value semantics": it should be passed by value, not by reference +and it should not be dynamically allocated or deallocated using ``operator new`` +or ``operator delete``. It is efficiently copyable, assignable and +constructable, with costs equivalents to that of a bare pointer. If you look at +its definition, it has only a single pointer member. + +.. _iterate_chains: + +Iterating over def-use & use-def chains +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Frequently, we might have an instance of the ``Value`` class (`doxygen +<http://llvm.org/doxygen/classllvm_1_1Value.html>`__) and we want to determine +which ``User`` s use the ``Value``. The list of all ``User``\ s of a particular +``Value`` is called a *def-use* chain. For example, let's say we have a +``Function*`` named ``F`` to a particular function ``foo``. Finding all of the +instructions that *use* ``foo`` is as simple as iterating over the *def-use* +chain of ``F``: + +.. code-block:: c++ + + Function *F = ...; + + for (Value::use_iterator i = F->use_begin(), e = F->use_end(); i != e; ++i) + if (Instruction *Inst = dyn_cast<Instruction>(*i)) { + errs() << "F is used in instruction:\n"; + errs() << *Inst << "\n"; + } + +Note that dereferencing a ``Value::use_iterator`` is not a very cheap operation. +Instead of performing ``*i`` above several times, consider doing it only once in +the loop body and reusing its result. + +Alternatively, it's common to have an instance of the ``User`` Class (`doxygen +<http://llvm.org/doxygen/classllvm_1_1User.html>`__) and need to know what +``Value``\ s are used by it. The list of all ``Value``\ s used by a ``User`` is +known as a *use-def* chain. Instances of class ``Instruction`` are common +``User`` s, so we might want to iterate over all of the values that a particular +instruction uses (that is, the operands of the particular ``Instruction``): + +.. code-block:: c++ + + Instruction *pi = ...; + + for (User::op_iterator i = pi->op_begin(), e = pi->op_end(); i != e; ++i) { + Value *v = *i; + // ... + } + +Declaring objects as ``const`` is an important tool of enforcing mutation free +algorithms (such as analyses, etc.). For this purpose above iterators come in +constant flavors as ``Value::const_use_iterator`` and +``Value::const_op_iterator``. They automatically arise when calling +``use/op_begin()`` on ``const Value*``\ s or ``const User*``\ s respectively. +Upon dereferencing, they return ``const Use*``\ s. Otherwise the above patterns +remain unchanged. + +.. 
_iterate_preds: + +Iterating over predecessors & successors of blocks +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Iterating over the predecessors and successors of a block is quite easy with the +routines defined in ``"llvm/Support/CFG.h"``. Just use code like this to +iterate over all predecessors of BB: + +.. code-block:: c++ + + #include "llvm/Support/CFG.h" + BasicBlock *BB = ...; + + for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) { + BasicBlock *Pred = *PI; + // ... + } + +Similarly, to iterate over successors use ``succ_iterator/succ_begin/succ_end``. + +.. _simplechanges: + +Making simple changes +--------------------- + +There are some primitive transformation operations present in the LLVM +infrastructure that are worth knowing about. When performing transformations, +it's fairly common to manipulate the contents of basic blocks. This section +describes some of the common methods for doing so and gives example code. + +.. _schanges_creating: + +Creating and inserting new ``Instruction``\ s +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +*Instantiating Instructions* + +Creation of ``Instruction``\ s is straight-forward: simply call the constructor +for the kind of instruction to instantiate and provide the necessary parameters. +For example, an ``AllocaInst`` only *requires* a (const-ptr-to) ``Type``. Thus: + +.. code-block:: c++ + + AllocaInst* ai = new AllocaInst(Type::Int32Ty); + +will create an ``AllocaInst`` instance that represents the allocation of one +integer in the current stack frame, at run time. Each ``Instruction`` subclass +is likely to have varying default parameters which change the semantics of the +instruction, so refer to the `doxygen documentation for the subclass of +Instruction <http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_ that +you're interested in instantiating. + +*Naming values* + +It is very useful to name the values of instructions when you're able to, as +this facilitates the debugging of your transformations. If you end up looking +at generated LLVM machine code, you definitely want to have logical names +associated with the results of instructions! By supplying a value for the +``Name`` (default) parameter of the ``Instruction`` constructor, you associate a +logical name with the result of the instruction's execution at run time. For +example, say that I'm writing a transformation that dynamically allocates space +for an integer on the stack, and that integer is going to be used as some kind +of index by some other code. To accomplish this, I place an ``AllocaInst`` at +the first point in the first ``BasicBlock`` of some ``Function``, and I'm +intending to use it within the same ``Function``. I might do: + +.. code-block:: c++ + + AllocaInst* pa = new AllocaInst(Type::Int32Ty, 0, "indexLoc"); + +where ``indexLoc`` is now the logical name of the instruction's execution value, +which is a pointer to an integer on the run time stack. + +*Inserting instructions* + +There are essentially two ways to insert an ``Instruction`` into an existing +sequence of instructions that form a ``BasicBlock``: + +* Insertion into an explicit instruction list + + Given a ``BasicBlock* pb``, an ``Instruction* pi`` within that ``BasicBlock``, + and a newly-created instruction we wish to insert before ``*pi``, we do the + following: + + .. 
code-block:: c++ + + BasicBlock *pb = ...; + Instruction *pi = ...; + Instruction *newInst = new Instruction(...); + + pb->getInstList().insert(pi, newInst); // Inserts newInst before pi in pb + + Appending to the end of a ``BasicBlock`` is so common that the ``Instruction`` + class and ``Instruction``-derived classes provide constructors which take a + pointer to a ``BasicBlock`` to be appended to. For example code that looked + like: + + .. code-block:: c++ + + BasicBlock *pb = ...; + Instruction *newInst = new Instruction(...); + + pb->getInstList().push_back(newInst); // Appends newInst to pb + + becomes: + + .. code-block:: c++ + + BasicBlock *pb = ...; + Instruction *newInst = new Instruction(..., pb); + + which is much cleaner, especially if you are creating long instruction + streams. + +* Insertion into an implicit instruction list + + ``Instruction`` instances that are already in ``BasicBlock``\ s are implicitly + associated with an existing instruction list: the instruction list of the + enclosing basic block. Thus, we could have accomplished the same thing as the + above code without being given a ``BasicBlock`` by doing: + + .. code-block:: c++ + + Instruction *pi = ...; + Instruction *newInst = new Instruction(...); + + pi->getParent()->getInstList().insert(pi, newInst); + + In fact, this sequence of steps occurs so frequently that the ``Instruction`` + class and ``Instruction``-derived classes provide constructors which take (as + a default parameter) a pointer to an ``Instruction`` which the newly-created + ``Instruction`` should precede. That is, ``Instruction`` constructors are + capable of inserting the newly-created instance into the ``BasicBlock`` of a + provided instruction, immediately before that instruction. Using an + ``Instruction`` constructor with a ``insertBefore`` (default) parameter, the + above code becomes: + + .. code-block:: c++ + + Instruction* pi = ...; + Instruction* newInst = new Instruction(..., pi); + + which is much cleaner, especially if you're creating a lot of instructions and + adding them to ``BasicBlock``\ s. + +.. _schanges_deleting: + +Deleting Instructions +^^^^^^^^^^^^^^^^^^^^^ + +Deleting an instruction from an existing sequence of instructions that form a +BasicBlock_ is very straight-forward: just call the instruction's +``eraseFromParent()`` method. For example: + +.. code-block:: c++ + + Instruction *I = .. ; + I->eraseFromParent(); + +This unlinks the instruction from its containing basic block and deletes it. If +you'd just like to unlink the instruction from its containing basic block but +not delete it, you can use the ``removeFromParent()`` method. + +.. _schanges_replacing: + +Replacing an Instruction with another Value +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Replacing individual instructions +""""""""""""""""""""""""""""""""" + +Including "`llvm/Transforms/Utils/BasicBlockUtils.h +<http://llvm.org/doxygen/BasicBlockUtils_8h-source.html>`_" permits use of two +very useful replace functions: ``ReplaceInstWithValue`` and +``ReplaceInstWithInst``. + +.. _schanges_deleting_sub: + +Deleting Instructions +""""""""""""""""""""" + +* ``ReplaceInstWithValue`` + + This function replaces all uses of a given instruction with a value, and then + removes the original instruction. The following example illustrates the + replacement of the result of a particular ``AllocaInst`` that allocates memory + for a single integer with a null pointer to an integer. + + .. 
code-block:: c++ + + AllocaInst* instToReplace = ...; + BasicBlock::iterator ii(instToReplace); + + ReplaceInstWithValue(instToReplace->getParent()->getInstList(), ii, + Constant::getNullValue(PointerType::getUnqual(Type::Int32Ty))); + +* ``ReplaceInstWithInst`` + + This function replaces a particular instruction with another instruction, + inserting the new instruction into the basic block at the location where the + old instruction was, and replacing any uses of the old instruction with the + new instruction. The following example illustrates the replacement of one + ``AllocaInst`` with another. + + .. code-block:: c++ + + AllocaInst* instToReplace = ...; + BasicBlock::iterator ii(instToReplace); + + ReplaceInstWithInst(instToReplace->getParent()->getInstList(), ii, + new AllocaInst(Type::Int32Ty, 0, "ptrToReplacedInt")); + + +Replacing multiple uses of Users and Values +""""""""""""""""""""""""""""""""""""""""""" + +You can use ``Value::replaceAllUsesWith`` and ``User::replaceUsesOfWith`` to +change more than one use at a time. See the doxygen documentation for the +`Value Class <http://llvm.org/doxygen/classllvm_1_1Value.html>`_ and `User Class +<http://llvm.org/doxygen/classllvm_1_1User.html>`_, respectively, for more +information. + +.. _schanges_deletingGV: + +Deleting GlobalVariables +^^^^^^^^^^^^^^^^^^^^^^^^ + +Deleting a global variable from a module is just as easy as deleting an +Instruction. First, you must have a pointer to the global variable that you +wish to delete. You use this pointer to erase it from its parent, the module. +For example: + +.. code-block:: c++ + + GlobalVariable *GV = .. ; + + GV->eraseFromParent(); + + +.. _create_types: + +How to Create Types +------------------- + +In generating IR, you may need some complex types. If you know these types +statically, you can use ``TypeBuilder<...>::get()``, defined in +``llvm/Support/TypeBuilder.h``, to retrieve them. ``TypeBuilder`` has two forms +depending on whether you're building types for cross-compilation or native +library use. ``TypeBuilder<T, true>`` requires that ``T`` be independent of the +host environment, meaning that it's built out of types from the ``llvm::types`` +(`doxygen <http://llvm.org/doxygen/namespacellvm_1_1types.html>`__) namespace +and pointers, functions, arrays, etc. built of those. ``TypeBuilder<T, false>`` +additionally allows native C types whose size may depend on the host compiler. +For example, + +.. code-block:: c++ + + FunctionType *ft = TypeBuilder<types::i<8>(types::i<32>*), true>::get(); + +is easier to read and write than the equivalent + +.. code-block:: c++ + + std::vector<const Type*> params; + params.push_back(PointerType::getUnqual(Type::Int32Ty)); + FunctionType *ft = FunctionType::get(Type::Int8Ty, params, false); + +See the `class comment +<http://llvm.org/doxygen/TypeBuilder_8h-source.html#l00001>`_ for more details. + +.. _threading: + +Threads and LLVM +================ + +This section describes the interaction of the LLVM APIs with multithreading, +both on the part of client applications, and in the JIT, in the hosted +application. + +Note that LLVM's support for multithreading is still relatively young. Up +through version 2.5, the execution of threaded hosted applications was +supported, but not threaded client access to the APIs. While this use case is +now supported, clients *must* adhere to the guidelines specified below to ensure +proper operation in multithreaded mode. 
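+
+Putting these pieces together, a client that wants to issue concurrent LLVM API
+calls might be structured roughly as follows (a sketch only, using the calls
+described in the sections below; thread creation and error handling are
+elided):
+
+.. code-block:: c++
+
+  #include "llvm/Support/ManagedStatic.h"  // llvm_shutdown()
+  #include "llvm/Support/Threading.h"      // llvm_start_multithreaded(), etc.
+  using namespace llvm;
+
+  int main() {
+    // Must run in isolation, before any concurrent LLVM API calls are made.
+    if (!llvm_start_multithreaded())
+      return 1;  // built without multithreading support; concurrent calls unsafe
+
+    // ... spawn worker threads that use the LLVM APIs, then join them ...
+
+    // Tears down the guard structures; this implies llvm_stop_multithreaded().
+    llvm_shutdown();
+    return 0;
+  }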
+
+Note that, on Unix-like platforms, LLVM requires the presence of GCC's atomic
+intrinsics in order to support threaded operation. If you need a
+multithreading-capable LLVM on a platform without a suitably modern system
+compiler, consider compiling LLVM and LLVM-GCC in single-threaded mode, and
+using the resultant compiler to build a copy of LLVM with multithreading
+support.
+
+.. _startmultithreaded:
+
+Entering and Exiting Multithreaded Mode
+---------------------------------------
+
+In order to properly protect its internal data structures while avoiding
+excessive locking overhead in the single-threaded case, LLVM must initialize
+certain data structures necessary to provide guards around its internals. To do
+so, the client program must invoke ``llvm_start_multithreaded()`` before making
+any concurrent LLVM API calls. To subsequently tear down these structures, use
+the ``llvm_stop_multithreaded()`` call. You can also use the
+``llvm_is_multithreaded()`` call to check the status of multithreaded mode.
+
+Note that both of these calls must be made *in isolation*. That is to say that
+no other LLVM API calls may be executing at any time during the execution of
+``llvm_start_multithreaded()`` or ``llvm_stop_multithreaded()``. It is the
+client's responsibility to enforce this isolation.
+
+The return value of ``llvm_start_multithreaded()`` indicates the success or
+failure of the initialization. Failure typically indicates that your copy of
+LLVM was built without multithreading support, usually because GCC atomic
+intrinsics were not found in your system compiler. In this case, the LLVM API
+will not be safe for concurrent calls. However, it *will* be safe for hosting
+threaded applications in the JIT, though :ref:`care must be taken
+<jitthreading>` to ensure that side exits and the like do not accidentally
+result in concurrent LLVM API calls.
+
+.. _shutdown:
+
+Ending Execution with ``llvm_shutdown()``
+-----------------------------------------
+
+When you are done using the LLVM APIs, you should call ``llvm_shutdown()`` to
+deallocate memory used for internal structures. This will also invoke
+``llvm_stop_multithreaded()`` if LLVM is operating in multithreaded mode. As
+such, ``llvm_shutdown()`` requires the same isolation guarantees as
+``llvm_stop_multithreaded()``.
+
+Note that, if you use scope-based shutdown, you can use the
+``llvm_shutdown_obj`` class, which calls ``llvm_shutdown()`` in its destructor.
+
+.. _managedstatic:
+
+Lazy Initialization with ``ManagedStatic``
+------------------------------------------
+
+``ManagedStatic`` is a utility class in LLVM used to implement static
+initialization of static resources, such as the global type tables. Before the
+invocation of ``llvm_shutdown()``, it implements a simple lazy initialization
+scheme. Once ``llvm_start_multithreaded()`` returns, however, it uses
+double-checked locking to implement thread-safe lazy initialization.
+
+Note that, because no other threads are allowed to issue LLVM API calls before
+``llvm_start_multithreaded()`` returns, it is possible to have
+``ManagedStatic``\ s of ``llvm::sys::Mutex``\ s.
+
+The ``llvm_acquire_global_lock()`` and ``llvm_release_global_lock()`` APIs
+provide access to the global lock used to implement the double-checked locking
+for lazy initialization. These should only be used internally to LLVM, and only
+if you know what you're doing!
+
+.. _llvmcontext:
+
+Achieving Isolation with ``LLVMContext``
+----------------------------------------
+
+``LLVMContext`` is an opaque class in the LLVM API which clients can use to
+operate multiple, isolated instances of LLVM concurrently within the same
+address space. For instance, in a hypothetical compile-server, the compilation
+of an individual translation unit is conceptually independent from all the
+others, and it would be desirable to be able to compile incoming translation
+units concurrently on independent server threads. Fortunately, ``LLVMContext``
+exists to enable just this kind of scenario!
+
+Conceptually, ``LLVMContext`` provides isolation. Every LLVM entity
+(``Module``\ s, ``Value``\ s, ``Type``\ s, ``Constant``\ s, etc.) in LLVM's
+in-memory IR belongs to an ``LLVMContext``. Entities in different contexts
+*cannot* interact with each other: ``Module``\ s in different contexts cannot be
+linked together, ``Function``\ s cannot be added to ``Module``\ s in different
+contexts, etc. What this means is that it is safe to compile on multiple
+threads simultaneously, as long as no two threads operate on entities within the
+same context.
+
+In practice, very few places in the API require the explicit specification of an
+``LLVMContext``, other than the ``Type`` creation/lookup APIs. Because every
+``Type`` carries a reference to its owning context, most other entities can
+determine what context they belong to by looking at their own ``Type``. If you
+are adding new entities to LLVM IR, please try to maintain this interface
+design.
+
+For clients that do *not* require the benefits of isolation, LLVM provides a
+convenience API ``getGlobalContext()``. This returns a global, lazily
+initialized ``LLVMContext`` that may be used in situations where isolation is
+not a concern.
+
+.. _jitthreading:
+
+Threads and the JIT
+-------------------
+
+LLVM's "eager" JIT compiler is safe to use in threaded programs. Multiple
+threads can call ``ExecutionEngine::getPointerToFunction()`` or
+``ExecutionEngine::runFunction()`` concurrently, and multiple threads can run
+code output by the JIT concurrently. The user must still ensure that only one
+thread accesses IR in a given ``LLVMContext`` while another thread might be
+modifying it. One way to do that is to always hold the JIT lock while accessing
+IR outside the JIT (the JIT *modifies* the IR by adding ``CallbackVH``\ s).
+Another way is to only call ``getPointerToFunction()`` from the
+``LLVMContext``'s thread.
+
+When the JIT is configured to compile lazily (using
+``ExecutionEngine::DisableLazyCompilation(false)``), there is currently a `race
+condition <http://llvm.org/bugs/show_bug.cgi?id=5184>`_ in updating call sites
+after a function is lazily-jitted. It's still possible to use the lazy JIT in a
+threaded program if you ensure that only one thread at a time can call any
+particular lazy stub and that the JIT lock guards any IR access, but we suggest
+using only the eager JIT in threaded programs.
+
+.. _advanced:
+
+Advanced Topics
+===============
+
+This section describes some of the advanced or obscure APIs that most clients do
+not need to be aware of. These APIs tend to manage the inner workings of the
+LLVM system, and only need to be accessed in unusual circumstances.
+
+.. 
_SymbolTable: + +The ``ValueSymbolTable`` class +------------------------------ + +The ``ValueSymbolTable`` (`doxygen +<http://llvm.org/doxygen/classllvm_1_1ValueSymbolTable.html>`__) class provides +a symbol table that the :ref:`Function <c_Function>` and Module_ classes use for +naming value definitions. The symbol table can provide a name for any Value_. + +Note that the ``SymbolTable`` class should not be directly accessed by most +clients. It should only be used when iteration over the symbol table names +themselves are required, which is very special purpose. Note that not all LLVM +Value_\ s have names, and those without names (i.e. they have an empty name) do +not exist in the symbol table. + +Symbol tables support iteration over the values in the symbol table with +``begin/end/iterator`` and supports querying to see if a specific name is in the +symbol table (with ``lookup``). The ``ValueSymbolTable`` class exposes no +public mutator methods, instead, simply call ``setName`` on a value, which will +autoinsert it into the appropriate symbol table. + +.. _UserLayout: + +The ``User`` and owned ``Use`` classes' memory layout +----------------------------------------------------- + +The ``User`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1User.html>`__) +class provides a basis for expressing the ownership of ``User`` towards other +`Value instance <http://llvm.org/doxygen/classllvm_1_1Value.html>`_\ s. The +``Use`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1Use.html>`__) helper +class is employed to do the bookkeeping and to facilitate *O(1)* addition and +removal. + +.. _Use2User: + +Interaction and relationship between ``User`` and ``Use`` objects +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +A subclass of ``User`` can choose between incorporating its ``Use`` objects or +refer to them out-of-line by means of a pointer. A mixed variant (some ``Use`` +s inline others hung off) is impractical and breaks the invariant that the +``Use`` objects belonging to the same ``User`` form a contiguous array. + +We have 2 different layouts in the ``User`` (sub)classes: + +* Layout a) + + The ``Use`` object(s) are inside (resp. at fixed offset) of the ``User`` + object and there are a fixed number of them. + +* Layout b) + + The ``Use`` object(s) are referenced by a pointer to an array from the + ``User`` object and there may be a variable number of them. + +As of v2.4 each layout still possesses a direct pointer to the start of the +array of ``Use``\ s. Though not mandatory for layout a), we stick to this +redundancy for the sake of simplicity. The ``User`` object also stores the +number of ``Use`` objects it has. (Theoretically this information can also be +calculated given the scheme presented below.) + +Special forms of allocation operators (``operator new``) enforce the following +memory layouts: + +* Layout a) is modelled by prepending the ``User`` object by the ``Use[]`` + array. + + .. code-block:: none + + ...---.---.---.---.-------... + | P | P | P | P | User + '''---'---'---'---'-------''' + +* Layout b) is modelled by pointing at the ``Use[]`` array. + + .. code-block:: none + + .-------... + | User + '-------''' + | + v + .---.---.---.---... + | P | P | P | P | + '---'---'---'---''' + +*(In the above figures* '``P``' *stands for the* ``Use**`` *that is stored in +each* ``Use`` *object in the member* ``Use::Prev`` *)* + +.. 
_Waymarking: + +The waymarking algorithm +^^^^^^^^^^^^^^^^^^^^^^^^ + +Since the ``Use`` objects are deprived of the direct (back)pointer to their +``User`` objects, there must be a fast and exact method to recover it. This is +accomplished by the following scheme: + +A bit-encoding in the 2 LSBits (least significant bits) of the ``Use::Prev`` +allows to find the start of the ``User`` object: + +* ``00`` –> binary digit 0 + +* ``01`` –> binary digit 1 + +* ``10`` –> stop and calculate (``s``) + +* ``11`` –> full stop (``S``) + +Given a ``Use*``, all we have to do is to walk till we get a stop and we either +have a ``User`` immediately behind or we have to walk to the next stop picking +up digits and calculating the offset: + +.. code-block:: none + + .---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---------------- + | 1 | s | 1 | 0 | 1 | 0 | s | 1 | 1 | 0 | s | 1 | 1 | s | 1 | S | User (or User*) + '---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---------------- + |+15 |+10 |+6 |+3 |+1 + | | | | | __> + | | | | __________> + | | | ______________________> + | | ______________________________________> + | __________________________________________________________> + +Only the significant number of bits need to be stored between the stops, so that +the *worst case is 20 memory accesses* when there are 1000 ``Use`` objects +associated with a ``User``. + +.. _ReferenceImpl: + +Reference implementation +^^^^^^^^^^^^^^^^^^^^^^^^ + +The following literate Haskell fragment demonstrates the concept: + +.. code-block:: haskell + + > import Test.QuickCheck + > + > digits :: Int -> [Char] -> [Char] + > digits 0 acc = '0' : acc + > digits 1 acc = '1' : acc + > digits n acc = digits (n `div` 2) $ digits (n `mod` 2) acc + > + > dist :: Int -> [Char] -> [Char] + > dist 0 [] = ['S'] + > dist 0 acc = acc + > dist 1 acc = let r = dist 0 acc in 's' : digits (length r) r + > dist n acc = dist (n - 1) $ dist 1 acc + > + > takeLast n ss = reverse $ take n $ reverse ss + > + > test = takeLast 40 $ dist 20 [] + > + +Printing <test> gives: ``"1s100000s11010s10100s1111s1010s110s11s1S"`` + +The reverse algorithm computes the length of the string just by examining a +certain prefix: + +.. code-block:: haskell + + > pref :: [Char] -> Int + > pref "S" = 1 + > pref ('s':'1':rest) = decode 2 1 rest + > pref (_:rest) = 1 + pref rest + > + > decode walk acc ('0':rest) = decode (walk + 1) (acc * 2) rest + > decode walk acc ('1':rest) = decode (walk + 1) (acc * 2 + 1) rest + > decode walk acc _ = walk + acc + > + +Now, as expected, printing <pref test> gives ``40``. + +We can *quickCheck* this with following property: + +.. code-block:: haskell + + > testcase = dist 2000 [] + > testcaseLength = length testcase + > + > identityProp n = n > 0 && n <= testcaseLength ==> length arr == pref arr + > where arr = takeLast n testcase + > + +As expected <quickCheck identityProp> gives: + +:: + + *Main> quickCheck identityProp + OK, passed 100 tests. + +Let's be a bit more exhaustive: + +.. code-block:: haskell + + > + > deepCheck p = check (defaultConfig { configMaxTest = 500 }) p + > + +And here is the result of <deepCheck identityProp>: + +:: + + *Main> deepCheck identityProp + OK, passed 500 tests. + +.. _Tagging: + +Tagging considerations +^^^^^^^^^^^^^^^^^^^^^^ + +To maintain the invariant that the 2 LSBits of each ``Use**`` in ``Use`` never +change after being set up, setters of ``Use::Prev`` must re-tag the new +``Use**`` on every modification. Accordingly getters must strip the tag bits. 
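+
+As a rough illustration of the idea (this is not LLVM's actual implementation;
+the helper names are invented for this sketch), the re-tagging setters and
+stripping getters might look like:
+
+.. code-block:: c++
+
+  #include <cstdint>
+
+  class Use;  // stand-in for llvm::Use
+
+  // The four waymarking tags stored in the 2 LSBits of Use::Prev.
+  enum PrevPtrTag { zeroDigitTag = 0, oneDigitTag = 1, stopTag = 2,
+                    fullStopTag = 3 };
+
+  // A setter re-applies the tag after storing a new pointer...
+  static Use **addTag(Use **P, PrevPtrTag Tag) {
+    return reinterpret_cast<Use **>(reinterpret_cast<std::uintptr_t>(P) | Tag);
+  }
+
+  // ...and a getter strips the tag bits before the pointer is dereferenced.
+  static Use **stripTag(Use **P) {
+    return reinterpret_cast<Use **>(reinterpret_cast<std::uintptr_t>(P) &
+                                    ~std::uintptr_t(3));
+  }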
+ +For layout b) instead of the ``User`` we find a pointer (``User*`` with LSBit +set). Following this pointer brings us to the ``User``. A portable trick +ensures that the first bytes of ``User`` (if interpreted as a pointer) never has +the LSBit set. (Portability is relying on the fact that all known compilers +place the ``vptr`` in the first word of the instances.) + +.. _coreclasses: + +The Core LLVM Class Hierarchy Reference +======================================= + +``#include "llvm/Type.h"`` + +header source: `Type.h <http://llvm.org/doxygen/Type_8h-source.html>`_ + +doxygen info: `Type Clases <http://llvm.org/doxygen/classllvm_1_1Type.html>`_ + +The Core LLVM classes are the primary means of representing the program being +inspected or transformed. The core LLVM classes are defined in header files in +the ``include/llvm/`` directory, and implemented in the ``lib/VMCore`` +directory. + +.. _Type: + +The Type class and Derived Types +-------------------------------- + +``Type`` is a superclass of all type classes. Every ``Value`` has a ``Type``. +``Type`` cannot be instantiated directly but only through its subclasses. +Certain primitive types (``VoidType``, ``LabelType``, ``FloatType`` and +``DoubleType``) have hidden subclasses. They are hidden because they offer no +useful functionality beyond what the ``Type`` class offers except to distinguish +themselves from other subclasses of ``Type``. + +All other types are subclasses of ``DerivedType``. Types can be named, but this +is not a requirement. There exists exactly one instance of a given shape at any +one time. This allows type equality to be performed with address equality of +the Type Instance. That is, given two ``Type*`` values, the types are identical +if the pointers are identical. + +.. _m_Type: + +Important Public Methods +^^^^^^^^^^^^^^^^^^^^^^^^ + +* ``bool isIntegerTy() const``: Returns true for any integer type. + +* ``bool isFloatingPointTy()``: Return true if this is one of the five + floating point types. + +* ``bool isSized()``: Return true if the type has known size. Things + that don't have a size are abstract types, labels and void. + +.. _derivedtypes: + +Important Derived Types +^^^^^^^^^^^^^^^^^^^^^^^ + +``IntegerType`` + Subclass of DerivedType that represents integer types of any bit width. Any + bit width between ``IntegerType::MIN_INT_BITS`` (1) and + ``IntegerType::MAX_INT_BITS`` (~8 million) can be represented. + + * ``static const IntegerType* get(unsigned NumBits)``: get an integer + type of a specific bit width. + + * ``unsigned getBitWidth() const``: Get the bit width of an integer type. + +``SequentialType`` + This is subclassed by ArrayType, PointerType and VectorType. + + * ``const Type * getElementType() const``: Returns the type of each + of the elements in the sequential type. + +``ArrayType`` + This is a subclass of SequentialType and defines the interface for array + types. + + * ``unsigned getNumElements() const``: Returns the number of elements + in the array. + +``PointerType`` + Subclass of SequentialType for pointer types. + +``VectorType`` + Subclass of SequentialType for vector types. A vector type is similar to an + ArrayType but is distinguished because it is a first class type whereas + ArrayType is not. Vector types are used for vector operations and are usually + small vectors of of an integer or floating point type. + +``StructType`` + Subclass of DerivedTypes for struct types. + +.. _FunctionType: + +``FunctionType`` + Subclass of DerivedTypes for function types. 
+ + * ``bool isVarArg() const``: Returns true if it's a vararg function. + + * ``const Type * getReturnType() const``: Returns the return type of the + function. + + * ``const Type * getParamType (unsigned i)``: Returns the type of the ith + parameter. + + * ``const unsigned getNumParams() const``: Returns the number of formal + parameters. + +.. _Module: + +The ``Module`` class +-------------------- + +``#include "llvm/Module.h"`` + +header source: `Module.h <http://llvm.org/doxygen/Module_8h-source.html>`_ + +doxygen info: `Module Class <http://llvm.org/doxygen/classllvm_1_1Module.html>`_ + +The ``Module`` class represents the top level structure present in LLVM +programs. An LLVM module is effectively either a translation unit of the +original program or a combination of several translation units merged by the +linker. The ``Module`` class keeps track of a list of :ref:`Function +<c_Function>`\ s, a list of GlobalVariable_\ s, and a SymbolTable_. +Additionally, it contains a few helpful member functions that try to make common +operations easy. + +.. _m_Module: + +Important Public Members of the ``Module`` class +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* ``Module::Module(std::string name = "")`` + + Constructing a Module_ is easy. You can optionally provide a name for it + (probably based on the name of the translation unit). + +* | ``Module::iterator`` - Typedef for function list iterator + | ``Module::const_iterator`` - Typedef for const_iterator. + | ``begin()``, ``end()``, ``size()``, ``empty()`` + + These are forwarding methods that make it easy to access the contents of a + ``Module`` object's :ref:`Function <c_Function>` list. + +* ``Module::FunctionListType &getFunctionList()`` + + Returns the list of :ref:`Function <c_Function>`\ s. This is necessary to use + when you need to update the list or perform a complex action that doesn't have + a forwarding method. + +---------------- + +* | ``Module::global_iterator`` - Typedef for global variable list iterator + | ``Module::const_global_iterator`` - Typedef for const_iterator. + | ``global_begin()``, ``global_end()``, ``global_size()``, ``global_empty()`` + + These are forwarding methods that make it easy to access the contents of a + ``Module`` object's GlobalVariable_ list. + +* ``Module::GlobalListType &getGlobalList()`` + + Returns the list of GlobalVariable_\ s. This is necessary to use when you + need to update the list or perform a complex action that doesn't have a + forwarding method. + +---------------- + +* ``SymbolTable *getSymbolTable()`` + + Return a reference to the SymbolTable_ for this ``Module``. + +---------------- + +* ``Function *getFunction(StringRef Name) const`` + + Look up the specified function in the ``Module`` SymbolTable_. If it does not + exist, return ``null``. + +* ``Function *getOrInsertFunction(const std::string &Name, const FunctionType + *T)`` + + Look up the specified function in the ``Module`` SymbolTable_. If it does not + exist, add an external declaration for the function and return it. + +* ``std::string getTypeName(const Type *Ty)`` + + If there is at least one entry in the SymbolTable_ for the specified Type_, + return it. Otherwise return the empty string. + +* ``bool addTypeName(const std::string &Name, const Type *Ty)`` + + Insert an entry in the SymbolTable_ mapping ``Name`` to ``Ty``. If there is + already an entry for this name, true is returned and the SymbolTable_ is not + modified. + +.. 
_Value: + +The ``Value`` class +------------------- + +``#include "llvm/Value.h"`` + +header source: `Value.h <http://llvm.org/doxygen/Value_8h-source.html>`_ + +doxygen info: `Value Class <http://llvm.org/doxygen/classllvm_1_1Value.html>`_ + +The ``Value`` class is the most important class in the LLVM Source base. It +represents a typed value that may be used (among other things) as an operand to +an instruction. There are many different types of ``Value``\ s, such as +Constant_\ s, Argument_\ s. Even Instruction_\ s and :ref:`Function +<c_Function>`\ s are ``Value``\ s. + +A particular ``Value`` may be used many times in the LLVM representation for a +program. For example, an incoming argument to a function (represented with an +instance of the Argument_ class) is "used" by every instruction in the function +that references the argument. To keep track of this relationship, the ``Value`` +class keeps a list of all of the ``User``\ s that is using it (the User_ class +is a base class for all nodes in the LLVM graph that can refer to ``Value``\ s). +This use list is how LLVM represents def-use information in the program, and is +accessible through the ``use_*`` methods, shown below. + +Because LLVM is a typed representation, every LLVM ``Value`` is typed, and this +Type_ is available through the ``getType()`` method. In addition, all LLVM +values can be named. The "name" of the ``Value`` is a symbolic string printed +in the LLVM code: + +.. code-block:: llvm + + %foo = add i32 1, 2 + +.. _nameWarning: + +The name of this instruction is "foo". **NOTE** that the name of any value may +be missing (an empty string), so names should **ONLY** be used for debugging +(making the source code easier to read, debugging printouts), they should not be +used to keep track of values or map between them. For this purpose, use a +``std::map`` of pointers to the ``Value`` itself instead. + +One important aspect of LLVM is that there is no distinction between an SSA +variable and the operation that produces it. Because of this, any reference to +the value produced by an instruction (or the value available as an incoming +argument, for example) is represented as a direct pointer to the instance of the +class that represents this value. Although this may take some getting used to, +it simplifies the representation and makes it easier to manipulate. + +.. _m_Value: + +Important Public Members of the ``Value`` class +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* | ``Value::use_iterator`` - Typedef for iterator over the use-list + | ``Value::const_use_iterator`` - Typedef for const_iterator over the + use-list + | ``unsigned use_size()`` - Returns the number of users of the value. + | ``bool use_empty()`` - Returns true if there are no users. + | ``use_iterator use_begin()`` - Get an iterator to the start of the + use-list. + | ``use_iterator use_end()`` - Get an iterator to the end of the use-list. + | ``User *use_back()`` - Returns the last element in the list. + + These methods are the interface to access the def-use information in LLVM. + As with all other iterators in LLVM, the naming conventions follow the + conventions defined by the STL_. + +* ``Type *getType() const`` + This method returns the Type of the Value. + +* | ``bool hasName() const`` + | ``std::string getName() const`` + | ``void setName(const std::string &Name)`` + + This family of methods is used to access and assign a name to a ``Value``, be + aware of the :ref:`precaution above <nameWarning>`. 
+ +* ``void replaceAllUsesWith(Value *V)`` + + This method traverses the use list of a ``Value`` changing all User_\ s of the + current value to refer to "``V``" instead. For example, if you detect that an + instruction always produces a constant value (for example through constant + folding), you can replace all uses of the instruction with the constant like + this: + + .. code-block:: c++ + + Inst->replaceAllUsesWith(ConstVal); + +.. _User: + +The ``User`` class +------------------ + +``#include "llvm/User.h"`` + +header source: `User.h <http://llvm.org/doxygen/User_8h-source.html>`_ + +doxygen info: `User Class <http://llvm.org/doxygen/classllvm_1_1User.html>`_ + +Superclass: Value_ + +The ``User`` class is the common base class of all LLVM nodes that may refer to +``Value``\ s. It exposes a list of "Operands" that are all of the ``Value``\ s +that the User is referring to. The ``User`` class itself is a subclass of +``Value``. + +The operands of a ``User`` point directly to the LLVM ``Value`` that it refers +to. Because LLVM uses Static Single Assignment (SSA) form, there can only be +one definition referred to, allowing this direct connection. This connection +provides the use-def information in LLVM. + +.. _m_User: + +Important Public Members of the ``User`` class +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``User`` class exposes the operand list in two ways: through an index access +interface and through an iterator based interface. + +* | ``Value *getOperand(unsigned i)`` + | ``unsigned getNumOperands()`` + + These two methods expose the operands of the ``User`` in a convenient form for + direct access. + +* | ``User::op_iterator`` - Typedef for iterator over the operand list + | ``op_iterator op_begin()`` - Get an iterator to the start of the operand + list. + | ``op_iterator op_end()`` - Get an iterator to the end of the operand list. + + Together, these methods make up the iterator based interface to the operands + of a ``User``. + + +.. _Instruction: + +The ``Instruction`` class +------------------------- + +``#include "llvm/Instruction.h"`` + +header source: `Instruction.h +<http://llvm.org/doxygen/Instruction_8h-source.html>`_ + +doxygen info: `Instruction Class +<http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_ + +Superclasses: User_, Value_ + +The ``Instruction`` class is the common base class for all LLVM instructions. +It provides only a few methods, but is a very commonly used class. The primary +data tracked by the ``Instruction`` class itself is the opcode (instruction +type) and the parent BasicBlock_ the ``Instruction`` is embedded into. To +represent a specific type of instruction, one of many subclasses of +``Instruction`` are used. + +Because the ``Instruction`` class subclasses the User_ class, its operands can +be accessed in the same way as for other ``User``\ s (with the +``getOperand()``/``getNumOperands()`` and ``op_begin()``/``op_end()`` methods). +An important file for the ``Instruction`` class is the ``llvm/Instruction.def`` +file. This file contains some meta-data about the various different types of +instructions in LLVM. It describes the enum values that are used as opcodes +(for example ``Instruction::Add`` and ``Instruction::ICmp``), as well as the +concrete sub-classes of ``Instruction`` that implement the instruction (for +example BinaryOperator_ and CmpInst_). 
Unfortunately, the use of macros in this
+file confuses doxygen, so these enum values don't show up correctly in the
+`doxygen output <http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_.
+
+.. _s_Instruction:
+
+Important Subclasses of the ``Instruction`` class
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. _BinaryOperator:
+
+* ``BinaryOperator``
+
+  This subclass represents all two-operand instructions whose operands must be
+  the same type, except for the comparison instructions.
+
+.. _CastInst:
+
+* ``CastInst``
+
+  This subclass is the parent of the 12 casting instructions. It provides
+  common operations on cast instructions.
+
+.. _CmpInst:
+
+* ``CmpInst``
+
+  This subclass represents the two comparison instructions,
+  `ICmpInst <LangRef.html#i_icmp>`_ (integer operands), and
+  `FCmpInst <LangRef.html#i_fcmp>`_ (floating point operands).
+
+.. _TerminatorInst:
+
+* ``TerminatorInst``
+
+  This subclass is the parent of all terminator instructions (those which can
+  terminate a block).
+
+.. _m_Instruction:
+
+Important Public Members of the ``Instruction`` class
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* ``BasicBlock *getParent()``
+
+  Returns the BasicBlock_ that this
+  ``Instruction`` is embedded into.
+
+* ``bool mayWriteToMemory()``
+
+  Returns true if the instruction writes to memory, i.e. it is a ``call``,
+  ``free``, ``invoke``, or ``store``.
+
+* ``unsigned getOpcode()``
+
+  Returns the opcode for the ``Instruction``.
+
+* ``Instruction *clone() const``
+
+  Returns another instance of the specified instruction, identical in all ways
+  to the original except that the instruction has no parent (i.e. it's not
+  embedded into a BasicBlock_), and it has no name.
+
+.. _Constant:
+
+The ``Constant`` class and subclasses
+-------------------------------------
+
+Constant represents a base class for different types of constants. It is
+subclassed by ConstantInt, ConstantArray, etc. for representing the various
+types of Constants. GlobalValue_ is also a subclass, which represents the
+address of a global variable or function.
+
+.. _s_Constant:
+
+Important Subclasses of Constant
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* ConstantInt : This subclass of Constant represents an integer constant of
+  any width.
+
+  * ``const APInt& getValue() const``: Returns the underlying
+    value of this constant, an APInt value.
+
+  * ``int64_t getSExtValue() const``: Converts the underlying APInt value to an
+    int64_t via sign extension. If the value (not the bit width) of the APInt
+    is too large to fit in an int64_t, an assertion will result. For this
+    reason, use of this method is discouraged.
+
+  * ``uint64_t getZExtValue() const``: Converts the underlying APInt value
+    to a uint64_t via zero extension. If the value (not the bit width) of the
+    APInt is too large to fit in a uint64_t, an assertion will result. For this
+    reason, use of this method is discouraged.
+
+  * ``static ConstantInt* get(const APInt& Val)``: Returns the ConstantInt
+    object that represents the value provided by ``Val``. The type is implied
+    as the IntegerType that corresponds to the bit width of ``Val``.
+
+  * ``static ConstantInt* get(const Type *Ty, uint64_t Val)``: Returns the
+    ConstantInt object that represents the value provided by ``Val`` for integer
+    type ``Ty``.
+
+* ConstantFP : This class represents a floating point constant.
+
+  * ``double getValue() const``: Returns the underlying value of this constant.
+
+* ConstantArray : This represents a constant array.
+ + * ``const std::vector<Use> &getValues() const``: Returns a vector of + component constants that makeup this array. + +* ConstantStruct : This represents a constant struct. + + * ``const std::vector<Use> &getValues() const``: Returns a vector of + component constants that makeup this array. + +* GlobalValue : This represents either a global variable or a function. In + either case, the value is a constant fixed address (after linking). + +.. _GlobalValue: + +The ``GlobalValue`` class +------------------------- + +``#include "llvm/GlobalValue.h"`` + +header source: `GlobalValue.h +<http://llvm.org/doxygen/GlobalValue_8h-source.html>`_ + +doxygen info: `GlobalValue Class +<http://llvm.org/doxygen/classllvm_1_1GlobalValue.html>`_ + +Superclasses: Constant_, User_, Value_ + +Global values ( GlobalVariable_\ s or :ref:`Function <c_Function>`\ s) are the +only LLVM values that are visible in the bodies of all :ref:`Function +<c_Function>`\ s. Because they are visible at global scope, they are also +subject to linking with other globals defined in different translation units. +To control the linking process, ``GlobalValue``\ s know their linkage rules. +Specifically, ``GlobalValue``\ s know whether they have internal or external +linkage, as defined by the ``LinkageTypes`` enumeration. + +If a ``GlobalValue`` has internal linkage (equivalent to being ``static`` in C), +it is not visible to code outside the current translation unit, and does not +participate in linking. If it has external linkage, it is visible to external +code, and does participate in linking. In addition to linkage information, +``GlobalValue``\ s keep track of which Module_ they are currently part of. + +Because ``GlobalValue``\ s are memory objects, they are always referred to by +their **address**. As such, the Type_ of a global is always a pointer to its +contents. It is important to remember this when using the ``GetElementPtrInst`` +instruction because this pointer must be dereferenced first. For example, if +you have a ``GlobalVariable`` (a subclass of ``GlobalValue)`` that is an array +of 24 ints, type ``[24 x i32]``, then the ``GlobalVariable`` is a pointer to +that array. Although the address of the first element of this array and the +value of the ``GlobalVariable`` are the same, they have different types. The +``GlobalVariable``'s type is ``[24 x i32]``. The first element's type is +``i32.`` Because of this, accessing a global value requires you to dereference +the pointer with ``GetElementPtrInst`` first, then its elements can be accessed. +This is explained in the `LLVM Language Reference Manual +<LangRef.html#globalvars>`_. + +.. _m_GlobalValue: + +Important Public Members of the ``GlobalValue`` class +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* | ``bool hasInternalLinkage() const`` + | ``bool hasExternalLinkage() const`` + | ``void setInternalLinkage(bool HasInternalLinkage)`` + + These methods manipulate the linkage characteristics of the ``GlobalValue``. + +* ``Module *getParent()`` + + This returns the Module_ that the + GlobalValue is currently embedded into. + +.. _c_Function: + +The ``Function`` class +---------------------- + +``#include "llvm/Function.h"`` + +header source: `Function.h <http://llvm.org/doxygen/Function_8h-source.html>`_ + +doxygen info: `Function Class +<http://llvm.org/doxygen/classllvm_1_1Function.html>`_ + +Superclasses: GlobalValue_, Constant_, User_, Value_ + +The ``Function`` class represents a single procedure in LLVM. 
It is actually +one of the more complex classes in the LLVM hierarchy because it must keep track +of a large amount of data. The ``Function`` class keeps track of a list of +BasicBlock_\ s, a list of formal Argument_\ s, and a SymbolTable_. + +The list of BasicBlock_\ s is the most commonly used part of ``Function`` +objects. The list imposes an implicit ordering of the blocks in the function, +which indicate how the code will be laid out by the backend. Additionally, the +first BasicBlock_ is the implicit entry node for the ``Function``. It is not +legal in LLVM to explicitly branch to this initial block. There are no implicit +exit nodes, and in fact there may be multiple exit nodes from a single +``Function``. If the BasicBlock_ list is empty, this indicates that the +``Function`` is actually a function declaration: the actual body of the function +hasn't been linked in yet. + +In addition to a list of BasicBlock_\ s, the ``Function`` class also keeps track +of the list of formal Argument_\ s that the function receives. This container +manages the lifetime of the Argument_ nodes, just like the BasicBlock_ list does +for the BasicBlock_\ s. + +The SymbolTable_ is a very rarely used LLVM feature that is only used when you +have to look up a value by name. Aside from that, the SymbolTable_ is used +internally to make sure that there are not conflicts between the names of +Instruction_\ s, BasicBlock_\ s, or Argument_\ s in the function body. + +Note that ``Function`` is a GlobalValue_ and therefore also a Constant_. The +value of the function is its address (after linking) which is guaranteed to be +constant. + +.. _m_Function: + +Important Public Members of the ``Function`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* ``Function(const FunctionType *Ty, LinkageTypes Linkage, + const std::string &N = "", Module* Parent = 0)`` + + Constructor used when you need to create new ``Function``\ s to add the + program. The constructor must specify the type of the function to create and + what type of linkage the function should have. The FunctionType_ argument + specifies the formal arguments and return value for the function. The same + FunctionType_ value can be used to create multiple functions. The ``Parent`` + argument specifies the Module in which the function is defined. If this + argument is provided, the function will automatically be inserted into that + module's list of functions. + +* ``bool isDeclaration()`` + + Return whether or not the ``Function`` has a body defined. If the function is + "external", it does not have a body, and thus must be resolved by linking with + a function defined in a different translation unit. + +* | ``Function::iterator`` - Typedef for basic block list iterator + | ``Function::const_iterator`` - Typedef for const_iterator. + | ``begin()``, ``end()``, ``size()``, ``empty()`` + + These are forwarding methods that make it easy to access the contents of a + ``Function`` object's BasicBlock_ list. + +* ``Function::BasicBlockListType &getBasicBlockList()`` + + Returns the list of BasicBlock_\ s. This is necessary to use when you need to + update the list or perform a complex action that doesn't have a forwarding + method. + +* | ``Function::arg_iterator`` - Typedef for the argument list iterator + | ``Function::const_arg_iterator`` - Typedef for const_iterator. + | ``arg_begin()``, ``arg_end()``, ``arg_size()``, ``arg_empty()`` + + These are forwarding methods that make it easy to access the contents of a + ``Function`` object's Argument_ list. 
+ +* ``Function::ArgumentListType &getArgumentList()`` + + Returns the list of Argument_. This is necessary to use when you need to + update the list or perform a complex action that doesn't have a forwarding + method. + +* ``BasicBlock &getEntryBlock()`` + + Returns the entry ``BasicBlock`` for the function. Because the entry block + for the function is always the first block, this returns the first block of + the ``Function``. + +* | ``Type *getReturnType()`` + | ``FunctionType *getFunctionType()`` + + This traverses the Type_ of the ``Function`` and returns the return type of + the function, or the FunctionType_ of the actual function. + +* ``SymbolTable *getSymbolTable()`` + + Return a pointer to the SymbolTable_ for this ``Function``. + +.. _GlobalVariable: + +The ``GlobalVariable`` class +---------------------------- + +``#include "llvm/GlobalVariable.h"`` + +header source: `GlobalVariable.h +<http://llvm.org/doxygen/GlobalVariable_8h-source.html>`_ + +doxygen info: `GlobalVariable Class +<http://llvm.org/doxygen/classllvm_1_1GlobalVariable.html>`_ + +Superclasses: GlobalValue_, Constant_, User_, Value_ + +Global variables are represented with the (surprise surprise) ``GlobalVariable`` +class. Like functions, ``GlobalVariable``\ s are also subclasses of +GlobalValue_, and as such are always referenced by their address (global values +must live in memory, so their "name" refers to their constant address). See +GlobalValue_ for more on this. Global variables may have an initial value +(which must be a Constant_), and if they have an initializer, they may be marked +as "constant" themselves (indicating that their contents never change at +runtime). + +.. _m_GlobalVariable: + +Important Public Members of the ``GlobalVariable`` class +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* ``GlobalVariable(const Type *Ty, bool isConstant, LinkageTypes &Linkage, + Constant *Initializer = 0, const std::string &Name = "", Module* Parent = 0)`` + + Create a new global variable of the specified type. If ``isConstant`` is true + then the global variable will be marked as unchanging for the program. The + Linkage parameter specifies the type of linkage (internal, external, weak, + linkonce, appending) for the variable. If the linkage is InternalLinkage, + WeakAnyLinkage, WeakODRLinkage, LinkOnceAnyLinkage or LinkOnceODRLinkage, then + the resultant global variable will have internal linkage. AppendingLinkage + concatenates together all instances (in different translation units) of the + variable into a single variable but is only applicable to arrays. See the + `LLVM Language Reference <LangRef.html#modulestructure>`_ for further details + on linkage types. Optionally an initializer, a name, and the module to put + the variable into may be specified for the global variable as well. + +* ``bool isConstant() const`` + + Returns true if this is a global variable that is known not to be modified at + runtime. + +* ``bool hasInitializer()`` + + Returns true if this ``GlobalVariable`` has an intializer. + +* ``Constant *getInitializer()`` + + Returns the initial value for a ``GlobalVariable``. It is not legal to call + this method if there is no initializer. + +.. 
_BasicBlock: + +The ``BasicBlock`` class +------------------------ + +``#include "llvm/BasicBlock.h"`` + +header source: `BasicBlock.h +<http://llvm.org/doxygen/BasicBlock_8h-source.html>`_ + +doxygen info: `BasicBlock Class +<http://llvm.org/doxygen/classllvm_1_1BasicBlock.html>`_ + +Superclass: Value_ + +This class represents a single entry single exit section of the code, commonly +known as a basic block by the compiler community. The ``BasicBlock`` class +maintains a list of Instruction_\ s, which form the body of the block. Matching +the language definition, the last element of this list of instructions is always +a terminator instruction (a subclass of the TerminatorInst_ class). + +In addition to tracking the list of instructions that make up the block, the +``BasicBlock`` class also keeps track of the :ref:`Function <c_Function>` that +it is embedded into. + +Note that ``BasicBlock``\ s themselves are Value_\ s, because they are +referenced by instructions like branches and can go in the switch tables. +``BasicBlock``\ s have type ``label``. + +.. _m_BasicBlock: + +Important Public Members of the ``BasicBlock`` class +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +* ``BasicBlock(const std::string &Name = "", Function *Parent = 0)`` + + The ``BasicBlock`` constructor is used to create new basic blocks for + insertion into a function. The constructor optionally takes a name for the + new block, and a :ref:`Function <c_Function>` to insert it into. If the + ``Parent`` parameter is specified, the new ``BasicBlock`` is automatically + inserted at the end of the specified :ref:`Function <c_Function>`, if not + specified, the BasicBlock must be manually inserted into the :ref:`Function + <c_Function>`. + +* | ``BasicBlock::iterator`` - Typedef for instruction list iterator + | ``BasicBlock::const_iterator`` - Typedef for const_iterator. + | ``begin()``, ``end()``, ``front()``, ``back()``, + ``size()``, ``empty()`` + STL-style functions for accessing the instruction list. + + These methods and typedefs are forwarding functions that have the same + semantics as the standard library methods of the same names. These methods + expose the underlying instruction list of a basic block in a way that is easy + to manipulate. To get the full complement of container operations (including + operations to update the list), you must use the ``getInstList()`` method. + +* ``BasicBlock::InstListType &getInstList()`` + + This method is used to get access to the underlying container that actually + holds the Instructions. This method must be used when there isn't a + forwarding function in the ``BasicBlock`` class for the operation that you + would like to perform. Because there are no forwarding functions for + "updating" operations, you need to use this if you want to update the contents + of a ``BasicBlock``. + +* ``Function *getParent()`` + + Returns a pointer to :ref:`Function <c_Function>` the block is embedded into, + or a null pointer if it is homeless. + +* ``TerminatorInst *getTerminator()`` + + Returns a pointer to the terminator instruction that appears at the end of the + ``BasicBlock``. If there is no terminator instruction, or if the last + instruction in the block is not a terminator, then a null pointer is returned. + +.. _Argument: + +The ``Argument`` class +---------------------- + +This subclass of Value defines the interface for incoming formal arguments to a +function. A Function maintains a list of its formal arguments. An argument has +a pointer to the parent Function. 
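+
+For example, a small sketch that visits the formal arguments of a function
+using the ``arg_iterator`` interface described above (the output is purely
+illustrative):
+
+.. code-block:: c++
+
+  Function *F = ...;
+
+  for (Function::arg_iterator AI = F->arg_begin(), AE = F->arg_end();
+       AI != AE; ++AI) {
+    Argument &A = *AI;
+    errs() << "argument: " << A.getName() << "\n";  // the name may be empty
+    assert(A.getParent() == F && "arguments point back at their Function");
+  }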
+ + diff --git a/docs/Projects.rst b/docs/Projects.rst index 63132887a5..c5d03d33a0 100644 --- a/docs/Projects.rst +++ b/docs/Projects.rst @@ -156,9 +156,9 @@ Underneath your top level directory, you should have the following directories: * LLVM provides a ``tcl`` procedure that is used by ``Dejagnu`` to run tests. It can be found in ``llvm/lib/llvm-dg.exp``. This test procedure uses ``RUN`` lines in the actual test case to determine how to run the test. See the - `TestingGuide <TestingGuide.html>`_ for more details. You can easily write - Makefile support similar to the Makefiles in ``llvm/test`` to use ``Dejagnu`` - to run your project's tests. + :doc:`TestingGuide` for more details. You can easily write Makefile + support similar to the Makefiles in ``llvm/test`` to use ``Dejagnu`` to + run your project's tests. * LLVM contains an optional package called ``llvm-test``, which provides benchmarks and programs that are known to compile with the Clang front diff --git a/docs/ReleaseNotes.html b/docs/ReleaseNotes.html deleted file mode 100644 index a10f541ce7..0000000000 --- a/docs/ReleaseNotes.html +++ /dev/null @@ -1,783 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> -<html> -<head> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <link rel="stylesheet" href="_static/llvm.css" type="text/css"> - <title>LLVM 3.2 Release Notes</title> -</head> -<body> - -<h1>LLVM 3.2 Release Notes</h1> - -<div> -<img style="float:right" src="http://llvm.org/img/DragonSmall.png" - width="136" height="136" alt="LLVM Dragon Logo"> -</div> - -<ol> - <li><a href="#intro">Introduction</a></li> - <li><a href="#subproj">Sub-project Status Update</a></li> - <li><a href="#externalproj">External Projects Using LLVM 3.2</a></li> - <li><a href="#whatsnew">What's New in LLVM?</a></li> - <li><a href="GettingStarted.html">Installation Instructions</a></li> - <li><a href="#knownproblems">Known Problems</a></li> - <li><a href="#additionalinfo">Additional Information</a></li> -</ol> - -<div class="doc_author"> - <p>Written by the <a href="http://llvm.org/">LLVM Team</a></p> -</div> - -<h1 style="color:red">These are in-progress notes for the upcoming LLVM 3.2 -release.<br> -You may prefer the -<a href="http://llvm.org/releases/3.1/docs/ReleaseNotes.html">LLVM 3.1 -Release Notes</a>.</h1> - -<!-- *********************************************************************** --> -<h2> - <a name="intro">Introduction</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>This document contains the release notes for the LLVM Compiler - Infrastructure, release 3.2. Here we describe the status of LLVM, including - major improvements from the previous release, improvements in various - subprojects of LLVM, and some of the current users of the code. All LLVM - releases may be downloaded from the <a href="http://llvm.org/releases/">LLVM - releases web site</a>.</p> - -<p>For more information about LLVM, including information about the latest - release, please check out the <a href="http://llvm.org/">main LLVM web - site</a>. If you have questions or comments, - the <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">LLVM - Developer's Mailing List</a> is a good place to send them.</p> - -<p>Note that if you are reading this file from a Subversion checkout or the main - LLVM web page, this document applies to the <i>next</i> release, not the - current one. 
To see the release notes for a specific release, please see the - <a href="http://llvm.org/releases/">releases page</a>.</p> - -</div> - - -<!-- *********************************************************************** --> -<h2> - <a name="subproj">Sub-project Status Update</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>The LLVM 3.2 distribution currently consists of code from the core LLVM - repository, which roughly includes the LLVM optimizers, code generators and - supporting tools, and the Clang repository. In addition to this code, the - LLVM Project includes other sub-projects that are in development. Here we - include updates on these subprojects.</p> - -<!--=========================================================================--> -<h3> -<a name="clang">Clang: C/C++/Objective-C Frontend Toolkit</a> -</h3> - -<div> - -<p><a href="http://clang.llvm.org/">Clang</a> is an LLVM front end for the C, - C++, and Objective-C languages. Clang aims to provide a better user - experience through expressive diagnostics, a high level of conformance to - language standards, fast compilation, and low memory use. Like LLVM, Clang - provides a modular, library-based architecture that makes it suitable for - creating or integrating with other development tools. Clang is considered a - production-quality compiler for C, Objective-C, C++ and Objective-C++ on x86 - (32- and 64-bit), and for Darwin/ARM targets.</p> - -<p>In the LLVM 3.2 time-frame, the Clang team has made many improvements. - Highlights include:</p> -<ul> - <li>...</li> -</ul> - -<p>For more details about the changes to Clang since the 3.1 release, see the - <a href="http://clang.llvm.org/docs/ReleaseNotes.html">Clang release - notes.</a></p> - -<p>If Clang rejects your code but another compiler accepts it, please take a - look at the <a href="http://clang.llvm.org/compatibility.html">language - compatibility</a> guide to make sure this is not intentional or a known - issue.</p> - -</div> - -<!--=========================================================================--> -<h3> -<a name="dragonegg">DragonEgg: GCC front-ends, LLVM back-end</a> -</h3> - -<div> - -<p><a href="http://dragonegg.llvm.org/">DragonEgg</a> is a - <a href="http://gcc.gnu.org/wiki/plugins">gcc plugin</a> that replaces GCC's - optimizers and code generators with LLVM's. It works with gcc-4.5 and gcc-4.6 - (and partially with gcc-4.7), can target the x86-32/x86-64 and ARM processor - families, and has been successfully used on the Darwin, FreeBSD, KFreeBSD, - Linux and OpenBSD platforms. It fully supports Ada, C, C++ and Fortran. It - has partial support for Go, Java, Obj-C and Obj-C++.</p> - -<p>The 3.2 release has the following notable changes:</p> - -<ul> - <li>...</li> -</ul> - -</div> - -<!--=========================================================================--> -<h3> -<a name="compiler-rt">compiler-rt: Compiler Runtime Library</a> -</h3> - -<div> - -<p>The new LLVM <a href="http://compiler-rt.llvm.org/">compiler-rt project</a> - is a simple library that provides an implementation of the low-level - target-specific hooks required by code generation and other runtime - components. For example, when compiling for a 32-bit target, converting a - double to a 64-bit unsigned integer is compiled into a runtime call to the - <code>__fixunsdfdi</code> function. 
The compiler-rt library provides highly - optimized implementations of this and other low-level routines (some are 3x - faster than the equivalent libgcc routines).</p> - -<p>The 3.2 release has the following notable changes:</p> - -<ul> - <li>...</li> -</ul> - -</div> - -<!--=========================================================================--> -<h3> -<a name="lldb">LLDB: Low Level Debugger</a> -</h3> - -<div> - -<p><a href="http://lldb.llvm.org">LLDB</a> is a ground-up implementation of a - command line debugger, as well as a debugger API that can be used from other - applications. LLDB makes use of the Clang parser to provide high-fidelity - expression parsing (particularly for C++) and uses the LLVM JIT for target - support.</p> - -<p>The 3.2 release has the following notable changes:</p> - -<ul> - <li>...</li> -</ul> - -</div> - -<!--=========================================================================--> -<h3> -<a name="libc++">libc++: C++ Standard Library</a> -</h3> - -<div> - -<p>Like compiler_rt, libc++ is now <a href="DeveloperPolicy.html#license">dual - licensed</a> under the MIT and UIUC license, allowing it to be used more - permissively.</p> - -<p>Within the LLVM 3.2 time-frame there were the following highlights:</p> - -<ul> - <li>...</li> -</ul> - -</div> - -<!--=========================================================================--> -<h3> -<a name="vmkit">VMKit</a> -</h3> - -<div> - -<p>The <a href="http://vmkit.llvm.org/">VMKit project</a> is an implementation - of a Java Virtual Machine (Java VM or JVM) that uses LLVM for static and - just-in-time compilation.</p> - -<p>The 3.2 release has the following notable changes:</p> - -<ul> - <li>...</li> -</ul> - -</div> - - -<!--=========================================================================--> -<h3> -<a name="Polly">Polly: Polyhedral Optimizer</a> -</h3> - -<div> - -<p><a href="http://polly.llvm.org/">Polly</a> is an <em>experimental</em> - optimizer for data locality and parallelism. It currently provides high-level - loop optimizations and automatic parallelisation (using the OpenMP run time). - Work in the area of automatic SIMD and accelerator code generation was - started.</p> - -<p>Within the LLVM 3.2 time-frame there were the following highlights:</p> - -<ul> - <li>...</li> -</ul> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="externalproj">External Open Source Projects Using LLVM 3.2</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>An exciting aspect of LLVM is that it is used as an enabling technology for - a lot of other language and tools projects. This section lists some of the - projects that have already been updated to work with LLVM 3.2.</p> - -<h3>Crack</h3> - -<div> - -<p><a href="http://code.google.com/p/crack-language/">Crack</a> aims to provide - the ease of development of a scripting language with the performance of a - compiled language. The language derives concepts from C++, Java and Python, - incorporating object-oriented programming, operator overloading and strong - typing.</p> - -</div> - -<h3>FAUST</h3> - -<div> - -<p><a href="http://faust.grame.fr/">FAUST</a> is a compiled language for - real-time audio signal processing. The name FAUST stands for Functional - AUdio STream. Its programming model combines two approaches: functional - programming and block diagram composition. 
In addition with the C, C++, Java, - JavaScript output formats, the Faust compiler can generate LLVM bitcode, and - works with LLVM 2.7-3.1.</p> - -</div> - -<h3>Glasgow Haskell Compiler (GHC)</h3> - -<div> - -<p><a href="http://www.haskell.org/ghc/">GHC</a> is an open source compiler and - programming suite for Haskell, a lazy functional programming language. It - includes an optimizing static compiler generating good code for a variety of - platforms, together with an interactive system for convenient, quick - development.</p> - -<p>GHC 7.0 and onwards include an LLVM code generator, supporting LLVM 2.8 and - later.</p> - -</div> - -<h3>Julia</h3> - -<div> - -<p><a href="https://github.com/JuliaLang/julia">Julia</a> is a high-level, - high-performance dynamic language for technical computing. It provides a - sophisticated compiler, distributed parallel execution, numerical accuracy, - and an extensive mathematical function library. The compiler uses type - inference to generate fast code without any type declarations, and uses - LLVM's optimization passes and JIT compiler. The - <a href="http://julialang.org/"> Julia Language</a> is designed - around multiple dispatch, giving programs a large degree of flexibility. It - is ready for use on many kinds of problems.</p> - -</div> - -<h3>LLVM D Compiler</h3> - -<div> - -<p><a href="https://github.com/ldc-developers/ldc">LLVM D Compiler</a> (LDC) is - a compiler for the D programming Language. It is based on the DMD frontend - and uses LLVM as backend.</p> - -</div> - -<h3>Open Shading Language</h3> - -<div> - -<p><a href="https://github.com/imageworks/OpenShadingLanguage/">Open Shading - Language (OSL)</a> is a small but rich language for programmable shading in - advanced global illumination renderers and other applications, ideal for - describing materials, lights, displacement, and pattern generation. It uses - LLVM to JIT complex shader networks to x86 code at runtime.</p> - -<p>OSL was developed by Sony Pictures Imageworks for use in its in-house - renderer used for feature film animation and visual effects, and is - distributed as open source software with the "New BSD" license.</p> - -</div> - -<h3>Portable OpenCL (pocl)</h3> - -<div> - -<p>In addition to producing an easily portable open source OpenCL - implementation, another major goal of <a href="http://pocl.sourceforge.net/"> - pocl</a> is improving performance portability of OpenCL programs with - compiler optimizations, reducing the need for target-dependent manual - optimizations. An important part of pocl is a set of LLVM passes used to - statically parallelize multiple work-items with the kernel compiler, even in - the presence of work-group barriers. This enables static parallelization of - the fine-grained static concurrency in the work groups in multiple ways - (SIMD, VLIW, superscalar,...).</p> - -</div> - -<h3>Pure</h3> - -<div> - -<p><a href="http://pure-lang.googlecode.com/">Pure</a> is an - algebraic/functional programming language based on term rewriting. Programs - are collections of equations which are used to evaluate expressions in a - symbolic fashion. The interpreter uses LLVM as a backend to JIT-compile Pure - programs to fast native code. 
Pure offers dynamic typing, eager and lazy - evaluation, lexical closures, a hygienic macro system (also based on term - rewriting), built-in list and matrix support (including list and matrix - comprehensions) and an easy-to-use interface to C and other programming - languages (including the ability to load LLVM bitcode modules, and inline C, - C++, Fortran and Faust code in Pure programs if the corresponding - LLVM-enabled compilers are installed).</p> - -<p>Pure version 0.54 has been tested and is known to work with LLVM 3.1 (and - continues to work with older LLVM releases >= 2.5).</p> - -</div> - -<h3>TTA-based Co-design Environment (TCE)</h3> - -<div> - -<p><a href="http://tce.cs.tut.fi/">TCE</a> is a toolset for designing - application-specific processors (ASP) based on the Transport triggered - architecture (TTA). The toolset provides a complete co-design flow from C/C++ - programs down to synthesizable VHDL/Verilog and parallel program binaries. - Processor customization points include the register files, function units, - supported operations, and the interconnection network.</p> - -<p>TCE uses Clang and LLVM for C/C++ language support, target independent - optimizations and also for parts of code generation. It generates new - LLVM-based code generators "on the fly" for the designed TTA processors and - loads them in to the compiler backend as runtime libraries to avoid - per-target recompilation of larger parts of the compiler chain.</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="whatsnew">What's New in LLVM 3.2?</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>This release includes a huge number of bug fixes, performance tweaks and - minor improvements. Some of the major improvements and new features are - listed in this section.</p> - -<!--=========================================================================--> -<h3> -<a name="majorfeatures">Major New Features</a> -</h3> - -<div> - - <!-- Features that need text if they're finished for 3.2: - ARM EHABI - combiner-aa? - strong phi elim - loop dependence analysis - CorrelatedValuePropagation - lib/Transforms/IPO/MergeFunctions.cpp => consider for 3.2. - Integrated assembler on by default for arm/thumb? - - --> - - <!-- Near dead: - Analysis/RegionInfo.h + Dom Frontiers - SparseBitVector: used in LiveVar. - llvm/lib/Archive - replace with lib object? - --> - -<p>LLVM 3.2 includes several major changes and big features:</p> - -<ul> - <li>...</li> -</ul> - -</div> - - -<!--=========================================================================--> -<h3> -<a name="coreimprovements">LLVM IR and Core Improvements</a> -</h3> - -<div> - -<p>LLVM IR has several new features for better support of new targets and that - expose new optimization opportunities:</p> - -<ul> - <li>Thread local variables may have a specified TLS model. See the - <a href="LangRef.html#globalvars">Language Reference Manual</a>.</li> - <li>...</li> -</ul> - -</div> - -<!--=========================================================================--> -<h3> -<a name="optimizer">Optimizer Improvements</a> -</h3> - -<div> - -<p>In addition to many minor performance tweaks and bug fixes, this release - includes a few major enhancements and additions to the optimizers:</p> - -<p> Loop Vectorizer - We've added a loop vectorizer and we are now able to - vectorize small loops. 
The loop vectorizer is disabled by default and - can be enabled using the <b>-mllvm -vectorize-loops</b> flag. - The SIMD vector width can be specified using the flag - <b>-mllvm -force-vector-width=4</b>. - The default value is <b>0</b> which means auto-select. - <br/> - We can now vectorize this code: - - <pre class="doc_code"> - for (i=0; i<n; i++) { - a[i] = b[i+1] + c[i+3] + i; - sum += d[i]; - } - </pre> - - </p> - -<ul> - <li>...</li> -</ul> - -</div> - -<!--=========================================================================--> -<h3> -<a name="mc">MC Level Improvements</a> -</h3> - -<div> - -<p>The LLVM Machine Code (aka MC) subsystem was created to solve a number of - problems in the realm of assembly, disassembly, object file format handling, - and a number of other related areas that CPU instruction-set level tools work - in. For more information, please see the - <a href="http://blog.llvm.org/2010/04/intro-to-llvm-mc-project.html">Intro - to the LLVM MC Project Blog Post</a>.</p> - -<ul> - <li>...</li> -</ul> - -</div> - -<!--=========================================================================--> -<h3> -<a name="codegen">Target Independent Code Generator Improvements</a> -</h3> - -<div> - -<p>Stack Coloring - We have implemented a new optimization pass - to merge stack objects which are used in disjoin areas of the code. - This optimization reduces the required stack space significantly, in cases - where it is clear to the optimizer that the stack slot is not shared. - We use the lifetime markers to tell the codegen that a certain alloca - is used within a region.</p> - -<p> We now merge consecutive loads and stores. </p> - -<p>We have put a significant amount of work into the code generator - infrastructure, which allows us to implement more aggressive algorithms and - make it run faster:</p> - -<ul> - <li>...</li> -</ul> - -<p> We added new TableGen infrastructure to support bundling for - Very Long Instruction Word (VLIW) architectures. TableGen can now - automatically generate a deterministic finite automaton from a VLIW - target's schedule description which can be queried to determine - legal groupings of instructions in a bundle.</p> - -<p> We have added a new target independent VLIW packetizer based on the - DFA infrastructure to group machine instructions into bundles.</p> - -</div> - -<h4> -<a name="blockplacement">Basic Block Placement</a> -</h4> - -<div> - -<p>A probability based block placement and code layout algorithm was added to - LLVM's code generator. This layout pass supports probabilities derived from - static heuristics as well as source code annotations such as - <code>__builtin_expect</code>.</p> - -</div> - -<!--=========================================================================--> -<h3> -<a name="x86">X86-32 and X86-64 Target Improvements</a> -</h3> - -<div> - -<p>New features and major changes in the X86 target include:</p> - -<ul> - <li>...</li> -</ul> - -</div> - -<!--=========================================================================--> -<h3> -<a name="ARM">ARM Target Improvements</a> -</h3> - -<div> - -<p>New features of the ARM target include:</p> - -<ul> - <li>...</li> -</ul> - -<!--_________________________________________________________________________--> - -<h4> -<a name="armintegratedassembler">ARM Integrated Assembler</a> -</h4> - -<div> - -<p>The ARM target now includes a full featured macro assembler, including - direct-to-object module support for clang. 
The assembler is currently enabled - by default for Darwin only pending testing and any additional necessary - platform specific support for Linux.</p> - -<p>Full support is included for Thumb1, Thumb2 and ARM modes, along with - subtarget and CPU specific extensions for VFP2, VFP3 and NEON.</p> - -<p>The assembler is Unified Syntax only (see ARM Architecural Reference Manual - for details). While there is some, and growing, support for pre-unfied - (divided) syntax, there are still significant gaps in that support.</p> - -</div> - -</div> - -<!--=========================================================================--> -<h3> -<a name="MIPS">MIPS Target Improvements</a> -</h3> - -<div> - -<p>New features and major changes in the MIPS target include:</p> - -<ul> - <li>...</li> -</ul> - -</div> - -<!--=========================================================================--> -<h3> -<a name="OtherTS">Other Target Specific Improvements</a> -</h3> - -<div> - -<ul> - <li>...</li> -</ul> - -</div> - -<!--=========================================================================--> -<h3> -<a name="changes">Major Changes and Removed Features</a> -</h3> - -<div> - -<p>If you're already an LLVM user or developer with out-of-tree changes based on - LLVM 3.2, this section lists some "gotchas" that you may run into upgrading - from the previous release.</p> - -<ul> - <li>...</li> -</ul> - -</div> - -<!--=========================================================================--> -<h3> -<a name="api_changes">Internal API Changes</a> -</h3> - -<div> - -<p>In addition, many APIs have changed in this release. Some of the major - LLVM API changes are:</p> - -<p> We've added a new interface for allowing IR-level passes to access - target-specific information. A new IR-level pass, called - "TargetTransformInfo" provides a number of low-level interfaces. - LSR and LowerInvoke already use the new interface. </p> - -<p> The TargetData structure has been renamed to DataLayout and moved to VMCore -to remove a dependency on Target. </p> - -<ul> - <li>...</li> -</ul> - -</div> - -<!--=========================================================================--> -<h3> -<a name="tools_changes">Tools Changes</a> -</h3> - -<div> - -<p>In addition, some tools have changed in this release. Some of the changes - are:</p> - -<ul> - <li>...</li> -</ul> - -</div> - - -<!--=========================================================================--> -<h3> -<a name="python">Python Bindings</a> -</h3> - -<div> - -<p>Officially supported Python bindings have been added! Feature support is far - from complete. The current bindings support interfaces to:</p> - -<ul> - <li>...</li> -</ul> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="knownproblems">Known Problems</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>LLVM is generally a production quality compiler, and is used by a broad range - of applications and shipping in many products. That said, not every - subsystem is as mature as the aggregate, particularly the more obscure - targets. 
If you run into a problem, please check - the <a href="http://llvm.org/bugs/">LLVM bug database</a> and submit a bug if - there isn't already one or ask on - the <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">LLVMdev - list</a>.</p> - - <p>Known problem areas include:</p> - -<ul> - <li>The CellSPU, MSP430, PTX and XCore backends are experimental.</li> - - <li>The integrated assembler, disassembler, and JIT is not supported by - several targets. If an integrated assembler is not supported, then a - system assembler is required. For more details, see the <a - href="CodeGenerator.html#targetfeatures">Target Features Matrix</a>. - </li> -</ul> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="additionalinfo">Additional Information</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>A wide variety of additional information is available on - the <a href="http://llvm.org/">LLVM web page</a>, in particular in - the <a href="http://llvm.org/docs/">documentation</a> section. The web page - also contains versions of the API documentation which is up-to-date with the - Subversion version of the source code. You can access versions of these - documents specific to this release by going into the "<tt>llvm/doc/</tt>" - directory in the LLVM tree.</p> - -<p>If you have any questions or comments about LLVM, please feel free to contact - us via the <a href="http://llvm.org/docs/#maillist"> mailing lists</a>.</p> - -</div> - -<!-- *********************************************************************** --> - -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a> - - <a href="http://llvm.org/">LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> - -</body> -</html> diff --git a/docs/ReleaseNotes.rst b/docs/ReleaseNotes.rst new file mode 100644 index 0000000000..a5922ad983 --- /dev/null +++ b/docs/ReleaseNotes.rst @@ -0,0 +1,564 @@ +.. raw:: html + + <style> .red {color:red} </style> + +.. role:: red + +====================== +LLVM 3.2 Release Notes +====================== + +.. contents:: + :local: + +Written by the `LLVM Team <http://llvm.org/>`_ + +:red:`These are in-progress notes for the upcoming LLVM 3.2 release. You may +prefer the` `LLVM 3.1 Release Notes <http://llvm.org/releases/3.1/docs +/ReleaseNotes.html>`_. + +Introduction +============ + +This document contains the release notes for the LLVM Compiler Infrastructure, +release 3.2. Here we describe the status of LLVM, including major improvements +from the previous release, improvements in various subprojects of LLVM, and +some of the current users of the code. All LLVM releases may be downloaded +from the `LLVM releases web site <http://llvm.org/releases/>`_. + +For more information about LLVM, including information about the latest +release, please check out the `main LLVM web site <http://llvm.org/>`_. If you +have questions or comments, the `LLVM Developer's Mailing List +<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_ is a good place to send +them. + +Note that if you are reading this file from a Subversion checkout or the main +LLVM web page, this document applies to the *next* release, not the current +one. 
To see the release notes for a specific release, please see the `releases +page <http://llvm.org/releases/>`_. + +Sub-project Status Update +========================= + +The LLVM 3.2 distribution currently consists of code from the core LLVM +repository, which roughly includes the LLVM optimizers, code generators and +supporting tools, and the Clang repository. In addition to this code, the LLVM +Project includes other sub-projects that are in development. Here we include +updates on these subprojects. + +Clang: C/C++/Objective-C Frontend Toolkit +----------------------------------------- + +`Clang <http://clang.llvm.org/>`_ is an LLVM front end for the C, C++, and +Objective-C languages. Clang aims to provide a better user experience through +expressive diagnostics, a high level of conformance to language standards, fast +compilation, and low memory use. Like LLVM, Clang provides a modular, +library-based architecture that makes it suitable for creating or integrating +with other development tools. Clang is considered a production-quality +compiler for C, Objective-C, C++ and Objective-C++ on x86 (32- and 64-bit), and +for Darwin/ARM targets. + +In the LLVM 3.2 time-frame, the Clang team has made many improvements. +Highlights include: + +#. ... + +For more details about the changes to Clang since the 3.1 release, see the +`Clang release notes. <http://clang.llvm.org/docs/ReleaseNotes.html>`_ + +If Clang rejects your code but another compiler accepts it, please take a look +at the `language compatibility <http://clang.llvm.org/compatibility.html>`_ +guide to make sure this is not intentional or a known issue. + +DragonEgg: GCC front-ends, LLVM back-end +---------------------------------------- + +`DragonEgg <http://dragonegg.llvm.org/>`_ is a `gcc plugin +<http://gcc.gnu.org/wiki/plugins>`_ that replaces GCC's optimizers and code +generators with LLVM's. It works with gcc-4.5 and gcc-4.6 (and partially with +gcc-4.7), can target the x86-32/x86-64 and ARM processor families, and has been +successfully used on the Darwin, FreeBSD, KFreeBSD, Linux and OpenBSD +platforms. It fully supports Ada, C, C++ and Fortran. It has partial support +for Go, Java, Obj-C and Obj-C++. + +The 3.2 release has the following notable changes: + +#. ... + +compiler-rt: Compiler Runtime Library +------------------------------------- + +The new LLVM `compiler-rt project <http://compiler-rt.llvm.org/>`_ is a simple +library that provides an implementation of the low-level target-specific hooks +required by code generation and other runtime components. For example, when +compiling for a 32-bit target, converting a double to a 64-bit unsigned integer +is compiled into a runtime call to the ``__fixunsdfdi`` function. The +``compiler-rt`` library provides highly optimized implementations of this and +other low-level routines (some are 3x faster than the equivalent libgcc +routines). + +The 3.2 release has the following notable changes: + +#. ... + +LLDB: Low Level Debugger +------------------------ + +`LLDB <http://lldb.llvm.org>`_ is a ground-up implementation of a command line +debugger, as well as a debugger API that can be used from other applications. +LLDB makes use of the Clang parser to provide high-fidelity expression parsing +(particularly for C++) and uses the LLVM JIT for target support. + +The 3.2 release has the following notable changes: + +#. ... 
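Looking back at the ``compiler-rt`` description above, the runtime call it mentions is what an ordinary narrowing conversion compiles to when the target has no single instruction for it. A minimal sketch (the function name ``to_u64`` is purely illustrative):

.. code-block:: c++

   // When built for a typical 32-bit target, the cast below has no single
   // machine instruction, so the code generator may emit a call to the
   // compiler-rt (or libgcc) helper __fixunsdfdi instead.
   unsigned long long to_u64(double d) {
     return (unsigned long long)d;
   }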
+ +libc++: C++ Standard Library +---------------------------- + +Like compiler_rt, libc++ is now :ref:`dual licensed +<copyright-license-patents>` under the MIT and UIUC license, allowing it to be +used more permissively. + +Within the LLVM 3.2 time-frame there were the following highlights: + +#. ... + +VMKit +----- + +The `VMKit project <http://vmkit.llvm.org/>`_ is an implementation of a Java +Virtual Machine (Java VM or JVM) that uses LLVM for static and just-in-time +compilation. + +The 3.2 release has the following notable changes: + +#. ... + +Polly: Polyhedral Optimizer +--------------------------- + +`Polly <http://polly.llvm.org/>`_ is an *experimental* optimizer for data +locality and parallelism. It provides high-level loop optimizations and +automatic parallelisation. + +Within the LLVM 3.2 time-frame there were the following highlights: + +#. isl, the integer set library used by Polly, was relicensed to the MIT license +#. isl based code generation +#. MIT licensed replacement for CLooG (LGPLv2) +#. Fine grained option handling (separation of core and border computations, + control overhead vs. code size) +#. Support for FORTRAN and dragonegg +#. OpenMP code generation fixes + +External Open Source Projects Using LLVM 3.2 +============================================ + +An exciting aspect of LLVM is that it is used as an enabling technology for a +lot of other language and tools projects. This section lists some of the +projects that have already been updated to work with LLVM 3.2. + +Crack +----- + +`Crack <http://code.google.com/p/crack-language/>`_ aims to provide the ease of +development of a scripting language with the performance of a compiled +language. The language derives concepts from C++, Java and Python, +incorporating object-oriented programming, operator overloading and strong +typing. + +FAUST +----- + +`FAUST <http://faust.grame.fr/>`_ is a compiled language for real-time audio +signal processing. The name FAUST stands for Functional AUdio STream. Its +programming model combines two approaches: functional programming and block +diagram composition. In addition with the C, C++, Java, JavaScript output +formats, the Faust compiler can generate LLVM bitcode, and works with LLVM +2.7-3.1. + +Glasgow Haskell Compiler (GHC) +------------------------------ + +`GHC <http://www.haskell.org/ghc/>`_ is an open source compiler and programming +suite for Haskell, a lazy functional programming language. It includes an +optimizing static compiler generating good code for a variety of platforms, +together with an interactive system for convenient, quick development. + +GHC 7.0 and onwards include an LLVM code generator, supporting LLVM 2.8 and +later. + +Julia +----- + +`Julia <https://github.com/JuliaLang/julia>`_ is a high-level, high-performance +dynamic language for technical computing. It provides a sophisticated +compiler, distributed parallel execution, numerical accuracy, and an extensive +mathematical function library. The compiler uses type inference to generate +fast code without any type declarations, and uses LLVM's optimization passes +and JIT compiler. The `Julia Language <http://julialang.org/>`_ is designed +around multiple dispatch, giving programs a large degree of flexibility. It is +ready for use on many kinds of problems. + +LLVM D Compiler +--------------- + +`LLVM D Compiler <https://github.com/ldc-developers/ldc>`_ (LDC) is a compiler +for the D programming Language. It is based on the DMD frontend and uses LLVM +as backend. 
+ +Open Shading Language +--------------------- + +`Open Shading Language (OSL) +<https://github.com/imageworks/OpenShadingLanguage/>`_ is a small but rich +language for programmable shading in advanced global illumination renderers and +other applications, ideal for describing materials, lights, displacement, and +pattern generation. It uses LLVM to JIT complex shader networks to x86 code at +runtime. + +OSL was developed by Sony Pictures Imageworks for use in its in-house renderer +used for feature film animation and visual effects, and is distributed as open +source software with the "New BSD" license. + +Portable OpenCL (pocl) +---------------------- + +In addition to producing an easily portable open source OpenCL implementation, +another major goal of `pocl <http://pocl.sourceforge.net/>`_ is improving +performance portability of OpenCL programs with compiler optimizations, +reducing the need for target-dependent manual optimizations. An important part +of pocl is a set of LLVM passes used to statically parallelize multiple +work-items with the kernel compiler, even in the presence of work-group +barriers. This enables static parallelization of the fine-grained static +concurrency in the work groups in multiple ways (SIMD, VLIW, superscalar, ...). + +Pure +---- + +`Pure <http://pure-lang.googlecode.com/>`_ is an algebraic/functional +programming language based on term rewriting. Programs are collections of +equations which are used to evaluate expressions in a symbolic fashion. The +interpreter uses LLVM as a backend to JIT-compile Pure programs to fast native +code. Pure offers dynamic typing, eager and lazy evaluation, lexical closures, +a hygienic macro system (also based on term rewriting), built-in list and +matrix support (including list and matrix comprehensions) and an easy-to-use +interface to C and other programming languages (including the ability to load +LLVM bitcode modules, and inline C, C++, Fortran and Faust code in Pure +programs if the corresponding LLVM-enabled compilers are installed). + +Pure version 0.54 has been tested and is known to work with LLVM 3.1 (and +continues to work with older LLVM releases >= 2.5). + +TTA-based Co-design Environment (TCE) +------------------------------------- + +`TCE <http://tce.cs.tut.fi/>`_ is a toolset for designing application-specific +processors (ASP) based on the Transport triggered architecture (TTA). The +toolset provides a complete co-design flow from C/C++ programs down to +synthesizable VHDL/Verilog and parallel program binaries. Processor +customization points include the register files, function units, supported +operations, and the interconnection network. + +TCE uses Clang and LLVM for C/C++ language support, target independent +optimizations and also for parts of code generation. It generates new +LLVM-based code generators "on the fly" for the designed TTA processors and +loads them in to the compiler backend as runtime libraries to avoid per-target +recompilation of larger parts of the compiler chain. + +Installation Instructions +========================= + +See :doc:`GettingStarted`. + +What's New in LLVM 3.2? +======================= + +This release includes a huge number of bug fixes, performance tweaks and minor +improvements. Some of the major improvements and new features are listed in +this section. + +Major New Features +------------------ + +.. + + Features that need text if they're finished for 3.2: + ARM EHABI + combiner-aa? 
+ strong phi elim + loop dependence analysis + CorrelatedValuePropagation + lib/Transforms/IPO/MergeFunctions.cpp => consider for 3.2. + Integrated assembler on by default for arm/thumb? + + Near dead: + Analysis/RegionInfo.h + Dom Frontiers + SparseBitVector: used in LiveVar. + llvm/lib/Archive - replace with lib object? + + +LLVM 3.2 includes several major changes and big features: + +#. New NVPTX back-end (replacing existing PTX back-end) based on NVIDIA sources +#. ... + +LLVM IR and Core Improvements +----------------------------- + +LLVM IR has several new features for better support of new targets and that +expose new optimization opportunities: + +#. Thread local variables may have a specified TLS model. See the :ref:`Language + Reference Manual <globalvars>`. +#. ... + +Optimizer Improvements +---------------------- + +In addition to many minor performance tweaks and bug fixes, this release +includes a few major enhancements and additions to the optimizers: + +Loop Vectorizer - We've added a loop vectorizer and we are now able to +vectorize small loops. The loop vectorizer is disabled by default and can be +enabled using the ``-mllvm -vectorize-loops`` flag. The SIMD vector width can +be specified using the flag ``-mllvm -force-vector-width=4``. The default +value is ``0``, which means auto-select. + +We can now vectorize this function: + +.. code-block:: c++ + + unsigned sum_arrays(int *A, int *B, int start, int end) { + unsigned sum = 0; + for (int i = start; i < end; ++i) + sum += A[i] + B[i] + i; + return sum; + } + +We vectorize loops under the following conditions: + +#. The innermost loops must have a single basic block. +#. The number of iterations is known before the loop starts to execute. +#. The loop counter needs to be incremented by one. +#. The loop trip count **can** be a variable. +#. Loops do **not** need to start at zero. +#. The induction variable can be used inside the loop. +#. Loop reductions are supported. +#. Arrays with affine access patterns do **not** need to be marked as + '``noalias``' and are checked at runtime. +#. ... + +SROA - We've rewritten SROA to be significantly more powerful. + +#. Branch weight metadata is preserved through more of the optimizer. +#. ... + +MC Level Improvements +--------------------- + +The LLVM Machine Code (aka MC) subsystem was created to solve a number of +problems in the realm of assembly, disassembly, object file format handling, +and a number of other related areas that CPU instruction-set level tools work +in. For more information, please see the `Intro to the LLVM MC Project Blog +Post <http://blog.llvm.org/2010/04/intro-to-llvm-mc-project.html>`_. + +#. ... + +.. _codegen: + +Target Independent Code Generator Improvements +---------------------------------------------- + +Stack Coloring - We have implemented a new optimization pass to merge stack +objects which are used in disjoint areas of the code. This optimization reduces +the required stack space significantly, in cases where it is clear to the +optimizer that the stack slot is not shared. We use the lifetime markers to +tell the codegen that a certain alloca is used within a region. + +We now merge consecutive loads and stores. + +We have put a significant amount of work into the code generator +infrastructure, which allows us to implement more aggressive algorithms and +make it run faster: + +#. ... + +We added new TableGen infrastructure to support bundling for Very Long +Instruction Word (VLIW) architectures.
TableGen can now automatically generate +a deterministic finite automaton from a VLIW target's schedule description, +which can be queried to determine legal groupings of instructions in a bundle. + +We have added a new target independent VLIW packetizer based on the DFA +infrastructure to group machine instructions into bundles. + +Basic Block Placement +^^^^^^^^^^^^^^^^^^^^^ + +A probability-based block placement and code layout algorithm was added to +LLVM's code generator. This layout pass supports probabilities derived from +static heuristics as well as source code annotations such as +``__builtin_expect`` (a short example follows the target-specific notes below). + +X86-32 and X86-64 Target Improvements +------------------------------------- + +New features and major changes in the X86 target include: + +#. ... + +.. _ARM: + +ARM Target Improvements +----------------------- + +New features of the ARM target include: + +#. ... + +.. _armintegratedassembler: + +ARM Integrated Assembler +^^^^^^^^^^^^^^^^^^^^^^^^ + +The ARM target now includes a full-featured macro assembler, including +direct-to-object module support for clang. The assembler is currently enabled +by default for Darwin only, pending testing and any additional necessary +platform-specific support for Linux. + +Full support is included for Thumb1, Thumb2 and ARM modes, along with subtarget +and CPU-specific extensions for VFP2, VFP3 and NEON. + +The assembler is Unified Syntax only (see the ARM Architecture Reference Manual for +details). While there is some, and growing, support for pre-unified (divided) +syntax, there are still significant gaps in that support. + +MIPS Target Improvements +------------------------ + +New features and major changes in the MIPS target include: + +#. ... + +PowerPC Target Improvements +--------------------------- + +Many fixes and changes across LLVM (and Clang) for better compliance with the +64-bit PowerPC ELF Application Binary Interface, interoperability with GCC, and +overall 64-bit PowerPC support. Some highlights include: + +#. MCJIT support added. +#. PPC64 relocation support and (small code model) TOC handling added. +#. Parameter passing and return value fixes (alignment issues, padding, varargs + support, proper register usage, odd-sized structure support, float support, + extension of i32 return values). +#. Fixes in spill and reload code for vector registers. +#. C++ exception handling enabled. +#. Changes to remediate double-rounding compatibility issues with respect to + GCC behavior. +#. Refactoring to disentangle ``ppc64-elf-linux`` ABI from Darwin ppc64 ABI + support. +#. Assorted new test cases and test case fixes (endian and word size issues). +#. Fixes for big-endian codegen bugs, instruction encodings, and instruction + constraints. +#. Implemented ``-integrated-as`` support. +#. Additional support for Altivec compare operations. +#. IBM long double support. + +There have also been code generation improvements for both 32- and 64-bit code. +Instruction scheduling support for the Freescale e500mc and e5500 cores has +been added. + +PTX/NVPTX Target Improvements +----------------------------- + +The PTX back-end has been replaced by the NVPTX back-end, which is based on the +LLVM back-end used by NVIDIA in their CUDA (nvcc) and OpenCL compiler. Some +highlights include: + +#. Compatibility with PTX 3.1 and SM 3.5. +#. Support for NVVM intrinsics as defined in the NVIDIA Compiler SDK. +#. Full compatibility with old PTX back-end, with much greater coverage of LLVM + IR.
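As a brief illustration of the source annotations mentioned under Basic Block Placement above, ``__builtin_expect`` is the usual way to hand the layout pass a branch probability; a minimal sketch (the function and its error condition are made up for illustration):

.. code-block:: c++

   // Marking the error path as unlikely lets the block placement pass keep
   // the hot path as the fall-through and move the cold block out of line.
   int checked_increment(int *p) {
     if (__builtin_expect(p == 0, 0))  // expected to be false
       return -1;                      // cold path
     return *p + 1;                    // hot path
   }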
+ +Please submit any back-end bugs to the LLVM Bugzilla site. + +Other Target Specific Improvements +---------------------------------- + +#. ... + +Major Changes and Removed Features +---------------------------------- + +If you're already an LLVM user or developer with out-of-tree changes based on +LLVM 3.2, this section lists some "gotchas" that you may run into upgrading +from the previous release. + +#. The CellSPU port has been removed. It can still be found in older versions. +#. ... + +Internal API Changes +-------------------- + +In addition, many APIs have changed in this release. Some of the major LLVM +API changes are: + +We've added a new interface for allowing IR-level passes to access +target-specific information. A new IR-level pass, called +``TargetTransformInfo``, provides a number of low-level interfaces. LSR and +LowerInvoke already use the new interface. + +The ``TargetData`` structure has been renamed to ``DataLayout`` and moved to +``VMCore`` to remove a dependency on ``Target`` (a short before/after sketch +appears at the end of these notes). + +#. ... + +Tools Changes +------------- + +In addition, some tools have changed in this release. Some of the changes are: + +#. ... + +Python Bindings +--------------- + +Officially supported Python bindings have been added! Feature support is far +from complete. The current bindings support interfaces to: + +#. ... + +Known Problems +============== + +LLVM is generally a production quality compiler, and is used by a broad range +of applications and shipping in many products. That said, not every subsystem +is as mature as the aggregate, particularly the more obscure targets. If you +run into a problem, please check the `LLVM bug database +<http://llvm.org/bugs/>`_ and submit a bug if there isn't already one or ask on +the `LLVMdev list <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_. + +Known problem areas include: + +#. The CellSPU, MSP430, and XCore backends are experimental. + +#. The integrated assembler, disassembler, and JIT are not supported by several + targets. If an integrated assembler is not supported, then a system + assembler is required. For more details, see the + :ref:`target-feature-matrix`. + +Additional Information +====================== + +A wide variety of additional information is available on the `LLVM web page +<http://llvm.org/>`_, in particular in the `documentation +<http://llvm.org/docs/>`_ section. The web page also contains versions of the +API documentation which are up-to-date with the Subversion version of the source +code. You can access versions of these documents specific to this release by +going into the ``llvm/docs/`` directory in the LLVM tree. + +If you have any questions or comments about LLVM, please feel free to contact +us via the `mailing lists <http://llvm.org/docs/#maillist>`_.
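Since the ``TargetData`` to ``DataLayout`` rename noted under Internal API Changes is the change most likely to touch out-of-tree passes, here is a minimal before/after sketch. The header path and the layout string are assumptions based on the 3.2 tree; check your own checkout:

.. code-block:: c++

   // Before 3.2 (roughly):
   //   #include "llvm/Target/TargetData.h"
   //   llvm::TargetData TD("e-p:32:32");
   //
   // With 3.2 the class lives in VMCore:
   #include "llvm/DataLayout.h"

   bool targetIsLittleEndian() {
     llvm::DataLayout DL("e-p:32:32");  // example layout string; "e" = little-endian
     return DL.isLittleEndian();
   }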
+ diff --git a/docs/SourceLevelDebugging.html b/docs/SourceLevelDebugging.html deleted file mode 100644 index 546aab9d1a..0000000000 --- a/docs/SourceLevelDebugging.html +++ /dev/null @@ -1,2858 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> -<html> -<head> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <title>Source Level Debugging with LLVM</title> - <link rel="stylesheet" href="_static/llvm.css" type="text/css"> -</head> -<body> - -<h1>Source Level Debugging with LLVM</h1> - -<table class="layout" style="width:100%"> - <tr class="layout"> - <td class="left"> -<ul> - <li><a href="#introduction">Introduction</a> - <ol> - <li><a href="#phil">Philosophy behind LLVM debugging information</a></li> - <li><a href="#consumers">Debug information consumers</a></li> - <li><a href="#debugopt">Debugging optimized code</a></li> - </ol></li> - <li><a href="#format">Debugging information format</a> - <ol> - <li><a href="#debug_info_descriptors">Debug information descriptors</a> - <ul> - <li><a href="#format_compile_units">Compile unit descriptors</a></li> - <li><a href="#format_files">File descriptors</a></li> - <li><a href="#format_global_variables">Global variable descriptors</a></li> - <li><a href="#format_subprograms">Subprogram descriptors</a></li> - <li><a href="#format_blocks">Block descriptors</a></li> - <li><a href="#format_basic_type">Basic type descriptors</a></li> - <li><a href="#format_derived_type">Derived type descriptors</a></li> - <li><a href="#format_composite_type">Composite type descriptors</a></li> - <li><a href="#format_subrange">Subrange descriptors</a></li> - <li><a href="#format_enumeration">Enumerator descriptors</a></li> - <li><a href="#format_variables">Local variables</a></li> - </ul></li> - <li><a href="#format_common_intrinsics">Debugger intrinsic functions</a> - <ul> - <li><a href="#format_common_declare">llvm.dbg.declare</a></li> - <li><a href="#format_common_value">llvm.dbg.value</a></li> - </ul></li> - </ol></li> - <li><a href="#format_common_lifetime">Object lifetimes and scoping</a></li> - <li><a href="#ccxx_frontend">C/C++ front-end specific debug information</a> - <ol> - <li><a href="#ccxx_compile_units">C/C++ source file information</a></li> - <li><a href="#ccxx_global_variable">C/C++ global variable information</a></li> - <li><a href="#ccxx_subprogram">C/C++ function information</a></li> - <li><a href="#ccxx_basic_types">C/C++ basic types</a></li> - <li><a href="#ccxx_derived_types">C/C++ derived types</a></li> - <li><a href="#ccxx_composite_types">C/C++ struct/union types</a></li> - <li><a href="#ccxx_enumeration_types">C/C++ enumeration types</a></li> - </ol></li> - <li><a href="#llvmdwarfextension">LLVM Dwarf Extensions</a> - <ol> - <li><a href="#objcproperty">Debugging Information Extension - for Objective C Properties</a> - <ul> - <li><a href="#objcpropertyintroduction">Introduction</a></li> - <li><a href="#objcpropertyproposal">Proposal</a></li> - <li><a href="#objcpropertynewattributes">New DWARF Attributes</a></li> - <li><a href="#objcpropertynewconstants">New DWARF Constants</a></li> - </ul> - </li> - <li><a href="#acceltable">Name Accelerator Tables</a> - <ul> - <li><a href="#acceltableintroduction">Introduction</a></li> - <li><a href="#acceltablehashes">Hash Tables</a></li> - <li><a href="#acceltabledetails">Details</a></li> - <li><a href="#acceltablecontents">Contents</a></li> - <li><a href="#acceltableextensions">Language Extensions and File Format Changes</a></li> - 
</ul> - </li> - </ol> - </li> -</ul> -</td> -</tr></table> - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a> - and <a href="mailto:jlaskey@mac.com">Jim Laskey</a></p> -</div> - - -<!-- *********************************************************************** --> -<h2><a name="introduction">Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>This document is the central repository for all information pertaining to - debug information in LLVM. It describes the <a href="#format">actual format - that the LLVM debug information</a> takes, which is useful for those - interested in creating front-ends or dealing directly with the information. - Further, this document provides specific examples of what debug information - for C/C++ looks like.</p> - -<!-- ======================================================================= --> -<h3> - <a name="phil">Philosophy behind LLVM debugging information</a> -</h3> - -<div> - -<p>The idea of the LLVM debugging information is to capture how the important - pieces of the source-language's Abstract Syntax Tree map onto LLVM code. - Several design aspects have shaped the solution that appears here. The - important ones are:</p> - -<ul> - <li>Debugging information should have very little impact on the rest of the - compiler. No transformations, analyses, or code generators should need to - be modified because of debugging information.</li> - - <li>LLVM optimizations should interact in <a href="#debugopt">well-defined and - easily described ways</a> with the debugging information.</li> - - <li>Because LLVM is designed to support arbitrary programming languages, - LLVM-to-LLVM tools should not need to know anything about the semantics of - the source-level-language.</li> - - <li>Source-level languages are often <b>widely</b> different from one another. - LLVM should not put any restrictions of the flavor of the source-language, - and the debugging information should work with any language.</li> - - <li>With code generator support, it should be possible to use an LLVM compiler - to compile a program to native machine code and standard debugging - formats. This allows compatibility with traditional machine-code level - debuggers, like GDB or DBX.</li> -</ul> - -<p>The approach used by the LLVM implementation is to use a small set - of <a href="#format_common_intrinsics">intrinsic functions</a> to define a - mapping between LLVM program objects and the source-level objects. The - description of the source-level program is maintained in LLVM metadata - in an <a href="#ccxx_frontend">implementation-defined format</a> - (the C/C++ front-end currently uses working draft 7 of - the <a href="http://www.eagercon.com/dwarf/dwarf3std.htm">DWARF 3 - standard</a>).</p> - -<p>When a program is being debugged, a debugger interacts with the user and - turns the stored debug information into source-language specific information. - As such, a debugger must be aware of the source-language, and is thus tied to - a specific language or family of languages.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="consumers">Debug information consumers</a> -</h3> - -<div> - -<p>The role of debug information is to provide meta information normally - stripped away during the compilation process. 
This meta information provides - an LLVM user a relationship between generated code and the original program - source code.</p> - -<p>Currently, debug information is consumed by DwarfDebug to produce dwarf - information used by the gdb debugger. Other targets could use the same - information to produce stabs or other debug forms.</p> - -<p>It would also be reasonable to use debug information to feed profiling tools - for analysis of generated code, or, tools for reconstructing the original - source from generated code.</p> - -<p>TODO - expound a bit more.</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="debugopt">Debugging optimized code</a> -</h3> - -<div> - -<p>An extremely high priority of LLVM debugging information is to make it - interact well with optimizations and analysis. In particular, the LLVM debug - information provides the following guarantees:</p> - -<ul> - <li>LLVM debug information <b>always provides information to accurately read - the source-level state of the program</b>, regardless of which LLVM - optimizations have been run, and without any modification to the - optimizations themselves. However, some optimizations may impact the - ability to modify the current state of the program with a debugger, such - as setting program variables, or calling functions that have been - deleted.</li> - - <li>As desired, LLVM optimizations can be upgraded to be aware of the LLVM - debugging information, allowing them to update the debugging information - as they perform aggressive optimizations. This means that, with effort, - the LLVM optimizers could optimize debug code just as well as non-debug - code.</li> - - <li>LLVM debug information does not prevent optimizations from - happening (for example inlining, basic block reordering/merging/cleanup, - tail duplication, etc).</li> - - <li>LLVM debug information is automatically optimized along with the rest of - the program, using existing facilities. For example, duplicate - information is automatically merged by the linker, and unused information - is automatically removed.</li> -</ul> - -<p>Basically, the debug information allows you to compile a program with - "<tt>-O0 -g</tt>" and get full debug information, allowing you to arbitrarily - modify the program as it executes from a debugger. Compiling a program with - "<tt>-O3 -g</tt>" gives you full debug information that is always available - and accurate for reading (e.g., you get accurate stack traces despite tail - call elimination and inlining), but you might lose the ability to modify the - program and call functions where were optimized out of the program, or - inlined away completely.</p> - -<p><a href="TestingGuide.html#quicktestsuite">LLVM test suite</a> provides a - framework to test optimizer's handling of debugging information. It can be - run like this:</p> - -<div class="doc_code"> -<pre> -% cd llvm/projects/test-suite/MultiSource/Benchmarks # or some other level -% make TEST=dbgopt -</pre> -</div> - -<p>This will test impact of debugging information on optimization passes. If - debugging information influences optimization passes then it will be reported - as a failure. 
See <a href="TestingGuide.html">TestingGuide</a> for more - information on LLVM test infrastructure and how to run various tests.</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="format">Debugging information format</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>LLVM debugging information has been carefully designed to make it possible - for the optimizer to optimize the program and debugging information without - necessarily having to know anything about debugging information. In - particular, the use of metadata avoids duplicated debugging information from - the beginning, and the global dead code elimination pass automatically - deletes debugging information for a function if it decides to delete the - function. </p> - -<p>To do this, most of the debugging information (descriptors for types, - variables, functions, source files, etc) is inserted by the language - front-end in the form of LLVM metadata. </p> - -<p>Debug information is designed to be agnostic about the target debugger and - debugging information representation (e.g. DWARF/Stabs/etc). It uses a - generic pass to decode the information that represents variables, types, - functions, namespaces, etc: this allows for arbitrary source-language - semantics and type-systems to be used, as long as there is a module - written for the target debugger to interpret the information. </p> - -<p>To provide basic functionality, the LLVM debugger does have to make some - assumptions about the source-level language being debugged, though it keeps - these to a minimum. The only common features that the LLVM debugger assumes - exist are <a href="#format_files">source files</a>, - and <a href="#format_global_variables">program objects</a>. These abstract - objects are used by a debugger to form stack traces, show information about - local variables, etc.</p> - -<p>This section of the documentation first describes the representation aspects - common to any source-language. The <a href="#ccxx_frontend">next section</a> - describes the data layout conventions used by the C and C++ front-ends.</p> - -<!-- ======================================================================= --> -<h3> - <a name="debug_info_descriptors">Debug information descriptors</a> -</h3> - -<div> - -<p>In consideration of the complexity and volume of debug information, LLVM - provides a specification for well formed debug descriptors. </p> - -<p>Consumers of LLVM debug information expect the descriptors for program - objects to start in a canonical format, but the descriptors can include - additional information appended at the end that is source-language - specific. All LLVM debugging information is versioned, allowing backwards - compatibility in the case that the core structures need to change in some - way. Also, all debugging information objects start with a tag to indicate - what type of object it is. The source-language is allowed to define its own - objects, by using unreserved tag numbers. We recommend using with tags in - the range 0x1000 through 0x2000 (there is a defined enum DW_TAG_user_base = - 0x1000.)</p> - -<p>The fields of debug descriptors used internally by LLVM - are restricted to only the simple data types <tt>i32</tt>, <tt>i1</tt>, - <tt>float</tt>, <tt>double</tt>, <tt>mdstring</tt> and <tt>mdnode</tt>. </p> - -<div class="doc_code"> -<pre> -!1 = metadata !{ - i32, ;; A tag - ... 
-} -</pre> -</div> - -<p><a name="LLVMDebugVersion">The first field of a descriptor is always an - <tt>i32</tt> containing a tag value identifying the content of the - descriptor. The remaining fields are specific to the descriptor. The values - of tags are loosely bound to the tag values of DWARF information entries. - However, that does not restrict the use of the information supplied to DWARF - targets. To facilitate versioning of debug information, the tag is augmented - with the current debug version (LLVMDebugVersion = 8 << 16 or - 0x80000 or 524288.)</a></p> - -<p>The details of the various descriptors follow.</p> - -<!-- ======================================================================= --> -<h4> - <a name="format_compile_units">Compile unit descriptors</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!0 = metadata !{ - i32, ;; Tag = 17 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> - ;; (DW_TAG_compile_unit) - i32, ;; Unused field. - i32, ;; DWARF language identifier (ex. DW_LANG_C89) - metadata, ;; Source file name - metadata, ;; Source file directory (includes trailing slash) - metadata ;; Producer (ex. "4.0.1 LLVM (LLVM research group)") - i1, ;; True if this is a main compile unit. - i1, ;; True if this is optimized. - metadata, ;; Flags - i32 ;; Runtime version - metadata ;; List of enums types - metadata ;; List of retained types - metadata ;; List of subprograms - metadata ;; List of global variables -} -</pre> -</div> - -<p>These descriptors contain a source language ID for the file (we use the DWARF - 3.0 ID numbers, such as <tt>DW_LANG_C89</tt>, <tt>DW_LANG_C_plus_plus</tt>, - <tt>DW_LANG_Cobol74</tt>, etc), three strings describing the filename, - working directory of the compiler, and an identifier string for the compiler - that produced it.</p> - -<p>Compile unit descriptors provide the root context for objects declared in a - specific compilation unit. File descriptors are defined using this context. - These descriptors are collected by a named metadata - <tt>!llvm.dbg.cu</tt>. Compile unit descriptor keeps track of subprograms, - global variables and type information. - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="format_files">File descriptors</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!0 = metadata !{ - i32, ;; Tag = 41 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> - ;; (DW_TAG_file_type) - metadata, ;; Source file name - metadata, ;; Source file directory (includes trailing slash) - metadata ;; Unused -} -</pre> -</div> - -<p>These descriptors contain information for a file. Global variables and top - level functions would be defined using this context.k File descriptors also - provide context for source line correspondence. </p> - -<p>Each input file is encoded as a separate file descriptor in LLVM debugging - information output. </p> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="format_global_variables">Global variable descriptors</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!1 = metadata !{ - i32, ;; Tag = 52 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> - ;; (DW_TAG_variable) - i32, ;; Unused field. 
- metadata, ;; Reference to context descriptor - metadata, ;; Name - metadata, ;; Display name (fully qualified C++ name) - metadata, ;; MIPS linkage name (for C++) - metadata, ;; Reference to file where defined - i32, ;; Line number where defined - metadata, ;; Reference to type descriptor - i1, ;; True if the global is local to compile unit (static) - i1, ;; True if the global is defined in the compile unit (not extern) - {}* ;; Reference to the global variable -} -</pre> -</div> - -<p>These descriptors provide debug information about global variables. They -provide details such as name, type and where the variable is defined. All -global variables are collected inside the named metadata -<tt>!llvm.dbg.cu</tt>.</p> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="format_subprograms">Subprogram descriptors</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!2 = metadata !{ - i32, ;; Tag = 46 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> - ;; (DW_TAG_subprogram) - i32, ;; Unused field. - metadata, ;; Reference to context descriptor - metadata, ;; Name - metadata, ;; Display name (fully qualified C++ name) - metadata, ;; MIPS linkage name (for C++) - metadata, ;; Reference to file where defined - i32, ;; Line number where defined - metadata, ;; Reference to type descriptor - i1, ;; True if the global is local to compile unit (static) - i1, ;; True if the global is defined in the compile unit (not extern) - i32, ;; Line number where the scope of the subprogram begins - i32, ;; Virtuality, e.g. dwarf::DW_VIRTUALITY_virtual - i32, ;; Index into a virtual function - metadata, ;; Indicates which base type contains the vtable pointer for the - ;; derived class - i32, ;; Flags - Artificial, Private, Protected, Explicit, Prototyped. - i1, ;; isOptimized - Function *,;; Pointer to LLVM function - metadata, ;; Lists function template parameters - metadata, ;; Function declaration descriptor - metadata ;; List of function variables -} -</pre> -</div> - -<p>These descriptors provide debug information about functions, methods and - subprograms. They provide details such as name, return type and the source - location where the subprogram is defined. -</p> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="format_blocks">Block descriptors</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!3 = metadata !{ - i32, ;; Tag = 11 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_lexical_block) - metadata,;; Reference to context descriptor - i32, ;; Line number - i32, ;; Column number - metadata,;; Reference to source file - i32 ;; Unique ID to identify blocks from a template function -} -</pre> -</div> - -<p>This descriptor provides debug information about nested blocks within a - subprogram. The line number and column number are used to distinguish - two lexical blocks at the same depth. </p> - -<div class="doc_code"> -<pre> -!3 = metadata !{ - i32, ;; Tag = 11 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_lexical_block) - metadata,;; Reference to the scope we're annotating with a file change - metadata,;; Reference to the file the scope is enclosed in.
-} -</pre> -</div> - -<p>This descriptor provides a wrapper around a lexical scope to handle file - changes in the middle of a lexical block.</p> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="format_basic_type">Basic type descriptors</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!4 = metadata !{ - i32, ;; Tag = 36 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> - ;; (DW_TAG_base_type) - metadata, ;; Reference to context - metadata, ;; Name (may be "" for anonymous types) - metadata, ;; Reference to file where defined (may be NULL) - i32, ;; Line number where defined (may be 0) - i64, ;; Size in bits - i64, ;; Alignment in bits - i64, ;; Offset in bits - i32, ;; Flags - i32 ;; DWARF type encoding -} -</pre> -</div> - -<p>These descriptors define primitive types used in the code. Example int, bool - and float. The context provides the scope of the type, which is usually the - top level. Since basic types are not usually user defined the context - and line number can be left as NULL and 0. The size, alignment and offset - are expressed in bits and can be 64 bit values. The alignment is used to - round the offset when embedded in a - <a href="#format_composite_type">composite type</a> (example to keep float - doubles on 64 bit boundaries.) The offset is the bit offset if embedded in - a <a href="#format_composite_type">composite type</a>.</p> - -<p>The type encoding provides the details of the type. The values are typically - one of the following:</p> - -<div class="doc_code"> -<pre> -DW_ATE_address = 1 -DW_ATE_boolean = 2 -DW_ATE_float = 4 -DW_ATE_signed = 5 -DW_ATE_signed_char = 6 -DW_ATE_unsigned = 7 -DW_ATE_unsigned_char = 8 -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="format_derived_type">Derived type descriptors</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!5 = metadata !{ - i32, ;; Tag (see below) - metadata, ;; Reference to context - metadata, ;; Name (may be "" for anonymous types) - metadata, ;; Reference to file where defined (may be NULL) - i32, ;; Line number where defined (may be 0) - i64, ;; Size in bits - i64, ;; Alignment in bits - i64, ;; Offset in bits - i32, ;; Flags to encode attributes, e.g. private - metadata, ;; Reference to type derived from - metadata, ;; (optional) Name of the Objective C property associated with - ;; Objective-C an ivar - metadata, ;; (optional) Name of the Objective C property getter selector. - metadata, ;; (optional) Name of the Objective C property setter selector. - i32 ;; (optional) Objective C property attributes. -} -</pre> -</div> - -<p>These descriptors are used to define types derived from other types. The -value of the tag varies depending on the meaning. The following are possible -tag values:</p> - -<div class="doc_code"> -<pre> -DW_TAG_formal_parameter = 5 -DW_TAG_member = 13 -DW_TAG_pointer_type = 15 -DW_TAG_reference_type = 16 -DW_TAG_typedef = 22 -DW_TAG_const_type = 38 -DW_TAG_volatile_type = 53 -DW_TAG_restrict_type = 55 -</pre> -</div> - -<p><tt>DW_TAG_member</tt> is used to define a member of - a <a href="#format_composite_type">composite type</a> - or <a href="#format_subprograms">subprogram</a>. The type of the member is - the <a href="#format_derived_type">derived - type</a>. 
<tt>DW_TAG_formal_parameter</tt> is used to define a member which - is a formal argument of a subprogram.</p> - -<p><tt>DW_TAG_typedef</tt> is used to provide a name for the derived type.</p> - -<p><tt>DW_TAG_pointer_type</tt>, <tt>DW_TAG_reference_type</tt>, - <tt>DW_TAG_const_type</tt>, <tt>DW_TAG_volatile_type</tt> and - <tt>DW_TAG_restrict_type</tt> are used to qualify - the <a href="#format_derived_type">derived type</a>. </p> - -<p><a href="#format_derived_type">Derived type</a> location can be determined - from the context and line number. The size, alignment and offset are - expressed in bits and can be 64 bit values. The alignment is used to round - the offset when embedded in a <a href="#format_composite_type">composite - type</a> (example to keep float doubles on 64 bit boundaries.) The offset is - the bit offset if embedded in a <a href="#format_composite_type">composite - type</a>.</p> - -<p>Note that the <tt>void *</tt> type is expressed as a type derived from NULL. -</p> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="format_composite_type">Composite type descriptors</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!6 = metadata !{ - i32, ;; Tag (see below) - metadata, ;; Reference to context - metadata, ;; Name (may be "" for anonymous types) - metadata, ;; Reference to file where defined (may be NULL) - i32, ;; Line number where defined (may be 0) - i64, ;; Size in bits - i64, ;; Alignment in bits - i64, ;; Offset in bits - i32, ;; Flags - metadata, ;; Reference to type derived from - metadata, ;; Reference to array of member descriptors - i32 ;; Runtime languages -} -</pre> -</div> - -<p>These descriptors are used to define types that are composed of 0 or more -elements. The value of the tag varies depending on the meaning. The following -are possible tag values:</p> - -<div class="doc_code"> -<pre> -DW_TAG_array_type = 1 -DW_TAG_enumeration_type = 4 -DW_TAG_structure_type = 19 -DW_TAG_union_type = 23 -DW_TAG_vector_type = 259 -DW_TAG_subroutine_type = 21 -DW_TAG_inheritance = 28 -</pre> -</div> - -<p>The vector flag indicates that an array type is a native packed vector.</p> - -<p>The members of array types (tag = <tt>DW_TAG_array_type</tt>) or vector types - (tag = <tt>DW_TAG_vector_type</tt>) are <a href="#format_subrange">subrange - descriptors</a>, each representing the range of subscripts at that level of - indexing.</p> - -<p>The members of enumeration types (tag = <tt>DW_TAG_enumeration_type</tt>) are - <a href="#format_enumeration">enumerator descriptors</a>, each representing - the definition of enumeration value for the set. All enumeration type - descriptors are collected inside the named metadata - <tt>!llvm.dbg.cu</tt>.</p> - -<p>The members of structure (tag = <tt>DW_TAG_structure_type</tt>) or union (tag - = <tt>DW_TAG_union_type</tt>) types are any one of - the <a href="#format_basic_type">basic</a>, - <a href="#format_derived_type">derived</a> - or <a href="#format_composite_type">composite</a> type descriptors, each - representing a field member of the structure or union.</p> - -<p>For C++ classes (tag = <tt>DW_TAG_structure_type</tt>), member descriptors - provide information about base classes, static members and member - functions. If a member is a <a href="#format_derived_type">derived type - descriptor</a> and has a tag of <tt>DW_TAG_inheritance</tt>, then the type - represents a base class. 
If the member is - a <a href="#format_global_variables">global variable descriptor</a> then it - represents a static member. And if the member is - a <a href="#format_subprograms">subprogram descriptor</a> then it represents - a member function. For static members and member - functions, <tt>getName()</tt> returns the member's linkage name or the C++ mangled - name. <tt>getDisplayName()</tt> returns the simplified version of the name.</p> - -<p>The first member of subroutine (tag = <tt>DW_TAG_subroutine_type</tt>) type - elements is the return type for the subroutine. The remaining elements are - the formal arguments to the subroutine.</p> - -<p><a href="#format_composite_type">Composite type</a> location can be - determined from the context and line number. The size, alignment and - offset are expressed in bits and can be 64 bit values. The alignment is used - to round the offset when embedded in - a <a href="#format_composite_type">composite type</a> (as an example, to keep - float doubles on 64 bit boundaries.) The offset is the bit offset if embedded - in a <a href="#format_composite_type">composite type</a>.</p> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="format_subrange">Subrange descriptors</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!42 = metadata !{ - i32, ;; Tag = 33 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_subrange_type) - i64, ;; Low value - i64 ;; High value -} -</pre> -</div> - -<p>These descriptors are used to define ranges of array subscripts for an array - <a href="#format_composite_type">composite type</a>. The low value defines - the lower bound, typically zero for C/C++. The high value is the upper - bound. Values are 64 bit. High - low + 1 is the size of the array. If low - > high the array bounds are not included in generated debugging information. -</p> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="format_enumeration">Enumerator descriptors</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!6 = metadata !{ - i32, ;; Tag = 40 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> - ;; (DW_TAG_enumerator) - metadata, ;; Name - i64 ;; Value -} -</pre> -</div> - -<p>These descriptors are used to define members of an - enumeration <a href="#format_composite_type">composite type</a>; each - associates a name with a value.</p> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="format_variables">Local variables</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!7 = metadata !{ - i32, ;; Tag (see below) - metadata, ;; Context - metadata, ;; Name - metadata, ;; Reference to file where defined - i32, ;; 24 bit - Line number where defined - ;; 8 bit - Argument number. 1 indicates 1st argument. - metadata, ;; Type descriptor - i32, ;; Flags - metadata ;; (optional) Reference to inline location -} -</pre> -</div> - -<p>These descriptors are used to define variables local to a subprogram. The - value of the tag depends on the usage of the variable:</p> - -<div class="doc_code"> -<pre> -DW_TAG_auto_variable = 256 -DW_TAG_arg_variable = 257 -DW_TAG_return_variable = 258 -</pre> -</div> - -<p>An auto variable is any variable declared in the body of the function. An - argument variable is any variable that appears as a formal argument to the - function.
A return variable is used to track the result of a function and - has no source correspondent.</p> - -<p>The context is either the subprogram or block where the variable is defined. - Name the source variable name. Context and line indicate where the - variable was defined. Type descriptor defines the declared type of the - variable.</p> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="format_common_intrinsics">Debugger intrinsic functions</a> -</h3> - -<div> - -<p>LLVM uses several intrinsic functions (name prefixed with "llvm.dbg") to - provide debug information at various points in generated code.</p> - -<!-- ======================================================================= --> -<h4> - <a name="format_common_declare">llvm.dbg.declare</a> -</h4> - -<div> -<pre> - void %<a href="#format_common_declare">llvm.dbg.declare</a>(metadata, metadata) -</pre> - -<p>This intrinsic provides information about a local element (e.g., variable). The - first argument is metadata holding the alloca for the variable. The - second argument is metadata containing a description of the variable.</p> -</div> - -<!-- ======================================================================= --> -<h4> - <a name="format_common_value">llvm.dbg.value</a> -</h4> - -<div> -<pre> - void %<a href="#format_common_value">llvm.dbg.value</a>(metadata, i64, metadata) -</pre> - -<p>This intrinsic provides information when a user source variable is set to a - new value. The first argument is the new value (wrapped as metadata). The - second argument is the offset in the user source variable where the new value - is written. The third argument is metadata containing a description of the - user source variable.</p> -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="format_common_lifetime">Object lifetimes and scoping</a> -</h3> - -<div> -<p>In many languages, the local variables in functions can have their lifetimes - or scopes limited to a subset of a function. In the C family of languages, - for example, variables are only live (readable and writable) within the - source block that they are defined in. In functional languages, values are - only readable after they have been defined. Though this is a very obvious - concept, it is non-trivial to model in LLVM, because it has no notion of - scoping in this sense, and does not want to be tied to a language's scoping - rules.</p> - -<p>In order to handle this, the LLVM debug format uses the metadata attached to - llvm instructions to encode line number and scoping information. Consider - the following C fragment, for example:</p> - -<div class="doc_code"> -<pre> -1. void foo() { -2. int X = 21; -3. int Y = 22; -4. { -5. int Z = 23; -6. Z = X; -7. } -8. X = Y; -9. 
} -</pre> -</div> - -<p>Compiled to LLVM, this function would be represented like this:</p> - -<div class="doc_code"> -<pre> -define void @foo() nounwind ssp { -entry: - %X = alloca i32, align 4 ; <i32*> [#uses=4] - %Y = alloca i32, align 4 ; <i32*> [#uses=4] - %Z = alloca i32, align 4 ; <i32*> [#uses=3] - %0 = bitcast i32* %X to {}* ; <{}*> [#uses=1] - call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7 - store i32 21, i32* %X, !dbg !8 - %1 = bitcast i32* %Y to {}* ; <{}*> [#uses=1] - call void @llvm.dbg.declare(metadata !{i32 * %Y}, metadata !9), !dbg !10 - store i32 22, i32* %Y, !dbg !11 - %2 = bitcast i32* %Z to {}* ; <{}*> [#uses=1] - call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14 - store i32 23, i32* %Z, !dbg !15 - %tmp = load i32* %X, !dbg !16 ; <i32> [#uses=1] - %tmp1 = load i32* %Y, !dbg !16 ; <i32> [#uses=1] - %add = add nsw i32 %tmp, %tmp1, !dbg !16 ; <i32> [#uses=1] - store i32 %add, i32* %Z, !dbg !16 - %tmp2 = load i32* %Y, !dbg !17 ; <i32> [#uses=1] - store i32 %tmp2, i32* %X, !dbg !17 - ret void, !dbg !18 -} - -declare void @llvm.dbg.declare(metadata, metadata) nounwind readnone - -!0 = metadata !{i32 459008, metadata !1, metadata !"X", - metadata !3, i32 2, metadata !6}; [ DW_TAG_auto_variable ] -!1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ] -!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", metadata !"foo", - metadata !"foo", metadata !3, i32 1, metadata !4, - i1 false, i1 true}; [DW_TAG_subprogram ] -!3 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c", - metadata !"/private/tmp", metadata !"clang 1.1", i1 true, - i1 false, metadata !"", i32 0}; [DW_TAG_compile_unit ] -!4 = metadata !{i32 458773, metadata !3, metadata !"", null, i32 0, i64 0, i64 0, - i64 0, i32 0, null, metadata !5, i32 0}; [DW_TAG_subroutine_type ] -!5 = metadata !{null} -!6 = metadata !{i32 458788, metadata !3, metadata !"int", metadata !3, i32 0, - i64 32, i64 32, i64 0, i32 0, i32 5}; [DW_TAG_base_type ] -!7 = metadata !{i32 2, i32 7, metadata !1, null} -!8 = metadata !{i32 2, i32 3, metadata !1, null} -!9 = metadata !{i32 459008, metadata !1, metadata !"Y", metadata !3, i32 3, - metadata !6}; [ DW_TAG_auto_variable ] -!10 = metadata !{i32 3, i32 7, metadata !1, null} -!11 = metadata !{i32 3, i32 3, metadata !1, null} -!12 = metadata !{i32 459008, metadata !13, metadata !"Z", metadata !3, i32 5, - metadata !6}; [ DW_TAG_auto_variable ] -!13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ] -!14 = metadata !{i32 5, i32 9, metadata !13, null} -!15 = metadata !{i32 5, i32 5, metadata !13, null} -!16 = metadata !{i32 6, i32 5, metadata !13, null} -!17 = metadata !{i32 8, i32 3, metadata !1, null} -!18 = metadata !{i32 9, i32 1, metadata !2, null} -</pre> -</div> - -<p>This example illustrates a few important details about LLVM debugging - information. In particular, it shows how the <tt>llvm.dbg.declare</tt> - intrinsic and location information, which are attached to an instruction, - are applied together to allow a debugger to analyze the relationship between - statements, variable definitions, and the code used to implement the - function.</p> - -<div class="doc_code"> -<pre> -call void @llvm.dbg.declare(metadata, metadata !0), !dbg !7 -</pre> -</div> - -<p>The first intrinsic - <tt>%<a href="#format_common_declare">llvm.dbg.declare</a></tt> - encodes debugging information for the variable <tt>X</tt>. 
The metadata - <tt>!dbg !7</tt> attached to the intrinsic provides scope information for the - variable <tt>X</tt>.</p> - -<div class="doc_code"> -<pre> -!7 = metadata !{i32 2, i32 7, metadata !1, null} -!1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ] -!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", - metadata !"foo", metadata !"foo", metadata !3, i32 1, - metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ] -</pre> -</div> - -<p>Here <tt>!7</tt> is metadata providing location information. It has four - fields: line number, column number, scope, and original scope. The original - scope represents inline location if this instruction is inlined inside a - caller, and is null otherwise. In this example, scope is encoded by - <tt>!1</tt>. <tt>!1</tt> represents a lexical block inside the scope - <tt>!2</tt>, where <tt>!2</tt> is a - <a href="#format_subprograms">subprogram descriptor</a>. This way the - location information attached to the intrinsics indicates that the - variable <tt>X</tt> is declared at line number 2 at a function level scope in - function <tt>foo</tt>.</p> - -<p>Now lets take another example.</p> - -<div class="doc_code"> -<pre> -call void @llvm.dbg.declare(metadata, metadata !12), !dbg !14 -</pre> -</div> - -<p>The second intrinsic - <tt>%<a href="#format_common_declare">llvm.dbg.declare</a></tt> - encodes debugging information for variable <tt>Z</tt>. The metadata - <tt>!dbg !14</tt> attached to the intrinsic provides scope information for - the variable <tt>Z</tt>.</p> - -<div class="doc_code"> -<pre> -!13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ] -!14 = metadata !{i32 5, i32 9, metadata !13, null} -</pre> -</div> - -<p>Here <tt>!14</tt> indicates that <tt>Z</tt> is declared at line number 5 and - column number 9 inside of lexical scope <tt>!13</tt>. The lexical scope - itself resides inside of lexical scope <tt>!1</tt> described above.</p> - -<p>The scope information attached with each instruction provides a - straightforward way to find instructions covered by a scope.</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="ccxx_frontend">C/C++ front-end specific debug information</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p>The C and C++ front-ends represent information about the program in a format - that is effectively identical - to <a href="http://www.eagercon.com/dwarf/dwarf3std.htm">DWARF 3.0</a> in - terms of information content. This allows code generators to trivially - support native debuggers by generating standard dwarf information, and - contains enough information for non-dwarf targets to translate it as - needed.</p> - -<p>This section describes the forms used to represent C and C++ programs. Other - languages could pattern themselves after this (which itself is tuned to - representing programs in the same way that DWARF 3 does), or they could - choose to provide completely different forms if they don't fit into the DWARF - model. 
As support for debugging information gets added to the various LLVM - source-language front-ends, the information used should be documented - here.</p> - -<p>The following sections provide examples of various C/C++ constructs and the - debug information that would best describe those constructs.</p> - -<!-- ======================================================================= --> -<h3> - <a name="ccxx_compile_units">C/C++ source file information</a> -</h3> - -<div> - -<p>Given the source files <tt>MySource.cpp</tt> and <tt>MyHeader.h</tt> located - in the directory <tt>/Users/mine/sources</tt>, the following code:</p> - -<div class="doc_code"> -<pre> -#include "MyHeader.h" - -int main(int argc, char *argv[]) { - return 0; -} -</pre> -</div> - -<p>a C/C++ front-end would generate the following descriptors:</p> - -<div class="doc_code"> -<pre> -... -;; -;; Define the compile unit for the main source file "/Users/mine/sources/MySource.cpp". -;; -!2 = metadata !{ - i32 524305, ;; Tag - i32 0, ;; Unused - i32 4, ;; Language Id - metadata !"MySource.cpp", - metadata !"/Users/mine/sources", - metadata !"4.2.1 (Based on Apple Inc. build 5649) (LLVM build 00)", - i1 true, ;; Main Compile Unit - i1 false, ;; Optimized compile unit - metadata !"", ;; Compiler flags - i32 0} ;; Runtime version - -;; -;; Define the file for the file "/Users/mine/sources/MySource.cpp". -;; -!1 = metadata !{ - i32 524329, ;; Tag - metadata !"MySource.cpp", - metadata !"/Users/mine/sources", - metadata !2 ;; Compile unit -} - -;; -;; Define the file for the file "/Users/mine/sources/Myheader.h" -;; -!3 = metadata !{ - i32 524329, ;; Tag - metadata !"Myheader.h" - metadata !"/Users/mine/sources", - metadata !2 ;; Compile unit -} - -... -</pre> -</div> - -<p>llvm::Instruction provides easy access to metadata attached with an -instruction. One can extract line number information encoded in LLVM IR -using <tt>Instruction::getMetadata()</tt> and -<tt>DILocation::getLineNumber()</tt>. -<pre> - if (MDNode *N = I->getMetadata("dbg")) { // Here I is an LLVM instruction - DILocation Loc(N); // DILocation is in DebugInfo.h - unsigned Line = Loc.getLineNumber(); - StringRef File = Loc.getFilename(); - StringRef Dir = Loc.getDirectory(); - } -</pre> -</div> - -<!-- ======================================================================= --> -<h3> - <a name="ccxx_global_variable">C/C++ global variable information</a> -</h3> - -<div> - -<p>Given an integer global variable declared as follows:</p> - -<div class="doc_code"> -<pre> -int MyGlobal = 100; -</pre> -</div> - -<p>a C/C++ front-end would generate the following descriptors:</p> - -<div class="doc_code"> -<pre> -;; -;; Define the global itself. -;; -%MyGlobal = global int 100 -... -;; -;; List of debug info of globals -;; -!llvm.dbg.cu = !{!0} - -;; Define the compile unit. -!0 = metadata !{ - i32 786449, ;; Tag - i32 0, ;; Context - i32 4, ;; Language - metadata !"foo.cpp", ;; File - metadata !"/Volumes/Data/tmp", ;; Directory - metadata !"clang version 3.1 ", ;; Producer - i1 true, ;; Deprecated field - i1 false, ;; "isOptimized"? - metadata !"", ;; Flags - i32 0, ;; Runtime Version - metadata !1, ;; Enum Types - metadata !1, ;; Retained Types - metadata !1, ;; Subprograms - metadata !3 ;; Global Variables -} ; [ DW_TAG_compile_unit ] - -;; The Array of Global Variables -!3 = metadata !{ - metadata !4 -} - -!4 = metadata !{ - metadata !5 -} - -;; -;; Define the global variable itself. 
-;; -!5 = metadata !{ - i32 786484, ;; Tag - i32 0, ;; Unused - null, ;; Unused - metadata !"MyGlobal", ;; Name - metadata !"MyGlobal", ;; Display Name - metadata !"", ;; Linkage Name - metadata !6, ;; File - i32 1, ;; Line - metadata !7, ;; Type - i32 0, ;; IsLocalToUnit - i32 1, ;; IsDefinition - i32* @MyGlobal ;; LLVM-IR Value -} ; [ DW_TAG_variable ] - -;; -;; Define the file -;; -!6 = metadata !{ - i32 786473, ;; Tag - metadata !"foo.cpp", ;; File - metadata !"/Volumes/Data/tmp", ;; Directory - null ;; Unused -} ; [ DW_TAG_file_type ] - -;; -;; Define the type -;; -!7 = metadata !{ - i32 786468, ;; Tag - null, ;; Unused - metadata !"int", ;; Name - null, ;; Unused - i32 0, ;; Line - i64 32, ;; Size in Bits - i64 32, ;; Align in Bits - i64 0, ;; Offset - i32 0, ;; Flags - i32 5 ;; Encoding -} ; [ DW_TAG_base_type ] - -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="ccxx_subprogram">C/C++ function information</a> -</h3> - -<div> - -<p>Given a function declared as follows:</p> - -<div class="doc_code"> -<pre> -int main(int argc, char *argv[]) { - return 0; -} -</pre> -</div> - -<p>a C/C++ front-end would generate the following descriptors:</p> - -<div class="doc_code"> -<pre> -;; -;; Define the anchor for subprograms. Note that the second field of the -;; anchor is 46, which is the same as the tag for subprograms -;; (46 = DW_TAG_subprogram.) -;; -!6 = metadata !{ - i32 524334, ;; Tag - i32 0, ;; Unused - metadata !1, ;; Context - metadata !"main", ;; Name - metadata !"main", ;; Display name - metadata !"main", ;; Linkage name - metadata !1, ;; File - i32 1, ;; Line number - metadata !4, ;; Type - i1 false, ;; Is local - i1 true, ;; Is definition - i32 0, ;; Virtuality attribute, e.g. pure virtual function - i32 0, ;; Index into virtual table for C++ methods - i32 0, ;; Type that holds virtual table. - i32 0, ;; Flags - i1 false, ;; True if this function is optimized - Function *, ;; Pointer to llvm::Function - null ;; Function template parameters -} -;; -;; Define the subprogram itself. -;; -define i32 @main(i32 %argc, i8** %argv) { -... 
-} -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="ccxx_basic_types">C/C++ basic types</a> -</h3> - -<div> - -<p>The following are the basic type descriptors for C/C++ core types:</p> - -<!-- ======================================================================= --> -<h4> - <a name="ccxx_basic_type_bool">bool</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!2 = metadata !{ - i32 524324, ;; Tag - metadata !1, ;; Context - metadata !"bool", ;; Name - metadata !1, ;; File - i32 0, ;; Line number - i64 8, ;; Size in Bits - i64 8, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 2 ;; Encoding -} -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="ccxx_basic_char">char</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!2 = metadata !{ - i32 524324, ;; Tag - metadata !1, ;; Context - metadata !"char", ;; Name - metadata !1, ;; File - i32 0, ;; Line number - i64 8, ;; Size in Bits - i64 8, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 6 ;; Encoding -} -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="ccxx_basic_unsigned_char">unsigned char</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!2 = metadata !{ - i32 524324, ;; Tag - metadata !1, ;; Context - metadata !"unsigned char", - metadata !1, ;; File - i32 0, ;; Line number - i64 8, ;; Size in Bits - i64 8, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 8 ;; Encoding -} -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="ccxx_basic_short">short</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!2 = metadata !{ - i32 524324, ;; Tag - metadata !1, ;; Context - metadata !"short int", - metadata !1, ;; File - i32 0, ;; Line number - i64 16, ;; Size in Bits - i64 16, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 5 ;; Encoding -} -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="ccxx_basic_unsigned_short">unsigned short</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!2 = metadata !{ - i32 524324, ;; Tag - metadata !1, ;; Context - metadata !"short unsigned int", - metadata !1, ;; File - i32 0, ;; Line number - i64 16, ;; Size in Bits - i64 16, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 7 ;; Encoding -} -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="ccxx_basic_int">int</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!2 = metadata !{ - i32 524324, ;; Tag - metadata !1, ;; Context - metadata !"int", ;; Name - metadata !1, ;; File - i32 0, ;; Line number - i64 32, ;; Size in Bits - i64 32, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 5 ;; Encoding -} -</pre></div> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="ccxx_basic_unsigned_int">unsigned int</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!2 = metadata !{ - i32 524324, ;; Tag - metadata !1, ;; Context - metadata !"unsigned int", - metadata !1, ;; File - i32 0, ;; Line number - i64 32, ;; Size in Bits - i64 32, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 7 ;; Encoding -} -</pre> -</div> - 
-</div> - -<!-- ======================================================================= --> -<h4> - <a name="ccxx_basic_long_long">long long</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!2 = metadata !{ - i32 524324, ;; Tag - metadata !1, ;; Context - metadata !"long long int", - metadata !1, ;; File - i32 0, ;; Line number - i64 64, ;; Size in Bits - i64 64, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 5 ;; Encoding -} -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="ccxx_basic_unsigned_long_long">unsigned long long</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!2 = metadata !{ - i32 524324, ;; Tag - metadata !1, ;; Context - metadata !"long long unsigned int", - metadata !1, ;; File - i32 0, ;; Line number - i64 64, ;; Size in Bits - i64 64, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 7 ;; Encoding -} -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="ccxx_basic_float">float</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!2 = metadata !{ - i32 524324, ;; Tag - metadata !1, ;; Context - metadata !"float", - metadata !1, ;; File - i32 0, ;; Line number - i64 32, ;; Size in Bits - i64 32, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 4 ;; Encoding -} -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="ccxx_basic_double">double</a> -</h4> - -<div> - -<div class="doc_code"> -<pre> -!2 = metadata !{ - i32 524324, ;; Tag - metadata !1, ;; Context - metadata !"double",;; Name - metadata !1, ;; File - i32 0, ;; Line number - i64 64, ;; Size in Bits - i64 64, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 4 ;; Encoding -} -</pre> -</div> - -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="ccxx_derived_types">C/C++ derived types</a> -</h3> - -<div> - -<p>Given the following as an example of C/C++ derived type:</p> - -<div class="doc_code"> -<pre> -typedef const int *IntPtr; -</pre> -</div> - -<p>a C/C++ front-end would generate the following descriptors:</p> - -<div class="doc_code"> -<pre> -;; -;; Define the typedef "IntPtr". -;; -!2 = metadata !{ - i32 524310, ;; Tag - metadata !1, ;; Context - metadata !"IntPtr", ;; Name - metadata !3, ;; File - i32 0, ;; Line number - i64 0, ;; Size in bits - i64 0, ;; Align in bits - i64 0, ;; Offset in bits - i32 0, ;; Flags - metadata !4 ;; Derived From type -} - -;; -;; Define the pointer type. -;; -!4 = metadata !{ - i32 524303, ;; Tag - metadata !1, ;; Context - metadata !"", ;; Name - metadata !1, ;; File - i32 0, ;; Line number - i64 64, ;; Size in bits - i64 64, ;; Align in bits - i64 0, ;; Offset in bits - i32 0, ;; Flags - metadata !5 ;; Derived From type -} -;; -;; Define the const type. -;; -!5 = metadata !{ - i32 524326, ;; Tag - metadata !1, ;; Context - metadata !"", ;; Name - metadata !1, ;; File - i32 0, ;; Line number - i64 32, ;; Size in bits - i64 32, ;; Align in bits - i64 0, ;; Offset in bits - i32 0, ;; Flags - metadata !6 ;; Derived From type -} -;; -;; Define the int type. 
-;; -!6 = metadata !{ - i32 524324, ;; Tag - metadata !1, ;; Context - metadata !"int", ;; Name - metadata !1, ;; File - i32 0, ;; Line number - i64 32, ;; Size in bits - i64 32, ;; Align in bits - i64 0, ;; Offset in bits - i32 0, ;; Flags - 5 ;; Encoding -} -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="ccxx_composite_types">C/C++ struct/union types</a> -</h3> - -<div> - -<p>Given the following as an example of C/C++ struct type:</p> - -<div class="doc_code"> -<pre> -struct Color { - unsigned Red; - unsigned Green; - unsigned Blue; -}; -</pre> -</div> - -<p>a C/C++ front-end would generate the following descriptors:</p> - -<div class="doc_code"> -<pre> -;; -;; Define basic type for unsigned int. -;; -!5 = metadata !{ - i32 524324, ;; Tag - metadata !1, ;; Context - metadata !"unsigned int", - metadata !1, ;; File - i32 0, ;; Line number - i64 32, ;; Size in Bits - i64 32, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 7 ;; Encoding -} -;; -;; Define composite type for struct Color. -;; -!2 = metadata !{ - i32 524307, ;; Tag - metadata !1, ;; Context - metadata !"Color", ;; Name - metadata !1, ;; Compile unit - i32 1, ;; Line number - i64 96, ;; Size in bits - i64 32, ;; Align in bits - i64 0, ;; Offset in bits - i32 0, ;; Flags - null, ;; Derived From - metadata !3, ;; Elements - i32 0 ;; Runtime Language -} - -;; -;; Define the Red field. -;; -!4 = metadata !{ - i32 524301, ;; Tag - metadata !1, ;; Context - metadata !"Red", ;; Name - metadata !1, ;; File - i32 2, ;; Line number - i64 32, ;; Size in bits - i64 32, ;; Align in bits - i64 0, ;; Offset in bits - i32 0, ;; Flags - metadata !5 ;; Derived From type -} - -;; -;; Define the Green field. -;; -!6 = metadata !{ - i32 524301, ;; Tag - metadata !1, ;; Context - metadata !"Green", ;; Name - metadata !1, ;; File - i32 3, ;; Line number - i64 32, ;; Size in bits - i64 32, ;; Align in bits - i64 32, ;; Offset in bits - i32 0, ;; Flags - metadata !5 ;; Derived From type -} - -;; -;; Define the Blue field. -;; -!7 = metadata !{ - i32 524301, ;; Tag - metadata !1, ;; Context - metadata !"Blue", ;; Name - metadata !1, ;; File - i32 4, ;; Line number - i64 32, ;; Size in bits - i64 32, ;; Align in bits - i64 64, ;; Offset in bits - i32 0, ;; Flags - metadata !5 ;; Derived From type -} - -;; -;; Define the array of fields used by the composite type Color. -;; -!3 = metadata !{metadata !4, metadata !6, metadata !7} -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="ccxx_enumeration_types">C/C++ enumeration types</a> -</h3> - -<div> - -<p>Given the following as an example of C/C++ enumeration type:</p> - -<div class="doc_code"> -<pre> -enum Trees { - Spruce = 100, - Oak = 200, - Maple = 300 -}; -</pre> -</div> - -<p>a C/C++ front-end would generate the following descriptors:</p> - -<div class="doc_code"> -<pre> -;; -;; Define composite type for enum Trees -;; -!2 = metadata !{ - i32 524292, ;; Tag - metadata !1, ;; Context - metadata !"Trees", ;; Name - metadata !1, ;; File - i32 1, ;; Line number - i64 32, ;; Size in bits - i64 32, ;; Align in bits - i64 0, ;; Offset in bits - i32 0, ;; Flags - null, ;; Derived From type - metadata !3, ;; Elements - i32 0 ;; Runtime language -} - -;; -;; Define the array of enumerators used by composite type Trees. -;; -!3 = metadata !{metadata !4, metadata !5, metadata !6} - -;; -;; Define Spruce enumerator. 
-;; -!4 = metadata !{i32 524328, metadata !"Spruce", i64 100} - -;; -;; Define Oak enumerator. -;; -!5 = metadata !{i32 524328, metadata !"Oak", i64 200} - -;; -;; Define Maple enumerator. -;; -!6 = metadata !{i32 524328, metadata !"Maple", i64 300} - -</pre> -</div> - -</div> - -</div> - - -<!-- *********************************************************************** --> -<h2> - <a name="llvmdwarfextension">Debugging information format</a> -</h2> -<!-- *********************************************************************** --> -<div> -<!-- ======================================================================= --> -<h3> - <a name="objcproperty">Debugging Information Extension for Objective C Properties</a> -</h3> -<div> -<!-- *********************************************************************** --> -<h4> - <a name="objcpropertyintroduction">Introduction</a> -</h4> -<!-- *********************************************************************** --> - -<div> -<p>Objective C provides a simpler way to declare and define accessor methods -using declared properties. The language provides features to declare a -property and to let the compiler synthesize accessor methods. -</p> - -<p>The debugger lets developers inspect Objective C interfaces and their -instance variables and class variables. However, the debugger does not know -anything about the properties defined in Objective C interfaces. The debugger -consumes information generated by the compiler in DWARF format. The format does -not support encoding of Objective C properties. This proposal describes DWARF -extensions to encode Objective C properties, which the debugger can use to let -developers inspect Objective C properties. -</p> - -</div> - - -<!-- *********************************************************************** --> -<h4> - <a name="objcpropertyproposal">Proposal</a> -</h4> -<!-- *********************************************************************** --> - -<div> -<p>Objective C properties exist separately from class members. A property -can be defined only by "setter" and "getter" selectors, and -be calculated anew on each access. Or a property can just be a direct access -to some declared ivar. Finally it can have an ivar "automatically -synthesized" for it by the compiler, in which case the property can be -referred to in user code directly using the standard C dereference syntax as -well as through the property "dot" syntax, but there is no entry in -the @interface declaration corresponding to this ivar. -</p> -<p> -To facilitate debugging these properties, we will add a new DWARF TAG into the -DW_TAG_structure_type definition for the class to hold the description of a -given property, and a set of DWARF attributes that provide said description. -The property tag will also contain the name and declared type of the property. -</p> -<p> -If there is a related ivar, there will also be a DWARF property attribute placed -in the DW_TAG_member DIE for that ivar referring back to the property TAG for -that property. And in the case where the compiler synthesizes the ivar directly, -the compiler is expected to generate a DW_TAG_member for that ivar (with the -DW_AT_artificial set to 1), whose name will be the name used to access this -ivar directly in code, and with the property attribute pointing back to the -property it is backing.
-</p> -<p> -The following examples will serve as illustration for our discussion: -</p> - -<div class="doc_code"> -<pre> -@interface I1 { - int n2; -} - -@property int p1; -@property int p2; -@end - -@implementation I1 -@synthesize p1; -@synthesize p2 = n2; -@end -</pre> -</div> - -<p> -This produces the following DWARF (this is a "pseudo dwarfdump" output): -</p> -<div class="doc_code"> -<pre> -0x00000100: TAG_structure_type [7] * - AT_APPLE_runtime_class( 0x10 ) - AT_name( "I1" ) - AT_decl_file( "Objc_Property.m" ) - AT_decl_line( 3 ) - -0x00000110 TAG_APPLE_property - AT_name ( "p1" ) - AT_type ( {0x00000150} ( int ) ) - -0x00000120: TAG_APPLE_property - AT_name ( "p2" ) - AT_type ( {0x00000150} ( int ) ) - -0x00000130: TAG_member [8] - AT_name( "_p1" ) - AT_APPLE_property ( {0x00000110} "p1" ) - AT_type( {0x00000150} ( int ) ) - AT_artificial ( 0x1 ) - -0x00000140: TAG_member [8] - AT_name( "n2" ) - AT_APPLE_property ( {0x00000120} "p2" ) - AT_type( {0x00000150} ( int ) ) - -0x00000150: AT_type( ( int ) ) -</pre> -</div> - -<p> Note, the current convention is that the name of the ivar for an -auto-synthesized property is the name of the property from which it derives with -an underscore prepended, as is shown in the example. -But we actually don't need to know this convention, since we are given the name -of the ivar directly. -</p> - -<p> -Also, it is common practice in ObjC to have different property declarations in -the @interface and @implementation - e.g. to provide a read-only property in -the interface,and a read-write interface in the implementation. In that case, -the compiler should emit whichever property declaration will be in force in the -current translation unit. -</p> - -<p> Developers can decorate a property with attributes which are encoded using -DW_AT_APPLE_property_attribute. -</p> - -<div class="doc_code"> -<pre> -@property (readonly, nonatomic) int pr; -</pre> -</div> -<p> -Which produces a property tag: -<p> -<div class="doc_code"> -<pre> -TAG_APPLE_property [8] - AT_name( "pr" ) - AT_type ( {0x00000147} (int) ) - AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic) -</pre> -</div> - -<p> The setter and getter method names are attached to the property using -DW_AT_APPLE_property_setter and DW_AT_APPLE_property_getter attributes. 
-</p> -<div class="doc_code"> -<pre> -@interface I1 -@property (setter=myOwnP3Setter:) int p3; --(void)myOwnP3Setter:(int)a; -@end - -@implementation I1 -@synthesize p3; --(void)myOwnP3Setter:(int)a{ } -@end -</pre> -</div> - -<p> -The DWARF for this would be: -</p> -<div class="doc_code"> -<pre> -0x000003bd: TAG_structure_type [7] * - AT_APPLE_runtime_class( 0x10 ) - AT_name( "I1" ) - AT_decl_file( "Objc_Property.m" ) - AT_decl_line( 3 ) - -0x000003cd TAG_APPLE_property - AT_name ( "p3" ) - AT_APPLE_property_setter ( "myOwnP3Setter:" ) - AT_type( {0x00000147} ( int ) ) - -0x000003f3: TAG_member [8] - AT_name( "_p3" ) - AT_type ( {0x00000147} ( int ) ) - AT_APPLE_property ( {0x000003cd} ) - AT_artificial ( 0x1 ) -</pre> -</div> - -</div> - -<!-- *********************************************************************** --> -<h4> - <a name="objcpropertynewtags">New DWARF Tags</a> -</h4> -<!-- *********************************************************************** --> - -<div> -<table border="1" cellspacing="0"> - <col width="200"> - <col width="200"> - <tr> - <th>TAG</th> - <th>Value</th> - </tr> - <tr> - <td>DW_TAG_APPLE_property</td> - <td>0x4200</td> - </tr> -</table> - -</div> - -<!-- *********************************************************************** --> -<h4> - <a name="objcpropertynewattributes">New DWARF Attributes</a> -</h4> -<!-- *********************************************************************** --> - -<div> -<table border="1" cellspacing="0"> - <col width="200"> - <col width="200"> - <col width="200"> - <tr> - <th>Attribute</th> - <th>Value</th> - <th>Classes</th> - </tr> - <tr> - <td>DW_AT_APPLE_property</td> - <td>0x3fed</td> - <td>Reference</td> - </tr> - <tr> - <td>DW_AT_APPLE_property_getter</td> - <td>0x3fe9</td> - <td>String</td> - </tr> - <tr> - <td>DW_AT_APPLE_property_setter</td> - <td>0x3fea</td> - <td>String</td> - </tr> - <tr> - <td>DW_AT_APPLE_property_attribute</td> - <td>0x3feb</td> - <td>Constant</td> - </tr> -</table> - -</div> - -<!-- *********************************************************************** --> -<h4> - <a name="objcpropertynewconstants">New DWARF Constants</a> -</h4> -<!-- *********************************************************************** --> - -<div> -<table border="1" cellspacing="0"> - <col width="200"> - <col width="200"> - <tr> - <th>Name</th> - <th>Value</th> - </tr> - <tr> - <td>DW_AT_APPLE_PROPERTY_readonly</td> - <td>0x1</td> - </tr> - <tr> - <td>DW_AT_APPLE_PROPERTY_readwrite</td> - <td>0x2</td> - </tr> - <tr> - <td>DW_AT_APPLE_PROPERTY_assign</td> - <td>0x4</td> - </tr> - <tr> - <td>DW_AT_APPLE_PROPERTY_retain</td> - <td>0x8</td> - </tr> - <tr> - <td>DW_AT_APPLE_PROPERTY_copy</td> - <td>0x10</td> - </tr> - <tr> - <td>DW_AT_APPLE_PROPERTY_nonatomic</td> - <td>0x20</td> - </tr> -</table> - -</div> -</div> - -<!-- ======================================================================= --> -<h3> - <a name="acceltable">Name Accelerator Tables</a> -</h3> -<!-- ======================================================================= --> -<div> -<!-- ======================================================================= --> -<h4> - <a name="acceltableintroduction">Introduction</a> -</h4> -<!-- ======================================================================= --> -<div> -<p>The .debug_pubnames and .debug_pubtypes formats are not what a debugger - needs. The "pub" in the section name indicates that the entries in the - table are publicly visible names only. 
This means no static or hidden - functions show up in the .debug_pubnames. No static variables or private class - variables are in the .debug_pubtypes. Many compilers add different things to - these tables, so we can't rely upon the contents between gcc, icc, or clang.</p> - -<p>The typical query given by users tends not to match up with the contents of - these tables. For example, the DWARF spec states that "In the case of the - name of a function member or static data member of a C++ structure, class or - union, the name presented in the .debug_pubnames section is not the simple - name given by the DW_AT_name attribute of the referenced debugging information - entry, but rather the fully qualified name of the data or function member." - So the only names in these tables for complex C++ entries is a fully - qualified name. Debugger users tend not to enter their search strings as - "a::b::c(int,const Foo&) const", but rather as "c", "b::c" , or "a::b::c". So - the name entered in the name table must be demangled in order to chop it up - appropriately and additional names must be manually entered into the table - to make it effective as a name lookup table for debuggers to use.</p> - -<p>All debuggers currently ignore the .debug_pubnames table as a result of - its inconsistent and useless public-only name content making it a waste of - space in the object file. These tables, when they are written to disk, are - not sorted in any way, leaving every debugger to do its own parsing - and sorting. These tables also include an inlined copy of the string values - in the table itself making the tables much larger than they need to be on - disk, especially for large C++ programs.</p> - -<p>Can't we just fix the sections by adding all of the names we need to this - table? No, because that is not what the tables are defined to contain and we - won't know the difference between the old bad tables and the new good tables. - At best we could make our own renamed sections that contain all of the data - we need.</p> - -<p>These tables are also insufficient for what a debugger like LLDB needs. - LLDB uses clang for its expression parsing where LLDB acts as a PCH. LLDB is - then often asked to look for type "foo" or namespace "bar", or list items in - namespace "baz". Namespaces are not included in the pubnames or pubtypes - tables. Since clang asks a lot of questions when it is parsing an expression, - we need to be very fast when looking up names, as it happens a lot. Having new - accelerator tables that are optimized for very quick lookups will benefit - this type of debugging experience greatly.</p> - -<p>We would like to generate name lookup tables that can be mapped into - memory from disk, and used as is, with little or no up-front parsing. We would - also be able to control the exact content of these different tables so they - contain exactly what we need. The Name Accelerator Tables were designed - to fix these issues. In order to solve these issues we need to:</p> - -<ul> - <li>Have a format that can be mapped into memory from disk and used as is</li> - <li>Lookups should be very fast</li> - <li>Extensible table format so these tables can be made by many producers</li> - <li>Contain all of the names needed for typical lookups out of the box</li> - <li>Strict rules for the contents of tables</li> -</ul> - -<p>Table size is important and the accelerator table format should allow the - reuse of strings from common string tables so the strings for the names are - not duplicated. 
We also want to make sure the table is ready to be used as-is - by simply mapping the table into memory with minimal header parsing.</p> - -<p>The name lookups need to be fast and optimized for the kinds of lookups - that debuggers tend to do. Optimally we would like to touch as few parts of - the mapped table as possible when doing a name lookup and be able to quickly - find the name entry we are looking for, or discover there are no matches. In - the case of debuggers we optimized for lookups that fail most of the time.</p> - -<p>Each table that is defined should have strict rules on exactly what is in - the accelerator tables and documented so clients can rely on the content.</p> - -</div> - -<!-- ======================================================================= --> -<h4> - <a name="acceltablehashes">Hash Tables</a> -</h4> -<!-- ======================================================================= --> - -<div> -<h5>Standard Hash Tables</h5> - -<p>Typical hash tables have a header, buckets, and each bucket points to the -bucket contents: -</p> - -<div class="doc_code"> -<pre> -.------------. -| HEADER | -|------------| -| BUCKETS | -|------------| -| DATA | -`------------' -</pre> -</div> - -<p>The BUCKETS are an array of offsets to DATA for each hash:</p> - -<div class="doc_code"> -<pre> -.------------. -| 0x00001000 | BUCKETS[0] -| 0x00002000 | BUCKETS[1] -| 0x00002200 | BUCKETS[2] -| 0x000034f0 | BUCKETS[3] -| | ... -| 0xXXXXXXXX | BUCKETS[n_buckets] -'------------' -</pre> -</div> - -<p>So for bucket[3] in the example above, we have an offset into the table - 0x000034f0 which points to a chain of entries for the bucket. Each bucket - must contain a next pointer, full 32 bit hash value, the string itself, - and the data for the current string value.</p> - -<div class="doc_code"> -<pre> - .------------. -0x000034f0: | 0x00003500 | next pointer - | 0x12345678 | 32 bit hash - | "erase" | string value - | data[n] | HashData for this bucket - |------------| -0x00003500: | 0x00003550 | next pointer - | 0x29273623 | 32 bit hash - | "dump" | string value - | data[n] | HashData for this bucket - |------------| -0x00003550: | 0x00000000 | next pointer - | 0x82638293 | 32 bit hash - | "main" | string value - | data[n] | HashData for this bucket - `------------' -</pre> -</div> - -<p>The problem with this layout for debuggers is that we need to optimize for - the negative lookup case where the symbol we're searching for is not present. - So if we were to lookup "printf" in the table above, we would make a 32 hash - for "printf", it might match bucket[3]. We would need to go to the offset - 0x000034f0 and start looking to see if our 32 bit hash matches. To do so, we - need to read the next pointer, then read the hash, compare it, and skip to - the next bucket. Each time we are skipping many bytes in memory and touching - new cache pages just to do the compare on the full 32 bit hash. All of these - accesses then tell us that we didn't have a match.</p> - -<h5>Name Hash Tables</h5> - -<p>To solve the issues mentioned above we have structured the hash tables - a bit differently: a header, buckets, an array of all unique 32 bit hash - values, followed by an array of hash value data offsets, one for each hash - value, then the data for all hash values:</p> - -<div class="doc_code"> -<pre> -.-------------. 
-| HEADER | -|-------------| -| BUCKETS | -|-------------| -| HASHES | -|-------------| -| OFFSETS | -|-------------| -| DATA | -`-------------' -</pre> -</div> - -<p>The BUCKETS in the name tables are an index into the HASHES array. By - making all of the full 32 bit hash values contiguous in memory, we allow - ourselves to efficiently check for a match while touching as little - memory as possible. Most often checking the 32 bit hash values is as far as - the lookup goes. If it does match, it usually is a match with no collisions. - So for a table with "n_buckets" buckets, and "n_hashes" unique 32 bit hash - values, we can clarify the contents of the BUCKETS, HASHES and OFFSETS as:</p> - -<div class="doc_code"> -<pre> -.-------------------------. -| HEADER.magic | uint32_t -| HEADER.version | uint16_t -| HEADER.hash_function | uint16_t -| HEADER.bucket_count | uint32_t -| HEADER.hashes_count | uint32_t -| HEADER.header_data_len | uint32_t -| HEADER_DATA | HeaderData -|-------------------------| -| BUCKETS | uint32_t[bucket_count] // 32 bit hash indexes -|-------------------------| -| HASHES | uint32_t[hashes_count] // 32 bit hash values -|-------------------------| -| OFFSETS | uint32_t[hashes_count] // 32 bit offsets to hash value data -|-------------------------| -| ALL HASH DATA | -`-------------------------' -</pre> -</div> - -<p>So taking the exact same data from the standard hash example above we end up - with:</p> - -<div class="doc_code"> -<pre> - .------------. - | HEADER | - |------------| - | 0 | BUCKETS[0] - | 2 | BUCKETS[1] - | 5 | BUCKETS[2] - | 6 | BUCKETS[3] - | | ... - | ... | BUCKETS[n_buckets] - |------------| - | 0x........ | HASHES[0] - | 0x........ | HASHES[1] - | 0x........ | HASHES[2] - | 0x........ | HASHES[3] - | 0x........ | HASHES[4] - | 0x........ | HASHES[5] - | 0x12345678 | HASHES[6] hash for BUCKETS[3] - | 0x29273623 | HASHES[7] hash for BUCKETS[3] - | 0x82638293 | HASHES[8] hash for BUCKETS[3] - | 0x........ | HASHES[9] - | 0x........ | HASHES[10] - | 0x........ | HASHES[11] - | 0x........ | HASHES[12] - | 0x........ | HASHES[13] - | 0x........ | HASHES[n_hashes] - |------------| - | 0x........ | OFFSETS[0] - | 0x........ | OFFSETS[1] - | 0x........ | OFFSETS[2] - | 0x........ | OFFSETS[3] - | 0x........ | OFFSETS[4] - | 0x........ | OFFSETS[5] - | 0x000034f0 | OFFSETS[6] offset for BUCKETS[3] - | 0x00003500 | OFFSETS[7] offset for BUCKETS[3] - | 0x00003550 | OFFSETS[8] offset for BUCKETS[3] - | 0x........ | OFFSETS[9] - | 0x........ | OFFSETS[10] - | 0x........ | OFFSETS[11] - | 0x........ | OFFSETS[12] - | 0x........ | OFFSETS[13] - | 0x........ | OFFSETS[n_hashes] - |------------| - | | - | | - | | - | | - | | - |------------| -0x000034f0: | 0x00001203 | .debug_str ("erase") - | 0x00000004 | A 32 bit array count - number of HashData with name "erase" - | 0x........ | HashData[0] - | 0x........ | HashData[1] - | 0x........ | HashData[2] - | 0x........ | HashData[3] - | 0x00000000 | String offset into .debug_str (terminate data for hash) - |------------| -0x00003500: | 0x00001203 | String offset into .debug_str ("collision") - | 0x00000002 | A 32 bit array count - number of HashData with name "collision" - | 0x........ | HashData[0] - | 0x........ | HashData[1] - | 0x00001203 | String offset into .debug_str ("dump") - | 0x00000003 | A 32 bit array count - number of HashData with name "dump" - | 0x........ | HashData[0] - | 0x........ | HashData[1] - | 0x........ 
| HashData[2] - | 0x00000000 | String offset into .debug_str (terminate data for hash) - |------------| -0x00003550: | 0x00001203 | String offset into .debug_str ("main") - | 0x00000009 | A 32 bit array count - number of HashData with name "main" - | 0x........ | HashData[0] - | 0x........ | HashData[1] - | 0x........ | HashData[2] - | 0x........ | HashData[3] - | 0x........ | HashData[4] - | 0x........ | HashData[5] - | 0x........ | HashData[6] - | 0x........ | HashData[7] - | 0x........ | HashData[8] - | 0x00000000 | String offset into .debug_str (terminate data for hash) - `------------' -</pre> -</div> - -<p>So we still have all of the same data, we just organize it more efficiently - for debugger lookup. If we repeat the same "printf" lookup from above, we - would hash "printf" and find it matches BUCKETS[3] by taking the 32 bit hash - value and modulo it by n_buckets. BUCKETS[3] contains "6" which is the index - into the HASHES table. We would then compare any consecutive 32 bit hashes - values in the HASHES array as long as the hashes would be in BUCKETS[3]. We - do this by verifying that each subsequent hash value modulo n_buckets is still - 3. In the case of a failed lookup we would access the memory for BUCKETS[3], and - then compare a few consecutive 32 bit hashes before we know that we have no match. - We don't end up marching through multiple words of memory and we really keep the - number of processor data cache lines being accessed as small as possible.</p> - -<p>The string hash that is used for these lookup tables is the Daniel J. - Bernstein hash which is also used in the ELF GNU_HASH sections. It is a very - good hash for all kinds of names in programs with very few hash collisions.</p> - -<p>Empty buckets are designated by using an invalid hash index of UINT32_MAX.</p> -</div> - -<!-- ======================================================================= --> -<h4> - <a name="acceltabledetails">Details</a> -</h4> -<!-- ======================================================================= --> -<div> -<p>These name hash tables are designed to be generic where specializations of - the table get to define additional data that goes into the header - ("HeaderData"), how the string value is stored ("KeyType") and the content - of the data for each hash value.</p> - -<h5>Header Layout</h5> -<p>The header has a fixed part, and the specialized part. The exact format of - the header is:</p> -<div class="doc_code"> -<pre> -struct Header -{ - uint32_t magic; // 'HASH' magic value to allow endian detection - uint16_t version; // Version number - uint16_t hash_function; // The hash function enumeration that was used - uint32_t bucket_count; // The number of buckets in this hash table - uint32_t hashes_count; // The total number of unique hash values and hash data offsets in this table - uint32_t header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment - // Specifically the length of the following HeaderData field - this does not - // include the size of the preceding fields - HeaderData header_data; // Implementation specific header data -}; -</pre> -</div> -<p>The header starts with a 32 bit "magic" value which must be 'HASH' encoded as - an ASCII integer. This allows the detection of the start of the hash table and - also allows the table's byte order to be determined so the table can be - correctly extracted. The "magic" value is followed by a 16 bit version number - which allows the table to be revised and modified in the future. 
The current - version number is 1. "hash_function" is a uint16_t enumeration that specifies - which hash function was used to produce this table. The current values for the - hash function enumerations include:</p> -<div class="doc_code"> -<pre> -enum HashFunctionType -{ - eHashFunctionDJB = 0u, // Daniel J Bernstein hash function -}; -</pre> -</div> -<p>"bucket_count" is a 32 bit unsigned integer that represents how many buckets - are in the BUCKETS array. "hashes_count" is the number of unique 32 bit hash - values that are in the HASHES array, and is the same number of offsets are - contained in the OFFSETS array. "header_data_len" specifies the size in - bytes of the HeaderData that is filled in by specialized versions of this - table.</p> - -<h5>Fixed Lookup</h5> -<p>The header is followed by the buckets, hashes, offsets, and hash value - data. -<div class="doc_code"> -<pre> -struct FixedTable -{ - uint32_t buckets[Header.bucket_count]; // An array of hash indexes into the "hashes[]" array below - uint32_t hashes [Header.hashes_count]; // Every unique 32 bit hash for the entire table is in this table - uint32_t offsets[Header.hashes_count]; // An offset that corresponds to each item in the "hashes[]" array above -}; -</pre> -</div> -<p>"buckets" is an array of 32 bit indexes into the "hashes" array. The - "hashes" array contains all of the 32 bit hash values for all names in the - hash table. Each hash in the "hashes" table has an offset in the "offsets" - array that points to the data for the hash value.</p> - -<p>This table setup makes it very easy to repurpose these tables to contain - different data, while keeping the lookup mechanism the same for all tables. - This layout also makes it possible to save the table to disk and map it in - later and do very efficient name lookups with little or no parsing.</p> - -<p>DWARF lookup tables can be implemented in a variety of ways and can store - a lot of information for each name. We want to make the DWARF tables - extensible and able to store the data efficiently so we have used some of the - DWARF features that enable efficient data storage to define exactly what kind - of data we store for each name.</p> - -<p>The "HeaderData" contains a definition of the contents of each HashData - chunk. We might want to store an offset to all of the debug information - entries (DIEs) for each name. To keep things extensible, we create a list of - items, or Atoms, that are contained in the data for each name. 
First comes the - type of the data in each atom:</p> -<div class="doc_code"> -<pre> -enum AtomType -{ - eAtomTypeNULL = 0u, - eAtomTypeDIEOffset = 1u, // DIE offset, check form for encoding - eAtomTypeCUOffset = 2u, // DIE offset of the compiler unit header that contains the item in question - eAtomTypeTag = 3u, // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2 - eAtomTypeNameFlags = 4u, // Flags from enum NameFlags - eAtomTypeTypeFlags = 5u, // Flags from enum TypeFlags -}; -</pre> -</div> -<p>The enumeration values and their meanings are:</p> -<div class="doc_code"> -<pre> - eAtomTypeNULL - a termination atom that specifies the end of the atom list - eAtomTypeDIEOffset - an offset into the .debug_info section for the DWARF DIE for this name - eAtomTypeCUOffset - an offset into the .debug_info section for the CU that contains the DIE - eAtomTypeDIETag - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is - eAtomTypeNameFlags - Flags for functions and global variables (isFunction, isInlined, isExternal...) - eAtomTypeTypeFlags - Flags for types (isCXXClass, isObjCClass, ...) -</pre> -</div> -<p>Then we allow each atom type to define the atom type and how the data for - each atom type data is encoded:</p> -<div class="doc_code"> -<pre> -struct Atom -{ - uint16_t type; // AtomType enum value - uint16_t form; // DWARF DW_FORM_XXX defines -}; -</pre> -</div> -<p>The "form" type above is from the DWARF specification and defines the - exact encoding of the data for the Atom type. See the DWARF specification for - the DW_FORM_ definitions.</p> -<div class="doc_code"> -<pre> -struct HeaderData -{ - uint32_t die_offset_base; - uint32_t atom_count; - Atoms atoms[atom_count0]; -}; -</pre> -</div> -<p>"HeaderData" defines the base DIE offset that should be added to any atoms - that are encoded using the DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref4, - DW_FORM_ref8 or DW_FORM_ref_udata. It also defines what is contained in - each "HashData" object -- Atom.form tells us how large each field will be in - the HashData and the Atom.type tells us how this data should be interpreted.</p> - -<p>For the current implementations of the ".apple_names" (all functions + globals), - the ".apple_types" (names of all types that are defined), and the - ".apple_namespaces" (all namespaces), we currently set the Atom array to be:</p> -<div class="doc_code"> -<pre> -HeaderData.atom_count = 1; -HeaderData.atoms[0].type = eAtomTypeDIEOffset; -HeaderData.atoms[0].form = DW_FORM_data4; -</pre> -</div> -<p>This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is - encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have - multiple matching DIEs in a single file, which could come up with an inlined - function for instance. Future tables could include more information about the - DIE such as flags indicating if the DIE is a function, method, block, - or inlined.</p> - -<p>The KeyType for the DWARF table is a 32 bit string table offset into the - ".debug_str" table. The ".debug_str" is the string table for the DWARF which - may already contain copies of all of the strings. This helps make sure, with - help from the compiler, that we reuse the strings between all of the DWARF - sections and keeps the hash table size down. 
Another benefit to having the - compiler generate all strings as DW_FORM_strp in the debug info, is that - DWARF parsing can be made much faster.</p> - -<p>After a lookup is made, we get an offset into the hash data. The hash data - needs to be able to deal with 32 bit hash collisions, so the chunk of data - at the offset in the hash data consists of a triple:</p> -<div class="doc_code"> -<pre> -uint32_t str_offset -uint32_t hash_data_count -HashData[hash_data_count] -</pre> -</div> -<p>If "str_offset" is zero, then the bucket contents are done. 99.9% of the - hash data chunks contain a single item (no 32 bit hash collision):</p> -<div class="doc_code"> -<pre> -.------------. -| 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main") -| 0x00000004 | uint32_t HashData count -| 0x........ | uint32_t HashData[0] DIE offset -| 0x........ | uint32_t HashData[1] DIE offset -| 0x........ | uint32_t HashData[2] DIE offset -| 0x........ | uint32_t HashData[3] DIE offset -| 0x00000000 | uint32_t KeyType (end of hash chain) -`------------' -</pre> -</div> -<p>If there are collisions, you will have multiple valid string offsets:</p> -<div class="doc_code"> -<pre> -.------------. -| 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main") -| 0x00000004 | uint32_t HashData count -| 0x........ | uint32_t HashData[0] DIE offset -| 0x........ | uint32_t HashData[1] DIE offset -| 0x........ | uint32_t HashData[2] DIE offset -| 0x........ | uint32_t HashData[3] DIE offset -| 0x00002023 | uint32_t KeyType (.debug_str[0x0002023] => "print") -| 0x00000002 | uint32_t HashData count -| 0x........ | uint32_t HashData[0] DIE offset -| 0x........ | uint32_t HashData[1] DIE offset -| 0x00000000 | uint32_t KeyType (end of hash chain) -`------------' -</pre> -</div> -<p>Current testing with real world C++ binaries has shown that there is around 1 - 32 bit hash collision per 100,000 name entries.</p> -</div> -<!-- ======================================================================= --> -<h4> - <a name="acceltablecontents">Contents</a> -</h4> -<!-- ======================================================================= --> -<div> -<p>As we said, we want to strictly define exactly what is included in the - different tables. For DWARF, we have 3 tables: ".apple_names", ".apple_types", - and ".apple_namespaces".</p> - -<p>".apple_names" sections should contain an entry for each DWARF DIE whose - DW_TAG is a DW_TAG_label, DW_TAG_inlined_subroutine, or DW_TAG_subprogram that - has address attributes: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges or - DW_AT_entry_pc. It also contains DW_TAG_variable DIEs that have a DW_OP_addr - in the location (global and static variables). All global and static variables - should be included, including those scoped within functions and classes. For - example using the following code:</p> -<div class="doc_code"> -<pre> -static int var = 0; - -void f () -{ - static int var = 0; -} -</pre> -</div> -<p>Both of the static "var" variables would be included in the table. All - functions should emit both their full names and their basenames. For C or C++, - the full name is the mangled name (if available) which is usually in the - DW_AT_MIPS_linkage_name attribute, and the DW_AT_name contains the function - basename. 
If global or static variables have a mangled name in a - DW_AT_MIPS_linkage_name attribute, this should be emitted along with the - simple name found in the DW_AT_name attribute.</p> - -<p>".apple_types" sections should contain an entry for each DWARF DIE whose - tag is one of:</p> -<ul> - <li>DW_TAG_array_type</li> - <li>DW_TAG_class_type</li> - <li>DW_TAG_enumeration_type</li> - <li>DW_TAG_pointer_type</li> - <li>DW_TAG_reference_type</li> - <li>DW_TAG_string_type</li> - <li>DW_TAG_structure_type</li> - <li>DW_TAG_subroutine_type</li> - <li>DW_TAG_typedef</li> - <li>DW_TAG_union_type</li> - <li>DW_TAG_ptr_to_member_type</li> - <li>DW_TAG_set_type</li> - <li>DW_TAG_subrange_type</li> - <li>DW_TAG_base_type</li> - <li>DW_TAG_const_type</li> - <li>DW_TAG_constant</li> - <li>DW_TAG_file_type</li> - <li>DW_TAG_namelist</li> - <li>DW_TAG_packed_type</li> - <li>DW_TAG_volatile_type</li> - <li>DW_TAG_restrict_type</li> - <li>DW_TAG_interface_type</li> - <li>DW_TAG_unspecified_type</li> - <li>DW_TAG_shared_type</li> -</ul> -<p>Only entries with a DW_AT_name attribute are included, and the entry must - not be a forward declaration (DW_AT_declaration attribute with a non-zero value). - For example, using the following code:</p> -<div class="doc_code"> -<pre> -int main () -{ - int *b = 0; - return *b; -} -</pre> -</div> -<p>We get a few type DIEs:</p> -<div class="doc_code"> -<pre> -0x00000067: TAG_base_type [5] - AT_encoding( DW_ATE_signed ) - AT_name( "int" ) - AT_byte_size( 0x04 ) - -0x0000006e: TAG_pointer_type [6] - AT_type( {0x00000067} ( int ) ) - AT_byte_size( 0x08 ) -</pre> -</div> -<p>The DW_TAG_pointer_type is not included because it does not have a DW_AT_name.</p> - -<p>".apple_namespaces" section should contain all DW_TAG_namespace DIEs. If - we run into a namespace that has no name this is an anonymous namespace, - and the name should be output as "(anonymous namespace)" (without the quotes). - Why? This matches the output of the abi::cxa_demangle() that is in the standard - C++ library that demangles mangled names.</p> -</div> - -<!-- ======================================================================= --> -<h4> - <a name="acceltableextensions">Language Extensions and File Format Changes</a> -</h4> -<!-- ======================================================================= --> -<div> -<h5>Objective-C Extensions</h5> -<p>".apple_objc" section should contain all DW_TAG_subprogram DIEs for an - Objective-C class. The name used in the hash table is the name of the - Objective-C class itself. If the Objective-C class has a category, then an - entry is made for both the class name without the category, and for the class - name with the category. So if we have a DIE at offset 0x1234 with a name - of method "-[NSString(my_additions) stringWithSpecialString:]", we would add - an entry for "NSString" that points to DIE 0x1234, and an entry for - "NSString(my_additions)" that points to 0x1234. This allows us to quickly - track down all Objective-C methods for an Objective-C class when doing - expressions. It is needed because of the dynamic nature of Objective-C where - anyone can add methods to a class. The DWARF for Objective-C methods is also - emitted differently from C++ classes where the methods are not usually - contained in the class definition, they are scattered about across one or more - compile units. Categories can also be defined in different shared libraries. 
- So we need to be able to quickly find all of the methods and class functions - given the Objective-C class name, or quickly find all methods and class - functions for a class + category name. This table does not contain any selector - names, it just maps Objective-C class names (or class names + category) to all - of the methods and class functions. The selectors are added as function - basenames in the .debug_names section.</p> - -<p>In the ".apple_names" section for Objective-C functions, the full name is the - entire function name with the brackets ("-[NSString stringWithCString:]") and the - basename is the selector only ("stringWithCString:").</p> - -<h5>Mach-O Changes</h5> -<p>The sections names for the apple hash tables are for non mach-o files. For - mach-o files, the sections should be contained in the "__DWARF" segment with - names as follows:</p> -<ul> - <li>".apple_names" -> "__apple_names"</li> - <li>".apple_types" -> "__apple_types"</li> - <li>".apple_namespaces" -> "__apple_namespac" (16 character limit)</li> - <li> ".apple_objc" -> "__apple_objc"</li> -</ul> -</div> -</div> -</div> - -<!-- *********************************************************************** --> - -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org/">LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> - -</body> -</html> diff --git a/docs/SourceLevelDebugging.rst b/docs/SourceLevelDebugging.rst new file mode 100644 index 0000000000..d7c50d234a --- /dev/null +++ b/docs/SourceLevelDebugging.rst @@ -0,0 +1,2284 @@ +================================ +Source Level Debugging with LLVM +================================ + +.. sectionauthor:: Chris Lattner <sabre@nondot.org> and Jim Laskey <jlaskey@mac.com> + +.. contents:: + :local: + +Introduction +============ + +This document is the central repository for all information pertaining to debug +information in LLVM. It describes the :ref:`actual format that the LLVM debug +information takes <format>`, which is useful for those interested in creating +front-ends or dealing directly with the information. Further, this document +provides specific examples of what debug information for C/C++ looks like. + +Philosophy behind LLVM debugging information +-------------------------------------------- + +The idea of the LLVM debugging information is to capture how the important +pieces of the source-language's Abstract Syntax Tree map onto LLVM code. +Several design aspects have shaped the solution that appears here. The +important ones are: + +* Debugging information should have very little impact on the rest of the + compiler. No transformations, analyses, or code generators should need to + be modified because of debugging information. + +* LLVM optimizations should interact in :ref:`well-defined and easily described + ways <intro_debugopt>` with the debugging information. + +* Because LLVM is designed to support arbitrary programming languages, + LLVM-to-LLVM tools should not need to know anything about the semantics of + the source-level-language. + +* Source-level languages are often **widely** different from one another. 
+ LLVM should not put any restrictions of the flavor of the source-language, + and the debugging information should work with any language. + +* With code generator support, it should be possible to use an LLVM compiler + to compile a program to native machine code and standard debugging + formats. This allows compatibility with traditional machine-code level + debuggers, like GDB or DBX. + +The approach used by the LLVM implementation is to use a small set of +:ref:`intrinsic functions <format_common_intrinsics>` to define a mapping +between LLVM program objects and the source-level objects. The description of +the source-level program is maintained in LLVM metadata in an +:ref:`implementation-defined format <ccxx_frontend>` (the C/C++ front-end +currently uses working draft 7 of the `DWARF 3 standard +<http://www.eagercon.com/dwarf/dwarf3std.htm>`_). + +When a program is being debugged, a debugger interacts with the user and turns +the stored debug information into source-language specific information. As +such, a debugger must be aware of the source-language, and is thus tied to a +specific language or family of languages. + +Debug information consumers +--------------------------- + +The role of debug information is to provide meta information normally stripped +away during the compilation process. This meta information provides an LLVM +user a relationship between generated code and the original program source +code. + +Currently, debug information is consumed by DwarfDebug to produce dwarf +information used by the gdb debugger. Other targets could use the same +information to produce stabs or other debug forms. + +It would also be reasonable to use debug information to feed profiling tools +for analysis of generated code, or, tools for reconstructing the original +source from generated code. + +TODO - expound a bit more. + +.. _intro_debugopt: + +Debugging optimized code +------------------------ + +An extremely high priority of LLVM debugging information is to make it interact +well with optimizations and analysis. In particular, the LLVM debug +information provides the following guarantees: + +* LLVM debug information **always provides information to accurately read + the source-level state of the program**, regardless of which LLVM + optimizations have been run, and without any modification to the + optimizations themselves. However, some optimizations may impact the + ability to modify the current state of the program with a debugger, such + as setting program variables, or calling functions that have been + deleted. + +* As desired, LLVM optimizations can be upgraded to be aware of the LLVM + debugging information, allowing them to update the debugging information + as they perform aggressive optimizations. This means that, with effort, + the LLVM optimizers could optimize debug code just as well as non-debug + code. + +* LLVM debug information does not prevent optimizations from + happening (for example inlining, basic block reordering/merging/cleanup, + tail duplication, etc). + +* LLVM debug information is automatically optimized along with the rest of + the program, using existing facilities. For example, duplicate + information is automatically merged by the linker, and unused information + is automatically removed. + +Basically, the debug information allows you to compile a program with +"``-O0 -g``" and get full debug information, allowing you to arbitrarily modify +the program as it executes from a debugger. 
Compiling a program with
+"``-O3 -g``" gives you full debug information that is always available and
+accurate for reading (e.g., you get accurate stack traces despite tail call
+elimination and inlining), but you might lose the ability to modify the program
+and call functions that were optimized out of the program, or inlined away
+completely.
+
+The :ref:`LLVM test suite <test-suite-quickstart>` provides a framework to test
+the optimizer's handling of debugging information.  It can be run like this:
+
+.. code-block:: bash
+
+  % cd llvm/projects/test-suite/MultiSource/Benchmarks  # or some other level
+  % make TEST=dbgopt
+
+This will test the impact of debugging information on optimization passes.  If
+debugging information influences optimization passes, then it will be reported
+as a failure.  See :doc:`TestingGuide` for more information on the LLVM test
+infrastructure and how to run various tests.
+
+.. _format:
+
+Debugging information format
+============================
+
+LLVM debugging information has been carefully designed to make it possible for
+the optimizer to optimize the program and debugging information without
+necessarily having to know anything about debugging information.  In
+particular, the use of metadata avoids duplicated debugging information from
+the beginning, and the global dead code elimination pass automatically deletes
+debugging information for a function if it decides to delete the function.
+
+To do this, most of the debugging information (descriptors for types,
+variables, functions, source files, etc.) is inserted by the language front-end
+in the form of LLVM metadata.
+
+Debug information is designed to be agnostic about the target debugger and
+debugging information representation (e.g. DWARF/Stabs/etc.).  It uses a
+generic pass to decode the information that represents variables, types,
+functions, namespaces, etc.: this allows for arbitrary source-language
+semantics and type-systems to be used, as long as there is a module written for
+the target debugger to interpret the information.
+
+To provide basic functionality, the LLVM debugger does have to make some
+assumptions about the source-level language being debugged, though it keeps
+these to a minimum.  The only common features that the LLVM debugger assumes
+exist are :ref:`source files <format_files>` and :ref:`program objects
+<format_global_variables>`.  These abstract objects are used by a debugger to
+form stack traces, show information about local variables, etc.
+
+This section of the documentation first describes the representation aspects
+common to any source-language.  :ref:`ccxx_frontend` describes the data layout
+conventions used by the C and C++ front-ends.
+
+Debug information descriptors
+-----------------------------
+
+In consideration of the complexity and volume of debug information, LLVM
+provides a specification for well-formed debug descriptors.
+
+Consumers of LLVM debug information expect the descriptors for program objects
+to start in a canonical format, but the descriptors can include additional
+information appended at the end that is source-language specific.  All LLVM
+debugging information is versioned, allowing backwards compatibility in the
+case that the core structures need to change in some way.  Also, every
+debugging information object starts with a tag that indicates what type of
+object it is.  The source-language is allowed to define its own objects by
+using unreserved tag numbers.
+We recommend using tags in the range 0x1000 through 0x2000 (there is a defined
+``enum DW_TAG_user_base = 0x1000``).
+
+The fields of debug descriptors used internally by LLVM are restricted to only
+the simple data types ``i32``, ``i1``, ``float``, ``double``, ``mdstring`` and
+``mdnode``.
+
+.. code-block:: llvm
+
+  !1 = metadata !{
+    i32,   ;; A tag
+    ...
+  }
+
+.. _LLVMDebugVersion:
+
+The first field of a descriptor is always an ``i32`` containing a tag value
+identifying the content of the descriptor.  The remaining fields are specific
+to the descriptor.  The values of tags are loosely bound to the tag values of
+DWARF information entries.  However, that does not restrict the use of the
+information supplied to DWARF targets.  To facilitate versioning of debug
+information, the tag is augmented with the current debug version
+(``LLVMDebugVersion = 8 << 16``, i.e. 0x80000 or 524288).
+
+The details of the various descriptors follow.
+
+Compile unit descriptors
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+  !0 = metadata !{
+    i32,       ;; Tag = 17 + LLVMDebugVersion (DW_TAG_compile_unit)
+    i32,       ;; Unused field.
+    i32,       ;; DWARF language identifier (ex. DW_LANG_C89)
+    metadata,  ;; Source file name
+    metadata,  ;; Source file directory (includes trailing slash)
+    metadata,  ;; Producer (ex. "4.0.1 LLVM (LLVM research group)")
+    i1,        ;; True if this is a main compile unit.
+    i1,        ;; True if this is optimized.
+    metadata,  ;; Flags
+    i32,       ;; Runtime version
+    metadata,  ;; List of enum types
+    metadata,  ;; List of retained types
+    metadata,  ;; List of subprograms
+    metadata   ;; List of global variables
+  }
+
+These descriptors contain a source language ID for the file (we use the DWARF
+3.0 ID numbers, such as ``DW_LANG_C89``, ``DW_LANG_C_plus_plus``,
+``DW_LANG_Cobol74``, etc.), three strings describing the filename, working
+directory of the compiler, and an identifier string for the compiler that
+produced it.
+
+Compile unit descriptors provide the root context for objects declared in a
+specific compilation unit.  File descriptors are defined using this context.
+These descriptors are collected by a named metadata ``!llvm.dbg.cu``.  They
+keep track of subprograms, global variables and type information.
+
+.. _format_files:
+
+File descriptors
+^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+  !0 = metadata !{
+    i32,       ;; Tag = 41 + LLVMDebugVersion (DW_TAG_file_type)
+    metadata,  ;; Source file name
+    metadata,  ;; Source file directory (includes trailing slash)
+    metadata   ;; Unused
+  }
+
+These descriptors contain information for a file.  Global variables and top
+level functions would be defined using this context.  File descriptors also
+provide context for source line correspondence.
+
+Each input file is encoded as a separate file descriptor in LLVM debugging
+information output.
+
+.. _format_global_variables:
+
+Global variable descriptors
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+  !1 = metadata !{
+    i32,      ;; Tag = 52 + LLVMDebugVersion (DW_TAG_variable)
+    i32,      ;; Unused field.
+ metadata, ;; Reference to context descriptor + metadata, ;; Name + metadata, ;; Display name (fully qualified C++ name) + metadata, ;; MIPS linkage name (for C++) + metadata, ;; Reference to file where defined + i32, ;; Line number where defined + metadata, ;; Reference to type descriptor + i1, ;; True if the global is local to compile unit (static) + i1, ;; True if the global is defined in the compile unit (not extern) + {}* ;; Reference to the global variable + } + +These descriptors provide debug information about globals variables. They +provide details such as name, type and where the variable is defined. All +global variables are collected inside the named metadata ``!llvm.dbg.cu``. + +.. _format_subprograms: + +Subprogram descriptors +^^^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: llvm + + !2 = metadata !{ + i32, ;; Tag = 46 + LLVMDebugVersion (DW_TAG_subprogram) + i32, ;; Unused field. + metadata, ;; Reference to context descriptor + metadata, ;; Name + metadata, ;; Display name (fully qualified C++ name) + metadata, ;; MIPS linkage name (for C++) + metadata, ;; Reference to file where defined + i32, ;; Line number where defined + metadata, ;; Reference to type descriptor + i1, ;; True if the global is local to compile unit (static) + i1, ;; True if the global is defined in the compile unit (not extern) + i32, ;; Line number where the scope of the subprogram begins + i32, ;; Virtuality, e.g. dwarf::DW_VIRTUALITY__virtual + i32, ;; Index into a virtual function + metadata, ;; indicates which base type contains the vtable pointer for the + ;; derived class + i32, ;; Flags - Artifical, Private, Protected, Explicit, Prototyped. + i1, ;; isOptimized + Function * , ;; Pointer to LLVM function + metadata, ;; Lists function template parameters + metadata, ;; Function declaration descriptor + metadata ;; List of function variables + } + +These descriptors provide debug information about functions, methods and +subprograms. They provide details such as name, return types and the source +location where the subprogram is defined. + +Block descriptors +^^^^^^^^^^^^^^^^^ + +.. code-block:: llvm + + !3 = metadata !{ + i32, ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block) + metadata,;; Reference to context descriptor + i32, ;; Line number + i32, ;; Column number + metadata,;; Reference to source file + i32 ;; Unique ID to identify blocks from a template function + } + +This descriptor provides debug information about nested blocks within a +subprogram. The line number and column numbers are used to dinstinguish two +lexical blocks at same depth. + +.. code-block:: llvm + + !3 = metadata !{ + i32, ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block) + metadata ;; Reference to the scope we're annotating with a file change + metadata,;; Reference to the file the scope is enclosed in. + } + +This descriptor provides a wrapper around a lexical scope to handle file +changes in the middle of a lexical block. + +.. _format_basic_type: + +Basic type descriptors +^^^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: llvm + + !4 = metadata !{ + i32, ;; Tag = 36 + LLVMDebugVersion (DW_TAG_base_type) + metadata, ;; Reference to context + metadata, ;; Name (may be "" for anonymous types) + metadata, ;; Reference to file where defined (may be NULL) + i32, ;; Line number where defined (may be 0) + i64, ;; Size in bits + i64, ;; Alignment in bits + i64, ;; Offset in bits + i32, ;; Flags + i32 ;; DWARF type encoding + } + +These descriptors define primitive types used in the code. Example ``int``, +``bool`` and ``float``. 
The context provides the scope of the type, which is +usually the top level. Since basic types are not usually user defined the +context and line number can be left as NULL and 0. The size, alignment and +offset are expressed in bits and can be 64 bit values. The alignment is used +to round the offset when embedded in a :ref:`composite type +<format_composite_type>` (example to keep float doubles on 64 bit boundaries). +The offset is the bit offset if embedded in a :ref:`composite type +<format_composite_type>`. + +The type encoding provides the details of the type. The values are typically +one of the following: + +.. code-block:: llvm + + DW_ATE_address = 1 + DW_ATE_boolean = 2 + DW_ATE_float = 4 + DW_ATE_signed = 5 + DW_ATE_signed_char = 6 + DW_ATE_unsigned = 7 + DW_ATE_unsigned_char = 8 + +.. _format_derived_type: + +Derived type descriptors +^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: llvm + + !5 = metadata !{ + i32, ;; Tag (see below) + metadata, ;; Reference to context + metadata, ;; Name (may be "" for anonymous types) + metadata, ;; Reference to file where defined (may be NULL) + i32, ;; Line number where defined (may be 0) + i64, ;; Size in bits + i64, ;; Alignment in bits + i64, ;; Offset in bits + i32, ;; Flags to encode attributes, e.g. private + metadata, ;; Reference to type derived from + metadata, ;; (optional) Name of the Objective C property associated with + ;; Objective-C an ivar + metadata, ;; (optional) Name of the Objective C property getter selector. + metadata, ;; (optional) Name of the Objective C property setter selector. + i32 ;; (optional) Objective C property attributes. + } + +These descriptors are used to define types derived from other types. The value +of the tag varies depending on the meaning. The following are possible tag +values: + +.. code-block:: llvm + + DW_TAG_formal_parameter = 5 + DW_TAG_member = 13 + DW_TAG_pointer_type = 15 + DW_TAG_reference_type = 16 + DW_TAG_typedef = 22 + DW_TAG_const_type = 38 + DW_TAG_volatile_type = 53 + DW_TAG_restrict_type = 55 + +``DW_TAG_member`` is used to define a member of a :ref:`composite type +<format_composite_type>` or :ref:`subprogram <format_subprograms>`. The type +of the member is the :ref:`derived type <format_derived_type>`. +``DW_TAG_formal_parameter`` is used to define a member which is a formal +argument of a subprogram. + +``DW_TAG_typedef`` is used to provide a name for the derived type. + +``DW_TAG_pointer_type``, ``DW_TAG_reference_type``, ``DW_TAG_const_type``, +``DW_TAG_volatile_type`` and ``DW_TAG_restrict_type`` are used to qualify the +:ref:`derived type <format_derived_type>`. + +:ref:`Derived type <format_derived_type>` location can be determined from the +context and line number. The size, alignment and offset are expressed in bits +and can be 64 bit values. The alignment is used to round the offset when +embedded in a :ref:`composite type <format_composite_type>` (example to keep +float doubles on 64 bit boundaries.) The offset is the bit offset if embedded +in a :ref:`composite type <format_composite_type>`. + +Note that the ``void *`` type is expressed as a type derived from NULL. + +.. _format_composite_type: + +Composite type descriptors +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. 
code-block:: llvm + + !6 = metadata !{ + i32, ;; Tag (see below) + metadata, ;; Reference to context + metadata, ;; Name (may be "" for anonymous types) + metadata, ;; Reference to file where defined (may be NULL) + i32, ;; Line number where defined (may be 0) + i64, ;; Size in bits + i64, ;; Alignment in bits + i64, ;; Offset in bits + i32, ;; Flags + metadata, ;; Reference to type derived from + metadata, ;; Reference to array of member descriptors + i32 ;; Runtime languages + } + +These descriptors are used to define types that are composed of 0 or more +elements. The value of the tag varies depending on the meaning. The following +are possible tag values: + +.. code-block:: llvm + + DW_TAG_array_type = 1 + DW_TAG_enumeration_type = 4 + DW_TAG_structure_type = 19 + DW_TAG_union_type = 23 + DW_TAG_vector_type = 259 + DW_TAG_subroutine_type = 21 + DW_TAG_inheritance = 28 + +The vector flag indicates that an array type is a native packed vector. + +The members of array types (tag = ``DW_TAG_array_type``) or vector types (tag = +``DW_TAG_vector_type``) are :ref:`subrange descriptors <format_subrange>`, each +representing the range of subscripts at that level of indexing. + +The members of enumeration types (tag = ``DW_TAG_enumeration_type``) are +:ref:`enumerator descriptors <format_enumerator>`, each representing the +definition of enumeration value for the set. All enumeration type descriptors +are collected inside the named metadata ``!llvm.dbg.cu``. + +The members of structure (tag = ``DW_TAG_structure_type``) or union (tag = +``DW_TAG_union_type``) types are any one of the :ref:`basic +<format_basic_type>`, :ref:`derived <format_derived_type>` or :ref:`composite +<format_composite_type>` type descriptors, each representing a field member of +the structure or union. + +For C++ classes (tag = ``DW_TAG_structure_type``), member descriptors provide +information about base classes, static members and member functions. If a +member is a :ref:`derived type descriptor <format_derived_type>` and has a tag +of ``DW_TAG_inheritance``, then the type represents a base class. If the member +of is a :ref:`global variable descriptor <format_global_variables>` then it +represents a static member. And, if the member is a :ref:`subprogram +descriptor <format_subprograms>` then it represents a member function. For +static members and member functions, ``getName()`` returns the members link or +the C++ mangled name. ``getDisplayName()`` the simplied version of the name. + +The first member of subroutine (tag = ``DW_TAG_subroutine_type``) type elements +is the return type for the subroutine. The remaining elements are the formal +arguments to the subroutine. + +:ref:`Composite type <format_composite_type>` location can be determined from +the context and line number. The size, alignment and offset are expressed in +bits and can be 64 bit values. The alignment is used to round the offset when +embedded in a :ref:`composite type <format_composite_type>` (as an example, to +keep float doubles on 64 bit boundaries). The offset is the bit offset if +embedded in a :ref:`composite type <format_composite_type>`. + +.. _format_subrange: + +Subrange descriptors +^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: llvm + + !42 = metadata !{ + i32, ;; Tag = 33 + LLVMDebugVersion (DW_TAG_subrange_type) + i64, ;; Low value + i64 ;; High value + } + +These descriptors are used to define ranges of array subscripts for an array +:ref:`composite type <format_composite_type>`. 
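+
+As a rough, non-normative illustration (assuming a C array ``int A[10]``; the
+``!42`` slot number and the concrete values below are illustrative only), a
+front-end might emit:
+
+.. code-block:: llvm
+
+  !42 = metadata !{
+    i32 524321,  ;; Tag = 33 + LLVMDebugVersion (DW_TAG_subrange_type)
+    i64 0,       ;; Low value - first valid subscript of "int A[10]"
+    i64 9        ;; High value - last valid subscript (10 elements in total)
+  }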
+The low value defines the lower bound, which is typically zero for C/C++.  The
+high value is the upper bound.  Values are 64 bit.  ``High - Low + 1`` is the
+size of the array.  If ``Low > High``, the array bounds are not included in the
+generated debugging information.
+
+.. _format_enumerator:
+
+Enumerator descriptors
+^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+  !6 = metadata !{
+    i32,      ;; Tag = 40 + LLVMDebugVersion (DW_TAG_enumerator)
+    metadata, ;; Name
+    i64       ;; Value
+  }
+
+These descriptors are used to define members of an enumeration :ref:`composite
+type <format_composite_type>`; each one associates a name with a value.
+
+Local variables
+^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+  !7 = metadata !{
+    i32,      ;; Tag (see below)
+    metadata, ;; Context
+    metadata, ;; Name
+    metadata, ;; Reference to file where defined
+    i32,      ;; 24 bit - Line number where defined
+              ;; 8 bit - Argument number. 1 indicates 1st argument.
+    metadata, ;; Type descriptor
+    i32,      ;; Flags
+    metadata  ;; (optional) Reference to inline location
+  }
+
+These descriptors are used to define variables local to a subprogram.  The
+value of the tag depends on the usage of the variable:
+
+.. code-block:: llvm
+
+  DW_TAG_auto_variable   = 256
+  DW_TAG_arg_variable    = 257
+  DW_TAG_return_variable = 258
+
+An auto variable is any variable declared in the body of the function.  An
+argument variable is any variable that appears as a formal argument to the
+function.  A return variable is used to track the result of a function and has
+no source correspondent.
+
+The context is either the subprogram or block where the variable is defined.
+Name is the source variable name.  Context and line indicate where the variable
+was defined.  The type descriptor defines the declared type of the variable.
+
+.. _format_common_intrinsics:
+
+Debugger intrinsic functions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+LLVM uses several intrinsic functions (names prefixed with "``llvm.dbg``") to
+provide debug information at various points in generated code.
+
+``llvm.dbg.declare``
+^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+  void %llvm.dbg.declare(metadata, metadata)
+
+This intrinsic provides information about a local element (e.g., a variable).
+The first argument is metadata holding the alloca for the variable.  The second
+argument is metadata containing a description of the variable.
+
+``llvm.dbg.value``
+^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+  void %llvm.dbg.value(metadata, i64, metadata)
+
+This intrinsic provides information when a user source variable is set to a new
+value.  The first argument is the new value (wrapped as metadata).  The second
+argument is the offset into the user source variable where the new value is
+written.  The third argument is metadata containing a description of the user
+source variable.
+
+Object lifetimes and scoping
+============================
+
+In many languages, the local variables in functions can have their lifetimes or
+scopes limited to a subset of a function.  In the C family of languages, for
+example, variables are only live (readable and writable) within the source
+block that they are defined in.  In functional languages, values are only
+readable after they have been defined.  Though this is a very obvious concept,
+it is non-trivial to model in LLVM, because it has no notion of scoping in this
+sense, and does not want to be tied to a language's scoping rules.
+
+In order to handle this, the LLVM debug format uses the metadata attached to
+llvm instructions to encode line number and scoping information.
Consider the +following C fragment, for example: + +.. code-block:: c + + 1. void foo() { + 2. int X = 21; + 3. int Y = 22; + 4. { + 5. int Z = 23; + 6. Z = X; + 7. } + 8. X = Y; + 9. } + +Compiled to LLVM, this function would be represented like this: + +.. code-block:: llvm + + define void @foo() nounwind ssp { + entry: + %X = alloca i32, align 4 ; <i32*> [#uses=4] + %Y = alloca i32, align 4 ; <i32*> [#uses=4] + %Z = alloca i32, align 4 ; <i32*> [#uses=3] + %0 = bitcast i32* %X to {}* ; <{}*> [#uses=1] + call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7 + store i32 21, i32* %X, !dbg !8 + %1 = bitcast i32* %Y to {}* ; <{}*> [#uses=1] + call void @llvm.dbg.declare(metadata !{i32 * %Y}, metadata !9), !dbg !10 + store i32 22, i32* %Y, !dbg !11 + %2 = bitcast i32* %Z to {}* ; <{}*> [#uses=1] + call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14 + store i32 23, i32* %Z, !dbg !15 + %tmp = load i32* %X, !dbg !16 ; <i32> [#uses=1] + %tmp1 = load i32* %Y, !dbg !16 ; <i32> [#uses=1] + %add = add nsw i32 %tmp, %tmp1, !dbg !16 ; <i32> [#uses=1] + store i32 %add, i32* %Z, !dbg !16 + %tmp2 = load i32* %Y, !dbg !17 ; <i32> [#uses=1] + store i32 %tmp2, i32* %X, !dbg !17 + ret void, !dbg !18 + } + + declare void @llvm.dbg.declare(metadata, metadata) nounwind readnone + + !0 = metadata !{i32 459008, metadata !1, metadata !"X", + metadata !3, i32 2, metadata !6}; [ DW_TAG_auto_variable ] + !1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ] + !2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", metadata !"foo", + metadata !"foo", metadata !3, i32 1, metadata !4, + i1 false, i1 true}; [DW_TAG_subprogram ] + !3 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c", + metadata !"/private/tmp", metadata !"clang 1.1", i1 true, + i1 false, metadata !"", i32 0}; [DW_TAG_compile_unit ] + !4 = metadata !{i32 458773, metadata !3, metadata !"", null, i32 0, i64 0, i64 0, + i64 0, i32 0, null, metadata !5, i32 0}; [DW_TAG_subroutine_type ] + !5 = metadata !{null} + !6 = metadata !{i32 458788, metadata !3, metadata !"int", metadata !3, i32 0, + i64 32, i64 32, i64 0, i32 0, i32 5}; [DW_TAG_base_type ] + !7 = metadata !{i32 2, i32 7, metadata !1, null} + !8 = metadata !{i32 2, i32 3, metadata !1, null} + !9 = metadata !{i32 459008, metadata !1, metadata !"Y", metadata !3, i32 3, + metadata !6}; [ DW_TAG_auto_variable ] + !10 = metadata !{i32 3, i32 7, metadata !1, null} + !11 = metadata !{i32 3, i32 3, metadata !1, null} + !12 = metadata !{i32 459008, metadata !13, metadata !"Z", metadata !3, i32 5, + metadata !6}; [ DW_TAG_auto_variable ] + !13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ] + !14 = metadata !{i32 5, i32 9, metadata !13, null} + !15 = metadata !{i32 5, i32 5, metadata !13, null} + !16 = metadata !{i32 6, i32 5, metadata !13, null} + !17 = metadata !{i32 8, i32 3, metadata !1, null} + !18 = metadata !{i32 9, i32 1, metadata !2, null} + +This example illustrates a few important details about LLVM debugging +information. In particular, it shows how the ``llvm.dbg.declare`` intrinsic and +location information, which are attached to an instruction, are applied +together to allow a debugger to analyze the relationship between statements, +variable definitions, and the code used to implement the function. + +.. code-block:: llvm + + call void @llvm.dbg.declare(metadata, metadata !0), !dbg !7 + +The first intrinsic ``%llvm.dbg.declare`` encodes debugging information for the +variable ``X``. 
The metadata ``!dbg !7`` attached to the intrinsic provides +scope information for the variable ``X``. + +.. code-block:: llvm + + !7 = metadata !{i32 2, i32 7, metadata !1, null} + !1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ] + !2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", + metadata !"foo", metadata !"foo", metadata !3, i32 1, + metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ] + +Here ``!7`` is metadata providing location information. It has four fields: +line number, column number, scope, and original scope. The original scope +represents inline location if this instruction is inlined inside a caller, and +is null otherwise. In this example, scope is encoded by ``!1``. ``!1`` +represents a lexical block inside the scope ``!2``, where ``!2`` is a +:ref:`subprogram descriptor <format_subprograms>`. This way the location +information attached to the intrinsics indicates that the variable ``X`` is +declared at line number 2 at a function level scope in function ``foo``. + +Now lets take another example. + +.. code-block:: llvm + + call void @llvm.dbg.declare(metadata, metadata !12), !dbg !14 + +The second intrinsic ``%llvm.dbg.declare`` encodes debugging information for +variable ``Z``. The metadata ``!dbg !14`` attached to the intrinsic provides +scope information for the variable ``Z``. + +.. code-block:: llvm + + !13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ] + !14 = metadata !{i32 5, i32 9, metadata !13, null} + +Here ``!14`` indicates that ``Z`` is declared at line number 5 and +column number 9 inside of lexical scope ``!13``. The lexical scope itself +resides inside of lexical scope ``!1`` described above. + +The scope information attached with each instruction provides a straightforward +way to find instructions covered by a scope. + +.. _ccxx_frontend: + +C/C++ front-end specific debug information +========================================== + +The C and C++ front-ends represent information about the program in a format +that is effectively identical to `DWARF 3.0 +<http://www.eagercon.com/dwarf/dwarf3std.htm>`_ in terms of information +content. This allows code generators to trivially support native debuggers by +generating standard dwarf information, and contains enough information for +non-dwarf targets to translate it as needed. + +This section describes the forms used to represent C and C++ programs. Other +languages could pattern themselves after this (which itself is tuned to +representing programs in the same way that DWARF 3 does), or they could choose +to provide completely different forms if they don't fit into the DWARF model. +As support for debugging information gets added to the various LLVM +source-language front-ends, the information used should be documented here. + +The following sections provide examples of various C/C++ constructs and the +debug information that would best describe those constructs. + +C/C++ source file information +----------------------------- + +Given the source files ``MySource.cpp`` and ``MyHeader.h`` located in the +directory ``/Users/mine/sources``, the following code: + +.. code-block:: c + + #include "MyHeader.h" + + int main(int argc, char *argv[]) { + return 0; + } + +a C/C++ front-end would generate the following descriptors: + +.. code-block:: llvm + + ... + ;; + ;; Define the compile unit for the main source file "/Users/mine/sources/MySource.cpp". 
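+  ;; (Note: 524305 below is DW_TAG_compile_unit (17) combined with
+  ;; LLVMDebugVersion (0x80000), as described in the descriptor section above.)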
+ ;; + !2 = metadata !{ + i32 524305, ;; Tag + i32 0, ;; Unused + i32 4, ;; Language Id + metadata !"MySource.cpp", + metadata !"/Users/mine/sources", + metadata !"4.2.1 (Based on Apple Inc. build 5649) (LLVM build 00)", + i1 true, ;; Main Compile Unit + i1 false, ;; Optimized compile unit + metadata !"", ;; Compiler flags + i32 0} ;; Runtime version + + ;; + ;; Define the file for the file "/Users/mine/sources/MySource.cpp". + ;; + !1 = metadata !{ + i32 524329, ;; Tag + metadata !"MySource.cpp", + metadata !"/Users/mine/sources", + metadata !2 ;; Compile unit + } + + ;; + ;; Define the file for the file "/Users/mine/sources/Myheader.h" + ;; + !3 = metadata !{ + i32 524329, ;; Tag + metadata !"Myheader.h" + metadata !"/Users/mine/sources", + metadata !2 ;; Compile unit + } + + ... + +``llvm::Instruction`` provides easy access to metadata attached with an +instruction. One can extract line number information encoded in LLVM IR using +``Instruction::getMetadata()`` and ``DILocation::getLineNumber()``. + +.. code-block:: c++ + + if (MDNode *N = I->getMetadata("dbg")) { // Here I is an LLVM instruction + DILocation Loc(N); // DILocation is in DebugInfo.h + unsigned Line = Loc.getLineNumber(); + StringRef File = Loc.getFilename(); + StringRef Dir = Loc.getDirectory(); + } + +C/C++ global variable information +--------------------------------- + +Given an integer global variable declared as follows: + +.. code-block:: c + + int MyGlobal = 100; + +a C/C++ front-end would generate the following descriptors: + +.. code-block:: llvm + + ;; + ;; Define the global itself. + ;; + %MyGlobal = global int 100 + ... + ;; + ;; List of debug info of globals + ;; + !llvm.dbg.cu = !{!0} + + ;; Define the compile unit. + !0 = metadata !{ + i32 786449, ;; Tag + i32 0, ;; Context + i32 4, ;; Language + metadata !"foo.cpp", ;; File + metadata !"/Volumes/Data/tmp", ;; Directory + metadata !"clang version 3.1 ", ;; Producer + i1 true, ;; Deprecated field + i1 false, ;; "isOptimized"? + metadata !"", ;; Flags + i32 0, ;; Runtime Version + metadata !1, ;; Enum Types + metadata !1, ;; Retained Types + metadata !1, ;; Subprograms + metadata !3 ;; Global Variables + } ; [ DW_TAG_compile_unit ] + + ;; The Array of Global Variables + !3 = metadata !{ + metadata !4 + } + + !4 = metadata !{ + metadata !5 + } + + ;; + ;; Define the global variable itself. + ;; + !5 = metadata !{ + i32 786484, ;; Tag + i32 0, ;; Unused + null, ;; Unused + metadata !"MyGlobal", ;; Name + metadata !"MyGlobal", ;; Display Name + metadata !"", ;; Linkage Name + metadata !6, ;; File + i32 1, ;; Line + metadata !7, ;; Type + i32 0, ;; IsLocalToUnit + i32 1, ;; IsDefinition + i32* @MyGlobal ;; LLVM-IR Value + } ; [ DW_TAG_variable ] + + ;; + ;; Define the file + ;; + !6 = metadata !{ + i32 786473, ;; Tag + metadata !"foo.cpp", ;; File + metadata !"/Volumes/Data/tmp", ;; Directory + null ;; Unused + } ; [ DW_TAG_file_type ] + + ;; + ;; Define the type + ;; + !7 = metadata !{ + i32 786468, ;; Tag + null, ;; Unused + metadata !"int", ;; Name + null, ;; Unused + i32 0, ;; Line + i64 32, ;; Size in Bits + i64 32, ;; Align in Bits + i64 0, ;; Offset + i32 0, ;; Flags + i32 5 ;; Encoding + } ; [ DW_TAG_base_type ] + +C/C++ function information +-------------------------- + +Given a function declared as follows: + +.. code-block:: c + + int main(int argc, char *argv[]) { + return 0; + } + +a C/C++ front-end would generate the following descriptors: + +.. code-block:: llvm + + ;; + ;; Define the anchor for subprograms. 
Note that the second field of the + ;; anchor is 46, which is the same as the tag for subprograms + ;; (46 = DW_TAG_subprogram.) + ;; + !6 = metadata !{ + i32 524334, ;; Tag + i32 0, ;; Unused + metadata !1, ;; Context + metadata !"main", ;; Name + metadata !"main", ;; Display name + metadata !"main", ;; Linkage name + metadata !1, ;; File + i32 1, ;; Line number + metadata !4, ;; Type + i1 false, ;; Is local + i1 true, ;; Is definition + i32 0, ;; Virtuality attribute, e.g. pure virtual function + i32 0, ;; Index into virtual table for C++ methods + i32 0, ;; Type that holds virtual table. + i32 0, ;; Flags + i1 false, ;; True if this function is optimized + Function *, ;; Pointer to llvm::Function + null ;; Function template parameters + } + ;; + ;; Define the subprogram itself. + ;; + define i32 @main(i32 %argc, i8** %argv) { + ... + } + +C/C++ basic types +----------------- + +The following are the basic type descriptors for C/C++ core types: + +bool +^^^^ + +.. code-block:: llvm + + !2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"bool", ;; Name + metadata !1, ;; File + i32 0, ;; Line number + i64 8, ;; Size in Bits + i64 8, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 2 ;; Encoding + } + +char +^^^^ + +.. code-block:: llvm + + !2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"char", ;; Name + metadata !1, ;; File + i32 0, ;; Line number + i64 8, ;; Size in Bits + i64 8, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 6 ;; Encoding + } + +unsigned char +^^^^^^^^^^^^^ + +.. code-block:: llvm + + !2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"unsigned char", + metadata !1, ;; File + i32 0, ;; Line number + i64 8, ;; Size in Bits + i64 8, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 8 ;; Encoding + } + +short +^^^^^ + +.. code-block:: llvm + + !2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"short int", + metadata !1, ;; File + i32 0, ;; Line number + i64 16, ;; Size in Bits + i64 16, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 5 ;; Encoding + } + +unsigned short +^^^^^^^^^^^^^^ + +.. code-block:: llvm + + !2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"short unsigned int", + metadata !1, ;; File + i32 0, ;; Line number + i64 16, ;; Size in Bits + i64 16, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 7 ;; Encoding + } + +int +^^^ + +.. code-block:: llvm + + !2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"int", ;; Name + metadata !1, ;; File + i32 0, ;; Line number + i64 32, ;; Size in Bits + i64 32, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 5 ;; Encoding + } + +unsigned int +^^^^^^^^^^^^ + +.. code-block:: llvm + + !2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"unsigned int", + metadata !1, ;; File + i32 0, ;; Line number + i64 32, ;; Size in Bits + i64 32, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 7 ;; Encoding + } + +long long +^^^^^^^^^ + +.. code-block:: llvm + + !2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"long long int", + metadata !1, ;; File + i32 0, ;; Line number + i64 64, ;; Size in Bits + i64 64, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 5 ;; Encoding + } + +unsigned long long +^^^^^^^^^^^^^^^^^^ + +.. 
code-block:: llvm + + !2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"long long unsigned int", + metadata !1, ;; File + i32 0, ;; Line number + i64 64, ;; Size in Bits + i64 64, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 7 ;; Encoding + } + +float +^^^^^ + +.. code-block:: llvm + + !2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"float", + metadata !1, ;; File + i32 0, ;; Line number + i64 32, ;; Size in Bits + i64 32, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 4 ;; Encoding + } + +double +^^^^^^ + +.. code-block:: llvm + + !2 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"double",;; Name + metadata !1, ;; File + i32 0, ;; Line number + i64 64, ;; Size in Bits + i64 64, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 4 ;; Encoding + } + +C/C++ derived types +------------------- + +Given the following as an example of C/C++ derived type: + +.. code-block:: c + + typedef const int *IntPtr; + +a C/C++ front-end would generate the following descriptors: + +.. code-block:: llvm + + ;; + ;; Define the typedef "IntPtr". + ;; + !2 = metadata !{ + i32 524310, ;; Tag + metadata !1, ;; Context + metadata !"IntPtr", ;; Name + metadata !3, ;; File + i32 0, ;; Line number + i64 0, ;; Size in bits + i64 0, ;; Align in bits + i64 0, ;; Offset in bits + i32 0, ;; Flags + metadata !4 ;; Derived From type + } + ;; + ;; Define the pointer type. + ;; + !4 = metadata !{ + i32 524303, ;; Tag + metadata !1, ;; Context + metadata !"", ;; Name + metadata !1, ;; File + i32 0, ;; Line number + i64 64, ;; Size in bits + i64 64, ;; Align in bits + i64 0, ;; Offset in bits + i32 0, ;; Flags + metadata !5 ;; Derived From type + } + ;; + ;; Define the const type. + ;; + !5 = metadata !{ + i32 524326, ;; Tag + metadata !1, ;; Context + metadata !"", ;; Name + metadata !1, ;; File + i32 0, ;; Line number + i64 32, ;; Size in bits + i64 32, ;; Align in bits + i64 0, ;; Offset in bits + i32 0, ;; Flags + metadata !6 ;; Derived From type + } + ;; + ;; Define the int type. + ;; + !6 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"int", ;; Name + metadata !1, ;; File + i32 0, ;; Line number + i64 32, ;; Size in bits + i64 32, ;; Align in bits + i64 0, ;; Offset in bits + i32 0, ;; Flags + 5 ;; Encoding + } + +C/C++ struct/union types +------------------------ + +Given the following as an example of C/C++ struct type: + +.. code-block:: c + + struct Color { + unsigned Red; + unsigned Green; + unsigned Blue; + }; + +a C/C++ front-end would generate the following descriptors: + +.. code-block:: llvm + + ;; + ;; Define basic type for unsigned int. + ;; + !5 = metadata !{ + i32 524324, ;; Tag + metadata !1, ;; Context + metadata !"unsigned int", + metadata !1, ;; File + i32 0, ;; Line number + i64 32, ;; Size in Bits + i64 32, ;; Align in Bits + i64 0, ;; Offset in Bits + i32 0, ;; Flags + i32 7 ;; Encoding + } + ;; + ;; Define composite type for struct Color. + ;; + !2 = metadata !{ + i32 524307, ;; Tag + metadata !1, ;; Context + metadata !"Color", ;; Name + metadata !1, ;; Compile unit + i32 1, ;; Line number + i64 96, ;; Size in bits + i64 32, ;; Align in bits + i64 0, ;; Offset in bits + i32 0, ;; Flags + null, ;; Derived From + metadata !3, ;; Elements + i32 0 ;; Runtime Language + } + + ;; + ;; Define the Red field. 
+ ;; + !4 = metadata !{ + i32 524301, ;; Tag + metadata !1, ;; Context + metadata !"Red", ;; Name + metadata !1, ;; File + i32 2, ;; Line number + i64 32, ;; Size in bits + i64 32, ;; Align in bits + i64 0, ;; Offset in bits + i32 0, ;; Flags + metadata !5 ;; Derived From type + } + + ;; + ;; Define the Green field. + ;; + !6 = metadata !{ + i32 524301, ;; Tag + metadata !1, ;; Context + metadata !"Green", ;; Name + metadata !1, ;; File + i32 3, ;; Line number + i64 32, ;; Size in bits + i64 32, ;; Align in bits + i64 32, ;; Offset in bits + i32 0, ;; Flags + metadata !5 ;; Derived From type + } + + ;; + ;; Define the Blue field. + ;; + !7 = metadata !{ + i32 524301, ;; Tag + metadata !1, ;; Context + metadata !"Blue", ;; Name + metadata !1, ;; File + i32 4, ;; Line number + i64 32, ;; Size in bits + i64 32, ;; Align in bits + i64 64, ;; Offset in bits + i32 0, ;; Flags + metadata !5 ;; Derived From type + } + + ;; + ;; Define the array of fields used by the composite type Color. + ;; + !3 = metadata !{metadata !4, metadata !6, metadata !7} + +C/C++ enumeration types +----------------------- + +Given the following as an example of C/C++ enumeration type: + +.. code-block:: c + + enum Trees { + Spruce = 100, + Oak = 200, + Maple = 300 + }; + +a C/C++ front-end would generate the following descriptors: + +.. code-block:: llvm + + ;; + ;; Define composite type for enum Trees + ;; + !2 = metadata !{ + i32 524292, ;; Tag + metadata !1, ;; Context + metadata !"Trees", ;; Name + metadata !1, ;; File + i32 1, ;; Line number + i64 32, ;; Size in bits + i64 32, ;; Align in bits + i64 0, ;; Offset in bits + i32 0, ;; Flags + null, ;; Derived From type + metadata !3, ;; Elements + i32 0 ;; Runtime language + } + + ;; + ;; Define the array of enumerators used by composite type Trees. + ;; + !3 = metadata !{metadata !4, metadata !5, metadata !6} + + ;; + ;; Define Spruce enumerator. + ;; + !4 = metadata !{i32 524328, metadata !"Spruce", i64 100} + + ;; + ;; Define Oak enumerator. + ;; + !5 = metadata !{i32 524328, metadata !"Oak", i64 200} + + ;; + ;; Define Maple enumerator. + ;; + !6 = metadata !{i32 524328, metadata !"Maple", i64 300} + +Debugging information format +============================ + +Debugging Information Extension for Objective C Properties +---------------------------------------------------------- + +Introduction +^^^^^^^^^^^^ + +Objective C provides a simpler way to declare and define accessor methods using +declared properties. The language provides features to declare a property and +to let compiler synthesize accessor methods. + +The debugger lets developer inspect Objective C interfaces and their instance +variables and class variables. However, the debugger does not know anything +about the properties defined in Objective C interfaces. The debugger consumes +information generated by compiler in DWARF format. The format does not support +encoding of Objective C properties. This proposal describes DWARF extensions to +encode Objective C properties, which the debugger can use to let developers +inspect Objective C properties. + +Proposal +^^^^^^^^ + +Objective C properties exist separately from class members. A property can be +defined only by "setter" and "getter" selectors, and be calculated anew on each +access. Or a property can just be a direct access to some declared ivar. 
+Finally it can have an ivar "automatically synthesized" for it by the compiler, +in which case the property can be referred to in user code directly using the +standard C dereference syntax as well as through the property "dot" syntax, but +there is no entry in the ``@interface`` declaration corresponding to this ivar. + +To facilitate debugging, these properties we will add a new DWARF TAG into the +``DW_TAG_structure_type`` definition for the class to hold the description of a +given property, and a set of DWARF attributes that provide said description. +The property tag will also contain the name and declared type of the property. + +If there is a related ivar, there will also be a DWARF property attribute placed +in the ``DW_TAG_member`` DIE for that ivar referring back to the property TAG +for that property. And in the case where the compiler synthesizes the ivar +directly, the compiler is expected to generate a ``DW_TAG_member`` for that +ivar (with the ``DW_AT_artificial`` set to 1), whose name will be the name used +to access this ivar directly in code, and with the property attribute pointing +back to the property it is backing. + +The following examples will serve as illustration for our discussion: + +.. code-block:: objc + + @interface I1 { + int n2; + } + + @property int p1; + @property int p2; + @end + + @implementation I1 + @synthesize p1; + @synthesize p2 = n2; + @end + +This produces the following DWARF (this is a "pseudo dwarfdump" output): + +.. code-block:: none + + 0x00000100: TAG_structure_type [7] * + AT_APPLE_runtime_class( 0x10 ) + AT_name( "I1" ) + AT_decl_file( "Objc_Property.m" ) + AT_decl_line( 3 ) + + 0x00000110 TAG_APPLE_property + AT_name ( "p1" ) + AT_type ( {0x00000150} ( int ) ) + + 0x00000120: TAG_APPLE_property + AT_name ( "p2" ) + AT_type ( {0x00000150} ( int ) ) + + 0x00000130: TAG_member [8] + AT_name( "_p1" ) + AT_APPLE_property ( {0x00000110} "p1" ) + AT_type( {0x00000150} ( int ) ) + AT_artificial ( 0x1 ) + + 0x00000140: TAG_member [8] + AT_name( "n2" ) + AT_APPLE_property ( {0x00000120} "p2" ) + AT_type( {0x00000150} ( int ) ) + + 0x00000150: AT_type( ( int ) ) + +Note, the current convention is that the name of the ivar for an +auto-synthesized property is the name of the property from which it derives +with an underscore prepended, as is shown in the example. But we actually +don't need to know this convention, since we are given the name of the ivar +directly. + +Also, it is common practice in ObjC to have different property declarations in +the @interface and @implementation - e.g. to provide a read-only property in +the interface,and a read-write interface in the implementation. In that case, +the compiler should emit whichever property declaration will be in force in the +current translation unit. + +Developers can decorate a property with attributes which are encoded using +``DW_AT_APPLE_property_attribute``. + +.. code-block:: objc + + @property (readonly, nonatomic) int pr; + +.. code-block:: none + + TAG_APPLE_property [8] + AT_name( "pr" ) + AT_type ( {0x00000147} (int) ) + AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic) + +The setter and getter method names are attached to the property using +``DW_AT_APPLE_property_setter`` and ``DW_AT_APPLE_property_getter`` attributes. + +.. 
code-block:: objc + + @interface I1 + @property (setter=myOwnP3Setter:) int p3; + -(void)myOwnP3Setter:(int)a; + @end + + @implementation I1 + @synthesize p3; + -(void)myOwnP3Setter:(int)a{ } + @end + +The DWARF for this would be: + +.. code-block:: none + + 0x000003bd: TAG_structure_type [7] * + AT_APPLE_runtime_class( 0x10 ) + AT_name( "I1" ) + AT_decl_file( "Objc_Property.m" ) + AT_decl_line( 3 ) + + 0x000003cd TAG_APPLE_property + AT_name ( "p3" ) + AT_APPLE_property_setter ( "myOwnP3Setter:" ) + AT_type( {0x00000147} ( int ) ) + + 0x000003f3: TAG_member [8] + AT_name( "_p3" ) + AT_type ( {0x00000147} ( int ) ) + AT_APPLE_property ( {0x000003cd} ) + AT_artificial ( 0x1 ) + +New DWARF Tags +^^^^^^^^^^^^^^ + ++-----------------------+--------+ +| TAG | Value | ++=======================+========+ +| DW_TAG_APPLE_property | 0x4200 | ++-----------------------+--------+ + +New DWARF Attributes +^^^^^^^^^^^^^^^^^^^^ + ++--------------------------------+--------+-----------+ +| Attribute | Value | Classes | ++================================+========+===========+ +| DW_AT_APPLE_property | 0x3fed | Reference | ++--------------------------------+--------+-----------+ +| DW_AT_APPLE_property_getter | 0x3fe9 | String | ++--------------------------------+--------+-----------+ +| DW_AT_APPLE_property_setter | 0x3fea | String | ++--------------------------------+--------+-----------+ +| DW_AT_APPLE_property_attribute | 0x3feb | Constant | ++--------------------------------+--------+-----------+ + +New DWARF Constants +^^^^^^^^^^^^^^^^^^^ + ++--------------------------------+-------+ +| Name | Value | ++================================+=======+ +| DW_AT_APPLE_PROPERTY_readonly | 0x1 | ++--------------------------------+-------+ +| DW_AT_APPLE_PROPERTY_readwrite | 0x2 | ++--------------------------------+-------+ +| DW_AT_APPLE_PROPERTY_assign | 0x4 | ++--------------------------------+-------+ +| DW_AT_APPLE_PROPERTY_retain | 0x8 | ++--------------------------------+-------+ +| DW_AT_APPLE_PROPERTY_copy | 0x10 | ++--------------------------------+-------+ +| DW_AT_APPLE_PROPERTY_nonatomic | 0x20 | ++--------------------------------+-------+ + +Name Accelerator Tables +----------------------- + +Introduction +^^^^^^^^^^^^ + +The "``.debug_pubnames``" and "``.debug_pubtypes``" formats are not what a +debugger needs. The "``pub``" in the section name indicates that the entries +in the table are publicly visible names only. This means no static or hidden +functions show up in the "``.debug_pubnames``". No static variables or private +class variables are in the "``.debug_pubtypes``". Many compilers add different +things to these tables, so we can't rely upon the contents between gcc, icc, or +clang. + +The typical query given by users tends not to match up with the contents of +these tables. For example, the DWARF spec states that "In the case of the name +of a function member or static data member of a C++ structure, class or union, +the name presented in the "``.debug_pubnames``" section is not the simple name +given by the ``DW_AT_name attribute`` of the referenced debugging information +entry, but rather the fully qualified name of the data or function member." +So the only names in these tables for complex C++ entries is a fully +qualified name. Debugger users tend not to enter their search strings as +"``a::b::c(int,const Foo&) const``", but rather as "``c``", "``b::c``" , or +"``a::b::c``". 
So the name entered in the name table must be demangled in
+order to chop it up appropriately, and additional names must be manually
+entered into the table to make it effective as a name lookup table for
+debuggers to use.
+
+All debuggers currently ignore the "``.debug_pubnames``" table as a result of
+its inconsistent and useless public-only name content, making it a waste of
+space in the object file. These tables, when they are written to disk, are not
+sorted in any way, leaving every debugger to do its own parsing and sorting.
+These tables also include an inlined copy of the string values in the table
+itself, making the tables much larger than they need to be on disk, especially
+for large C++ programs.
+
+Can't we just fix the sections by adding all of the names we need to this
+table? No, because that is not what the tables are defined to contain and we
+won't know the difference between the old bad tables and the new good tables.
+At best we could make our own renamed sections that contain all of the data we
+need.
+
+These tables are also insufficient for what a debugger like LLDB needs. LLDB
+uses clang for its expression parsing where LLDB acts as a PCH. LLDB is then
+often asked to look for type "``foo``" or namespace "``bar``", or list items in
+namespace "``baz``". Namespaces are not included in the pubnames or pubtypes
+tables. Since clang asks a lot of questions when it is parsing an expression,
+we need to be very fast when looking up names, as it happens a lot. Having new
+accelerator tables that are optimized for very quick lookups will benefit this
+type of debugging experience greatly.
+
+We would like to generate name lookup tables that can be mapped into memory
+from disk, and used as is, with little or no up-front parsing. We would also
+like to be able to control the exact content of these different tables so they
+contain exactly what we need. The Name Accelerator Tables were designed to fix
+these issues. In order to solve these issues we need to:
+
+* Have a format that can be mapped into memory from disk and used as is
+* Lookups should be very fast
+* Extensible table format so these tables can be made by many producers
+* Contain all of the names needed for typical lookups out of the box
+* Strict rules for the contents of tables
+
+Table size is important and the accelerator table format should allow the reuse
+of strings from common string tables so the strings for the names are not
+duplicated. We also want to make sure the table is ready to be used as-is by
+simply mapping the table into memory with minimal header parsing.
+
+The name lookups need to be fast and optimized for the kinds of lookups that
+debuggers tend to do. Optimally we would like to touch as few parts of the
+mapped table as possible when doing a name lookup and be able to quickly find
+the name entry we are looking for, or discover there are no matches. In the
+case of debuggers we optimize for lookups that fail most of the time.
+
+Each table that is defined should have strict rules on exactly what is in the
+accelerator tables, and those rules should be documented so clients can rely
+on the content.
+
+Hash Tables
+^^^^^^^^^^^
+
+Standard Hash Tables
+""""""""""""""""""""
+
+Typical hash tables have a header, buckets, and each bucket points to the
+bucket contents:
+
+.. code-block:: none
+
+  .------------.
+  |  HEADER    |
+  |------------|
+  |  BUCKETS   |
+  |------------|
+  |  DATA      |
+  `------------'
+
+The BUCKETS are an array of offsets to DATA for each hash:
+
+.. code-block:: none
+
+  .------------.
+ | 0x00001000 | BUCKETS[0] + | 0x00002000 | BUCKETS[1] + | 0x00002200 | BUCKETS[2] + | 0x000034f0 | BUCKETS[3] + | | ... + | 0xXXXXXXXX | BUCKETS[n_buckets] + '------------' + +So for ``bucket[3]`` in the example above, we have an offset into the table +0x000034f0 which points to a chain of entries for the bucket. Each bucket must +contain a next pointer, full 32 bit hash value, the string itself, and the data +for the current string value. + +.. code-block:: none + + .------------. + 0x000034f0: | 0x00003500 | next pointer + | 0x12345678 | 32 bit hash + | "erase" | string value + | data[n] | HashData for this bucket + |------------| + 0x00003500: | 0x00003550 | next pointer + | 0x29273623 | 32 bit hash + | "dump" | string value + | data[n] | HashData for this bucket + |------------| + 0x00003550: | 0x00000000 | next pointer + | 0x82638293 | 32 bit hash + | "main" | string value + | data[n] | HashData for this bucket + `------------' + +The problem with this layout for debuggers is that we need to optimize for the +negative lookup case where the symbol we're searching for is not present. So +if we were to lookup "``printf``" in the table above, we would make a 32 hash +for "``printf``", it might match ``bucket[3]``. We would need to go to the +offset 0x000034f0 and start looking to see if our 32 bit hash matches. To do +so, we need to read the next pointer, then read the hash, compare it, and skip +to the next bucket. Each time we are skipping many bytes in memory and +touching new cache pages just to do the compare on the full 32 bit hash. All +of these accesses then tell us that we didn't have a match. + +Name Hash Tables +"""""""""""""""" + +To solve the issues mentioned above we have structured the hash tables a bit +differently: a header, buckets, an array of all unique 32 bit hash values, +followed by an array of hash value data offsets, one for each hash value, then +the data for all hash values: + +.. code-block:: none + + .-------------. + | HEADER | + |-------------| + | BUCKETS | + |-------------| + | HASHES | + |-------------| + | OFFSETS | + |-------------| + | DATA | + `-------------' + +The ``BUCKETS`` in the name tables are an index into the ``HASHES`` array. By +making all of the full 32 bit hash values contiguous in memory, we allow +ourselves to efficiently check for a match while touching as little memory as +possible. Most often checking the 32 bit hash values is as far as the lookup +goes. If it does match, it usually is a match with no collisions. So for a +table with "``n_buckets``" buckets, and "``n_hashes``" unique 32 bit hash +values, we can clarify the contents of the ``BUCKETS``, ``HASHES`` and +``OFFSETS`` as: + +.. code-block:: none + + .-------------------------. + | HEADER.magic | uint32_t + | HEADER.version | uint16_t + | HEADER.hash_function | uint16_t + | HEADER.bucket_count | uint32_t + | HEADER.hashes_count | uint32_t + | HEADER.header_data_len | uint32_t + | HEADER_DATA | HeaderData + |-------------------------| + | BUCKETS | uint32_t[bucket_count] // 32 bit hash indexes + |-------------------------| + | HASHES | uint32_t[hashes_count] // 32 bit hash values + |-------------------------| + | OFFSETS | uint32_t[hashes_count] // 32 bit offsets to hash value data + |-------------------------| + | ALL HASH DATA | + `-------------------------' + +So taking the exact same data from the standard hash example above we end up +with: + +.. code-block:: none + + .------------. 
+ | HEADER | + |------------| + | 0 | BUCKETS[0] + | 2 | BUCKETS[1] + | 5 | BUCKETS[2] + | 6 | BUCKETS[3] + | | ... + | ... | BUCKETS[n_buckets] + |------------| + | 0x........ | HASHES[0] + | 0x........ | HASHES[1] + | 0x........ | HASHES[2] + | 0x........ | HASHES[3] + | 0x........ | HASHES[4] + | 0x........ | HASHES[5] + | 0x12345678 | HASHES[6] hash for BUCKETS[3] + | 0x29273623 | HASHES[7] hash for BUCKETS[3] + | 0x82638293 | HASHES[8] hash for BUCKETS[3] + | 0x........ | HASHES[9] + | 0x........ | HASHES[10] + | 0x........ | HASHES[11] + | 0x........ | HASHES[12] + | 0x........ | HASHES[13] + | 0x........ | HASHES[n_hashes] + |------------| + | 0x........ | OFFSETS[0] + | 0x........ | OFFSETS[1] + | 0x........ | OFFSETS[2] + | 0x........ | OFFSETS[3] + | 0x........ | OFFSETS[4] + | 0x........ | OFFSETS[5] + | 0x000034f0 | OFFSETS[6] offset for BUCKETS[3] + | 0x00003500 | OFFSETS[7] offset for BUCKETS[3] + | 0x00003550 | OFFSETS[8] offset for BUCKETS[3] + | 0x........ | OFFSETS[9] + | 0x........ | OFFSETS[10] + | 0x........ | OFFSETS[11] + | 0x........ | OFFSETS[12] + | 0x........ | OFFSETS[13] + | 0x........ | OFFSETS[n_hashes] + |------------| + | | + | | + | | + | | + | | + |------------| + 0x000034f0: | 0x00001203 | .debug_str ("erase") + | 0x00000004 | A 32 bit array count - number of HashData with name "erase" + | 0x........ | HashData[0] + | 0x........ | HashData[1] + | 0x........ | HashData[2] + | 0x........ | HashData[3] + | 0x00000000 | String offset into .debug_str (terminate data for hash) + |------------| + 0x00003500: | 0x00001203 | String offset into .debug_str ("collision") + | 0x00000002 | A 32 bit array count - number of HashData with name "collision" + | 0x........ | HashData[0] + | 0x........ | HashData[1] + | 0x00001203 | String offset into .debug_str ("dump") + | 0x00000003 | A 32 bit array count - number of HashData with name "dump" + | 0x........ | HashData[0] + | 0x........ | HashData[1] + | 0x........ | HashData[2] + | 0x00000000 | String offset into .debug_str (terminate data for hash) + |------------| + 0x00003550: | 0x00001203 | String offset into .debug_str ("main") + | 0x00000009 | A 32 bit array count - number of HashData with name "main" + | 0x........ | HashData[0] + | 0x........ | HashData[1] + | 0x........ | HashData[2] + | 0x........ | HashData[3] + | 0x........ | HashData[4] + | 0x........ | HashData[5] + | 0x........ | HashData[6] + | 0x........ | HashData[7] + | 0x........ | HashData[8] + | 0x00000000 | String offset into .debug_str (terminate data for hash) + `------------' + +So we still have all of the same data, we just organize it more efficiently for +debugger lookup. If we repeat the same "``printf``" lookup from above, we +would hash "``printf``" and find it matches ``BUCKETS[3]`` by taking the 32 bit +hash value and modulo it by ``n_buckets``. ``BUCKETS[3]`` contains "6" which +is the index into the ``HASHES`` table. We would then compare any consecutive +32 bit hashes values in the ``HASHES`` array as long as the hashes would be in +``BUCKETS[3]``. We do this by verifying that each subsequent hash value modulo +``n_buckets`` is still 3. In the case of a failed lookup we would access the +memory for ``BUCKETS[3]``, and then compare a few consecutive 32 bit hashes +before we know that we have no match. We don't end up marching through +multiple words of memory and we really keep the number of processor data cache +lines being accessed as small as possible. + +The string hash that is used for these lookup tables is the Daniel J. 
+Bernstein hash which is also used in the ELF ``GNU_HASH`` sections. It is a +very good hash for all kinds of names in programs with very few hash +collisions. + +Empty buckets are designated by using an invalid hash index of ``UINT32_MAX``. + +Details +^^^^^^^ + +These name hash tables are designed to be generic where specializations of the +table get to define additional data that goes into the header ("``HeaderData``"), +how the string value is stored ("``KeyType``") and the content of the data for each +hash value. + +Header Layout +""""""""""""" + +The header has a fixed part, and the specialized part. The exact format of the +header is: + +.. code-block:: c + + struct Header + { + uint32_t magic; // 'HASH' magic value to allow endian detection + uint16_t version; // Version number + uint16_t hash_function; // The hash function enumeration that was used + uint32_t bucket_count; // The number of buckets in this hash table + uint32_t hashes_count; // The total number of unique hash values and hash data offsets in this table + uint32_t header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment + // Specifically the length of the following HeaderData field - this does not + // include the size of the preceding fields + HeaderData header_data; // Implementation specific header data + }; + +The header starts with a 32 bit "``magic``" value which must be ``'HASH'`` +encoded as an ASCII integer. This allows the detection of the start of the +hash table and also allows the table's byte order to be determined so the table +can be correctly extracted. The "``magic``" value is followed by a 16 bit +``version`` number which allows the table to be revised and modified in the +future. The current version number is 1. ``hash_function`` is a ``uint16_t`` +enumeration that specifies which hash function was used to produce this table. +The current values for the hash function enumerations include: + +.. code-block:: c + + enum HashFunctionType + { + eHashFunctionDJB = 0u, // Daniel J Bernstein hash function + }; + +``bucket_count`` is a 32 bit unsigned integer that represents how many buckets +are in the ``BUCKETS`` array. ``hashes_count`` is the number of unique 32 bit +hash values that are in the ``HASHES`` array, and is the same number of offsets +are contained in the ``OFFSETS`` array. ``header_data_len`` specifies the size +in bytes of the ``HeaderData`` that is filled in by specialized versions of +this table. + +Fixed Lookup +"""""""""""" + +The header is followed by the buckets, hashes, offsets, and hash value data. + +.. code-block:: c + + struct FixedTable + { + uint32_t buckets[Header.bucket_count]; // An array of hash indexes into the "hashes[]" array below + uint32_t hashes [Header.hashes_count]; // Every unique 32 bit hash for the entire table is in this table + uint32_t offsets[Header.hashes_count]; // An offset that corresponds to each item in the "hashes[]" array above + }; + +``buckets`` is an array of 32 bit indexes into the ``hashes`` array. The +``hashes`` array contains all of the 32 bit hash values for all names in the +hash table. Each hash in the ``hashes`` table has an offset in the ``offsets`` +array that points to the data for the hash value. + +This table setup makes it very easy to repurpose these tables to contain +different data, while keeping the lookup mechanism the same for all tables. +This layout also makes it possible to save the table to disk and map it in +later and do very efficient name lookups with little or no parsing. 
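+
+To make the lookup mechanism concrete, the following is a minimal C++ sketch of
+how a reader might walk the fixed part of such a table once it has been mapped
+into memory. The ``TableView`` helper and the host-endian, already-validated
+view it assumes are simplifications for illustration; the layout itself mirrors
+the ``Header`` and ``FixedTable`` structs above.
+
+.. code-block:: c++
+
+  #include <stdint.h>
+
+  // Simplified, host-endian view of a mapped table (hypothetical helper,
+  // not part of the format itself).
+  struct TableView {
+    uint32_t BucketCount;
+    uint32_t HashesCount;
+    const uint32_t *Buckets;  // BUCKETS: indexes into Hashes
+    const uint32_t *Hashes;   // HASHES:  unique 32 bit hash values
+    const uint32_t *Offsets;  // OFFSETS: offsets to the hash value data
+  };
+
+  // Daniel J. Bernstein hash (h = h * 33 + c, seeded with 5381).
+  static uint32_t HashDJB(const char *Str) {
+    uint32_t H = 5381;
+    for (; *Str; ++Str)
+      H = H * 33 + (unsigned char)*Str;
+    return H;
+  }
+
+  // Returns the offset of the hash data for Name, or 0 if no 32 bit hash
+  // matches. The caller still compares the strings in the hash data to
+  // rule out a 32 bit hash collision.
+  static uint32_t LookupName(const TableView &T, const char *Name) {
+    uint32_t Hash = HashDJB(Name);
+    uint32_t Bucket = Hash % T.BucketCount;
+    uint32_t Idx = T.Buckets[Bucket];
+    if (Idx == 0xFFFFFFFFu)  // UINT32_MAX marks an empty bucket
+      return 0;
+    // Scan the consecutive hash values that still belong to this bucket.
+    for (; Idx < T.HashesCount && T.Hashes[Idx] % T.BucketCount == Bucket; ++Idx)
+      if (T.Hashes[Idx] == Hash)
+        return T.Offsets[Idx];
+    return 0;
+  }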
+ +DWARF lookup tables can be implemented in a variety of ways and can store a lot +of information for each name. We want to make the DWARF tables extensible and +able to store the data efficiently so we have used some of the DWARF features +that enable efficient data storage to define exactly what kind of data we store +for each name. + +The ``HeaderData`` contains a definition of the contents of each HashData chunk. +We might want to store an offset to all of the debug information entries (DIEs) +for each name. To keep things extensible, we create a list of items, or +Atoms, that are contained in the data for each name. First comes the type of +the data in each atom: + +.. code-block:: c + + enum AtomType + { + eAtomTypeNULL = 0u, + eAtomTypeDIEOffset = 1u, // DIE offset, check form for encoding + eAtomTypeCUOffset = 2u, // DIE offset of the compiler unit header that contains the item in question + eAtomTypeTag = 3u, // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2 + eAtomTypeNameFlags = 4u, // Flags from enum NameFlags + eAtomTypeTypeFlags = 5u, // Flags from enum TypeFlags + }; + +The enumeration values and their meanings are: + +.. code-block:: none + + eAtomTypeNULL - a termination atom that specifies the end of the atom list + eAtomTypeDIEOffset - an offset into the .debug_info section for the DWARF DIE for this name + eAtomTypeCUOffset - an offset into the .debug_info section for the CU that contains the DIE + eAtomTypeDIETag - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is + eAtomTypeNameFlags - Flags for functions and global variables (isFunction, isInlined, isExternal...) + eAtomTypeTypeFlags - Flags for types (isCXXClass, isObjCClass, ...) + +Then we allow each atom type to define the atom type and how the data for each +atom type data is encoded: + +.. code-block:: c + + struct Atom + { + uint16_t type; // AtomType enum value + uint16_t form; // DWARF DW_FORM_XXX defines + }; + +The ``form`` type above is from the DWARF specification and defines the exact +encoding of the data for the Atom type. See the DWARF specification for the +``DW_FORM_`` definitions. + +.. code-block:: c + + struct HeaderData + { + uint32_t die_offset_base; + uint32_t atom_count; + Atoms atoms[atom_count0]; + }; + +``HeaderData`` defines the base DIE offset that should be added to any atoms +that are encoded using the ``DW_FORM_ref1``, ``DW_FORM_ref2``, +``DW_FORM_ref4``, ``DW_FORM_ref8`` or ``DW_FORM_ref_udata``. It also defines +what is contained in each ``HashData`` object -- ``Atom.form`` tells us how large +each field will be in the ``HashData`` and the ``Atom.type`` tells us how this data +should be interpreted. + +For the current implementations of the "``.apple_names``" (all functions + +globals), the "``.apple_types``" (names of all types that are defined), and +the "``.apple_namespaces``" (all namespaces), we currently set the ``Atom`` +array to be: + +.. code-block:: c + + HeaderData.atom_count = 1; + HeaderData.atoms[0].type = eAtomTypeDIEOffset; + HeaderData.atoms[0].form = DW_FORM_data4; + +This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is + encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have + multiple matching DIEs in a single file, which could come up with an inlined + function for instance. Future tables could include more information about the + DIE such as flags indicating if the DIE is a function, method, block, + or inlined. 
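+
+Because each ``Atom.form`` fixes the width of the corresponding field, a
+consumer can usually compute the size of one ``HashData`` entry once and then
+step over entries without decoding every field. A rough sketch of that
+calculation, assuming only fixed-width forms are used (the ``DW_FORM_*`` values
+are the standard DWARF constants; the helper itself is illustrative):
+
+.. code-block:: c++
+
+  #include <stdint.h>
+
+  struct Atom {
+    uint16_t type;  // AtomType enum value
+    uint16_t form;  // DWARF DW_FORM_XXX value
+  };
+
+  // Fixed-width DWARF form constants (values from the DWARF specification).
+  enum {
+    DW_FORM_data2 = 0x05, DW_FORM_data4 = 0x06, DW_FORM_data8 = 0x07,
+    DW_FORM_data1 = 0x0b, DW_FORM_ref1  = 0x11, DW_FORM_ref2  = 0x12,
+    DW_FORM_ref4  = 0x13, DW_FORM_ref8  = 0x14
+  };
+
+  // Returns the byte size of one HashData entry, or 0 if any atom uses a
+  // variable-width form (e.g. DW_FORM_ref_udata) and entries must instead
+  // be decoded field by field.
+  static unsigned HashDataSize(const Atom *Atoms, unsigned AtomCount) {
+    unsigned Size = 0;
+    for (unsigned i = 0; i != AtomCount; ++i) {
+      switch (Atoms[i].form) {
+      case DW_FORM_data1: case DW_FORM_ref1: Size += 1; break;
+      case DW_FORM_data2: case DW_FORM_ref2: Size += 2; break;
+      case DW_FORM_data4: case DW_FORM_ref4: Size += 4; break;
+      case DW_FORM_data8: case DW_FORM_ref8: Size += 8; break;
+      default: return 0;  // variable-width form; no fixed entry size
+      }
+    }
+    return Size;  // 4 bytes for the single DW_FORM_data4 atom shown above
+  }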
+ +The KeyType for the DWARF table is a 32 bit string table offset into the + ".debug_str" table. The ".debug_str" is the string table for the DWARF which + may already contain copies of all of the strings. This helps make sure, with + help from the compiler, that we reuse the strings between all of the DWARF + sections and keeps the hash table size down. Another benefit to having the + compiler generate all strings as DW_FORM_strp in the debug info, is that + DWARF parsing can be made much faster. + +After a lookup is made, we get an offset into the hash data. The hash data + needs to be able to deal with 32 bit hash collisions, so the chunk of data + at the offset in the hash data consists of a triple: + +.. code-block:: c + + uint32_t str_offset + uint32_t hash_data_count + HashData[hash_data_count] + +If "str_offset" is zero, then the bucket contents are done. 99.9% of the + hash data chunks contain a single item (no 32 bit hash collision): + +.. code-block:: none + + .------------. + | 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main") + | 0x00000004 | uint32_t HashData count + | 0x........ | uint32_t HashData[0] DIE offset + | 0x........ | uint32_t HashData[1] DIE offset + | 0x........ | uint32_t HashData[2] DIE offset + | 0x........ | uint32_t HashData[3] DIE offset + | 0x00000000 | uint32_t KeyType (end of hash chain) + `------------' + +If there are collisions, you will have multiple valid string offsets: + +.. code-block:: none + + .------------. + | 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main") + | 0x00000004 | uint32_t HashData count + | 0x........ | uint32_t HashData[0] DIE offset + | 0x........ | uint32_t HashData[1] DIE offset + | 0x........ | uint32_t HashData[2] DIE offset + | 0x........ | uint32_t HashData[3] DIE offset + | 0x00002023 | uint32_t KeyType (.debug_str[0x0002023] => "print") + | 0x00000002 | uint32_t HashData count + | 0x........ | uint32_t HashData[0] DIE offset + | 0x........ | uint32_t HashData[1] DIE offset + | 0x00000000 | uint32_t KeyType (end of hash chain) + `------------' + +Current testing with real world C++ binaries has shown that there is around 1 +32 bit hash collision per 100,000 name entries. + +Contents +^^^^^^^^ + +As we said, we want to strictly define exactly what is included in the +different tables. For DWARF, we have 3 tables: "``.apple_names``", +"``.apple_types``", and "``.apple_namespaces``". + +"``.apple_names``" sections should contain an entry for each DWARF DIE whose +``DW_TAG`` is a ``DW_TAG_label``, ``DW_TAG_inlined_subroutine``, or +``DW_TAG_subprogram`` that has address attributes: ``DW_AT_low_pc``, +``DW_AT_high_pc``, ``DW_AT_ranges`` or ``DW_AT_entry_pc``. It also contains +``DW_TAG_variable`` DIEs that have a ``DW_OP_addr`` in the location (global and +static variables). All global and static variables should be included, +including those scoped within functions and classes. For example using the +following code: + +.. code-block:: c + + static int var = 0; + + void f () + { + static int var = 0; + } + +Both of the static ``var`` variables would be included in the table. All +functions should emit both their full names and their basenames. For C or C++, +the full name is the mangled name (if available) which is usually in the +``DW_AT_MIPS_linkage_name`` attribute, and the ``DW_AT_name`` contains the +function basename. 
If global or static variables have a mangled name in a +``DW_AT_MIPS_linkage_name`` attribute, this should be emitted along with the +simple name found in the ``DW_AT_name`` attribute. + +"``.apple_types``" sections should contain an entry for each DWARF DIE whose +tag is one of: + +* DW_TAG_array_type +* DW_TAG_class_type +* DW_TAG_enumeration_type +* DW_TAG_pointer_type +* DW_TAG_reference_type +* DW_TAG_string_type +* DW_TAG_structure_type +* DW_TAG_subroutine_type +* DW_TAG_typedef +* DW_TAG_union_type +* DW_TAG_ptr_to_member_type +* DW_TAG_set_type +* DW_TAG_subrange_type +* DW_TAG_base_type +* DW_TAG_const_type +* DW_TAG_constant +* DW_TAG_file_type +* DW_TAG_namelist +* DW_TAG_packed_type +* DW_TAG_volatile_type +* DW_TAG_restrict_type +* DW_TAG_interface_type +* DW_TAG_unspecified_type +* DW_TAG_shared_type + +Only entries with a ``DW_AT_name`` attribute are included, and the entry must +not be a forward declaration (``DW_AT_declaration`` attribute with a non-zero +value). For example, using the following code: + +.. code-block:: c + + int main () + { + int *b = 0; + return *b; + } + +We get a few type DIEs: + +.. code-block:: none + + 0x00000067: TAG_base_type [5] + AT_encoding( DW_ATE_signed ) + AT_name( "int" ) + AT_byte_size( 0x04 ) + + 0x0000006e: TAG_pointer_type [6] + AT_type( {0x00000067} ( int ) ) + AT_byte_size( 0x08 ) + +The DW_TAG_pointer_type is not included because it does not have a ``DW_AT_name``. + +"``.apple_namespaces``" section should contain all ``DW_TAG_namespace`` DIEs. +If we run into a namespace that has no name this is an anonymous namespace, and +the name should be output as "``(anonymous namespace)``" (without the quotes). +Why? This matches the output of the ``abi::cxa_demangle()`` that is in the +standard C++ library that demangles mangled names. + + +Language Extensions and File Format Changes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Objective-C Extensions +"""""""""""""""""""""" + +"``.apple_objc``" section should contain all ``DW_TAG_subprogram`` DIEs for an +Objective-C class. The name used in the hash table is the name of the +Objective-C class itself. If the Objective-C class has a category, then an +entry is made for both the class name without the category, and for the class +name with the category. So if we have a DIE at offset 0x1234 with a name of +method "``-[NSString(my_additions) stringWithSpecialString:]``", we would add +an entry for "``NSString``" that points to DIE 0x1234, and an entry for +"``NSString(my_additions)``" that points to 0x1234. This allows us to quickly +track down all Objective-C methods for an Objective-C class when doing +expressions. It is needed because of the dynamic nature of Objective-C where +anyone can add methods to a class. The DWARF for Objective-C methods is also +emitted differently from C++ classes where the methods are not usually +contained in the class definition, they are scattered about across one or more +compile units. Categories can also be defined in different shared libraries. +So we need to be able to quickly find all of the methods and class functions +given the Objective-C class name, or quickly find all methods and class +functions for a class + category name. This table does not contain any +selector names, it just maps Objective-C class names (or class names + +category) to all of the methods and class functions. The selectors are added +as function basenames in the "``.debug_names``" section. 
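+
+For illustration, the string handling needed to derive those two class keys
+from a method's full name is roughly the following. This is a simplified
+sketch and not the actual producer code; it only handles names that already
+look like Objective-C method names.
+
+.. code-block:: c++
+
+  #include <string>
+  #include <utility>
+
+  // Given "-[NSString(my_additions) stringWithSpecialString:]", return the
+  // two names that should key the .apple_objc entries for this method DIE:
+  // "NSString" and "NSString(my_additions)". Returns empty strings for
+  // input that does not look like an Objective-C method name.
+  static std::pair<std::string, std::string>
+  GetObjCClassKeys(const std::string &FullName) {
+    if (FullName.size() < 4 || FullName[1] != '[')
+      return std::make_pair(std::string(), std::string());
+    std::string::size_type Space = FullName.find(' ');
+    if (Space == std::string::npos)
+      return std::make_pair(std::string(), std::string());
+    // Class name, possibly including the "(category)" suffix.
+    std::string WithCategory = FullName.substr(2, Space - 2);
+    std::string::size_type Paren = WithCategory.find('(');
+    std::string ClassOnly = Paren == std::string::npos
+                                ? WithCategory
+                                : WithCategory.substr(0, Paren);
+    // Both keys point at the same DIE; they only differ when the method
+    // belongs to a category.
+    return std::make_pair(ClassOnly, WithCategory);
+  }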
+ +In the "``.apple_names``" section for Objective-C functions, the full name is +the entire function name with the brackets ("``-[NSString +stringWithCString:]``") and the basename is the selector only +("``stringWithCString:``"). + +Mach-O Changes +"""""""""""""" + +The sections names for the apple hash tables are for non mach-o files. For +mach-o files, the sections should be contained in the ``__DWARF`` segment with +names as follows: + +* "``.apple_names``" -> "``__apple_names``" +* "``.apple_types``" -> "``__apple_types``" +* "``.apple_namespaces``" -> "``__apple_namespac``" (16 character limit) +* "``.apple_objc``" -> "``__apple_objc``" + diff --git a/docs/SphinxQuickstartTemplate.rst b/docs/SphinxQuickstartTemplate.rst index 75d916368e..640df63db1 100644 --- a/docs/SphinxQuickstartTemplate.rst +++ b/docs/SphinxQuickstartTemplate.rst @@ -116,6 +116,26 @@ For a shell session, use a ``bash`` code block: If you need to show LLVM IR use the ``llvm`` code block. +You can show preformatted text without any syntax highlighting like this: + +:: + + . + +:. + ..:: :: + .++:+:: ::+:.:. + .:+ : + ::.::..:: .+. + ..:+ :: : + ......+:. .. + :++. .. : + .+:::+:: : + .. . .+ :: + +.: .::+. + ...+. .: . + .++:.. + ... + Hopefully you won't need to be this deep """""""""""""""""""""""""""""""""""""""" diff --git a/docs/SystemLibrary.html b/docs/SystemLibrary.html deleted file mode 100644 index 1ef221fa27..0000000000 --- a/docs/SystemLibrary.html +++ /dev/null @@ -1,316 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> -<html> -<head> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <title>System Library</title> - <link rel="stylesheet" href="_static/llvm.css" type="text/css"> -</head> -<body> - -<h1>System Library</h1> -<ul> - <li><a href="#abstract">Abstract</a></li> - <li><a href="#requirements">Keeping LLVM Portable</a> - <ol> - <li><a href="#headers">Don't Include System Headers</a></li> - <li><a href="#expose">Don't Expose System Headers</a></li> - <li><a href="#c_headers">Allow Standard C Header Files</a></li> - <li><a href="#cpp_headers">Allow Standard C++ Header Files</a></li> - <li><a href="#highlev">High-Level Interface</a></li> - <li><a href="#nofunc">No Exposed Functions</a></li> - <li><a href="#nodata">No Exposed Data</a></li> - <li><a href="#nodupl">No Duplicate Implementations</a></li> - <li><a href="#nounused">No Unused Functionality</a></li> - <li><a href="#virtuals">No Virtual Methods</a></li> - <li><a href="#softerrors">Minimize Soft Errors</a></li> - <li><a href="#throw_spec">No throw() Specifications</a></li> - <li><a href="#organization">Code Organization</a></li> - <li><a href="#semantics">Consistent Semantics</a></li> - <li><a href="#bug">Tracking Bugzilla Bug: 351</a></li> - </ol></li> -</ul> - -<div class="doc_author"> - <p>Written by <a href="mailto:rspencer@x10sys.com">Reid Spencer</a></p> -</div> - - -<!-- *********************************************************************** --> -<h2><a name="abstract">Abstract</a></h2> -<div> - <p>This document provides some details on LLVM's System Library, located in - the source at <tt>lib/System</tt> and <tt>include/llvm/System</tt>. The - library's purpose is to shield LLVM from the differences between operating - systems for the few services LLVM needs from the operating system. Much of - LLVM is written using portability features of standard C++. 
However, in a few - areas, system dependent facilities are needed and the System Library is the - wrapper around those system calls.</p> - <p>By centralizing LLVM's use of operating system interfaces, we make it - possible for the LLVM tool chain and runtime libraries to be more easily - ported to new platforms since (theoretically) only <tt>lib/System</tt> needs - to be ported. This library also unclutters the rest of LLVM from #ifdef use - and special cases for specific operating systems. Such uses are replaced - with simple calls to the interfaces provided in <tt>include/llvm/System</tt>. - </p> - <p>Note that the System Library is not intended to be a complete operating - system wrapper (such as the Adaptive Communications Environment (ACE) or - Apache Portable Runtime (APR)), but only provides the functionality necessary - to support LLVM. - <p>The System Library was written by Reid Spencer who formulated the - design based on similar work originating from the eXtensible Programming - System (XPS). Several people helped with the effort; especially, - Jeff Cohen and Henrik Bach on the Win32 port.</p> -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="requirements">Keeping LLVM Portable</a> -</h2> -<div> - <p>In order to keep LLVM portable, LLVM developers should adhere to a set of - portability rules associated with the System Library. Adherence to these rules - should help the System Library achieve its goal of shielding LLVM from the - variations in operating system interfaces and doing so efficiently. The - following sections define the rules needed to fulfill this objective.</p> - -<!-- ======================================================================= --> -<h3><a name="headers">Don't Include System Headers</a></h3> -<div> - <p>Except in <tt>lib/System</tt>, no LLVM source code should directly - <tt>#include</tt> a system header. Care has been taken to remove all such - <tt>#includes</tt> from LLVM while <tt>lib/System</tt> was being - developed. Specifically this means that header files like "unistd.h", - "windows.h", "stdio.h", and "string.h" are forbidden to be included by LLVM - source code outside the implementation of <tt>lib/System</tt>.</p> - <p>To obtain system-dependent functionality, existing interfaces to the system - found in <tt>include/llvm/System</tt> should be used. If an appropriate - interface is not available, it should be added to <tt>include/llvm/System</tt> - and implemented in <tt>lib/System</tt> for all supported platforms.</p> -</div> - -<!-- ======================================================================= --> -<h3><a name="expose">Don't Expose System Headers</a></h3> -<div> - <p>The System Library must shield LLVM from <em>all</em> system headers. To - obtain system level functionality, LLVM source must - <tt>#include "llvm/System/Thing.h"</tt> and nothing else. This means that - <tt>Thing.h</tt> cannot expose any system header files. This protects LLVM - from accidentally using system specific functionality and only allows it - via the <tt>lib/System</tt> interface.</p> -</div> - -<!-- ======================================================================= --> -<h3><a name="c_headers">Use Standard C Headers</a></h3> -<div> - <p>The <em>standard</em> C headers (the ones beginning with "c") are allowed - to be exposed through the <tt>lib/System</tt> interface. These headers and - the things they declare are considered to be platform agnostic. 
LLVM source - files may include them directly or obtain their inclusion through - <tt>lib/System</tt> interfaces.</p> -</div> - -<!-- ======================================================================= --> -<h3><a name="cpp_headers">Use Standard C++ Headers</a></h3> -<div> - <p>The <em>standard</em> C++ headers from the standard C++ library and - standard template library may be exposed through the <tt>lib/System</tt> - interface. These headers and the things they declare are considered to be - platform agnostic. LLVM source files may include them or obtain their - inclusion through lib/System interfaces.</p> -</div> - -<!-- ======================================================================= --> -<h3><a name="highlev">High Level Interface</a></h3> -<div> - <p>The entry points specified in the interface of lib/System must be aimed at - completing some reasonably high level task needed by LLVM. We do not want to - simply wrap each operating system call. It would be preferable to wrap several - operating system calls that are always used in conjunction with one another by - LLVM.</p> - <p>For example, consider what is needed to execute a program, wait for it to - complete, and return its result code. On Unix, this involves the following - operating system calls: <tt>getenv, fork, execve,</tt> and <tt>wait</tt>. The - correct thing for lib/System to provide is a function, say - <tt>ExecuteProgramAndWait</tt>, that implements the functionality completely. - what we don't want is wrappers for the operating system calls involved.</p> - <p>There must <em>not</em> be a one-to-one relationship between operating - system calls and the System library's interface. Any such interface function - will be suspicious.</p> -</div> - -<!-- ======================================================================= --> -<h3><a name="nounused">No Unused Functionality</a></h3> -<div> - <p>There must be no functionality specified in the interface of lib/System - that isn't actually used by LLVM. We're not writing a general purpose - operating system wrapper here, just enough to satisfy LLVM's needs. And, LLVM - doesn't need much. This design goal aims to keep the lib/System interface - small and understandable which should foster its actual use and adoption.</p> -</div> - -<!-- ======================================================================= --> -<h3><a name="nodupl">No Duplicate Implementations</a></h3> -<div> - <p>The implementation of a function for a given platform must be written - exactly once. This implies that it must be possible to apply a function's - implementation to multiple operating systems if those operating systems can - share the same implementation. This rule applies to the set of operating - systems supported for a given class of operating system (e.g. Unix, Win32). - </p> -</div> - -<!-- ======================================================================= --> -<h3><a name="virtuals">No Virtual Methods</a></h3> -<div> - <p>The System Library interfaces can be called quite frequently by LLVM. In - order to make those calls as efficient as possible, we discourage the use of - virtual methods. There is no need to use inheritance for implementation - differences, it just adds complexity. The <tt>#include</tt> mechanism works - just fine.</p> -</div> - -<!-- ======================================================================= --> -<h3><a name="nofunc">No Exposed Functions</a></h3> -<div> - <p>Any functions defined by system libraries (i.e. 
not defined by lib/System) - must not be exposed through the lib/System interface, even if the header file - for that function is not exposed. This prevents inadvertent use of system - specific functionality.</p> - <p>For example, the <tt>stat</tt> system call is notorious for having - variations in the data it provides. <tt>lib/System</tt> must not declare - <tt>stat</tt> nor allow it to be declared. Instead it should provide its own - interface to discovering information about files and directories. Those - interfaces may be implemented in terms of <tt>stat</tt> but that is strictly - an implementation detail. The interface provided by the System Library must - be implemented on all platforms (even those without <tt>stat</tt>).</p> -</div> - -<!-- ======================================================================= --> -<h3><a name="nodata">No Exposed Data</a></h3> -<div> - <p>Any data defined by system libraries (i.e. not defined by lib/System) must - not be exposed through the lib/System interface, even if the header file for - that function is not exposed. As with functions, this prevents inadvertent use - of data that might not exist on all platforms.</p> -</div> - -<!-- ======================================================================= --> -<h3><a name="softerrors">Minimize Soft Errors</a></h3> -<div> - <p>Operating system interfaces will generally provide error results for every - little thing that could go wrong. In almost all cases, you can divide these - error results into two groups: normal/good/soft and abnormal/bad/hard. That - is, some of the errors are simply information like "file not found", - "insufficient privileges", etc. while other errors are much harder like - "out of space", "bad disk sector", or "system call interrupted". We'll call - the first group "<i>soft</i>" errors and the second group "<i>hard</i>" - errors.<p> - <p>lib/System must always attempt to minimize soft errors. - This is a design requirement because the - minimization of soft errors can affect the granularity and the nature of the - interface. In general, if you find that you're wanting to throw soft errors, - you must review the granularity of the interface because it is likely you're - trying to implement something that is too low level. The rule of thumb is to - provide interface functions that <em>can't</em> fail, except when faced with - hard errors.</p> - <p>For a trivial example, suppose we wanted to add an "OpenFileForWriting" - function. For many operating systems, if the file doesn't exist, attempting - to open the file will produce an error. However, lib/System should not - simply throw that error if it occurs because its a soft error. The problem - is that the interface function, OpenFileForWriting is too low level. It should - be OpenOrCreateFileForWriting. In the case of the soft "doesn't exist" error, - this function would just create it and then open it for writing.</p> - <p>This design principle needs to be maintained in lib/System because it - avoids the propagation of soft error handling throughout the rest of LLVM. 
- Hard errors will generally just cause a termination for an LLVM tool so don't - be bashful about throwing them.</p> - <p>Rules of thumb:</p> - <ol> - <li>Don't throw soft errors, only hard errors.</li> - <li>If you're tempted to throw a soft error, re-think the interface.</li> - <li>Handle internally the most common normal/good/soft error conditions - so the rest of LLVM doesn't have to.</li> - </ol> -</div> - -<!-- ======================================================================= --> -<h3><a name="throw_spec">No throw Specifications</a></h3> -<div> - <p>None of the lib/System interface functions may be declared with C++ - <tt>throw()</tt> specifications on them. This requirement makes sure that the - compiler does not insert additional exception handling code into the interface - functions. This is a performance consideration: lib/System functions are at - the bottom of many call chains and as such can be frequently called. We - need them to be as efficient as possible. However, no routines in the - system library should actually throw exceptions.</p> -</div> - -<!-- ======================================================================= --> -<h3><a name="organization">Code Organization</a></h3> -<div> - <p>Implementations of the System Library interface are separated by their - general class of operating system. Currently only Unix and Win32 classes are - defined but more could be added for other operating system classifications. - To distinguish which implementation to compile, the code in lib/System uses - the LLVM_ON_UNIX and LLVM_ON_WIN32 #defines provided via configure through the - llvm/Config/config.h file. Each source file in lib/System, after implementing - the generic (operating system independent) functionality needs to include the - correct implementation using a set of <tt>#if defined(LLVM_ON_XYZ)</tt> - directives. For example, if we had lib/System/File.cpp, we'd expect to see in - that file:</p> - <pre><tt> - #if defined(LLVM_ON_UNIX) - #include "Unix/File.cpp" - #endif - #if defined(LLVM_ON_WIN32) - #include "Win32/File.cpp" - #endif - </tt></pre> - <p>The implementation in lib/System/Unix/File.cpp should handle all Unix - variants. The implementation in lib/System/Win32/File.cpp should handle all - Win32 variants. What this does is quickly differentiate the basic class of - operating system that will provide the implementation. The specific details - for a given platform must still be determined through the use of - <tt>#ifdef</tt>.</p> -</div> - -<!-- ======================================================================= --> -<h3><a name="semantics">Consistent Semantics</a></h3> -<div> - <p>The implementation of a lib/System interface can vary drastically between - platforms. That's okay as long as the end result of the interface function - is the same. For example, a function to create a directory is pretty straight - forward on all operating system. System V IPC on the other hand isn't even - supported on all platforms. Instead of "supporting" System V IPC, lib/System - should provide an interface to the basic concept of inter-process - communications. The implementations might use System V IPC if that was - available or named pipes, or whatever gets the job done effectively for a - given operating system. In all cases, the interface and the implementation - must be semantically consistent. 
</p> -</div> - -<!-- ======================================================================= --> -<h3><a name="bug">Bug 351</a></h3> -<div> - <p>See <a href="http://llvm.org/PR351">bug 351</a> - for further details on the progress of this work</p> -</div> - -</div> - -<!-- *********************************************************************** --> - -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a> - - <a href="mailto:rspencer@x10sys.com">Reid Spencer</a><br> - <a href="http://llvm.org/">LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/SystemLibrary.rst b/docs/SystemLibrary.rst new file mode 100644 index 0000000000..88404f4d81 --- /dev/null +++ b/docs/SystemLibrary.rst @@ -0,0 +1,249 @@ +============== +System Library +============== + +.. sectionauthor:: Reid Spencer <rspencer@x10sys.com> + +Abstract +======== + +This document provides some details on LLVM's System Library, located in the +source at ``lib/System`` and ``include/llvm/System``. The library's purpose is +to shield LLVM from the differences between operating systems for the few +services LLVM needs from the operating system. Much of LLVM is written using +portability features of standard C++. However, in a few areas, system dependent +facilities are needed and the System Library is the wrapper around those system +calls. + +By centralizing LLVM's use of operating system interfaces, we make it possible +for the LLVM tool chain and runtime libraries to be more easily ported to new +platforms since (theoretically) only ``lib/System`` needs to be ported. This +library also unclutters the rest of LLVM from #ifdef use and special cases for +specific operating systems. Such uses are replaced with simple calls to the +interfaces provided in ``include/llvm/System``. + +Note that the System Library is not intended to be a complete operating system +wrapper (such as the Adaptive Communications Environment (ACE) or Apache +Portable Runtime (APR)), but only provides the functionality necessary to +support LLVM. + +The System Library was written by Reid Spencer who formulated the design based +on similar work originating from the eXtensible Programming System (XPS). +Several people helped with the effort; especially, Jeff Cohen and Henrik Bach +on the Win32 port. + +Keeping LLVM Portable +===================== + +In order to keep LLVM portable, LLVM developers should adhere to a set of +portability rules associated with the System Library. Adherence to these rules +should help the System Library achieve its goal of shielding LLVM from the +variations in operating system interfaces and doing so efficiently. The +following sections define the rules needed to fulfill this objective. + +Don't Include System Headers +---------------------------- + +Except in ``lib/System``, no LLVM source code should directly ``#include`` a +system header. Care has been taken to remove all such ``#includes`` from LLVM +while ``lib/System`` was being developed. Specifically this means that header +files like "``unistd.h``", "``windows.h``", "``stdio.h``", and "``string.h``" +are forbidden to be included by LLVM source code outside the implementation of +``lib/System``. 
+
+To obtain system-dependent functionality, existing interfaces to the system
+found in ``include/llvm/System`` should be used. If an appropriate interface is
+not available, it should be added to ``include/llvm/System`` and implemented in
+``lib/System`` for all supported platforms.
+
+Don't Expose System Headers
+---------------------------
+
+The System Library must shield LLVM from **all** system headers. To obtain
+system-level functionality, LLVM source must ``#include "llvm/System/Thing.h"``
+and nothing else. This means that ``Thing.h`` cannot expose any system header
+files. This protects LLVM from accidentally using system-specific functionality
+and only allows it via the ``lib/System`` interface.
+
+Use Standard C Headers
+----------------------
+
+The **standard** C headers (the ones beginning with "c") are allowed to be
+exposed through the ``lib/System`` interface. These headers and the things they
+declare are considered to be platform agnostic. LLVM source files may include
+them directly or obtain their inclusion through ``lib/System`` interfaces.
+
+Use Standard C++ Headers
+------------------------
+
+The **standard** C++ headers from the standard C++ library and standard
+template library may be exposed through the ``lib/System`` interface. These
+headers and the things they declare are considered to be platform agnostic.
+LLVM source files may include them or obtain their inclusion through
+``lib/System`` interfaces.
+
+High Level Interface
+--------------------
+
+The entry points specified in the interface of ``lib/System`` must be aimed at
+completing some reasonably high-level task needed by LLVM. We do not want to
+simply wrap each operating system call. It would be preferable to wrap several
+operating system calls that are always used in conjunction with one another by
+LLVM.
+
+For example, consider what is needed to execute a program, wait for it to
+complete, and return its result code. On Unix, this involves the following
+operating system calls: ``getenv``, ``fork``, ``execve``, and ``wait``. The
+correct thing for ``lib/System`` to provide is a function, say
+``ExecuteProgramAndWait``, that implements the functionality completely. What
+we don't want is wrappers for the operating system calls involved.
+
+There must **not** be a one-to-one relationship between operating system
+calls and the System library's interface. Any such interface function will be
+suspicious.
+
+No Unused Functionality
+-----------------------
+
+There must be no functionality specified in the interface of ``lib/System``
+that isn't actually used by LLVM. We're not writing a general purpose operating
+system wrapper here, just enough to satisfy LLVM's needs. And LLVM doesn't
+need much. This design goal aims to keep the ``lib/System`` interface small and
+understandable, which should foster its actual use and adoption.
+
+No Duplicate Implementations
+----------------------------
+
+The implementation of a function for a given platform must be written exactly
+once. This implies that it must be possible to apply a function's
+implementation to multiple operating systems if those operating systems can
+share the same implementation. This rule applies to the set of operating
+systems supported for a given class of operating system (e.g. Unix, Win32).
+
+No Virtual Methods
+------------------
+
+The System Library interfaces can be called quite frequently by LLVM. In order
+to make those calls as efficient as possible, we discourage the use of virtual
+methods.
There is no need to use inheritance for implementation differences; it
+just adds complexity. The ``#include`` mechanism works just fine.
+
+No Exposed Functions
+--------------------
+
+Any functions defined by system libraries (i.e. not defined by ``lib/System``)
+must not be exposed through the ``lib/System`` interface, even if the header
+file for that function is not exposed. This prevents inadvertent use of
+system-specific functionality.
+
+For example, the ``stat`` system call is notorious for having variations in the
+data it provides. ``lib/System`` must not declare ``stat`` nor allow it to be
+declared. Instead it should provide its own interface for discovering
+information about files and directories. Those interfaces may be implemented in
+terms of ``stat`` but that is strictly an implementation detail. The interface
+provided by the System Library must be implemented on all platforms (even those
+without ``stat``).
+
+No Exposed Data
+---------------
+
+Any data defined by system libraries (i.e. not defined by ``lib/System``) must
+not be exposed through the ``lib/System`` interface, even if the header file
+for that data is not exposed. As with functions, this prevents inadvertent
+use of data that might not exist on all platforms.
+
+Minimize Soft Errors
+--------------------
+
+Operating system interfaces will generally provide error results for every
+little thing that could go wrong. In almost all cases, you can divide these
+error results into two groups: normal/good/soft and abnormal/bad/hard. That is,
+some of the errors are simply information like "file not found" or
+"insufficient privileges", while other errors are much harder, like "out of
+space", "bad disk sector", or "system call interrupted". We'll call the first
+group "*soft*" errors and the second group "*hard*" errors.
+
+``lib/System`` must always attempt to minimize soft errors. This is a design
+requirement because the minimization of soft errors can affect the granularity
+and the nature of the interface. In general, if you find that you're wanting to
+throw soft errors, you must review the granularity of the interface because it
+is likely you're trying to implement something that is too low level. The rule
+of thumb is to provide interface functions that **can't** fail, except when
+faced with hard errors.
+
+For a trivial example, suppose we wanted to add an "``OpenFileForWriting``"
+function. For many operating systems, if the file doesn't exist, attempting to
+open the file will produce an error. However, ``lib/System`` should not simply
+throw that error if it occurs because it's a soft error. The problem is that
+the interface function, ``OpenFileForWriting``, is too low level. It should be
+``OpenOrCreateFileForWriting``. In the case of the soft "doesn't exist" error,
+this function would just create the file and then open it for writing (a
+minimal sketch of this shape follows the rules of thumb below).
+
+This design principle needs to be maintained in ``lib/System`` because it
+avoids the propagation of soft error handling throughout the rest of LLVM.
+Hard errors will generally just cause a termination for an LLVM tool, so don't
+be bashful about throwing them.
+
+Rules of thumb:
+
+#. Don't throw soft errors, only hard errors.
+
+#. If you're tempted to throw a soft error, re-think the interface.
+
+#. Handle internally the most common normal/good/soft error conditions
+   so the rest of LLVM doesn't have to.
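+
+A minimal sketch of the idea, in plain standard C++ rather than the real
+``lib/System`` code (the function name is the hypothetical one used above, not
+an actual interface):
+
+.. code-block:: c++
+
+  #include <fstream>
+  #include <string>
+
+  // Opening in append mode creates a missing file, so the common soft
+  // "file does not exist" error is absorbed here; a false return means
+  // the open itself still failed.
+  bool OpenOrCreateFileForWriting(std::ofstream &Out, const std::string &Path) {
+    Out.open(Path.c_str(), std::ios::out | std::ios::app);
+    return Out.is_open();
+  }
+
+Callers always get a writable file in the common case, so soft-error handling
+never propagates into the rest of LLVM.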
+
+No throw Specifications
+-----------------------
+
+None of the ``lib/System`` interface functions may be declared with C++
+``throw()`` specifications on them. This requirement makes sure that the
+compiler does not insert additional exception handling code into the interface
+functions. This is a performance consideration: ``lib/System`` functions are at
+the bottom of many call chains and as such can be frequently called. We need
+them to be as efficient as possible. However, no routines in the system
+library should actually throw exceptions.
+
+Code Organization
+-----------------
+
+Implementations of the System Library interface are separated by their general
+class of operating system. Currently only Unix and Win32 classes are defined
+but more could be added for other operating system classifications. To
+distinguish which implementation to compile, the code in ``lib/System`` uses
+the ``LLVM_ON_UNIX`` and ``LLVM_ON_WIN32`` ``#defines`` provided via configure
+through the ``llvm/Config/config.h`` file. Each source file in ``lib/System``,
+after implementing the generic (operating system independent) functionality,
+needs to include the correct implementation using a set of
+``#if defined(LLVM_ON_XYZ)`` directives. For example, if we had
+``lib/System/File.cpp``, we'd expect to see in that file:
+
+.. code-block:: c++
+
+  #if defined(LLVM_ON_UNIX)
+  #include "Unix/File.cpp"
+  #endif
+  #if defined(LLVM_ON_WIN32)
+  #include "Win32/File.cpp"
+  #endif
+
+The implementation in ``lib/System/Unix/File.cpp`` should handle all Unix
+variants. The implementation in ``lib/System/Win32/File.cpp`` should handle all
+Win32 variants. What this does is quickly differentiate the basic class of
+operating system that will provide the implementation. The specific details for
+a given platform must still be determined through the use of ``#ifdef``.
+
+Consistent Semantics
+--------------------
+
+The implementation of a ``lib/System`` interface can vary drastically between
+platforms. That's okay as long as the end result of the interface function is
+the same. For example, a function to create a directory is pretty
+straightforward on all operating systems. System V IPC, on the other hand,
+isn't even supported on all platforms. Instead of "supporting" System V IPC,
+``lib/System`` should provide an interface to the basic concept of
+inter-process communications. The implementations might use System V IPC if
+that was available, or named pipes, or whatever gets the job done effectively
+for a given operating system. In all cases, the interface and the
+implementation must be semantically consistent.
+
diff --git a/docs/TableGenFundamentals.rst b/docs/TableGenFundamentals.rst
index bfb2618998..356b7d208e 100644
--- a/docs/TableGenFundamentals.rst
+++ b/docs/TableGenFundamentals.rst
@@ -120,16 +120,16 @@ this (at the time of this writing):
 }
 ...
 
-This definition corresponds to a 32-bit register-register add instruction in the
-X86. The string after the '``def``' string indicates the name of the
-record---"``ADD32rr``" in this case---and the comment at the end of the line
-indicates the superclasses of the definition. The body of the record contains
-all of the data that TableGen assembled for the record, indicating that the
-instruction is part of the "X86" namespace, the pattern indicating how the the
-instruction should be emitted into the assembly file, that it is a two-address
-instruction, has a particular encoding, etc.
The contents and semantics of the
-information in the record is specific to the needs of the X86 backend, and is
-only shown as an example.
+This definition corresponds to the 32-bit register-register ``add`` instruction
+of the x86 architecture. ``def ADD32rr`` defines a record named
+``ADD32rr``, and the comment at the end of the line indicates the superclasses
+of the definition. The body of the record contains all of the data that
+TableGen assembled for the record, indicating that the instruction is part of
+the "X86" namespace, the pattern indicating how the instruction should be
+emitted into the assembly file, that it is a two-address instruction, has a
+particular encoding, etc. The contents and semantics of the information in the
+record are specific to the needs of the X86 backend, and are only shown as an
+example.
 
 As you can see, a lot of information is needed for every instruction supported
 by the code generator, and specifying it all manually would be unmaintainable,
@@ -152,13 +152,12 @@ factor out the common features that instructions of its class share. A key
 feature of TableGen is that it allows the end-user to define the abstractions
 they prefer to use when describing their information.
 
-Each def record has a special entry called "``NAME``." This is the name of the
-def ("``ADD32rr``" above). In the general case def names can be formed from
-various kinds of string processing expressions and ``NAME`` resolves to the
+Each ``def`` record has a special entry called "NAME". This is the name of the
+record ("``ADD32rr``" above). In the general case ``def`` names can be formed
+from various kinds of string processing expressions and ``NAME`` resolves to the
 final value obtained after resolving all of those expressions. The user may
-refer to ``NAME`` anywhere she desires to use the ultimate name of the def.
-``NAME`` should not be defined anywhere else in user code to avoid conflict
-problems.
+refer to ``NAME`` anywhere she desires to use the ultimate name of the ``def``.
+``NAME`` should not be defined anywhere else in user code to avoid conflicts.
 
 Running TableGen
 ----------------
diff --git a/docs/TestSuiteMakefileGuide.html b/docs/TestSuiteMakefileGuide.html
deleted file mode 100644
index 1b24250380..0000000000
--- a/docs/TestSuiteMakefileGuide.html
+++ /dev/null
@@ -1,351 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
-          "http://www.w3.org/TR/html4/strict.dtd">
-<html>
-<head>
-  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
-  <title>LLVM test-suite Makefile Guide</title>
-  <link rel="stylesheet" href="_static/llvm.css" type="text/css">
-</head>
-<body>
-
-<h1>
-  LLVM test-suite Makefile Guide
-</h1>
-
-<ol>
-  <li><a href="#overview">Overview</a></li>
-  <li><a href="#testsuitestructure">Test suite structure</a></li>
-  <li><a href="#testsuiterun">Running the test suite</a>
    <ul>
-      <li><a href="#testsuiteexternal">Configuring External Tests</a></li>
-      <li><a href="#testsuitetests">Running different tests</a></li>
-      <li><a href="#testsuiteoutput">Generating test output</a></li>
-      <li><a href="#testsuitecustom">Writing custom tests for test-suite</a></li>
-    </ul>
-  </li>
-</ol>
-
-<div class="doc_author">
-  <p>Written by John T.
Criswell, Daniel Dunbar, Reid Spencer, and Tanya Lattner</p> -</div> - -<!--=========================================================================--> -<h2><a name="overview">Overview</a></h2> -<!--=========================================================================--> - -<div> - -<p>This document describes the features of the Makefile-based LLVM -test-suite. This way of interacting with the test-suite is deprecated in favor -of running the test-suite using LNT, but may continue to prove useful for some -users. See the Testing -Guide's <a href="TestingGuide.html#testsuitequickstart">test-suite -Quickstart</a> section for more information.</p> - -</div> - -<!--=========================================================================--> -<h2><a name="testsuitestructure">Test suite Structure</a></h2> -<!--=========================================================================--> - -<div> - -<p>The <tt>test-suite</tt> module contains a number of programs that can be compiled -with LLVM and executed. These programs are compiled using the native compiler -and various LLVM backends. The output from the program compiled with the -native compiler is assumed correct; the results from the other programs are -compared to the native program output and pass if they match.</p> - -<p>When executing tests, it is usually a good idea to start out with a subset of -the available tests or programs. This makes test run times smaller at first and -later on this is useful to investigate individual test failures. To run some -test only on a subset of programs, simply change directory to the programs you -want tested and run <tt>gmake</tt> there. Alternatively, you can run a different -test using the <tt>TEST</tt> variable to change what tests or run on the -selected programs (see below for more info).</p> - -<p>In addition for testing correctness, the <tt>test-suite</tt> directory also -performs timing tests of various LLVM optimizations. It also records -compilation times for the compilers and the JIT. This information can be -used to compare the effectiveness of LLVM's optimizations and code -generation.</p> - -<p><tt>test-suite</tt> tests are divided into three types of tests: MultiSource, -SingleSource, and External.</p> - -<ul> -<li><tt>test-suite/SingleSource</tt> -<p>The SingleSource directory contains test programs that are only a single -source file in size. These are usually small benchmark programs or small -programs that calculate a particular value. Several such programs are grouped -together in each directory.</p></li> - -<li><tt>test-suite/MultiSource</tt> -<p>The MultiSource directory contains subdirectories which contain entire -programs with multiple source files. Large benchmarks and whole applications -go here.</p></li> - -<li><tt>test-suite/External</tt> -<p>The External directory contains Makefiles for building code that is external -to (i.e., not distributed with) LLVM. The most prominent members of this -directory are the SPEC 95 and SPEC 2000 benchmark suites. The <tt>External</tt> -directory does not contain these actual tests, but only the Makefiles that know -how to properly compile these programs from somewhere else. The presence and -location of these external programs is configured by the test-suite -<tt>configure</tt> script.</p></li> -</ul> - -<p>Each tree is then subdivided into several categories, including applications, -benchmarks, regression tests, code that is strange grammatically, etc. 
These -organizations should be relatively self explanatory.</p> - -<p>Some tests are known to fail. Some are bugs that we have not fixed yet; -others are features that we haven't added yet (or may never add). In the -regression tests, the result for such tests will be XFAIL (eXpected FAILure). -In this way, you can tell the difference between an expected and unexpected -failure.</p> - -<p>The tests in the test suite have no such feature at this time. If the -test passes, only warnings and other miscellaneous output will be generated. If -a test fails, a large <program> FAILED message will be displayed. This -will help you separate benign warnings from actual test failures.</p> - -</div> - -<!--=========================================================================--> -<h2><a name="testsuiterun">Running the test suite</a></h2> -<!--=========================================================================--> - -<div> - -<p>First, all tests are executed within the LLVM object directory tree. They -<i>are not</i> executed inside of the LLVM source tree. This is because the -test suite creates temporary files during execution.</p> - -<p>To run the test suite, you need to use the following steps:</p> - -<ol> - <li><tt>cd</tt> into the <tt>llvm/projects</tt> directory in your source tree. - </li> - - <li><p>Check out the <tt>test-suite</tt> module with:</p> - -<div class="doc_code"> -<pre> -% svn co http://llvm.org/svn/llvm-project/test-suite/trunk test-suite -</pre> -</div> - <p>This will get the test suite into <tt>llvm/projects/test-suite</tt>.</p> - </li> - <li><p>Configure and build <tt>llvm</tt>.</p></li> - <li><p>Configure and build <tt>llvm-gcc</tt>.</p></li> - <li><p>Install <tt>llvm-gcc</tt> somewhere.</p></li> - <li><p><em>Re-configure</em> <tt>llvm</tt> from the top level of - each build tree (LLVM object directory tree) in which you want - to run the test suite, just as you do before building LLVM.</p> - <p>During the <em>re-configuration</em>, you must either: (1) - have <tt>llvm-gcc</tt> you just built in your path, or (2) - specify the directory where your just-built <tt>llvm-gcc</tt> is - installed using <tt>--with-llvmgccdir=$LLVM_GCC_DIR</tt>.</p> - <p>You must also tell the configure machinery that the test suite - is available so it can be configured for your build tree:</p> -<div class="doc_code"> -<pre> -% cd $LLVM_OBJ_ROOT ; $LLVM_SRC_ROOT/configure [--with-llvmgccdir=$LLVM_GCC_DIR] -</pre> -</div> - <p>[Remember that <tt>$LLVM_GCC_DIR</tt> is the directory where you - <em>installed</em> llvm-gcc, not its src or obj directory.]</p> - </li> - - <li><p>You can now run the test suite from your build tree as follows:</p> -<div class="doc_code"> -<pre> -% cd $LLVM_OBJ_ROOT/projects/test-suite -% make -</pre> -</div> - </li> -</ol> -<p>Note that the second and third steps only need to be done once. After you -have the suite checked out and configured, you don't need to do it again (unless -the test code or configure script changes).</p> - -<!-- _______________________________________________________________________ --> -<h3> - <a name="testsuiteexternal">Configuring External Tests</a> -</h3> -<!-- _______________________________________________________________________ --> - -<div> -<p>In order to run the External tests in the <tt>test-suite</tt> - module, you must specify <i>--with-externals</i>. This - must be done during the <em>re-configuration</em> step (see above), - and the <tt>llvm</tt> re-configuration must recognize the - previously-built <tt>llvm-gcc</tt>. 
If any of these is missing or - neglected, the External tests won't work.</p> -<dl> -<dt><i>--with-externals</i></dt> -<dt><i>--with-externals=<<tt>directory</tt>></i></dt> -</dl> - This tells LLVM where to find any external tests. They are expected to be - in specifically named subdirectories of <<tt>directory</tt>>. - If <tt>directory</tt> is left unspecified, - <tt>configure</tt> uses the default value - <tt>/home/vadve/shared/benchmarks/speccpu2000/benchspec</tt>. - Subdirectory names known to LLVM include: - <dl> - <dt>spec95</dt> - <dt>speccpu2000</dt> - <dt>speccpu2006</dt> - <dt>povray31</dt> - </dl> - Others are added from time to time, and can be determined from - <tt>configure</tt>. -</div> - -<!-- _______________________________________________________________________ --> -<h3> - <a name="testsuitetests">Running different tests</a> -</h3> -<!-- _______________________________________________________________________ --> -<div> -<p>In addition to the regular "whole program" tests, the <tt>test-suite</tt> -module also provides a mechanism for compiling the programs in different ways. -If the variable TEST is defined on the <tt>gmake</tt> command line, the test system will -include a Makefile named <tt>TEST.<value of TEST variable>.Makefile</tt>. -This Makefile can modify build rules to yield different results.</p> - -<p>For example, the LLVM nightly tester uses <tt>TEST.nightly.Makefile</tt> to -create the nightly test reports. To run the nightly tests, run <tt>gmake -TEST=nightly</tt>.</p> - -<p>There are several TEST Makefiles available in the tree. Some of them are -designed for internal LLVM research and will not work outside of the LLVM -research group. They may still be valuable, however, as a guide to writing your -own TEST Makefile for any optimization or analysis passes that you develop with -LLVM.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h3> - <a name="testsuiteoutput">Generating test output</a> -</h3> -<!-- _______________________________________________________________________ --> -<div> - <p>There are a number of ways to run the tests and generate output. The most - simple one is simply running <tt>gmake</tt> with no arguments. This will - compile and run all programs in the tree using a number of different methods - and compare results. Any failures are reported in the output, but are likely - drowned in the other output. Passes are not reported explicitly.</p> - - <p>Somewhat better is running <tt>gmake TEST=sometest test</tt>, which runs - the specified test and usually adds per-program summaries to the output - (depending on which sometest you use). For example, the <tt>nightly</tt> test - explicitly outputs TEST-PASS or TEST-FAIL for every test after each program. - Though these lines are still drowned in the output, it's easy to grep the - output logs in the Output directories.</p> - - <p>Even better are the <tt>report</tt> and <tt>report.format</tt> targets - (where <tt>format</tt> is one of <tt>html</tt>, <tt>csv</tt>, <tt>text</tt> or - <tt>graphs</tt>). The exact contents of the report are dependent on which - <tt>TEST</tt> you are running, but the text results are always shown at the - end of the run and the results are always stored in the - <tt>report.<type>.format</tt> file (when running with - <tt>TEST=<type></tt>). - - The <tt>report</tt> also generate a file called - <tt>report.<type>.raw.out</tt> containing the output of the entire test - run. 
-</div> - -<!-- _______________________________________________________________________ --> -<h3> - <a name="testsuitecustom">Writing custom tests for the test suite</a> -</h3> -<!-- _______________________________________________________________________ --> - -<div> - -<p>Assuming you can run the test suite, (e.g. "<tt>gmake TEST=nightly report</tt>" -should work), it is really easy to run optimizations or code generator -components against every program in the tree, collecting statistics or running -custom checks for correctness. At base, this is how the nightly tester works, -it's just one example of a general framework.</p> - -<p>Lets say that you have an LLVM optimization pass, and you want to see how -many times it triggers. First thing you should do is add an LLVM -<a href="ProgrammersManual.html#Statistic">statistic</a> to your pass, which -will tally counts of things you care about.</p> - -<p>Following this, you can set up a test and a report that collects these and -formats them for easy viewing. This consists of two files, a -"<tt>test-suite/TEST.XXX.Makefile</tt>" fragment (where XXX is the name of your -test) and a "<tt>test-suite/TEST.XXX.report</tt>" file that indicates how to -format the output into a table. There are many example reports of various -levels of sophistication included with the test suite, and the framework is very -general.</p> - -<p>If you are interested in testing an optimization pass, check out the -"libcalls" test as an example. It can be run like this:<p> - -<div class="doc_code"> -<pre> -% cd llvm/projects/test-suite/MultiSource/Benchmarks # or some other level -% make TEST=libcalls report -</pre> -</div> - -<p>This will do a bunch of stuff, then eventually print a table like this:</p> - -<div class="doc_code"> -<pre> -Name | total | #exit | -... -FreeBench/analyzer/analyzer | 51 | 6 | -FreeBench/fourinarow/fourinarow | 1 | 1 | -FreeBench/neural/neural | 19 | 9 | -FreeBench/pifft/pifft | 5 | 3 | -MallocBench/cfrac/cfrac | 1 | * | -MallocBench/espresso/espresso | 52 | 12 | -MallocBench/gs/gs | 4 | * | -Prolangs-C/TimberWolfMC/timberwolfmc | 302 | * | -Prolangs-C/agrep/agrep | 33 | 12 | -Prolangs-C/allroots/allroots | * | * | -Prolangs-C/assembler/assembler | 47 | * | -Prolangs-C/bison/mybison | 74 | * | -... -</pre> -</div> - -<p>This basically is grepping the -stats output and displaying it in a table. -You can also use the "TEST=libcalls report.html" target to get the table in HTML -form, similarly for report.csv and report.tex.</p> - -<p>The source for this is in test-suite/TEST.libcalls.*. The format is pretty -simple: the Makefile indicates how to run the test (in this case, -"<tt>opt -simplify-libcalls -stats</tt>"), and the report contains one line for -each column of the output. The first value is the header for the column and the -second is the regex to grep the output of the command for. There are lots of -example reports that can do fancy stuff.</p> - -</div> - -</div> - -<!-- *********************************************************************** --> - -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a> - - John T. 
Criswell, Daniel Dunbar, Reid Spencer, and Tanya Lattner<br>
-  <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br>
-  Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/TestSuiteMakefileGuide.rst b/docs/TestSuiteMakefileGuide.rst
new file mode 100644
index 0000000000..b10379ef4d
--- /dev/null
+++ b/docs/TestSuiteMakefileGuide.rst
@@ -0,0 +1,279 @@
+==============================
+LLVM test-suite Makefile Guide
+==============================
+
+Written by John T. Criswell, Daniel Dunbar, Reid Spencer, and Tanya
+Lattner
+
+.. contents::
+   :local:
+
+Overview
+========
+
+This document describes the features of the Makefile-based LLVM
+test-suite. This way of interacting with the test-suite is deprecated in
+favor of running the test-suite using LNT, but may continue to prove
+useful for some users. See the Testing Guide's :ref:`test-suite Quickstart
+<test-suite-quickstart>` section for more information.
+
+Test suite Structure
+====================
+
+The ``test-suite`` module contains a number of programs that can be
+compiled with LLVM and executed. These programs are compiled using the
+native compiler and various LLVM backends. The output from the program
+compiled with the native compiler is assumed correct; the results from
+the other programs are compared to the native program output and pass if
+they match.
+
+When executing tests, it is usually a good idea to start out with a
+subset of the available tests or programs. This keeps test run times
+small at first, and later on it is useful for investigating individual
+test failures. To run a test on only a subset of programs, simply
+change directory to the programs you want tested and run ``gmake``
+there. Alternatively, you can run a different test using the ``TEST``
+variable to change what tests are run on the selected programs (see below
+for more info).
+
+In addition to testing correctness, the ``test-suite`` directory also
+performs timing tests of various LLVM optimizations. It also records
+compilation times for the compilers and the JIT. This information can be
+used to compare the effectiveness of LLVM's optimizations and code
+generation.
+
+``test-suite`` tests are divided into three types of tests: MultiSource,
+SingleSource, and External.
+
+- ``test-suite/SingleSource``
+
+  The SingleSource directory contains test programs that are only a
+  single source file in size. These are usually small benchmark
+  programs or small programs that calculate a particular value. Several
+  such programs are grouped together in each directory.
+
+- ``test-suite/MultiSource``
+
+  The MultiSource directory contains subdirectories which contain
+  entire programs with multiple source files. Large benchmarks and
+  whole applications go here.
+
+- ``test-suite/External``
+
+  The External directory contains Makefiles for building code that is
+  external to (i.e., not distributed with) LLVM. The most prominent
+  members of this directory are the SPEC 95 and SPEC 2000 benchmark
+  suites. The ``External`` directory does not contain these actual
+  tests, but only the Makefiles that know how to properly compile these
+  programs from somewhere else. The presence and location of these
+  external programs is configured by the test-suite ``configure``
+  script.
+
+Each tree is then subdivided into several categories, including
+applications, benchmarks, regression tests, code that is strange
+grammatically, etc. These organizations should be relatively
+self-explanatory.
+
+Some tests are known to fail.
Some are bugs that we have not fixed yet; +others are features that we haven't added yet (or may never add). In the +regression tests, the result for such tests will be XFAIL (eXpected +FAILure). In this way, you can tell the difference between an expected +and unexpected failure. + +The tests in the test suite have no such feature at this time. If the +test passes, only warnings and other miscellaneous output will be +generated. If a test fails, a large <program> FAILED message will be +displayed. This will help you separate benign warnings from actual test +failures. + +Running the test suite +====================== + +First, all tests are executed within the LLVM object directory tree. +They *are not* executed inside of the LLVM source tree. This is because +the test suite creates temporary files during execution. + +To run the test suite, you need to use the following steps: + +#. ``cd`` into the ``llvm/projects`` directory in your source tree. +#. Check out the ``test-suite`` module with: + + .. code-block:: bash + + % svn co http://llvm.org/svn/llvm-project/test-suite/trunk test-suite + + This will get the test suite into ``llvm/projects/test-suite``. + +#. Configure and build ``llvm``. + +#. Configure and build ``llvm-gcc``. + +#. Install ``llvm-gcc`` somewhere. + +#. *Re-configure* ``llvm`` from the top level of each build tree (LLVM + object directory tree) in which you want to run the test suite, just + as you do before building LLVM. + + During the *re-configuration*, you must either: (1) have ``llvm-gcc`` + you just built in your path, or (2) specify the directory where your + just-built ``llvm-gcc`` is installed using + ``--with-llvmgccdir=$LLVM_GCC_DIR``. + + You must also tell the configure machinery that the test suite is + available so it can be configured for your build tree: + + .. code-block:: bash + + % cd $LLVM_OBJ_ROOT ; $LLVM_SRC_ROOT/configure [--with-llvmgccdir=$LLVM_GCC_DIR] + + [Remember that ``$LLVM_GCC_DIR`` is the directory where you + *installed* llvm-gcc, not its src or obj directory.] + +#. You can now run the test suite from your build tree as follows: + + .. code-block:: bash + + % cd $LLVM_OBJ_ROOT/projects/test-suite + % make + +Note that the second and third steps only need to be done once. After +you have the suite checked out and configured, you don't need to do it +again (unless the test code or configure script changes). + +Configuring External Tests +-------------------------- + +In order to run the External tests in the ``test-suite`` module, you +must specify *--with-externals*. This must be done during the +*re-configuration* step (see above), and the ``llvm`` re-configuration +must recognize the previously-built ``llvm-gcc``. If any of these is +missing or neglected, the External tests won't work. + +* *--with-externals* + +* *--with-externals=<directory>* + +This tells LLVM where to find any external tests. They are expected to +be in specifically named subdirectories of <``directory``>. If +``directory`` is left unspecified, ``configure`` uses the default value +``/home/vadve/shared/benchmarks/speccpu2000/benchspec``. Subdirectory +names known to LLVM include: + +* spec95 + +* speccpu2000 + +* speccpu2006 + +* povray31 + +Others are added from time to time, and can be determined from +``configure``. + +Running different tests +----------------------- + +In addition to the regular "whole program" tests, the ``test-suite`` +module also provides a mechanism for compiling the programs in different +ways. 
If the variable TEST is defined on the ``gmake`` command line, the
+test system will include a Makefile named
+``TEST.<value of TEST variable>.Makefile``. This Makefile can modify
+build rules to yield different results.
+
+For example, the LLVM nightly tester uses ``TEST.nightly.Makefile`` to
+create the nightly test reports. To run the nightly tests, run
+``gmake TEST=nightly``.
+
+There are several TEST Makefiles available in the tree. Some of them are
+designed for internal LLVM research and will not work outside of the
+LLVM research group. They may still be valuable, however, as a guide to
+writing your own TEST Makefile for any optimization or analysis passes
+that you develop with LLVM.
+
+Generating test output
+----------------------
+
+There are a number of ways to run the tests and generate output. The
+simplest one is simply running ``gmake`` with no arguments. This will
+compile and run all programs in the tree using a number of different
+methods and compare results. Any failures are reported in the output,
+but are likely drowned in the other output. Passes are not reported
+explicitly.
+
+Somewhat better is running ``gmake TEST=sometest test``, which runs the
+specified test and usually adds per-program summaries to the output
+(depending on which sometest you use). For example, the ``nightly`` test
+explicitly outputs TEST-PASS or TEST-FAIL for every test after each
+program. Though these lines are still drowned in the output, it's easy
+to grep the output logs in the Output directories.
+
+Even better are the ``report`` and ``report.format`` targets (where
+``format`` is one of ``html``, ``csv``, ``text`` or ``graphs``). The
+exact contents of the report are dependent on which ``TEST`` you are
+running, but the text results are always shown at the end of the run and
+the results are always stored in the ``report.<type>.format`` file (when
+running with ``TEST=<type>``). The ``report`` also generates a file
+called ``report.<type>.raw.out`` containing the output of the entire
+test run.
+
+Writing custom tests for the test suite
+---------------------------------------
+
+Assuming you can run the test suite (e.g.
+"``gmake TEST=nightly report``" should work), it is really easy to run
+optimizations or code generator components against every program in the
+tree, collecting statistics or running custom checks for correctness. At
+base, this is how the nightly tester works; it's just one example of a
+general framework.
+
+Let's say that you have an LLVM optimization pass, and you want to see
+how many times it triggers. The first thing you should do is add an LLVM
+`statistic <ProgrammersManual.html#Statistic>`_ to your pass, which will
+tally counts of things you care about.
+
+Following this, you can set up a test and a report that collects these
+and formats them for easy viewing. This consists of two files, a
+"``test-suite/TEST.XXX.Makefile``" fragment (where XXX is the name of
+your test) and a "``test-suite/TEST.XXX.report``" file that indicates
+how to format the output into a table. There are many example reports of
+various levels of sophistication included with the test suite, and the
+framework is very general.
+
+If you are interested in testing an optimization pass, check out the
+"libcalls" test as an example. It can be run like this:
+
+..
code-block:: bash + + % cd llvm/projects/test-suite/MultiSource/Benchmarks # or some other level + % make TEST=libcalls report + +This will do a bunch of stuff, then eventually print a table like this: + +:: + + Name | total | #exit | + ... + FreeBench/analyzer/analyzer | 51 | 6 | + FreeBench/fourinarow/fourinarow | 1 | 1 | + FreeBench/neural/neural | 19 | 9 | + FreeBench/pifft/pifft | 5 | 3 | + MallocBench/cfrac/cfrac | 1 | * | + MallocBench/espresso/espresso | 52 | 12 | + MallocBench/gs/gs | 4 | * | + Prolangs-C/TimberWolfMC/timberwolfmc | 302 | * | + Prolangs-C/agrep/agrep | 33 | 12 | + Prolangs-C/allroots/allroots | * | * | + Prolangs-C/assembler/assembler | 47 | * | + Prolangs-C/bison/mybison | 74 | * | + ... + +This basically is grepping the -stats output and displaying it in a +table. You can also use the "TEST=libcalls report.html" target to get +the table in HTML form, similarly for report.csv and report.tex. + +The source for this is in ``test-suite/TEST.libcalls.*``. The format is +pretty simple: the Makefile indicates how to run the test (in this case, +"``opt -simplify-libcalls -stats``"), and the report contains one line +for each column of the output. The first value is the header for the +column and the second is the regex to grep the output of the command +for. There are lots of example reports that can do fancy stuff. diff --git a/docs/TestingGuide.html b/docs/TestingGuide.html deleted file mode 100644 index ae2643fe4e..0000000000 --- a/docs/TestingGuide.html +++ /dev/null @@ -1,916 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> -<html> -<head> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <title>LLVM Testing Infrastructure Guide</title> - <link rel="stylesheet" href="_static/llvm.css" type="text/css"> -</head> -<body> - -<h1> - LLVM Testing Infrastructure Guide -</h1> - -<ol> - <li><a href="#overview">Overview</a></li> - <li><a href="#requirements">Requirements</a></li> - <li><a href="#org">LLVM testing infrastructure organization</a> - <ul> - <li><a href="#regressiontests">Regression tests</a></li> - <li><a href="#testsuite"><tt>test-suite</tt></a></li> - <li><a href="#debuginfotests">Debugging Information tests</a></li> - </ul> - </li> - <li><a href="#quick">Quick start</a> - <ul> - <li><a href="#quickregressiontests">Regression tests</a></li> - <li><a href="#quickdebuginfotests">Debugging Information tests</a></li> - </ul> - </li> - <li><a href="#rtstructure">Regression test structure</a> - <ul> - <li><a href="#rtcustom">Writing new regression tests</a></li> - <li><a href="#FileCheck">The FileCheck utility</a></li> - <li><a href="#rtvars">Variables and substitutions</a></li> - <li><a href="#rtfeatures">Other features</a></li> - </ul> - </li> - <li><a href="#testsuiteoverview"><tt>test-suite</tt> Overview</a> - <ul> - <li><a href="#testsuitequickstart"><tt>test-suite</tt> Quickstart</a></li> - <li><a href="#testsuitemakefiles"><tt>test-suite</tt> Makefiles</a></li> - </ul> - </li> -</ol> - -<div class="doc_author"> - <p>Written by John T. Criswell, Daniel Dunbar, Reid Spencer, and Tanya Lattner</p> -</div> - -<!--=========================================================================--> -<h2><a name="overview">Overview</a></h2> -<!--=========================================================================--> - -<div> - -<p>This document is the reference manual for the LLVM testing infrastructure. 
It -documents the structure of the LLVM testing infrastructure, the tools needed to -use it, and how to add and run tests.</p> - -</div> - -<!--=========================================================================--> -<h2><a name="requirements">Requirements</a></h2> -<!--=========================================================================--> - -<div> - -<p>In order to use the LLVM testing infrastructure, you will need all of the -software required to build LLVM, as well -as <a href="http://python.org">Python</a> 2.4 or later.</p> - -</div> - -<!--=========================================================================--> -<h2><a name="org">LLVM testing infrastructure organization</a></h2> -<!--=========================================================================--> - -<div> - -<p>The LLVM testing infrastructure contains two major categories of tests: -regression tests and whole programs. The regression tests are contained inside -the LLVM repository itself under <tt>llvm/test</tt> and are expected to always -pass -- they should be run before every commit.</p> - -<p>The whole programs tests are referred to as the "LLVM test suite" (or -"test-suite") and are in the <tt>test-suite</tt> module in subversion. For -historical reasons, these tests are also referred to as the "nightly tests" in -places, which is less ambiguous than "test-suite" and remains in use although we -run them much more often than nightly.</p> - -<!-- _______________________________________________________________________ --> -<h3><a name="regressiontests">Regression tests</a></h3> -<!-- _______________________________________________________________________ --> - -<div> - -<p>The regression tests are small pieces of code that test a specific feature of -LLVM or trigger a specific bug in LLVM. They are usually written in LLVM -assembly language, but can be written in other languages if the test targets a -particular language front end (and the appropriate <tt>--with-llvmgcc</tt> -options were used at <tt>configure</tt> time of the <tt>llvm</tt> module). These -tests are driven by the 'lit' testing tool, which is part of LLVM.</p> - -<p>These code fragments are not complete programs. The code generated -from them is never executed to determine correct behavior.</p> - -<p>These code fragment tests are located in the <tt>llvm/test</tt> -directory.</p> - -<p>Typically when a bug is found in LLVM, a regression test containing -just enough code to reproduce the problem should be written and placed -somewhere underneath this directory. In most cases, this will be a small -piece of LLVM assembly language code, often distilled from an actual -application or benchmark.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h3><a name="testsuite"><tt>test-suite</tt></a></h3> -<!-- _______________________________________________________________________ --> - -<div> - -<p>The test suite contains whole programs, which are pieces of code which can be -compiled and linked into a stand-alone program that can be executed. These -programs are generally written in high level languages such as C or C++.</p> - -<p>These programs are compiled using a user specified compiler and set of flags, -and then executed to capture the program output and timing information. 
The -output of these programs is compared to a reference output to ensure that the -program is being compiled correctly.</p> - -<p>In addition to compiling and executing programs, whole program tests serve as -a way of benchmarking LLVM performance, both in terms of the efficiency of the -programs generated as well as the speed with which LLVM compiles, optimizes, and -generates code.</p> - -<p>The test-suite is located in the <tt>test-suite</tt> Subversion module.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h3><a name="debuginfotests">Debugging Information tests</a></h3> -<!-- _______________________________________________________________________ --> - -<div> - -<p>The test suite contains tests to check quality of debugging information. -The test are written in C based languages or in LLVM assembly language. </p> - -<p>These tests are compiled and run under a debugger. The debugger output -is checked to validate of debugging information. See README.txt in the -test suite for more information . This test suite is located in the -<tt>debuginfo-tests</tt> Subversion module. </p> - -</div> - -</div> - -<!--=========================================================================--> -<h2><a name="quick">Quick start</a></h2> -<!--=========================================================================--> - -<div> - - <p>The tests are located in two separate Subversion modules. The regressions - tests are in the main "llvm" module under the directory - <tt>llvm/test</tt> (so you get these tests for free with the main llvm - tree). Use "make check-all" to run the regression tests after building - LLVM.</p> - - <p>The more comprehensive test suite that includes whole programs in C and C++ - is in the <tt>test-suite</tt> - module. See <a href="#testsuitequickstart"><tt>test-suite</tt> Quickstart</a> - for more information on running these tests.</p> - -<!-- _______________________________________________________________________ --> -<h3><a name="quickregressiontests">Regression tests</a></h3> -<div> -<!-- _______________________________________________________________________ --> -<p>To run all of the LLVM regression tests, use master Makefile in - the <tt>llvm/test</tt> directory:</p> - -<div class="doc_code"> -<pre> -% gmake -C llvm/test -</pre> -</div> - -<p>or</p> - -<div class="doc_code"> -<pre> -% gmake check -</pre> -</div> - -<p>If you have <a href="http://clang.llvm.org/">Clang</a> checked out and built, -you can run the LLVM and Clang tests simultaneously using:</p> - -<p>or</p> - -<div class="doc_code"> -<pre> -% gmake check-all -</pre> -</div> - -<p>To run the tests with Valgrind (Memcheck by default), just append -<tt>VG=1</tt> to the commands above, e.g.:</p> - -<div class="doc_code"> -<pre> -% gmake check VG=1 -</pre> -</div> - -<p>To run individual tests or subsets of tests, you can use the 'llvm-lit' -script which is built as part of LLVM. 
For example, to run the -'Integer/BitCast.ll' test by itself you can run:</p> - -<div class="doc_code"> -<pre> -% llvm-lit ~/llvm/test/Integer/BitCast.ll -</pre> -</div> - -<p>or to run all of the ARM CodeGen tests:</p> - -<div class="doc_code"> -<pre> -% llvm-lit ~/llvm/test/CodeGen/ARM -</pre> -</div> - -<p>For more information on using the 'lit' tool, see 'llvm-lit --help' or the -'lit' man page.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h3><a name="quickdebuginfotests">Debugging Information tests</a></h3> -<div> -<!-- _______________________________________________________________________ --> -<div> - -<p> To run debugging information tests simply checkout the tests inside -clang/test directory. </p> - -<div class="doc_code"> -<pre> -%cd clang/test -% svn co http://llvm.org/svn/llvm-project/debuginfo-tests/trunk debuginfo-tests -</pre> -</div> - -<p> These tests are already set up to run as part of clang regression tests.</p> - -</div> - -</div> - -</div> - -<!--=========================================================================--> -<h2><a name="rtstructure">Regression test structure</a></h2> -<!--=========================================================================--> -<div> - <p>The LLVM regression tests are driven by 'lit' and are located in - the <tt>llvm/test</tt> directory. - - <p>This directory contains a large array of small tests - that exercise various features of LLVM and to ensure that regressions do not - occur. The directory is broken into several sub-directories, each focused on - a particular area of LLVM. A few of the important ones are:</p> - - <ul> - <li><tt>Analysis</tt>: checks Analysis passes.</li> - <li><tt>Archive</tt>: checks the Archive library.</li> - <li><tt>Assembler</tt>: checks Assembly reader/writer functionality.</li> - <li><tt>Bitcode</tt>: checks Bitcode reader/writer functionality.</li> - <li><tt>CodeGen</tt>: checks code generation and each target.</li> - <li><tt>Features</tt>: checks various features of the LLVM language.</li> - <li><tt>Linker</tt>: tests bitcode linking.</li> - <li><tt>Transforms</tt>: tests each of the scalar, IPO, and utility - transforms to ensure they make the right transformations.</li> - <li><tt>Verifier</tt>: tests the IR verifier.</li> - </ul> - -<!-- _______________________________________________________________________ --> -<h3><a name="rtcustom">Writing new regression tests</a></h3> -<!-- _______________________________________________________________________ --> -<div> - <p>The regression test structure is very simple, but does require some - information to be set. This information is gathered via <tt>configure</tt> and - is written to a file, <tt>lit.site.cfg</tt> - in <tt>llvm/test</tt>. The <tt>llvm/test</tt> Makefile does this work for - you.</p> - - <p>In order for the regression tests to work, each directory of tests must - have a <tt>lit.local.cfg</tt> file. Lit looks for this file to determine how - to run the tests. This file is just Python code and thus is very flexible, - but we've standardized it for the LLVM regression tests. If you're adding a - directory of tests, just copy <tt>lit.local.cfg</tt> from another directory to - get running. The standard <tt>lit.local.cfg</tt> simply specifies which files - to look in for tests. Any directory that contains only directories does not - need the <tt>lit.local.cfg</tt> file. Read the - <a href="http://llvm.org/cmds/lit.html">Lit documentation</a> for more - information. 
</p> - - <p>The <tt>llvm-runtests</tt> function looks at each file that is passed to - it and gathers any lines together that match "RUN:". These are the "RUN" lines - that specify how the test is to be run. So, each test script must contain - RUN lines if it is to do anything. If there are no RUN lines, the - <tt>llvm-runtests</tt> function will issue an error and the test will - fail.</p> - - <p>RUN lines are specified in the comments of the test program using the - keyword <tt>RUN</tt> followed by a colon, and lastly the command (pipeline) - to execute. Together, these lines form the "script" that - <tt>llvm-runtests</tt> executes to run the test case. The syntax of the - RUN lines is similar to a shell's syntax for pipelines including I/O - redirection and variable substitution. However, even though these lines - may <i>look</i> like a shell script, they are not. RUN lines are interpreted - directly by the Tcl <tt>exec</tt> command. They are never executed by a - shell. Consequently the syntax differs from normal shell script syntax in a - few ways. You can specify as many RUN lines as needed.</p> - - <p>lit performs substitution on each RUN line to replace LLVM tool - names with the full paths to the executable built for each tool (in - $(LLVM_OBJ_ROOT)/$(BuildMode)/bin). This ensures that lit does not - invoke any stray LLVM tools in the user's path during testing.</p> - - <p>Each RUN line is executed on its own, distinct from other lines unless - its last character is <tt>\</tt>. This continuation character causes the RUN - line to be concatenated with the next one. In this way you can build up long - pipelines of commands without making huge line lengths. The lines ending in - <tt>\</tt> are concatenated until a RUN line that doesn't end in <tt>\</tt> is - found. This concatenated set of RUN lines then constitutes one execution. - Tcl will substitute variables and arrange for the pipeline to be executed. If - any process in the pipeline fails, the entire line (and test case) fails too. - </p> - - <p> Below is an example of legal RUN lines in a <tt>.ll</tt> file:</p> - -<div class="doc_code"> -<pre> -; RUN: llvm-as < %s | llvm-dis > %t1 -; RUN: llvm-dis < %s.bc-13 > %t2 -; RUN: diff %t1 %t2 -</pre> -</div> - - <p>As with a Unix shell, the RUN: lines permit pipelines and I/O redirection - to be used. However, the usage is slightly different than for Bash. To check - what's legal, see the documentation for the - <a href="http://www.tcl.tk/man/tcl8.5/TclCmd/exec.htm#M2">Tcl exec</a> - command and the - <a href="http://www.tcl.tk/man/tcl8.5/tutorial/Tcl26.html">tutorial</a>. - The major differences are:</p> - <ul> - <li>You can't do <tt>2>&1</tt>. That will cause Tcl to write to a - file named <tt>&1</tt>. Usually this is done to get stderr to go through - a pipe. You can do that in tcl with <tt>|&</tt> so replace this idiom: - <tt>... 2>&1 | grep</tt> with <tt>... |& grep</tt></li> - <li>You can only redirect to a file, not to another descriptor and not from - a here document.</li> - <li>tcl supports redirecting to open files with the @ syntax but you - shouldn't use that here.</li> - </ul> - - <p>There are some quoting rules that you must pay attention to when writing - your RUN lines. In general nothing needs to be quoted. Tcl won't strip off any - quote characters so they will get passed to the invoked program. For - example:</p> - -<div class="doc_code"> -<pre> -... | grep 'find this string' -</pre> -</div> - - <p>This will fail because the ' characters are passed to grep. 
This would - instruction grep to look for <tt>'find</tt> in the files <tt>this</tt> and - <tt>string'</tt>. To avoid this use curly braces to tell Tcl that it should - treat everything enclosed as one value. So our example would become:</p> - -<div class="doc_code"> -<pre> -... | grep {find this string} -</pre> -</div> - - <p>Additionally, the characters <tt>[</tt> and <tt>]</tt> are treated - specially by Tcl. They tell Tcl to interpret the content as a command to - execute. Since these characters are often used in regular expressions this can - have disastrous results and cause the entire test run in a directory to fail. - For example, a common idiom is to look for some basicblock number:</p> - -<div class="doc_code"> -<pre> -... | grep bb[2-8] -</pre> -</div> - - <p>This, however, will cause Tcl to fail because its going to try to execute - a program named "2-8". Instead, what you want is this:</p> - -<div class="doc_code"> -<pre> -... | grep {bb\[2-8\]} -</pre> -</div> - - <p>Finally, if you need to pass the <tt>\</tt> character down to a program, - then it must be doubled. This is another Tcl special character. So, suppose - you had: - -<div class="doc_code"> -<pre> -... | grep 'i32\*' -</pre> -</div> - - <p>This will fail to match what you want (a pointer to i32). First, the - <tt>'</tt> do not get stripped off. Second, the <tt>\</tt> gets stripped off - by Tcl so what grep sees is: <tt>'i32*'</tt>. That's not likely to match - anything. To resolve this you must use <tt>\\</tt> and the <tt>{}</tt>, like - this:</p> - -<div class="doc_code"> -<pre> -... | grep {i32\\*} -</pre> -</div> - -<p>If your system includes GNU <tt>grep</tt>, make sure -that <tt>GREP_OPTIONS</tt> is not set in your environment. Otherwise, -you may get invalid results (both false positives and false -negatives).</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h3><a name="FileCheck">The FileCheck utility</a></h3> -<!-- _______________________________________________________________________ --> - -<div> - -<p>A powerful feature of the RUN: lines is that it allows any arbitrary commands - to be executed as part of the test harness. While standard (portable) unix - tools like 'grep' work fine on run lines, as you see above, there are a lot - of caveats due to interaction with Tcl syntax, and we want to make sure the - run lines are portable to a wide range of systems. Another major problem is - that grep is not very good at checking to verify that the output of a tools - contains a series of different output in a specific order. The FileCheck - tool was designed to help with these problems.</p> - -<p>FileCheck (whose basic command line arguments are described in <a - href="http://llvm.org/cmds/FileCheck.html">the FileCheck man page</a> is - designed to read a file to check from standard input, and the set of things - to verify from a file specified as a command line argument. A simple example - of using FileCheck from a RUN line looks like this:</p> - -<div class="doc_code"> -<pre> -; RUN: llvm-as < %s | llc -march=x86-64 | <b>FileCheck %s</b> -</pre> -</div> - -<p>This syntax says to pipe the current file ("%s") into llvm-as, pipe that into -llc, then pipe the output of llc into FileCheck. This means that FileCheck will -be verifying its standard input (the llc output) against the filename argument -specified (the original .ll file specified by "%s"). 
To see how this works, -let's look at the rest of the .ll file (after the RUN line):</p> - -<div class="doc_code"> -<pre> -define void @sub1(i32* %p, i32 %v) { -entry: -; <b>CHECK: sub1:</b> -; <b>CHECK: subl</b> - %0 = tail call i32 @llvm.atomic.load.sub.i32.p0i32(i32* %p, i32 %v) - ret void -} - -define void @inc4(i64* %p) { -entry: -; <b>CHECK: inc4:</b> -; <b>CHECK: incq</b> - %0 = tail call i64 @llvm.atomic.load.add.i64.p0i64(i64* %p, i64 1) - ret void -} -</pre> -</div> - -<p>Here you can see some "CHECK:" lines specified in comments. Now you can see -how the file is piped into llvm-as, then llc, and the machine code output is -what we are verifying. FileCheck checks the machine code output to verify that -it matches what the "CHECK:" lines specify.</p> - -<p>The syntax of the CHECK: lines is very simple: they are fixed strings that -must occur in order. FileCheck defaults to ignoring horizontal whitespace -differences (e.g. a space is allowed to match a tab) but otherwise, the contents -of the CHECK: line is required to match some thing in the test file exactly.</p> - -<p>One nice thing about FileCheck (compared to grep) is that it allows merging -test cases together into logical groups. For example, because the test above -is checking for the "sub1:" and "inc4:" labels, it will not match unless there -is a "subl" in between those labels. If it existed somewhere else in the file, -that would not count: "grep subl" matches if subl exists anywhere in the -file.</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="FileCheck-check-prefix">The FileCheck -check-prefix option</a> -</h4> - -<div> - -<p>The FileCheck -check-prefix option allows multiple test configurations to be -driven from one .ll file. This is useful in many circumstances, for example, -testing different architectural variants with llc. Here's a simple example:</p> - -<div class="doc_code"> -<pre> -; RUN: llvm-as < %s | llc -mtriple=i686-apple-darwin9 -mattr=sse41 \ -; RUN: | <b>FileCheck %s -check-prefix=X32</b> -; RUN: llvm-as < %s | llc -mtriple=x86_64-apple-darwin9 -mattr=sse41 \ -; RUN: | <b>FileCheck %s -check-prefix=X64</b> - -define <4 x i32> @pinsrd_1(i32 %s, <4 x i32> %tmp) nounwind { - %tmp1 = insertelement <4 x i32> %tmp, i32 %s, i32 1 - ret <4 x i32> %tmp1 -; <b>X32:</b> pinsrd_1: -; <b>X32:</b> pinsrd $1, 4(%esp), %xmm0 - -; <b>X64:</b> pinsrd_1: -; <b>X64:</b> pinsrd $1, %edi, %xmm0 -} -</pre> -</div> - -<p>In this case, we're testing that we get the expected code generation with -both 32-bit and 64-bit code generation.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="FileCheck-CHECK-NEXT">The "CHECK-NEXT:" directive</a> -</h4> - -<div> - -<p>Sometimes you want to match lines and would like to verify that matches -happen on exactly consecutive lines with no other lines in between them. In -this case, you can use CHECK: and CHECK-NEXT: directives to specify this. If -you specified a custom check prefix, just use "<PREFIX>-NEXT:". 
For -example, something like this works as you'd expect:</p> - -<div class="doc_code"> -<pre> -define void @t2(<2 x double>* %r, <2 x double>* %A, double %B) { - %tmp3 = load <2 x double>* %A, align 16 - %tmp7 = insertelement <2 x double> undef, double %B, i32 0 - %tmp9 = shufflevector <2 x double> %tmp3, - <2 x double> %tmp7, - <2 x i32> < i32 0, i32 2 > - store <2 x double> %tmp9, <2 x double>* %r, align 16 - ret void - -; <b>CHECK:</b> t2: -; <b>CHECK:</b> movl 8(%esp), %eax -; <b>CHECK-NEXT:</b> movapd (%eax), %xmm0 -; <b>CHECK-NEXT:</b> movhpd 12(%esp), %xmm0 -; <b>CHECK-NEXT:</b> movl 4(%esp), %eax -; <b>CHECK-NEXT:</b> movapd %xmm0, (%eax) -; <b>CHECK-NEXT:</b> ret -} -</pre> -</div> - -<p>CHECK-NEXT: directives reject the input unless there is exactly one newline -between it an the previous directive. A CHECK-NEXT cannot be the first -directive in a file.</p> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="FileCheck-CHECK-NOT">The "CHECK-NOT:" directive</a> -</h4> - -<div> - -<p>The CHECK-NOT: directive is used to verify that a string doesn't occur -between two matches (or the first match and the beginning of the file). For -example, to verify that a load is removed by a transformation, a test like this -can be used:</p> - -<div class="doc_code"> -<pre> -define i8 @coerce_offset0(i32 %V, i32* %P) { - store i32 %V, i32* %P - - %P2 = bitcast i32* %P to i8* - %P3 = getelementptr i8* %P2, i32 2 - - %A = load i8* %P3 - ret i8 %A -; <b>CHECK:</b> @coerce_offset0 -; <b>CHECK-NOT:</b> load -; <b>CHECK:</b> ret i8 -} -</pre> -</div> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="FileCheck-Matching">FileCheck Pattern Matching Syntax</a> -</h4> - -<div> - -<!-- {% raw %} --> - -<p>The CHECK: and CHECK-NOT: directives both take a pattern to match. For most -uses of FileCheck, fixed string matching is perfectly sufficient. For some -things, a more flexible form of matching is desired. To support this, FileCheck -allows you to specify regular expressions in matching strings, surrounded by -double braces: <b>{{yourregex}}</b>. Because we want to use fixed string -matching for a majority of what we do, FileCheck has been designed to support -mixing and matching fixed string matching with regular expressions. This allows -you to write things like this:</p> - -<div class="doc_code"> -<pre> -; CHECK: movhpd <b>{{[0-9]+}}</b>(%esp), <b>{{%xmm[0-7]}}</b> -</pre> -</div> - -<p>In this case, any offset from the ESP register will be allowed, and any xmm -register will be allowed.</p> - -<p>Because regular expressions are enclosed with double braces, they are -visually distinct, and you don't need to use escape characters within the double -braces like you would in C. In the rare case that you want to match double -braces explicitly from the input, you can use something ugly like -<b>{{[{][{]}}</b> as your pattern.</p> - -<!-- {% endraw %} --> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="FileCheck-Variables">FileCheck Variables</a> -</h4> - -<div> - - -<!-- {% raw %} --> - -<p>It is often useful to match a pattern and then verify that it occurs again -later in the file. For codegen tests, this can be useful to allow any register, -but verify that that register is used consistently later. To do this, FileCheck -allows named variables to be defined and substituted into patterns. 
Here is a -simple example:</p> - -<div class="doc_code"> -<pre> -; CHECK: test5: -; CHECK: notw <b>[[REGISTER:%[a-z]+]]</b> -; CHECK: andw {{.*}}<b>[[REGISTER]]</b> -</pre> -</div> - -<p>The first check line matches a regex (<tt>%[a-z]+</tt>) and captures it into -the variables "REGISTER". The second line verifies that whatever is in REGISTER -occurs later in the file after an "andw". FileCheck variable references are -always contained in <tt>[[ ]]</tt> pairs, are named, and their names can be -formed with the regex "<tt>[a-zA-Z][a-zA-Z0-9]*</tt>". If a colon follows the -name, then it is a definition of the variable, if not, it is a use.</p> - -<p>FileCheck variables can be defined multiple times, and uses always get the -latest value. Note that variables are all read at the start of a "CHECK" line -and are all defined at the end. This means that if you have something like -"<tt>CHECK: [[XYZ:.*]]x[[XYZ]]</tt>" that the check line will read the previous -value of the XYZ variable and define a new one after the match is performed. If -you need to do something like this you can probably take advantage of the fact -that FileCheck is not actually line-oriented when it matches, this allows you to -define two separate CHECK lines that match on the same line. -</p> - -<!-- {% endraw %} --> - -</div> - -</div> - -<!-- _______________________________________________________________________ --> -<h3><a name="rtvars">Variables and substitutions</a></h3> -<!-- _______________________________________________________________________ --> -<div> - <p>With a RUN line there are a number of substitutions that are permitted. In - general, any Tcl variable that is available in the <tt>substitute</tt> - function (in <tt>test/lib/llvm.exp</tt>) can be substituted into a RUN line. - To make a substitution just write the variable's name preceded by a $. - Additionally, for compatibility reasons with previous versions of the test - library, certain names can be accessed with an alternate syntax: a % prefix. - These alternates are deprecated and may go away in a future version. - </p> - <p>Here are the available variable names. The alternate syntax is listed in - parentheses.</p> - - <dl style="margin-left: 25px"> - <dt><b>$test</b> (%s)</dt> - <dd>The full path to the test case's source. This is suitable for passing - on the command line as the input to an llvm tool.</dd> - - <dt><b>$srcdir</b></dt> - <dd>The source directory from where the "<tt>make check</tt>" was run.</dd> - - <dt><b>objdir</b></dt> - <dd>The object directory that corresponds to the <tt>$srcdir</tt>.</dd> - - <dt><b>subdir</b></dt> - <dd>A partial path from the <tt>test</tt> directory that contains the - sub-directory that contains the test source being executed.</dd> - - <dt><b>srcroot</b></dt> - <dd>The root directory of the LLVM src tree.</dd> - - <dt><b>objroot</b></dt> - <dd>The root directory of the LLVM object tree. This could be the same - as the srcroot.</dd> - - <dt><b>path</b><dt> - <dd>The path to the directory that contains the test case source. This is - for locating any supporting files that are not generated by the test, but - used by the test.</dd> - - <dt><b>tmp</b></dt> - <dd>The path to a temporary file name that could be used for this test case. - The file name won't conflict with other test cases. You can append to it if - you need multiple temporaries. 
This is useful as the destination of some - redirected output.</dd> - - <dt><b>target_triplet</b> (%target_triplet)</dt> - <dd>The target triplet that corresponds to the current host machine (the one - running the test cases). This should probably be called "host".<dd> - - <dt><b>link</b> (%link)</dt> - <dd>This full link command used to link LLVM executables. This has all the - configured -I, -L and -l options.</dd> - - <dt><b>shlibext</b> (%shlibext)</dt> - <dd>The suffix for the host platforms share library (dll) files. This - includes the period as the first character.</dd> - </dl> - <p>To add more variables, two things need to be changed. First, add a line in - the <tt>test/Makefile</tt> that creates the <tt>site.exp</tt> file. This will - "set" the variable as a global in the site.exp file. Second, in the - <tt>test/lib/llvm.exp</tt> file, in the substitute proc, add the variable name - to the list of "global" declarations at the beginning of the proc. That's it, - the variable can then be used in test scripts.</p> -</div> - -<!-- _______________________________________________________________________ --> -<h3><a name="rtfeatures">Other Features</a></h3> -<!-- _______________________________________________________________________ --> -<div> - <p>To make RUN line writing easier, there are several shell scripts located - in the <tt>llvm/test/Scripts</tt> directory. This directory is in the PATH - when running tests, so you can just call these scripts using their name. For - example:</p> - <dl> - <dt><b>ignore</b></dt> - <dd>This script runs its arguments and then always returns 0. This is useful - in cases where the test needs to cause a tool to generate an error (e.g. to - check the error output). However, any program in a pipeline that returns a - non-zero result will cause the test to fail. This script overcomes that - issue and nicely documents that the test case is purposefully ignoring the - result code of the tool</dd> - - <dt><b>not</b></dt> - <dd>This script runs its arguments and then inverts the result code from - it. Zero result codes become 1. Non-zero result codes become 0. This is - useful to invert the result of a grep. For example "not grep X" means - succeed only if you don't find X in the input.</dd> - </dl> - - <p>Sometimes it is necessary to mark a test case as "expected fail" or XFAIL. - You can easily mark a test as XFAIL just by including <tt>XFAIL: </tt> on a - line near the top of the file. This signals that the test case should succeed - if the test fails. Such test cases are counted separately by the testing - tool. To specify an expected fail, use the XFAIL keyword in the comments of - the test program followed by a colon and one or more failure patterns. Each - failure pattern can be either '*' (to specify fail everywhere), or a part of a - target triple (indicating the test should fail on that platform), or the name - of a configurable feature (for example, "loadable_module"). If there is a - match, the test is expected to fail. If not, the test is expected to - succeed. To XFAIL everywhere just specify <tt>XFAIL: *</tt>. Here is an - example of an <tt>XFAIL</tt> line:</p> - -<div class="doc_code"> -<pre> -; XFAIL: darwin,sun -</pre> -</div> - - <p>To make the output more useful, the <tt>llvm_runtest</tt> function wil - scan the lines of the test case for ones that contain a pattern that matches - PR[0-9]+. This is the syntax for specifying a PR (Problem Report) number that - is related to the test case. 
The number after "PR" specifies the LLVM bugzilla - number. When a PR number is specified, it will be used in the pass/fail - reporting. This is useful to quickly get some context when a test fails.</p> - - <p>Finally, any line that contains "END." will cause the special - interpretation of lines to terminate. This is generally done right after the - last RUN: line. This has two side effects: (a) it prevents special - interpretation of lines that are part of the test program, not the - instructions to the test case, and (b) it speeds things up for really big test - cases by avoiding interpretation of the remainder of the file.</p> - -</div> - -</div> - -<!--=========================================================================--> -<h2><a name="testsuiteoverview"><tt>test-suite</tt> Overview</a></h2> -<!--=========================================================================--> - -<div> - -<p>The <tt>test-suite</tt> module contains a number of programs that can be -compiled and executed. The <tt>test-suite</tt> includes reference outputs for -all of the programs, so that the output of the executed program can be checked -for correctness.</p> - -<p><tt>test-suite</tt> tests are divided into three types of tests: MultiSource, -SingleSource, and External.</p> - -<ul> -<li><tt>test-suite/SingleSource</tt> -<p>The SingleSource directory contains test programs that are only a single -source file in size. These are usually small benchmark programs or small -programs that calculate a particular value. Several such programs are grouped -together in each directory.</p></li> - -<li><tt>test-suite/MultiSource</tt> -<p>The MultiSource directory contains subdirectories which contain entire -programs with multiple source files. Large benchmarks and whole applications -go here.</p></li> - -<li><tt>test-suite/External</tt> -<p>The External directory contains Makefiles for building code that is external -to (i.e., not distributed with) LLVM. The most prominent members of this -directory are the SPEC 95 and SPEC 2000 benchmark suites. The <tt>External</tt> -directory does not contain these actual tests, but only the Makefiles that know -how to properly compile these programs from somewhere else. When -using <tt>LNT</tt>, use the <tt>--test-externals</tt> option to include these -tests in the results.</p></li> -</ul> -</div> - -<!--=========================================================================--> -<h2><a name="testsuitequickstart"><tt>test-suite</tt> Quickstart</a></h2> -<!--=========================================================================--> - -<div> -<p>The modern way of running the <tt>test-suite</tt> is focused on testing and -benchmarking complete compilers using -the <a href="http://llvm.org/docs/lnt">LNT</a> testing infrastructure.</p> - -<p>For more information on using LNT to execute the <tt>test-suite</tt>, please -see the <a href="http://llvm.org/docs/lnt/quickstart.html">LNT Quickstart</a> -documentation.</p> -</div> - -<!--=========================================================================--> -<h2><a name="testsuitemakefiles"><tt>test-suite</tt> Makefiles</a></h2> -<!--=========================================================================--> - -<div> -<p>Historically, the <tt>test-suite</tt> was executed using a complicated setup -of Makefiles. The LNT based approach above is recommended for most users, but -there are some testing scenarios which are not supported by the LNT approach. 
In -addition, LNT currently uses the Makefile setup under the covers and so -developers who are interested in how LNT works under the hood may want to -understand the Makefile based setup.</p> - -<p>For more information on the <tt>test-suite</tt> Makefile setup, please see -the <a href="TestSuiteMakefileGuide.html">Test Suite Makefile Guide.</a></p> -</div> - -<!-- *********************************************************************** --> - -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a> - - John T. Criswell, Daniel Dunbar, Reid Spencer, and Tanya Lattner<br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/TestingGuide.rst b/docs/TestingGuide.rst new file mode 100644 index 0000000000..f66cae1d14 --- /dev/null +++ b/docs/TestingGuide.rst @@ -0,0 +1,460 @@ +================================= +LLVM Testing Infrastructure Guide +================================= + +Written by John T. Criswell, Daniel Dunbar, Reid Spencer, and Tanya +Lattner + +.. contents:: + :local: + +.. toctree:: + :hidden: + + TestSuiteMakefileGuide + +Overview +======== + +This document is the reference manual for the LLVM testing +infrastructure. It documents the structure of the LLVM testing +infrastructure, the tools needed to use it, and how to add and run +tests. + +Requirements +============ + +In order to use the LLVM testing infrastructure, you will need all of +the software required to build LLVM, as well as +`Python <http://python.org>`_ 2.4 or later. + +LLVM testing infrastructure organization +======================================== + +The LLVM testing infrastructure contains two major categories of tests: +regression tests and whole programs. The regression tests are contained +inside the LLVM repository itself under ``llvm/test`` and are expected +to always pass -- they should be run before every commit. + +The whole programs tests are referred to as the "LLVM test suite" (or +"test-suite") and are in the ``test-suite`` module in subversion. For +historical reasons, these tests are also referred to as the "nightly +tests" in places, which is less ambiguous than "test-suite" and remains +in use although we run them much more often than nightly. + +Regression tests +---------------- + +The regression tests are small pieces of code that test a specific +feature of LLVM or trigger a specific bug in LLVM. The language they are +written in depends on the part of LLVM being tested. These tests are driven by +the :doc:`Lit <CommandGuide/lit>` testing tool (which is part of LLVM), and +are located in the ``llvm/test`` directory. + +Typically when a bug is found in LLVM, a regression test containing just +enough code to reproduce the problem should be written and placed +somewhere underneath this directory. For example, it can be a small +piece of LLVM IR distilled from an actual application or benchmark. + +``test-suite`` +-------------- + +The test suite contains whole programs, which are pieces of code which +can be compiled and linked into a stand-alone program that can be +executed. These programs are generally written in high level languages +such as C or C++. 
+ +These programs are compiled using a user-specified compiler and set of +flags, and then executed to capture the program output and timing +information. The output of these programs is compared to a reference +output to ensure that the program is being compiled correctly. + +In addition to compiling and executing programs, whole program tests +serve as a way of benchmarking LLVM performance, both in terms of the +efficiency of the programs generated as well as the speed with which +LLVM compiles, optimizes, and generates code. + +The test-suite is located in the ``test-suite`` Subversion module. + +Debugging Information tests +--------------------------- + +The test suite contains tests to check the quality of debugging information. +The tests are written in C-based languages or in LLVM assembly language. + +These tests are compiled and run under a debugger. The debugger output +is checked to validate the debugging information. See README.txt in the +test suite for more information. This test suite is located in the +``debuginfo-tests`` Subversion module. + +Quick start +=========== + +The tests are located in two separate Subversion modules. The +regression tests are in the main "llvm" module under the directory +``llvm/test`` (so you get these tests for free with the main LLVM tree). +Use ``make check-all`` to run the regression tests after building LLVM. + +The more comprehensive test suite that includes whole programs in C and C++ +is in the ``test-suite`` module. See :ref:`test-suite Quickstart +<test-suite-quickstart>` for more information on running these tests. + +Regression tests +---------------- + +To run all of the LLVM regression tests, use the master Makefile in the +``llvm/test`` directory. LLVM Makefiles require GNU Make (read the :doc:`LLVM +Makefile Guide <MakefileGuide>` for more details): + +.. code-block:: bash + + % make -C llvm/test + +or: + +.. code-block:: bash + + % make check + +If you have `Clang <http://clang.llvm.org/>`_ checked out and built, you +can run the LLVM and Clang tests simultaneously using: + +.. code-block:: bash + + % make check-all + +To run the tests with Valgrind (Memcheck by default), just append +``VG=1`` to the commands above, e.g.: + +.. code-block:: bash + + % make check VG=1 + +To run individual tests or subsets of tests, you can use the ``llvm-lit`` +script which is built as part of LLVM. For example, to run the +``Integer/BitPacked.ll`` test by itself you can run: + +.. code-block:: bash + + % llvm-lit ~/llvm/test/Integer/BitPacked.ll + +or to run all of the ARM CodeGen tests: + +.. code-block:: bash + + % llvm-lit ~/llvm/test/CodeGen/ARM + +For more information on using the :program:`lit` tool, see ``llvm-lit --help`` +or the :doc:`lit man page <CommandGuide/lit>`. + +Debugging Information tests +--------------------------- + +To run the debugging information tests, simply check out the tests inside the +clang/test directory. + +.. code-block:: bash + + % cd clang/test + % svn co http://llvm.org/svn/llvm-project/debuginfo-tests/trunk debuginfo-tests + +These tests are already set up to run as part of the Clang regression tests. + +Regression test structure +========================= + +The LLVM regression tests are driven by :program:`lit` and are located in the +``llvm/test`` directory. + +This directory contains a large array of small tests that exercise +various features of LLVM and ensure that regressions do not occur. +The directory is broken into several sub-directories, each focused on a +particular area of LLVM.
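+
+A few of the sub-directories you will find there, as an illustrative (not
+exhaustive) selection:
+
+::
+
+    llvm/test/Analysis
+    llvm/test/Assembler
+    llvm/test/CodeGen
+    llvm/test/DebugInfo
+    llvm/test/Transforms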
+ +Writing new regression tests +---------------------------- + +The regression test structure is very simple, but does require some +information to be set. This information is gathered via ``configure`` +and is written to a file, ``test/lit.site.cfg`` in the build directory. +The ``llvm/test`` Makefile does this work for you. + +In order for the regression tests to work, each directory of tests must +have a ``lit.local.cfg`` file. :program:`lit` looks for this file to determine +how to run the tests. This file is just Python code and thus is very +flexible, but we've standardized it for the LLVM regression tests. If +you're adding a directory of tests, just copy ``lit.local.cfg`` from +another directory to get running. The standard ``lit.local.cfg`` simply +specifies which files to look in for tests. Any directory that contains +only directories does not need the ``lit.local.cfg`` file. Read the :doc:`Lit +documentation <CommandGuide/lit>` for more information. + +Each test file must contain lines starting with "RUN:" that tell :program:`lit` +how to run it. If there are no RUN lines, :program:`lit` will issue an error +while running a test. + +RUN lines are specified in the comments of the test program using the +keyword ``RUN`` followed by a colon, and lastly the command (pipeline) +to execute. Together, these lines form the "script" that :program:`lit` +executes to run the test case. The syntax of the RUN lines is similar to a +shell's syntax for pipelines including I/O redirection and variable +substitution. However, even though these lines may *look* like a shell +script, they are not. RUN lines are interpreted by :program:`lit`. +Consequently, the syntax differs from shell in a few ways. You can specify +as many RUN lines as needed. + +:program:`lit` performs substitution on each RUN line to replace LLVM tool names +with the full paths to the executable built for each tool (in +``$(LLVM_OBJ_ROOT)/$(BuildMode)/bin)``. This ensures that :program:`lit` does +not invoke any stray LLVM tools in the user's path during testing. + +Each RUN line is executed on its own, distinct from other lines unless +its last character is ``\``. This continuation character causes the RUN +line to be concatenated with the next one. In this way you can build up +long pipelines of commands without making huge line lengths. The lines +ending in ``\`` are concatenated until a RUN line that doesn't end in +``\`` is found. This concatenated set of RUN lines then constitutes one +execution. :program:`lit` will substitute variables and arrange for the pipeline +to be executed. If any process in the pipeline fails, the entire line (and +test case) fails too. + +Below is an example of legal RUN lines in a ``.ll`` file: + +.. code-block:: llvm + + ; RUN: llvm-as < %s | llvm-dis > %t1 + ; RUN: llvm-dis < %s.bc-13 > %t2 + ; RUN: diff %t1 %t2 + +As with a Unix shell, the RUN lines permit pipelines and I/O +redirection to be used. However, the usage is slightly different than +for Bash. In general, it's useful to read the code of other tests to figure out +what you can use in yours. The major differences are: + +- You can't do ``2>&1``. That will cause :program:`lit` to write to a file + named ``&1``. Usually this is done to get stderr to go through a pipe. You + can do that with ``|&`` so replace this idiom: + ``... 2>&1 | grep`` with ``... |& grep`` +- You can only redirect to a file, not to another descriptor and not + from a here document. 
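+
+For illustration only (the tool invoked and the pattern grepped for are
+arbitrary choices for this sketch), a RUN line that combines the ``|&`` idiom
+from the list above with the ``not`` script described later under Other
+Features might look like:
+
+.. code-block:: llvm
+
+   ; RUN: not llvm-as < %s |& grep error
+
+Because ``not`` inverts the exit status of ``llvm-as``, the line succeeds only
+when ``llvm-as`` rejects the input and its diagnostic output contains the
+pattern.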
+ +There are some quoting rules that you must pay attention to when writing +your RUN lines. In general, nothing needs to be quoted. :program:`lit` won't +strip off any quote characters so they will get passed to the invoked program. +For example: + +.. code-block:: bash + + ... | grep 'find this string' + +This will fail because the ``'`` characters are passed to ``grep``. This would +make ``grep`` look for ``'find`` in the files ``this`` and +``string'``. To avoid this, use curly braces to tell :program:`lit` that it +should treat everything enclosed as one value. So our example would become: + +.. code-block:: bash + + ... | grep {find this string} + +In general, you should strive to keep your RUN lines as simple as possible, +using them only to run tools that generate the output you can then examine. The +recommended way to examine output and figure out if the test passes is using the +:doc:`FileCheck tool <CommandGuide/FileCheck>`. The usage of ``grep`` in RUN +lines is discouraged. + +The FileCheck utility +--------------------- + +A powerful feature of the RUN lines is that they allow arbitrary +commands to be executed as part of the test harness. While standard +(portable) unix tools like ``grep`` work fine on run lines, as you see +above, there are a lot of caveats due to interaction with shell syntax, +and we want to make sure the run lines are portable to a wide range of +systems. Another major problem is that ``grep`` is not very good at verifying +that the output of a tool contains a series of different +outputs in a specific order. The :program:`FileCheck` tool was designed to +help with these problems. + +:program:`FileCheck` is designed to read a file to check from standard input, +and the set of things to verify from a file specified as a command line +argument. :program:`FileCheck` is described in :doc:`the FileCheck man page +<CommandGuide/FileCheck>`. + +Variables and substitutions +--------------------------- + +With a RUN line there are a number of substitutions that are permitted. +To make a substitution, just write the variable's name preceded by a ``$``. +Additionally, for compatibility reasons with previous versions of the +test library, certain names can be accessed with an alternate syntax: a +% prefix. These alternates are deprecated and may go away in a future +version. + +Here are the available variable names. The alternate syntax is listed in +parentheses. + +``$test`` (``%s``) + The full path to the test case's source. This is suitable for passing on + the command line as the input to an LLVM tool. + +``%(line)``, ``%(line+<number>)``, ``%(line-<number>)`` + The number of the line where this variable is used, with an optional + integer offset. This can be used in tests with multiple RUN lines, + which reference the test file's line numbers. + +``$srcdir`` + The source directory from where the ``make check`` was run. + +``objdir`` + The object directory that corresponds to the ``$srcdir``. + +``subdir`` + A partial path from the ``test`` directory that contains the + sub-directory that contains the test source being executed. + +``srcroot`` + The root directory of the LLVM src tree. + +``objroot`` + The root directory of the LLVM object tree. This could be the same as + the srcroot. + +``path`` + The path to the directory that contains the test case source. This is + for locating any supporting files that are not generated by the test, + but used by the test. + +``tmp`` + The path to a temporary file name that could be used for this test case.
+ The file name won't conflict with other test cases. You can append to it + if you need multiple temporaries. This is useful as the destination of + some redirected output. + +``target_triplet`` (``%target_triplet``) + The target triplet that corresponds to the current host machine (the one + running the test cases). This should probably be called "host". + +``link`` (``%link``) + The full link command used to link LLVM executables. This has all the + configured ``-I``, ``-L`` and ``-l`` options. + +``shlibext`` (``%shlibext``) + The suffix for the host platform's shared library (DLL) files. This + includes the period as the first character. + +To add more variables, look at ``test/lit.cfg``. + +Other Features +-------------- + +To make RUN line writing easier, there are several helper scripts and programs +in the ``llvm/test/Scripts`` directory. This directory is in the PATH +when running tests, so you can just call these scripts using their name. +For example: + +``ignore`` + This script runs its arguments and then always returns 0. This is useful + in cases where the test needs to cause a tool to generate an error (e.g. + to check the error output). However, any program in a pipeline that + returns a non-zero result will cause the test to fail. This script + overcomes that issue and nicely documents that the test case is + purposefully ignoring the result code of the tool. +``not`` + This script runs its arguments and then inverts the result code from it. + Zero result codes become 1. Non-zero result codes become 0. + +Sometimes it is necessary to mark a test case as "expected fail" or +XFAIL. You can easily mark a test as XFAIL just by including ``XFAIL:`` +on a line near the top of the file. This signals that the test case +should succeed if the test fails. Such test cases are counted separately +by the testing tool. To specify an expected fail, use the XFAIL keyword +in the comments of the test program followed by a colon and one or more +failure patterns. Each failure pattern can be either ``*`` (to specify +fail everywhere), or a part of a target triple (indicating the test +should fail on that platform), or the name of a configurable feature +(for example, ``loadable_module``). If there is a match, the test is +expected to fail. If not, the test is expected to succeed. To XFAIL +everywhere, just specify ``XFAIL: *``. Here is an example of an ``XFAIL`` +line: + +.. code-block:: llvm + + ; XFAIL: darwin,sun + +To make the output more useful, :program:`lit` will scan +the lines of the test case for ones that contain a pattern that matches +``PR[0-9]+``. This is the syntax for specifying a PR (Problem Report) number +that is related to the test case. The number after "PR" specifies the +LLVM bugzilla number. When a PR number is specified, it will be used in +the pass/fail reporting. This is useful to quickly get some context when +a test fails. + +Finally, any line that contains "END." will cause the special +interpretation of lines to terminate. This is generally done right after +the last RUN: line. This has two side effects: + +(a) it prevents special interpretation of lines that are part of the test + program, not the instructions to the test case, and + +(b) it speeds things up for really big test cases by avoiding + interpretation of the remainder of the file. + +``test-suite`` Overview +======================= + +The ``test-suite`` module contains a number of programs that can be +compiled and executed.
The ``test-suite`` includes reference outputs for +all of the programs, so that the output of the executed program can be +checked for correctness. + +``test-suite`` tests are divided into three types of tests: MultiSource, +SingleSource, and External. + +- ``test-suite/SingleSource`` + + The SingleSource directory contains test programs that are only a + single source file in size. These are usually small benchmark + programs or small programs that calculate a particular value. Several + such programs are grouped together in each directory. + +- ``test-suite/MultiSource`` + + The MultiSource directory contains subdirectories which contain + entire programs with multiple source files. Large benchmarks and + whole applications go here. + +- ``test-suite/External`` + + The External directory contains Makefiles for building code that is + external to (i.e., not distributed with) LLVM. The most prominent + members of this directory are the SPEC 95 and SPEC 2000 benchmark + suites. The ``External`` directory does not contain these actual + tests, but only the Makefiles that know how to properly compile these + programs from somewhere else. When using ``LNT``, use the + ``--test-externals`` option to include these tests in the results. + +.. _test-suite-quickstart: + +``test-suite`` Quickstart +------------------------- + +The modern way of running the ``test-suite`` is focused on testing and +benchmarking complete compilers using the +`LNT <http://llvm.org/docs/lnt>`_ testing infrastructure. + +For more information on using LNT to execute the ``test-suite``, please +see the `LNT Quickstart <http://llvm.org/docs/lnt/quickstart.html>`_ +documentation. + +``test-suite`` Makefiles +------------------------ + +Historically, the ``test-suite`` was executed using a complicated setup +of Makefiles. The LNT based approach above is recommended for most +users, but there are some testing scenarios which are not supported by +the LNT approach. In addition, LNT currently uses the Makefile setup +under the covers and so developers who are interested in how LNT works +under the hood may want to understand the Makefile based setup. + +For more information on the ``test-suite`` Makefile setup, please see +the :doc:`Test Suite Makefile Guide <TestSuiteMakefileGuide>`. 
diff --git a/docs/WritingAnLLVMBackend.html b/docs/WritingAnLLVMBackend.html deleted file mode 100644 index 0ad472cb92..0000000000 --- a/docs/WritingAnLLVMBackend.html +++ /dev/null @@ -1,2557 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> -<html> -<head> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <title>Writing an LLVM Compiler Backend</title> - <link rel="stylesheet" href="_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1> - Writing an LLVM Compiler Backend -</h1> - -<ol> - <li><a href="#intro">Introduction</a> - <ul> - <li><a href="#Audience">Audience</a></li> - <li><a href="#Prerequisite">Prerequisite Reading</a></li> - <li><a href="#Basic">Basic Steps</a></li> - <li><a href="#Preliminaries">Preliminaries</a></li> - </ul> - <li><a href="#TargetMachine">Target Machine</a></li> - <li><a href="#TargetRegistration">Target Registration</a></li> - <li><a href="#RegisterSet">Register Set and Register Classes</a> - <ul> - <li><a href="#RegisterDef">Defining a Register</a></li> - <li><a href="#RegisterClassDef">Defining a Register Class</a></li> - <li><a href="#implementRegister">Implement a subclass of TargetRegisterInfo</a></li> - </ul></li> - <li><a href="#InstructionSet">Instruction Set</a> - <ul> - <li><a href="#operandMapping">Instruction Operand Mapping</a></li> - <li><a href="#relationMapping">Instruction Relation Mapping</a></li> - <li><a href="#implementInstr">Implement a subclass of TargetInstrInfo</a></li> - <li><a href="#branchFolding">Branch Folding and If Conversion</a></li> - </ul></li> - <li><a href="#InstructionSelector">Instruction Selector</a> - <ul> - <li><a href="#LegalizePhase">The SelectionDAG Legalize Phase</a> - <ul> - <li><a href="#promote">Promote</a></li> - <li><a href="#expand">Expand</a></li> - <li><a href="#custom">Custom</a></li> - <li><a href="#legal">Legal</a></li> - </ul></li> - <li><a href="#callingConventions">Calling Conventions</a></li> - </ul></li> - <li><a href="#assemblyPrinter">Assembly Printer</a></li> - <li><a href="#subtargetSupport">Subtarget Support</a></li> - <li><a href="#jitSupport">JIT Support</a> - <ul> - <li><a href="#mce">Machine Code Emitter</a></li> - <li><a href="#targetJITInfo">Target JIT Info</a></li> - </ul></li> -</ol> - -<div class="doc_author"> - <p>Written by <a href="http://www.woo.com">Mason Woo</a> and - <a href="http://misha.brukman.net">Misha Brukman</a></p> -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="intro">Introduction</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p> -This document describes techniques for writing compiler backends that convert -the LLVM Intermediate Representation (IR) to code for a specified machine or -other languages. Code intended for a specific machine can take the form of -either assembly code or binary code (usable for a JIT compiler). -</p> - -<p> -The backend of LLVM features a target-independent code generator that may create -output for several types of target CPUs — including X86, PowerPC, ARM, -and SPARC. The backend may also be used to generate code targeted at SPUs of the -Cell processor or GPUs to support the execution of compute kernels. -</p> - -<p> -The document focuses on existing examples found in subdirectories -of <tt>llvm/lib/Target</tt> in a downloaded LLVM release. 
In particular, this -document focuses on the example of creating a static compiler (one that emits -text assembly) for a SPARC target, because SPARC has fairly standard -characteristics, such as a RISC instruction set and straightforward calling -conventions. -</p> - -<h3> - <a name="Audience">Audience</a> -</h3> - -<div> - -<p> -The audience for this document is anyone who needs to write an LLVM backend to -generate code for a specific hardware or software target. -</p> - -</div> - -<h3> - <a name="Prerequisite">Prerequisite Reading</a> -</h3> - -<div> - -<p> -These essential documents must be read before reading this document: -</p> - -<ul> -<li><i><a href="LangRef.html">LLVM Language Reference - Manual</a></i> — a reference manual for the LLVM assembly language.</li> - -<li><i><a href="CodeGenerator.html">The LLVM - Target-Independent Code Generator</a></i> — a guide to the components - (classes and code generation algorithms) for translating the LLVM internal - representation into machine code for a specified target. Pay particular - attention to the descriptions of code generation stages: Instruction - Selection, Scheduling and Formation, SSA-based Optimization, Register - Allocation, Prolog/Epilog Code Insertion, Late Machine Code Optimizations, - and Code Emission.</li> - -<li><i><a href="TableGenFundamentals.html">TableGen - Fundamentals</a></i> —a document that describes the TableGen - (<tt>tblgen</tt>) application that manages domain-specific information to - support LLVM code generation. TableGen processes input from a target - description file (<tt>.td</tt> suffix) and generates C++ code that can be - used for code generation.</li> - -<li><i><a href="WritingAnLLVMPass.html">Writing an LLVM - Pass</a></i> — The assembly printer is a <tt>FunctionPass</tt>, as are - several SelectionDAG processing steps.</li> -</ul> - -<p> -To follow the SPARC examples in this document, have a copy of -<i><a href="http://www.sparc.org/standards/V8.pdf">The SPARC Architecture -Manual, Version 8</a></i> for reference. For details about the ARM instruction -set, refer to the <i><a href="http://infocenter.arm.com/">ARM Architecture -Reference Manual</a></i>. For more about the GNU Assembler format -(<tt>GAS</tt>), see -<i><a href="http://sourceware.org/binutils/docs/as/index.html">Using As</a></i>, -especially for the assembly printer. <i>Using As</i> contains a list of target -machine dependent features. -</p> - -</div> - -<h3> - <a name="Basic">Basic Steps</a> -</h3> - -<div> - -<p> -To write a compiler backend for LLVM that converts the LLVM IR to code for a -specified target (machine or other language), follow these steps: -</p> - -<ul> -<li>Create a subclass of the TargetMachine class that describes characteristics - of your target machine. Copy existing examples of specific TargetMachine - class and header files; for example, start with - <tt>SparcTargetMachine.cpp</tt> and <tt>SparcTargetMachine.h</tt>, but - change the file names for your target. Similarly, change code that - references "Sparc" to reference your target. </li> - -<li>Describe the register set of the target. Use TableGen to generate code for - register definition, register aliases, and register classes from a - target-specific <tt>RegisterInfo.td</tt> input file. 
You should also write - additional code for a subclass of the TargetRegisterInfo class that - represents the class register file data used for register allocation and - also describes the interactions between registers.</li> - -<li>Describe the instruction set of the target. Use TableGen to generate code - for target-specific instructions from target-specific versions of - <tt>TargetInstrFormats.td</tt> and <tt>TargetInstrInfo.td</tt>. You should - write additional code for a subclass of the TargetInstrInfo class to - represent machine instructions supported by the target machine. </li> - -<li>Describe the selection and conversion of the LLVM IR from a Directed Acyclic - Graph (DAG) representation of instructions to native target-specific - instructions. Use TableGen to generate code that matches patterns and - selects instructions based on additional information in a target-specific - version of <tt>TargetInstrInfo.td</tt>. Write code - for <tt>XXXISelDAGToDAG.cpp</tt>, where XXX identifies the specific target, - to perform pattern matching and DAG-to-DAG instruction selection. Also write - code in <tt>XXXISelLowering.cpp</tt> to replace or remove operations and - data types that are not supported natively in a SelectionDAG. </li> - -<li>Write code for an assembly printer that converts LLVM IR to a GAS format for - your target machine. You should add assembly strings to the instructions - defined in your target-specific version of <tt>TargetInstrInfo.td</tt>. You - should also write code for a subclass of AsmPrinter that performs the - LLVM-to-assembly conversion and a trivial subclass of TargetAsmInfo.</li> - -<li>Optionally, add support for subtargets (i.e., variants with different - capabilities). You should also write code for a subclass of the - TargetSubtarget class, which allows you to use the <tt>-mcpu=</tt> - and <tt>-mattr=</tt> command-line options.</li> - -<li>Optionally, add JIT support and create a machine code emitter (subclass of - TargetJITInfo) that is used to emit binary code directly into memory. </li> -</ul> - -<p> -In the <tt>.cpp</tt> and <tt>.h</tt>. files, initially stub up these methods and -then implement them later. Initially, you may not know which private members -that the class will need and which components will need to be subclassed. -</p> - -</div> - -<h3> - <a name="Preliminaries">Preliminaries</a> -</h3> - -<div> - -<p> -To actually create your compiler backend, you need to create and modify a few -files. The absolute minimum is discussed here. But to actually use the LLVM -target-independent code generator, you must perform the steps described in -the <a href="CodeGenerator.html">LLVM -Target-Independent Code Generator</a> document. -</p> - -<p> -First, you should create a subdirectory under <tt>lib/Target</tt> to hold all -the files related to your target. If your target is called "Dummy," create the -directory <tt>lib/Target/Dummy</tt>. -</p> - -<p> -In this new -directory, create a <tt>Makefile</tt>. It is easiest to copy a -<tt>Makefile</tt> of another target and modify it. It should at least contain -the <tt>LEVEL</tt>, <tt>LIBRARYNAME</tt> and <tt>TARGET</tt> variables, and then -include <tt>$(LEVEL)/Makefile.common</tt>. The library can be -named <tt>LLVMDummy</tt> (for example, see the MIPS target). Alternatively, you -can split the library into <tt>LLVMDummyCodeGen</tt> -and <tt>LLVMDummyAsmPrinter</tt>, the latter of which should be implemented in a -subdirectory below <tt>lib/Target/Dummy</tt> (for example, see the PowerPC -target). 
-</p> - -<p> -Note that these two naming schemes are hardcoded into <tt>llvm-config</tt>. -Using any other naming scheme will confuse <tt>llvm-config</tt> and produce a -lot of (seemingly unrelated) linker errors when linking <tt>llc</tt>. -</p> - -<p> -To make your target actually do something, you need to implement a subclass of -<tt>TargetMachine</tt>. This implementation should typically be in the file -<tt>lib/Target/DummyTargetMachine.cpp</tt>, but any file in -the <tt>lib/Target</tt> directory will be built and should work. To use LLVM's -target independent code generator, you should do what all current machine -backends do: create a subclass of <tt>LLVMTargetMachine</tt>. (To create a -target from scratch, create a subclass of <tt>TargetMachine</tt>.) -</p> - -<p> -To get LLVM to actually build and link your target, you need to add it to -the <tt>TARGETS_TO_BUILD</tt> variable. To do this, you modify the configure -script to know about your target when parsing the <tt>--enable-targets</tt> -option. Search the configure script for <tt>TARGETS_TO_BUILD</tt>, add your -target to the lists there (some creativity required), and then -reconfigure. Alternatively, you can change <tt>autotools/configure.ac</tt> and -regenerate configure by running <tt>./autoconf/AutoRegen.sh</tt>. -</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="TargetMachine">Target Machine</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p> -<tt>LLVMTargetMachine</tt> is designed as a base class for targets implemented -with the LLVM target-independent code generator. The <tt>LLVMTargetMachine</tt> -class should be specialized by a concrete target class that implements the -various virtual methods. <tt>LLVMTargetMachine</tt> is defined as a subclass of -<tt>TargetMachine</tt> in <tt>include/llvm/Target/TargetMachine.h</tt>. The -<tt>TargetMachine</tt> class implementation (<tt>TargetMachine.cpp</tt>) also -processes numerous command-line options. -</p> - -<p> -To create a concrete target-specific subclass of <tt>LLVMTargetMachine</tt>, -start by copying an existing <tt>TargetMachine</tt> class and header. You -should name the files that you create to reflect your specific target. For -instance, for the SPARC target, name the files <tt>SparcTargetMachine.h</tt> and -<tt>SparcTargetMachine.cpp</tt>. -</p> - -<p> -For a target machine <tt>XXX</tt>, the implementation of -<tt>XXXTargetMachine</tt> must have access methods to obtain objects that -represent target components. These methods are named <tt>get*Info</tt>, and are -intended to obtain the instruction set (<tt>getInstrInfo</tt>), register set -(<tt>getRegisterInfo</tt>), stack frame layout (<tt>getFrameInfo</tt>), and -similar information. <tt>XXXTargetMachine</tt> must also implement the -<tt>getDataLayout</tt> method to access an object with target-specific data -characteristics, such as data type size and alignment requirements. -</p> - -<p> -For instance, for the SPARC target, the header file -<tt>SparcTargetMachine.h</tt> declares prototypes for several <tt>get*Info</tt> -and <tt>getDataLayout</tt> methods that simply return a class member. 
-</p> - -<div class="doc_code"> -<pre> -namespace llvm { - -class Module; - -class SparcTargetMachine : public LLVMTargetMachine { - const DataLayout DataLayout; // Calculates type size & alignment - SparcSubtarget Subtarget; - SparcInstrInfo InstrInfo; - TargetFrameInfo FrameInfo; - -protected: - virtual const TargetAsmInfo *createTargetAsmInfo() const; - -public: - SparcTargetMachine(const Module &M, const std::string &FS); - - virtual const SparcInstrInfo *getInstrInfo() const {return &InstrInfo; } - virtual const TargetFrameInfo *getFrameInfo() const {return &FrameInfo; } - virtual const TargetSubtarget *getSubtargetImpl() const{return &Subtarget; } - virtual const TargetRegisterInfo *getRegisterInfo() const { - return &InstrInfo.getRegisterInfo(); - } - virtual const DataLayout *getDataLayout() const { return &DataLayout; } - static unsigned getModuleMatchQuality(const Module &M); - - // Pass Pipeline Configuration - virtual bool addInstSelector(PassManagerBase &PM, bool Fast); - virtual bool addPreEmitPass(PassManagerBase &PM, bool Fast); -}; - -} // end namespace llvm -</pre> -</div> - -<ul> -<li><tt>getInstrInfo()</tt></li> -<li><tt>getRegisterInfo()</tt></li> -<li><tt>getFrameInfo()</tt></li> -<li><tt>getDataLayout()</tt></li> -<li><tt>getSubtargetImpl()</tt></li> -</ul> - -<p>For some targets, you also need to support the following methods:</p> - -<ul> -<li><tt>getTargetLowering()</tt></li> -<li><tt>getJITInfo()</tt></li> -</ul> - -<p> -In addition, the <tt>XXXTargetMachine</tt> constructor should specify a -<tt>TargetDescription</tt> string that determines the data layout for the target -machine, including characteristics such as pointer size, alignment, and -endianness. For example, the constructor for SparcTargetMachine contains the -following: -</p> - -<div class="doc_code"> -<pre> -SparcTargetMachine::SparcTargetMachine(const Module &M, const std::string &FS) - : DataLayout("E-p:32:32-f128:128:128"), - Subtarget(M, FS), InstrInfo(Subtarget), - FrameInfo(TargetFrameInfo::StackGrowsDown, 8, 0) { -} -</pre> -</div> - -<p>Hyphens separate portions of the <tt>TargetDescription</tt> string.</p> - -<ul> -<li>An upper-case "<tt>E</tt>" in the string indicates a big-endian target data - model. a lower-case "<tt>e</tt>" indicates little-endian.</li> - -<li>"<tt>p:</tt>" is followed by pointer information: size, ABI alignment, and - preferred alignment. If only two figures follow "<tt>p:</tt>", then the - first value is pointer size, and the second value is both ABI and preferred - alignment.</li> - -<li>Then a letter for numeric type alignment: "<tt>i</tt>", "<tt>f</tt>", - "<tt>v</tt>", or "<tt>a</tt>" (corresponding to integer, floating point, - vector, or aggregate). "<tt>i</tt>", "<tt>v</tt>", or "<tt>a</tt>" are - followed by ABI alignment and preferred alignment. "<tt>f</tt>" is followed - by three values: the first indicates the size of a long double, then ABI - alignment, and then ABI preferred alignment.</li> -</ul> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="TargetRegistration">Target Registration</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p> -You must also register your target with the <tt>TargetRegistry</tt>, which is -what other LLVM tools use to be able to lookup and use your target at -runtime. 
The <tt>TargetRegistry</tt> can be used directly, but for most targets -there are helper templates which should take care of the work for you.</p> - -<p> -All targets should declare a global <tt>Target</tt> object which is used to -represent the target during registration. Then, in the target's TargetInfo -library, the target should define that object and use -the <tt>RegisterTarget</tt> template to register the target. For example, the Sparc registration code looks like this: -</p> - -<div class="doc_code"> -<pre> -Target llvm::TheSparcTarget; - -extern "C" void LLVMInitializeSparcTargetInfo() { - RegisterTarget<Triple::sparc, /*HasJIT=*/false> - X(TheSparcTarget, "sparc", "Sparc"); -} -</pre> -</div> - -<p> -This allows the <tt>TargetRegistry</tt> to look up the target by name or by -target triple. In addition, most targets will also register additional features -which are available in separate libraries. These registration steps are -separate, because some clients may wish to only link in some parts of the target --- the JIT code generator does not require the use of the assembler printer, for -example. Here is an example of registering the Sparc assembly printer: -</p> - -<div class="doc_code"> -<pre> -extern "C" void LLVMInitializeSparcAsmPrinter() { - RegisterAsmPrinter<SparcAsmPrinter> X(TheSparcTarget); -} -</pre> -</div> - -<p> -For more information, see -"<a href="/doxygen/TargetRegistry_8h-source.html">llvm/Target/TargetRegistry.h</a>". -</p> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="RegisterSet">Register Set and Register Classes</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p> -You should describe a concrete target-specific class that represents the -register file of a target machine. This class is called <tt>XXXRegisterInfo</tt> -(where <tt>XXX</tt> identifies the target) and represents the class register -file data that is used for register allocation. It also describes the -interactions between registers. -</p> - -<p> -You also need to define register classes to categorize related registers. A -register class should be added for groups of registers that are all treated the -same way for some instruction. Typical examples are register classes for -integer, floating-point, or vector registers. A register allocator allows an -instruction to use any register in a specified register class to perform the -instruction in a similar manner. Register classes allocate virtual registers to -instructions from these sets, and register classes let the target-independent -register allocator automatically choose the actual registers. -</p> - -<p> -Much of the code for registers, including register definition, register aliases, -and register classes, is generated by TableGen from <tt>XXXRegisterInfo.td</tt> -input files and placed in <tt>XXXGenRegisterInfo.h.inc</tt> and -<tt>XXXGenRegisterInfo.inc</tt> output files. Some of the code in the -implementation of <tt>XXXRegisterInfo</tt> requires hand-coding. -</p> - -<!-- ======================================================================= --> -<h3> - <a name="RegisterDef">Defining a Register</a> -</h3> - -<div> - -<p> -The <tt>XXXRegisterInfo.td</tt> file typically starts with register definitions -for a target machine. The <tt>Register</tt> class (specified -in <tt>Target.td</tt>) is used to define an object for each register. The -specified string <tt>n</tt> becomes the <tt>Name</tt> of the register. 
The -basic <tt>Register</tt> object does not have any subregisters and does not -specify any aliases. -</p> - -<div class="doc_code"> -<pre> -class Register<string n> { - string Namespace = ""; - string AsmName = n; - string Name = n; - int SpillSize = 0; - int SpillAlignment = 0; - list<Register> Aliases = []; - list<Register> SubRegs = []; - list<int> DwarfNumbers = []; -} -</pre> -</div> - -<p> -For example, in the <tt>X86RegisterInfo.td</tt> file, there are register -definitions that utilize the Register class, such as: -</p> - -<div class="doc_code"> -<pre> -def AL : Register<"AL">, DwarfRegNum<[0, 0, 0]>; -</pre> -</div> - -<p> -This defines the register <tt>AL</tt> and assigns it values (with -<tt>DwarfRegNum</tt>) that are used by <tt>gcc</tt>, <tt>gdb</tt>, or a debug -information writer to identify a register. For register -<tt>AL</tt>, <tt>DwarfRegNum</tt> takes an array of 3 values representing 3 -different modes: the first element is for X86-64, the second for exception -handling (EH) on X86-32, and the third is generic. -1 is a special Dwarf number -that indicates the gcc number is undefined, and -2 indicates the register number -is invalid for this mode. -</p> - -<p> -From the previously described line in the <tt>X86RegisterInfo.td</tt> file, -TableGen generates this code in the <tt>X86GenRegisterInfo.inc</tt> file: -</p> - -<div class="doc_code"> -<pre> -static const unsigned GR8[] = { X86::AL, ... }; - -const unsigned AL_AliasSet[] = { X86::AX, X86::EAX, X86::RAX, 0 }; - -const TargetRegisterDesc RegisterDescriptors[] = { - ... -{ "AL", "AL", AL_AliasSet, Empty_SubRegsSet, Empty_SubRegsSet, AL_SuperRegsSet }, ... -</pre> -</div> - -<p> -From the register info file, TableGen generates a <tt>TargetRegisterDesc</tt> -object for each register. <tt>TargetRegisterDesc</tt> is defined in -<tt>include/llvm/Target/TargetRegisterInfo.h</tt> with the following fields: -</p> - -<div class="doc_code"> -<pre> -struct TargetRegisterDesc { - const char *AsmName; // Assembly language name for the register - const char *Name; // Printable name for the reg (for debugging) - const unsigned *AliasSet; // Register Alias Set - const unsigned *SubRegs; // Sub-register set - const unsigned *ImmSubRegs; // Immediate sub-register set - const unsigned *SuperRegs; // Super-register set -};</pre> -</div> - -<p> -TableGen uses the entire target description file (<tt>.td</tt>) to determine -text names for the register (in the <tt>AsmName</tt> and <tt>Name</tt> fields of -<tt>TargetRegisterDesc</tt>) and the relationships of other registers to the -defined register (in the other <tt>TargetRegisterDesc</tt> fields). In this -example, other definitions establish the registers "<tt>AX</tt>", -"<tt>EAX</tt>", and "<tt>RAX</tt>" as aliases for one another, so TableGen -generates a null-terminated array (<tt>AL_AliasSet</tt>) for this register alias -set. -</p> - -<p> -The <tt>Register</tt> class is commonly used as a base class for more complex -classes. 
In <tt>Target.td</tt>, the <tt>Register</tt> class is the base for the -<tt>RegisterWithSubRegs</tt> class that is used to define registers that need to -specify subregisters in the <tt>SubRegs</tt> list, as shown here: -</p> - -<div class="doc_code"> -<pre> -class RegisterWithSubRegs<string n, -list<Register> subregs> : Register<n> { - let SubRegs = subregs; -} -</pre> -</div> - -<p> -In <tt>SparcRegisterInfo.td</tt>, additional register classes are defined for -SPARC: a Register subclass, SparcReg, and further subclasses: <tt>Ri</tt>, -<tt>Rf</tt>, and <tt>Rd</tt>. SPARC registers are identified by 5-bit ID -numbers, which is a feature common to these subclasses. Note the use of -'<tt>let</tt>' expressions to override values that are initially defined in a -superclass (such as <tt>SubRegs</tt> field in the <tt>Rd</tt> class). -</p> - -<div class="doc_code"> -<pre> -class SparcReg<string n> : Register<n> { - field bits<5> Num; - let Namespace = "SP"; -} -// Ri - 32-bit integer registers -class Ri<bits<5> num, string n> : -SparcReg<n> { - let Num = num; -} -// Rf - 32-bit floating-point registers -class Rf<bits<5> num, string n> : -SparcReg<n> { - let Num = num; -} -// Rd - Slots in the FP register file for 64-bit -floating-point values. -class Rd<bits<5> num, string n, -list<Register> subregs> : SparcReg<n> { - let Num = num; - let SubRegs = subregs; -} -</pre> -</div> - -<p> -In the <tt>SparcRegisterInfo.td</tt> file, there are register definitions that -utilize these subclasses of <tt>Register</tt>, such as: -</p> - -<div class="doc_code"> -<pre> -def G0 : Ri< 0, "G0">, -DwarfRegNum<[0]>; -def G1 : Ri< 1, "G1">, DwarfRegNum<[1]>; -... -def F0 : Rf< 0, "F0">, -DwarfRegNum<[32]>; -def F1 : Rf< 1, "F1">, -DwarfRegNum<[33]>; -... -def D0 : Rd< 0, "F0", [F0, F1]>, -DwarfRegNum<[32]>; -def D1 : Rd< 2, "F2", [F2, F3]>, -DwarfRegNum<[34]>; -</pre> -</div> - -<p> -The last two registers shown above (<tt>D0</tt> and <tt>D1</tt>) are -double-precision floating-point registers that are aliases for pairs of -single-precision floating-point sub-registers. In addition to aliases, the -sub-register and super-register relationships of the defined register are in -fields of a register's TargetRegisterDesc. -</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="RegisterClassDef">Defining a Register Class</a> -</h3> - -<div> - -<p> -The <tt>RegisterClass</tt> class (specified in <tt>Target.td</tt>) is used to -define an object that represents a group of related registers and also defines -the default allocation order of the registers. 
A target description file -<tt>XXXRegisterInfo.td</tt> that uses <tt>Target.td</tt> can construct register -classes using the following class: -</p> - -<div class="doc_code"> -<pre> -class RegisterClass<string namespace, -list<ValueType> regTypes, int alignment, dag regList> { - string Namespace = namespace; - list<ValueType> RegTypes = regTypes; - int Size = 0; // spill size, in bits; zero lets tblgen pick the size - int Alignment = alignment; - - // CopyCost is the cost of copying a value between two registers - // default value 1 means a single instruction - // A negative value means copying is extremely expensive or impossible - int CopyCost = 1; - dag MemberList = regList; - - // for register classes that are subregisters of this class - list<RegisterClass> SubRegClassList = []; - - code MethodProtos = [{}]; // to insert arbitrary code - code MethodBodies = [{}]; -} -</pre> -</div> - -<p>To define a RegisterClass, use the following 4 arguments:</p> - -<ul> -<li>The first argument of the definition is the name of the namespace.</li> - -<li>The second argument is a list of <tt>ValueType</tt> register type values - that are defined in <tt>include/llvm/CodeGen/ValueTypes.td</tt>. Defined - values include integer types (such as <tt>i16</tt>, <tt>i32</tt>, - and <tt>i1</tt> for Boolean), floating-point types - (<tt>f32</tt>, <tt>f64</tt>), and vector types (for example, <tt>v8i16</tt> - for an <tt>8 x i16</tt> vector). All registers in a <tt>RegisterClass</tt> - must have the same <tt>ValueType</tt>, but some registers may store vector - data in different configurations. For example a register that can process a - 128-bit vector may be able to handle 16 8-bit integer elements, 8 16-bit - integers, 4 32-bit integers, and so on. </li> - -<li>The third argument of the <tt>RegisterClass</tt> definition specifies the - alignment required of the registers when they are stored or loaded to - memory.</li> - -<li>The final argument, <tt>regList</tt>, specifies which registers are in this - class. If an alternative allocation order method is not specified, then - <tt>regList</tt> also defines the order of allocation used by the register - allocator. Besides simply listing registers with <tt>(add R0, R1, ...)</tt>, - more advanced set operators are available. See - <tt>include/llvm/Target/Target.td</tt> for more information.</li> -</ul> - -<p> -In <tt>SparcRegisterInfo.td</tt>, three RegisterClass objects are defined: -<tt>FPRegs</tt>, <tt>DFPRegs</tt>, and <tt>IntRegs</tt>. For all three register -classes, the first argument defines the namespace with the string -'<tt>SP</tt>'. <tt>FPRegs</tt> defines a group of 32 single-precision -floating-point registers (<tt>F0</tt> to <tt>F31</tt>); <tt>DFPRegs</tt> defines -a group of 16 double-precision registers -(<tt>D0-D15</tt>). 
-</p> - -<div class="doc_code"> -<pre> -// F0, F1, F2, ..., F31 -def FPRegs : RegisterClass<"SP", [f32], 32, (sequence "F%u", 0, 31)>; - -def DFPRegs : RegisterClass<"SP", [f64], 64, - (add D0, D1, D2, D3, D4, D5, D6, D7, D8, - D9, D10, D11, D12, D13, D14, D15)>; - -def IntRegs : RegisterClass<"SP", [i32], 32, - (add L0, L1, L2, L3, L4, L5, L6, L7, - I0, I1, I2, I3, I4, I5, - O0, O1, O2, O3, O4, O5, O7, - G1, - // Non-allocatable regs: - G2, G3, G4, - O6, // stack ptr - I6, // frame ptr - I7, // return address - G0, // constant zero - G5, G6, G7 // reserved for kernel - )>; -</pre> -</div> - -<p> -Using <tt>SparcRegisterInfo.td</tt> with TableGen generates several output files -that are intended for inclusion in other source code that you write. -<tt>SparcRegisterInfo.td</tt> generates <tt>SparcGenRegisterInfo.h.inc</tt>, -which should be included in the header file for the implementation of the SPARC -register implementation that you write (<tt>SparcRegisterInfo.h</tt>). In -<tt>SparcGenRegisterInfo.h.inc</tt> a new structure is defined called -<tt>SparcGenRegisterInfo</tt> that uses <tt>TargetRegisterInfo</tt> as its -base. It also specifies types, based upon the defined register -classes: <tt>DFPRegsClass</tt>, <tt>FPRegsClass</tt>, and <tt>IntRegsClass</tt>. -</p> - -<p> -<tt>SparcRegisterInfo.td</tt> also generates <tt>SparcGenRegisterInfo.inc</tt>, -which is included at the bottom of <tt>SparcRegisterInfo.cpp</tt>, the SPARC -register implementation. The code below shows only the generated integer -registers and associated register classes. The order of registers -in <tt>IntRegs</tt> reflects the order in the definition of <tt>IntRegs</tt> in -the target description file. -</p> - -<div class="doc_code"> -<pre> // IntRegs Register Class... - static const unsigned IntRegs[] = { - SP::L0, SP::L1, SP::L2, SP::L3, SP::L4, SP::L5, - SP::L6, SP::L7, SP::I0, SP::I1, SP::I2, SP::I3, - SP::I4, SP::I5, SP::O0, SP::O1, SP::O2, SP::O3, - SP::O4, SP::O5, SP::O7, SP::G1, SP::G2, SP::G3, - SP::G4, SP::O6, SP::I6, SP::I7, SP::G0, SP::G5, - SP::G6, SP::G7, - }; - - // IntRegsVTs Register Class Value Types... - static const MVT::ValueType IntRegsVTs[] = { - MVT::i32, MVT::Other - }; - -namespace SP { // Register class instances - DFPRegsClass DFPRegsRegClass; - FPRegsClass FPRegsRegClass; - IntRegsClass IntRegsRegClass; -... - // IntRegs Sub-register Classess... - static const TargetRegisterClass* const IntRegsSubRegClasses [] = { - NULL - }; -... - // IntRegs Super-register Classess... - static const TargetRegisterClass* const IntRegsSuperRegClasses [] = { - NULL - }; -... - // IntRegs Register Class sub-classes... - static const TargetRegisterClass* const IntRegsSubclasses [] = { - NULL - }; -... - // IntRegs Register Class super-classes... - static const TargetRegisterClass* const IntRegsSuperclasses [] = { - NULL - }; - - IntRegsClass::IntRegsClass() : TargetRegisterClass(IntRegsRegClassID, - IntRegsVTs, IntRegsSubclasses, IntRegsSuperclasses, IntRegsSubRegClasses, - IntRegsSuperRegClasses, 4, 4, 1, IntRegs, IntRegs + 32) {} -} -</pre> -</div> - -<p> -The register allocators will avoid using reserved registers, and callee saved -registers are not used until all the volatile registers have been used. That -is usually good enough, but in some cases it may be necessary to provide custom -allocation orders. 
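-If some of those registers really must be kept away from the allocator, the
-reserved set is reported from the hand-written <tt>XXXRegisterInfo</tt> code
-rather than from TableGen. The following is a hedged sketch for a SPARC-like
-target; the <tt>BitVector</tt>-based interface and the particular registers
-marked are assumptions drawn from this era of <tt>TargetRegisterInfo</tt>, not
-a verbatim copy of the in-tree implementation:
-
-<div class="doc_code">
-<pre>
-#include "llvm/ADT/BitVector.h"
-#include "llvm/CodeGen/MachineFunction.h"
-
-// Sketch: tell the register allocator which registers it must never touch.
-BitVector SparcRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
-  BitVector Reserved(getNumRegs());
-  Reserved.set(SP::G0);   // hardwired zero
-  Reserved.set(SP::O6);   // stack pointer
-  Reserved.set(SP::I6);   // frame pointer
-  Reserved.set(SP::I7);   // return address
-  return Reserved;
-}
-</pre>
-</div>
-
-Reserved registers still appear in the register class definitions above; it is
-this hook that actually keeps the allocator away from them.
-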
-</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="implementRegister">Implement a subclass of</a> - <a href="CodeGenerator.html#targetregisterinfo">TargetRegisterInfo</a> -</h3> - -<div> - -<p> -The final step is to hand code portions of <tt>XXXRegisterInfo</tt>, which -implements the interface described in <tt>TargetRegisterInfo.h</tt>. These -functions return <tt>0</tt>, <tt>NULL</tt>, or <tt>false</tt>, unless -overridden. Here is a list of functions that are overridden for the SPARC -implementation in <tt>SparcRegisterInfo.cpp</tt>: -</p> - -<ul> -<li><tt>getCalleeSavedRegs</tt> — Returns a list of callee-saved registers - in the order of the desired callee-save stack frame offset.</li> - -<li><tt>getReservedRegs</tt> — Returns a bitset indexed by physical - register numbers, indicating if a particular register is unavailable.</li> - -<li><tt>hasFP</tt> — Return a Boolean indicating if a function should have - a dedicated frame pointer register.</li> - -<li><tt>eliminateCallFramePseudoInstr</tt> — If call frame setup or - destroy pseudo instructions are used, this can be called to eliminate - them.</li> - -<li><tt>eliminateFrameIndex</tt> — Eliminate abstract frame indices from - instructions that may use them.</li> - -<li><tt>emitPrologue</tt> — Insert prologue code into the function.</li> - -<li><tt>emitEpilogue</tt> — Insert epilogue code into the function.</li> -</ul> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="InstructionSet">Instruction Set</a> -</h2> - -<!-- *********************************************************************** --> -<div> - -<p> -During the early stages of code generation, the LLVM IR code is converted to a -<tt>SelectionDAG</tt> with nodes that are instances of the <tt>SDNode</tt> class -containing target instructions. An <tt>SDNode</tt> has an opcode, operands, type -requirements, and operation properties. For example, is an operation -commutative, does an operation load from memory. The various operation node -types are described in the <tt>include/llvm/CodeGen/SelectionDAGNodes.h</tt> -file (values of the <tt>NodeType</tt> enum in the <tt>ISD</tt> namespace). -</p> - -<p> -TableGen uses the following target description (<tt>.td</tt>) input files to -generate much of the code for instruction definition: -</p> - -<ul> -<li><tt>Target.td</tt> — Where the <tt>Instruction</tt>, <tt>Operand</tt>, - <tt>InstrInfo</tt>, and other fundamental classes are defined.</li> - -<li><tt>TargetSelectionDAG.td</tt>— Used by <tt>SelectionDAG</tt> - instruction selection generators, contains <tt>SDTC*</tt> classes (selection - DAG type constraint), definitions of <tt>SelectionDAG</tt> nodes (such as - <tt>imm</tt>, <tt>cond</tt>, <tt>bb</tt>, <tt>add</tt>, <tt>fadd</tt>, - <tt>sub</tt>), and pattern support (<tt>Pattern</tt>, <tt>Pat</tt>, - <tt>PatFrag</tt>, <tt>PatLeaf</tt>, <tt>ComplexPattern</tt>.</li> - -<li><tt>XXXInstrFormats.td</tt> — Patterns for definitions of - target-specific instructions.</li> - -<li><tt>XXXInstrInfo.td</tt> — Target-specific definitions of instruction - templates, condition codes, and instructions of an instruction set. For - architecture modifications, a different file name may be used. 
For example, - for Pentium with SSE instruction, this file is <tt>X86InstrSSE.td</tt>, and - for Pentium with MMX, this file is <tt>X86InstrMMX.td</tt>.</li> -</ul> - -<p> -There is also a target-specific <tt>XXX.td</tt> file, where <tt>XXX</tt> is the -name of the target. The <tt>XXX.td</tt> file includes the other <tt>.td</tt> -input files, but its contents are only directly important for subtargets. -</p> - -<p> -You should describe a concrete target-specific class <tt>XXXInstrInfo</tt> that -represents machine instructions supported by a target machine. -<tt>XXXInstrInfo</tt> contains an array of <tt>XXXInstrDescriptor</tt> objects, -each of which describes one instruction. An instruction descriptor defines:</p> - -<ul> -<li>Opcode mnemonic</li> - -<li>Number of operands</li> - -<li>List of implicit register definitions and uses</li> - -<li>Target-independent properties (such as memory access, is commutable)</li> - -<li>Target-specific flags </li> -</ul> - -<p> -The Instruction class (defined in <tt>Target.td</tt>) is mostly used as a base -for more complex instruction classes. -</p> - -<div class="doc_code"> -<pre>class Instruction { - string Namespace = ""; - dag OutOperandList; // An dag containing the MI def operand list. - dag InOperandList; // An dag containing the MI use operand list. - string AsmString = ""; // The .s format to print the instruction with. - list<dag> Pattern; // Set to the DAG pattern for this instruction - list<Register> Uses = []; - list<Register> Defs = []; - list<Predicate> Predicates = []; // predicates turned into isel match code - ... remainder not shown for space ... -} -</pre> -</div> - -<p> -A <tt>SelectionDAG</tt> node (<tt>SDNode</tt>) should contain an object -representing a target-specific instruction that is defined -in <tt>XXXInstrInfo.td</tt>. The instruction objects should represent -instructions from the architecture manual of the target machine (such as the -SPARC Architecture Manual for the SPARC target). -</p> - -<p> -A single instruction from the architecture manual is often modeled as multiple -target instructions, depending upon its operands. For example, a manual might -describe an add instruction that takes a register or an immediate operand. An -LLVM target could model this with two instructions named <tt>ADDri</tt> and -<tt>ADDrr</tt>. -</p> - -<p> -You should define a class for each instruction category and define each opcode -as a subclass of the category with appropriate parameters such as the fixed -binary encoding of opcodes and extended opcodes. You should map the register -bits to the bits of the instruction in which they are encoded (for the -JIT). Also you should specify how the instruction should be printed when the -automatic assembly printer is used. -</p> - -<p> -As is described in the SPARC Architecture Manual, Version 8, there are three -major 32-bit formats for instructions. Format 1 is only for the <tt>CALL</tt> -instruction. Format 2 is for branch on condition codes and <tt>SETHI</tt> (set -high bits of a register) instructions. Format 3 is for other instructions. -</p> - -<p> -Each of these formats has corresponding classes in <tt>SparcInstrFormat.td</tt>. -<tt>InstSP</tt> is a base class for other instruction classes. Additional base -classes are specified for more precise formats: for example -in <tt>SparcInstrFormat.td</tt>, <tt>F2_1</tt> is for <tt>SETHI</tt>, -and <tt>F2_2</tt> is for branches. 
There are three other base -classes: <tt>F3_1</tt> for register/register operations, <tt>F3_2</tt> for -register/immediate operations, and <tt>F3_3</tt> for floating-point -operations. <tt>SparcInstrInfo.td</tt> also adds the base class Pseudo for -synthetic SPARC instructions. -</p> - -<p> -<tt>SparcInstrInfo.td</tt> largely consists of operand and instruction -definitions for the SPARC target. In <tt>SparcInstrInfo.td</tt>, the following -target description file entry, <tt>LDrr</tt>, defines the Load Integer -instruction for a Word (the <tt>LD</tt> SPARC opcode) from a memory address to a -register. The first parameter, the value 3 (<tt>11<sub>2</sub></tt>), is the -operation value for this category of operation. The second parameter -(<tt>000000<sub>2</sub></tt>) is the specific operation value -for <tt>LD</tt>/Load Word. The third parameter is the output destination, which -is a register operand and defined in the <tt>Register</tt> target description -file (<tt>IntRegs</tt>). -</p> - -<div class="doc_code"> -<pre>def LDrr : F3_1 <3, 0b000000, (outs IntRegs:$dst), (ins MEMrr:$addr), - "ld [$addr], $dst", - [(set IntRegs:$dst, (load ADDRrr:$addr))]>; -</pre> -</div> - -<p> -The fourth parameter is the input source, which uses the address -operand <tt>MEMrr</tt> that is defined earlier in <tt>SparcInstrInfo.td</tt>: -</p> - -<div class="doc_code"> -<pre>def MEMrr : Operand<i32> { - let PrintMethod = "printMemOperand"; - let MIOperandInfo = (ops IntRegs, IntRegs); -} -</pre> -</div> - -<p> -The fifth parameter is a string that is used by the assembly printer and can be -left as an empty string until the assembly printer interface is implemented. The -sixth and final parameter is the pattern used to match the instruction during -the SelectionDAG Select Phase described in -(<a href="CodeGenerator.html">The LLVM -Target-Independent Code Generator</a>). This parameter is detailed in the next -section, <a href="#InstructionSelector">Instruction Selector</a>. -</p> - -<p> -Instruction class definitions are not overloaded for different operand types, so -separate versions of instructions are needed for register, memory, or immediate -value operands. For example, to perform a Load Integer instruction for a Word -from an immediate operand to a register, the following instruction class is -defined: -</p> - -<div class="doc_code"> -<pre>def LDri : F3_2 <3, 0b000000, (outs IntRegs:$dst), (ins MEMri:$addr), - "ld [$addr], $dst", - [(set IntRegs:$dst, (load ADDRri:$addr))]>; -</pre> -</div> - -<p> -Writing these definitions for so many similar instructions can involve a lot of -cut and paste. In td files, the <tt>multiclass</tt> directive enables the -creation of templates to define several instruction classes at once (using -the <tt>defm</tt> directive). 
For example in <tt>SparcInstrInfo.td</tt>, the -<tt>multiclass</tt> pattern <tt>F3_12</tt> is defined to create 2 instruction -classes each time <tt>F3_12</tt> is invoked: -</p> - -<div class="doc_code"> -<pre>multiclass F3_12 <string OpcStr, bits<6> Op3Val, SDNode OpNode> { - def rr : F3_1 <2, Op3Val, - (outs IntRegs:$dst), (ins IntRegs:$b, IntRegs:$c), - !strconcat(OpcStr, " $b, $c, $dst"), - [(set IntRegs:$dst, (OpNode IntRegs:$b, IntRegs:$c))]>; - def ri : F3_2 <2, Op3Val, - (outs IntRegs:$dst), (ins IntRegs:$b, i32imm:$c), - !strconcat(OpcStr, " $b, $c, $dst"), - [(set IntRegs:$dst, (OpNode IntRegs:$b, simm13:$c))]>; -} -</pre> -</div> - -<p> -So when the <tt>defm</tt> directive is used for the <tt>XOR</tt> -and <tt>ADD</tt> instructions, as seen below, it creates four instruction -objects: <tt>XORrr</tt>, <tt>XORri</tt>, <tt>ADDrr</tt>, and <tt>ADDri</tt>. -</p> - -<div class="doc_code"> -<pre> -defm XOR : F3_12<"xor", 0b000011, xor>; -defm ADD : F3_12<"add", 0b000000, add>; -</pre> -</div> - -<p> -<tt>SparcInstrInfo.td</tt> also includes definitions for condition codes that -are referenced by branch instructions. The following definitions -in <tt>SparcInstrInfo.td</tt> indicate the bit location of the SPARC condition -code. For example, the 10<sup>th</sup> bit represents the 'greater than' -condition for integers, and the 22<sup>nd</sup> bit represents the 'greater -than' condition for floats. -</p> - -<div class="doc_code"> -<pre> -def ICC_NE : ICC_VAL< 9>; // Not Equal -def ICC_E : ICC_VAL< 1>; // Equal -def ICC_G : ICC_VAL<10>; // Greater -... -def FCC_U : FCC_VAL<23>; // Unordered -def FCC_G : FCC_VAL<22>; // Greater -def FCC_UG : FCC_VAL<21>; // Unordered or Greater -... -</pre> -</div> - -<p> -(Note that <tt>Sparc.h</tt> also defines enums that correspond to the same SPARC -condition codes. Care must be taken to ensure the values in <tt>Sparc.h</tt> -correspond to the values in <tt>SparcInstrInfo.td</tt>. I.e., -<tt>SPCC::ICC_NE = 9</tt>, <tt>SPCC::FCC_U = 23</tt> and so on.) -</p> - -<!-- ======================================================================= --> -<h3> - <a name="operandMapping">Instruction Operand Mapping</a> -</h3> - -<div> - -<p> -The code generator backend maps instruction operands to fields in the -instruction. Operands are assigned to unbound fields in the instruction in the -order they are defined. Fields are bound when they are assigned a value. For -example, the Sparc target defines the <tt>XNORrr</tt> instruction as -a <tt>F3_1</tt> format instruction having three operands. -</p> - -<div class="doc_code"> -<pre> -def XNORrr : F3_1<2, 0b000111, - (outs IntRegs:$dst), (ins IntRegs:$b, IntRegs:$c), - "xnor $b, $c, $dst", - [(set IntRegs:$dst, (not (xor IntRegs:$b, IntRegs:$c)))]>; -</pre> -</div> - -<p> -The instruction templates in <tt>SparcInstrFormats.td</tt> show the base class -for <tt>F3_1</tt> is <tt>InstSP</tt>. 
-</p> - -<div class="doc_code"> -<pre> -class InstSP<dag outs, dag ins, string asmstr, list<dag> pattern> : Instruction { - field bits<32> Inst; - let Namespace = "SP"; - bits<2> op; - let Inst{31-30} = op; - dag OutOperandList = outs; - dag InOperandList = ins; - let AsmString = asmstr; - let Pattern = pattern; -} -</pre> -</div> - -<p><tt>InstSP</tt> leaves the <tt>op</tt> field unbound.</p> - -<div class="doc_code"> -<pre> -class F3<dag outs, dag ins, string asmstr, list<dag> pattern> - : InstSP<outs, ins, asmstr, pattern> { - bits<5> rd; - bits<6> op3; - bits<5> rs1; - let op{1} = 1; // Op = 2 or 3 - let Inst{29-25} = rd; - let Inst{24-19} = op3; - let Inst{18-14} = rs1; -} -</pre> -</div> - -<p> -<tt>F3</tt> binds the <tt>op</tt> field and defines the <tt>rd</tt>, -<tt>op3</tt>, and <tt>rs1</tt> fields. <tt>F3</tt> format instructions will -bind the operands <tt>rd</tt>, <tt>op3</tt>, and <tt>rs1</tt> fields. -</p> - -<div class="doc_code"> -<pre> -class F3_1<bits<2> opVal, bits<6> op3val, dag outs, dag ins, - string asmstr, list<dag> pattern> : F3<outs, ins, asmstr, pattern> { - bits<8> asi = 0; // asi not currently used - bits<5> rs2; - let op = opVal; - let op3 = op3val; - let Inst{13} = 0; // i field = 0 - let Inst{12-5} = asi; // address space identifier - let Inst{4-0} = rs2; -} -</pre> -</div> - -<p> -<tt>F3_1</tt> binds the <tt>op3</tt> field and defines the <tt>rs2</tt> -fields. <tt>F3_1</tt> format instructions will bind the operands to the <tt>rd</tt>, -<tt>rs1</tt>, and <tt>rs2</tt> fields. This results in the <tt>XNORrr</tt> -instruction binding <tt>$dst</tt>, <tt>$b</tt>, and <tt>$c</tt> operands to -the <tt>rd</tt>, <tt>rs1</tt>, and <tt>rs2</tt> fields respectively. -</p> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="relationMapping">Instruction Relation Mapping</a> -</h3> - -<div> - -<p> -This TableGen feature is used to relate instructions with each other. It is -particularly useful when you have multiple instruction formats and need to -switch between them after instruction selection. This entire feature is driven -by relation models which can be defined in <tt>XXXInstrInfo.td</tt> files -according to the target-specific instruction set. Relation models are defined -using <tt>InstrMapping</tt> class as a base. TableGen parses all the models -and generates instruction relation maps using the specified information. -Relation maps are emitted as tables in the <tt>XXXGenInstrInfo.inc</tt> file -along with the functions to query them. For the detailed information on how to -use this feature, please refer to -<a href="HowToUseInstrMappings.html">How to add Instruction Mappings</a> -document. -</p> -</div> - -<!-- ======================================================================= --> -<h3> - <a name="implementInstr">Implement a subclass of </a> - <a href="CodeGenerator.html#targetinstrinfo">TargetInstrInfo</a> -</h3> - -<div> - -<p> -The final step is to hand code portions of <tt>XXXInstrInfo</tt>, which -implements the interface described in <tt>TargetInstrInfo.h</tt>. These -functions return <tt>0</tt> or a Boolean or they assert, unless -overridden. 
Here's a list of functions that are overridden for the SPARC -implementation in <tt>SparcInstrInfo.cpp</tt>: -</p> - -<ul> -<li><tt>isLoadFromStackSlot</tt> — If the specified machine instruction is - a direct load from a stack slot, return the register number of the - destination and the <tt>FrameIndex</tt> of the stack slot.</li> - -<li><tt>isStoreToStackSlot</tt> — If the specified machine instruction is - a direct store to a stack slot, return the register number of the - destination and the <tt>FrameIndex</tt> of the stack slot.</li> - -<li><tt>copyPhysReg</tt> — Copy values between a pair of physical - registers.</li> - -<li><tt>storeRegToStackSlot</tt> — Store a register value to a stack - slot.</li> - -<li><tt>loadRegFromStackSlot</tt> — Load a register value from a stack - slot.</li> - -<li><tt>storeRegToAddr</tt> — Store a register value to memory.</li> - -<li><tt>loadRegFromAddr</tt> — Load a register value from memory.</li> - -<li><tt>foldMemoryOperand</tt> — Attempt to combine instructions of any - load or store instruction for the specified operand(s).</li> -</ul> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="branchFolding">Branch Folding and If Conversion</a> -</h3> -<div> - -<p> -Performance can be improved by combining instructions or by eliminating -instructions that are never reached. The <tt>AnalyzeBranch</tt> method -in <tt>XXXInstrInfo</tt> may be implemented to examine conditional instructions -and remove unnecessary instructions. <tt>AnalyzeBranch</tt> looks at the end of -a machine basic block (MBB) for opportunities for improvement, such as branch -folding and if conversion. The <tt>BranchFolder</tt> and <tt>IfConverter</tt> -machine function passes (see the source files <tt>BranchFolding.cpp</tt> and -<tt>IfConversion.cpp</tt> in the <tt>lib/CodeGen</tt> directory) call -<tt>AnalyzeBranch</tt> to improve the control flow graph that represents the -instructions. -</p> - -<p> -Several implementations of <tt>AnalyzeBranch</tt> (for ARM, Alpha, and X86) can -be examined as models for your own <tt>AnalyzeBranch</tt> implementation. Since -SPARC does not implement a useful <tt>AnalyzeBranch</tt>, the ARM target -implementation is shown below. -</p> - -<p><tt>AnalyzeBranch</tt> returns a Boolean value and takes four parameters:</p> - -<ul> -<li><tt>MachineBasicBlock &MBB</tt> — The incoming block to be - examined.</li> - -<li><tt>MachineBasicBlock *&TBB</tt> — A destination block that is - returned. For a conditional branch that evaluates to true, <tt>TBB</tt> is - the destination.</li> - -<li><tt>MachineBasicBlock *&FBB</tt> — For a conditional branch that - evaluates to false, <tt>FBB</tt> is returned as the destination.</li> - -<li><tt>std::vector<MachineOperand> &Cond</tt> — List of - operands to evaluate a condition for a conditional branch.</li> -</ul> - -<p> -In the simplest case, if a block ends without a branch, then it falls through to -the successor block. No destination blocks are specified for either <tt>TBB</tt> -or <tt>FBB</tt>, so both parameters return <tt>NULL</tt>. The start of -the <tt>AnalyzeBranch</tt> (see code below for the ARM target) shows the -function parameters and the code for the simplest case. 
-</p> - -<div class="doc_code"> -<pre>bool ARMInstrInfo::AnalyzeBranch(MachineBasicBlock &MBB, - MachineBasicBlock *&TBB, MachineBasicBlock *&FBB, - std::vector<MachineOperand> &Cond) const -{ - MachineBasicBlock::iterator I = MBB.end(); - if (I == MBB.begin() || !isUnpredicatedTerminator(--I)) - return false; -</pre> -</div> - -<p> -If a block ends with a single unconditional branch instruction, then -<tt>AnalyzeBranch</tt> (shown below) should return the destination of that -branch in the <tt>TBB</tt> parameter. -</p> - -<div class="doc_code"> -<pre> - if (LastOpc == ARM::B || LastOpc == ARM::tB) { - TBB = LastInst->getOperand(0).getMBB(); - return false; - } -</pre> -</div> - -<p> -If a block ends with two unconditional branches, then the second branch is never -reached. In that situation, as shown below, remove the last branch instruction -and return the penultimate branch in the <tt>TBB</tt> parameter. -</p> - -<div class="doc_code"> -<pre> - if ((SecondLastOpc == ARM::B || SecondLastOpc==ARM::tB) && - (LastOpc == ARM::B || LastOpc == ARM::tB)) { - TBB = SecondLastInst->getOperand(0).getMBB(); - I = LastInst; - I->eraseFromParent(); - return false; - } -</pre> -</div> - -<p> -A block may end with a single conditional branch instruction that falls through -to successor block if the condition evaluates to false. In that case, -<tt>AnalyzeBranch</tt> (shown below) should return the destination of that -conditional branch in the <tt>TBB</tt> parameter and a list of operands in -the <tt>Cond</tt> parameter to evaluate the condition. -</p> - -<div class="doc_code"> -<pre> - if (LastOpc == ARM::Bcc || LastOpc == ARM::tBcc) { - // Block ends with fall-through condbranch. - TBB = LastInst->getOperand(0).getMBB(); - Cond.push_back(LastInst->getOperand(1)); - Cond.push_back(LastInst->getOperand(2)); - return false; - } -</pre> -</div> - -<p> -If a block ends with both a conditional branch and an ensuing unconditional -branch, then <tt>AnalyzeBranch</tt> (shown below) should return the conditional -branch destination (assuming it corresponds to a conditional evaluation of -'<tt>true</tt>') in the <tt>TBB</tt> parameter and the unconditional branch -destination in the <tt>FBB</tt> (corresponding to a conditional evaluation of -'<tt>false</tt>'). A list of operands to evaluate the condition should be -returned in the <tt>Cond</tt> parameter. -</p> - -<div class="doc_code"> -<pre> - unsigned SecondLastOpc = SecondLastInst->getOpcode(); - - if ((SecondLastOpc == ARM::Bcc && LastOpc == ARM::B) || - (SecondLastOpc == ARM::tBcc && LastOpc == ARM::tB)) { - TBB = SecondLastInst->getOperand(0).getMBB(); - Cond.push_back(SecondLastInst->getOperand(1)); - Cond.push_back(SecondLastInst->getOperand(2)); - FBB = LastInst->getOperand(0).getMBB(); - return false; - } -</pre> -</div> - -<p> -For the last two cases (ending with a single conditional branch or ending with -one conditional and one unconditional branch), the operands returned in -the <tt>Cond</tt> parameter can be passed to methods of other instructions to -create new branches or perform other operations. An implementation -of <tt>AnalyzeBranch</tt> requires the helper methods <tt>RemoveBranch</tt> -and <tt>InsertBranch</tt> to manage subsequent operations. -</p> - -<p> -<tt>AnalyzeBranch</tt> should return false indicating success in most circumstances. -<tt>AnalyzeBranch</tt> should only return true when the method is stumped about what to -do, for example, if a block has three terminating branches. 
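-To see how the <tt>Cond</tt> operands produced by <tt>AnalyzeBranch</tt> are
-consumed, here is a hedged sketch of the companion <tt>InsertBranch</tt> hook
-for an ARM-like target. The opcode names follow the <tt>AnalyzeBranch</tt>
-example above, but the container type used for <tt>Cond</tt> and the exact
-<tt>BuildMI</tt> calls are assumptions that differ between releases:
-
-<div class="doc_code">
-<pre>
-// Returns the number of branch instructions inserted at the end of MBB.
-unsigned ARMInstrInfo::InsertBranch(MachineBasicBlock &MBB,
-                                    MachineBasicBlock *TBB,
-                                    MachineBasicBlock *FBB,
-                                    const SmallVectorImpl<MachineOperand> &Cond,
-                                    DebugLoc DL) const {
-  if (FBB == 0) {
-    if (Cond.empty())    // a single unconditional branch
-      BuildMI(&MBB, DL, get(ARM::B)).addMBB(TBB);
-    else                 // a single conditional branch; fall through otherwise
-      BuildMI(&MBB, DL, get(ARM::Bcc)).addMBB(TBB)
-        .addImm(Cond[0].getImm()).addReg(Cond[1].getReg());
-    return 1;
-  }
-  // A conditional branch to TBB followed by an unconditional branch to FBB.
-  BuildMI(&MBB, DL, get(ARM::Bcc)).addMBB(TBB)
-    .addImm(Cond[0].getImm()).addReg(Cond[1].getReg());
-  BuildMI(&MBB, DL, get(ARM::B)).addMBB(FBB);
-  return 2;
-}
-</pre>
-</div>
-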
<tt>AnalyzeBranch</tt> may -return true if it encounters a terminator it cannot handle, such as an indirect -branch. -</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="InstructionSelector">Instruction Selector</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p> -LLVM uses a <tt>SelectionDAG</tt> to represent LLVM IR instructions, and nodes -of the <tt>SelectionDAG</tt> ideally represent native target -instructions. During code generation, instruction selection passes are performed -to convert non-native DAG instructions into native target-specific -instructions. The pass described in <tt>XXXISelDAGToDAG.cpp</tt> is used to -match patterns and perform DAG-to-DAG instruction selection. Optionally, a pass -may be defined (in <tt>XXXBranchSelector.cpp</tt>) to perform similar DAG-to-DAG -operations for branch instructions. Later, the code in -<tt>XXXISelLowering.cpp</tt> replaces or removes operations and data types not -supported natively (legalizes) in a <tt>SelectionDAG</tt>. -</p> - -<p> -TableGen generates code for instruction selection using the following target -description input files: -</p> - -<ul> -<li><tt>XXXInstrInfo.td</tt> — Contains definitions of instructions in a - target-specific instruction set, generates <tt>XXXGenDAGISel.inc</tt>, which - is included in <tt>XXXISelDAGToDAG.cpp</tt>.</li> - -<li><tt>XXXCallingConv.td</tt> — Contains the calling and return value - conventions for the target architecture, and it generates - <tt>XXXGenCallingConv.inc</tt>, which is included in - <tt>XXXISelLowering.cpp</tt>.</li> -</ul> - -<p> -The implementation of an instruction selection pass must include a header that -declares the <tt>FunctionPass</tt> class or a subclass of <tt>FunctionPass</tt>. In -<tt>XXXTargetMachine.cpp</tt>, a Pass Manager (PM) should add each instruction -selection pass into the queue of passes to run. -</p> - -<p> -The LLVM static compiler (<tt>llc</tt>) is an excellent tool for visualizing the -contents of DAGs. To display the <tt>SelectionDAG</tt> before or after specific -processing phases, use the command line options for <tt>llc</tt>, described -at <a href="CodeGenerator.html#selectiondag_process"> -SelectionDAG Instruction Selection Process</a>. -</p> - -<p> -To describe instruction selector behavior, you should add patterns for lowering -LLVM code into a <tt>SelectionDAG</tt> as the last parameter of the instruction -definitions in <tt>XXXInstrInfo.td</tt>. For example, in -<tt>SparcInstrInfo.td</tt>, this entry defines a register store operation, and -the last parameter describes a pattern with the store DAG operator. -</p> - -<div class="doc_code"> -<pre> -def STrr : F3_1< 3, 0b000100, (outs), (ins MEMrr:$addr, IntRegs:$src), - "st $src, [$addr]", [(store IntRegs:$src, ADDRrr:$addr)]>; -</pre> -</div> - -<p> -<tt>ADDRrr</tt> is a memory mode that is also defined in -<tt>SparcInstrInfo.td</tt>: -</p> - -<div class="doc_code"> -<pre> -def ADDRrr : ComplexPattern<i32, 2, "SelectADDRrr", [], []>; -</pre> -</div> - -<p> -The definition of <tt>ADDRrr</tt> refers to <tt>SelectADDRrr</tt>, which is a -function defined in an implementation of the Instructor Selector (such -as <tt>SparcISelDAGToDAG.cpp</tt>). 
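-A much-simplified, hedged sketch of such a selector is shown here. The real
-<tt>SelectADDRrr</tt> also folds frame indices and add-with-immediate forms,
-and its exact signature varies between releases, so treat this only as an
-outline of the idea:
-
-<div class="doc_code">
-<pre>
-bool SparcDAGToDAGISel::SelectADDRrr(SDValue Addr, SDValue &R1, SDValue &R2) {
-  if (Addr.getOpcode() == ISD::ADD) {
-    // Split "add a, b" into the two register operands of the [rs1+rs2]
-    // addressing mode.
-    R1 = Addr.getOperand(0);
-    R2 = Addr.getOperand(1);
-    return true;
-  }
-  // Otherwise use the address itself as the base and %g0 (always zero)
-  // as the index register.
-  R1 = Addr;
-  R2 = CurDAG->getRegister(SP::G0, MVT::i32);
-  return true;
-}
-</pre>
-</div>
-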
-</p> - -<p> -In <tt>lib/Target/TargetSelectionDAG.td</tt>, the DAG operator for store is -defined below: -</p> - -<div class="doc_code"> -<pre> -def store : PatFrag<(ops node:$val, node:$ptr), - (st node:$val, node:$ptr), [{ - if (StoreSDNode *ST = dyn_cast<StoreSDNode>(N)) - return !ST->isTruncatingStore() && - ST->getAddressingMode() == ISD::UNINDEXED; - return false; -}]>; -</pre> -</div> - -<p> -<tt>XXXInstrInfo.td</tt> also generates (in <tt>XXXGenDAGISel.inc</tt>) the -<tt>SelectCode</tt> method that is used to call the appropriate processing -method for an instruction. In this example, <tt>SelectCode</tt> -calls <tt>Select_ISD_STORE</tt> for the <tt>ISD::STORE</tt> opcode. -</p> - -<div class="doc_code"> -<pre> -SDNode *SelectCode(SDValue N) { - ... - MVT::ValueType NVT = N.getNode()->getValueType(0); - switch (N.getOpcode()) { - case ISD::STORE: { - switch (NVT) { - default: - return Select_ISD_STORE(N); - break; - } - break; - } - ... -</pre> -</div> - -<p> -The pattern for <tt>STrr</tt> is matched, so elsewhere in -<tt>XXXGenDAGISel.inc</tt>, code for <tt>STrr</tt> is created for -<tt>Select_ISD_STORE</tt>. The <tt>Emit_22</tt> method is also generated -in <tt>XXXGenDAGISel.inc</tt> to complete the processing of this -instruction. -</p> - -<div class="doc_code"> -<pre> -SDNode *Select_ISD_STORE(const SDValue &N) { - SDValue Chain = N.getOperand(0); - if (Predicate_store(N.getNode())) { - SDValue N1 = N.getOperand(1); - SDValue N2 = N.getOperand(2); - SDValue CPTmp0; - SDValue CPTmp1; - - // Pattern: (st:void IntRegs:i32:$src, - // ADDRrr:i32:$addr)<<P:Predicate_store>> - // Emits: (STrr:void ADDRrr:i32:$addr, IntRegs:i32:$src) - // Pattern complexity = 13 cost = 1 size = 0 - if (SelectADDRrr(N, N2, CPTmp0, CPTmp1) && - N1.getNode()->getValueType(0) == MVT::i32 && - N2.getNode()->getValueType(0) == MVT::i32) { - return Emit_22(N, SP::STrr, CPTmp0, CPTmp1); - } -... -</pre> -</div> - -<!-- ======================================================================= --> -<h3> - <a name="LegalizePhase">The SelectionDAG Legalize Phase</a> -</h3> - -<div> - -<p> -The Legalize phase converts a DAG to use types and operations that are natively -supported by the target. For natively unsupported types and operations, you need -to add code to the target-specific XXXTargetLowering implementation to convert -unsupported types and operations to supported ones. -</p> - -<p> -In the constructor for the <tt>XXXTargetLowering</tt> class, first use the -<tt>addRegisterClass</tt> method to specify which types are supports and which -register classes are associated with them. The code for the register classes are -generated by TableGen from <tt>XXXRegisterInfo.td</tt> and placed -in <tt>XXXGenRegisterInfo.h.inc</tt>. For example, the implementation of the -constructor for the SparcTargetLowering class (in -<tt>SparcISelLowering.cpp</tt>) starts with the following code: -</p> - -<div class="doc_code"> -<pre> -addRegisterClass(MVT::i32, SP::IntRegsRegisterClass); -addRegisterClass(MVT::f32, SP::FPRegsRegisterClass); -addRegisterClass(MVT::f64, SP::DFPRegsRegisterClass); -</pre> -</div> - -<p> -You should examine the node types in the <tt>ISD</tt> namespace -(<tt>include/llvm/CodeGen/SelectionDAGNodes.h</tt>) and determine which -operations the target natively supports. For operations that do <b>not</b> have -native support, add a callback to the constructor for the XXXTargetLowering -class, so the instruction selection process knows what to do. 
The TargetLowering -class callback methods (declared in <tt>llvm/Target/TargetLowering.h</tt>) are: -</p> - -<ul> -<li><tt>setOperationAction</tt> — General operation.</li> - -<li><tt>setLoadExtAction</tt> — Load with extension.</li> - -<li><tt>setTruncStoreAction</tt> — Truncating store.</li> - -<li><tt>setIndexedLoadAction</tt> — Indexed load.</li> - -<li><tt>setIndexedStoreAction</tt> — Indexed store.</li> - -<li><tt>setConvertAction</tt> — Type conversion.</li> - -<li><tt>setCondCodeAction</tt> — Support for a given condition code.</li> -</ul> - -<p> -Note: on older releases, <tt>setLoadXAction</tt> is used instead -of <tt>setLoadExtAction</tt>. Also, on older releases, -<tt>setCondCodeAction</tt> may not be supported. Examine your release -to see what methods are specifically supported. -</p> - -<p> -These callbacks are used to determine that an operation does or does not work -with a specified type (or types). And in all cases, the third parameter is -a <tt>LegalAction</tt> type enum value: <tt>Promote</tt>, <tt>Expand</tt>, -<tt>Custom</tt>, or <tt>Legal</tt>. <tt>SparcISelLowering.cpp</tt> -contains examples of all four <tt>LegalAction</tt> values. -</p> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="promote">Promote</a> -</h4> - -<div> - -<p> -For an operation without native support for a given type, the specified type may -be promoted to a larger type that is supported. For example, SPARC does not -support a sign-extending load for Boolean values (<tt>i1</tt> type), so -in <tt>SparcISelLowering.cpp</tt> the third parameter below, <tt>Promote</tt>, -changes <tt>i1</tt> type values to a large type before loading. -</p> - -<div class="doc_code"> -<pre> -setLoadExtAction(ISD::SEXTLOAD, MVT::i1, Promote); -</pre> -</div> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="expand">Expand</a> -</h4> - -<div> - -<p> -For a type without native support, a value may need to be broken down further, -rather than promoted. For an operation without native support, a combination of -other operations may be used to similar effect. In SPARC, the floating-point -sine and cosine trig operations are supported by expansion to other operations, -as indicated by the third parameter, <tt>Expand</tt>, to -<tt>setOperationAction</tt>: -</p> - -<div class="doc_code"> -<pre> -setOperationAction(ISD::FSIN, MVT::f32, Expand); -setOperationAction(ISD::FCOS, MVT::f32, Expand); -</pre> -</div> - -</div> - -<!-- _______________________________________________________________________ --> -<h4> - <a name="custom">Custom</a> -</h4> - -<div> - -<p> -For some operations, simple type promotion or operation expansion may be -insufficient. In some cases, a special intrinsic function must be implemented. -</p> - -<p> -For example, a constant value may require special treatment, or an operation may -require spilling and restoring registers in the stack and working with register -allocators. -</p> - -<p> -As seen in <tt>SparcISelLowering.cpp</tt> code below, to perform a type -conversion from a floating point value to a signed integer, first the -<tt>setOperationAction</tt> should be called with <tt>Custom</tt> as the third -parameter: -</p> - -<div class="doc_code"> -<pre> -setOperationAction(ISD::FP_TO_SINT, MVT::i32, Custom); -</pre> -</div> - -<p> -In the <tt>LowerOperation</tt> method, for each <tt>Custom</tt> operation, a -case statement should be added to indicate what function to call. 
In the
-following code, an <tt>FP_TO_SINT</tt> opcode will call
-the <tt>LowerFP_TO_SINT</tt> method:
-</p>
-
-<div class="doc_code">
-<pre>
-SDValue SparcTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) {
-  switch (Op.getOpcode()) {
-  case ISD::FP_TO_SINT: return LowerFP_TO_SINT(Op, DAG);
-  ...
-  }
-}
-</pre>
-</div>
-
-<p>
-Finally, the <tt>LowerFP_TO_SINT</tt> method is implemented, using an FP
-register to convert the floating-point value to an integer.
-</p>
-
-<div class="doc_code">
-<pre>
-static SDValue LowerFP_TO_SINT(SDValue Op, SelectionDAG &DAG) {
-  assert(Op.getValueType() == MVT::i32);
-  Op = DAG.getNode(SPISD::FTOI, MVT::f32, Op.getOperand(0));
-  return DAG.getNode(ISD::BITCAST, MVT::i32, Op);
-}
-</pre>
-</div>
-
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4>
-  <a name="legal">Legal</a>
-</h4>
-
-<div>
-
-<p>
-The <tt>Legal</tt> LegalizeAction enum value simply indicates that an
-operation <b>is</b> natively supported. <tt>Legal</tt> represents the default
-condition, so it is rarely used. In <tt>SparcISelLowering.cpp</tt>,
-<tt>CTPOP</tt> (an operation that counts the bits set in an integer) is
-natively supported only on SPARC V9. The following code enables
-the <tt>Expand</tt> conversion technique for non-v9 SPARC implementations.
-</p>
-
-<div class="doc_code">
-<pre>
-setOperationAction(ISD::CTPOP, MVT::i32, Expand);
-...
-if (TM.getSubtarget<SparcSubtarget>().isV9())
-  setOperationAction(ISD::CTPOP, MVT::i32, Legal);
-</pre>
-</div>
-
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="callingConventions">Calling Conventions</a>
-</h3>
-
-<div>
-
-<p>
-To support target-specific calling conventions, <tt>XXXGenCallingConv.td</tt>
-uses interfaces (such as CCIfType and CCAssignToReg) that are defined in
-<tt>lib/Target/TargetCallingConv.td</tt>. TableGen can take the target
-descriptor file <tt>XXXGenCallingConv.td</tt> and generate the header
-file <tt>XXXGenCallingConv.inc</tt>, which is typically included
-in <tt>XXXISelLowering.cpp</tt>. You can use the interfaces in
-<tt>TargetCallingConv.td</tt> to specify:
-</p>
-
-<ul>
-<li>The order of parameter allocation.</li>
-
-<li>Where parameters and return values are placed (that is, on the stack or in
-    registers).</li>
-
-<li>Which registers may be used.</li>
-
-<li>Whether the caller or callee unwinds the stack.</li>
-</ul>
-
-<p>
-The following example demonstrates the use of the <tt>CCIfType</tt> and
-<tt>CCAssignToReg</tt> interfaces. If the <tt>CCIfType</tt> predicate is true
-(that is, if the current argument is of type <tt>f32</tt> or <tt>f64</tt>), then
-the action is performed. In this case, the <tt>CCAssignToReg</tt> action assigns
-the argument value to the first available register: either <tt>R0</tt>
-or <tt>R1</tt>.
-</p>
-
-<div class="doc_code">
-<pre>
-CCIfType<[f32,f64], CCAssignToReg<[R0, R1]>>
-</pre>
-</div>
-
-<p>
-<tt>SparcCallingConv.td</tt> contains definitions for a target-specific
-return-value calling convention (RetCC_Sparc32) and a basic 32-bit C calling
-convention (<tt>CC_Sparc32</tt>). The definition of <tt>RetCC_Sparc32</tt>
-(shown below) indicates which registers are used for specified scalar return
-types.
A single-precision float is returned to register <tt>F0</tt>, and a -double-precision float goes to register <tt>D0</tt>. A 32-bit integer is -returned in register <tt>I0</tt> or <tt>I1</tt>. -</p> - -<div class="doc_code"> -<pre> -def RetCC_Sparc32 : CallingConv<[ - CCIfType<[i32], CCAssignToReg<[I0, I1]>>, - CCIfType<[f32], CCAssignToReg<[F0]>>, - CCIfType<[f64], CCAssignToReg<[D0]>> -]>; -</pre> -</div> - -<p> -The definition of <tt>CC_Sparc32</tt> in <tt>SparcCallingConv.td</tt> introduces -<tt>CCAssignToStack</tt>, which assigns the value to a stack slot with the -specified size and alignment. In the example below, the first parameter, 4, -indicates the size of the slot, and the second parameter, also 4, indicates the -stack alignment along 4-byte units. (Special cases: if size is zero, then the -ABI size is used; if alignment is zero, then the ABI alignment is used.) -</p> - -<div class="doc_code"> -<pre> -def CC_Sparc32 : CallingConv<[ - // All arguments get passed in integer registers if there is space. - CCIfType<[i32, f32, f64], CCAssignToReg<[I0, I1, I2, I3, I4, I5]>>, - CCAssignToStack<4, 4> -]>; -</pre> -</div> - -<p> -<tt>CCDelegateTo</tt> is another commonly used interface, which tries to find a -specified sub-calling convention, and, if a match is found, it is invoked. In -the following example (in <tt>X86CallingConv.td</tt>), the definition of -<tt>RetCC_X86_32_C</tt> ends with <tt>CCDelegateTo</tt>. After the current value -is assigned to the register <tt>ST0</tt> or <tt>ST1</tt>, -the <tt>RetCC_X86Common</tt> is invoked. -</p> - -<div class="doc_code"> -<pre> -def RetCC_X86_32_C : CallingConv<[ - CCIfType<[f32], CCAssignToReg<[ST0, ST1]>>, - CCIfType<[f64], CCAssignToReg<[ST0, ST1]>>, - CCDelegateTo<RetCC_X86Common> -]>; -</pre> -</div> - -<p> -<tt>CCIfCC</tt> is an interface that attempts to match the given name to the -current calling convention. If the name identifies the current calling -convention, then a specified action is invoked. In the following example (in -<tt>X86CallingConv.td</tt>), if the <tt>Fast</tt> calling convention is in use, -then <tt>RetCC_X86_32_Fast</tt> is invoked. If the <tt>SSECall</tt> calling -convention is in use, then <tt>RetCC_X86_32_SSE</tt> is invoked. 
-</p> - -<div class="doc_code"> -<pre> -def RetCC_X86_32 : CallingConv<[ - CCIfCC<"CallingConv::Fast", CCDelegateTo<RetCC_X86_32_Fast>>, - CCIfCC<"CallingConv::X86_SSECall", CCDelegateTo<RetCC_X86_32_SSE>>, - CCDelegateTo<RetCC_X86_32_C> -]>; -</pre> -</div> - -<p>Other calling convention interfaces include:</p> - -<ul> -<li><tt>CCIf <predicate, action></tt> — If the predicate matches, - apply the action.</li> - -<li><tt>CCIfInReg <action></tt> — If the argument is marked with the - '<tt>inreg</tt>' attribute, then apply the action.</li> - -<li><tt>CCIfNest <action></tt> — Inf the argument is marked with the - '<tt>nest</tt>' attribute, then apply the action.</li> - -<li><tt>CCIfNotVarArg <action></tt> — If the current function does - not take a variable number of arguments, apply the action.</li> - -<li><tt>CCAssignToRegWithShadow <registerList, shadowList></tt> — - similar to <tt>CCAssignToReg</tt>, but with a shadow list of registers.</li> - -<li><tt>CCPassByVal <size, align></tt> — Assign value to a stack - slot with the minimum specified size and alignment.</li> - -<li><tt>CCPromoteToType <type></tt> — Promote the current value to - the specified type.</li> - -<li><tt>CallingConv <[actions]></tt> — Define each calling - convention that is supported.</li> -</ul> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="assemblyPrinter">Assembly Printer</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p> -During the code emission stage, the code generator may utilize an LLVM pass to -produce assembly output. To do this, you want to implement the code for a -printer that converts LLVM IR to a GAS-format assembly language for your target -machine, using the following steps: -</p> - -<ul> -<li>Define all the assembly strings for your target, adding them to the - instructions defined in the <tt>XXXInstrInfo.td</tt> file. - (See <a href="#InstructionSet">Instruction Set</a>.) TableGen will produce - an output file (<tt>XXXGenAsmWriter.inc</tt>) with an implementation of - the <tt>printInstruction</tt> method for the XXXAsmPrinter class.</li> - -<li>Write <tt>XXXTargetAsmInfo.h</tt>, which contains the bare-bones declaration - of the <tt>XXXTargetAsmInfo</tt> class (a subclass - of <tt>TargetAsmInfo</tt>).</li> - -<li>Write <tt>XXXTargetAsmInfo.cpp</tt>, which contains target-specific values - for <tt>TargetAsmInfo</tt> properties and sometimes new implementations for - methods.</li> - -<li>Write <tt>XXXAsmPrinter.cpp</tt>, which implements the <tt>AsmPrinter</tt> - class that performs the LLVM-to-assembly conversion.</li> -</ul> - -<p> -The code in <tt>XXXTargetAsmInfo.h</tt> is usually a trivial declaration of the -<tt>XXXTargetAsmInfo</tt> class for use in <tt>XXXTargetAsmInfo.cpp</tt>. -Similarly, <tt>XXXTargetAsmInfo.cpp</tt> usually has a few declarations of -<tt>XXXTargetAsmInfo</tt> replacement values that override the default values -in <tt>TargetAsmInfo.cpp</tt>. For example in <tt>SparcTargetAsmInfo.cpp</tt>: -</p> - -<div class="doc_code"> -<pre> -SparcTargetAsmInfo::SparcTargetAsmInfo(const SparcTargetMachine &TM) { - Data16bitsDirective = "\t.half\t"; - Data32bitsDirective = "\t.word\t"; - Data64bitsDirective = 0; // .xword is only supported by V9. 
- ZeroDirective = "\t.skip\t"; - CommentString = "!"; - ConstantPoolSection = "\t.section \".rodata\",#alloc\n"; -} -</pre> -</div> - -<p> -The X86 assembly printer implementation (<tt>X86TargetAsmInfo</tt>) is an -example where the target specific <tt>TargetAsmInfo</tt> class uses an -overridden methods: <tt>ExpandInlineAsm</tt>. -</p> - -<p> -A target-specific implementation of AsmPrinter is written in -<tt>XXXAsmPrinter.cpp</tt>, which implements the <tt>AsmPrinter</tt> class that -converts the LLVM to printable assembly. The implementation must include the -following headers that have declarations for the <tt>AsmPrinter</tt> and -<tt>MachineFunctionPass</tt> classes. The <tt>MachineFunctionPass</tt> is a -subclass of <tt>FunctionPass</tt>. -</p> - -<div class="doc_code"> -<pre> -#include "llvm/CodeGen/AsmPrinter.h" -#include "llvm/CodeGen/MachineFunctionPass.h" -</pre> -</div> - -<p> -As a <tt>FunctionPass</tt>, <tt>AsmPrinter</tt> first -calls <tt>doInitialization</tt> to set up the <tt>AsmPrinter</tt>. In -<tt>SparcAsmPrinter</tt>, a <tt>Mangler</tt> object is instantiated to process -variable names. -</p> - -<p> -In <tt>XXXAsmPrinter.cpp</tt>, the <tt>runOnMachineFunction</tt> method -(declared in <tt>MachineFunctionPass</tt>) must be implemented -for <tt>XXXAsmPrinter</tt>. In <tt>MachineFunctionPass</tt>, -the <tt>runOnFunction</tt> method invokes <tt>runOnMachineFunction</tt>. -Target-specific implementations of <tt>runOnMachineFunction</tt> differ, but -generally do the following to process each machine function: -</p> - -<ul> -<li>Call <tt>SetupMachineFunction</tt> to perform initialization.</li> - -<li>Call <tt>EmitConstantPool</tt> to print out (to the output stream) constants - which have been spilled to memory.</li> - -<li>Call <tt>EmitJumpTableInfo</tt> to print out jump tables used by the current - function.</li> - -<li>Print out the label for the current function.</li> - -<li>Print out the code for the function, including basic block labels and the - assembly for the instruction (using <tt>printInstruction</tt>)</li> -</ul> - -<p> -The <tt>XXXAsmPrinter</tt> implementation must also include the code generated -by TableGen that is output in the <tt>XXXGenAsmWriter.inc</tt> file. The code -in <tt>XXXGenAsmWriter.inc</tt> contains an implementation of the -<tt>printInstruction</tt> method that may call these methods: -</p> - -<ul> -<li><tt>printOperand</tt></li> - -<li><tt>printMemOperand</tt></li> - -<li><tt>printCCOperand (for conditional statements)</tt></li> - -<li><tt>printDataDirective</tt></li> - -<li><tt>printDeclare</tt></li> - -<li><tt>printImplicitDef</tt></li> - -<li><tt>printInlineAsm</tt></li> -</ul> - -<p> -The implementations of <tt>printDeclare</tt>, <tt>printImplicitDef</tt>, -<tt>printInlineAsm</tt>, and <tt>printLabel</tt> in <tt>AsmPrinter.cpp</tt> are -generally adequate for printing assembly and do not need to be -overridden. -</p> - -<p> -The <tt>printOperand</tt> method is implemented with a long switch/case -statement for the type of operand: register, immediate, basic block, external -symbol, global address, constant pool index, or jump table index. For an -instruction with a memory address operand, the <tt>printMemOperand</tt> method -should be implemented to generate the proper output. Similarly, -<tt>printCCOperand</tt> should be used to print a conditional operand. -</p> - -<p><tt>doFinalization</tt> should be overridden in <tt>XXXAsmPrinter</tt>, and -it should be called to shut down the assembly printer. 
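-A hedged sketch of such an override is shown below; the available base-class
-hooks differ between releases, so treat the call into
-<tt>AsmPrinter::doFinalization</tt> as the general pattern rather than the
-precise upstream code:
-
-<div class="doc_code">
-<pre>
-bool SparcAsmPrinter::doFinalization(Module &M) {
-  // Emit anything this target handles specially (remaining global
-  // variables, target-specific directives) before shutting down, then
-  // defer to the generic AsmPrinter to finish emission.
-  return AsmPrinter::doFinalization(M);
-}
-</pre>
-</div>
-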
During -<tt>doFinalization</tt>, global variables and constants are printed to -output. -</p> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="subtargetSupport">Subtarget Support</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p> -Subtarget support is used to inform the code generation process of instruction -set variations for a given chip set. For example, the LLVM SPARC implementation -provided covers three major versions of the SPARC microprocessor architecture: -Version 8 (V8, which is a 32-bit architecture), Version 9 (V9, a 64-bit -architecture), and the UltraSPARC architecture. V8 has 16 double-precision -floating-point registers that are also usable as either 32 single-precision or 8 -quad-precision registers. V8 is also purely big-endian. V9 has 32 -double-precision floating-point registers that are also usable as 16 -quad-precision registers, but cannot be used as single-precision registers. The -UltraSPARC architecture combines V9 with UltraSPARC Visual Instruction Set -extensions. -</p> - -<p> -If subtarget support is needed, you should implement a target-specific -XXXSubtarget class for your architecture. This class should process the -command-line options <tt>-mcpu=</tt> and <tt>-mattr=</tt>. -</p> - -<p> -TableGen uses definitions in the <tt>Target.td</tt> and <tt>Sparc.td</tt> files -to generate code in <tt>SparcGenSubtarget.inc</tt>. In <tt>Target.td</tt>, shown -below, the <tt>SubtargetFeature</tt> interface is defined. The first 4 string -parameters of the <tt>SubtargetFeature</tt> interface are a feature name, an -attribute set by the feature, the value of the attribute, and a description of -the feature. (The fifth parameter is a list of features whose presence is -implied, and its default value is an empty array.) -</p> - -<div class="doc_code"> -<pre> -class SubtargetFeature<string n, string a, string v, string d, - list<SubtargetFeature> i = []> { - string Name = n; - string Attribute = a; - string Value = v; - string Desc = d; - list<SubtargetFeature> Implies = i; -} -</pre> -</div> - -<p> -In the <tt>Sparc.td</tt> file, the SubtargetFeature is used to define the -following features. -</p> - -<div class="doc_code"> -<pre> -def FeatureV9 : SubtargetFeature<"v9", "IsV9", "true", - "Enable SPARC-V9 instructions">; -def FeatureV8Deprecated : SubtargetFeature<"deprecated-v8", - "V8DeprecatedInsts", "true", - "Enable deprecated V8 instructions in V9 mode">; -def FeatureVIS : SubtargetFeature<"vis", "IsVIS", "true", - "Enable UltraSPARC Visual Instruction Set extensions">; -</pre> -</div> - -<p> -Elsewhere in <tt>Sparc.td</tt>, the Proc class is defined and then is used to -define particular SPARC processor subtypes that may have the previously -described features. 
-</p> - -<div class="doc_code"> -<pre> -class Proc<string Name, list<SubtargetFeature> Features> - : Processor<Name, NoItineraries, Features>; - -def : Proc<"generic", []>; -def : Proc<"v8", []>; -def : Proc<"supersparc", []>; -def : Proc<"sparclite", []>; -def : Proc<"f934", []>; -def : Proc<"hypersparc", []>; -def : Proc<"sparclite86x", []>; -def : Proc<"sparclet", []>; -def : Proc<"tsc701", []>; -def : Proc<"v9", [FeatureV9]>; -def : Proc<"ultrasparc", [FeatureV9, FeatureV8Deprecated]>; -def : Proc<"ultrasparc3", [FeatureV9, FeatureV8Deprecated]>; -def : Proc<"ultrasparc3-vis", [FeatureV9, FeatureV8Deprecated, FeatureVIS]>; -</pre> -</div> - -<p> -From <tt>Target.td</tt> and <tt>Sparc.td</tt> files, the resulting -SparcGenSubtarget.inc specifies enum values to identify the features, arrays of -constants to represent the CPU features and CPU subtypes, and the -ParseSubtargetFeatures method that parses the features string that sets -specified subtarget options. The generated <tt>SparcGenSubtarget.inc</tt> file -should be included in the <tt>SparcSubtarget.cpp</tt>. The target-specific -implementation of the XXXSubtarget method should follow this pseudocode: -</p> - -<div class="doc_code"> -<pre> -XXXSubtarget::XXXSubtarget(const Module &M, const std::string &FS) { - // Set the default features - // Determine default and user specified characteristics of the CPU - // Call ParseSubtargetFeatures(FS, CPU) to parse the features string - // Perform any additional operations -} -</pre> -</div> - -</div> - -<!-- *********************************************************************** --> -<h2> - <a name="jitSupport">JIT Support</a> -</h2> -<!-- *********************************************************************** --> - -<div> - -<p> -The implementation of a target machine optionally includes a Just-In-Time (JIT) -code generator that emits machine code and auxiliary structures as binary output -that can be written directly to memory. To do this, implement JIT code -generation by performing the following steps: -</p> - -<ul> -<li>Write an <tt>XXXCodeEmitter.cpp</tt> file that contains a machine function - pass that transforms target-machine instructions into relocatable machine - code.</li> - -<li>Write an <tt>XXXJITInfo.cpp</tt> file that implements the JIT interfaces for - target-specific code-generation activities, such as emitting machine code - and stubs.</li> - -<li>Modify <tt>XXXTargetMachine</tt> so that it provides a - <tt>TargetJITInfo</tt> object through its <tt>getJITInfo</tt> method.</li> -</ul> - -<p> -There are several different approaches to writing the JIT support code. For -instance, TableGen and target descriptor files may be used for creating a JIT -code generator, but are not mandatory. For the Alpha and PowerPC target -machines, TableGen is used to generate <tt>XXXGenCodeEmitter.inc</tt>, which -contains the binary coding of machine instructions and the -<tt>getBinaryCodeForInstr</tt> method to access those codes. Other JIT -implementations do not. -</p> - -<p> -Both <tt>XXXJITInfo.cpp</tt> and <tt>XXXCodeEmitter.cpp</tt> must include the -<tt>llvm/CodeGen/MachineCodeEmitter.h</tt> header file that defines the -<tt>MachineCodeEmitter</tt> class containing code for several callback functions -that write data (in bytes, words, strings, etc.) to the output stream. 
-</p> - -<!-- ======================================================================= --> -<h3> - <a name="mce">Machine Code Emitter</a> -</h3> - -<div> - -<p> -In <tt>XXXCodeEmitter.cpp</tt>, a target-specific of the <tt>Emitter</tt> class -is implemented as a function pass (subclass -of <tt>MachineFunctionPass</tt>). The target-specific implementation -of <tt>runOnMachineFunction</tt> (invoked by -<tt>runOnFunction</tt> in <tt>MachineFunctionPass</tt>) iterates through the -<tt>MachineBasicBlock</tt> calls <tt>emitInstruction</tt> to process each -instruction and emit binary code. <tt>emitInstruction</tt> is largely -implemented with case statements on the instruction types defined in -<tt>XXXInstrInfo.h</tt>. For example, in <tt>X86CodeEmitter.cpp</tt>, -the <tt>emitInstruction</tt> method is built around the following switch/case -statements: -</p> - -<div class="doc_code"> -<pre> -switch (Desc->TSFlags & X86::FormMask) { -case X86II::Pseudo: // for not yet implemented instructions - ... // or pseudo-instructions - break; -case X86II::RawFrm: // for instructions with a fixed opcode value - ... - break; -case X86II::AddRegFrm: // for instructions that have one register operand - ... // added to their opcode - break; -case X86II::MRMDestReg:// for instructions that use the Mod/RM byte - ... // to specify a destination (register) - break; -case X86II::MRMDestMem:// for instructions that use the Mod/RM byte - ... // to specify a destination (memory) - break; -case X86II::MRMSrcReg: // for instructions that use the Mod/RM byte - ... // to specify a source (register) - break; -case X86II::MRMSrcMem: // for instructions that use the Mod/RM byte - ... // to specify a source (memory) - break; -case X86II::MRM0r: case X86II::MRM1r: // for instructions that operate on -case X86II::MRM2r: case X86II::MRM3r: // a REGISTER r/m operand and -case X86II::MRM4r: case X86II::MRM5r: // use the Mod/RM byte and a field -case X86II::MRM6r: case X86II::MRM7r: // to hold extended opcode data - ... - break; -case X86II::MRM0m: case X86II::MRM1m: // for instructions that operate on -case X86II::MRM2m: case X86II::MRM3m: // a MEMORY r/m operand and -case X86II::MRM4m: case X86II::MRM5m: // use the Mod/RM byte and a field -case X86II::MRM6m: case X86II::MRM7m: // to hold extended opcode data - ... - break; -case X86II::MRMInitReg: // for instructions whose source and - ... // destination are the same register - break; -} -</pre> -</div> - -<p> -The implementations of these case statements often first emit the opcode and -then get the operand(s). Then depending upon the operand, helper methods may be -called to process the operand(s). For example, in <tt>X86CodeEmitter.cpp</tt>, -for the <tt>X86II::AddRegFrm</tt> case, the first data emitted -(by <tt>emitByte</tt>) is the opcode added to the register operand. Then an -object representing the machine operand, <tt>MO1</tt>, is extracted. The helper -methods such as <tt>isImmediate</tt>, -<tt>isGlobalAddress</tt>, <tt>isExternalSymbol</tt>, <tt>isConstantPoolIndex</tt>, and -<tt>isJumpTableIndex</tt> determine the operand -type. (<tt>X86CodeEmitter.cpp</tt> also has private methods such -as <tt>emitConstant</tt>, <tt>emitGlobalAddress</tt>, -<tt>emitExternalSymbolAddress</tt>, <tt>emitConstPoolAddress</tt>, -and <tt>emitJumpTableAddress</tt> that emit the data into the output stream.) 
-</p> - -<div class="doc_code"> -<pre> -case X86II::AddRegFrm: - MCE.emitByte(BaseOpcode + getX86RegNum(MI.getOperand(CurOp++).getReg())); - - if (CurOp != NumOps) { - const MachineOperand &MO1 = MI.getOperand(CurOp++); - unsigned Size = X86InstrInfo::sizeOfImm(Desc); - if (MO1.isImmediate()) - emitConstant(MO1.getImm(), Size); - else { - unsigned rt = Is64BitMode ? X86::reloc_pcrel_word - : (IsPIC ? X86::reloc_picrel_word : X86::reloc_absolute_word); - if (Opcode == X86::MOV64ri) - rt = X86::reloc_absolute_dword; // FIXME: add X86II flag? - if (MO1.isGlobalAddress()) { - bool NeedStub = isa<Function>(MO1.getGlobal()); - bool isLazy = gvNeedsLazyPtr(MO1.getGlobal()); - emitGlobalAddress(MO1.getGlobal(), rt, MO1.getOffset(), 0, - NeedStub, isLazy); - } else if (MO1.isExternalSymbol()) - emitExternalSymbolAddress(MO1.getSymbolName(), rt); - else if (MO1.isConstantPoolIndex()) - emitConstPoolAddress(MO1.getIndex(), rt); - else if (MO1.isJumpTableIndex()) - emitJumpTableAddress(MO1.getIndex(), rt); - } - } - break; -</pre> -</div> - -<p> -In the previous example, <tt>XXXCodeEmitter.cpp</tt> uses the -variable <tt>rt</tt>, which is a RelocationType enum that may be used to -relocate addresses (for example, a global address with a PIC base offset). The -<tt>RelocationType</tt> enum for that target is defined in the short -target-specific <tt>XXXRelocations.h</tt> file. The <tt>RelocationType</tt> is used by -the <tt>relocate</tt> method defined in <tt>XXXJITInfo.cpp</tt> to rewrite -addresses for referenced global symbols. -</p> - -<p> -For example, <tt>X86Relocations.h</tt> specifies the following relocation types -for the X86 addresses. In all four cases, the relocated value is added to the -value already in memory. For <tt>reloc_pcrel_word</tt> -and <tt>reloc_picrel_word</tt>, there is an additional initial adjustment. -</p> - -<div class="doc_code"> -<pre> -enum RelocationType { - reloc_pcrel_word = 0, // add reloc value after adjusting for the PC loc - reloc_picrel_word = 1, // add reloc value after adjusting for the PIC base - reloc_absolute_word = 2, // absolute relocation; no additional adjustment - reloc_absolute_dword = 3 // absolute relocation; no additional adjustment -}; -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h3> - <a name="targetJITInfo">Target JIT Info</a> -</h3> - -<div> - -<p> -<tt>XXXJITInfo.cpp</tt> implements the JIT interfaces for target-specific -code-generation activities, such as emitting machine code and stubs. At minimum, -a target-specific version of <tt>XXXJITInfo</tt> implements the following: -</p> - -<ul> -<li><tt>getLazyResolverFunction</tt> — Initializes the JIT, gives the - target a function that is used for compilation.</li> - -<li><tt>emitFunctionStub</tt> — Returns a native function with a specified - address for a callback function.</li> - -<li><tt>relocate</tt> — Changes the addresses of referenced globals, based - on relocation types.</li> - -<li>Callback function that are wrappers to a function stub that is used when the - real target is not initially known.</li> -</ul> - -<p> -<tt>getLazyResolverFunction</tt> is generally trivial to implement. It makes the -incoming parameter as the global <tt>JITCompilerFunction</tt> and returns the -callback function that will be used a function wrapper. 
For the Alpha target -(in <tt>AlphaJITInfo.cpp</tt>), the <tt>getLazyResolverFunction</tt> -implementation is simply: -</p> - -<div class="doc_code"> -<pre> -TargetJITInfo::LazyResolverFn AlphaJITInfo::getLazyResolverFunction( - JITCompilerFn F) { - JITCompilerFunction = F; - return AlphaCompilationCallback; -} -</pre> -</div> - -<p> -For the X86 target, the <tt>getLazyResolverFunction</tt> implementation is a -little more complication, because it returns a different callback function for -processors with SSE instructions and XMM registers. -</p> - -<p> -The callback function initially saves and later restores the callee register -values, incoming arguments, and frame and return address. The callback function -needs low-level access to the registers or stack, so it is typically implemented -with assembler. -</p> - -</div> - -</div> - -<!-- *********************************************************************** --> - -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a> - - <a href="http://www.woo.com">Mason Woo</a> and <a href="http://misha.brukman.net">Misha Brukman</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a> - <br> - Last modified: $Date$ -</address> - -</body> -</html> diff --git a/docs/WritingAnLLVMBackend.rst b/docs/WritingAnLLVMBackend.rst new file mode 100644 index 0000000000..7803163ae6 --- /dev/null +++ b/docs/WritingAnLLVMBackend.rst @@ -0,0 +1,1835 @@ +================================ +Writing an LLVM Compiler Backend +================================ + +.. sectionauthor:: Mason Woo <http://www.woo.com> and Misha Brukman <http://misha.brukman.net> + +.. contents:: + :local: + +Introduction +============ + +This document describes techniques for writing compiler backends that convert +the LLVM Intermediate Representation (IR) to code for a specified machine or +other languages. Code intended for a specific machine can take the form of +either assembly code or binary code (usable for a JIT compiler). + +The backend of LLVM features a target-independent code generator that may +create output for several types of target CPUs --- including X86, PowerPC, +ARM, and SPARC. The backend may also be used to generate code targeted at SPUs +of the Cell processor or GPUs to support the execution of compute kernels. + +The document focuses on existing examples found in subdirectories of +``llvm/lib/Target`` in a downloaded LLVM release. In particular, this document +focuses on the example of creating a static compiler (one that emits text +assembly) for a SPARC target, because SPARC has fairly standard +characteristics, such as a RISC instruction set and straightforward calling +conventions. + +Audience +-------- + +The audience for this document is anyone who needs to write an LLVM backend to +generate code for a specific hardware or software target. + +Prerequisite Reading +-------------------- + +These essential documents must be read before reading this document: + +* `LLVM Language Reference Manual <LangRef.html>`_ --- a reference manual for + the LLVM assembly language. + +* :doc:`CodeGenerator` --- a guide to the components (classes and code + generation algorithms) for translating the LLVM internal representation into + machine code for a specified target. 
Pay particular attention to the + descriptions of code generation stages: Instruction Selection, Scheduling and + Formation, SSA-based Optimization, Register Allocation, Prolog/Epilog Code + Insertion, Late Machine Code Optimizations, and Code Emission. + +* :doc:`TableGenFundamentals` --- a document that describes the TableGen + (``tblgen``) application that manages domain-specific information to support + LLVM code generation. TableGen processes input from a target description + file (``.td`` suffix) and generates C++ code that can be used for code + generation. + +* `Writing an LLVM Pass <WritingAnLLVMPass.html>`_ --- The assembly printer is + a ``FunctionPass``, as are several SelectionDAG processing steps. + +To follow the SPARC examples in this document, have a copy of `The SPARC +Architecture Manual, Version 8 <http://www.sparc.org/standards/V8.pdf>`_ for +reference. For details about the ARM instruction set, refer to the `ARM +Architecture Reference Manual <http://infocenter.arm.com/>`_. For more about +the GNU Assembler format (``GAS``), see `Using As +<http://sourceware.org/binutils/docs/as/index.html>`_, especially for the +assembly printer. "Using As" contains a list of target machine dependent +features. + +Basic Steps +----------- + +To write a compiler backend for LLVM that converts the LLVM IR to code for a +specified target (machine or other language), follow these steps: + +* Create a subclass of the ``TargetMachine`` class that describes + characteristics of your target machine. Copy existing examples of specific + ``TargetMachine`` class and header files; for example, start with + ``SparcTargetMachine.cpp`` and ``SparcTargetMachine.h``, but change the file + names for your target. Similarly, change code that references "``Sparc``" to + reference your target. + +* Describe the register set of the target. Use TableGen to generate code for + register definition, register aliases, and register classes from a + target-specific ``RegisterInfo.td`` input file. You should also write + additional code for a subclass of the ``TargetRegisterInfo`` class that + represents the class register file data used for register allocation and also + describes the interactions between registers. + +* Describe the instruction set of the target. Use TableGen to generate code + for target-specific instructions from target-specific versions of + ``TargetInstrFormats.td`` and ``TargetInstrInfo.td``. You should write + additional code for a subclass of the ``TargetInstrInfo`` class to represent + machine instructions supported by the target machine. + +* Describe the selection and conversion of the LLVM IR from a Directed Acyclic + Graph (DAG) representation of instructions to native target-specific + instructions. Use TableGen to generate code that matches patterns and + selects instructions based on additional information in a target-specific + version of ``TargetInstrInfo.td``. Write code for ``XXXISelDAGToDAG.cpp``, + where ``XXX`` identifies the specific target, to perform pattern matching and + DAG-to-DAG instruction selection. Also write code in ``XXXISelLowering.cpp`` + to replace or remove operations and data types that are not supported + natively in a SelectionDAG. + +* Write code for an assembly printer that converts LLVM IR to a GAS format for + your target machine. You should add assembly strings to the instructions + defined in your target-specific version of ``TargetInstrInfo.td``. 
You + should also write code for a subclass of ``AsmPrinter`` that performs the + LLVM-to-assembly conversion and a trivial subclass of ``TargetAsmInfo``. + +* Optionally, add support for subtargets (i.e., variants with different + capabilities). You should also write code for a subclass of the + ``TargetSubtarget`` class, which allows you to use the ``-mcpu=`` and + ``-mattr=`` command-line options. + +* Optionally, add JIT support and create a machine code emitter (subclass of + ``TargetJITInfo``) that is used to emit binary code directly into memory. + +In the ``.cpp`` and ``.h``. files, initially stub up these methods and then +implement them later. Initially, you may not know which private members that +the class will need and which components will need to be subclassed. + +Preliminaries +------------- + +To actually create your compiler backend, you need to create and modify a few +files. The absolute minimum is discussed here. But to actually use the LLVM +target-independent code generator, you must perform the steps described in the +:doc:`LLVM Target-Independent Code Generator <CodeGenerator>` document. + +First, you should create a subdirectory under ``lib/Target`` to hold all the +files related to your target. If your target is called "Dummy", create the +directory ``lib/Target/Dummy``. + +In this new directory, create a ``Makefile``. It is easiest to copy a +``Makefile`` of another target and modify it. It should at least contain the +``LEVEL``, ``LIBRARYNAME`` and ``TARGET`` variables, and then include +``$(LEVEL)/Makefile.common``. The library can be named ``LLVMDummy`` (for +example, see the MIPS target). Alternatively, you can split the library into +``LLVMDummyCodeGen`` and ``LLVMDummyAsmPrinter``, the latter of which should be +implemented in a subdirectory below ``lib/Target/Dummy`` (for example, see the +PowerPC target). + +Note that these two naming schemes are hardcoded into ``llvm-config``. Using +any other naming scheme will confuse ``llvm-config`` and produce a lot of +(seemingly unrelated) linker errors when linking ``llc``. + +To make your target actually do something, you need to implement a subclass of +``TargetMachine``. This implementation should typically be in the file +``lib/Target/DummyTargetMachine.cpp``, but any file in the ``lib/Target`` +directory will be built and should work. To use LLVM's target independent code +generator, you should do what all current machine backends do: create a +subclass of ``LLVMTargetMachine``. (To create a target from scratch, create a +subclass of ``TargetMachine``.) + +To get LLVM to actually build and link your target, you need to add it to the +``TARGETS_TO_BUILD`` variable. To do this, you modify the configure script to +know about your target when parsing the ``--enable-targets`` option. Search +the configure script for ``TARGETS_TO_BUILD``, add your target to the lists +there (some creativity required), and then reconfigure. Alternatively, you can +change ``autotools/configure.ac`` and regenerate configure by running +``./autoconf/AutoRegen.sh``. + +Target Machine +============== + +``LLVMTargetMachine`` is designed as a base class for targets implemented with +the LLVM target-independent code generator. The ``LLVMTargetMachine`` class +should be specialized by a concrete target class that implements the various +virtual methods. ``LLVMTargetMachine`` is defined as a subclass of +``TargetMachine`` in ``include/llvm/Target/TargetMachine.h``. 
The +``TargetMachine`` class implementation (``TargetMachine.cpp``) also processes +numerous command-line options. + +To create a concrete target-specific subclass of ``LLVMTargetMachine``, start +by copying an existing ``TargetMachine`` class and header. You should name the +files that you create to reflect your specific target. For instance, for the +SPARC target, name the files ``SparcTargetMachine.h`` and +``SparcTargetMachine.cpp``. + +For a target machine ``XXX``, the implementation of ``XXXTargetMachine`` must +have access methods to obtain objects that represent target components. These +methods are named ``get*Info``, and are intended to obtain the instruction set +(``getInstrInfo``), register set (``getRegisterInfo``), stack frame layout +(``getFrameInfo``), and similar information. ``XXXTargetMachine`` must also +implement the ``getDataLayout`` method to access an object with target-specific +data characteristics, such as data type size and alignment requirements. + +For instance, for the SPARC target, the header file ``SparcTargetMachine.h`` +declares prototypes for several ``get*Info`` and ``getDataLayout`` methods that +simply return a class member. + +.. code-block:: c++ + + namespace llvm { + + class Module; + + class SparcTargetMachine : public LLVMTargetMachine { + const DataLayout DataLayout; // Calculates type size & alignment + SparcSubtarget Subtarget; + SparcInstrInfo InstrInfo; + TargetFrameInfo FrameInfo; + + protected: + virtual const TargetAsmInfo *createTargetAsmInfo() const; + + public: + SparcTargetMachine(const Module &M, const std::string &FS); + + virtual const SparcInstrInfo *getInstrInfo() const {return &InstrInfo; } + virtual const TargetFrameInfo *getFrameInfo() const {return &FrameInfo; } + virtual const TargetSubtarget *getSubtargetImpl() const{return &Subtarget; } + virtual const TargetRegisterInfo *getRegisterInfo() const { + return &InstrInfo.getRegisterInfo(); + } + virtual const DataLayout *getDataLayout() const { return &DataLayout; } + static unsigned getModuleMatchQuality(const Module &M); + + // Pass Pipeline Configuration + virtual bool addInstSelector(PassManagerBase &PM, bool Fast); + virtual bool addPreEmitPass(PassManagerBase &PM, bool Fast); + }; + + } // end namespace llvm + +* ``getInstrInfo()`` +* ``getRegisterInfo()`` +* ``getFrameInfo()`` +* ``getDataLayout()`` +* ``getSubtargetImpl()`` + +For some targets, you also need to support the following methods: + +* ``getTargetLowering()`` +* ``getJITInfo()`` + +In addition, the ``XXXTargetMachine`` constructor should specify a +``TargetDescription`` string that determines the data layout for the target +machine, including characteristics such as pointer size, alignment, and +endianness. For example, the constructor for ``SparcTargetMachine`` contains +the following: + +.. code-block:: c++ + + SparcTargetMachine::SparcTargetMachine(const Module &M, const std::string &FS) + : DataLayout("E-p:32:32-f128:128:128"), + Subtarget(M, FS), InstrInfo(Subtarget), + FrameInfo(TargetFrameInfo::StackGrowsDown, 8, 0) { + } + +Hyphens separate portions of the ``TargetDescription`` string. + +* An upper-case "``E``" in the string indicates a big-endian target data model. + A lower-case "``e``" indicates little-endian. + +* "``p:``" is followed by pointer information: size, ABI alignment, and + preferred alignment. If only two figures follow "``p:``", then the first + value is pointer size, and the second value is both ABI and preferred + alignment. 
+ +* Then a letter for numeric type alignment: "``i``", "``f``", "``v``", or + "``a``" (corresponding to integer, floating point, vector, or aggregate). + "``i``", "``v``", or "``a``" are followed by ABI alignment and preferred + alignment. "``f``" is followed by three values: the first indicates the size + of a long double, then ABI alignment, and then ABI preferred alignment. + +Target Registration +=================== + +You must also register your target with the ``TargetRegistry``, which is what +other LLVM tools use to be able to lookup and use your target at runtime. The +``TargetRegistry`` can be used directly, but for most targets there are helper +templates which should take care of the work for you. + +All targets should declare a global ``Target`` object which is used to +represent the target during registration. Then, in the target's ``TargetInfo`` +library, the target should define that object and use the ``RegisterTarget`` +template to register the target. For example, the Sparc registration code +looks like this: + +.. code-block:: c++ + + Target llvm::TheSparcTarget; + + extern "C" void LLVMInitializeSparcTargetInfo() { + RegisterTarget<Triple::sparc, /*HasJIT=*/false> + X(TheSparcTarget, "sparc", "Sparc"); + } + +This allows the ``TargetRegistry`` to look up the target by name or by target +triple. In addition, most targets will also register additional features which +are available in separate libraries. These registration steps are separate, +because some clients may wish to only link in some parts of the target --- the +JIT code generator does not require the use of the assembler printer, for +example. Here is an example of registering the Sparc assembly printer: + +.. code-block:: c++ + + extern "C" void LLVMInitializeSparcAsmPrinter() { + RegisterAsmPrinter<SparcAsmPrinter> X(TheSparcTarget); + } + +For more information, see "`llvm/Target/TargetRegistry.h +</doxygen/TargetRegistry_8h-source.html>`_". + +Register Set and Register Classes +================================= + +You should describe a concrete target-specific class that represents the +register file of a target machine. This class is called ``XXXRegisterInfo`` +(where ``XXX`` identifies the target) and represents the class register file +data that is used for register allocation. It also describes the interactions +between registers. + +You also need to define register classes to categorize related registers. A +register class should be added for groups of registers that are all treated the +same way for some instruction. Typical examples are register classes for +integer, floating-point, or vector registers. A register allocator allows an +instruction to use any register in a specified register class to perform the +instruction in a similar manner. Register classes allocate virtual registers +to instructions from these sets, and register classes let the +target-independent register allocator automatically choose the actual +registers. + +Much of the code for registers, including register definition, register +aliases, and register classes, is generated by TableGen from +``XXXRegisterInfo.td`` input files and placed in ``XXXGenRegisterInfo.h.inc`` +and ``XXXGenRegisterInfo.inc`` output files. Some of the code in the +implementation of ``XXXRegisterInfo`` requires hand-coding. + +Defining a Register +------------------- + +The ``XXXRegisterInfo.td`` file typically starts with register definitions for +a target machine. 
The ``Register`` class (specified in ``Target.td``) is used +to define an object for each register. The specified string ``n`` becomes the +``Name`` of the register. The basic ``Register`` object does not have any +subregisters and does not specify any aliases. + +.. code-block:: llvm + + class Register<string n> { + string Namespace = ""; + string AsmName = n; + string Name = n; + int SpillSize = 0; + int SpillAlignment = 0; + list<Register> Aliases = []; + list<Register> SubRegs = []; + list<int> DwarfNumbers = []; + } + +For example, in the ``X86RegisterInfo.td`` file, there are register definitions +that utilize the ``Register`` class, such as: + +.. code-block:: llvm + + def AL : Register<"AL">, DwarfRegNum<[0, 0, 0]>; + +This defines the register ``AL`` and assigns it values (with ``DwarfRegNum``) +that are used by ``gcc``, ``gdb``, or a debug information writer to identify a +register. For register ``AL``, ``DwarfRegNum`` takes an array of 3 values +representing 3 different modes: the first element is for X86-64, the second for +exception handling (EH) on X86-32, and the third is generic. -1 is a special +Dwarf number that indicates the gcc number is undefined, and -2 indicates the +register number is invalid for this mode. + +From the previously described line in the ``X86RegisterInfo.td`` file, TableGen +generates this code in the ``X86GenRegisterInfo.inc`` file: + +.. code-block:: c++ + + static const unsigned GR8[] = { X86::AL, ... }; + + const unsigned AL_AliasSet[] = { X86::AX, X86::EAX, X86::RAX, 0 }; + + const TargetRegisterDesc RegisterDescriptors[] = { + ... + { "AL", "AL", AL_AliasSet, Empty_SubRegsSet, Empty_SubRegsSet, AL_SuperRegsSet }, ... + +From the register info file, TableGen generates a ``TargetRegisterDesc`` object +for each register. ``TargetRegisterDesc`` is defined in +``include/llvm/Target/TargetRegisterInfo.h`` with the following fields: + +.. code-block:: c++ + + struct TargetRegisterDesc { + const char *AsmName; // Assembly language name for the register + const char *Name; // Printable name for the reg (for debugging) + const unsigned *AliasSet; // Register Alias Set + const unsigned *SubRegs; // Sub-register set + const unsigned *ImmSubRegs; // Immediate sub-register set + const unsigned *SuperRegs; // Super-register set + }; + +TableGen uses the entire target description file (``.td``) to determine text +names for the register (in the ``AsmName`` and ``Name`` fields of +``TargetRegisterDesc``) and the relationships of other registers to the defined +register (in the other ``TargetRegisterDesc`` fields). In this example, other +definitions establish the registers "``AX``", "``EAX``", and "``RAX``" as +aliases for one another, so TableGen generates a null-terminated array +(``AL_AliasSet``) for this register alias set. + +The ``Register`` class is commonly used as a base class for more complex +classes. In ``Target.td``, the ``Register`` class is the base for the +``RegisterWithSubRegs`` class that is used to define registers that need to +specify subregisters in the ``SubRegs`` list, as shown here: + +.. code-block:: llvm + + class RegisterWithSubRegs<string n, list<Register> subregs> : Register<n> { + let SubRegs = subregs; + } + +In ``SparcRegisterInfo.td``, additional register classes are defined for SPARC: +a ``Register`` subclass, ``SparcReg``, and further subclasses: ``Ri``, ``Rf``, +and ``Rd``. SPARC registers are identified by 5-bit ID numbers, which is a +feature common to these subclasses. 
Note the use of "``let``" expressions to +override values that are initially defined in a superclass (such as ``SubRegs`` +field in the ``Rd`` class). + +.. code-block:: llvm + + class SparcReg<string n> : Register<n> { + field bits<5> Num; + let Namespace = "SP"; + } + // Ri - 32-bit integer registers + class Ri<bits<5> num, string n> : + SparcReg<n> { + let Num = num; + } + // Rf - 32-bit floating-point registers + class Rf<bits<5> num, string n> : + SparcReg<n> { + let Num = num; + } + // Rd - Slots in the FP register file for 64-bit floating-point values. + class Rd<bits<5> num, string n, list<Register> subregs> : SparcReg<n> { + let Num = num; + let SubRegs = subregs; + } + +In the ``SparcRegisterInfo.td`` file, there are register definitions that +utilize these subclasses of ``Register``, such as: + +.. code-block:: llvm + + def G0 : Ri< 0, "G0">, DwarfRegNum<[0]>; + def G1 : Ri< 1, "G1">, DwarfRegNum<[1]>; + ... + def F0 : Rf< 0, "F0">, DwarfRegNum<[32]>; + def F1 : Rf< 1, "F1">, DwarfRegNum<[33]>; + ... + def D0 : Rd< 0, "F0", [F0, F1]>, DwarfRegNum<[32]>; + def D1 : Rd< 2, "F2", [F2, F3]>, DwarfRegNum<[34]>; + +The last two registers shown above (``D0`` and ``D1``) are double-precision +floating-point registers that are aliases for pairs of single-precision +floating-point sub-registers. In addition to aliases, the sub-register and +super-register relationships of the defined register are in fields of a +register's ``TargetRegisterDesc``. + +Defining a Register Class +------------------------- + +The ``RegisterClass`` class (specified in ``Target.td``) is used to define an +object that represents a group of related registers and also defines the +default allocation order of the registers. A target description file +``XXXRegisterInfo.td`` that uses ``Target.td`` can construct register classes +using the following class: + +.. code-block:: llvm + + class RegisterClass<string namespace, + list<ValueType> regTypes, int alignment, dag regList> { + string Namespace = namespace; + list<ValueType> RegTypes = regTypes; + int Size = 0; // spill size, in bits; zero lets tblgen pick the size + int Alignment = alignment; + + // CopyCost is the cost of copying a value between two registers + // default value 1 means a single instruction + // A negative value means copying is extremely expensive or impossible + int CopyCost = 1; + dag MemberList = regList; + + // for register classes that are subregisters of this class + list<RegisterClass> SubRegClassList = []; + + code MethodProtos = [{}]; // to insert arbitrary code + code MethodBodies = [{}]; + } + +To define a ``RegisterClass``, use the following 4 arguments: + +* The first argument of the definition is the name of the namespace. + +* The second argument is a list of ``ValueType`` register type values that are + defined in ``include/llvm/CodeGen/ValueTypes.td``. Defined values include + integer types (such as ``i16``, ``i32``, and ``i1`` for Boolean), + floating-point types (``f32``, ``f64``), and vector types (for example, + ``v8i16`` for an ``8 x i16`` vector). All registers in a ``RegisterClass`` + must have the same ``ValueType``, but some registers may store vector data in + different configurations. For example a register that can process a 128-bit + vector may be able to handle 16 8-bit integer elements, 8 16-bit integers, 4 + 32-bit integers, and so on. + +* The third argument of the ``RegisterClass`` definition specifies the + alignment required of the registers when they are stored or loaded to + memory. 
+ +* The final argument, ``regList``, specifies which registers are in this class. + If an alternative allocation order method is not specified, then ``regList`` + also defines the order of allocation used by the register allocator. Besides + simply listing registers with ``(add R0, R1, ...)``, more advanced set + operators are available. See ``include/llvm/Target/Target.td`` for more + information. + +In ``SparcRegisterInfo.td``, three ``RegisterClass`` objects are defined: +``FPRegs``, ``DFPRegs``, and ``IntRegs``. For all three register classes, the +first argument defines the namespace with the string "``SP``". ``FPRegs`` +defines a group of 32 single-precision floating-point registers (``F0`` to +``F31``); ``DFPRegs`` defines a group of 16 double-precision registers +(``D0-D15``). + +.. code-block:: llvm + + // F0, F1, F2, ..., F31 + def FPRegs : RegisterClass<"SP", [f32], 32, (sequence "F%u", 0, 31)>; + + def DFPRegs : RegisterClass<"SP", [f64], 64, + (add D0, D1, D2, D3, D4, D5, D6, D7, D8, + D9, D10, D11, D12, D13, D14, D15)>; + + def IntRegs : RegisterClass<"SP", [i32], 32, + (add L0, L1, L2, L3, L4, L5, L6, L7, + I0, I1, I2, I3, I4, I5, + O0, O1, O2, O3, O4, O5, O7, + G1, + // Non-allocatable regs: + G2, G3, G4, + O6, // stack ptr + I6, // frame ptr + I7, // return address + G0, // constant zero + G5, G6, G7 // reserved for kernel + )>; + +Using ``SparcRegisterInfo.td`` with TableGen generates several output files +that are intended for inclusion in other source code that you write. +``SparcRegisterInfo.td`` generates ``SparcGenRegisterInfo.h.inc``, which should +be included in the header file for the implementation of the SPARC register +implementation that you write (``SparcRegisterInfo.h``). In +``SparcGenRegisterInfo.h.inc`` a new structure is defined called +``SparcGenRegisterInfo`` that uses ``TargetRegisterInfo`` as its base. It also +specifies types, based upon the defined register classes: ``DFPRegsClass``, +``FPRegsClass``, and ``IntRegsClass``. + +``SparcRegisterInfo.td`` also generates ``SparcGenRegisterInfo.inc``, which is +included at the bottom of ``SparcRegisterInfo.cpp``, the SPARC register +implementation. The code below shows only the generated integer registers and +associated register classes. The order of registers in ``IntRegs`` reflects +the order in the definition of ``IntRegs`` in the target description file. + +.. code-block:: c++ + + // IntRegs Register Class... + static const unsigned IntRegs[] = { + SP::L0, SP::L1, SP::L2, SP::L3, SP::L4, SP::L5, + SP::L6, SP::L7, SP::I0, SP::I1, SP::I2, SP::I3, + SP::I4, SP::I5, SP::O0, SP::O1, SP::O2, SP::O3, + SP::O4, SP::O5, SP::O7, SP::G1, SP::G2, SP::G3, + SP::G4, SP::O6, SP::I6, SP::I7, SP::G0, SP::G5, + SP::G6, SP::G7, + }; + + // IntRegsVTs Register Class Value Types... + static const MVT::ValueType IntRegsVTs[] = { + MVT::i32, MVT::Other + }; + + namespace SP { // Register class instances + DFPRegsClass DFPRegsRegClass; + FPRegsClass FPRegsRegClass; + IntRegsClass IntRegsRegClass; + ... + // IntRegs Sub-register Classess... + static const TargetRegisterClass* const IntRegsSubRegClasses [] = { + NULL + }; + ... + // IntRegs Super-register Classess... + static const TargetRegisterClass* const IntRegsSuperRegClasses [] = { + NULL + }; + ... + // IntRegs Register Class sub-classes... + static const TargetRegisterClass* const IntRegsSubclasses [] = { + NULL + }; + ... + // IntRegs Register Class super-classes... 
+  static const TargetRegisterClass* const IntRegsSuperclasses [] = {
+    NULL
+  };
+
+  IntRegsClass::IntRegsClass() : TargetRegisterClass(IntRegsRegClassID,
+    IntRegsVTs, IntRegsSubclasses, IntRegsSuperclasses, IntRegsSubRegClasses,
+    IntRegsSuperRegClasses, 4, 4, 1, IntRegs, IntRegs + 32) {}
+  }
+
+The register allocators will avoid using reserved registers, and callee saved
+registers are not used until all the volatile registers have been used.  That
+is usually good enough, but in some cases it may be necessary to provide custom
+allocation orders.
+
+Implement a subclass of ``TargetRegisterInfo``
+----------------------------------------------
+
+The final step is to hand code portions of ``XXXRegisterInfo``, which
+implements the interface described in ``TargetRegisterInfo.h`` (see
+:ref:`TargetRegisterInfo`).  These functions return ``0``, ``NULL``, or
+``false``, unless overridden.  Here is a list of functions that are overridden
+for the SPARC implementation in ``SparcRegisterInfo.cpp``:
+
+* ``getCalleeSavedRegs`` --- Returns a list of callee-saved registers in the
+  order of the desired callee-save stack frame offset.
+
+* ``getReservedRegs`` --- Returns a bitset indexed by physical register
+  numbers, indicating if a particular register is unavailable.
+
+* ``hasFP`` --- Return a Boolean indicating if a function should have a
+  dedicated frame pointer register.
+
+* ``eliminateCallFramePseudoInstr`` --- If call frame setup or destroy pseudo
+  instructions are used, this can be called to eliminate them.
+
+* ``eliminateFrameIndex`` --- Eliminate abstract frame indices from
+  instructions that may use them.
+
+* ``emitPrologue`` --- Insert prologue code into the function.
+
+* ``emitEpilogue`` --- Insert epilogue code into the function.
+
+.. _instruction-set:
+
+Instruction Set
+===============
+
+During the early stages of code generation, the LLVM IR code is converted to a
+``SelectionDAG`` with nodes that are instances of the ``SDNode`` class
+containing target instructions.  An ``SDNode`` has an opcode, operands, type
+requirements, and operation properties (for example, whether an operation is
+commutative or whether it loads from memory).  The various operation node
+types are described in the ``include/llvm/CodeGen/SelectionDAGNodes.h`` file
+(values of the ``NodeType`` enum in the ``ISD`` namespace).
+
+TableGen uses the following target description (``.td``) input files to
+generate much of the code for instruction definition:
+
+* ``Target.td`` --- Where the ``Instruction``, ``Operand``, ``InstrInfo``, and
+  other fundamental classes are defined.
+
+* ``TargetSelectionDAG.td`` --- Used by ``SelectionDAG`` instruction selection
+  generators, contains ``SDTC*`` classes (selection DAG type constraint),
+  definitions of ``SelectionDAG`` nodes (such as ``imm``, ``cond``, ``bb``,
+  ``add``, ``fadd``, ``sub``), and pattern support (``Pattern``, ``Pat``,
+  ``PatFrag``, ``PatLeaf``, ``ComplexPattern``).
+
+* ``XXXInstrFormats.td`` --- Patterns for definitions of target-specific
+  instructions.
+
+* ``XXXInstrInfo.td`` --- Target-specific definitions of instruction templates,
+  condition codes, and instructions of an instruction set.  For architecture
+  modifications, a different file name may be used.  For example, for Pentium
+  with SSE instructions, this file is ``X86InstrSSE.td``, and for Pentium with
+  MMX, this file is ``X86InstrMMX.td``.
+
+There is also a target-specific ``XXX.td`` file, where ``XXX`` is the name of
+the target.
The ``XXX.td`` file includes the other ``.td`` input files, but +its contents are only directly important for subtargets. + +You should describe a concrete target-specific class ``XXXInstrInfo`` that +represents machine instructions supported by a target machine. +``XXXInstrInfo`` contains an array of ``XXXInstrDescriptor`` objects, each of +which describes one instruction. An instruction descriptor defines: + +* Opcode mnemonic +* Number of operands +* List of implicit register definitions and uses +* Target-independent properties (such as memory access, is commutable) +* Target-specific flags + +The Instruction class (defined in ``Target.td``) is mostly used as a base for +more complex instruction classes. + +.. code-block:: llvm + + class Instruction { + string Namespace = ""; + dag OutOperandList; // A dag containing the MI def operand list. + dag InOperandList; // A dag containing the MI use operand list. + string AsmString = ""; // The .s format to print the instruction with. + list<dag> Pattern; // Set to the DAG pattern for this instruction. + list<Register> Uses = []; + list<Register> Defs = []; + list<Predicate> Predicates = []; // predicates turned into isel match code + ... remainder not shown for space ... + } + +A ``SelectionDAG`` node (``SDNode``) should contain an object representing a +target-specific instruction that is defined in ``XXXInstrInfo.td``. The +instruction objects should represent instructions from the architecture manual +of the target machine (such as the SPARC Architecture Manual for the SPARC +target). + +A single instruction from the architecture manual is often modeled as multiple +target instructions, depending upon its operands. For example, a manual might +describe an add instruction that takes a register or an immediate operand. An +LLVM target could model this with two instructions named ``ADDri`` and +``ADDrr``. + +You should define a class for each instruction category and define each opcode +as a subclass of the category with appropriate parameters such as the fixed +binary encoding of opcodes and extended opcodes. You should map the register +bits to the bits of the instruction in which they are encoded (for the JIT). +Also you should specify how the instruction should be printed when the +automatic assembly printer is used. + +As is described in the SPARC Architecture Manual, Version 8, there are three +major 32-bit formats for instructions. Format 1 is only for the ``CALL`` +instruction. Format 2 is for branch on condition codes and ``SETHI`` (set high +bits of a register) instructions. Format 3 is for other instructions. + +Each of these formats has corresponding classes in ``SparcInstrFormat.td``. +``InstSP`` is a base class for other instruction classes. Additional base +classes are specified for more precise formats: for example in +``SparcInstrFormat.td``, ``F2_1`` is for ``SETHI``, and ``F2_2`` is for +branches. There are three other base classes: ``F3_1`` for register/register +operations, ``F3_2`` for register/immediate operations, and ``F3_3`` for +floating-point operations. ``SparcInstrInfo.td`` also adds the base class +``Pseudo`` for synthetic SPARC instructions. + +``SparcInstrInfo.td`` largely consists of operand and instruction definitions +for the SPARC target. In ``SparcInstrInfo.td``, the following target +description file entry, ``LDrr``, defines the Load Integer instruction for a +Word (the ``LD`` SPARC opcode) from a memory address to a register. 
The first +parameter, the value 3 (``11``\ :sub:`2`), is the operation value for this +category of operation. The second parameter (``000000``\ :sub:`2`) is the +specific operation value for ``LD``/Load Word. The third parameter is the +output destination, which is a register operand and defined in the ``Register`` +target description file (``IntRegs``). + +.. code-block:: llvm + + def LDrr : F3_1 <3, 0b000000, (outs IntRegs:$dst), (ins MEMrr:$addr), + "ld [$addr], $dst", + [(set IntRegs:$dst, (load ADDRrr:$addr))]>; + +The fourth parameter is the input source, which uses the address operand +``MEMrr`` that is defined earlier in ``SparcInstrInfo.td``: + +.. code-block:: llvm + + def MEMrr : Operand<i32> { + let PrintMethod = "printMemOperand"; + let MIOperandInfo = (ops IntRegs, IntRegs); + } + +The fifth parameter is a string that is used by the assembly printer and can be +left as an empty string until the assembly printer interface is implemented. +The sixth and final parameter is the pattern used to match the instruction +during the SelectionDAG Select Phase described in :doc:`CodeGenerator`. +This parameter is detailed in the next section, :ref:`instruction-selector`. + +Instruction class definitions are not overloaded for different operand types, +so separate versions of instructions are needed for register, memory, or +immediate value operands. For example, to perform a Load Integer instruction +for a Word from an immediate operand to a register, the following instruction +class is defined: + +.. code-block:: llvm + + def LDri : F3_2 <3, 0b000000, (outs IntRegs:$dst), (ins MEMri:$addr), + "ld [$addr], $dst", + [(set IntRegs:$dst, (load ADDRri:$addr))]>; + +Writing these definitions for so many similar instructions can involve a lot of +cut and paste. In ``.td`` files, the ``multiclass`` directive enables the +creation of templates to define several instruction classes at once (using the +``defm`` directive). For example in ``SparcInstrInfo.td``, the ``multiclass`` +pattern ``F3_12`` is defined to create 2 instruction classes each time +``F3_12`` is invoked: + +.. code-block:: llvm + + multiclass F3_12 <string OpcStr, bits<6> Op3Val, SDNode OpNode> { + def rr : F3_1 <2, Op3Val, + (outs IntRegs:$dst), (ins IntRegs:$b, IntRegs:$c), + !strconcat(OpcStr, " $b, $c, $dst"), + [(set IntRegs:$dst, (OpNode IntRegs:$b, IntRegs:$c))]>; + def ri : F3_2 <2, Op3Val, + (outs IntRegs:$dst), (ins IntRegs:$b, i32imm:$c), + !strconcat(OpcStr, " $b, $c, $dst"), + [(set IntRegs:$dst, (OpNode IntRegs:$b, simm13:$c))]>; + } + +So when the ``defm`` directive is used for the ``XOR`` and ``ADD`` +instructions, as seen below, it creates four instruction objects: ``XORrr``, +``XORri``, ``ADDrr``, and ``ADDri``. + +.. code-block:: llvm + + defm XOR : F3_12<"xor", 0b000011, xor>; + defm ADD : F3_12<"add", 0b000000, add>; + +``SparcInstrInfo.td`` also includes definitions for condition codes that are +referenced by branch instructions. The following definitions in +``SparcInstrInfo.td`` indicate the bit location of the SPARC condition code. +For example, the 10\ :sup:`th` bit represents the "greater than" condition for +integers, and the 22\ :sup:`nd` bit represents the "greater than" condition for +floats. + +.. code-block:: llvm + + def ICC_NE : ICC_VAL< 9>; // Not Equal + def ICC_E : ICC_VAL< 1>; // Equal + def ICC_G : ICC_VAL<10>; // Greater + ... + def FCC_U : FCC_VAL<23>; // Unordered + def FCC_G : FCC_VAL<22>; // Greater + def FCC_UG : FCC_VAL<21>; // Unordered or Greater + ... 
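+
+On the C++ side, these same encodings appear as plain enumerators.  As a rough
+sketch (restricted to the values quoted in this document; the real header
+declares the complete set), the corresponding declarations could look like
+this:
+
+.. code-block:: c++
+
+  namespace SPCC {
+    // Condition-code encodings; each value must match the corresponding
+    // ICC_VAL/FCC_VAL definition in SparcInstrInfo.td.
+    enum CondCodes {
+      ICC_E  = 1,    // Equal
+      ICC_NE = 9,    // Not Equal
+      ICC_G  = 10,   // Greater
+      FCC_UG = 21,   // Unordered or Greater
+      FCC_G  = 22,   // Greater
+      FCC_U  = 23    // Unordered
+    };
+  }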
+ +(Note that ``Sparc.h`` also defines enums that correspond to the same SPARC +condition codes. Care must be taken to ensure the values in ``Sparc.h`` +correspond to the values in ``SparcInstrInfo.td``. I.e., ``SPCC::ICC_NE = 9``, +``SPCC::FCC_U = 23`` and so on.) + +Instruction Operand Mapping +--------------------------- + +The code generator backend maps instruction operands to fields in the +instruction. Operands are assigned to unbound fields in the instruction in the +order they are defined. Fields are bound when they are assigned a value. For +example, the Sparc target defines the ``XNORrr`` instruction as a ``F3_1`` +format instruction having three operands. + +.. code-block:: llvm + + def XNORrr : F3_1<2, 0b000111, + (outs IntRegs:$dst), (ins IntRegs:$b, IntRegs:$c), + "xnor $b, $c, $dst", + [(set IntRegs:$dst, (not (xor IntRegs:$b, IntRegs:$c)))]>; + +The instruction templates in ``SparcInstrFormats.td`` show the base class for +``F3_1`` is ``InstSP``. + +.. code-block:: llvm + + class InstSP<dag outs, dag ins, string asmstr, list<dag> pattern> : Instruction { + field bits<32> Inst; + let Namespace = "SP"; + bits<2> op; + let Inst{31-30} = op; + dag OutOperandList = outs; + dag InOperandList = ins; + let AsmString = asmstr; + let Pattern = pattern; + } + +``InstSP`` leaves the ``op`` field unbound. + +.. code-block:: llvm + + class F3<dag outs, dag ins, string asmstr, list<dag> pattern> + : InstSP<outs, ins, asmstr, pattern> { + bits<5> rd; + bits<6> op3; + bits<5> rs1; + let op{1} = 1; // Op = 2 or 3 + let Inst{29-25} = rd; + let Inst{24-19} = op3; + let Inst{18-14} = rs1; + } + +``F3`` binds the ``op`` field and defines the ``rd``, ``op3``, and ``rs1`` +fields. ``F3`` format instructions will bind the operands ``rd``, ``op3``, and +``rs1`` fields. + +.. code-block:: llvm + + class F3_1<bits<2> opVal, bits<6> op3val, dag outs, dag ins, + string asmstr, list<dag> pattern> : F3<outs, ins, asmstr, pattern> { + bits<8> asi = 0; // asi not currently used + bits<5> rs2; + let op = opVal; + let op3 = op3val; + let Inst{13} = 0; // i field = 0 + let Inst{12-5} = asi; // address space identifier + let Inst{4-0} = rs2; + } + +``F3_1`` binds the ``op3`` field and defines the ``rs2`` fields. ``F3_1`` +format instructions will bind the operands to the ``rd``, ``rs1``, and ``rs2`` +fields. This results in the ``XNORrr`` instruction binding ``$dst``, ``$b``, +and ``$c`` operands to the ``rd``, ``rs1``, and ``rs2`` fields respectively. + +Instruction Relation Mapping +---------------------------- + +This TableGen feature is used to relate instructions with each other. It is +particularly useful when you have multiple instruction formats and need to +switch between them after instruction selection. This entire feature is driven +by relation models which can be defined in ``XXXInstrInfo.td`` files +according to the target-specific instruction set. Relation models are defined +using ``InstrMapping`` class as a base. TableGen parses all the models +and generates instruction relation maps using the specified information. +Relation maps are emitted as tables in the ``XXXGenInstrInfo.inc`` file +along with the functions to query them. For the detailed information on how to +use this feature, please refer to :doc:`HowToUseInstrMappings`. + +Implement a subclass of ``TargetInstrInfo`` +------------------------------------------- + +The final step is to hand code portions of ``XXXInstrInfo``, which implements +the interface described in ``TargetInstrInfo.h`` (see :ref:`TargetInstrInfo`). 
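+To give a sense of what this hand-written code looks like, below is a greatly
+simplified sketch of one such hook, ``isLoadFromStackSlot``, written against
+the ``LDri`` instruction defined earlier (its operands follow the
+``(outs IntRegs:$dst), (ins MEMri:$addr)`` layout shown above; a real
+``SparcInstrInfo.cpp`` also recognizes the floating-point load opcodes):
+
+.. code-block:: c++
+
+  #include "SparcInstrInfo.h"
+  #include "llvm/CodeGen/MachineInstr.h"
+  using namespace llvm;
+
+  // If MI is a simple load from a stack slot, report which slot is read and
+  // which register is written; otherwise return 0 so that callers ignore it.
+  unsigned SparcInstrInfo::isLoadFromStackSlot(const MachineInstr *MI,
+                                               int &FrameIndex) const {
+    if (MI->getOpcode() == SP::LDri &&     // integer load, register+immediate
+        MI->getOperand(1).isFI() &&        // base is an abstract frame index
+        MI->getOperand(2).isImm() &&       // ...with a zero offset
+        MI->getOperand(2).getImm() == 0) {
+      FrameIndex = MI->getOperand(1).getIndex();
+      return MI->getOperand(0).getReg();   // the destination register
+    }
+    return 0;
+  }
+
+The other hooks in the interface are overridden in the same mechanical style.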
+These functions return ``0`` or a Boolean or they assert, unless overridden. +Here's a list of functions that are overridden for the SPARC implementation in +``SparcInstrInfo.cpp``: + +* ``isLoadFromStackSlot`` --- If the specified machine instruction is a direct + load from a stack slot, return the register number of the destination and the + ``FrameIndex`` of the stack slot. + +* ``isStoreToStackSlot`` --- If the specified machine instruction is a direct + store to a stack slot, return the register number of the destination and the + ``FrameIndex`` of the stack slot. + +* ``copyPhysReg`` --- Copy values between a pair of physical registers. + +* ``storeRegToStackSlot`` --- Store a register value to a stack slot. + +* ``loadRegFromStackSlot`` --- Load a register value from a stack slot. + +* ``storeRegToAddr`` --- Store a register value to memory. + +* ``loadRegFromAddr`` --- Load a register value from memory. + +* ``foldMemoryOperand`` --- Attempt to combine instructions of any load or + store instruction for the specified operand(s). + +Branch Folding and If Conversion +-------------------------------- + +Performance can be improved by combining instructions or by eliminating +instructions that are never reached. The ``AnalyzeBranch`` method in +``XXXInstrInfo`` may be implemented to examine conditional instructions and +remove unnecessary instructions. ``AnalyzeBranch`` looks at the end of a +machine basic block (MBB) for opportunities for improvement, such as branch +folding and if conversion. The ``BranchFolder`` and ``IfConverter`` machine +function passes (see the source files ``BranchFolding.cpp`` and +``IfConversion.cpp`` in the ``lib/CodeGen`` directory) call ``AnalyzeBranch`` +to improve the control flow graph that represents the instructions. + +Several implementations of ``AnalyzeBranch`` (for ARM, Alpha, and X86) can be +examined as models for your own ``AnalyzeBranch`` implementation. Since SPARC +does not implement a useful ``AnalyzeBranch``, the ARM target implementation is +shown below. + +``AnalyzeBranch`` returns a Boolean value and takes four parameters: + +* ``MachineBasicBlock &MBB`` --- The incoming block to be examined. + +* ``MachineBasicBlock *&TBB`` --- A destination block that is returned. For a + conditional branch that evaluates to true, ``TBB`` is the destination. + +* ``MachineBasicBlock *&FBB`` --- For a conditional branch that evaluates to + false, ``FBB`` is returned as the destination. + +* ``std::vector<MachineOperand> &Cond`` --- List of operands to evaluate a + condition for a conditional branch. + +In the simplest case, if a block ends without a branch, then it falls through +to the successor block. No destination blocks are specified for either ``TBB`` +or ``FBB``, so both parameters return ``NULL``. The start of the +``AnalyzeBranch`` (see code below for the ARM target) shows the function +parameters and the code for the simplest case. + +.. code-block:: c++ + + bool ARMInstrInfo::AnalyzeBranch(MachineBasicBlock &MBB, + MachineBasicBlock *&TBB, + MachineBasicBlock *&FBB, + std::vector<MachineOperand> &Cond) const + { + MachineBasicBlock::iterator I = MBB.end(); + if (I == MBB.begin() || !isUnpredicatedTerminator(--I)) + return false; + +If a block ends with a single unconditional branch instruction, then +``AnalyzeBranch`` (shown below) should return the destination of that branch in +the ``TBB`` parameter. + +.. 
code-block:: c++ + + if (LastOpc == ARM::B || LastOpc == ARM::tB) { + TBB = LastInst->getOperand(0).getMBB(); + return false; + } + +If a block ends with two unconditional branches, then the second branch is +never reached. In that situation, as shown below, remove the last branch +instruction and return the penultimate branch in the ``TBB`` parameter. + +.. code-block:: c++ + + if ((SecondLastOpc == ARM::B || SecondLastOpc == ARM::tB) && + (LastOpc == ARM::B || LastOpc == ARM::tB)) { + TBB = SecondLastInst->getOperand(0).getMBB(); + I = LastInst; + I->eraseFromParent(); + return false; + } + +A block may end with a single conditional branch instruction that falls through +to successor block if the condition evaluates to false. In that case, +``AnalyzeBranch`` (shown below) should return the destination of that +conditional branch in the ``TBB`` parameter and a list of operands in the +``Cond`` parameter to evaluate the condition. + +.. code-block:: c++ + + if (LastOpc == ARM::Bcc || LastOpc == ARM::tBcc) { + // Block ends with fall-through condbranch. + TBB = LastInst->getOperand(0).getMBB(); + Cond.push_back(LastInst->getOperand(1)); + Cond.push_back(LastInst->getOperand(2)); + return false; + } + +If a block ends with both a conditional branch and an ensuing unconditional +branch, then ``AnalyzeBranch`` (shown below) should return the conditional +branch destination (assuming it corresponds to a conditional evaluation of +"``true``") in the ``TBB`` parameter and the unconditional branch destination +in the ``FBB`` (corresponding to a conditional evaluation of "``false``"). A +list of operands to evaluate the condition should be returned in the ``Cond`` +parameter. + +.. code-block:: c++ + + unsigned SecondLastOpc = SecondLastInst->getOpcode(); + + if ((SecondLastOpc == ARM::Bcc && LastOpc == ARM::B) || + (SecondLastOpc == ARM::tBcc && LastOpc == ARM::tB)) { + TBB = SecondLastInst->getOperand(0).getMBB(); + Cond.push_back(SecondLastInst->getOperand(1)); + Cond.push_back(SecondLastInst->getOperand(2)); + FBB = LastInst->getOperand(0).getMBB(); + return false; + } + +For the last two cases (ending with a single conditional branch or ending with +one conditional and one unconditional branch), the operands returned in the +``Cond`` parameter can be passed to methods of other instructions to create new +branches or perform other operations. An implementation of ``AnalyzeBranch`` +requires the helper methods ``RemoveBranch`` and ``InsertBranch`` to manage +subsequent operations. + +``AnalyzeBranch`` should return false indicating success in most circumstances. +``AnalyzeBranch`` should only return true when the method is stumped about what +to do, for example, if a block has three terminating branches. +``AnalyzeBranch`` may return true if it encounters a terminator it cannot +handle, such as an indirect branch. + +.. _instruction-selector: + +Instruction Selector +==================== + +LLVM uses a ``SelectionDAG`` to represent LLVM IR instructions, and nodes of +the ``SelectionDAG`` ideally represent native target instructions. During code +generation, instruction selection passes are performed to convert non-native +DAG instructions into native target-specific instructions. The pass described +in ``XXXISelDAGToDAG.cpp`` is used to match patterns and perform DAG-to-DAG +instruction selection. Optionally, a pass may be defined (in +``XXXBranchSelector.cpp``) to perform similar DAG-to-DAG operations for branch +instructions. 
Later, the code in ``XXXISelLowering.cpp`` replaces or removes +operations and data types not supported natively (legalizes) in a +``SelectionDAG``. + +TableGen generates code for instruction selection using the following target +description input files: + +* ``XXXInstrInfo.td`` --- Contains definitions of instructions in a + target-specific instruction set, generates ``XXXGenDAGISel.inc``, which is + included in ``XXXISelDAGToDAG.cpp``. + +* ``XXXCallingConv.td`` --- Contains the calling and return value conventions + for the target architecture, and it generates ``XXXGenCallingConv.inc``, + which is included in ``XXXISelLowering.cpp``. + +The implementation of an instruction selection pass must include a header that +declares the ``FunctionPass`` class or a subclass of ``FunctionPass``. In +``XXXTargetMachine.cpp``, a Pass Manager (PM) should add each instruction +selection pass into the queue of passes to run. + +The LLVM static compiler (``llc``) is an excellent tool for visualizing the +contents of DAGs. To display the ``SelectionDAG`` before or after specific +processing phases, use the command line options for ``llc``, described at +:ref:`SelectionDAG-Process`. + +To describe instruction selector behavior, you should add patterns for lowering +LLVM code into a ``SelectionDAG`` as the last parameter of the instruction +definitions in ``XXXInstrInfo.td``. For example, in ``SparcInstrInfo.td``, +this entry defines a register store operation, and the last parameter describes +a pattern with the store DAG operator. + +.. code-block:: llvm + + def STrr : F3_1< 3, 0b000100, (outs), (ins MEMrr:$addr, IntRegs:$src), + "st $src, [$addr]", [(store IntRegs:$src, ADDRrr:$addr)]>; + +``ADDRrr`` is a memory mode that is also defined in ``SparcInstrInfo.td``: + +.. code-block:: llvm + + def ADDRrr : ComplexPattern<i32, 2, "SelectADDRrr", [], []>; + +The definition of ``ADDRrr`` refers to ``SelectADDRrr``, which is a function +defined in an implementation of the Instructor Selector (such as +``SparcISelDAGToDAG.cpp``). + +In ``lib/Target/TargetSelectionDAG.td``, the DAG operator for store is defined +below: + +.. code-block:: llvm + + def store : PatFrag<(ops node:$val, node:$ptr), + (st node:$val, node:$ptr), [{ + if (StoreSDNode *ST = dyn_cast<StoreSDNode>(N)) + return !ST->isTruncatingStore() && + ST->getAddressingMode() == ISD::UNINDEXED; + return false; + }]>; + +``XXXInstrInfo.td`` also generates (in ``XXXGenDAGISel.inc``) the +``SelectCode`` method that is used to call the appropriate processing method +for an instruction. In this example, ``SelectCode`` calls ``Select_ISD_STORE`` +for the ``ISD::STORE`` opcode. + +.. code-block:: c++ + + SDNode *SelectCode(SDValue N) { + ... + MVT::ValueType NVT = N.getNode()->getValueType(0); + switch (N.getOpcode()) { + case ISD::STORE: { + switch (NVT) { + default: + return Select_ISD_STORE(N); + break; + } + break; + } + ... + +The pattern for ``STrr`` is matched, so elsewhere in ``XXXGenDAGISel.inc``, +code for ``STrr`` is created for ``Select_ISD_STORE``. The ``Emit_22`` method +is also generated in ``XXXGenDAGISel.inc`` to complete the processing of this +instruction. + +.. 
code-block:: c++ + + SDNode *Select_ISD_STORE(const SDValue &N) { + SDValue Chain = N.getOperand(0); + if (Predicate_store(N.getNode())) { + SDValue N1 = N.getOperand(1); + SDValue N2 = N.getOperand(2); + SDValue CPTmp0; + SDValue CPTmp1; + + // Pattern: (st:void IntRegs:i32:$src, + // ADDRrr:i32:$addr)<<P:Predicate_store>> + // Emits: (STrr:void ADDRrr:i32:$addr, IntRegs:i32:$src) + // Pattern complexity = 13 cost = 1 size = 0 + if (SelectADDRrr(N, N2, CPTmp0, CPTmp1) && + N1.getNode()->getValueType(0) == MVT::i32 && + N2.getNode()->getValueType(0) == MVT::i32) { + return Emit_22(N, SP::STrr, CPTmp0, CPTmp1); + } + ... + +The SelectionDAG Legalize Phase +------------------------------- + +The Legalize phase converts a DAG to use types and operations that are natively +supported by the target. For natively unsupported types and operations, you +need to add code to the target-specific ``XXXTargetLowering`` implementation to +convert unsupported types and operations to supported ones. + +In the constructor for the ``XXXTargetLowering`` class, first use the +``addRegisterClass`` method to specify which types are supported and which +register classes are associated with them. The code for the register classes +are generated by TableGen from ``XXXRegisterInfo.td`` and placed in +``XXXGenRegisterInfo.h.inc``. For example, the implementation of the +constructor for the SparcTargetLowering class (in ``SparcISelLowering.cpp``) +starts with the following code: + +.. code-block:: c++ + + addRegisterClass(MVT::i32, SP::IntRegsRegisterClass); + addRegisterClass(MVT::f32, SP::FPRegsRegisterClass); + addRegisterClass(MVT::f64, SP::DFPRegsRegisterClass); + +You should examine the node types in the ``ISD`` namespace +(``include/llvm/CodeGen/SelectionDAGNodes.h``) and determine which operations +the target natively supports. For operations that do **not** have native +support, add a callback to the constructor for the ``XXXTargetLowering`` class, +so the instruction selection process knows what to do. The ``TargetLowering`` +class callback methods (declared in ``llvm/Target/TargetLowering.h``) are: + +* ``setOperationAction`` --- General operation. +* ``setLoadExtAction`` --- Load with extension. +* ``setTruncStoreAction`` --- Truncating store. +* ``setIndexedLoadAction`` --- Indexed load. +* ``setIndexedStoreAction`` --- Indexed store. +* ``setConvertAction`` --- Type conversion. +* ``setCondCodeAction`` --- Support for a given condition code. + +Note: on older releases, ``setLoadXAction`` is used instead of +``setLoadExtAction``. Also, on older releases, ``setCondCodeAction`` may not +be supported. Examine your release to see what methods are specifically +supported. + +These callbacks are used to determine that an operation does or does not work +with a specified type (or types). And in all cases, the third parameter is a +``LegalAction`` type enum value: ``Promote``, ``Expand``, ``Custom``, or +``Legal``. ``SparcISelLowering.cpp`` contains examples of all four +``LegalAction`` values. + +Promote +^^^^^^^ + +For an operation without native support for a given type, the specified type +may be promoted to a larger type that is supported. For example, SPARC does +not support a sign-extending load for Boolean values (``i1`` type), so in +``SparcISelLowering.cpp`` the third parameter below, ``Promote``, changes +``i1`` type values to a large type before loading. + +.. 
code-block:: c++ + + setLoadExtAction(ISD::SEXTLOAD, MVT::i1, Promote); + +Expand +^^^^^^ + +For a type without native support, a value may need to be broken down further, +rather than promoted. For an operation without native support, a combination +of other operations may be used to similar effect. In SPARC, the +floating-point sine and cosine trig operations are supported by expansion to +other operations, as indicated by the third parameter, ``Expand``, to +``setOperationAction``: + +.. code-block:: c++ + + setOperationAction(ISD::FSIN, MVT::f32, Expand); + setOperationAction(ISD::FCOS, MVT::f32, Expand); + +Custom +^^^^^^ + +For some operations, simple type promotion or operation expansion may be +insufficient. In some cases, a special intrinsic function must be implemented. + +For example, a constant value may require special treatment, or an operation +may require spilling and restoring registers in the stack and working with +register allocators. + +As seen in ``SparcISelLowering.cpp`` code below, to perform a type conversion +from a floating point value to a signed integer, first the +``setOperationAction`` should be called with ``Custom`` as the third parameter: + +.. code-block:: c++ + + setOperationAction(ISD::FP_TO_SINT, MVT::i32, Custom); + +In the ``LowerOperation`` method, for each ``Custom`` operation, a case +statement should be added to indicate what function to call. In the following +code, an ``FP_TO_SINT`` opcode will call the ``LowerFP_TO_SINT`` method: + +.. code-block:: c++ + + SDValue SparcTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) { + switch (Op.getOpcode()) { + case ISD::FP_TO_SINT: return LowerFP_TO_SINT(Op, DAG); + ... + } + } + +Finally, the ``LowerFP_TO_SINT`` method is implemented, using an FP register to +convert the floating-point value to an integer. + +.. code-block:: c++ + + static SDValue LowerFP_TO_SINT(SDValue Op, SelectionDAG &DAG) { + assert(Op.getValueType() == MVT::i32); + Op = DAG.getNode(SPISD::FTOI, MVT::f32, Op.getOperand(0)); + return DAG.getNode(ISD::BITCAST, MVT::i32, Op); + } + +Legal +^^^^^ + +The ``Legal`` ``LegalizeAction`` enum value simply indicates that an operation +**is** natively supported. ``Legal`` represents the default condition, so it +is rarely used. In ``SparcISelLowering.cpp``, the action for ``CTPOP`` (an +operation to count the bits set in an integer) is natively supported only for +SPARC v9. The following code enables the ``Expand`` conversion technique for +non-v9 SPARC implementations. + +.. code-block:: c++ + + setOperationAction(ISD::CTPOP, MVT::i32, Expand); + ... + if (TM.getSubtarget<SparcSubtarget>().isV9()) + setOperationAction(ISD::CTPOP, MVT::i32, Legal); + +Calling Conventions +------------------- + +To support target-specific calling conventions, ``XXXGenCallingConv.td`` uses +interfaces (such as ``CCIfType`` and ``CCAssignToReg``) that are defined in +``lib/Target/TargetCallingConv.td``. TableGen can take the target descriptor +file ``XXXGenCallingConv.td`` and generate the header file +``XXXGenCallingConv.inc``, which is typically included in +``XXXISelLowering.cpp``. You can use the interfaces in +``TargetCallingConv.td`` to specify: + +* The order of parameter allocation. + +* Where parameters and return values are placed (that is, on the stack or in + registers). + +* Which registers may be used. + +* Whether the caller or callee unwinds the stack. + +The following example demonstrates the use of the ``CCIfType`` and +``CCAssignToReg`` interfaces. 
If the ``CCIfType`` predicate is true (that is, +if the current argument is of type ``f32`` or ``f64``), then the action is +performed. In this case, the ``CCAssignToReg`` action assigns the argument +value to the first available register: either ``R0`` or ``R1``. + +.. code-block:: llvm + + CCIfType<[f32,f64], CCAssignToReg<[R0, R1]>> + +``SparcCallingConv.td`` contains definitions for a target-specific return-value +calling convention (``RetCC_Sparc32``) and a basic 32-bit C calling convention +(``CC_Sparc32``). The definition of ``RetCC_Sparc32`` (shown below) indicates +which registers are used for specified scalar return types. A single-precision +float is returned to register ``F0``, and a double-precision float goes to +register ``D0``. A 32-bit integer is returned in register ``I0`` or ``I1``. + +.. code-block:: llvm + + def RetCC_Sparc32 : CallingConv<[ + CCIfType<[i32], CCAssignToReg<[I0, I1]>>, + CCIfType<[f32], CCAssignToReg<[F0]>>, + CCIfType<[f64], CCAssignToReg<[D0]>> + ]>; + +The definition of ``CC_Sparc32`` in ``SparcCallingConv.td`` introduces +``CCAssignToStack``, which assigns the value to a stack slot with the specified +size and alignment. In the example below, the first parameter, 4, indicates +the size of the slot, and the second parameter, also 4, indicates the stack +alignment along 4-byte units. (Special cases: if size is zero, then the ABI +size is used; if alignment is zero, then the ABI alignment is used.) + +.. code-block:: llvm + + def CC_Sparc32 : CallingConv<[ + // All arguments get passed in integer registers if there is space. + CCIfType<[i32, f32, f64], CCAssignToReg<[I0, I1, I2, I3, I4, I5]>>, + CCAssignToStack<4, 4> + ]>; + +``CCDelegateTo`` is another commonly used interface, which tries to find a +specified sub-calling convention, and, if a match is found, it is invoked. In +the following example (in ``X86CallingConv.td``), the definition of +``RetCC_X86_32_C`` ends with ``CCDelegateTo``. After the current value is +assigned to the register ``ST0`` or ``ST1``, the ``RetCC_X86Common`` is +invoked. + +.. code-block:: llvm + + def RetCC_X86_32_C : CallingConv<[ + CCIfType<[f32], CCAssignToReg<[ST0, ST1]>>, + CCIfType<[f64], CCAssignToReg<[ST0, ST1]>>, + CCDelegateTo<RetCC_X86Common> + ]>; + +``CCIfCC`` is an interface that attempts to match the given name to the current +calling convention. If the name identifies the current calling convention, +then a specified action is invoked. In the following example (in +``X86CallingConv.td``), if the ``Fast`` calling convention is in use, then +``RetCC_X86_32_Fast`` is invoked. If the ``SSECall`` calling convention is in +use, then ``RetCC_X86_32_SSE`` is invoked. + +.. code-block:: llvm + + def RetCC_X86_32 : CallingConv<[ + CCIfCC<"CallingConv::Fast", CCDelegateTo<RetCC_X86_32_Fast>>, + CCIfCC<"CallingConv::X86_SSECall", CCDelegateTo<RetCC_X86_32_SSE>>, + CCDelegateTo<RetCC_X86_32_C> + ]>; + +Other calling convention interfaces include: + +* ``CCIf <predicate, action>`` --- If the predicate matches, apply the action. + +* ``CCIfInReg <action>`` --- If the argument is marked with the "``inreg``" + attribute, then apply the action. + +* ``CCIfNest <action>`` --- If the argument is marked with the "``nest``" + attribute, then apply the action. + +* ``CCIfNotVarArg <action>`` --- If the current function does not take a + variable number of arguments, apply the action. + +* ``CCAssignToRegWithShadow <registerList, shadowList>`` --- similar to + ``CCAssignToReg``, but with a shadow list of registers. 
+
+* ``CCPassByVal <size, align>`` --- Assign value to a stack slot with the
+  minimum specified size and alignment.
+
+* ``CCPromoteToType <type>`` --- Promote the current value to the specified
+  type.
+
+* ``CallingConv <[actions]>`` --- Define each calling convention that is
+  supported.
+
+Assembly Printer
+================
+
+During the code emission stage, the code generator may use an LLVM pass to
+produce assembly output. To do this, implement a printer that converts LLVM IR
+to GAS-format assembly language for your target machine, using the following
+steps:
+
+* Define all the assembly strings for your target, adding them to the
+  instructions defined in the ``XXXInstrInfo.td`` file. (See
+  :ref:`instruction-set`.) TableGen will produce an output file
+  (``XXXGenAsmWriter.inc``) with an implementation of the ``printInstruction``
+  method for the ``XXXAsmPrinter`` class.
+
+* Write ``XXXTargetAsmInfo.h``, which contains the bare-bones declaration of
+  the ``XXXTargetAsmInfo`` class (a subclass of ``TargetAsmInfo``).
+
+* Write ``XXXTargetAsmInfo.cpp``, which contains target-specific values for
+  ``TargetAsmInfo`` properties and sometimes new implementations for methods.
+
+* Write ``XXXAsmPrinter.cpp``, which implements the ``AsmPrinter`` class that
+  performs the LLVM-to-assembly conversion.
+
+The code in ``XXXTargetAsmInfo.h`` is usually a trivial declaration of the
+``XXXTargetAsmInfo`` class for use in ``XXXTargetAsmInfo.cpp``. Similarly,
+``XXXTargetAsmInfo.cpp`` usually has a few declarations of ``XXXTargetAsmInfo``
+replacement values that override the default values in ``TargetAsmInfo.cpp``.
+For example, in ``SparcTargetAsmInfo.cpp``:
+
+.. code-block:: c++
+
+  SparcTargetAsmInfo::SparcTargetAsmInfo(const SparcTargetMachine &TM) {
+    Data16bitsDirective = "\t.half\t";
+    Data32bitsDirective = "\t.word\t";
+    Data64bitsDirective = 0;  // .xword is only supported by V9.
+    ZeroDirective = "\t.skip\t";
+    CommentString = "!";
+    ConstantPoolSection = "\t.section \".rodata\",#alloc\n";
+  }
+
+The X86 assembly printer implementation (``X86TargetAsmInfo``) is an example
+where the target-specific ``TargetAsmInfo`` class overrides a method:
+``ExpandInlineAsm``.
+
+A target-specific implementation of ``AsmPrinter`` is written in
+``XXXAsmPrinter.cpp``, which implements the ``AsmPrinter`` class that converts
+the LLVM IR to printable assembly. The implementation must include the
+following headers that have declarations for the ``AsmPrinter`` and
+``MachineFunctionPass`` classes. ``MachineFunctionPass`` is a subclass of
+``FunctionPass``.
+
+.. code-block:: c++
+
+  #include "llvm/CodeGen/AsmPrinter.h"
+  #include "llvm/CodeGen/MachineFunctionPass.h"
+
+As a ``FunctionPass``, ``AsmPrinter`` first calls ``doInitialization`` to set
+up the ``AsmPrinter``. In ``SparcAsmPrinter``, a ``Mangler`` object is
+instantiated to process variable names.
+
+In ``XXXAsmPrinter.cpp``, the ``runOnMachineFunction`` method (declared in
+``MachineFunctionPass``) must be implemented for ``XXXAsmPrinter``. In
+``MachineFunctionPass``, the ``runOnFunction`` method invokes
+``runOnMachineFunction``. Target-specific implementations of
+``runOnMachineFunction`` differ, but generally do the following to process each
+machine function:
+
+* Call ``SetupMachineFunction`` to perform initialization.
+
+* Call ``EmitConstantPool`` to print out (to the output stream) constants which
+  have been spilled to memory. 
+ +* Call ``EmitJumpTableInfo`` to print out jump tables used by the current + function. + +* Print out the label for the current function. + +* Print out the code for the function, including basic block labels and the + assembly for the instruction (using ``printInstruction``) + +The ``XXXAsmPrinter`` implementation must also include the code generated by +TableGen that is output in the ``XXXGenAsmWriter.inc`` file. The code in +``XXXGenAsmWriter.inc`` contains an implementation of the ``printInstruction`` +method that may call these methods: + +* ``printOperand`` +* ``printMemOperand`` +* ``printCCOperand`` (for conditional statements) +* ``printDataDirective`` +* ``printDeclare`` +* ``printImplicitDef`` +* ``printInlineAsm`` + +The implementations of ``printDeclare``, ``printImplicitDef``, +``printInlineAsm``, and ``printLabel`` in ``AsmPrinter.cpp`` are generally +adequate for printing assembly and do not need to be overridden. + +The ``printOperand`` method is implemented with a long ``switch``/``case`` +statement for the type of operand: register, immediate, basic block, external +symbol, global address, constant pool index, or jump table index. For an +instruction with a memory address operand, the ``printMemOperand`` method +should be implemented to generate the proper output. Similarly, +``printCCOperand`` should be used to print a conditional operand. + +``doFinalization`` should be overridden in ``XXXAsmPrinter``, and it should be +called to shut down the assembly printer. During ``doFinalization``, global +variables and constants are printed to output. + +Subtarget Support +================= + +Subtarget support is used to inform the code generation process of instruction +set variations for a given chip set. For example, the LLVM SPARC +implementation provided covers three major versions of the SPARC microprocessor +architecture: Version 8 (V8, which is a 32-bit architecture), Version 9 (V9, a +64-bit architecture), and the UltraSPARC architecture. V8 has 16 +double-precision floating-point registers that are also usable as either 32 +single-precision or 8 quad-precision registers. V8 is also purely big-endian. +V9 has 32 double-precision floating-point registers that are also usable as 16 +quad-precision registers, but cannot be used as single-precision registers. +The UltraSPARC architecture combines V9 with UltraSPARC Visual Instruction Set +extensions. + +If subtarget support is needed, you should implement a target-specific +``XXXSubtarget`` class for your architecture. This class should process the +command-line options ``-mcpu=`` and ``-mattr=``. + +TableGen uses definitions in the ``Target.td`` and ``Sparc.td`` files to +generate code in ``SparcGenSubtarget.inc``. In ``Target.td``, shown below, the +``SubtargetFeature`` interface is defined. The first 4 string parameters of +the ``SubtargetFeature`` interface are a feature name, an attribute set by the +feature, the value of the attribute, and a description of the feature. (The +fifth parameter is a list of features whose presence is implied, and its +default value is an empty array.) + +.. code-block:: llvm + + class SubtargetFeature<string n, string a, string v, string d, + list<SubtargetFeature> i = []> { + string Name = n; + string Attribute = a; + string Value = v; + string Desc = d; + list<SubtargetFeature> Implies = i; + } + +In the ``Sparc.td`` file, the ``SubtargetFeature`` is used to define the +following features. + +.. 
code-block:: llvm + + def FeatureV9 : SubtargetFeature<"v9", "IsV9", "true", + "Enable SPARC-V9 instructions">; + def FeatureV8Deprecated : SubtargetFeature<"deprecated-v8", + "V8DeprecatedInsts", "true", + "Enable deprecated V8 instructions in V9 mode">; + def FeatureVIS : SubtargetFeature<"vis", "IsVIS", "true", + "Enable UltraSPARC Visual Instruction Set extensions">; + +Elsewhere in ``Sparc.td``, the ``Proc`` class is defined and then is used to +define particular SPARC processor subtypes that may have the previously +described features. + +.. code-block:: llvm + + class Proc<string Name, list<SubtargetFeature> Features> + : Processor<Name, NoItineraries, Features>; + + def : Proc<"generic", []>; + def : Proc<"v8", []>; + def : Proc<"supersparc", []>; + def : Proc<"sparclite", []>; + def : Proc<"f934", []>; + def : Proc<"hypersparc", []>; + def : Proc<"sparclite86x", []>; + def : Proc<"sparclet", []>; + def : Proc<"tsc701", []>; + def : Proc<"v9", [FeatureV9]>; + def : Proc<"ultrasparc", [FeatureV9, FeatureV8Deprecated]>; + def : Proc<"ultrasparc3", [FeatureV9, FeatureV8Deprecated]>; + def : Proc<"ultrasparc3-vis", [FeatureV9, FeatureV8Deprecated, FeatureVIS]>; + +From ``Target.td`` and ``Sparc.td`` files, the resulting +``SparcGenSubtarget.inc`` specifies enum values to identify the features, +arrays of constants to represent the CPU features and CPU subtypes, and the +``ParseSubtargetFeatures`` method that parses the features string that sets +specified subtarget options. The generated ``SparcGenSubtarget.inc`` file +should be included in the ``SparcSubtarget.cpp``. The target-specific +implementation of the ``XXXSubtarget`` method should follow this pseudocode: + +.. code-block:: c++ + + XXXSubtarget::XXXSubtarget(const Module &M, const std::string &FS) { + // Set the default features + // Determine default and user specified characteristics of the CPU + // Call ParseSubtargetFeatures(FS, CPU) to parse the features string + // Perform any additional operations + } + +JIT Support +=========== + +The implementation of a target machine optionally includes a Just-In-Time (JIT) +code generator that emits machine code and auxiliary structures as binary +output that can be written directly to memory. To do this, implement JIT code +generation by performing the following steps: + +* Write an ``XXXCodeEmitter.cpp`` file that contains a machine function pass + that transforms target-machine instructions into relocatable machine + code. + +* Write an ``XXXJITInfo.cpp`` file that implements the JIT interfaces for + target-specific code-generation activities, such as emitting machine code and + stubs. + +* Modify ``XXXTargetMachine`` so that it provides a ``TargetJITInfo`` object + through its ``getJITInfo`` method. + +There are several different approaches to writing the JIT support code. For +instance, TableGen and target descriptor files may be used for creating a JIT +code generator, but are not mandatory. For the Alpha and PowerPC target +machines, TableGen is used to generate ``XXXGenCodeEmitter.inc``, which +contains the binary coding of machine instructions and the +``getBinaryCodeForInstr`` method to access those codes. Other JIT +implementations do not. + +Both ``XXXJITInfo.cpp`` and ``XXXCodeEmitter.cpp`` must include the +``llvm/CodeGen/MachineCodeEmitter.h`` header file that defines the +``MachineCodeEmitter`` class containing code for several callback functions +that write data (in bytes, words, strings, etc.) to the output stream. 
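+
+As a rough sketch of the final step in the list above (providing a
+``TargetJITInfo`` object), the target machine might own its JIT info object and
+hand it out through ``getJITInfo``. The class bodies and member names below are
+illustrative assumptions, not code from a particular target:
+
+.. code-block:: c++
+
+  // Hypothetical illustration only; "XXX" is a placeholder target name.
+  class XXXJITInfo : public TargetJITInfo {
+    // Implements emitFunctionStub, relocate, getLazyResolverFunction, ...
+  };
+
+  class XXXTargetMachine : public LLVMTargetMachine {
+    XXXJITInfo JITInfo;  // Owned by the target machine.
+  public:
+    // The JIT queries this hook to reach the target's JIT interfaces.
+    virtual TargetJITInfo *getJITInfo() { return &JITInfo; }
+  };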
+
+Machine Code Emitter
+--------------------
+
+In ``XXXCodeEmitter.cpp``, a target-specific version of the ``Emitter`` class
+is implemented as a function pass (a subclass of ``MachineFunctionPass``). The
+target-specific implementation of ``runOnMachineFunction`` (invoked by
+``runOnFunction`` in ``MachineFunctionPass``) iterates through each
+``MachineBasicBlock`` and calls ``emitInstruction`` to process each instruction
+and emit binary code. ``emitInstruction`` is largely implemented with case
+statements on the instruction types defined in ``XXXInstrInfo.h``. For
+example, in ``X86CodeEmitter.cpp``, the ``emitInstruction`` method is built
+around the following ``switch``/``case`` statements:
+
+.. code-block:: c++
+
+  switch (Desc->TSFlags & X86::FormMask) {
+  case X86II::Pseudo:     // for not yet implemented instructions
+     ...                  // or pseudo-instructions
+     break;
+  case X86II::RawFrm:     // for instructions with a fixed opcode value
+     ...
+     break;
+  case X86II::AddRegFrm:  // for instructions that have one register operand
+     ...                  // added to their opcode
+     break;
+  case X86II::MRMDestReg: // for instructions that use the Mod/RM byte
+     ...                  // to specify a destination (register)
+     break;
+  case X86II::MRMDestMem: // for instructions that use the Mod/RM byte
+     ...                  // to specify a destination (memory)
+     break;
+  case X86II::MRMSrcReg:  // for instructions that use the Mod/RM byte
+     ...                  // to specify a source (register)
+     break;
+  case X86II::MRMSrcMem:  // for instructions that use the Mod/RM byte
+     ...                  // to specify a source (memory)
+     break;
+  case X86II::MRM0r: case X86II::MRM1r:  // for instructions that operate on
+  case X86II::MRM2r: case X86II::MRM3r:  // a REGISTER r/m operand and
+  case X86II::MRM4r: case X86II::MRM5r:  // use the Mod/RM byte and a field
+  case X86II::MRM6r: case X86II::MRM7r:  // to hold extended opcode data
+     ...
+     break;
+  case X86II::MRM0m: case X86II::MRM1m:  // for instructions that operate on
+  case X86II::MRM2m: case X86II::MRM3m:  // a MEMORY r/m operand and
+  case X86II::MRM4m: case X86II::MRM5m:  // use the Mod/RM byte and a field
+  case X86II::MRM6m: case X86II::MRM7m:  // to hold extended opcode data
+     ...
+     break;
+  case X86II::MRMInitReg: // for instructions whose source and
+     ...                  // destination are the same register
+     break;
+  }
+
+The implementations of these case statements often first emit the opcode and
+then get the operand(s). Then, depending upon the operand, helper methods may
+be called to process the operand(s). For example, in ``X86CodeEmitter.cpp``,
+for the ``X86II::AddRegFrm`` case, the first data emitted (by ``emitByte``) is
+the opcode added to the register operand. Then an object representing the
+machine operand, ``MO1``, is extracted. Helper methods such as
+``isImmediate``, ``isGlobalAddress``, ``isExternalSymbol``,
+``isConstantPoolIndex``, and ``isJumpTableIndex`` determine the operand type.
+(``X86CodeEmitter.cpp`` also has private methods such as ``emitConstant``,
+``emitGlobalAddress``, ``emitExternalSymbolAddress``, ``emitConstPoolAddress``,
+and ``emitJumpTableAddress`` that emit the data into the output stream.)
+
+.. code-block:: c++
+
+  case X86II::AddRegFrm:
+    MCE.emitByte(BaseOpcode + getX86RegNum(MI.getOperand(CurOp++).getReg()));
+
+    if (CurOp != NumOps) {
+      const MachineOperand &MO1 = MI.getOperand(CurOp++);
+      unsigned Size = X86InstrInfo::sizeOfImm(Desc);
+      if (MO1.isImmediate())
+        emitConstant(MO1.getImm(), Size);
+      else {
+        unsigned rt = Is64BitMode ? X86::reloc_pcrel_word
+          : (IsPIC ? X86::reloc_picrel_word : X86::reloc_absolute_word);
+        if (Opcode == X86::MOV64ri)
+          rt = X86::reloc_absolute_dword;  // FIXME: add X86II flag?
+        if (MO1.isGlobalAddress()) {
+          bool NeedStub = isa<Function>(MO1.getGlobal());
+          bool isLazy = gvNeedsLazyPtr(MO1.getGlobal());
+          emitGlobalAddress(MO1.getGlobal(), rt, MO1.getOffset(), 0,
+                            NeedStub, isLazy);
+        } else if (MO1.isExternalSymbol())
+          emitExternalSymbolAddress(MO1.getSymbolName(), rt);
+        else if (MO1.isConstantPoolIndex())
+          emitConstPoolAddress(MO1.getIndex(), rt);
+        else if (MO1.isJumpTableIndex())
+          emitJumpTableAddress(MO1.getIndex(), rt);
+      }
+    }
+    break;
+
+In the previous example, ``XXXCodeEmitter.cpp`` uses the variable ``rt``, which
+is a ``RelocationType`` enum that may be used to relocate addresses (for
+example, a global address with a PIC base offset). The ``RelocationType`` enum
+for that target is defined in the short target-specific ``XXXRelocations.h``
+file. The ``RelocationType`` is used by the ``relocate`` method defined in
+``XXXJITInfo.cpp`` to rewrite addresses for referenced global symbols.
+
+For example, ``X86Relocations.h`` specifies the following relocation types for
+X86 addresses. In all four cases, the relocated value is added to the value
+already in memory. For ``reloc_pcrel_word`` and ``reloc_picrel_word``, there
+is an additional initial adjustment.
+
+.. code-block:: c++
+
+  enum RelocationType {
+    reloc_pcrel_word = 0,    // add reloc value after adjusting for the PC loc
+    reloc_picrel_word = 1,   // add reloc value after adjusting for the PIC base
+    reloc_absolute_word = 2, // absolute relocation; no additional adjustment
+    reloc_absolute_dword = 3 // absolute relocation; no additional adjustment
+  };
+
+Target JIT Info
+---------------
+
+``XXXJITInfo.cpp`` implements the JIT interfaces for target-specific
+code-generation activities, such as emitting machine code and stubs. At
+minimum, a target-specific version of ``XXXJITInfo`` implements the following:
+
+* ``getLazyResolverFunction`` --- Initializes the JIT and gives the target a
+  function that is used for compilation.
+
+* ``emitFunctionStub`` --- Returns a native function with a specified address
+  for a callback function.
+
+* ``relocate`` --- Changes the addresses of referenced globals, based on
+  relocation types.
+
+* A callback function that wraps a function stub, used when the real target is
+  not initially known.
+
+``getLazyResolverFunction`` is generally trivial to implement. It stores the
+incoming parameter as the global ``JITCompilerFunction`` and returns the
+callback function that will be used as a function wrapper. For the Alpha
+target (in ``AlphaJITInfo.cpp``), the ``getLazyResolverFunction``
+implementation is simply:
+
+.. code-block:: c++
+
+  TargetJITInfo::LazyResolverFn AlphaJITInfo::getLazyResolverFunction(
+                                              JITCompilerFn F) {
+    JITCompilerFunction = F;
+    return AlphaCompilationCallback;
+  }
+
+For the X86 target, the ``getLazyResolverFunction`` implementation is a little
+more complicated, because it returns a different callback function for
+processors with SSE instructions and XMM registers.
+
+The callback function initially saves and later restores the callee register
+values, incoming arguments, and frame and return address. The callback
+function needs low-level access to the registers or stack, so it is typically
+implemented with assembler. 
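+
+To make the role of ``relocate`` more concrete, the following condensed sketch
+walks the relocation list and patches each referenced address in place. It is
+written in the spirit of the X86 implementation, but the ``XXX`` names, the
+parameter names, and the exact adjustments shown are assumptions for
+illustration; stub and PIC-base handling are omitted entirely.
+
+.. code-block:: c++
+
+  // Hypothetical sketch; the enum values mirror the X86Relocations.h example.
+  void XXXJITInfo::relocate(void *Function, MachineRelocation *MR,
+                            unsigned NumRelocs, unsigned char *GOTBase) {
+    for (unsigned i = 0; i != NumRelocs; ++i, ++MR) {
+      // Location in the emitted code that needs to be rewritten.
+      void *RelocPos = (char*)Function + MR->getMachineCodeOffset();
+      intptr_t ResultPtr = (intptr_t)MR->getResultPointer();
+      switch ((XXX::RelocationType)MR->getRelocationType()) {
+      case XXX::reloc_pcrel_word:
+        // Adjust for the PC location, then add to the value already in memory.
+        ResultPtr = ResultPtr - (intptr_t)RelocPos - 4;
+        *((unsigned*)RelocPos) += (unsigned)ResultPtr;
+        break;
+      case XXX::reloc_absolute_word:
+        // Absolute relocation: no additional adjustment.
+        *((unsigned*)RelocPos) += (unsigned)ResultPtr;
+        break;
+      // reloc_picrel_word and reloc_absolute_dword would be handled similarly.
+      default:
+        break;
+      }
+    }
+  }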
+ diff --git a/docs/conf.py b/docs/conf.py index a1e9b5f6e2..919bb3bc9d 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -47,9 +47,9 @@ copyright = u'2012, LLVM Project' # built documents. # # The short X.Y version. -version = '3.2' +version = '3.3' # The full version, including alpha/beta/rc tags. -release = '3.2' +release = '3.3' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. diff --git a/docs/design_and_overview.rst b/docs/design_and_overview.rst index ea684155e0..22e8125bb6 100644 --- a/docs/design_and_overview.rst +++ b/docs/design_and_overview.rst @@ -6,9 +6,10 @@ LLVM Design & Overview .. toctree:: :hidden: + LangRef GetElementPtr -* `LLVM Language Reference Manual <LangRef.html>`_ +* :doc:`LangRef` Defines the LLVM intermediate representation. diff --git a/docs/development_process.rst b/docs/development_process.rst index 4fc20b3412..ecd4c6a616 100644 --- a/docs/development_process.rst +++ b/docs/development_process.rst @@ -8,6 +8,8 @@ Development Process Documentation MakefileGuide Projects + LLVMBuild + HowToReleaseLLVM * :ref:`projects` @@ -16,7 +18,7 @@ Development Process Documentation tree) allow the project code to be located outside (or inside) the ``llvm/`` tree, while using LLVM header files and libraries. -* `LLVMBuild Documentation <LLVMBuild.html>`_ +* :doc:`LLVMBuild` Describes the LLVMBuild organization and files used by LLVM to specify component descriptions. @@ -25,6 +27,6 @@ Development Process Documentation Describes how the LLVM makefiles work and how to use them. -* `How To Release LLVM To The Public <HowToReleaseLLVM.html>`_ +* :doc:`HowToReleaseLLVM` This is a guide to preparing LLVM releases. Most developers can ignore it. diff --git a/docs/programming.rst b/docs/programming.rst index c4eec59417..3fea6ed427 100644 --- a/docs/programming.rst +++ b/docs/programming.rst @@ -12,6 +12,7 @@ Programming Documentation CompilerWriterInfo ExtendingLLVM HowToSetUpLLVMStyleRTTI + ProgrammersManual * `LLVM Language Reference Manual <LangRef.html>`_ @@ -22,7 +23,7 @@ Programming Documentation Information about LLVM's concurrency model. -* `The LLVM Programmers Manual <ProgrammersManual.html>`_ +* :doc:`ProgrammersManual` Introduction to the general layout of the LLVM sourcebase, important classes and APIs, and some tips & tricks. diff --git a/docs/subsystems.rst b/docs/subsystems.rst index 80d0eed663..275955be6e 100644 --- a/docs/subsystems.rst +++ b/docs/subsystems.rst @@ -18,13 +18,21 @@ Subsystem Documentation DebuggingJITedCode GoldPlugin MarkedUpDisassembly + HowToUseInstrMappings + SystemLibrary + SourceLevelDebugging + WritingAnLLVMBackend + GarbageCollection + +.. FIXME: once LangRef is Sphinxified, HowToUseInstrMappings should be put + under LangRef's toctree instead of this page's toctree. * `Writing an LLVM Pass <WritingAnLLVMPass.html>`_ Information on how to write LLVM transformations and analyses. -* `Writing an LLVM Backend <WritingAnLLVMBackend.html>`_ - +* :doc:`WritingAnLLVMBackend` + Information on how to write LLVM backends for machine targets. * :ref:`code_generator` @@ -42,13 +50,13 @@ Subsystem Documentation Information on how to write a new alias analysis implementation or how to use existing analyses. - -* `Accurate Garbage Collection with LLVM <GarbageCollection.html>`_ - + +* :doc:`GarbageCollection` + The interfaces source-language compilers should use for compiling GC'd programs. 
-* `Source Level Debugging with LLVM <SourceLevelDebugging.html>`_ +* :doc:`Source Level Debugging with LLVM <SourceLevelDebugging>` This document describes the design and philosophy behind the LLVM source-level debugger. @@ -67,9 +75,9 @@ Subsystem Documentation This describes the file format and encoding used for LLVM "bc" files. -* `System Library <SystemLibrary.html>`_ +* :doc:`System Library <SystemLibrary>` - This document describes the LLVM System Library (<tt>lib/System</tt>) and + This document describes the LLVM System Library (``lib/System``) and how to keep LLVM source code portable * :ref:`lto` diff --git a/docs/tutorial/LangImpl1.html b/docs/tutorial/LangImpl1.html deleted file mode 100644 index a65646f286..0000000000 --- a/docs/tutorial/LangImpl1.html +++ /dev/null @@ -1,348 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Tutorial Introduction and the Lexer</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <link rel="stylesheet" href="../_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>Kaleidoscope: Tutorial Introduction and the Lexer</h1> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 1 - <ol> - <li><a href="#intro">Tutorial Introduction</a></li> - <li><a href="#language">The Basic Language</a></li> - <li><a href="#lexer">The Lexer</a></li> - </ol> -</li> -<li><a href="LangImpl2.html">Chapter 2</a>: Implementing a Parser and AST</li> -</ul> - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="intro">Tutorial Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Welcome to the "Implementing a language with LLVM" tutorial. This tutorial -runs through the implementation of a simple language, showing how fun and -easy it can be. This tutorial will get you up and started as well as help to -build a framework you can extend to other languages. The code in this tutorial -can also be used as a playground to hack on other LLVM specific things. -</p> - -<p> -The goal of this tutorial is to progressively unveil our language, describing -how it is built up over time. This will let us cover a fairly broad range of -language design and LLVM-specific usage issues, showing and explaining the code -for it all along the way, without overwhelming you with tons of details up -front.</p> - -<p>It is useful to point out ahead of time that this tutorial is really about -teaching compiler techniques and LLVM specifically, <em>not</em> about teaching -modern and sane software engineering principles. In practice, this means that -we'll take a number of shortcuts to simplify the exposition. For example, the -code leaks memory, uses global variables all over the place, doesn't use nice -design patterns like <a -href="http://en.wikipedia.org/wiki/Visitor_pattern">visitors</a>, etc... but it -is very simple. If you dig in and use the code as a basis for future projects, -fixing these deficiencies shouldn't be hard.</p> - -<p>I've tried to put this tutorial together in a way that makes chapters easy to -skip over if you are already familiar with or are uninterested in the various -pieces. 
The structure of the tutorial is: -</p> - -<ul> -<li><b><a href="#language">Chapter #1</a>: Introduction to the Kaleidoscope -language, and the definition of its Lexer</b> - This shows where we are going -and the basic functionality that we want it to do. In order to make this -tutorial maximally understandable and hackable, we choose to implement -everything in C++ instead of using lexer and parser generators. LLVM obviously -works just fine with such tools, feel free to use one if you prefer.</li> -<li><b><a href="LangImpl2.html">Chapter #2</a>: Implementing a Parser and -AST</b> - With the lexer in place, we can talk about parsing techniques and -basic AST construction. This tutorial describes recursive descent parsing and -operator precedence parsing. Nothing in Chapters 1 or 2 is LLVM-specific, -the code doesn't even link in LLVM at this point. :)</li> -<li><b><a href="LangImpl3.html">Chapter #3</a>: Code generation to LLVM IR</b> - -With the AST ready, we can show off how easy generation of LLVM IR really -is.</li> -<li><b><a href="LangImpl4.html">Chapter #4</a>: Adding JIT and Optimizer -Support</b> - Because a lot of people are interested in using LLVM as a JIT, -we'll dive right into it and show you the 3 lines it takes to add JIT support. -LLVM is also useful in many other ways, but this is one simple and "sexy" way -to shows off its power. :)</li> -<li><b><a href="LangImpl5.html">Chapter #5</a>: Extending the Language: Control -Flow</b> - With the language up and running, we show how to extend it with -control flow operations (if/then/else and a 'for' loop). This gives us a chance -to talk about simple SSA construction and control flow.</li> -<li><b><a href="LangImpl6.html">Chapter #6</a>: Extending the Language: -User-defined Operators</b> - This is a silly but fun chapter that talks about -extending the language to let the user program define their own arbitrary -unary and binary operators (with assignable precedence!). This lets us build a -significant piece of the "language" as library routines.</li> -<li><b><a href="LangImpl7.html">Chapter #7</a>: Extending the Language: Mutable -Variables</b> - This chapter talks about adding user-defined local variables -along with an assignment operator. The interesting part about this is how -easy and trivial it is to construct SSA form in LLVM: no, LLVM does <em>not</em> -require your front-end to construct SSA form!</li> -<li><b><a href="LangImpl8.html">Chapter #8</a>: Conclusion and other useful LLVM -tidbits</b> - This chapter wraps up the series by talking about potential -ways to extend the language, but also includes a bunch of pointers to info about -"special topics" like adding garbage collection support, exceptions, debugging, -support for "spaghetti stacks", and a bunch of other tips and tricks.</li> - -</ul> - -<p>By the end of the tutorial, we'll have written a bit less than 700 lines of -non-comment, non-blank, lines of code. With this small amount of code, we'll -have built up a very reasonable compiler for a non-trivial language including -a hand-written lexer, parser, AST, as well as code generation support with a JIT -compiler. While other systems may have interesting "hello world" tutorials, -I think the breadth of this tutorial is a great testament to the strengths of -LLVM and why you should consider it if you're interested in language or compiler -design.</p> - -<p>A note about this tutorial: we expect you to extend the language and play -with it on your own. 
Take the code and go crazy hacking away at it, compilers -don't need to be scary creatures - it can be a lot of fun to play with -languages!</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="language">The Basic Language</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>This tutorial will be illustrated with a toy language that we'll call -"<a href="http://en.wikipedia.org/wiki/Kaleidoscope">Kaleidoscope</a>" (derived -from "meaning beautiful, form, and view"). -Kaleidoscope is a procedural language that allows you to define functions, use -conditionals, math, etc. Over the course of the tutorial, we'll extend -Kaleidoscope to support the if/then/else construct, a for loop, user defined -operators, JIT compilation with a simple command line interface, etc.</p> - -<p>Because we want to keep things simple, the only datatype in Kaleidoscope is a -64-bit floating point type (aka 'double' in C parlance). As such, all values -are implicitly double precision and the language doesn't require type -declarations. This gives the language a very nice and simple syntax. For -example, the following simple example computes <a -href="http://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci numbers:</a></p> - -<div class="doc_code"> -<pre> -# Compute the x'th fibonacci number. -def fib(x) - if x < 3 then - 1 - else - fib(x-1)+fib(x-2) - -# This expression will compute the 40th number. -fib(40) -</pre> -</div> - -<p>We also allow Kaleidoscope to call into standard library functions (the LLVM -JIT makes this completely trivial). This means that you can use the 'extern' -keyword to define a function before you use it (this is also useful for mutually -recursive functions). For example:</p> - -<div class="doc_code"> -<pre> -extern sin(arg); -extern cos(arg); -extern atan2(arg1 arg2); - -atan2(sin(.4), cos(42)) -</pre> -</div> - -<p>A more interesting example is included in Chapter 6 where we write a little -Kaleidoscope application that <a href="LangImpl6.html#example">displays -a Mandelbrot Set</a> at various levels of magnification.</p> - -<p>Lets dive into the implementation of this language!</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="lexer">The Lexer</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>When it comes to implementing a language, the first thing needed is -the ability to process a text file and recognize what it says. The traditional -way to do this is to use a "<a -href="http://en.wikipedia.org/wiki/Lexical_analysis">lexer</a>" (aka 'scanner') -to break the input up into "tokens". Each token returned by the lexer includes -a token code and potentially some metadata (e.g. the numeric value of a number). -First, we define the possibilities: -</p> - -<div class="doc_code"> -<pre> -// The lexer returns tokens [0-255] if it is an unknown character, otherwise one -// of these for known things. -enum Token { - tok_eof = -1, - - // commands - tok_def = -2, tok_extern = -3, - - // primary - tok_identifier = -4, tok_number = -5, -}; - -static std::string IdentifierStr; // Filled in if tok_identifier -static double NumVal; // Filled in if tok_number -</pre> -</div> - -<p>Each token returned by our lexer will either be one of the Token enum values -or it will be an 'unknown' character like '+', which is returned as its ASCII -value. 
If the current token is an identifier, the <tt>IdentifierStr</tt> -global variable holds the name of the identifier. If the current token is a -numeric literal (like 1.0), <tt>NumVal</tt> holds its value. Note that we use -global variables for simplicity, this is not the best choice for a real language -implementation :). -</p> - -<p>The actual implementation of the lexer is a single function named -<tt>gettok</tt>. The <tt>gettok</tt> function is called to return the next token -from standard input. Its definition starts as:</p> - -<div class="doc_code"> -<pre> -/// gettok - Return the next token from standard input. -static int gettok() { - static int LastChar = ' '; - - // Skip any whitespace. - while (isspace(LastChar)) - LastChar = getchar(); -</pre> -</div> - -<p> -<tt>gettok</tt> works by calling the C <tt>getchar()</tt> function to read -characters one at a time from standard input. It eats them as it recognizes -them and stores the last character read, but not processed, in LastChar. The -first thing that it has to do is ignore whitespace between tokens. This is -accomplished with the loop above.</p> - -<p>The next thing <tt>gettok</tt> needs to do is recognize identifiers and -specific keywords like "def". Kaleidoscope does this with this simple loop:</p> - -<div class="doc_code"> -<pre> - if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* - IdentifierStr = LastChar; - while (isalnum((LastChar = getchar()))) - IdentifierStr += LastChar; - - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - return tok_identifier; - } -</pre> -</div> - -<p>Note that this code sets the '<tt>IdentifierStr</tt>' global whenever it -lexes an identifier. Also, since language keywords are matched by the same -loop, we handle them here inline. Numeric values are similar:</p> - -<div class="doc_code"> -<pre> - if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ - std::string NumStr; - do { - NumStr += LastChar; - LastChar = getchar(); - } while (isdigit(LastChar) || LastChar == '.'); - - NumVal = strtod(NumStr.c_str(), 0); - return tok_number; - } -</pre> -</div> - -<p>This is all pretty straight-forward code for processing input. When reading -a numeric value from input, we use the C <tt>strtod</tt> function to convert it -to a numeric value that we store in <tt>NumVal</tt>. Note that this isn't doing -sufficient error checking: it will incorrectly read "1.23.45.67" and handle it as -if you typed in "1.23". Feel free to extend it :). Next we handle comments: -</p> - -<div class="doc_code"> -<pre> - if (LastChar == '#') { - // Comment until end of line. - do LastChar = getchar(); - while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); - - if (LastChar != EOF) - return gettok(); - } -</pre> -</div> - -<p>We handle comments by skipping to the end of the line and then return the -next token. Finally, if the input doesn't match one of the above cases, it is -either an operator character like '+' or the end of the file. These are handled -with this code:</p> - -<div class="doc_code"> -<pre> - // Check for end of file. Don't eat the EOF. - if (LastChar == EOF) - return tok_eof; - - // Otherwise, just return the character as its ascii value. 
- int ThisChar = LastChar; - LastChar = getchar(); - return ThisChar; -} -</pre> -</div> - -<p>With this, we have the complete lexer for the basic Kaleidoscope language -(the <a href="LangImpl2.html#code">full code listing</a> for the Lexer is -available in the <a href="LangImpl2.html">next chapter</a> of the tutorial). -Next we'll <a href="LangImpl2.html">build a simple parser that uses this to -build an Abstract Syntax Tree</a>. When we have that, we'll include a driver -so that you can use the lexer and parser together. -</p> - -<a href="LangImpl2.html">Next: Implementing a Parser and AST</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/LangImpl1.rst b/docs/tutorial/LangImpl1.rst new file mode 100644 index 0000000000..eb84e4c923 --- /dev/null +++ b/docs/tutorial/LangImpl1.rst @@ -0,0 +1,280 @@ +================================================= +Kaleidoscope: Tutorial Introduction and the Lexer +================================================= + +.. contents:: + :local: + +Written by `Chris Lattner <mailto:sabre@nondot.org>`_ + +Tutorial Introduction +===================== + +Welcome to the "Implementing a language with LLVM" tutorial. This +tutorial runs through the implementation of a simple language, showing +how fun and easy it can be. This tutorial will get you up and started as +well as help to build a framework you can extend to other languages. The +code in this tutorial can also be used as a playground to hack on other +LLVM specific things. + +The goal of this tutorial is to progressively unveil our language, +describing how it is built up over time. This will let us cover a fairly +broad range of language design and LLVM-specific usage issues, showing +and explaining the code for it all along the way, without overwhelming +you with tons of details up front. + +It is useful to point out ahead of time that this tutorial is really +about teaching compiler techniques and LLVM specifically, *not* about +teaching modern and sane software engineering principles. In practice, +this means that we'll take a number of shortcuts to simplify the +exposition. For example, the code leaks memory, uses global variables +all over the place, doesn't use nice design patterns like +`visitors <http://en.wikipedia.org/wiki/Visitor_pattern>`_, etc... but +it is very simple. If you dig in and use the code as a basis for future +projects, fixing these deficiencies shouldn't be hard. + +I've tried to put this tutorial together in a way that makes chapters +easy to skip over if you are already familiar with or are uninterested +in the various pieces. The structure of the tutorial is: + +- `Chapter #1 <#language>`_: Introduction to the Kaleidoscope + language, and the definition of its Lexer - This shows where we are + going and the basic functionality that we want it to do. In order to + make this tutorial maximally understandable and hackable, we choose + to implement everything in C++ instead of using lexer and parser + generators. 
LLVM obviously works just fine with such tools, feel free + to use one if you prefer. +- `Chapter #2 <LangImpl2.html>`_: Implementing a Parser and AST - + With the lexer in place, we can talk about parsing techniques and + basic AST construction. This tutorial describes recursive descent + parsing and operator precedence parsing. Nothing in Chapters 1 or 2 + is LLVM-specific, the code doesn't even link in LLVM at this point. + :) +- `Chapter #3 <LangImpl3.html>`_: Code generation to LLVM IR - With + the AST ready, we can show off how easy generation of LLVM IR really + is. +- `Chapter #4 <LangImpl4.html>`_: Adding JIT and Optimizer Support + - Because a lot of people are interested in using LLVM as a JIT, + we'll dive right into it and show you the 3 lines it takes to add JIT + support. LLVM is also useful in many other ways, but this is one + simple and "sexy" way to shows off its power. :) +- `Chapter #5 <LangImpl5.html>`_: Extending the Language: Control + Flow - With the language up and running, we show how to extend it + with control flow operations (if/then/else and a 'for' loop). This + gives us a chance to talk about simple SSA construction and control + flow. +- `Chapter #6 <LangImpl6.html>`_: Extending the Language: + User-defined Operators - This is a silly but fun chapter that talks + about extending the language to let the user program define their own + arbitrary unary and binary operators (with assignable precedence!). + This lets us build a significant piece of the "language" as library + routines. +- `Chapter #7 <LangImpl7.html>`_: Extending the Language: Mutable + Variables - This chapter talks about adding user-defined local + variables along with an assignment operator. The interesting part + about this is how easy and trivial it is to construct SSA form in + LLVM: no, LLVM does *not* require your front-end to construct SSA + form! +- `Chapter #8 <LangImpl8.html>`_: Conclusion and other useful LLVM + tidbits - This chapter wraps up the series by talking about + potential ways to extend the language, but also includes a bunch of + pointers to info about "special topics" like adding garbage + collection support, exceptions, debugging, support for "spaghetti + stacks", and a bunch of other tips and tricks. + +By the end of the tutorial, we'll have written a bit less than 700 lines +of non-comment, non-blank, lines of code. With this small amount of +code, we'll have built up a very reasonable compiler for a non-trivial +language including a hand-written lexer, parser, AST, as well as code +generation support with a JIT compiler. While other systems may have +interesting "hello world" tutorials, I think the breadth of this +tutorial is a great testament to the strengths of LLVM and why you +should consider it if you're interested in language or compiler design. + +A note about this tutorial: we expect you to extend the language and +play with it on your own. Take the code and go crazy hacking away at it, +compilers don't need to be scary creatures - it can be a lot of fun to +play with languages! + +The Basic Language +================== + +This tutorial will be illustrated with a toy language that we'll call +"`Kaleidoscope <http://en.wikipedia.org/wiki/Kaleidoscope>`_" (derived +from "meaning beautiful, form, and view"). Kaleidoscope is a procedural +language that allows you to define functions, use conditionals, math, +etc. 
Over the course of the tutorial, we'll extend Kaleidoscope to +support the if/then/else construct, a for loop, user defined operators, +JIT compilation with a simple command line interface, etc. + +Because we want to keep things simple, the only datatype in Kaleidoscope +is a 64-bit floating point type (aka 'double' in C parlance). As such, +all values are implicitly double precision and the language doesn't +require type declarations. This gives the language a very nice and +simple syntax. For example, the following simple example computes +`Fibonacci numbers: <http://en.wikipedia.org/wiki/Fibonacci_number>`_ + +:: + + # Compute the x'th fibonacci number. + def fib(x) + if x < 3 then + 1 + else + fib(x-1)+fib(x-2) + + # This expression will compute the 40th number. + fib(40) + +We also allow Kaleidoscope to call into standard library functions (the +LLVM JIT makes this completely trivial). This means that you can use the +'extern' keyword to define a function before you use it (this is also +useful for mutually recursive functions). For example: + +:: + + extern sin(arg); + extern cos(arg); + extern atan2(arg1 arg2); + + atan2(sin(.4), cos(42)) + +A more interesting example is included in Chapter 6 where we write a +little Kaleidoscope application that `displays a Mandelbrot +Set <LangImpl6.html#example>`_ at various levels of magnification. + +Lets dive into the implementation of this language! + +The Lexer +========= + +When it comes to implementing a language, the first thing needed is the +ability to process a text file and recognize what it says. The +traditional way to do this is to use a +"`lexer <http://en.wikipedia.org/wiki/Lexical_analysis>`_" (aka +'scanner') to break the input up into "tokens". Each token returned by +the lexer includes a token code and potentially some metadata (e.g. the +numeric value of a number). First, we define the possibilities: + +.. code-block:: c++ + + // The lexer returns tokens [0-255] if it is an unknown character, otherwise one + // of these for known things. + enum Token { + tok_eof = -1, + + // commands + tok_def = -2, tok_extern = -3, + + // primary + tok_identifier = -4, tok_number = -5, + }; + + static std::string IdentifierStr; // Filled in if tok_identifier + static double NumVal; // Filled in if tok_number + +Each token returned by our lexer will either be one of the Token enum +values or it will be an 'unknown' character like '+', which is returned +as its ASCII value. If the current token is an identifier, the +``IdentifierStr`` global variable holds the name of the identifier. If +the current token is a numeric literal (like 1.0), ``NumVal`` holds its +value. Note that we use global variables for simplicity, this is not the +best choice for a real language implementation :). + +The actual implementation of the lexer is a single function named +``gettok``. The ``gettok`` function is called to return the next token +from standard input. Its definition starts as: + +.. code-block:: c++ + + /// gettok - Return the next token from standard input. + static int gettok() { + static int LastChar = ' '; + + // Skip any whitespace. + while (isspace(LastChar)) + LastChar = getchar(); + +``gettok`` works by calling the C ``getchar()`` function to read +characters one at a time from standard input. It eats them as it +recognizes them and stores the last character read, but not processed, +in LastChar. The first thing that it has to do is ignore whitespace +between tokens. This is accomplished with the loop above. 
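+
+As a small aside (not part of the tutorial's code listing), this is also why
+``LastChar`` is declared ``static``: the character that terminates one token
+is the first character the *next* call to ``gettok`` has to look at, so it
+must survive between calls. The tiny standalone program below shows the same
+read-one-character-ahead idea in isolation; the use of ``std::istringstream``
+is just an illustrative choice so the behavior is easy to observe.
+
+.. code-block:: c++
+
+    #include <cctype>
+    #include <iostream>
+    #include <sstream>
+
+    // Same pattern as gettok's whitespace loop, but reading from a string:
+    // after the loop, LastChar already holds the first character of the next
+    // token, which is exactly what the identifier and number code shown next
+    // relies on.
+    int main() {
+      std::istringstream In("   def");
+      int LastChar = ' ';
+
+      // Skip any whitespace.
+      while (isspace(LastChar))
+        LastChar = In.get();
+
+      // Prints: next token starts with 'd'
+      std::cout << "next token starts with '" << (char)LastChar << "'\n";
+      return 0;
+    }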
+ +The next thing ``gettok`` needs to do is recognize identifiers and +specific keywords like "def". Kaleidoscope does this with this simple +loop: + +.. code-block:: c++ + + if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* + IdentifierStr = LastChar; + while (isalnum((LastChar = getchar()))) + IdentifierStr += LastChar; + + if (IdentifierStr == "def") return tok_def; + if (IdentifierStr == "extern") return tok_extern; + return tok_identifier; + } + +Note that this code sets the '``IdentifierStr``' global whenever it +lexes an identifier. Also, since language keywords are matched by the +same loop, we handle them here inline. Numeric values are similar: + +.. code-block:: c++ + + if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ + std::string NumStr; + do { + NumStr += LastChar; + LastChar = getchar(); + } while (isdigit(LastChar) || LastChar == '.'); + + NumVal = strtod(NumStr.c_str(), 0); + return tok_number; + } + +This is all pretty straight-forward code for processing input. When +reading a numeric value from input, we use the C ``strtod`` function to +convert it to a numeric value that we store in ``NumVal``. Note that +this isn't doing sufficient error checking: it will incorrectly read +"1.23.45.67" and handle it as if you typed in "1.23". Feel free to +extend it :). Next we handle comments: + +.. code-block:: c++ + + if (LastChar == '#') { + // Comment until end of line. + do LastChar = getchar(); + while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); + + if (LastChar != EOF) + return gettok(); + } + +We handle comments by skipping to the end of the line and then return +the next token. Finally, if the input doesn't match one of the above +cases, it is either an operator character like '+' or the end of the +file. These are handled with this code: + +.. code-block:: c++ + + // Check for end of file. Don't eat the EOF. + if (LastChar == EOF) + return tok_eof; + + // Otherwise, just return the character as its ascii value. + int ThisChar = LastChar; + LastChar = getchar(); + return ThisChar; + } + +With this, we have the complete lexer for the basic Kaleidoscope +language (the `full code listing <LangImpl2.html#code>`_ for the Lexer +is available in the `next chapter <LangImpl2.html>`_ of the tutorial). +Next we'll `build a simple parser that uses this to build an Abstract +Syntax Tree <LangImpl2.html>`_. When we have that, we'll include a +driver so that you can use the lexer and parser together. 
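+
+If you would like to play with the lexer on its own before then, a throwaway
+driver along the lines of the one below will do. It is not part of the
+tutorial's code listing, and it assumes the ``Token`` enum, the
+``IdentifierStr``/``NumVal`` globals and ``gettok`` from this chapter are
+pasted above it (together with ``<cstdio>`` and ``<string>``).
+
+.. code-block:: c++
+
+    // Throwaway driver: read tokens from standard input until EOF and print
+    // what the lexer makes of each one.
+    int main() {
+      while (1) {
+        int Tok = gettok();
+        switch (Tok) {
+        case tok_eof:        return 0;
+        case tok_def:        fprintf(stderr, "keyword: def\n"); break;
+        case tok_extern:     fprintf(stderr, "keyword: extern\n"); break;
+        case tok_identifier: fprintf(stderr, "identifier: %s\n",
+                                     IdentifierStr.c_str()); break;
+        case tok_number:     fprintf(stderr, "number: %g\n", NumVal); break;
+        default:             fprintf(stderr, "char: '%c'\n", (char)Tok); break;
+        }
+      }
+    }
+
+Typing a line such as ``def foo(x) x*2;`` at it prints each keyword,
+identifier, number, and punctuation character in the order the lexer sees
+them.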
+ +`Next: Implementing a Parser and AST <LangImpl2.html>`_ + diff --git a/docs/tutorial/LangImpl2.html b/docs/tutorial/LangImpl2.html deleted file mode 100644 index 292dd4e516..0000000000 --- a/docs/tutorial/LangImpl2.html +++ /dev/null @@ -1,1231 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Implementing a Parser and AST</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <link rel="stylesheet" href="../_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>Kaleidoscope: Implementing a Parser and AST</h1> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 2 - <ol> - <li><a href="#intro">Chapter 2 Introduction</a></li> - <li><a href="#ast">The Abstract Syntax Tree (AST)</a></li> - <li><a href="#parserbasics">Parser Basics</a></li> - <li><a href="#parserprimexprs">Basic Expression Parsing</a></li> - <li><a href="#parserbinops">Binary Expression Parsing</a></li> - <li><a href="#parsertop">Parsing the Rest</a></li> - <li><a href="#driver">The Driver</a></li> - <li><a href="#conclusions">Conclusions</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="LangImpl3.html">Chapter 3</a>: Code generation to LLVM IR</li> -</ul> - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="intro">Chapter 2 Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Welcome to Chapter 2 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. This chapter shows you how to use the lexer, built in -<a href="LangImpl1.html">Chapter 1</a>, to build a full <a -href="http://en.wikipedia.org/wiki/Parsing">parser</a> for -our Kaleidoscope language. Once we have a parser, we'll define and build an <a -href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax -Tree</a> (AST).</p> - -<p>The parser we will build uses a combination of <a -href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent -Parsing</a> and <a href= -"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence -Parsing</a> to parse the Kaleidoscope language (the latter for -binary expressions and the former for everything else). Before we get to -parsing though, lets talk about the output of the parser: the Abstract Syntax -Tree.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="ast">The Abstract Syntax Tree (AST)</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>The AST for a program captures its behavior in such a way that it is easy for -later stages of the compiler (e.g. code generation) to interpret. We basically -want one object for each construct in the language, and the AST should closely -model the language. In Kaleidoscope, we have expressions, a prototype, and a -function object. We'll start with expressions first:</p> - -<div class="doc_code"> -<pre> -/// ExprAST - Base class for all expression nodes. -class ExprAST { -public: - virtual ~ExprAST() {} -}; - -/// NumberExprAST - Expression class for numeric literals like "1.0". 
-class NumberExprAST : public ExprAST { - double Val; -public: - NumberExprAST(double val) : Val(val) {} -}; -</pre> -</div> - -<p>The code above shows the definition of the base ExprAST class and one -subclass which we use for numeric literals. The important thing to note about -this code is that the NumberExprAST class captures the numeric value of the -literal as an instance variable. This allows later phases of the compiler to -know what the stored numeric value is.</p> - -<p>Right now we only create the AST, so there are no useful accessor methods on -them. It would be very easy to add a virtual method to pretty print the code, -for example. Here are the other expression AST node definitions that we'll use -in the basic form of the Kaleidoscope language: -</p> - -<div class="doc_code"> -<pre> -/// VariableExprAST - Expression class for referencing a variable, like "a". -class VariableExprAST : public ExprAST { - std::string Name; -public: - VariableExprAST(const std::string &name) : Name(name) {} -}; - -/// BinaryExprAST - Expression class for a binary operator. -class BinaryExprAST : public ExprAST { - char Op; - ExprAST *LHS, *RHS; -public: - BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) - : Op(op), LHS(lhs), RHS(rhs) {} -}; - -/// CallExprAST - Expression class for function calls. -class CallExprAST : public ExprAST { - std::string Callee; - std::vector<ExprAST*> Args; -public: - CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) - : Callee(callee), Args(args) {} -}; -</pre> -</div> - -<p>This is all (intentionally) rather straight-forward: variables capture the -variable name, binary operators capture their opcode (e.g. '+'), and calls -capture a function name as well as a list of any argument expressions. One thing -that is nice about our AST is that it captures the language features without -talking about the syntax of the language. Note that there is no discussion about -precedence of binary operators, lexical structure, etc.</p> - -<p>For our basic language, these are all of the expression nodes we'll define. -Because it doesn't have conditional control flow, it isn't Turing-complete; -we'll fix that in a later installment. The two things we need next are a way -to talk about the interface to a function, and a way to talk about functions -themselves:</p> - -<div class="doc_code"> -<pre> -/// PrototypeAST - This class represents the "prototype" for a function, -/// which captures its name, and its argument names (thus implicitly the number -/// of arguments the function takes). -class PrototypeAST { - std::string Name; - std::vector<std::string> Args; -public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args) - : Name(name), Args(args) {} -}; - -/// FunctionAST - This class represents a function definition itself. -class FunctionAST { - PrototypeAST *Proto; - ExprAST *Body; -public: - FunctionAST(PrototypeAST *proto, ExprAST *body) - : Proto(proto), Body(body) {} -}; -</pre> -</div> - -<p>In Kaleidoscope, functions are typed with just a count of their arguments. -Since all values are double precision floating point, the type of each argument -doesn't need to be stored anywhere. 
In a more aggressive and realistic -language, the "ExprAST" class would probably have a type field.</p> - -<p>With this scaffolding, we can now talk about parsing expressions and function -bodies in Kaleidoscope.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="parserbasics">Parser Basics</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Now that we have an AST to build, we need to define the parser code to build -it. The idea here is that we want to parse something like "x+y" (which is -returned as three tokens by the lexer) into an AST that could be generated with -calls like this:</p> - -<div class="doc_code"> -<pre> - ExprAST *X = new VariableExprAST("x"); - ExprAST *Y = new VariableExprAST("y"); - ExprAST *Result = new BinaryExprAST('+', X, Y); -</pre> -</div> - -<p>In order to do this, we'll start by defining some basic helper routines:</p> - -<div class="doc_code"> -<pre> -/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current -/// token the parser is looking at. getNextToken reads another token from the -/// lexer and updates CurTok with its results. -static int CurTok; -static int getNextToken() { - return CurTok = gettok(); -} -</pre> -</div> - -<p> -This implements a simple token buffer around the lexer. This allows -us to look one token ahead at what the lexer is returning. Every function in -our parser will assume that CurTok is the current token that needs to be -parsed.</p> - -<div class="doc_code"> -<pre> - -/// Error* - These are little helper functions for error handling. -ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} -PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } -FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } -</pre> -</div> - -<p> -The <tt>Error</tt> routines are simple helper routines that our parser will use -to handle errors. The error recovery in our parser will not be the best and -is not particular user-friendly, but it will be enough for our tutorial. These -routines make it easier to handle errors in routines that have various return -types: they always return null.</p> - -<p>With these basic helper functions, we can implement the first -piece of our grammar: numeric literals.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="parserprimexprs">Basic Expression Parsing</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>We start with numeric literals, because they are the simplest to process. -For each production in our grammar, we'll define a function which parses that -production. For numeric literals, we have: -</p> - -<div class="doc_code"> -<pre> -/// numberexpr ::= number -static ExprAST *ParseNumberExpr() { - ExprAST *Result = new NumberExprAST(NumVal); - getNextToken(); // consume the number - return Result; -} -</pre> -</div> - -<p>This routine is very simple: it expects to be called when the current token -is a <tt>tok_number</tt> token. It takes the current number value, creates -a <tt>NumberExprAST</tt> node, advances the lexer to the next token, and finally -returns.</p> - -<p>There are some interesting aspects to this. The most important one is that -this routine eats all of the tokens that correspond to the production and -returns the lexer buffer with the next token (which is not part of the grammar -production) ready to go. 
This is a fairly standard way to go for recursive -descent parsers. For a better example, the parenthesis operator is defined like -this:</p> - -<div class="doc_code"> -<pre> -/// parenexpr ::= '(' expression ')' -static ExprAST *ParseParenExpr() { - getNextToken(); // eat (. - ExprAST *V = ParseExpression(); - if (!V) return 0; - - if (CurTok != ')') - return Error("expected ')'"); - getNextToken(); // eat ). - return V; -} -</pre> -</div> - -<p>This function illustrates a number of interesting things about the -parser:</p> - -<p> -1) It shows how we use the Error routines. When called, this function expects -that the current token is a '(' token, but after parsing the subexpression, it -is possible that there is no ')' waiting. For example, if the user types in -"(4 x" instead of "(4)", the parser should emit an error. Because errors can -occur, the parser needs a way to indicate that they happened: in our parser, we -return null on an error.</p> - -<p>2) Another interesting aspect of this function is that it uses recursion by -calling <tt>ParseExpression</tt> (we will soon see that <tt>ParseExpression</tt> can call -<tt>ParseParenExpr</tt>). This is powerful because it allows us to handle -recursive grammars, and keeps each production very simple. Note that -parentheses do not cause construction of AST nodes themselves. While we could -do it this way, the most important role of parentheses are to guide the parser -and provide grouping. Once the parser constructs the AST, parentheses are not -needed.</p> - -<p>The next simple production is for handling variable references and function -calls:</p> - -<div class="doc_code"> -<pre> -/// identifierexpr -/// ::= identifier -/// ::= identifier '(' expression* ')' -static ExprAST *ParseIdentifierExpr() { - std::string IdName = IdentifierStr; - - getNextToken(); // eat identifier. - - if (CurTok != '(') // Simple variable ref. - return new VariableExprAST(IdName); - - // Call. - getNextToken(); // eat ( - std::vector<ExprAST*> Args; - if (CurTok != ')') { - while (1) { - ExprAST *Arg = ParseExpression(); - if (!Arg) return 0; - Args.push_back(Arg); - - if (CurTok == ')') break; - - if (CurTok != ',') - return Error("Expected ')' or ',' in argument list"); - getNextToken(); - } - } - - // Eat the ')'. - getNextToken(); - - return new CallExprAST(IdName, Args); -} -</pre> -</div> - -<p>This routine follows the same style as the other routines. (It expects to be -called if the current token is a <tt>tok_identifier</tt> token). It also has -recursion and error handling. One interesting aspect of this is that it uses -<em>look-ahead</em> to determine if the current identifier is a stand alone -variable reference or if it is a function call expression. It handles this by -checking to see if the token after the identifier is a '(' token, constructing -either a <tt>VariableExprAST</tt> or <tt>CallExprAST</tt> node as appropriate. -</p> - -<p>Now that we have all of our simple expression-parsing logic in place, we can -define a helper function to wrap it together into one entry point. We call this -class of expressions "primary" expressions, for reasons that will become more -clear <a href="LangImpl6.html#unary">later in the tutorial</a>. 
In order to -parse an arbitrary primary expression, we need to determine what sort of -expression it is:</p> - -<div class="doc_code"> -<pre> -/// primary -/// ::= identifierexpr -/// ::= numberexpr -/// ::= parenexpr -static ExprAST *ParsePrimary() { - switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - } -} -</pre> -</div> - -<p>Now that you see the definition of this function, it is more obvious why we -can assume the state of CurTok in the various functions. This uses look-ahead -to determine which sort of expression is being inspected, and then parses it -with a function call.</p> - -<p>Now that basic expressions are handled, we need to handle binary expressions. -They are a bit more complex.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="parserbinops">Binary Expression Parsing</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Binary expressions are significantly harder to parse because they are often -ambiguous. For example, when given the string "x+y*z", the parser can choose -to parse it as either "(x+y)*z" or "x+(y*z)". With common definitions from -mathematics, we expect the later parse, because "*" (multiplication) has -higher <em>precedence</em> than "+" (addition).</p> - -<p>There are many ways to handle this, but an elegant and efficient way is to -use <a href= -"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence -Parsing</a>. This parsing technique uses the precedence of binary operators to -guide recursion. To start with, we need a table of precedences:</p> - -<div class="doc_code"> -<pre> -/// BinopPrecedence - This holds the precedence for each binary operator that is -/// defined. -static std::map<char, int> BinopPrecedence; - -/// GetTokPrecedence - Get the precedence of the pending binary operator token. -static int GetTokPrecedence() { - if (!isascii(CurTok)) - return -1; - - // Make sure it's a declared binop. - int TokPrec = BinopPrecedence[CurTok]; - if (TokPrec <= 0) return -1; - return TokPrec; -} - -int main() { - // Install standard binary operators. - // 1 is lowest precedence. - BinopPrecedence['<'] = 10; - BinopPrecedence['+'] = 20; - BinopPrecedence['-'] = 20; - BinopPrecedence['*'] = 40; // highest. - ... -} -</pre> -</div> - -<p>For the basic form of Kaleidoscope, we will only support 4 binary operators -(this can obviously be extended by you, our brave and intrepid reader). The -<tt>GetTokPrecedence</tt> function returns the precedence for the current token, -or -1 if the token is not a binary operator. Having a map makes it easy to add -new operators and makes it clear that the algorithm doesn't depend on the -specific operators involved, but it would be easy enough to eliminate the map -and do the comparisons in the <tt>GetTokPrecedence</tt> function. (Or just use -a fixed-size array).</p> - -<p>With the helper above defined, we can now start parsing binary expressions. -The basic idea of operator precedence parsing is to break down an expression -with potentially ambiguous binary operators into pieces. Consider ,for example, -the expression "a+b+(c+d)*e*f+g". Operator precedence parsing considers this -as a stream of primary expressions separated by binary operators. 
As such, -it will first parse the leading primary expression "a", then it will see the -pairs [+, b] [+, (c+d)] [*, e] [*, f] and [+, g]. Note that because parentheses -are primary expressions, the binary expression parser doesn't need to worry -about nested subexpressions like (c+d) at all. -</p> - -<p> -To start, an expression is a primary expression potentially followed by a -sequence of [binop,primaryexpr] pairs:</p> - -<div class="doc_code"> -<pre> -/// expression -/// ::= primary binoprhs -/// -static ExprAST *ParseExpression() { - ExprAST *LHS = ParsePrimary(); - if (!LHS) return 0; - - return ParseBinOpRHS(0, LHS); -} -</pre> -</div> - -<p><tt>ParseBinOpRHS</tt> is the function that parses the sequence of pairs for -us. It takes a precedence and a pointer to an expression for the part that has been -parsed so far. Note that "x" is a perfectly valid expression: As such, "binoprhs" is -allowed to be empty, in which case it returns the expression that is passed into -it. In our example above, the code passes the expression for "a" into -<tt>ParseBinOpRHS</tt> and the current token is "+".</p> - -<p>The precedence value passed into <tt>ParseBinOpRHS</tt> indicates the <em> -minimal operator precedence</em> that the function is allowed to eat. For -example, if the current pair stream is [+, x] and <tt>ParseBinOpRHS</tt> is -passed in a precedence of 40, it will not consume any tokens (because the -precedence of '+' is only 20). With this in mind, <tt>ParseBinOpRHS</tt> starts -with:</p> - -<div class="doc_code"> -<pre> -/// binoprhs -/// ::= ('+' primary)* -static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { - // If this is a binop, find its precedence. - while (1) { - int TokPrec = GetTokPrecedence(); - - // If this is a binop that binds at least as tightly as the current binop, - // consume it, otherwise we are done. - if (TokPrec < ExprPrec) - return LHS; -</pre> -</div> - -<p>This code gets the precedence of the current token and checks to see if if is -too low. Because we defined invalid tokens to have a precedence of -1, this -check implicitly knows that the pair-stream ends when the token stream runs out -of binary operators. If this check succeeds, we know that the token is a binary -operator and that it will be included in this expression:</p> - -<div class="doc_code"> -<pre> - // Okay, we know this is a binop. - int BinOp = CurTok; - getNextToken(); // eat binop - - // Parse the primary expression after the binary operator. - ExprAST *RHS = ParsePrimary(); - if (!RHS) return 0; -</pre> -</div> - -<p>As such, this code eats (and remembers) the binary operator and then parses -the primary expression that follows. This builds up the whole pair, the first of -which is [+, b] for the running example.</p> - -<p>Now that we parsed the left-hand side of an expression and one pair of the -RHS sequence, we have to decide which way the expression associates. In -particular, we could have "(a+b) binop unparsed" or "a + (b binop unparsed)". -To determine this, we look ahead at "binop" to determine its precedence and -compare it to BinOp's precedence (which is '+' in this case):</p> - -<div class="doc_code"> -<pre> - // If BinOp binds less tightly with RHS than the operator after RHS, let - // the pending operator take RHS as its LHS. 
- int NextPrec = GetTokPrecedence(); - if (TokPrec < NextPrec) { -</pre> -</div> - -<p>If the precedence of the binop to the right of "RHS" is lower or equal to the -precedence of our current operator, then we know that the parentheses associate -as "(a+b) binop ...". In our example, the current operator is "+" and the next -operator is "+", we know that they have the same precedence. In this case we'll -create the AST node for "a+b", and then continue parsing:</p> - -<div class="doc_code"> -<pre> - ... if body omitted ... - } - - // Merge LHS/RHS. - LHS = new BinaryExprAST(BinOp, LHS, RHS); - } // loop around to the top of the while loop. -} -</pre> -</div> - -<p>In our example above, this will turn "a+b+" into "(a+b)" and execute the next -iteration of the loop, with "+" as the current token. The code above will eat, -remember, and parse "(c+d)" as the primary expression, which makes the -current pair equal to [+, (c+d)]. It will then evaluate the 'if' conditional above with -"*" as the binop to the right of the primary. In this case, the precedence of "*" is -higher than the precedence of "+" so the if condition will be entered.</p> - -<p>The critical question left here is "how can the if condition parse the right -hand side in full"? In particular, to build the AST correctly for our example, -it needs to get all of "(c+d)*e*f" as the RHS expression variable. The code to -do this is surprisingly simple (code from the above two blocks duplicated for -context):</p> - -<div class="doc_code"> -<pre> - // If BinOp binds less tightly with RHS than the operator after RHS, let - // the pending operator take RHS as its LHS. - int NextPrec = GetTokPrecedence(); - if (TokPrec < NextPrec) { - <b>RHS = ParseBinOpRHS(TokPrec+1, RHS); - if (RHS == 0) return 0;</b> - } - // Merge LHS/RHS. - LHS = new BinaryExprAST(BinOp, LHS, RHS); - } // loop around to the top of the while loop. -} -</pre> -</div> - -<p>At this point, we know that the binary operator to the RHS of our primary -has higher precedence than the binop we are currently parsing. As such, we know -that any sequence of pairs whose operators are all higher precedence than "+" -should be parsed together and returned as "RHS". To do this, we recursively -invoke the <tt>ParseBinOpRHS</tt> function specifying "TokPrec+1" as the minimum -precedence required for it to continue. In our example above, this will cause -it to return the AST node for "(c+d)*e*f" as RHS, which is then set as the RHS -of the '+' expression.</p> - -<p>Finally, on the next iteration of the while loop, the "+g" piece is parsed -and added to the AST. With this little bit of code (14 non-trivial lines), we -correctly handle fully general binary expression parsing in a very elegant way. -This was a whirlwind tour of this code, and it is somewhat subtle. I recommend -running through it with a few tough examples to see how it works. -</p> - -<p>This wraps up handling of expressions. At this point, we can point the -parser at an arbitrary token stream and build an expression from it, stopping -at the first token that is not part of the expression. Next up we need to -handle function definitions, etc.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="parsertop">Parsing the Rest</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -The next thing missing is handling of function prototypes. 
In Kaleidoscope, -these are used both for 'extern' function declarations as well as function body -definitions. The code to do this is straight-forward and not very interesting -(once you've survived expressions): -</p> - -<div class="doc_code"> -<pre> -/// prototype -/// ::= id '(' id* ')' -static PrototypeAST *ParsePrototype() { - if (CurTok != tok_identifier) - return ErrorP("Expected function name in prototype"); - - std::string FnName = IdentifierStr; - getNextToken(); - - if (CurTok != '(') - return ErrorP("Expected '(' in prototype"); - - // Read the list of argument names. - std::vector<std::string> ArgNames; - while (getNextToken() == tok_identifier) - ArgNames.push_back(IdentifierStr); - if (CurTok != ')') - return ErrorP("Expected ')' in prototype"); - - // success. - getNextToken(); // eat ')'. - - return new PrototypeAST(FnName, ArgNames); -} -</pre> -</div> - -<p>Given this, a function definition is very simple, just a prototype plus -an expression to implement the body:</p> - -<div class="doc_code"> -<pre> -/// definition ::= 'def' prototype expression -static FunctionAST *ParseDefinition() { - getNextToken(); // eat def. - PrototypeAST *Proto = ParsePrototype(); - if (Proto == 0) return 0; - - if (ExprAST *E = ParseExpression()) - return new FunctionAST(Proto, E); - return 0; -} -</pre> -</div> - -<p>In addition, we support 'extern' to declare functions like 'sin' and 'cos' as -well as to support forward declaration of user functions. These 'extern's are just -prototypes with no body:</p> - -<div class="doc_code"> -<pre> -/// external ::= 'extern' prototype -static PrototypeAST *ParseExtern() { - getNextToken(); // eat extern. - return ParsePrototype(); -} -</pre> -</div> - -<p>Finally, we'll also let the user type in arbitrary top-level expressions and -evaluate them on the fly. We will handle this by defining anonymous nullary -(zero argument) functions for them:</p> - -<div class="doc_code"> -<pre> -/// toplevelexpr ::= expression -static FunctionAST *ParseTopLevelExpr() { - if (ExprAST *E = ParseExpression()) { - // Make an anonymous proto. - PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); - return new FunctionAST(Proto, E); - } - return 0; -} -</pre> -</div> - -<p>Now that we have all the pieces, let's build a little driver that will let us -actually <em>execute</em> this code we've built!</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="driver">The Driver</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>The driver for this simply invokes all of the parsing pieces with a top-level -dispatch loop. There isn't much interesting here, so I'll just include the -top-level loop. See <a href="#code">below</a> for full code in the "Top-Level -Parsing" section.</p> - -<div class="doc_code"> -<pre> -/// top ::= definition | external | expression | ';' -static void MainLoop() { - while (1) { - fprintf(stderr, "ready> "); - switch (CurTok) { - case tok_eof: return; - case ';': getNextToken(); break; // ignore top-level semicolons. - case tok_def: HandleDefinition(); break; - case tok_extern: HandleExtern(); break; - default: HandleTopLevelExpression(); break; - } - } -} -</pre> -</div> - -<p>The most interesting part of this is that we ignore top-level semicolons. -Why is this, you ask? The basic reason is that if you type "4 + 5" at the -command line, the parser doesn't know whether that is the end of what you will type -or not. 
For example, on the next line you could type "def foo..." in which case -4+5 is the end of a top-level expression. Alternatively you could type "* 6", -which would continue the expression. Having top-level semicolons allows you to -type "4+5;", and the parser will know you are done.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="conclusions">Conclusions</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>With just under 400 lines of commented code (240 lines of non-comment, -non-blank code), we fully defined our minimal language, including a lexer, -parser, and AST builder. With this done, the executable will validate -Kaleidoscope code and tell us if it is grammatically invalid. For -example, here is a sample interaction:</p> - -<div class="doc_code"> -<pre> -$ <b>./a.out</b> -ready> <b>def foo(x y) x+foo(y, 4.0);</b> -Parsed a function definition. -ready> <b>def foo(x y) x+y y;</b> -Parsed a function definition. -Parsed a top-level expr -ready> <b>def foo(x y) x+y );</b> -Parsed a function definition. -Error: unknown token when expecting an expression -ready> <b>extern sin(a);</b> -ready> Parsed an extern -ready> <b>^D</b> -$ -</pre> -</div> - -<p>There is a lot of room for extension here. You can define new AST nodes, -extend the language in many ways, etc. In the <a href="LangImpl3.html">next -installment</a>, we will describe how to generate LLVM Intermediate -Representation (IR) from the AST.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="code">Full Code Listing</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -Here is the complete code listing for this and the previous chapter. -Note that it is fully self-contained: you don't need LLVM or any external -libraries at all for this. (Besides the C and C++ standard libraries, of -course.) To build this, just compile with:</p> - -<div class="doc_code"> -<pre> -# Compile -clang++ -g -O3 toy.cpp -# Run -./a.out -</pre> -</div> - -<p>Here is the code:</p> - -<div class="doc_code"> -<pre> -#include <cstdio> -#include <cstdlib> -#include <string> -#include <map> -#include <vector> - -//===----------------------------------------------------------------------===// -// Lexer -//===----------------------------------------------------------------------===// - -// The lexer returns tokens [0-255] if it is an unknown character, otherwise one -// of these for known things. -enum Token { - tok_eof = -1, - - // commands - tok_def = -2, tok_extern = -3, - - // primary - tok_identifier = -4, tok_number = -5 -}; - -static std::string IdentifierStr; // Filled in if tok_identifier -static double NumVal; // Filled in if tok_number - -/// gettok - Return the next token from standard input. -static int gettok() { - static int LastChar = ' '; - - // Skip any whitespace. 
- while (isspace(LastChar)) - LastChar = getchar(); - - if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* - IdentifierStr = LastChar; - while (isalnum((LastChar = getchar()))) - IdentifierStr += LastChar; - - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - return tok_identifier; - } - - if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ - std::string NumStr; - do { - NumStr += LastChar; - LastChar = getchar(); - } while (isdigit(LastChar) || LastChar == '.'); - - NumVal = strtod(NumStr.c_str(), 0); - return tok_number; - } - - if (LastChar == '#') { - // Comment until end of line. - do LastChar = getchar(); - while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); - - if (LastChar != EOF) - return gettok(); - } - - // Check for end of file. Don't eat the EOF. - if (LastChar == EOF) - return tok_eof; - - // Otherwise, just return the character as its ascii value. - int ThisChar = LastChar; - LastChar = getchar(); - return ThisChar; -} - -//===----------------------------------------------------------------------===// -// Abstract Syntax Tree (aka Parse Tree) -//===----------------------------------------------------------------------===// - -/// ExprAST - Base class for all expression nodes. -class ExprAST { -public: - virtual ~ExprAST() {} -}; - -/// NumberExprAST - Expression class for numeric literals like "1.0". -class NumberExprAST : public ExprAST { - double Val; -public: - NumberExprAST(double val) : Val(val) {} -}; - -/// VariableExprAST - Expression class for referencing a variable, like "a". -class VariableExprAST : public ExprAST { - std::string Name; -public: - VariableExprAST(const std::string &name) : Name(name) {} -}; - -/// BinaryExprAST - Expression class for a binary operator. -class BinaryExprAST : public ExprAST { - char Op; - ExprAST *LHS, *RHS; -public: - BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) - : Op(op), LHS(lhs), RHS(rhs) {} -}; - -/// CallExprAST - Expression class for function calls. -class CallExprAST : public ExprAST { - std::string Callee; - std::vector<ExprAST*> Args; -public: - CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) - : Callee(callee), Args(args) {} -}; - -/// PrototypeAST - This class represents the "prototype" for a function, -/// which captures its name, and its argument names (thus implicitly the number -/// of arguments the function takes). -class PrototypeAST { - std::string Name; - std::vector<std::string> Args; -public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args) - : Name(name), Args(args) {} - -}; - -/// FunctionAST - This class represents a function definition itself. -class FunctionAST { - PrototypeAST *Proto; - ExprAST *Body; -public: - FunctionAST(PrototypeAST *proto, ExprAST *body) - : Proto(proto), Body(body) {} - -}; - -//===----------------------------------------------------------------------===// -// Parser -//===----------------------------------------------------------------------===// - -/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current -/// token the parser is looking at. getNextToken reads another token from the -/// lexer and updates CurTok with its results. -static int CurTok; -static int getNextToken() { - return CurTok = gettok(); -} - -/// BinopPrecedence - This holds the precedence for each binary operator that is -/// defined. 
-static std::map<char, int> BinopPrecedence; - -/// GetTokPrecedence - Get the precedence of the pending binary operator token. -static int GetTokPrecedence() { - if (!isascii(CurTok)) - return -1; - - // Make sure it's a declared binop. - int TokPrec = BinopPrecedence[CurTok]; - if (TokPrec <= 0) return -1; - return TokPrec; -} - -/// Error* - These are little helper functions for error handling. -ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} -PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } -FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } - -static ExprAST *ParseExpression(); - -/// identifierexpr -/// ::= identifier -/// ::= identifier '(' expression* ')' -static ExprAST *ParseIdentifierExpr() { - std::string IdName = IdentifierStr; - - getNextToken(); // eat identifier. - - if (CurTok != '(') // Simple variable ref. - return new VariableExprAST(IdName); - - // Call. - getNextToken(); // eat ( - std::vector<ExprAST*> Args; - if (CurTok != ')') { - while (1) { - ExprAST *Arg = ParseExpression(); - if (!Arg) return 0; - Args.push_back(Arg); - - if (CurTok == ')') break; - - if (CurTok != ',') - return Error("Expected ')' or ',' in argument list"); - getNextToken(); - } - } - - // Eat the ')'. - getNextToken(); - - return new CallExprAST(IdName, Args); -} - -/// numberexpr ::= number -static ExprAST *ParseNumberExpr() { - ExprAST *Result = new NumberExprAST(NumVal); - getNextToken(); // consume the number - return Result; -} - -/// parenexpr ::= '(' expression ')' -static ExprAST *ParseParenExpr() { - getNextToken(); // eat (. - ExprAST *V = ParseExpression(); - if (!V) return 0; - - if (CurTok != ')') - return Error("expected ')'"); - getNextToken(); // eat ). - return V; -} - -/// primary -/// ::= identifierexpr -/// ::= numberexpr -/// ::= parenexpr -static ExprAST *ParsePrimary() { - switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - } -} - -/// binoprhs -/// ::= ('+' primary)* -static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { - // If this is a binop, find its precedence. - while (1) { - int TokPrec = GetTokPrecedence(); - - // If this is a binop that binds at least as tightly as the current binop, - // consume it, otherwise we are done. - if (TokPrec < ExprPrec) - return LHS; - - // Okay, we know this is a binop. - int BinOp = CurTok; - getNextToken(); // eat binop - - // Parse the primary expression after the binary operator. - ExprAST *RHS = ParsePrimary(); - if (!RHS) return 0; - - // If BinOp binds less tightly with RHS than the operator after RHS, let - // the pending operator take RHS as its LHS. - int NextPrec = GetTokPrecedence(); - if (TokPrec < NextPrec) { - RHS = ParseBinOpRHS(TokPrec+1, RHS); - if (RHS == 0) return 0; - } - - // Merge LHS/RHS. 
- LHS = new BinaryExprAST(BinOp, LHS, RHS); - } -} - -/// expression -/// ::= primary binoprhs -/// -static ExprAST *ParseExpression() { - ExprAST *LHS = ParsePrimary(); - if (!LHS) return 0; - - return ParseBinOpRHS(0, LHS); -} - -/// prototype -/// ::= id '(' id* ')' -static PrototypeAST *ParsePrototype() { - if (CurTok != tok_identifier) - return ErrorP("Expected function name in prototype"); - - std::string FnName = IdentifierStr; - getNextToken(); - - if (CurTok != '(') - return ErrorP("Expected '(' in prototype"); - - std::vector<std::string> ArgNames; - while (getNextToken() == tok_identifier) - ArgNames.push_back(IdentifierStr); - if (CurTok != ')') - return ErrorP("Expected ')' in prototype"); - - // success. - getNextToken(); // eat ')'. - - return new PrototypeAST(FnName, ArgNames); -} - -/// definition ::= 'def' prototype expression -static FunctionAST *ParseDefinition() { - getNextToken(); // eat def. - PrototypeAST *Proto = ParsePrototype(); - if (Proto == 0) return 0; - - if (ExprAST *E = ParseExpression()) - return new FunctionAST(Proto, E); - return 0; -} - -/// toplevelexpr ::= expression -static FunctionAST *ParseTopLevelExpr() { - if (ExprAST *E = ParseExpression()) { - // Make an anonymous proto. - PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); - return new FunctionAST(Proto, E); - } - return 0; -} - -/// external ::= 'extern' prototype -static PrototypeAST *ParseExtern() { - getNextToken(); // eat extern. - return ParsePrototype(); -} - -//===----------------------------------------------------------------------===// -// Top-Level parsing -//===----------------------------------------------------------------------===// - -static void HandleDefinition() { - if (ParseDefinition()) { - fprintf(stderr, "Parsed a function definition.\n"); - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleExtern() { - if (ParseExtern()) { - fprintf(stderr, "Parsed an extern\n"); - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleTopLevelExpression() { - // Evaluate a top-level expression into an anonymous function. - if (ParseTopLevelExpr()) { - fprintf(stderr, "Parsed a top-level expr\n"); - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -/// top ::= definition | external | expression | ';' -static void MainLoop() { - while (1) { - fprintf(stderr, "ready> "); - switch (CurTok) { - case tok_eof: return; - case ';': getNextToken(); break; // ignore top-level semicolons. - case tok_def: HandleDefinition(); break; - case tok_extern: HandleExtern(); break; - default: HandleTopLevelExpression(); break; - } - } -} - -//===----------------------------------------------------------------------===// -// Main driver code. -//===----------------------------------------------------------------------===// - -int main() { - // Install standard binary operators. - // 1 is lowest precedence. - BinopPrecedence['<'] = 10; - BinopPrecedence['+'] = 20; - BinopPrecedence['-'] = 20; - BinopPrecedence['*'] = 40; // highest. - - // Prime the first token. - fprintf(stderr, "ready> "); - getNextToken(); - - // Run the main "interpreter loop" now. 
- MainLoop(); - - return 0; -} -</pre> -</div> -<a href="LangImpl3.html">Next: Implementing Code Generation to LLVM IR</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/LangImpl2.rst b/docs/tutorial/LangImpl2.rst new file mode 100644 index 0000000000..0d62894a24 --- /dev/null +++ b/docs/tutorial/LangImpl2.rst @@ -0,0 +1,1098 @@ +=========================================== +Kaleidoscope: Implementing a Parser and AST +=========================================== + +.. contents:: + :local: + +Written by `Chris Lattner <mailto:sabre@nondot.org>`_ + +Chapter 2 Introduction +====================== + +Welcome to Chapter 2 of the "`Implementing a language with +LLVM <index.html>`_" tutorial. This chapter shows you how to use the +lexer, built in `Chapter 1 <LangImpl1.html>`_, to build a full +`parser <http://en.wikipedia.org/wiki/Parsing>`_ for our Kaleidoscope +language. Once we have a parser, we'll define and build an `Abstract +Syntax Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST). + +The parser we will build uses a combination of `Recursive Descent +Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and +`Operator-Precedence +Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to +parse the Kaleidoscope language (the latter for binary expressions and +the former for everything else). Before we get to parsing though, lets +talk about the output of the parser: the Abstract Syntax Tree. + +The Abstract Syntax Tree (AST) +============================== + +The AST for a program captures its behavior in such a way that it is +easy for later stages of the compiler (e.g. code generation) to +interpret. We basically want one object for each construct in the +language, and the AST should closely model the language. In +Kaleidoscope, we have expressions, a prototype, and a function object. +We'll start with expressions first: + +.. code-block:: c++ + + /// ExprAST - Base class for all expression nodes. + class ExprAST { + public: + virtual ~ExprAST() {} + }; + + /// NumberExprAST - Expression class for numeric literals like "1.0". + class NumberExprAST : public ExprAST { + double Val; + public: + NumberExprAST(double val) : Val(val) {} + }; + +The code above shows the definition of the base ExprAST class and one +subclass which we use for numeric literals. The important thing to note +about this code is that the NumberExprAST class captures the numeric +value of the literal as an instance variable. This allows later phases +of the compiler to know what the stored numeric value is. + +Right now we only create the AST, so there are no useful accessor +methods on them. It would be very easy to add a virtual method to pretty +print the code, for example. Here are the other expression AST node +definitions that we'll use in the basic form of the Kaleidoscope +language: + +.. code-block:: c++ + + /// VariableExprAST - Expression class for referencing a variable, like "a". 
+ class VariableExprAST : public ExprAST { + std::string Name; + public: + VariableExprAST(const std::string &name) : Name(name) {} + }; + + /// BinaryExprAST - Expression class for a binary operator. + class BinaryExprAST : public ExprAST { + char Op; + ExprAST *LHS, *RHS; + public: + BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) + : Op(op), LHS(lhs), RHS(rhs) {} + }; + + /// CallExprAST - Expression class for function calls. + class CallExprAST : public ExprAST { + std::string Callee; + std::vector<ExprAST*> Args; + public: + CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) + : Callee(callee), Args(args) {} + }; + +This is all (intentionally) rather straight-forward: variables capture +the variable name, binary operators capture their opcode (e.g. '+'), and +calls capture a function name as well as a list of any argument +expressions. One thing that is nice about our AST is that it captures +the language features without talking about the syntax of the language. +Note that there is no discussion about precedence of binary operators, +lexical structure, etc. + +For our basic language, these are all of the expression nodes we'll +define. Because it doesn't have conditional control flow, it isn't +Turing-complete; we'll fix that in a later installment. The two things +we need next are a way to talk about the interface to a function, and a +way to talk about functions themselves: + +.. code-block:: c++ + + /// PrototypeAST - This class represents the "prototype" for a function, + /// which captures its name, and its argument names (thus implicitly the number + /// of arguments the function takes). + class PrototypeAST { + std::string Name; + std::vector<std::string> Args; + public: + PrototypeAST(const std::string &name, const std::vector<std::string> &args) + : Name(name), Args(args) {} + }; + + /// FunctionAST - This class represents a function definition itself. + class FunctionAST { + PrototypeAST *Proto; + ExprAST *Body; + public: + FunctionAST(PrototypeAST *proto, ExprAST *body) + : Proto(proto), Body(body) {} + }; + +In Kaleidoscope, functions are typed with just a count of their +arguments. Since all values are double precision floating point, the +type of each argument doesn't need to be stored anywhere. In a more +aggressive and realistic language, the "ExprAST" class would probably +have a type field. + +With this scaffolding, we can now talk about parsing expressions and +function bodies in Kaleidoscope. + +Parser Basics +============= + +Now that we have an AST to build, we need to define the parser code to +build it. The idea here is that we want to parse something like "x+y" +(which is returned as three tokens by the lexer) into an AST that could +be generated with calls like this: + +.. code-block:: c++ + + ExprAST *X = new VariableExprAST("x"); + ExprAST *Y = new VariableExprAST("y"); + ExprAST *Result = new BinaryExprAST('+', X, Y); + +In order to do this, we'll start by defining some basic helper routines: + +.. code-block:: c++ + + /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current + /// token the parser is looking at. getNextToken reads another token from the + /// lexer and updates CurTok with its results. + static int CurTok; + static int getNextToken() { + return CurTok = gettok(); + } + +This implements a simple token buffer around the lexer. This allows us +to look one token ahead at what the lexer is returning. Every function +in our parser will assume that CurTok is the current token that needs to +be parsed. 
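+
+A quick illustration (not part of the tutorial's code listing): ``CurTok``
+holds nothing useful until ``getNextToken`` has been called once, which is
+why the driver at the end of this chapter "primes" the first token before
+entering its loop. A stripped-down caller, using the ``ParseExpression``
+routine defined later in this chapter, would look like this:
+
+.. code-block:: c++
+
+    // Illustration only: the invariant every Parse* routine relies on.
+    int main() {
+      getNextToken();                  // prime CurTok with the first token
+      ExprAST *E = ParseExpression();  // may assume CurTok is already valid
+      // On return, CurTok holds the first token that was *not* part of the
+      // expression, ready for whatever gets parsed next.
+      return E ? 0 : 1;
+    }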
+ +.. code-block:: c++ + + + /// Error* - These are little helper functions for error handling. + ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} + PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } + FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } + +The ``Error`` routines are simple helper routines that our parser will +use to handle errors. The error recovery in our parser will not be the +best and is not particular user-friendly, but it will be enough for our +tutorial. These routines make it easier to handle errors in routines +that have various return types: they always return null. + +With these basic helper functions, we can implement the first piece of +our grammar: numeric literals. + +Basic Expression Parsing +======================== + +We start with numeric literals, because they are the simplest to +process. For each production in our grammar, we'll define a function +which parses that production. For numeric literals, we have: + +.. code-block:: c++ + + /// numberexpr ::= number + static ExprAST *ParseNumberExpr() { + ExprAST *Result = new NumberExprAST(NumVal); + getNextToken(); // consume the number + return Result; + } + +This routine is very simple: it expects to be called when the current +token is a ``tok_number`` token. It takes the current number value, +creates a ``NumberExprAST`` node, advances the lexer to the next token, +and finally returns. + +There are some interesting aspects to this. The most important one is +that this routine eats all of the tokens that correspond to the +production and returns the lexer buffer with the next token (which is +not part of the grammar production) ready to go. This is a fairly +standard way to go for recursive descent parsers. For a better example, +the parenthesis operator is defined like this: + +.. code-block:: c++ + + /// parenexpr ::= '(' expression ')' + static ExprAST *ParseParenExpr() { + getNextToken(); // eat (. + ExprAST *V = ParseExpression(); + if (!V) return 0; + + if (CurTok != ')') + return Error("expected ')'"); + getNextToken(); // eat ). + return V; + } + +This function illustrates a number of interesting things about the +parser: + +1) It shows how we use the Error routines. When called, this function +expects that the current token is a '(' token, but after parsing the +subexpression, it is possible that there is no ')' waiting. For example, +if the user types in "(4 x" instead of "(4)", the parser should emit an +error. Because errors can occur, the parser needs a way to indicate that +they happened: in our parser, we return null on an error. + +2) Another interesting aspect of this function is that it uses recursion +by calling ``ParseExpression`` (we will soon see that +``ParseExpression`` can call ``ParseParenExpr``). This is powerful +because it allows us to handle recursive grammars, and keeps each +production very simple. Note that parentheses do not cause construction +of AST nodes themselves. While we could do it this way, the most +important role of parentheses are to guide the parser and provide +grouping. Once the parser constructs the AST, parentheses are not +needed. + +The next simple production is for handling variable references and +function calls: + +.. code-block:: c++ + + /// identifierexpr + /// ::= identifier + /// ::= identifier '(' expression* ')' + static ExprAST *ParseIdentifierExpr() { + std::string IdName = IdentifierStr; + + getNextToken(); // eat identifier. + + if (CurTok != '(') // Simple variable ref. 
+ return new VariableExprAST(IdName); + + // Call. + getNextToken(); // eat ( + std::vector<ExprAST*> Args; + if (CurTok != ')') { + while (1) { + ExprAST *Arg = ParseExpression(); + if (!Arg) return 0; + Args.push_back(Arg); + + if (CurTok == ')') break; + + if (CurTok != ',') + return Error("Expected ')' or ',' in argument list"); + getNextToken(); + } + } + + // Eat the ')'. + getNextToken(); + + return new CallExprAST(IdName, Args); + } + +This routine follows the same style as the other routines. (It expects +to be called if the current token is a ``tok_identifier`` token). It +also has recursion and error handling. One interesting aspect of this is +that it uses *look-ahead* to determine if the current identifier is a +stand alone variable reference or if it is a function call expression. +It handles this by checking to see if the token after the identifier is +a '(' token, constructing either a ``VariableExprAST`` or +``CallExprAST`` node as appropriate. + +Now that we have all of our simple expression-parsing logic in place, we +can define a helper function to wrap it together into one entry point. +We call this class of expressions "primary" expressions, for reasons +that will become more clear `later in the +tutorial <LangImpl6.html#unary>`_. In order to parse an arbitrary +primary expression, we need to determine what sort of expression it is: + +.. code-block:: c++ + + /// primary + /// ::= identifierexpr + /// ::= numberexpr + /// ::= parenexpr + static ExprAST *ParsePrimary() { + switch (CurTok) { + default: return Error("unknown token when expecting an expression"); + case tok_identifier: return ParseIdentifierExpr(); + case tok_number: return ParseNumberExpr(); + case '(': return ParseParenExpr(); + } + } + +Now that you see the definition of this function, it is more obvious why +we can assume the state of CurTok in the various functions. This uses +look-ahead to determine which sort of expression is being inspected, and +then parses it with a function call. + +Now that basic expressions are handled, we need to handle binary +expressions. They are a bit more complex. + +Binary Expression Parsing +========================= + +Binary expressions are significantly harder to parse because they are +often ambiguous. For example, when given the string "x+y\*z", the parser +can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common +definitions from mathematics, we expect the later parse, because "\*" +(multiplication) has higher *precedence* than "+" (addition). + +There are many ways to handle this, but an elegant and efficient way is +to use `Operator-Precedence +Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_. +This parsing technique uses the precedence of binary operators to guide +recursion. To start with, we need a table of precedences: + +.. code-block:: c++ + + /// BinopPrecedence - This holds the precedence for each binary operator that is + /// defined. + static std::map<char, int> BinopPrecedence; + + /// GetTokPrecedence - Get the precedence of the pending binary operator token. + static int GetTokPrecedence() { + if (!isascii(CurTok)) + return -1; + + // Make sure it's a declared binop. + int TokPrec = BinopPrecedence[CurTok]; + if (TokPrec <= 0) return -1; + return TokPrec; + } + + int main() { + // Install standard binary operators. + // 1 is lowest precedence. + BinopPrecedence['<'] = 10; + BinopPrecedence['+'] = 20; + BinopPrecedence['-'] = 20; + BinopPrecedence['*'] = 40; // highest. + ... 
+    }
+
+For the basic form of Kaleidoscope, we will only support 4 binary
+operators (this can obviously be extended by you, our brave and intrepid
+reader). The ``GetTokPrecedence`` function returns the precedence for
+the current token, or -1 if the token is not a binary operator. Having a
+map makes it easy to add new operators and makes it clear that the
+algorithm doesn't depend on the specific operators involved, but it
+would be easy enough to eliminate the map and do the comparisons in the
+``GetTokPrecedence`` function. (Or just use a fixed-size array.)
+
+With the helper above defined, we can now start parsing binary
+expressions. The basic idea of operator precedence parsing is to break
+down an expression with potentially ambiguous binary operators into
+pieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g".
+Operator precedence parsing considers this as a stream of primary
+expressions separated by binary operators. As such, it will first parse
+the leading primary expression "a", then it will see the pairs [+, b]
+[+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are
+primary expressions, the binary expression parser doesn't need to worry
+about nested subexpressions like (c+d) at all.
+
+To start, an expression is a primary expression potentially followed by
+a sequence of [binop,primaryexpr] pairs:
+
+.. code-block:: c++
+
+    /// expression
+    ///   ::= primary binoprhs
+    ///
+    static ExprAST *ParseExpression() {
+      ExprAST *LHS = ParsePrimary();
+      if (!LHS) return 0;
+
+      return ParseBinOpRHS(0, LHS);
+    }
+
+``ParseBinOpRHS`` is the function that parses the sequence of pairs for
+us. It takes a precedence and a pointer to an expression for the part
+that has been parsed so far. Note that "x" is a perfectly valid
+expression; as such, "binoprhs" is allowed to be empty, in which case it
+returns the expression that is passed into it. In our example above, the
+code passes the expression for "a" into ``ParseBinOpRHS`` and the
+current token is "+".
+
+The precedence value passed into ``ParseBinOpRHS`` indicates the
+*minimal operator precedence* that the function is allowed to eat. For
+example, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is
+passed in a precedence of 40, it will not consume any tokens (because
+the precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS``
+starts with:
+
+.. code-block:: c++
+
+    /// binoprhs
+    ///   ::= ('+' primary)*
+    static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
+      // If this is a binop, find its precedence.
+      while (1) {
+        int TokPrec = GetTokPrecedence();
+
+        // If this is a binop that binds at least as tightly as the current binop,
+        // consume it, otherwise we are done.
+        if (TokPrec < ExprPrec)
+          return LHS;
+
+This code gets the precedence of the current token and checks to see if
+it is too low. Because we defined invalid tokens to have a precedence of
+-1, this check implicitly knows that the pair-stream ends when the token
+stream runs out of binary operators. If this check succeeds, we know
+that the token is a binary operator and that it will be included in this
+expression:
+
+.. code-block:: c++
+
+        // Okay, we know this is a binop.
+        int BinOp = CurTok;
+        getNextToken();  // eat binop
+
+        // Parse the primary expression after the binary operator.
+        ExprAST *RHS = ParsePrimary();
+        if (!RHS) return 0;
+
+As such, this code eats (and remembers) the binary operator and then
+parses the primary expression that follows.
This builds up the whole
+pair, the first of which is [+, b] for the running example.
+
+Now that we parsed the left-hand side of an expression and one pair of
+the RHS sequence, we have to decide which way the expression associates.
+In particular, we could have "(a+b) binop unparsed" or "a + (b binop
+unparsed)". To determine this, we look ahead at "binop" to determine its
+precedence and compare it to BinOp's precedence (which is '+' in this
+case):
+
+.. code-block:: c++
+
+        // If BinOp binds less tightly with RHS than the operator after RHS, let
+        // the pending operator take RHS as its LHS.
+        int NextPrec = GetTokPrecedence();
+        if (TokPrec < NextPrec) {
+
+If the precedence of the binop to the right of "RHS" is lower or equal
+to the precedence of our current operator, then we know that the
+parentheses associate as "(a+b) binop ...". In our example, the current
+operator is "+" and the next operator is "+", so we know that they have
+the same precedence. In this case we'll create the AST node for "a+b",
+and then continue parsing:
+
+.. code-block:: c++
+
+          ... if body omitted ...
+        }
+
+        // Merge LHS/RHS.
+        LHS = new BinaryExprAST(BinOp, LHS, RHS);
+      }  // loop around to the top of the while loop.
+    }
+
+In our example above, this will turn "a+b+" into "(a+b)" and execute the
+next iteration of the loop, with "+" as the current token. The code
+above will eat, remember, and parse "(c+d)" as the primary expression,
+which makes the current pair equal to [+, (c+d)]. It will then evaluate
+the 'if' conditional above with "\*" as the binop to the right of the
+primary. In this case, the precedence of "\*" is higher than the
+precedence of "+", so the if condition will be entered.
+
+The critical question left here is "how can the if condition parse the
+right-hand side in full?" In particular, to build the AST correctly for
+our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression
+variable. The code to do this is surprisingly simple (code from the
+above two blocks duplicated for context):
+
+.. code-block:: c++
+
+        // If BinOp binds less tightly with RHS than the operator after RHS, let
+        // the pending operator take RHS as its LHS.
+        int NextPrec = GetTokPrecedence();
+        if (TokPrec < NextPrec) {
+          RHS = ParseBinOpRHS(TokPrec+1, RHS);
+          if (RHS == 0) return 0;
+        }
+        // Merge LHS/RHS.
+        LHS = new BinaryExprAST(BinOp, LHS, RHS);
+      }  // loop around to the top of the while loop.
+    }
+
+At this point, we know that the binary operator to the RHS of our
+primary has higher precedence than the binop we are currently parsing.
+As such, we know that any sequence of pairs whose operators are all
+higher precedence than "+" should be parsed together and returned as
+"RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function
+specifying "TokPrec+1" as the minimum precedence required for it to
+continue. In our example above, this will cause it to return the AST
+node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of the '+'
+expression.
+
+Finally, on the next iteration of the while loop, the "+g" piece is
+parsed and added to the AST. With this little bit of code (14
+non-trivial lines), we correctly handle fully general binary expression
+parsing in a very elegant way. This was a whirlwind tour of this code,
+and it is somewhat subtle. I recommend running through it with a few
+tough examples to see how it works.
+
+This wraps up handling of expressions.
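+
+Before moving on, here is a minimal, self-contained sketch of the same
+precedence-climbing idea (an illustration only, not the tutorial's code).
+It parses a hard-coded string of single-character operands joined by the
+four operators above and prints a fully parenthesized form, so you can
+see exactly how the pairs get grouped. The names ``Prec``, ``PrecOf``,
+``ParsePrimaryCh`` and ``ParseRHS`` are invented for this sketch, and it
+assumes well-formed input with no parentheses or whitespace:
+
+.. code-block:: c++
+
+    #include <cstdio>
+    #include <map>
+    #include <string>
+
+    static std::map<char, int> Prec;  // same shape as BinopPrecedence
+    static const char *Cur;          // next character of the input to consume
+
+    // -1 means "not a binary operator" (including the terminating '\0').
+    static int PrecOf(char C) {
+      std::map<char, int>::const_iterator It = Prec.find(C);
+      return It == Prec.end() ? -1 : It->second;
+    }
+
+    // A "primary" here is just one operand character.
+    static std::string ParsePrimaryCh() { return std::string(1, *Cur++); }
+
+    static std::string ParseRHS(int MinPrec, std::string LHS) {
+      while (true) {
+        int TokPrec = PrecOf(*Cur);
+        if (TokPrec < MinPrec)
+          return LHS;                        // binds too loosely (or end of input)
+
+        char Op = *Cur++;                    // eat the operator
+        std::string RHS = ParsePrimaryCh();  // parse the primary after it
+
+        // If the next operator binds tighter, let it take RHS as its LHS first.
+        if (TokPrec < PrecOf(*Cur))
+          RHS = ParseRHS(TokPrec + 1, RHS);
+
+        LHS = "(" + LHS + Op + RHS + ")";    // merge the pair
+      }
+    }
+
+    int main() {
+      Prec['<'] = 10; Prec['+'] = 20; Prec['-'] = 20; Prec['*'] = 40;
+      Cur = "a+b*c*d+e";
+      printf("%s\n", ParseRHS(0, ParsePrimaryCh()).c_str());
+      return 0;
+    }
+
+Running it prints "((a+((b\*c)\*d))+e)": "\*" binds tighter than "+", and
+"+" associates to the left, which is the same grouping that
+``ParseBinOpRHS`` builds as nested ``BinaryExprAST`` nodes for the real
+grammar.
+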
At this point, we can point the +parser at an arbitrary token stream and build an expression from it, +stopping at the first token that is not part of the expression. Next up +we need to handle function definitions, etc. + +Parsing the Rest +================ + +The next thing missing is handling of function prototypes. In +Kaleidoscope, these are used both for 'extern' function declarations as +well as function body definitions. The code to do this is +straight-forward and not very interesting (once you've survived +expressions): + +.. code-block:: c++ + + /// prototype + /// ::= id '(' id* ')' + static PrototypeAST *ParsePrototype() { + if (CurTok != tok_identifier) + return ErrorP("Expected function name in prototype"); + + std::string FnName = IdentifierStr; + getNextToken(); + + if (CurTok != '(') + return ErrorP("Expected '(' in prototype"); + + // Read the list of argument names. + std::vector<std::string> ArgNames; + while (getNextToken() == tok_identifier) + ArgNames.push_back(IdentifierStr); + if (CurTok != ')') + return ErrorP("Expected ')' in prototype"); + + // success. + getNextToken(); // eat ')'. + + return new PrototypeAST(FnName, ArgNames); + } + +Given this, a function definition is very simple, just a prototype plus +an expression to implement the body: + +.. code-block:: c++ + + /// definition ::= 'def' prototype expression + static FunctionAST *ParseDefinition() { + getNextToken(); // eat def. + PrototypeAST *Proto = ParsePrototype(); + if (Proto == 0) return 0; + + if (ExprAST *E = ParseExpression()) + return new FunctionAST(Proto, E); + return 0; + } + +In addition, we support 'extern' to declare functions like 'sin' and +'cos' as well as to support forward declaration of user functions. These +'extern's are just prototypes with no body: + +.. code-block:: c++ + + /// external ::= 'extern' prototype + static PrototypeAST *ParseExtern() { + getNextToken(); // eat extern. + return ParsePrototype(); + } + +Finally, we'll also let the user type in arbitrary top-level expressions +and evaluate them on the fly. We will handle this by defining anonymous +nullary (zero argument) functions for them: + +.. code-block:: c++ + + /// toplevelexpr ::= expression + static FunctionAST *ParseTopLevelExpr() { + if (ExprAST *E = ParseExpression()) { + // Make an anonymous proto. + PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); + return new FunctionAST(Proto, E); + } + return 0; + } + +Now that we have all the pieces, let's build a little driver that will +let us actually *execute* this code we've built! + +The Driver +========== + +The driver for this simply invokes all of the parsing pieces with a +top-level dispatch loop. There isn't much interesting here, so I'll just +include the top-level loop. See `below <#code>`_ for full code in the +"Top-Level Parsing" section. + +.. code-block:: c++ + + /// top ::= definition | external | expression | ';' + static void MainLoop() { + while (1) { + fprintf(stderr, "ready> "); + switch (CurTok) { + case tok_eof: return; + case ';': getNextToken(); break; // ignore top-level semicolons. + case tok_def: HandleDefinition(); break; + case tok_extern: HandleExtern(); break; + default: HandleTopLevelExpression(); break; + } + } + } + +The most interesting part of this is that we ignore top-level +semicolons. Why is this, you ask? The basic reason is that if you type +"4 + 5" at the command line, the parser doesn't know whether that is the +end of what you will type or not. 
For example, on the next line you +could type "def foo..." in which case 4+5 is the end of a top-level +expression. Alternatively you could type "\* 6", which would continue +the expression. Having top-level semicolons allows you to type "4+5;", +and the parser will know you are done. + +Conclusions +=========== + +With just under 400 lines of commented code (240 lines of non-comment, +non-blank code), we fully defined our minimal language, including a +lexer, parser, and AST builder. With this done, the executable will +validate Kaleidoscope code and tell us if it is grammatically invalid. +For example, here is a sample interaction: + +.. code-block:: bash + + $ ./a.out + ready> def foo(x y) x+foo(y, 4.0); + Parsed a function definition. + ready> def foo(x y) x+y y; + Parsed a function definition. + Parsed a top-level expr + ready> def foo(x y) x+y ); + Parsed a function definition. + Error: unknown token when expecting an expression + ready> extern sin(a); + ready> Parsed an extern + ready> ^D + $ + +There is a lot of room for extension here. You can define new AST nodes, +extend the language in many ways, etc. In the `next +installment <LangImpl3.html>`_, we will describe how to generate LLVM +Intermediate Representation (IR) from the AST. + +Full Code Listing +================= + +Here is the complete code listing for this and the previous chapter. +Note that it is fully self-contained: you don't need LLVM or any +external libraries at all for this. (Besides the C and C++ standard +libraries, of course.) To build this, just compile with: + +.. code-block:: bash + + # Compile + clang++ -g -O3 toy.cpp + # Run + ./a.out + +Here is the code: + +.. code-block:: c++ + + #include <cstdio> + #include <cstdlib> + #include <string> + #include <map> + #include <vector> + + //===----------------------------------------------------------------------===// + // Lexer + //===----------------------------------------------------------------------===// + + // The lexer returns tokens [0-255] if it is an unknown character, otherwise one + // of these for known things. + enum Token { + tok_eof = -1, + + // commands + tok_def = -2, tok_extern = -3, + + // primary + tok_identifier = -4, tok_number = -5 + }; + + static std::string IdentifierStr; // Filled in if tok_identifier + static double NumVal; // Filled in if tok_number + + /// gettok - Return the next token from standard input. + static int gettok() { + static int LastChar = ' '; + + // Skip any whitespace. + while (isspace(LastChar)) + LastChar = getchar(); + + if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* + IdentifierStr = LastChar; + while (isalnum((LastChar = getchar()))) + IdentifierStr += LastChar; + + if (IdentifierStr == "def") return tok_def; + if (IdentifierStr == "extern") return tok_extern; + return tok_identifier; + } + + if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ + std::string NumStr; + do { + NumStr += LastChar; + LastChar = getchar(); + } while (isdigit(LastChar) || LastChar == '.'); + + NumVal = strtod(NumStr.c_str(), 0); + return tok_number; + } + + if (LastChar == '#') { + // Comment until end of line. + do LastChar = getchar(); + while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); + + if (LastChar != EOF) + return gettok(); + } + + // Check for end of file. Don't eat the EOF. + if (LastChar == EOF) + return tok_eof; + + // Otherwise, just return the character as its ascii value. 
+ int ThisChar = LastChar; + LastChar = getchar(); + return ThisChar; + } + + //===----------------------------------------------------------------------===// + // Abstract Syntax Tree (aka Parse Tree) + //===----------------------------------------------------------------------===// + + /// ExprAST - Base class for all expression nodes. + class ExprAST { + public: + virtual ~ExprAST() {} + }; + + /// NumberExprAST - Expression class for numeric literals like "1.0". + class NumberExprAST : public ExprAST { + double Val; + public: + NumberExprAST(double val) : Val(val) {} + }; + + /// VariableExprAST - Expression class for referencing a variable, like "a". + class VariableExprAST : public ExprAST { + std::string Name; + public: + VariableExprAST(const std::string &name) : Name(name) {} + }; + + /// BinaryExprAST - Expression class for a binary operator. + class BinaryExprAST : public ExprAST { + char Op; + ExprAST *LHS, *RHS; + public: + BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) + : Op(op), LHS(lhs), RHS(rhs) {} + }; + + /// CallExprAST - Expression class for function calls. + class CallExprAST : public ExprAST { + std::string Callee; + std::vector<ExprAST*> Args; + public: + CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) + : Callee(callee), Args(args) {} + }; + + /// PrototypeAST - This class represents the "prototype" for a function, + /// which captures its name, and its argument names (thus implicitly the number + /// of arguments the function takes). + class PrototypeAST { + std::string Name; + std::vector<std::string> Args; + public: + PrototypeAST(const std::string &name, const std::vector<std::string> &args) + : Name(name), Args(args) {} + + }; + + /// FunctionAST - This class represents a function definition itself. + class FunctionAST { + PrototypeAST *Proto; + ExprAST *Body; + public: + FunctionAST(PrototypeAST *proto, ExprAST *body) + : Proto(proto), Body(body) {} + + }; + + //===----------------------------------------------------------------------===// + // Parser + //===----------------------------------------------------------------------===// + + /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current + /// token the parser is looking at. getNextToken reads another token from the + /// lexer and updates CurTok with its results. + static int CurTok; + static int getNextToken() { + return CurTok = gettok(); + } + + /// BinopPrecedence - This holds the precedence for each binary operator that is + /// defined. + static std::map<char, int> BinopPrecedence; + + /// GetTokPrecedence - Get the precedence of the pending binary operator token. + static int GetTokPrecedence() { + if (!isascii(CurTok)) + return -1; + + // Make sure it's a declared binop. + int TokPrec = BinopPrecedence[CurTok]; + if (TokPrec <= 0) return -1; + return TokPrec; + } + + /// Error* - These are little helper functions for error handling. + ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} + PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } + FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } + + static ExprAST *ParseExpression(); + + /// identifierexpr + /// ::= identifier + /// ::= identifier '(' expression* ')' + static ExprAST *ParseIdentifierExpr() { + std::string IdName = IdentifierStr; + + getNextToken(); // eat identifier. + + if (CurTok != '(') // Simple variable ref. + return new VariableExprAST(IdName); + + // Call. 
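+      // (CurTok is '(', so this identifier names a function call, not a variable.)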
+ getNextToken(); // eat ( + std::vector<ExprAST*> Args; + if (CurTok != ')') { + while (1) { + ExprAST *Arg = ParseExpression(); + if (!Arg) return 0; + Args.push_back(Arg); + + if (CurTok == ')') break; + + if (CurTok != ',') + return Error("Expected ')' or ',' in argument list"); + getNextToken(); + } + } + + // Eat the ')'. + getNextToken(); + + return new CallExprAST(IdName, Args); + } + + /// numberexpr ::= number + static ExprAST *ParseNumberExpr() { + ExprAST *Result = new NumberExprAST(NumVal); + getNextToken(); // consume the number + return Result; + } + + /// parenexpr ::= '(' expression ')' + static ExprAST *ParseParenExpr() { + getNextToken(); // eat (. + ExprAST *V = ParseExpression(); + if (!V) return 0; + + if (CurTok != ')') + return Error("expected ')'"); + getNextToken(); // eat ). + return V; + } + + /// primary + /// ::= identifierexpr + /// ::= numberexpr + /// ::= parenexpr + static ExprAST *ParsePrimary() { + switch (CurTok) { + default: return Error("unknown token when expecting an expression"); + case tok_identifier: return ParseIdentifierExpr(); + case tok_number: return ParseNumberExpr(); + case '(': return ParseParenExpr(); + } + } + + /// binoprhs + /// ::= ('+' primary)* + static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { + // If this is a binop, find its precedence. + while (1) { + int TokPrec = GetTokPrecedence(); + + // If this is a binop that binds at least as tightly as the current binop, + // consume it, otherwise we are done. + if (TokPrec < ExprPrec) + return LHS; + + // Okay, we know this is a binop. + int BinOp = CurTok; + getNextToken(); // eat binop + + // Parse the primary expression after the binary operator. + ExprAST *RHS = ParsePrimary(); + if (!RHS) return 0; + + // If BinOp binds less tightly with RHS than the operator after RHS, let + // the pending operator take RHS as its LHS. + int NextPrec = GetTokPrecedence(); + if (TokPrec < NextPrec) { + RHS = ParseBinOpRHS(TokPrec+1, RHS); + if (RHS == 0) return 0; + } + + // Merge LHS/RHS. + LHS = new BinaryExprAST(BinOp, LHS, RHS); + } + } + + /// expression + /// ::= primary binoprhs + /// + static ExprAST *ParseExpression() { + ExprAST *LHS = ParsePrimary(); + if (!LHS) return 0; + + return ParseBinOpRHS(0, LHS); + } + + /// prototype + /// ::= id '(' id* ')' + static PrototypeAST *ParsePrototype() { + if (CurTok != tok_identifier) + return ErrorP("Expected function name in prototype"); + + std::string FnName = IdentifierStr; + getNextToken(); + + if (CurTok != '(') + return ErrorP("Expected '(' in prototype"); + + std::vector<std::string> ArgNames; + while (getNextToken() == tok_identifier) + ArgNames.push_back(IdentifierStr); + if (CurTok != ')') + return ErrorP("Expected ')' in prototype"); + + // success. + getNextToken(); // eat ')'. + + return new PrototypeAST(FnName, ArgNames); + } + + /// definition ::= 'def' prototype expression + static FunctionAST *ParseDefinition() { + getNextToken(); // eat def. + PrototypeAST *Proto = ParsePrototype(); + if (Proto == 0) return 0; + + if (ExprAST *E = ParseExpression()) + return new FunctionAST(Proto, E); + return 0; + } + + /// toplevelexpr ::= expression + static FunctionAST *ParseTopLevelExpr() { + if (ExprAST *E = ParseExpression()) { + // Make an anonymous proto. + PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); + return new FunctionAST(Proto, E); + } + return 0; + } + + /// external ::= 'extern' prototype + static PrototypeAST *ParseExtern() { + getNextToken(); // eat extern. 
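+      // An 'extern' is just a prototype with no body.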
+ return ParsePrototype(); + } + + //===----------------------------------------------------------------------===// + // Top-Level parsing + //===----------------------------------------------------------------------===// + + static void HandleDefinition() { + if (ParseDefinition()) { + fprintf(stderr, "Parsed a function definition.\n"); + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + static void HandleExtern() { + if (ParseExtern()) { + fprintf(stderr, "Parsed an extern\n"); + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + static void HandleTopLevelExpression() { + // Evaluate a top-level expression into an anonymous function. + if (ParseTopLevelExpr()) { + fprintf(stderr, "Parsed a top-level expr\n"); + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + /// top ::= definition | external | expression | ';' + static void MainLoop() { + while (1) { + fprintf(stderr, "ready> "); + switch (CurTok) { + case tok_eof: return; + case ';': getNextToken(); break; // ignore top-level semicolons. + case tok_def: HandleDefinition(); break; + case tok_extern: HandleExtern(); break; + default: HandleTopLevelExpression(); break; + } + } + } + + //===----------------------------------------------------------------------===// + // Main driver code. + //===----------------------------------------------------------------------===// + + int main() { + // Install standard binary operators. + // 1 is lowest precedence. + BinopPrecedence['<'] = 10; + BinopPrecedence['+'] = 20; + BinopPrecedence['-'] = 20; + BinopPrecedence['*'] = 40; // highest. + + // Prime the first token. + fprintf(stderr, "ready> "); + getNextToken(); + + // Run the main "interpreter loop" now. + MainLoop(); + + return 0; + } + +`Next: Implementing Code Generation to LLVM IR <LangImpl3.html>`_ + diff --git a/docs/tutorial/LangImpl3.html b/docs/tutorial/LangImpl3.html deleted file mode 100644 index 57ff7373f6..0000000000 --- a/docs/tutorial/LangImpl3.html +++ /dev/null @@ -1,1268 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Implementing code generation to LLVM IR</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <link rel="stylesheet" href="../_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>Kaleidoscope: Code generation to LLVM IR</h1> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 3 - <ol> - <li><a href="#intro">Chapter 3 Introduction</a></li> - <li><a href="#basics">Code Generation Setup</a></li> - <li><a href="#exprs">Expression Code Generation</a></li> - <li><a href="#funcs">Function Code Generation</a></li> - <li><a href="#driver">Driver Changes and Closing Thoughts</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="LangImpl4.html">Chapter 4</a>: Adding JIT and Optimizer -Support</li> -</ul> - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="intro">Chapter 3 Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Welcome to Chapter 3 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. 
This chapter shows you how to transform the <a -href="LangImpl2.html">Abstract Syntax Tree</a>, built in Chapter 2, into LLVM IR. -This will teach you a little bit about how LLVM does things, as well as -demonstrate how easy it is to use. It's much more work to build a lexer and -parser than it is to generate LLVM IR code. :) -</p> - -<p><b>Please note</b>: the code in this chapter and later require LLVM 2.2 or -later. LLVM 2.1 and before will not work with it. Also note that you need -to use a version of this tutorial that matches your LLVM release: If you are -using an official LLVM release, use the version of the documentation included -with your release or on the <a href="http://llvm.org/releases/">llvm.org -releases page</a>.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="basics">Code Generation Setup</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -In order to generate LLVM IR, we want some simple setup to get started. First -we define virtual code generation (codegen) methods in each AST class:</p> - -<div class="doc_code"> -<pre> -/// ExprAST - Base class for all expression nodes. -class ExprAST { -public: - virtual ~ExprAST() {} - <b>virtual Value *Codegen() = 0;</b> -}; - -/// NumberExprAST - Expression class for numeric literals like "1.0". -class NumberExprAST : public ExprAST { - double Val; -public: - NumberExprAST(double val) : Val(val) {} - <b>virtual Value *Codegen();</b> -}; -... -</pre> -</div> - -<p>The Codegen() method says to emit IR for that AST node along with all the things it -depends on, and they all return an LLVM Value object. -"Value" is the class used to represent a "<a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single -Assignment (SSA)</a> register" or "SSA value" in LLVM. The most distinct aspect -of SSA values is that their value is computed as the related instruction -executes, and it does not get a new value until (and if) the instruction -re-executes. In other words, there is no way to "change" an SSA value. For -more information, please read up on <a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single -Assignment</a> - the concepts are really quite natural once you grok them.</p> - -<p>Note that instead of adding virtual methods to the ExprAST class hierarchy, -it could also make sense to use a <a -href="http://en.wikipedia.org/wiki/Visitor_pattern">visitor pattern</a> or some -other way to model this. Again, this tutorial won't dwell on good software -engineering practices: for our purposes, adding a virtual method is -simplest.</p> - -<p>The -second thing we want is an "Error" method like we used for the parser, which will -be used to report errors found during code generation (for example, use of an -undeclared parameter):</p> - -<div class="doc_code"> -<pre> -Value *ErrorV(const char *Str) { Error(Str); return 0; } - -static Module *TheModule; -static IRBuilder<> Builder(getGlobalContext()); -static std::map<std::string, Value*> NamedValues; -</pre> -</div> - -<p>The static variables will be used during code generation. <tt>TheModule</tt> -is the LLVM construct that contains all of the functions and global variables in -a chunk of code. In many ways, it is the top-level structure that the LLVM IR -uses to contain code.</p> - -<p>The <tt>Builder</tt> object is a helper object that makes it easy to generate -LLVM instructions. 
Instances of the <a -href="http://llvm.org/doxygen/IRBuilder_8h-source.html"><tt>IRBuilder</tt></a> -class template keep track of the current place to insert instructions and has -methods to create new instructions.</p> - -<p>The <tt>NamedValues</tt> map keeps track of which values are defined in the -current scope and what their LLVM representation is. (In other words, it is a -symbol table for the code). In this form of Kaleidoscope, the only things that -can be referenced are function parameters. As such, function parameters will -be in this map when generating code for their function body.</p> - -<p> -With these basics in place, we can start talking about how to generate code for -each expression. Note that this assumes that the <tt>Builder</tt> has been set -up to generate code <em>into</em> something. For now, we'll assume that this -has already been done, and we'll just use it to emit code. -</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="exprs">Expression Code Generation</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Generating LLVM code for expression nodes is very straightforward: less -than 45 lines of commented code for all four of our expression nodes. First -we'll do numeric literals:</p> - -<div class="doc_code"> -<pre> -Value *NumberExprAST::Codegen() { - return ConstantFP::get(getGlobalContext(), APFloat(Val)); -} -</pre> -</div> - -<p>In the LLVM IR, numeric constants are represented with the -<tt>ConstantFP</tt> class, which holds the numeric value in an <tt>APFloat</tt> -internally (<tt>APFloat</tt> has the capability of holding floating point -constants of <em>A</em>rbitrary <em>P</em>recision). This code basically just -creates and returns a <tt>ConstantFP</tt>. Note that in the LLVM IR -that constants are all uniqued together and shared. For this reason, the API -uses the "foo::get(...)" idiom instead of "new foo(..)" or "foo::Create(..)".</p> - -<div class="doc_code"> -<pre> -Value *VariableExprAST::Codegen() { - // Look this variable up in the function. - Value *V = NamedValues[Name]; - return V ? V : ErrorV("Unknown variable name"); -} -</pre> -</div> - -<p>References to variables are also quite simple using LLVM. In the simple version -of Kaleidoscope, we assume that the variable has already been emitted somewhere -and its value is available. In practice, the only values that can be in the -<tt>NamedValues</tt> map are function arguments. This -code simply checks to see that the specified name is in the map (if not, an -unknown variable is being referenced) and returns the value for it. In future -chapters, we'll add support for <a href="LangImpl5.html#for">loop induction -variables</a> in the symbol table, and for <a -href="LangImpl7.html#localvars">local variables</a>.</p> - -<div class="doc_code"> -<pre> -Value *BinaryExprAST::Codegen() { - Value *L = LHS->Codegen(); - Value *R = RHS->Codegen(); - if (L == 0 || R == 0) return 0; - - switch (Op) { - case '+': return Builder.CreateFAdd(L, R, "addtmp"); - case '-': return Builder.CreateFSub(L, R, "subtmp"); - case '*': return Builder.CreateFMul(L, R, "multmp"); - case '<': - L = Builder.CreateFCmpULT(L, R, "cmptmp"); - // Convert bool 0/1 to double 0.0 or 1.0 - return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), - "booltmp"); - default: return ErrorV("invalid binary operator"); - } -} -</pre> -</div> - -<p>Binary operators start to get more interesting. 
The basic idea here is that -we recursively emit code for the left-hand side of the expression, then the -right-hand side, then we compute the result of the binary expression. In this -code, we do a simple switch on the opcode to create the right LLVM instruction. -</p> - -<p>In the example above, the LLVM builder class is starting to show its value. -IRBuilder knows where to insert the newly created instruction, all you have to -do is specify what instruction to create (e.g. with <tt>CreateFAdd</tt>), which -operands to use (<tt>L</tt> and <tt>R</tt> here) and optionally provide a name -for the generated instruction.</p> - -<p>One nice thing about LLVM is that the name is just a hint. For instance, if -the code above emits multiple "addtmp" variables, LLVM will automatically -provide each one with an increasing, unique numeric suffix. Local value names -for instructions are purely optional, but it makes it much easier to read the -IR dumps.</p> - -<p><a href="../LangRef.html#instref">LLVM instructions</a> are constrained by -strict rules: for example, the Left and Right operators of -an <a href="../LangRef.html#i_add">add instruction</a> must have the same -type, and the result type of the add must match the operand types. Because -all values in Kaleidoscope are doubles, this makes for very simple code for add, -sub and mul.</p> - -<p>On the other hand, LLVM specifies that the <a -href="../LangRef.html#i_fcmp">fcmp instruction</a> always returns an 'i1' value -(a one bit integer). The problem with this is that Kaleidoscope wants the value to be a 0.0 or 1.0 value. In order to get these semantics, we combine the fcmp instruction with -a <a href="../LangRef.html#i_uitofp">uitofp instruction</a>. This instruction -converts its input integer into a floating point value by treating the input -as an unsigned value. In contrast, if we used the <a -href="../LangRef.html#i_sitofp">sitofp instruction</a>, the Kaleidoscope '<' -operator would return 0.0 and -1.0, depending on the input value.</p> - -<div class="doc_code"> -<pre> -Value *CallExprAST::Codegen() { - // Look up the name in the global module table. - Function *CalleeF = TheModule->getFunction(Callee); - if (CalleeF == 0) - return ErrorV("Unknown function referenced"); - - // If argument mismatch error. - if (CalleeF->arg_size() != Args.size()) - return ErrorV("Incorrect # arguments passed"); - - std::vector<Value*> ArgsV; - for (unsigned i = 0, e = Args.size(); i != e; ++i) { - ArgsV.push_back(Args[i]->Codegen()); - if (ArgsV.back() == 0) return 0; - } - - return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); -} -</pre> -</div> - -<p>Code generation for function calls is quite straightforward with LLVM. The -code above initially does a function name lookup in the LLVM Module's symbol -table. Recall that the LLVM Module is the container that holds all of the -functions we are JIT'ing. By giving each function the same name as what the -user specifies, we can use the LLVM symbol table to resolve function names for -us.</p> - -<p>Once we have the function to call, we recursively codegen each argument that -is to be passed in, and create an LLVM <a href="../LangRef.html#i_call">call -instruction</a>. Note that LLVM uses the native C calling conventions by -default, allowing these calls to also call into standard library functions like -"sin" and "cos", with no additional effort.</p> - -<p>This wraps up our handling of the four basic expressions that we have so far -in Kaleidoscope. Feel free to go in and add some more. 
For example, by -browsing the <a href="../LangRef.html">LLVM language reference</a> you'll find -several other interesting instructions that are really easy to plug into our -basic framework.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="funcs">Function Code Generation</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Code generation for prototypes and functions must handle a number of -details, which make their code less beautiful than expression code -generation, but allows us to illustrate some important points. First, lets -talk about code generation for prototypes: they are used both for function -bodies and external function declarations. The code starts with:</p> - -<div class="doc_code"> -<pre> -Function *PrototypeAST::Codegen() { - // Make the function type: double(double,double) etc. - std::vector<Type*> Doubles(Args.size(), - Type::getDoubleTy(getGlobalContext())); - FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), - Doubles, false); - - Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); -</pre> -</div> - -<p>This code packs a lot of power into a few lines. Note first that this -function returns a "Function*" instead of a "Value*". Because a "prototype" -really talks about the external interface for a function (not the value computed -by an expression), it makes sense for it to return the LLVM Function it -corresponds to when codegen'd.</p> - -<p>The call to <tt>FunctionType::get</tt> creates -the <tt>FunctionType</tt> that should be used for a given Prototype. Since all -function arguments in Kaleidoscope are of type double, the first line creates -a vector of "N" LLVM double types. It then uses the <tt>Functiontype::get</tt> -method to create a function type that takes "N" doubles as arguments, returns -one double as a result, and that is not vararg (the false parameter indicates -this). Note that Types in LLVM are uniqued just like Constants are, so you -don't "new" a type, you "get" it.</p> - -<p>The final line above actually creates the function that the prototype will -correspond to. This indicates the type, linkage and name to use, as well as which -module to insert into. "<a href="../LangRef.html#linkage">external linkage</a>" -means that the function may be defined outside the current module and/or that it -is callable by functions outside the module. The Name passed in is the name the -user specified: since "<tt>TheModule</tt>" is specified, this name is registered -in "<tt>TheModule</tt>"s symbol table, which is used by the function call code -above.</p> - -<div class="doc_code"> -<pre> - // If F conflicted, there was already something named 'Name'. If it has a - // body, don't allow redefinition or reextern. - if (F->getName() != Name) { - // Delete the one we just made and get the existing one. - F->eraseFromParent(); - F = TheModule->getFunction(Name); -</pre> -</div> - -<p>The Module symbol table works just like the Function symbol table when it -comes to name conflicts: if a new function is created with a name that was previously -added to the symbol table, the new function will get implicitly renamed when added to the -Module. 
The code above exploits this fact to determine if there was a previous -definition of this function.</p> - -<p>In Kaleidoscope, I choose to allow redefinitions of functions in two cases: -first, we want to allow 'extern'ing a function more than once, as long as the -prototypes for the externs match (since all arguments have the same type, we -just have to check that the number of arguments match). Second, we want to -allow 'extern'ing a function and then defining a body for it. This is useful -when defining mutually recursive functions.</p> - -<p>In order to implement this, the code above first checks to see if there is -a collision on the name of the function. If so, it deletes the function we just -created (by calling <tt>eraseFromParent</tt>) and then calling -<tt>getFunction</tt> to get the existing function with the specified name. Note -that many APIs in LLVM have "erase" forms and "remove" forms. The "remove" form -unlinks the object from its parent (e.g. a Function from a Module) and returns -it. The "erase" form unlinks the object and then deletes it.</p> - -<div class="doc_code"> -<pre> - // If F already has a body, reject this. - if (!F->empty()) { - ErrorF("redefinition of function"); - return 0; - } - - // If F took a different number of args, reject. - if (F->arg_size() != Args.size()) { - ErrorF("redefinition of function with different # args"); - return 0; - } - } -</pre> -</div> - -<p>In order to verify the logic above, we first check to see if the pre-existing -function is "empty". In this case, empty means that it has no basic blocks in -it, which means it has no body. If it has no body, it is a forward -declaration. Since we don't allow anything after a full definition of the -function, the code rejects this case. If the previous reference to a function -was an 'extern', we simply verify that the number of arguments for that -definition and this one match up. If not, we emit an error.</p> - -<div class="doc_code"> -<pre> - // Set names for all arguments. - unsigned Idx = 0; - for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); - ++AI, ++Idx) { - AI->setName(Args[Idx]); - - // Add arguments to variable symbol table. - NamedValues[Args[Idx]] = AI; - } - return F; -} -</pre> -</div> - -<p>The last bit of code for prototypes loops over all of the arguments in the -function, setting the name of the LLVM Argument objects to match, and registering -the arguments in the <tt>NamedValues</tt> map for future use by the -<tt>VariableExprAST</tt> AST node. Once this is set up, it returns the Function -object to the caller. Note that we don't check for conflicting -argument names here (e.g. "extern foo(a b a)"). Doing so would be very -straight-forward with the mechanics we have already used above.</p> - -<div class="doc_code"> -<pre> -Function *FunctionAST::Codegen() { - NamedValues.clear(); - - Function *TheFunction = Proto->Codegen(); - if (TheFunction == 0) - return 0; -</pre> -</div> - -<p>Code generation for function definitions starts out simply enough: we just -codegen the prototype (Proto) and verify that it is ok. We then clear out the -<tt>NamedValues</tt> map to make sure that there isn't anything in it from the -last function we compiled. Code generation of the prototype ensures that there -is an LLVM Function object that is ready to go for us.</p> - -<div class="doc_code"> -<pre> - // Create a new basic block to start insertion into. 
- BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); - Builder.SetInsertPoint(BB); - - if (Value *RetVal = Body->Codegen()) { -</pre> -</div> - -<p>Now we get to the point where the <tt>Builder</tt> is set up. The first -line creates a new <a href="http://en.wikipedia.org/wiki/Basic_block">basic -block</a> (named "entry"), which is inserted into <tt>TheFunction</tt>. The -second line then tells the builder that new instructions should be inserted into -the end of the new basic block. Basic blocks in LLVM are an important part -of functions that define the <a -href="http://en.wikipedia.org/wiki/Control_flow_graph">Control Flow Graph</a>. -Since we don't have any control flow, our functions will only contain one -block at this point. We'll fix this in <a href="LangImpl5.html">Chapter 5</a> :).</p> - -<div class="doc_code"> -<pre> - if (Value *RetVal = Body->Codegen()) { - // Finish off the function. - Builder.CreateRet(RetVal); - - // Validate the generated code, checking for consistency. - verifyFunction(*TheFunction); - - return TheFunction; - } -</pre> -</div> - -<p>Once the insertion point is set up, we call the <tt>CodeGen()</tt> method for -the root expression of the function. If no error happens, this emits code to -compute the expression into the entry block and returns the value that was -computed. Assuming no error, we then create an LLVM <a -href="../LangRef.html#i_ret">ret instruction</a>, which completes the function. -Once the function is built, we call <tt>verifyFunction</tt>, which -is provided by LLVM. This function does a variety of consistency checks on the -generated code, to determine if our compiler is doing everything right. Using -this is important: it can catch a lot of bugs. Once the function is finished -and validated, we return it.</p> - -<div class="doc_code"> -<pre> - // Error reading body, remove function. - TheFunction->eraseFromParent(); - return 0; -} -</pre> -</div> - -<p>The only piece left here is handling of the error case. For simplicity, we -handle this by merely deleting the function we produced with the -<tt>eraseFromParent</tt> method. This allows the user to redefine a function -that they incorrectly typed in before: if we didn't delete it, it would live in -the symbol table, with a body, preventing future redefinition.</p> - -<p>This code does have a bug, though. Since the <tt>PrototypeAST::Codegen</tt> -can return a previously defined forward declaration, our code can actually delete -a forward declaration. There are a number of ways to fix this bug, see what you -can come up with! Here is a testcase:</p> - -<div class="doc_code"> -<pre> -extern foo(a b); # ok, defines foo. -def foo(a b) c; # error, 'c' is invalid. -def bar() foo(1, 2); # error, unknown function "foo" -</pre> -</div> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="driver">Driver Changes and Closing Thoughts</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -For now, code generation to LLVM doesn't really get us much, except that we can -look at the pretty IR calls. The sample code inserts calls to Codegen into the -"<tt>HandleDefinition</tt>", "<tt>HandleExtern</tt>" etc functions, and then -dumps out the LLVM IR. This gives a nice way to look at the LLVM IR for simple -functions. 
For example: -</p> - -<div class="doc_code"> -<pre> -ready> <b>4+5</b>; -Read top-level expression: -define double @0() { -entry: - ret double 9.000000e+00 -} -</pre> -</div> - -<p>Note how the parser turns the top-level expression into anonymous functions -for us. This will be handy when we add <a href="LangImpl4.html#jit">JIT -support</a> in the next chapter. Also note that the code is very literally -transcribed, no optimizations are being performed except simple constant -folding done by IRBuilder. We will -<a href="LangImpl4.html#trivialconstfold">add optimizations</a> explicitly in -the next chapter.</p> - -<div class="doc_code"> -<pre> -ready> <b>def foo(a b) a*a + 2*a*b + b*b;</b> -Read function definition: -define double @foo(double %a, double %b) { -entry: - %multmp = fmul double %a, %a - %multmp1 = fmul double 2.000000e+00, %a - %multmp2 = fmul double %multmp1, %b - %addtmp = fadd double %multmp, %multmp2 - %multmp3 = fmul double %b, %b - %addtmp4 = fadd double %addtmp, %multmp3 - ret double %addtmp4 -} -</pre> -</div> - -<p>This shows some simple arithmetic. Notice the striking similarity to the -LLVM builder calls that we use to create the instructions.</p> - -<div class="doc_code"> -<pre> -ready> <b>def bar(a) foo(a, 4.0) + bar(31337);</b> -Read function definition: -define double @bar(double %a) { -entry: - %calltmp = call double @foo(double %a, double 4.000000e+00) - %calltmp1 = call double @bar(double 3.133700e+04) - %addtmp = fadd double %calltmp, %calltmp1 - ret double %addtmp -} -</pre> -</div> - -<p>This shows some function calls. Note that this function will take a long -time to execute if you call it. In the future we'll add conditional control -flow to actually make recursion useful :).</p> - -<div class="doc_code"> -<pre> -ready> <b>extern cos(x);</b> -Read extern: -declare double @cos(double) - -ready> <b>cos(1.234);</b> -Read top-level expression: -define double @1() { -entry: - %calltmp = call double @cos(double 1.234000e+00) - ret double %calltmp -} -</pre> -</div> - -<p>This shows an extern for the libm "cos" function, and a call to it.</p> - - -<div class="doc_code"> -<pre> -ready> <b>^D</b> -; ModuleID = 'my cool jit' - -define double @0() { -entry: - %addtmp = fadd double 4.000000e+00, 5.000000e+00 - ret double %addtmp -} - -define double @foo(double %a, double %b) { -entry: - %multmp = fmul double %a, %a - %multmp1 = fmul double 2.000000e+00, %a - %multmp2 = fmul double %multmp1, %b - %addtmp = fadd double %multmp, %multmp2 - %multmp3 = fmul double %b, %b - %addtmp4 = fadd double %addtmp, %multmp3 - ret double %addtmp4 -} - -define double @bar(double %a) { -entry: - %calltmp = call double @foo(double %a, double 4.000000e+00) - %calltmp1 = call double @bar(double 3.133700e+04) - %addtmp = fadd double %calltmp, %calltmp1 - ret double %addtmp -} - -declare double @cos(double) - -define double @1() { -entry: - %calltmp = call double @cos(double 1.234000e+00) - ret double %calltmp -} -</pre> -</div> - -<p>When you quit the current demo, it dumps out the IR for the entire module -generated. Here you can see the big picture with all the functions referencing -each other.</p> - -<p>This wraps up the third chapter of the Kaleidoscope tutorial. 
Up next, we'll -describe how to <a href="LangImpl4.html">add JIT codegen and optimizer -support</a> to this so we can actually start running code!</p> - -</div> - - -<!-- *********************************************************************** --> -<h2><a name="code">Full Code Listing</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -Here is the complete code listing for our running example, enhanced with the -LLVM code generator. Because this uses the LLVM libraries, we need to link -them in. To do this, we use the <a -href="http://llvm.org/cmds/llvm-config.html">llvm-config</a> tool to inform -our makefile/command line about which options to use:</p> - -<div class="doc_code"> -<pre> -# Compile -clang++ -g -O3 toy.cpp `llvm-config --cppflags --ldflags --libs core` -o toy -# Run -./toy -</pre> -</div> - -<p>Here is the code:</p> - -<div class="doc_code"> -<pre> -// To build this: -// See example below. - -#include "llvm/DerivedTypes.h" -#include "llvm/IRBuilder.h" -#include "llvm/LLVMContext.h" -#include "llvm/Module.h" -#include "llvm/Analysis/Verifier.h" -#include <cstdio> -#include <string> -#include <map> -#include <vector> -using namespace llvm; - -//===----------------------------------------------------------------------===// -// Lexer -//===----------------------------------------------------------------------===// - -// The lexer returns tokens [0-255] if it is an unknown character, otherwise one -// of these for known things. -enum Token { - tok_eof = -1, - - // commands - tok_def = -2, tok_extern = -3, - - // primary - tok_identifier = -4, tok_number = -5 -}; - -static std::string IdentifierStr; // Filled in if tok_identifier -static double NumVal; // Filled in if tok_number - -/// gettok - Return the next token from standard input. -static int gettok() { - static int LastChar = ' '; - - // Skip any whitespace. - while (isspace(LastChar)) - LastChar = getchar(); - - if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* - IdentifierStr = LastChar; - while (isalnum((LastChar = getchar()))) - IdentifierStr += LastChar; - - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - return tok_identifier; - } - - if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ - std::string NumStr; - do { - NumStr += LastChar; - LastChar = getchar(); - } while (isdigit(LastChar) || LastChar == '.'); - - NumVal = strtod(NumStr.c_str(), 0); - return tok_number; - } - - if (LastChar == '#') { - // Comment until end of line. - do LastChar = getchar(); - while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); - - if (LastChar != EOF) - return gettok(); - } - - // Check for end of file. Don't eat the EOF. - if (LastChar == EOF) - return tok_eof; - - // Otherwise, just return the character as its ascii value. - int ThisChar = LastChar; - LastChar = getchar(); - return ThisChar; -} - -//===----------------------------------------------------------------------===// -// Abstract Syntax Tree (aka Parse Tree) -//===----------------------------------------------------------------------===// - -/// ExprAST - Base class for all expression nodes. -class ExprAST { -public: - virtual ~ExprAST() {} - virtual Value *Codegen() = 0; -}; - -/// NumberExprAST - Expression class for numeric literals like "1.0". 
-class NumberExprAST : public ExprAST { - double Val; -public: - NumberExprAST(double val) : Val(val) {} - virtual Value *Codegen(); -}; - -/// VariableExprAST - Expression class for referencing a variable, like "a". -class VariableExprAST : public ExprAST { - std::string Name; -public: - VariableExprAST(const std::string &name) : Name(name) {} - virtual Value *Codegen(); -}; - -/// BinaryExprAST - Expression class for a binary operator. -class BinaryExprAST : public ExprAST { - char Op; - ExprAST *LHS, *RHS; -public: - BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) - : Op(op), LHS(lhs), RHS(rhs) {} - virtual Value *Codegen(); -}; - -/// CallExprAST - Expression class for function calls. -class CallExprAST : public ExprAST { - std::string Callee; - std::vector<ExprAST*> Args; -public: - CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) - : Callee(callee), Args(args) {} - virtual Value *Codegen(); -}; - -/// PrototypeAST - This class represents the "prototype" for a function, -/// which captures its name, and its argument names (thus implicitly the number -/// of arguments the function takes). -class PrototypeAST { - std::string Name; - std::vector<std::string> Args; -public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args) - : Name(name), Args(args) {} - - Function *Codegen(); -}; - -/// FunctionAST - This class represents a function definition itself. -class FunctionAST { - PrototypeAST *Proto; - ExprAST *Body; -public: - FunctionAST(PrototypeAST *proto, ExprAST *body) - : Proto(proto), Body(body) {} - - Function *Codegen(); -}; - -//===----------------------------------------------------------------------===// -// Parser -//===----------------------------------------------------------------------===// - -/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current -/// token the parser is looking at. getNextToken reads another token from the -/// lexer and updates CurTok with its results. -static int CurTok; -static int getNextToken() { - return CurTok = gettok(); -} - -/// BinopPrecedence - This holds the precedence for each binary operator that is -/// defined. -static std::map<char, int> BinopPrecedence; - -/// GetTokPrecedence - Get the precedence of the pending binary operator token. -static int GetTokPrecedence() { - if (!isascii(CurTok)) - return -1; - - // Make sure it's a declared binop. - int TokPrec = BinopPrecedence[CurTok]; - if (TokPrec <= 0) return -1; - return TokPrec; -} - -/// Error* - These are little helper functions for error handling. -ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} -PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } -FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } - -static ExprAST *ParseExpression(); - -/// identifierexpr -/// ::= identifier -/// ::= identifier '(' expression* ')' -static ExprAST *ParseIdentifierExpr() { - std::string IdName = IdentifierStr; - - getNextToken(); // eat identifier. - - if (CurTok != '(') // Simple variable ref. - return new VariableExprAST(IdName); - - // Call. - getNextToken(); // eat ( - std::vector<ExprAST*> Args; - if (CurTok != ')') { - while (1) { - ExprAST *Arg = ParseExpression(); - if (!Arg) return 0; - Args.push_back(Arg); - - if (CurTok == ')') break; - - if (CurTok != ',') - return Error("Expected ')' or ',' in argument list"); - getNextToken(); - } - } - - // Eat the ')'. 
- getNextToken(); - - return new CallExprAST(IdName, Args); -} - -/// numberexpr ::= number -static ExprAST *ParseNumberExpr() { - ExprAST *Result = new NumberExprAST(NumVal); - getNextToken(); // consume the number - return Result; -} - -/// parenexpr ::= '(' expression ')' -static ExprAST *ParseParenExpr() { - getNextToken(); // eat (. - ExprAST *V = ParseExpression(); - if (!V) return 0; - - if (CurTok != ')') - return Error("expected ')'"); - getNextToken(); // eat ). - return V; -} - -/// primary -/// ::= identifierexpr -/// ::= numberexpr -/// ::= parenexpr -static ExprAST *ParsePrimary() { - switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - } -} - -/// binoprhs -/// ::= ('+' primary)* -static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { - // If this is a binop, find its precedence. - while (1) { - int TokPrec = GetTokPrecedence(); - - // If this is a binop that binds at least as tightly as the current binop, - // consume it, otherwise we are done. - if (TokPrec < ExprPrec) - return LHS; - - // Okay, we know this is a binop. - int BinOp = CurTok; - getNextToken(); // eat binop - - // Parse the primary expression after the binary operator. - ExprAST *RHS = ParsePrimary(); - if (!RHS) return 0; - - // If BinOp binds less tightly with RHS than the operator after RHS, let - // the pending operator take RHS as its LHS. - int NextPrec = GetTokPrecedence(); - if (TokPrec < NextPrec) { - RHS = ParseBinOpRHS(TokPrec+1, RHS); - if (RHS == 0) return 0; - } - - // Merge LHS/RHS. - LHS = new BinaryExprAST(BinOp, LHS, RHS); - } -} - -/// expression -/// ::= primary binoprhs -/// -static ExprAST *ParseExpression() { - ExprAST *LHS = ParsePrimary(); - if (!LHS) return 0; - - return ParseBinOpRHS(0, LHS); -} - -/// prototype -/// ::= id '(' id* ')' -static PrototypeAST *ParsePrototype() { - if (CurTok != tok_identifier) - return ErrorP("Expected function name in prototype"); - - std::string FnName = IdentifierStr; - getNextToken(); - - if (CurTok != '(') - return ErrorP("Expected '(' in prototype"); - - std::vector<std::string> ArgNames; - while (getNextToken() == tok_identifier) - ArgNames.push_back(IdentifierStr); - if (CurTok != ')') - return ErrorP("Expected ')' in prototype"); - - // success. - getNextToken(); // eat ')'. - - return new PrototypeAST(FnName, ArgNames); -} - -/// definition ::= 'def' prototype expression -static FunctionAST *ParseDefinition() { - getNextToken(); // eat def. - PrototypeAST *Proto = ParsePrototype(); - if (Proto == 0) return 0; - - if (ExprAST *E = ParseExpression()) - return new FunctionAST(Proto, E); - return 0; -} - -/// toplevelexpr ::= expression -static FunctionAST *ParseTopLevelExpr() { - if (ExprAST *E = ParseExpression()) { - // Make an anonymous proto. - PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); - return new FunctionAST(Proto, E); - } - return 0; -} - -/// external ::= 'extern' prototype -static PrototypeAST *ParseExtern() { - getNextToken(); // eat extern. 
- return ParsePrototype(); -} - -//===----------------------------------------------------------------------===// -// Code Generation -//===----------------------------------------------------------------------===// - -static Module *TheModule; -static IRBuilder<> Builder(getGlobalContext()); -static std::map<std::string, Value*> NamedValues; - -Value *ErrorV(const char *Str) { Error(Str); return 0; } - -Value *NumberExprAST::Codegen() { - return ConstantFP::get(getGlobalContext(), APFloat(Val)); -} - -Value *VariableExprAST::Codegen() { - // Look this variable up in the function. - Value *V = NamedValues[Name]; - return V ? V : ErrorV("Unknown variable name"); -} - -Value *BinaryExprAST::Codegen() { - Value *L = LHS->Codegen(); - Value *R = RHS->Codegen(); - if (L == 0 || R == 0) return 0; - - switch (Op) { - case '+': return Builder.CreateFAdd(L, R, "addtmp"); - case '-': return Builder.CreateFSub(L, R, "subtmp"); - case '*': return Builder.CreateFMul(L, R, "multmp"); - case '<': - L = Builder.CreateFCmpULT(L, R, "cmptmp"); - // Convert bool 0/1 to double 0.0 or 1.0 - return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), - "booltmp"); - default: return ErrorV("invalid binary operator"); - } -} - -Value *CallExprAST::Codegen() { - // Look up the name in the global module table. - Function *CalleeF = TheModule->getFunction(Callee); - if (CalleeF == 0) - return ErrorV("Unknown function referenced"); - - // If argument mismatch error. - if (CalleeF->arg_size() != Args.size()) - return ErrorV("Incorrect # arguments passed"); - - std::vector<Value*> ArgsV; - for (unsigned i = 0, e = Args.size(); i != e; ++i) { - ArgsV.push_back(Args[i]->Codegen()); - if (ArgsV.back() == 0) return 0; - } - - return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); -} - -Function *PrototypeAST::Codegen() { - // Make the function type: double(double,double) etc. - std::vector<Type*> Doubles(Args.size(), - Type::getDoubleTy(getGlobalContext())); - FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), - Doubles, false); - - Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); - - // If F conflicted, there was already something named 'Name'. If it has a - // body, don't allow redefinition or reextern. - if (F->getName() != Name) { - // Delete the one we just made and get the existing one. - F->eraseFromParent(); - F = TheModule->getFunction(Name); - - // If F already has a body, reject this. - if (!F->empty()) { - ErrorF("redefinition of function"); - return 0; - } - - // If F took a different number of args, reject. - if (F->arg_size() != Args.size()) { - ErrorF("redefinition of function with different # args"); - return 0; - } - } - - // Set names for all arguments. - unsigned Idx = 0; - for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); - ++AI, ++Idx) { - AI->setName(Args[Idx]); - - // Add arguments to variable symbol table. - NamedValues[Args[Idx]] = AI; - } - - return F; -} - -Function *FunctionAST::Codegen() { - NamedValues.clear(); - - Function *TheFunction = Proto->Codegen(); - if (TheFunction == 0) - return 0; - - // Create a new basic block to start insertion into. - BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); - Builder.SetInsertPoint(BB); - - if (Value *RetVal = Body->Codegen()) { - // Finish off the function. - Builder.CreateRet(RetVal); - - // Validate the generated code, checking for consistency. 
- verifyFunction(*TheFunction); - - return TheFunction; - } - - // Error reading body, remove function. - TheFunction->eraseFromParent(); - return 0; -} - -//===----------------------------------------------------------------------===// -// Top-Level parsing and JIT Driver -//===----------------------------------------------------------------------===// - -static void HandleDefinition() { - if (FunctionAST *F = ParseDefinition()) { - if (Function *LF = F->Codegen()) { - fprintf(stderr, "Read function definition:"); - LF->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleExtern() { - if (PrototypeAST *P = ParseExtern()) { - if (Function *F = P->Codegen()) { - fprintf(stderr, "Read extern: "); - F->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleTopLevelExpression() { - // Evaluate a top-level expression into an anonymous function. - if (FunctionAST *F = ParseTopLevelExpr()) { - if (Function *LF = F->Codegen()) { - fprintf(stderr, "Read top-level expression:"); - LF->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -/// top ::= definition | external | expression | ';' -static void MainLoop() { - while (1) { - fprintf(stderr, "ready> "); - switch (CurTok) { - case tok_eof: return; - case ';': getNextToken(); break; // ignore top-level semicolons. - case tok_def: HandleDefinition(); break; - case tok_extern: HandleExtern(); break; - default: HandleTopLevelExpression(); break; - } - } -} - -//===----------------------------------------------------------------------===// -// "Library" functions that can be "extern'd" from user code. -//===----------------------------------------------------------------------===// - -/// putchard - putchar that takes a double and returns 0. -extern "C" -double putchard(double X) { - putchar((char)X); - return 0; -} - -//===----------------------------------------------------------------------===// -// Main driver code. -//===----------------------------------------------------------------------===// - -int main() { - LLVMContext &Context = getGlobalContext(); - - // Install standard binary operators. - // 1 is lowest precedence. - BinopPrecedence['<'] = 10; - BinopPrecedence['+'] = 20; - BinopPrecedence['-'] = 20; - BinopPrecedence['*'] = 40; // highest. - - // Prime the first token. - fprintf(stderr, "ready> "); - getNextToken(); - - // Make the module, which holds all the code. - TheModule = new Module("my cool jit", Context); - - // Run the main "interpreter loop" now. - MainLoop(); - - // Print out all of the generated code. 
- TheModule->dump(); - - return 0; -} -</pre> -</div> -<a href="LangImpl4.html">Next: Adding JIT and Optimizer Support</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/LangImpl3.rst b/docs/tutorial/LangImpl3.rst new file mode 100644 index 0000000000..01935a443b --- /dev/null +++ b/docs/tutorial/LangImpl3.rst @@ -0,0 +1,1162 @@ +======================================== +Kaleidoscope: Code generation to LLVM IR +======================================== + +.. contents:: + :local: + +Written by `Chris Lattner <mailto:sabre@nondot.org>`_ + +Chapter 3 Introduction +====================== + +Welcome to Chapter 3 of the "`Implementing a language with +LLVM <index.html>`_" tutorial. This chapter shows you how to transform +the `Abstract Syntax Tree <LangImpl2.html>`_, built in Chapter 2, into +LLVM IR. This will teach you a little bit about how LLVM does things, as +well as demonstrate how easy it is to use. It's much more work to build +a lexer and parser than it is to generate LLVM IR code. :) + +**Please note**: the code in this chapter and later require LLVM 2.2 or +later. LLVM 2.1 and before will not work with it. Also note that you +need to use a version of this tutorial that matches your LLVM release: +If you are using an official LLVM release, use the version of the +documentation included with your release or on the `llvm.org releases +page <http://llvm.org/releases/>`_. + +Code Generation Setup +===================== + +In order to generate LLVM IR, we want some simple setup to get started. +First we define virtual code generation (codegen) methods in each AST +class: + +.. code-block:: c++ + + /// ExprAST - Base class for all expression nodes. + class ExprAST { + public: + virtual ~ExprAST() {} + virtual Value *Codegen() = 0; + }; + + /// NumberExprAST - Expression class for numeric literals like "1.0". + class NumberExprAST : public ExprAST { + double Val; + public: + NumberExprAST(double val) : Val(val) {} + virtual Value *Codegen(); + }; + ... + +The Codegen() method says to emit IR for that AST node along with all +the things it depends on, and they all return an LLVM Value object. +"Value" is the class used to represent a "`Static Single Assignment +(SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_ +register" or "SSA value" in LLVM. The most distinct aspect of SSA values +is that their value is computed as the related instruction executes, and +it does not get a new value until (and if) the instruction re-executes. +In other words, there is no way to "change" an SSA value. For more +information, please read up on `Static Single +Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_ +- the concepts are really quite natural once you grok them. + +Note that instead of adding virtual methods to the ExprAST class +hierarchy, it could also make sense to use a `visitor +pattern <http://en.wikipedia.org/wiki/Visitor_pattern>`_ or some other +way to model this. 
Again, this tutorial won't dwell on good software +engineering practices: for our purposes, adding a virtual method is +simplest. + +The second thing we want is an "Error" method like we used for the +parser, which will be used to report errors found during code generation +(for example, use of an undeclared parameter): + +.. code-block:: c++ + + Value *ErrorV(const char *Str) { Error(Str); return 0; } + + static Module *TheModule; + static IRBuilder<> Builder(getGlobalContext()); + static std::map<std::string, Value*> NamedValues; + +The static variables will be used during code generation. ``TheModule`` +is the LLVM construct that contains all of the functions and global +variables in a chunk of code. In many ways, it is the top-level +structure that the LLVM IR uses to contain code. + +The ``Builder`` object is a helper object that makes it easy to generate +LLVM instructions. Instances of the +```IRBuilder`` <http://llvm.org/doxygen/IRBuilder_8h-source.html>`_ +class template keep track of the current place to insert instructions +and has methods to create new instructions. + +The ``NamedValues`` map keeps track of which values are defined in the +current scope and what their LLVM representation is. (In other words, it +is a symbol table for the code). In this form of Kaleidoscope, the only +things that can be referenced are function parameters. As such, function +parameters will be in this map when generating code for their function +body. + +With these basics in place, we can start talking about how to generate +code for each expression. Note that this assumes that the ``Builder`` +has been set up to generate code *into* something. For now, we'll assume +that this has already been done, and we'll just use it to emit code. + +Expression Code Generation +========================== + +Generating LLVM code for expression nodes is very straightforward: less +than 45 lines of commented code for all four of our expression nodes. +First we'll do numeric literals: + +.. code-block:: c++ + + Value *NumberExprAST::Codegen() { + return ConstantFP::get(getGlobalContext(), APFloat(Val)); + } + +In the LLVM IR, numeric constants are represented with the +``ConstantFP`` class, which holds the numeric value in an ``APFloat`` +internally (``APFloat`` has the capability of holding floating point +constants of Arbitrary Precision). This code basically just creates +and returns a ``ConstantFP``. Note that in the LLVM IR that constants +are all uniqued together and shared. For this reason, the API uses the +"foo::get(...)" idiom instead of "new foo(..)" or "foo::Create(..)". + +.. code-block:: c++ + + Value *VariableExprAST::Codegen() { + // Look this variable up in the function. + Value *V = NamedValues[Name]; + return V ? V : ErrorV("Unknown variable name"); + } + +References to variables are also quite simple using LLVM. In the simple +version of Kaleidoscope, we assume that the variable has already been +emitted somewhere and its value is available. In practice, the only +values that can be in the ``NamedValues`` map are function arguments. +This code simply checks to see that the specified name is in the map (if +not, an unknown variable is being referenced) and returns the value for +it. In future chapters, we'll add support for `loop induction +variables <LangImpl5.html#for>`_ in the symbol table, and for `local +variables <LangImpl7.html#localvars>`_. + +.. 
code-block:: c++ + + Value *BinaryExprAST::Codegen() { + Value *L = LHS->Codegen(); + Value *R = RHS->Codegen(); + if (L == 0 || R == 0) return 0; + + switch (Op) { + case '+': return Builder.CreateFAdd(L, R, "addtmp"); + case '-': return Builder.CreateFSub(L, R, "subtmp"); + case '*': return Builder.CreateFMul(L, R, "multmp"); + case '<': + L = Builder.CreateFCmpULT(L, R, "cmptmp"); + // Convert bool 0/1 to double 0.0 or 1.0 + return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), + "booltmp"); + default: return ErrorV("invalid binary operator"); + } + } + +Binary operators start to get more interesting. The basic idea here is +that we recursively emit code for the left-hand side of the expression, +then the right-hand side, then we compute the result of the binary +expression. In this code, we do a simple switch on the opcode to create +the right LLVM instruction. + +In the example above, the LLVM builder class is starting to show its +value. IRBuilder knows where to insert the newly created instruction, +all you have to do is specify what instruction to create (e.g. with +``CreateFAdd``), which operands to use (``L`` and ``R`` here) and +optionally provide a name for the generated instruction. + +One nice thing about LLVM is that the name is just a hint. For instance, +if the code above emits multiple "addtmp" variables, LLVM will +automatically provide each one with an increasing, unique numeric +suffix. Local value names for instructions are purely optional, but it +makes it much easier to read the IR dumps. + +`LLVM instructions <../LangRef.html#instref>`_ are constrained by strict +rules: for example, the Left and Right operators of an `add +instruction <../LangRef.html#i_add>`_ must have the same type, and the +result type of the add must match the operand types. Because all values +in Kaleidoscope are doubles, this makes for very simple code for add, +sub and mul. + +On the other hand, LLVM specifies that the `fcmp +instruction <../LangRef.html#i_fcmp>`_ always returns an 'i1' value (a +one bit integer). The problem with this is that Kaleidoscope wants the +value to be a 0.0 or 1.0 value. In order to get these semantics, we +combine the fcmp instruction with a `uitofp +instruction <../LangRef.html#i_uitofp>`_. This instruction converts its +input integer into a floating point value by treating the input as an +unsigned value. In contrast, if we used the `sitofp +instruction <../LangRef.html#i_sitofp>`_, the Kaleidoscope '<' operator +would return 0.0 and -1.0, depending on the input value. + +.. code-block:: c++ + + Value *CallExprAST::Codegen() { + // Look up the name in the global module table. + Function *CalleeF = TheModule->getFunction(Callee); + if (CalleeF == 0) + return ErrorV("Unknown function referenced"); + + // If argument mismatch error. + if (CalleeF->arg_size() != Args.size()) + return ErrorV("Incorrect # arguments passed"); + + std::vector<Value*> ArgsV; + for (unsigned i = 0, e = Args.size(); i != e; ++i) { + ArgsV.push_back(Args[i]->Codegen()); + if (ArgsV.back() == 0) return 0; + } + + return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); + } + +Code generation for function calls is quite straightforward with LLVM. +The code above initially does a function name lookup in the LLVM +Module's symbol table. Recall that the LLVM Module is the container that +holds all of the functions we are JIT'ing. By giving each function the +same name as what the user specifies, we can use the LLVM symbol table +to resolve function names for us. 
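+
+To make that symbol-table point concrete, here is a small illustrative
+sketch (these lines are not part of the tutorial's code listing): once a
+prototype such as ``extern cos(x);`` has been codegen'd, the module can
+hand the corresponding declaration back to any later call expression
+purely by name.
+
+.. code-block:: c++
+
+    // Illustration only: after "extern cos(x);" has been codegen'd, the
+    // module's symbol table already contains a declaration for @cos, so a
+    // later call expression can resolve it with a plain name lookup.
+    Function *CalleeF = TheModule->getFunction("cos");
+    if (CalleeF == 0)
+      ErrorV("Unknown function referenced");  // would not fire in this case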
+
+Once we have the function to call, we recursively codegen each argument
+that is to be passed in, and create an LLVM `call
+instruction <../LangRef.html#i_call>`_. Note that LLVM uses the native C
+calling conventions by default, allowing these calls to also call into
+standard library functions like "sin" and "cos", with no additional
+effort.
+
+This wraps up our handling of the four basic expressions that we have so
+far in Kaleidoscope. Feel free to go in and add some more. For example,
+by browsing the `LLVM language reference <../LangRef.html>`_ you'll find
+several other interesting instructions that are really easy to plug into
+our basic framework.
+
+Function Code Generation
+========================
+
+Code generation for prototypes and functions must handle a number of
+details, which make their code less beautiful than expression code
+generation, but allow us to illustrate some important points. First,
+let's talk about code generation for prototypes: they are used both for
+function bodies and external function declarations. The code starts
+with:
+
+.. code-block:: c++
+
+    Function *PrototypeAST::Codegen() {
+      // Make the function type: double(double,double) etc.
+      std::vector<Type*> Doubles(Args.size(),
+                                 Type::getDoubleTy(getGlobalContext()));
+      FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
+                                           Doubles, false);
+
+      Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
+
+This code packs a lot of power into a few lines. Note first that this
+function returns a "Function\*" instead of a "Value\*". Because a
+"prototype" really talks about the external interface for a function
+(not the value computed by an expression), it makes sense for it to
+return the LLVM Function it corresponds to when codegen'd.
+
+The call to ``FunctionType::get`` creates the ``FunctionType`` that
+should be used for a given Prototype. Since all function arguments in
+Kaleidoscope are of type double, the first line creates a vector of "N"
+LLVM double types. It then uses the ``FunctionType::get`` method to
+create a function type that takes "N" doubles as arguments, returns one
+double as a result, and that is not vararg (the false parameter
+indicates this). Note that Types in LLVM are uniqued just like Constants
+are, so you don't "new" a type, you "get" it.
+
+The final line above actually creates the function that the prototype
+will correspond to. This indicates the type, linkage and name to use, as
+well as which module to insert into. "`external
+linkage <../LangRef.html#linkage>`_" means that the function may be
+defined outside the current module and/or that it is callable by
+functions outside the module. The Name passed in is the name the user
+specified: since "``TheModule``" is specified, this name is registered
+in ``TheModule``'s symbol table, which is used by the function call
+code above.
+
+.. code-block:: c++
+
+    // If F conflicted, there was already something named 'Name'.  If it has a
+    // body, don't allow redefinition or reextern.
+    if (F->getName() != Name) {
+      // Delete the one we just made and get the existing one.
+      F->eraseFromParent();
+      F = TheModule->getFunction(Name);
+
+The Module symbol table works just like the Function symbol table when
+it comes to name conflicts: if a new function is created with a name
+that was previously added to the symbol table, the new function will get
+implicitly renamed when added to the Module.
The code above exploits +this fact to determine if there was a previous definition of this +function. + +In Kaleidoscope, I choose to allow redefinitions of functions in two +cases: first, we want to allow 'extern'ing a function more than once, as +long as the prototypes for the externs match (since all arguments have +the same type, we just have to check that the number of arguments +match). Second, we want to allow 'extern'ing a function and then +defining a body for it. This is useful when defining mutually recursive +functions. + +In order to implement this, the code above first checks to see if there +is a collision on the name of the function. If so, it deletes the +function we just created (by calling ``eraseFromParent``) and then +calling ``getFunction`` to get the existing function with the specified +name. Note that many APIs in LLVM have "erase" forms and "remove" forms. +The "remove" form unlinks the object from its parent (e.g. a Function +from a Module) and returns it. The "erase" form unlinks the object and +then deletes it. + +.. code-block:: c++ + + // If F already has a body, reject this. + if (!F->empty()) { + ErrorF("redefinition of function"); + return 0; + } + + // If F took a different number of args, reject. + if (F->arg_size() != Args.size()) { + ErrorF("redefinition of function with different # args"); + return 0; + } + } + +In order to verify the logic above, we first check to see if the +pre-existing function is "empty". In this case, empty means that it has +no basic blocks in it, which means it has no body. If it has no body, it +is a forward declaration. Since we don't allow anything after a full +definition of the function, the code rejects this case. If the previous +reference to a function was an 'extern', we simply verify that the +number of arguments for that definition and this one match up. If not, +we emit an error. + +.. code-block:: c++ + + // Set names for all arguments. + unsigned Idx = 0; + for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); + ++AI, ++Idx) { + AI->setName(Args[Idx]); + + // Add arguments to variable symbol table. + NamedValues[Args[Idx]] = AI; + } + return F; + } + +The last bit of code for prototypes loops over all of the arguments in +the function, setting the name of the LLVM Argument objects to match, +and registering the arguments in the ``NamedValues`` map for future use +by the ``VariableExprAST`` AST node. Once this is set up, it returns the +Function object to the caller. Note that we don't check for conflicting +argument names here (e.g. "extern foo(a b a)"). Doing so would be very +straight-forward with the mechanics we have already used above. + +.. code-block:: c++ + + Function *FunctionAST::Codegen() { + NamedValues.clear(); + + Function *TheFunction = Proto->Codegen(); + if (TheFunction == 0) + return 0; + +Code generation for function definitions starts out simply enough: we +just codegen the prototype (Proto) and verify that it is ok. We then +clear out the ``NamedValues`` map to make sure that there isn't anything +in it from the last function we compiled. Code generation of the +prototype ensures that there is an LLVM Function object that is ready to +go for us. + +.. code-block:: c++ + + // Create a new basic block to start insertion into. + BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); + Builder.SetInsertPoint(BB); + + if (Value *RetVal = Body->Codegen()) { + +Now we get to the point where the ``Builder`` is set up. 
The first line +creates a new `basic block <http://en.wikipedia.org/wiki/Basic_block>`_ +(named "entry"), which is inserted into ``TheFunction``. The second line +then tells the builder that new instructions should be inserted into the +end of the new basic block. Basic blocks in LLVM are an important part +of functions that define the `Control Flow +Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we +don't have any control flow, our functions will only contain one block +at this point. We'll fix this in `Chapter 5 <LangImpl5.html>`_ :). + +.. code-block:: c++ + + if (Value *RetVal = Body->Codegen()) { + // Finish off the function. + Builder.CreateRet(RetVal); + + // Validate the generated code, checking for consistency. + verifyFunction(*TheFunction); + + return TheFunction; + } + +Once the insertion point is set up, we call the ``CodeGen()`` method for +the root expression of the function. If no error happens, this emits +code to compute the expression into the entry block and returns the +value that was computed. Assuming no error, we then create an LLVM `ret +instruction <../LangRef.html#i_ret>`_, which completes the function. +Once the function is built, we call ``verifyFunction``, which is +provided by LLVM. This function does a variety of consistency checks on +the generated code, to determine if our compiler is doing everything +right. Using this is important: it can catch a lot of bugs. Once the +function is finished and validated, we return it. + +.. code-block:: c++ + + // Error reading body, remove function. + TheFunction->eraseFromParent(); + return 0; + } + +The only piece left here is handling of the error case. For simplicity, +we handle this by merely deleting the function we produced with the +``eraseFromParent`` method. This allows the user to redefine a function +that they incorrectly typed in before: if we didn't delete it, it would +live in the symbol table, with a body, preventing future redefinition. + +This code does have a bug, though. Since the ``PrototypeAST::Codegen`` +can return a previously defined forward declaration, our code can +actually delete a forward declaration. There are a number of ways to fix +this bug, see what you can come up with! Here is a testcase: + +:: + + extern foo(a b); # ok, defines foo. + def foo(a b) c; # error, 'c' is invalid. + def bar() foo(1, 2); # error, unknown function "foo" + +Driver Changes and Closing Thoughts +=================================== + +For now, code generation to LLVM doesn't really get us much, except that +we can look at the pretty IR calls. The sample code inserts calls to +Codegen into the "``HandleDefinition``", "``HandleExtern``" etc +functions, and then dumps out the LLVM IR. This gives a nice way to look +at the LLVM IR for simple functions. For example: + +:: + + ready> 4+5; + Read top-level expression: + define double @0() { + entry: + ret double 9.000000e+00 + } + +Note how the parser turns the top-level expression into anonymous +functions for us. This will be handy when we add `JIT +support <LangImpl4.html#jit>`_ in the next chapter. Also note that the +code is very literally transcribed, no optimizations are being performed +except simple constant folding done by IRBuilder. We will `add +optimizations <LangImpl4.html#trivialconstfold>`_ explicitly in the next +chapter. 
+ +:: + + ready> def foo(a b) a*a + 2*a*b + b*b; + Read function definition: + define double @foo(double %a, double %b) { + entry: + %multmp = fmul double %a, %a + %multmp1 = fmul double 2.000000e+00, %a + %multmp2 = fmul double %multmp1, %b + %addtmp = fadd double %multmp, %multmp2 + %multmp3 = fmul double %b, %b + %addtmp4 = fadd double %addtmp, %multmp3 + ret double %addtmp4 + } + +This shows some simple arithmetic. Notice the striking similarity to the +LLVM builder calls that we use to create the instructions. + +:: + + ready> def bar(a) foo(a, 4.0) + bar(31337); + Read function definition: + define double @bar(double %a) { + entry: + %calltmp = call double @foo(double %a, double 4.000000e+00) + %calltmp1 = call double @bar(double 3.133700e+04) + %addtmp = fadd double %calltmp, %calltmp1 + ret double %addtmp + } + +This shows some function calls. Note that this function will take a long +time to execute if you call it. In the future we'll add conditional +control flow to actually make recursion useful :). + +:: + + ready> extern cos(x); + Read extern: + declare double @cos(double) + + ready> cos(1.234); + Read top-level expression: + define double @1() { + entry: + %calltmp = call double @cos(double 1.234000e+00) + ret double %calltmp + } + +This shows an extern for the libm "cos" function, and a call to it. + +.. TODO:: Abandon Pygments' horrible `llvm` lexer. It just totally gives up + on highlighting this due to the first line. + +:: + + ready> ^D + ; ModuleID = 'my cool jit' + + define double @0() { + entry: + %addtmp = fadd double 4.000000e+00, 5.000000e+00 + ret double %addtmp + } + + define double @foo(double %a, double %b) { + entry: + %multmp = fmul double %a, %a + %multmp1 = fmul double 2.000000e+00, %a + %multmp2 = fmul double %multmp1, %b + %addtmp = fadd double %multmp, %multmp2 + %multmp3 = fmul double %b, %b + %addtmp4 = fadd double %addtmp, %multmp3 + ret double %addtmp4 + } + + define double @bar(double %a) { + entry: + %calltmp = call double @foo(double %a, double 4.000000e+00) + %calltmp1 = call double @bar(double 3.133700e+04) + %addtmp = fadd double %calltmp, %calltmp1 + ret double %addtmp + } + + declare double @cos(double) + + define double @1() { + entry: + %calltmp = call double @cos(double 1.234000e+00) + ret double %calltmp + } + +When you quit the current demo, it dumps out the IR for the entire +module generated. Here you can see the big picture with all the +functions referencing each other. + +This wraps up the third chapter of the Kaleidoscope tutorial. Up next, +we'll describe how to `add JIT codegen and optimizer +support <LangImpl4.html>`_ to this so we can actually start running +code! + +Full Code Listing +================= + +Here is the complete code listing for our running example, enhanced with +the LLVM code generator. Because this uses the LLVM libraries, we need +to link them in. To do this, we use the +`llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform +our makefile/command line about which options to use: + +.. code-block:: bash + + # Compile + clang++ -g -O3 toy.cpp `llvm-config --cppflags --ldflags --libs core` -o toy + # Run + ./toy + +Here is the code: + +.. code-block:: c++ + + // To build this: + // See example below. 
+ + #include "llvm/DerivedTypes.h" + #include "llvm/IRBuilder.h" + #include "llvm/LLVMContext.h" + #include "llvm/Module.h" + #include "llvm/Analysis/Verifier.h" + #include <cstdio> + #include <string> + #include <map> + #include <vector> + using namespace llvm; + + //===----------------------------------------------------------------------===// + // Lexer + //===----------------------------------------------------------------------===// + + // The lexer returns tokens [0-255] if it is an unknown character, otherwise one + // of these for known things. + enum Token { + tok_eof = -1, + + // commands + tok_def = -2, tok_extern = -3, + + // primary + tok_identifier = -4, tok_number = -5 + }; + + static std::string IdentifierStr; // Filled in if tok_identifier + static double NumVal; // Filled in if tok_number + + /// gettok - Return the next token from standard input. + static int gettok() { + static int LastChar = ' '; + + // Skip any whitespace. + while (isspace(LastChar)) + LastChar = getchar(); + + if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* + IdentifierStr = LastChar; + while (isalnum((LastChar = getchar()))) + IdentifierStr += LastChar; + + if (IdentifierStr == "def") return tok_def; + if (IdentifierStr == "extern") return tok_extern; + return tok_identifier; + } + + if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ + std::string NumStr; + do { + NumStr += LastChar; + LastChar = getchar(); + } while (isdigit(LastChar) || LastChar == '.'); + + NumVal = strtod(NumStr.c_str(), 0); + return tok_number; + } + + if (LastChar == '#') { + // Comment until end of line. + do LastChar = getchar(); + while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); + + if (LastChar != EOF) + return gettok(); + } + + // Check for end of file. Don't eat the EOF. + if (LastChar == EOF) + return tok_eof; + + // Otherwise, just return the character as its ascii value. + int ThisChar = LastChar; + LastChar = getchar(); + return ThisChar; + } + + //===----------------------------------------------------------------------===// + // Abstract Syntax Tree (aka Parse Tree) + //===----------------------------------------------------------------------===// + + /// ExprAST - Base class for all expression nodes. + class ExprAST { + public: + virtual ~ExprAST() {} + virtual Value *Codegen() = 0; + }; + + /// NumberExprAST - Expression class for numeric literals like "1.0". + class NumberExprAST : public ExprAST { + double Val; + public: + NumberExprAST(double val) : Val(val) {} + virtual Value *Codegen(); + }; + + /// VariableExprAST - Expression class for referencing a variable, like "a". + class VariableExprAST : public ExprAST { + std::string Name; + public: + VariableExprAST(const std::string &name) : Name(name) {} + virtual Value *Codegen(); + }; + + /// BinaryExprAST - Expression class for a binary operator. + class BinaryExprAST : public ExprAST { + char Op; + ExprAST *LHS, *RHS; + public: + BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) + : Op(op), LHS(lhs), RHS(rhs) {} + virtual Value *Codegen(); + }; + + /// CallExprAST - Expression class for function calls. 
+ class CallExprAST : public ExprAST { + std::string Callee; + std::vector<ExprAST*> Args; + public: + CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) + : Callee(callee), Args(args) {} + virtual Value *Codegen(); + }; + + /// PrototypeAST - This class represents the "prototype" for a function, + /// which captures its name, and its argument names (thus implicitly the number + /// of arguments the function takes). + class PrototypeAST { + std::string Name; + std::vector<std::string> Args; + public: + PrototypeAST(const std::string &name, const std::vector<std::string> &args) + : Name(name), Args(args) {} + + Function *Codegen(); + }; + + /// FunctionAST - This class represents a function definition itself. + class FunctionAST { + PrototypeAST *Proto; + ExprAST *Body; + public: + FunctionAST(PrototypeAST *proto, ExprAST *body) + : Proto(proto), Body(body) {} + + Function *Codegen(); + }; + + //===----------------------------------------------------------------------===// + // Parser + //===----------------------------------------------------------------------===// + + /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current + /// token the parser is looking at. getNextToken reads another token from the + /// lexer and updates CurTok with its results. + static int CurTok; + static int getNextToken() { + return CurTok = gettok(); + } + + /// BinopPrecedence - This holds the precedence for each binary operator that is + /// defined. + static std::map<char, int> BinopPrecedence; + + /// GetTokPrecedence - Get the precedence of the pending binary operator token. + static int GetTokPrecedence() { + if (!isascii(CurTok)) + return -1; + + // Make sure it's a declared binop. + int TokPrec = BinopPrecedence[CurTok]; + if (TokPrec <= 0) return -1; + return TokPrec; + } + + /// Error* - These are little helper functions for error handling. + ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} + PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } + FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } + + static ExprAST *ParseExpression(); + + /// identifierexpr + /// ::= identifier + /// ::= identifier '(' expression* ')' + static ExprAST *ParseIdentifierExpr() { + std::string IdName = IdentifierStr; + + getNextToken(); // eat identifier. + + if (CurTok != '(') // Simple variable ref. + return new VariableExprAST(IdName); + + // Call. + getNextToken(); // eat ( + std::vector<ExprAST*> Args; + if (CurTok != ')') { + while (1) { + ExprAST *Arg = ParseExpression(); + if (!Arg) return 0; + Args.push_back(Arg); + + if (CurTok == ')') break; + + if (CurTok != ',') + return Error("Expected ')' or ',' in argument list"); + getNextToken(); + } + } + + // Eat the ')'. + getNextToken(); + + return new CallExprAST(IdName, Args); + } + + /// numberexpr ::= number + static ExprAST *ParseNumberExpr() { + ExprAST *Result = new NumberExprAST(NumVal); + getNextToken(); // consume the number + return Result; + } + + /// parenexpr ::= '(' expression ')' + static ExprAST *ParseParenExpr() { + getNextToken(); // eat (. + ExprAST *V = ParseExpression(); + if (!V) return 0; + + if (CurTok != ')') + return Error("expected ')'"); + getNextToken(); // eat ). 
+ return V; + } + + /// primary + /// ::= identifierexpr + /// ::= numberexpr + /// ::= parenexpr + static ExprAST *ParsePrimary() { + switch (CurTok) { + default: return Error("unknown token when expecting an expression"); + case tok_identifier: return ParseIdentifierExpr(); + case tok_number: return ParseNumberExpr(); + case '(': return ParseParenExpr(); + } + } + + /// binoprhs + /// ::= ('+' primary)* + static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { + // If this is a binop, find its precedence. + while (1) { + int TokPrec = GetTokPrecedence(); + + // If this is a binop that binds at least as tightly as the current binop, + // consume it, otherwise we are done. + if (TokPrec < ExprPrec) + return LHS; + + // Okay, we know this is a binop. + int BinOp = CurTok; + getNextToken(); // eat binop + + // Parse the primary expression after the binary operator. + ExprAST *RHS = ParsePrimary(); + if (!RHS) return 0; + + // If BinOp binds less tightly with RHS than the operator after RHS, let + // the pending operator take RHS as its LHS. + int NextPrec = GetTokPrecedence(); + if (TokPrec < NextPrec) { + RHS = ParseBinOpRHS(TokPrec+1, RHS); + if (RHS == 0) return 0; + } + + // Merge LHS/RHS. + LHS = new BinaryExprAST(BinOp, LHS, RHS); + } + } + + /// expression + /// ::= primary binoprhs + /// + static ExprAST *ParseExpression() { + ExprAST *LHS = ParsePrimary(); + if (!LHS) return 0; + + return ParseBinOpRHS(0, LHS); + } + + /// prototype + /// ::= id '(' id* ')' + static PrototypeAST *ParsePrototype() { + if (CurTok != tok_identifier) + return ErrorP("Expected function name in prototype"); + + std::string FnName = IdentifierStr; + getNextToken(); + + if (CurTok != '(') + return ErrorP("Expected '(' in prototype"); + + std::vector<std::string> ArgNames; + while (getNextToken() == tok_identifier) + ArgNames.push_back(IdentifierStr); + if (CurTok != ')') + return ErrorP("Expected ')' in prototype"); + + // success. + getNextToken(); // eat ')'. + + return new PrototypeAST(FnName, ArgNames); + } + + /// definition ::= 'def' prototype expression + static FunctionAST *ParseDefinition() { + getNextToken(); // eat def. + PrototypeAST *Proto = ParsePrototype(); + if (Proto == 0) return 0; + + if (ExprAST *E = ParseExpression()) + return new FunctionAST(Proto, E); + return 0; + } + + /// toplevelexpr ::= expression + static FunctionAST *ParseTopLevelExpr() { + if (ExprAST *E = ParseExpression()) { + // Make an anonymous proto. + PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); + return new FunctionAST(Proto, E); + } + return 0; + } + + /// external ::= 'extern' prototype + static PrototypeAST *ParseExtern() { + getNextToken(); // eat extern. + return ParsePrototype(); + } + + //===----------------------------------------------------------------------===// + // Code Generation + //===----------------------------------------------------------------------===// + + static Module *TheModule; + static IRBuilder<> Builder(getGlobalContext()); + static std::map<std::string, Value*> NamedValues; + + Value *ErrorV(const char *Str) { Error(Str); return 0; } + + Value *NumberExprAST::Codegen() { + return ConstantFP::get(getGlobalContext(), APFloat(Val)); + } + + Value *VariableExprAST::Codegen() { + // Look this variable up in the function. + Value *V = NamedValues[Name]; + return V ? 
V : ErrorV("Unknown variable name"); + } + + Value *BinaryExprAST::Codegen() { + Value *L = LHS->Codegen(); + Value *R = RHS->Codegen(); + if (L == 0 || R == 0) return 0; + + switch (Op) { + case '+': return Builder.CreateFAdd(L, R, "addtmp"); + case '-': return Builder.CreateFSub(L, R, "subtmp"); + case '*': return Builder.CreateFMul(L, R, "multmp"); + case '<': + L = Builder.CreateFCmpULT(L, R, "cmptmp"); + // Convert bool 0/1 to double 0.0 or 1.0 + return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), + "booltmp"); + default: return ErrorV("invalid binary operator"); + } + } + + Value *CallExprAST::Codegen() { + // Look up the name in the global module table. + Function *CalleeF = TheModule->getFunction(Callee); + if (CalleeF == 0) + return ErrorV("Unknown function referenced"); + + // If argument mismatch error. + if (CalleeF->arg_size() != Args.size()) + return ErrorV("Incorrect # arguments passed"); + + std::vector<Value*> ArgsV; + for (unsigned i = 0, e = Args.size(); i != e; ++i) { + ArgsV.push_back(Args[i]->Codegen()); + if (ArgsV.back() == 0) return 0; + } + + return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); + } + + Function *PrototypeAST::Codegen() { + // Make the function type: double(double,double) etc. + std::vector<Type*> Doubles(Args.size(), + Type::getDoubleTy(getGlobalContext())); + FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), + Doubles, false); + + Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); + + // If F conflicted, there was already something named 'Name'. If it has a + // body, don't allow redefinition or reextern. + if (F->getName() != Name) { + // Delete the one we just made and get the existing one. + F->eraseFromParent(); + F = TheModule->getFunction(Name); + + // If F already has a body, reject this. + if (!F->empty()) { + ErrorF("redefinition of function"); + return 0; + } + + // If F took a different number of args, reject. + if (F->arg_size() != Args.size()) { + ErrorF("redefinition of function with different # args"); + return 0; + } + } + + // Set names for all arguments. + unsigned Idx = 0; + for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); + ++AI, ++Idx) { + AI->setName(Args[Idx]); + + // Add arguments to variable symbol table. + NamedValues[Args[Idx]] = AI; + } + + return F; + } + + Function *FunctionAST::Codegen() { + NamedValues.clear(); + + Function *TheFunction = Proto->Codegen(); + if (TheFunction == 0) + return 0; + + // Create a new basic block to start insertion into. + BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); + Builder.SetInsertPoint(BB); + + if (Value *RetVal = Body->Codegen()) { + // Finish off the function. + Builder.CreateRet(RetVal); + + // Validate the generated code, checking for consistency. + verifyFunction(*TheFunction); + + return TheFunction; + } + + // Error reading body, remove function. + TheFunction->eraseFromParent(); + return 0; + } + + //===----------------------------------------------------------------------===// + // Top-Level parsing and JIT Driver + //===----------------------------------------------------------------------===// + + static void HandleDefinition() { + if (FunctionAST *F = ParseDefinition()) { + if (Function *LF = F->Codegen()) { + fprintf(stderr, "Read function definition:"); + LF->dump(); + } + } else { + // Skip token for error recovery. 
+ getNextToken(); + } + } + + static void HandleExtern() { + if (PrototypeAST *P = ParseExtern()) { + if (Function *F = P->Codegen()) { + fprintf(stderr, "Read extern: "); + F->dump(); + } + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + static void HandleTopLevelExpression() { + // Evaluate a top-level expression into an anonymous function. + if (FunctionAST *F = ParseTopLevelExpr()) { + if (Function *LF = F->Codegen()) { + fprintf(stderr, "Read top-level expression:"); + LF->dump(); + } + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + /// top ::= definition | external | expression | ';' + static void MainLoop() { + while (1) { + fprintf(stderr, "ready> "); + switch (CurTok) { + case tok_eof: return; + case ';': getNextToken(); break; // ignore top-level semicolons. + case tok_def: HandleDefinition(); break; + case tok_extern: HandleExtern(); break; + default: HandleTopLevelExpression(); break; + } + } + } + + //===----------------------------------------------------------------------===// + // "Library" functions that can be "extern'd" from user code. + //===----------------------------------------------------------------------===// + + /// putchard - putchar that takes a double and returns 0. + extern "C" + double putchard(double X) { + putchar((char)X); + return 0; + } + + //===----------------------------------------------------------------------===// + // Main driver code. + //===----------------------------------------------------------------------===// + + int main() { + LLVMContext &Context = getGlobalContext(); + + // Install standard binary operators. + // 1 is lowest precedence. + BinopPrecedence['<'] = 10; + BinopPrecedence['+'] = 20; + BinopPrecedence['-'] = 20; + BinopPrecedence['*'] = 40; // highest. + + // Prime the first token. + fprintf(stderr, "ready> "); + getNextToken(); + + // Make the module, which holds all the code. + TheModule = new Module("my cool jit", Context); + + // Run the main "interpreter loop" now. + MainLoop(); + + // Print out all of the generated code. 
+ TheModule->dump(); + + return 0; + } + +`Next: Adding JIT and Optimizer Support <LangImpl4.html>`_ + diff --git a/docs/tutorial/LangImpl4.html b/docs/tutorial/LangImpl4.html deleted file mode 100644 index 53695924b2..0000000000 --- a/docs/tutorial/LangImpl4.html +++ /dev/null @@ -1,1152 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Adding JIT and Optimizer Support</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <link rel="stylesheet" href="../_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>Kaleidoscope: Adding JIT and Optimizer Support</h1> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 4 - <ol> - <li><a href="#intro">Chapter 4 Introduction</a></li> - <li><a href="#trivialconstfold">Trivial Constant Folding</a></li> - <li><a href="#optimizerpasses">LLVM Optimization Passes</a></li> - <li><a href="#jit">Adding a JIT Compiler</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="LangImpl5.html">Chapter 5</a>: Extending the Language: Control -Flow</li> -</ul> - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="intro">Chapter 4 Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Welcome to Chapter 4 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. Chapters 1-3 described the implementation of a simple -language and added support for generating LLVM IR. This chapter describes -two new techniques: adding optimizer support to your language, and adding JIT -compiler support. These additions will demonstrate how to get nice, efficient code -for the Kaleidoscope language.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="trivialconstfold">Trivial Constant Folding</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -Our demonstration for Chapter 3 is elegant and easy to extend. Unfortunately, -it does not produce wonderful code. The IRBuilder, however, does give us -obvious optimizations when compiling simple code:</p> - -<div class="doc_code"> -<pre> -ready> <b>def test(x) 1+2+x;</b> -Read function definition: -define double @test(double %x) { -entry: - %addtmp = fadd double 3.000000e+00, %x - ret double %addtmp -} -</pre> -</div> - -<p>This code is not a literal transcription of the AST built by parsing the -input. That would be: - -<div class="doc_code"> -<pre> -ready> <b>def test(x) 1+2+x;</b> -Read function definition: -define double @test(double %x) { -entry: - %addtmp = fadd double 2.000000e+00, 1.000000e+00 - %addtmp1 = fadd double %addtmp, %x - ret double %addtmp1 -} -</pre> -</div> - -<p>Constant folding, as seen above, in particular, is a very common and very -important optimization: so much so that many language implementors implement -constant folding support in their AST representation.</p> - -<p>With LLVM, you don't need this support in the AST. Since all calls to build -LLVM IR go through the LLVM IR builder, the builder itself checked to see if -there was a constant folding opportunity when you call it. 
If so, it just does -the constant fold and return the constant instead of creating an instruction. - -<p>Well, that was easy :). In practice, we recommend always using -<tt>IRBuilder</tt> when generating code like this. It has no -"syntactic overhead" for its use (you don't have to uglify your compiler with -constant checks everywhere) and it can dramatically reduce the amount of -LLVM IR that is generated in some cases (particular for languages with a macro -preprocessor or that use a lot of constants).</p> - -<p>On the other hand, the <tt>IRBuilder</tt> is limited by the fact -that it does all of its analysis inline with the code as it is built. If you -take a slightly more complex example:</p> - -<div class="doc_code"> -<pre> -ready> <b>def test(x) (1+2+x)*(x+(1+2));</b> -ready> Read function definition: -define double @test(double %x) { -entry: - %addtmp = fadd double 3.000000e+00, %x - %addtmp1 = fadd double %x, 3.000000e+00 - %multmp = fmul double %addtmp, %addtmp1 - ret double %multmp -} -</pre> -</div> - -<p>In this case, the LHS and RHS of the multiplication are the same value. We'd -really like to see this generate "<tt>tmp = x+3; result = tmp*tmp;</tt>" instead -of computing "<tt>x+3</tt>" twice.</p> - -<p>Unfortunately, no amount of local analysis will be able to detect and correct -this. This requires two transformations: reassociation of expressions (to -make the add's lexically identical) and Common Subexpression Elimination (CSE) -to delete the redundant add instruction. Fortunately, LLVM provides a broad -range of optimizations that you can use, in the form of "passes".</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="optimizerpasses">LLVM Optimization Passes</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>LLVM provides many optimization passes, which do many different sorts of -things and have different tradeoffs. Unlike other systems, LLVM doesn't hold -to the mistaken notion that one set of optimizations is right for all languages -and for all situations. LLVM allows a compiler implementor to make complete -decisions about what optimizations to use, in which order, and in what -situation.</p> - -<p>As a concrete example, LLVM supports both "whole module" passes, which look -across as large of body of code as they can (often a whole file, but if run -at link time, this can be a substantial portion of the whole program). It also -supports and includes "per-function" passes which just operate on a single -function at a time, without looking at other functions. For more information -on passes and how they are run, see the <a href="../WritingAnLLVMPass.html">How -to Write a Pass</a> document and the <a href="../Passes.html">List of LLVM -Passes</a>.</p> - -<p>For Kaleidoscope, we are currently generating functions on the fly, one at -a time, as the user types them in. We aren't shooting for the ultimate -optimization experience in this setting, but we also want to catch the easy and -quick stuff where possible. As such, we will choose to run a few per-function -optimizations as the user types the function in. 
If we wanted to make a "static -Kaleidoscope compiler", we would use exactly the code we have now, except that -we would defer running the optimizer until the entire file has been parsed.</p> - -<p>In order to get per-function optimizations going, we need to set up a -<a href="../WritingAnLLVMPass.html#passmanager">FunctionPassManager</a> to hold and -organize the LLVM optimizations that we want to run. Once we have that, we can -add a set of optimizations to run. The code looks like this:</p> - -<div class="doc_code"> -<pre> - FunctionPassManager OurFPM(TheModule); - - // Set up the optimizer pipeline. Start with registering info about how the - // target lays out data structures. - OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout())); - // Provide basic AliasAnalysis support for GVN. - OurFPM.add(createBasicAliasAnalysisPass()); - // Do simple "peephole" optimizations and bit-twiddling optzns. - OurFPM.add(createInstructionCombiningPass()); - // Reassociate expressions. - OurFPM.add(createReassociatePass()); - // Eliminate Common SubExpressions. - OurFPM.add(createGVNPass()); - // Simplify the control flow graph (deleting unreachable blocks, etc). - OurFPM.add(createCFGSimplificationPass()); - - OurFPM.doInitialization(); - - // Set the global so the code gen can use this. - TheFPM = &OurFPM; - - // Run the main "interpreter loop" now. - MainLoop(); -</pre> -</div> - -<p>This code defines a <tt>FunctionPassManager</tt>, "<tt>OurFPM</tt>". It -requires a pointer to the <tt>Module</tt> to construct itself. Once it is set -up, we use a series of "add" calls to add a bunch of LLVM passes. The first -pass is basically boilerplate, it adds a pass so that later optimizations know -how the data structures in the program are laid out. The -"<tt>TheExecutionEngine</tt>" variable is related to the JIT, which we will get -to in the next section.</p> - -<p>In this case, we choose to add 4 optimization passes. The passes we chose -here are a pretty standard set of "cleanup" optimizations that are useful for -a wide variety of code. I won't delve into what they do but, believe me, -they are a good starting place :).</p> - -<p>Once the PassManager is set up, we need to make use of it. We do this by -running it after our newly created function is constructed (in -<tt>FunctionAST::Codegen</tt>), but before it is returned to the client:</p> - -<div class="doc_code"> -<pre> - if (Value *RetVal = Body->Codegen()) { - // Finish off the function. - Builder.CreateRet(RetVal); - - // Validate the generated code, checking for consistency. - verifyFunction(*TheFunction); - - <b>// Optimize the function. - TheFPM->run(*TheFunction);</b> - - return TheFunction; - } -</pre> -</div> - -<p>As you can see, this is pretty straightforward. The -<tt>FunctionPassManager</tt> optimizes and updates the LLVM Function* in place, -improving (hopefully) its body. With this in place, we can try our test above -again:</p> - -<div class="doc_code"> -<pre> -ready> <b>def test(x) (1+2+x)*(x+(1+2));</b> -ready> Read function definition: -define double @test(double %x) { -entry: - %addtmp = fadd double %x, 3.000000e+00 - %multmp = fmul double %addtmp, %addtmp - ret double %multmp -} -</pre> -</div> - -<p>As expected, we now get our nicely optimized code, saving a floating point -add instruction from every execution of this function.</p> - -<p>LLVM provides a wide variety of optimizations that can be used in certain -circumstances. 
Some <a href="../Passes.html">documentation about the various -passes</a> is available, but it isn't very complete. Another good source of -ideas can come from looking at the passes that <tt>Clang</tt> runs to get -started. The "<tt>opt</tt>" tool allows you to experiment with passes from the -command line, so you can see if they do anything.</p> - -<p>Now that we have reasonable code coming out of our front-end, lets talk about -executing it!</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="jit">Adding a JIT Compiler</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Code that is available in LLVM IR can have a wide variety of tools -applied to it. For example, you can run optimizations on it (as we did above), -you can dump it out in textual or binary forms, you can compile the code to an -assembly file (.s) for some target, or you can JIT compile it. The nice thing -about the LLVM IR representation is that it is the "common currency" between -many different parts of the compiler. -</p> - -<p>In this section, we'll add JIT compiler support to our interpreter. The -basic idea that we want for Kaleidoscope is to have the user enter function -bodies as they do now, but immediately evaluate the top-level expressions they -type in. For example, if they type in "1 + 2;", we should evaluate and print -out 3. If they define a function, they should be able to call it from the -command line.</p> - -<p>In order to do this, we first declare and initialize the JIT. This is done -by adding a global variable and a call in <tt>main</tt>:</p> - -<div class="doc_code"> -<pre> -<b>static ExecutionEngine *TheExecutionEngine;</b> -... -int main() { - .. - <b>// Create the JIT. This takes ownership of the module. - TheExecutionEngine = EngineBuilder(TheModule).create();</b> - .. -} -</pre> -</div> - -<p>This creates an abstract "Execution Engine" which can be either a JIT -compiler or the LLVM interpreter. LLVM will automatically pick a JIT compiler -for you if one is available for your platform, otherwise it will fall back to -the interpreter.</p> - -<p>Once the <tt>ExecutionEngine</tt> is created, the JIT is ready to be used. -There are a variety of APIs that are useful, but the simplest one is the -"<tt>getPointerToFunction(F)</tt>" method. This method JIT compiles the -specified LLVM Function and returns a function pointer to the generated machine -code. In our case, this means that we can change the code that parses a -top-level expression to look like this:</p> - -<div class="doc_code"> -<pre> -static void HandleTopLevelExpression() { - // Evaluate a top-level expression into an anonymous function. - if (FunctionAST *F = ParseTopLevelExpr()) { - if (Function *LF = F->Codegen()) { - LF->dump(); // Dump the function for exposition purposes. - - <b>// JIT the function, returning a function pointer. - void *FPtr = TheExecutionEngine->getPointerToFunction(LF); - - // Cast it to the right type (takes no arguments, returns a double) so we - // can call it as a native function. - double (*FP)() = (double (*)())(intptr_t)FPtr; - fprintf(stderr, "Evaluated to %f\n", FP());</b> - } -</pre> -</div> - -<p>Recall that we compile top-level expressions into a self-contained LLVM -function that takes no arguments and returns the computed double. 
Because the -LLVM JIT compiler matches the native platform ABI, this means that you can just -cast the result pointer to a function pointer of that type and call it directly. -This means, there is no difference between JIT compiled code and native machine -code that is statically linked into your application.</p> - -<p>With just these two changes, lets see how Kaleidoscope works now!</p> - -<div class="doc_code"> -<pre> -ready> <b>4+5;</b> -Read top-level expression: -define double @0() { -entry: - ret double 9.000000e+00 -} - -<em>Evaluated to 9.000000</em> -</pre> -</div> - -<p>Well this looks like it is basically working. The dump of the function -shows the "no argument function that always returns double" that we synthesize -for each top-level expression that is typed in. This demonstrates very basic -functionality, but can we do more?</p> - -<div class="doc_code"> -<pre> -ready> <b>def testfunc(x y) x + y*2; </b> -Read function definition: -define double @testfunc(double %x, double %y) { -entry: - %multmp = fmul double %y, 2.000000e+00 - %addtmp = fadd double %multmp, %x - ret double %addtmp -} - -ready> <b>testfunc(4, 10);</b> -Read top-level expression: -define double @1() { -entry: - %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01) - ret double %calltmp -} - -<em>Evaluated to 24.000000</em> -</pre> -</div> - -<p>This illustrates that we can now call user code, but there is something a bit -subtle going on here. Note that we only invoke the JIT on the anonymous -functions that <em>call testfunc</em>, but we never invoked it -on <em>testfunc</em> itself. What actually happened here is that the JIT -scanned for all non-JIT'd functions transitively called from the anonymous -function and compiled all of them before returning -from <tt>getPointerToFunction()</tt>.</p> - -<p>The JIT provides a number of other more advanced interfaces for things like -freeing allocated machine code, rejit'ing functions to update them, etc. -However, even with this simple code, we get some surprisingly powerful -capabilities - check this out (I removed the dump of the anonymous functions, -you should get the idea by now :) :</p> - -<div class="doc_code"> -<pre> -ready> <b>extern sin(x);</b> -Read extern: -declare double @sin(double) - -ready> <b>extern cos(x);</b> -Read extern: -declare double @cos(double) - -ready> <b>sin(1.0);</b> -Read top-level expression: -define double @2() { -entry: - ret double 0x3FEAED548F090CEE -} - -<em>Evaluated to 0.841471</em> - -ready> <b>def foo(x) sin(x)*sin(x) + cos(x)*cos(x);</b> -Read function definition: -define double @foo(double %x) { -entry: - %calltmp = call double @sin(double %x) - %multmp = fmul double %calltmp, %calltmp - %calltmp2 = call double @cos(double %x) - %multmp4 = fmul double %calltmp2, %calltmp2 - %addtmp = fadd double %multmp, %multmp4 - ret double %addtmp -} - -ready> <b>foo(4.0);</b> -Read top-level expression: -define double @3() { -entry: - %calltmp = call double @foo(double 4.000000e+00) - ret double %calltmp -} - -<em>Evaluated to 1.000000</em> -</pre> -</div> - -<p>Whoa, how does the JIT know about sin and cos? The answer is surprisingly -simple: in this -example, the JIT started execution of a function and got to a function call. It -realized that the function was not yet JIT compiled and invoked the standard set -of routines to resolve the function. In this case, there is no body defined -for the function, so the JIT ended up calling "<tt>dlsym("sin")</tt>" on the -Kaleidoscope process itself. 
-Since "<tt>sin</tt>" is defined within the JIT's address space, it simply -patches up calls in the module to call the libm version of <tt>sin</tt> -directly.</p> - -<p>The LLVM JIT provides a number of interfaces (look in the -<tt>ExecutionEngine.h</tt> file) for controlling how unknown functions get -resolved. It allows you to establish explicit mappings between IR objects and -addresses (useful for LLVM global variables that you want to map to static -tables, for example), allows you to dynamically decide on the fly based on the -function name, and even allows you to have the JIT compile functions lazily the -first time they're called.</p> - -<p>One interesting application of this is that we can now extend the language -by writing arbitrary C++ code to implement operations. For example, if we add: -</p> - -<div class="doc_code"> -<pre> -/// putchard - putchar that takes a double and returns 0. -extern "C" -double putchard(double X) { - putchar((char)X); - return 0; -} -</pre> -</div> - -<p>Now we can produce simple output to the console by using things like: -"<tt>extern putchard(x); putchard(120);</tt>", which prints a lowercase 'x' on -the console (120 is the ASCII code for 'x'). Similar code could be used to -implement file I/O, console input, and many other capabilities in -Kaleidoscope.</p> - -<p>This completes the JIT and optimizer chapter of the Kaleidoscope tutorial. At -this point, we can compile a non-Turing-complete programming language, optimize -and JIT compile it in a user-driven way. Next up we'll look into <a -href="LangImpl5.html">extending the language with control flow constructs</a>, -tackling some interesting LLVM IR issues along the way.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="code">Full Code Listing</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -Here is the complete code listing for our running example, enhanced with the -LLVM JIT and optimizer. To build this example, use: -</p> - -<div class="doc_code"> -<pre> -# Compile -clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy -# Run -./toy -</pre> -</div> - -<p> -If you are compiling this on Linux, make sure to add the "-rdynamic" option -as well. This makes sure that the external functions are resolved properly -at runtime.</p> - -<p>Here is the code:</p> - -<div class="doc_code"> -<pre> -#include "llvm/DerivedTypes.h" -#include "llvm/ExecutionEngine/ExecutionEngine.h" -#include "llvm/ExecutionEngine/JIT.h" -#include "llvm/IRBuilder.h" -#include "llvm/LLVMContext.h" -#include "llvm/Module.h" -#include "llvm/PassManager.h" -#include "llvm/Analysis/Verifier.h" -#include "llvm/Analysis/Passes.h" -#include "llvm/DataLayout.h" -#include "llvm/Transforms/Scalar.h" -#include "llvm/Support/TargetSelect.h" -#include <cstdio> -#include <string> -#include <map> -#include <vector> -using namespace llvm; - -//===----------------------------------------------------------------------===// -// Lexer -//===----------------------------------------------------------------------===// - -// The lexer returns tokens [0-255] if it is an unknown character, otherwise one -// of these for known things. 
-enum Token { - tok_eof = -1, - - // commands - tok_def = -2, tok_extern = -3, - - // primary - tok_identifier = -4, tok_number = -5 -}; - -static std::string IdentifierStr; // Filled in if tok_identifier -static double NumVal; // Filled in if tok_number - -/// gettok - Return the next token from standard input. -static int gettok() { - static int LastChar = ' '; - - // Skip any whitespace. - while (isspace(LastChar)) - LastChar = getchar(); - - if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* - IdentifierStr = LastChar; - while (isalnum((LastChar = getchar()))) - IdentifierStr += LastChar; - - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - return tok_identifier; - } - - if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ - std::string NumStr; - do { - NumStr += LastChar; - LastChar = getchar(); - } while (isdigit(LastChar) || LastChar == '.'); - - NumVal = strtod(NumStr.c_str(), 0); - return tok_number; - } - - if (LastChar == '#') { - // Comment until end of line. - do LastChar = getchar(); - while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); - - if (LastChar != EOF) - return gettok(); - } - - // Check for end of file. Don't eat the EOF. - if (LastChar == EOF) - return tok_eof; - - // Otherwise, just return the character as its ascii value. - int ThisChar = LastChar; - LastChar = getchar(); - return ThisChar; -} - -//===----------------------------------------------------------------------===// -// Abstract Syntax Tree (aka Parse Tree) -//===----------------------------------------------------------------------===// - -/// ExprAST - Base class for all expression nodes. -class ExprAST { -public: - virtual ~ExprAST() {} - virtual Value *Codegen() = 0; -}; - -/// NumberExprAST - Expression class for numeric literals like "1.0". -class NumberExprAST : public ExprAST { - double Val; -public: - NumberExprAST(double val) : Val(val) {} - virtual Value *Codegen(); -}; - -/// VariableExprAST - Expression class for referencing a variable, like "a". -class VariableExprAST : public ExprAST { - std::string Name; -public: - VariableExprAST(const std::string &name) : Name(name) {} - virtual Value *Codegen(); -}; - -/// BinaryExprAST - Expression class for a binary operator. -class BinaryExprAST : public ExprAST { - char Op; - ExprAST *LHS, *RHS; -public: - BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) - : Op(op), LHS(lhs), RHS(rhs) {} - virtual Value *Codegen(); -}; - -/// CallExprAST - Expression class for function calls. -class CallExprAST : public ExprAST { - std::string Callee; - std::vector<ExprAST*> Args; -public: - CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) - : Callee(callee), Args(args) {} - virtual Value *Codegen(); -}; - -/// PrototypeAST - This class represents the "prototype" for a function, -/// which captures its name, and its argument names (thus implicitly the number -/// of arguments the function takes). -class PrototypeAST { - std::string Name; - std::vector<std::string> Args; -public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args) - : Name(name), Args(args) {} - - Function *Codegen(); -}; - -/// FunctionAST - This class represents a function definition itself. 
-class FunctionAST { - PrototypeAST *Proto; - ExprAST *Body; -public: - FunctionAST(PrototypeAST *proto, ExprAST *body) - : Proto(proto), Body(body) {} - - Function *Codegen(); -}; - -//===----------------------------------------------------------------------===// -// Parser -//===----------------------------------------------------------------------===// - -/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current -/// token the parser is looking at. getNextToken reads another token from the -/// lexer and updates CurTok with its results. -static int CurTok; -static int getNextToken() { - return CurTok = gettok(); -} - -/// BinopPrecedence - This holds the precedence for each binary operator that is -/// defined. -static std::map<char, int> BinopPrecedence; - -/// GetTokPrecedence - Get the precedence of the pending binary operator token. -static int GetTokPrecedence() { - if (!isascii(CurTok)) - return -1; - - // Make sure it's a declared binop. - int TokPrec = BinopPrecedence[CurTok]; - if (TokPrec <= 0) return -1; - return TokPrec; -} - -/// Error* - These are little helper functions for error handling. -ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} -PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } -FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } - -static ExprAST *ParseExpression(); - -/// identifierexpr -/// ::= identifier -/// ::= identifier '(' expression* ')' -static ExprAST *ParseIdentifierExpr() { - std::string IdName = IdentifierStr; - - getNextToken(); // eat identifier. - - if (CurTok != '(') // Simple variable ref. - return new VariableExprAST(IdName); - - // Call. - getNextToken(); // eat ( - std::vector<ExprAST*> Args; - if (CurTok != ')') { - while (1) { - ExprAST *Arg = ParseExpression(); - if (!Arg) return 0; - Args.push_back(Arg); - - if (CurTok == ')') break; - - if (CurTok != ',') - return Error("Expected ')' or ',' in argument list"); - getNextToken(); - } - } - - // Eat the ')'. - getNextToken(); - - return new CallExprAST(IdName, Args); -} - -/// numberexpr ::= number -static ExprAST *ParseNumberExpr() { - ExprAST *Result = new NumberExprAST(NumVal); - getNextToken(); // consume the number - return Result; -} - -/// parenexpr ::= '(' expression ')' -static ExprAST *ParseParenExpr() { - getNextToken(); // eat (. - ExprAST *V = ParseExpression(); - if (!V) return 0; - - if (CurTok != ')') - return Error("expected ')'"); - getNextToken(); // eat ). - return V; -} - -/// primary -/// ::= identifierexpr -/// ::= numberexpr -/// ::= parenexpr -static ExprAST *ParsePrimary() { - switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - } -} - -/// binoprhs -/// ::= ('+' primary)* -static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { - // If this is a binop, find its precedence. - while (1) { - int TokPrec = GetTokPrecedence(); - - // If this is a binop that binds at least as tightly as the current binop, - // consume it, otherwise we are done. - if (TokPrec < ExprPrec) - return LHS; - - // Okay, we know this is a binop. - int BinOp = CurTok; - getNextToken(); // eat binop - - // Parse the primary expression after the binary operator. - ExprAST *RHS = ParsePrimary(); - if (!RHS) return 0; - - // If BinOp binds less tightly with RHS than the operator after RHS, let - // the pending operator take RHS as its LHS. 
- int NextPrec = GetTokPrecedence(); - if (TokPrec < NextPrec) { - RHS = ParseBinOpRHS(TokPrec+1, RHS); - if (RHS == 0) return 0; - } - - // Merge LHS/RHS. - LHS = new BinaryExprAST(BinOp, LHS, RHS); - } -} - -/// expression -/// ::= primary binoprhs -/// -static ExprAST *ParseExpression() { - ExprAST *LHS = ParsePrimary(); - if (!LHS) return 0; - - return ParseBinOpRHS(0, LHS); -} - -/// prototype -/// ::= id '(' id* ')' -static PrototypeAST *ParsePrototype() { - if (CurTok != tok_identifier) - return ErrorP("Expected function name in prototype"); - - std::string FnName = IdentifierStr; - getNextToken(); - - if (CurTok != '(') - return ErrorP("Expected '(' in prototype"); - - std::vector<std::string> ArgNames; - while (getNextToken() == tok_identifier) - ArgNames.push_back(IdentifierStr); - if (CurTok != ')') - return ErrorP("Expected ')' in prototype"); - - // success. - getNextToken(); // eat ')'. - - return new PrototypeAST(FnName, ArgNames); -} - -/// definition ::= 'def' prototype expression -static FunctionAST *ParseDefinition() { - getNextToken(); // eat def. - PrototypeAST *Proto = ParsePrototype(); - if (Proto == 0) return 0; - - if (ExprAST *E = ParseExpression()) - return new FunctionAST(Proto, E); - return 0; -} - -/// toplevelexpr ::= expression -static FunctionAST *ParseTopLevelExpr() { - if (ExprAST *E = ParseExpression()) { - // Make an anonymous proto. - PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); - return new FunctionAST(Proto, E); - } - return 0; -} - -/// external ::= 'extern' prototype -static PrototypeAST *ParseExtern() { - getNextToken(); // eat extern. - return ParsePrototype(); -} - -//===----------------------------------------------------------------------===// -// Code Generation -//===----------------------------------------------------------------------===// - -static Module *TheModule; -static IRBuilder<> Builder(getGlobalContext()); -static std::map<std::string, Value*> NamedValues; -static FunctionPassManager *TheFPM; - -Value *ErrorV(const char *Str) { Error(Str); return 0; } - -Value *NumberExprAST::Codegen() { - return ConstantFP::get(getGlobalContext(), APFloat(Val)); -} - -Value *VariableExprAST::Codegen() { - // Look this variable up in the function. - Value *V = NamedValues[Name]; - return V ? V : ErrorV("Unknown variable name"); -} - -Value *BinaryExprAST::Codegen() { - Value *L = LHS->Codegen(); - Value *R = RHS->Codegen(); - if (L == 0 || R == 0) return 0; - - switch (Op) { - case '+': return Builder.CreateFAdd(L, R, "addtmp"); - case '-': return Builder.CreateFSub(L, R, "subtmp"); - case '*': return Builder.CreateFMul(L, R, "multmp"); - case '<': - L = Builder.CreateFCmpULT(L, R, "cmptmp"); - // Convert bool 0/1 to double 0.0 or 1.0 - return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), - "booltmp"); - default: return ErrorV("invalid binary operator"); - } -} - -Value *CallExprAST::Codegen() { - // Look up the name in the global module table. - Function *CalleeF = TheModule->getFunction(Callee); - if (CalleeF == 0) - return ErrorV("Unknown function referenced"); - - // If argument mismatch error. - if (CalleeF->arg_size() != Args.size()) - return ErrorV("Incorrect # arguments passed"); - - std::vector<Value*> ArgsV; - for (unsigned i = 0, e = Args.size(); i != e; ++i) { - ArgsV.push_back(Args[i]->Codegen()); - if (ArgsV.back() == 0) return 0; - } - - return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); -} - -Function *PrototypeAST::Codegen() { - // Make the function type: double(double,double) etc. 
- std::vector<Type*> Doubles(Args.size(), - Type::getDoubleTy(getGlobalContext())); - FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), - Doubles, false); - - Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); - - // If F conflicted, there was already something named 'Name'. If it has a - // body, don't allow redefinition or reextern. - if (F->getName() != Name) { - // Delete the one we just made and get the existing one. - F->eraseFromParent(); - F = TheModule->getFunction(Name); - - // If F already has a body, reject this. - if (!F->empty()) { - ErrorF("redefinition of function"); - return 0; - } - - // If F took a different number of args, reject. - if (F->arg_size() != Args.size()) { - ErrorF("redefinition of function with different # args"); - return 0; - } - } - - // Set names for all arguments. - unsigned Idx = 0; - for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); - ++AI, ++Idx) { - AI->setName(Args[Idx]); - - // Add arguments to variable symbol table. - NamedValues[Args[Idx]] = AI; - } - - return F; -} - -Function *FunctionAST::Codegen() { - NamedValues.clear(); - - Function *TheFunction = Proto->Codegen(); - if (TheFunction == 0) - return 0; - - // Create a new basic block to start insertion into. - BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); - Builder.SetInsertPoint(BB); - - if (Value *RetVal = Body->Codegen()) { - // Finish off the function. - Builder.CreateRet(RetVal); - - // Validate the generated code, checking for consistency. - verifyFunction(*TheFunction); - - // Optimize the function. - TheFPM->run(*TheFunction); - - return TheFunction; - } - - // Error reading body, remove function. - TheFunction->eraseFromParent(); - return 0; -} - -//===----------------------------------------------------------------------===// -// Top-Level parsing and JIT Driver -//===----------------------------------------------------------------------===// - -static ExecutionEngine *TheExecutionEngine; - -static void HandleDefinition() { - if (FunctionAST *F = ParseDefinition()) { - if (Function *LF = F->Codegen()) { - fprintf(stderr, "Read function definition:"); - LF->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleExtern() { - if (PrototypeAST *P = ParseExtern()) { - if (Function *F = P->Codegen()) { - fprintf(stderr, "Read extern: "); - F->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleTopLevelExpression() { - // Evaluate a top-level expression into an anonymous function. - if (FunctionAST *F = ParseTopLevelExpr()) { - if (Function *LF = F->Codegen()) { - fprintf(stderr, "Read top-level expression:"); - LF->dump(); - - // JIT the function, returning a function pointer. - void *FPtr = TheExecutionEngine->getPointerToFunction(LF); - - // Cast it to the right type (takes no arguments, returns a double) so we - // can call it as a native function. - double (*FP)() = (double (*)())(intptr_t)FPtr; - fprintf(stderr, "Evaluated to %f\n", FP()); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -/// top ::= definition | external | expression | ';' -static void MainLoop() { - while (1) { - fprintf(stderr, "ready> "); - switch (CurTok) { - case tok_eof: return; - case ';': getNextToken(); break; // ignore top-level semicolons. 
- case tok_def: HandleDefinition(); break; - case tok_extern: HandleExtern(); break; - default: HandleTopLevelExpression(); break; - } - } -} - -//===----------------------------------------------------------------------===// -// "Library" functions that can be "extern'd" from user code. -//===----------------------------------------------------------------------===// - -/// putchard - putchar that takes a double and returns 0. -extern "C" -double putchard(double X) { - putchar((char)X); - return 0; -} - -//===----------------------------------------------------------------------===// -// Main driver code. -//===----------------------------------------------------------------------===// - -int main() { - InitializeNativeTarget(); - LLVMContext &Context = getGlobalContext(); - - // Install standard binary operators. - // 1 is lowest precedence. - BinopPrecedence['<'] = 10; - BinopPrecedence['+'] = 20; - BinopPrecedence['-'] = 20; - BinopPrecedence['*'] = 40; // highest. - - // Prime the first token. - fprintf(stderr, "ready> "); - getNextToken(); - - // Make the module, which holds all the code. - TheModule = new Module("my cool jit", Context); - - // Create the JIT. This takes ownership of the module. - std::string ErrStr; - TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); - if (!TheExecutionEngine) { - fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); - exit(1); - } - - FunctionPassManager OurFPM(TheModule); - - // Set up the optimizer pipeline. Start with registering info about how the - // target lays out data structures. - OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout())); - // Provide basic AliasAnalysis support for GVN. - OurFPM.add(createBasicAliasAnalysisPass()); - // Do simple "peephole" optimizations and bit-twiddling optzns. - OurFPM.add(createInstructionCombiningPass()); - // Reassociate expressions. - OurFPM.add(createReassociatePass()); - // Eliminate Common SubExpressions. - OurFPM.add(createGVNPass()); - // Simplify the control flow graph (deleting unreachable blocks, etc). - OurFPM.add(createCFGSimplificationPass()); - - OurFPM.doInitialization(); - - // Set the global so the code gen can use this. - TheFPM = &OurFPM; - - // Run the main "interpreter loop" now. - MainLoop(); - - TheFPM = 0; - - // Print out all of the generated code. - TheModule->dump(); - - return 0; -} -</pre> -</div> - -<a href="LangImpl5.html">Next: Extending the language: control flow</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/LangImpl4.rst b/docs/tutorial/LangImpl4.rst new file mode 100644 index 0000000000..8484c57f9d --- /dev/null +++ b/docs/tutorial/LangImpl4.rst @@ -0,0 +1,1063 @@ +============================================== +Kaleidoscope: Adding JIT and Optimizer Support +============================================== + +.. 
contents::
+ :local:
+
+Written by `Chris Lattner <mailto:sabre@nondot.org>`_
+
+Chapter 4 Introduction
+======================
+
+Welcome to Chapter 4 of the "`Implementing a language with
+LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation
+of a simple language and added support for generating LLVM IR. This
+chapter describes two new techniques: adding optimizer support to your
+language, and adding JIT compiler support. These additions will
+demonstrate how to get nice, efficient code for the Kaleidoscope
+language.
+
+Trivial Constant Folding
+========================
+
+Our demonstration for Chapter 3 is elegant and easy to extend.
+Unfortunately, it does not produce wonderful code. The IRBuilder,
+however, does give us obvious optimizations when compiling simple code:
+
+::
+
+ ready> def test(x) 1+2+x;
+ Read function definition:
+ define double @test(double %x) {
+ entry:
+ %addtmp = fadd double 3.000000e+00, %x
+ ret double %addtmp
+ }
+
+This code is not a literal transcription of the AST built by parsing the
+input. That would be:
+
+::
+
+ ready> def test(x) 1+2+x;
+ Read function definition:
+ define double @test(double %x) {
+ entry:
+ %addtmp = fadd double 2.000000e+00, 1.000000e+00
+ %addtmp1 = fadd double %addtmp, %x
+ ret double %addtmp1
+ }
+
+Constant folding, as seen above, is a very common and very important
+optimization: so much so that many language implementors implement
+constant folding support in their AST representation.
+
+With LLVM, you don't need this support in the AST. Since all calls to
+build LLVM IR go through the LLVM IR builder, the builder itself checks
+to see if there is a constant folding opportunity when you call it. If
+so, it just does the constant fold and returns the constant instead of
+creating an instruction.
+
+Well, that was easy :). In practice, we recommend always using
+``IRBuilder`` when generating code like this. It has no "syntactic
+overhead" for its use (you don't have to uglify your compiler with
+constant checks everywhere) and it can dramatically reduce the amount of
+LLVM IR that is generated in some cases (particularly for languages with
+a macro preprocessor or that use a lot of constants).
+
+On the other hand, the ``IRBuilder`` is limited by the fact that it does
+all of its analysis inline with the code as it is built. If you take a
+slightly more complex example:
+
+::
+
+ ready> def test(x) (1+2+x)*(x+(1+2));
+ ready> Read function definition:
+ define double @test(double %x) {
+ entry:
+ %addtmp = fadd double 3.000000e+00, %x
+ %addtmp1 = fadd double %x, 3.000000e+00
+ %multmp = fmul double %addtmp, %addtmp1
+ ret double %multmp
+ }
+
+In this case, the LHS and RHS of the multiplication are the same value.
+We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"
+instead of computing "``x+3``" twice.
+
+Unfortunately, no amount of local analysis will be able to detect and
+correct this. This requires two transformations: reassociation of
+expressions (to make the adds lexically identical) and Common
+Subexpression Elimination (CSE) to delete the redundant add instruction.
+Fortunately, LLVM provides a broad range of optimizations that you can
+use, in the form of "passes".
+
+LLVM Optimization Passes
+========================
+
+LLVM provides many optimization passes, which do many different sorts of
+things and have different tradeoffs. 
Unlike other systems, LLVM doesn't
+hold to the mistaken notion that one set of optimizations is right for
+all languages and for all situations. LLVM allows a compiler implementor
+to make complete decisions about what optimizations to use, in which
+order, and in what situation.
+
+As a concrete example, LLVM supports "whole module" passes, which look
+across as large a body of code as they can (often a whole file, but if
+run at link time, this can be a substantial portion of the whole
+program). It also supports and includes "per-function" passes, which
+operate on a single function at a time without looking at other
+functions. For more information on passes and how they are run, see the
+`How to Write a Pass <../WritingAnLLVMPass.html>`_ document and the
+`List of LLVM Passes <../Passes.html>`_.
+
+For Kaleidoscope, we are currently generating functions on the fly, one
+at a time, as the user types them in. We aren't shooting for the
+ultimate optimization experience in this setting, but we also want to
+catch the easy and quick stuff where possible. As such, we will choose
+to run a few per-function optimizations as the user types the function
+in. If we wanted to make a "static Kaleidoscope compiler", we would use
+exactly the code we have now, except that we would defer running the
+optimizer until the entire file has been parsed.
+
+In order to get per-function optimizations going, we need to set up a
+`FunctionPassManager <../WritingAnLLVMPass.html#passmanager>`_ to hold
+and organize the LLVM optimizations that we want to run. Once we have
+that, we can add a set of optimizations to run. The code looks like
+this:
+
+.. code-block:: c++
+
+ FunctionPassManager OurFPM(TheModule);
+
+ // Set up the optimizer pipeline. Start with registering info about how the
+ // target lays out data structures.
+ OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout()));
+ // Provide basic AliasAnalysis support for GVN.
+ OurFPM.add(createBasicAliasAnalysisPass());
+ // Do simple "peephole" optimizations and bit-twiddling optzns.
+ OurFPM.add(createInstructionCombiningPass());
+ // Reassociate expressions.
+ OurFPM.add(createReassociatePass());
+ // Eliminate Common SubExpressions.
+ OurFPM.add(createGVNPass());
+ // Simplify the control flow graph (deleting unreachable blocks, etc).
+ OurFPM.add(createCFGSimplificationPass());
+
+ OurFPM.doInitialization();
+
+ // Set the global so the code gen can use this.
+ TheFPM = &OurFPM;
+
+ // Run the main "interpreter loop" now.
+ MainLoop();
+
+This code defines a ``FunctionPassManager``, "``OurFPM``". It requires a
+pointer to the ``Module`` to construct itself. Once it is set up, we use
+a series of "add" calls to add a bunch of LLVM passes. The first pass is
+basically boilerplate: it adds a pass so that later optimizations know
+how the data structures in the program are laid out. The
+"``TheExecutionEngine``" variable is related to the JIT, which we will
+get to in the next section.
+
+In this case, we choose to add 4 optimization passes. The passes we
+chose here are a pretty standard set of "cleanup" optimizations that are
+useful for a wide variety of code. I won't delve into what they do but,
+believe me, they are a good starting place :).
+
+Once the PassManager is set up, we need to make use of it. We do this by
+running it after our newly created function is constructed (in
+``FunctionAST::Codegen``), but before it is returned to the client:
+
+.. 
code-block:: c++ + + if (Value *RetVal = Body->Codegen()) { + // Finish off the function. + Builder.CreateRet(RetVal); + + // Validate the generated code, checking for consistency. + verifyFunction(*TheFunction); + + // Optimize the function. + TheFPM->run(*TheFunction); + + return TheFunction; + } + +As you can see, this is pretty straightforward. The +``FunctionPassManager`` optimizes and updates the LLVM Function\* in +place, improving (hopefully) its body. With this in place, we can try +our test above again: + +:: + + ready> def test(x) (1+2+x)*(x+(1+2)); + ready> Read function definition: + define double @test(double %x) { + entry: + %addtmp = fadd double %x, 3.000000e+00 + %multmp = fmul double %addtmp, %addtmp + ret double %multmp + } + +As expected, we now get our nicely optimized code, saving a floating +point add instruction from every execution of this function. + +LLVM provides a wide variety of optimizations that can be used in +certain circumstances. Some `documentation about the various +passes <../Passes.html>`_ is available, but it isn't very complete. +Another good source of ideas can come from looking at the passes that +``Clang`` runs to get started. The "``opt``" tool allows you to +experiment with passes from the command line, so you can see if they do +anything. + +Now that we have reasonable code coming out of our front-end, lets talk +about executing it! + +Adding a JIT Compiler +===================== + +Code that is available in LLVM IR can have a wide variety of tools +applied to it. For example, you can run optimizations on it (as we did +above), you can dump it out in textual or binary forms, you can compile +the code to an assembly file (.s) for some target, or you can JIT +compile it. The nice thing about the LLVM IR representation is that it +is the "common currency" between many different parts of the compiler. + +In this section, we'll add JIT compiler support to our interpreter. The +basic idea that we want for Kaleidoscope is to have the user enter +function bodies as they do now, but immediately evaluate the top-level +expressions they type in. For example, if they type in "1 + 2;", we +should evaluate and print out 3. If they define a function, they should +be able to call it from the command line. + +In order to do this, we first declare and initialize the JIT. This is +done by adding a global variable and a call in ``main``: + +.. code-block:: c++ + + static ExecutionEngine *TheExecutionEngine; + ... + int main() { + .. + // Create the JIT. This takes ownership of the module. + TheExecutionEngine = EngineBuilder(TheModule).create(); + .. + } + +This creates an abstract "Execution Engine" which can be either a JIT +compiler or the LLVM interpreter. LLVM will automatically pick a JIT +compiler for you if one is available for your platform, otherwise it +will fall back to the interpreter. + +Once the ``ExecutionEngine`` is created, the JIT is ready to be used. +There are a variety of APIs that are useful, but the simplest one is the +"``getPointerToFunction(F)``" method. This method JIT compiles the +specified LLVM Function and returns a function pointer to the generated +machine code. In our case, this means that we can change the code that +parses a top-level expression to look like this: + +.. code-block:: c++ + + static void HandleTopLevelExpression() { + // Evaluate a top-level expression into an anonymous function. 
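+ // (ParseTopLevelExpr wraps the expression in a prototype with no name and no arguments.)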
+ if (FunctionAST *F = ParseTopLevelExpr()) { + if (Function *LF = F->Codegen()) { + LF->dump(); // Dump the function for exposition purposes. + + // JIT the function, returning a function pointer. + void *FPtr = TheExecutionEngine->getPointerToFunction(LF); + + // Cast it to the right type (takes no arguments, returns a double) so we + // can call it as a native function. + double (*FP)() = (double (*)())(intptr_t)FPtr; + fprintf(stderr, "Evaluated to %f\n", FP()); + } + +Recall that we compile top-level expressions into a self-contained LLVM +function that takes no arguments and returns the computed double. +Because the LLVM JIT compiler matches the native platform ABI, this +means that you can just cast the result pointer to a function pointer of +that type and call it directly. This means, there is no difference +between JIT compiled code and native machine code that is statically +linked into your application. + +With just these two changes, lets see how Kaleidoscope works now! + +:: + + ready> 4+5; + Read top-level expression: + define double @0() { + entry: + ret double 9.000000e+00 + } + + Evaluated to 9.000000 + +Well this looks like it is basically working. The dump of the function +shows the "no argument function that always returns double" that we +synthesize for each top-level expression that is typed in. This +demonstrates very basic functionality, but can we do more? + +:: + + ready> def testfunc(x y) x + y*2; + Read function definition: + define double @testfunc(double %x, double %y) { + entry: + %multmp = fmul double %y, 2.000000e+00 + %addtmp = fadd double %multmp, %x + ret double %addtmp + } + + ready> testfunc(4, 10); + Read top-level expression: + define double @1() { + entry: + %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01) + ret double %calltmp + } + + Evaluated to 24.000000 + +This illustrates that we can now call user code, but there is something +a bit subtle going on here. Note that we only invoke the JIT on the +anonymous functions that *call testfunc*, but we never invoked it on +*testfunc* itself. What actually happened here is that the JIT scanned +for all non-JIT'd functions transitively called from the anonymous +function and compiled all of them before returning from +``getPointerToFunction()``. + +The JIT provides a number of other more advanced interfaces for things +like freeing allocated machine code, rejit'ing functions to update them, +etc. However, even with this simple code, we get some surprisingly +powerful capabilities - check this out (I removed the dump of the +anonymous functions, you should get the idea by now :) : + +:: + + ready> extern sin(x); + Read extern: + declare double @sin(double) + + ready> extern cos(x); + Read extern: + declare double @cos(double) + + ready> sin(1.0); + Read top-level expression: + define double @2() { + entry: + ret double 0x3FEAED548F090CEE + } + + Evaluated to 0.841471 + + ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x); + Read function definition: + define double @foo(double %x) { + entry: + %calltmp = call double @sin(double %x) + %multmp = fmul double %calltmp, %calltmp + %calltmp2 = call double @cos(double %x) + %multmp4 = fmul double %calltmp2, %calltmp2 + %addtmp = fadd double %multmp, %multmp4 + ret double %addtmp + } + + ready> foo(4.0); + Read top-level expression: + define double @3() { + entry: + %calltmp = call double @foo(double 4.000000e+00) + ret double %calltmp + } + + Evaluated to 1.000000 + +Whoa, how does the JIT know about sin and cos? 
The answer is +surprisingly simple: in this example, the JIT started execution of a +function and got to a function call. It realized that the function was +not yet JIT compiled and invoked the standard set of routines to resolve +the function. In this case, there is no body defined for the function, +so the JIT ended up calling "``dlsym("sin")``" on the Kaleidoscope +process itself. Since "``sin``" is defined within the JIT's address +space, it simply patches up calls in the module to call the libm version +of ``sin`` directly. + +The LLVM JIT provides a number of interfaces (look in the +``ExecutionEngine.h`` file) for controlling how unknown functions get +resolved. It allows you to establish explicit mappings between IR +objects and addresses (useful for LLVM global variables that you want to +map to static tables, for example), allows you to dynamically decide on +the fly based on the function name, and even allows you to have the JIT +compile functions lazily the first time they're called. + +One interesting application of this is that we can now extend the +language by writing arbitrary C++ code to implement operations. For +example, if we add: + +.. code-block:: c++ + + /// putchard - putchar that takes a double and returns 0. + extern "C" + double putchard(double X) { + putchar((char)X); + return 0; + } + +Now we can produce simple output to the console by using things like: +"``extern putchard(x); putchard(120);``", which prints a lowercase 'x' +on the console (120 is the ASCII code for 'x'). Similar code could be +used to implement file I/O, console input, and many other capabilities +in Kaleidoscope. + +This completes the JIT and optimizer chapter of the Kaleidoscope +tutorial. At this point, we can compile a non-Turing-complete +programming language, optimize and JIT compile it in a user-driven way. +Next up we'll look into `extending the language with control flow +constructs <LangImpl5.html>`_, tackling some interesting LLVM IR issues +along the way. + +Full Code Listing +================= + +Here is the complete code listing for our running example, enhanced with +the LLVM JIT and optimizer. To build this example, use: + +.. code-block:: bash + + # Compile + clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy + # Run + ./toy + +If you are compiling this on Linux, make sure to add the "-rdynamic" +option as well. This makes sure that the external functions are resolved +properly at runtime. + +Here is the code: + +.. code-block:: c++ + + #include "llvm/DerivedTypes.h" + #include "llvm/ExecutionEngine/ExecutionEngine.h" + #include "llvm/ExecutionEngine/JIT.h" + #include "llvm/IRBuilder.h" + #include "llvm/LLVMContext.h" + #include "llvm/Module.h" + #include "llvm/PassManager.h" + #include "llvm/Analysis/Verifier.h" + #include "llvm/Analysis/Passes.h" + #include "llvm/DataLayout.h" + #include "llvm/Transforms/Scalar.h" + #include "llvm/Support/TargetSelect.h" + #include <cstdio> + #include <string> + #include <map> + #include <vector> + using namespace llvm; + + //===----------------------------------------------------------------------===// + // Lexer + //===----------------------------------------------------------------------===// + + // The lexer returns tokens [0-255] if it is an unknown character, otherwise one + // of these for known things. 
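+ // Token values are negative so they never collide with a plain character's code.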
+ enum Token { + tok_eof = -1, + + // commands + tok_def = -2, tok_extern = -3, + + // primary + tok_identifier = -4, tok_number = -5 + }; + + static std::string IdentifierStr; // Filled in if tok_identifier + static double NumVal; // Filled in if tok_number + + /// gettok - Return the next token from standard input. + static int gettok() { + static int LastChar = ' '; + + // Skip any whitespace. + while (isspace(LastChar)) + LastChar = getchar(); + + if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* + IdentifierStr = LastChar; + while (isalnum((LastChar = getchar()))) + IdentifierStr += LastChar; + + if (IdentifierStr == "def") return tok_def; + if (IdentifierStr == "extern") return tok_extern; + return tok_identifier; + } + + if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ + std::string NumStr; + do { + NumStr += LastChar; + LastChar = getchar(); + } while (isdigit(LastChar) || LastChar == '.'); + + NumVal = strtod(NumStr.c_str(), 0); + return tok_number; + } + + if (LastChar == '#') { + // Comment until end of line. + do LastChar = getchar(); + while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); + + if (LastChar != EOF) + return gettok(); + } + + // Check for end of file. Don't eat the EOF. + if (LastChar == EOF) + return tok_eof; + + // Otherwise, just return the character as its ascii value. + int ThisChar = LastChar; + LastChar = getchar(); + return ThisChar; + } + + //===----------------------------------------------------------------------===// + // Abstract Syntax Tree (aka Parse Tree) + //===----------------------------------------------------------------------===// + + /// ExprAST - Base class for all expression nodes. + class ExprAST { + public: + virtual ~ExprAST() {} + virtual Value *Codegen() = 0; + }; + + /// NumberExprAST - Expression class for numeric literals like "1.0". + class NumberExprAST : public ExprAST { + double Val; + public: + NumberExprAST(double val) : Val(val) {} + virtual Value *Codegen(); + }; + + /// VariableExprAST - Expression class for referencing a variable, like "a". + class VariableExprAST : public ExprAST { + std::string Name; + public: + VariableExprAST(const std::string &name) : Name(name) {} + virtual Value *Codegen(); + }; + + /// BinaryExprAST - Expression class for a binary operator. + class BinaryExprAST : public ExprAST { + char Op; + ExprAST *LHS, *RHS; + public: + BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) + : Op(op), LHS(lhs), RHS(rhs) {} + virtual Value *Codegen(); + }; + + /// CallExprAST - Expression class for function calls. + class CallExprAST : public ExprAST { + std::string Callee; + std::vector<ExprAST*> Args; + public: + CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) + : Callee(callee), Args(args) {} + virtual Value *Codegen(); + }; + + /// PrototypeAST - This class represents the "prototype" for a function, + /// which captures its name, and its argument names (thus implicitly the number + /// of arguments the function takes). + class PrototypeAST { + std::string Name; + std::vector<std::string> Args; + public: + PrototypeAST(const std::string &name, const std::vector<std::string> &args) + : Name(name), Args(args) {} + + Function *Codegen(); + }; + + /// FunctionAST - This class represents a function definition itself. 
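+ /// It pairs a prototype with the expression that forms the function body.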
+ class FunctionAST { + PrototypeAST *Proto; + ExprAST *Body; + public: + FunctionAST(PrototypeAST *proto, ExprAST *body) + : Proto(proto), Body(body) {} + + Function *Codegen(); + }; + + //===----------------------------------------------------------------------===// + // Parser + //===----------------------------------------------------------------------===// + + /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current + /// token the parser is looking at. getNextToken reads another token from the + /// lexer and updates CurTok with its results. + static int CurTok; + static int getNextToken() { + return CurTok = gettok(); + } + + /// BinopPrecedence - This holds the precedence for each binary operator that is + /// defined. + static std::map<char, int> BinopPrecedence; + + /// GetTokPrecedence - Get the precedence of the pending binary operator token. + static int GetTokPrecedence() { + if (!isascii(CurTok)) + return -1; + + // Make sure it's a declared binop. + int TokPrec = BinopPrecedence[CurTok]; + if (TokPrec <= 0) return -1; + return TokPrec; + } + + /// Error* - These are little helper functions for error handling. + ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} + PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } + FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } + + static ExprAST *ParseExpression(); + + /// identifierexpr + /// ::= identifier + /// ::= identifier '(' expression* ')' + static ExprAST *ParseIdentifierExpr() { + std::string IdName = IdentifierStr; + + getNextToken(); // eat identifier. + + if (CurTok != '(') // Simple variable ref. + return new VariableExprAST(IdName); + + // Call. + getNextToken(); // eat ( + std::vector<ExprAST*> Args; + if (CurTok != ')') { + while (1) { + ExprAST *Arg = ParseExpression(); + if (!Arg) return 0; + Args.push_back(Arg); + + if (CurTok == ')') break; + + if (CurTok != ',') + return Error("Expected ')' or ',' in argument list"); + getNextToken(); + } + } + + // Eat the ')'. + getNextToken(); + + return new CallExprAST(IdName, Args); + } + + /// numberexpr ::= number + static ExprAST *ParseNumberExpr() { + ExprAST *Result = new NumberExprAST(NumVal); + getNextToken(); // consume the number + return Result; + } + + /// parenexpr ::= '(' expression ')' + static ExprAST *ParseParenExpr() { + getNextToken(); // eat (. + ExprAST *V = ParseExpression(); + if (!V) return 0; + + if (CurTok != ')') + return Error("expected ')'"); + getNextToken(); // eat ). + return V; + } + + /// primary + /// ::= identifierexpr + /// ::= numberexpr + /// ::= parenexpr + static ExprAST *ParsePrimary() { + switch (CurTok) { + default: return Error("unknown token when expecting an expression"); + case tok_identifier: return ParseIdentifierExpr(); + case tok_number: return ParseNumberExpr(); + case '(': return ParseParenExpr(); + } + } + + /// binoprhs + /// ::= ('+' primary)* + static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { + // If this is a binop, find its precedence. + while (1) { + int TokPrec = GetTokPrecedence(); + + // If this is a binop that binds at least as tightly as the current binop, + // consume it, otherwise we are done. + if (TokPrec < ExprPrec) + return LHS; + + // Okay, we know this is a binop. + int BinOp = CurTok; + getNextToken(); // eat binop + + // Parse the primary expression after the binary operator. 
+ ExprAST *RHS = ParsePrimary(); + if (!RHS) return 0; + + // If BinOp binds less tightly with RHS than the operator after RHS, let + // the pending operator take RHS as its LHS. + int NextPrec = GetTokPrecedence(); + if (TokPrec < NextPrec) { + RHS = ParseBinOpRHS(TokPrec+1, RHS); + if (RHS == 0) return 0; + } + + // Merge LHS/RHS. + LHS = new BinaryExprAST(BinOp, LHS, RHS); + } + } + + /// expression + /// ::= primary binoprhs + /// + static ExprAST *ParseExpression() { + ExprAST *LHS = ParsePrimary(); + if (!LHS) return 0; + + return ParseBinOpRHS(0, LHS); + } + + /// prototype + /// ::= id '(' id* ')' + static PrototypeAST *ParsePrototype() { + if (CurTok != tok_identifier) + return ErrorP("Expected function name in prototype"); + + std::string FnName = IdentifierStr; + getNextToken(); + + if (CurTok != '(') + return ErrorP("Expected '(' in prototype"); + + std::vector<std::string> ArgNames; + while (getNextToken() == tok_identifier) + ArgNames.push_back(IdentifierStr); + if (CurTok != ')') + return ErrorP("Expected ')' in prototype"); + + // success. + getNextToken(); // eat ')'. + + return new PrototypeAST(FnName, ArgNames); + } + + /// definition ::= 'def' prototype expression + static FunctionAST *ParseDefinition() { + getNextToken(); // eat def. + PrototypeAST *Proto = ParsePrototype(); + if (Proto == 0) return 0; + + if (ExprAST *E = ParseExpression()) + return new FunctionAST(Proto, E); + return 0; + } + + /// toplevelexpr ::= expression + static FunctionAST *ParseTopLevelExpr() { + if (ExprAST *E = ParseExpression()) { + // Make an anonymous proto. + PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); + return new FunctionAST(Proto, E); + } + return 0; + } + + /// external ::= 'extern' prototype + static PrototypeAST *ParseExtern() { + getNextToken(); // eat extern. + return ParsePrototype(); + } + + //===----------------------------------------------------------------------===// + // Code Generation + //===----------------------------------------------------------------------===// + + static Module *TheModule; + static IRBuilder<> Builder(getGlobalContext()); + static std::map<std::string, Value*> NamedValues; + static FunctionPassManager *TheFPM; + + Value *ErrorV(const char *Str) { Error(Str); return 0; } + + Value *NumberExprAST::Codegen() { + return ConstantFP::get(getGlobalContext(), APFloat(Val)); + } + + Value *VariableExprAST::Codegen() { + // Look this variable up in the function. + Value *V = NamedValues[Name]; + return V ? V : ErrorV("Unknown variable name"); + } + + Value *BinaryExprAST::Codegen() { + Value *L = LHS->Codegen(); + Value *R = RHS->Codegen(); + if (L == 0 || R == 0) return 0; + + switch (Op) { + case '+': return Builder.CreateFAdd(L, R, "addtmp"); + case '-': return Builder.CreateFSub(L, R, "subtmp"); + case '*': return Builder.CreateFMul(L, R, "multmp"); + case '<': + L = Builder.CreateFCmpULT(L, R, "cmptmp"); + // Convert bool 0/1 to double 0.0 or 1.0 + return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), + "booltmp"); + default: return ErrorV("invalid binary operator"); + } + } + + Value *CallExprAST::Codegen() { + // Look up the name in the global module table. + Function *CalleeF = TheModule->getFunction(Callee); + if (CalleeF == 0) + return ErrorV("Unknown function referenced"); + + // If argument mismatch error. 
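+ // (Every Kaleidoscope function takes a fixed number of double arguments, so the counts must match.)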
+ if (CalleeF->arg_size() != Args.size()) + return ErrorV("Incorrect # arguments passed"); + + std::vector<Value*> ArgsV; + for (unsigned i = 0, e = Args.size(); i != e; ++i) { + ArgsV.push_back(Args[i]->Codegen()); + if (ArgsV.back() == 0) return 0; + } + + return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); + } + + Function *PrototypeAST::Codegen() { + // Make the function type: double(double,double) etc. + std::vector<Type*> Doubles(Args.size(), + Type::getDoubleTy(getGlobalContext())); + FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), + Doubles, false); + + Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); + + // If F conflicted, there was already something named 'Name'. If it has a + // body, don't allow redefinition or reextern. + if (F->getName() != Name) { + // Delete the one we just made and get the existing one. + F->eraseFromParent(); + F = TheModule->getFunction(Name); + + // If F already has a body, reject this. + if (!F->empty()) { + ErrorF("redefinition of function"); + return 0; + } + + // If F took a different number of args, reject. + if (F->arg_size() != Args.size()) { + ErrorF("redefinition of function with different # args"); + return 0; + } + } + + // Set names for all arguments. + unsigned Idx = 0; + for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); + ++AI, ++Idx) { + AI->setName(Args[Idx]); + + // Add arguments to variable symbol table. + NamedValues[Args[Idx]] = AI; + } + + return F; + } + + Function *FunctionAST::Codegen() { + NamedValues.clear(); + + Function *TheFunction = Proto->Codegen(); + if (TheFunction == 0) + return 0; + + // Create a new basic block to start insertion into. + BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); + Builder.SetInsertPoint(BB); + + if (Value *RetVal = Body->Codegen()) { + // Finish off the function. + Builder.CreateRet(RetVal); + + // Validate the generated code, checking for consistency. + verifyFunction(*TheFunction); + + // Optimize the function. + TheFPM->run(*TheFunction); + + return TheFunction; + } + + // Error reading body, remove function. + TheFunction->eraseFromParent(); + return 0; + } + + //===----------------------------------------------------------------------===// + // Top-Level parsing and JIT Driver + //===----------------------------------------------------------------------===// + + static ExecutionEngine *TheExecutionEngine; + + static void HandleDefinition() { + if (FunctionAST *F = ParseDefinition()) { + if (Function *LF = F->Codegen()) { + fprintf(stderr, "Read function definition:"); + LF->dump(); + } + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + static void HandleExtern() { + if (PrototypeAST *P = ParseExtern()) { + if (Function *F = P->Codegen()) { + fprintf(stderr, "Read extern: "); + F->dump(); + } + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + static void HandleTopLevelExpression() { + // Evaluate a top-level expression into an anonymous function. + if (FunctionAST *F = ParseTopLevelExpr()) { + if (Function *LF = F->Codegen()) { + fprintf(stderr, "Read top-level expression:"); + LF->dump(); + + // JIT the function, returning a function pointer. + void *FPtr = TheExecutionEngine->getPointerToFunction(LF); + + // Cast it to the right type (takes no arguments, returns a double) so we + // can call it as a native function. 
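+ // The JIT emits code for the host ABI, so the raw pointer can be called like any C function.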
+ double (*FP)() = (double (*)())(intptr_t)FPtr; + fprintf(stderr, "Evaluated to %f\n", FP()); + } + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + /// top ::= definition | external | expression | ';' + static void MainLoop() { + while (1) { + fprintf(stderr, "ready> "); + switch (CurTok) { + case tok_eof: return; + case ';': getNextToken(); break; // ignore top-level semicolons. + case tok_def: HandleDefinition(); break; + case tok_extern: HandleExtern(); break; + default: HandleTopLevelExpression(); break; + } + } + } + + //===----------------------------------------------------------------------===// + // "Library" functions that can be "extern'd" from user code. + //===----------------------------------------------------------------------===// + + /// putchard - putchar that takes a double and returns 0. + extern "C" + double putchard(double X) { + putchar((char)X); + return 0; + } + + //===----------------------------------------------------------------------===// + // Main driver code. + //===----------------------------------------------------------------------===// + + int main() { + InitializeNativeTarget(); + LLVMContext &Context = getGlobalContext(); + + // Install standard binary operators. + // 1 is lowest precedence. + BinopPrecedence['<'] = 10; + BinopPrecedence['+'] = 20; + BinopPrecedence['-'] = 20; + BinopPrecedence['*'] = 40; // highest. + + // Prime the first token. + fprintf(stderr, "ready> "); + getNextToken(); + + // Make the module, which holds all the code. + TheModule = new Module("my cool jit", Context); + + // Create the JIT. This takes ownership of the module. + std::string ErrStr; + TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); + if (!TheExecutionEngine) { + fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); + exit(1); + } + + FunctionPassManager OurFPM(TheModule); + + // Set up the optimizer pipeline. Start with registering info about how the + // target lays out data structures. + OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout())); + // Provide basic AliasAnalysis support for GVN. + OurFPM.add(createBasicAliasAnalysisPass()); + // Do simple "peephole" optimizations and bit-twiddling optzns. + OurFPM.add(createInstructionCombiningPass()); + // Reassociate expressions. + OurFPM.add(createReassociatePass()); + // Eliminate Common SubExpressions. + OurFPM.add(createGVNPass()); + // Simplify the control flow graph (deleting unreachable blocks, etc). + OurFPM.add(createCFGSimplificationPass()); + + OurFPM.doInitialization(); + + // Set the global so the code gen can use this. + TheFPM = &OurFPM; + + // Run the main "interpreter loop" now. + MainLoop(); + + TheFPM = 0; + + // Print out all of the generated code. 
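+ // (This includes the anonymous functions generated for top-level expressions.)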
+ TheModule->dump(); + + return 0; + } + +`Next: Extending the language: control flow <LangImpl5.html>`_ + diff --git a/docs/tutorial/LangImpl5.html b/docs/tutorial/LangImpl5.html deleted file mode 100644 index 768d9a0e11..0000000000 --- a/docs/tutorial/LangImpl5.html +++ /dev/null @@ -1,1772 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Extending the Language: Control Flow</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <link rel="stylesheet" href="../_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>Kaleidoscope: Extending the Language: Control Flow</h1> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 5 - <ol> - <li><a href="#intro">Chapter 5 Introduction</a></li> - <li><a href="#ifthen">If/Then/Else</a> - <ol> - <li><a href="#iflexer">Lexer Extensions</a></li> - <li><a href="#ifast">AST Extensions</a></li> - <li><a href="#ifparser">Parser Extensions</a></li> - <li><a href="#ifir">LLVM IR</a></li> - <li><a href="#ifcodegen">Code Generation</a></li> - </ol> - </li> - <li><a href="#for">'for' Loop Expression</a> - <ol> - <li><a href="#forlexer">Lexer Extensions</a></li> - <li><a href="#forast">AST Extensions</a></li> - <li><a href="#forparser">Parser Extensions</a></li> - <li><a href="#forir">LLVM IR</a></li> - <li><a href="#forcodegen">Code Generation</a></li> - </ol> - </li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="LangImpl6.html">Chapter 6</a>: Extending the Language: -User-defined Operators</li> -</ul> - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="intro">Chapter 5 Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Welcome to Chapter 5 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. Parts 1-4 described the implementation of the simple -Kaleidoscope language and included support for generating LLVM IR, followed by -optimizations and a JIT compiler. Unfortunately, as presented, Kaleidoscope is -mostly useless: it has no control flow other than call and return. This means -that you can't have conditional branches in the code, significantly limiting its -power. In this episode of "build that compiler", we'll extend Kaleidoscope to -have an if/then/else expression plus a simple 'for' loop.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="ifthen">If/Then/Else</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -Extending Kaleidoscope to support if/then/else is quite straightforward. It -basically requires adding support for this "new" concept to the lexer, -parser, AST, and LLVM code emitter. This example is nice, because it shows how -easy it is to "grow" a language over time, incrementally extending it as new -ideas are discovered.</p> - -<p>Before we get going on "how" we add this extension, lets talk about "what" we -want. 
The basic idea is that we want to be able to write this sort of thing: -</p> - -<div class="doc_code"> -<pre> -def fib(x) - if x < 3 then - 1 - else - fib(x-1)+fib(x-2); -</pre> -</div> - -<p>In Kaleidoscope, every construct is an expression: there are no statements. -As such, the if/then/else expression needs to return a value like any other. -Since we're using a mostly functional form, we'll have it evaluate its -conditional, then return the 'then' or 'else' value based on how the condition -was resolved. This is very similar to the C "?:" expression.</p> - -<p>The semantics of the if/then/else expression is that it evaluates the -condition to a boolean equality value: 0.0 is considered to be false and -everything else is considered to be true. -If the condition is true, the first subexpression is evaluated and returned, if -the condition is false, the second subexpression is evaluated and returned. -Since Kaleidoscope allows side-effects, this behavior is important to nail down. -</p> - -<p>Now that we know what we "want", lets break this down into its constituent -pieces.</p> - -<!-- ======================================================================= --> -<h4><a name="iflexer">Lexer Extensions for If/Then/Else</a></h4> -<!-- ======================================================================= --> - - -<div> - -<p>The lexer extensions are straightforward. First we add new enum values -for the relevant tokens:</p> - -<div class="doc_code"> -<pre> - // control - tok_if = -6, tok_then = -7, tok_else = -8, -</pre> -</div> - -<p>Once we have that, we recognize the new keywords in the lexer. This is pretty simple -stuff:</p> - -<div class="doc_code"> -<pre> - ... - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - <b>if (IdentifierStr == "if") return tok_if; - if (IdentifierStr == "then") return tok_then; - if (IdentifierStr == "else") return tok_else;</b> - return tok_identifier; -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4><a name="ifast">AST Extensions for If/Then/Else</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>To represent the new expression we add a new AST node for it:</p> - -<div class="doc_code"> -<pre> -/// IfExprAST - Expression class for if/then/else. -class IfExprAST : public ExprAST { - ExprAST *Cond, *Then, *Else; -public: - IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) - : Cond(cond), Then(then), Else(_else) {} - virtual Value *Codegen(); -}; -</pre> -</div> - -<p>The AST node just has pointers to the various subexpressions.</p> - -</div> - -<!-- ======================================================================= --> -<h4><a name="ifparser">Parser Extensions for If/Then/Else</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>Now that we have the relevant tokens coming from the lexer and we have the -AST node to build, our parsing logic is relatively straightforward. First we -define a new parsing function:</p> - -<div class="doc_code"> -<pre> -/// ifexpr ::= 'if' expression 'then' expression 'else' expression -static ExprAST *ParseIfExpr() { - getNextToken(); // eat the if. - - // condition. 
- ExprAST *Cond = ParseExpression(); - if (!Cond) return 0; - - if (CurTok != tok_then) - return Error("expected then"); - getNextToken(); // eat the then - - ExprAST *Then = ParseExpression(); - if (Then == 0) return 0; - - if (CurTok != tok_else) - return Error("expected else"); - - getNextToken(); - - ExprAST *Else = ParseExpression(); - if (!Else) return 0; - - return new IfExprAST(Cond, Then, Else); -} -</pre> -</div> - -<p>Next we hook it up as a primary expression:</p> - -<div class="doc_code"> -<pre> -static ExprAST *ParsePrimary() { - switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - <b>case tok_if: return ParseIfExpr();</b> - } -} -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4><a name="ifir">LLVM IR for If/Then/Else</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>Now that we have it parsing and building the AST, the final piece is adding -LLVM code generation support. This is the most interesting part of the -if/then/else example, because this is where it starts to introduce new concepts. -All of the code above has been thoroughly described in previous chapters. -</p> - -<p>To motivate the code we want to produce, lets take a look at a simple -example. Consider:</p> - -<div class="doc_code"> -<pre> -extern foo(); -extern bar(); -def baz(x) if x then foo() else bar(); -</pre> -</div> - -<p>If you disable optimizations, the code you'll (soon) get from Kaleidoscope -looks like this:</p> - -<div class="doc_code"> -<pre> -declare double @foo() - -declare double @bar() - -define double @baz(double %x) { -entry: - %ifcond = fcmp one double %x, 0.000000e+00 - br i1 %ifcond, label %then, label %else - -then: ; preds = %entry - %calltmp = call double @foo() - br label %ifcont - -else: ; preds = %entry - %calltmp1 = call double @bar() - br label %ifcont - -ifcont: ; preds = %else, %then - %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ] - ret double %iftmp -} -</pre> -</div> - -<p>To visualize the control flow graph, you can use a nifty feature of the LLVM -'<a href="http://llvm.org/cmds/opt.html">opt</a>' tool. If you put this LLVM IR -into "t.ll" and run "<tt>llvm-as < t.ll | opt -analyze -view-cfg</tt>", <a -href="../ProgrammersManual.html#ViewGraph">a window will pop up</a> and you'll -see this graph:</p> - -<div style="text-align: center"><img src="LangImpl5-cfg.png" alt="Example CFG" width="423" -height="315"></div> - -<p>Another way to get this is to call "<tt>F->viewCFG()</tt>" or -"<tt>F->viewCFGOnly()</tt>" (where F is a "<tt>Function*</tt>") either by -inserting actual calls into the code and recompiling or by calling these in the -debugger. LLVM has many nice features for visualizing various graphs.</p> - -<p>Getting back to the generated code, it is fairly simple: the entry block -evaluates the conditional expression ("x" in our case here) and compares the -result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>" -instruction ('one' is "Ordered and Not Equal"). 
Based on the result of this -expression, the code jumps to either the "then" or "else" blocks, which contain -the expressions for the true/false cases.</p> - -<p>Once the then/else blocks are finished executing, they both branch back to the -'ifcont' block to execute the code that happens after the if/then/else. In this -case the only thing left to do is to return to the caller of the function. The -question then becomes: how does the code know which expression to return?</p> - -<p>The answer to this question involves an important SSA operation: the -<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Phi -operation</a>. If you're not familiar with SSA, <a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">the wikipedia -article</a> is a good introduction and there are various other introductions to -it available on your favorite search engine. The short version is that -"execution" of the Phi operation requires "remembering" which block control came -from. The Phi operation takes on the value corresponding to the input control -block. In this case, if control comes in from the "then" block, it gets the -value of "calltmp". If control comes from the "else" block, it gets the value -of "calltmp1".</p> - -<p>At this point, you are probably starting to think "Oh no! This means my -simple and elegant front-end will have to start generating SSA form in order to -use LLVM!". Fortunately, this is not the case, and we strongly advise -<em>not</em> implementing an SSA construction algorithm in your front-end -unless there is an amazingly good reason to do so. In practice, there are two -sorts of values that float around in code written for your average imperative -programming language that might need Phi nodes:</p> - -<ol> -<li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li> -<li>Values that are implicit in the structure of your AST, such as the Phi node -in this case.</li> -</ol> - -<p>In <a href="LangImpl7.html">Chapter 7</a> of this tutorial ("mutable -variables"), we'll talk about #1 -in depth. For now, just believe me that you don't need SSA construction to -handle this case. For #2, you have the choice of using the techniques that we will -describe for #1, or you can insert Phi nodes directly, if convenient. In this -case, it is really really easy to generate the Phi node, so we choose to do it -directly.</p> - -<p>Okay, enough of the motivation and overview, lets generate code!</p> - -</div> - -<!-- ======================================================================= --> -<h4><a name="ifcodegen">Code Generation for If/Then/Else</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>In order to generate code for this, we implement the <tt>Codegen</tt> method -for <tt>IfExprAST</tt>:</p> - -<div class="doc_code"> -<pre> -Value *IfExprAST::Codegen() { - Value *CondV = Cond->Codegen(); - if (CondV == 0) return 0; - - // Convert condition to a bool by comparing equal to 0.0. - CondV = Builder.CreateFCmpONE(CondV, - ConstantFP::get(getGlobalContext(), APFloat(0.0)), - "ifcond"); -</pre> -</div> - -<p>This code is straightforward and similar to what we saw before. We emit the -expression for the condition, then compare that value to zero to get a truth -value as a 1-bit (bool) value.</p> - -<div class="doc_code"> -<pre> - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - - // Create blocks for the then and else cases. Insert the 'then' block at the - // end of the function. 
- BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); - BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); - BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); - - Builder.CreateCondBr(CondV, ThenBB, ElseBB); -</pre> -</div> - -<p>This code creates the basic blocks that are related to the if/then/else -statement, and correspond directly to the blocks in the example above. The -first line gets the current Function object that is being built. It -gets this by asking the builder for the current BasicBlock, and asking that -block for its "parent" (the function it is currently embedded into).</p> - -<p>Once it has that, it creates three blocks. Note that it passes "TheFunction" -into the constructor for the "then" block. This causes the constructor to -automatically insert the new block into the end of the specified function. The -other two blocks are created, but aren't yet inserted into the function.</p> - -<p>Once the blocks are created, we can emit the conditional branch that chooses -between them. Note that creating new blocks does not implicitly affect the -IRBuilder, so it is still inserting into the block that the condition -went into. Also note that it is creating a branch to the "then" block and the -"else" block, even though the "else" block isn't inserted into the function yet. -This is all ok: it is the standard way that LLVM supports forward -references.</p> - -<div class="doc_code"> -<pre> - // Emit then value. - Builder.SetInsertPoint(ThenBB); - - Value *ThenV = Then->Codegen(); - if (ThenV == 0) return 0; - - Builder.CreateBr(MergeBB); - // Codegen of 'Then' can change the current block, update ThenBB for the PHI. - ThenBB = Builder.GetInsertBlock(); -</pre> -</div> - -<p>After the conditional branch is inserted, we move the builder to start -inserting into the "then" block. Strictly speaking, this call moves the -insertion point to be at the end of the specified block. However, since the -"then" block is empty, it also starts out by inserting at the beginning of the -block. :)</p> - -<p>Once the insertion point is set, we recursively codegen the "then" expression -from the AST. To finish off the "then" block, we create an unconditional branch -to the merge block. One interesting (and very important) aspect of the LLVM IR -is that it <a href="../LangRef.html#functionstructure">requires all basic blocks -to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow -instruction</a> such as return or branch. This means that all control flow, -<em>including fall throughs</em> must be made explicit in the LLVM IR. If you -violate this rule, the verifier will emit an error.</p> - -<p>The final line here is quite subtle, but is very important. The basic issue -is that when we create the Phi node in the merge block, we need to set up the -block/value pairs that indicate how the Phi will work. Importantly, the Phi -node expects to have an entry for each predecessor of the block in the CFG. Why -then, are we getting the current block when we just set it to ThenBB 5 lines -above? The problem is that the "Then" expression may actually itself change the -block that the Builder is emitting into if, for example, it contains a nested -"if/then/else" expression. Because calling Codegen recursively could -arbitrarily change the notion of the current block, we are required to get an -up-to-date value for code that will set up the Phi node.</p> - -<div class="doc_code"> -<pre> - // Emit else block. 
- TheFunction->getBasicBlockList().push_back(ElseBB); - Builder.SetInsertPoint(ElseBB); - - Value *ElseV = Else->Codegen(); - if (ElseV == 0) return 0; - - Builder.CreateBr(MergeBB); - // Codegen of 'Else' can change the current block, update ElseBB for the PHI. - ElseBB = Builder.GetInsertBlock(); -</pre> -</div> - -<p>Code generation for the 'else' block is basically identical to codegen for -the 'then' block. The only significant difference is the first line, which adds -the 'else' block to the function. Recall previously that the 'else' block was -created, but not added to the function. Now that the 'then' and 'else' blocks -are emitted, we can finish up with the merge code:</p> - -<div class="doc_code"> -<pre> - // Emit merge block. - TheFunction->getBasicBlockList().push_back(MergeBB); - Builder.SetInsertPoint(MergeBB); - PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, - "iftmp"); - - PN->addIncoming(ThenV, ThenBB); - PN->addIncoming(ElseV, ElseBB); - return PN; -} -</pre> -</div> - -<p>The first two lines here are now familiar: the first adds the "merge" block -to the Function object (it was previously floating, like the else block above). -The second block changes the insertion point so that newly created code will go -into the "merge" block. Once that is done, we need to create the PHI node and -set up the block/value pairs for the PHI.</p> - -<p>Finally, the CodeGen function returns the phi node as the value computed by -the if/then/else expression. In our example above, this returned value will -feed into the code for the top-level function, which will create the return -instruction.</p> - -<p>Overall, we now have the ability to execute conditional code in -Kaleidoscope. With this extension, Kaleidoscope is a fairly complete language -that can calculate a wide variety of numeric functions. Next up we'll add -another useful expression that is familiar from non-functional languages...</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="for">'for' Loop Expression</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Now that we know how to add basic control flow constructs to the language, -we have the tools to add more powerful things. Lets add something more -aggressive, a 'for' expression:</p> - -<div class="doc_code"> -<pre> - extern putchard(char) - def printstar(n) - for i = 1, i < n, 1.0 in - putchard(42); # ascii 42 = '*' - - # print 100 '*' characters - printstar(100); -</pre> -</div> - -<p>This expression defines a new variable ("i" in this case) which iterates from -a starting value, while the condition ("i < n" in this case) is true, -incrementing by an optional step value ("1.0" in this case). If the step value -is omitted, it defaults to 1.0. While the loop is true, it executes its -body expression. Because we don't have anything better to return, we'll just -define the loop as always returning 0.0. In the future when we have mutable -variables, it will get more useful.</p> - -<p>As before, lets talk about the changes that we need to Kaleidoscope to -support this.</p> - -<!-- ======================================================================= --> -<h4><a name="forlexer">Lexer Extensions for the 'for' Loop</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>The lexer extensions are the same sort of thing as for if/then/else:</p> - -<div class="doc_code"> -<pre> - ... 
in enum Token ... - // control - tok_if = -6, tok_then = -7, tok_else = -8, -<b> tok_for = -9, tok_in = -10</b> - - ... in gettok ... - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - if (IdentifierStr == "if") return tok_if; - if (IdentifierStr == "then") return tok_then; - if (IdentifierStr == "else") return tok_else; - <b>if (IdentifierStr == "for") return tok_for; - if (IdentifierStr == "in") return tok_in;</b> - return tok_identifier; -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4><a name="forast">AST Extensions for the 'for' Loop</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>The AST node is just as simple. It basically boils down to capturing -the variable name and the constituent expressions in the node.</p> - -<div class="doc_code"> -<pre> -/// ForExprAST - Expression class for for/in. -class ForExprAST : public ExprAST { - std::string VarName; - ExprAST *Start, *End, *Step, *Body; -public: - ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, - ExprAST *step, ExprAST *body) - : VarName(varname), Start(start), End(end), Step(step), Body(body) {} - virtual Value *Codegen(); -}; -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4><a name="forparser">Parser Extensions for the 'for' Loop</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>The parser code is also fairly standard. The only interesting thing here is -handling of the optional step value. The parser code handles it by checking to -see if the second comma is present. If not, it sets the step value to null in -the AST node:</p> - -<div class="doc_code"> -<pre> -/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression -static ExprAST *ParseForExpr() { - getNextToken(); // eat the for. - - if (CurTok != tok_identifier) - return Error("expected identifier after for"); - - std::string IdName = IdentifierStr; - getNextToken(); // eat identifier. - - if (CurTok != '=') - return Error("expected '=' after for"); - getNextToken(); // eat '='. - - - ExprAST *Start = ParseExpression(); - if (Start == 0) return 0; - if (CurTok != ',') - return Error("expected ',' after for start value"); - getNextToken(); - - ExprAST *End = ParseExpression(); - if (End == 0) return 0; - - // The step value is optional. - ExprAST *Step = 0; - if (CurTok == ',') { - getNextToken(); - Step = ParseExpression(); - if (Step == 0) return 0; - } - - if (CurTok != tok_in) - return Error("expected 'in' after for"); - getNextToken(); // eat 'in'. - - ExprAST *Body = ParseExpression(); - if (Body == 0) return 0; - - return new ForExprAST(IdName, Start, End, Step, Body); -} -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4><a name="forir">LLVM IR for the 'for' Loop</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>Now we get to the good part: the LLVM IR we want to generate for this thing. 
-With the simple example above, we get this LLVM IR (note that this dump is -generated with optimizations disabled for clarity): -</p> - -<div class="doc_code"> -<pre> -declare double @putchard(double) - -define double @printstar(double %n) { -entry: - ; initial value = 1.0 (inlined into phi) - br label %loop - -loop: ; preds = %loop, %entry - %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ] - ; body - %calltmp = call double @putchard(double 4.200000e+01) - ; increment - %nextvar = fadd double %i, 1.000000e+00 - - ; termination test - %cmptmp = fcmp ult double %i, %n - %booltmp = uitofp i1 %cmptmp to double - %loopcond = fcmp one double %booltmp, 0.000000e+00 - br i1 %loopcond, label %loop, label %afterloop - -afterloop: ; preds = %loop - ; loop always returns 0.0 - ret double 0.000000e+00 -} -</pre> -</div> - -<p>This loop contains all the same constructs we saw before: a phi node, several -expressions, and some basic blocks. Lets see how this fits together.</p> - -</div> - -<!-- ======================================================================= --> -<h4><a name="forcodegen">Code Generation for the 'for' Loop</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>The first part of Codegen is very simple: we just output the start expression -for the loop value:</p> - -<div class="doc_code"> -<pre> -Value *ForExprAST::Codegen() { - // Emit the start code first, without 'variable' in scope. - Value *StartVal = Start->Codegen(); - if (StartVal == 0) return 0; -</pre> -</div> - -<p>With this out of the way, the next step is to set up the LLVM basic block -for the start of the loop body. In the case above, the whole loop body is one -block, but remember that the body code itself could consist of multiple blocks -(e.g. if it contains an if/then/else or a for/in expression).</p> - -<div class="doc_code"> -<pre> - // Make the new basic block for the loop header, inserting after current - // block. - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - BasicBlock *PreheaderBB = Builder.GetInsertBlock(); - BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); - - // Insert an explicit fall through from the current block to the LoopBB. - Builder.CreateBr(LoopBB); -</pre> -</div> - -<p>This code is similar to what we saw for if/then/else. Because we will need -it to create the Phi node, we remember the block that falls through into the -loop. Once we have that, we create the actual block that starts the loop and -create an unconditional branch for the fall-through between the two blocks.</p> - -<div class="doc_code"> -<pre> - // Start insertion in LoopBB. - Builder.SetInsertPoint(LoopBB); - - // Start the PHI node with an entry for Start. - PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str()); - Variable->addIncoming(StartVal, PreheaderBB); -</pre> -</div> - -<p>Now that the "preheader" for the loop is set up, we switch to emitting code -for the loop body. To begin with, we move the insertion point and create the -PHI node for the loop induction variable. Since we already know the incoming -value for the starting value, we add it to the Phi node. Note that the Phi will -eventually get a second value for the backedge, but we can't set it up yet -(because it doesn't exist!).</p> - -<div class="doc_code"> -<pre> - // Within the loop, the variable is defined equal to the PHI node. 
If it - // shadows an existing variable, we have to restore it, so save it now. - Value *OldVal = NamedValues[VarName]; - NamedValues[VarName] = Variable; - - // Emit the body of the loop. This, like any other expr, can change the - // current BB. Note that we ignore the value computed by the body, but don't - // allow an error. - if (Body->Codegen() == 0) - return 0; -</pre> -</div> - -<p>Now the code starts to get more interesting. Our 'for' loop introduces a new -variable to the symbol table. This means that our symbol table can now contain -either function arguments or loop variables. To handle this, before we codegen -the body of the loop, we add the loop variable as the current value for its -name. Note that it is possible that there is a variable of the same name in the -outer scope. It would be easy to make this an error (emit an error and return -null if there is already an entry for VarName) but we choose to allow shadowing -of variables. In order to handle this correctly, we remember the Value that -we are potentially shadowing in <tt>OldVal</tt> (which will be null if there is -no shadowed variable).</p> - -<p>Once the loop variable is set into the symbol table, the code recursively -codegen's the body. This allows the body to use the loop variable: any -references to it will naturally find it in the symbol table.</p> - -<div class="doc_code"> -<pre> - // Emit the step value. - Value *StepVal; - if (Step) { - StepVal = Step->Codegen(); - if (StepVal == 0) return 0; - } else { - // If not specified, use 1.0. - StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); - } - - Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar"); -</pre> -</div> - -<p>Now that the body is emitted, we compute the next value of the iteration -variable by adding the step value, or 1.0 if it isn't present. '<tt>NextVar</tt>' -will be the value of the loop variable on the next iteration of the loop.</p> - -<div class="doc_code"> -<pre> - // Compute the end condition. - Value *EndCond = End->Codegen(); - if (EndCond == 0) return EndCond; - - // Convert condition to a bool by comparing equal to 0.0. - EndCond = Builder.CreateFCmpONE(EndCond, - ConstantFP::get(getGlobalContext(), APFloat(0.0)), - "loopcond"); -</pre> -</div> - -<p>Finally, we evaluate the exit value of the loop, to determine whether the -loop should exit. This mirrors the condition evaluation for the if/then/else -statement.</p> - -<div class="doc_code"> -<pre> - // Create the "after loop" block and insert it. - BasicBlock *LoopEndBB = Builder.GetInsertBlock(); - BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); - - // Insert the conditional branch into the end of LoopEndBB. - Builder.CreateCondBr(EndCond, LoopBB, AfterBB); - - // Any new code will be inserted in AfterBB. - Builder.SetInsertPoint(AfterBB); -</pre> -</div> - -<p>With the code for the body of the loop complete, we just need to finish up -the control flow for it. This code remembers the end block (for the phi node), -then creates the block for the loop exit ("afterloop"). Based on the value of -the exit condition, it creates a conditional branch that chooses between -executing the loop again and exiting the loop. Any future code is emitted in -the "afterloop" block, so it sets the insertion position to it.</p> - -<div class="doc_code"> -<pre> - // Add a new entry to the PHI node for the backedge. - Variable->addIncoming(NextVar, LoopEndBB); - - // Restore the unshadowed variable. 
- if (OldVal) - NamedValues[VarName] = OldVal; - else - NamedValues.erase(VarName); - - // for expr always returns 0.0. - return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); -} -</pre> -</div> - -<p>The final code handles various cleanups: now that we have the "NextVar" -value, we can add the incoming value to the loop PHI node. After that, we -remove the loop variable from the symbol table, so that it isn't in scope after -the for loop. Finally, code generation of the for loop always returns 0.0, so -that is what we return from <tt>ForExprAST::Codegen</tt>.</p> - -<p>With this, we conclude the "adding control flow to Kaleidoscope" chapter of -the tutorial. In this chapter we added two control flow constructs, and used them to motivate a couple of aspects of the LLVM IR that are important for front-end implementors -to know. In the next chapter of our saga, we will get a bit crazier and add -<a href="LangImpl6.html">user-defined operators</a> to our poor innocent -language.</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="code">Full Code Listing</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -Here is the complete code listing for our running example, enhanced with the -if/then/else and for expressions.. To build this example, use: -</p> - -<div class="doc_code"> -<pre> -# Compile -clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy -# Run -./toy -</pre> -</div> - -<p>Here is the code:</p> - -<div class="doc_code"> -<pre> -#include "llvm/DerivedTypes.h" -#include "llvm/ExecutionEngine/ExecutionEngine.h" -#include "llvm/ExecutionEngine/JIT.h" -#include "llvm/IRBuilder.h" -#include "llvm/LLVMContext.h" -#include "llvm/Module.h" -#include "llvm/PassManager.h" -#include "llvm/Analysis/Verifier.h" -#include "llvm/Analysis/Passes.h" -#include "llvm/DataLayout.h" -#include "llvm/Transforms/Scalar.h" -#include "llvm/Support/TargetSelect.h" -#include <cstdio> -#include <string> -#include <map> -#include <vector> -using namespace llvm; - -//===----------------------------------------------------------------------===// -// Lexer -//===----------------------------------------------------------------------===// - -// The lexer returns tokens [0-255] if it is an unknown character, otherwise one -// of these for known things. -enum Token { - tok_eof = -1, - - // commands - tok_def = -2, tok_extern = -3, - - // primary - tok_identifier = -4, tok_number = -5, - - // control - tok_if = -6, tok_then = -7, tok_else = -8, - tok_for = -9, tok_in = -10 -}; - -static std::string IdentifierStr; // Filled in if tok_identifier -static double NumVal; // Filled in if tok_number - -/// gettok - Return the next token from standard input. -static int gettok() { - static int LastChar = ' '; - - // Skip any whitespace. 
- while (isspace(LastChar)) - LastChar = getchar(); - - if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* - IdentifierStr = LastChar; - while (isalnum((LastChar = getchar()))) - IdentifierStr += LastChar; - - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - if (IdentifierStr == "if") return tok_if; - if (IdentifierStr == "then") return tok_then; - if (IdentifierStr == "else") return tok_else; - if (IdentifierStr == "for") return tok_for; - if (IdentifierStr == "in") return tok_in; - return tok_identifier; - } - - if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ - std::string NumStr; - do { - NumStr += LastChar; - LastChar = getchar(); - } while (isdigit(LastChar) || LastChar == '.'); - - NumVal = strtod(NumStr.c_str(), 0); - return tok_number; - } - - if (LastChar == '#') { - // Comment until end of line. - do LastChar = getchar(); - while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); - - if (LastChar != EOF) - return gettok(); - } - - // Check for end of file. Don't eat the EOF. - if (LastChar == EOF) - return tok_eof; - - // Otherwise, just return the character as its ascii value. - int ThisChar = LastChar; - LastChar = getchar(); - return ThisChar; -} - -//===----------------------------------------------------------------------===// -// Abstract Syntax Tree (aka Parse Tree) -//===----------------------------------------------------------------------===// - -/// ExprAST - Base class for all expression nodes. -class ExprAST { -public: - virtual ~ExprAST() {} - virtual Value *Codegen() = 0; -}; - -/// NumberExprAST - Expression class for numeric literals like "1.0". -class NumberExprAST : public ExprAST { - double Val; -public: - NumberExprAST(double val) : Val(val) {} - virtual Value *Codegen(); -}; - -/// VariableExprAST - Expression class for referencing a variable, like "a". -class VariableExprAST : public ExprAST { - std::string Name; -public: - VariableExprAST(const std::string &name) : Name(name) {} - virtual Value *Codegen(); -}; - -/// BinaryExprAST - Expression class for a binary operator. -class BinaryExprAST : public ExprAST { - char Op; - ExprAST *LHS, *RHS; -public: - BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) - : Op(op), LHS(lhs), RHS(rhs) {} - virtual Value *Codegen(); -}; - -/// CallExprAST - Expression class for function calls. -class CallExprAST : public ExprAST { - std::string Callee; - std::vector<ExprAST*> Args; -public: - CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) - : Callee(callee), Args(args) {} - virtual Value *Codegen(); -}; - -/// IfExprAST - Expression class for if/then/else. -class IfExprAST : public ExprAST { - ExprAST *Cond, *Then, *Else; -public: - IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) - : Cond(cond), Then(then), Else(_else) {} - virtual Value *Codegen(); -}; - -/// ForExprAST - Expression class for for/in. -class ForExprAST : public ExprAST { - std::string VarName; - ExprAST *Start, *End, *Step, *Body; -public: - ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, - ExprAST *step, ExprAST *body) - : VarName(varname), Start(start), End(end), Step(step), Body(body) {} - virtual Value *Codegen(); -}; - -/// PrototypeAST - This class represents the "prototype" for a function, -/// which captures its name, and its argument names (thus implicitly the number -/// of arguments the function takes). 
-class PrototypeAST { - std::string Name; - std::vector<std::string> Args; -public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args) - : Name(name), Args(args) {} - - Function *Codegen(); -}; - -/// FunctionAST - This class represents a function definition itself. -class FunctionAST { - PrototypeAST *Proto; - ExprAST *Body; -public: - FunctionAST(PrototypeAST *proto, ExprAST *body) - : Proto(proto), Body(body) {} - - Function *Codegen(); -}; - -//===----------------------------------------------------------------------===// -// Parser -//===----------------------------------------------------------------------===// - -/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current -/// token the parser is looking at. getNextToken reads another token from the -/// lexer and updates CurTok with its results. -static int CurTok; -static int getNextToken() { - return CurTok = gettok(); -} - -/// BinopPrecedence - This holds the precedence for each binary operator that is -/// defined. -static std::map<char, int> BinopPrecedence; - -/// GetTokPrecedence - Get the precedence of the pending binary operator token. -static int GetTokPrecedence() { - if (!isascii(CurTok)) - return -1; - - // Make sure it's a declared binop. - int TokPrec = BinopPrecedence[CurTok]; - if (TokPrec <= 0) return -1; - return TokPrec; -} - -/// Error* - These are little helper functions for error handling. -ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} -PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } -FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } - -static ExprAST *ParseExpression(); - -/// identifierexpr -/// ::= identifier -/// ::= identifier '(' expression* ')' -static ExprAST *ParseIdentifierExpr() { - std::string IdName = IdentifierStr; - - getNextToken(); // eat identifier. - - if (CurTok != '(') // Simple variable ref. - return new VariableExprAST(IdName); - - // Call. - getNextToken(); // eat ( - std::vector<ExprAST*> Args; - if (CurTok != ')') { - while (1) { - ExprAST *Arg = ParseExpression(); - if (!Arg) return 0; - Args.push_back(Arg); - - if (CurTok == ')') break; - - if (CurTok != ',') - return Error("Expected ')' or ',' in argument list"); - getNextToken(); - } - } - - // Eat the ')'. - getNextToken(); - - return new CallExprAST(IdName, Args); -} - -/// numberexpr ::= number -static ExprAST *ParseNumberExpr() { - ExprAST *Result = new NumberExprAST(NumVal); - getNextToken(); // consume the number - return Result; -} - -/// parenexpr ::= '(' expression ')' -static ExprAST *ParseParenExpr() { - getNextToken(); // eat (. - ExprAST *V = ParseExpression(); - if (!V) return 0; - - if (CurTok != ')') - return Error("expected ')'"); - getNextToken(); // eat ). - return V; -} - -/// ifexpr ::= 'if' expression 'then' expression 'else' expression -static ExprAST *ParseIfExpr() { - getNextToken(); // eat the if. - - // condition. - ExprAST *Cond = ParseExpression(); - if (!Cond) return 0; - - if (CurTok != tok_then) - return Error("expected then"); - getNextToken(); // eat the then - - ExprAST *Then = ParseExpression(); - if (Then == 0) return 0; - - if (CurTok != tok_else) - return Error("expected else"); - - getNextToken(); - - ExprAST *Else = ParseExpression(); - if (!Else) return 0; - - return new IfExprAST(Cond, Then, Else); -} - -/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression -static ExprAST *ParseForExpr() { - getNextToken(); // eat the for. 
- - if (CurTok != tok_identifier) - return Error("expected identifier after for"); - - std::string IdName = IdentifierStr; - getNextToken(); // eat identifier. - - if (CurTok != '=') - return Error("expected '=' after for"); - getNextToken(); // eat '='. - - - ExprAST *Start = ParseExpression(); - if (Start == 0) return 0; - if (CurTok != ',') - return Error("expected ',' after for start value"); - getNextToken(); - - ExprAST *End = ParseExpression(); - if (End == 0) return 0; - - // The step value is optional. - ExprAST *Step = 0; - if (CurTok == ',') { - getNextToken(); - Step = ParseExpression(); - if (Step == 0) return 0; - } - - if (CurTok != tok_in) - return Error("expected 'in' after for"); - getNextToken(); // eat 'in'. - - ExprAST *Body = ParseExpression(); - if (Body == 0) return 0; - - return new ForExprAST(IdName, Start, End, Step, Body); -} - -/// primary -/// ::= identifierexpr -/// ::= numberexpr -/// ::= parenexpr -/// ::= ifexpr -/// ::= forexpr -static ExprAST *ParsePrimary() { - switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - case tok_if: return ParseIfExpr(); - case tok_for: return ParseForExpr(); - } -} - -/// binoprhs -/// ::= ('+' primary)* -static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { - // If this is a binop, find its precedence. - while (1) { - int TokPrec = GetTokPrecedence(); - - // If this is a binop that binds at least as tightly as the current binop, - // consume it, otherwise we are done. - if (TokPrec < ExprPrec) - return LHS; - - // Okay, we know this is a binop. - int BinOp = CurTok; - getNextToken(); // eat binop - - // Parse the primary expression after the binary operator. - ExprAST *RHS = ParsePrimary(); - if (!RHS) return 0; - - // If BinOp binds less tightly with RHS than the operator after RHS, let - // the pending operator take RHS as its LHS. - int NextPrec = GetTokPrecedence(); - if (TokPrec < NextPrec) { - RHS = ParseBinOpRHS(TokPrec+1, RHS); - if (RHS == 0) return 0; - } - - // Merge LHS/RHS. - LHS = new BinaryExprAST(BinOp, LHS, RHS); - } -} - -/// expression -/// ::= primary binoprhs -/// -static ExprAST *ParseExpression() { - ExprAST *LHS = ParsePrimary(); - if (!LHS) return 0; - - return ParseBinOpRHS(0, LHS); -} - -/// prototype -/// ::= id '(' id* ')' -static PrototypeAST *ParsePrototype() { - if (CurTok != tok_identifier) - return ErrorP("Expected function name in prototype"); - - std::string FnName = IdentifierStr; - getNextToken(); - - if (CurTok != '(') - return ErrorP("Expected '(' in prototype"); - - std::vector<std::string> ArgNames; - while (getNextToken() == tok_identifier) - ArgNames.push_back(IdentifierStr); - if (CurTok != ')') - return ErrorP("Expected ')' in prototype"); - - // success. - getNextToken(); // eat ')'. - - return new PrototypeAST(FnName, ArgNames); -} - -/// definition ::= 'def' prototype expression -static FunctionAST *ParseDefinition() { - getNextToken(); // eat def. - PrototypeAST *Proto = ParsePrototype(); - if (Proto == 0) return 0; - - if (ExprAST *E = ParseExpression()) - return new FunctionAST(Proto, E); - return 0; -} - -/// toplevelexpr ::= expression -static FunctionAST *ParseTopLevelExpr() { - if (ExprAST *E = ParseExpression()) { - // Make an anonymous proto. 
- PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); - return new FunctionAST(Proto, E); - } - return 0; -} - -/// external ::= 'extern' prototype -static PrototypeAST *ParseExtern() { - getNextToken(); // eat extern. - return ParsePrototype(); -} - -//===----------------------------------------------------------------------===// -// Code Generation -//===----------------------------------------------------------------------===// - -static Module *TheModule; -static IRBuilder<> Builder(getGlobalContext()); -static std::map<std::string, Value*> NamedValues; -static FunctionPassManager *TheFPM; - -Value *ErrorV(const char *Str) { Error(Str); return 0; } - -Value *NumberExprAST::Codegen() { - return ConstantFP::get(getGlobalContext(), APFloat(Val)); -} - -Value *VariableExprAST::Codegen() { - // Look this variable up in the function. - Value *V = NamedValues[Name]; - return V ? V : ErrorV("Unknown variable name"); -} - -Value *BinaryExprAST::Codegen() { - Value *L = LHS->Codegen(); - Value *R = RHS->Codegen(); - if (L == 0 || R == 0) return 0; - - switch (Op) { - case '+': return Builder.CreateFAdd(L, R, "addtmp"); - case '-': return Builder.CreateFSub(L, R, "subtmp"); - case '*': return Builder.CreateFMul(L, R, "multmp"); - case '<': - L = Builder.CreateFCmpULT(L, R, "cmptmp"); - // Convert bool 0/1 to double 0.0 or 1.0 - return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), - "booltmp"); - default: return ErrorV("invalid binary operator"); - } -} - -Value *CallExprAST::Codegen() { - // Look up the name in the global module table. - Function *CalleeF = TheModule->getFunction(Callee); - if (CalleeF == 0) - return ErrorV("Unknown function referenced"); - - // If argument mismatch error. - if (CalleeF->arg_size() != Args.size()) - return ErrorV("Incorrect # arguments passed"); - - std::vector<Value*> ArgsV; - for (unsigned i = 0, e = Args.size(); i != e; ++i) { - ArgsV.push_back(Args[i]->Codegen()); - if (ArgsV.back() == 0) return 0; - } - - return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); -} - -Value *IfExprAST::Codegen() { - Value *CondV = Cond->Codegen(); - if (CondV == 0) return 0; - - // Convert condition to a bool by comparing equal to 0.0. - CondV = Builder.CreateFCmpONE(CondV, - ConstantFP::get(getGlobalContext(), APFloat(0.0)), - "ifcond"); - - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - - // Create blocks for the then and else cases. Insert the 'then' block at the - // end of the function. - BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); - BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); - BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); - - Builder.CreateCondBr(CondV, ThenBB, ElseBB); - - // Emit then value. - Builder.SetInsertPoint(ThenBB); - - Value *ThenV = Then->Codegen(); - if (ThenV == 0) return 0; - - Builder.CreateBr(MergeBB); - // Codegen of 'Then' can change the current block, update ThenBB for the PHI. - ThenBB = Builder.GetInsertBlock(); - - // Emit else block. - TheFunction->getBasicBlockList().push_back(ElseBB); - Builder.SetInsertPoint(ElseBB); - - Value *ElseV = Else->Codegen(); - if (ElseV == 0) return 0; - - Builder.CreateBr(MergeBB); - // Codegen of 'Else' can change the current block, update ElseBB for the PHI. - ElseBB = Builder.GetInsertBlock(); - - // Emit merge block. 
- TheFunction->getBasicBlockList().push_back(MergeBB); - Builder.SetInsertPoint(MergeBB); - PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, - "iftmp"); - - PN->addIncoming(ThenV, ThenBB); - PN->addIncoming(ElseV, ElseBB); - return PN; -} - -Value *ForExprAST::Codegen() { - // Output this as: - // ... - // start = startexpr - // goto loop - // loop: - // variable = phi [start, loopheader], [nextvariable, loopend] - // ... - // bodyexpr - // ... - // loopend: - // step = stepexpr - // nextvariable = variable + step - // endcond = endexpr - // br endcond, loop, endloop - // outloop: - - // Emit the start code first, without 'variable' in scope. - Value *StartVal = Start->Codegen(); - if (StartVal == 0) return 0; - - // Make the new basic block for the loop header, inserting after current - // block. - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - BasicBlock *PreheaderBB = Builder.GetInsertBlock(); - BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); - - // Insert an explicit fall through from the current block to the LoopBB. - Builder.CreateBr(LoopBB); - - // Start insertion in LoopBB. - Builder.SetInsertPoint(LoopBB); - - // Start the PHI node with an entry for Start. - PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str()); - Variable->addIncoming(StartVal, PreheaderBB); - - // Within the loop, the variable is defined equal to the PHI node. If it - // shadows an existing variable, we have to restore it, so save it now. - Value *OldVal = NamedValues[VarName]; - NamedValues[VarName] = Variable; - - // Emit the body of the loop. This, like any other expr, can change the - // current BB. Note that we ignore the value computed by the body, but don't - // allow an error. - if (Body->Codegen() == 0) - return 0; - - // Emit the step value. - Value *StepVal; - if (Step) { - StepVal = Step->Codegen(); - if (StepVal == 0) return 0; - } else { - // If not specified, use 1.0. - StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); - } - - Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar"); - - // Compute the end condition. - Value *EndCond = End->Codegen(); - if (EndCond == 0) return EndCond; - - // Convert condition to a bool by comparing equal to 0.0. - EndCond = Builder.CreateFCmpONE(EndCond, - ConstantFP::get(getGlobalContext(), APFloat(0.0)), - "loopcond"); - - // Create the "after loop" block and insert it. - BasicBlock *LoopEndBB = Builder.GetInsertBlock(); - BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); - - // Insert the conditional branch into the end of LoopEndBB. - Builder.CreateCondBr(EndCond, LoopBB, AfterBB); - - // Any new code will be inserted in AfterBB. - Builder.SetInsertPoint(AfterBB); - - // Add a new entry to the PHI node for the backedge. - Variable->addIncoming(NextVar, LoopEndBB); - - // Restore the unshadowed variable. - if (OldVal) - NamedValues[VarName] = OldVal; - else - NamedValues.erase(VarName); - - - // for expr always returns 0.0. - return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); -} - -Function *PrototypeAST::Codegen() { - // Make the function type: double(double,double) etc. 
- std::vector<Type*> Doubles(Args.size(), - Type::getDoubleTy(getGlobalContext())); - FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), - Doubles, false); - - Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); - - // If F conflicted, there was already something named 'Name'. If it has a - // body, don't allow redefinition or reextern. - if (F->getName() != Name) { - // Delete the one we just made and get the existing one. - F->eraseFromParent(); - F = TheModule->getFunction(Name); - - // If F already has a body, reject this. - if (!F->empty()) { - ErrorF("redefinition of function"); - return 0; - } - - // If F took a different number of args, reject. - if (F->arg_size() != Args.size()) { - ErrorF("redefinition of function with different # args"); - return 0; - } - } - - // Set names for all arguments. - unsigned Idx = 0; - for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); - ++AI, ++Idx) { - AI->setName(Args[Idx]); - - // Add arguments to variable symbol table. - NamedValues[Args[Idx]] = AI; - } - - return F; -} - -Function *FunctionAST::Codegen() { - NamedValues.clear(); - - Function *TheFunction = Proto->Codegen(); - if (TheFunction == 0) - return 0; - - // Create a new basic block to start insertion into. - BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); - Builder.SetInsertPoint(BB); - - if (Value *RetVal = Body->Codegen()) { - // Finish off the function. - Builder.CreateRet(RetVal); - - // Validate the generated code, checking for consistency. - verifyFunction(*TheFunction); - - // Optimize the function. - TheFPM->run(*TheFunction); - - return TheFunction; - } - - // Error reading body, remove function. - TheFunction->eraseFromParent(); - return 0; -} - -//===----------------------------------------------------------------------===// -// Top-Level parsing and JIT Driver -//===----------------------------------------------------------------------===// - -static ExecutionEngine *TheExecutionEngine; - -static void HandleDefinition() { - if (FunctionAST *F = ParseDefinition()) { - if (Function *LF = F->Codegen()) { - fprintf(stderr, "Read function definition:"); - LF->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleExtern() { - if (PrototypeAST *P = ParseExtern()) { - if (Function *F = P->Codegen()) { - fprintf(stderr, "Read extern: "); - F->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleTopLevelExpression() { - // Evaluate a top-level expression into an anonymous function. - if (FunctionAST *F = ParseTopLevelExpr()) { - if (Function *LF = F->Codegen()) { - // JIT the function, returning a function pointer. - void *FPtr = TheExecutionEngine->getPointerToFunction(LF); - - // Cast it to the right type (takes no arguments, returns a double) so we - // can call it as a native function. - double (*FP)() = (double (*)())(intptr_t)FPtr; - fprintf(stderr, "Evaluated to %f\n", FP()); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -/// top ::= definition | external | expression | ';' -static void MainLoop() { - while (1) { - fprintf(stderr, "ready> "); - switch (CurTok) { - case tok_eof: return; - case ';': getNextToken(); break; // ignore top-level semicolons. 
- case tok_def: HandleDefinition(); break; - case tok_extern: HandleExtern(); break; - default: HandleTopLevelExpression(); break; - } - } -} - -//===----------------------------------------------------------------------===// -// "Library" functions that can be "extern'd" from user code. -//===----------------------------------------------------------------------===// - -/// putchard - putchar that takes a double and returns 0. -extern "C" -double putchard(double X) { - putchar((char)X); - return 0; -} - -//===----------------------------------------------------------------------===// -// Main driver code. -//===----------------------------------------------------------------------===// - -int main() { - InitializeNativeTarget(); - LLVMContext &Context = getGlobalContext(); - - // Install standard binary operators. - // 1 is lowest precedence. - BinopPrecedence['<'] = 10; - BinopPrecedence['+'] = 20; - BinopPrecedence['-'] = 20; - BinopPrecedence['*'] = 40; // highest. - - // Prime the first token. - fprintf(stderr, "ready> "); - getNextToken(); - - // Make the module, which holds all the code. - TheModule = new Module("my cool jit", Context); - - // Create the JIT. This takes ownership of the module. - std::string ErrStr; - TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); - if (!TheExecutionEngine) { - fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); - exit(1); - } - - FunctionPassManager OurFPM(TheModule); - - // Set up the optimizer pipeline. Start with registering info about how the - // target lays out data structures. - OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout())); - // Provide basic AliasAnalysis support for GVN. - OurFPM.add(createBasicAliasAnalysisPass()); - // Do simple "peephole" optimizations and bit-twiddling optzns. - OurFPM.add(createInstructionCombiningPass()); - // Reassociate expressions. - OurFPM.add(createReassociatePass()); - // Eliminate Common SubExpressions. - OurFPM.add(createGVNPass()); - // Simplify the control flow graph (deleting unreachable blocks, etc). - OurFPM.add(createCFGSimplificationPass()); - - OurFPM.doInitialization(); - - // Set the global so the code gen can use this. - TheFPM = &OurFPM; - - // Run the main "interpreter loop" now. - MainLoop(); - - TheFPM = 0; - - // Print out all of the generated code. - TheModule->dump(); - - return 0; -} -</pre> -</div> - -<a href="LangImpl6.html">Next: Extending the language: user-defined operators</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/LangImpl5.rst b/docs/tutorial/LangImpl5.rst new file mode 100644 index 0000000000..8405e1a917 --- /dev/null +++ b/docs/tutorial/LangImpl5.rst @@ -0,0 +1,1609 @@ +================================================== +Kaleidoscope: Extending the Language: Control Flow +================================================== + +.. 
contents::
+   :local:
+
+Written by `Chris Lattner <mailto:sabre@nondot.org>`_
+
+Chapter 5 Introduction
+======================
+
+Welcome to Chapter 5 of the "`Implementing a language with
+LLVM <index.html>`_" tutorial. Parts 1-4 described the implementation of
+the simple Kaleidoscope language and included support for generating
+LLVM IR, followed by optimizations and a JIT compiler. Unfortunately, as
+presented, Kaleidoscope is mostly useless: it has no control flow other
+than call and return. This means that you can't have conditional
+branches in the code, significantly limiting its power. In this episode
+of "build that compiler", we'll extend Kaleidoscope to have an
+if/then/else expression plus a simple 'for' loop.
+
+If/Then/Else
+============
+
+Extending Kaleidoscope to support if/then/else is quite straightforward.
+It basically requires adding support for this "new" concept to the
+lexer, parser, AST, and LLVM code emitter. This example is nice because
+it shows how easy it is to "grow" a language over time, incrementally
+extending it as new ideas are discovered.
+
+Before we get going on "how" we add this extension, let's talk about
+"what" we want. The basic idea is that we want to be able to write this
+sort of thing:
+
+::
+
+    def fib(x)
+      if x < 3 then
+        1
+      else
+        fib(x-1)+fib(x-2);
+
+In Kaleidoscope, every construct is an expression: there are no
+statements. As such, the if/then/else expression needs to return a value
+like any other. Since we're using a mostly functional form, we'll have
+it evaluate its conditional, then return the 'then' or 'else' value
+based on how the condition was resolved. This is very similar to the C
+"?:" expression.
+
+The semantics of the if/then/else expression is that it evaluates the
+condition to a boolean equality value: 0.0 is considered to be false and
+everything else is considered to be true. If the condition is true, the
+first subexpression is evaluated and returned; if the condition is
+false, the second subexpression is evaluated and returned. Since
+Kaleidoscope allows side-effects, this behavior is important to nail
+down.
+
+Now that we know what we "want", let's break this down into its
+constituent pieces.
+
+Lexer Extensions for If/Then/Else
+---------------------------------
+
+The lexer extensions are straightforward. First we add new enum values
+for the relevant tokens:
+
+.. code-block:: c++
+
+    // control
+    tok_if = -6, tok_then = -7, tok_else = -8,
+
+Once we have that, we recognize the new keywords in the lexer. This is
+pretty simple stuff:
+
+.. code-block:: c++
+
+    ...
+    if (IdentifierStr == "def") return tok_def;
+    if (IdentifierStr == "extern") return tok_extern;
+    if (IdentifierStr == "if") return tok_if;
+    if (IdentifierStr == "then") return tok_then;
+    if (IdentifierStr == "else") return tok_else;
+    return tok_identifier;
+
+AST Extensions for If/Then/Else
+-------------------------------
+
+To represent the new expression we add a new AST node for it:
+
+.. code-block:: c++
+
+    /// IfExprAST - Expression class for if/then/else.
+    class IfExprAST : public ExprAST {
+      ExprAST *Cond, *Then, *Else;
+    public:
+      IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
+        : Cond(cond), Then(then), Else(_else) {}
+      virtual Value *Codegen();
+    };
+
+The AST node just has pointers to the various subexpressions.
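+
+As a quick illustration (not part of the tutorial's code listing), the AST
+for an expression like "if x < 3 then 1 else 42" could be assembled by hand
+from the expression node classes used throughout the tutorial; the constant
+42 is just a placeholder value:
+
+.. code-block:: c++
+
+    // Hand-built AST for: if x < 3 then 1 else 42
+    // (illustrative only; the parser below builds these nodes for us)
+    ExprAST *Cond = new BinaryExprAST('<', new VariableExprAST("x"),
+                                      new NumberExprAST(3.0));
+    ExprAST *If = new IfExprAST(Cond, new NumberExprAST(1.0),
+                                new NumberExprAST(42.0));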
+ +Parser Extensions for If/Then/Else +---------------------------------- + +Now that we have the relevant tokens coming from the lexer and we have +the AST node to build, our parsing logic is relatively straightforward. +First we define a new parsing function: + +.. code-block:: c++ + + /// ifexpr ::= 'if' expression 'then' expression 'else' expression + static ExprAST *ParseIfExpr() { + getNextToken(); // eat the if. + + // condition. + ExprAST *Cond = ParseExpression(); + if (!Cond) return 0; + + if (CurTok != tok_then) + return Error("expected then"); + getNextToken(); // eat the then + + ExprAST *Then = ParseExpression(); + if (Then == 0) return 0; + + if (CurTok != tok_else) + return Error("expected else"); + + getNextToken(); + + ExprAST *Else = ParseExpression(); + if (!Else) return 0; + + return new IfExprAST(Cond, Then, Else); + } + +Next we hook it up as a primary expression: + +.. code-block:: c++ + + static ExprAST *ParsePrimary() { + switch (CurTok) { + default: return Error("unknown token when expecting an expression"); + case tok_identifier: return ParseIdentifierExpr(); + case tok_number: return ParseNumberExpr(); + case '(': return ParseParenExpr(); + case tok_if: return ParseIfExpr(); + } + } + +LLVM IR for If/Then/Else +------------------------ + +Now that we have it parsing and building the AST, the final piece is +adding LLVM code generation support. This is the most interesting part +of the if/then/else example, because this is where it starts to +introduce new concepts. All of the code above has been thoroughly +described in previous chapters. + +To motivate the code we want to produce, lets take a look at a simple +example. Consider: + +:: + + extern foo(); + extern bar(); + def baz(x) if x then foo() else bar(); + +If you disable optimizations, the code you'll (soon) get from +Kaleidoscope looks like this: + +.. code-block:: llvm + + declare double @foo() + + declare double @bar() + + define double @baz(double %x) { + entry: + %ifcond = fcmp one double %x, 0.000000e+00 + br i1 %ifcond, label %then, label %else + + then: ; preds = %entry + %calltmp = call double @foo() + br label %ifcont + + else: ; preds = %entry + %calltmp1 = call double @bar() + br label %ifcont + + ifcont: ; preds = %else, %then + %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ] + ret double %iftmp + } + +To visualize the control flow graph, you can use a nifty feature of the +LLVM '`opt <http://llvm.org/cmds/opt.html>`_' tool. If you put this LLVM +IR into "t.ll" and run "``llvm-as < t.ll | opt -analyze -view-cfg``", `a +window will pop up <../ProgrammersManual.html#ViewGraph>`_ and you'll +see this graph: + +.. figure:: LangImpl5-cfg.png + :align: center + :alt: Example CFG + + Example CFG + +Another way to get this is to call "``F->viewCFG()``" or +"``F->viewCFGOnly()``" (where F is a "``Function*``") either by +inserting actual calls into the code and recompiling or by calling these +in the debugger. LLVM has many nice features for visualizing various +graphs. + +Getting back to the generated code, it is fairly simple: the entry block +evaluates the conditional expression ("x" in our case here) and compares +the result to 0.0 with the "``fcmp one``" instruction ('one' is "Ordered +and Not Equal"). Based on the result of this expression, the code jumps +to either the "then" or "else" blocks, which contain the expressions for +the true/false cases. 
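+
+If the 'one' predicate is unfamiliar, a rough C++ analogy (not code from the
+tutorial itself) for what "``fcmp one %x, 0.000000e+00``" computes is:
+
+.. code-block:: c++
+
+    #include <cmath>
+
+    // "Ordered and Not Equal": true only when neither operand is a NaN and
+    // the operands compare unequal. With a constant 0.0, only x can be NaN.
+    bool ifcond(double x) { return !std::isnan(x) && x != 0.0; }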
+ +Once the then/else blocks are finished executing, they both branch back +to the 'ifcont' block to execute the code that happens after the +if/then/else. In this case the only thing left to do is to return to the +caller of the function. The question then becomes: how does the code +know which expression to return? + +The answer to this question involves an important SSA operation: the +`Phi +operation <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_. +If you're not familiar with SSA, `the wikipedia +article <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_ +is a good introduction and there are various other introductions to it +available on your favorite search engine. The short version is that +"execution" of the Phi operation requires "remembering" which block +control came from. The Phi operation takes on the value corresponding to +the input control block. In this case, if control comes in from the +"then" block, it gets the value of "calltmp". If control comes from the +"else" block, it gets the value of "calltmp1". + +At this point, you are probably starting to think "Oh no! This means my +simple and elegant front-end will have to start generating SSA form in +order to use LLVM!". Fortunately, this is not the case, and we strongly +advise *not* implementing an SSA construction algorithm in your +front-end unless there is an amazingly good reason to do so. In +practice, there are two sorts of values that float around in code +written for your average imperative programming language that might need +Phi nodes: + +#. Code that involves user variables: ``x = 1; x = x + 1;`` +#. Values that are implicit in the structure of your AST, such as the + Phi node in this case. + +In `Chapter 7 <LangImpl7.html>`_ of this tutorial ("mutable variables"), +we'll talk about #1 in depth. For now, just believe me that you don't +need SSA construction to handle this case. For #2, you have the choice +of using the techniques that we will describe for #1, or you can insert +Phi nodes directly, if convenient. In this case, it is really really +easy to generate the Phi node, so we choose to do it directly. + +Okay, enough of the motivation and overview, lets generate code! + +Code Generation for If/Then/Else +-------------------------------- + +In order to generate code for this, we implement the ``Codegen`` method +for ``IfExprAST``: + +.. code-block:: c++ + + Value *IfExprAST::Codegen() { + Value *CondV = Cond->Codegen(); + if (CondV == 0) return 0; + + // Convert condition to a bool by comparing equal to 0.0. + CondV = Builder.CreateFCmpONE(CondV, + ConstantFP::get(getGlobalContext(), APFloat(0.0)), + "ifcond"); + +This code is straightforward and similar to what we saw before. We emit +the expression for the condition, then compare that value to zero to get +a truth value as a 1-bit (bool) value. + +.. code-block:: c++ + + Function *TheFunction = Builder.GetInsertBlock()->getParent(); + + // Create blocks for the then and else cases. Insert the 'then' block at the + // end of the function. + BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); + BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); + BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); + + Builder.CreateCondBr(CondV, ThenBB, ElseBB); + +This code creates the basic blocks that are related to the if/then/else +statement, and correspond directly to the blocks in the example above. +The first line gets the current Function object that is being built. 
It +gets this by asking the builder for the current BasicBlock, and asking +that block for its "parent" (the function it is currently embedded +into). + +Once it has that, it creates three blocks. Note that it passes +"TheFunction" into the constructor for the "then" block. This causes the +constructor to automatically insert the new block into the end of the +specified function. The other two blocks are created, but aren't yet +inserted into the function. + +Once the blocks are created, we can emit the conditional branch that +chooses between them. Note that creating new blocks does not implicitly +affect the IRBuilder, so it is still inserting into the block that the +condition went into. Also note that it is creating a branch to the +"then" block and the "else" block, even though the "else" block isn't +inserted into the function yet. This is all ok: it is the standard way +that LLVM supports forward references. + +.. code-block:: c++ + + // Emit then value. + Builder.SetInsertPoint(ThenBB); + + Value *ThenV = Then->Codegen(); + if (ThenV == 0) return 0; + + Builder.CreateBr(MergeBB); + // Codegen of 'Then' can change the current block, update ThenBB for the PHI. + ThenBB = Builder.GetInsertBlock(); + +After the conditional branch is inserted, we move the builder to start +inserting into the "then" block. Strictly speaking, this call moves the +insertion point to be at the end of the specified block. However, since +the "then" block is empty, it also starts out by inserting at the +beginning of the block. :) + +Once the insertion point is set, we recursively codegen the "then" +expression from the AST. To finish off the "then" block, we create an +unconditional branch to the merge block. One interesting (and very +important) aspect of the LLVM IR is that it `requires all basic blocks +to be "terminated" <../LangRef.html#functionstructure>`_ with a `control +flow instruction <../LangRef.html#terminators>`_ such as return or +branch. This means that all control flow, *including fall throughs* must +be made explicit in the LLVM IR. If you violate this rule, the verifier +will emit an error. + +The final line here is quite subtle, but is very important. The basic +issue is that when we create the Phi node in the merge block, we need to +set up the block/value pairs that indicate how the Phi will work. +Importantly, the Phi node expects to have an entry for each predecessor +of the block in the CFG. Why then, are we getting the current block when +we just set it to ThenBB 5 lines above? The problem is that the "Then" +expression may actually itself change the block that the Builder is +emitting into if, for example, it contains a nested "if/then/else" +expression. Because calling Codegen recursively could arbitrarily change +the notion of the current block, we are required to get an up-to-date +value for code that will set up the Phi node. + +.. code-block:: c++ + + // Emit else block. + TheFunction->getBasicBlockList().push_back(ElseBB); + Builder.SetInsertPoint(ElseBB); + + Value *ElseV = Else->Codegen(); + if (ElseV == 0) return 0; + + Builder.CreateBr(MergeBB); + // Codegen of 'Else' can change the current block, update ElseBB for the PHI. + ElseBB = Builder.GetInsertBlock(); + +Code generation for the 'else' block is basically identical to codegen +for the 'then' block. The only significant difference is the first line, +which adds the 'else' block to the function. Recall previously that the +'else' block was created, but not added to the function. 
Now that the
+'then' and 'else' blocks are emitted, we can finish up with the merge
+code:
+
+.. code-block:: c++
+
+      // Emit merge block.
+      TheFunction->getBasicBlockList().push_back(MergeBB);
+      Builder.SetInsertPoint(MergeBB);
+      PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2,
+                                      "iftmp");
+
+      PN->addIncoming(ThenV, ThenBB);
+      PN->addIncoming(ElseV, ElseBB);
+      return PN;
+    }
+
+The first two lines here are now familiar: the first adds the "merge"
+block to the Function object (it was previously floating, like the else
+block above). The second changes the insertion point so that newly
+created code will go into the "merge" block. Once that is done, we need
+to create the PHI node and set up the block/value pairs for the PHI.
+
+Finally, the Codegen function returns the phi node as the value computed
+by the if/then/else expression. In our example above, this returned
+value will feed into the code for the top-level function, which will
+create the return instruction.
+
+Overall, we now have the ability to execute conditional code in
+Kaleidoscope. With this extension, Kaleidoscope is a fairly complete
+language that can calculate a wide variety of numeric functions. Next up
+we'll add another useful expression that is familiar from non-functional
+languages...
+
+'for' Loop Expression
+=====================
+
+Now that we know how to add basic control flow constructs to the
+language, we have the tools to add more powerful things. Let's add
+something more aggressive, a 'for' expression:
+
+::
+
+    extern putchard(char)
+    def printstar(n)
+      for i = 1, i < n, 1.0 in
+        putchard(42);  # ascii 42 = '*'
+
+    # print 100 '*' characters
+    printstar(100);
+
+This expression defines a new variable ("i" in this case) which iterates
+from a starting value, while the condition ("i < n" in this case) is
+true, incrementing by an optional step value ("1.0" in this case). If
+the step value is omitted, it defaults to 1.0. While the loop condition
+is true, it executes its body expression. Because we don't have anything
+better to return, we'll just define the loop as always returning 0.0. In
+the future when we have mutable variables, it will get more useful.
+
+As before, let's talk about the changes that we need to make to
+Kaleidoscope to support this.
+
+Lexer Extensions for the 'for' Loop
+-----------------------------------
+
+The lexer extensions are the same sort of thing as for if/then/else:
+
+.. code-block:: c++
+
+    ... in enum Token ...
+    // control
+    tok_if = -6, tok_then = -7, tok_else = -8,
+    tok_for = -9, tok_in = -10
+
+    ... in gettok ...
+    if (IdentifierStr == "def") return tok_def;
+    if (IdentifierStr == "extern") return tok_extern;
+    if (IdentifierStr == "if") return tok_if;
+    if (IdentifierStr == "then") return tok_then;
+    if (IdentifierStr == "else") return tok_else;
+    if (IdentifierStr == "for") return tok_for;
+    if (IdentifierStr == "in") return tok_in;
+    return tok_identifier;
+
+AST Extensions for the 'for' Loop
+---------------------------------
+
+The AST node is just as simple. It basically boils down to capturing the
+variable name and the constituent expressions in the node.
+
+.. code-block:: c++
+
+    /// ForExprAST - Expression class for for/in.
+ class ForExprAST : public ExprAST { + std::string VarName; + ExprAST *Start, *End, *Step, *Body; + public: + ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, + ExprAST *step, ExprAST *body) + : VarName(varname), Start(start), End(end), Step(step), Body(body) {} + virtual Value *Codegen(); + }; + +Parser Extensions for the 'for' Loop +------------------------------------ + +The parser code is also fairly standard. The only interesting thing here +is handling of the optional step value. The parser code handles it by +checking to see if the second comma is present. If not, it sets the step +value to null in the AST node: + +.. code-block:: c++ + + /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression + static ExprAST *ParseForExpr() { + getNextToken(); // eat the for. + + if (CurTok != tok_identifier) + return Error("expected identifier after for"); + + std::string IdName = IdentifierStr; + getNextToken(); // eat identifier. + + if (CurTok != '=') + return Error("expected '=' after for"); + getNextToken(); // eat '='. + + + ExprAST *Start = ParseExpression(); + if (Start == 0) return 0; + if (CurTok != ',') + return Error("expected ',' after for start value"); + getNextToken(); + + ExprAST *End = ParseExpression(); + if (End == 0) return 0; + + // The step value is optional. + ExprAST *Step = 0; + if (CurTok == ',') { + getNextToken(); + Step = ParseExpression(); + if (Step == 0) return 0; + } + + if (CurTok != tok_in) + return Error("expected 'in' after for"); + getNextToken(); // eat 'in'. + + ExprAST *Body = ParseExpression(); + if (Body == 0) return 0; + + return new ForExprAST(IdName, Start, End, Step, Body); + } + +LLVM IR for the 'for' Loop +-------------------------- + +Now we get to the good part: the LLVM IR we want to generate for this +thing. With the simple example above, we get this LLVM IR (note that +this dump is generated with optimizations disabled for clarity): + +.. code-block:: llvm + + declare double @putchard(double) + + define double @printstar(double %n) { + entry: + ; initial value = 1.0 (inlined into phi) + br label %loop + + loop: ; preds = %loop, %entry + %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ] + ; body + %calltmp = call double @putchard(double 4.200000e+01) + ; increment + %nextvar = fadd double %i, 1.000000e+00 + + ; termination test + %cmptmp = fcmp ult double %i, %n + %booltmp = uitofp i1 %cmptmp to double + %loopcond = fcmp one double %booltmp, 0.000000e+00 + br i1 %loopcond, label %loop, label %afterloop + + afterloop: ; preds = %loop + ; loop always returns 0.0 + ret double 0.000000e+00 + } + +This loop contains all the same constructs we saw before: a phi node, +several expressions, and some basic blocks. Lets see how this fits +together. + +Code Generation for the 'for' Loop +---------------------------------- + +The first part of Codegen is very simple: we just output the start +expression for the loop value: + +.. code-block:: c++ + + Value *ForExprAST::Codegen() { + // Emit the start code first, without 'variable' in scope. + Value *StartVal = Start->Codegen(); + if (StartVal == 0) return 0; + +With this out of the way, the next step is to set up the LLVM basic +block for the start of the loop body. In the case above, the whole loop +body is one block, but remember that the body code itself could consist +of multiple blocks (e.g. if it contains an if/then/else or a for/in +expression). + +.. 
code-block:: c++ + + // Make the new basic block for the loop header, inserting after current + // block. + Function *TheFunction = Builder.GetInsertBlock()->getParent(); + BasicBlock *PreheaderBB = Builder.GetInsertBlock(); + BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); + + // Insert an explicit fall through from the current block to the LoopBB. + Builder.CreateBr(LoopBB); + +This code is similar to what we saw for if/then/else. Because we will +need it to create the Phi node, we remember the block that falls through +into the loop. Once we have that, we create the actual block that starts +the loop and create an unconditional branch for the fall-through between +the two blocks. + +.. code-block:: c++ + + // Start insertion in LoopBB. + Builder.SetInsertPoint(LoopBB); + + // Start the PHI node with an entry for Start. + PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str()); + Variable->addIncoming(StartVal, PreheaderBB); + +Now that the "preheader" for the loop is set up, we switch to emitting +code for the loop body. To begin with, we move the insertion point and +create the PHI node for the loop induction variable. Since we already +know the incoming value for the starting value, we add it to the Phi +node. Note that the Phi will eventually get a second value for the +backedge, but we can't set it up yet (because it doesn't exist!). + +.. code-block:: c++ + + // Within the loop, the variable is defined equal to the PHI node. If it + // shadows an existing variable, we have to restore it, so save it now. + Value *OldVal = NamedValues[VarName]; + NamedValues[VarName] = Variable; + + // Emit the body of the loop. This, like any other expr, can change the + // current BB. Note that we ignore the value computed by the body, but don't + // allow an error. + if (Body->Codegen() == 0) + return 0; + +Now the code starts to get more interesting. Our 'for' loop introduces a +new variable to the symbol table. This means that our symbol table can +now contain either function arguments or loop variables. To handle this, +before we codegen the body of the loop, we add the loop variable as the +current value for its name. Note that it is possible that there is a +variable of the same name in the outer scope. It would be easy to make +this an error (emit an error and return null if there is already an +entry for VarName) but we choose to allow shadowing of variables. In +order to handle this correctly, we remember the Value that we are +potentially shadowing in ``OldVal`` (which will be null if there is no +shadowed variable). + +Once the loop variable is set into the symbol table, the code +recursively codegen's the body. This allows the body to use the loop +variable: any references to it will naturally find it in the symbol +table. + +.. code-block:: c++ + + // Emit the step value. + Value *StepVal; + if (Step) { + StepVal = Step->Codegen(); + if (StepVal == 0) return 0; + } else { + // If not specified, use 1.0. + StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); + } + + Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar"); + +Now that the body is emitted, we compute the next value of the iteration +variable by adding the step value, or 1.0 if it isn't present. +'``NextVar``' will be the value of the loop variable on the next +iteration of the loop. + +.. code-block:: c++ + + // Compute the end condition. 
+ Value *EndCond = End->Codegen(); + if (EndCond == 0) return EndCond; + + // Convert condition to a bool by comparing equal to 0.0. + EndCond = Builder.CreateFCmpONE(EndCond, + ConstantFP::get(getGlobalContext(), APFloat(0.0)), + "loopcond"); + +Finally, we evaluate the exit value of the loop, to determine whether +the loop should exit. This mirrors the condition evaluation for the +if/then/else statement. + +.. code-block:: c++ + + // Create the "after loop" block and insert it. + BasicBlock *LoopEndBB = Builder.GetInsertBlock(); + BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); + + // Insert the conditional branch into the end of LoopEndBB. + Builder.CreateCondBr(EndCond, LoopBB, AfterBB); + + // Any new code will be inserted in AfterBB. + Builder.SetInsertPoint(AfterBB); + +With the code for the body of the loop complete, we just need to finish +up the control flow for it. This code remembers the end block (for the +phi node), then creates the block for the loop exit ("afterloop"). Based +on the value of the exit condition, it creates a conditional branch that +chooses between executing the loop again and exiting the loop. Any +future code is emitted in the "afterloop" block, so it sets the +insertion position to it. + +.. code-block:: c++ + + // Add a new entry to the PHI node for the backedge. + Variable->addIncoming(NextVar, LoopEndBB); + + // Restore the unshadowed variable. + if (OldVal) + NamedValues[VarName] = OldVal; + else + NamedValues.erase(VarName); + + // for expr always returns 0.0. + return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); + } + +The final code handles various cleanups: now that we have the "NextVar" +value, we can add the incoming value to the loop PHI node. After that, +we remove the loop variable from the symbol table, so that it isn't in +scope after the for loop. Finally, code generation of the for loop +always returns 0.0, so that is what we return from +``ForExprAST::Codegen``. + +With this, we conclude the "adding control flow to Kaleidoscope" chapter +of the tutorial. In this chapter we added two control flow constructs, +and used them to motivate a couple of aspects of the LLVM IR that are +important for front-end implementors to know. In the next chapter of our +saga, we will get a bit crazier and add `user-defined +operators <LangImpl6.html>`_ to our poor innocent language. + +Full Code Listing +================= + +Here is the complete code listing for our running example, enhanced with +the if/then/else and for expressions.. To build this example, use: + +.. code-block:: bash + + # Compile + clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy + # Run + ./toy + +Here is the code: + +.. 
code-block:: c++ + + #include "llvm/DerivedTypes.h" + #include "llvm/ExecutionEngine/ExecutionEngine.h" + #include "llvm/ExecutionEngine/JIT.h" + #include "llvm/IRBuilder.h" + #include "llvm/LLVMContext.h" + #include "llvm/Module.h" + #include "llvm/PassManager.h" + #include "llvm/Analysis/Verifier.h" + #include "llvm/Analysis/Passes.h" + #include "llvm/DataLayout.h" + #include "llvm/Transforms/Scalar.h" + #include "llvm/Support/TargetSelect.h" + #include <cstdio> + #include <string> + #include <map> + #include <vector> + using namespace llvm; + + //===----------------------------------------------------------------------===// + // Lexer + //===----------------------------------------------------------------------===// + + // The lexer returns tokens [0-255] if it is an unknown character, otherwise one + // of these for known things. + enum Token { + tok_eof = -1, + + // commands + tok_def = -2, tok_extern = -3, + + // primary + tok_identifier = -4, tok_number = -5, + + // control + tok_if = -6, tok_then = -7, tok_else = -8, + tok_for = -9, tok_in = -10 + }; + + static std::string IdentifierStr; // Filled in if tok_identifier + static double NumVal; // Filled in if tok_number + + /// gettok - Return the next token from standard input. + static int gettok() { + static int LastChar = ' '; + + // Skip any whitespace. + while (isspace(LastChar)) + LastChar = getchar(); + + if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* + IdentifierStr = LastChar; + while (isalnum((LastChar = getchar()))) + IdentifierStr += LastChar; + + if (IdentifierStr == "def") return tok_def; + if (IdentifierStr == "extern") return tok_extern; + if (IdentifierStr == "if") return tok_if; + if (IdentifierStr == "then") return tok_then; + if (IdentifierStr == "else") return tok_else; + if (IdentifierStr == "for") return tok_for; + if (IdentifierStr == "in") return tok_in; + return tok_identifier; + } + + if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ + std::string NumStr; + do { + NumStr += LastChar; + LastChar = getchar(); + } while (isdigit(LastChar) || LastChar == '.'); + + NumVal = strtod(NumStr.c_str(), 0); + return tok_number; + } + + if (LastChar == '#') { + // Comment until end of line. + do LastChar = getchar(); + while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); + + if (LastChar != EOF) + return gettok(); + } + + // Check for end of file. Don't eat the EOF. + if (LastChar == EOF) + return tok_eof; + + // Otherwise, just return the character as its ascii value. + int ThisChar = LastChar; + LastChar = getchar(); + return ThisChar; + } + + //===----------------------------------------------------------------------===// + // Abstract Syntax Tree (aka Parse Tree) + //===----------------------------------------------------------------------===// + + /// ExprAST - Base class for all expression nodes. + class ExprAST { + public: + virtual ~ExprAST() {} + virtual Value *Codegen() = 0; + }; + + /// NumberExprAST - Expression class for numeric literals like "1.0". + class NumberExprAST : public ExprAST { + double Val; + public: + NumberExprAST(double val) : Val(val) {} + virtual Value *Codegen(); + }; + + /// VariableExprAST - Expression class for referencing a variable, like "a". + class VariableExprAST : public ExprAST { + std::string Name; + public: + VariableExprAST(const std::string &name) : Name(name) {} + virtual Value *Codegen(); + }; + + /// BinaryExprAST - Expression class for a binary operator. 
+ class BinaryExprAST : public ExprAST { + char Op; + ExprAST *LHS, *RHS; + public: + BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) + : Op(op), LHS(lhs), RHS(rhs) {} + virtual Value *Codegen(); + }; + + /// CallExprAST - Expression class for function calls. + class CallExprAST : public ExprAST { + std::string Callee; + std::vector<ExprAST*> Args; + public: + CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) + : Callee(callee), Args(args) {} + virtual Value *Codegen(); + }; + + /// IfExprAST - Expression class for if/then/else. + class IfExprAST : public ExprAST { + ExprAST *Cond, *Then, *Else; + public: + IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) + : Cond(cond), Then(then), Else(_else) {} + virtual Value *Codegen(); + }; + + /// ForExprAST - Expression class for for/in. + class ForExprAST : public ExprAST { + std::string VarName; + ExprAST *Start, *End, *Step, *Body; + public: + ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, + ExprAST *step, ExprAST *body) + : VarName(varname), Start(start), End(end), Step(step), Body(body) {} + virtual Value *Codegen(); + }; + + /// PrototypeAST - This class represents the "prototype" for a function, + /// which captures its name, and its argument names (thus implicitly the number + /// of arguments the function takes). + class PrototypeAST { + std::string Name; + std::vector<std::string> Args; + public: + PrototypeAST(const std::string &name, const std::vector<std::string> &args) + : Name(name), Args(args) {} + + Function *Codegen(); + }; + + /// FunctionAST - This class represents a function definition itself. + class FunctionAST { + PrototypeAST *Proto; + ExprAST *Body; + public: + FunctionAST(PrototypeAST *proto, ExprAST *body) + : Proto(proto), Body(body) {} + + Function *Codegen(); + }; + + //===----------------------------------------------------------------------===// + // Parser + //===----------------------------------------------------------------------===// + + /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current + /// token the parser is looking at. getNextToken reads another token from the + /// lexer and updates CurTok with its results. + static int CurTok; + static int getNextToken() { + return CurTok = gettok(); + } + + /// BinopPrecedence - This holds the precedence for each binary operator that is + /// defined. + static std::map<char, int> BinopPrecedence; + + /// GetTokPrecedence - Get the precedence of the pending binary operator token. + static int GetTokPrecedence() { + if (!isascii(CurTok)) + return -1; + + // Make sure it's a declared binop. + int TokPrec = BinopPrecedence[CurTok]; + if (TokPrec <= 0) return -1; + return TokPrec; + } + + /// Error* - These are little helper functions for error handling. + ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} + PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } + FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } + + static ExprAST *ParseExpression(); + + /// identifierexpr + /// ::= identifier + /// ::= identifier '(' expression* ')' + static ExprAST *ParseIdentifierExpr() { + std::string IdName = IdentifierStr; + + getNextToken(); // eat identifier. + + if (CurTok != '(') // Simple variable ref. + return new VariableExprAST(IdName); + + // Call. 
+ getNextToken(); // eat ( + std::vector<ExprAST*> Args; + if (CurTok != ')') { + while (1) { + ExprAST *Arg = ParseExpression(); + if (!Arg) return 0; + Args.push_back(Arg); + + if (CurTok == ')') break; + + if (CurTok != ',') + return Error("Expected ')' or ',' in argument list"); + getNextToken(); + } + } + + // Eat the ')'. + getNextToken(); + + return new CallExprAST(IdName, Args); + } + + /// numberexpr ::= number + static ExprAST *ParseNumberExpr() { + ExprAST *Result = new NumberExprAST(NumVal); + getNextToken(); // consume the number + return Result; + } + + /// parenexpr ::= '(' expression ')' + static ExprAST *ParseParenExpr() { + getNextToken(); // eat (. + ExprAST *V = ParseExpression(); + if (!V) return 0; + + if (CurTok != ')') + return Error("expected ')'"); + getNextToken(); // eat ). + return V; + } + + /// ifexpr ::= 'if' expression 'then' expression 'else' expression + static ExprAST *ParseIfExpr() { + getNextToken(); // eat the if. + + // condition. + ExprAST *Cond = ParseExpression(); + if (!Cond) return 0; + + if (CurTok != tok_then) + return Error("expected then"); + getNextToken(); // eat the then + + ExprAST *Then = ParseExpression(); + if (Then == 0) return 0; + + if (CurTok != tok_else) + return Error("expected else"); + + getNextToken(); + + ExprAST *Else = ParseExpression(); + if (!Else) return 0; + + return new IfExprAST(Cond, Then, Else); + } + + /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression + static ExprAST *ParseForExpr() { + getNextToken(); // eat the for. + + if (CurTok != tok_identifier) + return Error("expected identifier after for"); + + std::string IdName = IdentifierStr; + getNextToken(); // eat identifier. + + if (CurTok != '=') + return Error("expected '=' after for"); + getNextToken(); // eat '='. + + + ExprAST *Start = ParseExpression(); + if (Start == 0) return 0; + if (CurTok != ',') + return Error("expected ',' after for start value"); + getNextToken(); + + ExprAST *End = ParseExpression(); + if (End == 0) return 0; + + // The step value is optional. + ExprAST *Step = 0; + if (CurTok == ',') { + getNextToken(); + Step = ParseExpression(); + if (Step == 0) return 0; + } + + if (CurTok != tok_in) + return Error("expected 'in' after for"); + getNextToken(); // eat 'in'. + + ExprAST *Body = ParseExpression(); + if (Body == 0) return 0; + + return new ForExprAST(IdName, Start, End, Step, Body); + } + + /// primary + /// ::= identifierexpr + /// ::= numberexpr + /// ::= parenexpr + /// ::= ifexpr + /// ::= forexpr + static ExprAST *ParsePrimary() { + switch (CurTok) { + default: return Error("unknown token when expecting an expression"); + case tok_identifier: return ParseIdentifierExpr(); + case tok_number: return ParseNumberExpr(); + case '(': return ParseParenExpr(); + case tok_if: return ParseIfExpr(); + case tok_for: return ParseForExpr(); + } + } + + /// binoprhs + /// ::= ('+' primary)* + static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { + // If this is a binop, find its precedence. + while (1) { + int TokPrec = GetTokPrecedence(); + + // If this is a binop that binds at least as tightly as the current binop, + // consume it, otherwise we are done. + if (TokPrec < ExprPrec) + return LHS; + + // Okay, we know this is a binop. + int BinOp = CurTok; + getNextToken(); // eat binop + + // Parse the primary expression after the binary operator. 
+ ExprAST *RHS = ParsePrimary(); + if (!RHS) return 0; + + // If BinOp binds less tightly with RHS than the operator after RHS, let + // the pending operator take RHS as its LHS. + int NextPrec = GetTokPrecedence(); + if (TokPrec < NextPrec) { + RHS = ParseBinOpRHS(TokPrec+1, RHS); + if (RHS == 0) return 0; + } + + // Merge LHS/RHS. + LHS = new BinaryExprAST(BinOp, LHS, RHS); + } + } + + /// expression + /// ::= primary binoprhs + /// + static ExprAST *ParseExpression() { + ExprAST *LHS = ParsePrimary(); + if (!LHS) return 0; + + return ParseBinOpRHS(0, LHS); + } + + /// prototype + /// ::= id '(' id* ')' + static PrototypeAST *ParsePrototype() { + if (CurTok != tok_identifier) + return ErrorP("Expected function name in prototype"); + + std::string FnName = IdentifierStr; + getNextToken(); + + if (CurTok != '(') + return ErrorP("Expected '(' in prototype"); + + std::vector<std::string> ArgNames; + while (getNextToken() == tok_identifier) + ArgNames.push_back(IdentifierStr); + if (CurTok != ')') + return ErrorP("Expected ')' in prototype"); + + // success. + getNextToken(); // eat ')'. + + return new PrototypeAST(FnName, ArgNames); + } + + /// definition ::= 'def' prototype expression + static FunctionAST *ParseDefinition() { + getNextToken(); // eat def. + PrototypeAST *Proto = ParsePrototype(); + if (Proto == 0) return 0; + + if (ExprAST *E = ParseExpression()) + return new FunctionAST(Proto, E); + return 0; + } + + /// toplevelexpr ::= expression + static FunctionAST *ParseTopLevelExpr() { + if (ExprAST *E = ParseExpression()) { + // Make an anonymous proto. + PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); + return new FunctionAST(Proto, E); + } + return 0; + } + + /// external ::= 'extern' prototype + static PrototypeAST *ParseExtern() { + getNextToken(); // eat extern. + return ParsePrototype(); + } + + //===----------------------------------------------------------------------===// + // Code Generation + //===----------------------------------------------------------------------===// + + static Module *TheModule; + static IRBuilder<> Builder(getGlobalContext()); + static std::map<std::string, Value*> NamedValues; + static FunctionPassManager *TheFPM; + + Value *ErrorV(const char *Str) { Error(Str); return 0; } + + Value *NumberExprAST::Codegen() { + return ConstantFP::get(getGlobalContext(), APFloat(Val)); + } + + Value *VariableExprAST::Codegen() { + // Look this variable up in the function. + Value *V = NamedValues[Name]; + return V ? V : ErrorV("Unknown variable name"); + } + + Value *BinaryExprAST::Codegen() { + Value *L = LHS->Codegen(); + Value *R = RHS->Codegen(); + if (L == 0 || R == 0) return 0; + + switch (Op) { + case '+': return Builder.CreateFAdd(L, R, "addtmp"); + case '-': return Builder.CreateFSub(L, R, "subtmp"); + case '*': return Builder.CreateFMul(L, R, "multmp"); + case '<': + L = Builder.CreateFCmpULT(L, R, "cmptmp"); + // Convert bool 0/1 to double 0.0 or 1.0 + return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), + "booltmp"); + default: return ErrorV("invalid binary operator"); + } + } + + Value *CallExprAST::Codegen() { + // Look up the name in the global module table. + Function *CalleeF = TheModule->getFunction(Callee); + if (CalleeF == 0) + return ErrorV("Unknown function referenced"); + + // If argument mismatch error. 
+ if (CalleeF->arg_size() != Args.size()) + return ErrorV("Incorrect # arguments passed"); + + std::vector<Value*> ArgsV; + for (unsigned i = 0, e = Args.size(); i != e; ++i) { + ArgsV.push_back(Args[i]->Codegen()); + if (ArgsV.back() == 0) return 0; + } + + return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); + } + + Value *IfExprAST::Codegen() { + Value *CondV = Cond->Codegen(); + if (CondV == 0) return 0; + + // Convert condition to a bool by comparing equal to 0.0. + CondV = Builder.CreateFCmpONE(CondV, + ConstantFP::get(getGlobalContext(), APFloat(0.0)), + "ifcond"); + + Function *TheFunction = Builder.GetInsertBlock()->getParent(); + + // Create blocks for the then and else cases. Insert the 'then' block at the + // end of the function. + BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); + BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); + BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); + + Builder.CreateCondBr(CondV, ThenBB, ElseBB); + + // Emit then value. + Builder.SetInsertPoint(ThenBB); + + Value *ThenV = Then->Codegen(); + if (ThenV == 0) return 0; + + Builder.CreateBr(MergeBB); + // Codegen of 'Then' can change the current block, update ThenBB for the PHI. + ThenBB = Builder.GetInsertBlock(); + + // Emit else block. + TheFunction->getBasicBlockList().push_back(ElseBB); + Builder.SetInsertPoint(ElseBB); + + Value *ElseV = Else->Codegen(); + if (ElseV == 0) return 0; + + Builder.CreateBr(MergeBB); + // Codegen of 'Else' can change the current block, update ElseBB for the PHI. + ElseBB = Builder.GetInsertBlock(); + + // Emit merge block. + TheFunction->getBasicBlockList().push_back(MergeBB); + Builder.SetInsertPoint(MergeBB); + PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, + "iftmp"); + + PN->addIncoming(ThenV, ThenBB); + PN->addIncoming(ElseV, ElseBB); + return PN; + } + + Value *ForExprAST::Codegen() { + // Output this as: + // ... + // start = startexpr + // goto loop + // loop: + // variable = phi [start, loopheader], [nextvariable, loopend] + // ... + // bodyexpr + // ... + // loopend: + // step = stepexpr + // nextvariable = variable + step + // endcond = endexpr + // br endcond, loop, endloop + // outloop: + + // Emit the start code first, without 'variable' in scope. + Value *StartVal = Start->Codegen(); + if (StartVal == 0) return 0; + + // Make the new basic block for the loop header, inserting after current + // block. + Function *TheFunction = Builder.GetInsertBlock()->getParent(); + BasicBlock *PreheaderBB = Builder.GetInsertBlock(); + BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); + + // Insert an explicit fall through from the current block to the LoopBB. + Builder.CreateBr(LoopBB); + + // Start insertion in LoopBB. + Builder.SetInsertPoint(LoopBB); + + // Start the PHI node with an entry for Start. + PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str()); + Variable->addIncoming(StartVal, PreheaderBB); + + // Within the loop, the variable is defined equal to the PHI node. If it + // shadows an existing variable, we have to restore it, so save it now. + Value *OldVal = NamedValues[VarName]; + NamedValues[VarName] = Variable; + + // Emit the body of the loop. This, like any other expr, can change the + // current BB. Note that we ignore the value computed by the body, but don't + // allow an error. + if (Body->Codegen() == 0) + return 0; + + // Emit the step value. 
+ Value *StepVal; + if (Step) { + StepVal = Step->Codegen(); + if (StepVal == 0) return 0; + } else { + // If not specified, use 1.0. + StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); + } + + Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar"); + + // Compute the end condition. + Value *EndCond = End->Codegen(); + if (EndCond == 0) return EndCond; + + // Convert condition to a bool by comparing equal to 0.0. + EndCond = Builder.CreateFCmpONE(EndCond, + ConstantFP::get(getGlobalContext(), APFloat(0.0)), + "loopcond"); + + // Create the "after loop" block and insert it. + BasicBlock *LoopEndBB = Builder.GetInsertBlock(); + BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); + + // Insert the conditional branch into the end of LoopEndBB. + Builder.CreateCondBr(EndCond, LoopBB, AfterBB); + + // Any new code will be inserted in AfterBB. + Builder.SetInsertPoint(AfterBB); + + // Add a new entry to the PHI node for the backedge. + Variable->addIncoming(NextVar, LoopEndBB); + + // Restore the unshadowed variable. + if (OldVal) + NamedValues[VarName] = OldVal; + else + NamedValues.erase(VarName); + + + // for expr always returns 0.0. + return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); + } + + Function *PrototypeAST::Codegen() { + // Make the function type: double(double,double) etc. + std::vector<Type*> Doubles(Args.size(), + Type::getDoubleTy(getGlobalContext())); + FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), + Doubles, false); + + Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); + + // If F conflicted, there was already something named 'Name'. If it has a + // body, don't allow redefinition or reextern. + if (F->getName() != Name) { + // Delete the one we just made and get the existing one. + F->eraseFromParent(); + F = TheModule->getFunction(Name); + + // If F already has a body, reject this. + if (!F->empty()) { + ErrorF("redefinition of function"); + return 0; + } + + // If F took a different number of args, reject. + if (F->arg_size() != Args.size()) { + ErrorF("redefinition of function with different # args"); + return 0; + } + } + + // Set names for all arguments. + unsigned Idx = 0; + for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); + ++AI, ++Idx) { + AI->setName(Args[Idx]); + + // Add arguments to variable symbol table. + NamedValues[Args[Idx]] = AI; + } + + return F; + } + + Function *FunctionAST::Codegen() { + NamedValues.clear(); + + Function *TheFunction = Proto->Codegen(); + if (TheFunction == 0) + return 0; + + // Create a new basic block to start insertion into. + BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); + Builder.SetInsertPoint(BB); + + if (Value *RetVal = Body->Codegen()) { + // Finish off the function. + Builder.CreateRet(RetVal); + + // Validate the generated code, checking for consistency. + verifyFunction(*TheFunction); + + // Optimize the function. + TheFPM->run(*TheFunction); + + return TheFunction; + } + + // Error reading body, remove function. 
+ TheFunction->eraseFromParent(); + return 0; + } + + //===----------------------------------------------------------------------===// + // Top-Level parsing and JIT Driver + //===----------------------------------------------------------------------===// + + static ExecutionEngine *TheExecutionEngine; + + static void HandleDefinition() { + if (FunctionAST *F = ParseDefinition()) { + if (Function *LF = F->Codegen()) { + fprintf(stderr, "Read function definition:"); + LF->dump(); + } + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + static void HandleExtern() { + if (PrototypeAST *P = ParseExtern()) { + if (Function *F = P->Codegen()) { + fprintf(stderr, "Read extern: "); + F->dump(); + } + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + static void HandleTopLevelExpression() { + // Evaluate a top-level expression into an anonymous function. + if (FunctionAST *F = ParseTopLevelExpr()) { + if (Function *LF = F->Codegen()) { + // JIT the function, returning a function pointer. + void *FPtr = TheExecutionEngine->getPointerToFunction(LF); + + // Cast it to the right type (takes no arguments, returns a double) so we + // can call it as a native function. + double (*FP)() = (double (*)())(intptr_t)FPtr; + fprintf(stderr, "Evaluated to %f\n", FP()); + } + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + /// top ::= definition | external | expression | ';' + static void MainLoop() { + while (1) { + fprintf(stderr, "ready> "); + switch (CurTok) { + case tok_eof: return; + case ';': getNextToken(); break; // ignore top-level semicolons. + case tok_def: HandleDefinition(); break; + case tok_extern: HandleExtern(); break; + default: HandleTopLevelExpression(); break; + } + } + } + + //===----------------------------------------------------------------------===// + // "Library" functions that can be "extern'd" from user code. + //===----------------------------------------------------------------------===// + + /// putchard - putchar that takes a double and returns 0. + extern "C" + double putchard(double X) { + putchar((char)X); + return 0; + } + + //===----------------------------------------------------------------------===// + // Main driver code. + //===----------------------------------------------------------------------===// + + int main() { + InitializeNativeTarget(); + LLVMContext &Context = getGlobalContext(); + + // Install standard binary operators. + // 1 is lowest precedence. + BinopPrecedence['<'] = 10; + BinopPrecedence['+'] = 20; + BinopPrecedence['-'] = 20; + BinopPrecedence['*'] = 40; // highest. + + // Prime the first token. + fprintf(stderr, "ready> "); + getNextToken(); + + // Make the module, which holds all the code. + TheModule = new Module("my cool jit", Context); + + // Create the JIT. This takes ownership of the module. + std::string ErrStr; + TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); + if (!TheExecutionEngine) { + fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); + exit(1); + } + + FunctionPassManager OurFPM(TheModule); + + // Set up the optimizer pipeline. Start with registering info about how the + // target lays out data structures. + OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout())); + // Provide basic AliasAnalysis support for GVN. + OurFPM.add(createBasicAliasAnalysisPass()); + // Do simple "peephole" optimizations and bit-twiddling optzns. 
+ OurFPM.add(createInstructionCombiningPass()); + // Reassociate expressions. + OurFPM.add(createReassociatePass()); + // Eliminate Common SubExpressions. + OurFPM.add(createGVNPass()); + // Simplify the control flow graph (deleting unreachable blocks, etc). + OurFPM.add(createCFGSimplificationPass()); + + OurFPM.doInitialization(); + + // Set the global so the code gen can use this. + TheFPM = &OurFPM; + + // Run the main "interpreter loop" now. + MainLoop(); + + TheFPM = 0; + + // Print out all of the generated code. + TheModule->dump(); + + return 0; + } + +`Next: Extending the language: user-defined operators <LangImpl6.html>`_ + diff --git a/docs/tutorial/LangImpl6.html b/docs/tutorial/LangImpl6.html deleted file mode 100644 index bf502e7da9..0000000000 --- a/docs/tutorial/LangImpl6.html +++ /dev/null @@ -1,1829 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Extending the Language: User-defined Operators</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <link rel="stylesheet" href="../_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>Kaleidoscope: Extending the Language: User-defined Operators</h1> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 6 - <ol> - <li><a href="#intro">Chapter 6 Introduction</a></li> - <li><a href="#idea">User-defined Operators: the Idea</a></li> - <li><a href="#binary">User-defined Binary Operators</a></li> - <li><a href="#unary">User-defined Unary Operators</a></li> - <li><a href="#example">Kicking the Tires</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="LangImpl7.html">Chapter 7</a>: Extending the Language: Mutable -Variables / SSA Construction</li> -</ul> - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="intro">Chapter 6 Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Welcome to Chapter 6 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. At this point in our tutorial, we now have a fully -functional language that is fairly minimal, but also useful. There -is still one big problem with it, however. Our language doesn't have many -useful operators (like division, logical negation, or even any comparisons -besides less-than).</p> - -<p>This chapter of the tutorial takes a wild digression into adding user-defined -operators to the simple and beautiful Kaleidoscope language. This digression now gives -us a simple and ugly language in some ways, but also a powerful one at the same time. -One of the great things about creating your own language is that you get to -decide what is good or bad. In this tutorial we'll assume that it is okay to -use this as a way to show some interesting parsing techniques.</p> - -<p>At the end of this tutorial, we'll run through an example Kaleidoscope -application that <a href="#example">renders the Mandelbrot set</a>. 
This gives -an example of what you can build with Kaleidoscope and its feature set.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="idea">User-defined Operators: the Idea</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -The "operator overloading" that we will add to Kaleidoscope is more general than -languages like C++. In C++, you are only allowed to redefine existing -operators: you can't programatically change the grammar, introduce new -operators, change precedence levels, etc. In this chapter, we will add this -capability to Kaleidoscope, which will let the user round out the set of -operators that are supported.</p> - -<p>The point of going into user-defined operators in a tutorial like this is to -show the power and flexibility of using a hand-written parser. Thus far, the parser -we have been implementing uses recursive descent for most parts of the grammar and -operator precedence parsing for the expressions. See <a -href="LangImpl2.html">Chapter 2</a> for details. Without using operator -precedence parsing, it would be very difficult to allow the programmer to -introduce new operators into the grammar: the grammar is dynamically extensible -as the JIT runs.</p> - -<p>The two specific features we'll add are programmable unary operators (right -now, Kaleidoscope has no unary operators at all) as well as binary operators. -An example of this is:</p> - -<div class="doc_code"> -<pre> -# Logical unary not. -def unary!(v) - if v then - 0 - else - 1; - -# Define > with the same precedence as <. -def binary> 10 (LHS RHS) - RHS < LHS; - -# Binary "logical or", (note that it does not "short circuit") -def binary| 5 (LHS RHS) - if LHS then - 1 - else if RHS then - 1 - else - 0; - -# Define = with slightly lower precedence than relationals. -def binary= 9 (LHS RHS) - !(LHS < RHS | LHS > RHS); -</pre> -</div> - -<p>Many languages aspire to being able to implement their standard runtime -library in the language itself. In Kaleidoscope, we can implement significant -parts of the language in the library!</p> - -<p>We will break down implementation of these features into two parts: -implementing support for user-defined binary operators and adding unary -operators.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="binary">User-defined Binary Operators</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Adding support for user-defined binary operators is pretty simple with our -current framework. We'll first add support for the unary/binary keywords:</p> - -<div class="doc_code"> -<pre> -enum Token { - ... - <b>// operators - tok_binary = -11, tok_unary = -12</b> -}; -... -static int gettok() { -... - if (IdentifierStr == "for") return tok_for; - if (IdentifierStr == "in") return tok_in; - <b>if (IdentifierStr == "binary") return tok_binary; - if (IdentifierStr == "unary") return tok_unary;</b> - return tok_identifier; -</pre> -</div> - -<p>This just adds lexer support for the unary and binary keywords, like we -did in <a href="LangImpl5.html#iflexer">previous chapters</a>. One nice thing -about our current AST, is that we represent binary operators with full generalisation -by using their ASCII code as the opcode. 
For our extended operators, we'll use this -same representation, so we don't need any new AST or parser support.</p> - -<p>On the other hand, we have to be able to represent the definitions of these -new operators, in the "def binary| 5" part of the function definition. In our -grammar so far, the "name" for the function definition is parsed as the -"prototype" production and into the <tt>PrototypeAST</tt> AST node. To -represent our new user-defined operators as prototypes, we have to extend -the <tt>PrototypeAST</tt> AST node like this:</p> - -<div class="doc_code"> -<pre> -/// PrototypeAST - This class represents the "prototype" for a function, -/// which captures its argument names as well as if it is an operator. -class PrototypeAST { - std::string Name; - std::vector<std::string> Args; - <b>bool isOperator; - unsigned Precedence; // Precedence if a binary op.</b> -public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args, - <b>bool isoperator = false, unsigned prec = 0</b>) - : Name(name), Args(args), <b>isOperator(isoperator), Precedence(prec)</b> {} - - <b>bool isUnaryOp() const { return isOperator && Args.size() == 1; } - bool isBinaryOp() const { return isOperator && Args.size() == 2; } - - char getOperatorName() const { - assert(isUnaryOp() || isBinaryOp()); - return Name[Name.size()-1]; - } - - unsigned getBinaryPrecedence() const { return Precedence; }</b> - - Function *Codegen(); -}; -</pre> -</div> - -<p>Basically, in addition to knowing a name for the prototype, we now keep track -of whether it was an operator, and if it was, what precedence level the operator -is at. The precedence is only used for binary operators (as you'll see below, -it just doesn't apply for unary operators). Now that we have a way to represent -the prototype for a user-defined operator, we need to parse it:</p> - -<div class="doc_code"> -<pre> -/// prototype -/// ::= id '(' id* ')' -<b>/// ::= binary LETTER number? (id, id)</b> -static PrototypeAST *ParsePrototype() { - std::string FnName; - - <b>unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. - unsigned BinaryPrecedence = 30;</b> - - switch (CurTok) { - default: - return ErrorP("Expected function name in prototype"); - case tok_identifier: - FnName = IdentifierStr; - Kind = 0; - getNextToken(); - break; - <b>case tok_binary: - getNextToken(); - if (!isascii(CurTok)) - return ErrorP("Expected binary operator"); - FnName = "binary"; - FnName += (char)CurTok; - Kind = 2; - getNextToken(); - - // Read the precedence if present. - if (CurTok == tok_number) { - if (NumVal < 1 || NumVal > 100) - return ErrorP("Invalid precedecnce: must be 1..100"); - BinaryPrecedence = (unsigned)NumVal; - getNextToken(); - } - break;</b> - } - - if (CurTok != '(') - return ErrorP("Expected '(' in prototype"); - - std::vector<std::string> ArgNames; - while (getNextToken() == tok_identifier) - ArgNames.push_back(IdentifierStr); - if (CurTok != ')') - return ErrorP("Expected ')' in prototype"); - - // success. - getNextToken(); // eat ')'. - - <b>// Verify right number of names for operator. - if (Kind && ArgNames.size() != Kind) - return ErrorP("Invalid number of operands for operator"); - - return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);</b> -} -</pre> -</div> - -<p>This is all fairly straightforward parsing code, and we have already seen -a lot of similar code in the past. One interesting part about the code above is -the couple lines that set up <tt>FnName</tt> for binary operators. 
This builds names -like "binary@" for a newly defined "@" operator. This then takes advantage of the -fact that symbol names in the LLVM symbol table are allowed to have any character in -them, including embedded nul characters.</p> - -<p>The next interesting thing to add, is codegen support for these binary operators. -Given our current structure, this is a simple addition of a default case for our -existing binary operator node:</p> - -<div class="doc_code"> -<pre> -Value *BinaryExprAST::Codegen() { - Value *L = LHS->Codegen(); - Value *R = RHS->Codegen(); - if (L == 0 || R == 0) return 0; - - switch (Op) { - case '+': return Builder.CreateFAdd(L, R, "addtmp"); - case '-': return Builder.CreateFSub(L, R, "subtmp"); - case '*': return Builder.CreateFMul(L, R, "multmp"); - case '<': - L = Builder.CreateFCmpULT(L, R, "cmptmp"); - // Convert bool 0/1 to double 0.0 or 1.0 - return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), - "booltmp"); - <b>default: break;</b> - } - - <b>// If it wasn't a builtin binary operator, it must be a user defined one. Emit - // a call to it. - Function *F = TheModule->getFunction(std::string("binary")+Op); - assert(F && "binary operator not found!"); - - Value *Ops[2] = { L, R }; - return Builder.CreateCall(F, Ops, "binop");</b> -} - -</pre> -</div> - -<p>As you can see above, the new code is actually really simple. It just does -a lookup for the appropriate operator in the symbol table and generates a -function call to it. Since user-defined operators are just built as normal -functions (because the "prototype" boils down to a function with the right -name) everything falls into place.</p> - -<p>The final piece of code we are missing, is a bit of top-level magic:</p> - -<div class="doc_code"> -<pre> -Function *FunctionAST::Codegen() { - NamedValues.clear(); - - Function *TheFunction = Proto->Codegen(); - if (TheFunction == 0) - return 0; - - <b>// If this is an operator, install it. - if (Proto->isBinaryOp()) - BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();</b> - - // Create a new basic block to start insertion into. - BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); - Builder.SetInsertPoint(BB); - - if (Value *RetVal = Body->Codegen()) { - ... -</pre> -</div> - -<p>Basically, before codegening a function, if it is a user-defined operator, we -register it in the precedence table. This allows the binary operator parsing -logic we already have in place to handle it. Since we are working on a fully-general operator precedence parser, this is all we need to do to "extend the grammar".</p> - -<p>Now we have useful user-defined binary operators. This builds a lot -on the previous framework we built for other operators. Adding unary operators -is a bit more challenging, because we don't have any framework for it yet - lets -see what it takes.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="unary">User-defined Unary Operators</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Since we don't currently support unary operators in the Kaleidoscope -language, we'll need to add everything to support them. Above, we added simple -support for the 'unary' keyword to the lexer. In addition to that, we need an -AST node:</p> - -<div class="doc_code"> -<pre> -/// UnaryExprAST - Expression class for a unary operator. 
-class UnaryExprAST : public ExprAST { - char Opcode; - ExprAST *Operand; -public: - UnaryExprAST(char opcode, ExprAST *operand) - : Opcode(opcode), Operand(operand) {} - virtual Value *Codegen(); -}; -</pre> -</div> - -<p>This AST node is very simple and obvious by now. It directly mirrors the -binary operator AST node, except that it only has one child. With this, we -need to add the parsing logic. Parsing a unary operator is pretty simple: we'll -add a new function to do it:</p> - -<div class="doc_code"> -<pre> -/// unary -/// ::= primary -/// ::= '!' unary -static ExprAST *ParseUnary() { - // If the current token is not an operator, it must be a primary expr. - if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') - return ParsePrimary(); - - // If this is a unary operator, read it. - int Opc = CurTok; - getNextToken(); - if (ExprAST *Operand = ParseUnary()) - return new UnaryExprAST(Opc, Operand); - return 0; -} -</pre> -</div> - -<p>The grammar we add is pretty straightforward here. If we see a unary -operator when parsing a primary operator, we eat the operator as a prefix and -parse the remaining piece as another unary operator. This allows us to handle -multiple unary operators (e.g. "!!x"). Note that unary operators can't have -ambiguous parses like binary operators can, so there is no need for precedence -information.</p> - -<p>The problem with this function, is that we need to call ParseUnary from somewhere. -To do this, we change previous callers of ParsePrimary to call ParseUnary -instead:</p> - -<div class="doc_code"> -<pre> -/// binoprhs -/// ::= ('+' unary)* -static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { - ... - <b>// Parse the unary expression after the binary operator. - ExprAST *RHS = ParseUnary(); - if (!RHS) return 0;</b> - ... -} -/// expression -/// ::= unary binoprhs -/// -static ExprAST *ParseExpression() { - <b>ExprAST *LHS = ParseUnary();</b> - if (!LHS) return 0; - - return ParseBinOpRHS(0, LHS); -} -</pre> -</div> - -<p>With these two simple changes, we are now able to parse unary operators and build the -AST for them. Next up, we need to add parser support for prototypes, to parse -the unary operator prototype. We extend the binary operator code above -with:</p> - -<div class="doc_code"> -<pre> -/// prototype -/// ::= id '(' id* ')' -/// ::= binary LETTER number? (id, id) -<b>/// ::= unary LETTER (id)</b> -static PrototypeAST *ParsePrototype() { - std::string FnName; - - unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. - unsigned BinaryPrecedence = 30; - - switch (CurTok) { - default: - return ErrorP("Expected function name in prototype"); - case tok_identifier: - FnName = IdentifierStr; - Kind = 0; - getNextToken(); - break; - <b>case tok_unary: - getNextToken(); - if (!isascii(CurTok)) - return ErrorP("Expected unary operator"); - FnName = "unary"; - FnName += (char)CurTok; - Kind = 1; - getNextToken(); - break;</b> - case tok_binary: - ... -</pre> -</div> - -<p>As with binary operators, we name unary operators with a name that includes -the operator character. This assists us at code generation time. Speaking of, -the final piece we need to add is codegen support for unary operators. 
It looks -like this:</p> - -<div class="doc_code"> -<pre> -Value *UnaryExprAST::Codegen() { - Value *OperandV = Operand->Codegen(); - if (OperandV == 0) return 0; - - Function *F = TheModule->getFunction(std::string("unary")+Opcode); - if (F == 0) - return ErrorV("Unknown unary operator"); - - return Builder.CreateCall(F, OperandV, "unop"); -} -</pre> -</div> - -<p>This code is similar to, but simpler than, the code for binary operators. It -is simpler primarily because it doesn't need to handle any predefined operators. -</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="example">Kicking the Tires</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>It is somewhat hard to believe, but with a few simple extensions we've -covered in the last chapters, we have grown a real-ish language. With this, we -can do a lot of interesting things, including I/O, math, and a bunch of other -things. For example, we can now add a nice sequencing operator (printd is -defined to print out the specified value and a newline):</p> - -<div class="doc_code"> -<pre> -ready> <b>extern printd(x);</b> -Read extern: -declare double @printd(double) - -ready> <b>def binary : 1 (x y) 0; # Low-precedence operator that ignores operands.</b> -.. -ready> <b>printd(123) : printd(456) : printd(789);</b> -123.000000 -456.000000 -789.000000 -Evaluated to 0.000000 -</pre> -</div> - -<p>We can also define a bunch of other "primitive" operations, such as:</p> - -<div class="doc_code"> -<pre> -# Logical unary not. -def unary!(v) - if v then - 0 - else - 1; - -# Unary negate. -def unary-(v) - 0-v; - -# Define > with the same precedence as <. -def binary> 10 (LHS RHS) - RHS < LHS; - -# Binary logical or, which does not short circuit. -def binary| 5 (LHS RHS) - if LHS then - 1 - else if RHS then - 1 - else - 0; - -# Binary logical and, which does not short circuit. -def binary& 6 (LHS RHS) - if !LHS then - 0 - else - !!RHS; - -# Define = with slightly lower precedence than relationals. -def binary = 9 (LHS RHS) - !(LHS < RHS | LHS > RHS); - -# Define ':' for sequencing: as a low-precedence operator that ignores operands -# and just returns the RHS. -def binary : 1 (x y) y; -</pre> -</div> - - -<p>Given the previous if/then/else support, we can also define interesting -functions for I/O. For example, the following prints out a character whose -"density" reflects the value passed in: the lower the value, the denser the -character:</p> - -<div class="doc_code"> -<pre> -ready> -<b> -extern putchard(char) -def printdensity(d) - if d > 8 then - putchard(32) # ' ' - else if d > 4 then - putchard(46) # '.' - else if d > 2 then - putchard(43) # '+' - else - putchard(42); # '*'</b> -... -ready> <b>printdensity(1): printdensity(2): printdensity(3): - printdensity(4): printdensity(5): printdensity(9): - putchard(10);</b> -**++. -Evaluated to 0.000000 -</pre> -</div> - -<p>Based on these simple primitive operations, we can start to define more -interesting things. For example, here's a little function that solves for the -number of iterations it takes a function in the complex plane to -converge:</p> - -<div class="doc_code"> -<pre> -# Determine whether the specific location diverges. -# Solve for z = z^2 + c in the complex plane. 
-def mandleconverger(real imag iters creal cimag) - if iters > 255 | (real*real + imag*imag > 4) then - iters - else - mandleconverger(real*real - imag*imag + creal, - 2*real*imag + cimag, - iters+1, creal, cimag); - -# Return the number of iterations required for the iteration to escape -def mandleconverge(real imag) - mandleconverger(real, imag, 0, real, imag); -</pre> -</div> - -<p>This "<code>z = z<sup>2</sup> + c</code>" function is a beautiful little -creature that is the basis for computation of -the <a href="http://en.wikipedia.org/wiki/Mandelbrot_set">Mandelbrot Set</a>. -Our <tt>mandelconverge</tt> function returns the number of iterations that it -takes for a complex orbit to escape, saturating to 255. This is not a very -useful function by itself, but if you plot its value over a two-dimensional -plane, you can see the Mandelbrot set. Given that we are limited to using -putchard here, our amazing graphical output is limited, but we can whip together -something using the density plotter above:</p> - -<div class="doc_code"> -<pre> -# Compute and plot the mandlebrot set with the specified 2 dimensional range -# info. -def mandelhelp(xmin xmax xstep ymin ymax ystep) - for y = ymin, y < ymax, ystep in ( - (for x = xmin, x < xmax, xstep in - printdensity(mandleconverge(x,y))) - : putchard(10) - ) - -# mandel - This is a convenient helper function for plotting the mandelbrot set -# from the specified position with the specified Magnification. -def mandel(realstart imagstart realmag imagmag) - mandelhelp(realstart, realstart+realmag*78, realmag, - imagstart, imagstart+imagmag*40, imagmag); -</pre> -</div> - -<p>Given this, we can try plotting out the mandlebrot set! Lets try it out:</p> - -<div class="doc_code"> -<pre> -ready> <b>mandel(-2.3, -1.3, 0.05, 0.07);</b> -*******************************+++++++++++************************************* -*************************+++++++++++++++++++++++******************************* -**********************+++++++++++++++++++++++++++++**************************** -*******************+++++++++++++++++++++.. ...++++++++************************* -*****************++++++++++++++++++++++.... ...+++++++++*********************** -***************+++++++++++++++++++++++..... ...+++++++++********************* -**************+++++++++++++++++++++++.... ....+++++++++******************** -*************++++++++++++++++++++++...... .....++++++++******************* -************+++++++++++++++++++++....... .......+++++++****************** -***********+++++++++++++++++++.... ... .+++++++***************** -**********+++++++++++++++++....... .+++++++**************** -*********++++++++++++++........... ...+++++++*************** -********++++++++++++............ ...++++++++************** -********++++++++++... .......... .++++++++************** -*******+++++++++..... .+++++++++************* -*******++++++++...... ..+++++++++************* -*******++++++....... ..+++++++++************* -*******+++++...... ..+++++++++************* -*******.... .... ...+++++++++************* -*******.... . ...+++++++++************* -*******+++++...... ...+++++++++************* -*******++++++....... ..+++++++++************* -*******++++++++...... .+++++++++************* -*******+++++++++..... ..+++++++++************* -********++++++++++... .......... .++++++++************** -********++++++++++++............ ...++++++++************** -*********++++++++++++++.......... ...+++++++*************** -**********++++++++++++++++........ .+++++++**************** -**********++++++++++++++++++++.... 
... ..+++++++**************** -***********++++++++++++++++++++++....... .......++++++++***************** -************+++++++++++++++++++++++...... ......++++++++****************** -**************+++++++++++++++++++++++.... ....++++++++******************** -***************+++++++++++++++++++++++..... ...+++++++++********************* -*****************++++++++++++++++++++++.... ...++++++++*********************** -*******************+++++++++++++++++++++......++++++++************************* -*********************++++++++++++++++++++++.++++++++*************************** -*************************+++++++++++++++++++++++******************************* -******************************+++++++++++++************************************ -******************************************************************************* -******************************************************************************* -******************************************************************************* -Evaluated to 0.000000 -ready> <b>mandel(-2, -1, 0.02, 0.04);</b> -**************************+++++++++++++++++++++++++++++++++++++++++++++++++++++ -***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -*********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++. -*******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++... -*****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++..... -***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........ -**************++++++++++++++++++++++++++++++++++++++++++++++++++++++........... -************+++++++++++++++++++++++++++++++++++++++++++++++++++++.............. -***********++++++++++++++++++++++++++++++++++++++++++++++++++........ . -**********++++++++++++++++++++++++++++++++++++++++++++++............. -********+++++++++++++++++++++++++++++++++++++++++++.................. -*******+++++++++++++++++++++++++++++++++++++++....................... -******+++++++++++++++++++++++++++++++++++........................... -*****++++++++++++++++++++++++++++++++............................ -*****++++++++++++++++++++++++++++............................... -****++++++++++++++++++++++++++...... ......................... -***++++++++++++++++++++++++......... ...... ........... -***++++++++++++++++++++++............ -**+++++++++++++++++++++.............. -**+++++++++++++++++++................ -*++++++++++++++++++................. -*++++++++++++++++............ ... -*++++++++++++++.............. -*+++....++++................ -*.......... ........... -* -*.......... ........... -*+++....++++................ -*++++++++++++++.............. -*++++++++++++++++............ ... -*++++++++++++++++++................. -**+++++++++++++++++++................ -**+++++++++++++++++++++.............. -***++++++++++++++++++++++............ -***++++++++++++++++++++++++......... ...... ........... -****++++++++++++++++++++++++++...... ......................... -*****++++++++++++++++++++++++++++............................... -*****++++++++++++++++++++++++++++++++............................ -******+++++++++++++++++++++++++++++++++++........................... -*******+++++++++++++++++++++++++++++++++++++++....................... -********+++++++++++++++++++++++++++++++++++++++++++.................. 
-Evaluated to 0.000000 -ready> <b>mandel(-0.9, -1.4, 0.02, 0.03);</b> -******************************************************************************* -******************************************************************************* -******************************************************************************* -**********+++++++++++++++++++++************************************************ -*+++++++++++++++++++++++++++++++++++++++*************************************** -+++++++++++++++++++++++++++++++++++++++++++++********************************** -++++++++++++++++++++++++++++++++++++++++++++++++++***************************** -++++++++++++++++++++++++++++++++++++++++++++++++++++++************************* -+++++++++++++++++++++++++++++++++++++++++++++++++++++++++********************** -+++++++++++++++++++++++++++++++++.........++++++++++++++++++******************* -+++++++++++++++++++++++++++++++.... ......+++++++++++++++++++**************** -+++++++++++++++++++++++++++++....... ........+++++++++++++++++++************** -++++++++++++++++++++++++++++........ ........++++++++++++++++++++************ -+++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++********** -++++++++++++++++++++++++++........... ....++++++++++++++++++++++******** -++++++++++++++++++++++++............. .......++++++++++++++++++++++****** -+++++++++++++++++++++++............. ........+++++++++++++++++++++++**** -++++++++++++++++++++++........... ..........++++++++++++++++++++++*** -++++++++++++++++++++........... .........++++++++++++++++++++++* -++++++++++++++++++............ ...........++++++++++++++++++++ -++++++++++++++++............... .............++++++++++++++++++ -++++++++++++++................. ...............++++++++++++++++ -++++++++++++.................. .................++++++++++++++ -+++++++++.................. .................+++++++++++++ -++++++........ . ......... ..++++++++++++ -++............ ...... ....++++++++++ -.............. ...++++++++++ -.............. ....+++++++++ -.............. .....++++++++ -............. ......++++++++ -........... .......++++++++ -......... ........+++++++ -......... ........+++++++ -......... ....+++++++ -........ ...+++++++ -....... ...+++++++ - ....+++++++ - .....+++++++ - ....+++++++ - ....+++++++ - ....+++++++ -Evaluated to 0.000000 -ready> <b>^D</b> -</pre> -</div> - -<p>At this point, you may be starting to realize that Kaleidoscope is a real -and powerful language. It may not be self-similar :), but it can be used to -plot things that are!</p> - -<p>With this, we conclude the "adding user-defined operators" chapter of the -tutorial. We have successfully augmented our language, adding the ability to extend the -language in the library, and we have shown how this can be used to build a simple but -interesting end-user application in Kaleidoscope. At this point, Kaleidoscope -can build a variety of applications that are functional and can call functions -with side-effects, but it can't actually define and mutate a variable itself. -</p> - -<p>Strikingly, variable mutation is an important feature of some -languages, and it is not at all obvious how to <a href="LangImpl7.html">add -support for mutable variables</a> without having to add an "SSA construction" -phase to your front-end. 
In the next chapter, we will describe how you can -add variable mutation without building SSA in your front-end.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="code">Full Code Listing</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -Here is the complete code listing for our running example, enhanced with the -if/then/else and for expressions.. To build this example, use: -</p> - -<div class="doc_code"> -<pre> -# Compile -clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy -# Run -./toy -</pre> -</div> - -<p>On some platforms, you will need to specify -rdynamic or -Wl,--export-dynamic -when linking. This ensures that symbols defined in the main executable are -exported to the dynamic linker and so are available for symbol resolution at -run time. This is not needed if you compile your support code into a shared -library, although doing that will cause problems on Windows.</p> - -<p>Here is the code:</p> - -<div class="doc_code"> -<pre> -#include "llvm/DerivedTypes.h" -#include "llvm/ExecutionEngine/ExecutionEngine.h" -#include "llvm/ExecutionEngine/JIT.h" -#include "llvm/IRBuilder.h" -#include "llvm/LLVMContext.h" -#include "llvm/Module.h" -#include "llvm/PassManager.h" -#include "llvm/Analysis/Verifier.h" -#include "llvm/Analysis/Passes.h" -#include "llvm/DataLayout.h" -#include "llvm/Transforms/Scalar.h" -#include "llvm/Support/TargetSelect.h" -#include <cstdio> -#include <string> -#include <map> -#include <vector> -using namespace llvm; - -//===----------------------------------------------------------------------===// -// Lexer -//===----------------------------------------------------------------------===// - -// The lexer returns tokens [0-255] if it is an unknown character, otherwise one -// of these for known things. -enum Token { - tok_eof = -1, - - // commands - tok_def = -2, tok_extern = -3, - - // primary - tok_identifier = -4, tok_number = -5, - - // control - tok_if = -6, tok_then = -7, tok_else = -8, - tok_for = -9, tok_in = -10, - - // operators - tok_binary = -11, tok_unary = -12 -}; - -static std::string IdentifierStr; // Filled in if tok_identifier -static double NumVal; // Filled in if tok_number - -/// gettok - Return the next token from standard input. -static int gettok() { - static int LastChar = ' '; - - // Skip any whitespace. - while (isspace(LastChar)) - LastChar = getchar(); - - if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* - IdentifierStr = LastChar; - while (isalnum((LastChar = getchar()))) - IdentifierStr += LastChar; - - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - if (IdentifierStr == "if") return tok_if; - if (IdentifierStr == "then") return tok_then; - if (IdentifierStr == "else") return tok_else; - if (IdentifierStr == "for") return tok_for; - if (IdentifierStr == "in") return tok_in; - if (IdentifierStr == "binary") return tok_binary; - if (IdentifierStr == "unary") return tok_unary; - return tok_identifier; - } - - if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ - std::string NumStr; - do { - NumStr += LastChar; - LastChar = getchar(); - } while (isdigit(LastChar) || LastChar == '.'); - - NumVal = strtod(NumStr.c_str(), 0); - return tok_number; - } - - if (LastChar == '#') { - // Comment until end of line. 
- do LastChar = getchar(); - while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); - - if (LastChar != EOF) - return gettok(); - } - - // Check for end of file. Don't eat the EOF. - if (LastChar == EOF) - return tok_eof; - - // Otherwise, just return the character as its ascii value. - int ThisChar = LastChar; - LastChar = getchar(); - return ThisChar; -} - -//===----------------------------------------------------------------------===// -// Abstract Syntax Tree (aka Parse Tree) -//===----------------------------------------------------------------------===// - -/// ExprAST - Base class for all expression nodes. -class ExprAST { -public: - virtual ~ExprAST() {} - virtual Value *Codegen() = 0; -}; - -/// NumberExprAST - Expression class for numeric literals like "1.0". -class NumberExprAST : public ExprAST { - double Val; -public: - NumberExprAST(double val) : Val(val) {} - virtual Value *Codegen(); -}; - -/// VariableExprAST - Expression class for referencing a variable, like "a". -class VariableExprAST : public ExprAST { - std::string Name; -public: - VariableExprAST(const std::string &name) : Name(name) {} - virtual Value *Codegen(); -}; - -/// UnaryExprAST - Expression class for a unary operator. -class UnaryExprAST : public ExprAST { - char Opcode; - ExprAST *Operand; -public: - UnaryExprAST(char opcode, ExprAST *operand) - : Opcode(opcode), Operand(operand) {} - virtual Value *Codegen(); -}; - -/// BinaryExprAST - Expression class for a binary operator. -class BinaryExprAST : public ExprAST { - char Op; - ExprAST *LHS, *RHS; -public: - BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) - : Op(op), LHS(lhs), RHS(rhs) {} - virtual Value *Codegen(); -}; - -/// CallExprAST - Expression class for function calls. -class CallExprAST : public ExprAST { - std::string Callee; - std::vector<ExprAST*> Args; -public: - CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) - : Callee(callee), Args(args) {} - virtual Value *Codegen(); -}; - -/// IfExprAST - Expression class for if/then/else. -class IfExprAST : public ExprAST { - ExprAST *Cond, *Then, *Else; -public: - IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) - : Cond(cond), Then(then), Else(_else) {} - virtual Value *Codegen(); -}; - -/// ForExprAST - Expression class for for/in. -class ForExprAST : public ExprAST { - std::string VarName; - ExprAST *Start, *End, *Step, *Body; -public: - ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, - ExprAST *step, ExprAST *body) - : VarName(varname), Start(start), End(end), Step(step), Body(body) {} - virtual Value *Codegen(); -}; - -/// PrototypeAST - This class represents the "prototype" for a function, -/// which captures its name, and its argument names (thus implicitly the number -/// of arguments the function takes), as well as if it is an operator. -class PrototypeAST { - std::string Name; - std::vector<std::string> Args; - bool isOperator; - unsigned Precedence; // Precedence if a binary op. 
-public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args, - bool isoperator = false, unsigned prec = 0) - : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {} - - bool isUnaryOp() const { return isOperator && Args.size() == 1; } - bool isBinaryOp() const { return isOperator && Args.size() == 2; } - - char getOperatorName() const { - assert(isUnaryOp() || isBinaryOp()); - return Name[Name.size()-1]; - } - - unsigned getBinaryPrecedence() const { return Precedence; } - - Function *Codegen(); -}; - -/// FunctionAST - This class represents a function definition itself. -class FunctionAST { - PrototypeAST *Proto; - ExprAST *Body; -public: - FunctionAST(PrototypeAST *proto, ExprAST *body) - : Proto(proto), Body(body) {} - - Function *Codegen(); -}; - -//===----------------------------------------------------------------------===// -// Parser -//===----------------------------------------------------------------------===// - -/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current -/// token the parser is looking at. getNextToken reads another token from the -/// lexer and updates CurTok with its results. -static int CurTok; -static int getNextToken() { - return CurTok = gettok(); -} - -/// BinopPrecedence - This holds the precedence for each binary operator that is -/// defined. -static std::map<char, int> BinopPrecedence; - -/// GetTokPrecedence - Get the precedence of the pending binary operator token. -static int GetTokPrecedence() { - if (!isascii(CurTok)) - return -1; - - // Make sure it's a declared binop. - int TokPrec = BinopPrecedence[CurTok]; - if (TokPrec <= 0) return -1; - return TokPrec; -} - -/// Error* - These are little helper functions for error handling. -ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} -PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } -FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } - -static ExprAST *ParseExpression(); - -/// identifierexpr -/// ::= identifier -/// ::= identifier '(' expression* ')' -static ExprAST *ParseIdentifierExpr() { - std::string IdName = IdentifierStr; - - getNextToken(); // eat identifier. - - if (CurTok != '(') // Simple variable ref. - return new VariableExprAST(IdName); - - // Call. - getNextToken(); // eat ( - std::vector<ExprAST*> Args; - if (CurTok != ')') { - while (1) { - ExprAST *Arg = ParseExpression(); - if (!Arg) return 0; - Args.push_back(Arg); - - if (CurTok == ')') break; - - if (CurTok != ',') - return Error("Expected ')' or ',' in argument list"); - getNextToken(); - } - } - - // Eat the ')'. - getNextToken(); - - return new CallExprAST(IdName, Args); -} - -/// numberexpr ::= number -static ExprAST *ParseNumberExpr() { - ExprAST *Result = new NumberExprAST(NumVal); - getNextToken(); // consume the number - return Result; -} - -/// parenexpr ::= '(' expression ')' -static ExprAST *ParseParenExpr() { - getNextToken(); // eat (. - ExprAST *V = ParseExpression(); - if (!V) return 0; - - if (CurTok != ')') - return Error("expected ')'"); - getNextToken(); // eat ). - return V; -} - -/// ifexpr ::= 'if' expression 'then' expression 'else' expression -static ExprAST *ParseIfExpr() { - getNextToken(); // eat the if. - - // condition. 
- ExprAST *Cond = ParseExpression(); - if (!Cond) return 0; - - if (CurTok != tok_then) - return Error("expected then"); - getNextToken(); // eat the then - - ExprAST *Then = ParseExpression(); - if (Then == 0) return 0; - - if (CurTok != tok_else) - return Error("expected else"); - - getNextToken(); - - ExprAST *Else = ParseExpression(); - if (!Else) return 0; - - return new IfExprAST(Cond, Then, Else); -} - -/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression -static ExprAST *ParseForExpr() { - getNextToken(); // eat the for. - - if (CurTok != tok_identifier) - return Error("expected identifier after for"); - - std::string IdName = IdentifierStr; - getNextToken(); // eat identifier. - - if (CurTok != '=') - return Error("expected '=' after for"); - getNextToken(); // eat '='. - - - ExprAST *Start = ParseExpression(); - if (Start == 0) return 0; - if (CurTok != ',') - return Error("expected ',' after for start value"); - getNextToken(); - - ExprAST *End = ParseExpression(); - if (End == 0) return 0; - - // The step value is optional. - ExprAST *Step = 0; - if (CurTok == ',') { - getNextToken(); - Step = ParseExpression(); - if (Step == 0) return 0; - } - - if (CurTok != tok_in) - return Error("expected 'in' after for"); - getNextToken(); // eat 'in'. - - ExprAST *Body = ParseExpression(); - if (Body == 0) return 0; - - return new ForExprAST(IdName, Start, End, Step, Body); -} - -/// primary -/// ::= identifierexpr -/// ::= numberexpr -/// ::= parenexpr -/// ::= ifexpr -/// ::= forexpr -static ExprAST *ParsePrimary() { - switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - case tok_if: return ParseIfExpr(); - case tok_for: return ParseForExpr(); - } -} - -/// unary -/// ::= primary -/// ::= '!' unary -static ExprAST *ParseUnary() { - // If the current token is not an operator, it must be a primary expr. - if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') - return ParsePrimary(); - - // If this is a unary operator, read it. - int Opc = CurTok; - getNextToken(); - if (ExprAST *Operand = ParseUnary()) - return new UnaryExprAST(Opc, Operand); - return 0; -} - -/// binoprhs -/// ::= ('+' unary)* -static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { - // If this is a binop, find its precedence. - while (1) { - int TokPrec = GetTokPrecedence(); - - // If this is a binop that binds at least as tightly as the current binop, - // consume it, otherwise we are done. - if (TokPrec < ExprPrec) - return LHS; - - // Okay, we know this is a binop. - int BinOp = CurTok; - getNextToken(); // eat binop - - // Parse the unary expression after the binary operator. - ExprAST *RHS = ParseUnary(); - if (!RHS) return 0; - - // If BinOp binds less tightly with RHS than the operator after RHS, let - // the pending operator take RHS as its LHS. - int NextPrec = GetTokPrecedence(); - if (TokPrec < NextPrec) { - RHS = ParseBinOpRHS(TokPrec+1, RHS); - if (RHS == 0) return 0; - } - - // Merge LHS/RHS. - LHS = new BinaryExprAST(BinOp, LHS, RHS); - } -} - -/// expression -/// ::= unary binoprhs -/// -static ExprAST *ParseExpression() { - ExprAST *LHS = ParseUnary(); - if (!LHS) return 0; - - return ParseBinOpRHS(0, LHS); -} - -/// prototype -/// ::= id '(' id* ')' -/// ::= binary LETTER number? 
(id, id) -/// ::= unary LETTER (id) -static PrototypeAST *ParsePrototype() { - std::string FnName; - - unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. - unsigned BinaryPrecedence = 30; - - switch (CurTok) { - default: - return ErrorP("Expected function name in prototype"); - case tok_identifier: - FnName = IdentifierStr; - Kind = 0; - getNextToken(); - break; - case tok_unary: - getNextToken(); - if (!isascii(CurTok)) - return ErrorP("Expected unary operator"); - FnName = "unary"; - FnName += (char)CurTok; - Kind = 1; - getNextToken(); - break; - case tok_binary: - getNextToken(); - if (!isascii(CurTok)) - return ErrorP("Expected binary operator"); - FnName = "binary"; - FnName += (char)CurTok; - Kind = 2; - getNextToken(); - - // Read the precedence if present. - if (CurTok == tok_number) { - if (NumVal < 1 || NumVal > 100) - return ErrorP("Invalid precedecnce: must be 1..100"); - BinaryPrecedence = (unsigned)NumVal; - getNextToken(); - } - break; - } - - if (CurTok != '(') - return ErrorP("Expected '(' in prototype"); - - std::vector<std::string> ArgNames; - while (getNextToken() == tok_identifier) - ArgNames.push_back(IdentifierStr); - if (CurTok != ')') - return ErrorP("Expected ')' in prototype"); - - // success. - getNextToken(); // eat ')'. - - // Verify right number of names for operator. - if (Kind && ArgNames.size() != Kind) - return ErrorP("Invalid number of operands for operator"); - - return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence); -} - -/// definition ::= 'def' prototype expression -static FunctionAST *ParseDefinition() { - getNextToken(); // eat def. - PrototypeAST *Proto = ParsePrototype(); - if (Proto == 0) return 0; - - if (ExprAST *E = ParseExpression()) - return new FunctionAST(Proto, E); - return 0; -} - -/// toplevelexpr ::= expression -static FunctionAST *ParseTopLevelExpr() { - if (ExprAST *E = ParseExpression()) { - // Make an anonymous proto. - PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); - return new FunctionAST(Proto, E); - } - return 0; -} - -/// external ::= 'extern' prototype -static PrototypeAST *ParseExtern() { - getNextToken(); // eat extern. - return ParsePrototype(); -} - -//===----------------------------------------------------------------------===// -// Code Generation -//===----------------------------------------------------------------------===// - -static Module *TheModule; -static IRBuilder<> Builder(getGlobalContext()); -static std::map<std::string, Value*> NamedValues; -static FunctionPassManager *TheFPM; - -Value *ErrorV(const char *Str) { Error(Str); return 0; } - -Value *NumberExprAST::Codegen() { - return ConstantFP::get(getGlobalContext(), APFloat(Val)); -} - -Value *VariableExprAST::Codegen() { - // Look this variable up in the function. - Value *V = NamedValues[Name]; - return V ? 
V : ErrorV("Unknown variable name"); -} - -Value *UnaryExprAST::Codegen() { - Value *OperandV = Operand->Codegen(); - if (OperandV == 0) return 0; - - Function *F = TheModule->getFunction(std::string("unary")+Opcode); - if (F == 0) - return ErrorV("Unknown unary operator"); - - return Builder.CreateCall(F, OperandV, "unop"); -} - -Value *BinaryExprAST::Codegen() { - Value *L = LHS->Codegen(); - Value *R = RHS->Codegen(); - if (L == 0 || R == 0) return 0; - - switch (Op) { - case '+': return Builder.CreateFAdd(L, R, "addtmp"); - case '-': return Builder.CreateFSub(L, R, "subtmp"); - case '*': return Builder.CreateFMul(L, R, "multmp"); - case '<': - L = Builder.CreateFCmpULT(L, R, "cmptmp"); - // Convert bool 0/1 to double 0.0 or 1.0 - return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), - "booltmp"); - default: break; - } - - // If it wasn't a builtin binary operator, it must be a user defined one. Emit - // a call to it. - Function *F = TheModule->getFunction(std::string("binary")+Op); - assert(F && "binary operator not found!"); - - Value *Ops[2] = { L, R }; - return Builder.CreateCall(F, Ops, "binop"); -} - -Value *CallExprAST::Codegen() { - // Look up the name in the global module table. - Function *CalleeF = TheModule->getFunction(Callee); - if (CalleeF == 0) - return ErrorV("Unknown function referenced"); - - // If argument mismatch error. - if (CalleeF->arg_size() != Args.size()) - return ErrorV("Incorrect # arguments passed"); - - std::vector<Value*> ArgsV; - for (unsigned i = 0, e = Args.size(); i != e; ++i) { - ArgsV.push_back(Args[i]->Codegen()); - if (ArgsV.back() == 0) return 0; - } - - return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); -} - -Value *IfExprAST::Codegen() { - Value *CondV = Cond->Codegen(); - if (CondV == 0) return 0; - - // Convert condition to a bool by comparing equal to 0.0. - CondV = Builder.CreateFCmpONE(CondV, - ConstantFP::get(getGlobalContext(), APFloat(0.0)), - "ifcond"); - - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - - // Create blocks for the then and else cases. Insert the 'then' block at the - // end of the function. - BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); - BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); - BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); - - Builder.CreateCondBr(CondV, ThenBB, ElseBB); - - // Emit then value. - Builder.SetInsertPoint(ThenBB); - - Value *ThenV = Then->Codegen(); - if (ThenV == 0) return 0; - - Builder.CreateBr(MergeBB); - // Codegen of 'Then' can change the current block, update ThenBB for the PHI. - ThenBB = Builder.GetInsertBlock(); - - // Emit else block. - TheFunction->getBasicBlockList().push_back(ElseBB); - Builder.SetInsertPoint(ElseBB); - - Value *ElseV = Else->Codegen(); - if (ElseV == 0) return 0; - - Builder.CreateBr(MergeBB); - // Codegen of 'Else' can change the current block, update ElseBB for the PHI. - ElseBB = Builder.GetInsertBlock(); - - // Emit merge block. - TheFunction->getBasicBlockList().push_back(MergeBB); - Builder.SetInsertPoint(MergeBB); - PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, - "iftmp"); - - PN->addIncoming(ThenV, ThenBB); - PN->addIncoming(ElseV, ElseBB); - return PN; -} - -Value *ForExprAST::Codegen() { - // Output this as: - // ... - // start = startexpr - // goto loop - // loop: - // variable = phi [start, loopheader], [nextvariable, loopend] - // ... - // bodyexpr - // ... 
- // loopend: - // step = stepexpr - // nextvariable = variable + step - // endcond = endexpr - // br endcond, loop, endloop - // outloop: - - // Emit the start code first, without 'variable' in scope. - Value *StartVal = Start->Codegen(); - if (StartVal == 0) return 0; - - // Make the new basic block for the loop header, inserting after current - // block. - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - BasicBlock *PreheaderBB = Builder.GetInsertBlock(); - BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); - - // Insert an explicit fall through from the current block to the LoopBB. - Builder.CreateBr(LoopBB); - - // Start insertion in LoopBB. - Builder.SetInsertPoint(LoopBB); - - // Start the PHI node with an entry for Start. - PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str()); - Variable->addIncoming(StartVal, PreheaderBB); - - // Within the loop, the variable is defined equal to the PHI node. If it - // shadows an existing variable, we have to restore it, so save it now. - Value *OldVal = NamedValues[VarName]; - NamedValues[VarName] = Variable; - - // Emit the body of the loop. This, like any other expr, can change the - // current BB. Note that we ignore the value computed by the body, but don't - // allow an error. - if (Body->Codegen() == 0) - return 0; - - // Emit the step value. - Value *StepVal; - if (Step) { - StepVal = Step->Codegen(); - if (StepVal == 0) return 0; - } else { - // If not specified, use 1.0. - StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); - } - - Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar"); - - // Compute the end condition. - Value *EndCond = End->Codegen(); - if (EndCond == 0) return EndCond; - - // Convert condition to a bool by comparing equal to 0.0. - EndCond = Builder.CreateFCmpONE(EndCond, - ConstantFP::get(getGlobalContext(), APFloat(0.0)), - "loopcond"); - - // Create the "after loop" block and insert it. - BasicBlock *LoopEndBB = Builder.GetInsertBlock(); - BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); - - // Insert the conditional branch into the end of LoopEndBB. - Builder.CreateCondBr(EndCond, LoopBB, AfterBB); - - // Any new code will be inserted in AfterBB. - Builder.SetInsertPoint(AfterBB); - - // Add a new entry to the PHI node for the backedge. - Variable->addIncoming(NextVar, LoopEndBB); - - // Restore the unshadowed variable. - if (OldVal) - NamedValues[VarName] = OldVal; - else - NamedValues.erase(VarName); - - - // for expr always returns 0.0. - return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); -} - -Function *PrototypeAST::Codegen() { - // Make the function type: double(double,double) etc. - std::vector<Type*> Doubles(Args.size(), - Type::getDoubleTy(getGlobalContext())); - FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), - Doubles, false); - - Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); - - // If F conflicted, there was already something named 'Name'. If it has a - // body, don't allow redefinition or reextern. - if (F->getName() != Name) { - // Delete the one we just made and get the existing one. - F->eraseFromParent(); - F = TheModule->getFunction(Name); - - // If F already has a body, reject this. - if (!F->empty()) { - ErrorF("redefinition of function"); - return 0; - } - - // If F took a different number of args, reject. 
- if (F->arg_size() != Args.size()) { - ErrorF("redefinition of function with different # args"); - return 0; - } - } - - // Set names for all arguments. - unsigned Idx = 0; - for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); - ++AI, ++Idx) { - AI->setName(Args[Idx]); - - // Add arguments to variable symbol table. - NamedValues[Args[Idx]] = AI; - } - - return F; -} - -Function *FunctionAST::Codegen() { - NamedValues.clear(); - - Function *TheFunction = Proto->Codegen(); - if (TheFunction == 0) - return 0; - - // If this is an operator, install it. - if (Proto->isBinaryOp()) - BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence(); - - // Create a new basic block to start insertion into. - BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); - Builder.SetInsertPoint(BB); - - if (Value *RetVal = Body->Codegen()) { - // Finish off the function. - Builder.CreateRet(RetVal); - - // Validate the generated code, checking for consistency. - verifyFunction(*TheFunction); - - // Optimize the function. - TheFPM->run(*TheFunction); - - return TheFunction; - } - - // Error reading body, remove function. - TheFunction->eraseFromParent(); - - if (Proto->isBinaryOp()) - BinopPrecedence.erase(Proto->getOperatorName()); - return 0; -} - -//===----------------------------------------------------------------------===// -// Top-Level parsing and JIT Driver -//===----------------------------------------------------------------------===// - -static ExecutionEngine *TheExecutionEngine; - -static void HandleDefinition() { - if (FunctionAST *F = ParseDefinition()) { - if (Function *LF = F->Codegen()) { - fprintf(stderr, "Read function definition:"); - LF->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleExtern() { - if (PrototypeAST *P = ParseExtern()) { - if (Function *F = P->Codegen()) { - fprintf(stderr, "Read extern: "); - F->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleTopLevelExpression() { - // Evaluate a top-level expression into an anonymous function. - if (FunctionAST *F = ParseTopLevelExpr()) { - if (Function *LF = F->Codegen()) { - // JIT the function, returning a function pointer. - void *FPtr = TheExecutionEngine->getPointerToFunction(LF); - - // Cast it to the right type (takes no arguments, returns a double) so we - // can call it as a native function. - double (*FP)() = (double (*)())(intptr_t)FPtr; - fprintf(stderr, "Evaluated to %f\n", FP()); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -/// top ::= definition | external | expression | ';' -static void MainLoop() { - while (1) { - fprintf(stderr, "ready> "); - switch (CurTok) { - case tok_eof: return; - case ';': getNextToken(); break; // ignore top-level semicolons. - case tok_def: HandleDefinition(); break; - case tok_extern: HandleExtern(); break; - default: HandleTopLevelExpression(); break; - } - } -} - -//===----------------------------------------------------------------------===// -// "Library" functions that can be "extern'd" from user code. -//===----------------------------------------------------------------------===// - -/// putchard - putchar that takes a double and returns 0. -extern "C" -double putchard(double X) { - putchar((char)X); - return 0; -} - -/// printd - printf that takes a double prints it as "%f\n", returning 0. 
-extern "C" -double printd(double X) { - printf("%f\n", X); - return 0; -} - -//===----------------------------------------------------------------------===// -// Main driver code. -//===----------------------------------------------------------------------===// - -int main() { - InitializeNativeTarget(); - LLVMContext &Context = getGlobalContext(); - - // Install standard binary operators. - // 1 is lowest precedence. - BinopPrecedence['<'] = 10; - BinopPrecedence['+'] = 20; - BinopPrecedence['-'] = 20; - BinopPrecedence['*'] = 40; // highest. - - // Prime the first token. - fprintf(stderr, "ready> "); - getNextToken(); - - // Make the module, which holds all the code. - TheModule = new Module("my cool jit", Context); - - // Create the JIT. This takes ownership of the module. - std::string ErrStr; - TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); - if (!TheExecutionEngine) { - fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); - exit(1); - } - - FunctionPassManager OurFPM(TheModule); - - // Set up the optimizer pipeline. Start with registering info about how the - // target lays out data structures. - OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout())); - // Provide basic AliasAnalysis support for GVN. - OurFPM.add(createBasicAliasAnalysisPass()); - // Do simple "peephole" optimizations and bit-twiddling optzns. - OurFPM.add(createInstructionCombiningPass()); - // Reassociate expressions. - OurFPM.add(createReassociatePass()); - // Eliminate Common SubExpressions. - OurFPM.add(createGVNPass()); - // Simplify the control flow graph (deleting unreachable blocks, etc). - OurFPM.add(createCFGSimplificationPass()); - - OurFPM.doInitialization(); - - // Set the global so the code gen can use this. - TheFPM = &OurFPM; - - // Run the main "interpreter loop" now. - MainLoop(); - - TheFPM = 0; - - // Print out all of the generated code. - TheModule->dump(); - - return 0; -} -</pre> -</div> - -<a href="LangImpl7.html">Next: Extending the language: mutable variables / SSA construction</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/LangImpl6.rst b/docs/tutorial/LangImpl6.rst new file mode 100644 index 0000000000..30f4e90d03 --- /dev/null +++ b/docs/tutorial/LangImpl6.rst @@ -0,0 +1,1728 @@ +============================================================ +Kaleidoscope: Extending the Language: User-defined Operators +============================================================ + +.. contents:: + :local: + +Written by `Chris Lattner <mailto:sabre@nondot.org>`_ + +Chapter 6 Introduction +====================== + +Welcome to Chapter 6 of the "`Implementing a language with +LLVM <index.html>`_" tutorial. At this point in our tutorial, we now +have a fully functional language that is fairly minimal, but also +useful. There is still one big problem with it, however. Our language +doesn't have many useful operators (like division, logical negation, or +even any comparisons besides less-than). 
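+
+As a quick taste of where this is headed (a sketch, not something this
+chapter actually implements): even the missing division operator can be
+filled in from the library side, assuming the host application exposes a
+hypothetical ``divd`` helper in the same way it exposes the ``putchard``
+and ``printd`` helpers used later in this chapter::
+
+    # Hypothetical host helper, e.g. extern "C" double divd(double a, double b);
+    extern divd(a b);
+
+    # Give '/' the same precedence as '*', using the syntax introduced below.
+    def binary/ 40 (LHS RHS)
+      divd(LHS, RHS);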
+
+This chapter of the tutorial takes a wild digression into adding
+user-defined operators to the simple and beautiful Kaleidoscope
+language. This digression now gives us a simple and ugly language in
+some ways, but also a powerful one at the same time. One of the great
+things about creating your own language is that you get to decide what
+is good or bad. In this tutorial we'll assume that it is okay to use
+this as a way to show some interesting parsing techniques.
+
+At the end of this tutorial, we'll run through an example Kaleidoscope
+application that `renders the Mandelbrot set <#example>`_. This gives an
+example of what you can build with Kaleidoscope and its feature set.
+
+User-defined Operators: the Idea
+================================
+
+The "operator overloading" that we will add to Kaleidoscope is more
+general than that of languages like C++. In C++, you are only allowed to
+redefine existing operators: you can't programmatically change the
+grammar, introduce new operators, change precedence levels, etc. In this
+chapter, we will add this capability to Kaleidoscope, which will let the
+user round out the set of operators that are supported.
+
+The point of going into user-defined operators in a tutorial like this
+is to show the power and flexibility of using a hand-written parser.
+Thus far, the parser we have been implementing uses recursive descent
+for most parts of the grammar and operator precedence parsing for the
+expressions. See `Chapter 2 <LangImpl2.html>`_ for details. Without
+using operator precedence parsing, it would be very difficult to allow
+the programmer to introduce new operators into the grammar: the grammar
+is dynamically extensible as the JIT runs.
+
+The two specific features we'll add are programmable unary operators
+(right now, Kaleidoscope has no unary operators at all) as well as
+binary operators. An example of this is:
+
+::
+
+    # Logical unary not.
+    def unary!(v)
+      if v then
+        0
+      else
+        1;
+
+    # Define > with the same precedence as <.
+    def binary> 10 (LHS RHS)
+      RHS < LHS;
+
+    # Binary "logical or" (note that it does not "short circuit").
+    def binary| 5 (LHS RHS)
+      if LHS then
+        1
+      else if RHS then
+        1
+      else
+        0;
+
+    # Define = with slightly lower precedence than relationals.
+    def binary= 9 (LHS RHS)
+      !(LHS < RHS | LHS > RHS);
+
+Many languages aspire to being able to implement their standard runtime
+library in the language itself. In Kaleidoscope, we can implement
+significant parts of the language in the library!
+
+We will break down implementation of these features into two parts:
+implementing support for user-defined binary operators and adding unary
+operators.
+
+User-defined Binary Operators
+=============================
+
+Adding support for user-defined binary operators is pretty simple with
+our current framework. We'll first add support for the unary/binary
+keywords:
+
+.. code-block:: c++
+
+    enum Token {
+      ...
+      // operators
+      tok_binary = -11, tok_unary = -12
+    };
+    ...
+    static int gettok() {
+    ...
+        if (IdentifierStr == "for") return tok_for;
+        if (IdentifierStr == "in") return tok_in;
+        if (IdentifierStr == "binary") return tok_binary;
+        if (IdentifierStr == "unary") return tok_unary;
+        return tok_identifier;
+
+This just adds lexer support for the unary and binary keywords, as we
+did in `previous chapters <LangImpl5.html#iflexer>`_. One nice thing
+about our current AST is that we represent binary operators with full
+generality by using their ASCII code as the opcode. For our extended
+operators, we'll use this same representation, so we don't need any new
+AST or parser support.
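+
+One small consequence worth noting (an observation, not from the
+original text): because ``binary`` and ``unary`` are now keywords, they
+can no longer be used as ordinary identifiers::
+
+    # This used to be a legal function definition; with the lexer change
+    # above, "unary" is a keyword and the parser now rejects it.
+    def unary(x)
+      x;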
For our extended +operators, we'll use this same representation, so we don't need any new +AST or parser support. + +On the other hand, we have to be able to represent the definitions of +these new operators, in the "def binary\| 5" part of the function +definition. In our grammar so far, the "name" for the function +definition is parsed as the "prototype" production and into the +``PrototypeAST`` AST node. To represent our new user-defined operators +as prototypes, we have to extend the ``PrototypeAST`` AST node like +this: + +.. code-block:: c++ + + /// PrototypeAST - This class represents the "prototype" for a function, + /// which captures its argument names as well as if it is an operator. + class PrototypeAST { + std::string Name; + std::vector<std::string> Args; + bool isOperator; + unsigned Precedence; // Precedence if a binary op. + public: + PrototypeAST(const std::string &name, const std::vector<std::string> &args, + bool isoperator = false, unsigned prec = 0) + : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {} + + bool isUnaryOp() const { return isOperator && Args.size() == 1; } + bool isBinaryOp() const { return isOperator && Args.size() == 2; } + + char getOperatorName() const { + assert(isUnaryOp() || isBinaryOp()); + return Name[Name.size()-1]; + } + + unsigned getBinaryPrecedence() const { return Precedence; } + + Function *Codegen(); + }; + +Basically, in addition to knowing a name for the prototype, we now keep +track of whether it was an operator, and if it was, what precedence +level the operator is at. The precedence is only used for binary +operators (as you'll see below, it just doesn't apply for unary +operators). Now that we have a way to represent the prototype for a +user-defined operator, we need to parse it: + +.. code-block:: c++ + + /// prototype + /// ::= id '(' id* ')' + /// ::= binary LETTER number? (id, id) + static PrototypeAST *ParsePrototype() { + std::string FnName; + + unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. + unsigned BinaryPrecedence = 30; + + switch (CurTok) { + default: + return ErrorP("Expected function name in prototype"); + case tok_identifier: + FnName = IdentifierStr; + Kind = 0; + getNextToken(); + break; + case tok_binary: + getNextToken(); + if (!isascii(CurTok)) + return ErrorP("Expected binary operator"); + FnName = "binary"; + FnName += (char)CurTok; + Kind = 2; + getNextToken(); + + // Read the precedence if present. + if (CurTok == tok_number) { + if (NumVal < 1 || NumVal > 100) + return ErrorP("Invalid precedecnce: must be 1..100"); + BinaryPrecedence = (unsigned)NumVal; + getNextToken(); + } + break; + } + + if (CurTok != '(') + return ErrorP("Expected '(' in prototype"); + + std::vector<std::string> ArgNames; + while (getNextToken() == tok_identifier) + ArgNames.push_back(IdentifierStr); + if (CurTok != ')') + return ErrorP("Expected ')' in prototype"); + + // success. + getNextToken(); // eat ')'. + + // Verify right number of names for operator. + if (Kind && ArgNames.size() != Kind) + return ErrorP("Invalid number of operands for operator"); + + return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence); + } + +This is all fairly straightforward parsing code, and we have already +seen a lot of similar code in the past. One interesting part about the +code above is the couple lines that set up ``FnName`` for binary +operators. This builds names like "binary@" for a newly defined "@" +operator. 
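+
+For reference, here is a hedged sketch of the prototype forms this
+parser now accepts, written as Kaleidoscope source (the ``average``
+function and the ``@`` and ``^`` operators are made up for illustration
+and are not part of the tutorial code):
+
+::
+
+    # An ordinary prototype: a plain identifier.
+    def average(x y)
+      (x + y) * 0.5;
+
+    # A binary operator prototype with an explicit precedence of 40.
+    def binary@ 40 (L R)
+      L * R + L;
+
+    # A binary operator prototype with the precedence omitted; the
+    # parser above falls back to the default of 30.
+    def binary^ (L R)
+      L < R;
+
+The first definition produces an ordinary function named "average",
+while the other two produce functions named "binary@" and "binary^".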
This then takes advantage of the fact that symbol names in the +LLVM symbol table are allowed to have any character in them, including +embedded nul characters. + +The next interesting thing to add, is codegen support for these binary +operators. Given our current structure, this is a simple addition of a +default case for our existing binary operator node: + +.. code-block:: c++ + + Value *BinaryExprAST::Codegen() { + Value *L = LHS->Codegen(); + Value *R = RHS->Codegen(); + if (L == 0 || R == 0) return 0; + + switch (Op) { + case '+': return Builder.CreateFAdd(L, R, "addtmp"); + case '-': return Builder.CreateFSub(L, R, "subtmp"); + case '*': return Builder.CreateFMul(L, R, "multmp"); + case '<': + L = Builder.CreateFCmpULT(L, R, "cmptmp"); + // Convert bool 0/1 to double 0.0 or 1.0 + return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), + "booltmp"); + default: break; + } + + // If it wasn't a builtin binary operator, it must be a user defined one. Emit + // a call to it. + Function *F = TheModule->getFunction(std::string("binary")+Op); + assert(F && "binary operator not found!"); + + Value *Ops[2] = { L, R }; + return Builder.CreateCall(F, Ops, "binop"); + } + +As you can see above, the new code is actually really simple. It just +does a lookup for the appropriate operator in the symbol table and +generates a function call to it. Since user-defined operators are just +built as normal functions (because the "prototype" boils down to a +function with the right name) everything falls into place. + +The final piece of code we are missing, is a bit of top-level magic: + +.. code-block:: c++ + + Function *FunctionAST::Codegen() { + NamedValues.clear(); + + Function *TheFunction = Proto->Codegen(); + if (TheFunction == 0) + return 0; + + // If this is an operator, install it. + if (Proto->isBinaryOp()) + BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence(); + + // Create a new basic block to start insertion into. + BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); + Builder.SetInsertPoint(BB); + + if (Value *RetVal = Body->Codegen()) { + ... + +Basically, before codegening a function, if it is a user-defined +operator, we register it in the precedence table. This allows the binary +operator parsing logic we already have in place to handle it. Since we +are working on a fully-general operator precedence parser, this is all +we need to do to "extend the grammar". + +Now we have useful user-defined binary operators. This builds a lot on +the previous framework we built for other operators. Adding unary +operators is a bit more challenging, because we don't have any framework +for it yet - lets see what it takes. + +User-defined Unary Operators +============================ + +Since we don't currently support unary operators in the Kaleidoscope +language, we'll need to add everything to support them. Above, we added +simple support for the 'unary' keyword to the lexer. In addition to +that, we need an AST node: + +.. code-block:: c++ + + /// UnaryExprAST - Expression class for a unary operator. + class UnaryExprAST : public ExprAST { + char Opcode; + ExprAST *Operand; + public: + UnaryExprAST(char opcode, ExprAST *operand) + : Opcode(opcode), Operand(operand) {} + virtual Value *Codegen(); + }; + +This AST node is very simple and obvious by now. It directly mirrors the +binary operator AST node, except that it only has one child. With this, +we need to add the parsing logic. 
Parsing a unary operator is pretty +simple: we'll add a new function to do it: + +.. code-block:: c++ + + /// unary + /// ::= primary + /// ::= '!' unary + static ExprAST *ParseUnary() { + // If the current token is not an operator, it must be a primary expr. + if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') + return ParsePrimary(); + + // If this is a unary operator, read it. + int Opc = CurTok; + getNextToken(); + if (ExprAST *Operand = ParseUnary()) + return new UnaryExprAST(Opc, Operand); + return 0; + } + +The grammar we add is pretty straightforward here. If we see a unary +operator when parsing a primary operator, we eat the operator as a +prefix and parse the remaining piece as another unary operator. This +allows us to handle multiple unary operators (e.g. "!!x"). Note that +unary operators can't have ambiguous parses like binary operators can, +so there is no need for precedence information. + +The problem with this function, is that we need to call ParseUnary from +somewhere. To do this, we change previous callers of ParsePrimary to +call ParseUnary instead: + +.. code-block:: c++ + + /// binoprhs + /// ::= ('+' unary)* + static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { + ... + // Parse the unary expression after the binary operator. + ExprAST *RHS = ParseUnary(); + if (!RHS) return 0; + ... + } + /// expression + /// ::= unary binoprhs + /// + static ExprAST *ParseExpression() { + ExprAST *LHS = ParseUnary(); + if (!LHS) return 0; + + return ParseBinOpRHS(0, LHS); + } + +With these two simple changes, we are now able to parse unary operators +and build the AST for them. Next up, we need to add parser support for +prototypes, to parse the unary operator prototype. We extend the binary +operator code above with: + +.. code-block:: c++ + + /// prototype + /// ::= id '(' id* ')' + /// ::= binary LETTER number? (id, id) + /// ::= unary LETTER (id) + static PrototypeAST *ParsePrototype() { + std::string FnName; + + unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. + unsigned BinaryPrecedence = 30; + + switch (CurTok) { + default: + return ErrorP("Expected function name in prototype"); + case tok_identifier: + FnName = IdentifierStr; + Kind = 0; + getNextToken(); + break; + case tok_unary: + getNextToken(); + if (!isascii(CurTok)) + return ErrorP("Expected unary operator"); + FnName = "unary"; + FnName += (char)CurTok; + Kind = 1; + getNextToken(); + break; + case tok_binary: + ... + +As with binary operators, we name unary operators with a name that +includes the operator character. This assists us at code generation +time. Speaking of, the final piece we need to add is codegen support for +unary operators. It looks like this: + +.. code-block:: c++ + + Value *UnaryExprAST::Codegen() { + Value *OperandV = Operand->Codegen(); + if (OperandV == 0) return 0; + + Function *F = TheModule->getFunction(std::string("unary")+Opcode); + if (F == 0) + return ErrorV("Unknown unary operator"); + + return Builder.CreateCall(F, OperandV, "unop"); + } + +This code is similar to, but simpler than, the code for binary +operators. It is simpler primarily because it doesn't need to handle any +predefined operators. + +Kicking the Tires +================= + +It is somewhat hard to believe, but with a few simple extensions we've +covered in the last chapters, we have grown a real-ish language. With +this, we can do a lot of interesting things, including I/O, math, and a +bunch of other things. 
For example, we can now add a nice sequencing +operator (printd is defined to print out the specified value and a +newline): + +:: + + ready> extern printd(x); + Read extern: + declare double @printd(double) + + ready> def binary : 1 (x y) 0; # Low-precedence operator that ignores operands. + .. + ready> printd(123) : printd(456) : printd(789); + 123.000000 + 456.000000 + 789.000000 + Evaluated to 0.000000 + +We can also define a bunch of other "primitive" operations, such as: + +:: + + # Logical unary not. + def unary!(v) + if v then + 0 + else + 1; + + # Unary negate. + def unary-(v) + 0-v; + + # Define > with the same precedence as <. + def binary> 10 (LHS RHS) + RHS < LHS; + + # Binary logical or, which does not short circuit. + def binary| 5 (LHS RHS) + if LHS then + 1 + else if RHS then + 1 + else + 0; + + # Binary logical and, which does not short circuit. + def binary& 6 (LHS RHS) + if !LHS then + 0 + else + !!RHS; + + # Define = with slightly lower precedence than relationals. + def binary = 9 (LHS RHS) + !(LHS < RHS | LHS > RHS); + + # Define ':' for sequencing: as a low-precedence operator that ignores operands + # and just returns the RHS. + def binary : 1 (x y) y; + +Given the previous if/then/else support, we can also define interesting +functions for I/O. For example, the following prints out a character +whose "density" reflects the value passed in: the lower the value, the +denser the character: + +:: + + ready> + + extern putchard(char) + def printdensity(d) + if d > 8 then + putchard(32) # ' ' + else if d > 4 then + putchard(46) # '.' + else if d > 2 then + putchard(43) # '+' + else + putchard(42); # '*' + ... + ready> printdensity(1): printdensity(2): printdensity(3): + printdensity(4): printdensity(5): printdensity(9): + putchard(10); + **++. + Evaluated to 0.000000 + +Based on these simple primitive operations, we can start to define more +interesting things. For example, here's a little function that solves +for the number of iterations it takes a function in the complex plane to +converge: + +:: + + # Determine whether the specific location diverges. + # Solve for z = z^2 + c in the complex plane. + def mandleconverger(real imag iters creal cimag) + if iters > 255 | (real*real + imag*imag > 4) then + iters + else + mandleconverger(real*real - imag*imag + creal, + 2*real*imag + cimag, + iters+1, creal, cimag); + + # Return the number of iterations required for the iteration to escape + def mandleconverge(real imag) + mandleconverger(real, imag, 0, real, imag); + +This "``z = z2 + c``" function is a beautiful little creature that is +the basis for computation of the `Mandelbrot +Set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_. Our +``mandelconverge`` function returns the number of iterations that it +takes for a complex orbit to escape, saturating to 255. This is not a +very useful function by itself, but if you plot its value over a +two-dimensional plane, you can see the Mandelbrot set. Given that we are +limited to using putchard here, our amazing graphical output is limited, +but we can whip together something using the density plotter above: + +:: + + # Compute and plot the mandlebrot set with the specified 2 dimensional range + # info. 
+ def mandelhelp(xmin xmax xstep ymin ymax ystep) + for y = ymin, y < ymax, ystep in ( + (for x = xmin, x < xmax, xstep in + printdensity(mandleconverge(x,y))) + : putchard(10) + ) + + # mandel - This is a convenient helper function for plotting the mandelbrot set + # from the specified position with the specified Magnification. + def mandel(realstart imagstart realmag imagmag) + mandelhelp(realstart, realstart+realmag*78, realmag, + imagstart, imagstart+imagmag*40, imagmag); + +Given this, we can try plotting out the mandlebrot set! Lets try it out: + +:: + + ready> mandel(-2.3, -1.3, 0.05, 0.07); + *******************************+++++++++++************************************* + *************************+++++++++++++++++++++++******************************* + **********************+++++++++++++++++++++++++++++**************************** + *******************+++++++++++++++++++++.. ...++++++++************************* + *****************++++++++++++++++++++++.... ...+++++++++*********************** + ***************+++++++++++++++++++++++..... ...+++++++++********************* + **************+++++++++++++++++++++++.... ....+++++++++******************** + *************++++++++++++++++++++++...... .....++++++++******************* + ************+++++++++++++++++++++....... .......+++++++****************** + ***********+++++++++++++++++++.... ... .+++++++***************** + **********+++++++++++++++++....... .+++++++**************** + *********++++++++++++++........... ...+++++++*************** + ********++++++++++++............ ...++++++++************** + ********++++++++++... .......... .++++++++************** + *******+++++++++..... .+++++++++************* + *******++++++++...... ..+++++++++************* + *******++++++....... ..+++++++++************* + *******+++++...... ..+++++++++************* + *******.... .... ...+++++++++************* + *******.... . ...+++++++++************* + *******+++++...... ...+++++++++************* + *******++++++....... ..+++++++++************* + *******++++++++...... .+++++++++************* + *******+++++++++..... ..+++++++++************* + ********++++++++++... .......... .++++++++************** + ********++++++++++++............ ...++++++++************** + *********++++++++++++++.......... ...+++++++*************** + **********++++++++++++++++........ .+++++++**************** + **********++++++++++++++++++++.... ... ..+++++++**************** + ***********++++++++++++++++++++++....... .......++++++++***************** + ************+++++++++++++++++++++++...... ......++++++++****************** + **************+++++++++++++++++++++++.... ....++++++++******************** + ***************+++++++++++++++++++++++..... ...+++++++++********************* + *****************++++++++++++++++++++++.... 
...++++++++*********************** + *******************+++++++++++++++++++++......++++++++************************* + *********************++++++++++++++++++++++.++++++++*************************** + *************************+++++++++++++++++++++++******************************* + ******************************+++++++++++++************************************ + ******************************************************************************* + ******************************************************************************* + ******************************************************************************* + Evaluated to 0.000000 + ready> mandel(-2, -1, 0.02, 0.04); + **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++ + ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++. + *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++... + *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++..... + ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........ + **************++++++++++++++++++++++++++++++++++++++++++++++++++++++........... + ************+++++++++++++++++++++++++++++++++++++++++++++++++++++.............. + ***********++++++++++++++++++++++++++++++++++++++++++++++++++........ . + **********++++++++++++++++++++++++++++++++++++++++++++++............. + ********+++++++++++++++++++++++++++++++++++++++++++.................. + *******+++++++++++++++++++++++++++++++++++++++....................... + ******+++++++++++++++++++++++++++++++++++........................... + *****++++++++++++++++++++++++++++++++............................ + *****++++++++++++++++++++++++++++............................... + ****++++++++++++++++++++++++++...... ......................... + ***++++++++++++++++++++++++......... ...... ........... + ***++++++++++++++++++++++............ + **+++++++++++++++++++++.............. + **+++++++++++++++++++................ + *++++++++++++++++++................. + *++++++++++++++++............ ... + *++++++++++++++.............. + *+++....++++................ + *.......... ........... + * + *.......... ........... + *+++....++++................ + *++++++++++++++.............. + *++++++++++++++++............ ... + *++++++++++++++++++................. + **+++++++++++++++++++................ + **+++++++++++++++++++++.............. + ***++++++++++++++++++++++............ + ***++++++++++++++++++++++++......... ...... ........... + ****++++++++++++++++++++++++++...... ......................... + *****++++++++++++++++++++++++++++............................... + *****++++++++++++++++++++++++++++++++............................ + ******+++++++++++++++++++++++++++++++++++........................... + *******+++++++++++++++++++++++++++++++++++++++....................... + ********+++++++++++++++++++++++++++++++++++++++++++.................. 
+ Evaluated to 0.000000 + ready> mandel(-0.9, -1.4, 0.02, 0.03); + ******************************************************************************* + ******************************************************************************* + ******************************************************************************* + **********+++++++++++++++++++++************************************************ + *+++++++++++++++++++++++++++++++++++++++*************************************** + +++++++++++++++++++++++++++++++++++++++++++++********************************** + ++++++++++++++++++++++++++++++++++++++++++++++++++***************************** + ++++++++++++++++++++++++++++++++++++++++++++++++++++++************************* + +++++++++++++++++++++++++++++++++++++++++++++++++++++++++********************** + +++++++++++++++++++++++++++++++++.........++++++++++++++++++******************* + +++++++++++++++++++++++++++++++.... ......+++++++++++++++++++**************** + +++++++++++++++++++++++++++++....... ........+++++++++++++++++++************** + ++++++++++++++++++++++++++++........ ........++++++++++++++++++++************ + +++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++********** + ++++++++++++++++++++++++++........... ....++++++++++++++++++++++******** + ++++++++++++++++++++++++............. .......++++++++++++++++++++++****** + +++++++++++++++++++++++............. ........+++++++++++++++++++++++**** + ++++++++++++++++++++++........... ..........++++++++++++++++++++++*** + ++++++++++++++++++++........... .........++++++++++++++++++++++* + ++++++++++++++++++............ ...........++++++++++++++++++++ + ++++++++++++++++............... .............++++++++++++++++++ + ++++++++++++++................. ...............++++++++++++++++ + ++++++++++++.................. .................++++++++++++++ + +++++++++.................. .................+++++++++++++ + ++++++........ . ......... ..++++++++++++ + ++............ ...... ....++++++++++ + .............. ...++++++++++ + .............. ....+++++++++ + .............. .....++++++++ + ............. ......++++++++ + ........... .......++++++++ + ......... ........+++++++ + ......... ........+++++++ + ......... ....+++++++ + ........ ...+++++++ + ....... ...+++++++ + ....+++++++ + .....+++++++ + ....+++++++ + ....+++++++ + ....+++++++ + Evaluated to 0.000000 + ready> ^D + +At this point, you may be starting to realize that Kaleidoscope is a +real and powerful language. It may not be self-similar :), but it can be +used to plot things that are! + +With this, we conclude the "adding user-defined operators" chapter of +the tutorial. We have successfully augmented our language, adding the +ability to extend the language in the library, and we have shown how +this can be used to build a simple but interesting end-user application +in Kaleidoscope. At this point, Kaleidoscope can build a variety of +applications that are functional and can call functions with +side-effects, but it can't actually define and mutate a variable itself. + +Strikingly, variable mutation is an important feature of some languages, +and it is not at all obvious how to `add support for mutable +variables <LangImpl7.html>`_ without having to add an "SSA construction" +phase to your front-end. In the next chapter, we will describe how you +can add variable mutation without building SSA in your front-end. + +Full Code Listing +================= + +Here is the complete code listing for our running example, enhanced with +the if/then/else and for expressions.. 
To build this example, use: + +.. code-block:: bash + + # Compile + clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy + # Run + ./toy + +On some platforms, you will need to specify -rdynamic or +-Wl,--export-dynamic when linking. This ensures that symbols defined in +the main executable are exported to the dynamic linker and so are +available for symbol resolution at run time. This is not needed if you +compile your support code into a shared library, although doing that +will cause problems on Windows. + +Here is the code: + +.. code-block:: c++ + + #include "llvm/DerivedTypes.h" + #include "llvm/ExecutionEngine/ExecutionEngine.h" + #include "llvm/ExecutionEngine/JIT.h" + #include "llvm/IRBuilder.h" + #include "llvm/LLVMContext.h" + #include "llvm/Module.h" + #include "llvm/PassManager.h" + #include "llvm/Analysis/Verifier.h" + #include "llvm/Analysis/Passes.h" + #include "llvm/DataLayout.h" + #include "llvm/Transforms/Scalar.h" + #include "llvm/Support/TargetSelect.h" + #include <cstdio> + #include <string> + #include <map> + #include <vector> + using namespace llvm; + + //===----------------------------------------------------------------------===// + // Lexer + //===----------------------------------------------------------------------===// + + // The lexer returns tokens [0-255] if it is an unknown character, otherwise one + // of these for known things. + enum Token { + tok_eof = -1, + + // commands + tok_def = -2, tok_extern = -3, + + // primary + tok_identifier = -4, tok_number = -5, + + // control + tok_if = -6, tok_then = -7, tok_else = -8, + tok_for = -9, tok_in = -10, + + // operators + tok_binary = -11, tok_unary = -12 + }; + + static std::string IdentifierStr; // Filled in if tok_identifier + static double NumVal; // Filled in if tok_number + + /// gettok - Return the next token from standard input. + static int gettok() { + static int LastChar = ' '; + + // Skip any whitespace. + while (isspace(LastChar)) + LastChar = getchar(); + + if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* + IdentifierStr = LastChar; + while (isalnum((LastChar = getchar()))) + IdentifierStr += LastChar; + + if (IdentifierStr == "def") return tok_def; + if (IdentifierStr == "extern") return tok_extern; + if (IdentifierStr == "if") return tok_if; + if (IdentifierStr == "then") return tok_then; + if (IdentifierStr == "else") return tok_else; + if (IdentifierStr == "for") return tok_for; + if (IdentifierStr == "in") return tok_in; + if (IdentifierStr == "binary") return tok_binary; + if (IdentifierStr == "unary") return tok_unary; + return tok_identifier; + } + + if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ + std::string NumStr; + do { + NumStr += LastChar; + LastChar = getchar(); + } while (isdigit(LastChar) || LastChar == '.'); + + NumVal = strtod(NumStr.c_str(), 0); + return tok_number; + } + + if (LastChar == '#') { + // Comment until end of line. + do LastChar = getchar(); + while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); + + if (LastChar != EOF) + return gettok(); + } + + // Check for end of file. Don't eat the EOF. + if (LastChar == EOF) + return tok_eof; + + // Otherwise, just return the character as its ascii value. 
+ int ThisChar = LastChar; + LastChar = getchar(); + return ThisChar; + } + + //===----------------------------------------------------------------------===// + // Abstract Syntax Tree (aka Parse Tree) + //===----------------------------------------------------------------------===// + + /// ExprAST - Base class for all expression nodes. + class ExprAST { + public: + virtual ~ExprAST() {} + virtual Value *Codegen() = 0; + }; + + /// NumberExprAST - Expression class for numeric literals like "1.0". + class NumberExprAST : public ExprAST { + double Val; + public: + NumberExprAST(double val) : Val(val) {} + virtual Value *Codegen(); + }; + + /// VariableExprAST - Expression class for referencing a variable, like "a". + class VariableExprAST : public ExprAST { + std::string Name; + public: + VariableExprAST(const std::string &name) : Name(name) {} + virtual Value *Codegen(); + }; + + /// UnaryExprAST - Expression class for a unary operator. + class UnaryExprAST : public ExprAST { + char Opcode; + ExprAST *Operand; + public: + UnaryExprAST(char opcode, ExprAST *operand) + : Opcode(opcode), Operand(operand) {} + virtual Value *Codegen(); + }; + + /// BinaryExprAST - Expression class for a binary operator. + class BinaryExprAST : public ExprAST { + char Op; + ExprAST *LHS, *RHS; + public: + BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) + : Op(op), LHS(lhs), RHS(rhs) {} + virtual Value *Codegen(); + }; + + /// CallExprAST - Expression class for function calls. + class CallExprAST : public ExprAST { + std::string Callee; + std::vector<ExprAST*> Args; + public: + CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) + : Callee(callee), Args(args) {} + virtual Value *Codegen(); + }; + + /// IfExprAST - Expression class for if/then/else. + class IfExprAST : public ExprAST { + ExprAST *Cond, *Then, *Else; + public: + IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) + : Cond(cond), Then(then), Else(_else) {} + virtual Value *Codegen(); + }; + + /// ForExprAST - Expression class for for/in. + class ForExprAST : public ExprAST { + std::string VarName; + ExprAST *Start, *End, *Step, *Body; + public: + ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, + ExprAST *step, ExprAST *body) + : VarName(varname), Start(start), End(end), Step(step), Body(body) {} + virtual Value *Codegen(); + }; + + /// PrototypeAST - This class represents the "prototype" for a function, + /// which captures its name, and its argument names (thus implicitly the number + /// of arguments the function takes), as well as if it is an operator. + class PrototypeAST { + std::string Name; + std::vector<std::string> Args; + bool isOperator; + unsigned Precedence; // Precedence if a binary op. + public: + PrototypeAST(const std::string &name, const std::vector<std::string> &args, + bool isoperator = false, unsigned prec = 0) + : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {} + + bool isUnaryOp() const { return isOperator && Args.size() == 1; } + bool isBinaryOp() const { return isOperator && Args.size() == 2; } + + char getOperatorName() const { + assert(isUnaryOp() || isBinaryOp()); + return Name[Name.size()-1]; + } + + unsigned getBinaryPrecedence() const { return Precedence; } + + Function *Codegen(); + }; + + /// FunctionAST - This class represents a function definition itself. 
+ class FunctionAST { + PrototypeAST *Proto; + ExprAST *Body; + public: + FunctionAST(PrototypeAST *proto, ExprAST *body) + : Proto(proto), Body(body) {} + + Function *Codegen(); + }; + + //===----------------------------------------------------------------------===// + // Parser + //===----------------------------------------------------------------------===// + + /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current + /// token the parser is looking at. getNextToken reads another token from the + /// lexer and updates CurTok with its results. + static int CurTok; + static int getNextToken() { + return CurTok = gettok(); + } + + /// BinopPrecedence - This holds the precedence for each binary operator that is + /// defined. + static std::map<char, int> BinopPrecedence; + + /// GetTokPrecedence - Get the precedence of the pending binary operator token. + static int GetTokPrecedence() { + if (!isascii(CurTok)) + return -1; + + // Make sure it's a declared binop. + int TokPrec = BinopPrecedence[CurTok]; + if (TokPrec <= 0) return -1; + return TokPrec; + } + + /// Error* - These are little helper functions for error handling. + ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} + PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } + FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } + + static ExprAST *ParseExpression(); + + /// identifierexpr + /// ::= identifier + /// ::= identifier '(' expression* ')' + static ExprAST *ParseIdentifierExpr() { + std::string IdName = IdentifierStr; + + getNextToken(); // eat identifier. + + if (CurTok != '(') // Simple variable ref. + return new VariableExprAST(IdName); + + // Call. + getNextToken(); // eat ( + std::vector<ExprAST*> Args; + if (CurTok != ')') { + while (1) { + ExprAST *Arg = ParseExpression(); + if (!Arg) return 0; + Args.push_back(Arg); + + if (CurTok == ')') break; + + if (CurTok != ',') + return Error("Expected ')' or ',' in argument list"); + getNextToken(); + } + } + + // Eat the ')'. + getNextToken(); + + return new CallExprAST(IdName, Args); + } + + /// numberexpr ::= number + static ExprAST *ParseNumberExpr() { + ExprAST *Result = new NumberExprAST(NumVal); + getNextToken(); // consume the number + return Result; + } + + /// parenexpr ::= '(' expression ')' + static ExprAST *ParseParenExpr() { + getNextToken(); // eat (. + ExprAST *V = ParseExpression(); + if (!V) return 0; + + if (CurTok != ')') + return Error("expected ')'"); + getNextToken(); // eat ). + return V; + } + + /// ifexpr ::= 'if' expression 'then' expression 'else' expression + static ExprAST *ParseIfExpr() { + getNextToken(); // eat the if. + + // condition. + ExprAST *Cond = ParseExpression(); + if (!Cond) return 0; + + if (CurTok != tok_then) + return Error("expected then"); + getNextToken(); // eat the then + + ExprAST *Then = ParseExpression(); + if (Then == 0) return 0; + + if (CurTok != tok_else) + return Error("expected else"); + + getNextToken(); + + ExprAST *Else = ParseExpression(); + if (!Else) return 0; + + return new IfExprAST(Cond, Then, Else); + } + + /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression + static ExprAST *ParseForExpr() { + getNextToken(); // eat the for. + + if (CurTok != tok_identifier) + return Error("expected identifier after for"); + + std::string IdName = IdentifierStr; + getNextToken(); // eat identifier. + + if (CurTok != '=') + return Error("expected '=' after for"); + getNextToken(); // eat '='. 
+ + + ExprAST *Start = ParseExpression(); + if (Start == 0) return 0; + if (CurTok != ',') + return Error("expected ',' after for start value"); + getNextToken(); + + ExprAST *End = ParseExpression(); + if (End == 0) return 0; + + // The step value is optional. + ExprAST *Step = 0; + if (CurTok == ',') { + getNextToken(); + Step = ParseExpression(); + if (Step == 0) return 0; + } + + if (CurTok != tok_in) + return Error("expected 'in' after for"); + getNextToken(); // eat 'in'. + + ExprAST *Body = ParseExpression(); + if (Body == 0) return 0; + + return new ForExprAST(IdName, Start, End, Step, Body); + } + + /// primary + /// ::= identifierexpr + /// ::= numberexpr + /// ::= parenexpr + /// ::= ifexpr + /// ::= forexpr + static ExprAST *ParsePrimary() { + switch (CurTok) { + default: return Error("unknown token when expecting an expression"); + case tok_identifier: return ParseIdentifierExpr(); + case tok_number: return ParseNumberExpr(); + case '(': return ParseParenExpr(); + case tok_if: return ParseIfExpr(); + case tok_for: return ParseForExpr(); + } + } + + /// unary + /// ::= primary + /// ::= '!' unary + static ExprAST *ParseUnary() { + // If the current token is not an operator, it must be a primary expr. + if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') + return ParsePrimary(); + + // If this is a unary operator, read it. + int Opc = CurTok; + getNextToken(); + if (ExprAST *Operand = ParseUnary()) + return new UnaryExprAST(Opc, Operand); + return 0; + } + + /// binoprhs + /// ::= ('+' unary)* + static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { + // If this is a binop, find its precedence. + while (1) { + int TokPrec = GetTokPrecedence(); + + // If this is a binop that binds at least as tightly as the current binop, + // consume it, otherwise we are done. + if (TokPrec < ExprPrec) + return LHS; + + // Okay, we know this is a binop. + int BinOp = CurTok; + getNextToken(); // eat binop + + // Parse the unary expression after the binary operator. + ExprAST *RHS = ParseUnary(); + if (!RHS) return 0; + + // If BinOp binds less tightly with RHS than the operator after RHS, let + // the pending operator take RHS as its LHS. + int NextPrec = GetTokPrecedence(); + if (TokPrec < NextPrec) { + RHS = ParseBinOpRHS(TokPrec+1, RHS); + if (RHS == 0) return 0; + } + + // Merge LHS/RHS. + LHS = new BinaryExprAST(BinOp, LHS, RHS); + } + } + + /// expression + /// ::= unary binoprhs + /// + static ExprAST *ParseExpression() { + ExprAST *LHS = ParseUnary(); + if (!LHS) return 0; + + return ParseBinOpRHS(0, LHS); + } + + /// prototype + /// ::= id '(' id* ')' + /// ::= binary LETTER number? (id, id) + /// ::= unary LETTER (id) + static PrototypeAST *ParsePrototype() { + std::string FnName; + + unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. + unsigned BinaryPrecedence = 30; + + switch (CurTok) { + default: + return ErrorP("Expected function name in prototype"); + case tok_identifier: + FnName = IdentifierStr; + Kind = 0; + getNextToken(); + break; + case tok_unary: + getNextToken(); + if (!isascii(CurTok)) + return ErrorP("Expected unary operator"); + FnName = "unary"; + FnName += (char)CurTok; + Kind = 1; + getNextToken(); + break; + case tok_binary: + getNextToken(); + if (!isascii(CurTok)) + return ErrorP("Expected binary operator"); + FnName = "binary"; + FnName += (char)CurTok; + Kind = 2; + getNextToken(); + + // Read the precedence if present. 
+ if (CurTok == tok_number) { + if (NumVal < 1 || NumVal > 100) + return ErrorP("Invalid precedecnce: must be 1..100"); + BinaryPrecedence = (unsigned)NumVal; + getNextToken(); + } + break; + } + + if (CurTok != '(') + return ErrorP("Expected '(' in prototype"); + + std::vector<std::string> ArgNames; + while (getNextToken() == tok_identifier) + ArgNames.push_back(IdentifierStr); + if (CurTok != ')') + return ErrorP("Expected ')' in prototype"); + + // success. + getNextToken(); // eat ')'. + + // Verify right number of names for operator. + if (Kind && ArgNames.size() != Kind) + return ErrorP("Invalid number of operands for operator"); + + return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence); + } + + /// definition ::= 'def' prototype expression + static FunctionAST *ParseDefinition() { + getNextToken(); // eat def. + PrototypeAST *Proto = ParsePrototype(); + if (Proto == 0) return 0; + + if (ExprAST *E = ParseExpression()) + return new FunctionAST(Proto, E); + return 0; + } + + /// toplevelexpr ::= expression + static FunctionAST *ParseTopLevelExpr() { + if (ExprAST *E = ParseExpression()) { + // Make an anonymous proto. + PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); + return new FunctionAST(Proto, E); + } + return 0; + } + + /// external ::= 'extern' prototype + static PrototypeAST *ParseExtern() { + getNextToken(); // eat extern. + return ParsePrototype(); + } + + //===----------------------------------------------------------------------===// + // Code Generation + //===----------------------------------------------------------------------===// + + static Module *TheModule; + static IRBuilder<> Builder(getGlobalContext()); + static std::map<std::string, Value*> NamedValues; + static FunctionPassManager *TheFPM; + + Value *ErrorV(const char *Str) { Error(Str); return 0; } + + Value *NumberExprAST::Codegen() { + return ConstantFP::get(getGlobalContext(), APFloat(Val)); + } + + Value *VariableExprAST::Codegen() { + // Look this variable up in the function. + Value *V = NamedValues[Name]; + return V ? V : ErrorV("Unknown variable name"); + } + + Value *UnaryExprAST::Codegen() { + Value *OperandV = Operand->Codegen(); + if (OperandV == 0) return 0; + + Function *F = TheModule->getFunction(std::string("unary")+Opcode); + if (F == 0) + return ErrorV("Unknown unary operator"); + + return Builder.CreateCall(F, OperandV, "unop"); + } + + Value *BinaryExprAST::Codegen() { + Value *L = LHS->Codegen(); + Value *R = RHS->Codegen(); + if (L == 0 || R == 0) return 0; + + switch (Op) { + case '+': return Builder.CreateFAdd(L, R, "addtmp"); + case '-': return Builder.CreateFSub(L, R, "subtmp"); + case '*': return Builder.CreateFMul(L, R, "multmp"); + case '<': + L = Builder.CreateFCmpULT(L, R, "cmptmp"); + // Convert bool 0/1 to double 0.0 or 1.0 + return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), + "booltmp"); + default: break; + } + + // If it wasn't a builtin binary operator, it must be a user defined one. Emit + // a call to it. + Function *F = TheModule->getFunction(std::string("binary")+Op); + assert(F && "binary operator not found!"); + + Value *Ops[2] = { L, R }; + return Builder.CreateCall(F, Ops, "binop"); + } + + Value *CallExprAST::Codegen() { + // Look up the name in the global module table. + Function *CalleeF = TheModule->getFunction(Callee); + if (CalleeF == 0) + return ErrorV("Unknown function referenced"); + + // If argument mismatch error. 
+ if (CalleeF->arg_size() != Args.size()) + return ErrorV("Incorrect # arguments passed"); + + std::vector<Value*> ArgsV; + for (unsigned i = 0, e = Args.size(); i != e; ++i) { + ArgsV.push_back(Args[i]->Codegen()); + if (ArgsV.back() == 0) return 0; + } + + return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); + } + + Value *IfExprAST::Codegen() { + Value *CondV = Cond->Codegen(); + if (CondV == 0) return 0; + + // Convert condition to a bool by comparing equal to 0.0. + CondV = Builder.CreateFCmpONE(CondV, + ConstantFP::get(getGlobalContext(), APFloat(0.0)), + "ifcond"); + + Function *TheFunction = Builder.GetInsertBlock()->getParent(); + + // Create blocks for the then and else cases. Insert the 'then' block at the + // end of the function. + BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); + BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); + BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); + + Builder.CreateCondBr(CondV, ThenBB, ElseBB); + + // Emit then value. + Builder.SetInsertPoint(ThenBB); + + Value *ThenV = Then->Codegen(); + if (ThenV == 0) return 0; + + Builder.CreateBr(MergeBB); + // Codegen of 'Then' can change the current block, update ThenBB for the PHI. + ThenBB = Builder.GetInsertBlock(); + + // Emit else block. + TheFunction->getBasicBlockList().push_back(ElseBB); + Builder.SetInsertPoint(ElseBB); + + Value *ElseV = Else->Codegen(); + if (ElseV == 0) return 0; + + Builder.CreateBr(MergeBB); + // Codegen of 'Else' can change the current block, update ElseBB for the PHI. + ElseBB = Builder.GetInsertBlock(); + + // Emit merge block. + TheFunction->getBasicBlockList().push_back(MergeBB); + Builder.SetInsertPoint(MergeBB); + PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, + "iftmp"); + + PN->addIncoming(ThenV, ThenBB); + PN->addIncoming(ElseV, ElseBB); + return PN; + } + + Value *ForExprAST::Codegen() { + // Output this as: + // ... + // start = startexpr + // goto loop + // loop: + // variable = phi [start, loopheader], [nextvariable, loopend] + // ... + // bodyexpr + // ... + // loopend: + // step = stepexpr + // nextvariable = variable + step + // endcond = endexpr + // br endcond, loop, endloop + // outloop: + + // Emit the start code first, without 'variable' in scope. + Value *StartVal = Start->Codegen(); + if (StartVal == 0) return 0; + + // Make the new basic block for the loop header, inserting after current + // block. + Function *TheFunction = Builder.GetInsertBlock()->getParent(); + BasicBlock *PreheaderBB = Builder.GetInsertBlock(); + BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); + + // Insert an explicit fall through from the current block to the LoopBB. + Builder.CreateBr(LoopBB); + + // Start insertion in LoopBB. + Builder.SetInsertPoint(LoopBB); + + // Start the PHI node with an entry for Start. + PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str()); + Variable->addIncoming(StartVal, PreheaderBB); + + // Within the loop, the variable is defined equal to the PHI node. If it + // shadows an existing variable, we have to restore it, so save it now. + Value *OldVal = NamedValues[VarName]; + NamedValues[VarName] = Variable; + + // Emit the body of the loop. This, like any other expr, can change the + // current BB. Note that we ignore the value computed by the body, but don't + // allow an error. + if (Body->Codegen() == 0) + return 0; + + // Emit the step value. 
+ Value *StepVal; + if (Step) { + StepVal = Step->Codegen(); + if (StepVal == 0) return 0; + } else { + // If not specified, use 1.0. + StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); + } + + Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar"); + + // Compute the end condition. + Value *EndCond = End->Codegen(); + if (EndCond == 0) return EndCond; + + // Convert condition to a bool by comparing equal to 0.0. + EndCond = Builder.CreateFCmpONE(EndCond, + ConstantFP::get(getGlobalContext(), APFloat(0.0)), + "loopcond"); + + // Create the "after loop" block and insert it. + BasicBlock *LoopEndBB = Builder.GetInsertBlock(); + BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); + + // Insert the conditional branch into the end of LoopEndBB. + Builder.CreateCondBr(EndCond, LoopBB, AfterBB); + + // Any new code will be inserted in AfterBB. + Builder.SetInsertPoint(AfterBB); + + // Add a new entry to the PHI node for the backedge. + Variable->addIncoming(NextVar, LoopEndBB); + + // Restore the unshadowed variable. + if (OldVal) + NamedValues[VarName] = OldVal; + else + NamedValues.erase(VarName); + + + // for expr always returns 0.0. + return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); + } + + Function *PrototypeAST::Codegen() { + // Make the function type: double(double,double) etc. + std::vector<Type*> Doubles(Args.size(), + Type::getDoubleTy(getGlobalContext())); + FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), + Doubles, false); + + Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); + + // If F conflicted, there was already something named 'Name'. If it has a + // body, don't allow redefinition or reextern. + if (F->getName() != Name) { + // Delete the one we just made and get the existing one. + F->eraseFromParent(); + F = TheModule->getFunction(Name); + + // If F already has a body, reject this. + if (!F->empty()) { + ErrorF("redefinition of function"); + return 0; + } + + // If F took a different number of args, reject. + if (F->arg_size() != Args.size()) { + ErrorF("redefinition of function with different # args"); + return 0; + } + } + + // Set names for all arguments. + unsigned Idx = 0; + for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); + ++AI, ++Idx) { + AI->setName(Args[Idx]); + + // Add arguments to variable symbol table. + NamedValues[Args[Idx]] = AI; + } + + return F; + } + + Function *FunctionAST::Codegen() { + NamedValues.clear(); + + Function *TheFunction = Proto->Codegen(); + if (TheFunction == 0) + return 0; + + // If this is an operator, install it. + if (Proto->isBinaryOp()) + BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence(); + + // Create a new basic block to start insertion into. + BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); + Builder.SetInsertPoint(BB); + + if (Value *RetVal = Body->Codegen()) { + // Finish off the function. + Builder.CreateRet(RetVal); + + // Validate the generated code, checking for consistency. + verifyFunction(*TheFunction); + + // Optimize the function. + TheFPM->run(*TheFunction); + + return TheFunction; + } + + // Error reading body, remove function. 
+ TheFunction->eraseFromParent(); + + if (Proto->isBinaryOp()) + BinopPrecedence.erase(Proto->getOperatorName()); + return 0; + } + + //===----------------------------------------------------------------------===// + // Top-Level parsing and JIT Driver + //===----------------------------------------------------------------------===// + + static ExecutionEngine *TheExecutionEngine; + + static void HandleDefinition() { + if (FunctionAST *F = ParseDefinition()) { + if (Function *LF = F->Codegen()) { + fprintf(stderr, "Read function definition:"); + LF->dump(); + } + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + static void HandleExtern() { + if (PrototypeAST *P = ParseExtern()) { + if (Function *F = P->Codegen()) { + fprintf(stderr, "Read extern: "); + F->dump(); + } + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + static void HandleTopLevelExpression() { + // Evaluate a top-level expression into an anonymous function. + if (FunctionAST *F = ParseTopLevelExpr()) { + if (Function *LF = F->Codegen()) { + // JIT the function, returning a function pointer. + void *FPtr = TheExecutionEngine->getPointerToFunction(LF); + + // Cast it to the right type (takes no arguments, returns a double) so we + // can call it as a native function. + double (*FP)() = (double (*)())(intptr_t)FPtr; + fprintf(stderr, "Evaluated to %f\n", FP()); + } + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + /// top ::= definition | external | expression | ';' + static void MainLoop() { + while (1) { + fprintf(stderr, "ready> "); + switch (CurTok) { + case tok_eof: return; + case ';': getNextToken(); break; // ignore top-level semicolons. + case tok_def: HandleDefinition(); break; + case tok_extern: HandleExtern(); break; + default: HandleTopLevelExpression(); break; + } + } + } + + //===----------------------------------------------------------------------===// + // "Library" functions that can be "extern'd" from user code. + //===----------------------------------------------------------------------===// + + /// putchard - putchar that takes a double and returns 0. + extern "C" + double putchard(double X) { + putchar((char)X); + return 0; + } + + /// printd - printf that takes a double prints it as "%f\n", returning 0. + extern "C" + double printd(double X) { + printf("%f\n", X); + return 0; + } + + //===----------------------------------------------------------------------===// + // Main driver code. + //===----------------------------------------------------------------------===// + + int main() { + InitializeNativeTarget(); + LLVMContext &Context = getGlobalContext(); + + // Install standard binary operators. + // 1 is lowest precedence. + BinopPrecedence['<'] = 10; + BinopPrecedence['+'] = 20; + BinopPrecedence['-'] = 20; + BinopPrecedence['*'] = 40; // highest. + + // Prime the first token. + fprintf(stderr, "ready> "); + getNextToken(); + + // Make the module, which holds all the code. + TheModule = new Module("my cool jit", Context); + + // Create the JIT. This takes ownership of the module. + std::string ErrStr; + TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); + if (!TheExecutionEngine) { + fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); + exit(1); + } + + FunctionPassManager OurFPM(TheModule); + + // Set up the optimizer pipeline. Start with registering info about how the + // target lays out data structures. 
+ OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout())); + // Provide basic AliasAnalysis support for GVN. + OurFPM.add(createBasicAliasAnalysisPass()); + // Do simple "peephole" optimizations and bit-twiddling optzns. + OurFPM.add(createInstructionCombiningPass()); + // Reassociate expressions. + OurFPM.add(createReassociatePass()); + // Eliminate Common SubExpressions. + OurFPM.add(createGVNPass()); + // Simplify the control flow graph (deleting unreachable blocks, etc). + OurFPM.add(createCFGSimplificationPass()); + + OurFPM.doInitialization(); + + // Set the global so the code gen can use this. + TheFPM = &OurFPM; + + // Run the main "interpreter loop" now. + MainLoop(); + + TheFPM = 0; + + // Print out all of the generated code. + TheModule->dump(); + + return 0; + } + +`Next: Extending the language: mutable variables / SSA +construction <LangImpl7.html>`_ + diff --git a/docs/tutorial/LangImpl7.html b/docs/tutorial/LangImpl7.html deleted file mode 100644 index 8fa99b1903..0000000000 --- a/docs/tutorial/LangImpl7.html +++ /dev/null @@ -1,2164 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA - construction</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <link rel="stylesheet" href="../_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>Kaleidoscope: Extending the Language: Mutable Variables</h1> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 7 - <ol> - <li><a href="#intro">Chapter 7 Introduction</a></li> - <li><a href="#why">Why is this a hard problem?</a></li> - <li><a href="#memory">Memory in LLVM</a></li> - <li><a href="#kalvars">Mutable Variables in Kaleidoscope</a></li> - <li><a href="#adjustments">Adjusting Existing Variables for - Mutation</a></li> - <li><a href="#assignment">New Assignment Operator</a></li> - <li><a href="#localvars">User-defined Local Variables</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="LangImpl8.html">Chapter 8</a>: Conclusion and other useful LLVM - tidbits</li> -</ul> - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="intro">Chapter 7 Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Welcome to Chapter 7 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. In chapters 1 through 6, we've built a very -respectable, albeit simple, <a -href="http://en.wikipedia.org/wiki/Functional_programming">functional -programming language</a>. In our journey, we learned some parsing techniques, -how to build and represent an AST, how to build LLVM IR, and how to optimize -the resultant code as well as JIT compile it.</p> - -<p>While Kaleidoscope is interesting as a functional language, the fact that it -is functional makes it "too easy" to generate LLVM IR for it. In particular, a -functional language makes it very easy to build LLVM IR directly in <a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>. 
-Since LLVM requires that the input code be in SSA form, this is a very nice -property and it is often unclear to newcomers how to generate code for an -imperative language with mutable variables.</p> - -<p>The short (and happy) summary of this chapter is that there is no need for -your front-end to build SSA form: LLVM provides highly tuned and well tested -support for this, though the way it works is a bit unexpected for some.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="why">Why is this a hard problem?</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -To understand why mutable variables cause complexities in SSA construction, -consider this extremely simple C example: -</p> - -<div class="doc_code"> -<pre> -int G, H; -int test(_Bool Condition) { - int X; - if (Condition) - X = G; - else - X = H; - return X; -} -</pre> -</div> - -<p>In this case, we have the variable "X", whose value depends on the path -executed in the program. Because there are two different possible values for X -before the return instruction, a PHI node is inserted to merge the two values. -The LLVM IR that we want for this example looks like this:</p> - -<div class="doc_code"> -<pre> -@G = weak global i32 0 ; type of @G is i32* -@H = weak global i32 0 ; type of @H is i32* - -define i32 @test(i1 %Condition) { -entry: - br i1 %Condition, label %cond_true, label %cond_false - -cond_true: - %X.0 = load i32* @G - br label %cond_next - -cond_false: - %X.1 = load i32* @H - br label %cond_next - -cond_next: - %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] - ret i32 %X.2 -} -</pre> -</div> - -<p>In this example, the loads from the G and H global variables are explicit in -the LLVM IR, and they live in the then/else branches of the if statement -(cond_true/cond_false). In order to merge the incoming values, the X.2 phi node -in the cond_next block selects the right value to use based on where control -flow is coming from: if control flow comes from the cond_false block, X.2 gets -the value of X.1. Alternatively, if control flow comes from cond_true, it gets -the value of X.0. The intent of this chapter is not to explain the details of -SSA form. For more information, see one of the many <a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online -references</a>.</p> - -<p>The question for this article is "who places the phi nodes when lowering -assignments to mutable variables?". The issue here is that LLVM -<em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it. -However, SSA construction requires non-trivial algorithms and data structures, -so it is inconvenient and wasteful for every front-end to have to reproduce this -logic.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="memory">Memory in LLVM</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>The 'trick' here is that while LLVM does require all register values to be -in SSA form, it does not require (or permit) memory objects to be in SSA form. -In the example above, note that the loads from G and H are direct accesses to -G and H: they are not renamed or versioned. This differs from some other -compiler systems, which do try to version memory objects. 
In LLVM, instead of -encoding dataflow analysis of memory into the LLVM IR, it is handled with <a -href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on -demand.</p> - -<p> -With this in mind, the high-level idea is that we want to make a stack variable -(which lives in memory, because it is on the stack) for each mutable object in -a function. To take advantage of this trick, we need to talk about how LLVM -represents stack variables. -</p> - -<p>In LLVM, all memory accesses are explicit with load/store instructions, and -it is carefully designed not to have (or need) an "address-of" operator. Notice -how the type of the @G/@H global variables is actually "i32*" even though the -variable is defined as "i32". What this means is that @G defines <em>space</em> -for an i32 in the global data area, but its <em>name</em> actually refers to the -address for that space. Stack variables work the same way, except that instead of -being declared with global variable definitions, they are declared with the -<a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p> - -<div class="doc_code"> -<pre> -define i32 @example() { -entry: - %X = alloca i32 ; type of %X is i32*. - ... - %tmp = load i32* %X ; load the stack value %X from the stack. - %tmp2 = add i32 %tmp, 1 ; increment it - store i32 %tmp2, i32* %X ; store it back - ... -</pre> -</div> - -<p>This code shows an example of how you can declare and manipulate a stack -variable in the LLVM IR. Stack memory allocated with the alloca instruction is -fully general: you can pass the address of the stack slot to functions, you can -store it in other variables, etc. In our example above, we could rewrite the -example to use the alloca technique to avoid using a PHI node:</p> - -<div class="doc_code"> -<pre> -@G = weak global i32 0 ; type of @G is i32* -@H = weak global i32 0 ; type of @H is i32* - -define i32 @test(i1 %Condition) { -entry: - %X = alloca i32 ; type of %X is i32*. - br i1 %Condition, label %cond_true, label %cond_false - -cond_true: - %X.0 = load i32* @G - store i32 %X.0, i32* %X ; Update X - br label %cond_next - -cond_false: - %X.1 = load i32* @H - store i32 %X.1, i32* %X ; Update X - br label %cond_next - -cond_next: - %X.2 = load i32* %X ; Read X - ret i32 %X.2 -} -</pre> -</div> - -<p>With this, we have discovered a way to handle arbitrary mutable variables -without the need to create Phi nodes at all:</p> - -<ol> -<li>Each mutable variable becomes a stack allocation.</li> -<li>Each read of the variable becomes a load from the stack.</li> -<li>Each update of the variable becomes a store to the stack.</li> -<li>Taking the address of a variable just uses the stack address directly.</li> -</ol> - -<p>While this solution has solved our immediate problem, it introduced another -one: we have now apparently introduced a lot of stack traffic for very simple -and common operations, a major performance problem. Fortunately for us, the -LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles -this case, promoting allocas like this into SSA registers, inserting Phi nodes -as appropriate. 
If you run this example through the pass, for example, you'll -get:</p> - -<div class="doc_code"> -<pre> -$ <b>llvm-as < example.ll | opt -mem2reg | llvm-dis</b> -@G = weak global i32 0 -@H = weak global i32 0 - -define i32 @test(i1 %Condition) { -entry: - br i1 %Condition, label %cond_true, label %cond_false - -cond_true: - %X.0 = load i32* @G - br label %cond_next - -cond_false: - %X.1 = load i32* @H - br label %cond_next - -cond_next: - %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] - ret i32 %X.01 -} -</pre> -</div> - -<p>The mem2reg pass implements the standard "iterated dominance frontier" -algorithm for constructing SSA form and has a number of optimizations that speed -up (very common) degenerate cases. The mem2reg optimization pass is the answer to dealing -with mutable variables, and we highly recommend that you depend on it. Note that -mem2reg only works on variables in certain circumstances:</p> - -<ol> -<li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it -promotes them. It does not apply to global variables or heap allocations.</li> - -<li>mem2reg only looks for alloca instructions in the entry block of the -function. Being in the entry block guarantees that the alloca is only executed -once, which makes analysis simpler.</li> - -<li>mem2reg only promotes allocas whose uses are direct loads and stores. If -the address of the stack object is passed to a function, or if any funny pointer -arithmetic is involved, the alloca will not be promoted.</li> - -<li>mem2reg only works on allocas of <a -href="../LangRef.html#t_classifications">first class</a> -values (such as pointers, scalars and vectors), and only if the array size -of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of -promoting structs or arrays to registers. Note that the "scalarrepl" pass is -more powerful and can promote structs, "unions", and arrays in many cases.</li> - -</ol> - -<p> -All of these properties are easy to satisfy for most imperative languages, and -we'll illustrate it below with Kaleidoscope. The final question you may be -asking is: should I bother with this nonsense for my front-end? Wouldn't it be -better if I just did SSA construction directly, avoiding use of the mem2reg -optimization pass? In short, we strongly recommend that you use this technique -for building SSA form, unless there is an extremely good reason not to. Using -this technique is:</p> - -<ul> -<li>Proven and well tested: llvm-gcc and clang both use this technique for local -mutable variables. As such, the most common clients of LLVM are using this to -handle a bulk of their variables. You can be sure that bugs are found fast and -fixed early.</li> - -<li>Extremely Fast: mem2reg has a number of special cases that make it fast in -common cases as well as fully general. For example, it has fast-paths for -variables that are only used in a single block, variables that only have one -assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc. -</li> - -<li>Needed for debug info generation: <a href="../SourceLevelDebugging.html"> -Debug information in LLVM</a> relies on having the address of the variable -exposed so that debug info can be attached to it. This technique dovetails -very naturally with this style of debug info.</li> -</ul> - -<p>If nothing else, this makes it much easier to get your front-end up and -running, and is very simple to implement. Lets extend Kaleidoscope with mutable -variables now! 
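Before moving on, here is a small illustration of those restrictions; the functions and names are invented for this note and are not part of the tutorial. The first alloca meets every condition and gets rewritten into an SSA register; the second has its address escape into a call, so mem2reg leaves it in memory:

.. code-block:: llvm

    define double @promotable() {
    entry:
      ; In the entry block, a first class type, and only direct loads and
      ; stores: mem2reg promotes this alloca.
      %X = alloca double
      store double 1.000000e+00, double* %X
      %tmp = load double* %X
      ret double %tmp
    }

    declare void @takes_addr(double*)

    define double @not_promotable() {
    entry:
      ; The address of the slot is passed to a function, so this alloca is
      ; left alone and the load stays in the output.
      %X = alloca double
      call void @takes_addr(double* %X)
      %tmp = load double* %X
      ret double %tmp
    }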
-</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="kalvars">Mutable Variables in Kaleidoscope</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Now that we know the sort of problem we want to tackle, lets see what this -looks like in the context of our little Kaleidoscope language. We're going to -add two features:</p> - -<ol> -<li>The ability to mutate variables with the '=' operator.</li> -<li>The ability to define new variables.</li> -</ol> - -<p>While the first item is really what this is about, we only have variables -for incoming arguments as well as for induction variables, and redefining those only -goes so far :). Also, the ability to define new variables is a -useful thing regardless of whether you will be mutating them. Here's a -motivating example that shows how we could use these:</p> - -<div class="doc_code"> -<pre> -# Define ':' for sequencing: as a low-precedence operator that ignores operands -# and just returns the RHS. -def binary : 1 (x y) y; - -# Recursive fib, we could do this before. -def fib(x) - if (x < 3) then - 1 - else - fib(x-1)+fib(x-2); - -# Iterative fib. -def fibi(x) - <b>var a = 1, b = 1, c in</b> - (for i = 3, i < x in - <b>c = a + b</b> : - <b>a = b</b> : - <b>b = c</b>) : - b; - -# Call it. -fibi(10); -</pre> -</div> - -<p> -In order to mutate variables, we have to change our existing variables to use -the "alloca trick". Once we have that, we'll add our new operator, then extend -Kaleidoscope to support new variable definitions. -</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="adjustments">Adjusting Existing Variables for Mutation</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -The symbol table in Kaleidoscope is managed at code generation time by the -'<tt>NamedValues</tt>' map. This map currently keeps track of the LLVM "Value*" -that holds the double value for the named variable. In order to support -mutation, we need to change this slightly, so that it <tt>NamedValues</tt> holds -the <em>memory location</em> of the variable in question. Note that this -change is a refactoring: it changes the structure of the code, but does not -(by itself) change the behavior of the compiler. All of these changes are -isolated in the Kaleidoscope code generator.</p> - -<p> -At this point in Kaleidoscope's development, it only supports variables for two -things: incoming arguments to functions and the induction variable of 'for' -loops. For consistency, we'll allow mutation of these variables in addition to -other user-defined variables. This means that these will both need memory -locations. -</p> - -<p>To start our transformation of Kaleidoscope, we'll change the NamedValues -map so that it maps to AllocaInst* instead of Value*. Once we do this, the C++ -compiler will tell us what parts of the code we need to update:</p> - -<div class="doc_code"> -<pre> -static std::map<std::string, AllocaInst*> NamedValues; -</pre> -</div> - -<p>Also, since we will need to create these alloca's, we'll use a helper -function that ensures that the allocas are created in the entry block of the -function:</p> - -<div class="doc_code"> -<pre> -/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of -/// the function. This is used for mutable variables etc. 
-static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction, - const std::string &VarName) { - IRBuilder<> TmpB(&TheFunction->getEntryBlock(), - TheFunction->getEntryBlock().begin()); - return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0, - VarName.c_str()); -} -</pre> -</div> - -<p>This funny looking code creates an IRBuilder object that is pointing at -the first instruction (.begin()) of the entry block. It then creates an alloca -with the expected name and returns it. Because all values in Kaleidoscope are -doubles, there is no need to pass in a type to use.</p> - -<p>With this in place, the first functionality change we want to make is to -variable references. In our new scheme, variables live on the stack, so code -generating a reference to them actually needs to produce a load from the stack -slot:</p> - -<div class="doc_code"> -<pre> -Value *VariableExprAST::Codegen() { - // Look this variable up in the function. - Value *V = NamedValues[Name]; - if (V == 0) return ErrorV("Unknown variable name"); - - <b>// Load the value. - return Builder.CreateLoad(V, Name.c_str());</b> -} -</pre> -</div> - -<p>As you can see, this is pretty straightforward. Now we need to update the -things that define the variables to set up the alloca. We'll start with -<tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for -the unabridged code):</p> - -<div class="doc_code"> -<pre> - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - - <b>// Create an alloca for the variable in the entry block. - AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b> - - // Emit the start code first, without 'variable' in scope. - Value *StartVal = Start->Codegen(); - if (StartVal == 0) return 0; - - <b>// Store the value into the alloca. - Builder.CreateStore(StartVal, Alloca);</b> - ... - - // Compute the end condition. - Value *EndCond = End->Codegen(); - if (EndCond == 0) return EndCond; - - <b>// Reload, increment, and restore the alloca. This handles the case where - // the body of the loop mutates the variable. - Value *CurVar = Builder.CreateLoad(Alloca); - Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar"); - Builder.CreateStore(NextVar, Alloca);</b> - ... -</pre> -</div> - -<p>This code is virtually identical to the code <a -href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>. The -big difference is that we no longer have to construct a PHI node, and we use -load/store to access the variable as needed.</p> - -<p>To support mutable argument variables, we need to also make allocas for them. -The code for this is also pretty simple:</p> - -<div class="doc_code"> -<pre> -/// CreateArgumentAllocas - Create an alloca for each argument and register the -/// argument in the symbol table so that references to it will succeed. -void PrototypeAST::CreateArgumentAllocas(Function *F) { - Function::arg_iterator AI = F->arg_begin(); - for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) { - // Create an alloca for this variable. - AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]); - - // Store the initial value into the alloca. - Builder.CreateStore(AI, Alloca); - - // Add arguments to variable symbol table. - NamedValues[Args[Idx]] = Alloca; - } -} -</pre> -</div> - -<p>For each argument, we make an alloca, store the input value to the function -into the alloca, and register the alloca as the memory location for the -argument. 
This method gets invoked by <tt>FunctionAST::Codegen</tt> right after -it sets up the entry block for the function.</p> - -<p>The final missing piece is adding the mem2reg pass, which allows us to get -good codegen once again:</p> - -<div class="doc_code"> -<pre> - // Set up the optimizer pipeline. Start with registering info about how the - // target lays out data structures. - OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout())); - <b>// Promote allocas to registers. - OurFPM.add(createPromoteMemoryToRegisterPass());</b> - // Do simple "peephole" optimizations and bit-twiddling optzns. - OurFPM.add(createInstructionCombiningPass()); - // Reassociate expressions. - OurFPM.add(createReassociatePass()); -</pre> -</div> - -<p>It is interesting to see what the code looks like before and after the -mem2reg optimization runs. For example, this is the before/after code for our -recursive fib function. Before the optimization:</p> - -<div class="doc_code"> -<pre> -define double @fib(double %x) { -entry: - <b>%x1 = alloca double - store double %x, double* %x1 - %x2 = load double* %x1</b> - %cmptmp = fcmp ult double %x2, 3.000000e+00 - %booltmp = uitofp i1 %cmptmp to double - %ifcond = fcmp one double %booltmp, 0.000000e+00 - br i1 %ifcond, label %then, label %else - -then: ; preds = %entry - br label %ifcont - -else: ; preds = %entry - <b>%x3 = load double* %x1</b> - %subtmp = fsub double %x3, 1.000000e+00 - %calltmp = call double @fib(double %subtmp) - <b>%x4 = load double* %x1</b> - %subtmp5 = fsub double %x4, 2.000000e+00 - %calltmp6 = call double @fib(double %subtmp5) - %addtmp = fadd double %calltmp, %calltmp6 - br label %ifcont - -ifcont: ; preds = %else, %then - %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] - ret double %iftmp -} -</pre> -</div> - -<p>Here there is only one variable (x, the input argument) but you can still -see the extremely simple-minded code generation strategy we are using. In the -entry block, an alloca is created, and the initial input value is stored into -it. Each reference to the variable does a reload from the stack. Also, note -that we didn't modify the if/then/else expression, so it still inserts a PHI -node. While we could make an alloca for it, it is actually easier to create a -PHI node for it, so we still just make the PHI.</p> - -<p>Here is the code after the mem2reg pass runs:</p> - -<div class="doc_code"> -<pre> -define double @fib(double %x) { -entry: - %cmptmp = fcmp ult double <b>%x</b>, 3.000000e+00 - %booltmp = uitofp i1 %cmptmp to double - %ifcond = fcmp one double %booltmp, 0.000000e+00 - br i1 %ifcond, label %then, label %else - -then: - br label %ifcont - -else: - %subtmp = fsub double <b>%x</b>, 1.000000e+00 - %calltmp = call double @fib(double %subtmp) - %subtmp5 = fsub double <b>%x</b>, 2.000000e+00 - %calltmp6 = call double @fib(double %subtmp5) - %addtmp = fadd double %calltmp, %calltmp6 - br label %ifcont - -ifcont: ; preds = %else, %then - %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] - ret double %iftmp -} -</pre> -</div> - -<p>This is a trivial case for mem2reg, since there are no redefinitions of the -variable. 
The point of showing this is to calm your tension about inserting -such blatent inefficiencies :).</p> - -<p>After the rest of the optimizers run, we get:</p> - -<div class="doc_code"> -<pre> -define double @fib(double %x) { -entry: - %cmptmp = fcmp ult double %x, 3.000000e+00 - %booltmp = uitofp i1 %cmptmp to double - %ifcond = fcmp ueq double %booltmp, 0.000000e+00 - br i1 %ifcond, label %else, label %ifcont - -else: - %subtmp = fsub double %x, 1.000000e+00 - %calltmp = call double @fib(double %subtmp) - %subtmp5 = fsub double %x, 2.000000e+00 - %calltmp6 = call double @fib(double %subtmp5) - %addtmp = fadd double %calltmp, %calltmp6 - ret double %addtmp - -ifcont: - ret double 1.000000e+00 -} -</pre> -</div> - -<p>Here we see that the simplifycfg pass decided to clone the return instruction -into the end of the 'else' block. This allowed it to eliminate some branches -and the PHI node.</p> - -<p>Now that all symbol table references are updated to use stack variables, -we'll add the assignment operator.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="assignment">New Assignment Operator</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>With our current framework, adding a new assignment operator is really -simple. We will parse it just like any other binary operator, but handle it -internally (instead of allowing the user to define it). The first step is to -set a precedence:</p> - -<div class="doc_code"> -<pre> - int main() { - // Install standard binary operators. - // 1 is lowest precedence. - <b>BinopPrecedence['='] = 2;</b> - BinopPrecedence['<'] = 10; - BinopPrecedence['+'] = 20; - BinopPrecedence['-'] = 20; -</pre> -</div> - -<p>Now that the parser knows the precedence of the binary operator, it takes -care of all the parsing and AST generation. We just need to implement codegen -for the assignment operator. This looks like:</p> - -<div class="doc_code"> -<pre> -Value *BinaryExprAST::Codegen() { - // Special case '=' because we don't want to emit the LHS as an expression. - if (Op == '=') { - // Assignment requires the LHS to be an identifier. - VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS); - if (!LHSE) - return ErrorV("destination of '=' must be a variable"); -</pre> -</div> - -<p>Unlike the rest of the binary operators, our assignment operator doesn't -follow the "emit LHS, emit RHS, do computation" model. As such, it is handled -as a special case before the other binary operators are handled. The other -strange thing is that it requires the LHS to be a variable. It is invalid to -have "(x+1) = expr" - only things like "x = expr" are allowed. -</p> - -<div class="doc_code"> -<pre> - // Codegen the RHS. - Value *Val = RHS->Codegen(); - if (Val == 0) return 0; - - // Look up the name. - Value *Variable = NamedValues[LHSE->getName()]; - if (Variable == 0) return ErrorV("Unknown variable name"); - - Builder.CreateStore(Val, Variable); - return Val; - } - ... -</pre> -</div> - -<p>Once we have the variable, codegen'ing the assignment is straightforward: -we emit the RHS of the assignment, create a store, and return the computed -value. Returning a value allows for chained assignments like "X = (Y = Z)".</p> - -<p>Now that we have an assignment operator, we can mutate loop variables and -arguments. For example, we can now run code like this:</p> - -<div class="doc_code"> -<pre> -# Function to print a double. 
-extern printd(x); - -# Define ':' for sequencing: as a low-precedence operator that ignores operands -# and just returns the RHS. -def binary : 1 (x y) y; - -def test(x) - printd(x) : - x = 4 : - printd(x); - -test(123); -</pre> -</div> - -<p>When run, this example prints "123" and then "4", showing that we did -actually mutate the value! Okay, we have now officially implemented our goal: -getting this to work requires SSA construction in the general case. However, -to be really useful, we want the ability to define our own local variables, lets -add this next! -</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="localvars">User-defined Local Variables</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Adding var/in is just like any other other extensions we made to -Kaleidoscope: we extend the lexer, the parser, the AST and the code generator. -The first step for adding our new 'var/in' construct is to extend the lexer. -As before, this is pretty trivial, the code looks like this:</p> - -<div class="doc_code"> -<pre> -enum Token { - ... - <b>// var definition - tok_var = -13</b> -... -} -... -static int gettok() { -... - if (IdentifierStr == "in") return tok_in; - if (IdentifierStr == "binary") return tok_binary; - if (IdentifierStr == "unary") return tok_unary; - <b>if (IdentifierStr == "var") return tok_var;</b> - return tok_identifier; -... -</pre> -</div> - -<p>The next step is to define the AST node that we will construct. For var/in, -it looks like this:</p> - -<div class="doc_code"> -<pre> -/// VarExprAST - Expression class for var/in -class VarExprAST : public ExprAST { - std::vector<std::pair<std::string, ExprAST*> > VarNames; - ExprAST *Body; -public: - VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames, - ExprAST *body) - : VarNames(varnames), Body(body) {} - - virtual Value *Codegen(); -}; -</pre> -</div> - -<p>var/in allows a list of names to be defined all at once, and each name can -optionally have an initializer value. As such, we capture this information in -the VarNames vector. Also, var/in has a body, this body is allowed to access -the variables defined by the var/in.</p> - -<p>With this in place, we can define the parser pieces. The first thing we do is add -it as a primary expression:</p> - -<div class="doc_code"> -<pre> -/// primary -/// ::= identifierexpr -/// ::= numberexpr -/// ::= parenexpr -/// ::= ifexpr -/// ::= forexpr -<b>/// ::= varexpr</b> -static ExprAST *ParsePrimary() { - switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - case tok_if: return ParseIfExpr(); - case tok_for: return ParseForExpr(); - <b>case tok_var: return ParseVarExpr();</b> - } -} -</pre> -</div> - -<p>Next we define ParseVarExpr:</p> - -<div class="doc_code"> -<pre> -/// varexpr ::= 'var' identifier ('=' expression)? -// (',' identifier ('=' expression)?)* 'in' expression -static ExprAST *ParseVarExpr() { - getNextToken(); // eat the var. - - std::vector<std::pair<std::string, ExprAST*> > VarNames; - - // At least one variable name is required. - if (CurTok != tok_identifier) - return Error("expected identifier after var"); -</pre> -</div> - -<p>The first part of this code parses the list of identifier/expr pairs into the -local <tt>VarNames</tt> vector. 
- -<div class="doc_code"> -<pre> - while (1) { - std::string Name = IdentifierStr; - getNextToken(); // eat identifier. - - // Read the optional initializer. - ExprAST *Init = 0; - if (CurTok == '=') { - getNextToken(); // eat the '='. - - Init = ParseExpression(); - if (Init == 0) return 0; - } - - VarNames.push_back(std::make_pair(Name, Init)); - - // End of var list, exit loop. - if (CurTok != ',') break; - getNextToken(); // eat the ','. - - if (CurTok != tok_identifier) - return Error("expected identifier list after var"); - } -</pre> -</div> - -<p>Once all the variables are parsed, we then parse the body and create the -AST node:</p> - -<div class="doc_code"> -<pre> - // At this point, we have to have 'in'. - if (CurTok != tok_in) - return Error("expected 'in' keyword after 'var'"); - getNextToken(); // eat 'in'. - - ExprAST *Body = ParseExpression(); - if (Body == 0) return 0; - - return new VarExprAST(VarNames, Body); -} -</pre> -</div> - -<p>Now that we can parse and represent the code, we need to support emission of -LLVM IR for it. This code starts out with:</p> - -<div class="doc_code"> -<pre> -Value *VarExprAST::Codegen() { - std::vector<AllocaInst *> OldBindings; - - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - - // Register all variables and emit their initializer. - for (unsigned i = 0, e = VarNames.size(); i != e; ++i) { - const std::string &VarName = VarNames[i].first; - ExprAST *Init = VarNames[i].second; -</pre> -</div> - -<p>Basically it loops over all the variables, installing them one at a time. -For each variable we put into the symbol table, we remember the previous value -that we replace in OldBindings.</p> - -<div class="doc_code"> -<pre> - // Emit the initializer before adding the variable to scope, this prevents - // the initializer from referencing the variable itself, and permits stuff - // like this: - // var a = 1 in - // var a = a in ... # refers to outer 'a'. - Value *InitVal; - if (Init) { - InitVal = Init->Codegen(); - if (InitVal == 0) return 0; - } else { // If not specified, use 0.0. - InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0)); - } - - AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); - Builder.CreateStore(InitVal, Alloca); - - // Remember the old variable binding so that we can restore the binding when - // we unrecurse. - OldBindings.push_back(NamedValues[VarName]); - - // Remember this binding. - NamedValues[VarName] = Alloca; - } -</pre> -</div> - -<p>There are more comments here than code. The basic idea is that we emit the -initializer, create the alloca, then update the symbol table to point to it. -Once all the variables are installed in the symbol table, we evaluate the body -of the var/in expression:</p> - -<div class="doc_code"> -<pre> - // Codegen the body, now that all vars are in scope. - Value *BodyVal = Body->Codegen(); - if (BodyVal == 0) return 0; -</pre> -</div> - -<p>Finally, before returning, we restore the previous variable bindings:</p> - -<div class="doc_code"> -<pre> - // Pop all our variables from scope. - for (unsigned i = 0, e = VarNames.size(); i != e; ++i) - NamedValues[VarNames[i].first] = OldBindings[i]; - - // Return the body computation. - return BodyVal; -} -</pre> -</div> - -<p>The end result of all of this is that we get properly scoped variable -definitions, and we even (trivially) allow mutation of them :).</p> - -<p>With this, we completed what we set out to do. Our nice iterative fib -example from the intro compiles and runs just fine. 
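As a quick check of the rules described above, a session along the following lines is what you would expect. This is a hypothetical transcript, with output in the "Evaluated to" format used by the driver; the first expression exercises the shadowing rule for initializers, and the second the chained-assignment behavior::

    ready> var a = 1 in (var a = a + 1 in a) + a;
    Evaluated to 3.000000
    ready> var x = 1, y = 2 in (x = (y = 5)) + y;
    Evaluated to 10.000000

The inner ``a`` is initialized from the outer one (giving 2), its binding disappears once the inner body is done (so the final ``+ a`` sees 1 again), and the chained assignment stores 5 into both ``x`` and ``y``.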
The mem2reg pass optimizes -all of our stack variables into SSA registers, inserting PHI nodes where needed, -and our front-end remains simple: no "iterated dominance frontier" computation -anywhere in sight.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="code">Full Code Listing</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -Here is the complete code listing for our running example, enhanced with mutable -variables and var/in support. To build this example, use: -</p> - -<div class="doc_code"> -<pre> -# Compile -clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy -# Run -./toy -</pre> -</div> - -<p>Here is the code:</p> - -<div class="doc_code"> -<pre> -#include "llvm/DerivedTypes.h" -#include "llvm/ExecutionEngine/ExecutionEngine.h" -#include "llvm/ExecutionEngine/JIT.h" -#include "llvm/IRBuilder.h" -#include "llvm/LLVMContext.h" -#include "llvm/Module.h" -#include "llvm/PassManager.h" -#include "llvm/Analysis/Verifier.h" -#include "llvm/Analysis/Passes.h" -#include "llvm/DataLayout.h" -#include "llvm/Transforms/Scalar.h" -#include "llvm/Support/TargetSelect.h" -#include <cstdio> -#include <string> -#include <map> -#include <vector> -using namespace llvm; - -//===----------------------------------------------------------------------===// -// Lexer -//===----------------------------------------------------------------------===// - -// The lexer returns tokens [0-255] if it is an unknown character, otherwise one -// of these for known things. -enum Token { - tok_eof = -1, - - // commands - tok_def = -2, tok_extern = -3, - - // primary - tok_identifier = -4, tok_number = -5, - - // control - tok_if = -6, tok_then = -7, tok_else = -8, - tok_for = -9, tok_in = -10, - - // operators - tok_binary = -11, tok_unary = -12, - - // var definition - tok_var = -13 -}; - -static std::string IdentifierStr; // Filled in if tok_identifier -static double NumVal; // Filled in if tok_number - -/// gettok - Return the next token from standard input. -static int gettok() { - static int LastChar = ' '; - - // Skip any whitespace. - while (isspace(LastChar)) - LastChar = getchar(); - - if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* - IdentifierStr = LastChar; - while (isalnum((LastChar = getchar()))) - IdentifierStr += LastChar; - - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - if (IdentifierStr == "if") return tok_if; - if (IdentifierStr == "then") return tok_then; - if (IdentifierStr == "else") return tok_else; - if (IdentifierStr == "for") return tok_for; - if (IdentifierStr == "in") return tok_in; - if (IdentifierStr == "binary") return tok_binary; - if (IdentifierStr == "unary") return tok_unary; - if (IdentifierStr == "var") return tok_var; - return tok_identifier; - } - - if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ - std::string NumStr; - do { - NumStr += LastChar; - LastChar = getchar(); - } while (isdigit(LastChar) || LastChar == '.'); - - NumVal = strtod(NumStr.c_str(), 0); - return tok_number; - } - - if (LastChar == '#') { - // Comment until end of line. - do LastChar = getchar(); - while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); - - if (LastChar != EOF) - return gettok(); - } - - // Check for end of file. Don't eat the EOF. - if (LastChar == EOF) - return tok_eof; - - // Otherwise, just return the character as its ascii value. 
- int ThisChar = LastChar; - LastChar = getchar(); - return ThisChar; -} - -//===----------------------------------------------------------------------===// -// Abstract Syntax Tree (aka Parse Tree) -//===----------------------------------------------------------------------===// - -/// ExprAST - Base class for all expression nodes. -class ExprAST { -public: - virtual ~ExprAST() {} - virtual Value *Codegen() = 0; -}; - -/// NumberExprAST - Expression class for numeric literals like "1.0". -class NumberExprAST : public ExprAST { - double Val; -public: - NumberExprAST(double val) : Val(val) {} - virtual Value *Codegen(); -}; - -/// VariableExprAST - Expression class for referencing a variable, like "a". -class VariableExprAST : public ExprAST { - std::string Name; -public: - VariableExprAST(const std::string &name) : Name(name) {} - const std::string &getName() const { return Name; } - virtual Value *Codegen(); -}; - -/// UnaryExprAST - Expression class for a unary operator. -class UnaryExprAST : public ExprAST { - char Opcode; - ExprAST *Operand; -public: - UnaryExprAST(char opcode, ExprAST *operand) - : Opcode(opcode), Operand(operand) {} - virtual Value *Codegen(); -}; - -/// BinaryExprAST - Expression class for a binary operator. -class BinaryExprAST : public ExprAST { - char Op; - ExprAST *LHS, *RHS; -public: - BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) - : Op(op), LHS(lhs), RHS(rhs) {} - virtual Value *Codegen(); -}; - -/// CallExprAST - Expression class for function calls. -class CallExprAST : public ExprAST { - std::string Callee; - std::vector<ExprAST*> Args; -public: - CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) - : Callee(callee), Args(args) {} - virtual Value *Codegen(); -}; - -/// IfExprAST - Expression class for if/then/else. -class IfExprAST : public ExprAST { - ExprAST *Cond, *Then, *Else; -public: - IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) - : Cond(cond), Then(then), Else(_else) {} - virtual Value *Codegen(); -}; - -/// ForExprAST - Expression class for for/in. -class ForExprAST : public ExprAST { - std::string VarName; - ExprAST *Start, *End, *Step, *Body; -public: - ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, - ExprAST *step, ExprAST *body) - : VarName(varname), Start(start), End(end), Step(step), Body(body) {} - virtual Value *Codegen(); -}; - -/// VarExprAST - Expression class for var/in -class VarExprAST : public ExprAST { - std::vector<std::pair<std::string, ExprAST*> > VarNames; - ExprAST *Body; -public: - VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames, - ExprAST *body) - : VarNames(varnames), Body(body) {} - - virtual Value *Codegen(); -}; - -/// PrototypeAST - This class represents the "prototype" for a function, -/// which captures its name, and its argument names (thus implicitly the number -/// of arguments the function takes), as well as if it is an operator. -class PrototypeAST { - std::string Name; - std::vector<std::string> Args; - bool isOperator; - unsigned Precedence; // Precedence if a binary op. 
-public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args, - bool isoperator = false, unsigned prec = 0) - : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {} - - bool isUnaryOp() const { return isOperator && Args.size() == 1; } - bool isBinaryOp() const { return isOperator && Args.size() == 2; } - - char getOperatorName() const { - assert(isUnaryOp() || isBinaryOp()); - return Name[Name.size()-1]; - } - - unsigned getBinaryPrecedence() const { return Precedence; } - - Function *Codegen(); - - void CreateArgumentAllocas(Function *F); -}; - -/// FunctionAST - This class represents a function definition itself. -class FunctionAST { - PrototypeAST *Proto; - ExprAST *Body; -public: - FunctionAST(PrototypeAST *proto, ExprAST *body) - : Proto(proto), Body(body) {} - - Function *Codegen(); -}; - -//===----------------------------------------------------------------------===// -// Parser -//===----------------------------------------------------------------------===// - -/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current -/// token the parser is looking at. getNextToken reads another token from the -/// lexer and updates CurTok with its results. -static int CurTok; -static int getNextToken() { - return CurTok = gettok(); -} - -/// BinopPrecedence - This holds the precedence for each binary operator that is -/// defined. -static std::map<char, int> BinopPrecedence; - -/// GetTokPrecedence - Get the precedence of the pending binary operator token. -static int GetTokPrecedence() { - if (!isascii(CurTok)) - return -1; - - // Make sure it's a declared binop. - int TokPrec = BinopPrecedence[CurTok]; - if (TokPrec <= 0) return -1; - return TokPrec; -} - -/// Error* - These are little helper functions for error handling. -ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} -PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } -FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } - -static ExprAST *ParseExpression(); - -/// identifierexpr -/// ::= identifier -/// ::= identifier '(' expression* ')' -static ExprAST *ParseIdentifierExpr() { - std::string IdName = IdentifierStr; - - getNextToken(); // eat identifier. - - if (CurTok != '(') // Simple variable ref. - return new VariableExprAST(IdName); - - // Call. - getNextToken(); // eat ( - std::vector<ExprAST*> Args; - if (CurTok != ')') { - while (1) { - ExprAST *Arg = ParseExpression(); - if (!Arg) return 0; - Args.push_back(Arg); - - if (CurTok == ')') break; - - if (CurTok != ',') - return Error("Expected ')' or ',' in argument list"); - getNextToken(); - } - } - - // Eat the ')'. - getNextToken(); - - return new CallExprAST(IdName, Args); -} - -/// numberexpr ::= number -static ExprAST *ParseNumberExpr() { - ExprAST *Result = new NumberExprAST(NumVal); - getNextToken(); // consume the number - return Result; -} - -/// parenexpr ::= '(' expression ')' -static ExprAST *ParseParenExpr() { - getNextToken(); // eat (. - ExprAST *V = ParseExpression(); - if (!V) return 0; - - if (CurTok != ')') - return Error("expected ')'"); - getNextToken(); // eat ). - return V; -} - -/// ifexpr ::= 'if' expression 'then' expression 'else' expression -static ExprAST *ParseIfExpr() { - getNextToken(); // eat the if. - - // condition. 
- ExprAST *Cond = ParseExpression(); - if (!Cond) return 0; - - if (CurTok != tok_then) - return Error("expected then"); - getNextToken(); // eat the then - - ExprAST *Then = ParseExpression(); - if (Then == 0) return 0; - - if (CurTok != tok_else) - return Error("expected else"); - - getNextToken(); - - ExprAST *Else = ParseExpression(); - if (!Else) return 0; - - return new IfExprAST(Cond, Then, Else); -} - -/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression -static ExprAST *ParseForExpr() { - getNextToken(); // eat the for. - - if (CurTok != tok_identifier) - return Error("expected identifier after for"); - - std::string IdName = IdentifierStr; - getNextToken(); // eat identifier. - - if (CurTok != '=') - return Error("expected '=' after for"); - getNextToken(); // eat '='. - - - ExprAST *Start = ParseExpression(); - if (Start == 0) return 0; - if (CurTok != ',') - return Error("expected ',' after for start value"); - getNextToken(); - - ExprAST *End = ParseExpression(); - if (End == 0) return 0; - - // The step value is optional. - ExprAST *Step = 0; - if (CurTok == ',') { - getNextToken(); - Step = ParseExpression(); - if (Step == 0) return 0; - } - - if (CurTok != tok_in) - return Error("expected 'in' after for"); - getNextToken(); // eat 'in'. - - ExprAST *Body = ParseExpression(); - if (Body == 0) return 0; - - return new ForExprAST(IdName, Start, End, Step, Body); -} - -/// varexpr ::= 'var' identifier ('=' expression)? -// (',' identifier ('=' expression)?)* 'in' expression -static ExprAST *ParseVarExpr() { - getNextToken(); // eat the var. - - std::vector<std::pair<std::string, ExprAST*> > VarNames; - - // At least one variable name is required. - if (CurTok != tok_identifier) - return Error("expected identifier after var"); - - while (1) { - std::string Name = IdentifierStr; - getNextToken(); // eat identifier. - - // Read the optional initializer. - ExprAST *Init = 0; - if (CurTok == '=') { - getNextToken(); // eat the '='. - - Init = ParseExpression(); - if (Init == 0) return 0; - } - - VarNames.push_back(std::make_pair(Name, Init)); - - // End of var list, exit loop. - if (CurTok != ',') break; - getNextToken(); // eat the ','. - - if (CurTok != tok_identifier) - return Error("expected identifier list after var"); - } - - // At this point, we have to have 'in'. - if (CurTok != tok_in) - return Error("expected 'in' keyword after 'var'"); - getNextToken(); // eat 'in'. - - ExprAST *Body = ParseExpression(); - if (Body == 0) return 0; - - return new VarExprAST(VarNames, Body); -} - -/// primary -/// ::= identifierexpr -/// ::= numberexpr -/// ::= parenexpr -/// ::= ifexpr -/// ::= forexpr -/// ::= varexpr -static ExprAST *ParsePrimary() { - switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - case tok_if: return ParseIfExpr(); - case tok_for: return ParseForExpr(); - case tok_var: return ParseVarExpr(); - } -} - -/// unary -/// ::= primary -/// ::= '!' unary -static ExprAST *ParseUnary() { - // If the current token is not an operator, it must be a primary expr. - if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') - return ParsePrimary(); - - // If this is a unary operator, read it. 
- int Opc = CurTok; - getNextToken(); - if (ExprAST *Operand = ParseUnary()) - return new UnaryExprAST(Opc, Operand); - return 0; -} - -/// binoprhs -/// ::= ('+' unary)* -static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { - // If this is a binop, find its precedence. - while (1) { - int TokPrec = GetTokPrecedence(); - - // If this is a binop that binds at least as tightly as the current binop, - // consume it, otherwise we are done. - if (TokPrec < ExprPrec) - return LHS; - - // Okay, we know this is a binop. - int BinOp = CurTok; - getNextToken(); // eat binop - - // Parse the unary expression after the binary operator. - ExprAST *RHS = ParseUnary(); - if (!RHS) return 0; - - // If BinOp binds less tightly with RHS than the operator after RHS, let - // the pending operator take RHS as its LHS. - int NextPrec = GetTokPrecedence(); - if (TokPrec < NextPrec) { - RHS = ParseBinOpRHS(TokPrec+1, RHS); - if (RHS == 0) return 0; - } - - // Merge LHS/RHS. - LHS = new BinaryExprAST(BinOp, LHS, RHS); - } -} - -/// expression -/// ::= unary binoprhs -/// -static ExprAST *ParseExpression() { - ExprAST *LHS = ParseUnary(); - if (!LHS) return 0; - - return ParseBinOpRHS(0, LHS); -} - -/// prototype -/// ::= id '(' id* ')' -/// ::= binary LETTER number? (id, id) -/// ::= unary LETTER (id) -static PrototypeAST *ParsePrototype() { - std::string FnName; - - unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. - unsigned BinaryPrecedence = 30; - - switch (CurTok) { - default: - return ErrorP("Expected function name in prototype"); - case tok_identifier: - FnName = IdentifierStr; - Kind = 0; - getNextToken(); - break; - case tok_unary: - getNextToken(); - if (!isascii(CurTok)) - return ErrorP("Expected unary operator"); - FnName = "unary"; - FnName += (char)CurTok; - Kind = 1; - getNextToken(); - break; - case tok_binary: - getNextToken(); - if (!isascii(CurTok)) - return ErrorP("Expected binary operator"); - FnName = "binary"; - FnName += (char)CurTok; - Kind = 2; - getNextToken(); - - // Read the precedence if present. - if (CurTok == tok_number) { - if (NumVal < 1 || NumVal > 100) - return ErrorP("Invalid precedecnce: must be 1..100"); - BinaryPrecedence = (unsigned)NumVal; - getNextToken(); - } - break; - } - - if (CurTok != '(') - return ErrorP("Expected '(' in prototype"); - - std::vector<std::string> ArgNames; - while (getNextToken() == tok_identifier) - ArgNames.push_back(IdentifierStr); - if (CurTok != ')') - return ErrorP("Expected ')' in prototype"); - - // success. - getNextToken(); // eat ')'. - - // Verify right number of names for operator. - if (Kind && ArgNames.size() != Kind) - return ErrorP("Invalid number of operands for operator"); - - return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence); -} - -/// definition ::= 'def' prototype expression -static FunctionAST *ParseDefinition() { - getNextToken(); // eat def. - PrototypeAST *Proto = ParsePrototype(); - if (Proto == 0) return 0; - - if (ExprAST *E = ParseExpression()) - return new FunctionAST(Proto, E); - return 0; -} - -/// toplevelexpr ::= expression -static FunctionAST *ParseTopLevelExpr() { - if (ExprAST *E = ParseExpression()) { - // Make an anonymous proto. - PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); - return new FunctionAST(Proto, E); - } - return 0; -} - -/// external ::= 'extern' prototype -static PrototypeAST *ParseExtern() { - getNextToken(); // eat extern. 
- return ParsePrototype(); -} - -//===----------------------------------------------------------------------===// -// Code Generation -//===----------------------------------------------------------------------===// - -static Module *TheModule; -static IRBuilder<> Builder(getGlobalContext()); -static std::map<std::string, AllocaInst*> NamedValues; -static FunctionPassManager *TheFPM; - -Value *ErrorV(const char *Str) { Error(Str); return 0; } - -/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of -/// the function. This is used for mutable variables etc. -static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction, - const std::string &VarName) { - IRBuilder<> TmpB(&TheFunction->getEntryBlock(), - TheFunction->getEntryBlock().begin()); - return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0, - VarName.c_str()); -} - -Value *NumberExprAST::Codegen() { - return ConstantFP::get(getGlobalContext(), APFloat(Val)); -} - -Value *VariableExprAST::Codegen() { - // Look this variable up in the function. - Value *V = NamedValues[Name]; - if (V == 0) return ErrorV("Unknown variable name"); - - // Load the value. - return Builder.CreateLoad(V, Name.c_str()); -} - -Value *UnaryExprAST::Codegen() { - Value *OperandV = Operand->Codegen(); - if (OperandV == 0) return 0; - - Function *F = TheModule->getFunction(std::string("unary")+Opcode); - if (F == 0) - return ErrorV("Unknown unary operator"); - - return Builder.CreateCall(F, OperandV, "unop"); -} - -Value *BinaryExprAST::Codegen() { - // Special case '=' because we don't want to emit the LHS as an expression. - if (Op == '=') { - // Assignment requires the LHS to be an identifier. - VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS); - if (!LHSE) - return ErrorV("destination of '=' must be a variable"); - // Codegen the RHS. - Value *Val = RHS->Codegen(); - if (Val == 0) return 0; - - // Look up the name. - Value *Variable = NamedValues[LHSE->getName()]; - if (Variable == 0) return ErrorV("Unknown variable name"); - - Builder.CreateStore(Val, Variable); - return Val; - } - - Value *L = LHS->Codegen(); - Value *R = RHS->Codegen(); - if (L == 0 || R == 0) return 0; - - switch (Op) { - case '+': return Builder.CreateFAdd(L, R, "addtmp"); - case '-': return Builder.CreateFSub(L, R, "subtmp"); - case '*': return Builder.CreateFMul(L, R, "multmp"); - case '<': - L = Builder.CreateFCmpULT(L, R, "cmptmp"); - // Convert bool 0/1 to double 0.0 or 1.0 - return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), - "booltmp"); - default: break; - } - - // If it wasn't a builtin binary operator, it must be a user defined one. Emit - // a call to it. - Function *F = TheModule->getFunction(std::string("binary")+Op); - assert(F && "binary operator not found!"); - - Value *Ops[2] = { L, R }; - return Builder.CreateCall(F, Ops, "binop"); -} - -Value *CallExprAST::Codegen() { - // Look up the name in the global module table. - Function *CalleeF = TheModule->getFunction(Callee); - if (CalleeF == 0) - return ErrorV("Unknown function referenced"); - - // If argument mismatch error. 
- if (CalleeF->arg_size() != Args.size()) - return ErrorV("Incorrect # arguments passed"); - - std::vector<Value*> ArgsV; - for (unsigned i = 0, e = Args.size(); i != e; ++i) { - ArgsV.push_back(Args[i]->Codegen()); - if (ArgsV.back() == 0) return 0; - } - - return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); -} - -Value *IfExprAST::Codegen() { - Value *CondV = Cond->Codegen(); - if (CondV == 0) return 0; - - // Convert condition to a bool by comparing equal to 0.0. - CondV = Builder.CreateFCmpONE(CondV, - ConstantFP::get(getGlobalContext(), APFloat(0.0)), - "ifcond"); - - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - - // Create blocks for the then and else cases. Insert the 'then' block at the - // end of the function. - BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); - BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); - BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); - - Builder.CreateCondBr(CondV, ThenBB, ElseBB); - - // Emit then value. - Builder.SetInsertPoint(ThenBB); - - Value *ThenV = Then->Codegen(); - if (ThenV == 0) return 0; - - Builder.CreateBr(MergeBB); - // Codegen of 'Then' can change the current block, update ThenBB for the PHI. - ThenBB = Builder.GetInsertBlock(); - - // Emit else block. - TheFunction->getBasicBlockList().push_back(ElseBB); - Builder.SetInsertPoint(ElseBB); - - Value *ElseV = Else->Codegen(); - if (ElseV == 0) return 0; - - Builder.CreateBr(MergeBB); - // Codegen of 'Else' can change the current block, update ElseBB for the PHI. - ElseBB = Builder.GetInsertBlock(); - - // Emit merge block. - TheFunction->getBasicBlockList().push_back(MergeBB); - Builder.SetInsertPoint(MergeBB); - PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, - "iftmp"); - - PN->addIncoming(ThenV, ThenBB); - PN->addIncoming(ElseV, ElseBB); - return PN; -} - -Value *ForExprAST::Codegen() { - // Output this as: - // var = alloca double - // ... - // start = startexpr - // store start -> var - // goto loop - // loop: - // ... - // bodyexpr - // ... - // loopend: - // step = stepexpr - // endcond = endexpr - // - // curvar = load var - // nextvar = curvar + step - // store nextvar -> var - // br endcond, loop, endloop - // outloop: - - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - - // Create an alloca for the variable in the entry block. - AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); - - // Emit the start code first, without 'variable' in scope. - Value *StartVal = Start->Codegen(); - if (StartVal == 0) return 0; - - // Store the value into the alloca. - Builder.CreateStore(StartVal, Alloca); - - // Make the new basic block for the loop header, inserting after current - // block. - BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); - - // Insert an explicit fall through from the current block to the LoopBB. - Builder.CreateBr(LoopBB); - - // Start insertion in LoopBB. - Builder.SetInsertPoint(LoopBB); - - // Within the loop, the variable is defined equal to the PHI node. If it - // shadows an existing variable, we have to restore it, so save it now. - AllocaInst *OldVal = NamedValues[VarName]; - NamedValues[VarName] = Alloca; - - // Emit the body of the loop. This, like any other expr, can change the - // current BB. Note that we ignore the value computed by the body, but don't - // allow an error. - if (Body->Codegen() == 0) - return 0; - - // Emit the step value. 
- Value *StepVal; - if (Step) { - StepVal = Step->Codegen(); - if (StepVal == 0) return 0; - } else { - // If not specified, use 1.0. - StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); - } - - // Compute the end condition. - Value *EndCond = End->Codegen(); - if (EndCond == 0) return EndCond; - - // Reload, increment, and restore the alloca. This handles the case where - // the body of the loop mutates the variable. - Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str()); - Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar"); - Builder.CreateStore(NextVar, Alloca); - - // Convert condition to a bool by comparing equal to 0.0. - EndCond = Builder.CreateFCmpONE(EndCond, - ConstantFP::get(getGlobalContext(), APFloat(0.0)), - "loopcond"); - - // Create the "after loop" block and insert it. - BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); - - // Insert the conditional branch into the end of LoopEndBB. - Builder.CreateCondBr(EndCond, LoopBB, AfterBB); - - // Any new code will be inserted in AfterBB. - Builder.SetInsertPoint(AfterBB); - - // Restore the unshadowed variable. - if (OldVal) - NamedValues[VarName] = OldVal; - else - NamedValues.erase(VarName); - - - // for expr always returns 0.0. - return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); -} - -Value *VarExprAST::Codegen() { - std::vector<AllocaInst *> OldBindings; - - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - - // Register all variables and emit their initializer. - for (unsigned i = 0, e = VarNames.size(); i != e; ++i) { - const std::string &VarName = VarNames[i].first; - ExprAST *Init = VarNames[i].second; - - // Emit the initializer before adding the variable to scope, this prevents - // the initializer from referencing the variable itself, and permits stuff - // like this: - // var a = 1 in - // var a = a in ... # refers to outer 'a'. - Value *InitVal; - if (Init) { - InitVal = Init->Codegen(); - if (InitVal == 0) return 0; - } else { // If not specified, use 0.0. - InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0)); - } - - AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); - Builder.CreateStore(InitVal, Alloca); - - // Remember the old variable binding so that we can restore the binding when - // we unrecurse. - OldBindings.push_back(NamedValues[VarName]); - - // Remember this binding. - NamedValues[VarName] = Alloca; - } - - // Codegen the body, now that all vars are in scope. - Value *BodyVal = Body->Codegen(); - if (BodyVal == 0) return 0; - - // Pop all our variables from scope. - for (unsigned i = 0, e = VarNames.size(); i != e; ++i) - NamedValues[VarNames[i].first] = OldBindings[i]; - - // Return the body computation. - return BodyVal; -} - -Function *PrototypeAST::Codegen() { - // Make the function type: double(double,double) etc. - std::vector<Type*> Doubles(Args.size(), - Type::getDoubleTy(getGlobalContext())); - FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), - Doubles, false); - - Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); - - // If F conflicted, there was already something named 'Name'. If it has a - // body, don't allow redefinition or reextern. - if (F->getName() != Name) { - // Delete the one we just made and get the existing one. - F->eraseFromParent(); - F = TheModule->getFunction(Name); - - // If F already has a body, reject this. 
- if (!F->empty()) { - ErrorF("redefinition of function"); - return 0; - } - - // If F took a different number of args, reject. - if (F->arg_size() != Args.size()) { - ErrorF("redefinition of function with different # args"); - return 0; - } - } - - // Set names for all arguments. - unsigned Idx = 0; - for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); - ++AI, ++Idx) - AI->setName(Args[Idx]); - - return F; -} - -/// CreateArgumentAllocas - Create an alloca for each argument and register the -/// argument in the symbol table so that references to it will succeed. -void PrototypeAST::CreateArgumentAllocas(Function *F) { - Function::arg_iterator AI = F->arg_begin(); - for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) { - // Create an alloca for this variable. - AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]); - - // Store the initial value into the alloca. - Builder.CreateStore(AI, Alloca); - - // Add arguments to variable symbol table. - NamedValues[Args[Idx]] = Alloca; - } -} - -Function *FunctionAST::Codegen() { - NamedValues.clear(); - - Function *TheFunction = Proto->Codegen(); - if (TheFunction == 0) - return 0; - - // If this is an operator, install it. - if (Proto->isBinaryOp()) - BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence(); - - // Create a new basic block to start insertion into. - BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); - Builder.SetInsertPoint(BB); - - // Add all arguments to the symbol table and create their allocas. - Proto->CreateArgumentAllocas(TheFunction); - - if (Value *RetVal = Body->Codegen()) { - // Finish off the function. - Builder.CreateRet(RetVal); - - // Validate the generated code, checking for consistency. - verifyFunction(*TheFunction); - - // Optimize the function. - TheFPM->run(*TheFunction); - - return TheFunction; - } - - // Error reading body, remove function. - TheFunction->eraseFromParent(); - - if (Proto->isBinaryOp()) - BinopPrecedence.erase(Proto->getOperatorName()); - return 0; -} - -//===----------------------------------------------------------------------===// -// Top-Level parsing and JIT Driver -//===----------------------------------------------------------------------===// - -static ExecutionEngine *TheExecutionEngine; - -static void HandleDefinition() { - if (FunctionAST *F = ParseDefinition()) { - if (Function *LF = F->Codegen()) { - fprintf(stderr, "Read function definition:"); - LF->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleExtern() { - if (PrototypeAST *P = ParseExtern()) { - if (Function *F = P->Codegen()) { - fprintf(stderr, "Read extern: "); - F->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleTopLevelExpression() { - // Evaluate a top-level expression into an anonymous function. - if (FunctionAST *F = ParseTopLevelExpr()) { - if (Function *LF = F->Codegen()) { - // JIT the function, returning a function pointer. - void *FPtr = TheExecutionEngine->getPointerToFunction(LF); - - // Cast it to the right type (takes no arguments, returns a double) so we - // can call it as a native function. - double (*FP)() = (double (*)())(intptr_t)FPtr; - fprintf(stderr, "Evaluated to %f\n", FP()); - } - } else { - // Skip token for error recovery. 
- getNextToken(); - } -} - -/// top ::= definition | external | expression | ';' -static void MainLoop() { - while (1) { - fprintf(stderr, "ready> "); - switch (CurTok) { - case tok_eof: return; - case ';': getNextToken(); break; // ignore top-level semicolons. - case tok_def: HandleDefinition(); break; - case tok_extern: HandleExtern(); break; - default: HandleTopLevelExpression(); break; - } - } -} - -//===----------------------------------------------------------------------===// -// "Library" functions that can be "extern'd" from user code. -//===----------------------------------------------------------------------===// - -/// putchard - putchar that takes a double and returns 0. -extern "C" -double putchard(double X) { - putchar((char)X); - return 0; -} - -/// printd - printf that takes a double prints it as "%f\n", returning 0. -extern "C" -double printd(double X) { - printf("%f\n", X); - return 0; -} - -//===----------------------------------------------------------------------===// -// Main driver code. -//===----------------------------------------------------------------------===// - -int main() { - InitializeNativeTarget(); - LLVMContext &Context = getGlobalContext(); - - // Install standard binary operators. - // 1 is lowest precedence. - BinopPrecedence['='] = 2; - BinopPrecedence['<'] = 10; - BinopPrecedence['+'] = 20; - BinopPrecedence['-'] = 20; - BinopPrecedence['*'] = 40; // highest. - - // Prime the first token. - fprintf(stderr, "ready> "); - getNextToken(); - - // Make the module, which holds all the code. - TheModule = new Module("my cool jit", Context); - - // Create the JIT. This takes ownership of the module. - std::string ErrStr; - TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); - if (!TheExecutionEngine) { - fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); - exit(1); - } - - FunctionPassManager OurFPM(TheModule); - - // Set up the optimizer pipeline. Start with registering info about how the - // target lays out data structures. - OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout())); - // Provide basic AliasAnalysis support for GVN. - OurFPM.add(createBasicAliasAnalysisPass()); - // Promote allocas to registers. - OurFPM.add(createPromoteMemoryToRegisterPass()); - // Do simple "peephole" optimizations and bit-twiddling optzns. - OurFPM.add(createInstructionCombiningPass()); - // Reassociate expressions. - OurFPM.add(createReassociatePass()); - // Eliminate Common SubExpressions. - OurFPM.add(createGVNPass()); - // Simplify the control flow graph (deleting unreachable blocks, etc). - OurFPM.add(createCFGSimplificationPass()); - - OurFPM.doInitialization(); - - // Set the global so the code gen can use this. - TheFPM = &OurFPM; - - // Run the main "interpreter loop" now. - MainLoop(); - - TheFPM = 0; - - // Print out all of the generated code. 
- TheModule->dump(); - - return 0; -} -</pre> -</div> - -<a href="LangImpl8.html">Next: Conclusion and other useful LLVM tidbits</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/LangImpl7.rst b/docs/tutorial/LangImpl7.rst new file mode 100644 index 0000000000..602dcb5f6f --- /dev/null +++ b/docs/tutorial/LangImpl7.rst @@ -0,0 +1,2005 @@ +======================================================= +Kaleidoscope: Extending the Language: Mutable Variables +======================================================= + +.. contents:: + :local: + +Written by `Chris Lattner <mailto:sabre@nondot.org>`_ + +Chapter 7 Introduction +====================== + +Welcome to Chapter 7 of the "`Implementing a language with +LLVM <index.html>`_" tutorial. In chapters 1 through 6, we've built a +very respectable, albeit simple, `functional programming +language <http://en.wikipedia.org/wiki/Functional_programming>`_. In our +journey, we learned some parsing techniques, how to build and represent +an AST, how to build LLVM IR, and how to optimize the resultant code as +well as JIT compile it. + +While Kaleidoscope is interesting as a functional language, the fact +that it is functional makes it "too easy" to generate LLVM IR for it. In +particular, a functional language makes it very easy to build LLVM IR +directly in `SSA +form <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_. +Since LLVM requires that the input code be in SSA form, this is a very +nice property and it is often unclear to newcomers how to generate code +for an imperative language with mutable variables. + +The short (and happy) summary of this chapter is that there is no need +for your front-end to build SSA form: LLVM provides highly tuned and +well tested support for this, though the way it works is a bit +unexpected for some. + +Why is this a hard problem? +=========================== + +To understand why mutable variables cause complexities in SSA +construction, consider this extremely simple C example: + +.. code-block:: c + + int G, H; + int test(_Bool Condition) { + int X; + if (Condition) + X = G; + else + X = H; + return X; + } + +In this case, we have the variable "X", whose value depends on the path +executed in the program. Because there are two different possible values +for X before the return instruction, a PHI node is inserted to merge the +two values. The LLVM IR that we want for this example looks like this: + +.. 
code-block:: llvm + + @G = weak global i32 0 ; type of @G is i32* + @H = weak global i32 0 ; type of @H is i32* + + define i32 @test(i1 %Condition) { + entry: + br i1 %Condition, label %cond_true, label %cond_false + + cond_true: + %X.0 = load i32* @G + br label %cond_next + + cond_false: + %X.1 = load i32* @H + br label %cond_next + + cond_next: + %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] + ret i32 %X.2 + } + +In this example, the loads from the G and H global variables are +explicit in the LLVM IR, and they live in the then/else branches of the +if statement (cond\_true/cond\_false). In order to merge the incoming +values, the X.2 phi node in the cond\_next block selects the right value +to use based on where control flow is coming from: if control flow comes +from the cond\_false block, X.2 gets the value of X.1. Alternatively, if +control flow comes from cond\_true, it gets the value of X.0. The intent +of this chapter is not to explain the details of SSA form. For more +information, see one of the many `online +references <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_. + +The question for this article is "who places the phi nodes when lowering +assignments to mutable variables?". The issue here is that LLVM +*requires* that its IR be in SSA form: there is no "non-ssa" mode for +it. However, SSA construction requires non-trivial algorithms and data +structures, so it is inconvenient and wasteful for every front-end to +have to reproduce this logic. + +Memory in LLVM +============== + +The 'trick' here is that while LLVM does require all register values to +be in SSA form, it does not require (or permit) memory objects to be in +SSA form. In the example above, note that the loads from G and H are +direct accesses to G and H: they are not renamed or versioned. This +differs from some other compiler systems, which do try to version memory +objects. In LLVM, instead of encoding dataflow analysis of memory into +the LLVM IR, it is handled with `Analysis +Passes <../WritingAnLLVMPass.html>`_ which are computed on demand. + +With this in mind, the high-level idea is that we want to make a stack +variable (which lives in memory, because it is on the stack) for each +mutable object in a function. To take advantage of this trick, we need +to talk about how LLVM represents stack variables. + +In LLVM, all memory accesses are explicit with load/store instructions, +and it is carefully designed not to have (or need) an "address-of" +operator. Notice how the type of the @G/@H global variables is actually +"i32\*" even though the variable is defined as "i32". What this means is +that @G defines *space* for an i32 in the global data area, but its +*name* actually refers to the address for that space. Stack variables +work the same way, except that instead of being declared with global +variable definitions, they are declared with the `LLVM alloca +instruction <../LangRef.html#i_alloca>`_: + +.. code-block:: llvm + + define i32 @example() { + entry: + %X = alloca i32 ; type of %X is i32*. + ... + %tmp = load i32* %X ; load the stack value %X from the stack. + %tmp2 = add i32 %tmp, 1 ; increment it + store i32 %tmp2, i32* %X ; store it back + ... + +This code shows an example of how you can declare and manipulate a stack +variable in the LLVM IR. Stack memory allocated with the alloca +instruction is fully general: you can pass the address of the stack slot +to functions, you can store it in other variables, etc. 
In our example +above, we could rewrite the example to use the alloca technique to avoid +using a PHI node: + +.. code-block:: llvm + + @G = weak global i32 0 ; type of @G is i32* + @H = weak global i32 0 ; type of @H is i32* + + define i32 @test(i1 %Condition) { + entry: + %X = alloca i32 ; type of %X is i32*. + br i1 %Condition, label %cond_true, label %cond_false + + cond_true: + %X.0 = load i32* @G + store i32 %X.0, i32* %X ; Update X + br label %cond_next + + cond_false: + %X.1 = load i32* @H + store i32 %X.1, i32* %X ; Update X + br label %cond_next + + cond_next: + %X.2 = load i32* %X ; Read X + ret i32 %X.2 + } + +With this, we have discovered a way to handle arbitrary mutable +variables without the need to create Phi nodes at all: + +#. Each mutable variable becomes a stack allocation. +#. Each read of the variable becomes a load from the stack. +#. Each update of the variable becomes a store to the stack. +#. Taking the address of a variable just uses the stack address + directly. + +While this solution has solved our immediate problem, it introduced +another one: we have now apparently introduced a lot of stack traffic +for very simple and common operations, a major performance problem. +Fortunately for us, the LLVM optimizer has a highly-tuned optimization +pass named "mem2reg" that handles this case, promoting allocas like this +into SSA registers, inserting Phi nodes as appropriate. If you run this +example through the pass, for example, you'll get: + +.. code-block:: bash + + $ llvm-as < example.ll | opt -mem2reg | llvm-dis + @G = weak global i32 0 + @H = weak global i32 0 + + define i32 @test(i1 %Condition) { + entry: + br i1 %Condition, label %cond_true, label %cond_false + + cond_true: + %X.0 = load i32* @G + br label %cond_next + + cond_false: + %X.1 = load i32* @H + br label %cond_next + + cond_next: + %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] + ret i32 %X.01 + } + +The mem2reg pass implements the standard "iterated dominance frontier" +algorithm for constructing SSA form and has a number of optimizations +that speed up (very common) degenerate cases. The mem2reg optimization +pass is the answer to dealing with mutable variables, and we highly +recommend that you depend on it. Note that mem2reg only works on +variables in certain circumstances: + +#. mem2reg is alloca-driven: it looks for allocas and if it can handle + them, it promotes them. It does not apply to global variables or heap + allocations. +#. mem2reg only looks for alloca instructions in the entry block of the + function. Being in the entry block guarantees that the alloca is only + executed once, which makes analysis simpler. +#. mem2reg only promotes allocas whose uses are direct loads and stores. + If the address of the stack object is passed to a function, or if any + funny pointer arithmetic is involved, the alloca will not be + promoted. +#. mem2reg only works on allocas of `first + class <../LangRef.html#t_classifications>`_ values (such as pointers, + scalars and vectors), and only if the array size of the allocation is + 1 (or missing in the .ll file). mem2reg is not capable of promoting + structs or arrays to registers. Note that the "scalarrepl" pass is + more powerful and can promote structs, "unions", and arrays in many + cases. + +All of these properties are easy to satisfy for most imperative +languages, and we'll illustrate it below with Kaleidoscope. The final +question you may be asking is: should I bother with this nonsense for my +front-end? 
Wouldn't it be better if I just did SSA construction
+directly, avoiding use of the mem2reg optimization pass? In short, we
+strongly recommend that you use this technique for building SSA form,
+unless there is an extremely good reason not to. Using this technique
+is:
+
+- Proven and well tested: llvm-gcc and clang both use this technique
+  for local mutable variables. As such, the most common clients of LLVM
+  are using this to handle the bulk of their variables. You can be sure
+  that bugs are found fast and fixed early.
+- Extremely Fast: mem2reg has a number of special cases that make it
+  fast in common cases as well as fully general. For example, it has
+  fast-paths for variables that are only used in a single block,
+  variables that only have one assignment point, good heuristics to
+  avoid insertion of unneeded phi nodes, etc.
+- Needed for debug info generation: `Debug information in
+  LLVM <../SourceLevelDebugging.html>`_ relies on having the address of
+  the variable exposed so that debug info can be attached to it. This
+  technique dovetails very naturally with this style of debug info.
+
+If nothing else, this makes it much easier to get your front-end up and
+running, and is very simple to implement. Let's extend Kaleidoscope with
+mutable variables now!
+
+Mutable Variables in Kaleidoscope
+=================================
+
+Now that we know the sort of problem we want to tackle, let's see what
+this looks like in the context of our little Kaleidoscope language.
+We're going to add two features:
+
+#. The ability to mutate variables with the '=' operator.
+#. The ability to define new variables.
+
+While the first item is really what this is about, we only have
+variables for incoming arguments as well as for induction variables, and
+redefining those only goes so far :). Also, the ability to define new
+variables is a useful thing regardless of whether you will be mutating
+them. Here's a motivating example that shows how we could use these:
+
+::
+
+    # Define ':' for sequencing: as a low-precedence operator that ignores operands
+    # and just returns the RHS.
+    def binary : 1 (x y) y;
+
+    # Recursive fib, we could do this before.
+    def fib(x)
+      if (x < 3) then
+        1
+      else
+        fib(x-1)+fib(x-2);
+
+    # Iterative fib.
+    def fibi(x)
+      var a = 1, b = 1, c in
+      (for i = 3, i < x in
+         c = a + b :
+         a = b :
+         b = c) :
+      b;
+
+    # Call it.
+    fibi(10);
+
+In order to mutate variables, we have to change our existing variables
+to use the "alloca trick". Once we have that, we'll add our new
+operator, then extend Kaleidoscope to support new variable definitions.
+
+Adjusting Existing Variables for Mutation
+=========================================
+
+The symbol table in Kaleidoscope is managed at code generation time by
+the '``NamedValues``' map. This map currently keeps track of the LLVM
+"Value\*" that holds the double value for the named variable. In order
+to support mutation, we need to change this slightly, so that
+``NamedValues`` holds the *memory location* of the variable in question.
+Note that this change is a refactoring: it changes the structure of the
+code, but does not (by itself) change the behavior of the compiler. All
+of these changes are isolated in the Kaleidoscope code generator.
+
+At this point in Kaleidoscope's development, it only supports variables
+for two things: incoming arguments to functions and the induction
+variable of 'for' loops. For consistency, we'll allow mutation of these
+variables in addition to other user-defined variables.
This means that +these will both need memory locations. + +To start our transformation of Kaleidoscope, we'll change the +NamedValues map so that it maps to AllocaInst\* instead of Value\*. Once +we do this, the C++ compiler will tell us what parts of the code we need +to update: + +.. code-block:: c++ + + static std::map<std::string, AllocaInst*> NamedValues; + +Also, since we will need to create these alloca's, we'll use a helper +function that ensures that the allocas are created in the entry block of +the function: + +.. code-block:: c++ + + /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of + /// the function. This is used for mutable variables etc. + static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction, + const std::string &VarName) { + IRBuilder<> TmpB(&TheFunction->getEntryBlock(), + TheFunction->getEntryBlock().begin()); + return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0, + VarName.c_str()); + } + +This funny looking code creates an IRBuilder object that is pointing at +the first instruction (.begin()) of the entry block. It then creates an +alloca with the expected name and returns it. Because all values in +Kaleidoscope are doubles, there is no need to pass in a type to use. + +With this in place, the first functionality change we want to make is to +variable references. In our new scheme, variables live on the stack, so +code generating a reference to them actually needs to produce a load +from the stack slot: + +.. code-block:: c++ + + Value *VariableExprAST::Codegen() { + // Look this variable up in the function. + Value *V = NamedValues[Name]; + if (V == 0) return ErrorV("Unknown variable name"); + + // Load the value. + return Builder.CreateLoad(V, Name.c_str()); + } + +As you can see, this is pretty straightforward. Now we need to update +the things that define the variables to set up the alloca. We'll start +with ``ForExprAST::Codegen`` (see the `full code listing <#code>`_ for +the unabridged code): + +.. code-block:: c++ + + Function *TheFunction = Builder.GetInsertBlock()->getParent(); + + // Create an alloca for the variable in the entry block. + AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); + + // Emit the start code first, without 'variable' in scope. + Value *StartVal = Start->Codegen(); + if (StartVal == 0) return 0; + + // Store the value into the alloca. + Builder.CreateStore(StartVal, Alloca); + ... + + // Compute the end condition. + Value *EndCond = End->Codegen(); + if (EndCond == 0) return EndCond; + + // Reload, increment, and restore the alloca. This handles the case where + // the body of the loop mutates the variable. + Value *CurVar = Builder.CreateLoad(Alloca); + Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar"); + Builder.CreateStore(NextVar, Alloca); + ... + +This code is virtually identical to the code `before we allowed mutable +variables <LangImpl5.html#forcodegen>`_. The big difference is that we +no longer have to construct a PHI node, and we use load/store to access +the variable as needed. + +To support mutable argument variables, we need to also make allocas for +them. The code for this is also pretty simple: + +.. code-block:: c++ + + /// CreateArgumentAllocas - Create an alloca for each argument and register the + /// argument in the symbol table so that references to it will succeed. 
+ void PrototypeAST::CreateArgumentAllocas(Function *F) { + Function::arg_iterator AI = F->arg_begin(); + for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) { + // Create an alloca for this variable. + AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]); + + // Store the initial value into the alloca. + Builder.CreateStore(AI, Alloca); + + // Add arguments to variable symbol table. + NamedValues[Args[Idx]] = Alloca; + } + } + +For each argument, we make an alloca, store the input value to the +function into the alloca, and register the alloca as the memory location +for the argument. This method gets invoked by ``FunctionAST::Codegen`` +right after it sets up the entry block for the function. + +The final missing piece is adding the mem2reg pass, which allows us to +get good codegen once again: + +.. code-block:: c++ + + // Set up the optimizer pipeline. Start with registering info about how the + // target lays out data structures. + OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout())); + // Promote allocas to registers. + OurFPM.add(createPromoteMemoryToRegisterPass()); + // Do simple "peephole" optimizations and bit-twiddling optzns. + OurFPM.add(createInstructionCombiningPass()); + // Reassociate expressions. + OurFPM.add(createReassociatePass()); + +It is interesting to see what the code looks like before and after the +mem2reg optimization runs. For example, this is the before/after code +for our recursive fib function. Before the optimization: + +.. code-block:: llvm + + define double @fib(double %x) { + entry: + %x1 = alloca double + store double %x, double* %x1 + %x2 = load double* %x1 + %cmptmp = fcmp ult double %x2, 3.000000e+00 + %booltmp = uitofp i1 %cmptmp to double + %ifcond = fcmp one double %booltmp, 0.000000e+00 + br i1 %ifcond, label %then, label %else + + then: ; preds = %entry + br label %ifcont + + else: ; preds = %entry + %x3 = load double* %x1 + %subtmp = fsub double %x3, 1.000000e+00 + %calltmp = call double @fib(double %subtmp) + %x4 = load double* %x1 + %subtmp5 = fsub double %x4, 2.000000e+00 + %calltmp6 = call double @fib(double %subtmp5) + %addtmp = fadd double %calltmp, %calltmp6 + br label %ifcont + + ifcont: ; preds = %else, %then + %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] + ret double %iftmp + } + +Here there is only one variable (x, the input argument) but you can +still see the extremely simple-minded code generation strategy we are +using. In the entry block, an alloca is created, and the initial input +value is stored into it. Each reference to the variable does a reload +from the stack. Also, note that we didn't modify the if/then/else +expression, so it still inserts a PHI node. While we could make an +alloca for it, it is actually easier to create a PHI node for it, so we +still just make the PHI. + +Here is the code after the mem2reg pass runs: + +.. 
code-block:: llvm + + define double @fib(double %x) { + entry: + %cmptmp = fcmp ult double %x, 3.000000e+00 + %booltmp = uitofp i1 %cmptmp to double + %ifcond = fcmp one double %booltmp, 0.000000e+00 + br i1 %ifcond, label %then, label %else + + then: + br label %ifcont + + else: + %subtmp = fsub double %x, 1.000000e+00 + %calltmp = call double @fib(double %subtmp) + %subtmp5 = fsub double %x, 2.000000e+00 + %calltmp6 = call double @fib(double %subtmp5) + %addtmp = fadd double %calltmp, %calltmp6 + br label %ifcont + + ifcont: ; preds = %else, %then + %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] + ret double %iftmp + } + +This is a trivial case for mem2reg, since there are no redefinitions of +the variable. The point of showing this is to calm your tension about +inserting such blatent inefficiencies :). + +After the rest of the optimizers run, we get: + +.. code-block:: llvm + + define double @fib(double %x) { + entry: + %cmptmp = fcmp ult double %x, 3.000000e+00 + %booltmp = uitofp i1 %cmptmp to double + %ifcond = fcmp ueq double %booltmp, 0.000000e+00 + br i1 %ifcond, label %else, label %ifcont + + else: + %subtmp = fsub double %x, 1.000000e+00 + %calltmp = call double @fib(double %subtmp) + %subtmp5 = fsub double %x, 2.000000e+00 + %calltmp6 = call double @fib(double %subtmp5) + %addtmp = fadd double %calltmp, %calltmp6 + ret double %addtmp + + ifcont: + ret double 1.000000e+00 + } + +Here we see that the simplifycfg pass decided to clone the return +instruction into the end of the 'else' block. This allowed it to +eliminate some branches and the PHI node. + +Now that all symbol table references are updated to use stack variables, +we'll add the assignment operator. + +New Assignment Operator +======================= + +With our current framework, adding a new assignment operator is really +simple. We will parse it just like any other binary operator, but handle +it internally (instead of allowing the user to define it). The first +step is to set a precedence: + +.. code-block:: c++ + + int main() { + // Install standard binary operators. + // 1 is lowest precedence. + BinopPrecedence['='] = 2; + BinopPrecedence['<'] = 10; + BinopPrecedence['+'] = 20; + BinopPrecedence['-'] = 20; + +Now that the parser knows the precedence of the binary operator, it +takes care of all the parsing and AST generation. We just need to +implement codegen for the assignment operator. This looks like: + +.. code-block:: c++ + + Value *BinaryExprAST::Codegen() { + // Special case '=' because we don't want to emit the LHS as an expression. + if (Op == '=') { + // Assignment requires the LHS to be an identifier. + VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS); + if (!LHSE) + return ErrorV("destination of '=' must be a variable"); + +Unlike the rest of the binary operators, our assignment operator doesn't +follow the "emit LHS, emit RHS, do computation" model. As such, it is +handled as a special case before the other binary operators are handled. +The other strange thing is that it requires the LHS to be a variable. It +is invalid to have "(x+1) = expr" - only things like "x = expr" are +allowed. + +.. code-block:: c++ + + // Codegen the RHS. + Value *Val = RHS->Codegen(); + if (Val == 0) return 0; + + // Look up the name. + Value *Variable = NamedValues[LHSE->getName()]; + if (Variable == 0) return ErrorV("Unknown variable name"); + + Builder.CreateStore(Val, Variable); + return Val; + } + ... 
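+
+If you are curious what that ``CreateStore`` call actually produces, here is a
+small standalone sketch (it is not part of the tutorial code; the "assign
+demo" module and the ``demo`` function are made-up names) that builds the same
+alloca/store/load pattern directly with ``IRBuilder`` and dumps the textual
+IR, using the same headers as the full listing below:
+
+.. code-block:: c++
+
+    #include "llvm/DerivedTypes.h"
+    #include "llvm/IRBuilder.h"
+    #include "llvm/LLVMContext.h"
+    #include "llvm/Module.h"
+    using namespace llvm;
+
+    int main() {
+      LLVMContext &Ctx = getGlobalContext();
+      Module *M = new Module("assign demo", Ctx);
+
+      // Make a function "double demo()" to hold the generated code.
+      FunctionType *FT = FunctionType::get(Type::getDoubleTy(Ctx), false);
+      Function *F = Function::Create(FT, Function::ExternalLinkage, "demo", M);
+      IRBuilder<> B(BasicBlock::Create(Ctx, "entry", F));
+
+      // The mutable variable: %x = alloca double.
+      AllocaInst *X = B.CreateAlloca(Type::getDoubleTy(Ctx), 0, "x");
+      // Store the constant 4.0 into it, as an assignment like 'x = 4' would.
+      B.CreateStore(ConstantFP::get(Ctx, APFloat(4.0)), X);
+      // Reload it and return it, as a later reference to 'x' would.
+      B.CreateRet(B.CreateLoad(X, "x"));
+
+      M->dump();  // Print the textual IR to stderr.
+      return 0;
+    }
+
+Running it prints an ``entry`` block containing an ``alloca``, a ``store``,
+and a ``load``, which is the same shape that the '=' operator and ordinary
+variable references emit inside a real Kaleidoscope function.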
+
+Once we have the variable, codegen'ing the assignment is
+straightforward: we emit the RHS of the assignment, create a store, and
+return the computed value. Returning a value allows for chained
+assignments like "X = (Y = Z)".
+
+Now that we have an assignment operator, we can mutate loop variables
+and arguments. For example, we can now run code like this:
+
+::
+
+    # Function to print a double.
+    extern printd(x);
+
+    # Define ':' for sequencing: as a low-precedence operator that ignores operands
+    # and just returns the RHS.
+    def binary : 1 (x y) y;
+
+    def test(x)
+      printd(x) :
+      x = 4 :
+      printd(x);
+
+    test(123);
+
+When run, this example prints "123" and then "4", showing that we did
+actually mutate the value! Okay, we have now officially implemented our
+goal: getting this to work requires SSA construction in the general
+case. However, to be really useful, we want the ability to define our
+own local variables. Let's add this next!
+
+User-defined Local Variables
+============================
+
+Adding var/in is just like any of the other extensions we made to
+Kaleidoscope: we extend the lexer, the parser, the AST and the code
+generator. The first step for adding our new 'var/in' construct is to
+extend the lexer. As before, this is pretty trivial; the code looks like
+this:
+
+.. code-block:: c++
+
+    enum Token {
+      ...
+      // var definition
+      tok_var = -13
+      ...
+    }
+    ...
+    static int gettok() {
+    ...
+        if (IdentifierStr == "in") return tok_in;
+        if (IdentifierStr == "binary") return tok_binary;
+        if (IdentifierStr == "unary") return tok_unary;
+        if (IdentifierStr == "var") return tok_var;
+        return tok_identifier;
+    ...
+
+The next step is to define the AST node that we will construct. For
+var/in, it looks like this:
+
+.. code-block:: c++
+
+    /// VarExprAST - Expression class for var/in
+    class VarExprAST : public ExprAST {
+      std::vector<std::pair<std::string, ExprAST*> > VarNames;
+      ExprAST *Body;
+    public:
+      VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames,
+                 ExprAST *body)
+        : VarNames(varnames), Body(body) {}
+
+      virtual Value *Codegen();
+    };
+
+var/in allows a list of names to be defined all at once, and each name
+can optionally have an initializer value. As such, we capture this
+information in the VarNames vector. Also, var/in has a body; this body
+is allowed to access the variables defined by the var/in.
+
+With this in place, we can define the parser pieces. The first thing we
+do is add it as a primary expression:
+
+.. code-block:: c++
+
+    /// primary
+    ///   ::= identifierexpr
+    ///   ::= numberexpr
+    ///   ::= parenexpr
+    ///   ::= ifexpr
+    ///   ::= forexpr
+    ///   ::= varexpr
+    static ExprAST *ParsePrimary() {
+      switch (CurTok) {
+      default: return Error("unknown token when expecting an expression");
+      case tok_identifier: return ParseIdentifierExpr();
+      case tok_number: return ParseNumberExpr();
+      case '(': return ParseParenExpr();
+      case tok_if: return ParseIfExpr();
+      case tok_for: return ParseForExpr();
+      case tok_var: return ParseVarExpr();
+      }
+    }
+
+Next we define ParseVarExpr:
+
+.. code-block:: c++
+
+    /// varexpr ::= 'var' identifier ('=' expression)?
+    //                    (',' identifier ('=' expression)?)* 'in' expression
+    static ExprAST *ParseVarExpr() {
+      getNextToken(); // eat the var.
+
+      std::vector<std::pair<std::string, ExprAST*> > VarNames;
+
+      // At least one variable name is required.
+ if (CurTok != tok_identifier) + return Error("expected identifier after var"); + +The first part of this code parses the list of identifier/expr pairs +into the local ``VarNames`` vector. + +.. code-block:: c++ + + while (1) { + std::string Name = IdentifierStr; + getNextToken(); // eat identifier. + + // Read the optional initializer. + ExprAST *Init = 0; + if (CurTok == '=') { + getNextToken(); // eat the '='. + + Init = ParseExpression(); + if (Init == 0) return 0; + } + + VarNames.push_back(std::make_pair(Name, Init)); + + // End of var list, exit loop. + if (CurTok != ',') break; + getNextToken(); // eat the ','. + + if (CurTok != tok_identifier) + return Error("expected identifier list after var"); + } + +Once all the variables are parsed, we then parse the body and create the +AST node: + +.. code-block:: c++ + + // At this point, we have to have 'in'. + if (CurTok != tok_in) + return Error("expected 'in' keyword after 'var'"); + getNextToken(); // eat 'in'. + + ExprAST *Body = ParseExpression(); + if (Body == 0) return 0; + + return new VarExprAST(VarNames, Body); + } + +Now that we can parse and represent the code, we need to support +emission of LLVM IR for it. This code starts out with: + +.. code-block:: c++ + + Value *VarExprAST::Codegen() { + std::vector<AllocaInst *> OldBindings; + + Function *TheFunction = Builder.GetInsertBlock()->getParent(); + + // Register all variables and emit their initializer. + for (unsigned i = 0, e = VarNames.size(); i != e; ++i) { + const std::string &VarName = VarNames[i].first; + ExprAST *Init = VarNames[i].second; + +Basically it loops over all the variables, installing them one at a +time. For each variable we put into the symbol table, we remember the +previous value that we replace in OldBindings. + +.. code-block:: c++ + + // Emit the initializer before adding the variable to scope, this prevents + // the initializer from referencing the variable itself, and permits stuff + // like this: + // var a = 1 in + // var a = a in ... # refers to outer 'a'. + Value *InitVal; + if (Init) { + InitVal = Init->Codegen(); + if (InitVal == 0) return 0; + } else { // If not specified, use 0.0. + InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0)); + } + + AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); + Builder.CreateStore(InitVal, Alloca); + + // Remember the old variable binding so that we can restore the binding when + // we unrecurse. + OldBindings.push_back(NamedValues[VarName]); + + // Remember this binding. + NamedValues[VarName] = Alloca; + } + +There are more comments here than code. The basic idea is that we emit +the initializer, create the alloca, then update the symbol table to +point to it. Once all the variables are installed in the symbol table, +we evaluate the body of the var/in expression: + +.. code-block:: c++ + + // Codegen the body, now that all vars are in scope. + Value *BodyVal = Body->Codegen(); + if (BodyVal == 0) return 0; + +Finally, before returning, we restore the previous variable bindings: + +.. code-block:: c++ + + // Pop all our variables from scope. + for (unsigned i = 0, e = VarNames.size(); i != e; ++i) + NamedValues[VarNames[i].first] = OldBindings[i]; + + // Return the body computation. + return BodyVal; + } + +The end result of all of this is that we get properly scoped variable +definitions, and we even (trivially) allow mutation of them :). + +With this, we completed what we set out to do. Our nice iterative fib +example from the intro compiles and runs just fine. 
The mem2reg pass +optimizes all of our stack variables into SSA registers, inserting PHI +nodes where needed, and our front-end remains simple: no "iterated +dominance frontier" computation anywhere in sight. + +Full Code Listing +================= + +Here is the complete code listing for our running example, enhanced with +mutable variables and var/in support. To build this example, use: + +.. code-block:: bash + + # Compile + clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy + # Run + ./toy + +Here is the code: + +.. code-block:: c++ + + #include "llvm/DerivedTypes.h" + #include "llvm/ExecutionEngine/ExecutionEngine.h" + #include "llvm/ExecutionEngine/JIT.h" + #include "llvm/IRBuilder.h" + #include "llvm/LLVMContext.h" + #include "llvm/Module.h" + #include "llvm/PassManager.h" + #include "llvm/Analysis/Verifier.h" + #include "llvm/Analysis/Passes.h" + #include "llvm/DataLayout.h" + #include "llvm/Transforms/Scalar.h" + #include "llvm/Support/TargetSelect.h" + #include <cstdio> + #include <string> + #include <map> + #include <vector> + using namespace llvm; + + //===----------------------------------------------------------------------===// + // Lexer + //===----------------------------------------------------------------------===// + + // The lexer returns tokens [0-255] if it is an unknown character, otherwise one + // of these for known things. + enum Token { + tok_eof = -1, + + // commands + tok_def = -2, tok_extern = -3, + + // primary + tok_identifier = -4, tok_number = -5, + + // control + tok_if = -6, tok_then = -7, tok_else = -8, + tok_for = -9, tok_in = -10, + + // operators + tok_binary = -11, tok_unary = -12, + + // var definition + tok_var = -13 + }; + + static std::string IdentifierStr; // Filled in if tok_identifier + static double NumVal; // Filled in if tok_number + + /// gettok - Return the next token from standard input. + static int gettok() { + static int LastChar = ' '; + + // Skip any whitespace. + while (isspace(LastChar)) + LastChar = getchar(); + + if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* + IdentifierStr = LastChar; + while (isalnum((LastChar = getchar()))) + IdentifierStr += LastChar; + + if (IdentifierStr == "def") return tok_def; + if (IdentifierStr == "extern") return tok_extern; + if (IdentifierStr == "if") return tok_if; + if (IdentifierStr == "then") return tok_then; + if (IdentifierStr == "else") return tok_else; + if (IdentifierStr == "for") return tok_for; + if (IdentifierStr == "in") return tok_in; + if (IdentifierStr == "binary") return tok_binary; + if (IdentifierStr == "unary") return tok_unary; + if (IdentifierStr == "var") return tok_var; + return tok_identifier; + } + + if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ + std::string NumStr; + do { + NumStr += LastChar; + LastChar = getchar(); + } while (isdigit(LastChar) || LastChar == '.'); + + NumVal = strtod(NumStr.c_str(), 0); + return tok_number; + } + + if (LastChar == '#') { + // Comment until end of line. + do LastChar = getchar(); + while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); + + if (LastChar != EOF) + return gettok(); + } + + // Check for end of file. Don't eat the EOF. + if (LastChar == EOF) + return tok_eof; + + // Otherwise, just return the character as its ascii value. 
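+    // (Single-character operators such as '=' and '<' reach the parser this
+    // way: we remember the current character, advance LastChar so the next
+    // gettok() call starts on fresh input, and hand the saved character back.)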
+ int ThisChar = LastChar; + LastChar = getchar(); + return ThisChar; + } + + //===----------------------------------------------------------------------===// + // Abstract Syntax Tree (aka Parse Tree) + //===----------------------------------------------------------------------===// + + /// ExprAST - Base class for all expression nodes. + class ExprAST { + public: + virtual ~ExprAST() {} + virtual Value *Codegen() = 0; + }; + + /// NumberExprAST - Expression class for numeric literals like "1.0". + class NumberExprAST : public ExprAST { + double Val; + public: + NumberExprAST(double val) : Val(val) {} + virtual Value *Codegen(); + }; + + /// VariableExprAST - Expression class for referencing a variable, like "a". + class VariableExprAST : public ExprAST { + std::string Name; + public: + VariableExprAST(const std::string &name) : Name(name) {} + const std::string &getName() const { return Name; } + virtual Value *Codegen(); + }; + + /// UnaryExprAST - Expression class for a unary operator. + class UnaryExprAST : public ExprAST { + char Opcode; + ExprAST *Operand; + public: + UnaryExprAST(char opcode, ExprAST *operand) + : Opcode(opcode), Operand(operand) {} + virtual Value *Codegen(); + }; + + /// BinaryExprAST - Expression class for a binary operator. + class BinaryExprAST : public ExprAST { + char Op; + ExprAST *LHS, *RHS; + public: + BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) + : Op(op), LHS(lhs), RHS(rhs) {} + virtual Value *Codegen(); + }; + + /// CallExprAST - Expression class for function calls. + class CallExprAST : public ExprAST { + std::string Callee; + std::vector<ExprAST*> Args; + public: + CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) + : Callee(callee), Args(args) {} + virtual Value *Codegen(); + }; + + /// IfExprAST - Expression class for if/then/else. + class IfExprAST : public ExprAST { + ExprAST *Cond, *Then, *Else; + public: + IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) + : Cond(cond), Then(then), Else(_else) {} + virtual Value *Codegen(); + }; + + /// ForExprAST - Expression class for for/in. + class ForExprAST : public ExprAST { + std::string VarName; + ExprAST *Start, *End, *Step, *Body; + public: + ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, + ExprAST *step, ExprAST *body) + : VarName(varname), Start(start), End(end), Step(step), Body(body) {} + virtual Value *Codegen(); + }; + + /// VarExprAST - Expression class for var/in + class VarExprAST : public ExprAST { + std::vector<std::pair<std::string, ExprAST*> > VarNames; + ExprAST *Body; + public: + VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames, + ExprAST *body) + : VarNames(varnames), Body(body) {} + + virtual Value *Codegen(); + }; + + /// PrototypeAST - This class represents the "prototype" for a function, + /// which captures its name, and its argument names (thus implicitly the number + /// of arguments the function takes), as well as if it is an operator. + class PrototypeAST { + std::string Name; + std::vector<std::string> Args; + bool isOperator; + unsigned Precedence; // Precedence if a binary op. 
+ public: + PrototypeAST(const std::string &name, const std::vector<std::string> &args, + bool isoperator = false, unsigned prec = 0) + : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {} + + bool isUnaryOp() const { return isOperator && Args.size() == 1; } + bool isBinaryOp() const { return isOperator && Args.size() == 2; } + + char getOperatorName() const { + assert(isUnaryOp() || isBinaryOp()); + return Name[Name.size()-1]; + } + + unsigned getBinaryPrecedence() const { return Precedence; } + + Function *Codegen(); + + void CreateArgumentAllocas(Function *F); + }; + + /// FunctionAST - This class represents a function definition itself. + class FunctionAST { + PrototypeAST *Proto; + ExprAST *Body; + public: + FunctionAST(PrototypeAST *proto, ExprAST *body) + : Proto(proto), Body(body) {} + + Function *Codegen(); + }; + + //===----------------------------------------------------------------------===// + // Parser + //===----------------------------------------------------------------------===// + + /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current + /// token the parser is looking at. getNextToken reads another token from the + /// lexer and updates CurTok with its results. + static int CurTok; + static int getNextToken() { + return CurTok = gettok(); + } + + /// BinopPrecedence - This holds the precedence for each binary operator that is + /// defined. + static std::map<char, int> BinopPrecedence; + + /// GetTokPrecedence - Get the precedence of the pending binary operator token. + static int GetTokPrecedence() { + if (!isascii(CurTok)) + return -1; + + // Make sure it's a declared binop. + int TokPrec = BinopPrecedence[CurTok]; + if (TokPrec <= 0) return -1; + return TokPrec; + } + + /// Error* - These are little helper functions for error handling. + ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} + PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } + FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } + + static ExprAST *ParseExpression(); + + /// identifierexpr + /// ::= identifier + /// ::= identifier '(' expression* ')' + static ExprAST *ParseIdentifierExpr() { + std::string IdName = IdentifierStr; + + getNextToken(); // eat identifier. + + if (CurTok != '(') // Simple variable ref. + return new VariableExprAST(IdName); + + // Call. + getNextToken(); // eat ( + std::vector<ExprAST*> Args; + if (CurTok != ')') { + while (1) { + ExprAST *Arg = ParseExpression(); + if (!Arg) return 0; + Args.push_back(Arg); + + if (CurTok == ')') break; + + if (CurTok != ',') + return Error("Expected ')' or ',' in argument list"); + getNextToken(); + } + } + + // Eat the ')'. + getNextToken(); + + return new CallExprAST(IdName, Args); + } + + /// numberexpr ::= number + static ExprAST *ParseNumberExpr() { + ExprAST *Result = new NumberExprAST(NumVal); + getNextToken(); // consume the number + return Result; + } + + /// parenexpr ::= '(' expression ')' + static ExprAST *ParseParenExpr() { + getNextToken(); // eat (. + ExprAST *V = ParseExpression(); + if (!V) return 0; + + if (CurTok != ')') + return Error("expected ')'"); + getNextToken(); // eat ). + return V; + } + + /// ifexpr ::= 'if' expression 'then' expression 'else' expression + static ExprAST *ParseIfExpr() { + getNextToken(); // eat the if. + + // condition. 
+ ExprAST *Cond = ParseExpression(); + if (!Cond) return 0; + + if (CurTok != tok_then) + return Error("expected then"); + getNextToken(); // eat the then + + ExprAST *Then = ParseExpression(); + if (Then == 0) return 0; + + if (CurTok != tok_else) + return Error("expected else"); + + getNextToken(); + + ExprAST *Else = ParseExpression(); + if (!Else) return 0; + + return new IfExprAST(Cond, Then, Else); + } + + /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression + static ExprAST *ParseForExpr() { + getNextToken(); // eat the for. + + if (CurTok != tok_identifier) + return Error("expected identifier after for"); + + std::string IdName = IdentifierStr; + getNextToken(); // eat identifier. + + if (CurTok != '=') + return Error("expected '=' after for"); + getNextToken(); // eat '='. + + + ExprAST *Start = ParseExpression(); + if (Start == 0) return 0; + if (CurTok != ',') + return Error("expected ',' after for start value"); + getNextToken(); + + ExprAST *End = ParseExpression(); + if (End == 0) return 0; + + // The step value is optional. + ExprAST *Step = 0; + if (CurTok == ',') { + getNextToken(); + Step = ParseExpression(); + if (Step == 0) return 0; + } + + if (CurTok != tok_in) + return Error("expected 'in' after for"); + getNextToken(); // eat 'in'. + + ExprAST *Body = ParseExpression(); + if (Body == 0) return 0; + + return new ForExprAST(IdName, Start, End, Step, Body); + } + + /// varexpr ::= 'var' identifier ('=' expression)? + // (',' identifier ('=' expression)?)* 'in' expression + static ExprAST *ParseVarExpr() { + getNextToken(); // eat the var. + + std::vector<std::pair<std::string, ExprAST*> > VarNames; + + // At least one variable name is required. + if (CurTok != tok_identifier) + return Error("expected identifier after var"); + + while (1) { + std::string Name = IdentifierStr; + getNextToken(); // eat identifier. + + // Read the optional initializer. + ExprAST *Init = 0; + if (CurTok == '=') { + getNextToken(); // eat the '='. + + Init = ParseExpression(); + if (Init == 0) return 0; + } + + VarNames.push_back(std::make_pair(Name, Init)); + + // End of var list, exit loop. + if (CurTok != ',') break; + getNextToken(); // eat the ','. + + if (CurTok != tok_identifier) + return Error("expected identifier list after var"); + } + + // At this point, we have to have 'in'. + if (CurTok != tok_in) + return Error("expected 'in' keyword after 'var'"); + getNextToken(); // eat 'in'. + + ExprAST *Body = ParseExpression(); + if (Body == 0) return 0; + + return new VarExprAST(VarNames, Body); + } + + /// primary + /// ::= identifierexpr + /// ::= numberexpr + /// ::= parenexpr + /// ::= ifexpr + /// ::= forexpr + /// ::= varexpr + static ExprAST *ParsePrimary() { + switch (CurTok) { + default: return Error("unknown token when expecting an expression"); + case tok_identifier: return ParseIdentifierExpr(); + case tok_number: return ParseNumberExpr(); + case '(': return ParseParenExpr(); + case tok_if: return ParseIfExpr(); + case tok_for: return ParseForExpr(); + case tok_var: return ParseVarExpr(); + } + } + + /// unary + /// ::= primary + /// ::= '!' unary + static ExprAST *ParseUnary() { + // If the current token is not an operator, it must be a primary expr. + if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') + return ParsePrimary(); + + // If this is a unary operator, read it. 
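+    // (The operand is parsed with ParseUnary rather than ParsePrimary, which
+    // is what lets unary operators nest, e.g. '!!x'.)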
+ int Opc = CurTok; + getNextToken(); + if (ExprAST *Operand = ParseUnary()) + return new UnaryExprAST(Opc, Operand); + return 0; + } + + /// binoprhs + /// ::= ('+' unary)* + static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { + // If this is a binop, find its precedence. + while (1) { + int TokPrec = GetTokPrecedence(); + + // If this is a binop that binds at least as tightly as the current binop, + // consume it, otherwise we are done. + if (TokPrec < ExprPrec) + return LHS; + + // Okay, we know this is a binop. + int BinOp = CurTok; + getNextToken(); // eat binop + + // Parse the unary expression after the binary operator. + ExprAST *RHS = ParseUnary(); + if (!RHS) return 0; + + // If BinOp binds less tightly with RHS than the operator after RHS, let + // the pending operator take RHS as its LHS. + int NextPrec = GetTokPrecedence(); + if (TokPrec < NextPrec) { + RHS = ParseBinOpRHS(TokPrec+1, RHS); + if (RHS == 0) return 0; + } + + // Merge LHS/RHS. + LHS = new BinaryExprAST(BinOp, LHS, RHS); + } + } + + /// expression + /// ::= unary binoprhs + /// + static ExprAST *ParseExpression() { + ExprAST *LHS = ParseUnary(); + if (!LHS) return 0; + + return ParseBinOpRHS(0, LHS); + } + + /// prototype + /// ::= id '(' id* ')' + /// ::= binary LETTER number? (id, id) + /// ::= unary LETTER (id) + static PrototypeAST *ParsePrototype() { + std::string FnName; + + unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. + unsigned BinaryPrecedence = 30; + + switch (CurTok) { + default: + return ErrorP("Expected function name in prototype"); + case tok_identifier: + FnName = IdentifierStr; + Kind = 0; + getNextToken(); + break; + case tok_unary: + getNextToken(); + if (!isascii(CurTok)) + return ErrorP("Expected unary operator"); + FnName = "unary"; + FnName += (char)CurTok; + Kind = 1; + getNextToken(); + break; + case tok_binary: + getNextToken(); + if (!isascii(CurTok)) + return ErrorP("Expected binary operator"); + FnName = "binary"; + FnName += (char)CurTok; + Kind = 2; + getNextToken(); + + // Read the precedence if present. + if (CurTok == tok_number) { + if (NumVal < 1 || NumVal > 100) + return ErrorP("Invalid precedecnce: must be 1..100"); + BinaryPrecedence = (unsigned)NumVal; + getNextToken(); + } + break; + } + + if (CurTok != '(') + return ErrorP("Expected '(' in prototype"); + + std::vector<std::string> ArgNames; + while (getNextToken() == tok_identifier) + ArgNames.push_back(IdentifierStr); + if (CurTok != ')') + return ErrorP("Expected ')' in prototype"); + + // success. + getNextToken(); // eat ')'. + + // Verify right number of names for operator. + if (Kind && ArgNames.size() != Kind) + return ErrorP("Invalid number of operands for operator"); + + return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence); + } + + /// definition ::= 'def' prototype expression + static FunctionAST *ParseDefinition() { + getNextToken(); // eat def. + PrototypeAST *Proto = ParsePrototype(); + if (Proto == 0) return 0; + + if (ExprAST *E = ParseExpression()) + return new FunctionAST(Proto, E); + return 0; + } + + /// toplevelexpr ::= expression + static FunctionAST *ParseTopLevelExpr() { + if (ExprAST *E = ParseExpression()) { + // Make an anonymous proto. + PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); + return new FunctionAST(Proto, E); + } + return 0; + } + + /// external ::= 'extern' prototype + static PrototypeAST *ParseExtern() { + getNextToken(); // eat extern. 
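+    // An extern is just a prototype with no body, so the prototype parser
+    // does the rest of the work.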
+ return ParsePrototype(); + } + + //===----------------------------------------------------------------------===// + // Code Generation + //===----------------------------------------------------------------------===// + + static Module *TheModule; + static IRBuilder<> Builder(getGlobalContext()); + static std::map<std::string, AllocaInst*> NamedValues; + static FunctionPassManager *TheFPM; + + Value *ErrorV(const char *Str) { Error(Str); return 0; } + + /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of + /// the function. This is used for mutable variables etc. + static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction, + const std::string &VarName) { + IRBuilder<> TmpB(&TheFunction->getEntryBlock(), + TheFunction->getEntryBlock().begin()); + return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0, + VarName.c_str()); + } + + Value *NumberExprAST::Codegen() { + return ConstantFP::get(getGlobalContext(), APFloat(Val)); + } + + Value *VariableExprAST::Codegen() { + // Look this variable up in the function. + Value *V = NamedValues[Name]; + if (V == 0) return ErrorV("Unknown variable name"); + + // Load the value. + return Builder.CreateLoad(V, Name.c_str()); + } + + Value *UnaryExprAST::Codegen() { + Value *OperandV = Operand->Codegen(); + if (OperandV == 0) return 0; + + Function *F = TheModule->getFunction(std::string("unary")+Opcode); + if (F == 0) + return ErrorV("Unknown unary operator"); + + return Builder.CreateCall(F, OperandV, "unop"); + } + + Value *BinaryExprAST::Codegen() { + // Special case '=' because we don't want to emit the LHS as an expression. + if (Op == '=') { + // Assignment requires the LHS to be an identifier. + VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS); + if (!LHSE) + return ErrorV("destination of '=' must be a variable"); + // Codegen the RHS. + Value *Val = RHS->Codegen(); + if (Val == 0) return 0; + + // Look up the name. + Value *Variable = NamedValues[LHSE->getName()]; + if (Variable == 0) return ErrorV("Unknown variable name"); + + Builder.CreateStore(Val, Variable); + return Val; + } + + Value *L = LHS->Codegen(); + Value *R = RHS->Codegen(); + if (L == 0 || R == 0) return 0; + + switch (Op) { + case '+': return Builder.CreateFAdd(L, R, "addtmp"); + case '-': return Builder.CreateFSub(L, R, "subtmp"); + case '*': return Builder.CreateFMul(L, R, "multmp"); + case '<': + L = Builder.CreateFCmpULT(L, R, "cmptmp"); + // Convert bool 0/1 to double 0.0 or 1.0 + return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), + "booltmp"); + default: break; + } + + // If it wasn't a builtin binary operator, it must be a user defined one. Emit + // a call to it. + Function *F = TheModule->getFunction(std::string("binary")+Op); + assert(F && "binary operator not found!"); + + Value *Ops[2] = { L, R }; + return Builder.CreateCall(F, Ops, "binop"); + } + + Value *CallExprAST::Codegen() { + // Look up the name in the global module table. + Function *CalleeF = TheModule->getFunction(Callee); + if (CalleeF == 0) + return ErrorV("Unknown function referenced"); + + // If argument mismatch error. 
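+    // (Every Kaleidoscope value is a double, so checking the argument count
+    // is the only signature check we need.)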
+ if (CalleeF->arg_size() != Args.size()) + return ErrorV("Incorrect # arguments passed"); + + std::vector<Value*> ArgsV; + for (unsigned i = 0, e = Args.size(); i != e; ++i) { + ArgsV.push_back(Args[i]->Codegen()); + if (ArgsV.back() == 0) return 0; + } + + return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); + } + + Value *IfExprAST::Codegen() { + Value *CondV = Cond->Codegen(); + if (CondV == 0) return 0; + + // Convert condition to a bool by comparing equal to 0.0. + CondV = Builder.CreateFCmpONE(CondV, + ConstantFP::get(getGlobalContext(), APFloat(0.0)), + "ifcond"); + + Function *TheFunction = Builder.GetInsertBlock()->getParent(); + + // Create blocks for the then and else cases. Insert the 'then' block at the + // end of the function. + BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); + BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); + BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); + + Builder.CreateCondBr(CondV, ThenBB, ElseBB); + + // Emit then value. + Builder.SetInsertPoint(ThenBB); + + Value *ThenV = Then->Codegen(); + if (ThenV == 0) return 0; + + Builder.CreateBr(MergeBB); + // Codegen of 'Then' can change the current block, update ThenBB for the PHI. + ThenBB = Builder.GetInsertBlock(); + + // Emit else block. + TheFunction->getBasicBlockList().push_back(ElseBB); + Builder.SetInsertPoint(ElseBB); + + Value *ElseV = Else->Codegen(); + if (ElseV == 0) return 0; + + Builder.CreateBr(MergeBB); + // Codegen of 'Else' can change the current block, update ElseBB for the PHI. + ElseBB = Builder.GetInsertBlock(); + + // Emit merge block. + TheFunction->getBasicBlockList().push_back(MergeBB); + Builder.SetInsertPoint(MergeBB); + PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, + "iftmp"); + + PN->addIncoming(ThenV, ThenBB); + PN->addIncoming(ElseV, ElseBB); + return PN; + } + + Value *ForExprAST::Codegen() { + // Output this as: + // var = alloca double + // ... + // start = startexpr + // store start -> var + // goto loop + // loop: + // ... + // bodyexpr + // ... + // loopend: + // step = stepexpr + // endcond = endexpr + // + // curvar = load var + // nextvar = curvar + step + // store nextvar -> var + // br endcond, loop, endloop + // outloop: + + Function *TheFunction = Builder.GetInsertBlock()->getParent(); + + // Create an alloca for the variable in the entry block. + AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); + + // Emit the start code first, without 'variable' in scope. + Value *StartVal = Start->Codegen(); + if (StartVal == 0) return 0; + + // Store the value into the alloca. + Builder.CreateStore(StartVal, Alloca); + + // Make the new basic block for the loop header, inserting after current + // block. + BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); + + // Insert an explicit fall through from the current block to the LoopBB. + Builder.CreateBr(LoopBB); + + // Start insertion in LoopBB. + Builder.SetInsertPoint(LoopBB); + + // Within the loop, the variable is defined equal to the PHI node. If it + // shadows an existing variable, we have to restore it, so save it now. + AllocaInst *OldVal = NamedValues[VarName]; + NamedValues[VarName] = Alloca; + + // Emit the body of the loop. This, like any other expr, can change the + // current BB. Note that we ignore the value computed by the body, but don't + // allow an error. + if (Body->Codegen() == 0) + return 0; + + // Emit the step value. 
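+    // (The step and the end condition are emitted after the body, at the
+    // bottom of the loop, matching the layout sketched in the comment at the
+    // top of this function.)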
+ Value *StepVal; + if (Step) { + StepVal = Step->Codegen(); + if (StepVal == 0) return 0; + } else { + // If not specified, use 1.0. + StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); + } + + // Compute the end condition. + Value *EndCond = End->Codegen(); + if (EndCond == 0) return EndCond; + + // Reload, increment, and restore the alloca. This handles the case where + // the body of the loop mutates the variable. + Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str()); + Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar"); + Builder.CreateStore(NextVar, Alloca); + + // Convert condition to a bool by comparing equal to 0.0. + EndCond = Builder.CreateFCmpONE(EndCond, + ConstantFP::get(getGlobalContext(), APFloat(0.0)), + "loopcond"); + + // Create the "after loop" block and insert it. + BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); + + // Insert the conditional branch into the end of LoopEndBB. + Builder.CreateCondBr(EndCond, LoopBB, AfterBB); + + // Any new code will be inserted in AfterBB. + Builder.SetInsertPoint(AfterBB); + + // Restore the unshadowed variable. + if (OldVal) + NamedValues[VarName] = OldVal; + else + NamedValues.erase(VarName); + + + // for expr always returns 0.0. + return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); + } + + Value *VarExprAST::Codegen() { + std::vector<AllocaInst *> OldBindings; + + Function *TheFunction = Builder.GetInsertBlock()->getParent(); + + // Register all variables and emit their initializer. + for (unsigned i = 0, e = VarNames.size(); i != e; ++i) { + const std::string &VarName = VarNames[i].first; + ExprAST *Init = VarNames[i].second; + + // Emit the initializer before adding the variable to scope, this prevents + // the initializer from referencing the variable itself, and permits stuff + // like this: + // var a = 1 in + // var a = a in ... # refers to outer 'a'. + Value *InitVal; + if (Init) { + InitVal = Init->Codegen(); + if (InitVal == 0) return 0; + } else { // If not specified, use 0.0. + InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0)); + } + + AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); + Builder.CreateStore(InitVal, Alloca); + + // Remember the old variable binding so that we can restore the binding when + // we unrecurse. + OldBindings.push_back(NamedValues[VarName]); + + // Remember this binding. + NamedValues[VarName] = Alloca; + } + + // Codegen the body, now that all vars are in scope. + Value *BodyVal = Body->Codegen(); + if (BodyVal == 0) return 0; + + // Pop all our variables from scope. + for (unsigned i = 0, e = VarNames.size(); i != e; ++i) + NamedValues[VarNames[i].first] = OldBindings[i]; + + // Return the body computation. + return BodyVal; + } + + Function *PrototypeAST::Codegen() { + // Make the function type: double(double,double) etc. + std::vector<Type*> Doubles(Args.size(), + Type::getDoubleTy(getGlobalContext())); + FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), + Doubles, false); + + Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); + + // If F conflicted, there was already something named 'Name'. If it has a + // body, don't allow redefinition or reextern. + if (F->getName() != Name) { + // Delete the one we just made and get the existing one. + F->eraseFromParent(); + F = TheModule->getFunction(Name); + + // If F already has a body, reject this. 
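+    // (A previous 'extern' leaves the function without any basic blocks, so
+    // empty() distinguishes a forward declaration from an actual definition.)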
+ if (!F->empty()) { + ErrorF("redefinition of function"); + return 0; + } + + // If F took a different number of args, reject. + if (F->arg_size() != Args.size()) { + ErrorF("redefinition of function with different # args"); + return 0; + } + } + + // Set names for all arguments. + unsigned Idx = 0; + for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); + ++AI, ++Idx) + AI->setName(Args[Idx]); + + return F; + } + + /// CreateArgumentAllocas - Create an alloca for each argument and register the + /// argument in the symbol table so that references to it will succeed. + void PrototypeAST::CreateArgumentAllocas(Function *F) { + Function::arg_iterator AI = F->arg_begin(); + for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) { + // Create an alloca for this variable. + AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]); + + // Store the initial value into the alloca. + Builder.CreateStore(AI, Alloca); + + // Add arguments to variable symbol table. + NamedValues[Args[Idx]] = Alloca; + } + } + + Function *FunctionAST::Codegen() { + NamedValues.clear(); + + Function *TheFunction = Proto->Codegen(); + if (TheFunction == 0) + return 0; + + // If this is an operator, install it. + if (Proto->isBinaryOp()) + BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence(); + + // Create a new basic block to start insertion into. + BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); + Builder.SetInsertPoint(BB); + + // Add all arguments to the symbol table and create their allocas. + Proto->CreateArgumentAllocas(TheFunction); + + if (Value *RetVal = Body->Codegen()) { + // Finish off the function. + Builder.CreateRet(RetVal); + + // Validate the generated code, checking for consistency. + verifyFunction(*TheFunction); + + // Optimize the function. + TheFPM->run(*TheFunction); + + return TheFunction; + } + + // Error reading body, remove function. + TheFunction->eraseFromParent(); + + if (Proto->isBinaryOp()) + BinopPrecedence.erase(Proto->getOperatorName()); + return 0; + } + + //===----------------------------------------------------------------------===// + // Top-Level parsing and JIT Driver + //===----------------------------------------------------------------------===// + + static ExecutionEngine *TheExecutionEngine; + + static void HandleDefinition() { + if (FunctionAST *F = ParseDefinition()) { + if (Function *LF = F->Codegen()) { + fprintf(stderr, "Read function definition:"); + LF->dump(); + } + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + static void HandleExtern() { + if (PrototypeAST *P = ParseExtern()) { + if (Function *F = P->Codegen()) { + fprintf(stderr, "Read extern: "); + F->dump(); + } + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + static void HandleTopLevelExpression() { + // Evaluate a top-level expression into an anonymous function. + if (FunctionAST *F = ParseTopLevelExpr()) { + if (Function *LF = F->Codegen()) { + // JIT the function, returning a function pointer. + void *FPtr = TheExecutionEngine->getPointerToFunction(LF); + + // Cast it to the right type (takes no arguments, returns a double) so we + // can call it as a native function. + double (*FP)() = (double (*)())(intptr_t)FPtr; + fprintf(stderr, "Evaluated to %f\n", FP()); + } + } else { + // Skip token for error recovery. 
+ getNextToken(); + } + } + + /// top ::= definition | external | expression | ';' + static void MainLoop() { + while (1) { + fprintf(stderr, "ready> "); + switch (CurTok) { + case tok_eof: return; + case ';': getNextToken(); break; // ignore top-level semicolons. + case tok_def: HandleDefinition(); break; + case tok_extern: HandleExtern(); break; + default: HandleTopLevelExpression(); break; + } + } + } + + //===----------------------------------------------------------------------===// + // "Library" functions that can be "extern'd" from user code. + //===----------------------------------------------------------------------===// + + /// putchard - putchar that takes a double and returns 0. + extern "C" + double putchard(double X) { + putchar((char)X); + return 0; + } + + /// printd - printf that takes a double prints it as "%f\n", returning 0. + extern "C" + double printd(double X) { + printf("%f\n", X); + return 0; + } + + //===----------------------------------------------------------------------===// + // Main driver code. + //===----------------------------------------------------------------------===// + + int main() { + InitializeNativeTarget(); + LLVMContext &Context = getGlobalContext(); + + // Install standard binary operators. + // 1 is lowest precedence. + BinopPrecedence['='] = 2; + BinopPrecedence['<'] = 10; + BinopPrecedence['+'] = 20; + BinopPrecedence['-'] = 20; + BinopPrecedence['*'] = 40; // highest. + + // Prime the first token. + fprintf(stderr, "ready> "); + getNextToken(); + + // Make the module, which holds all the code. + TheModule = new Module("my cool jit", Context); + + // Create the JIT. This takes ownership of the module. + std::string ErrStr; + TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); + if (!TheExecutionEngine) { + fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); + exit(1); + } + + FunctionPassManager OurFPM(TheModule); + + // Set up the optimizer pipeline. Start with registering info about how the + // target lays out data structures. + OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout())); + // Provide basic AliasAnalysis support for GVN. + OurFPM.add(createBasicAliasAnalysisPass()); + // Promote allocas to registers. + OurFPM.add(createPromoteMemoryToRegisterPass()); + // Do simple "peephole" optimizations and bit-twiddling optzns. + OurFPM.add(createInstructionCombiningPass()); + // Reassociate expressions. + OurFPM.add(createReassociatePass()); + // Eliminate Common SubExpressions. + OurFPM.add(createGVNPass()); + // Simplify the control flow graph (deleting unreachable blocks, etc). + OurFPM.add(createCFGSimplificationPass()); + + OurFPM.doInitialization(); + + // Set the global so the code gen can use this. + TheFPM = &OurFPM; + + // Run the main "interpreter loop" now. + MainLoop(); + + TheFPM = 0; + + // Print out all of the generated code. 
+ TheModule->dump(); + + return 0; + } + +`Next: Conclusion and other useful LLVM tidbits <LangImpl8.html>`_ + diff --git a/docs/tutorial/LangImpl8.html b/docs/tutorial/LangImpl8.html deleted file mode 100644 index 7c1a500a21..0000000000 --- a/docs/tutorial/LangImpl8.html +++ /dev/null @@ -1,359 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Conclusion and other useful LLVM tidbits</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <link rel="stylesheet" href="../_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>Kaleidoscope: Conclusion and other useful LLVM tidbits</h1> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 8 - <ol> - <li><a href="#conclusion">Tutorial Conclusion</a></li> - <li><a href="#llvmirproperties">Properties of LLVM IR</a> - <ul> - <li><a href="#targetindep">Target Independence</a></li> - <li><a href="#safety">Safety Guarantees</a></li> - <li><a href="#langspecific">Language-Specific Optimizations</a></li> - </ul> - </li> - <li><a href="#tipsandtricks">Tips and Tricks</a> - <ul> - <li><a href="#offsetofsizeof">Implementing portable - offsetof/sizeof</a></li> - <li><a href="#gcstack">Garbage Collected Stack Frames</a></li> - </ul> - </li> - </ol> -</li> -</ul> - - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="conclusion">Tutorial Conclusion</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Welcome to the final chapter of the "<a href="index.html">Implementing a -language with LLVM</a>" tutorial. In the course of this tutorial, we have grown -our little Kaleidoscope language from being a useless toy, to being a -semi-interesting (but probably still useless) toy. :)</p> - -<p>It is interesting to see how far we've come, and how little code it has -taken. We built the entire lexer, parser, AST, code generator, and an -interactive run-loop (with a JIT!) by-hand in under 700 lines of -(non-comment/non-blank) code.</p> - -<p>Our little language supports a couple of interesting features: it supports -user defined binary and unary operators, it uses JIT compilation for immediate -evaluation, and it supports a few control flow constructs with SSA construction. -</p> - -<p>Part of the idea of this tutorial was to show you how easy and fun it can be -to define, build, and play with languages. Building a compiler need not be a -scary or mystical process! Now that you've seen some of the basics, I strongly -encourage you to take the code and hack on it. For example, try adding:</p> - -<ul> -<li><b>global variables</b> - While global variables have questional value in -modern software engineering, they are often useful when putting together quick -little hacks like the Kaleidoscope compiler itself. Fortunately, our current -setup makes it very easy to add global variables: just have value lookup check -to see if an unresolved variable is in the global variable symbol table before -rejecting it. To create a new global variable, make an instance of the LLVM -<tt>GlobalVariable</tt> class.</li> - -<li><b>typed variables</b> - Kaleidoscope currently only supports variables of -type double. 
This gives the language a very nice elegance, because only -supporting one type means that you never have to specify types. Different -languages have different ways of handling this. The easiest way is to require -the user to specify types for every variable definition, and record the type -of the variable in the symbol table along with its Value*.</li> - -<li><b>arrays, structs, vectors, etc</b> - Once you add types, you can start -extending the type system in all sorts of interesting ways. Simple arrays are -very easy and are quite useful for many different applications. Adding them is -mostly an exercise in learning how the LLVM <a -href="../LangRef.html#i_getelementptr">getelementptr</a> instruction works: it -is so nifty/unconventional, it <a -href="../GetElementPtr.html">has its own FAQ</a>! If you add support -for recursive types (e.g. linked lists), make sure to read the <a -href="../ProgrammersManual.html#TypeResolve">section in the LLVM -Programmer's Manual</a> that describes how to construct them.</li> - -<li><b>standard runtime</b> - Our current language allows the user to access -arbitrary external functions, and we use it for things like "printd" and -"putchard". As you extend the language to add higher-level constructs, often -these constructs make the most sense if they are lowered to calls into a -language-supplied runtime. For example, if you add hash tables to the language, -it would probably make sense to add the routines to a runtime, instead of -inlining them all the way.</li> - -<li><b>memory management</b> - Currently we can only access the stack in -Kaleidoscope. It would also be useful to be able to allocate heap memory, -either with calls to the standard libc malloc/free interface or with a garbage -collector. If you would like to use garbage collection, note that LLVM fully -supports <a href="../GarbageCollection.html">Accurate Garbage Collection</a> -including algorithms that move objects and need to scan/update the stack.</li> - -<li><b>debugger support</b> - LLVM supports generation of <a -href="../SourceLevelDebugging.html">DWARF Debug info</a> which is understood by -common debuggers like GDB. Adding support for debug info is fairly -straightforward. The best way to understand it is to compile some C/C++ code -with "<tt>llvm-gcc -g -O0</tt>" and taking a look at what it produces.</li> - -<li><b>exception handling support</b> - LLVM supports generation of <a -href="../ExceptionHandling.html">zero cost exceptions</a> which interoperate -with code compiled in other languages. You could also generate code by -implicitly making every function return an error value and checking it. You -could also make explicit use of setjmp/longjmp. There are many different ways -to go here.</li> - -<li><b>object orientation, generics, database access, complex numbers, -geometric programming, ...</b> - Really, there is -no end of crazy features that you can add to the language.</li> - -<li><b>unusual domains</b> - We've been talking about applying LLVM to a domain -that many people are interested in: building a compiler for a specific language. -However, there are many other domains that can use compiler technology that are -not typically considered. For example, LLVM has been used to implement OpenGL -graphics acceleration, translate C++ code to ActionScript, and many other -cute and clever things. Maybe you will be the first to JIT compile a regular -expression interpreter into native code with LLVM?</li> - -</ul> - -<p> -Have fun - try doing something crazy and unusual. 
Building a language like -everyone else always has, is much less fun than trying something a little crazy -or off the wall and seeing how it turns out. If you get stuck or want to talk -about it, feel free to email the <a -href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">llvmdev mailing -list</a>: it has lots of people who are interested in languages and are often -willing to help out. -</p> - -<p>Before we end this tutorial, I want to talk about some "tips and tricks" for generating -LLVM IR. These are some of the more subtle things that may not be obvious, but -are very useful if you want to take advantage of LLVM's capabilities.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="llvmirproperties">Properties of the LLVM IR</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>We have a couple common questions about code in the LLVM IR form - lets just -get these out of the way right now, shall we?</p> - -<!-- ======================================================================= --> -<h4><a name="targetindep">Target Independence</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>Kaleidoscope is an example of a "portable language": any program written in -Kaleidoscope will work the same way on any target that it runs on. Many other -languages have this property, e.g. lisp, java, haskell, javascript, python, etc -(note that while these languages are portable, not all their libraries are).</p> - -<p>One nice aspect of LLVM is that it is often capable of preserving target -independence in the IR: you can take the LLVM IR for a Kaleidoscope-compiled -program and run it on any target that LLVM supports, even emitting C code and -compiling that on targets that LLVM doesn't support natively. You can trivially -tell that the Kaleidoscope compiler generates target-independent code because it -never queries for any target-specific information when generating code.</p> - -<p>The fact that LLVM provides a compact, target-independent, representation for -code gets a lot of people excited. Unfortunately, these people are usually -thinking about C or a language from the C family when they are asking questions -about language portability. I say "unfortunately", because there is really no -way to make (fully general) C code portable, other than shipping the source code -around (and of course, C source code is not actually portable in general -either - ever port a really old application from 32- to 64-bits?).</p> - -<p>The problem with C (again, in its full generality) is that it is heavily -laden with target specific assumptions. As one simple example, the preprocessor -often destructively removes target-independence from the code when it processes -the input text:</p> - -<div class="doc_code"> -<pre> -#ifdef __i386__ - int X = 1; -#else - int X = 42; -#endif -</pre> -</div> - -<p>While it is possible to engineer more and more complex solutions to problems -like this, it cannot be solved in full generality in a way that is better than shipping -the actual source code.</p> - -<p>That said, there are interesting subsets of C that can be made portable. If -you are willing to fix primitive types to a fixed size (say int = 32-bits, -and long = 64-bits), don't care about ABI compatibility with existing binaries, -and are willing to give up some other minor features, you can have portable -code. 
This can make sense for specialized domains such as an -in-kernel language.</p> - -</div> - -<!-- ======================================================================= --> -<h4><a name="safety">Safety Guarantees</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>Many of the languages above are also "safe" languages: it is impossible for -a program written in Java to corrupt its address space and crash the process -(assuming the JVM has no bugs). -Safety is an interesting property that requires a combination of language -design, runtime support, and often operating system support.</p> - -<p>It is certainly possible to implement a safe language in LLVM, but LLVM IR -does not itself guarantee safety. The LLVM IR allows unsafe pointer casts, -use after free bugs, buffer over-runs, and a variety of other problems. Safety -needs to be implemented as a layer on top of LLVM and, conveniently, several -groups have investigated this. Ask on the <a -href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">llvmdev mailing -list</a> if you are interested in more details.</p> - -</div> - -<!-- ======================================================================= --> -<h4><a name="langspecific">Language-Specific Optimizations</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>One thing about LLVM that turns off many people is that it does not solve all -the world's problems in one system (sorry 'world hunger', someone else will have -to solve you some other day). One specific complaint is that people perceive -LLVM as being incapable of performing high-level language-specific optimization: -LLVM "loses too much information".</p> - -<p>Unfortunately, this is really not the place to give you a full and unified -version of "Chris Lattner's theory of compiler design". Instead, I'll make a -few observations:</p> - -<p>First, you're right that LLVM does lose information. For example, as of this -writing, there is no way to distinguish in the LLVM IR whether an SSA-value came -from a C "int" or a C "long" on an ILP32 machine (other than debug info). Both -get compiled down to an 'i32' value and the information about what it came from -is lost. The more general issue here, is that the LLVM type system uses -"structural equivalence" instead of "name equivalence". Another place this -surprises people is if you have two types in a high-level language that have the -same structure (e.g. two different structs that have a single int field): these -types will compile down into a single LLVM type and it will be impossible to -tell what it came from.</p> - -<p>Second, while LLVM does lose information, LLVM is not a fixed target: we -continue to enhance and improve it in many different ways. In addition to -adding new features (LLVM did not always support exceptions or debug info), we -also extend the IR to capture important information for optimization (e.g. -whether an argument is sign or zero extended, information about pointers -aliasing, etc). Many of the enhancements are user-driven: people want LLVM to -include some specific feature, so they go ahead and extend it.</p> - -<p>Third, it is <em>possible and easy</em> to add language-specific -optimizations, and you have a number of choices in how to do it. As one trivial -example, it is easy to add language-specific optimization passes that -"know" things about code compiled for a language. 
In the case of the C family, -there is an optimization pass that "knows" about the standard C library -functions. If you call "exit(0)" in main(), it knows that it is safe to -optimize that into "return 0;" because C specifies what the 'exit' -function does.</p> - -<p>In addition to simple library knowledge, it is possible to embed a variety of -other language-specific information into the LLVM IR. If you have a specific -need and run into a wall, please bring the topic up on the llvmdev list. At the -very worst, you can always treat LLVM as if it were a "dumb code generator" and -implement the high-level optimizations you desire in your front-end, on the -language-specific AST. -</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="tipsandtricks">Tips and Tricks</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>There is a variety of useful tips and tricks that you come to know after -working on/with LLVM that aren't obvious at first glance. Instead of letting -everyone rediscover them, this section talks about some of these issues.</p> - -<!-- ======================================================================= --> -<h4><a name="offsetofsizeof">Implementing portable offsetof/sizeof</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>One interesting thing that comes up, if you are trying to keep the code -generated by your compiler "target independent", is that you often need to know -the size of some LLVM type or the offset of some field in an llvm structure. -For example, you might need to pass the size of a type into a function that -allocates memory.</p> - -<p>Unfortunately, this can vary widely across targets: for example the width of -a pointer is trivially target-specific. However, there is a <a -href="http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt">clever -way to use the getelementptr instruction</a> that allows you to compute this -in a portable way.</p> - -</div> - -<!-- ======================================================================= --> -<h4><a name="gcstack">Garbage Collected Stack Frames</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>Some languages want to explicitly manage their stack frames, often so that -they are garbage collected or to allow easy implementation of closures. There -are often better ways to implement these features than explicit stack frames, -but <a -href="http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt">LLVM -does support them,</a> if you want. 
It requires your front-end to convert the -code into <a -href="http://en.wikipedia.org/wiki/Continuation-passing_style">Continuation -Passing Style</a> and the use of tail calls (which LLVM also supports).</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/LangImpl8.rst b/docs/tutorial/LangImpl8.rst new file mode 100644 index 0000000000..4058991f19 --- /dev/null +++ b/docs/tutorial/LangImpl8.rst @@ -0,0 +1,269 @@ +====================================================== +Kaleidoscope: Conclusion and other useful LLVM tidbits +====================================================== + +.. contents:: + :local: + +Written by `Chris Lattner <mailto:sabre@nondot.org>`_ + +Tutorial Conclusion +=================== + +Welcome to the final chapter of the "`Implementing a language with +LLVM <index.html>`_" tutorial. In the course of this tutorial, we have +grown our little Kaleidoscope language from being a useless toy, to +being a semi-interesting (but probably still useless) toy. :) + +It is interesting to see how far we've come, and how little code it has +taken. We built the entire lexer, parser, AST, code generator, and an +interactive run-loop (with a JIT!) by-hand in under 700 lines of +(non-comment/non-blank) code. + +Our little language supports a couple of interesting features: it +supports user defined binary and unary operators, it uses JIT +compilation for immediate evaluation, and it supports a few control flow +constructs with SSA construction. + +Part of the idea of this tutorial was to show you how easy and fun it +can be to define, build, and play with languages. Building a compiler +need not be a scary or mystical process! Now that you've seen some of +the basics, I strongly encourage you to take the code and hack on it. +For example, try adding: + +- **global variables** - While global variables have questional value + in modern software engineering, they are often useful when putting + together quick little hacks like the Kaleidoscope compiler itself. + Fortunately, our current setup makes it very easy to add global + variables: just have value lookup check to see if an unresolved + variable is in the global variable symbol table before rejecting it. + To create a new global variable, make an instance of the LLVM + ``GlobalVariable`` class. +- **typed variables** - Kaleidoscope currently only supports variables + of type double. This gives the language a very nice elegance, because + only supporting one type means that you never have to specify types. + Different languages have different ways of handling this. The easiest + way is to require the user to specify types for every variable + definition, and record the type of the variable in the symbol table + along with its Value\*. +- **arrays, structs, vectors, etc** - Once you add types, you can start + extending the type system in all sorts of interesting ways. Simple + arrays are very easy and are quite useful for many different + applications. 
Adding them is mostly an exercise in learning how the
+  LLVM `getelementptr <../LangRef.html#i_getelementptr>`_ instruction
+  works: it is so nifty/unconventional, it `has its own
+  FAQ <../GetElementPtr.html>`_! If you add support for recursive types
+  (e.g. linked lists), make sure to read the `section in the LLVM
+  Programmer's Manual <../ProgrammersManual.html#TypeResolve>`_ that
+  describes how to construct them.
+- **standard runtime** - Our current language allows the user to access
+  arbitrary external functions, and we use it for things like "printd"
+  and "putchard". As you extend the language to add higher-level
+  constructs, often these constructs make the most sense if they are
+  lowered to calls into a language-supplied runtime. For example, if
+  you add hash tables to the language, it would probably make sense to
+  add the routines to a runtime, instead of inlining them all the way.
+- **memory management** - Currently we can only access the stack in
+  Kaleidoscope. It would also be useful to be able to allocate heap
+  memory, either with calls to the standard libc malloc/free interface
+  or with a garbage collector. If you would like to use garbage
+  collection, note that LLVM fully supports `Accurate Garbage
+  Collection <../GarbageCollection.html>`_ including algorithms that
+  move objects and need to scan/update the stack.
+- **debugger support** - LLVM supports generation of `DWARF Debug
+  info <../SourceLevelDebugging.html>`_ which is understood by common
+  debuggers like GDB. Adding support for debug info is fairly
+  straightforward. The best way to understand it is to compile some
+  C/C++ code with "``llvm-gcc -g -O0``" and take a look at what it
+  produces.
+- **exception handling support** - LLVM supports generation of `zero
+  cost exceptions <../ExceptionHandling.html>`_ which interoperate with
+  code compiled in other languages. You could also generate code by
+  implicitly making every function return an error value and checking
+  it. You could also make explicit use of setjmp/longjmp. There are
+  many different ways to go here.
+- **object orientation, generics, database access, complex numbers,
+  geometric programming, ...** - Really, there is no end of crazy
+  features that you can add to the language.
+- **unusual domains** - We've been talking about applying LLVM to a
+  domain that many people are interested in: building a compiler for a
+  specific language. However, there are many other domains that can use
+  compiler technology that are not typically considered. For example,
+  LLVM has been used to implement OpenGL graphics acceleration,
+  translate C++ code to ActionScript, and many other cute and clever
+  things. Maybe you will be the first to JIT compile a regular
+  expression interpreter into native code with LLVM?
+
+Have fun - try doing something crazy and unusual. Building a language
+like everyone else always has is much less fun than trying something a
+little crazy or off the wall and seeing how it turns out. If you get
+stuck or want to talk about it, feel free to email the `llvmdev mailing
+list <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_: it has lots
+of people who are interested in languages and are often willing to help
+out.
+
+Before we end this tutorial, I want to talk about some "tips and tricks"
+for generating LLVM IR. These are some of the more subtle things that
+may not be obvious, but are very useful if you want to take advantage of
+LLVM's capabilities.
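+
+As a hint of how little code most of the extensions above need, here is a
+hedged sketch of the first one, global variables. The ``GlobalValues`` table
+and ``GetOrCreateGlobal`` helper are names made up for this example, not part
+of the tutorial, and the sketch assumes the tutorial's existing
+``NamedValues``, ``TheModule``, ``Builder`` and ``ErrorV``:
+
+.. code-block:: c++
+
+    // Sketch only (hypothetical names): a second, module-level symbol table.
+    static std::map<std::string, GlobalVariable*> GlobalValues;
+
+    // Create (or reuse) a double-typed global initialized to 0.0. Call this
+    // from wherever your grammar introduces a global definition.
+    static GlobalVariable *GetOrCreateGlobal(const std::string &Name) {
+      if (GlobalVariable *GV = GlobalValues[Name])
+        return GV;
+      GlobalVariable *GV = new GlobalVariable(
+          *TheModule, Type::getDoubleTy(getGlobalContext()),
+          /*isConstant=*/false, GlobalValue::ExternalLinkage,
+          ConstantFP::get(getGlobalContext(), APFloat(0.0)), Name);
+      return GlobalValues[Name] = GV;
+    }
+
+    // Variable lookup falls back to the global table before rejecting the name.
+    Value *VariableExprAST::Codegen() {
+      if (Value *V = NamedValues[Name])
+        return Builder.CreateLoad(V, Name.c_str());
+      if (GlobalVariable *GV = GlobalValues[Name])
+        return Builder.CreateLoad(GV, Name.c_str());
+      return ErrorV("Unknown variable name");
+    }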
+ +Properties of the LLVM IR +========================= + +We have a couple common questions about code in the LLVM IR form - lets +just get these out of the way right now, shall we? + +Target Independence +------------------- + +Kaleidoscope is an example of a "portable language": any program written +in Kaleidoscope will work the same way on any target that it runs on. +Many other languages have this property, e.g. lisp, java, haskell, +javascript, python, etc (note that while these languages are portable, +not all their libraries are). + +One nice aspect of LLVM is that it is often capable of preserving target +independence in the IR: you can take the LLVM IR for a +Kaleidoscope-compiled program and run it on any target that LLVM +supports, even emitting C code and compiling that on targets that LLVM +doesn't support natively. You can trivially tell that the Kaleidoscope +compiler generates target-independent code because it never queries for +any target-specific information when generating code. + +The fact that LLVM provides a compact, target-independent, +representation for code gets a lot of people excited. Unfortunately, +these people are usually thinking about C or a language from the C +family when they are asking questions about language portability. I say +"unfortunately", because there is really no way to make (fully general) +C code portable, other than shipping the source code around (and of +course, C source code is not actually portable in general either - ever +port a really old application from 32- to 64-bits?). + +The problem with C (again, in its full generality) is that it is heavily +laden with target specific assumptions. As one simple example, the +preprocessor often destructively removes target-independence from the +code when it processes the input text: + +.. code-block:: c + + #ifdef __i386__ + int X = 1; + #else + int X = 42; + #endif + +While it is possible to engineer more and more complex solutions to +problems like this, it cannot be solved in full generality in a way that +is better than shipping the actual source code. + +That said, there are interesting subsets of C that can be made portable. +If you are willing to fix primitive types to a fixed size (say int = +32-bits, and long = 64-bits), don't care about ABI compatibility with +existing binaries, and are willing to give up some other minor features, +you can have portable code. This can make sense for specialized domains +such as an in-kernel language. + +Safety Guarantees +----------------- + +Many of the languages above are also "safe" languages: it is impossible +for a program written in Java to corrupt its address space and crash the +process (assuming the JVM has no bugs). Safety is an interesting +property that requires a combination of language design, runtime +support, and often operating system support. + +It is certainly possible to implement a safe language in LLVM, but LLVM +IR does not itself guarantee safety. The LLVM IR allows unsafe pointer +casts, use after free bugs, buffer over-runs, and a variety of other +problems. Safety needs to be implemented as a layer on top of LLVM and, +conveniently, several groups have investigated this. Ask on the `llvmdev +mailing list <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_ if +you are interested in more details. 
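+
+To make the point concrete, here is a small, deliberately unsafe sketch (our
+own illustration, not tutorial code, reusing the tutorial's ``Builder``): the
+IR builder will happily fabricate a pointer from an arbitrary integer and load
+through it, and the verifier only checks that the IR is structurally well
+formed, not that it is memory safe.
+
+.. code-block:: c++
+
+    // Sketch only: structurally valid IR that no "safe" language would permit.
+    Value *Addr = ConstantInt::get(Type::getInt64Ty(getGlobalContext()),
+                                   0xdeadbeef);
+    Value *Ptr = Builder.CreateIntToPtr(
+        Addr, Type::getDoubleTy(getGlobalContext())->getPointerTo(), "badptr");
+    // A load through an arbitrary address is perfectly legal at the IR level...
+    Builder.CreateLoad(Ptr, "unsafe");
+    // ...so any safety guarantees have to be enforced by the front-end.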
+ +Language-Specific Optimizations +------------------------------- + +One thing about LLVM that turns off many people is that it does not +solve all the world's problems in one system (sorry 'world hunger', +someone else will have to solve you some other day). One specific +complaint is that people perceive LLVM as being incapable of performing +high-level language-specific optimization: LLVM "loses too much +information". + +Unfortunately, this is really not the place to give you a full and +unified version of "Chris Lattner's theory of compiler design". Instead, +I'll make a few observations: + +First, you're right that LLVM does lose information. For example, as of +this writing, there is no way to distinguish in the LLVM IR whether an +SSA-value came from a C "int" or a C "long" on an ILP32 machine (other +than debug info). Both get compiled down to an 'i32' value and the +information about what it came from is lost. The more general issue +here, is that the LLVM type system uses "structural equivalence" instead +of "name equivalence". Another place this surprises people is if you +have two types in a high-level language that have the same structure +(e.g. two different structs that have a single int field): these types +will compile down into a single LLVM type and it will be impossible to +tell what it came from. + +Second, while LLVM does lose information, LLVM is not a fixed target: we +continue to enhance and improve it in many different ways. In addition +to adding new features (LLVM did not always support exceptions or debug +info), we also extend the IR to capture important information for +optimization (e.g. whether an argument is sign or zero extended, +information about pointers aliasing, etc). Many of the enhancements are +user-driven: people want LLVM to include some specific feature, so they +go ahead and extend it. + +Third, it is *possible and easy* to add language-specific optimizations, +and you have a number of choices in how to do it. As one trivial +example, it is easy to add language-specific optimization passes that +"know" things about code compiled for a language. In the case of the C +family, there is an optimization pass that "knows" about the standard C +library functions. If you call "exit(0)" in main(), it knows that it is +safe to optimize that into "return 0;" because C specifies what the +'exit' function does. + +In addition to simple library knowledge, it is possible to embed a +variety of other language-specific information into the LLVM IR. If you +have a specific need and run into a wall, please bring the topic up on +the llvmdev list. At the very worst, you can always treat LLVM as if it +were a "dumb code generator" and implement the high-level optimizations +you desire in your front-end, on the language-specific AST. + +Tips and Tricks +=============== + +There is a variety of useful tips and tricks that you come to know after +working on/with LLVM that aren't obvious at first glance. Instead of +letting everyone rediscover them, this section talks about some of these +issues. + +Implementing portable offsetof/sizeof +------------------------------------- + +One interesting thing that comes up, if you are trying to keep the code +generated by your compiler "target independent", is that you often need +to know the size of some LLVM type or the offset of some field in an +llvm structure. For example, you might need to pass the size of a type +into a function that allocates memory. 
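+
+As a concrete preview of the portable answer described just below (a sketch
+only, written in the same ``IRBuilder`` style as the tutorial code; the
+``EmitSizeOf`` helper is a name made up for this example), the idea is to
+index one element past a null pointer of the type in question and read off
+the resulting byte offset:
+
+.. code-block:: c++
+
+    // Sketch: compute sizeof(Ty) without hard-coding any target information.
+    static Value *EmitSizeOf(IRBuilder<> &Builder, Type *Ty) {
+      // A null pointer of type Ty*.
+      Value *NullPtr = ConstantPointerNull::get(PointerType::getUnqual(Ty));
+      // GEP to element #1, i.e. one whole Ty past address zero.
+      Value *One = ConstantInt::get(Type::getInt32Ty(getGlobalContext()), 1);
+      Value *EndPtr = Builder.CreateGEP(NullPtr, One, "sizeof.gep");
+      // The integer value of that address is sizeof(Ty) on whatever target
+      // the IR is eventually compiled for.
+      return Builder.CreatePtrToInt(
+          EndPtr, Type::getInt64Ty(getGlobalContext()), "sizeof");
+    }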
+ +Unfortunately, this can vary widely across targets: for example the +width of a pointer is trivially target-specific. However, there is a +`clever way to use the getelementptr +instruction <http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt>`_ +that allows you to compute this in a portable way. + +Garbage Collected Stack Frames +------------------------------ + +Some languages want to explicitly manage their stack frames, often so +that they are garbage collected or to allow easy implementation of +closures. There are often better ways to implement these features than +explicit stack frames, but `LLVM does support +them, <http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt>`_ +if you want. It requires your front-end to convert the code into +`Continuation Passing +Style <http://en.wikipedia.org/wiki/Continuation-passing_style>`_ and +the use of tail calls (which LLVM also supports). + diff --git a/docs/tutorial/OCamlLangImpl1.html b/docs/tutorial/OCamlLangImpl1.html deleted file mode 100644 index 73fe07bb84..0000000000 --- a/docs/tutorial/OCamlLangImpl1.html +++ /dev/null @@ -1,365 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Tutorial Introduction and the Lexer</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <meta name="author" content="Erick Tryzelaar"> - <link rel="stylesheet" href="../_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>Kaleidoscope: Tutorial Introduction and the Lexer</h1> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 1 - <ol> - <li><a href="#intro">Tutorial Introduction</a></li> - <li><a href="#language">The Basic Language</a></li> - <li><a href="#lexer">The Lexer</a></li> - </ol> -</li> -<li><a href="OCamlLangImpl2.html">Chapter 2</a>: Implementing a Parser and -AST</li> -</ul> - -<div class="doc_author"> - <p> - Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a> - and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a> - </p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="intro">Tutorial Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Welcome to the "Implementing a language with LLVM" tutorial. This tutorial -runs through the implementation of a simple language, showing how fun and -easy it can be. This tutorial will get you up and started as well as help to -build a framework you can extend to other languages. The code in this tutorial -can also be used as a playground to hack on other LLVM specific things. -</p> - -<p> -The goal of this tutorial is to progressively unveil our language, describing -how it is built up over time. This will let us cover a fairly broad range of -language design and LLVM-specific usage issues, showing and explaining the code -for it all along the way, without overwhelming you with tons of details up -front.</p> - -<p>It is useful to point out ahead of time that this tutorial is really about -teaching compiler techniques and LLVM specifically, <em>not</em> about teaching -modern and sane software engineering principles. In practice, this means that -we'll take a number of shortcuts to simplify the exposition. 
For example, the -code leaks memory, uses global variables all over the place, doesn't use nice -design patterns like <a -href="http://en.wikipedia.org/wiki/Visitor_pattern">visitors</a>, etc... but it -is very simple. If you dig in and use the code as a basis for future projects, -fixing these deficiencies shouldn't be hard.</p> - -<p>I've tried to put this tutorial together in a way that makes chapters easy to -skip over if you are already familiar with or are uninterested in the various -pieces. The structure of the tutorial is: -</p> - -<ul> -<li><b><a href="#language">Chapter #1</a>: Introduction to the Kaleidoscope -language, and the definition of its Lexer</b> - This shows where we are going -and the basic functionality that we want it to do. In order to make this -tutorial maximally understandable and hackable, we choose to implement -everything in Objective Caml instead of using lexer and parser generators. -LLVM obviously works just fine with such tools, feel free to use one if you -prefer.</li> -<li><b><a href="OCamlLangImpl2.html">Chapter #2</a>: Implementing a Parser and -AST</b> - With the lexer in place, we can talk about parsing techniques and -basic AST construction. This tutorial describes recursive descent parsing and -operator precedence parsing. Nothing in Chapters 1 or 2 is LLVM-specific, -the code doesn't even link in LLVM at this point. :)</li> -<li><b><a href="OCamlLangImpl3.html">Chapter #3</a>: Code generation to LLVM -IR</b> - With the AST ready, we can show off how easy generation of LLVM IR -really is.</li> -<li><b><a href="OCamlLangImpl4.html">Chapter #4</a>: Adding JIT and Optimizer -Support</b> - Because a lot of people are interested in using LLVM as a JIT, -we'll dive right into it and show you the 3 lines it takes to add JIT support. -LLVM is also useful in many other ways, but this is one simple and "sexy" way -to shows off its power. :)</li> -<li><b><a href="OCamlLangImpl5.html">Chapter #5</a>: Extending the Language: -Control Flow</b> - With the language up and running, we show how to extend it -with control flow operations (if/then/else and a 'for' loop). This gives us a -chance to talk about simple SSA construction and control flow.</li> -<li><b><a href="OCamlLangImpl6.html">Chapter #6</a>: Extending the Language: -User-defined Operators</b> - This is a silly but fun chapter that talks about -extending the language to let the user program define their own arbitrary -unary and binary operators (with assignable precedence!). This lets us build a -significant piece of the "language" as library routines.</li> -<li><b><a href="OCamlLangImpl7.html">Chapter #7</a>: Extending the Language: -Mutable Variables</b> - This chapter talks about adding user-defined local -variables along with an assignment operator. The interesting part about this -is how easy and trivial it is to construct SSA form in LLVM: no, LLVM does -<em>not</em> require your front-end to construct SSA form!</li> -<li><b><a href="OCamlLangImpl8.html">Chapter #8</a>: Conclusion and other -useful LLVM tidbits</b> - This chapter wraps up the series by talking about -potential ways to extend the language, but also includes a bunch of pointers to -info about "special topics" like adding garbage collection support, exceptions, -debugging, support for "spaghetti stacks", and a bunch of other tips and -tricks.</li> - -</ul> - -<p>By the end of the tutorial, we'll have written a bit less than 700 lines of -non-comment, non-blank, lines of code. 
With this small amount of code, we'll -have built up a very reasonable compiler for a non-trivial language including -a hand-written lexer, parser, AST, as well as code generation support with a JIT -compiler. While other systems may have interesting "hello world" tutorials, -I think the breadth of this tutorial is a great testament to the strengths of -LLVM and why you should consider it if you're interested in language or compiler -design.</p> - -<p>A note about this tutorial: we expect you to extend the language and play -with it on your own. Take the code and go crazy hacking away at it, compilers -don't need to be scary creatures - it can be a lot of fun to play with -languages!</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="language">The Basic Language</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>This tutorial will be illustrated with a toy language that we'll call -"<a href="http://en.wikipedia.org/wiki/Kaleidoscope">Kaleidoscope</a>" (derived -from "meaning beautiful, form, and view"). -Kaleidoscope is a procedural language that allows you to define functions, use -conditionals, math, etc. Over the course of the tutorial, we'll extend -Kaleidoscope to support the if/then/else construct, a for loop, user defined -operators, JIT compilation with a simple command line interface, etc.</p> - -<p>Because we want to keep things simple, the only datatype in Kaleidoscope is a -64-bit floating point type (aka 'float' in O'Caml parlance). As such, all -values are implicitly double precision and the language doesn't require type -declarations. This gives the language a very nice and simple syntax. For -example, the following simple example computes <a -href="http://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci numbers:</a></p> - -<div class="doc_code"> -<pre> -# Compute the x'th fibonacci number. -def fib(x) - if x < 3 then - 1 - else - fib(x-1)+fib(x-2) - -# This expression will compute the 40th number. -fib(40) -</pre> -</div> - -<p>We also allow Kaleidoscope to call into standard library functions (the LLVM -JIT makes this completely trivial). This means that you can use the 'extern' -keyword to define a function before you use it (this is also useful for mutually -recursive functions). For example:</p> - -<div class="doc_code"> -<pre> -extern sin(arg); -extern cos(arg); -extern atan2(arg1 arg2); - -atan2(sin(.4), cos(42)) -</pre> -</div> - -<p>A more interesting example is included in Chapter 6 where we write a little -Kaleidoscope application that <a href="OCamlLangImpl6.html#example">displays -a Mandelbrot Set</a> at various levels of magnification.</p> - -<p>Lets dive into the implementation of this language!</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="lexer">The Lexer</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>When it comes to implementing a language, the first thing needed is -the ability to process a text file and recognize what it says. The traditional -way to do this is to use a "<a -href="http://en.wikipedia.org/wiki/Lexical_analysis">lexer</a>" (aka 'scanner') -to break the input up into "tokens". Each token returned by the lexer includes -a token code and potentially some metadata (e.g. the numeric value of a number). 
-First, we define the possibilities: -</p> - -<div class="doc_code"> -<pre> -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char -</pre> -</div> - -<p>Each token returned by our lexer will be one of the token variant values. -An unknown character like '+' will be returned as <tt>Token.Kwd '+'</tt>. If -the curr token is an identifier, the value will be <tt>Token.Ident s</tt>. If -the current token is a numeric literal (like 1.0), the value will be -<tt>Token.Number 1.0</tt>. -</p> - -<p>The actual implementation of the lexer is a collection of functions driven -by a function named <tt>Lexer.lex</tt>. The <tt>Lexer.lex</tt> function is -called to return the next token from standard input. We will use -<a href="http://caml.inria.fr/pub/docs/manual-camlp4/index.html">Camlp4</a> -to simplify the tokenization of the standard input. Its definition starts -as:</p> - -<div class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream -</pre> -</div> - -<p> -<tt>Lexer.lex</tt> works by recursing over a <tt>char Stream.t</tt> to read -characters one at a time from the standard input. It eats them as it recognizes -them and stores them in in a <tt>Token.token</tt> variant. The first thing that -it has to do is ignore whitespace between tokens. This is accomplished with the -recursive call above.</p> - -<p>The next thing <tt>Lexer.lex</tt> needs to do is recognize identifiers and -specific keywords like "def". Kaleidoscope does this with a pattern match -and a helper function.<p> - -<div class="doc_code"> -<pre> - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - -... - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | id -> [< 'Token.Ident id; stream >] -</pre> -</div> - -<p>Numeric values are similar:</p> - -<div class="doc_code"> -<pre> - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - -... - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] -</pre> -</div> - -<p>This is all pretty straight-forward code for processing input. When reading -a numeric value from input, we use the ocaml <tt>float_of_string</tt> function -to convert it to a numeric value that we store in <tt>Token.Number</tt>. Note -that this isn't doing sufficient error checking: it will raise <tt>Failure</tt> -if the string "1.23.45.67". Feel free to extend it :). Next we handle -comments: -</p> - -<div class="doc_code"> -<pre> - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - -... 
- -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] -</pre> -</div> - -<p>We handle comments by skipping to the end of the line and then return the -next token. Finally, if the input doesn't match one of the above cases, it is -either an operator character like '+' or the end of the file. These are handled -with this code:</p> - -<div class="doc_code"> -<pre> - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] -</pre> -</div> - -<p>With this, we have the complete lexer for the basic Kaleidoscope language -(the <a href="OCamlLangImpl2.html#code">full code listing</a> for the Lexer is -available in the <a href="OCamlLangImpl2.html">next chapter</a> of the -tutorial). Next we'll <a href="OCamlLangImpl2.html">build a simple parser that -uses this to build an Abstract Syntax Tree</a>. When we have that, we'll -include a driver so that you can use the lexer and parser together. -</p> - -<a href="OCamlLangImpl2.html">Next: Implementing a Parser and AST</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/OCamlLangImpl1.rst b/docs/tutorial/OCamlLangImpl1.rst new file mode 100644 index 0000000000..daa482507d --- /dev/null +++ b/docs/tutorial/OCamlLangImpl1.rst @@ -0,0 +1,288 @@ +================================================= +Kaleidoscope: Tutorial Introduction and the Lexer +================================================= + +.. contents:: + :local: + +Written by `Chris Lattner <mailto:sabre@nondot.org>`_ and `Erick +Tryzelaar <mailto:idadesub@users.sourceforge.net>`_ + +Tutorial Introduction +===================== + +Welcome to the "Implementing a language with LLVM" tutorial. This +tutorial runs through the implementation of a simple language, showing +how fun and easy it can be. This tutorial will get you up and started as +well as help to build a framework you can extend to other languages. The +code in this tutorial can also be used as a playground to hack on other +LLVM specific things. + +The goal of this tutorial is to progressively unveil our language, +describing how it is built up over time. This will let us cover a fairly +broad range of language design and LLVM-specific usage issues, showing +and explaining the code for it all along the way, without overwhelming +you with tons of details up front. + +It is useful to point out ahead of time that this tutorial is really +about teaching compiler techniques and LLVM specifically, *not* about +teaching modern and sane software engineering principles. In practice, +this means that we'll take a number of shortcuts to simplify the +exposition. For example, the code leaks memory, uses global variables +all over the place, doesn't use nice design patterns like +`visitors <http://en.wikipedia.org/wiki/Visitor_pattern>`_, etc... but +it is very simple. 
If you dig in and use the code as a basis for future
+projects, fixing these deficiencies shouldn't be hard.
+
+I've tried to put this tutorial together in a way that makes chapters
+easy to skip over if you are already familiar with or are uninterested
+in the various pieces. The structure of the tutorial is:
+
+- `Chapter #1 <#language>`_: Introduction to the Kaleidoscope
+  language, and the definition of its Lexer - This shows where we are
+  going and the basic functionality that we want it to do. In order to
+  make this tutorial maximally understandable and hackable, we choose
+  to implement everything in Objective Caml instead of using lexer and
+  parser generators. LLVM obviously works just fine with such tools,
+  feel free to use one if you prefer.
+- `Chapter #2 <OCamlLangImpl2.html>`_: Implementing a Parser and
+  AST - With the lexer in place, we can talk about parsing techniques
+  and basic AST construction. This tutorial describes recursive descent
+  parsing and operator precedence parsing. Nothing in Chapters 1 or 2
+  is LLVM-specific; the code doesn't even link in LLVM at this point.
+  :)
+- `Chapter #3 <OCamlLangImpl3.html>`_: Code generation to LLVM IR -
+  With the AST ready, we can show off how easy generation of LLVM IR
+  really is.
+- `Chapter #4 <OCamlLangImpl4.html>`_: Adding JIT and Optimizer
+  Support - Because a lot of people are interested in using LLVM as a
+  JIT, we'll dive right into it and show you the 3 lines it takes to
+  add JIT support. LLVM is also useful in many other ways, but this is
+  one simple and "sexy" way to show off its power. :)
+- `Chapter #5 <OCamlLangImpl5.html>`_: Extending the Language:
+  Control Flow - With the language up and running, we show how to
+  extend it with control flow operations (if/then/else and a 'for'
+  loop). This gives us a chance to talk about simple SSA construction
+  and control flow.
+- `Chapter #6 <OCamlLangImpl6.html>`_: Extending the Language:
+  User-defined Operators - This is a silly but fun chapter that talks
+  about extending the language to let the user program define their own
+  arbitrary unary and binary operators (with assignable precedence!).
+  This lets us build a significant piece of the "language" as library
+  routines.
+- `Chapter #7 <OCamlLangImpl7.html>`_: Extending the Language:
+  Mutable Variables - This chapter talks about adding user-defined
+  local variables along with an assignment operator. The interesting
+  part about this is how easy and trivial it is to construct SSA form
+  in LLVM: no, LLVM does *not* require your front-end to construct SSA
+  form!
+- `Chapter #8 <OCamlLangImpl8.html>`_: Conclusion and other useful
+  LLVM tidbits - This chapter wraps up the series by talking about
+  potential ways to extend the language, but also includes a bunch of
+  pointers to info about "special topics" like adding garbage
+  collection support, exceptions, debugging, support for "spaghetti
+  stacks", and a bunch of other tips and tricks.
+
+By the end of the tutorial, we'll have written a bit less than 700
+non-comment, non-blank lines of code. With this small amount of
+code, we'll have built up a very reasonable compiler for a non-trivial
+language including a hand-written lexer, parser, AST, as well as code
+generation support with a JIT compiler. While other systems may have
+interesting "hello world" tutorials, I think the breadth of this
+tutorial is a great testament to the strengths of LLVM and why you
+should consider it if you're interested in language or compiler design.
+ +A note about this tutorial: we expect you to extend the language and +play with it on your own. Take the code and go crazy hacking away at it, +compilers don't need to be scary creatures - it can be a lot of fun to +play with languages! + +The Basic Language +================== + +This tutorial will be illustrated with a toy language that we'll call +"`Kaleidoscope <http://en.wikipedia.org/wiki/Kaleidoscope>`_" (derived +from "meaning beautiful, form, and view"). Kaleidoscope is a procedural +language that allows you to define functions, use conditionals, math, +etc. Over the course of the tutorial, we'll extend Kaleidoscope to +support the if/then/else construct, a for loop, user defined operators, +JIT compilation with a simple command line interface, etc. + +Because we want to keep things simple, the only datatype in Kaleidoscope +is a 64-bit floating point type (aka 'float' in O'Caml parlance). As +such, all values are implicitly double precision and the language +doesn't require type declarations. This gives the language a very nice +and simple syntax. For example, the following simple example computes +`Fibonacci numbers: <http://en.wikipedia.org/wiki/Fibonacci_number>`_ + +:: + + # Compute the x'th fibonacci number. + def fib(x) + if x < 3 then + 1 + else + fib(x-1)+fib(x-2) + + # This expression will compute the 40th number. + fib(40) + +We also allow Kaleidoscope to call into standard library functions (the +LLVM JIT makes this completely trivial). This means that you can use the +'extern' keyword to define a function before you use it (this is also +useful for mutually recursive functions). For example: + +:: + + extern sin(arg); + extern cos(arg); + extern atan2(arg1 arg2); + + atan2(sin(.4), cos(42)) + +A more interesting example is included in Chapter 6 where we write a +little Kaleidoscope application that `displays a Mandelbrot +Set <OCamlLangImpl6.html#example>`_ at various levels of magnification. + +Lets dive into the implementation of this language! + +The Lexer +========= + +When it comes to implementing a language, the first thing needed is the +ability to process a text file and recognize what it says. The +traditional way to do this is to use a +"`lexer <http://en.wikipedia.org/wiki/Lexical_analysis>`_" (aka +'scanner') to break the input up into "tokens". Each token returned by +the lexer includes a token code and potentially some metadata (e.g. the +numeric value of a number). First, we define the possibilities: + +.. code-block:: ocaml + + (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of + * these others for known things. *) + type token = + (* commands *) + | Def | Extern + + (* primary *) + | Ident of string | Number of float + + (* unknown *) + | Kwd of char + +Each token returned by our lexer will be one of the token variant +values. An unknown character like '+' will be returned as +``Token.Kwd '+'``. If the curr token is an identifier, the value will be +``Token.Ident s``. If the current token is a numeric literal (like 1.0), +the value will be ``Token.Number 1.0``. + +The actual implementation of the lexer is a collection of functions +driven by a function named ``Lexer.lex``. The ``Lexer.lex`` function is +called to return the next token from standard input. We will use +`Camlp4 <http://caml.inria.fr/pub/docs/manual-camlp4/index.html>`_ to +simplify the tokenization of the standard input. Its definition starts +as: + +.. 
code-block:: ocaml
+
+    (*===----------------------------------------------------------------------===
+     * Lexer
+     *===----------------------------------------------------------------------===*)
+
+    let rec lex = parser
+      (* Skip any whitespace. *)
+      | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
+
+``Lexer.lex`` works by recursing over a ``char Stream.t`` to read
+characters one at a time from the standard input. It eats them as it
+recognizes them and stores them in a ``Token.token`` variant. The
+first thing that it has to do is ignore whitespace between tokens. This
+is accomplished with the recursive call above.
+
+The next thing ``Lexer.lex`` needs to do is recognize identifiers and
+specific keywords like "def". Kaleidoscope does this with a pattern
+match and a helper function.
+
+.. code-block:: ocaml
+
+    (* identifier: [a-zA-Z][a-zA-Z0-9] *)
+    | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
+        let buffer = Buffer.create 1 in
+        Buffer.add_char buffer c;
+        lex_ident buffer stream
+
+    ...
+
+    and lex_ident buffer = parser
+      | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
+          Buffer.add_char buffer c;
+          lex_ident buffer stream
+      | [< stream=lex >] ->
+          match Buffer.contents buffer with
+          | "def" -> [< 'Token.Def; stream >]
+          | "extern" -> [< 'Token.Extern; stream >]
+          | id -> [< 'Token.Ident id; stream >]
+
+Numeric values are similar:
+
+.. code-block:: ocaml
+
+    (* number: [0-9.]+ *)
+    | [< ' ('0' .. '9' as c); stream >] ->
+        let buffer = Buffer.create 1 in
+        Buffer.add_char buffer c;
+        lex_number buffer stream
+
+    ...
+
+    and lex_number buffer = parser
+      | [< ' ('0' .. '9' | '.' as c); stream >] ->
+          Buffer.add_char buffer c;
+          lex_number buffer stream
+      | [< stream=lex >] ->
+          [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]
+
+This is all pretty straight-forward code for processing input. When
+reading a numeric value from input, we use the OCaml ``float_of_string``
+function to convert it to a numeric value that we store in
+``Token.Number``. Note that this isn't doing sufficient error checking:
+it will raise ``Failure`` if given the string "1.23.45.67". Feel free to
+extend it :). Next we handle comments:
+
+.. code-block:: ocaml
+
+    (* Comment until end of line. *)
+    | [< ' ('#'); stream >] ->
+        lex_comment stream
+
+    ...
+
+    and lex_comment = parser
+      | [< ' ('\n'); stream=lex >] -> stream
+      | [< 'c; e=lex_comment >] -> e
+      | [< >] -> [< >]
+
+We handle comments by skipping to the end of the line and then returning
+the next token. Finally, if the input doesn't match one of the above
+cases, it is either an operator character like '+' or the end of the
+file. These are handled with this code:
+
+.. code-block:: ocaml
+
+    (* Otherwise, just return the character as its ascii value. *)
+    | [< 'c; stream >] ->
+        [< 'Token.Kwd c; lex stream >]
+
+    (* end of stream. *)
+    | [< >] -> [< >]
+
+With this, we have the complete lexer for the basic Kaleidoscope
+language (the `full code listing <OCamlLangImpl2.html#code>`_ for the
+Lexer is available in the `next chapter <OCamlLangImpl2.html>`_ of the
+tutorial). Next we'll `build a simple parser that uses this to build an
+Abstract Syntax Tree <OCamlLangImpl2.html>`_. When we have that, we'll
+include a driver so that you can use the lexer and parser together.
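+
+In the meantime, it can be handy to exercise the lexer on its own. The
+little driver below is *not* part of the tutorial's code listing; it is
+a minimal sketch that assumes the ``Token`` and ``Lexer`` modules shown
+above (and the legacy ``Stream`` module they use), and simply prints
+every token read from standard input:
+
+.. code-block:: ocaml
+
+    (* Hypothetical test driver: print each token produced by Lexer.lex.
+     * It needs no camlp4 itself; only lexer.ml is preprocessed. *)
+    let () =
+      let tokens = Lexer.lex (Stream.of_channel stdin) in
+      Stream.iter
+        (function
+          | Token.Def -> print_endline "def"
+          | Token.Extern -> print_endline "extern"
+          | Token.Ident id -> Printf.printf "ident %s\n" id
+          | Token.Number n -> Printf.printf "number %f\n" n
+          | Token.Kwd c -> Printf.printf "kwd '%c'\n" c)
+        tokens
+
+Typing something like ``def foo(x) x+1`` followed by end-of-file
+(Ctrl-D) should print ``def``, ``ident foo``, a few ``kwd`` and
+``ident`` lines, and ``number 1.000000``.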
+ +`Next: Implementing a Parser and AST <OCamlLangImpl2.html>`_ + diff --git a/docs/tutorial/OCamlLangImpl2.html b/docs/tutorial/OCamlLangImpl2.html deleted file mode 100644 index dd7e07b422..0000000000 --- a/docs/tutorial/OCamlLangImpl2.html +++ /dev/null @@ -1,1043 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Implementing a Parser and AST</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <meta name="author" content="Erick Tryzelaar"> - <link rel="stylesheet" href="../_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>Kaleidoscope: Implementing a Parser and AST</h1> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 2 - <ol> - <li><a href="#intro">Chapter 2 Introduction</a></li> - <li><a href="#ast">The Abstract Syntax Tree (AST)</a></li> - <li><a href="#parserbasics">Parser Basics</a></li> - <li><a href="#parserprimexprs">Basic Expression Parsing</a></li> - <li><a href="#parserbinops">Binary Expression Parsing</a></li> - <li><a href="#parsertop">Parsing the Rest</a></li> - <li><a href="#driver">The Driver</a></li> - <li><a href="#conclusions">Conclusions</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="OCamlLangImpl3.html">Chapter 3</a>: Code generation to LLVM IR</li> -</ul> - -<div class="doc_author"> - <p> - Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a> - and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a> - </p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="intro">Chapter 2 Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Welcome to Chapter 2 of the "<a href="index.html">Implementing a language -with LLVM in Objective Caml</a>" tutorial. This chapter shows you how to use -the lexer, built in <a href="OCamlLangImpl1.html">Chapter 1</a>, to build a -full <a href="http://en.wikipedia.org/wiki/Parsing">parser</a> for our -Kaleidoscope language. Once we have a parser, we'll define and build an <a -href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax -Tree</a> (AST).</p> - -<p>The parser we will build uses a combination of <a -href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent -Parsing</a> and <a href= -"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence -Parsing</a> to parse the Kaleidoscope language (the latter for -binary expressions and the former for everything else). Before we get to -parsing though, lets talk about the output of the parser: the Abstract Syntax -Tree.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="ast">The Abstract Syntax Tree (AST)</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>The AST for a program captures its behavior in such a way that it is easy for -later stages of the compiler (e.g. code generation) to interpret. We basically -want one object for each construct in the language, and the AST should closely -model the language. In Kaleidoscope, we have expressions, a prototype, and a -function object. We'll start with expressions first:</p> - -<div class="doc_code"> -<pre> -(* expr - Base type for all expression nodes. 
*) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float -</pre> -</div> - -<p>The code above shows the definition of the base ExprAST class and one -subclass which we use for numeric literals. The important thing to note about -this code is that the Number variant captures the numeric value of the -literal as an instance variable. This allows later phases of the compiler to -know what the stored numeric value is.</p> - -<p>Right now we only create the AST, so there are no useful functions on -them. It would be very easy to add a function to pretty print the code, -for example. Here are the other expression AST node definitions that we'll use -in the basic form of the Kaleidoscope language: -</p> - -<div class="doc_code"> -<pre> - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array -</pre> -</div> - -<p>This is all (intentionally) rather straight-forward: variables capture the -variable name, binary operators capture their opcode (e.g. '+'), and calls -capture a function name as well as a list of any argument expressions. One thing -that is nice about our AST is that it captures the language features without -talking about the syntax of the language. Note that there is no discussion about -precedence of binary operators, lexical structure, etc.</p> - -<p>For our basic language, these are all of the expression nodes we'll define. -Because it doesn't have conditional control flow, it isn't Turing-complete; -we'll fix that in a later installment. The two things we need next are a way -to talk about the interface to a function, and a way to talk about functions -themselves:</p> - -<div class="doc_code"> -<pre> -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = Prototype of string * string array - -(* func - This type represents a function definition itself. *) -type func = Function of proto * expr -</pre> -</div> - -<p>In Kaleidoscope, functions are typed with just a count of their arguments. -Since all values are double precision floating point, the type of each argument -doesn't need to be stored anywhere. In a more aggressive and realistic -language, the "expr" variants would probably have a type field.</p> - -<p>With this scaffolding, we can now talk about parsing expressions and function -bodies in Kaleidoscope.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="parserbasics">Parser Basics</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Now that we have an AST to build, we need to define the parser code to build -it. The idea here is that we want to parse something like "x+y" (which is -returned as three tokens by the lexer) into an AST that could be generated with -calls like this:</p> - -<div class="doc_code"> -<pre> - let x = Variable "x" in - let y = Variable "y" in - let result = Binary ('+', x, y) in - ... -</pre> -</div> - -<p> -The error handling routines make use of the builtin <tt>Stream.Failure</tt> and -<tt>Stream.Error</tt>s. <tt>Stream.Failure</tt> is raised when the parser is -unable to find any matching token in the first position of a pattern. 
-<tt>Stream.Error</tt> is raised when the first token matches, but the rest do -not. The error recovery in our parser will not be the best and is not -particular user-friendly, but it will be enough for our tutorial. These -exceptions make it easier to handle errors in routines that have various return -types.</p> - -<p>With these basic types and exceptions, we can implement the first -piece of our grammar: numeric literals.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="parserprimexprs">Basic Expression Parsing</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>We start with numeric literals, because they are the simplest to process. -For each production in our grammar, we'll define a function which parses that -production. We call this class of expressions "primary" expressions, for -reasons that will become more clear <a href="OCamlLangImpl6.html#unary"> -later in the tutorial</a>. In order to parse an arbitrary primary expression, -we need to determine what sort of expression it is. For numeric literals, we -have:</p> - -<div class="doc_code"> -<pre> -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr *) -parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n -</pre> -</div> - -<p>This routine is very simple: it expects to be called when the current token -is a <tt>Token.Number</tt> token. It takes the current number value, creates -a <tt>Ast.Number</tt> node, advances the lexer to the next token, and finally -returns.</p> - -<p>There are some interesting aspects to this. The most important one is that -this routine eats all of the tokens that correspond to the production and -returns the lexer buffer with the next token (which is not part of the grammar -production) ready to go. This is a fairly standard way to go for recursive -descent parsers. For a better example, the parenthesis operator is defined like -this:</p> - -<div class="doc_code"> -<pre> - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e -</pre> -</div> - -<p>This function illustrates a number of interesting things about the -parser:</p> - -<p> -1) It shows how we use the <tt>Stream.Error</tt> exception. When called, this -function expects that the current token is a '(' token, but after parsing the -subexpression, it is possible that there is no ')' waiting. For example, if -the user types in "(4 x" instead of "(4)", the parser should emit an error. -Because errors can occur, the parser needs a way to indicate that they -happened. In our parser, we use the camlp4 shortcut syntax <tt>token ?? "parse -error"</tt>, where if the token before the <tt>??</tt> does not match, then -<tt>Stream.Error "parse error"</tt> will be raised.</p> - -<p>2) Another interesting aspect of this function is that it uses recursion by -calling <tt>Parser.parse_primary</tt> (we will soon see that -<tt>Parser.parse_primary</tt> can call <tt>Parser.parse_primary</tt>). This is -powerful because it allows us to handle recursive grammars, and keeps each -production very simple. Note that parentheses do not cause construction of AST -nodes themselves. While we could do it this way, the most important role of -parentheses are to guide the parser and provide grouping. 
Once the parser -constructs the AST, parentheses are not needed.</p> - -<p>The next simple production is for handling variable references and function -calls:</p> - -<div class="doc_code"> -<pre> - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream -</pre> -</div> - -<p>This routine follows the same style as the other routines. (It expects to be -called if the current token is a <tt>Token.Ident</tt> token). It also has -recursion and error handling. One interesting aspect of this is that it uses -<em>look-ahead</em> to determine if the current identifier is a stand alone -variable reference or if it is a function call expression. It handles this by -checking to see if the token after the identifier is a '(' token, constructing -either a <tt>Ast.Variable</tt> or <tt>Ast.Call</tt> node as appropriate. -</p> - -<p>We finish up by raising an exception if we received a token we didn't -expect:</p> - -<div class="doc_code"> -<pre> - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") -</pre> -</div> - -<p>Now that basic expressions are handled, we need to handle binary expressions. -They are a bit more complex.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="parserbinops">Binary Expression Parsing</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Binary expressions are significantly harder to parse because they are often -ambiguous. For example, when given the string "x+y*z", the parser can choose -to parse it as either "(x+y)*z" or "x+(y*z)". With common definitions from -mathematics, we expect the later parse, because "*" (multiplication) has -higher <em>precedence</em> than "+" (addition).</p> - -<p>There are many ways to handle this, but an elegant and efficient way is to -use <a href= -"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence -Parsing</a>. This parsing technique uses the precedence of binary operators to -guide recursion. To start with, we need a table of precedences:</p> - -<div class="doc_code"> -<pre> -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -... - -let main () = - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - ... -</pre> -</div> - -<p>For the basic form of Kaleidoscope, we will only support 4 binary operators -(this can obviously be extended by you, our brave and intrepid reader). 
The -<tt>Parser.precedence</tt> function returns the precedence for the current -token, or -1 if the token is not a binary operator. Having a <tt>Hashtbl.t</tt> -makes it easy to add new operators and makes it clear that the algorithm doesn't -depend on the specific operators involved, but it would be easy enough to -eliminate the <tt>Hashtbl.t</tt> and do the comparisons in the -<tt>Parser.precedence</tt> function. (Or just use a fixed-size array).</p> - -<p>With the helper above defined, we can now start parsing binary expressions. -The basic idea of operator precedence parsing is to break down an expression -with potentially ambiguous binary operators into pieces. Consider ,for example, -the expression "a+b+(c+d)*e*f+g". Operator precedence parsing considers this -as a stream of primary expressions separated by binary operators. As such, -it will first parse the leading primary expression "a", then it will see the -pairs [+, b] [+, (c+d)] [*, e] [*, f] and [+, g]. Note that because parentheses -are primary expressions, the binary expression parser doesn't need to worry -about nested subexpressions like (c+d) at all. -</p> - -<p> -To start, an expression is a primary expression potentially followed by a -sequence of [binop,primaryexpr] pairs:</p> - -<div class="doc_code"> -<pre> -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream -</pre> -</div> - -<p><tt>Parser.parse_bin_rhs</tt> is the function that parses the sequence of -pairs for us. It takes a precedence and a pointer to an expression for the part -that has been parsed so far. Note that "x" is a perfectly valid expression: As -such, "binoprhs" is allowed to be empty, in which case it returns the expression -that is passed into it. In our example above, the code passes the expression for -"a" into <tt>Parser.parse_bin_rhs</tt> and the current token is "+".</p> - -<p>The precedence value passed into <tt>Parser.parse_bin_rhs</tt> indicates the -<em>minimal operator precedence</em> that the function is allowed to eat. For -example, if the current pair stream is [+, x] and <tt>Parser.parse_bin_rhs</tt> -is passed in a precedence of 40, it will not consume any tokens (because the -precedence of '+' is only 20). With this in mind, <tt>Parser.parse_bin_rhs</tt> -starts with:</p> - -<div class="doc_code"> -<pre> -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin -</pre> -</div> - -<p>This code gets the precedence of the current token and checks to see if if is -too low. Because we defined invalid tokens to have a precedence of -1, this -check implicitly knows that the pair-stream ends when the token stream runs out -of binary operators. If this check succeeds, we know that the token is a binary -operator and that it will be included in this expression:</p> - -<div class="doc_code"> -<pre> - (* Eat the binop. *) - Stream.junk stream; - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> -</pre> -</div> - -<p>As such, this code eats (and remembers) the binary operator and then parses -the primary expression that follows. 
This builds up the whole pair, the first of -which is [+, b] for the running example.</p> - -<p>Now that we parsed the left-hand side of an expression and one pair of the -RHS sequence, we have to decide which way the expression associates. In -particular, we could have "(a+b) binop unparsed" or "a + (b binop unparsed)". -To determine this, we look ahead at "binop" to determine its precedence and -compare it to BinOp's precedence (which is '+' in this case):</p> - -<div class="doc_code"> -<pre> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec -</pre> -</div> - -<p>If the precedence of the binop to the right of "RHS" is lower or equal to the -precedence of our current operator, then we know that the parentheses associate -as "(a+b) binop ...". In our example, the current operator is "+" and the next -operator is "+", we know that they have the same precedence. In this case we'll -create the AST node for "a+b", and then continue parsing:</p> - -<div class="doc_code"> -<pre> - ... if body omitted ... - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end -</pre> -</div> - -<p>In our example above, this will turn "a+b+" into "(a+b)" and execute the next -iteration of the loop, with "+" as the current token. The code above will eat, -remember, and parse "(c+d)" as the primary expression, which makes the -current pair equal to [+, (c+d)]. It will then evaluate the 'if' conditional above with -"*" as the binop to the right of the primary. In this case, the precedence of "*" is -higher than the precedence of "+" so the if condition will be entered.</p> - -<p>The critical question left here is "how can the if condition parse the right -hand side in full"? In particular, to build the AST correctly for our example, -it needs to get all of "(c+d)*e*f" as the RHS expression variable. The code to -do this is surprisingly simple (code from the above two blocks duplicated for -context):</p> - -<div class="doc_code"> -<pre> - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - if token_prec < precedence c2 - then <b>parse_bin_rhs (token_prec + 1) rhs stream</b> - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end -</pre> -</div> - -<p>At this point, we know that the binary operator to the RHS of our primary -has higher precedence than the binop we are currently parsing. As such, we know -that any sequence of pairs whose operators are all higher precedence than "+" -should be parsed together and returned as "RHS". To do this, we recursively -invoke the <tt>Parser.parse_bin_rhs</tt> function specifying "token_prec+1" as -the minimum precedence required for it to continue. In our example above, this -will cause it to return the AST node for "(c+d)*e*f" as RHS, which is then set -as the RHS of the '+' expression.</p> - -<p>Finally, on the next iteration of the while loop, the "+g" piece is parsed -and added to the AST. With this little bit of code (14 non-trivial lines), we -correctly handle fully general binary expression parsing in a very elegant way. -This was a whirlwind tour of this code, and it is somewhat subtle. I recommend -running through it with a few tough examples to see how it works. 
-</p> - -<p>This wraps up handling of expressions. At this point, we can point the -parser at an arbitrary token stream and build an expression from it, stopping -at the first token that is not part of the expression. Next up we need to -handle function definitions, etc.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="parsertop">Parsing the Rest</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -The next thing missing is handling of function prototypes. In Kaleidoscope, -these are used both for 'extern' function declarations as well as function body -definitions. The code to do this is straight-forward and not very interesting -(once you've survived expressions): -</p> - -<div class="doc_code"> -<pre> -(* prototype - * ::= id '(' id* ')' *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") -</pre> -</div> - -<p>Given this, a function definition is very simple, just a prototype plus -an expression to implement the body:</p> - -<div class="doc_code"> -<pre> -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) -</pre> -</div> - -<p>In addition, we support 'extern' to declare functions like 'sin' and 'cos' as -well as to support forward declaration of user functions. These 'extern's are just -prototypes with no body:</p> - -<div class="doc_code"> -<pre> -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e -</pre> -</div> - -<p>Finally, we'll also let the user type in arbitrary top-level expressions and -evaluate them on the fly. We will handle this by defining anonymous nullary -(zero argument) functions for them:</p> - -<div class="doc_code"> -<pre> -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) -</pre> -</div> - -<p>Now that we have all the pieces, let's build a little driver that will let us -actually <em>execute</em> this code we've built!</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="driver">The Driver</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>The driver for this simply invokes all of the parsing pieces with a top-level -dispatch loop. There isn't much interesting here, so I'll just include the -top-level loop. See <a href="#code">below</a> for full code in the "Top-Level -Parsing" section.</p> - -<div class="doc_code"> -<pre> -(* top ::= definition | external | expression | ';' *) -let rec main_loop stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. 
*) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop stream - - | Some token -> - begin - try match token with - | Token.Def -> - ignore(Parser.parse_definition stream); - print_endline "parsed a function definition."; - | Token.Extern -> - ignore(Parser.parse_extern stream); - print_endline "parsed an extern."; - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - ignore(Parser.parse_toplevel stream); - print_endline "parsed a top-level expr"; - with Stream.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop stream -</pre> -</div> - -<p>The most interesting part of this is that we ignore top-level semicolons. -Why is this, you ask? The basic reason is that if you type "4 + 5" at the -command line, the parser doesn't know whether that is the end of what you will type -or not. For example, on the next line you could type "def foo..." in which case -4+5 is the end of a top-level expression. Alternatively you could type "* 6", -which would continue the expression. Having top-level semicolons allows you to -type "4+5;", and the parser will know you are done.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="conclusions">Conclusions</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>With just under 300 lines of commented code (240 lines of non-comment, -non-blank code), we fully defined our minimal language, including a lexer, -parser, and AST builder. With this done, the executable will validate -Kaleidoscope code and tell us if it is grammatically invalid. For -example, here is a sample interaction:</p> - -<div class="doc_code"> -<pre> -$ <b>./toy.byte</b> -ready> <b>def foo(x y) x+foo(y, 4.0);</b> -Parsed a function definition. -ready> <b>def foo(x y) x+y y;</b> -Parsed a function definition. -Parsed a top-level expr -ready> <b>def foo(x y) x+y );</b> -Parsed a function definition. -Error: unknown token when expecting an expression -ready> <b>extern sin(a);</b> -ready> Parsed an extern -ready> <b>^D</b> -$ -</pre> -</div> - -<p>There is a lot of room for extension here. You can define new AST nodes, -extend the language in many ways, etc. In the <a href="OCamlLangImpl3.html"> -next installment</a>, we will describe how to generate LLVM Intermediate -Representation (IR) from the AST.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="code">Full Code Listing</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -Here is the complete code listing for this and the previous chapter. -Note that it is fully self-contained: you don't need LLVM or any external -libraries at all for this. (Besides the ocaml standard libraries, of -course.) 
To build this, just compile with:</p> - -<div class="doc_code"> -<pre> -# Compile -ocamlbuild toy.byte -# Run -./toy.byte -</pre> -</div> - -<p>Here is the code:</p> - -<dl> -<dt>_tags:</dt> -<dd class="doc_code"> -<pre> -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) -</pre> -</dd> - -<dt>token.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char -</pre> -</dd> - -<dt>lexer.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] -</pre> -</dd> - -<dt>ast.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = Prototype of string * string array - -(* func - This type represents a function definition itself. 
*) -type func = Function of proto * expr -</pre> -</dd> - -<dt>parser.ml:</dt> -<dd class="doc_code"> -<pre> -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. 
*) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e -</pre> -</dd> - -<dt>toplevel.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -(* top ::= definition | external | expression | ';' *) -let rec main_loop stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop stream - - | Some token -> - begin - try match token with - | Token.Def -> - ignore(Parser.parse_definition stream); - print_endline "parsed a function definition."; - | Token.Extern -> - ignore(Parser.parse_extern stream); - print_endline "parsed an extern."; - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - ignore(Parser.parse_toplevel stream); - print_endline "parsed a top-level expr"; - with Stream.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop stream -</pre> -</dd> - -<dt>toy.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - -let main () = - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop stream; -;; - -main () -</pre> -</dd> -</dl> - -<a href="OCamlLangImpl3.html">Next: Implementing Code Generation to LLVM IR</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a> - <a href="mailto:erickt@users.sourceforge.net">Erick Tryzelaar</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/OCamlLangImpl2.rst b/docs/tutorial/OCamlLangImpl2.rst new file mode 100644 index 0000000000..07490e1f67 --- /dev/null +++ b/docs/tutorial/OCamlLangImpl2.rst @@ -0,0 +1,899 @@ +=========================================== +Kaleidoscope: Implementing a Parser and AST +=========================================== + +.. 
contents:: + :local: + +Written by `Chris Lattner <mailto:sabre@nondot.org>`_ and `Erick +Tryzelaar <mailto:idadesub@users.sourceforge.net>`_ + +Chapter 2 Introduction +====================== + +Welcome to Chapter 2 of the "`Implementing a language with LLVM in +Objective Caml <index.html>`_" tutorial. This chapter shows you how to +use the lexer, built in `Chapter 1 <OCamlLangImpl1.html>`_, to build a +full `parser <http://en.wikipedia.org/wiki/Parsing>`_ for our +Kaleidoscope language. Once we have a parser, we'll define and build an +`Abstract Syntax +Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST). + +The parser we will build uses a combination of `Recursive Descent +Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and +`Operator-Precedence +Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to +parse the Kaleidoscope language (the latter for binary expressions and +the former for everything else). Before we get to parsing though, lets +talk about the output of the parser: the Abstract Syntax Tree. + +The Abstract Syntax Tree (AST) +============================== + +The AST for a program captures its behavior in such a way that it is +easy for later stages of the compiler (e.g. code generation) to +interpret. We basically want one object for each construct in the +language, and the AST should closely model the language. In +Kaleidoscope, we have expressions, a prototype, and a function object. +We'll start with expressions first: + +.. code-block:: ocaml + + (* expr - Base type for all expression nodes. *) + type expr = + (* variant for numeric literals like "1.0". *) + | Number of float + +The code above shows the definition of the base ExprAST class and one +subclass which we use for numeric literals. The important thing to note +about this code is that the Number variant captures the numeric value of +the literal as an instance variable. This allows later phases of the +compiler to know what the stored numeric value is. + +Right now we only create the AST, so there are no useful functions on +them. It would be very easy to add a function to pretty print the code, +for example. Here are the other expression AST node definitions that +we'll use in the basic form of the Kaleidoscope language: + +.. code-block:: ocaml + + (* variant for referencing a variable, like "a". *) + | Variable of string + + (* variant for a binary operator. *) + | Binary of char * expr * expr + + (* variant for function calls. *) + | Call of string * expr array + +This is all (intentionally) rather straight-forward: variables capture +the variable name, binary operators capture their opcode (e.g. '+'), and +calls capture a function name as well as a list of any argument +expressions. One thing that is nice about our AST is that it captures +the language features without talking about the syntax of the language. +Note that there is no discussion about precedence of binary operators, +lexical structure, etc. + +For our basic language, these are all of the expression nodes we'll +define. Because it doesn't have conditional control flow, it isn't +Turing-complete; we'll fix that in a later installment. The two things +we need next are a way to talk about the interface to a function, and a +way to talk about functions themselves: + +.. code-block:: ocaml + + (* proto - This type represents the "prototype" for a function, which captures + * its name, and its argument names (thus implicitly the number of arguments the + * function takes). 
*) + type proto = Prototype of string * string array + + (* func - This type represents a function definition itself. *) + type func = Function of proto * expr + +In Kaleidoscope, functions are typed with just a count of their +arguments. Since all values are double precision floating point, the +type of each argument doesn't need to be stored anywhere. In a more +aggressive and realistic language, the "expr" variants would probably +have a type field. + +With this scaffolding, we can now talk about parsing expressions and +function bodies in Kaleidoscope. + +Parser Basics +============= + +Now that we have an AST to build, we need to define the parser code to +build it. The idea here is that we want to parse something like "x+y" +(which is returned as three tokens by the lexer) into an AST that could +be generated with calls like this: + +.. code-block:: ocaml + + let x = Variable "x" in + let y = Variable "y" in + let result = Binary ('+', x, y) in + ... + +The error handling routines make use of the builtin ``Stream.Failure`` +and ``Stream.Error``s. ``Stream.Failure`` is raised when the parser is +unable to find any matching token in the first position of a pattern. +``Stream.Error`` is raised when the first token matches, but the rest do +not. The error recovery in our parser will not be the best and is not +particular user-friendly, but it will be enough for our tutorial. These +exceptions make it easier to handle errors in routines that have various +return types. + +With these basic types and exceptions, we can implement the first piece +of our grammar: numeric literals. + +Basic Expression Parsing +======================== + +We start with numeric literals, because they are the simplest to +process. For each production in our grammar, we'll define a function +which parses that production. We call this class of expressions +"primary" expressions, for reasons that will become more clear `later in +the tutorial <OCamlLangImpl6.html#unary>`_. In order to parse an +arbitrary primary expression, we need to determine what sort of +expression it is. For numeric literals, we have: + +.. code-block:: ocaml + + (* primary + * ::= identifier + * ::= numberexpr + * ::= parenexpr *) + parse_primary = parser + (* numberexpr ::= number *) + | [< 'Token.Number n >] -> Ast.Number n + +This routine is very simple: it expects to be called when the current +token is a ``Token.Number`` token. It takes the current number value, +creates a ``Ast.Number`` node, advances the lexer to the next token, and +finally returns. + +There are some interesting aspects to this. The most important one is +that this routine eats all of the tokens that correspond to the +production and returns the lexer buffer with the next token (which is +not part of the grammar production) ready to go. This is a fairly +standard way to go for recursive descent parsers. For a better example, +the parenthesis operator is defined like this: + +.. code-block:: ocaml + + (* parenexpr ::= '(' expression ')' *) + | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e + +This function illustrates a number of interesting things about the +parser: + +1) It shows how we use the ``Stream.Error`` exception. When called, this +function expects that the current token is a '(' token, but after +parsing the subexpression, it is possible that there is no ')' waiting. +For example, if the user types in "(4 x" instead of "(4)", the parser +should emit an error. 
Because errors can occur, the parser needs a way
+to indicate that they happened. In our parser, we use the camlp4
+shortcut syntax ``token ?? "parse error"``, where if the token before
+the ``??`` does not match, then ``Stream.Error "parse error"`` will be
+raised.
+
+2) Another interesting aspect of this function is that it uses recursion
+by calling ``Parser.parse_primary`` (we will soon see that
+``Parser.parse_primary`` can call ``Parser.parse_primary``). This is
+powerful because it allows us to handle recursive grammars, and keeps
+each production very simple. Note that parentheses do not cause
+construction of AST nodes themselves. While we could do it this way, the
+most important role of parentheses is to guide the parser and provide
+grouping. Once the parser constructs the AST, parentheses are not
+needed.
+
+The next simple production is for handling variable references and
+function calls:
+
+.. code-block:: ocaml
+
+    (* identifierexpr
+     * ::= identifier
+     * ::= identifier '(' argumentexpr ')' *)
+    | [< 'Token.Ident id; stream >] ->
+        let rec parse_args accumulator = parser
+          | [< e=parse_expr; stream >] ->
+              begin parser
+                | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
+                | [< >] -> e :: accumulator
+              end stream
+          | [< >] -> accumulator
+        in
+        let rec parse_ident id = parser
+          (* Call. *)
+          | [< 'Token.Kwd '(';
+               args=parse_args [];
+               'Token.Kwd ')' ?? "expected ')'">] ->
+              Ast.Call (id, Array.of_list (List.rev args))
+
+          (* Simple variable ref. *)
+          | [< >] -> Ast.Variable id
+        in
+        parse_ident id stream
+
+This routine follows the same style as the other routines. (It expects
+to be called if the current token is a ``Token.Ident`` token). It also
+has recursion and error handling. One interesting aspect of this is that
+it uses *look-ahead* to determine if the current identifier is a standalone
+variable reference or if it is a function call expression. It
+handles this by checking to see if the token after the identifier is a
+'(' token, constructing either an ``Ast.Variable`` or ``Ast.Call`` node
+as appropriate.
+
+We finish up by raising an exception if we received a token we didn't
+expect:
+
+.. code-block:: ocaml
+
+    | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
+
+Now that basic expressions are handled, we need to handle binary
+expressions. They are a bit more complex.
+
+Binary Expression Parsing
+=========================
+
+Binary expressions are significantly harder to parse because they are
+often ambiguous. For example, when given the string "x+y\*z", the parser
+can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common
+definitions from mathematics, we expect the latter parse, because "\*"
+(multiplication) has higher *precedence* than "+" (addition).
+
+There are many ways to handle this, but an elegant and efficient way is
+to use `Operator-Precedence
+Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_.
+This parsing technique uses the precedence of binary operators to guide
+recursion. To start with, we need a table of precedences:
+
+.. code-block:: ocaml
+
+    (* binop_precedence - This holds the precedence for each binary operator that is
+     * defined *)
+    let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
+
+    (* precedence - Get the precedence of the pending binary operator token. *)
+    let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1
+
+    ...
+
+    let main () =
+      (* Install standard binary operators.
+       * 1 is the lowest precedence. *)
+      Hashtbl.add Parser.binop_precedence '<' 10;
+      Hashtbl.add Parser.binop_precedence '+' 20;
+      Hashtbl.add Parser.binop_precedence '-' 20;
+      Hashtbl.add Parser.binop_precedence '*' 40;    (* highest. *)
+      ...
+
+For the basic form of Kaleidoscope, we will only support 4 binary
+operators (this can obviously be extended by you, our brave and intrepid
+reader). The ``Parser.precedence`` function returns the precedence for
+the current token, or -1 if the token is not a binary operator. Having a
+``Hashtbl.t`` makes it easy to add new operators and makes it clear that
+the algorithm doesn't depend on the specific operators involved, but it
+would be easy enough to eliminate the ``Hashtbl.t`` and do the
+comparisons in the ``Parser.precedence`` function. (Or just use a
+fixed-size array).
+
+With the helper above defined, we can now start parsing binary
+expressions. The basic idea of operator precedence parsing is to break
+down an expression with potentially ambiguous binary operators into
+pieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g".
+Operator precedence parsing considers this as a stream of primary
+expressions separated by binary operators. As such, it will first parse
+the leading primary expression "a", then it will see the pairs [+, b]
+[+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are
+primary expressions, the binary expression parser doesn't need to worry
+about nested subexpressions like (c+d) at all.
+
+To start, an expression is a primary expression potentially followed by
+a sequence of [binop,primaryexpr] pairs:
+
+.. code-block:: ocaml
+
+    (* expression
+     * ::= primary binoprhs *)
+    and parse_expr = parser
+      | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream
+
+``Parser.parse_bin_rhs`` is the function that parses the sequence of
+pairs for us. It takes a precedence and the expression for the part
+that has been parsed so far. Note that "x" is a perfectly valid
+expression. As such, "binoprhs" is allowed to be empty, in which case it
+returns the expression that is passed into it. In our example above, the
+code passes the expression for "a" into ``Parser.parse_bin_rhs`` and the
+current token is "+".
+
+The precedence value passed into ``Parser.parse_bin_rhs`` indicates the
+*minimal operator precedence* that the function is allowed to eat. For
+example, if the current pair stream is [+, x] and
+``Parser.parse_bin_rhs`` is passed in a precedence of 40, it will not
+consume any tokens (because the precedence of '+' is only 20). With this
+in mind, ``Parser.parse_bin_rhs`` starts with:
+
+.. code-block:: ocaml
+
+    (* binoprhs
+     * ::= ('+' primary)* *)
+    and parse_bin_rhs expr_prec lhs stream =
+      match Stream.peek stream with
+      (* If this is a binop, find its precedence. *)
+      | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
+          let token_prec = precedence c in
+
+          (* If this is a binop that binds at least as tightly as the current binop,
+           * consume it, otherwise we are done. *)
+          if token_prec < expr_prec then lhs else begin
+
+This code gets the precedence of the current token and checks to see if
+it is too low. Because we defined invalid tokens to have a precedence of
+-1, this check implicitly knows that the pair-stream ends when the token
+stream runs out of binary operators. If this check succeeds, we know
+that the token is a binary operator and that it will be included in this
+expression:
+
+.. code-block:: ocaml
+
+            (* Eat the binop. *)
+            Stream.junk stream;
+
+            (* Parse the primary expression after the binary operator. *)
+            let rhs = parse_primary stream in
+
+            (* Okay, we know this is a binop. *)
+            let rhs =
+              match Stream.peek stream with
+              | Some (Token.Kwd c2) ->
+
+As such, this code eats (and remembers) the binary operator and then
+parses the primary expression that follows. This builds up the whole
+pair, the first of which is [+, b] for the running example.
+
+Now that we parsed the left-hand side of an expression and one pair of
+the RHS sequence, we have to decide which way the expression associates.
+In particular, we could have "(a+b) binop unparsed" or "a + (b binop
+unparsed)". To determine this, we look ahead at "binop" to determine its
+precedence and compare it to BinOp's precedence (which is '+' in this
+case):
+
+.. code-block:: ocaml
+
+                  (* If BinOp binds less tightly with rhs than the operator after
+                   * rhs, let the pending operator take rhs as its lhs. *)
+                  let next_prec = precedence c2 in
+                  if token_prec < next_prec
+
+If the precedence of the binop to the right of "RHS" is lower than or
+equal to the precedence of our current operator, then we know that the
+parentheses associate as "(a+b) binop ...". In our example, the current
+operator is "+" and the next operator is "+", so we know that they have
+the same precedence. In this case we'll create the AST node for "a+b",
+and then continue parsing:
+
+.. code-block:: ocaml
+
+            ... if body omitted ...
+            in
+
+            (* Merge lhs/rhs. *)
+            let lhs = Ast.Binary (c, lhs, rhs) in
+            parse_bin_rhs expr_prec lhs stream
+          end
+
+In our example above, this will turn "a+b+" into "(a+b)" and execute the
+next iteration of the loop, with "+" as the current token. The code
+above will eat, remember, and parse "(c+d)" as the primary expression,
+which makes the current pair equal to [+, (c+d)]. It will then evaluate
+the 'if' conditional above with "\*" as the binop to the right of the
+primary. In this case, the precedence of "\*" is higher than the
+precedence of "+" so the if condition will be entered.
+
+The critical question left here is "how can the if condition parse the
+right hand side in full"? In particular, to build the AST correctly for
+our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression
+variable. The code to do this is surprisingly simple (code from the
+above two blocks duplicated for context):
+
+.. code-block:: ocaml
+
+              match Stream.peek stream with
+              | Some (Token.Kwd c2) ->
+                  (* If BinOp binds less tightly with rhs than the operator after
+                   * rhs, let the pending operator take rhs as its lhs. *)
+                  if token_prec < precedence c2
+                  then parse_bin_rhs (token_prec + 1) rhs stream
+                  else rhs
+              | _ -> rhs
+            in
+
+            (* Merge lhs/rhs. *)
+            let lhs = Ast.Binary (c, lhs, rhs) in
+            parse_bin_rhs expr_prec lhs stream
+          end
+
+At this point, we know that the binary operator to the RHS of our
+primary has higher precedence than the binop we are currently parsing.
+As such, we know that any sequence of pairs whose operators are all
+higher precedence than "+" should be parsed together and returned as
+"RHS". To do this, we recursively invoke the ``Parser.parse_bin_rhs``
+function specifying "token\_prec+1" as the minimum precedence required
+for it to continue. In our example above, this will cause it to return
+the AST node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of
+the '+' expression.
+
+Finally, on the next iteration of the loop, the "+g" piece is
+parsed and added to the AST. With this little bit of code (14
+non-trivial lines), we correctly handle fully general binary expression
+parsing in a very elegant way.
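+
+As a quick sanity check (this worked-out value is added here purely for
+illustration and is not something the tutorial itself prints), the AST
+that ``Parser.parse_expr`` should build for the running example
+"a+b+(c+d)\*e\*f+g", written out by hand with the ``Ast`` constructors
+from this chapter, is:
+
+.. code-block:: ocaml
+
+    (* Expected grouping: ((a+b) + (((c+d)*e)*f)) + g *)
+    let expected_ast =
+      Ast.Binary ('+',
+        Ast.Binary ('+',
+          Ast.Binary ('+', Ast.Variable "a", Ast.Variable "b"),
+          Ast.Binary ('*',
+            Ast.Binary ('*',
+              Ast.Binary ('+', Ast.Variable "c", Ast.Variable "d"),
+              Ast.Variable "e"),
+            Ast.Variable "f")),
+        Ast.Variable "g")
+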
This was a whirlwind tour of this code, +and it is somewhat subtle. I recommend running through it with a few +tough examples to see how it works. + +This wraps up handling of expressions. At this point, we can point the +parser at an arbitrary token stream and build an expression from it, +stopping at the first token that is not part of the expression. Next up +we need to handle function definitions, etc. + +Parsing the Rest +================ + +The next thing missing is handling of function prototypes. In +Kaleidoscope, these are used both for 'extern' function declarations as +well as function body definitions. The code to do this is +straight-forward and not very interesting (once you've survived +expressions): + +.. code-block:: ocaml + + (* prototype + * ::= id '(' id* ')' *) + let parse_prototype = + let rec parse_args accumulator = parser + | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e + | [< >] -> accumulator + in + + parser + | [< 'Token.Ident id; + 'Token.Kwd '(' ?? "expected '(' in prototype"; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> + (* success. *) + Ast.Prototype (id, Array.of_list (List.rev args)) + + | [< >] -> + raise (Stream.Error "expected function name in prototype") + +Given this, a function definition is very simple, just a prototype plus +an expression to implement the body: + +.. code-block:: ocaml + + (* definition ::= 'def' prototype expression *) + let parse_definition = parser + | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> + Ast.Function (p, e) + +In addition, we support 'extern' to declare functions like 'sin' and +'cos' as well as to support forward declaration of user functions. These +'extern's are just prototypes with no body: + +.. code-block:: ocaml + + (* external ::= 'extern' prototype *) + let parse_extern = parser + | [< 'Token.Extern; e=parse_prototype >] -> e + +Finally, we'll also let the user type in arbitrary top-level expressions +and evaluate them on the fly. We will handle this by defining anonymous +nullary (zero argument) functions for them: + +.. code-block:: ocaml + + (* toplevelexpr ::= expression *) + let parse_toplevel = parser + | [< e=parse_expr >] -> + (* Make an anonymous proto. *) + Ast.Function (Ast.Prototype ("", [||]), e) + +Now that we have all the pieces, let's build a little driver that will +let us actually *execute* this code we've built! + +The Driver +========== + +The driver for this simply invokes all of the parsing pieces with a +top-level dispatch loop. There isn't much interesting here, so I'll just +include the top-level loop. See `below <#code>`_ for full code in the +"Top-Level Parsing" section. + +.. code-block:: ocaml + + (* top ::= definition | external | expression | ';' *) + let rec main_loop stream = + match Stream.peek stream with + | None -> () + + (* ignore top-level semicolons. *) + | Some (Token.Kwd ';') -> + Stream.junk stream; + main_loop stream + + | Some token -> + begin + try match token with + | Token.Def -> + ignore(Parser.parse_definition stream); + print_endline "parsed a function definition."; + | Token.Extern -> + ignore(Parser.parse_extern stream); + print_endline "parsed an extern."; + | _ -> + (* Evaluate a top-level expression into an anonymous function. *) + ignore(Parser.parse_toplevel stream); + print_endline "parsed a top-level expr"; + with Stream.Error s -> + (* Skip token for error recovery. 
*) + Stream.junk stream; + print_endline s; + end; + print_string "ready> "; flush stdout; + main_loop stream + +The most interesting part of this is that we ignore top-level +semicolons. Why is this, you ask? The basic reason is that if you type +"4 + 5" at the command line, the parser doesn't know whether that is the +end of what you will type or not. For example, on the next line you +could type "def foo..." in which case 4+5 is the end of a top-level +expression. Alternatively you could type "\* 6", which would continue +the expression. Having top-level semicolons allows you to type "4+5;", +and the parser will know you are done. + +Conclusions +=========== + +With just under 300 lines of commented code (240 lines of non-comment, +non-blank code), we fully defined our minimal language, including a +lexer, parser, and AST builder. With this done, the executable will +validate Kaleidoscope code and tell us if it is grammatically invalid. +For example, here is a sample interaction: + +.. code-block:: bash + + $ ./toy.byte + ready> def foo(x y) x+foo(y, 4.0); + Parsed a function definition. + ready> def foo(x y) x+y y; + Parsed a function definition. + Parsed a top-level expr + ready> def foo(x y) x+y ); + Parsed a function definition. + Error: unknown token when expecting an expression + ready> extern sin(a); + ready> Parsed an extern + ready> ^D + $ + +There is a lot of room for extension here. You can define new AST nodes, +extend the language in many ways, etc. In the `next +installment <OCamlLangImpl3.html>`_, we will describe how to generate +LLVM Intermediate Representation (IR) from the AST. + +Full Code Listing +================= + +Here is the complete code listing for this and the previous chapter. +Note that it is fully self-contained: you don't need LLVM or any +external libraries at all for this. (Besides the ocaml standard +libraries, of course.) To build this, just compile with: + +.. code-block:: bash + + # Compile + ocamlbuild toy.byte + # Run + ./toy.byte + +Here is the code: + +\_tags: + :: + + <{lexer,parser}.ml>: use_camlp4, pp(camlp4of) + +token.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Lexer Tokens + *===----------------------------------------------------------------------===*) + + (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of + * these others for known things. *) + type token = + (* commands *) + | Def | Extern + + (* primary *) + | Ident of string | Number of float + + (* unknown *) + | Kwd of char + +lexer.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Lexer + *===----------------------------------------------------------------------===*) + + let rec lex = parser + (* Skip any whitespace. *) + | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream + + (* identifier: [a-zA-Z][a-zA-Z0-9] *) + | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> + let buffer = Buffer.create 1 in + Buffer.add_char buffer c; + lex_ident buffer stream + + (* number: [0-9.]+ *) + | [< ' ('0' .. '9' as c); stream >] -> + let buffer = Buffer.create 1 in + Buffer.add_char buffer c; + lex_number buffer stream + + (* Comment until end of line. *) + | [< ' ('#'); stream >] -> + lex_comment stream + + (* Otherwise, just return the character as its ascii value. *) + | [< 'c; stream >] -> + [< 'Token.Kwd c; lex stream >] + + (* end of stream. *) + | [< >] -> [< >] + + and lex_number buffer = parser + | [< ' ('0' .. 
'9' | '.' as c); stream >] -> + Buffer.add_char buffer c; + lex_number buffer stream + | [< stream=lex >] -> + [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] + + and lex_ident buffer = parser + | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> + Buffer.add_char buffer c; + lex_ident buffer stream + | [< stream=lex >] -> + match Buffer.contents buffer with + | "def" -> [< 'Token.Def; stream >] + | "extern" -> [< 'Token.Extern; stream >] + | id -> [< 'Token.Ident id; stream >] + + and lex_comment = parser + | [< ' ('\n'); stream=lex >] -> stream + | [< 'c; e=lex_comment >] -> e + | [< >] -> [< >] + +ast.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Abstract Syntax Tree (aka Parse Tree) + *===----------------------------------------------------------------------===*) + + (* expr - Base type for all expression nodes. *) + type expr = + (* variant for numeric literals like "1.0". *) + | Number of float + + (* variant for referencing a variable, like "a". *) + | Variable of string + + (* variant for a binary operator. *) + | Binary of char * expr * expr + + (* variant for function calls. *) + | Call of string * expr array + + (* proto - This type represents the "prototype" for a function, which captures + * its name, and its argument names (thus implicitly the number of arguments the + * function takes). *) + type proto = Prototype of string * string array + + (* func - This type represents a function definition itself. *) + type func = Function of proto * expr + +parser.ml: + .. code-block:: ocaml + + (*===---------------------------------------------------------------------=== + * Parser + *===---------------------------------------------------------------------===*) + + (* binop_precedence - This holds the precedence for each binary operator that is + * defined *) + let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 + + (* precedence - Get the precedence of the pending binary operator token. *) + let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 + + (* primary + * ::= identifier + * ::= numberexpr + * ::= parenexpr *) + let rec parse_primary = parser + (* numberexpr ::= number *) + | [< 'Token.Number n >] -> Ast.Number n + + (* parenexpr ::= '(' expression ')' *) + | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e + + (* identifierexpr + * ::= identifier + * ::= identifier '(' argumentexpr ')' *) + | [< 'Token.Ident id; stream >] -> + let rec parse_args accumulator = parser + | [< e=parse_expr; stream >] -> + begin parser + | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e + | [< >] -> e :: accumulator + end stream + | [< >] -> accumulator + in + let rec parse_ident id = parser + (* Call. *) + | [< 'Token.Kwd '('; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')'">] -> + Ast.Call (id, Array.of_list (List.rev args)) + + (* Simple variable ref. *) + | [< >] -> Ast.Variable id + in + parse_ident id stream + + | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") + + (* binoprhs + * ::= ('+' primary)* *) + and parse_bin_rhs expr_prec lhs stream = + match Stream.peek stream with + (* If this is a binop, find its precedence. *) + | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> + let token_prec = precedence c in + + (* If this is a binop that binds at least as tightly as the current binop, + * consume it, otherwise we are done. 
*) + if token_prec < expr_prec then lhs else begin + (* Eat the binop. *) + Stream.junk stream; + + (* Parse the primary expression after the binary operator. *) + let rhs = parse_primary stream in + + (* Okay, we know this is a binop. *) + let rhs = + match Stream.peek stream with + | Some (Token.Kwd c2) -> + (* If BinOp binds less tightly with rhs than the operator after + * rhs, let the pending operator take rhs as its lhs. *) + let next_prec = precedence c2 in + if token_prec < next_prec + then parse_bin_rhs (token_prec + 1) rhs stream + else rhs + | _ -> rhs + in + + (* Merge lhs/rhs. *) + let lhs = Ast.Binary (c, lhs, rhs) in + parse_bin_rhs expr_prec lhs stream + end + | _ -> lhs + + (* expression + * ::= primary binoprhs *) + and parse_expr = parser + | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream + + (* prototype + * ::= id '(' id* ')' *) + let parse_prototype = + let rec parse_args accumulator = parser + | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e + | [< >] -> accumulator + in + + parser + | [< 'Token.Ident id; + 'Token.Kwd '(' ?? "expected '(' in prototype"; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> + (* success. *) + Ast.Prototype (id, Array.of_list (List.rev args)) + + | [< >] -> + raise (Stream.Error "expected function name in prototype") + + (* definition ::= 'def' prototype expression *) + let parse_definition = parser + | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> + Ast.Function (p, e) + + (* toplevelexpr ::= expression *) + let parse_toplevel = parser + | [< e=parse_expr >] -> + (* Make an anonymous proto. *) + Ast.Function (Ast.Prototype ("", [||]), e) + + (* external ::= 'extern' prototype *) + let parse_extern = parser + | [< 'Token.Extern; e=parse_prototype >] -> e + +toplevel.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Top-Level parsing and JIT Driver + *===----------------------------------------------------------------------===*) + + (* top ::= definition | external | expression | ';' *) + let rec main_loop stream = + match Stream.peek stream with + | None -> () + + (* ignore top-level semicolons. *) + | Some (Token.Kwd ';') -> + Stream.junk stream; + main_loop stream + + | Some token -> + begin + try match token with + | Token.Def -> + ignore(Parser.parse_definition stream); + print_endline "parsed a function definition."; + | Token.Extern -> + ignore(Parser.parse_extern stream); + print_endline "parsed an extern."; + | _ -> + (* Evaluate a top-level expression into an anonymous function. *) + ignore(Parser.parse_toplevel stream); + print_endline "parsed a top-level expr"; + with Stream.Error s -> + (* Skip token for error recovery. *) + Stream.junk stream; + print_endline s; + end; + print_string "ready> "; flush stdout; + main_loop stream + +toy.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Main driver code. + *===----------------------------------------------------------------------===*) + + let main () = + (* Install standard binary operators. + * 1 is the lowest precedence. *) + Hashtbl.add Parser.binop_precedence '<' 10; + Hashtbl.add Parser.binop_precedence '+' 20; + Hashtbl.add Parser.binop_precedence '-' 20; + Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) + + (* Prime the first token. *) + print_string "ready> "; flush stdout; + let stream = Lexer.lex (Stream.of_channel stdin) in + + (* Run the main "interpreter loop" now. 
*) + Toplevel.main_loop stream; + ;; + + main () + +`Next: Implementing Code Generation to LLVM IR <OCamlLangImpl3.html>`_ + diff --git a/docs/tutorial/OCamlLangImpl3.html b/docs/tutorial/OCamlLangImpl3.html deleted file mode 100644 index a49a0b5d9c..0000000000 --- a/docs/tutorial/OCamlLangImpl3.html +++ /dev/null @@ -1,1093 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Implementing code generation to LLVM IR</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <meta name="author" content="Erick Tryzelaar"> - <link rel="stylesheet" href="../_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>Kaleidoscope: Code generation to LLVM IR</h1> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 3 - <ol> - <li><a href="#intro">Chapter 3 Introduction</a></li> - <li><a href="#basics">Code Generation Setup</a></li> - <li><a href="#exprs">Expression Code Generation</a></li> - <li><a href="#funcs">Function Code Generation</a></li> - <li><a href="#driver">Driver Changes and Closing Thoughts</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="OCamlLangImpl4.html">Chapter 4</a>: Adding JIT and Optimizer -Support</li> -</ul> - -<div class="doc_author"> - <p> - Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a> - and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a> - </p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="intro">Chapter 3 Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Welcome to Chapter 3 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. This chapter shows you how to transform the <a -href="OCamlLangImpl2.html">Abstract Syntax Tree</a>, built in Chapter 2, into -LLVM IR. This will teach you a little bit about how LLVM does things, as well -as demonstrate how easy it is to use. It's much more work to build a lexer and -parser than it is to generate LLVM IR code. :) -</p> - -<p><b>Please note</b>: the code in this chapter and later require LLVM 2.3 or -LLVM SVN to work. LLVM 2.2 and before will not work with it.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="basics">Code Generation Setup</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -In order to generate LLVM IR, we want some simple setup to get started. First -we define virtual code generation (codegen) methods in each AST class:</p> - -<div class="doc_code"> -<pre> -let rec codegen_expr = function - | Ast.Number n -> ... - | Ast.Variable name -> ... -</pre> -</div> - -<p>The <tt>Codegen.codegen_expr</tt> function says to emit IR for that AST node -along with all the things it depends on, and they all return an LLVM Value -object. "Value" is the class used to represent a "<a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single -Assignment (SSA)</a> register" or "SSA value" in LLVM. The most distinct aspect -of SSA values is that their value is computed as the related instruction -executes, and it does not get a new value until (and if) the instruction -re-executes. In other words, there is no way to "change" an SSA value. 
For -more information, please read up on <a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single -Assignment</a> - the concepts are really quite natural once you grok them.</p> - -<p>The -second thing we want is an "Error" exception like we used for the parser, which -will be used to report errors found during code generation (for example, use of -an undeclared parameter):</p> - -<div class="doc_code"> -<pre> -exception Error of string - -let context = global_context () -let the_module = create_module context "my cool jit" -let builder = builder context -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -let double_type = double_type context -</pre> -</div> - -<p>The static variables will be used during code generation. -<tt>Codgen.the_module</tt> is the LLVM construct that contains all of the -functions and global variables in a chunk of code. In many ways, it is the -top-level structure that the LLVM IR uses to contain code.</p> - -<p>The <tt>Codegen.builder</tt> object is a helper object that makes it easy to -generate LLVM instructions. Instances of the <a -href="http://llvm.org/doxygen/IRBuilder_8h-source.html"><tt>IRBuilder</tt></a> -class keep track of the current place to insert instructions and has methods to -create new instructions.</p> - -<p>The <tt>Codegen.named_values</tt> map keeps track of which values are defined -in the current scope and what their LLVM representation is. (In other words, it -is a symbol table for the code). In this form of Kaleidoscope, the only things -that can be referenced are function parameters. As such, function parameters -will be in this map when generating code for their function body.</p> - -<p> -With these basics in place, we can start talking about how to generate code for -each expression. Note that this assumes that the <tt>Codgen.builder</tt> has -been set up to generate code <em>into</em> something. For now, we'll assume -that this has already been done, and we'll just use it to emit code.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="exprs">Expression Code Generation</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Generating LLVM code for expression nodes is very straightforward: less -than 30 lines of commented code for all four of our expression nodes. First -we'll do numeric literals:</p> - -<div class="doc_code"> -<pre> - | Ast.Number n -> const_float double_type n -</pre> -</div> - -<p>In the LLVM IR, numeric constants are represented with the -<tt>ConstantFP</tt> class, which holds the numeric value in an <tt>APFloat</tt> -internally (<tt>APFloat</tt> has the capability of holding floating point -constants of <em>A</em>rbitrary <em>P</em>recision). This code basically just -creates and returns a <tt>ConstantFP</tt>. Note that in the LLVM IR -that constants are all uniqued together and shared. For this reason, the API -uses "the foo::get(..)" idiom instead of "new foo(..)" or "foo::Create(..)".</p> - -<div class="doc_code"> -<pre> - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) -</pre> -</div> - -<p>References to variables are also quite simple using LLVM. In the simple -version of Kaleidoscope, we assume that the variable has already been emitted -somewhere and its value is available. In practice, the only values that can be -in the <tt>Codegen.named_values</tt> map are function arguments. 
This code -simply checks to see that the specified name is in the map (if not, an unknown -variable is being referenced) and returns the value for it. In future chapters, -we'll add support for <a href="LangImpl5.html#for">loop induction variables</a> -in the symbol table, and for <a href="LangImpl7.html#localvars">local -variables</a>.</p> - -<div class="doc_code"> -<pre> - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_fadd lhs_val rhs_val "addtmp" builder - | '-' -> build_fsub lhs_val rhs_val "subtmp" builder - | '*' -> build_fmul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> raise (Error "invalid binary operator") - end -</pre> -</div> - -<p>Binary operators start to get more interesting. The basic idea here is that -we recursively emit code for the left-hand side of the expression, then the -right-hand side, then we compute the result of the binary expression. In this -code, we do a simple switch on the opcode to create the right LLVM instruction. -</p> - -<p>In the example above, the LLVM builder class is starting to show its value. -IRBuilder knows where to insert the newly created instruction, all you have to -do is specify what instruction to create (e.g. with <tt>Llvm.create_add</tt>), -which operands to use (<tt>lhs</tt> and <tt>rhs</tt> here) and optionally -provide a name for the generated instruction.</p> - -<p>One nice thing about LLVM is that the name is just a hint. For instance, if -the code above emits multiple "addtmp" variables, LLVM will automatically -provide each one with an increasing, unique numeric suffix. Local value names -for instructions are purely optional, but it makes it much easier to read the -IR dumps.</p> - -<p><a href="../LangRef.html#instref">LLVM instructions</a> are constrained by -strict rules: for example, the Left and Right operators of -an <a href="../LangRef.html#i_add">add instruction</a> must have the same -type, and the result type of the add must match the operand types. Because -all values in Kaleidoscope are doubles, this makes for very simple code for add, -sub and mul.</p> - -<p>On the other hand, LLVM specifies that the <a -href="../LangRef.html#i_fcmp">fcmp instruction</a> always returns an 'i1' value -(a one bit integer). The problem with this is that Kaleidoscope wants the value to be a 0.0 or 1.0 value. In order to get these semantics, we combine the fcmp instruction with -a <a href="../LangRef.html#i_uitofp">uitofp instruction</a>. This instruction -converts its input integer into a floating point value by treating the input -as an unsigned value. In contrast, if we used the <a -href="../LangRef.html#i_sitofp">sitofp instruction</a>, the Kaleidoscope '<' -operator would return 0.0 and -1.0, depending on the input value.</p> - -<div class="doc_code"> -<pre> - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. 
*) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder -</pre> -</div> - -<p>Code generation for function calls is quite straightforward with LLVM. The -code above initially does a function name lookup in the LLVM Module's symbol -table. Recall that the LLVM Module is the container that holds all of the -functions we are JIT'ing. By giving each function the same name as what the -user specifies, we can use the LLVM symbol table to resolve function names for -us.</p> - -<p>Once we have the function to call, we recursively codegen each argument that -is to be passed in, and create an LLVM <a href="../LangRef.html#i_call">call -instruction</a>. Note that LLVM uses the native C calling conventions by -default, allowing these calls to also call into standard library functions like -"sin" and "cos", with no additional effort.</p> - -<p>This wraps up our handling of the four basic expressions that we have so far -in Kaleidoscope. Feel free to go in and add some more. For example, by -browsing the <a href="../LangRef.html">LLVM language reference</a> you'll find -several other interesting instructions that are really easy to plug into our -basic framework.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="funcs">Function Code Generation</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Code generation for prototypes and functions must handle a number of -details, which make their code less beautiful than expression code -generation, but allows us to illustrate some important points. First, lets -talk about code generation for prototypes: they are used both for function -bodies and external function declarations. The code starts with:</p> - -<div class="doc_code"> -<pre> -let codegen_proto = function - | Ast.Prototype (name, args) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with -</pre> -</div> - -<p>This code packs a lot of power into a few lines. Note first that this -function returns a "Function*" instead of a "Value*" (although at the moment -they both are modeled by <tt>llvalue</tt> in ocaml). Because a "prototype" -really talks about the external interface for a function (not the value computed -by an expression), it makes sense for it to return the LLVM Function it -corresponds to when codegen'd.</p> - -<p>The call to <tt>Llvm.function_type</tt> creates the <tt>Llvm.llvalue</tt> -that should be used for a given Prototype. Since all function arguments in -Kaleidoscope are of type double, the first line creates a vector of "N" LLVM -double types. It then uses the <tt>Llvm.function_type</tt> method to create a -function type that takes "N" doubles as arguments, returns one double as a -result, and that is not vararg (that uses the function -<tt>Llvm.var_arg_function_type</tt>). Note that Types in LLVM are uniqued just -like <tt>Constant</tt>s are, so you don't "new" a type, you "get" it.</p> - -<p>The final line above checks if the function has already been defined in -<tt>Codegen.the_module</tt>. 
If not, we will create it.</p> - -<div class="doc_code"> -<pre> - | None -> declare_function name ft the_module -</pre> -</div> - -<p>This indicates the type and name to use, as well as which module to insert -into. By default we assume a function has -<tt>Llvm.Linkage.ExternalLinkage</tt>. "<a href="LangRef.html#linkage">external -linkage</a>" means that the function may be defined outside the current module -and/or that it is callable by functions outside the module. The "<tt>name</tt>" -passed in is the name the user specified: this name is registered in -"<tt>Codegen.the_module</tt>"s symbol table, which is used by the function call -code above.</p> - -<p>In Kaleidoscope, I choose to allow redefinitions of functions in two cases: -first, we want to allow 'extern'ing a function more than once, as long as the -prototypes for the externs match (since all arguments have the same type, we -just have to check that the number of arguments match). Second, we want to -allow 'extern'ing a function and then defining a body for it. This is useful -when defining mutually recursive functions.</p> - -<div class="doc_code"> -<pre> - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if Array.length (basic_blocks f) == 0 then () else - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if Array.length (params f) == Array.length args then () else - raise (Error "redefinition of function with different # args"); - f - in -</pre> -</div> - -<p>In order to verify the logic above, we first check to see if the pre-existing -function is "empty". In this case, empty means that it has no basic blocks in -it, which means it has no body. If it has no body, it is a forward -declaration. Since we don't allow anything after a full definition of the -function, the code rejects this case. If the previous reference to a function -was an 'extern', we simply verify that the number of arguments for that -definition and this one match up. If not, we emit an error.</p> - -<div class="doc_code"> -<pre> - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f -</pre> -</div> - -<p>The last bit of code for prototypes loops over all of the arguments in the -function, setting the name of the LLVM Argument objects to match, and registering -the arguments in the <tt>Codegen.named_values</tt> map for future use by the -<tt>Ast.Variable</tt> variant. Once this is set up, it returns the Function -object to the caller. Note that we don't check for conflicting -argument names here (e.g. "extern foo(a b a)"). Doing so would be very -straight-forward with the mechanics we have already used above.</p> - -<div class="doc_code"> -<pre> -let codegen_func = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in -</pre> -</div> - -<p>Code generation for function definitions starts out simply enough: we just -codegen the prototype (Proto) and verify that it is ok. We then clear out the -<tt>Codegen.named_values</tt> map to make sure that there isn't anything in it -from the last function we compiled. 
Code generation of the prototype ensures -that there is an LLVM Function object that is ready to go for us.</p> - -<div class="doc_code"> -<pre> - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in -</pre> -</div> - -<p>Now we get to the point where the <tt>Codegen.builder</tt> is set up. The -first line creates a new -<a href="http://en.wikipedia.org/wiki/Basic_block">basic block</a> (named -"entry"), which is inserted into <tt>the_function</tt>. The second line then -tells the builder that new instructions should be inserted into the end of the -new basic block. Basic blocks in LLVM are an important part of functions that -define the <a -href="http://en.wikipedia.org/wiki/Control_flow_graph">Control Flow Graph</a>. -Since we don't have any control flow, our functions will only contain one -block at this point. We'll fix this in <a href="OCamlLangImpl5.html">Chapter -5</a> :).</p> - -<div class="doc_code"> -<pre> - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - the_function -</pre> -</div> - -<p>Once the insertion point is set up, we call the <tt>Codegen.codegen_func</tt> -method for the root expression of the function. If no error happens, this emits -code to compute the expression into the entry block and returns the value that -was computed. Assuming no error, we then create an LLVM <a -href="../LangRef.html#i_ret">ret instruction</a>, which completes the function. -Once the function is built, we call -<tt>Llvm_analysis.assert_valid_function</tt>, which is provided by LLVM. This -function does a variety of consistency checks on the generated code, to -determine if our compiler is doing everything right. Using this is important: -it can catch a lot of bugs. Once the function is finished and validated, we -return it.</p> - -<div class="doc_code"> -<pre> - with e -> - delete_function the_function; - raise e -</pre> -</div> - -<p>The only piece left here is handling of the error case. For simplicity, we -handle this by merely deleting the function we produced with the -<tt>Llvm.delete_function</tt> method. This allows the user to redefine a -function that they incorrectly typed in before: if we didn't delete it, it -would live in the symbol table, with a body, preventing future redefinition.</p> - -<p>This code does have a bug, though. Since the <tt>Codegen.codegen_proto</tt> -can return a previously defined forward declaration, our code can actually delete -a forward declaration. There are a number of ways to fix this bug, see what you -can come up with! Here is a testcase:</p> - -<div class="doc_code"> -<pre> -extern foo(a b); # ok, defines foo. -def foo(a b) c; # error, 'c' is invalid. -def bar() foo(1, 2); # error, unknown function "foo" -</pre> -</div> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="driver">Driver Changes and Closing Thoughts</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -For now, code generation to LLVM doesn't really get us much, except that we can -look at the pretty IR calls. The sample code inserts calls to Codegen into the -"<tt>Toplevel.main_loop</tt>", and then dumps out the LLVM IR. 
This gives a -nice way to look at the LLVM IR for simple functions. For example: -</p> - -<div class="doc_code"> -<pre> -ready> <b>4+5</b>; -Read top-level expression: -define double @""() { -entry: - %addtmp = fadd double 4.000000e+00, 5.000000e+00 - ret double %addtmp -} -</pre> -</div> - -<p>Note how the parser turns the top-level expression into anonymous functions -for us. This will be handy when we add <a href="OCamlLangImpl4.html#jit">JIT -support</a> in the next chapter. Also note that the code is very literally -transcribed, no optimizations are being performed. We will -<a href="OCamlLangImpl4.html#trivialconstfold">add optimizations</a> explicitly -in the next chapter.</p> - -<div class="doc_code"> -<pre> -ready> <b>def foo(a b) a*a + 2*a*b + b*b;</b> -Read function definition: -define double @foo(double %a, double %b) { -entry: - %multmp = fmul double %a, %a - %multmp1 = fmul double 2.000000e+00, %a - %multmp2 = fmul double %multmp1, %b - %addtmp = fadd double %multmp, %multmp2 - %multmp3 = fmul double %b, %b - %addtmp4 = fadd double %addtmp, %multmp3 - ret double %addtmp4 -} -</pre> -</div> - -<p>This shows some simple arithmetic. Notice the striking similarity to the -LLVM builder calls that we use to create the instructions.</p> - -<div class="doc_code"> -<pre> -ready> <b>def bar(a) foo(a, 4.0) + bar(31337);</b> -Read function definition: -define double @bar(double %a) { -entry: - %calltmp = call double @foo(double %a, double 4.000000e+00) - %calltmp1 = call double @bar(double 3.133700e+04) - %addtmp = fadd double %calltmp, %calltmp1 - ret double %addtmp -} -</pre> -</div> - -<p>This shows some function calls. Note that this function will take a long -time to execute if you call it. In the future we'll add conditional control -flow to actually make recursion useful :).</p> - -<div class="doc_code"> -<pre> -ready> <b>extern cos(x);</b> -Read extern: -declare double @cos(double) - -ready> <b>cos(1.234);</b> -Read top-level expression: -define double @""() { -entry: - %calltmp = call double @cos(double 1.234000e+00) - ret double %calltmp -} -</pre> -</div> - -<p>This shows an extern for the libm "cos" function, and a call to it.</p> - - -<div class="doc_code"> -<pre> -ready> <b>^D</b> -; ModuleID = 'my cool jit' - -define double @""() { -entry: - %addtmp = fadd double 4.000000e+00, 5.000000e+00 - ret double %addtmp -} - -define double @foo(double %a, double %b) { -entry: - %multmp = fmul double %a, %a - %multmp1 = fmul double 2.000000e+00, %a - %multmp2 = fmul double %multmp1, %b - %addtmp = fadd double %multmp, %multmp2 - %multmp3 = fmul double %b, %b - %addtmp4 = fadd double %addtmp, %multmp3 - ret double %addtmp4 -} - -define double @bar(double %a) { -entry: - %calltmp = call double @foo(double %a, double 4.000000e+00) - %calltmp1 = call double @bar(double 3.133700e+04) - %addtmp = fadd double %calltmp, %calltmp1 - ret double %addtmp -} - -declare double @cos(double) - -define double @""() { -entry: - %calltmp = call double @cos(double 1.234000e+00) - ret double %calltmp -} -</pre> -</div> - -<p>When you quit the current demo, it dumps out the IR for the entire module -generated. Here you can see the big picture with all the functions referencing -each other.</p> - -<p>This wraps up the third chapter of the Kaleidoscope tutorial. 
Up next, we'll -describe how to <a href="OCamlLangImpl4.html">add JIT codegen and optimizer -support</a> to this so we can actually start running code!</p> - -</div> - - -<!-- *********************************************************************** --> -<h2><a name="code">Full Code Listing</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -Here is the complete code listing for our running example, enhanced with the -LLVM code generator. Because this uses the LLVM libraries, we need to link -them in. To do this, we use the <a -href="http://llvm.org/cmds/llvm-config.html">llvm-config</a> tool to inform -our makefile/command line about which options to use:</p> - -<div class="doc_code"> -<pre> -# Compile -ocamlbuild toy.byte -# Run -./toy.byte -</pre> -</div> - -<p>Here is the code:</p> - -<dl> -<dt>_tags:</dt> -<dd class="doc_code"> -<pre> -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) -<*.{byte,native}>: g++, use_llvm, use_llvm_analysis -</pre> -</dd> - -<dt>myocamlbuild.ml:</dt> -<dd class="doc_code"> -<pre> -open Ocamlbuild_plugin;; - -ocaml_lib ~extern:true "llvm";; -ocaml_lib ~extern:true "llvm_analysis";; - -flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);; -</pre> -</dd> - -<dt>token.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char -</pre> -</dd> - -<dt>lexer.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. 
'9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] -</pre> -</dd> - -<dt>ast.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = Prototype of string * string array - -(* func - This type represents a function definition itself. *) -type func = Function of proto * expr -</pre> -</dd> - -<dt>parser.ml:</dt> -<dd class="doc_code"> -<pre> -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. 
*) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e -</pre> -</dd> - -<dt>codegen.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - -open Llvm - -exception Error of string - -let context = global_context () -let the_module = create_module context "my cool jit" -let builder = builder context -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -let double_type = double_type context - -let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> raise (Error "invalid binary operator") - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - -let codegen_proto = function - | Ast.Prototype (name, args) -> - (* Make the function type: double(double,double) etc. 
*) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - -let codegen_func = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - the_function - with e -> - delete_function the_function; - raise e -</pre> -</dd> - -<dt>toplevel.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -open Llvm - -(* top ::= definition | external | expression | ';' *) -let rec main_loop stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - dump_value (Codegen.codegen_func e); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop stream -</pre> -</dd> - -<dt>toy.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - -open Llvm - -let main () = - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop stream; - - (* Print out all the generated code. 
*) - dump_module Codegen.the_module -;; - -main () -</pre> -</dd> -</dl> - -<a href="OCamlLangImpl4.html">Next: Adding JIT and Optimizer Support</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/OCamlLangImpl3.rst b/docs/tutorial/OCamlLangImpl3.rst new file mode 100644 index 0000000000..d2a47b486c --- /dev/null +++ b/docs/tutorial/OCamlLangImpl3.rst @@ -0,0 +1,964 @@ +======================================== +Kaleidoscope: Code generation to LLVM IR +======================================== + +.. contents:: + :local: + +Written by `Chris Lattner <mailto:sabre@nondot.org>`_ and `Erick +Tryzelaar <mailto:idadesub@users.sourceforge.net>`_ + +Chapter 3 Introduction +====================== + +Welcome to Chapter 3 of the "`Implementing a language with +LLVM <index.html>`_" tutorial. This chapter shows you how to transform +the `Abstract Syntax Tree <OCamlLangImpl2.html>`_, built in Chapter 2, +into LLVM IR. This will teach you a little bit about how LLVM does +things, as well as demonstrate how easy it is to use. It's much more +work to build a lexer and parser than it is to generate LLVM IR code. :) + +**Please note**: the code in this chapter and later require LLVM 2.3 or +LLVM SVN to work. LLVM 2.2 and before will not work with it. + +Code Generation Setup +===================== + +In order to generate LLVM IR, we want some simple setup to get started. +First we define virtual code generation (codegen) methods in each AST +class: + +.. code-block:: ocaml + + let rec codegen_expr = function + | Ast.Number n -> ... + | Ast.Variable name -> ... + +The ``Codegen.codegen_expr`` function says to emit IR for that AST node +along with all the things it depends on, and they all return an LLVM +Value object. "Value" is the class used to represent a "`Static Single +Assignment +(SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_ +register" or "SSA value" in LLVM. The most distinct aspect of SSA values +is that their value is computed as the related instruction executes, and +it does not get a new value until (and if) the instruction re-executes. +In other words, there is no way to "change" an SSA value. For more +information, please read up on `Static Single +Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_ +- the concepts are really quite natural once you grok them. + +The second thing we want is an "Error" exception like we used for the +parser, which will be used to report errors found during code generation +(for example, use of an undeclared parameter): + +.. code-block:: ocaml + + exception Error of string + + let context = global_context () + let the_module = create_module context "my cool jit" + let builder = builder context + let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 + let double_type = double_type context + +The static variables will be used during code generation. 
+``Codegen.the_module`` is the LLVM construct that contains all of the
+functions and global variables in a chunk of code. In many ways, it is
+the top-level structure that the LLVM IR uses to contain code.
+
+The ``Codegen.builder`` object is a helper object that makes it easy to
+generate LLVM instructions. Instances of the
+```IRBuilder`` <http://llvm.org/doxygen/IRBuilder_8h-source.html>`_
+class keep track of the current place to insert instructions and have
+methods to create new instructions.
+
+The ``Codegen.named_values`` map keeps track of which values are defined
+in the current scope and what their LLVM representation is. (In other
+words, it is a symbol table for the code). In this form of Kaleidoscope,
+the only things that can be referenced are function parameters. As such,
+function parameters will be in this map when generating code for their
+function body.
+
+With these basics in place, we can start talking about how to generate
+code for each expression. Note that this assumes that the
+``Codegen.builder`` has been set up to generate code *into* something.
+For now, we'll assume that this has already been done, and we'll just
+use it to emit code.
+
+Expression Code Generation
+==========================
+
+Generating LLVM code for expression nodes is very straightforward: less
+than 30 lines of commented code for all four of our expression nodes.
+First we'll do numeric literals:
+
+.. code-block:: ocaml
+
+      | Ast.Number n -> const_float double_type n
+
+In the LLVM IR, numeric constants are represented with the
+``ConstantFP`` class, which holds the numeric value in an ``APFloat``
+internally (``APFloat`` has the capability of holding floating point
+constants of Arbitrary Precision). This code basically just creates
+and returns a ``ConstantFP``. Note that in the LLVM IR, constants
+are all uniqued together and shared. For this reason, the API uses "the
+foo::get(..)" idiom instead of "new foo(..)" or "foo::Create(..)".
+
+.. code-block:: ocaml
+
+      | Ast.Variable name ->
+          (try Hashtbl.find named_values name with
+            | Not_found -> raise (Error "unknown variable name"))
+
+References to variables are also quite simple using LLVM. In the simple
+version of Kaleidoscope, we assume that the variable has already been
+emitted somewhere and its value is available. In practice, the only
+values that can be in the ``Codegen.named_values`` map are function
+arguments. This code simply checks to see that the specified name is in
+the map (if not, an unknown variable is being referenced) and returns
+the value for it. In future chapters, we'll add support for `loop
+induction variables <LangImpl5.html#for>`_ in the symbol table, and for
+`local variables <LangImpl7.html#localvars>`_.
+
+.. code-block:: ocaml
+
+      | Ast.Binary (op, lhs, rhs) ->
+          let lhs_val = codegen_expr lhs in
+          let rhs_val = codegen_expr rhs in
+          begin
+            match op with
+            | '+' -> build_fadd lhs_val rhs_val "addtmp" builder
+            | '-' -> build_fsub lhs_val rhs_val "subtmp" builder
+            | '*' -> build_fmul lhs_val rhs_val "multmp" builder
+            | '<' ->
+                (* Convert bool 0/1 to double 0.0 or 1.0 *)
+                let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
+                build_uitofp i double_type "booltmp" builder
+            | _ -> raise (Error "invalid binary operator")
+          end
+
+Binary operators start to get more interesting. The basic idea here is
+that we recursively emit code for the left-hand side of the expression,
+then the right-hand side, then we compute the result of the binary
+expression. In this code, we do a simple switch on the opcode to create
+the right LLVM instruction.
+
+In the example above, the LLVM builder class is starting to show its
+value. IRBuilder knows where to insert the newly created instruction;
+all you have to do is specify what instruction to create (e.g. with
+``Llvm.build_fadd``), which operands to use (``lhs`` and ``rhs`` here)
+and optionally provide a name for the generated instruction.
+
+One nice thing about LLVM is that the name is just a hint. For instance,
+if the code above emits multiple "addtmp" variables, LLVM will
+automatically provide each one with an increasing, unique numeric
+suffix. Local value names for instructions are purely optional, but they
+make it much easier to read the IR dumps.
+
+`LLVM instructions <../LangRef.html#instref>`_ are constrained by strict
+rules: for example, the left and right operands of an `add
+instruction <../LangRef.html#i_add>`_ must have the same type, and the
+result type of the add must match the operand types. Because all values
+in Kaleidoscope are doubles, this makes for very simple code for add,
+sub and mul.
+
+On the other hand, LLVM specifies that the `fcmp
+instruction <../LangRef.html#i_fcmp>`_ always returns an 'i1' value (a
+one-bit integer). The problem with this is that Kaleidoscope wants the
+value to be a 0.0 or 1.0 value. In order to get these semantics, we
+combine the fcmp instruction with a `uitofp
+instruction <../LangRef.html#i_uitofp>`_. This instruction converts its
+input integer into a floating point value by treating the input as an
+unsigned value. In contrast, if we used the `sitofp
+instruction <../LangRef.html#i_sitofp>`_, the Kaleidoscope '<' operator
+would return 0.0 or -1.0, depending on the input value.
+
+.. code-block:: ocaml
+
+      | Ast.Call (callee, args) ->
+          (* Look up the name in the module table. *)
+          let callee =
+            match lookup_function callee the_module with
+            | Some callee -> callee
+            | None -> raise (Error "unknown function referenced")
+          in
+          let params = params callee in
+
+          (* If argument mismatch error. *)
+          if Array.length params == Array.length args then () else
+            raise (Error "incorrect # arguments passed");
+          let args = Array.map codegen_expr args in
+          build_call callee args "calltmp" builder
+
+Code generation for function calls is quite straightforward with LLVM.
+The code above initially does a function name lookup in the LLVM
+Module's symbol table. Recall that the LLVM Module is the container that
+holds all of the functions we are JIT'ing. By giving each function the
+same name as the user specifies, we can use the LLVM symbol table
+to resolve function names for us.
+
+Once we have the function to call, we recursively codegen each argument
+that is to be passed in, and create an LLVM `call
+instruction <../LangRef.html#i_call>`_. Note that LLVM uses the native C
+calling conventions by default, allowing these calls to also call into
+standard library functions like "sin" and "cos", with no additional
+effort.
+
+This wraps up our handling of the four basic expressions that we have so
+far in Kaleidoscope. Feel free to go in and add some more. For example,
+by browsing the `LLVM language reference <../LangRef.html>`_ you'll find
+several other interesting instructions that are really easy to plug into
+our basic framework.
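+
+As a quick illustration of how little is involved (this is an optional
+sketch, not part of the chapter's code), a '/' operator could be handled
+by one more arm in the same ``match``; ``build_fdiv`` takes the same
+arguments as the other arithmetic builders used above:
+
+.. code-block:: ocaml
+
+            (* Hypothetical extension: floating-point division, handled
+             * just like '+', '-' and '*'. *)
+            | '/' -> build_fdiv lhs_val rhs_val "divtmp" builder
+
+For the parser to ever produce such a node, the operator would also need
+a precedence entry in the driver, e.g.
+``Hashtbl.add Parser.binop_precedence '/' 40;`` next to the existing
+entries in ``toy.ml``.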
+
+Function Code Generation
+========================
+
+Code generation for prototypes and functions must handle a number of
+details, which makes their code less beautiful than expression code
+generation, but allows us to illustrate some important points. First,
+let's talk about code generation for prototypes: they are used both for
+function bodies and external function declarations. The code starts
+with:
+
+.. code-block:: ocaml
+
+    let codegen_proto = function
+      | Ast.Prototype (name, args) ->
+          (* Make the function type: double(double,double) etc. *)
+          let doubles = Array.make (Array.length args) double_type in
+          let ft = function_type double_type doubles in
+          let f =
+            match lookup_function name the_module with
+
+This code packs a lot of power into a few lines. Note first that this
+function returns a "Function\*" instead of a "Value\*" (although at the
+moment they both are modeled by ``llvalue`` in OCaml). Because a
+"prototype" really talks about the external interface for a function
+(not the value computed by an expression), it makes sense for it to
+return the LLVM Function it corresponds to when codegen'd.
+
+The call to ``Llvm.function_type`` creates the ``Llvm.lltype`` that
+should be used for a given Prototype. Since all function arguments in
+Kaleidoscope are of type double, the first line creates an array of "N"
+LLVM double types. It then uses the ``Llvm.function_type`` function to
+create a function type that takes "N" doubles as arguments, returns one
+double as a result, and that is not vararg (a vararg function type would
+use ``Llvm.var_arg_function_type``). Note that Types in LLVM are uniqued
+just like ``Constant``'s are, so you don't "new" a type, you "get" it.
+
+The final line above checks if the function has already been defined in
+``Codegen.the_module``. If not, we will create it.
+
+.. code-block:: ocaml
+
+            | None -> declare_function name ft the_module
+
+This indicates the type and name to use, as well as which module to
+insert into. By default we assume a function has
+``Llvm.Linkage.ExternalLinkage``. "`external
+linkage <LangRef.html#linkage>`_" means that the function may be defined
+outside the current module and/or that it is callable by functions
+outside the module. The "``name``" passed in is the name the user
+specified: this name is registered in the symbol table of
+"``Codegen.the_module``", which is used by the function call code above.
+
+In Kaleidoscope, I choose to allow redefinitions of functions in two
+cases: First, we want to allow 'extern'ing a function more than once, as
+long as the prototypes for the externs match (since all arguments have
+the same type, we just have to check that the number of arguments
+matches). Second, we want to allow 'extern'ing a function and then
+defining a body for it. This is useful when defining mutually recursive
+functions.
+
+.. code-block:: ocaml
+
+          (* If 'f' conflicted, there was already something named 'name'. If it
+           * has a body, don't allow redefinition or reextern. *)
+          | Some f ->
+              (* If 'f' already has a body, reject this. *)
+              if Array.length (basic_blocks f) == 0 then () else
+                raise (Error "redefinition of function");
+
+              (* If 'f' took a different number of arguments, reject. *)
+              if Array.length (params f) == Array.length args then () else
+                raise (Error "redefinition of function with different # args");
+              f
+          in
+
+In order to verify the logic above, we first check to see if the
+pre-existing function is "empty". In this case, empty means that it has
+no basic blocks in it, which means it has no body. If it has no body, it
+is a forward declaration. Since we don't allow anything after a full
+definition of the function, the code rejects this case. If the previous
+reference to a function was an 'extern', we simply verify that the
+number of arguments for that definition and this one match up. If not,
+we emit an error.
+
+.. code-block:: ocaml
+
+          (* Set names for all arguments. *)
+          Array.iteri (fun i a ->
+            let n = args.(i) in
+            set_value_name n a;
+            Hashtbl.add named_values n a;
+          ) (params f);
+          f
+
+The last bit of code for prototypes loops over all of the arguments in
+the function, setting the name of the LLVM Argument objects to match,
+and registering the arguments in the ``Codegen.named_values`` map for
+future use by the ``Ast.Variable`` variant. Once this is set up, it
+returns the Function object to the caller. Note that we don't check for
+conflicting argument names here (e.g. "extern foo(a b a)"). Doing so
+would be very straightforward with the mechanics we have already used
+above.
+
+.. code-block:: ocaml
+
+    let codegen_func = function
+      | Ast.Function (proto, body) ->
+          Hashtbl.clear named_values;
+          let the_function = codegen_proto proto in
+
+Code generation for function definitions starts out simply enough: we
+first clear out the ``Codegen.named_values`` map to make sure that there
+isn't anything in it from the last function we compiled, and then we
+codegen the prototype (``proto``). Code generation of the prototype
+ensures that there is an LLVM Function object that is ready to go for
+us.
+
+.. code-block:: ocaml
+
+          (* Create a new basic block to start insertion into. *)
+          let bb = append_block context "entry" the_function in
+          position_at_end bb builder;
+
+          try
+            let ret_val = codegen_expr body in
+
+Now we get to the point where the ``Codegen.builder`` is set up. The
+first line creates a new `basic
+block <http://en.wikipedia.org/wiki/Basic_block>`_ (named "entry"),
+which is inserted into ``the_function``. The second line then tells the
+builder that new instructions should be inserted at the end of the new
+basic block. Basic blocks in LLVM are an important part of functions:
+they define the `Control Flow
+Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we
+don't have any control flow, our functions will only contain one block
+at this point. We'll fix this in `Chapter 5 <OCamlLangImpl5.html>`_ :).
+
+.. code-block:: ocaml
+
+            let ret_val = codegen_expr body in
+
+            (* Finish off the function. *)
+            let _ = build_ret ret_val builder in
+
+            (* Validate the generated code, checking for consistency. *)
+            Llvm_analysis.assert_valid_function the_function;
+
+            the_function
+
+Once the insertion point is set up, we call ``Codegen.codegen_expr``
+for the root expression of the function. If no error happens,
+this emits code to compute the expression into the entry block and
+returns the value that was computed. Assuming no error, we then create
+an LLVM `ret instruction <../LangRef.html#i_ret>`_, which completes the
+function. Once the function is built, we call
+``Llvm_analysis.assert_valid_function``, which is provided by LLVM. This
+function does a variety of consistency checks on the generated code, to
+determine if our compiler is doing everything right. Using this is
+important: it can catch a lot of bugs. Once the function is finished and
+validated, we return it.
+
+..
code-block:: ocaml + + with e -> + delete_function the_function; + raise e + +The only piece left here is handling of the error case. For simplicity, +we handle this by merely deleting the function we produced with the +``Llvm.delete_function`` method. This allows the user to redefine a +function that they incorrectly typed in before: if we didn't delete it, +it would live in the symbol table, with a body, preventing future +redefinition. + +This code does have a bug, though. Since the ``Codegen.codegen_proto`` +can return a previously defined forward declaration, our code can +actually delete a forward declaration. There are a number of ways to fix +this bug, see what you can come up with! Here is a testcase: + +:: + + extern foo(a b); # ok, defines foo. + def foo(a b) c; # error, 'c' is invalid. + def bar() foo(1, 2); # error, unknown function "foo" + +Driver Changes and Closing Thoughts +=================================== + +For now, code generation to LLVM doesn't really get us much, except that +we can look at the pretty IR calls. The sample code inserts calls to +Codegen into the "``Toplevel.main_loop``", and then dumps out the LLVM +IR. This gives a nice way to look at the LLVM IR for simple functions. +For example: + +:: + + ready> 4+5; + Read top-level expression: + define double @""() { + entry: + %addtmp = fadd double 4.000000e+00, 5.000000e+00 + ret double %addtmp + } + +Note how the parser turns the top-level expression into anonymous +functions for us. This will be handy when we add `JIT +support <OCamlLangImpl4.html#jit>`_ in the next chapter. Also note that +the code is very literally transcribed, no optimizations are being +performed. We will `add +optimizations <OCamlLangImpl4.html#trivialconstfold>`_ explicitly in the +next chapter. + +:: + + ready> def foo(a b) a*a + 2*a*b + b*b; + Read function definition: + define double @foo(double %a, double %b) { + entry: + %multmp = fmul double %a, %a + %multmp1 = fmul double 2.000000e+00, %a + %multmp2 = fmul double %multmp1, %b + %addtmp = fadd double %multmp, %multmp2 + %multmp3 = fmul double %b, %b + %addtmp4 = fadd double %addtmp, %multmp3 + ret double %addtmp4 + } + +This shows some simple arithmetic. Notice the striking similarity to the +LLVM builder calls that we use to create the instructions. + +:: + + ready> def bar(a) foo(a, 4.0) + bar(31337); + Read function definition: + define double @bar(double %a) { + entry: + %calltmp = call double @foo(double %a, double 4.000000e+00) + %calltmp1 = call double @bar(double 3.133700e+04) + %addtmp = fadd double %calltmp, %calltmp1 + ret double %addtmp + } + +This shows some function calls. Note that this function will take a long +time to execute if you call it. In the future we'll add conditional +control flow to actually make recursion useful :). + +:: + + ready> extern cos(x); + Read extern: + declare double @cos(double) + + ready> cos(1.234); + Read top-level expression: + define double @""() { + entry: + %calltmp = call double @cos(double 1.234000e+00) + ret double %calltmp + } + +This shows an extern for the libm "cos" function, and a call to it. 
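+
+Since an extern is nothing more than a prototype, the same declaration
+can also be produced straight from the codegen API. Here is a minimal
+sketch (it reuses the ``Ast`` and ``Codegen`` modules from the full
+listing below and is only an illustration, not part of the driver):
+
+.. code-block:: ocaml
+
+      (* Codegen'ing a bare prototype emits the 'declare' shown above. *)
+      let () =
+        let proto = Ast.Prototype ("cos", [| "x" |]) in
+        Llvm.dump_value (Codegen.codegen_proto proto)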
+ +:: + + ready> ^D + ; ModuleID = 'my cool jit' + + define double @""() { + entry: + %addtmp = fadd double 4.000000e+00, 5.000000e+00 + ret double %addtmp + } + + define double @foo(double %a, double %b) { + entry: + %multmp = fmul double %a, %a + %multmp1 = fmul double 2.000000e+00, %a + %multmp2 = fmul double %multmp1, %b + %addtmp = fadd double %multmp, %multmp2 + %multmp3 = fmul double %b, %b + %addtmp4 = fadd double %addtmp, %multmp3 + ret double %addtmp4 + } + + define double @bar(double %a) { + entry: + %calltmp = call double @foo(double %a, double 4.000000e+00) + %calltmp1 = call double @bar(double 3.133700e+04) + %addtmp = fadd double %calltmp, %calltmp1 + ret double %addtmp + } + + declare double @cos(double) + + define double @""() { + entry: + %calltmp = call double @cos(double 1.234000e+00) + ret double %calltmp + } + +When you quit the current demo, it dumps out the IR for the entire +module generated. Here you can see the big picture with all the +functions referencing each other. + +This wraps up the third chapter of the Kaleidoscope tutorial. Up next, +we'll describe how to `add JIT codegen and optimizer +support <OCamlLangImpl4.html>`_ to this so we can actually start running +code! + +Full Code Listing +================= + +Here is the complete code listing for our running example, enhanced with +the LLVM code generator. Because this uses the LLVM libraries, we need +to link them in. To do this, we use the +`llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform +our makefile/command line about which options to use: + +.. code-block:: bash + + # Compile + ocamlbuild toy.byte + # Run + ./toy.byte + +Here is the code: + +\_tags: + :: + + <{lexer,parser}.ml>: use_camlp4, pp(camlp4of) + <*.{byte,native}>: g++, use_llvm, use_llvm_analysis + +myocamlbuild.ml: + .. code-block:: ocaml + + open Ocamlbuild_plugin;; + + ocaml_lib ~extern:true "llvm";; + ocaml_lib ~extern:true "llvm_analysis";; + + flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);; + +token.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Lexer Tokens + *===----------------------------------------------------------------------===*) + + (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of + * these others for known things. *) + type token = + (* commands *) + | Def | Extern + + (* primary *) + | Ident of string | Number of float + + (* unknown *) + | Kwd of char + +lexer.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Lexer + *===----------------------------------------------------------------------===*) + + let rec lex = parser + (* Skip any whitespace. *) + | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream + + (* identifier: [a-zA-Z][a-zA-Z0-9] *) + | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> + let buffer = Buffer.create 1 in + Buffer.add_char buffer c; + lex_ident buffer stream + + (* number: [0-9.]+ *) + | [< ' ('0' .. '9' as c); stream >] -> + let buffer = Buffer.create 1 in + Buffer.add_char buffer c; + lex_number buffer stream + + (* Comment until end of line. *) + | [< ' ('#'); stream >] -> + lex_comment stream + + (* Otherwise, just return the character as its ascii value. *) + | [< 'c; stream >] -> + [< 'Token.Kwd c; lex stream >] + + (* end of stream. *) + | [< >] -> [< >] + + and lex_number buffer = parser + | [< ' ('0' .. '9' | '.' 
as c); stream >] -> + Buffer.add_char buffer c; + lex_number buffer stream + | [< stream=lex >] -> + [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] + + and lex_ident buffer = parser + | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> + Buffer.add_char buffer c; + lex_ident buffer stream + | [< stream=lex >] -> + match Buffer.contents buffer with + | "def" -> [< 'Token.Def; stream >] + | "extern" -> [< 'Token.Extern; stream >] + | id -> [< 'Token.Ident id; stream >] + + and lex_comment = parser + | [< ' ('\n'); stream=lex >] -> stream + | [< 'c; e=lex_comment >] -> e + | [< >] -> [< >] + +ast.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Abstract Syntax Tree (aka Parse Tree) + *===----------------------------------------------------------------------===*) + + (* expr - Base type for all expression nodes. *) + type expr = + (* variant for numeric literals like "1.0". *) + | Number of float + + (* variant for referencing a variable, like "a". *) + | Variable of string + + (* variant for a binary operator. *) + | Binary of char * expr * expr + + (* variant for function calls. *) + | Call of string * expr array + + (* proto - This type represents the "prototype" for a function, which captures + * its name, and its argument names (thus implicitly the number of arguments the + * function takes). *) + type proto = Prototype of string * string array + + (* func - This type represents a function definition itself. *) + type func = Function of proto * expr + +parser.ml: + .. code-block:: ocaml + + (*===---------------------------------------------------------------------=== + * Parser + *===---------------------------------------------------------------------===*) + + (* binop_precedence - This holds the precedence for each binary operator that is + * defined *) + let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 + + (* precedence - Get the precedence of the pending binary operator token. *) + let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 + + (* primary + * ::= identifier + * ::= numberexpr + * ::= parenexpr *) + let rec parse_primary = parser + (* numberexpr ::= number *) + | [< 'Token.Number n >] -> Ast.Number n + + (* parenexpr ::= '(' expression ')' *) + | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e + + (* identifierexpr + * ::= identifier + * ::= identifier '(' argumentexpr ')' *) + | [< 'Token.Ident id; stream >] -> + let rec parse_args accumulator = parser + | [< e=parse_expr; stream >] -> + begin parser + | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e + | [< >] -> e :: accumulator + end stream + | [< >] -> accumulator + in + let rec parse_ident id = parser + (* Call. *) + | [< 'Token.Kwd '('; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')'">] -> + Ast.Call (id, Array.of_list (List.rev args)) + + (* Simple variable ref. *) + | [< >] -> Ast.Variable id + in + parse_ident id stream + + | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") + + (* binoprhs + * ::= ('+' primary)* *) + and parse_bin_rhs expr_prec lhs stream = + match Stream.peek stream with + (* If this is a binop, find its precedence. *) + | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> + let token_prec = precedence c in + + (* If this is a binop that binds at least as tightly as the current binop, + * consume it, otherwise we are done. 
*) + if token_prec < expr_prec then lhs else begin + (* Eat the binop. *) + Stream.junk stream; + + (* Parse the primary expression after the binary operator. *) + let rhs = parse_primary stream in + + (* Okay, we know this is a binop. *) + let rhs = + match Stream.peek stream with + | Some (Token.Kwd c2) -> + (* If BinOp binds less tightly with rhs than the operator after + * rhs, let the pending operator take rhs as its lhs. *) + let next_prec = precedence c2 in + if token_prec < next_prec + then parse_bin_rhs (token_prec + 1) rhs stream + else rhs + | _ -> rhs + in + + (* Merge lhs/rhs. *) + let lhs = Ast.Binary (c, lhs, rhs) in + parse_bin_rhs expr_prec lhs stream + end + | _ -> lhs + + (* expression + * ::= primary binoprhs *) + and parse_expr = parser + | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream + + (* prototype + * ::= id '(' id* ')' *) + let parse_prototype = + let rec parse_args accumulator = parser + | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e + | [< >] -> accumulator + in + + parser + | [< 'Token.Ident id; + 'Token.Kwd '(' ?? "expected '(' in prototype"; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> + (* success. *) + Ast.Prototype (id, Array.of_list (List.rev args)) + + | [< >] -> + raise (Stream.Error "expected function name in prototype") + + (* definition ::= 'def' prototype expression *) + let parse_definition = parser + | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> + Ast.Function (p, e) + + (* toplevelexpr ::= expression *) + let parse_toplevel = parser + | [< e=parse_expr >] -> + (* Make an anonymous proto. *) + Ast.Function (Ast.Prototype ("", [||]), e) + + (* external ::= 'extern' prototype *) + let parse_extern = parser + | [< 'Token.Extern; e=parse_prototype >] -> e + +codegen.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Code Generation + *===----------------------------------------------------------------------===*) + + open Llvm + + exception Error of string + + let context = global_context () + let the_module = create_module context "my cool jit" + let builder = builder context + let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 + let double_type = double_type context + + let rec codegen_expr = function + | Ast.Number n -> const_float double_type n + | Ast.Variable name -> + (try Hashtbl.find named_values name with + | Not_found -> raise (Error "unknown variable name")) + | Ast.Binary (op, lhs, rhs) -> + let lhs_val = codegen_expr lhs in + let rhs_val = codegen_expr rhs in + begin + match op with + | '+' -> build_add lhs_val rhs_val "addtmp" builder + | '-' -> build_sub lhs_val rhs_val "subtmp" builder + | '*' -> build_mul lhs_val rhs_val "multmp" builder + | '<' -> + (* Convert bool 0/1 to double 0.0 or 1.0 *) + let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in + build_uitofp i double_type "booltmp" builder + | _ -> raise (Error "invalid binary operator") + end + | Ast.Call (callee, args) -> + (* Look up the name in the module table. *) + let callee = + match lookup_function callee the_module with + | Some callee -> callee + | None -> raise (Error "unknown function referenced") + in + let params = params callee in + + (* If argument mismatch error. 
*) + if Array.length params == Array.length args then () else + raise (Error "incorrect # arguments passed"); + let args = Array.map codegen_expr args in + build_call callee args "calltmp" builder + + let codegen_proto = function + | Ast.Prototype (name, args) -> + (* Make the function type: double(double,double) etc. *) + let doubles = Array.make (Array.length args) double_type in + let ft = function_type double_type doubles in + let f = + match lookup_function name the_module with + | None -> declare_function name ft the_module + + (* If 'f' conflicted, there was already something named 'name'. If it + * has a body, don't allow redefinition or reextern. *) + | Some f -> + (* If 'f' already has a body, reject this. *) + if block_begin f <> At_end f then + raise (Error "redefinition of function"); + + (* If 'f' took a different number of arguments, reject. *) + if element_type (type_of f) <> ft then + raise (Error "redefinition of function with different # args"); + f + in + + (* Set names for all arguments. *) + Array.iteri (fun i a -> + let n = args.(i) in + set_value_name n a; + Hashtbl.add named_values n a; + ) (params f); + f + + let codegen_func = function + | Ast.Function (proto, body) -> + Hashtbl.clear named_values; + let the_function = codegen_proto proto in + + (* Create a new basic block to start insertion into. *) + let bb = append_block context "entry" the_function in + position_at_end bb builder; + + try + let ret_val = codegen_expr body in + + (* Finish off the function. *) + let _ = build_ret ret_val builder in + + (* Validate the generated code, checking for consistency. *) + Llvm_analysis.assert_valid_function the_function; + + the_function + with e -> + delete_function the_function; + raise e + +toplevel.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Top-Level parsing and JIT Driver + *===----------------------------------------------------------------------===*) + + open Llvm + + (* top ::= definition | external | expression | ';' *) + let rec main_loop stream = + match Stream.peek stream with + | None -> () + + (* ignore top-level semicolons. *) + | Some (Token.Kwd ';') -> + Stream.junk stream; + main_loop stream + + | Some token -> + begin + try match token with + | Token.Def -> + let e = Parser.parse_definition stream in + print_endline "parsed a function definition."; + dump_value (Codegen.codegen_func e); + | Token.Extern -> + let e = Parser.parse_extern stream in + print_endline "parsed an extern."; + dump_value (Codegen.codegen_proto e); + | _ -> + (* Evaluate a top-level expression into an anonymous function. *) + let e = Parser.parse_toplevel stream in + print_endline "parsed a top-level expr"; + dump_value (Codegen.codegen_func e); + with Stream.Error s | Codegen.Error s -> + (* Skip token for error recovery. *) + Stream.junk stream; + print_endline s; + end; + print_string "ready> "; flush stdout; + main_loop stream + +toy.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Main driver code. + *===----------------------------------------------------------------------===*) + + open Llvm + + let main () = + (* Install standard binary operators. + * 1 is the lowest precedence. *) + Hashtbl.add Parser.binop_precedence '<' 10; + Hashtbl.add Parser.binop_precedence '+' 20; + Hashtbl.add Parser.binop_precedence '-' 20; + Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) + + (* Prime the first token. 
*) + print_string "ready> "; flush stdout; + let stream = Lexer.lex (Stream.of_channel stdin) in + + (* Run the main "interpreter loop" now. *) + Toplevel.main_loop stream; + + (* Print out all the generated code. *) + dump_module Codegen.the_module + ;; + + main () + +`Next: Adding JIT and Optimizer Support <OCamlLangImpl4.html>`_ + diff --git a/docs/tutorial/OCamlLangImpl4.html b/docs/tutorial/OCamlLangImpl4.html deleted file mode 100644 index eb97d986c2..0000000000 --- a/docs/tutorial/OCamlLangImpl4.html +++ /dev/null @@ -1,1026 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Adding JIT and Optimizer Support</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <meta name="author" content="Erick Tryzelaar"> - <link rel="stylesheet" href="../_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>Kaleidoscope: Adding JIT and Optimizer Support</h1> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 4 - <ol> - <li><a href="#intro">Chapter 4 Introduction</a></li> - <li><a href="#trivialconstfold">Trivial Constant Folding</a></li> - <li><a href="#optimizerpasses">LLVM Optimization Passes</a></li> - <li><a href="#jit">Adding a JIT Compiler</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="OCamlLangImpl5.html">Chapter 5</a>: Extending the Language: Control -Flow</li> -</ul> - -<div class="doc_author"> - <p> - Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a> - and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a> - </p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="intro">Chapter 4 Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Welcome to Chapter 4 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. Chapters 1-3 described the implementation of a simple -language and added support for generating LLVM IR. This chapter describes -two new techniques: adding optimizer support to your language, and adding JIT -compiler support. These additions will demonstrate how to get nice, efficient code -for the Kaleidoscope language.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="trivialconstfold">Trivial Constant Folding</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p><b>Note:</b> the default <tt>IRBuilder</tt> now always includes the constant -folding optimisations below.<p> - -<p> -Our demonstration for Chapter 3 is elegant and easy to extend. Unfortunately, -it does not produce wonderful code. For example, when compiling simple code, -we don't get obvious optimizations:</p> - -<div class="doc_code"> -<pre> -ready> <b>def test(x) 1+2+x;</b> -Read function definition: -define double @test(double %x) { -entry: - %addtmp = fadd double 1.000000e+00, 2.000000e+00 - %addtmp1 = fadd double %addtmp, %x - ret double %addtmp1 -} -</pre> -</div> - -<p>This code is a very, very literal transcription of the AST built by parsing -the input. As such, this transcription lacks optimizations like constant folding -(we'd like to get "<tt>add x, 3.0</tt>" in the example above) as well as other -more important optimizations. 
Constant folding, in particular, is a very common -and very important optimization: so much so that many language implementors -implement constant folding support in their AST representation.</p> - -<p>With LLVM, you don't need this support in the AST. Since all calls to build -LLVM IR go through the LLVM builder, it would be nice if the builder itself -checked to see if there was a constant folding opportunity when you call it. -If so, it could just do the constant fold and return the constant instead of -creating an instruction. This is exactly what the <tt>LLVMFoldingBuilder</tt> -class does. - -<p>All we did was switch from <tt>LLVMBuilder</tt> to -<tt>LLVMFoldingBuilder</tt>. Though we change no other code, we now have all of our -instructions implicitly constant folded without us having to do anything -about it. For example, the input above now compiles to:</p> - -<div class="doc_code"> -<pre> -ready> <b>def test(x) 1+2+x;</b> -Read function definition: -define double @test(double %x) { -entry: - %addtmp = fadd double 3.000000e+00, %x - ret double %addtmp -} -</pre> -</div> - -<p>Well, that was easy :). In practice, we recommend always using -<tt>LLVMFoldingBuilder</tt> when generating code like this. It has no -"syntactic overhead" for its use (you don't have to uglify your compiler with -constant checks everywhere) and it can dramatically reduce the amount of -LLVM IR that is generated in some cases (particular for languages with a macro -preprocessor or that use a lot of constants).</p> - -<p>On the other hand, the <tt>LLVMFoldingBuilder</tt> is limited by the fact -that it does all of its analysis inline with the code as it is built. If you -take a slightly more complex example:</p> - -<div class="doc_code"> -<pre> -ready> <b>def test(x) (1+2+x)*(x+(1+2));</b> -ready> Read function definition: -define double @test(double %x) { -entry: - %addtmp = fadd double 3.000000e+00, %x - %addtmp1 = fadd double %x, 3.000000e+00 - %multmp = fmul double %addtmp, %addtmp1 - ret double %multmp -} -</pre> -</div> - -<p>In this case, the LHS and RHS of the multiplication are the same value. We'd -really like to see this generate "<tt>tmp = x+3; result = tmp*tmp;</tt>" instead -of computing "<tt>x*3</tt>" twice.</p> - -<p>Unfortunately, no amount of local analysis will be able to detect and correct -this. This requires two transformations: reassociation of expressions (to -make the add's lexically identical) and Common Subexpression Elimination (CSE) -to delete the redundant add instruction. Fortunately, LLVM provides a broad -range of optimizations that you can use, in the form of "passes".</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="optimizerpasses">LLVM Optimization Passes</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>LLVM provides many optimization passes, which do many different sorts of -things and have different tradeoffs. Unlike other systems, LLVM doesn't hold -to the mistaken notion that one set of optimizations is right for all languages -and for all situations. LLVM allows a compiler implementor to make complete -decisions about what optimizations to use, in which order, and in what -situation.</p> - -<p>As a concrete example, LLVM supports both "whole module" passes, which look -across as large of body of code as they can (often a whole file, but if run -at link time, this can be a substantial portion of the whole program). 
It also -supports and includes "per-function" passes which just operate on a single -function at a time, without looking at other functions. For more information -on passes and how they are run, see the <a href="../WritingAnLLVMPass.html">How -to Write a Pass</a> document and the <a href="../Passes.html">List of LLVM -Passes</a>.</p> - -<p>For Kaleidoscope, we are currently generating functions on the fly, one at -a time, as the user types them in. We aren't shooting for the ultimate -optimization experience in this setting, but we also want to catch the easy and -quick stuff where possible. As such, we will choose to run a few per-function -optimizations as the user types the function in. If we wanted to make a "static -Kaleidoscope compiler", we would use exactly the code we have now, except that -we would defer running the optimizer until the entire file has been parsed.</p> - -<p>In order to get per-function optimizations going, we need to set up a -<a href="../WritingAnLLVMPass.html#passmanager">Llvm.PassManager</a> to hold and -organize the LLVM optimizations that we want to run. Once we have that, we can -add a set of optimizations to run. The code looks like this:</p> - -<div class="doc_code"> -<pre> - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combining the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. *) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). *) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop the_fpm the_execution_engine stream; -</pre> -</div> - -<p>The meat of the matter here, is the definition of "<tt>the_fpm</tt>". It -requires a pointer to the <tt>the_module</tt> to construct itself. Once it is -set up, we use a series of "add" calls to add a bunch of LLVM passes. The -first pass is basically boilerplate, it adds a pass so that later optimizations -know how the data structures in the program are laid out. The -"<tt>the_execution_engine</tt>" variable is related to the JIT, which we will -get to in the next section.</p> - -<p>In this case, we choose to add 4 optimization passes. The passes we chose -here are a pretty standard set of "cleanup" optimizations that are useful for -a wide variety of code. I won't delve into what they do but, believe me, -they are a good starting place :).</p> - -<p>Once the <tt>Llvm.PassManager.</tt> is set up, we need to make use of it. -We do this by running it after our newly created function is constructed (in -<tt>Codegen.codegen_func</tt>), but before it is returned to the client:</p> - -<div class="doc_code"> -<pre> -let codegen_func the_fpm = function - ... - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. 
*) - let _ = PassManager.run_function the_function the_fpm in - - the_function -</pre> -</div> - -<p>As you can see, this is pretty straightforward. The <tt>the_fpm</tt> -optimizes and updates the LLVM Function* in place, improving (hopefully) its -body. With this in place, we can try our test above again:</p> - -<div class="doc_code"> -<pre> -ready> <b>def test(x) (1+2+x)*(x+(1+2));</b> -ready> Read function definition: -define double @test(double %x) { -entry: - %addtmp = fadd double %x, 3.000000e+00 - %multmp = fmul double %addtmp, %addtmp - ret double %multmp -} -</pre> -</div> - -<p>As expected, we now get our nicely optimized code, saving a floating point -add instruction from every execution of this function.</p> - -<p>LLVM provides a wide variety of optimizations that can be used in certain -circumstances. Some <a href="../Passes.html">documentation about the various -passes</a> is available, but it isn't very complete. Another good source of -ideas can come from looking at the passes that <tt>Clang</tt> runs to get -started. The "<tt>opt</tt>" tool allows you to experiment with passes from the -command line, so you can see if they do anything.</p> - -<p>Now that we have reasonable code coming out of our front-end, lets talk about -executing it!</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="jit">Adding a JIT Compiler</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Code that is available in LLVM IR can have a wide variety of tools -applied to it. For example, you can run optimizations on it (as we did above), -you can dump it out in textual or binary forms, you can compile the code to an -assembly file (.s) for some target, or you can JIT compile it. The nice thing -about the LLVM IR representation is that it is the "common currency" between -many different parts of the compiler. -</p> - -<p>In this section, we'll add JIT compiler support to our interpreter. The -basic idea that we want for Kaleidoscope is to have the user enter function -bodies as they do now, but immediately evaluate the top-level expressions they -type in. For example, if they type in "1 + 2;", we should evaluate and print -out 3. If they define a function, they should be able to call it from the -command line.</p> - -<p>In order to do this, we first declare and initialize the JIT. This is done -by adding a global variable and a call in <tt>main</tt>:</p> - -<div class="doc_code"> -<pre> -... -let main () = - ... - <b>(* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in</b> - ... -</pre> -</div> - -<p>This creates an abstract "Execution Engine" which can be either a JIT -compiler or the LLVM interpreter. LLVM will automatically pick a JIT compiler -for you if one is available for your platform, otherwise it will fall back to -the interpreter.</p> - -<p>Once the <tt>Llvm_executionengine.ExecutionEngine.t</tt> is created, the JIT -is ready to be used. There are a variety of APIs that are useful, but the -simplest one is the "<tt>Llvm_executionengine.ExecutionEngine.run_function</tt>" -function. This method JIT compiles the specified LLVM Function and returns a -function pointer to the generated machine code. In our case, this means that we -can change the code that parses a top-level expression to look like this:</p> - -<div class="doc_code"> -<pre> - (* Evaluate a top-level expression into an anonymous function. 
*) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function, returning a function pointer. *) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); -</pre> -</div> - -<p>Recall that we compile top-level expressions into a self-contained LLVM -function that takes no arguments and returns the computed double. Because the -LLVM JIT compiler matches the native platform ABI, this means that you can just -cast the result pointer to a function pointer of that type and call it directly. -This means, there is no difference between JIT compiled code and native machine -code that is statically linked into your application.</p> - -<p>With just these two changes, lets see how Kaleidoscope works now!</p> - -<div class="doc_code"> -<pre> -ready> <b>4+5;</b> -define double @""() { -entry: - ret double 9.000000e+00 -} - -<em>Evaluated to 9.000000</em> -</pre> -</div> - -<p>Well this looks like it is basically working. The dump of the function -shows the "no argument function that always returns double" that we synthesize -for each top level expression that is typed in. This demonstrates very basic -functionality, but can we do more?</p> - -<div class="doc_code"> -<pre> -ready> <b>def testfunc(x y) x + y*2; </b> -Read function definition: -define double @testfunc(double %x, double %y) { -entry: - %multmp = fmul double %y, 2.000000e+00 - %addtmp = fadd double %multmp, %x - ret double %addtmp -} - -ready> <b>testfunc(4, 10);</b> -define double @""() { -entry: - %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01) - ret double %calltmp -} - -<em>Evaluated to 24.000000</em> -</pre> -</div> - -<p>This illustrates that we can now call user code, but there is something a bit -subtle going on here. Note that we only invoke the JIT on the anonymous -functions that <em>call testfunc</em>, but we never invoked it -on <em>testfunc</em> itself. What actually happened here is that the JIT -scanned for all non-JIT'd functions transitively called from the anonymous -function and compiled all of them before returning -from <tt>run_function</tt>.</p> - -<p>The JIT provides a number of other more advanced interfaces for things like -freeing allocated machine code, rejit'ing functions to update them, etc. -However, even with this simple code, we get some surprisingly powerful -capabilities - check this out (I removed the dump of the anonymous functions, -you should get the idea by now :) :</p> - -<div class="doc_code"> -<pre> -ready> <b>extern sin(x);</b> -Read extern: -declare double @sin(double) - -ready> <b>extern cos(x);</b> -Read extern: -declare double @cos(double) - -ready> <b>sin(1.0);</b> -<em>Evaluated to 0.841471</em> - -ready> <b>def foo(x) sin(x)*sin(x) + cos(x)*cos(x);</b> -Read function definition: -define double @foo(double %x) { -entry: - %calltmp = call double @sin(double %x) - %multmp = fmul double %calltmp, %calltmp - %calltmp2 = call double @cos(double %x) - %multmp4 = fmul double %calltmp2, %calltmp2 - %addtmp = fadd double %multmp, %multmp4 - ret double %addtmp -} - -ready> <b>foo(4.0);</b> -<em>Evaluated to 1.000000</em> -</pre> -</div> - -<p>Whoa, how does the JIT know about sin and cos? 
The answer is surprisingly -simple: in this example, the JIT started execution of a function and got to a -function call. It realized that the function was not yet JIT compiled and -invoked the standard set of routines to resolve the function. In this case, -there is no body defined for the function, so the JIT ended up calling -"<tt>dlsym("sin")</tt>" on the Kaleidoscope process itself. Since -"<tt>sin</tt>" is defined within the JIT's address space, it simply patches up -calls in the module to call the libm version of <tt>sin</tt> directly.</p> - -<p>The LLVM JIT provides a number of interfaces (look in the -<tt>llvm_executionengine.mli</tt> file) for controlling how unknown functions -get resolved. It allows you to establish explicit mappings between IR objects -and addresses (useful for LLVM global variables that you want to map to static -tables, for example), allows you to dynamically decide on the fly based on the -function name, and even allows you to have the JIT compile functions lazily the -first time they're called.</p> - -<p>One interesting application of this is that we can now extend the language -by writing arbitrary C code to implement operations. For example, if we add: -</p> - -<div class="doc_code"> -<pre> -/* putchard - putchar that takes a double and returns 0. */ -extern "C" -double putchard(double X) { - putchar((char)X); - return 0; -} -</pre> -</div> - -<p>Now we can produce simple output to the console by using things like: -"<tt>extern putchard(x); putchard(120);</tt>", which prints a lowercase 'x' on -the console (120 is the ASCII code for 'x'). Similar code could be used to -implement file I/O, console input, and many other capabilities in -Kaleidoscope.</p> - -<p>This completes the JIT and optimizer chapter of the Kaleidoscope tutorial. At -this point, we can compile a non-Turing-complete programming language, optimize -and JIT compile it in a user-driven way. Next up we'll look into <a -href="OCamlLangImpl5.html">extending the language with control flow -constructs</a>, tackling some interesting LLVM IR issues along the way.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="code">Full Code Listing</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -Here is the complete code listing for our running example, enhanced with the -LLVM JIT and optimizer. 
To build this example, use: -</p> - -<div class="doc_code"> -<pre> -# Compile -ocamlbuild toy.byte -# Run -./toy.byte -</pre> -</div> - -<p>Here is the code:</p> - -<dl> -<dt>_tags:</dt> -<dd class="doc_code"> -<pre> -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) -<*.{byte,native}>: g++, use_llvm, use_llvm_analysis -<*.{byte,native}>: use_llvm_executionengine, use_llvm_target -<*.{byte,native}>: use_llvm_scalar_opts, use_bindings -</pre> -</dd> - -<dt>myocamlbuild.ml:</dt> -<dd class="doc_code"> -<pre> -open Ocamlbuild_plugin;; - -ocaml_lib ~extern:true "llvm";; -ocaml_lib ~extern:true "llvm_analysis";; -ocaml_lib ~extern:true "llvm_executionengine";; -ocaml_lib ~extern:true "llvm_target";; -ocaml_lib ~extern:true "llvm_scalar_opts";; - -flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);; -dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; -</pre> -</dd> - -<dt>token.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char -</pre> -</dd> - -<dt>lexer.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] -</pre> -</dd> - -<dt>ast.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. 
*) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = Prototype of string * string array - -(* func - This type represents a function definition itself. *) -type func = Function of proto * expr -</pre> -</dd> - -<dt>parser.ml:</dt> -<dd class="doc_code"> -<pre> -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? 
"expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e -</pre> -</dd> - -<dt>codegen.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - -open Llvm - -exception Error of string - -let context = global_context () -let the_module = create_module context "my cool jit" -let builder = builder context -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -let double_type = double_type context - -let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> raise (Error "invalid binary operator") - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - -let codegen_proto = function - | Ast.Prototype (name, args) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - -let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* Create a new basic block to start insertion into. 
*) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. *) - let _ = PassManager.run_function the_function the_fpm in - - the_function - with e -> - delete_function the_function; - raise e -</pre> -</dd> - -<dt>toplevel.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine - -(* top ::= definition | external | expression | ';' *) -let rec main_loop the_fpm the_execution_engine stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop the_fpm the_execution_engine stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func the_fpm e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function, returning a function pointer. *) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop the_fpm the_execution_engine stream -</pre> -</dd> - -<dt>toy.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine -open Llvm_target -open Llvm_scalar_opts - -let main () = - ignore (initialize_native_target ()); - - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combination the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. 
*) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). *) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop the_fpm the_execution_engine stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module -;; - -main () -</pre> -</dd> - -<dt>bindings.c</dt> -<dd class="doc_code"> -<pre> -#include <stdio.h> - -/* putchard - putchar that takes a double and returns 0. */ -extern double putchard(double X) { - putchar((char)X); - return 0; -} -</pre> -</dd> -</dl> - -<a href="OCamlLangImpl5.html">Next: Extending the language: control flow</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/OCamlLangImpl4.rst b/docs/tutorial/OCamlLangImpl4.rst new file mode 100644 index 0000000000..865a03dfb7 --- /dev/null +++ b/docs/tutorial/OCamlLangImpl4.rst @@ -0,0 +1,918 @@ +============================================== +Kaleidoscope: Adding JIT and Optimizer Support +============================================== + +.. contents:: + :local: + +Written by `Chris Lattner <mailto:sabre@nondot.org>`_ and `Erick +Tryzelaar <mailto:idadesub@users.sourceforge.net>`_ + +Chapter 4 Introduction +====================== + +Welcome to Chapter 4 of the "`Implementing a language with +LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation +of a simple language and added support for generating LLVM IR. This +chapter describes two new techniques: adding optimizer support to your +language, and adding JIT compiler support. These additions will +demonstrate how to get nice, efficient code for the Kaleidoscope +language. + +Trivial Constant Folding +======================== + +**Note:** the default ``IRBuilder`` now always includes the constant +folding optimisations below. + +Our demonstration for Chapter 3 is elegant and easy to extend. +Unfortunately, it does not produce wonderful code. For example, when +compiling simple code, we don't get obvious optimizations: + +:: + + ready> def test(x) 1+2+x; + Read function definition: + define double @test(double %x) { + entry: + %addtmp = fadd double 1.000000e+00, 2.000000e+00 + %addtmp1 = fadd double %addtmp, %x + ret double %addtmp1 + } + +This code is a very, very literal transcription of the AST built by +parsing the input. As such, this transcription lacks optimizations like +constant folding (we'd like to get "``add x, 3.0``" in the example +above) as well as other more important optimizations. Constant folding, +in particular, is a very common and very important optimization: so much +so that many language implementors implement constant folding support in +their AST representation. + +With LLVM, you don't need this support in the AST. 
Since all calls to
+build LLVM IR go through the LLVM builder, it would be nice if the
+builder itself checked to see if there was a constant folding
+opportunity when you call it. If so, it could just do the constant fold
+and return the constant instead of creating an instruction. This is
+exactly what the ``LLVMFoldingBuilder`` class does.
+
+All we did was switch from ``LLVMBuilder`` to ``LLVMFoldingBuilder``.
+Though we changed no other code, we now have all of our instructions
+implicitly constant folded without us having to do anything about it.
+For example, the input above now compiles to:
+
+::
+
+    ready> def test(x) 1+2+x;
+    Read function definition:
+    define double @test(double %x) {
+    entry:
+      %addtmp = fadd double 3.000000e+00, %x
+      ret double %addtmp
+    }
+
+Well, that was easy :). In practice, we recommend always using
+``LLVMFoldingBuilder`` when generating code like this. It has no
+"syntactic overhead" for its use (you don't have to uglify your compiler
+with constant checks everywhere) and it can dramatically reduce the
+amount of LLVM IR that is generated in some cases (particularly for
+languages with a macro preprocessor or that use a lot of constants).
+
+On the other hand, the ``LLVMFoldingBuilder`` is limited by the fact
+that it does all of its analysis inline with the code as it is built. If
+you take a slightly more complex example:
+
+::
+
+    ready> def test(x) (1+2+x)*(x+(1+2));
+    ready> Read function definition:
+    define double @test(double %x) {
+    entry:
+      %addtmp = fadd double 3.000000e+00, %x
+      %addtmp1 = fadd double %x, 3.000000e+00
+      %multmp = fmul double %addtmp, %addtmp1
+      ret double %multmp
+    }
+
+In this case, the LHS and RHS of the multiplication are the same value.
+We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"
+instead of computing "``x+3``" twice.
+
+Unfortunately, no amount of local analysis will be able to detect and
+correct this. This requires two transformations: reassociation of
+expressions (to make the adds lexically identical) and Common
+Subexpression Elimination (CSE) to delete the redundant add instruction.
+Fortunately, LLVM provides a broad range of optimizations that you can
+use, in the form of "passes".
+
+LLVM Optimization Passes
+========================
+
+LLVM provides many optimization passes, which do many different sorts of
+things and have different tradeoffs. Unlike other systems, LLVM doesn't
+hold to the mistaken notion that one set of optimizations is right for
+all languages and for all situations. LLVM allows a compiler implementor
+to make complete decisions about what optimizations to use, in which
+order, and in what situation.
+
+As a concrete example, LLVM supports "whole module" passes, which look
+across as large a body of code as they can (often a whole file, but if
+run at link time, this can be a substantial portion of the whole
+program). It also supports "per-function" passes, which just operate on
+a single function at a time, without looking at other functions. For
+more information on passes and how they are run, see the
+`How to Write a Pass <../WritingAnLLVMPass.html>`_ document and the
+`List of LLVM Passes <../Passes.html>`_.
+
+For Kaleidoscope, we are currently generating functions on the fly, one
+at a time, as the user types them in. We aren't shooting for the
+ultimate optimization experience in this setting, but we also want to
+catch the easy and quick stuff where possible. 
As such, we will choose
+to run a few per-function optimizations as the user types the function
+in. If we wanted to make a "static Kaleidoscope compiler", we would use
+exactly the code we have now, except that we would defer running the
+optimizer until the entire file has been parsed.
+
+In order to get per-function optimizations going, we need to set up a
+`Llvm.PassManager <../WritingAnLLVMPass.html#passmanager>`_ to hold and
+organize the LLVM optimizations that we want to run. Once we have that,
+we can add a set of optimizations to run. The code looks like this:
+
+.. code-block:: ocaml
+
+    (* Create the JIT. *)
+    let the_execution_engine = ExecutionEngine.create Codegen.the_module in
+    let the_fpm = PassManager.create_function Codegen.the_module in
+
+    (* Set up the optimizer pipeline. Start with registering info about how the
+     * target lays out data structures. *)
+    DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm;
+
+    (* Do simple "peephole" optimizations and bit-twiddling optzn. *)
+    add_instruction_combination the_fpm;
+
+    (* reassociate expressions. *)
+    add_reassociation the_fpm;
+
+    (* Eliminate Common SubExpressions. *)
+    add_gvn the_fpm;
+
+    (* Simplify the control flow graph (deleting unreachable blocks, etc). *)
+    add_cfg_simplification the_fpm;
+
+    ignore (PassManager.initialize the_fpm);
+
+    (* Run the main "interpreter loop" now. *)
+    Toplevel.main_loop the_fpm the_execution_engine stream;
+
+The meat of the matter here is the definition of "``the_fpm``". It
+requires a pointer to ``the_module`` to construct itself. Once it is set
+up, we use a series of "add" calls to add a bunch of LLVM passes. The
+first pass is basically boilerplate: it adds a pass so that later
+optimizations know how the data structures in the program are laid out.
+The "``the_execution_engine``" variable is related to the JIT, which we
+will get to in the next section.
+
+In this case, we choose to add 4 optimization passes. The passes we
+chose here are a pretty standard set of "cleanup" optimizations that are
+useful for a wide variety of code. I won't delve into what they do but,
+believe me, they are a good starting place :).
+
+Once the ``Llvm.PassManager`` is set up, we need to make use of it. We
+do this by running it after our newly created function is constructed
+(in ``Codegen.codegen_func``), but before it is returned to the client:
+
+.. code-block:: ocaml
+
+    let codegen_func the_fpm = function
+      ...
+      try
+        let ret_val = codegen_expr body in
+
+        (* Finish off the function. *)
+        let _ = build_ret ret_val builder in
+
+        (* Validate the generated code, checking for consistency. *)
+        Llvm_analysis.assert_valid_function the_function;
+
+        (* Optimize the function. *)
+        let _ = PassManager.run_function the_function the_fpm in
+
+        the_function
+
+As you can see, this is pretty straightforward. ``the_fpm`` optimizes
+and updates the LLVM Function\* in place, improving (hopefully) its
+body. With this in place, we can try our test above again:
+
+::
+
+    ready> def test(x) (1+2+x)*(x+(1+2));
+    ready> Read function definition:
+    define double @test(double %x) {
+    entry:
+      %addtmp = fadd double %x, 3.000000e+00
+      %multmp = fmul double %addtmp, %addtmp
+      ret double %multmp
+    }
+
+As expected, we now get our nicely optimized code, saving a floating
+point add instruction from every execution of this function.
+
+LLVM provides a wide variety of optimizations that can be used in
+certain circumstances. 
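+For example, if the unoptimized IR above were saved to a file (say
+``test.ll``, a name used here purely for illustration), individual
+passes could be tried by hand from the shell. This is just a sketch of
+the kind of experiment you can run, not part of the tutorial code:
+
+.. code-block:: bash
+
+    # Reassociate and run GVN over the saved IR, then print the result.
+    llvm-as < test.ll | opt -reassociate -gvn | llvm-dis
+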
Some `documentation about the various +passes <../Passes.html>`_ is available, but it isn't very complete. +Another good source of ideas can come from looking at the passes that +``Clang`` runs to get started. The "``opt``" tool allows you to +experiment with passes from the command line, so you can see if they do +anything. + +Now that we have reasonable code coming out of our front-end, lets talk +about executing it! + +Adding a JIT Compiler +===================== + +Code that is available in LLVM IR can have a wide variety of tools +applied to it. For example, you can run optimizations on it (as we did +above), you can dump it out in textual or binary forms, you can compile +the code to an assembly file (.s) for some target, or you can JIT +compile it. The nice thing about the LLVM IR representation is that it +is the "common currency" between many different parts of the compiler. + +In this section, we'll add JIT compiler support to our interpreter. The +basic idea that we want for Kaleidoscope is to have the user enter +function bodies as they do now, but immediately evaluate the top-level +expressions they type in. For example, if they type in "1 + 2;", we +should evaluate and print out 3. If they define a function, they should +be able to call it from the command line. + +In order to do this, we first declare and initialize the JIT. This is +done by adding a global variable and a call in ``main``: + +.. code-block:: ocaml + + ... + let main () = + ... + (* Create the JIT. *) + let the_execution_engine = ExecutionEngine.create Codegen.the_module in + ... + +This creates an abstract "Execution Engine" which can be either a JIT +compiler or the LLVM interpreter. LLVM will automatically pick a JIT +compiler for you if one is available for your platform, otherwise it +will fall back to the interpreter. + +Once the ``Llvm_executionengine.ExecutionEngine.t`` is created, the JIT +is ready to be used. There are a variety of APIs that are useful, but +the simplest one is the +"``Llvm_executionengine.ExecutionEngine.run_function``" function. This +method JIT compiles the specified LLVM Function and returns a function +pointer to the generated machine code. In our case, this means that we +can change the code that parses a top-level expression to look like +this: + +.. code-block:: ocaml + + (* Evaluate a top-level expression into an anonymous function. *) + let e = Parser.parse_toplevel stream in + print_endline "parsed a top-level expr"; + let the_function = Codegen.codegen_func the_fpm e in + dump_value the_function; + + (* JIT the function, returning a function pointer. *) + let result = ExecutionEngine.run_function the_function [||] + the_execution_engine in + + print_string "Evaluated to "; + print_float (GenericValue.as_float Codegen.double_type result); + print_newline (); + +Recall that we compile top-level expressions into a self-contained LLVM +function that takes no arguments and returns the computed double. +Because the LLVM JIT compiler matches the native platform ABI, this +means that you can just cast the result pointer to a function pointer of +that type and call it directly. This means, there is no difference +between JIT compiled code and native machine code that is statically +linked into your application. + +With just these two changes, lets see how Kaleidoscope works now! + +:: + + ready> 4+5; + define double @""() { + entry: + ret double 9.000000e+00 + } + + Evaluated to 9.000000 + +Well this looks like it is basically working. 
The dump of the function +shows the "no argument function that always returns double" that we +synthesize for each top level expression that is typed in. This +demonstrates very basic functionality, but can we do more? + +:: + + ready> def testfunc(x y) x + y*2; + Read function definition: + define double @testfunc(double %x, double %y) { + entry: + %multmp = fmul double %y, 2.000000e+00 + %addtmp = fadd double %multmp, %x + ret double %addtmp + } + + ready> testfunc(4, 10); + define double @""() { + entry: + %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01) + ret double %calltmp + } + + Evaluated to 24.000000 + +This illustrates that we can now call user code, but there is something +a bit subtle going on here. Note that we only invoke the JIT on the +anonymous functions that *call testfunc*, but we never invoked it on +*testfunc* itself. What actually happened here is that the JIT scanned +for all non-JIT'd functions transitively called from the anonymous +function and compiled all of them before returning from +``run_function``. + +The JIT provides a number of other more advanced interfaces for things +like freeing allocated machine code, rejit'ing functions to update them, +etc. However, even with this simple code, we get some surprisingly +powerful capabilities - check this out (I removed the dump of the +anonymous functions, you should get the idea by now :) : + +:: + + ready> extern sin(x); + Read extern: + declare double @sin(double) + + ready> extern cos(x); + Read extern: + declare double @cos(double) + + ready> sin(1.0); + Evaluated to 0.841471 + + ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x); + Read function definition: + define double @foo(double %x) { + entry: + %calltmp = call double @sin(double %x) + %multmp = fmul double %calltmp, %calltmp + %calltmp2 = call double @cos(double %x) + %multmp4 = fmul double %calltmp2, %calltmp2 + %addtmp = fadd double %multmp, %multmp4 + ret double %addtmp + } + + ready> foo(4.0); + Evaluated to 1.000000 + +Whoa, how does the JIT know about sin and cos? The answer is +surprisingly simple: in this example, the JIT started execution of a +function and got to a function call. It realized that the function was +not yet JIT compiled and invoked the standard set of routines to resolve +the function. In this case, there is no body defined for the function, +so the JIT ended up calling "``dlsym("sin")``" on the Kaleidoscope +process itself. Since "``sin``" is defined within the JIT's address +space, it simply patches up calls in the module to call the libm version +of ``sin`` directly. + +The LLVM JIT provides a number of interfaces (look in the +``llvm_executionengine.mli`` file) for controlling how unknown functions +get resolved. It allows you to establish explicit mappings between IR +objects and addresses (useful for LLVM global variables that you want to +map to static tables, for example), allows you to dynamically decide on +the fly based on the function name, and even allows you to have the JIT +compile functions lazily the first time they're called. + +One interesting application of this is that we can now extend the +language by writing arbitrary C code to implement operations. For +example, if we add: + +.. code-block:: c++ + + /* putchard - putchar that takes a double and returns 0. 
*/ + extern "C" + double putchard(double X) { + putchar((char)X); + return 0; + } + +Now we can produce simple output to the console by using things like: +"``extern putchard(x); putchard(120);``", which prints a lowercase 'x' +on the console (120 is the ASCII code for 'x'). Similar code could be +used to implement file I/O, console input, and many other capabilities +in Kaleidoscope. + +This completes the JIT and optimizer chapter of the Kaleidoscope +tutorial. At this point, we can compile a non-Turing-complete +programming language, optimize and JIT compile it in a user-driven way. +Next up we'll look into `extending the language with control flow +constructs <OCamlLangImpl5.html>`_, tackling some interesting LLVM IR +issues along the way. + +Full Code Listing +================= + +Here is the complete code listing for our running example, enhanced with +the LLVM JIT and optimizer. To build this example, use: + +.. code-block:: bash + + # Compile + ocamlbuild toy.byte + # Run + ./toy.byte + +Here is the code: + +\_tags: + :: + + <{lexer,parser}.ml>: use_camlp4, pp(camlp4of) + <*.{byte,native}>: g++, use_llvm, use_llvm_analysis + <*.{byte,native}>: use_llvm_executionengine, use_llvm_target + <*.{byte,native}>: use_llvm_scalar_opts, use_bindings + +myocamlbuild.ml: + .. code-block:: ocaml + + open Ocamlbuild_plugin;; + + ocaml_lib ~extern:true "llvm";; + ocaml_lib ~extern:true "llvm_analysis";; + ocaml_lib ~extern:true "llvm_executionengine";; + ocaml_lib ~extern:true "llvm_target";; + ocaml_lib ~extern:true "llvm_scalar_opts";; + + flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);; + dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; + +token.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Lexer Tokens + *===----------------------------------------------------------------------===*) + + (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of + * these others for known things. *) + type token = + (* commands *) + | Def | Extern + + (* primary *) + | Ident of string | Number of float + + (* unknown *) + | Kwd of char + +lexer.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Lexer + *===----------------------------------------------------------------------===*) + + let rec lex = parser + (* Skip any whitespace. *) + | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream + + (* identifier: [a-zA-Z][a-zA-Z0-9] *) + | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> + let buffer = Buffer.create 1 in + Buffer.add_char buffer c; + lex_ident buffer stream + + (* number: [0-9.]+ *) + | [< ' ('0' .. '9' as c); stream >] -> + let buffer = Buffer.create 1 in + Buffer.add_char buffer c; + lex_number buffer stream + + (* Comment until end of line. *) + | [< ' ('#'); stream >] -> + lex_comment stream + + (* Otherwise, just return the character as its ascii value. *) + | [< 'c; stream >] -> + [< 'Token.Kwd c; lex stream >] + + (* end of stream. *) + | [< >] -> [< >] + + and lex_number buffer = parser + | [< ' ('0' .. '9' | '.' as c); stream >] -> + Buffer.add_char buffer c; + lex_number buffer stream + | [< stream=lex >] -> + [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] + + and lex_ident buffer = parser + | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. 
'9' as c); stream >] -> + Buffer.add_char buffer c; + lex_ident buffer stream + | [< stream=lex >] -> + match Buffer.contents buffer with + | "def" -> [< 'Token.Def; stream >] + | "extern" -> [< 'Token.Extern; stream >] + | id -> [< 'Token.Ident id; stream >] + + and lex_comment = parser + | [< ' ('\n'); stream=lex >] -> stream + | [< 'c; e=lex_comment >] -> e + | [< >] -> [< >] + +ast.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Abstract Syntax Tree (aka Parse Tree) + *===----------------------------------------------------------------------===*) + + (* expr - Base type for all expression nodes. *) + type expr = + (* variant for numeric literals like "1.0". *) + | Number of float + + (* variant for referencing a variable, like "a". *) + | Variable of string + + (* variant for a binary operator. *) + | Binary of char * expr * expr + + (* variant for function calls. *) + | Call of string * expr array + + (* proto - This type represents the "prototype" for a function, which captures + * its name, and its argument names (thus implicitly the number of arguments the + * function takes). *) + type proto = Prototype of string * string array + + (* func - This type represents a function definition itself. *) + type func = Function of proto * expr + +parser.ml: + .. code-block:: ocaml + + (*===---------------------------------------------------------------------=== + * Parser + *===---------------------------------------------------------------------===*) + + (* binop_precedence - This holds the precedence for each binary operator that is + * defined *) + let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 + + (* precedence - Get the precedence of the pending binary operator token. *) + let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 + + (* primary + * ::= identifier + * ::= numberexpr + * ::= parenexpr *) + let rec parse_primary = parser + (* numberexpr ::= number *) + | [< 'Token.Number n >] -> Ast.Number n + + (* parenexpr ::= '(' expression ')' *) + | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e + + (* identifierexpr + * ::= identifier + * ::= identifier '(' argumentexpr ')' *) + | [< 'Token.Ident id; stream >] -> + let rec parse_args accumulator = parser + | [< e=parse_expr; stream >] -> + begin parser + | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e + | [< >] -> e :: accumulator + end stream + | [< >] -> accumulator + in + let rec parse_ident id = parser + (* Call. *) + | [< 'Token.Kwd '('; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')'">] -> + Ast.Call (id, Array.of_list (List.rev args)) + + (* Simple variable ref. *) + | [< >] -> Ast.Variable id + in + parse_ident id stream + + | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") + + (* binoprhs + * ::= ('+' primary)* *) + and parse_bin_rhs expr_prec lhs stream = + match Stream.peek stream with + (* If this is a binop, find its precedence. *) + | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> + let token_prec = precedence c in + + (* If this is a binop that binds at least as tightly as the current binop, + * consume it, otherwise we are done. *) + if token_prec < expr_prec then lhs else begin + (* Eat the binop. *) + Stream.junk stream; + + (* Parse the primary expression after the binary operator. *) + let rhs = parse_primary stream in + + (* Okay, we know this is a binop. 
*) + let rhs = + match Stream.peek stream with + | Some (Token.Kwd c2) -> + (* If BinOp binds less tightly with rhs than the operator after + * rhs, let the pending operator take rhs as its lhs. *) + let next_prec = precedence c2 in + if token_prec < next_prec + then parse_bin_rhs (token_prec + 1) rhs stream + else rhs + | _ -> rhs + in + + (* Merge lhs/rhs. *) + let lhs = Ast.Binary (c, lhs, rhs) in + parse_bin_rhs expr_prec lhs stream + end + | _ -> lhs + + (* expression + * ::= primary binoprhs *) + and parse_expr = parser + | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream + + (* prototype + * ::= id '(' id* ')' *) + let parse_prototype = + let rec parse_args accumulator = parser + | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e + | [< >] -> accumulator + in + + parser + | [< 'Token.Ident id; + 'Token.Kwd '(' ?? "expected '(' in prototype"; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> + (* success. *) + Ast.Prototype (id, Array.of_list (List.rev args)) + + | [< >] -> + raise (Stream.Error "expected function name in prototype") + + (* definition ::= 'def' prototype expression *) + let parse_definition = parser + | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> + Ast.Function (p, e) + + (* toplevelexpr ::= expression *) + let parse_toplevel = parser + | [< e=parse_expr >] -> + (* Make an anonymous proto. *) + Ast.Function (Ast.Prototype ("", [||]), e) + + (* external ::= 'extern' prototype *) + let parse_extern = parser + | [< 'Token.Extern; e=parse_prototype >] -> e + +codegen.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Code Generation + *===----------------------------------------------------------------------===*) + + open Llvm + + exception Error of string + + let context = global_context () + let the_module = create_module context "my cool jit" + let builder = builder context + let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 + let double_type = double_type context + + let rec codegen_expr = function + | Ast.Number n -> const_float double_type n + | Ast.Variable name -> + (try Hashtbl.find named_values name with + | Not_found -> raise (Error "unknown variable name")) + | Ast.Binary (op, lhs, rhs) -> + let lhs_val = codegen_expr lhs in + let rhs_val = codegen_expr rhs in + begin + match op with + | '+' -> build_add lhs_val rhs_val "addtmp" builder + | '-' -> build_sub lhs_val rhs_val "subtmp" builder + | '*' -> build_mul lhs_val rhs_val "multmp" builder + | '<' -> + (* Convert bool 0/1 to double 0.0 or 1.0 *) + let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in + build_uitofp i double_type "booltmp" builder + | _ -> raise (Error "invalid binary operator") + end + | Ast.Call (callee, args) -> + (* Look up the name in the module table. *) + let callee = + match lookup_function callee the_module with + | Some callee -> callee + | None -> raise (Error "unknown function referenced") + in + let params = params callee in + + (* If argument mismatch error. *) + if Array.length params == Array.length args then () else + raise (Error "incorrect # arguments passed"); + let args = Array.map codegen_expr args in + build_call callee args "calltmp" builder + + let codegen_proto = function + | Ast.Prototype (name, args) -> + (* Make the function type: double(double,double) etc. 
*) + let doubles = Array.make (Array.length args) double_type in + let ft = function_type double_type doubles in + let f = + match lookup_function name the_module with + | None -> declare_function name ft the_module + + (* If 'f' conflicted, there was already something named 'name'. If it + * has a body, don't allow redefinition or reextern. *) + | Some f -> + (* If 'f' already has a body, reject this. *) + if block_begin f <> At_end f then + raise (Error "redefinition of function"); + + (* If 'f' took a different number of arguments, reject. *) + if element_type (type_of f) <> ft then + raise (Error "redefinition of function with different # args"); + f + in + + (* Set names for all arguments. *) + Array.iteri (fun i a -> + let n = args.(i) in + set_value_name n a; + Hashtbl.add named_values n a; + ) (params f); + f + + let codegen_func the_fpm = function + | Ast.Function (proto, body) -> + Hashtbl.clear named_values; + let the_function = codegen_proto proto in + + (* Create a new basic block to start insertion into. *) + let bb = append_block context "entry" the_function in + position_at_end bb builder; + + try + let ret_val = codegen_expr body in + + (* Finish off the function. *) + let _ = build_ret ret_val builder in + + (* Validate the generated code, checking for consistency. *) + Llvm_analysis.assert_valid_function the_function; + + (* Optimize the function. *) + let _ = PassManager.run_function the_function the_fpm in + + the_function + with e -> + delete_function the_function; + raise e + +toplevel.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Top-Level parsing and JIT Driver + *===----------------------------------------------------------------------===*) + + open Llvm + open Llvm_executionengine + + (* top ::= definition | external | expression | ';' *) + let rec main_loop the_fpm the_execution_engine stream = + match Stream.peek stream with + | None -> () + + (* ignore top-level semicolons. *) + | Some (Token.Kwd ';') -> + Stream.junk stream; + main_loop the_fpm the_execution_engine stream + + | Some token -> + begin + try match token with + | Token.Def -> + let e = Parser.parse_definition stream in + print_endline "parsed a function definition."; + dump_value (Codegen.codegen_func the_fpm e); + | Token.Extern -> + let e = Parser.parse_extern stream in + print_endline "parsed an extern."; + dump_value (Codegen.codegen_proto e); + | _ -> + (* Evaluate a top-level expression into an anonymous function. *) + let e = Parser.parse_toplevel stream in + print_endline "parsed a top-level expr"; + let the_function = Codegen.codegen_func the_fpm e in + dump_value the_function; + + (* JIT the function, returning a function pointer. *) + let result = ExecutionEngine.run_function the_function [||] + the_execution_engine in + + print_string "Evaluated to "; + print_float (GenericValue.as_float Codegen.double_type result); + print_newline (); + with Stream.Error s | Codegen.Error s -> + (* Skip token for error recovery. *) + Stream.junk stream; + print_endline s; + end; + print_string "ready> "; flush stdout; + main_loop the_fpm the_execution_engine stream + +toy.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Main driver code. 
+ *===----------------------------------------------------------------------===*) + + open Llvm + open Llvm_executionengine + open Llvm_target + open Llvm_scalar_opts + + let main () = + ignore (initialize_native_target ()); + + (* Install standard binary operators. + * 1 is the lowest precedence. *) + Hashtbl.add Parser.binop_precedence '<' 10; + Hashtbl.add Parser.binop_precedence '+' 20; + Hashtbl.add Parser.binop_precedence '-' 20; + Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) + + (* Prime the first token. *) + print_string "ready> "; flush stdout; + let stream = Lexer.lex (Stream.of_channel stdin) in + + (* Create the JIT. *) + let the_execution_engine = ExecutionEngine.create Codegen.the_module in + let the_fpm = PassManager.create_function Codegen.the_module in + + (* Set up the optimizer pipeline. Start with registering info about how the + * target lays out data structures. *) + DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; + + (* Do simple "peephole" optimizations and bit-twiddling optzn. *) + add_instruction_combination the_fpm; + + (* reassociate expressions. *) + add_reassociation the_fpm; + + (* Eliminate Common SubExpressions. *) + add_gvn the_fpm; + + (* Simplify the control flow graph (deleting unreachable blocks, etc). *) + add_cfg_simplification the_fpm; + + ignore (PassManager.initialize the_fpm); + + (* Run the main "interpreter loop" now. *) + Toplevel.main_loop the_fpm the_execution_engine stream; + + (* Print out all the generated code. *) + dump_module Codegen.the_module + ;; + + main () + +bindings.c + .. code-block:: c + + #include <stdio.h> + + /* putchard - putchar that takes a double and returns 0. */ + extern double putchard(double X) { + putchar((char)X); + return 0; + } + +`Next: Extending the language: control flow <OCamlLangImpl5.html>`_ + diff --git a/docs/tutorial/OCamlLangImpl5.html b/docs/tutorial/OCamlLangImpl5.html deleted file mode 100644 index d25f1dc9bb..0000000000 --- a/docs/tutorial/OCamlLangImpl5.html +++ /dev/null @@ -1,1560 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Extending the Language: Control Flow</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <meta name="author" content="Erick Tryzelaar"> - <link rel="stylesheet" href="../_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>Kaleidoscope: Extending the Language: Control Flow</h1> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 5 - <ol> - <li><a href="#intro">Chapter 5 Introduction</a></li> - <li><a href="#ifthen">If/Then/Else</a> - <ol> - <li><a href="#iflexer">Lexer Extensions</a></li> - <li><a href="#ifast">AST Extensions</a></li> - <li><a href="#ifparser">Parser Extensions</a></li> - <li><a href="#ifir">LLVM IR</a></li> - <li><a href="#ifcodegen">Code Generation</a></li> - </ol> - </li> - <li><a href="#for">'for' Loop Expression</a> - <ol> - <li><a href="#forlexer">Lexer Extensions</a></li> - <li><a href="#forast">AST Extensions</a></li> - <li><a href="#forparser">Parser Extensions</a></li> - <li><a href="#forir">LLVM IR</a></li> - <li><a href="#forcodegen">Code Generation</a></li> - </ol> - </li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="OCamlLangImpl6.html">Chapter 6</a>: Extending the Language: -User-defined Operators</li> -</ul> - -<div class="doc_author"> - <p> - Written by <a 
href="mailto:sabre@nondot.org">Chris Lattner</a> - and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a> - </p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="intro">Chapter 5 Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Welcome to Chapter 5 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. Parts 1-4 described the implementation of the simple -Kaleidoscope language and included support for generating LLVM IR, followed by -optimizations and a JIT compiler. Unfortunately, as presented, Kaleidoscope is -mostly useless: it has no control flow other than call and return. This means -that you can't have conditional branches in the code, significantly limiting its -power. In this episode of "build that compiler", we'll extend Kaleidoscope to -have an if/then/else expression plus a simple 'for' loop.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="ifthen">If/Then/Else</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -Extending Kaleidoscope to support if/then/else is quite straightforward. It -basically requires adding lexer support for this "new" concept to the lexer, -parser, AST, and LLVM code emitter. This example is nice, because it shows how -easy it is to "grow" a language over time, incrementally extending it as new -ideas are discovered.</p> - -<p>Before we get going on "how" we add this extension, lets talk about "what" we -want. The basic idea is that we want to be able to write this sort of thing: -</p> - -<div class="doc_code"> -<pre> -def fib(x) - if x < 3 then - 1 - else - fib(x-1)+fib(x-2); -</pre> -</div> - -<p>In Kaleidoscope, every construct is an expression: there are no statements. -As such, the if/then/else expression needs to return a value like any other. -Since we're using a mostly functional form, we'll have it evaluate its -conditional, then return the 'then' or 'else' value based on how the condition -was resolved. This is very similar to the C "?:" expression.</p> - -<p>The semantics of the if/then/else expression is that it evaluates the -condition to a boolean equality value: 0.0 is considered to be false and -everything else is considered to be true. -If the condition is true, the first subexpression is evaluated and returned, if -the condition is false, the second subexpression is evaluated and returned. -Since Kaleidoscope allows side-effects, this behavior is important to nail down. -</p> - -<p>Now that we know what we "want", lets break this down into its constituent -pieces.</p> - -<!-- ======================================================================= --> -<h4><a name="iflexer">Lexer Extensions for If/Then/Else</a></h4> -<!-- ======================================================================= --> - - -<div> - -<p>The lexer extensions are straightforward. First we add new variants -for the relevant tokens:</p> - -<div class="doc_code"> -<pre> - (* control *) - | If | Then | Else | For | In -</pre> -</div> - -<p>Once we have that, we recognize the new keywords in the lexer. This is pretty simple -stuff:</p> - -<div class="doc_code"> -<pre> - ... 
- match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - | id -> [< 'Token.Ident id; stream >] -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4><a name="ifast">AST Extensions for If/Then/Else</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>To represent the new expression we add a new AST variant for it:</p> - -<div class="doc_code"> -<pre> -type expr = - ... - (* variant for if/then/else. *) - | If of expr * expr * expr -</pre> -</div> - -<p>The AST variant just has pointers to the various subexpressions.</p> - -</div> - -<!-- ======================================================================= --> -<h4><a name="ifparser">Parser Extensions for If/Then/Else</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>Now that we have the relevant tokens coming from the lexer and we have the -AST node to build, our parsing logic is relatively straightforward. First we -define a new parsing function:</p> - -<div class="doc_code"> -<pre> -let rec parse_primary = parser - ... - (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) - | [< 'Token.If; c=parse_expr; - 'Token.Then ?? "expected 'then'"; t=parse_expr; - 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> - Ast.If (c, t, e) -</pre> -</div> - -<p>Next we hook it up as a primary expression:</p> - -<div class="doc_code"> -<pre> -let rec parse_primary = parser - ... - (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) - | [< 'Token.If; c=parse_expr; - 'Token.Then ?? "expected 'then'"; t=parse_expr; - 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> - Ast.If (c, t, e) -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4><a name="ifir">LLVM IR for If/Then/Else</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>Now that we have it parsing and building the AST, the final piece is adding -LLVM code generation support. This is the most interesting part of the -if/then/else example, because this is where it starts to introduce new concepts. -All of the code above has been thoroughly described in previous chapters. -</p> - -<p>To motivate the code we want to produce, lets take a look at a simple -example. Consider:</p> - -<div class="doc_code"> -<pre> -extern foo(); -extern bar(); -def baz(x) if x then foo() else bar(); -</pre> -</div> - -<p>If you disable optimizations, the code you'll (soon) get from Kaleidoscope -looks like this:</p> - -<div class="doc_code"> -<pre> -declare double @foo() - -declare double @bar() - -define double @baz(double %x) { -entry: - %ifcond = fcmp one double %x, 0.000000e+00 - br i1 %ifcond, label %then, label %else - -then: ; preds = %entry - %calltmp = call double @foo() - br label %ifcont - -else: ; preds = %entry - %calltmp1 = call double @bar() - br label %ifcont - -ifcont: ; preds = %else, %then - %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ] - ret double %iftmp -} -</pre> -</div> - -<p>To visualize the control flow graph, you can use a nifty feature of the LLVM -'<a href="http://llvm.org/cmds/opt.html">opt</a>' tool. 
If you put this LLVM IR -into "t.ll" and run "<tt>llvm-as < t.ll | opt -analyze -view-cfg</tt>", <a -href="../ProgrammersManual.html#ViewGraph">a window will pop up</a> and you'll -see this graph:</p> - -<div style="text-align: center"><img src="LangImpl5-cfg.png" alt="Example CFG" width="423" -height="315"></div> - -<p>Another way to get this is to call "<tt>Llvm_analysis.view_function_cfg -f</tt>" or "<tt>Llvm_analysis.view_function_cfg_only f</tt>" (where <tt>f</tt> -is a "<tt>Function</tt>") either by inserting actual calls into the code and -recompiling or by calling these in the debugger. LLVM has many nice features -for visualizing various graphs.</p> - -<p>Getting back to the generated code, it is fairly simple: the entry block -evaluates the conditional expression ("x" in our case here) and compares the -result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>" -instruction ('one' is "Ordered and Not Equal"). Based on the result of this -expression, the code jumps to either the "then" or "else" blocks, which contain -the expressions for the true/false cases.</p> - -<p>Once the then/else blocks are finished executing, they both branch back to the -'ifcont' block to execute the code that happens after the if/then/else. In this -case the only thing left to do is to return to the caller of the function. The -question then becomes: how does the code know which expression to return?</p> - -<p>The answer to this question involves an important SSA operation: the -<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Phi -operation</a>. If you're not familiar with SSA, <a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">the wikipedia -article</a> is a good introduction and there are various other introductions to -it available on your favorite search engine. The short version is that -"execution" of the Phi operation requires "remembering" which block control came -from. The Phi operation takes on the value corresponding to the input control -block. In this case, if control comes in from the "then" block, it gets the -value of "calltmp". If control comes from the "else" block, it gets the value -of "calltmp1".</p> - -<p>At this point, you are probably starting to think "Oh no! This means my -simple and elegant front-end will have to start generating SSA form in order to -use LLVM!". Fortunately, this is not the case, and we strongly advise -<em>not</em> implementing an SSA construction algorithm in your front-end -unless there is an amazingly good reason to do so. In practice, there are two -sorts of values that float around in code written for your average imperative -programming language that might need Phi nodes:</p> - -<ol> -<li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li> -<li>Values that are implicit in the structure of your AST, such as the Phi node -in this case.</li> -</ol> - -<p>In <a href="OCamlLangImpl7.html">Chapter 7</a> of this tutorial ("mutable -variables"), we'll talk about #1 -in depth. For now, just believe me that you don't need SSA construction to -handle this case. For #2, you have the choice of using the techniques that we will -describe for #1, or you can insert Phi nodes directly, if convenient. 
In this -case, it is really really easy to generate the Phi node, so we choose to do it -directly.</p> - -<p>Okay, enough of the motivation and overview, lets generate code!</p> - -</div> - -<!-- ======================================================================= --> -<h4><a name="ifcodegen">Code Generation for If/Then/Else</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>In order to generate code for this, we implement the <tt>Codegen</tt> method -for <tt>IfExprAST</tt>:</p> - -<div class="doc_code"> -<pre> -let rec codegen_expr = function - ... - | Ast.If (cond, then_, else_) -> - let cond = codegen_expr cond in - - (* Convert condition to a bool by comparing equal to 0.0 *) - let zero = const_float double_type 0.0 in - let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in -</pre> -</div> - -<p>This code is straightforward and similar to what we saw before. We emit the -expression for the condition, then compare that value to zero to get a truth -value as a 1-bit (bool) value.</p> - -<div class="doc_code"> -<pre> - (* Grab the first block so that we might later add the conditional branch - * to it at the end of the function. *) - let start_bb = insertion_block builder in - let the_function = block_parent start_bb in - - let then_bb = append_block context "then" the_function in - position_at_end then_bb builder; -</pre> -</div> - -<p> -As opposed to the <a href="LangImpl5.html">C++ tutorial</a>, we have to build -our basic blocks bottom up since we can't have dangling BasicBlocks. We start -off by saving a pointer to the first block (which might not be the entry -block), which we'll need to build a conditional branch later. We do this by -asking the <tt>builder</tt> for the current BasicBlock. The fourth line -gets the current Function object that is being built. It gets this by the -<tt>start_bb</tt> for its "parent" (the function it is currently embedded -into).</p> - -<p>Once it has that, it creates one block. It is automatically appended into -the function's list of blocks.</p> - -<div class="doc_code"> -<pre> - (* Emit 'then' value. *) - position_at_end then_bb builder; - let then_val = codegen_expr then_ in - - (* Codegen of 'then' can change the current block, update then_bb for the - * phi. We create a new name because one is used for the phi node, and the - * other is used for the conditional branch. *) - let new_then_bb = insertion_block builder in -</pre> -</div> - -<p>We move the builder to start inserting into the "then" block. Strictly -speaking, this call moves the insertion point to be at the end of the specified -block. However, since the "then" block is empty, it also starts out by -inserting at the beginning of the block. :)</p> - -<p>Once the insertion point is set, we recursively codegen the "then" expression -from the AST.</p> - -<p>The final line here is quite subtle, but is very important. The basic issue -is that when we create the Phi node in the merge block, we need to set up the -block/value pairs that indicate how the Phi will work. Importantly, the Phi -node expects to have an entry for each predecessor of the block in the CFG. Why -then, are we getting the current block when we just set it to ThenBB 5 lines -above? The problem is that the "Then" expression may actually itself change the -block that the Builder is emitting into if, for example, it contains a nested -"if/then/else" expression. 
Because calling Codegen recursively could -arbitrarily change the notion of the current block, we are required to get an -up-to-date value for code that will set up the Phi node.</p> - -<div class="doc_code"> -<pre> - (* Emit 'else' value. *) - let else_bb = append_block context "else" the_function in - position_at_end else_bb builder; - let else_val = codegen_expr else_ in - - (* Codegen of 'else' can change the current block, update else_bb for the - * phi. *) - let new_else_bb = insertion_block builder in -</pre> -</div> - -<p>Code generation for the 'else' block is basically identical to codegen for -the 'then' block.</p> - -<div class="doc_code"> -<pre> - (* Emit merge block. *) - let merge_bb = append_block context "ifcont" the_function in - position_at_end merge_bb builder; - let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in - let phi = build_phi incoming "iftmp" builder in -</pre> -</div> - -<p>The first two lines here are now familiar: the first adds the "merge" block -to the Function object. The second block changes the insertion point so that -newly created code will go into the "merge" block. Once that is done, we need -to create the PHI node and set up the block/value pairs for the PHI.</p> - -<div class="doc_code"> -<pre> - (* Return to the start block to add the conditional branch. *) - position_at_end start_bb builder; - ignore (build_cond_br cond_val then_bb else_bb builder); -</pre> -</div> - -<p>Once the blocks are created, we can emit the conditional branch that chooses -between them. Note that creating new blocks does not implicitly affect the -IRBuilder, so it is still inserting into the block that the condition -went into. This is why we needed to save the "start" block.</p> - -<div class="doc_code"> -<pre> - (* Set a unconditional branch at the end of the 'then' block and the - * 'else' block to the 'merge' block. *) - position_at_end new_then_bb builder; ignore (build_br merge_bb builder); - position_at_end new_else_bb builder; ignore (build_br merge_bb builder); - - (* Finally, set the builder to the end of the merge block. *) - position_at_end merge_bb builder; - - phi -</pre> -</div> - -<p>To finish off the blocks, we create an unconditional branch -to the merge block. One interesting (and very important) aspect of the LLVM IR -is that it <a href="../LangRef.html#functionstructure">requires all basic blocks -to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow -instruction</a> such as return or branch. This means that all control flow, -<em>including fall throughs</em> must be made explicit in the LLVM IR. If you -violate this rule, the verifier will emit an error. - -<p>Finally, the CodeGen function returns the phi node as the value computed by -the if/then/else expression. In our example above, this returned value will -feed into the code for the top-level function, which will create the return -instruction.</p> - -<p>Overall, we now have the ability to execute conditional code in -Kaleidoscope. With this extension, Kaleidoscope is a fairly complete language -that can calculate a wide variety of numeric functions. 
Next up we'll add -another useful expression that is familiar from non-functional languages...</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="for">'for' Loop Expression</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Now that we know how to add basic control flow constructs to the language, -we have the tools to add more powerful things. Lets add something more -aggressive, a 'for' expression:</p> - -<div class="doc_code"> -<pre> - extern putchard(char); - def printstar(n) - for i = 1, i < n, 1.0 in - putchard(42); # ascii 42 = '*' - - # print 100 '*' characters - printstar(100); -</pre> -</div> - -<p>This expression defines a new variable ("i" in this case) which iterates from -a starting value, while the condition ("i < n" in this case) is true, -incrementing by an optional step value ("1.0" in this case). If the step value -is omitted, it defaults to 1.0. While the loop is true, it executes its -body expression. Because we don't have anything better to return, we'll just -define the loop as always returning 0.0. In the future when we have mutable -variables, it will get more useful.</p> - -<p>As before, lets talk about the changes that we need to Kaleidoscope to -support this.</p> - -<!-- ======================================================================= --> -<h4><a name="forlexer">Lexer Extensions for the 'for' Loop</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>The lexer extensions are the same sort of thing as for if/then/else:</p> - -<div class="doc_code"> -<pre> - ... in Token.token ... - (* control *) - | If | Then | Else - <b>| For | In</b> - - ... in Lexer.lex_ident... - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - <b>| "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >]</b> - | id -> [< 'Token.Ident id; stream >] -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4><a name="forast">AST Extensions for the 'for' Loop</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>The AST variant is just as simple. It basically boils down to capturing -the variable name and the constituent expressions in the node.</p> - -<div class="doc_code"> -<pre> -type expr = - ... - (* variant for for/in. *) - | For of string * expr * expr * expr option * expr -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4><a name="forparser">Parser Extensions for the 'for' Loop</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>The parser code is also fairly standard. The only interesting thing here is -handling of the optional step value. The parser code handles it by checking to -see if the second comma is present. If not, it sets the step value to null in -the AST node:</p> - -<div class="doc_code"> -<pre> -let rec parse_primary = parser - ... - (* forexpr - ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) - | [< 'Token.For; - 'Token.Ident id ?? "expected identifier after for"; - 'Token.Kwd '=' ?? 
"expected '=' after for"; - stream >] -> - begin parser - | [< - start=parse_expr; - 'Token.Kwd ',' ?? "expected ',' after for"; - end_=parse_expr; - stream >] -> - let step = - begin parser - | [< 'Token.Kwd ','; step=parse_expr >] -> Some step - | [< >] -> None - end stream - in - begin parser - | [< 'Token.In; body=parse_expr >] -> - Ast.For (id, start, end_, step, body) - | [< >] -> - raise (Stream.Error "expected 'in' after for") - end stream - | [< >] -> - raise (Stream.Error "expected '=' after for") - end stream -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<h4><a name="forir">LLVM IR for the 'for' Loop</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>Now we get to the good part: the LLVM IR we want to generate for this thing. -With the simple example above, we get this LLVM IR (note that this dump is -generated with optimizations disabled for clarity): -</p> - -<div class="doc_code"> -<pre> -declare double @putchard(double) - -define double @printstar(double %n) { -entry: - ; initial value = 1.0 (inlined into phi) - br label %loop - -loop: ; preds = %loop, %entry - %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ] - ; body - %calltmp = call double @putchard(double 4.200000e+01) - ; increment - %nextvar = fadd double %i, 1.000000e+00 - - ; termination test - %cmptmp = fcmp ult double %i, %n - %booltmp = uitofp i1 %cmptmp to double - %loopcond = fcmp one double %booltmp, 0.000000e+00 - br i1 %loopcond, label %loop, label %afterloop - -afterloop: ; preds = %loop - ; loop always returns 0.0 - ret double 0.000000e+00 -} -</pre> -</div> - -<p>This loop contains all the same constructs we saw before: a phi node, several -expressions, and some basic blocks. Lets see how this fits together.</p> - -</div> - -<!-- ======================================================================= --> -<h4><a name="forcodegen">Code Generation for the 'for' Loop</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>The first part of Codegen is very simple: we just output the start expression -for the loop value:</p> - -<div class="doc_code"> -<pre> -let rec codegen_expr = function - ... - | Ast.For (var_name, start, end_, step, body) -> - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in -</pre> -</div> - -<p>With this out of the way, the next step is to set up the LLVM basic block -for the start of the loop body. In the case above, the whole loop body is one -block, but remember that the body code itself could consist of multiple blocks -(e.g. if it contains an if/then/else or a for/in expression).</p> - -<div class="doc_code"> -<pre> - (* Make the new basic block for the loop header, inserting after current - * block. *) - let preheader_bb = insertion_block builder in - let the_function = block_parent preheader_bb in - let loop_bb = append_block context "loop" the_function in - - (* Insert an explicit fall through from the current block to the - * loop_bb. *) - ignore (build_br loop_bb builder); -</pre> -</div> - -<p>This code is similar to what we saw for if/then/else. Because we will need -it to create the Phi node, we remember the block that falls through into the -loop. 
Once we have that, we create the actual block that starts the loop and -create an unconditional branch for the fall-through between the two blocks.</p> - -<div class="doc_code"> -<pre> - (* Start insertion in loop_bb. *) - position_at_end loop_bb builder; - - (* Start the PHI node with an entry for start. *) - let variable = build_phi [(start_val, preheader_bb)] var_name builder in -</pre> -</div> - -<p>Now that the "preheader" for the loop is set up, we switch to emitting code -for the loop body. To begin with, we move the insertion point and create the -PHI node for the loop induction variable. Since we already know the incoming -value for the starting value, we add it to the Phi node. Note that the Phi will -eventually get a second value for the backedge, but we can't set it up yet -(because it doesn't exist!).</p> - -<div class="doc_code"> -<pre> - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - Hashtbl.add named_values var_name variable; - - (* Emit the body of the loop. This, like any other expr, can change the - * current BB. Note that we ignore the value computed by the body, but - * don't allow an error *) - ignore (codegen_expr body); -</pre> -</div> - -<p>Now the code starts to get more interesting. Our 'for' loop introduces a new -variable to the symbol table. This means that our symbol table can now contain -either function arguments or loop variables. To handle this, before we codegen -the body of the loop, we add the loop variable as the current value for its -name. Note that it is possible that there is a variable of the same name in the -outer scope. It would be easy to make this an error (emit an error and return -null if there is already an entry for VarName) but we choose to allow shadowing -of variables. In order to handle this correctly, we remember the Value that -we are potentially shadowing in <tt>old_val</tt> (which will be None if there is -no shadowed variable).</p> - -<p>Once the loop variable is set into the symbol table, the code recursively -codegen's the body. This allows the body to use the loop variable: any -references to it will naturally find it in the symbol table.</p> - -<div class="doc_code"> -<pre> - (* Emit the step value. *) - let step_val = - match step with - | Some step -> codegen_expr step - (* If not specified, use 1.0. *) - | None -> const_float double_type 1.0 - in - - let next_var = build_add variable step_val "nextvar" builder in -</pre> -</div> - -<p>Now that the body is emitted, we compute the next value of the iteration -variable by adding the step value, or 1.0 if it isn't present. -'<tt>next_var</tt>' will be the value of the loop variable on the next iteration -of the loop.</p> - -<div class="doc_code"> -<pre> - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - (* Convert condition to a bool by comparing equal to 0.0. *) - let zero = const_float double_type 0.0 in - let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in -</pre> -</div> - -<p>Finally, we evaluate the exit value of the loop, to determine whether the -loop should exit. This mirrors the condition evaluation for the if/then/else -statement.</p> - -<div class="doc_code"> -<pre> - (* Create the "after loop" block and insert it. 
*) - let loop_end_bb = insertion_block builder in - let after_bb = append_block context "afterloop" the_function in - - (* Insert the conditional branch into the end of loop_end_bb. *) - ignore (build_cond_br end_cond loop_bb after_bb builder); - - (* Any new code will be inserted in after_bb. *) - position_at_end after_bb builder; -</pre> -</div> - -<p>With the code for the body of the loop complete, we just need to finish up -the control flow for it. This code remembers the end block (for the phi node), then creates the block for the loop exit ("afterloop"). Based on the value of the -exit condition, it creates a conditional branch that chooses between executing -the loop again and exiting the loop. Any future code is emitted in the -"afterloop" block, so it sets the insertion position to it.</p> - -<div class="doc_code"> -<pre> - (* Add a new entry to the PHI node for the backedge. *) - add_incoming (next_var, loop_end_bb) variable; - - (* Restore the unshadowed variable. *) - begin match old_val with - | Some old_val -> Hashtbl.add named_values var_name old_val - | None -> () - end; - - (* for expr always returns 0.0. *) - const_null double_type -</pre> -</div> - -<p>The final code handles various cleanups: now that we have the -"<tt>next_var</tt>" value, we can add the incoming value to the loop PHI node. -After that, we remove the loop variable from the symbol table, so that it isn't -in scope after the for loop. Finally, code generation of the for loop always -returns 0.0, so that is what we return from <tt>Codegen.codegen_expr</tt>.</p> - -<p>With this, we conclude the "adding control flow to Kaleidoscope" chapter of -the tutorial. In this chapter we added two control flow constructs, and used -them to motivate a couple of aspects of the LLVM IR that are important for -front-end implementors to know. In the next chapter of our saga, we will get -a bit crazier and add <a href="OCamlLangImpl6.html">user-defined operators</a> -to our poor innocent language.</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="code">Full Code Listing</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -Here is the complete code listing for our running example, enhanced with the -if/then/else and for expressions.. 
To build this example, use: -</p> - -<div class="doc_code"> -<pre> -# Compile -ocamlbuild toy.byte -# Run -./toy.byte -</pre> -</div> - -<p>Here is the code:</p> - -<dl> -<dt>_tags:</dt> -<dd class="doc_code"> -<pre> -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) -<*.{byte,native}>: g++, use_llvm, use_llvm_analysis -<*.{byte,native}>: use_llvm_executionengine, use_llvm_target -<*.{byte,native}>: use_llvm_scalar_opts, use_bindings -</pre> -</dd> - -<dt>myocamlbuild.ml:</dt> -<dd class="doc_code"> -<pre> -open Ocamlbuild_plugin;; - -ocaml_lib ~extern:true "llvm";; -ocaml_lib ~extern:true "llvm_analysis";; -ocaml_lib ~extern:true "llvm_executionengine";; -ocaml_lib ~extern:true "llvm_target";; -ocaml_lib ~extern:true "llvm_scalar_opts";; - -flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);; -dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; -</pre> -</dd> - -<dt>token.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char - - (* control *) - | If | Then | Else - | For | In -</pre> -</dd> - -<dt>lexer.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. 
'9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] -</pre> -</dd> - -<dt>ast.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - - (* variant for if/then/else. *) - | If of expr * expr * expr - - (* variant for for/in. *) - | For of string * expr * expr * expr option * expr - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = Prototype of string * string array - -(* func - This type represents a function definition itself. *) -type func = Function of proto * expr -</pre> -</dd> - -<dt>parser.ml:</dt> -<dd class="doc_code"> -<pre> -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr - * ::= ifexpr - * ::= forexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) - | [< 'Token.If; c=parse_expr; - 'Token.Then ?? "expected 'then'"; t=parse_expr; - 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> - Ast.If (c, t, e) - - (* forexpr - ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) - | [< 'Token.For; - 'Token.Ident id ?? 
"expected identifier after for"; - 'Token.Kwd '=' ?? "expected '=' after for"; - stream >] -> - begin parser - | [< - start=parse_expr; - 'Token.Kwd ',' ?? "expected ',' after for"; - end_=parse_expr; - stream >] -> - let step = - begin parser - | [< 'Token.Kwd ','; step=parse_expr >] -> Some step - | [< >] -> None - end stream - in - begin parser - | [< 'Token.In; body=parse_expr >] -> - Ast.For (id, start, end_, step, body) - | [< >] -> - raise (Stream.Error "expected 'in' after for") - end stream - | [< >] -> - raise (Stream.Error "expected '=' after for") - end stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. 
*) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e -</pre> -</dd> - -<dt>codegen.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - -open Llvm - -exception Error of string - -let context = global_context () -let the_module = create_module context "my cool jit" -let builder = builder context -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -let double_type = double_type context - -let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> raise (Error "invalid binary operator") - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - | Ast.If (cond, then_, else_) -> - let cond = codegen_expr cond in - - (* Convert condition to a bool by comparing equal to 0.0 *) - let zero = const_float double_type 0.0 in - let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in - - (* Grab the first block so that we might later add the conditional branch - * to it at the end of the function. *) - let start_bb = insertion_block builder in - let the_function = block_parent start_bb in - - let then_bb = append_block context "then" the_function in - - (* Emit 'then' value. *) - position_at_end then_bb builder; - let then_val = codegen_expr then_ in - - (* Codegen of 'then' can change the current block, update then_bb for the - * phi. We create a new name because one is used for the phi node, and the - * other is used for the conditional branch. *) - let new_then_bb = insertion_block builder in - - (* Emit 'else' value. *) - let else_bb = append_block context "else" the_function in - position_at_end else_bb builder; - let else_val = codegen_expr else_ in - - (* Codegen of 'else' can change the current block, update else_bb for the - * phi. *) - let new_else_bb = insertion_block builder in - - (* Emit merge block. *) - let merge_bb = append_block context "ifcont" the_function in - position_at_end merge_bb builder; - let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in - let phi = build_phi incoming "iftmp" builder in - - (* Return to the start block to add the conditional branch. 
*) - position_at_end start_bb builder; - ignore (build_cond_br cond_val then_bb else_bb builder); - - (* Set a unconditional branch at the end of the 'then' block and the - * 'else' block to the 'merge' block. *) - position_at_end new_then_bb builder; ignore (build_br merge_bb builder); - position_at_end new_else_bb builder; ignore (build_br merge_bb builder); - - (* Finally, set the builder to the end of the merge block. *) - position_at_end merge_bb builder; - - phi - | Ast.For (var_name, start, end_, step, body) -> - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in - - (* Make the new basic block for the loop header, inserting after current - * block. *) - let preheader_bb = insertion_block builder in - let the_function = block_parent preheader_bb in - let loop_bb = append_block context "loop" the_function in - - (* Insert an explicit fall through from the current block to the - * loop_bb. *) - ignore (build_br loop_bb builder); - - (* Start insertion in loop_bb. *) - position_at_end loop_bb builder; - - (* Start the PHI node with an entry for start. *) - let variable = build_phi [(start_val, preheader_bb)] var_name builder in - - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - Hashtbl.add named_values var_name variable; - - (* Emit the body of the loop. This, like any other expr, can change the - * current BB. Note that we ignore the value computed by the body, but - * don't allow an error *) - ignore (codegen_expr body); - - (* Emit the step value. *) - let step_val = - match step with - | Some step -> codegen_expr step - (* If not specified, use 1.0. *) - | None -> const_float double_type 1.0 - in - - let next_var = build_add variable step_val "nextvar" builder in - - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - (* Convert condition to a bool by comparing equal to 0.0. *) - let zero = const_float double_type 0.0 in - let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in - - (* Create the "after loop" block and insert it. *) - let loop_end_bb = insertion_block builder in - let after_bb = append_block context "afterloop" the_function in - - (* Insert the conditional branch into the end of loop_end_bb. *) - ignore (build_cond_br end_cond loop_bb after_bb builder); - - (* Any new code will be inserted in after_bb. *) - position_at_end after_bb builder; - - (* Add a new entry to the PHI node for the backedge. *) - add_incoming (next_var, loop_end_bb) variable; - - (* Restore the unshadowed variable. *) - begin match old_val with - | Some old_val -> Hashtbl.add named_values var_name old_val - | None -> () - end; - - (* for expr always returns 0.0. *) - const_null double_type - -let codegen_proto = function - | Ast.Prototype (name, args) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. 
*) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - -let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. *) - let _ = PassManager.run_function the_function the_fpm in - - the_function - with e -> - delete_function the_function; - raise e -</pre> -</dd> - -<dt>toplevel.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine - -(* top ::= definition | external | expression | ';' *) -let rec main_loop the_fpm the_execution_engine stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop the_fpm the_execution_engine stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func the_fpm e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function, returning a function pointer. *) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop the_fpm the_execution_engine stream -</pre> -</dd> - -<dt>toy.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine -open Llvm_target -open Llvm_scalar_opts - -let main () = - ignore (initialize_native_target ()); - - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. 
*) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combination the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. *) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). *) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop the_fpm the_execution_engine stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module -;; - -main () -</pre> -</dd> - -<dt>bindings.c</dt> -<dd class="doc_code"> -<pre> -#include <stdio.h> - -/* putchard - putchar that takes a double and returns 0. */ -extern double putchard(double X) { - putchar((char)X); - return 0; -} -</pre> -</dd> -</dl> - -<a href="OCamlLangImpl6.html">Next: Extending the language: user-defined -operators</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/OCamlLangImpl5.rst b/docs/tutorial/OCamlLangImpl5.rst new file mode 100644 index 0000000000..203fb6f73b --- /dev/null +++ b/docs/tutorial/OCamlLangImpl5.rst @@ -0,0 +1,1365 @@ +================================================== +Kaleidoscope: Extending the Language: Control Flow +================================================== + +.. contents:: + :local: + +Written by `Chris Lattner <mailto:sabre@nondot.org>`_ and `Erick +Tryzelaar <mailto:idadesub@users.sourceforge.net>`_ + +Chapter 5 Introduction +====================== + +Welcome to Chapter 5 of the "`Implementing a language with +LLVM <index.html>`_" tutorial. Parts 1-4 described the implementation of +the simple Kaleidoscope language and included support for generating +LLVM IR, followed by optimizations and a JIT compiler. Unfortunately, as +presented, Kaleidoscope is mostly useless: it has no control flow other +than call and return. This means that you can't have conditional +branches in the code, significantly limiting its power. In this episode +of "build that compiler", we'll extend Kaleidoscope to have an +if/then/else expression plus a simple 'for' loop. + +If/Then/Else +============ + +Extending Kaleidoscope to support if/then/else is quite straightforward. +It basically requires adding lexer support for this "new" concept to the +lexer, parser, AST, and LLVM code emitter. This example is nice, because +it shows how easy it is to "grow" a language over time, incrementally +extending it as new ideas are discovered. 
+ +Before we get going on "how" we add this extension, lets talk about +"what" we want. The basic idea is that we want to be able to write this +sort of thing: + +:: + + def fib(x) + if x < 3 then + 1 + else + fib(x-1)+fib(x-2); + +In Kaleidoscope, every construct is an expression: there are no +statements. As such, the if/then/else expression needs to return a value +like any other. Since we're using a mostly functional form, we'll have +it evaluate its conditional, then return the 'then' or 'else' value +based on how the condition was resolved. This is very similar to the C +"?:" expression. + +The semantics of the if/then/else expression is that it evaluates the +condition to a boolean equality value: 0.0 is considered to be false and +everything else is considered to be true. If the condition is true, the +first subexpression is evaluated and returned, if the condition is +false, the second subexpression is evaluated and returned. Since +Kaleidoscope allows side-effects, this behavior is important to nail +down. + +Now that we know what we "want", lets break this down into its +constituent pieces. + +Lexer Extensions for If/Then/Else +--------------------------------- + +The lexer extensions are straightforward. First we add new variants for +the relevant tokens: + +.. code-block:: ocaml + + (* control *) + | If | Then | Else | For | In + +Once we have that, we recognize the new keywords in the lexer. This is +pretty simple stuff: + +.. code-block:: ocaml + + ... + match Buffer.contents buffer with + | "def" -> [< 'Token.Def; stream >] + | "extern" -> [< 'Token.Extern; stream >] + | "if" -> [< 'Token.If; stream >] + | "then" -> [< 'Token.Then; stream >] + | "else" -> [< 'Token.Else; stream >] + | "for" -> [< 'Token.For; stream >] + | "in" -> [< 'Token.In; stream >] + | id -> [< 'Token.Ident id; stream >] + +AST Extensions for If/Then/Else +------------------------------- + +To represent the new expression we add a new AST variant for it: + +.. code-block:: ocaml + + type expr = + ... + (* variant for if/then/else. *) + | If of expr * expr * expr + +The AST variant just has pointers to the various subexpressions. + +Parser Extensions for If/Then/Else +---------------------------------- + +Now that we have the relevant tokens coming from the lexer and we have +the AST node to build, our parsing logic is relatively straightforward. +First we define a new parsing function: + +.. code-block:: ocaml + + let rec parse_primary = parser + ... + (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) + | [< 'Token.If; c=parse_expr; + 'Token.Then ?? "expected 'then'"; t=parse_expr; + 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> + Ast.If (c, t, e) + +Next we hook it up as a primary expression: + +.. code-block:: ocaml + + let rec parse_primary = parser + ... + (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) + | [< 'Token.If; c=parse_expr; + 'Token.Then ?? "expected 'then'"; t=parse_expr; + 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> + Ast.If (c, t, e) + +LLVM IR for If/Then/Else +------------------------ + +Now that we have it parsing and building the AST, the final piece is +adding LLVM code generation support. This is the most interesting part +of the if/then/else example, because this is where it starts to +introduce new concepts. All of the code above has been thoroughly +described in previous chapters. + +To motivate the code we want to produce, lets take a look at a simple +example. 
Consider: + +:: + + extern foo(); + extern bar(); + def baz(x) if x then foo() else bar(); + +If you disable optimizations, the code you'll (soon) get from +Kaleidoscope looks like this: + +.. code-block:: llvm + + declare double @foo() + + declare double @bar() + + define double @baz(double %x) { + entry: + %ifcond = fcmp one double %x, 0.000000e+00 + br i1 %ifcond, label %then, label %else + + then: ; preds = %entry + %calltmp = call double @foo() + br label %ifcont + + else: ; preds = %entry + %calltmp1 = call double @bar() + br label %ifcont + + ifcont: ; preds = %else, %then + %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ] + ret double %iftmp + } + +To visualize the control flow graph, you can use a nifty feature of the +LLVM '`opt <http://llvm.org/cmds/opt.html>`_' tool. If you put this LLVM +IR into "t.ll" and run "``llvm-as < t.ll | opt -analyze -view-cfg``", `a +window will pop up <../ProgrammersManual.html#ViewGraph>`_ and you'll +see this graph: + +.. figure:: LangImpl5-cfg.png + :align: center + :alt: Example CFG + + Example CFG + +Another way to get this is to call +"``Llvm_analysis.view_function_cfg f``" or +"``Llvm_analysis.view_function_cfg_only f``" (where ``f`` is a +"``Function``") either by inserting actual calls into the code and +recompiling or by calling these in the debugger. LLVM has many nice +features for visualizing various graphs. + +Getting back to the generated code, it is fairly simple: the entry block +evaluates the conditional expression ("x" in our case here) and compares +the result to 0.0 with the "``fcmp one``" instruction ('one' is "Ordered +and Not Equal"). Based on the result of this expression, the code jumps +to either the "then" or "else" blocks, which contain the expressions for +the true/false cases. + +Once the then/else blocks are finished executing, they both branch back +to the 'ifcont' block to execute the code that happens after the +if/then/else. In this case the only thing left to do is to return to the +caller of the function. The question then becomes: how does the code +know which expression to return? + +The answer to this question involves an important SSA operation: the +`Phi +operation <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_. +If you're not familiar with SSA, `the wikipedia +article <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_ +is a good introduction and there are various other introductions to it +available on your favorite search engine. The short version is that +"execution" of the Phi operation requires "remembering" which block +control came from. The Phi operation takes on the value corresponding to +the input control block. In this case, if control comes in from the +"then" block, it gets the value of "calltmp". If control comes from the +"else" block, it gets the value of "calltmp1". + +At this point, you are probably starting to think "Oh no! This means my +simple and elegant front-end will have to start generating SSA form in +order to use LLVM!". Fortunately, this is not the case, and we strongly +advise *not* implementing an SSA construction algorithm in your +front-end unless there is an amazingly good reason to do so. In +practice, there are two sorts of values that float around in code +written for your average imperative programming language that might need +Phi nodes: + +#. Code that involves user variables: ``x = 1; x = x + 1;`` +#. Values that are implicit in the structure of your AST, such as the + Phi node in this case. 
+ +In `Chapter 7 <OCamlLangImpl7.html>`_ of this tutorial ("mutable +variables"), we'll talk about #1 in depth. For now, just believe me that +you don't need SSA construction to handle this case. For #2, you have +the choice of using the techniques that we will describe for #1, or you +can insert Phi nodes directly, if convenient. In this case, it is really +really easy to generate the Phi node, so we choose to do it directly. + +Okay, enough of the motivation and overview, lets generate code! + +Code Generation for If/Then/Else +-------------------------------- + +In order to generate code for this, we implement the ``Codegen`` method +for ``IfExprAST``: + +.. code-block:: ocaml + + let rec codegen_expr = function + ... + | Ast.If (cond, then_, else_) -> + let cond = codegen_expr cond in + + (* Convert condition to a bool by comparing equal to 0.0 *) + let zero = const_float double_type 0.0 in + let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in + +This code is straightforward and similar to what we saw before. We emit +the expression for the condition, then compare that value to zero to get +a truth value as a 1-bit (bool) value. + +.. code-block:: ocaml + + (* Grab the first block so that we might later add the conditional branch + * to it at the end of the function. *) + let start_bb = insertion_block builder in + let the_function = block_parent start_bb in + + let then_bb = append_block context "then" the_function in + position_at_end then_bb builder; + +As opposed to the `C++ tutorial <LangImpl5.html>`_, we have to build our +basic blocks bottom up since we can't have dangling BasicBlocks. We +start off by saving a pointer to the first block (which might not be the +entry block), which we'll need to build a conditional branch later. We +do this by asking the ``builder`` for the current BasicBlock. The fourth +line gets the current Function object that is being built. It gets this +by the ``start_bb`` for its "parent" (the function it is currently +embedded into). + +Once it has that, it creates one block. It is automatically appended +into the function's list of blocks. + +.. code-block:: ocaml + + (* Emit 'then' value. *) + position_at_end then_bb builder; + let then_val = codegen_expr then_ in + + (* Codegen of 'then' can change the current block, update then_bb for the + * phi. We create a new name because one is used for the phi node, and the + * other is used for the conditional branch. *) + let new_then_bb = insertion_block builder in + +We move the builder to start inserting into the "then" block. Strictly +speaking, this call moves the insertion point to be at the end of the +specified block. However, since the "then" block is empty, it also +starts out by inserting at the beginning of the block. :) + +Once the insertion point is set, we recursively codegen the "then" +expression from the AST. + +The final line here is quite subtle, but is very important. The basic +issue is that when we create the Phi node in the merge block, we need to +set up the block/value pairs that indicate how the Phi will work. +Importantly, the Phi node expects to have an entry for each predecessor +of the block in the CFG. Why then, are we getting the current block when +we just set it to ThenBB 5 lines above? The problem is that the "Then" +expression may actually itself change the block that the Builder is +emitting into if, for example, it contains a nested "if/then/else" +expression. 
Because calling Codegen recursively could arbitrarily change +the notion of the current block, we are required to get an up-to-date +value for code that will set up the Phi node. + +.. code-block:: ocaml + + (* Emit 'else' value. *) + let else_bb = append_block context "else" the_function in + position_at_end else_bb builder; + let else_val = codegen_expr else_ in + + (* Codegen of 'else' can change the current block, update else_bb for the + * phi. *) + let new_else_bb = insertion_block builder in + +Code generation for the 'else' block is basically identical to codegen +for the 'then' block. + +.. code-block:: ocaml + + (* Emit merge block. *) + let merge_bb = append_block context "ifcont" the_function in + position_at_end merge_bb builder; + let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in + let phi = build_phi incoming "iftmp" builder in + +The first two lines here are now familiar: the first adds the "merge" +block to the Function object. The second block changes the insertion +point so that newly created code will go into the "merge" block. Once +that is done, we need to create the PHI node and set up the block/value +pairs for the PHI. + +.. code-block:: ocaml + + (* Return to the start block to add the conditional branch. *) + position_at_end start_bb builder; + ignore (build_cond_br cond_val then_bb else_bb builder); + +Once the blocks are created, we can emit the conditional branch that +chooses between them. Note that creating new blocks does not implicitly +affect the IRBuilder, so it is still inserting into the block that the +condition went into. This is why we needed to save the "start" block. + +.. code-block:: ocaml + + (* Set a unconditional branch at the end of the 'then' block and the + * 'else' block to the 'merge' block. *) + position_at_end new_then_bb builder; ignore (build_br merge_bb builder); + position_at_end new_else_bb builder; ignore (build_br merge_bb builder); + + (* Finally, set the builder to the end of the merge block. *) + position_at_end merge_bb builder; + + phi + +To finish off the blocks, we create an unconditional branch to the merge +block. One interesting (and very important) aspect of the LLVM IR is +that it `requires all basic blocks to be +"terminated" <../LangRef.html#functionstructure>`_ with a `control flow +instruction <../LangRef.html#terminators>`_ such as return or branch. +This means that all control flow, *including fall throughs* must be made +explicit in the LLVM IR. If you violate this rule, the verifier will +emit an error. + +Finally, the CodeGen function returns the phi node as the value computed +by the if/then/else expression. In our example above, this returned +value will feed into the code for the top-level function, which will +create the return instruction. + +Overall, we now have the ability to execute conditional code in +Kaleidoscope. With this extension, Kaleidoscope is a fairly complete +language that can calculate a wide variety of numeric functions. Next up +we'll add another useful expression that is familiar from non-functional +languages... + +'for' Loop Expression +===================== + +Now that we know how to add basic control flow constructs to the +language, we have the tools to add more powerful things. 
Lets add +something more aggressive, a 'for' expression: + +:: + + extern putchard(char); + def printstar(n) + for i = 1, i < n, 1.0 in + putchard(42); # ascii 42 = '*' + + # print 100 '*' characters + printstar(100); + +This expression defines a new variable ("i" in this case) which iterates +from a starting value, while the condition ("i < n" in this case) is +true, incrementing by an optional step value ("1.0" in this case). If +the step value is omitted, it defaults to 1.0. While the loop is true, +it executes its body expression. Because we don't have anything better +to return, we'll just define the loop as always returning 0.0. In the +future when we have mutable variables, it will get more useful. + +As before, lets talk about the changes that we need to Kaleidoscope to +support this. + +Lexer Extensions for the 'for' Loop +----------------------------------- + +The lexer extensions are the same sort of thing as for if/then/else: + +.. code-block:: ocaml + + ... in Token.token ... + (* control *) + | If | Then | Else + | For | In + + ... in Lexer.lex_ident... + match Buffer.contents buffer with + | "def" -> [< 'Token.Def; stream >] + | "extern" -> [< 'Token.Extern; stream >] + | "if" -> [< 'Token.If; stream >] + | "then" -> [< 'Token.Then; stream >] + | "else" -> [< 'Token.Else; stream >] + | "for" -> [< 'Token.For; stream >] + | "in" -> [< 'Token.In; stream >] + | id -> [< 'Token.Ident id; stream >] + +AST Extensions for the 'for' Loop +--------------------------------- + +The AST variant is just as simple. It basically boils down to capturing +the variable name and the constituent expressions in the node. + +.. code-block:: ocaml + + type expr = + ... + (* variant for for/in. *) + | For of string * expr * expr * expr option * expr + +Parser Extensions for the 'for' Loop +------------------------------------ + +The parser code is also fairly standard. The only interesting thing here +is handling of the optional step value. The parser code handles it by +checking to see if the second comma is present. If not, it sets the step +value to null in the AST node: + +.. code-block:: ocaml + + let rec parse_primary = parser + ... + (* forexpr + ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) + | [< 'Token.For; + 'Token.Ident id ?? "expected identifier after for"; + 'Token.Kwd '=' ?? "expected '=' after for"; + stream >] -> + begin parser + | [< + start=parse_expr; + 'Token.Kwd ',' ?? "expected ',' after for"; + end_=parse_expr; + stream >] -> + let step = + begin parser + | [< 'Token.Kwd ','; step=parse_expr >] -> Some step + | [< >] -> None + end stream + in + begin parser + | [< 'Token.In; body=parse_expr >] -> + Ast.For (id, start, end_, step, body) + | [< >] -> + raise (Stream.Error "expected 'in' after for") + end stream + | [< >] -> + raise (Stream.Error "expected '=' after for") + end stream + +LLVM IR for the 'for' Loop +-------------------------- + +Now we get to the good part: the LLVM IR we want to generate for this +thing. With the simple example above, we get this LLVM IR (note that +this dump is generated with optimizations disabled for clarity): + +.. 
code-block:: llvm + + declare double @putchard(double) + + define double @printstar(double %n) { + entry: + ; initial value = 1.0 (inlined into phi) + br label %loop + + loop: ; preds = %loop, %entry + %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ] + ; body + %calltmp = call double @putchard(double 4.200000e+01) + ; increment + %nextvar = fadd double %i, 1.000000e+00 + + ; termination test + %cmptmp = fcmp ult double %i, %n + %booltmp = uitofp i1 %cmptmp to double + %loopcond = fcmp one double %booltmp, 0.000000e+00 + br i1 %loopcond, label %loop, label %afterloop + + afterloop: ; preds = %loop + ; loop always returns 0.0 + ret double 0.000000e+00 + } + +This loop contains all the same constructs we saw before: a phi node, +several expressions, and some basic blocks. Lets see how this fits +together. + +Code Generation for the 'for' Loop +---------------------------------- + +The first part of Codegen is very simple: we just output the start +expression for the loop value: + +.. code-block:: ocaml + + let rec codegen_expr = function + ... + | Ast.For (var_name, start, end_, step, body) -> + (* Emit the start code first, without 'variable' in scope. *) + let start_val = codegen_expr start in + +With this out of the way, the next step is to set up the LLVM basic +block for the start of the loop body. In the case above, the whole loop +body is one block, but remember that the body code itself could consist +of multiple blocks (e.g. if it contains an if/then/else or a for/in +expression). + +.. code-block:: ocaml + + (* Make the new basic block for the loop header, inserting after current + * block. *) + let preheader_bb = insertion_block builder in + let the_function = block_parent preheader_bb in + let loop_bb = append_block context "loop" the_function in + + (* Insert an explicit fall through from the current block to the + * loop_bb. *) + ignore (build_br loop_bb builder); + +This code is similar to what we saw for if/then/else. Because we will +need it to create the Phi node, we remember the block that falls through +into the loop. Once we have that, we create the actual block that starts +the loop and create an unconditional branch for the fall-through between +the two blocks. + +.. code-block:: ocaml + + (* Start insertion in loop_bb. *) + position_at_end loop_bb builder; + + (* Start the PHI node with an entry for start. *) + let variable = build_phi [(start_val, preheader_bb)] var_name builder in + +Now that the "preheader" for the loop is set up, we switch to emitting +code for the loop body. To begin with, we move the insertion point and +create the PHI node for the loop induction variable. Since we already +know the incoming value for the starting value, we add it to the Phi +node. Note that the Phi will eventually get a second value for the +backedge, but we can't set it up yet (because it doesn't exist!). + +.. code-block:: ocaml + + (* Within the loop, the variable is defined equal to the PHI node. If it + * shadows an existing variable, we have to restore it, so save it + * now. *) + let old_val = + try Some (Hashtbl.find named_values var_name) with Not_found -> None + in + Hashtbl.add named_values var_name variable; + + (* Emit the body of the loop. This, like any other expr, can change the + * current BB. Note that we ignore the value computed by the body, but + * don't allow an error *) + ignore (codegen_expr body); + +Now the code starts to get more interesting. Our 'for' loop introduces a +new variable to the symbol table. 
This means that our symbol table can +now contain either function arguments or loop variables. To handle this, +before we codegen the body of the loop, we add the loop variable as the +current value for its name. Note that it is possible that there is a +variable of the same name in the outer scope. It would be easy to make +this an error (emit an error and return null if there is already an +entry for VarName) but we choose to allow shadowing of variables. In +order to handle this correctly, we remember the Value that we are +potentially shadowing in ``old_val`` (which will be None if there is no +shadowed variable). + +Once the loop variable is set into the symbol table, the code +recursively codegen's the body. This allows the body to use the loop +variable: any references to it will naturally find it in the symbol +table. + +.. code-block:: ocaml + + (* Emit the step value. *) + let step_val = + match step with + | Some step -> codegen_expr step + (* If not specified, use 1.0. *) + | None -> const_float double_type 1.0 + in + + let next_var = build_add variable step_val "nextvar" builder in + +Now that the body is emitted, we compute the next value of the iteration +variable by adding the step value, or 1.0 if it isn't present. +'``next_var``' will be the value of the loop variable on the next +iteration of the loop. + +.. code-block:: ocaml + + (* Compute the end condition. *) + let end_cond = codegen_expr end_ in + + (* Convert condition to a bool by comparing equal to 0.0. *) + let zero = const_float double_type 0.0 in + let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in + +Finally, we evaluate the exit value of the loop, to determine whether +the loop should exit. This mirrors the condition evaluation for the +if/then/else statement. + +.. code-block:: ocaml + + (* Create the "after loop" block and insert it. *) + let loop_end_bb = insertion_block builder in + let after_bb = append_block context "afterloop" the_function in + + (* Insert the conditional branch into the end of loop_end_bb. *) + ignore (build_cond_br end_cond loop_bb after_bb builder); + + (* Any new code will be inserted in after_bb. *) + position_at_end after_bb builder; + +With the code for the body of the loop complete, we just need to finish +up the control flow for it. This code remembers the end block (for the +phi node), then creates the block for the loop exit ("afterloop"). Based +on the value of the exit condition, it creates a conditional branch that +chooses between executing the loop again and exiting the loop. Any +future code is emitted in the "afterloop" block, so it sets the +insertion position to it. + +.. code-block:: ocaml + + (* Add a new entry to the PHI node for the backedge. *) + add_incoming (next_var, loop_end_bb) variable; + + (* Restore the unshadowed variable. *) + begin match old_val with + | Some old_val -> Hashtbl.add named_values var_name old_val + | None -> () + end; + + (* for expr always returns 0.0. *) + const_null double_type + +The final code handles various cleanups: now that we have the +"``next_var``" value, we can add the incoming value to the loop PHI +node. After that, we remove the loop variable from the symbol table, so +that it isn't in scope after the for loop. Finally, code generation of +the for loop always returns 0.0, so that is what we return from +``Codegen.codegen_expr``. + +With this, we conclude the "adding control flow to Kaleidoscope" chapter +of the tutorial. 
In this chapter we added two control flow constructs, +and used them to motivate a couple of aspects of the LLVM IR that are +important for front-end implementors to know. In the next chapter of our +saga, we will get a bit crazier and add `user-defined +operators <OCamlLangImpl6.html>`_ to our poor innocent language. + +Full Code Listing +================= + +Here is the complete code listing for our running example, enhanced with +the if/then/else and for expressions.. To build this example, use: + +.. code-block:: bash + + # Compile + ocamlbuild toy.byte + # Run + ./toy.byte + +Here is the code: + +\_tags: + :: + + <{lexer,parser}.ml>: use_camlp4, pp(camlp4of) + <*.{byte,native}>: g++, use_llvm, use_llvm_analysis + <*.{byte,native}>: use_llvm_executionengine, use_llvm_target + <*.{byte,native}>: use_llvm_scalar_opts, use_bindings + +myocamlbuild.ml: + .. code-block:: ocaml + + open Ocamlbuild_plugin;; + + ocaml_lib ~extern:true "llvm";; + ocaml_lib ~extern:true "llvm_analysis";; + ocaml_lib ~extern:true "llvm_executionengine";; + ocaml_lib ~extern:true "llvm_target";; + ocaml_lib ~extern:true "llvm_scalar_opts";; + + flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);; + dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; + +token.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Lexer Tokens + *===----------------------------------------------------------------------===*) + + (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of + * these others for known things. *) + type token = + (* commands *) + | Def | Extern + + (* primary *) + | Ident of string | Number of float + + (* unknown *) + | Kwd of char + + (* control *) + | If | Then | Else + | For | In + +lexer.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Lexer + *===----------------------------------------------------------------------===*) + + let rec lex = parser + (* Skip any whitespace. *) + | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream + + (* identifier: [a-zA-Z][a-zA-Z0-9] *) + | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> + let buffer = Buffer.create 1 in + Buffer.add_char buffer c; + lex_ident buffer stream + + (* number: [0-9.]+ *) + | [< ' ('0' .. '9' as c); stream >] -> + let buffer = Buffer.create 1 in + Buffer.add_char buffer c; + lex_number buffer stream + + (* Comment until end of line. *) + | [< ' ('#'); stream >] -> + lex_comment stream + + (* Otherwise, just return the character as its ascii value. *) + | [< 'c; stream >] -> + [< 'Token.Kwd c; lex stream >] + + (* end of stream. *) + | [< >] -> [< >] + + and lex_number buffer = parser + | [< ' ('0' .. '9' | '.' as c); stream >] -> + Buffer.add_char buffer c; + lex_number buffer stream + | [< stream=lex >] -> + [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] + + and lex_ident buffer = parser + | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. 
'9' as c); stream >] -> + Buffer.add_char buffer c; + lex_ident buffer stream + | [< stream=lex >] -> + match Buffer.contents buffer with + | "def" -> [< 'Token.Def; stream >] + | "extern" -> [< 'Token.Extern; stream >] + | "if" -> [< 'Token.If; stream >] + | "then" -> [< 'Token.Then; stream >] + | "else" -> [< 'Token.Else; stream >] + | "for" -> [< 'Token.For; stream >] + | "in" -> [< 'Token.In; stream >] + | id -> [< 'Token.Ident id; stream >] + + and lex_comment = parser + | [< ' ('\n'); stream=lex >] -> stream + | [< 'c; e=lex_comment >] -> e + | [< >] -> [< >] + +ast.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Abstract Syntax Tree (aka Parse Tree) + *===----------------------------------------------------------------------===*) + + (* expr - Base type for all expression nodes. *) + type expr = + (* variant for numeric literals like "1.0". *) + | Number of float + + (* variant for referencing a variable, like "a". *) + | Variable of string + + (* variant for a binary operator. *) + | Binary of char * expr * expr + + (* variant for function calls. *) + | Call of string * expr array + + (* variant for if/then/else. *) + | If of expr * expr * expr + + (* variant for for/in. *) + | For of string * expr * expr * expr option * expr + + (* proto - This type represents the "prototype" for a function, which captures + * its name, and its argument names (thus implicitly the number of arguments the + * function takes). *) + type proto = Prototype of string * string array + + (* func - This type represents a function definition itself. *) + type func = Function of proto * expr + +parser.ml: + .. code-block:: ocaml + + (*===---------------------------------------------------------------------=== + * Parser + *===---------------------------------------------------------------------===*) + + (* binop_precedence - This holds the precedence for each binary operator that is + * defined *) + let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 + + (* precedence - Get the precedence of the pending binary operator token. *) + let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 + + (* primary + * ::= identifier + * ::= numberexpr + * ::= parenexpr + * ::= ifexpr + * ::= forexpr *) + let rec parse_primary = parser + (* numberexpr ::= number *) + | [< 'Token.Number n >] -> Ast.Number n + + (* parenexpr ::= '(' expression ')' *) + | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e + + (* identifierexpr + * ::= identifier + * ::= identifier '(' argumentexpr ')' *) + | [< 'Token.Ident id; stream >] -> + let rec parse_args accumulator = parser + | [< e=parse_expr; stream >] -> + begin parser + | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e + | [< >] -> e :: accumulator + end stream + | [< >] -> accumulator + in + let rec parse_ident id = parser + (* Call. *) + | [< 'Token.Kwd '('; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')'">] -> + Ast.Call (id, Array.of_list (List.rev args)) + + (* Simple variable ref. *) + | [< >] -> Ast.Variable id + in + parse_ident id stream + + (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) + | [< 'Token.If; c=parse_expr; + 'Token.Then ?? "expected 'then'"; t=parse_expr; + 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> + Ast.If (c, t, e) + + (* forexpr + ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) + | [< 'Token.For; + 'Token.Ident id ?? "expected identifier after for"; + 'Token.Kwd '=' ?? 
"expected '=' after for"; + stream >] -> + begin parser + | [< + start=parse_expr; + 'Token.Kwd ',' ?? "expected ',' after for"; + end_=parse_expr; + stream >] -> + let step = + begin parser + | [< 'Token.Kwd ','; step=parse_expr >] -> Some step + | [< >] -> None + end stream + in + begin parser + | [< 'Token.In; body=parse_expr >] -> + Ast.For (id, start, end_, step, body) + | [< >] -> + raise (Stream.Error "expected 'in' after for") + end stream + | [< >] -> + raise (Stream.Error "expected '=' after for") + end stream + + | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") + + (* binoprhs + * ::= ('+' primary)* *) + and parse_bin_rhs expr_prec lhs stream = + match Stream.peek stream with + (* If this is a binop, find its precedence. *) + | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> + let token_prec = precedence c in + + (* If this is a binop that binds at least as tightly as the current binop, + * consume it, otherwise we are done. *) + if token_prec < expr_prec then lhs else begin + (* Eat the binop. *) + Stream.junk stream; + + (* Parse the primary expression after the binary operator. *) + let rhs = parse_primary stream in + + (* Okay, we know this is a binop. *) + let rhs = + match Stream.peek stream with + | Some (Token.Kwd c2) -> + (* If BinOp binds less tightly with rhs than the operator after + * rhs, let the pending operator take rhs as its lhs. *) + let next_prec = precedence c2 in + if token_prec < next_prec + then parse_bin_rhs (token_prec + 1) rhs stream + else rhs + | _ -> rhs + in + + (* Merge lhs/rhs. *) + let lhs = Ast.Binary (c, lhs, rhs) in + parse_bin_rhs expr_prec lhs stream + end + | _ -> lhs + + (* expression + * ::= primary binoprhs *) + and parse_expr = parser + | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream + + (* prototype + * ::= id '(' id* ')' *) + let parse_prototype = + let rec parse_args accumulator = parser + | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e + | [< >] -> accumulator + in + + parser + | [< 'Token.Ident id; + 'Token.Kwd '(' ?? "expected '(' in prototype"; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> + (* success. *) + Ast.Prototype (id, Array.of_list (List.rev args)) + + | [< >] -> + raise (Stream.Error "expected function name in prototype") + + (* definition ::= 'def' prototype expression *) + let parse_definition = parser + | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> + Ast.Function (p, e) + + (* toplevelexpr ::= expression *) + let parse_toplevel = parser + | [< e=parse_expr >] -> + (* Make an anonymous proto. *) + Ast.Function (Ast.Prototype ("", [||]), e) + + (* external ::= 'extern' prototype *) + let parse_extern = parser + | [< 'Token.Extern; e=parse_prototype >] -> e + +codegen.ml: + .. 
code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Code Generation + *===----------------------------------------------------------------------===*) + + open Llvm + + exception Error of string + + let context = global_context () + let the_module = create_module context "my cool jit" + let builder = builder context + let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 + let double_type = double_type context + + let rec codegen_expr = function + | Ast.Number n -> const_float double_type n + | Ast.Variable name -> + (try Hashtbl.find named_values name with + | Not_found -> raise (Error "unknown variable name")) + | Ast.Binary (op, lhs, rhs) -> + let lhs_val = codegen_expr lhs in + let rhs_val = codegen_expr rhs in + begin + match op with + | '+' -> build_add lhs_val rhs_val "addtmp" builder + | '-' -> build_sub lhs_val rhs_val "subtmp" builder + | '*' -> build_mul lhs_val rhs_val "multmp" builder + | '<' -> + (* Convert bool 0/1 to double 0.0 or 1.0 *) + let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in + build_uitofp i double_type "booltmp" builder + | _ -> raise (Error "invalid binary operator") + end + | Ast.Call (callee, args) -> + (* Look up the name in the module table. *) + let callee = + match lookup_function callee the_module with + | Some callee -> callee + | None -> raise (Error "unknown function referenced") + in + let params = params callee in + + (* If argument mismatch error. *) + if Array.length params == Array.length args then () else + raise (Error "incorrect # arguments passed"); + let args = Array.map codegen_expr args in + build_call callee args "calltmp" builder + | Ast.If (cond, then_, else_) -> + let cond = codegen_expr cond in + + (* Convert condition to a bool by comparing equal to 0.0 *) + let zero = const_float double_type 0.0 in + let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in + + (* Grab the first block so that we might later add the conditional branch + * to it at the end of the function. *) + let start_bb = insertion_block builder in + let the_function = block_parent start_bb in + + let then_bb = append_block context "then" the_function in + + (* Emit 'then' value. *) + position_at_end then_bb builder; + let then_val = codegen_expr then_ in + + (* Codegen of 'then' can change the current block, update then_bb for the + * phi. We create a new name because one is used for the phi node, and the + * other is used for the conditional branch. *) + let new_then_bb = insertion_block builder in + + (* Emit 'else' value. *) + let else_bb = append_block context "else" the_function in + position_at_end else_bb builder; + let else_val = codegen_expr else_ in + + (* Codegen of 'else' can change the current block, update else_bb for the + * phi. *) + let new_else_bb = insertion_block builder in + + (* Emit merge block. *) + let merge_bb = append_block context "ifcont" the_function in + position_at_end merge_bb builder; + let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in + let phi = build_phi incoming "iftmp" builder in + + (* Return to the start block to add the conditional branch. *) + position_at_end start_bb builder; + ignore (build_cond_br cond_val then_bb else_bb builder); + + (* Set a unconditional branch at the end of the 'then' block and the + * 'else' block to the 'merge' block. 
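+       * Every LLVM basic block must end with a terminator, so both arms
+       * need an explicit branch to the merge block; control may not simply
+       * fall off the end of a block.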
*) + position_at_end new_then_bb builder; ignore (build_br merge_bb builder); + position_at_end new_else_bb builder; ignore (build_br merge_bb builder); + + (* Finally, set the builder to the end of the merge block. *) + position_at_end merge_bb builder; + + phi + | Ast.For (var_name, start, end_, step, body) -> + (* Emit the start code first, without 'variable' in scope. *) + let start_val = codegen_expr start in + + (* Make the new basic block for the loop header, inserting after current + * block. *) + let preheader_bb = insertion_block builder in + let the_function = block_parent preheader_bb in + let loop_bb = append_block context "loop" the_function in + + (* Insert an explicit fall through from the current block to the + * loop_bb. *) + ignore (build_br loop_bb builder); + + (* Start insertion in loop_bb. *) + position_at_end loop_bb builder; + + (* Start the PHI node with an entry for start. *) + let variable = build_phi [(start_val, preheader_bb)] var_name builder in + + (* Within the loop, the variable is defined equal to the PHI node. If it + * shadows an existing variable, we have to restore it, so save it + * now. *) + let old_val = + try Some (Hashtbl.find named_values var_name) with Not_found -> None + in + Hashtbl.add named_values var_name variable; + + (* Emit the body of the loop. This, like any other expr, can change the + * current BB. Note that we ignore the value computed by the body, but + * don't allow an error *) + ignore (codegen_expr body); + + (* Emit the step value. *) + let step_val = + match step with + | Some step -> codegen_expr step + (* If not specified, use 1.0. *) + | None -> const_float double_type 1.0 + in + + let next_var = build_add variable step_val "nextvar" builder in + + (* Compute the end condition. *) + let end_cond = codegen_expr end_ in + + (* Convert condition to a bool by comparing equal to 0.0. *) + let zero = const_float double_type 0.0 in + let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in + + (* Create the "after loop" block and insert it. *) + let loop_end_bb = insertion_block builder in + let after_bb = append_block context "afterloop" the_function in + + (* Insert the conditional branch into the end of loop_end_bb. *) + ignore (build_cond_br end_cond loop_bb after_bb builder); + + (* Any new code will be inserted in after_bb. *) + position_at_end after_bb builder; + + (* Add a new entry to the PHI node for the backedge. *) + add_incoming (next_var, loop_end_bb) variable; + + (* Restore the unshadowed variable. *) + begin match old_val with + | Some old_val -> Hashtbl.add named_values var_name old_val + | None -> () + end; + + (* for expr always returns 0.0. *) + const_null double_type + + let codegen_proto = function + | Ast.Prototype (name, args) -> + (* Make the function type: double(double,double) etc. *) + let doubles = Array.make (Array.length args) double_type in + let ft = function_type double_type doubles in + let f = + match lookup_function name the_module with + | None -> declare_function name ft the_module + + (* If 'f' conflicted, there was already something named 'name'. If it + * has a body, don't allow redefinition or reextern. *) + | Some f -> + (* If 'f' already has a body, reject this. *) + if block_begin f <> At_end f then + raise (Error "redefinition of function"); + + (* If 'f' took a different number of arguments, reject. *) + if element_type (type_of f) <> ft then + raise (Error "redefinition of function with different # args"); + f + in + + (* Set names for all arguments. 
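+       * Each argument is also recorded in named_values below, which is how
+       * Ast.Variable references in the function body find their values.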
*) + Array.iteri (fun i a -> + let n = args.(i) in + set_value_name n a; + Hashtbl.add named_values n a; + ) (params f); + f + + let codegen_func the_fpm = function + | Ast.Function (proto, body) -> + Hashtbl.clear named_values; + let the_function = codegen_proto proto in + + (* Create a new basic block to start insertion into. *) + let bb = append_block context "entry" the_function in + position_at_end bb builder; + + try + let ret_val = codegen_expr body in + + (* Finish off the function. *) + let _ = build_ret ret_val builder in + + (* Validate the generated code, checking for consistency. *) + Llvm_analysis.assert_valid_function the_function; + + (* Optimize the function. *) + let _ = PassManager.run_function the_function the_fpm in + + the_function + with e -> + delete_function the_function; + raise e + +toplevel.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Top-Level parsing and JIT Driver + *===----------------------------------------------------------------------===*) + + open Llvm + open Llvm_executionengine + + (* top ::= definition | external | expression | ';' *) + let rec main_loop the_fpm the_execution_engine stream = + match Stream.peek stream with + | None -> () + + (* ignore top-level semicolons. *) + | Some (Token.Kwd ';') -> + Stream.junk stream; + main_loop the_fpm the_execution_engine stream + + | Some token -> + begin + try match token with + | Token.Def -> + let e = Parser.parse_definition stream in + print_endline "parsed a function definition."; + dump_value (Codegen.codegen_func the_fpm e); + | Token.Extern -> + let e = Parser.parse_extern stream in + print_endline "parsed an extern."; + dump_value (Codegen.codegen_proto e); + | _ -> + (* Evaluate a top-level expression into an anonymous function. *) + let e = Parser.parse_toplevel stream in + print_endline "parsed a top-level expr"; + let the_function = Codegen.codegen_func the_fpm e in + dump_value the_function; + + (* JIT the function, returning a function pointer. *) + let result = ExecutionEngine.run_function the_function [||] + the_execution_engine in + + print_string "Evaluated to "; + print_float (GenericValue.as_float Codegen.double_type result); + print_newline (); + with Stream.Error s | Codegen.Error s -> + (* Skip token for error recovery. *) + Stream.junk stream; + print_endline s; + end; + print_string "ready> "; flush stdout; + main_loop the_fpm the_execution_engine stream + +toy.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Main driver code. + *===----------------------------------------------------------------------===*) + + open Llvm + open Llvm_executionengine + open Llvm_target + open Llvm_scalar_opts + + let main () = + ignore (initialize_native_target ()); + + (* Install standard binary operators. + * 1 is the lowest precedence. *) + Hashtbl.add Parser.binop_precedence '<' 10; + Hashtbl.add Parser.binop_precedence '+' 20; + Hashtbl.add Parser.binop_precedence '-' 20; + Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) + + (* Prime the first token. *) + print_string "ready> "; flush stdout; + let stream = Lexer.lex (Stream.of_channel stdin) in + + (* Create the JIT. *) + let the_execution_engine = ExecutionEngine.create Codegen.the_module in + let the_fpm = PassManager.create_function Codegen.the_module in + + (* Set up the optimizer pipeline. Start with registering info about how the + * target lays out data structures. 
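+     * The optimization passes registered below only run when
+     * Codegen.codegen_func calls PassManager.run_function on each newly
+     * generated function.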
*) + DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; + + (* Do simple "peephole" optimizations and bit-twiddling optzn. *) + add_instruction_combination the_fpm; + + (* reassociate expressions. *) + add_reassociation the_fpm; + + (* Eliminate Common SubExpressions. *) + add_gvn the_fpm; + + (* Simplify the control flow graph (deleting unreachable blocks, etc). *) + add_cfg_simplification the_fpm; + + ignore (PassManager.initialize the_fpm); + + (* Run the main "interpreter loop" now. *) + Toplevel.main_loop the_fpm the_execution_engine stream; + + (* Print out all the generated code. *) + dump_module Codegen.the_module + ;; + + main () + +bindings.c + .. code-block:: c + + #include <stdio.h> + + /* putchard - putchar that takes a double and returns 0. */ + extern double putchard(double X) { + putchar((char)X); + return 0; + } + +`Next: Extending the language: user-defined +operators <OCamlLangImpl6.html>`_ + diff --git a/docs/tutorial/OCamlLangImpl6.html b/docs/tutorial/OCamlLangImpl6.html deleted file mode 100644 index 56883d539b..0000000000 --- a/docs/tutorial/OCamlLangImpl6.html +++ /dev/null @@ -1,1574 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Extending the Language: User-defined Operators</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <meta name="author" content="Erick Tryzelaar"> - <link rel="stylesheet" href="../_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>Kaleidoscope: Extending the Language: User-defined Operators</h1> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 6 - <ol> - <li><a href="#intro">Chapter 6 Introduction</a></li> - <li><a href="#idea">User-defined Operators: the Idea</a></li> - <li><a href="#binary">User-defined Binary Operators</a></li> - <li><a href="#unary">User-defined Unary Operators</a></li> - <li><a href="#example">Kicking the Tires</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="OCamlLangImpl7.html">Chapter 7</a>: Extending the Language: Mutable -Variables / SSA Construction</li> -</ul> - -<div class="doc_author"> - <p> - Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a> - and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a> - </p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="intro">Chapter 6 Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Welcome to Chapter 6 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. At this point in our tutorial, we now have a fully -functional language that is fairly minimal, but also useful. There -is still one big problem with it, however. Our language doesn't have many -useful operators (like division, logical negation, or even any comparisons -besides less-than).</p> - -<p>This chapter of the tutorial takes a wild digression into adding user-defined -operators to the simple and beautiful Kaleidoscope language. This digression now -gives us a simple and ugly language in some ways, but also a powerful one at the -same time. One of the great things about creating your own language is that you -get to decide what is good or bad. 
In this tutorial we'll assume that it is -okay to use this as a way to show some interesting parsing techniques.</p> - -<p>At the end of this tutorial, we'll run through an example Kaleidoscope -application that <a href="#example">renders the Mandelbrot set</a>. This gives -an example of what you can build with Kaleidoscope and its feature set.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="idea">User-defined Operators: the Idea</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -The "operator overloading" that we will add to Kaleidoscope is more general than -languages like C++. In C++, you are only allowed to redefine existing -operators: you can't programatically change the grammar, introduce new -operators, change precedence levels, etc. In this chapter, we will add this -capability to Kaleidoscope, which will let the user round out the set of -operators that are supported.</p> - -<p>The point of going into user-defined operators in a tutorial like this is to -show the power and flexibility of using a hand-written parser. Thus far, the parser -we have been implementing uses recursive descent for most parts of the grammar and -operator precedence parsing for the expressions. See <a -href="OCamlLangImpl2.html">Chapter 2</a> for details. Without using operator -precedence parsing, it would be very difficult to allow the programmer to -introduce new operators into the grammar: the grammar is dynamically extensible -as the JIT runs.</p> - -<p>The two specific features we'll add are programmable unary operators (right -now, Kaleidoscope has no unary operators at all) as well as binary operators. -An example of this is:</p> - -<div class="doc_code"> -<pre> -# Logical unary not. -def unary!(v) - if v then - 0 - else - 1; - -# Define > with the same precedence as <. -def binary> 10 (LHS RHS) - RHS < LHS; - -# Binary "logical or", (note that it does not "short circuit") -def binary| 5 (LHS RHS) - if LHS then - 1 - else if RHS then - 1 - else - 0; - -# Define = with slightly lower precedence than relationals. -def binary= 9 (LHS RHS) - !(LHS < RHS | LHS > RHS); -</pre> -</div> - -<p>Many languages aspire to being able to implement their standard runtime -library in the language itself. In Kaleidoscope, we can implement significant -parts of the language in the library!</p> - -<p>We will break down implementation of these features into two parts: -implementing support for user-defined binary operators and adding unary -operators.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="binary">User-defined Binary Operators</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Adding support for user-defined binary operators is pretty simple with our -current framework. We'll first add support for the unary/binary keywords:</p> - -<div class="doc_code"> -<pre> -type token = - ... - <b>(* operators *) - | Binary | Unary</b> - -... - -and lex_ident buffer = parser - ... - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - <b>| "binary" -> [< 'Token.Binary; stream >] - | "unary" -> [< 'Token.Unary; stream >]</b> -</pre> -</div> - -<p>This just adds lexer support for the unary and binary keywords, like we -did in <a href="OCamlLangImpl5.html#iflexer">previous chapters</a>. 
One nice -thing about our current AST, is that we represent binary operators with full -generalisation by using their ASCII code as the opcode. For our extended -operators, we'll use this same representation, so we don't need any new AST or -parser support.</p> - -<p>On the other hand, we have to be able to represent the definitions of these -new operators, in the "def binary| 5" part of the function definition. In our -grammar so far, the "name" for the function definition is parsed as the -"prototype" production and into the <tt>Ast.Prototype</tt> AST node. To -represent our new user-defined operators as prototypes, we have to extend -the <tt>Ast.Prototype</tt> AST node like this:</p> - -<div class="doc_code"> -<pre> -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = - | Prototype of string * string array - <b>| BinOpPrototype of string * string array * int</b> -</pre> -</div> - -<p>Basically, in addition to knowing a name for the prototype, we now keep track -of whether it was an operator, and if it was, what precedence level the operator -is at. The precedence is only used for binary operators (as you'll see below, -it just doesn't apply for unary operators). Now that we have a way to represent -the prototype for a user-defined operator, we need to parse it:</p> - -<div class="doc_code"> -<pre> -(* prototype - * ::= id '(' id* ')' - <b>* ::= binary LETTER number? (id, id) - * ::= unary LETTER number? (id) *)</b> -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - let parse_operator = parser - | [< 'Token.Unary >] -> "unary", 1 - | [< 'Token.Binary >] -> "binary", 2 - in - let parse_binary_precedence = parser - | [< 'Token.Number n >] -> int_of_float n - | [< >] -> 30 - in - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - <b>| [< (prefix, kind)=parse_operator; - 'Token.Kwd op ?? "expected an operator"; - (* Read the precedence if present. *) - binary_precedence=parse_binary_precedence; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - let name = prefix ^ (String.make 1 op) in - let args = Array.of_list (List.rev args) in - - (* Verify right number of arguments for operator. *) - if Array.length args != kind - then raise (Stream.Error "invalid number of operands for operator") - else - if kind == 1 then - Ast.Prototype (name, args) - else - Ast.BinOpPrototype (name, args, binary_precedence)</b> - | [< >] -> - raise (Stream.Error "expected function name in prototype") -</pre> -</div> - -<p>This is all fairly straightforward parsing code, and we have already seen -a lot of similar code in the past. One interesting part about the code above is -the couple lines that set up <tt>name</tt> for binary operators. This builds -names like "binary@" for a newly defined "@" operator. This then takes -advantage of the fact that symbol names in the LLVM symbol table are allowed to -have any character in them, including embedded nul characters.</p> - -<p>The next interesting thing to add, is codegen support for these binary -operators. 
Given our current structure, this is a simple addition of a default -case for our existing binary operator node:</p> - -<div class="doc_code"> -<pre> -let codegen_expr = function - ... - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - <b>| _ -> - (* If it wasn't a builtin binary operator, it must be a user defined - * one. Emit a call to it. *) - let callee = "binary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "binary operator not found!") - in - build_call callee [|lhs_val; rhs_val|] "binop" builder</b> - end -</pre> -</div> - -<p>As you can see above, the new code is actually really simple. It just does -a lookup for the appropriate operator in the symbol table and generates a -function call to it. Since user-defined operators are just built as normal -functions (because the "prototype" boils down to a function with the right -name) everything falls into place.</p> - -<p>The final piece of code we are missing, is a bit of top level magic:</p> - -<div class="doc_code"> -<pre> -let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - <b>(* If this is an operator, install it. *) - begin match proto with - | Ast.BinOpPrototype (name, args, prec) -> - let op = name.[String.length name - 1] in - Hashtbl.add Parser.binop_precedence op prec; - | _ -> () - end;</b> - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - ... -</pre> -</div> - -<p>Basically, before codegening a function, if it is a user-defined operator, we -register it in the precedence table. This allows the binary operator parsing -logic we already have in place to handle it. Since we are working on a -fully-general operator precedence parser, this is all we need to do to "extend -the grammar".</p> - -<p>Now we have useful user-defined binary operators. This builds a lot -on the previous framework we built for other operators. Adding unary operators -is a bit more challenging, because we don't have any framework for it yet - lets -see what it takes.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="unary">User-defined Unary Operators</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Since we don't currently support unary operators in the Kaleidoscope -language, we'll need to add everything to support them. Above, we added simple -support for the 'unary' keyword to the lexer. In addition to that, we need an -AST node:</p> - -<div class="doc_code"> -<pre> -type expr = - ... - (* variant for a unary operator. *) - | Unary of char * expr - ... -</pre> -</div> - -<p>This AST node is very simple and obvious by now. It directly mirrors the -binary operator AST node, except that it only has one child. With this, we -need to add the parsing logic. 
Parsing a unary operator is pretty simple: we'll -add a new function to do it:</p> - -<div class="doc_code"> -<pre> -(* unary - * ::= primary - * ::= '!' unary *) -and parse_unary = parser - (* If this is a unary operator, read it. *) - | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] -> - Ast.Unary (op, operand) - - (* If the current token is not an operator, it must be a primary expr. *) - | [< stream >] -> parse_primary stream -</pre> -</div> - -<p>The grammar we add is pretty straightforward here. If we see a unary -operator when parsing a primary operator, we eat the operator as a prefix and -parse the remaining piece as another unary operator. This allows us to handle -multiple unary operators (e.g. "!!x"). Note that unary operators can't have -ambiguous parses like binary operators can, so there is no need for precedence -information.</p> - -<p>The problem with this function, is that we need to call ParseUnary from -somewhere. To do this, we change previous callers of ParsePrimary to call -<tt>parse_unary</tt> instead:</p> - -<div class="doc_code"> -<pre> -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - ... - <b>(* Parse the unary expression after the binary operator. *) - let rhs = parse_unary stream in</b> - ... - -... - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=<b>parse_unary</b>; stream >] -> parse_bin_rhs 0 lhs stream -</pre> -</div> - -<p>With these two simple changes, we are now able to parse unary operators and build the -AST for them. Next up, we need to add parser support for prototypes, to parse -the unary operator prototype. We extend the binary operator code above -with:</p> - -<div class="doc_code"> -<pre> -(* prototype - * ::= id '(' id* ')' - * ::= binary LETTER number? (id, id) - <b>* ::= unary LETTER number? (id)</b> *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - <b>let parse_operator = parser - | [< 'Token.Unary >] -> "unary", 1 - | [< 'Token.Binary >] -> "binary", 2 - in</b> - let parse_binary_precedence = parser - | [< 'Token.Number n >] -> int_of_float n - | [< >] -> 30 - in - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - <b>| [< (prefix, kind)=parse_operator; - 'Token.Kwd op ?? "expected an operator"; - (* Read the precedence if present. *) - binary_precedence=parse_binary_precedence; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - let name = prefix ^ (String.make 1 op) in - let args = Array.of_list (List.rev args) in - - (* Verify right number of arguments for operator. *) - if Array.length args != kind - then raise (Stream.Error "invalid number of operands for operator") - else - if kind == 1 then - Ast.Prototype (name, args) - else - Ast.BinOpPrototype (name, args, binary_precedence)</b> - | [< >] -> - raise (Stream.Error "expected function name in prototype") -</pre> -</div> - -<p>As with binary operators, we name unary operators with a name that includes -the operator character. This assists us at code generation time. Speaking of, -the final piece we need to add is codegen support for unary operators. 
It looks -like this:</p> - -<div class="doc_code"> -<pre> -let rec codegen_expr = function - ... - | Ast.Unary (op, operand) -> - let operand = codegen_expr operand in - let callee = "unary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown unary operator") - in - build_call callee [|operand|] "unop" builder -</pre> -</div> - -<p>This code is similar to, but simpler than, the code for binary operators. It -is simpler primarily because it doesn't need to handle any predefined operators. -</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="example">Kicking the Tires</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>It is somewhat hard to believe, but with a few simple extensions we've -covered in the last chapters, we have grown a real-ish language. With this, we -can do a lot of interesting things, including I/O, math, and a bunch of other -things. For example, we can now add a nice sequencing operator (printd is -defined to print out the specified value and a newline):</p> - -<div class="doc_code"> -<pre> -ready> <b>extern printd(x);</b> -Read extern: declare double @printd(double) -ready> <b>def binary : 1 (x y) 0; # Low-precedence operator that ignores operands.</b> -.. -ready> <b>printd(123) : printd(456) : printd(789);</b> -123.000000 -456.000000 -789.000000 -Evaluated to 0.000000 -</pre> -</div> - -<p>We can also define a bunch of other "primitive" operations, such as:</p> - -<div class="doc_code"> -<pre> -# Logical unary not. -def unary!(v) - if v then - 0 - else - 1; - -# Unary negate. -def unary-(v) - 0-v; - -# Define > with the same precedence as <. -def binary> 10 (LHS RHS) - RHS < LHS; - -# Binary logical or, which does not short circuit. -def binary| 5 (LHS RHS) - if LHS then - 1 - else if RHS then - 1 - else - 0; - -# Binary logical and, which does not short circuit. -def binary& 6 (LHS RHS) - if !LHS then - 0 - else - !!RHS; - -# Define = with slightly lower precedence than relationals. -def binary = 9 (LHS RHS) - !(LHS < RHS | LHS > RHS); - -</pre> -</div> - - -<p>Given the previous if/then/else support, we can also define interesting -functions for I/O. For example, the following prints out a character whose -"density" reflects the value passed in: the lower the value, the denser the -character:</p> - -<div class="doc_code"> -<pre> -ready> -<b> -extern putchard(char) -def printdensity(d) - if d > 8 then - putchard(32) # ' ' - else if d > 4 then - putchard(46) # '.' - else if d > 2 then - putchard(43) # '+' - else - putchard(42); # '*'</b> -... -ready> <b>printdensity(1): printdensity(2): printdensity(3) : - printdensity(4): printdensity(5): printdensity(9): putchard(10);</b> -*++.. -Evaluated to 0.000000 -</pre> -</div> - -<p>Based on these simple primitive operations, we can start to define more -interesting things. For example, here's a little function that solves for the -number of iterations it takes a function in the complex plane to -converge:</p> - -<div class="doc_code"> -<pre> -# determine whether the specific location diverges. -# Solve for z = z^2 + c in the complex plane. 
-def mandleconverger(real imag iters creal cimag) - if iters > 255 | (real*real + imag*imag > 4) then - iters - else - mandleconverger(real*real - imag*imag + creal, - 2*real*imag + cimag, - iters+1, creal, cimag); - -# return the number of iterations required for the iteration to escape -def mandleconverge(real imag) - mandleconverger(real, imag, 0, real, imag); -</pre> -</div> - -<p>This "z = z<sup>2</sup> + c" function is a beautiful little creature that is the basis -for computation of the <a -href="http://en.wikipedia.org/wiki/Mandelbrot_set">Mandelbrot Set</a>. Our -<tt>mandelconverge</tt> function returns the number of iterations that it takes -for a complex orbit to escape, saturating to 255. This is not a very useful -function by itself, but if you plot its value over a two-dimensional plane, -you can see the Mandelbrot set. Given that we are limited to using putchard -here, our amazing graphical output is limited, but we can whip together -something using the density plotter above:</p> - -<div class="doc_code"> -<pre> -# compute and plot the mandlebrot set with the specified 2 dimensional range -# info. -def mandelhelp(xmin xmax xstep ymin ymax ystep) - for y = ymin, y < ymax, ystep in ( - (for x = xmin, x < xmax, xstep in - printdensity(mandleconverge(x,y))) - : putchard(10) - ) - -# mandel - This is a convenient helper function for plotting the mandelbrot set -# from the specified position with the specified Magnification. -def mandel(realstart imagstart realmag imagmag) - mandelhelp(realstart, realstart+realmag*78, realmag, - imagstart, imagstart+imagmag*40, imagmag); -</pre> -</div> - -<p>Given this, we can try plotting out the mandlebrot set! Lets try it out:</p> - -<div class="doc_code"> -<pre> -ready> <b>mandel(-2.3, -1.3, 0.05, 0.07);</b> -*******************************+++++++++++************************************* -*************************+++++++++++++++++++++++******************************* -**********************+++++++++++++++++++++++++++++**************************** -*******************+++++++++++++++++++++.. ...++++++++************************* -*****************++++++++++++++++++++++.... ...+++++++++*********************** -***************+++++++++++++++++++++++..... ...+++++++++********************* -**************+++++++++++++++++++++++.... ....+++++++++******************** -*************++++++++++++++++++++++...... .....++++++++******************* -************+++++++++++++++++++++....... .......+++++++****************** -***********+++++++++++++++++++.... ... .+++++++***************** -**********+++++++++++++++++....... .+++++++**************** -*********++++++++++++++........... ...+++++++*************** -********++++++++++++............ ...++++++++************** -********++++++++++... .......... .++++++++************** -*******+++++++++..... .+++++++++************* -*******++++++++...... ..+++++++++************* -*******++++++....... ..+++++++++************* -*******+++++...... ..+++++++++************* -*******.... .... ...+++++++++************* -*******.... . ...+++++++++************* -*******+++++...... ...+++++++++************* -*******++++++....... ..+++++++++************* -*******++++++++...... .+++++++++************* -*******+++++++++..... ..+++++++++************* -********++++++++++... .......... .++++++++************** -********++++++++++++............ ...++++++++************** -*********++++++++++++++.......... ...+++++++*************** -**********++++++++++++++++........ .+++++++**************** -**********++++++++++++++++++++.... ... 
..+++++++**************** -***********++++++++++++++++++++++....... .......++++++++***************** -************+++++++++++++++++++++++...... ......++++++++****************** -**************+++++++++++++++++++++++.... ....++++++++******************** -***************+++++++++++++++++++++++..... ...+++++++++********************* -*****************++++++++++++++++++++++.... ...++++++++*********************** -*******************+++++++++++++++++++++......++++++++************************* -*********************++++++++++++++++++++++.++++++++*************************** -*************************+++++++++++++++++++++++******************************* -******************************+++++++++++++************************************ -******************************************************************************* -******************************************************************************* -******************************************************************************* -Evaluated to 0.000000 -ready> <b>mandel(-2, -1, 0.02, 0.04);</b> -**************************+++++++++++++++++++++++++++++++++++++++++++++++++++++ -***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -*********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++. -*******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++... -*****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++..... -***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........ -**************++++++++++++++++++++++++++++++++++++++++++++++++++++++........... -************+++++++++++++++++++++++++++++++++++++++++++++++++++++.............. -***********++++++++++++++++++++++++++++++++++++++++++++++++++........ . -**********++++++++++++++++++++++++++++++++++++++++++++++............. -********+++++++++++++++++++++++++++++++++++++++++++.................. -*******+++++++++++++++++++++++++++++++++++++++....................... -******+++++++++++++++++++++++++++++++++++........................... -*****++++++++++++++++++++++++++++++++............................ -*****++++++++++++++++++++++++++++............................... -****++++++++++++++++++++++++++...... ......................... -***++++++++++++++++++++++++......... ...... ........... -***++++++++++++++++++++++............ -**+++++++++++++++++++++.............. -**+++++++++++++++++++................ -*++++++++++++++++++................. -*++++++++++++++++............ ... -*++++++++++++++.............. -*+++....++++................ -*.......... ........... -* -*.......... ........... -*+++....++++................ -*++++++++++++++.............. -*++++++++++++++++............ ... -*++++++++++++++++++................. -**+++++++++++++++++++................ -**+++++++++++++++++++++.............. -***++++++++++++++++++++++............ -***++++++++++++++++++++++++......... ...... ........... -****++++++++++++++++++++++++++...... ......................... -*****++++++++++++++++++++++++++++............................... -*****++++++++++++++++++++++++++++++++............................ -******+++++++++++++++++++++++++++++++++++........................... -*******+++++++++++++++++++++++++++++++++++++++....................... -********+++++++++++++++++++++++++++++++++++++++++++.................. 
-Evaluated to 0.000000 -ready> <b>mandel(-0.9, -1.4, 0.02, 0.03);</b> -******************************************************************************* -******************************************************************************* -******************************************************************************* -**********+++++++++++++++++++++************************************************ -*+++++++++++++++++++++++++++++++++++++++*************************************** -+++++++++++++++++++++++++++++++++++++++++++++********************************** -++++++++++++++++++++++++++++++++++++++++++++++++++***************************** -++++++++++++++++++++++++++++++++++++++++++++++++++++++************************* -+++++++++++++++++++++++++++++++++++++++++++++++++++++++++********************** -+++++++++++++++++++++++++++++++++.........++++++++++++++++++******************* -+++++++++++++++++++++++++++++++.... ......+++++++++++++++++++**************** -+++++++++++++++++++++++++++++....... ........+++++++++++++++++++************** -++++++++++++++++++++++++++++........ ........++++++++++++++++++++************ -+++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++********** -++++++++++++++++++++++++++........... ....++++++++++++++++++++++******** -++++++++++++++++++++++++............. .......++++++++++++++++++++++****** -+++++++++++++++++++++++............. ........+++++++++++++++++++++++**** -++++++++++++++++++++++........... ..........++++++++++++++++++++++*** -++++++++++++++++++++........... .........++++++++++++++++++++++* -++++++++++++++++++............ ...........++++++++++++++++++++ -++++++++++++++++............... .............++++++++++++++++++ -++++++++++++++................. ...............++++++++++++++++ -++++++++++++.................. .................++++++++++++++ -+++++++++.................. .................+++++++++++++ -++++++........ . ......... ..++++++++++++ -++............ ...... ....++++++++++ -.............. ...++++++++++ -.............. ....+++++++++ -.............. .....++++++++ -............. ......++++++++ -........... .......++++++++ -......... ........+++++++ -......... ........+++++++ -......... ....+++++++ -........ ...+++++++ -....... ...+++++++ - ....+++++++ - .....+++++++ - ....+++++++ - ....+++++++ - ....+++++++ -Evaluated to 0.000000 -ready> <b>^D</b> -</pre> -</div> - -<p>At this point, you may be starting to realize that Kaleidoscope is a real -and powerful language. It may not be self-similar :), but it can be used to -plot things that are!</p> - -<p>With this, we conclude the "adding user-defined operators" chapter of the -tutorial. We have successfully augmented our language, adding the ability to -extend the language in the library, and we have shown how this can be used to -build a simple but interesting end-user application in Kaleidoscope. At this -point, Kaleidoscope can build a variety of applications that are functional and -can call functions with side-effects, but it can't actually define and mutate a -variable itself.</p> - -<p>Strikingly, variable mutation is an important feature of some -languages, and it is not at all obvious how to <a href="OCamlLangImpl7.html">add -support for mutable variables</a> without having to add an "SSA construction" -phase to your front-end. 
In the next chapter, we will describe how you can -add variable mutation without building SSA in your front-end.</p> - -</div> - - -<!-- *********************************************************************** --> -<h2><a name="code">Full Code Listing</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -Here is the complete code listing for our running example, enhanced with the -if/then/else and for expressions.. To build this example, use: -</p> - -<div class="doc_code"> -<pre> -# Compile -ocamlbuild toy.byte -# Run -./toy.byte -</pre> -</div> - -<p>Here is the code:</p> - -<dl> -<dt>_tags:</dt> -<dd class="doc_code"> -<pre> -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) -<*.{byte,native}>: g++, use_llvm, use_llvm_analysis -<*.{byte,native}>: use_llvm_executionengine, use_llvm_target -<*.{byte,native}>: use_llvm_scalar_opts, use_bindings -</pre> -</dd> - -<dt>myocamlbuild.ml:</dt> -<dd class="doc_code"> -<pre> -open Ocamlbuild_plugin;; - -ocaml_lib ~extern:true "llvm";; -ocaml_lib ~extern:true "llvm_analysis";; -ocaml_lib ~extern:true "llvm_executionengine";; -ocaml_lib ~extern:true "llvm_target";; -ocaml_lib ~extern:true "llvm_scalar_opts";; - -flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"; A"-cclib"; A"-rdynamic"]);; -dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; -</pre> -</dd> - -<dt>token.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char - - (* control *) - | If | Then | Else - | For | In - - (* operators *) - | Binary | Unary -</pre> -</dd> - -<dt>lexer.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. 
'9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - | "binary" -> [< 'Token.Binary; stream >] - | "unary" -> [< 'Token.Unary; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] -</pre> -</dd> - -<dt>ast.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a unary operator. *) - | Unary of char * expr - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - - (* variant for if/then/else. *) - | If of expr * expr * expr - - (* variant for for/in. *) - | For of string * expr * expr * expr option * expr - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = - | Prototype of string * string array - | BinOpPrototype of string * string array * int - -(* func - This type represents a function definition itself. *) -type func = Function of proto * expr -</pre> -</dd> - -<dt>parser.ml:</dt> -<dd class="doc_code"> -<pre> -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr - * ::= ifexpr - * ::= forexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) - | [< 'Token.If; c=parse_expr; - 'Token.Then ?? "expected 'then'"; t=parse_expr; - 'Token.Else ?? 
"expected 'else'"; e=parse_expr >] -> - Ast.If (c, t, e) - - (* forexpr - ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) - | [< 'Token.For; - 'Token.Ident id ?? "expected identifier after for"; - 'Token.Kwd '=' ?? "expected '=' after for"; - stream >] -> - begin parser - | [< - start=parse_expr; - 'Token.Kwd ',' ?? "expected ',' after for"; - end_=parse_expr; - stream >] -> - let step = - begin parser - | [< 'Token.Kwd ','; step=parse_expr >] -> Some step - | [< >] -> None - end stream - in - begin parser - | [< 'Token.In; body=parse_expr >] -> - Ast.For (id, start, end_, step, body) - | [< >] -> - raise (Stream.Error "expected 'in' after for") - end stream - | [< >] -> - raise (Stream.Error "expected '=' after for") - end stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* unary - * ::= primary - * ::= '!' unary *) -and parse_unary = parser - (* If this is a unary operator, read it. *) - | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] -> - Ast.Unary (op, operand) - - (* If the current token is not an operator, it must be a primary expr. *) - | [< stream >] -> parse_primary stream - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the unary expression after the binary operator. *) - let rhs = parse_unary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' - * ::= binary LETTER number? (id, id) - * ::= unary LETTER number? (id) *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - let parse_operator = parser - | [< 'Token.Unary >] -> "unary", 1 - | [< 'Token.Binary >] -> "binary", 2 - in - let parse_binary_precedence = parser - | [< 'Token.Number n >] -> int_of_float n - | [< >] -> 30 - in - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - | [< (prefix, kind)=parse_operator; - 'Token.Kwd op ?? "expected an operator"; - (* Read the precedence if present. *) - binary_precedence=parse_binary_precedence; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - let name = prefix ^ (String.make 1 op) in - let args = Array.of_list (List.rev args) in - - (* Verify right number of arguments for operator. 
*) - if Array.length args != kind - then raise (Stream.Error "invalid number of operands for operator") - else - if kind == 1 then - Ast.Prototype (name, args) - else - Ast.BinOpPrototype (name, args, binary_precedence) - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e -</pre> -</dd> - -<dt>codegen.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - -open Llvm - -exception Error of string - -let context = global_context () -let the_module = create_module context "my cool jit" -let builder = builder context -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -let double_type = double_type context - -let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) - | Ast.Unary (op, operand) -> - let operand = codegen_expr operand in - let callee = "unary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown unary operator") - in - build_call callee [|operand|] "unop" builder - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> - (* If it wasn't a builtin binary operator, it must be a user defined - * one. Emit a call to it. *) - let callee = "binary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "binary operator not found!") - in - build_call callee [|lhs_val; rhs_val|] "binop" builder - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - | Ast.If (cond, then_, else_) -> - let cond = codegen_expr cond in - - (* Convert condition to a bool by comparing equal to 0.0 *) - let zero = const_float double_type 0.0 in - let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in - - (* Grab the first block so that we might later add the conditional branch - * to it at the end of the function. 
*) - let start_bb = insertion_block builder in - let the_function = block_parent start_bb in - - let then_bb = append_block context "then" the_function in - - (* Emit 'then' value. *) - position_at_end then_bb builder; - let then_val = codegen_expr then_ in - - (* Codegen of 'then' can change the current block, update then_bb for the - * phi. We create a new name because one is used for the phi node, and the - * other is used for the conditional branch. *) - let new_then_bb = insertion_block builder in - - (* Emit 'else' value. *) - let else_bb = append_block context "else" the_function in - position_at_end else_bb builder; - let else_val = codegen_expr else_ in - - (* Codegen of 'else' can change the current block, update else_bb for the - * phi. *) - let new_else_bb = insertion_block builder in - - (* Emit merge block. *) - let merge_bb = append_block context "ifcont" the_function in - position_at_end merge_bb builder; - let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in - let phi = build_phi incoming "iftmp" builder in - - (* Return to the start block to add the conditional branch. *) - position_at_end start_bb builder; - ignore (build_cond_br cond_val then_bb else_bb builder); - - (* Set a unconditional branch at the end of the 'then' block and the - * 'else' block to the 'merge' block. *) - position_at_end new_then_bb builder; ignore (build_br merge_bb builder); - position_at_end new_else_bb builder; ignore (build_br merge_bb builder); - - (* Finally, set the builder to the end of the merge block. *) - position_at_end merge_bb builder; - - phi - | Ast.For (var_name, start, end_, step, body) -> - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in - - (* Make the new basic block for the loop header, inserting after current - * block. *) - let preheader_bb = insertion_block builder in - let the_function = block_parent preheader_bb in - let loop_bb = append_block context "loop" the_function in - - (* Insert an explicit fall through from the current block to the - * loop_bb. *) - ignore (build_br loop_bb builder); - - (* Start insertion in loop_bb. *) - position_at_end loop_bb builder; - - (* Start the PHI node with an entry for start. *) - let variable = build_phi [(start_val, preheader_bb)] var_name builder in - - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - Hashtbl.add named_values var_name variable; - - (* Emit the body of the loop. This, like any other expr, can change the - * current BB. Note that we ignore the value computed by the body, but - * don't allow an error *) - ignore (codegen_expr body); - - (* Emit the step value. *) - let step_val = - match step with - | Some step -> codegen_expr step - (* If not specified, use 1.0. *) - | None -> const_float double_type 1.0 - in - - let next_var = build_add variable step_val "nextvar" builder in - - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - (* Convert condition to a bool by comparing equal to 0.0. *) - let zero = const_float double_type 0.0 in - let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in - - (* Create the "after loop" block and insert it. 
*) - let loop_end_bb = insertion_block builder in - let after_bb = append_block context "afterloop" the_function in - - (* Insert the conditional branch into the end of loop_end_bb. *) - ignore (build_cond_br end_cond loop_bb after_bb builder); - - (* Any new code will be inserted in after_bb. *) - position_at_end after_bb builder; - - (* Add a new entry to the PHI node for the backedge. *) - add_incoming (next_var, loop_end_bb) variable; - - (* Restore the unshadowed variable. *) - begin match old_val with - | Some old_val -> Hashtbl.add named_values var_name old_val - | None -> () - end; - - (* for expr always returns 0.0. *) - const_null double_type - -let codegen_proto = function - | Ast.Prototype (name, args) | Ast.BinOpPrototype (name, args, _) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - -let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* If this is an operator, install it. *) - begin match proto with - | Ast.BinOpPrototype (name, args, prec) -> - let op = name.[String.length name - 1] in - Hashtbl.add Parser.binop_precedence op prec; - | _ -> () - end; - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. *) - let _ = PassManager.run_function the_function the_fpm in - - the_function - with e -> - delete_function the_function; - raise e -</pre> -</dd> - -<dt>toplevel.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine - -(* top ::= definition | external | expression | ';' *) -let rec main_loop the_fpm the_execution_engine stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. 
*) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop the_fpm the_execution_engine stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func the_fpm e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function, returning a function pointer. *) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop the_fpm the_execution_engine stream -</pre> -</dd> - -<dt>toy.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine -open Llvm_target -open Llvm_scalar_opts - -let main () = - ignore (initialize_native_target ()); - - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combination the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. *) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). *) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop the_fpm the_execution_engine stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module -;; - -main () -</pre> -</dd> - -<dt>bindings.c</dt> -<dd class="doc_code"> -<pre> -#include <stdio.h> - -/* putchard - putchar that takes a double and returns 0. */ -extern double putchard(double X) { - putchar((char)X); - return 0; -} - -/* printd - printf that takes a double prints it as "%f\n", returning 0. 
*/ -extern double printd(double X) { - printf("%f\n", X); - return 0; -} -</pre> -</dd> -</dl> - -<a href="OCamlLangImpl7.html">Next: Extending the language: mutable variables / -SSA construction</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/OCamlLangImpl6.rst b/docs/tutorial/OCamlLangImpl6.rst new file mode 100644 index 0000000000..7665647736 --- /dev/null +++ b/docs/tutorial/OCamlLangImpl6.rst @@ -0,0 +1,1444 @@ +============================================================ +Kaleidoscope: Extending the Language: User-defined Operators +============================================================ + +.. contents:: + :local: + +Written by `Chris Lattner <mailto:sabre@nondot.org>`_ and `Erick +Tryzelaar <mailto:idadesub@users.sourceforge.net>`_ + +Chapter 6 Introduction +====================== + +Welcome to Chapter 6 of the "`Implementing a language with +LLVM <index.html>`_" tutorial. At this point in our tutorial, we now +have a fully functional language that is fairly minimal, but also +useful. There is still one big problem with it, however. Our language +doesn't have many useful operators (like division, logical negation, or +even any comparisons besides less-than). + +This chapter of the tutorial takes a wild digression into adding +user-defined operators to the simple and beautiful Kaleidoscope +language. This digression now gives us a simple and ugly language in +some ways, but also a powerful one at the same time. One of the great +things about creating your own language is that you get to decide what +is good or bad. In this tutorial we'll assume that it is okay to use +this as a way to show some interesting parsing techniques. + +At the end of this tutorial, we'll run through an example Kaleidoscope +application that `renders the Mandelbrot set <#example>`_. This gives an +example of what you can build with Kaleidoscope and its feature set. + +User-defined Operators: the Idea +================================ + +The "operator overloading" that we will add to Kaleidoscope is more +general than in languages like C++. In C++, you are only allowed to +redefine existing operators: you can't programmatically change the +grammar, introduce new operators, change precedence levels, etc. In this +chapter, we will add this capability to Kaleidoscope, which will let the +user round out the set of operators that are supported. + +The point of going into user-defined operators in a tutorial like this +is to show the power and flexibility of using a hand-written parser. +Thus far, the parser we have been implementing uses recursive descent +for most parts of the grammar and operator precedence parsing for the +expressions. See `Chapter 2 <OCamlLangImpl2.html>`_ for details. Without +using operator precedence parsing, it would be very difficult to allow +the programmer to introduce new operators into the grammar: the grammar +is dynamically extensible as the JIT runs.
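+
+Concretely, nothing in the expression parser hard-codes which characters
+are binary operators: as we will see below, ``parse_bin_rhs`` simply asks
+the ``binop_precedence`` table whether the character it is looking at has
+a precedence assigned. The sketch below is our own illustration (not part
+of the tutorial sources) of what "extending the grammar" at run time
+boils down to; it assumes ``'|'`` has not been registered yet:
+
+.. code-block:: ocaml
+
+    let () =
+      (* Before registration, '|' is not recognized as a binary operator. *)
+      assert (Parser.precedence '|' = -1);
+      (* One table update later, it is, at precedence 5. *)
+      Hashtbl.add Parser.binop_precedence '|' 5;
+      assert (Parser.precedence '|' = 5)
+
+As shown later in this chapter, ``codegen_func`` performs exactly this
+registration automatically when it compiles a
+``def binary| 5 (LHS RHS) ...`` definition.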
+ +The two specific features we'll add are programmable unary operators +(right now, Kaleidoscope has no unary operators at all) as well as +binary operators. An example of this is: + +:: + + # Logical unary not. + def unary!(v) + if v then + 0 + else + 1; + + # Define > with the same precedence as <. + def binary> 10 (LHS RHS) + RHS < LHS; + + # Binary "logical or", (note that it does not "short circuit") + def binary| 5 (LHS RHS) + if LHS then + 1 + else if RHS then + 1 + else + 0; + + # Define = with slightly lower precedence than relationals. + def binary= 9 (LHS RHS) + !(LHS < RHS | LHS > RHS); + +Many languages aspire to being able to implement their standard runtime +library in the language itself. In Kaleidoscope, we can implement +significant parts of the language in the library! + +We will break down implementation of these features into two parts: +implementing support for user-defined binary operators and adding unary +operators. + +User-defined Binary Operators +============================= + +Adding support for user-defined binary operators is pretty simple with +our current framework. We'll first add support for the unary/binary +keywords: + +.. code-block:: ocaml + + type token = + ... + (* operators *) + | Binary | Unary + + ... + + and lex_ident buffer = parser + ... + | "for" -> [< 'Token.For; stream >] + | "in" -> [< 'Token.In; stream >] + | "binary" -> [< 'Token.Binary; stream >] + | "unary" -> [< 'Token.Unary; stream >] + +This just adds lexer support for the unary and binary keywords, like we +did in `previous chapters <OCamlLangImpl5.html#iflexer>`_. One nice +thing about our current AST, is that we represent binary operators with +full generalisation by using their ASCII code as the opcode. For our +extended operators, we'll use this same representation, so we don't need +any new AST or parser support. + +On the other hand, we have to be able to represent the definitions of +these new operators, in the "def binary\| 5" part of the function +definition. In our grammar so far, the "name" for the function +definition is parsed as the "prototype" production and into the +``Ast.Prototype`` AST node. To represent our new user-defined operators +as prototypes, we have to extend the ``Ast.Prototype`` AST node like +this: + +.. code-block:: ocaml + + (* proto - This type represents the "prototype" for a function, which captures + * its name, and its argument names (thus implicitly the number of arguments the + * function takes). *) + type proto = + | Prototype of string * string array + | BinOpPrototype of string * string array * int + +Basically, in addition to knowing a name for the prototype, we now keep +track of whether it was an operator, and if it was, what precedence +level the operator is at. The precedence is only used for binary +operators (as you'll see below, it just doesn't apply for unary +operators). Now that we have a way to represent the prototype for a +user-defined operator, we need to parse it: + +.. code-block:: ocaml + + (* prototype + * ::= id '(' id* ')' + * ::= binary LETTER number? (id, id) + * ::= unary LETTER number? (id) *) + let parse_prototype = + let rec parse_args accumulator = parser + | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e + | [< >] -> accumulator + in + let parse_operator = parser + | [< 'Token.Unary >] -> "unary", 1 + | [< 'Token.Binary >] -> "binary", 2 + in + let parse_binary_precedence = parser + | [< 'Token.Number n >] -> int_of_float n + | [< >] -> 30 + in + parser + | [< 'Token.Ident id; + 'Token.Kwd '(' ?? 
"expected '(' in prototype"; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> + (* success. *) + Ast.Prototype (id, Array.of_list (List.rev args)) + | [< (prefix, kind)=parse_operator; + 'Token.Kwd op ?? "expected an operator"; + (* Read the precedence if present. *) + binary_precedence=parse_binary_precedence; + 'Token.Kwd '(' ?? "expected '(' in prototype"; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> + let name = prefix ^ (String.make 1 op) in + let args = Array.of_list (List.rev args) in + + (* Verify right number of arguments for operator. *) + if Array.length args != kind + then raise (Stream.Error "invalid number of operands for operator") + else + if kind == 1 then + Ast.Prototype (name, args) + else + Ast.BinOpPrototype (name, args, binary_precedence) + | [< >] -> + raise (Stream.Error "expected function name in prototype") + +This is all fairly straightforward parsing code, and we have already +seen a lot of similar code in the past. One interesting part about the +code above is the couple lines that set up ``name`` for binary +operators. This builds names like "binary@" for a newly defined "@" +operator. This then takes advantage of the fact that symbol names in the +LLVM symbol table are allowed to have any character in them, including +embedded nul characters. + +The next interesting thing to add, is codegen support for these binary +operators. Given our current structure, this is a simple addition of a +default case for our existing binary operator node: + +.. code-block:: ocaml + + let codegen_expr = function + ... + | Ast.Binary (op, lhs, rhs) -> + let lhs_val = codegen_expr lhs in + let rhs_val = codegen_expr rhs in + begin + match op with + | '+' -> build_add lhs_val rhs_val "addtmp" builder + | '-' -> build_sub lhs_val rhs_val "subtmp" builder + | '*' -> build_mul lhs_val rhs_val "multmp" builder + | '<' -> + (* Convert bool 0/1 to double 0.0 or 1.0 *) + let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in + build_uitofp i double_type "booltmp" builder + | _ -> + (* If it wasn't a builtin binary operator, it must be a user defined + * one. Emit a call to it. *) + let callee = "binary" ^ (String.make 1 op) in + let callee = + match lookup_function callee the_module with + | Some callee -> callee + | None -> raise (Error "binary operator not found!") + in + build_call callee [|lhs_val; rhs_val|] "binop" builder + end + +As you can see above, the new code is actually really simple. It just +does a lookup for the appropriate operator in the symbol table and +generates a function call to it. Since user-defined operators are just +built as normal functions (because the "prototype" boils down to a +function with the right name) everything falls into place. + +The final piece of code we are missing, is a bit of top level magic: + +.. code-block:: ocaml + + let codegen_func the_fpm = function + | Ast.Function (proto, body) -> + Hashtbl.clear named_values; + let the_function = codegen_proto proto in + + (* If this is an operator, install it. *) + begin match proto with + | Ast.BinOpPrototype (name, args, prec) -> + let op = name.[String.length name - 1] in + Hashtbl.add Parser.binop_precedence op prec; + | _ -> () + end; + + (* Create a new basic block to start insertion into. *) + let bb = append_block context "entry" the_function in + position_at_end bb builder; + ... + +Basically, before codegening a function, if it is a user-defined +operator, we register it in the precedence table. 
This allows the binary +operator parsing logic we already have in place to handle it. Since we +are working on a fully-general operator precedence parser, this is all +we need to do to "extend the grammar". + +Now we have useful user-defined binary operators. This builds a lot on +the previous framework we built for other operators. Adding unary +operators is a bit more challenging, because we don't have any framework +for it yet - lets see what it takes. + +User-defined Unary Operators +============================ + +Since we don't currently support unary operators in the Kaleidoscope +language, we'll need to add everything to support them. Above, we added +simple support for the 'unary' keyword to the lexer. In addition to +that, we need an AST node: + +.. code-block:: ocaml + + type expr = + ... + (* variant for a unary operator. *) + | Unary of char * expr + ... + +This AST node is very simple and obvious by now. It directly mirrors the +binary operator AST node, except that it only has one child. With this, +we need to add the parsing logic. Parsing a unary operator is pretty +simple: we'll add a new function to do it: + +.. code-block:: ocaml + + (* unary + * ::= primary + * ::= '!' unary *) + and parse_unary = parser + (* If this is a unary operator, read it. *) + | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] -> + Ast.Unary (op, operand) + + (* If the current token is not an operator, it must be a primary expr. *) + | [< stream >] -> parse_primary stream + +The grammar we add is pretty straightforward here. If we see a unary +operator when parsing a primary operator, we eat the operator as a +prefix and parse the remaining piece as another unary operator. This +allows us to handle multiple unary operators (e.g. "!!x"). Note that +unary operators can't have ambiguous parses like binary operators can, +so there is no need for precedence information. + +The problem with this function, is that we need to call ParseUnary from +somewhere. To do this, we change previous callers of ParsePrimary to +call ``parse_unary`` instead: + +.. code-block:: ocaml + + (* binoprhs + * ::= ('+' primary)* *) + and parse_bin_rhs expr_prec lhs stream = + ... + (* Parse the unary expression after the binary operator. *) + let rhs = parse_unary stream in + ... + + ... + + (* expression + * ::= primary binoprhs *) + and parse_expr = parser + | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream + +With these two simple changes, we are now able to parse unary operators +and build the AST for them. Next up, we need to add parser support for +prototypes, to parse the unary operator prototype. We extend the binary +operator code above with: + +.. code-block:: ocaml + + (* prototype + * ::= id '(' id* ')' + * ::= binary LETTER number? (id, id) + * ::= unary LETTER number? (id) *) + let parse_prototype = + let rec parse_args accumulator = parser + | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e + | [< >] -> accumulator + in + let parse_operator = parser + | [< 'Token.Unary >] -> "unary", 1 + | [< 'Token.Binary >] -> "binary", 2 + in + let parse_binary_precedence = parser + | [< 'Token.Number n >] -> int_of_float n + | [< >] -> 30 + in + parser + | [< 'Token.Ident id; + 'Token.Kwd '(' ?? "expected '(' in prototype"; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> + (* success. *) + Ast.Prototype (id, Array.of_list (List.rev args)) + | [< (prefix, kind)=parse_operator; + 'Token.Kwd op ?? 
"expected an operator"; + (* Read the precedence if present. *) + binary_precedence=parse_binary_precedence; + 'Token.Kwd '(' ?? "expected '(' in prototype"; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> + let name = prefix ^ (String.make 1 op) in + let args = Array.of_list (List.rev args) in + + (* Verify right number of arguments for operator. *) + if Array.length args != kind + then raise (Stream.Error "invalid number of operands for operator") + else + if kind == 1 then + Ast.Prototype (name, args) + else + Ast.BinOpPrototype (name, args, binary_precedence) + | [< >] -> + raise (Stream.Error "expected function name in prototype") + +As with binary operators, we name unary operators with a name that +includes the operator character. This assists us at code generation +time. Speaking of, the final piece we need to add is codegen support for +unary operators. It looks like this: + +.. code-block:: ocaml + + let rec codegen_expr = function + ... + | Ast.Unary (op, operand) -> + let operand = codegen_expr operand in + let callee = "unary" ^ (String.make 1 op) in + let callee = + match lookup_function callee the_module with + | Some callee -> callee + | None -> raise (Error "unknown unary operator") + in + build_call callee [|operand|] "unop" builder + +This code is similar to, but simpler than, the code for binary +operators. It is simpler primarily because it doesn't need to handle any +predefined operators. + +Kicking the Tires +================= + +It is somewhat hard to believe, but with a few simple extensions we've +covered in the last chapters, we have grown a real-ish language. With +this, we can do a lot of interesting things, including I/O, math, and a +bunch of other things. For example, we can now add a nice sequencing +operator (printd is defined to print out the specified value and a +newline): + +:: + + ready> extern printd(x); + Read extern: declare double @printd(double) + ready> def binary : 1 (x y) 0; # Low-precedence operator that ignores operands. + .. + ready> printd(123) : printd(456) : printd(789); + 123.000000 + 456.000000 + 789.000000 + Evaluated to 0.000000 + +We can also define a bunch of other "primitive" operations, such as: + +:: + + # Logical unary not. + def unary!(v) + if v then + 0 + else + 1; + + # Unary negate. + def unary-(v) + 0-v; + + # Define > with the same precedence as <. + def binary> 10 (LHS RHS) + RHS < LHS; + + # Binary logical or, which does not short circuit. + def binary| 5 (LHS RHS) + if LHS then + 1 + else if RHS then + 1 + else + 0; + + # Binary logical and, which does not short circuit. + def binary& 6 (LHS RHS) + if !LHS then + 0 + else + !!RHS; + + # Define = with slightly lower precedence than relationals. + def binary = 9 (LHS RHS) + !(LHS < RHS | LHS > RHS); + +Given the previous if/then/else support, we can also define interesting +functions for I/O. For example, the following prints out a character +whose "density" reflects the value passed in: the lower the value, the +denser the character: + +:: + + ready> + + extern putchard(char) + def printdensity(d) + if d > 8 then + putchard(32) # ' ' + else if d > 4 then + putchard(46) # '.' + else if d > 2 then + putchard(43) # '+' + else + putchard(42); # '*' + ... + ready> printdensity(1): printdensity(2): printdensity(3) : + printdensity(4): printdensity(5): printdensity(9): putchard(10); + *++.. + Evaluated to 0.000000 + +Based on these simple primitive operations, we can start to define more +interesting things. 
For example, here's a little function that solves +for the number of iterations it takes a function in the complex plane to +converge: + +:: + + # determine whether the specific location diverges. + # Solve for z = z^2 + c in the complex plane. + def mandleconverger(real imag iters creal cimag) + if iters > 255 | (real*real + imag*imag > 4) then + iters + else + mandleconverger(real*real - imag*imag + creal, + 2*real*imag + cimag, + iters+1, creal, cimag); + + # return the number of iterations required for the iteration to escape + def mandleconverge(real imag) + mandleconverger(real, imag, 0, real, imag); + +This "z = z\ :sup:`2`\ + c" function is a beautiful little creature +that is the basis for computation of the `Mandelbrot +Set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_. Our +``mandelconverge`` function returns the number of iterations that it +takes for a complex orbit to escape, saturating to 255. This is not a +very useful function by itself, but if you plot its value over a +two-dimensional plane, you can see the Mandelbrot set. Given that we are +limited to using putchard here, our amazing graphical output is limited, +but we can whip together something using the density plotter above: + +:: + + # compute and plot the mandlebrot set with the specified 2 dimensional range + # info. + def mandelhelp(xmin xmax xstep ymin ymax ystep) + for y = ymin, y < ymax, ystep in ( + (for x = xmin, x < xmax, xstep in + printdensity(mandleconverge(x,y))) + : putchard(10) + ) + + # mandel - This is a convenient helper function for plotting the mandelbrot set + # from the specified position with the specified Magnification. + def mandel(realstart imagstart realmag imagmag) + mandelhelp(realstart, realstart+realmag*78, realmag, + imagstart, imagstart+imagmag*40, imagmag); + +Given this, we can try plotting out the mandlebrot set! Lets try it out: + +:: + + ready> mandel(-2.3, -1.3, 0.05, 0.07); + *******************************+++++++++++************************************* + *************************+++++++++++++++++++++++******************************* + **********************+++++++++++++++++++++++++++++**************************** + *******************+++++++++++++++++++++.. ...++++++++************************* + *****************++++++++++++++++++++++.... ...+++++++++*********************** + ***************+++++++++++++++++++++++..... ...+++++++++********************* + **************+++++++++++++++++++++++.... ....+++++++++******************** + *************++++++++++++++++++++++...... .....++++++++******************* + ************+++++++++++++++++++++....... .......+++++++****************** + ***********+++++++++++++++++++.... ... .+++++++***************** + **********+++++++++++++++++....... .+++++++**************** + *********++++++++++++++........... ...+++++++*************** + ********++++++++++++............ ...++++++++************** + ********++++++++++... .......... .++++++++************** + *******+++++++++..... .+++++++++************* + *******++++++++...... ..+++++++++************* + *******++++++....... ..+++++++++************* + *******+++++...... ..+++++++++************* + *******.... .... ...+++++++++************* + *******.... . ...+++++++++************* + *******+++++...... ...+++++++++************* + *******++++++....... ..+++++++++************* + *******++++++++...... .+++++++++************* + *******+++++++++..... ..+++++++++************* + ********++++++++++... .......... .++++++++************** + ********++++++++++++............ 
...++++++++************** + *********++++++++++++++.......... ...+++++++*************** + **********++++++++++++++++........ .+++++++**************** + **********++++++++++++++++++++.... ... ..+++++++**************** + ***********++++++++++++++++++++++....... .......++++++++***************** + ************+++++++++++++++++++++++...... ......++++++++****************** + **************+++++++++++++++++++++++.... ....++++++++******************** + ***************+++++++++++++++++++++++..... ...+++++++++********************* + *****************++++++++++++++++++++++.... ...++++++++*********************** + *******************+++++++++++++++++++++......++++++++************************* + *********************++++++++++++++++++++++.++++++++*************************** + *************************+++++++++++++++++++++++******************************* + ******************************+++++++++++++************************************ + ******************************************************************************* + ******************************************************************************* + ******************************************************************************* + Evaluated to 0.000000 + ready> mandel(-2, -1, 0.02, 0.04); + **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++ + ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++. + *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++... + *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++..... + ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........ + **************++++++++++++++++++++++++++++++++++++++++++++++++++++++........... + ************+++++++++++++++++++++++++++++++++++++++++++++++++++++.............. + ***********++++++++++++++++++++++++++++++++++++++++++++++++++........ . + **********++++++++++++++++++++++++++++++++++++++++++++++............. + ********+++++++++++++++++++++++++++++++++++++++++++.................. + *******+++++++++++++++++++++++++++++++++++++++....................... + ******+++++++++++++++++++++++++++++++++++........................... + *****++++++++++++++++++++++++++++++++............................ + *****++++++++++++++++++++++++++++............................... + ****++++++++++++++++++++++++++...... ......................... + ***++++++++++++++++++++++++......... ...... ........... + ***++++++++++++++++++++++............ + **+++++++++++++++++++++.............. + **+++++++++++++++++++................ + *++++++++++++++++++................. + *++++++++++++++++............ ... + *++++++++++++++.............. + *+++....++++................ + *.......... ........... + * + *.......... ........... + *+++....++++................ + *++++++++++++++.............. + *++++++++++++++++............ ... + *++++++++++++++++++................. + **+++++++++++++++++++................ + **+++++++++++++++++++++.............. + ***++++++++++++++++++++++............ + ***++++++++++++++++++++++++......... ...... ........... + ****++++++++++++++++++++++++++...... ......................... + *****++++++++++++++++++++++++++++............................... + *****++++++++++++++++++++++++++++++++............................ + ******+++++++++++++++++++++++++++++++++++........................... + *******+++++++++++++++++++++++++++++++++++++++....................... 
+ ********+++++++++++++++++++++++++++++++++++++++++++.................. + Evaluated to 0.000000 + ready> mandel(-0.9, -1.4, 0.02, 0.03); + ******************************************************************************* + ******************************************************************************* + ******************************************************************************* + **********+++++++++++++++++++++************************************************ + *+++++++++++++++++++++++++++++++++++++++*************************************** + +++++++++++++++++++++++++++++++++++++++++++++********************************** + ++++++++++++++++++++++++++++++++++++++++++++++++++***************************** + ++++++++++++++++++++++++++++++++++++++++++++++++++++++************************* + +++++++++++++++++++++++++++++++++++++++++++++++++++++++++********************** + +++++++++++++++++++++++++++++++++.........++++++++++++++++++******************* + +++++++++++++++++++++++++++++++.... ......+++++++++++++++++++**************** + +++++++++++++++++++++++++++++....... ........+++++++++++++++++++************** + ++++++++++++++++++++++++++++........ ........++++++++++++++++++++************ + +++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++********** + ++++++++++++++++++++++++++........... ....++++++++++++++++++++++******** + ++++++++++++++++++++++++............. .......++++++++++++++++++++++****** + +++++++++++++++++++++++............. ........+++++++++++++++++++++++**** + ++++++++++++++++++++++........... ..........++++++++++++++++++++++*** + ++++++++++++++++++++........... .........++++++++++++++++++++++* + ++++++++++++++++++............ ...........++++++++++++++++++++ + ++++++++++++++++............... .............++++++++++++++++++ + ++++++++++++++................. ...............++++++++++++++++ + ++++++++++++.................. .................++++++++++++++ + +++++++++.................. .................+++++++++++++ + ++++++........ . ......... ..++++++++++++ + ++............ ...... ....++++++++++ + .............. ...++++++++++ + .............. ....+++++++++ + .............. .....++++++++ + ............. ......++++++++ + ........... .......++++++++ + ......... ........+++++++ + ......... ........+++++++ + ......... ....+++++++ + ........ ...+++++++ + ....... ...+++++++ + ....+++++++ + .....+++++++ + ....+++++++ + ....+++++++ + ....+++++++ + Evaluated to 0.000000 + ready> ^D + +At this point, you may be starting to realize that Kaleidoscope is a +real and powerful language. It may not be self-similar :), but it can be +used to plot things that are! + +With this, we conclude the "adding user-defined operators" chapter of +the tutorial. We have successfully augmented our language, adding the +ability to extend the language in the library, and we have shown how +this can be used to build a simple but interesting end-user application +in Kaleidoscope. At this point, Kaleidoscope can build a variety of +applications that are functional and can call functions with +side-effects, but it can't actually define and mutate a variable itself. + +Strikingly, variable mutation is an important feature of some languages, +and it is not at all obvious how to `add support for mutable +variables <OCamlLangImpl7.html>`_ without having to add an "SSA +construction" phase to your front-end. In the next chapter, we will +describe how you can add variable mutation without building SSA in your +front-end. 
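+
+Before moving on to the full listing, here is a small sketch of our own
+(it is not part of the tutorial sources, and the file name
+``check_ops.ml`` is made up) that drives the lexer and parser from the
+listing below on a user-defined operator definition, which is a handy
+way to poke at the new prototype parsing in isolation:
+
+.. code-block:: ocaml
+
+    (* check_ops.ml - assumes token.ml, lexer.ml, ast.ml and parser.ml from
+     * the listing below are compiled into the same ocamlbuild project. *)
+    let () =
+      (* '<' must be registered, or the body "RHS < LHS" parses as just "RHS". *)
+      Hashtbl.add Parser.binop_precedence '<' 10;
+      let tokens =
+        Lexer.lex (Stream.of_string "def binary> 10 (LHS RHS) RHS < LHS;")
+      in
+      match Parser.parse_definition tokens with
+      | Ast.Function (Ast.BinOpPrototype (name, args, prec), _) ->
+          Printf.printf "operator %s, %d operands, precedence %d\n"
+            name (Array.length args) prec
+      | Ast.Function (Ast.Prototype (name, _), _) ->
+          Printf.printf "ordinary function %s\n" name
+
+Running this should print ``operator binary>, 2 operands, precedence 10``.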
+ +Full Code Listing +================= + +Here is the complete code listing for our running example, enhanced with +the if/then/else and for expressions.. To build this example, use: + +.. code-block:: bash + + # Compile + ocamlbuild toy.byte + # Run + ./toy.byte + +Here is the code: + +\_tags: + :: + + <{lexer,parser}.ml>: use_camlp4, pp(camlp4of) + <*.{byte,native}>: g++, use_llvm, use_llvm_analysis + <*.{byte,native}>: use_llvm_executionengine, use_llvm_target + <*.{byte,native}>: use_llvm_scalar_opts, use_bindings + +myocamlbuild.ml: + .. code-block:: ocaml + + open Ocamlbuild_plugin;; + + ocaml_lib ~extern:true "llvm";; + ocaml_lib ~extern:true "llvm_analysis";; + ocaml_lib ~extern:true "llvm_executionengine";; + ocaml_lib ~extern:true "llvm_target";; + ocaml_lib ~extern:true "llvm_scalar_opts";; + + flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"; A"-cclib"; A"-rdynamic"]);; + dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; + +token.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Lexer Tokens + *===----------------------------------------------------------------------===*) + + (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of + * these others for known things. *) + type token = + (* commands *) + | Def | Extern + + (* primary *) + | Ident of string | Number of float + + (* unknown *) + | Kwd of char + + (* control *) + | If | Then | Else + | For | In + + (* operators *) + | Binary | Unary + +lexer.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Lexer + *===----------------------------------------------------------------------===*) + + let rec lex = parser + (* Skip any whitespace. *) + | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream + + (* identifier: [a-zA-Z][a-zA-Z0-9] *) + | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> + let buffer = Buffer.create 1 in + Buffer.add_char buffer c; + lex_ident buffer stream + + (* number: [0-9.]+ *) + | [< ' ('0' .. '9' as c); stream >] -> + let buffer = Buffer.create 1 in + Buffer.add_char buffer c; + lex_number buffer stream + + (* Comment until end of line. *) + | [< ' ('#'); stream >] -> + lex_comment stream + + (* Otherwise, just return the character as its ascii value. *) + | [< 'c; stream >] -> + [< 'Token.Kwd c; lex stream >] + + (* end of stream. *) + | [< >] -> [< >] + + and lex_number buffer = parser + | [< ' ('0' .. '9' | '.' as c); stream >] -> + Buffer.add_char buffer c; + lex_number buffer stream + | [< stream=lex >] -> + [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] + + and lex_ident buffer = parser + | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> + Buffer.add_char buffer c; + lex_ident buffer stream + | [< stream=lex >] -> + match Buffer.contents buffer with + | "def" -> [< 'Token.Def; stream >] + | "extern" -> [< 'Token.Extern; stream >] + | "if" -> [< 'Token.If; stream >] + | "then" -> [< 'Token.Then; stream >] + | "else" -> [< 'Token.Else; stream >] + | "for" -> [< 'Token.For; stream >] + | "in" -> [< 'Token.In; stream >] + | "binary" -> [< 'Token.Binary; stream >] + | "unary" -> [< 'Token.Unary; stream >] + | id -> [< 'Token.Ident id; stream >] + + and lex_comment = parser + | [< ' ('\n'); stream=lex >] -> stream + | [< 'c; e=lex_comment >] -> e + | [< >] -> [< >] + +ast.ml: + .. 
code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Abstract Syntax Tree (aka Parse Tree) + *===----------------------------------------------------------------------===*) + + (* expr - Base type for all expression nodes. *) + type expr = + (* variant for numeric literals like "1.0". *) + | Number of float + + (* variant for referencing a variable, like "a". *) + | Variable of string + + (* variant for a unary operator. *) + | Unary of char * expr + + (* variant for a binary operator. *) + | Binary of char * expr * expr + + (* variant for function calls. *) + | Call of string * expr array + + (* variant for if/then/else. *) + | If of expr * expr * expr + + (* variant for for/in. *) + | For of string * expr * expr * expr option * expr + + (* proto - This type represents the "prototype" for a function, which captures + * its name, and its argument names (thus implicitly the number of arguments the + * function takes). *) + type proto = + | Prototype of string * string array + | BinOpPrototype of string * string array * int + + (* func - This type represents a function definition itself. *) + type func = Function of proto * expr + +parser.ml: + .. code-block:: ocaml + + (*===---------------------------------------------------------------------=== + * Parser + *===---------------------------------------------------------------------===*) + + (* binop_precedence - This holds the precedence for each binary operator that is + * defined *) + let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 + + (* precedence - Get the precedence of the pending binary operator token. *) + let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 + + (* primary + * ::= identifier + * ::= numberexpr + * ::= parenexpr + * ::= ifexpr + * ::= forexpr *) + let rec parse_primary = parser + (* numberexpr ::= number *) + | [< 'Token.Number n >] -> Ast.Number n + + (* parenexpr ::= '(' expression ')' *) + | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e + + (* identifierexpr + * ::= identifier + * ::= identifier '(' argumentexpr ')' *) + | [< 'Token.Ident id; stream >] -> + let rec parse_args accumulator = parser + | [< e=parse_expr; stream >] -> + begin parser + | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e + | [< >] -> e :: accumulator + end stream + | [< >] -> accumulator + in + let rec parse_ident id = parser + (* Call. *) + | [< 'Token.Kwd '('; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')'">] -> + Ast.Call (id, Array.of_list (List.rev args)) + + (* Simple variable ref. *) + | [< >] -> Ast.Variable id + in + parse_ident id stream + + (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) + | [< 'Token.If; c=parse_expr; + 'Token.Then ?? "expected 'then'"; t=parse_expr; + 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> + Ast.If (c, t, e) + + (* forexpr + ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) + | [< 'Token.For; + 'Token.Ident id ?? "expected identifier after for"; + 'Token.Kwd '=' ?? "expected '=' after for"; + stream >] -> + begin parser + | [< + start=parse_expr; + 'Token.Kwd ',' ?? 
"expected ',' after for"; + end_=parse_expr; + stream >] -> + let step = + begin parser + | [< 'Token.Kwd ','; step=parse_expr >] -> Some step + | [< >] -> None + end stream + in + begin parser + | [< 'Token.In; body=parse_expr >] -> + Ast.For (id, start, end_, step, body) + | [< >] -> + raise (Stream.Error "expected 'in' after for") + end stream + | [< >] -> + raise (Stream.Error "expected '=' after for") + end stream + + | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") + + (* unary + * ::= primary + * ::= '!' unary *) + and parse_unary = parser + (* If this is a unary operator, read it. *) + | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] -> + Ast.Unary (op, operand) + + (* If the current token is not an operator, it must be a primary expr. *) + | [< stream >] -> parse_primary stream + + (* binoprhs + * ::= ('+' primary)* *) + and parse_bin_rhs expr_prec lhs stream = + match Stream.peek stream with + (* If this is a binop, find its precedence. *) + | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> + let token_prec = precedence c in + + (* If this is a binop that binds at least as tightly as the current binop, + * consume it, otherwise we are done. *) + if token_prec < expr_prec then lhs else begin + (* Eat the binop. *) + Stream.junk stream; + + (* Parse the unary expression after the binary operator. *) + let rhs = parse_unary stream in + + (* Okay, we know this is a binop. *) + let rhs = + match Stream.peek stream with + | Some (Token.Kwd c2) -> + (* If BinOp binds less tightly with rhs than the operator after + * rhs, let the pending operator take rhs as its lhs. *) + let next_prec = precedence c2 in + if token_prec < next_prec + then parse_bin_rhs (token_prec + 1) rhs stream + else rhs + | _ -> rhs + in + + (* Merge lhs/rhs. *) + let lhs = Ast.Binary (c, lhs, rhs) in + parse_bin_rhs expr_prec lhs stream + end + | _ -> lhs + + (* expression + * ::= primary binoprhs *) + and parse_expr = parser + | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream + + (* prototype + * ::= id '(' id* ')' + * ::= binary LETTER number? (id, id) + * ::= unary LETTER number? (id) *) + let parse_prototype = + let rec parse_args accumulator = parser + | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e + | [< >] -> accumulator + in + let parse_operator = parser + | [< 'Token.Unary >] -> "unary", 1 + | [< 'Token.Binary >] -> "binary", 2 + in + let parse_binary_precedence = parser + | [< 'Token.Number n >] -> int_of_float n + | [< >] -> 30 + in + parser + | [< 'Token.Ident id; + 'Token.Kwd '(' ?? "expected '(' in prototype"; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> + (* success. *) + Ast.Prototype (id, Array.of_list (List.rev args)) + | [< (prefix, kind)=parse_operator; + 'Token.Kwd op ?? "expected an operator"; + (* Read the precedence if present. *) + binary_precedence=parse_binary_precedence; + 'Token.Kwd '(' ?? "expected '(' in prototype"; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> + let name = prefix ^ (String.make 1 op) in + let args = Array.of_list (List.rev args) in + + (* Verify right number of arguments for operator. 
*) + if Array.length args != kind + then raise (Stream.Error "invalid number of operands for operator") + else + if kind == 1 then + Ast.Prototype (name, args) + else + Ast.BinOpPrototype (name, args, binary_precedence) + | [< >] -> + raise (Stream.Error "expected function name in prototype") + + (* definition ::= 'def' prototype expression *) + let parse_definition = parser + | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> + Ast.Function (p, e) + + (* toplevelexpr ::= expression *) + let parse_toplevel = parser + | [< e=parse_expr >] -> + (* Make an anonymous proto. *) + Ast.Function (Ast.Prototype ("", [||]), e) + + (* external ::= 'extern' prototype *) + let parse_extern = parser + | [< 'Token.Extern; e=parse_prototype >] -> e + +codegen.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Code Generation + *===----------------------------------------------------------------------===*) + + open Llvm + + exception Error of string + + let context = global_context () + let the_module = create_module context "my cool jit" + let builder = builder context + let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 + let double_type = double_type context + + let rec codegen_expr = function + | Ast.Number n -> const_float double_type n + | Ast.Variable name -> + (try Hashtbl.find named_values name with + | Not_found -> raise (Error "unknown variable name")) + | Ast.Unary (op, operand) -> + let operand = codegen_expr operand in + let callee = "unary" ^ (String.make 1 op) in + let callee = + match lookup_function callee the_module with + | Some callee -> callee + | None -> raise (Error "unknown unary operator") + in + build_call callee [|operand|] "unop" builder + | Ast.Binary (op, lhs, rhs) -> + let lhs_val = codegen_expr lhs in + let rhs_val = codegen_expr rhs in + begin + match op with + | '+' -> build_add lhs_val rhs_val "addtmp" builder + | '-' -> build_sub lhs_val rhs_val "subtmp" builder + | '*' -> build_mul lhs_val rhs_val "multmp" builder + | '<' -> + (* Convert bool 0/1 to double 0.0 or 1.0 *) + let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in + build_uitofp i double_type "booltmp" builder + | _ -> + (* If it wasn't a builtin binary operator, it must be a user defined + * one. Emit a call to it. *) + let callee = "binary" ^ (String.make 1 op) in + let callee = + match lookup_function callee the_module with + | Some callee -> callee + | None -> raise (Error "binary operator not found!") + in + build_call callee [|lhs_val; rhs_val|] "binop" builder + end + | Ast.Call (callee, args) -> + (* Look up the name in the module table. *) + let callee = + match lookup_function callee the_module with + | Some callee -> callee + | None -> raise (Error "unknown function referenced") + in + let params = params callee in + + (* If argument mismatch error. *) + if Array.length params == Array.length args then () else + raise (Error "incorrect # arguments passed"); + let args = Array.map codegen_expr args in + build_call callee args "calltmp" builder + | Ast.If (cond, then_, else_) -> + let cond = codegen_expr cond in + + (* Convert condition to a bool by comparing equal to 0.0 *) + let zero = const_float double_type 0.0 in + let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in + + (* Grab the first block so that we might later add the conditional branch + * to it at the end of the function. 
*) + let start_bb = insertion_block builder in + let the_function = block_parent start_bb in + + let then_bb = append_block context "then" the_function in + + (* Emit 'then' value. *) + position_at_end then_bb builder; + let then_val = codegen_expr then_ in + + (* Codegen of 'then' can change the current block, update then_bb for the + * phi. We create a new name because one is used for the phi node, and the + * other is used for the conditional branch. *) + let new_then_bb = insertion_block builder in + + (* Emit 'else' value. *) + let else_bb = append_block context "else" the_function in + position_at_end else_bb builder; + let else_val = codegen_expr else_ in + + (* Codegen of 'else' can change the current block, update else_bb for the + * phi. *) + let new_else_bb = insertion_block builder in + + (* Emit merge block. *) + let merge_bb = append_block context "ifcont" the_function in + position_at_end merge_bb builder; + let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in + let phi = build_phi incoming "iftmp" builder in + + (* Return to the start block to add the conditional branch. *) + position_at_end start_bb builder; + ignore (build_cond_br cond_val then_bb else_bb builder); + + (* Set a unconditional branch at the end of the 'then' block and the + * 'else' block to the 'merge' block. *) + position_at_end new_then_bb builder; ignore (build_br merge_bb builder); + position_at_end new_else_bb builder; ignore (build_br merge_bb builder); + + (* Finally, set the builder to the end of the merge block. *) + position_at_end merge_bb builder; + + phi + | Ast.For (var_name, start, end_, step, body) -> + (* Emit the start code first, without 'variable' in scope. *) + let start_val = codegen_expr start in + + (* Make the new basic block for the loop header, inserting after current + * block. *) + let preheader_bb = insertion_block builder in + let the_function = block_parent preheader_bb in + let loop_bb = append_block context "loop" the_function in + + (* Insert an explicit fall through from the current block to the + * loop_bb. *) + ignore (build_br loop_bb builder); + + (* Start insertion in loop_bb. *) + position_at_end loop_bb builder; + + (* Start the PHI node with an entry for start. *) + let variable = build_phi [(start_val, preheader_bb)] var_name builder in + + (* Within the loop, the variable is defined equal to the PHI node. If it + * shadows an existing variable, we have to restore it, so save it + * now. *) + let old_val = + try Some (Hashtbl.find named_values var_name) with Not_found -> None + in + Hashtbl.add named_values var_name variable; + + (* Emit the body of the loop. This, like any other expr, can change the + * current BB. Note that we ignore the value computed by the body, but + * don't allow an error *) + ignore (codegen_expr body); + + (* Emit the step value. *) + let step_val = + match step with + | Some step -> codegen_expr step + (* If not specified, use 1.0. *) + | None -> const_float double_type 1.0 + in + + let next_var = build_add variable step_val "nextvar" builder in + + (* Compute the end condition. *) + let end_cond = codegen_expr end_ in + + (* Convert condition to a bool by comparing equal to 0.0. *) + let zero = const_float double_type 0.0 in + let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in + + (* Create the "after loop" block and insert it. 
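+       * We also capture the block we are currently sitting in (loop_end_bb
+       * below): codegen of the body, step and end condition may have moved
+       * the builder out of loop_bb, and loop_end_bb is the block that holds
+       * the backedge, so the PHI entry added further down must name it.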
*) + let loop_end_bb = insertion_block builder in + let after_bb = append_block context "afterloop" the_function in + + (* Insert the conditional branch into the end of loop_end_bb. *) + ignore (build_cond_br end_cond loop_bb after_bb builder); + + (* Any new code will be inserted in after_bb. *) + position_at_end after_bb builder; + + (* Add a new entry to the PHI node for the backedge. *) + add_incoming (next_var, loop_end_bb) variable; + + (* Restore the unshadowed variable. *) + begin match old_val with + | Some old_val -> Hashtbl.add named_values var_name old_val + | None -> () + end; + + (* for expr always returns 0.0. *) + const_null double_type + + let codegen_proto = function + | Ast.Prototype (name, args) | Ast.BinOpPrototype (name, args, _) -> + (* Make the function type: double(double,double) etc. *) + let doubles = Array.make (Array.length args) double_type in + let ft = function_type double_type doubles in + let f = + match lookup_function name the_module with + | None -> declare_function name ft the_module + + (* If 'f' conflicted, there was already something named 'name'. If it + * has a body, don't allow redefinition or reextern. *) + | Some f -> + (* If 'f' already has a body, reject this. *) + if block_begin f <> At_end f then + raise (Error "redefinition of function"); + + (* If 'f' took a different number of arguments, reject. *) + if element_type (type_of f) <> ft then + raise (Error "redefinition of function with different # args"); + f + in + + (* Set names for all arguments. *) + Array.iteri (fun i a -> + let n = args.(i) in + set_value_name n a; + Hashtbl.add named_values n a; + ) (params f); + f + + let codegen_func the_fpm = function + | Ast.Function (proto, body) -> + Hashtbl.clear named_values; + let the_function = codegen_proto proto in + + (* If this is an operator, install it. *) + begin match proto with + | Ast.BinOpPrototype (name, args, prec) -> + let op = name.[String.length name - 1] in + Hashtbl.add Parser.binop_precedence op prec; + | _ -> () + end; + + (* Create a new basic block to start insertion into. *) + let bb = append_block context "entry" the_function in + position_at_end bb builder; + + try + let ret_val = codegen_expr body in + + (* Finish off the function. *) + let _ = build_ret ret_val builder in + + (* Validate the generated code, checking for consistency. *) + Llvm_analysis.assert_valid_function the_function; + + (* Optimize the function. *) + let _ = PassManager.run_function the_function the_fpm in + + the_function + with e -> + delete_function the_function; + raise e + +toplevel.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Top-Level parsing and JIT Driver + *===----------------------------------------------------------------------===*) + + open Llvm + open Llvm_executionengine + + (* top ::= definition | external | expression | ';' *) + let rec main_loop the_fpm the_execution_engine stream = + match Stream.peek stream with + | None -> () + + (* ignore top-level semicolons. 
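+       * The lexer hands ';' back as an ordinary Token.Kwd, so it is the
+       * driver, not the parser, that decides to skip it between top-level
+       * forms.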
*) + | Some (Token.Kwd ';') -> + Stream.junk stream; + main_loop the_fpm the_execution_engine stream + + | Some token -> + begin + try match token with + | Token.Def -> + let e = Parser.parse_definition stream in + print_endline "parsed a function definition."; + dump_value (Codegen.codegen_func the_fpm e); + | Token.Extern -> + let e = Parser.parse_extern stream in + print_endline "parsed an extern."; + dump_value (Codegen.codegen_proto e); + | _ -> + (* Evaluate a top-level expression into an anonymous function. *) + let e = Parser.parse_toplevel stream in + print_endline "parsed a top-level expr"; + let the_function = Codegen.codegen_func the_fpm e in + dump_value the_function; + + (* JIT the function, returning a function pointer. *) + let result = ExecutionEngine.run_function the_function [||] + the_execution_engine in + + print_string "Evaluated to "; + print_float (GenericValue.as_float Codegen.double_type result); + print_newline (); + with Stream.Error s | Codegen.Error s -> + (* Skip token for error recovery. *) + Stream.junk stream; + print_endline s; + end; + print_string "ready> "; flush stdout; + main_loop the_fpm the_execution_engine stream + +toy.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Main driver code. + *===----------------------------------------------------------------------===*) + + open Llvm + open Llvm_executionengine + open Llvm_target + open Llvm_scalar_opts + + let main () = + ignore (initialize_native_target ()); + + (* Install standard binary operators. + * 1 is the lowest precedence. *) + Hashtbl.add Parser.binop_precedence '<' 10; + Hashtbl.add Parser.binop_precedence '+' 20; + Hashtbl.add Parser.binop_precedence '-' 20; + Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) + + (* Prime the first token. *) + print_string "ready> "; flush stdout; + let stream = Lexer.lex (Stream.of_channel stdin) in + + (* Create the JIT. *) + let the_execution_engine = ExecutionEngine.create Codegen.the_module in + let the_fpm = PassManager.create_function Codegen.the_module in + + (* Set up the optimizer pipeline. Start with registering info about how the + * target lays out data structures. *) + DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; + + (* Do simple "peephole" optimizations and bit-twiddling optzn. *) + add_instruction_combination the_fpm; + + (* reassociate expressions. *) + add_reassociation the_fpm; + + (* Eliminate Common SubExpressions. *) + add_gvn the_fpm; + + (* Simplify the control flow graph (deleting unreachable blocks, etc). *) + add_cfg_simplification the_fpm; + + ignore (PassManager.initialize the_fpm); + + (* Run the main "interpreter loop" now. *) + Toplevel.main_loop the_fpm the_execution_engine stream; + + (* Print out all the generated code. *) + dump_module Codegen.the_module + ;; + + main () + +bindings.c + .. code-block:: c + + #include <stdio.h> + + /* putchard - putchar that takes a double and returns 0. */ + extern double putchard(double X) { + putchar((char)X); + return 0; + } + + /* printd - printf that takes a double prints it as "%f\n", returning 0. 
*/ + extern double printd(double X) { + printf("%f\n", X); + return 0; + } + +`Next: Extending the language: mutable variables / SSA +construction <OCamlLangImpl7.html>`_ + diff --git a/docs/tutorial/OCamlLangImpl7.html b/docs/tutorial/OCamlLangImpl7.html deleted file mode 100644 index fd66b10c1a..0000000000 --- a/docs/tutorial/OCamlLangImpl7.html +++ /dev/null @@ -1,1904 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA - construction</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <meta name="author" content="Erick Tryzelaar"> - <link rel="stylesheet" href="../_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>Kaleidoscope: Extending the Language: Mutable Variables</h1> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 7 - <ol> - <li><a href="#intro">Chapter 7 Introduction</a></li> - <li><a href="#why">Why is this a hard problem?</a></li> - <li><a href="#memory">Memory in LLVM</a></li> - <li><a href="#kalvars">Mutable Variables in Kaleidoscope</a></li> - <li><a href="#adjustments">Adjusting Existing Variables for - Mutation</a></li> - <li><a href="#assignment">New Assignment Operator</a></li> - <li><a href="#localvars">User-defined Local Variables</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="OCamlLangImpl8.html">Chapter 8</a>: Conclusion and other useful LLVM - tidbits</li> -</ul> - -<div class="doc_author"> - <p> - Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a> - and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a> - </p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="intro">Chapter 7 Introduction</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Welcome to Chapter 7 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. In chapters 1 through 6, we've built a very -respectable, albeit simple, <a -href="http://en.wikipedia.org/wiki/Functional_programming">functional -programming language</a>. In our journey, we learned some parsing techniques, -how to build and represent an AST, how to build LLVM IR, and how to optimize -the resultant code as well as JIT compile it.</p> - -<p>While Kaleidoscope is interesting as a functional language, the fact that it -is functional makes it "too easy" to generate LLVM IR for it. In particular, a -functional language makes it very easy to build LLVM IR directly in <a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>. 
-Since LLVM requires that the input code be in SSA form, this is a very nice -property and it is often unclear to newcomers how to generate code for an -imperative language with mutable variables.</p> - -<p>The short (and happy) summary of this chapter is that there is no need for -your front-end to build SSA form: LLVM provides highly tuned and well tested -support for this, though the way it works is a bit unexpected for some.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="why">Why is this a hard problem?</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -To understand why mutable variables cause complexities in SSA construction, -consider this extremely simple C example: -</p> - -<div class="doc_code"> -<pre> -int G, H; -int test(_Bool Condition) { - int X; - if (Condition) - X = G; - else - X = H; - return X; -} -</pre> -</div> - -<p>In this case, we have the variable "X", whose value depends on the path -executed in the program. Because there are two different possible values for X -before the return instruction, a PHI node is inserted to merge the two values. -The LLVM IR that we want for this example looks like this:</p> - -<div class="doc_code"> -<pre> -@G = weak global i32 0 ; type of @G is i32* -@H = weak global i32 0 ; type of @H is i32* - -define i32 @test(i1 %Condition) { -entry: - br i1 %Condition, label %cond_true, label %cond_false - -cond_true: - %X.0 = load i32* @G - br label %cond_next - -cond_false: - %X.1 = load i32* @H - br label %cond_next - -cond_next: - %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] - ret i32 %X.2 -} -</pre> -</div> - -<p>In this example, the loads from the G and H global variables are explicit in -the LLVM IR, and they live in the then/else branches of the if statement -(cond_true/cond_false). In order to merge the incoming values, the X.2 phi node -in the cond_next block selects the right value to use based on where control -flow is coming from: if control flow comes from the cond_false block, X.2 gets -the value of X.1. Alternatively, if control flow comes from cond_true, it gets -the value of X.0. The intent of this chapter is not to explain the details of -SSA form. For more information, see one of the many <a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online -references</a>.</p> - -<p>The question for this article is "who places the phi nodes when lowering -assignments to mutable variables?". The issue here is that LLVM -<em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it. -However, SSA construction requires non-trivial algorithms and data structures, -so it is inconvenient and wasteful for every front-end to have to reproduce this -logic.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="memory">Memory in LLVM</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>The 'trick' here is that while LLVM does require all register values to be -in SSA form, it does not require (or permit) memory objects to be in SSA form. -In the example above, note that the loads from G and H are direct accesses to -G and H: they are not renamed or versioned. This differs from some other -compiler systems, which do try to version memory objects. 
In LLVM, instead of -encoding dataflow analysis of memory into the LLVM IR, it is handled with <a -href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on -demand.</p> - -<p> -With this in mind, the high-level idea is that we want to make a stack variable -(which lives in memory, because it is on the stack) for each mutable object in -a function. To take advantage of this trick, we need to talk about how LLVM -represents stack variables. -</p> - -<p>In LLVM, all memory accesses are explicit with load/store instructions, and -it is carefully designed not to have (or need) an "address-of" operator. Notice -how the type of the @G/@H global variables is actually "i32*" even though the -variable is defined as "i32". What this means is that @G defines <em>space</em> -for an i32 in the global data area, but its <em>name</em> actually refers to the -address for that space. Stack variables work the same way, except that instead of -being declared with global variable definitions, they are declared with the -<a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p> - -<div class="doc_code"> -<pre> -define i32 @example() { -entry: - %X = alloca i32 ; type of %X is i32*. - ... - %tmp = load i32* %X ; load the stack value %X from the stack. - %tmp2 = add i32 %tmp, 1 ; increment it - store i32 %tmp2, i32* %X ; store it back - ... -</pre> -</div> - -<p>This code shows an example of how you can declare and manipulate a stack -variable in the LLVM IR. Stack memory allocated with the alloca instruction is -fully general: you can pass the address of the stack slot to functions, you can -store it in other variables, etc. In our example above, we could rewrite the -example to use the alloca technique to avoid using a PHI node:</p> - -<div class="doc_code"> -<pre> -@G = weak global i32 0 ; type of @G is i32* -@H = weak global i32 0 ; type of @H is i32* - -define i32 @test(i1 %Condition) { -entry: - %X = alloca i32 ; type of %X is i32*. - br i1 %Condition, label %cond_true, label %cond_false - -cond_true: - %X.0 = load i32* @G - store i32 %X.0, i32* %X ; Update X - br label %cond_next - -cond_false: - %X.1 = load i32* @H - store i32 %X.1, i32* %X ; Update X - br label %cond_next - -cond_next: - %X.2 = load i32* %X ; Read X - ret i32 %X.2 -} -</pre> -</div> - -<p>With this, we have discovered a way to handle arbitrary mutable variables -without the need to create Phi nodes at all:</p> - -<ol> -<li>Each mutable variable becomes a stack allocation.</li> -<li>Each read of the variable becomes a load from the stack.</li> -<li>Each update of the variable becomes a store to the stack.</li> -<li>Taking the address of a variable just uses the stack address directly.</li> -</ol> - -<p>While this solution has solved our immediate problem, it introduced another -one: we have now apparently introduced a lot of stack traffic for very simple -and common operations, a major performance problem. Fortunately for us, the -LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles -this case, promoting allocas like this into SSA registers, inserting Phi nodes -as appropriate. 
If you run this example through the pass, for example, you'll -get:</p> - -<div class="doc_code"> -<pre> -$ <b>llvm-as < example.ll | opt -mem2reg | llvm-dis</b> -@G = weak global i32 0 -@H = weak global i32 0 - -define i32 @test(i1 %Condition) { -entry: - br i1 %Condition, label %cond_true, label %cond_false - -cond_true: - %X.0 = load i32* @G - br label %cond_next - -cond_false: - %X.1 = load i32* @H - br label %cond_next - -cond_next: - %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] - ret i32 %X.01 -} -</pre> -</div> - -<p>The mem2reg pass implements the standard "iterated dominance frontier" -algorithm for constructing SSA form and has a number of optimizations that speed -up (very common) degenerate cases. The mem2reg optimization pass is the answer -to dealing with mutable variables, and we highly recommend that you depend on -it. Note that mem2reg only works on variables in certain circumstances:</p> - -<ol> -<li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it -promotes them. It does not apply to global variables or heap allocations.</li> - -<li>mem2reg only looks for alloca instructions in the entry block of the -function. Being in the entry block guarantees that the alloca is only executed -once, which makes analysis simpler.</li> - -<li>mem2reg only promotes allocas whose uses are direct loads and stores. If -the address of the stack object is passed to a function, or if any funny pointer -arithmetic is involved, the alloca will not be promoted.</li> - -<li>mem2reg only works on allocas of <a -href="../LangRef.html#t_classifications">first class</a> -values (such as pointers, scalars and vectors), and only if the array size -of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of -promoting structs or arrays to registers. Note that the "scalarrepl" pass is -more powerful and can promote structs, "unions", and arrays in many cases.</li> - -</ol> - -<p> -All of these properties are easy to satisfy for most imperative languages, and -we'll illustrate it below with Kaleidoscope. The final question you may be -asking is: should I bother with this nonsense for my front-end? Wouldn't it be -better if I just did SSA construction directly, avoiding use of the mem2reg -optimization pass? In short, we strongly recommend that you use this technique -for building SSA form, unless there is an extremely good reason not to. Using -this technique is:</p> - -<ul> -<li>Proven and well tested: llvm-gcc and clang both use this technique for local -mutable variables. As such, the most common clients of LLVM are using this to -handle a bulk of their variables. You can be sure that bugs are found fast and -fixed early.</li> - -<li>Extremely Fast: mem2reg has a number of special cases that make it fast in -common cases as well as fully general. For example, it has fast-paths for -variables that are only used in a single block, variables that only have one -assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc. -</li> - -<li>Needed for debug info generation: <a href="../SourceLevelDebugging.html"> -Debug information in LLVM</a> relies on having the address of the variable -exposed so that debug info can be attached to it. This technique dovetails -very naturally with this style of debug info.</li> -</ul> - -<p>If nothing else, this makes it much easier to get your front-end up and -running, and is very simple to implement. Lets extend Kaleidoscope with mutable -variables now! 
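-</p>
-
-<p>For readers who would rather poke at this from OCaml than from the shell,
-here is a small, self-contained sketch (an editorial aside, not part of the
-tutorial code; the module and value names in it are invented for
-illustration). It hand-builds a function in the spirit of the
-<tt>@example</tt> above and then promotes the alloca with the
-memory-to-register (mem2reg) pass from <tt>Llvm_scalar_opts</tt> &mdash; the
-same <tt>add_memory_to_register_promotion</tt> binding that this chapter adds
-to the Kaleidoscope pass pipeline below:</p>
-
-<div class="doc_code">
-<pre>
-open Llvm
-open Llvm_scalar_opts
-
-let () =
-  let ctx = global_context () in
-  let m = create_module ctx "memory_demo" in
-  let i32 = i32_type ctx in
-
-  (* Build: define i32 @example() { entry: %X = alloca i32; ... } *)
-  let fn = define_function "example" (function_type i32 [||]) m in
-  let b = builder_at_end ctx (entry_block fn) in
-  let x = build_alloca i32 "X" b in                 (* %X = alloca i32 *)
-  ignore (build_store (const_int i32 0) x b);       (* store i32 0, i32* %X *)
-  let tmp = build_load x "tmp" b in                 (* %tmp = load i32* %X *)
-  let tmp2 = build_add tmp (const_int i32 1) "tmp2" b in
-  ignore (build_store tmp2 x b);                    (* store it back *)
-  ignore (build_ret (build_load x "result" b) b);
-
-  (* Run just the promotion pass over the function and dump the result:
-   * the alloca and all of the loads and stores should be gone. *)
-  let fpm = PassManager.create_function m in
-  add_memory_to_register_promotion fpm;
-  ignore (PassManager.initialize fpm);
-  ignore (PassManager.run_function fn fpm);
-  dump_module m
-</pre>
-</div>
-
-<p>With that aside out of the way, back to Kaleidoscope.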
-</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="kalvars">Mutable Variables in Kaleidoscope</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Now that we know the sort of problem we want to tackle, lets see what this -looks like in the context of our little Kaleidoscope language. We're going to -add two features:</p> - -<ol> -<li>The ability to mutate variables with the '=' operator.</li> -<li>The ability to define new variables.</li> -</ol> - -<p>While the first item is really what this is about, we only have variables -for incoming arguments as well as for induction variables, and redefining those only -goes so far :). Also, the ability to define new variables is a -useful thing regardless of whether you will be mutating them. Here's a -motivating example that shows how we could use these:</p> - -<div class="doc_code"> -<pre> -# Define ':' for sequencing: as a low-precedence operator that ignores operands -# and just returns the RHS. -def binary : 1 (x y) y; - -# Recursive fib, we could do this before. -def fib(x) - if (x < 3) then - 1 - else - fib(x-1)+fib(x-2); - -# Iterative fib. -def fibi(x) - <b>var a = 1, b = 1, c in</b> - (for i = 3, i < x in - <b>c = a + b</b> : - <b>a = b</b> : - <b>b = c</b>) : - b; - -# Call it. -fibi(10); -</pre> -</div> - -<p> -In order to mutate variables, we have to change our existing variables to use -the "alloca trick". Once we have that, we'll add our new operator, then extend -Kaleidoscope to support new variable definitions. -</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="adjustments">Adjusting Existing Variables for Mutation</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -The symbol table in Kaleidoscope is managed at code generation time by the -'<tt>named_values</tt>' map. This map currently keeps track of the LLVM -"Value*" that holds the double value for the named variable. In order to -support mutation, we need to change this slightly, so that it -<tt>named_values</tt> holds the <em>memory location</em> of the variable in -question. Note that this change is a refactoring: it changes the structure of -the code, but does not (by itself) change the behavior of the compiler. All of -these changes are isolated in the Kaleidoscope code generator.</p> - -<p> -At this point in Kaleidoscope's development, it only supports variables for two -things: incoming arguments to functions and the induction variable of 'for' -loops. For consistency, we'll allow mutation of these variables in addition to -other user-defined variables. This means that these will both need memory -locations. -</p> - -<p>To start our transformation of Kaleidoscope, we'll change the -<tt>named_values</tt> map so that it maps to AllocaInst* instead of Value*. 
-Once we do this, the C++ compiler will tell us what parts of the code we need to -update:</p> - -<p><b>Note:</b> the ocaml bindings currently model both <tt>Value*</tt>s and -<tt>AllocInst*</tt>s as <tt>Llvm.llvalue</tt>s, but this may change in the -future to be more type safe.</p> - -<div class="doc_code"> -<pre> -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -</pre> -</div> - -<p>Also, since we will need to create these alloca's, we'll use a helper -function that ensures that the allocas are created in the entry block of the -function:</p> - -<div class="doc_code"> -<pre> -(* Create an alloca instruction in the entry block of the function. This - * is used for mutable variables etc. *) -let create_entry_block_alloca the_function var_name = - let builder = builder_at (instr_begin (entry_block the_function)) in - build_alloca double_type var_name builder -</pre> -</div> - -<p>This funny looking code creates an <tt>Llvm.llbuilder</tt> object that is -pointing at the first instruction of the entry block. It then creates an alloca -with the expected name and returns it. Because all values in Kaleidoscope are -doubles, there is no need to pass in a type to use.</p> - -<p>With this in place, the first functionality change we want to make is to -variable references. In our new scheme, variables live on the stack, so code -generating a reference to them actually needs to produce a load from the stack -slot:</p> - -<div class="doc_code"> -<pre> -let rec codegen_expr = function - ... - | Ast.Variable name -> - let v = try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name") - in - <b>(* Load the value. *) - build_load v name builder</b> -</pre> -</div> - -<p>As you can see, this is pretty straightforward. Now we need to update the -things that define the variables to set up the alloca. We'll start with -<tt>codegen_expr Ast.For ...</tt> (see the <a href="#code">full code listing</a> -for the unabridged code):</p> - -<div class="doc_code"> -<pre> - | Ast.For (var_name, start, end_, step, body) -> - let the_function = block_parent (insertion_block builder) in - - (* Create an alloca for the variable in the entry block. *) - <b>let alloca = create_entry_block_alloca the_function var_name in</b> - - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in - - <b>(* Store the value into the alloca. *) - ignore(build_store start_val alloca builder);</b> - - ... - - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - <b>Hashtbl.add named_values var_name alloca;</b> - - ... - - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - <b>(* Reload, increment, and restore the alloca. This handles the case where - * the body of the loop mutates the variable. *) - let cur_var = build_load alloca var_name builder in - let next_var = build_add cur_var step_val "nextvar" builder in - ignore(build_store next_var alloca builder);</b> - ... -</pre> -</div> - -<p>This code is virtually identical to the code <a -href="OCamlLangImpl5.html#forcodegen">before we allowed mutable variables</a>. -The big difference is that we no longer have to construct a PHI node, and we use -load/store to access the variable as needed.</p> - -<p>To support mutable argument variables, we need to also make allocas for them. 
-The code for this is also pretty simple:</p> - -<div class="doc_code"> -<pre> -(* Create an alloca for each argument and register the argument in the symbol - * table so that references to it will succeed. *) -let create_argument_allocas the_function proto = - let args = match proto with - | Ast.Prototype (_, args) | Ast.BinOpPrototype (_, args, _) -> args - in - Array.iteri (fun i ai -> - let var_name = args.(i) in - (* Create an alloca for this variable. *) - let alloca = create_entry_block_alloca the_function var_name in - - (* Store the initial value into the alloca. *) - ignore(build_store ai alloca builder); - - (* Add arguments to variable symbol table. *) - Hashtbl.add named_values var_name alloca; - ) (params the_function) -</pre> -</div> - -<p>For each argument, we make an alloca, store the input value to the function -into the alloca, and register the alloca as the memory location for the -argument. This method gets invoked by <tt>Codegen.codegen_func</tt> right after -it sets up the entry block for the function.</p> - -<p>The final missing piece is adding the mem2reg pass, which allows us to get -good codegen once again:</p> - -<div class="doc_code"> -<pre> -let main () = - ... - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - <b>(* Promote allocas to registers. *) - add_memory_to_register_promotion the_fpm;</b> - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combining the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; -</pre> -</div> - -<p>It is interesting to see what the code looks like before and after the -mem2reg optimization runs. For example, this is the before/after code for our -recursive fib function. Before the optimization:</p> - -<div class="doc_code"> -<pre> -define double @fib(double %x) { -entry: - <b>%x1 = alloca double - store double %x, double* %x1 - %x2 = load double* %x1</b> - %cmptmp = fcmp ult double %x2, 3.000000e+00 - %booltmp = uitofp i1 %cmptmp to double - %ifcond = fcmp one double %booltmp, 0.000000e+00 - br i1 %ifcond, label %then, label %else - -then: ; preds = %entry - br label %ifcont - -else: ; preds = %entry - <b>%x3 = load double* %x1</b> - %subtmp = fsub double %x3, 1.000000e+00 - %calltmp = call double @fib(double %subtmp) - <b>%x4 = load double* %x1</b> - %subtmp5 = fsub double %x4, 2.000000e+00 - %calltmp6 = call double @fib(double %subtmp5) - %addtmp = fadd double %calltmp, %calltmp6 - br label %ifcont - -ifcont: ; preds = %else, %then - %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] - ret double %iftmp -} -</pre> -</div> - -<p>Here there is only one variable (x, the input argument) but you can still -see the extremely simple-minded code generation strategy we are using. In the -entry block, an alloca is created, and the initial input value is stored into -it. Each reference to the variable does a reload from the stack. Also, note -that we didn't modify the if/then/else expression, so it still inserts a PHI -node. 
While we could make an alloca for it, it is actually easier to create a -PHI node for it, so we still just make the PHI.</p> - -<p>Here is the code after the mem2reg pass runs:</p> - -<div class="doc_code"> -<pre> -define double @fib(double %x) { -entry: - %cmptmp = fcmp ult double <b>%x</b>, 3.000000e+00 - %booltmp = uitofp i1 %cmptmp to double - %ifcond = fcmp one double %booltmp, 0.000000e+00 - br i1 %ifcond, label %then, label %else - -then: - br label %ifcont - -else: - %subtmp = fsub double <b>%x</b>, 1.000000e+00 - %calltmp = call double @fib(double %subtmp) - %subtmp5 = fsub double <b>%x</b>, 2.000000e+00 - %calltmp6 = call double @fib(double %subtmp5) - %addtmp = fadd double %calltmp, %calltmp6 - br label %ifcont - -ifcont: ; preds = %else, %then - %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] - ret double %iftmp -} -</pre> -</div> - -<p>This is a trivial case for mem2reg, since there are no redefinitions of the -variable. The point of showing this is to calm your tension about inserting -such blatent inefficiencies :).</p> - -<p>After the rest of the optimizers run, we get:</p> - -<div class="doc_code"> -<pre> -define double @fib(double %x) { -entry: - %cmptmp = fcmp ult double %x, 3.000000e+00 - %booltmp = uitofp i1 %cmptmp to double - %ifcond = fcmp ueq double %booltmp, 0.000000e+00 - br i1 %ifcond, label %else, label %ifcont - -else: - %subtmp = fsub double %x, 1.000000e+00 - %calltmp = call double @fib(double %subtmp) - %subtmp5 = fsub double %x, 2.000000e+00 - %calltmp6 = call double @fib(double %subtmp5) - %addtmp = fadd double %calltmp, %calltmp6 - ret double %addtmp - -ifcont: - ret double 1.000000e+00 -} -</pre> -</div> - -<p>Here we see that the simplifycfg pass decided to clone the return instruction -into the end of the 'else' block. This allowed it to eliminate some branches -and the PHI node.</p> - -<p>Now that all symbol table references are updated to use stack variables, -we'll add the assignment operator.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="assignment">New Assignment Operator</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>With our current framework, adding a new assignment operator is really -simple. We will parse it just like any other binary operator, but handle it -internally (instead of allowing the user to define it). The first step is to -set a precedence:</p> - -<div class="doc_code"> -<pre> -let main () = - (* Install standard binary operators. - * 1 is the lowest precedence. *) - <b>Hashtbl.add Parser.binop_precedence '=' 2;</b> - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - ... -</pre> -</div> - -<p>Now that the parser knows the precedence of the binary operator, it takes -care of all the parsing and AST generation. We just need to implement codegen -for the assignment operator. This looks like:</p> - -<div class="doc_code"> -<pre> -let rec codegen_expr = function - begin match op with - | '=' -> - (* Special case '=' because we don't want to emit the LHS as an - * expression. *) - let name = - match lhs with - | Ast.Variable name -> name - | _ -> raise (Error "destination of '=' must be a variable") - in -</pre> -</div> - -<p>Unlike the rest of the binary operators, our assignment operator doesn't -follow the "emit LHS, emit RHS, do computation" model. 
As such, it is handled -as a special case before the other binary operators are handled. The other -strange thing is that it requires the LHS to be a variable. It is invalid to -have "(x+1) = expr" - only things like "x = expr" are allowed. -</p> - - -<div class="doc_code"> -<pre> - (* Codegen the rhs. *) - let val_ = codegen_expr rhs in - - (* Lookup the name. *) - let variable = try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name") - in - ignore(build_store val_ variable builder); - val_ - | _ -> - ... -</pre> -</div> - -<p>Once we have the variable, codegen'ing the assignment is straightforward: -we emit the RHS of the assignment, create a store, and return the computed -value. Returning a value allows for chained assignments like "X = (Y = Z)".</p> - -<p>Now that we have an assignment operator, we can mutate loop variables and -arguments. For example, we can now run code like this:</p> - -<div class="doc_code"> -<pre> -# Function to print a double. -extern printd(x); - -# Define ':' for sequencing: as a low-precedence operator that ignores operands -# and just returns the RHS. -def binary : 1 (x y) y; - -def test(x) - printd(x) : - x = 4 : - printd(x); - -test(123); -</pre> -</div> - -<p>When run, this example prints "123" and then "4", showing that we did -actually mutate the value! Okay, we have now officially implemented our goal: -getting this to work requires SSA construction in the general case. However, -to be really useful, we want the ability to define our own local variables, lets -add this next! -</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="localvars">User-defined Local Variables</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Adding var/in is just like any other other extensions we made to -Kaleidoscope: we extend the lexer, the parser, the AST and the code generator. -The first step for adding our new 'var/in' construct is to extend the lexer. -As before, this is pretty trivial, the code looks like this:</p> - -<div class="doc_code"> -<pre> -type token = - ... - <b>(* var definition *) - | Var</b> - -... - -and lex_ident buffer = parser - ... - | "in" -> [< 'Token.In; stream >] - | "binary" -> [< 'Token.Binary; stream >] - | "unary" -> [< 'Token.Unary; stream >] - <b>| "var" -> [< 'Token.Var; stream >]</b> - ... -</pre> -</div> - -<p>The next step is to define the AST node that we will construct. For var/in, -it looks like this:</p> - -<div class="doc_code"> -<pre> -type expr = - ... - (* variant for var/in. *) - | Var of (string * expr option) array * expr - ... -</pre> -</div> - -<p>var/in allows a list of names to be defined all at once, and each name can -optionally have an initializer value. As such, we capture this information in -the VarNames vector. Also, var/in has a body, this body is allowed to access -the variables defined by the var/in.</p> - -<p>With this in place, we can define the parser pieces. The first thing we do -is add it as a primary expression:</p> - -<div class="doc_code"> -<pre> -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr - * ::= ifexpr - * ::= forexpr - <b>* ::= varexpr</b> *) -let rec parse_primary = parser - ... - <b>(* varexpr - * ::= 'var' identifier ('=' expression? - * (',' identifier ('=' expression)?)* 'in' expression *) - | [< 'Token.Var; - (* At least one variable name is required. *) - 'Token.Ident id ?? 
"expected identifier after var"; - init=parse_var_init; - var_names=parse_var_names [(id, init)]; - (* At this point, we have to have 'in'. *) - 'Token.In ?? "expected 'in' keyword after 'var'"; - body=parse_expr >] -> - Ast.Var (Array.of_list (List.rev var_names), body)</b> - -... - -and parse_var_init = parser - (* read in the optional initializer. *) - | [< 'Token.Kwd '='; e=parse_expr >] -> Some e - | [< >] -> None - -and parse_var_names accumulator = parser - | [< 'Token.Kwd ','; - 'Token.Ident id ?? "expected identifier list after var"; - init=parse_var_init; - e=parse_var_names ((id, init) :: accumulator) >] -> e - | [< >] -> accumulator -</pre> -</div> - -<p>Now that we can parse and represent the code, we need to support emission of -LLVM IR for it. This code starts out with:</p> - -<div class="doc_code"> -<pre> -let rec codegen_expr = function - ... - | Ast.Var (var_names, body) - let old_bindings = ref [] in - - let the_function = block_parent (insertion_block builder) in - - (* Register all variables and emit their initializer. *) - Array.iter (fun (var_name, init) -> -</pre> -</div> - -<p>Basically it loops over all the variables, installing them one at a time. -For each variable we put into the symbol table, we remember the previous value -that we replace in OldBindings.</p> - -<div class="doc_code"> -<pre> - (* Emit the initializer before adding the variable to scope, this - * prevents the initializer from referencing the variable itself, and - * permits stuff like this: - * var a = 1 in - * var a = a in ... # refers to outer 'a'. *) - let init_val = - match init with - | Some init -> codegen_expr init - (* If not specified, use 0.0. *) - | None -> const_float double_type 0.0 - in - - let alloca = create_entry_block_alloca the_function var_name in - ignore(build_store init_val alloca builder); - - (* Remember the old variable binding so that we can restore the binding - * when we unrecurse. *) - - begin - try - let old_value = Hashtbl.find named_values var_name in - old_bindings := (var_name, old_value) :: !old_bindings; - with Not_found > () - end; - - (* Remember this binding. *) - Hashtbl.add named_values var_name alloca; - ) var_names; -</pre> -</div> - -<p>There are more comments here than code. The basic idea is that we emit the -initializer, create the alloca, then update the symbol table to point to it. -Once all the variables are installed in the symbol table, we evaluate the body -of the var/in expression:</p> - -<div class="doc_code"> -<pre> - (* Codegen the body, now that all vars are in scope. *) - let body_val = codegen_expr body in -</pre> -</div> - -<p>Finally, before returning, we restore the previous variable bindings:</p> - -<div class="doc_code"> -<pre> - (* Pop all our variables from scope. *) - List.iter (fun (var_name, old_value) -> - Hashtbl.add named_values var_name old_value - ) !old_bindings; - - (* Return the body computation. *) - body_val -</pre> -</div> - -<p>The end result of all of this is that we get properly scoped variable -definitions, and we even (trivially) allow mutation of them :).</p> - -<p>With this, we completed what we set out to do. Our nice iterative fib -example from the intro compiles and runs just fine. 
The mem2reg pass optimizes -all of our stack variables into SSA registers, inserting PHI nodes where needed, -and our front-end remains simple: no "iterated dominance frontier" computation -anywhere in sight.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="code">Full Code Listing</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p> -Here is the complete code listing for our running example, enhanced with mutable -variables and var/in support. To build this example, use: -</p> - -<div class="doc_code"> -<pre> -# Compile -ocamlbuild toy.byte -# Run -./toy.byte -</pre> -</div> - -<p>Here is the code:</p> - -<dl> -<dt>_tags:</dt> -<dd class="doc_code"> -<pre> -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) -<*.{byte,native}>: g++, use_llvm, use_llvm_analysis -<*.{byte,native}>: use_llvm_executionengine, use_llvm_target -<*.{byte,native}>: use_llvm_scalar_opts, use_bindings -</pre> -</dd> - -<dt>myocamlbuild.ml:</dt> -<dd class="doc_code"> -<pre> -open Ocamlbuild_plugin;; - -ocaml_lib ~extern:true "llvm";; -ocaml_lib ~extern:true "llvm_analysis";; -ocaml_lib ~extern:true "llvm_executionengine";; -ocaml_lib ~extern:true "llvm_target";; -ocaml_lib ~extern:true "llvm_scalar_opts";; - -flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"; A"-cclib"; A"-rdynamic"]);; -dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; -</pre> -</dd> - -<dt>token.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char - - (* control *) - | If | Then | Else - | For | In - - (* operators *) - | Binary | Unary - - (* var definition *) - | Var -</pre> -</dd> - -<dt>lexer.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. 
'9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - | "binary" -> [< 'Token.Binary; stream >] - | "unary" -> [< 'Token.Unary; stream >] - | "var" -> [< 'Token.Var; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] -</pre> -</dd> - -<dt>ast.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a unary operator. *) - | Unary of char * expr - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - - (* variant for if/then/else. *) - | If of expr * expr * expr - - (* variant for for/in. *) - | For of string * expr * expr * expr option * expr - - (* variant for var/in. *) - | Var of (string * expr option) array * expr - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = - | Prototype of string * string array - | BinOpPrototype of string * string array * int - -(* func - This type represents a function definition itself. *) -type func = Function of proto * expr -</pre> -</dd> - -<dt>parser.ml:</dt> -<dd class="doc_code"> -<pre> -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr - * ::= ifexpr - * ::= forexpr - * ::= varexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. 
*) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) - | [< 'Token.If; c=parse_expr; - 'Token.Then ?? "expected 'then'"; t=parse_expr; - 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> - Ast.If (c, t, e) - - (* forexpr - ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) - | [< 'Token.For; - 'Token.Ident id ?? "expected identifier after for"; - 'Token.Kwd '=' ?? "expected '=' after for"; - stream >] -> - begin parser - | [< - start=parse_expr; - 'Token.Kwd ',' ?? "expected ',' after for"; - end_=parse_expr; - stream >] -> - let step = - begin parser - | [< 'Token.Kwd ','; step=parse_expr >] -> Some step - | [< >] -> None - end stream - in - begin parser - | [< 'Token.In; body=parse_expr >] -> - Ast.For (id, start, end_, step, body) - | [< >] -> - raise (Stream.Error "expected 'in' after for") - end stream - | [< >] -> - raise (Stream.Error "expected '=' after for") - end stream - - (* varexpr - * ::= 'var' identifier ('=' expression? - * (',' identifier ('=' expression)?)* 'in' expression *) - | [< 'Token.Var; - (* At least one variable name is required. *) - 'Token.Ident id ?? "expected identifier after var"; - init=parse_var_init; - var_names=parse_var_names [(id, init)]; - (* At this point, we have to have 'in'. *) - 'Token.In ?? "expected 'in' keyword after 'var'"; - body=parse_expr >] -> - Ast.Var (Array.of_list (List.rev var_names), body) - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* unary - * ::= primary - * ::= '!' unary *) -and parse_unary = parser - (* If this is a unary operator, read it. *) - | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] -> - Ast.Unary (op, operand) - - (* If the current token is not an operator, it must be a primary expr. *) - | [< stream >] -> parse_primary stream - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_unary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -and parse_var_init = parser - (* read in the optional initializer. *) - | [< 'Token.Kwd '='; e=parse_expr >] -> Some e - | [< >] -> None - -and parse_var_names accumulator = parser - | [< 'Token.Kwd ','; - 'Token.Ident id ?? "expected identifier list after var"; - init=parse_var_init; - e=parse_var_names ((id, init) :: accumulator) >] -> e - | [< >] -> accumulator - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' - * ::= binary LETTER number? 
(id, id) - * ::= unary LETTER number? (id) *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - let parse_operator = parser - | [< 'Token.Unary >] -> "unary", 1 - | [< 'Token.Binary >] -> "binary", 2 - in - let parse_binary_precedence = parser - | [< 'Token.Number n >] -> int_of_float n - | [< >] -> 30 - in - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - | [< (prefix, kind)=parse_operator; - 'Token.Kwd op ?? "expected an operator"; - (* Read the precedence if present. *) - binary_precedence=parse_binary_precedence; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - let name = prefix ^ (String.make 1 op) in - let args = Array.of_list (List.rev args) in - - (* Verify right number of arguments for operator. *) - if Array.length args != kind - then raise (Stream.Error "invalid number of operands for operator") - else - if kind == 1 then - Ast.Prototype (name, args) - else - Ast.BinOpPrototype (name, args, binary_precedence) - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e -</pre> -</dd> - -<dt>codegen.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - -open Llvm - -exception Error of string - -let context = global_context () -let the_module = create_module context "my cool jit" -let builder = builder context -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -let double_type = double_type context - -(* Create an alloca instruction in the entry block of the function. This - * is used for mutable variables etc. *) -let create_entry_block_alloca the_function var_name = - let builder = builder_at context (instr_begin (entry_block the_function)) in - build_alloca double_type var_name builder - -let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - let v = try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name") - in - (* Load the value. *) - build_load v name builder - | Ast.Unary (op, operand) -> - let operand = codegen_expr operand in - let callee = "unary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown unary operator") - in - build_call callee [|operand|] "unop" builder - | Ast.Binary (op, lhs, rhs) -> - begin match op with - | '=' -> - (* Special case '=' because we don't want to emit the LHS as an - * expression. *) - let name = - match lhs with - | Ast.Variable name -> name - | _ -> raise (Error "destination of '=' must be a variable") - in - - (* Codegen the rhs. 
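- * Doing the RHS first means that in something like "x = x + 1" the old
- * value of x is loaded before the new one is stored.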
*) - let val_ = codegen_expr rhs in - - (* Lookup the name. *) - let variable = try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name") - in - ignore(build_store val_ variable builder); - val_ - | _ -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> - (* If it wasn't a builtin binary operator, it must be a user defined - * one. Emit a call to it. *) - let callee = "binary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "binary operator not found!") - in - build_call callee [|lhs_val; rhs_val|] "binop" builder - end - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - | Ast.If (cond, then_, else_) -> - let cond = codegen_expr cond in - - (* Convert condition to a bool by comparing equal to 0.0 *) - let zero = const_float double_type 0.0 in - let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in - - (* Grab the first block so that we might later add the conditional branch - * to it at the end of the function. *) - let start_bb = insertion_block builder in - let the_function = block_parent start_bb in - - let then_bb = append_block context "then" the_function in - - (* Emit 'then' value. *) - position_at_end then_bb builder; - let then_val = codegen_expr then_ in - - (* Codegen of 'then' can change the current block, update then_bb for the - * phi. We create a new name because one is used for the phi node, and the - * other is used for the conditional branch. *) - let new_then_bb = insertion_block builder in - - (* Emit 'else' value. *) - let else_bb = append_block context "else" the_function in - position_at_end else_bb builder; - let else_val = codegen_expr else_ in - - (* Codegen of 'else' can change the current block, update else_bb for the - * phi. *) - let new_else_bb = insertion_block builder in - - (* Emit merge block. *) - let merge_bb = append_block context "ifcont" the_function in - position_at_end merge_bb builder; - let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in - let phi = build_phi incoming "iftmp" builder in - - (* Return to the start block to add the conditional branch. *) - position_at_end start_bb builder; - ignore (build_cond_br cond_val then_bb else_bb builder); - - (* Set a unconditional branch at the end of the 'then' block and the - * 'else' block to the 'merge' block. *) - position_at_end new_then_bb builder; ignore (build_br merge_bb builder); - position_at_end new_else_bb builder; ignore (build_br merge_bb builder); - - (* Finally, set the builder to the end of the merge block. 
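- * Whatever the enclosing expression emits next lands in 'ifcont', and
- * 'phi' is the value of the whole if/then/else.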
*) - position_at_end merge_bb builder; - - phi - | Ast.For (var_name, start, end_, step, body) -> - (* Output this as: - * var = alloca double - * ... - * start = startexpr - * store start -> var - * goto loop - * loop: - * ... - * bodyexpr - * ... - * loopend: - * step = stepexpr - * endcond = endexpr - * - * curvar = load var - * nextvar = curvar + step - * store nextvar -> var - * br endcond, loop, endloop - * outloop: *) - - let the_function = block_parent (insertion_block builder) in - - (* Create an alloca for the variable in the entry block. *) - let alloca = create_entry_block_alloca the_function var_name in - - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in - - (* Store the value into the alloca. *) - ignore(build_store start_val alloca builder); - - (* Make the new basic block for the loop header, inserting after current - * block. *) - let loop_bb = append_block context "loop" the_function in - - (* Insert an explicit fall through from the current block to the - * loop_bb. *) - ignore (build_br loop_bb builder); - - (* Start insertion in loop_bb. *) - position_at_end loop_bb builder; - - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - Hashtbl.add named_values var_name alloca; - - (* Emit the body of the loop. This, like any other expr, can change the - * current BB. Note that we ignore the value computed by the body, but - * don't allow an error *) - ignore (codegen_expr body); - - (* Emit the step value. *) - let step_val = - match step with - | Some step -> codegen_expr step - (* If not specified, use 1.0. *) - | None -> const_float double_type 1.0 - in - - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - (* Reload, increment, and restore the alloca. This handles the case where - * the body of the loop mutates the variable. *) - let cur_var = build_load alloca var_name builder in - let next_var = build_add cur_var step_val "nextvar" builder in - ignore(build_store next_var alloca builder); - - (* Convert condition to a bool by comparing equal to 0.0. *) - let zero = const_float double_type 0.0 in - let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in - - (* Create the "after loop" block and insert it. *) - let after_bb = append_block context "afterloop" the_function in - - (* Insert the conditional branch into the end of loop_end_bb. *) - ignore (build_cond_br end_cond loop_bb after_bb builder); - - (* Any new code will be inserted in after_bb. *) - position_at_end after_bb builder; - - (* Restore the unshadowed variable. *) - begin match old_val with - | Some old_val -> Hashtbl.add named_values var_name old_val - | None -> () - end; - - (* for expr always returns 0.0. *) - const_null double_type - | Ast.Var (var_names, body) -> - let old_bindings = ref [] in - - let the_function = block_parent (insertion_block builder) in - - (* Register all variables and emit their initializer. *) - Array.iter (fun (var_name, init) -> - (* Emit the initializer before adding the variable to scope, this - * prevents the initializer from referencing the variable itself, and - * permits stuff like this: - * var a = 1 in - * var a = a in ... # refers to outer 'a'. *) - let init_val = - match init with - | Some init -> codegen_expr init - (* If not specified, use 0.0. 
*) - | None -> const_float double_type 0.0 - in - - let alloca = create_entry_block_alloca the_function var_name in - ignore(build_store init_val alloca builder); - - (* Remember the old variable binding so that we can restore the binding - * when we unrecurse. *) - begin - try - let old_value = Hashtbl.find named_values var_name in - old_bindings := (var_name, old_value) :: !old_bindings; - with Not_found -> () - end; - - (* Remember this binding. *) - Hashtbl.add named_values var_name alloca; - ) var_names; - - (* Codegen the body, now that all vars are in scope. *) - let body_val = codegen_expr body in - - (* Pop all our variables from scope. *) - List.iter (fun (var_name, old_value) -> - Hashtbl.add named_values var_name old_value - ) !old_bindings; - - (* Return the body computation. *) - body_val - -let codegen_proto = function - | Ast.Prototype (name, args) | Ast.BinOpPrototype (name, args, _) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - -(* Create an alloca for each argument and register the argument in the symbol - * table so that references to it will succeed. *) -let create_argument_allocas the_function proto = - let args = match proto with - | Ast.Prototype (_, args) | Ast.BinOpPrototype (_, args, _) -> args - in - Array.iteri (fun i ai -> - let var_name = args.(i) in - (* Create an alloca for this variable. *) - let alloca = create_entry_block_alloca the_function var_name in - - (* Store the initial value into the alloca. *) - ignore(build_store ai alloca builder); - - (* Add arguments to variable symbol table. *) - Hashtbl.add named_values var_name alloca; - ) (params the_function) - -let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* If this is an operator, install it. *) - begin match proto with - | Ast.BinOpPrototype (name, args, prec) -> - let op = name.[String.length name - 1] in - Hashtbl.add Parser.binop_precedence op prec; - | _ -> () - end; - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - (* Add all arguments to the symbol table and create their allocas. *) - create_argument_allocas the_function proto; - - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. 
*) - let _ = PassManager.run_function the_function the_fpm in - - the_function - with e -> - delete_function the_function; - raise e -</pre> -</dd> - -<dt>toplevel.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine - -(* top ::= definition | external | expression | ';' *) -let rec main_loop the_fpm the_execution_engine stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop the_fpm the_execution_engine stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func the_fpm e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function, returning a function pointer. *) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop the_fpm the_execution_engine stream -</pre> -</dd> - -<dt>toy.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine -open Llvm_target -open Llvm_scalar_opts - -let main () = - ignore (initialize_native_target ()); - - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '=' 2; - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Promote allocas to registers. *) - add_memory_to_register_promotion the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combination the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. *) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). *) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. 
*) - Toplevel.main_loop the_fpm the_execution_engine stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module -;; - -main () -</pre> -</dd> - -<dt>bindings.c</dt> -<dd class="doc_code"> -<pre> -#include <stdio.h> - -/* putchard - putchar that takes a double and returns 0. */ -extern double putchard(double X) { - putchar((char)X); - return 0; -} - -/* printd - printf that takes a double prints it as "%f\n", returning 0. */ -extern double printd(double X) { - printf("%f\n", X); - return 0; -} -</pre> -</dd> -</dl> - -<a href="OCamlLangImpl8.html">Next: Conclusion and other useful LLVM tidbits</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/OCamlLangImpl7.rst b/docs/tutorial/OCamlLangImpl7.rst new file mode 100644 index 0000000000..07da3a8ff9 --- /dev/null +++ b/docs/tutorial/OCamlLangImpl7.rst @@ -0,0 +1,1726 @@ +======================================================= +Kaleidoscope: Extending the Language: Mutable Variables +======================================================= + +.. contents:: + :local: + +Written by `Chris Lattner <mailto:sabre@nondot.org>`_ and `Erick +Tryzelaar <mailto:idadesub@users.sourceforge.net>`_ + +Chapter 7 Introduction +====================== + +Welcome to Chapter 7 of the "`Implementing a language with +LLVM <index.html>`_" tutorial. In chapters 1 through 6, we've built a +very respectable, albeit simple, `functional programming +language <http://en.wikipedia.org/wiki/Functional_programming>`_. In our +journey, we learned some parsing techniques, how to build and represent +an AST, how to build LLVM IR, and how to optimize the resultant code as +well as JIT compile it. + +While Kaleidoscope is interesting as a functional language, the fact +that it is functional makes it "too easy" to generate LLVM IR for it. In +particular, a functional language makes it very easy to build LLVM IR +directly in `SSA +form <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_. +Since LLVM requires that the input code be in SSA form, this is a very +nice property and it is often unclear to newcomers how to generate code +for an imperative language with mutable variables. + +The short (and happy) summary of this chapter is that there is no need +for your front-end to build SSA form: LLVM provides highly tuned and +well tested support for this, though the way it works is a bit +unexpected for some. + +Why is this a hard problem? +=========================== + +To understand why mutable variables cause complexities in SSA +construction, consider this extremely simple C example: + +.. code-block:: c + + int G, H; + int test(_Bool Condition) { + int X; + if (Condition) + X = G; + else + X = H; + return X; + } + +In this case, we have the variable "X", whose value depends on the path +executed in the program. 
Because there are two different possible values +for X before the return instruction, a PHI node is inserted to merge the +two values. The LLVM IR that we want for this example looks like this: + +.. code-block:: llvm + + @G = weak global i32 0 ; type of @G is i32* + @H = weak global i32 0 ; type of @H is i32* + + define i32 @test(i1 %Condition) { + entry: + br i1 %Condition, label %cond_true, label %cond_false + + cond_true: + %X.0 = load i32* @G + br label %cond_next + + cond_false: + %X.1 = load i32* @H + br label %cond_next + + cond_next: + %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] + ret i32 %X.2 + } + +In this example, the loads from the G and H global variables are +explicit in the LLVM IR, and they live in the then/else branches of the +if statement (cond\_true/cond\_false). In order to merge the incoming +values, the X.2 phi node in the cond\_next block selects the right value +to use based on where control flow is coming from: if control flow comes +from the cond\_false block, X.2 gets the value of X.1. Alternatively, if +control flow comes from cond\_true, it gets the value of X.0. The intent +of this chapter is not to explain the details of SSA form. For more +information, see one of the many `online +references <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_. + +The question for this article is "who places the phi nodes when lowering +assignments to mutable variables?". The issue here is that LLVM +*requires* that its IR be in SSA form: there is no "non-ssa" mode for +it. However, SSA construction requires non-trivial algorithms and data +structures, so it is inconvenient and wasteful for every front-end to +have to reproduce this logic. + +Memory in LLVM +============== + +The 'trick' here is that while LLVM does require all register values to +be in SSA form, it does not require (or permit) memory objects to be in +SSA form. In the example above, note that the loads from G and H are +direct accesses to G and H: they are not renamed or versioned. This +differs from some other compiler systems, which do try to version memory +objects. In LLVM, instead of encoding dataflow analysis of memory into +the LLVM IR, it is handled with `Analysis +Passes <../WritingAnLLVMPass.html>`_ which are computed on demand. + +With this in mind, the high-level idea is that we want to make a stack +variable (which lives in memory, because it is on the stack) for each +mutable object in a function. To take advantage of this trick, we need +to talk about how LLVM represents stack variables. + +In LLVM, all memory accesses are explicit with load/store instructions, +and it is carefully designed not to have (or need) an "address-of" +operator. Notice how the type of the @G/@H global variables is actually +"i32\*" even though the variable is defined as "i32". What this means is +that @G defines *space* for an i32 in the global data area, but its +*name* actually refers to the address for that space. Stack variables +work the same way, except that instead of being declared with global +variable definitions, they are declared with the `LLVM alloca +instruction <../LangRef.html#i_alloca>`_: + +.. code-block:: llvm + + define i32 @example() { + entry: + %X = alloca i32 ; type of %X is i32*. + ... + %tmp = load i32* %X ; load the stack value %X from the stack. + %tmp2 = add i32 %tmp, 1 ; increment it + store i32 %tmp2, i32* %X ; store it back + ... + +This code shows an example of how you can declare and manipulate a stack +variable in the LLVM IR. 
Stack memory allocated with the alloca +instruction is fully general: you can pass the address of the stack slot +to functions, you can store it in other variables, etc. In our example +above, we could rewrite the example to use the alloca technique to avoid +using a PHI node: + +.. code-block:: llvm + + @G = weak global i32 0 ; type of @G is i32* + @H = weak global i32 0 ; type of @H is i32* + + define i32 @test(i1 %Condition) { + entry: + %X = alloca i32 ; type of %X is i32*. + br i1 %Condition, label %cond_true, label %cond_false + + cond_true: + %X.0 = load i32* @G + store i32 %X.0, i32* %X ; Update X + br label %cond_next + + cond_false: + %X.1 = load i32* @H + store i32 %X.1, i32* %X ; Update X + br label %cond_next + + cond_next: + %X.2 = load i32* %X ; Read X + ret i32 %X.2 + } + +With this, we have discovered a way to handle arbitrary mutable +variables without the need to create Phi nodes at all: + +#. Each mutable variable becomes a stack allocation. +#. Each read of the variable becomes a load from the stack. +#. Each update of the variable becomes a store to the stack. +#. Taking the address of a variable just uses the stack address + directly. + +While this solution has solved our immediate problem, it introduced +another one: we have now apparently introduced a lot of stack traffic +for very simple and common operations, a major performance problem. +Fortunately for us, the LLVM optimizer has a highly-tuned optimization +pass named "mem2reg" that handles this case, promoting allocas like this +into SSA registers, inserting Phi nodes as appropriate. If you run this +example through the pass, for example, you'll get: + +.. code-block:: bash + + $ llvm-as < example.ll | opt -mem2reg | llvm-dis + @G = weak global i32 0 + @H = weak global i32 0 + + define i32 @test(i1 %Condition) { + entry: + br i1 %Condition, label %cond_true, label %cond_false + + cond_true: + %X.0 = load i32* @G + br label %cond_next + + cond_false: + %X.1 = load i32* @H + br label %cond_next + + cond_next: + %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] + ret i32 %X.01 + } + +The mem2reg pass implements the standard "iterated dominance frontier" +algorithm for constructing SSA form and has a number of optimizations +that speed up (very common) degenerate cases. The mem2reg optimization +pass is the answer to dealing with mutable variables, and we highly +recommend that you depend on it. Note that mem2reg only works on +variables in certain circumstances: + +#. mem2reg is alloca-driven: it looks for allocas and if it can handle + them, it promotes them. It does not apply to global variables or heap + allocations. +#. mem2reg only looks for alloca instructions in the entry block of the + function. Being in the entry block guarantees that the alloca is only + executed once, which makes analysis simpler. +#. mem2reg only promotes allocas whose uses are direct loads and stores. + If the address of the stack object is passed to a function, or if any + funny pointer arithmetic is involved, the alloca will not be + promoted. +#. mem2reg only works on allocas of `first + class <../LangRef.html#t_classifications>`_ values (such as pointers, + scalars and vectors), and only if the array size of the allocation is + 1 (or missing in the .ll file). mem2reg is not capable of promoting + structs or arrays to registers. Note that the "scalarrepl" pass is + more powerful and can promote structs, "unions", and arrays in many + cases. 
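+
+The same promotion can also be scheduled programmatically through the OCaml
+bindings used throughout this tutorial.  The sketch below (the helper name
+is illustrative and not part of the tutorial sources) builds a function
+pass manager that runs *only* mem2reg over a single function, the
+bindings-level analogue of the ``opt -mem2reg`` pipeline shown above:
+
+.. code-block:: ocaml
+
+    open Llvm
+    open Llvm_scalar_opts
+
+    (* Run only the alloca-to-register promotion (mem2reg) over one
+     * function.  Returns true if the pass changed the function. *)
+    let promote_entry_block_allocas the_module the_function =
+      let fpm = PassManager.create_function the_module in
+      add_memory_to_register_promotion fpm;
+      ignore (PassManager.initialize fpm);
+      let changed = PassManager.run_function the_function fpm in
+      ignore (PassManager.finalize fpm);
+      PassManager.dispose fpm;
+      changed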
+ +All of these properties are easy to satisfy for most imperative +languages, and we'll illustrate it below with Kaleidoscope. The final +question you may be asking is: should I bother with this nonsense for my +front-end? Wouldn't it be better if I just did SSA construction +directly, avoiding use of the mem2reg optimization pass? In short, we +strongly recommend that you use this technique for building SSA form, +unless there is an extremely good reason not to. Using this technique +is: + +- Proven and well tested: llvm-gcc and clang both use this technique + for local mutable variables. As such, the most common clients of LLVM + are using this to handle a bulk of their variables. You can be sure + that bugs are found fast and fixed early. +- Extremely Fast: mem2reg has a number of special cases that make it + fast in common cases as well as fully general. For example, it has + fast-paths for variables that are only used in a single block, + variables that only have one assignment point, good heuristics to + avoid insertion of unneeded phi nodes, etc. +- Needed for debug info generation: `Debug information in + LLVM <../SourceLevelDebugging.html>`_ relies on having the address of + the variable exposed so that debug info can be attached to it. This + technique dovetails very naturally with this style of debug info. + +If nothing else, this makes it much easier to get your front-end up and +running, and is very simple to implement. Lets extend Kaleidoscope with +mutable variables now! + +Mutable Variables in Kaleidoscope +================================= + +Now that we know the sort of problem we want to tackle, lets see what +this looks like in the context of our little Kaleidoscope language. +We're going to add two features: + +#. The ability to mutate variables with the '=' operator. +#. The ability to define new variables. + +While the first item is really what this is about, we only have +variables for incoming arguments as well as for induction variables, and +redefining those only goes so far :). Also, the ability to define new +variables is a useful thing regardless of whether you will be mutating +them. Here's a motivating example that shows how we could use these: + +:: + + # Define ':' for sequencing: as a low-precedence operator that ignores operands + # and just returns the RHS. + def binary : 1 (x y) y; + + # Recursive fib, we could do this before. + def fib(x) + if (x < 3) then + 1 + else + fib(x-1)+fib(x-2); + + # Iterative fib. + def fibi(x) + var a = 1, b = 1, c in + (for i = 3, i < x in + c = a + b : + a = b : + b = c) : + b; + + # Call it. + fibi(10); + +In order to mutate variables, we have to change our existing variables +to use the "alloca trick". Once we have that, we'll add our new +operator, then extend Kaleidoscope to support new variable definitions. + +Adjusting Existing Variables for Mutation +========================================= + +The symbol table in Kaleidoscope is managed at code generation time by +the '``named_values``' map. This map currently keeps track of the LLVM +"Value\*" that holds the double value for the named variable. In order +to support mutation, we need to change this slightly, so that it +``named_values`` holds the *memory location* of the variable in +question. Note that this change is a refactoring: it changes the +structure of the code, but does not (by itself) change the behavior of +the compiler. All of these changes are isolated in the Kaleidoscope code +generator. 
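+
+To make the shape of this refactoring concrete before we dive into the
+Kaleidoscope sources, here is a small self-contained sketch (the function
+and variable names are illustrative only, not taken from the tutorial) of
+the discipline we are adopting: the symbol table maps a name to a stack
+slot, every write becomes a store to that slot, and every read becomes a
+load from it:
+
+.. code-block:: ocaml
+
+    open Llvm
+
+    let sketch () =
+      let context = global_context () in
+      let m = create_module context "symtab sketch" in
+      let dbl = double_type context in
+      let f = declare_function "demo" (function_type dbl [||]) m in
+      let builder = builder_at_end context (append_block context "entry" f) in
+      (* The symbol table maps a variable name to its alloca (a memory
+       * location), not to a plain SSA value. *)
+      let symtab : (string, llvalue) Hashtbl.t = Hashtbl.create 8 in
+      let x_slot = build_alloca dbl "x" builder in
+      Hashtbl.add symtab "x" x_slot;
+      (* A write such as "x = 1.0" becomes a store to the slot. *)
+      ignore (build_store (const_float dbl 1.0) (Hashtbl.find symtab "x") builder);
+      (* A read of "x" becomes a load from the slot. *)
+      let x_val = build_load (Hashtbl.find symtab "x") "x" builder in
+      ignore (build_ret x_val builder);
+      m
+
+Because the alloca is created in the entry block and is only ever loaded
+from and stored to, it satisfies the mem2reg requirements listed earlier
+and will be promoted back into an SSA register by the optimizer.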
+ +At this point in Kaleidoscope's development, it only supports variables +for two things: incoming arguments to functions and the induction +variable of 'for' loops. For consistency, we'll allow mutation of these +variables in addition to other user-defined variables. This means that +these will both need memory locations. + +To start our transformation of Kaleidoscope, we'll change the +``named_values`` map so that it maps to AllocaInst\* instead of Value\*. +Once we do this, the C++ compiler will tell us what parts of the code we +need to update: + +**Note:** the ocaml bindings currently model both ``Value*``'s and +``AllocInst*``'s as ``Llvm.llvalue``'s, but this may change in the future +to be more type safe. + +.. code-block:: ocaml + + let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 + +Also, since we will need to create these alloca's, we'll use a helper +function that ensures that the allocas are created in the entry block of +the function: + +.. code-block:: ocaml + + (* Create an alloca instruction in the entry block of the function. This + * is used for mutable variables etc. *) + let create_entry_block_alloca the_function var_name = + let builder = builder_at (instr_begin (entry_block the_function)) in + build_alloca double_type var_name builder + +This funny looking code creates an ``Llvm.llbuilder`` object that is +pointing at the first instruction of the entry block. It then creates an +alloca with the expected name and returns it. Because all values in +Kaleidoscope are doubles, there is no need to pass in a type to use. + +With this in place, the first functionality change we want to make is to +variable references. In our new scheme, variables live on the stack, so +code generating a reference to them actually needs to produce a load +from the stack slot: + +.. code-block:: ocaml + + let rec codegen_expr = function + ... + | Ast.Variable name -> + let v = try Hashtbl.find named_values name with + | Not_found -> raise (Error "unknown variable name") + in + (* Load the value. *) + build_load v name builder + +As you can see, this is pretty straightforward. Now we need to update +the things that define the variables to set up the alloca. We'll start +with ``codegen_expr Ast.For ...`` (see the `full code listing <#code>`_ +for the unabridged code): + +.. code-block:: ocaml + + | Ast.For (var_name, start, end_, step, body) -> + let the_function = block_parent (insertion_block builder) in + + (* Create an alloca for the variable in the entry block. *) + let alloca = create_entry_block_alloca the_function var_name in + + (* Emit the start code first, without 'variable' in scope. *) + let start_val = codegen_expr start in + + (* Store the value into the alloca. *) + ignore(build_store start_val alloca builder); + + ... + + (* Within the loop, the variable is defined equal to the PHI node. If it + * shadows an existing variable, we have to restore it, so save it + * now. *) + let old_val = + try Some (Hashtbl.find named_values var_name) with Not_found -> None + in + Hashtbl.add named_values var_name alloca; + + ... + + (* Compute the end condition. *) + let end_cond = codegen_expr end_ in + + (* Reload, increment, and restore the alloca. This handles the case where + * the body of the loop mutates the variable. *) + let cur_var = build_load alloca var_name builder in + let next_var = build_add cur_var step_val "nextvar" builder in + ignore(build_store next_var alloca builder); + ... 
+ +This code is virtually identical to the code `before we allowed mutable +variables <OCamlLangImpl5.html#forcodegen>`_. The big difference is that +we no longer have to construct a PHI node, and we use load/store to +access the variable as needed. + +To support mutable argument variables, we need to also make allocas for +them. The code for this is also pretty simple: + +.. code-block:: ocaml + + (* Create an alloca for each argument and register the argument in the symbol + * table so that references to it will succeed. *) + let create_argument_allocas the_function proto = + let args = match proto with + | Ast.Prototype (_, args) | Ast.BinOpPrototype (_, args, _) -> args + in + Array.iteri (fun i ai -> + let var_name = args.(i) in + (* Create an alloca for this variable. *) + let alloca = create_entry_block_alloca the_function var_name in + + (* Store the initial value into the alloca. *) + ignore(build_store ai alloca builder); + + (* Add arguments to variable symbol table. *) + Hashtbl.add named_values var_name alloca; + ) (params the_function) + +For each argument, we make an alloca, store the input value to the +function into the alloca, and register the alloca as the memory location +for the argument. This method gets invoked by ``Codegen.codegen_func`` +right after it sets up the entry block for the function. + +The final missing piece is adding the mem2reg pass, which allows us to +get good codegen once again: + +.. code-block:: ocaml + + let main () = + ... + let the_fpm = PassManager.create_function Codegen.the_module in + + (* Set up the optimizer pipeline. Start with registering info about how the + * target lays out data structures. *) + DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; + + (* Promote allocas to registers. *) + add_memory_to_register_promotion the_fpm; + + (* Do simple "peephole" optimizations and bit-twiddling optzn. *) + add_instruction_combining the_fpm; + + (* reassociate expressions. *) + add_reassociation the_fpm; + +It is interesting to see what the code looks like before and after the +mem2reg optimization runs. For example, this is the before/after code +for our recursive fib function. Before the optimization: + +.. code-block:: llvm + + define double @fib(double %x) { + entry: + %x1 = alloca double + store double %x, double* %x1 + %x2 = load double* %x1 + %cmptmp = fcmp ult double %x2, 3.000000e+00 + %booltmp = uitofp i1 %cmptmp to double + %ifcond = fcmp one double %booltmp, 0.000000e+00 + br i1 %ifcond, label %then, label %else + + then: ; preds = %entry + br label %ifcont + + else: ; preds = %entry + %x3 = load double* %x1 + %subtmp = fsub double %x3, 1.000000e+00 + %calltmp = call double @fib(double %subtmp) + %x4 = load double* %x1 + %subtmp5 = fsub double %x4, 2.000000e+00 + %calltmp6 = call double @fib(double %subtmp5) + %addtmp = fadd double %calltmp, %calltmp6 + br label %ifcont + + ifcont: ; preds = %else, %then + %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] + ret double %iftmp + } + +Here there is only one variable (x, the input argument) but you can +still see the extremely simple-minded code generation strategy we are +using. In the entry block, an alloca is created, and the initial input +value is stored into it. Each reference to the variable does a reload +from the stack. Also, note that we didn't modify the if/then/else +expression, so it still inserts a PHI node. While we could make an +alloca for it, it is actually easier to create a PHI node for it, so we +still just make the PHI. 
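+
+If you would like to reproduce this before/after comparison yourself, a
+small debugging helper along the following lines (hypothetical, not part
+of the tutorial sources; it assumes the ``the_fpm`` pass manager created
+in the driver snippet above) dumps the function on both sides of the
+pass-manager run:
+
+.. code-block:: ocaml
+
+    open Llvm
+
+    (* Dump a function, run the function pass manager over it, then dump
+     * it again, so the pre- and post-optimization IR can be compared. *)
+    let dump_before_and_after the_fpm the_function =
+      dump_value the_function;
+      ignore (PassManager.run_function the_function the_fpm);
+      dump_value the_function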
+ +Here is the code after the mem2reg pass runs: + +.. code-block:: llvm + + define double @fib(double %x) { + entry: + %cmptmp = fcmp ult double %x, 3.000000e+00 + %booltmp = uitofp i1 %cmptmp to double + %ifcond = fcmp one double %booltmp, 0.000000e+00 + br i1 %ifcond, label %then, label %else + + then: + br label %ifcont + + else: + %subtmp = fsub double %x, 1.000000e+00 + %calltmp = call double @fib(double %subtmp) + %subtmp5 = fsub double %x, 2.000000e+00 + %calltmp6 = call double @fib(double %subtmp5) + %addtmp = fadd double %calltmp, %calltmp6 + br label %ifcont + + ifcont: ; preds = %else, %then + %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] + ret double %iftmp + } + +This is a trivial case for mem2reg, since there are no redefinitions of +the variable. The point of showing this is to calm your tension about +inserting such blatent inefficiencies :). + +After the rest of the optimizers run, we get: + +.. code-block:: llvm + + define double @fib(double %x) { + entry: + %cmptmp = fcmp ult double %x, 3.000000e+00 + %booltmp = uitofp i1 %cmptmp to double + %ifcond = fcmp ueq double %booltmp, 0.000000e+00 + br i1 %ifcond, label %else, label %ifcont + + else: + %subtmp = fsub double %x, 1.000000e+00 + %calltmp = call double @fib(double %subtmp) + %subtmp5 = fsub double %x, 2.000000e+00 + %calltmp6 = call double @fib(double %subtmp5) + %addtmp = fadd double %calltmp, %calltmp6 + ret double %addtmp + + ifcont: + ret double 1.000000e+00 + } + +Here we see that the simplifycfg pass decided to clone the return +instruction into the end of the 'else' block. This allowed it to +eliminate some branches and the PHI node. + +Now that all symbol table references are updated to use stack variables, +we'll add the assignment operator. + +New Assignment Operator +======================= + +With our current framework, adding a new assignment operator is really +simple. We will parse it just like any other binary operator, but handle +it internally (instead of allowing the user to define it). The first +step is to set a precedence: + +.. code-block:: ocaml + + let main () = + (* Install standard binary operators. + * 1 is the lowest precedence. *) + Hashtbl.add Parser.binop_precedence '=' 2; + Hashtbl.add Parser.binop_precedence '<' 10; + Hashtbl.add Parser.binop_precedence '+' 20; + Hashtbl.add Parser.binop_precedence '-' 20; + ... + +Now that the parser knows the precedence of the binary operator, it +takes care of all the parsing and AST generation. We just need to +implement codegen for the assignment operator. This looks like: + +.. code-block:: ocaml + + let rec codegen_expr = function + begin match op with + | '=' -> + (* Special case '=' because we don't want to emit the LHS as an + * expression. *) + let name = + match lhs with + | Ast.Variable name -> name + | _ -> raise (Error "destination of '=' must be a variable") + in + +Unlike the rest of the binary operators, our assignment operator doesn't +follow the "emit LHS, emit RHS, do computation" model. As such, it is +handled as a special case before the other binary operators are handled. +The other strange thing is that it requires the LHS to be a variable. It +is invalid to have "(x+1) = expr" - only things like "x = expr" are +allowed. + +.. code-block:: ocaml + + (* Codegen the rhs. *) + let val_ = codegen_expr rhs in + + (* Lookup the name. 
*) + let variable = try Hashtbl.find named_values name with + | Not_found -> raise (Error "unknown variable name") + in + ignore(build_store val_ variable builder); + val_ + | _ -> + ... + +Once we have the variable, codegen'ing the assignment is +straightforward: we emit the RHS of the assignment, create a store, and +return the computed value. Returning a value allows for chained +assignments like "X = (Y = Z)". + +Now that we have an assignment operator, we can mutate loop variables +and arguments. For example, we can now run code like this: + +:: + + # Function to print a double. + extern printd(x); + + # Define ':' for sequencing: as a low-precedence operator that ignores operands + # and just returns the RHS. + def binary : 1 (x y) y; + + def test(x) + printd(x) : + x = 4 : + printd(x); + + test(123); + +When run, this example prints "123" and then "4", showing that we did +actually mutate the value! Okay, we have now officially implemented our +goal: getting this to work requires SSA construction in the general +case. However, to be really useful, we want the ability to define our +own local variables, lets add this next! + +User-defined Local Variables +============================ + +Adding var/in is just like any other other extensions we made to +Kaleidoscope: we extend the lexer, the parser, the AST and the code +generator. The first step for adding our new 'var/in' construct is to +extend the lexer. As before, this is pretty trivial, the code looks like +this: + +.. code-block:: ocaml + + type token = + ... + (* var definition *) + | Var + + ... + + and lex_ident buffer = parser + ... + | "in" -> [< 'Token.In; stream >] + | "binary" -> [< 'Token.Binary; stream >] + | "unary" -> [< 'Token.Unary; stream >] + | "var" -> [< 'Token.Var; stream >] + ... + +The next step is to define the AST node that we will construct. For +var/in, it looks like this: + +.. code-block:: ocaml + + type expr = + ... + (* variant for var/in. *) + | Var of (string * expr option) array * expr + ... + +var/in allows a list of names to be defined all at once, and each name +can optionally have an initializer value. As such, we capture this +information in the VarNames vector. Also, var/in has a body, this body +is allowed to access the variables defined by the var/in. + +With this in place, we can define the parser pieces. The first thing we +do is add it as a primary expression: + +.. code-block:: ocaml + + (* primary + * ::= identifier + * ::= numberexpr + * ::= parenexpr + * ::= ifexpr + * ::= forexpr + * ::= varexpr *) + let rec parse_primary = parser + ... + (* varexpr + * ::= 'var' identifier ('=' expression? + * (',' identifier ('=' expression)?)* 'in' expression *) + | [< 'Token.Var; + (* At least one variable name is required. *) + 'Token.Ident id ?? "expected identifier after var"; + init=parse_var_init; + var_names=parse_var_names [(id, init)]; + (* At this point, we have to have 'in'. *) + 'Token.In ?? "expected 'in' keyword after 'var'"; + body=parse_expr >] -> + Ast.Var (Array.of_list (List.rev var_names), body) + + ... + + and parse_var_init = parser + (* read in the optional initializer. *) + | [< 'Token.Kwd '='; e=parse_expr >] -> Some e + | [< >] -> None + + and parse_var_names accumulator = parser + | [< 'Token.Kwd ','; + 'Token.Ident id ?? "expected identifier list after var"; + init=parse_var_init; + e=parse_var_names ((id, init) :: accumulator) >] -> e + | [< >] -> accumulator + +Now that we can parse and represent the code, we need to support +emission of LLVM IR for it. 
This code starts out with: + +.. code-block:: ocaml + + let rec codegen_expr = function + ... + | Ast.Var (var_names, body) + let old_bindings = ref [] in + + let the_function = block_parent (insertion_block builder) in + + (* Register all variables and emit their initializer. *) + Array.iter (fun (var_name, init) -> + +Basically it loops over all the variables, installing them one at a +time. For each variable we put into the symbol table, we remember the +previous value that we replace in OldBindings. + +.. code-block:: ocaml + + (* Emit the initializer before adding the variable to scope, this + * prevents the initializer from referencing the variable itself, and + * permits stuff like this: + * var a = 1 in + * var a = a in ... # refers to outer 'a'. *) + let init_val = + match init with + | Some init -> codegen_expr init + (* If not specified, use 0.0. *) + | None -> const_float double_type 0.0 + in + + let alloca = create_entry_block_alloca the_function var_name in + ignore(build_store init_val alloca builder); + + (* Remember the old variable binding so that we can restore the binding + * when we unrecurse. *) + + begin + try + let old_value = Hashtbl.find named_values var_name in + old_bindings := (var_name, old_value) :: !old_bindings; + with Not_found > () + end; + + (* Remember this binding. *) + Hashtbl.add named_values var_name alloca; + ) var_names; + +There are more comments here than code. The basic idea is that we emit +the initializer, create the alloca, then update the symbol table to +point to it. Once all the variables are installed in the symbol table, +we evaluate the body of the var/in expression: + +.. code-block:: ocaml + + (* Codegen the body, now that all vars are in scope. *) + let body_val = codegen_expr body in + +Finally, before returning, we restore the previous variable bindings: + +.. code-block:: ocaml + + (* Pop all our variables from scope. *) + List.iter (fun (var_name, old_value) -> + Hashtbl.add named_values var_name old_value + ) !old_bindings; + + (* Return the body computation. *) + body_val + +The end result of all of this is that we get properly scoped variable +definitions, and we even (trivially) allow mutation of them :). + +With this, we completed what we set out to do. Our nice iterative fib +example from the intro compiles and runs just fine. The mem2reg pass +optimizes all of our stack variables into SSA registers, inserting PHI +nodes where needed, and our front-end remains simple: no "iterated +dominance frontier" computation anywhere in sight. + +Full Code Listing +================= + +Here is the complete code listing for our running example, enhanced with +mutable variables and var/in support. To build this example, use: + +.. code-block:: bash + + # Compile + ocamlbuild toy.byte + # Run + ./toy.byte + +Here is the code: + +\_tags: + :: + + <{lexer,parser}.ml>: use_camlp4, pp(camlp4of) + <*.{byte,native}>: g++, use_llvm, use_llvm_analysis + <*.{byte,native}>: use_llvm_executionengine, use_llvm_target + <*.{byte,native}>: use_llvm_scalar_opts, use_bindings + +myocamlbuild.ml: + .. code-block:: ocaml + + open Ocamlbuild_plugin;; + + ocaml_lib ~extern:true "llvm";; + ocaml_lib ~extern:true "llvm_analysis";; + ocaml_lib ~extern:true "llvm_executionengine";; + ocaml_lib ~extern:true "llvm_target";; + ocaml_lib ~extern:true "llvm_scalar_opts";; + + flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"; A"-cclib"; A"-rdynamic"]);; + dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; + +token.ml: + .. 
code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Lexer Tokens + *===----------------------------------------------------------------------===*) + + (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of + * these others for known things. *) + type token = + (* commands *) + | Def | Extern + + (* primary *) + | Ident of string | Number of float + + (* unknown *) + | Kwd of char + + (* control *) + | If | Then | Else + | For | In + + (* operators *) + | Binary | Unary + + (* var definition *) + | Var + +lexer.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Lexer + *===----------------------------------------------------------------------===*) + + let rec lex = parser + (* Skip any whitespace. *) + | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream + + (* identifier: [a-zA-Z][a-zA-Z0-9] *) + | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> + let buffer = Buffer.create 1 in + Buffer.add_char buffer c; + lex_ident buffer stream + + (* number: [0-9.]+ *) + | [< ' ('0' .. '9' as c); stream >] -> + let buffer = Buffer.create 1 in + Buffer.add_char buffer c; + lex_number buffer stream + + (* Comment until end of line. *) + | [< ' ('#'); stream >] -> + lex_comment stream + + (* Otherwise, just return the character as its ascii value. *) + | [< 'c; stream >] -> + [< 'Token.Kwd c; lex stream >] + + (* end of stream. *) + | [< >] -> [< >] + + and lex_number buffer = parser + | [< ' ('0' .. '9' | '.' as c); stream >] -> + Buffer.add_char buffer c; + lex_number buffer stream + | [< stream=lex >] -> + [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] + + and lex_ident buffer = parser + | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> + Buffer.add_char buffer c; + lex_ident buffer stream + | [< stream=lex >] -> + match Buffer.contents buffer with + | "def" -> [< 'Token.Def; stream >] + | "extern" -> [< 'Token.Extern; stream >] + | "if" -> [< 'Token.If; stream >] + | "then" -> [< 'Token.Then; stream >] + | "else" -> [< 'Token.Else; stream >] + | "for" -> [< 'Token.For; stream >] + | "in" -> [< 'Token.In; stream >] + | "binary" -> [< 'Token.Binary; stream >] + | "unary" -> [< 'Token.Unary; stream >] + | "var" -> [< 'Token.Var; stream >] + | id -> [< 'Token.Ident id; stream >] + + and lex_comment = parser + | [< ' ('\n'); stream=lex >] -> stream + | [< 'c; e=lex_comment >] -> e + | [< >] -> [< >] + +ast.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Abstract Syntax Tree (aka Parse Tree) + *===----------------------------------------------------------------------===*) + + (* expr - Base type for all expression nodes. *) + type expr = + (* variant for numeric literals like "1.0". *) + | Number of float + + (* variant for referencing a variable, like "a". *) + | Variable of string + + (* variant for a unary operator. *) + | Unary of char * expr + + (* variant for a binary operator. *) + | Binary of char * expr * expr + + (* variant for function calls. *) + | Call of string * expr array + + (* variant for if/then/else. *) + | If of expr * expr * expr + + (* variant for for/in. *) + | For of string * expr * expr * expr option * expr + + (* variant for var/in. 
*) + | Var of (string * expr option) array * expr + + (* proto - This type represents the "prototype" for a function, which captures + * its name, and its argument names (thus implicitly the number of arguments the + * function takes). *) + type proto = + | Prototype of string * string array + | BinOpPrototype of string * string array * int + + (* func - This type represents a function definition itself. *) + type func = Function of proto * expr + +parser.ml: + .. code-block:: ocaml + + (*===---------------------------------------------------------------------=== + * Parser + *===---------------------------------------------------------------------===*) + + (* binop_precedence - This holds the precedence for each binary operator that is + * defined *) + let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 + + (* precedence - Get the precedence of the pending binary operator token. *) + let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 + + (* primary + * ::= identifier + * ::= numberexpr + * ::= parenexpr + * ::= ifexpr + * ::= forexpr + * ::= varexpr *) + let rec parse_primary = parser + (* numberexpr ::= number *) + | [< 'Token.Number n >] -> Ast.Number n + + (* parenexpr ::= '(' expression ')' *) + | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e + + (* identifierexpr + * ::= identifier + * ::= identifier '(' argumentexpr ')' *) + | [< 'Token.Ident id; stream >] -> + let rec parse_args accumulator = parser + | [< e=parse_expr; stream >] -> + begin parser + | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e + | [< >] -> e :: accumulator + end stream + | [< >] -> accumulator + in + let rec parse_ident id = parser + (* Call. *) + | [< 'Token.Kwd '('; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')'">] -> + Ast.Call (id, Array.of_list (List.rev args)) + + (* Simple variable ref. *) + | [< >] -> Ast.Variable id + in + parse_ident id stream + + (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) + | [< 'Token.If; c=parse_expr; + 'Token.Then ?? "expected 'then'"; t=parse_expr; + 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> + Ast.If (c, t, e) + + (* forexpr + ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) + | [< 'Token.For; + 'Token.Ident id ?? "expected identifier after for"; + 'Token.Kwd '=' ?? "expected '=' after for"; + stream >] -> + begin parser + | [< + start=parse_expr; + 'Token.Kwd ',' ?? "expected ',' after for"; + end_=parse_expr; + stream >] -> + let step = + begin parser + | [< 'Token.Kwd ','; step=parse_expr >] -> Some step + | [< >] -> None + end stream + in + begin parser + | [< 'Token.In; body=parse_expr >] -> + Ast.For (id, start, end_, step, body) + | [< >] -> + raise (Stream.Error "expected 'in' after for") + end stream + | [< >] -> + raise (Stream.Error "expected '=' after for") + end stream + + (* varexpr + * ::= 'var' identifier ('=' expression? + * (',' identifier ('=' expression)?)* 'in' expression *) + | [< 'Token.Var; + (* At least one variable name is required. *) + 'Token.Ident id ?? "expected identifier after var"; + init=parse_var_init; + var_names=parse_var_names [(id, init)]; + (* At this point, we have to have 'in'. *) + 'Token.In ?? "expected 'in' keyword after 'var'"; + body=parse_expr >] -> + Ast.Var (Array.of_list (List.rev var_names), body) + + | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") + + (* unary + * ::= primary + * ::= '!' unary *) + and parse_unary = parser + (* If this is a unary operator, read it. 
*) + | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] -> + Ast.Unary (op, operand) + + (* If the current token is not an operator, it must be a primary expr. *) + | [< stream >] -> parse_primary stream + + (* binoprhs + * ::= ('+' primary)* *) + and parse_bin_rhs expr_prec lhs stream = + match Stream.peek stream with + (* If this is a binop, find its precedence. *) + | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> + let token_prec = precedence c in + + (* If this is a binop that binds at least as tightly as the current binop, + * consume it, otherwise we are done. *) + if token_prec < expr_prec then lhs else begin + (* Eat the binop. *) + Stream.junk stream; + + (* Parse the primary expression after the binary operator. *) + let rhs = parse_unary stream in + + (* Okay, we know this is a binop. *) + let rhs = + match Stream.peek stream with + | Some (Token.Kwd c2) -> + (* If BinOp binds less tightly with rhs than the operator after + * rhs, let the pending operator take rhs as its lhs. *) + let next_prec = precedence c2 in + if token_prec < next_prec + then parse_bin_rhs (token_prec + 1) rhs stream + else rhs + | _ -> rhs + in + + (* Merge lhs/rhs. *) + let lhs = Ast.Binary (c, lhs, rhs) in + parse_bin_rhs expr_prec lhs stream + end + | _ -> lhs + + and parse_var_init = parser + (* read in the optional initializer. *) + | [< 'Token.Kwd '='; e=parse_expr >] -> Some e + | [< >] -> None + + and parse_var_names accumulator = parser + | [< 'Token.Kwd ','; + 'Token.Ident id ?? "expected identifier list after var"; + init=parse_var_init; + e=parse_var_names ((id, init) :: accumulator) >] -> e + | [< >] -> accumulator + + (* expression + * ::= primary binoprhs *) + and parse_expr = parser + | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream + + (* prototype + * ::= id '(' id* ')' + * ::= binary LETTER number? (id, id) + * ::= unary LETTER number? (id) *) + let parse_prototype = + let rec parse_args accumulator = parser + | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e + | [< >] -> accumulator + in + let parse_operator = parser + | [< 'Token.Unary >] -> "unary", 1 + | [< 'Token.Binary >] -> "binary", 2 + in + let parse_binary_precedence = parser + | [< 'Token.Number n >] -> int_of_float n + | [< >] -> 30 + in + parser + | [< 'Token.Ident id; + 'Token.Kwd '(' ?? "expected '(' in prototype"; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> + (* success. *) + Ast.Prototype (id, Array.of_list (List.rev args)) + | [< (prefix, kind)=parse_operator; + 'Token.Kwd op ?? "expected an operator"; + (* Read the precedence if present. *) + binary_precedence=parse_binary_precedence; + 'Token.Kwd '(' ?? "expected '(' in prototype"; + args=parse_args []; + 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> + let name = prefix ^ (String.make 1 op) in + let args = Array.of_list (List.rev args) in + + (* Verify right number of arguments for operator. *) + if Array.length args != kind + then raise (Stream.Error "invalid number of operands for operator") + else + if kind == 1 then + Ast.Prototype (name, args) + else + Ast.BinOpPrototype (name, args, binary_precedence) + | [< >] -> + raise (Stream.Error "expected function name in prototype") + + (* definition ::= 'def' prototype expression *) + let parse_definition = parser + | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> + Ast.Function (p, e) + + (* toplevelexpr ::= expression *) + let parse_toplevel = parser + | [< e=parse_expr >] -> + (* Make an anonymous proto. 
*) + Ast.Function (Ast.Prototype ("", [||]), e) + + (* external ::= 'extern' prototype *) + let parse_extern = parser + | [< 'Token.Extern; e=parse_prototype >] -> e + +codegen.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Code Generation + *===----------------------------------------------------------------------===*) + + open Llvm + + exception Error of string + + let context = global_context () + let the_module = create_module context "my cool jit" + let builder = builder context + let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 + let double_type = double_type context + + (* Create an alloca instruction in the entry block of the function. This + * is used for mutable variables etc. *) + let create_entry_block_alloca the_function var_name = + let builder = builder_at context (instr_begin (entry_block the_function)) in + build_alloca double_type var_name builder + + let rec codegen_expr = function + | Ast.Number n -> const_float double_type n + | Ast.Variable name -> + let v = try Hashtbl.find named_values name with + | Not_found -> raise (Error "unknown variable name") + in + (* Load the value. *) + build_load v name builder + | Ast.Unary (op, operand) -> + let operand = codegen_expr operand in + let callee = "unary" ^ (String.make 1 op) in + let callee = + match lookup_function callee the_module with + | Some callee -> callee + | None -> raise (Error "unknown unary operator") + in + build_call callee [|operand|] "unop" builder + | Ast.Binary (op, lhs, rhs) -> + begin match op with + | '=' -> + (* Special case '=' because we don't want to emit the LHS as an + * expression. *) + let name = + match lhs with + | Ast.Variable name -> name + | _ -> raise (Error "destination of '=' must be a variable") + in + + (* Codegen the rhs. *) + let val_ = codegen_expr rhs in + + (* Lookup the name. *) + let variable = try Hashtbl.find named_values name with + | Not_found -> raise (Error "unknown variable name") + in + ignore(build_store val_ variable builder); + val_ + | _ -> + let lhs_val = codegen_expr lhs in + let rhs_val = codegen_expr rhs in + begin + match op with + | '+' -> build_add lhs_val rhs_val "addtmp" builder + | '-' -> build_sub lhs_val rhs_val "subtmp" builder + | '*' -> build_mul lhs_val rhs_val "multmp" builder + | '<' -> + (* Convert bool 0/1 to double 0.0 or 1.0 *) + let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in + build_uitofp i double_type "booltmp" builder + | _ -> + (* If it wasn't a builtin binary operator, it must be a user defined + * one. Emit a call to it. *) + let callee = "binary" ^ (String.make 1 op) in + let callee = + match lookup_function callee the_module with + | Some callee -> callee + | None -> raise (Error "binary operator not found!") + in + build_call callee [|lhs_val; rhs_val|] "binop" builder + end + end + | Ast.Call (callee, args) -> + (* Look up the name in the module table. *) + let callee = + match lookup_function callee the_module with + | Some callee -> callee + | None -> raise (Error "unknown function referenced") + in + let params = params callee in + + (* If argument mismatch error. 
*) + if Array.length params == Array.length args then () else + raise (Error "incorrect # arguments passed"); + let args = Array.map codegen_expr args in + build_call callee args "calltmp" builder + | Ast.If (cond, then_, else_) -> + let cond = codegen_expr cond in + + (* Convert condition to a bool by comparing equal to 0.0 *) + let zero = const_float double_type 0.0 in + let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in + + (* Grab the first block so that we might later add the conditional branch + * to it at the end of the function. *) + let start_bb = insertion_block builder in + let the_function = block_parent start_bb in + + let then_bb = append_block context "then" the_function in + + (* Emit 'then' value. *) + position_at_end then_bb builder; + let then_val = codegen_expr then_ in + + (* Codegen of 'then' can change the current block, update then_bb for the + * phi. We create a new name because one is used for the phi node, and the + * other is used for the conditional branch. *) + let new_then_bb = insertion_block builder in + + (* Emit 'else' value. *) + let else_bb = append_block context "else" the_function in + position_at_end else_bb builder; + let else_val = codegen_expr else_ in + + (* Codegen of 'else' can change the current block, update else_bb for the + * phi. *) + let new_else_bb = insertion_block builder in + + (* Emit merge block. *) + let merge_bb = append_block context "ifcont" the_function in + position_at_end merge_bb builder; + let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in + let phi = build_phi incoming "iftmp" builder in + + (* Return to the start block to add the conditional branch. *) + position_at_end start_bb builder; + ignore (build_cond_br cond_val then_bb else_bb builder); + + (* Set a unconditional branch at the end of the 'then' block and the + * 'else' block to the 'merge' block. *) + position_at_end new_then_bb builder; ignore (build_br merge_bb builder); + position_at_end new_else_bb builder; ignore (build_br merge_bb builder); + + (* Finally, set the builder to the end of the merge block. *) + position_at_end merge_bb builder; + + phi + | Ast.For (var_name, start, end_, step, body) -> + (* Output this as: + * var = alloca double + * ... + * start = startexpr + * store start -> var + * goto loop + * loop: + * ... + * bodyexpr + * ... + * loopend: + * step = stepexpr + * endcond = endexpr + * + * curvar = load var + * nextvar = curvar + step + * store nextvar -> var + * br endcond, loop, endloop + * outloop: *) + + let the_function = block_parent (insertion_block builder) in + + (* Create an alloca for the variable in the entry block. *) + let alloca = create_entry_block_alloca the_function var_name in + + (* Emit the start code first, without 'variable' in scope. *) + let start_val = codegen_expr start in + + (* Store the value into the alloca. *) + ignore(build_store start_val alloca builder); + + (* Make the new basic block for the loop header, inserting after current + * block. *) + let loop_bb = append_block context "loop" the_function in + + (* Insert an explicit fall through from the current block to the + * loop_bb. *) + ignore (build_br loop_bb builder); + + (* Start insertion in loop_bb. *) + position_at_end loop_bb builder; + + (* Within the loop, the variable is defined equal to the PHI node. If it + * shadows an existing variable, we have to restore it, so save it + * now. 
*) + let old_val = + try Some (Hashtbl.find named_values var_name) with Not_found -> None + in + Hashtbl.add named_values var_name alloca; + + (* Emit the body of the loop. This, like any other expr, can change the + * current BB. Note that we ignore the value computed by the body, but + * don't allow an error *) + ignore (codegen_expr body); + + (* Emit the step value. *) + let step_val = + match step with + | Some step -> codegen_expr step + (* If not specified, use 1.0. *) + | None -> const_float double_type 1.0 + in + + (* Compute the end condition. *) + let end_cond = codegen_expr end_ in + + (* Reload, increment, and restore the alloca. This handles the case where + * the body of the loop mutates the variable. *) + let cur_var = build_load alloca var_name builder in + let next_var = build_add cur_var step_val "nextvar" builder in + ignore(build_store next_var alloca builder); + + (* Convert condition to a bool by comparing equal to 0.0. *) + let zero = const_float double_type 0.0 in + let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in + + (* Create the "after loop" block and insert it. *) + let after_bb = append_block context "afterloop" the_function in + + (* Insert the conditional branch into the end of loop_end_bb. *) + ignore (build_cond_br end_cond loop_bb after_bb builder); + + (* Any new code will be inserted in after_bb. *) + position_at_end after_bb builder; + + (* Restore the unshadowed variable. *) + begin match old_val with + | Some old_val -> Hashtbl.add named_values var_name old_val + | None -> () + end; + + (* for expr always returns 0.0. *) + const_null double_type + | Ast.Var (var_names, body) -> + let old_bindings = ref [] in + + let the_function = block_parent (insertion_block builder) in + + (* Register all variables and emit their initializer. *) + Array.iter (fun (var_name, init) -> + (* Emit the initializer before adding the variable to scope, this + * prevents the initializer from referencing the variable itself, and + * permits stuff like this: + * var a = 1 in + * var a = a in ... # refers to outer 'a'. *) + let init_val = + match init with + | Some init -> codegen_expr init + (* If not specified, use 0.0. *) + | None -> const_float double_type 0.0 + in + + let alloca = create_entry_block_alloca the_function var_name in + ignore(build_store init_val alloca builder); + + (* Remember the old variable binding so that we can restore the binding + * when we unrecurse. *) + begin + try + let old_value = Hashtbl.find named_values var_name in + old_bindings := (var_name, old_value) :: !old_bindings; + with Not_found -> () + end; + + (* Remember this binding. *) + Hashtbl.add named_values var_name alloca; + ) var_names; + + (* Codegen the body, now that all vars are in scope. *) + let body_val = codegen_expr body in + + (* Pop all our variables from scope. *) + List.iter (fun (var_name, old_value) -> + Hashtbl.add named_values var_name old_value + ) !old_bindings; + + (* Return the body computation. *) + body_val + + let codegen_proto = function + | Ast.Prototype (name, args) | Ast.BinOpPrototype (name, args, _) -> + (* Make the function type: double(double,double) etc. *) + let doubles = Array.make (Array.length args) double_type in + let ft = function_type double_type doubles in + let f = + match lookup_function name the_module with + | None -> declare_function name ft the_module + + (* If 'f' conflicted, there was already something named 'name'. If it + * has a body, don't allow redefinition or reextern. 
*) + | Some f -> + (* If 'f' already has a body, reject this. *) + if block_begin f <> At_end f then + raise (Error "redefinition of function"); + + (* If 'f' took a different number of arguments, reject. *) + if element_type (type_of f) <> ft then + raise (Error "redefinition of function with different # args"); + f + in + + (* Set names for all arguments. *) + Array.iteri (fun i a -> + let n = args.(i) in + set_value_name n a; + Hashtbl.add named_values n a; + ) (params f); + f + + (* Create an alloca for each argument and register the argument in the symbol + * table so that references to it will succeed. *) + let create_argument_allocas the_function proto = + let args = match proto with + | Ast.Prototype (_, args) | Ast.BinOpPrototype (_, args, _) -> args + in + Array.iteri (fun i ai -> + let var_name = args.(i) in + (* Create an alloca for this variable. *) + let alloca = create_entry_block_alloca the_function var_name in + + (* Store the initial value into the alloca. *) + ignore(build_store ai alloca builder); + + (* Add arguments to variable symbol table. *) + Hashtbl.add named_values var_name alloca; + ) (params the_function) + + let codegen_func the_fpm = function + | Ast.Function (proto, body) -> + Hashtbl.clear named_values; + let the_function = codegen_proto proto in + + (* If this is an operator, install it. *) + begin match proto with + | Ast.BinOpPrototype (name, args, prec) -> + let op = name.[String.length name - 1] in + Hashtbl.add Parser.binop_precedence op prec; + | _ -> () + end; + + (* Create a new basic block to start insertion into. *) + let bb = append_block context "entry" the_function in + position_at_end bb builder; + + try + (* Add all arguments to the symbol table and create their allocas. *) + create_argument_allocas the_function proto; + + let ret_val = codegen_expr body in + + (* Finish off the function. *) + let _ = build_ret ret_val builder in + + (* Validate the generated code, checking for consistency. *) + Llvm_analysis.assert_valid_function the_function; + + (* Optimize the function. *) + let _ = PassManager.run_function the_function the_fpm in + + the_function + with e -> + delete_function the_function; + raise e + +toplevel.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Top-Level parsing and JIT Driver + *===----------------------------------------------------------------------===*) + + open Llvm + open Llvm_executionengine + + (* top ::= definition | external | expression | ';' *) + let rec main_loop the_fpm the_execution_engine stream = + match Stream.peek stream with + | None -> () + + (* ignore top-level semicolons. *) + | Some (Token.Kwd ';') -> + Stream.junk stream; + main_loop the_fpm the_execution_engine stream + + | Some token -> + begin + try match token with + | Token.Def -> + let e = Parser.parse_definition stream in + print_endline "parsed a function definition."; + dump_value (Codegen.codegen_func the_fpm e); + | Token.Extern -> + let e = Parser.parse_extern stream in + print_endline "parsed an extern."; + dump_value (Codegen.codegen_proto e); + | _ -> + (* Evaluate a top-level expression into an anonymous function. *) + let e = Parser.parse_toplevel stream in + print_endline "parsed a top-level expr"; + let the_function = Codegen.codegen_func the_fpm e in + dump_value the_function; + + (* JIT the function, returning a function pointer. 
*) + let result = ExecutionEngine.run_function the_function [||] + the_execution_engine in + + print_string "Evaluated to "; + print_float (GenericValue.as_float Codegen.double_type result); + print_newline (); + with Stream.Error s | Codegen.Error s -> + (* Skip token for error recovery. *) + Stream.junk stream; + print_endline s; + end; + print_string "ready> "; flush stdout; + main_loop the_fpm the_execution_engine stream + +toy.ml: + .. code-block:: ocaml + + (*===----------------------------------------------------------------------=== + * Main driver code. + *===----------------------------------------------------------------------===*) + + open Llvm + open Llvm_executionengine + open Llvm_target + open Llvm_scalar_opts + + let main () = + ignore (initialize_native_target ()); + + (* Install standard binary operators. + * 1 is the lowest precedence. *) + Hashtbl.add Parser.binop_precedence '=' 2; + Hashtbl.add Parser.binop_precedence '<' 10; + Hashtbl.add Parser.binop_precedence '+' 20; + Hashtbl.add Parser.binop_precedence '-' 20; + Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) + + (* Prime the first token. *) + print_string "ready> "; flush stdout; + let stream = Lexer.lex (Stream.of_channel stdin) in + + (* Create the JIT. *) + let the_execution_engine = ExecutionEngine.create Codegen.the_module in + let the_fpm = PassManager.create_function Codegen.the_module in + + (* Set up the optimizer pipeline. Start with registering info about how the + * target lays out data structures. *) + DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; + + (* Promote allocas to registers. *) + add_memory_to_register_promotion the_fpm; + + (* Do simple "peephole" optimizations and bit-twiddling optzn. *) + add_instruction_combination the_fpm; + + (* reassociate expressions. *) + add_reassociation the_fpm; + + (* Eliminate Common SubExpressions. *) + add_gvn the_fpm; + + (* Simplify the control flow graph (deleting unreachable blocks, etc). *) + add_cfg_simplification the_fpm; + + ignore (PassManager.initialize the_fpm); + + (* Run the main "interpreter loop" now. *) + Toplevel.main_loop the_fpm the_execution_engine stream; + + (* Print out all the generated code. *) + dump_module Codegen.the_module + ;; + + main () + +bindings.c + .. code-block:: c + + #include <stdio.h> + + /* putchard - putchar that takes a double and returns 0. */ + extern double putchard(double X) { + putchar((char)X); + return 0; + } + + /* printd - printf that takes a double prints it as "%f\n", returning 0. 
*/ + extern double printd(double X) { + printf("%f\n", X); + return 0; + } + +`Next: Conclusion and other useful LLVM tidbits <OCamlLangImpl8.html>`_ + diff --git a/docs/tutorial/OCamlLangImpl8.html b/docs/tutorial/OCamlLangImpl8.html deleted file mode 100644 index 7c1a500a21..0000000000 --- a/docs/tutorial/OCamlLangImpl8.html +++ /dev/null @@ -1,359 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Conclusion and other useful LLVM tidbits</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <link rel="stylesheet" href="../_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>Kaleidoscope: Conclusion and other useful LLVM tidbits</h1> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 8 - <ol> - <li><a href="#conclusion">Tutorial Conclusion</a></li> - <li><a href="#llvmirproperties">Properties of LLVM IR</a> - <ul> - <li><a href="#targetindep">Target Independence</a></li> - <li><a href="#safety">Safety Guarantees</a></li> - <li><a href="#langspecific">Language-Specific Optimizations</a></li> - </ul> - </li> - <li><a href="#tipsandtricks">Tips and Tricks</a> - <ul> - <li><a href="#offsetofsizeof">Implementing portable - offsetof/sizeof</a></li> - <li><a href="#gcstack">Garbage Collected Stack Frames</a></li> - </ul> - </li> - </ol> -</li> -</ul> - - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> -</div> - -<!-- *********************************************************************** --> -<h2><a name="conclusion">Tutorial Conclusion</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>Welcome to the final chapter of the "<a href="index.html">Implementing a -language with LLVM</a>" tutorial. In the course of this tutorial, we have grown -our little Kaleidoscope language from being a useless toy, to being a -semi-interesting (but probably still useless) toy. :)</p> - -<p>It is interesting to see how far we've come, and how little code it has -taken. We built the entire lexer, parser, AST, code generator, and an -interactive run-loop (with a JIT!) by-hand in under 700 lines of -(non-comment/non-blank) code.</p> - -<p>Our little language supports a couple of interesting features: it supports -user defined binary and unary operators, it uses JIT compilation for immediate -evaluation, and it supports a few control flow constructs with SSA construction. -</p> - -<p>Part of the idea of this tutorial was to show you how easy and fun it can be -to define, build, and play with languages. Building a compiler need not be a -scary or mystical process! Now that you've seen some of the basics, I strongly -encourage you to take the code and hack on it. For example, try adding:</p> - -<ul> -<li><b>global variables</b> - While global variables have questional value in -modern software engineering, they are often useful when putting together quick -little hacks like the Kaleidoscope compiler itself. Fortunately, our current -setup makes it very easy to add global variables: just have value lookup check -to see if an unresolved variable is in the global variable symbol table before -rejecting it. To create a new global variable, make an instance of the LLVM -<tt>GlobalVariable</tt> class.</li> - -<li><b>typed variables</b> - Kaleidoscope currently only supports variables of -type double. 
This gives the language a very nice elegance, because only -supporting one type means that you never have to specify types. Different -languages have different ways of handling this. The easiest way is to require -the user to specify types for every variable definition, and record the type -of the variable in the symbol table along with its Value*.</li> - -<li><b>arrays, structs, vectors, etc</b> - Once you add types, you can start -extending the type system in all sorts of interesting ways. Simple arrays are -very easy and are quite useful for many different applications. Adding them is -mostly an exercise in learning how the LLVM <a -href="../LangRef.html#i_getelementptr">getelementptr</a> instruction works: it -is so nifty/unconventional, it <a -href="../GetElementPtr.html">has its own FAQ</a>! If you add support -for recursive types (e.g. linked lists), make sure to read the <a -href="../ProgrammersManual.html#TypeResolve">section in the LLVM -Programmer's Manual</a> that describes how to construct them.</li> - -<li><b>standard runtime</b> - Our current language allows the user to access -arbitrary external functions, and we use it for things like "printd" and -"putchard". As you extend the language to add higher-level constructs, often -these constructs make the most sense if they are lowered to calls into a -language-supplied runtime. For example, if you add hash tables to the language, -it would probably make sense to add the routines to a runtime, instead of -inlining them all the way.</li> - -<li><b>memory management</b> - Currently we can only access the stack in -Kaleidoscope. It would also be useful to be able to allocate heap memory, -either with calls to the standard libc malloc/free interface or with a garbage -collector. If you would like to use garbage collection, note that LLVM fully -supports <a href="../GarbageCollection.html">Accurate Garbage Collection</a> -including algorithms that move objects and need to scan/update the stack.</li> - -<li><b>debugger support</b> - LLVM supports generation of <a -href="../SourceLevelDebugging.html">DWARF Debug info</a> which is understood by -common debuggers like GDB. Adding support for debug info is fairly -straightforward. The best way to understand it is to compile some C/C++ code -with "<tt>llvm-gcc -g -O0</tt>" and taking a look at what it produces.</li> - -<li><b>exception handling support</b> - LLVM supports generation of <a -href="../ExceptionHandling.html">zero cost exceptions</a> which interoperate -with code compiled in other languages. You could also generate code by -implicitly making every function return an error value and checking it. You -could also make explicit use of setjmp/longjmp. There are many different ways -to go here.</li> - -<li><b>object orientation, generics, database access, complex numbers, -geometric programming, ...</b> - Really, there is -no end of crazy features that you can add to the language.</li> - -<li><b>unusual domains</b> - We've been talking about applying LLVM to a domain -that many people are interested in: building a compiler for a specific language. -However, there are many other domains that can use compiler technology that are -not typically considered. For example, LLVM has been used to implement OpenGL -graphics acceleration, translate C++ code to ActionScript, and many other -cute and clever things. Maybe you will be the first to JIT compile a regular -expression interpreter into native code with LLVM?</li> - -</ul> - -<p> -Have fun - try doing something crazy and unusual. 
Building a language like -everyone else always has, is much less fun than trying something a little crazy -or off the wall and seeing how it turns out. If you get stuck or want to talk -about it, feel free to email the <a -href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">llvmdev mailing -list</a>: it has lots of people who are interested in languages and are often -willing to help out. -</p> - -<p>Before we end this tutorial, I want to talk about some "tips and tricks" for generating -LLVM IR. These are some of the more subtle things that may not be obvious, but -are very useful if you want to take advantage of LLVM's capabilities.</p> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="llvmirproperties">Properties of the LLVM IR</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>We have a couple common questions about code in the LLVM IR form - lets just -get these out of the way right now, shall we?</p> - -<!-- ======================================================================= --> -<h4><a name="targetindep">Target Independence</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>Kaleidoscope is an example of a "portable language": any program written in -Kaleidoscope will work the same way on any target that it runs on. Many other -languages have this property, e.g. lisp, java, haskell, javascript, python, etc -(note that while these languages are portable, not all their libraries are).</p> - -<p>One nice aspect of LLVM is that it is often capable of preserving target -independence in the IR: you can take the LLVM IR for a Kaleidoscope-compiled -program and run it on any target that LLVM supports, even emitting C code and -compiling that on targets that LLVM doesn't support natively. You can trivially -tell that the Kaleidoscope compiler generates target-independent code because it -never queries for any target-specific information when generating code.</p> - -<p>The fact that LLVM provides a compact, target-independent, representation for -code gets a lot of people excited. Unfortunately, these people are usually -thinking about C or a language from the C family when they are asking questions -about language portability. I say "unfortunately", because there is really no -way to make (fully general) C code portable, other than shipping the source code -around (and of course, C source code is not actually portable in general -either - ever port a really old application from 32- to 64-bits?).</p> - -<p>The problem with C (again, in its full generality) is that it is heavily -laden with target specific assumptions. As one simple example, the preprocessor -often destructively removes target-independence from the code when it processes -the input text:</p> - -<div class="doc_code"> -<pre> -#ifdef __i386__ - int X = 1; -#else - int X = 42; -#endif -</pre> -</div> - -<p>While it is possible to engineer more and more complex solutions to problems -like this, it cannot be solved in full generality in a way that is better than shipping -the actual source code.</p> - -<p>That said, there are interesting subsets of C that can be made portable. If -you are willing to fix primitive types to a fixed size (say int = 32-bits, -and long = 64-bits), don't care about ABI compatibility with existing binaries, -and are willing to give up some other minor features, you can have portable -code. 
This can make sense for specialized domains such as an -in-kernel language.</p> - -</div> - -<!-- ======================================================================= --> -<h4><a name="safety">Safety Guarantees</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>Many of the languages above are also "safe" languages: it is impossible for -a program written in Java to corrupt its address space and crash the process -(assuming the JVM has no bugs). -Safety is an interesting property that requires a combination of language -design, runtime support, and often operating system support.</p> - -<p>It is certainly possible to implement a safe language in LLVM, but LLVM IR -does not itself guarantee safety. The LLVM IR allows unsafe pointer casts, -use after free bugs, buffer over-runs, and a variety of other problems. Safety -needs to be implemented as a layer on top of LLVM and, conveniently, several -groups have investigated this. Ask on the <a -href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">llvmdev mailing -list</a> if you are interested in more details.</p> - -</div> - -<!-- ======================================================================= --> -<h4><a name="langspecific">Language-Specific Optimizations</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>One thing about LLVM that turns off many people is that it does not solve all -the world's problems in one system (sorry 'world hunger', someone else will have -to solve you some other day). One specific complaint is that people perceive -LLVM as being incapable of performing high-level language-specific optimization: -LLVM "loses too much information".</p> - -<p>Unfortunately, this is really not the place to give you a full and unified -version of "Chris Lattner's theory of compiler design". Instead, I'll make a -few observations:</p> - -<p>First, you're right that LLVM does lose information. For example, as of this -writing, there is no way to distinguish in the LLVM IR whether an SSA-value came -from a C "int" or a C "long" on an ILP32 machine (other than debug info). Both -get compiled down to an 'i32' value and the information about what it came from -is lost. The more general issue here, is that the LLVM type system uses -"structural equivalence" instead of "name equivalence". Another place this -surprises people is if you have two types in a high-level language that have the -same structure (e.g. two different structs that have a single int field): these -types will compile down into a single LLVM type and it will be impossible to -tell what it came from.</p> - -<p>Second, while LLVM does lose information, LLVM is not a fixed target: we -continue to enhance and improve it in many different ways. In addition to -adding new features (LLVM did not always support exceptions or debug info), we -also extend the IR to capture important information for optimization (e.g. -whether an argument is sign or zero extended, information about pointers -aliasing, etc). Many of the enhancements are user-driven: people want LLVM to -include some specific feature, so they go ahead and extend it.</p> - -<p>Third, it is <em>possible and easy</em> to add language-specific -optimizations, and you have a number of choices in how to do it. As one trivial -example, it is easy to add language-specific optimization passes that -"know" things about code compiled for a language. 
In the case of the C family, -there is an optimization pass that "knows" about the standard C library -functions. If you call "exit(0)" in main(), it knows that it is safe to -optimize that into "return 0;" because C specifies what the 'exit' -function does.</p> - -<p>In addition to simple library knowledge, it is possible to embed a variety of -other language-specific information into the LLVM IR. If you have a specific -need and run into a wall, please bring the topic up on the llvmdev list. At the -very worst, you can always treat LLVM as if it were a "dumb code generator" and -implement the high-level optimizations you desire in your front-end, on the -language-specific AST. -</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<h2><a name="tipsandtricks">Tips and Tricks</a></h2> -<!-- *********************************************************************** --> - -<div> - -<p>There is a variety of useful tips and tricks that you come to know after -working on/with LLVM that aren't obvious at first glance. Instead of letting -everyone rediscover them, this section talks about some of these issues.</p> - -<!-- ======================================================================= --> -<h4><a name="offsetofsizeof">Implementing portable offsetof/sizeof</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>One interesting thing that comes up, if you are trying to keep the code -generated by your compiler "target independent", is that you often need to know -the size of some LLVM type or the offset of some field in an llvm structure. -For example, you might need to pass the size of a type into a function that -allocates memory.</p> - -<p>Unfortunately, this can vary widely across targets: for example the width of -a pointer is trivially target-specific. However, there is a <a -href="http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt">clever -way to use the getelementptr instruction</a> that allows you to compute this -in a portable way.</p> - -</div> - -<!-- ======================================================================= --> -<h4><a name="gcstack">Garbage Collected Stack Frames</a></h4> -<!-- ======================================================================= --> - -<div> - -<p>Some languages want to explicitly manage their stack frames, often so that -they are garbage collected or to allow easy implementation of closures. There -are often better ways to implement these features than explicit stack frames, -but <a -href="http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt">LLVM -does support them,</a> if you want. 
It requires your front-end to convert the -code into <a -href="http://en.wikipedia.org/wiki/Continuation-passing_style">Continuation -Passing Style</a> and the use of tail calls (which LLVM also supports).</p> - -</div> - -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/OCamlLangImpl8.rst b/docs/tutorial/OCamlLangImpl8.rst new file mode 100644 index 0000000000..4058991f19 --- /dev/null +++ b/docs/tutorial/OCamlLangImpl8.rst @@ -0,0 +1,269 @@ +====================================================== +Kaleidoscope: Conclusion and other useful LLVM tidbits +====================================================== + +.. contents:: + :local: + +Written by `Chris Lattner <mailto:sabre@nondot.org>`_ + +Tutorial Conclusion +=================== + +Welcome to the final chapter of the "`Implementing a language with +LLVM <index.html>`_" tutorial. In the course of this tutorial, we have +grown our little Kaleidoscope language from being a useless toy, to +being a semi-interesting (but probably still useless) toy. :) + +It is interesting to see how far we've come, and how little code it has +taken. We built the entire lexer, parser, AST, code generator, and an +interactive run-loop (with a JIT!) by-hand in under 700 lines of +(non-comment/non-blank) code. + +Our little language supports a couple of interesting features: it +supports user defined binary and unary operators, it uses JIT +compilation for immediate evaluation, and it supports a few control flow +constructs with SSA construction. + +Part of the idea of this tutorial was to show you how easy and fun it +can be to define, build, and play with languages. Building a compiler +need not be a scary or mystical process! Now that you've seen some of +the basics, I strongly encourage you to take the code and hack on it. +For example, try adding: + +- **global variables** - While global variables have questional value + in modern software engineering, they are often useful when putting + together quick little hacks like the Kaleidoscope compiler itself. + Fortunately, our current setup makes it very easy to add global + variables: just have value lookup check to see if an unresolved + variable is in the global variable symbol table before rejecting it. + To create a new global variable, make an instance of the LLVM + ``GlobalVariable`` class. +- **typed variables** - Kaleidoscope currently only supports variables + of type double. This gives the language a very nice elegance, because + only supporting one type means that you never have to specify types. + Different languages have different ways of handling this. The easiest + way is to require the user to specify types for every variable + definition, and record the type of the variable in the symbol table + along with its Value\*. +- **arrays, structs, vectors, etc** - Once you add types, you can start + extending the type system in all sorts of interesting ways. Simple + arrays are very easy and are quite useful for many different + applications. 
Adding them is mostly an exercise in learning how the + LLVM `getelementptr <../LangRef.html#i_getelementptr>`_ instruction + works: it is so nifty/unconventional, it `has its own + FAQ <../GetElementPtr.html>`_! If you add support for recursive types + (e.g. linked lists), make sure to read the `section in the LLVM + Programmer's Manual <../ProgrammersManual.html#TypeResolve>`_ that + describes how to construct them. +- **standard runtime** - Our current language allows the user to access + arbitrary external functions, and we use it for things like "printd" + and "putchard". As you extend the language to add higher-level + constructs, often these constructs make the most sense if they are + lowered to calls into a language-supplied runtime. For example, if + you add hash tables to the language, it would probably make sense to + add the routines to a runtime, instead of inlining them all the way. +- **memory management** - Currently we can only access the stack in + Kaleidoscope. It would also be useful to be able to allocate heap + memory, either with calls to the standard libc malloc/free interface + or with a garbage collector. If you would like to use garbage + collection, note that LLVM fully supports `Accurate Garbage + Collection <../GarbageCollection.html>`_ including algorithms that + move objects and need to scan/update the stack. +- **debugger support** - LLVM supports generation of `DWARF Debug + info <../SourceLevelDebugging.html>`_ which is understood by common + debuggers like GDB. Adding support for debug info is fairly + straightforward. The best way to understand it is to compile some + C/C++ code with "``llvm-gcc -g -O0``" and taking a look at what it + produces. +- **exception handling support** - LLVM supports generation of `zero + cost exceptions <../ExceptionHandling.html>`_ which interoperate with + code compiled in other languages. You could also generate code by + implicitly making every function return an error value and checking + it. You could also make explicit use of setjmp/longjmp. There are + many different ways to go here. +- **object orientation, generics, database access, complex numbers, + geometric programming, ...** - Really, there is no end of crazy + features that you can add to the language. +- **unusual domains** - We've been talking about applying LLVM to a + domain that many people are interested in: building a compiler for a + specific language. However, there are many other domains that can use + compiler technology that are not typically considered. For example, + LLVM has been used to implement OpenGL graphics acceleration, + translate C++ code to ActionScript, and many other cute and clever + things. Maybe you will be the first to JIT compile a regular + expression interpreter into native code with LLVM? + +Have fun - try doing something crazy and unusual. Building a language +like everyone else always has, is much less fun than trying something a +little crazy or off the wall and seeing how it turns out. If you get +stuck or want to talk about it, feel free to email the `llvmdev mailing +list <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_: it has lots +of people who are interested in languages and are often willing to help +out. + +Before we end this tutorial, I want to talk about some "tips and tricks" +for generating LLVM IR. These are some of the more subtle things that +may not be obvious, but are very useful if you want to take advantage of +LLVM's capabilities. 
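+
+Before moving on to those, here is a rough sketch of what the **global
+variables** suggestion above could look like in ``codegen_expr``. It is
+only an illustration, not part of the tutorial's code: the fallback
+through ``lookup_global`` and the error text are assumptions, and
+creating globals in the first place (for example with ``define_global``)
+is left out entirely.
+
+.. code-block:: ocaml
+
+    (* Hypothetical variant of the Ast.Variable case: if a name has no
+     * local alloca, try the module's globals before rejecting it. *)
+    | Ast.Variable name ->
+        (try
+           (* A local (alloca) binding still takes precedence. *)
+           build_load (Hashtbl.find named_values name) name builder
+         with Not_found ->
+           match lookup_global name the_module with
+           | Some g -> build_load g name builder
+           | None -> raise (Error "unknown variable name"))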
+
+Properties of the LLVM IR
+=========================
+
+We hear a couple of common questions about code in LLVM IR form - let's
+just get these out of the way right now, shall we?
+
+Target Independence
+-------------------
+
+Kaleidoscope is an example of a "portable language": any program written
+in Kaleidoscope will work the same way on any target that it runs on.
+Many other languages have this property, e.g. Lisp, Java, Haskell,
+JavaScript, Python, etc. (note that while these languages are portable,
+not all their libraries are).
+
+One nice aspect of LLVM is that it is often capable of preserving target
+independence in the IR: you can take the LLVM IR for a
+Kaleidoscope-compiled program and run it on any target that LLVM
+supports, even emitting C code and compiling that on targets that LLVM
+doesn't support natively. You can trivially tell that the Kaleidoscope
+compiler generates target-independent code because it never queries for
+any target-specific information when generating code.
+
+The fact that LLVM provides a compact, target-independent
+representation for code gets a lot of people excited. Unfortunately,
+these people are usually thinking about C or a language from the C
+family when they are asking questions about language portability. I say
+"unfortunately", because there is really no way to make (fully general)
+C code portable, other than shipping the source code around (and of
+course, C source code is not actually portable in general either - ever
+port a really old application from 32- to 64-bits?).
+
+The problem with C (again, in its full generality) is that it is heavily
+laden with target-specific assumptions. As one simple example, the
+preprocessor often destructively removes target-independence from the
+code when it processes the input text:
+
+.. code-block:: c
+
+    #ifdef __i386__
+      int X = 1;
+    #else
+      int X = 42;
+    #endif
+
+While it is possible to engineer more and more complex solutions to
+problems like this, it cannot be solved in full generality in a way that
+is better than shipping the actual source code.
+
+That said, there are interesting subsets of C that can be made portable.
+If you are willing to fix primitive types to a fixed size (say int =
+32 bits and long = 64 bits), don't care about ABI compatibility with
+existing binaries, and are willing to give up some other minor features,
+you can have portable code. This can make sense for specialized domains
+such as an in-kernel language.
+
+Safety Guarantees
+-----------------
+
+Many of the languages above are also "safe" languages: it is impossible
+for a program written in Java to corrupt its address space and crash the
+process (assuming the JVM has no bugs). Safety is an interesting
+property that requires a combination of language design, runtime
+support, and often operating system support.
+
+It is certainly possible to implement a safe language in LLVM, but LLVM
+IR does not itself guarantee safety. The LLVM IR allows unsafe pointer
+casts, use-after-free bugs, buffer overruns, and a variety of other
+problems. Safety needs to be implemented as a layer on top of LLVM and,
+conveniently, several groups have investigated this. Ask on the `llvmdev
+mailing list <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_ if
+you are interested in more details.
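+
+To make that point concrete: nothing in the IR, or in the OCaml bindings
+used throughout this tutorial, stops you from constructing a well-formed
+but unsafe access. The fragment below is a hand-written illustration
+(the value names are arbitrary, and ``context`` and ``builder`` are the
+tutorial's globals); the verifier checks structural well-formedness, not
+memory safety, so it accepts the result:
+
+.. code-block:: ocaml
+
+    (* Well-formed but unsafe: an unchecked cast plus an out-of-bounds
+     * access on a 4-byte alloca. None of this is rejected. *)
+    let p = build_alloca (i32_type context) "p" builder in
+    let q = build_bitcast p (pointer_type (i8_type context)) "q" builder in
+    let oob = build_gep q [| const_int (i64_type context) 16 |] "oob" builder in
+    ignore (build_load oob "v" builder)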
+
+Language-Specific Optimizations
+-------------------------------
+
+One thing about LLVM that turns off many people is that it does not
+solve all the world's problems in one system (sorry 'world hunger',
+someone else will have to solve you some other day). One specific
+complaint is that people perceive LLVM as being incapable of performing
+high-level language-specific optimization: LLVM "loses too much
+information".
+
+Unfortunately, this is really not the place to give you a full and
+unified version of "Chris Lattner's theory of compiler design". Instead,
+I'll make a few observations:
+
+First, you're right that LLVM does lose information. For example, as of
+this writing, there is no way to distinguish in the LLVM IR whether an
+SSA value came from a C "int" or a C "long" on an ILP32 machine (other
+than debug info). Both get compiled down to an 'i32' value and the
+information about what it came from is lost. The more general issue
+here is that the LLVM type system uses "structural equivalence" instead
+of "name equivalence". Another place this surprises people is if you
+have two types in a high-level language that have the same structure
+(e.g. two different structs that each have a single int field): these
+types will compile down into a single LLVM type and it will be
+impossible to tell which one a given value came from.
+
+Second, while LLVM does lose information, LLVM is not a fixed target: we
+continue to enhance and improve it in many different ways. In addition
+to adding new features (LLVM did not always support exceptions or debug
+info), we also extend the IR to capture important information for
+optimization (e.g. whether an argument is sign or zero extended,
+information about pointer aliasing, etc.). Many of the enhancements are
+user-driven: people want LLVM to include some specific feature, so they
+go ahead and extend it.
+
+Third, it is *possible and easy* to add language-specific optimizations,
+and you have a number of choices in how to do it. As one trivial
+example, it is easy to add language-specific optimization passes that
+"know" things about code compiled for a language. In the case of the C
+family, there is an optimization pass that "knows" about the standard C
+library functions. If you call "exit(0)" in main(), it knows that it is
+safe to optimize that into "return 0;" because C specifies what the
+'exit' function does.
+
+In addition to simple library knowledge, it is possible to embed a
+variety of other language-specific information into the LLVM IR. If you
+have a specific need and run into a wall, please bring the topic up on
+the llvmdev list. At the very worst, you can always treat LLVM as if it
+were a "dumb code generator" and implement the high-level optimizations
+you desire in your front-end, on the language-specific AST.
+
+Tips and Tricks
+===============
+
+There are a variety of useful tips and tricks that you come to know
+after working on/with LLVM that aren't obvious at first glance. Instead
+of letting everyone rediscover them, this section talks about some of
+these issues.
+
+Implementing portable offsetof/sizeof
+-------------------------------------
+
+One interesting thing that comes up, if you are trying to keep the code
+generated by your compiler "target independent", is that you often need
+to know the size of some LLVM type or the offset of some field in an
+LLVM structure. For example, you might need to pass the size of a type
+into a function that allocates memory.
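+
+As a sketch of that scenario in this tutorial's OCaml bindings - using
+the getelementptr-on-null idiom that the next paragraph introduces - a
+front-end could compute the size as an ``llvalue`` and hand it to
+whatever allocator it declares. The helper name, the use of the
+tutorial's ``context``, and the allocator in the usage comment are
+assumptions for illustration only:
+
+.. code-block:: ocaml
+
+    (* Hypothetical helper: an llvalue holding sizeof(ty), computed in a
+     * target-independent way by "indexing one past null" and converting
+     * the resulting address to an integer. *)
+    let size_of ty =
+      let one = const_int (i32_type context) 1 in
+      let end_ptr = const_gep (const_null (pointer_type ty)) [| one |] in
+      const_ptrtoint end_ptr (i64_type context)
+
+    (* Usage sketch, assuming some previously declared runtime allocator:
+     *   build_call alloc_fn [| size_of struct_ty |] "mem" builder *)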
+ +Unfortunately, this can vary widely across targets: for example the +width of a pointer is trivially target-specific. However, there is a +`clever way to use the getelementptr +instruction <http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt>`_ +that allows you to compute this in a portable way. + +Garbage Collected Stack Frames +------------------------------ + +Some languages want to explicitly manage their stack frames, often so +that they are garbage collected or to allow easy implementation of +closures. There are often better ways to implement these features than +explicit stack frames, but `LLVM does support +them, <http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt>`_ +if you want. It requires your front-end to convert the code into +`Continuation Passing +Style <http://en.wikipedia.org/wiki/Continuation-passing_style>`_ and +the use of tail calls (which LLVM also supports). + diff --git a/docs/tutorial/index.html b/docs/tutorial/index.html deleted file mode 100644 index 2c11a9a48b..0000000000 --- a/docs/tutorial/index.html +++ /dev/null @@ -1,48 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> -<html> -<head> - <title>LLVM Tutorial: Table of Contents</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Owen Anderson"> - <meta name="description" - content="LLVM Tutorial: Table of Contents."> - <link rel="stylesheet" href="../_static/llvm.css" type="text/css"> -</head> - -<body> - -<h1>LLVM Tutorial: Table of Contents</h1> - -<ol> - <li>Kaleidoscope: Implementing a Language with LLVM - <ol> - <li><a href="LangImpl1.html">Tutorial Introduction and the Lexer</a></li> - <li><a href="LangImpl2.html">Implementing a Parser and AST</a></li> - <li><a href="LangImpl3.html">Implementing Code Generation to LLVM IR</a></li> - <li><a href="LangImpl4.html">Adding JIT and Optimizer Support</a></li> - <li><a href="LangImpl5.html">Extending the language: control flow</a></li> - <li><a href="LangImpl6.html">Extending the language: user-defined operators</a></li> - <li><a href="LangImpl7.html">Extending the language: mutable variables / SSA construction</a></li> - <li><a href="LangImpl8.html">Conclusion and other useful LLVM tidbits</a></li> - </ol></li> - <li>Kaleidoscope: Implementing a Language with LLVM in Objective Caml - <ol> - <li><a href="OCamlLangImpl1.html">Tutorial Introduction and the Lexer</a></li> - <li><a href="OCamlLangImpl2.html">Implementing a Parser and AST</a></li> - <li><a href="OCamlLangImpl3.html">Implementing Code Generation to LLVM IR</a></li> - <li><a href="OCamlLangImpl4.html">Adding JIT and Optimizer Support</a></li> - <li><a href="OCamlLangImpl5.html">Extending the language: control flow</a></li> - <li><a href="OCamlLangImpl6.html">Extending the language: user-defined operators</a></li> - <li><a href="OCamlLangImpl7.html">Extending the language: mutable variables / SSA construction</a></li> - <li><a href="OCamlLangImpl8.html">Conclusion and other useful LLVM tidbits</a></li> - </ol></li> - <li>Advanced Topics - <ol> - <li><a href="http://llvm.org/pubs/2004-09-22-LCPCLLVMTutorial.html">Writing - an Optimization for LLVM</a></li> - </ol></li> -</ol> - -</body> -</html> diff --git a/docs/tutorial/index.rst b/docs/tutorial/index.rst new file mode 100644 index 0000000000..4658f9619e --- /dev/null +++ b/docs/tutorial/index.rst @@ -0,0 +1,30 @@ +================================ +LLVM Tutorial: Table of Contents 
+================================ + +Kaleidoscope: Implementing a Language with LLVM +=============================================== + +.. toctree:: + :titlesonly: + :glob: + :numbered: + + LangImpl* + +Kaleidoscope: Implementing a Language with LLVM in Objective Caml +================================================================= + +.. toctree:: + :titlesonly: + :glob: + :numbered: + + OCamlLangImpl* + + +Advanced Topics +=============== + +#. `Writing an Optimization for LLVM <http://llvm.org/pubs/2004-09-22-LCPCLLVMTutorial.html>`_ + diff --git a/docs/userguides.rst b/docs/userguides.rst index 8c1554dfce..cfb6dbeb5e 100644 --- a/docs/userguides.rst +++ b/docs/userguides.rst @@ -20,6 +20,9 @@ User Guides HowToSubmitABug SphinxQuickstartTemplate Phabricator + TestingGuide + tutorial/index + ReleaseNotes * :ref:`getting_started` @@ -36,13 +39,12 @@ User Guides Notes on building and testing LLVM/Clang on ARM. -* `Getting Started with the LLVM System using Microsoft Visual Studio - <GettingStartedVS.html>`_ +* :doc:`GettingStartedVS` An addendum to the main Getting Started guide for those using Visual Studio on Windows. -* `LLVM Tutorial <tutorial/>`_ +* :doc:`tutorial/index` A walk through the process of using LLVM for a custom language, and the facilities LLVM offers in tutorial form. @@ -64,7 +66,7 @@ User Guides A list of common questions and problems and their solutions. -* `Release notes for the current release <ReleaseNotes.html>`_ +* :doc:`Release notes for the current release <ReleaseNotes>` This describes new features, known bugs, and other limitations. @@ -77,7 +79,7 @@ User Guides A template + tutorial for writing new Sphinx documentation. It is meant to be read in source form. -* `LLVM Testing Infrastructure Guide <TestingGuide.html>`_ +* :doc:`LLVM Testing Infrastructure Guide <TestingGuide>` A reference manual for using the LLVM testing infrastructure. |