<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Kaleidoscope: Implementing code generation to LLVM IR</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="author" content="Chris Lattner">
<link rel="stylesheet" href="../llvm.css" type="text/css">
</head>
<body>
<div class="doc_title">Kaleidoscope: Code generation to LLVM IR</div>
<div class="doc_author">
<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
</div>
<!-- *********************************************************************** -->
<div class="doc_section"><a name="intro">Part 3 Introduction</a></div>
<!-- *********************************************************************** -->
<div class="doc_text">
<p>Welcome to part 3 of the "<a href="index.html">Implementing a language with
LLVM</a>" tutorial. This chapter shows you how to transform the <a
href="LangImpl2.html">Abstract Syntax Tree built in Chapter 2</a> into LLVM IR.
This will teach you a little bit about how LLVM does things, as well as
demonstrate how easy it is to use. It's much more work to build a lexer and
parser than it is to generate LLVM IR code.
</p>
</div>
<!-- *********************************************************************** -->
<div class="doc_section"><a name="basics">Code Generation setup</a></div>
<!-- *********************************************************************** -->
<div class="doc_text">
<p>
In order to generate LLVM IR, we want some simple setup to get started. First,
we define virtual codegen methods in each AST class:</p>
<div class="doc_code">
<pre>
/// ExprAST - Base class for all expression nodes.
class ExprAST {
public:
  virtual ~ExprAST() {}
  virtual Value *Codegen() = 0;
};

/// NumberExprAST - Expression class for numeric literals like "1.0".
class NumberExprAST : public ExprAST {
  double Val;
public:
  explicit NumberExprAST(double val) : Val(val) {}
  virtual Value *Codegen();
};
...
</pre>
</div>
<p>The <tt>Codegen()</tt> method emits IR for that AST node along with
everything it depends on, and every implementation returns an LLVM Value object.
"Value" is the class used to represent a "<a
href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
Assignment (SSA)</a> register" or "SSA value" in LLVM. The most distinctive
aspect of SSA values is that their value is computed as the related instruction
executes, and it does not get a new value until (and if) the instruction
re-executes. In other words, there is no way to "change" an SSA value. For
more information, please read up on <a
href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
Assignment</a>. The concepts are really quite natural once you grok them.</p>
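<p>As a rough illustration of why an SSA value never changes, the sketch below
gives every assignment to a source-level variable a fresh versioned name, so
each read refers to exactly one definition. <tt>SSARenamer</tt>,
<tt>define</tt>, and <tt>use</tt> are hypothetical names invented for this
example and are not part of LLVM. The sketch only handles straight-line code,
and Kaleidoscope itself has no assignment at this point in the tutorial.</p>

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper (not an LLVM class): hands out a fresh name for every
// assignment, which is the core idea behind SSA form for straight-line code.
class SSARenamer {
  std::map<std::string, int> Version;  // latest version number per variable

public:
  // Called when `Name` is assigned: returns a brand-new SSA name.
  std::string define(const std::string &Name) {
    int V = ++Version[Name];
    return Name + "." + std::to_string(V);
  }

  // Called when `Name` is read: returns the most recent definition,
  // or an empty string if the variable was never defined.
  std::string use(const std::string &Name) const {
    auto It = Version.find(Name);
    if (It == Version.end())
      return "";
    return Name + "." + std::to_string(It->second);
  }
};
```

<p>With this helper, the source sequence <tt>x = 1; x = x + 2</tt> would define
<tt>x.1</tt> and then <tt>x.2</tt>. The second assignment creates a new value
rather than overwriting the first one.</p>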
<p>The second thing we want is an "Error" method like the one we used for the
parser, which will be used to report errors found during code generation (for
example, use of an undeclared parameter):</p>
<div class="doc_code">
<pre>
Value *ErrorV(const char *Str) { Error(Str); return 0; }
static Module *TheModule;
static LLVMBuilder Builder;
static std::map&lt;std::string, Value*&gt; NamedValues;
</pre>
</div>
<p>The static variables will be used during code generation. <tt>TheModule</tt>
is the LLVM construct that contains all of the functions and global variables in
a chunk of code. In many ways, it is the top-level structure that the LLVM IR
uses to contain code.</p>
<p>The <tt>Builder</tt> object is a helper object that makes it easy to generate
LLVM instructions. The <tt>Builder</tt> keeps track of the current place to
insert instructions and has methods to create new instructions.</p>
<p>The <tt>NamedValues</tt> map keeps track of which values are defined in the
current scope and what their LLVM representation is. In this form of
Kaleidoscope, the only things that can be referenced are function parameters.
As such, function parameters will be in this map when generating code for their
function body.</p>
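<p>To make the role of <tt>NamedValues</tt> and <tt>ErrorV</tt> concrete, here
is a minimal, self-contained sketch of a name lookup that reports an undeclared
parameter. The <tt>Value</tt> struct is only a stand-in so the example compiles
without LLVM, <tt>Error</tt> is a simplified version of the parser's helper, and
<tt>LookupVariable</tt> is a hypothetical function name chosen for this
illustration.</p>

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Stand-in for llvm::Value so the sketch builds without LLVM (assumption).
struct Value {
  std::string Name;
};

// Simplified error helpers modeled on the ones described in the text.
void Error(const char *Str) { std::fprintf(stderr, "Error: %s\n", Str); }
Value *ErrorV(const char *Str) {
  Error(Str);
  return nullptr;
}

// Maps each name visible in the current scope to its value.
static std::map<std::string, Value *> NamedValues;

// Hypothetical helper: resolve a variable reference, or report an error.
// Uses find() rather than operator[] so a failed lookup does not insert
// an empty entry into the map.
Value *LookupVariable(const std::string &Name) {
  auto It = NamedValues.find(Name);
  if (It == NamedValues.end())
    return ErrorV("Unknown variable name");
  return It->second;
}
```

<p>A caller would fill <tt>NamedValues</tt> with the function's parameters
before generating code for the body, then treat a null result from the lookup
as a code generation failure.</p>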
<p>
With these basics in place, we can start talking about how to generate code for
each expression. Note that this assumes that the <tt>Builder</tt> has been set
up to generate code <em>into</em> something. For now, we'll assume that this
has already been done, and we'll just use it to emit code.
</p>
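<p>The idea of a builder that must first be pointed <em>into</em> something can
be sketched with a toy model. Everything below (<tt>ToyBlock</tt>,
<tt>ToyBuilder</tt>, and its methods) is a hypothetical stand-in written for
this example, not the LLVM builder API, and the instruction text it produces is
only illustrative.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a basic block: just a list of textual instructions.
using ToyBlock = std::vector<std::string>;

// Toy stand-in for a builder: remembers where to insert and creates
// new instructions there. Not the real LLVM class.
class ToyBuilder {
  ToyBlock *Insert = nullptr;  // current insertion point
  unsigned NextTmp = 0;        // counter used to make unique result names

public:
  // Point the builder at the block that should receive new instructions.
  void SetInsertPoint(ToyBlock *BB) { Insert = BB; }

  // Append an add instruction to the current block and return the name of
  // the freshly created result.
  std::string CreateAdd(const std::string &L, const std::string &R) {
    assert(Insert && "builder must be pointed at a block first");
    std::string Name = "%tmp" + std::to_string(NextTmp++);
    Insert->push_back(Name + " = add double " + L + ", " + R);
    return Name;
  }
};
```

<p>Calling <tt>CreateAdd</tt> twice after <tt>SetInsertPoint</tt> appends two
instructions to the same block in order, and the result name of the first call
can be passed as an operand to the second.</p>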
</div>
<!-- *********************************************************************** -->
<div class="doc_section"><a name="exprs">Expression Code Generation</a></div>
<!-- *********************************************************************** -->
<div class="doc_text">
<p>Generating LLVM code for expression nodes is very straightforward: less
than 45 lines of commented code for all four of our expression nodes. First,
we'll do numeric literals:</p>
<div class="doc_code">
<pre>
Value *NumberExprAST::Codegen() {
  return ConstantFP::get(Type::DoubleTy, APFloat(Val));
}
</pre>
</div>
<p>In the LLVM IR, numeric constants are represented with the
<tt>ConstantFP</tt> class, which holds the numeric value in an <tt>APFloat</tt>
internally (