<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Kaleidoscope: Implementing a Parser and AST</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="author" content="Chris Lattner">
<link rel="stylesheet" href="../llvm.css" type="text/css">
</head>
<body>
<div class="doc_title">Kaleidoscope: Implementing a Parser and AST</div>
<div class="doc_author">
<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
</div>
<!-- *********************************************************************** -->
<div class="doc_section"><a name="intro">Part 2 Introduction</a></div>
<!-- *********************************************************************** -->
<div class="doc_text">
<p>Welcome to part 2 of the "<a href="index.html">Implementing a language with
LLVM</a>" tutorial. This chapter shows you how to use the <a
href="LangImpl1.html">Lexer built in Chapter 1</a> to build a full <a
href="http://en.wikipedia.org/wiki/Parsing">parser</a> for
our Kaleidoscope language and build an <a
href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax
Tree</a> (AST).</p>
<p>The parser we will build uses a combination of <a
href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent
Parsing</a> and <a href=
"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
Parsing</a> to parse the Kaleidoscope language (the later for binary expression
and the former for everything else). Before we get to parsing though, lets talk
about the output of the parser: the Abstract Syntax Tree.</p>
</div>
<!-- *********************************************************************** -->
<div class="doc_section"><a name="ast">The Abstract Syntax Tree (AST)</a></div>
<!-- *********************************************************************** -->
<div class="doc_text">
<p>The AST for a program captures its behavior in a way that it is easy for
later stages of the compiler (e.g. code generation) to interpret. We basically
want one object for each construct in the language, and the AST should closely
model the language. In Kaleidoscope, we have expressions, a prototype, and a
function object. We'll start with expressions first:</p>
<div class="doc_code">
<pre>
/// ExprAST - Base class for all expression nodes.
class ExprAST {
public:
virtual ~ExprAST() {}
};
/// NumberExprAST - Expression class for numeric literals like "1.0".
class NumberExprAST : public ExprAST {
double Val;
public:
explicit NumberExprAST(double val) : Val(val) {}
};
</pre>
</div>
<p>The code above shows the definition of the base ExprAST class and one
subclass which we use for numeric literals. The important thing about this is
that the NumberExprAST class captures the numeric value of the literal in the
class, so that later phases of the compiler can know what it is.</p>
<p>Right now we only create the AST, so there are no useful accessor methods on
them. It would be very easy to add a virtual method to pretty print the code,
for example. Here are the other expression AST node definitions that we'll use
in the basic form of the Kaleidoscope language.
</p>
<div class="doc_code">
<pre>
/// VariableExprAST - Expression class for referencing a variable, like "a".
class VariableExprAST : public ExprAST {
std::string Name;
public:
explicit VariableExprAST(const std::string &name) : Name(name) {}
};
/// BinaryExprAST - Expression class for a binary operator.
class BinaryExprAST : public ExprAST {
char Op;
ExprAST *LHS, *RHS;
public:
BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
: Op(op), LHS(lhs), RHS(rhs) {}
};
/// CallExprAST - Expression class for function calls.
class CallExprAST : public ExprAST {
std::string Callee;
std::vector<ExprAST*> Args;
public:
CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
: Callee(callee), Args(args) {}
};
</pre>
</div>
<p>This is all (intentially) rather straight-forward: variables capture the
variable name, binary operators capture their opcode (e.g. '+'), and calls
capture a function name and list of argument expressions. One thing that is
nice about our AST is that it captures the language features without talking
about the syntax of the language. Note that there is no discussion about
precedence of binary operators, lexical structure etc.</p>
<p>For our basic language, these are all of the expression nodes we'll define.
Because it doesn't have conditional control flow, it isn't Turing-complete;
we'll fix that in a later installment. The two things we need next are a way
to talk about the interface to a function, and a way to talk about functions
themselves:</p>
<div class="doc_code">
<pre>
/// PrototypeAST - This class represents the "prototype" for a function,
/// which captures its argument names as well as if it is an operator.
class PrototypeAST {
std::string Name;
std::vector<std::string> Args;
public:
PrototypeAST(const std::string &name, const std::vector<std::string> &args)
: Name(name), Args(args) {}
};
/// FunctionAST - This class represents a function definition itself.
class FunctionAST {
PrototypeAST *Proto;
ExprAST *Body;
public:
FunctionAST(PrototypeAST *proto, ExprAST *body)
: Proto(proto), Body(body) {}
};
</pre>
</div>
<p>In Kaleidoscope, functions are typed with just a count of their arguments.
Since all values are double precision floating point, this fact doesn't need to
be captured anywhere. In a more aggressive and realistic language, the
"ExprAST" class would probably have a type field.</p>
<p>With this scaffolding, we can now talk about parsing expressions and function
bodies in Kaleidoscope.</p>
</div>
<!-- *********************************************************************** -->
<div class="doc_section"><a name="parserbasics">Parser Basics</a></div>
<!-- *********************************************************************** -->
<div cl