Revert r103213. It broke several sections of live website.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@103219 91177308-0d34-0410-b5e6-96231b3b80d8
author: mike-m <mikem.llvm@gmail.com> 2010-05-07 00:28:04 +0000
committer: mike-m <mikem.llvm@gmail.com> 2010-05-07 00:28:04 +0000
commit: e2c3a49c8029ebd9ef530101cc24c66562e3dff5 (patch)
tree: 91bf9600cc8df90cf99751a8f8bafc317cffc91e /docs/tutorial
parent: c10b5afbe8138b0fdf3af4ed3e1ddf96cf3cb4cb (diff)
18 files changed, 18760 insertions, 0 deletions
diff --git a/docs/tutorial/LangImpl1.html b/docs/tutorial/LangImpl1.html
new file mode 100644
index 0000000000..66843db5d3
--- /dev/null
+++ b/docs/tutorial/LangImpl1.html
@@ -0,0 +1,348 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
+                      "http://www.w3.org/TR/html4/strict.dtd">
+
+<html>
+<head>
+  <title>Kaleidoscope: Tutorial Introduction and the Lexer</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+  <meta name="author" content="Chris Lattner">
+  <link rel="stylesheet" href="../llvm.css" type="text/css">
+</head>
+
+<body>
+
+<div class="doc_title">Kaleidoscope: Tutorial Introduction and the Lexer</div>
+
+<ul>
+<li><a href="index.html">Up to Tutorial Index</a></li>
+<li>Chapter 1
+  <ol>
+    <li><a href="#intro">Tutorial Introduction</a></li>
+    <li><a href="#language">The Basic Language</a></li>
+    <li><a href="#lexer">The Lexer</a></li>
+  </ol>
+</li>
+<li><a href="LangImpl2.html">Chapter 2</a>: Implementing a Parser and AST</li>
+</ul>
+
+<div class="doc_author">
+  <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="intro">Tutorial Introduction</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>Welcome to the "Implementing a language with LLVM" tutorial.  This tutorial
+runs through the implementation of a simple language, showing how fun and
+easy it can be.  This tutorial will get you up and started as well as help to
+build a framework you can extend to other languages.  The code in this tutorial
+can also be used as a playground to hack on other LLVM specific things.
+</p>
+
+<p>
+The goal of this tutorial is to progressively unveil our language, describing
+how it is built up over time.  This will let us cover a fairly broad range of
+language design and LLVM-specific usage issues, showing and explaining the code
+for it all along the way, without overwhelming you with tons of details up
+front.</p>
+
+<p>It is useful to point out ahead of time that this tutorial is really about
+teaching compiler techniques and LLVM specifically, <em>not</em> about teaching
+modern and sane software engineering principles.  In practice, this means that
+we'll take a number of shortcuts to simplify the exposition.  For example, the
+code leaks memory, uses global variables all over the place, doesn't use nice
+design patterns like <a
+href="http://en.wikipedia.org/wiki/Visitor_pattern">visitors</a>, etc... but it
+is very simple.  If you dig in and use the code as a basis for future projects,
+fixing these deficiencies shouldn't be hard.</p>
+
+<p>I've tried to put this tutorial together in a way that makes chapters easy to
+skip over if you are already familiar with or are uninterested in the various
+pieces.  The structure of the tutorial is:
+</p>
+
+<ul>
+<li><b><a href="#language">Chapter #1</a>: Introduction to the Kaleidoscope
+language, and the definition of its Lexer</b> - This shows where we are going
+and the basic functionality that we want it to do.  In order to make this
+tutorial maximally understandable and hackable, we choose to implement 
+everything in C++ instead of using lexer and parser generators.  LLVM obviously
+works just fine with such tools, feel free to use one if you prefer.</li>
+<li><b><a href="LangImpl2.html">Chapter #2</a>: Implementing a Parser and
+AST</b> - With the lexer in place, we can talk about parsing techniques and
+basic AST construction.  This tutorial describes recursive descent parsing and
+operator precedence parsing.  Nothing in Chapters 1 or 2 is LLVM-specific,
+the code doesn't even link in LLVM at this point. :)</li>
+<li><b><a href="LangImpl3.html">Chapter #3</a>: Code generation to LLVM IR</b> -
+With the AST ready, we can show off how easy generation of LLVM IR really 
+is.</li>
+<li><b><a href="LangImpl4.html">Chapter #4</a>: Adding JIT and Optimizer
+Support</b> - Because a lot of people are interested in using LLVM as a JIT,
+we'll dive right into it and show you the 3 lines it takes to add JIT support.
+LLVM is also useful in many other ways, but this is one simple and "sexy" way
+to shows off its power. :)</li>
+<li><b><a href="LangImpl5.html">Chapter #5</a>: Extending the Language: Control
+Flow</b> - With the language up and running, we show how to extend it with
+control flow operations (if/then/else and a 'for' loop).  This gives us a chance
+to talk about simple SSA construction and control flow.</li>
+<li><b><a href="LangImpl6.html">Chapter #6</a>: Extending the Language: 
+User-defined Operators</b> - This is a silly but fun chapter that talks about
+extending the language to let the user program define their own arbitrary
+unary and binary operators (with assignable precedence!).  This lets us build a
+significant piece of the "language" as library routines.</li>
+<li><b><a href="LangImpl7.html">Chapter #7</a>: Extending the Language: Mutable
+Variables</b> - This chapter talks about adding user-defined local variables
+along with an assignment operator.  The interesting part about this is how
+easy and trivial it is to construct SSA form in LLVM: no, LLVM does <em>not</em>
+require your front-end to construct SSA form!</li>
+<li><b><a href="LangImpl8.html">Chapter #8</a>: Conclusion and other useful LLVM
+tidbits</b> - This chapter wraps up the series by talking about potential
+ways to extend the language, but also includes a bunch of pointers to info about
+"special topics" like adding garbage collection support, exceptions, debugging,
+support for "spaghetti stacks", and a bunch of other tips and tricks.</li>
+
+</ul>
+
+<p>By the end of the tutorial, we'll have written a bit less than 700 lines of 
+non-comment, non-blank, lines of code.  With this small amount of code, we'll
+have built up a very reasonable compiler for a non-trivial language including
+a hand-written lexer, parser, AST, as well as code generation support with a JIT
+compiler.  While other systems may have interesting "hello world" tutorials,
+I think the breadth of this tutorial is a great testament to the strengths of
+LLVM and why you should consider it if you're interested in language or compiler
+design.</p>
+
+<p>A note about this tutorial: we expect you to extend the language and play
+with it on your own.  Take the code and go crazy hacking away at it, compilers
+don't need to be scary creatures - it can be a lot of fun to play with
+languages!</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="language">The Basic Language</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>This tutorial will be illustrated with a toy language that we'll call
+"<a href="http://en.wikipedia.org/wiki/Kaleidoscope">Kaleidoscope</a>" (derived 
+from "meaning beautiful, form, and view").
+Kaleidoscope is a procedural language that allows you to define functions, use
+conditionals, math, etc.  Over the course of the tutorial, we'll extend
+Kaleidoscope to support the if/then/else construct, a for loop, user defined
+operators, JIT compilation with a simple command line interface, etc.</p>
+
+<p>Because we want to keep things simple, the only datatype in Kaleidoscope is a
+64-bit floating point type (aka 'double' in C parlance).  As such, all values
+are implicitly double precision and the language doesn't require type
+declarations.  This gives the language a very nice and simple syntax.  For
+example, the following simple example computes <a 
+href="http://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci numbers:</a></p>
+
+<div class="doc_code">
+<pre>
+# Compute the x'th fibonacci number.
+def fib(x)
+  if x &lt; 3 then
+    1
+  else
+    fib(x-1)+fib(x-2)
+
+# This expression will compute the 40th number.
+fib(40)
+</pre>
+</div>
+
+<p>We also allow Kaleidoscope to call into standard library functions (the LLVM
+JIT makes this completely trivial).  This means that you can use the 'extern'
+keyword to define a function before you use it (this is also useful for mutually
+recursive functions).  For example:</p>
+
+<div class="doc_code">
+<pre>
+extern sin(arg);
+extern cos(arg);
+extern atan2(arg1 arg2);
+
+atan2(sin(.4), cos(42))
+</pre>
+</div>
+
+<p>A more interesting example is included in Chapter 6 where we write a little
+Kaleidoscope application that <a href="LangImpl6.html#example">displays 
+a Mandelbrot Set</a> at various levels of magnification.</p>
+
+<p>Lets dive into the implementation of this language!</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="lexer">The Lexer</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>When it comes to implementing a language, the first thing needed is
+the ability to process a text file and recognize what it says.  The traditional
+way to do this is to use a "<a 
+href="http://en.wikipedia.org/wiki/Lexical_analysis">lexer</a>" (aka 'scanner')
+to break the input up into "tokens".  Each token returned by the lexer includes
+a token code and potentially some metadata (e.g. the numeric value of a number).
+First, we define the possibilities:
+</p>
+
+<div class="doc_code">
+<pre>
+// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
+// of these for known things.
+enum Token {
+  tok_eof = -1,
+
+  // commands
+  tok_def = -2, tok_extern = -3,
+
+  // primary
+  tok_identifier = -4, tok_number = -5,
+};
+
+static std::string IdentifierStr;  // Filled in if tok_identifier
+static double NumVal;              // Filled in if tok_number
+</pre>
+</div>
+
+<p>Each token returned by our lexer will either be one of the Token enum values
+or it will be an 'unknown' character like '+', which is returned as its ASCII
+value.  If the current token is an identifier, the <tt>IdentifierStr</tt>
+global variable holds the name of the identifier.  If the current token is a
+numeric literal (like 1.0), <tt>NumVal</tt> holds its value.  Note that we use
+global variables for simplicity, this is not the best choice for a real language
+implementation :).
+</p>
+
+<p>The actual implementation of the lexer is a single function named
+<tt>gettok</tt>. The <tt>gettok</tt> function is called to return the next token
+from standard input.  Its definition starts as:</p>
+
+<div class="doc_code">
+<pre>
+/// gettok - Return the next token from standard input.
+static int gettok() {
+  static int LastChar = ' ';
+
+  // Skip any whitespace.
+  while (isspace(LastChar))
+    LastChar = getchar();
+</pre>
+</div>
+
+<p>
+<tt>gettok</tt> works by calling the C <tt>getchar()</tt> function to read
+characters one at a time from standard input.  It eats them as it recognizes
+them and stores the last character read, but not processed, in LastChar.  The
+first thing that it has to do is ignore whitespace between tokens.  This is 
+accomplished with the loop above.</p>
+
+<p>The next thing <tt>gettok</tt> needs to do is recognize identifiers and
+specific keywords like "def".  Kaleidoscope does this with this simple loop:</p>
+
+<div class="doc_code">
+<pre>
+  if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
+    IdentifierStr = LastChar;
+    while (isalnum((LastChar = getchar())))
+      IdentifierStr += LastChar;
+
+    if (IdentifierStr == "def") return tok_def;
+    if (IdentifierStr == "extern") return tok_extern;
+    return tok_identifier;
+  }
+</pre>
+</div>
+
+<p>Note that this code sets the '<tt>IdentifierStr</tt>' global whenever it
+lexes an identifier.  Also, since language keywords are matched by the same
+loop, we handle them here inline.  Numeric values are similar:</p>
+
+<div class="doc_code">
+<pre>
+  if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
+    std::string NumStr;
+    do {
+      NumStr += LastChar;
+      LastChar = getchar();
+    } while (isdigit(LastChar) || LastChar == '.');
+
+    NumVal = strtod(NumStr.c_str(), 0);
+    return tok_number;
+  }
+</pre>
+</div>
+
+<p>This is all pretty straight-forward code for processing input.  When reading
+a numeric value from input, we use the C <tt>strtod</tt> function to convert it
+to a numeric value that we store in <tt>NumVal</tt>.  Note that this isn't doing
+sufficient error checking: it will incorrectly read "1.23.45.67" and handle it as
+if you typed in "1.23".  Feel free to extend it :).  Next we handle comments:
+</p>
+
+<div class="doc_code">
+<pre>
+  if (LastChar == '#') {
+    // Comment until end of line.
+    do LastChar = getchar();
+    while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
+    
+    if (LastChar != EOF)
+      return gettok();
+  }
+</pre>
+</div>
+
+<p>We handle comments by skipping to the end of the line and then return the
+next token.  Finally, if the input doesn't match one of the above cases, it is
+either an operator character like '+' or the end of the file.  These are handled
+with this code:</p>
+
+<div class="doc_code">
+<pre>
+  // Check for end of file.  Don't eat the EOF.
+  if (LastChar == EOF)
+    return tok_eof;
+  
+  // Otherwise, just return the character as its ascii value.
+  int ThisChar = LastChar;
+  LastChar = getchar();
+  return ThisChar;
+}
+</pre>
+</div>
+
+<p>With this, we have the complete lexer for the basic Kaleidoscope language
+(the <a href="LangImpl2.html#code">full code listing</a> for the Lexer is
+available in the <a href="LangImpl2.html">next chapter</a> of the tutorial).
+Next we'll <a href="LangImpl2.html">build a simple parser that uses this to 
+build an Abstract Syntax Tree</a>.  When we have that, we'll include a driver
+so that you can use the lexer and parser together.
+</p>
+
+<a href="LangImpl2.html">Next: Implementing a Parser and AST</a>
+</div>
+
+<!-- *********************************************************************** -->
+<hr>
+<address>
+  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
+  src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
+  <a href="http://validator.w3.org/check/referer"><img
+  src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
+
+  <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
+  <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
+  Last modified: $Date$
+</address>
+</body>
+</html>
diff --git a/docs/tutorial/LangImpl2.html b/docs/tutorial/LangImpl2.html
new file mode 100644
index 0000000000..9c13b486fa
--- /dev/null
+++ b/docs/tutorial/LangImpl2.html
@@ -0,0 +1,1233 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
+                      "http://www.w3.org/TR/html4/strict.dtd">
+
+<html>
+<head>
+  <title>Kaleidoscope: Implementing a Parser and AST</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+  <meta name="author" content="Chris Lattner">
+  <link rel="stylesheet" href="../llvm.css" type="text/css">
+</head>
+
+<body>
+
+<div class="doc_title">Kaleidoscope: Implementing a Parser and AST</div>
+
+<ul>
+<li><a href="index.html">Up to Tutorial Index</a></li>
+<li>Chapter 2
+  <ol>
+    <li><a href="#intro">Chapter 2 Introduction</a></li>
+    <li><a href="#ast">The Abstract Syntax Tree (AST)</a></li>
+    <li><a href="#parserbasics">Parser Basics</a></li>
+    <li><a href="#parserprimexprs">Basic Expression Parsing</a></li>
+    <li><a href="#parserbinops">Binary Expression Parsing</a></li>
+    <li><a href="#parsertop">Parsing the Rest</a></li>
+    <li><a href="#driver">The Driver</a></li>
+    <li><a href="#conclusions">Conclusions</a></li>
+    <li><a href="#code">Full Code Listing</a></li>
+  </ol>
+</li>
+<li><a href="LangImpl3.html">Chapter 3</a>: Code generation to LLVM IR</li>
+</ul>
+
+<div class="doc_author">
+  <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="intro">Chapter 2 Introduction</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>Welcome to Chapter 2 of the "<a href="index.html">Implementing a language
+with LLVM</a>" tutorial.  This chapter shows you how to use the lexer, built in 
+<a href="LangImpl1.html">Chapter 1</a>, to build a full <a
+href="http://en.wikipedia.org/wiki/Parsing">parser</a> for
+our Kaleidoscope language.  Once we have a parser, we'll define and build an <a 
+href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax 
+Tree</a> (AST).</p>
+
+<p>The parser we will build uses a combination of <a 
+href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent
+Parsing</a> and <a href=
+"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence 
+Parsing</a> to parse the Kaleidoscope language (the latter for 
+binary expressions and the former for everything else).  Before we get to
+parsing though, lets talk about the output of the parser: the Abstract Syntax
+Tree.</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="ast">The Abstract Syntax Tree (AST)</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>The AST for a program captures its behavior in such a way that it is easy for
+later stages of the compiler (e.g. code generation) to interpret.  We basically
+want one object for each construct in the language, and the AST should closely
+model the language.  In Kaleidoscope, we have expressions, a prototype, and a
+function object.  We'll start with expressions first:</p>
+
+<div class="doc_code">
+<pre>
+/// ExprAST - Base class for all expression nodes.
+class ExprAST {
+public:
+  virtual ~ExprAST() {}
+};
+
+/// NumberExprAST - Expression class for numeric literals like "1.0".
+class NumberExprAST : public ExprAST {
+  double Val;
+public:
+  NumberExprAST(double val) : Val(val) {}
+};
+</pre>
+</div>
+
+<p>The code above shows the definition of the base ExprAST class and one
+subclass which we use for numeric literals.  The important thing to note about
+this code is that the NumberExprAST class captures the numeric value of the
+literal as an instance variable. This allows later phases of the compiler to
+know what the stored numeric value is.</p>
+
+<p>Right now we only create the AST,  so there are no useful accessor methods on
+them.  It would be very easy to add a virtual method to pretty print the code,
+for example.  Here are the other expression AST node definitions that we'll use
+in the basic form of the Kaleidoscope language:
+</p>
+
+<div class="doc_code">
+<pre>
+/// VariableExprAST - Expression class for referencing a variable, like "a".
+class VariableExprAST : public ExprAST {
+  std::string Name;
+public:
+  VariableExprAST(const std::string &amp;name) : Name(name) {}
+};
+
+/// BinaryExprAST - Expression class for a binary operator.
+class BinaryExprAST : public ExprAST {
+  char Op;
+  ExprAST *LHS, *RHS;
+public:
+  BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) 
+    : Op(op), LHS(lhs), RHS(rhs) {}
+};
+
+/// CallExprAST - Expression class for function calls.
+class CallExprAST : public ExprAST {
+  std::string Callee;
+  std::vector&lt;ExprAST*&gt; Args;
+public:
+  CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
+    : Callee(callee), Args(args) {}
+};
+</pre>
+</div>
+
+<p>This is all (intentionally) rather straight-forward: variables capture the
+variable name, binary operators capture their opcode (e.g. '+'), and calls
+capture a function name as well as a list of any argument expressions.  One thing 
+that is nice about our AST is that it captures the language features without 
+talking about the syntax of the language.  Note that there is no discussion about 
+precedence of binary operators, lexical structure, etc.</p>
+
+<p>For our basic language, these are all of the expression nodes we'll define.
+Because it doesn't have conditional control flow, it isn't Turing-complete;
+we'll fix that in a later installment.  The two things we need next are a way
+to talk about the interface to a function, and a way to talk about functions
+themselves:</p>
+
+<div class="doc_code">
+<pre>
+/// PrototypeAST - This class represents the "prototype" for a function,
+/// which captures its name, and its argument names (thus implicitly the number
+/// of arguments the function takes).
+class PrototypeAST {
+  std::string Name;
+  std::vector&lt;std::string&gt; Args;
+public:
+  PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
+    : Name(name), Args(args) {}
+};
+
+/// FunctionAST - This class represents a function definition itself.
+class FunctionAST {
+  PrototypeAST *Proto;
+  ExprAST *Body;
+public:
+  FunctionAST(PrototypeAST *proto, ExprAST *body)
+    : Proto(proto), Body(body) {}
+};
+</pre>
+</div>
+
+<p>In Kaleidoscope, functions are typed with just a count of their arguments.
+Since all values are double precision floating point, the type of each argument
+doesn't need to be stored anywhere.  In a more aggressive and realistic
+language, the "ExprAST" class would probably have a type field.</p>
+
+<p>With this scaffolding, we can now talk about parsing expressions and function
+bodies in Kaleidoscope.</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="parserbasics">Parser Basics</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>Now that we have an AST to build, we need to define the parser code to build
+it.  The idea here is that we want to parse something like "x+y" (which is
+returned as three tokens by the lexer) into an AST that could be generated with
+calls like this:</p>
+
+<div class="doc_code">
+<pre>
+  ExprAST *X = new VariableExprAST("x");
+  ExprAST *Y = new VariableExprAST("y");
+  ExprAST *Result = new BinaryExprAST('+', X, Y);
+</pre>
+</div>
+
+<p>In order to do this, we'll start by defining some basic helper routines:</p>
+
+<div class="doc_code">
+<pre>
+/// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
+/// token the parser is looking at.  getNextToken reads another token from the
+/// lexer and updates CurTok with its results.
+static int CurTok;
+static int getNextToken() {
+  return CurTok = gettok();
+}
+</pre>
+</div>
+
+<p>
+This implements a simple token buffer around the lexer.  This allows 
+us to look one token ahead at what the lexer is returning.  Every function in
+our parser will assume that CurTok is the current token that needs to be
+parsed.</p>
+
+<div class="doc_code">
+<pre>
+
+/// Error* - These are little helper functions for error handling.
+ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
+PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
+FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
+</pre>
+</div>
+
+<p>
+The <tt>Error</tt> routines are simple helper routines that our parser will use
+to handle errors.  The error recovery in our parser will not be the best and
+is not particular user-friendly, but it will be enough for our tutorial.  These
+routines make it easier to handle errors in routines that have various return
+types: they always return null.</p>
+
+<p>With these basic helper functions, we can implement the first
+piece of our grammar: numeric literals.</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="parserprimexprs">Basic Expression
+ Parsing</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>We start with numeric literals, because they are the simplest to process.
+For each production in our grammar, we'll define a function which parses that
+production.  For numeric literals, we have:
+</p>
+
+<div class="doc_code">
+<pre>
+/// numberexpr ::= number
+static ExprAST *ParseNumberExpr() {
+  ExprAST *Result = new NumberExprAST(NumVal);
+  getNextToken(); // consume the number
+  return Result;
+}
+</pre>
+</div>
+
+<p>This routine is very simple: it expects to be called when the current token
+is a <tt>tok_number</tt> token.  It takes the current number value, creates 
+a <tt>NumberExprAST</tt> node, advances the lexer to the next token, and finally
+returns.</p>
+
+<p>There are some interesting aspects to this.  The most important one is that
+this routine eats all of the tokens that correspond to the production and
+returns the lexer buffer with the next token (which is not part of the grammar
+production) ready to go.  This is a fairly standard way to go for recursive
+descent parsers.  For a better example, the parenthesis operator is defined like
+this:</p>
+
+<div class="doc_code">
+<pre>
+/// parenexpr ::= '(' expression ')'
+static ExprAST *ParseParenExpr() {
+  getNextToken();  // eat (.
+  ExprAST *V = ParseExpression();
+  if (!V) return 0;
+  
+  if (CurTok != ')')
+    return Error("expected ')'");
+  getNextToken();  // eat ).
+  return V;
+}
+</pre>
+</div>
+
+<p>This function illustrates a number of interesting things about the 
+parser:</p>
+
+<p>
+1) It shows how we use the Error routines.  When called, this function expects
+that the current token is a '(' token, but after parsing the subexpression, it
+is possible that there is no ')' waiting.  For example, if the user types in
+"(4 x" instead of "(4)", the parser should emit an error.  Because errors can
+occur, the parser needs a way to indicate that they happened: in our parser, we
+return null on an error.</p>
+
+<p>2) Another interesting aspect of this function is that it uses recursion by
+calling <tt>ParseExpression</tt> (we will soon see that <tt>ParseExpression</tt> can call
+<tt>ParseParenExpr</tt>).  This is powerful because it allows us to handle 
+recursive grammars, and keeps each production very simple.  Note that
+parentheses do not cause construction of AST nodes themselves.  While we could
+do it this way, the most important role of parentheses are to guide the parser
+and provide grouping.  Once the parser constructs the AST, parentheses are not
+needed.</p>
+
+<p>The next simple production is for handling variable references and function
+calls:</p>
+
+<div class="doc_code">
+<pre>
+/// identifierexpr
+///   ::= identifier
+///   ::= identifier '(' expression* ')'
+static ExprAST *ParseIdentifierExpr() {
+  std::string IdName = IdentifierStr;
+  
+  getNextToken();  // eat identifier.
+  
+  if (CurTok != '(') // Simple variable ref.
+    return new VariableExprAST(IdName);
+  
+  // Call.
+  getNextToken();  // eat (
+  std::vector&lt;ExprAST*&gt; Args;
+  if (CurTok != ')') {
+    while (1) {
+      ExprAST *Arg = ParseExpression();
+      if (!Arg) return 0;
+      Args.push_back(Arg);
+
+      if (CurTok == ')') break;
+
+      if (CurTok != ',')
+        return Error("Expected ')' or ',' in argument list");
+      getNextToken();
+    }
+  }
+
+  // Eat the ')'.
+  getNextToken();
+  
+  return new CallExprAST(IdName, Args);
+}
+</pre>
+</div>
+
+<p>This routine follows the same style as the other routines.  (It expects to be
+called if the current token is a <tt>tok_identifier</tt> token).  It also has
+recursion and error handling.  One interesting aspect of this is that it uses
+<em>look-ahead</em> to determine if the current identifier is a stand alone
+variable reference or if it is a function call expression.  It handles this by
+checking to see if the token after the identifier is a '(' token, constructing
+either a <tt>VariableExprAST</tt> or <tt>CallExprAST</tt> node as appropriate.
+</p>
+
+<p>Now that we have all of our simple expression-parsing logic in place, we can
+define a helper function to wrap it together into one entry point.  We call this
+class of expressions "primary" expressions, for reasons that will become more
+clear <a href="LangImpl6.html#unary">later in the tutorial</a>.  In order to
+parse an arbitrary primary expression, we need to determine what sort of
+expression it is:</p>
+
+<div class="doc_code">
+<pre>
+/// primary
+///   ::= identifierexpr
+///   ::= numberexpr
+///   ::= parenexpr
+static ExprAST *ParsePrimary() {
+  switch (CurTok) {
+  default: return Error("unknown token when expecting an expression");
+  case tok_identifier: return ParseIdentifierExpr();
+  case tok_number:     return ParseNumberExpr();
+  case '(':            return ParseParenExpr();
+  }
+}
+</pre>
+</div>
+
+<p>Now that you see the definition of this function, it is more obvious why we
+can assume the state of CurTok in the various functions.  This uses look-ahead
+to determine which sort of expression is being inspected, and then parses it
+with a function call.</p>
+
+<p>Now that basic expressions are handled, we need to handle binary expressions.
+They are a bit more complex.</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="parserbinops">Binary Expression
+ Parsing</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>Binary expressions are significantly harder to parse because they are often
+ambiguous.  For example, when given the string "x+y*z", the parser can choose
+to parse it as either "(x+y)*z" or "x+(y*z)".  With common definitions from
+mathematics, we expect the later parse, because "*" (multiplication) has
+higher <em>precedence</em> than "+" (addition).</p>
+
+<p>There are many ways to handle this, but an elegant and efficient way is to
+use <a href=
+"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence 
+Parsing</a>.  This parsing technique uses the precedence of binary operators to
+guide recursion.  To start with, we need a table of precedences:</p>
+
+<div class="doc_code">
+<pre>
+/// BinopPrecedence - This holds the precedence for each binary operator that is
+/// defined.
+static std::map&lt;char, int&gt; BinopPrecedence;
+
+/// GetTokPrecedence - Get the preceden
author	mike-m <mikem.llvm@gmail.com>	2010-05-07 00:28:04 +0000
committer	mike-m <mikem.llvm@gmail.com>	2010-05-07 00:28:04 +0000
commit	e2c3a49c8029ebd9ef530101cc24c66562e3dff5 (patch)
tree	91bf9600cc8df90cf99751a8f8bafc317cffc91e /docs/tutorial
parent	c10b5afbe8138b0fdf3af4ed3e1ddf96cf3cb4cb (diff)