author     Chris Lattner <sabre@nondot.org>  2007-11-06 18:13:32 +0000
committer  Chris Lattner <sabre@nondot.org>  2007-11-06 18:13:32 +0000
commit     bd779a3a1f405e208be2d2a9d6c1960b12b1ec99 (patch)
tree       b069d40f5eae51522fd3722d2a286753db1b8093 /docs/tutorial/LangImpl2.html
parent     4f6766d41e867948357e95168c40f20465ee27c6 (diff)
fixes from Kelly Wilson.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@43767 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs/tutorial/LangImpl2.html')
-rw-r--r--  docs/tutorial/LangImpl2.html  81
1 files changed, 44 insertions, 37 deletions
diff --git a/docs/tutorial/LangImpl2.html b/docs/tutorial/LangImpl2.html
index df54f5b84f..6dba260771 100644
--- a/docs/tutorial/LangImpl2.html
+++ b/docs/tutorial/LangImpl2.html
@@ -42,8 +42,8 @@
<div class="doc_text">
<p>Welcome to Chapter 2 of the "<a href="index.html">Implementing a language
-with LLVM</a>" tutorial. This chapter shows you how to use the <a
-href="LangImpl1.html">Lexer built in Chapter 1</a> to build a full <a
+with LLVM</a>" tutorial. This chapter shows you how to use the lexer, built in
+<a href="LangImpl1.html">Chapter 1</a>, to build a full <a
href="http://en.wikipedia.org/wiki/Parsing">parser</a> for
our Kaleidoscope language. Once we have a parser, we'll define and build an <a
href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax
@@ -53,9 +53,10 @@ Tree</a> (AST).</p>
href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent
Parsing</a> and <a href=
"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
-Parsing</a> to parse the Kaleidoscope language (the latter for binary expression
-and the former for everything else). Before we get to parsing though, lets talk
-about the output of the parser: the Abstract Syntax Tree.</p>
+Parsing</a> to parse the Kaleidoscope language (the latter for
+binary expressions and the former for everything else). Before we get to
+parsing though, let's talk about the output of the parser: the Abstract Syntax
+Tree.</p>
</div>
@@ -65,7 +66,7 @@ about the output of the parser: the Abstract Syntax Tree.</p>
<div class="doc_text">
-<p>The AST for a program captures its behavior in a way that it is easy for
+<p>The AST for a program captures its behavior in such a way that it is easy for
later stages of the compiler (e.g. code generation) to interpret. We basically
want one object for each construct in the language, and the AST should closely
model the language. In Kaleidoscope, we have expressions, a prototype, and a
@@ -89,9 +90,10 @@ public:
</div>
<p>The code above shows the definition of the base ExprAST class and one
-subclass which we use for numeric literals. The important thing about this is
-that the NumberExprAST class captures the numeric value of the literal in the
-class, so that later phases of the compiler can know what it is.</p>
+subclass which we use for numeric literals. The important thing to note about
+this code is that the NumberExprAST class captures the numeric value of the
+literal as an instance variable. This allows later phases of the compiler to
+know what the stored numeric value is.</p>
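<p>As a rough sketch (member and constructor names here are illustrative and may
differ slightly from the tutorial's actual code), such a base class and numeric
literal subclass might look like:</p>

<div class="doc_code">
<pre>
/// ExprAST - Base class for all expression nodes.
class ExprAST {
public:
  virtual ~ExprAST() {}
};

/// NumberExprAST - Expression class for numeric literals like "1.0"; the
/// literal's value is stored as an instance variable.
class NumberExprAST : public ExprAST {
  double Val;
public:
  NumberExprAST(double val) : Val(val) {}
};
</pre>
</div>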
<p>Right now we only create the AST, so there are no useful accessor methods on
them. It would be very easy to add a virtual method to pretty print the code,
@@ -130,10 +132,10 @@ public:
<p>This is all (intentionally) rather straightforward: variables capture the
variable name, binary operators capture their opcode (e.g. '+'), and calls
-capture a function name and list of argument expressions. One thing that is
-nice about our AST is that it captures the language features without talking
-about the syntax of the language. Note that there is no discussion about
-precedence of binary operators, lexical structure etc.</p>
+capture a function name as well as a list of any argument expressions. One thing
+that is nice about our AST is that it captures the language features without
+talking about the syntax of the language. Note that there is no discussion about
+precedence of binary operators, lexical structure, etc.</p>
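<p>Sketched in the same style (again, the exact names are illustrative), the
variable, binary-operator, and call nodes could be declared as:</p>

<div class="doc_code">
<pre>
/// VariableExprAST - Expression class for referencing a variable, like "a".
class VariableExprAST : public ExprAST {
  std::string Name;
public:
  VariableExprAST(const std::string &name) : Name(name) {}
};

/// BinaryExprAST - Expression class for a binary operator and its operands.
class BinaryExprAST : public ExprAST {
  char Op;                 // the opcode, e.g. '+'
  ExprAST *LHS, *RHS;
public:
  BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
    : Op(op), LHS(lhs), RHS(rhs) {}
};

/// CallExprAST - Expression class for function calls: the callee's name plus
/// a list of argument expressions.
class CallExprAST : public ExprAST {
  std::string Callee;
  std::vector&lt;ExprAST*&gt; Args;
public:
  CallExprAST(const std::string &callee, const std::vector&lt;ExprAST*&gt; &args)
    : Callee(callee), Args(args) {}
};
</pre>
</div>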
<p>For our basic language, these are all of the expression nodes we'll define.
Because it doesn't have conditional control flow, it isn't Turing-complete;
@@ -231,8 +233,8 @@ is not particular user-friendly, but it will be enough for our tutorial. These
routines make it easier to handle errors in routines that have various return
types: they always return null.</p>
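<p>Concretely, such helpers can be as small as the following sketch (the exact
set of helpers may differ; <tt>PrototypeAST</tt> and <tt>FunctionAST</tt> are
the prototype and function classes mentioned above):</p>

<div class="doc_code">
<pre>
/// Error* - Little helper functions for error handling; each prints a message
/// and returns null, so callers with different return types can just
/// propagate the failure.
ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str); return 0; }
PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
</pre>
</div>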
-<p>With these basic helper functions implemented, we can implement the first
-piece of our grammar: we'll start with numeric literals.</p>
+<p>With these basic helper functions, we can implement the first
+piece of our grammar: numeric literals.</p>
</div>
@@ -261,10 +263,10 @@ static ExprAST *ParseNumberExpr() {
<p>This routine is very simple: it expects to be called when the current token
is a <tt>tok_number</tt> token. It takes the current number value, creates
-a <tt>NumberExprAST</tt> node, advances the lexer to the next token, then
+a <tt>NumberExprAST</tt> node, advances the lexer to the next token and finally
returns.</p>
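<p>Sketched (with <tt>NumVal</tt> and <tt>getNextToken()</tt> assumed from the
Chapter 1 lexer), the routine amounts to:</p>

<div class="doc_code">
<pre>
/// numberexpr ::= number
static ExprAST *ParseNumberExpr() {
  ExprAST *Result = new NumberExprAST(NumVal);  // capture the literal's value
  getNextToken();                               // consume the number token
  return Result;
}
</pre>
</div>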
-<p>There are some interesting aspects of this. The most important one is that
+<p>There are some interesting aspects to this. The most important one is that
this routine eats all of the tokens that correspond to the production, and
returns the lexer buffer with the next token (which is not part of the grammar
production) ready to go. This is a fairly standard way to go for recursive
@@ -287,7 +289,10 @@ static ExprAST *ParseParenExpr() {
</pre>
</div>
-<p>This function illustrates a number of interesting things about the parser:
+<p>This function illustrates a number of interesting things about the
+parser:</p>
+
+<p>
1) It shows how we use the Error routines. When called, this function expects
that the current token is a '(' token, but after parsing the subexpression, it
is possible that there is no ')' waiting. For example, if the user types in
@@ -295,13 +300,14 @@ is possible that there is no ')' waiting. For example, if the user types in
occur, the parser needs a way to indicate that they happened: in our parser, we
return null on an error.</p>
-<p>Another interesting aspect of this function is that it uses recursion by
+<p>2) Another interesting aspect of this function is that it uses recursion by
calling <tt>ParseExpression</tt> (we will soon see that <tt>ParseExpression</tt> can call
<tt>ParseParenExpr</tt>). This is powerful because it allows us to handle
recursive grammars, and keeps each production very simple. Note that
parentheses do not cause construction of AST nodes themselves. While we could
-do this, the most important role of parens are to guide the parser and provide
-grouping. Once the parser constructs the AST, parens are not needed.</p>
+do it this way, the most important role of parens is to guide the parser and
+provide grouping. Once the parser constructs the AST, parens are not
+needed.</p>
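<p>A sketch of the routine that points 1) and 2) describe (using the
<tt>CurTok</tt>/<tt>getNextToken()</tt> and <tt>Error</tt> conventions above)
is:</p>

<div class="doc_code">
<pre>
/// parenexpr ::= '(' expression ')'
static ExprAST *ParseParenExpr() {
  getNextToken();                  // eat the '('
  ExprAST *V = ParseExpression();  // recurse into the full expression grammar
  if (V == 0) return 0;

  if (CurTok != ')')
    return Error("expected ')'");  // signal the error by returning null
  getNextToken();                  // eat the ')'
  return V;                        // note: no AST node is built for the parens
}
</pre>
</div>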
<p>The next simple production is for handling variable references and function
calls:</p>
@@ -474,8 +480,8 @@ static ExprAST *ParseExpression() {
</div>
<p><tt>ParseBinOpRHS</tt> is the function that parses the sequence of pairs for
-us. It takes a precedence and a pointer to an expression for the part parsed
-so far. Note that "x" is a perfectly valid expression: As such, "binoprhs" is
+us. It takes a precedence and a pointer to an expression for the part that has been
+parsed so far. Note that "x" is a perfectly valid expression: As such, "binoprhs" is
allowed to be empty, in which case it returns the expression that is passed into
it. In our example above, the code passes the expression for "a" into
<tt>ParseBinOpRHS</tt> and the current token is "+".</p>
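<p>Sketched, the driver and the start of <tt>ParseBinOpRHS</tt> look roughly
like this (the elided part of the loop is covered in the snippets below):</p>

<div class="doc_code">
<pre>
/// expression ::= primary binoprhs
static ExprAST *ParseExpression() {
  ExprAST *LHS = ParsePrimary();
  if (LHS == 0) return 0;
  return ParseBinOpRHS(0, LHS);   // minimal precedence 0: accept any operator
}

/// binoprhs ::= (op primary)*
static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
  while (1) {
    int TokPrec = GetTokPrecedence();

    // If the current token is not an operator that binds at least as tightly
    // as required, "binoprhs" is empty: return the expression parsed so far.
    if (TokPrec &lt; ExprPrec)
      return LHS;
    ...
  }
}
</pre>
</div>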
@@ -522,7 +528,7 @@ operator and that it will be included in this expression:</p>
</div>
<p>As such, this code eats (and remembers) the binary operator and then parses
-the following primary expression. This builds up the whole pair, the first of
+the primary expression that follows. This builds up the whole pair, the first of
which is [+, b] for the running example.</p>
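<p>In code form, that step is essentially this fragment of the loop sketched
above:</p>

<div class="doc_code">
<pre>
    // We have a binop that binds tightly enough: eat it, remembering which one.
    int BinOp = CurTok;
    getNextToken();

    // Parse the primary expression after the operator, forming the pair.
    ExprAST *RHS = ParsePrimary();
    if (RHS == 0) return 0;
</pre>
</div>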
<p>Now that we parsed the left-hand side of an expression and one pair of the
@@ -542,8 +548,8 @@ compare it to BinOp's precedence (which is '+' in this case):</p>
<p>If the precedence of the binop to the right of "RHS" is lower or equal to the
precedence of our current operator, then we know that the parentheses associate
-as "(a+b) binop ...". In our example, since the next operator is "+" and so is
-our current one, we know that they have the same precedence. In this case we'll
+as "(a+b) binop ...". In our example, the current operator is "+" and the next
+operator is "+", we know that they have the same precedence. In this case we'll
create the AST node for "a+b", and then continue parsing:</p>
<div class="doc_code">
@@ -559,10 +565,10 @@ create the AST node for "a+b", and then continue parsing:</p>
</div>
<p>In our example above, this will turn "a+b+" into "(a+b)" and execute the next
-iteration of the loop, with "+" as the current token. The code above will eat
-and remember it and parse "(c+d)" as the primary expression, which makes the
-current pair be [+, (c+d)]. It will then enter the 'if' above with "*" as the
-binop to the right of the primary. In this case, the precedence of "*" is
+iteration of the loop, with "+" as the current token. The code above will eat
+and remember this "+", then parse "(c+d)" as the primary expression, which makes
+the current pair equal to [+, (c+d)]. It will then evaluate the 'if' conditional above with
+"*" as the binop to the right of the primary. In this case, the precedence of "*" is
higher than the precedence of "+" so the if condition will be entered.</p>
<p>The critical question left here is "how can the if condition parse the right
@@ -596,7 +602,7 @@ precedence required for it to continue. In our example above, this will cause
it to return the AST node for "(c+d)*e*f" as RHS, which is then set as the RHS
of the '+' expression.</p>
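<p>Sketched, the lookahead, the recursive call with a raised minimal
precedence, and the merge are:</p>

<div class="doc_code">
<pre>
    // If BinOp binds less tightly with RHS than the operator after RHS, let
    // that operator take RHS as its LHS, by recursing with a higher minimal
    // precedence.
    int NextPrec = GetTokPrecedence();
    if (TokPrec &lt; NextPrec) {
      RHS = ParseBinOpRHS(TokPrec+1, RHS);
      if (RHS == 0) return 0;
    }

    // Merge LHS/RHS into one node and loop around for the next pair.
    LHS = new BinaryExprAST(BinOp, LHS, RHS);
</pre>
</div>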
-<p>Finally, on the next iteration of the while loop, the "+g" piece is parsed.
+<p>Finally, on the next iteration of the while loop, the "+g" piece is parsed
and added to the AST. With this little bit of code (14 non-trivial lines), we
correctly handle fully general binary expression parsing in a very elegant way.
This was a whirlwind tour of this code, and it is somewhat subtle. I recommend
@@ -604,9 +610,9 @@ running through it with a few tough examples to see how it works.
</p>
<p>This wraps up handling of expressions. At this point, we can point the
-parser at an arbitrary token stream and build an expression from them, stopping
+parser at an arbitrary token stream and build an expression from it, stopping
at the first token that is not part of the expression. Next up we need to
-handle function definitions etc.</p>
+handle function definitions, etc.</p>
</div>
@@ -671,7 +677,7 @@ static FunctionAST *ParseDefinition() {
</div>
<p>In addition, we support 'extern' to declare functions like 'sin' and 'cos' as
-well as to support forward declaration of user functions. 'externs' are just
+well as to support forward declaration of user functions. These 'extern's are just
prototypes with no body:</p>
<div class="doc_code">
@@ -738,8 +744,8 @@ static void MainLoop() {
<p>The most interesting part of this is that we ignore top-level semicolons.
Why is this, you ask? The basic reason is that if you type "4 + 5" at the
-command line, the parser doesn't know that that is the end of what you will
-type. For example, on the next line you could type "def foo..." in which case
+command line, the parser doesn't know whether that is the end of what you will type
+or not. For example, on the next line you could type "def foo..." in which case
4+5 is the end of a top-level expression. Alternatively you could type "* 6",
which would continue the expression. Having top-level semicolons allows you to
type "4+5;" and the parser will know you are done.</p>
@@ -778,7 +784,8 @@ $
<p>There is a lot of room for extension here. You can define new AST nodes,
extend the language in many ways, etc. In the <a href="LangImpl3.html">next
-installment</a>, we will describe how to generate LLVM IR from the AST.</p>
+installment</a>, we will describe how to generate LLVM Intermediate
+Representation (IR) from the AST.</p>
</div>