1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Kaleidoscope: Implementing code generation to LLVM IR</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="author" content="Chris Lattner">
<link rel="stylesheet" href="../llvm.css" type="text/css">
</head>
<body>
<div class="doc_title">Kaleidoscope: Code generation to LLVM IR</div>
<div class="doc_author">
<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
</div>
<!-- *********************************************************************** -->
<div class="doc_section"><a name="intro">Part 3 Introduction</a></div>
<!-- *********************************************************************** -->
<div class="doc_text">
<p>Welcome to part 3 of the "<a href="index.html">Implementing a language with
LLVM</a>" tutorial. This chapter shows you how to transform the <a
href="LangImpl2.html">Abstract Syntax Tree built in Chapter 2</a> into LLVM IR.
This will teach you a little bit about how LLVM does things, as well as
demonstrate how easy it is to use. It's much more work to build a lexer and
parser than it is to generate LLVM IR code.
</p>
</div>
<!-- *********************************************************************** -->
<div class="doc_section"><a name="basics">Code Generation setup</a></div>
<!-- *********************************************************************** -->
<div class="doc_text">
<p>
In order to generate LLVM IR, we want some simple setup to get started. First,
we define virtual codegen methods in each AST class:</p>
<div class="doc_code">
<pre>
/// ExprAST - Base class for all expression nodes.
class ExprAST {
public:
virtual ~ExprAST() {}
virtual Value *Codegen() = 0;
};
/// NumberExprAST - Expression class for numeric literals like "1.0".
class NumberExprAST : public ExprAST {
double Val;
public:
explicit NumberExprAST(double val) : Val(val) {}
virtual Value *Codegen();
};
...
</pre>
</div>
<p>The Codegen() method says to emit IR for that AST node and all things it
depends on, and they all return an LLVM Value object.
"Value" is the class used to represent a "<a
href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
Assignment (SSA)</a> register" or "SSA value" in LLVM. The most distinct aspect
of SSA values is that their value is computed as the related instruction
executes, and it does not get a new value until (and if) the instruction
re-executes. In order words, there is no way to "change" an SSA value. For
more information, please read up on <a
href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
Assignment</a> - the concepts are really quite natural once you grok them.</p>
<p>The
second thing we want is an "Error" method like we used for parser, which will
be used to report errors found during code generation (for example, use of an
undeclared parameter):</p>
<div class="doc_code">
<pre>
Value *ErrorV(const char *Str) { Error(Str); return 0; }
static Module *TheModule;
static LLVMBuilder Builder;
static std::map<std::string, Value*> NamedValues;
</pre>
</div>
<p>The static variables will be used during code generation. <tt>TheModule</tt>
is the LLVM construct that contains all of the functions and global variables in
a chunk of code. In many ways, it is the top-level structure that the LLVM IR
uses to contain code.</p>
<p>The <tt>Builder</tt> object is a helper object that makes it easy to generate
LLVM instructions. The <tt>Builder</tt> keeps track of the current place to
insert instructions and has methods to create new instructions.</p>
<p>The <tt>NamedValues</tt> map keeps track of which values are defined in the
current scope and what their LLVM representation is. In this form of
Kaleidoscope, the only things that can be referenced are function parameters.
As such, function parameters will be in this map when generating code for their
function body.</p>
<p>
With these basics in place, we can start talking about how to generate code for
each expression. Note that this assumes that the <tt>Builder</tt> has been set
up to generate code <em>into</em> something. For now, we'll assume that this
has already been done, and we'll just use it to emit code.
</p>
</div>
<!-- *********************************************************************** -->
<div class="doc_section"><a name="exprs">Expression Code Generation</a></div>
<!-- *********************************************************************** -->
<div class="doc_text">
<p>Generating LLVM code for expression nodes is very straight-forward: less
than 45 lines of commented code for all four of our expression nodes. First,
we'll do numeric literals:</p>
<div class="doc_code">
<pre>
Value *NumberExprAST::Codegen() {
return ConstantFP::get(Type::DoubleTy, APFloat(Val));
}
</pre>
</div>
<p>In the LLVM IR, numeric constants are represented with the
<tt>ConstantFP</tt> class, which holds the numeric value in an <tt>APFloat</tt>
internally (<tt>APFloat</tt> has the capability of holding floating point
constants of <em>A</em>rbitrary <em>P</em>recision). This code basically just
creates and returns a <tt>ConstantFP</tt>. Note that in the LLVM IR
that constants are all uniqued together and shared. For this reason, the API
uses "the foo::get(..)" idiom instead of "new foo(..)" or "foo::create(..).</p>
<div class="doc_code">
<pre>
Value *VariableExprAST::Codegen() {
// Look this variable up in the function.
Value *V = NamedValues[Name];
return V ? V : ErrorV("Unknown variable name");
}
</pre>
</div>
<p>References to variables is also quite simple here. In the simple version
of Kaleidoscope, we assume that the variable has already been emited somewhere
and its value is available. In practice, the only values that can be in the
<tt>NamedValues</tt> map are function arguments. This
code simply checks to see that the specified name is in the map (if not, an
unknown variable is being referenced) and returns the value for it.</p>
<div class="doc_code">
<pre>
Value *BinaryExprAST::Codegen() {
Value *L = LHS->Codegen();
Value *R = RHS->Codegen();
if (L == 0 || R == 0) return 0;
switch (Op) {
case '+': return Builder.CreateAdd(L, R, "addtmp");
case '-': return Builder.CreateSub(L, R, "subtmp");
case '*': return Builder.CreateMul(L, R, "multmp");
case '<':
L = Builder.CreateFCmpULT(L, R, "multmp");
// Convert bool 0/1 to double 0.0 or 1.0
return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
default: return ErrorV("invalid binary operator");
}
}
</pre>
</div>
<p>Binary operators start to get more interesting. The basic idea here is that
we recursively emit code for the left-hand side of the expression, then the
right-hand side, then we compute the result of the binary expression. In this
code, we do a simple switch on the opcode to create the right LLVM instruction.
</p>
<p>In this example, the LLVM builder class is starting to show its value.
Because it knows where to insert the newly created instruction, you just have to
specificy what instruction to create (e.g. with <tt>CreateAdd</tt>), which
operands to use (<tt>L</tt> and <tt>R</tt> here) and optionally provide a name
for the generated instruction. One nice thing about LLVM is that the name is
just a hint: if there are multiple additions in a single function, the first
will be named "addtmp" and the second will be "autorenamed" by adding a suffix,
giving it a name like "addtmp42". Local value names for instructions are purely
optional, but it makes it much easier to read the IR dumps.</p>
<p><a href="../LangRef.html#instref">LLVM instructions</a> are constrained to
have very strict type properties: for example, the Left and Right operators of
an <a href="../LangRef.html#i_add">add instruction</a> have to have the same
type, and that the result of the add matches the operands. Because all values
in Kaleidoscope are doubles, this makes for very simple code for add, sub and
mul.</p>
<p>On the other hand, LLVM specifies that the <a
href="../LangRef.html#i_fcmp">fcmp instruction</a> always returns an 'i1' value
(a one bit integer). However, Kaleidoscope wants the value to be a 0.0 or 1.0
value. In order to get these semantics, we combine the fcmp instruction with
a <a href="../LangRef.html#i_uitofp">uitofp instruction</a>. This instruction
converts its input integer into a floating point value by treating the input
as an unsigned value. In contrast, if we used the <a
href="../LangRef.html#i_sitofp">sitofp instruction</a>, the Kaleidoscope '<'
operator would return 0.0 and -1.0, depending on the input value.</p>
<div class="doc_code">
<pre>
Value *CallExprAST::Codegen() {
// Look up the name in the global module table.
Function *CalleeF = TheModule->getFunction(Callee);
if (CalleeF == 0)
return ErrorV("Unknown function referenced");
// If argument mismatch error.
if (CalleeF->arg_size() != Args.size())
return ErrorV("Incorrect # arguments passed");
std::vector<Value*> ArgsV;
for (unsigned i = 0, e = Args.size(); i != e; ++i) {
ArgsV.push_back(Args[i]->Codegen());
if (ArgsV.back() == 0) return 0;
}
return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
}
</pre>
</div>
<p>Code generation for function calls is quite straight-forward with LLVM. The
code above first looks the name of the function up in the LLVM Module's symbol
table. Recall that the LLVM Module is the container that holds all of the
functions we are JIT'ing. By giving each function the same name as what the
user specifies, we can use the LLVM symbol table to resolve function names for
us.</p>
<p>Once we have the function to call, we recursively codegen each argument that
is to be passed in, and create an LLVM <a href="../LangRef.html#i_call">call
instruction</a>. Note that LLVM uses the native C calling conventions by
default, allowing these calls to call into standard library functions like
"sin" and "cos" with no additional effort.</p>
<p>This wraps up our handling of the four basic expressions that we have so far
in Kaleidoscope. Feel free to go in and add some more. For example, by
browsing the <a href="../LangRef.html">LLVM language reference</a> you'll find
several other interesting instructions that are really easy to plug into our
basic framework.</p>
</div>
<!-- *********************************************************************** -->
<div class="doc_section"><a name="code">Conclusions and the Full Code</a></div>
<!-- *********************************************************************** -->
<div class="doc_text">
<div class="doc_code">
<pre>
// To build this:
// g++ -g toy.cpp `llvm-config --cppflags` `llvm-config --ldflags` \
// `llvm-config --libs core` -I ~/llvm/include/
// ./a.out
// See example below.
#include "llvm/DerivedTypes.h"
#include "llvm/Module.h"
#include "llvm/Support/LLVMBuilder.h"
#include <cstdio>
#include
|