author Chris Lattner <sabre@nondot.org> 2010-09-11 23:02:10 +0000
committer Chris Lattner <sabre@nondot.org> 2010-09-11 23:02:10 +0000
commit e1b834515b07ea20ede924c7562317b9ebc69a46
tree 4a054e872c4ba85f0e9ae593599e7682260415d5
parent 0989d29d093c281a0d8b4f1b1ea22436249c4087
add some documentation for the most important MC-level classes along with
an overview of mc and the idea of the code emission phase.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113707 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs')
-rw-r--r-- docs/CodeGenerator.html 267
1 files changed, 243 insertions, 24 deletions
diff --git a/docs/CodeGenerator.html b/docs/CodeGenerator.html
index 4d0813fd6d..7185f4d0a6 100644
--- a/docs/CodeGenerator.html
+++ b/docs/CodeGenerator.html
@@ -33,7 +33,7 @@
<li><a href="#targetjitinfo">The <tt>TargetJITInfo</tt> class</a></li>
</ul>
</li>
- <li><a href="#codegendesc">Machine code description classes</a>
+ <li><a href="#codegendesc">The "Machine" Code Generator classes</a>
<ul>
<li><a href="#machineinstr">The <tt>MachineInstr</tt> class</a></li>
<li><a href="#machinebasicblock">The <tt>MachineBasicBlock</tt>
@@ -41,6 +41,15 @@
<li><a href="#machinefunction">The <tt>MachineFunction</tt> class</a></li>
</ul>
</li>
+ <li><a href="#mc">The "MC" Layer</a>
+ <ul>
+ <li><a href="#mcstreamer">The <tt>MCStreamer</tt> API</a></li>
+ <li><a href="#mccontext">The <tt>MCContext</tt> class</a></li>
+ <li><a href="#mcsymbol">The <tt>MCSymbol</tt> class</a></li>
+ <li><a href="#mcsection">The <tt>MCSection</tt> class</a></li>
+ <li><a href="#mcinst">The <tt>MCInst</tt> class</a></li>
+ </ul>
+ </li>
<li><a href="#codegenalgs">Target-independent code generation algorithms</a>
<ul>
<li><a href="#instselect">Instruction Selection</a>
@@ -76,13 +85,11 @@
<li><a href="#regAlloc_fold">Instruction folding</a></li>
<li><a href="#regAlloc_builtIn">Built in register allocators</a></li>
</ul></li>
- <li><a href="#codeemit">Code Emission</a>
- <ul>
- <li><a href="#codeemit_asm">Generating Assembly Code</a></li>
- <li><a href="#codeemit_bin">Generating Binary Machine Code</a></li>
- </ul></li>
+ <li><a href="#codeemit">Code Emission</a></li>
</ul>
</li>
+ <li><a href="#nativeassembler">Implementing a Native Assembler</a></li>
+
<li><a href="#targetimpls">Target-specific Implementation Notes</a>
<ul>
<li><a href="#tailcallopt">Tail call optimization</a></li>
@@ -100,11 +107,7 @@
</ol>
<div class="doc_author">
- <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>,
- <a href="mailto:isanbard@gmail.com">Bill Wendling</a>,
- <a href="mailto:pronesto@gmail.com">Fernando Magno Quintao
- Pereira</a> and
- <a href="mailto:jlaskey@mac.com">Jim Laskey</a></p>
+ <p>Written by the LLVM Team.</p>
</div>
<div class="doc_warning">
@@ -123,7 +126,7 @@
suite of reusable components for translating the LLVM internal representation
to the machine code for a specified target&mdash;either in assembly form
(suitable for a static compiler) or in binary machine code format (usable for
- a JIT compiler). The LLVM target-independent code generator consists of five
+ a JIT compiler). The LLVM target-independent code generator consists of six
main components:</p>
<ol>
@@ -132,10 +135,17 @@
independently of how they will be used. These interfaces are defined in
<tt>include/llvm/Target/</tt>.</li>
- <li>Classes used to represent the <a href="#codegendesc">machine code</a>
- being generated for a target. These classes are intended to be abstract
+ <li>Classes used to represent the <a href="#codegendesc">code being
+ generated</a> for a target. These classes are intended to be abstract
enough to represent the machine code for <i>any</i> target machine. These
- classes are defined in <tt>include/llvm/CodeGen/</tt>.</li>
+ classes are defined in <tt>include/llvm/CodeGen/</tt>. At this level,
+ concepts like "constant pool entries" and "jump tables" are explicitly
+ exposed.</li>
+
+ <li>Classes and algorithms used to represent code at the object file level,
+ the <a href="#mc">MC Layer</a>. These classes represent assembly level
+ constructs like labels, sections, and instructions. At this level,
+ concepts like "constant pool entries" and "jump tables" don't exist.</li>
<li><a href="#codegenalgs">Target-independent algorithms</a> used to implement
various phases of native code generation (register allocation, scheduling,
@@ -732,6 +742,157 @@ ret
</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section">
+ <a name="mc">The "MC" Layer</a>
+</div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>
+The MC Layer is used to represent and process code at the raw machine code
+level, devoid of "high level" information like "constant pools", "jump tables",
+"global variables" or anything like that. At this level, LLVM handles things
+like label names, machine instructions, and sections in the object file. The
+code in this layer is used for a number of important purposes: the tail end of
+the code generator uses it to write a .s or .o file, and it is also used by the
+llvm-mc tool to implement standalone machine code assemblers and disassemblers.
+</p>
+
+<p>
+This section describes some of the important classes. A number of important
+subsystems also interact at this layer; they are described later in this
+manual.
+</p>
+
+</div>
+
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="mcstreamer">The <tt>MCStreamer</tt> API</a>
+</div>
+
+<div class="doc_text">
+
+<p>
+MCStreamer is best thought of as an assembler API. It is an abstract API which
+is <em>implemented</em> in different ways (e.g. to output a .s file, output an
+ELF .o file, etc) but whose API corresponds directly to what you see in a .s
+file. MCStreamer has one method per directive, such as EmitLabel,
+EmitSymbolAttribute, SwitchSection, EmitValue (for .byte, .word), etc, which
+directly correspond to assembly level directives. It also has an
+EmitInstruction method, which is used to output an MCInst to the streamer.
+</p>
+
+<p>
+This API is most important for two clients. The llvm-mc stand-alone assembler
+is effectively a parser that parses a line, then invokes a method on
+MCStreamer. In the code generator, the <a href="#codeemit">Code Emission</a>
+phase lowers higher level LLVM IR and Machine* constructs down to the MC
+layer, emitting directives through MCStreamer.</p>
+
+<p>
+On the implementation side of MCStreamer, there are two major implementations:
+one for writing out a .s file (MCAsmStreamer), and one for writing out a .o
+file (MCObjectStreamer). MCAsmStreamer is a straightforward implementation
+that prints out a directive for each method (e.g. EmitValue -&gt; .byte), but
+MCObjectStreamer implements a full assembler.
+</p>
+
+</div>
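The "one abstract assembler API, several implementations" idea described above can be sketched in a few lines. This is a toy model only: `ToyStreamer`, `ToyAsmStreamer`, `ToyObjectStreamer` and `emitFooBar` are hypothetical names invented for illustration and are not the real MCStreamer interface, and the "object" side simply records bytes and label offsets rather than implementing a full assembler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for an abstract "assembler API" (hypothetical, not llvm::MCStreamer).
class ToyStreamer {
public:
  virtual ~ToyStreamer() = default;
  virtual void switchSection(const std::string &Name) = 0;
  virtual void emitLabel(const std::string &Name) = 0;
  virtual void emitByte(std::uint8_t Value) = 0;
};

// Textual implementation: prints one directive per call, like a .s writer.
class ToyAsmStreamer : public ToyStreamer {
  std::ostringstream OS;

public:
  void switchSection(const std::string &Name) override {
    OS << "\t.section " << Name << "\n";
  }
  void emitLabel(const std::string &Name) override { OS << Name << ":\n"; }
  void emitByte(std::uint8_t Value) override {
    OS << "\t.byte " << static_cast<unsigned>(Value) << "\n";
  }
  std::string str() const { return OS.str(); }
};

// "Object" implementation: records section bytes and label offsets instead of text.
class ToyObjectStreamer : public ToyStreamer {
  std::map<std::string, std::vector<std::uint8_t>> Sections;
  std::map<std::string, std::size_t> LabelOffsets;
  std::string Current;

public:
  void switchSection(const std::string &Name) override {
    Current = Name;
    Sections[Name]; // create the section if it does not exist yet
  }
  void emitLabel(const std::string &Name) override {
    assert(!Name.empty() && "labels need a name");
    LabelOffsets[Name] = Sections[Current].size();
  }
  void emitByte(std::uint8_t Value) override {
    Sections[Current].push_back(Value);
  }
  const std::vector<std::uint8_t> &bytes(const std::string &Section) const {
    return Sections.at(Section);
  }
  std::size_t offsetOf(const std::string &Label) const {
    return LabelOffsets.at(Label);
  }
};

// A client is written once against the abstract interface.
void emitFooBar(ToyStreamer &S) {
  S.switchSection(".data");
  S.emitLabel("foo");
  S.emitLabel("bar");
  S.emitByte(4);
}
```

Passing a `ToyAsmStreamer` to `emitFooBar` yields text, while passing a `ToyObjectStreamer` yields recorded bytes; the client code does not change, which is the property the real API is designed around.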
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="mccontext">The <tt>MCContext</tt> class</a>
+</div>
+
+<div class="doc_text">
+
+<p>
+The MCContext class is the owner of a variety of uniqued data structures at the
+MC layer, including symbols, sections, etc. As such, this is the class that you
+interact with to create symbols and sections. This class cannot be subclassed.
+</p>
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="mcsymbol">The <tt>MCSymbol</tt> class</a>
+</div>
+
+<div class="doc_text">
+
+<p>
+The MCSymbol class represents a symbol (aka label) in the assembly file. There
+are two interesting kinds of symbols: assembler temporary symbols, and normal
+symbols. Assembler temporary symbols are used and processed by the assembler
+but are discarded when the object file is produced. The distinction is usually
+represented by adding a prefix to the label, for example "L" labels are
+assembler temporary labels in MachO.
+</p>
+
+<p>MCSymbols are created by MCContext and uniqued there. This means that
+MCSymbols can be compared for pointer equivalence to find out if they are the
+same symbol. Note that pointer inequality does not guarantee the labels will
+end up at different addresses though. It's perfectly legal to output something
+like this to the .s file:</p>
+
+<pre>
+ foo:
+ bar:
+ .byte 4
+</pre>
+
+<p>In this case, both the foo and bar symbols will have the same address.</p>
+
+</div>
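Uniquing through a context object is what makes pointer comparison meaningful for symbols. A minimal sketch, assuming a hypothetical `ToyContext` that owns `ToySymbol` objects keyed by name, and assuming the MachO-style convention mentioned above that an "L" prefix marks an assembler temporary (neither type is the real MCContext or MCSymbol):

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical symbol record; real MC symbols carry more state.
struct ToySymbol {
  std::string Name;
  bool IsTemporary;
};

// Hypothetical owner of all symbols: one ToySymbol per distinct name.
class ToyContext {
  std::map<std::string, std::unique_ptr<ToySymbol>> Symbols;

public:
  // Assumed convention for this sketch: "L"-prefixed names are temporaries.
  static bool isTemporaryName(const std::string &Name) {
    return !Name.empty() && Name[0] == 'L';
  }

  // Returns the unique symbol for Name, creating it on first use.
  ToySymbol *getOrCreateSymbol(const std::string &Name) {
    assert(!Name.empty() && "symbol names must be non-empty");
    std::unique_ptr<ToySymbol> &Slot = Symbols[Name];
    if (!Slot)
      Slot.reset(new ToySymbol{Name, isTemporaryName(Name)});
    return Slot.get();
  }
};
```

Two lookups of the same name return the same pointer, so `A == B` answers "is this the same symbol?". It does not answer "is this the same address?": two distinct symbols can still label the same location, as in the foo/bar example.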
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="mcsection">The <tt>MCSection</tt> class</a>
+</div>
+
+<div class="doc_text">
+
+<p>
+The MCSection class represents an object-file specific section. It is subclassed
+by object file specific implementations (e.g. <tt>MCSectionMachO</tt>,
+<tt>MCSectionCOFF</tt>, <tt>MCSectionELF</tt>) and these are created and uniqued
+by MCContext. The MCStreamer has a notion of the current section, which can be
+changed with the SwitchSection method (which corresponds to a ".section"
+directive in a .s file).
+</p>
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="mcinst">The <tt>MCInst</tt> class</a>
+</div>
+
+<div class="doc_text">
+
+<p>
+The MCInst class is a target-independent representation of an instruction. It
+is a simple class (much more so than <a href="#machineinstr">MachineInstr</a>)
+that holds a target-specific opcode and a vector of MCOperands. MCOperand, in
+turn, is a simple discriminated union of three cases: 1) a simple immediate,
+2) a target register ID, 3) a symbolic expression (e.g. "Lfoo-Lbar+42") as an
+MCExpr.
+</p>
+
+<p>MCInst is the common currency used to represent machine instructions at the
+MC layer. It is the type used by the instruction encoder, the instruction
+printer, and the type generated by the assembly parser and disassembler.
+</p>
+
+</div>
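The "opcode plus a vector of three-way operands" shape described above can be modelled roughly as follows. `ToyOperand` and `ToyInst` are hypothetical names for illustration, and the symbolic expression is held as plain text here, whereas the real operand refers to an MCExpr:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical three-case operand: immediate, register ID, or symbolic expression.
class ToyOperand {
  enum Kind { Immediate, Register, Expression };
  Kind TheKind;
  std::int64_t Imm = 0;
  unsigned Reg = 0;
  std::string Expr; // simplification: text instead of an expression tree

  explicit ToyOperand(Kind K) : TheKind(K) {}

public:
  static ToyOperand createImm(std::int64_t Value) {
    ToyOperand Op(Immediate);
    Op.Imm = Value;
    return Op;
  }
  static ToyOperand createReg(unsigned RegID) {
    ToyOperand Op(Register);
    Op.Reg = RegID;
    return Op;
  }
  static ToyOperand createExpr(std::string Text) {
    ToyOperand Op(Expression);
    Op.Expr = std::move(Text);
    return Op;
  }

  bool isImm() const { return TheKind == Immediate; }
  bool isReg() const { return TheKind == Register; }
  bool isExpr() const { return TheKind == Expression; }

  // Accessors check the active case before returning its payload.
  std::int64_t getImm() const {
    assert(isImm() && "operand is not an immediate");
    return Imm;
  }
  unsigned getReg() const {
    assert(isReg() && "operand is not a register");
    return Reg;
  }
  const std::string &getExpr() const {
    assert(isExpr() && "operand is not an expression");
    return Expr;
  }
};

// Hypothetical instruction: a target-specific opcode and its operands.
struct ToyInst {
  unsigned Opcode = 0;
  std::vector<ToyOperand> Operands;
};
```

Keeping the instruction this small is the point: an encoder, a printer, a parser and a disassembler can all agree on it without sharing any code-generator state.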
+
+
<!-- *********************************************************************** -->
<div class="doc_section">
<a name="codegenalgs">Target-independent code generation algorithms</a>
@@ -1635,23 +1796,81 @@ $ llc -regalloc=pbqp file.bc -o pbqp.s;
<a name="latemco">Late Machine Code Optimizations</a>
</div>
<div class="doc_text"><p>To Be Written</p></div>
+
<!-- ======================================================================= -->
<div class="doc_subsection">
<a name="codeemit">Code Emission</a>
</div>
-<div class="doc_text"><p>To Be Written</p></div>
-<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
- <a name="codeemit_asm">Generating Assembly Code</a>
+
+<div class="doc_text">
+
+<p>The code emission step of code generation is responsible for lowering from
+the code generator abstractions (like <a
+href="#machinefunction">MachineFunction</a>, <a
+href="#machineinstr">MachineInstr</a>, etc) down
+to the abstractions used by the MC layer (<a href="#mcinst">MCInst</a>,
+<a href="#mcstreamer">MCStreamer</a>, etc). This is
+done with a combination of several different classes: the (misnamed)
+target-independent AsmPrinter class, target-specific subclasses of AsmPrinter
+(such as SparcAsmPrinter), and the TargetLoweringObjectFile class.</p>
+
+<p>Since the MC layer works at the level of abstraction of object files, it
+doesn't have a notion of functions, global variables etc. Instead, it thinks
+about labels, directives, and instructions. A key class used at this time is
+the MCStreamer class. This is an abstract API that is implemented in different
+ways (e.g. to output a .s file, output an ELF .o file, etc) that is effectively
+an "assembler API". MCStreamer has one method per directive, such as EmitLabel,
+EmitSymbolAttribute, SwitchSection, etc, which directly correspond to assembly
+level directives.
+</p>
+
+<p>If you are interested in implementing a code generator for a target, there
+are three important things that you have to implement for your target:</p>
+
+<ol>
+<li>First, you need a subclass of AsmPrinter for your target. This class
+implements the general lowering process converting MachineFunctions into MC
+label constructs. The AsmPrinter base class provides a number of useful methods
+and routines, and also allows you to override the lowering process in some
+important ways. You should get much of the lowering for free if you are
+implementing an ELF, COFF, or MachO target, because the TargetLoweringObjectFile
+class implements much of the common logic.</li>
+
+<li>Second, you need to implement an instruction printer for your target. The
+instruction printer takes an <a href="#mcinst">MCInst</a> and renders it to a
+raw_ostream as text. Most of this is automatically generated from the .td file
+(when you specify something like "<tt>add $dst, $src1, $src2</tt>" in the
+instructions), but you need to implement routines to print operands.</li>
+
+<li>Third, you need to implement code that lowers a <a
+href="#machineinstr">MachineInstr</a> to an MCInst, usually implemented in
+"&lt;target&gt;MCInstLower.cpp". This lowering process is often target
+specific, and is responsible for turning jump table entries, constant pool
+indices, global variable addresses, etc into MCLabels as appropriate. This
+translation layer is also responsible for expanding pseudo ops used by the code
+generator into the actual machine instructions they correspond to. The MCInsts
+that are generated by this are fed into the instruction printer or the encoder.
+</li>
+
+</ol>
+
+<p>Finally, at your choosing, you can also implement a subclass of
+MCCodeEmitter which lowers MCInsts into machine code bytes and relocations.
+This is important if you want to support direct .o file emission, or would like
+to implement an assembler for your target.</p>
+
</div>
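The second and third steps in the list above (an instruction printer, and lowering from the code generator's instruction form to the MC form) can be sketched together. Everything here is hypothetical: the `Toy*` types, the `lowerToMC` and `printInst` functions, the `LCPI<fn>_<idx>` / `LJTI<fn>_<idx>` label spelling, and the `op<N>` / `r<N>` text format are assumptions made for the sketch, not what any real target emits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Code-generator-level operand: may still refer to constant pools and jump tables.
struct ToyMachineOperand {
  enum Kind { Reg, Imm, ConstantPoolIndex, JumpTableIndex };
  Kind K;
  long Value;
};
struct ToyMachineInstr {
  unsigned Opcode;
  std::vector<ToyMachineOperand> Ops;
};

// MC-level operand: only registers, immediates and symbols remain.
struct ToyMCOperand {
  enum Kind { Reg, Imm, Symbol };
  Kind K;
  long Value;
  std::string Sym;
};
struct ToyMCInst {
  unsigned Opcode;
  std::vector<ToyMCOperand> Ops;
};

// Lowering: high-level index operands become references to labels.
ToyMCInst lowerToMC(const ToyMachineInstr &MI, unsigned FunctionNumber) {
  ToyMCInst Out{MI.Opcode, {}};
  const std::string Fn = std::to_string(FunctionNumber);
  for (const ToyMachineOperand &Op : MI.Ops) {
    switch (Op.K) {
    case ToyMachineOperand::Reg:
      Out.Ops.push_back(ToyMCOperand{ToyMCOperand::Reg, Op.Value, ""});
      break;
    case ToyMachineOperand::Imm:
      Out.Ops.push_back(ToyMCOperand{ToyMCOperand::Imm, Op.Value, ""});
      break;
    case ToyMachineOperand::ConstantPoolIndex:
      assert(Op.Value >= 0 && "constant pool indices are non-negative");
      Out.Ops.push_back(ToyMCOperand{
          ToyMCOperand::Symbol, 0, "LCPI" + Fn + "_" + std::to_string(Op.Value)});
      break;
    case ToyMachineOperand::JumpTableIndex:
      assert(Op.Value >= 0 && "jump table indices are non-negative");
      Out.Ops.push_back(ToyMCOperand{
          ToyMCOperand::Symbol, 0, "LJTI" + Fn + "_" + std::to_string(Op.Value)});
      break;
    }
  }
  return Out;
}

// Instruction printer: renders an MC-level instruction as text.
std::string printInst(const ToyMCInst &Inst) {
  std::string Out = "op" + std::to_string(Inst.Opcode);
  for (std::size_t I = 0; I < Inst.Ops.size(); ++I) {
    Out += (I == 0 ? " " : ", ");
    const ToyMCOperand &Op = Inst.Ops[I];
    switch (Op.K) {
    case ToyMCOperand::Reg:
      Out += "r" + std::to_string(Op.Value);
      break;
    case ToyMCOperand::Imm:
      Out += std::to_string(Op.Value);
      break;
    case ToyMCOperand::Symbol:
      Out += Op.Sym;
      break;
    }
  }
  return Out;
}
```

After lowering, nothing in `ToyMCInst` mentions a constant pool or a jump table, so the same value could be handed either to the printer shown here or to an encoder that produces bytes and relocations.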
-<div class="doc_text"><p>To Be Written</p></div>
-<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
- <a name="codeemit_bin">Generating Binary Machine Code</a>
+
+
+<!-- ======================================================================= -->
+<div class="doc_section">
+ <a name="nativeassembler">Implementing a Native Assembler</a>
</div>
<div class="doc_text">
- <p>For the JIT or <tt>.o</tt> file writer</p>
+
+<p>TODO</p>
+
</div>