Sphinxify the bitcode format document.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159340 91177308-0d34-0410-b5e6-96231b3b80d8
author: Bill Wendling <isanbard@gmail.com> 2012-06-28 08:43:12 +0000
committer: Bill Wendling <isanbard@gmail.com> 2012-06-28 08:43:12 +0000
commit: 0ca9927a71fed311eea7459b4c85c98cc7ed0352 (patch)
tree: 4e950553c2a69c6db06092a5dc69138c0e2e6378 /docs
parent: 87dc7a4c8d21bd638465881f3b6d091f22d9767c (diff)
3 files changed, 1047 insertions, 1491 deletions
diff --git a/docs/BitCodeFormat.html b/docs/BitCodeFormat.html
deleted file mode 100644
index 6a670f56b9..0000000000
--- a/docs/BitCodeFormat.html
+++ /dev/null
@@ -1,1490 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
-                      "http://www.w3.org/TR/html4/strict.dtd">
-<html>
-<head>
-  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
-  <title>LLVM Bitcode File Format</title>
-  <link rel="stylesheet" href="_static/llvm.css" type="text/css">
-</head>
-<body>
-<h1> LLVM Bitcode File Format</h1>
-<ol>
-  <li><a href="#abstract">Abstract</a></li>
-  <li><a href="#overview">Overview</a></li>
-  <li><a href="#bitstream">Bitstream Format</a>
-    <ol>
-    <li><a href="#magic">Magic Numbers</a></li>
-    <li><a href="#primitives">Primitives</a></li>
-    <li><a href="#abbrevid">Abbreviation IDs</a></li>
-    <li><a href="#blocks">Blocks</a></li>
-    <li><a href="#datarecord">Data Records</a></li>
-    <li><a href="#abbreviations">Abbreviations</a></li>
-    <li><a href="#stdblocks">Standard Blocks</a></li>
-    </ol>
-  </li>
-  <li><a href="#wrapper">Bitcode Wrapper Format</a>
-  </li>
-  <li><a href="#llvmir">LLVM IR Encoding</a>
-    <ol>
-    <li><a href="#basics">Basics</a></li>
-    <li><a href="#MODULE_BLOCK">MODULE_BLOCK Contents</a></li>
-    <li><a href="#PARAMATTR_BLOCK">PARAMATTR_BLOCK Contents</a></li>
-    <li><a href="#TYPE_BLOCK">TYPE_BLOCK Contents</a></li>
-    <li><a href="#CONSTANTS_BLOCK">CONSTANTS_BLOCK Contents</a></li>
-    <li><a href="#FUNCTION_BLOCK">FUNCTION_BLOCK Contents</a></li>
-    <li><a href="#TYPE_SYMTAB_BLOCK">TYPE_SYMTAB_BLOCK Contents</a></li>
-    <li><a href="#VALUE_SYMTAB_BLOCK">VALUE_SYMTAB_BLOCK Contents</a></li>
-    <li><a href="#METADATA_BLOCK">METADATA_BLOCK Contents</a></li>
-    <li><a href="#METADATA_ATTACHMENT">METADATA_ATTACHMENT Contents</a></li>
-    </ol>
-  </li>
-</ol>
-<div class="doc_author">
-  <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>,
-  <a href="http://www.reverberate.org">Joshua Haberman</a>,
-  and <a href="mailto:housel@acm.org">Peter S. Housel</a>.
-</p>
-</div>
-
-<!-- *********************************************************************** -->
-<h2><a name="abstract">Abstract</a></h2>
-<!-- *********************************************************************** -->
-
-<div>
-
-<p>This document describes the LLVM bitstream file format and the encoding of
-the LLVM IR into it.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<h2><a name="overview">Overview</a></h2>
-<!-- *********************************************************************** -->
-
-<div>
-
-<p>
-What is commonly known as the LLVM bitcode file format (also, sometimes
-anachronistically known as bytecode) is actually two things: a <a 
-href="#bitstream">bitstream container format</a>
-and an <a href="#llvmir">encoding of LLVM IR</a> into the container format.</p>
-
-<p>
-The bitstream format is an abstract encoding of structured data, very
-similar to XML in some ways.  Like XML, bitstream files contain tags, and nested
-structures, and you can parse the file without having to understand the tags.
-Unlike XML, the bitstream format is a binary encoding, and unlike XML it
-provides a mechanism for the file to self-describe "abbreviations", which are
-effectively size optimizations for the content.</p>
-
-<p>LLVM IR files may be optionally embedded into a <a 
-href="#wrapper">wrapper</a> structure that makes it easy to embed extra data
-along with LLVM IR files.</p>
-
-<p>This document first describes the LLVM bitstream format, describes the
-wrapper format, then describes the record structure used by LLVM IR files.
-</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<h2><a name="bitstream">Bitstream Format</a></h2>
-<!-- *********************************************************************** -->
-
-<div>
-
-<p>
-The bitstream format is literally a stream of bits, with a very simple
-structure.  This structure consists of the following concepts:
-</p>
-
-<ul>
-<li>A "<a href="#magic">magic number</a>" that identifies the contents of
-    the stream.</li>
-<li>Encoding <a href="#primitives">primitives</a> like variable bit-rate
-    integers.</li> 
-<li><a href="#blocks">Blocks</a>, which define nested content.</li> 
-<li><a href="#datarecord">Data Records</a>, which describe entities within the
-    file.</li> 
-<li>Abbreviations, which specify compression optimizations for the file.</li> 
-</ul>
-
-<p>Note that the <a 
-href="CommandGuide/html/llvm-bcanalyzer.html">llvm-bcanalyzer</a> tool can be
-used to dump and inspect arbitrary bitstreams, which is very useful for
-understanding the encoding.</p>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="magic">Magic Numbers</a>
-</h3>
-
-<div>
-
-<p>The first two bytes of a bitcode file are 'BC' (0x42, 0x43).
-The second two bytes are an application-specific magic number.  Generic
-bitcode tools can look at only the first two bytes to verify the file is
-bitcode, while application-specific programs will want to look at all four.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="primitives">Primitives</a>
-</h3>
-
-<div>
-
-<p>
-A bitstream literally consists of a stream of bits, which are read in order
-starting with the least significant bit of each byte.  The stream is made up of a
-number of primitive values that encode a stream of unsigned integer values.
-These integers are encoded in two ways: either as <a href="#fixedwidth">Fixed
-Width Integers</a> or as <a href="#variablewidth">Variable Width
-Integers</a>.
-</p>
-
-<!-- _______________________________________________________________________ -->
-<h4>
-  <a name="fixedwidth">Fixed Width Integers</a>
-</h4>
-
-<div>
-
-<p>Fixed-width integer values have their low bits emitted directly to the file.
-   For example, a 3-bit integer value encodes 1 as 001.  Fixed width integers
-   are used when there are a well-known number of options for a field.  For
-   example, boolean values are usually encoded with a 1-bit wide integer. 
-</p>
-
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4>
-  <a name="variablewidth">Variable Width Integers</a>
-</h4>
-
-<div>
-
-<p>Variable-width integer (VBR) values encode values of arbitrary size,
-optimizing for the case where the values are small.  Given a 4-bit VBR field,
-any 3-bit value (0 through 7) is encoded directly, with the high bit set to
-zero.  Values larger than N-1 bits emit their bits in a series of N-1 bit
-chunks, where all but the last set the high bit.</p>
-
-<p>For example, the value 27 (0x1B) is encoded as 1011 0011 when emitted as a
-vbr4 value.  The first set of four bits indicates the value 3 (011) with a
-continuation piece (indicated by a high bit of 1).  The next word indicates a
-value of 24 (011 << 3) with no continuation.  The sum (3+24) yields the value
-27.
-</p>
-
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="char6">6-bit characters</a></h4>
-
-<div>
-
-<p>6-bit characters encode common characters into a fixed 6-bit field.  They
-represent the following characters with the following 6-bit values:</p>
-
-<div class="doc_code">
-<pre>
-'a' .. 'z' &mdash;  0 .. 25
-'A' .. 'Z' &mdash; 26 .. 51
-'0' .. '9' &mdash; 52 .. 61
-       '.' &mdash; 62
-       '_' &mdash; 63
-</pre>
-</div>
-
-<p>This encoding is only suitable for encoding characters and strings that
-consist only of the above characters.  It is completely incapable of encoding
-characters not in the set.</p>
-
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="wordalign">Word Alignment</a></h4>
-
-<div>
-
-<p>Occasionally, it is useful to emit zero bits until the bitstream is a
-multiple of 32 bits.  This ensures that the bit position in the stream can be
-represented as a multiple of 32-bit words.</p>
-
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="abbrevid">Abbreviation IDs</a>
-</h3>
-
-<div>
-
-<p>
-A bitstream is a sequential series of <a href="#blocks">Blocks</a> and
-<a href="#datarecord">Data Records</a>.  Both of these start with an
-abbreviation ID encoded as a fixed-bitwidth field.  The width is specified by
-the current block, as described below.  The value of the abbreviation ID
-specifies either a builtin ID (which have special meanings, defined below) or
-one of the abbreviation IDs defined for the current block by the stream itself.
-</p>
-
-<p>
-The set of builtin abbrev IDs is:
-</p>
-
-<ul>
-<li><tt>0 - <a href="#END_BLOCK">END_BLOCK</a></tt> &mdash; This abbrev ID marks
-    the end of the current block.</li>
-<li><tt>1 - <a href="#ENTER_SUBBLOCK">ENTER_SUBBLOCK</a></tt> &mdash; This
-    abbrev ID marks the beginning of a new block.</li>
-<li><tt>2 - <a href="#DEFINE_ABBREV">DEFINE_ABBREV</a></tt> &mdash; This defines
-    a new abbreviation.</li>
-<li><tt>3 - <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a></tt> &mdash; This ID
-    specifies the definition of an unabbreviated record.</li>
-</ul>
-
-<p>Abbreviation IDs 4 and above are defined by the stream itself, and specify
-an <a href="#abbrev_records">abbreviated record encoding</a>.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="blocks">Blocks</a>
-</h3>
-
-<div>
-
-<p>
-Blocks in a bitstream denote nested regions of the stream, and are identified by
-a content-specific id number (for example, LLVM IR uses an ID of 12 to represent
-function bodies).  Block IDs 0-7 are reserved for <a href="#stdblocks">standard blocks</a>
-whose meaning is defined by Bitcode; block IDs 8 and greater are
-application specific. Nested blocks capture the hierarchical structure of the data
-encoded in it, and various properties are associated with blocks as the file is
-parsed.  Block definitions allow the reader to efficiently skip blocks
-in constant time if the reader wants a summary of blocks, or if it wants to
-efficiently skip data it does not understand.  The LLVM IR reader uses this
-mechanism to skip function bodies, lazily reading them on demand.
-</p>
-
-<p>
-When reading and encoding the stream, several properties are maintained for the
-block.  In particular, each block maintains:
-</p>
-
-<ol>
-<li>A current abbrev id width.  This value starts at 2 at the beginning of
-    the stream, and is set every time a
-    block record is entered.  The block entry specifies the abbrev id width for
-    the body of the block.</li>
-
-<li>A set of abbreviations.  Abbreviations may be defined within a block, in
-    which case they are only defined in that block (neither subblocks nor
-    enclosing blocks see the abbreviation).  Abbreviations can also be defined
-    inside a <tt><a href="#BLOCKINFO">BLOCKINFO</a></tt> block, in which case
-    they are defined in all blocks that match the ID that the BLOCKINFO block is
-    describing.
-</li>
-</ol>
-
-<p>
-As sub blocks are entered, these properties are saved and the new sub-block has
-its own set of abbreviations, and its own abbrev id width.  When a sub-block is
-popped, the saved values are restored.
-</p>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="ENTER_SUBBLOCK">ENTER_SUBBLOCK Encoding</a></h4>
-
-<div>
-
-<p><tt>[ENTER_SUBBLOCK, blockid<sub>vbr8</sub>, newabbrevlen<sub>vbr4</sub>,
-     &lt;align32bits&gt;, blocklen<sub>32</sub>]</tt></p>
-
-<p>
-The <tt>ENTER_SUBBLOCK</tt> abbreviation ID specifies the start of a new block
-record.  The <tt>blockid</tt> value is encoded as an 8-bit VBR identifier, and
-indicates the type of block being entered, which can be
-a <a href="#stdblocks">standard block</a> or an application-specific block.
-The <tt>newabbrevlen</tt> value is a 4-bit VBR, which specifies the abbrev id
-width for the sub-block.  The <tt>blocklen</tt> value is a 32-bit aligned value
-that specifies the size of the subblock in 32-bit words. This value allows the
-reader to skip over the entire block in one jump.
-</p>
-
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="END_BLOCK">END_BLOCK Encoding</a></h4>
-
-<div>
-
-<p><tt>[END_BLOCK, &lt;align32bits&gt;]</tt></p>
-
-<p>
-The <tt>END_BLOCK</tt> abbreviation ID specifies the end of the current block
-record.  Its end is aligned to 32-bits to ensure that the size of the block is
-an even multiple of 32-bits.
-</p>
-
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="datarecord">Data Records</a>
-</h3>
-
-<div>
-<p>
-Data records consist of a record code and a number of (up to) 64-bit
-integer values.  The interpretation of the code and values is
-application specific and may vary between different block types.
-Records can be encoded either using an unabbrev record, or with an
-abbreviation.  In the LLVM IR format, for example, there is a record
-which encodes the target triple of a module.  The code is
-<tt>MODULE_CODE_TRIPLE</tt>, and the values of the record are the
-ASCII codes for the characters in the string.
-</p>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="UNABBREV_RECORD">UNABBREV_RECORD Encoding</a></h4>
-
-<div>
-
-<p><tt>[UNABBREV_RECORD, code<sub>vbr6</sub>, numops<sub>vbr6</sub>,
-       op0<sub>vbr6</sub>, op1<sub>vbr6</sub>, ...]</tt></p>
-
-<p>
-An <tt>UNABBREV_RECORD</tt> provides a default fallback encoding, which is both
-completely general and extremely inefficient.  It can describe an arbitrary
-record by emitting the code and operands as VBRs.
-</p>
-
-<p>
-For example, emitting an LLVM IR target triple as an unabbreviated record
-requires emitting the <tt>UNABBREV_RECORD</tt> abbrevid, a vbr6 for the
-<tt>MODULE_CODE_TRIPLE</tt> code, a vbr6 for the length of the string, which is
-equal to the number of operands, and a vbr6 for each character.  Because there
-are no letters with values less than 32, each letter would need to be emitted as
-at least a two-part VBR, which means that each letter would require at least 12
-bits.  This is not an efficient encoding, but it is fully general.
-</p>
-
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="abbrev_records">Abbreviated Record Encoding</a></h4>
-
-<div>
-
-<p><tt>[&lt;abbrevid&gt;, fields...]</tt></p>
-
-<p>
-An abbreviated record is a abbreviation id followed by a set of fields that are
-encoded according to the <a href="#abbreviations">abbreviation definition</a>.
-This allows records to be encoded significantly more densely than records
-encoded with the <tt><a href="#UNABBREV_RECORD">UNABBREV_RECORD</a></tt> type,
-and allows the abbreviation types to be specified in the stream itself, which
-allows the files to be completely self describing.  The actual encoding of
-abbreviations is defined below.
-</p>
-
-<p>The record code, which is the first field of an abbreviated record,
-may be encoded in the abbreviation definition (as a literal
-operand) or supplied in the abbreviated record (as a Fixed or VBR
-operand value).</p>
-
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="abbreviations">Abbreviations</a>
-</h3>
-
-<div>
-<p>
-Abbreviations are an important form of compression for bitstreams.  The idea is
-to specify a dense encoding for a class of records once, then use that encoding
-to emit many records.  It takes space to emit the encoding into the file, but
-the space is recouped (hopefully plus some) when the records that use it are
-emitted.
-</p>
-
-<p>
-Abbreviations can be determined dynamically per client, per file. Because the
-abbreviations are stored in the bitstream itself, different streams of the same
-format can contain different sets of abbreviations according to the needs
-of the specific stream.
-As a concrete example, LLVM IR files usually emit an abbreviation
-for binary operators.  If a specific LLVM module contained no or few binary
-operators, the abbreviation does not need to be emitted.
-</p>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="DEFINE_ABBREV">DEFINE_ABBREV  Encoding</a></h4>
-
-<div>
-
-<p><tt>[DEFINE_ABBREV, numabbrevops<sub>vbr5</sub>, abbrevop0, abbrevop1,
- ...]</tt></p>
-
-<p>
-A <tt>DEFINE_ABBREV</tt> record adds an abbreviation to the list of currently
-defined abbreviations in the scope of this block.  This definition only exists
-inside this immediate block &mdash; it is not visible in subblocks or enclosing
-blocks.  Abbreviations are implicitly assigned IDs sequentially starting from 4
-(the first application-defined abbreviation ID).  Any abbreviations defined in a
-<tt>BLOCKINFO</tt> record for the particular block type
-receive IDs first, in order, followed by any
-abbreviations defined within the block itself.  Abbreviated data records
-reference this ID to indicate what abbreviation they are invoking.
-</p>
-
-<p>
-An abbreviation definition consists of the <tt>DEFINE_ABBREV</tt> abbrevid
-followed by a VBR that specifies the number of abbrev operands, then the abbrev
-operands themselves.  Abbreviation operands come in three forms.  They all start
-with a single bit that indicates whether the abbrev operand is a literal operand
-(when the bit is 1) or an encoding operand (when the bit is 0).
-</p>
-
-<ol>
-<li>Literal operands &mdash; <tt>[1<sub>1</sub>, litvalue<sub>vbr8</sub>]</tt>
-&mdash; Literal operands specify that the value in the result is always a single
-specific value.  This specific value is emitted as a vbr8 after the bit
-indicating that it is a literal operand.</li>
-<li>Encoding info without data &mdash; <tt>[0<sub>1</sub>,
- encoding<sub>3</sub>]</tt> &mdash; Operand encodings that do not have extra
- data are just emitted as their code.
-</li>
-<li>Encoding info with data &mdash; <tt>[0<sub>1</sub>, encoding<sub>3</sub>,
-value<sub>vbr5</sub>]</tt> &mdash; Operand encodings that do have extra data are
-emitted as their code, followed by the extra data.
-</li>
-</ol>
-
-<p>The possible operand encodings are:</p>
-
-<ul>
-<li>Fixed (code 1): The field should be emitted as
-    a <a href="#fixedwidth">fixed-width value</a>, whose width is specified by
-    the operand's extra data.</li>
-<li>VBR (code 2): The field should be emitted as
-    a <a href="#variablewidth">variable-width value</a>, whose width is
-    specified by the operand's extra data.</li>
-<li>Array (code 3): This field is an array of values.  The array operand
-    has no extra data, but expects another operand to follow it, indicating
-    the element type of the array.  When reading an array in an abbreviated
-    record, the first integer is a vbr6 that indicates the array length,
-    followed by the encoded elements of the array.  An array may only occur as
-    the last operand of an abbreviation (except for the one final operand that
-    gives the array's type).</li>
-<li>Char6 (code 4): This field should be emitted as
-    a <a href="#char6">char6-encoded value</a>.  This operand type takes no
-    extra data. Char6 encoding is normally used as an array element type.
-    </li>
-<li>Blob (code 5): This field is emitted as a vbr6, followed by padding to a
-    32-bit boundary (for alignment) and an array of 8-bit objects.  The array of
-    bytes is further followed by tail padding to ensure that its total length is
-    a multiple of 4 bytes.  This makes it very efficient for the reader to
-    decode the data without having to make a copy of it: it can use a pointer to
-    the data in the mapped in file and poke directly at it.  A blob may only
-    occur as the last operand of an abbreviation.</li>
-</ul>
-
-<p>
-For example, target triples in LLVM modules are encoded as a record of the
-form <tt>[TRIPLE, 'a', 'b', 'c', 'd']</tt>.  Consider if the bitstream emitted
-the following abbrev entry:
-</p>
-
-<div class="doc_code">
-<pre>
-[0, Fixed, 4]
-[0, Array]
-[0, Char6]
-</pre>
-</div>
-
-<p>
-When emitting a record with this abbreviation, the above entry would be emitted
-as:
-</p>
-
-<div class="doc_code">
-<p>
-<tt>[4<sub>abbrevwidth</sub>, 2<sub>4</sub>, 4<sub>vbr6</sub>, 0<sub>6</sub>,
-1<sub>6</sub>, 2<sub>6</sub>, 3<sub>6</sub>]</tt>
-</p>
-</div>
-
-<p>These values are:</p>
-
-<ol>
-<li>The first value, 4, is the abbreviation ID for this abbreviation.</li>
-<li>The second value, 2, is the record code for <tt>TRIPLE</tt> records within LLVM IR file <tt>MODULE_BLOCK</tt> blocks.</li>
-<li>The third value, 4, is the length of the array.</li>
-<li>The rest of the values are the char6 encoded values
-    for <tt>"abcd"</tt>.</li>
-</ol>
-
-<p>
-With this abbreviation, the triple is emitted with only 37 bits (assuming a
-abbrev id width of 3).  Without the abbreviation, significantly more space would
-be required to emit the target triple.  Also, because the <tt>TRIPLE</tt> value
-is not emitted as a literal in the abbreviation, the abbreviation can also be
-used for any other string value.
-</p>
-
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="stdblocks">Standard Blocks</a>
-</h3>
-
-<div>
-
-<p>
-In addition to the basic block structure and record encodings, the bitstream
-also defines specific built-in block types.  These block types specify how the
-stream is to be decoded or other metadata.  In the future, new standard blocks
-may be added.  Block IDs 0-7 are reserved for standard blocks.
-</p>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="BLOCKINFO">#0 - BLOCKINFO Block</a></h4>
-
-<div>
-
-<p>
-The <tt>BLOCKINFO</tt> block allows the description of metadata for other
-blocks.  The currently specified records are:
-</p>
-
-<div class="doc_code">
-<pre>
-[SETBID (#1), blockid]
-[DEFINE_ABBREV, ...]
-[BLOCKNAME, ...name...]
-[SETRECORDNAME, RecordID, ...name...]
-</pre>
-</div>
-
-<p>
-The <tt>SETBID</tt> record (code 1) indicates which block ID is being
-described.  <tt>SETBID</tt> records can occur multiple times throughout the
-block to change which block ID is being described.  There must be
-a <tt>SETBID</tt> record prior to any other records.
-</p>
-
-<p>
-Standard <tt>DEFINE_ABBREV</tt> records can occur inside <tt>BLOCKINFO</tt>
-blocks, but unlike their occurrence in normal blocks, the abbreviation is
-defined for blocks matching the block ID we are describing, <i>not</i> the
-<tt>BLOCKINFO</tt> block itself.  The abbreviations defined
-in <tt>BLOCKINFO</tt> blocks receive abbreviation IDs as described
-in <tt><a href="#DEFINE_ABBREV">DEFINE_ABBREV</a></tt>.
-</p>
-
-<p>The <tt>BLOCKNAME</tt> record (code 2) can optionally occur in this block.  The elements of
-the record are the bytes of the string name of the block.  llvm-bcanalyzer can use
-this to dump out bitcode files symbolically.</p>
-
-<p>The <tt>SETRECORDNAME</tt> record (code 3) can also optionally occur in this block.  The
-first operand value is a record ID number, and the rest of the elements of the record are
-the bytes for the string name of the record.  llvm-bcanalyzer can use
-this to dump out bitcode files symbolically.</p>
-
-<p>
-Note that although the data in <tt>BLOCKINFO</tt> blocks is described as
-"metadata," the abbreviations they contain are essential for parsing records
-from the corresponding blocks.  It is not safe to skip them.
-</p>
-
-</div>
-
-</div>
-
-</div>
-
-<!-- *********************************************************************** -->
-<h2><a name="wrapper">Bitcode Wrapper Format</a></h2>
-<!-- *********************************************************************** -->
-
-<div>
-
-<p>
-Bitcode files for LLVM IR may optionally be wrapped in a simple wrapper
-structure.  This structure contains a simple header that indicates the offset
-and size of the embedded BC file.  This allows additional information to be
-stored alongside the BC file.  The structure of this file header is:
-</p>
-
-<div class="doc_code">
-<p>
-<tt>[Magic<sub>32</sub>, Version<sub>32</sub>, Offset<sub>32</sub>,
-Size<sub>32</sub>, CPUType<sub>32</sub>]</tt>
-</p>
-</div>
-
-<p>
-Each of the fields are 32-bit fields stored in little endian form (as with
-the rest of the bitcode file fields).  The Magic number is always
-<tt>0x0B17C0DE</tt> and the version is currently always <tt>0</tt>.  The Offset
-field is the offset in bytes to the start of the bitcode stream in the file, and
-the Size field is the size in bytes of the stream. CPUType is a target-specific
-value that can be used to encode the CPU of the target.
-</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<h2><a name="llvmir">LLVM IR Encoding</a></h2>
-<!-- *********************************************************************** -->
-
-<div>
-
-<p>
-LLVM IR is encoded into a bitstream by defining blocks and records.  It uses
-blocks for things like constant pools, functions, symbol tables, etc.  It uses
-records for things like instructions, global variable descriptors, type
-descriptions, etc.  This document does not describe the set of abbreviations
-that the writer uses, as these are fully self-described in the file, and the
-reader is not allowed to build in any knowledge of this.
-</p>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="basics">Basics</a>
-</h3>
-
-<div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="ir_magic">LLVM IR Magic Number</a></h4>
-
-<div>
-
-<p>
-The magic number for LLVM IR files is:
-</p>
-
-<div class="doc_code">
-<p>
-<tt>[0x0<sub>4</sub>, 0xC<sub>4</sub>, 0xE<sub>4</sub>, 0xD<sub>4</sub>]</tt>
-</p>
-</div>
-
-<p>
-When combined with the bitcode magic number and viewed as bytes, this is
-<tt>"BC&nbsp;0xC0DE"</tt>.
-</p>
-
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="ir_signed_vbr">Signed VBRs</a></h4>
-
-<div>
-
-<p>
-<a href="#variablewidth">Variable Width Integer</a> encoding is an efficient way to
-encode arbitrary sized unsigned values, but is an extremely inefficient for
-encoding signed values, as signed values are otherwise treated as maximally large
-unsigned values.
-</p>
-
-<p>
-As such, signed VBR values of a specific width are emitted as follows:
-</p>
-
-<ul>
-<li>Positive values are emitted as VBRs of the specified width, but with their
-    value shifted left by one.</li>
-<li>Negative values are emitted as VBRs of the specified width, but the negated
-    value is shifted left by one, and the low bit is set.</li>
-</ul>
-
-<p>
-With this encoding, small positive and small negative values can both
-be emitted efficiently. Signed VBR encoding is used in
-<tt>CST_CODE_INTEGER</tt> and <tt>CST_CODE_WIDE_INTEGER</tt> records
-within <tt>CONSTANTS_BLOCK</tt> blocks.
-</p>
-
-</div>
-
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="ir_blocks">LLVM IR Blocks</a></h4>
-
-<div>
-
-<p>
-LLVM IR is defined with the following blocks:
-</p>
-
-<ul>
-<li>8  &mdash; <a href="#MODULE_BLOCK"><tt>MODULE_BLOCK</tt></a> &mdash; This is the top-level block that
-    contains the entire module, and describes a variety of per-module
-    information.</li>
-<li>9  &mdash; <a href="#PARAMATTR_BLOCK"><tt>PARAMATTR_BLOCK</tt></a> &mdash; This enumerates the parameter
-    attributes.</li>
-<li>10 &mdash; <a href="#TYPE_BLOCK"><tt>TYPE_BLOCK</tt></a> &mdash; This describes all of the types in
-    the module.</li>
-<li>11 &mdash; <a href="#CONSTANTS_BLOCK"><tt>CONSTANTS_BLOCK</tt></a> &mdash; This describes constants for a
-    module or function.</li>
-<li>12 &mdash; <a href="#FUNCTION_BLOCK"><tt>FUNCTION_BLOCK</tt></a> &mdash; This describes a function
-    body.</li>
-<li>13 &mdash; <a href="#TYPE_SYMTAB_BLOCK"><tt>TYPE_SYMTAB_BLOCK</tt></a> &mdash; This describes the type symbol
-    table.</li>
-<li>14 &mdash; <a href="#VALUE_SYMTAB_BLOCK"><tt>VALUE_SYMTAB_BLOCK</tt></a> &mdash; This describes a value symbol
-    table.</li>
-<li>15 &mdash; <a href="#METADATA_BLOCK"><tt>METADATA_BLOCK</tt></a> &mdash; This describes metadata items.</li>
-<li>16 &mdash; <a href="#METADATA_ATTACHMENT"><tt>METADATA_ATTACHMENT</tt></a> &mdash; This contains records associating metadata with function instruction values.</li>
-</ul>
-
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<h3>
-  <a name="MODULE_BLOCK">MODULE_BLOCK Contents</a>
-</h3>
-
-<div>
-
-<p>The <tt>MODULE_BLOCK</tt> block (id 8) is the top-level block for LLVM
-bitcode files, and each bitcode file must contain exactly one. In
-addition to records (described below) containing information
-about the module, a <tt>MODULE_BLOCK</tt> block may contain the
-following sub-blocks:
-</p>
-
-<ul>
-<li><a href="#BLOCKINFO"><tt>BLOCKINFO</tt></a></li>
-<li><a href="#PARAMATTR_BLOCK"><tt>PARAMATTR_BLOCK</tt></a></li>
-<li><a href="#TYPE_BLOCK"><tt>TYPE_BLOCK</tt></a></li>
-<li><a href="#TYPE_SYMTAB_BLOCK"><tt>TYPE_SYMTAB_BLOCK</tt></a></li>
-<li><a href="#VALUE_SYMTAB_BLOCK"><tt>VALUE_SYMTAB_BLOCK</tt></a></li>
-<li><a href="#CONSTANTS_BLOCK"><tt>CONSTANTS_BLOCK</tt></a></li>
-<li><a href="#FUNCTION_BLOCK"><tt>FUNCTION_BLOCK</tt></a></li>
-<li><a href="#METADATA_BLOCK"><tt>METADATA_BLOCK</tt></a></li>
-</ul>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="MODULE_CODE_VERSION">MODULE_CODE_VERSION Record</a></h4>
-
-<div>
-
-<p><tt>[VERSION, version#]</tt></p>
-
-<p>The <tt>VERSION</tt> record (code 1) contains a single value
-indicating the format version. Only version 0 is supported at this
-time.</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="MODULE_CODE_TRIPLE">MODULE_CODE_TRIPLE Record</a></h4>
-
-<div>
-<p><tt>[TRIPLE, ...string...]</tt></p>
-
-<p>The <tt>TRIPLE</tt> record (code 2) contains a variable number of
-values representing the bytes of the <tt>target triple</tt>
-specification string.</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="MODULE_CODE_DATALAYOUT">MODULE_CODE_DATALAYOUT Record</a></h4>
-
-<div>
-<p><tt>[DATALAYOUT, ...string...]</tt></p>
-
-<p>The <tt>DATALAYOUT</tt> record (code 3) contains a variable number of
-values representing the bytes of the <tt>target datalayout</tt>
-specification string.</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="MODULE_CODE_ASM">MODULE_CODE_ASM Record</a></h4>
-
-<div>
-<p><tt>[ASM, ...string...]</tt></p>
-
-<p>The <tt>ASM</tt> record (code 4) contains a variable number of
-values representing the bytes of <tt>module asm</tt> strings, with
-individual assembly blocks separated by newline (ASCII 10) characters.</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="MODULE_CODE_SECTIONNAME">MODULE_CODE_SECTIONNAME Record</a></h4>
-
-<div>
-<p><tt>[SECTIONNAME, ...string...]</tt></p>
-
-<p>The <tt>SECTIONNAME</tt> record (code 5) contains a variable number
-of values representing the bytes of a single section name
-string. There should be one <tt>SECTIONNAME</tt> record for each
-section name referenced (e.g., in global variable or function
-<tt>section</tt> attributes) within the module. These records can be
-referenced by the 1-based index in the <i>section</i> fields of
-<tt>GLOBALVAR</tt> or <tt>FUNCTION</tt> records.</p>
-</div>
-
-<!-- _______________________________________________________________________ -->
-<h4><a name="MODULE_CODE_DEPLIB">MODULE_CODE_DEPLIB Record</a></h4>
-
-<div>
-<p><tt>[DEPLIB, ...string...]</tt></p>
-
-<p>The <tt>DEPLIB</tt> record (code 6) contains a variable number of
-values representing the bytes of a single dependent library name
author	Bill Wendling <isanbard@gmail.com>	2012-06-28 08:43:12 +0000
committer	Bill Wendling <isanbard@gmail.com>	2012-06-28 08:43:12 +0000
commit	0ca9927a71fed311eea7459b4c85c98cc7ed0352 (patch)
tree	4e950553c2a69c6db06092a5dc69138c0e2e6378 /docs
parent	87dc7a4c8d21bd638465881f3b6d091f22d9767c (diff)