Diffstat (limited to 'docs/SourceLevelDebugging.html')
-rw-r--r-- docs/SourceLevelDebugging.html 1117
1 files changed, 1117 insertions, 0 deletions
diff --git a/docs/SourceLevelDebugging.html b/docs/SourceLevelDebugging.html
new file mode 100644
index 0000000000..71c74a1938
--- /dev/null
+++ b/docs/SourceLevelDebugging.html
@@ -0,0 +1,1117 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
+ "http://www.w3.org/TR/html4/strict.dtd">
+<html>
+<head>
+ <title>Source Level Debugging with LLVM</title>
+ <link rel="stylesheet" href="llvm.css" type="text/css">
+</head>
+<body>
+
+<div class="doc_title">Source Level Debugging with LLVM</div>
+
+<table class="layout" style="width:100%">
+ <tr class="layout">
+ <td class="left">
+<ul>
+ <li><a href="#introduction">Introduction</a>
+ <ol>
+ <li><a href="#phil">Philosophy behind LLVM debugging information</a></li>
+ <li><a href="#debugopt">Debugging optimized code</a></li>
+ <li><a href="#future">Future work</a></li>
+ </ol></li>
+ <li><a href="#llvm-db">Using the <tt>llvm-db</tt> tool</a>
+ <ol>
+ <li><a href="#limitations">Limitations of <tt>llvm-db</tt></a></li>
+ <li><a href="#sample">A sample <tt>llvm-db</tt> session</a></li>
+ <li><a href="#startup">Starting the debugger</a></li>
+ <li><a href="#commands">Commands recognized by the debugger</a></li>
+ </ol></li>
+
+ <li><a href="#architecture">Architecture of the LLVM debugger</a>
+ <ol>
+ <li><a href="#arch_debugger">The Debugger and InferiorProcess classes</a></li>
+ <li><a href="#arch_info">The RuntimeInfo, ProgramInfo, and SourceLanguage classes</a></li>
+ <li><a href="#arch_llvm-db">The <tt>llvm-db</tt> tool</a></li>
+ <li><a href="#arch_todo">Short-term TODO list</a></li>
+ </ol></li>
+
+ <li><a href="#format">Debugging information format</a>
+ <ol>
+ <li><a href="#format_common_anchors">Anchors for global objects</a></li>
+ <li><a href="#format_common_stoppoint">Representing stopping points in the source program</a></li>
+ <li><a href="#format_common_lifetime">Object lifetimes and scoping</a></li>
+ <li><a href="#format_common_descriptors">Object descriptor formats</a>
+ <ul>
+ <li><a href="#format_common_source_files">Representation of source files</a></li>
+ <li><a href="#format_common_program_objects">Representation of program objects</a></li>
+ <li><a href="#format_common_object_contexts">Program object contexts</a></li>
+ </ul></li>
+ <li><a href="#format_common_intrinsics">Debugger intrinsic functions</a></li>
+ <li><a href="#format_common_tags">Values for debugger tags</a></li>
+ </ol></li>
+ <li><a href="#ccxx_frontend">C/C++ front-end specific debug information</a>
+ <ol>
+ <li><a href="#ccxx_pse">Program Scope Entries</a>
+ <ul>
+ <li><a href="#ccxx_compilation_units">Compilation unit entries</a></li>
+ <li><a href="#ccxx_modules">Module, namespace, and importing entries</a></li>
+ </ul></li>
+ <li><a href="#ccxx_dataobjects">Data objects (program variables)</a></li>
+ </ol></li>
+</ul>
+</td>
+<td class="right">
+<img src="img/venusflytrap.jpg" alt="A leafy and green bug eater" width="247"
+height="369">
+</td>
+</tr></table>
+
+<div class="doc_author">
+ <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
+</div>
+
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="introduction">Introduction</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>This document is the central repository for all information pertaining to
+debug information in LLVM. It describes the <a href="#llvm-db">user
+interface</a> for the <tt>llvm-db</tt> tool, which provides a
+powerful <a href="#llvm-db">source-level debugger</a>
+to users of LLVM-based compilers. It then describes the <a
+href="#architecture">various components</a> that make up the debugger and the
+libraries which future clients may use. Finally, it describes the <a
+href="#format">actual format that the LLVM debug information</a> takes,
+which is useful for those interested in creating front-ends or dealing directly
+with the information.</p>
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="phil">Philosophy behind LLVM debugging information</a>
+</div>
+
+<div class="doc_text">
+
+<p>The idea of the LLVM debugging information is to capture how the important
+pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
+Several design aspects have shaped the solution that appears here. The
+important ones are:</p>
+
+<ul>
+<li>Debugging information should have very little impact on the rest of the
+compiler. No transformations, analyses, or code generators should need to be
+modified because of debugging information.</li>
+
+<li>LLVM optimizations should interact in <a href="#debugopt">well-defined and
+easily described ways</a> with the debugging information.</li>
+
+<li>Because LLVM is designed to support arbitrary programming languages,
+LLVM-to-LLVM tools should not need to know anything about the semantics of the
+source-level-language.</li>
+
+<li>Source-level languages are often <b>widely</b> different from one another.
+LLVM should not put any restrictions on the flavor of the source-language, and
+the debugging information should work with any language.</li>
+
+<li>With code generator support, it should be possible to use an LLVM compiler
+to compile a program to native machine code and standard debugging formats.
+This allows compatibility with traditional machine-code level debuggers, like
+GDB or DBX.</li>
+
+</ul>
+
+<p>The approach used by the LLVM implementation is to use a small set of <a
+href="#format_common_intrinsics">intrinsic functions</a> to define a mapping
+between LLVM program objects and the source-level objects. The description of
+the source-level program is maintained in LLVM global variables in an <a
+href="#ccxx_frontend">implementation-defined format</a> (the C/C++ front-end
+currently uses working draft 7 of the <a
+href="http://www.eagercon.com/dwarf/dwarf3std.htm">Dwarf 3 standard</a>).</p>
+
+<p>When a program is debugged, the debugger interacts with the user and turns
+the stored debug information into source-language specific information. As
+such, the debugger must be aware of the source-language, and is thus tied to a
+specific language or family of languages. The <a href="#llvm-db">LLVM
+debugger</a> is designed to be modular in its support for source-languages.</p>
+
+</div>
+
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="debugopt">Debugging optimized code</a>
+</div>
+
+<div class="doc_text">
+
+<p>An extremely high priority of LLVM debugging information is to make it
+interact well with optimizations and analysis. In particular, the LLVM debug
+information provides the following guarantees:</p>
+
+<ul>
+
+<li>LLVM debug information <b>always provides information to accurately read the
+source-level state of the program</b>, regardless of which LLVM optimizations
+have been run, and without any modification to the optimizations themselves.
+However, some optimizations may impact the ability to modify the current state
+of the program with a debugger, such as setting program variables, or calling
+functions that have been deleted.</li>
+
+<li>LLVM optimizations gracefully interact with debugging information. If an
+optimization is not aware of debug information, it is automatically disabled
+in the cases where it would invalidate the debug info. This preserves the
+properties of LLVM that make new transformations easy to write.</li>
+
+<li>As desired, LLVM optimizations can be upgraded to be aware of the LLVM
+debugging information, allowing them to update the debugging information as they
+perform aggressive optimizations. This means that, with effort, the LLVM
+optimizers could optimize debug code just as well as non-debug code.</li>
+
+<li>LLVM debug information does not prevent many important optimizations from
+happening (for example inlining, basic block reordering/merging/cleanup, tail
+duplication, etc), further reducing the amount of the compiler that eventually
+is "aware" of debugging information.</li>
+
+<li>LLVM debug information is automatically optimized along with the rest of the
+program, using existing facilities. For example, duplicate information is
+automatically merged by the linker, and unused information is automatically
+removed.</li>
+
+</ul>
+
+<p>Basically, the debug information allows you to compile a program with
+"<tt>-O0 -g</tt>" and get full debug information, allowing you to arbitrarily
+modify the program as it executes from the debugger. Compiling a program with
+"<tt>-O3 -g</tt>" gives you full debug information that is always available and
+accurate for reading (e.g., you get accurate stack traces despite tail call
+elimination and inlining), but you might lose the ability to modify the program
+and call functions that were optimized out of the program, or inlined away
+completely.</p>
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="future">Future work</a>
+</div>
+
+<div class="doc_text">
+<p>There are several important extensions that could be eventually added to the
+LLVM debugger. The most important extension would be to upgrade the LLVM code
+generators to support debugging information. This would also allow, for
+example, the X86 code generator to emit native objects that contain debugging
+information consumable by traditional source-level debuggers like GDB or
+DBX.</p>
+
+<p>Additionally, LLVM optimizations can be upgraded to incrementally update the
+debugging information, <a href="#commands">new commands</a> can be added to the
+debugger, and thread support could be added to the debugger.</p>
+
+<p>The "SourceLanguage" modules provided by <tt>llvm-db</tt> could be
+substantially improved to provide good support for C++ language features like
+namespaces and scoping rules.</p>
+
+<p>After working with the debugger for a while, perhaps the nicest improvement
+would be to add some sort of line editor, such as GNU readline (but one that is
+compatible with the LLVM license).</p>
+
+<p>For someone so inclined, it should be straight-forward to write different
+front-ends for the LLVM debugger, as the LLVM debugging engine is cleanly
+separated from the <tt>llvm-db</tt> front-end. A new LLVM GUI debugger or IDE
+would be nice.</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section">
+ <a name="llvm-db">Using the <tt>llvm-db</tt> tool</a>
+</div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>The <tt>llvm-db</tt> tool provides a GDB-like interface for source-level
+debugging of programs. This tool provides many standard commands for inspecting
+and modifying the program as it executes, loading new programs, single stepping,
+placing breakpoints, etc. This section describes how to use the debugger.</p>
+
+<p><tt>llvm-db</tt> has been designed to be as similar to GDB in its user
+interface as possible. This should make it extremely easy to learn
+<tt>llvm-db</tt> if you already know <tt>GDB</tt>. In general, <tt>llvm-db</tt>
+provides the subset of GDB commands that are applicable to LLVM debugging users.
+If there is a command missing that makes a reasonable amount of sense within the
+<a href="#limitations">limitations of <tt>llvm-db</tt></a>, please report it as
+a bug or, better yet, submit a patch to add it.</p>
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="limitations">Limitations of <tt>llvm-db</tt></a>
+</div>
+
+<div class="doc_text">
+
+<p><tt>llvm-db</tt> is designed to be modular and easy to extend. This
+extensibility was key to getting the debugger up-and-running quickly, because we
+can start with simple-but-unsophisticated implementations of various components.
+Because of this, it is currently missing many features, though they should be
+easy to add over time (patches welcomed!). The biggest inherent limitations of
+<tt>llvm-db</tt> are currently due to the extremely simple <a
+href="#arch_debugger">debugger backend</a> (implemented in
+"lib/Debugger/UnixLocalInferiorProcess.cpp") which is designed to work without
+any cooperation from the code generators. Because it is so simple, it suffers
+from the following inherent limitations:</p>
+
+<ul>
+
+<li>Running a program in <tt>llvm-db</tt> is a bit slower than running it with
+<tt>lli</tt> (i.e., in the JIT).</li>
+
+<li>Inspection of the target hardware is not supported. This means that you
+cannot, for example, print the contents of X86 registers.</li>
+
+<li>Inspection of LLVM code is not supported. This means that you cannot print
+the contents of arbitrary LLVM values, or use commands such as <tt>stepi</tt>.
+This also means that you cannot debug code without debug information.</li>
+
+<li>Portions of the debugger run in the same address space as the program being
+debugged. This means that memory corruption by the program could trample on
+portions of the debugger.</li>
+
+<li>Attaching to existing processes and core files is not currently
+supported.</li>
+
+</ul>
+
+<p>That said, the debugger is still quite useful, and all of these limitations
+can be eliminated by integrating support for the debugger into the code
+generators, and writing a new <a href="#arch_debugger">InferiorProcess</a>
+subclass to use it. See the <a href="#future">future work</a> section for ideas
+of how to extend the LLVM debugger despite these limitations.</p>
+
+</div>
+
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="sample">A sample <tt>llvm-db</tt> session</a>
+</div>
+
+<div class="doc_text">
+
+<p>TODO: this is obviously lame, when more is implemented, this can be much
+better.</p>
+
+<pre>
+$ <b>llvm-db funccall</b>
+llvm-db: The LLVM source-level debugger
+Loading program... successfully loaded 'funccall.bc'!
+(llvm-db) <b>create</b>
+Starting program: funccall.bc
+main at funccall.c:9:2
+9 -> q = 0;
+(llvm-db) <b>list main</b>
+4 void foo() {
+5 int t = q;
+6 q = t + 1;
+7 }
+8 int main() {
+9 -> q = 0;
+10 foo();
+11 q = q - 1;
+12
+13 return q;
+(llvm-db) <b>list</b>
+14 }
+(llvm-db) <b>step</b>
+10 -> foo();
+(llvm-db) <b>s</b>
+foo at funccall.c:5:2
+5 -> int t = q;
+(llvm-db) <b>bt</b>
+#0 -> 0x85ffba0 in foo at funccall.c:5:2
+#1 0x85ffd98 in main at funccall.c:10:2
+(llvm-db) <b>finish</b>
+main at funccall.c:11:2
+11 -> q = q - 1;
+(llvm-db) <b>s</b>
+13 -> return q;
+(llvm-db) <b>s</b>
+The program stopped with exit code 0
+(llvm-db) <b>quit</b>
+$
+</pre>
+
+</div>
+
+
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="startup">Starting the debugger</a>
+</div>
+
+<div class="doc_text">
+
+<p>There are three ways to start up the <tt>llvm-db</tt> debugger:</p>
+
+<p>When run with no options, just <tt>llvm-db</tt>, the debugger starts up
+without a program loaded at all. You must use the <a
+href="#c_file"><tt>file</tt> command</a> to load a program, and the <a
+href="#c_set_args"><tt>set args</tt></a> or <a href="#c_run"><tt>run</tt></a>
+commands to specify the arguments for the program.</p>
+
+<p>If you start the debugger with one argument, as <tt>llvm-db
+&lt;program&gt;</tt>, the debugger will start up and load in the specified
+program. You can then optionally specify arguments to the program with the <a
+href="#c_set_args"><tt>set args</tt></a> or <a href="#c_run"><tt>run</tt></a>
+commands.</p>
+
+<p>The third way to start the program is with the <tt>--args</tt> option. This
+option allows you to specify the program to load and the arguments to start out
+with. <!-- No options to <tt>llvm-db</tt> may be specified after the
+<tt>-args</tt> option. --> Example use: <tt>llvm-db --args ls /home</tt></p>
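+
+<p>The three start-up modes can be summarized with a small sketch. This is not
+<tt>llvm-db</tt>'s actual option parser, which this document does not show;
+<tt>StartupConfig</tt> and <tt>parseStartup</tt> are hypothetical names used
+only to illustrate how the command line selects a mode.</p>
+
```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type; llvm-db's real option handling is not shown in
// this document.
struct StartupConfig {
  bool hasProgram = false;        // false: the user must issue "file" later
  std::string program;            // program to load, if any
  std::vector<std::string> args;  // initial arguments for the program
};

// Classify a command line (with argv[0] already removed) into one of the
// three start-up modes described above.
StartupConfig parseStartup(const std::vector<std::string> &argv) {
  StartupConfig cfg;
  if (argv.empty())
    return cfg;                   // mode 1: bare "llvm-db"

  const bool sawArgs = (argv[0] == "--args");
  const std::size_t first = sawArgs ? 1 : 0;
  if (first >= argv.size())
    return cfg;                   // "--args" with nothing after it

  cfg.hasProgram = true;
  cfg.program = argv[first];
  if (sawArgs)                    // mode 3: the rest are program arguments
    cfg.args.assign(argv.begin() + first + 1, argv.end());
  return cfg;                     // mode 2 leaves the argument list empty
}
```
+
+<p>In this sketch, anything following the program name in the one-argument
+form is ignored; arguments would then be supplied later with <tt>set
+args</tt> or <tt>run</tt>.</p>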
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="commands">Commands recognized by the debugger</a>
+</div>
+
+<div class="doc_text">
+
+<p>FIXME: this needs work obviously. See the <a
+href="http://sources.redhat.com/gdb/documentation/">GDB documentation</a> for
+information about what these do, or try '<tt>help [command]</tt>' within
+<tt>llvm-db</tt> to get information.</p>
+
+<h2>General usage:</h2>
+<ul>
+<li>help [command]</li>
+<li>quit</li>
+<li><a name="c_file">file</a> [program]</li>
+</ul>
+
+<h2>Program inspection and interaction:</h2>
+<ul>
+<li>create (start the program, stopping it ASAP in <tt>main</tt>)</li>
+<li>kill</li>
+<li>run [args]</li>
+<li>step [num]</li>
+<li>next [num]</li>
+<li>cont</li>
+<li>finish</li>
+
+<li>list [start[, end]]</li>
+<li>info source</li>
+<li>info sources</li>
+<li>info functions</li>
+</ul>
+
+<h2>Call stack inspection:</h2>
+<ul>
+<li>backtrace</li>
+<li>up [n]</li>
+<li>down [n]</li>
+<li>frame [n]</li>
+</ul>
+
+
+<h2>Debugger inspection and interaction:</h2>
+<ul>
+<li>info target</li>
+<li>show prompt</li>
+<li>set prompt</li>
+<li>show listsize</li>
+<li>set listsize</li>
+<li>show language</li>
+<li>set language</li>
+<li>show args</li>
+<li>set args [args]</li>
+</ul>
+
+<h2>TODO:</h2>
+<ul>
+<li>info frame</li>
+<li>break</li>
+<li>print</li>
+<li>ptype</li>
+
+<li>info types</li>
+<li>info variables</li>
+<li>info program</li>
+
+<li>info args</li>
+<li>info locals</li>
+<li>info catch</li>
+<li>... many others</li>
+</ul>
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section">
+ <a name="architecture">Architecture of the LLVM debugger</a>
+</div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+<p>The LLVM debugger is built out of three distinct layers of software. These
+layers provide clients with different interface options depending on which pieces
+they want to implement themselves, and the layering also promotes code modularity and
+good design. The three layers are the <a href="#arch_debugger">Debugger
+interface</a>, the <a href="#arch_info">"info" interfaces</a>, and the <a
+href="#arch_llvm-db"><tt>llvm-db</tt> tool</a> itself.</p>
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="arch_debugger">The Debugger and InferiorProcess classes</a>
+</div>
+
+<div class="doc_text">
+<p>The Debugger class (defined in the <tt>include/llvm/Debugger/</tt> directory)
+is a low-level class which is used to maintain information about the loaded
+program, as well as start and stop the program running as necessary. This class
+does not provide any high-level analysis or control over the program, only
+exposing simple interfaces like <tt>load/unloadProgram</tt>,
+<tt>create/killProgram</tt>, <tt>step/next/finish/contProgram</tt>, and
+low-level methods for installing breakpoints.</p>
+
+<p>
+The Debugger class is itself a wrapper around the lowest-level InferiorProcess
+class. This class is used to represent an instance of the program running under
+debugger control. The InferiorProcess class can be implemented in different
+ways for different targets and execution scenarios (e.g., remote debugging).
+The InferiorProcess class exposes a small and simple collection of interfaces
+which are useful for inspecting the current state of the program (such as
+collecting stack trace information, reading the memory image of the process,
+etc). The interfaces in this class are designed to be as low-level and simple
+as possible, to make it easy to create new instances of the class.
+</p>
+
+<p>
+The Debugger class exposes the currently active instance of InferiorProcess
+through the <tt>Debugger::getRunningProcess</tt> method, which returns a
+<tt>const</tt> reference to the class. This means that clients of the Debugger
+class can only <b>inspect</b> the running instance of the program directly. To
+change the executing process in some way, they must use the interfaces exposed by
+the Debugger class.
+</p>
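+
+<p>The read-only access pattern described above can be sketched as follows.
+The class names <tt>ToyDebugger</tt> and <tt>ToyInferiorProcess</tt> and all
+of their members except <tt>getRunningProcess</tt> are hypothetical; the real
+classes have interfaces that this document does not list. The point is only
+that a <tt>const</tt> reference exposes inspection methods while mutation
+goes through the owning class.</p>
+
```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the process under debugger control.
class ToyInferiorProcess {
public:
  // Inspection interface: const, so usable through a const reference.
  const std::vector<std::string> &stackTrace() const { return frames; }

  // Mutation interface: non-const, so unreachable through a const reference.
  void pushFrame(const std::string &fn) { frames.push_back(fn); }

private:
  std::vector<std::string> frames;
};

// Hypothetical stand-in for the wrapper that owns the process.
class ToyDebugger {
public:
  void createProgram() {
    process = ToyInferiorProcess();
    process.pushFrame("main");
    running = true;
  }
  void killProgram() { running = false; }

  // Pretend the program stepped into a call to `callee`.
  void stepInto(const std::string &callee) {
    requireRunning();
    process.pushFrame(callee);
  }

  // Clients get read-only access to the running process.
  const ToyInferiorProcess &getRunningProcess() const {
    requireRunning();
    return process;
  }

private:
  void requireRunning() const {
    if (!running)
      throw std::logic_error("no program is running");
  }
  ToyInferiorProcess process;
  bool running = false;
};
```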
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="arch_info">The RuntimeInfo, ProgramInfo, and SourceLanguage classes</a>
+</div>
+
+<div class="doc_text">
+<p>
+The next-highest level of debugger abstraction is provided through the
+ProgramInfo, RuntimeInfo, SourceLanguage and related classes (also defined in
+the <tt>include/llvm/Debugger/</tt> directory). These classes efficiently
+decode the debugging information and low-level interfaces exposed by
+InferiorProcess into a higher-level representation, suitable for analysis by the
+debugger.
+</p>
+
+<p>
+The ProgramInfo class exposes a variety of different kinds of information about
+the program objects in the source-level-language. The SourceFileInfo class
+represents a source-file in the program (e.g. a .cpp or .h file). The
+SourceFileInfo class captures information such as which SourceLanguage was used
+to compile the file, where the debugger can get access to the actual file text
+(which is lazily loaded on demand), etc. The SourceFunctionInfo class
+represents a... <b>FIXME: finish</b>. The ProgramInfo class provides interfaces
+to lazily find and decode the information needed to create the Source*Info
+classes requested by the debugger.
+</p>
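+
+<p>Lazy loading of source text, as mentioned above, might look roughly like
+the following. <tt>LazySourceFile</tt> and its loader callback are
+hypothetical and are not the real SourceFileInfo interface; the callback
+stands in for whatever actually reads the file.</p>
+
```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical sketch of on-demand loading; not the real SourceFileInfo.
class LazySourceFile {
public:
  using Loader = std::function<std::string(const std::string &path)>;

  LazySourceFile(std::string path, Loader loader)
      : path(std::move(path)), loader(std::move(loader)) {}

  // The file text is fetched on first use and cached afterwards.
  const std::string &text() {
    if (!loaded) {
      contents = loader(path);
      loaded = true;
    }
    return contents;
  }

  bool isLoaded() const { return loaded; }

private:
  std::string path;
  Loader loader;
  std::string contents;
  bool loaded = false;
};
```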
+
+<p>
+The RuntimeInfo class exposes information about the currently executing program,
+by decoding information from the InferiorProcess and ProgramInfo classes. It
+provides a StackFrame class which provides an easy-to-use interface for
+inspecting the current and suspended stack frames in the program.
+</p>
+
+<p>
+The SourceLanguage class is an abstract interface used by the debugger to
+perform all source-language-specific tasks. For example, this interface is used
+by the ProgramInfo class to decode language-specific types and functions and by
+the debugger front-end (such as <a href="#arch_llvm-db"><tt>llvm-db</tt></a>) to
+evaluate source-language expressions typed into the debugger. This class uses
+the RuntimeInfo &amp; ProgramInfo classes to get information about the current
+execution context and the loaded program, respectively.
+</p>
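+
+<p>As an illustration of what "an abstract interface for
+source-language-specific tasks" means, here is a deliberately tiny sketch.
+The classes and methods below are hypothetical and much simpler than the real
+SourceLanguage class; they only show language-independent code calling
+through a virtual interface.</p>
+
```cpp
#include <string>

// Hypothetical interface; the real SourceLanguage class has other methods.
class ToySourceLanguage {
public:
  virtual ~ToySourceLanguage() = default;
  virtual std::string name() const = 0;
  // Render a parameterless procedure the way this language would write it.
  virtual std::string describeFunction(const std::string &fn) const = 0;
};

class ToyCLanguage : public ToySourceLanguage {
public:
  std::string name() const override { return "C"; }
  std::string describeFunction(const std::string &fn) const override {
    return "void " + fn + "()";
  }
};

class ToyPascalLanguage : public ToySourceLanguage {
public:
  std::string name() const override { return "Pascal"; }
  std::string describeFunction(const std::string &fn) const override {
    return "procedure " + fn + ";";
  }
};

// Language-independent code only sees the abstract interface.
std::string describe(const ToySourceLanguage &lang, const std::string &fn) {
  return lang.name() + ": " + lang.describeFunction(fn);
}
```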
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="arch_llvm-db">The <tt>llvm-db</tt> tool</a>
+</div>
+
+<div class="doc_text">
+<p>
+The <tt>llvm-db</tt> tool is designed to be a debugger providing an interface as <a
+href="#llvm-db">similar to GDB</a> as reasonable, but no more so than that.
+Because the <a href="#arch_debugger">Debugger</a> and <a
+href="#arch_info">info</a> classes implement all of the heavy lifting and
+analysis, <tt>llvm-db</tt> (which lives in <tt>llvm/tools/llvm-db</tt>) consists
+mainly of code to interact with the user and parse commands. The CLIDebugger
+constructor registers all of the builtin commands for the debugger, and each
+command is implemented as a CLIDebugger::[name]Command method.
+</p>
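+
+<p>The registration pattern described above (a constructor that registers
+commands, each implemented as a <tt>[name]Command</tt> method) can be sketched
+with a table of member-function pointers. <tt>MiniCLI</tt> is a hypothetical
+miniature, not the real CLIDebugger, and it models only two commands.</p>
+
```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical miniature of the pattern; the real CLIDebugger class has a
// different and much larger interface.
class MiniCLI {
public:
  using Handler = void (MiniCLI::*)(const std::string &args);

  MiniCLI() {
    // Register the built-in commands in the constructor.
    addCommand("quit", &MiniCLI::quitCommand);
    addCommand("set", &MiniCLI::setCommand);
  }

  // Split "name rest-of-line" and dispatch; returns false if unknown.
  bool execute(const std::string &line) {
    std::istringstream in(line);
    std::string name, rest;
    in >> name;
    std::getline(in >> std::ws, rest);
    auto it = commands.find(name);
    if (it == commands.end())
      return false;
    (this->*(it->second))(rest);
    return true;
  }

  bool wantsQuit() const { return quitting; }
  const std::string &prompt() const { return promptText; }

private:
  void addCommand(const std::string &name, Handler h) { commands[name] = h; }

  void quitCommand(const std::string &) { quitting = true; }

  // Only "set prompt TEXT" is modeled here.
  void setCommand(const std::string &args) {
    const std::string key = "prompt ";
    if (args.compare(0, key.size(), key) == 0)
      promptText = args.substr(key.size());
  }

  std::map<std::string, Handler> commands;
  bool quitting = false;
  std::string promptText = "(llvm-db) ";
};
```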
+</div>
+
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="arch_todo">Short-term TODO list</a>
+</div>
+
+<div class="doc_text">
+
+<p>
+FIXME: this section will eventually go away. These are notes to myself of
+things that should be implemented, but haven't yet.
+</p>
+
+<p>
+<b>Breakpoints:</b> Support is already implemented in the 'InferiorProcess'
+class, though it hasn't been tested yet. To finish breakpoint support, we need
+to implement breakCommand (which should reuse the linespec parser from the list
+command), and handle the fact that 'break foo' or 'break file.c:53' may insert
+multiple breakpoints. Also, if you say 'break file.c:53' and there is no
+stoppoint on line 53, the breakpoint should go on the next available line. My
+idea was to have the Debugger class provide a "Breakpoint" class which
+encapsulated this messiness, giving the debugger front-end a simple interface.
+The debugger front-end would have to map the really complex semantics of
+temporary breakpoints and 'conditional' breakpoints onto this intermediate
+level. Also, breakpoints should survive as much as possible across program
+reloads.
+</p>
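+
+<p>The "next available line" rule for breakpoints could be implemented with an
+ordered set of the lines that carry stop points. The function below is a
+hypothetical helper, not part of the existing Debugger class; it returns 0
+when no stop point exists at or after the requested line, relying on source
+lines being numbered from 1.</p>
+
```cpp
#include <set>

// Given the lines of one file that carry stop points, pick the line where a
// breakpoint requested at `line` should land: the line itself if it has a
// stop point, otherwise the next line that does. Returns 0 if there is none.
unsigned resolveBreakpointLine(const std::set<unsigned> &stopLines,
                               unsigned line) {
  std::set<unsigned>::const_iterator it = stopLines.lower_bound(line);
  if (it == stopLines.end())
    return 0;  // no stop point at or after the requested line
  return *it;
}
```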
+
+<p>
+<b>UnixLocalInferiorProcess.cpp speedup</b>: There is no reason for the debugged
+process to code gen the globals corresponding to debug information. The
+IntrinsicLowering object could instead change descriptors into constant expr
+casts of the constant address of the LLVM objects for the descriptors. This
+would also allow us to eliminate the mapping back and forth between physical
+addresses that must be done.</p>
+
+<p>
+<b>Process deaths</b>: The InferiorProcessDead exception should be extended to
+know "how" a process died, e.g., that it was killed by a signal. This is easy to
+collect in the UnixLocalInferiorProcess, we just need to represent it.</p>
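+
+<p>One possible shape for such an extended exception is sketched below.
+<tt>ToyProcessDead</tt> and its members are hypothetical; the existing
+InferiorProcessDead class is not shown in this document, and this is only one
+way to record the cause of death.</p>
+
```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception that records how the inferior process died.
class ToyProcessDead : public std::runtime_error {
public:
  enum class Cause { Exited, Signaled };

  static ToyProcessDead exited(int code) {
    return ToyProcessDead(Cause::Exited, code,
                          "exited with code " + std::to_string(code));
  }
  static ToyProcessDead signaled(int sig) {
    return ToyProcessDead(Cause::Signaled, sig,
                          "killed by signal " + std::to_string(sig));
  }

  Cause cause() const { return why; }
  int value() const { return val; }  // exit code or signal number

private:
  ToyProcessDead(Cause c, int v, const std::string &msg)
      : std::runtime_error(msg), why(c), val(v) {}
  Cause why;
  int val;
};
```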
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section">
+ <a name="format">Debugging information format</a>
+</div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>LLVM debugging information has been carefully designed to make it possible
+for the optimizer to optimize the program and debugging information without
+necessarily having to know anything about debugging information. In particular,
+the global constant merging pass automatically eliminates duplicated debugging
+information (often caused by header files), the global dead code elimination
+pass automatically deletes debugging information for a function if it decides to
+delete the function, and the linker eliminates debug information when it merges
+<tt>linkonce</tt> functions.</p>
+
+<p>To do this, most of the debugging information (descriptors for types,
+variables, functions, source files, etc) is inserted by the language front-end
+in the form of LLVM global variables. These LLVM global variables are no
+different from any other global variables, except that they have a web of LLVM
+intrinsic functions that point to them. If the last references to a particular
+piece of debugging information are deleted (for example, by the
+<tt>-globaldce</tt> pass), the extraneous debug information will automatically
+become dead and be removed by the optimizer.</p>
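+
+<p>The effect can be modeled with a plain reachability computation. This is a
+toy model, not the <tt>-globaldce</tt> implementation: <tt>ToyModule</tt>
+maps each global to the globals it refers to (for a function, the descriptors
+passed to the intrinsic calls in its body), and all names are
+hypothetical.</p>
+
```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy module: each named global lists the globals it refers to.
using ToyModule = std::map<std::string, std::vector<std::string>>;

// Return the globals that survive when only `roots` are externally visible.
std::set<std::string> liveGlobals(const ToyModule &m,
                                  const std::vector<std::string> &roots) {
  std::set<std::string> live;
  std::vector<std::string> worklist(roots);
  while (!worklist.empty()) {
    std::string g = worklist.back();
    worklist.pop_back();
    if (!live.insert(g).second)
      continue;  // already visited
    ToyModule::const_iterator it = m.find(g);
    if (it != m.end())
      worklist.insert(worklist.end(), it->second.begin(), it->second.end());
  }
  return live;
}
```
+
+<p>If a function stops being a root, the descriptors that only it referenced
+drop out of the live set, while descriptors shared with surviving functions
+(such as a source file descriptor) remain.</p>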
+
+<p>The debugger is designed to be agnostic about the contents of most of the
+debugging information. It uses a <a href="#arch_info">source-language-specific
+module</a> to decode the information that represents variables, types,
+functions, namespaces, etc: this allows for arbitrary source-language semantics
+and type-systems to be used, as long as there is a module written for the
+debugger to interpret the information.</p>
+
+<p>To provide basic functionality, the LLVM debugger does have to make some
+assumptions about the source-level language being debugged, though it keeps
+these to a minimum. The only common features that the LLVM debugger assumes
+exist are <a href="#format_common_source_files">source files</a>, and <a
+href="#format_program_objects">program objects</a>. These abstract objects are
+used by the debugger to form stack traces, show information about local
+variables, etc.</p>
+
+<p>This section of the documentation first describes the representation aspects
+common to any source-language. The <a href="#ccxx_frontend">next section</a>
+describes the data layout conventions used by the C and C++ front-ends.</p>
+
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="format_common_anchors">Anchors for global objects</a>
+</div>
+
+<div class="doc_text">
+<p>One important aspect of the LLVM debug representation is that it allows the
+LLVM debugger to efficiently index all of the global objects without having to
+scan the program. To do this, all of the global objects use "anchor" globals of
+type "<tt>{}</tt>", with designated names. These anchor objects obviously do
+not contain any content or meaning by themselves, but all of the global objects
+of a particular type (e.g., source file descriptors) contain a pointer to the
+anchor. This pointer allows the debugger to use def-use chains to find all
+global objects of that type.</p>
+
+<p>So far, the following names are recognized as anchors by the LLVM
+debugger:</p>
+
+<pre>
+ %<a href="#format_common_source_files">llvm.dbg.translation_units</a> = linkonce global {} {}
+ %<a href="#format_program_objects">llvm.dbg.globals</a> = linkonce global {} {}
+</pre>
+
+<p>Using anchors in this way (where the source file descriptor points to the
+anchors, as opposed to having a list of source file descriptors) allows for the
+standard dead global elimination and merging passes to automatically remove
+unused debugging information. If the globals were tracked through lists,
+there would always be an object pointing to the descriptors, so they would never be
+deleted.</p>
+
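+<p>The anchor idea can be illustrated with a toy model in which the use list
+is maintained by hand. <tt>ToyAnchor</tt> and <tt>ToyDescriptor</tt> are
+hypothetical types; in LLVM itself the use list is maintained automatically
+for every value, which is what makes the reverse lookup free.</p>
+
```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

struct ToyDescriptor;

// An anchor carries no data; it only remembers who points at it (its uses).
struct ToyAnchor {
  std::vector<const ToyDescriptor *> users;
};

// A descriptor points at its anchor. The anchor never names the descriptor;
// the reverse edge exists only as use-list bookkeeping.
struct ToyDescriptor {
  ToyDescriptor(std::string n, ToyAnchor &a) : name(std::move(n)), anchor(&a) {
    anchor->users.push_back(this);
  }
  ~ToyDescriptor() {
    std::vector<const ToyDescriptor *> &u = anchor->users;
    u.erase(std::remove(u.begin(), u.end(), this), u.end());
  }
  ToyDescriptor(const ToyDescriptor &) = delete;
  ToyDescriptor &operator=(const ToyDescriptor &) = delete;

  std::string name;
  ToyAnchor *anchor;
};

// Enumerate every descriptor of one kind without scanning the whole program.
std::vector<std::string> namesAnchoredAt(const ToyAnchor &a) {
  std::vector<std::string> out;
  for (const ToyDescriptor *d : a.users)
    out.push_back(d->name);
  return out;
}
```
+
+<p>Deleting a descriptor removes it from the anchor's use list, and nothing
+else has to be updated; a separate list of descriptors would instead keep
+every descriptor alive.</p>
+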
+</div>
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="format_common_stoppoint">
+ Representing stopping points in the source program
+ </a>
+</div>
+
+<div class="doc_text">
+
+<p>LLVM debugger "stop points" are a key part of the debugging representation
+that allows LLVM to maintain simple semantics for <a
+href="#debugopt">debugging optimized code</a>. The basic idea is that the
+front-end inserts calls to the <tt>%llvm.dbg.stoppoint</tt> intrinsic function
+at every point in the program where the debugger should be able to inspect the
+program (these correspond to places the debugger stops when you "<tt>step</tt>"
+through it). The front-end can choose to place these as fine-grained as it
+would like (for example, before every subexpression evaluated), but it is
+recommended to only put them after every source statement that includes
+executable code.</p>
+
+<p>Using calls to this intrinsic function to demark legal points for the
+debugger to inspect the program automatically disables any optimizations that
+could potentially confuse debugging information. To non-debug-information-aware
+transformations, these calls simply look like calls to an external function,
+which they must assume can do anything (including reading or writing to any part
+of reachable memory). On the other hand, it does not impact many optimizations,
+such as code motion of non-trapping instructions, nor does it impact
+optimization of subexpressions, code duplication transformations, or basic-block
+reordering transformations.</p>
+
+<p>An important aspect of the calls to the <tt>%llvm.dbg.stoppoint</tt>
+intrinsic is that the function-local debugging information is woven together
+with use-def chains. This makes it easy for the debugger to, for example,
+locate the 'next' stop point. For a concrete example of stop points, see the
+example in <a href="#format_common_lifetime">the next section</a>.</p>
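+
+<p>To make "locate the 'next' stop point" concrete, the following toy model
+treats the debug intrinsic calls of a function as an array in which each call
+records the call whose result it consumes. The types and the function are
+hypothetical, and the sketch assumes straight-line code in which each result
+has at most one user and the chain contains no cycles.</p>
+
```cpp
#include <cstddef>
#include <vector>

// Toy model of the chain: call i takes the result of call `prev` as its
// first argument (-1 for the function-start call, which takes none).
struct ToyDbgCall {
  enum Kind { FuncStart, StopPoint, Other } kind;
  int prev;       // index of the call whose result this call consumes
  unsigned line;  // meaningful for StopPoint only
};

// Follow def-use edges forward from call `from` until a stop point is found.
// Returns its index, or -1 if the chain ends first.
int nextStopPoint(const std::vector<ToyDbgCall> &calls, int from) {
  int cur = from;
  for (;;) {
    int user = -1;
    for (std::size_t i = 0; i < calls.size(); ++i) {
      if (calls[i].prev == cur) {
        user = static_cast<int>(i);
        break;
      }
    }
    if (user < 0)
      return -1;  // nothing consumes this result: end of the chain
    if (calls[static_cast<std::size_t>(user)].kind == ToyDbgCall::StopPoint)
      return user;
    cur = user;   // skip over non-stop-point calls
  }
}
```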
+
+</div>
+
+
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="format_common_lifetime">Object lifetimes and scoping</a>
+</div>
+
+<div class="doc_text">
+<p>In many languages, the local variables in functions can have their lifetime
+or scope limited to a subset of a function. In the C family of languages, for
+example, variables are only live (readable and writable) within the source block
+that they are defined in. In functional languages, values are only readable
+after they have been defined. Though this is a very obvious concept, it is also
+non-trivial to model in LLVM, because it has no notion of scoping in this sense,
+and does not want to be tied to a language's scoping rules.</p>
+
+<p>In order to handle this, the LLVM debug format uses the notion of "regions"
+of a function, delineated by calls to intrinsic functions. These intrinsic
+functions define new regions of the program and indicate when the region
+lifetime expires. Consider the following C fragment, for example:</p>
+
+<pre>
+1. void foo() {
+2. int X = ...;
+3. int Y = ...;
+4. {
+5. int Z = ...;
+6. ...
+7. }
+8. ...
+9. }
+</pre>
+
+<p>Compiled to LLVM, this function would be represented like this (FIXME: CHECK
+AND UPDATE THIS):</p>
+
+<pre>
+void %foo() {
+ %X = alloca int
+ %Y = alloca int
+ %Z = alloca int
+ <a name="icl_ex_D1">%D1</a> = call {}* %llvm.dbg.func.start(<a href="#format_program_objects">%lldb.global</a>* %d.foo)
+ %D2 = call {}* <a href="#format_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D1, uint 2, uint 2, <a href="#format_common_source_files">%lldb.compile_unit</a>* %file)
+
+ %D3 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D2, ...)
+ <i>;; Evaluate expression on line 2, assigning to X.</i>
+ %D4 = call {}* <a href="#format_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D3, uint 3, uint 2, <a href="#format_common_source_files">%lldb.compile_unit</a>* %file)
+
+ %D5 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D4, ...)
+ <i>;; Evaluate expression on line 3, assigning to Y.</i>
+ %D6 = call {}* <a href="#format_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D5, uint 5, uint 4, <a href="#format_common_source_files">%lldb.compile_unit</a>* %file)
+
+ <a name="icl_ex_D7">%D7</a> = call {}* %llvm.region.start({}* %D6)
+ %D8 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D7, ...)
+ <i>;; Evaluate expression on line 5, assigning to Z.</i>
+ %D9 = call {}* <a href="#format_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D8, uint 6, uint 4, <a href="#format_common_source_files">%lldb.compile_unit</a>* %file)
+
+ <i>;; Code for line 6.</i>
+ %D10 = call {}* %llvm.region.end({}* %D9)
+ %D11 = call {}* <a href="#format_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D10, uint 8, uint 2, <a href="#format_common_source_files">%lldb.compile_unit</a>* %file)
+
+ <i>;; Code for line 8.</i>
+ <a name="icl_ex_D12">%D12</a> = call {}* %llvm.region.end({}* %D11)
+ ret void
+}
+</pre>
+
+<p>This example illustrates a few important details about the LLVM debugging
+information. In particular, it shows how the various intrinsics used are woven
+together with def-use and use-def chains, similar to how <a
+href="#format_common_anchors">anchors</a> are used with globals. This allows
+the debugger to analyze the relationship between statements, variable
+definitions, and the code used to implement the function.</p>
+
+<p>In this example, two explicit regions are defined, one with the <a
+href="#icl_ex_D1">definition of the <tt>%D1</tt> variable</a> and one with the
+<a href="#icl_ex_D7">definition of <tt>%D7</tt></a>. In the case of
+<tt>%D1</tt>, the debug information indicates that the region corresponds to the function whose <a
+href="#format_program_objects">descriptor</a> is specified as an argument to the
+intrinsic. This defines a new stack frame whose lifetime ends when the region
+is ended by <a href="#icl_ex_D12">the <tt>%D12</tt> call</a>.</p>
+
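+<p>A debugger could use the regions to decide which variables are visible at a
+given stop point. The sketch below replays the chain for <tt>foo</tt> in
+program order and keeps a stack of open regions. <tt>ToyEvent</tt> and
+<tt>visibleAt</tt> are hypothetical, and the variable-definition events stand
+in for the placeholder <tt>DEFINEVARIABLE</tt> calls in the listing.</p>
+
```cpp
#include <string>
#include <vector>

// Toy rendering of one function's debug intrinsic chain, in program order.
struct ToyEvent {
  enum Kind { RegionStart, RegionEnd, Define, StopPoint } kind;
  std::string var;  // Define only
  unsigned line;    // StopPoint only
};

// Variables visible when execution is stopped at the stop point for `line`.
std::vector<std::string> visibleAt(const std::vector<ToyEvent> &chain,
                                   unsigned line) {
  std::vector<std::vector<std::string>> regions;  // stack of open regions
  for (const ToyEvent &e : chain) {
    switch (e.kind) {
    case ToyEvent::RegionStart:
      regions.emplace_back();
      break;
    case ToyEvent::RegionEnd:
      if (!regions.empty())
        regions.pop_back();  // variables of the closed region disappear
      break;
    case ToyEvent::Define:
      if (!regions.empty())
        regions.back().push_back(e.var);
      break;
    case ToyEvent::StopPoint:
      if (e.line == line) {
        std::vector<std::string> out;
        for (const std::vector<std::string> &r : regions)
          out.insert(out.end(), r.begin(), r.end());
        return out;
      }
      break;
    }
  }
  return std::vector<std::string>();  // no stop point for that line
}
```
+
+<p>With the chain from the listing, the stop point for line 6 sees <tt>X</tt>,
+<tt>Y</tt>, and <tt>Z</tt>, while the stop point for line 8 sees only
+<tt>X</tt> and <tt>Y</tt>, because the inner region has ended by then.</p>
+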
+<p>Using regions to represent the boundaries of source-level functions allows
+LLVM interprocedural optimizations to arbitrarily modify LLVM functions without
+having to worry about breaking mapping information between the LLVM code and the
+source-level program. In particular, the inliner requires no m