author | Chris Lattner <sabre@nondot.org> | 2008-11-22 21:41:31 +0000 |
---|---|---|
committer | Chris Lattner <sabre@nondot.org> | 2008-11-22 21:41:31 +0000 |
commit | 62fd278ff94d1df43652ec30a48fe02bb598e68e (patch) | |
tree | cd0eadb8a14e041705d32175a343c9c9eb4cb0fd /docs/InternalsManual.html | |
parent | 717596279bfb6d45b0fc1cad36a9aa1ba6ecbd9f (diff) |
start documenting Diagnostics. Sebastian, I'd appreciate it
if you can fill in the section for %plural.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@59883 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs/InternalsManual.html')
-rw-r--r-- | docs/InternalsManual.html | 213 |
1 files changed, 210 insertions, 3 deletions
diff --git a/docs/InternalsManual.html b/docs/InternalsManual.html index 2e44640c98..6ecc5d6149 100644 --- a/docs/InternalsManual.html +++ b/docs/InternalsManual.html @@ -17,6 +17,7 @@ <li><a href="#libsystem">LLVM System and Support Libraries</a></li> <li><a href="#libbasic">The clang 'Basic' Library</a> <ul> + <li><a href="#Diagnostics">The Diagnostics Subsystem</a></li> <li><a href="#SourceLocation">The SourceLocation and SourceManager classes</a></li> </ul> @@ -84,6 +85,204 @@ classes somewhere else, or introduce some other solution.</p> <p>We describe the roles of these classes in order of their dependencies.</p> + +<!-- ======================================================================= --> +<h3 id="Diagnostics">The Diagnostics Subsystem</h3> +<!-- ======================================================================= --> + +<p>The Clang Diagnostics subsystem is an important part of how the compiler +communicates with the human. Diagnostics are the warnings and errors produced +when the code is incorrect or dubious. In Clang, each diagnostic produced has +(at the minimum) a unique ID, a <a href="#SourceLocation">SourceLocation</a> to +"put the caret", an English translation associated with it, and a severity (e.g. +<tt>WARNING</tt> or <tt>ERROR</tt>). They can also optionally include a number +of arguments to the diagnostic (which fill in "%0"'s in the string) as well as a +number of source ranges that relate to the diagnostic.</p> + +<p>In this section, we'll be giving examples produced by the clang command line +driver, but diagnostics can be <a href="#DiagnosticClient">rendered in many +different ways</a> depending on how the DiagnosticClient interface is +implemented. 
A representative example of a diagnostic is:</p> + +<pre> +t.c:38:15: error: invalid operands to binary expression ('int *' and '_Complex float') + <font color="darkgreen">P = (P-42) + Gamma*4;</font> + <font color="blue">~~~~~~ ^ ~~~~~~~</font> +</pre> + +<p>In this example, you can see the English translation, the severity (error), +you can see the source location (the caret ("^") and file/line/column info), +the source ranges "~~~~", arguments to the diagnostic ("int*" and "_Complex +float"). You'll have to believe me that there is a unique ID backing the +diagnostic :).</p> + +<p>Getting all of this to happen has several steps and involves many moving +pieces, this section describes them and talks about best practices when adding +a new diagnostic.</p> + +<!-- ============================ --> +<h4>The DiagnosticKinds.def file</h4> +<!-- ============================ --> + +<p>Diagnostics are created by adding an entry to the <tt><a +href="http://llvm.org/svn/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticKinds.def" +>DiagnosticKinds.def</a></tt> file. This file encodes the unique ID of the +diagnostic (as an enum, the first argument), the severity of the diagnostic +(second argument) and the English translation + format string.</p> + +<p>There is little sanity with the naming of the unique ID's right now. Some +start with err_, warn_, ext_ to encode the severity into the name. Since the +enum is referenced in the C++ code that produces the diagnostic, it is somewhat +useful for it to be reasonably short.</p> + +<p>The severity of the diagnostic comes from the set {<tt>NOTE</tt>, +<tt>WARNING</tt>, <tt>EXTENSION</tt>, <tt>EXTWARN</tt>, <tt>ERROR</tt>}. The +<tt>ERROR</tt> severity is used for diagnostics indicating the program is never +acceptable under any circumstances. When an error is emitted, the AST for the +input code may not be fully built. 
The <tt>EXTENSION</tt> and <tt>EXTWARN</tt> +severities are used for extensions to the language that Clang accepts. This +means that Clang fully understands and can represent them in the AST, but we +produce diagnostics to tell the user their code is non-portable. The difference +is that the former are ignored by default, and the latter warn by default. The +<tt>WARNING</tt> severity is used for constructs that are valid in the currently +selected source language but that are dubious in some way. The <tt>NOTE</tt> +level is used to staple more information onto a previous diagnostic.</p> + +<p>These <em>severities</em> are mapped into a smaller set (the +Diagnostic::Level enum, {<tt>Ignored</tt>, <tt>Note</tt>, <tt>Warning</tt>, +<tt>Error</tt> }) of output <em>levels</em> by the diagnostics subsystem based +on various configuration options. For example, if the user specifies +<tt>-pedantic</tt>, <tt>EXTENSION</tt> maps to <tt>Warning</tt>, if they specify +<tt>-pedantic-errors</tt>, it turns into <tt>Error</tt>. Clang also internally +supports a fully fine grained mapping mechanism that allows you to map any +diagnostic that doesn't have <tt>ERROR</tt> severity to any output level that +you want. This is used to implement options like <tt>-Wunused_macros</tt>, +<tt>-Wundef</tt> etc.</p> + +<!-- ================= --> +<h4>The Format String</h4> +<!-- ================= --> + +<p>The format string for the diagnostic is very simple, but it has some power. +It takes the form of a string in English with markers that indicate where and +how arguments to the diagnostic are inserted and formatted. 
For example, here +are some simple format strings:</p> + +<pre> + "binary integer literals are an extension" + "format string contains '\\0' within the string body" + "more '<b>%%</b>' conversions than data arguments" + "invalid operands to binary expression ('<b>%0</b>' and '<b>%1</b>')" + "overloaded '<b>%0</b>' must be a <b>%select{unary|binary|unary or binary}2</b> operator" + " (has <b>%1</b> parameter<b>%s1</b>)" +</pre> + +<p>These examples show some important points of format strings. You can use any + plain ASCII character in the diagnostic string except "%" without a problem, + but these are C strings, so you have to use and be aware of all the C escape + sequences (as in the second example). If you want to produce a "%" in the + output, use the "%%" escape sequence, like the third diagnostic. Finally, + clang uses the "%...[digit]" sequences to specify where and how arguments to + the diagnostic are formatted.</p> + +<p>Arguments to the diagnostic are numbered according to how they are specified + by the C++ code that <a href="#producingdiag">produces them</a>, and are + referenced by <tt>%0</tt> .. <tt>%9</tt>. If you have more than 10 arguments + to your diagnostic, you are doing something wrong. :). Unlike printf, there + is no requirement that arguments to the diagnostic end up in the output in + the same order as they are specified, you could have a format string with + <tt>"%1 %0"</tt> that swaps them, for example. The text in between the + percent and digit are formatting instructions. If there are no instructions, + the argument is just turned into a string and substituted in.</p> + +<p>Here are some "best practices" for writing the English format string:</p> + +<ul> +<li>Keep the string short. It should ideally fit in the 80 column limit of the + <tt>DiagnosticKinds.def</tt> file. 
This avoids the diagnostic wrapping when + printed, and forces you to think about the important point you are conveying + with the diagnostic.</li> +<li>Take advantage of location information. The user will be able to see the + line and location of the caret, so you don't need to tell them that the + problem is with the 4th argument to the function: just point to it.</li> +<li>Do not capitalize the diagnostic string, and do not end it with a + period.</li> +<li>If you need to quote something in the diagnostic string, use single + quotes.</li> +</ul> + +<p>Diagnostics should never take random English strings as arguments: you +shouldn't use <tt>"you have a problem with %0"</tt> and pass in things like +<tt>"your argument"</tt> or <tt>"your return value"</tt> as arguments. Doing +this prevents <a href="translation">translating</a> the Clang diagnostics to +other languages (because they'll get random English words in their otherwise +localized diagnostic). The exceptions to this are C/C++ language keywords +(e.g. auto, const, mutable, etc) and C/C++ operators (<tt>/=</tt>). Note +that things like "pointer" and "reference" are not keywords. On the other +hand, you <em>can</em> include anything that comes from the user's source code, +including variable names, types, labels, etc.</p> + +<!-- ==================================== --> +<h4>Formatting a Diagnostic Argument</a></h4> +<!-- ==================================== --> + +<p>Arguments to diagnostics are fully typed internally, and come from a couple +different classes: integers, types, names, and random strings. Depending on +the class of the argument, it can be optionally formatted in different ways. 
+This gives the DiagnosticClient information about what the argument means +without requiring it to use a specific presentation (consider this MVC for +Clang :).</p> + +<p>Here are the different diagnostic argument formats currently supported by +Clang:</p> + +<table> +<tr><td colspan="2"><b>"s" format</b></td></tr> +<tr><td>Example:</td><td><tt>"requires %1 parameter%s1"</tt></td></tr> +<tr><td>Classes:</td><td>Integers</td></tr> +<tr><td>Description:</td><td>This is a simple formatter for integers that is + useful when producing English diagnostics. When the integer is 1, it prints + as nothing. When the integer is not 1, it prints as "s". This allows some + simple grammar to be handled correctly, and eliminates the need to use + gross things like <tt>"rewrite %1 parameter(s)"</tt>.</td></tr> + +<tr><td colspan="2"><b>"select" format</b></td></tr> +<tr><td>Example:</td><td><tt>"must be a %select{unary|binary|unary or binary}2 + operator"</tt></td></tr> +<tr><td>Classes:</td><td>Integers</td></tr> +<tr><td>Description:</td><td>...</td></tr> + +<tr><td colspan="2"><b>"plural" format</b></td></tr> +<tr><td>Example:</td><td><tt>".."</tt></td></tr> +<tr><td>Classes:</td><td>Integers</td></tr> +<tr><td>Description:</td><td>...</td></tr> + + +</table> + + + + +<!-- ===================================================== --> +<h4><a name="#producingdiag">Producing the Diagnostic</a></h4> +<!-- ===================================================== --> + +<p>SemaExpr.cpp example</p> + + +<!-- ============================================================= --> +<h4><a name="DiagnosticClient">The DiagnosticClient Interface</a></h4> +<!-- ============================================================= --> + +<p>Clang command line, buffering, HTMLizing, etc.</p> + +<!-- ====================================================== --> +<h4><a name="translation">Adding Translations to Clang</a></h4> +<!-- ====================================================== --> + +<p>Not possible 
yet!</p> + + <!-- ======================================================================= --> <h3 id="SourceLocation">The SourceLocation and SourceManager classes</h3> <!-- ======================================================================= --> @@ -367,7 +566,9 @@ efficient way to query whether two types are structurally identical to each other, ignoring typedefs. The solution to both of these problems is the idea of canonical types.</p> +<!-- =============== --> <h4>Canonical Types</h4> +<!-- =============== --> <p>Every instance of the Type class contains a canonical type pointer. For simple types with no typedefs involved (e.g. "<tt>int</tt>", "<tt>int*</tt>", @@ -565,7 +766,9 @@ useful for performing <a href="http://en.wikipedia.org/wiki/Data_flow_analysis#Sensitivities">flow- or path-sensitive</a> program analyses on a given function.</p> +<!-- ============ --> <h4>Basic Blocks</h4> +<!-- ============ --> <p>Concretely, an instance of <tt>CFG</tt> is a collection of basic blocks. Each basic block is an instance of <tt>CFGBlock</tt>, which @@ -587,7 +790,9 @@ should be made on how <tt>CFGBlock</tt>s are numbered other than their numbers are unique and that they are numbered from 0..N-1 (where N is the number of basic blocks in the CFG).</p> +<!-- ===================== --> <h4>Entry and Exit Blocks</h4> +<!-- ===================== --> Each instance of <tt>CFG</tt> contains two special blocks: an <i>entry</i> block (accessible via <tt>CFG::getEntry()</tt>), which @@ -598,7 +803,9 @@ clear entrance and exit for a body of code such as a function body. The presence of these empty blocks greatly simplifies the implementation of many analyses built on top of CFGs. 
+<!-- ===================================================== --> <h4 id ="ConditionalControlFlow">Conditional Control-Flow</h4> +<!-- ===================================================== --> <p>Conditional control-flow (such as those induced by if-statements and loops) is represented as edges between <tt>CFGBlock</tt>s. @@ -716,9 +923,9 @@ block B4 (i.e., B4.2). In this manner, conditions for control-flow (which also includes conditions for loops and switch statements) are hoisted into the actual basic block.</p> -<!-- -<h4>Implicit Control-Flow</h4> ---> +<!-- ===================== --> +<!-- <h4>Implicit Control-Flow</h4> --> +<!-- ===================== --> <!-- <p>A key design principle of the <tt>CFG</tt> class was to not require |
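
The format-string rules that this patch documents for the Diagnostics subsystem ("%%" for a literal percent, "%0" through "%9" for positional arguments, and the "s" and "select" modifiers) can be sketched in a few lines of C++17. This is only an illustration and not Clang's implementation: `formatDiagnostic` and `DiagArg` are hypothetical names, arguments are limited to integers and strings, and treating the "select" argument as a zero-based index into the option list is an assumption, since the patch leaves that description unfilled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical argument type for this sketch: an integer or a string.
using DiagArg = std::variant<int, std::string>;

// Hypothetical helper (not part of Clang): expands a diagnostic format
// string using the rules described in the patch above.
std::string formatDiagnostic(const std::string &fmt,
                             const std::vector<DiagArg> &args) {
  std::string out;
  std::size_t i = 0;
  while (i < fmt.size()) {
    if (fmt[i] != '%') { out += fmt[i++]; continue; }
    ++i;  // skip '%'
    assert(i < fmt.size() && "dangling '%' at end of format string");
    if (fmt[i] == '%') { out += '%'; ++i; continue; }  // "%%" escape

    // Formatting instructions sit between the '%' and the digit.
    std::string modifier;
    while (i < fmt.size() && std::isalpha(static_cast<unsigned char>(fmt[i])))
      modifier += fmt[i++];

    // Optional {a|b|c} option list, as used by "select".
    std::vector<std::string> options;
    if (i < fmt.size() && fmt[i] == '{') {
      ++i;
      std::string cur;
      while (i < fmt.size() && fmt[i] != '}') {
        if (fmt[i] == '|') { options.push_back(cur); cur.clear(); }
        else cur += fmt[i];
        ++i;
      }
      assert(i < fmt.size() && "unterminated option list");
      options.push_back(cur);
      ++i;  // skip '}'
    }

    // The single digit names the argument to substitute.
    assert(i < fmt.size() && std::isdigit(static_cast<unsigned char>(fmt[i])));
    std::size_t index = static_cast<std::size_t>(fmt[i++] - '0');
    assert(index < args.size() && "argument index out of range");
    const DiagArg &arg = args[index];

    if (modifier.empty()) {
      // No instructions: turn the argument into a string and substitute it.
      if (const int *n = std::get_if<int>(&arg)) out += std::to_string(*n);
      else out += std::get<std::string>(arg);
    } else if (modifier == "s") {
      // Prints nothing when the integer is 1, "s" otherwise.
      if (std::get<int>(arg) != 1) out += 's';
    } else if (modifier == "select") {
      // Assumption: the integer is a zero-based index into the options.
      std::size_t choice = static_cast<std::size_t>(std::get<int>(arg));
      assert(choice < options.size() && "select index out of range");
      out += options[choice];
    } else {
      assert(false && "unknown modifier in this sketch");
    }
  }
  return out;
}
```

With this sketch, calling `formatDiagnostic("requires %1 parameter%s1", {0, 3})` yields "requires 3 parameters", and a format string of `"%1 %0"` swaps its two arguments, matching the ordering rule described in the patch.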