start documenting Diagnostics. Sebastian, I'd appreciate it
if you can fill in the section for %plural.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@59883 91177308-0d34-0410-b5e6-96231b3b80d8
author: Chris Lattner <sabre@nondot.org> 2008-11-22 21:41:31 +0000
committer: Chris Lattner <sabre@nondot.org> 2008-11-22 21:41:31 +0000
commit: 62fd278ff94d1df43652ec30a48fe02bb598e68e (patch)
tree: cd0eadb8a14e041705d32175a343c9c9eb4cb0fd /docs/InternalsManual.html
parent: 717596279bfb6d45b0fc1cad36a9aa1ba6ecbd9f (diff)
1 files changed, 210 insertions, 3 deletions
diff --git a/docs/InternalsManual.html b/docs/InternalsManual.html
index 2e44640c98..6ecc5d6149 100644
--- a/docs/InternalsManual.html
+++ b/docs/InternalsManual.html
@@ -17,6 +17,7 @@
 <li><a href="#libsystem">LLVM System and Support Libraries</a></li>
 <li><a href="#libbasic">The clang 'Basic' Library</a>
   <ul>
+  <li><a href="#Diagnostics">The Diagnostics Subsystem</a></li>
   <li><a href="#SourceLocation">The SourceLocation and SourceManager
       classes</a></li>
   </ul>
@@ -84,6 +85,204 @@ classes somewhere else, or introduce some other solution.</p>
 
 <p>We describe the roles of these classes in order of their dependencies.</p>
 
+
+<!-- ======================================================================= -->
+<h3 id="Diagnostics">The Diagnostics Subsystem</h3>
+<!-- ======================================================================= -->
+
+<p>The Clang Diagnostics subsystem is an important part of how the compiler
+communicates with the human.  Diagnostics are the warnings and errors produced
+when the code is incorrect or dubious.  In Clang, each diagnostic produced has
+(at the minimum) a unique ID, a <a href="#SourceLocation">SourceLocation</a> to
+"put the caret", an English translation associated with it, and a severity (e.g.
+<tt>WARNING</tt> or <tt>ERROR</tt>).  They can also optionally include a number
+of arguments to the diagnostic (which fill in "%0"'s in the string) as well as a
+number of source ranges that relate to the diagnostic.</p>
+
+<p>In this section, we'll be giving examples produced by the clang command line
+driver, but diagnostics can be <a href="#DiagnosticClient">rendered in many
+different ways</a> depending on how the DiagnosticClient interface is
+implemented.  A representative example of a diagnostic is:</p>
+
+<pre>
+t.c:38:15: error: invalid operands to binary expression ('int *' and '_Complex float')
+   <font color="darkgreen">P = (P-42) + Gamma*4;</font>
+       <font color="blue">~~~~~~ ^ ~~~~~~~</font>
+</pre>
+
+<p>In this example, you can see the English translation, the severity (error),
+the source location (the caret ("^") and file/line/column info), the source
+ranges ("~~~~"), and the arguments to the diagnostic ("int *" and "_Complex
+float").  You'll have to believe me that there is a unique ID backing the
+diagnostic :).</p>
+
+<p>Getting all of this to happen has several steps and involves many moving
+pieces.  This section describes them and talks about best practices when adding
+a new diagnostic.</p>
+
+<!-- ============================ -->
+<h4>The DiagnosticKinds.def file</h4>
+<!-- ============================ -->
+
+<p>Diagnostics are created by adding an entry to the <tt><a
+href="http://llvm.org/svn/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticKinds.def"
+>DiagnosticKinds.def</a></tt> file.  This file encodes the unique ID of the 
+diagnostic (as an enum, the first argument), the severity of the diagnostic
+(second argument) and the English translation + format string.</p>
+
+<p>There is little sanity with the naming of the unique IDs right now.  Some
+start with err_, warn_, or ext_ to encode the severity into the name.  Since the
+enum is referenced in the C++ code that produces the diagnostic, it is somewhat
+useful for it to be reasonably short.</p>
+
+<p>The severity of the diagnostic comes from the set {<tt>NOTE</tt>,
+<tt>WARNING</tt>, <tt>EXTENSION</tt>, <tt>EXTWARN</tt>, <tt>ERROR</tt>}.  The
+<tt>ERROR</tt> severity is used for diagnostics indicating the program is never
+acceptable under any circumstances.  When an error is emitted, the AST for the
+input code may not be fully built.  The <tt>EXTENSION</tt> and <tt>EXTWARN</tt>
+severities are used for extensions to the language that Clang accepts.  This
+means that Clang fully understands and can represent them in the AST, but we
+produce diagnostics to tell the user their code is non-portable.  The difference
+is that the former are ignored by default, and the latter warn by default.  The
+<tt>WARNING</tt> severity is used for constructs that are valid in the currently
+selected source language but that are dubious in some way.  The <tt>NOTE</tt>
+level is used to staple more information onto a previous diagnostic.</p>
+
+<p>These <em>severities</em> are mapped into a smaller set (the
+Diagnostic::Level enum, {<tt>Ignored</tt>, <tt>Note</tt>, <tt>Warning</tt>,
+<tt>Error</tt> }) of output <em>levels</em> by the diagnostics subsystem based
+on various configuration options.  For example, if the user specifies
+<tt>-pedantic</tt>, <tt>EXTENSION</tt> maps to <tt>Warning</tt>; if they specify
+<tt>-pedantic-errors</tt>, it turns into <tt>Error</tt>.  Clang also internally
+supports a fine-grained mapping mechanism that allows you to map any
+diagnostic that doesn't have <tt>ERROR</tt> severity to any output level that
+you want.  This is used to implement options like <tt>-Wunused-macros</tt>,
+<tt>-Wundef</tt> etc.</p>
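+
+<p>As a rough illustration of the severity-to-level mapping described above,
+here is a minimal C++ sketch.  The names <tt>Severity</tt>, <tt>Level</tt>,
+<tt>Options</tt> and <tt>mapSeverity</tt> are hypothetical and are not Clang's
+actual declarations.  The treatment of <tt>EXTWARN</tt> under
+<tt>-pedantic-errors</tt> is an assumption, and the fine-grained per-diagnostic
+mapping is left out.</p>

```cpp
#include <cassert>

// Hypothetical names for illustration only; not Clang's real declarations.
enum class Severity { Note, Warning, Extension, ExtWarn, Error };
enum class Level { Ignored, Note, Warning, Error };

struct Options {
  bool Pedantic = false;       // models -pedantic
  bool PedanticErrors = false; // models -pedantic-errors
};

// Maps a diagnostic's declared severity to the level it is reported at.
Level mapSeverity(Severity S, const Options &Opts) {
  switch (S) {
  case Severity::Note:
    return Level::Note;
  case Severity::Warning:
    return Level::Warning;
  case Severity::Error:
    return Level::Error;
  case Severity::Extension:
    // Ignored by default; -pedantic warns, -pedantic-errors errors.
    if (Opts.PedanticErrors)
      return Level::Error;
    return Opts.Pedantic ? Level::Warning : Level::Ignored;
  case Severity::ExtWarn:
    // Warns by default. Assumption: -pedantic-errors promotes it as well.
    return Opts.PedanticErrors ? Level::Error : Level::Warning;
  }
  assert(false && "unknown severity");
  return Level::Error;
}
```

+<p>With default options, <tt>mapSeverity(Severity::Extension, Options())</tt>
+yields <tt>Level::Ignored</tt>; setting <tt>Pedantic</tt> turns the same
+severity into <tt>Level::Warning</tt>.</p>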
+
+<!-- ================= -->
+<h4>The Format String</h4>
+<!-- ================= -->
+
+<p>The format string for the diagnostic is very simple, but it has some power.
+It takes the form of a string in English with markers that indicate where and
+how arguments to the diagnostic are inserted and formatted.  For example, here
+are some simple format strings:</p>
+
+<pre>
+  "binary integer literals are an extension"
+  "format string contains '\\0' within the string body"
+  "more '<b>%%</b>' conversions than data arguments"
+  "invalid operands to binary expression ('<b>%0</b>' and '<b>%1</b>')"
+  "overloaded '<b>%0</b>' must be a <b>%select{unary|binary|unary or binary}2</b> operator"
+       " (has <b>%1</b> parameter<b>%s1</b>)"
+</pre>
+
+<p>These examples show some important points of format strings.  You can use any
+   plain ASCII character in the diagnostic string except "%" without a problem,
+   but these are C strings, so you have to use and be aware of all the C escape
+   sequences (as in the second example).  If you want to produce a "%" in the
+   output, use the "%%" escape sequence, like the third diagnostic.  Finally,
+   clang uses the "%...[digit]" sequences to specify where and how arguments to
+   the diagnostic are formatted.</p>
+   
+<p>Arguments to the diagnostic are numbered according to how they are specified
+   by the C++ code that <a href="#producingdiag">produces them</a>, and are
+   referenced by <tt>%0</tt> .. <tt>%9</tt>.  If you have more than 10 arguments
+   to your diagnostic, you are doing something wrong :).  Unlike printf, there
+   is no requirement that arguments to the diagnostic end up in the output in
+   the same order as they are specified; you could have a format string with
+   <tt>"%1 %0"</tt> that swaps them, for example.  The text in between the
+   percent and digit is a formatting instruction.  If there are no instructions,
+   the argument is just turned into a string and substituted in.</p>
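+
+<p>To make the numbering rule concrete, here is a small C++ sketch of how
+plain <tt>%0</tt> .. <tt>%9</tt> markers and the <tt>%%</tt> escape could be
+expanded.  <tt>formatSimple</tt> is a hypothetical helper written for this
+example, not Clang's formatter, and it assumes every argument has already been
+converted to a string.</p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: expands "%%" and "%0".."%9" in Fmt using Args.
// Formatting instructions between '%' and the digit are not handled here.
std::string formatSimple(const std::string &Fmt,
                         const std::vector<std::string> &Args) {
  std::string Out;
  for (std::size_t I = 0; I < Fmt.size(); ++I) {
    // Ordinary characters (and a lone trailing '%') are copied through.
    if (Fmt[I] != '%' || I + 1 == Fmt.size()) {
      Out += Fmt[I];
      continue;
    }
    char Next = Fmt[++I];
    if (Next == '%') {
      Out += '%'; // "%%" produces a literal percent sign
    } else if (std::isdigit(static_cast<unsigned char>(Next))) {
      std::size_t Index = static_cast<std::size_t>(Next - '0');
      assert(Index < Args.size() && "format string names a missing argument");
      Out += Args[Index]; // arguments may appear in any order
    } else {
      Out += '%'; // unrecognized marker: keep it verbatim
      Out += Next;
    }
  }
  return Out;
}
```

+<p>For instance, <tt>formatSimple("%1 %0", {"a", "b"})</tt> returns
+<tt>"b a"</tt>, showing that output order follows the format string rather than
+the argument order.</p>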
+
+<p>Here are some "best practices" for writing the English format string:</p>
+
+<ul>
+<li>Keep the string short.  It should ideally fit in the 80 column limit of the
+    <tt>DiagnosticKinds.def</tt> file.  This avoids the diagnostic wrapping when
+    printed, and forces you to think about the important point you are conveying
+    with the diagnostic.</li>
+<li>Take advantage of location information.  The user will be able to see the
+    line and location of the caret, so you don't need to tell them that the
+    problem is with the 4th argument to the function: just point to it.</li>
+<li>Do not capitalize the diagnostic string, and do not end it with a
+    period.</li>
+<li>If you need to quote something in the diagnostic string, use single
+    quotes.</li>
+</ul>
+
+<p>Diagnostics should never take random English strings as arguments: you
+shouldn't use <tt>"you have a problem with %0"</tt> and pass in things like
+<tt>"your argument"</tt> or <tt>"your return value"</tt> as arguments. Doing
+this prevents <a href="#translation">translating</a> the Clang diagnostics to
+other languages (because they'll get random English words in their otherwise
+localized diagnostic).  The exceptions to this are C/C++ language keywords
+(e.g. auto, const, mutable, etc) and C/C++ operators (<tt>/=</tt>).  Note
+that things like "pointer" and "reference" are not keywords.  On the other
+hand, you <em>can</em> include anything that comes from the user's source code,
+including variable names, types, labels, etc.</p>
+
+<!-- ==================================== -->
+<h4>Formatting a Diagnostic Argument</h4>
+<!-- ==================================== -->
+
+<p>Arguments to diagnostics are fully typed internally, and come from a couple
+different classes: integers, types, names, and random strings.  Depending on
+the class of the argument, it can be optionally formatted in different ways.
+This gives the DiagnosticClient information about what the argument means
+without requiring it to use a specific presentation (consider this MVC for
+Clang :).</p>
+
+<p>Here are the different diagnostic argument formats currently supported by
+Clang:</p>
+
+<table>
+<tr><td colspan="2"><b>"s" format</b></td></tr>
+<tr><td>Example:</td><td><tt>"requires %1 parameter%s1"</tt></td></tr>
+<tr><td>Classes:</td><td>Integers</td></tr>
+<tr><td>Description:</td><td>This is a simple formatter for integers that is
+    useful when producing English diagnostics.  When the integer is 1, it prints
+    as nothing.  When the integer is not 1, it prints as "s".  This allows some
+    simple grammar to be handled correctly, and eliminates the need to use
+    gross things like <tt>"requires %1 parameter(s)"</tt>.</td></tr>
+
+<tr><td colspan="2"><b>"select" format</b></td></tr>
+<tr><td>Example:</td><td><tt>"must be a %select{unary|binary|unary or binary}2
+     operator"</tt></td></tr>
+<tr><td>Classes:</td><td>Integers</td></tr>
+<tr><td>Description:</td><td>...</td></tr>
+
+<tr><td colspan="2"><b>"plural" format</b></td></tr>
+<tr><td>Example:</td><td><tt>".."</tt></td></tr>
+<tr><td>Classes:</td><td>Integers</td></tr>
+<tr><td>Description:</td><td>...</td></tr>
+
+
+</table>
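+
+<p>The "s" format in the table above can be sketched in a few lines of C++.
+<tt>formatWithPlural</tt> is a hypothetical helper for illustration, not
+Clang's implementation; it assumes all arguments are integers and handles only
+<tt>%%</tt>, <tt>%N</tt> and <tt>%sN</tt>.</p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: expands "%%", "%N" and "%sN" for integer arguments.
// "%sN" prints nothing when argument N is 1 and "s" otherwise.
std::string formatWithPlural(const std::string &Fmt,
                             const std::vector<int> &Args) {
  std::string Out;
  for (std::size_t I = 0; I < Fmt.size(); ++I) {
    if (Fmt[I] != '%') {
      Out += Fmt[I];
      continue;
    }
    std::size_t J = I + 1;
    if (J < Fmt.size() && Fmt[J] == '%') { // "%%" escape
      Out += '%';
      I = J;
      continue;
    }
    bool PluralS = false;
    if (J < Fmt.size() && Fmt[J] == 's') { // optional "s" instruction
      PluralS = true;
      ++J;
    }
    if (J < Fmt.size() && std::isdigit(static_cast<unsigned char>(Fmt[J]))) {
      std::size_t Index = static_cast<std::size_t>(Fmt[J] - '0');
      assert(Index < Args.size() && "format string names a missing argument");
      int Value = Args[Index];
      if (!PluralS)
        Out += std::to_string(Value); // plain "%N": print the integer
      else if (Value != 1)
        Out += 's'; // "%sN": add the plural suffix
      I = J;
    } else {
      Out += '%'; // not a recognized marker: keep the '%' verbatim
    }
  }
  return Out;
}
```

+<p>With the format string <tt>"requires %1 parameter%s1"</tt>, the arguments
+<tt>{0, 1}</tt> give <tt>"requires 1 parameter"</tt> and <tt>{0, 3}</tt> give
+<tt>"requires 3 parameters"</tt>.</p>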
+
+
+
+
+<!-- ===================================================== -->
+<h4><a name="producingdiag">Producing the Diagnostic</a></h4>
+<!-- ===================================================== -->
+
+<p>SemaExpr.cpp example</p>
+
+
+<!-- ============================================================= -->
+<h4><a name="DiagnosticClient">The DiagnosticClient Interface</a></h4>
+<!-- ============================================================= -->
+
+<p>Clang command line, buffering, HTMLizing, etc.</p>
+
+<!-- ====================================================== -->
+<h4><a name="translation">Adding Translations to Clang</a></h4>
+<!-- ====================================================== -->
+
+<p>Not possible yet!</p>
+
+
 <!-- ======================================================================= -->
 <h3 id="SourceLocation">The SourceLocation and SourceManager classes</h3>
 <!-- ======================================================================= -->
@@ -367,7 +566,9 @@ efficient way to query whether two types are structurally identical to each
 other, ignoring typedefs.  The solution to both of these problems is the idea of
 canonical types.</p>
 
+<!-- =============== -->
 <h4>Canonical Types</h4>
+<!-- =============== -->
 
 <p>Every instance of the Type class contains a canonical type pointer.  For
 simple types with no typedefs involved (e.g. "<tt>int</tt>", "<tt>int*</tt>",
@@ -565,7 +766,9 @@ useful for performing
 <a href="http://en.wikipedia.org/wiki/Data_flow_analysis#Sensitivities">flow-
 or path-sensitive</a> program analyses on a given function.</p>
 
+<!-- ============ -->
 <h4>Basic Blocks</h4>
+<!-- ============ -->
 
 <p>Concretely, an instance of <tt>CFG</tt> is a collection of basic
 blocks.  Each basic block is an instance of <tt>CFGBlock</tt>, which
@@ -587,7 +790,9 @@ should be made on how <tt>CFGBlock</tt>s are numbered other than their
 numbers are unique and that they are numbered from 0..N-1 (where N is
 the number of basic blocks in the CFG).</p>
 
+<!-- ===================== -->
 <h4>Entry and Exit Blocks</h4>
+<!-- ===================== -->
 
 Each instance of <tt>CFG</tt> contains two special blocks:
 an <i>entry</i> block (accessible via <tt>CFG::getEntry()</tt>), which
@@ -598,7 +803,9 @@ clear entrance and exit for a body of code such as a function body.
 The presence of these empty blocks greatly simplifies the
 implementation of many analyses built on top of CFGs.
 
+<!-- ===================================================== -->
 <h4 id ="ConditionalControlFlow">Conditional Control-Flow</h4>
+<!-- ===================================================== -->
 
 <p>Conditional control-flow (such as those induced by if-statements
 and loops) is represented as edges between <tt>CFGBlock</tt>s.
@@ -716,9 +923,9 @@ block B4 (i.e., B4.2).  In this manner, conditions for control-flow
 (which also includes conditions for loops and switch statements) are
 hoisted into the actual basic block.</p>
 
-<!--
-<h4>Implicit Control-Flow</h4>
--->
+<!-- ===================== -->
+<!-- <h4>Implicit Control-Flow</h4> -->
+<!-- ===================== -->
 
 <!--
 <p>A key design principle of the <tt>CFG</tt> class was to not require