aboutsummaryrefslogtreecommitdiff
path: root/www/analyzer/checker_dev_manual.html
blob: 5368eb0e96186e9a4191929aa0e1a44f38976010 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Checker Developer Manual</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->

<div id="content">

<h1 style="color:red">This Page Is Under Construction</h1>

<h1>Checker Developer Manual</h1>

<p>The static analyzer engine performs symbolic execution of the program and 
relies on a set of checkers to implement the logic for detecting and 
constructing bug reports. This page provides hints and guidelines for anyone 
who is interested in implementing their own checker. The static analyzer is a 
part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a> 
and <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
for general developer guidelines and information. </p>

    <ul>
      <li><a href="#start">Getting Started</a></li>
      <li><a href="#analyzer">Analyzer Overview</a></li>
      <li><a href="#idea">Idea for a Checker</a></li>
      <li><a href="#registration">Checker Registration</a></li>
      <li><a href="#skeleton">Checker Skeleton</a></li>
      <li><a href="#node">Exploded Node</a></li>
      <li><a href="#bugs">Bug Reports</a></li>
      <li><a href="#ast">AST Visitors</a></li>
      <li><a href="#testing">Testing</a></li>
      <li><a href="#commands">Useful Commands</a></li>
    </ul>

<h2 id=start>Getting Started</h2>
  <ul>
    <li>To check out the source code and build the project, follow steps 1-4 of 
    the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a> 
  page.</li>

    <li>The analyzer source code is located under the Clang source tree:
    <br><tt>
    $ <b>cd llvm/tools/clang</b>
    </tt>
    <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
     <tt>test/Analysis</tt>.</li>

    <li>The analyzer regression tests can be executed from the Clang's build 
    directory:
    <br><tt>
    $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
    </tt></li>
    
    <li>Analyze a file with the specified checker:
    <br><tt>
    $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
    </tt></li>

    <li>List the available checkers:
    <br><tt>
    $ <b>clang -cc1 -analyzer-checker-help</b>
    </tt></li>

    <li>See the analyzer help for different output formats, fine tuning, and 
    debug options:
    <br><tt>
    $ <b>clang -cc1 -help | grep "analyzer"</b>
    </tt></li>

  </ul>
 
<h2 id=analyzer>Static Analyzer Overview</h2>
  The analyzer core performs symbolic execution of the given program. All the 
  input values are represented with symbolic values; further, the engine deduces 
  the values of all the expressions in the program based on the input symbols  
  and the path. The execution is path sensitive and every possible path through 
  the program is explored. The explored execution traces are represented with 
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
  Each node of the graph is 
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>, 
  which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a> 
  represents the corresponding location in the program (or the CFG graph). 
  <tt>ProgramPoint</tt> is also used to record additional information on 
  when/how the state was added. For example, <tt>PostPurgeDeadSymbolsKind</tt> 
  kind means that the state is the result of purging dead symbols - the 
  analyzer's equivalent of garbage collection. 
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a> 
  represents abstract state of the program. It consists of:
  <ul>
    <li><tt>Environment</tt> - a mapping from source code expressions to symbolic 
    values
    <li><tt>Store</tt> - a mapping from memory locations to symbolic values
    <li><tt>GenericDataMap</tt> - constraints on symbolic values
  </ul>
  
  <h3>Interaction with Checkers</h3>
  Checkers are not merely passive receivers of the analyzer core changes - they 
  actively participate in the <tt>ProgramState</tt> construction through the
  <tt>GenericDataMap</tt> which can be used to store the checker-defined part 
  of the state. Each time the analyzer engine explores a new statement, it 
  notifies each checker registered to listen for that statement, giving it an 
  opportunity to either report a bug or modify the state. (As a rule of thumb, 
  the checker itself should be stateless.) The checkers are called one after another 
  in the predefined order; thus, calling all the checkers adds a chain to the 
  <tt>ExplodedGraph</tt>. 
  
  <h3>Representing Values</h3>
  During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a> 
  objects are used to represent the semantic evaluation of expressions. 
  They can represent things like concrete 
  integers, symbolic values, or memory locations (which are memory regions). 
  They are a discriminated union of "values", symbolic and otherwise. 
  If a value isn't symbolic, usually that means there is no symbolic 
  information to track. For example, if the value was an integer, such as 
  <tt>42</tt>, it would be a <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>, 
  and the checker doesn't usually need to track any state with the concrete 
  number. In some cases, <tt>SVal</tt> is not a symbol, but it really should be 
  a symbolic value. This happens when the analyzer cannot reason about something 
  (yet). An example is floating point numbers. In such cases, the 
  <tt>SVal</tt> will evaluate to <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal<a>. 
  This represents a case that is outside the realm of the analyzer's reasoning 
  capabilities. <tt>SVals</tt> are value objects and their values can be viewed 
  using the <tt>.dump()</tt> method. Often they wrap persistent objects such as 
  symbols or regions. 
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol) 
  is meant to represent abstract, but named, symbolic value. Symbols represent 
  an actual (immutable) value. We might not know what its specific value is, but 
  we can associate constraints with that value as we analyze a path. For 
  example, we might record that the value of a symbol is greater than 
  <tt>0</tt>, etc.
  <p>

  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.  
  It is used to provide a lexicon of how to describe abstract memory. Regions can 
  layer on top of other regions, providing a layered approach to representing memory. 
  For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>, 
  but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could 
  be used to represent the memory associated with a specific field of that object.
  So how do we represent symbolic memory regions? That's what 
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a> 
  is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the 
  symbol is unique and has a unique name; that symbol names the region.
  
  <P>
  Let's see how the analyzer processes the expressions in the following example:
  <p>
  <pre class="code_example">
  int foo(int x) {
     int y = x * 2;
     int z = x;
     ...
  }
  </pre>
  <p>
Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated, 
we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>, in 
this case it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>. 
Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>, 
which references the value <b>currently bound</b> to <tt>x</tt>. That value is 
symbolic; it's whatever <tt>x</tt> was bound to at the start of the function. 
Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>, 
and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When 
we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions, 
and create a new <tt>SVal</tt> that represents their multiplication (which in 
this case is a new symbolic expression, which we might call <tt>$1</tt>). When we 
evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>), 
and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>) 
to the <tt>MemRegion</tt> in the symbolic store.
<br>
The second line is similar. When we evaluate <tt>x</tt> again, we do the same 
dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note, two <tt>SVals</tt> 
might reference the same underlying values.

<p>
To summarize, MemRegions are unique names for blocks of memory. Symbols are 
unique names for abstract symbolic values. Some MemRegions represents abstract 
symbolic chunks of memory, and thus are also based on symbols. SVals are just 
references to values, and can reference either MemRegions, Symbols, or concrete 
values (e.g., the number 1).

  <!-- 
  TODO: Add a picture.
  <br>
  Symbols<br>
  FunctionalObjects are used throughout.  
  -->
<h2 id=idea>Idea for a Checker</h2>
  Here are several questions which you should consider when evaluating your 
  checker idea:
  <ul>
    <li>Can the check be effectively implemented without path-sensitive 
    analysis? See <a href="#ast">AST Visitors</a>.</li>
    
    <li>How high the false positive rate is going to be? Looking at the occurrences 
    of the issue you want to write a checker for in the existing code bases might 
    give you some ideas. </li>
    
    <li>How the current limitations of the analysis will effect the false alarm 
    rate? Currently, the analyzer only reasons about one procedure at a time (no 
    inter-procedural analysis). Also, it uses a simple range tracking based 
    solver to model symbolic execution.</li>
    
    <li>Consult the <a
    href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&amp;bug_status=NEW&amp;bug_status=REOPENED&amp;version=trunk&amp;component=Static%20Analyzer&amp;product=clang">Bugzilla database</a> 
    to get some ideas for new checkers and consider starting with improving/fixing  
    bugs in the existing checkers.</li>
  </ul>

<h2 id=registration>Checker Registration</h2>
  All checker implementation files are located in <tt>clang/lib/StaticAnalyzer/Checkers</tt> 
  folder. Follow the steps below to register a new checker with the analyzer.
<ol>
  <li>Create a new checker implementation file, for example <tt>./lib/StaticAnalyzer/Checkers/NewChecker.cpp</tt>
<pre class="code_example">
using namespace clang;
using namespace ento;

namespace {
class NewChecker: public Checker< check::PreStmt&lt;CallExpr> > {
public:
  void checkPreStmt(const CallExpr *CE, CheckerContext &amp;Ctx) const {}
}
}
void ento::registerNewChecker(CheckerManager &amp;mgr) {
  mgr.registerChecker&lt;NewChecker>();
}
</pre>

<li>Pick the package name for your checker and add the registration code to 
<tt>./lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Note, all checkers should 
first be developed as experimental. Suppose our new checker performs security 
related checks, then we should add the following lines under 
<tt>SecurityExperimental</tt> package: 
<pre class="code_example">
let ParentPackage = SecurityExperimental in {
...
def NewChecker : Checker<"NewChecker">,
  HelpText<"This text should give a short description of the checks performed.">,
  DescFile<"NewChecker.cpp">;
...
} // end "security.experimental"
</pre>

<li>Make the source code file visible to CMake by adding it to 
<tt>./lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.

<li>Compile and see your checker in the list of available checkers by running:<br>
<tt><b>$clang -cc1 -analyzer-checker-help</b></tt>
</ol>
   

<h2 id=skeleton>Checker Skeleton</h2>
  There are two main decisions you need to make:
  <ul>
    <li> Which events the checker should be tracking. 
    See <a href="http://clang.llvm.org/doxygen/classento_1_1CheckerDocumentation.html">CheckerDocumentation</a> 
    for the list of available checker callbacks.</li>
    <li> What data you want to store as part of the checker-specific program 
    state. Try to minimize the checker state as much as possible. </li>
  </ul>

<h2 id=bugs>Bug Reports</h2>

<h2 id=ast>AST Visitors</h2>
  Some checks might not require path-sensitivity to be effective. Simple AST walk 
  might be sufficient. If that is the case, consider implementing a Clang 
  compiler warning. On the other hand, a check might not be acceptable as a compiler 
  warning; for example, because of a relatively high false positive rate. In this 
  situation, AST callbacks <tt><b>checkASTDecl</b></tt> and 
  <tt><b>checkASTCodeBody</b></tt> are your best friends. 

<h2 id=testing>Testing</h2>
  Every patch should be well tested with Clang regression tests. The checker tests 
  live in <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests, 
  execute the following from the <tt>clang</tt> build directory:
    <pre class="code">
    $ <b>TESTDIRS=Analysis make test</b>
    </pre>

<h2 id=commands>Useful Commands/Debugging Hints</h2>
<ul>
<li>
While investigating a checker-related issue, instruct the analyzer to only 
execute a single checker:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
</tt>
</li>
<li>
To dump AST:
<br><tt>
$ <b>clang -cc1 -ast-dump test.c</b>
</tt>
</li>
<li>
To view/dump CFG use <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checkers:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
</tt> 
</li>
<li>
To see all available debug checkers:
<br><tt>
$ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
</tt>
</li>
<li>
To see which function is failing while processing a large file use 
<tt>-analyzer-display-progress</tt> option.
</li>
<li>
While debugging execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt> 
instead of <tt>clang --analyze</tt>, as the later would call the compiler 
in a separate process.
</li>
<li>
To view <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while 
debugging, goto a frame that has <tt>clang::ento::ExprEngine</tt> object and 
execute:
<br><tt> 
(gdb) <b>p ViewGraph(0)</b>
</tt>
</li>
<li>
To see the <tt>ProgramState</tt> while debugging use the following command. 
<br><tt>
(gdb) <b>p State->dump()</b>
</tt> 
</li>
<li>
To see <tt>clang::Expr</tt> while debugging use the following command. If you 
pass in a SourceManager object, it will also dump the corresponding line in the 
source code.
<br><tt>
(gdb) <b>p E->dump()</b>
</tt> 
</li>
<li>
To dump AST of a method that the current <tt>ExplodedNode</tt> belongs to:
<br><tt>
(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b>
(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump(getContext().getSourceManager())</b>
</tt>
</li>
</ul>

</div>
</div>
</body>
</html>