-rw-r--r-- | docs/AddressSanitizer.html | 171
-rw-r--r-- | docs/AddressSanitizer.rst | 158
-rw-r--r-- | docs/AnalyzerRegions.html | 260
-rw-r--r-- | docs/AnalyzerRegions.rst | 259
-rw-r--r-- | docs/ClangPlugins.html | 170
-rw-r--r-- | docs/ClangPlugins.rst | 149
-rw-r--r-- | docs/ClangTools.html | 110
-rw-r--r-- | docs/ClangTools.rst | 91
-rw-r--r-- | docs/HowToSetupToolingForLLVM.html | 212
-rw-r--r-- | docs/HowToSetupToolingForLLVM.rst | 211
-rw-r--r-- | docs/IntroductionToTheClangAST.html | 139
-rw-r--r-- | docs/IntroductionToTheClangAST.rst | 135
-rw-r--r-- | docs/JSONCompilationDatabase.html | 89
-rw-r--r-- | docs/JSONCompilationDatabase.rst | 85
-rw-r--r-- | docs/LibASTMatchersTutorial.html | 533
-rw-r--r-- | docs/LibASTMatchersTutorial.rst | 532
-rw-r--r-- | docs/PTHInternals.html | 179
-rw-r--r-- | docs/PTHInternals.rst | 164
-rw-r--r-- | docs/RAVFrontendAction.html | 224
-rw-r--r-- | docs/RAVFrontendAction.rst | 216
-rw-r--r-- | docs/UsersManual.html | 1338
-rw-r--r-- | docs/UsersManual.rst | 1238
-rw-r--r-- | docs/index.rst | 11
23 files changed, 3249 insertions, 3425 deletions
diff --git a/docs/AddressSanitizer.html b/docs/AddressSanitizer.html deleted file mode 100644 index 397eafc2d5..0000000000 --- a/docs/AddressSanitizer.html +++ /dev/null @@ -1,171 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> -<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ --> -<html> -<head> - <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> - <title>AddressSanitizer, a fast memory error detector</title> - <link type="text/css" rel="stylesheet" href="../menu.css"> - <link type="text/css" rel="stylesheet" href="../content.css"> - <style type="text/css"> - td { - vertical-align: top; - } - </style> -</head> -<body> - -<!--#include virtual="../menu.html.incl"--> - -<div id="content"> - -<h1>AddressSanitizer</h1> -<ul> - <li> <a href="#intro">Introduction</a> - <li> <a href="#howtobuild">How to Build</a> - <li> <a href="#usage">Usage</a> - <ul><li> <a href="#has_feature">__has_feature(address_sanitizer)</a></ul> - <ul><li> <a href="#no_address_safety_analysis"> - __attribute__((no_address_safety_analysis))</a></ul> - <li> <a href="#platforms">Supported Platforms</a> - <li> <a href="#limitations">Limitations</a> - <li> <a href="#status">Current Status</a> - <li> <a href="#moreinfo">More Information</a> -</ul> - -<h2 id="intro">Introduction</h2> -AddressSanitizer is a fast memory error detector. -It consists of a compiler instrumentation module and a run-time library. -The tool can detect the following types of bugs: -<ul> <li> Out-of-bounds accesses to heap, stack and globals - <li> Use-after-free - <li> Use-after-return (to some extent) - <li> Double-free, invalid free -</ul> -Typical slowdown introduced by AddressSanitizer is <b>2x</b>. - -<h2 id="howtobuild">How to build</h2> -Follow the <a href="../get_started.html">clang build instructions</a>. 
-CMake build is supported.<BR> - -<h2 id="usage">Usage</h2> -Simply compile and link your program with <tt>-fsanitize=address</tt> flag. <BR> -The AddressSanitizer run-time library should be linked to the final executable, -so make sure to use <tt>clang</tt> (not <tt>ld</tt>) for the final link step.<BR> -When linking shared libraries, the AddressSanitizer run-time is not linked, -so <tt>-Wl,-z,defs</tt> may cause link errors (don't use it with AddressSanitizer). <BR> - -To get a reasonable performance add <tt>-O1</tt> or higher. <BR> -To get nicer stack traces in error messages add -<tt>-fno-omit-frame-pointer</tt>. <BR> -To get perfect stack traces you may need to disable inlining (just use <tt>-O1</tt>) and tail call -elimination (<tt>-fno-optimize-sibling-calls</tt>). - -<pre> -% cat example_UseAfterFree.cc -int main(int argc, char **argv) { - int *array = new int[100]; - delete [] array; - return array[argc]; // BOOM -} -</pre> - -<pre> -# Compile and link -% clang -O1 -g -fsanitize=address -fno-omit-frame-pointer example_UseAfterFree.cc -</pre> -OR -<pre> -# Compile -% clang -O1 -g -fsanitize=address -fno-omit-frame-pointer -c example_UseAfterFree.cc -# Link -% clang -g -fsanitize=address example_UseAfterFree.o -</pre> - -If a bug is detected, the program will print an error message to stderr and exit with a -non-zero exit code. -Currently, AddressSanitizer does not symbolize its output, so you may need to use a -separate script to symbolize the result offline (this will be fixed in future). 
-<pre> -% ./a.out 2> log -% projects/compiler-rt/lib/asan/scripts/asan_symbolize.py / < log | c++filt -==9442== ERROR: AddressSanitizer heap-use-after-free on address 0x7f7ddab8c084 at pc 0x403c8c bp 0x7fff87fb82d0 sp 0x7fff87fb82c8 -READ of size 4 at 0x7f7ddab8c084 thread T0 - #0 0x403c8c in main example_UseAfterFree.cc:4 - #1 0x7f7ddabcac4d in __libc_start_main ??:0 -0x7f7ddab8c084 is located 4 bytes inside of 400-byte region [0x7f7ddab8c080,0x7f7ddab8c210) -freed by thread T0 here: - #0 0x404704 in operator delete[](void*) ??:0 - #1 0x403c53 in main example_UseAfterFree.cc:4 - #2 0x7f7ddabcac4d in __libc_start_main ??:0 -previously allocated by thread T0 here: - #0 0x404544 in operator new[](unsigned long) ??:0 - #1 0x403c43 in main example_UseAfterFree.cc:2 - #2 0x7f7ddabcac4d in __libc_start_main ??:0 -==9442== ABORTING -</pre> - -AddressSanitizer exits on the first detected error. This is by design. -One reason: it makes the generated code smaller and faster (both by ~5%). -Another reason: this makes fixing bugs unavoidable. With Valgrind, it is often -the case that users treat Valgrind warnings as false positives -(which they are not) and don't fix them. - - -<h3 id="has_feature">__has_feature(address_sanitizer)</h3> -In some cases one may need to execute different code depending on whether -AddressSanitizer is enabled. -<a href="LanguageExtensions.html#__has_feature_extension">__has_feature</a> -can be used for this purpose. -<pre> -#if defined(__has_feature) -# if __has_feature(address_sanitizer) - code that builds only under AddressSanitizer -# endif -#endif -</pre> - -<h3 id="no_address_safety_analysis">__attribute__((no_address_safety_analysis))</h3> -Some code should not be instrumented by AddressSanitizer. -One may use the function attribute -<a href="LanguageExtensions.html#address_sanitizer"> - <tt>no_address_safety_analysis</tt></a> -to disable instrumentation of a particular function. 
-This attribute may not be supported by other compilers, so we suggest to -use it together with <tt>__has_feature(address_sanitizer)</tt>. -Note: currently, this attribute will be lost if the function is inlined. - -<h2 id="platforms">Supported Platforms</h2> -AddressSanitizer is supported on -<ul><li>Linux i386/x86_64 (tested on Ubuntu 10.04 and 12.04). -<li>MacOS 10.6, 10.7 and 10.8 (i386/x86_64). -</ul> -Support for Linux ARM (and Android ARM) is in progress -(it may work, but is not guaranteed too). - - -<h2 id="limitations">Limitations</h2> -<ul> -<li> AddressSanitizer uses more real memory than a native run. -Exact overhead depends on the allocations sizes. The smaller the -allocations you make the bigger the overhead is. -<li> AddressSanitizer uses more stack memory. We have seen up to 3x increase. -<li> On 64-bit platforms AddressSanitizer maps (but not reserves) -16+ Terabytes of virtual address space. -This means that tools like <tt>ulimit</tt> may not work as usually expected. -<li> Static linking is not supported. -</ul> - - -<h2 id="status">Current Status</h2> -AddressSanitizer is fully functional on supported platforms starting from LLVM 3.1. -The test suite is integrated into CMake build and can be run with -<tt>make check-asan</tt> command. - -<h2 id="moreinfo">More Information</h2> -<a href="http://code.google.com/p/address-sanitizer/">http://code.google.com/p/address-sanitizer</a>. - - -</div> -</body> -</html> diff --git a/docs/AddressSanitizer.rst b/docs/AddressSanitizer.rst new file mode 100644 index 0000000000..0ee108bd9e --- /dev/null +++ b/docs/AddressSanitizer.rst @@ -0,0 +1,158 @@ +================ +AddressSanitizer +================ + +.. contents:: + :local: + +Introduction +============ + +AddressSanitizer is a fast memory error detector. It consists of a +compiler instrumentation module and a run-time library. 
The tool can +detect the following types of bugs: + +- Out-of-bounds accesses to heap, stack and globals +- Use-after-free +- Use-after-return (to some extent) +- Double-free, invalid free + +Typical slowdown introduced by AddressSanitizer is **2x**. + +How to build +============ + +Follow the `clang build instructions <../get_started.html>`_. CMake +build is supported. + +Usage +===== + +Simply compile and link your program with ``-fsanitize=address`` flag. +The AddressSanitizer run-time library should be linked to the final +executable, so make sure to use ``clang`` (not ``ld``) for the final +link step. +When linking shared libraries, the AddressSanitizer run-time is not +linked, so ``-Wl,-z,defs`` may cause link errors (don't use it with +AddressSanitizer). +To get a reasonable performance add ``-O1`` or higher. +To get nicer stack traces in error messages add +``-fno-omit-frame-pointer``. +To get perfect stack traces you may need to disable inlining (just use +``-O1``) and tail call elimination (``-fno-optimize-sibling-calls``). + +:: + + % cat example_UseAfterFree.cc + int main(int argc, char **argv) { + int *array = new int[100]; + delete [] array; + return array[argc]; // BOOM + } + +:: + + # Compile and link + % clang -O1 -g -fsanitize=address -fno-omit-frame-pointer example_UseAfterFree.cc + +OR + +:: + + # Compile + % clang -O1 -g -fsanitize=address -fno-omit-frame-pointer -c example_UseAfterFree.cc + # Link + % clang -g -fsanitize=address example_UseAfterFree.o + +If a bug is detected, the program will print an error message to stderr +and exit with a non-zero exit code. Currently, AddressSanitizer does not +symbolize its output, so you may need to use a separate script to +symbolize the result offline (this will be fixed in future). 
+ +:: + + % ./a.out 2> log + % projects/compiler-rt/lib/asan/scripts/asan_symbolize.py / < log | c++filt + ==9442== ERROR: AddressSanitizer heap-use-after-free on address 0x7f7ddab8c084 at pc 0x403c8c bp 0x7fff87fb82d0 sp 0x7fff87fb82c8 + READ of size 4 at 0x7f7ddab8c084 thread T0 + #0 0x403c8c in main example_UseAfterFree.cc:4 + #1 0x7f7ddabcac4d in __libc_start_main ??:0 + 0x7f7ddab8c084 is located 4 bytes inside of 400-byte region [0x7f7ddab8c080,0x7f7ddab8c210) + freed by thread T0 here: + #0 0x404704 in operator delete[](void*) ??:0 + #1 0x403c53 in main example_UseAfterFree.cc:4 + #2 0x7f7ddabcac4d in __libc_start_main ??:0 + previously allocated by thread T0 here: + #0 0x404544 in operator new[](unsigned long) ??:0 + #1 0x403c43 in main example_UseAfterFree.cc:2 + #2 0x7f7ddabcac4d in __libc_start_main ??:0 + ==9442== ABORTING + +AddressSanitizer exits on the first detected error. This is by design. +One reason: it makes the generated code smaller and faster (both by +~5%). Another reason: this makes fixing bugs unavoidable. With Valgrind, +it is often the case that users treat Valgrind warnings as false +positives (which they are not) and don't fix them. + +\_\_has\_feature(address\_sanitizer) +------------------------------------ + +In some cases one may need to execute different code depending on +whether AddressSanitizer is enabled. +`\_\_has\_feature <LanguageExtensions.html#__has_feature_extension>`_ +can be used for this purpose. + +:: + + #if defined(__has_feature) + # if __has_feature(address_sanitizer) + code that builds only under AddressSanitizer + # endif + #endif + +``__attribute__((no_address_safety_analysis))`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Some code should not be instrumented by AddressSanitizer. One may use +the function attribute +`no_address_safety_analysis <LanguageExtensions.html#address_sanitizer>`_ +to disable instrumentation of a particular function. 
This attribute may +not be supported by other compilers, so we suggest to use it together +with ``__has_feature(address_sanitizer)``. Note: currently, this +attribute will be lost if the function is inlined. + +Supported Platforms +=================== + +AddressSanitizer is supported on + +- Linux i386/x86\_64 (tested on Ubuntu 10.04 and 12.04). +- MacOS 10.6, 10.7 and 10.8 (i386/x86\_64). + +Support for Linux ARM (and Android ARM) is in progress (it may work, but +is not guaranteed too). + +Limitations +=========== + +- AddressSanitizer uses more real memory than a native run. Exact + overhead depends on the allocations sizes. The smaller the + allocations you make the bigger the overhead is. +- AddressSanitizer uses more stack memory. We have seen up to 3x + increase. +- On 64-bit platforms AddressSanitizer maps (but not reserves) 16+ + Terabytes of virtual address space. This means that tools like + ``ulimit`` may not work as usually expected. +- Static linking is not supported. + +Current Status +============== + +AddressSanitizer is fully functional on supported platforms starting +from LLVM 3.1. The test suite is integrated into CMake build and can be +run with ``make check-asan`` command. + +More Information +================ + +`http://code.google.com/p/address-sanitizer <http://code.google.com/p/address-sanitizer/>`_. 
diff --git a/docs/AnalyzerRegions.html b/docs/AnalyzerRegions.html deleted file mode 100644 index f9d3337920..0000000000 --- a/docs/AnalyzerRegions.html +++ /dev/null @@ -1,260 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> -<html> -<head> -<title>Static Analyzer Design Document: Memory Regions</title> -</head> -<body> - -<h1>Static Analyzer Design Document: Memory Regions</h1> - -<h3>Authors</h3> - -<p>Ted Kremenek, <tt>kremenek at apple</tt><br> -Zhongxing Xu, <tt>xuzhongzhing at gmail</tt></p> - -<h2 id="intro">Introduction</h2> - -<p>The path-sensitive analysis engine in libAnalysis employs an extensible API -for abstractly modeling the memory of an analyzed program. This API employs the -concept of "memory regions" to abstractly model chunks of program memory such as -program variables and dynamically allocated memory such as those returned from -'malloc' and 'alloca'. Regions are hierarchical, with subregions modeling -subtyping relationships, field and array offsets into larger chunks of memory, -and so on.</p> - -<p>The region API consists of two components:</p> - -<ul> <li>A taxonomy and representation of regions themselves within the analyzer -engine. The primary definitions and interfaces are described in <tt><a -href="http://clang.llvm.org/doxygen/MemRegion_8h-source.html">MemRegion.h</a></tt>. -At the root of the region hierarchy is the class <tt>MemRegion</tt> with -specific subclasses refining the region concept for variables, heap allocated -memory, and so forth.</li> <li>The modeling of binding of values to regions. For -example, modeling the value stored to a local variable <tt>x</tt> consists of -recording the binding between the region for <tt>x</tt> (which represents the -raw memory associated with <tt>x</tt>) and the value stored to <tt>x</tt>. 
This -binding relationship is captured with the notion of "symbolic -stores."</li> </ul> - -<p>Symbolic stores, which can be thought of as representing the relation -<tt>regions -> values</tt>, are implemented by subclasses of the -<tt>StoreManager</tt> class (<tt><a -href="http://clang.llvm.org/doxygen/Store_8h-source.html">Store.h</a></tt>). A -particular StoreManager implementation has complete flexibility concerning the -following: - -<ul> -<li><em>How</em> to model the binding between regions and values</li> -<li><em>What</em> bindings are recorded -</ul> - -<p>Together, both points allow different StoreManagers to tradeoff between -different levels of analysis precision and scalability concerning the reasoning -of program memory. Meanwhile, the core path-sensitive engine makes no -assumptions about either points, and queries a StoreManager about the bindings -to a memory region through a generic interface that all StoreManagers share. If -a particular StoreManager cannot reason about the potential bindings of a given -memory region (e.g., '<tt>BasicStoreManager</tt>' does not reason about fields -of structures) then the StoreManager can simply return 'unknown' (represented by -'<tt>UnknownVal</tt>') for a particular region-binding. This separation of -concerns not only isolates the core analysis engine from the details of -reasoning about program memory but also facilities the option of a client of the -path-sensitive engine to easily swap in different StoreManager implementations -that internally reason about program memory in very different ways.</p> - -<p>The rest of this document is divided into two parts. We first discuss region -taxonomy and the semantics of regions. 
We then discuss the StoreManager -interface, and details of how the currently available StoreManager classes -implement region bindings.</p> - -<h2 id="regions">Memory Regions and Region Taxonomy</h2> - -<h3>Pointers</h3> - -<p>Before talking about the memory regions, we would talk about the pointers -since memory regions are essentially used to represent pointer values.</p> - -<p>The pointer is a type of values. Pointer values have two semantic aspects. -One is its physical value, which is an address or location. The other is the -type of the memory object residing in the address.</p> - -<p>Memory regions are designed to abstract these two properties of the pointer. -The physical value of a pointer is represented by MemRegion pointers. The rvalue -type of the region corresponds to the type of the pointee object.</p> - -<p>One complication is that we could have different view regions on the same -memory chunk. They represent the same memory location, but have different -abstract location, i.e., MemRegion pointers. Thus we need to canonicalize the -abstract locations to get a unique abstract location for one physical -location.</p> - -<p>Furthermore, these different view regions may or may not represent memory -objects of different types. Some different types are semantically the same, -for example, 'struct s' and 'my_type' are the same type.</p> - -<pre> -struct s; -typedef struct s my_type; -</pre> - -<p>But <tt>char</tt> and <tt>int</tt> are not the same type in the code below:</p> - -<pre> -void *p; -int *q = (int*) p; -char *r = (char*) p; -</pre> - -<p>Thus we need to canonicalize the MemRegion which is used in binding and -retrieving.</p> - -<h3>Regions</h3> -<p>Region is the entity used to model pointer values. A Region has the following -properties:</p> - -<ul> -<li>Kind</li> - -<li>ObjectType: the type of the object residing on the region.</li> - -<li>LocationType: the type of the pointer value that the region corresponds to. 
- Usually this is the pointer to the ObjectType. But sometimes we want to cache - this type explicitly, for example, for a CodeTextRegion.</li> - -<li>StartLocation</li> - -<li>EndLocation</li> -</ul> - -<h3>Symbolic Regions</h3> - -<p>A symbolic region is a map of the concept of symbolic values into the domain -of regions. It is the way that we represent symbolic pointers. Whenever a -symbolic pointer value is needed, a symbolic region is created to represent -it.</p> - -<p>A symbolic region has no type. It wraps a SymbolData. But sometimes we have -type information associated with a symbolic region. For this case, a -TypedViewRegion is created to layer the type information on top of the symbolic -region. The reason we do not carry type information with the symbolic region is -that the symbolic regions can have no type. To be consistent, we don't let them -to carry type information.</p> - -<p>Like a symbolic pointer, a symbolic region may be NULL, has unknown extent, -and represents a generic chunk of memory.</p> - -<p><em><b>NOTE</b>: We plan not to use loc::SymbolVal in RegionStore and remove it - gradually.</em></p> - -<p>Symbolic regions get their rvalue types through the following ways:</p> - -<ul> -<li>Through the parameter or global variable that points to it, e.g.: -<pre> -void f(struct s* p) { - ... -} -</pre> - -<p>The symbolic region pointed to by <tt>p</tt> has type <tt>struct -s</tt>.</p></li> - -<li>Through explicit or implicit casts, e.g.: -<pre> -void f(void* p) { - struct s* q = (struct s*) p; - ... -} -</pre> -</li> -</ul> - -<p>We attach the type information to the symbolic region lazily. For the first -case above, we create the <tt>TypedViewRegion</tt> only when the pointer is -actually used to access the pointee memory object, that is when the element or -field region is created. 
For the cast case, the <tt>TypedViewRegion</tt> is -created when visiting the <tt>CastExpr</tt>.</p> - -<p>The reason for doing lazy typing is that symbolic regions are sometimes only -used to do location comparison.</p> - -<h3>Pointer Casts</h3> - -<p>Pointer casts allow people to impose different 'views' onto a chunk of -memory.</p> - -<p>Usually we have two kinds of casts. One kind of casts cast down with in the -type hierarchy. It imposes more specific views onto more generic memory regions. -The other kind of casts cast up with in the type hierarchy. It strips away more -specific views on top of the more generic memory regions.</p> - -<p>We simulate the down casts by layering another <tt>TypedViewRegion</tt> on -top of the original region. We simulate the up casts by striping away the top -<tt>TypedViewRegion</tt>. Down casts is usually simple. For up casts, if the -there is no <tt>TypedViewRegion</tt> to be stripped, we return the original -region. If the underlying region is of the different type than the cast-to type, -we flag an error state.</p> - -<p>For toll-free bridging casts, we return the original region.</p> - -<p>We can set up a partial order for pointer types, with the most general type -<tt>void*</tt> at the top. The partial order forms a tree with <tt>void*</tt> as -its root node.</p> - -<p>Every <tt>MemRegion</tt> has a root position in the type tree. For example, -the pointee region of <tt>void *p</tt> has its root position at the root node of -the tree. <tt>VarRegion</tt> of <tt>int x</tt> has its root position at the 'int -type' node.</p> - -<p><tt>TypedViewRegion</tt> is used to move the region down or up in the tree. -Moving down in the tree adds a <tt>TypedViewRegion</tt>. Moving up in the tree -removes a <Tt>TypedViewRegion</tt>.</p> - -<p>Do we want to allow moving up beyond the root position? This happens -when:</p> <pre> int x; void *p = &x; </pre> - -<p>The region of <tt>x</tt> has its root position at 'int*' node. 
the cast to -void* moves that region up to the 'void*' node. I propose to not allow such -casts, and assign the region of <tt>x</tt> for <tt>p</tt>.</p> - -<p>Another non-ideal case is that people might cast to a non-generic pointer -from another non-generic pointer instead of first casting it back to the generic -pointer. Direct handling of this case would result in multiple layers of -TypedViewRegions. This enforces an incorrect semantic view to the region, -because we can only have one typed view on a region at a time. To avoid this -inconsistency, before casting the region, we strip the TypedViewRegion, then do -the cast. In summary, we only allow one layer of TypedViewRegion.</p> - -<h3>Region Bindings</h3> - -<p>The following region kinds are boundable: VarRegion, CompoundLiteralRegion, -StringRegion, ElementRegion, FieldRegion, and ObjCIvarRegion.</p> - -<p>When binding regions, we perform canonicalization on element regions and field -regions. This is because we can have different views on the same region, some -of which are essentially the same view with different sugar type names.</p> - -<p>To canonicalize a region, we get the canonical types for all TypedViewRegions -along the way up to the root region, and make new TypedViewRegions with those -canonical types.</p> - -<p>For Objective-C and C++, perhaps another canonicalization rule should be -added: for FieldRegion, the least derived class that has the field is used as -the type of the super region of the FieldRegion.</p> - -<p>All bindings and retrievings are done on the canonicalized regions.</p> - -<p>Canonicalization is transparent outside the region store manager, and more -specifically, unaware outside the Bind() and Retrieve() method. We don't need to -consider region canonicalization when doing pointer cast.</p> - -<h3>Constraint Manager</h3> - -<p>The constraint manager reasons about the abstract location of memory objects. 
-We can have different views on a region, but none of these views changes the -location of that object. Thus we should get the same abstract location for those -regions.</p> - -</body> -</html> diff --git a/docs/AnalyzerRegions.rst b/docs/AnalyzerRegions.rst new file mode 100644 index 0000000000..80b3882bc9 --- /dev/null +++ b/docs/AnalyzerRegions.rst @@ -0,0 +1,259 @@ +=============================================== +Static Analyzer Design Document: Memory Regions +=============================================== + +Authors: Ted Kremenek, ``kremenek at apple``, +Zhongxing Xu, ``xuzhongzhing at gmail`` + +Introduction +============ + +The path-sensitive analysis engine in libAnalysis employs an extensible +API for abstractly modeling the memory of an analyzed program. This API +employs the concept of "memory regions" to abstractly model chunks of +program memory such as program variables and dynamically allocated +memory such as those returned from 'malloc' and 'alloca'. Regions are +hierarchical, with subregions modeling subtyping relationships, field +and array offsets into larger chunks of memory, and so on. + +The region API consists of two components: + +- A taxonomy and representation of regions themselves within the + analyzer engine. The primary definitions and interfaces are described + in ``MemRegion.h``. At the root of the region hierarchy is the class + ``MemRegion`` with specific subclasses refining the region concept + for variables, heap allocated memory, and so forth. +- The modeling of binding of values to regions. For example, modeling + the value stored to a local variable ``x`` consists of recording the + binding between the region for ``x`` (which represents the raw memory + associated with ``x``) and the value stored to ``x``. This binding + relationship is captured with the notion of "symbolic stores." 
+ +Symbolic stores, which can be thought of as representing the relation +``regions -> values``, are implemented by subclasses of the +``StoreManager`` class (``Store.h``). A particular StoreManager +implementation has complete flexibility concerning the following: + +- *How* to model the binding between regions and values +- *What* bindings are recorded + +Together, both points allow different StoreManagers to tradeoff between +different levels of analysis precision and scalability concerning the +reasoning of program memory. Meanwhile, the core path-sensitive engine +makes no assumptions about either points, and queries a StoreManager +about the bindings to a memory region through a generic interface that +all StoreManagers share. If a particular StoreManager cannot reason +about the potential bindings of a given memory region (e.g., +'``BasicStoreManager``' does not reason about fields of structures) then +the StoreManager can simply return 'unknown' (represented by +'``UnknownVal``') for a particular region-binding. This separation of +concerns not only isolates the core analysis engine from the details of +reasoning about program memory but also facilities the option of a +client of the path-sensitive engine to easily swap in different +StoreManager implementations that internally reason about program memory +in very different ways. + +The rest of this document is divided into two parts. We first discuss +region taxonomy and the semantics of regions. We then discuss the +StoreManager interface, and details of how the currently available +StoreManager classes implement region bindings. + +Memory Regions and Region Taxonomy +================================== + +Pointers +-------- + +Before talking about the memory regions, we would talk about the +pointers since memory regions are essentially used to represent pointer +values. + +The pointer is a type of values. Pointer values have two semantic +aspects. 
One is its physical value, which is an address or location. The +other is the type of the memory object residing in the address. + +Memory regions are designed to abstract these two properties of the +pointer. The physical value of a pointer is represented by MemRegion +pointers. The rvalue type of the region corresponds to the type of the +pointee object. + +One complication is that we could have different view regions on the +same memory chunk. They represent the same memory location, but have +different abstract location, i.e., MemRegion pointers. Thus we need to +canonicalize the abstract locations to get a unique abstract location +for one physical location. + +Furthermore, these different view regions may or may not represent +memory objects of different types. Some different types are semantically +the same, for example, 'struct s' and 'my\_type' are the same type. + +:: + + struct s; + typedef struct s my_type; + +But ``char`` and ``int`` are not the same type in the code below: + +:: + + void *p; + int *q = (int*) p; + char *r = (char*) p; + +Thus we need to canonicalize the MemRegion which is used in binding and +retrieving. + +Regions +------- + +Region is the entity used to model pointer values. A Region has the +following properties: + +- Kind +- ObjectType: the type of the object residing on the region. +- LocationType: the type of the pointer value that the region + corresponds to. Usually this is the pointer to the ObjectType. But + sometimes we want to cache this type explicitly, for example, for a + CodeTextRegion. +- StartLocation +- EndLocation + +Symbolic Regions +---------------- + +A symbolic region is a map of the concept of symbolic values into the +domain of regions. It is the way that we represent symbolic pointers. +Whenever a symbolic pointer value is needed, a symbolic region is +created to represent it. + +A symbolic region has no type. It wraps a SymbolData. But sometimes we +have type information associated with a symbolic region. 
For this case, +a TypedViewRegion is created to layer the type information on top of the +symbolic region. The reason we do not carry type information with the +symbolic region is that the symbolic regions can have no type. To be +consistent, we don't let them to carry type information. + +Like a symbolic pointer, a symbolic region may be NULL, has unknown +extent, and represents a generic chunk of memory. + +.. note:: + We plan not to use loc::SymbolVal in RegionStore and remove it + gradually. + +Symbolic regions get their rvalue types through the following ways: + +- Through the parameter or global variable that points to it, e.g.: + + :: + + void f(struct s* p) { + ... + } + + The symbolic region pointed to by ``p`` has type ``struct s``. + +- Through explicit or implicit casts, e.g.: + + :: + + void f(void* p) { + struct s* q = (struct s*) p; + ... + } + +We attach the type information to the symbolic region lazily. For the +first case above, we create the ``TypedViewRegion`` only when the +pointer is actually used to access the pointee memory object, that is +when the element or field region is created. For the cast case, the +``TypedViewRegion`` is created when visiting the ``CastExpr``. + +The reason for doing lazy typing is that symbolic regions are sometimes +only used to do location comparison. + +Pointer Casts +------------- + +Pointer casts allow people to impose different 'views' onto a chunk of +m |