diff options
author | Ted Kremenek <kremenek@apple.com> | 2009-04-03 01:38:55 +0000 |
---|---|---|
committer | Ted Kremenek <kremenek@apple.com> | 2009-04-03 01:38:55 +0000 |
commit | d76e0a6c2262e62d943c120a0a8768943fdbde05 (patch) | |
tree | b800af7e446960814f06f557de5ecceb6099e605 /docs | |
parent | 8a2073a85670a13b0bcc1726bc7f80d458f8245b (diff) |
Convert region-design document to HTML.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@68366 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs')
-rw-r--r-- | docs/AnalyzerRegions.html | 232 | ||||
-rw-r--r-- | docs/AnalyzerRegions.txt | 197 |
2 files changed, 232 insertions, 197 deletions
diff --git a/docs/AnalyzerRegions.html b/docs/AnalyzerRegions.html new file mode 100644 index 0000000000..6767cfd579 --- /dev/null +++ b/docs/AnalyzerRegions.html @@ -0,0 +1,232 @@ +<html> +<head> +<title>Static Analyzer Design Document: Memory Regions</title> +</head> +<body> + +<h1>Static Analyzer Design Document: Memory Regions</h1> + +<h3>Authors</h3> + +<p>Ted Kremenek, <tt>kremenek at apple</tt><br> +Zhongxing Xu, <tt>xuzhongzhing at gmail</tt></p> + +<h2 id="intro">Introduction</h2> + +<p>The path-sensitive analysis engine in libAnalysis employs an extensible API +for abstractly modeling the memory of an analyzed program. This API employs the +concept of "memory regions" to abstractly model chunks of program memory such as +program variables and dynamically allocated memory such as those returned from +'malloc' and 'alloca'. Regions are hierarchical, with subregions modeling +subtyping relationships, field and array offsets into larger chunks of memory, +and so on.</p> + +<p>The region API consists of two components:</p> + +<ul> <li>A taxonomy and representation of regions themselves within the analyzer +engine. The primary definitions and interfaces are described in <tt><a +href="http://clang.llvm.org/doxygen/MemRegion_8h-source.html">MemRegion.h</a></tt>. +At the root of the region hierarchy is the class <tt>MemRegion</tt> with +specific subclasses refining the region concept for variables, heap allocated +memory, and so forth.</li> <li>The modeling of binding of values to regions. For +example, modeling the value stored to a local variable <tt>x</tt> consists of +recording the binding between the region for <tt>x</tt> (which represents the +raw memory associated with <tt>x</tt>) and the value stored to <tt>x</tt>. This +binding relationship is captured with the notion of "symbolic +stores."</li> </ul> + +<p>Symbolic stores, which can be thought of as representing the relation +<tt>regions -> values</tt>, are implemented by subclasses of the +<tt>StoreManager</tt> class (<tt><a +href="http://clang.llvm.org/doxygen/Store_8h-source.html">Store.h</a></tt>). A +particular StoreManager implementation has complete flexibility concerning the +following: + +<ul> +<li><em>How</em> to model the binding between regions and values</li> +<li><em>What</em> bindings are recorded +</ul> + +<p>Together, both points allow different StoreManagers to tradeoff between +different levels of analysis precision and scalability concerning the reasoning +of program memory. Meanwhile, the core path-sensitive engine makes no +assumptions about either points, and queries a StoreManager about the bindings +to a memory region through a generic interface that all StoreManagers share. If +a particular StoreManager cannot reason about the potential bindings of a given +memory region (e.g., '<tt>BasicStoreManager</tt>' does not reason about fields +of structures) then the StoreManager can simply return 'unknown' (represented by +'<tt>UnknownVal</tt>') for a particular region-binding. This separation of +concerns not only isolates the core analysis engine from the details of +reasoning about program memory but also facilities the option of a client of the +path-sensitive engine to easily swap in different StoreManager implementations +that internally reason about program memory in very different ways.</pp> + +<p>The rest of this document is divided into two parts. We first discuss region +taxonomy and the semantics of regions. We then discuss the StoreManager +interface, and details of how the currently available StoreManager classes +implement region bindings.</p> + +<h2 id="regions">Memory Regions and Region Taxonomy</h2> + +<h3>Pointers</h3> + +<p>Before talking about the memory regions, we would talk about the pointers +since memory regions are essentially used to represent pointer values.</p> + +<p>The pointer is a type of values. Pointer values have two semantic aspects. +One is its physical value, which is an address or location. The other is the +type of the memory object residing in the address.</p> + +<p>Memory regions are designed to abstract these two properties of the pointer. +The physical value of a pointer is represented by MemRegion pointers. The rvalue +type of the region corresponds to the type of the pointee object.</p> + +<p>One complication is that we could have different view regions on the same +memory chunk. They represent the same memory location, but have different +abstract location, i.e., MemRegion pointers. Thus we need to canonicalize the +abstract locations to get a unique abstract location for one physical +location.</p> + +<p>Furthermore, these different view regions may or may not represent memory +objects of different types. Some different types are semantically the same, +for example, 'struct s' and 'my_type' are the same type.</p> + +<pre> +struct s; +typedef struct s my_type; +</pre> + +<p>But <tt>char</tt> and <tt>int</tt> are not the same type in the code below:</p> + +<pre> +void *p; +int *q = (int*) p; +char *r = (char*) p; +</pre + +<p>Thus we need to canonicalize the MemRegion which is used in binding and +retrieving.</p> + +<h3>Symbolic Regions</h3> + +<p>A symbolic region is a map of the concept of symbolic values into the domain +of regions. It is the way that we represent symbolic pointers. Whenever a +symbolic pointer value is needed, a symbolic region is created to represent +it.</p> + +<p>A symbolic region has no type. It wraps a SymbolData. But sometimes we have +type information associated with a symbolic region. For this case, a +TypedViewRegion is created to layer the type information on top of the symbolic +region. The reason we do not carry type information with the symbolic region is +that the symbolic regions can have no type. To be consistent, we don't let them +to carry type information.</p> + +<p>Like a symbolic pointer, a symbolic region may be NULL, has unknown extent, +and represents a generic chunk of memory.</p> + +<p><em><b>NOTE</b>: We plan not to use loc::SymbolVal in RegionStore and remove it + gradually.</em></p> + +<p>Symbolic regions get their rvalue types through the following ways:</p> + +<ul> +<li>Through the parameter or global variable that points to it, e.g.: +<pre> +void f(struct s* p) { + ... +} +</pre> + +<p>The symbolic region pointed to by <tt>p</tt> has type <tt>struct +s</tt>.</p></li> + +<li>Through explicit or implicit casts, e.g.: +<pre> +void f(void* p) { + struct s* q = (struct s*) p; + ... +} +</pre> +</li> +</ul> + +<p>We attach the type information to the symbolic region lazily. For the first +case above, we create the <tt>TypedViewRegion</tt> only when the pointer is +actually used to access the pointee memory object, that is when the element or +field region is created. For the cast case, the <tt>TypedViewRegion</tt> is +created when visiting the <tt>CastExpr</tt>.</p> + +<p>The reason for doing lazy typing is that symbolic regions are sometimes only +used to do location comparison.</p> + +<h3>Pointer Casts</h3> + +<p>Pointer casts allow people to impose different 'views' onto a chunk of +memory.</p> + +<p>Usually we have two kinds of casts. One kind of casts cast down with in the +type hierarchy. It imposes more specific views onto more generic memory regions. +The other kind of casts cast up with in the type hierarchy. It strips away more +specific views on top of the more generic memory regions.</p> + +<p>We simulate the down casts by layering another <tt>TypedViewRegion</tt> on +top of the original region. We simulate the up casts by striping away the top +<tt>TypedViewRegion</tt>. Down casts is usually simple. For up casts, if the +there is no <tt>TypedViewRegion</tt> to be stripped, we return the original +region. If the underlying region is of the different type than the cast-to type, +we flag an error state.</p> + +<p>For toll-free bridging casts, we return the original region.</p> + +<p>We can set up a partial order for pointer types, with the most general type +<tt>void*</tt> at the top. The partial order forms a tree with <tt>void*</tt> as +its root node.</p> + +<p>Every <tt>MemRegion</tt> has a root position in the type tree. For example, +the pointee region of <tt>void *p</tt> has its root position at the root node of +the tree. <tt>VarRegion</tt> of <tt>int x</tt> has its root position at the 'int +type' node.</p> + +<p><tt>TypedViewRegion</tt> is used to move the region down or up in the tree. +Moving down in the tree adds a <tt>TypedViewRegion</tt>. Moving up in the tree +removes a <Tt>TypedViewRegion</tt>.</p> + +<p>Do we want to allow moving up beyond the root position? This happens +when:</p> <pre> int x; void *p = &x; </pre> + +<p>The region of <tt>x</tt> has its root position at 'int*' node. the cast to +void* moves that region up to the 'void*' node. I propose to not allow such +casts, and assign the region of <tt>x</tt> for <tt>p</tt>.<p> + +<h3>Region Bindings</h3> + +<p>The following region kinds are boundable: VarRegion, CompoundLiteralRegion, +StringRegion, ElementRegion, FieldRegion, and ObjCIvarRegion.</p> + +<p>When binding regions, we perform canonicalization on element regions and field +regions. This is because we can have different views on the same region, some +of which are essentially the same view with different sugar type names.</p> + +<p>To canonicalize a region, we get the canonical types for all TypedViewRegions +along the way up to the root region, and make new TypedViewRegions with those +canonical types.</p> + +<p>For Objective-C and C++, perhaps another canonicalization rule should be +added: for FieldRegion, the least derived class that has the field is used as +the type of the super region of the FieldRegion.</p> + +<p>All bindings and retrievings are done on the canonicalized regions.</p> + +<p>Canonicalization is transparent outside the region store manager, and more +specifically, unaware outside the Bind() and Retrieve() method. We don't need to +consider region canonicalization when doing pointer cast.</p> + +<h3>Constraint Manager</h3> + +<p>The constraint manager reasons about the abstract location of memory objects. +We can have different views on a region, but none of these views changes the +location of that object. Thus we should get the same abstract location for those +regions.</p> + +</body> +</html> diff --git a/docs/AnalyzerRegions.txt b/docs/AnalyzerRegions.txt deleted file mode 100644 index c9c4ab30df..0000000000 --- a/docs/AnalyzerRegions.txt +++ /dev/null @@ -1,197 +0,0 @@ -Static Analyzer: 'Regions' --------------------------- - -INTRODUCTION - - The path-sensitive analysis engine in libAnalysis employs an extensible API - for abstractly modeling the memory of an analyzed program. This API employs - the concept of "memory regions" to abstractly model chunks of program memory - such as program variables and dynamically allocated memory such as those - returned from 'malloc' and 'alloca'. Regions are hierarchical, with subregions - modeling subtyping relationships, field and array offsets into larger chunks - of memory, and so on. - - The region API consists of two components. The first is the taxonomy and - representation of regions themselves within the analyzer engine. The primary - definitions and interfaces are described in - 'include/clang/Analysis/PathSensitive/MemRegion.h'. At the root of the region - hierarchy is the class 'MemRegion' with specific subclasses refining the - region concept for variables, heap allocated memory, and so forth. - - The second component in the region API is the modeling of the binding of - values to regions. For example, modeling the value stored to a local variable - 'x' consists of recording the binding between the region for 'x' (which - represents the raw memory associated with 'x') and the value stored to 'x'. - This binding relationship is captured with the notion of "symbolic stores." - - Symbolic stores, which can be thought of as representing the relation 'regions - -> values', are implemented by subclasses of the StoreManager class (Store.h). - A particular StoreManager implementation has complete flexibility concerning - (a) *how* to model the binding between regions and values and (b) *what* - bindings are recorded. Together, both points allow different StoreManagers to - tradeoff between different levels of analysis precision and scalability - concerning the reasoning of program memory. Meanwhile, the core path-sensitive - engine makes no assumptions about (a) or (b), and queries a StoreManager about - the bindings to a memory region through a generic interface that all - StoreManagers share. If a particular StoreManager cannot reason about the - potential bindings of a given memory region (e.g., 'BasicStoreManager' does - not reason about fields of structures) then the StoreManager can simply return - 'unknown' (represented by 'UnknownVal') for a particular region-binding. This - separation of concerns not only isolates the core analysis engine from the - details of reasoning about program memory but also facilities the option of a - client of the path-sensitive engine to easily swap in different StoreManager - implementations that internally reason about program memory in very different - ways. - - The rest of this document is divided into two parts. We first discuss region - taxonomy and the semantics of regions. We then discuss the StoreManager - interface, and details of how the currently available StoreManager classes - implement region bindings. - -MEMORY REGIONS and REGION TAXONOMY - - POINTERS - - Before talking about the memory regions, we would talk about the pointers - since memory regions are essentially used to represent pointer values. - - The pointer is a type of values. Pointer values have two semantic aspects. One - is its physical value, which is an address or location. The other is the type - of the memory object residing in the address. - - Memory regions are designed to abstract these two properties of the - pointer. The physical value of a pointer is represented by MemRegion - pointers. The rvalue type of the region corresponds to the type of the pointee - object. - - One complication is that we could have different view regions on the same - memory chunk. They represent the same memory location, but have different - abstract location, i.e., MemRegion pointers. Thus we need to canonicalize - the abstract locations to get a unique abstract location for one physical - location. - - Furthermore, these different view regions may or may not represent memory - objects of different types. Some different types are semantically the same, - for example, 'struct s' and 'my_type' are the same type. - struct s; - typedef struct s my_type; - - But 'char' and 'int' are not the same type in the code below: - void *p; - int *q = (int*) p; - char *r = (char*) p; - - Thus we need to canonicalize the MemRegion which is used in binding and - retrieving. - - SYMBOLIC REGIONS - - A symbolic region is a map of the concept of symbolic values into the domain - of regions. It is the way that we represent symbolic pointers. Whenever a - symbolic pointer value is needed, a symbolic region is created to represent - it. - - A symbolic region has no type. It wraps a SymbolData. But sometimes we have - type information associated with a symbolic region. For this case, a - TypedViewRegion is created to layer the type information on top of the - symbolic region. The reason we do not carry type information with the symbolic - region is that the symbolic regions can have no type. To be consistent, we - don't let them to carry type information. - - Like a symbolic pointer, a symbolic region may be NULL, has unknown extent, - and represents a generic chunk of memory. - - NOTE: We plan not to use loc::SymbolVal in RegionStore and remove it - gradually. - - Symbolic regions get their rvalue types through the following ways: - * through the parameter or global variable that points to it, e.g.: - - void f(struct s* p) { - ... - } - - The symbolic region pointed to by 'p' has type 'struct s'. - - * through explicit or implicit casts, e.g.: - void f(void* p) { - struct s* q = (struct s*) p; - ... - } - - We attach the type information to the symbolic region lazily. For the first - case above, we create the TypedViewRegion only when the pointer is actually - used to access the pointee memory object, that is when the element or field - region is created. For the cast case, the TypedViewRegion is created when - visiting the CastExpr. - - The reason for doing lazy typing is that symbolic regions are sometimes only - used to do location comparison. - -Pointer Casts - - Pointer casts allow people to impose different 'views' onto a chunk of memory. - - Usually we have two kinds of casts. One kind of casts cast down with in the - type hierarchy. It imposes more specific views onto more generic memory - regions. The other kind of casts cast up with in the type hierarchy. It strips - away more specific views on top of the more generic memory regions. - - We simulate the down casts by layering another TypedViewRegion on top of the - original region. We simulate the up casts by striping away the top - TypedViewRegion. Down casts is usually simple. For up casts, if the there is - no TypedViewRegion to be stripped, we return the original region. If the - underlying region is of the different type than the cast-to type, we flag an - error state. - - For toll-free bridging casts, we return the original region. - - We can set up a partial order for pointer types, with the most general type - 'void*' at the top. The partial order forms a tree with 'void*' as its root - node. - - Every MemRegion has a root position in the type tree. For example, the pointee - region of 'void *p' has its root position at the root node of the tree. - VarRegion of 'int x' has its root position at the 'int type' node. - - TypedViewRegion is used to move the region down or up in the tree. Moving - down in the tree adds a TypedViewRegion. Moving up in the tree removes a - TypedViewRegion. - - Do we want to allow moving up beyond the root position? This happens when: - int x; - void *p = &x; - - The region of 'x' has its root position at 'int*' node. the cast to void* - moves that region up to the 'void*' node. I propose to not allow such casts, - and assign the region of 'x' for 'p'. - -Region Bindings - - The following region kinds are boundable: VarRegion, CompoundLiteralRegion, - StringRegion, ElementRegion, FieldRegion, and ObjCIvarRegion. - - When binding regions, we perform canonicalization on element regions and field - regions. This is because we can have different views on the same region, some - of which are essentially the same view with different sugar type names. - - To canonicalize a region, we get the canonical types for all TypedViewRegions - along the way up to the root region, and make new TypedViewRegions with those - canonical types. - - For ObjC and C++, perhaps another canonicalization rule should be added: for - FieldRegion, the least derived class that has the field is used as the type - of the super region of the FieldRegion. - - All bindings and retrievings are done on the canonicalized regions. - - Canonicalization is transparent outside the region store manager, and more - specifically, unaware outside the Bind() and Retrieve() method. We don't need - to consider region canonicalization when doing pointer cast. - -Constraint Manager - - The constraint manager reasons about the abstract location of memory - objects. We can have different views on a region, but none of these views - changes the location of that object. Thus we should get the same abstract - location for those regions. |