Convert region-design document to HTML.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@68366 91177308-0d34-0410-b5e6-96231b3b80d8
author: Ted Kremenek <kremenek@apple.com> 2009-04-03 01:38:55 +0000
committer: Ted Kremenek <kremenek@apple.com> 2009-04-03 01:38:55 +0000
commit: d76e0a6c2262e62d943c120a0a8768943fdbde05 (patch)
tree: b800af7e446960814f06f557de5ecceb6099e605 /docs
parent: 8a2073a85670a13b0bcc1726bc7f80d458f8245b (diff)
2 files changed, 232 insertions, 197 deletions
diff --git a/docs/AnalyzerRegions.html b/docs/AnalyzerRegions.html
new file mode 100644
index 0000000000..6767cfd579
--- /dev/null
+++ b/docs/AnalyzerRegions.html
@@ -0,0 +1,232 @@
+<html>
+<head>
+<title>Static Analyzer Design Document: Memory Regions</title>
+</head>
+<body>
+  
+<h1>Static Analyzer Design Document: Memory Regions</h1>
+
+<h3>Authors</h3>
+
+<p>Ted Kremenek, <tt>kremenek at apple</tt><br>
+Zhongxing Xu, <tt>xuzhongzhing at gmail</tt></p>
+
+<h2 id="intro">Introduction</h2>
+
+<p>The path-sensitive analysis engine in libAnalysis employs an extensible API
+for abstractly modeling the memory of an analyzed program. This API employs the
+concept of "memory regions" to abstractly model chunks of program memory such as
+program variables and dynamically allocated memory such as those returned from
+'malloc' and 'alloca'. Regions are hierarchical, with subregions modeling
+subtyping relationships, field and array offsets into larger chunks of memory,
+and so on.</p>
+
+<p>The region API consists of two components:</p>
+
+<ul> <li>A taxonomy and representation of regions themselves within the analyzer
+engine. The primary definitions and interfaces are described in <tt><a
+href="http://clang.llvm.org/doxygen/MemRegion_8h-source.html">MemRegion.h</a></tt>.
+At the root of the region hierarchy is the class <tt>MemRegion</tt> with
+specific subclasses refining the region concept for variables, heap allocated
+memory, and so forth.</li> <li>The modeling of binding of values to regions. For
+example, modeling the value stored to a local variable <tt>x</tt> consists of
+recording the binding between the region for <tt>x</tt> (which represents the
+raw memory associated with <tt>x</tt>) and the value stored to <tt>x</tt>. This
+binding relationship is captured with the notion of &quot;symbolic
+stores.&quot;</li> </ul>
+
+<p>Symbolic stores, which can be thought of as representing the relation
+<tt>regions -> values</tt>, are implemented by subclasses of the
+<tt>StoreManager</tt> class (<tt><a
+href="http://clang.llvm.org/doxygen/Store_8h-source.html">Store.h</a></tt>). A
+particular StoreManager implementation has complete flexibility concerning the
+following:
+
+<ul>
+<li><em>How</em> to model the binding between regions and values</li>
+<li><em>What</em> bindings are recorded
+</ul>
+
+<p>Together, both points allow different StoreManagers to tradeoff between
+different levels of analysis precision and scalability concerning the reasoning
+of program memory. Meanwhile, the core path-sensitive engine makes no
+assumptions about either points, and queries a StoreManager about the bindings
+to a memory region through a generic interface that all StoreManagers share. If
+a particular StoreManager cannot reason about the potential bindings of a given
+memory region (e.g., '<tt>BasicStoreManager</tt>' does not reason about fields
+of structures) then the StoreManager can simply return 'unknown' (represented by
+'<tt>UnknownVal</tt>') for a particular region-binding. This separation of
+concerns not only isolates the core analysis engine from the details of
+reasoning about program memory but also facilities the option of a client of the
+path-sensitive engine to easily swap in different StoreManager implementations
+that internally reason about program memory in very different ways.</pp>
+
+<p>The rest of this document is divided into two parts. We first discuss region
+taxonomy and the semantics of regions. We then discuss the StoreManager
+interface, and details of how the currently available StoreManager classes
+implement region bindings.</p>
+
+<h2 id="regions">Memory Regions and Region Taxonomy</h2>
+
+<h3>Pointers</h3>
+
+<p>Before talking about the memory regions, we would talk about the pointers
+since memory regions are essentially used to represent pointer values.</p>
+
+<p>The pointer is a type of values. Pointer values have two semantic aspects.
+One is its physical value, which is an address or location. The other is the
+type of the memory object residing in the address.</p>
+
+<p>Memory regions are designed to abstract these two properties of the pointer.
+The physical value of a pointer is represented by MemRegion pointers. The rvalue
+type of the region corresponds to the type of the pointee object.</p>
+
+<p>One complication is that we could have different view regions on the same
+memory chunk. They represent the same memory location, but have different
+abstract location, i.e., MemRegion pointers. Thus we need to canonicalize the
+abstract locations to get a unique abstract location for one physical
+location.</p>
+
+<p>Furthermore, these different view regions may or may not represent memory
+objects of different types. Some different types are semantically the same,
+for example, 'struct s' and 'my_type' are the same type.</p>
+
+<pre>
+struct s;
+typedef struct s my_type;
+</pre>
+
+<p>But <tt>char</tt> and <tt>int</tt> are not the same type in the code below:</p>
+
+<pre>
+void *p;
+int *q = (int*) p;
+char *r = (char*) p;
+</pre
+
+<p>Thus we need to canonicalize the MemRegion which is used in binding and
+retrieving.</p>
+
+<h3>Symbolic Regions</h3>
+
+<p>A symbolic region is a map of the concept of symbolic values into the domain
+of regions. It is the way that we represent symbolic pointers. Whenever a
+symbolic pointer value is needed, a symbolic region is created to represent
+it.</p>
+
+<p>A symbolic region has no type. It wraps a SymbolData. But sometimes we have
+type information associated with a symbolic region. For this case, a
+TypedViewRegion is created to layer the type information on top of the symbolic
+region. The reason we do not carry type information with the symbolic region is
+that the symbolic regions can have no type. To be consistent, we don't let them
+to carry type information.</p>
+
+<p>Like a symbolic pointer, a symbolic region may be NULL, has unknown extent,
+and represents a generic chunk of memory.</p>
+
+<p><em><b>NOTE</b>: We plan not to use loc::SymbolVal in RegionStore and remove it
+  gradually.</em></p>
+
+<p>Symbolic regions get their rvalue types through the following ways:</p>
+
+<ul>
+<li>Through the parameter or global variable that points to it, e.g.:
+<pre>
+void f(struct s* p) {
+  ...
+}
+</pre>
+
+<p>The symbolic region pointed to by <tt>p</tt> has type <tt>struct
+s</tt>.</p></li>
+
+<li>Through explicit or implicit casts, e.g.:
+<pre>
+void f(void* p) {
+  struct s* q = (struct s*) p;
+  ...
+}
+</pre>
+</li>
+</ul>
+
+<p>We attach the type information to the symbolic region lazily. For the first
+case above, we create the <tt>TypedViewRegion</tt> only when the pointer is
+actually used to access the pointee memory object, that is when the element or
+field region is created. For the cast case, the <tt>TypedViewRegion</tt> is
+created when visiting the <tt>CastExpr</tt>.</p>
+
+<p>The reason for doing lazy typing is that symbolic regions are sometimes only
+used to do location comparison.</p>
+
+<h3>Pointer Casts</h3>
+
+<p>Pointer casts allow people to impose different 'views' onto a chunk of
+memory.</p>
+
+<p>Usually we have two kinds of casts. One kind of casts cast down with in the
+type hierarchy. It imposes more specific views onto more generic memory regions.
+The other kind of casts cast up with in the type hierarchy. It strips away more
+specific views on top of the more generic memory regions.</p>
+
+<p>We simulate the down casts by layering another <tt>TypedViewRegion</tt> on
+top of the original region. We simulate the up casts by striping away the top
+<tt>TypedViewRegion</tt>. Down casts is usually simple. For up casts, if the
+there is no <tt>TypedViewRegion</tt> to be stripped, we return the original
+region. If the underlying region is of the different type than the cast-to type,
+we flag an error state.</p>
+
+<p>For toll-free bridging casts, we return the original region.</p>
+
+<p>We can set up a partial order for pointer types, with the most general type
+<tt>void*</tt> at the top. The partial order forms a tree with <tt>void*</tt> as
+its root node.</p>
+
+<p>Every <tt>MemRegion</tt> has a root position in the type tree. For example,
+the pointee region of <tt>void *p</tt> has its root position at the root node of
+the tree. <tt>VarRegion</tt> of <tt>int x</tt> has its root position at the 'int
+type' node.</p>
+
+<p><tt>TypedViewRegion</tt> is used to move the region down or up in the tree.
+Moving down in the tree adds a <tt>TypedViewRegion</tt>. Moving up in the tree
+removes a <Tt>TypedViewRegion</tt>.</p>
+
+<p>Do we want to allow moving up beyond the root position? This happens
+when:</p> <pre> int x; void *p = &amp;x; </pre>
+
+<p>The region of <tt>x</tt> has its root position at 'int*' node. the cast to
+void* moves that region up to the 'void*' node. I propose to not allow such
+casts, and assign the region of <tt>x</tt> for <tt>p</tt>.<p>
+
+<h3>Region Bindings</h3>
+
+<p>The following region kinds are boundable: VarRegion, CompoundLiteralRegion,
+StringRegion, ElementRegion, FieldRegion, and ObjCIvarRegion.</p>
+
+<p>When binding regions, we perform canonicalization on element regions and field
+regions. This is because we can have different views on the same region, some
+of which are essentially the same view with different sugar type names.</p>
+
+<p>To canonicalize a region, we get the canonical types for all TypedViewRegions
+along the way up to the root region, and make new TypedViewRegions with those
+canonical types.</p>
+
+<p>For Objective-C and C++, perhaps another canonicalization rule should be
+added: for FieldRegion, the least derived class that has the field is used as
+the type of the super region of the FieldRegion.</p>
+
+<p>All bindings and retrievings are done on the canonicalized regions.</p>
+
+<p>Canonicalization is transparent outside the region store manager, and more
+specifically, unaware outside the Bind() and Retrieve() method. We don't need to
+consider region canonicalization when doing pointer cast.</p>
+
+<h3>Constraint Manager</h3>
+
+<p>The constraint manager reasons about the abstract location of memory objects.
+We can have different views on a region, but none of these views changes the
+location of that object. Thus we should get the same abstract location for those
+regions.</p>
+
+</body>
+</html>
diff --git a/docs/AnalyzerRegions.txt b/docs/AnalyzerRegions.txt
deleted file mode 100644
index c9c4ab30df..0000000000
--- a/docs/AnalyzerRegions.txt
+++ /dev/null
@@ -1,197 +0,0 @@
-Static Analyzer: 'Regions'
---------------------------
-
-INTRODUCTION
-
-  The path-sensitive analysis engine in libAnalysis employs an extensible API
-  for abstractly modeling the memory of an analyzed program. This API employs
-  the concept of "memory regions" to abstractly model chunks of program memory
-  such as program variables and dynamically allocated memory such as those
-  returned from 'malloc' and 'alloca'. Regions are hierarchical, with subregions
-  modeling subtyping relationships, field and array offsets into larger chunks
-  of memory, and so on.
-
-  The region API consists of two components. The first is the taxonomy and
-  representation of regions themselves within the analyzer engine. The primary
-  definitions and interfaces are described in
-  'include/clang/Analysis/PathSensitive/MemRegion.h'. At the root of the region
-  hierarchy is the class 'MemRegion' with specific subclasses refining the
-  region concept for variables, heap allocated memory, and so forth.
-
-  The second component in the region API is the modeling of the binding of
-  values to regions. For example, modeling the value stored to a local variable
-  'x' consists of recording the binding between the region for 'x' (which
-  represents the raw memory associated with 'x') and the value stored to 'x'.
-  This binding relationship is captured with the notion of "symbolic stores."
-
-  Symbolic stores, which can be thought of as representing the relation 'regions
-  -> values', are implemented by subclasses of the StoreManager class (Store.h).
-  A particular StoreManager implementation has complete flexibility concerning
-  (a) *how* to model the binding between regions and values and (b) *what*
-  bindings are recorded. Together, both points allow different StoreManagers to
-  tradeoff between different levels of analysis precision and scalability
-  concerning the reasoning of program memory. Meanwhile, the core path-sensitive
-  engine makes no assumptions about (a) or (b), and queries a StoreManager about
-  the bindings to a memory region through a generic interface that all
-  StoreManagers share. If a particular StoreManager cannot reason about the
-  potential bindings of a given memory region (e.g., 'BasicStoreManager' does
-  not reason about fields of structures) then the StoreManager can simply return
-  'unknown' (represented by 'UnknownVal') for a particular region-binding. This
-  separation of concerns not only isolates the core analysis engine from the
-  details of reasoning about program memory but also facilities the option of a
-  client of the path-sensitive engine to easily swap in different StoreManager
-  implementations that internally reason about program memory in very different
-  ways.
-
-  The rest of this document is divided into two parts. We first discuss region
-  taxonomy and the semantics of regions. We then discuss the StoreManager
-  interface, and details of how the currently available StoreManager classes
-  implement region bindings.
-
-MEMORY REGIONS and REGION TAXONOMY
-
-  POINTERS
-
-  Before talking about the memory regions, we would talk about the pointers
-  since memory regions are essentially used to represent pointer values.
-
-  The pointer is a type of values. Pointer values have two semantic aspects. One
-  is its physical value, which is an address or location. The other is the type
-  of the memory object residing in the address.
-
-  Memory regions are designed to abstract these two properties of the
-  pointer. The physical value of a pointer is represented by MemRegion
-  pointers. The rvalue type of the region corresponds to the type of the pointee
-  object.
-
-  One complication is that we could have different view regions on the same
-  memory chunk. They represent the same memory location, but have different
-  abstract location, i.e., MemRegion pointers. Thus we need to canonicalize
-  the abstract locations to get a unique abstract location for one physical
-  location.
-
-  Furthermore, these different view regions may or may not represent memory
-  objects of different types. Some different types are semantically the same,
-  for example, 'struct s' and 'my_type' are the same type.
-    struct s;
-    typedef struct s my_type;
-
-  But 'char' and 'int' are not the same type in the code below:
-    void *p;
-    int *q = (int*) p;
-    char *r = (char*) p;
-
-  Thus we need to canonicalize the MemRegion which is used in binding and
-  retrieving.
-
-  SYMBOLIC REGIONS
-
-  A symbolic region is a map of the concept of symbolic values into the domain
-  of regions. It is the way that we represent symbolic pointers. Whenever a
-  symbolic pointer value is needed, a symbolic region is created to represent
-  it.
-
-  A symbolic region has no type. It wraps a SymbolData. But sometimes we have
-  type information associated with a symbolic region. For this case, a
-  TypedViewRegion is created to layer the type information on top of the
-  symbolic region. The reason we do not carry type information with the symbolic
-  region is that the symbolic regions can have no type. To be consistent, we
-  don't let them to carry type information.
-
-  Like a symbolic pointer, a symbolic region may be NULL, has unknown extent,
-  and represents a generic chunk of memory.
-
-  NOTE: We plan not to use loc::SymbolVal in RegionStore and remove it
-  gradually.
-
-  Symbolic regions get their rvalue types through the following ways:
-  * through the parameter or global variable that points to it, e.g.:
-
-    void f(struct s* p) {
-      ...
-    }
-
-    The symbolic region pointed to by 'p' has type 'struct s'.
-
-  * through explicit or implicit casts, e.g.:
-    void f(void* p) {
-      struct s* q = (struct s*) p;
-      ...
-    }
-
-  We attach the type information to the symbolic region lazily. For the first
-  case above, we create the TypedViewRegion only when the pointer is actually
-  used to access the pointee memory object, that is when the element or field
-  region is created. For the cast case, the TypedViewRegion is created when
-  visiting the CastExpr.
-
-  The reason for doing lazy typing is that symbolic regions are sometimes only
-  used to do location comparison.
-
-Pointer Casts
-
-  Pointer casts allow people to impose different 'views' onto a chunk of memory.
-  
-  Usually we have two kinds of casts. One kind of casts cast down with in the
-  type hierarchy. It imposes more specific views onto more generic memory
-  regions. The other kind of casts cast up with in the type hierarchy. It strips
-  away more specific views on top of the more generic memory regions.
-  
-  We simulate the down casts by layering another TypedViewRegion on top of the
-  original region. We simulate the up casts by striping away the top
-  TypedViewRegion. Down casts is usually simple. For up casts, if the there is
-  no TypedViewRegion to be stripped, we return the original region. If the
-  underlying region is of the different type than the cast-to type, we flag an
-  error state.
-  
-  For toll-free bridging casts, we return the original region.
-
-  We can set up a partial order for pointer types, with the most general type
-  'void*' at the top. The partial order forms a tree with 'void*' as its root
-  node.
-
-  Every MemRegion has a root position in the type tree. For example, the pointee
-  region of 'void *p' has its root position at the root node of the tree.
-  VarRegion of 'int x' has its root position at the 'int type' node.
-
-  TypedViewRegion is used to move the region down or up in the tree. Moving
-  down in the tree adds a TypedViewRegion. Moving up in the tree removes a
-  TypedViewRegion.
-
-  Do we want to allow moving up beyond the root position? This happens when:
-    int x;
-    void *p = &x;
-  
-  The region of 'x' has its root position at 'int*' node. the cast to void*
-  moves that region up to the 'void*' node. I propose to not allow such casts,
-  and assign the region of 'x' for 'p'.
-
-Region Bindings
-
-  The following region kinds are boundable: VarRegion, CompoundLiteralRegion,
-  StringRegion, ElementRegion, FieldRegion, and ObjCIvarRegion.
-
-  When binding regions, we perform canonicalization on element regions and field
-  regions. This is because we can have different views on the same region, some
-  of which are essentially the same view with different sugar type names.
-
-  To canonicalize a region, we get the canonical types for all TypedViewRegions
-  along the way up to the root region, and make new TypedViewRegions with those
-  canonical types.
-
-  For ObjC and C++, perhaps another canonicalization rule should be added: for
-  FieldRegion, the least derived class that has the field is used as the type
-  of the super region of the FieldRegion.
-  
-  All bindings and retrievings are done on the canonicalized regions.
-  
-  Canonicalization is transparent outside the region store manager, and more
-  specifically, unaware outside the Bind() and Retrieve() method. We don't need
-  to consider region canonicalization when doing pointer cast.
-
-Constraint Manager
-
-  The constraint manager reasons about the abstract location of memory
-  objects. We can have different views on a region, but none of these views
-  changes the location of that object. Thus we should get the same abstract
-  location for those regions.
author	Ted Kremenek <kremenek@apple.com>	2009-04-03 01:38:55 +0000
committer	Ted Kremenek <kremenek@apple.com>	2009-04-03 01:38:55 +0000
commit	d76e0a6c2262e62d943c120a0a8768943fdbde05 (patch)
tree	b800af7e446960814f06f557de5ecceb6099e605 /docs
parent	8a2073a85670a13b0bcc1726bc7f80d458f8245b (diff)