aboutsummaryrefslogtreecommitdiff
path: root/docs/HistoricalNotes
diff options
context:
space:
mode:
Diffstat (limited to 'docs/HistoricalNotes')
-rw-r--r--docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt74
-rw-r--r--docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt199
-rw-r--r--docs/HistoricalNotes/2000-12-06-EncodingIdea.txt30
-rw-r--r--docs/HistoricalNotes/2000-12-06-MeetingSummary.txt83
-rw-r--r--docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt39
-rw-r--r--docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt67
-rw-r--r--docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt75
-rw-r--r--docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt53
-rw-r--r--docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt89
-rw-r--r--docs/HistoricalNotes/2001-02-09-AdveComments.txt120
-rw-r--r--docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt245
-rw-r--r--docs/HistoricalNotes/2001-02-13-Reference-Memory.txt39
-rw-r--r--docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt47
-rw-r--r--docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt49
-rw-r--r--docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt202
-rw-r--r--docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt45
-rw-r--r--docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt63
-rw-r--r--docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt71
-rw-r--r--docs/HistoricalNotes/2001-06-20-.NET-Differences.txt30
-rw-r--r--docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt31
-rw-r--r--docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt56
-rw-r--r--docs/HistoricalNotes/2002-05-12-InstListChange.txt55
-rw-r--r--docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt72
-rw-r--r--docs/HistoricalNotes/2003-01-23-CygwinNotes.txt28
-rw-r--r--docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt137
-rw-r--r--docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt110
-rw-r--r--docs/HistoricalNotes/2007-OriginalClangReadme.txt178
27 files changed, 0 insertions, 2287 deletions
diff --git a/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt b/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt
deleted file mode 100644
index f086181192..0000000000
--- a/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt
+++ /dev/null
@@ -1,74 +0,0 @@
-Date: Sat, 18 Nov 2000 09:19:35 -0600 (CST)
-From: Vikram Adve <vadve@cs.uiuc.edu>
-To: Chris Lattner <lattner@cs.uiuc.edu>
-Subject: a few thoughts
-
-I've been mulling over the virtual machine problem and I had some
-thoughts about some things for us to think about discuss:
-
-1. We need to be clear on our goals for the VM. Do we want to emphasize
- portability and safety like the Java VM? Or shall we focus on the
- architecture interface first (i.e., consider the code generation and
- processor issues), since the architecture interface question is also
- important for portable Java-type VMs?
-
- This is important because the audiences for these two goals are very
- different. Architects and many compiler people care much more about
- the second question. The Java compiler and OS community care much more
- about the first one.
-
- Also, while the architecture interface question is important for
- Java-type VMs, the design constraints are very different.
-
-
-2. Design issues to consider (an initial list that we should continue
- to modify). Note that I'm not trying to suggest actual solutions here,
- but just various directions we can pursue:
-
- a. A single-assignment VM, which we've both already been thinking about.
-
- b. A strongly-typed VM. One question is do we need the types to be
- explicitly declared or should they be inferred by the dynamic compiler?
-
- c. How do we get more high-level information into the VM while keeping
- to a low-level VM design?
-
- o Explicit array references as operands? An alternative is
- to have just an array type, and let the index computations be
- separate 3-operand instructions.
-
- o Explicit instructions to handle aliasing, e.g.s:
- -- an instruction to say "I speculate that these two values are not
- aliased, but check at runtime", like speculative execution in
- EPIC?
- -- or an instruction to check whether two values are aliased and
- execute different code depending on the answer, somewhat like
- predicated code in EPIC
-
- o (This one is a difficult but powerful idea.)
- A "thread-id" field on every instruction that allows the static
- compiler to generate a set of parallel threads, and then have
- the runtime compiler and hardware do what they please with it.
- This has very powerful uses, but thread-id on every instruction
- is expensive in terms of instruction size and code size.
- We would need to compactly encode it somehow.
-
- Also, this will require some reading on at least two other
- projects:
- -- Multiscalar architecture from Wisconsin
- -- Simultaneous multithreading architecture from Washington
-
- o Or forget all this and stick to a traditional instruction set?
-
-
-BTW, on an unrelated note, after the meeting yesterday, I did remember
-that you had suggested doing instruction scheduling on SSA form instead
-of a dependence DAG earlier in the semester. When we talked about
-it yesterday, I didn't remember where the idea had come from but I
-remembered later. Just giving credit where its due...
-
-Perhaps you can save the above as a file under RCS so you and I can
-continue to expand on this.
-
---Vikram
-
diff --git a/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt b/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt
deleted file mode 100644
index 1c725f5aa7..0000000000
--- a/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt
+++ /dev/null
@@ -1,199 +0,0 @@
-Date: Sun, 19 Nov 2000 16:23:57 -0600 (CST)
-From: Chris Lattner <sabre@nondot.org>
-To: Vikram Adve <vadve@cs.uiuc.edu>
-Subject: Re: a few thoughts
-
-Okay... here are a few of my thoughts on this (it's good to know that we
-think so alike!):
-
-> 1. We need to be clear on our goals for the VM. Do we want to emphasize
-> portability and safety like the Java VM? Or shall we focus on the
-> architecture interface first (i.e., consider the code generation and
-> processor issues), since the architecture interface question is also
-> important for portable Java-type VMs?
-
-I forsee the architecture looking kinda like this: (which is completely
-subject to change)
-
-1. The VM code is NOT guaranteed safe in a java sense. Doing so makes it
- basically impossible to support C like languages. Besides that,
- certifying a register based language as safe at run time would be a
- pretty expensive operation to have to do. Additionally, we would like
- to be able to statically eliminate many bounds checks in Java
- programs... for example.
-
- 2. Instead, we can do the following (eventually):
- * Java bytecode is used as our "safe" representation (to avoid
- reinventing something that we don't add much value to). When the
- user chooses to execute Java bytecodes directly (ie, not
- precompiled) the runtime compiler can do some very simple
- transformations (JIT style) to convert it into valid input for our
- VM. Performance is not wonderful, but it works right.
- * The file is scheduled to be compiled (rigorously) at a later
- time. This could be done by some background process or by a second
- processor in the system during idle time or something...
- * To keep things "safe" ie to enforce a sandbox on Java/foreign code,
- we could sign the generated VM code with a host specific private
- key. Then before the code is executed/loaded, we can check to see if
- the trusted compiler generated the code. This would be much quicker
- than having to validate consistency (especially if bounds checks have
- been removed, for example)
-
-> This is important because the audiences for these two goals are very
-> different. Architects and many compiler people care much more about
-> the second question. The Java compiler and OS community care much more
-> about the first one.
-
-3. By focusing on a more low level virtual machine, we have much more room
- for value add. The nice safe "sandbox" VM can be provided as a layer
- on top of it. It also lets us focus on the more interesting compilers
- related projects.
-
-> 2. Design issues to consider (an initial list that we should continue
-> to modify). Note that I'm not trying to suggest actual solutions here,
-> but just various directions we can pursue:
-
-Understood. :)
-
-> a. A single-assignment VM, which we've both already been thinking
-> about.
-
-Yup, I think that this makes a lot of sense. I am still intrigued,
-however, by the prospect of a minimally allocated VM representation... I
-think that it could have definate advantages for certain applications
-(think very small machines, like PDAs). I don't, however, think that our
-initial implementations should focus on this. :)
-
-Here are some other auxilliary goals that I think we should consider:
-
-1. Primary goal: Support a high performance dynamic compilation
- system. This means that we have an "ideal" division of labor between
- the runtime and static compilers. Of course, the other goals of the
- system somewhat reduce the importance of this point (f.e. portability
- reduces performance, but hopefully not much)
-2. Portability to different processors. Since we are most familiar with
- x86 and solaris, I think that these two are excellent candidates when
- we get that far...
-3. Support for all languages & styles of programming (general purpose
- VM). This is the point that disallows java style bytecodes, where all
- array refs are checked for bounds, etc...
-4. Support linking between different language families. For example, call
- C functions directly from Java without using the nasty/slow/gross JNI
- layer. This involves several subpoints:
- A. Support for languages that require garbage collectors and integration
- with languages that don't. As a base point, we could insist on
- always using a conservative GC, but implement free as a noop, f.e.
-
-> b. A strongly-typed VM. One question is do we need the types to be
-> explicitly declared or should they be inferred by the dynamic
-> compiler?
-
- B. This is kind of similar to another idea that I have: make OOP
- constructs (virtual function tables, class heirarchies, etc) explicit
- in the VM representation. I believe that the number of additional
- constructs would be fairly low, but would give us lots of important
- information... something else that would/could be important is to
- have exceptions as first class types so that they would be handled in
- a uniform way for the entire VM... so that C functions can call Java
- functions for example...
-
-> c. How do we get more high-level information into the VM while keeping
-> to a low-level VM design?
-> o Explicit array references as operands? An alternative is
-> to have just an array type, and let the index computations be
-> separate 3-operand instructions.
-
- C. In the model I was thinking of (subject to change of course), we
- would just have an array type (distinct from the pointer
- types). This would allow us to have arbitrarily complex index
- expressions, while still distinguishing "load" from "Array load",
- for example. Perhaps also, switch jump tables would be first class
- types as well? This would allow better reasoning about the program.
-
-5. Support dynamic loading of code from various sources. Already
- mentioned above was the example of loading java bytecodes, but we want
- to support dynamic loading of VM code as well. This makes the job of
- the runtime compiler much more interesting: it can do interprocedural
- optimizations that the static compiler can't do, because it doesn't
- have all of the required information (for example, inlining from
- shared libraries, etc...)
-
-6. Define a set of generally useful annotations to add to the VM
- representation. For example, a function can be analysed to see if it
- has any sideeffects when run... also, the MOD/REF sets could be
- calculated, etc... we would have to determine what is reasonable. This
- would generally be used to make IP optimizations cheaper for the
- runtime compiler...
-
-> o Explicit instructions to handle aliasing, e.g.s:
-> -- an instruction to say "I speculate that these two values are not
-> aliased, but check at runtime", like speculative execution in
-> EPIC?
-> -- or an instruction to check whether two values are aliased and
-> execute different code depending on the answer, somewhat like
-> predicated code in EPIC
-
-These are also very good points... if this can be determined at compile
-time. I think that an epic style of representation (not the instruction
-packing, just the information presented) could be a very interesting model
-to use... more later...
-
-> o (This one is a difficult but powerful idea.)
-> A "thread-id" field on every instruction that allows the static
-> compiler to generate a set of parallel threads, and then have
-> the runtime compiler and hardware do what they please with it.
-> This has very powerful uses, but thread-id on every instruction
-> is expensive in terms of instruction size and code size.
-> We would need to compactly encode it somehow.
-
-Yes yes yes! :) I think it would be *VERY* useful to include this kind
-of information (which EPIC architectures *implicitly* encode. The trend
-that we are seeing supports this greatly:
-
-1. Commodity processors are getting massive SIMD support:
- * Intel/Amd MMX/MMX2
- * AMD's 3Dnow!
- * Intel's SSE/SSE2
- * Sun's VIS
-2. SMP is becoming much more common, especially in the server space.
-3. Multiple processors on a die are right around the corner.
-
-If nothing else, not designing this in would severely limit our future
-expansion of the project...
-
-> Also, this will require some reading on at least two other
-> projects:
-> -- Multiscalar architecture from Wisconsin
-> -- Simultaneous multithreading architecture from Washington
->
-> o Or forget all this and stick to a traditional instruction set?
-
-Heh... :) Well, from a pure research point of view, it is almost more
-attactive to go with the most extreme/different ISA possible. On one axis
-you get safety and conservatism, and on the other you get degree of
-influence that the results have. Of course the problem with pure research
-is that often times there is no concrete product of the research... :)
-
-> BTW, on an unrelated note, after the meeting yesterday, I did remember
-> that you had suggested doing instruction scheduling on SSA form instead
-> of a dependence DAG earlier in the semester. When we talked about
-> it yesterday, I didn't remember where the idea had come from but I
-> remembered later. Just giving credit where its due...
-
-:) Thanks.
-
-> Perhaps you can save the above as a file under RCS so you and I can
-> continue to expand on this.
-
-I think it makes sense to do so when we get our ideas more formalized and
-bounce it back and forth a couple of times... then I'll do a more formal
-writeup of our goals and ideas. Obviously our first implementation will
-not want to do all of the stuff that I pointed out above... be we will
-want to design the project so that we do not artificially limit ourselves
-at sometime in the future...
-
-Anyways, let me know what you think about these ideas... and if they sound
-reasonable...
-
--Chris
-
diff --git a/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt b/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt
deleted file mode 100644
index 8c452924dd..0000000000
--- a/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-From: Chris Lattner [mailto:sabre@nondot.org]
-Sent: Wednesday, December 06, 2000 6:41 PM
-To: Vikram S. Adve
-Subject: Additional idea with respect to encoding
-
-Here's another idea with respect to keeping the common case instruction
-size down (less than 32 bits ideally):
-
-Instead of encoding an instruction to operate on two register numbers,
-have it operate on two negative offsets based on the current register
-number. Therefore, instead of using:
-
-r57 = add r55, r56 (r57 is the implicit dest register, of course)
-
-We could use:
-
-r57 = add -2, -1
-
-My guess is that most SSA references are to recent values (especially if
-they correspond to expressions like (x+y*z+p*q/ ...), so the negative
-numbers would tend to stay small, even at the end of the procedure (where
-the implicit register destination number could be quite large). Of course
-the negative sign is reduntant, so you would be storing small integers
-almost all of the time, and 5-6 bits worth of register number would be
-plenty for most cases...
-
-What do you think?
-
--Chris
-
diff --git a/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt b/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt
deleted file mode 100644
index b66e18556f..0000000000
--- a/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-SUMMARY
--------
-
-We met to discuss the LLVM instruction format and bytecode representation:
-
-ISSUES RESOLVED
----------------
-
-1. We decided that we shall use a flat namespace to represent our
- variables in SSA form, as opposed to having a two dimensional namespace
- of the original variable and the SSA instance subscript.
-
-ARGUMENT AGAINST:
- * A two dimensional namespace would be valuable when doing alias
- analysis because the extra information can help limit the scope of
- analysis.
-
-ARGUMENT FOR:
- * Including this information would require that all users of the LLVM
- bytecode would have to parse and handle it. This would slow down the
- common case and inflate the instruction representation with another
- infinite variable space.
-
-REASONING:
- * It was decided that because original variable sources could be
- reconstructed from SSA form in linear time, that it would be an
- unjustified expense for the common case to include the extra
- information for one optimization. Alias analysis itself is typically
- greater than linear in asymptotic complexity, so this extra analaysis
- would not affect the runtime of the optimization in a significant
- way. Additionally, this would be an unlikely optimization to do at
- runtime.
-
-
-IDEAS TO CONSIDER
------------------
-
-1. Including dominator information in the LLVM bytecode
- representation. This is one example of an analysis result that may be
- packaged with the bytecodes themselves. As a conceptual implementation
- idea, we could include an immediate dominator number for each basic block
- in the LLVM bytecode program. Basic blocks could be numbered according
- to the order of occurance in the bytecode representation.
-
-2. Including loop header and body information. This would facilitate
- detection of intervals and natural loops.
-
-UNRESOLVED ISSUES
------------------
-
-1. Will oSUIF provide enough of an infrastructure to support the research
- that we will be doing? We know that it has less than stellar
- performance, but hope that this will be of little importance for our
- static compiler. This could affect us if we decided to do some IP
- research. Also we do not yet understand the level of exception support
- currently implemented.
-
-2. Should we consider the requirements of a direct hardware implementation
- of the LLVM when we design it? If so, several design issues should
- have their priorities shifted. The other option is to focus on a
- software layer interpreting the LLVM in all cases.
-
-3. Should we use some form of packetized format to improve forward
- compatibility? For example, we could design the system to encode a
- packet type and length field before analysis information, to allow a
- runtime to skip information that it didn't understand in a bytecode
- stream. The obvious benefit would be for compatibility, the drawback
- is that it would tend to splinter that 'standard' LLVM definition.
-
-4. Should we use fixed length instructions or variable length
- instructions? Fetching variable length instructions is expensive (for
- either hardware or software based LLVM runtimes), but we have several
- 'infinite' spaces that instructions operate in (SSA register numbers,
- type spaces, or packet length [if packets were implemented]). Several
- options were mentioned including:
- A. Using 16 or 32 bit numbers, which would be 'big enough'
- B. A scheme similar to how UTF-8 works, to encode infinite numbers
- while keeping small number small.
- C. Use something similar to Huffman encoding, so that the most common
- numbers are the smallest.
-
--Chris
-
diff --git a/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt b/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt
deleted file mode 100644
index 111706a344..0000000000
--- a/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt
+++ /dev/null
@@ -1,39 +0,0 @@
-Date: Wed, 31 Jan 2001 12:04:33 -0600
-From: Vikram S. Adve <vadve@cs.uiuc.edu>
-To: Chris Lattner <lattner@cs.uiuc.edu>
-Subject: another thought
-
-I have a budding idea about making LLVM a little more ambitious: a
-customizable runtime system that can be used to implement language-specific
-virtual machines for many different languages. E.g., a C vm, a C++ vm, a
-Java vm, a Lisp vm, ..
-
-The idea would be that LLVM would provide a standard set of runtime features
-(some low-level like standard assembly instructions with code generation and
-static and runtime optimization; some higher-level like type-safety and
-perhaps a garbage collection library). Each language vm would select the
-runtime features needed for that language, extending or customizing them as
-needed. Most of the machine-dependent code-generation and optimization
-features as well as low-level machine-independent optimizations (like PRE)
-could be provided by LLVM and should be sufficient for any language,
-simplifying the language compiler. (This would also help interoperability
-between languages.) Also, some or most of the higher-level
-machine-independent features like type-safety and access safety should be
-reusable by different languages, with minor extensions. The language
-compiler could then focus on language-specific analyses and optimizations.
-
-The risk is that this sounds like a universal IR -- something that the
-compiler community has tried and failed to develop for decades, and is
-universally skeptical about. No matter what we say, we won't be able to
-convince anyone that we have a universal IR that will work. We need to
-think about whether LLVM is different or if has something novel that might
-convince people. E.g., the idea of providing a package of separable
-features that different languages select from. Also, using SSA with or
-without type-safety as the intermediate representation.
-
-One interesting starting point would be to discuss how a JVM would be
-implemented on top of LLVM a bit more. That might give us clues on how to
-structure LLVM to support one or more language VMs.
-
---Vikram
-
diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt
deleted file mode 100644
index c09cf1f03c..0000000000
--- a/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt
+++ /dev/null
@@ -1,67 +0,0 @@
-Date: Tue, 6 Feb 2001 20:27:37 -0600 (CST)
-From: Chris Lattner <sabre@nondot.org>
-To: Vikram S. Adve <vadve@cs.uiuc.edu>
-Subject: Type notation debate...
-
-This is the way that I am currently planning on implementing types:
-
-Primitive Types:
-type ::= void|bool|sbyte|ubyte|short|ushort|int|uint|long|ulong
-
-Method:
-typelist ::= typelisth | /*empty*/
-typelisth ::= type | typelisth ',' type
-type ::= type (typelist)
-
-Arrays (without and with size):
-type ::= '[' type ']' | '[' INT ',' type ']'
-
-Pointer:
-type ::= type '*'
-
-Structure:
-type ::= '{' typelist '}'
-
-Packed:
-type ::= '<' INT ',' type '>'
-
-Simple examples:
-
-[[ %4, int ]] - array of (array of 4 (int))
-[ { int, int } ] - Array of structure
-[ < %4, int > ] - Array of 128 bit SIMD packets
-int (int, [[int, %4]]) - Method taking a 2d array and int, returning int
-
-
-Okay before you comment, please look at:
-
-http://www.research.att.com/~bs/devXinterview.html
-
-Search for "In another interview, you defined the C declarator syntax as
-an experiment that failed. However, this syntactic construct has been
-around for 27 years and perhaps more; why do you consider it problematic
-(except for its cumbersome syntax)?" and read that response for me. :)
-
-Now with this syntax, his example would be represented as:
-
-[ %10, bool (int, int) * ] *
-
-vs
-
-bool (*(*)[10])(int, int)
-
-in C.
-
-Basically, my argument for this type construction system is that it is
-VERY simple to use and understand (although it IS different than C, it is
-very simple and straightforward, which C is NOT). In fact, I would assert
-that most programmers TODAY do not understand pointers to member
-functions, and have to look up an example when they have to write them.
-
-In my opinion, it is critically important to have clear and concise type
-specifications, because types are going to be all over the programs.
-
-Let me know your thoughts on this. :)
-
--Chris
-
diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt
deleted file mode 100644
index 8bfefbf69f..0000000000
--- a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt
+++ /dev/null
@@ -1,75 +0,0 @@
-Date: Thu, 8 Feb 2001 08:42:04 -0600
-From: Vikram S. Adve <vadve@cs.uiuc.edu>
-To: Chris Lattner <sabre@nondot.org>
-Subject: RE: Type notation debate...
-
-Chris,
-
-> Okay before you comment, please look at:
->
-> http://www.research.att.com/~bs/devXinterview.html
-
-I read this argument. Even before that, I was already in agreement with you
-and him that the C declarator syntax is difficult and confusing.
-
-But in fact, if you read the entire answer carefully, he came to the same
-conclusion I do: that you have to go with familiar syntax over logical
-syntax because familiarity is such a strong force:
-
- "However, familiarity is a strong force. To compare, in English, we
-live
-more or less happily with the absurd rules for "to be" (am, are, is, been,
-was, were, ...) and all attempts to simplify are treated with contempt or
-(preferably) humor. It be a curious world and it always beed."
-
-> Basically, my argument for this type construction system is that it is
-> VERY simple to use and understand (although it IS different than C, it is
-> very simple and straightforward, which C is NOT). In fact, I would assert
-> that most programmers TODAY do not understand pointers to member
-> functions, and have to look up an example when they have to write them.
-
-Again, I don't disagree with this at all. But to some extent this
-particular problem is inherently difficult. Your syntax for the above
-example may be easier for you to read because this is the way you have been
-thinking about it. Honestly, I don't find it much easier than the C syntax.
-In either case, I would have to look up an example to write pointers to
-member functions.
-
-But pointers to member functions are nowhere near as common as arrays. And
-the old array syntax:
- type [ int, int, ...]
-is just much more familiar and clear to people than anything new you
-introduce, no matter how logical it is. Introducing a new syntax that may
-make function pointers easier but makes arrays much more difficult seems
-very risky to me.
-
-> In my opinion, it is critically important to have clear and concise type
-> specifications, because types are going to be all over the programs.
-
-I absolutely agree. But the question is, what is more clear and concise?
-The syntax programmers are used to out of years of experience or a new
-syntax that they have never seen that has a more logical structure. I think
-the answer is the former. Sometimes, you have to give up a better idea
-because you can't overcome sociological barriers to it. Qwerty keyboards
-and Windows are two classic examples of bad technology that are difficult to
-root out.
-
-P.S. Also, while I agree that most your syntax is more logical, there is
-one part that isn't:
-
-Arrays (without and with size):
-type ::= '[' type ']' | '[' INT ',' type ']'.
-
-The arrays with size lists the dimensions and the type in a single list.
-That is just too confusing:
- [10, 40, int]
-This seems to be a 3-D array where the third dimension is something strange.
-It is too confusing to have a list of 3 things, some of which are dimensions
-and one is a type. Either of the following would be better:
-
- array [10, 40] of int
-or
- int [10, 40]
-
---Vikram
-
diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt
deleted file mode 100644
index 6e9784158a..0000000000
--- a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt
+++ /dev/null
@@ -1,53 +0,0 @@
-Date: Thu, 8 Feb 2001 14:31:05 -0600 (CST)
-From: Chris Lattner <sabre@nondot.org>
-To: Vikram S. Adve <vadve@cs.uiuc.edu>
-Subject: RE: Type notation debate...
-
-> Arrays (without and with size):
-> type ::= '[' type ']' | '[' INT ',' type ']'.
->
-> The arrays with size lists the dimensions and the type in a single list.
-> That is just too confusing:
-
-> [10, 40, int]
-> This seems to be a 3-D array where the third dimension is something strange.
-> It is too confusing to have a list of 3 things, some of which are dimensions
-> and one is a type.
-
-The above grammar indicates that there is only one integer parameter, ie
-the upper bound. The lower bound is always implied to be zero, for
-several reasons:
-
-* As a low level VM, we want to expose addressing computations
- explicitly. Since the lower bound must always be known in a high level
- language statically, the language front end can do the translation
- automatically.
-* This fits more closely with what Java needs, ie what we need in the
- short term. Java arrays are always zero based.
-
-If a two element list is too confusing, I would recommend an alternate
-syntax of:
-
-type ::= '[' type ']' | '[' INT 'x' type ']'.
-
-For example:
- [12 x int]
- [12x int]
- [ 12 x [ 4x int ]]
-
-Which is syntactically nicer, and more explicit.
-
-> Either of the following would be better:
-> array [10, 40] of int
-
-I considered this approach for arrays in general (ie array of int/ array
-of 12 int), but found that it made declarations WAY too long. Remember
-that because of the nature of llvm, you get a lot of types strewn all over
-the program, and using the 'typedef' like facility is not a wonderful
-option, because then types aren't explicit anymore.
-
-I find this email interesting, because you contradict the previous email
-you sent, where you recommend that we stick to C syntax....
-
--Chris
-
diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt
deleted file mode 100644
index 7b9032742a..0000000000
--- a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt
+++ /dev/null
@@ -1,89 +0,0 @@
-> But in fact, if you read the entire answer carefully, he came to the same
-> conclusion I do: that you have to go with familiar syntax over logical
-> syntax because familiarity is such a strong force:
-> "However, familiarity is a strong force. To compare, in English, we
-live
-> more or less happily with the absurd rules for "to be" (am, are, is, been,
-> was, were, ...) and all attempts to simplify are treated with contempt or
-> (preferably) humor. It be a curious world and it always beed."
-
-Although you have to remember that his situation was considerably
-different than ours. He was in a position where he was designing a high
-level language that had to be COMPATIBLE with C. Our language is such
-that a new person would have to learn the new, different, syntax
-anyways. Making them learn about the type system does not seem like much
-of a stretch from learning the opcodes and how SSA form works, and how
-everything ties together...
-
-> > Basically, my argument for this type construction system is that it is
-> > VERY simple to use and understand (although it IS different than C, it is
-> > very simple and straightforward, which C is NOT). In fact, I would assert
-> > that most programmers TODAY do not understand pointers to member
-> > functions, and have to look up an example when they have to write them.
-
-> Again, I don't disagree with this at all. But to some extent this
-> particular problem is inherently difficult. Your syntax for the above
-> example may be easier for you to read because this is the way you have been
-> thinking about it. Honestly, I don't find it much easier than the C syntax.
-> In either case, I would have to look up an example to write pointers to
-> member functions.
-
-I would argue that because the lexical structure of the language is self
-consistent, any person who spent a significant amount of time programming
-in LLVM directly would understand how to do it without looking it up in a
-manual. The reason this does not work for C is because you rarely have to
-declare these pointers, and the syntax is inconsistent with the method
-declaration and calling syntax.
-
-> But pointers to member functions are nowhere near as common as arrays.
-
-Very true. If you're implementing an object oriented language, however,
-remember that you have to do all the pointer to member function stuff
-yourself.... so everytime you invoke a virtual method one is involved
-(instead of having C++ hide it for you behind "syntactic sugar").
-
-> And the old array syntax:
-> type [ int, int, ...]
-> is just much more familiar and clear to people than anything new you
-> introduce, no matter how logical it is.
-
-Erm... excuse me but how is this the "old array syntax"? If you are
-arguing for consistency with C, you should be asking for 'type int []',
-which is significantly different than the above (beside the above
-introduces a new operator and duplicates information
-needlessly). Basically what I am suggesting is exactly the above without
-the fluff. So instead of:
-
- type [ int, int, ...]
-
-you use:
-
- type [ int ]
-
-> Introducing a new syntax that may
-> make function pointers easier but makes arrays much more difficult seems
-> very risky to me.
-
-This is not about function pointers. This is about consistency in the
-type system, and consistency with the rest of the language. The point
-above does not make arrays any more difficult to use, and makes the
-structure of types much more obvious than the "c way".
-
-> > In my opinion, it is critically important to have clear and concise type
-> > specifications, because types are going to be all over the programs.
->
-> I absolutely agree. But the question is, what is more clear and concise?
-> The syntax programmers are used to out of years of experience or a new
-> syntax that they have never seen that has a more logical structure. I think
-> the answer is the former. Sometimes, you have to give up a better idea
-> because you can't overcome sociological barriers to it. Qwerty keyboards
-> and Windows are two classic examples of bad technology that are difficult to
-> root out.
-
-Very true, but you seem to be advocating a completely different Type
-system than C has, in addition to it not offering the advantages of clear
-structure that the system I recommended does... so you seem to not have a
-problem with changing this, just with what I change it to. :)
-
--Chris
-
diff --git a/docs/HistoricalNotes/2001-02-09-AdveComments.txt b/docs/HistoricalNotes/2001-02-09-AdveComments.txt
deleted file mode 100644
index 5503233c1e..0000000000
--- a/docs/HistoricalNotes/2001-02-09-AdveComments.txt
+++ /dev/null
@@ -1,120 +0,0 @@
-Ok, here are my comments and suggestions about the LLVM instruction set.
-We should discuss some now, but can discuss many of them later, when we
-revisit synchronization, type inference, and other issues.
-(We have discussed some of the comments already.)
-
-
-o We should consider eliminating the type annotation in cases where it is
- essentially obvious from the instruction type, e.g., in br, it is obvious
- that the first arg. should be a bool and the other args should be labels:
-
- br bool <cond>, label <iftrue>, label <iffalse>
-
- I think your point was that making all types explicit improves clarity
- and readability. I agree to some extent, but it also comes at the cost
- of verbosity. And when the types are obvious from people's experience
- (e.g., in the br instruction), it doesn't seem to help as much.
-
-
-o On reflection, I really like your idea of having the two different switch
- types (even though they encode implementation techniques rather than
- semantics). It should simplify building the CFG and my guess is it could
- enable some significant optimizations, though we should think about which.
-
-
-o In the lookup-indirect form of the switch, is there a reason not to make
- the val-type uint? Most HLL switch statements (including Java and C++)
- require that anyway. And it would also make the val-type uniform
- in the two forms of the switch.
-
- I did see the switch-on-bool examples and, while cute, we can just use
- the branch instructions in that particular case.
-
-
-o I agree with your comment that we don't need 'neg'.
-
-
-o There's a trade-off with the cast instruction:
- + it avoids having to define all the upcasts and downcasts that are
- valid for the operands of each instruction (you probably have thought
- of other benefits also)
- - it could make the bytecode significantly larger because there could
- be a lot of cast operations
-
-
-o Making the second arg. to 'shl' a ubyte seems good enough to me.
- 255 positions seems adequate for several generations of machines
- and is more compact than uint.
-
-
-o I still have some major concerns about including malloc and free in the
- language (either as builtin functions or instructions). LLVM must be
- able to represent code from many different languages. Languages such as
- C, C++ Java and Fortran 90 would not be able to use our malloc anyway
- because each of them will want to provide a library implementation of it.
-
- This gets even worse when code from different languages is linked
- into a single executable (which is fairly common in large apps).
- Having a single malloc would just not suffice, and instead would simply
- complicate the picture further because it adds an extra variant in
- addition to the one each language provides.
-
- Instead, providing a default library version of malloc and free
- (and perhaps a malloc_gc with garbage collection instead of free)
- would make a good implementation available to anyone who wants it.
-
- I don't recall all your arguments in favor so let's discuss this again,
- and soon.
-
-
-o 'alloca' on the other hand sounds like a good idea, and the
- implementation seems fairly language-independent so it doesn't have the
- problems with malloc listed above.
-
-
-o About indirect call:
- Your option #2 sounded good to me. I'm not sure I understand your
- concern about an explicit 'icall' instruction?
-
-
-o A pair of important synchronization instr'ns to think about:
- load-linked
- store-conditional
-
-
-o Other classes of instructions that are valuable for pipeline performance:
- conditional-move
- predicated instructions
-
-
-o I believe tail calls are relatively easy to identify; do you know why
- .NET has a tailcall instruction?
-
-
-o I agree that we need a static data space. Otherwise, emulating global
- data gets unnecessarily complex.
-
-
-o About explicit parallelism:
-
- We once talked about adding a symbolic thread-id field to each
- instruction. (It could be optional so single-threaded codes are
- not penalized.) This could map well to multi-threaded architectures
- while providing easy ILP for single-threaded onces. But it is probably
- too radical an idea to include in a base version of LLVM. Instead, it
- could a great topic for a separate study.
-
- What is the semantics of the IA64 stop bit?
-
-
-
-
-o And finally, another thought about the syntax for arrays :-)
-
- Although this syntax:
- array <dimension-list> of <type>
- is verbose, it will be used only in the human-readable assembly code so
- size should not matter. I think we should consider it because I find it
- to be the clearest syntax. It could even make arrays of function
- pointers somewhat readable.
-
diff --git a/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt b/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt
deleted file mode 100644
index 5c87330fb7..0000000000
--- a/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt
+++ /dev/null
@@ -1,245 +0,0 @@
-From: Chris Lattner <sabre@nondot.org>
-To: "Vikram S. Adve" <vadve@cs.uiuc.edu>
-Subject: Re: LLVM Feedback
-
-I've included your feedback in the /home/vadve/lattner/llvm/docs directory
-so that it will live in CVS eventually with the rest of LLVM. I've
-significantly updated the documentation to reflect the changes you
-suggested, as specified below:
-
-> We should consider eliminating the type annotation in cases where it is
-> essentially obvious from the instruction type:
-> br bool <cond>, label <iftrue>, label <iffalse>
-> I think your point was that making all types explicit improves clarity
-> and readability. I agree to some extent, but it also comes at the
-> cost of verbosity. And when the types are obvious from people's
-> experience (e.g., in the br instruction), it doesn't seem to help as
-> much.
-
-Very true. We should discuss this more, but my reasoning is more of a
-consistency argument. There are VERY few instructions that can have all
-of the types eliminated, and doing so when available unnecesarily makes
-the language more difficult to handle. Especially when you see 'int
-%this' and 'bool %that' all over the place, I think it would be
-disorienting to see:
-
- br %predicate, %iftrue, %iffalse
-
-for branches. Even just typing that once gives me the creeps. ;) Like I
-said, we should probably discuss this further in person...
-
-> On reflection, I really like your idea of having the two different
-> switch types (even though they encode implementation techniques rather
-> than semantics). It should simplify building the CFG and my guess is it
-> could enable some significant optimizations, though we should think
-> about which.
-
-Great. I added a note to the switch section commenting on how the VM
-should just use the instruction type as a hint, and that the
-implementation may choose altermate representations (such as predicated
-branches).
-
-> In the lookup-indirect form of the switch, is there a reason not to
-> make the val-type uint?
-
-No. This was something I was debating for a while, and didn't really feel
-strongly about either way. It is common to switch on other types in HLL's
-(for example signed int's are particually common), but in this case, all
-that will be added is an additional 'cast' instruction. I removed that
-from the spec.
-
-> I agree with your comment that we don't need 'neg'
-
-Removed.
-
-> There's a trade-off with the cast instruction:
-> + it avoids having to define all the upcasts and downcasts that are
-> valid for the operands of each instruction (you probably have
-> thought of other benefits also)
-> - it could make the bytecode significantly larger because there could
-> be a lot of cast operations
-
- + You NEED casts to represent things like:
- void foo(float);
- ...
- int x;
- ...
- foo(x);
- in a language like C. Even in a Java like language, you need upcasts
- and some way to implement dynamic downcasts.
- + Not all forms of instructions take every type (for example you can't
- shift by a floating point number of bits), thus SOME programs will need
- implicit casts.
-
-To be efficient and to avoid your '-' point above, we just have to be
-careful to specify that the instructions shall operate on all common
-types, therefore casting should be relatively uncommon. For example all
-of the arithmetic operations work on almost all data types.
-
-> Making the second arg. to 'shl' a ubyte seems good enough to me.
-> 255 positions seems adequate for several generations of machines
-
-Okay, that comment is removed.
-
-> and is more compact than uint.
-
-No, it isn't. Remember that the bytecode encoding saves value slots into
-the bytecode instructions themselves, not constant values. This is
-another case where we may introduce more cast instructions (but we will
-also reduce the number of opcode variants that must be supported by a
-virtual machine). Because most shifts are by constant values, I don't
-think that we'll have to cast many shifts. :)
-
-> I still have some major concerns about including malloc and free in the
-> language (either as builtin functions or instructions).
-
-Agreed. How about this proposal:
-
-malloc/free are either built in functions or actual opcodes. They provide
-all of the type safety that the document would indicate, blah blah
-blah. :)
-
-Now, because of all of the excellent points that you raised, an
-implementation may want to override the default malloc/free behavior of
-the program. To do this, they simply implement a "malloc" and
-"free" function. The virtual machine will then be defined to use the user
-defined malloc/free function (which return/take void*'s, not type'd
-pointers like the builtin function would) if one is available, otherwise
-fall back on a system malloc/free.
-
-Does this sound like a good compromise? It would give us all of the
-typesafety/elegance in the language while still allowing the user to do
-all the cool stuff they want to...
-
-> 'alloca' on the other hand sounds like a good idea, and the
-> implementation seems fairly language-independent so it doesn't have the
-> problems with malloc listed above.
-
-Okay, once we get the above stuff figured out, I'll put it all in the
-spec.
-
-> About indirect call:
-> Your option #2 sounded good to me. I'm not sure I understand your
-> concern about an explicit 'icall' instruction?
-
-I worry too much. :) The other alternative has been removed. 'icall' is
-now up in the instruction list next to 'call'.
-
-> I believe tail calls are relatively easy to identify; do you know why
-> .NET has a tailcall instruction?
-
-Although I am just guessing, I believe it probably has to do with the fact
-that they want languages like Haskell and lisp to be efficiently runnable
-on their VM. Of course this means that the VM MUST implement tail calls
-'correctly', or else life will suck. :) I would put this into a future
-feature bin, because it could be pretty handy...
-
-> A pair of important synchronization instr'ns to think about:
-> load-linked
-> store-conditional
-
-What is 'load-linked'? I think that (at least for now) I should add these
-to the 'possible extensions' section, because they are not immediately
-needed...
-
-> Other classes of instructions that are valuable for pipeline
-> performance:
-> conditional-move
-> predicated instructions
-
-Conditional move is effectly a special case of a predicated
-instruction... and I think that all predicated instructions can possibly
-be implemented later in LLVM. It would significantly change things, and
-it doesn't seem to be very necessary right now. It would seem to
-complicate flow control analysis a LOT in the virtual machine. I would
-tend to prefer that a predicated architecture like IA64 convert from a
-"basic block" representation to a predicated rep as part of it's dynamic
-complication phase. Also, if a basic block contains ONLY a move, then
-that can be trivally translated into a conditional move...
-
-> I agree that we need a static data space. Otherwise, emulating global
-> data gets unnecessarily complex.
-
-Definately. Also a later item though. :)
-
-> We once talked about adding a symbolic thread-id field to each
-> ..
-> Instead, it could a great topic for a separate study.
-
-Agreed. :)
-
-> What is the semantics of the IA64 stop bit?
-
-Basically, the IA64 writes instructions like this:
-mov ...
-add ...
-sub ...
-op xxx
-op xxx
-;;
-mov ...
-add ...
-sub ...
-op xxx
-op xxx
-;;
-
-Where the ;; delimits a group of instruction with no dependencies between
-them, which can all be executed concurrently (to the limits of the
-available functional units). The ;; gets translated into a bit set in one
-of the opcodes.
-
-The advantages of this representation is that you don't have to do some
-kind of 'thread id scheduling' pass by having to specify ahead of time how
-many threads to use, and the representation doesn't have a per instruction
-overhead...
-
-> And finally, another thought about the syntax for arrays :-)
-> Although this syntax:
-> array <dimension-list> of <type>
-> is verbose, it will be used only in the human-readable assembly code so
-> size should not matter. I think we should consider it because I find it
-> to be the clearest syntax. It could even make arrays of function
-> pointers somewhat readable.
-
-My only comment will be to give you an example of why this is a bad
-idea. :)
-
-Here is an example of using the switch statement (with my recommended
-syntax):
-
-switch uint %val, label %otherwise,
- [%3 x {uint, label}] [ { uint %57, label %l1 },
- { uint %20, label %l2 },
- { uint %14, label %l3 } ]
-
-Here it is with the syntax you are proposing:
-
-switch uint %val, label %otherwise,
- array %3 of {uint, label}
- array of {uint, label}
- { uint %57, label %l1 },
- { uint %20, label %l2 },
- { uint %14, label %l3 }
-
-Which is ambiguous and very verbose. It would be possible to specify
-constants with [] brackets as in my syntax, which would look like this:
-
-switch uint %val, label %otherwise,
- array %3 of {uint, label} [ { uint %57, label %l1 },
- { uint %20, label %l2 },
- { uint %14, label %l3 } ]
-
-But then the syntax is inconsistent between type definition and constant
-definition (why do []'s enclose the constants but not the types??).
-
-Anyways, I'm sure that there is much debate still to be had over
-this... :)
-
--Chris
-
-http://www.nondot.org/~sabre/os/
-http://www.nondot.org/MagicStats/
-http://korbit.sourceforge.net/
-
-
diff --git a/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt b/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt
deleted file mode 100644
index 2c7534d9da..0000000000
--- a/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt
+++ /dev/null
@@ -1,39 +0,0 @@
-Date: Tue, 13 Feb 2001 13:29:52 -0600 (CST)
-From: Chris Lattner <sabre@nondot.org>
-To: Vikram S. Adve <vadve@cs.uiuc.edu>
-Subject: LLVM Concerns...
-
-
-I've updated the documentation to include load store and allocation
-instructions (please take a look and let me know if I'm on the right
-track):
-
-file:/home/vadve/lattner/llvm/docs/LangRef.html#memoryops
-
-I have a couple of concerns I would like to bring up:
-
-1. Reference types
- Right now, I've spec'd out the language to have a pointer type, which
- works fine for lots of stuff... except that Java really has
- references: constrained pointers that cannot be manipulated: added and
- subtracted, moved, etc... Do we want to have a type like this? It
- could be very nice for analysis (pointer always points to the start of
- an object, etc...) and more closely matches Java semantics. The
- pointer type would be kept for C++ like semantics. Through analysis,
- C++ pointers could be promoted to references in the LLVM
- representation.
-
-2. Our "implicit" memory references in assembly language:
- After thinking about it, this model has two problems:
- A. If you do pointer analysis and realize that two stores are
- independent and can share the same memory source object, there is
- no way to represent this in either the bytecode or assembly.
- B. When parsing assembly/bytecode, we effectively have to do a full
- SSA generation/PHI node insertion pass to build the dependencies
- when we don't want the "pinned" representation. This is not
- cool.
- I'm tempted to make memory references explicit in both the assembly and
- bytecode to get around this... what do you think?
-
--Chris
-
diff --git a/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt b/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt
deleted file mode 100644
index 505343378d..0000000000
--- a/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt
+++ /dev/null
@@ -1,47 +0,0 @@
-Date: Tue, 13 Feb 2001 18:25:42 -0600
-From: Vikram S. Adve <vadve@cs.uiuc.edu>
-To: Chris Lattner <sabre@nondot.org>
-Subject: RE: LLVM Concerns...
-
-> 1. Reference types
-> Right now, I've spec'd out the language to have a pointer type, which
-> works fine for lots of stuff... except that Java really has
-> references: constrained pointers that cannot be manipulated: added and
-> subtracted, moved, etc... Do we want to have a type like this? It
-> could be very nice for analysis (pointer always points to the start of
-> an object, etc...) and more closely matches Java semantics. The
-> pointer type would be kept for C++ like semantics. Through analysis,
-> C++ pointers could be promoted to references in the LLVM
-> representation.
-
-
-You're right, having references would be useful. Even for C++ the *static*
-compiler could generate references instead of pointers with fairly
-straightforward analysis. Let's include a reference type for now. But I'm
-also really concerned that LLVM is becoming big and complex and (perhaps)
-too high-level. After we get some initial performance results, we may have
-a clearer idea of what our goals should be and we should revisit this
-question then.
-
-> 2. Our "implicit" memory references in assembly language:
-> After thinking about it, this model has two problems:
-> A. If you do pointer analysis and realize that two stores are
-> independent and can share the same memory source object,
-
-not sure what you meant by "share the same memory source object"
-
-> there is
-> no way to represent this in either the bytecode or assembly.
-> B. When parsing assembly/bytecode, we effectively have to do a full
-> SSA generation/PHI node insertion pass to build the dependencies
-> when we don't want the "pinned" representation. This is not
-> cool.
-
-I understand the concern. But again, let's focus on the performance first
-and then look at the language design issues. E.g., it would be good to know
-how big the bytecode files are before expanding them further. I am pretty
-keen to explore the implications of LLVM for mobile devices. Both bytecode
-size and power consumption are important to consider there.
-
---Vikram
-
diff --git a/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt b/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt
deleted file mode 100644
index 5f7843ab56..0000000000
--- a/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt
+++ /dev/null
@@ -1,49 +0,0 @@
-By Chris:
-
-LLVM has been designed with two primary goals in mind. First we strive to
-enable the best possible division of labor between static and dynamic
-compilers, and second, we need a flexible and powerful interface
-between these two complementary stages of compilation. We feel that
-providing a solution to these two goals will yield an excellent solution
-to the performance problem faced by modern architectures and programming
-languages.
-
-A key insight into current compiler and runtime systems is that a
-compiler may fall in anywhere in a "continuum of compilation" to do its
-job. On one side, scripting languages statically compile nothing and
-dynamically compile (or equivalently, interpret) everything. On the far
-other side, traditional static compilers process everything statically and
-nothing dynamically. These approaches have typically been seen as a
-tradeoff between performance and portability. On a deeper level, however,
-there are two reasons that optimal system performance may be obtained by a
-system somewhere in between these two extremes: Dynamic application
-behavior and social constraints.
-
-From a technical perspective, pure static compilation cannot ever give
-optimal performance in all cases, because applications have varying dynamic
-behavior that the static compiler cannot take into consideration. Even
-compilers that support profile guided optimization generate poor code in
-the real world, because using such optimization tunes that application
-to one particular usage pattern, whereas real programs (as opposed to
-benchmarks) often have several different usage patterns.
-
-On a social level, static compilation is a very shortsighted solution to
-the performance problem. Instruction set architectures (ISAs) continuously
-evolve, and each implementation of an ISA (a processor) must choose a set
-of tradeoffs that make sense in the market context that it is designed for.
-With every new processor introduced, the vendor faces two fundamental
-problems: First, there is a lag time between when a processor is introduced
-to when compilers generate quality code for the architecture. Secondly,
-even when compilers catch up to the new architecture there is often a large
-body of legacy code that was compiled for previous generations and will
-not or can not be upgraded. Thus a large percentage of code running on a
-processor may be compiled quite sub-optimally for the current
-characteristics of the dynamic execution environment.
-
-For these reasons, LLVM has been designed from the beginning as a long-term
-solution to these problems. Its design allows the large body of platform
-independent, static, program optimizations currently in compilers to be
-reused unchanged in their current form. It also provides important static
-type information to enable powerful dynamic and link time optimizations
-to be performed quickly and efficiently. This combination enables an
-increase in effective system performance for real world environments.
diff --git a/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt b/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt
deleted file mode 100644
index b546301d35..0000000000
--- a/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt
+++ /dev/null
@@ -1,202 +0,0 @@
-Meeting notes: Implementation idea: Exception Handling in C++/Java
-
-The 5/18/01 meeting discussed ideas for implementing exceptions in LLVM.
-We decided that the best solution requires a set of library calls provided by
-the VM, as well as an extension to the LLVM function invocation syntax.
-
-The LLVM function invocation instruction previously looks like this (ignoring
-types):
-
- call func(arg1, arg2, arg3)
-
-The extension discussed today adds an optional "with" clause that
-associates a label with the call site. The new syntax looks like this:
-
- call func(arg1, arg2, arg3) with funcCleanup
-
-This funcHandler always stays tightly associated with the call site (being
-encoded directly into the call opcode itself), and should be used whenever
-there is cleanup work that needs to be done for the current function if
-an exception is thrown by func (or if we are in a try block).
-
-To support this, the VM/Runtime provide the following simple library
-functions (all syntax in this document is very abstract):
-
-typedef struct { something } %frame;
- The VM must export a "frame type", that is an opaque structure used to
- implement different types of stack walking that may be used by various
- language runtime libraries. We imagine that it would be typical to
- represent a frame with a PC and frame pointer pair, although that is not
- required.
-
-%frame getStackCurrentFrame();
- Get a frame object for the current function. Note that if the current
- function was inlined into its caller, the "current" frame will belong to
- the "caller".
-
-bool isFirstFrame(%frame f);
- Returns true if the specified frame is the top level (first activated) frame
- for this thread. For the main thread, this corresponds to the main()
- function, for a spawned thread, it corresponds to the thread function.
-
-%frame getNextFrame(%frame f);
- Return the previous frame on the stack. This function is undefined if f
- satisfies the predicate isFirstFrame(f).
-
-Label *getFrameLabel(%frame f);
- If a label was associated with f (as discussed below), this function returns
- it. Otherwise, it returns a null pointer.
-
-doNonLocalBranch(Label *L);
- At this point, it is not clear whether this should be a function or
- intrinsic. It should probably be an intrinsic in LLVM, but we'll deal with
- this issue later.
-
-
-Here is a motivating example that illustrates how these facilities could be
-used to implement the C++ exception model:
-
-void TestFunction(...) {
- A a; B b;
- foo(); // Any function call may throw
- bar();
- C c;
-
- try {
- D d;
- baz();
- } catch (int) {
- ...int Stuff...
- // execution continues after the try block: the exception is consumed
- } catch (double) {
- ...double stuff...
- throw; // Exception is propogated
- }
-}
-
-This function would compile to approximately the following code (heavy
-pseudo code follows):
-
-Func:
- %a = alloca A
- A::A(%a) // These ctors & dtors could throw, but we ignore this
- %b = alloca B // minor detail for this example
- B::B(%b)
-
- call foo() with fooCleanup // An exception in foo is propogated to fooCleanup
- call bar() with barCleanup // An exception in bar is propogated to barCleanup
-
- %c = alloca C
- C::C(c)
- %d = alloca D
- D::D(d)
- call baz() with bazCleanup // An exception in baz is propogated to bazCleanup
- d->~D();
-EndTry: // This label corresponds to the end of the try block
- c->~C() // These could also throw, these are also ignored
- b->~B()
- a->~A()
- return
-
-Note that this is a very straight forward and literal translation: exactly
-what we want for zero cost (when unused) exception handling. Especially on
-platforms with many registers (ie, the IA64) setjmp/longjmp style exception
-handling is *very* impractical. Also, the "with" clauses describe the
-control flow paths explicitly so that analysis is not adversly effected.
-
-The foo/barCleanup labels are implemented as:
-
-TryCleanup: // Executed if an exception escapes the try block
- c->~C()
-barCleanup: // Executed if an exception escapes from bar()
- // fall through
-fooCleanup: // Executed if an exception escapes from foo()
- b->~B()
- a->~A()
- Exception *E = getThreadLocalException()
- call throw(E) // Implemented by the C++ runtime, described below
-
-Which does the work one would expect. getThreadLocalException is a function
-implemented by the C++ support library. It returns the current exception
-object for the current thread. Note that we do not attempt to recycle the
-shutdown code from before, because performance of the mainline code is
-critically important. Also, obviously fooCleanup and barCleanup may be
-merged and one of them eliminated. This just shows how the code generator
-would most likely emit code.
-
-The bazCleanup label is more interesting. Because the exception may be caught
-by the try block, we must dispatch to its handler... but it does not exist
-on the call stack (it does not have a VM Call->Label mapping installed), so
-we must dispatch statically with a goto. The bazHandler thus appears as:
-
-bazHandler:
- d->~D(); // destruct D as it goes out of scope when entering catch clauses
- goto TryHandler
-
-In general, TryHandler is not the same as bazHandler, because multiple
-function calls could be made from the try block. In this case, trivial
-optimization could merge the two basic blocks. TryHandler is the code
-that actually determines the type of exception, based on the Exception object
-itself. For this discussion, assume that the exception object contains *at
-least*:
-
-1. A pointer to the RTTI info for the contained object
-2. A pointer to the dtor for the contained object
-3. The contained object itself
-
-Note that it is necessary to maintain #1 & #2 in the exception object itself
-because objects without virtual function tables may be thrown (as in this
-example). Assuming this, TryHandler would look something like this:
-
-TryHandler:
- Exception *E = getThreadLocalException();
- switch (E->RTTIType) {
- case IntRTTIInfo:
- ...int Stuff... // The action to perform from the catch block
- break;
- case DoubleRTTIInfo:
- ...double Stuff... // The action to perform from the catch block
- goto TryCleanup // This catch block rethrows the exception
- break; // Redundant, eliminated by the optimizer
- default:
- goto TryCleanup // Exception not caught, rethrow
- }
-
- // Exception was consumed
- if (E->dtor)
- E->dtor(E->object) // Invoke the dtor on the object if it exists
- goto EndTry // Continue mainline code...
-
-And that is all there is to it.
-
-The throw(E) function would then be implemented like this (which may be
-inlined into the caller through standard optimization):
-
-function throw(Exception *E) {
- // Get the start of the stack trace...
- %frame %f = call getStackCurrentFrame()
-
- // Get the label information that corresponds to it
- label * %L = call getFrameLabel(%f)
- while (%L == 0 && !isFirstFrame(%f)) {
- // Loop until a cleanup handler is found
- %f = call getNextFrame(%f)
- %L = call getFrameLabel(%f)
- }
-
- if (%L != 0) {
- call setThreadLocalException(E) // Allow handlers access to this...
- call doNonLocalBranch(%L)
- }
- // No handler found!
- call BlowUp() // Ends up calling the terminate() method in use
-}
-
-That's a brief rundown of how C++ exception handling could be implemented in
-llvm. Java would be very similar, except it only uses destructors to unlock
-synchronized blocks, not to destroy data. Also, it uses two stack walks: a
-nondestructive walk that builds a stack trace, then a destructive walk that
-unwinds the stack as shown here.
-
-It would be trivial to get exception interoperability between C++ and Java.
-
diff --git a/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt b/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt
deleted file mode 100644
index 3375365f54..0000000000
--- a/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt
+++ /dev/null
@@ -1,45 +0,0 @@
-Date: Sat, 19 May 2001 19:09:13 -0500 (CDT)
-From: Chris Lattner <sabre@nondot.org>
-To: Vikram S. Adve <vadve@cs.uiuc.edu>
-Subject: RE: Meeting writeup
-
-> I read it through and it looks great!
-
-Thanks!
-
-> The finally clause in Java may need more thought. The code for this clause
-> is like a subroutine because it needs to be entered from many points (end of
-> try block and beginning of each catch block), and then needs to *return to
-> the place from where the code was entered*. That's why JVM has the
-> jsr/jsr_w instruction.
-
-Hrm... I guess that is an implementation decision. It can either be
-modelled as a subroutine (as java bytecodes do), which is really
-gross... or it can be modelled as code duplication (emitted once inline,
-then once in the exception path). Because this could, at worst,
-slightly less than double the amount of code in a function (it is
-bounded) I don't think this is a big deal. One of the really nice things
-about the LLVM representation is that it still allows for runtime code
-generation for exception paths (exceptions paths are not compiled until
-needed). Obviously a static compiler couldn't do this though. :)
-
-In this case, only one copy of the code would be compiled... until the
-other one is needed on demand. Also this strategy fits with the "zero
-cost" exception model... the standard case is not burdened with extra
-branches or "call"s.
-
-> I suppose you could save the return address in a particular register
-> (specific to this finally block), jump to the finally block, and then at the
-> end of the finally block, jump back indirectly through this register. It
-> will complicate building the CFG but I suppose that can be handled. It is
-> also unsafe in terms of checking where control returns (which is I suppose
-> why the JVM doesn't use this).
-
-I think that a code duplication method would be cleaner, and would avoid
-the caveats that you mention. Also, it does not slow down the normal case
-with an indirect branch...
-
-Like everything, we can probably defer a final decision until later. :)
-
--Chris
-
diff --git a/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt b/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt
deleted file mode 100644
index 97af16a2da..0000000000
--- a/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt
+++ /dev/null
@@ -1,63 +0,0 @@
-Date: Fri, 1 Jun 2001 16:38:17 -0500 (CDT)
-From: Chris Lattner <sabre@nondot.org>
-To: Vikram S. Adve <vadve@cs.uiuc.edu>
-Subject: Interesting: GCC passes
-
-
-Take a look at this document (which describes the order of optimizations
-that GCC performs):
-
-http://gcc.gnu.org/onlinedocs/gcc_17.html
-
-The rundown is that after RTL generation, the following happens:
-
-1 . [t] jump optimization (jumps to jumps, etc)
-2 . [t] Delete unreachable code
-3 . Compute live ranges for CSE
-4 . [t] Jump threading (jumps to jumps with identical or inverse conditions)
-5 . [t] CSE
-6 . *** Conversion to SSA
-7 . [t] SSA Based DCE
-8 . *** Conversion to LLVM
-9 . UnSSA
-10. GCSE
-11. LICM
-12. Strength Reduction
-13. Loop unrolling
-14. [t] CSE
-15. [t] DCE
-16. Instruction combination, register movement, scheduling... etc.
-
-I've marked optimizations with a [t] to indicate things that I believe to
-be relatively trivial to implement in LLVM itself. The time consuming
-things to reimplement would be SSA based PRE, Strength reduction & loop
-unrolling... these would be the major things we would miss out on if we
-did LLVM creation from tree code [inlining and other high level
-optimizations are done on the tree representation].
-
-Given the lack of "strong" optimizations that would take a long time to
-reimplement, I am leaning a bit more towards creating LLVM from the tree
-code. Especially given that SGI has GPL'd their compiler, including many
-SSA based optimizations that could be adapted (besides the fact that their
-code looks MUCH nicer than GCC :)
-
-Even if we choose to do LLVM code emission from RTL, we will almost
-certainly want to move LLVM emission from step 8 down until at least CSE
-has been rerun... which causes me to wonder if the SSA generation code
-will still work (due to global variable dependencies and stuff). I assume
-that it can be made to work, but might be a little more involved than we
-would like.
-
-I'm continuing to look at the Tree -> RTL code. It is pretty gross
-because they do some of the translation a statement at a time, and some
-of it a function at a time... I'm not quite clear why and how the
-distinction is drawn, but it does not appear that there is a wonderful
-place to attach extra info.
-
-Anyways, I'm proceeding with the RTL -> LLVM conversion phase for now. We
-can talk about this more on Monday.
-
-Wouldn't it be nice if there were a obvious decision to be made? :)
-
--Chris
-
diff --git a/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt b/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt
deleted file mode 100644
index 6c9e0971a0..0000000000
--- a/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt
+++ /dev/null
@@ -1,71 +0,0 @@
-Date: Fri, 1 Jun 2001 17:08:44 -0500 (CDT)
-From: Chris Lattner <sabre@nondot.org>
-To: Vikram S. Adve <vadve@cs.uiuc.edu>
-Subject: RE: Interesting: GCC passes
-
-> That is very interesting. I agree that some of these could be done on LLVM
-> at link-time, but it is the extra time required that concerns me. Link-time
-> optimization is severely time-constrained.
-
-If we were to reimplement any of these optimizations, I assume that we
-could do them a translation unit at a time, just as GCC does now. This
-would lead to a pipeline like this:
-
-Static optimizations, xlation unit at a time:
-.c --GCC--> .llvm --llvmopt--> .llvm
-
-Link time optimizations:
-.llvm --llvm-ld--> .llvm --llvm-link-opt--> .llvm
-
-Of course, many optimizations could be shared between llvmopt and
-llvm-link-opt, but the wouldn't need to be shared... Thus compile time
-could be faster, because we are using a "smarter" IR (SSA based).
-
-> BTW, about SGI, "borrowing" SSA-based optimizations from one compiler and
-> putting it into another is not necessarily easier than re-doing it.
-> Optimization code is usually heavily tied in to the specific IR they use.
-
-Understood. The only reason that I brought this up is because SGI's IR is
-more similar to LLVM than it is different in many respects (SSA based,
-relatively low level, etc), and could be easily adapted. Also their
-optimizations are written in C++ and are actually somewhat
-structured... of course it would be no walk in the park, but it would be
-much less time consuming to adapt, say, SSA-PRE than to rewrite it.
-
-> But your larger point is valid that adding SSA based optimizations is
-> feasible and should be fun. (Again, link time cost is the issue.)
-
-Assuming linktime cost wasn't an issue, the question is:
-Does using GCC's backend buy us anything?
-
-> It also occurs to me that GCC is probably doing quite a bit of back-end
-> optimization (step 16 in your list). Do you have a breakdown of that?
-
-Not really. The irritating part of GCC is that it mixes it all up and
-doesn't have a clean seperation of concerns. A lot of the "back end
-optimization" happens right along with other data optimizations (ie, CSE
-of machine specific things).
-
-As far as REAL back end optimizations go, it looks something like this:
-
-1. Instruction combination: try to make CISCy instructions, if available
-2. Register movement: try to get registers in the right places for the
-architecture to avoid register to register moves. For example, try to get
-the first argument of a function to naturally land in %o0 for sparc.
-3. Instruction scheduling: 'nuff said :)
-4. Register class preferencing: ??
-5. Local register allocation
-6. global register allocation
-7. Spilling
-8. Local regalloc
-9. Jump optimization
-10. Delay slot scheduling
-11. Branch shorting for CISC machines
-12. Instruction selection & peephole optimization
-13. Debug info output
-
-But none of this would be usable for LLVM anyways, unless we were using
-GCC as a static compiler.
-
--Chris
-
diff --git a/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt b/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt
deleted file mode 100644
index 1bc2eae746..0000000000
--- a/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-Date: Wed, 20 Jun 2001 12:32:22 -0500
-From: Vikram Adve <vadve@cs.uiuc.edu>
-To: Chris Lattner <lattner@cs.uiuc.edu>
-Subject: .NET vs. our VM
-
-One significant difference between .NET CLR and our VM is that the CLR
-includes full information about classes and inheritance. In fact, I just
-sat through the paper on adding templates to .NET CLR, and the speaker
-indicated that the goal seems to be to do simple static compilation (very
-little lowering or optimization). Also, the templates implementation in CLR
-"relies on dynamic class loading and JIT compilation".
-
-This is an important difference because I think there are some significant
-advantages to have a much lower level VM layer, and do significant static
-analysis and optimization.
-
-I also talked to the lead guy for KAI's C++ compiler (Arch Robison) and he
-said that SGI and other commercial compilers have included options to export
-their *IR* next to the object code (i.e., .il files) and use them for
-link-time code generation. In fact, he said that the .o file was nearly
-empty and was entirely generated from the .il at link-time. But he agreed
-that this limited the link-time interprocedural optimization to modules
-compiled by the same compiler, whereas our approach allows us to link and
-optimize modules from multiple different compilers. (Also, of course, they
-don't do anything for runtime optimization).
-
-All issues to bring up in Related Work.
-
---Vikram
-
diff --git a/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt b/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt
deleted file mode 100644
index 3e10416fe6..0000000000
--- a/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt
+++ /dev/null
@@ -1,31 +0,0 @@
-Date: Fri, 6 Jul 2001 16:56:56 -0500
-From: Vikram S. Adve <vadve@cs.uiuc.edu>
-To: Chris Lattner <lattner@cs.uiuc.edu>
-Subject: lowering the IR
-
-BTW, I do think that we should consider lowering the IR as you said. I
-didn't get time to raise it today, but it comes up with the SPARC
-move-conditional instruction. I don't think we want to put that in the core
-VM -- it is a little too specialized. But without a corresponding
-conditional move instruction in the VM, it is pretty difficult to maintain a
-close mapping between VM and machine code. Other architectures may have
-other such instructions.
-
-What I was going to suggest was that for a particular processor, we define
-additional VM instructions that match some of the unusual opcodes on the
-processor but have VM semantics otherwise, i.e., all operands are in SSA
-form and typed. This means that we can re-generate core VM code from the
-more specialized code any time we want (so that portability is not lost).
-
-Typically, a static compiler like gcc would generate just the core VM, which
-is relatively portable. Anyone (an offline tool, the linker, etc., or even
-the static compiler itself if it chooses) can transform that into more
-specialized target-specific VM code for a particular architecture. If the
-linker does it, it can do it after all machine-independent optimizations.
-This would be the most convenient, but not necessary.
-
-The main benefit of lowering will be that we will be able to retain a close
-mapping between VM and machine code.
-
---Vikram
-
diff --git a/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt b/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt
deleted file mode 100644
index 9379081018..0000000000
--- a/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt
+++ /dev/null
@@ -1,56 +0,0 @@
-Date: Tue, 18 Sep 2001 00:38:37 -0500 (CDT)
-From: Chris Lattner <sabre@nondot.org>
-To: Vikram S. Adve <vadve@cs.uiuc.edu>
-Subject: Idea for a simple, useful link time optimization
-
-
-In C++ programs, exceptions suck, and here's why:
-
-1. In virtually all function calls, you must assume that the function
- throws an exception, unless it is defined as 'nothrow'. This means
- that every function call has to have code to invoke dtors on objects
- locally if one is thrown by the function. Most functions don't throw
- exceptions, so this code is dead [with all the bad effects of dead
- code, including icache pollution].
-2. Declaring a function nothrow causes catch blocks to be added to every
- call that isnot provably nothrow. This makes them very slow.
-3. Extra extraneous exception edges reduce the opportunity for code
- motion.
-4. EH is typically implemented with large lookup tables. Ours is going to
- be much smaller (than the "standard" way of doing it) to start with,
- but eliminating it entirely would be nice. :)
-5. It is physically impossible to correctly put (accurate, correct)
- exception specifications on generic, templated code. But it is trivial
- to analyze instantiations of said code.
-6. Most large C++ programs throw few exceptions. Most well designed
- programs only throw exceptions in specific planned portions of the
- code.
-
-Given our _planned_ model of handling exceptions, all of this would be
-pretty trivial to eliminate through some pretty simplistic interprocedural
-analysis. The DCE factor alone could probably be pretty significant. The
-extra code motion opportunities could also be exploited though...
-
-Additionally, this optimization can be implemented in a straight forward
-conservative manner, allowing libraries to be optimized or individual
-files even (if there are leaf functions visible in the translation unit
-that are called).
-
-I think it's a reasonable optimization that hasn't really been addressed
-(because assembly is way too low level for this), and could have decent
-payoffs... without being a overly complex optimization.
-
-After I wrote all of that, I found this page that is talking about
-basically the same thing I just wrote, except that it is translation unit
-at a time, tree based approach:
-http://www.ocston.org/~jls/ehopt.html
-
-but is very useful from "expected gain" and references perspective. Note
-that their compiler is apparently unable to inline functions that use
-exceptions, so there numbers are pretty worthless... also our results
-would (hopefully) be better because it's interprocedural...
-
-What do you think?
-
--Chris
-
diff --git a/docs/HistoricalNotes/2002-05-12-InstListChange.txt b/docs/HistoricalNotes/2002-05-12-InstListChange.txt
deleted file mode 100644
index 004edb068d..0000000000
--- a/docs/HistoricalNotes/2002-05-12-InstListChange.txt
+++ /dev/null
@@ -1,55 +0,0 @@
-Date: Sun, 12 May 2002 17:12:53 -0500 (CDT)
-From: Chris Lattner <sabre@nondot.org>
-To: "Vikram S. Adve" <vadve@cs.uiuc.edu>
-Subject: LLVM change
-
-There is a fairly fundemental change that I would like to make to the LLVM
-infrastructure, but I'd like to know if you see any drawbacks that I
-don't...
-
-Basically right now at the basic block level, each basic block contains an
-instruction list (returned by getInstList()) that is a ValueHolder of
-instructions. To iterate over instructions, we must actually iterate over
-the instlist, and access the instructions through the instlist.
-
-To add or remove an instruction from a basic block, we need to get an
-iterator to an instruction, which, given just an Instruction*, requires a
-linear search of the basic block the instruction is contained in... just
-to insert an instruction before another instruction, or to delete an
-instruction! This complicates algorithms that should be very simple (like
-simple constant propogation), because they aren't actually sparse anymore,
-they have to traverse basic blocks to remove constant propogated
-instructions.
-
-Additionally, adding or removing instructions to a basic block
-_invalidates all iterators_ pointing into that block, which is really
-irritating.
-
-To fix these problems (and others), I would like to make the ordering of
-the instructions be represented with a doubly linked list in the
-instructions themselves, instead of an external data structure. This is
-how many other representations do it, and frankly I can't remember why I
-originally implemented it the way I did.
-
-Long term, all of the code that depends on the nasty features in the
-instruction list (which can be found by grep'ing for getInstList()) will
-be changed to do nice local transformations. In the short term, I'll
-change the representation, but preserve the interface (including
-getInstList()) so that all of the code doesn't have to change.
-
-Iteration over the instructions in a basic block remains the simple:
-for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) ...
-
-But we will also support:
-for (Instruction *I = BB->front(); I; I = I->getNext()) ...
-
-After converting instructions over, I'll convert basic blocks and
-functions to have a similar interface.
-
-The only negative aspect of this change that I see is that it increases
-the amount of memory consumed by one pointer per instruction. Given the
-benefits, I think this is a very reasonable tradeoff.
-
-What do you think?
-
--Chris
diff --git a/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt b/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt
deleted file mode 100644
index 2ca46117ca..0000000000
--- a/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt
+++ /dev/null
@@ -1,72 +0,0 @@
-Changes:
-* Change the casting code to be const correct. Now, doing this is invalid:
- const Value *V = ...;
- Instruction *I = dyn_cast<Instruction>(V);
- instead, the second line should be:
- const Instruction *I = dyn_cast<Instruction>(V);
-
-* Change the casting code to allow casting a reference value thus:
- const Value &V = ...;
- Instruction &I = cast<Instruction>(V);
-
- dyn_cast does not work with references, because it must return a null pointer
- on failure.
-
-* Fundamentally change how instructions and other values are represented.
- Before, every llvm container was an instance of the ValueHolder template,
- instantiated for each container type. This ValueHolder was effectively a
- wrapper around a vector of pointers to the sub-objects.
-
- Now, instead of having a vector to pointers of objects, the objects are
- maintained in a doubly linked list of values (ie each Instruction now has
- Next & Previous fields). The containers are now instances of ilist (intrusive
- linked list class), which use the next and previous fields to chain them
- together. The advantage of this implementation is that iterators can be
- formed directly from pointers to the LLVM value, and invalidation is much
- easier to handle.
-
-* As part of the above change, dereferencing an iterator (for example:
- BasicBlock::iterator) now produces a reference to the underlying type (same
- example: Instruction&) instead of a pointer to the underlying object. This
- makes it much easier to write nested loops that iterator over things, changing
- this:
-
- for (Function::iterator BI = Func->begin(); BI != Func->end(); ++BI)
- for (BasicBlock::iterator II = (*BI)->begin(); II != (*BI)->end(); ++II)
- (*II)->dump();
-
- into:
-
- for (Function::iterator BI = Func->begin(); BI != Func->end(); ++BI)
- for (BasicBlock::iterator II = BI->begin(); II != BI->end(); ++II)
- II->dump();
-
- which is much more natural and what users expect.
-
-* Simplification of #include's: Before, it was necessary for a .cpp file to
- include every .h file that it used. Now things are batched a little bit more
- to make it easier to use. Specifically, the include graph now includes these
- edges:
- Module.h -> Function.h, GlobalVariable.h
- Function.h -> BasicBlock.h, Argument.h
- BasicBlock.h -> Instruction.h
-
- Which means that #including Function.h is usually sufficient for getting the
- lower level #includes.
-
-* Printing out a Value* has now changed: Printing a Value* will soon print out
- the address of the value instead of the contents of the Value. To print out
- the contents, you must convert it to a reference with (for example)
- 'cout << *I' instead of 'cout << I;'. This conversion is not yet complete,
- but will be eventually. In the mean time, both forms print out the contents.
-
-* References are used much more throughout the code base. In general, if a
- pointer is known to never be null, it is passed in as a reference instead of a
- pointer. For example, the instruction visitor class uses references instead
- of pointers, and that Pass subclasses now all receive references to Values
- instead of pointers, because they may never be null.
-
-* The Function class now has helper functions for accessing the Arguments list.
- Instead of having to go through getArgumentList for simple things like
- iterator over the arguments, now the a*() methods can be used to access them.
-
diff --git a/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt b/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt
deleted file mode 100644
index fbe811d627..0000000000
--- a/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt
+++ /dev/null
@@ -1,28 +0,0 @@
-Date: Mon, 20 Jan 2003 00:00:28 -0600
-From: Brian R. Gaeke <gaeke@uiuc.edu>
-Subject: windows vs. llvm
-
-If you're interested, here are some of the major problems compiling LLVM
-under Cygwin and/or Mingw.
-
-1. Cygwin doesn't have <inttypes.h> or <stdint.h>, so all the INT*_MAX
- symbols and standard int*_t types are off in limbo somewhere. Mingw has
- <stdint.h>, but Cygwin doesn't like it.
-
-2. Mingw doesn't have <dlfcn.h> (because Windows doesn't have it.)
-
-3. SA_SIGINFO and friends are not around; only signal() seems to work.
-
-4. Relink, aka ld -r, doesn't work (probably an ld bug); you need
- DONT_BUILD_RELINKED. This breaks all the tools makefiles; you just need to
- change them to have .a's.
-
-5. There isn't a <values.h>.
-
-6. There isn't a mallinfo() (or, at least, it's documented, but it doesn't seem
- to link).
-
-7. The version of Bison that cygwin (and newer Linux versions) comes with
- does not like = signs in rules. Burg's gram.yc source file uses them. I think
- you can just take them out.
-
diff --git a/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt b/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt
deleted file mode 100644
index a745784639..0000000000
--- a/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt
+++ /dev/null
@@ -1,137 +0,0 @@
-Wed Jun 25 15:13:51 CDT 2003
-
-First-level instrumentation
----------------------------
-
-We use opt to do Bytecode-to-bytecode instrumentation. Look at
-back-edges and insert llvm_first_trigger() function call which takes
-no arguments and no return value. This instrumentation is designed to
-be easy to remove, for instance by writing a NOP over the function
-call instruction.
-
-Keep count of every call to llvm_first_trigger(), and maintain
-counters in a map indexed by return address. If the trigger count
-exceeds a threshold, we identify a hot loop and perform second-level
-instrumentation on the hot loop region (the instructions between the
-target of the back-edge and the branch that causes the back-edge). We
-do not move code across basic-block boundaries.
-
-
-Second-level instrumentation
----------------------------
-
-We remove the first-level instrumentation by overwriting the CALL to
-llvm_first_trigger() with a NOP.
-
-The reoptimizer maintains a map between machine-code basic blocks and
-LLVM BasicBlock*s. We only keep track of paths that start at the
-first machine-code basic block of the hot loop region.
-
-How do we keep track of which edges to instrument, and which edges are
-exits from the hot region? 3 step process.
-
-1) Do a DFS from the first machine-code basic block of the hot loop
-region and mark reachable edges.
-
-2) Do a DFS from the last machine-code basic block of the hot loop
-region IGNORING back edges, and mark the edges which are reachable in
-1) and also in 2) (i.e., must be reachable from both the start BB and
-the end BB of the hot region).
-
-3) Mark BBs which end in edges that exit the hot region; we need to
-instrument these differently.
-
-Assume that there is 1 free register. On SPARC we use %g1, which LLC
-has agreed not to use. Shift a 1 into it at the beginning. At every
-edge which corresponds to a conditional branch, we shift 0 for not
-taken and 1 for taken into a register. This uniquely numbers the paths
-through the hot region. Silently fail if we need more than 64 bits.
-
-At the end BB we call countPath and increment the counter based on %g1
-and the return address of the countPath call. We keep track of the
-number of iterations and the number of paths. We only run this
-version 30 or 40 times.
-
-Find the BBs that total 90% or more of execution, and aggregate them
-together to form our trace. But we do not allow more than 5 paths; if
-we have more than 5 we take the ones that are executed the most. We
-verify our assumption that we picked a hot back-edge in first-level
-instrumentation, by making sure that the number of times we took an
-exit edge from the hot trace is less than 10% of the number of
-iterations.
-
-LLC has been taught to recognize llvm_first_trigger() calls and NOT
-generate saves and restores of caller-saved registers around these
-calls.
-
-
-Phase behavior
---------------
-
-We turn off llvm_first_trigger() calls with NOPs, but this would hide
-phase behavior from us (when some funcs/traces stop being hot and
-others become hot.)
-
-We have a SIGALRM timer that counts time for us. Every time we get a
-SIGALRM we look at our priority queue of locations where we have
-removed llvm_first_trigger() calls. Each location is inserted along
-with a time when we will next turn instrumentation back on for that
-call site. If the time has arrived for a particular call site, we pop
-that off the prio. queue and turn instrumentation back on for that
-call site.
-
-
-Generating traces
------------------
-
-When we finally generate an optimized trace we first copy the code
-into the trace cache. This leaves us with 3 copies of the code: the
-original code, the instrumented code, and the optimized trace. The
-optimized trace does not have instrumentation. The original code and
-the instrumented code are modified to have a branch to the trace
-cache, where the optimized traces are kept.
-
-We copy the code from the original to the instrumentation version
-by tracing the LLVM-to-Machine code basic block map and then copying
-each machine code basic block we think is in the hot region into the
-trace cache. Then we instrument that code. The process is similar for
-generating the final optimized trace; we copy the same basic blocks
-because we might need to put in fixup code for exit BBs.
-
-LLVM basic blocks are not typically used in the Reoptimizer except
-for the mapping information.
-
-We are restricted to using single instructions to branch between the
-original code, trace, and instrumented code. So we have to keep the
-code copies in memory near the original code (they can't be far enough
-away that a single pc-relative branch would not work.) Malloc() or
-data region space is too far away. this impacts the design of the
-trace cache.
-
-We use a dummy function that is full of a bunch of for loops which we
-overwrite with trace-cache code. The trace manager keeps track of
-whether or not we have enough space in the trace cache, etc.
-
-The trace insertion routine takes an original start address, a vector
-of machine instructions representing the trace, index of branches and
-their corresponding absolute targets, and index of calls and their
-corresponding absolute targets.
-
-The trace insertion routine is responsible for inserting branches from
-the beginning of the original code to the beginning of the optimized
-trace. This is because at some point the trace cache may run out of
-space and it may have to evict a trace, at which point the branch to
-the trace would also have to be removed. It uses a round-robin
-replacement policy; we have found that this is almost as good as LRU
-and better than random (especially because of problems fitting the new
-trace in.)
-
-We cannot deal with discontiguous trace cache areas. The trace cache
-is supposed to be cache-line-aligned, but it is not page-aligned.
-
-We generate instrumentation traces and optimized traces into separate
-trace caches. We keep the instrumented code around because you don't
-want to delete a trace when you still might have to return to it
-(i.e., return from a llvm_first_trigger() or countPath() call.)
-
-
diff --git a/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt b/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt
deleted file mode 100644
index ec4b93fea0..0000000000
--- a/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt
+++ /dev/null
@@ -1,110 +0,0 @@
-Thu Jun 26 14:43:04 CDT 2003
-
-Information about BinInterface
-------------------------------
-
-Take in a set of instructions with some particular register
-allocation. It allows you to add, modify, or delete some instructions,
-in SSA form (kind of like LLVM's MachineInstrs.) Then re-allocate
-registers. It assumes that the transformations you are doing are safe.
-It does not update the mapping information or the LLVM representation
-for the modified trace (so it would not, for instance, support
-multiple optimization passes; passes have to be aware of and update
-manually the mapping information.)
-
-The way you use it is you take the original code and provide it to
-BinInterface; then you do optimizations to it, then you put it in the
-trace cache.
-
-The BinInterface tries to find live-outs for traces so that it can do
-register allocation on just the trace, and stitch the trace back into
-the original code. It has to preserve the live-ins and live-outs when
-it does its register allocation. (On exits from the trace we have
-epilogues that copy live-outs back into the right registers, but
-live-ins have to be in the right registers.)
-
-
-Limitations of BinInterface
----------------------------
-
-It does copy insertions for PHIs, which it infers from the machine
-code. The mapping info inserted by LLC is not sufficient to determine
-the PHIs.
-
-It does not handle integer or floating-point condition codes and it
-does not handle floating-point register allocation.
-
-It is not aggressively able to use lots of registers.
-
-There is a problem with alloca: we cannot find our spill space for
-spilling registers, normally allocated on the stack, if the trace
-follows an alloca(). What might be an acceptable solution would be to
-disable trace generation on functions that have variable-sized
-alloca()s. Variable-sized allocas in the trace would also probably
-screw things up.
-
-Because of the FP and alloca limitations, the BinInterface is
-completely disabled right now.
-
-
-Demo
-----
-
-This is a demo of the Ball & Larus version that does NOT use 2-level
-profiling.
-
-1. Compile program with llvm-gcc.
-2. Run opt -lowerswitch -paths -emitfuncs on the bytecode.
- -lowerswitch change switch statements to branches
- -paths Ball & Larus path-profiling algorithm
- -emitfuncs emit the table of functions
-3. Run llc to generate SPARC assembly code for the result of step 2.
-4. Use g++ to link the (instrumented) assembly code.
-
-We use a script to do all this:
-------------------------------------------------------------------------------
-#!/bin/sh
-llvm-gcc $1.c -o $1
-opt -lowerswitch -paths -emitfuncs $1.bc > $1.run.bc
-llc -f $1.run.bc
-LIBS=$HOME/llvm_sparc/lib/Debug
-GXX=/usr/dcs/software/evaluation/bin/g++
-$GXX -g -L $LIBS $1.run.s -o $1.run.llc \
-$LIBS/tracecache.o \
-$LIBS/mapinfo.o \
-$LIBS/trigger.o \
-$LIBS/profpaths.o \
-$LIBS/bininterface.o \
-$LIBS/support.o \
-$LIBS/vmcore.o \
-$LIBS/transformutils.o \
-$LIBS/bcreader.o \
--lscalaropts -lscalaropts -lanalysis \
--lmalloc -lcpc -lm -ldl
-------------------------------------------------------------------------------
-
-5. Run the resulting binary. You will see output from BinInterface
-(described below) intermixed with the output from the program.
-
-
-Output from BinInterface
-------------------------
-
-BinInterface's debugging code prints out the following stuff in order:
-
-1. Initial code provided to BinInterface with original register
-allocation.
-
-2. Section 0 is the trace prolog, consisting mainly of live-ins and
-register saves which will be restored in epilogs.
-
-3. Section 1 is the trace itself, in SSA form used by BinInterface,
-along with the PHIs that are inserted.
-PHIs are followed by the copies that implement them.
-Each branch (i.e., out of the trace) is annotated with the
-section number that represents the epilog it branches to.
-
-4. All the other sections starting with Section 2 are trace epilogs.
-Every branch from the trace has to go to some epilog.
-
-5. After the last section is the register allocation output.
diff --git a/docs/HistoricalNotes/2007-OriginalClangReadme.txt b/docs/HistoricalNotes/2007-OriginalClangReadme.txt
deleted file mode 100644
index 611dc9d2c0..0000000000
--- a/docs/HistoricalNotes/2007-OriginalClangReadme.txt
+++ /dev/null
@@ -1,178 +0,0 @@
-//===----------------------------------------------------------------------===//
-// C Language Family Front-end
-//===----------------------------------------------------------------------===//
- Chris Lattner
-
-I. Introduction:
-
- clang: noun
- 1. A loud, resonant, metallic sound.
- 2. The strident call of a crane or goose.
- 3. C-language family front-end toolkit.
-
- The world needs better compiler tools, tools which are built as libraries. This
- design point allows reuse of the tools in new and novel ways. However, building
- the tools as libraries isn't enough: they must have clean APIs, be as
- decoupled from each other as possible, and be easy to modify/extend. This
- requires clean layering, decent design, and avoiding tying the libraries to a
- specific use. Oh yeah, did I mention that we want the resultant libraries to
- be as fast as possible? :)
-
- This front-end is built as a component of the LLVM toolkit that can be used
- with the LLVM backend or independently of it. In this spirit, the API has been
- carefully designed as the following components:
-
- libsupport - Basic support library, reused from LLVM.
-
- libsystem - System abstraction library, reused from LLVM.
-
- libbasic - Diagnostics, SourceLocations, SourceBuffer abstraction,
- file system caching for input source files. This depends on
- libsupport and libsystem.
-
- libast - Provides classes to represent the C AST, the C type system,
- builtin functions, and various helpers for analyzing and
- manipulating the AST (visitors, pretty printers, etc). This
- library depends on libbasic.
-
-
- liblex - C/C++/ObjC lexing and preprocessing, identifier hash table,
- pragma handling, tokens, and macros. This depends on libbasic.
-
- libparse - C (for now) parsing and local semantic analysis. This library
- invokes coarse-grained 'Actions' provided by the client to do
- stuff (e.g. libsema builds ASTs). This depends on liblex.
-
- libsema - Provides a set of parser actions to build a standardized AST
- for programs. AST's are 'streamed' out a top-level declaration
- at a time, allowing clients to use decl-at-a-time processing,
- build up entire translation units, or even build 'whole
- program' ASTs depending on how they use the APIs. This depends
- on libast and libparse.
-
- librewrite - Fast, scalable rewriting of source code. This operates on
- the raw syntactic text of source code, allowing a client
- to insert and delete text in very large source files using
- the same source location information embedded in ASTs. This
- is intended to be a low-level API that is useful for
- higher-level clients and libraries such as code refactoring.
-
- libanalysis - Source-level dataflow analysis useful for performing analyses
- such as computing live variables. It also includes a
- path-sensitive "graph-reachability" engine for writing
- analyses that reason about different possible paths of
- execution through source code. This is currently being
- employed to write a set of checks for finding bugs in software.
-
- libcodegen - Lower the AST to LLVM IR for optimization & codegen. Depends
- on libast.
-
- clang - An example driver, client of the libraries at various levels.
- This depends on all these libraries, and on LLVM VMCore.
-
- This front-end has been intentionally built as a DAG of libraries, making it
- easy to reuse individual parts or replace pieces if desired. For example, to
- build a preprocessor, you take the Basic and Lexer libraries. If you want an
- indexer, you take those plus the Parser library and provide some actions for
- indexing. If you want a refactoring, static analysis, or source-to-source
- compiler tool, it makes sense to take those plus the AST building and semantic
- analyzer library. Finally, if you want to use this with the LLVM backend,
- you'd take these components plus the AST to LLVM lowering code.
-
- In the future I hope this toolkit will grow to include new and interesting
- components, including a C++ front-end, ObjC support, and a whole lot of other
- things.
-
- Finally, it should be pointed out that the goal here is to build something that
- is high-quality and industrial-strength: all the obnoxious features of the C
- family must be correctly supported (trigraphs, preprocessor arcana, K&R-style
- prototypes, GCC/MS extensions, etc). It cannot be used if it is not 'real'.
-
-
-II. Usage of clang driver:
-
- * Basic Command-Line Options:
- - Help: clang --help
- - Standard GCC options accepted: -E, -I*, -i*, -pedantic, -std=c90, etc.
- - To make diagnostics more gcc-like: -fno-caret-diagnostics -fno-show-column
- - Enable metric printing: -stats
-
- * -fsyntax-only is currently the default mode.
-
- * -E mode works the same way as GCC.
-
- * -Eonly mode does all preprocessing, but does not print the output,
- useful for timing the preprocessor.
-
- * -fsyntax-only is currently partially implemented, lacking some
- semantic analysis (some errors and warnings are not produced).
-
- * -parse-noop parses code without building an AST. This is useful
- for timing the cost of the parser without including AST building
- time.
-
- * -parse-ast builds ASTs, but doesn't print them. This is most
- useful for timing AST building vs -parse-noop.
-
- * -parse-ast-print pretty prints most expression and statements nodes.
-
- * -parse-ast-check checks that diagnostic messages that are expected
- are reported and that those which are reported are expected.
-
- * -dump-cfg builds ASTs and then CFGs. CFGs are then pretty-printed.
-
- * -view-cfg builds ASTs and then CFGs. CFGs are then visualized by
- invoking Graphviz.
-
- For more information on getting Graphviz to work with clang/LLVM,
- see: http://llvm.org/docs/ProgrammersManual.html#ViewGraph
-
-
-III. Current advantages over GCC:
-
- * Column numbers are fully tracked (no 256 col limit, no GCC-style pruning).
- * All diagnostics have column numbers, includes 'caret diagnostics', and they
- highlight regions of interesting code (e.g. the LHS and RHS of a binop).
- * Full diagnostic customization by client (can format diagnostics however they
- like, e.g. in an IDE or refactoring tool) through DiagnosticClient interface.
- * Built as a framework, can be reused by multiple tools.
- * All languages supported linked into same library (no cc1,cc1obj, ...).
- * mmap's code in read-only, does not dirty the pages like GCC (mem footprint).
- * LLVM License, can be linked into non-GPL projects.
- * Full diagnostic control, per diagnostic. Diagnostics are identified by ID.
- * Significantly faster than GCC at semantic analysis, parsing, preprocessing
- and lexing.
- * Defers exposing platform-specific stuff to as late as possible, tracks use of
- platform-specific features (e.g. #ifdef PPC) to allow 'portable bytecodes'.
- * The lexer doesn't rely on the "lexer hack": it has no notion of scope and
- does not categorize identifiers as types or variables -- this is up to the
- parser to decide.
-
-Potential Future Features:
-
- * Fine grained diag control within the source (#pragma enable/disable warning).
- * Better token tracking within macros? (Token came from this line, which is
- a macro argument instantiated here, recursively instantiated here).
- * Fast #import with a module system.
- * Dependency tracking: change to header file doesn't recompile every function
- that texually depends on it: recompile only those functions that need it.
- This is aka 'incremental parsing'.
-
-
-IV. Missing Functionality / Improvements
-
-Lexer:
- * Source character mapping. GCC supports ASCII and UTF-8.
- See GCC options: -ftarget-charset and -ftarget-wide-charset.
- * Universal character support. Experimental in GCC, enabled with
- -fextended-identifiers.
- * -fpreprocessed mode.
-
-Preprocessor:
- * #assert/#unassert
- * MSExtension: "L#param" stringizes to a wide string literal.
- * Add support for -M*
-
-Traditional Preprocessor:
- * Currently, we have none. :)
-