Diffstat (limited to 'docs/HistoricalNotes')
27 files changed, 0 insertions, 2287 deletions
diff --git a/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt b/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt deleted file mode 100644 index f086181192..0000000000 --- a/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt +++ /dev/null @@ -1,74 +0,0 @@ -Date: Sat, 18 Nov 2000 09:19:35 -0600 (CST) -From: Vikram Adve <vadve@cs.uiuc.edu> -To: Chris Lattner <lattner@cs.uiuc.edu> -Subject: a few thoughts - -I've been mulling over the virtual machine problem and I had some -thoughts about some things for us to think about discuss: - -1. We need to be clear on our goals for the VM. Do we want to emphasize - portability and safety like the Java VM? Or shall we focus on the - architecture interface first (i.e., consider the code generation and - processor issues), since the architecture interface question is also - important for portable Java-type VMs? - - This is important because the audiences for these two goals are very - different. Architects and many compiler people care much more about - the second question. The Java compiler and OS community care much more - about the first one. - - Also, while the architecture interface question is important for - Java-type VMs, the design constraints are very different. - - -2. Design issues to consider (an initial list that we should continue - to modify). Note that I'm not trying to suggest actual solutions here, - but just various directions we can pursue: - - a. A single-assignment VM, which we've both already been thinking about. - - b. A strongly-typed VM. One question is do we need the types to be - explicitly declared or should they be inferred by the dynamic compiler? - - c. How do we get more high-level information into the VM while keeping - to a low-level VM design? - - o Explicit array references as operands? An alternative is - to have just an array type, and let the index computations be - separate 3-operand instructions. 
- - o Explicit instructions to handle aliasing, e.g.s: - -- an instruction to say "I speculate that these two values are not - aliased, but check at runtime", like speculative execution in - EPIC? - -- or an instruction to check whether two values are aliased and - execute different code depending on the answer, somewhat like - predicated code in EPIC - - o (This one is a difficult but powerful idea.) - A "thread-id" field on every instruction that allows the static - compiler to generate a set of parallel threads, and then have - the runtime compiler and hardware do what they please with it. - This has very powerful uses, but thread-id on every instruction - is expensive in terms of instruction size and code size. - We would need to compactly encode it somehow. - - Also, this will require some reading on at least two other - projects: - -- Multiscalar architecture from Wisconsin - -- Simultaneous multithreading architecture from Washington - - o Or forget all this and stick to a traditional instruction set? - - -BTW, on an unrelated note, after the meeting yesterday, I did remember -that you had suggested doing instruction scheduling on SSA form instead -of a dependence DAG earlier in the semester. When we talked about -it yesterday, I didn't remember where the idea had come from but I -remembered later. Just giving credit where its due... - -Perhaps you can save the above as a file under RCS so you and I can -continue to expand on this. - ---Vikram - diff --git a/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt b/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt deleted file mode 100644 index 1c725f5aa7..0000000000 --- a/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt +++ /dev/null @@ -1,199 +0,0 @@ -Date: Sun, 19 Nov 2000 16:23:57 -0600 (CST) -From: Chris Lattner <sabre@nondot.org> -To: Vikram Adve <vadve@cs.uiuc.edu> -Subject: Re: a few thoughts - -Okay... 
here are a few of my thoughts on this (it's good to know that we -think so alike!): - -> 1. We need to be clear on our goals for the VM. Do we want to emphasize -> portability and safety like the Java VM? Or shall we focus on the -> architecture interface first (i.e., consider the code generation and -> processor issues), since the architecture interface question is also -> important for portable Java-type VMs? - -I foresee the architecture looking kinda like this: (which is completely -subject to change) - -1. The VM code is NOT guaranteed safe in a java sense. Doing so makes it - basically impossible to support C like languages. Besides that, - certifying a register based language as safe at run time would be a - pretty expensive operation to have to do. Additionally, we would like - to be able to statically eliminate many bounds checks in Java - programs... for example. - - 2. Instead, we can do the following (eventually): - * Java bytecode is used as our "safe" representation (to avoid - reinventing something that we don't add much value to). When the - user chooses to execute Java bytecodes directly (ie, not - precompiled) the runtime compiler can do some very simple - transformations (JIT style) to convert it into valid input for our - VM. Performance is not wonderful, but it works right. - * The file is scheduled to be compiled (rigorously) at a later - time. This could be done by some background process or by a second - processor in the system during idle time or something... - * To keep things "safe" ie to enforce a sandbox on Java/foreign code, - we could sign the generated VM code with a host specific private - key. Then before the code is executed/loaded, we can check to see if - the trusted compiler generated the code. This would be much quicker - than having to validate consistency (especially if bounds checks have - been removed, for example) - -> This is important because the audiences for these two goals are very -> different.
Architects and many compiler people care much more about -> the second question. The Java compiler and OS community care much more -> about the first one. - -3. By focusing on a more low level virtual machine, we have much more room - for value add. The nice safe "sandbox" VM can be provided as a layer - on top of it. It also lets us focus on the more interesting compilers - related projects. - -> 2. Design issues to consider (an initial list that we should continue -> to modify). Note that I'm not trying to suggest actual solutions here, -> but just various directions we can pursue: - -Understood. :) - -> a. A single-assignment VM, which we've both already been thinking -> about. - -Yup, I think that this makes a lot of sense. I am still intrigued, -however, by the prospect of a minimally allocated VM representation... I -think that it could have definite advantages for certain applications -(think very small machines, like PDAs). I don't, however, think that our -initial implementations should focus on this. :) - -Here are some other auxiliary goals that I think we should consider: - -1. Primary goal: Support a high performance dynamic compilation - system. This means that we have an "ideal" division of labor between - the runtime and static compilers. Of course, the other goals of the - system somewhat reduce the importance of this point (f.e. portability - reduces performance, but hopefully not much) -2. Portability to different processors. Since we are most familiar with - x86 and solaris, I think that these two are excellent candidates when - we get that far... -3. Support for all languages & styles of programming (general purpose - VM). This is the point that disallows java style bytecodes, where all - array refs are checked for bounds, etc... -4. Support linking between different language families. For example, call - C functions directly from Java without using the nasty/slow/gross JNI - layer. This involves several subpoints: - A.
Support for languages that require garbage collectors and integration - with languages that don't. As a base point, we could insist on - always using a conservative GC, but implement free as a noop, f.e. - -> b. A strongly-typed VM. One question is do we need the types to be -> explicitly declared or should they be inferred by the dynamic -> compiler? - - B. This is kind of similar to another idea that I have: make OOP - constructs (virtual function tables, class hierarchies, etc) explicit - in the VM representation. I believe that the number of additional - constructs would be fairly low, but would give us lots of important - information... something else that would/could be important is to - have exceptions as first class types so that they would be handled in - a uniform way for the entire VM... so that C functions can call Java - functions for example... - -> c. How do we get more high-level information into the VM while keeping -> to a low-level VM design? -> o Explicit array references as operands? An alternative is -> to have just an array type, and let the index computations be -> separate 3-operand instructions. - - C. In the model I was thinking of (subject to change of course), we - would just have an array type (distinct from the pointer - types). This would allow us to have arbitrarily complex index - expressions, while still distinguishing "load" from "Array load", - for example. Perhaps also, switch jump tables would be first class - types as well? This would allow better reasoning about the program. - -5. Support dynamic loading of code from various sources. Already - mentioned above was the example of loading java bytecodes, but we want - to support dynamic loading of VM code as well. This makes the job of - the runtime compiler much more interesting: it can do interprocedural - optimizations that the static compiler can't do, because it doesn't - have all of the required information (for example, inlining from - shared libraries, etc...) - -6.
Define a set of generally useful annotations to add to the VM - representation. For example, a function can be analysed to see if it - has any side effects when run... also, the MOD/REF sets could be - calculated, etc... we would have to determine what is reasonable. This - would generally be used to make IP optimizations cheaper for the - runtime compiler... - -> o Explicit instructions to handle aliasing, e.g.s: -> -- an instruction to say "I speculate that these two values are not -> aliased, but check at runtime", like speculative execution in -> EPIC? -> -- or an instruction to check whether two values are aliased and -> execute different code depending on the answer, somewhat like -> predicated code in EPIC - -These are also very good points... if this can be determined at compile -time. I think that an epic style of representation (not the instruction -packing, just the information presented) could be a very interesting model -to use... more later... - -> o (This one is a difficult but powerful idea.) -> A "thread-id" field on every instruction that allows the static -> compiler to generate a set of parallel threads, and then have -> the runtime compiler and hardware do what they please with it. -> This has very powerful uses, but thread-id on every instruction -> is expensive in terms of instruction size and code size. -> We would need to compactly encode it somehow. - -Yes yes yes! :) I think it would be *VERY* useful to include this kind -of information (which EPIC architectures *implicitly* encode). The trend -that we are seeing supports this greatly: - -1. Commodity processors are getting massive SIMD support: - * Intel/Amd MMX/MMX2 - * AMD's 3Dnow! - * Intel's SSE/SSE2 - * Sun's VIS -2. SMP is becoming much more common, especially in the server space. -3. Multiple processors on a die are right around the corner. - -If nothing else, not designing this in would severely limit our future -expansion of the project...
- -> Also, this will require some reading on at least two other -> projects: -> -- Multiscalar architecture from Wisconsin -> -- Simultaneous multithreading architecture from Washington -> -> o Or forget all this and stick to a traditional instruction set? - -Heh... :) Well, from a pure research point of view, it is almost more -attractive to go with the most extreme/different ISA possible. On one axis -you get safety and conservatism, and on the other you get degree of -influence that the results have. Of course the problem with pure research -is that often times there is no concrete product of the research... :) - -> BTW, on an unrelated note, after the meeting yesterday, I did remember -> that you had suggested doing instruction scheduling on SSA form instead -> of a dependence DAG earlier in the semester. When we talked about -> it yesterday, I didn't remember where the idea had come from but I -> remembered later. Just giving credit where its due... - -:) Thanks. - -> Perhaps you can save the above as a file under RCS so you and I can -> continue to expand on this. - -I think it makes sense to do so when we get our ideas more formalized and -bounce it back and forth a couple of times... then I'll do a more formal -writeup of our goals and ideas. Obviously our first implementation will -not want to do all of the stuff that I pointed out above... but we will -want to design the project so that we do not artificially limit ourselves -at some time in the future... - -Anyways, let me know what you think about these ideas... and if they sound -reasonable... - --Chris - diff --git a/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt b/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt deleted file mode 100644 index 8c452924dd..0000000000 --- a/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt +++ /dev/null @@ -1,30 +0,0 @@ -From: Chris Lattner [mailto:sabre@nondot.org] -Sent: Wednesday, December 06, 2000 6:41 PM -To: Vikram S.
Adve -Subject: Additional idea with respect to encoding - -Here's another idea with respect to keeping the common case instruction -size down (less than 32 bits ideally): - -Instead of encoding an instruction to operate on two register numbers, -have it operate on two negative offsets based on the current register -number. Therefore, instead of using: - -r57 = add r55, r56 (r57 is the implicit dest register, of course) - -We could use: - -r57 = add -2, -1 - -My guess is that most SSA references are to recent values (especially if -they correspond to expressions like (x+y*z+p*q/ ...)), so the negative -numbers would tend to stay small, even at the end of the procedure (where -the implicit register destination number could be quite large). Of course -the negative sign is redundant, so you would be storing small integers -almost all of the time, and 5-6 bits worth of register number would be -plenty for most cases... - -What do you think? - --Chris - diff --git a/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt b/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt deleted file mode 100644 index b66e18556f..0000000000 --- a/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt +++ /dev/null @@ -1,83 +0,0 @@ -SUMMARY ------- - -We met to discuss the LLVM instruction format and bytecode representation: - -ISSUES RESOLVED ---------------- - -1. We decided that we shall use a flat namespace to represent our - variables in SSA form, as opposed to having a two dimensional namespace - of the original variable and the SSA instance subscript. - -ARGUMENT AGAINST: - * A two dimensional namespace would be valuable when doing alias - analysis because the extra information can help limit the scope of - analysis. - -ARGUMENT FOR: - * Including this information would require that all users of the LLVM - bytecode would have to parse and handle it. This would slow down the - common case and inflate the instruction representation with another - infinite variable space.
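The relative-operand encoding idea from the email above (`r57 = add -2, -1`) is easy to model. The sketch below is purely illustrative (Python, with hypothetical helper names, not code from any LLVM source): since the offset's sign is always negative, as Chris notes, only the magnitude needs storing.

```python
# Illustrative model of the relative-operand encoding idea: each
# instruction implicitly defines one register ("dest"), and its operand
# registers are stored as small distances below dest instead of as
# absolute register numbers. Helper names are hypothetical.

def encode_operands(dest, operands):
    """Store each operand as its distance below the implicit dest register.
    The sign is redundant (operands always precede dest in SSA form),
    so only small non-negative magnitudes are kept."""
    return [dest - r for r in operands]

def decode_operands(dest, deltas):
    """Recover absolute register numbers from the stored distances."""
    return [dest - d for d in deltas]

# r57 = add r55, r56 becomes deltas [2, 1] -- small even when the
# implicit destination number is large, late in a procedure.
assert encode_operands(57, [55, 56]) == [2, 1]
assert decode_operands(57, [2, 1]) == [55, 56]
```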
- -REASONING: - * It was decided that because original variable sources could be - reconstructed from SSA form in linear time, that it would be an - unjustified expense for the common case to include the extra - information for one optimization. Alias analysis itself is typically - greater than linear in asymptotic complexity, so this extra analysis - would not affect the runtime of the optimization in a significant - way. Additionally, this would be an unlikely optimization to do at - runtime. - - -IDEAS TO CONSIDER ------------------ - -1. Including dominator information in the LLVM bytecode - representation. This is one example of an analysis result that may be - packaged with the bytecodes themselves. As a conceptual implementation - idea, we could include an immediate dominator number for each basic block - in the LLVM bytecode program. Basic blocks could be numbered according - to the order of occurrence in the bytecode representation. - -2. Including loop header and body information. This would facilitate - detection of intervals and natural loops. - -UNRESOLVED ISSUES ------------------ - -1. Will oSUIF provide enough of an infrastructure to support the research - that we will be doing? We know that it has less than stellar - performance, but hope that this will be of little importance for our - static compiler. This could affect us if we decided to do some IP - research. Also we do not yet understand the level of exception support - currently implemented. - -2. Should we consider the requirements of a direct hardware implementation - of the LLVM when we design it? If so, several design issues should - have their priorities shifted. The other option is to focus on a - software layer interpreting the LLVM in all cases. - -3. Should we use some form of packetized format to improve forward - compatibility?
For example, we could design the system to encode a - packet type and length field before analysis information, to allow a - runtime to skip information that it didn't understand in a bytecode - stream. The obvious benefit would be for compatibility; the drawback - is that it would tend to splinter the 'standard' LLVM definition. - -4. Should we use fixed length instructions or variable length - instructions? Fetching variable length instructions is expensive (for - either hardware or software based LLVM runtimes), but we have several - 'infinite' spaces that instructions operate in (SSA register numbers, - type spaces, or packet length [if packets were implemented]). Several - options were mentioned including: - A. Using 16 or 32 bit numbers, which would be 'big enough' - B. A scheme similar to how UTF-8 works, to encode infinite numbers - while keeping small numbers small. - C. Use something similar to Huffman encoding, so that the most common - numbers are the smallest. - --Chris - diff --git a/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt b/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt deleted file mode 100644 index 111706a344..0000000000 --- a/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt +++ /dev/null @@ -1,39 +0,0 @@ -Date: Wed, 31 Jan 2001 12:04:33 -0600 -From: Vikram S. Adve <vadve@cs.uiuc.edu> -To: Chris Lattner <lattner@cs.uiuc.edu> -Subject: another thought - -I have a budding idea about making LLVM a little more ambitious: a -customizable runtime system that can be used to implement language-specific -virtual machines for many different languages. E.g., a C vm, a C++ vm, a -Java vm, a Lisp vm, .. - -The idea would be that LLVM would provide a standard set of runtime features -(some low-level like standard assembly instructions with code generation and -static and runtime optimization; some higher-level like type-safety and -perhaps a garbage collection library).
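Option B from the meeting summary above, a UTF-8-like scheme that encodes numbers from an 'infinite' space while keeping small numbers small, can be sketched as a 7-bits-per-byte varint (essentially the encoding later standardized as LEB128). This is an illustrative model only, not code from the project:

```python
def encode_varint(n):
    """Encode a non-negative int in 7-bit groups, low group first;
    the high bit of each byte means 'more bytes follow' (option B above)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit set
        else:
            out.append(byte)         # final byte, high bit clear
            return bytes(out)

def decode_varint(data):
    """Reassemble the 7-bit groups back into an integer."""
    n = shift = 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        shift += 7
    return n

assert len(encode_varint(5)) == 1        # common small numbers stay small
assert len(encode_varint(1 << 20)) == 3  # rare large numbers still fit
assert decode_varint(encode_varint(300)) == 300
```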
Each language vm would select the -runtime features needed for that language, extending or customizing them as -needed. Most of the machine-dependent code-generation and optimization -features as well as low-level machine-independent optimizations (like PRE) -could be provided by LLVM and should be sufficient for any language, -simplifying the language compiler. (This would also help interoperability -between languages.) Also, some or most of the higher-level -machine-independent features like type-safety and access safety should be -reusable by different languages, with minor extensions. The language -compiler could then focus on language-specific analyses and optimizations. - -The risk is that this sounds like a universal IR -- something that the -compiler community has tried and failed to develop for decades, and is -universally skeptical about. No matter what we say, we won't be able to -convince anyone that we have a universal IR that will work. We need to -think about whether LLVM is different or if it has something novel that might -convince people. E.g., the idea of providing a package of separable -features that different languages select from. Also, using SSA with or -without type-safety as the intermediate representation. - -One interesting starting point would be to discuss how a JVM would be -implemented on top of LLVM a bit more. That might give us clues on how to -structure LLVM to support one or more language VMs. - ---Vikram - diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt deleted file mode 100644 index c09cf1f03c..0000000000 --- a/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt +++ /dev/null @@ -1,67 +0,0 @@ -Date: Tue, 6 Feb 2001 20:27:37 -0600 (CST) -From: Chris Lattner <sabre@nondot.org> -To: Vikram S. Adve <vadve@cs.uiuc.edu> -Subject: Type notation debate...
- -This is the way that I am currently planning on implementing types: - -Primitive Types: -type ::= void|bool|sbyte|ubyte|short|ushort|int|uint|long|ulong - -Method: -typelist ::= typelisth | /*empty*/ -typelisth ::= type | typelisth ',' type -type ::= type (typelist) - -Arrays (without and with size): -type ::= '[' type ']' | '[' INT ',' type ']' - -Pointer: -type ::= type '*' - -Structure: -type ::= '{' typelist '}' - -Packed: -type ::= '<' INT ',' type '>' - -Simple examples: - -[[ %4, int ]] - array of (array of 4 (int)) -[ { int, int } ] - Array of structure -[ < %4, int > ] - Array of 128 bit SIMD packets -int (int, [[int, %4]]) - Method taking a 2d array and int, returning int - - -Okay before you comment, please look at: - -http://www.research.att.com/~bs/devXinterview.html - -Search for "In another interview, you defined the C declarator syntax as -an experiment that failed. However, this syntactic construct has been -around for 27 years and perhaps more; why do you consider it problematic -(except for its cumbersome syntax)?" and read that response for me. :) - -Now with this syntax, his example would be represented as: - -[ %10, bool (int, int) * ] * - -vs - -bool (*(*)[10])(int, int) - -in C. - -Basically, my argument for this type construction system is that it is -VERY simple to use and understand (although it IS different than C, it is -very simple and straightforward, which C is NOT). In fact, I would assert -that most programmers TODAY do not understand pointers to member -functions, and have to look up an example when they have to write them. - -In my opinion, it is critically important to have clear and concise type -specifications, because types are going to be all over the programs. - -Let me know your thoughts on this. 
:) - --Chris - diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt deleted file mode 100644 index 8bfefbf69f..0000000000 --- a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt +++ /dev/null @@ -1,75 +0,0 @@ -Date: Thu, 8 Feb 2001 08:42:04 -0600 -From: Vikram S. Adve <vadve@cs.uiuc.edu> -To: Chris Lattner <sabre@nondot.org> -Subject: RE: Type notation debate... - -Chris, - -> Okay before you comment, please look at: -> -> http://www.research.att.com/~bs/devXinterview.html - -I read this argument. Even before that, I was already in agreement with you -and him that the C declarator syntax is difficult and confusing. - -But in fact, if you read the entire answer carefully, he came to the same -conclusion I do: that you have to go with familiar syntax over logical -syntax because familiarity is such a strong force: - - "However, familiarity is a strong force. To compare, in English, we -live -more or less happily with the absurd rules for "to be" (am, are, is, been, -was, were, ...) and all attempts to simplify are treated with contempt or -(preferably) humor. It be a curious world and it always beed." - -> Basically, my argument for this type construction system is that it is -> VERY simple to use and understand (although it IS different than C, it is -> very simple and straightforward, which C is NOT). In fact, I would assert -> that most programmers TODAY do not understand pointers to member -> functions, and have to look up an example when they have to write them. - -Again, I don't disagree with this at all. But to some extent this -particular problem is inherently difficult. Your syntax for the above -example may be easier for you to read because this is the way you have been -thinking about it. Honestly, I don't find it much easier than the C syntax. -In either case, I would have to look up an example to write pointers to -member functions. 
- -But pointers to member functions are nowhere near as common as arrays. And -the old array syntax: - type [ int, int, ...] -is just much more familiar and clear to people than anything new you -introduce, no matter how logical it is. Introducing a new syntax that may -make function pointers easier but makes arrays much more difficult seems -very risky to me. - -> In my opinion, it is critically important to have clear and concise type -> specifications, because types are going to be all over the programs. - -I absolutely agree. But the question is, what is more clear and concise? -The syntax programmers are used to out of years of experience or a new -syntax that they have never seen that has a more logical structure. I think -the answer is the former. Sometimes, you have to give up a better idea -because you can't overcome sociological barriers to it. Qwerty keyboards -and Windows are two classic examples of bad technology that are difficult to -root out. - -P.S. Also, while I agree that most your syntax is more logical, there is -one part that isn't: - -Arrays (without and with size): -type ::= '[' type ']' | '[' INT ',' type ']'. - -The arrays with size lists the dimensions and the type in a single list. -That is just too confusing: - [10, 40, int] -This seems to be a 3-D array where the third dimension is something strange. -It is too confusing to have a list of 3 things, some of which are dimensions -and one is a type. Either of the following would be better: - - array [10, 40] of int -or - int [10, 40] - ---Vikram - diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt deleted file mode 100644 index 6e9784158a..0000000000 --- a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt +++ /dev/null @@ -1,53 +0,0 @@ -Date: Thu, 8 Feb 2001 14:31:05 -0600 (CST) -From: Chris Lattner <sabre@nondot.org> -To: Vikram S. Adve <vadve@cs.uiuc.edu> -Subject: RE: Type notation debate... 
- -> Arrays (without and with size): -> type ::= '[' type ']' | '[' INT ',' type ']'. -> -> The arrays with size lists the dimensions and the type in a single list. -> That is just too confusing: - -> [10, 40, int] -> This seems to be a 3-D array where the third dimension is something strange. -> It is too confusing to have a list of 3 things, some of which are dimensions -> and one is a type. - -The above grammar indicates that there is only one integer parameter, ie -the upper bound. The lower bound is always implied to be zero, for -several reasons: - -* As a low level VM, we want to expose addressing computations - explicitly. Since the lower bound must always be known in a high level - language statically, the language front end can do the translation - automatically. -* This fits more closely with what Java needs, ie what we need in the - short term. Java arrays are always zero based. - -If a two element list is too confusing, I would recommend an alternate -syntax of: - -type ::= '[' type ']' | '[' INT 'x' type ']'. - -For example: - [12 x int] - [12x int] - [ 12 x [ 4x int ]] - -Which is syntactically nicer, and more explicit. - -> Either of the following would be better: -> array [10, 40] of int - -I considered this approach for arrays in general (ie array of int/ array -of 12 int), but found that it made declarations WAY too long. Remember -that because of the nature of llvm, you get a lot of types strewn all over -the program, and using the 'typedef' like facility is not a wonderful -option, because then types aren't explicit anymore. - -I find this email interesting, because you contradict the previous email -you sent, where you recommend that we stick to C syntax.... 
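As an aside, the `'[' INT 'x' type ']'` form proposed in this email is essentially the array syntax LLVM ultimately adopted (e.g. `[12 x i32]`). A minimal recursive-descent sketch of that grammar, purely illustrative and with primitive types reduced to `int`:

```python
# Illustrative parser for the '[ INT x type ]' array grammar proposed
# above. Not code from any LLVM source; primitive types reduced to 'int'.
import re

def parse_type(s):
    """Parse strings like '[12 x [4 x int]]' into nested tuples."""
    tokens = re.findall(r"\[|\]|x|\d+|int", s)  # whitespace is ignored

    def parse(pos):
        tok = tokens[pos]
        if tok == "[":
            size = int(tokens[pos + 1])          # INT dimension
            assert tokens[pos + 2] == "x"        # separator keyword
            elem, pos = parse(pos + 3)           # element type (recursive)
            assert tokens[pos] == "]"
            return ("array", size, elem), pos + 1
        return tok, pos + 1                      # primitive type

    ty, _ = parse(0)
    return ty

assert parse_type("[12 x int]") == ("array", 12, "int")
assert parse_type("[ 12 x [ 4x int ]]") == ("array", 12, ("array", 4, "int"))
```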
- --Chris - diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt deleted file mode 100644 index 7b9032742a..0000000000 --- a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt +++ /dev/null @@ -1,89 +0,0 @@ -> But in fact, if you read the entire answer carefully, he came to the same -> conclusion I do: that you have to go with familiar syntax over logical -> syntax because familiarity is such a strong force: -> "However, familiarity is a strong force. To compare, in English, we -live -> more or less happily with the absurd rules for "to be" (am, are, is, been, -> was, were, ...) and all attempts to simplify are treated with contempt or -> (preferably) humor. It be a curious world and it always beed." - -Although you have to remember that his situation was considerably -different than ours. He was in a position where he was designing a high -level language that had to be COMPATIBLE with C. Our language is such -that a new person would have to learn the new, different, syntax -anyways. Making them learn about the type system does not seem like much -of a stretch from learning the opcodes and how SSA form works, and how -everything ties together... - -> > Basically, my argument for this type construction system is that it is -> > VERY simple to use and understand (although it IS different than C, it is -> > very simple and straightforward, which C is NOT). In fact, I would assert -> > that most programmers TODAY do not understand pointers to member -> > functions, and have to look up an example when they have to write them. - -> Again, I don't disagree with this at all. But to some extent this -> particular problem is inherently difficult. Your syntax for the above -> example may be easier for you to read because this is the way you have been -> thinking about it. Honestly, I don't find it much easier than the C syntax. 
-> In either case, I would have to look up an example to write pointers to -> member functions. - -I would argue that because the lexical structure of the language is self -consistent, any person who spent a significant amount of time programming -in LLVM directly would understand how to do it without looking it up in a -manual. The reason this does not work for C is because you rarely have to -declare these pointers, and the syntax is inconsistent with the method -declaration and calling syntax. - -> But pointers to member functions are nowhere near as common as arrays. - -Very true. If you're implementing an object oriented language, however, -remember that you have to do all the pointer to member function stuff -yourself.... so every time you invoke a virtual method one is involved -(instead of having C++ hide it for you behind "syntactic sugar"). - -> And the old array syntax: -> type [ int, int, ...] -> is just much more familiar and clear to people than anything new you -> introduce, no matter how logical it is. - -Erm... excuse me but how is this the "old array syntax"? If you are -arguing for consistency with C, you should be asking for 'type int []', -which is significantly different than the above (besides, the above -introduces a new operator and duplicates information -needlessly). Basically what I am suggesting is exactly the above without -the fluff. So instead of: - - type [ int, int, ...] - -you use: - - type [ int ] - -> Introducing a new syntax that may -> make function pointers easier but makes arrays much more difficult seems -> very risky to me. - -This is not about function pointers. This is about consistency in the -type system, and consistency with the rest of the language. The point -above does not make arrays any more difficult to use, and makes the -structure of types much more obvious than the "c way".
- -> > In my opinion, it is critically important to have clear and concise type -> > specifications, because types are going to be all over the programs. -> -> I absolutely agree. But the question is, what is more clear and concise? -> The syntax programmers are used to out of years of experience or a new -> syntax that they have never seen that has a more logical structure. I think -> the answer is the former. Sometimes, you have to give up a better idea -> because you can't overcome sociological barriers to it. Qwerty keyboards -> and Windows are two classic examples of bad technology that are difficult to -> root out. - -Very true, but you seem to be advocating a completely different Type -system than C has, in addition to it not offering the advantages of clear -structure that the system I recommended does... so you seem to not have a -problem with changing this, just with what I change it to. :) - --Chris - diff --git a/docs/HistoricalNotes/2001-02-09-AdveComments.txt b/docs/HistoricalNotes/2001-02-09-AdveComments.txt deleted file mode 100644 index 5503233c1e..0000000000 --- a/docs/HistoricalNotes/2001-02-09-AdveComments.txt +++ /dev/null @@ -1,120 +0,0 @@ -Ok, here are my comments and suggestions about the LLVM instruction set. -We should discuss some now, but can discuss many of them later, when we -revisit synchronization, type inference, and other issues. -(We have discussed some of the comments already.) - - -o We should consider eliminating the type annotation in cases where it is - essentially obvious from the instruction type, e.g., in br, it is obvious - that the first arg. should be a bool and the other args should be labels: - - br bool <cond>, label <iftrue>, label <iffalse> - - I think your point was that making all types explicit improves clarity - and readability. I agree to some extent, but it also comes at the cost - of verbosity. 
And when the types are obvious from people's experience - (e.g., in the br instruction), it doesn't seem to help as much. - - -o On reflection, I really like your idea of having the two different switch - types (even though they encode implementation techniques rather than - semantics). It should simplify building the CFG and my guess is it could - enable some significant optimizations, though we should think about which. - - -o In the lookup-indirect form of the switch, is there a reason not to make - the val-type uint? Most HLL switch statements (including Java and C++) - require that anyway. And it would also make the val-type uniform - in the two forms of the switch. - - I did see the switch-on-bool examples and, while cute, we can just use - the branch instructions in that particular case. - - -o I agree with your comment that we don't need 'neg'. - - -o There's a trade-off with the cast instruction: - + it avoids having to define all the upcasts and downcasts that are - valid for the operands of each instruction (you probably have thought - of other benefits also) - - it could make the bytecode significantly larger because there could - be a lot of cast operations - - -o Making the second arg. to 'shl' a ubyte seems good enough to me. - 255 positions seems adequate for several generations of machines - and is more compact than uint. - - -o I still have some major concerns about including malloc and free in the - language (either as builtin functions or instructions). LLVM must be - able to represent code from many different languages. Languages such as - C, C++ Java and Fortran 90 would not be able to use our malloc anyway - because each of them will want to provide a library implementation of it. - - This gets even worse when code from different languages is linked - into a single executable (which is fairly common in large apps). 
- Having a single malloc would just not suffice, and instead would simply - complicate the picture further because it adds an extra variant in - addition to the one each language provides. - - Instead, providing a default library version of malloc and free - (and perhaps a malloc_gc with garbage collection instead of free) - would make a good implementation available to anyone who wants it. - - I don't recall all your arguments in favor so let's discuss this again, - and soon. - - -o 'alloca' on the other hand sounds like a good idea, and the - implementation seems fairly language-independent so it doesn't have the - problems with malloc listed above. - - -o About indirect call: - Your option #2 sounded good to me. I'm not sure I understand your - concern about an explicit 'icall' instruction? - - -o A pair of important synchronization instr'ns to think about: - load-linked - store-conditional - - -o Other classes of instructions that are valuable for pipeline performance: - conditional-move - predicated instructions - - -o I believe tail calls are relatively easy to identify; do you know why - .NET has a tailcall instruction? - - -o I agree that we need a static data space. Otherwise, emulating global - data gets unnecessarily complex. - - -o About explicit parallelism: - - We once talked about adding a symbolic thread-id field to each - instruction. (It could be optional so single-threaded codes are - not penalized.) This could map well to multi-threaded architectures - while providing easy ILP for single-threaded ones. But it is probably - too radical an idea to include in a base version of LLVM. Instead, it - could be a great topic for a separate study. - - What is the semantics of the IA64 stop bit? - - - - -o And finally, another thought about the syntax for arrays :-) - - Although this syntax: - array <dimension-list> of <type> - is verbose, it will be used only in the human-readable assembly code so - size should not matter. 
I think we should consider it because I find it - to be the clearest syntax. It could even make arrays of function - pointers somewhat readable. - diff --git a/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt b/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt deleted file mode 100644 index 5c87330fb7..0000000000 --- a/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt +++ /dev/null @@ -1,245 +0,0 @@ -From: Chris Lattner <sabre@nondot.org> -To: "Vikram S. Adve" <vadve@cs.uiuc.edu> -Subject: Re: LLVM Feedback - -I've included your feedback in the /home/vadve/lattner/llvm/docs directory -so that it will live in CVS eventually with the rest of LLVM. I've -significantly updated the documentation to reflect the changes you -suggested, as specified below: - -> We should consider eliminating the type annotation in cases where it is -> essentially obvious from the instruction type: -> br bool <cond>, label <iftrue>, label <iffalse> -> I think your point was that making all types explicit improves clarity -> and readability. I agree to some extent, but it also comes at the -> cost of verbosity. And when the types are obvious from people's -> experience (e.g., in the br instruction), it doesn't seem to help as -> much. - -Very true. We should discuss this more, but my reasoning is more of a -consistency argument. There are VERY few instructions that can have all -of the types eliminated, and doing so when available unnecesarily makes -the language more difficult to handle. Especially when you see 'int -%this' and 'bool %that' all over the place, I think it would be -disorienting to see: - - br %predicate, %iftrue, %iffalse - -for branches. Even just typing that once gives me the creeps. ;) Like I -said, we should probably discuss this further in person... - -> On reflection, I really like your idea of having the two different -> switch types (even though they encode implementation techniques rather -> than semantics). 
It should simplify building the CFG and my guess is it -> could enable some significant optimizations, though we should think -> about which. - -Great. I added a note to the switch section commenting on how the VM -should just use the instruction type as a hint, and that the -implementation may choose alternate representations (such as predicated -branches). - -> In the lookup-indirect form of the switch, is there a reason not to -> make the val-type uint? - -No. This was something I was debating for a while, and didn't really feel -strongly about either way. It is common to switch on other types in HLL's -(for example signed int's are particularly common), but in this case, all -that will be added is an additional 'cast' instruction. I removed that -from the spec. - -> I agree with your comment that we don't need 'neg' - -Removed. - -> There's a trade-off with the cast instruction: -> + it avoids having to define all the upcasts and downcasts that are -> valid for the operands of each instruction (you probably have -> thought of other benefits also) -> - it could make the bytecode significantly larger because there could -> be a lot of cast operations - - + You NEED casts to represent things like: - void foo(float); - ... - int x; - ... - foo(x); - in a language like C. Even in a Java-like language, you need upcasts - and some way to implement dynamic downcasts. - + Not all forms of instructions take every type (for example you can't - shift by a floating point number of bits), thus SOME programs will need - implicit casts. - -To be efficient and to avoid your '-' point above, we just have to be -careful to specify that the instructions shall operate on all common -types, therefore casting should be relatively uncommon. For example all -of the arithmetic operations work on almost all data types. - -> Making the second arg. to 'shl' a ubyte seems good enough to me. -> 255 positions seems adequate for several generations of machines - -Okay, that comment is removed. 
- -> and is more compact than uint. - -No, it isn't. Remember that the bytecode encoding saves value slots into -the bytecode instructions themselves, not constant values. This is -another case where we may introduce more cast instructions (but we will -also reduce the number of opcode variants that must be supported by a -virtual machine). Because most shifts are by constant values, I don't -think that we'll have to cast many shifts. :) - -> I still have some major concerns about including malloc and free in the -> language (either as builtin functions or instructions). - -Agreed. How about this proposal: - -malloc/free are either built in functions or actual opcodes. They provide -all of the type safety that the document would indicate, blah blah -blah. :) - -Now, because of all of the excellent points that you raised, an -implementation may want to override the default malloc/free behavior of -the program. To do this, they simply implement a "malloc" and -"free" function. The virtual machine will then be defined to use the user -defined malloc/free function (which return/take void*'s, not type'd -pointers like the builtin function would) if one is available, otherwise -fall back on a system malloc/free. - -Does this sound like a good compromise? It would give us all of the -typesafety/elegance in the language while still allowing the user to do -all the cool stuff they want to... - -> 'alloca' on the other hand sounds like a good idea, and the -> implementation seems fairly language-independent so it doesn't have the -> problems with malloc listed above. - -Okay, once we get the above stuff figured out, I'll put it all in the -spec. - -> About indirect call: -> Your option #2 sounded good to me. I'm not sure I understand your -> concern about an explicit 'icall' instruction? - -I worry too much. :) The other alternative has been removed. 'icall' is -now up in the instruction list next to 'call'. 
- -> I believe tail calls are relatively easy to identify; do you know why -> .NET has a tailcall instruction? - -Although I am just guessing, I believe it probably has to do with the fact -that they want languages like Haskell and lisp to be efficiently runnable -on their VM. Of course this means that the VM MUST implement tail calls -'correctly', or else life will suck. :) I would put this into a future -feature bin, because it could be pretty handy... - -> A pair of important synchronization instr'ns to think about: -> load-linked -> store-conditional - -What is 'load-linked'? I think that (at least for now) I should add these -to the 'possible extensions' section, because they are not immediately -needed... - -> Other classes of instructions that are valuable for pipeline -> performance: -> conditional-move -> predicated instructions - -Conditional move is effectively a special case of a predicated -instruction... and I think that all predicated instructions can possibly -be implemented later in LLVM. It would significantly change things, and -it doesn't seem to be very necessary right now. It would seem to -complicate flow control analysis a LOT in the virtual machine. I would -tend to prefer that a predicated architecture like IA64 convert from a -"basic block" representation to a predicated rep as part of its dynamic -compilation phase. Also, if a basic block contains ONLY a move, then -that can be trivially translated into a conditional move... - -> I agree that we need a static data space. Otherwise, emulating global -> data gets unnecessarily complex. - -Definitely. Also a later item though. :) - -> We once talked about adding a symbolic thread-id field to each -> .. -> Instead, it could be a great topic for a separate study. - -Agreed. :) - -> What is the semantics of the IA64 stop bit? - -Basically, the IA64 writes instructions like this: -mov ... -add ... -sub ... -op xxx -op xxx -;; -mov ... -add ... -sub ... 
-op xxx -op xxx -;; - -Where the ;; delimits a group of instruction with no dependencies between -them, which can all be executed concurrently (to the limits of the -available functional units). The ;; gets translated into a bit set in one -of the opcodes. - -The advantages of this representation is that you don't have to do some -kind of 'thread id scheduling' pass by having to specify ahead of time how -many threads to use, and the representation doesn't have a per instruction -overhead... - -> And finally, another thought about the syntax for arrays :-) -> Although this syntax: -> array <dimension-list> of <type> -> is verbose, it will be used only in the human-readable assembly code so -> size should not matter. I think we should consider it because I find it -> to be the clearest syntax. It could even make arrays of function -> pointers somewhat readable. - -My only comment will be to give you an example of why this is a bad -idea. :) - -Here is an example of using the switch statement (with my recommended -syntax): - -switch uint %val, label %otherwise, - [%3 x {uint, label}] [ { uint %57, label %l1 }, - { uint %20, label %l2 }, - { uint %14, label %l3 } ] - -Here it is with the syntax you are proposing: - -switch uint %val, label %otherwise, - array %3 of {uint, label} - array of {uint, label} - { uint %57, label %l1 }, - { uint %20, label %l2 }, - { uint %14, label %l3 } - -Which is ambiguous and very verbose. It would be possible to specify -constants with [] brackets as in my syntax, which would look like this: - -switch uint %val, label %otherwise, - array %3 of {uint, label} [ { uint %57, label %l1 }, - { uint %20, label %l2 }, - { uint %14, label %l3 } ] - -But then the syntax is inconsistent between type definition and constant -definition (why do []'s enclose the constants but not the types??). - -Anyways, I'm sure that there is much debate still to be had over -this... 
:) - --Chris - -http://www.nondot.org/~sabre/os/ -http://www.nondot.org/MagicStats/ -http://korbit.sourceforge.net/ - - diff --git a/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt b/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt deleted file mode 100644 index 2c7534d9da..0000000000 --- a/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt +++ /dev/null @@ -1,39 +0,0 @@ -Date: Tue, 13 Feb 2001 13:29:52 -0600 (CST) -From: Chris Lattner <sabre@nondot.org> -To: Vikram S. Adve <vadve@cs.uiuc.edu> -Subject: LLVM Concerns... - - -I've updated the documentation to include load store and allocation -instructions (please take a look and let me know if I'm on the right -track): - -file:/home/vadve/lattner/llvm/docs/LangRef.html#memoryops - -I have a couple of concerns I would like to bring up: - -1. Reference types - Right now, I've spec'd out the language to have a pointer type, which - works fine for lots of stuff... except that Java really has - references: constrained pointers that cannot be manipulated: added and - subtracted, moved, etc... Do we want to have a type like this? It - could be very nice for analysis (pointer always points to the start of - an object, etc...) and more closely matches Java semantics. The - pointer type would be kept for C++ like semantics. Through analysis, - C++ pointers could be promoted to references in the LLVM - representation. - -2. Our "implicit" memory references in assembly language: - After thinking about it, this model has two problems: - A. If you do pointer analysis and realize that two stores are - independent and can share the same memory source object, there is - no way to represent this in either the bytecode or assembly. - B. When parsing assembly/bytecode, we effectively have to do a full - SSA generation/PHI node insertion pass to build the dependencies - when we don't want the "pinned" representation. This is not - cool. 
- I'm tempted to make memory references explicit in both the assembly and - bytecode to get around this... what do you think? - --Chris - diff --git a/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt b/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt deleted file mode 100644 index 505343378d..0000000000 --- a/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt +++ /dev/null @@ -1,47 +0,0 @@ -Date: Tue, 13 Feb 2001 18:25:42 -0600 -From: Vikram S. Adve <vadve@cs.uiuc.edu> -To: Chris Lattner <sabre@nondot.org> -Subject: RE: LLVM Concerns... - -> 1. Reference types -> Right now, I've spec'd out the language to have a pointer type, which -> works fine for lots of stuff... except that Java really has -> references: constrained pointers that cannot be manipulated: added and -> subtracted, moved, etc... Do we want to have a type like this? It -> could be very nice for analysis (pointer always points to the start of -> an object, etc...) and more closely matches Java semantics. The -> pointer type would be kept for C++ like semantics. Through analysis, -> C++ pointers could be promoted to references in the LLVM -> representation. - - -You're right, having references would be useful. Even for C++ the *static* -compiler could generate references instead of pointers with fairly -straightforward analysis. Let's include a reference type for now. But I'm -also really concerned that LLVM is becoming big and complex and (perhaps) -too high-level. After we get some initial performance results, we may have -a clearer idea of what our goals should be and we should revisit this -question then. - -> 2. Our "implicit" memory references in assembly language: -> After thinking about it, this model has two problems: -> A. 
If you do pointer analysis and realize that two stores are -> independent and can share the same memory source object, - -not sure what you meant by "share the same memory source object" - -> there is -> no way to represent this in either the bytecode or assembly. -> B. When parsing assembly/bytecode, we effectively have to do a full -> SSA generation/PHI node insertion pass to build the dependencies -> when we don't want the "pinned" representation. This is not -> cool. - -I understand the concern. But again, let's focus on the performance first -and then look at the language design issues. E.g., it would be good to know -how big the bytecode files are before expanding them further. I am pretty -keen to explore the implications of LLVM for mobile devices. Both bytecode -size and power consumption are important to consider there. - ---Vikram - diff --git a/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt b/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt deleted file mode 100644 index 5f7843ab56..0000000000 --- a/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt +++ /dev/null @@ -1,49 +0,0 @@ -By Chris: - -LLVM has been designed with two primary goals in mind. First we strive to -enable the best possible division of labor between static and dynamic -compilers, and second, we need a flexible and powerful interface -between these two complementary stages of compilation. We feel that -providing a solution to these two goals will yield an excellent solution -to the performance problem faced by modern architectures and programming -languages. - -A key insight into current compiler and runtime systems is that a -compiler may fall in anywhere in a "continuum of compilation" to do its -job. On one side, scripting languages statically compile nothing and -dynamically compile (or equivalently, interpret) everything. On the far -other side, traditional static compilers process everything statically and -nothing dynamically. 
These approaches have typically been seen as a -tradeoff between performance and portability. On a deeper level, however, -there are two reasons that optimal system performance may be obtained by a -system somewhere in between these two extremes: dynamic application -behavior and social constraints. - -From a technical perspective, pure static compilation cannot ever give -optimal performance in all cases, because applications have varying dynamic -behavior that the static compiler cannot take into consideration. Even -compilers that support profile-guided optimization generate poor code in -the real world, because using such optimization tunes the application -to one particular usage pattern, whereas real programs (as opposed to -benchmarks) often have several different usage patterns. - -On a social level, static compilation is a very shortsighted solution to -the performance problem. Instruction set architectures (ISAs) continuously -evolve, and each implementation of an ISA (a processor) must choose a set -of tradeoffs that make sense in the market context that it is designed for. -With every new processor introduced, the vendor faces two fundamental -problems: First, there is a lag time between when a processor is introduced -and when compilers generate quality code for the architecture. Secondly, -even when compilers catch up to the new architecture there is often a large -body of legacy code that was compiled for previous generations and will -not or cannot be upgraded. Thus a large percentage of code running on a -processor may be compiled quite sub-optimally for the current -characteristics of the dynamic execution environment. - -For these reasons, LLVM has been designed from the beginning as a long-term -solution to these problems. Its design allows the large body of platform -independent, static, program optimizations currently in compilers to be -reused unchanged in their current form. 
It also provides important static -type information to enable powerful dynamic and link time optimizations -to be performed quickly and efficiently. This combination enables an -increase in effective system performance for real world environments. diff --git a/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt b/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt deleted file mode 100644 index b546301d35..0000000000 --- a/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt +++ /dev/null @@ -1,202 +0,0 @@ -Meeting notes: Implementation idea: Exception Handling in C++/Java - -The 5/18/01 meeting discussed ideas for implementing exceptions in LLVM. -We decided that the best solution requires a set of library calls provided by -the VM, as well as an extension to the LLVM function invocation syntax. - -The LLVM function invocation instruction previously looks like this (ignoring -types): - - call func(arg1, arg2, arg3) - -The extension discussed today adds an optional "with" clause that -associates a label with the call site. The new syntax looks like this: - - call func(arg1, arg2, arg3) with funcCleanup - -This funcHandler always stays tightly associated with the call site (being -encoded directly into the call opcode itself), and should be used whenever -there is cleanup work that needs to be done for the current function if -an exception is thrown by func (or if we are in a try block). - -To support this, the VM/Runtime provide the following simple library -functions (all syntax in this document is very abstract): - -typedef struct { something } %frame; - The VM must export a "frame type", that is an opaque structure used to - implement different types of stack walking that may be used by various - language runtime libraries. We imagine that it would be typical to - represent a frame with a PC and frame pointer pair, although that is not - required. - -%frame getStackCurrentFrame(); - Get a frame object for the current function. 
Note that if the current - function was inlined into its caller, the "current" frame will belong to - the "caller". - -bool isFirstFrame(%frame f); - Returns true if the specified frame is the top level (first activated) frame - for this thread. For the main thread, this corresponds to the main() - function, for a spawned thread, it corresponds to the thread function. - -%frame getNextFrame(%frame f); - Return the previous frame on the stack. This function is undefined if f - satisfies the predicate isFirstFrame(f). - -Label *getFrameLabel(%frame f); - If a label was associated with f (as discussed below), this function returns - it. Otherwise, it returns a null pointer. - -doNonLocalBranch(Label *L); - At this point, it is not clear whether this should be a function or - intrinsic. It should probably be an intrinsic in LLVM, but we'll deal with - this issue later. - - -Here is a motivating example that illustrates how these facilities could be -used to implement the C++ exception model: - -void TestFunction(...) { - A a; B b; - foo(); // Any function call may throw - bar(); - C c; - - try { - D d; - baz(); - } catch (int) { - ...int Stuff... - // execution continues after the try block: the exception is consumed - } catch (double) { - ...double stuff... 
- throw; // Exception is propagated - } -} - -This function would compile to approximately the following code (heavy -pseudocode follows): - -Func: - %a = alloca A - A::A(%a) // These ctors & dtors could throw, but we ignore this - %b = alloca B // minor detail for this example - B::B(%b) - - call foo() with fooCleanup // An exception in foo is propagated to fooCleanup - call bar() with barCleanup // An exception in bar is propagated to barCleanup - - %c = alloca C - C::C(c) - %d = alloca D - D::D(d) - call baz() with bazCleanup // An exception in baz is propagated to bazCleanup - d->~D(); -EndTry: // This label corresponds to the end of the try block - c->~C() // These could also throw, these are also ignored - b->~B() - a->~A() - return - -Note that this is a very straightforward and literal translation: exactly -what we want for zero cost (when unused) exception handling. Especially on -platforms with many registers (i.e., the IA64) setjmp/longjmp-style exception -handling is *very* impractical. Also, the "with" clauses describe the -control flow paths explicitly so that analysis is not adversely affected. - -The foo/barCleanup labels are implemented as: - -TryCleanup: // Executed if an exception escapes the try block - c->~C() -barCleanup: // Executed if an exception escapes from bar() - // fall through -fooCleanup: // Executed if an exception escapes from foo() - b->~B() - a->~A() - Exception *E = getThreadLocalException() - call throw(E) // Implemented by the C++ runtime, described below - -Which does the work one would expect. getThreadLocalException is a function -implemented by the C++ support library. It returns the current exception -object for the current thread. Note that we do not attempt to recycle the -shutdown code from before, because performance of the mainline code is -critically important. Also, obviously fooCleanup and barCleanup may be -merged and one of them eliminated. This just shows how the code generator -would most likely emit code. 
- -The bazCleanup label is more interesting. Because the exception may be caught -by the try block, we must dispatch to its handler... but it does not exist -on the call stack (it does not have a VM Call->Label mapping installed), so -we must dispatch statically with a goto. The bazHandler thus appears as: - -bazHandler: - d->~D(); // destruct D as it goes out of scope when entering catch clauses - goto TryHandler - -In general, TryHandler is not the same as bazHandler, because multiple -function calls could be made from the try block. In this case, trivial -optimization could merge the two basic blocks. TryHandler is the code -that actually determines the type of exception, based on the Exception object -itself. For this discussion, assume that the exception object contains *at -least*: - -1. A pointer to the RTTI info for the contained object -2. A pointer to the dtor for the contained object -3. The contained object itself - -Note that it is necessary to maintain #1 & #2 in the exception object itself -because objects without virtual function tables may be thrown (as in this -example). Assuming this, TryHandler would look something like this: - -TryHandler: - Exception *E = getThreadLocalException(); - switch (E->RTTIType) { - case IntRTTIInfo: - ...int Stuff... // The action to perform from the catch block - break; - case DoubleRTTIInfo: - ...double Stuff... // The action to perform from the catch block - goto TryCleanup // This catch block rethrows the exception - break; // Redundant, eliminated by the optimizer - default: - goto TryCleanup // Exception not caught, rethrow - } - - // Exception was consumed - if (E->dtor) - E->dtor(E->object) // Invoke the dtor on the object if it exists - goto EndTry // Continue mainline code... - -And that is all there is to it. 
- -The throw(E) function would then be implemented like this (which may be -inlined into the caller through standard optimization): - -function throw(Exception *E) { - // Get the start of the stack trace... - %frame %f = call getStackCurrentFrame() - - // Get the label information that corresponds to it - label * %L = call getFrameLabel(%f) - while (%L == 0 && !isFirstFrame(%f)) { - // Loop until a cleanup handler is found - %f = call getNextFrame(%f) - %L = call getFrameLabel(%f) - } - - if (%L != 0) { - call setThreadLocalException(E) // Allow handlers access to this... - call doNonLocalBranch(%L) - } - // No handler found! - call BlowUp() // Ends up calling the terminate() method in use -} - -That's a brief rundown of how C++ exception handling could be implemented in -llvm. Java would be very similar, except it only uses destructors to unlock -synchronized blocks, not to destroy data. Also, it uses two stack walks: a -nondestructive walk that builds a stack trace, then a destructive walk that -unwinds the stack as shown here. - -It would be trivial to get exception interoperability between C++ and Java. - diff --git a/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt b/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt deleted file mode 100644 index 3375365f54..0000000000 --- a/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt +++ /dev/null @@ -1,45 +0,0 @@ -Date: Sat, 19 May 2001 19:09:13 -0500 (CDT) -From: Chris Lattner <sabre@nondot.org> -To: Vikram S. Adve <vadve@cs.uiuc.edu> -Subject: RE: Meeting writeup - -> I read it through and it looks great! - -Thanks! - -> The finally clause in Java may need more thought. The code for this clause -> is like a subroutine because it needs to be entered from many points (end of -> try block and beginning of each catch block), and then needs to *return to -> the place from where the code was entered*. That's why JVM has the -> jsr/jsr_w instruction. - -Hrm... I guess that is an implementation decision. 
It can either be
-modelled as a subroutine (as java bytecodes do), which is really
-gross... or it can be modelled as code duplication (emitted once inline,
-then once in the exception path). Because this could, at worst,
-slightly less than double the amount of code in a function (it is
-bounded) I don't think this is a big deal. One of the really nice things
-about the LLVM representation is that it still allows for runtime code
-generation for exception paths (exception paths are not compiled until
-needed). Obviously a static compiler couldn't do this though. :)
-
-In this case, only one copy of the code would be compiled... until the
-other one is needed on demand. Also this strategy fits with the "zero
-cost" exception model... the standard case is not burdened with extra
-branches or "call"s.
-
-> I suppose you could save the return address in a particular register
-> (specific to this finally block), jump to the finally block, and then at the
-> end of the finally block, jump back indirectly through this register. It
-> will complicate building the CFG but I suppose that can be handled. It is
-> also unsafe in terms of checking where control returns (which is I suppose
-> why the JVM doesn't use this).
-
-I think that a code duplication method would be cleaner, and would avoid
-the caveats that you mention. Also, it does not slow down the normal case
-with an indirect branch...
-
-Like everything, we can probably defer a final decision until later. :)
-
--Chris
-
diff --git a/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt b/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt
deleted file mode 100644
index 97af16a2da..0000000000
--- a/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt
+++ /dev/null
@@ -1,63 +0,0 @@
-Date: Fri, 1 Jun 2001 16:38:17 -0500 (CDT)
-From: Chris Lattner <sabre@nondot.org>
-To: Vikram S. 
Adve <vadve@cs.uiuc.edu> -Subject: Interesting: GCC passes - - -Take a look at this document (which describes the order of optimizations -that GCC performs): - -http://gcc.gnu.org/onlinedocs/gcc_17.html - -The rundown is that after RTL generation, the following happens: - -1 . [t] jump optimization (jumps to jumps, etc) -2 . [t] Delete unreachable code -3 . Compute live ranges for CSE -4 . [t] Jump threading (jumps to jumps with identical or inverse conditions) -5 . [t] CSE -6 . *** Conversion to SSA -7 . [t] SSA Based DCE -8 . *** Conversion to LLVM -9 . UnSSA -10. GCSE -11. LICM -12. Strength Reduction -13. Loop unrolling -14. [t] CSE -15. [t] DCE -16. Instruction combination, register movement, scheduling... etc. - -I've marked optimizations with a [t] to indicate things that I believe to -be relatively trivial to implement in LLVM itself. The time consuming -things to reimplement would be SSA based PRE, Strength reduction & loop -unrolling... these would be the major things we would miss out on if we -did LLVM creation from tree code [inlining and other high level -optimizations are done on the tree representation]. - -Given the lack of "strong" optimizations that would take a long time to -reimplement, I am leaning a bit more towards creating LLVM from the tree -code. Especially given that SGI has GPL'd their compiler, including many -SSA based optimizations that could be adapted (besides the fact that their -code looks MUCH nicer than GCC :) - -Even if we choose to do LLVM code emission from RTL, we will almost -certainly want to move LLVM emission from step 8 down until at least CSE -has been rerun... which causes me to wonder if the SSA generation code -will still work (due to global variable dependencies and stuff). I assume -that it can be made to work, but might be a little more involved than we -would like. - -I'm continuing to look at the Tree -> RTL code. 
It is pretty gross
-because they do some of the translation a statement at a time, and some
-of it a function at a time... I'm not quite clear why and how the
-distinction is drawn, but it does not appear that there is a wonderful
-place to attach extra info.
-
-Anyways, I'm proceeding with the RTL -> LLVM conversion phase for now. We
-can talk about this more on Monday.
-
-Wouldn't it be nice if there were an obvious decision to be made? :)
-
--Chris
-
diff --git a/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt b/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt
deleted file mode 100644
index 6c9e0971a0..0000000000
--- a/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt
+++ /dev/null
@@ -1,71 +0,0 @@
-Date: Fri, 1 Jun 2001 17:08:44 -0500 (CDT)
-From: Chris Lattner <sabre@nondot.org>
-To: Vikram S. Adve <vadve@cs.uiuc.edu>
-Subject: RE: Interesting: GCC passes
-
-> That is very interesting. I agree that some of these could be done on LLVM
-> at link-time, but it is the extra time required that concerns me. Link-time
-> optimization is severely time-constrained.
-
-If we were to reimplement any of these optimizations, I assume that we
-could do them a translation unit at a time, just as GCC does now. This
-would lead to a pipeline like this:
-
-Static optimizations, xlation unit at a time:
-.c --GCC--> .llvm --llvmopt--> .llvm
-
-Link time optimizations:
-.llvm --llvm-ld--> .llvm --llvm-link-opt--> .llvm
-
-Of course, many optimizations could be shared between llvmopt and
-llvm-link-opt, but they wouldn't need to be shared... Thus compile time
-could be faster, because we are using a "smarter" IR (SSA based).
-
-> BTW, about SGI, "borrowing" SSA-based optimizations from one compiler and
-> putting it into another is not necessarily easier than re-doing it.
-> Optimization code is usually heavily tied in to the specific IR they use.
-
-Understood. 
The only reason that I brought this up is because SGI's IR is
-more similar to LLVM than it is different in many respects (SSA based,
-relatively low level, etc), and could be easily adapted. Also their
-optimizations are written in C++ and are actually somewhat
-structured... of course it would be no walk in the park, but it would be
-much less time consuming to adapt, say, SSA-PRE than to rewrite it.
-
-> But your larger point is valid that adding SSA based optimizations is
-> feasible and should be fun. (Again, link time cost is the issue.)
-
-Assuming link time cost wasn't an issue, the question is:
-Does using GCC's backend buy us anything?
-
-> It also occurs to me that GCC is probably doing quite a bit of back-end
-> optimization (step 16 in your list). Do you have a breakdown of that?
-
-Not really. The irritating part of GCC is that it mixes it all up and
-doesn't have a clean separation of concerns. A lot of the "back end
-optimization" happens right along with other data optimizations (i.e., CSE
-of machine-specific things).
-
-As far as REAL back end optimizations go, it looks something like this:
-
-1. Instruction combination: try to make CISCy instructions, if available
-2. Register movement: try to get registers in the right places for the
-architecture to avoid register to register moves. For example, try to get
-the first argument of a function to naturally land in %o0 for sparc.
-3. Instruction scheduling: 'nuff said :)
-4. Register class preferencing: ??
-5. Local register allocation
-6. Global register allocation
-7. Spilling
-8. Local regalloc
-9. Jump optimization
-10. Delay slot scheduling
-11. Branch shortening for CISC machines
-12. Instruction selection & peephole optimization
-13. Debug info output
-
-But none of this would be usable for LLVM anyways, unless we were using
-GCC as a static compiler. 
- --Chris - diff --git a/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt b/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt deleted file mode 100644 index 1bc2eae746..0000000000 --- a/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt +++ /dev/null @@ -1,30 +0,0 @@ -Date: Wed, 20 Jun 2001 12:32:22 -0500 -From: Vikram Adve <vadve@cs.uiuc.edu> -To: Chris Lattner <lattner@cs.uiuc.edu> -Subject: .NET vs. our VM - -One significant difference between .NET CLR and our VM is that the CLR -includes full information about classes and inheritance. In fact, I just -sat through the paper on adding templates to .NET CLR, and the speaker -indicated that the goal seems to be to do simple static compilation (very -little lowering or optimization). Also, the templates implementation in CLR -"relies on dynamic class loading and JIT compilation". - -This is an important difference because I think there are some significant -advantages to have a much lower level VM layer, and do significant static -analysis and optimization. - -I also talked to the lead guy for KAI's C++ compiler (Arch Robison) and he -said that SGI and other commercial compilers have included options to export -their *IR* next to the object code (i.e., .il files) and use them for -link-time code generation. In fact, he said that the .o file was nearly -empty and was entirely generated from the .il at link-time. But he agreed -that this limited the link-time interprocedural optimization to modules -compiled by the same compiler, whereas our approach allows us to link and -optimize modules from multiple different compilers. (Also, of course, they -don't do anything for runtime optimization). - -All issues to bring up in Related Work. 
- ---Vikram - diff --git a/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt b/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt deleted file mode 100644 index 3e10416fe6..0000000000 --- a/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt +++ /dev/null @@ -1,31 +0,0 @@ -Date: Fri, 6 Jul 2001 16:56:56 -0500 -From: Vikram S. Adve <vadve@cs.uiuc.edu> -To: Chris Lattner <lattner@cs.uiuc.edu> -Subject: lowering the IR - -BTW, I do think that we should consider lowering the IR as you said. I -didn't get time to raise it today, but it comes up with the SPARC -move-conditional instruction. I don't think we want to put that in the core -VM -- it is a little too specialized. But without a corresponding -conditional move instruction in the VM, it is pretty difficult to maintain a -close mapping between VM and machine code. Other architectures may have -other such instructions. - -What I was going to suggest was that for a particular processor, we define -additional VM instructions that match some of the unusual opcodes on the -processor but have VM semantics otherwise, i.e., all operands are in SSA -form and typed. This means that we can re-generate core VM code from the -more specialized code any time we want (so that portability is not lost). - -Typically, a static compiler like gcc would generate just the core VM, which -is relatively portable. Anyone (an offline tool, the linker, etc., or even -the static compiler itself if it chooses) can transform that into more -specialized target-specific VM code for a particular architecture. If the -linker does it, it can do it after all machine-independent optimizations. -This would be the most convenient, but not necessary. - -The main benefit of lowering will be that we will be able to retain a close -mapping between VM and machine code. 
-
---Vikram
-
diff --git a/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt b/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt
deleted file mode 100644
index 9379081018..0000000000
--- a/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt
+++ /dev/null
@@ -1,56 +0,0 @@
-Date: Tue, 18 Sep 2001 00:38:37 -0500 (CDT)
-From: Chris Lattner <sabre@nondot.org>
-To: Vikram S. Adve <vadve@cs.uiuc.edu>
-Subject: Idea for a simple, useful link time optimization
-
-
-In C++ programs, exceptions suck, and here's why:
-
-1. In virtually all function calls, you must assume that the function
-   throws an exception, unless it is defined as 'nothrow'. This means
-   that every function call has to have code to invoke dtors on objects
-   locally if one is thrown by the function. Most functions don't throw
-   exceptions, so this code is dead [with all the bad effects of dead
-   code, including icache pollution].
-2. Declaring a function nothrow causes catch blocks to be added to every
-   call that is not provably nothrow. This makes them very slow.
-3. Extra extraneous exception edges reduce the opportunity for code
-   motion.
-4. EH is typically implemented with large lookup tables. Ours is going to
-   be much smaller (than the "standard" way of doing it) to start with,
-   but eliminating it entirely would be nice. :)
-5. It is physically impossible to correctly put (accurate, correct)
-   exception specifications on generic, templated code. But it is trivial
-   to analyze instantiations of said code.
-6. Most large C++ programs throw few exceptions. Most well designed
-   programs only throw exceptions in specific planned portions of the
-   code.
-
-Given our _planned_ model of handling exceptions, all of this would be
-pretty trivial to eliminate through some pretty simplistic interprocedural
-analysis. The DCE factor alone could probably be pretty significant. The
-extra code motion opportunities could also be exploited though... 
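The interprocedural analysis hinted at above can be sketched as a simple fixed-point computation over a call graph: a function is provably nothrow only if it contains no throw itself and every callee is provably nothrow, with unknown or external callees conservatively assumed to throw. The types and names below are invented for illustration; a real pass would walk LLVM's actual call graph rather than this toy encoding.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy call-graph node: does the body throw, and whom does it call?
struct Func {
    bool hasThrow;
    std::vector<std::string> callees;
};

using CallGraph = std::map<std::string, Func>;

// Optimistically assume everything is nothrow, then demote any function
// that throws, or that calls something not known to be nothrow
// (including callees that are not in the graph at all), until nothing
// changes. This is the conservative direction: we only ever demote.
std::set<std::string> inferNothrow(const CallGraph &CG) {
    std::set<std::string> nothrow;
    for (const auto &KV : CG)
        nothrow.insert(KV.first);
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto &KV : CG) {
            if (!nothrow.count(KV.first))
                continue;
            bool mayThrow = KV.second.hasThrow;
            for (const auto &callee : KV.second.callees)
                if (!CG.count(callee) || !nothrow.count(callee))
                    mayThrow = true;    // unknown/throwing callee
            if (mayThrow) {
                nothrow.erase(KV.first);
                changed = true;
            }
        }
    }
    return nothrow;
}
```

Everything outside the resulting set keeps its cleanup code; everything inside it can have the dead EH edges and cleanup blocks removed, which is the DCE payoff the email describes.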
-
-Additionally, this optimization can be implemented in a straightforward
-conservative manner, allowing libraries to be optimized or individual
-files even (if there are leaf functions visible in the translation unit
-that are called).
-
-I think it's a reasonable optimization that hasn't really been addressed
-(because assembly is way too low level for this), and could have decent
-payoffs... without being an overly complex optimization.
-
-After I wrote all of that, I found this page that is talking about
-basically the same thing I just wrote, except that it is translation unit
-at a time, tree based approach:
-http://www.ocston.org/~jls/ehopt.html
-
-but is very useful from "expected gain" and references perspective. Note
-that their compiler is apparently unable to inline functions that use
-exceptions, so their numbers are pretty worthless... also our results
-would (hopefully) be better because it's interprocedural...
-
-What do you think?
-
--Chris
-
diff --git a/docs/HistoricalNotes/2002-05-12-InstListChange.txt b/docs/HistoricalNotes/2002-05-12-InstListChange.txt
deleted file mode 100644
index 004edb068d..0000000000
--- a/docs/HistoricalNotes/2002-05-12-InstListChange.txt
+++ /dev/null
@@ -1,55 +0,0 @@
-Date: Sun, 12 May 2002 17:12:53 -0500 (CDT)
-From: Chris Lattner <sabre@nondot.org>
-To: "Vikram S. Adve" <vadve@cs.uiuc.edu>
-Subject: LLVM change
-
-There is a fairly fundamental change that I would like to make to the LLVM
-infrastructure, but I'd like to know if you see any drawbacks that I
-don't...
-
-Basically right now at the basic block level, each basic block contains an
-instruction list (returned by getInstList()) that is a ValueHolder of
-instructions. To iterate over instructions, we must actually iterate over
-the instlist, and access the instructions through the instlist. 
-
-To add or remove an instruction from a basic block, we need to get an
-iterator to an instruction, which, given just an Instruction*, requires a
-linear search of the basic block the instruction is contained in... just
-to insert an instruction before another instruction, or to delete an
-instruction! This complicates algorithms that should be very simple (like
-simple constant propagation), because they aren't actually sparse anymore,
-they have to traverse basic blocks to remove constant propagated
-instructions.
-
-Additionally, adding instructions to or removing them from a basic block
-_invalidates all iterators_ pointing into that block, which is really
-irritating.
-
-To fix these problems (and others), I would like to make the ordering of
-the instructions be represented with a doubly linked list in the
-instructions themselves, instead of an external data structure. This is
-how many other representations do it, and frankly I can't remember why I
-originally implemented it the way I did.
-
-Long term, all of the code that depends on the nasty features in the
-instruction list (which can be found by grep'ing for getInstList()) will
-be changed to do nice local transformations. In the short term, I'll
-change the representation, but preserve the interface (including
-getInstList()) so that all of the code doesn't have to change.
-
-Iteration over the instructions in a basic block remains the simple:
-for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) ...
-
-But we will also support:
-for (Instruction *I = BB->front(); I; I = I->getNext()) ...
-
-After converting instructions over, I'll convert basic blocks and
-functions to have a similar interface.
-
-The only negative aspect of this change that I see is that it increases
-the amount of memory consumed by one pointer per instruction. Given the
-benefits, I think this is a very reasonable tradeoff.
-
-What do you think? 
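The proposed representation can be sketched as an intrusive doubly linked list: each instruction carries its own prev/next links, so insertion and removal are O(1) given only a node pointer, and iterators to other instructions stay valid. The class and member names below merely echo the email; LLVM's actual ilist implementation differs.

```cpp
#include <cassert>
#include <string>

// Minimal sketch of the proposed change: the ordering lives in the
// instructions themselves rather than in an external ValueHolder.
struct Instruction {
    std::string name;
    Instruction *prev = nullptr;
    Instruction *next = nullptr;
    explicit Instruction(const std::string &n) : name(n) {}
    Instruction *getNext() const { return next; }
};

struct BasicBlock {
    Instruction *head = nullptr;
    Instruction *tail = nullptr;

    Instruction *front() const { return head; }

    void push_back(Instruction *I) {
        I->prev = tail;
        I->next = nullptr;
        if (tail) tail->next = I; else head = I;
        tail = I;
    }

    // Unlink I in O(1): no linear search of the block required.
    void remove(Instruction *I) {
        if (I->prev) I->prev->next = I->next; else head = I->next;
        if (I->next) I->next->prev = I->prev; else tail = I->prev;
        I->prev = I->next = nullptr;
    }

    // Insert I immediately before Pos, also O(1).
    void insertBefore(Instruction *Pos, Instruction *I) {
        I->prev = Pos->prev;
        I->next = Pos;
        if (Pos->prev) Pos->prev->next = I; else head = I;
        Pos->prev = I;
    }
};
```

The `for (Instruction *I = BB.front(); I; I = I->getNext())` loop mentioned above falls straight out of this layout, and removing the node an iterator points at is the only operation that can invalidate that iterator.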
- --Chris diff --git a/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt b/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt deleted file mode 100644 index 2ca46117ca..0000000000 --- a/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt +++ /dev/null @@ -1,72 +0,0 @@ -Changes: -* Change the casting code to be const correct. Now, doing this is invalid: - const Value *V = ...; - Instruction *I = dyn_cast<Instruction>(V); - instead, the second line should be: - const Instruction *I = dyn_cast<Instruction>(V); - -* Change the casting code to allow casting a reference value thus: - const Value &V = ...; - Instruction &I = cast<Instruction>(V); - - dyn_cast does not work with references, because it must return a null pointer - on failure. - -* Fundamentally change how instructions and other values are represented. - Before, every llvm container was an instance of the ValueHolder template, - instantiated for each container type. This ValueHolder was effectively a - wrapper around a vector of pointers to the sub-objects. - - Now, instead of having a vector to pointers of objects, the objects are - maintained in a doubly linked list of values (ie each Instruction now has - Next & Previous fields). The containers are now instances of ilist (intrusive - linked list class), which use the next and previous fields to chain them - together. The advantage of this implementation is that iterators can be - formed directly from pointers to the LLVM value, and invalidation is much - easier to handle. - -* As part of the above change, dereferencing an iterator (for example: - BasicBlock::iterator) now produces a reference to the underlying type (same - example: Instruction&) instead of a pointer to the underlying object. 
This
-  makes it much easier to write nested loops that iterate over things,
-  changing this:
-
-    for (Function::iterator BI = Func->begin(); BI != Func->end(); ++BI)
-      for (BasicBlock::iterator II = (*BI)->begin(); II != (*BI)->end(); ++II)
-        (*II)->dump();
-
-  into:
-
-    for (Function::iterator BI = Func->begin(); BI != Func->end(); ++BI)
-      for (BasicBlock::iterator II = BI->begin(); II != BI->end(); ++II)
-        II->dump();
-
-  which is much more natural and what users expect.
-
-* Simplification of #include's: Before, it was necessary for a .cpp file to
-  include every .h file that it used. Now things are batched a little bit more
-  to make it easier to use. Specifically, the include graph now includes these
-  edges:
-    Module.h -> Function.h, GlobalVariable.h
-    Function.h -> BasicBlock.h, Argument.h
-    BasicBlock.h -> Instruction.h
-
-  Which means that #including Function.h is usually sufficient for getting the
-  lower level #includes.
-
-* Printing out a Value* has now changed: Printing a Value* will soon print out
-  the address of the value instead of the contents of the Value. To print out
-  the contents, you must convert it to a reference with (for example)
-  'cout << *I' instead of 'cout << I;'. This conversion is not yet complete,
-  but will be eventually. In the meantime, both forms print out the contents.
-
-* References are used much more throughout the code base. In general, if a
-  pointer is known to never be null, it is passed in as a reference instead of a
-  pointer. For example, the instruction visitor class uses references instead
-  of pointers, and Pass subclasses now all receive references to Values
-  instead of pointers, because they can never be null.
-
-* The Function class now has helper functions for accessing the Arguments list.
-  Instead of having to go through getArgumentList for simple things like
-  iterating over the arguments, the a*() methods can now be used to access them. 
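The casting rules described in the first two bullets can be modeled with toy templates. This is only a sketch: the real `cast<>`/`dyn_cast<>` dispatch through `classof()` and avoid C++ RTTI entirely, whereas `dynamic_cast` is used here purely to keep the example short.

```cpp
#include <cassert>

// Toy class hierarchy standing in for LLVM's Value/Instruction.
struct Value { virtual ~Value() {} };
struct Instruction : Value { int opcode = 0; };

// dyn_cast<>: pointer in, pointer out, null on failure. The const
// overload ensures a const input yields a const result, so constness
// can no longer be silently dropped (the const-correctness bullet).
template <typename To, typename From>
To *dyn_cast(From *V) { return dynamic_cast<To *>(V); }

template <typename To, typename From>
const To *dyn_cast(const From *V) { return dynamic_cast<const To *>(V); }

// cast<>: works on references and asserts instead of returning null,
// since a reference has no null state with which to report failure
// (which is exactly why dyn_cast does not take references).
template <typename To, typename From>
To &cast(From &V) {
    To *Result = dynamic_cast<To *>(&V);
    assert(Result && "cast<> of incompatible value!");
    return *Result;
}
```

With these definitions, `Instruction *I = dyn_cast<Instruction>(constValuePtr)` fails to compile, which is the behavior the first bullet describes.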
- diff --git a/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt b/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt deleted file mode 100644 index fbe811d627..0000000000 --- a/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt +++ /dev/null @@ -1,28 +0,0 @@ -Date: Mon, 20 Jan 2003 00:00:28 -0600 -From: Brian R. Gaeke <gaeke@uiuc.edu> -Subject: windows vs. llvm - -If you're interested, here are some of the major problems compiling LLVM -under Cygwin and/or Mingw. - -1. Cygwin doesn't have <inttypes.h> or <stdint.h>, so all the INT*_MAX - symbols and standard int*_t types are off in limbo somewhere. Mingw has - <stdint.h>, but Cygwin doesn't like it. - -2. Mingw doesn't have <dlfcn.h> (because Windows doesn't have it.) - -3. SA_SIGINFO and friends are not around; only signal() seems to work. - -4. Relink, aka ld -r, doesn't work (probably an ld bug); you need - DONT_BUILD_RELINKED. This breaks all the tools makefiles; you just need to - change them to have .a's. - -5. There isn't a <values.h>. - -6. There isn't a mallinfo() (or, at least, it's documented, but it doesn't seem - to link). - -7. The version of Bison that cygwin (and newer Linux versions) comes with - does not like = signs in rules. Burg's gram.yc source file uses them. I think - you can just take them out. - diff --git a/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt b/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt deleted file mode 100644 index a745784639..0000000000 --- a/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt +++ /dev/null @@ -1,137 +0,0 @@ -Wed Jun 25 15:13:51 CDT 2003 - -First-level instrumentation ---------------------------- - -We use opt to do Bytecode-to-bytecode instrumentation. Look at -back-edges and insert llvm_first_trigger() function call which takes -no arguments and no return value. This instrumentation is designed to -be easy to remove, for instance by writing a NOP over the function -call instruction. 
- -Keep count of every call to llvm_first_trigger(), and maintain -counters in a map indexed by return address. If the trigger count -exceeds a threshold, we identify a hot loop and perform second-level -instrumentation on the hot loop region (the instructions between the -target of the back-edge and the branch that causes the back-edge). We -do not move code across basic-block boundaries. - - -Second-level instrumentation ---------------------------- - -We remove the first-level instrumentation by overwriting the CALL to -llvm_first_trigger() with a NOP. - -The reoptimizer maintains a map between machine-code basic blocks and -LLVM BasicBlock*s. We only keep track of paths that start at the -first machine-code basic block of the hot loop region. - -How do we keep track of which edges to instrument, and which edges are -exits from the hot region? 3 step process. - -1) Do a DFS from the first machine-code basic block of the hot loop -region and mark reachable edges. - -2) Do a DFS from the last machine-code basic block of the hot loop -region IGNORING back edges, and mark the edges which are reachable in -1) and also in 2) (i.e., must be reachable from both the start BB and -the end BB of the hot region). - -3) Mark BBs which end in edges that exit the hot region; we need to -instrument these differently. - -Assume that there is 1 free register. On SPARC we use %g1, which LLC -has agreed not to use. Shift a 1 into it at the beginning. At every -edge which corresponds to a conditional branch, we shift 0 for not -taken and 1 for taken into a register. This uniquely numbers the paths -through the hot region. Silently fail if we need more than 64 bits. - -At the end BB we call countPath and increment the counter based on %g1 -and the return address of the countPath call. We keep track of the -number of iterations and the number of paths. We only run this -version 30 or 40 times. 
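The path-numbering trick described above is easy to model in plain C++: the register (a `uint64_t` standing in for SPARC's `%g1`) starts at 1, each conditional branch shifts in 0 for not-taken or 1 for taken, and the final value keys a per-path counter as `countPath` would. The struct name and bookkeeping here are illustrative, not the Reoptimizer's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Sketch of the path-register encoding: the leading 1 shifted in at the
// start records how many branch bits follow, so distinct-length paths
// can never collide.
struct PathProfiler {
    std::map<std::uint64_t, unsigned> counts; // per-path counters
    unsigned iterations = 0;

    // Record one trip through the hot region given its branch outcomes.
    // Returns false (the "silently fail" case) when more than 63
    // branches would overflow the 64-bit register.
    bool recordPath(const std::vector<bool> &taken) {
        if (taken.size() > 63)
            return false;
        std::uint64_t path = 1;               // shift a 1 in at the beginning
        for (bool t : taken)
            path = (path << 1) | (t ? 1 : 0); // one bit per conditional branch
        ++counts[path];                       // what countPath() would do
        ++iterations;
        return true;
    }
};
```

For example, the outcome sequence taken/not-taken encodes as 1 → 3 → 6, so all iterations following that path bump the same counter; afterwards the hottest keys identify the paths that make up 90% of execution.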
- -Find the BBs that total 90% or more of execution, and aggregate them -together to form our trace. But we do not allow more than 5 paths; if -we have more than 5 we take the ones that are executed the most. We -verify our assumption that we picked a hot back-edge in first-level -instrumentation, by making sure that the number of times we took an -exit edge from the hot trace is less than 10% of the number of -iterations. - -LLC has been taught to recognize llvm_first_trigger() calls and NOT -generate saves and restores of caller-saved registers around these -calls. - - -Phase behavior --------------- - -We turn off llvm_first_trigger() calls with NOPs, but this would hide -phase behavior from us (when some funcs/traces stop being hot and -others become hot.) - -We have a SIGALRM timer that counts time for us. Every time we get a -SIGALRM we look at our priority queue of locations where we have -removed llvm_first_trigger() calls. Each location is inserted along -with a time when we will next turn instrumentation back on for that -call site. If the time has arrived for a particular call site, we pop -that off the prio. queue and turn instrumentation back on for that -call site. - - -Generating traces ------------------ - -When we finally generate an optimized trace we first copy the code -into the trace cache. This leaves us with 3 copies of the code: the -original code, the instrumented code, and the optimized trace. The -optimized trace does not have instrumentation. The original code and -the instrumented code are modified to have a branch to the trace -cache, where the optimized traces are kept. - -We copy the code from the original to the instrumentation version -by tracing the LLVM-to-Machine code basic block map and then copying -each machine code basic block we think is in the hot region into the -trace cache. Then we instrument that code. 
The process is similar for -generating the final optimized trace; we copy the same basic blocks -because we might need to put in fixup code for exit BBs. - -LLVM basic blocks are not typically used in the Reoptimizer except -for the mapping information. - -We are restricted to using single instructions to branch between the -original code, trace, and instrumented code. So we have to keep the -code copies in memory near the original code (they can't be far enough -away that a single pc-relative branch would not work.) Malloc() or -data region space is too far away. this impacts the design of the -trace cache. - -We use a dummy function that is full of a bunch of for loops which we -overwrite with trace-cache code. The trace manager keeps track of -whether or not we have enough space in the trace cache, etc. - -The trace insertion routine takes an original start address, a vector -of machine instructions representing the trace, index of branches and -their corresponding absolute targets, and index of calls and their -corresponding absolute targets. - -The trace insertion routine is responsible for inserting branches from -the beginning of the original code to the beginning of the optimized -trace. This is because at some point the trace cache may run out of -space and it may have to evict a trace, at which point the branch to -the trace would also have to be removed. It uses a round-robin -replacement policy; we have found that this is almost as good as LRU -and better than random (especially because of problems fitting the new -trace in.) - -We cannot deal with discontiguous trace cache areas. The trace cache -is supposed to be cache-line-aligned, but it is not page-aligned. - -We generate instrumentation traces and optimized traces into separate -trace caches. We keep the instrumented code around because you don't -want to delete a trace when you still might have to return to it -(i.e., return from a llvm_first_trigger() or countPath() call.) 
- - diff --git a/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt b/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt deleted file mode 100644 index ec4b93fea0..0000000000 --- a/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt +++ /dev/null @@ -1,110 +0,0 @@ -Thu Jun 26 14:43:04 CDT 2003 - -Information about BinInterface ------------------------------- - -Take in a set of instructions with some particular register -allocation. It allows you to add, modify, or delete some instructions, -in SSA form (kind of like LLVM's MachineInstrs.) Then re-allocate -registers. It assumes that the transformations you are doing are safe. -It does not update the mapping information or the LLVM representation -for the modified trace (so it would not, for instance, support -multiple optimization passes; passes have to be aware of and update -manually the mapping information.) - -The way you use it is you take the original code and provide it to -BinInterface; then you do optimizations to it, then you put it in the -trace cache. - -The BinInterface tries to find live-outs for traces so that it can do -register allocation on just the trace, and stitch the trace back into -the original code. It has to preserve the live-ins and live-outs when -it does its register allocation. (On exits from the trace we have -epilogues that copy live-outs back into the right registers, but -live-ins have to be in the right registers.) - - -Limitations of BinInterface ---------------------------- - -It does copy insertions for PHIs, which it infers from the machine -code. The mapping info inserted by LLC is not sufficient to determine -the PHIs. - -It does not handle integer or floating-point condition codes and it -does not handle floating-point register allocation. - -It is not aggressively able to use lots of registers. - -There is a problem with alloca: we cannot find our spill space for -spilling registers, normally allocated on the stack, if the trace -follows an alloca(). 
What might be an acceptable solution would be to -disable trace generation on functions that have variable-sized -alloca()s. Variable-sized allocas in the trace would also probably -screw things up. - -Because of the FP and alloca limitations, the BinInterface is -completely disabled right now. - - -Demo ----- - -This is a demo of the Ball & Larus version that does NOT use 2-level -profiling. - -1. Compile program with llvm-gcc. -2. Run opt -lowerswitch -paths -emitfuncs on the bytecode. - -lowerswitch change switch statements to branches - -paths Ball & Larus path-profiling algorithm - -emitfuncs emit the table of functions -3. Run llc to generate SPARC assembly code for the result of step 2. -4. Use g++ to link the (instrumented) assembly code. - -We use a script to do all this: ------------------------------------------------------------------------------- -#!/bin/sh -llvm-gcc $1.c -o $1 -opt -lowerswitch -paths -emitfuncs $1.bc > $1.run.bc -llc -f $1.run.bc -LIBS=$HOME/llvm_sparc/lib/Debug -GXX=/usr/dcs/software/evaluation/bin/g++ -$GXX -g -L $LIBS $1.run.s -o $1.run.llc \ -$LIBS/tracecache.o \ -$LIBS/mapinfo.o \ -$LIBS/trigger.o \ -$LIBS/profpaths.o \ -$LIBS/bininterface.o \ -$LIBS/support.o \ -$LIBS/vmcore.o \ -$LIBS/transformutils.o \ -$LIBS/bcreader.o \ --lscalaropts -lscalaropts -lanalysis \ --lmalloc -lcpc -lm -ldl ------------------------------------------------------------------------------- - -5. Run the resulting binary. You will see output from BinInterface -(described below) intermixed with the output from the program. - - -Output from BinInterface ------------------------- - -BinInterface's debugging code prints out the following stuff in order: - -1. Initial code provided to BinInterface with original register -allocation. - -2. Section 0 is the trace prolog, consisting mainly of live-ins and -register saves which will be restored in epilogs. - -3. 
Section 1 is the trace itself, in SSA form used by BinInterface, -along with the PHIs that are inserted. -PHIs are followed by the copies that implement them. -Each branch (i.e., out of the trace) is annotated with the -section number that represents the epilog it branches to. - -4. All the other sections starting with Section 2 are trace epilogs. -Every branch from the trace has to go to some epilog. - -5. After the last section is the register allocation output. diff --git a/docs/HistoricalNotes/2007-OriginalClangReadme.txt b/docs/HistoricalNotes/2007-OriginalClangReadme.txt deleted file mode 100644 index 611dc9d2c0..0000000000 --- a/docs/HistoricalNotes/2007-OriginalClangReadme.txt +++ /dev/null @@ -1,178 +0,0 @@ -//===----------------------------------------------------------------------===// -// C Language Family Front-end -//===----------------------------------------------------------------------===// - Chris Lattner - -I. Introduction: - - clang: noun - 1. A loud, resonant, metallic sound. - 2. The strident call of a crane or goose. - 3. C-language family front-end toolkit. - - The world needs better compiler tools, tools which are built as libraries. This - design point allows reuse of the tools in new and novel ways. However, building - the tools as libraries isn't enough: they must have clean APIs, be as - decoupled from each other as possible, and be easy to modify/extend. This - requires clean layering, decent design, and avoiding tying the libraries to a - specific use. Oh yeah, did I mention that we want the resultant libraries to - be as fast as possible? :) - - This front-end is built as a component of the LLVM toolkit that can be used - with the LLVM backend or independently of it. In this spirit, the API has been - carefully designed as the following components: - - libsupport - Basic support library, reused from LLVM. - - libsystem - System abstraction library, reused from LLVM. 
   libbasic    - Diagnostics, SourceLocations, SourceBuffer abstraction,
                 file system caching for input source files.  This depends
                 on libsupport and libsystem.

   libast      - Provides classes to represent the C AST, the C type system,
                 builtin functions, and various helpers for analyzing and
                 manipulating the AST (visitors, pretty printers, etc).
                 This library depends on libbasic.

   liblex      - C/C++/ObjC lexing and preprocessing, identifier hash table,
                 pragma handling, tokens, and macros.  This depends on
                 libbasic.

   libparse    - C (for now) parsing and local semantic analysis.  This
                 library invokes coarse-grained 'Actions' provided by the
                 client to do stuff (e.g. libsema builds ASTs).  This
                 depends on liblex.

   libsema     - Provides a set of parser actions to build a standardized
                 AST for programs.  ASTs are 'streamed' out one top-level
                 declaration at a time, allowing clients to use
                 decl-at-a-time processing, build up entire translation
                 units, or even build 'whole program' ASTs, depending on how
                 they use the APIs.  This depends on libast and libparse.

   librewrite  - Fast, scalable rewriting of source code.  This operates on
                 the raw syntactic text of source code, allowing a client
                 to insert and delete text in very large source files using
                 the same source location information embedded in ASTs.
                 This is intended to be a low-level API that is useful for
                 higher-level clients and libraries such as code
                 refactoring.

   libanalysis - Source-level dataflow analysis useful for performing
                 analyses such as computing live variables.  It also
                 includes a path-sensitive "graph-reachability" engine for
                 writing analyses that reason about different possible
                 paths of execution through source code.  This is currently
                 being employed to write a set of checks for finding bugs
                 in software.

   libcodegen  - Lowers the AST to LLVM IR for optimization & codegen.
                 Depends on libast.

   clang       - An example driver, client of the libraries at various
                 levels.
                 This depends on all of these libraries, and on LLVM
                 VMCore.

 This front-end has been intentionally built as a DAG of libraries, making
 it easy to reuse individual parts or replace pieces if desired. For
 example, to build a preprocessor, you take the Basic and Lexer libraries.
 If you want an indexer, you take those plus the Parser library and provide
 some actions for indexing. If you want a refactoring, static analysis, or
 source-to-source compiler tool, it makes sense to take those plus the AST
 building and semantic analyzer library. Finally, if you want to use this
 with the LLVM backend, you'd take these components plus the AST-to-LLVM
 lowering code.

 In the future I hope this toolkit will grow to include new and interesting
 components, including a C++ front-end, ObjC support, and a whole lot of
 other things.

 Finally, it should be pointed out that the goal here is to build something
 that is high-quality and industrial-strength: all the obnoxious features
 of the C family must be correctly supported (trigraphs, preprocessor
 arcana, K&R-style prototypes, GCC/MS extensions, etc). It cannot be used
 if it is not 'real'.


II. Usage of clang driver:

 * Basic Command-Line Options:
   - Help: clang --help
   - Standard GCC options accepted: -E, -I*, -i*, -pedantic, -std=c90, etc.
   - To make diagnostics more gcc-like: -fno-caret-diagnostics
     -fno-show-column
   - Enable metric printing: -stats

 * -fsyntax-only is currently the default mode.

 * -E mode works the same way as GCC.

 * -Eonly mode does all preprocessing, but does not print the output.
   This is useful for timing the preprocessor.

 * -fsyntax-only is currently partially implemented, lacking some semantic
   analysis (some errors and warnings are not produced).

 * -parse-noop parses code without building an AST.  This is useful for
   timing the cost of the parser without including AST building time.

 * -parse-ast builds ASTs, but doesn't print them.
   This is most useful for timing AST building in comparison with
   -parse-noop.

 * -parse-ast-print pretty-prints most expression and statement nodes.

 * -parse-ast-check checks that the diagnostic messages that are expected
   are reported, and that those which are reported are expected.

 * -dump-cfg builds ASTs and then CFGs.  The CFGs are then pretty-printed.

 * -view-cfg builds ASTs and then CFGs.  The CFGs are then visualized by
   invoking Graphviz.

   For more information on getting Graphviz to work with clang/LLVM, see:
   http://llvm.org/docs/ProgrammersManual.html#ViewGraph


III. Current advantages over GCC:

 * Column numbers are fully tracked (no 256-column limit, no GCC-style
   pruning).
 * All diagnostics have column numbers, include 'caret diagnostics', and
   highlight regions of interesting code (e.g. the LHS and RHS of a binop).
 * Full diagnostic customization by the client (clients can format
   diagnostics however they like, e.g. in an IDE or refactoring tool)
   through the DiagnosticClient interface.
 * Built as a framework, can be reused by multiple tools.
 * All supported languages are linked into the same library (no cc1,
   cc1obj, ...).
 * mmap's code in read-only, does not dirty the pages like GCC (reducing
   memory footprint).
 * LLVM License, can be linked into non-GPL projects.
 * Full diagnostic control, per diagnostic.  Diagnostics are identified by
   ID.
 * Significantly faster than GCC at semantic analysis, parsing,
   preprocessing and lexing.
 * Defers exposing platform-specific details to as late as possible, and
   tracks use of platform-specific features (e.g. #ifdef PPC) to allow
   'portable bytecodes'.
 * The lexer doesn't rely on the "lexer hack": it has no notion of scope
   and does not categorize identifiers as types or variables -- this is up
   to the parser to decide.

Potential Future Features:

 * Fine-grained diagnostic control within the source
   (#pragma enable/disable warning).
 * Better token tracking within macros?
   (Token came from this line, which is a macro argument instantiated
   here, recursively instantiated here).
 * Fast #import with a module system.
 * Dependency tracking: a change to a header file doesn't recompile every
   function that textually depends on it: recompile only those functions
   that need it.  This is also known as 'incremental parsing'.


IV. Missing Functionality / Improvements

Lexer:
 * Source character mapping.  GCC supports ASCII and UTF-8.
   See GCC options: -ftarget-charset and -ftarget-wide-charset.
 * Universal character support.  Experimental in GCC, enabled with
   -fextended-identifiers.
 * -fpreprocessed mode.

Preprocessor:
 * #assert/#unassert
 * MSExtension: "L#param" stringizes to a wide string literal.
 * Add support for -M*

Traditional Preprocessor:
 * Currently, we have none. :)