author    mike-m <mikem.llvm@gmail.com>  2010-05-07 00:28:04 +0000
committer mike-m <mikem.llvm@gmail.com>  2010-05-07 00:28:04 +0000
commit    e2c3a49c8029ebd9ef530101cc24c66562e3dff5 (patch)
tree      91bf9600cc8df90cf99751a8f8bafc317cffc91e /docs/HistoricalNotes
parent    c10b5afbe8138b0fdf3af4ed3e1ddf96cf3cb4cb (diff)
Revert r103213. It broke several sections of live website.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@103219 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs/HistoricalNotes')
27 files changed, 2287 insertions, 0 deletions
diff --git a/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt b/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt new file mode 100644 index 0000000000..f086181192 --- /dev/null +++ b/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt @@ -0,0 +1,74 @@ +Date: Sat, 18 Nov 2000 09:19:35 -0600 (CST) +From: Vikram Adve <vadve@cs.uiuc.edu> +To: Chris Lattner <lattner@cs.uiuc.edu> +Subject: a few thoughts + +I've been mulling over the virtual machine problem and I had some +thoughts about some things for us to think about discuss: + +1. We need to be clear on our goals for the VM. Do we want to emphasize + portability and safety like the Java VM? Or shall we focus on the + architecture interface first (i.e., consider the code generation and + processor issues), since the architecture interface question is also + important for portable Java-type VMs? + + This is important because the audiences for these two goals are very + different. Architects and many compiler people care much more about + the second question. The Java compiler and OS community care much more + about the first one. + + Also, while the architecture interface question is important for + Java-type VMs, the design constraints are very different. + + +2. Design issues to consider (an initial list that we should continue + to modify). Note that I'm not trying to suggest actual solutions here, + but just various directions we can pursue: + + a. A single-assignment VM, which we've both already been thinking about. + + b. A strongly-typed VM. One question is do we need the types to be + explicitly declared or should they be inferred by the dynamic compiler? + + c. How do we get more high-level information into the VM while keeping + to a low-level VM design? + + o Explicit array references as operands? An alternative is + to have just an array type, and let the index computations be + separate 3-operand instructions. 
+ + o Explicit instructions to handle aliasing, e.g.s: + -- an instruction to say "I speculate that these two values are not + aliased, but check at runtime", like speculative execution in + EPIC? + -- or an instruction to check whether two values are aliased and + execute different code depending on the answer, somewhat like + predicated code in EPIC + + o (This one is a difficult but powerful idea.) + A "thread-id" field on every instruction that allows the static + compiler to generate a set of parallel threads, and then have + the runtime compiler and hardware do what they please with it. + This has very powerful uses, but thread-id on every instruction + is expensive in terms of instruction size and code size. + We would need to compactly encode it somehow. + + Also, this will require some reading on at least two other + projects: + -- Multiscalar architecture from Wisconsin + -- Simultaneous multithreading architecture from Washington + + o Or forget all this and stick to a traditional instruction set? + + +BTW, on an unrelated note, after the meeting yesterday, I did remember +that you had suggested doing instruction scheduling on SSA form instead +of a dependence DAG earlier in the semester. When we talked about +it yesterday, I didn't remember where the idea had come from but I +remembered later. Just giving credit where its due... + +Perhaps you can save the above as a file under RCS so you and I can +continue to expand on this. + +--Vikram + diff --git a/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt b/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt new file mode 100644 index 0000000000..1c725f5aa7 --- /dev/null +++ b/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt @@ -0,0 +1,199 @@ +Date: Sun, 19 Nov 2000 16:23:57 -0600 (CST) +From: Chris Lattner <sabre@nondot.org> +To: Vikram Adve <vadve@cs.uiuc.edu> +Subject: Re: a few thoughts + +Okay... 
here are a few of my thoughts on this (it's good to know that we +think so alike!): + +> 1. We need to be clear on our goals for the VM. Do we want to emphasize +> portability and safety like the Java VM? Or shall we focus on the +> architecture interface first (i.e., consider the code generation and +> processor issues), since the architecture interface question is also +> important for portable Java-type VMs? + +I forsee the architecture looking kinda like this: (which is completely +subject to change) + +1. The VM code is NOT guaranteed safe in a java sense. Doing so makes it + basically impossible to support C like languages. Besides that, + certifying a register based language as safe at run time would be a + pretty expensive operation to have to do. Additionally, we would like + to be able to statically eliminate many bounds checks in Java + programs... for example. + + 2. Instead, we can do the following (eventually): + * Java bytecode is used as our "safe" representation (to avoid + reinventing something that we don't add much value to). When the + user chooses to execute Java bytecodes directly (ie, not + precompiled) the runtime compiler can do some very simple + transformations (JIT style) to convert it into valid input for our + VM. Performance is not wonderful, but it works right. + * The file is scheduled to be compiled (rigorously) at a later + time. This could be done by some background process or by a second + processor in the system during idle time or something... + * To keep things "safe" ie to enforce a sandbox on Java/foreign code, + we could sign the generated VM code with a host specific private + key. Then before the code is executed/loaded, we can check to see if + the trusted compiler generated the code. This would be much quicker + than having to validate consistency (especially if bounds checks have + been removed, for example) + +> This is important because the audiences for these two goals are very +> different. 
Architects and many compiler people care much more about +> the second question. The Java compiler and OS community care much more +> about the first one. + +3. By focusing on a more low level virtual machine, we have much more room + for value add. The nice safe "sandbox" VM can be provided as a layer + on top of it. It also lets us focus on the more interesting compilers + related projects. + +> 2. Design issues to consider (an initial list that we should continue +> to modify). Note that I'm not trying to suggest actual solutions here, +> but just various directions we can pursue: + +Understood. :) + +> a. A single-assignment VM, which we've both already been thinking +> about. + +Yup, I think that this makes a lot of sense. I am still intrigued, +however, by the prospect of a minimally allocated VM representation... I +think that it could have definate advantages for certain applications +(think very small machines, like PDAs). I don't, however, think that our +initial implementations should focus on this. :) + +Here are some other auxilliary goals that I think we should consider: + +1. Primary goal: Support a high performance dynamic compilation + system. This means that we have an "ideal" division of labor between + the runtime and static compilers. Of course, the other goals of the + system somewhat reduce the importance of this point (f.e. portability + reduces performance, but hopefully not much) +2. Portability to different processors. Since we are most familiar with + x86 and solaris, I think that these two are excellent candidates when + we get that far... +3. Support for all languages & styles of programming (general purpose + VM). This is the point that disallows java style bytecodes, where all + array refs are checked for bounds, etc... +4. Support linking between different language families. For example, call + C functions directly from Java without using the nasty/slow/gross JNI + layer. This involves several subpoints: + A. 
Support for languages that require garbage collectors and integration + with languages that don't. As a base point, we could insist on + always using a conservative GC, but implement free as a noop, f.e. + +> b. A strongly-typed VM. One question is do we need the types to be +> explicitly declared or should they be inferred by the dynamic +> compiler? + + B. This is kind of similar to another idea that I have: make OOP + constructs (virtual function tables, class heirarchies, etc) explicit + in the VM representation. I believe that the number of additional + constructs would be fairly low, but would give us lots of important + information... something else that would/could be important is to + have exceptions as first class types so that they would be handled in + a uniform way for the entire VM... so that C functions can call Java + functions for example... + +> c. How do we get more high-level information into the VM while keeping +> to a low-level VM design? +> o Explicit array references as operands? An alternative is +> to have just an array type, and let the index computations be +> separate 3-operand instructions. + + C. In the model I was thinking of (subject to change of course), we + would just have an array type (distinct from the pointer + types). This would allow us to have arbitrarily complex index + expressions, while still distinguishing "load" from "Array load", + for example. Perhaps also, switch jump tables would be first class + types as well? This would allow better reasoning about the program. + +5. Support dynamic loading of code from various sources. Already + mentioned above was the example of loading java bytecodes, but we want + to support dynamic loading of VM code as well. This makes the job of + the runtime compiler much more interesting: it can do interprocedural + optimizations that the static compiler can't do, because it doesn't + have all of the required information (for example, inlining from + shared libraries, etc...) + +6. 
Define a set of generally useful annotations to add to the VM + representation. For example, a function can be analysed to see if it + has any sideeffects when run... also, the MOD/REF sets could be + calculated, etc... we would have to determine what is reasonable. This + would generally be used to make IP optimizations cheaper for the + runtime compiler... + +> o Explicit instructions to handle aliasing, e.g.s: +> -- an instruction to say "I speculate that these two values are not +> aliased, but check at runtime", like speculative execution in +> EPIC? +> -- or an instruction to check whether two values are aliased and +> execute different code depending on the answer, somewhat like +> predicated code in EPIC + +These are also very good points... if this can be determined at compile +time. I think that an epic style of representation (not the instruction +packing, just the information presented) could be a very interesting model +to use... more later... + +> o (This one is a difficult but powerful idea.) +> A "thread-id" field on every instruction that allows the static +> compiler to generate a set of parallel threads, and then have +> the runtime compiler and hardware do what they please with it. +> This has very powerful uses, but thread-id on every instruction +> is expensive in terms of instruction size and code size. +> We would need to compactly encode it somehow. + +Yes yes yes! :) I think it would be *VERY* useful to include this kind +of information (which EPIC architectures *implicitly* encode. The trend +that we are seeing supports this greatly: + +1. Commodity processors are getting massive SIMD support: + * Intel/Amd MMX/MMX2 + * AMD's 3Dnow! + * Intel's SSE/SSE2 + * Sun's VIS +2. SMP is becoming much more common, especially in the server space. +3. Multiple processors on a die are right around the corner. + +If nothing else, not designing this in would severely limit our future +expansion of the project... 
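[Editor's aside: the quoted thread-id idea above ends with "We would need to compactly encode it somehow." A purely illustrative sketch of one such compact encoding — packing a small thread-id into spare bits of a 32-bit instruction word. The field widths and function names are my own assumptions, not anything LLVM ever implemented.]

```python
# Illustrative only: pack opcode, three register numbers, and a small
# thread-id into one 32-bit word (8 + 3*6 + 4 = 30 bits used).
OPCODE_BITS, REG_BITS, TID_BITS = 8, 6, 4
REG_MASK = (1 << REG_BITS) - 1

def pack(opcode, dst, src1, src2, tid):
    assert opcode < (1 << OPCODE_BITS) and tid < (1 << TID_BITS)
    word = opcode
    for r in (dst, src1, src2):
        assert r <= REG_MASK
        word = (word << REG_BITS) | r
    return (word << TID_BITS) | tid

def unpack(word):
    tid = word & ((1 << TID_BITS) - 1)
    word >>= TID_BITS
    src2 = word & REG_MASK; word >>= REG_BITS
    src1 = word & REG_MASK; word >>= REG_BITS
    dst = word & REG_MASK; word >>= REG_BITS
    return word, dst, src1, src2, tid  # remaining word is the opcode

assert unpack(pack(0x12, 5, 3, 4, 2)) == (0x12, 5, 3, 4, 2)
```

The cost the email worries about is visible here: 4 bits of every instruction go to the thread-id whether or not the code is parallel.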
+ +> Also, this will require some reading on at least two other +> projects: +> -- Multiscalar architecture from Wisconsin +> -- Simultaneous multithreading architecture from Washington +> +> o Or forget all this and stick to a traditional instruction set? + +Heh... :) Well, from a pure research point of view, it is almost more +attactive to go with the most extreme/different ISA possible. On one axis +you get safety and conservatism, and on the other you get degree of +influence that the results have. Of course the problem with pure research +is that often times there is no concrete product of the research... :) + +> BTW, on an unrelated note, after the meeting yesterday, I did remember +> that you had suggested doing instruction scheduling on SSA form instead +> of a dependence DAG earlier in the semester. When we talked about +> it yesterday, I didn't remember where the idea had come from but I +> remembered later. Just giving credit where its due... + +:) Thanks. + +> Perhaps you can save the above as a file under RCS so you and I can +> continue to expand on this. + +I think it makes sense to do so when we get our ideas more formalized and +bounce it back and forth a couple of times... then I'll do a more formal +writeup of our goals and ideas. Obviously our first implementation will +not want to do all of the stuff that I pointed out above... be we will +want to design the project so that we do not artificially limit ourselves +at sometime in the future... + +Anyways, let me know what you think about these ideas... and if they sound +reasonable... + +-Chris + diff --git a/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt b/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt new file mode 100644 index 0000000000..8c452924dd --- /dev/null +++ b/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt @@ -0,0 +1,30 @@ +From: Chris Lattner [mailto:sabre@nondot.org] +Sent: Wednesday, December 06, 2000 6:41 PM +To: Vikram S. 
Adve +Subject: Additional idea with respect to encoding + +Here's another idea with respect to keeping the common case instruction +size down (less than 32 bits ideally): + +Instead of encoding an instruction to operate on two register numbers, +have it operate on two negative offsets based on the current register +number. Therefore, instead of using: + +r57 = add r55, r56 (r57 is the implicit dest register, of course) + +We could use: + +r57 = add -2, -1 + +My guess is that most SSA references are to recent values (especially if +they correspond to expressions like (x+y*z+p*q/ ...), so the negative +numbers would tend to stay small, even at the end of the procedure (where +the implicit register destination number could be quite large). Of course +the negative sign is reduntant, so you would be storing small integers +almost all of the time, and 5-6 bits worth of register number would be +plenty for most cases... + +What do you think? + +-Chris + diff --git a/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt b/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt new file mode 100644 index 0000000000..b66e18556f --- /dev/null +++ b/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt @@ -0,0 +1,83 @@ +SUMMARY +------- + +We met to discuss the LLVM instruction format and bytecode representation: + +ISSUES RESOLVED +--------------- + +1. We decided that we shall use a flat namespace to represent our + variables in SSA form, as opposed to having a two dimensional namespace + of the original variable and the SSA instance subscript. + +ARGUMENT AGAINST: + * A two dimensional namespace would be valuable when doing alias + analysis because the extra information can help limit the scope of + analysis. + +ARGUMENT FOR: + * Including this information would require that all users of the LLVM + bytecode would have to parse and handle it. This would slow down the + common case and inflate the instruction representation with another + infinite variable space. 
+ +REASONING: + * It was decided that because original variable sources could be + reconstructed from SSA form in linear time, that it would be an + unjustified expense for the common case to include the extra + information for one optimization. Alias analysis itself is typically + greater than linear in asymptotic complexity, so this extra analaysis + would not affect the runtime of the optimization in a significant + way. Additionally, this would be an unlikely optimization to do at + runtime. + + +IDEAS TO CONSIDER +----------------- + +1. Including dominator information in the LLVM bytecode + representation. This is one example of an analysis result that may be + packaged with the bytecodes themselves. As a conceptual implementation + idea, we could include an immediate dominator number for each basic block + in the LLVM bytecode program. Basic blocks could be numbered according + to the order of occurance in the bytecode representation. + +2. Including loop header and body information. This would facilitate + detection of intervals and natural loops. + +UNRESOLVED ISSUES +----------------- + +1. Will oSUIF provide enough of an infrastructure to support the research + that we will be doing? We know that it has less than stellar + performance, but hope that this will be of little importance for our + static compiler. This could affect us if we decided to do some IP + research. Also we do not yet understand the level of exception support + currently implemented. + +2. Should we consider the requirements of a direct hardware implementation + of the LLVM when we design it? If so, several design issues should + have their priorities shifted. The other option is to focus on a + software layer interpreting the LLVM in all cases. + +3. Should we use some form of packetized format to improve forward + compatibility? 
For example, we could design the system to encode a + packet type and length field before analysis information, to allow a + runtime to skip information that it didn't understand in a bytecode + stream. The obvious benefit would be for compatibility, the drawback + is that it would tend to splinter that 'standard' LLVM definition. + +4. Should we use fixed length instructions or variable length + instructions? Fetching variable length instructions is expensive (for + either hardware or software based LLVM runtimes), but we have several + 'infinite' spaces that instructions operate in (SSA register numbers, + type spaces, or packet length [if packets were implemented]). Several + options were mentioned including: + A. Using 16 or 32 bit numbers, which would be 'big enough' + B. A scheme similar to how UTF-8 works, to encode infinite numbers + while keeping small number small. + C. Use something similar to Huffman encoding, so that the most common + numbers are the smallest. + +-Chris + diff --git a/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt b/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt new file mode 100644 index 0000000000..111706a344 --- /dev/null +++ b/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt @@ -0,0 +1,39 @@ +Date: Wed, 31 Jan 2001 12:04:33 -0600 +From: Vikram S. Adve <vadve@cs.uiuc.edu> +To: Chris Lattner <lattner@cs.uiuc.edu> +Subject: another thought + +I have a budding idea about making LLVM a little more ambitious: a +customizable runtime system that can be used to implement language-specific +virtual machines for many different languages. E.g., a C vm, a C++ vm, a +Java vm, a Lisp vm, .. + +The idea would be that LLVM would provide a standard set of runtime features +(some low-level like standard assembly instructions with code generation and +static and runtime optimization; some higher-level like type-safety and +perhaps a garbage collection library). 
Each language vm would select the +runtime features needed for that language, extending or customizing them as +needed. Most of the machine-dependent code-generation and optimization +features as well as low-level machine-independent optimizations (like PRE) +could be provided by LLVM and should be sufficient for any language, +simplifying the language compiler. (This would also help interoperability +between languages.) Also, some or most of the higher-level +machine-independent features like type-safety and access safety should be +reusable by different languages, with minor extensions. The language +compiler could then focus on language-specific analyses and optimizations. + +The risk is that this sounds like a universal IR -- something that the +compiler community has tried and failed to develop for decades, and is +universally skeptical about. No matter what we say, we won't be able to +convince anyone that we have a universal IR that will work. We need to +think about whether LLVM is different or if has something novel that might +convince people. E.g., the idea of providing a package of separable +features that different languages select from. Also, using SSA with or +without type-safety as the intermediate representation. + +One interesting starting point would be to discuss how a JVM would be +implemented on top of LLVM a bit more. That might give us clues on how to +structure LLVM to support one or more language VMs. + +--Vikram + diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt new file mode 100644 index 0000000000..c09cf1f03c --- /dev/null +++ b/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt @@ -0,0 +1,67 @@ +Date: Tue, 6 Feb 2001 20:27:37 -0600 (CST) +From: Chris Lattner <sabre@nondot.org> +To: Vikram S. Adve <vadve@cs.uiuc.edu> +Subject: Type notation debate... 
+ +This is the way that I am currently planning on implementing types: + +Primitive Types: +type ::= void|bool|sbyte|ubyte|short|ushort|int|uint|long|ulong + +Method: +typelist ::= typelisth | /*empty*/ +typelisth ::= type | typelisth ',' type +type ::= type (typelist) + +Arrays (without and with size): +type ::= '[' type ']' | '[' INT ',' type ']' + +Pointer: +type ::= type '*' + +Structure: +type ::= '{' typelist '}' + +Packed: +type ::= '<' INT ',' type '>' + +Simple examples: + +[[ %4, int ]] - array of (array of 4 (int)) +[ { int, int } ] - Array of structure +[ < %4, int > ] - Array of 128 bit SIMD packets +int (int, [[int, %4]]) - Method taking a 2d array and int, returning int + + +Okay before you comment, please look at: + +http://www.research.att.com/~bs/devXinterview.html + +Search for "In another interview, you defined the C declarator syntax as +an experiment that failed. However, this syntactic construct has been +around for 27 years and perhaps more; why do you consider it problematic +(except for its cumbersome syntax)?" and read that response for me. :) + +Now with this syntax, his example would be represented as: + +[ %10, bool (int, int) * ] * + +vs + +bool (*(*)[10])(int, int) + +in C. + +Basically, my argument for this type construction system is that it is +VERY simple to use and understand (although it IS different than C, it is +very simple and straightforward, which C is NOT). In fact, I would assert +that most programmers TODAY do not understand pointers to member +functions, and have to look up an example when they have to write them. + +In my opinion, it is critically important to have clear and concise type +specifications, because types are going to be all over the programs. + +Let me know your thoughts on this. 
:) + +-Chris + diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt new file mode 100644 index 0000000000..8bfefbf69f --- /dev/null +++ b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt @@ -0,0 +1,75 @@ +Date: Thu, 8 Feb 2001 08:42:04 -0600 +From: Vikram S. Adve <vadve@cs.uiuc.edu> +To: Chris Lattner <sabre@nondot.org> +Subject: RE: Type notation debate... + +Chris, + +> Okay before you comment, please look at: +> +> http://www.research.att.com/~bs/devXinterview.html + +I read this argument. Even before that, I was already in agreement with you +and him that the C declarator syntax is difficult and confusing. + +But in fact, if you read the entire answer carefully, he came to the same +conclusion I do: that you have to go with familiar syntax over logical +syntax because familiarity is such a strong force: + + "However, familiarity is a strong force. To compare, in English, we +live +more or less happily with the absurd rules for "to be" (am, are, is, been, +was, were, ...) and all attempts to simplify are treated with contempt or +(preferably) humor. It be a curious world and it always beed." + +> Basically, my argument for this type construction system is that it is +> VERY simple to use and understand (although it IS different than C, it is +> very simple and straightforward, which C is NOT). In fact, I would assert +> that most programmers TODAY do not understand pointers to member +> functions, and have to look up an example when they have to write them. + +Again, I don't disagree with this at all. But to some extent this +particular problem is inherently difficult. Your syntax for the above +example may be easier for you to read because this is the way you have been +thinking about it. Honestly, I don't find it much easier than the C syntax. +In either case, I would have to look up an example to write pointers to +member functions. 
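[Editor's aside: the type grammar proposed in the message above is small enough to exercise with a toy recursive-descent parser. The sketch below is mine, not LLVM's actual parser; it uses plain integers for array/packed sizes where the email's examples write %N, and invented tuple tags ("prim", "ptr", ...) for the result.]

```python
import re

PRIMITIVES = {"void", "bool", "sbyte", "ubyte", "short", "ushort",
              "int", "uint", "long", "ulong"}

def tokenize(text):
    # numbers, identifiers, and the grammar's single-character punctuation
    return re.findall(r"\d+|\w+|[\[\]{}<>(),*]", text)

def parse_type(toks):
    ty = _parse_base(toks)
    # pointer '*' and method '(typelist)' bind as postfix suffixes
    while toks and toks[0] in ("*", "("):
        if toks.pop(0) == "*":
            ty = ("ptr", ty)
        else:
            args = []
            while toks[0] != ")":
                args.append(parse_type(toks))
                if toks[0] == ",":
                    toks.pop(0)
            toks.pop(0)  # ')'
            ty = ("method", ty, args)
    return ty

def _parse_base(toks):
    tok = toks.pop(0)
    if tok in PRIMITIVES:
        return ("prim", tok)
    if tok == "[":                       # array, sized or unsized
        size = None
        if toks[0].isdigit():
            size = int(toks.pop(0))
            toks.pop(0)                  # ','
        inner = parse_type(toks)
        toks.pop(0)                      # ']'
        return ("array", size, inner)
    if tok == "{":                       # structure
        fields = []
        while toks[0] != "}":
            fields.append(parse_type(toks))
            if toks[0] == ",":
                toks.pop(0)
        toks.pop(0)                      # '}'
        return ("struct", fields)
    if tok == "<":                       # packed
        size = int(toks.pop(0))
        toks.pop(0)                      # ','
        inner = parse_type(toks)
        toks.pop(0)                      # '>'
        return ("packed", size, inner)
    raise SyntaxError("unexpected token: " + tok)

# Stroustrup's example from the message above: bool (*(*)[10])(int, int)
ast = parse_type(tokenize("[ 10, bool (int, int) * ] *"))
```

One left-to-right pass with postfix suffixes is all the grammar needs, which is essentially the "simple and straightforward" property the message claims for it.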
+ +But pointers to member functions are nowhere near as common as arrays. And +the old array syntax: + type [ int, int, ...] +is just much more familiar and clear to people than anything new you +introduce, no matter how logical it is. Introducing a new syntax that may +make function pointers easier but makes arrays much more difficult seems +very risky to me. + +> In my opinion, it is critically important to have clear and concise type +> specifications, because types are going to be all over the programs. + +I absolutely agree. But the question is, what is more clear and concise? +The syntax programmers are used to out of years of experience or a new +syntax that they have never seen that has a more logical structure. I think +the answer is the former. Sometimes, you have to give up a better idea +because you can't overcome sociological barriers to it. Qwerty keyboards +and Windows are two classic examples of bad technology that are difficult to +root out. + +P.S. Also, while I agree that most your syntax is more logical, there is +one part that isn't: + +Arrays (without and with size): +type ::= '[' type ']' | '[' INT ',' type ']'. + +The arrays with size lists the dimensions and the type in a single list. +That is just too confusing: + [10, 40, int] +This seems to be a 3-D array where the third dimension is something strange. +It is too confusing to have a list of 3 things, some of which are dimensions +and one is a type. Either of the following would be better: + + array [10, 40] of int +or + int [10, 40] + +--Vikram + diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt new file mode 100644 index 0000000000..6e9784158a --- /dev/null +++ b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt @@ -0,0 +1,53 @@ +Date: Thu, 8 Feb 2001 14:31:05 -0600 (CST) +From: Chris Lattner <sabre@nondot.org> +To: Vikram S. Adve <vadve@cs.uiuc.edu> +Subject: RE: Type notation debate... 
+ +> Arrays (without and with size): +> type ::= '[' type ']' | '[' INT ',' type ']'. +> +> The arrays with size lists the dimensions and the type in a single list. +> That is just too confusing: + +> [10, 40, int] +> This seems to be a 3-D array where the third dimension is something strange. +> It is too confusing to have a list of 3 things, some of which are dimensions +> and one is a type. + +The above grammar indicates that there is only one integer parameter, ie +the upper bound. The lower bound is always implied to be zero, for +several reasons: + +* As a low level VM, we want to expose addressing computations + explicitly. Since the lower bound must always be known in a high level + language statically, the language front end can do the translation + automatically. +* This fits more closely with what Java needs, ie what we need in the + short term. Java arrays are always zero based. + +If a two element list is too confusing, I would recommend an alternate +syntax of: + +type ::= '[' type ']' | '[' INT 'x' type ']'. + +For example: + [12 x int] + [12x int] + [ 12 x [ 4x int ]] + +Which is syntactically nicer, and more explicit. + +> Either of the following would be better: +> array [10, 40] of int + +I considered this approach for arrays in general (ie array of int/ array +of 12 int), but found that it made declarations WAY too long. Remember +that because of the nature of llvm, you get a lot of types strewn all over +the program, and using the 'typedef' like facility is not a wonderful +option, because then types aren't explicit anymore. + +I find this email interesting, because you contradict the previous email +you sent, where you recommend that we stick to C syntax.... 
+ +-Chris + diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt new file mode 100644 index 0000000000..7b9032742a --- /dev/null +++ b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt @@ -0,0 +1,89 @@ +> But in fact, if you read the entire answer carefully, he came to the same +> conclusion I do: that you have to go with familiar syntax over logical +> syntax because familiarity is such a strong force: +> "However, familiarity is a strong force. To compare, in English, we +live +> more or less happily with the absurd rules for "to be" (am, are, is, been, +> was, were, ...) and all attempts to simplify are treated with contempt or +> (preferably) humor. It be a curious world and it always beed." + +Although you have to remember that his situation was considerably +different than ours. He was in a position where he was designing a high +level language that had to be COMPATIBLE with C. Our language is such +that a new person would have to learn the new, different, syntax +anyways. Making them learn about the type system does not seem like much +of a stretch from learning the opcodes and how SSA form works, and how +everything ties together... + +> > Basically, my argument for this type construction system is that it is +> > VERY simple to use and understand (although it IS different than C, it is +> > very simple and straightforward, which C is NOT). In fact, I would assert +> > that most programmers TODAY do not understand pointers to member +> > functions, and have to look up an example when they have to write them. + +> Again, I don't disagree with this at all. But to some extent this +> particular problem is inherently difficult. Your syntax for the above +> example may be easier for you to read because this is the way you have been +> thinking about it. Honestly, I don't find it much easier than the C syntax. 
+> In either case, I would have to look up an example to write pointers to +> member functions. + +I would argue that because the lexical structure of the language is self +consistent, any person who spent a significant amount of time programming +in LLVM directly would understand how to do it without looking it up in a +manual. The reason this does not work for C is because you rarely have to +declare these pointers, and the syntax is inconsistent with the method +declaration and calling syntax. + +> But pointers to member functions are nowhere near as common as arrays. + +Very true. If you're implementing an object oriented language, however, +remember that you have to do all the pointer to member function stuff +yourself.... so everytime you invoke a virtual method one is involved +(instead of having C++ hide it for you behind "syntactic sugar"). + +> And the old array syntax: +> type [ int, int, ...] +> is just much more familiar and clear to people than anything new you +> introduce, no matter how logical it is. + +Erm... excuse me but how is this the "old array syntax"? If you are +arguing for consistency with C, you should be asking for 'type int []', +which is significantly different than the above (beside the above +introduces a new operator and duplicates information +needlessly). Basically what I am suggesting is exactly the above without +the fluff. So instead of: + + type [ int, int, ...] + +you use: + + type [ int ] + +> Introducing a new syntax that may +> make function pointers easier but makes arrays much more difficult seems +> very risky to me. + +This is not about function pointers. This is about consistency in the +type system, and consistency with the rest of the language. The point +above does not make arrays any more difficult to use, and makes the +structure of types much more obvious than the "c way". 
+ +> > In my opinion, it is critically important to have clear and concise type +> > specifications, because types are going to be all over the programs. +> +> I absolutely agree. But the question is, what is more clear and concise? +> The syntax programmers are used to out of years of experience or a new +> syntax that they have never seen that has a more logical structure. I think +> the answer is the former. Sometimes, you have to give up a better idea +> because you can't overcome sociological barriers to it. Qwerty keyboards +> and Windows are two classic examples of bad technology that are difficult to +> root out. + +Very true, but you seem to be advocating a completely different Type +system than C has, in addition to it not offering the advantages of clear +structure that the system I recommended does... so you seem to not have a +problem with changing this, just with what I change it to. :) + +-Chris + diff --git a/docs/HistoricalNotes/2001-02-09-AdveComments.txt b/docs/HistoricalNotes/2001-02-09-AdveComments.txt new file mode 100644 index 0000000000..5503233c1e --- /dev/null +++ b/docs/HistoricalNotes/2001-02-09-AdveComments.txt @@ -0,0 +1,120 @@ +Ok, here are my comments and suggestions about the LLVM instruction set. +We should discuss some now, but can discuss many of them later, when we +revisit synchronization, type inference, and other issues. +(We have discussed some of the comments already.) + + +o We should consider eliminating the type annotation in cases where it is + essentially obvious from the instruction type, e.g., in br, it is obvious + that the first arg. should be a bool and the other args should be labels: + + br bool <cond>, label <iftrue>, label <iffalse> + + I think your point was that making all types explicit improves clarity + and readability. I agree to some extent, but it also comes at the cost + of verbosity. 
And when the types are obvious from people's experience + (e.g., in the br instruction), it doesn't seem to help as much. + + +o On reflection, I really like your idea of having the two different switch + types (even though they encode implementation techniques rather than + semantics). It should simplify building the CFG and my guess is it could + enable some significant optimizations, though we should think about which. + + +o In the lookup-indirect form of the switch, is there a reason not to make + the val-type uint? Most HLL switch statements (including Java and C++) + require that anyway. And it would also make the val-type uniform + in the two forms of the switch. + + I did see the switch-on-bool examples and, while cute, we can just use + the branch instructions in that particular case. + + +o I agree with your comment that we don't need 'neg'. + + +o There's a trade-off with the cast instruction: + + it avoids having to define all the upcasts and downcasts that are + valid for the operands of each instruction (you probably have thought + of other benefits also) + - it could make the bytecode significantly larger because there could + be a lot of cast operations + + +o Making the second arg. to 'shl' a ubyte seems good enough to me. + 255 positions seems adequate for several generations of machines + and is more compact than uint. + + +o I still have some major concerns about including malloc and free in the + language (either as builtin functions or instructions). LLVM must be + able to represent code from many different languages. Languages such as + C, C++ Java and Fortran 90 would not be able to use our malloc anyway + because each of them will want to provide a library implementation of it. + + This gets even worse when code from different languages is linked + into a single executable (which is fairly common in large apps). 
+ Having a single malloc would just not suffice, and instead would simply + complicate the picture further because it adds an extra variant in + addition to the one each language provides. + + Instead, providing a default library version of malloc and free + (and perhaps a malloc_gc with garbage collection instead of free) + would make a good implementation available to anyone who wants it. + + I don't recall all your arguments in favor so let's discuss this again, + and soon. + + +o 'alloca' on the other hand sounds like a good idea, and the + implementation seems fairly language-independent so it doesn't have the + problems with malloc listed above. + + +o About indirect call: + Your option #2 sounded good to me. I'm not sure I understand your + concern about an explicit 'icall' instruction? + + +o A pair of important synchronization instr'ns to think about: + load-linked + store-conditional + + +o Other classes of instructions that are valuable for pipeline performance: + conditional-move + predicated instructions + + +o I believe tail calls are relatively easy to identify; do you know why + .NET has a tailcall instruction? + + +o I agree that we need a static data space. Otherwise, emulating global + data gets unnecessarily complex. + + +o About explicit parallelism: + + We once talked about adding a symbolic thread-id field to each + instruction. (It could be optional so single-threaded codes are + not penalized.) This could map well to multi-threaded architectures + while providing easy ILP for single-threaded onces. But it is probably + too radical an idea to include in a base version of LLVM. Instead, it + could a great topic for a separate study. + + What is the semantics of the IA64 stop bit? + + + + +o And finally, another thought about the syntax for arrays :-) + + Although this syntax: + array <dimension-list> of <type> + is verbose, it will be used only in the human-readable assembly code so + size should not matter. 
I think we should consider it because I find it + to be the clearest syntax. It could even make arrays of function + pointers somewhat readable. + diff --git a/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt b/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt new file mode 100644 index 0000000000..5c87330fb7 --- /dev/null +++ b/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt @@ -0,0 +1,245 @@ +From: Chris Lattner <sabre@nondot.org> +To: "Vikram S. Adve" <vadve@cs.uiuc.edu> +Subject: Re: LLVM Feedback + +I've included your feedback in the /home/vadve/lattner/llvm/docs directory +so that it will live in CVS eventually with the rest of LLVM. I've +significantly updated the documentation to reflect the changes you +suggested, as specified below: + +> We should consider eliminating the type annotation in cases where it is +> essentially obvious from the instruction type: +> br bool <cond>, label <iftrue>, label <iffalse> +> I think your point was that making all types explicit improves clarity +> and readability. I agree to some extent, but it also comes at the +> cost of verbosity. And when the types are obvious from people's +> experience (e.g., in the br instruction), it doesn't seem to help as +> much. + +Very true. We should discuss this more, but my reasoning is more of a +consistency argument. There are VERY few instructions that can have all +of the types eliminated, and doing so when available unnecesarily makes +the language more difficult to handle. Especially when you see 'int +%this' and 'bool %that' all over the place, I think it would be +disorienting to see: + + br %predicate, %iftrue, %iffalse + +for branches. Even just typing that once gives me the creeps. ;) Like I +said, we should probably discuss this further in person... + +> On reflection, I really like your idea of having the two different +> switch types (even though they encode implementation techniques rather +> than semantics). 
It should simplify building the CFG and my guess is it
+> could enable some significant optimizations, though we should think
+> about which.
+
+Great. I added a note to the switch section commenting on how the VM
+should just use the instruction type as a hint, and that the
+implementation may choose alternate representations (such as predicated
+branches).
+
+> In the lookup-indirect form of the switch, is there a reason not to
+> make the val-type uint?
+
+No. This was something I was debating for a while, and didn't really feel
+strongly about either way. It is common to switch on other types in HLLs
+(for example, signed ints are particularly common), but in this case, all
+that will be added is an additional 'cast' instruction. I removed that
+from the spec.
+
+> I agree with your comment that we don't need 'neg'
+
+Removed.
+
+> There's a trade-off with the cast instruction:
+> + it avoids having to define all the upcasts and downcasts that are
+>   valid for the operands of each instruction (you probably have
+>   thought of other benefits also)
+> - it could make the bytecode significantly larger because there could
+>   be a lot of cast operations
+
+  You NEED casts to represent things like:
+    void foo(float);
+    ...
+    int x;
+    ...
+    foo(x);
+  in a language like C. Even in a Java-like language, you need upcasts
+  and some way to implement dynamic downcasts.
+
+  Not all forms of instructions take every type (for example, you can't
+  shift by a floating point number of bits), thus SOME programs will need
+  implicit casts.
+
+To be efficient and to avoid your '-' point above, we just have to be
+careful to specify that the instructions shall operate on all common
+types, so casting should be relatively uncommon. For example, all
+of the arithmetic operations work on almost all data types.
+
+> Making the second arg. to 'shl' a ubyte seems good enough to me.
+> 255 positions seems adequate for several generations of machines
+
+Okay, that comment is removed.
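The 'shl' decision above (a ubyte shift-amount operand, so at most 255 positions) can be modeled in a couple of lines. This Python sketch assumes 32-bit truncating semantics, which the notes do not pin down:

```python
# Sketch only: 'shl' with a ubyte shift-amount operand.
def shl(value, amount, bits=32):
    amount &= 0xFF                     # ubyte operand: only 0..255 representable
    # truncate to the operand width, as a fixed-width VM would
    return (value << amount) & ((1 << bits) - 1)

assert shl(1, 4) == 16
assert shl(0x80000000, 1) == 0         # the high bit is shifted out
```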
+ +> and is more compact than uint. + +No, it isn't. Remember that the bytecode encoding saves value slots into +the bytecode instructions themselves, not constant values. This is +another case where we may introduce more cast instructions (but we will +also reduce the number of opcode variants that must be supported by a +virtual machine). Because most shifts are by constant values, I don't +think that we'll have to cast many shifts. :) + +> I still have some major concerns about including malloc and free in the +> language (either as builtin functions or instructions). + +Agreed. How about this proposal: + +malloc/free are either built in functions or actual opcodes. They provide +all of the type safety that the document would indicate, blah blah +blah. :) + +Now, because of all of the excellent points that you raised, an +implementation may want to override the default malloc/free behavior of +the program. To do this, they simply implement a "malloc" and +"free" function. The virtual machine will then be defined to use the user +defined malloc/free function (which return/take void*'s, not type'd +pointers like the builtin function would) if one is available, otherwise +fall back on a system malloc/free. + +Does this sound like a good compromise? It would give us all of the +typesafety/elegance in the language while still allowing the user to do +all the cool stuff they want to... + +> 'alloca' on the other hand sounds like a good idea, and the +> implementation seems fairly language-independent so it doesn't have the +> problems with malloc listed above. + +Okay, once we get the above stuff figured out, I'll put it all in the +spec. + +> About indirect call: +> Your option #2 sounded good to me. I'm not sure I understand your +> concern about an explicit 'icall' instruction? + +I worry too much. :) The other alternative has been removed. 'icall' is +now up in the instruction list next to 'call'. 
+
+> I believe tail calls are relatively easy to identify; do you know why
+> .NET has a tailcall instruction?
+
+Although I am just guessing, I believe it probably has to do with the fact
+that they want languages like Haskell and Lisp to be efficiently runnable
+on their VM. Of course, this means that the VM MUST implement tail calls
+'correctly', or else life will suck. :) I would put this into a future
+feature bin, because it could be pretty handy...
+
+> A pair of important synchronization instr'ns to think about:
+>    load-linked
+>    store-conditional
+
+What is 'load-linked'? I think that (at least for now) I should add these
+to the 'possible extensions' section, because they are not immediately
+needed...
+
+> Other classes of instructions that are valuable for pipeline
+> performance:
+>    conditional-move
+>    predicated instructions
+
+Conditional move is effectively a special case of a predicated
+instruction... and I think that all predicated instructions can possibly
+be implemented later in LLVM. It would significantly change things, and
+it doesn't seem to be very necessary right now. It would seem to
+complicate flow control analysis a LOT in the virtual machine. I would
+tend to prefer that a predicated architecture like IA64 convert from a
+"basic block" representation to a predicated rep as part of its dynamic
+compilation phase. Also, if a basic block contains ONLY a move, then
+that can be trivially translated into a conditional move...
+
+> I agree that we need a static data space. Otherwise, emulating global
+> data gets unnecessarily complex.
+
+Definitely. Also a later item though. :)
+
+> We once talked about adding a symbolic thread-id field to each
+> ..
+> Instead, it could be a great topic for a separate study.
+
+Agreed. :)
+
+> What is the semantics of the IA64 stop bit?
+
+Basically, the IA64 writes instructions like this:
+mov ...
+add ...
+sub ...
+op xxx
+op xxx
+;;
+mov ...
+add ...
+sub ...
+op xxx +op xxx +;; + +Where the ;; delimits a group of instruction with no dependencies between +them, which can all be executed concurrently (to the limits of the +available functional units). The ;; gets translated into a bit set in one +of the opcodes. + +The advantages of this representation is that you don't have to do some +kind of 'thread id scheduling' pass by having to specify ahead of time how +many threads to use, and the representation doesn't have a per instruction +overhead... + +> And finally, another thought about the syntax for arrays :-) +> Although this syntax: +> array <dimension-list> of <type> +> is verbose, it will be used only in the human-readable assembly code so +> size should not matter. I think we should consider it because I find it +> to be the clearest syntax. It could even make arrays of function +> pointers somewhat readable. + +My only comment will be to give you an example of why this is a bad +idea. :) + +Here is an example of using the switch statement (with my recommended +syntax): + +switch uint %val, label %otherwise, + [%3 x {uint, label}] [ { uint %57, label %l1 }, + { uint %20, label %l2 }, + { uint %14, label %l3 } ] + +Here it is with the syntax you are proposing: + +switch uint %val, label %otherwise, + array %3 of {uint, label} + array of {uint, label} + { uint %57, label %l1 }, + { uint %20, label %l2 }, + { uint %14, label %l3 } + +Which is ambiguous and very verbose. It would be possible to specify +constants with [] brackets as in my syntax, which would look like this: + +switch uint %val, label %otherwise, + array %3 of {uint, label} [ { uint %57, label %l1 }, + { uint %20, label %l2 }, + { uint %14, label %l3 } ] + +But then the syntax is inconsistent between type definition and constant +definition (why do []'s enclose the constants but not the types??). + +Anyways, I'm sure that there is much debate still to be had over +this... 
:) + +-Chris + +http://www.nondot.org/~sabre/os/ +http://www.nondot.org/MagicStats/ +http://korbit.sourceforge.net/ + + diff --git a/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt b/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt new file mode 100644 index 0000000000..2c7534d9da --- /dev/null +++ b/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt @@ -0,0 +1,39 @@ +Date: Tue, 13 Feb 2001 13:29:52 -0600 (CST) +From: Chris Lattner <sabre@nondot.org> +To: Vikram S. Adve <vadve@cs.uiuc.edu> +Subject: LLVM Concerns... + + +I've updated the documentation to include load store and allocation +instructions (please take a look and let me know if I'm on the right +track): + +file:/home/vadve/lattner/llvm/docs/LangRef.html#memoryops + +I have a couple of concerns I would like to bring up: + +1. Reference types + Right now, I've spec'd out the language to have a pointer type, which + works fine for lots of stuff... except that Java really has + references: constrained pointers that cannot be manipulated: added and + subtracted, moved, etc... Do we want to have a type like this? It + could be very nice for analysis (pointer always points to the start of + an object, etc...) and more closely matches Java semantics. The + pointer type would be kept for C++ like semantics. Through analysis, + C++ pointers could be promoted to references in the LLVM + representation. + +2. Our "implicit" memory references in assembly language: + After thinking about it, this model has two problems: + A. If you do pointer analysis and realize that two stores are + independent and can share the same memory source object, there is + no way to represent this in either the bytecode or assembly. + B. When parsing assembly/bytecode, we effectively have to do a full + SSA generation/PHI node insertion pass to build the dependencies + when we don't want the "pinned" representation. This is not + cool. 
+ I'm tempted to make memory references explicit in both the assembly and + bytecode to get around this... what do you think? + +-Chris + diff --git a/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt b/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt new file mode 100644 index 0000000000..505343378d --- /dev/null +++ b/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt @@ -0,0 +1,47 @@ +Date: Tue, 13 Feb 2001 18:25:42 -0600 +From: Vikram S. Adve <vadve@cs.uiuc.edu> +To: Chris Lattner <sabre@nondot.org> +Subject: RE: LLVM Concerns... + +> 1. Reference types +> Right now, I've spec'd out the language to have a pointer type, which +> works fine for lots of stuff... except that Java really has +> references: constrained pointers that cannot be manipulated: added and +> subtracted, moved, etc... Do we want to have a type like this? It +> could be very nice for analysis (pointer always points to the start of +> an object, etc...) and more closely matches Java semantics. The +> pointer type would be kept for C++ like semantics. Through analysis, +> C++ pointers could be promoted to references in the LLVM +> representation. + + +You're right, having references would be useful. Even for C++ the *static* +compiler could generate references instead of pointers with fairly +straightforward analysis. Let's include a reference type for now. But I'm +also really concerned that LLVM is becoming big and complex and (perhaps) +too high-level. After we get some initial performance results, we may have +a clearer idea of what our goals should be and we should revisit this +question then. + +> 2. Our "implicit" memory references in assembly language: +> After thinking about it, this model has two problems: +> A. 
If you do pointer analysis and realize that two stores are +> independent and can share the same memory source object, + +not sure what you meant by "share the same memory source object" + +> there is +> no way to represent this in either the bytecode or assembly. +> B. When parsing assembly/bytecode, we effectively have to do a full +> SSA generation/PHI node insertion pass to build the dependencies +> when we don't want the "pinned" representation. This is not +> cool. + +I understand the concern. But again, let's focus on the performance first +and then look at the language design issues. E.g., it would be good to know +how big the bytecode files are before expanding them further. I am pretty +keen to explore the implications of LLVM for mobile devices. Both bytecode +size and power consumption are important to consider there. + +--Vikram + diff --git a/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt b/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt new file mode 100644 index 0000000000..5f7843ab56 --- /dev/null +++ b/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt @@ -0,0 +1,49 @@ +By Chris: + +LLVM has been designed with two primary goals in mind. First we strive to +enable the best possible division of labor between static and dynamic +compilers, and second, we need a flexible and powerful interface +between these two complementary stages of compilation. We feel that +providing a solution to these two goals will yield an excellent solution +to the performance problem faced by modern architectures and programming +languages. + +A key insight into current compiler and runtime systems is that a +compiler may fall in anywhere in a "continuum of compilation" to do its +job. On one side, scripting languages statically compile nothing and +dynamically compile (or equivalently, interpret) everything. On the far +other side, traditional static compilers process everything statically and +nothing dynamically. 
These approaches have typically been seen as a +tradeoff between performance and portability. On a deeper level, however, +there are two reasons that optimal system performance may be obtained by a +system somewhere in between these two extremes: Dynamic application +behavior and social constraints. + +From a technical perspective, pure static compilation cannot ever give +optimal performance in all cases, because applications have varying dynamic +behavior that the static compiler cannot take into consideration. Even +compilers that support profile guided optimization generate poor code in +the real world, because using such optimization tunes that application +to one particular usage pattern, whereas real programs (as opposed to +benchmarks) often have several different usage patterns. + +On a social level, static compilation is a very shortsighted solution to +the performance problem. Instruction set architectures (ISAs) continuously +evolve, and each implementation of an ISA (a processor) must choose a set +of tradeoffs that make sense in the market context that it is designed for. +With every new processor introduced, the vendor faces two fundamental +problems: First, there is a lag time between when a processor is introduced +to when compilers generate quality code for the architecture. Secondly, +even when compilers catch up to the new architecture there is often a large +body of legacy code that was compiled for previous generations and will +not or can not be upgraded. Thus a large percentage of code running on a +processor may be compiled quite sub-optimally for the current +characteristics of the dynamic execution environment. + +For these reasons, LLVM has been designed from the beginning as a long-term +solution to these problems. Its design allows the large body of platform +independent, static, program optimizations currently in compilers to be +reused unchanged in their current form. 
It also provides important static +type information to enable powerful dynamic and link time optimizations +to be performed quickly and efficiently. This combination enables an +increase in effective system performance for real world environments. diff --git a/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt b/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt new file mode 100644 index 0000000000..b546301d35 --- /dev/null +++ b/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt @@ -0,0 +1,202 @@ +Meeting notes: Implementation idea: Exception Handling in C++/Java + +The 5/18/01 meeting discussed ideas for implementing exceptions in LLVM. +We decided that the best solution requires a set of library calls provided by +the VM, as well as an extension to the LLVM function invocation syntax. + +The LLVM function invocation instruction previously looks like this (ignoring +types): + + call func(arg1, arg2, arg3) + +The extension discussed today adds an optional "with" clause that +associates a label with the call site. The new syntax looks like this: + + call func(arg1, arg2, arg3) with funcCleanup + +This funcHandler always stays tightly associated with the call site (being +encoded directly into the call opcode itself), and should be used whenever +there is cleanup work that needs to be done for the current function if +an exception is thrown by func (or if we are in a try block). + +To support this, the VM/Runtime provide the following simple library +functions (all syntax in this document is very abstract): + +typedef struct { something } %frame; + The VM must export a "frame type", that is an opaque structure used to + implement different types of stack walking that may be used by various + language runtime libraries. We imagine that it would be typical to + represent a frame with a PC and frame pointer pair, although that is not + required. + +%frame getStackCurrentFrame(); + Get a frame object for the current function. 
Note that if the current + function was inlined into its caller, the "current" frame will belong to + the "caller". + +bool isFirstFrame(%frame f); + Returns true if the specified frame is the top level (first activated) frame + for this thread. For the main thread, this corresponds to the main() + function, for a spawned thread, it corresponds to the thread function. + +%frame getNextFrame(%frame f); + Return the previous frame on the stack. This function is undefined if f + satisfies the predicate isFirstFrame(f). + +Label *getFrameLabel(%frame f); + If a label was associated with f (as discussed below), this function returns + it. Otherwise, it returns a null pointer. + +doNonLocalBranch(Label *L); + At this point, it is not clear whether this should be a function or + intrinsic. It should probably be an intrinsic in LLVM, but we'll deal with + this issue later. + + +Here is a motivating example that illustrates how these facilities could be +used to implement the C++ exception model: + +void TestFunction(...) { + A a; B b; + foo(); // Any function call may throw + bar(); + C c; + + try { + D d; + baz(); + } catch (int) { + ...int Stuff... + // execution continues after the try block: the exception is consumed + } catch (double) { + ...double stuff... 
+ throw; // Exception is propogated + } +} + +This function would compile to approximately the following code (heavy +pseudo code follows): + +Func: + %a = alloca A + A::A(%a) // These ctors & dtors could throw, but we ignore this + %b = alloca B // minor detail for this example + B::B(%b) + + call foo() with fooCleanup // An exception in foo is propogated to fooCleanup + call bar() with barCleanup // An exception in bar is propogated to barCleanup + + %c = alloca C + C::C(c) + %d = alloca D + D::D(d) + call baz() with bazCleanup // An exception in baz is propogated to bazCleanup + d->~D(); +EndTry: // This label corresponds to the end of the try block + c->~C() // These could also throw, these are also ignored + b->~B() + a->~A() + return + +Note that this is a very straight forward and literal translation: exactly +what we want for zero cost (when unused) exception handling. Especially on +platforms with many registers (ie, the IA64) setjmp/longjmp style exception +handling is *very* impractical. Also, the "with" clauses describe the +control flow paths explicitly so that analysis is not adversly effected. + +The foo/barCleanup labels are implemented as: + +TryCleanup: // Executed if an exception escapes the try block + c->~C() +barCleanup: // Executed if an exception escapes from bar() + // fall through +fooCleanup: // Executed if an exception escapes from foo() + b->~B() + a->~A() + Exception *E = getThreadLocalException() + call throw(E) // Implemented by the C++ runtime, described below + +Which does the work one would expect. getThreadLocalException is a function +implemented by the C++ support library. It returns the current exception +object for the current thread. Note that we do not attempt to recycle the +shutdown code from before, because performance of the mainline code is +critically important. Also, obviously fooCleanup and barCleanup may be +merged and one of them eliminated. This just shows how the code generator +would most likely emit code. 
+ +The bazCleanup label is more interesting. Because the exception may be caught +by the try block, we must dispatch to its handler... but it does not exist +on the call stack (it does not have a VM Call->Label mapping installed), so +we must dispatch statically with a goto. The bazHandler thus appears as: + +bazHandler: + d->~D(); // destruct D as it goes out of scope when entering catch clauses + goto TryHandler + +In general, TryHandler is not the same as bazHandler, because multiple +function calls could be made from the try block. In this case, trivial +optimization could merge the two basic blocks. TryHandler is the code +that actually determines the type of exception, based on the Exception object +itself. For this discussion, assume that the exception object contains *at +least*: + +1. A pointer to the RTTI info for the contained object +2. A pointer to the dtor for the contained object +3. The contained object itself + +Note that it is necessary to maintain #1 & #2 in the exception object itself +because objects without virtual function tables may be thrown (as in this +example). Assuming this, TryHandler would look something like this: + +TryHandler: + Exception *E = getThreadLocalException(); + switch (E->RTTIType) { + case IntRTTIInfo: + ...int Stuff... // The action to perform from the catch block + break; + case DoubleRTTIInfo: + ...double Stuff... // The action to perform from the catch block + goto TryCleanup // This catch block rethrows the exception + break; // Redundant, eliminated by the optimizer + default: + goto TryCleanup // Exception not caught, rethrow + } + + // Exception was consumed + if (E->dtor) + E->dtor(E->object) // Invoke the dtor on the object if it exists + goto EndTry // Continue mainline code... + +And that is all there is to it. 
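The TryHandler dispatch can be sketched as a lookup on the RTTI tag carried in the exception object, where a miss means rethrow (the "goto TryCleanup" case). A Python toy model, with invented tag and action names:

```python
# Sketch only: select a catch clause by the exception's RTTI tag.
def try_handler(exc, handlers):
    return handlers.get(exc["rtti"], "rethrow")

handlers = {"int": "int stuff", "double": "double stuff"}
assert try_handler({"rtti": "int"}, handlers) == "int stuff"
assert try_handler({"rtti": "char*"}, handlers) == "rethrow"
```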
+ +The throw(E) function would then be implemented like this (which may be +inlined into the caller through standard optimization): + +function throw(Exception *E) { + // Get the start of the stack trace... + %frame %f = call getStackCurrentFrame() + + // Get the label information that corresponds to it + label * %L = call getFrameLabel(%f) + while (%L == 0 && !isFirstFrame(%f)) { + // Loop until a cleanup handler is found + %f = call getNextFrame(%f) + %L = call getFrameLabel(%f) + } + + if (%L != 0) { + call setThreadLocalException(E) // Allow handlers access to this... + call doNonLocalBranch(%L) + } + // No handler found! + call BlowUp() // Ends up calling the terminate() method in use +} + +That's a brief rundown of how C++ exception handling could be implemented in +llvm. Java would be very similar, except it only uses destructors to unlock +synchronized blocks, not to destroy data. Also, it uses two stack walks: a +nondestructive walk that builds a stack trace, then a destructive walk that +unwinds the stack as shown here. + +It would be trivial to get exception interoperability between C++ and Java. + diff --git a/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt b/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt new file mode 100644 index 0000000000..3375365f54 --- /dev/null +++ b/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt @@ -0,0 +1,45 @@ +Date: Sat, 19 May 2001 19:09:13 -0500 (CDT) +From: Chris Lattner <sabre@nondot.org> +To: Vikram S. Adve <vadve@cs.uiuc.edu> +Subject: RE: Meeting writeup + +> I read it through and it looks great! + +Thanks! + +> The finally clause in Java may need more thought. The code for this clause +> is like a subroutine because it needs to be entered from many points (end of +> try block and beginning of each catch block), and then needs to *return to +> the place from where the code was entered*. That's why JVM has the +> jsr/jsr_w instruction. + +Hrm... I guess that is an implementation decision. 
It can either be
+modelled as a subroutine (as java bytecodes do), which is really
+gross... or it can be modelled as code duplication (emitted once inline,
+then once in the exception path). Because this could, at worst,
+slightly less than double the amount of code in a function (it is
+bounded), I don't think this is a big deal. One of the really nice things
+about the LLVM representation is that it still allows for runtime code
+generation for exception paths (exception paths are not compiled until
+needed). Obviously a static compiler couldn't do this though. :)
+
+In this case, only one copy of the code would be compiled... until the
+other one is needed on demand. Also this strategy fits with the "zero
+cost" exception model... the standard case is not burdened with extra
+branches or "call"s.
+
+> I suppose you could save the return address in a particular register
+> (specific to this finally block), jump to the finally block, and then at the
+> end of the finally block, jump back indirectly through this register. It
+> will complicate building the CFG but I suppose that can be handled. It is
+> also unsafe in terms of checking where control returns (which is I suppose
+> why the JVM doesn't use this).
+
+I think that a code duplication method would be cleaner, and would avoid
+the caveats that you mention. Also, it does not slow down the normal case
+with an indirect branch...
+
+Like everything, we can probably defer a final decision until later. :)
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt b/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt
new file mode 100644
index 0000000000..97af16a2da
--- /dev/null
+++ b/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt
@@ -0,0 +1,63 @@
+Date: Fri, 1 Jun 2001 16:38:17 -0500 (CDT)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram S.
Adve <vadve@cs.uiuc.edu> +Subject: Interesting: GCC passes + + +Take a look at this document (which describes the order of optimizations +that GCC performs): + +http://gcc.gnu.org/onlinedocs/gcc_17.html + +The rundown is that after RTL generation, the following happens: + +1 . [t] jump optimization (jumps to jumps, etc) +2 . [t] Delete unreachable code +3 . Compute live ranges for CSE +4 . [t] Jump threading (jumps to jumps with identical or inverse conditions) +5 . [t] CSE +6 . *** Conversion to SSA +7 . [t] SSA Based DCE +8 . *** Conversion to LLVM +9 . UnSSA +10. GCSE +11. LICM +12. Strength Reduction +13. Loop unrolling +14. [t] CSE +15. [t] DCE +16. Instruction combination, register movement, scheduling... etc. + +I've marked optimizations with a [t] to indicate things that I believe to +be relatively trivial to implement in LLVM itself. The time consuming +things to reimplement would be SSA based PRE, Strength reduction & loop +unrolling... these would be the major things we would miss out on if we +did LLVM creation from tree code [inlining and other high level +optimizations are done on the tree representation]. + +Given the lack of "strong" optimizations that would take a long time to +reimplement, I am leaning a bit more towards creating LLVM from the tree +code. Especially given that SGI has GPL'd their compiler, including many +SSA based optimizations that could be adapted (besides the fact that their +code looks MUCH nicer than GCC :) + +Even if we choose to do LLVM code emission from RTL, we will almost +certainly want to move LLVM emission from step 8 down until at least CSE +has been rerun... which causes me to wonder if the SSA generation code +will still work (due to global variable dependencies and stuff). I assume +that it can be made to work, but might be a little more involved than we +would like. + +I'm continuing to look at the Tree -> RTL code. 
It is pretty gross
+because they do some of the translation a statement at a time, and some
+of it a function at a time... I'm not quite clear why and how the
+distinction is drawn, but it does not appear that there is a wonderful
+place to attach extra info.
+
+Anyways, I'm proceeding with the RTL -> LLVM conversion phase for now. We
+can talk about this more on Monday.
+
+Wouldn't it be nice if there were an obvious decision to be made? :)
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt b/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt
new file mode 100644
index 0000000000..6c9e0971a0
--- /dev/null
+++ b/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt
@@ -0,0 +1,71 @@
+Date: Fri, 1 Jun 2001 17:08:44 -0500 (CDT)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram S. Adve <vadve@cs.uiuc.edu>
+Subject: RE: Interesting: GCC passes
+
+> That is very interesting. I agree that some of these could be done on LLVM
+> at link-time, but it is the extra time required that concerns me. Link-time
+> optimization is severely time-constrained.
+
+If we were to reimplement any of these optimizations, I assume that we
+could do them a translation unit at a time, just as GCC does now. This
+would lead to a pipeline like this:
+
+Static optimizations, xlation unit at a time:
+.c --GCC--> .llvm --llvmopt--> .llvm
+
+Link time optimizations:
+.llvm --llvm-ld--> .llvm --llvm-link-opt--> .llvm
+
+Of course, many optimizations could be shared between llvmopt and
+llvm-link-opt, but they wouldn't need to be shared... Thus compile time
+could be faster, because we are using a "smarter" IR (SSA based).
+
+> BTW, about SGI, "borrowing" SSA-based optimizations from one compiler and
+> putting it into another is not necessarily easier than re-doing it.
+> Optimization code is usually heavily tied in to the specific IR they use.
+
+Understood.
The only reason that I brought this up is because SGI's IR is
+more similar to LLVM than it is different in many respects (SSA based,
+relatively low level, etc), and could be easily adapted. Also their
+optimizations are written in C++ and are actually somewhat
+structured... of course it would be no walk in the park, but it would be
+much less time consuming to adapt, say, SSA-PRE than to rewrite it.
+
+> But your larger point is valid that adding SSA based optimizations is
+> feasible and should be fun. (Again, link time cost is the issue.)
+
+Assuming link time cost wasn't an issue, the question is:
+Does using GCC's backend buy us anything?
+
+> It also occurs to me that GCC is probably doing quite a bit of back-end
+> optimization (step 16 in your list). Do you have a breakdown of that?
+
+Not really. The irritating part of GCC is that it mixes it all up and
+doesn't have a clean separation of concerns. A lot of the "back end
+optimization" happens right along with other data optimizations (i.e., CSE
+of machine specific things).
+
+As far as REAL back end optimizations go, it looks something like this:
+
+1. Instruction combination: try to make CISCy instructions, if available
+2. Register movement: try to get registers in the right places for the
+architecture to avoid register to register moves. For example, try to get
+the first argument of a function to naturally land in %o0 for sparc.
+3. Instruction scheduling: 'nuff said :)
+4. Register class preferencing: ??
+5. Local register allocation
+6. Global register allocation
+7. Spilling
+8. Local regalloc
+9. Jump optimization
+10. Delay slot scheduling
+11. Branch shortening for CISC machines
+12. Instruction selection & peephole optimization
+13. Debug info output
+
+But none of this would be usable for LLVM anyways, unless we were using
+GCC as a static compiler.
+ +-Chris + diff --git a/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt b/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt new file mode 100644 index 0000000000..1bc2eae746 --- /dev/null +++ b/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt @@ -0,0 +1,30 @@ +Date: Wed, 20 Jun 2001 12:32:22 -0500 +From: Vikram Adve <vadve@cs.uiuc.edu> +To: Chris Lattner <lattner@cs.uiuc.edu> +Subject: .NET vs. our VM + +One significant difference between .NET CLR and our VM is that the CLR +includes full information about classes and inheritance. In fact, I just +sat through the paper on adding templates to .NET CLR, and the speaker +indicated that the goal seems to be to do simple static compilation (very +little lowering or optimization). Also, the templates implementation in CLR +"relies on dynamic class loading and JIT compilation". + +This is an important difference because I think there are some significant +advantages to have a much lower level VM layer, and do significant static +analysis and optimization. + +I also talked to the lead guy for KAI's C++ compiler (Arch Robison) and he +said that SGI and other commercial compilers have included options to export +their *IR* next to the object code (i.e., .il files) and use them for +link-time code generation. In fact, he said that the .o file was nearly +empty and was entirely generated from the .il at link-time. But he agreed +that this limited the link-time interprocedural optimization to modules +compiled by the same compiler, whereas our approach allows us to link and +optimize modules from multiple different compilers. (Also, of course, they +don't do anything for runtime optimization). + +All issues to bring up in Related Work. 
+ +--Vikram + diff --git a/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt b/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt new file mode 100644 index 0000000000..3e10416fe6 --- /dev/null +++ b/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt @@ -0,0 +1,31 @@ +Date: Fri, 6 Jul 2001 16:56:56 -0500 +From: Vikram S. Adve <vadve@cs.uiuc.edu> +To: Chris Lattner <lattner@cs.uiuc.edu> +Subject: lowering the IR + +BTW, I do think that we should consider lowering the IR as you said. I +didn't get time to raise it today, but it comes up with the SPARC +move-conditional instruction. I don't think we want to put that in the core +VM -- it is a little too specialized. But without a corresponding +conditional move instruction in the VM, it is pretty difficult to maintain a +close mapping between VM and machine code. Other architectures may have +other such instructions. + +What I was going to suggest was that for a particular processor, we define +additional VM instructions that match some of the unusual opcodes on the +processor but have VM semantics otherwise, i.e., all operands are in SSA +form and typed. This means that we can re-generate core VM code from the +more specialized code any time we want (so that portability is not lost). + +Typically, a static compiler like gcc would generate just the core VM, which +is relatively portable. Anyone (an offline tool, the linker, etc., or even +the static compiler itself if it chooses) can transform that into more +specialized target-specific VM code for a particular architecture. If the +linker does it, it can do it after all machine-independent optimizations. +This would be the most convenient, but not necessary. + +The main benefit of lowering will be that we will be able to retain a close +mapping between VM and machine code. 
+
+--Vikram
+
diff --git a/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt b/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt
new file mode 100644
index 0000000000..9379081018
--- /dev/null
+++ b/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt
@@ -0,0 +1,56 @@
+Date: Tue, 18 Sep 2001 00:38:37 -0500 (CDT)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram S. Adve <vadve@cs.uiuc.edu>
+Subject: Idea for a simple, useful link time optimization
+
+
+In C++ programs, exceptions suck, and here's why:
+
+1. In virtually all function calls, you must assume that the function
+   throws an exception, unless it is defined as 'nothrow'. This means
+   that every function call has to have code to invoke dtors on objects
+   locally if one is thrown by the function. Most functions don't throw
+   exceptions, so this code is dead [with all the bad effects of dead
+   code, including icache pollution].
+2. Declaring a function nothrow causes catch blocks to be added to every
+   call that is not provably nothrow. This makes them very slow.
+3. Extra extraneous exception edges reduce the opportunity for code
+   motion.
+4. EH is typically implemented with large lookup tables. Ours is going to
+   be much smaller (than the "standard" way of doing it) to start with,
+   but eliminating it entirely would be nice. :)
+5. It is physically impossible to correctly put (accurate, correct)
+   exception specifications on generic, templated code. But it is trivial
+   to analyze instantiations of said code.
+6. Most large C++ programs throw few exceptions. Most well designed
+   programs only throw exceptions in specific planned portions of the
+   code.
+
+Given our _planned_ model of handling exceptions, all of this would be
+pretty trivial to eliminate through some pretty simplistic interprocedural
+analysis. The DCE factor alone could probably be pretty significant. The
+extra code motion opportunities could also be exploited though...
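As a sketch of the kind of simplistic interprocedural analysis meant here, the following computes a conservative may-throw set by fixpoint iteration over a call graph; anything outside the set is provably nothrow, so its EH cleanup code is dead. The CallGraph type and the function names in it are hypothetical illustrations, not LLVM APIs:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: function name -> callees.  A function may throw
// if it throws directly, or (conservatively) calls a function that may throw.
using CallGraph = std::map<std::string, std::vector<std::string>>;

std::set<std::string>
mayThrowSet(const CallGraph &CG, const std::set<std::string> &throwsDirectly) {
  std::set<std::string> mayThrow = throwsDirectly;
  bool changed = true;
  while (changed) {                       // iterate to a fixed point
    changed = false;
    for (const auto &[caller, callees] : CG)
      if (!mayThrow.count(caller))
        for (const auto &callee : callees)
          if (mayThrow.count(callee)) {   // propagate may-throw up the graph
            mayThrow.insert(caller);
            changed = true;
            break;
          }
  }
  // Every function NOT in the result is provably nothrow: the dtor-invoking
  // cleanup code around calls to it can be deleted as dead.
  return mayThrow;
}
```

External functions without bodies would have to be assumed may-throw, which is what makes whole-program (link-time) visibility valuable for this.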
+
+Additionally, this optimization can be implemented in a straightforward,
+conservative manner, allowing libraries or even individual files to be
+optimized (if there are leaf functions visible in the translation unit
+that are called).
+
+I think it's a reasonable optimization that hasn't really been addressed
+(because assembly is way too low level for this), and could have decent
+payoffs... without being an overly complex optimization.
+
+After I wrote all of that, I found this page that is talking about
+basically the same thing I just wrote, except that it is a translation
+unit at a time, tree based approach:
+http://www.ocston.org/~jls/ehopt.html
+
+but it is very useful from an "expected gain" and references perspective.
+Note that their compiler is apparently unable to inline functions that use
+exceptions, so their numbers are pretty worthless... also our results
+would (hopefully) be better because it's interprocedural...
+
+What do you think?
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2002-05-12-InstListChange.txt b/docs/HistoricalNotes/2002-05-12-InstListChange.txt
new file mode 100644
index 0000000000..004edb068d
--- /dev/null
+++ b/docs/HistoricalNotes/2002-05-12-InstListChange.txt
@@ -0,0 +1,55 @@
+Date: Sun, 12 May 2002 17:12:53 -0500 (CDT)
+From: Chris Lattner <sabre@nondot.org>
+To: "Vikram S. Adve" <vadve@cs.uiuc.edu>
+Subject: LLVM change
+
+There is a fairly fundamental change that I would like to make to the LLVM
+infrastructure, but I'd like to know if you see any drawbacks that I
+don't...
+
+Basically right now at the basic block level, each basic block contains an
+instruction list (returned by getInstList()) that is a ValueHolder of
+instructions. To iterate over instructions, we must actually iterate over
+the instlist, and access the instructions through the instlist.
+
+To add or remove an instruction from a basic block, we need to get an
+iterator to an instruction, which, given just an Instruction*, requires a
+linear search of the basic block the instruction is contained in... just
+to insert an instruction before another instruction, or to delete an
+instruction! This complicates algorithms that should be very simple (like
+simple constant propagation), because they aren't actually sparse anymore:
+they have to traverse basic blocks to remove constant-propagated
+instructions.
+
+Additionally, adding or removing instructions to a basic block
+_invalidates all iterators_ pointing into that block, which is really
+irritating.
+
+To fix these problems (and others), I would like to make the ordering of
+the instructions be represented with a doubly linked list in the
+instructions themselves, instead of an external data structure. This is
+how many other representations do it, and frankly I can't remember why I
+originally implemented it the way I did.
+
+Long term, all of the code that depends on the nasty features in the
+instruction list (which can be found by grep'ing for getInstList()) will
+be changed to do nice local transformations. In the short term, I'll
+change the representation, but preserve the interface (including
+getInstList()) so that all of the code doesn't have to change.
+
+Iteration over the instructions in a basic block remains the simple:
+for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) ...
+
+But we will also support:
+for (Instruction *I = BB->front(); I; I = I->getNext()) ...
+
+After converting instructions over, I'll convert basic blocks and
+functions to have a similar interface.
+
+The only negative aspect of this change that I see is that it increases
+the amount of memory consumed by one pointer per instruction. Given the
+benefits, I think this is a very reasonable tradeoff.
+
+What do you think?
+ +-Chris diff --git a/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt b/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt new file mode 100644 index 0000000000..2ca46117ca --- /dev/null +++ b/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt @@ -0,0 +1,72 @@ +Changes: +* Change the casting code to be const correct. Now, doing this is invalid: + const Value *V = ...; + Instruction *I = dyn_cast<Instruction>(V); + instead, the second line should be: + const Instruction *I = dyn_cast<Instruction>(V); + +* Change the casting code to allow casting a reference value thus: + const Value &V = ...; + Instruction &I = cast<Instruction>(V); + + dyn_cast does not work with references, because it must return a null pointer + on failure. + +* Fundamentally change how instructions and other values are represented. + Before, every llvm container was an instance of the ValueHolder template, + instantiated for each container type. This ValueHolder was effectively a + wrapper around a vector of pointers to the sub-objects. + + Now, instead of having a vector to pointers of objects, the objects are + maintained in a doubly linked list of values (ie each Instruction now has + Next & Previous fields). The containers are now instances of ilist (intrusive + linked list class), which use the next and previous fields to chain them + together. The advantage of this implementation is that iterators can be + formed directly from pointers to the LLVM value, and invalidation is much + easier to handle. + +* As part of the above change, dereferencing an iterator (for example: + BasicBlock::iterator) now produces a reference to the underlying type (same + example: Instruction&) instead of a pointer to the underlying object. 
This
+  makes it much easier to write nested loops that iterate over things,
+  changing this:
+
+    for (Function::iterator BI = Func->begin(); BI != Func->end(); ++BI)
+      for (BasicBlock::iterator II = (*BI)->begin(); II != (*BI)->end(); ++II)
+        (*II)->dump();
+
+  into:
+
+    for (Function::iterator BI = Func->begin(); BI != Func->end(); ++BI)
+      for (BasicBlock::iterator II = BI->begin(); II != BI->end(); ++II)
+        II->dump();
+
+  which is much more natural and what users expect.
+
+* Simplification of #include's: Before, it was necessary for a .cpp file to
+  include every .h file that it used. Now things are batched a little bit more
+  to make it easier to use. Specifically, the include graph now includes these
+  edges:
+    Module.h -> Function.h, GlobalVariable.h
+    Function.h -> BasicBlock.h, Argument.h
+    BasicBlock.h -> Instruction.h
+
+  Which means that #including Function.h is usually sufficient for getting the
+  lower level #includes.
+
+* Printing out a Value* has now changed: Printing a Value* will soon print out
+  the address of the value instead of the contents of the Value. To print out
+  the contents, you must convert it to a reference with (for example)
+  'cout << *I' instead of 'cout << I;'. This conversion is not yet complete,
+  but will be eventually. In the meantime, both forms print out the contents.
+
+* References are used much more throughout the code base. In general, if a
+  pointer is known to never be null, it is passed in as a reference instead of
+  a pointer. For example, the instruction visitor class uses references instead
+  of pointers, and Pass subclasses now all receive references to Values
+  instead of pointers, because they can never be null.
+
+* The Function class now has helper functions for accessing the Arguments list.
+  Instead of having to go through getArgumentList for simple things like
+  iterating over the arguments, the a*() methods can now be used to access
+  them.
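The intrusive list representation these notes describe (each value carrying its own Prev/Next links, as ilist does) can be sketched as follows; because the links live in the nodes themselves, an iterator is just a node pointer, and inserting before a known node or removing a node is O(1) with no external container to search or invalidate. The Instr and InstrList names are simplified hypothetical stand-ins, not the real classes:

```cpp
#include <cassert>

struct Instr {                       // hypothetical stand-in for Instruction
  int Opcode;
  Instr *Prev = nullptr, *Next = nullptr;  // the intrusive links
  explicit Instr(int Op) : Opcode(Op) {}
};

struct InstrList {                   // hypothetical stand-in for ilist
  Instr *Head = nullptr, *Tail = nullptr;

  void push_back(Instr *I) {
    I->Prev = Tail;
    I->Next = nullptr;
    (Tail ? Tail->Next : Head) = I;  // link from old tail, or become head
    Tail = I;
  }

  // O(1): no linear search of the block to find the position.
  void insertBefore(Instr *Pos, Instr *I) {
    I->Prev = Pos->Prev;
    I->Next = Pos;
    (Pos->Prev ? Pos->Prev->Next : Head) = I;
    Pos->Prev = I;
  }

  // O(1), and only pointers into the removed node are invalidated.
  void remove(Instr *I) {
    (I->Prev ? I->Prev->Next : Head) = I->Next;
    (I->Next ? I->Next->Prev : Tail) = I->Prev;
    I->Prev = I->Next = nullptr;
  }
};
```

The memory cost is exactly the tradeoff the email above weighs: link fields in every node in exchange for cheap, iterator-stable mutation.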
+ diff --git a/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt b/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt new file mode 100644 index 0000000000..fbe811d627 --- /dev/null +++ b/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt @@ -0,0 +1,28 @@ +Date: Mon, 20 Jan 2003 00:00:28 -0600 +From: Brian R. Gaeke <gaeke@uiuc.edu> +Subject: windows vs. llvm + +If you're interested, here are some of the major problems compiling LLVM +under Cygwin and/or Mingw. + +1. Cygwin doesn't have <inttypes.h> or <stdint.h>, so all the INT*_MAX + symbols and standard int*_t types are off in limbo somewhere. Mingw has + <stdint.h>, but Cygwin doesn't like it. + +2. Mingw doesn't have <dlfcn.h> (because Windows doesn't have it.) + +3. SA_SIGINFO and friends are not around; only signal() seems to work. + +4. Relink, aka ld -r, doesn't work (probably an ld bug); you need + DONT_BUILD_RELINKED. This breaks all the tools makefiles; you just need to + change them to have .a's. + +5. There isn't a <values.h>. + +6. There isn't a mallinfo() (or, at least, it's documented, but it doesn't seem + to link). + +7. The version of Bison that cygwin (and newer Linux versions) comes with + does not like = signs in rules. Burg's gram.yc source file uses them. I think + you can just take them out. + diff --git a/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt b/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt new file mode 100644 index 0000000000..a745784639 --- /dev/null +++ b/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt @@ -0,0 +1,137 @@ +Wed Jun 25 15:13:51 CDT 2003 + +First-level instrumentation +--------------------------- + +We use opt to do Bytecode-to-bytecode instrumentation. Look at +back-edges and insert llvm_first_trigger() function call which takes +no arguments and no return value. This instrumentation is designed to +be easy to remove, for instance by writing a NOP over the function +call instruction. 
+ +Keep count of every call to llvm_first_trigger(), and maintain +counters in a map indexed by return address. If the trigger count +exceeds a threshold, we identify a hot loop and perform second-level +instrumentation on the hot loop region (the instructions between the +target of the back-edge and the branch that causes the back-edge). We +do not move code across basic-block boundaries. + + +Second-level instrumentation +--------------------------- + +We remove the first-level instrumentation by overwriting the CALL to +llvm_first_trigger() with a NOP. + +The reoptimizer maintains a map between machine-code basic blocks and +LLVM BasicBlock*s. We only keep track of paths that start at the +first machine-code basic block of the hot loop region. + +How do we keep track of which edges to instrument, and which edges are +exits from the hot region? 3 step process. + +1) Do a DFS from the first machine-code basic block of the hot loop +region and mark reachable edges. + +2) Do a DFS from the last machine-code basic block of the hot loop +region IGNORING back edges, and mark the edges which are reachable in +1) and also in 2) (i.e., must be reachable from both the start BB and +the end BB of the hot region). + +3) Mark BBs which end in edges that exit the hot region; we need to +instrument these differently. + +Assume that there is 1 free register. On SPARC we use %g1, which LLC +has agreed not to use. Shift a 1 into it at the beginning. At every +edge which corresponds to a conditional branch, we shift 0 for not +taken and 1 for taken into a register. This uniquely numbers the paths +through the hot region. Silently fail if we need more than 64 bits. + +At the end BB we call countPath and increment the counter based on %g1 +and the return address of the countPath call. We keep track of the +number of iterations and the number of paths. We only run this +version 30 or 40 times. 
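The path-numbering scheme above (seed the register with 1, then shift in one bit per conditional branch) can be sketched as follows; the leading 1 keeps paths with different numbers of branches distinct, and a 64-bit register caps the scheme at 63 branches, matching the "silently fail" limit. numberPath is a hypothetical helper for illustration, not part of the reoptimizer:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simulates the %g1 register: start with 1, then shift in 0 for a
// not-taken conditional branch and 1 for a taken one.  The result
// uniquely numbers the path through the hot region and keys the
// counter map updated by countPath.
uint64_t numberPath(const std::vector<bool> &branchTaken) {
  uint64_t pathId = 1;                     // "shift a 1 into it at the beginning"
  for (bool taken : branchTaken)
    pathId = (pathId << 1) | (taken ? 1 : 0);
  return pathId;
}
```

For example, a path with no conditional branches numbers as 1, while "taken, not taken" numbers as binary 110; paths of different lengths can never collide because of the leading 1.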
+ +Find the BBs that total 90% or more of execution, and aggregate them +together to form our trace. But we do not allow more than 5 paths; if +we have more than 5 we take the ones that are executed the most. We +verify our assumption that we picked a hot back-edge in first-level +instrumentation, by making sure that the number of times we took an +exit edge from the hot trace is less than 10% of the number of +iterations. + +LLC has been taught to recognize llvm_first_trigger() calls and NOT +generate saves and restores of caller-saved registers around these +calls. + + +Phase behavior +-------------- + +We turn off llvm_first_trigger() calls with NOPs, but this would hide +phase behavior from us (when some funcs/traces stop being hot and +others become hot.) + +We have a SIGALRM timer that counts time for us. Every time we get a +SIGALRM we look at our priority queue of locations where we have +removed llvm_first_trigger() calls. Each location is inserted along +with a time when we will next turn instrumentation back on for that +call site. If the time has arrived for a particular call site, we pop +that off the prio. queue and turn instrumentation back on for that +call site. + + +Generating traces +----------------- + +When we finally generate an optimized trace we first copy the code +into the trace cache. This leaves us with 3 copies of the code: the +original code, the instrumented code, and the optimized trace. The +optimized trace does not have instrumentation. The original code and +the instrumented code are modified to have a branch to the trace +cache, where the optimized traces are kept. + +We copy the code from the original to the instrumentation version +by tracing the LLVM-to-Machine code basic block map and then copying +each machine code basic block we think is in the hot region into the +trace cache. Then we instrument that code. 
The process is similar for
+generating the final optimized trace; we copy the same basic blocks
+because we might need to put in fixup code for exit BBs.
+
+LLVM basic blocks are not typically used in the Reoptimizer except
+for the mapping information.
+
+We are restricted to using single instructions to branch between the
+original code, trace, and instrumented code. So we have to keep the
+code copies in memory near the original code (they can't be so far
+away that a single pc-relative branch could not reach them.) Malloc()
+or data region space is too far away. This impacts the design of the
+trace cache.
+
+We use a dummy function that is full of a bunch of for loops which we
+overwrite with trace-cache code. The trace manager keeps track of
+whether or not we have enough space in the trace cache, etc.
+
+The trace insertion routine takes an original start address, a vector
+of machine instructions representing the trace, an index of branches and
+their corresponding absolute targets, and an index of calls and their
+corresponding absolute targets.
+
+The trace insertion routine is responsible for inserting branches from
+the beginning of the original code to the beginning of the optimized
+trace. This is because at some point the trace cache may run out of
+space and it may have to evict a trace, at which point the branch to
+the trace would also have to be removed. It uses a round-robin
+replacement policy; we have found that this is almost as good as LRU
+and better than random (especially because of problems fitting the new
+trace in.)
+
+We cannot deal with discontiguous trace cache areas. The trace cache
+is supposed to be cache-line-aligned, but it is not page-aligned.
+
+We generate instrumentation traces and optimized traces into separate
+trace caches. We keep the instrumented code around because you don't
+want to delete a trace when you still might have to return to it
+(i.e., return from a llvm_first_trigger() or countPath() call.)
+ + diff --git a/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt b/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt new file mode 100644 index 0000000000..ec4b93fea0 --- /dev/null +++ b/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt @@ -0,0 +1,110 @@ +Thu Jun 26 14:43:04 CDT 2003 + +Information about BinInterface +------------------------------ + +Take in a set of instructions with some particular register +allocation. It allows you to add, modify, or delete some instructions, +in SSA form (kind of like LLVM's MachineInstrs.) Then re-allocate +registers. It assumes that the transformations you are doing are safe. +It does not update the mapping information or the LLVM representation +for the modified trace (so it would not, for instance, support +multiple optimization passes; passes have to be aware of and update +manually the mapping information.) + +The way you use it is you take the original code and provide it to +BinInterface; then you do optimizations to it, then you put it in the +trace cache. + +The BinInterface tries to find live-outs for traces so that it can do +register allocation on just the trace, and stitch the trace back into +the original code. It has to preserve the live-ins and live-outs when +it does its register allocation. (On exits from the trace we have +epilogues that copy live-outs back into the right registers, but +live-ins have to be in the right registers.) + + +Limitations of BinInterface +--------------------------- + +It does copy insertions for PHIs, which it infers from the machine +code. The mapping info inserted by LLC is not sufficient to determine +the PHIs. + +It does not handle integer or floating-point condition codes and it +does not handle floating-point register allocation. + +It is not aggressively able to use lots of registers. + +There is a problem with alloca: we cannot find our spill space for +spilling registers, normally allocated on the stack, if the trace +follows an alloca(). 
What might be an acceptable solution would be to +disable trace generation on functions that have variable-sized +alloca()s. Variable-sized allocas in the trace would also probably +screw things up. + +Because of the FP and alloca limitations, the BinInterface is +completely disabled right now. + + +Demo +---- + +This is a demo of the Ball & Larus version that does NOT use 2-level +profiling. + +1. Compile program with llvm-gcc. +2. Run opt -lowerswitch -paths -emitfuncs on the bytecode. + -lowerswitch change switch statements to branches + -paths Ball & Larus path-profiling algorithm + -emitfuncs emit the table of functions +3. Run llc to generate SPARC assembly code for the result of step 2. +4. Use g++ to link the (instrumented) assembly code. + +We use a script to do all this: +------------------------------------------------------------------------------ +#!/bin/sh +llvm-gcc $1.c -o $1 +opt -lowerswitch -paths -emitfuncs $1.bc > $1.run.bc +llc -f $1.run.bc +LIBS=$HOME/llvm_sparc/lib/Debug +GXX=/usr/dcs/software/evaluation/bin/g++ +$GXX -g -L $LIBS $1.run.s -o $1.run.llc \ +$LIBS/tracecache.o \ +$LIBS/mapinfo.o \ +$LIBS/trigger.o \ +$LIBS/profpaths.o \ +$LIBS/bininterface.o \ +$LIBS/support.o \ +$LIBS/vmcore.o \ +$LIBS/transformutils.o \ +$LIBS/bcreader.o \ +-lscalaropts -lscalaropts -lanalysis \ +-lmalloc -lcpc -lm -ldl +------------------------------------------------------------------------------ + +5. Run the resulting binary. You will see output from BinInterface +(described below) intermixed with the output from the program. + + +Output from BinInterface +------------------------ + +BinInterface's debugging code prints out the following stuff in order: + +1. Initial code provided to BinInterface with original register +allocation. + +2. Section 0 is the trace prolog, consisting mainly of live-ins and +register saves which will be restored in epilogs. + +3. 
Section 1 is the trace itself, in SSA form used by BinInterface, +along with the PHIs that are inserted. +PHIs are followed by the copies that implement them. +Each branch (i.e., out of the trace) is annotated with the +section number that represents the epilog it branches to. + +4. All the other sections starting with Section 2 are trace epilogs. +Every branch from the trace has to go to some epilog. + +5. After the last section is the register allocation output. diff --git a/docs/HistoricalNotes/2007-OriginalClangReadme.txt b/docs/HistoricalNotes/2007-OriginalClangReadme.txt new file mode 100644 index 0000000000..611dc9d2c0 --- /dev/null +++ b/docs/HistoricalNotes/2007-OriginalClangReadme.txt @@ -0,0 +1,178 @@ +//===----------------------------------------------------------------------===// +// C Language Family Front-end +//===----------------------------------------------------------------------===// + Chris Lattner + +I. Introduction: + + clang: noun + 1. A loud, resonant, metallic sound. + 2. The strident call of a crane or goose. + 3. C-language family front-end toolkit. + + The world needs better compiler tools, tools which are built as libraries. This + design point allows reuse of the tools in new and novel ways. However, building + the tools as libraries isn't enough: they must have clean APIs, be as + decoupled from each other as possible, and be easy to modify/extend. This + requires clean layering, decent design, and avoiding tying the libraries to a + specific use. Oh yeah, did I mention that we want the resultant libraries to + be as fast as possible? :) + + This front-end is built as a component of the LLVM toolkit that can be used + with the LLVM backend or independently of it. In this spirit, the API has been + carefully designed as the following components: + + libsupport - Basic support library, reused from LLVM. + + libsystem - System abstraction library, reused from LLVM. 
   libbasic    - Diagnostics, SourceLocations, SourceBuffer abstraction,
                 file system caching for input source files. This depends
                 on libsupport and libsystem.

   libast      - Provides classes to represent the C AST, the C type
                 system, builtin functions, and various helpers for
                 analyzing and manipulating the AST (visitors, pretty
                 printers, etc). This library depends on libbasic.

   liblex      - C/C++/ObjC lexing and preprocessing, identifier hash
                 table, pragma handling, tokens, and macros. This depends
                 on libbasic.

   libparse    - C (for now) parsing and local semantic analysis. This
                 library invokes coarse-grained 'Actions' provided by the
                 client to do stuff (e.g. libsema builds ASTs). This
                 depends on liblex.

   libsema     - Provides a set of parser actions to build a standardized
                 AST for programs. ASTs are 'streamed' out a top-level
                 declaration at a time, allowing clients to use
                 decl-at-a-time processing, build up entire translation
                 units, or even build 'whole program' ASTs depending on
                 how they use the APIs. This depends on libast and
                 libparse.

   librewrite  - Fast, scalable rewriting of source code. This operates on
                 the raw syntactic text of source code, allowing a client
                 to insert and delete text in very large source files
                 using the same source location information embedded in
                 ASTs. This is intended to be a low-level API that is
                 useful for higher-level clients and libraries such as
                 code refactoring.

   libanalysis - Source-level dataflow analysis useful for performing
                 analyses such as computing live variables. It also
                 includes a path-sensitive "graph-reachability" engine for
                 writing analyses that reason about different possible
                 paths of execution through source code. This is currently
                 being employed to write a set of checks for finding bugs
                 in software.

   libcodegen  - Lower the AST to LLVM IR for optimization & codegen.
                 Depends on libast.

   clang       - An example driver, client of the libraries at various
                 levels.
                 This depends on all these libraries, and on LLVM VMCore.

 This front-end has been intentionally built as a DAG of libraries, making
 it easy to reuse individual parts or replace pieces if desired. For
 example, to build a preprocessor, you take the Basic and Lexer libraries.
 If you want an indexer, you take those plus the Parser library and provide
 some actions for indexing. If you want a refactoring, static analysis, or
 source-to-source compiler tool, it makes sense to take those plus the AST
 building and semantic analyzer library. Finally, if you want to use this
 with the LLVM backend, you'd take these components plus the AST-to-LLVM
 lowering code.

 In the future I hope this toolkit will grow to include new and interesting
 components, including a C++ front-end, ObjC support, and a whole lot of
 other things.

 Finally, it should be pointed out that the goal here is to build something
 that is high-quality and industrial-strength: all the obnoxious features
 of the C family must be correctly supported (trigraphs, preprocessor
 arcana, K&R-style prototypes, GCC/MS extensions, etc). It cannot be used
 if it is not 'real'.


II. Usage of clang driver:

 * Basic Command-Line Options:
   - Help: clang --help
   - Standard GCC options accepted: -E, -I*, -i*, -pedantic, -std=c90, etc.
   - To make diagnostics more gcc-like: -fno-caret-diagnostics -fno-show-column
   - Enable metric printing: -stats

 * -fsyntax-only is currently the default mode.

 * -E mode works the same way as GCC.

 * -Eonly mode does all preprocessing, but does not print the output. This
   is useful for timing the preprocessor.

 * -fsyntax-only is currently partially implemented, lacking some semantic
   analysis (some errors and warnings are not produced).

 * -parse-noop parses code without building an AST. This is useful for
   timing the cost of the parser without including AST building time.

 * -parse-ast builds ASTs, but doesn't print them.
   This is most useful for timing AST building vs. -parse-noop.

 * -parse-ast-print pretty prints most expression and statement nodes.

 * -parse-ast-check checks that diagnostic messages that are expected are
   reported and that those which are reported are expected.

 * -dump-cfg builds ASTs and then CFGs. CFGs are then pretty-printed.

 * -view-cfg builds ASTs and then CFGs. CFGs are then visualized by
   invoking Graphviz.

   For more information on getting Graphviz to work with clang/LLVM,
   see: http://llvm.org/docs/ProgrammersManual.html#ViewGraph


III. Current advantages over GCC:

 * Column numbers are fully tracked (no 256 col limit, no GCC-style
   pruning).
 * All diagnostics have column numbers, include 'caret diagnostics', and
   they highlight regions of interesting code (e.g. the LHS and RHS of a
   binop).
 * Full diagnostic customization by the client (clients can format
   diagnostics however they like, e.g. in an IDE or refactoring tool)
   through the DiagnosticClient interface.
 * Built as a framework, can be reused by multiple tools.
 * All languages supported linked into the same library (no cc1, cc1obj,
   ...).
 * mmaps code in read-only, does not dirty the pages like GCC (mem
   footprint).
 * LLVM License, can be linked into non-GPL projects.
 * Full diagnostic control, per diagnostic. Diagnostics are identified by
   ID.
 * Significantly faster than GCC at semantic analysis, parsing,
   preprocessing and lexing.
 * Defers exposing platform-specific stuff to as late as possible, tracks
   use of platform-specific features (e.g. #ifdef PPC) to allow 'portable
   bytecodes'.
 * The lexer doesn't rely on the "lexer hack": it has no notion of scope
   and does not categorize identifiers as types or variables -- this is up
   to the parser to decide.

Potential Future Features:

 * Fine-grained diag control within the source (#pragma enable/disable
   warning).
 * Better token tracking within macros?
   (Token came from this line, which is a macro argument instantiated
   here, recursively instantiated here).
 * Fast #import with a module system.
 * Dependency tracking: a change to a header file doesn't recompile every
   function that textually depends on it: recompile only those functions
   that need it. This is aka 'incremental parsing'.


IV. Missing Functionality / Improvements

Lexer:
 * Source character mapping. GCC supports ASCII and UTF-8.
   See GCC options: -ftarget-charset and -ftarget-wide-charset.
 * Universal character support. Experimental in GCC, enabled with
   -fextended-identifiers.
 * -fpreprocessed mode.

Preprocessor:
 * #assert/#unassert
 * MSExtension: "L#param" stringizes to a wide string literal.
 * Add support for -M*

Traditional Preprocessor:
 * Currently, we have none. :)