diff options
author | Alon Zakai <alonzakai@gmail.com> | 2011-03-31 21:06:46 -0700 |
---|---|---|
committer | Alon Zakai <alonzakai@gmail.com> | 2011-03-31 21:06:46 -0700 |
commit | 2ff7f5979873cd9f876256016b6e7c00f2c4a095 (patch) | |
tree | 830e52c628f9a338c4ebefb76c47094caa169730 /docs | |
parent | 5148a01e691ecef3ad3b2a540d52d63fcccbd297 (diff) |
paper update
Diffstat (limited to 'docs')
-rw-r--r-- | docs/paper.tex | 145 |
1 files changed, 77 insertions, 68 deletions
diff --git a/docs/paper.tex b/docs/paper.tex index 133772cf..89503442 100644 --- a/docs/paper.tex +++ b/docs/paper.tex @@ -206,6 +206,7 @@ following simple example of a C program, which we want to compile into JavaScrip This program calculates the sum of the integers from 1 to 100. When compiled by Clang, the generated LLVM assembly code includes the following: +\label{code:examplellvm} \begin{verbatim} @.str = private constant [14 x i8] c"1+...+100=%d\0A\00" @@ -342,6 +343,11 @@ to take notice of: Section~\ref{sssec:realworldcode}. \end{itemize} +\subsection{Performance} + +In this section we will deal with several topics regarding +Emscripten's approach to generating high-performance JavaScript code. + \subsection{Load-Store Consistency (LSC)} \label{sec:lsc} @@ -365,16 +371,12 @@ compiling has \textbf{Load-Store Consistency} (LSC), which is the requirement th loads from and stores to a specific memory address will use the same type. Normal C and C++ code generally does so: If $x$ is a variable containing a 32-bit floating point number, then both loads and stores of $x$ will be of 32-bit floating -point values, and not 16-bit unsigned integers or anything else. (Note that -even if we write something like \begin{verbatim}float x = 5\end{verbatim} then the -compiler will assign a 32-bit float with the value of 5 to $x$, and not -an integer.) +point values, and not 16-bit unsigned integers or anything else. To see why this is important for performance, consider the following C code fragment, which does \emph{not} have LSC: \begin{verbatim} int x = 12345; - [...] printf("first byte: %d\n", *((char*)&x)); \end{verbatim} Assuming an architecture with more than 8 bits, this code will read @@ -423,8 +425,7 @@ the far more optimal (Note that even this can be optimized even more -- we can store \emph{x} in a normal JavaScript variable. For now though we are just clarifying why it is useful to assume we are compiling code -that has LSC -- doing so lets us generate shorter and more natural -JavaScript.) +that has LSC.) In practice the vast majority of C and C++ code does have LSC. Exceptions do exist, however, for example: @@ -448,7 +449,11 @@ do exist, however, for example: \end{itemize} In practice it may be hard to know if code has LSC or not, and requiring a time-consuming code audit is obviously impractical. Emscripten therefore has -automatic tools to detect violations of LSC, see SAFE\_HEAP in Section~\ref{sssec:realworldcode}. +a compilation option, SAFE\_HEAP, which generates code that checks that LSC holds, and warns if it +doesn't. It also warns about other memory-related issues like +reading from memory before a value was written (somewhat similarly to tools +like Valgrind). When such problems are detected, possible solutions are to ignore the issue (if it has no actual +consqeuences), or altering the source code. Note that it is somewhat wasteful to allocation 4 memory locations for a 32-bit integer, and use only one of them. It is possible to change @@ -459,9 +464,7 @@ called with the normal size each the value would have, and not a single memory address each as Emscripten would prefer. We are looking into modifications to LLVM itself to remedy that. -\subsection{Performance} - -\subsubsection{Running Real-World Code Efficiently} +\subsubsection{Emulating Code Semantics} \label{sssec:realworldcode} The semantics of LLVM assembly and JavaScript are not identical: The former @@ -473,16 +476,16 @@ more challenging. For example, if we want to convert add i8 %1, %2 \end{verbatim} to JavaScript, then to be completely accurate we must emulate the -exact same semantics. That means we must handle overflows properly, which would not be the case if we just implement -this as $\%1 + \%2$ in JavaScript. For example, with inputs of $255,1$, the +exact same behavior, in particular, we must handle overflows properly, which would not be the case if we just implement +this as $\%1 + \%2$ in JavaScript. For example, with inputs of $255$ and $1$, the correct output is 0, but simple addition in JavaScript will give us 256. We can emulate the proper behavior by adding additional code, one way (not necessarily the most optimal) would be to check for overflows after each addition, and correct them as necessary. This however makes each operation take significantly more CPU time that the original code. -Emscripten's approach to this problem is to support both accurate code, -that is identical in behavior to LLVM assembly, and simple code which is +Emscripten's approach to this problem is to allow the generation of both accurate code, +that is identical in behavior to LLVM assembly, and inaccurate code which is faster. In practice, most addition operations in LLVM do not overflow, and can be translated into $\%1 + \%2$ in JavaScript which is fast. Emscripten provides tools that make it straightforward to find which code does require @@ -500,7 +503,7 @@ In practice, this is done as follows: \item Run the compiled code on a representative sample of inputs, and notice which lines are warned about by the runtime checks. \item Recompile the code, telling Emscripten to add corrections (using CORRECT\_SIGNS, CORRECT\_OVERFLOWS - or CORRECT\_ROUNDINGS) only on the specific lines that need it. + or CORRECT\_ROUNDINGS) only on the specific lines that actually need it. \end{itemize} This method is not guaranteed to work, as if we do not run on a truly representative @@ -510,20 +513,9 @@ sure things will work properly (this is the default compilation setting), howeve practice the procedure above appears to work quite well, and can result in code that runs very significantly faster. -There are additional challenges with real-world code. We already mentioned the -Load-Store Consistency assumption (LSC) earlier, which is necessary for the -compiled code to run properly. There is an additional compilation option, -SAFE\_HEAP, which generates code that checks that LSC holds, and warns if it -doesn't. It also warns about other memory-related issues like -reading from memory before a value was written (somewhat similarly to tools -like Valgrind). When such problems are detected, possible solutions are to ignore the issue (if it has no actual -consqeuences), or altering the source code. In -some cases, such nonportable code can be avoided in a simple manner by changing the configuration under -which the code is built (see the examples in Section X). +\subsubsection{Code Optimizations} -\subsubsection{Optimizations} - -When comparing the example program from page~\ref{code:example}, +When comparing the example program from page~\pageref{code:example}, the generated code was fairly complicated and cumbersome, and unsurprisingly it performs quite poorly. There are two main reasons for that: First, that the code is simply @@ -538,14 +530,15 @@ Emscripten's approach to generating fast-performing code is as follows. Emscripten doesn't do any optimizations that can be done by other tools: LLVM can be used to perform optimizations before Emscripten, and -the Closure Compiler\cite{closure} can perform optimizations on the generated JavaScript afterwards. Those +the Closure Compiler~\cite{closure} can perform optimizations on the generated JavaScript afterwards. Those tools will perform standard useful optimizations like removing unneeded variables, dead code, function inlining, etc. That leaves two major optimizations that are left for Emscripten to perform: \begin{itemize} \item \textbf{Variable nativization}: Convert variables - that are on the stack into native JavaScript variables. In general, + that are on the stack -- which is implemented using addresses in the \emph{HEAP} array + as mentioned earlier -- into native JavaScript variables (that is to say, \emph{var x;} and so forth). In general, a variable will be nativized unless it is used outside that function, e.g., if its address is taken and stored somewhere or passed to another function. When optimizing, Emscripten tries to nativize @@ -566,7 +559,7 @@ function _main() { $1 = 0; $sum = 0; $i = 0; - $2$2: while(1) { // $2 + $2$2: while(1) { var $3 = $i; var $4 = $3 < 100; if (!($4)) { __label__ = 2; break $2$2; } @@ -607,21 +600,36 @@ to recreate the original high-level structure of the code that was compiled into LLVM assembly, despite that structure not being explicitly available to Emscripten. +\subsection{Benchmarks} + +\begin{tabular}{ l | c | c | c } + \textbf{benchmark} & \textbf{JS} & \textbf{gcc} & \textbf{ratio} \\ + raytrace (5, 64) & 1.553 & 0.033 & 46.45 \\ + fannkuch (9) & 0.663 & 0.054 & 12.27 \\ + fasta (100,000) & 0.578 & 0.055 & 10.51 \\ +\end{tabular} + +\bigskip + +LLVM opts, closure +SM -m +gcc -O3 + \section{Emscripten's Architecture} In the previous section we saw a general overview of Emscripten's approach to compiling LLVM assembly into JavaScript. We will now get into more detail -into how Emscripten implements that approach and its architecture. +into how Emscripten itself is implemented. Emscripten is written in JavaScript. A main reason for that decision was to simplify sharing code between the compiler and the runtime, and -to enable various dynamic compilation techniques. Two simple examples: (1) -The compiler can create JavaScript objects that represent constants in -the assembly code, and convert them to a string using JSON.stringify() -in a convenient manner, -and (2) The compiler can simplify numerical operations by simply +to enable various dynamic compilation techniques. Two simple benefits of this approach are that (1) +the compiler can create JavaScript objects that represent structures from the original +assembly code, and convert them to a string using JSON.stringify() +in a trivial manner, +and (2) the compiler can simplify numerical operations by simply eval()ing the code (so ``1+2'' would become ``3'', etc.). In both examples, -the http://sourceware.org/newlib/development of Emscripten was made simpler by having the exact same environment +the development of Emscripten was made simpler by having the exact same environment during compilation as the executing code will have. Emscripten's compilation has three main phases: @@ -646,7 +654,7 @@ direct access to the local filesystem. Emscripten comes with a partial implementation of a C library, mostly written from scratch in JavaScript, which parts compiled from an -existing C library (newlib\cite{newlib}). Some aspects of the runtime environment, as +existing C library (newlib~\cite{newlib}). Some aspects of the runtime environment, as implemented in that C library, are: \begin{itemize} \item Files to be read must be `preloaded' in JavaScript. They can @@ -669,10 +677,11 @@ implemented in that C library, are: The Relooper is among the most complicated components in Emscripten. It receives a `soup of blocks', which is a set of labeled fragments of code, each -ending with a branch operation (either a simple branch, a conditional branch, or a switch), and the goal is to generate normal -high-level JavaScript code flow structures such as loops and ifs. +ending with a branch operation, and the goal is to generate normal +high-level JavaScript code flow structures such as loops and ifs. As +we saw before, generating such code structures is essential to generating good-performing code. -For example, the LLVM assembly on page X has the following block +For example, the LLVM assembly on page~\pageref{code:examplellvm} has the following block structure: \begin{verbatim} /-----------\ @@ -707,11 +716,11 @@ blocks `blocks' in the following. There are three types of Emscripten blocks: \begin{itemize} -\item \textbf{Simple block}: A block with one internal label and a Next +\item \textbf{Simple block}: A block with one internal label, and a Next block, which the internal label branches to. The block is later translated simply into the code for that label, and the Next block appears right after it. -\item \textbf{Loop}: An block that represents a basic loop, comprised of +\item \textbf{Loop}: A block that represents a basic loop, comprised of two internal sub-blocks: \begin{itemize} \item \textbf{Inner}: A block that will appear inside @@ -743,49 +752,49 @@ check its value when we enter that block. So, for example, when we create a Loop block, its Next block can have multiple entries -- any label to which we branch out from the loop. By creating a Multiple block after the loop, we can enter the proper label when the loop is -exited. Having a \emph{\_\_label\_\_} variable does add some overhead, +exited. (Having a \emph{\_\_label\_\_} variable does add some overhead, but it greatly simplifies the problem that the Relooper needs to solve and allows us to only need three kinds of blocks as described above. -(Of course, it is possible to optimize -away writes and reads to \emph{\_\_label\_\_} in many cases.) +Of course, it is possible to optimize +away writes and reads to \emph{\_\_label\_\_} in many or even most cases.) -Emscripten uses the following recursive algorithm for generating -blocks from the soup of labels. We use the term `entry' here to +We will use the term `entry' to mean a label that can be reached immediately in a block. In other words, a block consists of labels $l_1,..,l_n$, and the entries are a subset of those labels, specifically the ones that execution -can directly reach when we reach that block. The algorithm can -them be written as follows: +can directly reach when we reach that block. With that +definition, the Relooper algorithm can +then be written as follows: \begin{itemize} -\item Receive a set of labels and which of them are entry points. +\item \textbf{Receive a set of labels and which of them are entry points.} We wish to create a block comprised of all those labels. -\item Calculate, for each label, which other labels it \emph{can} - reach, i.e., which labels we are able to reach if we start +\item \textbf{Calculate, for each label, which other labels it \emph{can} + reach}, i.e., which labels we are able to reach if we start at the current label and follow one of the possible paths of execution. -\item If we have a single entry, and cannot return to it from - any other label, then create a Simple block, with the entry +\item \textbf{If we have a single entry, and cannot return to it (by some other + label later on branching to it) then create a Simple block}, with the entry as its internal label, and the Next block comprised of all the other labels. The entries for the Next block are the entries to which the internal label can branch. -\item If we can return to all of the entries, return a - Loop block, whose Inner block is comprised of all labels that +\item \textbf{If we can return to all of the entries, return a + Loop block}, whose Inner block is comprised of all labels that can reach one of the entries, and whose Next block is comprised of all the others. The entry labels for the current block become entry labels for the Inner block (note that they must be in the Inner block by definition, as each one can reach itself). The Next block's entry labels are all the labels in the Next block that can be reached by the Inner block. -\item If we have more than one entry, try to create a Multiple block: For each entry, find all +\item \textbf{If we have more than one entry, try to create a Multiple block}: For each entry, find all the labels it reaches that cannot be reached by any other entry. If at least one entry has such labels, return a Multiple block, whose Handled blocks are blocks for those labels (and whose entries are those labels), and whose Next block is all the rest. Entries for the next block are entries that did not become part of the Handled blocks, and also labels that can be reached from the Handled blocks. -\item If we could not create a Multiple block, then we must be able to return to at least one of the entries (see proof below), so create - a Loop block as described above. +\item \textbf{If we could not create a Multiple block, then create a Loop block as described above} + (see proof below of why creating a Loop block is possible, i.e., why the labels contain a loop). \end{itemize} Note that we first create a Loop only if we must, then try to create a Multiple, then create a Loop if we have no other choice. We could have slightly simplified this in @@ -838,25 +847,25 @@ is empty, then since we built the Multiple block from a set of labels with more than one entry, then the Handled blocks are strictly smaller than the current one. \end{itemize} -So, whenever we successfully create a block, we simplify the remaining problem +We have seen that whenever we successfully create a block, we simplify the remaining problem as defined above, which means that we must eventually halt successfully (since we strictly decrease a nonnegative integer). The remaining issue is whether we can reach a situation where we \emph{cannot} successfully create a block, which is if we reach the final part of the relooper algorithm, but cannot create a Loop block there. For that to occur, we must not be able to return to any of the entries (or else we would create a Loop -block). But since that is so, we can, at minimum, create a Multiple +block). But if that is so, we can, at minimum, create a Multiple block with entries for all the current entries, since the entry labels themselves cannot be reached by the others as we have just assumed (when we ruled out creating a Loop block here), contradicting the assumption -that we cannot create a block, and concluding the proof. +and concluding the proof. -We have not, of course, proven that the shape of the blocks is optimal +(We have not, of course, proven that the shape of the blocks is optimal in any sense. However, even if it is possible to optimize them further, the Relooper already gives a very substantial speedup due to the move from the switch-in-a-loop -construction to more natural JavaScript code flow structures. +construction to more natural JavaScript code flow structures.) -\section{Example Uses and Benchmarks} +\section{Example Uses} Emscripten has been run successfully on several real-world codebases, including the following: |