\documentclass[11pt]{proc}
%\documentclass[preprint,10pt]{sigplanconf}

\usepackage{amsmath}
\usepackage{url}

\begin{document}

%\conferenceinfo{Splash '11}{??-2011, Portland.}
%\copyrightyear{2011}
%\copyrightdata{[to be supplied]}

%\titlebanner{} % These are ignored unless
%\preprintfooter{(C) 2011 Alon Zakai, Creative Commons BY-SA Licensed} % 'preprint' option specified.

\title{Emscripten: An LLVM-to-JavaScript Compiler}
%\subtitle{}

%\authorinfo{Alon Zakai}
% {Mozilla}
% {azakai@mozilla.com}

\author{Alon Zakai \\ Mozilla \\ \url{azakai@mozilla.com}}

\maketitle

\begin{abstract}
JavaScript is the standard language of the web, supported on essentially all web browsers. Despite efforts to allow other languages to be run as well, none have come close to being universally available on all browsers, which severely limits their usefulness on the web. However, there are good reasons why allowing other languages would be beneficial, including reusing existing code and allowing developers to use their languages of choice.

We present Emscripten, an LLVM-to-JavaScript compiler. Emscripten compiles LLVM assembly code into standard JavaScript, which opens up two avenues for running code written in other languages on the web: (1) compiling a language directly into LLVM, and then compiling that into JavaScript using Emscripten, or (2) compiling a language's entire runtime (typically written in C or C++) into JavaScript using Emscripten, and using the compiled runtime to run code written in that language. Examples of languages that can use the first approach are C and C++, since compilers from them into LLVM exist. An example of a language that can use the second approach is Python: Emscripten has been used to compile CPython (the standard C implementation of Python) into JavaScript, allowing standard Python code to be run on the web, which was not previously possible.
Emscripten itself is written in JavaScript (to enable various dynamic compilation techniques), and is available under the MIT license (a permissive open source license), at \url{http://www.emscripten.org}. As an LLVM-to-JavaScript compiler, the challenges in designing Emscripten are somewhat the reverse of the norm -- one must go from a low-level assembly into a high-level language, and recreate parts of the original high-level structure of the code that were lost in the compilation to low-level LLVM. We detail the algorithms used in Emscripten to deal with those challenges.
\end{abstract}

%\category{CR-number}{subcategory}{third-level}

%\terms
%term1, term2

%\keywords
%keyword1, keyword2

\bigskip

\copyright\ 2011 Alon Zakai. License: Creative Commons Attribution-ShareAlike (CC BY-SA), \url{http://creativecommons.org/licenses/by-sa/3.0/}

\section{Introduction}

Since the mid-1990s, JavaScript has been present in most web browsers (sometimes with minor variations and under slightly different names, e.g., JScript in Internet Explorer), and today it is well-supported on essentially all web browsers, from desktop browsers like Internet Explorer, Firefox, Chrome and Safari, to mobile browsers on smartphones and tablets. Together with HTML and CSS, JavaScript is the standards-based foundation of the web.

Running other programming languages on the web has been suggested many times, and browser plugins have allowed doing so, e.g., via the Java and Flash plugins. However, plugins must be manually installed and do not integrate seamlessly with the surrounding HTML. Perhaps more problematic is that they cannot run at all on some platforms; for example, Java and Flash cannot run on iOS devices such as the iPhone and iPad. For those reasons, JavaScript remains the primary programming language of the web.

There are, however, justifiable motivations for running code from other programming languages on the web, for example, if one has a large amount of existing code already written in another language, or if one simply has a strong preference for another language (and perhaps is more productive in it). As a consequence, there have been some efforts to compile languages \textbf{into} JavaScript. Since JavaScript is present in essentially all web browsers, by compiling one's language of choice into JavaScript, one can still generate content that will run practically everywhere. Examples of this approach include the Google Web Toolkit, which compiles Java into JavaScript; Pyjamas, which compiles Python into JavaScript; Script\# and jsc, % http://jsc.sourceforge.net/
which compile .NET assemblies into JavaScript; and there are rumors about an Oracle project to translate JVM bytecode into JavaScript.

In this paper we present another project along those lines: \textbf{Emscripten}, which compiles LLVM assembly into JavaScript. LLVM (Low Level Virtual Machine) is a compiler project primarily focused on C, C++ and Objective-C. It compiles those languages through a \emph{frontend} (the main ones being Clang and LLVM-GCC) into the LLVM intermediate representation (which can be machine-readable bitcode, or human-readable assembly), and then passes it through a \emph{backend} which generates actual machine code for a particular architecture. Emscripten plays the role of a backend which targets JavaScript.

By using Emscripten, potentially many languages can be run on the web, using one of the following methods:
\begin{itemize}
\item Compile \textbf{code} in a language recognized by one of the existing LLVM frontends into LLVM, and then compile that into JavaScript using Emscripten. Frontends exist for various languages, including some of the most popular programming languages, such as C and C++, as well as various new and emerging languages (e.g., Rust).
\item Compile the \textbf{runtime} used to parse and execute code in a particular language into LLVM, then compile that into JavaScript using Emscripten. It is then possible to run code in that runtime on the web. This is a useful approach if a language's runtime is written in a language for which an LLVM frontend exists, but the language itself has no such frontend. For example, no currently-supported frontend exists for Python; however, it is possible to compile CPython -- the standard implementation of Python, written in C -- into JavaScript, and run Python code on that (see Subsection X.Y).
\end{itemize}

From a technical standpoint, the main challenge in designing and implementing Emscripten is that it compiles a low-level language -- LLVM assembly -- into a high-level one -- JavaScript. This is somewhat the reverse of the usual situation one is in when building a compiler, and leads to some unique difficulties. For example, to get good performance in JavaScript one must use natural JavaScript code flow structures, like loops and ifs, but those structures do not exist in LLVM assembly (instead, what is present there is essentially `flat' code with \emph{goto} commands). Emscripten must therefore reconstruct a high-level representation from the low-level data it receives.

In theory that issue could have been avoided by compiling a higher-level language into JavaScript. For example, if compiling Java into JavaScript (as the Google Web Toolkit does), then one can benefit from the fact that Java's loops, ifs and so forth generally have a very direct parallel in JavaScript (of course, the downside is that this approach yields a compiler only for Java). Compiling LLVM into JavaScript is less straightforward, but we will see later that it is possible to reconstruct a substantial part of the high-level structure of the original code.
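
As a rough illustration of this gap, the following C sketch writes the same summation twice. It is not taken from Emscripten or its output; the function names \texttt{sum\_structured} and \texttt{sum\_flat}, and the label names, are hypothetical and chosen only for this example. The first version uses a natural loop, while the second uses only labels and \emph{goto}, which is closer in spirit to the `flat' form of LLVM assembly.
\begin{verbatim}
/* Structured form: the loop is explicit
   in the source. */
int sum_structured(int n) {
  int sum = 0;
  for (int i = 1; i <= n; i++)
    sum += i;
  return sum;
}

/* Flat form: the same computation written
   as labeled blocks joined by gotos. The
   label names are illustrative only. */
int sum_flat(int n) {
  int sum = 0;
  int i = 1;
  goto check;
check:
  if (i <= n) goto body;
  goto done;
body:
  sum += i;
  i++;
  goto check;
done:
  return sum;
}
\end{verbatim}
Both functions return the same value (5050 for \texttt{n = 100}). Recovering something like the first form from something like the second is the kind of reconstruction referred to above.
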

We conclude this introduction with a list of this paper's main contributions:
\begin{itemize}
\item We describe Emscripten itself, detailing its approach to compiling LLVM into JavaScript.
\item We give details of Emscripten's `Relooper' algorithm, which generates high-level loop structures from low-level branching data. We are unaware of related results in the literature.
\end{itemize}
In addition, the following are the main contributions of Emscripten itself, which to our knowledge were not previously possible:
\begin{itemize}
\item It allows compiling a very large subset of C and C++ code into JavaScript, which can then be run on the web.
\item By compiling their runtimes, it allows running languages such as Python on the web.
\end{itemize}

The remainder of this paper is structured as follows. In Section 2 we describe, from a high level, the approach taken to compiling LLVM assembly into JavaScript. In Section 3 we describe the workings of Emscripten on a lower, more concrete level. In Section 4 we give an overview of some uses of Emscripten. In Section 5 we summarize and give directions for future work on Emscripten and uses of it.

\section{Compilation Approach}

Let us begin by considering what the challenge is when we want to compile something into JavaScript. Assume we are given the following simple example of a C program, which we want to compile into JavaScript:
\begin{verbatim}
#include <stdio.h>
int main()
{
  int sum = 0;
  for (int i = 1; i < 100; i++)
    sum += i;
  printf("1+...+100=%d\n", sum);
  return 0;
}
\end{verbatim}
This program is intended to calculate the sum of the integers from 1 to 100 (as written, the loop condition \texttt{i < 100} stops after 99). When compiled by Clang, the generated LLVM assembly code includes the following:
\begin{verbatim}
@.str = private constant [14 x i8] c"1+...+100=%d\0A\00"

define i32 @main() {
  %1 = alloca i32, align 4
  %sum = alloca i32, align 4
  %i = alloca i32, align 4
  store i32 0, i32* %1
  store i32 0, i32* %sum, align 4
  store i32 1, i32* %i, align 4
  br label %2

;