Diffstat (limited to 'README.txt')
-rw-r--r-- | README.txt | 171 |
1 files changed, 171 insertions, 0 deletions
diff --git a/README.txt b/README.txt
new file mode 100644
index 0000000000..9ec1cc4a3d
--- /dev/null
+++ b/README.txt
@@ -0,0 +1,171 @@
+//===----------------------------------------------------------------------===//
+// C Language Family Front-end
+//===----------------------------------------------------------------------===//
+                                                             Chris Lattner
+
+I. Introduction:
+
+ clang: noun
+    1. A loud, resonant, metallic sound.
+    2. The strident call of a crane or goose.
+    3. C-language family front-end toolkit.
+
+ The world needs better compiler tools, tools which are built as libraries. This
+ design point allows reuse of the tools in new and novel ways. However, building
+ the tools as libraries isn't enough: they must have clean APIs, be as
+ decoupled from each other as possible, and be easy to modify/extend. This
+ requires clean layering, decent design, and avoiding tying the libraries to a
+ specific use. Oh yeah, did I mention that we want the resultant libraries to
+ be as fast as possible? :)
+
+ This front-end is built as a component of the LLVM toolkit that can be used
+ with the LLVM backend or independently of it. In this spirit, the API has been
+ carefully designed as the following components:
+
+   libsupport  - Basic support library, reused from LLVM.
+   libsystem   - System abstraction library, reused from LLVM.
+
+   libbasic    - Diagnostics, SourceLocations, SourceBuffer abstraction,
+                 file system caching for input source files. This depends on
+                 libsupport and libsystem.
+   libast      - Provides classes to represent the C AST, the C type system,
+                 builtin functions, and various helpers for analyzing and
+                 manipulating the AST (visitors, pretty printers, etc). This
+                 library depends on libbasic.
+
+   liblex      - C/C++/ObjC lexing and preprocessing, identifier hash table,
+                 pragma handling, tokens, and macros. This depends on libbasic.
+   libparse    - C (for now) parsing and local semantic analysis. This library
+                 invokes coarse-grained 'Actions' provided by the client to do
+                 stuff (e.g. libsema builds ASTs). This depends on liblex.
+   libsema     - Provides a set of parser actions to build a standardized AST
+                 for programs. ASTs are 'streamed' out a top-level declaration
+                 at a time, allowing clients to use decl-at-a-time processing,
+                 build up entire translation units, or even build 'whole
+                 program' ASTs depending on how they use the APIs. This depends
+                 on libast and libparse.
+
+   libcodegen  - Lower the AST to LLVM IR for optimization & codegen. Depends
+                 on libast.
+   clang       - An example driver, client of the libraries at various levels.
+                 This depends on all these libraries, and on LLVM VMCore.
+
+ This front-end has been intentionally built as a DAG, making it easy to
+ reuse individual parts or replace pieces if desired. For example, to build a
+ preprocessor, you take the Basic and Lexer libraries. If you want an indexer,
+ you take those plus the Parser library and provide some actions for indexing.
+ If you want a refactoring, static analysis, or source-to-source compiler tool,
+ it makes sense to take those plus the AST building and semantic analyzer
+ library. Finally, if you want to use this with the LLVM backend, you'd take
+ these components plus the AST to LLVM lowering code.
+
+ In the future I hope this toolkit will grow to include new and interesting
+ components, including a C++ front-end, ObjC support, and a whole lot of other
+ things.
+
+ Finally, it should be pointed out that the goal here is to build something that
+ is high-quality and industrial-strength: all the obnoxious features of the C
+ family must be correctly supported (trigraphs, preprocessor arcana, K&R-style
+ prototypes, GCC/MS extensions, etc). It cannot be used if it is not 'real'.
+
+
+II. Usage of clang driver:
+
+ * Basic Command-Line Options:
+   - Help: clang --help
+   - Standard GCC options accepted: -E, -I*, -i*, -pedantic, -std=c90, etc.
+   - To make diagnostics more gcc-like: -fno-caret-diagnostics -fno-show-column
+   - Enable metric printing: -stats
+
+ * -fsyntax-only is the default mode.
+
+ * -E mode gives output nearly identical to GCC, though not all bugs in
+   whitespace calculation have been emulated (e.g. the number of blank lines
+   emitted).
+
+ * -fsyntax-only is currently partially implemented, lacking some semantic
+   analysis.
+
+ * -Eonly mode does all preprocessing, but does not print the output, useful for
+   timing the preprocessor.
+
+ * -parse-print-callbacks prints almost no callbacks so far.
+
+ * -parse-ast builds ASTs, but doesn't print them. This is most useful for
+   timing AST building vs -parse-noop.
+
+ * -parse-ast-print prints most expression and statement nodes, but some
+   minor things are missing.
+
+ * -parse-ast-check checks that diagnostic messages that are expected are
+   reported and that those which are reported are expected.
+
+III. Current advantages over GCC:
+
+ * Column numbers are fully tracked (no 256 col limit, no GCC-style pruning).
+ * All diagnostics have column numbers, include 'caret diagnostics', and they
+   highlight regions of interesting code (e.g. the LHS and RHS of a binop).
+ * Full diagnostic customization by client (can format diagnostics however they
+   like, e.g. in an IDE or refactoring tool) through DiagnosticClient interface.
+ * Built as a framework, can be reused by multiple tools.
+ * All languages supported linked into same library (no cc1, cc1obj, ...).
+ * mmap's code in read-only, does not dirty the pages like GCC (mem footprint).
+ * LLVM License, can be linked into non-GPL projects.
+ * Full diagnostic control, per diagnostic. Diagnostics are identified by ID.
+ * Significantly faster than GCC at semantic analysis, parsing, preprocessing
+   and lexing.
+ * Defers exposing platform-specific stuff to as late as possible, tracks use of
+   platform-specific features (e.g. #ifdef PPC) to allow 'portable bytecodes'.
+ * The lexer doesn't rely on the "lexer hack": it has no notion of scope and
+   does not categorize identifiers as types or variables -- this is up to the
+   parser to decide.
+
+Potential Future Features:
+
+ * Fine grained diag control within the source (#pragma enable/disable warning).
+ * Better token tracking within macros? (Token came from this line, which is
+   a macro argument instantiated here, recursively instantiated here).
+ * Fast #import with a module system.
+ * Dependency tracking: change to header file doesn't recompile every function
+   that textually depends on it: recompile only those functions that need it.
+
+
+IV. Missing Functionality / Improvements
+
+clang driver:
+ * Include search paths are hard-coded into the driver.
+
+File Manager:
+ * Reduce syscalls, see NOTES.txt.
+
+Lexer:
+ * Source character mapping. GCC supports ASCII and UTF-8.
+   See GCC options: -ftarget-charset and -ftarget-wide-charset.
+ * Universal character support. Experimental in GCC, enabled with
+   -fextended-identifiers.
+ * -fpreprocessed mode.
+
+Preprocessor:
+ * Know about Apple header maps.
+ * #assert/#unassert
+ * #line / #file directives (currently accepted and ignored).
+ * MSExtension: "L#param" stringizes to a wide string literal.
+ * Charize extension: "#define F(o) #@o F(a)" -> 'a'.
+ * Consider merging the parser's expression parser into the preprocessor to
+   eliminate duplicate code.
+ * Add support for -M*
+
+Traditional Preprocessor:
+ * All.
+
+Parser:
+ * C90/K&R modes are only partially implemented.
+ * __extension__, __attribute__ [currently just skipped and ignored].
+ * "initializers", GCC inline asm.
+
+Semantic Analysis:
+ * Perhaps 75% done.
+
+Code Gen:
+ * Mostly missing.
+
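Note on the library layering described in section I of the README: the component list can be read as a small dependency graph. The sketch below is illustrative only and is not part of the commit. The DEPENDS_ON table and the link_closure helper are hypothetical names; the edges are copied from the README's component list, with the clang driver and LLVM VMCore left out. It computes which libraries a tool would need to pull in for a given entry point.

```python
# Direct dependencies, as stated in the README's component list.
# (Hypothetical table for illustration; the driver and LLVM VMCore are omitted.)
DEPENDS_ON = {
    "libsupport": [],
    "libsystem": [],
    "libbasic": ["libsupport", "libsystem"],
    "libast": ["libbasic"],
    "liblex": ["libbasic"],
    "libparse": ["liblex"],
    "libsema": ["libast", "libparse"],
    "libcodegen": ["libast"],
}


def link_closure(library):
    """Return `library` and all of its transitive dependencies, dependencies first."""
    ordered = []
    seen = set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in DEPENDS_ON[name]:
            visit(dep)
        # Appending after the dependencies gives a valid build/link order.
        ordered.append(name)

    visit(library)
    return ordered


if __name__ == "__main__":
    for tool, entry in [("preprocessor", "liblex"), ("AST-based tool", "libsema")]:
        print(tool, "->", ", ".join(link_closure(entry)))
```

Under these assumptions, link_closure("liblex") returns the support, system, basic and lexer libraries, which matches the README's remark that a preprocessor needs only Basic and Lexer on top of the code reused from LLVM. The closure for libsema does not include libcodegen, so an analysis tool would not need the LLVM lowering code.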