//===---------------------------------------------------------------------===//
// Random Notes
//===---------------------------------------------------------------------===//

C90/C99/C++ Comparisons:
http://david.tribble.com/text/cdiffs.htm

//===---------------------------------------------------------------------===//
Extensions:

 * "#define_target X Y"
   This preprocessor directive works exactly the same way as #define, but it
   notes that 'X' is a target-specific preprocessor macro.  When 'X' is used, a
   diagnostic is emitted indicating that the translation unit is non-portable.
   
   If a target-define is #undef'd before use, no diagnostic is emitted.  If 'X'
   were previously a normal #define macro, the macro is tainted.  If 'X' is
   subsequently #defined as a non-target-specific define, the taint bit is
   cleared.
   
 * "#define_other_target X"
   This preprocessor directive takes a single identifier argument.  It notes
   that this identifier is a target-specific #define for some target other than
   the current one.  Use of this identifier will result in a diagnostic.

   If 'X' is later #undef'd or #define'd, the taint bit is cleared.  If 'X' is
   already defined, X is marked as a target-specific define.
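
   The taint rules for both directives can be read as a small state machine
   per macro name.  The following is only a sketch of that reading, not
   clang's implementation: MacroTable, MacroState, Taint and all method names
   are hypothetical, and macro bodies are ignored entirely.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical model of the taint rules above; not clang's data structures.
enum class Taint { None, Target, OtherTarget };

struct MacroState {
  bool Defined; // Does the name currently have a definition?
  Taint Kind;   // Which taint, if any, is attached to the name?
};

class MacroTable {
  std::unordered_map<std::string, MacroState> Macros;

public:
  // Plain #define: (re)defines the name and clears any taint.
  void define(const std::string &Name) {
    Macros[Name] = MacroState{true, Taint::None};
  }

  // #define_target: behaves like #define, but taints the name.
  void defineTarget(const std::string &Name) {
    Macros[Name] = MacroState{true, Taint::Target};
  }

  // #define_other_target: taints the name without defining it.  A name that
  // is already defined is marked as a target-specific define instead.
  void defineOtherTarget(const std::string &Name) {
    MacroState &M = Macros[Name]; // new entries start undefined and untainted
    M.Kind = M.Defined ? Taint::Target : Taint::OtherTarget;
  }

  // #undef: forgets the name, including its taint.
  void undef(const std::string &Name) { Macros.erase(Name); }

  // True if a use of Name should produce a non-portability diagnostic.
  bool useIsNonPortable(const std::string &Name) const {
    auto It = Macros.find(Name);
    return It != Macros.end() && It->second.Kind != Taint::None;
  }
};

// Usage: a later plain #define clears the taint set by #define_target.
inline void macroTaintExample() {
  MacroTable Table;
  Table.defineTarget("X");
  assert(Table.useIsNonPortable("X"));
  Table.define("X");
  assert(!Table.useIsNonPortable("X"));
}
```

   Keeping the taint on the name rather than on the definition is what lets
   #define_other_target flag identifiers that have no definition at all on
   the current target.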

//===---------------------------------------------------------------------===//

To time GCC preprocessing speed without output, use:
   "time gcc -MM file"
This is similar to -Eonly.


//===---------------------------------------------------------------------===//

  C++ Template Instantiation benchmark:
     http://users.rcn.com/abrahams/instantiation_speed/index.html

//===---------------------------------------------------------------------===//

TODO: File Manager Speedup:

 We currently do a lot of stat'ing for files that don't exist, particularly
 when lots of -I paths exist (e.g. see the <iostream> example, check for
 failures in stat in FileManager::getFile).  It would be far better to make
 the following changes:
   1. FileEntry contains a sys::Path instead of a std::string for Name.
   2. sys::Path contains timestamp and size, lazily computed.  Eliminate from
      FileEntry.
   3. File UIDs are created on request, not when files are opened.
 These changes make it possible to efficiently have FileEntry objects for
 files that exist on the file system, but have not been used yet.
 
 Once this is done:
   1. DirectoryEntry gets a boolean value "has read entries".  When false, not
      all entries in the directory are in the file mgr; when true, they are.
   2. Instead of stat'ing the file in FileManager::getFile, check to see if
      the dir has been read.  If so, fail immediately; if not, read the dir,
      then retry.
   3. Reading the dir uses the getdirentries syscall, creating a FileEntry
      for all files found.
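
 The second list of steps might look roughly like the sketch below.  It is
 not the real FileManager: DirCache and its members are hypothetical names, a
 set of strings stands in for FileEntry objects, and the portable POSIX
 opendir/readdir calls are used in place of the getdirentries syscall.

```cpp
#include <cassert>
#include <dirent.h>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for DirectoryEntry with the "has read entries" flag.
struct DirectoryEntry {
  bool HasReadEntries = false;
  std::unordered_set<std::string> Names; // every name found in the directory
};

class DirCache {
  std::unordered_map<std::string, DirectoryEntry> Dirs;

  // Reads the whole directory once.  A directory that cannot be opened is
  // recorded as read but empty, so later lookups in it fail immediately.
  static void readDir(const std::string &Path, DirectoryEntry &D) {
    if (DIR *Handle = opendir(Path.c_str())) {
      while (dirent *E = readdir(Handle))
        D.Names.insert(E->d_name);
      closedir(Handle);
    }
    D.HasReadEntries = true;
  }

public:
  unsigned NumDirReads = 0; // how many directories were actually scanned

  // Replaces a per-file stat: at most one directory scan per -I path.
  bool fileExists(const std::string &Dir, const std::string &Name) {
    DirectoryEntry &D = Dirs[Dir];
    if (!D.HasReadEntries) {
      readDir(Dir, D);
      ++NumDirReads;
    }
    return D.Names.count(Name) != 0;
  }
};

// Usage: two failed lookups in the same directory cost only one scan.
inline void dirCacheExample() {
  DirCache Cache;
  assert(!Cache.fileExists("/no/such/include/dir", "iostream"));
  assert(!Cache.fileExists("/no/such/include/dir", "vector"));
  assert(Cache.NumDirReads == 1);
}
```

 The trade-off is one full directory read up front in exchange for never
 stat'ing a missing file in that directory again.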

//===---------------------------------------------------------------------===//

TODO: Fast #Import:

 * Get frameworks that don't use #import to do so, e.g.
   DirectoryService, AudioToolbox, CoreFoundation, etc.  Why aren't they using
   #import?  Because they work in C mode?  C has #import.
 * Have the lexer return a token for #import instead of handling it itself.
   - Create a new preprocessor object with no external state (no -D/U options
     from the command line, etc).  Alternatively, keep track of exactly which
     external state is used by a #import: declare it somehow.
 * When reading a #import'd file, keep track of whether we have seen any
   "configuration" macros (and/or which ones).  Various cases:
   - Uses of target args (__POWERPC__, __i386): Header has to be parsed 
     multiple times, per-target.  What about #ifndef checks?  How do we know?
   - "Configuration" preprocessor macros not defined: POWERPC, etc.  What about
     things like __STDC__ etc?  What is and what isn't allowed?
 * Special handling for "umbrella" headers, which just contain #import stmts:
   - Cocoa.h/AppKit.h - Contain pointers to digests instead of entire digests
     themselves?  Foundation.h isn't pure umbrella!
 * Frameworks digests:
   - Can put "digest" of a framework-worth of headers into the framework
     itself.  To open AppKit, just mmap
     /System/Library/Frameworks/AppKit.framework/"digest", which provides a
     symbol table in a well defined format.  Lazily unstream stuff that is
     needed.  Contains declarations, macros, and debug information.
   - System frameworks ship with digests.  How do we handle configuration
     information?  How do we handle stuff like:
       #if MAC_OS_X_VERSION_MAX_ALLOWED >= MAC_OS_X_VERSION_10_2
     which guards a bunch of decls?  Should there be a couple of default
     configs, then have the UI fall back to building/caching its own?
   - GUI automatically builds digests when UI is idle, both of system
     frameworks if they aren't available in the right config, and of app
     frameworks.
   - GUI builds dependence graph of frameworks/digests based on #imports.  If a
     digest is out of date, dependent digests are automatically invalidated.

 * New constraints on #import for objc-v3:
   - #imported file must not define non-inline function bodies.
     - Alternatively, they can, and these bodies get compiled/linked *once*
       per app into a dylib.  What about building user dylibs?
   - Restrictions on ObjC grammar: can't #import the body of a for stmt or fn.
   - Compiler must detect and reject these cases.
   - #defines defined within a #import have two behaviors:
     - By default, they escape the header.  These macros *cannot* be #undef'd
       by other code: this is enforced by the front-end.
     - Optionally, user can specify what macros escape (whitelist) or can use
       #undef.

//===---------------------------------------------------------------------===//

TODO: New language feature: Configuration queries:
  - Instead of #ifdef __POWERPC__, use "if (strcmp(`cpu`, __POWERPC__))", or
    some other, better, syntax.
  - Use it to increase the number of "architecture-clean" #import'd files,
    allowing a single index to be used for all fat slices.

//===---------------------------------------------------------------------===//

The 'portability' model in clang is sufficient to catch translation units (or
their parts) that are not portable, but it doesn't help if the system headers
are non-portable and not fixed.  An alternative model that would be easy to use
is a 'tainting' scheme.  Consider:

int32_t
OSHostByteOrder(void) {
#if defined(__LITTLE_ENDIAN__)
    return OSLittleEndian;
#elif defined(__BIG_ENDIAN__)
    return OSBigEndian;
#else
    return OSUnknownByteOrder;
#endif
}

It would be trivial to mark 'OSHostByteOrder' as being non-portable (tainted)
instead of marking the entire translation unit.  Then, if OSHostByteOrder is
never called/used by the current translation unit, the t-u wouldn't be marked
non-portable.  However, there is no good way to handle stuff like:

extern int X, Y;

#ifndef __POWERPC__
#define X Y
#endif

int bar() { return X; }

When compiling for powerpc, the #define is skipped, so the compiler doesn't know
that bar uses a #define that is set on some other target.  In practice, limited
cases could be handled by scanning the skipped region of a #if, but the fully
general case cannot be implemented efficiently.  In this case, for example, the
#define in the protected region could be turned into either a #define_target or
#define_other_target as appropriate.  The harder case is code like this (from
OSByteOrder.h):

  #if (defined(__ppc__) || defined(__ppc64__))
  #include <libkern/ppc/OSByteOrder.h>
  #elif (defined(__i386__) || defined(__x86_64__))
  #include <libkern/i386/OSByteOrder.h>
  #else
  #include <libkern/machine/OSByteOrder.h>
  #endif

The realistic way to fix this is by having an initial #ifdef __llvm__ that
defines its contents in terms of the llvm bswap intrinsics.  Other things should
be handled on a case-by-case basis.


We probably have to do something smarter like this in the future. The C++ header
<limits> contains a lot of code like this:

   static const int digits10 = __LDBL_DIG__;
   static const int min_exponent = __LDBL_MIN_EXP__;
   static const int min_exponent10 = __LDBL_MIN_10_EXP__;
   static const float_denorm_style has_denorm
     = bool(__LDBL_DENORM_MIN__) ? denorm_present : denorm_absent;

 ... since these macros aren't being used in an #ifdef, it should be easy enough
to taint the decls for these static members.
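
Per-declaration tainting of this kind (as with OSHostByteOrder earlier in this
section) could be tracked along the following lines.  This is a sketch under
stated assumptions, not clang code: TaintTracker and its methods are
hypothetical names, and declarations are identified by plain strings.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical tracker: taint lives on individual declarations, and the
// translation unit only becomes non-portable when a tainted one is used.
class TaintTracker {
  std::unordered_map<std::string, bool> DeclTainted;
  bool TUNonPortable = false;

public:
  // Record a declaration; DependsOnTargetMacro is true when its definition
  // referenced a target-specific macro such as __LITTLE_ENDIAN__.
  void declare(const std::string &Name, bool DependsOnTargetMacro) {
    DeclTainted[Name] = DependsOnTargetMacro;
  }

  // Record a use (call or reference) of a declaration.
  void use(const std::string &Name) {
    auto It = DeclTainted.find(Name);
    if (It != DeclTainted.end() && It->second)
      TUNonPortable = true;
  }

  bool isNonPortable() const { return TUNonPortable; }
};

// Usage: merely seeing a tainted declaration does not taint the t-u.
inline void declTaintExample() {
  TaintTracker TU;
  TU.declare("OSHostByteOrder", /*DependsOnTargetMacro=*/true);
  assert(!TU.isNonPortable());
  TU.use("OSHostByteOrder");
  assert(TU.isNonPortable());
}
```

Note that this only covers taint that is visible on the current target; it does
nothing for the skipped-#if case shown above.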


/usr/include/sys/cdefs.h contains stuff like this:

#if defined(__ppc__)
#  if defined(__LDBL_MANT_DIG__) && defined(__DBL_MANT_DIG__) && \
	__LDBL_MANT_DIG__ > __DBL_MANT_DIG__
#    if __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__-0 < 1040
#      define	__DARWIN_LDBL_COMPAT(x)	__asm("_" __STRING(x) "$LDBLStub")
#    else
#      define	__DARWIN_LDBL_COMPAT(x)	__asm("_" __STRING(x) "$LDBL128")
#    endif
#    define	__DARWIN_LDBL_COMPAT2(x) __asm("_" __STRING(x) "$LDBL128")
#    define	__DARWIN_LONG_DOUBLE_IS_DOUBLE	0
#  else
#   define	__DARWIN_LDBL_COMPAT(x) /* nothing */
#   define	__DARWIN_LDBL_COMPAT2(x) /* nothing */
#   define	__DARWIN_LONG_DOUBLE_IS_DOUBLE	1
#  endif
#elif defined(__i386__) || defined(__ppc64__) || defined(__x86_64__)
#  define	__DARWIN_LDBL_COMPAT(x)	/* nothing */
#  define	__DARWIN_LDBL_COMPAT2(x) /* nothing */
#  define	__DARWIN_LONG_DOUBLE_IS_DOUBLE	0
#else
#  error Unknown architecture
#endif

An ideal way to solve this issue is to mark __DARWIN_LDBL_COMPAT / 
__DARWIN_LDBL_COMPAT2 / __DARWIN_LONG_DOUBLE_IS_DOUBLE as being non-portable
because they depend on non-portable macros.  In practice though, this may end
up being a serious problem: every use of printf will mark the translation unit
non-portable if targeting ppc32 and something else.

//===---------------------------------------------------------------------===//