27 files changed, 607 insertions, 323 deletions
diff --git a/docs/LangRef.html b/docs/LangRef.html
index 79d6f8820c..35990bdcdd 100644
--- a/docs/LangRef.html
+++ b/docs/LangRef.html
@@ -3714,17 +3714,27 @@ of an intrinsic function.  Additionally, because intrinsic functions are part
 of the LLVM language, it is required if any are added that they be documented
 here.</p>
 
-<p>Some intrinsic functions can be overloaded, i.e., the intrinsic represents
-a family of functions that perform the same operation but on different data
-types. This is most frequent with the integer types. Since LLVM can represent
-over 8 million different integer types, there is a way to declare an intrinsic 
-that can be overloaded based on its arguments. Such an intrinsic will have the
-names of its argument types encoded into its function name, each
-preceded by a period. For example, the <tt>llvm.ctpop</tt> function can take an
-integer of any width. This leads to a family of functions such as 
-<tt>i32 @llvm.ctpop.i8(i8 %val)</tt> and <tt>i32 @llvm.ctpop.i29(i29 %val)</tt>.
-</p>
-
+<p>Some intrinsic functions can be overloaded, i.e., the intrinsic represents 
+a family of functions that perform the same operation but on different data 
+types. Because LLVM can represent over 8 million different integer types, 
+overloading is commonly used to allow an intrinsic function to operate on any 
+integer type. One or more of the argument types or the result type can be 
+overloaded to accept any integer type. Argument types may also be defined as 
+exactly matching a previous argument's type or the result type. This allows an 
+intrinsic function which accepts multiple arguments, but needs all of them to 
+be of the same type, to only be overloaded with respect to a single argument or 
+the result.</p>
+
+<p>An overloaded intrinsic will have the names of its overloaded types 
+encoded into its function name, each preceded by a period. Only those types 
+which are overloaded result in a name suffix. Arguments whose type is matched 
+against another type do not. For example, the <tt>llvm.ctpop</tt> function can 
+take an integer of any width and returns an integer of exactly the same integer 
+width. This leads to a family of functions such as
+<tt>i8 @llvm.ctpop.i8(i8 %val)</tt> and <tt>i29 @llvm.ctpop.i29(i29 %val)</tt>.
+Only one type, the return type, is overloaded, and only one type suffix is 
+required. Because the argument's type is matched against the return type, it 
+does not require its own name suffix.</p>
 
 <p>To learn how to add an intrinsic function, please see the 
 <a href="ExtendingLLVM.html">Extending LLVM Guide</a>.
@@ -4558,12 +4568,11 @@ These allow efficient code generation for some algorithms.
 
 <h5>Syntax:</h5>
 <p>This is an overloaded intrinsic function. You can use bswap on any integer
-type that is an even number of bytes (i.e. BitWidth % 16 == 0). Note the suffix
-that includes the type for the result and the operand.
+type that is an even number of bytes (i.e. BitWidth % 16 == 0).
 <pre>
-  declare i16 @llvm.bswap.i16.i16(i16 &lt;id&gt;)
-  declare i32 @llvm.bswap.i32.i32(i32 &lt;id&gt;)
-  declare i64 @llvm.bswap.i64.i64(i64 &lt;id&gt;)
+  declare i16 @llvm.bswap.i16(i16 &lt;id&gt;)
+  declare i32 @llvm.bswap.i32(i32 &lt;id&gt;)
+  declare i64 @llvm.bswap.i64(i64 &lt;id&gt;)
 </pre>
 
 <h5>Overview:</h5>
@@ -4578,12 +4587,12 @@ byte order.
 <h5>Semantics:</h5>
 
 <p>
-The <tt>llvm.bswap.16.i16</tt> intrinsic returns an i16 value that has the high 
+The <tt>llvm.bswap.i16</tt> intrinsic returns an i16 value that has the high 
 and low byte of the input i16 swapped.  Similarly, the <tt>llvm.bswap.i32</tt> 
 intrinsic returns an i32 value that has the four bytes of the input i32 
 swapped, so that if the input bytes are numbered 0, 1, 2, 3 then the returned 
-i32 will have its bytes in 3, 2, 1, 0 order.  The <tt>llvm.bswap.i48.i48</tt>, 
-<tt>llvm.bswap.i64.i64</tt> and other intrinsics extend this concept to
+i32 will have its bytes in 3, 2, 1, 0 order.  The <tt>llvm.bswap.i48</tt>, 
+<tt>llvm.bswap.i64</tt> and other intrinsics extend this concept to
 additional even-byte lengths (6 bytes, 8 bytes and more, respectively).
 </p>
 
@@ -4600,11 +4609,11 @@ additional even-byte lengths (6 bytes, 8 bytes and more, respectively).
 <p>This is an overloaded intrinsic. You can use llvm.ctpop on any integer bit
 width. Not all targets support all bit widths however.
 <pre>
-  declare i32 @llvm.ctpop.i8 (i8  &lt;src&gt;)
-  declare i32 @llvm.ctpop.i16(i16 &lt;src&gt;)
+  declare i8 @llvm.ctpop.i8 (i8  &lt;src&gt;)
+  declare i16 @llvm.ctpop.i16(i16 &lt;src&gt;)
   declare i32 @llvm.ctpop.i32(i32 &lt;src&gt;)
-  declare i32 @llvm.ctpop.i64(i64 &lt;src&gt;)
-  declare i32 @llvm.ctpop.i256(i256 &lt;src&gt;)
+  declare i64 @llvm.ctpop.i64(i64 &lt;src&gt;)
+  declare i256 @llvm.ctpop.i256(i256 &lt;src&gt;)
 </pre>
 
 <h5>Overview:</h5>
@@ -4639,11 +4648,11 @@ The '<tt>llvm.ctpop</tt>' intrinsic counts the 1's in a variable.
 <p>This is an overloaded intrinsic. You can use <tt>llvm.ctlz</tt> on any 
 integer bit width. Not all targets support all bit widths however.
 <pre>
-  declare i32 @llvm.ctlz.i8 (i8  &lt;src&gt;)
-  declare i32 @llvm.ctlz.i16(i16 &lt;src&gt;)
+  declare i8 @llvm.ctlz.i8 (i8  &lt;src&gt;)
+  declare i16 @llvm.ctlz.i16(i16 &lt;src&gt;)
   declare i32 @llvm.ctlz.i32(i32 &lt;src&gt;)
-  declare i32 @llvm.ctlz.i64(i64 &lt;src&gt;)
-  declare i32 @llvm.ctlz.i256(i256 &lt;src&gt;)
+  declare i64 @llvm.ctlz.i64(i64 &lt;src&gt;)
+  declare i256 @llvm.ctlz.i256(i256 &lt;src&gt;)
 </pre>
 
 <h5>Overview:</h5>
@@ -4682,11 +4691,11 @@ of src. For example, <tt>llvm.ctlz(i32 2) = 30</tt>.
 <p>This is an overloaded intrinsic. You can use <tt>llvm.cttz</tt> on any 
 integer bit width. Not all targets support all bit widths however.
 <pre>
-  declare i32 @llvm.cttz.i8 (i8  &lt;src&gt;)
-  declare i32 @llvm.cttz.i16(i16 &lt;src&gt;)
+  declare i8 @llvm.cttz.i8 (i8  &lt;src&gt;)
+  declare i16 @llvm.cttz.i16(i16 &lt;src&gt;)
   declare i32 @llvm.cttz.i32(i32 &lt;src&gt;)
-  declare i32 @llvm.cttz.i64(i64 &lt;src&gt;)
-  declare i32 @llvm.cttz.i256(i256 &lt;src&gt;)
+  declare i64 @llvm.cttz.i64(i64 &lt;src&gt;)
+  declare i256 @llvm.cttz.i256(i256 &lt;src&gt;)
 </pre>
 
 <h5>Overview:</h5>
@@ -4723,8 +4732,8 @@ of src.  For example, <tt>llvm.cttz(2) = 1</tt>.
 <p>This is an overloaded intrinsic. You can use <tt>llvm.part.select</tt> 
 on any integer bit width.
 <pre>
-  declare i17 @llvm.part.select.i17.i17 (i17 %val, i32 %loBit, i32 %hiBit)
-  declare i29 @llvm.part.select.i29.i29 (i29 %val, i32 %loBit, i32 %hiBit)
+  declare i17 @llvm.part.select.i17 (i17 %val, i32 %loBit, i32 %hiBit)
+  declare i29 @llvm.part.select.i29 (i29 %val, i32 %loBit, i32 %hiBit)
 </pre>
 
 <h5>Overview:</h5>
@@ -4770,8 +4779,8 @@ returned in the reverse order. So, for example, if <tt>X</tt> has the value
 <p>This is an overloaded intrinsic. You can use <tt>llvm.part.set</tt> 
 on any integer bit width.
 <pre>
-  declare i17 @llvm.part.set.i17.i17.i9 (i17 %val, i9 %repl, i32 %lo, i32 %hi)
-  declare i29 @llvm.part.set.i29.i29.i9 (i29 %val, i9 %repl, i32 %lo, i32 %hi)
+  declare i17 @llvm.part.set.i17.i9 (i17 %val, i9 %repl, i32 %lo, i32 %hi)
+  declare i29 @llvm.part.set.i29.i9 (i29 %val, i9 %repl, i32 %lo, i32 %hi)
 </pre>
 
 <h5>Overview:</h5>
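The naming rule described in the LangRef hunks above (only overloaded types contribute a suffix, matched types do not) can be illustrated with a small standalone model. This is a sketch only: `SlotKind`, `Slot`, and `mangleIntrinsicName` are hypothetical names invented for illustration and are not part of LLVM or of this patch.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one slot (result or argument) of an intrinsic
// signature. These names are illustrative assumptions, not LLVM API.
enum class SlotKind {
  Fixed,   // a concrete type such as i32; never adds a suffix
  AnyInt,  // overloaded on any integer type; adds ".<type>" to the name
  Match    // must equal an earlier overloaded slot; adds no suffix
};

struct Slot {
  SlotKind Kind;
  std::string TypeName;  // the concrete type chosen for this slot, e.g. "i17"
};

// Builds the full intrinsic name: the base name followed by one
// period-prefixed suffix for each overloaded slot, in signature order
// (result first, then arguments).
std::string mangleIntrinsicName(const std::string &Base,
                                const std::vector<Slot> &Slots) {
  std::string Name = Base;
  for (const Slot &S : Slots)
    if (S.Kind == SlotKind::AnyInt)
      Name += "." + S.TypeName;
  return Name;
}
```

Under this model, `llvm.ctpop` with an overloaded `i8` result and a matched argument gets a single suffix, while `llvm.part.set` gets two because both the result and the replacement value are independently overloaded.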
diff --git a/include/llvm/AutoUpgrade.h b/include/llvm/AutoUpgrade.h
new file mode 100644
index 0000000000..e3a32b93c9
--- /dev/null
+++ b/include/llvm/AutoUpgrade.h
@@ -0,0 +1,38 @@
+//===-- llvm/AutoUpgrade.h - AutoUpgrade Helpers ----------------*- C++ -*-===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file was developed by Chandler Carruth and is distributed under the 
+// University of Illinois Open Source License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+//  These functions are implemented by lib/VMCore/AutoUpgrade.cpp.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_AUTOUPGRADE_H
+#define LLVM_AUTOUPGRADE_H
+
+namespace llvm {
+  class Function;
+  class CallInst;
+  class BasicBlock;
+
+  /// This is a more granular function that simply checks an intrinsic function 
+  /// for upgrading, and if it requires upgrading provides the new function.
+  Function* UpgradeIntrinsicFunction(Function *F);
+
+  /// This is the complement to the above, replacing a specific call to an 
+  /// intrinsic function with a call to the specified new function.
+  void UpgradeIntrinsicCall(CallInst *CI, Function *NewFn);
+  
+  /// This is an auto-upgrade hook for any old intrinsic function syntaxes 
+  /// which need to have both the function updated as well as all calls updated 
+  /// to the new function. This should only be run in a post-processing fashion 
+  /// so that it can update all calls to the old function.
+  void UpgradeCallsToIntrinsic(Function* F);
+
+} // End llvm namespace
+
+#endif
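The old two-suffix bswap names removed from LangRef in this patch (for example `llvm.bswap.i32.i32`) are the kind of input `UpgradeIntrinsicFunction` has to recognize. The string handling for that one case can be sketched in isolation as below. `upgradeBswapName` is a hypothetical helper written for illustration; the patch itself works on `llvm::Function` objects rather than on bare strings.

```cpp
#include <string>

// Hypothetical standalone helper (not part of the patch): maps an old-style
// name "llvm.bswap.<ty>.<ty>" to the new single-suffix "llvm.bswap.<ty>".
// Names that are not old-style bswap names are returned unchanged.
std::string upgradeBswapName(const std::string &Name) {
  // "llvm.bswap." is 11 characters long; anything else is not a candidate.
  if (Name.compare(0, 11, "llvm.bswap.") != 0)
    return Name;

  // A second '.' after the first type suffix marks the old two-suffix form.
  std::string::size_type Delim = Name.find('.', 11);
  if (Delim == std::string::npos)
    return Name;  // already in the new form

  // Keep "llvm.bswap" (10 characters) and append the final ".<ty>" suffix.
  return Name.substr(0, 10) + Name.substr(Delim);
}
```

Because both suffixes of an old bswap name carry the same type, keeping only the last one loses no information.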
diff --git a/include/llvm/CodeGen/ValueTypes.h b/include/llvm/CodeGen/ValueTypes.h
index 84d80606b5..4ddacba319 100644
--- a/include/llvm/CodeGen/ValueTypes.h
+++ b/include/llvm/CodeGen/ValueTypes.h
@@ -67,14 +67,14 @@ namespace MVT {  // MVT = Machine Value Types
 
     LAST_VALUETYPE =  27,   // This always remains at the end of the list.
 
-    // iAny - An integer value of any bit width. This is used for intrinsics
-    // that have overloadings based on integer bit widths. This is only for
-    // tblgen's consumption!
-    iAny           = 254,   
+    // iAny - An integer or vector integer value of any bit width. This is
+    // used for intrinsics that have overloadings based on integer bit widths.
+    // This is only for tblgen's consumption!
+    iAny           =  254,   
 
     // iPTR - An int value the size of the pointer of the current
     // target.  This should only be used internal to tblgen!
-    iPTR           = 255
+    iPTR           =  255
   };
 
   /// MVT::ValueType - This type holds low-level value types. Valid values
diff --git a/include/llvm/CodeGen/ValueTypes.td b/include/llvm/CodeGen/ValueTypes.td
index a133875d9d..47678d1de9 100644
--- a/include/llvm/CodeGen/ValueTypes.td
+++ b/include/llvm/CodeGen/ValueTypes.td
@@ -50,7 +50,7 @@ def v4f32  : ValueType<128, 25>;   //  4 x f32 vector value
 def v2f64  : ValueType<128, 26>;   //  2 x f64 vector value
 
 // Pseudo valuetype to represent "integer of any bit width"
-def iAny   : ValueType<0  , 254>;   // integer value of any bit width
+def iAny   : ValueType<0  , 254>;
 
 // Pseudo valuetype mapped to the current pointer size.
 def iPTR   : ValueType<0  , 255>;
diff --git a/include/llvm/Intrinsics.td b/include/llvm/Intrinsics.td
index 205eac509a..91f12841f4 100644
--- a/include/llvm/Intrinsics.td
+++ b/include/llvm/Intrinsics.td
@@ -52,59 +52,48 @@ def IntrWriteMem : IntrinsicProperty;
 // Types used by intrinsics.
 //===----------------------------------------------------------------------===//
 
-class LLVMType<ValueType vt, string typeval> {
+class LLVMType<ValueType vt> {
   ValueType VT = vt;
-  string TypeVal = typeval;
 }
 
-class LLVMIntegerType<ValueType VT, int width>
-  : LLVMType<VT, "Type::IntegerTyID"> {
-  int Width = width;
-}
-
-class LLVMVectorType<ValueType VT, int numelts, LLVMType elty>
-  : LLVMType<VT, "Type::VectorTyID">{
-  int NumElts = numelts;
-  LLVMType ElTy = elty;
-} 
-
 class LLVMPointerType<LLVMType elty>
-  : LLVMType<iPTR, "Type::PointerTyID">{
+  : LLVMType<iPTR>{
   LLVMType ElTy = elty;
 } 
 
-class LLVMEmptyStructType
-  : LLVMType<OtherVT, "Type::StructTyID">{
+class LLVMMatchType<int num>
+  : LLVMType<OtherVT>{
+  int Number = num;
 } 
 
-def llvm_void_ty       : LLVMType<isVoid, "Type::VoidTyID">;
-def llvm_int_ty        : LLVMIntegerType<iAny, 0>;
-def llvm_i1_ty         : LLVMIntegerType<i1 , 1>;
-def llvm_i8_ty         : LLVMIntegerType<i8 , 8>;
-def llvm_i16_ty        : LLVMIntegerType<i16, 16>;
-def llvm_i32_ty        : LLVMIntegerType<i32, 32>;
-def llvm_i64_ty        : LLVMIntegerType<i64, 64>;
-def llvm_float_ty      : LLVMType<f32, "Type::FloatTyID">;
-def llvm_double_ty     : LLVMType<f64, "Type::DoubleTyID">;
+def llvm_void_ty       : LLVMType<isVoid>;
+def llvm_anyint_ty     : LLVMType<iAny>;
+def llvm_i1_ty         : LLVMType<i1>;
+def llvm_i8_ty         : LLVMType<i8>;
+def llvm_i16_ty        : LLVMType<i16>;
+def llvm_i32_ty        : LLVMType<i32>;
+def llvm_i64_ty        : LLVMType<i64>;
+def llvm_float_ty      : LLVMType<f32>;
+def llvm_double_ty     : LLVMType<f64>;
 def llvm_ptr_ty        : LLVMPointerType<llvm_i8_ty>;             // i8*
 def llvm_ptrptr_ty     : LLVMPointerType<llvm_ptr_ty>;            // i8**
-def llvm_empty_ty      : LLVMEmptyStructType;                     // { }
+def llvm_empty_ty      : LLVMType<OtherVT>;                       // { }
 def llvm_descriptor_ty : LLVMPointerType<llvm_empty_ty>;          // { }*
 
-def llvm_v16i8_ty      : LLVMVectorType<v16i8,16, llvm_i8_ty>;    // 16 x i8
-def llvm_v8i16_ty      : LLVMVectorType<v8i16, 8, llvm_i16_ty>;   //  8 x i16
-def llvm_v2i64_ty      : LLVMVectorType<v2i64, 2, llvm_i64_ty>;   //  2 x i64
-def llvm_v2i32_ty      : LLVMVectorType<v2i32, 2, llvm_i32_ty>;   //  2 x i32
-def llvm_v1i64_ty      : LLVMVectorType<v1i64, 1, llvm_i64_ty>;   //  1 x i64
-def llvm_v4i32_ty      : LLVMVectorType<v4i32, 4, llvm_i32_ty>;   //  4 x i32
-def llvm_v4f32_ty      : LLVMVectorType<v4f32, 4, llvm_float_ty>; //  4 x float
-def llvm_v2f64_ty      : LLVMVectorType<v2f64, 2, llvm_double_ty>;//  2 x double
+def llvm_v16i8_ty      : LLVMType<v16i8>;    // 16 x i8
+def llvm_v8i16_ty      : LLVMType<v8i16>;    //  8 x i16
+def llvm_v2i64_ty      : LLVMType<v2i64>;    //  2 x i64
+def llvm_v2i32_ty      : LLVMType<v2i32>;    //  2 x i32
+def llvm_v1i64_ty      : LLVMType<v1i64>;    //  1 x i64
+def llvm_v4i32_ty      : LLVMType<v4i32>;    //  4 x i32
+def llvm_v4f32_ty      : LLVMType<v4f32>;    //  4 x float
+def llvm_v2f64_ty      : LLVMType<v2f64>;    //  2 x double
 
 // MMX Vector Types
-def llvm_v8i8_ty       : LLVMVectorType<v8i8,  8, llvm_i8_ty>;    //  8 x i8
-def llvm_v4i16_ty      : LLVMVectorType<v4i16, 4, llvm_i16_ty>;   //  4 x i16
+def llvm_v8i8_ty       : LLVMType<v8i8>;     //  8 x i8
+def llvm_v4i16_ty      : LLVMType<v4i16>;    //  4 x i16
 
-def llvm_vararg_ty     : LLVMType<isVoid, "...">; // vararg
+def llvm_vararg_ty     : LLVMType<isVoid>;   // this means vararg here
 
 //===----------------------------------------------------------------------===//
 // Intrinsic Definitions.
@@ -185,10 +174,10 @@ let Properties = [IntrWriteArgMem] in {
 }
 
 let Properties = [IntrNoMem] in {
-  def int_sqrt_f32 : Intrinsic<[llvm_float_ty , llvm_float_ty]>;
+  def int_sqrt_f32 : Intrinsic<[llvm_float_ty, llvm_float_ty]>;
   def int_sqrt_f64 : Intrinsic<[llvm_double_ty, llvm_double_ty]>;
 
-  def int_powi_f32 : Intrinsic<[llvm_float_ty , llvm_float_ty, llvm_i32_ty]>;
+  def int_powi_f32 : Intrinsic<[llvm_float_ty, llvm_float_ty, llvm_i32_ty]>;
   def int_powi_f64 : Intrinsic<[llvm_double_ty, llvm_double_ty, llvm_i32_ty]>;
 }
 
@@ -203,14 +192,14 @@ def int_siglongjmp : Intrinsic<[llvm_void_ty, llvm_ptr_ty, llvm_i32_ty]>;
 
 // None of these intrinsics accesses memory at all.
 let Properties = [IntrNoMem] in {
-  def int_bswap: Intrinsic<[llvm_int_ty, llvm_int_ty]>;
-  def int_ctpop: Intrinsic<[llvm_i32_ty, llvm_int_ty]>;
-  def int_ctlz : Intrinsic<[llvm_i32_ty, llvm_int_ty]>;
-  def int_cttz : Intrinsic<[llvm_i32_ty, llvm_int_ty]>;
+  def int_bswap: Intrinsic<[llvm_anyint_ty, LLVMMatchType<0>]>;
+  def int_ctpop: Intrinsic<[llvm_anyint_ty, LLVMMatchType<0>]>;
+  def int_ctlz : Intrinsic<[llvm_anyint_ty, LLVMMatchType<0>]>;
+  def int_cttz : Intrinsic<[llvm_anyint_ty, LLVMMatchType<0>]>;
   def int_part_select : 
-     Intrinsic<[llvm_int_ty, llvm_int_ty, llvm_i32_ty, llvm_i32_ty]>;
+     Intrinsic<[llvm_anyint_ty, LLVMMatchType<0>, llvm_i32_ty, llvm_i32_ty]>;
   def int_part_set :
-     Intrinsic<[llvm_int_ty, llvm_int_ty, llvm_int_ty, llvm_i32_ty, 
+     Intrinsic<[llvm_anyint_ty, LLVMMatchType<0>, llvm_anyint_ty, llvm_i32_ty, 
                 llvm_i32_ty]>;
 }
 
diff --git a/lib/Analysis/ConstantFolding.cpp b/lib/Analysis/ConstantFolding.cpp
index e85d150204..dedeb4edf3 100644
--- a/lib/Analysis/ConstantFolding.cpp
+++ b/lib/Analysis/ConstantFolding.cpp
@@ -448,13 +448,13 @@ llvm::ConstantFoldCall(Function *F, Constant** Operands, unsigned NumOperands) {
         return ConstantInt::get(Op->getValue().byteSwap());
       } else if (Name.size() > 11 && !memcmp(&Name[0],"llvm.ctpop",10)) {
         uint64_t ctpop = Op->getValue().countPopulation();
-        return ConstantInt::get(Type::Int32Ty, ctpop);
+        return ConstantInt::get(Ty, ctpop);
       } else if (Name.size() > 10 && !memcmp(&Name[0], "llvm.cttz", 9)) {
         uint64_t cttz = Op->getValue().countTrailingZeros();
-        return ConstantInt::get(Type::Int32Ty, cttz);
+        return ConstantInt::get(Ty, cttz);
       } else if (Name.size() > 10 && !memcmp(&Name[0], "llvm.ctlz", 9)) {
         uint64_t ctlz = Op->getValue().countLeadingZeros();
-        return ConstantInt::get(Type::Int32Ty, ctlz);
+        return ConstantInt::get(Ty, ctlz);
       }
     }
   } else if (NumOperands == 2) {
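The constant-folding change above makes the folded count take the operand's own type instead of a fixed `i32`. A rough standalone model of that convention, assuming plain unsigned C++ integers stand in for LLVM integer types, might look like the following. The template names are illustrative only and do not correspond to LLVM functions.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Illustrative models only: each function returns its count in the same
// type T as its operand, mirroring the new intrinsic signatures.
template <typename T> T ctpop(T V) {
  static_assert(std::is_unsigned<T>::value, "model uses unsigned integers");
  T Count = 0;
  for (; V != 0; V = static_cast<T>(V >> 1))
    Count = static_cast<T>(Count + (V & 1));  // add the lowest bit
  return Count;
}

template <typename T> T ctlz(T V) {
  static_assert(std::is_unsigned<T>::value, "model uses unsigned integers");
  const int Bits = std::numeric_limits<T>::digits;
  T Count = 0;
  // Walk down from the most significant bit until a set bit is found.
  // For V == 0 this model returns the bit width.
  for (int I = Bits - 1; I >= 0 && ((V >> I) & 1) == 0; --I)
    ++Count;
  return Count;
}

template <typename T> T cttz(T V) {
  static_assert(std::is_unsigned<T>::value, "model uses unsigned integers");
  const int Bits = std::numeric_limits<T>::digits;
  T Count = 0;
  // Walk up from the least significant bit until a set bit is found.
  for (int I = 0; I < Bits && ((V >> I) & 1) == 0; ++I)
    ++Count;
  return Count;
}
```

The values agree with the LangRef examples quoted in this diff: `ctlz` of the 32-bit value 2 is 30, and `cttz` of 2 is 1.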
diff --git a/lib/AsmParser/llvmAsmParser.y b/lib/AsmParser/llvmAsmParser.y
index f93fe06d7e..9d7b063d0c 100644
--- a/lib/AsmParser/llvmAsmParser.y
+++ b/lib/AsmParser/llvmAsmParser.y
@@ -18,6 +18,7 @@
 #include "llvm/Instructions.h"
 #include "llvm/Module.h"
 #include "llvm/ValueSymbolTable.h"
+#include "llvm/AutoUpgrade.h"
 #include "llvm/Support/GetElementPtrTypeIterator.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/ADT/SmallVector.h"
@@ -131,6 +132,11 @@ static struct PerModuleInfo {
       return;
     }
 
+    // Look for intrinsic functions and CallInst that need to be upgraded
+    for (Module::iterator FI = CurrentModule->begin(),
+         FE = CurrentModule->end(); FI != FE; )
+      UpgradeCallsToIntrinsic(FI++); // post-increment: the old function may be erased
+
     Values.clear();         // Clear out function local definitions
     Types.clear();
     CurrentModule = 0;
diff --git a/lib/Bitcode/Reader/BitcodeReader.cpp b/lib/Bitcode/Reader/BitcodeReader.cpp
index 9c1f49e865..07a4279e13 100644
--- a/lib/Bitcode/Reader/BitcodeReader.cpp
+++ b/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -19,6 +19,7 @@
 #include "llvm/Instructions.h"
 #include "llvm/Module.h"
 #include "llvm/ParameterAttributes.h"
+#include "llvm/AutoUpgrade.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/MemoryBuffer.h"
@@ -857,6 +858,13 @@ bool BitcodeReader::ParseModule(const std::string &ModuleID) {
       if (!FunctionsWithBodies.empty())
         return Error("Too few function bodies found");
 
+      // Look for intrinsic functions which need to be upgraded at some point
+      for (Module::iterator FI = TheModule->begin(), FE = TheModule->end();
+           FI != FE; ++FI) {
+        if (Function* NewFn = UpgradeIntrinsicFunction(FI))
+          UpgradedIntrinsics.push_back(std::make_pair(FI, NewFn));
+      }
+
       // Force deallocation of memory for these vectors to favor the client that
       // want lazy deserialization.
       std::vector<std::pair<GlobalVariable*, unsigned> >().swap(GlobalInits);
@@ -1588,6 +1596,18 @@ bool BitcodeReader::materializeFunction(Function *F, std::string *ErrInfo) {
     if (ErrInfo) *ErrInfo = ErrorString;
     return true;
   }
+
+  // Upgrade any old intrinsic calls in the function.
+  for (UpgradedIntrinsicMap::iterator I = UpgradedIntrinsics.begin(),
+       E = UpgradedIntrinsics.end(); I != E; ++I) {
+    if (I->first != I->second) {
+      for (Value::use_iterator UI = I->first->use_begin(),
+           UE = I->first->use_end(); UI != UE; ) {
+        if (CallInst* CI = dyn_cast<CallInst>(*UI++))
+          UpgradeIntrinsicCall(CI, I->second);
+      }
+    }
+  }
   
   return false;
 }
@@ -1614,6 +1634,25 @@ Module *BitcodeReader::materializeModule(std::string *ErrInfo) {
         materializeFunction(F, ErrInfo))
       return 0;
   }
+
+  // Upgrade any intrinsic calls that slipped through (should not happen!) and 
+  // delete the old functions to clean up. We can't do this unless the entire 
+  // module is materialized because there could always be another function body 
+  // with calls to the old function.
+  for (std::vector<std::pair<Function*, Function*> >::iterator I =
+       UpgradedIntrinsics.begin(), E = UpgradedIntrinsics.end(); I != E; ++I) {
+    if (I->first != I->second) {
+      for (Value::use_iterator UI = I->first->use_begin(),
+           UE = I->first->use_end(); UI != UE; ) {
+        if (CallInst* CI = dyn_cast<CallInst>(*UI++))
+          UpgradeIntrinsicCall(CI, I->second);
+      }
+      ValueList.replaceUsesOfWith(I->first, I->second);
+      I->first->eraseFromParent();
+    }
+  }
+  std::vector<std::pair<Function*, Function*> >().swap(UpgradedIntrinsics);
+  
   return TheModule;
 }
 
diff --git a/lib/Bitcode/Reader/BitcodeReader.h b/lib/Bitcode/Reader/BitcodeReader.h
index 2f61b06c60..0655a1a91c 100644
--- a/lib/Bitcode/Reader/BitcodeReader.h
+++ b/lib/Bitcode/Reader/BitcodeReader.h
@@ -102,6 +102,11 @@ class BitcodeReader : public ModuleProvider {
   // When reading the module header, this list is populated with functions that
   // have bodies later in the file.
   std::vector<Function*> FunctionsWithBodies;
+
+  // When intrinsic functions are encountered which require upgrading they are 
+  // stored here with their replacement function.
+  typedef std::vector<std::pair<Function*, Function*> > UpgradedIntrinsicMap;
+  UpgradedIntrinsicMap UpgradedIntrinsics;
   
   // After the module header has been read, the FunctionsWithBodies list is 
   // reversed.  This keeps track of whether we've done this yet.
diff --git a/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp b/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
index d1f7669024..afb681f9bd 100644
--- a/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
+++ b/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
@@ -2814,10 +2814,6 @@ SelectionDAGLowering::visitIntrinsicCall(CallInst &I, unsigned Intrinsic) {
     SDOperand Arg = getValue(I.getOperand(1));
     MVT::ValueType Ty = Arg.getValueType();
     SDOperand result = DAG.getNode(ISD::CTTZ, Ty, Arg);
-    if (Ty < MVT::i32)
-      result = DAG.getNode(ISD::ZERO_EXTEND, MVT::i32, result);
-    else if (Ty > MVT::i32)
-      result = DAG.getNode(ISD::TRUNCATE, MVT::i32, result);
     setValue(&I, result);
     return 0;
   }
@@ -2825,10 +2821,6 @@ SelectionDAGLowering::visitIntrinsicCall(CallInst &I, unsigned Intrinsic) {
     SDOperand Arg = getValue(I.getOperand(1));
     MVT::ValueType Ty = Arg.getValueType();
     SDOperand result = DAG.getNode(ISD::CTLZ, Ty, Arg);
-    if (Ty < MVT::i32)
-      result = DAG.getNode(ISD::ZERO_EXTEND, MVT::i32, result);
-    else if (Ty > MVT::i32)
-      result = DAG.getNode(ISD::TRUNCATE, MVT::i32, result);
     setValue(&I, result);
     return 0;
   }
@@ -2836,10 +2828,6 @@ SelectionDAGLowering::visitIntrinsicCall(CallInst &I, unsigned Intrinsic) {
     SDOperand Arg = getValue(I.getOperand(1));
     MVT::ValueType Ty = Arg.getValueType();
     SDOperand result = DAG.getNode(ISD::CTPOP, Ty, Arg);
-    if (Ty < MVT::i32)
-      result = DAG.getNode(ISD::ZERO_EXTEND, MVT::i32, result);
-    else if (Ty > MVT::i32)
-      result = DAG.getNode(ISD::TRUNCATE, MVT::i32, result);
     setValue(&I, result);
     return 0;
   }
diff --git a/lib/Target/X86/X86TargetAsmInfo.cpp b/lib/Target/X86/X86TargetAsmInfo.cpp
index 79df32b565..bf206b4806 100644
--- a/lib/Target/X86/X86TargetAsmInfo.cpp
+++ b/lib/Target/X86/X86TargetAsmInfo.cpp
@@ -219,9 +219,9 @@ bool X86TargetAsmInfo::LowerToBSwap(CallInst *CI) const {
     return false;
   
   // Okay, we can do this xform, do so now.
-  const Type *Tys[] = { Ty, Ty };
+  const Type *Tys[] = { Ty };
   Module *M = CI->getParent()->getParent()->getParent();
-  Constant *Int = Intrinsic::getDeclaration(M, Intrinsic::bswap, Tys, 2);
+  Constant *Int = Intrinsic::getDeclaration(M, Intrinsic::bswap, Tys, 1);
   
   Value *Op = CI->getOperand(1);
   Op = new CallInst(Int, Op, CI->getName(), CI);
diff --git a/lib/Transforms/Scalar/InstructionCombining.cpp b/lib/Transforms/Scalar/InstructionCombining.cpp
index e843074920..4330b16746 100644
--- a/lib/Transforms/Scalar/InstructionCombining.cpp
+++ b/lib/Transforms/Scalar/InstructionCombining.cpp
@@ -3717,9 +3717,9 @@ Instruction *InstCombiner::MatchBSwap(BinaryOperator &I) {
   for (unsigned i = 1, e = ByteValues.size(); i != e; ++i)
     if (ByteValues[i] != V)
       return 0;
-  const Type *Tys[] = { ITy, ITy };
+  const Type *Tys[] = { ITy };
   Module *M = I.getParent()->getParent()->getParent();
-  Function *F = Intrinsic::getDeclaration(M, Intrinsic::bswap, Tys, 2);
+  Function *F = Intrinsic::getDeclaration(M, Intrinsic::bswap, Tys, 1);
   return new CallInst(F, V);
 }
 
diff --git a/lib/VMCore/AutoUpgrade.cpp b/lib/VMCore/AutoUpgrade.cpp
new file mode 100644
index 0000000000..b56fe70235
--- /dev/null
+++ b/lib/VMCore/AutoUpgrade.cpp
@@ -0,0 +1,197 @@
+//===-- AutoUpgrade.cpp - Implement auto-upgrade helper functions ---------===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file was developed by Chandler Carruth and is distributed under the 
+// University of Illinois Open Source License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the auto-upgrade helper functions.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/AutoUpgrade.h"
+#include "llvm/Function.h"
+#include "llvm/Module.h"
+#include "llvm/Instructions.h"
+#include "llvm/ParameterAttributes.h"
+#include "llvm/Intrinsics.h"
+using namespace llvm;
+
+
+Function* llvm::UpgradeIntrinsicFunction(Function *F) {
+  assert(F && "Illegal to upgrade a non-existent Function.");
+
+  // Get the Function's name.
+  const std::string& Name = F->getName();
+
+  // Convenience
+  const FunctionType *FTy = F->getFunctionType();
+
+  // Quickly eliminate it, if it's not a candidate.
+  if (Name.length() <= 8 || Name[0] != 'l' || Name[1] != 'l' || 
+      Name[2] != 'v' || Name[3] != 'm' || Name[4] != '.')
+    return 0;
+
+  Module *M = F->getParent();
+  switch (Name[5]) {
+  default: break;
+  case 'b':
+    //  This upgrades the name of the llvm.bswap intrinsic function to only use 
+    //  a single type name for overloading. We only care about the old format
+    //  'llvm.bswap.i*.i*', so check for 'bswap.' and then for there being 
+    //  a '.' after 'bswap.'
+    if (Name.compare(5,6,"bswap.",6) == 0) {
+      std::string::size_type delim = Name.find('.',11);
+      
+      if (delim != std::string::npos) {
+        //  Construct the new name as 'llvm.bswap' + '.i*'
+        F->setName(Name.substr(0,10)+Name.substr(delim));
+        return F;
+      }
+    }
+    break;
+
+  case 'c':
+    //  We only want to fix the 'llvm.ct*' intrinsics which do not have the 
+    //  correct return type, so we check for the name, and then check if the 
+    //  return type does not match the parameter type.
+    if ( (Name.compare(5,5,"ctpop",5) == 0 ||
+          Name.compare(5,4,"ctlz",4) == 0 ||
+          Name.compare(5,4,"cttz",4) == 0) &&
+        FTy->getReturnType() != FTy->getParamType(0)) {
+      //  We first need to change the name of the old (bad) intrinsic, because 
+      //  its type is incorrect, but we cannot overload that name. We 
+      //  arbitrarily unique it here allowing us to construct a correctly named 
+      //  and typed function below.
+      F->setName("");
+
+      //  Now construct the new intrinsic with the correct name and type. We 
+      //  leave the old function around in order to query its type, whatever it 
+      //  may be, and correctly convert up to the new type.
+      return cast<Function>(M->getOrInsertFunction(Name, 
+                                                   FTy->getParamType(0),
+                                                   FTy->getParamType(0),
+                                                   (Type *)0));
+    }
+    break;
+
+  case 'p':
+    //  This upgrades the llvm.part.select overloaded intrinsic names to only