author    Sean Callanan <scallanan@apple.com>  2009-12-18 00:01:26 +0000
committer Sean Callanan <scallanan@apple.com>  2009-12-18 00:01:26 +0000
commit    108934c65d4cba18f08ed4fab0cae506c20fd212 (patch)
tree      693abb3580c9939943d0ee2b300d8f8242f1c931
parent    a6923131032be5e47b5e1155e69b23aa4c5e65ac (diff)
Instruction fixes, added instructions, and AsmString changes in the
X86 instruction tables. Also (while I was at it) cleaned up the X86 tables,
removing tabs and 80-column violations. This patch was reviewed by Chris
Lattner, but please let me know if there are any problems.

* X86*.td
  Removed tabs and fixed 80-column violations

* X86Instr64bit.td
  (IRET, POPCNT, BT_, LSL, SWPGS, PUSH_S, POP_S, L_S, SMSW) Added
  (CALL, CMOV) Added qualifiers
  (JMP) Added PC-relative jump instruction
  (POPFQ/PUSHFQ) Added qualifiers; renamed PUSHFQ to indicate that it is
    64-bit only (ambiguous since it has no REX prefix)
  (MOV) Added rr form going the other way, which is encoded differently
  (MOV) Changed immediates to offsets, which is more correct; also fixed
    MOV64o64a to have a 64-bit offset
  (MOV) Fixed qualifiers
  (MOV) Added debug-register and condition-register moves
  (MOVZX) Added more forms
  (ADC, SUB, SBB, AND, OR, XOR) Added reverse forms, which (as with MOV) are
    encoded differently
  (ROL) Made REX.W required
  (BT) Uncommented mr form for disassembly only
  (CVT__2__) Added several missing non-intrinsic forms
  (LXADD, XCHG) Reordered operands to make more sense for MRMSrcMem
  (XCHG) Added register-to-register forms
  (XADD, CMPXCHG, XCHG) Added non-locked forms

* X86InstrSSE.td
  (CVTSS2SI, COMISS, CVTTPS2DQ, CVTPS2PD, CVTPD2PS, MOVQ) Added

* X86InstrFPStack.td
  (COM_FST0, COMP_FST0, COM_FI, COM_FIP, FFREE, FNCLEX, FNOP, FXAM, FLDL2T,
   FLDL2E, FLDPI, FLDLG2, FLDLN2, F2XM1, FYL2X, FPTAN, FPATAN, FXTRACT,
   FPREM1, FDECSTP, FINCSTP, FPREM, FYL2XP1, FSINCOS, FRNDINT, FSCALE,
   FCOMPP, FXSAVE, FXRSTOR) Added
  (FCOM, FCOMP) Added qualifiers
  (FSTENV, FSAVE, FSTSW) Fixed opcode names
  (FNSTSW) Added implicit register operand

* X86InstrInfo.td
  (opaque512mem) Added for FXSAVE/FXRSTOR
  (offset8, offset16, offset32, offset64) Added for MOV
  (NOOPW, IRET, POPCNT, IN, BTC, BTR, BTS, LSL, INVLPG, STR, LTR, PUSHFS,
   PUSHGS, POPFS, POPGS, LDS, LSS, LES, LFS, LGS, VERR, VERW, SGDT, SIDT,
   SLDT, LGDT, LIDT, LLDT, LODSD, OUTSB, OUTSW, OUTSD, HLT, RSM, FNINIT,
   CLC, STC, CLI, STI, CLD, STD, CMC, CLTS, XLAT, WRMSR, RDMSR, RDPMC,
   SMSW, LMSW, CPUID, INVD, WBINVD, INVEPT, INVVPID, VMCALL, VMCLEAR,
   VMLAUNCH, VMRESUME, VMPTRLD, VMPTRST, VMREAD, VMWRITE, VMXOFF, VMXON)
    Added
  (NOOPL, POPF, POPFD, PUSHF, PUSHFD) Added qualifier
  (JO, JNO, JB, JAE, JE, JNE, JBE, JA, JS, JNS, JP, JNP, JL, JGE, JLE, JG,
   JCXZ) Added 32-bit forms
  (MOV) Changed some immediate forms to offset forms
  (MOV) Added reversed reg-reg forms, which are encoded differently
  (MOV) Added debug-register and condition-register moves
  (CMOV) Added qualifiers
  (AND, OR, XOR, ADC, SUB, SBB) Added reverse forms, like MOV
  (BT) Uncommented memory-register forms for the disassembler
  (MOVSX, MOVZX) Added forms
  (XCHG, LXADD) Made operand order make sense for MRMSrcMem
  (XCHG) Added register-register forms
  (XADD, CMPXCHG) Added unlocked forms

* X86InstrMMX.td
  (MMX_MOVD, MMX_MOVQ) Added forms

* X86InstrInfo.cpp
  Changed PUSHFQ to PUSHFQ64 to reflect the table change

* X86RegisterInfo.td
  Added debug and condition register sets

* x86-64-pic-3.ll: Fixed testcase to reflect call qualifier
* peep-test-3.ll: Fixed testcase to reflect test qualifier
* cmov.ll: Fixed testcase to reflect cmov qualifier
* loop-blocks.ll: Fixed testcase to reflect call qualifier
* x86-64-pic-11.ll: Fixed testcase to reflect call qualifier
* 2009-11-04-SubregCoalescingBug.ll: Fixed testcase to reflect call qualifier
* x86-64-pic-2.ll: Fixed testcase to reflect call qualifier
* live-out-reg-info.ll: Fixed testcase to reflect test qualifier
* tail-opts.ll: Fixed testcase to reflect call qualifiers
* x86-64-pic-10.ll: Fixed testcase to reflect call qualifier
* bss_pagealigned.ll: Fixed testcase to reflect call qualifier
* x86-64-pic-1.ll: Fixed testcase to reflect call qualifier
* widen_load-1.ll: Fixed testcase to reflect call qualifier

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91638 91177308-0d34-0410-b5e6-96231b3b80d8
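A note on the AsmString "qualifier" changes listed above: in these tables,
braces in an AsmString select between the printer's two syntax variants.
"{q}" emits the AT&T size suffix (and nothing in Intel syntax), while
"{$src, $dst|$dst, $src}" prints the AT&T operand order before the '|' and
the Intel order after it. A minimal standalone sketch of the convention
(the SketchInst class and SK_* names are hypothetical, not the real X86
instruction hierarchy):

// Hypothetical stand-in for the X86 instruction base classes.
class SketchInst<string asm> {
  string AsmString = asm;
}

// "{q}" is the qualifier: AT&T prints "callq *%rax", Intel prints "call rax".
def SK_CALL64r : SketchInst<"call{q}\t{*}$dst">;

// "{a|b}" picks per-variant text: AT&T "movq %rcx, %rax", Intel "mov rax, rcx".
def SK_MOV64rr : SketchInst<"mov{q}\t{$src, $dst|$dst, $src}">;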
-rw-r--r--  lib/Target/X86/X86.td                               |    2
-rw-r--r--  lib/Target/X86/X86Instr64bit.td                     |  442
-rw-r--r--  lib/Target/X86/X86InstrFPStack.td                   |  126
-rw-r--r--  lib/Target/X86/X86InstrFormats.td                   |   41
-rw-r--r--  lib/Target/X86/X86InstrInfo.cpp                     |    2
-rw-r--r--  lib/Target/X86/X86InstrInfo.td                      |  730
-rw-r--r--  lib/Target/X86/X86InstrMMX.td                       |   43
-rw-r--r--  lib/Target/X86/X86InstrSSE.td                       |  224
-rw-r--r--  lib/Target/X86/X86RegisterInfo.td                   |   49
-rw-r--r--  test/CodeGen/X86/2009-11-04-SubregCoalescingBug.ll  |    2
-rw-r--r--  test/CodeGen/X86/abi-isel.ll                        |  330
-rw-r--r--  test/CodeGen/X86/bss_pagealigned.ll                 |    2
-rw-r--r--  test/CodeGen/X86/cmov.ll                            |    6
-rw-r--r--  test/CodeGen/X86/live-out-reg-info.ll               |    2
-rw-r--r--  test/CodeGen/X86/loop-blocks.ll                     |   30
-rw-r--r--  test/CodeGen/X86/peep-test-3.ll                     |    2
-rw-r--r--  test/CodeGen/X86/select-aggregate.ll                |    2
-rw-r--r--  test/CodeGen/X86/tail-opts.ll                       |    2
-rw-r--r--  test/CodeGen/X86/widen_load-1.ll                    |    2
-rw-r--r--  test/CodeGen/X86/x86-64-pic-1.ll                    |    2
-rw-r--r--  test/CodeGen/X86/x86-64-pic-10.ll                   |    2
-rw-r--r--  test/CodeGen/X86/x86-64-pic-11.ll                   |    2
-rw-r--r--  test/CodeGen/X86/x86-64-pic-2.ll                    |    4
-rw-r--r--  test/CodeGen/X86/x86-64-pic-3.ll                    |    4
24 files changed, 1433 insertions, 620 deletions
diff --git a/lib/Target/X86/X86.td b/lib/Target/X86/X86.td
index da467fe6aa..a6e1ca3128 100644
--- a/lib/Target/X86/X86.td
+++ b/lib/Target/X86/X86.td
@@ -63,7 +63,7 @@ def FeatureSSE4A : SubtargetFeature<"sse4a", "HasSSE4A", "true",
def FeatureAVX : SubtargetFeature<"avx", "HasAVX", "true",
"Enable AVX instructions">;
def FeatureFMA3 : SubtargetFeature<"fma3", "HasFMA3", "true",
- "Enable three-operand fused multiple-add">;
+ "Enable three-operand fused multiple-add">;
def FeatureFMA4 : SubtargetFeature<"fma4", "HasFMA4", "true",
"Enable four-operand fused multiple-add">;
diff --git a/lib/Target/X86/X86Instr64bit.td b/lib/Target/X86/X86Instr64bit.td
index 0751b9da8e..65fbbdae9a 100644
--- a/lib/Target/X86/X86Instr64bit.td
+++ b/lib/Target/X86/X86Instr64bit.td
@@ -111,6 +111,9 @@ def ADJCALLSTACKUP64 : I<0, Pseudo, (outs), (ins i32imm:$amt1, i32imm:$amt2),
Requires<[In64BitMode]>;
}
+// Interrupt Instructions
+def IRET64 : RI<0xcf, RawFrm, (outs), (ins), "iret{q}", []>;
+
//===----------------------------------------------------------------------===//
// Call Instructions...
//
@@ -131,20 +134,21 @@ let isCall = 1 in
// the 32-bit pcrel field that we have.
def CALL64pcrel32 : Ii32<0xE8, RawFrm,
(outs), (ins i64i32imm_pcrel:$dst, variable_ops),
- "call\t$dst", []>,
+ "call{q}\t$dst", []>,
Requires<[In64BitMode, NotWin64]>;
def CALL64r : I<0xFF, MRM2r, (outs), (ins GR64:$dst, variable_ops),
- "call\t{*}$dst", [(X86call GR64:$dst)]>,
+ "call{q}\t{*}$dst", [(X86call GR64:$dst)]>,
Requires<[NotWin64]>;
def CALL64m : I<0xFF, MRM2m, (outs), (ins i64mem:$dst, variable_ops),
- "call\t{*}$dst", [(X86call (loadi64 addr:$dst))]>,
+ "call{q}\t{*}$dst", [(X86call (loadi64 addr:$dst))]>,
Requires<[NotWin64]>;
def FARCALL64 : RI<0xFF, MRM3m, (outs), (ins opaque80mem:$dst),
"lcall{q}\t{*}$dst", []>;
}
- // FIXME: We need to teach codegen about single list of call-clobbered registers.
+ // FIXME: We need to teach codegen about single list of call-clobbered
+ // registers.
let isCall = 1 in
// All calls clobber the non-callee saved registers. RSP is marked as
// a use to prevent stack-pointer assignments that appear immediately
@@ -162,9 +166,10 @@ let isCall = 1 in
def WINCALL64r : I<0xFF, MRM2r, (outs), (ins GR64:$dst, variable_ops),
"call\t{*}$dst",
[(X86call GR64:$dst)]>, Requires<[IsWin64]>;
- def WINCALL64m : I<0xFF, MRM2m, (outs), (ins i64mem:$dst, variable_ops),
- "call\t{*}$dst",
- [(X86call (loadi64 addr:$dst))]>, Requires<[IsWin64]>;
+ def WINCALL64m : I<0xFF, MRM2m, (outs),
+ (ins i64mem:$dst, variable_ops), "call\t{*}$dst",
+ [(X86call (loadi64 addr:$dst))]>,
+ Requires<[IsWin64]>;
}
@@ -188,6 +193,8 @@ let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1 in
// Branches
let isBranch = 1, isTerminator = 1, isBarrier = 1, isIndirectBranch = 1 in {
+ def JMP64pcrel32 : I<0xE9, RawFrm, (outs), (ins brtarget:$dst),
+ "jmp{q}\t$dst", []>;
def JMP64r : I<0xFF, MRM4r, (outs), (ins GR64:$dst), "jmp{q}\t{*}$dst",
[(brind GR64:$dst)]>;
def JMP64m : I<0xFF, MRM4m, (outs), (ins i64mem:$dst), "jmp{q}\t{*}$dst",
@@ -210,6 +217,12 @@ def EH_RETURN64 : I<0xC3, RawFrm, (outs), (ins GR64:$addr),
//===----------------------------------------------------------------------===//
// Miscellaneous Instructions...
//
+
+def POPCNT64rr : RI<0xB8, MRMSrcReg, (outs GR64:$dst), (ins GR64:$src),
+ "popcnt{q}\t{$src, $dst|$dst, $src}", []>, XS;
+def POPCNT64rm : RI<0xB8, MRMSrcMem, (outs GR64:$dst), (ins i64mem:$src),
+ "popcnt{q}\t{$src, $dst|$dst, $src}", []>, XS;
+
let Defs = [RBP,RSP], Uses = [RBP,RSP], mayLoad = 1, neverHasSideEffects = 1 in
def LEAVE64 : I<0xC9, RawFrm,
(outs), (ins), "leave", []>;
@@ -238,9 +251,9 @@ def PUSH64i32 : Ii32<0x68, RawFrm, (outs), (ins i32imm:$imm),
}
let Defs = [RSP, EFLAGS], Uses = [RSP], mayLoad = 1 in
-def POPFQ : I<0x9D, RawFrm, (outs), (ins), "popf", []>, REX_W;
+def POPFQ : I<0x9D, RawFrm, (outs), (ins), "popf{q}", []>, REX_W;
let Defs = [RSP], Uses = [RSP, EFLAGS], mayStore = 1 in
-def PUSHFQ : I<0x9C, RawFrm, (outs), (ins), "pushf", []>;
+def PUSHFQ64 : I<0x9C, RawFrm, (outs), (ins), "pushf{q}", []>;
def LEA64_32r : I<0x8D, MRMSrcMem,
(outs GR32:$dst), (ins lea64_32mem:$src),
@@ -309,6 +322,9 @@ def MOV64ri32 : RIi32<0xC7, MRM0r, (outs GR64:$dst), (ins i64i32imm:$src),
[(set GR64:$dst, i64immSExt32:$src)]>;
}
+def MOV64rr_REV : RI<0x8B, MRMSrcReg, (outs GR64:$dst), (ins GR64:$src),
+ "mov{q}\t{$src, $dst|$dst, $src}", []>;
+
let canFoldAsLoad = 1, isReMaterializable = 1, mayHaveSideEffects = 1 in
def MOV64rm : RI<0x8B, MRMSrcMem, (outs GR64:$dst), (ins i64mem:$src),
"mov{q}\t{$src, $dst|$dst, $src}",
@@ -321,24 +337,36 @@ def MOV64mi32 : RIi32<0xC7, MRM0m, (outs), (ins i64mem:$dst, i64i32imm:$src),
"mov{q}\t{$src, $dst|$dst, $src}",
[(store i64immSExt32:$src, addr:$dst)]>;
-def MOV64o8a : RIi8<0xA0, RawFrm, (outs), (ins i8imm:$src),
+def MOV64o8a : RIi8<0xA0, RawFrm, (outs), (ins offset8:$src),
"mov{q}\t{$src, %rax|%rax, $src}", []>;
-def MOV64o32a : RIi32<0xA1, RawFrm, (outs), (ins i32imm:$src),
+def MOV64o64a : RIi32<0xA1, RawFrm, (outs), (ins offset64:$src),
"mov{q}\t{$src, %rax|%rax, $src}", []>;
-def MOV64ao8 : RIi8<0xA2, RawFrm, (outs i8imm:$dst), (ins),
+def MOV64ao8 : RIi8<0xA2, RawFrm, (outs offset8:$dst), (ins),
"mov{q}\t{%rax, $dst|$dst, %rax}", []>;
-def MOV64ao32 : RIi32<0xA3, RawFrm, (outs i32imm:$dst), (ins),
+def MOV64ao64 : RIi32<0xA3, RawFrm, (outs offset64:$dst), (ins),
"mov{q}\t{%rax, $dst|$dst, %rax}", []>;
// Moves to and from segment registers
def MOV64rs : RI<0x8C, MRMDestReg, (outs GR64:$dst), (ins SEGMENT_REG:$src),
- "mov{w}\t{$src, $dst|$dst, $src}", []>;
+ "mov{q}\t{$src, $dst|$dst, $src}", []>;
def MOV64ms : RI<0x8C, MRMDestMem, (outs i64mem:$dst), (ins SEGMENT_REG:$src),
- "mov{w}\t{$src, $dst|$dst, $src}", []>;
+ "mov{q}\t{$src, $dst|$dst, $src}", []>;
def MOV64sr : RI<0x8E, MRMSrcReg, (outs SEGMENT_REG:$dst), (ins GR64:$src),
- "mov{w}\t{$src, $dst|$dst, $src}", []>;
+ "mov{q}\t{$src, $dst|$dst, $src}", []>;
def MOV64sm : RI<0x8E, MRMSrcMem, (outs SEGMENT_REG:$dst), (ins i64mem:$src),
- "mov{w}\t{$src, $dst|$dst, $src}", []>;
+ "mov{q}\t{$src, $dst|$dst, $src}", []>;
+
+// Moves to and from debug registers
+def MOV64rd : I<0x21, MRMDestReg, (outs GR64:$dst), (ins DEBUG_REG:$src),
+ "mov{q}\t{$src, $dst|$dst, $src}", []>, TB;
+def MOV64dr : I<0x23, MRMSrcReg, (outs DEBUG_REG:$dst), (ins GR64:$src),
+ "mov{q}\t{$src, $dst|$dst, $src}", []>, TB;
+
+// Moves to and from control registers
+def MOV64rc : I<0x20, MRMDestReg, (outs GR64:$dst), (ins CONTROL_REG_64:$src),
+ "mov{q}\t{$src, $dst|$dst, $src}", []>, TB;
+def MOV64cr : I<0x22, MRMSrcReg, (outs CONTROL_REG_64:$dst), (ins GR64:$src),
+ "mov{q}\t{$src, $dst|$dst, $src}", []>, TB;
// Sign/Zero extenders
@@ -365,6 +393,16 @@ def MOVSX64rm32: RI<0x63, MRMSrcMem, (outs GR64:$dst), (ins i32mem:$src),
"movs{lq|xd}\t{$src, $dst|$dst, $src}",
[(set GR64:$dst, (sextloadi64i32 addr:$src))]>;
+// movzbq and movzwq encodings for the disassembler
+def MOVZX64rr8_Q : RI<0xB6, MRMSrcReg, (outs GR64:$dst), (ins GR8:$src),
+ "movz{bq|x}\t{$src, $dst|$dst, $src}", []>, TB;
+def MOVZX64rm8_Q : RI<0xB6, MRMSrcMem, (outs GR64:$dst), (ins i8mem:$src),
+ "movz{bq|x}\t{$src, $dst|$dst, $src}", []>, TB;
+def MOVZX64rr16_Q : RI<0xB7, MRMSrcReg, (outs GR64:$dst), (ins GR16:$src),
+ "movz{wq|x}\t{$src, $dst|$dst, $src}", []>, TB;
+def MOVZX64rm16_Q : RI<0xB7, MRMSrcMem, (outs GR64:$dst), (ins i16mem:$src),
+ "movz{wq|x}\t{$src, $dst|$dst, $src}", []>, TB;
+
// Use movzbl instead of movzbq when the destination is a register; it's
// equivalent due to implicit zero-extending, and it has a smaller encoding.
def MOVZX64rr8 : I<0xB6, MRMSrcReg, (outs GR64:$dst), (ins GR8 :$src),
@@ -430,31 +468,36 @@ let isTwoAddress = 1 in {
let isConvertibleToThreeAddress = 1 in {
let isCommutable = 1 in
// Register-Register Addition
-def ADD64rr : RI<0x01, MRMDestReg, (outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
+def ADD64rr : RI<0x01, MRMDestReg, (outs GR64:$dst),
+ (ins GR64:$src1, GR64:$src2),
"add{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (add GR64:$src1, GR64:$src2)),
(implicit EFLAGS)]>;
// Register-Integer Addition
-def ADD64ri8 : RIi8<0x83, MRM0r, (outs GR64:$dst), (ins GR64:$src1, i64i8imm:$src2),
+def ADD64ri8 : RIi8<0x83, MRM0r, (outs GR64:$dst),
+ (ins GR64:$src1, i64i8imm:$src2),
"add{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (add GR64:$src1, i64immSExt8:$src2)),
(implicit EFLAGS)]>;
-def ADD64ri32 : RIi32<0x81, MRM0r, (outs GR64:$dst), (ins GR64:$src1, i64i32imm:$src2),
+def ADD64ri32 : RIi32<0x81, MRM0r, (outs GR64:$dst),
+ (ins GR64:$src1, i64i32imm:$src2),
"add{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (add GR64:$src1, i64immSExt32:$src2)),
(implicit EFLAGS)]>;
} // isConvertibleToThreeAddress
// Register-Memory Addition
-def ADD64rm : RI<0x03, MRMSrcMem, (outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
+def ADD64rm : RI<0x03, MRMSrcMem, (outs GR64:$dst),
+ (ins GR64:$src1, i64mem:$src2),
"add{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (add GR64:$src1, (load addr:$src2))),
(implicit EFLAGS)]>;
// Register-Register Addition - Equivalent to the normal rr form (ADD64rr), but
// differently encoded.
-def ADD64mrmrr : RI<0x03, MRMSrcReg, (outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
+def ADD64mrmrr : RI<0x03, MRMSrcReg, (outs GR64:$dst),
+ (ins GR64:$src1, GR64:$src2),
"add{l}\t{$src2, $dst|$dst, $src2}", []>;
} // isTwoAddress
@@ -480,18 +523,26 @@ def ADC64i32 : RI<0x15, RawFrm, (outs), (ins i32imm:$src),
let isTwoAddress = 1 in {
let isCommutable = 1 in
-def ADC64rr : RI<0x11, MRMDestReg, (outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
+def ADC64rr : RI<0x11, MRMDestReg, (outs GR64:$dst),
+ (ins GR64:$src1, GR64:$src2),
"adc{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (adde GR64:$src1, GR64:$src2))]>;
-def ADC64rm : RI<0x13, MRMSrcMem , (outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
+def ADC64rr_REV : RI<0x13, MRMSrcReg , (outs GR32:$dst),
+ (ins GR64:$src1, GR64:$src2),
+ "adc{q}\t{$src2, $dst|$dst, $src2}", []>;
+
+def ADC64rm : RI<0x13, MRMSrcMem , (outs GR64:$dst),
+ (ins GR64:$src1, i64mem:$src2),
"adc{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (adde GR64:$src1, (load addr:$src2)))]>;
-def ADC64ri8 : RIi8<0x83, MRM2r, (outs GR64:$dst), (ins GR64:$src1, i64i8imm:$src2),
+def ADC64ri8 : RIi8<0x83, MRM2r, (outs GR64:$dst),
+ (ins GR64:$src1, i64i8imm:$src2),
"adc{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (adde GR64:$src1, i64immSExt8:$src2))]>;
-def ADC64ri32 : RIi32<0x81, MRM2r, (outs GR64:$dst), (ins GR64:$src1, i64i32imm:$src2),
+def ADC64ri32 : RIi32<0x81, MRM2r, (outs GR64:$dst),
+ (ins GR64:$src1, i64i32imm:$src2),
"adc{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (adde GR64:$src1, i64immSExt32:$src2))]>;
} // isTwoAddress
@@ -501,21 +552,29 @@ def ADC64mr : RI<0x11, MRMDestMem, (outs), (ins i64mem:$dst, GR64:$src2),
[(store (adde (load addr:$dst), GR64:$src2), addr:$dst)]>;
def ADC64mi8 : RIi8<0x83, MRM2m, (outs), (ins i64mem:$dst, i64i8imm :$src2),
"adc{q}\t{$src2, $dst|$dst, $src2}",
- [(store (adde (load addr:$dst), i64immSExt8:$src2), addr:$dst)]>;
+ [(store (adde (load addr:$dst), i64immSExt8:$src2),
+ addr:$dst)]>;
def ADC64mi32 : RIi32<0x81, MRM2m, (outs), (ins i64mem:$dst, i64i32imm:$src2),
"adc{q}\t{$src2, $dst|$dst, $src2}",
- [(store (adde (load addr:$dst), i64immSExt8:$src2), addr:$dst)]>;
+ [(store (adde (load addr:$dst), i64immSExt8:$src2),
+ addr:$dst)]>;
} // Uses = [EFLAGS]
let isTwoAddress = 1 in {
// Register-Register Subtraction
-def SUB64rr : RI<0x29, MRMDestReg, (outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
+def SUB64rr : RI<0x29, MRMDestReg, (outs GR64:$dst),
+ (ins GR64:$src1, GR64:$src2),
"sub{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (sub GR64:$src1, GR64:$src2)),
(implicit EFLAGS)]>;
+def SUB64rr_REV : RI<0x2B, MRMSrcReg, (outs GR64:$dst),
+ (ins GR64:$src1, GR64:$src2),
+ "sub{q}\t{$src2, $dst|$dst, $src2}", []>;
+
// Register-Memory Subtraction
-def SUB64rm : RI<0x2B, MRMSrcMem, (outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
+def SUB64rm : RI<0x2B, MRMSrcMem, (outs GR64:$dst),
+ (ins GR64:$src1, i64mem:$src2),
"sub{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (sub GR64:$src1, (load addr:$src2))),
(implicit EFLAGS)]>;
@@ -556,18 +615,26 @@ def SUB64mi32 : RIi32<0x81, MRM5m, (outs), (ins i64mem:$dst, i64i32imm:$src2),
let Uses = [EFLAGS] in {
let isTwoAddress = 1 in {
-def SBB64rr : RI<0x19, MRMDestReg, (outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
+def SBB64rr : RI<0x19, MRMDestReg, (outs GR64:$dst),
+ (ins GR64:$src1, GR64:$src2),
"sbb{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (sube GR64:$src1, GR64:$src2))]>;
-def SBB64rm : RI<0x1B, MRMSrcMem, (outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
+def SBB64rr_REV : RI<0x1B, MRMSrcReg, (outs GR64:$dst),
+ (ins GR64:$src1, GR64:$src2),
+ "sbb{q}\t{$src2, $dst|$dst, $src2}", []>;
+
+def SBB64rm : RI<0x1B, MRMSrcMem, (outs GR64:$dst),
+ (ins GR64:$src1, i64mem:$src2),
"sbb{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (sube GR64:$src1, (load addr:$src2)))]>;
-def SBB64ri8 : RIi8<0x83, MRM3r, (outs GR64:$dst), (ins GR64:$src1, i64i8imm:$src2),
+def SBB64ri8 : RIi8<0x83, MRM3r, (outs GR64:$dst),
+ (ins GR64:$src1, i64i8imm:$src2),
"sbb{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (sube GR64:$src1, i64immSExt8:$src2))]>;
-def SBB64ri32 : RIi32<0x81, MRM3r, (outs GR64:$dst), (ins GR64:$src1, i64i32imm:$src2),
+def SBB64ri32 : RIi32<0x81, MRM3r, (outs GR64:$dst),
+ (ins GR64:$src1, i64i32imm:$src2),
"sbb{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (sube GR64:$src1, i64immSExt32:$src2))]>;
} // isTwoAddress
@@ -652,15 +719,19 @@ def IMUL64rmi32 : RIi32<0x69, MRMSrcMem, // GR64 = [mem64]*I32
// Unsigned division / remainder
let Defs = [RAX,RDX,EFLAGS], Uses = [RAX,RDX] in {
-def DIV64r : RI<0xF7, MRM6r, (outs), (ins GR64:$src), // RDX:RAX/r64 = RAX,RDX
+// RDX:RAX/r64 = RAX,RDX
+def DIV64r : RI<0xF7, MRM6r, (outs), (ins GR64:$src),
"div{q}\t$src", []>;
// Signed division / remainder
-def IDIV64r: RI<0xF7, MRM7r, (outs), (ins GR64:$src), // RDX:RAX/r64 = RAX,RDX
+// RDX:RAX/r64 = RAX,RDX
+def IDIV64r: RI<0xF7, MRM7r, (outs), (ins GR64:$src),
"idiv{q}\t$src", []>;
let mayLoad = 1 in {
-def DIV64m : RI<0xF7, MRM6m, (outs), (ins i64mem:$src), // RDX:RAX/[mem64] = RAX,RDX
+// RDX:RAX/[mem64] = RAX,RDX
+def DIV64m : RI<0xF7, MRM6m, (outs), (ins i64mem:$src),
"div{q}\t$src", []>;
-def IDIV64m: RI<0xF7, MRM7m, (outs), (ins i64mem:$src), // RDX:RAX/[mem64] = RAX,RDX
+// RDX:RAX/[mem64] = RAX,RDX
+def IDIV64m: RI<0xF7, MRM7m, (outs), (ins i64mem:$src),
"idiv{q}\t$src", []>;
}
}
@@ -694,19 +765,23 @@ def DEC64m : RI<0xFF, MRM1m, (outs), (ins i64mem:$dst), "dec{q}\t$dst",
// In 64-bit mode, single byte INC and DEC cannot be encoded.
let isTwoAddress = 1, isConvertibleToThreeAddress = 1 in {
// Can transform into LEA.
-def INC64_16r : I<0xFF, MRM0r, (outs GR16:$dst), (ins GR16:$src), "inc{w}\t$dst",
+def INC64_16r : I<0xFF, MRM0r, (outs GR16:$dst), (ins GR16:$src),
+ "inc{w}\t$dst",
[(set GR16:$dst, (add GR16:$src, 1)),
(implicit EFLAGS)]>,
OpSize, Requires<[In64BitMode]>;
-def INC64_32r : I<0xFF, MRM0r, (outs GR32:$dst), (ins GR32:$src), "inc{l}\t$dst",
+def INC64_32r : I<0xFF, MRM0r, (outs GR32:$dst), (ins GR32:$src),
+ "inc{l}\t$dst",
[(set GR32:$dst, (add GR32:$src, 1)),
(implicit EFLAGS)]>,
Requires<[In64BitMode]>;
-def DEC64_16r : I<0xFF, MRM1r, (outs GR16:$dst), (ins GR16:$src), "dec{w}\t$dst",
+def DEC64_16r : I<0xFF, MRM1r, (outs GR16:$dst), (ins GR16:$src),
+ "dec{w}\t$dst",
[(set GR16:$dst, (add GR16:$src, -1)),
(implicit EFLAGS)]>,
OpSize, Requires<[In64BitMode]>;
-def DEC64_32r : I<0xFF, MRM1r, (outs GR32:$dst), (ins GR32:$src), "dec{l}\t$dst",
+def DEC64_32r : I<0xFF, MRM1r, (outs GR32:$dst), (ins GR32:$src),
+ "dec{l}\t$dst",
[(set GR32:$dst, (add GR32:$src, -1)),
(implicit EFLAGS)]>,
Requires<[In64BitMode]>;
@@ -743,13 +818,14 @@ def SHL64rCL : RI<0xD3, MRM4r, (outs GR64:$dst), (ins GR64:$src),
"shl{q}\t{%cl, $dst|$dst, %CL}",
[(set GR64:$dst, (shl GR64:$src, CL))]>;
let isConvertibleToThreeAddress = 1 in // Can transform into LEA.
-def SHL64ri : RIi8<0xC1, MRM4r, (outs GR64:$dst), (ins GR64:$src1, i8imm:$src2),
+def SHL64ri : RIi8<0xC1, MRM4r, (outs GR64:$dst),
+ (ins GR64:$src1, i8imm:$src2),
"shl{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (shl GR64:$src1, (i8 imm:$src2)))]>;
// NOTE: We don't include patterns for shifts of a register by one, because
// 'add reg,reg' is cheaper.
def SHL64r1 : RI<0xD1, MRM4r, (outs GR64:$dst), (ins GR64:$src1),
- "shr{q}\t$dst", []>;
+ "shl{q}\t$dst", []>;
} // isTwoAddress
let Uses = [CL] in
@@ -792,9 +868,10 @@ let Uses = [CL] in
def SAR64rCL : RI<0xD3, MRM7r, (outs GR64:$dst), (ins GR64:$src),
"sar{q}\t{%cl, $dst|$dst, %CL}",
[(set GR64:$dst, (sra GR64:$src, CL))]>;
-def SAR64ri : RIi8<0xC1, MRM7r, (outs GR64:$dst), (ins GR64:$src1, i8imm:$src2),
- "sar{q}\t{$src2, $dst|$dst, $src2}",
- [(set GR64:$dst, (sra GR64:$src1, (i8 imm:$src2)))]>;
+def SAR64ri : RIi8<0xC1, MRM7r, (outs GR64:$dst),
+ (ins GR64:$src1, i8imm:$src2),
+ "sar{q}\t{$src2, $dst|$dst, $src2}",
+ [(set GR64:$dst, (sra GR64:$src1, (i8 imm:$src2)))]>;
def SAR64r1 : RI<0xD1, MRM7r, (outs GR64:$dst), (ins GR64:$src1),
"sar{q}\t$dst",
[(set GR64:$dst, (sra GR64:$src1, (i8 1)))]>;
@@ -826,7 +903,8 @@ def RCL64mCL : RI<0xD3, MRM2m, (outs i64mem:$dst), (ins i64mem:$src),
}
def RCL64ri : RIi8<0xC1, MRM2r, (outs GR64:$dst), (ins GR64:$src, i8imm:$cnt),
"rcl{q}\t{$cnt, $dst|$dst, $cnt}", []>;
-def RCL64mi : RIi8<0xC1, MRM2m, (outs i64mem:$dst), (ins i64mem:$src, i8imm:$cnt),
+def RCL64mi : RIi8<0xC1, MRM2m, (outs i64mem:$dst),
+ (ins i64mem:$src, i8imm:$cnt),
"rcl{q}\t{$cnt, $dst|$dst, $cnt}", []>;
def RCR64r1 : RI<0xD1, MRM3r, (outs GR64:$dst), (ins GR64:$src),
@@ -841,7 +919,8 @@ def RCR64mCL : RI<0xD3, MRM3m, (outs i64mem:$dst), (ins i64mem:$src),
}
def RCR64ri : RIi8<0xC1, MRM3r, (outs GR64:$dst), (ins GR64:$src, i8imm:$cnt),
"rcr{q}\t{$cnt, $dst|$dst, $cnt}", []>;
-def RCR64mi : RIi8<0xC1, MRM3m, (outs i64mem:$dst), (ins i64mem:$src, i8imm:$cnt),
+def RCR64mi : RIi8<0xC1, MRM3m, (outs i64mem:$dst),
+ (ins i64mem:$src, i8imm:$cnt),
"rcr{q}\t{$cnt, $dst|$dst, $cnt}", []>;
}
@@ -850,7 +929,8 @@ let Uses = [CL] in
def ROL64rCL : RI<0xD3, MRM0r, (outs GR64:$dst), (ins GR64:$src),
"rol{q}\t{%cl, $dst|$dst, %CL}",
[(set GR64:$dst, (rotl GR64:$src, CL))]>;
-def ROL64ri : RIi8<0xC1, MRM0r, (outs GR64:$dst), (ins GR64:$src1, i8imm:$src2),
+def ROL64ri : RIi8<0xC1, MRM0r, (outs GR64:$dst),
+ (ins GR64:$src1, i8imm:$src2),
"rol{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (rotl GR64:$src1, (i8 imm:$src2)))]>;
def ROL64r1 : RI<0xD1, MRM0r, (outs GR64:$dst), (ins GR64:$src1),
@@ -859,9 +939,9 @@ def ROL64r1 : RI<0xD1, MRM0r, (outs GR64:$dst), (ins GR64:$src1),
} // isTwoAddress
let Uses = [CL] in
-def ROL64mCL : I<0xD3, MRM0m, (outs), (ins i64mem:$dst),
- "rol{q}\t{%cl, $dst|$dst, %CL}",
- [(store (rotl (loadi64 addr:$dst), CL), addr:$dst)]>;
+def ROL64mCL : RI<0xD3, MRM0m, (outs), (ins i64mem:$dst),
+ "rol{q}\t{%cl, $dst|$dst, %CL}",
+ [(store (rotl (loadi64 addr:$dst), CL), addr:$dst)]>;
def ROL64mi : RIi8<0xC1, MRM0m, (outs), (ins i64mem:$dst, i8imm:$src),
"rol{q}\t{$src, $dst|$dst, $src}",
[(store (rotl (loadi64 addr:$dst), (i8 imm:$src)), addr:$dst)]>;
@@ -874,7 +954,8 @@ let Uses = [CL] in
def ROR64rCL : RI<0xD3, MRM1r, (outs GR64:$dst), (ins GR64:$src),
"ror{q}\t{%cl, $dst|$dst, %CL}",
[(set GR64:$dst, (rotr GR64:$src, CL))]>;
-def ROR64ri : RIi8<0xC1, MRM1r, (outs GR64:$dst), (ins GR64:$src1, i8imm:$src2),
+def ROR64ri : RIi8<0xC1, MRM1r, (outs GR64:$dst),
+ (ins GR64:$src1, i8imm:$src2),
"ror{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (rotr GR64:$src1, (i8 imm:$src2)))]>;
def ROR64r1 : RI<0xD1, MRM1r, (outs GR64:$dst), (ins GR64:$src1),
@@ -896,23 +977,29 @@ def ROR64m1 : RI<0xD1, MRM1m, (outs), (ins i64mem:$dst),
// Double shift instructions (generalizations of rotate)
let isTwoAddress = 1 in {
let Uses = [CL] in {
-def SHLD64rrCL : RI<0xA5, MRMDestReg, (outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
+def SHLD64rrCL : RI<0xA5, MRMDestReg, (outs GR64:$dst),
+ (ins GR64:$src1, GR64:$src2),
"shld{q}\t{%cl, $src2, $dst|$dst, $src2, %CL}",
- [(set GR64:$dst, (X86shld GR64:$src1, GR64:$src2, CL))]>, TB;
-def SHRD64rrCL : RI<0xAD, MRMDestReg, (outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
+ [(set GR64:$dst, (X86shld GR64:$src1, GR64:$src2, CL))]>,
+ TB;
+def SHRD64rrCL : RI<0xAD, MRMDestReg, (outs GR64:$dst),
+ (ins GR64:$src1, GR64:$src2),
"shrd{q}\t{%cl, $src2, $dst|$dst, $src2, %CL}",
- [(set GR64:$dst, (X86shrd GR64:$src1, GR64:$src2, CL))]>, TB;
+ [(set GR64:$dst, (X86shrd GR64:$src1, GR64:$src2, CL))]>,
+ TB;
}
let isCommutable = 1 in { // FIXME: Update X86InstrInfo::commuteInstruction
def SHLD64rri8 : RIi8<0xA4, MRMDestReg,
- (outs GR64:$dst), (ins GR64:$src1, GR64:$src2, i8imm:$src3),
+ (outs GR64:$dst),
+ (ins GR64:$src1, GR64:$src2, i8imm:$src3),
"shld{q}\t{$src3, $src2, $dst|$dst, $src2, $src3}",
[(set GR64:$dst, (X86shld GR64:$src1, GR64:$src2,
(i8 imm:$src3)))]>,
TB;
def SHRD64rri8 : RIi8<0xAC, MRMDestReg,
- (outs GR64:$dst), (ins GR64:$src1, GR64:$src2, i8imm:$src3),
+ (outs GR64:$dst),
+ (ins GR64:$src1, GR64:$src2, i8imm:$src3),
"shrd{q}\t{$src3, $src2, $dst|$dst, $src2, $src3}",
[(set GR64:$dst, (X86shrd GR64:$src1, GR64:$src2,
(i8 imm:$src3)))]>,
@@ -965,6 +1052,9 @@ def AND64rr : RI<0x21, MRMDestReg,
"and{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (and GR64:$src1, GR64:$src2)),
(implicit EFLAGS)]>;
+def AND64rr_REV : RI<0x23, MRMSrcReg, (outs GR64:$dst),
+ (ins GR64:$src1, GR64:$src2),
+ "and{q}\t{$src2, $dst|$dst, $src2}", []>;
def AND64rm : RI<0x23, MRMSrcMem,
(outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
"and{q}\t{$src2, $dst|$dst, $src2}",
@@ -1000,19 +1090,26 @@ def AND64mi32 : RIi32<0x81, MRM4m,
let isTwoAddress = 1 in {
let isCommutable = 1 in
-def OR64rr : RI<0x09, MRMDestReg, (outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
+def OR64rr : RI<0x09, MRMDestReg, (outs GR64:$dst),
+ (ins GR64:$src1, GR64:$src2),
"or{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (or GR64:$src1, GR64:$src2)),
(implicit EFLAGS)]>;
-def OR64rm : RI<0x0B, MRMSrcMem , (outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
+def OR64rr_REV : RI<0x0B, MRMSrcReg, (outs GR64:$dst),
+ (ins GR64:$src1, GR64:$src2),
+ "or{q}\t{$src2, $dst|$dst, $src2}", []>;
+def OR64rm : RI<0x0B, MRMSrcMem , (outs GR64:$dst),
+ (ins GR64:$src1, i64mem:$src2),
"or{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (or GR64:$src1, (load addr:$src2))),
(implicit EFLAGS)]>;
-def OR64ri8 : RIi8<0x83, MRM1r, (outs GR64:$dst), (ins GR64:$src1, i64i8imm:$src2),
+def OR64ri8 : RIi8<0x83, MRM1r, (outs GR64:$dst),
+ (ins GR64:$src1, i64i8imm:$src2),
"or{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (or GR64:$src1, i64immSExt8:$src2)),
(implicit EFLAGS)]>;
-def OR64ri32 : RIi32<0x81, MRM1r, (outs GR64:$dst), (ins GR64:$src1, i64i32imm:$src2),
+def OR64ri32 : RIi32<0x81, MRM1r, (outs GR64:$dst),
+ (ins GR64:$src1, i64i32imm:$src2),
"or{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (or GR64:$src1, i64immSExt32:$src2)),
(implicit EFLAGS)]>;
@@ -1036,15 +1133,21 @@ def OR64i32 : RIi32<0x0D, RawFrm, (outs), (ins i32imm:$src),
let isTwoAddress = 1 in {
let isCommutable = 1 in
-def XOR64rr : RI<0x31, MRMDestReg, (outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
+def XOR64rr : RI<0x31, MRMDestReg, (outs GR64:$dst),
+ (ins GR64:$src1, GR64:$src2),
"xor{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (xor GR64:$src1, GR64:$src2)),
(implicit EFLAGS)]>;
-def XOR64rm : RI<0x33, MRMSrcMem, (outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
+def XOR64rr_REV : RI<0x33, MRMSrcReg, (outs GR64:$dst),
+ (ins GR64:$src1, GR64:$src2),
+ "xor{q}\t{$src2, $dst|$dst, $src2}", []>;
+def XOR64rm : RI<0x33, MRMSrcMem, (outs GR64:$dst),
+ (ins GR64:$src1, i64mem:$src2),
"xor{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (xor GR64:$src1, (load addr:$src2))),
(implicit EFLAGS)]>;
-def XOR64ri8 : RIi8<0x83, MRM6r, (outs GR64:$dst), (ins GR64:$src1, i64i8imm:$src2),
+def XOR64ri8 : RIi8<0x83, MRM6r, (outs GR64:$dst),
+ (ins GR64:$src1, i64i8imm:$src2),
"xor{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (xor GR64:$src1, i64immSExt8:$src2)),
(implicit EFLAGS)]>;
@@ -1148,10 +1251,12 @@ def BT64rr : RI<0xA3, MRMDestReg, (outs), (ins GR64:$src1, GR64:$src2),
// Unlike with the register+register form, the memory+register form of the
// bt instruction does not ignore the high bits of the index. From ISel's
// perspective, this is pretty bizarre. Disable these instructions for now.
-//def BT64mr : RI<0xA3, MRMDestMem, (outs), (ins i64mem:$src1, GR64:$src2),
-// "bt{q}\t{$src2, $src1|$src1, $src2}",
+def BT64mr : RI<0xA3, MRMDestMem, (outs), (ins i64mem:$src1, GR64:$src2),
+ "bt{q}\t{$src2, $src1|$src1, $src2}",
// [(X86bt (loadi64 addr:$src1), GR64:$src2),
-// (implicit EFLAGS)]>, TB;
+// (implicit EFLAGS)]
+ []
+ >, TB;
def BT64ri8 : Ii8<0xBA, MRM4r, (outs), (ins GR64:$src1, i64i8imm:$src2),
"bt{q}\t{$src2, $src1|$src1, $src2}",
@@ -1164,6 +1269,33 @@ def BT64mi8 : Ii8<0xBA, MRM4m, (outs), (ins i64mem:$src1, i64i8imm:$src2),
"bt{q}\t{$src2, $src1|$src1, $src2}",
[(X86bt (loadi64 addr:$src1), i64immSExt8:$src2),
(implicit EFLAGS)]>, TB;
+
+def BTC64rr : RI<0xBB, MRMDestReg, (outs), (ins GR64:$src1, GR64:$src2),
+ "btc{q}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTC64mr : RI<0xBB, MRMDestMem, (outs), (ins i64mem:$src1, GR64:$src2),
+ "btc{q}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTC64ri8 : RIi8<0xBA, MRM7r, (outs), (ins GR64:$src1, i64i8imm:$src2),
+ "btc{q}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTC64mi8 : RIi8<0xBA, MRM7m, (outs), (ins i64mem:$src1, i64i8imm:$src2),
+ "btc{q}\t{$src2, $src1|$src1, $src2}", []>, TB;
+
+def BTR64rr : RI<0xB3, MRMDestReg, (outs), (ins GR64:$src1, GR64:$src2),
+ "btr{q}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTR64mr : RI<0xB3, MRMDestMem, (outs), (ins i64mem:$src1, GR64:$src2),
+ "btr{q}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTR64ri8 : RIi8<0xBA, MRM6r, (outs), (ins GR64:$src1, i64i8imm:$src2),
+ "btr{q}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTR64mi8 : RIi8<0xBA, MRM6m, (outs), (ins i64mem:$src1, i64i8imm:$src2),
+ "btr{q}\t{$src2, $src1|$src1, $src2}", []>, TB;
+
+def BTS64rr : RI<0xAB, MRMDestReg, (outs), (ins GR64:$src1, GR64:$src2),
+ "bts{q}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTS64mr : RI<0xAB, MRMDestMem, (outs), (ins i64mem:$src1, GR64:$src2),
+ "bts{q}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTS64ri8 : RIi8<0xBA, MRM5r, (outs), (ins GR64:$src1, i64i8imm:$src2),
+ "bts{q}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTS64mi8 : RIi8<0xBA, MRM5m, (outs), (ins i64mem:$src1, i64i8imm:$src2),
+ "bts{q}\t{$src2, $src1|$src1, $src2}", []>, TB;
} // Defs = [EFLAGS]
// Conditional moves
@@ -1171,164 +1303,164 @@ let Uses = [EFLAGS], isTwoAddress = 1 in {
let isCommutable = 1 in {
def CMOVB64rr : RI<0x42, MRMSrcReg, // if <u, GR64 = GR64
(outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
- "cmovb\t{$src2, $dst|$dst, $src2}",
+ "cmovb{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, GR64:$src2,
X86_COND_B, EFLAGS))]>, TB;
def CMOVAE64rr: RI<0x43, MRMSrcReg, // if >=u, GR64 = GR64
(outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
- "cmovae\t{$src2, $dst|$dst, $src2}",
+ "cmovae{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, GR64:$src2,
X86_COND_AE, EFLAGS))]>, TB;
def CMOVE64rr : RI<0x44, MRMSrcReg, // if ==, GR64 = GR64
(outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
- "cmove\t{$src2, $dst|$dst, $src2}",
+ "cmove{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, GR64:$src2,
X86_COND_E, EFLAGS))]>, TB;
def CMOVNE64rr: RI<0x45, MRMSrcReg, // if !=, GR64 = GR64
(outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
- "cmovne\t{$src2, $dst|$dst, $src2}",
+ "cmovne{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, GR64:$src2,
X86_COND_NE, EFLAGS))]>, TB;
def CMOVBE64rr: RI<0x46, MRMSrcReg, // if <=u, GR64 = GR64
(outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
- "cmovbe\t{$src2, $dst|$dst, $src2}",
+ "cmovbe{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, GR64:$src2,
X86_COND_BE, EFLAGS))]>, TB;
def CMOVA64rr : RI<0x47, MRMSrcReg, // if >u, GR64 = GR64
(outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
- "cmova\t{$src2, $dst|$dst, $src2}",
+ "cmova{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, GR64:$src2,
X86_COND_A, EFLAGS))]>, TB;
def CMOVL64rr : RI<0x4C, MRMSrcReg, // if <s, GR64 = GR64
(outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
- "cmovl\t{$src2, $dst|$dst, $src2}",
+ "cmovl{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, GR64:$src2,
X86_COND_L, EFLAGS))]>, TB;
def CMOVGE64rr: RI<0x4D, MRMSrcReg, // if >=s, GR64 = GR64
(outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
- "cmovge\t{$src2, $dst|$dst, $src2}",
+ "cmovge{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, GR64:$src2,
X86_COND_GE, EFLAGS))]>, TB;
def CMOVLE64rr: RI<0x4E, MRMSrcReg, // if <=s, GR64 = GR64
(outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
- "cmovle\t{$src2, $dst|$dst, $src2}",
+ "cmovle{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, GR64:$src2,
X86_COND_LE, EFLAGS))]>, TB;
def CMOVG64rr : RI<0x4F, MRMSrcReg, // if >s, GR64 = GR64
(outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
- "cmovg\t{$src2, $dst|$dst, $src2}",
+ "cmovg{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, GR64:$src2,
X86_COND_G, EFLAGS))]>, TB;
def CMOVS64rr : RI<0x48, MRMSrcReg, // if signed, GR64 = GR64
(outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
- "cmovs\t{$src2, $dst|$dst, $src2}",
+ "cmovs{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, GR64:$src2,
X86_COND_S, EFLAGS))]>, TB;
def CMOVNS64rr: RI<0x49, MRMSrcReg, // if !signed, GR64 = GR64
(outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
- "cmovns\t{$src2, $dst|$dst, $src2}",
+ "cmovns{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, GR64:$src2,
X86_COND_NS, EFLAGS))]>, TB;
def CMOVP64rr : RI<0x4A, MRMSrcReg, // if parity, GR64 = GR64
(outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
- "cmovp\t{$src2, $dst|$dst, $src2}",
+ "cmovp{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, GR64:$src2,
X86_COND_P, EFLAGS))]>, TB;
def CMOVNP64rr : RI<0x4B, MRMSrcReg, // if !parity, GR64 = GR64
(outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
- "cmovnp\t{$src2, $dst|$dst, $src2}",
+ "cmovnp{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, GR64:$src2,
X86_COND_NP, EFLAGS))]>, TB;
def CMOVO64rr : RI<0x40, MRMSrcReg, // if overflow, GR64 = GR64
(outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
- "cmovo\t{$src2, $dst|$dst, $src2}",
+ "cmovo{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, GR64:$src2,
X86_COND_O, EFLAGS))]>, TB;
def CMOVNO64rr : RI<0x41, MRMSrcReg, // if !overflow, GR64 = GR64
(outs GR64:$dst), (ins GR64:$src1, GR64:$src2),
- "cmovno\t{$src2, $dst|$dst, $src2}",
+ "cmovno{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, GR64:$src2,
X86_COND_NO, EFLAGS))]>, TB;
} // isCommutable = 1
def CMOVB64rm : RI<0x42, MRMSrcMem, // if <u, GR64 = [mem64]
(outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
- "cmovb\t{$src2, $dst|$dst, $src2}",
+ "cmovb{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, (loadi64 addr:$src2),
X86_COND_B, EFLAGS))]>, TB;
def CMOVAE64rm: RI<0x43, MRMSrcMem, // if >=u, GR64 = [mem64]
(outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
- "cmovae\t{$src2, $dst|$dst, $src2}",
+ "cmovae{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, (loadi64 addr:$src2),
X86_COND_AE, EFLAGS))]>, TB;
def CMOVE64rm : RI<0x44, MRMSrcMem, // if ==, GR64 = [mem64]
(outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
- "cmove\t{$src2, $dst|$dst, $src2}",
+ "cmove{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, (loadi64 addr:$src2),
X86_COND_E, EFLAGS))]>, TB;
def CMOVNE64rm: RI<0x45, MRMSrcMem, // if !=, GR64 = [mem64]
(outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
- "cmovne\t{$src2, $dst|$dst, $src2}",
+ "cmovne{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, (loadi64 addr:$src2),
X86_COND_NE, EFLAGS))]>, TB;
def CMOVBE64rm: RI<0x46, MRMSrcMem, // if <=u, GR64 = [mem64]
(outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
- "cmovbe\t{$src2, $dst|$dst, $src2}",
+ "cmovbe{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, (loadi64 addr:$src2),
X86_COND_BE, EFLAGS))]>, TB;
def CMOVA64rm : RI<0x47, MRMSrcMem, // if >u, GR64 = [mem64]
(outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
- "cmova\t{$src2, $dst|$dst, $src2}",
+ "cmova{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, (loadi64 addr:$src2),
X86_COND_A, EFLAGS))]>, TB;
def CMOVL64rm : RI<0x4C, MRMSrcMem, // if <s, GR64 = [mem64]
(outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
- "cmovl\t{$src2, $dst|$dst, $src2}",
+ "cmovl{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, (loadi64 addr:$src2),
X86_COND_L, EFLAGS))]>, TB;
def CMOVGE64rm: RI<0x4D, MRMSrcMem, // if >=s, GR64 = [mem64]
(outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
- "cmovge\t{$src2, $dst|$dst, $src2}",
+ "cmovge{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, (loadi64 addr:$src2),
X86_COND_GE, EFLAGS))]>, TB;
def CMOVLE64rm: RI<0x4E, MRMSrcMem, // if <=s, GR64 = [mem64]
(outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
- "cmovle\t{$src2, $dst|$dst, $src2}",
+ "cmovle{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, (loadi64 addr:$src2),
X86_COND_LE, EFLAGS))]>, TB;
def CMOVG64rm : RI<0x4F, MRMSrcMem, // if >s, GR64 = [mem64]
(outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
- "cmovg\t{$src2, $dst|$dst, $src2}",
+ "cmovg{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, (loadi64 addr:$src2),
X86_COND_G, EFLAGS))]>, TB;
def CMOVS64rm : RI<0x48, MRMSrcMem, // if signed, GR64 = [mem64]
(outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
- "cmovs\t{$src2, $dst|$dst, $src2}",
+ "cmovs{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, (loadi64 addr:$src2),
X86_COND_S, EFLAGS))]>, TB;
def CMOVNS64rm: RI<0x49, MRMSrcMem, // if !signed, GR64 = [mem64]
(outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
- "cmovns\t{$src2, $dst|$dst, $src2}",
+ "cmovns{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, (loadi64 addr:$src2),
X86_COND_NS, EFLAGS))]>, TB;
def CMOVP64rm : RI<0x4A, MRMSrcMem, // if parity, GR64 = [mem64]
(outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
- "cmovp\t{$src2, $dst|$dst, $src2}",
+ "cmovp{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, (loadi64 addr:$src2),
X86_COND_P, EFLAGS))]>, TB;
def CMOVNP64rm : RI<0x4B, MRMSrcMem, // if !parity, GR64 = [mem64]
(outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
- "cmovnp\t{$src2, $dst|$dst, $src2}",
+ "cmovnp{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, (loadi64 addr:$src2),
X86_COND_NP, EFLAGS))]>, TB;
def CMOVO64rm : RI<0x40, MRMSrcMem, // if overflow, GR64 = [mem64]
(outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
- "cmovo\t{$src2, $dst|$dst, $src2}",
+ "cmovo{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, (loadi64 addr:$src2),
X86_COND_O, EFLAGS))]>, TB;
def CMOVNO64rm : RI<0x41, MRMSrcMem, // if !overflow, GR64 = [mem64]
(outs GR64:$dst), (ins GR64:$src1, i64mem:$src2),
- "cmovno\t{$src2, $dst|$dst, $src2}",
+ "cmovno{q}\t{$src2, $dst|$dst, $src2}",
[(set GR64:$dst, (X86cmov GR64:$src1, (loadi64 addr:$src2),
X86_COND_NO, EFLAGS))]>, TB;
} // isTwoAddress
@@ -1347,11 +1479,16 @@ def : Pat<(i64 (anyext (i8 (X86setcc_c X86_COND_B, EFLAGS)))),
//
// f64 -> signed i64
+def CVTSD2SI64rr: RSDI<0x2D, MRMSrcReg, (outs GR64:$dst), (ins FR64:$src),
+ "cvtsd2si{q}\t{$src, $dst|$dst, $src}", []>;
+def CVTSD2SI64rm: RSDI<0x2D, MRMSrcMem, (outs GR64:$dst), (ins f64mem:$src),
+ "cvtsd2si{q}\t{$src, $dst|$dst, $src}", []>;
def Int_CVTSD2SI64rr: RSDI<0x2D, MRMSrcReg, (outs GR64:$dst), (ins VR128:$src),
"cvtsd2si{q}\t{$src, $dst|$dst, $src}",
[(set GR64:$dst,
(int_x86_sse2_cvtsd2si64 VR128:$src))]>;
-def Int_CVTSD2SI64rm: RSDI<0x2D, MRMSrcMem, (outs GR64:$dst), (ins f128mem:$src),
+def Int_CVTSD2SI64rm: RSDI<0x2D, MRMSrcMem, (outs GR64:$dst),
+ (ins f128mem:$src),
"cvtsd2si{q}\t{$src, $dst|$dst, $src}",
[(set GR64:$dst, (int_x86_sse2_cvtsd2si64
(load addr:$src)))]>;
@@ -1365,7 +1502,8 @@ def Int_CVTTSD2SI64rr: RSDI<0x2C, MRMSrcReg, (outs GR64:$dst), (ins VR128:$src),
"cvttsd2si{q}\t{$src, $dst|$dst, $src}",
[(set GR64:$dst,
(int_x86_sse2_cvttsd2si64 VR128:$src))]>;
-def Int_CVTTSD2SI64rm: RSDI<0x2C, MRMSrcMem, (outs GR64:$dst), (ins f128mem:$src),
+def Int_CVTTSD2SI64rm: RSDI<0x2C, MRMSrcMem, (outs GR64:$dst),
+ (ins f128mem:$src),
"cvttsd2si{q}\t{$src, $dst|$dst, $src}",
[(set GR64:$dst,
(int_x86_sse2_cvttsd2si64
@@ -1410,7 +1548,8 @@ let isTwoAddress = 1 in {
(int_x86_sse_cvtsi642ss VR128:$src1,
GR64:$src2))]>;
def Int_CVTSI2SS64rm : RSSI<0x2A, MRMSrcMem,
- (outs VR128:$dst), (ins VR128:$src1, i64mem:$src2),
+ (outs VR128:$dst),
+ (ins VR128:$src1, i64mem:$src2),
"cvtsi2ss{q}\t{$src2, $dst|$dst, $src2}",
[(set VR128:$dst,
(int_x86_sse_cvtsi642ss VR128:$src1,
@@ -1418,6 +1557,10 @@ let isTwoAddress = 1 in {
}
// f32 -> signed i64
+def CVTSS2SI64rr: RSSI<0x2D, MRMSrcReg, (outs GR64:$dst), (ins FR32:$src),
+ "cvtss2si{q}\t{$src, $dst|$dst, $src}", []>;
+def CVTSS2SI64rm: RSSI<0x2D, MRMSrcMem, (outs GR64:$dst), (ins f32mem:$src),
+ "cvtss2si{q}\t{$src, $dst|$dst, $src}", []>;
def Int_CVTSS2SI64rr: RSSI<0x2D, MRMSrcReg, (outs GR64:$dst), (ins VR128:$src),
"cvtss2si{q}\t{$src, $dst|$dst, $src}",
[(set GR64:$dst,
@@ -1436,10 +1579,20 @@ def Int_CVTTSS2SI64rr: RSSI<0x2C, MRMSrcReg, (outs GR64:$dst), (ins VR128:$src),
"cvttss2si{q}\t{$src, $dst|$dst, $src}",
[(set GR64:$dst,
(int_x86_sse_cvttss2si64 VR128:$src))]>;
-def Int_CVTTSS2SI64rm: RSSI<0x2C, MRMSrcMem, (outs GR64:$dst), (ins f32mem:$src),
+def Int_CVTTSS2SI64rm: RSSI<0x2C, MRMSrcMem, (outs GR64:$dst),
+ (ins f32mem:$src),
"cvttss2si{q}\t{$src, $dst|$dst, $src}",
[(set GR64:$dst,
(int_x86_sse_cvttss2si64 (load addr:$src)))]>;
+
+// Descriptor-table support instructions
+
+// LLDT is not interpreted specially in 64-bit mode because there is no sign
+// extension.
+def SLDT64r : RI<0x00, MRM0r, (outs GR64:$dst), (ins),
+ "sldt{q}\t$dst", []>, TB;
+def SLDT64m : RI<0x00, MRM0m, (outs i16mem:$dst), (ins),
+ "sldt{q}\t$dst", []>, TB;
//===----------------------------------------------------------------------===//
// Alias Instructions
@@ -1505,17 +1658,37 @@ def LCMPXCHG64 : RI<0xB1, MRMDestMem, (outs), (ins i64mem:$ptr, GR64:$swap),
let Constraints = "$val = $dst" in {
let Defs = [EFLAGS] in
-def LXADD64 : RI<0xC1, MRMSrcMem, (outs GR64:$dst), (ins i64mem:$ptr,GR64:$val),
+def LXADD64 : RI<0xC1, MRMSrcMem, (outs GR64:$dst), (ins GR64:$val,i64mem:$ptr),
"lock\n\t"
"xadd\t$val, $ptr",
[(set GR64:$dst, (atomic_load_add_64 addr:$ptr, GR64:$val))]>,
TB, LOCK;
-def XCHG64rm : RI<0x87, MRMSrcMem, (outs GR64:$dst), (ins i64mem:$ptr,GR64:$val),
- "xchg\t$val, $ptr",
+def XCHG64rm : RI<0x87, MRMSrcMem, (outs GR64:$dst),
+ (ins GR64:$val,i64mem:$ptr),
+ "xchg{q}\t{$val, $ptr|$ptr, $val}",
[(set GR64:$dst, (atomic_swap_64 addr:$ptr, GR64:$val))]>;
+
+def XCHG64rr : RI<0x87, MRMSrcReg, (outs GR64:$dst), (ins GR64:$val,GR64:$src),
+ "xchg{q}\t{$val, $src|$src, $val}", []>;
}
+def XADD64rr : RI<0xC1, MRMDestReg, (outs GR64:$dst), (ins GR64:$src),
+ "xadd{q}\t{$src, $dst|$dst, $src}", []>, TB;
+def XADD64rm : RI<0xC1, MRMDestMem, (outs), (ins i64mem:$dst, GR64:$src),
+ "xadd{q}\t{$src, $dst|$dst, $src}", []>, TB;
+
+def CMPXCHG64rr : RI<0xB1, MRMDestReg, (outs GR64:$dst), (ins GR64:$src),
+ "cmpxchg{q}\t{$src, $dst|$dst, $src}", []>, TB;
+def CMPXCHG64rm : RI<0xB1, MRMDestMem, (outs), (ins i64mem:$dst, GR64:$src),
+ "cmpxchg{q}\t{$src, $dst|$dst, $src}", []>, TB;
+
+def CMPXCHG16B : RI<0xC7, MRM1m, (outs), (ins i128mem:$dst),
+ "cmpxchg16b\t$dst", []>, TB;
+
+def XCHG64ar : RI<0x90, AddRegFrm, (outs), (ins GR64:$src),
+ "xchg{q}\t{$src, %rax|%rax, $src}", []>;
+
// Optimized codegen when the non-memory output is not used.
let Defs = [EFLAGS] in {
// FIXME: Use normal add / sub instructions and add lock prefix dynamically.
@@ -1585,6 +1758,36 @@ def LAR64rm : RI<0x02, MRMSrcMem, (outs GR64:$dst), (ins i16mem:$src),
def LAR64rr : RI<0x02, MRMSrcReg, (outs GR64:$dst), (ins GR32:$src),
"lar{q}\t{$src, $dst|$dst, $src}", []>, TB;
+def LSL64rm : RI<0x03, MRMSrcMem, (outs GR64:$dst), (ins i64mem:$src),
+ "lsl{q}\t{$src, $dst|$dst, $src}", []>, TB;
+def LSL64rr : RI<0x03, MRMSrcReg, (outs GR64:$dst), (ins GR64:$src),
+ "lsl{q}\t{$src, $dst|$dst, $src}", []>, TB;
+
+def SWPGS : I<0x01, RawFrm, (outs), (ins), "swpgs", []>, TB;
+
+def PUSHFS64 : I<0xa0, RawFrm, (outs), (ins),
+ "push{q}\t%fs", []>, TB;
+def PUSHGS64 : I<0xa8, RawFrm, (outs), (ins),
+ "push{q}\t%gs", []>, TB;
+
+def POPFS64 : I<0xa1, RawFrm, (outs), (ins),
+ "pop{q}\t%fs", []>, TB;
+def POPGS64 : I<0xa9, RawFrm, (outs), (ins),
+ "pop{q}\t%gs", []>, TB;
+
+def LSS64rm : RI<0xb2, MRMSrcMem, (outs GR64:$dst), (ins opaque80mem:$src),
+ "lss{q}\t{$src, $dst|$dst, $src}", []>, TB;
+def LFS64rm : RI<0xb4, MRMSrcMem, (outs GR64:$dst), (ins opaque80mem:$src),
+ "lfs{q}\t{$src, $dst|$dst, $src}", []>, TB;
+def LGS64rm : RI<0xb5, MRMSrcMem, (outs GR64:$dst), (ins opaque80mem:$src),
+ "lgs{q}\t{$src, $dst|$dst, $src}", []>, TB;
+
+// Specialized register support
+
+// no m form encodable; use SMSW16m
+def SMSW64r : RI<0x01, MRM4r, (outs GR64:$dst), (ins),
+ "smsw{q}\t$dst", []>, TB;
+
// String manipulation instructions
def LODSQ : RI<0xAD, RawFrm, (outs), (ins), "lodsq", []>;
@@ -1722,9 +1925,9 @@ def : Pat<(X86cmov (loadi64 addr:$src1), GR64:$src2, X86_COND_NO, EFLAGS),
def : Pat<(zextloadi64i1 addr:$src), (MOVZX64rm8 addr:$src)>;
// extload
-// When extloading from 16-bit and smaller memory locations into 64-bit registers,
-// use zero-extending loads so that the entire 64-bit register is defined, avoiding
-// partial-register updates.
+// When extloading from 16-bit and smaller memory locations into 64-bit
+// registers, use zero-extending loads so that the entire 64-bit register is
+// defined, avoiding partial-register updates.
def : Pat<(extloadi64i1 addr:$src), (MOVZX64rm8 addr:$src)>;
def : Pat<(extloadi64i8 addr:$src), (MOVZX64rm8 addr:$src)>;
def : Pat<(extloadi64i16 addr:$src), (MOVZX64rm16 addr:$src)>;
@@ -1995,7 +2198,8 @@ def : Pat<(parallel (store (X86add_flag (loadi64 addr:$dst), i64immSExt8:$src2),
addr:$dst),
(implicit EFLAGS)),
(ADD64mi8 addr:$dst, i64immSExt8:$src2)>;
-def : Pat<(parallel (store (X86add_flag (loadi64 addr:$dst), i64immSExt32:$src2),
+def : Pat<(parallel (store (X86add_flag (loadi64 addr:$dst),
+ i64immSExt32:$src2),
addr:$dst),
(implicit EFLAGS)),
(ADD64mi32 addr:$dst, i64immSExt32:$src2)>;
@@ -2025,11 +2229,13 @@ def : Pat<(parallel (store (X86sub_flag (loadi64 addr:$dst), GR64:$src2),
(SUB64mr addr:$dst, GR64:$src2)>;
// Memory-Integer Subtraction with EFLAGS result
-def : Pat<(parallel (store (X86sub_flag (loadi64 addr:$dst), i64immSExt8:$src2),
+def : Pat<(parallel (store (X86sub_flag (loadi64 addr:$dst),
+ i64immSExt8:$src2),
addr:$dst),
(implicit EFLAGS)),
(SUB64mi8 addr:$dst, i64immSExt8:$src2)>;
-def : Pat<(parallel (store (X86sub_flag (loadi64 addr:$dst), i64immSExt32:$src2),
+def : Pat<(parallel (store (X86sub_flag (loadi64 addr:$dst),
+ i64immSExt32:$src2),
addr:$dst),
(implicit EFLAGS)),
(SUB64mi32 addr:$dst, i64immSExt32:$src2)>;
@@ -2153,7 +2359,8 @@ def : Pat<(parallel (store (X86xor_flag (loadi64 addr:$dst), i64immSExt8:$src2),
addr:$dst),
(implicit EFLAGS)),
(XOR64mi8 addr:$dst, i64immSExt8:$src2)>;
-def : Pat<(parallel (store (X86xor_flag (loadi64 addr:$dst), i64immSExt32:$src2),
+def : Pat<(parallel (store (X86xor_flag (loadi64 addr:$dst),
+ i64immSExt32:$src2),
addr:$dst),
(implicit EFLAGS)),
(XOR64mi32 addr:$dst, i64immSExt32:$src2)>;
@@ -2185,7 +2392,8 @@ def : Pat<(parallel (store (X86and_flag (loadi64 addr:$dst), i64immSExt8:$src2),
addr:$dst),
(implicit EFLAGS)),
(AND64mi8 addr:$dst, i64immSExt8:$src2)>;
-def : Pat<(parallel (store (X86and_flag (loadi64 addr:$dst), i64immSExt32:$src2),
+def : Pat<(parallel (store (X86and_flag (loadi64 addr:$dst),
+ i64immSExt32:$src2),
addr:$dst),
(implicit EFLAGS)),
(AND64mi32 addr:$dst, i64immSExt32:$src2)>;
diff --git a/lib/Target/X86/X86InstrFPStack.td b/lib/Target/X86/X86InstrFPStack.td
index b0b0409ad2..71ec178e30 100644
--- a/lib/Target/X86/X86InstrFPStack.td
+++ b/lib/Target/X86/X86InstrFPStack.td
@@ -195,48 +195,67 @@ def _Fp80 : FpI_<(outs RFP80:$dst), (ins RFP80:$src1, RFP80:$src2), TwoArgFP,
// These instructions cannot address 80-bit memory.
multiclass FPBinary<SDNode OpNode, Format fp, string asmstring> {
// ST(0) = ST(0) + [mem]
-def _Fp32m : FpIf32<(outs RFP32:$dst), (ins RFP32:$src1, f32mem:$src2), OneArgFPRW,
+def _Fp32m : FpIf32<(outs RFP32:$dst),
+ (ins RFP32:$src1, f32mem:$src2), OneArgFPRW,
[(set RFP32:$dst,
(OpNode RFP32:$src1, (loadf32 addr:$src2)))]>;
-def _Fp64m : FpIf64<(outs RFP64:$dst), (ins RFP64:$src1, f64mem:$src2), OneArgFPRW,
+def _Fp64m : FpIf64<(outs RFP64:$dst),
+ (ins RFP64:$src1, f64mem:$src2), OneArgFPRW,
[(set RFP64:$dst,
(OpNode RFP64:$src1, (loadf64 addr:$src2)))]>;
-def _Fp64m32: FpIf64<(outs RFP64:$dst), (ins RFP64:$src1, f32mem:$src2), OneArgFPRW,
+def _Fp64m32: FpIf64<(outs RFP64:$dst),
+ (ins RFP64:$src1, f32mem:$src2), OneArgFPRW,
[(set RFP64:$dst,
(OpNode RFP64:$src1, (f64 (extloadf32 addr:$src2))))]>;
-def _Fp80m32: FpI_<(outs RFP80:$dst), (ins RFP80:$src1, f32mem:$src2), OneArgFPRW,
+def _Fp80m32: FpI_<(outs RFP80:$dst),
+ (ins RFP80:$src1, f32mem:$src2), OneArgFPRW,
[(set RFP80:$dst,
(OpNode RFP80:$src1, (f80 (extloadf32 addr:$src2))))]>;
-def _Fp80m64: FpI_<(outs RFP80:$dst), (ins RFP80:$src1, f64mem:$src2), OneArgFPRW,
+def _Fp80m64: FpI_<(outs RFP80:$dst),
+ (ins RFP80:$src1, f64mem:$src2), OneArgFPRW,
[(set RFP80:$dst,
(OpNode RFP80:$src1, (f80 (extloadf64 addr:$src2))))]>;
def _F32m : FPI<0xD8, fp, (outs), (ins f32mem:$src),
- !strconcat("f", !strconcat(asmstring, "{s}\t$src"))> { let mayLoad = 1; }
+ !strconcat("f", !strconcat(asmstring, "{s}\t$src"))> {
+ let mayLoad = 1;
+}
def _F64m : FPI<0xDC, fp, (outs), (ins f64mem:$src),
- !strconcat("f", !strconcat(asmstring, "{l}\t$src"))> { let mayLoad = 1; }
+ !strconcat("f", !strconcat(asmstring, "{l}\t$src"))> {
+ let mayLoad = 1;
+}
// ST(0) = ST(0) + [memint]
-def _FpI16m32 : FpIf32<(outs RFP32:$dst), (ins RFP32:$src1, i16mem:$src2), OneArgFPRW,
+def _FpI16m32 : FpIf32<(outs RFP32:$dst), (ins RFP32:$src1, i16mem:$src2),
+ OneArgFPRW,
[(set RFP32:$dst, (OpNode RFP32:$src1,
(X86fild addr:$src2, i16)))]>;
-def _FpI32m32 : FpIf32<(outs RFP32:$dst), (ins RFP32:$src1, i32mem:$src2), OneArgFPRW,
+def _FpI32m32 : FpIf32<(outs RFP32:$dst), (ins RFP32:$src1, i32mem:$src2),
+ OneArgFPRW,
[(set RFP32:$dst, (OpNode RFP32:$src1,
(X86fild addr:$src2, i32)))]>;
-def _FpI16m64 : FpIf64<(outs RFP64:$dst), (ins RFP64:$src1, i16mem:$src2), OneArgFPRW,
+def _FpI16m64 : FpIf64<(outs RFP64:$dst), (ins RFP64:$src1, i16mem:$src2),
+ OneArgFPRW,
[(set RFP64:$dst, (OpNode RFP64:$src1,
(X86fild addr:$src2, i16)))]>;
-def _FpI32m64 : FpIf64<(outs RFP64:$dst), (ins RFP64:$src1, i32mem:$src2), OneArgFPRW,
+def _FpI32m64 : FpIf64<(outs RFP64:$dst), (ins RFP64:$src1, i32mem:$src2),
+ OneArgFPRW,
[(set RFP64:$dst, (OpNode RFP64:$src1,
(X86fild addr:$src2, i32)))]>;
-def _FpI16m80 : FpI_<(outs RFP80:$dst), (ins RFP80:$src1, i16mem:$src2), OneArgFPRW,
+def _FpI16m80 : FpI_<(outs RFP80:$dst), (ins RFP80:$src1, i16mem:$src2),
+ OneArgFPRW,
[(set RFP80:$dst, (OpNode RFP80:$src1,
(X86fild addr:$src2, i16)))]>;
-def _FpI32m80 : FpI_<(outs RFP80:$dst), (ins RFP80:$src1, i32mem:$src2), OneArgFPRW,
+def _FpI32m80 : FpI_<(outs RFP80:$dst), (ins RFP80:$src1, i32mem:$src2),
+ OneArgFPRW,
[(set RFP80:$dst, (OpNode RFP80:$src1,
(X86fild addr:$src2, i32)))]>;
def _FI16m : FPI<0xDE, fp, (outs), (ins i16mem:$src),
- !strconcat("fi", !strconcat(asmstring, "{s}\t$src"))> { let mayLoad = 1; }
+ !strconcat("fi", !strconcat(asmstring, "{s}\t$src"))> {
+ let mayLoad = 1;
+}
def _FI32m : FPI<0xDA, fp, (outs), (ins i32mem:$src),
- !strconcat("fi", !strconcat(asmstring, "{l}\t$src"))> { let mayLoad = 1; }
+ !strconcat("fi", !strconcat(asmstring, "{l}\t$src"))> {
+ let mayLoad = 1;
+}
}
defm ADD : FPBinary_rr<fadd>;
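// For readers new to TableGen, the FPBinary multiclass above relies on defm
// expansion: a "defm" instantiates every "def" inside the multiclass,
// prefixing each generated record name with the defm name, and !strconcat
// splices the mnemonic into each AsmString. A minimal standalone sketch of
// those mechanics (SketchFPInst/SketchFPBinary/SKADD are hypothetical names,
// not the real X86 classes):
//
//   class SketchFPInst<string asm> {
//     string AsmString = asm;
//   }
//
//   multiclass SketchFPBinary<string asmstring> {
//     // Generates <prefix>_F32m with AsmString "f<op>{s}\t$src".
//     def _F32m : SketchFPInst<!strconcat("f",
//                                         !strconcat(asmstring, "{s}\t$src"))>;
//     // Generates <prefix>_F64m with AsmString "f<op>{l}\t$src".
//     def _F64m : SketchFPInst<!strconcat("f",
//                                         !strconcat(asmstring, "{l}\t$src"))>;
//   }
//
//   // Expands to SKADD_F32m ("fadd{s}\t$src") and SKADD_F64m ("fadd{l}\t$src").
//   defm SKADD : SketchFPBinary<"add">;
//
// The real tables generate names like ADD_F32m the same way, which is why the
// reformatting in this patch never changes any generated record name.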
@@ -279,6 +298,9 @@ def DIV_FST0r : FPST0rInst <0xF0, "fdiv\t$op">;
def DIVR_FrST0 : FPrST0Inst <0xF0, "fdiv{|r}\t{%st(0), $op|$op, %ST(0)}">;
def DIVR_FPrST0 : FPrST0PInst<0xF0, "fdiv{|r}p\t$op">;
+def COM_FST0r : FPST0rInst <0xD0, "fcom\t$op">;
+def COMP_FST0r : FPST0rInst <0xD8, "fcomp\t$op">;
+
// Unary operations.
multiclass FPUnary<SDNode OpNode, bits<8> opcode, string asmstring> {
def _Fp32 : FpIf32<(outs RFP32:$dst), (ins RFP32:$src), OneArgFPRW,
@@ -305,22 +327,22 @@ def TST_F : FPI<0xE4, RawFrm, (outs), (ins), "ftst">, D9;
// Versions of FP instructions that take a single memory operand. Added for the
// disassembler; remove as they are included with patterns elsewhere.
-def FCOM32m : FPI<0xD8, MRM2m, (outs), (ins f32mem:$src), "fcom\t$src">;
-def FCOMP32m : FPI<0xD8, MRM3m, (outs), (ins f32mem:$src), "fcomp\t$src">;
+def FCOM32m : FPI<0xD8, MRM2m, (outs), (ins f32mem:$src), "fcom{l}\t$src">;
+def FCOMP32m : FPI<0xD8, MRM3m, (outs), (ins f32mem:$src), "fcomp{l}\t$src">;
def FLDENVm : FPI<0xD9, MRM4m, (outs), (ins f32mem:$src), "fldenv\t$src">;
-def FSTENVm : FPI<0xD9, MRM6m, (outs f32mem:$dst), (ins), "fstenv\t$dst">;
+def FSTENVm : FPI<0xD9, MRM6m, (outs f32mem:$dst), (ins), "fnstenv\t$dst">;
def FICOM32m : FPI<0xDA, MRM2m, (outs), (ins i32mem:$src), "ficom{l}\t$src">;
def FICOMP32m: FPI<0xDA, MRM3m, (outs), (ins i32mem:$src), "ficomp{l}\t$src">;
-def FCOM64m : FPI<0xDC, MRM2m, (outs), (ins f64mem:$src), "fcom\t$src">;
-def FCOMP64m : FPI<0xDC, MRM3m, (outs), (ins f64mem:$src), "fcomp\t$src">;
+def FCOM64m : FPI<0xDC, MRM2m, (outs), (ins f64mem:$src), "fcom{ll}\t$src">;
+def FCOMP64m : FPI<0xDC, MRM3m, (outs), (ins f64mem:$src), "fcomp{ll}\t$src">;
def FISTTP32m: FPI<0xDD, MRM1m, (outs i32mem:$dst), (ins), "fisttp{l}\t$dst">;
def FRSTORm : FPI<0xDD, MRM4m, (outs f32mem:$dst), (ins), "frstor\t$dst">;
-def FSAVEm : FPI<0xDD, MRM6m, (outs f32mem:$dst), (ins), "fsave\t$dst">;
-def FSTSWm : FPI<0xDD, MRM7m, (outs f32mem:$dst), (ins), "fstsw\t$dst">;
+def FSAVEm : FPI<0xDD, MRM6m, (outs f32mem:$dst), (ins), "fnsave\t$dst">;
+def FNSTSWm : FPI<0xDD, MRM7m, (outs f32mem:$dst), (ins), "fnstsw\t$dst">;
def FICOM16m : FPI<0xDE, MRM2m, (outs), (ins i16mem:$src), "ficom{w}\t$src">;
def FICOMP16m: FPI<0xDE, MRM3m, (outs), (ins i16mem:$src), "ficomp{w}\t$src">;
@@ -493,7 +515,8 @@ def ISTT_Fp64m80 : FpI_<(outs), (ins i64mem:$op, RFP80:$src), OneArgFP,
let mayStore = 1 in {
def ISTT_FP16m : FPI<0xDF, MRM1m, (outs), (ins i16mem:$dst), "fisttp{s}\t$dst">;
def ISTT_FP32m : FPI<0xDB, MRM1m, (outs), (ins i32mem:$dst), "fisttp{l}\t$dst">;
-def ISTT_FP64m : FPI<0xDD, MRM1m, (outs), (ins i64mem:$dst), "fisttp{ll}\t$dst">;
+def ISTT_FP64m : FPI<0xDD, MRM1m, (outs), (ins i64mem:$dst),
+ "fisttp{ll}\t$dst">;
}
// FP Stack manipulation instructions.
@@ -561,10 +584,15 @@ def UCOM_FIPr : FPI<0xE8, AddRegFrm, // CC = cmp ST(0) with ST(i), pop
"fucomip\t{$reg, %st(0)|%ST(0), $reg}">, DF;
}
+def COM_FIr : FPI<0xF0, AddRegFrm, (outs), (ins RST:$reg),
+ "fcomi\t{$reg, %st(0)|%ST(0), $reg}">, DB;
+def COM_FIPr : FPI<0xF0, AddRegFrm, (outs), (ins RST:$reg),
+ "fcomip\t{$reg, %st(0)|%ST(0), $reg}">, DF;
+
// Floating point flag ops.
let Defs = [AX] in
def FNSTSW8r : I<0xE0, RawFrm, // AX = fp flags
- (outs), (ins), "fnstsw", []>, DF;
+ (outs), (ins), "fnstsw %ax", []>, DF;
def FNSTCW16m : I<0xD9, MRM7m, // [mem16] = X87 control word
(outs), (ins i16mem:$dst), "fnstcw\t$dst",
@@ -574,6 +602,44 @@ let mayLoad = 1 in
def FLDCW16m : I<0xD9, MRM5m, // X87 control word = [mem16]
(outs), (ins i16mem:$dst), "fldcw\t$dst", []>;
+// Free a floating-point stack register
+
+def FFREE : FPI<0xC0, AddRegFrm, (outs), (ins RST:$reg),
+ "ffree\t$reg">, DD;
+
+// Clear exceptions
+
+def FNCLEX : I<0xE2, RawFrm, (outs), (ins), "fnclex", []>, DB;
+
+// Operandless floating-point instructions for the disassembler
+
+def FNOP : I<0xD0, RawFrm, (outs), (ins), "fnop", []>, D9;
+def FXAM : I<0xE5, RawFrm, (outs), (ins), "fxam", []>, D9;
+def FLDL2T : I<0xE9, RawFrm, (outs), (ins), "fldl2t", []>, D9;
+def FLDL2E : I<0xEA, RawFrm, (outs), (ins), "fldl2e", []>, D9;
+def FLDPI : I<0xEB, RawFrm, (outs), (ins), "fldpi", []>, D9;
+def FLDLG2 : I<0xEC, RawFrm, (outs), (ins), "fldlg2", []>, D9;
+def FLDLN2 : I<0xED, RawFrm, (outs), (ins), "fldln2", []>, D9;
+def F2XM1 : I<0xF0, RawFrm, (outs), (ins), "f2xm1", []>, D9;
+def FYL2X : I<0xF1, RawFrm, (outs), (ins), "fyl2x", []>, D9;
+def FPTAN : I<0xF2, RawFrm, (outs), (ins), "fptan", []>, D9;
+def FPATAN : I<0xF3, RawFrm, (outs), (ins), "fpatan", []>, D9;
+def FXTRACT : I<0xF4, RawFrm, (outs), (ins), "fxtract", []>, D9;
+def FPREM1 : I<0xF5, RawFrm, (outs), (ins), "fprem1", []>, D9;
+def FDECSTP : I<0xF6, RawFrm, (outs), (ins), "fdecstp", []>, D9;
+def FINCSTP : I<0xF7, RawFrm, (outs), (ins), "fincstp", []>, D9;
+def FPREM : I<0xF8, RawFrm, (outs), (ins), "fprem", []>, D9;
+def FYL2XP1 : I<0xF9, RawFrm, (outs), (ins), "fyl2xp1", []>, D9;
+def FSINCOS : I<0xFB, RawFrm, (outs), (ins), "fsincos", []>, D9;
+def FRNDINT : I<0xFC, RawFrm, (outs), (ins), "frndint", []>, D9;
+def FSCALE : I<0xFD, RawFrm, (outs), (ins), "fscale", []>, D9;
+def FCOMPP : I<0xD9, RawFrm, (outs), (ins), "fcompp", []>, DE;
+
+def FXSAVE : I<0xAE, MRM0m, (outs opaque512mem:$dst), (ins),
+ "fxsave\t$dst", []>, TB;
+def FXRSTOR : I<0xAE, MRM1m, (outs), (ins opaque512mem:$src),
+ "fxrstor\t$src", []>, TB;
+
//===----------------------------------------------------------------------===//
// Non-Instruction Patterns
//===----------------------------------------------------------------------===//
@@ -585,11 +651,15 @@ def : Pat<(X86fld addr:$src, f80), (LD_Fp80m addr:$src)>;
// Required for CALL which return f32 / f64 / f80 values.
def : Pat<(X86fst RFP32:$src, addr:$op, f32), (ST_Fp32m addr:$op, RFP32:$src)>;
-def : Pat<(X86fst RFP64:$src, addr:$op, f32), (ST_Fp64m32 addr:$op, RFP64:$src)>;
+def : Pat<(X86fst RFP64:$src, addr:$op, f32), (ST_Fp64m32 addr:$op,
+ RFP64:$src)>;
def : Pat<(X86fst RFP64:$src, addr:$op, f64), (ST_Fp64m addr:$op, RFP64:$src)>;
-def : Pat<(X86fst RFP80:$src, addr:$op, f32), (ST_Fp80m32 addr:$op, RFP80:$src)>;
-def : Pat<(X86fst RFP80:$src, addr:$op, f64), (ST_Fp80m64 addr:$op, RFP80:$src)>;
-def : Pat<(X86fst RFP80:$src, addr:$op, f80), (ST_FpP80m addr:$op, RFP80:$src)>;
+def : Pat<(X86fst RFP80:$src, addr:$op, f32), (ST_Fp80m32 addr:$op,
+ RFP80:$src)>;
+def : Pat<(X86fst RFP80:$src, addr:$op, f64), (ST_Fp80m64 addr:$op,
+ RFP80:$src)>;
+def : Pat<(X86fst RFP80:$src, addr:$op, f80), (ST_FpP80m addr:$op,
+ RFP80:$src)>;
// Floating point constant -0.0 and -1.0
def : Pat<(f32 fpimmneg0), (CHS_Fp32 (LD_Fp032))>, Requires<[FPStackf32]>;
diff --git a/lib/Target/X86/X86InstrFormats.td b/lib/Target/X86/X86InstrFormats.td
index 2f14bb0d9a..a799f165f7 100644
--- a/lib/Target/X86/X86InstrFormats.td
+++ b/lib/Target/X86/X86InstrFormats.td
@@ -115,17 +115,20 @@ class I<bits<8> o, Format f, dag outs, dag ins, string asm, list<dag> pattern>
let Pattern = pattern;
let CodeSize = 3;
}
-class Ii8 <bits<8> o, Format f, dag outs, dag ins, string asm, list<dag> pattern>
+class Ii8 <bits<8> o, Format f, dag outs, dag ins, string asm,
+ list<dag> pattern>
: X86Inst<o, f, Imm8 , outs, ins, asm> {
let Pattern = pattern;
let CodeSize = 3;
}
-class Ii16<bits<8> o, Format f, dag outs, dag ins, string asm, list<dag> pattern>
+class Ii16<bits<8> o, Format f, dag outs, dag ins, string asm,
+ list<dag> pattern>
: X86Inst<o, f, Imm16, outs, ins, asm> {
let Pattern = pattern;
let CodeSize = 3;
}
-class Ii32<bits<8> o, Format f, dag outs, dag ins, string asm, list<dag> pattern>
+class Ii32<bits<8> o, Format f, dag outs, dag ins, string asm,
+ list<dag> pattern>
: X86Inst<o, f, Imm32, outs, ins, asm> {
let Pattern = pattern;
let CodeSize = 3;
@@ -169,7 +172,8 @@ class Iseg32 <bits<8> o, Format f, dag outs, dag ins, string asm,
class SSI<bits<8> o, Format F, dag outs, dag ins, string asm, list<dag> pattern>
: I<o, F, outs, ins, asm, pattern>, XS, Requires<[HasSSE1]>;
-class SSIi8<bits<8> o, Format F, dag outs, dag ins, string asm, list<dag> pattern>
+class SSIi8<bits<8> o, Format F, dag outs, dag ins, string asm,
+ list<dag> pattern>
: Ii8<o, F, outs, ins, asm, pattern>, XS, Requires<[HasSSE1]>;
class PSI<bits<8> o, Format F, dag outs, dag ins, string asm, list<dag> pattern>
: I<o, F, outs, ins, asm, pattern>, TB, Requires<[HasSSE1]>;
@@ -205,9 +209,11 @@ class PDIi8<bits<8> o, Format F, dag outs, dag ins, string asm,
// S3SI - SSE3 instructions with XS prefix.
// S3DI - SSE3 instructions with XD prefix.
-class S3SI<bits<8> o, Format F, dag outs, dag ins, string asm, list<dag> pattern>
+class S3SI<bits<8> o, Format F, dag outs, dag ins, string asm,
+ list<dag> pattern>
: I<o, F, outs, ins, asm, pattern>, XS, Requires<[HasSSE3]>;
-class S3DI<bits<8> o, Format F, dag outs, dag ins, string asm, list<dag> pattern>
+class S3DI<bits<8> o, Format F, dag outs, dag ins, string asm,
+ list<dag> pattern>
: I<o, F, outs, ins, asm, pattern>, XD, Requires<[HasSSE3]>;
class S3I<bits<8> o, Format F, dag outs, dag ins, string asm, list<dag> pattern>
: I<o, F, outs, ins, asm, pattern>, TB, OpSize, Requires<[HasSSE3]>;
@@ -255,7 +261,7 @@ class SS42FI<bits<8> o, Format F, dag outs, dag ins, string asm,
// SS42AI = SSE 4.2 instructions with TA prefix
class SS42AI<bits<8> o, Format F, dag outs, dag ins, string asm,
- list<dag> pattern>
+ list<dag> pattern>
: I<o, F, outs, ins, asm, pattern>, TA, Requires<[HasSSE42]>;
// X86-64 Instruction templates...
@@ -297,17 +303,24 @@ class RPDI<bits<8> o, Format F, dag outs, dag ins, string asm,
// MMXIi8 - MMX instructions with ImmT == Imm8 and TB prefix.
// MMXID - MMX instructions with XD prefix.
// MMXIS - MMX instructions with XS prefix.
-class MMXI<bits<8> o, Format F, dag outs, dag ins, string asm, list<dag> pattern>
+class MMXI<bits<8> o, Format F, dag outs, dag ins, string asm,
+ list<dag> pattern>
: I<o, F, outs, ins, asm, pattern>, TB, Requires<[HasMMX]>;
-class MMXI64<bits<8> o, Format F, dag outs, dag ins, string asm, list<dag> pattern>
+class MMXI64<bits<8> o, Format F, dag outs, dag ins, string asm,
+ list<dag> pattern>
: I<o, F, outs, ins, asm, pattern>, TB, Requires<[HasMMX,In64BitMode]>;
-class MMXRI<bits<8> o, Format F, dag outs, dag ins, string asm, list<dag> pattern>
+class MMXRI<bits<8> o, Format F, dag outs, dag ins, string asm,
+ list<dag> pattern>
: I<o, F, outs, ins, asm, pattern>, TB, REX_W, Requires<[HasMMX]>;
-class MMX2I<bits<8> o, Format F, dag outs, dag ins, string asm, list<dag> pattern>
+class MMX2I<bits<8> o, Format F, dag outs, dag ins, string asm,
+ list<dag> pattern>
: I<o, F, outs, ins, asm, pattern>, TB, OpSize, Requires<[HasMMX]>;
-class MMXIi8<bits<8> o, Format F, dag outs, dag ins, string asm, list<dag> pattern>
+class MMXIi8<bits<8> o, Format F, dag outs, dag ins, string asm,
+ list<dag> pattern>
: Ii8<o, F, outs, ins, asm, pattern>, TB, Requires<[HasMMX]>;
-class MMXID<bits<8> o, Format F, dag outs, dag ins, string asm, list<dag> pattern>
+class MMXID<bits<8> o, Format F, dag outs, dag ins, string asm,
+ list<dag> pattern>
: Ii8<o, F, outs, ins, asm, pattern>, XD, Requires<[HasMMX]>;
-class MMXIS<bits<8> o, Format F, dag outs, dag ins, string asm, list<dag> pattern>
+class MMXIS<bits<8> o, Format F, dag outs, dag ins, string asm,
+ list<dag> pattern>
: Ii8<o, F, outs, ins, asm, pattern>, XS, Requires<[HasMMX]>;
diff --git a/lib/Target/X86/X86InstrInfo.cpp b/lib/Target/X86/X86InstrInfo.cpp
index 1947d3585c..bc72f63aeb 100644
--- a/lib/Target/X86/X86InstrInfo.cpp
+++ b/lib/Target/X86/X86InstrInfo.cpp
@@ -1880,7 +1880,7 @@ bool X86InstrInfo::copyRegToReg(MachineBasicBlock &MBB,
if (SrcReg != X86::EFLAGS)
return false;
if (DestRC == &X86::GR64RegClass || DestRC == &X86::GR64_NOSPRegClass) {
- BuildMI(MBB, MI, DL, get(X86::PUSHFQ));
+ BuildMI(MBB, MI, DL, get(X86::PUSHFQ64));
BuildMI(MBB, MI, DL, get(X86::POP64r), DestReg);
return true;
} else if (DestRC == &X86::GR32RegClass ||
diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td
index 3f63acad89..7411dde895 100644
--- a/lib/Target/X86/X86InstrInfo.td
+++ b/lib/Target/X86/X86InstrInfo.td
@@ -1,4 +1,4 @@
-//===- X86InstrInfo.td - Describe the X86 Instruction Set --*- tablegen -*-===//
+//===- X86InstrInfo.td - Describe the X86 Instruction Set --*- tablegen -*-===//
//
// The LLVM Compiler Infrastructure
//
@@ -199,6 +199,12 @@ class X86MemOperand<string printMethod> : Operand<iPTR> {
def opaque32mem : X86MemOperand<"printopaquemem">;
def opaque48mem : X86MemOperand<"printopaquemem">;
def opaque80mem : X86MemOperand<"printopaquemem">;
+def opaque512mem : X86MemOperand<"printopaquemem">;
+
+def offset8 : Operand<i64> { let PrintMethod = "print_pcrel_imm"; }
+def offset16 : Operand<i64> { let PrintMethod = "print_pcrel_imm"; }
+def offset32 : Operand<i64> { let PrintMethod = "print_pcrel_imm"; }
+def offset64 : Operand<i64> { let PrintMethod = "print_pcrel_imm"; }
def i8mem : X86MemOperand<"printi8mem">;
def i16mem : X86MemOperand<"printi16mem">;
@@ -354,7 +360,8 @@ def loadi16 : PatFrag<(ops node:$ptr), (i16 (unindexedload node:$ptr)), [{
return false;
}]>;
-def loadi16_anyext : PatFrag<(ops node:$ptr), (i32 (unindexedload node:$ptr)), [{
+def loadi16_anyext : PatFrag<(ops node:$ptr), (i32 (unindexedload node:$ptr)),
+[{
LoadSDNode *LD = cast<LoadSDNode>(N);
if (const Value *Src = LD->getSrcValue())
if (const PointerType *PT = dyn_cast<PointerType>(Src->getType()))
@@ -542,13 +549,17 @@ def VASTART_SAVE_XMM_REGS : I<0, Pseudo,
// Nop
let neverHasSideEffects = 1 in {
def NOOP : I<0x90, RawFrm, (outs), (ins), "nop", []>;
+ def NOOPW : I<0x1f, MRM0m, (outs), (ins i16mem:$zero),
+ "nop{w}\t$zero", []>, TB, OpSize;
def NOOPL : I<0x1f, MRM0m, (outs), (ins i32mem:$zero),
- "nopl\t$zero", []>, TB;
+ "nop{l}\t$zero", []>, TB;
}
// Trap
def INT3 : I<0xcc, RawFrm, (outs), (ins), "int\t3", []>;
def INT : I<0xcd, RawFrm, (outs), (ins i8imm:$trap), "int\t$trap", []>;
+def IRET16 : I<0xcf, RawFrm, (outs), (ins), "iret{w}", []>, OpSize;
+def IRET32 : I<0xcf, RawFrm, (outs), (ins), "iret{l}", []>;
// PIC base construction. This expands to code that looks like this:
// call $next_inst
@@ -712,12 +723,14 @@ def ENTER : I<0xC8, RawFrm, (outs), (ins i16imm:$len, i8imm:$lvl),
// Tail call stuff.
let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1 in
-def TCRETURNdi : I<0, Pseudo, (outs), (ins i32imm:$dst, i32imm:$offset, variable_ops),
+def TCRETURNdi : I<0, Pseudo, (outs),
+ (ins i32imm:$dst, i32imm:$offset, variable_ops),
"#TC_RETURN $dst $offset",
[]>;
let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1 in
-def TCRETURNri : I<0, Pseudo, (outs), (ins GR32:$dst, i32imm:$offset, variable_ops),
+def TCRETURNri : I<0, Pseudo, (outs),
+ (ins GR32:$dst, i32imm:$offset, variable_ops),
"#TC_RETURN $dst $offset",
[]>;
@@ -725,7 +738,8 @@ let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1 in
def TAILJMPd : IBr<0xE9, (ins i32imm_pcrel:$dst), "jmp\t$dst # TAILCALL",
[]>;
let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1 in
- def TAILJMPr : I<0xFF, MRM4r, (outs), (ins GR32:$dst), "jmp{l}\t{*}$dst # TAILCALL",
+ def TAILJMPr : I<0xFF, MRM4r, (outs), (ins GR32:$dst),
+ "jmp{l}\t{*}$dst # TAILCALL",
[]>;
let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1 in
def TAILJMPm : I<0xFF, MRM4m, (outs), (ins i32mem:$dst),
@@ -738,6 +752,15 @@ let Defs = [EBP, ESP], Uses = [EBP, ESP], mayLoad = 1, neverHasSideEffects=1 in
def LEAVE : I<0xC9, RawFrm,
(outs), (ins), "leave", []>;
+def POPCNT16rr : I<0xB8, MRMSrcReg, (outs GR16:$dst), (ins GR16:$src),
+ "popcnt{w}\t{$src, $dst|$dst, $src}", []>, OpSize, XS;
+def POPCNT16rm : I<0xB8, MRMSrcMem, (outs GR16:$dst), (ins i16mem:$src),
+ "popcnt{w}\t{$src, $dst|$dst, $src}", []>, OpSize, XS;
+def POPCNT32rr : I<0xB8, MRMSrcReg, (outs GR32:$dst), (ins GR32:$src),
+ "popcnt{l}\t{$src, $dst|$dst, $src}", []>, XS;
+def POPCNT32rm : I<0xB8, MRMSrcMem, (outs GR32:$dst), (ins i32mem:$src),
+ "popcnt{l}\t{$src, $dst|$dst, $src}", []>, XS;
+
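For reference, the population-count semantics the empty-pattern POPCNT defs above describe, as a minimal C++ sketch (the function name is illustrative, not part of the tables):

#include <cstdint>

// Kernighan's method: each iteration clears the lowest set bit.
uint32_t popcnt32(uint32_t v) {
  uint32_t n = 0;
  while (v) {
    v &= v - 1;
    ++n;
  }
  return n;
}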
let Defs = [ESP], Uses = [ESP], neverHasSideEffects=1 in {
let mayLoad = 1 in {
def POP16r : I<0x58, AddRegFrm, (outs GR16:$reg), (ins), "pop{w}\t$reg", []>,
@@ -773,10 +796,14 @@ def PUSH32i32 : Ii32<0x68, RawFrm, (outs), (ins i32imm:$imm),
"push{l}\t$imm", []>;
}
-let Defs = [ESP, EFLAGS], Uses = [ESP], mayLoad = 1, neverHasSideEffects=1 in
-def POPFD : I<0x9D, RawFrm, (outs), (ins), "popf", []>;
-let Defs = [ESP], Uses = [ESP, EFLAGS], mayStore = 1, neverHasSideEffects=1 in
-def PUSHFD : I<0x9C, RawFrm, (outs), (ins), "pushf", []>;
+let Defs = [ESP, EFLAGS], Uses = [ESP], mayLoad = 1, neverHasSideEffects=1 in {
+def POPF : I<0x9D, RawFrm, (outs), (ins), "popf{w}", []>, OpSize;
+def POPFD : I<0x9D, RawFrm, (outs), (ins), "popf{l}", []>;
+}
+let Defs = [ESP], Uses = [ESP, EFLAGS], mayStore = 1, neverHasSideEffects=1 in {
+def PUSHF : I<0x9C, RawFrm, (outs), (ins), "pushf{w}", []>, OpSize;
+def PUSHFD : I<0x9C, RawFrm, (outs), (ins), "pushf{l}", []>;
+}
let isTwoAddress = 1 in // GR32 = bswap GR32
def BSWAP32r : I<0xC8, AddRegFrm,
@@ -918,6 +945,13 @@ let Uses = [EAX] in
def OUT32ir : Ii8<0xE7, RawFrm, (outs), (ins i16i8imm:$port),
"out{l}\t{%eax, $port|$port, %EAX}", []>;
+def IN8 : I<0x6C, RawFrm, (outs), (ins),
+ "ins{b}", []>;
+def IN16 : I<0x6D, RawFrm, (outs), (ins),
+ "ins{w}", []>, OpSize;
+def IN32 : I<0x6D, RawFrm, (outs), (ins),
+ "ins{l}", []>;
+
//===----------------------------------------------------------------------===//
// Move Instructions...
//
@@ -950,18 +984,18 @@ def MOV32mi : Ii32<0xC7, MRM0m, (outs), (ins i32mem:$dst, i32imm:$src),
"mov{l}\t{$src, $dst|$dst, $src}",
[(store (i32 imm:$src), addr:$dst)]>;
-def MOV8o8a : Ii8 <0xA0, RawFrm, (outs), (ins i8imm:$src),
+def MOV8o8a : Ii8 <0xA0, RawFrm, (outs), (ins offset8:$src),
"mov{b}\t{$src, %al|%al, $src}", []>;
-def MOV16o16a : Ii16 <0xA1, RawFrm, (outs), (ins i16imm:$src),
+def MOV16o16a : Ii16 <0xA1, RawFrm, (outs), (ins offset16:$src),
"mov{w}\t{$src, %ax|%ax, $src}", []>, OpSize;
-def MOV32o32a : Ii32 <0xA1, RawFrm, (outs), (ins i32imm:$src),
+def MOV32o32a : Ii32 <0xA1, RawFrm, (outs), (ins offset32:$src),
"mov{l}\t{$src, %eax|%eax, $src}", []>;
-def MOV8ao8 : Ii8 <0xA2, RawFrm, (outs i8imm:$dst), (ins),
+def MOV8ao8 : Ii8 <0xA2, RawFrm, (outs offset8:$dst), (ins),
"mov{b}\t{%al, $dst|$dst, %al}", []>;
-def MOV16ao16 : Ii16 <0xA3, RawFrm, (outs i16imm:$dst), (ins),
+def MOV16ao16 : Ii16 <0xA3, RawFrm, (outs offset16:$dst), (ins),
"mov{w}\t{%ax, $dst|$dst, %ax}", []>, OpSize;
-def MOV32ao32 : Ii32 <0xA3, RawFrm, (outs i32imm:$dst), (ins),
+def MOV32ao32 : Ii32 <0xA3, RawFrm, (outs offset32:$dst), (ins),
"mov{l}\t{%eax, $dst|$dst, %eax}", []>;
// Moves to and from segment registers
@@ -974,6 +1008,13 @@ def MOV16sr : I<0x8E, MRMSrcReg, (outs SEGMENT_REG:$dst), (ins GR16:$src),
def MOV16sm : I<0x8E, MRMSrcMem, (outs SEGMENT_REG:$dst), (ins i16mem:$src),
"mov{w}\t{$src, $dst|$dst, $src}", []>;
+def MOV8rr_REV : I<0x8A, MRMSrcReg, (outs GR8:$dst), (ins GR8:$src),
+ "mov{b}\t{$src, $dst|$dst, $src}", []>;
+def MOV16rr_REV : I<0x8B, MRMSrcReg, (outs GR16:$dst), (ins GR16:$src),
+ "mov{w}\t{$src, $dst|$dst, $src}", []>, OpSize;
+def MOV32rr_REV : I<0x8B, MRMSrcReg, (outs GR32:$dst), (ins GR32:$src),
+ "mov{l}\t{$src, $dst|$dst, $src}", []>;
+
let canFoldAsLoad = 1, isReMaterializable = 1, mayHaveSideEffects = 1 in {
def MOV8rm : I<0x8A, MRMSrcMem, (outs GR8 :$dst), (ins i8mem :$src),
"mov{b}\t{$src, $dst|$dst, $src}",
@@ -1013,6 +1054,18 @@ def MOV8rm_NOREX : I<0x8A, MRMSrcMem,
(outs GR8_NOREX:$dst), (ins i8mem_NOREX:$src),
"mov{b}\t{$src, $dst|$dst, $src} # NOREX", []>;
+// Moves to and from debug registers
+def MOV32rd : I<0x21, MRMDestReg, (outs GR32:$dst), (ins DEBUG_REG:$src),
+ "mov{l}\t{$src, $dst|$dst, $src}", []>, TB;
+def MOV32dr : I<0x23, MRMSrcReg, (outs DEBUG_REG:$dst), (ins GR32:$src),
+ "mov{l}\t{$src, $dst|$dst, $src}", []>, TB;
+
+// Moves to and from control registers
+def MOV32rc : I<0x20, MRMDestReg, (outs GR32:$dst), (ins CONTROL_REG_32:$src),
+ "mov{l}\t{$src, $dst|$dst, $src}", []>, TB;
+def MOV32cr : I<0x22, MRMSrcReg, (outs CONTROL_REG_32:$dst), (ins GR32:$src),
+ "mov{l}\t{$src, $dst|$dst, $src}", []>, TB;
+
//===----------------------------------------------------------------------===//
// Fixed-Register Multiplication and Division Instructions...
//
@@ -1082,45 +1135,47 @@ def IMUL32m : I<0xF7, MRM5m, (outs), (ins i32mem:$src),
// unsigned division/remainder
let Defs = [AL,AH,EFLAGS], Uses = [AX] in
-def DIV8r : I<0xF6, MRM6r, (outs), (ins GR8:$src), // AX/r8 = AL,AH
+def DIV8r : I<0xF6, MRM6r, (outs), (ins GR8:$src), // AX/r8 = AL,AH
"div{b}\t$src", []>;
let Defs = [AX,DX,EFLAGS], Uses = [AX,DX] in
-def DIV16r : I<0xF7, MRM6r, (outs), (ins GR16:$src), // DX:AX/r16 = AX,DX
+def DIV16r : I<0xF7, MRM6r, (outs), (ins GR16:$src), // DX:AX/r16 = AX,DX
"div{w}\t$src", []>, OpSize;
let Defs = [EAX,EDX,EFLAGS], Uses = [EAX,EDX] in
-def DIV32r : I<0xF7, MRM6r, (outs), (ins GR32:$src), // EDX:EAX/r32 = EAX,EDX
+def DIV32r : I<0xF7, MRM6r, (outs), (ins GR32:$src), // EDX:EAX/r32 = EAX,EDX
"div{l}\t$src", []>;
let mayLoad = 1 in {
let Defs = [AL,AH,EFLAGS], Uses = [AX] in
-def DIV8m : I<0xF6, MRM6m, (outs), (ins i8mem:$src), // AX/[mem8] = AL,AH
+def DIV8m : I<0xF6, MRM6m, (outs), (ins i8mem:$src), // AX/[mem8] = AL,AH
"div{b}\t$src", []>;
let Defs = [AX,DX,EFLAGS], Uses = [AX,DX] in
-def DIV16m : I<0xF7, MRM6m, (outs), (ins i16mem:$src), // DX:AX/[mem16] = AX,DX
+def DIV16m : I<0xF7, MRM6m, (outs), (ins i16mem:$src), // DX:AX/[mem16] = AX,DX
"div{w}\t$src", []>, OpSize;
let Defs = [EAX,EDX,EFLAGS], Uses = [EAX,EDX] in
-def DIV32m : I<0xF7, MRM6m, (outs), (ins i32mem:$src), // EDX:EAX/[mem32] = EAX,EDX
+ // EDX:EAX/[mem32] = EAX,EDX
+def DIV32m : I<0xF7, MRM6m, (outs), (ins i32mem:$src),
"div{l}\t$src", []>;
}
// Signed division/remainder.
let Defs = [AL,AH,EFLAGS], Uses = [AX] in
-def IDIV8r : I<0xF6, MRM7r, (outs), (ins GR8:$src), // AX/r8 = AL,AH
+def IDIV8r : I<0xF6, MRM7r, (outs), (ins GR8:$src), // AX/r8 = AL,AH
"idiv{b}\t$src", []>;
let Defs = [AX,DX,EFLAGS], Uses = [AX,DX] in
-def IDIV16r: I<0xF7, MRM7r, (outs), (ins GR16:$src), // DX:AX/r16 = AX,DX
+def IDIV16r: I<0xF7, MRM7r, (outs), (ins GR16:$src), // DX:AX/r16 = AX,DX
"idiv{w}\t$src", []>, OpSize;
let Defs = [EAX,EDX,EFLAGS], Uses = [EAX,EDX] in
-def IDIV32r: I<0xF7, MRM7r, (outs), (ins GR32:$src), // EDX:EAX/r32 = EAX,EDX
+def IDIV32r: I<0xF7, MRM7r, (outs), (ins GR32:$src), // EDX:EAX/r32 = EAX,EDX
"idiv{l}\t$src", []>;
let mayLoad = 1 in {
let Defs = [AL,AH,EFLAGS], Uses = [AX] in
-def IDIV8m : I<0xF6, MRM7m, (outs), (ins i8mem:$src), // AX/[mem8] = AL,AH
+def IDIV8m : I<0xF6, MRM7m, (outs), (ins i8mem:$src), // AX/[mem8] = AL,AH
"idiv{b}\t$src", []>;
let Defs = [AX,DX,EFLAGS], Uses = [AX,DX] in
-def IDIV16m: I<0xF7, MRM7m, (outs), (ins i16mem:$src), // DX:AX/[mem16] = AX,DX
+def IDIV16m: I<0xF7, MRM7m, (outs), (ins i16mem:$src), // DX:AX/[mem16] = AX,DX
"idiv{w}\t$src", []>, OpSize;
let Defs = [EAX,EDX,EFLAGS], Uses = [EAX,EDX] in
-def IDIV32m: I<0xF7, MRM7m, (outs), (ins i32mem:$src), // EDX:EAX/[mem32] = EAX,EDX
+def IDIV32m: I<0xF7, MRM7m, (outs), (ins i32mem:$src),
+ // EDX:EAX/[mem32] = EAX,EDX
"idiv{l}\t$src", []>;
}
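The comments above compress the implicit-register contract of div/idiv; spelled out for the 32-bit unsigned case as a C++ sketch (assumes a nonzero divisor and a quotient that fits in 32 bits, since hardware faults otherwise):

#include <cstdint>

// div r/m32: EDX:EAX / src; quotient -> EAX, remainder -> EDX.
void div32(uint32_t &eax, uint32_t &edx, uint32_t src) {
  uint64_t dividend = ((uint64_t)edx << 32) | eax;
  eax = (uint32_t)(dividend / src);
  edx = (uint32_t)(dividend % src);
}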
@@ -1148,193 +1203,193 @@ def CMOV_GR8 : I<0, Pseudo,
let isCommutable = 1 in {
def CMOVB16rr : I<0x42, MRMSrcReg, // if <u, GR16 = GR16
(outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
- "cmovb\t{$src2, $dst|$dst, $src2}",
+ "cmovb{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, GR16:$src2,
X86_COND_B, EFLAGS))]>,
TB, OpSize;
def CMOVB32rr : I<0x42, MRMSrcReg, // if <u, GR32 = GR32
(outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
- "cmovb\t{$src2, $dst|$dst, $src2}",
+ "cmovb{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, GR32:$src2,
X86_COND_B, EFLAGS))]>,
TB;
def CMOVAE16rr: I<0x43, MRMSrcReg, // if >=u, GR16 = GR16
(outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
- "cmovae\t{$src2, $dst|$dst, $src2}",
+ "cmovae{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, GR16:$src2,
X86_COND_AE, EFLAGS))]>,
TB, OpSize;
def CMOVAE32rr: I<0x43, MRMSrcReg, // if >=u, GR32 = GR32
(outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
- "cmovae\t{$src2, $dst|$dst, $src2}",
+ "cmovae{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, GR32:$src2,
X86_COND_AE, EFLAGS))]>,
TB;
def CMOVE16rr : I<0x44, MRMSrcReg, // if ==, GR16 = GR16
(outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
- "cmove\t{$src2, $dst|$dst, $src2}",
+ "cmove{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, GR16:$src2,
X86_COND_E, EFLAGS))]>,
TB, OpSize;
def CMOVE32rr : I<0x44, MRMSrcReg, // if ==, GR32 = GR32
(outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
- "cmove\t{$src2, $dst|$dst, $src2}",
+ "cmove{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, GR32:$src2,
X86_COND_E, EFLAGS))]>,
TB;
def CMOVNE16rr: I<0x45, MRMSrcReg, // if !=, GR16 = GR16
(outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
- "cmovne\t{$src2, $dst|$dst, $src2}",
+ "cmovne{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, GR16:$src2,
X86_COND_NE, EFLAGS))]>,
TB, OpSize;
def CMOVNE32rr: I<0x45, MRMSrcReg, // if !=, GR32 = GR32
(outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
- "cmovne\t{$src2, $dst|$dst, $src2}",
+ "cmovne{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, GR32:$src2,
X86_COND_NE, EFLAGS))]>,
TB;
def CMOVBE16rr: I<0x46, MRMSrcReg, // if <=u, GR16 = GR16
(outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
- "cmovbe\t{$src2, $dst|$dst, $src2}",
+ "cmovbe{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, GR16:$src2,
X86_COND_BE, EFLAGS))]>,
TB, OpSize;
def CMOVBE32rr: I<0x46, MRMSrcReg, // if <=u, GR32 = GR32
(outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
- "cmovbe\t{$src2, $dst|$dst, $src2}",
+ "cmovbe{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, GR32:$src2,
X86_COND_BE, EFLAGS))]>,
TB;
def CMOVA16rr : I<0x47, MRMSrcReg, // if >u, GR16 = GR16
(outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
- "cmova\t{$src2, $dst|$dst, $src2}",
+ "cmova{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, GR16:$src2,
X86_COND_A, EFLAGS))]>,
TB, OpSize;
def CMOVA32rr : I<0x47, MRMSrcReg, // if >u, GR32 = GR32
(outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
- "cmova\t{$src2, $dst|$dst, $src2}",
+ "cmova{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, GR32:$src2,
X86_COND_A, EFLAGS))]>,
TB;
def CMOVL16rr : I<0x4C, MRMSrcReg, // if <s, GR16 = GR16
(outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
- "cmovl\t{$src2, $dst|$dst, $src2}",
+ "cmovl{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, GR16:$src2,
X86_COND_L, EFLAGS))]>,
TB, OpSize;
def CMOVL32rr : I<0x4C, MRMSrcReg, // if <s, GR32 = GR32
(outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
- "cmovl\t{$src2, $dst|$dst, $src2}",
+ "cmovl{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, GR32:$src2,
X86_COND_L, EFLAGS))]>,
TB;
def CMOVGE16rr: I<0x4D, MRMSrcReg, // if >=s, GR16 = GR16
(outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
- "cmovge\t{$src2, $dst|$dst, $src2}",
+ "cmovge{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, GR16:$src2,
X86_COND_GE, EFLAGS))]>,
TB, OpSize;
def CMOVGE32rr: I<0x4D, MRMSrcReg, // if >=s, GR32 = GR32
(outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
- "cmovge\t{$src2, $dst|$dst, $src2}",
+ "cmovge{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, GR32:$src2,
X86_COND_GE, EFLAGS))]>,
TB;
def CMOVLE16rr: I<0x4E, MRMSrcReg, // if <=s, GR16 = GR16
(outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
- "cmovle\t{$src2, $dst|$dst, $src2}",
+ "cmovle{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, GR16:$src2,
X86_COND_LE, EFLAGS))]>,
TB, OpSize;
def CMOVLE32rr: I<0x4E, MRMSrcReg, // if <=s, GR32 = GR32
(outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
- "cmovle\t{$src2, $dst|$dst, $src2}",
+ "cmovle{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, GR32:$src2,
X86_COND_LE, EFLAGS))]>,
TB;
def CMOVG16rr : I<0x4F, MRMSrcReg, // if >s, GR16 = GR16
(outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
- "cmovg\t{$src2, $dst|$dst, $src2}",
+ "cmovg{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, GR16:$src2,
X86_COND_G, EFLAGS))]>,
TB, OpSize;
def CMOVG32rr : I<0x4F, MRMSrcReg, // if >s, GR32 = GR32
(outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
- "cmovg\t{$src2, $dst|$dst, $src2}",
+ "cmovg{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, GR32:$src2,
X86_COND_G, EFLAGS))]>,
TB;
def CMOVS16rr : I<0x48, MRMSrcReg, // if signed, GR16 = GR16
(outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
- "cmovs\t{$src2, $dst|$dst, $src2}",
+ "cmovs{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, GR16:$src2,
X86_COND_S, EFLAGS))]>,
TB, OpSize;
def CMOVS32rr : I<0x48, MRMSrcReg, // if signed, GR32 = GR32
(outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
- "cmovs\t{$src2, $dst|$dst, $src2}",
+ "cmovs{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, GR32:$src2,
X86_COND_S, EFLAGS))]>,
TB;
def CMOVNS16rr: I<0x49, MRMSrcReg, // if !signed, GR16 = GR16
(outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
- "cmovns\t{$src2, $dst|$dst, $src2}",
+ "cmovns{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, GR16:$src2,
X86_COND_NS, EFLAGS))]>,
TB, OpSize;
def CMOVNS32rr: I<0x49, MRMSrcReg, // if !signed, GR32 = GR32
(outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
- "cmovns\t{$src2, $dst|$dst, $src2}",
+ "cmovns{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, GR32:$src2,
X86_COND_NS, EFLAGS))]>,
TB;
def CMOVP16rr : I<0x4A, MRMSrcReg, // if parity, GR16 = GR16
(outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
- "cmovp\t{$src2, $dst|$dst, $src2}",
+ "cmovp{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, GR16:$src2,
X86_COND_P, EFLAGS))]>,
TB, OpSize;
def CMOVP32rr : I<0x4A, MRMSrcReg, // if parity, GR32 = GR32
(outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
- "cmovp\t{$src2, $dst|$dst, $src2}",
+ "cmovp{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, GR32:$src2,
X86_COND_P, EFLAGS))]>,
TB;
def CMOVNP16rr : I<0x4B, MRMSrcReg, // if !parity, GR16 = GR16
(outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
- "cmovnp\t{$src2, $dst|$dst, $src2}",
+ "cmovnp{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, GR16:$src2,
X86_COND_NP, EFLAGS))]>,
TB, OpSize;
def CMOVNP32rr : I<0x4B, MRMSrcReg, // if !parity, GR32 = GR32
(outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
- "cmovnp\t{$src2, $dst|$dst, $src2}",
+ "cmovnp{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, GR32:$src2,
X86_COND_NP, EFLAGS))]>,
TB;
def CMOVO16rr : I<0x40, MRMSrcReg, // if overflow, GR16 = GR16
(outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
- "cmovo\t{$src2, $dst|$dst, $src2}",
+ "cmovo{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, GR16:$src2,
X86_COND_O, EFLAGS))]>,
TB, OpSize;
def CMOVO32rr : I<0x40, MRMSrcReg, // if overflow, GR32 = GR32
(outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
- "cmovo\t{$src2, $dst|$dst, $src2}",
+ "cmovo{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, GR32:$src2,
X86_COND_O, EFLAGS))]>,
TB;
def CMOVNO16rr : I<0x41, MRMSrcReg, // if !overflow, GR16 = GR16
(outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
- "cmovno\t{$src2, $dst|$dst, $src2}",
+ "cmovno{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, GR16:$src2,
X86_COND_NO, EFLAGS))]>,
TB, OpSize;
def CMOVNO32rr : I<0x41, MRMSrcReg, // if !overflow, GR32 = GR32
(outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
- "cmovno\t{$src2, $dst|$dst, $src2}",
+ "cmovno{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, GR32:$src2,
X86_COND_NO, EFLAGS))]>,
TB;
@@ -1342,193 +1397,193 @@ def CMOVNO32rr : I<0x41, MRMSrcReg, // if !overflow, GR32 = GR32
def CMOVB16rm : I<0x42, MRMSrcMem, // if <u, GR16 = [mem16]
(outs GR16:$dst), (ins GR16:$src1, i16mem:$src2),
- "cmovb\t{$src2, $dst|$dst, $src2}",
+ "cmovb{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, (loadi16 addr:$src2),
X86_COND_B, EFLAGS))]>,
TB, OpSize;
def CMOVB32rm : I<0x42, MRMSrcMem, // if <u, GR32 = [mem32]
(outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
- "cmovb\t{$src2, $dst|$dst, $src2}",
+ "cmovb{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, (loadi32 addr:$src2),
X86_COND_B, EFLAGS))]>,
TB;
def CMOVAE16rm: I<0x43, MRMSrcMem, // if >=u, GR16 = [mem16]
(outs GR16:$dst), (ins GR16:$src1, i16mem:$src2),
- "cmovae\t{$src2, $dst|$dst, $src2}",
+ "cmovae{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, (loadi16 addr:$src2),
X86_COND_AE, EFLAGS))]>,
TB, OpSize;
def CMOVAE32rm: I<0x43, MRMSrcMem, // if >=u, GR32 = [mem32]
(outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
- "cmovae\t{$src2, $dst|$dst, $src2}",
+ "cmovae{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, (loadi32 addr:$src2),
X86_COND_AE, EFLAGS))]>,
TB;
def CMOVE16rm : I<0x44, MRMSrcMem, // if ==, GR16 = [mem16]
(outs GR16:$dst), (ins GR16:$src1, i16mem:$src2),
- "cmove\t{$src2, $dst|$dst, $src2}",
+ "cmove{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, (loadi16 addr:$src2),
X86_COND_E, EFLAGS))]>,
TB, OpSize;
def CMOVE32rm : I<0x44, MRMSrcMem, // if ==, GR32 = [mem32]
(outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
- "cmove\t{$src2, $dst|$dst, $src2}",
+ "cmove{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, (loadi32 addr:$src2),
X86_COND_E, EFLAGS))]>,
TB;
def CMOVNE16rm: I<0x45, MRMSrcMem, // if !=, GR16 = [mem16]
(outs GR16:$dst), (ins GR16:$src1, i16mem:$src2),
- "cmovne\t{$src2, $dst|$dst, $src2}",
+ "cmovne{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, (loadi16 addr:$src2),
X86_COND_NE, EFLAGS))]>,
TB, OpSize;
def CMOVNE32rm: I<0x45, MRMSrcMem, // if !=, GR32 = [mem32]
(outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
- "cmovne\t{$src2, $dst|$dst, $src2}",
+ "cmovne{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, (loadi32 addr:$src2),
X86_COND_NE, EFLAGS))]>,
TB;
def CMOVBE16rm: I<0x46, MRMSrcMem, // if <=u, GR16 = [mem16]
(outs GR16:$dst), (ins GR16:$src1, i16mem:$src2),
- "cmovbe\t{$src2, $dst|$dst, $src2}",
+ "cmovbe{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, (loadi16 addr:$src2),
X86_COND_BE, EFLAGS))]>,
TB, OpSize;
def CMOVBE32rm: I<0x46, MRMSrcMem, // if <=u, GR32 = [mem32]
(outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
- "cmovbe\t{$src2, $dst|$dst, $src2}",
+ "cmovbe{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, (loadi32 addr:$src2),
X86_COND_BE, EFLAGS))]>,
TB;
def CMOVA16rm : I<0x47, MRMSrcMem, // if >u, GR16 = [mem16]
(outs GR16:$dst), (ins GR16:$src1, i16mem:$src2),
- "cmova\t{$src2, $dst|$dst, $src2}",
+ "cmova{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, (loadi16 addr:$src2),
X86_COND_A, EFLAGS))]>,
TB, OpSize;
def CMOVA32rm : I<0x47, MRMSrcMem, // if >u, GR32 = [mem32]
(outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
- "cmova\t{$src2, $dst|$dst, $src2}",
+ "cmova{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, (loadi32 addr:$src2),
X86_COND_A, EFLAGS))]>,
TB;
def CMOVL16rm : I<0x4C, MRMSrcMem, // if <s, GR16 = [mem16]
(outs GR16:$dst), (ins GR16:$src1, i16mem:$src2),
- "cmovl\t{$src2, $dst|$dst, $src2}",
+ "cmovl{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, (loadi16 addr:$src2),
X86_COND_L, EFLAGS))]>,
TB, OpSize;
def CMOVL32rm : I<0x4C, MRMSrcMem, // if <s, GR32 = [mem32]
(outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
- "cmovl\t{$src2, $dst|$dst, $src2}",
+ "cmovl{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, (loadi32 addr:$src2),
X86_COND_L, EFLAGS))]>,
TB;
def CMOVGE16rm: I<0x4D, MRMSrcMem, // if >=s, GR16 = [mem16]
(outs GR16:$dst), (ins GR16:$src1, i16mem:$src2),
- "cmovge\t{$src2, $dst|$dst, $src2}",
+ "cmovge{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, (loadi16 addr:$src2),
X86_COND_GE, EFLAGS))]>,
TB, OpSize;
def CMOVGE32rm: I<0x4D, MRMSrcMem, // if >=s, GR32 = [mem32]
(outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
- "cmovge\t{$src2, $dst|$dst, $src2}",
+ "cmovge{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, (loadi32 addr:$src2),
X86_COND_GE, EFLAGS))]>,
TB;
def CMOVLE16rm: I<0x4E, MRMSrcMem, // if <=s, GR16 = [mem16]
(outs GR16:$dst), (ins GR16:$src1, i16mem:$src2),
- "cmovle\t{$src2, $dst|$dst, $src2}",
+ "cmovle{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, (loadi16 addr:$src2),
X86_COND_LE, EFLAGS))]>,
TB, OpSize;
def CMOVLE32rm: I<0x4E, MRMSrcMem, // if <=s, GR32 = [mem32]
(outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
- "cmovle\t{$src2, $dst|$dst, $src2}",
+ "cmovle{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, (loadi32 addr:$src2),
X86_COND_LE, EFLAGS))]>,
TB;
def CMOVG16rm : I<0x4F, MRMSrcMem, // if >s, GR16 = [mem16]
(outs GR16:$dst), (ins GR16:$src1, i16mem:$src2),
- "cmovg\t{$src2, $dst|$dst, $src2}",
+ "cmovg{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, (loadi16 addr:$src2),
X86_COND_G, EFLAGS))]>,
TB, OpSize;
def CMOVG32rm : I<0x4F, MRMSrcMem, // if >s, GR32 = [mem32]
(outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
- "cmovg\t{$src2, $dst|$dst, $src2}",
+ "cmovg{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, (loadi32 addr:$src2),
X86_COND_G, EFLAGS))]>,
TB;
def CMOVS16rm : I<0x48, MRMSrcMem, // if signed, GR16 = [mem16]
(outs GR16:$dst), (ins GR16:$src1, i16mem:$src2),
- "cmovs\t{$src2, $dst|$dst, $src2}",
+ "cmovs{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, (loadi16 addr:$src2),
X86_COND_S, EFLAGS))]>,
TB, OpSize;
def CMOVS32rm : I<0x48, MRMSrcMem, // if signed, GR32 = [mem32]
(outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
- "cmovs\t{$src2, $dst|$dst, $src2}",
+ "cmovs{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, (loadi32 addr:$src2),
X86_COND_S, EFLAGS))]>,
TB;
def CMOVNS16rm: I<0x49, MRMSrcMem, // if !signed, GR16 = [mem16]
(outs GR16:$dst), (ins GR16:$src1, i16mem:$src2),
- "cmovns\t{$src2, $dst|$dst, $src2}",
+ "cmovns{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, (loadi16 addr:$src2),
X86_COND_NS, EFLAGS))]>,
TB, OpSize;
def CMOVNS32rm: I<0x49, MRMSrcMem, // if !signed, GR32 = [mem32]
(outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
- "cmovns\t{$src2, $dst|$dst, $src2}",
+ "cmovns{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, (loadi32 addr:$src2),
X86_COND_NS, EFLAGS))]>,
TB;
def CMOVP16rm : I<0x4A, MRMSrcMem, // if parity, GR16 = [mem16]
(outs GR16:$dst), (ins GR16:$src1, i16mem:$src2),
- "cmovp\t{$src2, $dst|$dst, $src2}",
+ "cmovp{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, (loadi16 addr:$src2),
X86_COND_P, EFLAGS))]>,
TB, OpSize;
def CMOVP32rm : I<0x4A, MRMSrcMem, // if parity, GR32 = [mem32]
(outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
- "cmovp\t{$src2, $dst|$dst, $src2}",
+ "cmovp{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, (loadi32 addr:$src2),
X86_COND_P, EFLAGS))]>,
TB;
def CMOVNP16rm : I<0x4B, MRMSrcMem, // if !parity, GR16 = [mem16]
(outs GR16:$dst), (ins GR16:$src1, i16mem:$src2),
- "cmovnp\t{$src2, $dst|$dst, $src2}",
+ "cmovnp{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, (loadi16 addr:$src2),
X86_COND_NP, EFLAGS))]>,
TB, OpSize;
def CMOVNP32rm : I<0x4B, MRMSrcMem, // if !parity, GR32 = [mem32]
(outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
- "cmovnp\t{$src2, $dst|$dst, $src2}",
+ "cmovnp{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, (loadi32 addr:$src2),
X86_COND_NP, EFLAGS))]>,
TB;
def CMOVO16rm : I<0x40, MRMSrcMem, // if overflow, GR16 = [mem16]
(outs GR16:$dst), (ins GR16:$src1, i16mem:$src2),
- "cmovo\t{$src2, $dst|$dst, $src2}",
+ "cmovo{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, (loadi16 addr:$src2),
X86_COND_O, EFLAGS))]>,
TB, OpSize;
def CMOVO32rm : I<0x40, MRMSrcMem, // if overflow, GR32 = [mem32]
(outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
- "cmovo\t{$src2, $dst|$dst, $src2}",
+ "cmovo{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, (loadi32 addr:$src2),
X86_COND_O, EFLAGS))]>,
TB;
def CMOVNO16rm : I<0x41, MRMSrcMem, // if !overflow, GR16 = [mem16]
(outs GR16:$dst), (ins GR16:$src1, i16mem:$src2),
- "cmovno\t{$src2, $dst|$dst, $src2}",
+ "cmovno{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (X86cmov GR16:$src1, (loadi16 addr:$src2),
X86_COND_NO, EFLAGS))]>,
TB, OpSize;
def CMOVNO32rm : I<0x41, MRMSrcMem, // if !overflow, GR32 = [mem32]
(outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
- "cmovno\t{$src2, $dst|$dst, $src2}",
+ "cmovno{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (X86cmov GR32:$src1, (loadi32 addr:$src2),
X86_COND_NO, EFLAGS))]>,
TB;
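Every cmov form above encodes the same data movement; only the EFLAGS predicate differs. As a one-function C++ sketch (the bool stands in for the condition-code test):

#include <cstdint>

// cmovCC dst, src: dst is overwritten only if the condition holds;
// EFLAGS are read, never written.
uint32_t cmov32(bool cc, uint32_t dst, uint32_t src) {
  return cc ? src : dst;
}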
@@ -1586,11 +1641,13 @@ def INC8r : I<0xFE, MRM0r, (outs GR8 :$dst), (ins GR8 :$src), "inc{b}\t$dst",
[(set GR8:$dst, (add GR8:$src, 1)),
(implicit EFLAGS)]>;
let isConvertibleToThreeAddress = 1, CodeSize = 1 in { // Can xform into LEA.
-def INC16r : I<0x40, AddRegFrm, (outs GR16:$dst), (ins GR16:$src), "inc{w}\t$dst",
+def INC16r : I<0x40, AddRegFrm, (outs GR16:$dst), (ins GR16:$src),
+ "inc{w}\t$dst",
[(set GR16:$dst, (add GR16:$src, 1)),
(implicit EFLAGS)]>,
OpSize, Requires<[In32BitMode]>;
-def INC32r : I<0x40, AddRegFrm, (outs GR32:$dst), (ins GR32:$src), "inc{l}\t$dst",
+def INC32r : I<0x40, AddRegFrm, (outs GR32:$dst), (ins GR32:$src),
+ "inc{l}\t$dst",
[(set GR32:$dst, (add GR32:$src, 1)),
(implicit EFLAGS)]>, Requires<[In32BitMode]>;
}
@@ -1613,11 +1670,13 @@ def DEC8r : I<0xFE, MRM1r, (outs GR8 :$dst), (ins GR8 :$src), "dec{b}\t$dst",
[(set GR8:$dst, (add GR8:$src, -1)),
(implicit EFLAGS)]>;
let isConvertibleToThreeAddress = 1, CodeSize = 1 in { // Can xform into LEA.
-def DEC16r : I<0x48, AddRegFrm, (outs GR16:$dst), (ins GR16:$src), "dec{w}\t$dst",
+def DEC16r : I<0x48, AddRegFrm, (outs GR16:$dst), (ins GR16:$src),
+ "dec{w}\t$dst",
[(set GR16:$dst, (add GR16:$src, -1)),
(implicit EFLAGS)]>,
OpSize, Requires<[In32BitMode]>;
-def DEC32r : I<0x48, AddRegFrm, (outs GR32:$dst), (ins GR32:$src), "dec{l}\t$dst",
+def DEC32r : I<0x48, AddRegFrm, (outs GR32:$dst), (ins GR32:$src),
+ "dec{l}\t$dst",
[(set GR32:$dst, (add GR32:$src, -1)),
(implicit EFLAGS)]>, Requires<[In32BitMode]>;
}
@@ -1657,6 +1716,17 @@ def AND32rr : I<0x21, MRMDestReg,
(implicit EFLAGS)]>;
}
+// AND instructions with the destination register in REG and the source register
+// in R/M. Included for the disassembler.
+def AND8rr_REV : I<0x22, MRMSrcReg, (outs GR8:$dst), (ins GR8:$src1, GR8:$src2),
+ "and{b}\t{$src2, $dst|$dst, $src2}", []>;
+def AND16rr_REV : I<0x23, MRMSrcReg, (outs GR16:$dst),
+ (ins GR16:$src1, GR16:$src2),
+ "and{w}\t{$src2, $dst|$dst, $src2}", []>, OpSize;
+def AND32rr_REV : I<0x23, MRMSrcReg, (outs GR32:$dst),
+ (ins GR32:$src1, GR32:$src2),
+ "and{l}\t{$src2, $dst|$dst, $src2}", []>;
+
def AND8rm : I<0x22, MRMSrcMem,
(outs GR8 :$dst), (ins GR8 :$src1, i8mem :$src2),
"and{b}\t{$src2, $dst|$dst, $src2}",
@@ -1756,50 +1826,73 @@ let isTwoAddress = 0 in {
let isCommutable = 1 in { // X = OR Y, Z --> X = OR Z, Y
-def OR8rr : I<0x08, MRMDestReg, (outs GR8 :$dst), (ins GR8 :$src1, GR8 :$src2),
+def OR8rr : I<0x08, MRMDestReg, (outs GR8 :$dst),
+ (ins GR8 :$src1, GR8 :$src2),
"or{b}\t{$src2, $dst|$dst, $src2}",
[(set GR8:$dst, (or GR8:$src1, GR8:$src2)),
(implicit EFLAGS)]>;
-def OR16rr : I<0x09, MRMDestReg, (outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
+def OR16rr : I<0x09, MRMDestReg, (outs GR16:$dst),
+ (ins GR16:$src1, GR16:$src2),
"or{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (or GR16:$src1, GR16:$src2)),
(implicit EFLAGS)]>, OpSize;
-def OR32rr : I<0x09, MRMDestReg, (outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
+def OR32rr : I<0x09, MRMDestReg, (outs GR32:$dst),
+ (ins GR32:$src1, GR32:$src2),
"or{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (or GR32:$src1, GR32:$src2)),
(implicit EFLAGS)]>;
}
-def OR8rm : I<0x0A, MRMSrcMem , (outs GR8 :$dst), (ins GR8 :$src1, i8mem :$src2),
+
+// OR instructions with the destination register in REG and the source register
+// in R/M. Included for the disassembler.
+def OR8rr_REV : I<0x0A, MRMSrcReg, (outs GR8:$dst), (ins GR8:$src1, GR8:$src2),
+ "or{b}\t{$src2, $dst|$dst, $src2}", []>;
+def OR16rr_REV : I<0x0B, MRMSrcReg, (outs GR16:$dst),
+ (ins GR16:$src1, GR16:$src2),
+ "or{w}\t{$src2, $dst|$dst, $src2}", []>, OpSize;
+def OR32rr_REV : I<0x0B, MRMSrcReg, (outs GR32:$dst),
+ (ins GR32:$src1, GR32:$src2),
+ "or{l}\t{$src2, $dst|$dst, $src2}", []>;
+
+def OR8rm : I<0x0A, MRMSrcMem , (outs GR8 :$dst),
+ (ins GR8 :$src1, i8mem :$src2),
"or{b}\t{$src2, $dst|$dst, $src2}",
[(set GR8:$dst, (or GR8:$src1, (load addr:$src2))),
(implicit EFLAGS)]>;
-def OR16rm : I<0x0B, MRMSrcMem , (outs GR16:$dst), (ins GR16:$src1, i16mem:$src2),
+def OR16rm : I<0x0B, MRMSrcMem , (outs GR16:$dst),
+ (ins GR16:$src1, i16mem:$src2),
"or{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (or GR16:$src1, (load addr:$src2))),
(implicit EFLAGS)]>, OpSize;
-def OR32rm : I<0x0B, MRMSrcMem , (outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
+def OR32rm : I<0x0B, MRMSrcMem , (outs GR32:$dst),
+ (ins GR32:$src1, i32mem:$src2),
"or{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (or GR32:$src1, (load addr:$src2))),
(implicit EFLAGS)]>;
-def OR8ri : Ii8 <0x80, MRM1r, (outs GR8 :$dst), (ins GR8 :$src1, i8imm:$src2),
+def OR8ri : Ii8 <0x80, MRM1r, (outs GR8 :$dst),
+ (ins GR8 :$src1, i8imm:$src2),
"or{b}\t{$src2, $dst|$dst, $src2}",
[(set GR8:$dst, (or GR8:$src1, imm:$src2)),
(implicit EFLAGS)]>;
-def OR16ri : Ii16<0x81, MRM1r, (outs GR16:$dst), (ins GR16:$src1, i16imm:$src2),
+def OR16ri : Ii16<0x81, MRM1r, (outs GR16:$dst),
+ (ins GR16:$src1, i16imm:$src2),
"or{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (or GR16:$src1, imm:$src2)),
(implicit EFLAGS)]>, OpSize;
-def OR32ri : Ii32<0x81, MRM1r, (outs GR32:$dst), (ins GR32:$src1, i32imm:$src2),
+def OR32ri : Ii32<0x81, MRM1r, (outs GR32:$dst),
+ (ins GR32:$src1, i32imm:$src2),
"or{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (or GR32:$src1, imm:$src2)),
(implicit EFLAGS)]>;
-def OR16ri8 : Ii8<0x83, MRM1r, (outs GR16:$dst), (ins GR16:$src1, i16i8imm:$src2),
+def OR16ri8 : Ii8<0x83, MRM1r, (outs GR16:$dst),
+ (ins GR16:$src1, i16i8imm:$src2),
"or{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (or GR16:$src1, i16immSExt8:$src2)),
(implicit EFLAGS)]>, OpSize;
-def OR32ri8 : Ii8<0x83, MRM1r, (outs GR32:$dst), (ins GR32:$src1, i32i8imm:$src2),
+def OR32ri8 : Ii8<0x83, MRM1r, (outs GR32:$dst),
+ (ins GR32:$src1, i32i8imm:$src2),
"or{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (or GR32:$src1, i32immSExt8:$src2)),
(implicit EFLAGS)]>;
@@ -1866,6 +1959,17 @@ let isCommutable = 1 in { // X = XOR Y, Z --> X = XOR Z, Y
(implicit EFLAGS)]>;
} // isCommutable = 1
+// XOR instructions with the destination register in REG and the source register
+// in R/M. Included for the disassembler.
+def XOR8rr_REV : I<0x32, MRMSrcReg, (outs GR8:$dst), (ins GR8:$src1, GR8:$src2),
+ "xor{b}\t{$src2, $dst|$dst, $src2}", []>;
+def XOR16rr_REV : I<0x33, MRMSrcReg, (outs GR16:$dst),
+ (ins GR16:$src1, GR16:$src2),
+ "xor{w}\t{$src2, $dst|$dst, $src2}", []>, OpSize;
+def XOR32rr_REV : I<0x33, MRMSrcReg, (outs GR32:$dst),
+ (ins GR32:$src1, GR32:$src2),
+ "xor{l}\t{$src2, $dst|$dst, $src2}", []>;
+
def XOR8rm : I<0x32, MRMSrcMem ,
(outs GR8 :$dst), (ins GR8:$src1, i8mem :$src2),
"xor{b}\t{$src2, $dst|$dst, $src2}",
@@ -2205,7 +2309,8 @@ def RCL16mCL : I<0xD3, MRM2m, (outs i16mem:$dst), (ins i16mem:$src),
}
def RCL16ri : Ii8<0xC1, MRM2r, (outs GR16:$dst), (ins GR16:$src, i8imm:$cnt),
"rcl{w}\t{$cnt, $dst|$dst, $cnt}", []>, OpSize;
-def RCL16mi : Ii8<0xC1, MRM2m, (outs i16mem:$dst), (ins i16mem:$src, i8imm:$cnt),
+def RCL16mi : Ii8<0xC1, MRM2m, (outs i16mem:$dst),
+ (ins i16mem:$src, i8imm:$cnt),
"rcl{w}\t{$cnt, $dst|$dst, $cnt}", []>, OpSize;
def RCL32r1 : I<0xD1, MRM2r, (outs GR32:$dst), (ins GR32:$src),
@@ -2220,7 +2325,8 @@ def RCL32mCL : I<0xD3, MRM2m, (outs i32mem:$dst), (ins i32mem:$src),
}
def RCL32ri : Ii8<0xC1, MRM2r, (outs GR32:$dst), (ins GR32:$src, i8imm:$cnt),
"rcl{l}\t{$cnt, $dst|$dst, $cnt}", []>;
-def RCL32mi : Ii8<0xC1, MRM2m, (outs i32mem:$dst), (ins i32mem:$src, i8imm:$cnt),
+def RCL32mi : Ii8<0xC1, MRM2m, (outs i32mem:$dst),
+ (ins i32mem:$src, i8imm:$cnt),
"rcl{l}\t{$cnt, $dst|$dst, $cnt}", []>;
def RCR8r1 : I<0xD0, MRM3r, (outs GR8:$dst), (ins GR8:$src),
@@ -2250,7 +2356,8 @@ def RCR16mCL : I<0xD3, MRM3m, (outs i16mem:$dst), (ins i16mem:$src),
}
def RCR16ri : Ii8<0xC1, MRM3r, (outs GR16:$dst), (ins GR16:$src, i8imm:$cnt),
"rcr{w}\t{$cnt, $dst|$dst, $cnt}", []>, OpSize;
-def RCR16mi : Ii8<0xC1, MRM3m, (outs i16mem:$dst), (ins i16mem:$src, i8imm:$cnt),
+def RCR16mi : Ii8<0xC1, MRM3m, (outs i16mem:$dst),
+ (ins i16mem:$src, i8imm:$cnt),
"rcr{w}\t{$cnt, $dst|$dst, $cnt}", []>, OpSize;
def RCR32r1 : I<0xD1, MRM3r, (outs GR32:$dst), (ins GR32:$src),
@@ -2265,7 +2372,8 @@ def RCR32mCL : I<0xD3, MRM3m, (outs i32mem:$dst), (ins i32mem:$src),
}
def RCR32ri : Ii8<0xC1, MRM3r, (outs GR32:$dst), (ins GR32:$src, i8imm:$cnt),
"rcr{l}\t{$cnt, $dst|$dst, $cnt}", []>;
-def RCR32mi : Ii8<0xC1, MRM3m, (outs i32mem:$dst), (ins i32mem:$src, i8imm:$cnt),
+def RCR32mi : Ii8<0xC1, MRM3m, (outs i32mem:$dst),
+ (ins i32mem:$src, i8imm:$cnt),
"rcr{l}\t{$cnt, $dst|$dst, $cnt}", []>;
// FIXME: provide shorter instructions when imm8 == 1
@@ -2286,7 +2394,8 @@ def ROL8ri : Ii8<0xC0, MRM0r, (outs GR8 :$dst), (ins GR8 :$src1, i8imm:$src2),
[(set GR8:$dst, (rotl GR8:$src1, (i8 imm:$src2)))]>;
def ROL16ri : Ii8<0xC1, MRM0r, (outs GR16:$dst), (ins GR16:$src1, i8imm:$src2),
"rol{w}\t{$src2, $dst|$dst, $src2}",
- [(set GR16:$dst, (rotl GR16:$src1, (i8 imm:$src2)))]>, OpSize;
+ [(set GR16:$dst, (rotl GR16:$src1, (i8 imm:$src2)))]>,
+ OpSize;
def ROL32ri : Ii8<0xC1, MRM0r, (outs GR32:$dst), (ins GR32:$src1, i8imm:$src2),
"rol{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (rotl GR32:$src1, (i8 imm:$src2)))]>;
@@ -2355,7 +2464,8 @@ def ROR8ri : Ii8<0xC0, MRM1r, (outs GR8 :$dst), (ins GR8 :$src1, i8imm:$src2),
[(set GR8:$dst, (rotr GR8:$src1, (i8 imm:$src2)))]>;
def ROR16ri : Ii8<0xC1, MRM1r, (outs GR16:$dst), (ins GR16:$src1, i8imm:$src2),
"ror{w}\t{$src2, $dst|$dst, $src2}",
- [(set GR16:$dst, (rotr GR16:$src1, (i8 imm:$src2)))]>, OpSize;
+ [(set GR16:$dst, (rotr GR16:$src1, (i8 imm:$src2)))]>,
+ OpSize;
def ROR32ri : Ii8<0xC1, MRM1r, (outs GR32:$dst), (ins GR32:$src1, i8imm:$src2),
"ror{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (rotr GR32:$src1, (i8 imm:$src2)))]>;
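The plain rotates that the rotl/rotr patterns above select, as a C++ sketch (count masking matches the 32-bit hardware behavior; the cnt == 0 guard avoids an undefined shift by 32):

#include <cstdint>

uint32_t rotl32(uint32_t v, unsigned cnt) {
  cnt &= 31;
  return cnt ? (v << cnt) | (v >> (32 - cnt)) : v;
}
uint32_t rotr32(uint32_t v, unsigned cnt) {
  cnt &= 31;
  return cnt ? (v >> cnt) | (v << (32 - cnt)) : v;
}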
@@ -2411,17 +2521,21 @@ let isTwoAddress = 0 in {
// Double shift instructions (generalizations of rotate)
let Uses = [CL] in {
-def SHLD32rrCL : I<0xA5, MRMDestReg, (outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
+def SHLD32rrCL : I<0xA5, MRMDestReg, (outs GR32:$dst),
+ (ins GR32:$src1, GR32:$src2),
"shld{l}\t{%cl, $src2, $dst|$dst, $src2, CL}",
[(set GR32:$dst, (X86shld GR32:$src1, GR32:$src2, CL))]>, TB;
-def SHRD32rrCL : I<0xAD, MRMDestReg, (outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
+def SHRD32rrCL : I<0xAD, MRMDestReg, (outs GR32:$dst),
+ (ins GR32:$src1, GR32:$src2),
"shrd{l}\t{%cl, $src2, $dst|$dst, $src2, CL}",
[(set GR32:$dst, (X86shrd GR32:$src1, GR32:$src2, CL))]>, TB;
-def SHLD16rrCL : I<0xA5, MRMDestReg, (outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
+def SHLD16rrCL : I<0xA5, MRMDestReg, (outs GR16:$dst),
+ (ins GR16:$src1, GR16:$src2),
"shld{w}\t{%cl, $src2, $dst|$dst, $src2, CL}",
[(set GR16:$dst, (X86shld GR16:$src1, GR16:$src2, CL))]>,
TB, OpSize;
-def SHRD16rrCL : I<0xAD, MRMDestReg, (outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
+def SHRD16rrCL : I<0xAD, MRMDestReg, (outs GR16:$dst),
+ (ins GR16:$src1, GR16:$src2),
"shrd{w}\t{%cl, $src2, $dst|$dst, $src2, CL}",
[(set GR16:$dst, (X86shrd GR16:$src1, GR16:$src2, CL))]>,
TB, OpSize;
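What the X86shld/X86shrd nodes compute, as a C++ sketch for the 32-bit forms (count masked to 5 bits as in hardware; a masked count of 0 is a no-op):

#include <cstdint>

// shld dst, src, cnt: shift dst left, filling the vacated low bits
// from the high bits of src; shrd is the mirror image.
uint32_t shld32(uint32_t dst, uint32_t src, unsigned cnt) {
  cnt &= 31;
  return cnt ? (dst << cnt) | (src >> (32 - cnt)) : dst;
}
uint32_t shrd32(uint32_t dst, uint32_t src, unsigned cnt) {
  cnt &= 31;
  return cnt ? (dst >> cnt) | (src << (32 - cnt)) : dst;
}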
@@ -2429,25 +2543,29 @@ def SHRD16rrCL : I<0xAD, MRMDestReg, (outs GR16:$dst), (ins GR16:$src1, GR16:$sr
let isCommutable = 1 in { // These instructions commute to each other.
def SHLD32rri8 : Ii8<0xA4, MRMDestReg,
- (outs GR32:$dst), (ins GR32:$src1, GR32:$src2, i8imm:$src3),
+ (outs GR32:$dst),
+ (ins GR32:$src1, GR32:$src2, i8imm:$src3),
"shld{l}\t{$src3, $src2, $dst|$dst, $src2, $src3}",
[(set GR32:$dst, (X86shld GR32:$src1, GR32:$src2,
(i8 imm:$src3)))]>,
TB;
def SHRD32rri8 : Ii8<0xAC, MRMDestReg,
- (outs GR32:$dst), (ins GR32:$src1, GR32:$src2, i8imm:$src3),
+ (outs GR32:$dst),
+ (ins GR32:$src1, GR32:$src2, i8imm:$src3),
"shrd{l}\t{$src3, $src2, $dst|$dst, $src2, $src3}",
[(set GR32:$dst, (X86shrd GR32:$src1, GR32:$src2,
(i8 imm:$src3)))]>,
TB;
def SHLD16rri8 : Ii8<0xA4, MRMDestReg,
- (outs GR16:$dst), (ins GR16:$src1, GR16:$src2, i8imm:$src3),
+ (outs GR16:$dst),
+ (ins GR16:$src1, GR16:$src2, i8imm:$src3),
"shld{w}\t{$src3, $src2, $dst|$dst, $src2, $src3}",
[(set GR16:$dst, (X86shld GR16:$src1, GR16:$src2,
(i8 imm:$src3)))]>,
TB, OpSize;
def SHRD16rri8 : Ii8<0xAC, MRMDestReg,
- (outs GR16:$dst), (ins GR16:$src1, GR16:$src2, i8imm:$src3),
+ (outs GR16:$dst),
+ (ins GR16:$src1, GR16:$src2, i8imm:$src3),
"shrd{w}\t{$src3, $src2, $dst|$dst, $src2, $src3}",
[(set GR16:$dst, (X86shrd GR16:$src1, GR16:$src2,
(i8 imm:$src3)))]>,
@@ -2645,6 +2763,16 @@ def ADC32rr : I<0x11, MRMDestReg, (outs GR32:$dst),
"adc{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (adde GR32:$src1, GR32:$src2))]>;
}
+
+def ADC8rr_REV : I<0x12, MRMSrcReg, (outs GR8:$dst), (ins GR8:$src1, GR8:$src2),
+ "adc{b}\t{$src2, $dst|$dst, $src2}", []>;
+def ADC16rr_REV : I<0x13, MRMSrcReg, (outs GR16:$dst),
+ (ins GR16:$src1, GR16:$src2),
+ "adc{w}\t{$src2, $dst|$dst, $src2}", []>, OpSize;
+def ADC32rr_REV : I<0x13, MRMSrcReg, (outs GR32:$dst),
+ (ins GR32:$src1, GR32:$src2),
+ "adc{l}\t{$src2, $dst|$dst, $src2}", []>;
+
def ADC8rm : I<0x12, MRMSrcMem , (outs GR8:$dst),
(ins GR8:$src1, i8mem:$src2),
"adc{b}\t{$src2, $dst|$dst, $src2}",
@@ -2731,6 +2859,15 @@ def SUB32rr : I<0x29, MRMDestReg, (outs GR32:$dst), (ins GR32:$src1,GR32:$src2),
[(set GR32:$dst, (sub GR32:$src1, GR32:$src2)),
(implicit EFLAGS)]>;
+def SUB8rr_REV : I<0x2A, MRMSrcReg, (outs GR8:$dst), (ins GR8:$src1, GR8:$src2),
+ "sub{b}\t{$src2, $dst|$dst, $src2}", []>;
+def SUB16rr_REV : I<0x2B, MRMSrcReg, (outs GR16:$dst),
+ (ins GR16:$src1, GR16:$src2),
+ "sub{w}\t{$src2, $dst|$dst, $src2}", []>, OpSize;
+def SUB32rr_REV : I<0x2B, MRMSrcReg, (outs GR32:$dst),
+ (ins GR32:$src1, GR32:$src2),
+ "sub{l}\t{$src2, $dst|$dst, $src2}", []>;
+
// Register-Memory Subtraction
def SUB8rm : I<0x2A, MRMSrcMem, (outs GR8 :$dst),
(ins GR8 :$src1, i8mem :$src2),
@@ -2872,6 +3009,16 @@ let isTwoAddress = 0 in {
def SBB32i32 : Ii32<0x1D, RawFrm, (outs), (ins i32imm:$src),
"sbb{l}\t{$src, %eax|%eax, $src}", []>;
}
+
+def SBB8rr_REV : I<0x1A, MRMSrcReg, (outs GR8:$dst), (ins GR8:$src1, GR8:$src2),
+ "sbb{b}\t{$src2, $dst|$dst, $src2}", []>;
+def SBB16rr_REV : I<0x1B, MRMSrcReg, (outs GR16:$dst),
+ (ins GR16:$src1, GR16:$src2),
+ "sbb{w}\t{$src2, $dst|$dst, $src2}", []>, OpSize;
+def SBB32rr_REV : I<0x1B, MRMSrcReg, (outs GR32:$dst),
+ (ins GR32:$src1, GR32:$src2),
+ "sbb{l}\t{$src2, $dst|$dst, $src2}", []>;
+
def SBB8rm : I<0x1A, MRMSrcMem, (outs GR8:$dst), (ins GR8:$src1, i8mem:$src2),
"sbb{b}\t{$src2, $dst|$dst, $src2}",
[(set GR8:$dst, (sube GR8:$src1, (load addr:$src2)))]>;
@@ -2926,7 +3073,8 @@ def IMUL16rm : I<0xAF, MRMSrcMem, (outs GR16:$dst),
"imul{w}\t{$src2, $dst|$dst, $src2}",
[(set GR16:$dst, (mul GR16:$src1, (load addr:$src2))),
(implicit EFLAGS)]>, TB, OpSize;
-def IMUL32rm : I<0xAF, MRMSrcMem, (outs GR32:$dst), (ins GR32:$src1, i32mem:$src2),
+def IMUL32rm : I<0xAF, MRMSrcMem, (outs GR32:$dst),
+ (ins GR32:$src1, i32mem:$src2),
"imul{l}\t{$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (mul GR32:$src1, (load addr:$src2))),
(implicit EFLAGS)]>, TB;
@@ -2958,12 +3106,12 @@ def IMUL32rri8 : Ii8<0x6B, MRMSrcReg, // GR32 = GR32*I8
(implicit EFLAGS)]>;
// Memory-Integer Signed Integer Multiply
-def IMUL16rmi : Ii16<0x69, MRMSrcMem, // GR16 = [mem16]*I16
+def IMUL16rmi : Ii16<0x69, MRMSrcMem, // GR16 = [mem16]*I16
(outs GR16:$dst), (ins i16mem:$src1, i16imm:$src2),
"imul{w}\t{$src2, $src1, $dst|$dst, $src1, $src2}",
[(set GR16:$dst, (mul (load addr:$src1), imm:$src2)),
(implicit EFLAGS)]>, OpSize;
-def IMUL32rmi : Ii32<0x69, MRMSrcMem, // GR32 = [mem32]*I32
+def IMUL32rmi : Ii32<0x69, MRMSrcMem, // GR32 = [mem32]*I32
(outs GR32:$dst), (ins i32mem:$src1, i32imm:$src2),
"imul{l}\t{$src2, $src1, $dst|$dst, $src1, $src2}",
[(set GR32:$dst, (mul (load addr:$src1), imm:$src2)),
@@ -3374,15 +3522,21 @@ def BT32rr : I<0xA3, MRMDestReg, (outs), (ins GR32:$src1, GR32:$src2),
// Unlike with the register+register form, the memory+register form of the
// bt instruction does not ignore the high bits of the index. From ISel's
-// perspective, this is pretty bizarre. Disable these instructions for now.
-//def BT16mr : I<0xA3, MRMDestMem, (outs), (ins i16mem:$src1, GR16:$src2),
-// "bt{w}\t{$src2, $src1|$src1, $src2}",
+// perspective, this is pretty bizarre. Make these instructions
+// disassembly-only for now.
+
+def BT16mr : I<0xA3, MRMDestMem, (outs), (ins i16mem:$src1, GR16:$src2),
+ "bt{w}\t{$src2, $src1|$src1, $src2}",
// [(X86bt (loadi16 addr:$src1), GR16:$src2),
-// (implicit EFLAGS)]>, OpSize, TB, Requires<[FastBTMem]>;
-//def BT32mr : I<0xA3, MRMDestMem, (outs), (ins i32mem:$src1, GR32:$src2),
-// "bt{l}\t{$src2, $src1|$src1, $src2}",
+// (implicit EFLAGS)]
+ []
+ >, OpSize, TB, Requires<[FastBTMem]>;
+def BT32mr : I<0xA3, MRMDestMem, (outs), (ins i32mem:$src1, GR32:$src2),
+ "bt{l}\t{$src2, $src1|$src1, $src2}",
// [(X86bt (loadi32 addr:$src1), GR32:$src2),
-// (implicit EFLAGS)]>, TB, Requires<[FastBTMem]>;
+// (implicit EFLAGS)]
+ []
+ >, TB, Requires<[FastBTMem]>;
def BT16ri8 : Ii8<0xBA, MRM4r, (outs), (ins GR16:$src1, i16i8imm:$src2),
"bt{w}\t{$src2, $src1|$src1, $src2}",
@@ -3403,12 +3557,67 @@ def BT32mi8 : Ii8<0xBA, MRM4m, (outs), (ins i32mem:$src1, i32i8imm:$src2),
"bt{l}\t{$src2, $src1|$src1, $src2}",
[(X86bt (loadi32 addr:$src1), i32immSExt8:$src2),
(implicit EFLAGS)]>, TB;
+
+def BTC16rr : I<0xBB, MRMDestReg, (outs), (ins GR16:$src1, GR16:$src2),
+ "btc{w}\t{$src2, $src1|$src1, $src2}", []>, OpSize, TB;
+def BTC32rr : I<0xBB, MRMDestReg, (outs), (ins GR32:$src1, GR32:$src2),
+ "btc{l}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTC16mr : I<0xBB, MRMDestMem, (outs), (ins i16mem:$src1, GR16:$src2),
+ "btc{w}\t{$src2, $src1|$src1, $src2}", []>, OpSize, TB;
+def BTC32mr : I<0xBB, MRMDestMem, (outs), (ins i32mem:$src1, GR32:$src2),
+ "btc{l}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTC16ri8 : Ii8<0xBA, MRM7r, (outs), (ins GR16:$src1, i16i8imm:$src2),
+ "btc{w}\t{$src2, $src1|$src1, $src2}", []>, OpSize, TB;
+def BTC32ri8 : Ii8<0xBA, MRM7r, (outs), (ins GR32:$src1, i32i8imm:$src2),
+ "btc{l}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTC16mi8 : Ii8<0xBA, MRM7m, (outs), (ins i16mem:$src1, i16i8imm:$src2),
+ "btc{w}\t{$src2, $src1|$src1, $src2}", []>, OpSize, TB;
+def BTC32mi8 : Ii8<0xBA, MRM7m, (outs), (ins i32mem:$src1, i32i8imm:$src2),
+ "btc{l}\t{$src2, $src1|$src1, $src2}", []>, TB;
+
+def BTR16rr : I<0xB3, MRMDestReg, (outs), (ins GR16:$src1, GR16:$src2),
+ "btr{w}\t{$src2, $src1|$src1, $src2}", []>, OpSize, TB;
+def BTR32rr : I<0xB3, MRMDestReg, (outs), (ins GR32:$src1, GR32:$src2),
+ "btr{l}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTR16mr : I<0xB3, MRMDestMem, (outs), (ins i16mem:$src1, GR16:$src2),
+ "btr{w}\t{$src2, $src1|$src1, $src2}", []>, OpSize, TB;
+def BTR32mr : I<0xB3, MRMDestMem, (outs), (ins i32mem:$src1, GR32:$src2),
+ "btr{l}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTR16ri8 : Ii8<0xBA, MRM6r, (outs), (ins GR16:$src1, i16i8imm:$src2),
+ "btr{w}\t{$src2, $src1|$src1, $src2}", []>, OpSize, TB;
+def BTR32ri8 : Ii8<0xBA, MRM6r, (outs), (ins GR32:$src1, i32i8imm:$src2),
+ "btr{l}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTR16mi8 : Ii8<0xBA, MRM6m, (outs), (ins i16mem:$src1, i16i8imm:$src2),
+ "btr{w}\t{$src2, $src1|$src1, $src2}", []>, OpSize, TB;
+def BTR32mi8 : Ii8<0xBA, MRM6m, (outs), (ins i32mem:$src1, i32i8imm:$src2),
+ "btr{l}\t{$src2, $src1|$src1, $src2}", []>, TB;
+
+def BTS16rr : I<0xAB, MRMDestReg, (outs), (ins GR16:$src1, GR16:$src2),
+ "bts{w}\t{$src2, $src1|$src1, $src2}", []>, OpSize, TB;
+def BTS32rr : I<0xAB, MRMDestReg, (outs), (ins GR32:$src1, GR32:$src2),
+ "bts{l}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTS16mr : I<0xAB, MRMDestMem, (outs), (ins i16mem:$src1, GR16:$src2),
+ "bts{w}\t{$src2, $src1|$src1, $src2}", []>, OpSize, TB;
+def BTS32mr : I<0xAB, MRMDestMem, (outs), (ins i32mem:$src1, GR32:$src2),
+ "bts{l}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTS16ri8 : Ii8<0xBA, MRM5r, (outs), (ins GR16:$src1, i16i8imm:$src2),
+ "bts{w}\t{$src2, $src1|$src1, $src2}", []>, OpSize, TB;
+def BTS32ri8 : Ii8<0xBA, MRM5r, (outs), (ins GR32:$src1, i32i8imm:$src2),
+ "bts{l}\t{$src2, $src1|$src1, $src2}", []>, TB;
+def BTS16mi8 : Ii8<0xBA, MRM5m, (outs), (ins i16mem:$src1, i16i8imm:$src2),
+ "bts{w}\t{$src2, $src1|$src1, $src2}", []>, OpSize, TB;
+def BTS32mi8 : Ii8<0xBA, MRM5m, (outs), (ins i32mem:$src1, i32i8imm:$src2),
+ "bts{l}\t{$src2, $src1|$src1, $src2}", []>, TB;
} // Defs = [EFLAGS]
// Sign/Zero extenders
// Use movsbl instead of movsbw; we don't care about the high 16 bits
// of the register here. This has a smaller encoding and avoids a
-// partial-register update.
+// partial-register update. Actual movsbw included for the disassembler.
+def MOVSX16rr8W : I<0xBE, MRMSrcReg, (outs GR16:$dst), (ins GR8:$src),
+ "movs{bw|x}\t{$src, $dst|$dst, $src}", []>, TB, OpSize;
+def MOVSX16rm8W : I<0xBE, MRMSrcMem, (outs GR16:$dst), (ins i8mem:$src),
+ "movs{bw|x}\t{$src, $dst|$dst, $src}", []>, TB, OpSize;
def MOVSX16rr8 : I<0xBE, MRMSrcReg, (outs GR16:$dst), (ins GR8 :$src),
"", [(set GR16:$dst, (sext GR8:$src))]>, TB;
def MOVSX16rm8 : I<0xBE, MRMSrcMem, (outs GR16:$dst), (ins i8mem :$src),
@@ -3428,7 +3637,11 @@ def MOVSX32rm16: I<0xBF, MRMSrcMem, (outs GR32:$dst), (ins i16mem:$src),
// Use movzbl instead of movzbw; we don't care about the high 16 bits
// of the register here. This has a smaller encoding and avoids a
-// partial-register update.
+// partial-register update. Actual movzbw included for the disassembler.
+def MOVZX16rr8W : I<0xB6, MRMSrcReg, (outs GR16:$dst), (ins GR8:$src),
+ "movz{bw|x}\t{$src, $dst|$dst, $src}", []>, TB, OpSize;
+def MOVZX16rm8W : I<0xB6, MRMSrcMem, (outs GR16:$dst), (ins i8mem:$src),
+ "movz{bw|x}\t{$src, $dst|$dst, $src}", []>, TB, OpSize;
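The partial-register issue behind the comments above: a 16-bit destination write merges with the register's stale upper half, while the 32-bit form replaces the whole register and breaks that dependency. A small C sketch, assuming an x86 target and GCC-style inline assembly (the constants are illustrative):

    #include <stdio.h>

    int main(void) {
        unsigned int out;
        unsigned char src = 0x7F;

        /* movzbw writes only the low 16 bits of the destination, so the CPU
           must merge the result with the register's old upper half (a
           partial-register update). %w0 names the 16-bit alias of %0. */
        __asm__("movl $0xAAAA0000, %0\n\t"
                "movzbw %1, %w0"
                : "=&r"(out)
                : "q"(src));
        printf("after movzbw: %#010x\n", out);  /* 0xaaaa007f: upper half kept */

        /* movzbl overwrites all 32 bits, so there is no dependency on the
           old register contents. */
        __asm__("movl $0xAAAA0000, %0\n\t"
                "movzbl %1, %0"
                : "=&r"(out)
                : "q"(src));
        printf("after movzbl: %#010x\n", out);  /* 0x0000007f: fully replaced */
        return 0;
    }
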
def MOVZX16rr8 : I<0xB6, MRMSrcReg, (outs GR16:$dst), (ins GR8 :$src),
"", [(set GR16:$dst, (zext GR8:$src))]>, TB;
def MOVZX16rm8 : I<0xB6, MRMSrcMem, (outs GR16:$dst), (ins i8mem :$src),
@@ -3541,18 +3754,32 @@ def EH_RETURN : I<0xC3, RawFrm, (outs), (ins GR32:$addr),
// Atomic swap. These are just normal xchg instructions. But since a memory
// operand is referenced, the atomicity is ensured.
let Constraints = "$val = $dst" in {
-def XCHG32rm : I<0x87, MRMSrcMem, (outs GR32:$dst), (ins i32mem:$ptr, GR32:$val),
+def XCHG32rm : I<0x87, MRMSrcMem, (outs GR32:$dst),
+ (ins GR32:$val, i32mem:$ptr),
"xchg{l}\t{$val, $ptr|$ptr, $val}",
[(set GR32:$dst, (atomic_swap_32 addr:$ptr, GR32:$val))]>;
-def XCHG16rm : I<0x87, MRMSrcMem, (outs GR16:$dst), (ins i16mem:$ptr, GR16:$val),
+def XCHG16rm : I<0x87, MRMSrcMem, (outs GR16:$dst),
+ (ins GR16:$val, i16mem:$ptr),
"xchg{w}\t{$val, $ptr|$ptr, $val}",
[(set GR16:$dst, (atomic_swap_16 addr:$ptr, GR16:$val))]>,
OpSize;
-def XCHG8rm : I<0x86, MRMSrcMem, (outs GR8:$dst), (ins i8mem:$ptr, GR8:$val),
+def XCHG8rm : I<0x86, MRMSrcMem, (outs GR8:$dst), (ins GR8:$val, i8mem:$ptr),
"xchg{b}\t{$val, $ptr|$ptr, $val}",
[(set GR8:$dst, (atomic_swap_8 addr:$ptr, GR8:$val))]>;
+
+def XCHG32rr : I<0x87, MRMSrcReg, (outs GR32:$dst), (ins GR32:$val, GR32:$src),
+ "xchg{l}\t{$val, $src|$src, $val}", []>;
+def XCHG16rr : I<0x87, MRMSrcReg, (outs GR16:$dst), (ins GR16:$val, GR16:$src),
+ "xchg{w}\t{$val, $src|$src, $val}", []>, OpSize;
+def XCHG8rr : I<0x86, MRMSrcReg, (outs GR8:$dst), (ins GR8:$val, GR8:$src),
+ "xchg{b}\t{$val, $src|$src, $val}", []>;
}
+def XCHG16ar : I<0x90, AddRegFrm, (outs), (ins GR16:$src),
+ "xchg{w}\t{$src, %ax|%ax, $src}", []>, OpSize;
+def XCHG32ar : I<0x90, AddRegFrm, (outs), (ins GR32:$src),
+ "xchg{l}\t{$src, %eax|%eax, $src}", []>;
+
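The atomic-swap patterns above lean on an architectural guarantee: xchg with a memory operand is locked by the processor even without a lock prefix. A minimal C sketch of the resulting codegen, assuming GCC or clang on x86 (names are illustrative):

    #include <stdio.h>

    static int shared = 5;

    int main(void) {
        /* __atomic_exchange_n compiles to a plain "xchgl" against memory on
           x86; no explicit lock prefix is emitted because xchg mem,reg is
           implicitly locked. */
        int old = __atomic_exchange_n(&shared, 42, __ATOMIC_SEQ_CST);
        printf("old = %d, shared = %d\n", old, shared);  /* old = 5, shared = 42 */
        return 0;
    }
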
// Atomic compare and swap.
let Defs = [EAX, EFLAGS], Uses = [EAX] in {
def LCMPXCHG32 : I<0xB1, MRMDestMem, (outs), (ins i32mem:$ptr, GR32:$swap),
@@ -3582,23 +3809,54 @@ def LCMPXCHG8 : I<0xB0, MRMDestMem, (outs), (ins i8mem:$ptr, GR8:$swap),
// Atomic exchange and add
let Constraints = "$val = $dst", Defs = [EFLAGS] in {
-def LXADD32 : I<0xC1, MRMSrcMem, (outs GR32:$dst), (ins i32mem:$ptr, GR32:$val),
+def LXADD32 : I<0xC1, MRMSrcMem, (outs GR32:$dst), (ins GR32:$val, i32mem:$ptr),
"lock\n\t"
"xadd{l}\t{$val, $ptr|$ptr, $val}",
[(set GR32:$dst, (atomic_load_add_32 addr:$ptr, GR32:$val))]>,
TB, LOCK;
-def LXADD16 : I<0xC1, MRMSrcMem, (outs GR16:$dst), (ins i16mem:$ptr, GR16:$val),
+def LXADD16 : I<0xC1, MRMSrcMem, (outs GR16:$dst), (ins GR16:$val, i16mem:$ptr),
"lock\n\t"
"xadd{w}\t{$val, $ptr|$ptr, $val}",
[(set GR16:$dst, (atomic_load_add_16 addr:$ptr, GR16:$val))]>,
TB, OpSize, LOCK;
-def LXADD8 : I<0xC0, MRMSrcMem, (outs GR8:$dst), (ins i8mem:$ptr, GR8:$val),
+def LXADD8 : I<0xC0, MRMSrcMem, (outs GR8:$dst), (ins GR8:$val, i8mem:$ptr),
"lock\n\t"
"xadd{b}\t{$val, $ptr|$ptr, $val}",
[(set GR8:$dst, (atomic_load_add_8 addr:$ptr, GR8:$val))]>,
TB, LOCK;
}
+def XADD8rr : I<0xC0, MRMDestReg, (outs GR8:$dst), (ins GR8:$src),
+ "xadd{b}\t{$src, $dst|$dst, $src}", []>, TB;
+def XADD16rr : I<0xC1, MRMDestReg, (outs GR16:$dst), (ins GR16:$src),
+ "xadd{w}\t{$src, $dst|$dst, $src}", []>, TB, OpSize;
+def XADD32rr : I<0xC1, MRMDestReg, (outs GR32:$dst), (ins GR32:$src),
+ "xadd{l}\t{$src, $dst|$dst, $src}", []>, TB;
+
+def XADD8rm : I<0xC0, MRMDestMem, (outs), (ins i8mem:$dst, GR8:$src),
+ "xadd{b}\t{$src, $dst|$dst, $src}", []>, TB;
+def XADD16rm : I<0xC1, MRMDestMem, (outs), (ins i16mem:$dst, GR16:$src),
+ "xadd{w}\t{$src, $dst|$dst, $src}", []>, TB, OpSize;
+def XADD32rm : I<0xC1, MRMDestMem, (outs), (ins i32mem:$dst, GR32:$src),
+ "xadd{l}\t{$src, $dst|$dst, $src}", []>, TB;
+
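The LXADD* patterns above are what an atomic fetch-and-add selects on x86; the unlocked XADD* forms just added carry no patterns and exist for the disassembler. A short C sketch of the locked behavior, assuming GCC or clang:

    #include <stdio.h>

    static int counter = 0;

    int main(void) {
        /* __atomic_fetch_add compiles to "lock xaddl" on x86: xadd stores
           the old value of the memory operand back into the source register,
           so the fetch and the add happen in one locked instruction. */
        int old = __atomic_fetch_add(&counter, 3, __ATOMIC_SEQ_CST);
        printf("old = %d, counter = %d\n", old, counter);  /* old = 0, counter = 3 */
        return 0;
    }
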
+def CMPXCHG8rr : I<0xB0, MRMDestReg, (outs GR8:$dst), (ins GR8:$src),
+ "cmpxchg{b}\t{$src, $dst|$dst, $src}", []>, TB;
+def CMPXCHG16rr : I<0xB1, MRMDestReg, (outs GR16:$dst), (ins GR16:$src),
+ "cmpxchg{w}\t{$src, $dst|$dst, $src}", []>, TB, OpSize;
+def CMPXCHG32rr : I<0xB1, MRMDestReg, (outs GR32:$dst), (ins GR32:$src),
+ "cmpxchg{l}\t{$src, $dst|$dst, $src}", []>, TB;
+
+def CMPXCHG8rm : I<0xB0, MRMDestMem, (outs), (ins i8mem:$dst, GR8:$src),
+ "cmpxchg{b}\t{$src, $dst|$dst, $src}", []>, TB;
+def CMPXCHG16rm : I<0xB1, MRMDestMem, (outs), (ins i16mem:$dst, GR16:$src),
+ "cmpxchg{w}\t{$src, $dst|$dst, $src}", []>, TB, OpSize;
+def CMPXCHG32rm : I<0xB1, MRMDestMem, (outs), (ins i32mem:$dst, GR32:$src),
+ "cmpxchg{l}\t{$src, $dst|$dst, $src}", []>, TB;
+
+def CMPXCHG8B : I<0xC7, MRM1m, (outs), (ins i64mem:$dst),
+ "cmpxchg8b\t$dst", []>, TB;
+
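Likewise for cmpxchg: the locked LCMPXCHG* forms earlier in this section carry the selection patterns, with the expected value implicitly in EAX (hence EAX in their Defs and Uses), while the forms added here are pattern-free disassembler entries. A small C sketch of the locked usage, assuming GCC or clang on x86:

    #include <stdbool.h>
    #include <stdio.h>

    static int lockword = 0;

    int main(void) {
        int expected = 0;
        /* __atomic_compare_exchange_n compiles to "lock cmpxchgl"; the
           expected value is loaded into EAX first, which is where cmpxchg
           reads it and writes back the observed value on failure. */
        bool won = __atomic_compare_exchange_n(&lockword, &expected, 1,
                                               false, __ATOMIC_SEQ_CST,
                                               __ATOMIC_SEQ_CST);
        printf("won = %d, lockword = %d\n", (int)won, lockword);  /* 1 and 1 */
        return 0;
    }
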
// Optimized codegen when the non-memory output is not used.
// FIXME: Use normal add / sub instructions and add lock prefix dynamically.
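The FIXME aside, this is the shape compilers already rely on: when the value fetched by an atomic read-modify-write is discarded, a single locked arithmetic instruction suffices. A minimal C sketch, assuming GCC or clang on x86:

    #include <stdio.h>

    static int n = 10;

    int main(void) {
        /* With the fetched value discarded, this typically compiles to a
           single "lock subl $1, n(%rip)" rather than a lock xadd, matching
           the LOCK_SUB* patterns defined below. */
        __atomic_fetch_sub(&n, 1, __ATOMIC_SEQ_CST);
        printf("n = %d\n", n);  /* n = 9 */
        return 0;
    }
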
let Defs = [EFLAGS] in {
@@ -3655,7 +3913,7 @@ def LOCK_SUB16mi : Ii16<0x81, MRM5m, (outs), (ins i16mem:$dst, i16imm:$src2),
def LOCK_SUB32mi : Ii32<0x81, MRM5m, (outs), (ins i32mem:$dst, i32imm:$src2),
"lock\n\t"
"sub{l}\t{$src2, $dst|$dst, $src2}", []>, LOCK;
-def LOCK_SUB16mi8 : Ii8<0x83, MRM5m, (outs), (ins i16mem:$dst, i16i8imm :$src2),
+def LOCK_SUB16mi8 : Ii8<0x83, MRM5m, (outs), (ins i16mem:$dst, i16i8imm :$src2),
"lock\n\t"
"sub{w}\t{$src2, $dst|$dst, $src2}", []>, OpSize, LOCK;
def LOCK_SUB32mi8 : Ii8<0x83, MRM5m, (outs), (ins i32mem:$dst, i32i8imm :$src2),
@@ -3780,12 +4038,193 @@ def LAR32rm : I<0x02, MRMSrcMem, (outs GR32:$dst), (ins i16mem:$src),
"lar{l}\t{$src, $dst|$dst, $src}", []>, TB;
def LAR32rr : I<0x02, MRMSrcReg, (outs GR32:$dst), (ins GR32:$src),
"lar{l}\t{$src, $dst|$dst, $src}", []>, TB;
+
+def LSL16rm : I<0x03, MRMSrcMem, (outs GR16:$dst), (ins i16mem:$src),
+ "lsl{w}\t{$src, $dst|$dst, $src}", []>, TB, OpSize;
+def LSL16rr : I<0x03, MRMSrcReg, (outs GR16:$dst), (ins GR16:$src),
+ "lsl{w}\t{$src, $dst|$dst, $src}", []>, TB, OpSize;
+def LSL32rm : I<0x03, MRMSrcMem, (outs GR32:$dst), (ins i32mem:$src),
+ "lsl{l}\t{$src, $dst|$dst, $src}", []>, TB;
+def LSL32rr : I<0x03, MRMSrcReg, (outs GR32:$dst), (ins GR32:$src),
+ "lsl{l}\t{$src, $dst|$dst, $src}", []>, TB;
+
+def INVLPG : I<0x01, RawFrm, (outs), (ins), "invlpg", []>, TB;
+
+def STRr : I<0x00, MRM1r, (outs GR16:$dst), (ins),
+ "str{w}\t{$dst}", []>, TB;
+def STRm : I<0x00, MRM1m, (outs i16mem:$dst), (ins),
+ "str{w}\t{$dst}", []>, TB;
+def LTRr : I<0x00, MRM3r, (outs), (ins GR16:$src),
+ "ltr{w}\t{$src}", []>, TB;
+def LTRm : I<0x00, MRM3m, (outs), (ins i16mem:$src),
+ "ltr{w}\t{$src}", []>, TB;
+
+def PUSHFS16 : I<0xa0, RawFrm, (outs), (ins),
+ "push{w}\t%fs", []>, OpSize, TB;
+def PUSHFS32 : I<0xa0, RawFrm, (outs), (ins),
+ "push{l}\t%fs", []>, TB;
+def PUSHGS16 : I<0xa8, RawFrm, (outs), (ins),
+ "push{w}\t%gs", []>, OpSize, TB;
+def PUSHGS32 : I<0xa8, RawFrm, (outs), (ins),
+ "push{l}\t%gs", []>, TB;
+
+def POPFS16 : I<0xa1, RawFrm, (outs), (ins),
+ "pop{w}\t%fs", []>, OpSize, TB;
+def POPFS32 : I<0xa1, RawFrm, (outs), (ins),
+ "pop{l}\t%fs", []>, TB;
+def POPGS16 : I<0xa9, RawFrm, (outs), (ins),
+ "pop{w}\t%gs", []>, OpSize, TB;
+def POPGS32 : I<0xa9, RawFrm, (outs), (ins),
+ "pop{l}\t%gs", []>, TB;
+
+def LDS16rm : I<0xc5, MRMSrcMem, (outs GR16:$dst), (ins opaque32mem:$src),
+ "lds{w}\t{$src, $dst|$dst, $src}", []>, OpSize;
+def LDS32rm : I<0xc5, MRMSrcMem, (outs GR32:$dst), (ins opaque48mem:$src),
+ "lds{l}\t{$src, $dst|$dst, $src}", []>;
+def LSS16rm : I<0xb2, MRMSrcMem, (outs GR16:$dst), (ins opaque32mem:$src),
+ "lss{w}\t{$src, $dst|$dst, $src}", []>, TB, OpSize;
+def LSS32rm : I<0xb2, MRMSrcMem, (outs GR32:$dst), (ins opaque48mem:$src),
+ "lss{l}\t{$src, $dst|$dst, $src}", []>, TB;
+def LES16rm : I<0xc4, MRMSrcMem, (outs GR16:$dst), (ins opaque32mem:$src),
+ "les{w}\t{$src, $dst|$dst, $src}", []>, OpSize;
+def LES32rm : I<0xc4, MRMSrcMem, (outs GR32:$dst), (ins opaque48mem:$src),
+ "les{l}\t{$src, $dst|$dst, $src}", []>;
+def LFS16rm : I<0xb4, MRMSrcMem, (outs GR16:$dst), (ins opaque32mem:$src),
+ "lfs{w}\t{$src, $dst|$dst, $src}", []>, TB, OpSize;
+def LFS32rm : I<0xb4, MRMSrcMem, (outs GR32:$dst), (ins opaque48mem:$src),
+ "lfs{l}\t{$src, $dst|$dst, $src}", []>, TB;
+def LGS16rm : I<0xb5, MRMSrcMem, (outs GR16:$dst), (ins opaque32mem:$src),
+ "lgs{w}\t{$src, $dst|$dst, $src}", []>, TB, OpSize;
+def LGS32rm : I<0xb5, MRMSrcMem, (outs GR32:$dst), (ins opaque48mem:$src),
+ "lgs{l}\t{$src, $dst|$dst, $src}", []>, TB;
+
+def VERRr : I<0x00, MRM4r, (outs), (ins GR16:$seg),
+ "verr\t$seg", []>, TB;
+def VERRm : I<0x00, MRM4m, (outs), (ins i16mem:$seg),
+ "verr\t$seg", []>, TB;
+def VERWr : I<0x00, MRM5r, (outs), (ins GR16:$seg),
+ "verw\t$seg", []>, TB;
+def VERWm : I<0x00, MRM5m, (outs), (ins i16mem:$seg),
+ "verw\t$seg", []>, TB;
+
+// Descriptor-table support instructions
+
+def SGDTm : I<0x01, MRM0m, (outs opaque48mem:$dst), (ins),
+ "sgdt\t$dst", []>, TB;
+def SIDTm : I<0x01, MRM1m, (outs opaque48mem:$dst), (ins),
+ "sidt\t$dst", []>, TB;
+def SLDT16r : I<0x00, MRM0r, (outs GR16:$dst), (ins),
+ "sldt{w}\t$dst", []>, TB;
+def SLDT16m : I<0x00, MRM0m, (outs i16mem:$dst), (ins),
+ "sldt{w}\t$dst", []>, TB;
+def LGDTm : I<0x01, MRM2m, (outs), (ins opaque48mem:$src),
+ "lgdt\t$src", []>, TB;
+def LIDTm : I<0x01, MRM3m, (outs), (ins opaque48mem:$src),
+ "lidt\t$src", []>, TB;
+def LLDT16r : I<0x00, MRM2r, (outs), (ins GR16:$src),
+ "lldt{w}\t$src", []>, TB;
+def LLDT16m : I<0x00, MRM2m, (outs), (ins i16mem:$src),
+ "lldt{w}\t$src", []>, TB;
// String manipulation instructions
def LODSB : I<0xAC, RawFrm, (outs), (ins), "lodsb", []>;
def LODSW : I<0xAD, RawFrm, (outs), (ins), "lodsw", []>, OpSize;
-def LODSD : I<0xAD, RawFrm, (outs), (ins), "lodsd", []>;
+def LODSD : I<0xAD, RawFrm, (outs), (ins), "lods{l|d}", []>;
+
+def OUTSB : I<0x6E, RawFrm, (outs), (ins), "outsb", []>;
+def OUTSW : I<0x6F, RawFrm, (outs), (ins), "outsw", []>, OpSize;
+def OUTSD : I<0x6F, RawFrm, (outs), (ins), "outs{l|d}", []>;
+
+// CPU flow control instructions
+
+def HLT : I<0xF4, RawFrm, (outs), (ins), "hlt", []>;
+def RSM : I<0xAA, RawFrm, (outs), (ins), "rsm", []>, TB;
+
+// FPU control instructions
+
+def FNINIT : I<0xE3, RawFrm, (outs), (ins), "fninit", []>, DB;
+
+// Flag instructions
+
+def CLC : I<0xF8, RawFrm, (outs), (ins), "clc", []>;
+def STC : I<0xF9, RawFrm, (outs), (ins), "stc", []>;
+def CLI : I<0xFA, RawFrm, (outs), (ins), "cli", []>;
+def STI : I<0xFB, RawFrm, (outs), (ins), "sti", []>;
+def CLD : I<0xFC, RawFrm, (outs), (ins), "cld", []>;
+def STD : I<0xFD, RawFrm, (outs), (ins), "std", []>;
+def CMC : I<0xF5, RawFrm, (outs), (ins), "cmc", []>;
+
+def CLTS : I<0x06, RawFrm, (outs), (ins), "clts", []>, TB;
+
+// Table lookup instructions
+
+def XLAT : I<0xD7, RawFrm, (outs), (ins), "xlatb", []>;
+
+// Specialized register support
+
+def WRMSR : I<0x30, RawFrm, (outs), (ins), "wrmsr", []>, TB;
+def RDMSR : I<0x32, RawFrm, (outs), (ins), "rdmsr", []>, TB;
+def RDPMC : I<0x33, RawFrm, (outs), (ins), "rdpmc", []>, TB;
+
+def SMSW16r : I<0x01, MRM4r, (outs GR16:$dst), (ins),
+ "smsw{w}\t$dst", []>, OpSize, TB;
+def SMSW32r : I<0x01, MRM4r, (outs GR32:$dst), (ins),
+ "smsw{l}\t$dst", []>, TB;
+// For memory operands, there is only a 16-bit form
+def SMSW16m : I<0x01, MRM4m, (outs i16mem:$dst), (ins),
+ "smsw{w}\t$dst", []>, TB;
+
+def LMSW16r : I<0x01, MRM6r, (outs), (ins GR16:$src),
+ "lmsw{w}\t$src", []>, TB;
+def LMSW16m : I<0x01, MRM6m, (outs), (ins i16mem:$src),
+ "lmsw{w}\t$src", []>, TB;
+
+def CPUID : I<0xA2, RawFrm, (outs), (ins), "cpuid", []>, TB;
+
+// Cache instructions
+
+def INVD : I<0x08, RawFrm, (outs), (ins), "invd", []>, TB;
+def WBINVD : I<0x09, RawFrm, (outs), (ins), "wbinvd", []>, TB;
+
+// VMX instructions
+
+// 66 0F 38 80
+def INVEPT : I<0x38, RawFrm, (outs), (ins), "invept", []>, OpSize, TB;
+// 66 0F 38 81
+def INVVPID : I<0x38, RawFrm, (outs), (ins), "invvpid", []>, OpSize, TB;
+// 0F 01 C1
+def VMCALL : I<0x01, RawFrm, (outs), (ins), "vmcall", []>, TB;
+def VMCLEARm : I<0xC7, MRM6m, (outs), (ins i64mem:$vmcs),
+ "vmclear\t$vmcs", []>, OpSize, TB;
+// 0F 01 C2
+def VMLAUNCH : I<0x01, RawFrm, (outs), (ins), "vmlaunch", []>, TB;
+// 0F 01 C3
+def VMRESUME : I<0x01, RawFrm, (outs), (ins), "vmresume", []>, TB;
+def VMPTRLDm : I<0xC7, MRM6m, (outs), (ins i64mem:$vmcs),
+ "vmptrld\t$vmcs", []>, TB;
+def VMPTRSTm : I<0xC7, MRM7m, (outs i64mem:$vmcs), (ins),
+ "vmptrst\t$vmcs", []>, TB;
+def VMREAD64rm : I<0x78, MRMDestMem, (outs i64mem:$dst), (ins GR64:$src),
+ "vmread{q}\t{$src, $dst|$dst, $src}", []>, TB;
+def VMREAD64rr : I<0x78, MRMDestReg, (outs GR64:$dst), (ins GR64:$src),
+ "vmread{q}\t{$src, $dst|$dst, $src}", []>, TB;
+def VMREAD32rm : I<0x78, MRMDestMem, (outs i32mem:$dst), (ins GR32:$src),
+ "vmread{l}\t{$src, $dst|$dst, $src}", []>, TB;
+def VMREAD32rr : I<0x78, MRMDestReg, (outs GR32:$dst), (ins GR32:$src),
+ "vmread{l}\t{$src, $dst|$dst, $src}", []>, TB;
+def VMWRITE64rm : I<0x79, MRMSrcMem, (outs GR64:$dst), (ins i64mem:$src),
+ "vmwrite{q}\t{$src, $dst|$dst, $src}", []>, TB;
+def VMWRITE64rr : I<0x79, MRMSrcReg, (outs GR64:$dst), (ins GR64:$src),
+ "vmwrite{q}\t{$src, $dst|$dst, $src}", []>, TB;
+def VMWRITE32rm : I<0x79, MRMSrcMem, (outs GR32:$dst), (ins i32mem:$src),
+ "vmwrite{l}\t{$src, $dst|$dst, $src}", []>, TB;
+def VMWRITE32rr : I<0x79, MRMSrcReg, (outs GR32:$dst), (ins GR32:$src),
+ "vmwrite{l}\t{$src, $dst|$dst, $src}", []>, TB;
+// 0F 01 C4
+def VMXOFF : I<0x01, RawFrm, (outs), (ins), "vmxoff", []>, OpSize;
+def VMXON : I<0xC7, MRM6m, (outs), (ins i64mem:$vmxon),
+ "vmxon\t{$vmxon}", []>, XD;
//===----------------------------------------------------------------------===//
// Non-Instruction Patterns
@@ -4031,15 +4470,18 @@ def : Pat<(srl_su GR16:$src, (i8 8)),
x86_subreg_16bit)>,
Requires<[In32BitMode]>;
def : Pat<(i32 (zext (srl_su GR16:$src, (i8 8)))),
- (MOVZX32rr8 (EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),
+ (MOVZX32rr8 (EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src,
+ GR16_ABCD)),
x86_subreg_8bit_hi))>,
Requires<[In32BitMode]>;
def : Pat<(i32 (anyext (srl_su GR16:$src, (i8 8)))),
- (MOVZX32rr8 (EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src, GR16_ABCD)),
+ (MOVZX32rr8 (EXTRACT_SUBREG (i16 (COPY_TO_REGCLASS GR16:$src,
+ GR16_ABCD)),
x86_subreg_8bit_hi))>,
Requires<[In32BitMode]>;
def : Pat<(and (srl_su GR32:$src, (i8 8)), (i32 255)),
- (MOVZX32rr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src, GR32_ABCD)),
+ (MOVZX32rr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS GR32:$src,
+ GR32_ABCD)),
x86_subreg_8bit_hi))>,
Requires<[In32BitMode]>;
diff --git a/lib/Target/X86/X86InstrMMX.td b/lib/Target/X86/X86InstrMMX.td
index 500785b990..fc40c9a420 100644
--- a/lib/Target/X86/X86InstrMMX.td
+++ b/lib/Target/X86/X86InstrMMX.td
@@ -72,13 +72,13 @@ let Constraints = "$src1 = $dst" in {
multiclass MMXI_binop_rm<bits<8> opc, string OpcodeStr, SDNode OpNode,
ValueType OpVT, bit Commutable = 0> {
def rr : MMXI<opc, MRMSrcReg, (outs VR64:$dst),
- (ins VR64:$src1, VR64:$src2),
+ (ins VR64:$src1, VR64:$src2),
!strconcat(OpcodeStr, "\t{$src2, $dst|$dst, $src2}"),
[(set VR64:$dst, (OpVT (OpNode VR64:$src1, VR64:$src2)))]> {
let isCommutable = Commutable;
}
def rm : MMXI<opc, MRMSrcMem, (outs VR64:$dst),
- (ins VR64:$src1, i64mem:$src2),
+ (ins VR64:$src1, i64mem:$src2),
!strconcat(OpcodeStr, "\t{$src2, $dst|$dst, $src2}"),
[(set VR64:$dst, (OpVT (OpNode VR64:$src1,
(bitconvert
@@ -88,13 +88,13 @@ let Constraints = "$src1 = $dst" in {
multiclass MMXI_binop_rm_int<bits<8> opc, string OpcodeStr, Intrinsic IntId,
bit Commutable = 0> {
def rr : MMXI<opc, MRMSrcReg, (outs VR64:$dst),
- (ins VR64:$src1, VR64:$src2),
+ (ins VR64:$src1, VR64:$src2),
!strconcat(OpcodeStr, "\t{$src2, $dst|$dst, $src2}"),
[(set VR64:$dst, (IntId VR64:$src1, VR64:$src2))]> {
let isCommutable = Commutable;
}
def rm : MMXI<opc, MRMSrcMem, (outs VR64:$dst),
- (ins VR64:$src1, i64mem:$src2),
+ (ins VR64:$src1, i64mem:$src2),
!strconcat(OpcodeStr, "\t{$src2, $dst|$dst, $src2}"),
[(set VR64:$dst, (IntId VR64:$src1,
(bitconvert (load_mmx addr:$src2))))]>;
@@ -144,9 +144,9 @@ let Constraints = "$src1 = $dst" in {
//===----------------------------------------------------------------------===//
def MMX_EMMS : MMXI<0x77, RawFrm, (outs), (ins), "emms",
- [(int_x86_mmx_emms)]>;
+ [(int_x86_mmx_emms)]>;
def MMX_FEMMS : MMXI<0x0E, RawFrm, (outs), (ins), "femms",
- [(int_x86_mmx_femms)]>;
+ [(int_x86_mmx_femms)]>;
//===----------------------------------------------------------------------===//
// MMX Scalar Instructions
@@ -155,16 +155,21 @@ def MMX_FEMMS : MMXI<0x0E, RawFrm, (outs), (ins), "femms",
// Data Transfer Instructions
def MMX_MOVD64rr : MMXI<0x6E, MRMSrcReg, (outs VR64:$dst), (ins GR32:$src),
"movd\t{$src, $dst|$dst, $src}",
- [(set VR64:$dst,
- (v2i32 (scalar_to_vector GR32:$src)))]>;
+ [(set VR64:$dst,
+ (v2i32 (scalar_to_vector GR32:$src)))]>;
let canFoldAsLoad = 1, isReMaterializable = 1 in
def MMX_MOVD64rm : MMXI<0x6E, MRMSrcMem, (outs VR64:$dst), (ins i32mem:$src),
"movd\t{$src, $dst|$dst, $src}",
[(set VR64:$dst,
- (v2i32 (scalar_to_vector (loadi32 addr:$src))))]>;
+ (v2i32 (scalar_to_vector (loadi32 addr:$src))))]>;
let mayStore = 1 in
def MMX_MOVD64mr : MMXI<0x7E, MRMDestMem, (outs), (ins i32mem:$dst, VR64:$src),
"movd\t{$src, $dst|$dst, $src}", []>;
+def MMX_MOVD64grr : MMXI<0x7E, MRMDestReg, (outs), (ins GR32:$dst, VR64:$src),
+ "movd\t{$src, $dst|$dst, $src}", []>;
+def MMX_MOVQ64gmr : MMXRI<0x7E, MRMDestMem, (outs),
+ (ins i64mem:$dst, VR64:$src),
+ "movq\t{$src, $dst|$dst, $src}", []>;
let neverHasSideEffects = 1 in
def MMX_MOVD64to64rr : MMXRI<0x6E, MRMSrcReg, (outs VR64:$dst), (ins GR64:$src),
@@ -181,7 +186,7 @@ def MMX_MOVD64from64rr : MMXRI<0x7E, MRMDestReg,
def MMX_MOVD64rrv164 : MMXI<0x6E, MRMSrcReg, (outs VR64:$dst), (ins GR64:$src),
"movd\t{$src, $dst|$dst, $src}",
[(set VR64:$dst,
- (v1i64 (scalar_to_vector GR64:$src)))]>;
+ (v1i64 (scalar_to_vector GR64:$src)))]>;
let neverHasSideEffects = 1 in
def MMX_MOVQ64rr : MMXI<0x6F, MRMSrcReg, (outs VR64:$dst), (ins VR64:$src),
@@ -223,7 +228,7 @@ def MMX_MOVZDI2PDIrr : MMXI<0x6E, MRMSrcReg, (outs VR64:$dst), (ins GR32:$src),
(v2i32 (X86vzmovl (v2i32 (scalar_to_vector GR32:$src)))))]>;
let AddedComplexity = 20 in
def MMX_MOVZDI2PDIrm : MMXI<0x6E, MRMSrcMem, (outs VR64:$dst),
- (ins i32mem:$src),
+ (ins i32mem:$src),
"movd\t{$src, $dst|$dst, $src}",
[(set VR64:$dst,
(v2i32 (X86vzmovl (v2i32
@@ -432,21 +437,21 @@ def MMX_CVTPD2PIrr : MMX2I<0x2D, MRMSrcReg, (outs VR64:$dst), (ins VR128:$src),
"cvtpd2pi\t{$src, $dst|$dst, $src}", []>;
let mayLoad = 1 in
def MMX_CVTPD2PIrm : MMX2I<0x2D, MRMSrcMem, (outs VR64:$dst),
- (ins f128mem:$src),
+ (ins f128mem:$src),
"cvtpd2pi\t{$src, $dst|$dst, $src}", []>;
def MMX_CVTPI2PDrr : MMX2I<0x2A, MRMSrcReg, (outs VR128:$dst), (ins VR64:$src),
"cvtpi2pd\t{$src, $dst|$dst, $src}", []>;
let mayLoad = 1 in
def MMX_CVTPI2PDrm : MMX2I<0x2A, MRMSrcMem, (outs VR128:$dst),
- (ins i64mem:$src),
+ (ins i64mem:$src),
"cvtpi2pd\t{$src, $dst|$dst, $src}", []>;
def MMX_CVTPI2PSrr : MMXI<0x2A, MRMSrcReg, (outs VR128:$dst), (ins VR64:$src),
"cvtpi2ps\t{$src, $dst|$dst, $src}", []>;
let mayLoad = 1 in
def MMX_CVTPI2PSrm : MMXI<0x2A, MRMSrcMem, (outs VR128:$dst),
- (ins i64mem:$src),
+ (ins i64mem:$src),
"cvtpi2ps\t{$src, $dst|$dst, $src}", []>;
def MMX_CVTPS2PIrr : MMXI<0x2D, MRMSrcReg, (outs VR64:$dst), (ins VR128:$src),
@@ -459,7 +464,7 @@ def MMX_CVTTPD2PIrr : MMX2I<0x2C, MRMSrcReg, (outs VR64:$dst), (ins VR128:$src),
"cvttpd2pi\t{$src, $dst|$dst, $src}", []>;
let mayLoad = 1 in
def MMX_CVTTPD2PIrm : MMX2I<0x2C, MRMSrcMem, (outs VR64:$dst),
- (ins f128mem:$src),
+ (ins f128mem:$src),
"cvttpd2pi\t{$src, $dst|$dst, $src}", []>;
def MMX_CVTTPS2PIrr : MMXI<0x2C, MRMSrcReg, (outs VR64:$dst), (ins VR128:$src),
@@ -481,14 +486,14 @@ def MMX_PEXTRWri : MMXIi8<0xC5, MRMSrcReg,
(iPTR imm:$src2)))]>;
let Constraints = "$src1 = $dst" in {
def MMX_PINSRWrri : MMXIi8<0xC4, MRMSrcReg,
- (outs VR64:$dst), (ins VR64:$src1, GR32:$src2,
- i16i8imm:$src3),
+ (outs VR64:$dst),
+ (ins VR64:$src1, GR32:$src2,i16i8imm:$src3),
"pinsrw\t{$src3, $src2, $dst|$dst, $src2, $src3}",
[(set VR64:$dst, (v4i16 (MMX_X86pinsrw (v4i16 VR64:$src1),
GR32:$src2,(iPTR imm:$src3))))]>;
def MMX_PINSRWrmi : MMXIi8<0xC4, MRMSrcMem,
- (outs VR64:$dst), (ins VR64:$src1, i16mem:$src2,
- i16i8imm:$src3),
+ (outs VR64:$dst),
+ (ins VR64:$src1, i16mem:$src2, i16i8imm:$src3),
"pinsrw\t{$src3, $src2, $dst|$dst, $src2, $src3}",
[(set VR64:$dst,
(v4i16 (MMX_X86pinsrw (v4i16 VR64:$src1),
diff --git a/lib/Target/X86/X86InstrSSE.td b/lib/Target/X86/X86InstrSSE.td
index 62841f8dec..ae1a68aea4 100644
--- a/lib/Target/X86/X86InstrSSE.td
+++ b/lib/Target/X86/X86InstrSSE.td
@@ -70,7 +70,7 @@ def X86pcmpgtd : SDNode<"X86ISD::PCMPGTD", SDTIntBinOp>;
def X86pcmpgtq : SDNode<"X86ISD::PCMPGTQ", SDTIntBinOp>;
def SDTX86CmpPTest : SDTypeProfile<0, 2, [SDTCisVT<0, v4f32>,
- SDTCisVT<1, v4f32>]>;
+ SDTCisVT<1, v4f32>]>;
def X86ptest : SDNode<"X86ISD::PTEST", SDTX86CmpPTest>;
//===----------------------------------------------------------------------===//
@@ -116,12 +116,18 @@ def alignedload : PatFrag<(ops node:$ptr), (load node:$ptr), [{
return cast<LoadSDNode>(N)->getAlignment() >= 16;
}]>;
-def alignedloadfsf32 : PatFrag<(ops node:$ptr), (f32 (alignedload node:$ptr))>;
-def alignedloadfsf64 : PatFrag<(ops node:$ptr), (f64 (alignedload node:$ptr))>;
-def alignedloadv4f32 : PatFrag<(ops node:$ptr), (v4f32 (alignedload node:$ptr))>;
-def alignedloadv2f64 : PatFrag<(ops node:$ptr), (v2f64 (alignedload node:$ptr))>;
-def alignedloadv4i32 : PatFrag<(ops node:$ptr), (v4i32 (alignedload node:$ptr))>;
-def alignedloadv2i64 : PatFrag<(ops node:$ptr), (v2i64 (alignedload node:$ptr))>;
+def alignedloadfsf32 : PatFrag<(ops node:$ptr),
+ (f32 (alignedload node:$ptr))>;
+def alignedloadfsf64 : PatFrag<(ops node:$ptr),
+ (f64 (alignedload node:$ptr))>;
+def alignedloadv4f32 : PatFrag<(ops node:$ptr),
+ (v4f32 (alignedload node:$ptr))>;
+def alignedloadv2f64 : PatFrag<(ops node:$ptr),
+ (v2f64 (alignedload node:$ptr))>;
+def alignedloadv4i32 : PatFrag<(ops node:$ptr),
+ (v4i32 (alignedload node:$ptr))>;
+def alignedloadv2i64 : PatFrag<(ops node:$ptr),
+ (v2i64 (alignedload node:$ptr))>;
// Like 'load', but uses special alignment checks suitable for use in
// memory operands in most SSE instructions, which are required to
@@ -363,6 +369,11 @@ def CVTSI2SSrm : SSI<0x2A, MRMSrcMem, (outs FR32:$dst), (ins i32mem:$src),
[(set FR32:$dst, (sint_to_fp (loadi32 addr:$src)))]>;
// Match intrinsics which expect XMM operand(s).
+def CVTSS2SIrr: SSI<0x2D, MRMSrcReg, (outs GR32:$dst), (ins FR32:$src),
+ "cvtss2si{l}\t{$src, $dst|$dst, $src}", []>;
+def CVTSS2SIrm: SSI<0x2D, MRMSrcMem, (outs GR32:$dst), (ins f32mem:$src),
+ "cvtss2si{l}\t{$src, $dst|$dst, $src}", []>;
+
def Int_CVTSS2SIrr : SSI<0x2D, MRMSrcReg, (outs GR32:$dst), (ins VR128:$src),
"cvtss2si\t{$src, $dst|$dst, $src}",
[(set GR32:$dst, (int_x86_sse_cvtss2si VR128:$src))]>;
@@ -441,19 +452,26 @@ def UCOMISSrm: PSI<0x2E, MRMSrcMem, (outs), (ins FR32:$src1, f32mem:$src2),
"ucomiss\t{$src2, $src1|$src1, $src2}",
[(X86cmp FR32:$src1, (loadf32 addr:$src2)),
(implicit EFLAGS)]>;
+
+def COMISSrr: PSI<0x2F, MRMSrcReg, (outs), (ins VR128:$src1, VR128:$src2),
+ "comiss\t{$src2, $src1|$src1, $src2}", []>;
+def COMISSrm: PSI<0x2F, MRMSrcMem, (outs), (ins VR128:$src1, f128mem:$src2),
+ "comiss\t{$src2, $src1|$src1, $src2}", []>;
+
} // Defs = [EFLAGS]
// Aliases to match intrinsics which expect XMM operand(s).
let Constraints = "$src1 = $dst" in {
def Int_CMPSSrr : SSIi8<0xC2, MRMSrcReg,
- (outs VR128:$dst), (ins VR128:$src1, VR128:$src,
- SSECC:$cc),
+ (outs VR128:$dst),
+ (ins VR128:$src1, VR128:$src, SSECC:$cc),
"cmp${cc}ss\t{$src, $dst|$dst, $src}",
- [(set VR128:$dst, (int_x86_sse_cmp_ss VR128:$src1,
- VR128:$src, imm:$cc))]>;
+ [(set VR128:$dst, (int_x86_sse_cmp_ss
+ VR128:$src1,
+ VR128:$src, imm:$cc))]>;
def Int_CMPSSrm : SSIi8<0xC2, MRMSrcMem,
- (outs VR128:$dst), (ins VR128:$src1, f32mem:$src,
- SSECC:$cc),
+ (outs VR128:$dst),
+ (ins VR128:$src1, f32mem:$src, SSECC:$cc),
"cmp${cc}ss\t{$src, $dst|$dst, $src}",
[(set VR128:$dst, (int_x86_sse_cmp_ss VR128:$src1,
(load addr:$src), imm:$cc))]>;
@@ -1205,14 +1223,14 @@ def UCOMISDrm: PDI<0x2E, MRMSrcMem, (outs), (ins FR64:$src1, f64mem:$src2),
// Aliases to match intrinsics which expect XMM operand(s).
let Constraints = "$src1 = $dst" in {
def Int_CMPSDrr : SDIi8<0xC2, MRMSrcReg,
- (outs VR128:$dst), (ins VR128:$src1, VR128:$src,
- SSECC:$cc),
+ (outs VR128:$dst),
+ (ins VR128:$src1, VR128:$src, SSECC:$cc),
"cmp${cc}sd\t{$src, $dst|$dst, $src}",
[(set VR128:$dst, (int_x86_sse2_cmp_sd VR128:$src1,
VR128:$src, imm:$cc))]>;
def Int_CMPSDrm : SDIi8<0xC2, MRMSrcMem,
- (outs VR128:$dst), (ins VR128:$src1, f64mem:$src,
- SSECC:$cc),
+ (outs VR128:$dst),
+ (ins VR128:$src1, f64mem:$src, SSECC:$cc),
"cmp${cc}sd\t{$src, $dst|$dst, $src}",
[(set VR128:$dst, (int_x86_sse2_cmp_sd VR128:$src1,
(load addr:$src), imm:$cc))]>;
@@ -1542,9 +1560,15 @@ def Int_CVTPS2DQrm : PDI<0x5B, MRMSrcMem, (outs VR128:$dst), (ins f128mem:$src),
[(set VR128:$dst, (int_x86_sse2_cvtps2dq
(memop addr:$src)))]>;
// SSE2 packed instructions with XS prefix
+def CVTTPS2DQrr : SSI<0x5B, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src),
+ "cvttps2dq\t{$src, $dst|$dst, $src}", []>;
+def CVTTPS2DQrm : SSI<0x5B, MRMSrcMem, (outs VR128:$dst), (ins f128mem:$src),
+ "cvttps2dq\t{$src, $dst|$dst, $src}", []>;
+
def Int_CVTTPS2DQrr : I<0x5B, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src),
"cvttps2dq\t{$src, $dst|$dst, $src}",
- [(set VR128:$dst, (int_x86_sse2_cvttps2dq VR128:$src))]>,
+ [(set VR128:$dst,
+ (int_x86_sse2_cvttps2dq VR128:$src))]>,
XS, Requires<[HasSSE2]>;
def Int_CVTTPS2DQrm : I<0x5B, MRMSrcMem, (outs VR128:$dst), (ins f128mem:$src),
"cvttps2dq\t{$src, $dst|$dst, $src}",
@@ -1572,6 +1596,11 @@ def Int_CVTTPD2DQrm : PDI<0xE6, MRMSrcMem, (outs VR128:$dst),(ins f128mem:$src),
(memop addr:$src)))]>;
// SSE2 instructions without OpSize prefix
+def CVTPS2PDrr : I<0x5A, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src),
+ "cvtps2pd\t{$src, $dst|$dst, $src}", []>, TB;
+def CVTPS2PDrm : I<0x5A, MRMSrcMem, (outs VR128:$dst), (ins f64mem:$src),
+ "cvtps2pd\t{$src, $dst|$dst, $src}", []>, TB;
+
def Int_CVTPS2PDrr : I<0x5A, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src),
"cvtps2pd\t{$src, $dst|$dst, $src}",
[(set VR128:$dst, (int_x86_sse2_cvtps2pd VR128:$src))]>,
@@ -1582,6 +1611,12 @@ def Int_CVTPS2PDrm : I<0x5A, MRMSrcMem, (outs VR128:$dst), (ins f64mem:$src),
(load addr:$src)))]>,
TB, Requires<[HasSSE2]>;
+def CVTPD2PSrr : PDI<0x5A, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src),
+ "cvtpd2ps\t{$src, $dst|$dst, $src}", []>;
+def CVTPD2PSrm : PDI<0x5A, MRMSrcMem, (outs VR128:$dst), (ins f128mem:$src),
+ "cvtpd2ps\t{$src, $dst|$dst, $src}", []>;
+
+
def Int_CVTPD2PSrr : PDI<0x5A, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src),
"cvtpd2ps\t{$src, $dst|$dst, $src}",
[(set VR128:$dst, (int_x86_sse2_cvtpd2ps VR128:$src))]>;
@@ -1856,31 +1891,34 @@ let Constraints = "$src1 = $dst" in {
multiclass PDI_binop_rm_int<bits<8> opc, string OpcodeStr, Intrinsic IntId,
bit Commutable = 0> {
- def rr : PDI<opc, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src1, VR128:$src2),
+ def rr : PDI<opc, MRMSrcReg, (outs VR128:$dst),
+ (ins VR128:$src1, VR128:$src2),
!strconcat(OpcodeStr, "\t{$src2, $dst|$dst, $src2}"),
[(set VR128:$dst, (IntId VR128:$src1, VR128:$src2))]> {
let isCommutable = Commutable;
}
- def rm : PDI<opc, MRMSrcMem, (outs VR128:$dst), (ins VR128:$src1, i128mem:$src2),
+ def rm : PDI<opc, MRMSrcMem, (outs VR128:$dst),
+ (ins VR128:$src1, i128mem:$src2),
!strconcat(OpcodeStr, "\t{$src2, $dst|$dst, $src2}"),
[(set VR128:$dst, (IntId VR128:$src1,
- (bitconvert (memopv2i64 addr:$src2))))]>;
+ (bitconvert (memopv2i64
+ addr:$src2))))]>;
}
multiclass PDI_binop_rmi_int<bits<8> opc, bits<8> opc2, Format ImmForm,
string OpcodeStr,
Intrinsic IntId, Intrinsic IntId2> {
- def rr : PDI<opc, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src1,
- VR128:$src2),
+ def rr : PDI<opc, MRMSrcReg, (outs VR128:$dst),
+ (ins VR128:$src1, VR128:$src2),
!strconcat(OpcodeStr, "\t{$src2, $dst|$dst, $src2}"),
[(set VR128:$dst, (IntId VR128:$src1, VR128:$src2))]>;
- def rm : PDI<opc, MRMSrcMem, (outs VR128:$dst), (ins VR128:$src1,
- i128mem:$src2),
+ def rm : PDI<opc, MRMSrcMem, (outs VR128:$dst),
+ (ins VR128:$src1, i128mem:$src2),
!strconcat(OpcodeStr, "\t{$src2, $dst|$dst, $src2}"),
[(set VR128:$dst, (IntId VR128:$src1,
(bitconvert (memopv2i64 addr:$src2))))]>;
- def ri : PDIi8<opc2, ImmForm, (outs VR128:$dst), (ins VR128:$src1,
- i32i8imm:$src2),
+ def ri : PDIi8<opc2, ImmForm, (outs VR128:$dst),
+ (ins VR128:$src1, i32i8imm:$src2),
!strconcat(OpcodeStr, "\t{$src2, $dst|$dst, $src2}"),
[(set VR128:$dst, (IntId2 VR128:$src1, (i32 imm:$src2)))]>;
}
@@ -1888,14 +1926,14 @@ multiclass PDI_binop_rmi_int<bits<8> opc, bits<8> opc2, Format ImmForm,
/// PDI_binop_rm - Simple SSE2 binary operator.
multiclass PDI_binop_rm<bits<8> opc, string OpcodeStr, SDNode OpNode,
ValueType OpVT, bit Commutable = 0> {
- def rr : PDI<opc, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src1,
- VR128:$src2),
+ def rr : PDI<opc, MRMSrcReg, (outs VR128:$dst),
+ (ins VR128:$src1, VR128:$src2),
!strconcat(OpcodeStr, "\t{$src2, $dst|$dst, $src2}"),
[(set VR128:$dst, (OpVT (OpNode VR128:$src1, VR128:$src2)))]> {
let isCommutable = Commutable;
}
- def rm : PDI<opc, MRMSrcMem, (outs VR128:$dst), (ins VR128:$src1,
- i128mem:$src2),
+ def rm : PDI<opc, MRMSrcMem, (outs VR128:$dst),
+ (ins VR128:$src1, i128mem:$src2),
!strconcat(OpcodeStr, "\t{$src2, $dst|$dst, $src2}"),
[(set VR128:$dst, (OpVT (OpNode VR128:$src1,
(bitconvert (memopv2i64 addr:$src2)))))]>;
@@ -1909,16 +1947,16 @@ multiclass PDI_binop_rm<bits<8> opc, string OpcodeStr, SDNode OpNode,
multiclass PDI_binop_rm_v2i64<bits<8> opc, string OpcodeStr, SDNode OpNode,
bit Commutable = 0> {
def rr : PDI<opc, MRMSrcReg, (outs VR128:$dst),
- (ins VR128:$src1, VR128:$src2),
+ (ins VR128:$src1, VR128:$src2),
!strconcat(OpcodeStr, "\t{$src2, $dst|$dst, $src2}"),
[(set VR128:$dst, (v2i64 (OpNode VR128:$src1, VR128:$src2)))]> {
let isCommutable = Commutable;
}
def rm : PDI<opc, MRMSrcMem, (outs VR128:$dst),
- (ins VR128:$src1, i128mem:$src2),
+ (ins VR128:$src1, i128mem:$src2),
!strconcat(OpcodeStr, "\t{$src2, $dst|$dst, $src2}"),
[(set VR128:$dst, (OpNode VR128:$src1,
- (memopv2i64 addr:$src2)))]>;
+ (memopv2i64 addr:$src2)))]>;
}
} // Constraints = "$src1 = $dst"
@@ -2455,6 +2493,13 @@ def : Pat<(v2i64 (X86vzmovl (bc_v2i64 (loadv4i32 addr:$src)))),
(MOVZPQILo2PQIrm addr:$src)>;
}
+// Instructions for the disassembler
+// xr = XMM register
+// xm = mem64
+
+def MOVQxrxr : I<0x7E, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src),
+ "movq\t{$src, $dst|$dst, $src}", []>, XS;
+
//===---------------------------------------------------------------------===//
// SSE3 Instructions
//===---------------------------------------------------------------------===//
@@ -3661,7 +3706,7 @@ let Constraints = "$src1 = $dst" in {
"\t{$src3, $src2, $dst|$dst, $src2, $src3}"),
[(set VR128:$dst,
(X86insrtps VR128:$src1, VR128:$src2, imm:$src3))]>,
- OpSize;
+ OpSize;
def rm : SS4AIi8<opc, MRMSrcMem, (outs VR128:$dst),
(ins VR128:$src1, f32mem:$src2, i32i8imm:$src3),
!strconcat(OpcodeStr,
@@ -3786,76 +3831,63 @@ let Constraints = "$src1 = $dst" in {
// String/text processing instructions.
let Defs = [EFLAGS], usesCustomInserter = 1 in {
def PCMPISTRM128REG : SS42AI<0, Pseudo, (outs VR128:$dst),
- (ins VR128:$src1, VR128:$src2, i8imm:$src3),
- "#PCMPISTRM128rr PSEUDO!",
- [(set VR128:$dst,
- (int_x86_sse42_pcmpistrm128 VR128:$src1, VR128:$src2,
- imm:$src3))]>, OpSize;
+ (ins VR128:$src1, VR128:$src2, i8imm:$src3),
+ "#PCMPISTRM128rr PSEUDO!",
+ [(set VR128:$dst, (int_x86_sse42_pcmpistrm128 VR128:$src1, VR128:$src2,
+ imm:$src3))]>, OpSize;
def PCMPISTRM128MEM : SS42AI<0, Pseudo, (outs VR128:$dst),
- (ins VR128:$src1, i128mem:$src2, i8imm:$src3),
- "#PCMPISTRM128rm PSEUDO!",
- [(set VR128:$dst,
- (int_x86_sse42_pcmpistrm128 VR128:$src1,
- (load addr:$src2),
- imm:$src3))]>, OpSize;
+ (ins VR128:$src1, i128mem:$src2, i8imm:$src3),
+ "#PCMPISTRM128rm PSEUDO!",
+ [(set VR128:$dst, (int_x86_sse42_pcmpistrm128 VR128:$src1, (load addr:$src2),
+ imm:$src3))]>, OpSize;
}
let Defs = [XMM0, EFLAGS] in {
def PCMPISTRM128rr : SS42AI<0x62, MRMSrcReg, (outs),
- (ins VR128:$src1, VR128:$src2, i8imm:$src3),
- "pcmpistrm\t{$src3, $src2, $src1|$src1, $src2, $src3}",
- []>, OpSize;
+ (ins VR128:$src1, VR128:$src2, i8imm:$src3),
+ "pcmpistrm\t{$src3, $src2, $src1|$src1, $src2, $src3}", []>, OpSize;
def PCMPISTRM128rm : SS42AI<0x62, MRMSrcMem, (outs),
- (ins VR128:$src1, i128mem:$src2, i8imm:$src3),
- "pcmpistrm\t{$src3, $src2, $src1|$src1, $src2, $src3}",
- []>, OpSize;
+ (ins VR128:$src1, i128mem:$src2, i8imm:$src3),
+ "pcmpistrm\t{$src3, $src2, $src1|$src1, $src2, $src3}", []>, OpSize;
}
-let Defs = [EFLAGS], Uses = [EAX, EDX],
- usesCustomInserter = 1 in {
+let Defs = [EFLAGS], Uses = [EAX, EDX], usesCustomInserter = 1 in {
def PCMPESTRM128REG : SS42AI<0, Pseudo, (outs VR128:$dst),
- (ins VR128:$src1, VR128:$src3, i8imm:$src5),
- "#PCMPESTRM128rr PSEUDO!",
- [(set VR128:$dst,
- (int_x86_sse42_pcmpestrm128 VR128:$src1, EAX,
- VR128:$src3,
- EDX, imm:$src5))]>, OpSize;
+ (ins VR128:$src1, VR128:$src3, i8imm:$src5),
+ "#PCMPESTRM128rr PSEUDO!",
+ [(set VR128:$dst,
+ (int_x86_sse42_pcmpestrm128
+ VR128:$src1, EAX, VR128:$src3, EDX, imm:$src5))]>, OpSize;
+
def PCMPESTRM128MEM : SS42AI<0, Pseudo, (outs VR128:$dst),
- (ins VR128:$src1, i128mem:$src3, i8imm:$src5),
- "#PCMPESTRM128rm PSEUDO!",
- [(set VR128:$dst,
- (int_x86_sse42_pcmpestrm128 VR128:$src1, EAX,
- (load addr:$src3),
- EDX, imm:$src5))]>, OpSize;
+ (ins VR128:$src1, i128mem:$src3, i8imm:$src5),
+ "#PCMPESTRM128rm PSEUDO!",
+ [(set VR128:$dst, (int_x86_sse42_pcmpestrm128
+ VR128:$src1, EAX, (load addr:$src3), EDX, imm:$src5))]>,
+ OpSize;
}
let Defs = [XMM0, EFLAGS], Uses = [EAX, EDX] in {
def PCMPESTRM128rr : SS42AI<0x60, MRMSrcReg, (outs),
- (ins VR128:$src1, VR128:$src3, i8imm:$src5),
- "pcmpestrm\t{$src5, $src3, $src1|$src1, $src3, $src5}",
- []>, OpSize;
+ (ins VR128:$src1, VR128:$src3, i8imm:$src5),
+ "pcmpestrm\t{$src5, $src3, $src1|$src1, $src3, $src5}", []>, OpSize;
def PCMPESTRM128rm : SS42AI<0x60, MRMSrcMem, (outs),
- (ins VR128:$src1, i128mem:$src3, i8imm:$src5),
- "pcmpestrm\t{$src5, $src3, $src1|$src1, $src3, $src5}",
- []>, OpSize;
+ (ins VR128:$src1, i128mem:$src3, i8imm:$src5),
+ "pcmpestrm\t{$src5, $src3, $src1|$src1, $src3, $src5}", []>, OpSize;
}
let Defs = [ECX, EFLAGS] in {
multiclass SS42AI_pcmpistri<Intrinsic IntId128> {
- def rr : SS42AI<0x63, MRMSrcReg, (outs),
- (ins VR128:$src1, VR128:$src2, i8imm:$src3),
- "pcmpistri\t{$src3, $src2, $src1|$src1, $src2, $src3}",
- [(set ECX,
- (IntId128 VR128:$src1, VR128:$src2, imm:$src3)),
- (implicit EFLAGS)]>,
- OpSize;
+ def rr : SS42AI<0x63, MRMSrcReg, (outs),
+ (ins VR128:$src1, VR128:$src2, i8imm:$src3),
+ "pcmpistri\t{$src3, $src2, $src1|$src1, $src2, $src3}",
+ [(set ECX, (IntId128 VR128:$src1, VR128:$src2, imm:$src3)),
+ (implicit EFLAGS)]>, OpSize;
def rm : SS42AI<0x63, MRMSrcMem, (outs),
- (ins VR128:$src1, i128mem:$src2, i8imm:$src3),
- "pcmpistri\t{$src3, $src2, $src1|$src1, $src2, $src3}",
- [(set ECX,
- (IntId128 VR128:$src1, (load addr:$src2), imm:$src3)),
- (implicit EFLAGS)]>,
- OpSize;
+ (ins VR128:$src1, i128mem:$src2, i8imm:$src3),
+ "pcmpistri\t{$src3, $src2, $src1|$src1, $src2, $src3}",
+ [(set ECX, (IntId128 VR128:$src1, (load addr:$src2), imm:$src3)),
+ (implicit EFLAGS)]>, OpSize;
}
}
@@ -3870,20 +3902,16 @@ let Defs = [ECX, EFLAGS] in {
let Uses = [EAX, EDX] in {
multiclass SS42AI_pcmpestri<Intrinsic IntId128> {
def rr : SS42AI<0x61, MRMSrcReg, (outs),
- (ins VR128:$src1, VR128:$src3, i8imm:$src5),
- "pcmpestri\t{$src5, $src3, $src1|$src1, $src3, $src5}",
- [(set ECX,
- (IntId128 VR128:$src1, EAX, VR128:$src3, EDX, imm:$src5)),
- (implicit EFLAGS)]>,
- OpSize;
+ (ins VR128:$src1, VR128:$src3, i8imm:$src5),
+ "pcmpestri\t{$src5, $src3, $src1|$src1, $src3, $src5}",
+ [(set ECX, (IntId128 VR128:$src1, EAX, VR128:$src3, EDX, imm:$src5)),
+ (implicit EFLAGS)]>, OpSize;
def rm : SS42AI<0x61, MRMSrcMem, (outs),
- (ins VR128:$src1, i128mem:$src3, i8imm:$src5),
- "pcmpestri\t{$src5, $src3, $src1|$src1, $src3, $src5}",
- [(set ECX,
- (IntId128 VR128:$src1, EAX, (load addr:$src3),
- EDX, imm:$src5)),
- (implicit EFLAGS)]>,
- OpSize;
+ (ins VR128:$src1, i128mem:$src3, i8imm:$src5),
+ "pcmpestri\t{$src5, $src3, $src1|$src1, $src3, $src5}",
+ [(set ECX,
+ (IntId128 VR128:$src1, EAX, (load addr:$src3), EDX, imm:$src5)),
+ (implicit EFLAGS)]>, OpSize;
}
}
}
diff --git a/lib/Target/X86/X86RegisterInfo.td b/lib/Target/X86/X86RegisterInfo.td
index 7bf074d499..6db0cc3057 100644
--- a/lib/Target/X86/X86RegisterInfo.td
+++ b/lib/Target/X86/X86RegisterInfo.td
@@ -195,6 +195,36 @@ let Namespace = "X86" in {
def ES : Register<"es">;
def FS : Register<"fs">;
def GS : Register<"gs">;
+
+ // Debug registers
+ def DR0 : Register<"dr0">;
+ def DR1 : Register<"dr1">;
+ def DR2 : Register<"dr2">;
+ def DR3 : Register<"dr3">;
+ def DR4 : Register<"dr4">;
+ def DR5 : Register<"dr5">;
+ def DR6 : Register<"dr6">;
+ def DR7 : Register<"dr7">;
+
+ // Control registers
+ def ECR0 : Register<"ecr0">;
+ def ECR1 : Register<"ecr1">;
+ def ECR2 : Register<"ecr2">;
+ def ECR3 : Register<"ecr3">;
+ def ECR4 : Register<"ecr4">;
+ def ECR5 : Register<"ecr5">;
+ def ECR6 : Register<"ecr6">;
+ def ECR7 : Register<"ecr7">;
+
+ def RCR0 : Register<"rcr0">;
+ def RCR1 : Register<"rcr1">;
+ def RCR2 : Register<"rcr2">;
+ def RCR3 : Register<"rcr3">;
+ def RCR4 : Register<"rcr4">;
+ def RCR5 : Register<"rcr5">;
+ def RCR6 : Register<"rcr6">;
+ def RCR7 : Register<"rcr7">;
+ def RCR8 : Register<"rcr8">;
}
@@ -446,6 +476,22 @@ def GR64 : RegisterClass<"X86", [i64], 64,
def SEGMENT_REG : RegisterClass<"X86", [i16], 16, [CS, DS, SS, ES, FS, GS]> {
}
+// Debug registers.
+def DEBUG_REG : RegisterClass<"X86", [i32], 32,
+ [DR0, DR1, DR2, DR3, DR4, DR5, DR6, DR7]> {
+}
+
+// Control registers.
+def CONTROL_REG_32 : RegisterClass<"X86", [i32], 32,
+ [ECR0, ECR1, ECR2, ECR3, ECR4, ECR5, ECR6,
+ ECR7]> {
+}
+
+def CONTROL_REG_64 : RegisterClass<"X86", [i64], 64,
+ [RCR0, RCR1, RCR2, RCR3, RCR4, RCR5, RCR6,
+ RCR7, RCR8]> {
+}
+
// GR8_ABCD_L, GR8_ABCD_H, GR16_ABCD, GR32_ABCD, GR64_ABCD - Subclasses of
// GR8, GR16, GR32, and GR64 which contain just the "a" "b", "c", and "d"
// registers. On x86-32, GR16_ABCD and GR32_ABCD are classes for registers
@@ -661,7 +707,8 @@ def GR64_NOREX_NOSP : RegisterClass<"X86", [i64], 64,
}];
let MethodBodies = [{
GR64_NOREX_NOSPClass::iterator
- GR64_NOREX_NOSPClass::allocation_order_end(const MachineFunction &MF) const {
+ GR64_NOREX_NOSPClass::allocation_order_end(const MachineFunction &MF) const
+ {
const TargetMachine &TM = MF.getTarget();
const TargetRegisterInfo *RI = TM.getRegisterInfo();
// Does the function dedicate RBP to being a frame ptr?
diff --git a/test/CodeGen/X86/2009-11-04-SubregCoalescingBug.ll b/test/CodeGen/X86/2009-11-04-SubregCoalescingBug.ll
index d84b63a21b..628b8993f3 100644
--- a/test/CodeGen/X86/2009-11-04-SubregCoalescingBug.ll
+++ b/test/CodeGen/X86/2009-11-04-SubregCoalescingBug.ll
@@ -5,7 +5,7 @@ define void @bar(i32 %b, i32 %a) nounwind optsize ssp {
entry:
; CHECK: leal 15(%rsi), %edi
; CHECK-NOT: movl
-; CHECK: call _foo
+; CHECK: callq _foo
%0 = add i32 %a, 15 ; <i32> [#uses=1]
%1 = zext i32 %0 to i64 ; <i64> [#uses=1]
tail call void @foo(i64 %1) nounwind
diff --git a/test/CodeGen/X86/abi-isel.ll b/test/CodeGen/X86/abi-isel.ll
index 6d7b2d4343..6cc1518336 100644
--- a/test/CodeGen/X86/abi-isel.ll
+++ b/test/CodeGen/X86/abi-isel.ll
@@ -8365,13 +8365,13 @@ entry:
tail call void @x() nounwind
ret void
; LINUX-64-STATIC: lcallee:
-; LINUX-64-STATIC: call x
-; LINUX-64-STATIC: call x
-; LINUX-64-STATIC: call x
-; LINUX-64-STATIC: call x
-; LINUX-64-STATIC: call x
-; LINUX-64-STATIC: call x
-; LINUX-64-STATIC: call x
+; LINUX-64-STATIC: callq x
+; LINUX-64-STATIC: callq x
+; LINUX-64-STATIC: callq x
+; LINUX-64-STATIC: callq x
+; LINUX-64-STATIC: callq x
+; LINUX-64-STATIC: callq x
+; LINUX-64-STATIC: callq x
; LINUX-64-STATIC: ret
; LINUX-32-STATIC: lcallee:
@@ -8400,13 +8400,13 @@ entry:
; LINUX-64-PIC: lcallee:
; LINUX-64-PIC: subq $8, %rsp
-; LINUX-64-PIC-NEXT: call x@PLT
-; LINUX-64-PIC-NEXT: call x@PLT
-; LINUX-64-PIC-NEXT: call x@PLT
-; LINUX-64-PIC-NEXT: call x@PLT
-; LINUX-64-PIC-NEXT: call x@PLT
-; LINUX-64-PIC-NEXT: call x@PLT
-; LINUX-64-PIC-NEXT: call x@PLT
+; LINUX-64-PIC-NEXT: callq x@PLT
+; LINUX-64-PIC-NEXT: callq x@PLT
+; LINUX-64-PIC-NEXT: callq x@PLT
+; LINUX-64-PIC-NEXT: callq x@PLT
+; LINUX-64-PIC-NEXT: callq x@PLT
+; LINUX-64-PIC-NEXT: callq x@PLT
+; LINUX-64-PIC-NEXT: callq x@PLT
; LINUX-64-PIC-NEXT: addq $8, %rsp
; LINUX-64-PIC-NEXT: ret
@@ -8448,37 +8448,37 @@ entry:
; DARWIN-64-STATIC: _lcallee:
; DARWIN-64-STATIC: subq $8, %rsp
-; DARWIN-64-STATIC-NEXT: call _x
-; DARWIN-64-STATIC-NEXT: call _x
-; DARWIN-64-STATIC-NEXT: call _x
-; DARWIN-64-STATIC-NEXT: call _x
-; DARWIN-64-STATIC-NEXT: call _x
-; DARWIN-64-STATIC-NEXT: call _x
-; DARWIN-64-STATIC-NEXT: call _x
+; DARWIN-64-STATIC-NEXT: callq _x
+; DARWIN-64-STATIC-NEXT: callq _x
+; DARWIN-64-STATIC-NEXT: callq _x
+; DARWIN-64-STATIC-NEXT: callq _x
+; DARWIN-64-STATIC-NEXT: callq _x
+; DARWIN-64-STATIC-NEXT: callq _x
+; DARWIN-64-STATIC-NEXT: callq _x
; DARWIN-64-STATIC-NEXT: addq $8, %rsp
; DARWIN-64-STATIC-NEXT: ret
; DARWIN-64-DYNAMIC: _lcallee:
; DARWIN-64-DYNAMIC: subq $8, %rsp
-; DARWIN-64-DYNAMIC-NEXT: call _x
-; DARWIN-64-DYNAMIC-NEXT: call _x
-; DARWIN-64-DYNAMIC-NEXT: call _x
-; DARWIN-64-DYNAMIC-NEXT: call _x
-; DARWIN-64-DYNAMIC-NEXT: call _x
-; DARWIN-64-DYNAMIC-NEXT: call _x
-; DARWIN-64-DYNAMIC-NEXT: call _x
+; DARWIN-64-DYNAMIC-NEXT: callq _x
+; DARWIN-64-DYNAMIC-NEXT: callq _x
+; DARWIN-64-DYNAMIC-NEXT: callq _x
+; DARWIN-64-DYNAMIC-NEXT: callq _x
+; DARWIN-64-DYNAMIC-NEXT: callq _x
+; DARWIN-64-DYNAMIC-NEXT: callq _x
+; DARWIN-64-DYNAMIC-NEXT: callq _x
; DARWIN-64-DYNAMIC-NEXT: addq $8, %rsp
; DARWIN-64-DYNAMIC-NEXT: ret
; DARWIN-64-PIC: _lcallee:
; DARWIN-64-PIC: subq $8, %rsp
-; DARWIN-64-PIC-NEXT: call _x
-; DARWIN-64-PIC-NEXT: call _x
-; DARWIN-64-PIC-NEXT: call _x
-; DARWIN-64-PIC-NEXT: call _x
-; DARWIN-64-PIC-NEXT: call _x
-; DARWIN-64-PIC-NEXT: call _x
-; DARWIN-64-PIC-NEXT: call _x
+; DARWIN-64-PIC-NEXT: callq _x
+; DARWIN-64-PIC-NEXT: callq _x
+; DARWIN-64-PIC-NEXT: callq _x
+; DARWIN-64-PIC-NEXT: callq _x
+; DARWIN-64-PIC-NEXT: callq _x
+; DARWIN-64-PIC-NEXT: callq _x
+; DARWIN-64-PIC-NEXT: callq _x
; DARWIN-64-PIC-NEXT: addq $8, %rsp
; DARWIN-64-PIC-NEXT: ret
}
@@ -8496,13 +8496,13 @@ entry:
tail call void @y() nounwind
ret void
; LINUX-64-STATIC: dcallee:
-; LINUX-64-STATIC: call y
-; LINUX-64-STATIC: call y
-; LINUX-64-STATIC: call y
-; LINUX-64-STATIC: call y
-; LINUX-64-STATIC: call y
-; LINUX-64-STATIC: call y
-; LINUX-64-STATIC: call y
+; LINUX-64-STATIC: callq y
+; LINUX-64-STATIC: callq y
+; LINUX-64-STATIC: callq y
+; LINUX-64-STATIC: callq y
+; LINUX-64-STATIC: callq y
+; LINUX-64-STATIC: callq y
+; LINUX-64-STATIC: callq y
; LINUX-64-STATIC: ret
; LINUX-32-STATIC: dcallee:
@@ -8531,13 +8531,13 @@ entry:
; LINUX-64-PIC: dcallee:
; LINUX-64-PIC: subq $8, %rsp
-; LINUX-64-PIC-NEXT: call y@PLT
-; LINUX-64-PIC-NEXT: call y@PLT
-; LINUX-64-PIC-NEXT: call y@PLT
-; LINUX-64-PIC-NEXT: call y@PLT
-; LINUX-64-PIC-NEXT: call y@PLT
-; LINUX-64-PIC-NEXT: call y@PLT
-; LINUX-64-PIC-NEXT: call y@PLT
+; LINUX-64-PIC-NEXT: callq y@PLT
+; LINUX-64-PIC-NEXT: callq y@PLT
+; LINUX-64-PIC-NEXT: callq y@PLT
+; LINUX-64-PIC-NEXT: callq y@PLT
+; LINUX-64-PIC-NEXT: callq y@PLT
+; LINUX-64-PIC-NEXT: callq y@PLT
+; LINUX-64-PIC-NEXT: callq y@PLT
; LINUX-64-PIC-NEXT: addq $8, %rsp
; LINUX-64-PIC-NEXT: ret
@@ -8579,37 +8579,37 @@ entry:
; DARWIN-64-STATIC: _dcallee:
; DARWIN-64-STATIC: subq $8, %rsp
-; DARWIN-64-STATIC-NEXT: call _y
-; DARWIN-64-STATIC-NEXT: call _y
-; DARWIN-64-STATIC-NEXT: call _y
-; DARWIN-64-STATIC-NEXT: call _y
-; DARWIN-64-STATIC-NEXT: call _y
-; DARWIN-64-STATIC-NEXT: call _y
-; DARWIN-64-STATIC-NEXT: call _y
+; DARWIN-64-STATIC-NEXT: callq _y
+; DARWIN-64-STATIC-NEXT: callq _y
+; DARWIN-64-STATIC-NEXT: callq _y
+; DARWIN-64-STATIC-NEXT: callq _y
+; DARWIN-64-STATIC-NEXT: callq _y
+; DARWIN-64-STATIC-NEXT: callq _y
+; DARWIN-64-STATIC-NEXT: callq _y
; DARWIN-64-STATIC-NEXT: addq $8, %rsp
; DARWIN-64-STATIC-NEXT: ret
; DARWIN-64-DYNAMIC: _dcallee:
; DARWIN-64-DYNAMIC: subq $8, %rsp
-; DARWIN-64-DYNAMIC-NEXT: call _y
-; DARWIN-64-DYNAMIC-NEXT: call _y
-; DARWIN-64-DYNAMIC-NEXT: call _y
-; DARWIN-64-DYNAMIC-NEXT: call _y
-; DARWIN-64-DYNAMIC-NEXT: call _y
-; DARWIN-64-DYNAMIC-NEXT: call _y
-; DARWIN-64-DYNAMIC-NEXT: call _y
+; DARWIN-64-DYNAMIC-NEXT: callq _y
+; DARWIN-64-DYNAMIC-NEXT: callq _y
+; DARWIN-64-DYNAMIC-NEXT: callq _y
+; DARWIN-64-DYNAMIC-NEXT: callq _y
+; DARWIN-64-DYNAMIC-NEXT: callq _y
+; DARWIN-64-DYNAMIC-NEXT: callq _y
+; DARWIN-64-DYNAMIC-NEXT: callq _y
; DARWIN-64-DYNAMIC-NEXT: addq $8, %rsp
; DARWIN-64-DYNAMIC-NEXT: ret
; DARWIN-64-PIC: _dcallee:
; DARWIN-64-PIC: subq $8, %rsp
-; DARWIN-64-PIC-NEXT: call _y
-; DARWIN-64-PIC-NEXT: call _y
-; DARWIN-64-PIC-NEXT: call _y
-; DARWIN-64-PIC-NEXT: call _y
-; DARWIN-64-PIC-NEXT: call _y
-; DARWIN-64-PIC-NEXT: call _y
-; DARWIN-64-PIC-NEXT: call _y
+; DARWIN-64-PIC-NEXT: callq _y
+; DARWIN-64-PIC-NEXT: callq _y
+; DARWIN-64-PIC-NEXT: callq _y
+; DARWIN-64-PIC-NEXT: callq _y
+; DARWIN-64-PIC-NEXT: callq _y
+; DARWIN-64-PIC-NEXT: callq _y
+; DARWIN-64-PIC-NEXT: callq _y
; DARWIN-64-PIC-NEXT: addq $8, %rsp
; DARWIN-64-PIC-NEXT: ret
}
@@ -8765,8 +8765,8 @@ entry:
tail call void @callee() nounwind
ret void
; LINUX-64-STATIC: caller:
-; LINUX-64-STATIC: call callee
-; LINUX-64-STATIC: call callee
+; LINUX-64-STATIC: callq callee
+; LINUX-64-STATIC: callq callee
; LINUX-64-STATIC: ret
; LINUX-32-STATIC: caller:
@@ -8785,8 +8785,8 @@ entry:
; LINUX-64-PIC: caller:
; LINUX-64-PIC: subq $8, %rsp
-; LINUX-64-PIC-NEXT: call callee@PLT
-; LINUX-64-PIC-NEXT: call callee@PLT
+; LINUX-64-PIC-NEXT: callq callee@PLT
+; LINUX-64-PIC-NEXT: callq callee@PLT
; LINUX-64-PIC-NEXT: addq $8, %rsp
; LINUX-64-PIC-NEXT: ret
@@ -8813,22 +8813,22 @@ entry:
; DARWIN-64-STATIC: _caller:
; DARWIN-64-STATIC: subq $8, %rsp
-; DARWIN-64-STATIC-NEXT: call _callee
-; DARWIN-64-STATIC-NEXT: call _callee
+; DARWIN-64-STATIC-NEXT: callq _callee
+; DARWIN-64-STATIC-NEXT: callq _callee
; DARWIN-64-STATIC-NEXT: addq $8, %rsp
; DARWIN-64-STATIC-NEXT: ret
; DARWIN-64-DYNAMIC: _caller:
; DARWIN-64-DYNAMIC: subq $8, %rsp
-; DARWIN-64-DYNAMIC-NEXT: call _callee
-; DARWIN-64-DYNAMIC-NEXT: call _callee
+; DARWIN-64-DYNAMIC-NEXT: callq _callee
+; DARWIN-64-DYNAMIC-NEXT: callq _callee
; DARWIN-64-DYNAMIC-NEXT: addq $8, %rsp
; DARWIN-64-DYNAMIC-NEXT: ret
; DARWIN-64-PIC: _caller:
; DARWIN-64-PIC: subq $8, %rsp
-; DARWIN-64-PIC-NEXT: call _callee
-; DARWIN-64-PIC-NEXT: call _callee
+; DARWIN-64-PIC-NEXT: callq _callee
+; DARWIN-64-PIC-NEXT: callq _callee
; DARWIN-64-PIC-NEXT: addq $8, %rsp
; DARWIN-64-PIC-NEXT: ret
}
@@ -8839,8 +8839,8 @@ entry:
tail call void @dcallee() nounwind
ret void
; LINUX-64-STATIC: dcaller:
-; LINUX-64-STATIC: call dcallee
-; LINUX-64-STATIC: call dcallee
+; LINUX-64-STATIC: callq dcallee
+; LINUX-64-STATIC: callq dcallee
; LINUX-64-STATIC: ret
; LINUX-32-STATIC: dcaller:
@@ -8859,8 +8859,8 @@ entry:
; LINUX-64-PIC: dcaller:
; LINUX-64-PIC: subq $8, %rsp
-; LINUX-64-PIC-NEXT: call dcallee
-; LINUX-64-PIC-NEXT: call dcallee
+; LINUX-64-PIC-NEXT: callq dcallee
+; LINUX-64-PIC-NEXT: callq dcallee
; LINUX-64-PIC-NEXT: addq $8, %rsp
; LINUX-64-PIC-NEXT: ret
@@ -8887,22 +8887,22 @@ entry:
; DARWIN-64-STATIC: _dcaller:
; DARWIN-64-STATIC: subq $8, %rsp
-; DARWIN-64-STATIC-NEXT: call _dcallee
-; DARWIN-64-STATIC-NEXT: call _dcallee
+; DARWIN-64-STATIC-NEXT: callq _dcallee
+; DARWIN-64-STATIC-NEXT: callq _dcallee
; DARWIN-64-STATIC-NEXT: addq $8, %rsp
; DARWIN-64-STATIC-NEXT: ret
; DARWIN-64-DYNAMIC: _dcaller:
; DARWIN-64-DYNAMIC: subq $8, %rsp
-; DARWIN-64-DYNAMIC-NEXT: call _dcallee
-; DARWIN-64-DYNAMIC-NEXT: call _dcallee
+; DARWIN-64-DYNAMIC-NEXT: callq _dcallee
+; DARWIN-64-DYNAMIC-NEXT: callq _dcallee
; DARWIN-64-DYNAMIC-NEXT: addq $8, %rsp
; DARWIN-64-DYNAMIC-NEXT: ret
; DARWIN-64-PIC: _dcaller:
; DARWIN-64-PIC: subq $8, %rsp
-; DARWIN-64-PIC-NEXT: call _dcallee
-; DARWIN-64-PIC-NEXT: call _dcallee
+; DARWIN-64-PIC-NEXT: callq _dcallee
+; DARWIN-64-PIC-NEXT: callq _dcallee
; DARWIN-64-PIC-NEXT: addq $8, %rsp
; DARWIN-64-PIC-NEXT: ret
}
@@ -8913,8 +8913,8 @@ entry:
tail call void @lcallee() nounwind
ret void
; LINUX-64-STATIC: lcaller:
-; LINUX-64-STATIC: call lcallee
-; LINUX-64-STATIC: call lcallee
+; LINUX-64-STATIC: callq lcallee
+; LINUX-64-STATIC: callq lcallee
; LINUX-64-STATIC: ret
; LINUX-32-STATIC: lcaller:
@@ -8933,8 +8933,8 @@ entry:
; LINUX-64-PIC: lcaller:
; LINUX-64-PIC: subq $8, %rsp
-; LINUX-64-PIC-NEXT: call lcallee@PLT
-; LINUX-64-PIC-NEXT: call lcallee@PLT
+; LINUX-64-PIC-NEXT: callq lcallee@PLT
+; LINUX-64-PIC-NEXT: callq lcallee@PLT
; LINUX-64-PIC-NEXT: addq $8, %rsp
; LINUX-64-PIC-NEXT: ret
@@ -8961,22 +8961,22 @@ entry:
; DARWIN-64-STATIC: _lcaller:
; DARWIN-64-STATIC: subq $8, %rsp
-; DARWIN-64-STATIC-NEXT: call _lcallee
-; DARWIN-64-STATIC-NEXT: call _lcallee
+; DARWIN-64-STATIC-NEXT: callq _lcallee
+; DARWIN-64-STATIC-NEXT: callq _lcallee
; DARWIN-64-STATIC-NEXT: addq $8, %rsp
; DARWIN-64-STATIC-NEXT: ret
; DARWIN-64-DYNAMIC: _lcaller:
; DARWIN-64-DYNAMIC: subq $8, %rsp
-; DARWIN-64-DYNAMIC-NEXT: call _lcallee
-; DARWIN-64-DYNAMIC-NEXT: call _lcallee
+; DARWIN-64-DYNAMIC-NEXT: callq _lcallee
+; DARWIN-64-DYNAMIC-NEXT: callq _lcallee
; DARWIN-64-DYNAMIC-NEXT: addq $8, %rsp
; DARWIN-64-DYNAMIC-NEXT: ret
; DARWIN-64-PIC: _lcaller:
; DARWIN-64-PIC: subq $8, %rsp
-; DARWIN-64-PIC-NEXT: call _lcallee
-; DARWIN-64-PIC-NEXT: call _lcallee
+; DARWIN-64-PIC-NEXT: callq _lcallee
+; DARWIN-64-PIC-NEXT: callq _lcallee
; DARWIN-64-PIC-NEXT: addq $8, %rsp
; DARWIN-64-PIC-NEXT: ret
}
@@ -8986,7 +8986,7 @@ entry:
tail call void @callee() nounwind
ret void
; LINUX-64-STATIC: tailcaller:
-; LINUX-64-STATIC: call callee
+; LINUX-64-STATIC: callq callee
; LINUX-64-STATIC: ret
; LINUX-32-STATIC: tailcaller:
@@ -9003,7 +9003,7 @@ entry:
; LINUX-64-PIC: tailcaller:
; LINUX-64-PIC: subq $8, %rsp
-; LINUX-64-PIC-NEXT: call callee@PLT
+; LINUX-64-PIC-NEXT: callq callee@PLT
; LINUX-64-PIC-NEXT: addq $8, %rsp
; LINUX-64-PIC-NEXT: ret
@@ -9027,19 +9027,19 @@ entry:
; DARWIN-64-STATIC: _tailcaller:
; DARWIN-64-STATIC: subq $8, %rsp
-; DARWIN-64-STATIC-NEXT: call _callee
+; DARWIN-64-STATIC-NEXT: callq _callee
; DARWIN-64-STATIC-NEXT: addq $8, %rsp
; DARWIN-64-STATIC-NEXT: ret
; DARWIN-64-DYNAMIC: _tailcaller:
; DARWIN-64-DYNAMIC: subq $8, %rsp
-; DARWIN-64-DYNAMIC-NEXT: call _callee
+; DARWIN-64-DYNAMIC-NEXT: callq _callee
; DARWIN-64-DYNAMIC-NEXT: addq $8, %rsp
; DARWIN-64-DYNAMIC-NEXT: ret
; DARWIN-64-PIC: _tailcaller:
; DARWIN-64-PIC: subq $8, %rsp
-; DARWIN-64-PIC-NEXT: call _callee
+; DARWIN-64-PIC-NEXT: callq _callee
; DARWIN-64-PIC-NEXT: addq $8, %rsp
; DARWIN-64-PIC-NEXT: ret
}
@@ -9049,7 +9049,7 @@ entry:
tail call void @dcallee() nounwind
ret void
; LINUX-64-STATIC: dtailcaller:
-; LINUX-64-STATIC: call dcallee
+; LINUX-64-STATIC: callq dcallee
; LINUX-64-STATIC: ret
; LINUX-32-STATIC: dtailcaller:
@@ -9066,7 +9066,7 @@ entry:
; LINUX-64-PIC: dtailcaller:
; LINUX-64-PIC: subq $8, %rsp
-; LINUX-64-PIC-NEXT: call dcallee
+; LINUX-64-PIC-NEXT: callq dcallee
; LINUX-64-PIC-NEXT: addq $8, %rsp
; LINUX-64-PIC-NEXT: ret
@@ -9090,19 +9090,19 @@ entry:
; DARWIN-64-STATIC: _dtailcaller:
; DARWIN-64-STATIC: subq $8, %rsp
-; DARWIN-64-STATIC-NEXT: call _dcallee
+; DARWIN-64-STATIC-NEXT: callq _dcallee
; DARWIN-64-STATIC-NEXT: addq $8, %rsp
; DARWIN-64-STATIC-NEXT: ret
; DARWIN-64-DYNAMIC: _dtailcaller:
; DARWIN-64-DYNAMIC: subq $8, %rsp
-; DARWIN-64-DYNAMIC-NEXT: call _dcallee
+; DARWIN-64-DYNAMIC-NEXT: callq _dcallee
; DARWIN-64-DYNAMIC-NEXT: addq $8, %rsp
; DARWIN-64-DYNAMIC-NEXT: ret
; DARWIN-64-PIC: _dtailcaller:
; DARWIN-64-PIC: subq $8, %rsp
-; DARWIN-64-PIC-NEXT: call _dcallee
+; DARWIN-64-PIC-NEXT: callq _dcallee
; DARWIN-64-PIC-NEXT: addq $8, %rsp
; DARWIN-64-PIC-NEXT: ret
}
@@ -9112,7 +9112,7 @@ entry:
tail call void @lcallee() nounwind
ret void
; LINUX-64-STATIC: ltailcaller:
-; LINUX-64-STATIC: call lcallee
+; LINUX-64-STATIC: callq lcallee
; LINUX-64-STATIC: ret
; LINUX-32-STATIC: ltailcaller:
@@ -9129,7 +9129,7 @@ entry:
; LINUX-64-PIC: ltailcaller:
; LINUX-64-PIC: subq $8, %rsp
-; LINUX-64-PIC-NEXT: call lcallee@PLT
+; LINUX-64-PIC-NEXT: callq lcallee@PLT
; LINUX-64-PIC-NEXT: addq $8, %rsp
; LINUX-64-PIC-NEXT: ret
@@ -9153,19 +9153,19 @@ entry:
; DARWIN-64-STATIC: _ltailcaller:
; DARWIN-64-STATIC: subq $8, %rsp
-; DARWIN-64-STATIC-NEXT: call _lcallee
+; DARWIN-64-STATIC-NEXT: callq _lcallee
; DARWIN-64-STATIC-NEXT: addq $8, %rsp
; DARWIN-64-STATIC-NEXT: ret
; DARWIN-64-DYNAMIC: _ltailcaller:
; DARWIN-64-DYNAMIC: subq $8, %rsp
-; DARWIN-64-DYNAMIC-NEXT: call _lcallee
+; DARWIN-64-DYNAMIC-NEXT: callq _lcallee
; DARWIN-64-DYNAMIC-NEXT: addq $8, %rsp
; DARWIN-64-DYNAMIC-NEXT: ret
; DARWIN-64-PIC: _ltailcaller:
; DARWIN-64-PIC: subq $8, %rsp
-; DARWIN-64-PIC-NEXT: call _lcallee
+; DARWIN-64-PIC-NEXT: callq _lcallee
; DARWIN-64-PIC-NEXT: addq $8, %rsp
; DARWIN-64-PIC-NEXT: ret
}
@@ -9178,8 +9178,8 @@ entry:
tail call void %1() nounwind
ret void
; LINUX-64-STATIC: icaller:
-; LINUX-64-STATIC: call *ifunc
-; LINUX-64-STATIC: call *ifunc
+; LINUX-64-STATIC: callq *ifunc
+; LINUX-64-STATIC: callq *ifunc
; LINUX-64-STATIC: ret
; LINUX-32-STATIC: icaller:
@@ -9199,8 +9199,8 @@ entry:
; LINUX-64-PIC: icaller:
; LINUX-64-PIC: pushq %rbx
; LINUX-64-PIC-NEXT: movq ifunc@GOTPCREL(%rip), %rbx
-; LINUX-64-PIC-NEXT: call *(%rbx)
-; LINUX-64-PIC-NEXT: call *(%rbx)
+; LINUX-64-PIC-NEXT: callq *(%rbx)
+; LINUX-64-PIC-NEXT: callq *(%rbx)
; LINUX-64-PIC-NEXT: popq %rbx
; LINUX-64-PIC-NEXT: ret
@@ -9237,24 +9237,24 @@ entry:
; DARWIN-64-STATIC: _icaller:
; DARWIN-64-STATIC: pushq %rbx
; DARWIN-64-STATIC-NEXT: movq _ifunc@GOTPCREL(%rip), %rbx
-; DARWIN-64-STATIC-NEXT: call *(%rbx)
-; DARWIN-64-STATIC-NEXT: call *(%rbx)
+; DARWIN-64-STATIC-NEXT: callq *(%rbx)
+; DARWIN-64-STATIC-NEXT: callq *(%rbx)
; DARWIN-64-STATIC-NEXT: popq %rbx
; DARWIN-64-STATIC-NEXT: ret
; DARWIN-64-DYNAMIC: _icaller:
; DARWIN-64-DYNAMIC: pushq %rbx
; DARWIN-64-DYNAMIC-NEXT: movq _ifunc@GOTPCREL(%rip), %rbx
-; DARWIN-64-DYNAMIC-NEXT: call *(%rbx)
-; DARWIN-64-DYNAMIC-NEXT: call *(%rbx)
+; DARWIN-64-DYNAMIC-NEXT: callq *(%rbx)
+; DARWIN-64-DYNAMIC-NEXT: callq *(%rbx)
; DARWIN-64-DYNAMIC-NEXT: popq %rbx
; DARWIN-64-DYNAMIC-NEXT: ret
; DARWIN-64-PIC: _icaller:
; DARWIN-64-PIC: pushq %rbx
; DARWIN-64-PIC-NEXT: movq _ifunc@GOTPCREL(%rip), %rbx
-; DARWIN-64-PIC-NEXT: call *(%rbx)
-; DARWIN-64-PIC-NEXT: call *(%rbx)
+; DARWIN-64-PIC-NEXT: callq *(%rbx)
+; DARWIN-64-PIC-NEXT: callq *(%rbx)
; DARWIN-64-PIC-NEXT: popq %rbx
; DARWIN-64-PIC-NEXT: ret
}
@@ -9267,8 +9267,8 @@ entry:
tail call void %1() nounwind
ret void
; LINUX-64-STATIC: dicaller:
-; LINUX-64-STATIC: call *difunc
-; LINUX-64-STATIC: call *difunc
+; LINUX-64-STATIC: callq *difunc
+; LINUX-64-STATIC: callq *difunc
; LINUX-64-STATIC: ret
; LINUX-32-STATIC: dicaller:
@@ -9288,8 +9288,8 @@ entry:
; LINUX-64-PIC: dicaller:
; LINUX-64-PIC: pushq %rbx
; LINUX-64-PIC-NEXT: movq difunc@GOTPCREL(%rip), %rbx
-; LINUX-64-PIC-NEXT: call *(%rbx)
-; LINUX-64-PIC-NEXT: call *(%rbx)
+; LINUX-64-PIC-NEXT: callq *(%rbx)
+; LINUX-64-PIC-NEXT: callq *(%rbx)
; LINUX-64-PIC-NEXT: popq %rbx
; LINUX-64-PIC-NEXT: ret
@@ -9321,22 +9321,22 @@ entry:
; DARWIN-64-STATIC: _dicaller:
; DARWIN-64-STATIC: subq $8, %rsp
-; DARWIN-64-STATIC-NEXT: call *_difunc(%rip)
-; DARWIN-64-STATIC-NEXT: call *_difunc(%rip)
+; DARWIN-64-STATIC-NEXT: callq *_difunc(%rip)
+; DARWIN-64-STATIC-NEXT: callq *_difunc(%rip)
; DARWIN-64-STATIC-NEXT: addq $8, %rsp
; DARWIN-64-STATIC-NEXT: ret
; DARWIN-64-DYNAMIC: _dicaller:
; DARWIN-64-DYNAMIC: subq $8, %rsp
-; DARWIN-64-DYNAMIC-NEXT: call *_difunc(%rip)
-; DARWIN-64-DYNAMIC-NEXT: call *_difunc(%rip)
+; DARWIN-64-DYNAMIC-NEXT: callq *_difunc(%rip)
+; DARWIN-64-DYNAMIC-NEXT: callq *_difunc(%rip)
; DARWIN-64-DYNAMIC-NEXT: addq $8, %rsp
; DARWIN-64-DYNAMIC-NEXT: ret
; DARWIN-64-PIC: _dicaller:
; DARWIN-64-PIC: subq $8, %rsp
-; DARWIN-64-PIC-NEXT: call *_difunc(%rip)
-; DARWIN-64-PIC-NEXT: call *_difunc(%rip)
+; DARWIN-64-PIC-NEXT: callq *_difunc(%rip)
+; DARWIN-64-PIC-NEXT: callq *_difunc(%rip)
; DARWIN-64-PIC-NEXT: addq $8, %rsp
; DARWIN-64-PIC-NEXT: ret
}
@@ -9349,8 +9349,8 @@ entry:
tail call void %1() nounwind
ret void
; LINUX-64-STATIC: licaller:
-; LINUX-64-STATIC: call *lifunc
-; LINUX-64-STATIC: call *lifunc
+; LINUX-64-STATIC: callq *lifunc
+; LINUX-64-STATIC: callq *lifunc
; LINUX-64-STATIC: ret
; LINUX-32-STATIC: licaller:
@@ -9369,8 +9369,8 @@ entry:
; LINUX-64-PIC: licaller:
; LINUX-64-PIC: subq $8, %rsp
-; LINUX-64-PIC-NEXT: call *lifunc(%rip)
-; LINUX-64-PIC-NEXT: call *lifunc(%rip)
+; LINUX-64-PIC-NEXT: callq *lifunc(%rip)
+; LINUX-64-PIC-NEXT: callq *lifunc(%rip)
; LINUX-64-PIC-NEXT: addq $8, %rsp
; LINUX-64-PIC-NEXT: ret
@@ -9402,22 +9402,22 @@ entry:
; DARWIN-64-STATIC: _licaller:
; DARWIN-64-STATIC: subq $8, %rsp
-; DARWIN-64-STATIC-NEXT: call *_lifunc(%rip)
-; DARWIN-64-STATIC-NEXT: call *_lifunc(%rip)
+; DARWIN-64-STATIC-NEXT: callq *_lifunc(%rip)
+; DARWIN-64-STATIC-NEXT: callq *_lifunc(%rip)
; DARWIN-64-STATIC-NEXT: addq $8, %rsp
; DARWIN-64-STATIC-NEXT: ret
; DARWIN-64-DYNAMIC: _licaller:
; DARWIN-64-DYNAMIC: subq $8, %rsp
-; DARWIN-64-DYNAMIC-NEXT: call *_lifunc(%rip)
-; DARWIN-64-DYNAMIC-NEXT: call *_lifunc(%rip)
+; DARWIN-64-DYNAMIC-NEXT: callq *_lifunc(%rip)
+; DARWIN-64-DYNAMIC-NEXT: callq *_lifunc(%rip)
; DARWIN-64-DYNAMIC-NEXT: addq $8, %rsp
; DARWIN-64-DYNAMIC-NEXT: ret
; DARWIN-64-PIC: _licaller:
; DARWIN-64-PIC: subq $8, %rsp
-; DARWIN-64-PIC-NEXT: call *_lifunc(%rip)
-; DARWIN-64-PIC-NEXT: call *_lifunc(%rip)
+; DARWIN-64-PIC-NEXT: callq *_lifunc(%rip)
+; DARWIN-64-PIC-NEXT: callq *_lifunc(%rip)
; DARWIN-64-PIC-NEXT: addq $8, %rsp
; DARWIN-64-PIC-NEXT: ret
}
@@ -9430,8 +9430,8 @@ entry:
tail call void %1() nounwind
ret void
; LINUX-64-STATIC: itailcaller:
-; LINUX-64-STATIC: call *ifunc
-; LINUX-64-STATIC: call *ifunc
+; LINUX-64-STATIC: callq *ifunc
+; LINUX-64-STATIC: callq *ifunc
; LINUX-64-STATIC: ret
; LINUX-32-STATIC: itailcaller:
@@ -9451,8 +9451,8 @@ entry:
; LINUX-64-PIC: itailcaller:
; LINUX-64-PIC: pushq %rbx
; LINUX-64-PIC-NEXT: movq ifunc@GOTPCREL(%rip), %rbx
-; LINUX-64-PIC-NEXT: call *(%rbx)
-; LINUX-64-PIC-NEXT: call *(%rbx)
+; LINUX-64-PIC-NEXT: callq *(%rbx)
+; LINUX-64-PIC-NEXT: callq *(%rbx)
; LINUX-64-PIC-NEXT: popq %rbx
; LINUX-64-PIC-NEXT: ret
@@ -9489,24 +9489,24 @@ entry:
; DARWIN-64-STATIC: _itailcaller:
; DARWIN-64-STATIC: pushq %rbx
; DARWIN-64-STATIC-NEXT: movq _ifunc@GOTPCREL(%rip), %rbx
-; DARWIN-64-STATIC-NEXT: call *(%rbx)
-; DARWIN-64-STATIC-NEXT: call *(%rbx)
+; DARWIN-64-STATIC-NEXT: callq *(%rbx)
+; DARWIN-64-STATIC-NEXT: callq *(%rbx)
; DARWIN-64-STATIC-NEXT: popq %rbx
; DARWIN-64-STATIC-NEXT: ret
; DARWIN-64-DYNAMIC: _itailcaller:
; DARWIN-64-DYNAMIC: pushq %rbx
; DARWIN-64-DYNAMIC-NEXT: movq _ifunc@GOTPCREL(%rip), %rbx
-; DARWIN-64-DYNAMIC-NEXT: call *(%rbx)
-; DARWIN-64-DYNAMIC-NEXT: call *(%rbx)
+; DARWIN-64-DYNAMIC-NEXT: callq *(%rbx)
+; DARWIN-64-DYNAMIC-NEXT: callq *(%rbx)
; DARWIN-64-DYNAMIC-NEXT: popq %rbx
; DARWIN-64-DYNAMIC-NEXT: ret
; DARWIN-64-PIC: _itailcaller:
; DARWIN-64-PIC: pushq %rbx
; DARWIN-64-PIC-NEXT: movq _ifunc@GOTPCREL(%rip), %rbx
-; DARWIN-64-PIC-NEXT: call *(%rbx)
-; DARWIN-64-PIC-NEXT: call *(%rbx)
+; DARWIN-64-PIC-NEXT: callq *(%rbx)
+; DARWIN-64-PIC-NEXT: callq *(%rbx)
; DARWIN-64-PIC-NEXT: popq %rbx
; DARWIN-64-PIC-NEXT: ret
}
@@ -9517,7 +9517,7 @@ entry:
tail call void %0() nounwind
ret void
; LINUX-64-STATIC: ditailcaller:
-; LINUX-64-STATIC: call *difunc
+; LINUX-64-STATIC: callq *difunc
; LINUX-64-STATIC: ret
; LINUX-32-STATIC: ditailcaller:
@@ -9535,7 +9535,7 @@ entry:
; LINUX-64-PIC: ditailcaller:
; LINUX-64-PIC: subq $8, %rsp
; LINUX-64-PIC-NEXT: movq difunc@GOTPCREL(%rip), %rax
-; LINUX-64-PIC-NEXT: call *(%rax)
+; LINUX-64-PIC-NEXT: callq *(%rax)
; LINUX-64-PIC-NEXT: addq $8, %rsp
; LINUX-64-PIC-NEXT: ret
@@ -9562,18 +9562,18 @@ entry:
; DARWIN-64-STATIC: _ditailcaller:
; DARWIN-64-STATIC: subq $8, %rsp
-; DARWIN-64-STATIC-NEXT: call *_difunc(%rip)
+; DARWIN-64-STATIC-NEXT: callq *_difunc(%rip)
; DARWIN-64-STATIC-NEXT: addq $8, %rsp
; DARWIN-64-STATIC-NEXT: ret
; DARWIN-64-DYNAMIC: _ditailcaller:
; DARWIN-64-DYNAMIC: subq $8, %rsp
-; DARWIN-64-DYNAMIC-NEXT: call *_difunc(%rip)
+; DARWIN-64-DYNAMIC-NEXT: callq *_difunc(%rip)
; DARWIN-64-DYNAMIC-NEXT: addq $8, %rsp
; DARWIN-64-DYNAMIC-NEXT: ret
; DARWIN-64-PIC: _ditailcaller:
-; DARWIN-64-PIC: call *_difunc(%rip)
+; DARWIN-64-PIC: callq *_difunc(%rip)
; DARWIN-64-PIC-NEXT: addq $8, %rsp
; DARWIN-64-PIC-NEXT: ret
}
@@ -9584,7 +9584,7 @@ entry:
tail call void %0() nounwind
ret void
; LINUX-64-STATIC: litailcaller:
-; LINUX-64-STATIC: call *lifunc
+; LINUX-64-STATIC: callq *lifunc
; LINUX-64-STATIC: ret
; LINUX-32-STATIC: litailcaller:
@@ -9601,7 +9601,7 @@ entry:
; LINUX-64-PIC: litailcaller:
; LINUX-64-PIC: subq $8, %rsp
-; LINUX-64-PIC-NEXT: call *lifunc(%rip)
+; LINUX-64-PIC-NEXT: callq *lifunc(%rip)
; LINUX-64-PIC-NEXT: addq $8, %rsp
; LINUX-64-PIC-NEXT: ret
@@ -9628,19 +9628,19 @@ entry:
; DARWIN-64-STATIC: _litailcaller:
; DARWIN-64-STATIC: subq $8, %rsp
-; DARWIN-64-STATIC-NEXT: call *_lifunc(%rip)
+; DARWIN-64-STATIC-NEXT: callq *_lifunc(%rip)
; DARWIN-64-STATIC-NEXT: addq $8, %rsp
; DARWIN-64-STATIC-NEXT: ret
; DARWIN-64-DYNAMIC: _litailcaller:
; DARWIN-64-DYNAMIC: subq $8, %rsp
-; DARWIN-64-DYNAMIC-NEXT: call *_lifunc(%rip)
+; DARWIN-64-DYNAMIC-NEXT: callq *_lifunc(%rip)
; DARWIN-64-DYNAMIC-NEXT: addq $8, %rsp
; DARWIN-64-DYNAMIC-NEXT: ret
; DARWIN-64-PIC: _litailcaller:
; DARWIN-64-PIC: subq $8, %rsp
-; DARWIN-64-PIC-NEXT: call *_lifunc(%rip)
+; DARWIN-64-PIC-NEXT: callq *_lifunc(%rip)
; DARWIN-64-PIC-NEXT: addq $8, %rsp
; DARWIN-64-PIC-NEXT: ret
}
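
The mechanical call -> callq updates above all stem from the same table change: in AT&T syntax a 64-bit near call now prints with its size suffix, for direct callees and for indirect calls through a pointer alike (the leading * marks an indirect operand). A minimal test in the style of these files, assuming the usual llc/FileCheck setup; the names here are illustrative, not from the patch:

; RUN: llc < %s -mtriple=x86_64-pc-linux | FileCheck %s
declare void @hypothetical_callee()
@fptr = global void ()* @hypothetical_callee
define void @demo() nounwind {
entry:
; CHECK: callq hypothetical_callee
  call void @hypothetical_callee() nounwind
; CHECK: callq *
  %f = load void ()** @fptr
  call void %f() nounwind
  ret void
}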
diff --git a/test/CodeGen/X86/bss_pagealigned.ll b/test/CodeGen/X86/bss_pagealigned.ll
index 4a1049bc56..27c536144b 100644
--- a/test/CodeGen/X86/bss_pagealigned.ll
+++ b/test/CodeGen/X86/bss_pagealigned.ll
@@ -10,7 +10,7 @@ define void @unxlate_dev_mem_ptr(i64 %phis, i8* %addr) nounwind {
; CHECK: movq $bm_pte, %rdi
; CHECK-NEXT: xorl %esi, %esi
; CHECK-NEXT: movl $4096, %edx
-; CHECK-NEXT: call memset
+; CHECK-NEXT: callq memset
ret void
}
@bm_pte = internal global [512 x %struct.kmem_cache_order_objects] zeroinitializer, section ".bss.page_aligned", align 4096
diff --git a/test/CodeGen/X86/cmov.ll b/test/CodeGen/X86/cmov.ll
index f3c9a7addf..39d9d1e9ec 100644
--- a/test/CodeGen/X86/cmov.ll
+++ b/test/CodeGen/X86/cmov.ll
@@ -6,7 +6,7 @@ entry:
; CHECK: test1:
; CHECK: btl
; CHECK-NEXT: movl $12, %eax
-; CHECK-NEXT: cmovae (%rcx), %eax
+; CHECK-NEXT: cmovael (%rcx), %eax
; CHECK-NEXT: ret
%0 = lshr i32 %x, %n ; <i32> [#uses=1]
@@ -21,7 +21,7 @@ entry:
; CHECK: test2:
; CHECK: btl
; CHECK-NEXT: movl $12, %eax
-; CHECK-NEXT: cmovb (%rcx), %eax
+; CHECK-NEXT: cmovbl (%rcx), %eax
; CHECK-NEXT: ret
%0 = lshr i32 %x, %n ; <i32> [#uses=1]
@@ -41,7 +41,7 @@ declare void @bar(i64) nounwind
define void @test3(i64 %a, i64 %b, i1 %p) nounwind {
; CHECK: test3:
-; CHECK: cmovne %edi, %esi
+; CHECK: cmovnel %edi, %esi
; CHECK-NEXT: movl %esi, %edi
%c = trunc i64 %a to i32
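
The cmov updates in this file are the same suffix rule applied to conditional moves: the mnemonic now spells out its operand size, so the 32-bit forms above gain an l (cmovael, cmovbl, cmovnel) and the 64-bit form in select-aggregate.ll below gains a q. A sketch of both widths, again with illustrative names; the exact condition code depends on how the i1 is materialized, so read the CHECK lines as the expected shape rather than verified output:

; RUN: llc < %s -march=x86-64 | FileCheck %s
define i32 @select32(i1 %p, i32 %a, i32 %b) nounwind {
; CHECK: select32:
; CHECK: cmovnel
  %r = select i1 %p, i32 %a, i32 %b
  ret i32 %r
}
define i64 @select64(i1 %p, i64 %a, i64 %b) nounwind {
; CHECK: select64:
; CHECK: cmovneq
  %r = select i1 %p, i64 %a, i64 %b
  ret i64 %r
}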
diff --git a/test/CodeGen/X86/live-out-reg-info.ll b/test/CodeGen/X86/live-out-reg-info.ll
index 7132777b69..8cd9774983 100644
--- a/test/CodeGen/X86/live-out-reg-info.ll
+++ b/test/CodeGen/X86/live-out-reg-info.ll
@@ -1,4 +1,4 @@
-; RUN: llc < %s -march=x86-64 | grep {testb \[$\]1,}
+; RUN: llc < %s -march=x86-64 | grep testb
; Make sure dagcombine doesn't eliminate the comparison due
; to an off-by-one bug with ComputeMaskedBits information.
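
The RUN-line simplification drops the brace-quoted Tcl regex in favor of a plain grep: any testb in the output satisfies the intent, which is that the comparison survive codegen at all. The shape being guarded is roughly a branch on the low bit of a wider value, where a testb must remain; a rough illustrative sketch only, since the real test body is not shown in this hunk:

define void @check(i64 %x) nounwind {
entry:
  %t = trunc i64 %x to i1
  br i1 %t, label %yes, label %no
yes:
  call void @hypothetical_sink() nounwind
  ret void
no:
  ret void
}
declare void @hypothetical_sink()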
diff --git a/test/CodeGen/X86/loop-blocks.ll b/test/CodeGen/X86/loop-blocks.ll
index ec5236b3ae..a125e54050 100644
--- a/test/CodeGen/X86/loop-blocks.ll
+++ b/test/CodeGen/X86/loop-blocks.ll
@@ -10,9 +10,9 @@
; CHECK: jmp .LBB1_1
; CHECK-NEXT: align
; CHECK-NEXT: .LBB1_2:
-; CHECK-NEXT: call loop_latch
+; CHECK-NEXT: callq loop_latch
; CHECK-NEXT: .LBB1_1:
-; CHECK-NEXT: call loop_header
+; CHECK-NEXT: callq loop_header
define void @simple() nounwind {
entry:
@@ -40,9 +40,9 @@ done:
; CHECK: jmp .LBB2_1
; CHECK-NEXT: align
; CHECK-NEXT: .LBB2_4:
-; CHECK-NEXT: call bar99
+; CHECK-NEXT: callq bar99
; CHECK-NEXT: .LBB2_1:
-; CHECK-NEXT: call body
+; CHECK-NEXT: callq body
define void @slightly_more_involved() nounwind {
entry:
@@ -75,18 +75,18 @@ exit:
; CHECK: jmp .LBB3_1
; CHECK-NEXT: align
; CHECK-NEXT: .LBB3_4:
-; CHECK-NEXT: call bar99
-; CHECK-NEXT: call get
+; CHECK-NEXT: callq bar99
+; CHECK-NEXT: callq get
; CHECK-NEXT: cmpl $2999, %eax
; CHECK-NEXT: jg .LBB3_6
-; CHECK-NEXT: call block_a_true_func
+; CHECK-NEXT: callq block_a_true_func
; CHECK-NEXT: jmp .LBB3_7
; CHECK-NEXT: .LBB3_6:
-; CHECK-NEXT: call block_a_false_func
+; CHECK-NEXT: callq block_a_false_func
; CHECK-NEXT: .LBB3_7:
-; CHECK-NEXT: call block_a_merge_func
+; CHECK-NEXT: callq block_a_merge_func
; CHECK-NEXT: .LBB3_1:
-; CHECK-NEXT: call body
+; CHECK-NEXT: callq body
define void @yet_more_involved() nounwind {
entry:
@@ -134,18 +134,18 @@ exit:
; CHECK: jmp .LBB4_1
; CHECK-NEXT: align
; CHECK-NEXT: .LBB4_7:
-; CHECK-NEXT: call bar100
+; CHECK-NEXT: callq bar100
; CHECK-NEXT: jmp .LBB4_1
; CHECK-NEXT: .LBB4_8:
-; CHECK-NEXT: call bar101
+; CHECK-NEXT: callq bar101
; CHECK-NEXT: jmp .LBB4_1
; CHECK-NEXT: .LBB4_9:
-; CHECK-NEXT: call bar102
+; CHECK-NEXT: callq bar102
; CHECK-NEXT: jmp .LBB4_1
; CHECK-NEXT: .LBB4_5:
-; CHECK-NEXT: call loop_latch
+; CHECK-NEXT: callq loop_latch
; CHECK-NEXT: .LBB4_1:
-; CHECK-NEXT: call loop_header
+; CHECK-NEXT: callq loop_header
define void @cfg_islands() nounwind {
entry:
diff --git a/test/CodeGen/X86/peep-test-3.ll b/test/CodeGen/X86/peep-test-3.ll
index 5aaf81b4fd..a34a9784cd 100644
--- a/test/CodeGen/X86/peep-test-3.ll
+++ b/test/CodeGen/X86/peep-test-3.ll
@@ -65,7 +65,7 @@ return: ; preds = %entry
ret void
}
-; Just like @and, but without the trunc+store. This should use a testl
+; Just like @and, but without the trunc+store. This should use a testb
; instead of an andl.
; CHECK: test:
define void @test(float* %A, i32 %IA, i32 %N, i8* %p) nounwind {
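
The comment fix matches what the peephole actually produces: with no trunc+store keeping the full and result alive, only the flags are needed, and when the mask fits in the low byte an 8-bit testb (which writes no register) is the natural encoding. Illustratively, a mask that is dead except for its flags would be expected to come out as a testb rather than an andl (names assumed, not from the testcase):

define i32 @and_flags(i32 %x) nounwind {
entry:
  %a = and i32 %x, 255
  %c = icmp eq i32 %a, 0
  %r = select i1 %c, i32 1, i32 0
  ret i32 %r
}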
diff --git a/test/CodeGen/X86/select-aggregate.ll b/test/CodeGen/X86/select-aggregate.ll
index 822e5946d3..44cafe22af 100644
--- a/test/CodeGen/X86/select-aggregate.ll
+++ b/test/CodeGen/X86/select-aggregate.ll
@@ -1,7 +1,7 @@
; RUN: llc < %s -march=x86-64 | FileCheck %s
; PR5757
-; CHECK: cmovne %rdi, %rsi
+; CHECK: cmovneq %rdi, %rsi
; CHECK: movl (%rsi), %eax
%0 = type { i64, i32 }
diff --git a/test/CodeGen/X86/tail-opts.ll b/test/CodeGen/X86/tail-opts.ll
index c70c9fadd2..8c3cae9e8d 100644
--- a/test/CodeGen/X86/tail-opts.ll
+++ b/test/CodeGen/X86/tail-opts.ll
@@ -274,7 +274,7 @@ declare fastcc %union.tree_node* @default_conversion(%union.tree_node*) nounwind
; one ret instruction.
; CHECK: foo:
-; CHECK: call func
+; CHECK: callq func
; CHECK-NEXT: .LBB5_2:
; CHECK-NEXT: addq $8, %rsp
; CHECK-NEXT: ret
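
The point of this CHECK block is that both predecessors fall into the same epilogue, so tail merging leaves exactly one ret in the function. The pattern is easy to reproduce: two paths that end in identical code get folded, as in this sketch (illustrative names):

define void @one_ret(i1 %p) nounwind {
entry:
  br i1 %p, label %a, label %b
a:
  call void @hypothetical_func() nounwind
  br label %done
b:
  call void @hypothetical_func() nounwind
  br label %done
done:
  ret void
}
declare void @hypothetical_func()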
diff --git a/test/CodeGen/X86/widen_load-1.ll b/test/CodeGen/X86/widen_load-1.ll
index 2d34b31314..8a970bff49 100644
--- a/test/CodeGen/X86/widen_load-1.ll
+++ b/test/CodeGen/X86/widen_load-1.ll
@@ -5,7 +5,7 @@
; CHECK: movq compl+128(%rip), %xmm0
; CHECK: movaps %xmm0, (%rsp)
-; CHECK: call killcommon
+; CHECK: callq killcommon
@compl = linkonce global [20 x i64] zeroinitializer, align 64 ; <[20 x i64]*> [#uses=1]
diff --git a/test/CodeGen/X86/x86-64-pic-1.ll b/test/CodeGen/X86/x86-64-pic-1.ll
index b21918ef80..46f6d335d0 100644
--- a/test/CodeGen/X86/x86-64-pic-1.ll
+++ b/test/CodeGen/X86/x86-64-pic-1.ll
@@ -1,5 +1,5 @@
; RUN: llc < %s -mtriple=x86_64-pc-linux -relocation-model=pic -o %t1
-; RUN: grep {call f@PLT} %t1
+; RUN: grep {callq f@PLT} %t1
define void @g() {
entry:
diff --git a/test/CodeGen/X86/x86-64-pic-10.ll b/test/CodeGen/X86/x86-64-pic-10.ll
index 7baa7e59e1..b6f82e23b7 100644
--- a/test/CodeGen/X86/x86-64-pic-10.ll
+++ b/test/CodeGen/X86/x86-64-pic-10.ll
@@ -1,5 +1,5 @@
; RUN: llc < %s -mtriple=x86_64-pc-linux -relocation-model=pic -o %t1
-; RUN: grep {call g@PLT} %t1
+; RUN: grep {callq g@PLT} %t1
@g = alias weak i32 ()* @f
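
The remaining x86-64-pic greps apply the same suffix rule to the PIC-specific cases: an externally visible callee goes through the PLT, an alias resolves like its aliasee, and a local callee is called directly, but in every case the mnemonic is now callq. A sketch mirroring these RUN lines, with an assumed external function h:

; RUN: llc < %s -mtriple=x86_64-pc-linux -relocation-model=pic -o %t1
; RUN: grep {callq h@PLT} %t1
declare void @h()
define void @caller() {
entry:
  call void @h()
  ret void
}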
diff --git a/test/CodeGen/X86/x86-64-pic-11.ll b/test/CodeGen/X86/x86-64-pic-11.ll
index ef81685332..4db331cee4 100644
--- a/test/CodeGen/X86/x86-64-pic-11.ll
+++ b/test/CodeGen/X86/x86-64-pic-11.ll
@@ -1,5 +1,5 @@
; RUN: llc < %s -mtriple=x86_64-pc-linux -relocation-model=pic -o %t1
-; RUN: grep {call __fixunsxfti@PLT} %t1
+; RUN: grep {callq __fixunsxfti@PLT} %t1
define i128 @f(x86_fp80 %a) nounwind {
entry:
diff --git a/test/CodeGen/X86/x86-64-pic-2.ll b/test/CodeGen/X86/x86-64-pic-2.ll
index a52c564f96..1ce2de7209 100644
--- a/test/CodeGen/X86/x86-64-pic-2.ll
+++ b/test/CodeGen/X86/x86-64-pic-2.ll
@@ -1,6 +1,6 @@
; RUN: llc < %s -mtriple=x86_64-pc-linux -relocation-model=pic -o %t1
-; RUN: grep {call f} %t1
-; RUN: not grep {call f@PLT} %t1
+; RUN: grep {callq f} %t1
+; RUN: not grep {callq f@PLT} %t1
define void @g() {
entry:
diff --git a/test/CodeGen/X86/x86-64-pic-3.ll b/test/CodeGen/X86/x86-64-pic-3.ll
index 246c00f741..aa3c888ed6 100644
--- a/test/CodeGen/X86/x86-64-pic-3.ll
+++ b/test/CodeGen/X86/x86-64-pic-3.ll
@@ -1,6 +1,6 @@
; RUN: llc < %s -mtriple=x86_64-pc-linux -relocation-model=pic -o %t1
-; RUN: grep {call f} %t1
-; RUN: not grep {call f@PLT} %t1
+; RUN: grep {callq f} %t1
+; RUN: not grep {callq f@PLT} %t1
define void @g() {
entry: